a better system for filetypes


there are currently 2 filetype systems that are widely used:


both have various issues, one being that many file formats are known by multiple names. MIME is also often overly lengthy, and it names things strangely. for example, the default type for the whole system is "application/octet-stream".


how i would do better


first, ditch the awkward 2-level hierarchy in MIME.


i would propose a more flexible hierarchy, stored externally. not embedding this data in the type itself means that it can be changed later, which is useful if you have something like YAML that is a superset of JSON, but created later.
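a minimal sketch of why external storage helps, assuming the hierarchy is just a table from type name to supertype. the table layout, the function name, and the choice to put json and yaml under txt are all my own assumptions for illustration, not part of any existing system. the point is that the name "json" never changes, only the table does.

```python
# hypothetical external table: type name -> supertype (None marks the root)
table = {"bin": None, "txt": "bin", "json": "txt"}


def supertypes_of(table, name):
    """Return every supertype of `name`, nearest first."""
    result = []
    parent = table.get(name)
    while parent is not None:
        result.append(parent)
        parent = table.get(parent)
    return result


before = supertypes_of(table, "json")

# later, yaml comes along: add it, then retrofit json underneath it.
# files already labelled "json" keep their label.
table["yaml"] = "txt"
table["json"] = "yaml"

after = supertypes_of(table, "json")

print(before)  # ['txt', 'bin']
print(after)   # ['yaml', 'txt', 'bin']
```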


also, assign supertypes in a logical manner. yes, xml is a textual format, and should be denoted as such. as i see it, the primary use case for supertypes is providing fallbacks for what to use to edit/display a file.


because of this, A should only be denoted a subtype of B if every valid document of type A is also a valid document of type B.


another consideration is formats that encapsulate other formats, like a gzipped tarball. some equivalent of MIME type parameters would probably also be necessary.
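one way this could be modelled, as a rough sketch only: a type reference that carries optional parameters and an optional inner type. the class name `TypeRef`, its fields, and the type names "gzip" and "tar" are hypothetical, made up here to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TypeRef:
    """Hypothetical reference to a type, with parameters and a wrapped type."""
    name: str
    params: dict = field(default_factory=dict)  # rough equivalent of MIME parameters
    inner: Optional["TypeRef"] = None           # the format encapsulated by this one

    def layers(self):
        """Return type names from the outermost wrapper to the innermost payload."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.inner
        return names


# a gzipped tarball: gzip on the outside, tar on the inside
tarball = TypeRef("gzip", inner=TypeRef("tar"))

# a parameter example, assuming a charset parameter like the one MIME has
page = TypeRef("html", params={"charset": "utf-8"})

print(tarball.layers())  # ['gzip', 'tar']
```

a viewer could then peel off layers it understands (decompress the gzip) and hand the rest to whatever handles the inner type.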


type database


a type database would likely look something like this:


bin

txt < bin

sgml < txt

html < sgml

xml < sgml

xhtml < xml
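the listing above can be read mechanically. here is a small sketch that parses it and walks the fallback chain for a type, assuming each type has at most one supertype (a simplification on my part, a more flexible hierarchy might allow several). the function names are made up for this example.

```python
TYPE_DB = """
bin
txt < bin
sgml < txt
html < sgml
xml < sgml
xhtml < xml
"""


def parse_type_db(text):
    """Parse lines of the form 'child < parent', or a bare name for a root type."""
    parents = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "<" in line:
            child, parent = (part.strip() for part in line.split("<", 1))
            parents[child] = parent
            parents.setdefault(parent, None)  # make sure the parent is known too
        else:
            parents.setdefault(line, None)
    return parents


def fallback_chain(parents, name):
    """Return the type followed by its supertypes, most specific first."""
    chain = []
    seen = set()  # guards against accidental cycles in the database
    while name is not None and name not in seen:
        chain.append(name)
        seen.add(name)
        name = parents.get(name)
    return chain


def is_subtype(parents, a, b):
    """True if a is b, or b is somewhere above a in the hierarchy."""
    return b in fallback_chain(parents, a)


parents = parse_type_db(TYPE_DB)
print(fallback_chain(parents, "xhtml"))  # ['xhtml', 'xml', 'sgml', 'txt', 'bin']
```

a program that has no xhtml viewer could try each entry in the chain in order until it finds something it can open, ending with a hex editor for bin.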

