How Mozilla determines MIME Types

Introduction

All data handling in Mozilla is based on the MIME type of the content. This means that every time a URI is loaded, Mozilla must determine its MIME type. This document describes the several ways in which this happens.

Content-Type "hints"

Mozilla has a concept of "content-type hints". For example, if Mozilla encounters a <link type="text/css" rel="stylesheet" href="..."> element, it assumes a type of text/css. This hint is, however, overridden by the actual MIME type the server sends (if any). For this specific example, the server override only happens in standards-compliant mode; see Mozilla's Quirks Mode or the Web Author FAQ.

Similar handling happens for <a href="..." type="foo/bar">, starting in Mozilla 1.6alpha.
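
The precedence between a hint and the server's type can be sketched in a few lines. This is only an illustration of the rule described above, not Mozilla's code: the function name effective_type is hypothetical, and the quirks-mode exception for stylesheets is deliberately not modeled.

```python
from typing import Optional


def effective_type(hint: Optional[str], server_type: Optional[str]) -> Optional[str]:
    """Return the type to use for a load (hypothetical helper).

    The type sent by the server wins whenever there is one; the hint
    from the markup is only used as a fallback.
    """
    return server_type if server_type else hint


# The hint is used only when the server sends no type.
print(effective_type("text/css", None))
# A server-supplied type overrides the hint.
print(effective_type("text/css", "text/plain"))
```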

HTTP

For HTTP URIs, Mozilla usually receives a MIME type from the server and uses it. Unlike Internet Explorer, which guesses MIME types, Mozilla generally does not sniff the type of the document. However, starting in Mozilla 1.7alpha, Mozilla does perform content sniffing in the following case:

When the Content-Type sent by the server is exactly one of the following (compared case-sensitively)

  • text/plain
  • text/plain; charset=ISO-8859-1
  • text/plain; charset=iso-8859-1

and the server did not send a Content-Encoding header, Mozilla sniffs the first block of data it receives and checks for non-text bytes. Text bytes are 9-13, 27, and 31-255. If a non-text byte is found, the helper app dialog is shown, displaying the MIME type that corresponds to the file's extension.
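
The check described above can be sketched as follows. This is a minimal illustration, not the actual implementation. The byte ranges and the three Content-Type strings are taken from the text; the function names and the way headers are passed in are assumptions made for the example.

```python
from typing import Optional

# Exact Content-Type values that trigger sniffing (compared case-sensitively).
SNIFFABLE_TYPES = {
    "text/plain",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=iso-8859-1",
}


def is_text_byte(b: int) -> bool:
    """Text bytes, as listed above, are 9-13, 27, and 31-255."""
    return 9 <= b <= 13 or b == 27 or 31 <= b <= 255


def should_show_helper_dialog(content_type: str,
                              content_encoding: Optional[str],
                              first_block: bytes) -> bool:
    """Hypothetical helper: True if the text/plain data looks binary."""
    if content_type not in SNIFFABLE_TYPES:
        return False  # sniffing only applies to the exact values above
    if content_encoding is not None:
        return False  # a Content-Encoding header disables the check
    return any(not is_text_byte(b) for b in first_block)
```

Because the comparison is an exact string match, a header such as "Text/Plain" or "text/plain; charset=UTF-8" never triggers the check in this sketch.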

Also, for images loaded via <img src>, Mozilla's image library will do content sniffing (never extension sniffing) to find out the real type of the image.

If the server did not send a Content-Type header, Mozilla uses the unknown decoder to find a MIME type.

File URIs

For file: URIs, Mozilla will ask the ExternalHelperAppService for a MIME type.

FTP

Like HTTP URIs without a MIME type, FTP URIs go through the unknown decoder.

Unknown Decoder

The unknown decoder is located at netwerk/streamconv/converters/nsUnknownDecoder.cpp. The interesting part starts at line 287, with the sSnifferEntries array and the DetermineContentType function. It does the following:

  • Checks the start of the file for "magic numbers"; this can currently detect PDF and PostScript.
  • If the file starts with <?xml, asks the ExternalHelperAppService for a MIME type for the URI. This is done because the generic text/xml MIME type does not work for XUL files, and XHTML files get a different DOM when interpreted as text/xml.
  • The Image Library will be asked for the MIME type given the content. This should allow reliable detection of all image types Mozilla supports.
  • Checks whether the data is HTML by looking for some common HTML tags.
  • The URI is handed to the ExternalHelperAppService for MIME type guessing.
  • If all else fails, the buffer (i.e. the first few bytes of the file) is searched for embedded nulls; if none are found, text/plain will be used, otherwise application/octet-stream.
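
The order of these checks can be sketched as a single function. This is a rough illustration under stated assumptions, not a port of DetermineContentType: the magic-number strings and the list of HTML tags are assumed for the example, and the ExternalHelperAppService and the image library are represented by hypothetical callables that return None when they have no answer. Falling through to the later checks when the service has no answer for an <?xml file is also an assumption.

```python
from typing import Callable, Optional

# Assumed magic numbers; the real sSnifferEntries array may differ.
MAGIC_NUMBERS = [
    (b"%PDF-", "application/pdf"),
    (b"%!PS", "application/postscript"),
]

# A few common tags, chosen for illustration only.
HTML_TAGS = (b"<html", b"<head", b"<body", b"<title", b"<a href")


def determine_content_type(
    buf: bytes,
    type_from_uri: Callable[[], Optional[str]] = lambda: None,
    sniff_image: Callable[[bytes], Optional[str]] = lambda data: None,
) -> str:
    # 1. Magic numbers at the start of the data.
    for magic, mime in MAGIC_NUMBERS:
        if buf.startswith(magic):
            return mime
    # 2. XML: ask the helper app service (stand-in: type_from_uri).
    if buf.startswith(b"<?xml"):
        mime = type_from_uri()
        if mime:
            return mime
    # 3. Ask the image library (stand-in: sniff_image).
    mime = sniff_image(buf)
    if mime:
        return mime
    # 4. Look for common HTML tags, ignoring case.
    lowered = buf.lower()
    if any(tag in lowered for tag in HTML_TAGS):
        return "text/html"
    # 5. Guess from the URI.
    mime = type_from_uri()
    if mime:
        return mime
    # 6. Last resort: embedded nulls mean binary data.
    return "application/octet-stream" if b"\x00" in buf else "text/plain"
```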

ExternalHelperAppService

(located at uriloader/exthandler/nsExternalHelperAppService.cpp)

The file->MIME type mapping works like this:

  • On BeOS, the operating system is asked for the type of the file (not yet implemented; see bug 217723).
  • On MacOS, the type and creator code are used to look up the type of the file from the OS.
  • A hardcoded list of extensions is checked (currently 13 entries; see nsExternalHelperAppService.cpp, line 463). This is done for speed: it is faster to find data in the hardcoded list than to ask the OS or look in preferences.
  • If the extension is not listed there, it becomes interesting. First, the operating system is asked for a MIME type. (On Unix, this means checking mime.types.)
  • If that fails, a user-supplied helper app (i.e. one from the list in Edit/Preferences/Helper Applications) is searched for by extension, and its specified MIME type is used.
  • If that fails, a list of "extra" MIME types is searched for an extension match. See line 507 for the complete list.
  • If that also fails, the list of loaded plugins is checked for a plugin that can handle this extension, and that plugin is asked for the MIME type.
  • If no plugin is registered, the ext-to-type-mapping XPCOM category is searched for the extension. This allows add-on extensions to register additional mappings. The key of the category entry is the file extension without the leading dot, and the value is the MIME type. The file extension must be lowercase.
  • If no matching ext-to-type-mapping entry is found, the ExternalHelperAppService returns application/x-extension-EXT, where EXT is the extension of the file.
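
The extension-based part of this lookup is essentially an ordered chain of tables with a synthetic fallback. The sketch below shows that shape only. The function name, the use of plain dictionaries for each source, lowercasing the extension for every source, and returning None for a file without an extension are all assumptions; the BeOS and MacOS file-type queries are left out.

```python
import os
from typing import Mapping, Optional, Sequence


def type_from_extension(filename: str,
                        sources: Sequence[Mapping[str, str]]) -> Optional[str]:
    """Hypothetical helper: try each source in order, then fall back.

    `sources` stands in for the hardcoded list, the OS, the user's helper
    apps, the "extras", the plugins, and the ext-to-type-mapping category,
    in that order. Keys are lowercase extensions without the leading dot.
    """
    ext = os.path.splitext(filename)[1].lstrip(".")
    if not ext:
        return None  # assumption: the text does not cover this case
    key = ext.lower()
    for table in sources:
        if key in table:
            return table[key]
    # Nothing matched: build the synthetic type described in the last step.
    return f"application/x-extension-{ext}"


# Illustrative tables only; the real lists are much longer.
hardcoded = {"html": "text/html"}
os_types = {"pdf": "application/pdf"}
category = {"foo": "application/x-foo"}
print(type_from_extension("notes.xyz", [hardcoded, os_types, category]))
```

Earlier sources shadow later ones, which is why the hardcoded list can act as a fast path for common extensions.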

Helper Applications

A somewhat related topic is helper applications. When a URI is loaded with a type that Mozilla cannot handle, a helper app dialog is shown. The information displayed comes from these sources:

  • Ask the OS for a handler of the given <extension, MIME Type> pair. Note that the extension here comes from the Content-Disposition header if present, and from the URL itself otherwise. This is where the listed "default application" comes from.
  • The "data source" (that is, the list of helper applications) is searched for an entry with the MIME type of the URI. The data source is the mimeTypes.rdf file in the profile directory. If this fails, the data source is searched by extension (taken from Content-Disposition as above). If one of these lookups succeeds, this is where the application in the "Open with" field comes from, and also where the description of the type comes from.
  • If this also fails, the extras are searched again; they supply the extension list and a description of the MIME type.
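
Two details of these lookups can be sketched briefly: where the extension comes from, and the order in which the data source is searched. This is an illustration only. The function names are hypothetical, the Content-Disposition filename is assumed to have been parsed out of the header already, and the data source is modeled as two plain dictionaries rather than as mimeTypes.rdf.

```python
import posixpath
from typing import Mapping, Optional
from urllib.parse import urlsplit


def extension_for_lookup(url: str,
                         disposition_filename: Optional[str] = None) -> str:
    """Prefer the Content-Disposition filename; otherwise use the URL path."""
    name = disposition_filename or posixpath.basename(urlsplit(url).path)
    return posixpath.splitext(name)[1].lstrip(".")


def find_datasource_entry(mime: str, ext: str,
                          by_type: Mapping[str, str],
                          by_ext: Mapping[str, str]) -> Optional[str]:
    """Search by MIME type first, then by extension (hypothetical tables)."""
    return by_type.get(mime) or by_ext.get(ext)


# The query string is ignored, and the header filename wins if present.
print(extension_for_lookup("http://example.com/get.php?id=3", "report.pdf"))
print(extension_for_lookup("http://example.com/files/song.mp3?x=1"))
```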

Document Loading - From Load Start to Finding a Handler

Original Document Information