International characters in XUL JavaScript

Gecko 1.8, as used in Firefox 1.5 and other applications, added support for non-ASCII characters in JavaScript files loaded from XUL files.

This means that such script files can use characters from virtually any language in the world. For example, a script can contain the line:

var text = "Ein schΓΆnes Beispiel eines mehrsprachigen Textes: ζ—₯本θͺž";

This mixes German and Japanese characters.

Earlier versions always interpreted JavaScript files loaded from XUL as ISO-8859-1 (Latin-1), whether they were loaded locally or remotely. Unicode escapes, as discussed below, have always worked.

How the character encoding is determined in Gecko 1.8 and later

When the JavaScript file is loaded from a chrome:// URL, a Byte Order Mark (BOM) is used to determine the character encoding of the script. Otherwise, the character encoding is the same as the one used by the XUL file, which can be specified with the encoding attribute of the <?xml?> declaration. By default this is UTF-8, which can represent virtually all characters in the world.
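
For example, a XUL file could declare UTF-8 explicitly as sketched below (myscript.js is a placeholder name); a script it loads from a non-chrome URL, with no charset sent over HTTP, would then be decoded as UTF-8:

<?xml version="1.0" encoding="UTF-8"?>
<window xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
  <script type="application/javascript" src="myscript.js"/>
</window>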

If the script file is loaded via HTTP, the character encoding can be declared with a charset parameter in the Content-Type header, for example:

Content-Type: application/javascript; charset=UTF-8

If no charset parameter is specified, the same rules as above apply.

Cross-version compatibility

If you want the same code to work in both Gecko 1.8 and earlier versions, you must limit yourself to ASCII. However, you can use Unicode escapes; the earlier example rewritten using them would be:

var text = "Ein sch\u00F6nes Beispiel eines mehrsprachigen Textes: \u65E5\u672C\u8A9E";

An alternative might be to use property files via nsIStringBundle or the XUL <stringbundle> element; this also allows the XUL to be localized. This cannot be done in XUL files loaded from the web, only in privileged code, e.g. in extensions.
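
As a rough sketch of the nsIStringBundle approach in privileged code (the chrome:// URL and the key name "greeting" are hypothetical):

// Load a localized string from a .properties file (privileged code only).
var bundleService = Components.classes["@mozilla.org/intl/stringbundle;1"]
                              .getService(Components.interfaces.nsIStringBundleService);
var bundle = bundleService.createBundle("chrome://myextension/locale/myextension.properties");
var text = bundle.GetStringFromName("greeting");

The same string could also be retrieved through the getString() method of a <stringbundle> element.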