Encodings for localization files

When creating a localization for Mozilla products, it's important to be aware of the encoding of the files that you generate.

In general, files in the Mozilla repositories are UTF-8 encoded. There are a few exceptions, though.

Installer

The Windows installer cannot handle UTF-8; it supports only the codepages provided by Windows. Hooking this up in the build process is tricky, so the files involved and their required encodings are listed here:

File: toolkit/installer/windows/charset.mk
Encoding: ASCII
Notes: The WIN_INSTALLER_CHARSET variable must be set to an encoding that matches the CHARSET= parameter in toolkit/installer/windows/install.it. See the table below for appropriate values.

File: toolkit/installer/windows/install.it
Encoding: A Windows codepage. This must match the CHARSET= parameter in this file and the WIN_INSTALLER_CHARSET parameter in charset.mk.
Notes: The FONTNAME/FONTSIZE/CHARSET parameters in this file must be set to good values. For most Western scripts, 'MS Sans Serif' and '8' are good defaults for the font settings. For Eastern scripts, choose appropriate fonts that are shipped with Windows. See the table below for appropriate values for the CHARSET= parameter.

File: browser/installer/installer.inc
Encoding: UTF-8

File: toolkit/installer/unix/install.it
Encoding: UTF-8
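
Because the Windows install.it has to be saved in a Windows codepage rather than UTF-8, it can be useful to confirm that the translated strings survive the conversion before committing the file. The following is a minimal Python sketch of such a check, not part of the Mozilla build system; the function name to_codepage is hypothetical, and it assumes the strings are already available as Unicode text.

```python
def to_codepage(text, codepage):
    """Encode text in a Windows codepage such as "cp1252".

    Raises ValueError naming every character the codepage cannot
    represent, instead of silently replacing it.
    """
    unrepresentable = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            if ch not in unrepresentable:
                unrepresentable.append(ch)
    if unrepresentable:
        raise ValueError(
            "not representable in %s: %r" % (codepage, unrepresentable)
        )
    return text.encode(codepage)
```

A German string encodes cleanly with to_codepage("Schließen", "cp1252"), whereas passing a Russian string with "cp1252" raises ValueError, which suggests that the localization needs a different codepage.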

Native Windows encodings

The following table lists native Windows encodings, and the WIN_INSTALLER_CHARSET and CHARSET= values for each:

Encoding Name         WIN_INSTALLER_CHARSET (charset.mk)   CHARSET= (windows/install.it)
ANSI_CHARSET          CP1252                               0
BALTIC_CHARSET        CP1257                               186
CHINESEBIG5_CHARSET   CP950                                136
EASTEUROPE_CHARSET    CP1250                               238
GB2312_CHARSET        CP936                                134
GREEK_CHARSET         CP1253                               161
HANGUL_CHARSET        CP949                                129
RUSSIAN_CHARSET       CP1251                               204
SHIFTJIS_CHARSET      CP932                                128
TURKISH_CHARSET       CP1254                               162
VIETNAMESE_CHARSET    CP1258                               163

Middle East language editions of Windows:
ARABIC_CHARSET        CP1256                               178
HEBREW_CHARSET        CP1255                               177

Thai language editions of Windows:
THAI_CHARSET          CP874                                222
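
Since the two values have to agree, a small lookup can catch a mismatch early. The sketch below copies the pairs from the table above into a Python dictionary; the names CHARSET_FOR_CODEPAGE and check_installer_charset are hypothetical, and the function takes the two values as arguments rather than parsing charset.mk or install.it, whose exact syntax is not assumed here.

```python
# WIN_INSTALLER_CHARSET value -> CHARSET= value, copied from the table above.
CHARSET_FOR_CODEPAGE = {
    "CP1252": 0,    # ANSI_CHARSET
    "CP1257": 186,  # BALTIC_CHARSET
    "CP950": 136,   # CHINESEBIG5_CHARSET
    "CP1250": 238,  # EASTEUROPE_CHARSET
    "CP936": 134,   # GB2312_CHARSET
    "CP1253": 161,  # GREEK_CHARSET
    "CP949": 129,   # HANGUL_CHARSET
    "CP1251": 204,  # RUSSIAN_CHARSET
    "CP932": 128,   # SHIFTJIS_CHARSET
    "CP1254": 162,  # TURKISH_CHARSET
    "CP1258": 163,  # VIETNAMESE_CHARSET
    "CP1256": 178,  # ARABIC_CHARSET
    "CP1255": 177,  # HEBREW_CHARSET
    "CP874": 222,   # THAI_CHARSET
}


def check_installer_charset(win_installer_charset, charset):
    """Return True if the two settings form a pair listed in the table.

    win_installer_charset: the value set in charset.mk, e.g. "CP1251".
    charset: the CHARSET= value from windows/install.it, e.g. 204 or "204".
    """
    expected = CHARSET_FOR_CODEPAGE.get(win_installer_charset.strip().upper())
    if expected is None:
        raise ValueError("unknown codepage: %r" % win_installer_charset)
    return expected == int(charset)
```

For a Russian localization, check_installer_charset("CP1251", 204) returns True, while pairing CP1251 with a CHARSET= value of 0 returns False.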