This article describes how to write textual data to streams, files and sockets in an internationalization-aware way.
When writing textual data to an output stream or to a file, you need to pick a character encoding.
Some character encodings (UTF-8, UTF-16, UTF-32) can represent "all" characters (the full repertoire of Unicode) while others can only represent a subset of the full repertoire.
When the file is to be read only by the application/extension itself, using UTF-8 is often the best choice — it can represent all characters, and ASCII characters are represented efficiently.
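The point about ASCII efficiency can be checked with a small sketch. It assumes an environment that provides the standard TextEncoder API (current browsers and Node.js; it is not part of the Gecko 1.8 code shown in this article), and utf8ByteLength is a name made up for this illustration.

```javascript
// Illustration only: count how many bytes UTF-8 needs for a string.
// TextEncoder is a standard web API that always encodes to UTF-8.
var encoder = new TextEncoder();

// Hypothetical helper name, not part of any Mozilla API.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

// One ASCII letter, one umlaut, one Hebrew letter, and the euro sign.
var samples = ["A", "\u00FC", "\u05D0", "\u20AC"];
samples.forEach(function (ch) {
  console.log(JSON.stringify(ch) + " takes " + utf8ByteLength(ch) + " byte(s) in UTF-8");
});
```

The ASCII letter needs one byte, the umlaut and the Hebrew letter two bytes each, and the euro sign three.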
Writing to a stream
In Gecko 1.8 (SeaMonkey 1.0, Firefox 1.5), you can use nsIConverterOutputStream:
var charset = "UTF-8"; // Can be any character encoding name that Mozilla supports
var os = Components.classes["@mozilla.org/intl/converter-output-stream;1"]
                   .createInstance(Components.interfaces.nsIConverterOutputStream);
// This assumes that fos is the nsIOutputStream you want to write to
os.init(fos, charset, 0, 0x0000);
os.writeString("Umlaute: \u00FC \u00E4\n");
os.writeString("Hebrew: \u05D0 \u05D1\n");
// etc.
os.close();
You can also write character arrays using the write function, although using writeString is simpler.
The example above passes 0 as the third argument, which disables buffering, so data is written to the underlying stream immediately. For better performance, pass a buffer size such as 4096 instead. (Note: the implementation of the converter stream might not support buffering.)
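As a minimal sketch of the buffered variant, the helper below wraps the init, writeString and close calls from the example above. The name writeStrings and its parameter list are made up for this illustration. It assumes that os is a freshly created nsIConverterOutputStream and fos is the nsIOutputStream to write to.

```javascript
// Hypothetical helper: writes every string in `strings` through a
// converter output stream, using the given buffer size.
//   os         - an uninitialized nsIConverterOutputStream
//   fos        - the underlying nsIOutputStream
//   bufferSize - bytes to buffer; 0 disables buffering
function writeStrings(os, fos, charset, strings, bufferSize) {
  // Last argument 0x0000: unsupported characters throw an exception.
  os.init(fos, charset, bufferSize, 0x0000);
  try {
    for (var i = 0; i < strings.length; i++) {
      os.writeString(strings[i]);
    }
  } finally {
    // Close the stream even if one of the writes threw.
    os.close();
  }
}
```

With the os and fos objects from the example above, a call might look like writeStrings(os, fos, "UTF-8", ["Umlaute: \u00FC \u00E4\n"], 4096).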
You can specify what should happen with characters that are not supported by the selected character encoding. The last argument to init specifies that: 0x0000 means that writing unsupported characters throws an exception (with an error code of NS_ERROR_LOSS_OF_SIGNIFICANT_DATA), and no data will be written.
To write a replacement character in such cases instead, pass that character's code in place of 0x0000:
os.init(fos, charset, 0, "?".charCodeAt(0));
You can, of course, replace the "?" in that example with any other character. You can also specify any Unicode character U+ABCD directly as 0xABCD.
Note: If the replacement character is not a supported character in the chosen character encoding, attempts to write unsupported characters will fail with NS_ERROR_LOSS_OF_SIGNIFICANT_DATA.
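The two ways of spelling the last argument can be folded into one small helper. This is only a sketch: replacementCode is a made-up name, and it assumes the argument is a single 16-bit code unit, as the 0xABCD notation above suggests.

```javascript
// Hypothetical helper: turns an optional replacement character into the
// numeric value passed as the last argument of init().
function replacementCode(ch) {
  if (ch === undefined || ch === null || ch === "") {
    // No replacement requested: unsupported characters will throw.
    return 0x0000;
  }
  if (ch.length !== 1) {
    // Assumption: only a single UTF-16 code unit can be passed.
    throw new Error("replacement must be a single character");
  }
  return ch.charCodeAt(0);
}
```

With this helper, os.init(fos, charset, 0, replacementCode("?")) is equivalent to the call shown above, and replacementCode() yields 0x0000. The helper cannot tell whether the chosen character is supported by the target encoding, so the note above still applies.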
Versions before Gecko 1.8
Firefox 1.0.x, Mozilla 1.7.x and older versions do not support nsIConverterOutputStream. Instead, you have to convert each string by hand using nsIScriptableUnicodeConverter and write the result to the stream.
Here's an example:
// First, get and initialize the converter
var converter = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"]
                          .createInstance(Components.interfaces.nsIScriptableUnicodeConverter);
converter.charset = /* The character encoding you want, using UTF-8 for this example */ "UTF-8";
Now you can convert and write to the stream:
// This code assumes that os is your nsIOutputStream
// your_string here is the string you want to write.
var chunk = converter.ConvertFromUnicode(your_string);
os.write(chunk, chunk.length);
// Repeat as needed for further strings
At the end, you need to call Finish and write the data it returns to the stream. Few character encodings need this, but for those that do, calling it is important for proper output.
var fin = converter.Finish();
if (fin.length > 0)
  os.write(fin, fin.length);
os.close();
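The convert, write and finish steps can be combined into one function. The following is a sketch rather than a definitive implementation: writeConverted is a made-up name, converter is assumed to be an nsIScriptableUnicodeConverter whose charset is already set, and os is assumed to be the nsIOutputStream to write to.

```javascript
// Hypothetical helper for Gecko versions before 1.8: converts each string
// by hand and writes the resulting bytes to the stream.
function writeConverted(converter, os, strings) {
  try {
    for (var i = 0; i < strings.length; i++) {
      var chunk = converter.ConvertFromUnicode(strings[i]);
      os.write(chunk, chunk.length);
    }
    // Finish() returns whatever bytes the encoding still needs at the end;
    // for most encodings this is an empty string.
    var fin = converter.Finish();
    if (fin.length > 0) {
      os.write(fin, fin.length);
    }
  } finally {
    // Close the stream even if conversion or writing threw.
    os.close();
  }
}
```

Calling Finish only once, after the last string, keeps any closing bytes at the very end of the output instead of between chunks.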
Converting a string into a stream
Sometimes it is useful to convert a string into a stream, for example in order to upload it.
The example here requires Gecko 1.8 (Firefox 1.5, SeaMonkey 1.0).
nsIScriptableUnicodeConverter has a simple method to do that:
// First, get and initialize the converter
var converter = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"]
                          .createInstance(Components.interfaces.nsIScriptableUnicodeConverter);
converter.charset = /* The charset you want to use. Using UTF-8 in this example */ "UTF-8";
// Now, convert a string to an nsIInputStream
var stream = converter.convertToInputStream("A string with non-ASCII characters: \u00FC \u05D0\n");
// stream can now be used as an nsIInputStream