Writing textual data

This article describes how to write textual data to streams, files and sockets in an internationalization-aware way.

When writing textual data to an output stream or to a file, you need to pick a character encoding.

Some character encodings (UTF-8, UTF-16, UTF-32) can represent "all" characters (the full repertoire of Unicode) while others can only represent a subset of the full repertoire.

When the file is to be read only by the application or extension itself, UTF-8 is often the best choice: it can represent all characters, and it represents ASCII characters efficiently (one byte each).
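
To make the size trade-off concrete, the following sketch counts the bytes that UTF-8 needs for a few characters. It assumes a modern JavaScript runtime that provides the standard TextEncoder API, which is not one of the Gecko 1.8 interfaces described in this article. The helper name utf8ByteLength is made up for illustration.

```javascript
// Illustration only: count the bytes that UTF-8 needs for a string.
// Assumes a runtime with the standard TextEncoder API.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

var asciiBytes = utf8ByteLength("A");        // ASCII letter
var umlautBytes = utf8ByteLength("\u00FC");  // u with diaeresis
var hebrewBytes = utf8ByteLength("\u05D0");  // Hebrew letter alef
var euroBytes = utf8ByteLength("\u20AC");    // euro sign
```

The ASCII letter takes one byte, the umlaut and the Hebrew letter take two bytes each, and the euro sign takes three.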

Writing to a stream

In Gecko 1.8 (SeaMonkey 1.0, Firefox 1.5) and later, you can use nsIConverterOutputStream:

var charset = "UTF-8"; // Can be any character encoding name that Mozilla supports

var os = Components.classes["@mozilla.org/intl/converter-output-stream;1"]
                   .createInstance(Components.interfaces.nsIConverterOutputStream);

// This assumes that fos is the nsIOutputStream you want to write to
os.init(fos, charset, 0, 0x0000);

os.writeString("Umlaute: \u00FC \u00E4\n");
os.writeString("Hebrew:  \u05D0 \u05D1\n");
// etc.

os.close();

You can also write character arrays using the write function, although using writeString is simpler from JavaScript code.

The example above passes 0 as the third argument, which disables buffering, so data is written to the underlying stream immediately. For better performance, pass a buffer size such as 4096 instead. (Note: the implementation of the converter stream might not support buffering.)
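
As a rough sketch of the buffered variant, the helper below initializes the converter stream with a 4096-byte buffer and closes it even if a write fails. The function name writeBuffered and its parameters are hypothetical. It assumes converterStream is a freshly created nsIConverterOutputStream and fos is the nsIOutputStream you want to write to.

```javascript
// Sketch only: writeBuffered is a hypothetical helper, not a Gecko API.
// converterStream: a newly created nsIConverterOutputStream
// fos: the underlying nsIOutputStream
function writeBuffered(converterStream, fos, charset, strings) {
  // 4096 is the buffer size; 0x0000 makes unsupported characters throw
  converterStream.init(fos, charset, 4096, 0x0000);
  try {
    for (var i = 0; i < strings.length; i++) {
      converterStream.writeString(strings[i]);
    }
  } finally {
    // Always close the stream, even if writeString threw
    converterStream.close();
  }
}
```

With a non-zero buffer size, data may not reach the underlying stream until the converter stream is closed, so do not rely on partial output being visible earlier.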

Unsupported characters

You can specify what should happen with characters that are not supported by the selected character encoding. The last argument to init specifies that: 0x0000 means that writing unsupported characters throws an exception (with an error code of NS_ERROR_LOSS_OF_SIGNIFICANT_DATA), and no data will be written.

To write a replacement character in such cases instead, pass that character's code in place of 0x0000:

os.init(fos, charset, 0, "?".charCodeAt(0));

You can, of course, replace the "?" in that example with any other character. You can also specify any Unicode character U+ABCD directly as 0xABCD.

Note: If the replacement character is not a supported character in the chosen character encoding, attempts to write unsupported characters will fail with NS_ERROR_LOSS_OF_SIGNIFICANT_DATA.
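
The rules above can be illustrated with a small stand-alone simulation. This is not Gecko's implementation: encodeLatin1 is a hypothetical function that encodes to ISO-8859-1 (which supports only U+0000 to U+00FF), and it signals failure with a plain Error whose message is the error name.

```javascript
// Illustration only: a hypothetical ISO-8859-1 encoder that mimics the
// replacement-character rules described above. Not Gecko's implementation.
function encodeLatin1(text, replacementChar) {
  var bytes = [];
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code > 0xFF) {
      // Unsupported character: fail when no replacement was requested
      // (0x0000) or when the replacement itself cannot be encoded.
      if (replacementChar === 0x0000 || replacementChar > 0xFF) {
        throw new Error("NS_ERROR_LOSS_OF_SIGNIFICANT_DATA");
      }
      code = replacementChar;
    }
    bytes.push(code);
  }
  // Bytes are returned only when the whole string could be encoded
  return bytes;
}
```

For example, encoding a string that contains a Hebrew letter succeeds when the replacement is "?".charCodeAt(0), but throws when the replacement is 0x0000 or is itself a Hebrew letter such as 0x05D1.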

Versions before Gecko 1.8

Firefox 1.0.x, Mozilla 1.7.x and older versions do not support nsIConverterOutputStream.

The alternatives that are usable from JavaScript do not support character encodings that use embedded nulls (such as UTF-16 and UTF-32). They work by manually converting the string you want to write to a byte sequence with nsIScriptableUnicodeConverter, and then writing that byte sequence to the stream.

Here's an example:

// First, get and initialize the converter
var converter = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"]
                          .createInstance(Components.interfaces.nsIScriptableUnicodeConverter);
converter.charset = /* The character encoding you want, using UTF-8 for this example */ "UTF-8";

Now you can convert and write to the stream:

// This code assumes that os is your nsIOutputStream
// your_string here is the string you want to write.
var chunk = converter.ConvertFromUnicode(your_string);
os.write(chunk, chunk.length);
// Repeat as needed for further strings

At the end, you need to call Finish and write the data it returns to the stream. Few character encodings need this step, but for those that do, calling it is important for proper output.

var fin = converter.Finish();
if (fin.length > 0)
  os.write(fin, fin.length);
os.close();
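
The three steps above can be combined into one helper, sketched below. The name writeStrings is hypothetical. The sketch assumes that converter is an nsIScriptableUnicodeConverter whose charset has already been set and that os is your nsIOutputStream.

```javascript
// Sketch only: writeStrings is a hypothetical helper, not a Gecko API.
// converter: an nsIScriptableUnicodeConverter with charset already set
// os: the nsIOutputStream to write to
function writeStrings(converter, os, strings) {
  try {
    for (var i = 0; i < strings.length; i++) {
      // Convert one string to bytes and write them out
      var chunk = converter.ConvertFromUnicode(strings[i]);
      os.write(chunk, chunk.length);
    }
    // Write any trailing bytes the encoding needs (often none)
    var fin = converter.Finish();
    if (fin.length > 0) {
      os.write(fin, fin.length);
    }
  } finally {
    // Close the stream even if a conversion or write failed
    os.close();
  }
}
```

Wrapping the writes in try/finally is a design choice of this sketch, not something the original steps require. It keeps the stream from being left open when a conversion throws.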

Converting a string into a stream

Sometimes it is useful to convert a string into a stream, for example to upload it using nsIUploadChannel.

The example here requires Gecko 1.8 (Firefox 1.5, SeaMonkey 1.0).

nsIScriptableUnicodeConverter has a simple method to do that:

// First, get and initialize the converter
var converter = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"]
                          .createInstance(Components.interfaces.nsIScriptableUnicodeConverter);
converter.charset = /* The charset you want to use. Using UTF-8 in this example */ "UTF-8";

// Now, convert a string to an nsIInputStream
var stream = converter.convertToInputStream("A string with non-ASCII characters: \u00FC \u05D0\n");
// stream can now be used as an nsIInputStream
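
As a sketch of the upload use case mentioned above, the helper below hands the converted stream to a channel. The name uploadText and its parameters are hypothetical. It assumes that converter already has its charset set, that uploadChannel has already been queried to nsIUploadChannel, and that passing -1 as the content length lets the channel determine the length from the stream.

```javascript
// Sketch only: uploadText is a hypothetical helper, not a Gecko API.
// converter: an nsIScriptableUnicodeConverter with charset already set
// uploadChannel: a channel already queried to nsIUploadChannel (assumption)
function uploadText(converter, uploadChannel, text, contentType) {
  // Convert the string to an nsIInputStream in the chosen encoding
  var stream = converter.convertToInputStream(text);
  // Assumption: -1 asks the channel to work out the length from the stream
  uploadChannel.setUploadStream(stream, contentType, -1);
  return stream;
}
```

The content type should name the same character encoding as converter.charset, for example "text/plain; charset=UTF-8", so that the receiver decodes the bytes correctly.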

See Also

Reading textual data