Reading from Files

Important note: The pages from the File and Stream Guide use the IO object (nsIScriptableIO), which was not available in any released version of the platform (pending some fixes). There are alternative XPCOM APIs you can use. Your help in updating these pages to use the supported API is very much welcome!

Other documentation on files and I/O not using the unavailable nsIScriptableIO APIs: Code snippets: File I/O, Open and Save Dialogs, Reading textual data, Writing textual data, List of file-related error codes.

Reading data from a file involves getting a reference to a file and then creating an input stream to read from it. An input stream provides a means of reading bytes, strings or other values from the file. While there is only one method to create an input stream, it provides a number of options to control exactly how reading is performed. For instance, there are options to read a file as if it were text or a binary file. Using the former will cause characters in the stream to be interpreted with a particular character encoding. In addition, lines may be read by reading the file up until a linefeed character is detected. A binary stream would be used to read bytes or numbers encoded within the file.

To create an input stream, first get a reference to an nsIFile, and then use nsIScriptableIO.newInputStream() to open a stream for reading from it. To learn more about file objects, see Files and Streams.

var file = IO.getFile("Home", "sample.txt");
var stream = IO.newInputStream(file, "text");

This example first retrieves a file object using nsIScriptableIO.getFile(). In this case, the file 'sample.txt' within the user's home directory is retrieved. Next, nsIScriptableIO.newInputStream() is called to create a new input stream for reading from the file. This method will open the file and return a stream. Naturally, if the file does not exist, an error will occur. You may wish to use nsIFile.exists() to check if the file exists before opening it.

In fact, it is a good idea to enclose file reading and writing operations within a try-catch block to capture any errors that might occur during the process. See File Errors for a list of errors that might occur.

The newInputStream method takes two arguments in this example. The first argument is the file to read from, and the second is a set of flags which control the reading. In this case, the 'text' flag is used, which indicates that you expect this file to be a text file. This method actually takes a number of additional arguments, but they are optional and can be omitted when they aren't needed. These extra arguments will be discussed later.

You can also read from a binary file by using the flag 'binary' instead of 'text'. The difference is that text streams convert the bytes being read into characters using a particular character encoding, whereas binary streams always read bytes. First, let's look at reading from text files.

Reading Text Files

Characters are interpreted from a text input stream using a specified character encoding. This means that if a character within the file occupies several bytes, it will be converted into a single character when read. This is done automatically as long as you specify the 'text' flag to nsIScriptableIO.newInputStream(). The default character encoding is UTF-8, which means that characters with values below 128 occupy a single byte, whereas characters with values of 128 or above occupy multiple bytes, depending on their value. However, a number of other character encodings are available; see Reading Other Character Encodings below for details about reading text in other encodings.
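
The byte widths described above can be checked without the IO object. The following is a minimal sketch that assumes a JavaScript environment providing the standard TextEncoder class. The helper name utf8ByteLength is made up for this illustration and is not part of nsIScriptableIO.

```javascript
// Illustration only: TextEncoder is a standard web API, not part of the IO object.
// utf8ByteLength is a hypothetical helper name.
function utf8ByteLength(text) {
  // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
  return new TextEncoder().encode(text).length;
}

var samples = ["A", "\u00e9", "\u20ac"]; // 'A', e with acute accent, euro sign
for (var i = 0; i < samples.length; i++) {
  console.log(samples[i] + " occupies " + utf8ByteLength(samples[i]) + " byte(s)");
}
```

Each sample is a single character, yet the accented letter and the euro sign need more than one byte when stored as UTF-8.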

There are several methods available to read from an input stream. The two most common for text files are the readString and readLine methods for reading a string and a line from the file respectively.

var file = IO.getFile("Home", "sample.txt");
var stream = IO.newInputStream(file, "text");
var str = stream.readString(20);
stream.close();

In this example, a text input stream is created for the file 'sample.txt'. The readString method is called to read a 20-character string from the file. Since the stream was just opened, the string will be read from the beginning of the file. A further read from the file will read additional characters after the first 20. The readString method takes one argument, the number of characters to read from the file. Note that this is distinct from the number of bytes to read. Depending on the character encoding, more than 20 bytes may be read from the file in order to retrieve 20 characters.

The readString method returns the string read from the stream. Usually this string has a length equal to the number of characters that were requested. However, it may be shorter, for instance if the end of the file was reached. This doesn't trigger an error; the string is simply returned containing as many characters as are available.

You can use the available method to check if data is available for reading:

var file = IO.getFile("Home", "sample.txt");
var stream = IO.newInputStream(file, "text");
var str = stream.readString(stream.available());
stream.close();

In this example, the available method is called to determine the number of bytes available for reading. Note that this method returns the number of bytes available, not the number of characters, because the file hasn't been read and parsed yet, so the actual number of characters isn't known. Fortunately, because a character is always at least one byte long in the default UTF-8 encoding, the number of characters will always be equal to or smaller than the number of bytes available. For normal file streams, the available method returns the total number of bytes left in the file to read. In effect, the above code ends up reading the entire contents of the file into a single string.

The available method returns 0 if there is no more data to read, so you could use a loop to read a number of strings. This example keeps reading 10-character strings from a file until there is no more data to read. Note that the last string read may be fewer than ten characters long.

var output = "";
while (stream.available())
  output += stream.readString(10);

The close method is used at the end to close the stream when you are finished reading from it. You should always strive to close a stream when you have finished reading from or writing to it, to ensure that it doesn't remain open longer than necessary.
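
The chunked loop and the advice about closing streams can be combined. The sketch below rests on several assumptions: because the IO object is not available, it uses a small in-memory stand-in (the hypothetical FakeTextStream) that mimics only the available, readString and close methods as described on this page. The helper name readAllInChunks is also invented for the example.

```javascript
// Hypothetical in-memory stand-in for a text input stream. It mimics only
// available(), readString() and close() as described in this guide.
function FakeTextStream(text) {
  this.text = text;
  this.pos = 0;
  this.closed = false;
}
FakeTextStream.prototype.available = function () {
  // For simplicity this counts remaining characters rather than bytes.
  return this.text.length - this.pos;
};
FakeTextStream.prototype.readString = function (count) {
  // Returns fewer characters than requested near the end of the data.
  var chunk = this.text.slice(this.pos, this.pos + count);
  this.pos += chunk.length;
  return chunk;
};
FakeTextStream.prototype.close = function () {
  this.closed = true;
};

// Reads the whole stream in chunks and closes it even if a read throws.
function readAllInChunks(stream, chunkSize) {
  var output = "";
  try {
    while (stream.available())
      output += stream.readString(chunkSize);
  } finally {
    stream.close();
  }
  return output;
}

var stream = new FakeTextStream("The quick brown fox jumps");
console.log(readAllInChunks(stream, 10));
console.log("closed: " + stream.closed);
```

Placing the close call in a finally block means the stream is closed whether the loop finishes normally or a read fails partway through.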

Reading Lines

The readLine method may be used to read a line from the file. This method doesn't take a count of characters to read but instead keeps reading until the end of a line is reached. It handles all the different end-of-line characters and combinations, so you do not need to worry about platform-specific conventions. The line read from the file is returned by the readLine method. It is important to note that the newline characters themselves are not included in the returned string.

var lines = [];
while(stream.available())
  lines.push(stream.readLine());

In this example, each line from a stream is read and added to an array. The result is an array containing each of the lines in the file.
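
To illustrate the line-ending behaviour described above without the IO object, the following sketch splits an in-memory string on the three common conventions. The function name splitLines is an assumption for this example, and the real readLine method may treat edge cases differently.

```javascript
// Hypothetical helper: splits text on \r\n, \r or \n and drops the line
// terminators, similar in spirit to calling readLine repeatedly.
function splitLines(text) {
  var lines = text.split(/\r\n|\r|\n/);
  // A trailing terminator would otherwise produce an extra empty entry.
  if (lines.length > 0 && lines[lines.length - 1] === "")
    lines.pop();
  return lines;
}

var lines = splitLines("first\r\nsecond\rthird\nfourth\n");
console.log(lines.length);
console.log(lines.join("|"));
```

The regular expression lists \r\n first so that a Windows-style line ending is treated as one terminator rather than two.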

Reading Other Character Encodings

The default character encoding is UTF-8. If you know that a file is stored using a different encoding, you can pass a third argument to nsIScriptableIO.newInputStream() that names the encoding. This example opens a file using the UTF-16 encoding.

var stream = IO.newInputStream(file, "text", "UTF-16");

This third argument is not needed if the file is stored in UTF-8. For a list of supported character encodings, see Supported Character Sets.

The files can be read in the same manner using the readString and readLine methods. The difference is handled internally so you don't need to write any other part of the code differently.
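
Why the encoding matters can be seen by decoding the same bytes in two ways. This sketch uses the standard TextDecoder class as a stand-in, since the IO object is unavailable. The byte values and variable names are chosen for illustration only.

```javascript
// Illustration only: TextDecoder is a standard web API, not part of the IO object.
// These four bytes are the string "Hi" stored as UTF-16, little endian.
var bytes = new Uint8Array([0x48, 0x00, 0x69, 0x00]);

var asUtf16 = new TextDecoder("utf-16le").decode(bytes);
var asUtf8 = new TextDecoder("utf-8").decode(bytes);

console.log(asUtf16 + " has " + asUtf16.length + " characters");
console.log("Decoded as UTF-8 the result has " + asUtf8.length + " characters");
```

Decoded with the right encoding the bytes give two characters. Decoded as UTF-8 they give four, because each zero byte becomes a separate null character.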

Reading Binary Data

In addition to text, binary values may be read from a file either as bytes, or interpreted as numbers. A number of methods are available which may be used for this. If you are expecting to read from a binary file instead of text, specify the 'binary' flag when creating the stream with nsIScriptableIO.newInputStream().

var stream = IO.newInputStream(file, "binary");

This line creates and opens a binary stream for a file. Once the stream has been opened, you can read from the file using various reading methods. The readString method may be used to read a certain number of bytes into a string. This example keeps reading 20-byte blocks from a file and appends them to a string. Note that binary streams do not interpret characters within the stream, so every character in the returned string will have a value below 256. Keep this in mind if you expect to use the data as text.

var output = "";
while (stream.available())
  output += stream.readString(20);

Although this can be a suitable means of reading binary data, usually you will want to retrieve the data with some additional methods that are more useful. The following methods are available:

  • readBoolean will read a single byte from a stream and return false if the byte is zero and true if the byte has a non-zero value.
  • read8 will read a single byte and return it. The 8 in the method name indicates that 8 bits of data are being read.
  • read16 will read two bytes from the stream and interpret them as an integer. This method will return the value as a number.
  • read32 will read four bytes and return this as a single 32-bit integer.
  • readFloat will read four bytes and interpret them as a floating point value.
  • readDouble will read eight bytes and interpret them as a double-precision floating point value.
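
The widths listed above can be illustrated with the standard DataView class, used here as a stand-in because the IO object is unavailable. The sample bytes and property names are assumptions made for this sketch, and the big endian byte order follows what this guide states for these methods.

```javascript
// Illustration only: DataView is a standard JavaScript API, not the IO object.
var bytes = new Uint8Array([
  0x01,                                           // boolean-like byte
  0xc8,                                           // one byte: 200
  0x12, 0x34,                                     // 16-bit integer
  0xca, 0xfe, 0xba, 0xbe,                         // 32-bit integer
  0x3f, 0xc0, 0x00, 0x00,                         // 32-bit float: 1.5
  0xc0, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00  // 64-bit double: -2.25
]);
var view = new DataView(bytes.buffer);

// Passing false as the last argument selects big endian byte order.
var values = {
  bool: view.getUint8(0) !== 0,
  byte: view.getUint8(1),
  int16: view.getUint16(2, false),
  int32: view.getUint32(4, false),
  float: view.getFloat32(8, false),
  double: view.getFloat64(12, false)
};
console.log(JSON.stringify(values));
```

Each read starts where the previous one ended, which is what a stream does automatically as its read position advances.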

In this next example, the read32 method is used to read a 32-bit length from the file. Then, this length is used to determine how many additional bytes of data to read.

var length = stream.read32();
var data = stream.readString(length);
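
The same length-prefixed layout can be parsed from an in-memory byte array. This is a sketch under stated assumptions: readLengthPrefixed is a hypothetical helper, the data is assumed to hold one byte per character, and the length is assumed to be a 32-bit big endian value.

```javascript
// Hypothetical helper: parses a record made of a 32-bit big endian length
// followed by that many bytes of single-byte character data.
function readLengthPrefixed(bytes) {
  var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  var length = view.getUint32(0, false); // false selects big endian
  if (4 + length > bytes.byteLength)
    throw new RangeError("Record is shorter than its length prefix claims");
  var data = "";
  for (var i = 0; i < length; i++)
    data += String.fromCharCode(bytes[4 + i]);
  return data;
}

// Length 5, followed by the bytes of "hello".
var record = new Uint8Array([0x00, 0x00, 0x00, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f]);
console.log(readLengthPrefixed(record));
```

Checking the declared length against the data actually present guards against truncated or corrupt files.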

All values are read in big endian form, which means that integers are stored in the file with their most significant byte first. For both the read16 and read32 methods, the values are always interpreted as unsigned values. If you want the sign bit to be interpreted so that negative values can be read, you can use a simple calculation to convert the value:

var val = stream.read16();
// Values above 0x7fff have the sign bit set; subtract 2^16 to get the negative value.
if (val > 0x7fff)
  val -= 0x10000;
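
The conversion above can be wrapped in small helpers and cross-checked against DataView, which can read signed values directly. The names toSigned16 and toSigned32 are hypothetical, and the 32-bit variant is an extension of the same idea rather than something taken from the IO object.

```javascript
// Hypothetical helpers that reinterpret unsigned values as signed ones.
function toSigned16(val) {
  return val > 0x7fff ? val - 0x10000 : val;
}
function toSigned32(val) {
  return val > 0x7fffffff ? val - 0x100000000 : val;
}

// Cross-check: store 0xfffe, read it unsigned, convert, and compare with
// the signed value that DataView reports for the same two bytes.
var view = new DataView(new ArrayBuffer(2));
view.setUint16(0, 0xfffe, false);
var converted = toSigned16(view.getUint16(0, false));
console.log(converted === view.getInt16(0, false));
```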

Sometimes a file will contain bytes that have the value zero. You can use the read8 method to read these. The read and readString methods may not be suitable for such data because of the zero values. An additional method that is useful here is readByteArray, which reads a number of bytes into an array. Unlike the other reading methods that return a string, readByteArray returns an array where each element is a byte of data from the file.

var arr = stream.readByteArray(8);
alert("The second byte read was: " + arr[1]);

This example uses the readByteArray method to read 8 bytes from the stream. Unlike the read and readString methods, the readByteArray method will fail if there isn't enough data to read. This is also the case for the other number-reading methods.
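
The two properties described above, zero bytes surviving intact and a failure on short data, can be mimicked on an in-memory array. The function readByteArrayFrom is a hypothetical stand-in written for this sketch. The real method's error type and exact behaviour are not specified here.

```javascript
// Hypothetical stand-in that mimics the described behaviour of readByteArray
// on an in-memory Uint8Array: it returns a plain array and fails on short data.
function readByteArrayFrom(bytes, offset, count) {
  if (offset + count > bytes.length)
    throw new RangeError("Not enough data: wanted " + count + " bytes");
  return Array.prototype.slice.call(bytes, offset, offset + count);
}

var source = new Uint8Array([0x10, 0x00, 0x20, 0x00, 0x30, 0x00, 0x40, 0x00]);
var arr = readByteArrayFrom(source, 0, 8);
console.log("The second byte read was: " + arr[1]);
```

Requesting nine bytes from this eight-byte source would throw instead of returning a shorter array.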