Message Summary Database - Archive of obsolete content

Mail Summary Files (.msf) store summary information about the messages and threads in a folder, along with some meta information about the folder itself.

nsIMsgDatabase

The main access point to the summary information is nsIMsgDatabase. nsIMsgFolder has a method to get the database for a folder.

nsIMsgDatabase is an abstraction on top of MDB, a set of database interfaces that are implemented in Mork. MDB is a schema-less database interface, so it's trivial to add new attributes without regenerating the database. It's also trivial for older code to read newer databases, because that code can ignore, yet preserve, the attributes it doesn't know about. If we were to replace Mork, we could do it at the MDB level (unlikely, because implementing the MDB interface on top of a different database would be very hard), at the nsIMsgDatabase level (probably the easiest), or by inventing a whole new database interface and changing all the code that uses nsIMsgDatabase.
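To make the "ignore but maintain" point concrete, here is a minimal Python sketch that models a schema-less row as a plain name-to-string map. This is not the MDB or Mork API; KNOWN_COLUMNS, visible_columns, set_subject and the "junkscore" column are hypothetical names chosen for illustration. The older code only interprets the columns it knows, but because it copies the whole row before editing, columns added by newer code survive its writes.

```python
from typing import Dict

# Columns this (hypothetical) older version of the code understands.
KNOWN_COLUMNS = {"subject", "sender", "date"}


def visible_columns(row: Dict[str, str]) -> Dict[str, str]:
    """Return only the columns older code can interpret; the rest are ignored."""
    return {name: value for name, value in row.items() if name in KNOWN_COLUMNS}


def set_subject(row: Dict[str, str], new_subject: str) -> Dict[str, str]:
    """Edit a known column while carrying every other column through unchanged."""
    updated = dict(row)  # copy all columns, including ones we don't understand
    updated["subject"] = new_subject
    return updated


# Newer code added "junkscore" without any schema change or migration step.
row = {"subject": "Hello", "sender": "a@example.com", "date": "1", "junkscore": "100"}

print(visible_columns(row))  # old code never sees "junkscore"
saved = set_subject(row, "Re: Hello")
print(saved["junkscore"])  # the unknown column survives the older code's write
```

The design choice being illustrated is that preservation costs nothing extra: as long as writers copy rows rather than rebuilding them from a fixed list of fields, forward compatibility falls out for free.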

Message Headers

The message header object implements the nsIMsgDBHdr interface. This includes a set of per-message flags, the more commonly used headers (e.g., subject, sender, from, to, cc, date), and a few other attributes, such as keywords. There is also a set of generic property methods, so that core code and extensions can set attributes on message headers without changing nsIMsgHdr.idl.
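The split between dedicated fields, a flags bitmask, and open-ended generic properties can be sketched as a small Python class. Everything here is an assumption for illustration: MsgHdr, or_flags, and_flags, set_string_property, get_string_property, the flag values and the "x-myext-score" property are hypothetical and are not the XPCOM signatures of nsIMsgDBHdr.

```python
from typing import Dict, Optional

# Illustrative flag bits for this sketch; not taken from the real headers.
FLAG_READ = 0x1
FLAG_MARKED = 0x4


class MsgHdr:
    """Toy model of a message header row (hypothetical; not nsIMsgDBHdr)."""

    def __init__(self, subject: str, author: str) -> None:
        # Commonly used headers get dedicated fields.
        self.subject = subject
        self.author = author
        # Per-message flags packed into one integer bitmask.
        self.flags = 0
        # Generic properties: any name, no interface change needed.
        self._props: Dict[str, str] = {}

    def or_flags(self, bits: int) -> int:
        """Set the given flag bits and return the new flags."""
        self.flags |= bits
        return self.flags

    def and_flags(self, mask: int) -> int:
        """Keep only the flag bits present in mask and return the new flags."""
        self.flags &= mask
        return self.flags

    def set_string_property(self, name: str, value: str) -> None:
        self._props[name] = value

    def get_string_property(self, name: str) -> Optional[str]:
        """Return the property value, or None if it was never set."""
        return self._props.get(name)


hdr = MsgHdr("Status report", "alice@example.com")
hdr.or_flags(FLAG_READ | FLAG_MARKED)
hdr.and_flags(~FLAG_MARKED)  # clear just the "marked" bit
# An extension can attach its own attribute without touching the interface.
hdr.set_string_property("x-myext-score", "42")
```

Because the generic properties are just named strings, adding one is a data change rather than an interface change, which is the property the paragraph above relies on.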

Msg Threads

We store thread information persistently in the database and expose these objects through the nsIMsgThread interface, so the database knows which messages are in a thread, which message a given message is a reply to, and so on. This allows us to store watch/ignore information on a thread object, and avoids having to generate threading information whenever a folder is opened. This has arguably been more trouble than it's been worth, especially when we've threaded incorrectly.
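The idea of threading once, at insertion time, and keeping the result can be sketched in Python as follows. This is a simplified model under stated assumptions: MsgThread, ThreadStore and add_message are hypothetical names (not nsIMsgThread), and the sketch links a message to its parent using only a single In-Reply-To value, which is a deliberate simplification rather than a description of Thunderbird's actual threading rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MsgThread:
    """Toy persistent thread record (hypothetical; not nsIMsgThread)."""

    root_key: int
    parent_of: Dict[int, Optional[int]] = field(default_factory=dict)
    children: Dict[int, List[int]] = field(default_factory=dict)
    # Thread-level state lives on the thread object itself.
    watched: bool = False
    ignored: bool = False

    def __post_init__(self) -> None:
        self.parent_of[self.root_key] = None  # the root has no parent

    def add_reply(self, key: int, parent_key: int) -> None:
        if parent_key not in self.parent_of:
            raise KeyError(f"message {parent_key} is not in this thread")
        self.parent_of[key] = parent_key
        self.children.setdefault(parent_key, []).append(key)

    @property
    def num_messages(self) -> int:
        return len(self.parent_of)


class ThreadStore:
    """Threads each message once, when it is added, and keeps the result."""

    def __init__(self) -> None:
        self.threads: List[MsgThread] = []
        self._by_message_id: Dict[str, Tuple[MsgThread, int]] = {}

    def add_message(self, key: int, message_id: str,
                    in_reply_to: Optional[str] = None) -> MsgThread:
        if in_reply_to is not None and in_reply_to in self._by_message_id:
            thread, parent_key = self._by_message_id[in_reply_to]
            thread.add_reply(key, parent_key)
        else:
            # Unknown or missing parent: start a new thread rooted here.
            thread = MsgThread(root_key=key)
            self.threads.append(thread)
        self._by_message_id[message_id] = (thread, key)
        return thread


store = ThreadStore()
first = store.add_message(1, "<a@example.com>")
store.add_message(2, "<b@example.com>", in_reply_to="<a@example.com>")
store.add_message(3, "<c@example.com>", in_reply_to="<b@example.com>")
unrelated = store.add_message(4, "<d@example.com>")
first.watched = True  # stored with the thread, not recomputed on open
```

In this sketch, opening a folder only needs to read the stored threads back. The flip side is that a wrong placement made at insertion time is kept too, until something explicitly rethreads the folder.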

Meta Information

The nsIDBFolderInfo interface handles the folder metadata, and it also supports generic properties. nsIMsgDatabase has a method to get the dbFolderInfo.

Mork - the Good, the Bad, and the Ugly

Mork has a horrible reputation, and we're always asked why we don't just replace it with some other database. The bottom line is that we haven't been able to justify the very large cost of doing so, because there won't be any benefit to the end user, at least in the short term. Here are some of the trade-offs:

The Mork file format is ASCII text, but practically unreadable. If it were binary, there would be fewer complaints, and the files would be smaller.

The MDB interfaces are overly complex and unfamiliar. As near as I can tell, the Mork code uses a lot of terminology of its own invention (though I suspect most database code is like that). It's difficult to fix bugs in Mork, but, on the other hand, there are very few bugs in the Mork code.

Mork loads the whole database in memory, and keeps it there. This makes it very fast to access our database objects, but it does increase memory usage.

Mork assumes the caller will do file locking, so two processes or threads writing to the same database can corrupt it. But I believe SQLite has the same problem.
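When the storage layer does no locking of its own, every writer has to serialize access itself. The Python sketch below shows that caller-side discipline using POSIX advisory locking (fcntl.flock, Unix only). The sidecar "<db>.lock" file, exclusive_lock and write_rows are hypothetical names for illustration; this is not how Thunderbird coordinates access to .msf files, and the plain-text append merely stands in for a database commit.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def exclusive_lock(db_path: str) -> Iterator[None]:
    """Hold an advisory exclusive lock on a sidecar "<db>.lock" file (Unix only).

    flock() only protects against writers that also take the lock; it does
    nothing to stop code that writes the file without asking first.
    """
    fd = os.open(db_path + ".lock", os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another holder has it
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def write_rows(db_path: str, rows: List[str]) -> None:
    """Stand-in for a database commit: the caller takes the lock, not the store."""
    with exclusive_lock(db_path):
        with open(db_path, "a", encoding="utf-8") as f:
            for row in rows:
                f.write(row + "\n")


demo_db = os.path.join(tempfile.mkdtemp(), "Inbox.msf")
write_rows(demo_db, ["row 1"])
write_rows(demo_db, ["row 2", "row 3"])
```

Since each call opens its own descriptor, the lock also serializes threads within one process on systems where flock locks belong to the open file description, as on Linux. The weakness is the same one described above: any writer that skips the lock can still corrupt the file.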

Mork is schema-less. This means we can add properties to rows arbitrarily. And we only pay the storage cost for a property if the row has the property set. We can still index on these properties, if we want. We would need the equivalent capabilities if we replaced Mork.
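The two capabilities named here, paying storage only for properties a row actually has and optionally indexing any property, can be sketched together in Python. SparseTable and its methods are hypothetical names for a toy in-memory model; this is not the MDB interface, and it says nothing about how Mork lays data out on disk.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set


class SparseTable:
    """Schema-less rows that store only the properties actually set (toy model)."""

    def __init__(self) -> None:
        self.rows: Dict[int, Dict[str, str]] = {}
        # Optional per-property indexes: property -> value -> row ids.
        self._indexes: Dict[str, DefaultDict[str, Set[int]]] = {}

    def set_property(self, row_id: int, name: str, value: str) -> None:
        row = self.rows.setdefault(row_id, {})
        old = row.get(name)
        row[name] = value
        index = self._indexes.get(name)
        if index is not None:  # keep an existing index up to date
            if old is not None:
                index[old].discard(row_id)
            index[value].add(row_id)

    def create_index(self, name: str) -> None:
        """Index rows that have the property; rows without it cost nothing."""
        index: DefaultDict[str, Set[int]] = defaultdict(set)
        for row_id, row in self.rows.items():
            if name in row:
                index[row[name]].add(row_id)
        self._indexes[name] = index

    def lookup(self, name: str, value: str) -> Set[int]:
        index = self._indexes.get(name)
        if index is not None:
            return set(index.get(value, set()))
        # No index: fall back to scanning every row.
        return {rid for rid, row in self.rows.items() if row.get(name) == value}

    def stored_cells(self) -> int:
        """Number of (row, property) pairs actually stored."""
        return sum(len(row) for row in self.rows.values())


table = SparseTable()
table.set_property(1, "subject", "Hi")
table.set_property(2, "subject", "Lunch?")
table.set_property(2, "x-priority", "high")  # only row 2 pays for this column
table.create_index("x-priority")
table.set_property(3, "x-priority", "high")  # maintained by the existing index
```

Any replacement store would need some equivalent of these two behaviors, since the rest of the code assumes it can add a property to a single row without paying for it everywhere.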