The Implementation of the Application Object Model

Warning: The content of this article may be out of date. As of 2007, we're trying to rely less on RDF, not more. This page might be interesting from historic perspective.

Purpose - The purpose of this document is two-fold. The first section of the document describes the motivation and reasoning behind using RDF as the foundation of XUL. This section makes a technical argument both for having XUL in the first place and for using RDF as the underlying implementation of XUL's content model. The second section describes the XUL/RDF architecture itself and outlines enhancements to the XUL language in order to allow the markup language to reference local data and to indicate how and when it would like to be annotatable with local data.

The Case for the XUL/RDF Approach

What is XUL?

XUL (pronounced zool) used to be a ravening demon intent upon taking over the Earth in Ghost Busters. In recent years, XUL dropped the Z from its name and replaced it with an X, and upon doing so, transformed into something far worse than a demon. It became a markup language.

XUL stands for "extensible user interface language". It is an XML-based language for describing the contents of windows and dialogs. XUL has language constructs for all of the typical dialog controls, as well as for widgets like toolbars, trees, progress bars, and menus. Where HTML describes the contents of a single document, XUL describes the contents of an entire window (which could itself contain multiple HTML documents).

The HTML content tree structure for a single document is represented as a set of objects that can be accessed and manipulated. This is referred to as the DOM (document object model). In a similar fashion, XUL's content tree for a single window is represented as a set of objects that can be accessed and manipulated. This is referred to as the AOM (application object model).

How are XML DOM/AOM trees represented in NGLayout?

NGLayout has its own objects for holding content trees. nsXMLELement is used to represent a single node in a content tree for an XML language. In the absence of any modifications to XUL, etc., this is what would be used by default to hold the tree for a remote XUL file. This nsXMLElement implements a whole slew of interfaces, since Raptor has multiple APIs for referring to these objects. Examples of these interfaces include nsIDOMElement, nsIDOMNode, nsIXMLContent, and nsIContent.

What is RDF?

RDF (pronounced R-D-F) has never been a ravening demon intent upon taking over the Earth, although that does depend on who you ask. RDF stands for resource description framework, a rather intimidating name that doesn't do much to help the layman understand exactly what it does.

RDF provides a very general mechanism for representing relationships between different disparate types of data. The relationships between data are represented as a directed labeled graph structure. The data itself can be fed in from any number of different sources (e.g., from a local file system, from your bookmarks file, or from some remote downloadable RDF file), and can then be combined into a single graph.

Ok, so after that paragraph many of you probably still don't understand what RDF does, so here's the short and sweet version. RDF can suck up data from different places (like your bookmarks and history or another web site), and it can combine them. This gives you a feature called aggregation, the ability to put completely different kinds of data into the same place. For example, the traditional bookmarks tree view could contain anything from mail messages to local files to maps of other sites.

This same ability to aggregate data leads to another capability of RDF. Because RDF can suck up data from a remote location (e.g., from a downloadable RDF file), and also suck up data from a local source, it can take the remote data and combine it with local data. You'll often hear people refer to this as RDF's local/remote merging capability. This capability is desirable for two reasons.

What is local/remote merging good for?

The first reason to have local/remote merging is that a remote file must be able to reference local data and have it merged in with the information that it specified. An example of this feature is a downloadable file that describes the user's main bookmarks folder. Because the file lives remotely, it can be updated whenever the owner of the Web site sees fit. A site admin could add new bookmarks to the file, and then the user would pick them up. However, in order for this to work, the local bookmarks have to be merged in with the remote bookmarks specified by the site.

The second reason to have local/remote merging is to save state and/or to record customizations that the user has made to his or her user interface. Imagine a remote file that describes the user's personal toolbar. The downloaded file contains an AOL Instant Messenger button, and the user really doesn't want that on the toolbar. Now since the AOL bookmark was specified by the remote file, the user has no way of going off to the site and changing that file just to get rid of the button (even though the user might wish that he or she could!).

What the user needs to be able to do is make a local annotation to the remote file, e.g., to save somewhere on the local hard drive the fact that the AOL Instant Messenger button shouldn't be on the toolbar. When the remote file is subsequently downloaded, these local annotations are sucked in and superimposed on top of the structure described by the file. The resulting structure is what the user actually ends up viewing.

Forget XUL! Let's just use serialized RDF!

Well, that's just plain cool! I say forget XUL! Let's just use serialized RDF to represent the UI!

That would be an unwise decision for a variety of reasons.

  1. RDF is a graph structure, and windows and dialogs are actually content trees. Because RDF can express arbitrary relationships between nodes, the serialized form of RDF in XML has to contain extra syntax. In other words, the traditional relationship of "containment" is not the only kind of relationship expressible in RDF. In HTML or XUL, when you put one tag inside another tag, a relationship of containment can safely be assumed. Not so with RDF. For RDF, an extra tag has to be inserted between two other tags to indicate exactly what kind of relationship should exist between the two nodes. The resultant file, although still relatively easy to manipulate, is bloated needlessly with extra syntax that is not strictly necessary.
  2. RDF is not a markup language. RDF can be serialized as an XML file, but it isn't a parallel to HTML. It's something different. It makes more sense to describe a content tree structure according to the accepted standards (that is accessible and manipulable via standard DOM APIs) using a real markup language, either an extension of HTML or an XML language like XUL.

Forget XUL! Let's just extend HTML!

Still I say forget XUL! Let's just extend HTML and use it to represent the UI!

That would also be an unwise decision for a variety of reasons.

  1. HTML is designed to handle documents and not to handle applications. It makes sense to maintain a clean separation between the Web APPLICATION and the Web DOCUMENTS that the application contains. By keeping the DOM tree in a sandbox (safely insulated from the containing AOM tree in which it might occur), you have an easy means of distinguishing the two trees and preventing scripts in the DOM tree from manipulating the AOM tree. In other words, it makes sense to have two parallel languages, one for describing applications, and the other for describing documents that can be used by those applications. It does not make sense to confuse or obscure the purpose of HTML by adding in a whole slew of new functionality that hasnothing to do with the display of Web pages.
  2. What content developers want from the next release of Mozilla is a standards-compliant browser. If we ship a browser that does not have 100% support for CSS2, for example, but we've extended HTML by adding 20-30 new tags, people are going to put down their blinders and see only the fact that we were off adding a whole slew of new stuff to HTML when we could have been firming up our standards story. By making XUL a separate beast, this is less of an issue. Since an AOM tree cannot be placed inside a DOM tree, XUL applications don't ever occur inside Web pages. By enforcing this distinction, we can successfully defend ourselves from any claims that we were "mucking about with strange new HTML" when we could have been working on our standards story.

Forget XUL and RDF?

We don't need any of this complicated nonsense! Just let us go off and make our widgets and build our applications the old-fashioned way! We're still cross-platform, since we'd only have to write widgets like trees and toolbars once! I say, forget this whole thing! Let's just ship a product!

That would be the unwisest decision of all. Why?

  1. Code bloat. If NGLayout's formatting objects aren't used to render the widgetry, then drawing code has to be written for each and every widget. Look at the difference between the two tree widgets that exist in NGLayout today as an example. The tree widget that is extending the table code has no drawing code, and even by the time it has matched and far outstripped the capabilities of the old gfx-based tree widget, it will still CONTAIN NO DRAWING CODE. Individual widgets, even complex widgets like the trees and toolbars need little to no drawing code (above and beyond what NGLayout already has) in order to work.
  2. Native widgets vs. NGLayout's form element widgets. This is really another issue of code bloat. It is a given that NGLayout will ship with its own non-native form element widgets. This has to be done in order to get features like transparency into those widgets. If XUL is dropped from the picture, then dialogs would have to be described using native widgetry (unless you're willing to design an HTML dialog system, which still wouldn't cut it without extending HTML), and Communicator would have to ship with both widget sets. The native widgets would also be an issue for something like a gfx-toolbar, which would need to contain the native widgets and still look consistent on all platforms. The native widgets also wouldn't be fully CSS responsive, which would mean they might look strange when placed in a dialog or on a toolbar with a rendered background. Of course you could always say that toolbars and dialogs just wouldn't be able to have backgrounds with different colors, but that leads into the next point.
  3. A staggering loss of functionality. An incredible array of features would be lost. Yes, you'd probably ship a browser slightly faster than with the XUL-based approach, but the features that would be lost are too compelling to ignore. These include: configurable UI, the ability to place HTML inside widgets like the trees and toolbars, scriptability of those widgets, scriptability of the UI, local/remote merging, aggregation of data... the list goes on and on.
  4. Future time-saving. XP development takes much less time than native platform development. But if we pay the cost and get XUL into the system now, in the future XUL development will take even less time than XP development. The additional cost up front to implement XUL is worth it for the amount of time it will save in future versions of the browser.
  5. Timing. It's now or never. If this approach isn't taken now, then we'll fall back into our old pattern. This is the pivotal point, because the apps are all being re-written from scratch. For this particular codebase, if we don't do this from start, we won't do it at all.

Why use RDF at all?

Ok, ok. So you've convinced me that XUL is the way and the light. But why do you need to use RDF at all? nsIContent, nsIDOMElement, and all of those interfaces mentioned at the beginning of this document are just that: interfaces! Couldn't you just make mail content and bookmarks content etc. and have them implement those interfaces?

The approach that I took in the past when faced with this question was to say, "RDF exists now. Much of it is written. To implement some sort of pluggable system that could do local/remote merging and mimic the functionality of RDF would require a month or two of engineering time that we can't afford to spend."

That answer was the incorrect counter to the question. The answer itself implied a concession that some newly-architected system that connected directly into the DOM APIs would be preferable to RDF if only there were time to engineer it. That is simply not the case. To discover why, let's explore what this pluggable content architecture would have to look like in order to match the feature set and functionality we need.

First off, let's consider how we'd figure out what kind of node to instantiate, e.g., a bookmarks folder node vs. a mail folder node. In the XUL, you could use a syntax like

<toolbar localData="mailbox:blah"/>

which would specify a URI that pointed to a specific mailbox node.

Our architecture must know how to examine this localData attribute to determine not only which kind of pluggable content needs to be instantiated, but that also has to determine which specific NODE should be instantiated. In order to accomplish this, our architecture has to have some sort of facility whereby different types of content, e.g., mail and bookmarks, can register themselves as the appropriate content to be instantiated for a given URI. We could do this using a registry that can map strings to CIDs, so we have a story for instantiating our different content nodes.

But now let's consider one of these nodes, e.g., nsMailElement. Let's look at the set of interfaces that nsMailElement needs to implement in order to exist in a XUL tree as fully scriptable content. nsIContent, nsIDOMNode, nsIDOMElement, and nsIXMLElement must be implemented at a minimum. Four interfaces with over 100 methods combined, a significant portion of which are redundant. Every new kind of content node would have to implement all of these interfaces. Several of the implemented functions would even have identical implementations, i.e. duplicate code that would be doing the same thing in the function bodies.

Here's another problem. The various interfaces, nsIContent et.al., are not yet solidified. They are likely to change following the first release of NGLayout, and when they do, anyone that implemented one of those interfaces will have to change as well in order to upgrade to the new world.

Statement #1

The two points raised in the previous two paragraphs, namely (1) redundancy of methods in the interfaces as well as a likely code redundancy in the implementation of some of those methods, and (2) the desire to be insulated from the layout DLL should those interfaces change, imply that a layer needs to exist between the pluggable content and the content tree interfaces.

This layer would serve several useful functions. First of all it could streamline the redundancy in the interface methods and present a new interface for pluggable content that was much smaller and easier to plug into than the 4-5 interfaces required if directly implementing the content tree interfaces. Furthermore, should the content tree interfaces change, only this layer would need to be updated. The pluggable content, safe behind this layer, wouldn't have to change at all.

One natural solution to try for implementing such a layer might be inheritance. However, in this XPCOM world, where each type of pluggable content is off in its own DLL, there's no clean way to inherit functionality from some base class implementation, when that implementation must necessarily reside in a different DLL, without introducing a code dependency between all pluggable content and the common base class.

This inability to provide a cleanly inherited system argues for a different approach, namely that all content node implementations be the same kind of object, and that those objects communicate with their pluggable content through this new streamlined interface we talked about earlier. We'll call this new interface a pluggable data source.

Just in case you still aren't convinced that all content nodes should be the same kind of element, consider another problem: how to implement aggregation. Suppose that a bookmarks folder contains a mailbox folder, a composer page template, and a bookmark. In order to achieve aggregation of data, a content node implementation cannot make any assumptions about what kind of children it holds. It can only refer to its child nodes through the various content tree interfaces. What we run into now is the problem of how one local content node knows how to instantiate other kinds of content nodes.

The only way that one content node would know how to instantiate a content node of a completely different type is if it had additional information stored for every child content node that it contained. It would have to consult the registry of data sources in order to instantiate each of its children.

If all content nodes are of the same type this problem can be solved in a cleaner fashion. A single content node could be initialized with its URI by its parent node, it could store its URI in a member variable, and it could use that as a basis for resolving the pluggable data source from which it would obtain its information.

Statement #2

All content nodes that reside in the AOM and that implement the content tree interfaces must be the same class of object.

So now we're on the right track, but there are still some flaws in our architecture. Let's consider another required feature that has heretofore gone unmentioned in this document: the need to take the same set of data and present it as completely different content models. The perfect example of this requirement is the Personal Toolbar. The Personal Toolbar must show up in a tree widget (in which case it has to be faking a tree content model, complete with <tt><treeitem></tt> and <tt><treecell></tt> nodes), or it must be able to show up on a toolbar (complete with <tt><button></tt> nodes and popup trees attached to folder buttons).

In our current architecture, we have a set of content nodes obtained from any number of data sources. We've solved our aggregation of data problem, but we have no efficient way of taking the data and representing it in different ways, since we'd have to go back to the data sources to re-aggregate everything into a new tree.

"Aha!" some of you might be saying. "Couldn't you just perform a tree transformation on whatever representation you have in memory?" The answer is "Yes, provided there is one single common intermediate representation of the collected and aggregated data to use as the basis for the translation."

"Why?" you ask. Well, let's take this problem to the natural extreme, and assume that there are n total possible representations for the same group of data. Then without some common internal representation of the data, it would be necessary to implement n*n total translators in order to guarantee that for whatever content model you happen to have built that the transformation could be applied. If there is a common intermediate representation of the data in question, then we need only implement n translators, one for each content model representation that can hold our aggregate data.

Yet another example of this problem arises from the need to perform fast sorts on a potentially large number of content items. Using the DOM APIs to walk the content nodes and reorder them would be an act of madness. Futhermore, the original natural order of the items (e.g., in the case of bookmarks) would be lost. When performing the sort, it would be advantageous to be able to form the new sorted content model without losing or disrupting the natural sort.

Statement #3

The fact that the same hunk of aggregate data can be represented as any number of different content models (e.g., sorted, or as a toolbar, a tree view, or a menu) implies a need for a common intermediate representation for aggregated pluggable content that exists on top of the pluggable data sources and that exists underneath the content tree nodes that implement the interfaces through which the data is actually exposed.

Let's go back to the sorting problem and consider a hypothetical situation. Suppose we decide we want to cache the sort relationship on our data, so that we don't have to continually resort it as the user hits the column headers in the tree. What we really need in this situation is the ability to take our intermediate representation and form an entirely different set of connections between our data objects. We need the ability to use arcs of a different type to connect our nodes, e.g., rather than chaining the nodes using a "natural order" relationship, it would be advantageous and desirable to be able to add an additional relationship to the nodes, e.g., a "sorted ascending on name" relationship. If we have something like this, then we can perform sorts without tearing down our intermediate representation AND without even losing our original information.

Another problem that arises in the tree view is the need to reorder columns. If a conventional tree structure is used as the intermediate representation of our data, then the column reordering could result in a potentially enormous and time-consuming walk through the tree in order to reorder the children of each item. But suppose that instead we could store additional information about the tree's columns, namely in which order they occur, then the act of persistently saving a column reordering would take far less time (O(1) to swap two columns, as opposed to a worst case O(n) where n is the number of cells in the tree).

Some of you might be saying, "Wait a minute. For sorting and column reordering, you have to rebuild the whole content model anyway! Why do you even need to change the intermediate representation of the data?" The answer is simple: persistence. Changes such as sorting and column ordering must be remembered across application sessions, and that means that these changes to the intermediate representation of data must be saved.

Statement #4

The intermediate representation of our data must be more flexible than a tree. It needs the ability to chain its nodes using a variety of distinct relationships in order to efficiently implement actions that permanently modify the data itself.

Last but not least, let's tackle local annotations and local/remote merging. Our architecture must be capable of applying local annotations to remote data, e.g., remembering that a button was removed from a toolbar or remembering the order of the columns in the bookmarks tree view.

This implies that changes that are made to our aggregated intermediate representation of data must be persistent. We must have the ability to add and remove nodes from the tree by saving the changes into the equivalent of a data source that can house the permanent changes (so that they can be sucked in and aggregated like everything else). This implies a need for the ability to make "negative" and "positive" assertions about connections in our tree, i.e., so that we can delete arcs and/or add arcs to the tree.

Statement #5

When a change is made to aggregated data that falls outside of the domain of an existing data source, it must be possible for that change to be persistently remembered by recording the change into a new data source that can then be read in when the data is re-aggregated in future sessions of the application.

The architecture that I have just described, the very architecture that I claim it is most desirable to have in order to implement our required feature set, is a combination of XUL and RDF.

The XUL/RDF Architecture

If you've read this far, you should now have a general idea of what the architecture is like, as well as the motivations for choosing such an architecture. Now it's time to fill in some of the details by mentioning XUL and RDF specifically. A picture (provided by Chris Waterson) of the XUL/RDF architecture is shown below.

Image:xulrdf.gif

Let's start over on the left side of the picture. A XUL document is read into Gecko's parser, and a specialized content sink, known as the XUL content sink, is responsible for constructing the in-memory RDF graph representation of the XUL.

The RDF graph represented by the XUL is then aggregated with the contents of other RDF stores (like bookmarks or history) to construct a composite data source, which is aggregated data still in an RDF graph form.

The resultant aggregated graph is fed into the XUL content model builder. This code is responsible for lazily presenting content nodes based on the RDF graph structure to the application that requests those nodes. Since this presentation happens "on demand", no content node is instantiated until the application specifically requests it, or demands an operation that requires the instantiation of the node to successfully complete, e.g., asking for the number of children of a content node.

When the application makes changes to the DOM, those changes percolate down into RDF, which can then decide what it should do with the changes in question. The most common options that RDF has to choose from are as follows:

  1. The change is really occurring in some RDF store. Handling of the change is the responsibility of the data source. (Example: deletion of an IMAP message causing the icon to change to a red X in the tree view.)
  2. Some change has been made in the composite data source that is the equivalent of a local annotation to the XUL file. In this case, RDF will do one of two things. If the XUL stream came from a remote site, then RDF has no choice but to make a local annotation on the graph. If the XUL stream came from a local file, then RDF can either locally annotate the graph (just as before), or it can serialize the XUL and write over the original file (thus allowing local changes to a local file to be reflected right in the XUL).

Referencing local data: The LOCALDATA attribute

A tag that references local RDF data does so by using the localData attribute, which gives the URI of the local data that should be merged in with the XUL content tree. Examples of tags that can reference local data are menu, menuitem, menubar, toolbox, toolbar, treebody, and treeitem.

TODO:

  • Outline the naming scheme for local data. Give an example once I actually know what the naming scheme will be.
  • Talk to John McMullen about 5.0 preferences and figure out where LOCALDATA will need to be used (likely on every form element control)
  • Talk to Steve Elmer. It's likely that this might be the desirable solution for buttons that observe preferences (like the HOME button)

Denoting persistence of local annotations to the AOM: The PERSISTENT attribute

A content tree in XUL that wishes to allow persistent local annotations to be made to all the nodes in the subtree (including the node itself) must specify this capability using the persistent attribute. The persistent attribute has values of true and false, with true specifying that local annotations that are persistent are desired. The default assumption if this attribute is not present is that local annotations are not allowed.

One subtree nested inside another subtree can specify a different value for the persistent attribute, thus allowing the XUL writer to specify a default for the whole window, but to still selectively override it for certain subtrees.

Note that even if the persistent attribute is set to false, that changes can still be made to a window's content tree. They simply won't be remembered across sessions.

Preventing the Persistence of Attributes: The DISCARDABLE attribute

Individual attributes can be specified as non-persistent (i.e.,discardable) through the use of the discardable attribute. For example, the open attribute used to indicate whether a node is expanded or collapsed in a tree view is something that is sometimes persistent (in the case of bookmarks) and sometimes not (in the case of mail folders). In both cases, the overall tree structure needs to be persistent, but this particular attribute could be either.

The discardable attribute takes as its value the attribute name that should be non-persistent. Within the subtree at which the discardable attribute occurs, the attribute in question will be considered to be non-persistent.

The discardable attribute is ignored when used inside a non-persistent subtree.

Note that the persistent and discardable attributes only apply when a change has been made to the composite data source that was not handled by another data source (e.g., bookmarks). If you delete a bookmark, that's always going to be persistent, regardless of what you set these attributes to be. The attributes in question only apply to local annotations that are made outside of the domain of any particular data source.

TODO:

  • More examples
  • Talk about data sources

Original Document Information

  • Written by: Dave Hyatt
  • Last Modified on: 1/31/99