RDF Datasource How-To

This article is at least partially outdated. The XPCOM registration parts and the "As of this writing, it is not currently possible to implement JavaScript XPCOM components" comment appear to be outdated; the rest of the article has not been checked.

This document is a cookbook that describes how to create a native, client-side datasource that works with Mozilla's RDF implementation. It supersedes (and borrows from) the original document put together by Robert Churchill.

What is a datasource?

The "RDF universe" consists of a set of statements about Internet resources; for example, "my home page was last modified April 2nd", or "that news article was sent by Bob". In the most abstract sense, a datasource is a collection of such statements.

More concretely, a datasource is a translator that can present information as a collection of RDF statements. For example, a "file system datasource" would translate the file system into statements like "/tmp is a directory" and "/tmp/foo is contained within /tmp". An "IMAP datasource" would use the IMAP protocol to translate your mail server's inbox as a collection of statements like "message number 126's subject is 'make money fast on the Internet'" and "message number 126 was sent by 'spammer128@hotmail.com'". An "address book" datasource could translate a database file into statements like "spammer128@hotmail.com's real name is 'Billy Dumple'" and "spammer128@hotmail.com is considered an 'important friend'."

Statements from one datasource can be combined with statements from another datasource using a composite datasource. By combining statements from the IMAP datasource and address book datasource, above, we'd be able to identify the sender of "message 126" as an "important friend".
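As a rough illustration of the two ideas above, the following self-contained C++ sketch models a datasource as a bag of (subject, predicate, object) statements and a composite datasource as something that queries each member in turn. The class names, the method signatures, and the "sender" and "category" property names are assumptions made for this example; they are not the real nsIRDFDataSource or composite datasource interfaces, which are considerably richer.

```cpp
#include <string>
#include <vector>

// A statement is a (subject, predicate, object) triple.
struct Statement {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Hypothetical, simplified datasource: just a collection of statements.
class DataSource {
public:
    void Assert(const std::string& s, const std::string& p, const std::string& o) {
        statements_.push_back(Statement{s, p, o});
    }

    // Collect every object o such that (s, p, o) has been asserted.
    std::vector<std::string> GetTargets(const std::string& s, const std::string& p) const {
        std::vector<std::string> out;
        for (const Statement& st : statements_) {
            if (st.subject == s && st.predicate == p) out.push_back(st.object);
        }
        return out;
    }

private:
    std::vector<Statement> statements_;
};

// A composite datasource answers a query by asking each member in turn.
class CompositeDataSource {
public:
    void AddDataSource(const DataSource* ds) { members_.push_back(ds); }

    std::vector<std::string> GetTargets(const std::string& s, const std::string& p) const {
        std::vector<std::string> out;
        for (const DataSource* ds : members_) {
            std::vector<std::string> part = ds->GetTargets(s, p);
            out.insert(out.end(), part.begin(), part.end());
        }
        return out;
    }

private:
    std::vector<const DataSource*> members_;
};

// Follow the "sender" arc (known to one datasource), then the "category"
// arc (known to another). Only the composite can answer both hops.
std::vector<std::string> SenderCategories(const CompositeDataSource& all,
                                          const std::string& message) {
    std::vector<std::string> out;
    for (const std::string& sender : all.GetTargets(message, "sender")) {
        std::vector<std::string> cats = all.GetTargets(sender, "category");
        out.insert(out.end(), cats.begin(), cats.end());
    }
    return out;
}
```

With an "IMAP" datasource asserting the sender of message 126 and an "address book" datasource asserting that sender's category, calling SenderCategories() on the composite finds the category, whereas neither member could answer the question alone.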

Deciding on a vocabulary

The vocabulary is the set of properties that you will use to express relationships between elements (resources and literals) in your data model. The first question that you must answer is "should I use an existing vocabulary, or invent my own?" A reasonable answer is, "use an existing vocabulary unless you absolutely must invent your own." This will allow your datasource to be integrated with other datasources with a minimum of effort.

There are several existing vocabularies of note, including:

The RDF Schema Specification. This vocabulary is a "meta vocabulary" that is used to specify other vocabularies.
The Dublin Core. This vocabulary is useful for describing electronic resources. It contains elements for authorship, subject, publication date, etc.
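In practice, a vocabulary comes down to a namespace URI plus a set of local property names. The sketch below shows one way this might be expressed in C++. The two namespace URIs are the ones published for RDF Schema and the Dublin Core element set; kMyNS and the Property() helper are hypothetical names introduced only for this example.

```cpp
#include <string>

// Namespace URIs published by the respective specifications.
const std::string kRDFSchemaNS  = "http://www.w3.org/2000/01/rdf-schema#";
const std::string kDublinCoreNS = "http://purl.org/dc/elements/1.1/";

// Hypothetical namespace for properties that no existing vocabulary covers.
const std::string kMyNS = "http://example.org/my-vocabulary#";

// A property URI is simply a namespace followed by a local name.
std::string Property(const std::string& ns, const std::string& localName) {
    return ns + localName;
}
```

For instance, Property(kDublinCoreNS, "creator") names the Dublin Core authorship property. A datasource that uses it, rather than a private equivalent, can be combined with any other datasource that uses the same URI.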

Mapping your data to nodes and arcs

[write me!]

Implementing the `nsIRDFDataSource` interface

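Your first chore will be to implement the nsIRDFDataSource interface. There are basically two approaches that you can take in this endeavor: delegate to another datasource (with XPCOM aggregation as an extreme case of delegation), or implement the interface yourself: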

Delegate to an inner proxy. For example, you may choose to delegate to the in-memory datasource, which is a generic datasource that implements nsIRDFDataSource.

Typically, you provide a parser for reading in some sort of static storage (e.g., a data file); the parser translates the datafile into a series of calls to Assert() to set up the in-memory datasource. When Flush() is called, or the last reference to the datasource is released, a routine walks the in-memory datasource and re-serializes the graph back to the original file format. For examples of an implementation like this, look at the RDF/XML datasource or the bookmarks datasource.

You may want to choose this implementation if your primary goal is to "wrap" a legacy data store. This implementation may cause problems if your data store can be modified "on the fly" by other agents.

Aggregate the in-memory datasource. This is an extreme case of delegation, where you use XPCOM aggregation to implement the nsIRDFDataSource interface. See Aggregating the In-Memory Datasource for technical details.

If you take this approach, you won't be able to selectively implement methods of the nsIRDFDataSource interface; instead, all of the methods will be "forwarded" to the in-memory datasource. This can be useful if your datasource is "read-only", and you aren't worried about modification using Assert(), etc.

Implement the interface yourself. If you choose this route, you'll need to implement each of the nsIRDFDataSource methods "by hand". Although this is more work, it is really the only way to create a "live" datasource that may be changed by some outside agent.

The file system datasource and local mail datasource are good examples of datasources that have been implemented this way.

You'll probably need to choose this implementation if your datasource is "live", and may be modified or altered by some outside agent (e.g., new mail arriving). You may also need to choose this implementation if the data set which your datasource is modeling is too large to fit into memory (e.g., the entire file system structure).
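The delegation approach described above (parse the legacy store into Assert() calls, forward everything else to the inner datasource, and re-serialize on Flush()) can be sketched in a few lines of standalone C++. Everything here is an assumption made for illustration: InMemoryDataSource is a toy stand-in for Mozilla's in-memory datasource, the tab-separated record format is made up, and Flush() returns a string instead of writing to a file so that the example stays self-contained.

```cpp
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the in-memory datasource.
class InMemoryDataSource {
public:
    using Triple = std::tuple<std::string, std::string, std::string>;

    void Assert(const std::string& s, const std::string& p, const std::string& o) {
        triples_.emplace_back(s, p, o);
    }
    const std::vector<Triple>& Triples() const { return triples_; }

private:
    std::vector<Triple> triples_;
};

// Wraps a legacy store whose (made-up) format is one
// "subject<TAB>predicate<TAB>object" record per line.
class LegacyFileDataSource {
public:
    // Parser: translate each record into a call to Assert() on the inner datasource.
    void Load(const std::string& fileContents) {
        std::istringstream in(fileContents);
        std::string line;
        while (std::getline(in, line)) {
            const std::string::size_type a = line.find('\t');
            if (a == std::string::npos) continue;  // malformed record: skip it
            const std::string::size_type b = line.find('\t', a + 1);
            if (b == std::string::npos) continue;  // malformed record: skip it
            inner_.Assert(line.substr(0, a),
                          line.substr(a + 1, b - a - 1),
                          line.substr(b + 1));
        }
    }

    // Delegation: modifications are forwarded to the inner datasource.
    void Assert(const std::string& s, const std::string& p, const std::string& o) {
        inner_.Assert(s, p, o);
    }

    // Flush: walk the inner graph and re-serialize it in the original format.
    std::string Flush() const {
        std::string out;
        for (const InMemoryDataSource::Triple& t : inner_.Triples()) {
            out += std::get<0>(t) + '\t' + std::get<1>(t) + '\t' + std::get<2>(t) + '\n';
        }
        return out;
    }

private:
    InMemoryDataSource inner_;
};
```

Note that the wrapper only learns about changes made through its own Assert(). If another agent rewrites the underlying file, the inner datasource goes stale, which is the "on the fly" problem mentioned above and the reason a live datasource usually implements the interface by hand.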

[More info on what each method needs to do here]

RDF Commands

[Describe what commands are, and why you'd implement them.]

Registering the datasource component

A datasource is an XPCOM component. As such, it must (currently, see [1]) have:

An XPCOM CLSID to identify the data source implementation
An implementation class (that corresponds to the CLSID) whose code lives in a DLL. The DLL must be located in the XPCOM components directory
A factory that is registered to an XPCOM ProgID in order to be instantiated from the repository.

Constructing a DLL for a component is beyond the scope of this document; the reader is referred to the RDF factory as a guideline.

Registering an RDF datasource is fairly simple: in the DLL's NSRegisterSelf() method, you simply call the component manager's RegisterComponent() method:

extern "C" PR_IMPLEMENT(nsresult)
NSRegisterSelf(nsISupports* aServiceManager, const char* aPath)
{
   nsresult rv;
   ...
   // Assume compMgr refers to the component manager
   rv = compMgr->RegisterComponent(kMyDataSourceCID,
            "My Data Source",
            NS_RDF_DATASOURCE_PROGID_PREFIX "my-datasource",
            aPath, PR_TRUE, PR_TRUE);
   ...
}

Replace kMyDataSourceCID with your datasource's CLSID. Replace "My Data Source" with a descriptive string that should appear in the registry. Finally, replace "my-datasource" with a value appropriate for your datasource. This value, when prefixed with "rdf:", is a datasource identifier, and may be used with nsIRDFService::GetDataSource() to retrieve your datasource from the RDF service. For example, the above datasource would be accessible as follows:

nsIRDFService* rdf;
rv = nsServiceManager::GetService(kRDFServiceCID,
          kIRDFServiceIID,
          (nsISupports**) &rdf);

if (NS_SUCCEEDED(rv)) {
    nsIRDFDataSource* myDataSource;
    rv = rdf->GetDataSource("rdf:my-datasource",
                 &myDataSource);

    if (NS_SUCCEEDED(rv)) {
        // ...do something to myDataSource here...
        NS_RELEASE(myDataSource);
    }
    nsServiceManager::ReleaseService(kRDFServiceCID, rdf);
}
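To make the relationship between the registered ProgID and the "rdf:" identifier more concrete, here is a toy model of the lookup in standalone C++. It is only a sketch of the naming convention described above, not Mozilla's component manager or RDF service: ToyRegistry, its methods, and the DataSource struct are hypothetical, and kProgIDPrefix is a placeholder string rather than the real value of NS_RDF_DATASOURCE_PROGID_PREFIX.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical datasource type; only carries a description for the demo.
struct DataSource {
    std::string description;
};

// Placeholder only: the real NS_RDF_DATASOURCE_PROGID_PREFIX is defined in
// Mozilla's headers and is not reproduced here.
const std::string kProgIDPrefix = "hypothetical-progid-prefix?name=";

class ToyRegistry {
public:
    using Factory = std::function<std::shared_ptr<DataSource>()>;

    // Associate a ProgID with a factory that can create the component.
    void RegisterComponent(const std::string& progID, Factory factory) {
        factories_[progID] = std::move(factory);
    }

    // Map "rdf:<name>" to the ProgID prefix + <name> and instantiate it.
    // Returns null if the identifier is malformed or nothing is registered.
    std::shared_ptr<DataSource> GetDataSource(const std::string& uri) const {
        const std::string scheme = "rdf:";
        if (uri.compare(0, scheme.size(), scheme) != 0) return nullptr;
        auto it = factories_.find(kProgIDPrefix + uri.substr(scheme.size()));
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};
```

Registering a factory under kProgIDPrefix + "my-datasource" makes GetDataSource("rdf:my-datasource") succeed, while an unregistered name or an identifier without the "rdf:" prefix yields null. This toy creates a new object on every call; whether and how the real RDF service caches datasources is outside the scope of the sketch.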

Displaying RDF as content

Now that you've gone through all this pain to expose your information as a datasource, you probably want to see it. Using XUL, you can display the contents of your datasource in a tree control, a menu, or a toolbar. In fact, you can convert RDF to an arbitrary content model using XUL Templates.

The following XUL fragment illustrates how to instantiate a tree control whose body is "rooted" to a resource (http://foo.bar.com/) that your datasource describes:

<window
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns="http://www.mozilla.org/keymaster/gat...re.is.only.xul">

  <tree datasources="rdf:my-datasource" ref="http://foo.bar.com/">
    <template>
      <treechildren>
        <treeitem uri="...">
          <treerow>
            <treecell>
              <text value="rdf:http://home.netscape.com/NC-rdf#Name" />
            </treecell>
            <treecell>
              <text value="rdf:http://home.netscape.com/NC-rdf#URL" />
            </treecell>
          </treerow>
        </treeitem>
      </treechildren>
    </template>

    <treehead>
      <treeitem>
        <treecell>Name</treecell>
        <treecell>URL</treecell>
      </treeitem>
    </treehead>

    <!-- treechildren built _here_ -->
  </tree>

</window>

The important "magic attributes" in the fragment above are:

datasources="rdf:my-datasource". This is a space-separated list that may include internal XPCOM datasource "identifiers" (as described above) and URIs for local or remote RDF/XML documents. Each datasource that is listed will be loaded, and the assertions contained in the datasource will be made available to the tree control for display.
ref="http://foo.bar.com/". This roots the graph in your content model. The tree tag will be treated as if it has the ID attribute with a value http://foo.bar.com/.
<template>...</template>. The XUL template that is used to build content from the graph. Starting with the resource that corresponds to the tree element, http://foo.bar.com/, the graph will be traversed and content will be constructed using the pattern specified within the template tags.

For a complete description of how content is built from RDF, see the XUL:Template Guide.
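The traversal that a template performs can be approximated in standalone C++ as follows: start at the resource named by ref, visit each of its children, and emit one row per child with a cell for each property the template asks for. This is a one-level sketch under stated assumptions, not the XUL template builder. The Graph type, the Row struct, the helper functions, and in particular the kChild containment arc are hypothetical; only the NC-rdf#Name and NC-rdf#URL property URIs are taken from the fragment above.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// (subject, predicate) -> list of objects.
using Graph = std::map<std::pair<std::string, std::string>, std::vector<std::string>>;

const std::string kName = "http://home.netscape.com/NC-rdf#Name";
const std::string kURL  = "http://home.netscape.com/NC-rdf#URL";
// Hypothetical containment arc, used here only to find a resource's children.
const std::string kChild = "http://example.org/vocab#child";

// One generated row: the resource it was built from plus its cell values.
struct Row {
    std::string uri;
    std::vector<std::string> cells;
};

void AddStatement(Graph& g, const std::string& s, const std::string& p,
                  const std::string& o) {
    g[std::make_pair(s, p)].push_back(o);
}

// First object of (s, p), or an empty string if there is none.
std::string FirstTarget(const Graph& g, const std::string& s, const std::string& p) {
    auto it = g.find(std::make_pair(s, p));
    if (it == g.end() || it->second.empty()) return std::string();
    return it->second.front();
}

// Build one row per child of `ref`, with a Name cell and a URL cell.
std::vector<Row> BuildRows(const Graph& g, const std::string& ref) {
    std::vector<Row> rows;
    auto it = g.find(std::make_pair(ref, kChild));
    if (it == g.end()) return rows;
    for (const std::string& child : it->second) {
        Row row;
        row.uri = child;
        row.cells.push_back(FirstTarget(g, child, kName));
        row.cells.push_back(FirstTarget(g, child, kURL));
        rows.push_back(row);
    }
    return rows;
}
```

Calling BuildRows(graph, "http://foo.bar.com/") yields one Row per child of the root, in assertion order; a child with no URL statement simply gets an empty cell. A real template would also recurse into each child to build nested content.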

[1] As of this writing, it is not currently possible to implement JavaScript XPCOM components; however, it may soon be possible to do so via XPConnect. Update: JavaScript XPCOM should now be possible.

Contact: Chris Waterson (waterson@netscape.com)

Original Document Information

Author(s): Chris Waterson
Last Updated Date: June 19, 2000
Copyright Information: Copyright (C) Chris Waterson