XUL Parser in Python

Warning: The content of this article may be out of date. This article is from 2000.

v.00001

To celebrate ActiveState's recent announcement about support for Perl and Python in Mozilla, I have put together this little Python script that parses your local XUL and builds a list of all the XUL elements and their attributes in an HTML page.

With new widgets and attributes landing all the time, I wanted to get some quicker way of looking at the XUL--at particular builds, at particular widgets, at which elements had which attributes, etc.

The script writes out all the attributes and none of the values, but the parser itself is seeing the elements, their attributes, and the values of those attributes, and you just have to ask for them if you want them. For example, you could easily adapt this to:

  • return the id values of all the elements
  • take elements on the command line and only spell them out
  • build new chrome subdirectories (i.e. new "packages") on the fly using search and replace.

As I adapt the script in these ways and try to further generalize the actual code as I get time, I will make it available here. In the explanation section below, I try to say a little something about how this script works. It's really just a wrapper around Python's xmllib XML Parser, but I had to sort of fool around with it. Python's support for XML (and potential with XUL) is extensive, and so it's more a matter of choosing an approach and hooking things up than anything else.

Source code

The source code for the XUL parser is available.

It only runs as a hard-coded script right now, so if you want to use it you have to go in and changes some of the stuff like CHROME_DIR and which information you want out. It also, I'm afraid, only works on the win32 platform, where the <tt>dir</tt> command it depends on gets you your XUL files. Again, you can easily use <tt>ls</tt> instead, as I should have done, and pick up some extra support. I will make these adjustments and change it into a real module when I get a second, so that someone can instantiate the parser from their own scripts and use it more flexibly:

>>> import XULTool
>>> myXP = XULParser()
>>> CHROME_DIR = 'D:\src\mozilla\xpfe'
>>> res = myXP.parseXUL(CHROME_DIR, 'window', 'id')

Where something like the above would write out all the XUL window ids in the build. Something like that, anyway.

Results

The results from the script are written to a file called res.html. This is another hard-coded thing I have to open up. If you want to test this script you should change the filename "res.html" specified in the fourth line and compare different results files. Note also that I wrote the build number into the header myself, and haven't figured out yet how to get that written automatically. I guess you could get a modify date from the files as you open them.

How the script works

As I said before, there isn't too much to explain in this script, particularly if you have used Python's xmllib parser before. At the middle of it is a subclass of the xmllib parser that overrides that parser's unknown_starttag method and asks it to do all the work. The unknown_starttag handler is fed the tag name, the attributes of that tag, and the attributes' values, so all you have to do as you hit each XUL element is build up a nested dictionary of elements and their associated attributes. After all the XUL files in the specified directory and its subdirectories are fed to the parser and parsed (using the win32 system's <tt>dir /s /b *.xul</tt> command), the dictionary of dictionaries is sorted and written into an HTML table.

The xml namespace support in xmllib was resolving the xul and html namespaces in a very annoying way, so I have an additional function, strip(), that takes off the whole namespace that xmllib is trying to tack onto the front of each item it finds in the XML.

Some modifications to this script may suggest other uses. The parser provides the tag name itself as a string and the attributes and their values in a dictionary, "a". If you want to look for certain widgets within the XUL files, you can get the filename from the calling method p.feed(data) and create a condition that only gets the elements specified in sys.argv[1].

Or, since the dictionary is already storing the values of all the attributes it finds, you can write the values of a particular attribute (e.g., id) to the results file by checking the attribute in sys.argv[1]:

for attr in a.keys():
  if strip(attr) == sys.argv[1]:
    el_list[name][strip(attr)] = strip(a[attr])

and writing thevalues to the HTML results file instead of thekeys:

for item in elements:
	w.write('<tr><td class="head">' + item + '</td></tr>\n')
	for a in el_list[item].values():
		w.write('<tr><td class="at">' + a + '</td>')

With these modifications, the script creates an output more like a Periodic Table of XUL Elements.

Until I spruce it up a little, this is just a very basic demonstration of using Python's XML parser with XUL. But Mozilla's upcoming support for languages like Perl and Python will really open up the Mozilla platform to tools and approaches of this kind--and make now a good time to start thinking about how these various technologies will be put together. The first level of support for Python in Mozilla will apparently be for Python modules made available as XPCOM objects. Extending on the approach in this script, then, you could imagine a kind of introspective XUL chrome that could modify and replicate itself by calling services from the XULParser and XULWriter XPCOM objects.

Please feel free to suggest changes, change the format of the results output, or adapt this script in any way you want. I have probably made some errors and undoubtedly written some strange, graceless Python. Most of all, please feel free to experiment with the Mozilla development platform in this way or in any other way you can imagine.

Original document information