History Service Design

This document provides a high-level overview of the overall History service design of the Places system. Places is designed to be a complete replacement for the Firefox bookmarks and history systems using Storage. This system provides additional performance, flexibility, and querying capabilities over the old one, for both end users and extensions developers.

View the service interface definition: nsINavHistoryService.idl.

Objectives

The primary objectives of the new History service implementation in Places are:

Improve access to browsing history
Allow association of useful metadata with urls
Flexible query system for both end-users and add-ons developers
Clean architecture for ease of code reuse and maintainability

The most known and visible feature of history are views. The user can query his visits based on a date range, meta contents or revisit behavior. This involves storing informations on all of the user's visits, including visit time, type of visit and meta data. Old history system was instead only storing first and last visit date, and a generic visits counter, creating some problem due to the impossibility to represent real history flow in a timeframe.

History views must be organized and as concise as possible. Each visit is associated with a favicon, to allow for better recognizability and groupable by common timeframes or domains. History views should allow to quickly find a page in a certain timeframe remembering only small details about it. The system should be customizable enough to allow creation of custom, calendar and graphical views based on queries executed through a common API upon underlying data.

The querying system allows to extract slices of data based on common patterns, this is directly usable by the end user, but also by implementers to provide a variety of nice features. For example, is possible to create a query folder containing the 10 pages most visited by the user, allowing to fast find good candidates for bookmarking.

A locationbar intelligent algorithm (aka: the awesomebar) allows searchs through bookmarks, history, keywords (And much more) detaching the user from the old long searches through tree views. This is possible through a relevance algorithm that assigns a param called frecency to every page in history, see The Places frecency algorithm for major informations. History service provides the basics to create such adaptive search paths, allowing for a better browsing experience through a common interface.

Long term objectives include the ability to index more informations from the visited pages, through fulltext indexes, and the possibility to generalize the frecency algorithm to allow for its use in user's queries. Might be also nice to provide a hook for third-party products so they can provide text searching capabilities to the Places system.

Places core

History Service (nsINavHistoryService.idl) is the core of Places, every other Places service depends on it to correctly work, so it gets always initialized at application startup.

Actual tasks executed by this service include:

Database creation, maintenance and initialization: all services rely on a common shared database called places.sqlite. History Service checks for database existance and coherence, and initializes the connection through the Storage service.
History management: provides basic and advanced APIs to store and modify visits and page details.
Relevance calculation: maintain and expire frecency values for pages.
History expiration: expire old visits and invalid informations from the database.
Queries creation: allows the creation for complex queries.
Queries results maintenance: allows easy access to queries results.
Charset for pages: allows setting and reading charset for each page.

Database maintenance

At startup the service creates an exclusive Storage connection to places.sqlite, the exclusive locking is needed for both a perf gain and data-safety. In case the database is missing a new one is created, if instead the database exists but the connection to it fails due to database corruption, the corrupt database is moved away and a new one is created. In both cases an implementer can know what did happen at startup and act accordingly (maybe restoring a backup).

Once the database connection has been set-up the schema version of the database is checked. Schema version is upgraded every time a change is made to the database schema, History service will eventually upgrade the database, moving and fixing its datas accordingly. In case the database has been created for the first time History service will create all tables, indexes and triggers, calling related initTables static methods of other dependant services.

Finally temporary tables, indexes and triggers are created, this happens at every run since those entities are removed when closing the connection. Temporary storage is used to avoid too frequent writes to the disk, since those are most likely to cause fsyncs (to ensure data integrity) on the target filesystem. The temporary data (actually visits and pages) are synced to disk either on a timer and on bookmark's inserts.

Performance

To ensure performance a bunch of statements, commonly used when adding or reading visit informations, are created at startup. These statements can be reused avoiding the overhead due to createstatement calls, before closing the connection these statements need to be finalized though, since not doing that would cause leaking. The internal syncing service takes care of initializing Places, syncing data to disk and finalize statements, so usually that's not a problem. The same pre-compiled statements approach is used in other dependant services and in Autocomplete.

Another step to improve performance is mainly dedicated to not locking UI. Since Places is actually not thread-safe and doing most of the work in the main-thread, adding visits (the most common action usually executed on user interaction) could end up locking the UI till the database I/O to save data is complete. For this reason visits and favicons are added lazily on a timer. In future this will probably change in favour of Storage asynchronous statements, so that the INSERTs will be queued-up by mozStorage itself in a separate thread.

SQL queries are built with performance in mind, and constantly tested against large databases. Database indexes are quite important, and a good query can make the difference between minutes or seconds. Many queries are especially optimized, for example limited history queries, often used to build menus, allowing for good performances even in presence of a really large history.

Storing pages and visits

Pages (intended as URIs) are stored into a table shared by both History and Bookmarks, every url is unique in this table and is associated with a place id, commonly used as the foreign key on other tables. This table has common attributes for a page like typed, hidden and frecency.

Certain scheme are excluded from history, so they will never be added, for example: about, view-source, chrome. If a page can be added we first check if it exists already, then it is added or updated accordingly to the previous check, and common attributes are set based on concurrent visit properties. Typed means the page as a typed visit, hidden means the page should not be shown in autocomplete, while frecency is the relevance the page will have in locationbar queries. Finally observers are notified a new visit has been added.

Every visit is identified by its visit date, and a visit type (also known as transition type) that represents how we have come to that page (typed, click, redirect, bookmark, etc.). An additional property of a visit is the visit we have come from, this is used to track visit chains in global history, so for example if clicking a link causes a redirect the from visit will allow to follow up the chain of visits. At the moment this is used when catching favicons or bookmarks for redirected pages.

Querying for data

Querying data is a basic feature for Places. Thanks to the underlying database, history size has increased far more than 10 times from the previous Mork based implementation, thus the need for a simple and efficient way to extract slices of data.

A Places query is made of of a query object (nsINavHistoryQuery) and a query options object (nsINavHistoryQueryOptions) that defines the constraints results must obey to. These objects can be directly instantiated and built, setting their attributes, but Places allows a more readable and manageable form for queries: place query URIs. Query URIs (for example place:queryType=0&sort=8&maxResults=10) can be easily built and read by users (through a built-in advanced search builder UI) and can be bookmarked, creating a so called smart bookmark. A smart bookmark can update itself automatically when bookmarks or visits change, providing a sort of saved search.

Once Places receives a place: URI or a query object an internal query builder checks for the requested options and make up a real SQL query that is executed with Storage APIs. Many queries can be executed and combined at the same time. An history container node is built (the so-called rootNode), this node will fill itself. Finally the root node is assigned to a history result object, that can be furtherly modified setting for example a sorting mode. Only the first level of a node is usually filled, internal nodes or queries will populate themselves when opened.

An implementer working on a UI can then associate a viewer with the result, so that it will be notified when the result tree changes.

Queries can act on a variety of datas, coming from all dependant services, so it is possible to query history, bookmarks or both, also with values coming from other services like tags, annotations. Result nodes can be simple uri nodes with page attributes or complete visit attributes, or can be containers (folders, other queries, dynamic containers and so on).

See Querying Places for deeper knowledge on the actual implementation.

Expiration

Expiration is an important part of data management for two reasons:

Privacy: expiring data based on user interaction is important, nothing must be left behind on a page removal
Database maintenance: having cleaner and smaller tables helps queries performances

Expiration is done at certain moments, but in future will most likely be moved to async queries, to be executed on a separate thread. Actually pages and visits are expired:

On page added (lazily)
On idle
On shutdown

Expiration of pages happens when a page is no more referenced, so there are no more visits, nor bookmarks associated with it. Visits are expired based on user preferences, there is an hard limit on the minimum number of days that should be retained, visits in that range won't never expire. Visits inside a slushy limit (> than the hard limit) will be removed only if we are over a maximum number of pages we can retain, for performance issues.

Finally expiration can be forced by the user himself to clean up the full history or slices of it (last hour, last day, ...).

See Places Expiration for more information.