nsIParserUtils

Provides non-Web HTML parsing functionality to Firefox extensions and XULRunner applications.
Introduced in Gecko 13.0. Inherits from: nsISupports. Last changed in Gecko 14.0 (Firefox 14.0 / Thunderbird 14.0 / SeaMonkey 2.11).

Warning: Do not use this interface from within Gecko itself. Use nsContentUtils, nsTreeSanitizer, and so on directly instead.

Implemented by: @mozilla.org/parserutils;1 as a service:

var parserUtils = Components.classes["@mozilla.org/parserutils;1"]
                  .getService(Components.interfaces.nsIParserUtils);

Method overview

AString convertToPlainText(in AString src, in unsigned long flags, in unsigned long wrapCol);
nsIDOMDocumentFragment parseFragment(in AString fragment, in unsigned long flags, in boolean isXML, in nsIURI baseURI, in nsIDOMElement element);
AString sanitize(in AString src, in unsigned long flags);

Constants

Constant Value Description
SanitizerAllowComments (1 << 0) Flag for sanitizer: Allow comment nodes.
SanitizerAllowStyle (1 << 1) Flag for sanitizer: Allow <style> elements and style attributes (with contents sanitized in case of -moz-binding). Note: If -moz-binding is absent, properties that might be XSS risks in other Web engines are preserved!
SanitizerCidEmbedsOnly (1 << 2) Flag for sanitizer: Only allow cid: URLs for embedded content. At present, sanitizing CSS backgrounds and the like is not supported, so setting this together with SanitizerAllowStyle doesn't make sense. Sanitizing CSS syntax in SVG presentational attributes is also not supported at present, so this option flattens out SVG.
SanitizerDropNonCSSPresentation (1 << 3) Flag for sanitizer: Drops non-CSS presentational HTML elements and attributes, such as <font>, <center>, and the bgcolor attribute.
SanitizerDropForms (1 << 4) Flag for sanitizer: Drops forms and form controls (excluding <fieldset> and <legend>).
SanitizerDropMedia (1 << 5) Flag for sanitizer: Drops <img>, <video>, <audio>, and <source>, and flattens out SVG.
SanitizerLogRemovals (1 << 6) Flag for sanitizer: Log messages to the console for everything that gets sanitized.
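
Because each constant is a single bit, several flags can be combined with bitwise OR before being passed to sanitize() or parseFragment(). The following is a minimal sketch of that idea. The helper name combineSanitizerFlags and the convention of passing flag names as strings are assumptions made for illustration; they are not part of nsIParserUtils.

```javascript
// Hypothetical helper: combine named sanitizer flags into one bitmask.
// `source` is any object that exposes the constants listed above,
// for example the parserUtils service instance.
function combineSanitizerFlags(source, names) {
  var flags = 0;
  names.forEach(function (name) {
    var value = source[name];
    if (typeof value !== "number") {
      // Fail loudly on a misspelled flag instead of silently ignoring it.
      throw new Error("Unknown sanitizer flag: " + name);
    }
    flags |= value;
  });
  return flags;
}

// Possible usage in privileged code (not executed here):
// var flags = combineSanitizerFlags(parserUtils,
//   ["SanitizerDropForms", "SanitizerDropMedia"]);
```

Rejecting unknown names is a design choice for this sketch: a typo in a flag name would otherwise produce a weaker sanitizer configuration without any warning.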

Methods

convertToPlainText()

Converts HTML to plain text.

AString convertToPlainText(
  in AString src,
  in unsigned long flags,
  in unsigned long wrapCol
);
Parameters
src
The HTML source to parse. (C++ callers are allowed but not required to use the same string for the return value.)
flags
Conversion option flags defined in nsIDocumentEncoder.
wrapCol
Number of characters per line; 0 for no auto-wrapping.
Return value

The plain text conversion of the HTML specified in src.
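
As an illustration of the parameters described above, here is a minimal sketch of a wrapper around convertToPlainText(). The function name htmlToPlainText is hypothetical, and the sketch assumes the caller passes in the parserUtils service and a bitmask of nsIDocumentEncoder flags of their own choosing.

```javascript
// Hypothetical wrapper around convertToPlainText().
// `parserUtils` is the service obtained via getService() as shown earlier.
// `encoderFlags` is a bitmask of nsIDocumentEncoder flags chosen by the
// caller; this sketch falls back to 0 when none are given.
function htmlToPlainText(parserUtils, html, encoderFlags, wrapCol) {
  return parserUtils.convertToPlainText(
    String(html),
    encoderFlags || 0,
    wrapCol || 0 // 0 means no auto-wrapping
  );
}

// Possible usage in privileged code (not executed here):
// var text = htmlToPlainText(parserUtils, "<p>Hello <b>world</b></p>", 0, 72);
```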

parseFragment()

Parses markup into a sanitized document fragment.

nsIDOMDocumentFragment parseFragment(
  in AString fragment,
  in unsigned long flags,
  in boolean isXML,
  in nsIURI baseURI,
  in nsIDOMElement element
);
Parameters
fragment
The input markup.
flags
Sanitization option flags defined above.
isXML
true if fragment is XML, false if it is HTML.
baseURI
The base URL for this fragment.
element
The context node for the fragment parsing algorithm.
Return value

An nsIDOMDocumentFragment object for the resulting sanitized document fragment.
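
A common use of parseFragment() is inserting untrusted markup into an existing element. The sketch below shows one way this might look. The helper name appendSanitizedHTML is hypothetical, and the sketch assumes the caller supplies the parserUtils service, a target element that doubles as the context node, and an nsIURI for the base URL.

```javascript
// Hypothetical helper: parse untrusted HTML into a sanitized fragment
// and append it to `target`, which also serves as the context node.
function appendSanitizedHTML(parserUtils, target, markup, baseURI, flags) {
  var fragment = parserUtils.parseFragment(
    markup,
    flags || 0, // sanitizer flags from the Constants section
    false,      // isXML: treat the input as HTML
    baseURI,    // base URL for the fragment, supplied by the caller
    target      // context node for the fragment parsing algorithm
  );
  target.appendChild(fragment);
  return fragment;
}

// Possible usage in privileged code (not executed here):
// appendSanitizedHTML(parserUtils, containerElement, untrustedHTML,
//                     someBaseURI, parserUtils.SanitizerDropForms);
```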

sanitize()

Parses a string into an HTML document, sanitizes the document, and returns the result serialized to a string.

The sanitizer is designed to protect against XSS when sanitized content is inserted into a different-origin context without an iframe-equivalent sandboxing mechanism.

By default, the sanitizer doesn't try to prevent third parties from learning that the content was viewed. For example, an <img> whose source points to an HTTP server potentially controlled by a third party is not removed by default. To avoid this kind of ambient information leakage when the sanitized content is loaded, use the SanitizerCidEmbedsOnly flag. Even then, <a> links (and similar) to other content are preserved, so an explicit user action (following a link) after the content has been loaded can still leak information.

By default, non-dangerous non-CSS presentational HTML elements and attributes, as well as forms, are not removed. To remove these, use SanitizerDropNonCSSPresentation and/or SanitizerDropForms.

By default, comments and CSS are removed. To preserve comments, use SanitizerAllowComments. To preserve <style> elements and style attributes on other elements, use SanitizerAllowStyle. -moz-binding is removed from <style> elements and style attributes if present. In this case, properties that Gecko doesn't recognize can get removed as a side effect.

Note: If -moz-binding is not present in <style> elements or style attributes and SanitizerAllowStyle is specified, the sanitized content may still be XSS dangerous if loaded into a non-Gecko Web engine!
AString sanitize(
  in AString src,
  in unsigned long flags
);
Parameters
src
The HTML source to parse (C++ callers are allowed but not required to use the same string for the return value).
flags
Sanitization option flags defined above.
Return value

The resulting text.
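
To tie the defaults described above to a concrete call, the sketch below sanitizes a string while dropping forms and restricting embedded content to cid: URLs, which avoids loading remote images. The function name sanitizeForPrivateDisplay and this particular flag combination are assumptions chosen for illustration; pick flags to match your own threat model.

```javascript
// Hypothetical helper: sanitize untrusted HTML for display without
// letting embedded content contact third-party servers.
// `parserUtils` is the service obtained via getService() as shown earlier.
function sanitizeForPrivateDisplay(parserUtils, html) {
  var flags =
    parserUtils.SanitizerCidEmbedsOnly | // only cid: URLs for embeds
    parserUtils.SanitizerDropForms;      // remove forms and form controls
  return parserUtils.sanitize(html, flags);
}

// Possible usage in privileged code (not executed here):
// var clean = sanitizeForPrivateDisplay(parserUtils, untrustedHTML);
```

SanitizerAllowStyle is deliberately left out here, since the Constants section notes that combining it with SanitizerCidEmbedsOnly doesn't make sense.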