The new nsString class implementation (1999)

It's very unlikely that this is really the page you want. If you happen to see dbaron lying around somewhere, please direct him here so he can fix this message. In the meantime, try the XPCOM string guide, but note that that article bears a warning of its own!

This document briefly describes the new nsString class architecture and discusses its implications for memory management, optimization, internationalization and usage patterns.

Disclaimer: I absolutely hate string classes. No one has ever devised one that more than 2 programmers can agree on. So, why then am I proposing this? Well, nsString has served us well so far, but it's in need of a facelift. And XPCOM has really taken off, so nsString needs to be brought into alignment.

Justification

The nsString class is a wide-character string class used throughout Gecko (and other modules) as the default string implementation. However, it suffers from several implementation problems that need to be addressed, and they are the subject of this document. The deficiencies of the current implementation are:

  1. Class based -- making it unsuitable for cross-dll usage due to fragility
  2. Little intrinsic i18n support
  3. Few efficiencies, notably a lack of support for narrow (1-byte) character strings
  4. No support for external memory management policy
  5. Lack of XPCOM interface

Notable features of the new nsStrImpl implementation are:

  1. Intrinsic support for 1 and 2 byte character widths
  2. Provides automatic conversion between strings with different character sizes
  3. Inviolate base structure eliminates class fragility problem; safe across DLL boundaries
  4. Offers C-style function API to manipulate nsStrImpl
  5. Offers simple memory allocator API for specialized memory policy
  6. Shares binary format with BString
  7. Coming soon: a new XPCOM (nsIString) interface
  8. Non-templatized; this is a requirement for Gecko
  9. Very efficient buffer manipulation
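To illustrate the third point, here is a minimal sketch (not the actual nsStrImpl code) of why a plain struct manipulated through functions is less fragile than a class with virtual methods. The names BaseSketch and VirtualSketch are hypothetical, the fixed-width types stand in for the NSPR types, and the type traits used for the check come from C++11, which postdates this document.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical plain structure: data members only, so its layout is
// determined by the member declarations and nothing else.
struct BaseSketch {
  int32_t mLength;
  void*   mBuffer;
  int32_t mCapacity;
  char    mCharSize;
  char    mUnused;

  // Static functions add nothing to the per-object layout.
  static int32_t Length(const BaseSketch& aString) { return aString.mLength; }
};

// Hypothetical class-based alternative: the virtual methods typically add a
// hidden, compiler-specific vtable pointer to every object.
struct VirtualSketch {
  int32_t mLength;
  virtual int32_t Length() const { return mLength; }
  virtual ~VirtualSketch() {}
};

// The plain struct is standard-layout; the class with virtuals is not.
static_assert(std::is_standard_layout<BaseSketch>::value,
              "plain struct should be standard-layout");
static_assert(!std::is_standard_layout<VirtualSketch>::value,
              "a class with virtual functions is not standard-layout");
```

Both checks are evaluated at compile time, so the block compiling at all is the demonstration. It says nothing about packing or alignment differences between compilers, which a real cross-DLL structure would still have to pin down.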

Architecture

The fundamental data type in the new architecture is struct nsStrImpl, given below:

struct nsStrImpl {
  PRInt32 mLength;
  void*   mBuffer;
  PRInt32 mCapacity;
  char    mCharSize;
  char    mUnused;

  // and now for the nsStrImpl API...
  static void EnsureCapacity(nsStrImpl& aString,PRUint32 aNewLength);
  static void GrowCapacity(nsStrImpl& aString,PRUint32 aNewLength);

  static void Append(nsStrImpl& aDest,const nsStrImpl& aSource,PRUint32 anOffset,PRInt32 aCount);
  static void AppendCString(nsStrImpl& aDest,const char* aSource,PRUint32 anOffset,PRInt32 aCount);

  static void Assign(nsStrImpl& aDest,const nsStrImpl& aSource,PRUint32 anOffset,PRInt32 aCount);
  static void AssignCString(nsStrImpl& aDest,const char* aSource,PRUint32 anOffset,PRInt32 aCount);

  // insert a char or a substring into the existing string...
  static void Insert(nsStrImpl& aDest,PRUint32 aDestOffset,
                     const nsStrImpl& aSource,PRUint32 aSrcOffset,PRInt32 aCount);

  static void InsertCString(nsStrImpl& aDest,PRUint32 aDestOffset,
                            const char* aSource,PRUint32 aSrcOffset,PRInt32 aCount);

  static void InsertChar(nsStrImpl& aDest,PRUint32 aDestOffset,char theChar);
  static void InsertChar(nsStrImpl& aDest,PRUint32 aDestOffset,PRUnichar theUnichar);
  static void InsertChar(nsStrImpl& aDest,PRUint32 aDestOffset,PRInt32 theQuadChar);

  static void Delete(nsStrImpl& aDest,PRUint32 aDestOffset,PRUint32 aCount);
  static void Truncate(nsStrImpl& aDest,PRUint32 aDestOffset);

  static PRInt32 Compare(const nsStrImpl& aDest,const nsStrImpl& aSource,
                         PRInt32 aCount,PRBool aIgnoreCase);
};

nsString

The nsString class is still with us as a subclass (wrapper) of nsStrImpl. By default, nsStrings use a 2-byte UCS2 character storage model. The nsString class is very lightweight since it gets its functionality from the nsStrImpl static library. In addition to the nsStrImpl API shown above, nsString, nsAutoString and nsCString all offer additional APIs (that all degrade to those found in nsStrImpl) for construction, searching and comparison. Also note that the new nsString interface fully mimics the interface of the existing nsString class found in mozilla/base/src/nsString.h.
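The wrapper relationship can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the real implementation: StrSketch, SketchEnsureCapacity, SketchCharAt, SketchAppend and WideStringSketch are hypothetical names, standard fixed-width types replace the NSPR ones, and the narrowing rule (characters above 0xFF become '?') is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the base structure (fields follow the document).
struct StrSketch {
  int32_t mLength;
  void*   mBuffer;
  int32_t mCapacity;  // measured in characters, not bytes
  char    mCharSize;  // 1 or 2 bytes per character
  char    mUnused;
};

// Grow the buffer, if needed, so it can hold aNewLength characters.
inline void SketchEnsureCapacity(StrSketch& aString, uint32_t aNewLength) {
  if (aNewLength <= static_cast<uint32_t>(aString.mCapacity)) return;
  uint32_t cap = aString.mCapacity > 0 ? static_cast<uint32_t>(aString.mCapacity) : 16;
  while (cap < aNewLength) cap *= 2;
  void* grown = std::realloc(aString.mBuffer,
                             static_cast<size_t>(cap) * static_cast<size_t>(aString.mCharSize));
  assert(grown != nullptr);
  aString.mBuffer = grown;
  aString.mCapacity = static_cast<int32_t>(cap);
}

// Read one character from a buffer of either width.
inline uint16_t SketchCharAt(const StrSketch& aString, uint32_t aIndex) {
  return aString.mCharSize == 1
             ? static_cast<uint16_t>(static_cast<const uint8_t*>(aString.mBuffer)[aIndex])
             : static_cast<const uint16_t*>(aString.mBuffer)[aIndex];
}

// Append aSource to aDest, converting the character width as needed.
// Assumption: when narrowing, characters above 0xFF are replaced with '?'.
inline void SketchAppend(StrSketch& aDest, const StrSketch& aSource) {
  uint32_t newLength = static_cast<uint32_t>(aDest.mLength + aSource.mLength);
  SketchEnsureCapacity(aDest, newLength);
  for (int32_t i = 0; i < aSource.mLength; ++i) {
    uint16_t ch = SketchCharAt(aSource, static_cast<uint32_t>(i));
    if (aDest.mCharSize == 1) {
      static_cast<uint8_t*>(aDest.mBuffer)[aDest.mLength + i] =
          ch <= 0xFF ? static_cast<uint8_t>(ch) : static_cast<uint8_t>('?');
    } else {
      static_cast<uint16_t*>(aDest.mBuffer)[aDest.mLength + i] = ch;
    }
  }
  aDest.mLength = static_cast<int32_t>(newLength);
}

// Thin wrapper: adds no data members, only methods that forward to the
// functions above, so the object can still be passed as a StrSketch&.
class WideStringSketch : public StrSketch {
 public:
  WideStringSketch() : StrSketch{0, nullptr, 0, 2, 0} {}
  ~WideStringSketch() { std::free(mBuffer); }
  WideStringSketch(const WideStringSketch&) = delete;
  WideStringSketch& operator=(const WideStringSketch&) = delete;

  void Append(const StrSketch& aSource) { SketchAppend(*this, aSource); }

  // Wrap the C string in a temporary 1-byte view and let Append widen it.
  void AppendCString(const char* aSource) {
    StrSketch view{static_cast<int32_t>(std::strlen(aSource)),
                   const_cast<char*>(aSource), 0, 1, 0};
    SketchAppend(*this, view);
  }

  uint16_t CharAt(uint32_t aIndex) const { return SketchCharAt(*this, aIndex); }
  int32_t Length() const { return mLength; }
};
```

Because the wrapper only forwards, a module that knows nothing about WideStringSketch can still operate on the same object through the function layer.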

nsAutoString

We still offer an nsAutoString that provides its own stack-based buffer. This very useful class allows programmers to take advantage of the nsString/nsStrImpl implementation while eliminating heap-based allocations. An additional improvement has been made to nsAutoString that allows it to use an arbitrarily sized stack-based buffer supplied by the caller rather than its own internal buffer. This means that you can continue to use efficient (temporary) stack buffers for string storage, with the bonus of storage pools that serve your specific need. (Sounds complicated, but it's really easy.) This class fully interoperates with nsString and nsStrImpl.
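The idea can be sketched as below. This is an illustration only: AutoBufferSketch is a hypothetical class, it stores narrow characters for brevity, the 32-byte internal buffer size is an arbitrary assumption, and the growth policy is invented for the example. Append also assumes that aSource does not point into the string's own storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch: a string that starts in a fixed buffer (its own or
// one supplied by the caller) and moves to the heap only when it outgrows it.
class AutoBufferSketch {
 public:
  // Use the built-in buffer.
  AutoBufferSketch()
      : mData(mInline), mCapacity(sizeof(mInline)), mLength(0), mOnHeap(false) {
    mData[0] = '\0';
  }

  // Use an arbitrarily sized buffer owned by the caller (e.g. on its stack).
  AutoBufferSketch(char* aBuffer, size_t aCapacity)
      : mData(aBuffer), mCapacity(aCapacity), mLength(0), mOnHeap(false) {
    assert(aBuffer != nullptr && aCapacity > 0);
    mData[0] = '\0';
  }

  ~AutoBufferSketch() {
    if (mOnHeap) std::free(mData);  // never free a stack buffer
  }

  AutoBufferSketch(const AutoBufferSketch&) = delete;
  AutoBufferSketch& operator=(const AutoBufferSketch&) = delete;

  void Append(const char* aSource) {
    size_t count = std::strlen(aSource);
    size_t needed = mLength + count + 1;  // +1 for the terminator
    if (needed > mCapacity) Grow(needed);
    std::memcpy(mData + mLength, aSource, count + 1);
    mLength += count;
  }

  const char* get() const { return mData; }
  size_t Length() const { return mLength; }
  bool OnHeap() const { return mOnHeap; }

 private:
  // Copy the current contents into a larger heap block.
  void Grow(size_t aNeeded) {
    size_t cap = mCapacity * 2;
    if (cap < aNeeded) cap = aNeeded;
    char* heap = static_cast<char*>(std::malloc(cap));
    assert(heap != nullptr);
    std::memcpy(heap, mData, mLength + 1);
    if (mOnHeap) std::free(mData);
    mData = heap;
    mCapacity = cap;
    mOnHeap = true;
  }

  char   mInline[32];
  char*  mData;
  size_t mCapacity;
  size_t mLength;
  bool   mOnHeap;
};
```

A caller that expects longer strings would declare something like char scratch[128] on its own stack and pass it to the second constructor; the heap is touched only if even that buffer overflows.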

nsCString

The new nsCString class shares the same API as nsString, but uses a 1-byte ASCII character storage model. This will allow programmers to use the nsString APIs like a standard char* without incurring the 2-byte per character overhead. This class fully interoperates with nsString, nsAutoString and nsStrImpl.

nsIString

Naturally we will need to include an nsIString interface onto the nsStrImpl/nsString classes. I won't repeat its interface here since it is basically a restatement (in XPCOM terms) of the nsString interface.

Usage Patterns

How To Use These Classes

To increase the portability, thread and process safety of Gecko, I suggest the following rules regarding the use of each of our string class derivatives:

String Class    Where To Use
nsStrImpl       Use to pass strings between modules that have linked the nsStrImpl function library.
nsString        Use these locally in objects whose span of control is known to live within your own process. These should typically not be exposed to objects in other modules.
nsAutoString    Use these locally in cases where you don't want to incur heap allocation unless absolutely necessary.
nsCString       Same as nsString, but should be used with caution because of localization concerns.
nsIString       Use to pass strings between modules that may not use the nsStrImpl implementation. This is the most generic approach, but offers reference-counted strings.

This implementation has implications, chiefly API changes throughout Gecko. In particular, the use of nsString in public APIs will be discouraged. These APIs will need to be rewritten using nsStrImpl references instead. As an alternative, programmers can pass nsIStrings between modules.

I18n Issues

Another concern (mainly of the i18n team) has to do with the use of a 1-byte (ASCII) nsCString at all. The i18n team correctly points out that anarchy will prevail if judicious control over its use is not mandated. The problem stems from assumptions that programmers make regarding ASCII strings; the typical assumption being that they will never need to interoperate with code that assumes UCS2 strings. This assumption is nearly always wrong -- and will seriously hinder our ability to localize the source base.

It is recognized that (ASCII) nsCStrings are useful in the following contexts:

  1. Whenever calling libraries that expect a char* variant
  2. Whenever maximum memory efficiency is essential

I would argue that only the first case is normatively legitimate. The i18n folks will tell you it's better to use a wide string and convert to 1-byte forms for this purpose even though there is a performance penalty for doing so. Since I have to acknowledge the idiom, I have made nsCString available but note that it should be used sparingly.
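The approach the i18n folks prefer (keep the string wide, and convert only at the boundary of a char*-based library) might look something like the following sketch. NarrowForCLibrary is a hypothetical helper, std::u16string stands in for a 2-byte string, and replacing non-ASCII characters with '?' is an assumption made for the example rather than anything the string classes promise.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: produce a 1-byte copy of a 2-byte string for a
// char*-based library. Returns false if any character could not be
// represented in ASCII and had to be replaced with '?'.
bool NarrowForCLibrary(const std::u16string& aWide, std::string& aNarrow) {
  bool lossless = true;
  aNarrow.clear();
  aNarrow.reserve(aWide.size());
  for (char16_t ch : aWide) {
    if (ch < 0x80) {
      aNarrow.push_back(static_cast<char>(ch));
    } else {
      aNarrow.push_back('?');  // placeholder for an unrepresentable character
      lossless = false;
    }
  }
  return lossless;
}
```

The return value makes the loss explicit at the call site, which is the kind of control the i18n team is asking for: the caller has to decide what to do when a string does not survive the trip to one byte per character.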

Memory Management

A principal enhancement of the new architecture is pluggable memory allocators. All nsString subclasses provide their own default allocator implementations, but programmers are free to use their own. In the new prototype nsStrImpl and nsString classes, the allocator is an intrinsic member installed during construction of the string (by default they share a global allocator).
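As a rough sketch of the pattern (and only that), the following shows a string that receives its allocator at construction and falls back to a shared global one. MemoryAgentSketch, MallocAgent, CountingAgent, DefaultAgent and AgentStringSketch are all hypothetical names, and the two-method interface is an assumption for illustration, not the prototype's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical allocator interface.
class MemoryAgentSketch {
 public:
  virtual ~MemoryAgentSketch() {}
  virtual void* Alloc(size_t aSize) = 0;
  virtual void Free(void* aPtr) = 0;
};

// Default policy: plain malloc/free.
class MallocAgent : public MemoryAgentSketch {
 public:
  void* Alloc(size_t aSize) override { return std::malloc(aSize); }
  void Free(void* aPtr) override { std::free(aPtr); }
};

// The global allocator that strings share unless told otherwise.
inline MemoryAgentSketch& DefaultAgent() {
  static MallocAgent agent;
  return agent;
}

// A custom policy: same storage, but it counts live allocations.
class CountingAgent : public MemoryAgentSketch {
 public:
  int mLive = 0;
  void* Alloc(size_t aSize) override {
    ++mLive;
    return std::malloc(aSize);
  }
  void Free(void* aPtr) override {
    if (aPtr) --mLive;
    std::free(aPtr);
  }
};

// A string whose allocator is installed during construction.
class AgentStringSketch {
 public:
  explicit AgentStringSketch(MemoryAgentSketch& aAgent = DefaultAgent())
      : mAgent(aAgent), mData(nullptr), mLength(0) {}
  ~AgentStringSketch() { mAgent.Free(mData); }
  AgentStringSketch(const AgentStringSketch&) = delete;
  AgentStringSketch& operator=(const AgentStringSketch&) = delete;

  void Assign(const char* aSource) {
    size_t count = std::strlen(aSource);
    char* fresh = static_cast<char*>(mAgent.Alloc(count + 1));
    assert(fresh != nullptr);
    std::memcpy(fresh, aSource, count + 1);
    mAgent.Free(mData);  // release the old buffer through the same agent
    mData = fresh;
    mLength = count;
  }

  const char* get() const { return mData ? mData : ""; }

 private:
  MemoryAgentSketch& mAgent;
  char*  mData;
  size_t mLength;
};
```

The important property is that every buffer is released through the same agent that produced it, so a string can be handed to code that has never heard of the custom policy.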

Note: The COM rules imply that everyone needs to use the same allocator, which they acquire via a global COM service called CoGetMalloc(). Our nsStrImpl uses an allocator pattern so that programmers can install their own policy, but this may also make allocation simpler in a multiprocess environment. I'm wondering if this is sufficient, namely, that a string can return its own (shared) allocator for this purpose.

Our minimalistic nsIMemoryAgent interface is just rich enough to support the nsString idiom, and could be extended to serve as the general memory allocation idiom. Here's its API:

class nsIMemoryAgent : public nsISupports {
public:
  virtual void* New(PRInt32 aSize)=0;  //used for both alloc and realloc
  virtual void* Delete(void* aPtr)=0;
};

Internationalization

The new nsStrImpl/nsString implementation addresses at least two of the primary concerns of our i18n team. First, nsStrImpl offers charset conversion hooks for use during construction, comparison and assignment. (These are stubbed out today awaiting their review and implementation). Second, they are concerned that programmers be prevented from abusing the string classes in a number of ways. To wit:

  1. They want to ensure that the underlying buffers cannot be corrupted or altered erroneously
  2. They want to ensure that the appropriate set of conversion functions get applied
  3. They want some control over the usage pattern of strings, such that the 2-byte (UCS2) form is used whenever possible, and some restrictions are applied to the use of 1-byte (ASCII) nsCStrings.

Original Document Information

  • Author: Rick Gessner
  • Last Updated Date: January 20, 1999
  • Copyright Information: Portions of this content are © 1998–2007 by individual mozilla.org contributors; content available under a Creative Commons license.