The Unicode Bidirectional Text Algorithm

Draft
This page is not complete.

The Unicode® Bidirectional Algorithm (also known as the BiDi Algorithm) is part of the Unicode text standard that describes how the user agent should order characters while rendering Unicode text. Understanding this algorithm in at least basic terms is helpful when you're striving to produce localization-ready web content or apps.

In this guide, we'll look at what the BiDi Algorithm does in general and how it applies to your content, so that you'll be better prepared to use the HTML and CSS features that interact with the algorithm as it determines the order and directionality of text during rendering.

Fundamentals

(base direction, character types, etc)

The algorithm

Character level directionality

Directional runs

(what they are, how base direction applies)

Handling neutral characters

Overriding the algorithm

Content about using HTML and CSS to override the default behavior of the algorithm; include info about isolating ranges etc.

Overriding BiDi using Unicode control characters

Unicode provides a number of special control characters that make it possible to control the directionality of ranges of text. There are two sets of control characters: one set opens an embedding, override, or isolate, and the other closes it, restoring the original directionality. You must always follow each opening character with the appropriate closing character.
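As a minimal sketch of that pairing rule, the following Python helper wraps a string in an isolate opener and its matching Pop Directional Isolate. The function name `isolate`, the `direction` values, and the sample message are illustrative assumptions rather than part of any standard API; only the code points come from Unicode.

```python
# Code points of the isolate controls (the names are Unicode's; the constants are ours).
LRI = "\u2066"  # Left-to-Right Isolate
RLI = "\u2067"  # Right-to-Left Isolate
FSI = "\u2068"  # First Strong Isolate
PDI = "\u2069"  # Pop Directional Isolate

# Hypothetical mapping that mirrors the values of the HTML dir attribute.
_OPENERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text, direction="auto"):
    """Wrap text in an isolate opener and its matching PDI closer."""
    if direction not in _OPENERS:
        raise ValueError(f"direction must be one of {sorted(_OPENERS)}")
    return f"{_OPENERS[direction]}{text}{PDI}"


# Example: insert a name of unknown direction into an LTR sentence.
user_name = "\u05d3\u05e0\u05d9"  # a Hebrew name, stored in memory order
message = "Welcome, " + isolate(user_name) + "! You have 3 new messages."
```

Because the opener and closer are always emitted together, the inserted text cannot leave an isolate open in the surrounding sentence. In HTML content, the `dir` attribute is usually the better tool; a helper like this is mainly useful for plain-text strings where markup is unavailable.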

Initial Unicode BiDi algorithm control characters

| Character | Code point | HTML entity | Markup equivalent | Description |
| --- | --- | --- | --- | --- |
| Left-to-Right Isolate (LRI) | U+2066 | &#x2066; | dir="ltr" | Sets the base direction to LTR, isolating the embedded content from the surrounding text |
| Right-to-Left Isolate (RLI) | U+2067 | &#x2067; | dir="rtl" | Sets the base direction to RTL, isolating the embedded content from the surrounding text |
| First Strong Isolate (FSI) | U+2068 | &#x2068; | dir="auto" | Isolates the content and sets the base direction according to the first strongly-typed directional character in the embedded content |
| Left-to-Right Embedding (LRE) | U+202A | &#x202A; | dir="ltr" | Sets the base direction to LTR but lets the embedded text interact with the surrounding content, risking spillover effects |
| Right-to-Left Embedding (RLE) | U+202B | &#x202B; | dir="rtl" | Sets the base direction to RTL but lets the embedded text interact with the surrounding content, risking spillover effects |
| Left-to-Right Override (LRO) | U+202D | &#x202D; | <bdo dir="ltr"> | Overrides the BiDi algorithm, displaying the characters in memory order, from left to right |
| Right-to-Left Override (RLO) | U+202E | &#x202E; | <bdo dir="rtl"> | Overrides the BiDi algorithm, displaying the embedded characters in reverse memory order, from right to left |

Closing Unicode BiDi algorithm control characters

| Character | Code point | HTML entity | Markup equivalent | Description |
| --- | --- | --- | --- | --- |
| Pop Directional Formatting (PDF) | U+202C | &#x202C; | Closing tag of whichever element set the dir attribute | Used for RLE or LRE |
| | | | </bdo> | Used for RLO or LRO |
| Pop Directional Isolate (PDI) | U+2069 | &#x2069; | Closing tag of whichever element set the dir attribute | Used for RLI, LRI, or FSI |
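
The pairings in the two tables above can be checked mechanically. The sketch below uses Python's standard `unicodedata.bidirectional()` to classify each character and reports any opening control without its expected closer, and any closer without a matching opener. The function name `unbalanced_controls` and the strict stack-based matching are assumptions made for illustration; the BiDi algorithm itself is more forgiving and tolerates unmatched control characters rather than treating them as errors.

```python
import unicodedata

# Bidi class of each opening control, mapped to the class of the closer it needs.
CLOSER_FOR = {
    "LRE": "PDF", "RLE": "PDF", "LRO": "PDF", "RLO": "PDF",
    "LRI": "PDI", "RLI": "PDI", "FSI": "PDI",
}
CLOSERS = {"PDF", "PDI"}


def unbalanced_controls(text):
    """Return sorted (index, bidi_class) pairs for controls lacking a proper partner."""
    problems = []
    stack = []  # (index, class) of openers still waiting for their closer
    for index, char in enumerate(text):
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in CLOSER_FOR:
            stack.append((index, bidi_class))
        elif bidi_class in CLOSERS:
            if stack and CLOSER_FOR[stack[-1][1]] == bidi_class:
                stack.pop()  # closer matches the most recent opener
            else:
                problems.append((index, bidi_class))  # stray or mismatched closer
    problems.extend(stack)  # openers that were never closed
    return sorted(problems)
```

For example, a string that opens a Right-to-Left Override and never closes it is reported with the index of the offending U+202E, which makes this kind of check a reasonable lint step for user-supplied strings.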

See also