HTML Ruby and Bidirectional Text
The set of HTML <ruby> elements allows us to add pronunciation above text. For example:
"When you visit the zoo, be sure to see the panda - 熊猫"
That is, the word or character which needs text above it is wrapped in <ruby>. The pronunciation is wrapped in <rt>. The <rp> element holds a fallback parenthesis, which isn't usually displayed but will be shown if the browser doesn't support <ruby> syntax.
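To make the markup pattern concrete, here is a minimal Python sketch that builds a <ruby> string with <rp> fallbacks. It then strips the tags to approximate what a browser without ruby support would show. The helper names `ruby_markup` and `fallback_text` are hypothetical, and the pinyin reading "xióngmāo" is my own assumption for the panda example.

```python
from html import escape
from html.parser import HTMLParser


def ruby_markup(base: str, reading: str) -> str:
    """Wrap base text and its reading in <ruby>, with <rp> fallback parentheses.

    Hypothetical helper for illustration only.
    """
    return (
        f"<ruby>{escape(base)}"
        f"<rp>(</rp><rt>{escape(reading)}</rt><rp>)</rp>"
        "</ruby>"
    )


class TextOnly(HTMLParser):
    """Collect text nodes and ignore tags, roughly what a non-ruby browser shows."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def fallback_text(markup: str) -> str:
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


panda = ruby_markup("熊猫", "xióngmāo")
print(panda)                 # <ruby>熊猫<rp>(</rp><rt>xióngmāo</rt><rp>)</rp></ruby>
print(fallback_text(panda))  # 熊猫(xióngmāo)
```

Without ruby support, the <rp> parentheses are what keep the reading from running straight into the base text.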
That's fairly easy for scripts written left-to-right (LTR). But how does it work for scripts like Arabic, where the text is written right-to-left (RTL) but the user may want the pronunciations left-to-right?
Let's take the phrase "Hello World" in Arabic: مرحبا بالعالم. Google Translate tells me this is pronounced "marhaban bialealami".
For a single word, the directionality can be ignored. The browser should be smart enough to place the pronunciation above the word:
HTML
<p>Hello is: <ruby>مرحبا<rp>(</rp><rt>marhaban</rt><rp>)</rp></ruby>. What a useful word!</p>
Hello is: مرحبا. What a useful word!
What if we have a few words - or a whole sentence - which are entirely RTL?
HTML
<p dir="rtl">مرحبا بالعالم</p>
This is displayed aligned to the right-hand side of the screen:
مرحبا بالعالم
There are a few ways to add pronunciation.
Separate The Words
The first is to write each word separately, for example <ruby>1st word</ruby> <ruby>2nd word</ruby>. Obviously, this isn't how you'd normally write an RTL language! But it does work:
HTML
<p dir="rtl"><ruby>مرحبا<rp>(</rp><rt>marhaban</rt><rp>)</rp></ruby> <ruby>بالعالم<rp>(</rp><rt>bialealami</rt><rp>)</rp></ruby></p>
Which displays as:
مرحبا بالعالم
It helps to think of the way the characters of the script are stored in memory. In a right-to-left script, a word that displays as ABC is stored as CBA: the character drawn on the right comes first.
So the above is written "correctly" - even though it looks odd in the source-code view.
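Python's standard library can show this logical order. In the sketch below, splitting the string yields مرحبا first, even though that word is drawn on the right. `unicodedata.bidirectional()` reports the Unicode bidi class that the browser relies on: `AL` for Arabic letters and `L` for Latin letters. Building the per-word markup in that same logical order reproduces the HTML above. The `ruby_per_word` helper is a hypothetical name for illustration.

```python
import unicodedata
from html import escape

phrase = "مرحبا بالعالم"
readings = ["marhaban", "bialealami"]

# Words come out of the string in logical (reading) order, not screen order.
words = phrase.split(" ")

# The first stored character belongs to the word displayed on the right.
first_class = unicodedata.bidirectional(phrase[0])  # "AL" (Arabic letter)
latin_class = unicodedata.bidirectional("m")        # "L" (left-to-right)


def ruby_per_word(pairs: list[tuple[str, str]]) -> str:
    """Give each (word, reading) pair its own <ruby>, in logical order.

    Hypothetical helper; the browser's bidi layout decides where each
    word is actually drawn on screen.
    """
    rubies = [
        f"<ruby>{escape(w)}<rp>(</rp><rt>{escape(r)}</rt><rp>)</rp></ruby>"
        for w, r in pairs
    ]
    return '<p dir="rtl">' + " ".join(rubies) + "</p>"


html_out = ruby_per_word(list(zip(words, readings)))
print(words, first_class, latin_class)
print(html_out)
```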
All At Once
But there is an alternative if you want the source text to look natural - i.e. [2nd word] [1st word]. It's a bit messy, but you can write the LTR text in <rt> "backwards"!
HTML
<p dir="rtl"><ruby>مرحبا بالعالم<rt>bialealami marhaban</rt></ruby></p>
مرحبا بالعالم
But, again, that doesn't seem very satisfying! It also divorces the pronunciation from the original word - which is unfortunate for screen readers.
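Here is one way to see why the "backwards" trick works. Inside the <rt>, the space between two Latin words takes the direction of its neighbours, so the whole pronunciation is drawn as a single left-to-right run. The reading for the leftmost Arabic word, which is the last one in logical order, therefore has to be typed first. Below is a minimal Python sketch that generates that markup, using a hypothetical `ruby_all_at_once` helper.

```python
from html import escape


def ruby_all_at_once(pairs: list[tuple[str, str]]) -> str:
    """Build one <ruby> over a whole RTL phrase, with readings in reverse order.

    Hypothetical helper. `pairs` is in logical order. The Arabic words keep
    that order, but the readings are reversed because the Latin text inside
    <rt> is laid out as one left-to-right run.
    """
    base = " ".join(word for word, _ in pairs)
    reading = " ".join(r for _, r in reversed(pairs))
    return f'<p dir="rtl"><ruby>{escape(base)}<rt>{escape(reading)}</rt></ruby></p>'


html_out = ruby_all_at_once([("مرحبا", "marhaban"), ("بالعالم", "bialealami")])
print(html_out)
```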
The Ruby layout algorithm is usually clever enough to group words separated by spaces:
مرحبا بالعالم
مرحبا بالعالم
However, if the pronunciations differ significantly in length, it can get a bit messy:
مرحبا بالعالم
مرحبا بالعالم
In that case, you probably need to go back to the first technique and wrap each word in its own <ruby> element:
مرحبا بالعالم
BDO
It's tempting to think that simply using the <bdo> element can help us here. It can't!
The bidirectional override forces every individual character to display RTL, rather than each word.
HTML
<p dir="rtl"><ruby>مرحبا بالعالم<rt><bdo dir="rtl">marhaban bialealami</bdo></rt></ruby></p>
Becomes:
مرحبا بالعالم
I guess you could spell each word backwards, which would be extremely annoying for everyone and a complete nightmare for screen readers!
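Here is a rough model of what the override does. With <bdo dir="rtl">, every character is treated as right-to-left, including Latin letters. On a single line, the visual left-to-right order is therefore simply the stored string reversed. This toy Python function is my own simplification, not an implementation of the Unicode Bidirectional Algorithm. It ignores glyph mirroring, line wrapping and nested isolates, and its name is hypothetical.

```python
def rtl_override_visual(logical: str) -> str:
    """Toy model of <bdo dir="rtl"> on one line.

    Every character is forced right-to-left, so the screen order (read
    left to right) is the reverse of the stored order.
    """
    return logical[::-1]


print(rtl_override_visual("marhaban bialealami"))  # imalaelaib nabahram
```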
Instead, it can be fixed by giving each word an explicit LTR direction:
HTML
<p dir="rtl"><ruby>مرحبا بالعالم<rt>
<bdo dir="rtl">
<span dir="ltr">marhaban</span> <span dir="ltr">bialealami</span>
</bdo></rt></ruby></p>
مرحبا بالعالم
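Extending the same toy model: each <span dir="ltr"> starts a fresh left-to-right context inside the override. The letters within each word stay in order, while the words themselves are still placed right to left. Again, this is a simplification for illustration rather than the real bidi algorithm, and the function name is hypothetical.

```python
def rtl_override_with_ltr_words(words: list[str]) -> str:
    """Toy model of <bdo dir="rtl"> wrapping <span dir="ltr"> words.

    Each word keeps its internal LTR order; the override only reverses
    the order of the words across the line.
    """
    return " ".join(reversed(words))


visual = rtl_override_with_ltr_words(["marhaban", "bialealami"])
print(visual)  # bialealami marhaban
```

Reading the screen left to right, that puts "marhaban" on the right, above مرحبا, which is where it belongs.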
Is that it?
So, I think those are the only ways to mix text direction between a word and its pronunciation. But I'd welcome any corrections and suggestions!