<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/rss-style.xsl" type="text/xsl"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	     xmlns:dc="http://purl.org/dc/elements/1.1/"
	   xmlns:atom="http://www.w3.org/2005/Atom"
	     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>i18n &#8211; Terence Eden’s Blog</title>
	<atom:link href="https://shkspr.mobi/blog/tag/i18n/feed/" rel="self" type="application/rss+xml" />
	<link>https://shkspr.mobi/blog</link>
	<description>Regular nonsense about tech and its effects 🙃</description>
	<lastBuildDate>Tue, 04 Nov 2025 06:55:28 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://shkspr.mobi/blog/wp-content/uploads/2023/07/cropped-avatar-32x32.jpeg</url>
	<title>i18n &#8211; Terence Eden’s Blog</title>
	<link>https://shkspr.mobi/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title><![CDATA[Internationalise The Fediverse]]></title>
		<link>https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/</link>
					<comments>https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sat, 17 Feb 2024 12:34:58 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[ActivityPub]]></category>
		<category><![CDATA[fediverse]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[mastodon]]></category>
		<category><![CDATA[unicode]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=49643</guid>

					<description><![CDATA[We live in the future now. It is OK to use Unicode everywhere.  It seems bizarre to me that modern Internet services sometimes &#34;forget&#34; that there&#039;s a world outside the Anglosphere. Some people have the temerity to speak foreign languages! And some of those languages have accents on their letters!! Even worse, some don&#039;t use English letters at all!!!  A decade ago, I was miffed that GitHub only…]]></description>
										<content:encoded><![CDATA[<p>We live in the future now. It is OK to use Unicode everywhere.</p>

<p>It seems bizarre to me that modern Internet services sometimes "forget" that there's a world outside the Anglosphere. Some people have the temerity to speak <em>foreign</em> languages! And some of those languages have accents on their letters!! Even worse, some don't use English letters <em>at all!!!</em></p>

<p>A decade ago, I was miffed that <a href="https://shkspr.mobi/blog/2013/06/is-github-racist/">GitHub only supported some ASCII characters</a> in its project names. There's no <em>technical</em> reason why your repo can't be called "ഹലോ വേൾഡ്" (Malayalam for "Hello World").</p>

<p>Similarly, I'm frustrated that Mastodon (the largest ActivityPub service) <a href="https://github.com/mastodon/mastodon/issues/8417">doesn't allow Unicode usernames</a> and has <a href="https://jam.xwx.moe/notice/AdXsJF6Q5oYHJBEAiG">resisted efforts to change</a>.</p>

<p>So I built a small ActivityPub server which publishes content from an Actor called <a href="https://i18n.viii.fi/.well-known/webfinger"><code>@你好@i18n.viii.fi</code></a> ("你好" is Chinese for "hello") - it is only a demo account, but it works!</p>

<p>People using some ActivityPub software report that they can follow it and receive messages from it. Other software - like Mastodon - simply can't see anything from it. Take a look <a href="https://mastodon.social/@Edent/111920759100955860">at the replies on Mastodon</a> to see which services work. You can also <a href="https://fed.xnor.in/users/$Aet3ViWYORXdinGChM">see some of its posts on the Fediverse</a>.</p>
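<p>To follow an account like this, a remote server typically starts with a WebFinger (RFC 7033) lookup for its <code>acct:</code> URI. The non-ASCII part has to be percent-encoded as UTF-8 in the query string. Here's a minimal sketch in Python. It assumes the standard <code>/.well-known/webfinger?resource=</code> form and an ASCII host. <code>webfinger_url</code> is a hypothetical helper, not code from the demo server or any Fediverse project.</p>

<pre><code class="language-python">from urllib.parse import urlencode


def webfinger_url(handle: str) -> str:
    """Build a WebFinger lookup URL for a Fediverse handle such as '@user@host'."""
    user, host = handle.removeprefix("@").split("@", 1)
    # urlencode() percent-encodes the UTF-8 bytes of the acct: URI,
    # so a non-ASCII username needs no special treatment here.
    # Assumes an ASCII host; an internationalised host would also need IDNA encoding.
    query = urlencode({"resource": f"acct:{user}@{host}"})
    return f"https://{host}/.well-known/webfinger?{query}"


print(webfinger_url("@你好@i18n.viii.fi"))
# https://i18n.viii.fi/.well-known/webfinger?resource=acct%3A%E4%BD%A0%E5%A5%BD%40i18n.viii.fi
</code></pre>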

<h2 id="what-does-the-fox-spec-say"><a href="https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/#what-does-the-fox-spec-say">What Does The <del>Fox</del> Spec Say?</a></h2>

<p>The ActivityPub specification says:</p>

<blockquote><p>Building an international base of users is important in a federated network.
<a href="https://www.w3.org/TR/activitypub/#i18n-concerns">Internationalization</a></p></blockquote>

<p>I can't find anything in the specifications which limits what languages a username can be written in. But there are a few clues scattered about.</p>

<p>The user's <code>@</code> name is defined by <code>preferredUsername</code> which is:</p>

<blockquote><p>A short username which may be used to refer to the actor, with no uniqueness guarantees. 
<a href="https://www.w3.org/TR/activitypub/#preferredUsername">4.1 Actor objects</a></p></blockquote>

<p>There's nothing in there about what scripts it can contain. However, later on, the spec says:</p>

<blockquote><p>Properties containing natural language values, such as <code>name</code>, <code>preferredUsername</code>, or <code>summary</code>, make use of <a href="https://www.w3.org/TR/activitystreams-core/#naturalLanguageValues">natural language support defined in ActivityStreams</a>.
<a href="https://www.w3.org/TR/activitypub/#h-note-2">4. Actors</a></p></blockquote>

<p>So it is expected that a preferred username could be written in multiple scripts, which implies that the default need not be limited to <code>A-Z</code> and <code>0-9</code>.</p>

<p>The <a href="https://www.w3.org/TR/activitystreams-core/#marking-up-language">ActivityStreams specification talks about language mapping</a>.</p>
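<p>In practice, a language map is just a JSON object keyed by language tag. JSON itself is Unicode, and RFC 8259 requires UTF-8 between systems. Here is a minimal sketch of an Actor fragment in Python, using the standard <code>json</code> module. The values are illustrative and not copied from the demo server. A real Actor also needs properties such as <code>id</code>, <code>inbox</code>, and <code>outbox</code>.</p>

<pre><code class="language-python">import json

# Illustrative Actor fragment only; a real Actor needs id, inbox, outbox, etc.
actor = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Person",
    "preferredUsername": "你好",
    # ActivityStreams language map: one display name per language tag.
    "nameMap": {"zh": "你好", "en": "Hello"},
}

# ensure_ascii=False keeps the characters as-is (UTF-8 on the wire).
wire = json.dumps(actor, ensure_ascii=False)
print(wire)

# The default escapes them as \u4f60\u597d; both decode to the same data.
escaped = json.dumps(actor)
assert json.loads(wire) == json.loads(escaped) == actor
</code></pre>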

<p>Finally, the <a href="https://www.w3.org/TR/activitypub/#liked-property">ActivityPub specification has some examples of non-Latin text</a> in names.</p>

<p>So, I think that it is acceptable for usernames to be written in a variety of non-Latin scripts.</p>

<h2 id="but-what-about"><a href="https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/#but-what-about">But What About...?</a></h2>

<p>There are usually a few objections to "Unicode Everywhere" zealots like me. I'd like to forestall any arguments.</p>

<h3 id="what-about-homograph-attacks"><a href="https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/#what-about-homograph-attacks">What about homograph attacks?</a></h3>

<p>Well, what about them? ASCII has plenty of similar-looking characters. I doubt most people would notice if a capital i were replaced by a lowercase L - and vice versa. Similarly, the kerning problem of "rn" looking like "m" is well known. Are mixed-script homographs more dangerous? I don't think so.</p>
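<p>If a client does want to warn about mixed-script names, a rough check is cheap. This Python sketch assumes that the first word of a character's Unicode name (<code>LATIN</code>, <code>CYRILLIC</code>, <code>CJK</code>, and so on) is a good-enough proxy for its script. A real implementation would use the Unicode Script property (UAX #24), which Python's standard library doesn't expose. Note that it can't catch the all-ASCII capital-i-for-L swap at all.</p>

<pre><code class="language-python">import unicodedata


def rough_scripts(text: str) -> set[str]:
    """Heuristic: collect the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


print(rough_scripts("paypal"))       # {'LATIN'}
print(rough_scripts("p\u0430ypal"))  # Cyrillic 'а' swapped in: LATIN and CYRILLIC
print(rough_scripts("你好"))          # {'CJK'}
print(rough_scripts("PayPaI"))       # capital i instead of l: still only LATIN
</code></pre>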

<h3 id="what-if-people-make-names-that-cant-be-typed"><a href="https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/#what-if-people-make-names-that-cant-be-typed">What if people make names that can't be typed?</a></h3>

<p>Well, what if they do? Maybe not being found by people who can't type your language is a feature, not a bug.  But, anyway, clients can let users search for other people, or copy and paste their names.</p>

<h3 id="what-about-weird-zalgo-text"><a href="https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/#what-about-weird-zalgo-text">What about weird "Zalgo" text?</a></h3>

<p>It is up to each client to decide how it renders text. The "problems" of strange Unicode combinations are well known, and this is not a hard computer-science problem.</p>
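<p>For example, a client could cap how many combining marks may stack on one base character. Here is a minimal sketch in Python, assuming a limit of 2. That number is arbitrary, and some scripts legitimately stack more marks than that, so a real client would need to choose it with care.</p>

<pre><code class="language-python">import unicodedata


def cap_combining_marks(text: str, limit: int = 2) -> str:
    """Drop non-spacing/enclosing marks beyond `limit` after each base character."""
    kept = []
    run = 0  # marks seen since the last base character
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > limit:
                continue  # skip the excess mark
        else:
            run = 0
        kept.append(ch)
    return "".join(kept)


zalgo = "Z" + "\u0301\u0316\u0337" * 5 + "algo"
print(cap_combining_marks(zalgo))  # Z with two marks, then "algo"
</code></pre>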

<h3 id="what-about-bi-directional-text"><a href="https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/#what-about-bi-directional-text">What about bi-directional text?</a></h3>

<p><a href="https://www.w3.org/TR/activitystreams-core/#h-biditext">The spec makes clear this is allowed</a>.</p>
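<p>What a client does need to handle is isolation, so that a right-to-left username doesn't reorder the text around it. In HTML that is the job of the <code>&lt;bdi&gt;</code> element. In plain text, Unicode provides FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069). Here is a minimal sketch in Python. <code>base_direction</code> and <code>isolate</code> are hypothetical helper names.</p>

<pre><code class="language-python">import unicodedata
from typing import Optional

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def base_direction(text: str) -> Optional[str]:
    """Return the direction of the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None  # no strong characters, e.g. only digits or emoji


def isolate(name: str) -> str:
    """Wrap a name so its direction can't leak into the surrounding text."""
    return FSI + name + PDI


print(base_direction("@مرحبا"))  # rtl
print(base_direction("@edent"))  # ltr
message = isolate("@مرحبا") + " boosted your post"
print(message)
</code></pre>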

<h3 id="do-people-even-want-a-username-in-their-own-script"><a href="https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/#do-people-even-want-a-username-in-their-own-script">Do people even want a username in their own script?</a></h3>

<p>I have no evidence for this. But I bet you'd get pretty frustrated if you had to switch keyboards just to type your own name, wouldn't you? In any case, why can't I have a username of <code>@😉</code>?</p>
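<p>Part of the answer lies in how usernames get validated. Here is a sketch in Python comparing two rules. The first is an illustrative ASCII-only rule, not any particular server's exact pattern. The second is Python's Unicode-aware <code>\w</code>:</p>

<pre><code class="language-python">import re

ASCII_ONLY = re.compile(r"[A-Za-z0-9_]+")  # illustrative ASCII-only rule
UNICODE_WORD = re.compile(r"\w+")  # for str patterns, \w matches Unicode letters and digits

for name in ["edent", "Zoë", "你好", "😉"]:
    print(name, bool(ASCII_ONLY.fullmatch(name)), bool(UNICODE_WORD.fullmatch(name)))
# edent True True
# Zoë False True
# 你好 False True
# 😉 False False
</code></pre>

<p>Even <code>\w</code> isn't a complete answer. It matches neither emoji nor combining marks (such as the vowel signs used in Malayalam). A more inclusive rule would look at something like the identifier definitions in UAX #31.</p>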

<h2 id="whats-next"><a href="https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/#whats-next">What's Next?</a></h2>

<p>If you build ActivityPub software, give some thought to the billions of people who don't have names which easily fit into ASCII.</p>

<p>If your software can see <a href="https://i18n.viii.fi/.well-known/webfinger"><code>@你好@i18n.viii.fi</code></a> and its posts, please let me know.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=49643&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/02/internationalise-the-fediverse/feed/</wfw:commentRss>
			<slash:comments>36</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Not Quite Emoji Domain Names]]></title>
		<link>https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/</link>
					<comments>https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Fri, 04 Nov 2022 12:34:13 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[domains]]></category>
		<category><![CDATA[emoji]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[NaBloPoMo]]></category>
		<category><![CDATA[unicode]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=43928</guid>

					<description><![CDATA[Like all good geeks, I have far too many domain names that I acquired for interesting projects which never took off.  My latest is a bit different though.  https://⏻.ga/🔗  That&#039;s &#34;Unicode Power Symbol Dot Gabon&#34;.  Because why not.  Regular readers will know that I helped get ⏻ and several power symbols into Unicode.  When I do talks about this, I usually refer to them as Emoji because, to most peo…]]></description>
										<content:encoded><![CDATA[<html><head><style>
@font-face { 
  font-family: "power";
  src: url(data:application/octet-stream;base64,d09GMgABAAAAAARMAA4AAAAACNAAAAPvAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP0ZGVE0cGh4GVgCCYhEICoUUhDABNgIkAygLFgAEIAWDdAeBFz93ZWJmBhtcB8iOxDgmcsRI5UX+Q571flJmrKx1G3faMbWgT8JJaVcQD/8/5u77M4malrZknsbpZlk8EgqZQ00kC4UUGTycdp7gnef436GyhAI+52tgAcacBezsL21O4FUACIDLl3TCAeCK6cpHpq4t/UPsNdoAU8AgECghkPXdAiSdSAMQIyteJF4GkC1+Y68jLRkCg5wckpDHQwH3yhQ405kVIUn7l4ajMpD0wQGAVQXWIWgMLwRdFrveXFNEQhagM+nflD+zIv7/hw/qd8n/b3fkt7SFffUxpB6SNKEghq9Atbz/CPno/38MwTmBsR04DtB86hTNhYBZ+28LJJHhPJ1phnoifW2qrq6qPiMAErypu68TSalRwxVwY1zXhenCDTTcVD/sfXz9NjXeCtt+iesS0LDPQo2GGw96Ht2ixvAVB3BqtbjxercaogaB69L8ar59/bEWdAmSpkGtrpBM6xo1YRoNR9T9UH2z6YZQLlFTLQhAZYOw6rXVwpMbN28+ffg4/NGzqFv3CQm/ex4IN5qaj2i+pXnT+7Astdmc5tuoBcj0FU6FnXtDQu2ZE+QQGlqleBzoBONgfETnMSGs/xpnFBdPX7+73eHZU4eOGIf2p886HHavL56eUWz81zrPCeZZHwbW31g+JmJHF4vGRtnlN7yrPmSZ2weHYPF1Z5AiNq6D5SrM090UcXEL+F1No3F//8aNnly/vv5R/K5d8Y/+EdkwPYr/59sbFhYa+jihfv36k3Gjf/+OxuVWN8WNUQQcHwpcURuWQX4WK47vUCQK1oUyp7lhyWrRAV+5eWsQNa9IN5cHtTaTf5qu88GMYnmR4cRZkEZ59WeUBA1+/ToYVMKSjb6gQV/avi1exmvXGnuBDKbN2DF0dUvMwfPAeD37rzItGeCWaPtk2v7/kl6OE8h2l4VA9qGdcDWAlOrni8/FDgAyFToGWYzJKG2hR18+hcEIwGGvVh8DoLDSmbRPC+9gESI5Aj1cTUJdpBPI4MpikMXSB8ih4RPk0fUHvmDqH3xjfuiCH8jIkfgLjlRv3GTRiQB4UlLgvaqUK0m+HCaTdy5PpuTv9SlSBUmVrlSOREVU8uUpMccXSZeKNzU0Zrpvy9FrrxDoOWKnVgam6jDGJhb96MaLmjCpuq5YZo90vEkvGmo/MVyqJGmjYJbQnUr0afYlsj7e1CzeM7M1hW9rYRNH763mRarMK5gDetvMeYnknSpUCk6RbEiVsw/qJBIOQEi0kiURiUlCUpJJS/My50u73LQD/MOVQXxwpK+rv4+RJ8Hfb46/ShVG6TH+CMNAsI9SGfCk1nPpwiUqhbaH0iUskld4Bil8lKDQhCPFAAA=);
}
.power {font-family:"power";color:#F00;}
</style>

</head><body><p>Like all good geeks, I have far too many domain names that I acquired for interesting projects which never took off.  My latest is a bit different though.</p>

<h2 id="https-%e2%8f%bb-ga"><a href="http://web.archive.org/web/20221212040251/https://xn--soh.ga/">https://<span class="power">⏻</span>.ga/</a><a href="https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/#https-%e2%8f%bb-ga">🔗</a></h2>

<p>That's "Unicode Power Symbol Dot Gabon".  Because why not.</p>

<p>Regular readers will know that I helped get <span class="power">⏻</span> and <a href="https://unicodepowersymbol.com/">several power symbols into Unicode</a>.  When I do talks about this, I usually refer to them as Emoji because, to most people, Emoji are simply little pictures in text.  But that is a gross oversimplification. You know the meme that <a href="https://journals.lib.washington.edu/index.php/nasko/article/download/15879/13281">real Champagne must be from the Champagne region of France - otherwise it is merely sparkling wine</a>?  Well, Emoji must come from the <a href="https://unicodeplus.com/plane/1">Supplementary Multilingual Plane of Unicode</a> - otherwise they're just ✨sparkling✨ characters.</p>

<p>Except... That's not <em>quite</em> true. There are a bunch of symbols stuffed in the <a href="https://en.wikipedia.org/wiki/Miscellaneous_Symbols">Miscellaneous Symbols block of the Basic Multilingual Plane</a> which are <em>also</em> Emoji.</p>

<p>The Power Symbol appears in the block <a href="https://en.wikipedia.org/wiki/Miscellaneous_Technical">Miscellaneous Technical</a>.  The symbol itself is not an Emoji, but it is in a block which has 18 Emoji. Confused? Good<sup id="fnref:Babel"><a href="https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/#fn:Babel" class="footnote-ref" title="For more information, please read the Book of Genesis, Chapter 11, verses 1-9." role="doc-noteref">0</a></sup>!</p>
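<p>You can check a character's plane from its code point. Each plane holds 0x10000 code points, so integer division gives the plane number. Here is a quick sketch using Python's standard <code>unicodedata</code> module. It shows that ⏻ shares the Basic Multilingual Plane with ⌚ (which <em>is</em> an Emoji), while 😉 lives in the Supplementary Multilingual Plane. It can't tell you what is or isn't an Emoji, because the standard library doesn't expose the Emoji property.</p>

<pre><code class="language-python">import unicodedata


def describe(ch: str) -> str:
    """Show a character's code point, Unicode name, and plane."""
    cp = ord(ch)
    plane = cp // 0x10000  # 0 = Basic Multilingual Plane, 1 = Supplementary Multilingual Plane
    return f"U+{cp:04X} {unicodedata.name(ch)} (plane {plane})"


for ch in ["\u23fb", "\u231a", "\U0001F609"]:
    print(describe(ch))
# U+23FB POWER SYMBOL (plane 0)
# U+231A WATCH (plane 0)
# U+1F609 WINKING FACE (plane 1)
</code></pre>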

<p>Domain names can only contain the ASCII characters <code>A</code>-<code>Z</code>, <code>0</code>-<code>9</code>, and <code>-</code>. That's a problem if you speak anything other than basic English. Luckily, there's a workaround! I have a Chinese-language domain <span style="word-break: keep-all;"><a href="https://莎士比亚.org/">莎士比亚.org</a></span> ("Shakespeare") - through the magic of the <a href="https://www.rfc-editor.org/rfc/rfc3492">Punycode Algorithm</a>, it becomes <a href="https://xn--jlq54w7ypemw.org">xn--jlq54w7ypemw.org</a>.  This use of non-Latin letters in domains is known as IDN - Internationalised<sup id="fnref:idn"><a href="https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/#fn:idn" class="footnote-ref" title="As if English weren't international!" role="doc-noteref">1</a></sup> Domain Names.</p>
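<p>Python ships with a <code>punycode</code> codec, so the conversion is easy to sketch. This minimal version assumes each non-ASCII label only needs Punycode plus the <code>xn--</code> prefix. Real IDNA processing also normalises and validates every label, and under IDNA2008 symbols such as ⏻ are disallowed outright.</p>

<pre><code class="language-python">def to_ascii(domain: str) -> str:
    """Punycode-encode each non-ASCII label (RFC 3492) and add the xn-- prefix.

    A sketch only: real IDNA also normalises and validates each label.
    """
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


print(to_ascii("莎士比亚.org"))  # xn--jlq54w7ypemw.org
print(to_ascii("⏻.ga"))  # xn--soh.ga
print(b"soh".decode("punycode"))  # ⏻
</code></pre>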

<p>IDNs have several <a href="https://www.icann.org/en/blogs/details/hello-world-enabling-internationalized-domain-names-idns-16-6-2021-en">officially supported "scripts"</a> - for example Thai, Greek, Hebrew, Cyrillic, and Chinese. Each Top Level Domain (like .uk, .com, or .中国) can choose which scripts it will accept. For example, a Chinese Top Level Domain may only accept Chinese characters and not Greek characters. IANA maintains a list of <a href="https://www.iana.org/domains/idn-tables">which domains support which scripts</a>. But it is incomplete, because it doesn't mention Emoji.</p>

<p>The Punycode algorithm works with emoji. This means you can have <a href="https://xn--i-7iq.ws/">Emoji in domain names</a>!  Mostly.  And that "mostly" is important.</p>

<p>Not every Top Level Domain accepts Emoji domain names (because they hate having fun, I guess?<sup id="fnref:security"><a href="https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/#fn:security" class="footnote-ref" title="But also, quite reasonably, for legitimate security concerns of having Emoji domains." role="doc-noteref">2</a></sup>)</p>

<p>The .ga registry doesn't publish any rules showing which scripts it will accept<sup id="fnref:gascript"><a href="https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/#fn:gascript" class="footnote-ref" title="Do let me know if I am wrong." role="doc-noteref">3</a></sup>, but it seems quite happy to take registrations for Punycode domains. So I registered <a href="https://xn--soh.ga/">https://xn--soh.ga/</a> and, after an unusually long delay, it worked!</p>

<p>Does this mean <a href="http://web.archive.org/web/20221212040251/https://xn--soh.ga/"><span class="power">⏻</span>.ga</a> is an Emoji domain? No! <span class="power">⏻</span> is <em>not</em> an Emoji! It is a small pictographic symbol encoded in Unicode.</p>

<p>Does this mean <a href="http://web.archive.org/web/20221212040251/https://xn--soh.ga/"><span class="power">⏻</span>.ga</a> is an IDN? No! <span class="power">⏻</span> is <em>not</em> an international script. It is a language-neutral technical symbol.</p>
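<p>Unicode's character data agrees. The general category separates letters from symbols: 莎 is <code>Lo</code> (Letter, other), while ⏻ and 😉 are both <code>So</code> (Symbol, other). Here is a quick check with Python's standard <code>unicodedata</code> module:</p>

<pre><code class="language-python">import unicodedata

for ch in ["莎", "\u23fb", "\U0001F609"]:
    print(ch, unicodedata.name(ch), unicodedata.category(ch))
# 莎 CJK UNIFIED IDEOGRAPH-838E Lo
# ⏻ POWER SYMBOL So
# 😉 WINKING FACE So
</code></pre>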

<p>So what the fuck kind of domain is it?</p>

<p>Drop an answer in the box below.</p>

<div id="footnotes" role="doc-endnotes">
<hr>
<ol start="0">

<li id="fn:Babel">
<p>For more information, please read the <a href="http://www.qbible.com/hebrew-old-testament/genesis/11.html">Book of Genesis, Chapter 11, verses 1-9</a>.&nbsp;<a href="https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/#fnref:Babel" class="footnote-backref" role="doc-backlink">↩︎</a></p>
</li>

<li id="fn:idn">
<p>As if English weren't international!&nbsp;<a href="https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/#fnref:idn" class="footnote-backref" role="doc-backlink">↩︎</a></p>
</li>

<li id="fn:security">
<p>But also, quite reasonably, for <a href="https://www.icann.org/en/system/files/files/sac-095-en.pdf">legitimate security concerns of having Emoji domains</a>.&nbsp;<a href="https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/#fnref:security" class="footnote-backref" role="doc-backlink">↩︎</a></p>
</li>

<li id="fn:gascript">
<p>Do let me know if I am wrong.&nbsp;<a href="https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/#fnref:gascript" class="footnote-backref" role="doc-backlink">↩︎</a></p>
</li>

</ol>
</div>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=43928&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2022/11/not-quite-emoji-domain-names/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Simultaneous Translation in HTML]]></title>
		<link>https://shkspr.mobi/blog/2022/07/simultaneous-translation-in-html/</link>
					<comments>https://shkspr.mobi/blog/2022/07/simultaneous-translation-in-html/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Tue, 12 Jul 2022 11:34:43 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[i18n]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=42919</guid>

					<description><![CDATA[How do you show two languages simultaneously in HTML?  If you want to show text in a foreign language, the markup is simple:  &#60;html lang=&#34;en-GB&#34;&#62; ... As Caesar said: &#60;i lang=&#34;la&#34;&#62;veni vidi vici&#60;/i&#62;   That says the page is in British English (en-GB) but the specific phrase is in Latin (la). But how can you offer an in-text translation of that phrase into the page&#039;s native language?  Here are a few …]]></description>
										<content:encoded><![CDATA[<p>How do you show two languages simultaneously in HTML?  If you want to show text in a foreign language, the markup is simple:</p>

<pre><code class="language-html">&lt;html lang="en-GB"&gt;
...
As Caesar said: &lt;i lang="la"&gt;veni vidi vici&lt;/i&gt;
</code></pre>

<p>That says the page is in British English (en-GB) but the specific phrase is in Latin (la). But how can you offer an in-text translation of that phrase into the page's native language?</p>

<p>Here are a few options - and their drawbacks.</p>

<h2 id="title-text"><a href="https://shkspr.mobi/blog/2022/07/simultaneous-translation-in-html/#title-text">Title Text</a></h2>

<pre><code class="language-html">&lt;i lang="la" title="I came, I saw, I conquered"&gt;veni vidi vici&lt;/i&gt;
</code></pre>

<p><br>
<i lang="la" title="I came, I saw, I conquered">veni vidi vici</i></p>

<p>The user has to hover their pointer over the text and a pop-up will appear with the translation.  There are two disadvantages to this:</p>

<ol>
<li>Not all devices - like mobile browsers - support title text.</li>
<li>The title text has no separate language attribute - so it is semantically in Latin.</li>
</ol>

<p>The language can be corrected by <a href="https://twitter.com/nevali/status/1537354665968513026">moving the title onto a separate wrapping span</a> which carries its own <code>lang</code> attribute.</p>

<h2 id="tables"><a href="https://shkspr.mobi/blog/2022/07/simultaneous-translation-in-html/#tables">Tables</a></h2>

<p>The humble <code>&lt;table&gt;</code> can present two or more items of text adjacent to one another.</p>

<pre><code class="language-html">&lt;table&gt;
   &lt;tr&gt;
      &lt;td lang="la"&gt;veni vidi vici&lt;/td&gt;
      &lt;td lang="en"&gt;I came, I saw, I conquered&lt;/td&gt;
   &lt;/tr&gt;
&lt;/table&gt;
</code></pre>

<p><br></p>

<table>
   <tbody><tr>
      <td lang="la">veni vidi vici</td>
      <td lang="en">I came, I saw, I conquered</td>
   
</tr></tbody></table>

<p>Tables can be problematic on narrow screens - either requiring wrapping or scrolling.</p>

<h2 id="details"><a href="https://shkspr.mobi/blog/2022/07/simultaneous-translation-in-html/#details">Details</a></h2>

<pre><code class="language-html">&lt;details&gt;
    &lt;summary lang="la"&gt;veni vidi vici&lt;/summary&gt;
    I came, I saw, I conquered
&lt;/details&gt;
</code></pre>

<p><br></p>

<details>
    <summary lang="la">veni vidi vici</summary>
    I came, I saw, I conquered
</details>

<p>Again, it requires interaction - which may not work on devices like eReaders. Unfortunately, <code>&lt;details&gt;</code> is a block-level element, but you can <a href="https://shkspr.mobi/blog/2020/12/a-terrible-way-to-do-footnotes-in-html/">read my experiments in making them inline</a>.</p>

<h2 id="ruby"><a href="https://shkspr.mobi/blog/2022/07/simultaneous-translation-in-html/#ruby">Ruby</a></h2>

<p>As <a href="https://twitter.com/jribbens/status/1475632280894943234">suggested by John Ribbens</a>:</p>

<pre><code class="language-html">&lt;ruby lang="la"&gt;
   veni vidi vici
   &lt;rt lang="en-GB"&gt;I came, I saw, I conquered&lt;/rt&gt;
&lt;/ruby&gt;
</code></pre>

<p><br>
<ruby lang="la">
   veni vidi vici
   <rt lang="en-GB">I came, I saw, I conquered</rt>
</ruby></p>

<p>That works quite well - although Ruby text is pretty small. But it can be styled with CSS.</p>

<p>Ruby is usually used for <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ruby">showing pronunciation of characters</a>. But, crucially, <a href="https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-ruby-element">it isn't <em>restricted</em> to that</a>.</p>

<h2 id="description-lists"><a href="https://shkspr.mobi/blog/2022/07/simultaneous-translation-in-html/#description-lists">Description Lists</a></h2>

<pre><code class="language-html">&lt;dl&gt;
    &lt;dt lang="la"&gt;veni vidi vici&lt;/dt&gt;
    &lt;dd&gt;I came, I saw, I conquered&lt;/dd&gt;
&lt;/dl&gt;
</code></pre>

<p><br></p>

<dl>
    <dt lang="la">veni vidi vici</dt>
    <dd>I came, I saw, I conquered</dd>
</dl>

<p>Again, very easy to style with CSS. One of the nice things about <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/dl">Description Lists</a> is that they allow for <em>multiple</em> definitions:</p>

<pre><code class="language-html">&lt;dl&gt;
    &lt;dt lang="la"&gt;veni vidi vici&lt;/dt&gt;
    &lt;dd&gt;I came, I saw, I conquered&lt;/dd&gt;
    &lt;dd lang="ja"&gt;私は私が征服した来た&lt;/dd&gt;
&lt;/dl&gt;
</code></pre>
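<p>If the translations come from data rather than being written by hand, the list is easy to generate. Here is a minimal sketch using Python's standard <code>xml.etree.ElementTree</code>, which escapes the text for you. <code>translation_list</code> is a hypothetical helper, and the French translation is added purely for illustration.</p>

<pre><code class="language-python">import xml.etree.ElementTree as ET


def translation_list(phrase: str, phrase_lang: str, translations: dict[str, str]) -> str:
    """Build a description list: the phrase as the term, one dd per translation."""
    dl = ET.Element("dl")
    ET.SubElement(dl, "dt", lang=phrase_lang).text = phrase
    for lang, text in translations.items():
        ET.SubElement(dl, "dd", lang=lang).text = text
    return ET.tostring(dl, encoding="unicode")


print(translation_list(
    "veni vidi vici",
    "la",
    {"en-GB": "I came, I saw, I conquered", "fr": "Je suis venu, j'ai vu, j'ai vaincu"},
))
</code></pre>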

<h2 id="mix-them-all-together"><a href="https://shkspr.mobi/blog/2022/07/simultaneous-translation-in-html/#mix-them-all-together">MIX THEM ALL TOGETHER!</a></h2>

<p>Let's take a section from <a href="https://www.gutenberg.org/cache/epub/2383/pg2383.html">Chaucer's Canterbury Tales</a>. Most of the Middle English is understandable - but a few archaic words need translation. It's also useful to have some commentary on the text.</p>

<pre><code class="language-html">&lt;dl&gt;
   &lt;dt lang="enm"&gt;Full many a fat partridge had he in 
      &lt;ruby&gt;mew&lt;rp&gt;(&lt;/rp&gt;&lt;rt lang="en-GB"&gt;cage&lt;/rt&gt;&lt;rp&gt;)&lt;/rp&gt;&lt;/ruby&gt;
   &lt;/dt&gt;
   &lt;dd&gt;The place behind Whitehall, where the King's hawks were caged was called the Mews.&lt;/dd&gt;
&lt;/dl&gt;

&lt;details&gt;
   &lt;summary lang="enm"&gt;And many a bream, and many a 
      &lt;ruby&gt;luce&lt;rp&gt;(&lt;/rp&gt;&lt;rt lang="en-GB"&gt;pike&lt;/rt&gt;&lt;rp&gt;)&lt;/rp&gt;&lt;/ruby&gt;
      in 
      &lt;ruby&gt;stew&lt;rp&gt;(&lt;/rp&gt;&lt;rt lang="en-GB"&gt;fish-pond&lt;/rt&gt;&lt;rp&gt;)&lt;/rp&gt;&lt;/ruby&gt;
   &lt;/summary&gt;
   In those Catholic days, when much fish was eaten, no gentleman's mansion was complete without a "stew".
&lt;/details&gt;
</code></pre>

<p><br></p>

<dl>
   <dt lang="enm">Full many a fat partridge had he in <ruby>mew<rp>(</rp><rt lang="en-GB">cage</rt><rp>)</rp></ruby></dt>
   <dd>The place behind Whitehall, where the King's hawks were caged was called the Mews.</dd>
</dl>

<details>
   <summary style="font-weight: normal;" lang="enm">And many a bream, and many a <ruby>luce<rp>(</rp><rt lang="en-GB">pike</rt><rp>)</rp></ruby> in <ruby>stew<rp>(</rp><rt lang="en-GB">fish-pond</rt><rp>)</rp></ruby></summary>
   In those Catholic days, when much fish was eaten, no gentleman's mansion was complete without a "stew".
</details>

<h2 id="which-should-you-use"><a href="https://shkspr.mobi/blog/2022/07/simultaneous-translation-in-html/#which-should-you-use">Which should you use?</a></h2>

<p>Yes.</p>

<p>There's no definitive "correct" answer here. <code>title</code> text might make sense for occasional words which need translating - and you're sure either the user's device supports it, or they won't be substantially disadvantaged if it doesn't.</p>

<p>Similarly, <code>details</code> works for interactive content which is <em>optional</em> to understanding.</p>

<p>The <code>ruby</code> elements are great if you want a fairly unobtrusive way to translate <em>specific</em> words.</p>

<p>Lists are great if you need to offer <em>multiple</em> translations.</p>

<p>Mashing them all together is a bit silly and complicated - but allows for a greater variety in the way the texts are displayed.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=42919&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2022/07/simultaneous-translation-in-html/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[HTML Ruby and Bidirectional Text]]></title>
		<link>https://shkspr.mobi/blog/2022/06/html-ruby-and-bidirectional-text/</link>
					<comments>https://shkspr.mobi/blog/2022/06/html-ruby-and-bidirectional-text/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 22 Jun 2022 11:34:37 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[i18n]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=42945</guid>

					<description><![CDATA[The set of HTML &#60;ruby&#62; elements allow us to add pronunciation above text. For example:  &#34;When you visit the zoo, be sure to see the panda - 熊(Xióng)猫(māo).&#34;  This is written as:  &#60;ruby&#62;熊&#60;rp&#62;(&#60;/rp&#62;&#60;rt&#62;Xióng&#60;/rt&#62;&#60;rp&#62;)&#60;/rp&#62;&#60;/ruby&#62;&#60;ruby&#62;猫&#60;rp&#62;(&#60;/rp&#62;&#60;rt&#62;māo&#60;/rt&#62;&#60;rp&#62;)&#60;/rp&#62;&#60;/ruby&#62;.   That is, the word or character which needs text above it is wrapped in &#60;ruby&#62;. The pronunciation is wrapped in &#60;rt&#62;. The &#60;r…]]></description>
										<content:encoded><![CDATA[<p>The set of <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/rp">HTML <code>&lt;ruby&gt;</code> elements</a> allow us to add pronunciation above text. For example:</p>

<p>"When you visit the zoo, be sure to see the panda - <ruby>熊<rp>(</rp><rt>Xióng</rt><rp>)</rp></ruby><ruby>猫<rp>(</rp><rt>māo</rt><rp>)</rp></ruby>."</p>

<p>This is written as:</p>

<pre><code class="language-html">&lt;ruby&gt;熊&lt;rp&gt;(&lt;/rp&gt;&lt;rt&gt;Xióng&lt;/rt&gt;&lt;rp&gt;)&lt;/rp&gt;&lt;/ruby&gt;&lt;ruby&gt;猫&lt;rp&gt;(&lt;/rp&gt;&lt;rt&gt;māo&lt;/rt&gt;&lt;rp&gt;)&lt;/rp&gt;&lt;/ruby&gt;.
</code></pre>

<p>That is, the word or character which needs text above it is wrapped in <code>&lt;ruby&gt;</code>. The pronunciation is wrapped in <code>&lt;rt&gt;</code>. The <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/rp"><code>&lt;rp&gt;</code> element</a> indicates the presence of a parenthesis - which isn't usually displayed, but will be shown if the browser doesn't support <code>&lt;ruby&gt;</code> syntax.</p>

<p>That's fairly easy for scripts written left-to-right. But how does it work for scripts like Arabic where the text is written right-to-left, but the user may want the pronunciations left-to-right?</p>

<p>Let's take the phrase "Hello World" in Arabic: <span dir="rtl">مرحبا بالعالم</span>. Google Translate tells me this is pronounced "marhaban bialealami".</p>

<p>For a single word, the directionality can be ignored. The browser should be smart enough to place the pronunciation above the word:</p>

<pre><code class="language-html">&lt;p&gt;Hello is: &lt;ruby&gt;مرحبا&lt;rp&gt;(&lt;/rp&gt;&lt;rt&gt;marhaban&lt;/rt&gt;&lt;rp&gt;)&lt;/rp&gt;&lt;/ruby&gt;. What a useful word!&lt;/p&gt;
</code></pre>

<p>Hello is: <ruby>مرحبا<rp>(</rp><rt>marhaban</rt><rp>)</rp></ruby>. What a useful word!</p>

<p>What if we have a few words - or a whole sentence - which is entirely RTL?</p>

<pre><code class="language-html">&lt;p dir="rtl"&gt;مرحبا بالعالم&lt;/p&gt;
</code></pre>

<p>Is displayed aligned to the right side of the screen:</p>

<p dir="rtl" style="font-size: 2em;">مرحبا بالعالم</p>

<p>There are a few ways to add pronunciation.</p>

<h2 id="separate-the-words"><a href="https://shkspr.mobi/blog/2022/06/html-ruby-and-bidirectional-text/#separate-the-words">Separate The Words</a></h2>

<p>The first is to write each word separately. For example, <code>&lt;ruby&gt;1st word&lt;/ruby&gt; &lt;ruby&gt;2nd word&lt;/ruby&gt;</code>. Obviously, this isn't normally how you'd write an RTL language! But it does work:</p>

<pre><code class="language-html">&lt;p dir="rtl"&gt;&lt;ruby&gt;مرحبا&lt;rp&gt;(&lt;/rp&gt;&lt;rt&gt;marhaban&lt;/rt&gt;&lt;rp&gt;)&lt;/rp&gt;&lt;/ruby&gt; &lt;ruby&gt;بالعالم&lt;rp&gt;(&lt;/rp&gt;&lt;rt&gt;bialealami&lt;/rt&gt;&lt;rp&gt;)&lt;/rp&gt;&lt;/ruby&gt;&lt;/p&gt;
</code></pre>

<p>Which displays as:</p>

<p dir="rtl" style="font-size: 2em;"><ruby>مرحبا<rp>(</rp><rt>marhaban</rt><rp>)</rp></ruby> <ruby>بالعالم<rp>(</rp><rt>bialealami</rt><rp>)</rp></ruby></p>
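<p>If you are generating this markup, building one <code>&lt;ruby&gt;</code> per word in reading order keeps each pronunciation attached to its word. Here is a minimal sketch using Python's standard <code>xml.etree.ElementTree</code>. <code>rtl_ruby</code> is a hypothetical helper, and it reproduces the markup above.</p>

<pre><code class="language-python">import xml.etree.ElementTree as ET


def rtl_ruby(words: list[str], readings: list[str]) -> str:
    """One ruby per word, in logical (reading) order, inside a dir="rtl" paragraph."""
    if len(words) != len(readings):
        raise ValueError("need exactly one reading per word")
    p = ET.Element("p", dir="rtl")
    for i, (word, reading) in enumerate(zip(words, readings)):
        ruby = ET.SubElement(p, "ruby")
        ruby.text = word
        ET.SubElement(ruby, "rp").text = "("  # fallback parentheses
        ET.SubElement(ruby, "rt").text = reading
        ET.SubElement(ruby, "rp").text = ")"
        if i + 1 != len(words):
            ruby.tail = " "  # the space between words sits outside the ruby
    return ET.tostring(p, encoding="unicode")


print(rtl_ruby(["مرحبا", "بالعالم"], ["marhaban", "bialealami"]))
</code></pre>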

<p>It helps to think of the way the characters of the script are stored in memory.</p>

<p>A word that <em>displays</em> as <code>ABC</code> is <em>stored</em> as <code>C</code> <code>B</code> <code>A</code>.</p>

<p>So the above is written "correctly" - even though it looks odd in the source-code view.</p>
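<p>You can see this logical ordering from code. The first character of the string is the one a reader meets first, which is the <em>rightmost</em> one on screen. Here is a quick check with Python's standard <code>unicodedata</code> module:</p>

<pre><code class="language-python">import unicodedata

phrase = "مرحبا بالعالم"  # "Hello World", as above

first_word, second_word = phrase.split()
print(first_word == "مرحبا")  # True: the first word read is stored first
print(unicodedata.name(phrase[0]))  # ARABIC LETTER MEEM
print(unicodedata.bidirectional(phrase[0]))  # AL (Arabic Letter, strongly right-to-left)
</code></pre>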

<h2 id="all-at-once"><a href="https://shkspr.mobi/blog/2022/06/html-ruby-and-bidirectional-text/#all-at-once">All At Once</a></h2>

<p>But there is an alternative if you want the source text to look natural - i.e. <code>[2nd word] [1st word]</code>.</p>

<p>It's a bit messy, but you can write the LTR text in <em><code>&lt;rt&gt;</code></em> "backwards"!</p>

<pre><code class="language-html">&lt;p dir="rtl"&gt;&lt;ruby&gt;مرحبا بالعالم&lt;rt&gt;bialealami marhaban&lt;/rt&gt;&lt;/ruby&gt;&lt;/p&gt;
</code></pre>

<p dir="rtl" style="font-size: 2em;"><ruby>مرحبا بالعالم<rt>bialealami marhaban</rt></ruby></p>

<p>But, again, that doesn't seem very satisfying! It also divorces the pronunciation from the original word - which is unfortunate for screen readers.</p>
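<p>If you do use this approach, the reversed annotation can be generated rather than typed by hand. A minimal sketch, assuming the whole <code>&lt;rt&gt;</code> text is laid out as a single left-to-right run inside the right-to-left paragraph; <code>ruby_texts</code> is a hypothetical helper name:</p>

<pre><code class="language-python">def ruby_texts(pairs):
    """Build the base text and the annotation for one RTL ruby element.

    pairs: (word, reading) tuples in reading order, first word first.
    """
    # The base text keeps its natural logical (reading) order.
    base = " ".join(word for word, _ in pairs)
    # The LTR annotation is written in reverse word order, so its leftmost
    # word sits over the leftmost (last-read) RTL word.
    annotation = " ".join(reading for _, reading in reversed(pairs))
    return base, annotation


base, annotation = ruby_texts([("مرحبا", "marhaban"), ("بالعالم", "bialealami")])
print(annotation)  # bialealami marhaban
</code></pre>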

<p>The Ruby layout algorithm is usually clever enough to group words separated by spaces:</p>

<p dir="rtl" style="font-size: 2em;"><ruby>مرحبا بالعالم<rt>B A</rt></ruby></p>

<p dir="rtl" style="font-size: 2em;"><ruby>مرحبا بالعالم<rt>Bbbbbbbbbbbbbb Aaaaaaaaaaaaa</rt></ruby></p>

<p>However, if the pronunciations differ significantly in length, it can get a bit messy:</p>

<p dir="rtl" style="font-size: 2em;"><ruby>مرحبا بالعالم<rt>Bbbbbbbbbbbbbb A</rt></ruby></p>

<p dir="rtl" style="font-size: 2em;"><ruby>مرحبا بالعالم<rt>B Aaaaaaaaaaaaa</rt></ruby></p>

<p>In which case, you probably need to go for the first technique and wrap each word in its own <code>&lt;ruby&gt;</code> element:</p>

<p dir="rtl" style="font-size: 2em;"><ruby>مرحبا<rp>(</rp><rt>A</rt><rp>)</rp></ruby> <ruby>بالعالم<rp>(</rp><rt>Bbbbbbbbbbbbbb</rt><rp>)</rp></ruby></p>

<h2 id="bdo"><a href="https://shkspr.mobi/blog/2022/06/html-ruby-and-bidirectional-text/#bdo">BDO</a></h2>

<p>It's tempting to think that simply using <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/bdo">the <code>&lt;bdo&gt;</code> element</a> can help us here. It can't!</p>

<p>The bidirectional override lays out the individual <em>characters</em> right-to-left, rather than the words.</p>

<pre><code class="language-html">&lt;p dir="rtl"&gt;&lt;ruby&gt;مرحبا بالعالم&lt;rt&gt;&lt;bdo dir="rtl"&gt;marhaban bialealami&lt;/bdo&gt;&lt;/rt&gt;&lt;/ruby&gt;&lt;/p&gt;
</code></pre>

<p>Becomes:</p>

<p dir="rtl" style="font-size: 2em;"><ruby>مرحبا بالعالم<rt><bdo dir="rtl">marhaban bialealami</bdo></rt></ruby></p>

<p>I suppose you could spell each word backwards, but that would be extremely annoying for everyone and a complete nightmare for screen readers!</p>
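<p>Why does spelling backwards work? Here is a deliberately simplified model of what <code>&lt;bdo dir="rtl"&gt;</code> does to plain Latin text: every character is placed right-to-left, so reading the result left-to-right gives the stored string reversed. It ignores most of the Unicode Bidirectional Algorithm (mirrored brackets, combining marks and so on), and <code>visual_under_rtl_override</code> is a hypothetical name.</p>

<pre><code class="language-python">def visual_under_rtl_override(text):
    """Left-to-right visual order of text forced right-to-left.

    Simplified model: the visual order is just the stored order reversed.
    """
    return text[::-1]


print(visual_under_rtl_override("marhaban bialealami"))  # imalaelaib nabahram

# Spelling each word backwards makes the override cancel itself out,
# leaving readable words in the order that lines up with the Arabic.
backwards = " ".join(word[::-1] for word in "marhaban bialealami".split())
print(backwards)                                          # nabahram imalaelaib
print(visual_under_rtl_override(backwards))               # bialealami marhaban
</code></pre>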

<p>Instead, it can be fixed by giving each word an explicit LTR direction:</p>

<pre><code class="language-html">&lt;p dir="rtl"&gt;&lt;ruby&gt;مرحبا بالعالم&lt;rt&gt;
   &lt;bdo dir="rtl"&gt;
      &lt;span dir="ltr"&gt;marhaban&lt;/span&gt; &lt;span dir="ltr"&gt;bialealami&lt;/span&gt;
   &lt;/bdo&gt;&lt;/rt&gt;&lt;/ruby&gt;&lt;/p&gt;
</code></pre>

<p dir="rtl" style="font-size: 2em;"><ruby>مرحبا بالعالم<rt>
   <bdo dir="rtl">
      <span dir="ltr">marhaban</span> <span dir="ltr">bialealami</span>
   </bdo></rt></ruby></p>

<h2 id="is-that-it"><a href="https://shkspr.mobi/blog/2022/06/html-ruby-and-bidirectional-text/#is-that-it">Is that it?</a></h2>

<p>So, I <em>think</em> those are the only ways to mix pronunciation guides with bidirectional text. But I'd welcome any corrections and suggestions!</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=42945&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2022/06/html-ruby-and-bidirectional-text/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[If HTML5 Were British]]></title>
		<link>https://shkspr.mobi/blog/2020/11/if-html5-were-british/</link>
					<comments>https://shkspr.mobi/blog/2020/11/if-html5-were-british/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Mon, 02 Nov 2020 12:23:17 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[NaBloPoMo]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=37030</guid>

					<description><![CDATA[If you&#039;ve been around programming circles long enough, you&#039;ll probably have read the seminal &#34;If PHP Were British&#34;. If not, go read it now. I&#039;ll wait.  I love the idea of a non-American programming language. I&#039;m aware that there are some, but I&#039;m unaware of any which are in British English. Except, perhaps, BBC Basic. Although that also allows traitorous American spelling for some keywords.  HTML …]]></description>
										<content:encoded><![CDATA[<p>If you've been around programming circles long enough, you'll probably have read the seminal "<a href="https://aloneonahill.com/blog/if-php-were-british/">If PHP Were British</a>". If not, go read it now. I'll wait.</p>

<p>I love the idea of a non-American programming language. I'm aware that there are some, but I'm unaware of any which are in <em>British</em> English. Except, perhaps, BBC Basic. Although that also allows <a href="https://www.bbcbasic.co.uk/bbcwin/manual/bbcwin4.html#colour">traitorous American spelling for some keywords</a>.</p>

<p>HTML was invented by a Brit (Hi Sir Timbl!). So why doesn't it use British spelling for everything?</p>

<p>Well, I guess, the answer is... it mostly does!</p>

<p>Looking through the <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element">big list of HTML elements</a>, only one is noticeably in American English. The <code>&lt;dialog&gt;</code> element was introduced reasonably recently in <a href="https://www.w3.org/TR/html52/interactive-elements.html#the-dialog-element">HTML 5.2</a>. I would love to know if there were any late-night arguments about whether it should have been dialog<strong>ue</strong>...</p>

<p>In the <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/center">obsolete element section</a> we find the much-missed <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/center"><code>&lt;center&gt;</code></a>. Perhaps, in an alternate timeline, it was named cent<strong>re</strong> and is still in use today?</p>

<p>Center has a curious history.</p>

<blockquote><p><code>CENTER</code> was introduced by Netscape before they added support for the HTML 3.0 <code>DIV</code> element. It is retained in HTML 3.2 on account of its widespread deployment. 
<a href="https://www.w3.org/TR/2018/SPSD-html32-20180315/#center">HTML 3.2 Reference Specification - 1997</a></p></blockquote>

<p>You can <a href="http://ksi.cpsc.ucalgary.ca/archives/WWW-TALK/www-talk-1994q4/0332.html">read some of the original <del>arguments</del> discussions from the early 1990s</a>.</p>

<p>CSS was invented by <a href="https://en.wikipedia.org/wiki/H%C3%A5kon_Wium_Lie">Håkon Wium Lie</a> with <a href="https://www.w3.org/TR/CSS1/#color-and-background-properties">CSS version 1</a> specifying the spelling <code>color</code>. As Håkon was Norwegian, I suppose we could have ended up with <code>farge</code>. That might have been nice.</p>

<p>Perhaps that's what the <em>World-Wide</em> Web needs: HTML elements which are <em>not</em> in English. There is no technical reason why we can't have an <code>&lt;电影&gt;</code> element, or a CSS property called <code>نطاط</code>.</p>

<p>British English is the best. But I only think that because it is what I've grown up with. English is the world's most popular second language. But it won't be long before Chinese catches up to the total number of speakers. Is it fair to make new web developers learn an entirely new human language while they struggle with learning a new computer language?</p>

<p>I constantly find myself typing <code>colour</code> when I mean <code>color</code>. Wouldn't a Hindi-speaking developer want to be able to program in their preferred language?</p>

<p>Why can't an HTML document start <code>&lt;!DOCTYPE html ፊደል&gt;</code> and then have all the elements written in Geʽez script?</p>

<p>I know you think this is too hard to achieve. But part of the job of computer scientists is to work out how to make computers do the hard work for us.  Humans shouldn't adapt to a computer's needs; the computer must adapt to ours.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=37030&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2020/11/if-html5-were-british/feed/</wfw:commentRss>
			<slash:comments>12</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Localisation is too hard for Gmail]]></title>
		<link>https://shkspr.mobi/blog/2020/10/localisation-is-too-hard-for-gmail/</link>
					<comments>https://shkspr.mobi/blog/2020/10/localisation-is-too-hard-for-gmail/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 21 Oct 2020 11:05:55 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[gmail]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[rant]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=36992</guid>

					<description><![CDATA[/ləʊk(ə)lʌɪˈzeɪʃ(ə)n/ The ability to adjust a user-interface to the user&#039;s local language or dialect  Because I live in the UK, I speak en_GB (English, Great Britain) rather than en_US (English, Simplified United States).  Mostly, all dialects of English are mutually intelligible. Sure, the Brits love the letter U and the Americans stick a Z in every possible word. But we get along reasonably well…]]></description>
										<content:encoded><![CDATA[<blockquote><p>/ləʊk(ə)lʌɪˈzeɪʃ(ə)n/
The ability to adjust a user-interface to the user's local language or dialect</p></blockquote>

<p>Because I live in the UK, I speak en_GB (English, Great Britain) rather than en_US (English, <del>Simplified</del> United States).</p>

<p>Mostly, all dialects of English are mutually intelligible. Sure, the Brits love the letter U and the Americans stick a Z in every possible word. But we get along reasonably well.  Except in Gmail.</p>

<p>Here's my en_GB localised Gmail interface. Note how there is a folder called "Bin".</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2020/10/Bin-fs8.png" alt="The Gmail Interface." width="795" height="703" class="aligncenter size-full wp-image-36993">

<p>Everyone using Gmail in en_GB will know that deleted emails go into the "Bin".</p>

<p>Gmail has <a href="https://support.google.com/mail/answer/7190?hl=en">a handy search feature</a> to allow you to find emails in a specific folder.  For example "Bob in:spam" finds all email containing the word "Bob" in your spam folder. "Proposal in:sent" gets everything you've sent with the word proposal.</p>

<p>But it is <em>impossible</em> to search the "Bin" folder.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2020/10/Beer-in-Bin-fs8.png" alt="A search for &quot;beer in:bin&quot; returns nothing." width="977" height="204" class="aligncenter size-full wp-image-36995">

<p>Why? Because you have to search the <em>Trash</em> folder. That's the name Americans use.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2020/10/Beer-in-Trash-fs8.png" alt="Lots of results in the trash folder." width="983" height="336" class="aligncenter size-full wp-image-36996">

<blockquote><p>/hɪˈdʒɛməni/</p></blockquote>

<p>The same is true even if you've chosen a non-English language.
<img src="https://shkspr.mobi/blog/wp-content/uploads/2020/10/Gmail-in-German-fs8.png" alt="The Gmail interface in German." width="1057" height="586" class="aligncenter size-full wp-image-36997"></p>
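<p>Gmail's real search parser isn't public, so this is only a hypothetical sketch of one possible fix: treat the localised folder name as an alias for the canonical one before running the search. The alias table, the <code>normalise_query</code> name, and the simple whitespace splitting (which ignores quoted phrases) are all assumptions for illustration.</p>

<pre><code class="language-python"># Hypothetical alias table: a locale's folder names mapped to the canonical
# names the search operator understands. Only the en-GB "Bin" case from
# this post is filled in; other locales would need their own entries.
FOLDER_ALIASES = {
    "en-GB": {"bin": "trash"},
}


def normalise_query(query, locale):
    """Rewrite in:FOLDER terms so localised folder names also work."""
    aliases = FOLDER_ALIASES.get(locale, {})
    terms = []
    for term in query.split():
        operator, colon, folder = term.partition(":")
        if colon and operator.lower() == "in":
            folder = aliases.get(folder.lower(), folder)
            term = operator + ":" + folder
        terms.append(term)
    return " ".join(terms)


print(normalise_query("beer in:bin", "en-GB"))  # beer in:trash
print(normalise_query("Bob in:spam", "en-GB"))  # Bob in:spam
</code></pre>

<p>Keeping the canonical name working as well means nobody's existing searches break.</p>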

<p>Sadly, Google don't respond to user complaints or feedback. The best you can do is hope a ranty blog post gets high enough traction on social media. Then, maybe, something will change.</p>

<video src="https://shkspr.mobi/blog/wp-content/uploads/2020/10/oscar-the-grouch.mp4" autoplay="" muted="" loop=""></video>

<p>If you're building a service, remember that localisation is about much more than the GUI. All aspects of the interface need to be considered.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=36992&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2020/10/localisation-is-too-hard-for-gmail/feed/</wfw:commentRss>
			<slash:comments>10</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[<input type="country" />]]></title>
		<link>https://shkspr.mobi/blog/2017/11/input-type-country/</link>
					<comments>https://shkspr.mobi/blog/2017/11/input-type-country/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Thu, 02 Nov 2017 07:35:10 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[flag]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[NaBloPoMo]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=28790</guid>

					<description><![CDATA[Recently, Lea Verou asked an important question about whether HTML should have a standardised way of letting users select a country from a list.  Lea Verou@LeaVerouHTML Idea: &#60;input type=&#34;country&#34;&#62; which would become a searchable dropdown with all countries and their flags.Wouldn&#039;t that be awesome?❤️ 1,863💬 113🔁 013:17 - Sat 21 October 2017  You can read through the conversation and make your own …]]></description>
										<content:encoded><![CDATA[<p>Recently, Lea Verou asked an important question about whether HTML should have a standardised way of letting users select a country from a list.</p>

<blockquote class="social-embed" id="social-embed-921727157705035776" lang="en" itemscope="" itemtype="https://schema.org/SocialMediaPosting"><header class="social-embed-header" itemprop="author" itemscope="" itemtype="https://schema.org/Person"><a href="https://twitter.com/LeaVerou" class="social-embed-user" itemprop="url"><img class="social-embed-avatar social-embed-avatar-circle" src="data:image/webp;base64,UklGRg4FAABXRUJQVlA4WAoAAAAQAAAALwAALwAAQUxQSPkBAAABkFxt26Inz7xfPA3oyt0qwR2W7m4VxBNqcVauFTj76C84zPu+DzqZ+SqIiAlAZgkAhnbP3Hy98vXr0usbU7sGAQRBzkkA2k8/qTJz5fHJViAkuQjQM7NOUtXcSXdTI7k21Q1IDgGNF1dIU2NmUyOXzzUgbChg9AmZOnP0lHw0jJAtCdhRohpzNuXaVoRMAUd+MmWBKX8cRMgQcIw0FmrkIYT/CLb9UGPBpt83Qf4hGCzRWLhxrQ8CIJGmZ1RGqHzcIAkQcJnKKJVnESBJz5pZHGbLnYkI5qiMVDkJQXvJPRb31RbgHI3RGo+j9rlrPOr3a4Y+0eNxrvftpzIi5a6FqKicvkuLyXjjAz0m56sSo3Yup3GRX+MrxeVc/kCP69U9WkzGm9epMSlnDkTlyt3Dn+kRsdRf+9w1HvUHNbhAi8d4EuiouMfivtYKwQI1FuU0RJLedbM4zFa7EoHgKjUO5QUEIJHml9QYlE+bJAEgGKnQijOWBiD4W7Drp1pRpj+2QvDvgFOkFWPkMQT8P+BkSi0i5c+jCMiYBOypUi0vU5Z3IiRZgICJF2TqeXhKPh1DwEYDmq6tkaaezdXI1cuNCNi4AAPzFZKpmpN0N1WSpbk+QJBnEoCO808qzFx+fLYDCAlyFgEwtn/x3odympY+3J3fNwJABJkBAFZQOCDuAgAAkBAAnQEqMAAwAD61Sp9LJyQioac4DtjgFolsAIFEMH6B+Kv4AdBO7V2A8wH6gdQD0AP6r/gOsz9ADy3fYb/bv0qrv68QAwO1+MdVBDQvGhqAdGX9yPYA/UBC3ADgPsXi5j0xCf0ag0UKGbFnyPlYpKdvC4UNkmjPp4lPDJfMhJREdbdmQjZdqVR6giQAAP773SVz2tak38RhSOrFNOgex6SwAZelVr8jjILiJFco/89vD7kXlE5Hclf7+NBea6kX6RXpcMcS3QX/LfugHt+/7Zr3Xc5sNv64I/L7HZK8S+kLpuJ/98Wq4Xh3MiJsydES/bg6/Pw/6C4qhsSTovz5Dgu+NHfZFFNEeWiOY/PIE9xKjMW2GcNpsLmsIKTOMT12UhYlS1lsg6yfkGxOM9DIv6WugvAibosErB7mhDfxzYBeFt1bAdNTBDsKqBuGGVs6+L8W1FblomJ5M1Vj/4tdMztZwEFD+qMnk9UU6T+ENGNFPpMc66h5TZ2y+CI7iEHvymgPDmTRCc4RMuxDdvI+OkdKYJE2bI6hXAkbtWSdtV0cLGeiZgP0WUbYFC7a7eFr9RccYsM0Y+JW27Tvh0PUGKr0QxiDIxltaPvQ1OqYg5hdzSOHi5ibVLA90QtIErEEP0+a6PfeScWNus/8zdY/rAeNOn2pzH/+kB5z9+7rmImA2owaUZzd9uzL2JWs0XnFZwvfdt2LIz98YfaeFN2pIhuv4WY3g8RviPjQKXUv9BUHjIy9vMQSnZDf58SWWGMUEJULFuNKVO8/FHs7z1kO9Xw/5y14YOJ7xxAVjT1/6d+yxNqv33k/4AEyCwTlGfc52BdPtK85mUGtDXpZBh
6umP8x8UhbT5gcOfnlSYVMvdraUI6PGnsj5rGsHRJsPOi2yYc8rHL6GosV81rPG/Oca+IzJLuyYVVy7EEGitFvGoSp5yAfpODsybcgGjE5XUqkpv8FV+reJdTcg+/6UCFE7qUi0GyUR/J/nIasAAAA" alt="" itemprop="image"><div class="social-embed-user-names"><p class="social-embed-user-names-name" itemprop="name">Lea Verou</p>@LeaVerou</div></a><img class="social-embed-logo" alt="Twitter" src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%0Aaria-label%3D%22Twitter%22%20role%3D%22img%22%0AviewBox%3D%220%200%20512%20512%22%3E%3Cpath%0Ad%3D%22m0%200H512V512H0%22%0Afill%3D%22%23fff%22%2F%3E%3Cpath%20fill%3D%22%231d9bf0%22%20d%3D%22m458%20140q-23%2010-45%2012%2025-15%2034-43-24%2014-50%2019a79%2079%200%2000-135%2072q-101-7-163-83a80%2080%200%200024%20106q-17%200-36-10s-3%2062%2064%2079q-19%205-36%201s15%2053%2074%2055q-50%2040-117%2033a224%20224%200%2000346-200q23-16%2040-41%22%2F%3E%3C%2Fsvg%3E"></header><section class="social-embed-text" itemprop="articleBody">HTML Idea: &lt;input type="country"&gt; which would become a searchable dropdown with all countries and their flags.<br>Wouldn't that be awesome?</section><hr class="social-embed-hr"><footer class="social-embed-footer"><a href="https://twitter.com/LeaVerou/status/921727157705035776"><span aria-label="1863 likes" class="social-embed-meta">❤️ 1,863</span><span aria-label="113 replies" class="social-embed-meta">💬 113</span><span aria-label="0 reposts" class="social-embed-meta">🔁 0</span><time datetime="2017-10-21T13:17:33.000Z" itemprop="datePublished">13:17 - Sat 21 October 2017</time></a></footer></blockquote>

<p>You can read through the conversation and make your own mind up (while also marvelling at the witless mansplainers) - but I'd like to give you my considered take on it.</p>

<p>(Disclaimer - I'm an editor on the HTML 5.3 spec and I work for the UK Government. This is a personal blog post and doesn't represent the views of my employers, associates, or friends.)</p>

<h2 id="who-are-you"><a href="https://shkspr.mobi/blog/2017/11/input-type-country/#who-are-you">Who Are You?</a></h2>

<p>Let's start with the big one.  What is a country?  This is about as contentious as it gets! It involves national identities, international politics, and hereditary relationships.</p>

<p>Scotland, for example, is a country.  <a href="http://www.parliament.uk/about/living-heritage/evolutionofparliament/legislativescrutiny/act-of-union-1707/overview/">That is a (fairly) uncontentious statement</a> - and yet in drop-down lists, I rarely see it mentioned. Why? Because it is one of the four countries which make up the country of the United Kingdom - and so it is usually (but not always) subsumed into that.</p>

<p>Some countries don't recognise each other.  Some believe that the other country is really part of <em>their</em> country.  <a href="https://en.wikipedia.org/wiki/Gregor_MacGregor#Poyais_scheme">Some countries don't exist</a>.</p>

<p>There are two main schemes to classify what is and isn't a country.  The first is ISO 3166-1.  It provides two- and three-letter codes for every country.  Well... sort of.</p>

<p>ISO 3166-1 contains 249 different countries, territories, protectorates, principalities, duchies, and other bits-and-bobs. It contains the Falklands, but not Scotland.</p>

<p>The second is... whatever your country says is another country!</p>

<p>My friends in the Government Registers Team have published <a href="https://web.archive.org/web/20171219102544/https://country.register.gov.uk/">a canonical list of every country that the UK recognises</a>. There are 199 entries. Which countries are <em>not</em> in there is left as an exercise for the reader.</p>

<p>The UK's register of countries should allow every Government website to have the same list in a drop down. When new countries are recognised, one list needs to be updated - and then all websites automagically update. In theory.</p>

<p>Incidentally, that list of 199 countries includes four entries for countries <strong>which no longer exist</strong>. For example, Yugoslavia.</p>

<p>Which brings us to the next question...</p>

<h2 id="whats-the-use-case"><a href="https://shkspr.mobi/blog/2017/11/input-type-country/#whats-the-use-case">What's the use case?</a></h2>

<p>The most obvious one is "I want to give a site my current address" - presumably for identification purposes or postal deliveries.</p>

<p>But what if the use case is "I want to say where I was born"?</p>

<p>Borders shift.  Countries disappear, merge, split, change names, change flags, and do all manner of weird things which trip up your edge cases.</p>

<p>The user may want to find the name in their own script - for example would a Greek user be looking for "Greece" or "Ελλάδα"?  If a Chinese speaker wants to visit the UK, do they look in the drop-down for "英国"?</p>

<p>International dialling codes aren't unique to one country either: <code>+1</code> is used by the USA, Canada, Anguilla, the Dominican Republic, and dozens more. Are there <a href="https://github.com/mledoze/countries/issues/114">countries where there is more than one international dialling code</a>?</p>

<p>OK, what if the user wants to select their language based on their country?</p>

<h2 id="do-you-have-a-flag"><a href="https://www.youtube.com/watch?v=hYeFcSq7Mxg">Do You Have A Flag?</a><a href="https://shkspr.mobi/blog/2017/11/input-type-country/#do-you-have-a-flag">🔗</a></h2>

<p>It is one of the classic conventions that first-year students of user interface design are taught - <a href="http://www.flagsarenotlanguages.com/blog/why-flags-do-not-represent-language/">countries do not represent language</a>!</p>

<p>Some countries have multiple official languages.  Some users may not speak the language of their country. Some languages are only used for official purposes, and not by the general population.</p>

<p>Flags <em>mostly</em> represent countries. There are people in Wales who would rather see Y Ddraig Goch than the <a href="https://www.flaginstitute.org/wp/british-flags/the-union-jack-or-the-union-flag/">Union Jack</a>. And vice versa. Flags can make people angry.</p>

<p>The flag of the USA last changed in 1960 - but <a href="https://www.washingtonpost.com/news/worldviews/wp/2017/08/08/mauritanias-president-bundles-a-patriotic-flag-change-with-abolishing-the-senate/">Mauritania changed theirs in August 2017</a>. How quickly can a browser update its list of countries?</p>
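<p>If a browser used emoji flags for such a picker, it wouldn't ship flag images in the list at all. A minimal sketch of how emoji flags are encoded: a country flag is two Regional Indicator Symbols spelling the ISO 3166-1 alpha-2 code, while Wales or Scotland need a longer tag sequence built from a subdivision code such as <code>gbwls</code>. The artwork comes from the platform's fonts, so how (and whether) a flag is drawn typically changes with font updates. Both function names are hypothetical, and neither checks that the code is a real country.</p>

<pre><code class="language-python">def flag_emoji(alpha2):
    """Flag emoji for an ISO 3166-1 alpha-2 code, e.g. "MR" for Mauritania.

    Each letter maps to a Regional Indicator Symbol (U+1F1E6 is "A").
    """
    return "".join(chr(0x1F1E6 + ord(letter) - ord("A")) for letter in alpha2.upper())


def subdivision_flag(code):
    """Flag emoji tag sequence for a subdivision such as "gbwls" (Wales).

    WAVING BLACK FLAG, then one TAG character per letter, then CANCEL TAG.
    """
    tags = "".join(chr(0xE0000 + ord(letter)) for letter in code.lower())
    return "\U0001F3F4" + tags + "\U000E007F"


print([hex(ord(c)) for c in flag_emoji("GB")])  # ['0x1f1ec', '0x1f1e7']
print(flag_emoji("MR"), subdivision_flag("gbwls"))
</code></pre>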

<h2 id="and-yet"><a href="https://shkspr.mobi/blog/2017/11/input-type-country/#and-yet">...and yet...</a></h2>

<p>I instinctively <em>like</em> this idea! <a href="https://twitter.com/Glightstar/status/714203191999664129">This isn't a new question</a>, nothing ever is, but I think it is an idea which has merit.</p>

<p>One of the goals of HTML is to stop web developers having to re-invent the wheel. That's why we have lots of different <code>&lt;input&gt;</code> types - to reduce complexity.</p>

<p>Colour picker <input type="color"></p>

<p>Number inputs <input id="number" type="number" value="42"></p>

<p>Range selector <input type="range"></p>

<p>Some modern browsers support date input <input id="date" type="date"></p>

<p>The challenges of a country selector are...</p>

<ol>
<li>Keeping everyone happy and not causing major diplomatic incidents. Easy‽</li>
<li>Usability. Making sure it's easy to search for the name of a country.</li>
<li>Consistency. How do you indicate that this list contains historic countries?</li>
</ol>

<p>None of these problems is insurmountable - but solving them is far from trivial.</p>

<p>And yet... I think there is a real possibility that this could work. Millions of websites already find ways to cope with the ambiguity - perhaps browsers can too?</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=28790&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2017/11/input-type-country/feed/</wfw:commentRss>
			<slash:comments>35</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Why can't you send email to a Chinese address?]]></title>
		<link>https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/</link>
					<comments>https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Tue, 20 Sep 2016 11:41:33 +0000</pubDate>
				<category><![CDATA[usability]]></category>
		<category><![CDATA[chinese]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[unicode]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=23331</guid>

					<description><![CDATA[We all know what an email address looks like and how to validate them, right?  A few years ago I got the Chinese domain name 莎士比亚.org.  You can browse to it, link to it, and send email to it.  Or can you?  When I tried two years ago, none of the major email providers supported sending to non-ASCII email addresses.  Today, I tried again with six of the big &#34;Western&#34; webmail providers.  How did they…]]></description>
										<content:encoded><![CDATA[<p>We all know what an email address looks like and <a href="https://david-gilbertson.medium.com/the-100-correct-way-to-validate-email-addresses-7c4818f24643">how to validate them</a>, right?</p>

<p>A few years ago I got the Chinese domain name <a href="https://莎士比亚.org">莎士比亚.org</a>.  You can browse to it, link to it, and send email to it.  <em>Or can you?</em></p>

<p>When I tried <a href="https://shkspr.mobi/blog/2014/01/poor-idn-support-from-major-webmail-providers/">two years ago</a>, <strong>none</strong> of the major email providers supported sending to non-ASCII email addresses.</p>

<p>Today, I tried again with six of the big "Western" webmail providers.  How did they do?</p>

<h2 id="show-me-the-data"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#show-me-the-data">Show Me The Data!</a></h2>

<p>I tested by trying to send an email to <code>test@莎士比亚.org</code> and to its <a href="https://en.wikipedia.org/wiki/Punycode">Punycode</a> representation, <code>test@xn--jlq54w7ypemw.org</code>.</p>

<table>
<thead>
<tr>
  <th align="right"></th>
  <th align="center">test@莎士比亚.org</th>
  <th align="center">test@xn--jlq54w7ypemw.org</th>
</tr>
</thead>
<tbody>
<tr>
  <td align="right">Gmail</td>
  <td align="center"><span style="color:green">✔</span></td>
  <td align="center"><span style="color:green">✔</span></td>
</tr>
<tr>
  <td align="right">Outlook</td>
  <td align="center"><span style="color:green">✔</span></td>
  <td align="center"><span style="color:green">✔</span></td>
</tr>
<tr>
  <td align="right">Yahoo</td>
  <td align="center"><span style="color:red">❌</span></td>
  <td align="center"><span style="color:red">❌</span></td>
</tr>
<tr>
  <td align="right">iCloud</td>
  <td align="center"><span style="color:red">❌</span></td>
  <td align="center"><span style="color:green">✔</span></td>
</tr>
<tr>
  <td align="right">OWA</td>
  <td align="center"><span style="color:red">❌</span></td>
  <td align="center"><span style="color:green">✔</span></td>
</tr>
<tr>
  <td align="right">FastMail</td>
  <td align="center"><span style="color:green">✔</span> ⭐</td>
  <td align="center"><span style="color:green">✔</span></td>
</tr>
</tbody>
</table>
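<p>The Punycode address in the table can be derived with Python's built-in <code>idna</code> codec, which implements the older IDNA 2003 rules. A minimal sketch; <code>to_ascii_address</code> is a hypothetical helper that converts only the domain and leaves the local-part alone:</p>

<pre><code class="language-python">def to_ascii_address(address):
    """Convert the domain of an email address to its Punycode (A-label) form.

    Uses Python's built-in "idna" codec (IDNA 2003). The local-part is
    returned unchanged, because IDNA only applies to domain names.
    """
    local, at, domain = address.rpartition("@")
    return local + at + domain.encode("idna").decode("ascii")


print(to_ascii_address("test@莎士比亚.org"))  # test@xn--jlq54w7ypemw.org
</code></pre>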

<h2 id="winners"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#winners">Winners!</a></h2>

<p>Both Gmail and Outlook failed the last time I tried them - I'm very pleased to say that both of them now support sending to Chinese addresses.</p>

<p>One strange thing to note: when looking through Outlook's message details, I found this example of <a href="https://en.wikipedia.org/wiki/Mojibake">Mojibake</a>.
<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/Outlook-Encoding-Issues-.png" alt="Outlook showing encoding errors, mangling up the email address" width="528" height="163" class="aligncenter size-full wp-image-23344"></p>
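<p>Mojibake usually means bytes were decoded with the wrong character encoding. The screenshot doesn't reveal which encodings Outlook mixed up, so this sketch simply assumes UTF-8 bytes being read as Latin-1:</p>

<pre><code class="language-python">original = "test@莎士比亚.org"

# Encode correctly as UTF-8, then decode with the wrong codec. Latin-1 maps
# every byte to some character, so it never raises - it just makes garbage.
# Each Chinese character is 3 UTF-8 bytes, so it becomes 3 junk characters.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)

# Nothing is lost yet: reversing the mistake recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
</code></pre>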

<h2 id="losers"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#losers">Losers!</a></h2>

<h3 id="yahoo"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#yahoo">Yahoo</a></h3>

<p>The biggest loser is Yahoo. That's very strange, considering <a href="https://en.wikipedia.org/wiki/Jerry_Yang">Jerry Yang</a>, one of its co-founders, is Taiwanese-American. It's even stranger given <a href="https://en.wikipedia.org/wiki/Criticism_of_Yahoo!#Work_in_the_People.27s_Republic_of_China">Yahoo's continued dealings with China</a>.</p>

<p>The Yahoo webmail portal simply wouldn't let me send to a Chinese domain name.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/Yahoo-email-not-recognised-.png" alt="Yahoo unable to send a message to a Chinese email address" width="640" height="349" class="aligncenter size-full wp-image-23340">

<p>The Punycode representation appeared to send but immediately failed.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/Yahoo-unable-to-send-message-.png" alt="Yahoo unable to send a message to a Chinese email address" width="638" height="169" class="aligncenter size-full wp-image-23339">

<h3 id="icloud"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#icloud">iCloud</a></h3>

<p>Apple's much-vaunted "It Just Works" philosophy obviously doesn't extend to international email addresses. It accepted the Punycode but gave this <em>delightful</em> error message on the Chinese domain.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/iCloud-Delivery-Failure-Notification-.png" alt="iCloud showing a delivery failure notification" width="607" height="458" class="aligncenter size-full wp-image-23342">

<h3 id="owa"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#owa">OWA</a></h3>

<p>Microsoft's Outlook Web Access got <em>very</em> confused and tried to look up the email address in the local directory.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/OWA-No-Match-Found-.png" alt="Outlook Web Access showing no match found" width="543" height="170" class="aligncenter size-full wp-image-23343">

<h2 id="errr"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#errr">Errr?</a></h2>

<h3 id="%e2%ad%90-fastmail"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#%e2%ad%90-fastmail">⭐ FastMail</a></h3>

<p>Lots of people recommended that I try <a href="https://www.fastmail.com/">Fastmail</a> - it <em>really</em> didn't like the look of the Chinese domain and painted it with a red error colour.  That said, it sent the email without further complaint.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/Fastmail-showing-red-error-on-email-.png" alt="Fastmail apparently showing that the email address is invalid" width="545" height="436" class="aligncenter size-full wp-image-23345">

<h2 id="what-about-a-chinese-local-part"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#what-about-a-chinese-local-part">What about a Chinese Local-Part?</a></h2>

<p>Email is a venerable protocol. That's a polite way of saying it is old and outdated. The <a href="https://en.wikipedia.org/wiki/Email_address#Local-part">local-part</a> of the email address (<code>test@</code>) is generally restricted to a handful of <a href="https://www.jochentopf.com/email/chars.html">7-bit ASCII characters</a>. None of the email providers I tried would let me sign up with a Chinese name. So no 你好@yahoo.com for me!</p>

<p>But what happens if you're foolish enough to try to send an email to <code>你好@莎士比亚.org</code>?</p>

<p>Well you'll probably get this error message:</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/SMTPUTF8-Delivery-Failure-Notification.png" alt="Technical details of permanent failure: local-part of envelope RCPT address contains utf8 but remote server did not offer SMTPUTF8" width="659" height="160" class="aligncenter size-full wp-image-23348">

<p>In 2012, <a href="https://tools.ietf.org/html/rfc6531">RFC 6531 defined how internationalised email addresses should work</a>. Over four years later, <a href="https://en.wikipedia.org/wiki/Extended_SMTP#SMTPUTF8">support is <em>still</em> not widespread</a>.</p>
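<p>The rule behind that error message can be sketched in a few lines. A non-ASCII domain has an ASCII fallback (its Punycode form), but a non-ASCII local-part has none, so the receiving server must advertise SMTPUTF8. In Python, for example, <code>smtplib.SMTP.has_extn("smtputf8")</code> reports whether a server advertised it after <code>ehlo()</code>. <code>needs_smtputf8</code> below is a hypothetical helper, not a library function:</p>

<pre><code class="language-python">def needs_smtputf8(address):
    """Return True if delivering to this address requires SMTPUTF8 (RFC 6531).

    A non-ASCII domain can fall back to its Punycode A-label, but a
    non-ASCII local-part has no ASCII equivalent, so the remote server
    must offer SMTPUTF8 for the message to be accepted.
    """
    local, _, _ = address.rpartition("@")
    return not local.isascii()


print(needs_smtputf8("test@莎士比亚.org"))  # False
print(needs_smtputf8("你好@莎士比亚.org"))  # True
</code></pre>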

<p>It's 2016 and the majority of the world <strong>can't send an email to their preferred name</strong>.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=23331&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
	</channel>
</rss>
