<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/rss-style.xsl" type="text/xsl"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	     xmlns:dc="http://purl.org/dc/elements/1.1/"
	   xmlns:atom="http://www.w3.org/2005/Atom"
	     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>chinese &#8211; Terence Eden’s Blog</title>
	<atom:link href="https://shkspr.mobi/blog/tag/chinese/feed/" rel="self" type="application/rss+xml" />
	<link>https://shkspr.mobi/blog</link>
	<description>Regular nonsense about tech and its effects 🙃</description>
	<lastBuildDate>Sun, 08 Jun 2025 10:54:40 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://shkspr.mobi/blog/wp-content/uploads/2023/07/cropped-avatar-32x32.jpeg</url>
	<title>chinese &#8211; Terence Eden’s Blog</title>
	<link>https://shkspr.mobi/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title><![CDATA[Book Review: Babel - R. F. Kuang ★★★★★]]></title>
		<link>https://shkspr.mobi/blog/2024/01/book-review-babel-r-f-kuang/</link>
					<comments>https://shkspr.mobi/blog/2024/01/book-review-babel-r-f-kuang/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Fri, 26 Jan 2024 12:34:12 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[Book Review]]></category>
		<category><![CDATA[china]]></category>
		<category><![CDATA[chinese]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=49181</guid>

					<description><![CDATA[This is an astonishing book. On the one hand, it&#039;s the basic &#34;Harry Potter&#34; trope - a young orphan is gifted, gets sent to school to learn magic, becomes pals with the other weird kids, has adventures, and fights a monster. Except here, Harry is Chinese, is sent to Oxford University to learn magic, and faces up to the reality of colonialism and Empire.  Oh, and the magic is based on the…]]></description>
										<content:encoded><![CDATA[<p><img src="https://shkspr.mobi/blog/wp-content/uploads/2024/01/x400.jpg" alt="Book cover featuring the dreaming spires of Oxford. The page is ripped in two and the Tower of Babel is no longer there." width="200" class="alignleft size-full wp-image-49182">This is an astonishing book. On the one hand, it's the basic "Harry Potter" trope - a young orphan is gifted, gets sent to school to learn magic, becomes pals with the other weird kids, has adventures, and fights a monster. Except here, Harry is Chinese, is sent to Oxford University to learn magic, and faces up to the reality of colonialism and Empire.</p>

<p>Oh, and the magic is based on the <a href="https://en.wikipedia.org/wiki/Linguistic_relativity">Sapir-Whorf hypothesis</a>.</p>

<p>I lived in Oxford for several years (although, thankfully, I wasn't a scholar) and Kuang has <em>perfectly</em> captured the madness of the city. Her world-building is delightfully realistic and the parenthetical footnotes sprinkled throughout lead to a mesmerising blurring of reality and fiction.  When you read sentences like "Phonological calques are often semantic calques as well." it often feels like you're receiving an education as well as experiencing the narrative flow.</p>

<p>The book's politics aren't subtle - but they needn't be. This isn't smuggled polemic; it is righteous fury bound into a novel and set loose on an unsuspecting world. It is the very essence of what it means to be "woke". Our characters gradually have the scales drop from their eyes and they begin to realise the nightmare world they live in.</p>

<p>A thoroughly entertaining read, with a perfect mixture of alternative history, science-fantasy, heartbreak, and wonder.</p>

<p>On a minor technical note - the publishers have rendered all the Chinese characters as tiny images which makes it hard to read them. A bit of a baffling editorial decision!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/01/book-review-babel-r-f-kuang/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Book Review: Invisible Planets - Ken Liu ★★★★☆]]></title>
		<link>https://shkspr.mobi/blog/2023/05/book-review-invisible-planets-ken-liu/</link>
					<comments>https://shkspr.mobi/blog/2023/05/book-review-invisible-planets-ken-liu/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 31 May 2023 11:34:35 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[Book Review]]></category>
		<category><![CDATA[chinese]]></category>
		<category><![CDATA[Sci Fi]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=45870</guid>

					<description><![CDATA[Yet another compendium of Chinese sci-fi stories - and there are some great stories in this collection. There are also some essays about what makes Chinese science fiction Chinese. Based on my (limited) experience, I&#039;d say one of the defining characteristics of the Chinese SF I&#039;ve read is the way exposition is dispensed with and replaced by poetry.  Mankind streamed across the river of time,…]]></description>
										<content:encoded><![CDATA[<p><img src="https://shkspr.mobi/blog/wp-content/uploads/2023/05/Invisible-Planets.jpg" alt="Book cover." width="219" height="346" class="alignleft size-full wp-image-45871">Yet another compendium of Chinese sci-fi stories - and there are some great stories in this collection. There are also some essays about what makes Chinese science fiction Chinese. Based on my (limited) experience, I'd say one of the defining characteristics of the Chinese SF I've read is the way exposition is dispensed with and replaced by poetry.</p>

<blockquote><p>Mankind streamed across the river of time, aiming straight for the Door Into Summer. In that moment, our tiny planet was falling like a single drop of dew in a boundless universe, tumbling toward that plane made up of the broken remains of a planet.
"Grave of the Fireflies"</p></blockquote>

<p>There's a good range in here, from ghost stories to an homage to 1984. Some of the literary allusions are helpfully footnoted to allow the more casual reader to follow along. I think that's a sensible way to expand the inclusiveness of the stories.</p>

<p>It's an excellent and varied collection. More hits than misses - and the misses are at least <em>interesting</em>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2023/05/book-review-invisible-planets-ken-liu/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Book Review: The Reincarnated Giant - Mingwei Song ★★★☆☆]]></title>
		<link>https://shkspr.mobi/blog/2023/04/book-review-the-reincarnated-giant-mingwei-song/</link>
					<comments>https://shkspr.mobi/blog/2023/04/book-review-the-reincarnated-giant-mingwei-song/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sun, 02 Apr 2023 11:34:25 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[Book Review]]></category>
		<category><![CDATA[chinese]]></category>
		<category><![CDATA[Sci Fi]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=45218</guid>

					<description><![CDATA[This is an anthology of modern Chinese science fiction, loosely grouped into three main themes.  I&#039;m sad to say that some of the stories are a lot of hard work. One is barely sci-fi - more like a spiritual paean to the souls of people caught in a disaster which, bizarrely, has a throwaway line about aliens in it. One is an interminable description of domesticity which, if I&#039;ve understood…]]></description>
										<content:encoded><![CDATA[<p><img src="https://shkspr.mobi/blog/wp-content/uploads/2023/04/51OxIb3gR0L.jpg" alt="Book cover. A cybernetic man floats in a tangle of wires." width="200" class="alignleft size-full wp-image-45220">This is an anthology of modern Chinese science fiction, loosely grouped into three main themes.</p>

<p>I'm sad to say that some of the stories are a lot of hard work. One is barely sci-fi - more like a spiritual paean to the souls of people caught in a disaster which, bizarrely, has a throwaway line about aliens in it. One is an interminable description of domesticity which, if I've understood correctly, ends in a manic sequence where an elderly author travels back in time to fuck someone who may or may not be the wife or mother of his time-travelling child. Yeah, me neither.</p>

<p>One story is almost dream-like in its babble. I'm not sure if that's by design or whether it is genuinely untranslatable.</p>

<p>Another story, which is quite good, starts with a long stream of life in rural China and switches, without warning, into space:</p>

<blockquote><p>He stood up again and continued on. He had not gone far before he turned into a bookstore. How wonderful the city was with its bookstores still open at night. He spent all his money, save for his return fare, on books to add to the school’s meager library. At midnight, carrying two heavy bundles of books, he boarded the train home.
Fifty thousand light-years from Earth, near the center of the Milky Way galaxy, an intergalactic war that had raged for twenty thousand years was near its conclusion.</p></blockquote>

<p>What?!</p>

<p>One of the stories is actually just a couple of chapters from a much longer novel. It was entertaining enough, but was a strange inclusion for a book of otherwise self-contained stories.</p>

<p>But... Some of the stories are absolute crackers!</p>

<blockquote><p>Since Yiyi’s lectures on classical literature at the feedlot had a tranquilizing effect, producing a special flavor in his students’ meat, the dinosaurs left him alone</p></blockquote>

<p>I mean - come on, that's worth the price of admission alone! Some of the stories are riffs on classics you've probably read before - and it's interesting to see a Chinese perspective on them.</p>

<p>Many of the stories flow heavily with beautiful symbolism. I'm sure that some of the symbolism in the stories is incredibly obvious to those with a deeper understanding of Chinese culture than I. There are some helpful footnotes scattered throughout which help orientate the lost reader - but they can't replicate the innate recognition which comes with immersion.</p>

<p>Some of the stories read, at times, like poetry:</p>

<blockquote><p>the error was like a small tear in the calf of a silk stocking, just a tiny cool spot at first, a premonition lying in the subconscious like a snake.</p></blockquote>

<p>I've certainly not seen anything that evocative in mainstream Western sci-fi.  It's full of these poetic rubies in the rubble.</p>

<p>Amusingly, some of the stories end with <em>very</em> different moral lessons than you'd find in an English compendium. It certainly made me reflect on my baked-in cultural assumptions.</p>

<p>There are enough good stories in here to outweigh the bad, and they are <em>really</em> good. But I'd recommend skipping through a few of the more tedious entries.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2023/04/book-review-the-reincarnated-giant-mingwei-song/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Book Review: Adventures in Space - New Short Stories by Chinese & English Science Fiction Writers ★★★☆☆]]></title>
		<link>https://shkspr.mobi/blog/2023/03/book-review-adventures-in-space-new-short-stories-by-chinese-english-science-fiction-writers/</link>
					<comments>https://shkspr.mobi/blog/2023/03/book-review-adventures-in-space-new-short-stories-by-chinese-english-science-fiction-writers/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Fri, 03 Mar 2023 12:34:21 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[Book Review]]></category>
		<category><![CDATA[chinese]]></category>
		<category><![CDATA[NetGalley]]></category>
		<category><![CDATA[Sci Fi]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=45059</guid>

					<description><![CDATA[This is a curious - and slightly unsatisfying - collection of short stories. There&#039;s no cohesive theme; some are about space travel, some alien invasion, some about madness on Mars, some about interstellar religions. You bounce around between themes without much chance to reflect on how different authors tackle the same subject.  The stories alternate between Chinese authors and English-speaking…]]></description>
										<content:encoded><![CDATA[<p><img src="https://shkspr.mobi/blog/wp-content/uploads/2023/02/adventures-in-space.jpg" alt="Book cover for Adventures in Space." width="200" class="alignleft size-full wp-image-45060">This is a curious - and slightly unsatisfying - collection of short stories. There's no cohesive theme; some are about space travel, some alien invasion, some about madness on Mars, some about interstellar religions. You bounce around between themes without much chance to reflect on how different authors tackle the same subject.</p>

<p>The stories alternate between Chinese authors and English-speaking authors. Again, it feels a little disjointed. Will general audiences not read Chinese sci-fi unless it is intermingled with Western tales? It feels a bit like hiding vegetables in mashed potatoes.  And it isn't like the English stories deal with Chinese themes or characters - so it feels like a bit of a wasted opportunity.</p>

<p>For all that, the stories are pretty good. There are some amazingly original ideas tucked away in there. Visions of Doomsday and Dæmons, flights into the unknown, and the bureaucracy of inter-planetary relations.</p>

<p>If you're happy with a mish-mash of themes and styles, this is a fine assortment of tales.</p>

<p>Thanks to <a href="https://www.netgalley.co.uk">NetGalley</a> for the review copy.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2023/03/book-review-adventures-in-space-new-short-stories-by-chinese-english-science-fiction-writers/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Book Review: Land of Big Numbers - Te-Ping Chen ★★★☆☆]]></title>
		<link>https://shkspr.mobi/blog/2021/03/book-review-land-of-big-numbers-te-ping-chen/</link>
					<comments>https://shkspr.mobi/blog/2021/03/book-review-land-of-big-numbers-te-ping-chen/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 12:16:55 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[Book Review]]></category>
		<category><![CDATA[china]]></category>
		<category><![CDATA[chinese]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=38351</guid>

					<description><![CDATA[I&#039;ve had a long-held fascination with China. I took Mandarin at University and, a few years ago, I was lucky enough to go to Beijing. So I was excited to pick up this book of short stories about modern China.  It is a mixed lot of tales about Chinese people both in and outside of China. But, with the exception of a couple of stories - they just fell flat for me.  I found it hard to assess if the…]]></description>
										<content:encoded><![CDATA[<p><img src="https://shkspr.mobi/blog/wp-content/uploads/2021/03/Land-of-Big-Numbers_-Stories.jpeg" alt="Book cover." width="220" class="alignleft size-full wp-image-38352">I've had a long-held fascination with China. I took Mandarin at University and, a few years ago, <a href="https://shkspr.mobi/blog/2017/05/a-vegetarian-in-beijing/">I was lucky enough to go to Beijing</a>. So I was excited to pick up this book of short stories about modern China.</p>

<p>It is a mixed lot of tales about Chinese people both in and outside of China. But, with the exception of a couple of stories - they just fell flat for me.</p>

<p>I found it hard to assess if the stories are intended to be realistic or allegorical. As the author is a journalist, I thought the stories might be grounded in reality - or based on interviews.  Instead, they're an amalgam of possibly-true little slices of life from a perspective you may not have encountered. There's nothing particularly wrong with them, but there's only so many times you can read about someone lost and alone in a big city before it gets repetitive.</p>

<p>The final story, "Gubeikou Spirit", is fantastic. It is a wonderful tale of manipulation, lack of agency, and Kafkaesque bureaucracy. It feels like the author has perfectly captured the dream-logic of a nightmare.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2021/03/book-review-land-of-big-numbers-te-ping-chen/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[How Do You Sort Chinese Numbers?]]></title>
		<link>https://shkspr.mobi/blog/2016/11/how-do-you-sort-chinese-numbers/</link>
					<comments>https://shkspr.mobi/blog/2016/11/how-do-you-sort-chinese-numbers/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Tue, 08 Nov 2016 11:27:48 +0000</pubDate>
				<category><![CDATA[usability]]></category>
		<category><![CDATA[chinese]]></category>
		<category><![CDATA[mandarin]]></category>
		<category><![CDATA[NaBloPoMo]]></category>
		<category><![CDATA[unicode]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=23428</guid>

					<description><![CDATA[Imagine you have a series of numbers you wish to sort.  Sorting is a well-known computer science problem - generally speaking you compare one value to the next and then move the item either up or down a list.  With &#34;English&#34; characters, that&#039;s fairly easy.  When a computer sees the character 1 it&#039;s really seeing the Unicode character U+0031.  When it sees 2 it&#039;s really seeing the character U+0032…]]></description>
										<content:encoded><![CDATA[<p>Imagine you have a series of numbers you wish to sort.  Sorting is a well-known computer science problem - generally speaking you compare one value to the next and then move the item either up or down a list.</p>

<p>With "English" characters, that's fairly easy.</p>

<p>When a computer sees the character <code>1</code> it's <em>really</em> seeing the Unicode character <code>U+0031</code>.  When it sees <code>2</code> it's <em>really</em> seeing the character <code>U+0032</code> and so on.</p>

<p>The <a href="https://en.wikipedia.org/wiki/Arabic_numerals">Arabic numbers</a> we use (0 - 9) have an identical ordering in Unicode. This makes it very easy for a computer to sort "Western" numbers.</p>

<p>But for Chinese... Well, it's <em>complicated!</em></p>

<h2 id="counting-in-mandarin-chinese"><a href="https://shkspr.mobi/blog/2016/11/how-do-you-sort-chinese-numbers/#counting-in-mandarin-chinese">Counting in Mandarin Chinese</a></h2>

<p>Here's a very quick primer on Chinese numbers.</p>

<p>一 = 1<br>
二 = 2<br>
三 = 3<br>
四 = 4<br>
五 = 5<br>
六 = 6<br>
七 = 7<br>
八 = 8<br>
九 = 9<br>
十 = 10<br>
十一 = 11<br>
十二 = 12<br>
二十 = 20<br>
二十一 = 21<br>
二十二 = 22<br>
一百 = 100<br>
一百零一 = 101<br>
一百二十三 = 123</p>

<p>In <a href="http://www.amathsdictionaryforkids.com/qr/b/base10system.html">Base-10</a> the length of a whole number reflects its size. A 4-digit number is <em>always</em> bigger than a 3-digit number.</p>

<p>In Chinese, a 3 character number like 四十二 (42) is <em>longer</em> than a 2 character number like 九十 (90), yet its value is <em>smaller</em>.</p>

<p>But that's not the worst of it!</p>

<p>Because of the <a href="https://news.ycombinator.com/item?id=8041288">controversial</a> process of <a href="https://en.wikipedia.org/wiki/Han_unification">Han Unification</a> - a whole bunch of Chinese, Japanese, and Korean characters (CJK) are lumped together in the same Unicode code block.  This leaves us with the somewhat weird situation where the characters' numerical order doesn't match the order in which they appear in Unicode.</p>

<p>Here's how the characters are represented:</p>

<table>
<thead>
<tr>
  <th align="right">Character</th>
  <th align="left">Number</th>
  <th align="left">Unicode Codepoint</th>
</tr>
</thead>
<tbody>
<tr>
  <td align="right">一</td>
  <td align="left">1</td>
  <td align="left">U+4E00</td>
</tr>
<tr>
  <td align="right">二</td>
  <td align="left">2</td>
  <td align="left">U+4E8C</td>
</tr>
<tr>
  <td align="right">三</td>
  <td align="left">3</td>
  <td align="left">U+4E09</td>
</tr>
<tr>
  <td align="right">四</td>
  <td align="left">4</td>
  <td align="left">U+56DB</td>
</tr>
<tr>
  <td align="right">五</td>
  <td align="left">5</td>
  <td align="left">U+4E94</td>
</tr>
<tr>
  <td align="right">六</td>
  <td align="left">6</td>
  <td align="left">U+516D</td>
</tr>
<tr>
  <td align="right">七</td>
  <td align="left">7</td>
  <td align="left">U+4E03</td>
</tr>
<tr>
  <td align="right">八</td>
  <td align="left">8</td>
  <td align="left">U+516B</td>
</tr>
<tr>
  <td align="right">九</td>
  <td align="left">9</td>
  <td align="left">U+4E5D</td>
</tr>
<tr>
  <td align="right">十</td>
  <td align="left">10</td>
  <td align="left">U+5341</td>
</tr>
<tr>
  <td align="right">百</td>
  <td align="left">100</td>
  <td align="left">U+767E</td>
</tr>
</tbody>
</table>

<p>Which, if my sorting is correct, gives us an ordering of:
<code>1 7 3 9 2 5 8 6 10 4 100</code></p>
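
<p>If you'd like to check that ordering yourself, here's a minimal Python sketch. The <code>NUMERALS</code> mapping and the <code>codepoint_order</code> helper are names I've made up for illustration; the characters and values come straight from the table above.</p>

```python
# Chinese numeral characters from the table above, mapped to their values.
NUMERALS = {
    "一": 1, "二": 2, "三": 3, "四": 4, "五": 5, "六": 6,
    "七": 7, "八": 8, "九": 9, "十": 10, "百": 100,
}


def codepoint_order(chars):
    """Return the characters sorted by their Unicode code point."""
    return sorted(chars, key=ord)


if __name__ == "__main__":
    # Print each character with its code point and its numeric value.
    for ch in codepoint_order(NUMERALS):
        print(f"U+{ord(ch):04X}  {ch}  {NUMERALS[ch]}")
```

<p>Reading the last column from top to bottom shows the order a naive code point sort would produce.</p>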

<p>This makes it <strong>impossible</strong> to perform even a basic sort of a simple list of numbers without first doing some complex fiddling to transform the characters into numbers.</p>

<h2 id="it-gets-even-more-complicated"><a href="https://shkspr.mobi/blog/2016/11/how-do-you-sort-chinese-numbers/#it-gets-even-more-complicated">It gets even more complicated.</a></h2>

<p>Anyone who has tried to sort a list of files with numbers in their names knows that computers don't always see the world in the same way as humans.  It's quite common to see a sorted list which looks like this:</p>

<pre><code>10.mp3
11.mp3
1.mp3
20.mp3
2.mp3
3.mp3
4.mp3
...
</code></pre>

<p>Why? Because sorting by "text" is different to sorting by "value".</p>
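
<p>Here's a rough Python sketch of the difference. The <code>natural_key</code> helper is my own illustrative name, not something from a library. Python compares strings one code point at a time, so its text ordering differs in detail from the listing above, but it goes wrong for the same reason.</p>

```python
import re

FILES = ["1.mp3", "2.mp3", "3.mp3", "4.mp3", "10.mp3", "11.mp3", "20.mp3"]


def natural_key(name):
    """Split a name into text and number chunks so digits compare by value."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capture group puts the digit runs at the odd indices.
    return [int(part) if index % 2 else part
            for index, part in enumerate(parts)]


if __name__ == "__main__":
    print("By text: ", sorted(FILES))
    print("By value:", sorted(FILES, key=natural_key))
```

<p>Sorting by text puts <code>10.mp3</code> ahead of <code>2.mp3</code>; sorting with the key keeps the files in counting order.</p>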

<p>How do Chinese file names get sorted?  Here's Ubuntu's File manager trying to sort some files with Chinese numbers in them:
<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/10/Chinese-characters-in-file-names-sorted-in-Linux-fs8.png" alt="Chinese characters in filenames sorted in linux - the files are in the wrong order" width="150" height="478" class="aligncenter size-full wp-image-23432"></p>

<p>Yet another ordering!  Why?  It turns out that <a href="https://en.wikipedia.org/wiki/Chinese_characters#Indexing">there are <em>lots</em> of ways to sort Chinese characters</a>.</p>

<p>In this case, the <a href="https://twitter.com/m13253/status/784726363282415617">characters are sorted according to the "English" pronunciation order</a>!  That's the equivalent of sorting the numbers 1 - 10 <em>alphabetically</em>: eight five four nine one seven six ten three two.</p>
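
<p>To see that alphabetical ordering for yourself, a tiny Python sketch is enough. The <code>NAMES</code> list is just the English words for 1 to 10.</p>

```python
# English names for the numbers 1 to 10, in counting order.
NAMES = ["one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]

if __name__ == "__main__":
    # A plain text sort orders the words alphabetically, not by value.
    print(" ".join(sorted(NAMES)))
    # prints: eight five four nine one seven six ten three two
```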

<h2 id="can-we-make-it-even-more-complicated"><a href="https://shkspr.mobi/blog/2016/11/how-do-you-sort-chinese-numbers/#can-we-make-it-even-more-complicated">Can we make it even more complicated?</a></h2>

<p>Of course!</p>

<p>Let's add some <a href="https://en.wikipedia.org/wiki/Gujarati_alphabet#Digits">Gujarati digits</a> to the mix.  They look quite similar to our familiar Arabic digits and, like Arabic digits, have a sensible Unicode ordering.</p>

<p>Imagine a folder with the files <code>1</code>, <code>2</code>, <code>3</code>, <code>10</code> - with the numbers in Arabic, Chinese, and Gujarati.  How would you expect the files to be sorted?  Should <code>1</code> and <code>一</code> be grouped with  Gujarati's <code>૧</code>?</p>

<p>Naïvely we might expect the order to be 1, 2, 3, 10, ૧, ૨, ૩, ૧૦, 一, 二, 三, 十.</p>

<p>Ubuntu handles it two different ways.  In the GUI, the files are grouped:
<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/10/Arabic-Chinese-and-Gujarati-numbers-in-filenames-the-ordering-is-inconsistent-fs8.png" alt="Arabic, Chinese, and Gujarati numbers in filenames - the ordering is inconsistent" width="152" height="449" class="aligncenter size-full wp-image-23438"></p>

<p>On the command line, we find yet another weird way to order files:</p>

<pre><code>10.mp3
૧૦.mp3
1.mp3
૧.mp3
2.mp3
૨.mp3
3.mp3
૩.mp3
一.mp3
三.mp3
二.mp3
十.mp3
</code></pre>

<p>Would <em>any</em> human expect an ordering like this?</p>

<h2 id="whats-the-solution"><a href="https://shkspr.mobi/blog/2016/11/how-do-you-sort-chinese-numbers/#whats-the-solution">What's the solution?</a></h2>

<p>I've complained before that <a href="https://shkspr.mobi/blog/2013/06/is-github-racist/">modern computing tools often ignore modern languages</a>.  Usually it's not outright racism - just an ignorance of how the world works and how people interact with machines.</p>

<p>The correct way, in my opinion, is to have <em>context aware</em> tools which empathise with what the user is trying to achieve.</p>

<p>There are several <a href="http://stackoverflow.com/questions/15076443/convert-numbers-in-chinese-characters-to-arabic-numbers">algorithms for converting "Chinese numbers" into "Arabic numbers"</a>.  When a tool encounters a character which represents a number, it should assume that <em>the numerical representation contains semantic meaning</em>.</p>
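
<p>As a rough illustration of what such a conversion might look like, here's a minimal Python sketch. It is not one of the algorithms from that link - the <code>chinese_to_int</code> function is my own simplified assumption. It only understands numbers below 10,000 and ignores larger units like 万 and colloquial abbreviations.</p>

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}


def chinese_to_int(text):
    """Convert a Chinese numeral below 10,000 into an int.

    Raises ValueError for any character outside DIGITS and UNITS.
    """
    total = 0
    pending = 0  # a digit waiting for its unit, or the final ones digit
    for ch in text:
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in UNITS:
            # A bare unit, like the leading 十 in 十一, counts as "one ten".
            total += (pending or 1) * UNITS[ch]
            pending = 0
        else:
            raise ValueError(f"not a supported numeral: {ch!r}")
    return total + pending


if __name__ == "__main__":
    names = ["九十", "四十二", "十一", "一百二十三"]
    # Sort by numeric value rather than by code point or length.
    print(sorted(names, key=chinese_to_int))
```

<p>Using the converted value as the sort key puts 四十二 (42) before 九十 (90), even though it is the longer string.</p>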

<p>Yes, it might be hard work - but that's what computers are here for. They do hard work so humans don't have to. And if your computer can't even sort files in the correct order, what else might it be getting wrong?</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2016/11/how-do-you-sort-chinese-numbers/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Why can't you send email to a Chinese address?]]></title>
		<link>https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/</link>
					<comments>https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Tue, 20 Sep 2016 11:41:33 +0000</pubDate>
				<category><![CDATA[usability]]></category>
		<category><![CDATA[chinese]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[unicode]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=23331</guid>

					<description><![CDATA[We all know what an email address looks like and how to validate them, right?  A few years ago I got the Chinese domain name 莎士比亚.org.  You can browse to it, link to it, and send email to it.  Or can you?  When I tried two years ago, none of the major email providers supported sending to non-ASCII email addresses.  Today, I tried again with six of the big &#34;Western&#34; webmail providers.  How did they…]]></description>
										<content:encoded><![CDATA[<p>We all know what an email address looks like and <a href="https://david-gilbertson.medium.com/the-100-correct-way-to-validate-email-addresses-7c4818f24643">how to validate them</a>, right?</p>

<p>A few years ago I got the Chinese domain name <a href="https://莎士比亚.org">莎士比亚.org</a>.  You can browse to it, link to it, and send email to it.  <em>Or can you?</em></p>

<p>When I tried <a href="https://shkspr.mobi/blog/2014/01/poor-idn-support-from-major-webmail-providers/">two years ago</a>, <strong>none</strong> of the major email providers supported sending to non-ASCII email addresses.</p>

<p>Today, I tried again with six of the big "Western" webmail providers.  How did they do?</p>

<h2 id="show-me-the-data"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#show-me-the-data">Show Me The Data!</a></h2>

<p>I tested by trying to send an email to <code>test@莎士比亚.org</code> and the <a href="https://en.wikipedia.org/wiki/Punycode">Punycode</a> representation <code>test@xn--jlq54w7ypemw.org</code></p>
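
<p>If you want to produce the Punycode form yourself, here's a small Python sketch using the standard library's built-in <code>idna</code> codec. The two helper names are my own, and the codec follows the older IDNA 2003 rules, so treat this as an illustration rather than a complete implementation.</p>

```python
def to_ascii_domain(domain):
    """Encode an internationalised domain name into its ASCII form."""
    return domain.encode("idna").decode("ascii")


def to_ascii_address(address):
    """Convert only the domain part of an address, leaving the local part."""
    local, _, domain = address.rpartition("@")
    return f"{local}@{to_ascii_domain(domain)}"


if __name__ == "__main__":
    print(to_ascii_address("test@莎士比亚.org"))
```

<p>The local part (the bit before the <code>@</code>) is left alone, because Punycode only applies to domain names.</p>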

<table>
<thead>
<tr>
  <th align="right"></th>
  <th align="center">test@莎士比亚.org</th>
  <th align="center">test@xn--jlq54w7ypemw.org</th>
</tr>
</thead>
<tbody>
<tr>
  <td align="right">Gmail</td>
  <td align="center"><span style="color:green">✔</span></td>
  <td align="center"><span style="color:green">✔</span></td>
</tr>
<tr>
  <td align="right">Outlook</td>
  <td align="center"><span style="color:green">✔</span></td>
  <td align="center"><span style="color:green">✔</span></td>
</tr>
<tr>
  <td align="right">Yahoo</td>
  <td align="center"><span style="color:red">❌</span></td>
  <td align="center"><span style="color:red">❌</span></td>
</tr>
<tr>
  <td align="right">iCloud</td>
  <td align="center"><span style="color:red">❌</span></td>
  <td align="center"><span style="color:green">✔</span></td>
</tr>
<tr>
  <td align="right">OWA</td>
  <td align="center"><span style="color:red">❌</span></td>
  <td align="center"><span style="color:green">✔</span></td>
</tr>
<tr>
  <td align="right">FastMail</td>
  <td align="center"><span style="color:green">✔</span> ⭐</td>
  <td align="center"><span style="color:green">✔</span></td>
</tr>
</tbody>
</table>

<h2 id="winners"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#winners">Winners!</a></h2>

<p>Both Gmail and Outlook failed the last time I tried them - I'm very pleased to say that both of them now support sending to Chinese addresses.</p>

<p>One strange thing to note: when looking through Outlook's message details, I found this example of <a href="https://en.wikipedia.org/wiki/Mojibake">Mojibake</a>.
<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/Outlook-Encoding-Issues-.png" alt="Outlook showing encoding errors, mangling up the email address" width="528" height="163" class="aligncenter size-full wp-image-23344"></p>

<h2 id="losers"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#losers">Losers!</a></h2>

<h3 id="yahoo"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#yahoo">Yahoo</a></h3>

<p>The biggest loser is Yahoo.  Very strange considering <a href="https://en.wikipedia.org/wiki/Jerry_Yang">Jerry Yang</a>, their co-founder, is Taiwanese-American.  Even stranger given <a href="https://en.wikipedia.org/wiki/Criticism_of_Yahoo!#Work_in_the_People.27s_Republic_of_China">Yahoo's continued dealings with China</a>.</p>

<p>The Yahoo webmail portal simply wouldn't let me send to a Chinese domain name.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/Yahoo-email-not-recognised-.png" alt="Yahoo unable to send a message to a Chinese email address" width="640" height="349" class="aligncenter size-full wp-image-23340">

<p>The Punycode representation appeared to send but immediately failed.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/Yahoo-unable-to-send-message-.png" alt="Yahoo unable to send a message to a Chinese email address" width="638" height="169" class="aligncenter size-full wp-image-23339">
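
<p>If you've not met Punycode before, here's a minimal Python sketch of the conversion.  It leans on the standard library's built-in <code>idna</code> codec (which follows the older IDNA 2003 rules) and only shows what the two forms of the domain look like - it isn't what any of these webmail providers actually run.</p>

```python
# The Unicode form of the domain, as a person would type it.
unicode_domain = "莎士比亚.org"

# Python's built-in "idna" codec converts each label to its ASCII
# (Punycode) form, adding the "xn--" prefix where needed.
ascii_domain = unicode_domain.encode("idna").decode("ascii")
print(ascii_domain)

# The conversion is reversible.
print(ascii_domain.encode("ascii").decode("idna"))
```

<p>A mail client could fall back to the ASCII form when the Unicode one is refused, which is roughly what I was doing by hand in these tests.</p>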

<h3 id="icloud"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#icloud">iCloud</a></h3>

<p>Apple's much-vaunted "It Just Works" philosophy obviously doesn't extend to International email addresses.  It accepted the Punycode but gave this <em>delightful</em> error message on the Chinese domain.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/iCloud-Delivery-Failure-Notification-.png" alt="iCloud showing a delivery failure notification" width="607" height="458" class="aligncenter size-full wp-image-23342">

<h3 id="owa"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#owa">OWA</a></h3>

<p>Microsoft's Outlook Web Access got <em>very</em> confused and tried to look up the email address in the local directory.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/OWA-No-Match-Found-.png" alt="Outlook Web Access showing no match found" width="543" height="170" class="aligncenter size-full wp-image-23343">

<h2 id="errr"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#errr">Errr?</a></h2>

<h3 id="%e2%ad%90-fastmail"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#%e2%ad%90-fastmail">⭐ FastMail</a></h3>

<p>Lots of people recommended that I try <a href="https://www.fastmail.com/">Fastmail</a> - it <em>really</em> didn't like the look of the Chinese domain and painted it with a red error colour.  That said, it sent the email without further complaint.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/Fastmail-showing-red-error-on-email-.png" alt="Fastmail apparently showing that the email address is invalid" width="545" height="436" class="aligncenter size-full wp-image-23345">

<h2 id="what-about-a-chinese-local-part"><a href="https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/#what-about-a-chinese-local-part">What about a Chinese Local-Part?</a></h2>

<p>Email is a venerable protocol. That's a polite way of saying it is old and outdated.  The <a href="https://en.wikipedia.org/wiki/Email_address#Local-part">local-part</a> of the email address (<code>test@</code>) is generally restricted to a handful of <a href="https://www.jochentopf.com/email/chars.html">7 Bit ASCII characters</a>.  None of the email providers I tried would let me sign up with a Chinese name. So no 你好@yahoo.com for me!</p>

<p>But what happens if you're foolish enough to try to send an email to <code>你好@莎士比亚.org</code>?</p>

<p>Well you'll probably get this error message:</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2016/09/SMTPUTF8-Delivery-Failure-Notification.png" alt="Technical details of permanent failure: local-part of envelope RCPT address contains utf8 but remote server did not offer SMTPUTF8" width="659" height="160" class="aligncenter size-full wp-image-23348">

<p>In 2012, <a href="https://tools.ietf.org/html/rfc6531">RFC 6531 defined how International Email Addresses should work</a>.  Over four years later and <a href="https://en.wikipedia.org/wiki/Extended_SMTP#SMTPUTF8">support is <em>still</em> not widespread</a>.</p>
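
<p>That error message boils down to a simple rule.  Here's a rough Python sketch of it, assuming the sending server already has the list of extension keywords which the remote server advertised in its EHLO reply.  The function names and the example keyword sets are invented for illustration - they aren't taken from any real mail server.</p>

```python
def needs_smtputf8(address: str) -> bool:
    """Return True if the local-part (before the last @) is not pure ASCII."""
    local_part, _at, _domain = address.rpartition("@")
    return not local_part.isascii()


def can_deliver(address: str, ehlo_keywords: set) -> bool:
    """Decide whether a message can be handed to the remote server.

    Simplifying assumption: a non-ASCII domain can always be sent in its
    Punycode form, so only the local-part forces the use of SMTPUTF8.
    """
    if not needs_smtputf8(address):
        return True
    offered = {keyword.upper() for keyword in ehlo_keywords}
    return "SMTPUTF8" in offered


# Hypothetical EHLO keyword sets, purely for illustration.
old_server = {"PIPELINING", "8BITMIME", "STARTTLS"}
new_server = {"PIPELINING", "8BITMIME", "STARTTLS", "SMTPUTF8"}

print(can_deliver("test@莎士比亚.org", old_server))
print(can_deliver("你好@莎士比亚.org", old_server))
print(can_deliver("你好@莎士比亚.org", new_server))
```

<p>Only the middle case fails: a Chinese local-part sent to a server which never offered SMTPUTF8.</p>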

<p>It's 2016 and the majority of the world <strong>can't send an email to their preferred name</strong>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[How Do You Pronounce Your Domain Name?]]></title>
		<link>https://shkspr.mobi/blog/2013/12/how-do-you-pronounce-your-domain-name/</link>
					<comments>https://shkspr.mobi/blog/2013/12/how-do-you-pronounce-your-domain-name/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Fri, 27 Dec 2013 14:15:33 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[chinese]]></category>
		<category><![CDATA[domains]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[rant]]></category>
		<guid isPermaLink="false">http://shkspr.mobi/blog/?p=9392</guid>

					<description><![CDATA[I was listening to a podcast recently which was kind enough to mention one of my blog posts.  The presenter said:  ...and you should Google for this, because I&#039;m really not sure how to pronounce this.  Is it shu-huk-spur? dot mobby?  Le sigh!  It&#039;s a conversation I have most weeks when I&#039;m on the phone to someone - usually a call centre - and they ask for my email address.  &#34;Sierra Hotel Kilo…]]></description>
										<content:encoded><![CDATA[<p>I was listening to a podcast recently which was kind enough to mention one of my blog posts.  The presenter said:</p>

<blockquote>...and you should Google for this, because I'm really not sure how to pronounce this.  Is it shu-huk-spur? dot mobby?</blockquote>

<p><em>Le sigh!</em>  It's a conversation I have most weeks when I'm on the phone to someone - usually a call centre - and they ask for my email address.</p>

<blockquote>"Sierra Hotel Kilo Sierra Papa Romeo Dot Mike Oscar Bravo India"</blockquote>

<p>Whereupon I am inevitably asked:</p>

<blockquote>Is that dot com or dot co dot UK at the end, sir?</blockquote>

<p>Yes! I have chosen an almost unpronounceable domain on an obscure TLD.  Woe is me!</p>

<p>Originally, I thought this wouldn't be a problem. Typing in the domain is quick and easy.  But a surprising number of organisations still insist on taking personal data over the phone.  Which means more reading out the phonetic spelling.</p>

<p>Frustratingly, a large number of websites refuse to accept .mobi as a valid TLD for email addresses.  The geniuses who coded them appeared to think that every email address must end with a 3 character (.com, .org, .net) or 2 character (.uk, .de, .io) sequence.  Despite the fact that there are <a href="http://www.iana.org/domains/root/db">dozens of domains which don't fit in this restriction</a>.</p>
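
<p>I haven't seen the code behind any of these sites, so this is only a guess at the sort of check they run, next to a more forgiving alternative.  Both the pattern and the function name are made up for illustration, and the addresses are placeholders.</p>

```python
import re

# A guess at the over-strict pattern: the TLD must be 2 or 3 ASCII letters.
NAIVE_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,3}$", re.IGNORECASE)


def looks_like_email(address: str) -> bool:
    """A deliberately loose check: something, an @, and a dotted domain.

    It makes no assumption about how long the TLD is or which script it
    is written in.  Sending a confirmation email is the only real test.
    """
    local_part, at, domain = address.rpartition("@")
    return bool(local_part) and bool(at) and "." in domain.strip(".")


for address in ["test@example.com", "test@example.mobi", "test@example.中国"]:
    print(address, bool(NAIVE_PATTERN.match(address)), looks_like_email(address))
```

<p>The strict pattern accepts the .com address and rejects the other two; the loose check accepts all three.</p>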

<h2 id="doubling-down"><a href="https://shkspr.mobi/blog/2013/12/how-do-you-pronounce-your-domain-name/#doubling-down">Doubling Down</a></h2>

<p>Being the belligerent sod that I am, I refuse to give in to the tyranny of the spoken word!  We live in a digital world and digital data should be communicated by digital means.  I want to impart information like my email address over the wire - not over the phone.</p>

<p>Regular readers will know that I was thwarted in my quest to buy a .中国 domain - but I did manage to grab <a href="http://莎士比亚.org/" title="http://莎士比亚.org/">http://莎士比亚.org/</a>.</p>

<p>I think I'm going to move my primary email to that domain.  When I get some call-centre who won't let me fill in a form online to give them my details, I shall very politely say my email address is:</p>

<blockquote>Eden - yes, like the garden - at Shā​shì​bǐ​yà... Oh, of course, the <a href="http://commons.wikimedia.org/wiki/Commons:Stroke_Order_Project">stroke order</a> is... Well, no, it's a Mandarin Chinese domain... No... No... Fine, would you like the punycode representation?  Hello?</blockquote>

<p>I'll also refuse to do business with any organisation which doesn't recognise IDN email addresses. That'll show 'em!</p>

<p>Perhaps I'll also move this blog over to that domain as well. I wonder what impact speakability has on SEO?</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2013/12/how-do-you-pronounce-your-domain-name/feed/</wfw:commentRss>
			<slash:comments>21</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Introducing 莎士比亚.org - Readable Shakespeare Plays In Chinese]]></title>
		<link>https://shkspr.mobi/blog/2013/06/introducing-%e8%8e%8e%e5%a3%ab%e6%af%94%e4%ba%9a-%e4%b8%ad%e5%9b%bd-readable-shakespeare-plays-in-chinese/</link>
					<comments>https://shkspr.mobi/blog/2013/06/introducing-%e8%8e%8e%e5%a3%ab%e6%af%94%e4%ba%9a-%e4%b8%ad%e5%9b%bd-readable-shakespeare-plays-in-chinese/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sat, 15 Jun 2013 13:33:57 +0000</pubDate>
				<category><![CDATA[Shakespeare]]></category>
		<category><![CDATA[chinese]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[shakespeare]]></category>
		<guid isPermaLink="false">http://shkspr.mobi/blog/?p=8382</guid>

					<description><![CDATA[I&#039;m very pleased to announce the launch of 莎士比亚.org - beautiful and readable copies of Shakespeare plays in Chinese.  If you would like to help, the text is available on GitHub for people to correct.   Why?  I&#039;ve long held a fascination with Shakespeare - hence the name of this website.  At university I studied Mandarin as my minor degree.  I was a clumsy student, but enjoyed the regularity and po…]]></description>
										<content:encoded><![CDATA[<p>I'm very pleased to announce the launch of <a href="http://莎士比亚.org/">莎士比亚.org</a> - beautiful and readable copies of Shakespeare plays in Chinese.</p>

<p>If you would like to help, the <a href="https://github.com/edent/xn--jlQ54W7yPemW.org/">text is available on GitHub</a> for people to correct.
<a href="http://xn--jlq54w7ypemw.org/第十二夜"><img src="https://shkspr.mobi/blog/wp-content/uploads/2013/06/Chinese-Shakespeare.jpg" alt="Chinese Shakespeare" width="567" height="494" class="aligncenter size-full wp-image-8383"></a></p>

<h2 id="why"><a href="https://shkspr.mobi/blog/2013/06/introducing-%e8%8e%8e%e5%a3%ab%e6%af%94%e4%ba%9a-%e4%b8%ad%e5%9b%bd-readable-shakespeare-plays-in-chinese/#why">Why?</a></h2>

<p>I've long held a fascination with Shakespeare - hence the name of this website.  At university I studied Mandarin as my minor degree.  I was a clumsy student, but enjoyed the regularity and poetry of the language.</p>

<p>I discovered the Chinese writer <a href="http://en.wikipedia.org/wiki/Zhu_Shenghao">Zhu Shenghao</a> had translated many of Shakespeare's plays before his untimely death in 1944.  I couldn't find any comprehensive collection online, so I gathered together what I could find and reformatted it in what I consider to be a beautiful format.</p>

<p>According to <a href="http://en.wikisource.org/wiki/Copyright_Law_of_the_People%27s_Republic_of_China_(2010)#Section_3_Term_of_Protection">Chinese copyright law</a> the rights expire fifty years after the author's death.</p>

<h2 id="how"><a href="https://shkspr.mobi/blog/2013/06/introducing-%e8%8e%8e%e5%a3%ab%e6%af%94%e4%ba%9a-%e4%b8%ad%e5%9b%bd-readable-shakespeare-plays-in-chinese/#how">How?</a></h2>

<p>I am indebted to <a href="https://twitter.com/beng">Ben Griffiths</a>'s code "<a href="https://github.com/techbelly/the-plays-the-thing">The Play's The Thing</a>" which marks up Shakespeare's plays in XML, then uses Ruby to create a static site.</p>

<p>So, I've learned how to use PHP's DOMDocument to manipulate atrociously formatted HTML, discovered more than I ever wanted to know about different methods of encoding Chinese characters (UTF-8, GB2312, Big5, etc.) and how to convert between them, got my head around Ruby to parse XML and spit out HTML, and learned a whole bunch about <a href="https://shkspr.mobi/blog/2013/05/subsetting-chinese-fonts/" title="Subsetting (Chinese) Fonts">font subsetting</a> in order to reduce the size of the webfonts.</p>
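
<p>For anyone facing the same encoding mess, the conversion step looks roughly like this in Python.  This is an illustrative sketch, not the code used to build the site, and the sample text is just the site's name rather than a real play.</p>

```python
# Pretend these bytes came from a legacy GB2312-encoded web page.
legacy_bytes = "莎士比亚".encode("gb2312")

# Decode with the encoding the page was really written in...
text = legacy_bytes.decode("gb2312")

# ...then re-encode as UTF-8 for the new site.
utf8_bytes = text.encode("utf-8")

print(len(legacy_bytes), len(utf8_bytes))
```

<p>The tricky part in practice is knowing which encoding a page really uses, because decoding with the wrong one produces Mojibake rather than an error.</p>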

<p>On the issue of fonts, I've chosen four different fonts - each for different heading or body texts.  The fonts used for the titles range from 20KB-360KB.  The main body font is 3.2MB.  That's probably far too large.  They're subsetted down to the bare minimum number of characters needed, but I'll have to find a way to improve on that.</p>

<p>I showed the pages off to a couple of native Chinese speakers and they confirmed that the fonts were legible.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2013/06/Romeo-Juliet-Chinese.jpg" alt="Romeo Juliet Chinese" width="470" height="419" class="aligncenter size-full wp-image-8399">

<h2 id="why-not-%e4%b8%ad%e5%9b%bd"><a href="https://shkspr.mobi/blog/2013/06/introducing-%e8%8e%8e%e5%a3%ab%e6%af%94%e4%ba%9a-%e4%b8%ad%e5%9b%bd-readable-shakespeare-plays-in-chinese/#why-not-%e4%b8%ad%e5%9b%bd">Why not .中国?</a></h2>

<p>I had originally registered an Internationalised Domain Name ending in .中国 - sadly, the Chinese Registry rejected the name.  According to the <a href="http://sbcx.saic.gov.cn/trade-e/">Chinese Trademark Registry</a> there are a dozen companies who have registered "Shakespeare" as a trademark.  And, unless I was prepared to pay over $500 for a trademark, I was out of luck.</p>

<p>Hmph.</p>

<h2 id="help"><a href="https://shkspr.mobi/blog/2013/06/introducing-%e8%8e%8e%e5%a3%ab%e6%af%94%e4%ba%9a-%e4%b8%ad%e5%9b%bd-readable-shakespeare-plays-in-chinese/#help">Help?</a></h2>

<p>I'm sure that the translations aren't necessarily formatted in the correct way.  There are also likely to be plays missing.  My design ambitions exceed my skills.</p>

<p>If you would like to help improve this service, please pop along to the <a href="https://github.com/edent/xn--jlQ54W7yPemW.org/">GitHub repository</a>.  There, you will be able to leave a report if something is wrong or even make the changes yourself.</p>

<p>Or, drop a note in the comment box below!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2013/06/introducing-%e8%8e%8e%e5%a3%ab%e6%af%94%e4%ba%9a-%e4%b8%ad%e5%9b%bd-readable-shakespeare-plays-in-chinese/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Is GitHub Racist?]]></title>
		<link>https://shkspr.mobi/blog/2013/06/is-github-racist/</link>
					<comments>https://shkspr.mobi/blog/2013/06/is-github-racist/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sat, 08 Jun 2013 16:20:26 +0000</pubDate>
				<category><![CDATA[usability]]></category>
		<category><![CDATA[accents]]></category>
		<category><![CDATA[chinese]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[racism]]></category>
		<guid isPermaLink="false">http://shkspr.mobi/blog/?p=8316</guid>

					<description><![CDATA[One of the interesting aspects of privilege is how it lays bare our unconscious assumptions about the world.  A male software developer may never consider that a user would want or need to change their name.  Thus they would design a product which ignored the millions of women changing their names after marriage.  It&#039;s very tempting to see software as racist when, in reality, it&#039;s more likely to…]]></description>
										<content:encoded><![CDATA[<p>One of the interesting aspects of privilege is how it lays bare our unconscious assumptions about the world.  A male software developer may never consider that a user would want or need to change their name.  Thus they would design a product which ignored the millions of women changing their names after marriage.</p>

<p>It's very <a href="https://web.archive.org/web/20130812093717/http://mymisanthropicmusings.org.uk/is-my-crm-racist/">tempting to see software as racist</a> when, in reality, it's more likely to have a root cause of unconscious assumptions.</p>

<p>Take, for example, <a href="http://github.com">GitHub</a>.  You can host all of your software projects on there - as long as you speak English.</p>

<p>Wait? What?</p>

<p>Try adding a repository which contains, say, Chinese - and all those beautiful characters will be replaced with "-".
<img src="https://shkspr.mobi/blog/wp-content/uploads/2013/06/Chinese-GitHub-fs8.png" alt="Chinese GitHub" width="382" height="215" class="aligncenter size-full wp-image-8320"></p>

<p>I asked GitHub about this, and quickly got this reply.</p>

<blockquote>Unfortunately, at the moment, you can only use ASCII (i.e. Windows-1252) characters in Repo names. Most things on GitHub.com support non-ASCII but because of limitations in Git, the repo name isn't one of them. Sorry about the international-unfriendliness</blockquote>

<p>Interestingly, that's not quite the case.  <a href="https://en.wikipedia.org/wiki/Windows-1252">Windows-1252</a> contains some characters with accents - they simply aren't recognised by GitHub.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2013/06/Accents-Github-fs8.png" alt="Accents Github" width="369" height="217" class="aligncenter size-full wp-image-8319">
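
<p>The difference between the two encodings is easy to demonstrate.  This is a small Python sketch using the standard codecs - it says nothing about how GitHub validates names, it just shows that an accented letter is valid Windows-1252 but not valid ASCII.  The helper name is my own.</p>

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for sample in ["cafe", "café", "莎士比亚"]:
    print(sample, can_encode(sample, "ascii"), can_encode(sample, "cp1252"))
```

<p>So "ASCII" and "Windows-1252" are not interchangeable terms: the accented word fits the second but not the first, and the Chinese fits neither.</p>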

<p>We don't live in a homogeneous world. US English is not the global language.  Even if it was, ASCII is insufficient to the task of information interchange.</p>

<p><a href="https://en.wikipedia.org/wiki/ASCII">ASCII was first published in 1963</a> - 50 years later and our brand new shiny kit is hamstrung by the needs of the <em>telegraph</em> industry!  It's like that wonderful urban legend about the <a href="http://www.snopes.com/history/american/gauge.asp">Space Shuttle being constrained by the size of a horse's arse</a>.</p>

<p>Obviously, GitHub isn't racist.  Either they or the originators of Git have assumed that their local dialect is sufficient for a service which aims to be universally acceptable.  All the more strange given that Linus Torvalds, the creator of Git, is Finnish and - one presumes - knows about <em><a href="http://en.wikipedia.org/wiki/Finnish_alphabet">ääkköset</a></em> (the "extra" letters in the Finnish alphabet).</p>

<p>At this stage in the maturity of the software industry, we should consider the practice of not supporting Unicode as outmoded and dangerous as assuming every year can be represented by a two digit number.</p>

<p>There's a world outside our narrow viewpoint and, if we want to do business with that world, we need to speak their language.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2013/06/is-github-racist/feed/</wfw:commentRss>
			<slash:comments>11</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Subsetting (Chinese) Fonts]]></title>
		<link>https://shkspr.mobi/blog/2013/05/subsetting-chinese-fonts/</link>
					<comments>https://shkspr.mobi/blog/2013/05/subsetting-chinese-fonts/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 22 May 2013 07:52:11 +0000</pubDate>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[chinese]]></category>
		<category><![CDATA[font]]></category>
		<category><![CDATA[fonts]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[unicode]]></category>
		<category><![CDATA[utf-8]]></category>
		<guid isPermaLink="false">http://shkspr.mobi/blog/?p=8294</guid>

					<description><![CDATA[There are loads of really delightful Simplified and Traditional Chinese True Type Fonts available on the web.  There&#039;s only one issue - the file sizes are really large.  In many cases, too large to effectively use as a web-font.  For example, this calligraphy style font is 3.4MB.    The beautiful Paper Cut Font weighs in at 14MB!    That file-size is far too heavy to embed on a web page. …]]></description>
										<content:encoded><![CDATA[<p>There are loads of really delightful <a href="https://web.archive.org/web/20130520092556/http://chinesefont.brushes8.com/tag/simplified-chinese-font">Simplified and Traditional Chinese True Type Fonts</a> available on the web.  There's only one issue - the file sizes are really large.  In many cases, too large to effectively use as a web-font.</p>

<p>For example, this <a href="https://web.archive.org/web/20121219024439/http://chinesefont.brushes8.com/richwin-fonts/richwin-xing-kai-jian-fan-font.html">calligraphy style font</a> is 3.4MB.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2013/05/Richwin-Xing-kai-jian-Fan-Font-fs8.png" alt="Richwin-Xing-kai-jian-Fan-Font-fs8" width="445" height="79" class="alignnone size-full wp-image-8296">

<p>The beautiful <a href="https://web.archive.org/web/20140608004145/http://chinesefont.brushes8.com/xin-di-paper-cut-font-simplified-chinese.html">Paper Cut Font</a> weighs in at 14MB!</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2013/05/Paper-Cut-Chinese-Font-fs8.png" alt="Paper Cut Chinese Font-fs8" width="487" height="223" class="alignnone size-full wp-image-8297">

<p>That file-size is far too heavy to embed on a web page.</p>

<h2 id="subsetting"><a href="https://shkspr.mobi/blog/2013/05/subsetting-chinese-fonts/#subsetting">Subsetting</a></h2>

<p>Generally speaking, font files like .ttf contain a representation of every single character: 0-9, a-z, A-Z, all the punctuation, non-English characters, etc.</p>

<p>That's really useful if the font is installed on your computer and you want to write a document which <em>could</em> contain every character.  It's less helpful if you want to use a fancy font on your website's headers.</p>

<p>Subsetting is the act of creating a subset of a font.  That is, a font file which only contains specific characters.</p>

<p>Let's suppose that we only want a specific phrase rendered in this font.</p>

<pre>我很丢脸。我没有吃Fruity Oaty Bar</pre>

<p>We only need 18 unique characters (counting the space) - we can get rid of any character which doesn't appear in that heading.</p>
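
<p>Working out which characters are needed takes very little code.  A minimal Python sketch, which treats the space as a character like any other:</p>

```python
phrase = "我很丢脸。我没有吃Fruity Oaty Bar"

# A set keeps one copy of each character, including the space.
unique_characters = sorted(set(phrase))

print(len(phrase), len(unique_characters))
print("".join(unique_characters))
```

<p>The second line of output is the string you would hand to a subsetting tool.</p>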

<p>There are several font manipulation tools available.  I've chosen <a href="https://web.archive.org/web/20130522120140/http://fonts.philip.html5.org/">Font Optimizer</a> which has an excellent live demo page.  The <a href="https://web.archive.org/web/20160322095140/https://bitbucket.org/philip/font-optimizer/overview">source code is on BitBucket</a>.</p>

<p>The command line syntax is really simple:</p>

<pre>./subset.pl --chars="我很丢脸。我没有吃Fruity Oaty Bar" input.ttf output.ttf</pre>

<p>The file size reduction is impressive.  My original font was over 14MB.  The optimized one is 32<strong>K</strong>B.</p>

<pre>14,066,456 input.ttf
   32,084 output.ttf
</pre>
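
<p>To put those two byte counts in proportion, here is the arithmetic as a few lines of Python.  The numbers are the ones from the listing above; nothing else is assumed.</p>

```python
original_bytes = 14_066_456
subset_bytes = 32_084

# How many times smaller the subset is, and the fraction of bytes saved.
ratio = original_bytes / subset_bytes
saving = 1 - subset_bytes / original_bytes

print(f"{ratio:.0f} times smaller, a saving of {saving:.2%}")
```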

<p>The process runs instantly - fast enough to run as a web service to generate these fonts dynamically, I would think.</p>

<p>One could quite easily create a scrap of JavaScript which read the contents of a block of text and then requested a font which contained only the necessary characters.</p>
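
<p>To keep the examples in one language, here is that idea sketched in Python rather than JavaScript.  The endpoint and its <code>font</code> and <code>chars</code> parameters are entirely hypothetical - no such service is described here - so treat this as an outline of the request, not a working client.</p>

```python
from urllib.parse import urlencode


def subset_font_url(text: str, font_name: str) -> str:
    """Build a request URL for a font containing only the characters in text.

    The endpoint and its "font" and "chars" parameters are hypothetical.
    """
    unique_characters = "".join(sorted(set(text)))
    query = urlencode({"font": font_name, "chars": unique_characters})
    return "https://example.com/subset?" + query


print(subset_font_url("Oaty Bar", "paper-cut"))
```

<p>Sorting the characters means two pages which use the same set of characters ask for the same URL, so the generated font can be cached.</p>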

<p>Apparently, <a href="https://web.archive.org/web/20130620105955/http://scripts.sil.org/cms/scripts/page.php?item_id=OFL_web_fonts_and_RFNs#b4599c52">Monotype have a proprietary and patent-pending solution</a> to this rather trivial application.</p>

<h2 id="uses"><a href="https://shkspr.mobi/blog/2013/05/subsetting-chinese-fonts/#uses">Uses</a></h2>

<p>Being able to subset fonts to reduce file size is incredibly useful.  Suppose you want a different font for body text, headers, and navigation.  Rather than having to load three large font files containing every character in the known universe, you could subset each one to contain only the relevant characters.</p>

<p>This also has an interesting DRM-like effect.  Some people don't want their shiny web fonts to be downloaded and used as a regular font.  With subsetting, the font only contains the specific characters used on the page, so it is of little use anywhere else.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2013/05/subsetting-chinese-fonts/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
