Amazon Prime Video's weird Unicode problems
It's 2019 and high-tech devices are still plagued by text encoding bugs.
I recently bought the new 4K Amazon Fire Stick. It's a little Android dongle which plays videos. It's neat - but quite often displays weird text errors.
Take the kids' TV show House of Anubis. The Fire displays the description like this:
Looking at the source code for the description:
That's the character "private use two" (U+0092). What on earth is that doing there?
Well, in the ancient Windows-1252 encoding, 0x92 is ’ - the curly quote. It looks like someone has tried to write "in the house’s attic" on an old Windows machine, Amazon have ingested the data thinking it was Unicode, and displayed an error!
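Here's a minimal sketch of how that could happen. I'm assuming the ingest step read the bytes as ISO-8859-1 (Latin-1), where every byte maps straight to the code point with the same number. I don't know what Amazon's pipeline actually does, and the sample bytes are reconstructed from the description above.

```python
# Bytes as an old Windows machine might have saved them (assumed sample data).
raw = b"in the house\x92s attic"

# Read correctly as Windows-1252, byte 0x92 is the curly apostrophe U+2019.
as_cp1252 = raw.decode("cp1252")

# Read as Latin-1 (an assumption about the bug), byte 0x92 becomes the
# C1 control character U+0092, "private use two".
as_latin1 = raw.decode("latin-1")

print(as_cp1252)
print(repr(as_latin1))
```

The same bytes give a readable apostrophe or an unprintable control character, depending only on which encoding the reader guesses.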
Here's "Young Sheldon":
How does the word "isn't" become "isnâ€™t"? There's a great explanation from Justin Weiss, basically:
- The Unicode character ’ (U+2019) is encoded in UTF-8 as 3 bytes: 0xE2 0x80 0x99
- When those three bytes are read as Windows-1252, they become three separate characters: â, € and ™
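Those two steps can be reproduced in a couple of lines of Python. This is only an illustration of the general mechanism, not a claim about the code Amazon runs.

```python
apostrophe = "\u2019"  # the curly apostrophe ’

# Step 1: UTF-8 encodes U+2019 as three bytes.
utf8_bytes = apostrophe.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 80 99

# Step 2: reading those bytes as Windows-1252 yields three characters.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # â€™

# The same round trip applied to the whole word.
word = "isn\u2019t".encode("utf-8").decode("cp1252")
print(word)  # isnâ€™t
```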
The Android version of Prime Video goes a step further:
The Euro Symbol has been converted into its HTML entity &euro;
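A rough sketch of that extra step, assuming something in the pipeline escapes only the Euro sign and the app then shows the text without unescaping it. The `replace` call is my stand-in for whatever escaping really happens. `html.unescape` is the standard library call that would undo it.

```python
import html

# Start from the mangled apostrophe: â, € and ™.
mojibake = "isn\u2019t".encode("utf-8").decode("cp1252")

# Hypothetical escaping step: only the Euro sign becomes a named entity.
escaped = mojibake.replace("\u20ac", "&euro;")
print(escaped)  # isnâ&euro;™t

# Unescaping recovers the (still mangled) text from the previous step.
unescaped = html.unescape(escaped)
print(unescaped == mojibake)  # True
```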
There are hundreds of examples of video descriptions being mangled like this.
And, occasionally, you find this hot mess:
What's going on?
Two countries separated by a common language
Except, of course, this doesn't happen on the American version of Amazon Prime Video.
Here's the description of Gotham S03 from Amazon USA:
Once the text gets to this side of the pond, it goes horribly wrong.
As well as the â€™ issue, there's another new snag. The hyphen is actually "-", a hyphen-minus (U+002D). Something in Amazon's conversion process is transforming that to "–", an en dash (U+2013). That then gets buggily encoded to â€".
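The en dash goes through the same mangling as the apostrophe, and as long as nothing else has touched the text the damage can be reversed. A small sketch follows. The helper names `mangle` and `repair` are mine, and the repair only works when every mangled character still maps back to a Windows-1252 byte.

```python
def mangle(text: str) -> str:
    """Encode as UTF-8, then misread the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Reverse the misreading: back to bytes via Windows-1252, then UTF-8."""
    return text.encode("cp1252").decode("utf-8")


en_dash = "\u2013"
mangled = mangle(en_dash)

# U+2013 is E2 80 93 in UTF-8. In Windows-1252, byte 0x93 is the
# left double quotation mark, so the third character is a quote mark.
print([f"U+{ord(ch):04X}" for ch in mangled])  # ['U+00E2', 'U+20AC', 'U+201C']

print(repair(mangled) == en_dash)  # True
```

The hyphen-minus itself is plain ASCII and would survive this round trip untouched, which is why the substitution to an en dash has to happen first for the bug to appear.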
I've reported these errors to my friends at Amazon - they are in the process of correcting them.
You can buy the Amazon Fire TV Stick 4K Ultra HD for just £50.
Andy Mabbett says:
£50.
I see what you did there.