Amazon Prime Video’s weird Unicode problems

It’s 2019 and high-tech devices are still plagued by text encoding bugs.

I recently bought the new 4K Amazon Fire Stick. It’s a little Android dongle which plays videos. It’s neat – but quite often displays weird text errors.

Take the kids’ TV show House of Anubis. The Fire displays the description like this:
A description of a TV show - there are some weird blank characters on screen.

Looking at the source code for the description:
Welcome to Anubis House. It’ is at this boarding school that students begin settling in for the academic year only to find one student missing, a secret panel in the house’s attic, codes leading to backroom staircases, clandestine rituals, and more!

That’s the character “private use two” (U+0092). What on earth is that doing there?

Well, in the ancient Windows-1252 encoding, the byte 0x92 is ’ – the curly quote (U+2019). It looks like someone has tried to write “in the house’s attic” on an old Windows machine, Amazon have ingested the data thinking it was Unicode, and displayed an error!
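A minimal sketch of that mix-up in Python. The byte string is a made-up sample, and decoding as ISO-8859-1 is only an assumption about what the ingestion step might be doing; it is one common way for a lone 0x92 byte to end up as U+0092.

```python
# Hypothetical payload: "house's attic" saved on a Windows-1252 system,
# where the curly apostrophe is the single byte 0x92.
raw = b"house\x92s attic"

# Decoded with the encoding it was written in, the apostrophe survives.
as_cp1252 = raw.decode("cp1252")

# Decoded as ISO-8859-1 (an assumed stand-in for whatever Amazon does),
# byte 0x92 maps straight to code point U+0092, a C1 control character.
as_latin1 = raw.decode("latin-1")

print(as_cp1252)                      # house’s attic
print(f"U+{ord(as_latin1[5]):04X}")   # U+0092
```

Most fonts have no glyph for U+0092, which would explain the blank box on screen.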

Here’s “Young Sheldon”:

Description with an error in it.

How does the word “isn’t” become “isnâ€™t”? There’s a great explanation from Justin Weiss. Basically:

  • The curly apostrophe ’ (U+2019) takes three bytes in UTF-8 – 0xE2 0x80 0x99
  • When those three bytes are read as if they were Windows-1252, they become â € ™
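The two steps above can be reproduced in a few lines of Python. This is just an illustration of the mechanism; the helper name `mojibake` is made up for the example.

```python
def mojibake(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as Windows-1252."""
    # Note: a few bytes (such as 0x81) have no mapping in Python's cp1252
    # codec, so this sketch raises UnicodeDecodeError for some inputs.
    return text.encode("utf-8").decode("cp1252")

apostrophe = "\u2019"                        # the curly apostrophe
print(apostrophe.encode("utf-8").hex(" "))   # e2 80 99
print(mojibake("isn\u2019t"))                # isnâ€™t
```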

The Android version of Prime Video goes a step further:

Screenshot of the same text on Android. An additional error has crept in.

The Euro symbol has been converted into its HTML entity, &euro;.
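Here is a rough sketch of how that extra layer could arise, using Python's standard `html.entities` table. Escaping only the Euro sign is an assumption made to match the description of the screenshot; why the other two characters were left alone is not known. The helper `to_named_entity` is hypothetical.

```python
from html.entities import codepoint2name

def to_named_entity(ch: str) -> str:
    """Return the named HTML entity for a character, or the character itself."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else ch

# Start from the already-garbled text (UTF-8 bytes read as Windows-1252).
garbled = "isn\u2019t".encode("utf-8").decode("cp1252")   # isnâ€™t

# Assumption: only the Euro sign gets escaped, then shown without unescaping.
android = garbled.replace("\u20ac", to_named_entity("\u20ac"))
print(android)   # isnâ&euro;™t
```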

There are hundreds of examples of video descriptions being mangled like this.

And, occasionally, you find this hot mess:

Dozens of weird characters on screen.
What’s going on?

Two countries separated by a common language

Except, of course, this doesn’t happen on the American version of Amazon Prime Video.

Here’s the description of Gotham S03 from Amazon USA:

Screencap from the American version. All the text is pristine.

Once the text gets to this side of the pond, it goes horribly wrong.
Screencap showing the description text from the UK. It is mangled.

As well as the â€™ issue, there’s another new snag. The hyphen in the American text is actually a hyphen-minus (U+002D). Something in Amazon’s conversion process is transforming that into an en dash (U+2013).

That then gets buggily encoded to â€“.
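The en dash takes the same wrong turn as the apostrophe: its UTF-8 bytes are 0xE2 0x80 0x93, and 0x93 in Windows-1252 is a left curly double quote. The sketch below shows that, plus one possible way to undo this kind of damage by reversing the two steps. The `repair` helper is hypothetical and is not a description of how Amazon fixes its data.

```python
def repair(text: str) -> str:
    """Try to undo 'UTF-8 read as Windows-1252'; return text unchanged on failure."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake (or not mojibake at all): leave it alone.
        return text

en_dash = "\u2013"
broken = en_dash.encode("utf-8").decode("cp1252")
print([f"U+{ord(c):04X}" for c in broken])   # ['U+00E2', 'U+20AC', 'U+201C']

print(repair(broken) == en_dash)             # True
print(repair("isn\u00e2\u20ac\u2122t"))      # isn’t
```

Text that was never mangled passes through untouched, because either the encode or the decode step fails and the original string is returned.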

I’ve reported these errors to my friends at Amazon – they are in the process of correcting them.


