Virgin Media don't understand Unicode


More adventures with Unicode. I logged in to my Virgin Media account to see when my promotional discount would end. Here's what their billing PDF said.

Promotional offer(s) You'Re Receiving A Â2 Loyalty Discount. If You Change Your Package You May Lose This Discount Which Will End 10-OCT-18.

Let'S Ignore The Weird Capitalisation Virgin'S System Uses. What's that Â doing there?

Their website says:

 You'Re Receiving A 2 Loyalty Discount. If You Change Your Package You May Lose This Discount Which Will End 10-OCT-18

No Â symbol, but also no £ sign. Ah, but let's look at the underlying code.

[Image: HTML code from Virgin.]

What's that weird character? It is the String Terminator control character (U+009C), of course...

Well, my discount is nearly finished, so I asked them for a larger discount. "Sure!" they said. "How does Ÿ3 sound?"

Promotional Offer(s) 6m œ3.00 RIV Discount (until 17th February 2019).

Amusingly, when I copy the Ÿ from that PDF, it shows up as the character œ!

What's Going On?

I've written extensively about how the £ symbol is encoded - but here's a primer.

  • £ in ISO-8859-1 (Latin-1) is decimal 163 (hex 0xA3).
  • £ in Unicode is also 163 (U+00A3) - but in UTF-8 it gets stored as two bytes - 194 & 163. In hex this is 0xC2 0xA3.
  • In Windows-1252 - the legacy encoding for ancient versions of Microsoft's software - 0xC2 gets rendered as Â.

So, at some point, Virgin's billing software is seeing the bytes 0xC2 0xA3, interpreting them as the two characters Â£, and then grabbing only the first character to print on the bills.
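
That chain of events can be reproduced in a few lines. This is only a sketch of the suspected failure, not Virgin's actual code: the "keep the first character" step is an assumption that happens to match the output on the bill.

```python
# Sketch of the suspected failure: UTF-8 bytes read back as Windows-1252.
pound = "£"

utf8_bytes = pound.encode("utf-8")      # two bytes: 0xC2 0xA3
misread = utf8_bytes.decode("cp1252")   # each byte becomes its own character
first_char = misread[0]                 # hypothetical truncation step

print(list(utf8_bytes))  # [194, 163]
print(misread)           # Â£
print(first_char)        # Â
```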

Where do the other characters come from?

  • In Code Page 437 - an ancient IBM encoding - the £ symbol is 0x9C.
  • 0x9C in Windows-1252 is œ.
  • The String Terminator control character is U+009C, which UTF-8 stores as 0xC2 0x9C.
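
Those three facts can be checked with Python's built-in codecs. The variable names are just for illustration, and the idea that the website treats the byte as Latin-1 before re-encoding it as UTF-8 is a guess on my part.

```python
# Start from the £ sign as Code Page 437 stores it.
cp437_byte = "£".encode("cp437")           # the single byte 0x9C

# Read that byte as Windows-1252, which is what copying from the PDF seems to do.
as_cp1252 = cp437_byte.decode("cp1252")    # œ

# Read it as Latin-1 instead and it becomes the C1 control U+009C...
as_latin1 = cp437_byte.decode("latin-1")

# ...which UTF-8 stores as two bytes (assumed path for the website).
st_utf8 = as_latin1.encode("utf-8")        # 0xC2 0x9C

print(cp437_byte, as_cp1252, hex(ord(as_latin1)), st_utf8)
```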

And the Ÿ character? Not a clue! Inspecting the raw text of the PDF shows the underlying code is:
6m \2343.00 RIV Discount.
PDF strings can escape a byte as a backslash followed by its octal value. Octal 234 is decimal 156 - which is hex 0x9C.
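
The arithmetic is quick to confirm. This sketch only converts the escape from the PDF and decodes the resulting byte as Code Page 437. It says nothing about how the PDF's font actually maps it.

```python
# The octal digits from the \234 escape in the PDF text.
escaped = "234"

value = int(escaped, 8)                  # octal -> integer
as_cp437 = bytes([value]).decode("cp437")

print(value, hex(value))  # 156 0x9c
print(as_cp437)           # £
```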

Nearest I can get is the ISO/IEC 8859-15 encoding, where œ is 0xBD and Ÿ is 0xBE. Perhaps a font substitution error?
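
For what it's worth, the two characters really are next-door neighbours in that encoding. The check below only shows the adjacency. It doesn't prove a font substitution is to blame.

```python
# Decode the two adjacent byte values using ISO/IEC 8859-15.
oe = bytes([0xBD]).decode("iso8859_15")
y_diaeresis = bytes([0xBE]).decode("iso8859_15")

# The byte values differ by exactly one.
gap = y_diaeresis.encode("iso8859_15")[0] - oe.encode("iso8859_15")[0]

print(oe, y_diaeresis, gap)  # œ Ÿ 1
```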

Everything is awful

This isn't just ugly. It points to the fact that Virgin don't test their software and don't upgrade their systems. What other horrors lie in their technology stack?

And it isn't just a tech issue. It is bad for screenreaders - meaning visually impaired users get a poor experience.

The year is 2018. And we're still battling text encoding issues due to crappy software.

One thought on “Virgin Media don't understand Unicode”

  1. I also have concerns with the way Virgin Media operate their account security. The highly restrictive password rules (7-10 chars, must start with a letter, etc) suggest it isn’t a very sophisticated implementation, and I would guess passwords are not hashed before storage. I’ve raised this by email and Twitter on a number of occasions but it has been ignored.

    The lack of testing you’ve identified on their website reinforces my worries.
