A small text rendering bug in legal judgements


OK, first off, you have to read this amazing judgement about whether Walker's Sensations Poppadoms count as a potato-based snack for VAT purposes. Like most judgements, it is written in fairly plain and accessible language. The arguments are easy to follow and it even manages to throw in a little humour.

But if you read closely, you'll see there are a few instances where an errant question-mark pops up:

Screenshot of text. Highlighted are a couple of instances of a question mark followed by the letters "o", "u", "r".

From context, it is pretty clear the word should be "flour" but is rendered as "?our" - why is that?

The original PDF judgement can be downloaded from the official Tribunals website (an ancient service which is long overdue for an update).

If you search the PDF for the word "flour" and select it, notice what happens:

Looking at the metadata of the PDF, it appears the file was created with Office 365, which has "helpfully" used a typographic ligature - the single character "ﬂ" (U+FB02) rather than the separate letters "f" and "l".

Drawing showing how two letters can be squashed together to form a new symbol.

Ligatures are handy for displaying characters in a pleasing manner - but they can really confuse some software.
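One plausible way for a "?" to appear can be sketched in a few lines of Python. This is an assumption for illustration only: nothing here confirms what Bailii's software actually does. The sketch converts text containing the ligature character into a legacy encoding (Latin-1 is picked arbitrarily) which has no slot for it.

```python
# Assumption: the PDF text holds the single character U+FB02
# (LATIN SMALL LIGATURE FL) instead of the letters "f" and "l".
text = "\ufb02our"

# The string is four characters long, not five.
print(len(text))

# Latin-1 cannot represent the ligature, so the "replace" error
# handler substitutes a question mark for it.
legacy = text.encode("latin-1", errors="replace").decode("latin-1")
print(legacy)
```

Any pipeline which squeezes Unicode text through a narrower character set could plausibly produce the same symptom.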

One way to deal with this is to use a process called "Unicode Normalisation". It is rather dull and technical, but there are plenty of libraries which will split these characters.

Here's how it works for the "fi" ligature:

Graphic showing the "F" "I" ligature being split.
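As a minimal sketch of the same idea in code, Python's standard `unicodedata` module can apply the NFKC normalisation form, which replaces compatibility characters such as ligatures with their plain equivalents. The helper name `split_ligatures` is made up for this example.

```python
import unicodedata


def split_ligatures(text: str) -> str:
    """Apply NFKC normalisation, which expands ligature characters
    such as U+FB01 and U+FB02 into their separate letters."""
    return unicodedata.normalize("NFKC", text)


sample = "\ufb02our and \ufb01sh"  # contains the "fl" and "fi" ligatures
print(split_ligatures(sample))
```

Be aware that NFKC is fairly aggressive. It also rewrites other compatibility characters, such as superscript digits and full-width letters, so it may change more than just ligatures.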

There are a few issues here.

Firstly, Office 365 should not be using Unicode ligatures. The text should contain the separate letters "f" and "l"; it is the font's job to display them as a ligature.

Secondly, Bailii's processing of the PDF should either cope with normalisation or it should throw loud and explicit warnings when it runs into something it doesn't understand.
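A "loud and explicit warning" could look something like the sketch below. It is hypothetical and says nothing about how Bailii's processing really works; the function name `check_compatibility_chars` is invented for this example. It flags every character which NFKC normalisation would change, then raises a Python warning.

```python
import unicodedata
import warnings


def check_compatibility_chars(text: str) -> list[tuple[int, str]]:
    """Return (offset, Unicode name) for each character that NFKC
    normalisation would alter, and warn if any are found."""
    found = []
    for offset, char in enumerate(text):
        if unicodedata.normalize("NFKC", char) != char:
            found.append((offset, unicodedata.name(char, "UNKNOWN")))
    if found:
        warnings.warn(f"{len(found)} compatibility character(s) found: {found}")
    return found


print(check_compatibility_chars("\ufb02our"))
```

Checking one character at a time keeps the offsets simple, at the cost of flagging anything NFKC touches rather than ligatures alone.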

Thirdly, as well as Bailii and the Tribunal Service, the PDF is also available at the more modern Case Law service from The National Archives. Their HTML and PDF documents also have the ligatures, but have subtly different layouts because they have been re-rendered with LibreOffice 7.2.

I've reported the issue to Bailii via their contact form. I've also raised a bug with The National Archives.

And now I'm off to enjoy some tasty potato-based snacks which have been assessed at the correct level of tax!


11 thoughts on “A small text rendering bug in legal judgements”

  1. said on social.chatty.monster:

    @Edent The next time I get mildly scolded for pedantic textual analysis of something, I'm going to point to the time a real live judge wrote

    Nominative determinism is not a characteristic of snack foods: calling a snack food "Hula Hoops" does not mean that one could twirl that product around one's midriff, nor is "Monster Munch" generally reserved as a food for monsters.

  2. says:

    Ligatures are handy for displaying characters in a pleasing manner - but they can really confuse some software.

    Frankly, I feel like the "fi" and "fl" ligatures are less pleasing and more confusing to the eye. The dot above the "i" is gone, and the two letters don't look like two distinct letters anymore, "fl" looks like "A".

  3. says:

    At least with Firefox on Linux at 2024-01-23T20:12:02Z it appears as two separate characters, at least in the first link to the HTML version.

    Rendered text for paragraph 7 of the HTML ruling.
