A small text rendering bug in legal judgements
OK, first off, you have to read this amazing judgement about whether Walker's Sensations Poppadoms count as a potato-based snack for VAT purposes. Like most judgements, it is written in fairly plain and accessible language. The arguments are easy to follow and it even manages to throw in a little humour.
But if you read closely, you'll see there are a few instances where an errant question-mark pops up:

From context, it is pretty clear the word should be "flour" but is rendered as "?our" - why is that?
The original PDF judgement can be downloaded from the official Tribunals website (an ancient service which is long overdue for an update).
If you search the PDF for the word "flour" and select it, notice what happens:
Looking at the metadata of the PDF, it appears the file was created with Office 365, which has "helpfully" used a typographic ligature: the single character "ﬂ" (U+FB02) in place of the two letters "f" and "l".

Ligatures are handy for displaying characters in a pleasing manner - but they can really confuse some software.
One way to deal with this is to use a process called "Unicode Normalisation". It is rather dull and technical, but there are plenty of libraries which will split these characters.
Here's how it works for the "ﬁ" ligature (U+FB01):
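As a minimal sketch, Python's standard `unicodedata` module can show what each normalisation form does to that ligature. The helper name `show_forms` is made up for this illustration:

```python
import unicodedata

# The four Unicode normalisation forms.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def show_forms(text):
    """Return a dict mapping each normalisation form to the normalised text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


if __name__ == "__main__":
    ligature = "\ufb01"  # LATIN SMALL LIGATURE FI
    for form, result in show_forms(ligature).items():
        codepoints = " ".join(f"U+{ord(c):04X}" for c in result)
        print(f"{form}: {result!r} ({codepoints})")
```

Only the compatibility forms (NFKC and NFKD) split the ligature into a plain "f" and "i". NFC and NFD leave it as a single character.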

There are a few issues here.
Firstly, Office 365 should not be using Unicode ligature characters. The text should contain the separate letters "f" and "l"; it is the font's job to display them as a ligature.
Secondly, Bailii's processing of the PDF should either cope with normalisation or it should throw loud and explicit warnings when it runs into something it doesn't understand.
Thirdly, as well as Bailii and the Tribunal Service, the judgement is available from the more modern Case Law service from The National Archives. Their HTML and PDF documents also contain the ligatures, but have subtly different layouts because they have been re-rendered with LibreOffice 7.2.
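The second point, coping with ligatures while warning loudly, could look something like the sketch below. It is an illustration only, not Bailii's actual pipeline: `expand_ligatures` is a hypothetical helper, and limiting the check to the Latin ligatures U+FB00 to U+FB06 is an assumption.

```python
import unicodedata
import warnings


def expand_ligatures(text):
    """Replace Latin ligature characters with plain letters, warning on each one."""
    out = []
    for index, ch in enumerate(text):
        # U+FB00..U+FB06 are the Latin ligatures (ff, fi, fl, ffi, ffl, long-s t, st).
        if "\ufb00" <= ch <= "\ufb06":
            plain = unicodedata.normalize("NFKC", ch)
            warnings.warn(
                f"Ligature U+{ord(ch):04X} at offset {index} expanded to {plain!r}"
            )
            out.append(plain)
        else:
            # Everything else is left untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(expand_ligatures("wheat \ufb02our"))
```

Only the ligature characters are normalised here, rather than the whole string, because NFKC applied to an entire document would also flatten other characters, such as superscripts, that may be meaningful.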
I've reported the issue to Bailii via their contact form. I've also raised a bug with The National Archives.
And now I'm off to enjoy some tasty potato-based snacks which have been assessed at the correct level of tax!
At least with Firefox on Linux at 2024-01-23T20:12:02Z it appears as two separate characters, at least in the first link to the HTML version.
rerdavies says:
Ironically, you have presented the evidence that fi/fl ligatures should NOT be decomposed. Documents should store text in either NFC or NFD format. NFKD normalization is destructive, and is used only for... searching/comparison. In the table you have presented, fi is composed in both NFC and NFD, which clearly indicates that fi/fl should not be decomposed even if stored in NFD (decomposed) form.