PDFs are the Cheques of the 21st Century


Cheques (checks if you're American) are an amazing legacy technology. Invented in the 17th Century, they immediately transformed the financial landscape. They allowed anyone to transfer both vast and trivial sums of wealth with ease.

Whole industries grew up around them - one of my first jobs was repairing computerised cheque-readers - and they're an example of a technology which "just works".

Which, of course, is nonsense. They're fragile, easy to forge, have little protection, and are cumbersome to use for all parties - they're just generally terrible. At this stage of a legacy technology, we generally have two choices - replace or double-down.

Most of the modern world has replaced cheques with electronic banking. Whether it's the innovative mobile services of Africa's M-Pesa or the somewhat stodgy British Direct Debits - money flows electronically. Low latency, low cost, low effort.

America, however, has decided to pile on successive work-arounds for the inconvenience. This cool app lets you take a photo of a check and have it magically deposit in your account! (After five days.)

What a pile of nonsense. Shuffling paper around from one end of the country to the other, at high latency, high cost, and high effort. It's just bollocks, isn't it? No matter how many iPhone-based band-aids you stick on.

So we come to the PDF.

The Chilcot Report into the Iraq War has just been released. As PDF. Fifty separate PDFs.

This is despite repeated requests that it should be available in a more suitable format.

Let me be clear. Releasing a PDF is a hostile act. It says you care more about your own convenience than your users'.

As described in The New Statesman, this is an intersection of politics, open standards, and technology.

Let's take a quick look at PDF's sins.

Hard to reflow

If you're reading a PDF on a digital device - because who prints out to paper any more? - you'll probably want to zoom in on the text. Here's what happens with the Chilcot Report if you ask Android to reflow the text:

(Image: a PDF demonstrating poor reflow.)

Inaccessible

Supposedly accessible PDFs fail in horrendous ways - even when trying to do something as simple as copying text from them.

There are many techniques you can use to make a PDF accessible - just like with HTML - but it would appear that the Chilcot Report doesn't take advantage of any of them.

Difficult to extract data

Quick - find every reference in these PDFs to "Camp David". You can't. Quick - disambiguate between Camp David, Sir David Manning, David Brummell, Dr David Kelly. You can't. There is no semantic markup.

Of course, some people see this as an advantage.

RT[redacted]: Crushing news that people will need to read shit and not just hop from keyword to keyword https://twitter.com/edent/status/750652342853365760
Matilda (@tillywrites), July 6, 2016
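For comparison, here is roughly what the "Camp David" search looks like once the words are out of the PDF. This is a minimal sketch in Python, assuming the report has already been converted to plain text. The sample sentences and the `find_phrase` helper are made up for illustration; they are not quotations from the report or part of any existing tool.

```python
import re

# Hypothetical sample text, invented for illustration only.
SAMPLE = """\
The two leaders met at Camp David in the spring.
A note was sent to Sir David Manning afterwards.
Dr David Kelly is mentioned in a later section.
A second visit to Camp David followed.
"""


def find_phrase(text, phrase):
    """Return (line_number, line) pairs for lines containing the exact phrase."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Searching for the full phrase separates the place from the people.
camp_david = find_phrase(SAMPLE, "Camp David")
people = find_phrase(SAMPLE, "David Kelly")

for number, line in camp_david:
    print(number, line)
```

Searching for a bare "David" would match every line in the sample, which is exactly the ambiguity that proper semantic markup (rather than phrase matching) would remove.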

Huge attack surface

PDF is a horrifically complicated standard. So complex that it leads to all sorts of terrifying ways to abuse the reader.

(That's not to say that HTML5 isn't also complex, of course.)

Bloated file size

Combined, those PDFs add up to around 40 MB. Great if you're on a fast connection - not great if you're paying per MB on your phone.

Converted to a plain text format like Markdown, it's a mere 370 KB. That's a full two orders of magnitude smaller.
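If you want to check the "two orders of magnitude" claim, the arithmetic is short. A quick sketch, assuming 1 MB = 1000 KB and taking the rounded figures above at face value:

```python
import math

# Figures from the text; the MB-to-KB conversion factor is an assumption.
PDF_TOTAL_KB = 40 * 1000   # "around 40 MB"
TEXT_TOTAL_KB = 370        # "a mere 370 KB"

ratio = PDF_TOTAL_KB / TEXT_TOTAL_KB
orders_of_magnitude = math.log10(ratio)

print(f"{ratio:.0f}x smaller, about {orders_of_magnitude:.1f} orders of magnitude")
```

The plain text comes out a little over a hundred times smaller, so the claim holds.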

PDF does make it possible to link to specific pages - but pages are a skeuomorph. We want to be able to link to specific paragraphs.

While the reports contain some external links, there are no internal links between the documents - making navigation and cross referencing even harder than necessary.
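Paragraph-level links are cheap to produce once the text is free of the page. Here's a rough sketch, assuming the report is available as plain text with blank lines between paragraphs. The `paragraphs_to_html` helper and the `p1`, `p2`, ... id scheme are my own illustration, not how any official version of the report is published.

```python
from html import escape


def paragraphs_to_html(text, prefix="p"):
    """Wrap each blank-line-separated paragraph in a <p> with a numbered id."""
    blocks = [block.strip() for block in text.split("\n\n") if block.strip()]
    return "\n".join(
        f'<p id="{prefix}{index}">{escape(block)}</p>'
        for index, block in enumerate(blocks, start=1)
    )


# Hypothetical two-paragraph input, invented for illustration only.
SAMPLE = "First paragraph.\n\nSecond paragraph, with <angle brackets>."

html_out = paragraphs_to_html(SAMPLE)
print(html_out)
```

Anyone could then point at the second paragraph by adding `#p2` to the page's URL, rather than saying "it's somewhere on page 47".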

And So It Goes

Other people have written about PDF's failings with far more eloquence than me. This isn't a new rant - Jakob Nielsen was criticising PDF back in 2001.

I truly believe that the Internet needs to treat the PDF as harshly as it treated Flash. We should be embarrassed that such legacy technologies have been allowed to create a stranglehold on our creativity. They are stifling our democracy by trapping vital information in a digital tar pit.

We must drive PDF out - cast it to the winds - make it as impolite to use as auto-playing MIDI on a website.

You wouldn't accept being paid by paper cheque - why should you accept receiving data by PDF?


If you'd like to help convert the Chilcot Report to a more open, accessible, and semantic format please get involved with official-inquiries.com

