Leveson - Death By A Thousand (Paper) Cuts


I've been listening to the Leveson Inquiry. A large part of the exchanges seems to go like this:

Jay: Turning to page 51.
Witness: Which bundle?
Jay: 1606.
Witness: 1660?
Leveson: No, the page after.
Jay: Paragraph 7.
Witness: I don't have a paragraph 7.
Jay: Ah, I have an earlier print out.
Leveson: You'll find it in tab 15.
Witness: Is this Volume 2?

And so on, ad nauseam.

Surely there's no reason to have so much paper wastefully printed and then discarded? Why not supply each participant with a single reference electronic document, one which lets them increase the font size, annotate, cross-reference, and search?

Ah, search. Searching text is something computers are really good at. Within a fraction of a second, even a modest computer can extract every sentence which contains the word "Clegg" from hundreds of thousands of pages. Brilliant! Makes life really easy. Until humans come along and bugger about with it.
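To give a flavour of how little work this is, here's a minimal sketch in Python. The sample text is made up for illustration (it is not from the Leveson evidence), the function name is my own, and the sentence splitting is deliberately naive.

```python
import re

def sentences_containing(text: str, word: str) -> list[str]:
    """Return every sentence in `text` that contains `word` as a whole word."""
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Whole-word, case-insensitive match.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

# Invented sample text, purely for demonstration.
sample = (
    "The meeting was moved to Tuesday. "
    "Clegg was not on the invitation list. "
    "Nobody mentioned the bid. "
    "Has anyone spoken to Clegg yet?"
)

print(sentences_containing(sample, "Clegg"))
```

A real document would need a smarter sentence splitter (abbreviations like "Mr." will trip this one up), but the principle holds: on clean text, search is trivial.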

Let's take a look at the "smoking gun" emails which have been submitted from News International to Leveson. Specifically KRM18.

I have no idea how these emails were supplied to Leveson. I hope that they were submitted electronically - with all headers intact. What's supplied to the public, however, is this:

[Image: Leveson email, printed]
The emails have been...

  • Printed out.
  • Redacted with marker pen.
  • Scanned in as a PDF.
  • Then subject to an uncorrected OCR process.

Computers are really bad at recognising text. OCR (Optical Character Recognition) is a very error-prone process. Take a look at how the computer has translated the above document.

[Image: Leveson email, printed, after OCR]

It's partly there. But enough of the characters are mangled, and enough of the words distorted, that searching through the text is nearly impossible.
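Here's a rough sketch of why. The "mangled" line below is my own invention, imitating typical OCR confusions (l/1, o/0), and is not taken from the published documents. An exact search finds nothing in it; a fuzzy match using Python's standard `difflib` module can claw some hits back, but only by guessing.

```python
import difflib
import re

def exact_hits(text: str, word: str) -> list[str]:
    """Tokens that match `word` exactly, ignoring case."""
    return [t for t in re.findall(r"\w+", text) if t.lower() == word.lower()]

def fuzzy_hits(text: str, word: str, cutoff: float = 0.7) -> list[str]:
    """Tokens whose similarity to `word` is at least `cutoff` (0.0 to 1.0)."""
    tokens = re.findall(r"\w+", text)
    return [
        t for t in tokens
        if difflib.SequenceMatcher(None, t.lower(), word.lower()).ratio() >= cutoff
    ]

# Invented examples: the same sentence before and after imaginary OCR damage.
clean = "Clegg was not on the invitation list"
mangled = "C1egg was n0t on the invitati0n 1ist"

print(exact_hits(clean, "Clegg"))    # finds the name
print(exact_hits(mangled, "Clegg"))  # finds nothing
print(fuzzy_hits(mangled, "Clegg"))  # recovers the mangled token
```

The cutoff of 0.7 is an arbitrary choice. Set it too low and you drown in false positives; too high and you miss the mangled names. It's a workaround, not a substitute for publishing the original text.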

I get that PDF is a reasonably popular file format for sharing documents. It preserves the document structure faithfully - but at the expense of readability, fluidity, and usefulness. Distributing scanned images, though, is the least useful way of getting information to people who want to use it.

Doing this is simply civically irresponsible. These emails, if they are important enough to be made public, should be made public in their original form. I understand that some redactions should be made - but that's about the limit.

How on Earth is anyone supposed to make sense of this extract?
[Image: OCR extract]

We need to shake off the tyranny of printed paper. It is wasteful, non-useful, and - in this context - damaging to justice.

I leave you with an entirely random extract from the emails...
Please Consider The Environment Before Printing This Email

