Terence Eden’s Blog

Context-Aware Text Recognition?

google leveson ocr · 350 words · Viewed ~548 times

A scanned document, the text is askew. Next to it is a computer-generated version of the text. A passage is highlighted.

I've been playing with Google's Cloud Vision API. It is OCR (Optical Character Recognition) - but in THE CLOUD and uses MACHINE LEARNING! When it works, it is indistinguishable from magic. When it fails, it reveals a very limited understanding of human text. Let's take a look at this quick example - a piece of evidence from the Leveson Inquiry. Considering that the document is a digital scan of…


Crowdsourcing Leveson

crowdsourcing justice leveson ocr politics text · 1 comment · 550 words

I've already blogged about the Leveson Inquiry's disturbing habit of releasing evidence as scanned in PDFs. I had a suggestion from digital journalist Kevin Anderson.

@edent, 10:12 - Fri 11 May 2012: "Gah! The #leveson witness statements are photocopied & scanned in levesoninquiry.org.uk/evidence/?witn… Disastrous for open justice - shkspr.mobi/blog/index.php…"

Mr And…


Leveson - Death By A Thousand (Paper) Cuts

leveson murdoch ocr paper politics · 450 words · Viewed ~342 times

I've been listening to the Leveson Inquiry. A large part of the exchanges seems to go like this:

Jay: Turning to page 51.
Witness: Which bundle?
Jay: 1606.
Witness: 1660?
Leveson: No, the page after.
Jay: Paragraph 7.
Witness: I don't have a paragraph 7.
Jay: Ah, I have an earlier print out.
Leveson: You'll find it in tab 15.
Witness: Is this Volume 2?

And so on, ad nauseam. Surely there's no…
