Terence Eden’s Blog

Context-Aware Text Recognition?

google leveson ocr · 350 words · Viewed ~548 times

A scanned document, the text is askew. Next to it is a computer-generated version of the text. A passage is highlighted.

I've been playing with Google's Cloud Vision API. It is OCR (Optical Character Recognition) - but in THE CLOUD and uses MACHINE LEARNING! When it works, it is indistinguishable from magic. When it fails, it reveals a very limited understanding of human text. Let's take a look at this quick example - a piece of evidence from the Leveson Inquiry. Considering that the document is a digital scan of…

Continue reading →

Selecting Text In Images - Pure SVG, No JavaScript

images ocr svg · 8 comments · 400 words · Viewed ~1,344 times

Recently, I wanted to embed a photograph of a book page. I thought it would be nifty if the text from the page could be selected. If you hover your mouse over this image, you should be able to select part of the text. Ideally, it will look something like this... It even works on Android (tried on Chrome, Opera, Firefox) and iOS 7. So, how did I do it? Originally, I was pointed to…

Continue reading →

Crowdsourcing Leveson

crowdsourcing justice leveson ocr politics text · 1 comment · 550 words

I've already blogged about the Leveson Inquiry's disturbing habit of releasing evidence as scanned-in PDFs. I had a suggestion from digital journalist Kevin Anderson. @edent: "Gah! The #leveson witness statements are photocopied & scanned in levesoninquiry.org.uk/evidence/?witn… Disastrous for open justice - shkspr.mobi/blog/index.php…" (10:12 - Fri 11 May 2012) Mr And…

Continue reading →

Leveson - Death By A Thousand (Paper) Cuts

leveson murdoch ocr paper politics · 450 words · Viewed ~342 times

I've been listening to the Leveson Inquiry. A large part of the exchanges seems to go like this: Jay: Turning to page 51. Witness: Which bundle? Jay: 1606. Witness: 1660? Leveson: No, the page after. Jay: Paragraph 7. Witness: I don't have a paragraph 7. Jay: Ah, I have an earlier print out. Leveson: You'll find it in tab 15. Witness: Is this Volume 2? And so on, ad nauseam. Surely there's no…

Continue reading →