Context-Aware Text Recognition?

A scanned document, the text is askew. Next to it is a computer-generated version of the text. A passage is highlighted.

I've been playing with Google's Cloud Vision API. It is OCR (Optical Character Recognition) - but in THE CLOUD and uses MACHINE LEARNING! When it works, it is indistinguishable from magic. When it fails, it reveals a very limited understanding of human text. Let's take a look at this quick example - a piece of […] Read More

Selecting Text In Images - Pure SVG, No JavaScript

Recently, I wanted to embed a photograph of a book page. I thought it would be nifty if the text from the page could be selected. If you hover your mouse over this image, you should be able to select part of the text. Ideally, it will look something like this... It even works on […] Read More

Crowdsourcing Leveson

I've already blogged about the Leveson Inquiry's disturbing habit of releasing evidence as scanned-in PDFs. I had a suggestion from digital journalist Kevin Anderson: "@edent Put the Leveson docs up on Google Docs. I'd be curious how their OCR could handle them. Then click 'make public'" (Mr Anderson (@kevglobal), May 11, 2012). Google […] Read More

Leveson - Death By A Thousand (Paper) Cuts

I've been listening to the Leveson Inquiry. A large part of the exchanges seem to go like this: Jay: Turning to page 51. Witness: Which bundle? Jay: 1606. Witness: 1660? Leveson: No, the page after. Jay: Paragraph 7. Witness: I don't have a paragraph 7. Jay: Ah, I have an earlier printout. Leveson: You'll […] Read More