Context-Aware Text Recognition?

by @edent | 350 words | Read ~441 times.

A scanned document, the text is askew. Next to it is a computer-generated version of the text. A passage is highlighted.

I've been playing with Google's Cloud Vision API. It is OCR (Optical Character Recognition) - but in THE CLOUD and uses MACHINE LEARNING! When it works, it is indistinguishable from magic. When it fails, it reveals a very limited understanding of human text. Let's take a look at this quick example - a piece of…

Selecting Text In Images - Pure SVG, No JavaScript

by @edent | 1 comment | 450 words | Read ~1,192 times.

Recently, I wanted to embed a photograph of a book page. I thought it would be nifty if the text from the page could be selected. If you hover your mouse over this image, you should be able to select part of the text. Ideally, it will look something like this... It even works on…
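As a rough illustration of the general idea behind selectable text in a "pure SVG" image, here is a minimal sketch, not necessarily the approach used in the post: draw the photograph with an `<image>` element, then lay invisible `<text>` elements over it at the positions where the words appear. The function name `selectable_image_svg` and its `words` input format (a list of text, x, y, font-size tuples, perhaps taken from an OCR pass) are hypothetical. It also assumes a browser that accepts the SVG 2 `href` attribute on `<image>`; older renderers may need `xlink:href` instead.

```python
from xml.sax.saxutils import escape, quoteattr


def selectable_image_svg(image_href, width, height, words):
    """Return an SVG string: a bitmap with invisible but selectable text on top.

    Hypothetical input format: words is a list of (text, x, y, font_size)
    tuples in the image's pixel coordinates, e.g. from an OCR pass.
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">',
        # The photograph of the page is drawn first, so the text sits above it.
        f'<image href={quoteattr(image_href)} width="{width}" height="{height}"/>',
    ]
    for text, x, y, size in words:
        # fill-opacity="0" hides the glyphs, but the text stays in the document,
        # so it should still be selectable and copyable; escape() guards & and <.
        parts.append(
            f'<text x="{x}" y="{y}" font-size="{size}" fill-opacity="0">'
            f"{escape(text)}</text>"
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(selectable_image_svg("page.jpg", 800, 600, [("Chapter One", 40, 60, 24)]))
```

No JavaScript is involved. The browser's normal text selection does the work. The hard part in practice is getting each word's position and size to line up with the photograph.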

Crowdsourcing Leveson

by @edent | 1 comment | 500 words | Read ~128 times.

I've already blogged about the Leveson Inquiry's disturbing habit of releasing evidence as scanned-in PDFs. I had a suggestion from digital journalist Kevin Anderson: "@edent Put the Leveson docs up on Google Docs. I'd be curious how their OCR could handle them. Then click 'make public'" (Mr Anderson, @kevglobal, May 11, 2012). Google…

Leveson - Death By A Thousand (Paper) Cuts

by @edent | 1 comment | 450 words | Read ~311 times.

I've been listening to the Leveson Inquiry. A large part of the exchanges seems to go like this: Jay: Turning to page 51. Witness: Which bundle? Jay: 1606. Witness: 1660? Leveson: No, the page after. Jay: Paragraph 7. Witness: I don't have a paragraph 7. Jay: Ah, I have an earlier print out. Leveson: You'll…