Context-Aware Text Recognition?

by @edent | Read ~262 times.
A scanned document, the text is askew. Next to it is a computer-generated version of the text. A passage is highlighted.

I've been playing with Google's Cloud Vision API. It is OCR (Optical Character Recognition) - but in THE CLOUD and uses MACHINE LEARNING! When it works, it is indistinguishable from magic. When it fails, it reveals a very limited understanding of human text. Let's take a look at this quick example - a piece of… Continue reading →

Selecting Text In Images - Pure SVG, No JavaScript

by @edent | 1 comment | Read ~1,090 times.

Recently, I wanted to embed a photograph of a book page. I thought it would be nifty if the text from the page could be selected. If you hover your mouse over this image, you should be able to select part of the text. Ideally, it will look something like this... It even works on… Continue reading →

Crowdsourcing Leveson

by @edent | 1 comment | Read ~120 times.

I've already blogged about the Leveson Inquiry's disturbing habit of releasing evidence as scanned-in PDFs. I had a suggestion from digital journalist Kevin Anderson: "@edent Put the Leveson docs up on Google Docs. I'd be curious how their OCR could handle them. Then click 'make public'" (Mr Anderson, @kevglobal, May 11, 2012). Google… Continue reading →

Leveson - Death By A Thousand (Paper) Cuts

by @edent | 1 comment | Read ~297 times.

I've been listening to the Leveson inquiry. A large part of the exchanges seem to go like this:

Jay: Turning to page 51.
Witness: Which bundle?
Jay: 1606.
Witness: 1660?
Leveson: No, the page after.
Jay: Paragraph 7.
Witness: I don't have a paragraph 7.
Jay: Ah, I have an earlier print out.
Leveson: You'll… Continue reading →