Tagged: ebook

How Do You Upgrade An eBook?

As I've mentioned before, Jasper Fforde is one of my favourite authors. His latest book "One of Our Thursdays is Missing" is a brilliant work of fiction - but contains a rather worrying flaw.

Well, I say a "worrying flaw" - I mean an error. All books contain errata - I think that's a given - but outside of academia, Jasper Fforde is the only author I know who offers upgrades to his books.

Here's a sample from the original Thursday Next "patch"

5: Using a fine black pen make the following corrections:

6: Go to page 32 and replace 'Stella' with 'Steller's - this is the correct spelling. The large slow-moving-manatee-type-mermaid-legend creature was named by Georg Wilhelm Steller, the German naturalist.

What's The Error?

The book contains a number of charming illustrations - the final illustration is meant to be (NO SPOILERS) about a wiped out clown army. Instead, it's a repeat of an earlier illustration of (NO SPOILERS) mimes encircling a car.
Screenshot of incorrect image in One of Our Thursdays is Missing

At the time of writing, there's no upgrade listed for the latest book - although there are a range of fabulous special features. I've dropped Mr Fforde an email alerting him to this devastating turn of events.

Yes, yes, it's fairly minor in the grand scheme of things.

Expecting More

Of course, one could argue that traditional books don't get upgrades - so why should this be a problem for ebooks?

  • I can take my physical book back to the shop and get a replacement. Or even send it back to the publisher. With eBooks, this is virtually impossible - not least because of the DRM issues involved in revoking a book.
  • If a book contains a serious error, I can print out an errata sheet. The DRM on eBooks prevents me from altering their contents.
  • We should expect more. This is a new medium - we should expect more than simply plain text in a DRM layer.

Yes, it all comes down to DRM - or, as Jasper Fforde puts it...

The DRM was the Dark Reading Matter – the unseeable part of the BookWorld.

and

DRM’s existence remained theoretical, at best.

I don't know if I'm reading too much into Fforde's work - but he doesn't strike me as the sort of author to use an acronym without a thorough understanding of what it means. I have no shame in saying that I removed the DRM on the ePub I purchased in order to read it on my Kindle (which does not support Adobe's DRM scheme).

I don't know how I expect an eBook upgrade to work - with or without DRM. I don't want an author, book seller, or publisher to randomly change the book I'm reading - that's a little too similar to Amazon deleting 1984 from its Kindles.

I know I don't want to buy an entirely new copy - just because some punctuation has been fixed.

Should I be able to download a diff and let my eReader decide which version of a book I want to see?
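To make the "diff" idea a little more concrete, here is a minimal sketch using Python's standard difflib module. The two "editions" are made-up stand-in lines built around the 'Stella' / 'Steller's' correction quoted above - they are not the real text of the book, and nothing here reflects how any actual eReader or ePub handles updates.

```python
import difflib

# Hypothetical stand-in text, NOT the real wording of the book.
first_edition = [
    "Chapter 1",
    "She had once seen a Stella sea cow.",
    "The end.",
]
patched_edition = [
    "Chapter 1",
    "She had once seen a Steller's sea cow.",
    "The end.",
]


def make_patch(old_lines, new_lines):
    """Return a unified diff (as a list of lines) that turns old_lines into new_lines."""
    return list(difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="first-edition",
        tofile="patched-edition",
        lineterm="",
    ))


patch = make_patch(first_edition, patched_edition)
print("\n".join(patch))
```

The patch only carries the changed line plus a little context, so a reader could inspect it before deciding whether to accept it - which is rather the point.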

Should I be able to get an update free? Should it cost?

So many questions and so few answers. If you've got any thoughts on the matter - please let me know.

Quoting Page Numbers from eBooks Considered Harmful

It emerged this morning* that the Guardian newspaper has realised that the way it writes is unsuitable for the web.
*Source: Guardian Newspaper, 18/11/2011, page http://www.guardian.co.uk/media/mind-your-language/2011/feb/18/mind-your-language-day-date-time

By using non-specific language, I have introduced a degree of ambiguity which makes it hard for readers - both in the present day and the future - to understand the ideas I am trying to convey.

For example - the above text doesn't state which of the many Guardian newspapers is under discussion.
The words "this morning" are highly subjective depending on timezone - and the date at which the article is read.
Finally, the source link is separate from the text making it hard for an automatic process to understand what words relate to which website.

The above could be rewritten as...

The Guardian has realised that the way it writes is unsuitable for the web.

We can, of course, extend our markup to make it easier for both humans and robots to understand what it is we have written. Using the ideas behind the Semantic Web we can embed information about the relationships, dates, times, locations, etc, in a meaningful way.
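As one small illustration in that spirit, the sketch below builds an HTML fragment where the date lives in a machine-readable `<time>` element and the link sits on the words it relates to. The `cite` function is a hypothetical helper of my own, and the date is an assumption read off the path of the Guardian URL above.

```python
from datetime import date
from html import escape


def cite(publisher, url, published, claim):
    """Return an HTML fragment with the date and source link embedded in the sentence."""
    iso = published.isoformat()  # machine-readable, e.g. 2011-02-18
    human = f"{published.day} {published:%B %Y}"  # day, month name, year
    link = f'<a href="{escape(url, quote=True)}">{escape(publisher)}</a>'
    return f'On <time datetime="{iso}">{human}</time>, {link} {escape(claim)}'


fragment = cite(
    publisher="The Guardian",
    url="http://www.guardian.co.uk/media/mind-your-language/2011/feb/18/mind-your-language-day-date-time",
    published=date(2011, 2, 18),  # assumption: taken from the URL path
    claim="realised that the way it writes is unsuitable for the web.",
)
print(fragment)
```

A human sees a plain sentence with a link; a robot gets an unambiguous date and knows exactly which words point at which website.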

Page Numbers

This leads me into the bizarre decision of Amazon to introduce page numbering for its Kindle eBook reader.
Amazon Page Numbers

I hope this post will convince you that this move is philosophically wrong and - potentially - dangerous.

Why Do Paper Books Have Page Numbers?

Paper books (not "real books" as some people term them) are physically printed on individual sheets of paper. Page numbering is a workaround for a limitation of non-digital media: it allows semi-rapid and predictable access to pseudo-exact locations.

For example, a paper book may say

"For further information on the Infinite Improbability Drive, see Adams 1979, page 42"

It doesn't tell us where on a page unless it also introduces a paragraph number. As paragraphs aren't usually numbered, this makes locating the exact reference hard for both machines and people.

Furthermore, different versions of the same text may have substantially different formatting - larger print, more images, smaller paper, etc - so every time a book is republished all page references have to be recalculated.

Page numbers are a hack. They are an illusion designed to give humans the impression that an imprecise reference is specific.
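A toy calculation shows how quickly a page reference drifts between editions. The character offset and the characters-per-page figures below are invented purely for illustration, and the model pretends every page holds the same amount of text.

```python
def page_of(char_offset, chars_per_page):
    """Return the 1-based page that holds a 0-based character offset."""
    return char_offset // chars_per_page + 1


quote_offset = 75_000  # hypothetical position of a quotation within the text

paperback = page_of(quote_offset, chars_per_page=1_800)    # 42
large_print = page_of(quote_offset, chars_per_page=1_100)  # 69

print(paperback, large_print)
```

Same words, same position in the text, two different "page 42"s depending on which edition you happen to own.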

Why Don't eBooks Have Page Numbers?

Because they don't need them! An eBook is a digital document which may be marked up in many ways: from the humble text document, an unbroken stream of characters, to a metadata-rich ePub full of syntactic markup.

An eBook never knows the size of the screen upon which it will be displayed. Even if it did, it wouldn't know how large its text would render, nor what font would be used, nor if the screen orientation were to change.

The idea of a page number for a digital document is an absurdity. Page numbers are only needed for an item that may be printed out in the physical world. Even then, they still suffer from the above problems.

Why Use Page Numbers In Quotes?

The idea is simple. To use the Harvard Style as an example - other styles are equally useless - we see this.

Lawrence (1966, p.124) states "we should expect..."

So, assuming we can find the exact copy of "Lawrence's" paper, printed in exactly the same font and size, on exactly the same shaped paper, and typeset identically, we can simply flip through to page 124 and scan through the page until we find the quote we're looking for.

Rubbish! Utter, unmitigated, total and utter rubbish. A long, boring and error-prone process which does little to help us find the information for which we are looking.

The Role Of Computers

We have computers for one simple reason - they perform boring jobs with speed and accuracy.

If I want to find where Alan Turing declared "It is possible to invent a single machine which can be used to compute any computable sequence" within "On Computable Numbers, with an Application to the Entscheidungsproblem" I can do one of two things.

  1. I can click on a link which takes me straight to the sentence or chapter
  2. I can hit "search" or "find" in my document viewer

Having documents which are correctly marked-up is the key to successful linking of data. Manually marking up a document is tiresome and problematic for humans. Where we fail, computers make up for our shortcomings with their incredible speed at searching through documents.

My Kindle is fairly slow as far as modern computers are concerned. Yet it has no difficulty searching through millions of words in hundreds of documents to find the single sentence I'm looking for. And it all takes less time than it would for me to flick through some pages.
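The "find" option is simple enough to sketch. The `locate` function below is a hypothetical helper, not what the Kindle actually runs: it normalises whitespace and case, then reports where the quote starts, both as a character offset into the normalised text and as a fraction of the way through the document. The surrounding "document" is placeholder text wrapped around the Turing quote above.

```python
def locate(text, quote):
    """Return (offset, fraction) of the first match of quote in text, or None."""
    # Collapse runs of whitespace so line breaks in either string don't matter.
    haystack = " ".join(text.split()).casefold()
    needle = " ".join(quote.split()).casefold()
    offset = haystack.find(needle)
    if offset == -1:
        return None
    # offset counts characters in the normalised text, not the raw file.
    return offset, offset / len(haystack)


# Placeholder document: only the middle sentence comes from the quote above.
document = (
    "Placeholder opening paragraph.\n\n"
    "It is possible to invent a single machine which can be used\n"
    "to compute any computable sequence. Placeholder closing paragraph."
)
quote = "it is possible to invent a single machine which can be used to compute any computable sequence"

result = locate(document, quote)
print(result)
```

Unlike a page number, the result doesn't care about fonts, paper sizes, or which edition you own.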

A Changing World

Page numbering as a system of referencing relies on such an unlikely series of events as to be worse than useless. If both reader and writer do not have identical copies of the quoted work, there is a real risk of confusion and misunderstanding.

Page numbers are an ugly hack which fail to achieve the precision they so desperately crave. Let's remove the hack and replace it with something which actually works.

People who quote a page number from an electronic book don't understand how the modern world works. Documents have evolved - so must our citation styles.

Voynich Manuscript for Kindle and other eBook Readers

For years I've been mildly obsessed with the Voynich Manuscript. An ancient book, written in a language no one can decipher, showing plants which don't exist, and measuring astronomical configurations which make no sense. The book is an enigma. Many think it to be an elaborate hoax - others have more... esoteric explanations.

Regardless of what the book means, it is a beautiful and mysterious work of art. And I want to read it on my Kindle!

Archive.org has an ebook version of the Voynich Manuscript, but it has gone through a bizarre OCR process which, unsurprisingly, leaves it unreadable. So I've decided to roll my own!

Getting The Manuscript

The PDF on Archive.org is a bit cumbersome and doesn't work very well on Kindle or other readers.
The Yale site has all the scans available as high-res JPGs or MrSIDs - but it's a pain to download hundreds of images from the site.
So - I turned to a torrent. Don't worry! These images are hundreds of years old - they are in the public domain.

The Original Image

The images I obtained are extremely high resolution scans of the manuscript.
Voynich172 - resized

I've shrunk down the above image - here is a detail of it, so you can see just how high quality the scans are.
Voynich172 - sample
A typical page is 3MB - taking the whole book to around 700MB. There's easily enough space on the Kindle, but the time to render every page will be prohibitively long.

Crop and Resize

Firstly, the image needs to be cropped to the same aspect ratio as the Kindle screen (0.75), then resized to 600*800.
Voynich172 - cropped
From 3MB, the image is now 111KB.
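The crop is just arithmetic, so here's a rough sketch of it. The function name and the scan dimensions are made up for illustration, and it assumes a centred crop - in practice you may want to nudge the box to keep the interesting part of the page.

```python
def centred_crop_box(width, height, target_ratio=0.75):
    """Return (left, top, right, bottom) of the largest centred box
    whose width / height matches target_ratio (to the nearest pixel)."""
    if width / height > target_ratio:
        # Too wide: keep the full height and trim the sides.
        new_w, new_h = round(height * target_ratio), height
    else:
        # Too tall: keep the full width and trim top and bottom.
        new_w, new_h = width, round(width / target_ratio)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h


# Hypothetical scan size - the real scans vary from page to page.
box = centred_crop_box(3000, 4400)
print(box)  # (0, 200, 3000, 4200)
```

A box in this (left, top, right, bottom) form is what an imaging library such as Pillow expects for `Image.crop`, after which `Image.resize((600, 800))` would do the shrinking.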

Greyscale

eInk typically only handles greyscale images. So I dropped the colours out of the image.
Voynich172 - cropped bw
This takes the file size to under 100KB.
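For the curious, "dropping the colours" usually means replacing each pixel's red, green and blue values with one weighted brightness value. The sketch below uses the common ITU-R BT.601 weights (the same ones Pillow uses for its "L" mode); I'm not claiming that's exactly what my image editor did, and the sample pixels are invented.

```python
def to_grey(rgb):
    """Convert one (red, green, blue) pixel, each 0-255, to a single 0-255 grey level."""
    r, g, b = rgb
    # ITU-R BT.601 luma weights: green counts most, blue least.
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Made-up sample pixels: parchment-like background, brownish ink, green leaf.
sample = [(214, 196, 160), (70, 50, 40), (60, 120, 60)]
greys = [to_grey(pixel) for pixel in sample]
print(greys)
```

One value per pixel instead of three means less raw data to store before compression even starts.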

We can make the image marginally clearer to read by dumbly removing the background colour.
Voynich172 - cropped bw clear
As you can see, this doesn't look wonderful - many details are lost and there are odd looking artefacts throughout the image.
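A "dumb" background removal can be as crude as a brightness cut-off: anything lighter than some threshold becomes pure white. This is a sketch of that idea rather than the exact filter I used, and the threshold of 160 is an arbitrary assumption.

```python
def clear_background(grey_rows, threshold=160):
    """Set every pixel at or above threshold to white (255); keep darker pixels unchanged."""
    return [[255 if p >= threshold else p for p in row] for row in grey_rows]


# A tiny made-up greyscale "page": light parchment, a faint stroke (150), dark ink.
page = [
    [200, 198, 60],
    [195, 150, 45],
]
cleaned = clear_background(page)
print(cleaned)  # [[255, 255, 60], [255, 150, 45]]
```

A hard cut-off like this explains the artefacts: faint strokes that fall just above the threshold vanish entirely, while those just below it are left stranded against a white background.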

How They Look On The Kindle

The images look fantastic on the Kindle.

Screenshot 1
The full page

Colour Screenshot
The coloured image

Black and White Screenshot

Clear Background Screenshot

It turns out that there's no real need to remove the background colours - although it does make it marginally clearer. Changing the image to greyscale also makes little appreciable difference. I think I'll probably keep them in colour so that future ebook readers can see them in all their glory.

Incidentally, I tried the high-resolution file. The Kindle took around 30 seconds to render it. Here's a quick screen grab of it fully zoomed in.

Full Size Screenshot

TODO

This is a work in progress. Converting several hundred images will take a fair bit of time - unless I fully automate it and drop the cropping.
The Kindle can be hacked to display images (or use them as screensavers) - but I will convert the images into an ePub and Mobi so they can be easily read by all eReaders.
Once done, I'm sure the mysteries of the universe will be revealed to me!

Kindle 3 Vs Elonex 511EB

I've just taken delivery of a shiny new Amazon Kindle 3. I'm looking forward to giving it a thorough review - but here's a quick comparison between it and my venerable Elonex 511EB.

Kindle and 511EB side by side


The 511EB Is Getting a Firmware Update

Like many people, I've been frustrated by the lack of firmware updates to the Elonex 511EB ebook reader. There are several bugs which are frustrating to many users - as judged from the comments on this blog. With the Amazon Kindle dropping to a lower price, this ebook reader really needs to be updated if it wants to stay competitive.
511EB in the grass
Well, I'm pleased to say that it looks like there will be a firmware update - and soon!