It emerged this morning* that the Guardian newspaper has realised that the way it writes is unsuitable for the web.
*Source: Guardian Newspaper, 18/11/2011, page http://www.guardian.co.uk/media/mind-your-language/2011/feb/18/mind-your-language-day-date-time

By using non-specific language, I have introduced a degree of ambiguity which makes it hard for readers - both in the present day and in the future - to understand the ideas I am trying to convey.

For example - the above text doesn't state which of the many Guardian newspapers is under discussion.
The words "this morning" are highly subjective depending on timezone - and the date at which the article is read.
Finally, the source link is separate from the text, making it hard for an automatic process to understand which words relate to which website.

The above could be rewritten as...

The Guardian has realised that the way it writes is unsuitable for the web.

We can, of course, extend our markup to make it easier for both humans and robots to understand what it is we have written. Using the ideas behind the Semantic Web we can embed information about the relationships, dates, times, locations, etc, in a meaningful way.
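As a rough sketch of what that embedding could look like, the snippet below builds an HTML fragment in which the publisher is a link and the date is a machine-readable time element. The function name cite_sentence is made up for illustration, and the date is an assumption read from the path of the source URL rather than something stated in the text.

```python
from datetime import date
from html import escape


def cite_sentence(publisher, url, published, claim):
    """Return an HTML fragment that links the publisher and dates the claim.

    Hypothetical helper for illustration only.
    """
    # The link ties the publisher's name directly to the source page.
    link = f'<a href="{escape(url, quote=True)}">{escape(publisher)}</a>'
    # The datetime attribute is unambiguous; the visible text is for humans.
    stamp = (
        f'<time datetime="{published.isoformat()}">'
        f'{published.strftime("%d %B %Y")}</time>'
    )
    return f"<p>On {stamp}, {link} {escape(claim)}</p>"


html_fragment = cite_sentence(
    publisher="The Guardian",
    url="http://www.guardian.co.uk/media/mind-your-language/2011/feb/18/mind-your-language-day-date-time",
    published=date(2011, 2, 18),  # assumption: taken from the URL path
    claim="realised that the way it writes is unsuitable for the web.",
)
print(html_fragment)
```

A robot reading this no longer has to guess what "this morning" meant, or which words belong to which website.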

Page Numbers

This leads me into the bizarre decision of Amazon to introduce page numbering for its Kindle eBook reader.

I hope this post will convince you that this move is philosophically wrong and - potentially - dangerous.

Why Do Paper Books Have Page Numbers?

Paper books (not "real books" as some people term them) are physically printed on individual sheets of paper. Page numbering is a workaround for a limitation of non-digital media: it allows semi-rapid and predictable access to pseudo-exact locations.

For example, a paper book may say

"For further information on the Infinite Improbability Drive, see Adams 1979, page 42"

It doesn't tell us where on the page to look unless it also gives a paragraph number. As paragraphs aren't usually numbered, this makes locating the exact reference hard for both machines and people.

Furthermore, different versions of the same text may have substantially different formatting - larger print, more images, smaller paper, etc - so every time a book is republished all page references have to be recalculated.

Page numbers are a hack. They are an illusion designed to give humans the impression that an imprecise reference is specific.

Why Don't eBooks Have Page Numbers?

Because they don't need them! An eBook is a digital document which may be marked up in many ways: from the humble text document, an unbroken stream of characters, to a metadata-rich ePub full of syntactic markup.

An eBook never knows the size of the screen upon which it will be displayed. Even if it did, it wouldn't know how large its text would render, nor what font would be used, nor if the screen orientation were to change.
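To make that concrete, here is a minimal sketch which lays out the same text on two imaginary screens and reports which "page" a given word lands on. The screen sizes, the filler text and the helper names paginate and page_of are all made up for illustration; a real reader also has to deal with fonts, images and hyphenation.

```python
import textwrap


def paginate(text, chars_per_line, lines_per_page):
    """Split text into pages for a screen of the given (hypothetical) size."""
    lines = textwrap.wrap(text, width=chars_per_line)
    return [
        lines[i:i + lines_per_page]
        for i in range(0, len(lines), lines_per_page)
    ]


def page_of(text, word, chars_per_line, lines_per_page):
    """Return the 1-based page on which word appears, or None if absent."""
    pages = paginate(text, chars_per_line, lines_per_page)
    for number, page in enumerate(pages, start=1):
        if any(word in line.split() for line in page):
            return number
    return None


# Filler text standing in for a book: word0 word1 ... word499
book = " ".join(f"word{i}" for i in range(500))

small_screen = page_of(book, "word400", chars_per_line=30, lines_per_page=10)
large_screen = page_of(book, "word400", chars_per_line=60, lines_per_page=20)
print(small_screen, large_screen)
```

The same word in the same text ends up on a different page for each screen, which is exactly why a page number tells you nothing useful about a digital document.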

The idea of a page number for a digital document is an absurdity. Page numbers are only needed for an item that may be printed out in the physical world. Even then, they still suffer from the above problems.

Why Use Page Numbers In Quotes?

The idea is simple. To use the Harvard Style as an example - other styles are equally useless - we see this.

Lawrence (1966, p.124) states "we should expect..."

So, assuming we can find the exact copy of "Lawrence's" paper, printed in exactly the same font and size, on exactly the same shaped paper, and typeset identically, we can simply flip through to page 124 and scan through the page until we find the quote we're looking for.

Rubbish! Utter, unmitigated, total and utter rubbish. A long, boring and error-prone process which does little to help us find the information for which we are looking.

The Role Of Computers

We have computers for one simple reason - they perform boring jobs with speed and accuracy.

If I want to find where Alan Turing declared "It is possible to invent a single machine which can be used to compute any computable sequence" within "On Computable Numbers, with an Application to the Entscheidungsproblem" I can do one of two things.

  1. I can click on a link which takes me straight to the sentence or chapter
  2. I can hit "search" or "find" in my document viewer

Having documents which are correctly marked-up is the key to successful linking of data. Manually marking up a document is tiresome and problematic for humans. Where we fail, computers make up for our shortcomings with their incredible speed at searching through documents.
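The "find" approach can be sketched in a few lines. The function below looks for a quote in a document and returns its character offsets, ignoring differences in line breaks and spacing. The name find_quote is made up for illustration, and the sample document is stand-in text wrapped around the quote, not the real paper.

```python
import re


def find_quote(document, quote):
    """Return (start, end) character offsets of quote in document, or None.

    Runs of whitespace (line breaks, double spaces) are treated as equal,
    so the quote is found however the document happens to be wrapped.
    """
    pattern = r"\s+".join(re.escape(word) for word in quote.split())
    match = re.search(pattern, document)
    return match.span() if match else None


quote = (
    "It is possible to invent a single machine which can be used "
    "to compute any computable sequence"
)

# Stand-in document: placeholder text with the quote broken across lines.
document = (
    "Some earlier paragraph.\n"
    "It is possible to invent a single machine\n"
    "which can be used to compute any  computable sequence.\n"
    "Some later paragraph.\n"
)

span = find_quote(document, quote)
print(span)
```

The offsets point at the exact sentence, whatever the font, screen or edition.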

My Kindle is fairly slow as far as modern computers are concerned. Yet it has no difficulty searching through millions of words in hundreds of documents to find the single sentence I'm looking for. And it all takes less time than it would for me to flick through some pages.

A Changing World

Page numbering as a system of referencing relies on such an unlikely series of events as to be worse than useless. If both reader and writer do not have identical copies of the quoted work, there is a real risk of confusion and misunderstanding.

Page numbers are an ugly hack which fails to achieve the precision it so desperately craves. Let's remove the hack and replace it with something which actually works.

People who quote a page number from an electronic book don't understand how the modern world works. Documents have evolved - so must our citation styles.

