Page numbers aren't the answer

There's a new pre-print paper called Pinpointing the problem: Providing page numbers for citations as a crucial part of open science by Leon Y. Xiao and Nick Ballou. It's a short, easily understandable paper, and well worth a read. I think I disagree with nearly all of its conclusions!

The main point, I agree with. Citing a whole paper is a lossy process. Saying "Smith, J (1963) Practical Time Travel, Gallifrey Press" is basically fine, but it doesn't tell you where in the publication the bit you're citing actually is.

Xiao & Ballou contend that the answer is to add a page number to every citation. I think that's a simple solution - but almost always the wrong way to solve the problem.

Firstly, page numbers aren't stable. If you've got the large-print version of a paper, it will have different page numbers than the regular print edition. Paperbacks and hardbacks have different numbers. The paper copy might be formatted differently from the digital copy.

Secondly, most modern documents simply don't have pages. Anything published as HTML / ePub won't have a page number. Page numbers are a skeuomorph in those contexts.

Finally, even if there is a stable page number - is that precise enough? A page may have several hundred words in multiple sections. Page numbers aren't granular.

I've ranted before about how Quoting Page Numbers from eBooks is Considered Harmful.

So what's the solution?

URL fragments!

A properly formatted document should be accessible by sub-section. For example:

That takes us to the paper we want, and straight to the anchor heading we're discussing. This also works with the obsolete PDF format.
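
As a minimal sketch of how such a link is put together, here's some Python that attaches a section anchor to a paper's URL. The example.com address and the `results` anchor are made up for illustration; a real paper needs a heading whose `id` matches the fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def link_to_section(page_url: str, anchor_id: str) -> str:
    """Return page_url with its fragment set to anchor_id."""
    parts = urlsplit(page_url)
    # Replace any existing fragment so the link points at exactly one section.
    return urlunsplit(parts._replace(fragment=anchor_id))


# Hypothetical paper URL and anchor, for illustration only.
print(link_to_section("https://example.com/practical-time-travel.html", "results"))
# prints https://example.com/practical-time-travel.html#results
```

The fragment never gets sent to the server, so this works on any page that has the right anchors in its markup, without the publisher having to do anything extra.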

Now, not all papers have the correct markup to do this. That's OK! We can cheat using Text Fragments. That allows us to link to the first instance of a specific string of text in a document. For example: the polarity of the neutron flow
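
Here's a rough sketch of building a Text Fragment link in Python. It uses the `#:~:text=` directive syntax; the example.com address is hypothetical, and the helper only handles the simplest case of a single start string (no prefix, suffix, or end text).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def text_fragment_url(page_url: str, text: str) -> str:
    """Return page_url with a :~:text= directive targeting the first match of text."""
    # Percent-encode everything, including ',' and '&', which are syntax in the directive.
    encoded = quote(text, safe="")
    # quote() leaves '-' untouched, but '-' marks prefixes and suffixes in a directive.
    encoded = encoded.replace("-", "%2D")
    parts = urlsplit(page_url)
    return urlunsplit(parts._replace(fragment=f":~:text={encoded}"))


# Hypothetical paper URL, for illustration only.
print(text_fragment_url("https://example.com/paper.html", "the polarity of the neutron flow"))
# prints https://example.com/paper.html#:~:text=the%20polarity%20of%20the%20neutron%20flow
```

A browser that supports Text Fragments scrolls to the first match and highlights it. One that doesn't will simply open the page at the top, so the link degrades gracefully.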

I genuinely think both of those solutions make more sense than hoping that your reader has a copy of a paper in exactly the same format as you do.

One of the things I found most frustrating during my MSc was how... old-fashioned academia was. PDFs with two-column layouts, unreadable fonts, paywalled knowledge trapped in moribund formats. Let's try to drag it into the late 20th Century, eh?

12 thoughts on “Page numbers aren't the answer”

  1. @Edent Does this raise a more abstract question about the limits of specificity appropriate/allowable for certain types of citation?

    If I’m picking out a single statistic, then it might make sense to point to the last sentence of paragraph 3 under the ‘Results’ section heading.

    If I’m paraphrasing their findings, or describing concepts that required me to read most of the paper, then it’s messy, misleading and possibly disingenuous to pinpoint to their first or last mention?

  2. says:

    @Edent I think your solution is creative. In graduate school (History) when digital was in its infancy, we used Turabian (sub-set of Chicago), and it had very precise rules (even including digital) for specifying editions and page numbers or date accessed.

    I was always surprised how other Humanities used MLA or some other guide because they were never as precise, and accountability and reproducibility are important.

    I have read science and math texts but never with an eye to footnotes (except for any that elaborated on the text) or citations because I wouldn't/couldn't attempt to verify or reproduce any of it.

    It does surprise me that science is so imprecise it's not required to specify where in a reference the author is referring. It's been decades since I needed to cite anything, and I didn't read the paper you're critiquing, but I probably agree with their objection and your conclusion.

    Which, if you add my 2 cents to a sawbuck will barely buy you a cup of coffee.

  3. says:

    @drandrewv2 @Edent I find using page numbers very useful for books (not generally articles) mainly published long before the web. Lots of humanities (& some biology) data is in giant books where this approach is useful.

    A more radical suggestion is that papers should be as succinct as possible; the “Monotremes oviparous, ovum meroblastic” telegram of 1884 probably represents the lower limit.

    Picture shows Mill ex. Moench (1770: 229), a contested 1st description of London Plane

  4. says:

    @Edent Super interesting.

    I would love to see EPUB support in Zotero, I think it could change the landscape here.

    Portability of documents in your personal library is a really important user need for (early career especially) researchers who are frequently moving institutions (and countries) due to temporary contracts. I think this might be quite a big part of why PDFs are still relatively popular in academia, esp in sciences.

  5. @Edent @LeonXiaoY @CatherineFlick

    a compelling response! I agree with you - I'd love to see version-controlled DOIs with url fragments that link to specific sections. I just think that in the current world of antiquated publishing formats, page numbers—for some citations, where there'd otherwise be ambiguity—is the best of a bad situation.

    we'll add some extra detail about how a better system would work and point people to your post 👍

    (sorry also for completely missing your earlier reply!)

  6. Lee says:

    I shudder when people use page numbers for referencing, and then require fixed-format documents to support this. As you say, the page numbers don't always translate across "documents"... sometimes even if the output format is ostensibly the same.

    Two different versions of the same document renderer are capable of producing different page contents. Two similar fonts with the same weight and point size (but slightly different metrics) are capable of producing different page contents.

    You end up getting in arguments over which is the "true" document.

  7. Amen to this. Plus academia's obsession with PDF needs to be challenged. It's making work harder to read, harder to find, harder to use and harder to build on.

  8. @Edent What if scholarly documents will always come in a wide variety of forms & formats, multiple infrastructural regimes will always co-exist (w/ only kludgy links btwn them), & scholars in their different domains continue to develop & maintain their knowledge practices in different directions? The real challenge isn’t a fix in a (never to be realized) unified knowledge infrastructure & undifferentiated communities of inquiry, but in our existing situation of deep plurality.

  9. @Edent I think 'academic paper' is a big clue to why everything is broken. When we printed, then we digitalled, we broke the models the monks handed down to us (on a scrolling medium.) Pages were a major infotech schism. We don't even separate concerns.



