One thing I'm finding extremely frustrating in academia is the number of people citing papers which don't seem to actually exist.
As part of a data analytics class, I'm learning about "database sharding": the process of splitting data between multiple machines. But where does the term come from?
Wikipedia - the source of all truth - says:
In a database context, most recognize the term "shard" is most likely derived from either one of two sources: Computer Corporation of America's "A System for Highly Available Replicated Data"
It lists the reference as "Sarin, DeWitt & Rosenburg, Overview of SHARD: A System for Highly Available Replicated Data, Technical Report CCA-88-01, Computer Corporation of America, May 1988"
I've contacted the authors of this paper, and of other papers, but none of them has been able to supply me with a copy.
As far as I can tell, it was originally an internal company report for the Computer Corporation of America. The company's new owners didn't respond to a request for archival material.
Perhaps I can't find it because the authors' names are misspelled?
But can we trust this source? Probably; it was written by one Ronni Lynne Rosenberg - and I assume she can spell her own name correctly!
I've updated the Wikipedia citation.
But now I'm stumped. Everyone refers to this ur-paper, but I can't find it anywhere. I've checked all the sources I have access to. And even some of those despicable sites which share academic PDFs for free. None of them have it.
This leads me to one of three conclusions:
- It exists, but I'm too stupid to find it
- People are citing things which they haven't read
- I have fundamentally misunderstood how academia works
What do you reckon?
The inimitable Dr Laura James has found a clue! The British Library holds a copy of "TECHNICAL REPORT- COMPUTER CORPORATION OF AMERICA CCA".
I've requested an interlibrary loan from my university to see if it contains this mythical document.