Where is the original "Overview of SHARD" paper?

One thing I'm finding extremely frustrating in academia is the number of people citing papers which don't seem to actually exist.

As part of a data analytics class, I'm learning about "database sharding". That is, the process of splitting data between multiple machines. But where does the term come from?

Wikipedia - the source of all truth - says:

In a database context, most recognize the term "shard" is most likely derived from either one of two sources: Computer Corporation of America's "A System for Highly Available Replicated Data"

It lists the reference as "Sarin, DeWitt & Rosenburg, Overview of SHARD: A System for Highly Available Replicated Data, Technical Report CCA-88-01, Computer Corporation of America, May 1988"

It is a heavily cited paper. But it doesn't seem to exist!

I've contacted the authors of this paper, and of others which cite it, but none of them have been able to supply me with a copy.

As far as I can tell, it was originally an internal company report to the Computer Corporation of America. Their new owners didn't respond to a request for archival material.

Perhaps I can't find it because the authors' names are misspelled?

This 1989 thesis from MIT spells the name as "Rosenberg" - with an e, not a u. (Thanks to Suzy Hamilton for helping me find that paper.)

But can we trust this source? Probably; it was written by one Ronni Lynne Rosenberg - and I assume she can spell her own name correctly! I've updated the Wikipedia citation.

But now I'm stumped. Everyone refers to this ur-paper, but I can't find it anywhere. I've checked all the sources I have access to. And even some of those despicable sites which share academic PDFs for free. None of them have it.

This leads me to conclude one of three possibilities:

1. It exists, but I'm too stupid to find it
2. People are citing things which they haven't read
3. I have fundamentally misunderstood how academia works

What do you reckon?

2021-06-30 UPDATE! The inimitable Dr Laura James has found a clue! The British Library holds a copy of "TECHNICAL REPORT- COMPUTER CORPORATION OF AMERICA CCA".

I've requested an interlibrary loan from my university to see if it contains this mythical document.

2021-07-13 Update! Sadly, it's a bust. I got a lovely and detailed email from the British Library. In it, they say:

I have asked my colleague in Boston Spa, in Yorkshire to check the shelves and our holdings for this catalogue record:

Title: TECHNICAL REPORT- COMPUTER CORPORATION OF AMERICA CCA.

Holdings Notes: Document Supply 8715.133000 81/05, 1981-

Shelfmark(s): Document Supply 8715.133000

Sadly it is rather misleading and the dash -, implies that we have CCA-81/05 onwards but in fact we just have that particular technical report and no others. I will ask our data quality team to correct the record for future enquirers.

The quest continues!

15 thoughts on “Where is the original "Overview of SHARD" paper?”

Alex

I think it's mostly 2. You might reference something you haven't read in order to make clear which tradition you're talking about. Like if I said "justice" and then (Rawls 1971) to let you know it's that idea rather than maybe a legal thing. But I probably won't read him.

Reply | Reply to original comment on twitter.com 2021-06-27 12:19
Martin Paul Eve

I've been talking for years about objects only known through the proxy of citation, where the original is totally lost. It's usually editions of The Iliad but here's a more recent version.

Reply | Reply to original comment on twitter.com 2021-06-27 12:39
Jose Haro Peralta

According to Wikipedia (en.wikipedia.org/wiki/Computer_…), CCA was acquired by Xerox in 1988, so I guess it's both a CCA and a Xerox paper?

Reply | Reply to original comment on twitter.com 2021-06-27 13:28
Caroline Jarrett

My vote is for reason 2: people are citing things they haven’t read.

A famous example is Likert’s 1932 paper which, to be fair, was really difficult to find and paywalled until relatively recently. But these days it is quite easy to find and relatively easy to read when found.

By the way: most academics will very happily send a paper to anyone who asks them. The current academic journal / paywall structure mostly exists to make money for the publishers by profiting from the donated labour of academics, and in some cases from fees that academics have to pay. There are some exceptions: excellent open-access journals.

And another thought: I’ve ended up citing something that was there when I cited it but is now unavailable at the original place and I haven’t been able to find it anywhere else. So there’s a fourth explanation: 4. Was available, isn’t now

Reply 2021-06-27 13:42
Merton Hale

A few years ago I was doing very in-depth research on how the human eye worked. Not from a medical point of view but as a "machine." All research was done on the internet. It quickly became apparent that the vast majority of papers, websites, etc. were just citing each other. An ongoing circle. And unsubstantiated claims or descriptions just flowed around the internet with nobody really caring. I'm very sure that people are citing things they have not read. If the paper exists it would be very surprising if you could not find it. You do know what you are doing.

Reply 2021-06-27 14:34
Chris Midgley

I’m quite sure the paper existed: https://archive.org/details/DTIC_ADA209437/page/n113/mode/2up?q=overview+of+shard (from 1989) references it, and has a description of SHARD in pages 31-5.

One major difficulty is that it predates DOIs (by about a decade), making it hard to reference.

DTIC have a lot of related papers. You could FOIA them and see if they have a copy.

Reply 2021-06-27 23:25
Dr. David Mills

If it turns up on a scrap of burned parchment, I can probably help.

Reply | Reply to original comment on twitter.com 2021-06-28 08:06
Henry Hadlow

Another framing of this is ‘Who pays to make sure scientific journal articles are available indefinitely?’ Companies and universities pay $€£ to journal publishers to access their huge archives, but that doesn’t work for corporate memos. arxiv.org solves it, kinda.

Reply | Reply to original comment on twitter.com 2021-06-28 09:15
Laura

A variety of loose search terms doesn’t show it up at the Cambridge UL. I guess there might be some unindexed content?? Local academics recommend the UL, or Imperial, which are deemed to have Good Collections of computer science…

Reply 2021-06-29 18:57
NI-Lab.

Wikipedia lists Computer Corporation of America's "A System for Highly Available Replicated Data" as a possible origin of the term "shard", but it looks like the paper can't be found...

Where is the original “Overview of SHARD” paper? – Terence Eden’s Blog shkspr.mobi/blog/2021/06/w…

Reply | Reply to original comment on twitter.com 2022-10-08 13:02
Ronni Rosenberg

I'm the Ronni Rosenberg. This was an internal CCA paper (not from academia or a published journal), from 35 years ago! I don't have a copy and I have no idea how to get it. Sorry about that. It does seem to be the earliest reference to data "sharding." (The other early reference mentioned in Wikipedia is from much later, 1997.)

Fortunately, you need not go back 35 years to read about sharding; it's easy to get current info. Cheers.

Reply 2023-07-24 19:22
1. @edent
  
  Thank you so much for your comment 🙂
  
  The current info is comprehensive, but I think that it is still useful to see the original ideas that people proposed. Hopefully there's a photocopy lurking in a filing cabinet somewhere…
  
  Reply 2023-07-25 08:35
John Maguire

I reached out to Ronni Rosenberg on LinkedIn who had this to say:

Yes, I was involved, 35 years ago! I believe it was an internal CCA paper. I don't have a copy and I have no idea how to get it. Sorry about that. It does seem to be the earliest reference to "shard" in the DB context. (The other early reference pointed to in Wikipedia is from much later, 1997.) Fortunately, you need not go back 35 years to read about sharding; it's easy to get current info. Cheers.

Fingers crossed it surfaces eventually!

Reply 2023-07-24 19:32
Avi Berliner

Thanks for this and sending me down a rabbit hole of my own. While I haven't found the paper you are looking for, I would suggest that it isn't even the original SHARD paper...

See my LinkedIn post https://www.linkedin.com/posts/aviberliner_original-shard-paper-june-1986-activity-7092883546519846916-O7wu where I have uploaded a 1986 paper.

Reply 2023-08-03 16:22
1. @edent
  
  For those looking for a link to the paper, it is "MIT-LCS-TR-364" at https://hdl.handle.net/1721.1/149632 or DTIC ADA171427 https://apps.dtic.mil/sti/pdfs/ADA171427.pdf
  
  Reply 2023-08-03 16:31
