Welcome to Yak Shaving School! As part of my MSc I'm reading a book about Data Analytics. So I've been chasing down quotes to find their origin.
One paper had this popular quote in it (emphasis added):
As with many rapidly emerging concepts, Big Data has been variously defined and operationalized, ranging from trite proclamations that Big Data consists of data-sets too large to fit in an Excel spreadsheet or be stored on a single machine (Strom, 2012)
Kitchin, Rob. Big Data, new epistemologies and paradigm shifts. SAGE Publications. Big Data & Society. Page: 205395171452848. DOI: https://doi.org/10.1177/2053951714528481
I keep seeing that damned Excel quote. But who originally said it? The "Big Data" paper above references "Strom". Well, here's what Strom has to say for themselves:
Big Data is everywhere. As Bit.ly’s chief scientist Hilary Mason likes to say: “Big Data usually refers to a dataset that is too big to fit into your available memory, or too big to store on your own hard drive, or too big to fit into an Excel spreadsheet.”
Big Data Makes Things Better - Slashdot.org August 3rd 2012
Aha! It's a blogpost from Slashdot. And Strom is quoting someone else - Hilary Mason. I've seen Mason being quoted saying this before. Here's the earliest Tweet I could find attributing it to Mason - from April 2013:
— Atypic (@ATYPIC) April 11, 2013
But I couldn't find the original quote. I want to be able to cite who originally said it, and where & when they said it. Not a second-hand transcription.
Googling around, I found this definition of Big Data from July 2013:
“Big Data” is “it doesn’t fit in Excel”
Stéphane Hamel – one of what are by now more than 30 definitions of Big Data!
Data Science – de toekomst van webanalisten? (Data Science – the future of web analysts?)
Interesting! That's Stéphane Hamel - not Hilary Mason. Searching for Hamel's name led me to this 2017 article:
The simplest definition of “Big Data” is “it doesn’t fit in Excel”
Stephane Hamel comment 8/2012 Big Data – What It Means For The Digital Analyst.
Definitions of Big Data
The "What it means for the digital analyst" page has since disappeared - but is available in the Wayback Machine. Here's the quote in full:
I have joked that the simplest definition of "Big Data" is "it doesn't fit in Excel" - and when you think of it, it's true for most people who wonder how to make the shift from a traditional approach to a Big Data one. Shifting away from Excel forces the analyst to change his approach, view the data differently, and explore new solutions.
And that's a whole lot of fun to do! 🙂
August 2nd, 2012
There's also a Slideshow from March 2013 in which Hamel uses the phrase:
A bit more digging and I found this document from July 2012:
How Big is Big Data. Columbia University. DOI: https://doi.org/10.7916/d82v2qkb
The @SHamelCP Twitter account doesn't exist any more. And while some of its Tweets are in the Internet Archive, that one is missing. But there are contemporary Tweets which suggest that it was Tweeted at about that time:
— jwindz (@jwindz) July 3, 2012
Back in 2012, Twitter had no built-in Quote Tweet function, hence the slightly weird syntax. Here's a link to a bunch of people quote tweeting it in July 2012.
@SHamelCP doesn't exist because at some point it was renamed to @SHamel67. Which means the original Tweet still exists! And here it is:
I reckon that's the earliest directly citable Tweet of the phrase. But there is some evidence of it being used earlier. Here's a report from the BigDataWeek Community meetup in London:
The panel started off with Edd asking, So what is big data? The answers ranged from correct but slightly silly:
lots of 0s and 1s
too big to fit in x (where x is your usual tool - excel, SQL, memory etc) - Hilary
“Big data, ready or not” 25th April 2012
Here's the video - with the quote at ~15 minutes 30 seconds in:
And, slightly earlier:
“Big Data usually refers to a data set that is too big to fit into your available memory, or too big to store on your own hard drive, or too big to fit into an Excel spreadsheet,” says Mason
Hilary Mason Wants To Get You Started With Big Data 26th December 2011
(Although possibly originally published in September 2011)
Prior to that, things start getting a little fuzzy. In April 2011, Mike Driscoll wrote a blog post about a presentation he gave with Hilary Mason and Joe Adler:
- Choose The Right-Sized Tool
Or, as I like to say, you don’t need a chainsaw to cut butter.
If you’ve got 600 lines of CSV data that you need to work with on a one-time basis, paste it into Excel or Emacs and just do it
When you’re data gets very large, so big it can’t fit reasonably on your laptop (in 2010, that’s north of a terabyte), then you’re in Hadoop, parallelized database , or overpriced Big Iron territory.
the seven secrets of successful data scientists 19th April 2011
So the proto-phrase seems to have appeared between April 2011 and April 2012. By July 2012 it had become much more pithy. And from there it became endlessly quotable.
Around April 2011 and before, the idea was always expressed much more fuzzily. A McKinsey report from May 2011 says:
In some cases, decisions will not necessarily be automated but augmented by analyzing huge, entire datasets using big data techniques and technologies rather than just smaller samples that individuals with spreadsheets can handle and understand.
Big data: The next frontier for innovation, competition, and productivity
And, even further back, here's what RedMonk's Stephen O'Grady had to say back in 2009:
Excel has been used on big data for years, it’s true. But not directly on big data. With a row limit of around 65,000, it certainly can’t be used as a direct window into data warehouses or marts
What’s After Excel? Big Data and the Future of Spreadsheets 19th November 2009
Please don't think I'm picking on any of the people mentioned in this blog post - I've seen the quote attributed to a dozen other people, and to none. It is a catchy little slogan with huge memetic potential. I think it has now become a standard truism.
But this was a great reminder to me that it is always worth following the trail of a quote to see where it leads.
Thanks to Pete Skomoroch for alerting me to this earlier usage, from March 2009.
@jakehofman was pondering a blog post on that, often people contact me about "big data" where big = slightly larger than can fit in excel 🙂
— Pete Skomoroch (@peteskomoroch) March 6, 2009
Pete recollects that people were using this phrase in 2007 — but I've yet to find evidence of it. If you have, please stick a note in the comments.