A few days ago, I was somewhat surprised to find that one of my Tweets had been used as a citation in Wikipedia!
I began to wonder – how often are Tweets used in citations?
It’s possible to search for your own Tweets using this (somewhat obscure) link:
Just edit the end of it to see if you, or your friends, have been cited. Note – the username is case-sensitive, so “Edent” isn’t the same as “edent”.
For example, we can see where Cory Doctorow is cited:
Aha! The page on the New Zealand Internet Blackout references:
Success! I'm blacked out in solidarity with Kiwis whose net has been pwned by American entertainment giants: http://creativefreedom.org.nz/
— son of an asylum seeker, father of an immigrant (@doctorow) February 19, 2009
Ok, so which Twitter user has been cited the most? TO THE API, ROBIN!
Wikipedia’s own help pages are a little lacking, so I went to the help pages of the software which runs Wikipedia – MediaWiki.
We want to search for external URLs which point to Twitter and have a namespace of 0 (that means they’re articles, not talk pages). We can grab a maximum of 500 results at a time, using JSON, and we want to include “www.twitter.com” and “twitter.com”. Here’s what we use.
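As a rough Python 3 sketch, the query URL can be assembled from the parameters described above. The helper name `build_query_url` is made up for illustration; the parameter values mirror the ones used in the script further down, and the wildcard `*.twitter.com` is an assumption about how to cover both hostnames in one query.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(offset=0, limit=500, domain="*.twitter.com"):
    """Build an exturlusage query URL (hypothetical helper, for illustration)."""
    params = {
        "action": "query",
        "list": "exturlusage",   # search external links
        "format": "json",
        "eunamespace": 0,        # namespace 0 = articles, not talk pages
        "eulimit": limit,        # 500 is the maximum per request
        "euquery": domain,
        "euoffset": offset,      # how many results to skip
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(build_query_url())
```

Note that `urlencode` percent-encodes the `*`, which the server decodes again on its side.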
- This only gets 500 results at a time – to paginate through, we add “&euoffset=” followed by the number of results to skip (500, 1000, and so on).
- This only searches the English Wikipedia.
- There are roughly 17,800 links to Twitter from English Wikipedia returned by the API.
- The majority of citations just point to a Twitter user’s page – not to a specific Tweet.
- Some of the returned Tweets use the obsolete HashBang URLs (e.g. http://twitter.com/#!/007/status/133679555167784960).
- Some Tweets have been deleted.
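A couple of those caveats can be handled when cleaning up each link. Here is a minimal Python 3 sketch that folds HashBang URLs back into normal ones, pulls out the username, and tells profile links apart from links to a specific Tweet. The function names are hypothetical, and treating `/statuses/` as an older spelling of `/status/` is my assumption.

```python
from urllib.parse import urlparse


def _segments(url):
    """Split the path of a Twitter URL into its non-empty parts."""
    # HashBang links hide the real path after "/#!", so remove that marker first.
    cleaned = url.replace("/#!", "")
    return [part for part in urlparse(cleaned).path.split("/") if part]


def extract_username(url):
    """Return the lowercased username a Twitter URL points at, or None."""
    host = urlparse(url).netloc.lower()
    if host not in ("twitter.com", "www.twitter.com"):
        return None
    parts = _segments(url)
    if not parts:
        return None
    # The first path segment is the username; drop a leading "@" if present.
    return parts[0].lstrip("@").lower()


def is_status_link(url):
    """True if the URL points at a specific Tweet rather than a profile."""
    parts = _segments(url)
    return len(parts) >= 3 and parts[1] in ("status", "statuses")


example = "http://twitter.com/#!/007/status/133679555167784960"
print(extract_username(example), is_status_link(example))
```

Deleted Tweets can't be detected this way; that would need a request to Twitter for each link.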
```python
# Python 2 script (uses urllib2 and print statements)
import json
import sys
import urllib2
from collections import Counter

# euoffset=17800
api = "https://en.wikipedia.org/w/api.php?action=query&eunamespace=0&eulimit=500&format=json&list=exturlusage&euquery=*.twitter.com"
euoffset = 0
words = []

while euoffset < 17500:
    try:
        site_data = json.load(urllib2.urlopen(api + "&euoffset=" + str(euoffset)))
        # Iterate through the results
        for element in site_data['query']['exturlusage']:
            # Remove HashBangs (#!) and lowercase everything
            twitterURL = element['url'].replace("/#!", "").lower()
            twitterUser = twitterURL.replace("http://twitter.com/", "")
            twitterUser = twitterUser.replace("https://twitter.com/", "")
            twitterUser = twitterUser.replace("@", "")
            # Keep only the username, dropping anything after the first slash
            slash = twitterUser.find("/")
            if slash > 0:
                twitterUser = twitterUser[:slash]
            print twitterURL
            # print twitterUser
            words.append(twitterUser)
    except urllib2.URLError:
        print "Unable to retrieve data"
        sys.exit()
    euoffset += 500

# Most cited user
word_counts = Counter(words)
print word_counts
```
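For anyone on Python 3, here is a rough port of the same idea. It is a sketch rather than a drop-in replacement: the function names are my own, the network call is passed in as a parameter so the paging logic can be tried offline, and it assumes the API still accepts the `euoffset` parameter used above, which may no longer be the case.

```python
import json
from collections import Counter
from urllib.parse import urlparse
from urllib.request import urlopen

API = ("https://en.wikipedia.org/w/api.php?action=query&eunamespace=0"
       "&eulimit=500&format=json&list=exturlusage&euquery=*.twitter.com")


def fetch_page(offset):
    """Download one page of results and return the list of link records."""
    with urlopen(API + "&euoffset=" + str(offset)) as response:
        data = json.load(response)
    return data["query"]["exturlusage"]


def count_users(fetch=fetch_page, page_size=500, max_offset=17500):
    """Page through the results, counting the username in each URL."""
    counts = Counter()
    for offset in range(0, max_offset, page_size):
        records = fetch(offset)
        if not records:
            break  # no more results, stop early
        for record in records:
            # Strip the HashBang marker, then take the first path segment.
            path = urlparse(record["url"].replace("/#!", "")).path
            user = path.strip("/").split("/")[0].lstrip("@").lower()
            if user:
                counts[user] += 1
    return counts


def fake_fetch(offset):
    """Stand-in for the network call, returning two sample links on page one."""
    pages = {0: [{"url": "http://twitter.com/#!/007/status/133679555167784960"},
                 {"url": "https://twitter.com/007"}]}
    return pages.get(offset, [])


print(count_users(fetch=fake_fetch))
```

Calling `count_users()` with no arguments would hit the live API instead of the stand-in.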
The most cited Twitter users are…
- LaLiga, a Spanish football competition – 105
- Lea Michele, an American singer / TV actor – 54
- Guldbaggen, the Swedish equivalent of the Oscars – 35
- Kevin Tancharoen, an American movie director – 12
- PRESTO card, Ottawa’s public transit smartcard – 10
- ICE T, an American rapper – 10
- Northern Pride RLFC, a British rugby team – 10
- 穴井勇輝（勇吹輝）, a Japanese actor – 9
- NICKI MINAJ, an American singer – 8
- Counting Crows, an American band – 8
And the most cited individual Tweet?
Linked to from lots of Lost Girl pages.
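The script above only counts users, so here is one way the most cited individual Tweet could be found. This is a guess at a workable approach in Python 3, not necessarily how the result above was produced: the regular expression and the `most_cited_tweets` name are my own, and the input is assumed to be the list of URLs returned by the API.

```python
import re
from collections import Counter

# Matches both normal and HashBang status links and captures the Tweet ID.
STATUS_ID = re.compile(
    r"twitter\.com/(?:#!/)?[^/]+/status(?:es)?/(\d+)", re.IGNORECASE
)


def most_cited_tweets(urls, top=1):
    """Return the `top` most frequently linked Tweet IDs with their counts."""
    ids = Counter()
    for url in urls:
        match = STATUS_ID.search(url)
        if match:  # profile links have no status ID and are skipped
            ids[match.group(1)] += 1
    return ids.most_common(top)


sample = [
    "http://twitter.com/#!/007/status/133679555167784960",
    "https://twitter.com/007/status/133679555167784960",
    "https://twitter.com/007",
]
print(most_cited_tweets(sample))
```

Counting by ID rather than by full URL means the HashBang and normal forms of the same Tweet are treated as one.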
What Have We Learned Today?
Twitter, unsurprisingly, has limited utility as an encyclopædic source – it’s great for breaking news and ephemeral events, but it’s fragile and lacks depth. There are very few occasions where Twitter would be the sole, and canonical, source of information [Citation needed].