A few days ago, I was somewhat surprised to find that one of my Tweets had been used as a citation in Wikipedia!
I began to wonder - how often are Tweets used in citations?
It's possible to search Wikipedia for links to your own Tweets using this (somewhat obscure) link:
Just edit the end of it to see if you, or your friends, have been cited. Note: the username is case-sensitive, so "Edent" isn't the same as "edent".
For example, we can see where Cory Doctorow is cited:
Aha! The page on the New Zealand Internet Blackout references:
Success! I'm blacked out in solidarity with Kiwis whose net has been pwned by American entertainment giants: http://creativefreedom.org.nz/
— Cory Doctorow (@doctorow) February 19, 2009
Ok, so which Twitter user has been cited the most? TO THE API, ROBIN!
Wikipedia's own help pages are a little lacking, so I went to the help pages of the software which runs Wikipedia - MediaWiki.
We want to search for external URLs which point to Twitter and have a namespace of 0 (that means they're articles, not talk pages). We can grab a maximum of 500 results at a time, in JSON, and we want to include both "www.twitter.com" and "twitter.com". Here's the query we use:

https://en.wikipedia.org/w/api.php?action=query&eunamespace=0&eulimit=500&format=json&list=exturlusage&euquery=*.twitter.com
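If you'd rather not hand-assemble that query string, Python's standard library can build it from a dictionary of the parameters described above. This is a minimal sketch; `build_query_url` is a hypothetical helper, not part of any library. Note that `urlencode` percent-encodes the `*` as `%2A`, which the API decodes back to a wildcard.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def build_query_url(limit=500, namespace=0):
    """Hypothetical helper: build an exturlusage query for links to twitter.com."""
    params = {
        "action": "query",
        "list": "exturlusage",
        "euquery": "*.twitter.com",  # wildcard domain search
        "eunamespace": namespace,    # 0 = articles, not talk pages
        "eulimit": limit,            # 500 results per request
        "format": "json",
    }
    return API + "?" + urlencode(params)


print(build_query_url())
```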
- This only gets 500 results at a time. To paginate through, we add the continuation parameter from each response (e.g. &euoffset=500) to the next request.
- This only searches the English Wikipedia.
- The API returns roughly 17,800 links to Twitter from English Wikipedia.
- The majority of citations just point to a Twitter user's page - not to a specific Tweet.
- Some of the returned Tweets use the obsolete HashBang URLs (e.g. http://twitter.com/#!/007/status/133679555167784960).
- Some Tweets have been deleted.
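The quirks listed above (HashBang URLs, links to a profile rather than a Tweet, assorted prefixes) can be handled with one small normalising function. This is a sketch using only the standard library; `parse_twitter_url` is a hypothetical helper, and it assumes Tweet links follow the `twitter.com/<user>/status/<id>` pattern seen in the HashBang example. It can't tell whether a Tweet has been deleted; that would need a request to Twitter itself.

```python
from urllib.parse import urlsplit


def parse_twitter_url(url):
    """Hypothetical helper: return (username, status_id) for a Twitter link.

    status_id is None when the link points at a profile rather than a Tweet.
    """
    # Drop the obsolete HashBang: twitter.com/#!/007/... -> twitter.com/007/...
    # and lowercase so "Doctorow" and "doctorow" count as the same account
    url = url.replace("/#!", "").lower()
    # Non-empty path segments, e.g. ["007", "status", "133679555167784960"]
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if not parts:
        return None, None
    user = parts[0].lstrip("@")
    status_id = None
    # Tweets live at /<user>/status/<id> (older links sometimes use /statuses/)
    if len(parts) >= 3 and parts[1] in ("status", "statuses") and parts[2].isdigit():
        status_id = parts[2]
    return user, status_id


print(parse_twitter_url("http://twitter.com/#!/007/status/133679555167784960"))
# ('007', '133679555167784960')
print(parse_twitter_url("https://www.twitter.com/Doctorow"))
# ('doctorow', None)
```

Splitting on the URL path, rather than stripping known prefixes, means "www." and "https://" variants all reduce to the same username.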
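```python
import json
import sys
import urllib.error
import urllib.parse
import urllib.request
from collections import Counter

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "exturlusage",
    "euquery": "*.twitter.com",
    "eunamespace": 0,
    "eulimit": 500,
    "format": "json",
}
# Wikimedia asks API clients to send a descriptive User-Agent (the contact address is a placeholder)
headers = {"User-Agent": "TwitterCitationCounter/0.1 (example@example.com)"}

words = []
while True:
    url = API + "?" + urllib.parse.urlencode(params)
    try:
        with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as response:
            site_data = json.load(response)
    except urllib.error.URLError:
        sys.exit("Unable to retrieve data")

    # Iterate through this batch of (up to 500) links
    for element in site_data["query"]["exturlusage"]:
        # Remove HashBangs (#!) and lowercase everything
        twitter_url = element["url"].replace("/#!", "").lower()
        # The username is the first path segment, whatever the host (twitter.com, www.twitter.com...)
        path = urllib.parse.urlsplit(twitter_url).path
        twitter_user = path.strip("/").split("/")[0].replace("@", "")
        print(twitter_url)
        if twitter_user:
            words.append(twitter_user)

    # Stop when the API no longer sends a continuation; otherwise add it to the next request
    if "continue" not in site_data:
        break
    params.update(site_data["continue"])

# Most cited users
word_counts = Counter(words)
print(word_counts.most_common())
```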
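The MediaWiki API documents a general continuation mechanism: a response may carry a `continue` object, and you merge its contents into the next request until it stops appearing. That avoids hard-coding the total number of links. Here is a minimal sketch of that loop with the network call passed in as a function, so it can be tried without touching Wikipedia; `iter_exturlusage` and `fake_fetch` are hypothetical names, and the fake responses are invented data shaped like the API's output.

```python
def iter_exturlusage(fetch, params):
    """Yield every exturlusage result, following the API's 'continue' values.

    fetch: a function taking a dict of query parameters and returning the
    decoded JSON response (e.g. a thin wrapper around urllib.request).
    """
    request = dict(params)
    while True:
        data = fetch(request)
        yield from data.get("query", {}).get("exturlusage", [])
        if "continue" not in data:
            break
        # Merge the continuation values (e.g. an offset) into the next request
        request = {**request, **data["continue"]}


def fake_fetch(params):
    """Stand-in for the real API: two pages, the first with a continuation."""
    if params.get("euoffset") is None:
        return {"continue": {"euoffset": 2, "continue": "-||"},
                "query": {"exturlusage": [{"url": "http://twitter.com/a"},
                                          {"url": "http://twitter.com/b"}]}}
    return {"query": {"exturlusage": [{"url": "http://twitter.com/c"}]}}


urls = [item["url"] for item in iter_exturlusage(fake_fetch, {"list": "exturlusage"})]
print(urls)
# ['http://twitter.com/a', 'http://twitter.com/b', 'http://twitter.com/c']
```

Because the loop copies whatever keys arrive in `continue`, it keeps working if the API renames its paging parameter.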
The most cited Twitter users are...
- LaLiga (a Spanish football competition): 105
- Lea Michele (an American singer and TV actor): 54
- Guldbaggen (the Swedish equivalent of the Oscars): 35
- Kevin Tancharoen (an American film director): 12
- PRESTO card (Ontario's public transit smartcard): 10
- ICE T (an American rapper): 10
- Northern Pride RLFC (an Australian rugby league team): 10
- 穴井勇輝（勇吹輝） (a Japanese actor): 9
- NICKI MINAJ (an American singer): 8
- Counting Crows (an American band): 8
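A ranked list like this falls straight out of a `Counter` via `most_common()`. This is a small sketch using made-up usernames, not the real Wikipedia data. Ties (such as the three accounts on 10) come out in the order they were first encountered.

```python
from collections import Counter

# Hypothetical sample of extracted usernames, not the real data
usernames = ["user_a", "user_b", "user_a", "user_c", "user_a", "user_b"]

counts = Counter(usernames)
for user, n in counts.most_common(2):
    print(user, n)
# user_a 3
# user_b 2
```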
And the most cited individual Tweet?
— Syfy PR (@SyfyPR) March 18, 2013
It's linked to from lots of Lost Girl pages.
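Finding the most cited individual Tweet, rather than the most cited account, means counting Tweet IDs instead of usernames. This is a sketch using a regular expression, assuming Tweet links follow the `/<user>/status/<id>` pattern (with or without the old HashBang); `most_cited_tweets` is a hypothetical helper and the sample URLs are invented placeholders.

```python
import re
from collections import Counter

# Matches twitter.com/<user>/status/<numeric id>, optionally after a #! HashBang
STATUS_RE = re.compile(r"twitter\.com/(?:#!/)?[^/]+/status(?:es)?/(\d+)", re.IGNORECASE)


def most_cited_tweets(urls, n=1):
    """Return the n most frequent Tweet IDs among a list of URLs."""
    ids = [m.group(1) for m in map(STATUS_RE.search, urls) if m]
    return Counter(ids).most_common(n)


sample = [
    "http://twitter.com/example/status/111",
    "https://twitter.com/#!/example/status/111",
    "https://twitter.com/other/status/222",
    "https://twitter.com/example",  # a profile link, so it is ignored
]
print(most_cited_tweets(sample))
# [('111', 2)]
```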
What Have We Learned Today?
Twitter, unsurprisingly, has limited utility as an encyclopædic source - it's great for breaking news and ephemeral events, but it's fragile and lacks depth. There are very few occasions where Twitter would be the sole, and canonical, source of information [Citation needed].