Which Twitter User Receives The Most Citations on Wikipedia?
A few days ago, I was somewhat surprised to find that one of my Tweets had been used as a citation in Wikipedia!
I began to wonder - how often are Tweets used in citations?
It's possible to search for your own Tweets using this (somewhat obscure) link:
https://en.wikipedia.org/w/index.php?title=Special%3ALinkSearch&target=twitter.com%2Fedent
Just edit the end of it to see if you, or your friends, have been cited. Note - the username is case sensitive, so "Edent" isn't the same as "edent".
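If you'd rather not edit the link by hand, it can be built in code. This is a minimal Python 3 sketch; the function name link_search_url is made up for illustration and isn't anything Wikipedia provides. The username is passed through untouched because the search is case sensitive.

```python
from urllib.parse import urlencode

def link_search_url(username):
    """Build the Special:LinkSearch URL for one Twitter username.

    link_search_url is a hypothetical helper. The username is used exactly
    as given, because the search is case sensitive.
    """
    query = urlencode({
        "title": "Special:LinkSearch",
        "target": "twitter.com/" + username,
    })
    return "https://en.wikipedia.org/w/index.php?" + query

print(link_search_url("edent"))
```

Calling it with "edent" should reproduce the link shown above.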
For example, we can see where Cory Doctorow is cited:
https://en.wikipedia.org/w/index.php?title=Special%3ALinkSearch&target=twitter.com%2Fdoctorow
Aha! The page on the New Zealand Internet Blackout references:
Ok, so which Twitter user has been cited the most? TO THE API, ROBIN!
Wikipedia's own help pages are a little lacking, so I went to the help pages of the software which runs Wikipedia - MediaWiki.
We want to search for external URLs which point to Twitter and have a namespace of 0 (that means they're articles, not talk pages). We can grab a maximum of 500 results at a time, using JSON, and we want to include "www.twitter.com" and "twitter.com". Here's what we use.
https://en.wikipedia.org/w/api.php?
action=query&
list=exturlusage&
eunamespace=0&
eulimit=500&
format=json&
euquery=*.twitter.com
Run it yourself to see the results.
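The same query can be assembled from its parameters instead of being pasted together by hand. This is a rough Python 3 sketch, not an official client: the names API_ENDPOINT and exturlusage_url are invented for the example, and it assumes the API still accepts the euoffset parameter for paging, which may have changed since this was written.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def exturlusage_url(offset=0, limit=500):
    """Return the API URL for one page of article links to Twitter.

    exturlusage_url is a hypothetical helper; the parameter names and
    values are the ones listed in the post.
    """
    params = {
        "action": "query",
        "list": "exturlusage",
        "eunamespace": 0,          # 0 = articles, not talk pages
        "eulimit": limit,          # 500 is the maximum per request
        "format": "json",
        "euquery": "*.twitter.com",
        "euoffset": offset,        # assumed paging parameter
    }
    return API_ENDPOINT + "?" + urlencode(params)

print(exturlusage_url())
print(exturlusage_url(offset=500))
```

Note that urlencode percent-encodes the asterisk as %2A; the server should decode it back to the same wildcard.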
Limitations
- This only gets 500 results at a time - to paginate through, we add euoffset= to the query.
- This only searches the English Wikipedia.
- There are roughly 17,800 links to Twitter from English Wikipedia returned by the API.
- The majority of citations just point to a Twitter user's page - not to a specific Tweet.
- Some of the returned Tweets use the obsolete HashBang URLs (e.g. http://twitter.com/#!/007/status/133679555167784960).
- Some Tweets have been deleted.
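Several of these quirks (HashBang URLs, mixed case, the www. prefix) matter when turning a cited URL into a username. Here is one possible way to normalise them, as a Python 3 sketch. The function twitter_username is hypothetical, and it is a little stricter than simple string replacement because it checks the hostname first.

```python
from urllib.parse import urlsplit

def twitter_username(url):
    """Return the lowercased username from a Twitter URL, or None.

    twitter_username is a hypothetical helper written for this example.
    """
    # HashBang URLs hide the real path in the fragment, so drop "/#!" first.
    cleaned = url.replace("/#!", "").lower()
    parts = urlsplit(cleaned)
    # Accept twitter.com and any subdomain, such as www.twitter.com.
    host = parts.netloc
    if host != "twitter.com" and not host.endswith(".twitter.com"):
        return None
    # The first non-empty path segment is the username.
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    return segments[0].lstrip("@") or None

print(twitter_username("http://twitter.com/#!/007/status/133679555167784960"))
```

Links to a user's page and links to a specific Tweet both reduce to the same username, which is what we want when counting citations per user.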
Crappy Python!
# Python 2 (uses urllib2 and the print statement)
import json
import sys
import urllib2
from collections import Counter

#euoffset=17800
api = "https://en.wikipedia.org/w/api.php?action=query&eunamespace=0&eulimit=500&format=json&list=exturlusage&euquery=*.twitter.com"
euoffset = 0
words = []

while euoffset < 17500:
    try:
        site_data = json.load(urllib2.urlopen(api + "&euoffset=" + str(euoffset)))
        # Iterate through the results
        for element in site_data['query']['exturlusage']:
            # Remove HashBangs #!
            # Lowercase everything
            twitterURL = element['url'].replace("/#!","").lower()
            # Strip the scheme and host, leaving "user/status/..."
            twitterUser = twitterURL.replace("http://twitter.com/","")
            twitterUser = twitterUser.replace("https://twitter.com/","")
            twitterUser = twitterUser.replace("@","")
            # Keep only the part before the first slash: the username
            slash = twitterUser.find("/")
            if slash > 0:
                twitterUser = twitterUser[:slash]
            print twitterURL
            # print twitterUser
            words.append(twitterUser)
    except urllib2.URLError:
        print "Unable to retrieve data"
        sys.exit()
    euoffset += 500

# Most cited user
word_counts = Counter(words)
print word_counts
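Printing the whole Counter is noisy when there are thousands of usernames. One option, sketched here in Python 3 with made-up sample data standing in for the real list, is to ask the Counter for just its top entries with most_common.

```python
from collections import Counter

# Made-up sample data standing in for the usernames the script collects.
words = ["laliga", "edent", "laliga", "doctorow", "laliga", "edent"]

word_counts = Counter(words)

# most_common(n) returns the n highest counts as (item, count) pairs.
for user, count in word_counts.most_common(2):
    print(user, "-", count)
```

With the sample list above, this prints laliga with a count of 3, then edent with a count of 2.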
The Results...
The most cited Twitter users are...
- LaLiga, a Spanish football competition - 105
- Lea Michele, an American singer / TV actor - 54
- Guldbaggen, the Swedish equivalent of the Oscars - 35
- Kevin Tancharoen, an American movie director - 12
- PRESTO card, Ottawa's public transit smartcard - 10
- ICE T, an American rapper - 10
- Northern Pride RLFC, a British rugby team - 10
- 穴井勇輝(勇吹輝), a Japanese actor - 9
- NICKI MINAJ, an American singer - 8
- Counting Crows, an American band - 8
And the most cited individual Tweet?
Linked to from lots of Lost Girl pages.
What Have We Learned Today?
Wikipedia does have a large amount of pop-culture content (do we need hundreds of words on My Little Pony characters?).
Twitter, unsurprisingly, has limited utility as an encyclopædic source - it's great for breaking news and ephemeral events, but it's fragile and lacks depth. There are very few occasions where Twitter would be the sole, and canonical, source of information [Citation needed].
You're welcome.
Terence Eden says:
Hopefully, all the tweets cited in Wikipedia (like nearly all other web pages cited) will be in the Internet Archive's Wayback Machine.