Which Twitter User Receives The Most Citations on Wikipedia?
A few days ago, I was somewhat surprised to find that one of my Tweets had been used as a citation in Wikipedia!
I began to wonder - how often are Tweets used in citations?
It's possible to search for your own Tweets using this (somewhat obscure) link:
https://en.wikipedia.org/w/index.php?title=Special%3ALinkSearch&target=twitter.com%2Fedent
Just edit the end of it to see if you, or your friends, have been cited. Note - the username is case sensitive, so "Edent" isn't the same as "edent".
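If you'd rather not hand-edit the URL, here is a minimal sketch of building it in Python. The function name linksearch_url is my own invention for illustration; the only assumption is that the link keeps the same title and target parameters shown above.

```python
from urllib.parse import urlencode

def linksearch_url(username):
    """Build a Special:LinkSearch URL for a Twitter username (case sensitive)."""
    query = urlencode({
        "title": "Special:LinkSearch",
        "target": "twitter.com/" + username,
    })
    return "https://en.wikipedia.org/w/index.php?" + query

print(linksearch_url("edent"))
```

urlencode takes care of escaping the colon and the slash, so the username can be passed in exactly as it appears on Twitter.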
For example, we can see where Cory Doctorow is cited:
https://en.wikipedia.org/w/index.php?title=Special%3ALinkSearch&target=twitter.com%2Fdoctorow
Aha! The page on the New Zealand Internet Blackout references:
Ok, so which Twitter user has been cited the most? TO THE API, ROBIN!
Wikipedia's own help pages are a little lacking, so I went to the help pages of the software which runs Wikipedia - MediaWiki.
We want to search for external URLs which point to Twitter and have a namespace of 0 (that means they're articles, not talk pages). We can grab a maximum of 500 results at a time, using JSON, and we want to include "www.twitter.com" and "twitter.com". Here's what we use.
https://en.wikipedia.org/w/api.php?action=query&list=exturlusage&eunamespace=0&eulimit=500&format=json&euquery=*.twitter.com
Run it yourself to see the results.
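The same query can be assembled from a dictionary, which makes each parameter easier to see and tweak. This is only a sketch: the helper name exturlusage_url and its default arguments are hypothetical, while the parameter names are the ones from the URL above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def exturlusage_url(pattern="*.twitter.com", limit=500, namespace=0):
    """Build an exturlusage query URL from its individual parameters."""
    params = {
        "action": "query",
        "list": "exturlusage",
        "eunamespace": namespace,  # 0 = articles, not talk pages
        "eulimit": limit,          # maximum results per request
        "format": "json",
        "euquery": pattern,        # wildcard pattern for the linked host
    }
    return API_ENDPOINT + "?" + urlencode(params)

url = exturlusage_url()
print(url)
```

Note that urlencode percent-encodes the asterisk, so the printed URL looks slightly different from the hand-written one while carrying the same query.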
Limitations
- This only gets 500 results at a time - to paginate through, we add euoffset= with the number of results to skip.
- This only searches the English Wikipedia.
- There are roughly 17,800 links to Twitter from English Wikipedia returned by the API.
- The majority of citations just point to a Twitter user's page - not to a specific Tweet.
- Some of the returned Tweets use the obsolete HashBang URLs (e.g. http://twitter.com/#!/007/status/133679555167784960.)
- Some Tweets have been deleted.
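An alternative to stepping the offset by hand is to follow the continuation values the API hands back. The sketch below assumes each response may carry a "continue" object whose contents are merged into the next request, as MediaWiki's continuation documentation describes. The names iter_results and fake_fetch are hypothetical; fake_fetch is a stand-in for a real HTTP call so the example runs offline, and its URLs are made up.

```python
def iter_results(fetch, params):
    """Yield every exturlusage entry, following continuation values.

    `fetch` is any callable that takes a dict of query parameters and
    returns the decoded JSON response as a dict (a hypothetical helper).
    """
    params = dict(params)
    while True:
        response = fetch(params)
        for entry in response.get("query", {}).get("exturlusage", []):
            yield entry
        if "continue" not in response:
            break
        # Merge the continuation values into the next request.
        params.update(response["continue"])

# Stand-in for a real HTTP call, with made-up data, so the sketch runs offline.
def fake_fetch(params):
    pages = {
        None: {
            "query": {"exturlusage": [{"url": "https://twitter.com/example_a"}]},
            "continue": {"eucontinue": "page2", "continue": "-||"},
        },
        "page2": {
            "query": {"exturlusage": [{"url": "https://twitter.com/example_b"}]},
        },
    }
    return pages[params.get("eucontinue")]

urls = [entry["url"] for entry in iter_results(fake_fetch, {"list": "exturlusage"})]
print(urls)
```

Passing the fetch function in as an argument keeps the paging logic separate from the network code, so it can be tried out without hitting Wikipedia.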
Crappy Python!
import json
import sys
import urllib.error
import urllib.request
from collections import Counter

# euoffset=17800
api = "https://en.wikipedia.org/w/api.php?action=query&eunamespace=0&eulimit=500&format=json&list=exturlusage&euquery=*.twitter.com"
euoffset = 0
words = []

while euoffset < 17500:
    try:
        site_data = json.load(urllib.request.urlopen(api + "&euoffset=" + str(euoffset)))
        # Iterate through the results
        for element in site_data['query']['exturlusage']:
            # Remove HashBangs #!
            # Lowercase everything
            twitterURL = element['url'].replace("/#!", "").lower()
            # Strip the scheme and host, leaving "user" or "user/status/..."
            twitterUser = twitterURL.replace("http://twitter.com/", "")
            twitterUser = twitterUser.replace("https://twitter.com/", "")
            twitterUser = twitterUser.replace("@", "")
            # Keep only the part before the first slash
            slash = twitterUser.find("/")
            if slash > 0:
                twitterUser = twitterUser[:slash]
            print(twitterURL)
            # print(twitterUser)
            words.append(twitterUser)
    except urllib.error.URLError:
        print("Unable to retrieve data")
        sys.exit()
    # Move on to the next page of 500 results
    euoffset += 500

# Most cited user
word_counts = Counter(words)
print(word_counts)
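The chained string replacements above appear to miss links that start with www.twitter.com, so here is a hedged sketch of a slightly sturdier way to pull out the username, followed by Counter's most_common to get a top ten. The function name twitter_user is hypothetical and the sample URLs are made up for illustration; other hosts, such as mobile subdomains, are simply skipped.

```python
from collections import Counter
from urllib.parse import urlsplit

def twitter_user(url):
    """Return the lowercased username from a Twitter URL, or None."""
    # Drop the obsolete hashbang so the username lands in the path.
    parts = urlsplit(url.replace("/#!", ""))
    host = parts.netloc.lower()
    if host not in ("twitter.com", "www.twitter.com"):
        return None
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    return segments[0].lstrip("@").lower()

# Made-up sample URLs for illustration only.
sample = [
    "http://twitter.com/#!/example_a/status/1",
    "https://www.twitter.com/Example_A",
    "https://twitter.com/example_b/status/2",
]
counts = Counter(user for user in map(twitter_user, sample) if user)
print(counts.most_common(10))
```

most_common returns (username, count) pairs sorted from most to least cited, which is easier to read than printing the whole Counter.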
The Results...
The most cited Twitter users are...
- LaLiga (a Spanish football competition) - 105
- Lea Michele (an American singer / TV actor) - 54
- Guldbaggen (the Swedish equivalent of the Oscars) - 35
- Kevin Tancharoen (an American movie director) - 12
- PRESTO card (Ottawa's public transit smartcard) - 10
- ICE T (an American rapper) - 10
- Northern Pride RLFC (a British rugby team) - 10
- 穴井勇輝(勇吹輝) (a Japanese actor) - 9
- NICKI MINAJ (an American singer) - 8
- Counting Crows (an American band) - 8
And the most cited individual Tweet?
Linked to from lots of Lost Girl pages.
What Have We Learned Today?
Wikipedia does have a large amount of pop-culture (do we need hundreds of words on My Little Pony Characters?)
Twitter, unsurprisingly, has limited utility as an encyclopædic source - it's great for breaking news and ephemeral events, but it's fragile and lacks depth. There are very few occasions where Twitter would be the sole, and canonical, source of information [Citation needed].
Andy Mabbett says:
Citation needed? That would be https://en.wikipedia.org/wiki/WP:Twitter
You're welcome.
Andy Mabbett says:
As a result of your post, I've arranged for hashbangs to be removed from Twitter URLs in Wikipedia; see here, for example.
Terence Eden says:
Brilliant! Thanks Andy.
3 foxes in a trenchcoat said on iscurrently.live:
@Edent @davidgerard nice post! I'd hate to figure out how many cited tweets have been lost to the ages.. :/
Andy Mabbett says:
Elon Musk, in his wisdom [SIC] has decided to delete inactive Twitter accounts. Some reports say that's defined by having no activity in as few as 30 days.
Hopefully, all the tweets cited in Wikipedia (like nearly all other web pages cited) will be in the Internet Archive's Wayback Machine.