This is what a graph of 8,000 fake Twitter accounts looks like


Recently I've been plagued with Tweets saying that I'm "trending in London."

[Image: Twitter Spam Trending]

[Image: Twitter Spam Trending Again]

[Image: More bloody spammers]

As flattering as that is, it's not true. There appears to be a network of Twitter bots which are randomly repeating other people's tweets, ripping off avatars and bios, and generally causing a nuisance.

Looking at the users' Twitter names, I don't think it's unreasonable to conclude that "ekip_uhokoqeq" and "utadaqusoxeh" are randomly generated sequences of characters. And, without wishing to judge, that photo doesn't look like a Susan...

Let's take a look at one of the users' profiles:

[Image: Twitter Spam Profile]

It's possible to trace back the bio and photo to different users - they've had their details misappropriated. The Tweets seem to be just randomly taken from other users.

Let's take a look at who this bot is following:

[Image: Twitter Spam Following]

Again, random sequences of letters, hijacked bios and photos. Clicking through each of these profiles reveals a network of thousands of fake accounts.

I adapted a script to visualise the network of accounts. It only looks 3 levels deep, because Twitter's API limits make going much further a time-consuming task:

[Image: Spam Graph]

I used Gephi to draw the graph.
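
A crawl like this can be sketched as a depth-limited breadth-first search. The version below is an illustration rather than the actual script: `get_following` is a hypothetical stand-in for whatever API call returns the accounts a user follows, and the toy `FAKE_FOLLOWING` data is invented. The CSV header uses `Source` and `Target` columns, which Gephi's spreadsheet importer should recognise as an edge list.

```python
import csv
import io
from collections import deque

def crawl(seed, get_following, max_depth):
    """Breadth-first crawl of 'following' links, up to max_depth hops from seed."""
    edges = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            continue  # accounts at the depth limit are recorded but not expanded
        for followed in get_following(user):
            edges.append((user, followed))
            if followed not in seen:
                seen.add(followed)
                queue.append((followed, depth + 1))
    return edges

def write_edge_list(edges, out):
    """Write the edges as a two-column CSV edge list."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)

# Invented toy data standing in for real API responses.
FAKE_FOLLOWING = {
    "bot_a": ["bot_b", "bot_c"],
    "bot_b": ["bot_c", "bot_d"],
    "bot_c": ["bot_a"],
    "bot_d": ["bot_e"],
    "bot_e": ["bot_a"],
}

def fake_get_following(user):
    return FAKE_FOLLOWING.get(user, [])

edges = crawl("bot_a", fake_get_following, max_depth=2)
buffer = io.StringIO()
write_edge_list(edges, buffer)
print(buffer.getvalue())
```

Each extra level multiplies the number of API calls, which is why the depth limit matters so much when the API is rate limited.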

As you can see - there are dozens of randomly named accounts. They appear to only be following each other - I don't think any "real" users are in there. I can only assume that by forming a network like this, they can evade Twitter's filters. The bots can then either start generating spam, be sold off as fake followers, or be used for some other unsavoury purpose.

I ran the script over the weekend to a recursive depth of 4 and identified over 8,000 spam accounts. Using Cytoscape and Allegro Layout I was able to create this visualisation of the tangled web of connections.

[Image: Big Graph of Twitter Spammers]

Using Python's Graph-Tool I generated a somewhat prettier visualisation of how all these accounts are connected.

[Image: Network Diagram Spammers]

Most of the accounts follow around 8 others - there's a fair bit of clustering. While there are a few accounts with larger follower numbers, it's hard to discern if there's a definitive pattern. The large gap appears to be users who have been suspended.
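
A figure like "around 8" can be checked by counting how many accounts each bot follows and then tallying those counts. The snippet below is a minimal sketch on an invented edge list of `(follower, followed)` pairs, not the real dataset.

```python
from collections import Counter

def out_degree_histogram(edges):
    """Map 'number of accounts followed' to 'how many accounts follow that many'."""
    following = Counter(source for source, _ in edges)
    return Counter(following.values())

# Invented toy data: each pair means "first account follows second account".
toy_edges = [
    ("a", "b"), ("a", "c"),
    ("b", "a"), ("b", "c"),
    ("c", "a"),
]

histogram = out_degree_histogram(toy_edges)
for followed_count, accounts in sorted(histogram.items()):
    print(f"{accounts} account(s) follow {followed_count} other(s)")
```

Accounts that follow nobody never appear as a source, so they are missing from this tally.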

As the weekend drew to a close, I'd reached the fifth level of my recursive algorithm. Using Cytoscape's "Organic" layout, another interesting pattern emerges.

[Image: Circles of Spam]

There appear to be several "loops" - that is, bots which are in an almost closed network with each other. I see at least half a dozen circles - the rest appear to be following other fake accounts at random.
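
One way to look for such loops without relying on the layout is to compute strongly connected components: groups of accounts where every member can reach every other by following "follows" links. This is a sketch using Kosaraju's algorithm on invented data. It is an assumption that the visual circles correspond to components like these, not something the graph above proves.

```python
from collections import defaultdict

def strongly_connected_components(edges):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    graph = defaultdict(list)
    reverse = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: record nodes in order of depth-first completion.
    order = []
    visited = set()
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)  # all neighbours explored
                stack.pop()

    # Pass 2: walk the reversed graph in reverse completion order.
    components = []
    assigned = set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component = []
        stack = [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(component))
    return components

# Invented toy data: a three-bot ring, a mutual pair, and one hanger-on.
toy_edges = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("d", "a"),
    ("e", "f"), ("f", "e"),
]
loops = sorted(c for c in strongly_connected_components(toy_edges) if len(c) > 1)
print(loops)
```

Here "d" follows the ring but nothing in the ring follows it back, so it is left out of the loop.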

The centres of those circles appear to be real people. I can't say why they have lots of fake followers - it's possible that they - or someone else - have just bought them to make it look like they're more popular than they really are. There's no suggestion that they control the fake accounts.
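
The accounts at the centre of a circle can also be picked out numerically, by ranking accounts on how many crawled accounts follow them. A minimal sketch on invented data follows. The account names are made up, and the count only covers followers seen in the crawl, not an account's total follower number.

```python
from collections import Counter

def most_followed(edges, top_n):
    """Rank accounts by how many crawled accounts follow them."""
    followers = Counter(target for _, target in edges)
    return followers.most_common(top_n)

# Invented toy data: three bots all following one hypothetical central account.
toy_edges = [
    ("bot1", "central"), ("bot2", "central"), ("bot3", "central"),
    ("bot1", "bot2"), ("bot2", "bot3"),
]
print(most_followed(toy_edges, 1))
```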

One of the central nodes has 650,000 followers. It's not possible to know quite how many of those are fake - I'm guessing the majority are.

It seems that there's a nasty nest of these bots. In the last few weeks I've reported a dozen or so for spam - but with literally tens of thousands in the network it's impossible for any individual to make a meaningful impact.

I wish Twitter could track down the source of this problem and eradicate it.

If you want to have a play with this dataset - you can download a .zip file of the relationships and their metadata.

11 thoughts on “This is what a graph of 8,000 fake Twitter accounts looks like”

  1. Seems like you'd be better off just emailing your findings to Twitter, rather than trying to report each individual bot via the API. These findings could also help them improve their detection. Seems like having a human examine any account with a gibberish name, copied tweets/profile info/photo, and/or following other suspicious accounts, would help a lot. (Of course never automatically close an account for such reasons, only mark it as "hey, one of you admins should look at this and make sure it's a real person when you get a chance".)

      1. So far you can "buy" 30K fake Twitter followers for $5 on the market ...and more than that, there is no real protection against any user being linked to them ...if someone knows your @user (obvious) and guesses the e-mail account that you used to register, he/she can make you the (un)happy owner of those fake followers in less than 24 hours for just $5 ...I guess the problem has not yet reached a level where Twitter considers fixing it ...

  2. This has been happening literally for years, and this is the first time I've seen it presented so well. Incredible work. I used to work for a social media startup that had software that crawled Twitter for keywords in geographic areas (so we could tweet on behalf of local businesses and connect them with people looking for their products or services) and there'd be times I'd have an entire list of tweets from nothing but fake accounts. Often, they were actually using the same keyword or completely the same tweet. I'd see some add random characters to differentiate, and you could tell many were ripped off from actual tweets, just removing @ tags and hashtags. I've also seen a lot of this rampantly happening around the hashtag we use for our event in October, ripping off attendees' tweets. It's so creepy, actually.
    Again, great work. Definitely sharing this one.

  3. Nice work and great analysis! Here is another view of the fake accounts:
    http://i.imgur.com/kFBHd9t.jpg

    We can clearly see, from bottom to top:
    - (Bottom) 6 accounts which have paid for fake followers.
    - (Middle) A pool of 20+ unassigned fake accounts, ready to be bought I presume.
    - (Middle) Oddly, bridges from those unassigned accounts to the real network.
    - (Top) The legitimate twitter network, which looks a lot more 'organic'.

    This was generated automatically with our in-browser graph exploration tool (graphistry.com). It connects to a GPU farm to do the heavy lifting and leaves only rendering for the browser. That let us scale to 1000x more nodes. We are hoping to release a public beta soon, email me at [email protected] if you'd like to give it a spin beforehand.

  4. In a completely different way this can actually be perceived as a very positive thing for everyone. By mixing bits and pieces of everyone's social media, this system deters identity theft a little bit, adding a thin layer of randomized uncertainty to search results requested by people who are not well-intentioned. It reminds me of another new species I'm seeing online: "person info database" websites proudly described as purposefully inaccurate, mixing and matching real data (names, ID, DOB, street address etc.). They're supposed to help normalize the scarily easy search engine availability of all our info. Scatter the info!

    1. @rosemina in a perfect world, where the AIs mimic human behaviour accurately, this would be a good cloak. Unfortunately, as can be seen in Graphistry's visualization, the narrowly defined payoff functions that determine each bot's behaviour result in an aggregate pattern that is easily distinguished from the complex, contingent, and evolving behaviour of a truly intelligent agent.

      Information diffusion is not really happening with these strategies. They are simply stat-manipulators, whose job as a collective is to skew automatically determined sentiment analyses for commercial or political purposes. While it may seem like your right to privacy is being protected, the fact of the matter is that your right to assembly and free speech is being actively undermined.

      If you treat the portion of the twitter graph composed of intelligent agents (legitimate users) as a bacterial colony, a group of simple bots like the ones being discussed here can be viewed as analogous to a class of bacteriophage.
