Exploring BlueSky's Domain Handles
Hot new social networking site BlueSky has an interesting approach to usernames. Rather than just being @example, you can verify your domain name and be @example.com! Isn't that exciting?
Some people are @whatever.tld and others are @cool.subdomain.funny.lol.fwd.boring.tld.
I wanted to know how these domain names are distributed. For example, are there more .uk users than .org users?
Shut up and show me the results
You can play with the interactive data
Oh, and the large number of .gy domains is due to The Fediverse Bridge.
Getting the data
BlueSky has an open "firehose" of the data passing through it. Following the sample code, I listened for public interactions - people posting, liking, or following.
From there, I grabbed every username which wasn't on the default .bsky.social domain. I left the code running for a few days until I had over 22,000 usernames.
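The filtering step might look something like this. It's only a minimal sketch: it assumes the handles have already been pulled out of the firehose events and are sitting in a plain list, and the function name `custom_domain_handles` is made up for illustration rather than taken from the real code.

```python
# Sketch: keep only handles that are NOT on the default BlueSky domain.
# Assumes handles arrive as plain strings; the firehose plumbing is omitted.
DEFAULT_SUFFIX = ".bsky.social"


def custom_domain_handles(handles):
    """Return the set of unique, lower-cased handles on custom domains."""
    seen = set()
    for handle in handles:
        # Normalise: drop whitespace, a leading "@", and case differences.
        normalised = handle.strip().lstrip("@").lower()
        if normalised and not normalised.endswith(DEFAULT_SUFFIX):
            seen.add(normalised)
    return seen


sample = ["@alice.bsky.social", "example.com", "Example.com", "cool.subdomain.tld"]
print(sorted(custom_domain_handles(sample)))
# → ['cool.subdomain.tld', 'example.com']
```

Using a set means an account which posts a hundred times is still only counted once.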
Note, these data are all public - although I'm not sure if users necessarily realise that. It doesn't include lurkers (people who don't interact). Some of the accounts may have been moved, banned, or deleted.
Drawing a TreeMap
I used Plotly's TreeMap library to draw a static map of all the Top Level Domains (TLD).
As you can see, .com dominates the landscape - but there are quite a few country code TLDs in there as well.
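Here's a rough sketch of how the counting and drawing could fit together. It assumes the handles are in a list, treats the last dot-separated label as the TLD, and uses Plotly Express's `treemap` with every TLD as a root node. The helper names (`count_tlds`, `treemap_inputs`, `draw_treemap`) are illustrative, not the ones in the repository.

```python
from collections import Counter


def count_tlds(handles):
    """Count handles by their final label, e.g. 'shop.example.com' -> 'com'."""
    return Counter(
        handle.rstrip(".").rsplit(".", 1)[-1].lower() for handle in handles
    )


def treemap_inputs(counts):
    """Turn the counts into the parallel lists a treemap expects."""
    names = list(counts)
    parents = [""] * len(names)  # An empty parent makes each TLD a top-level box.
    values = [counts[name] for name in names]
    return names, parents, values


def draw_treemap(counts):
    """Build a Plotly figure; Plotly is imported here so counting works without it."""
    import plotly.express as px

    names, parents, values = treemap_inputs(counts)
    return px.treemap(names=names, parents=parents, values=values)


sample = ["example.com", "shop.example.com", "whatever.uk", "blog.example.org"]
counts = count_tlds(sample)
print(counts.most_common())
```

Calling `draw_treemap(counts).show()` should open the figure in a browser, provided Plotly is installed.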
Public Suffixes
Domain names have the concept of Public Suffixes. For example, users can register domains at .co.uk and .org.uk as well as just plain .uk. The Python tldextract library allowed me to see which domains were public suffixes, so I could attach them to their parent TLD.
I then drew a TreeMap showing this.
Note! You'll need to hack your Plotly installation to allow empty leaf nodes to get in the same style as the first map.
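Attaching each public suffix to its parent TLD could be sketched like this. The `public_suffix` helper leans on `tldextract.extract(...).suffix`, which may fetch the Public Suffix List over the network the first time it runs. The `suffix_tree` function and the sample suffixes are hypothetical, written for illustration, and are not the code from the repository.

```python
from collections import Counter


def public_suffix(handle):
    """Look up a handle's public suffix, e.g. 'example.co.uk' -> 'co.uk'."""
    import tldextract  # Imported lazily so the rest runs without the library.

    return tldextract.extract(handle).suffix


def suffix_tree(suffixes):
    """Build treemap names, parents, and values with suffixes nested under TLDs."""
    counts = Counter(s.lower() for s in suffixes if s)
    names, parents, values = [], [], []

    # Every TLD becomes a root node, valued by handles directly under it.
    for tld in sorted({s.rsplit(".", 1)[-1] for s in counts}):
        names.append(tld)
        parents.append("")
        values.append(counts.get(tld, 0))

    # Multi-label suffixes such as 'co.uk' hang off their parent TLD.
    for suffix in sorted(counts):
        if "." in suffix:
            names.append(suffix)
            parents.append(suffix.rsplit(".", 1)[-1])
            values.append(counts[suffix])

    return names, parents, values


# Hand-written suffixes standing in for real tldextract output.
names, parents, values = suffix_tree(["co.uk", "uk", "org.uk", "co.uk", "com"])
print(list(zip(names, parents, values)))
```

The three lists can be handed to a treemap in the same way as the flat TLD counts. A TLD with no handles registered directly under it ends up with a value of 0.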
So what? What next?
- Not everyone from, say, Brazil will have a .br domain name - but it is fascinating to see which countries dominate.
- It might be fun to go full "Information Is Beautiful" and turn each ccTLD into its country's flag.
- Are there ethical implications of recording the fact that an account has publicly shared themselves on a social network?
- What percentage of all users have a domain name handle?
Get the code
Everything is open source on GitHub.
Lex said on mastodon.social:
@Edent My math intuition is that this is suspiciously 'smooth' for chunky data based on human choices. Meaning that it would probably beautifully snap to an exponential decay curve.
I might be undervaluing the large size of the dataset though, making humans look like a natural phenomenon?
Lovely graph, even the palette choice is pleasing!
James Henstridge said on aus.social:
@Edent From what I can tell, you could also enumerate all domain handles via the https://plc.directory/export API. This would give the complete data set rather than just the accounts that were posting when you were collecting data.
I'm not sure how different the results would be though.
Matt Tams said on bsky.app:
Really really interesting 🙂 Regarding the point on ethics: firstly, I think it’s the right question to be asking (I think it’s ALWAYS the right question to be asking), but I’m not inclined to believe there’s anything ethically shady about aggregating non-PII public data to plot broad trends
Merlin.📞3633.BDRip.x265.10bit said on bsky.app:
Hello from the one and only .ax user!
More comments on Mastodon.