Snowflake IDs in Mastodon (and Unique IDs in the Fediverse more generally)


Computer Science has two canonical "hard problems":

  1. cache invalidation
  2. naming things
  3. off-by-one errors

Let's talk about how we name unique items in Federated services - for example, posts on a social media service.

If you have only one service, it's pretty easy. Every time a new entry is created in a database, give it a sequential number.

This becomes a problem at scale. If you have millions of users spread across hundreds of database shards, each handing out its own sequential numbers, you'll eventually get a clash of IDs. To that end, Twitter invented Snowflake IDs.

Snowflakes are pretty clever. They are a 64-bit ID. The first part of the ID is a timestamp, and the second part is some information about the server which generated the ID. This means that IDs can be sorted by time, and will be globally unique.
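Here's a rough sketch of how such an ID can be packed together. It assumes the commonly described Twitter layout of 41 bits of millisecond timestamp, 10 bits of machine ID, and 12 bits of per-millisecond sequence number. The function name and the example values are made up for illustration, so treat it as a sketch rather than Twitter's actual code.

```python
# Assumed Twitter-style layout: 41-bit timestamp | 10-bit machine ID | 12-bit sequence.
TWITTER_EPOCH_MS = 1288834974657  # custom epoch commonly cited for Twitter Snowflakes


def make_snowflake(unix_ms: int, machine_id: int, sequence: int) -> int:
    """Pack a timestamp, machine ID, and sequence number into one 64-bit integer."""
    if not 0 <= machine_id < 2**10:
        raise ValueError("machine_id must fit in 10 bits")
    if not 0 <= sequence < 2**12:
        raise ValueError("sequence must fit in 12 bits")
    timestamp = unix_ms - TWITTER_EPOCH_MS
    return (timestamp << 22) | (machine_id << 12) | sequence


# Two machines minting an ID in the same millisecond still get different IDs,
# and an ID from a later millisecond always sorts after an earlier one.
a = make_snowflake(1668513535525, machine_id=1, sequence=0)
b = make_snowflake(1668513535525, machine_id=2, sequence=0)
c = make_snowflake(1668513535526, machine_id=1, sequence=0)
```

Because the timestamp sits in the highest bits, sorting the integers sorts the posts by time. The machine ID and sequence bits only break ties within a single millisecond.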

So, does the Fediverse use Snowflake? Yes. AND NO!

The Mastodon service uses Snowflake for its IDs. For example, one of my posts has the ID 109347703064222520. If you bitshift that 64-bit number to the right, you'll see it is a UNIX timestamp with a few bits of extra data.
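Here's a minimal sketch of that bitshift. It assumes Mastodon's layout puts a millisecond UNIX timestamp in the upper 48 bits and 16 bits of extra data underneath. The function name is hypothetical.

```python
from datetime import datetime, timezone


def decode_mastodon_id(status_id: int) -> tuple[datetime, int]:
    """Split an ID into (creation time, extra bits), assuming a 48/16 bit layout."""
    ms = status_id >> 16          # upper bits: milliseconds since the UNIX epoch
    extra = status_id & 0xFFFF    # lower 16 bits: extra data
    created = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return created, extra


created, extra = decode_mastodon_id(109347703064222520)
print(created.date(), extra)  # 2022-11-15 56120
```

The date that drops out is only as trustworthy as the server which minted the ID. Nothing stops a server from putting whatever it likes in those bits.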

But here comes the problem. On a Federated service, there's no guarantee of uniqueness. A naughty server could deliberately generate a duplicate Snowflake ID. You can't trust anyone on the Internet!

So here's how Mastodon - and the wider Fediverse - deals with unique IDs. It cheats.

  • When I make a post, my server gives it a Snowflake ID which is unique to my server - e.g. 123456
  • This generates a URL which is globally unique - e.g. https://my_server.xyz/@username/123456
  • My post is delivered to your server.
  • Your server gives it an ID which is unique to your server - e.g. 4d7b70c7
  • Your server turns that into a URL - e.g. https://your_server.lol/incoming/from/@username@my_server.xyz/4d7b70c7
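The steps above can be sketched as a toy receiving server. Everything here is hypothetical: the class name, the in-memory dictionary, and the simple counter standing in for a real Snowflake generator. It is not how Mastodon stores posts. It only illustrates the idea of keying on the sender's URL and minting a local ID.

```python
from itertools import count


class LocalStatusStore:
    """Hypothetical receiving server: maps globally unique URLs to local IDs."""

    def __init__(self) -> None:
        self._next_id = count(1)  # stand-in for a real Snowflake generator
        self._id_by_uri: dict[str, int] = {}

    def receive(self, uri: str) -> int:
        """Return the local ID for a remote post, minting one on first sight."""
        if uri not in self._id_by_uri:
            self._id_by_uri[uri] = next(self._next_id)
        return self._id_by_uri[uri]


store = LocalStatusStore()
# Two servers reuse the same remote ID, but their URLs differ,
# so each post still ends up with its own local ID.
first = store.receive("https://my_server.xyz/@username/123456")
second = store.receive("https://naughty.example/@mallory/123456")
```

Receiving the same URL a second time returns the ID that was already minted, so a post delivered twice doesn't end up with two local identities.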

For example, here's the JSON response I get when I look up the Snowflake ID of a reply someone sent me. My server replies with:

{
 'created_at': datetime.datetime(2022, 11, 13, 0, 52, 37, tzinfo=tzutc()),
 'id': 109347716491680514,
 ...
 'uri': 'https://queer.af/@erincandescent/109347716173491502',
}

The original ID is 109347716173491502 on the sender's server. But the unique ID on my server is 109347716491680514.

Their unique ID could be a GUID, a series of emoji, or a sequential number. It doesn't matter. To prevent clashes, the receiving server generates its own ID.

This could all be solved a lot more easily if every server minted each post using a Proof of Work transaction against a centralised BlockCha… *sound of gunshot*

Ahem.

It is a pretty reasonable solution. A remote server may or may not have a globally unique ID. It doesn't matter. Once you see a post, it is given its own locally unique ID. And that's good enough.

