How to search Mastodon by date & time


Two years ago to the day, I built Twistory - a service for seeing what you posted on Twitter on this day in previous years. If you've ever used Facebook, you'll know how it is supposed to work. You see posts which show that exactly 5 years ago you were starting a new job, 6 years ago you were at a wedding, etc.

The Twitter version never really worked properly because the Twitter API doesn't support searching for historic Tweets. What I had to do was manually build search queries like: ?q=from:edent (until:2011-11-15 since:2011-11-14) and redirect people to the website.

Eugh!

I'm trying to build something similar for the Mastodon social network. Yes, I know it is new to you - but some of us have been there for several years.

So here's how to search Mastodon for posts made on specific dates!

(Skip to the code and ignore all the exciting preamble.)

Sadly, the Mastodon Search API is still quite basic. It isn't possible to directly search by user or by date parameters.

If you download the archive of all your posts, you'll find an ActivityPub feed called outbox.json - which is a collection of everything you've ever posted.

It can be parsed using jq to get all of the statuses posted on a specific day:

cat outbox.json | jq ".orderedItems[] | select (.published | fromdateiso8601 > 1636533813) | select (.published | fromdateiso8601 < 1636620302)"

The numbers compared against fromdateiso8601 are Unix epoch times - in this case, from 365 days ago and 364 days ago.
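Those boundary values don't have to be worked out by hand. Here's a sketch which computes them with Python's datetime module and prints the matching jq filter (the 365-day offset mirrors the example above, so it drifts slightly around leap years):

```python
from datetime import datetime, timedelta, timezone

# Midnight (UTC) at the start of "this day, roughly a year ago"
day_start = datetime.now(timezone.utc).replace(
    hour=0, minute=0, second=0, microsecond=0
) - timedelta(days=365)
day_end = day_start + timedelta(days=1)

# Unix epoch seconds for the jq comparison
since = int(day_start.timestamp())
until = int(day_end.timestamp())

print(f"select (.published | fromdateiso8601 > {since}) | "
      f"select (.published | fromdateiso8601 < {until})")
```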

So, conceptually, it's possible to build - as long as you're willing to download your data and manually parse it. Let's see if we can do a little better than that.

Building in Python

We're going to build this using Python3 and the Mastodon.py library.

Install the library on the command line with:

pip3 install -U Mastodon.py

Now, launch Python and load the library:

from mastodon import Mastodon

We'll need an API key. Go to the website of your Mastodon instance. In settings, there should be an option called "Development". Use that to create a new app which has the "Read" scope.

Once created, we will use the "Your access token" which will be a long string of random letters and numbers. In this example, we'll be using abc123. Your real access token will be longer!

Let's set up a connection to your Mastodon instance:

instance = "https://mastodon.example.com"
mastodon = Mastodon( api_base_url=instance, access_token="abc123" )

Next, we need your user ID. This isn't your @ name - instead, it is the numerical ID assigned by the server. For this, we need the Verify account credentials API call:

mastodon.me()

That produces:

{
   'id': 7112,
   'username': 'Edent',
   'acct': 'Edent',
   'display_name': 'Terence Eden',
   ...

Looks like I was a pretty early adopter!

Getting the last 20 statuses is:

me = mastodon.me()
my_id = me["id"]
mastodon.account_statuses(id = my_id)

The problem is, that call can only retrieve a maximum of 40 statuses at a time.

statuses = mastodon.account_statuses(id = my_id, limit="40")

We need to use pagination. The API makes it pretty easy to grab the next page - and there's also a call to fetch every remaining page.

Be warned - this can take a long time. If you have thousands of posts it may take a few minutes. It can also quickly deplete your API rate limits. Use with caution!

We can reduce some of the load by excluding anything you've "boosted".

statuses = mastodon.account_statuses(id = my_id, limit="40", exclude_reblogs=True)
all_statuses = mastodon.fetch_remaining(statuses)

You can run len(all_statuses) to see how many you have retrieved.
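As an alternative to fetch_remaining, pages can be walked one at a time with fetch_next, which returns None once there are no more pages. That lets you stop early and go easier on your rate limits. A sketch - iter_all_statuses is a hypothetical helper, not part of Mastodon.py:

```python
def iter_all_statuses(client, account_id, page_size=40):
    """Yield every status for an account, one page at a time."""
    page = client.account_statuses(
        id=account_id, limit=page_size, exclude_reblogs=True
    )
    while page:
        yield from page
        #  fetch_next returns None when there is no next page
        page = client.fetch_next(page)
```

You can then write for status in iter_all_statuses(mastodon, my_id): and break out of the loop as soon as you've gone back far enough.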

The next step is finding all the posts which happened on a certain day each year. Let's say we want every post which happened on the 14th of February.

The library returns timestamps as Python DateTime objects.

status_date = all_statuses[1]["created_at"]

Shows:

datetime.datetime(2016, 11, 1, 17, 4, 23, 842000, tzinfo=tzutc())

The datetime library is pretty handy. You can find the day using status_date.day and month with status_date.month.

This means we can loop through every status and show only the ones we care about.

for status in all_statuses:
     if (status["created_at"].day == 14 and status["created_at"].month == 2):
             print(status["uri"])

That will get you a list of URLs which contain posts made on a specific day in previous years.

Putting it all together

from mastodon import Mastodon
instance = "https://mastodon.example.com"
mastodon = Mastodon( api_base_url=instance, access_token="abc123" )
me = mastodon.me()
my_id = me["id"]
statuses = mastodon.account_statuses(id = my_id, limit="40", exclude_reblogs=True)
all_statuses = mastodon.fetch_remaining(statuses)
for status in all_statuses:
   if (status["created_at"].day == 14 and status["created_at"].month == 2 and status["created_at"].year < 2022) :
      print(status["uri"])

Building a time machine

All of the above works, but is pretty inefficient because there's no way to search for specific timeframes on Mastodon. Or so I thought!

If we can calculate the maximum and minimum IDs for a given day, we will have a much more efficient search!

Let's dive into the Mastodon snowflake ID code. It is really well documented:

Our ID will be composed of the following:

6 bytes (48 bits) of millisecond-level timestamp

2 bytes (16 bits) of sequence data

OK! Let's look at a typical Mastodon status ID - mastodon.social/@Edent/109326536843609210 - which was posted on 2022-11-11 at 18:16.

Let's take the ID 109326536843609210 and perform a bitwise shift on it.

print(109326536843609210 >> 16)
1668190564630

Hey! That looks a bit like a UNIX timestamp! The last three digits are the milliseconds, so we can divide by 1,000 and see what happens if we convert it to a timestamp.

from datetime import datetime
datetime.fromtimestamp(1668190564630/1000)
datetime.datetime(2022, 11, 11, 18, 16, 4, 630000)

Nice! So we can go backward and take a date - say this time last year - and convert it to a maximum and minimum ID.

min_id = ( int( datetime(2022,11,11,00,00).timestamp() ) << 16 ) * 1000
max_id = ( int( datetime(2022,11,11,23,59).timestamp() ) << 16 ) * 1000

Which gives us 109322226892800000 and 109327885271040000 respectively.
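Both directions can be wrapped into a pair of helpers - hypothetical names, following the snowflake layout above, and pinned to UTC to keep the maths timezone-independent:

```python
from datetime import datetime, timezone

def snowflake_to_datetime(status_id):
    """Recover the (UTC) creation time from a Mastodon status ID."""
    milliseconds = status_id >> 16          # drop the 16 bits of sequence data
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)

def datetime_to_snowflake(when):
    """Build a status ID with the timestamp bits set and the sequence zeroed."""
    milliseconds = round(when.timestamp() * 1000)
    return milliseconds << 16

print(snowflake_to_datetime(109326536843609210))
# → 2022-11-11 18:16:04.630000+00:00
```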

Let's try that with the API!

Final Code

from datetime import datetime, timedelta
from mastodon import Mastodon

#  Set up access
instance = "https://mastodon.example"
mastodon = Mastodon( api_base_url=instance, access_token="abc123" )

#  Get user's info
me = mastodon.me()
my_id = me["id"]
year_joined = me["created_at"].year

#  Today's date
year_now  = datetime.now().year
month_now = datetime.now().month
day_now   = datetime.now().day

#  Counter
year_counter = year_now

#  Loop through previous years
#  Start with last year and go down until the user joined
while (year_counter >= year_joined ) :
   year_counter -= 1
   #  The end of today is the start of tomorrow,
   #  so the search window runs from midnight to midnight
   try:
      today_end = datetime(year_counter, month_now, day_now, 00, 00) + timedelta(days=1)
   except ValueError:
      continue  #  e.g. the 29th of February in a non-leap year
   yesterday_end = today_end - timedelta(days=1)
   #  Bitwise shift the integer representation and convert to milliseconds
   max_id = ( int( today_end.timestamp() )     << 16 ) * 1000
   min_id = ( int( yesterday_end.timestamp() ) << 16 ) * 1000
   #  Call the API
   statuses = mastodon.account_statuses(id = my_id, max_id=max_id, min_id=min_id, limit="40", exclude_reblogs=True)
   #  Fetch further statuses if there are any
   all_statuses = mastodon.fetch_remaining(statuses)
   #  Print the date and URL
   for status in all_statuses :
      print( str(status["created_at"]) + " " + status["uri"] )

Next Steps

It works on my machine! But that's not really good enough. Ideally, I'd like to turn this into a web app which people could use with their own accounts.

If you're interested in helping out with that, grab the code or drop me a line!


7 thoughts on “How to search Mastodon by date & time”

  1. I wonder if having no search would improve twitter like experiences.

    It may be useful to hold politicians and public figures to their words but for everyone else it allows witch hunts.



  2. That’s so interesting, Naheem.

    One school of thought is that since a Twitter stream (or Toot stream) is published by a user, that user should have control over what others see. An aspect of privacy.

    The other school of thought is that since something has been published to the world, it’s no longer under the control of the author, and others have a right to see what you said previously.

    My friend David Eastman wrote a blog post a few years ago that I think about often. The idea that the privacy and anonymity we used to experience never needed to be actively thought about, because invading it was always such a hassle for other people. That is, until our lives were digitised (sometimes with our consent, very often without).

    I think the same is true with regards to the “privacy” of our past. The spectacularly unfunny stuff I wrote as a youth is thankfully lost. It wouldn’t be if I had kept a blog or a Twitter feed in my teens.

    The EU has a ‘right to be forgotten’ law of course, which speaks to this.

    For a very short time a decade ago I wondered whether I might stand for election to something. I began to realise that my digital output might, in future years, be scrutinised, and that in turn affected what I wrote and how I wrote it. I’ve given up on such lofty (or stupid) ideas now but am conscious that if anyone will read my digital output in the future, it will likely be my kids (and their kids?) and I want to come across as Not A Dickhead. The point is that awareness of posterity now colours everything I post online. I think it’s subconscious now - a mental state, a way of thinking.

    It’s fascinating to be reminded that my mental (shall we say) orientation in this regard is a product of design choices by technologists, who may not even realise they’re making choices.
