How to search Mastodon by date & time


Two years ago to the day, I built Twistory - a service for seeing what you posted on Twitter on this day in previous years. If you've ever used Facebook, you'll know how it is supposed to work. You see posts which show that exactly 5 years ago you were starting a new job, 6 years ago you were at a wedding, etc.

The Twitter version never really worked properly because the Twitter API doesn't support searching for historic Tweets. What I had to do was manually build search queries like: ?q=from:edent (until:2011-11-15 since:2011-11-14) and redirect people to the website.

Eugh!

I'm trying to build something similar for the Mastodon social network. Yes, I know it is new to you - but some of us have been there for several years.

So here's how to search Mastodon for posts made on specific dates!

(Skip to the code and ignore all the exciting preamble.)

Sadly, the Mastodon Search API is still quite basic. It isn't possible to directly search by user or by date parameters.

If you download the archive of all your posts, you'll find an ActivityPub feed called outbox.json - which is a collection of everything you've ever posted.

It can be parsed using jq to get all of the statuses posted on a specific day:

cat outbox.json | jq ".orderedItems[] | select (.published | fromdateiso8601 > 1636533813) | select (.published | fromdateiso8601 < 1636620302)"

The two numbers compared against fromdateiso8601 are Unix epoch times: 365 days ago and 364 days ago.
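
The same filter can be sketched in Python using only the standard library. This is a minimal sketch, not part of the original tooling: the function names posts_between and load_outbox are made up for illustration, and it assumes every item has a "published" timestamp in the form 2021-11-10T12:00:00Z.

```python
import json
from datetime import datetime, timedelta, timezone


def posts_between(outbox, start, end):
    """Return the outbox items whose 'published' time is in [start, end)."""
    matches = []
    for item in outbox.get("orderedItems", []):
        published = item.get("published")
        if not published:
            continue
        # Assumption: timestamps end in "Z"; swap it for an explicit UTC offset
        when = datetime.fromisoformat(published.replace("Z", "+00:00"))
        if start <= when < end:
            matches.append(item)
    return matches


def load_outbox(path="outbox.json"):
    """Read the ActivityPub outbox file from a downloaded archive."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Example usage (needs a real archive on disk):
# start = datetime(2021, 11, 10, tzinfo=timezone.utc)
# for item in posts_between(load_outbox(), start, start + timedelta(days=1)):
#     print(item["published"])
```

Using timezone-aware datetimes keeps the comparison in UTC, which is what the "Z" suffix denotes.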

So, conceptually, it's possible to build - as long as you're willing to download your data and manually parse it. Let's see if we can do a little better than that.

Building in Python

We're going to build this using Python 3 and the Mastodon.py library.

Install the library on the command line with:

pip3 install -U Mastodon.py

Now, launch Python and load the library:

from mastodon import Mastodon

We'll need an API key. Go to the website of your Mastodon instance. In settings, there should be an option called "Development". Use that to create a new app which has the "Read" scope.

Once created, we will use the value labelled "Your access token", which will be a long string of random letters and numbers. In this example, we'll be using abc123. Your real access token will be longer!

Let's set up a connection to your Mastodon instance:

instance = "https://mastodon.example.com"
mastodon = Mastodon( api_base_url=instance, access_token="abc123" )

Next, we need your user ID. This isn't your @ name; it is the numerical ID assigned by the server. For this, we need the Verify account credentials API call:

mastodon.me()

That produces:

{
   'id': 7112,
   'username': 'Edent',
   'acct': 'Edent',
   'display_name': 'Terence Eden',
   ...
}

Looks like I was a pretty early adopter!

Getting the last 20 statuses is:

me = mastodon.me()
my_id = me["id"]
mastodon.account_statuses(id = my_id)

The problem is that a single call can return a maximum of 40 statuses.

statuses = mastodon.account_statuses(id = my_id, limit=40)

We need to use pagination. The API makes it pretty easy to grab the next page. There's also a call to get every post.

Be warned - this can take a long time. If you have thousands of posts it may take a few minutes. It can also quickly deplete your API rate limits. Use with caution!

We can reduce some of the load by excluding anything you've "boosted".

statuses = mastodon.account_statuses(id = my_id, limit=40, exclude_reblogs=True)
all_statuses = mastodon.fetch_remaining(statuses)

You can run len(all_statuses) to see how many you have retrieved.
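
If fetching everything is too heavy on your rate limit, a capped loop is one alternative. The sketch below is an illustration rather than the method used in this post: fetch_some_pages and max_pages are made-up names, client is assumed to be the mastodon object created earlier, and it relies on Mastodon.py's fetch_next() returning None when there are no more pages.

```python
def fetch_some_pages(client, first_page, max_pages=5):
    """Collect statuses page by page, stopping after max_pages API pages."""
    collected = []
    page = first_page
    pages_seen = 0
    while page and pages_seen < max_pages:
        collected.extend(page)
        pages_seen += 1
        # Only ask the server for another page if we still want one
        if pages_seen < max_pages:
            page = client.fetch_next(page)
    return collected


# Example usage (needs a real connection):
# statuses = mastodon.account_statuses(id=my_id, limit=40, exclude_reblogs=True)
# recent = fetch_some_pages(mastodon, statuses, max_pages=3)
```

With limit=40 and max_pages=3, that is at most 120 statuses and only two extra API calls after the first.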

The next step is finding all the posts which happened on a certain day each year. Let's say we want every post which happened on a 14th of February.

The library returns timestamps as Python datetime objects.

status_date = all_statuses[1]["created_at"]

Shows:

datetime.datetime(2016, 11, 1, 17, 4, 23, 842000, tzinfo=tzutc())

The datetime library is pretty handy. You can find the day using status_date.day and month with status_date.month.

This means we can loop through every status and show only the ones we care about.

for status in all_statuses:
    if status["created_at"].day == 14 and status["created_at"].month == 2:
        print(status["uri"])

That will get you a list of URLs which contain posts made on a specific day in previous years.
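
For an "on this day" view it may be handier to group the matches by year. Here is a small sketch of that idea. The helper name on_this_day is made up, and it assumes each status is a dict with "created_at" (a datetime) and "uri" keys, as in the statuses shown above.

```python
from collections import defaultdict
from datetime import datetime, timezone


def on_this_day(statuses, month, day):
    """Group the URIs of statuses posted on the given month and day by year."""
    by_year = defaultdict(list)
    for status in statuses:
        created = status["created_at"]
        if created.month == month and created.day == day:
            by_year[created.year].append(status["uri"])
    return dict(by_year)


# Example usage with the list fetched earlier:
# for year, uris in sorted(on_this_day(all_statuses, 2, 14).items()):
#     print(year, uris)
```

Note that the timestamps are in UTC, so a post made late at night in your local timezone may land on the neighbouring day.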

Putting it all together

from mastodon import Mastodon

instance = "https://mastodon.example.com"
mastodon = Mastodon( api_base_url=instance, access_token="abc123" )

#  Find the numerical ID of the logged-in user
me = mastodon.me()
my_id = me["id"]

#  Fetch the first page, then every remaining page
statuses = mastodon.account_statuses(id = my_id, limit=40, exclude_reblogs=True)
all_statuses = mastodon.fetch_remaining(statuses)

#  Print posts made on 14 February in years before 2022
for status in all_statuses:
   if (status["created_at"].day == 14 and status["created_at"].month == 2 and status["created_at"].year < 2022) :
      print(status["uri"])

Building a time machine

All of the above works, but is pretty inefficient because there's no way to search for specific timeframes on Mastodon. Or so I thought!

If we can calculate the maximum and minimum IDs for a given day, we will have a much more efficient search!

Let's dive in to the Mastodon Snowflake code. It is really well documented:

Our ID will be composed of the following:
6 bytes (48 bits) of millisecond-level timestamp
2 bytes (16 bits) of sequence data

OK! Let's look at a typical Mastodon status ID: mastodon.social/@Edent/109326536843609210 was posted on 2022-11-11 at 18:16.

Let's take the ID 109326536843609210 and perform a bitwise shift on it.

print(109326536843609210 >> 16)

1668190564630

Hey! That looks a bit like a UNIX timestamp! The shift has already discarded the 16 bits of sequence data, and the last three digits are milliseconds, so let's divide by 1,000 and see what happens if we convert it to a timestamp.

from datetime import datetime
datetime.fromtimestamp(1668190564630/1000)

datetime.datetime(2022, 11, 11, 18, 16, 4, 630000)
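
The decoding steps above can be wrapped into a small helper. This is a sketch based on the 48-bit timestamp and 16-bit sequence layout quoted earlier. The names split_snowflake and snowflake_to_datetime are made up for illustration. Unlike datetime.fromtimestamp(), which returns local time, it always returns UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_snowflake(status_id):
    """Split a Mastodon status ID into (milliseconds since epoch, sequence)."""
    milliseconds = status_id >> 16   # top 48 bits
    sequence = status_id & 0xFFFF    # bottom 16 bits
    return milliseconds, sequence


def snowflake_to_datetime(status_id):
    """Return the UTC time encoded in a Mastodon status ID."""
    milliseconds, _ = split_snowflake(status_id)
    # timedelta keeps the arithmetic exact, avoiding float rounding
    return EPOCH + timedelta(milliseconds=milliseconds)


print(snowflake_to_datetime(109326536843609210))
# 2022-11-11 18:16:04.630000+00:00
```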

Nice! So we can go backward and take a date - say this time last year - and convert it to a maximum and minimum ID.

#  Convert to milliseconds, then shift left by 16 bits to leave room for the sequence
min_id = ( int( datetime(2022,11,11,0,0).timestamp() ) * 1000 ) << 16
max_id = ( int( datetime(2022,11,11,23,59).timestamp() ) * 1000 ) << 16

Which gives us 109322226892800000 and 109327885271040000 respectively.
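
Those numbers depend on your computer's timezone, because a datetime without a timezone is treated as local time (they match a machine set to UTC or GMT). A timezone-independent variant might look like the sketch below. The function name id_range_for_day is made up, and it uses the start of the next day as the upper bound rather than 23:59, on the assumption that the API treats min_id and max_id as exclusive bounds.

```python
from datetime import datetime, timedelta, timezone


def id_range_for_day(year, month, day):
    """Return (min_id, max_id) covering one whole UTC day."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    # Milliseconds since the epoch go in the top 48 bits of the ID
    min_id = (int(start.timestamp()) * 1000) << 16
    max_id = (int(end.timestamp()) * 1000) << 16
    return min_id, max_id


print(id_range_for_day(2022, 11, 11))
# (109322226892800000, 109327889203200000)
```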

Let's try that with the API!

Final Code

from datetime import datetime, timedelta
from mastodon import Mastodon

#  Set up access
instance = "https://mastodon.example"
mastodon = Mastodon( api_base_url=instance, access_token="abc123" )

#  Get user's info
me = mastodon.me()
my_id = me["id"]
year_joined = me["created_at"].year

#  Today's date
year_now  = datetime.now().year
month_now = datetime.now().month
day_now   = datetime.now().day

#  Counter
year_counter = year_now

#  Loop through previous years
#  Start with last year and go back to the year the user joined
while (year_counter > year_joined ) :
   year_counter -= 1
   #  The end of yesterday is the start of today
   #  29 February only exists in leap years, so skip the years without one
   try :
      yesterday_end = datetime(year_counter, month_now, day_now, 0, 0)
   except ValueError :
      continue
   #  The end of today is the start of tomorrow
   today_end = yesterday_end + timedelta(days=1)
   #  Convert to milliseconds, then shift left by 16 bits to leave room for the sequence
   max_id = ( int( today_end.timestamp() )     * 1000 ) << 16
   min_id = ( int( yesterday_end.timestamp() ) * 1000 ) << 16
   #  Call the API
   statuses = mastodon.account_statuses(id = my_id, max_id=max_id, min_id=min_id, limit=40, exclude_reblogs=True)
   #  Fetch further statuses if there are any
   all_statuses = mastodon.fetch_remaining(statuses)
   #  Print the date and URL
   for status in all_statuses :
      print( str(status["created_at"]) + " " + status["uri"] )

Next Steps

It works on my machine! But that's not really good enough. Ideally I'd like to turn this into a web app which people could use with their own account.

If you're interested in helping out with that, grab the code or drop me a line!

7 thoughts on “How to search Mastodon by date & time”

2022-11-14 13:04

Naheem Says said on twitter.com:

I wonder if having no search would improve twitter like experiences.

It may be useful to hold politicians and public figures to their words but for everyone else it allows witch hunts.

2022-11-14 22:39

Robert Sharp says:

That’s so interesting, Naheem.

One school of thought is that since a Twitter stream (or Toot stream) is published by a user, that user should have control over what others see. An aspect of privacy.

The other school of thought is that since something has been published to the world, it’s no longer under the control of the author, and others have a right to see what you said previously.

My friend David Eastman wrote a blog post a few years ago that I think about often. The idea that the privacy and anonymity we used to experience never needed to be actively thought about, because invading it was always such a hassle for other people. That is, until our lives were digitised (sometimes with our consent, very often without).

I think the same is true with regards to the “privacy” of our past. The spectacularly unfunny stuff I wrote as a youth is thankfully lost. It wouldn’t be if I had kept a blog or a Twitter feed in my teens.

The EU has a ‘right to be forgotten’ law of course, which speaks to this.

For a very short time a decade ago I wondered whether I might stand for election to something. I began to realise that my digital output might, in future years, be scrutinised, and that in turn affected what I wrote and how I wrote it. I’ve given up on such lofty (or stupid) ideas now but am conscious that if anyone will read my digital output in the future, it will likely be my kids (and their kids?) and I want to come across as Not A Dickhead. The point is that awareness of posterity now colours everything I post online. I think it’s subconscious now, and mental state, a way of thinking.

It’s fascinating to be reminded that my mental (shall we say) orientation in this regard is a product of design choices by technologists, who may not even realise they’re making choices.

2022-11-15 07:02

Workshopshed said on mastodon.scot:

@Edent nice work, Terence

2022-11-15 07:22

backseat said on fosstodon.org:

@Edent thanks for publishing your code - looks great!

2022-11-15 11:36

Benjamin S-B :verified: said on genomic.social:

@Edent Excellent project. You might have seen, but there's also this feature request for better 🧵 handling: https://github.com/mastodon/mastodon/issues/8615
Fold toots in a thread, only display some of them in timeline · Issue #8615 · mastodon/mastodon

2022-11-16 06:24

Poliverso notizie dal fediverso said on poliverso.org:

HOW TO SEARCH MASTODON BY DATE AND TIME. THE POST BY @

Exactly two years ago, I created Twistory, a service for seeing what you posted on Twitter on this day in previous years. If you have ever used Facebook, you will know how it is supposed to work. You see posts showing that exactly 5 years ago you were starting a new job, 6 years ago you were at a wedding, etc. The Twitter version never worked properly because the Twitter API does not support searching for historic Tweets. What I had to do was manually create search queries like: ?q=from:edent (until:2011-11-15 since:2011-11-14) and redirect people to the website. Oh yes! I am trying to build something similar for the Mastodon social network. Yes, I know it is new to you, but some of us have been there for several years. So here is how to search Mastodon for posts published on specific dates!

(here is Terence Eden's post) ! Che succede nel Fediverso?

2022-11-29 11:30

Dane says:

Further to this, if you want to start getting your old Pixelfed posts, they also use Snowflake IDs, but have an epoch in February 2019, so there's a slightly different way to calculate them:

min_id = (int(datetime(2022,11,28,2,00).timestamp()) * 1000) - 1549756800000 << 22
max_id = (int(datetime(2022,11,28,2,59).timestamp()) * 1000) - 1549756800000 << 22

https://github.com/pixelfed/pixelfed/blob/dev/app/Services/SnowflakeService.php
