How to search Mastodon by date & time
Two years ago to the day, I built Twistory - a service for seeing what you posted on Twitter on this day in previous years. If you've ever used Facebook, you'll know how it is supposed to work. You see posts which show that exactly 5 years ago you were starting a new job, 6 years ago you were at a wedding, etc.
The Twitter version never really worked properly because the Twitter API doesn't support searching for historic Tweets. What I had to do was manually build search queries like: ?q=from:edent (until:2011-11-15 since:2011-11-14) and redirect people to the website.
Eugh!
I'm trying to build something similar for the Mastodon social network. Yes, I know it is new to you - but some of us have been there for several years.
So here's how to search Mastodon for posts made on specific dates!
(Skip to the code and ignore all the exciting preamble.)
Sadly, the Mastodon Search API is still quite basic. It isn't possible to directly search by user or by date parameters.
If you download the archive of all your posts, you'll find an ActivityPub feed called outbox.json - which is a collection of everything you've ever posted.
It can be parsed using jq to get all of the statuses posted on a specific day:
cat outbox.json | jq ".orderedItems[] | select (.published | fromdateiso8601 > 1636533813) | select (.published | fromdateiso8601 < 1636620302)"
The two numbers being compared against the fromdateiso8601 output are the Unix epoch times from 365 days ago and 364 days ago.
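If you'd rather not use jq, the same filter can be sketched in Python. This is only an illustration: the posts_between name and the sample data are made up, and it assumes the published values look like "2021-11-10T12:00:00Z", which is the form fromdateiso8601 expects. Unlike the jq filter, it includes posts made exactly at the start time.

```python
from datetime import datetime, timezone

def posts_between(outbox, start, end):
    """Return the items whose 'published' time is in [start, end)."""
    matches = []
    for item in outbox["orderedItems"]:
        # Assumed format, e.g. "2021-11-10T12:00:00Z" (always UTC)
        published = datetime.strptime(item["published"], "%Y-%m-%dT%H:%M:%SZ")
        published = published.replace(tzinfo=timezone.utc)
        if start <= published < end:
            matches.append(item)
    return matches

# Tiny made-up outbox standing in for json.load(open("outbox.json"))
sample_outbox = {
    "orderedItems": [
        {"id": "a", "published": "2021-11-09T23:59:59Z"},
        {"id": "b", "published": "2021-11-10T12:00:00Z"},
        {"id": "c", "published": "2021-11-11T00:00:00Z"},
    ]
}

start = datetime(2021, 11, 10, tzinfo=timezone.utc)
end = datetime(2021, 11, 11, tzinfo=timezone.utc)
print([item["id"] for item in posts_between(sample_outbox, start, end)])
```

With a real archive, you would load outbox.json with the json module and pass the resulting dictionary in place of sample_outbox.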
So, conceptually, it's possible to build - as long as you're willing to download your data and manually parse it. Let's see if we can do a little better than that.
Building in Python
We're going to build this using Python3 and the Mastodon.py library.
Install the library on the command line with:
pip3 install -U Mastodon.py
Now, launch Python and load the library:
from mastodon import Mastodon
We'll need an API key. Go to the website of your Mastodon instance. In settings, there should be an option called "Development". Use that to create a new app which has the "Read" scope.
Once created, we will use the "Your access token" which will be a long string of random letters and numbers. In this example, we'll be using abc123. Your real access token will be longer!
Let's set up a connection to your Mastodon instance:
instance = "https://mastodon.example.com"
mastodon = Mastodon( api_base_url=instance, access_token="abc123" )
Next, we need your user ID. This isn't your @ name, instead it is the numerical ID assigned by the server. For this, we need the Verify account credentials API call:
mastodon.me()
That produces:
{
'id': 7112,
'username': 'Edent',
'acct': 'Edent',
'display_name': 'Terence Eden',
...
Looks like I was a pretty early adopter!
Getting the last 20 statuses is:
me = mastodon.me()
my_id = me["id"]
mastodon.account_statuses(id = my_id)
The problem is that a single call can only retrieve a maximum of 40 statuses at a time.
statuses = mastodon.account_statuses(id = my_id, limit=40)
We need to use Pagination. The API makes it pretty easy to grab the next page. There's also a call to get every post.
Be warned - this can take a long time. If you have thousands of posts it may take a few minutes. It can also quickly deplete your API rate limits. Use with caution!
We can reduce some of the load by excluding anything you've "boosted".
statuses = mastodon.account_statuses(id = my_id, limit=40, exclude_reblogs=True)
all_statuses = mastodon.fetch_remaining(statuses)
You can run len(all_statuses) to see how many you have retrieved.
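If fetching everything feels too heavy on the rate limit, one alternative is to walk the pages yourself and stop after a fixed number. The sketch below is an assumption-laden illustration rather than part of the library: collect_pages is a made-up helper, and the fake pages exist only so that it runs without a server.

```python
def collect_pages(first_page, fetch_next, max_pages=5):
    """Gather statuses page by page, stopping after max_pages pages."""
    collected = list(first_page)
    page = first_page
    for _ in range(max_pages - 1):
        page = fetch_next(page)
        if not page:  # None or an empty page means there is nothing left
            break
        collected.extend(page)
    return collected

# Made-up pages and a stand-in for the real "next page" call
fake_pages = [[1, 2], [3, 4], [5]]

def fake_fetch_next(page):
    index = fake_pages.index(page) + 1
    return fake_pages[index] if index < len(fake_pages) else None

print(collect_pages(fake_pages[0], fake_fetch_next, max_pages=2))
```

Against a real instance, the call would look something like collect_pages(statuses, mastodon.fetch_next, max_pages=5), since the library's fetch_next takes the previous page and returns the next one.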
The next step is finding all the posts which happened on a certain day each year. Let's say we want every post which happened on a 14th of February.
The library returns timestamps as Python DateTime objects.
status_date = all_statuses[1]["created_at"]
Shows
datetime.datetime(2016, 11, 1, 17, 4, 23, 842000, tzinfo=tzutc())
The datetime library is pretty handy. You can find the day using status_date.day and the month with status_date.month.
This means we can loop through every status and show only the ones we care about.
for status in all_statuses:
    if (status["created_at"].day == 14 and status["created_at"].month == 2):
        print(status["uri"])
That will get you a list of URLs which contain posts made on a specific day in previous years.
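One wrinkle: the timestamps come back in UTC, so a post made late in the evening may land on a different calendar day in your own timezone. Here's a minimal sketch of a timezone-aware check. The posted_on helper and the fixed UTC+1 offset are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

def posted_on(created_at, month, day, tz=timezone.utc):
    """True if created_at falls on the given month and day in timezone tz."""
    local = created_at.astimezone(tz)
    return local.month == month and local.day == day

# A made-up post from late on 14 February, UTC
late_post = datetime(2016, 2, 14, 23, 30, tzinfo=timezone.utc)
utc_plus_one = timezone(timedelta(hours=1))

print(posted_on(late_post, 2, 14))                # checked in UTC
print(posted_on(late_post, 2, 14, utc_plus_one))  # checked one hour ahead of UTC
```

The first check is true and the second is false, because 23:30 UTC is already 00:30 on the 15th in a UTC+1 timezone.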
Putting it all together
from mastodon import Mastodon
# Connect to the instance
instance = "https://mastodon.example.com"
mastodon = Mastodon( api_base_url=instance, access_token="abc123" )
# Look up the numerical user ID
me = mastodon.me()
my_id = me["id"]
# Fetch the first page, then every remaining page
statuses = mastodon.account_statuses(id = my_id, limit=40, exclude_reblogs=True)
all_statuses = mastodon.fetch_remaining(statuses)
# Show only the posts made on a 14th of February before 2022
for status in all_statuses:
    if (status["created_at"].day == 14 and status["created_at"].month == 2 and status["created_at"].year < 2022) :
        print(status["uri"])
Building a time machine
All of the above works, but is pretty inefficient because there's no way to search for specific timeframes on Mastodon. Or so I thought!
If we can calculate the maximum and minimum IDs for a given day, we will have a much more efficient search!
Let's dive in to the Mastodon Snowflake code. It is really well documented:
Our ID will be composed of the following:
6 bytes (48 bits) of millisecond-level timestamp
2 bytes (16 bits) of sequence data
OK! Let's look at a typical Mastodon Status ID: mastodon.social/@Edent/109326536843609210. It was posted on 2022-11-11 at 18:16.
Let's take the ID 109326536843609210 and perform a bitwise shift on it.
print(109326536843609210 >> 16)
1668190564630
Hey! That looks a bit like a UNIX timestamp! The shift has already thrown away the 16 bits of sequence data, and the last three digits are the milliseconds, so we can divide by 1,000 and see what happens if we convert it to a timestamp.
from datetime import datetime
datetime.fromtimestamp(1668190564630/1000)
datetime.datetime(2022, 11, 11, 18, 16, 4, 630000)
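The decoding steps above can be wrapped into a small helper. This is a sketch based only on the layout quoted earlier (48 bits of millisecond timestamp, then 16 bits of sequence data). The decode_status_id name is made up, and it returns the time in UTC rather than local time.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode_status_id(status_id):
    """Split a status ID into its UTC timestamp and its sequence data."""
    millis = status_id >> 16         # top 48 bits: milliseconds since 1970
    sequence = status_id & 0xFFFF    # bottom 16 bits: sequence data
    return EPOCH + timedelta(milliseconds=millis), sequence

posted_at, sequence = decode_status_id(109326536843609210)
print(posted_at.isoformat(), sequence)
```

Adding a timedelta to the epoch keeps the arithmetic in whole milliseconds, which avoids any floating point rounding from dividing by 1,000.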
Nice! So we can go backward and take a date - say this time last year - and convert it to a maximum and minimum ID.
min_id = ( int( datetime(2022,11,11,00,00).timestamp() ) << 16 ) * 1000
max_id = ( int( datetime(2022,11,11,23,59).timestamp() ) << 16 ) * 1000
Which gives us 109322226892800000 and 109327885271040000 respectively.
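Those numbers depend on the timezone of the machine running the code, because a plain datetime is treated as local time. A hedged variant which pins everything to UTC might look like this. The id_range_for_day helper is hypothetical, and it uses the start of the following day as the upper bound instead of 23:59.

```python
from datetime import datetime, timedelta, timezone

def id_range_for_day(year, month, day):
    """Return (min_id, max_id) covering one UTC calendar day."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    # Milliseconds since the Unix epoch, shifted past the 16 sequence bits
    min_id = int(start.timestamp() * 1000) << 16
    max_id = int(end.timestamp() * 1000) << 16
    return min_id, max_id

min_id, max_id = id_range_for_day(2022, 11, 11)
print(min_id, max_id)
print(min_id < 109326536843609210 < max_id)
```

The example status ID from earlier falls between the two values, as it should for a post made on 2022-11-11.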
Let's try that with the API!
Final Code
from datetime import datetime, timedelta
from mastodon import Mastodon
# Set up access
instance = "https://mastodon.example"
mastodon = Mastodon( api_base_url=instance, access_token="abc123" )
# Get user's info
me = mastodon.me()
my_id = me["id"]
year_joined = me["created_at"].year
# Today's date
year_now = datetime.now().year
month_now = datetime.now().month
day_now = datetime.now().day
# Counter
year_counter = year_now
# Loop through previous years
# Start with last year and go down until the year the user joined
while (year_counter > year_joined ) :
    year_counter -= 1
    # The end of today is the start of tomorrow
    # This means yesterday can take into account leap-years
    today_end = datetime(year_counter, month_now, day_now, 00, 00) + timedelta(days=1)
    yesterday_end = today_end - timedelta(days=1)
    # Bitwise shift the integer representation and convert to milliseconds
    max_id = ( int( today_end.timestamp() ) << 16 ) * 1000
    min_id = ( int( yesterday_end.timestamp() ) << 16 ) * 1000
    # Call the API
    statuses = mastodon.account_statuses(id = my_id, max_id=max_id, min_id=min_id, limit=40, exclude_reblogs=True)
    # Fetch further statuses if there are any
    all_statuses = mastodon.fetch_remaining(statuses)
    # Print the date and URL
    for status in all_statuses :
        print( str(status["created_at"]) + " " + status["uri"] )
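One edge case worth knowing about: if this runs on 29 February, building a datetime for that day in a non-leap year raises a ValueError. A small guard, sketched here with a made-up helper name, lets the loop skip years where the date doesn't exist.

```python
from datetime import datetime

def same_day_in_year(year, month, day):
    """Return midnight on that date, or None if the date doesn't exist."""
    try:
        return datetime(year, month, day)
    except ValueError:
        # e.g. 29 February in a non-leap year
        return None

print(same_day_in_year(2020, 2, 29))
print(same_day_in_year(2021, 2, 29))
```

Inside the loop, a None result would simply mean moving on to the next year without calling the API.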
Next Steps
It works on my machine! But that's not really good enough. Ideally I'd like to turn this into a web app which people could use with their own account.
If you're interested in helping out with that, grab the code or drop me a line!
Naheem Says said on twitter.com:
I wonder if having no search would improve twitter like experiences.
It may be useful to hold politicians and public figures to their words but for everyone else it allows witch hunts.
Robert Sharp says:
That’s so interesting, Naheem.
One school of thought is that since a Twitter stream (or Toot stream) is published by a user, that user should have control over what others see. An aspect of privacy.
The other school of thought is that since something has been published to the world, it’s no longer under the control of the author, and others have a right to see what you said previously.
My friend David Eastman wrote a blog post a few years ago that I think about often. The idea that the privacy and anonymity we used to experience never needed to be actively thought about, because invading it was always such a hassle for other people. That is, until our lives were digitised (sometimes with our consent, very often without).
I think the same is true with regards to the “privacy” of our past. The spectacularly unfunny stuff I wrote as a youth is thankfully lost. It wouldn’t be if I had kept a blog or a Twitter feed in my teens.
The EU has a ‘right to be forgotten’ law of course, which speaks to this.
For a very short time a decade ago I wondered whether I might stand for election to something. I began to realise that my digital output might, in future years, be scrutinised, and that in turn affected what I wrote and how I wrote it. I’ve given up on such lofty (or stupid) ideas now but am conscious that if anyone will read my digital output in the future, it will likely be my kids (and their kids?) and I want to come across as Not A Dickhead. The point is that awareness of posterity now colours everything I post online. I think it’s subconscious now, a mental state, a way of thinking.
It’s fascinating to be reminded that my mental (shall we say) orientation in this regard is a product of design choices by technologists, who may not even realise they’re making choices.
Workshopshed said on mastodon.scot:
@Edent nice work, Terence
backseat said on fosstodon.org:
@Edent thanks for publishing your code - looks great!
Benjamin S-B :verified: said on genomic.social:
@Edent Excellent project. You might have seen, but there's also this feature request for better 🧵 handling: https://github.com/mastodon/mastodon/issues/8615 Fold toots in a thread, only display some of them in timeline · Issue #8615 · mastodon/mastodon
Poliverso notizie dal fediverso said on poliverso.org:
HOW TO SEARCH MASTODON BY DATE AND TIME. THE POST BY @
(here is Terence Eden's post)! What's happening in the Fediverse?
Dane says:
Further to this, if you want to start getting your old Pixelfed posts, they also use Snowflake IDs, but have an epoch in February 2019, so there's a slightly different way to calculate them:
min_id = (int(datetime(2022,11,28,2,00).timestamp()) * 1000) - 1549756800000 << 22
max_id = (int(datetime(2022,11,28,2,59).timestamp()) * 1000) - 1549756800000 << 22
https://github.com/pixelfed/pixelfed/blob/dev/app/Services/SnowflakeService.php