How to search Mastodon by date & time
Two years ago to the day, I built Twistory - a service for seeing what you posted on Twitter on this day in previous years. If you've ever used Facebook, you'll know how it is supposed to work. You see posts which show that exactly 5 years ago you were starting a new job, 6 years ago you were at a wedding, etc.
The Twitter version never really worked properly because the Twitter API doesn't support searching for historic Tweets. What I had to do was manually build search queries like: ?q=from:edent (until:2011-11-15 since:2011-11-14) and redirect people to the website.
Eugh!
I'm trying to build something similar for the Mastodon social network. Yes, I know it is new to you - but some of us have been there for several years.
So here's how to search Mastodon for posts made on specific dates!
(Skip to the code and ignore all the exciting preamble.)
Sadly, the Mastodon Search API is still quite basic. It isn't possible to directly search by user or by date parameters.
If you download the archive of all your posts, you'll find an ActivityPub feed called outbox.json - which is a collection of everything you've ever posted.
It can be parsed using jq to get all of the statuses posted on a specific day:
cat outbox.json | jq ".orderedItems[] | select (.published | fromdateiso8601 > 1636533813) | select (.published | fromdateiso8601 < 1636620302)"
The two numbers compared against fromdateiso8601 are the Unix epoch times from 365 days ago and 364 days ago.
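If you'd rather not use jq, the same filter can be sketched in Python. This is only a rough equivalent: the sample data below is made up and stands in for the orderedItems list from your own outbox.json, and the helper names parse_published and items_between are my own invention.

```python
import json
from datetime import datetime, timezone


def parse_published(value):
    """Parse an ISO 8601 timestamp such as 2021-11-10T08:43:33Z."""
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def items_between(items, start, end):
    """Return the items published after start and before end."""
    return [
        item for item in items
        if start < parse_published(item["published"]) < end
    ]


# Made-up sample standing in for the contents of a real outbox.json
sample_outbox = json.loads("""
{"orderedItems": [
    {"id": "a", "published": "2021-11-09T23:59:59Z"},
    {"id": "b", "published": "2021-11-10T12:00:00Z"},
    {"id": "c", "published": "2021-11-11T09:30:00Z"}
]}
""")

# The same two epoch values used in the jq command
start = datetime.fromtimestamp(1636533813, tz=timezone.utc)
end = datetime.fromtimestamp(1636620302, tz=timezone.utc)

matches = items_between(sample_outbox["orderedItems"], start, end)
print([item["id"] for item in matches])
```

With a real archive you would swap the sample for json.load() on the downloaded file.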
So, conceptually, it's possible to build - as long as you're willing to download your data and manually parse it. Let's see if we can do a little better than that.
Building in Python
We're going to build this using Python 3 and the Mastodon.py library.
Install the library on the command line with:
pip3 install -U Mastodon.py
Now, launch Python and load the library:
Python 3
from mastodon import Mastodon
We'll need an API key. Go to the website of your Mastodon instance. In settings, there should be an option called "Development". Use that to create a new app which has the "Read" scope.
Once created, we will use the "Your access token" which will be a long string of random letters and numbers. In this example, we'll be using abc123. Your real access token will be longer!
Let's set up a connection to your Mastodon instance:
Python 3
instance = "https://mastodon.example.com"
mastodon = Mastodon( api_base_url=instance, access_token="abc123" )
Next, we need your user ID. This isn't your @ name; it is the numerical ID assigned by the server. For this, we need the Verify account credentials API call:
Python 3
mastodon.me()
That produces:
JSON
{
'id': 7112,
'username': 'Edent',
'acct': 'Edent',
'display_name': 'Terence Eden',
...
Looks like I was a pretty early adopter!
Getting the last 20 statuses is:
Python 3
me = mastodon.me()
my_id = me["id"]
mastodon.account_statuses(id = my_id)
The problem is that a single call can only retrieve a maximum of 40 statuses at a time.
Python 3
statuses = mastodon.account_statuses(id=my_id, limit=40)
We need to use pagination. The library makes it pretty easy to grab the next page with fetch_next(). There's also a call, fetch_remaining(), to get every post.
Be warned - this can take a long time. If you have thousands of posts it may take a few minutes. It can also quickly deplete your API rate limits. Use with caution!
We can reduce some of the load by excluding anything you've "boosted".
Python 3
statuses = mastodon.account_statuses(id=my_id, limit=40, exclude_reblogs=True)
all_statuses = mastodon.fetch_remaining(statuses)
You can run len(all_statuses) to see how many you have retrieved.
The next step is finding all the posts which happened on a certain day each year. Let's say we want every post which happened on a 14th of February.
The library returns timestamps as Python datetime objects.
Python 3
status_date = all_statuses[1]["created_at"]
Shows
Python 3
datetime.datetime(2016, 11, 1, 17, 4, 23, 842000, tzinfo=tzutc())
The datetime library is pretty handy. You can find the day using status_date.day and month with status_date.month.
This means we can loop through every status and show only the ones we care about.
Python 3
for status in all_statuses:
    if status["created_at"].day == 14 and status["created_at"].month == 2:
        print(status["uri"])
That will get you a list of URLs which contain posts made on a specific day in previous years.
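If you want to try the filter without hitting the API, here's a small sketch which runs against made-up data. The function name on_this_day and the sample statuses are purely illustrative; the only thing it assumes about real statuses is that they have "created_at" and "uri" keys, as above.

```python
from datetime import datetime, timezone


def on_this_day(statuses, month, day):
    """Return the statuses created on the given day and month, in any year."""
    return [
        status for status in statuses
        if status["created_at"].month == month
        and status["created_at"].day == day
    ]


# Made-up statuses which mimic the shape returned by the library
sample_statuses = [
    {"uri": "https://mastodon.example/statuses/1",
     "created_at": datetime(2019, 2, 14, 9, 0, tzinfo=timezone.utc)},
    {"uri": "https://mastodon.example/statuses/2",
     "created_at": datetime(2020, 2, 15, 9, 0, tzinfo=timezone.utc)},
    {"uri": "https://mastodon.example/statuses/3",
     "created_at": datetime(2021, 2, 14, 23, 59, tzinfo=timezone.utc)},
]

valentines = on_this_day(sample_statuses, month=2, day=14)
for status in valentines:
    print(status["uri"])
```

Bear in mind the comparison happens in whatever timezone the timestamps carry, so a post made late at night may land on a different day than you remember.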
Putting it all together
Python 3
from mastodon import Mastodon

# Connect to your instance
instance = "https://mastodon.example.com"
mastodon = Mastodon(api_base_url=instance, access_token="abc123")

# Look up your numerical ID
me = mastodon.me()
my_id = me["id"]

# Fetch the first page, then every remaining page
statuses = mastodon.account_statuses(id=my_id, limit=40, exclude_reblogs=True)
all_statuses = mastodon.fetch_remaining(statuses)

# Print the posts made on 14th February in previous years
for status in all_statuses:
    created = status["created_at"]
    if created.day == 14 and created.month == 2 and created.year < 2022:
        print(status["uri"])
Building a time machine
All of the above works, but is pretty inefficient because there's no way to search for specific timeframes on Mastodon. Or so I thought!
If we can calculate the maximum and minimum IDs for a given day, we will have a much more efficient search!
Let's dive into the Mastodon Snowflake code. It is really well documented:
Our ID will be composed of the following:
6 bytes (48 bits) of millisecond-level timestamp
2 bytes (16 bits) of sequence data
OK! Let's look at a typical Mastodon status, mastodon.social/@Edent/109326536843609210, which was posted on 2022-11-11 at 18:16.
Let's take the ID 109326536843609210 and perform a bitwise shift on it.
Python 3
print(109326536843609210 >> 16)
1668190564630
Hey! That looks a bit like a UNIX timestamp! The shift has already thrown away the 16 bits of sequence data, and the last three digits are the milliseconds, so let's divide by 1000 and see what happens if we convert it to a timestamp.
Python 3
from datetime import datetime
datetime.fromtimestamp(1668190564630/1000)
datetime.datetime(2022, 11, 11, 18, 16, 4, 630000)
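Here's that decoding wrapped up as a function, as a rough sketch. The name decode_status_id is my own, and it assumes the ID follows the 48-bit timestamp plus 16-bit sequence layout quoted above. Unlike datetime.fromtimestamp(), which gives you local time, this version always returns UTC.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_status_id(status_id):
    """Split a status ID into its UTC timestamp and its sequence data."""
    # The top 48 bits are milliseconds since the Unix epoch
    milliseconds = status_id >> 16
    # The bottom 16 bits are the sequence data
    sequence = status_id & 0xFFFF
    posted_at = UNIX_EPOCH + timedelta(milliseconds=milliseconds)
    return posted_at, sequence


posted_at, sequence = decode_status_id(109326536843609210)
print(posted_at.isoformat())
```

Adding a timedelta to the epoch, rather than dividing by 1000, keeps everything in integers and avoids any floating point rounding of the milliseconds.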
Nice! So we can go backward and take a date - say the 11th of November 2022 - and convert it to a minimum and maximum ID.
Python 3
# Convert to milliseconds, then shift left to leave room for the sequence data
min_id = int(datetime(2022, 11, 11, 0, 0).timestamp() * 1000) << 16
max_id = int(datetime(2022, 11, 11, 23, 59).timestamp() * 1000) << 16
Which gives us 109322226892800000 and 109327885271040000 respectively.
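A naive datetime like the one above uses your computer's local timezone, so the numbers will shift if you run it somewhere other than UTC. Here's a sketch which pins everything to UTC and uses the start of the next day as the upper bound, so the last minute of the day isn't missed. The function name id_range_for_day is hypothetical, and treating a "day" as a UTC day is my assumption.

```python
from datetime import datetime, timedelta, timezone


def id_range_for_day(year, month, day):
    """Return (min_id, max_id) covering one whole UTC day."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    # Whole seconds -> milliseconds -> shifted to leave 16 bits of sequence data
    min_id = (int(start.timestamp()) * 1000) << 16
    max_id = (int(end.timestamp()) * 1000) << 16
    return min_id, max_id


min_id, max_id = id_range_for_day(2022, 11, 11)
print(min_id, max_id)
```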
Let's try that with the API!
Final Code
Python 3
from datetime import datetime, timedelta
from mastodon import Mastodon

# Set up access
instance = "https://mastodon.example"
mastodon = Mastodon(api_base_url=instance, access_token="abc123")

# Get user's info
me = mastodon.me()
my_id = me["id"]
year_joined = me["created_at"].year

# Today's date
now = datetime.now()
year_now = now.year
month_now = now.month
day_now = now.day

# Counter
year_counter = year_now

# Loop through previous years
# Start with last year and go down until the year the user joined
while year_counter > year_joined:
    year_counter -= 1
    # The end of today is the start of tomorrow
    # This means yesterday can take into account leap-years
    today_end = datetime(year_counter, month_now, day_now, 0, 0) + timedelta(days=1)
    yesterday_end = today_end - timedelta(days=1)
    # Convert to milliseconds, then bitwise shift to leave room for the sequence data
    max_id = int(today_end.timestamp() * 1000) << 16
    min_id = int(yesterday_end.timestamp() * 1000) << 16
    # Call the API
    statuses = mastodon.account_statuses(id=my_id, max_id=max_id, min_id=min_id, limit=40, exclude_reblogs=True)
    # Fetch further statuses if there are any
    all_statuses = mastodon.fetch_remaining(statuses)
    # Print the date and URL
    for status in all_statuses:
        print(str(status["created_at"]) + " " + status["uri"])
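One edge case worth thinking about is the 29th of February: asking datetime for that date in a non-leap year raises a ValueError. Here's a sketch of how the yearly windows could be calculated separately from the API calls, skipping any year where the date doesn't exist. The function name anniversary_windows is my own, and skipping (rather than, say, falling back to the 28th) is just one possible choice.

```python
from datetime import date, datetime, timedelta


def anniversary_windows(today, year_joined):
    """Return (start, end) datetimes for this day in each previous year."""
    windows = []
    # Start with last year and go down to the year the user joined
    for year in range(today.year - 1, year_joined - 1, -1):
        try:
            start = datetime(year, today.month, today.day)
        except ValueError:
            # 29th February doesn't exist in this year, so skip it
            continue
        windows.append((start, start + timedelta(days=1)))
    return windows


for start, end in anniversary_windows(date(2024, 2, 29), year_joined=2019):
    print(start, end)
```

Each pair can then be converted to a min_id and max_id in the same way as the code above.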
Next Steps
It works on my machine! But that's not really good enough. Ideally I'd like to turn this into a web app which people could use with their own account.
If you're interested in helping out with that, grab the code or drop me a line!