You don't need an API key to archive Twitter Data
Apparently there's no need for IP laws any more, so here's a way to archive high-fidelity Twitter data without signing up for an expensive API key.
This is perfect for academics wishing to preserve Tweets, journalists wanting to download evidence, or anyone who simply wants to embed content without leaking user data back to Twitter.
tl;dr
You can get the full JSON of any Tweet by using this API:
https://cdn.syndication.twimg.com/tweet-result?id=123456789&token=01010101010
Set id to any valid Tweet ID, and choose a random number for your token. Done.
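If you'd rather script it, here's a minimal Python sketch using only the standard library. The function names, the User-Agent header, and the 10-second timeout are my own assumptions, not anything the endpoint is known to require. fetch_bytes is a hypothetical hook for swapping in a different HTTP client.

```python
import json
import random
import urllib.request
from typing import Callable


def download(url: str) -> bytes:
    """Fetch a URL with a plain GET request and return the raw body."""
    # The User-Agent header is an assumption; the endpoint may not need it.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def fetch_tweet(tweet_id: str, fetch_bytes: Callable[[str], bytes] = download) -> dict:
    """Return the parsed JSON for a single Tweet from the syndication endpoint."""
    # The token isn't checked, so any random number will do.
    token = random.randint(1, 10**10)
    url = f"https://cdn.syndication.twimg.com/tweet-result?id={tweet_id}&token={token}"
    # json.loads accepts bytes directly.
    return json.loads(fetch_bytes(url))
```

Calling fetch_tweet("719484841172054016") should return a dictionary shaped like the JSON shown under Output below.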
Background
Twitter has an "embed" feature: websites can import a full copy of a Tweet, including its media and metadata. Twitter's documentation is a little lacklustre, but here's a brief explanation of how it works.
Embed Code
Using HTML like this:
HTML
<iframe
src="https://platform.twitter.com/embed/Tweet.html?id=719484841172054016"
width=512
height=768></iframe>
That produces an embedded copy of the Tweet.
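If you're generating pages programmatically, the same iframe can be built from any Tweet ID. A small Python sketch: embed_iframe is a hypothetical helper name, and the default width and height simply copy the example above.

```python
from html import escape


def embed_iframe(tweet_id: str, width: int = 512, height: int = 768) -> str:
    """Build an iframe embed like the HTML example, for any Tweet ID."""
    # Tweet IDs are plain ASCII digits; reject anything else.
    if not (tweet_id.isascii() and tweet_id.isdigit()):
        raise ValueError(f"Tweet IDs are numeric, got {tweet_id!r}")
    src = f"https://platform.twitter.com/embed/Tweet.html?id={tweet_id}"
    # escape() is belt-and-braces here, since the ID has already been validated.
    return f'<iframe src="{escape(src)}" width="{width}" height="{height}"></iframe>'
```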
API Call
By sniffing the network traffic, you can see that the iframe eventually calls a URL like this:
https://cdn.syndication.twimg.com/tweet-result?id=719484841172054016&token=123
Visit that URL and you'll see the Tweet's full JSON.
Options
- id= is the numeric ID of the Tweet.
- token= is the API token. It can be set to any random number; it isn't checked.
- lang= is optional and takes BCP 47 language codes, for example lang=en or lang=zh. However, it doesn't seem to make any difference to the output.
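The parameters above can be assembled with urllib.parse.urlencode. A sketch under the same assumptions (random token, optional lang); the tweet_result_url name and the ID check are my own additions. The ID stays a string because Tweet IDs are larger than JavaScript's Number.MAX_SAFE_INTEGER, which is why the JSON uses id_str fields.

```python
import random
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://cdn.syndication.twimg.com/tweet-result"


def tweet_result_url(tweet_id: str, token: Optional[int] = None, lang: Optional[str] = None) -> str:
    """Build a tweet-result URL from the id, token, and optional lang parameters."""
    if not (tweet_id.isascii() and tweet_id.isdigit()):
        raise ValueError(f"Tweet IDs are numeric, got {tweet_id!r}")
    # The token isn't checked, so pick a random one unless the caller supplies it.
    params = {"id": tweet_id, "token": token if token is not None else random.randint(1, 10**10)}
    if lang is not None:
        # A BCP 47 code such as "en" or "zh"; it doesn't seem to change the output.
        params["lang"] = lang
    return f"{BASE_URL}?{urlencode(params)}"
```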
Output
Here's the JSON of the above Tweet. As you can see, it includes metadata on the number of replies, favourites, and retweets. There are entities, fully expanded links, and media in a variety of formats. There's also information on whether the post has been edited, if the user is stupid enough to pay for a blue-tick, and the language of the message.
Tweet With Image
JSON
{
"__typename": "Tweet",
"lang": "en",
"favorite_count": 4,
"possibly_sensitive": false,
"created_at": "2016-04-11T11:18:48.000Z",
"display_text_range": [
0,
120
],
"entities": {
"hashtags": [],
"urls": [],
"user_mentions": [
{
"id_str": "23937508",
"indices": [
20,
30
],
"name": "BBC Radio 4",
"screen_name": "BBCRadio4"
}
],
"symbols": [],
"media": [
{
"display_url": "pic.x.com/6F3ZSiWuIn",
"expanded_url": "https://x.com/edent/status/719484841172054016/photo/1",
"indices": [
97,
120
],
"url": "https://t.co/6F3ZSiWuIn"
}
]
},
"id_str": "719484841172054016",
"text": "Warning! I'll be on @BBCRadio4's You And Yours shortly.\nPlease tune your wirelesses accordingly. https://t.co/6F3ZSiWuIn",
"user": {
"id_str": "14054507",
"name": "Terence Eden is on Mastodon",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/1623225628530016260/SW0HsKjP_normal.jpg",
"screen_name": "edent",
"verified": false,
"is_blue_verified": false,
"profile_image_shape": "Circle"
},
"edit_control": {
"edit_tweet_ids": [
"719484841172054016"
],
"editable_until_msecs": "1460375328174",
"is_edit_eligible": true,
"edits_remaining": "5"
},
"mediaDetails": [
{
"display_url": "pic.x.com/6F3ZSiWuIn",
"expanded_url": "https://x.com/edent/status/719484841172054016/photo/1",
"ext_media_availability": {
"status": "Available"
},
"indices": [
97,
120
],
"media_url_https": "https://pbs.twimg.com/media/CfwfpnJWwAEXwe3.jpg",
"original_info": {
"height": 1280,
"width": 960,
"focus_rects": []
},
"sizes": {
"large": {
"h": 1280,
"resize": "fit",
"w": 960
},
"medium": {
"h": 1200,
"resize": "fit",
"w": 900
},
"small": {
"h": 680,
"resize": "fit",
"w": 510
},
"thumb": {
"h": 150,
"resize": "crop",
"w": 150
}
},
"type": "photo",
"url": "https://t.co/6F3ZSiWuIn"
}
],
"photos": [
{
"backgroundColor": {
"red": 204,
"green": 214,
"blue": 221
},
"cropCandidates": [],
"expandedUrl": "https://x.com/edent/status/719484841172054016/photo/1",
"url": "https://pbs.twimg.com/media/CfwfpnJWwAEXwe3.jpg",
"width": 960,
"height": 1280
}
],
"conversation_count": 1,
"news_action_type": "conversation",
"isEdited": false,
"isStaleEdit": false
}
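Most of the fields described above can be pulled straight out of the parsed response. A sketch assuming the key names seen in this example; summarise and its output keys are my own. Fields like retweet_count and reply_count aren't present on every response (this one only has favorite_count at the top level), so they default to None.

```python
def summarise(tweet: dict) -> dict:
    """Pull the headline fields out of a tweet-result response."""
    user = tweet.get("user", {})
    return {
        "id": tweet["id_str"],
        "author": user.get("screen_name"),
        "created_at": tweet.get("created_at"),
        "lang": tweet.get("lang"),
        # Counts may be missing from some responses, so .get() yields None.
        "favourites": tweet.get("favorite_count"),
        "retweets": tweet.get("retweet_count"),
        "replies": tweet.get("reply_count"),
        "blue_tick": user.get("is_blue_verified", False),
        "edited": tweet.get("isEdited", False),
        # Full-size image URLs for every photo attached to the Tweet.
        "photos": [
            media["media_url_https"]
            for media in tweet.get("mediaDetails", [])
            if media.get("type") == "photo"
        ],
    }
```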
Replies
Here's a more complicated example. This Tweet is in reply to another Tweet, so both messages are included; the original appears under parent:
JSON
{
"__typename": "Tweet",
"in_reply_to_screen_name": "edent",
"in_reply_to_status_id_str": "1095653997644574720",
"in_reply_to_user_id_str": "14054507",
"lang": "en",
"favorite_count": 0,
"created_at": "2019-02-13T12:22:59.000Z",
"display_text_range": [
7,
252
],
"entities": {
"hashtags": [],
"urls": [],
"user_mentions": [
{
"id_str": "14054507",
"indices": [
0,
6
],
"name": "Terence Eden is on Mastodon",
"screen_name": "edent"
}
],
"symbols": []
},
"id_str": "1095659600420966400",
"text": "@edent I can definitely see how this would get in the way of making your day a productive one. Do you find this happens often? If it does, I'd be happy to chat to you about a reliable alternative with us during your lunch break! ☕ PM me for a chat! ^JH",
"user": {
"id_str": "20139563",
"name": "Sky",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/1674689671006240769/OpfisqRG_normal.jpg",
"screen_name": "SkyUK",
"verified": false,
"verified_type": "Business",
"is_blue_verified": false,
"profile_image_shape": "Square"
},
"edit_control": {
"edit_tweet_ids": [
"1095659600420966400"
],
"editable_until_msecs": "1550062379768",
"is_edit_eligible": true,
"edits_remaining": "5"
},
"conversation_count": 2,
"news_action_type": "conversation",
"parent": {
"lang": "en",
"reply_count": 2,
"retweet_count": 1,
"favorite_count": 1,
"possibly_sensitive": false,
"created_at": "2019-02-13T12:00:43.000Z",
"display_text_range": [
0,
112
],
"entities": {
"hashtags": [],
"urls": [],
"user_mentions": [
{
"id_str": "17872077",
"indices": [
33,
45
],
"name": "Virgin Media ❤️",
"screen_name": "virginmedia"
}
],
"symbols": [],
"media": [
{
"display_url": "pic.x.com/mje6nh38CZ",
"expanded_url": "https://x.com/edent/status/1095653997644574720/photo/1",
"indices": [
113,
136
],
"url": "https://t.co/mje6nh38CZ"
}
]
},
"id_str": "1095653997644574720",
"text": "Working from home is tricky when @virginmedia goes down so hard even its status page falls over.\nTime for lunch. https://t.co/mje6nh38CZ",
"user": {
"id_str": "14054507",
"name": "Terence Eden is on Mastodon",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/1623225628530016260/SW0HsKjP_normal.jpg",
"screen_name": "edent",
"verified": false,
"is_blue_verified": false,
"profile_image_shape": "Circle"
},
"edit_control": {
"edit_tweet_ids": [
"1095653997644574720"
],
"editable_until_msecs": "1550061043962",
"is_edit_eligible": true,
"edits_remaining": "5"
},
"mediaDetails": [
{
"display_url": "pic.x.com/mje6nh38CZ",
"expanded_url": "https://x.com/edent/status/1095653997644574720/photo/1",
"ext_alt_text": "Oops! something's broken! ",
"ext_media_availability": {
"status": "Available"
},
"indices": [
113,
136
],
"media_url_https": "https://pbs.twimg.com/media/DzSLf6sWsAAGWWH.jpg",
"original_info": {
"height": 797,
"width": 1080,
"focus_rects": [
{
"x": 0,
"y": 192,
"w": 1080,
"h": 605
},
{
"x": 142,
"y": 0,
"w": 797,
"h": 797
},
{
"x": 191,
"y": 0,
"w": 699,
"h": 797
},
{
"x": 341,
"y": 0,
"w": 399,
"h": 797
},
{
"x": 0,
"y": 0,
"w": 1080,
"h": 797
}
]
},
"sizes": {
"large": {
"h": 797,
"resize": "fit",
"w": 1080
},
"medium": {
"h": 797,
"resize": "fit",
"w": 1080
},
"small": {
"h": 502,
"resize": "fit",
"w": 680
},
"thumb": {
"h": 150,
"resize": "crop",
"w": 150
}
},
"type": "photo",
"url": "https://t.co/mje6nh38CZ"
}
],
"photos": [
{
"accessibilityLabel": "Oops! something's broken! ",
"backgroundColor": {
"red": 204,
"green": 214,
"blue": 221
},
"cropCandidates": [
{
"x": 0,
"y": 192,
"w": 1080,
"h": 605
},
{
"x": 142,
"y": 0,
"w": 797,
"h": 797
},
{
"x": 191,
"y": 0,
"w": 699,
"h": 797
},
{
"x": 341,
"y": 0,
"w": 399,
"h": 797
},
{
"x": 0,
"y": 0,
"w": 1080,
"h": 797
}
],
"expandedUrl": "https://x.com/edent/status/1095653997644574720/photo/1",
"url": "https://pbs.twimg.com/media/DzSLf6sWsAAGWWH.jpg",
"width": 1080,
"height": 797
}
],
"isEdited": false,
"isStaleEdit": false
},
"isEdited": false,
"isStaleEdit": false
}
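Because the parent is nested inside the reply, you can walk the chain to rebuild the conversation. A minimal sketch; conversation and format_thread are hypothetical helper names. In this example only the immediate parent is included, but the loop follows whatever nesting is present.

```python
def conversation(tweet: dict) -> list:
    """Return the Tweets in a reply chain, oldest first, by following "parent"."""
    chain = []
    current = tweet
    while current is not None:
        chain.append(current)
        current = current.get("parent")
    # We collected newest-to-oldest, so flip the order.
    chain.reverse()
    return chain


def format_thread(tweet: dict) -> str:
    """Render the chain as "@screen_name: text" lines."""
    return "\n".join(f'@{t["user"]["screen_name"]}: {t["text"]}' for t in conversation(tweet))
```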
Quote Tweets
Here's an example where I have quoted a Tweet. The quoted message is included under quoted_tweet:
JSON
{
"__typename": "Tweet",
"lang": "en",
"favorite_count": 9,
"possibly_sensitive": false,
"created_at": "2022-08-19T13:36:44.000Z",
"display_text_range": [
0,
182
],
"entities": {
"hashtags": [],
"urls": [
{
"display_url": "gu.com",
"expanded_url": "http://gu.com",
"indices": [
17,
40
],
"url": "https://t.co/Skj7FB7Tyt"
}
],
"user_mentions": [],
"symbols": []
},
"id_str": "1560621791470448642",
"text": "Whoever buys the https://t.co/Skj7FB7Tyt domain will effectively get to rewrite history.\nThey can redirect links like these - and change the nature of the content being commented on.",
"user": {
"id_str": "14054507",
"name": "Terence Eden is on Mastodon",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/1623225628530016260/SW0HsKjP_normal.jpg",
"screen_name": "edent",
"verified": false,
"is_blue_verified": false,
"profile_image_shape": "Circle"
},
"edit_control": {
"edit_tweet_ids": [
"1560621791470448642"
],
"editable_until_msecs": "1660918004000",
"is_edit_eligible": true,
"edits_remaining": "5"
},
"conversation_count": 4,
"news_action_type": "conversation",
"quoted_tweet": {
"lang": "en",
"reply_count": 131,
"retweet_count": 1337,
"favorite_count": 2789,
"possibly_sensitive": false,
"created_at": "2018-11-27T15:56:19.000Z",
"display_text_range": [
0,
279
],
"entities": {
"hashtags": [],
"urls": [
{
"display_url": "gu.com/p/axa7k/stw",
"expanded_url": "https://gu.com/p/axa7k/stw",
"indices": [
256,
279
],
"url": "https://t.co/UulPL1CtcK"
}
],
"user_mentions": [],
"symbols": []
},
"id_str": "1067447032363794432",
"text": "The Steele Dossier asserted Russian hacking of the DNC was \"conducted with the full knowledge & support of Trump & senior members of his campaign.” Trump's war against the FBI & efforts to obstruct make sense if he thought they could prove it. https://t.co/UulPL1CtcK",
"user": {
"id_str": "548384458",
"name": "Joyce Alene",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/952257848301498371/5s24RH-g_normal.jpg",
"screen_name": "JoyceWhiteVance",
"verified": false,
"is_blue_verified": true,
"profile_image_shape": "Circle"
},
"edit_control": {
"edit_tweet_ids": [
"1067447032363794432"
],
"editable_until_msecs": "1543335979379",
"is_edit_eligible": true,
"edits_remaining": "5"
},
"isEdited": false,
"isStaleEdit": false
},
"isEdited": false,
"isStaleEdit": false
}
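The Tweet text only contains t.co short links, but each entry in entities carries the matching expanded_url, which is worth keeping if you're archiving. A minimal sketch that swaps them back in; expand_urls is a hypothetical name, and it uses plain string replacement rather than the indices values so it doesn't depend on what unit those offsets count in.

```python
def expand_urls(tweet: dict) -> str:
    """Replace t.co short links in a Tweet's text with their expanded targets."""
    text = tweet["text"]
    entities = tweet.get("entities", {})
    # Ordinary links and media links both carry a t.co "url" and an "expanded_url".
    for entity in entities.get("urls", []) + entities.get("media", []):
        text = text.replace(entity["url"], entity["expanded_url"])
    return text
```

For a quote Tweet, call it on tweet["quoted_tweet"] as well.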
Downloading Media
Videos are also available to download, with no restrictions, in a variety of resolutions. Here's the mediaDetails excerpt from a Tweet containing a video:
JSON
"mediaDetails": [
{
"type": "video",
"url": "https://t.co/Qw1IFom7Fh",
"video_info": {
"aspect_ratio": [
3,
4
],
"duration_millis": 13578,
"variants": [
{
"content_type": "application/x-mpegURL",
"url": "https://video.twimg.com/ext_tw_video/1432767873504718850/pu/pl/DiIKFNNZLWbLmECm.m3u8?tag=12"
},
{
"bitrate": 632000,
"content_type": "video/mp4",
"url": "https://video.twimg.com/ext_tw_video/1432767873504718850/pu/vid/320x426/oq2p-t0RJEEKuDD6.mp4?tag=12"
},
{
"bitrate": 950000,
"content_type": "video/mp4",
"url": "https://video.twimg.com/ext_tw_video/1432767873504718850/pu/vid/480x640/3X8ZsBmXmmaaakmM.mp4?tag=12"
},
{
"bitrate": 2176000,
"content_type": "video/mp4",
"url": "https://video.twimg.com/ext_tw_video/1432767873504718850/pu/vid/720x960/sS9cLdGn93eUmvKC.mp4?tag=12"
}
]
}
}
],
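To pick a single file to download, you can skip the HLS playlist (which has no bitrate) and take the MP4 variant with the highest bitrate. A sketch assuming the mediaDetails structure shown above; best_mp4 is a hypothetical helper.

```python
from typing import Optional


def best_mp4(media: dict) -> Optional[str]:
    """Return the URL of the highest-bitrate MP4 variant, or None if there isn't one."""
    variants = media.get("video_info", {}).get("variants", [])
    # The application/x-mpegURL playlist has no bitrate, so keep only MP4s.
    mp4s = [v for v in variants if v.get("content_type") == "video/mp4"]
    if not mp4s:
        return None
    return max(mp4s, key=lambda v: v.get("bitrate", 0))["url"]
```

The resulting URL can then be saved with urllib.request.urlretrieve(url, "video.mp4").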
Limitations
There are a few small limitations with this approach.
- It doesn't capture replies to the Tweet
  - If the Tweet is in reply to something, it will capture the parent.
  - If the Tweet quotes something, it will capture the quoted Tweet.
- The counts for replies, retweets, and favourites may not be accurate
  - Older messages seem worse for this, but that's a natural part of digital decay.
- Reduced metadata
  - The official API used to tell you which device was used to post the message, the user's timezone, and other useful information.
- You need to know the ID of the Tweet
  - There's no way to automatically grab every Tweet by a user, or from a search.
- Sometimes the API stops responding
  - Change the token to another random number.
- Occasionally the parent or quoted Tweet won't be included
  - Calling the API again often recovers the data.
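The last two workarounds can be combined into a retry loop that picks a fresh random token on every attempt. A sketch under some assumptions: fetch_with_retries is a hypothetical name, fetch_json is any function that downloads a URL and returns parsed JSON, and a response counts as incomplete when it has in_reply_to_status_id_str but no parent. There's no obvious field in these examples for detecting a missing quoted Tweet, so that case isn't checked.

```python
import random
from typing import Callable, Optional

ENDPOINT = "https://cdn.syndication.twimg.com/tweet-result"


def fetch_with_retries(
    tweet_id: str,
    fetch_json: Callable[[str], dict],
    attempts: int = 3,
) -> Optional[dict]:
    """Retry the endpoint with a new random token until a complete response arrives."""
    result = None
    for _ in range(attempts):
        # A different random token on every attempt.
        token = random.randint(1, 10**10)
        url = f"{ENDPOINT}?id={tweet_id}&token={token}"
        try:
            tweet = fetch_json(url)
        except (OSError, ValueError):
            # urllib's network errors subclass OSError; bad JSON raises ValueError.
            continue
        result = tweet
        # Assumption: a reply without its "parent" is incomplete, so try again.
        if "in_reply_to_status_id_str" not in tweet or "parent" in tweet:
            break
    # The first complete response, the last partial one, or None if every call failed.
    return result
```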
Python Code
If you're technically inclined, I've written some Python code to automate turning the JSON into HTML.
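This isn't that script, just a minimal sketch of the idea. tweet_to_html and the markup it produces are assumptions; it only uses the text, user, created_at, id_str, and photos fields seen in the examples above, and builds the permalink in the same x.com form as the expanded_url values.

```python
from html import escape


def tweet_to_html(tweet: dict) -> str:
    """Render a tweet-result dict as a small, self-contained HTML blockquote."""
    user = tweet["user"]
    # Escape the text, then keep its line breaks visible.
    body = escape(tweet["text"]).replace("\n", "<br>")
    images = "".join(
        f'<img src="{escape(photo["url"])}" width="{photo["width"]}" '
        f'height="{photo["height"]}" alt="{escape(photo.get("accessibilityLabel", ""))}">'
        for photo in tweet.get("photos", [])
    )
    permalink = f"https://x.com/{user['screen_name']}/status/{tweet['id_str']}"
    return (
        '<blockquote class="tweet">'
        f"<p>{body}</p>{images}"
        f'<footer>{escape(user["name"])} (@{escape(user["screen_name"])}), '
        f'<a href="{escape(permalink)}">{escape(tweet["created_at"])}</a></footer>'
        "</blockquote>"
    )
```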
Have Fun
Remember, the owner of Twitter no longer believes in IP law. So I guess you can go nuts and download all of Twitter's data and use it for any purpose?
@Edent Nice. This might also be usable for Hugo shortcodes. https://gohugo.io/content-management/shortcodes/
@Edent Nice and very helpful. I updated my IRC bot to use that instead of unreliable scraping of Nitter instances, works like a charm.
Thanks for sharing!