<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/rss-style.xsl" type="text/xsl"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	     xmlns:dc="http://purl.org/dc/elements/1.1/"
	   xmlns:atom="http://www.w3.org/2005/Atom"
	     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>Deficiencies in the Twitter Archive &#8211; Terence Eden’s Blog</title>
	<atom:link href="https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/feed/" rel="self" type="application/rss+xml" />
	<link>https://shkspr.mobi/blog</link>
	<description>Regular nonsense about tech and its effects 🙃</description>
	<lastBuildDate>Sat, 24 Aug 2024 20:01:05 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://shkspr.mobi/blog/wp-content/uploads/2023/07/cropped-avatar-32x32.jpeg</url>
	<title>Deficiencies in the Twitter Archive &#8211; Terence Eden’s Blog</title>
	<link>https://shkspr.mobi/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title><![CDATA[Deficiencies in the Twitter Archive]]></title>
		<link>https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/</link>
					<comments>https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sun, 24 Feb 2013 12:00:58 +0000</pubDate>
				<category><![CDATA[usability]]></category>
		<category><![CDATA[twitter]]></category>
		<guid isPermaLink="false">http://shkspr.mobi/blog/?p=7598</guid>

					<description><![CDATA[After Twitter&#039;s repeated broken promises, I was unsure if I&#039;d ever get access to my Twitter archive.  But, finally, I&#039;m able to extract my data from their systems.  There are a number of deficiencies.  Of course, it&#039;s impossible to please all the people all the time but, in typical Twitter fashion, they don&#039;t appear to have made the effort to satisfy anyone.  Let&#039;s take a quick run through of…]]></description>
										<content:encoded><![CDATA[<p>After Twitter's repeated broken promises, I was unsure if I'd ever get access to my Twitter archive.  But, finally, I'm able to extract my data from their systems.</p>

<p>There are a number of deficiencies.  Of course, it's impossible to please all the people all the time but, in typical Twitter fashion, they don't appear to have made the effort to satisfy anyone.</p>

<p>Let's take a quick run through of where the archive breaks down.</p>

<h2 id="usernames-change"><a href="https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/#usernames-change">Usernames Change</a></h2>

<p>When I first signed up to Twitter, I was known as "Vodaclone". What a witty and original name! I took advantage of Twitter's ability to change screen names a few months after joining.</p>

<blockquote class="social-embed" id="social-embed-881354746" lang="en" itemscope="" itemtype="https://schema.org/SocialMediaPosting"><header class="social-embed-header" itemprop="author" itemscope="" itemtype="https://schema.org/Person"><a href="https://twitter.com/edent" class="social-embed-user" itemprop="url"><img class="social-embed-avatar social-embed-avatar-circle" src="data:image/webp;base64,UklGRkgBAABXRUJQVlA4IDwBAACQCACdASowADAAPrVQn0ynJCKiJyto4BaJaQAIIsx4Au9dhDqVA1i1RoRTO7nbdyy03nM5FhvV62goUj37tuxqpfpPeTBZvrJ78w0qAAD+/hVyFHvYXIrMCjny0z7wqsB9/QE08xls/AQdXJFX0adG9lISsm6kV96J5FINBFXzHwfzMCr4N6r3z5/Aa/wfEoVGX3H976she3jyS8RqJv7Jw7bOxoTSPlu4gNbfXYZ9TnbdQ0MNnMObyaRQLIu556jIj03zfJrVgqRM8GPwRoWb1M9AfzFe6Mtg13uEIqrTHmiuBpH+bTVB5EEQ3uby0C//XOAPJOFv4QV8RZDPQd517Khyba8Jlr97j2kIBJD9K3mbOHSHiQDasj6Y3forATbIg4QZHxWnCeqqMkVYfUAivuL0L/68mMnagAAA" alt="" itemprop="image"><div class="social-embed-user-names"><p class="social-embed-user-names-name" itemprop="name">Terence Eden is on Mastodon</p>@edent</div></a><img class="social-embed-logo" alt="Twitter" src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%0Aaria-label%3D%22Twitter%22%20role%3D%22img%22%0AviewBox%3D%220%200%20512%20512%22%3E%3Cpath%0Ad%3D%22m0%200H512V512H0%22%0Afill%3D%22%23fff%22%2F%3E%3Cpath%20fill%3D%22%231d9bf0%22%20d%3D%22m458%20140q-23%2010-45%2012%2025-15%2034-43-24%2014-50%2019a79%2079%200%2000-135%2072q-101-7-163-83a80%2080%200%200024%20106q-17%200-36-10s-3%2062%2064%2079q-19%205-36%201s15%2053%2074%2055q-50%2040-117%2033a224%20224%200%2000346-200q23-16%2040-41%22%2F%3E%3C%2Fsvg%3E"></header><section class="social-embed-text" itemprop="articleBody">Having listened to the phreadz-headz, I am no longer Vodaclone. Behold, edent! 
For I am Terence Eden and I stand by what I say.</section><hr class="social-embed-hr"><footer class="social-embed-footer"><a href="https://twitter.com/edent/status/881354746"><span aria-label="0 likes" class="social-embed-meta">❤️ 0</span><span aria-label="0 replies" class="social-embed-meta">💬 0</span><span aria-label="0 reposts" class="social-embed-meta">🔁 0</span><time datetime="2008-08-08T10:48:29.000Z" itemprop="datePublished">10:48 - Fri 08 August 2008</time></a></footer></blockquote>

<p>Yet all the tweets are written as though they come from @edent.</p>

<h2 id="profile-picture-changes"><a href="https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/#profile-picture-changes">Profile Picture Changes</a></h2>

<p>Like many people, I update my avatar from time to time.  According to the Twitter archive, though, I've used the same one since the dawn of time.
<img src="https://shkspr.mobi/blog/wp-content/uploads/2013/02/Twitter-Archive.png" alt="Twitter Archive" width="940" height="748" class="aligncenter size-full wp-image-7600">
It's sad that there doesn't appear to be any record of the faces I pulled or the banners I added.</p>

<h2 id="missing-media"><a href="https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/#missing-media">Missing Media</a></h2>

<p>Indeed, it's odd that the avatar images aren't linked to the originals.  What's more annoying is that the images I've uploaded to Twitter's image service aren't included.  My archive of ~28,000 Tweets weighs in at 6MB.  While adding in dozens of images would balloon that, it couldn't be a huge strain on Twitter's resources.</p>

<p>I used grep to extract all the media_url parameters, then used wget to download them all.  My 370 images took up a mere 33MB.</p>
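
<p>The same grep-and-wget step can be approximated in Python.  This is a minimal sketch rather than the commands actually used: it assumes the archive's data files are <code>.js</code> files containing <code>"media_url" : "..."</code> pairs, and the folder names in the usage note are hypothetical.</p>

<pre><code class="language-python">import re
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: media entries look like "media_url" : "http://..." in the archive files.
MEDIA_URL = re.compile(r'"media_url"\s*:\s*"([^"]+)"')


def extract_media_urls(text):
    """Return the unique media_url values found in text, in first-seen order."""
    urls = []
    for raw in MEDIA_URL.findall(text):
        url = raw.replace("\\/", "/")  # JSON may escape slashes as \/
        if url not in urls:
            urls.append(url)
    return urls


def download_media(archive_dir, dest_dir):
    """Scan every .js file under archive_dir and save each image into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(archive_dir).rglob("*.js")):
        for url in extract_media_urls(path.read_text(encoding="utf-8")):
            target = dest / url.rsplit("/", 1)[-1]
            if not target.exists():  # skip files fetched on an earlier run
                urlretrieve(url, str(target))
</code></pre>

<p>Calling <code>download_media("archive", "media")</code> would then fetch every image referenced in the unpacked archive, skipping any already saved.</p>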

<h2 id="favourites"><a href="https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/#favourites">Favourites</a></h2>

<p>Remember all those cool Tweets you favourited?  They're not here.</p>

<h2 id="no-direct-messages"><a href="https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/#no-direct-messages">No Direct Messages</a></h2>

<p>Twitter has gone to great lengths to try and kill off its DM service. Once it realised that people were using the service in a way which wouldn't force them to interact with advertisers, it made it steadily harder to access private messages.</p>

<p>They are, of course, completely absent from the archive.  Hopefully all the meaningful DMs which were sent to you are backed up in email somewhere - but the ones you sent are nowhere to be seen.  Good luck future scholars of the world!</p>

<h2 id="lack-of-thread-context"><a href="https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/#lack-of-thread-context">Lack of Thread Context</a></h2>

<p>A typical tweet from 2010 reads:</p>

<blockquote class="social-embed" id="social-embed-19938122225" lang="en" itemscope="" itemtype="https://schema.org/SocialMediaPosting"><blockquote class="social-embed" id="social-embed-19937905358" lang="en" itemscope="" itemtype="https://schema.org/SocialMediaPosting"><header class="social-embed-header" itemprop="author" itemscope="" itemtype="https://schema.org/Person"><a href="https://twitter.com/amanda" class="social-embed-user" itemprop="url"><img class="social-embed-avatar social-embed-avatar-circle" src="data:image/webp;base64,UklGRmoCAABXRUJQVlA4IF4CAADQDACdASowADAAPrVMoUonJCMhrjgLMOAWiWQAnTNwwo/Ou9XyCghjYTW5aOmnagIOyL75u6d4zZjBQidDUxmhzqjRcwxiXBvJ+GS/09SU3Ej6Q0QInpp5kROw3MemuxeR7A0JZPJ/BmzDE10L4oAA9lzn9bqHMIwh+N0bGif59o7e1oWALgrpIDuZ0ZtybV57YBPjGkTYu5/L90WwGVfLoEltmzEx9xfL6je6DTDQB0b4wxp/mfjGk8wOv3v5ynvOGas9t7vHGzfOZb8j/zUCtoIXzikfJCQZVtd2yMqb7R2Vhyu3NwQFtIh484MvGXT4ZdtFulRQJu5pvoJFvrDKL18Atky/hIFOSPLF7/CubXqqpyFHD9pJ2QAaKibyhIuzwZ4qBlgQGpWqpAGg07xXEqxlD+SpSC4r/3s1Nf9semaLOJkDS1neFBvyvo0nhgMq+V1mwOTlx7O7qZ58PBxb9nv9pXkSPhdsS8WbAVqBHfUrM48aT4YQB2nQ3kHjsnkOT7fEsO1BQ2Z3bGCGJz6bxV5cmFjZbqhV3yLZOxqqgbe8TtPAGnFIXPHqNH0NKZKm8mQ91Ap40EkhmA6IybbYrY32Ni3goGLQ6o5z/3Xr9Z8iXSG/BKky0VSy0PGJHJ+LbaGw4sAxRSxUkcrmhDmWcejXLkMdqU7MHJtjoItaG+5a3RDxbSJODe44JUVpoGAL1rmN8fhIMzLwmmHI/OrCidBqMpRqTCUIFWCsBb/GOArTUEQeFNZJSeXlKozELFwPJPxK5GpfurWNziBCI50vMeFWsN9jJD1q3DweAAA=" alt="" itemprop="image"><div class="social-embed-user-names"><p class="social-embed-user-names-name" itemprop="name">Amanda Rose</p>@amanda</div></a><img class="social-embed-logo" alt="Twitter" 
src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%0Aaria-label%3D%22Twitter%22%20role%3D%22img%22%0AviewBox%3D%220%200%20512%20512%22%3E%3Cpath%0Ad%3D%22m0%200H512V512H0%22%0Afill%3D%22%23fff%22%2F%3E%3Cpath%20fill%3D%22%231d9bf0%22%20d%3D%22m458%20140q-23%2010-45%2012%2025-15%2034-43-24%2014-50%2019a79%2079%200%2000-135%2072q-101-7-163-83a80%2080%200%200024%20106q-17%200-36-10s-3%2062%2064%2079q-19%205-36%201s15%2053%2074%2055q-50%2040-117%2033a224%20224%200%2000346-200q23-16%2040-41%22%2F%3E%3C%2Fsvg%3E"></header><section class="social-embed-text" itemprop="articleBody">Ordering a batch of new books. Anything slightly off the radar you think I should read?</section><hr class="social-embed-hr"><footer class="social-embed-footer"><a href="https://twitter.com/amanda/status/19937905358"><span aria-label="0 likes" class="social-embed-meta">❤️ 0</span><span aria-label="0 replies" class="social-embed-meta">💬 0</span><span aria-label="0 reposts" class="social-embed-meta">🔁 0</span><time datetime="2010-07-30T20:40:32.000Z" itemprop="datePublished">20:40 - Fri 30 July 2010</time></a></footer></blockquote><header class="social-embed-header" itemprop="author" itemscope="" itemtype="https://schema.org/Person"><a href="https://twitter.com/edent" class="social-embed-user" itemprop="url"><img class="social-embed-avatar social-embed-avatar-circle" src="data:image/webp;base64,UklGRkgBAABXRUJQVlA4IDwBAACQCACdASowADAAPrVQn0ynJCKiJyto4BaJaQAIIsx4Au9dhDqVA1i1RoRTO7nbdyy03nM5FhvV62goUj37tuxqpfpPeTBZvrJ78w0qAAD+/hVyFHvYXIrMCjny0z7wqsB9/QE08xls/AQdXJFX0adG9lISsm6kV96J5FINBFXzHwfzMCr4N6r3z5/Aa/wfEoVGX3H976she3jyS8RqJv7Jw7bOxoTSPlu4gNbfXYZ9TnbdQ0MNnMObyaRQLIu556jIj03zfJrVgqRM8GPwRoWb1M9AfzFe6Mtg13uEIqrTHmiuBpH+bTVB5EEQ3uby0C//XOAPJOFv4QV8RZDPQd517Khyba8Jlr97j2kIBJD9K3mbOHSHiQDasj6Y3forATbIg4QZHxWnCeqqMkVYfUAivuL0L/68mMnagAAA" alt="" itemprop="image"><div class="social-embed-user-names"><p class="social-embed-user-names-name" itemprop="name">Terence Eden is 
on Mastodon</p>@edent</div></a><img class="social-embed-logo" alt="Twitter" src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%0Aaria-label%3D%22Twitter%22%20role%3D%22img%22%0AviewBox%3D%220%200%20512%20512%22%3E%3Cpath%0Ad%3D%22m0%200H512V512H0%22%0Afill%3D%22%23fff%22%2F%3E%3Cpath%20fill%3D%22%231d9bf0%22%20d%3D%22m458%20140q-23%2010-45%2012%2025-15%2034-43-24%2014-50%2019a79%2079%200%2000-135%2072q-101-7-163-83a80%2080%200%200024%20106q-17%200-36-10s-3%2062%2064%2079q-19%205-36%201s15%2053%2074%2055q-50%2040-117%2033a224%20224%200%2000346-200q23-16%2040-41%22%2F%3E%3C%2Fsvg%3E"></header><section class="social-embed-text" itemprop="articleBody"><small class="social-embed-reply"><a href="https://twitter.com/amanda/status/19937905358">Replying to @amanda</a></small><a href="https://twitter.com/amanda">@amanda</a> American Gods (or anything else by Neil Gaiman). For The Win or Makers by Cory Doctorow. Punchbag by Robert Llewellyn.</section><hr class="social-embed-hr"><footer class="social-embed-footer"><a href="https://twitter.com/edent/status/19938122225"><span aria-label="0 likes" class="social-embed-meta">❤️ 0</span><span aria-label="0 replies" class="social-embed-meta">💬 0</span><span aria-label="0 reposts" class="social-embed-meta">🔁 0</span><time datetime="2010-07-30T20:44:23.000Z" itemprop="datePublished">20:44 - Fri 30 July 2010</time></a></footer></blockquote>

<p>What was Amanda saying?  The only way to find out is to go on to the Twitter website and see the thread in question.  The archive doesn't include <em>any</em> of the replies people sent you.</p>

<p>Ideally, Twitter would have included the complete conversation threads in the archive. Twitter's threading tools are notably abysmal.  Being unable to search for tweets via the "in_reply_to" metadata makes understanding conversation threads particularly troublesome.</p>
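
<p>Working out which conversations are incomplete is at least possible offline.  The sketch below assumes each archived tweet has been loaded as a dictionary with an <code>id</code> key and, for replies, an <code>in_reply_to_status_id</code> key; those key names are an assumption about how the archive is parsed, and both function names are made up for illustration.</p>

<pre><code class="language-python">def missing_parents(tweets):
    """Return IDs of tweets that were replied to but are not in the archive."""
    own_ids = {t["id"] for t in tweets}
    wanted = []
    for t in tweets:
        parent = t.get("in_reply_to_status_id")
        if parent is not None and parent not in own_ids and parent not in wanted:
            wanted.append(parent)
    return wanted


def thread_for(tweet_id, by_id):
    """Follow in_reply_to links upwards and return the known chain, oldest first."""
    chain = []
    current = tweet_id
    # Stop at the first tweet we do not hold, or if a loop is detected.
    while current is not None and current in by_id and current not in chain:
        chain.append(current)
        current = by_id[current].get("in_reply_to_status_id")
    return list(reversed(chain))
</code></pre>

<p>The list from <code>missing_parents</code> is exactly the set of tweets that would need fetching from Twitter before a thread could be read in full.</p>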

<h2 id="no-updated-metadata"><a href="https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/#no-updated-metadata">No Updated Metadata</a></h2>

<p>I remember sitting down with Twitter's developer relations guy - Raffi Krikorian - at WarbleCamp (this was back when Twitter cared about developers).  We thrashed out some of the ideas around Entities and how they could be useful to the developer community.</p>

<blockquote class="social-embed" id="social-embed-13614627436" lang="en" itemscope="" itemtype="https://schema.org/SocialMediaPosting"><blockquote class="social-embed" id="social-embed-13609992755" lang="en" itemscope="" itemtype="https://schema.org/SocialMediaPosting"><header class="social-embed-header" itemprop="author" itemscope="" itemtype="https://schema.org/Person"><a href="https://twitter.com/raffi" class="social-embed-user" itemprop="url"><img class="social-embed-avatar social-embed-avatar-circle" src="data:image/webp;base64,UklGRkYBAABXRUJQVlA4IDoBAAAwCACdASowADAAPrVWp06nJKOiI4kA4BaJYwDLmyTAZs1Q68NHzAUlMwqPpHBKiPLJNNtay/kATEG2z9iJ33fnm93xg0XDdhEgAAD++2VXRiitOGINvvtTbN5xjb1gFuNK6hdOo8VyHLk9LFs2/R3Y94hyej69GyeWFdDPBzeXDtUUQBqNS79K7vFNVACxdnJHsx4aBmEAEOQfXaXOPpxSWMFZDDt3wdrU6f0ljFWHUvoE4UP641+Qkcc1QRJ0rcp8Hlkh1rY/cgm7c+F05N9I+T+b4ElPRBHPmT8oPtdQblezr0gky1GmBgTId9Dh/MmtEPE+sbI43yuldboE8GdcXCxnqpPtjmxV0DO5Q3x2GSR59h68engHWxEe7hKWIOdM7/5tpm/soO5jlCrtcvJGtWd6DDh6/F1AAA==" alt="" itemprop="image"><div class="social-embed-user-names"><p class="social-embed-user-names-name" itemprop="name">uɐᴉɹoʞᴉɹʞ ᴉɟɟɐɹ👨🏼‍🚀 → @raffi@mastodon.cloud</p>@raffi</div></a><img class="social-embed-logo" alt="Twitter" src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%0Aaria-label%3D%22Twitter%22%20role%3D%22img%22%0AviewBox%3D%220%200%20512%20512%22%3E%3Cpath%0Ad%3D%22m0%200H512V512H0%22%0Afill%3D%22%23fff%22%2F%3E%3Cpath%20fill%3D%22%231d9bf0%22%20d%3D%22m458%20140q-23%2010-45%2012%2025-15%2034-43-24%2014-50%2019a79%2079%200%2000-135%2072q-101-7-163-83a80%2080%200%200024%20106q-17%200-36-10s-3%2062%2064%2079q-19%205-36%201s15%2053%2074%2055q-50%2040-117%2033a224%20224%200%2000346-200q23-16%2040-41%22%2F%3E%3C%2Fsvg%3E"></header><section class="social-embed-text" itemprop="articleBody"><small class="social-embed-reply"><a href="https://twitter.com/edent">Replying to @edent</a></small><a href="https://twitter.com/edent">@edent</a> let's talk about 
"entities" - something we're about to launch on @twitterapi to help with hashtags and the like <a href="https://twitter.com/hashtag/warblecamp">#warblecamp</a></section><hr class="social-embed-hr"><footer class="social-embed-footer"><a href="https://twitter.com/raffi/status/13609992755"><span aria-label="0 likes" class="social-embed-meta">❤️ 0</span><span aria-label="0 replies" class="social-embed-meta">💬 0</span><span aria-label="0 reposts" class="social-embed-meta">🔁 0</span><time datetime="2010-05-08T14:10:15.000Z" itemprop="datePublished">14:10 - Sat 08 May 2010</time></a></footer></blockquote><header class="social-embed-header" itemprop="author" itemscope="" itemtype="https://schema.org/Person"><a href="https://twitter.com/edent" class="social-embed-user" itemprop="url"><img class="social-embed-avatar social-embed-avatar-circle" src="data:image/webp;base64,UklGRkgBAABXRUJQVlA4IDwBAACQCACdASowADAAPrVQn0ynJCKiJyto4BaJaQAIIsx4Au9dhDqVA1i1RoRTO7nbdyy03nM5FhvV62goUj37tuxqpfpPeTBZvrJ78w0qAAD+/hVyFHvYXIrMCjny0z7wqsB9/QE08xls/AQdXJFX0adG9lISsm6kV96J5FINBFXzHwfzMCr4N6r3z5/Aa/wfEoVGX3H976she3jyS8RqJv7Jw7bOxoTSPlu4gNbfXYZ9TnbdQ0MNnMObyaRQLIu556jIj03zfJrVgqRM8GPwRoWb1M9AfzFe6Mtg13uEIqrTHmiuBpH+bTVB5EEQ3uby0C//XOAPJOFv4QV8RZDPQd517Khyba8Jlr97j2kIBJD9K3mbOHSHiQDasj6Y3forATbIg4QZHxWnCeqqMkVYfUAivuL0L/68mMnagAAA" alt="" itemprop="image"><div class="social-embed-user-names"><p class="social-embed-user-names-name" itemprop="name">Terence Eden is on Mastodon</p>@edent</div></a><img class="social-embed-logo" alt="Twitter" 
src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%0Aaria-label%3D%22Twitter%22%20role%3D%22img%22%0AviewBox%3D%220%200%20512%20512%22%3E%3Cpath%0Ad%3D%22m0%200H512V512H0%22%0Afill%3D%22%23fff%22%2F%3E%3Cpath%20fill%3D%22%231d9bf0%22%20d%3D%22m458%20140q-23%2010-45%2012%2025-15%2034-43-24%2014-50%2019a79%2079%200%2000-135%2072q-101-7-163-83a80%2080%200%200024%20106q-17%200-36-10s-3%2062%2064%2079q-19%205-36%201s15%2053%2074%2055q-50%2040-117%2033a224%20224%200%2000346-200q23-16%2040-41%22%2F%3E%3C%2Fsvg%3E"></header><section class="social-embed-text" itemprop="articleBody"><small class="social-embed-reply"><a href="https://twitter.com/raffi/status/13609992755">Replying to @raffi</a></small>Really productive chat with <a href="https://twitter.com/raffi">@raffi</a> - think we came up with a solution to the hashtag problem <a href="https://twitter.com/hashtag/WarbleCamp">#WarbleCamp</a></section><hr class="social-embed-hr"><footer class="social-embed-footer"><a href="https://twitter.com/edent/status/13614627436"><span aria-label="0 likes" class="social-embed-meta">❤️ 0</span><span aria-label="0 replies" class="social-embed-meta">💬 0</span><span aria-label="0 reposts" class="social-embed-meta">🔁 0</span><time datetime="2010-05-08T15:40:02.000Z" itemprop="datePublished">15:40 - Sat 08 May 2010</time></a></footer></blockquote>

<p>One of the things which never happened was "backporting" entities.  Tweets which were made before entities were switched on are stuck with no metadata.  So if you're trying to examine your archive for hashtags, links, etc - you'll have a hard time.</p>

<p>A perfect example is <a href="https://twitter.com/twitterapi/status/13630816113">this early tweet about Twitter annotations</a>.  Even on the Twitter website, the URL isn't automatically linked.</p>

<blockquote class="social-embed" id="social-embed-13630816113" lang="en" itemscope="" itemtype="https://schema.org/SocialMediaPosting"><header class="social-embed-header" itemprop="author" itemscope="" itemtype="https://schema.org/Person"><a href="https://twitter.com/API" class="social-embed-user" itemprop="url"><img class="social-embed-avatar social-embed-avatar-square" src="data:image/webp;base64,UklGRngBAABXRUJQVlA4IGwBAADwBwCdASowADAAPrVapk8nJSMiI4gA4BaJaQAM7Oq32bb6I/3nBpPxBwFaQ2GrE2pp8G7k6oPVdaGQZKxdkj8D2canPEleIAAA/v4S8SOGkSdsqGZuif63SPPmqMfMV5P/Ex5aFzPYdVn8VaHTofBkvW8tA72VW/EVeknpifVe05R3y+7fe7enehy18P8X2HJcyyPnXFcm29d6IGd0bFTA6kY2ug94ZSRe3390dy54A/El6QLUYG+7H3cR+1XiypV+29QEPG85APs2kHIvFBY7dkH9gvw3lPYM3xrTET3quxDN7+uuu//nKInyPy9sNYbtbzaDQXGYCXeJXrzPqKfe/0RvP59/UfUqN9UkZk7a2/FbmDXul4cXYi+Y+QBvwAE6D9ROa645KpPP7fl/zuWe+6sLka32sR3Au218RARvi5oue0fxw6SWjxkmm9ZGWBhK2vc6dJcEBZQPNTTAcr2e2sefXDkNpDgRWAAA" alt="" itemprop="image"><div class="social-embed-user-names"><p class="social-embed-user-names-name" itemprop="name">API</p>@API<br><img src="data:image/webp;base64,UklGRhICAABXRUJQVlA4IAYCAACQCwCdASpJAEkAPrVYpE4nJSMiI/qqiOAWiWkAEVXizVgMsDSBTCf2I83dEz6vv2UI3Pt/KBCW0jXcqF0YZ1LJq2JjUU2wKLTXsaGRF4daFgOmkMK9+5MWEsN5Y8KjAALFniQWKAD++MzMa/PUPXg7fcuBr14PUNNn42XISu1JeNRytc/gD+QLHrIVH5w/nU9O68tnzwVwANia4WJvznIJpgOMQGQXZMIHogDRqEZkjxX6gWelqYMgBuZHdRy2hKOB317QUgEMXL3lYC/dH2t8cP7G2ODh1M7i1giAl1zniDkcsFwGaH/lRdxkXMURbFlFm4ofjgYl33Nm+ANo/kbt4kvxelsvcmibhDNoZ9BTzdo/Non+ZCgI//e1ftV1InWiQUEpktWkEfms93cLXD47X1GHv2tymEr45zIm1zHO5MhtmP57a/49gxiyw9s2raCtAigt9M/ntSOFGqY2CiUHANLGCW5Pgt7EvAV5n8Z9i/cK1kCF9cfaJtv5n8uG+GCjFBFNRxa7gLs4Tx4aLY/vIt86PUUY0GH/3MQHJrXWTHp4dysICH+BdNraV8JJRJIQDEz3LFgBbK3gB7BVZ71FuyCNCojrlFo3gdkzmei+TVZsJ8CWxj1KgpJhbBxM9zZ3MhK+ZtczkvsfV5/XMGcxKSfCK81efCGXqlCCMkfoMl7yQAAAAA==" alt="" class="social-embed-badge"> X</div></a><img class="social-embed-logo" alt="Twitter" 
src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%0Aaria-label%3D%22Twitter%22%20role%3D%22img%22%0AviewBox%3D%220%200%20512%20512%22%3E%3Cpath%0Ad%3D%22m0%200H512V512H0%22%0Afill%3D%22%23fff%22%2F%3E%3Cpath%20fill%3D%22%231d9bf0%22%20d%3D%22m458%20140q-23%2010-45%2012%2025-15%2034-43-24%2014-50%2019a79%2079%200%2000-135%2072q-101-7-163-83a80%2080%200%200024%20106q-17%200-36-10s-3%2062%2064%2079q-19%205-36%201s15%2053%2074%2055q-50%2040-117%2033a224%20224%200%2000346-200q23-16%2040-41%22%2F%3E%3C%2Fsvg%3E"></header><section class="social-embed-text" itemprop="articleBody">thoughts? RT <a href="https://twitter.com/raffi">@raffi</a>: Talking, in a very preliminary way, about @twitterapi annotations http://bit.ly/twannotations <a href="https://twitter.com/hashtag/warblecamp">#warblecamp</a></section><hr class="social-embed-hr"><footer class="social-embed-footer"><a href="https://twitter.com/API/status/13630816113"><span aria-label="15 likes" class="social-embed-meta">❤️ 15</span><span aria-label="0 replies" class="social-embed-meta">💬 0</span><span aria-label="0 reposts" class="social-embed-meta">🔁 0</span><time datetime="2010-05-08T21:58:08.000Z" itemprop="datePublished">21:58 - Sat 08 May 2010</time></a></footer></blockquote>
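
<p>Backporting can be roughly approximated on your own copy of the data.  The following is a sketch only: the regular expressions are far simpler than Twitter's real tokenising rules, the output merely imitates the shape of the entities metadata, and <code>backport_entities</code> is a made-up name.</p>

<pre><code class="language-python">import re

# Simplified patterns; Twitter's own rules for what counts as an entity are stricter.
HASHTAG = re.compile(r"(?:^|\s)#(\w+)")
MENTION = re.compile(r"(?:^|[^\w@])@(\w{1,15})")
URL = re.compile(r"https?://\S+")  # may swallow trailing punctuation


def backport_entities(text):
    """Build an entities-style dictionary for a tweet that has none."""
    return {
        "hashtags": [
            {"text": m.group(1), "indices": [m.start(1) - 1, m.end(1)]}
            for m in HASHTAG.finditer(text)
        ],
        "user_mentions": [
            {"screen_name": m.group(1), "indices": [m.start(1) - 1, m.end(1)]}
            for m in MENTION.finditer(text)
        ],
        "urls": [
            {"url": m.group(0), "indices": [m.start(), m.end()]}
            for m in URL.finditer(text)
        ],
    }
</code></pre>

<p>Run against the text of the tweet above, this picks out the <code>warblecamp</code> hashtag, the mentions of <code>raffi</code> and <code>twitterapi</code>, and the bit.ly link.</p>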

<h2 id="comparisons-to-other-services"><a href="https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/#comparisons-to-other-services">Comparisons to Other Services</a></h2>

<p>Facebook, for all its failings, is pretty good at giving you an archive of all your content.  Given the complexities of its databases, it's not surprising that it's a bit mangled - but it's there.  You get all the photos and videos you uploaded as well.</p>

<p>Yes, it's great that Twitter has finally kept its word on data extraction - but this really feels like an underwhelming effort.</p>

<h2 id="tools"><a href="https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/#tools">Tools</a></h2>

<p>So, it looks like I'll have to write a tool to download all the missing tweets, conversations, photos, favourites, and DMs.  I'll also need to write a script which reformats the metadata on old posts to ensure they are compatible with new ones.</p>

<p>Then dump everything into a database, or series of flat files.</p>
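
<p>For the database option, SQLite is one low-effort choice.  A minimal sketch, assuming tweets are already loaded as dictionaries; the table layout and the function name <code>store_tweets</code> are illustrative, not a finished design.</p>

<pre><code class="language-python">import json
import sqlite3


def store_tweets(db_path, tweets):
    """Insert tweets into a small SQLite table, keeping the raw JSON beside key columns."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, created_at TEXT, "
        "in_reply_to_status_id INTEGER, raw TEXT)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO tweets VALUES (?, ?, ?, ?)",
        [
            (t["id"], t.get("created_at"), t.get("in_reply_to_status_id"), json.dumps(t))
            for t in tweets
        ],
    )
    con.commit()
    return con
</code></pre>

<p>Keeping the raw JSON in its own column means nothing is lost if the other columns turn out to be the wrong ones to index.</p>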

<p>Of my 28k Tweets - around 12k are replies to other tweets.  How very sociable of me!  Twitter rate limit their API to 150 queries an hour, so fetching 12k tweets needs at least 80 hours of querying. Call it around 4 days to get all the tweets to which I've replied. Of course, then it becomes a recursive issue (I have to see which of those are replies, and which of those replies are replies etc). So, probably a week.</p>
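
<p>The first-pass estimate is simple division, shown here as a small sketch.  It assumes one API query per tweet fetched and ignores retries, failures and the recursive lookups.</p>

<pre><code class="language-python">import math


def hours_needed(lookups, per_hour=150):
    """Whole hours required when each hour allows per_hour API queries."""
    return math.ceil(lookups / per_hour)


hours = hours_needed(12_000)
print(hours, "hours, or about", round(hours / 24, 1), "days of non-stop querying")
</code></pre>

<p>That works out at 80 hours, a little under three and a half days, before any of the replies-to-replies are chased.</p>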

<p>Fun times ahead.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2013/02/deficiencies-in-the-twitter-archive/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
