Exporting TwitPic Images - Python
As part of my quest to ensure I have a reasonable backup of all my social media data, I've been investigating how easy it is to export photos from TwitPic.
I've been using TwitPic since 2008 and have uploaded 1,200 images there.
There's no official export function for TwitPic. The services which used to exist relied on their RSS feeds - which have since been killed off.
This little Python script uses some undocumented APIs to grab all your images, save them in a directory, and make sure they have the correct timestamp.
Firstly, the documented bit. It's possible to make an unauthenticated API call to get the most recent 20 images any user has uploaded.
https://api.twitpic.com/2/users/show.json?username=edent&page=1
Obviously, use your own username in place of mine!
Increment the page number to fetch each subsequent page. The "photo_count" property tells you how many photos there are in total, so divide it by 20 and round up to the nearest integer to work out how many pages you need to request.
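As a rough sketch of that rounding-up step, assuming the API really does return 20 images per page as described above (the function name `pages_needed` is mine, not part of any API):

```python
PAGE_SIZE = 20  # images per page, as described above (an assumption about the API)

def pages_needed(photo_count, page_size=PAGE_SIZE):
    """Return how many pages must be requested to cover photo_count images."""
    # Integer ceiling division, so no floating point rounding is involved.
    return (photo_count + page_size - 1) // page_size

print(pages_needed(1200))
print(pages_needed(1201))
```

With exactly 1,200 photos that works out to 60 pages; one extra photo tips it over to 61.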
Next, we have the undocumented API call. It's possible to request the thumbnail of an image with a simple API call:
https://twitpic.com/show/thumb/d834ni
To grab the full image, simply change "thumb" to "full":
http://twitpic.com/show/full/d834ni
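If you want to build those URLs in code, something like this should do. It is only a sketch: `twitpic_image_url` is a hypothetical helper name, and it assumes https works for both sizes, as the script further down does.

```python
def twitpic_image_url(short_id, size="full"):
    """Build the image URL for a short ID; size is "thumb" or "full"."""
    if size not in ("thumb", "full"):
        raise ValueError("size must be 'thumb' or 'full'")
    # URL pattern taken from the two example URLs above.
    return "https://twitpic.com/show/" + size + "/" + short_id

print(twitpic_image_url("d834ni", "thumb"))
print(twitpic_image_url("d834ni"))
```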
Deficiencies
Sadly, the TwitPic images don't retain their original EXIF metadata - so there's no way of seeing where the photo was taken, what camera was used, etc.
In fact, original images are not preserved at all. None seem to be larger than 800*600 - even though the API "helpfully" tells you what their original full resolution was.
My code is a quick sketch in Python. It could be made to run in parallel as it takes quite a while to download all the images - even though the total file size was only 100MB.
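For the curious, here is one way the downloads could be run in parallel. This is a sketch rather than what my script does: it uses Python 3's standard library (the script below is Python 2), and `download_all`, `jobs` and `workers` are names I've made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve

def download_all(jobs, fetch=urlretrieve, workers=8):
    """Download (url, file_name) pairs concurrently; return the file names."""
    def run(job):
        url, file_name = job
        fetch(url, file_name)
        return file_name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order; an exception raised in a
        # worker is re-raised here when its result is read.
        return list(pool.map(run, jobs))

# Example call (not executed here, as it needs network access):
# download_all([("https://twitpic.com/show/full/d834ni", "d834ni.jpg")])
```

Threads suit this job because the time is spent waiting on the network rather than on the CPU. Passing `fetch` as a parameter also makes it easy to swap in a fake downloader for testing.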
It has only been tested on Linux, so some things may need to change if you're on another OS. Specifically, this code strips out "/" from the file name - other computers may need other special characters replaced.
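A slightly more portable version of that replacement might look like the following. Treat it as a sketch: `safe_title` is a hypothetical helper, and it only deals with path separators. Windows forbids several other characters in file names, which this does not handle.

```python
import os

def safe_title(title, replacement=u"\u2044"):
    """Replace the current OS's path separators with a look-alike character."""
    # os.sep is "/" on Linux and macOS and "\\" on Windows.
    # os.altsep is "/" on Windows and None elsewhere.
    for sep in (os.sep, os.altsep):
        if sep:
            title = title.replace(sep, replacement)
    return title

print(safe_title("before" + os.sep + "after"))
```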
My code also assumes you have 100 pages or fewer of pictures to download.
On the plus side, it does make sure that the file names won't be too long for your file system and that any HTML entities are correctly decoded.
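Here is roughly how those two steps fit together, written for Python 3. It is a variation on what my script does rather than a copy: it decodes entities before truncating (so an entity can't be cut in half), and it assumes the file system counts the name limit in UTF-8 bytes, as common Linux file systems do. `build_file_name` is a made-up name.

```python
import html
import os

def build_file_name(title, extension, max_name_length=None):
    """Decode HTML entities, then trim so title + "." + extension fits."""
    if max_name_length is None:
        # os.statvfs is only available on Unix-like systems.
        max_name_length = os.statvfs(".").f_namemax
    decoded = html.unescape(title)
    suffix = "." + extension
    room = max_name_length - len(suffix.encode("utf-8"))
    trimmed = decoded.encode("utf-8")[:room]
    # errors="ignore" drops a multi-byte character that was cut in half.
    return trimmed.decode("utf-8", errors="ignore") + suffix

print(build_file_name("Fish &amp; chips", "jpg", 255))
```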
Enjoy!
Code
import urllib
import urllib2
import json
import collections
import HTMLParser
import time
import os
# Create a parser for HTML entities
h = HTMLParser.HTMLParser()
# Maximum filename length
# Last 4 characters will be .jpg or .png etc
max_length = os.statvfs('.').f_namemax - 4
# Target Page
twitpic_api = "https://api.twitpic.com/2/users/show.json?username=YOURNAMEHERE&page="
# Get the data about the target page
for page in range(0, 100):
    print page
    twitpic_data = json.load(urllib2.urlopen(twitpic_api + str(page)))
    # Get the info about each image on the page
    twitpic_images = twitpic_data["images"]
    for item in twitpic_images:
        twitpic_id = item['short_id']
        twitpic_title = item["message"]
        # Replace / (which can't be used in a file name) with a similar looking character
        twitpic_title = twitpic_title.replace('/', u'\u2044')
        twitpic_title = twitpic_title[:max_length]
        twitpic_file_type = item["type"]
        # Convert the upload timestamp to seconds since the epoch
        twitpic_time = time.mktime(time.strptime(item["timestamp"], "%Y-%m-%d %H:%M:%S"))
        twitpic_file_url = "https://twitpic.com/show/full/" + twitpic_id
        twitpic_file_name = h.unescape(twitpic_title) + "." + twitpic_file_type
        # Save the file
        urllib.urlretrieve(twitpic_file_url, twitpic_file_name)
        # Set the file's access and modification times to the upload time
        os.utime(twitpic_file_name, (twitpic_time, twitpic_time))
Tom Parker says:
BTW, to make the separator replacement code portable, use os.sep (http://docs.python.org/2/library/os.html#os.sep) which will be the right character on each OS.
Terence Eden says:
Ooh! Very useful, thanks 🙂
hugs says:
I made a modified version here: https://gist.github.com/hugs/fa7892da03ce660c212e
My changes:
* Use https instead of http
* Use the short_id as the filename
* Save the raw page data to separate "page-*.json" files
Big question: What's the license for this? Is it okay if I use the MIT license for my fork?
Cheers and thanks for this!
Terence Eden says:
Looks great 🙂 Yes, please consider mine as being under an MIT license.