As part of my quest to ensure I have a reasonable backup of all my social media data, I’ve been investigating how easy it is to export photos from TwitPic.
I’ve been using TwitPic since 2008 and have uploaded 1,200 images there.
There’s no official export function for TwitPic. The services which used to exist relied on their RSS feeds – which have since been killed off.
This little Python script uses some undocumented APIs to grab all your images, save them in a directory, and make sure they have the correct timestamp.
Firstly, the documented bit. It’s possible to make an unauthenticated API call to get the most recent 20 images any user has uploaded.
Obviously, use your own username in place of mine!
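As a rough illustration, here is one way to put that request URL together. The endpoint is the one used in the full script at the end of this post; the `user_page_url` helper is a made-up name, `YOURNAMEHERE` is a placeholder, and the snippet is written for Python 3 (unlike the Python 2 script below).

```python
from urllib.parse import urlencode

# Endpoint taken from the script at the end of this post
API_BASE = "https://api.twitpic.com/2/users/show.json"

def user_page_url(username, page):
    """Build the URL for one page of a user's photo listing."""
    return API_BASE + "?" + urlencode({"username": username, "page": page})

# YOURNAMEHERE is a placeholder: swap in your own username
print(user_page_url("YOURNAMEHERE", 1))
```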
You can increment the page number to get each subsequent page. Using the “photo_count” property, you should be able to work out how many pages you will need to request in order to grab all the photos: divide it by 20, the number of images per page, and remember to round up to the nearest integer.
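The page arithmetic can be sketched like this. It assumes 20 images per page, as described above, and wraps the count in `int()` in case the API hands it back as a string; `pages_needed` is just an illustrative name.

```python
import math

PHOTOS_PER_PAGE = 20  # each API page holds up to 20 images

def pages_needed(photo_count, per_page=PHOTOS_PER_PAGE):
    """Number of pages to request, rounded up to a whole page."""
    return math.ceil(int(photo_count) / per_page)

print(pages_needed(1200))  # 60
print(pages_needed(1201))  # 61
```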
Next, we have the undocumented API call. It’s possible to request the thumbnail of an image by appending its ID to https://twitpic.com/show/thumb/
To grab the full image, simply change “thumb” to “full”, giving https://twitpic.com/show/full/ followed by the ID.
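Here is a small sketch of building both URLs in Python 3. The "full" pattern comes from the script at the end of this post, and the "thumb" one follows from swapping that single word. The helper name `image_url` and the ID `abc123` are invented for the example.

```python
def image_url(short_id, size="full"):
    """Return the URL for an image's thumbnail or full-size version."""
    if size not in ("thumb", "full"):
        raise ValueError("size must be 'thumb' or 'full'")
    return "https://twitpic.com/show/" + size + "/" + short_id

# "abc123" is a made-up ID, purely for illustration
print(image_url("abc123", "thumb"))
print(image_url("abc123"))
```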
Sadly, the TwitPic images don’t retain their original EXIF metadata – so there’s no way of seeing where the photo was taken, what camera was used, etc.
In fact, original images are not preserved at all. None seem to be larger than 800*600 – even though the API “helpfully” tells you what their original full resolution was.
My code is a quick sketch in Python. It could be made to run in parallel as it takes quite a while to download all the images – even though the total file size was only 100MB.
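If you want to try the parallel route, a thread pool is probably the simplest option, since the work is almost entirely waiting on the network. This is a Python 3 sketch rather than part of my script: `download_all` is a hypothetical helper, the worker count of 8 is an arbitrary guess, and the `fetch` argument exists only so the download function can be swapped out.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve

def download_all(jobs, fetch=urlretrieve, workers=8):
    """Download (url, file_name) pairs on a pool of threads.

    Results come back in the same order as the jobs. Any exception
    raised by a download is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: fetch(*job), jobs))
```

You would build the list of `(url, file_name)` pairs first, call `download_all(jobs)`, and set the timestamps afterwards. It would be polite to keep the number of workers small.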
It has only been tested on Linux, so some things may need to change if you’re on another OS. Specifically, this code strips out “/” from the file name – other computers may need other special characters replaced.
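For example, Windows rejects a longer list of characters in file names: `< > : " / \ | ? *` plus the control characters. A minimal Python 3 sketch of a stricter clean-up might look like the following; `safe_title` is an invented name, the underscore is an arbitrary replacement, and it doesn't attempt to handle reserved names such as `CON` or trailing dots and spaces.

```python
import re

# Characters Windows does not allow in file names, plus control characters
WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def safe_title(title, replacement="_"):
    """Replace characters that are unsafe in file names on Windows."""
    return WINDOWS_FORBIDDEN.sub(replacement, title)

print(safe_title("a/b:c?"))  # a_b_c_
```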
My code also assumes you have 100 pages or fewer of pictures to download.
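One way around that limit is to read “photo_count” from the first response and loop over exactly as many pages as it implies. The Python 3 sketch below leans on several untested assumptions: that “photo_count” sits at the top level of the response next to “images”, that pages are numbered from 1, and that each page holds 20 images. `fetch_page` is a hypothetical function you would supply yourself, taking a page number and returning the decoded JSON for that page.

```python
import math

def all_images(fetch_page, per_page=20):
    """Collect the image records from every page of a user's listing."""
    first = fetch_page(1)  # assumption: the first page is page 1
    total_pages = math.ceil(int(first["photo_count"]) / per_page)
    images = list(first["images"])
    for page in range(2, total_pages + 1):
        images.extend(fetch_page(page)["images"])
    return images
```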
On the plus side, it does make sure that the file names won’t be too long for your file system and that any HTML entities are correctly decoded.
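Those two steps can be sketched in Python 3 roughly as follows. This is a variation on what my script does rather than a copy of it: it decodes the entities first, and it measures the limit in UTF-8 bytes because that is what most Linux file systems count. `file_name_for` is an invented name, and 255 is only a typical limit; use `os.statvfs('.').f_namemax` to get the real one.

```python
import html

def file_name_for(title, file_type, max_bytes=255):
    """Decode HTML entities, replace "/", and trim to fit the file system."""
    name = html.unescape(title).replace("/", "\u2044")
    suffix = "." + file_type
    budget = max_bytes - len(suffix.encode("utf-8"))
    trimmed = name.encode("utf-8")[:budget]
    # errors="ignore" drops a multi-byte character that was cut in half
    return trimmed.decode("utf-8", errors="ignore") + suffix

print(file_name_for("Fish &amp; chips", "jpg"))  # Fish & chips.jpg
```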
import urllib
import urllib2
import json
import collections
import HTMLParser
import time
import os

# Create a parser for HTML entities
h = HTMLParser.HTMLParser()

# Maximum filename length
# Last 4 characters will be .jpg or .png etc
max_length = os.statvfs('.').f_namemax - 4

# Target Page
twitpic_api = "https://api.twitpic.com/2/users/show.json?username=YOURNAMEHERE&page="

# Get the data about the target page
for page in range(0, 100):
    print page
    twitpic_data = json.load(urllib2.urlopen(twitpic_api + str(page)))

    # Get the info about each image on the page
    twitpic_images = twitpic_data["images"]

    for item in twitpic_images:
        twitpic_id = item['short_id']
        twitpic_title = item["message"]

        # Replace / (which can't be used in a file name) with a similar looking character
        twitpic_title = twitpic_title.replace('/', u'\u2044')
        twitpic_title = twitpic_title[:max_length]

        twitpic_file_type = item["type"]
        twitpic_time = time.mktime(time.strptime(item["timestamp"], "%Y-%m-%d %H:%M:%S"))

        twitpic_file_url = "https://twitpic.com/show/full/" + twitpic_id
        twitpic_file_name = h.unescape(twitpic_title) + "." + twitpic_file_type

        # Save the file
        urllib.urlretrieve(twitpic_file_url, twitpic_file_name)

        # Set the file time
        os.utime(twitpic_file_name, (twitpic_time, twitpic_time))