Exporting TwitPic Images - Python


As part of my quest to ensure I have a reasonable backup of all my social media data, I've been investigating how easy it is to export photos from TwitPic.

I've been using TwitPic since 2008 and have uploaded 1,200 images there.

There's no official export function for TwitPic. The services which used to exist relied on their RSS feeds - which have since been killed off.

This little Python script uses some undocumented APIs to grab all your images, save them in a directory, and make sure they have the correct timestamp.

Firstly, the documented bit. It's possible to make an unauthenticated API call to get the most recent 20 images any user has uploaded.

https://api.twitpic.com/2/users/show.json?username=edent&page=1

Obviously, use your own username in place of mine!

You can increment the page number to get each subsequent page. Using the "photo_count" property, you should be able to work out how many pages you will need to request in order to grab all the photos: divide it by 20 (the number of images per page) and round up to the nearest integer.
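
Here's a rough sketch of that page arithmetic. It's written for Python 3 (the script further down is Python 2), and the names page_count and page_urls are made up for illustration. The 20-per-page figure and the URL are the ones from the call above.

```python
import math
from urllib.parse import urlencode

API_URL = "https://api.twitpic.com/2/users/show.json"
PER_PAGE = 20  # the call above returns 20 images per page


def page_count(photo_count, per_page=PER_PAGE):
    """Number of pages needed, rounded up to the nearest integer."""
    return math.ceil(photo_count / per_page)


def page_urls(username, photo_count):
    """Build one API URL per page, starting at page 1."""
    return [
        API_URL + "?" + urlencode({"username": username, "page": page})
        for page in range(1, page_count(photo_count) + 1)
    ]


print(page_count(1200))  # 60
print(page_count(1201))  # 61
```

So an account with 1,200 photos needs 60 requests, and a single extra photo pushes that to 61.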

Next, we have the undocumented API call. It's possible to request the thumbnail of an image with a simple API call:

https://twitpic.com/show/thumb/d834ni

To grab the full image, simply change "thumb" to "full":

https://twitpic.com/show/full/d834ni
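
As a tiny illustration, the two URLs differ only in one path segment, so a helper can build either from an image's short ID. This is a Python 3 sketch; image_url is a hypothetical name, and "thumb" and "full" are the only two sizes I've tried.

```python
def image_url(short_id, size="full"):
    """Return the TwitPic URL for an image at the given size."""
    # Only these two sizes are known to work; anything else is rejected.
    if size not in ("thumb", "full"):
        raise ValueError("size must be 'thumb' or 'full'")
    return "https://twitpic.com/show/" + size + "/" + short_id


print(image_url("d834ni", "thumb"))  # https://twitpic.com/show/thumb/d834ni
print(image_url("d834ni"))           # https://twitpic.com/show/full/d834ni
```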

Deficiencies

Sadly, the TwitPic images don't retain their original EXIF metadata - so there's no way of seeing where the photo was taken, what camera was used, etc.

In fact, original images are not preserved at all. None seem to be larger than 800*600 - even though the API "helpfully" tells you what their original full resolution was.

My code is a quick sketch in Python. It could be made to run in parallel as it takes quite a while to download all the images - even though the total file size was only 100MB.
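
If you did want to run the downloads in parallel, one possible approach is a thread pool. This is an untested-against-TwitPic sketch in Python 3, not what my script does: download_all, the fetch parameter and the worker count of 8 are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve


def download_all(jobs, fetch=urlretrieve, workers=8):
    """Download (url, file_name) pairs using a pool of threads.

    Returns the file names in the same order as the jobs.
    """
    def run(job):
        url, file_name = job
        fetch(url, file_name)
        return file_name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in input order and re-raises any download error
        return list(pool.map(run, jobs))
```

You'd call it with something like download_all([(url, name), ...]). Setting the file time with os.utime could happen inside run(), straight after the fetch.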

It has only been tested on Linux, so some things may need to change if you're on another OS. Specifically, this code strips out "/" from the file name - other computers may need other special characters replaced.
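
For other operating systems, something along these lines might be a starting point. It's a Python 3 sketch which replaces the characters Windows refuses in file names; safe_title and the underscore placeholder are my own assumptions, so adjust to taste.

```python
import re

# Characters Windows refuses in file names, plus ASCII control characters.
# Linux itself only objects to "/" and the NUL byte, both covered here.
UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_title(title, replacement="_"):
    """Swap characters that are unsafe in file names for a placeholder."""
    return UNSAFE.sub(replacement, title)


print(safe_title('Lunch 1/2: "pie"?'))  # Lunch 1_2_ _pie__
```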

My code also assumes you have 100 pages or fewer of pictures to download.

On the plus side, it does make sure that the file names won't be too long for your file system and that any HTML entities are correctly decoded.
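
For the curious, here is roughly how those two steps could look in Python 3. It is a sketch rather than a copy of my script: build_file_name is a hypothetical name, the 255 default assumes a typical Linux limit, and it assumes the limit is counted in UTF-8 bytes. Unlike the script below, it decodes the entities first and then trims.

```python
import html


def build_file_name(title, file_type, max_bytes=255):
    """Decode HTML entities, then trim so the whole name fits in max_bytes."""
    suffix = "." + file_type
    budget = max_bytes - len(suffix.encode("utf-8"))
    encoded = html.unescape(title).encode("utf-8")[:budget]
    # errors="ignore" drops a multi-byte character cut in half by the slice
    return encoded.decode("utf-8", errors="ignore") + suffix


print(build_file_name("Fish &amp; chips", "jpg"))  # Fish & chips.jpg
```

On Linux, os.statvfs('.').f_namemax gives you the real limit to pass as max_bytes.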

Enjoy!

Code

import urllib
import urllib2
import json
import collections
import HTMLParser
import time
import os

#  Create a parser for HTML entities
h = HTMLParser.HTMLParser()

#  Maximum filename length
#  Last 4 characters will be .jpg or .png etc
max_length = os.statvfs('.').f_namemax - 4

#  Target Page
twitpic_api = "https://api.twitpic.com/2/users/show.json?username=YOURNAMEHERE&page="

#  Get the data about the target page
for page in range(0, 100):
    print page
    twitpic_data = json.load(urllib2.urlopen(twitpic_api + str(page)))

    #   Get the info about each image on the page
    twitpic_images = twitpic_data["images"]

    for item in twitpic_images:
        twitpic_id = item['short_id']
        twitpic_title = item["message"]
        #   Replace / (which can't be used in a file name) with a similar looking character
        twitpic_title = twitpic_title.replace('/', u'\u2044')
        twitpic_title = twitpic_title[:max_length]
        twitpic_file_type = item["type"]
        twitpic_time = time.mktime(time.strptime(item["timestamp"], "%Y-%m-%d %H:%M:%S"))
        twitpic_file_url = "https://twitpic.com/show/full/"+twitpic_id
        twitpic_file_name = h.unescape(twitpic_title) + "." + twitpic_file_type

        #   Save the file
        urllib.urlretrieve(twitpic_file_url, twitpic_file_name)
        #   Set the file time
        os.utime(twitpic_file_name, (twitpic_time, twitpic_time))


4 thoughts on “Exporting TwitPic Images - Python”

  1. hugs says:

    I made a modified version here: https://gist.github.com/hugs/fa7892da03ce660c212e

    My changes:
    * Use https instead of http
    * Use the short_id as the filename
    * Save the raw page data to separate "page-*.json" files

    Big question: What's the license for this? Is it okay if I use the MIT license for my fork?

    Cheers and thanks for this!

    - Jason
