Extracting Your Data from the AudioBoo API


Earlier this week, I wrote about the Future of AudioBoo. I'm sure the service is going to be just fine, but I thought it would be an interesting exercise to liberate my data from there just in case.

As I begin the move to decentralised services where possible, I think it's important that I take responsibility for my own data.

The API docs for AudioBoo are very clear, so here's a quick guide on how to download all your Boos and (most of) their data.

Get All Your Boos

The AudioBoo API - unlike some - doesn't require any authentication. It also doesn't restrict you to only downloading your own data. So, if you want to swim in Stephen Fry's vocal delights, go right ahead!

The API call is very simple:

http://api.audioboo.fm/audio_clips?
   username=edent&
   page[items]=150&
   page[number]=1

You can grab up to 150 items at a time. To move to the next page of items, change "page[number]=" to 2, and so on.
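If you'd rather not edit the query string by hand, a small helper can build the URL for any page. This is a minimal sketch: the function name audioBooPageUrl is hypothetical, it simply fills in the same query string shown above, and it clamps the page size to the 150-item maximum.

```php
<?php
// Hypothetical helper: build the audio_clips URL for one page of a user's Boos,
// using the same query parameters as the call shown above.
function audioBooPageUrl(string $username, int $page, int $items = 150): string
{
    // The API hands out at most 150 items per page, so clamp the page size.
    $items = max(1, min($items, 150));

    return sprintf(
        "http://api.audioboo.fm/audio_clips?username=%s&page[items]=%d&page[number]=%d",
        rawurlencode($username), // escape any unusual characters in the username
        $items,
        $page
    );
}

echo audioBooPageUrl("edent", 2), "\n";
```

Calling it with page 2 gives the same URL as above with "page[number]=2".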

Examining The Data

Each page of data starts with something similar to:

{"window":60,
"version":200,
"timestamp":1365710614,
"body":{
    "totals":{
	"count":64,
	"offset":0
    },
    "audio_clips":[....]

This tells us how many Boos there are in total (in my case, 64) and the offset of the first Boo on this page, which is 0 on page 1.
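Those totals are enough to work out how many pages you need to request. The sketch below decodes a copy of the envelope above, with an empty "audio_clips" array standing in for the elided clips (an assumption made so the JSON is valid), and divides the count by the 150-item page size.

```php
<?php
// A copy of the envelope above; the elided clips are replaced by an empty array.
$response = '{"window":60,"version":200,"timestamp":1365710614,
  "body":{"totals":{"count":64,"offset":0},"audio_clips":[]}}';

$json   = json_decode($response, true);
$count  = $json["body"]["totals"]["count"];  // total number of Boos
$offset = $json["body"]["totals"]["offset"]; // position of this page's first Boo

$itemsPerPage = 150;                         // the maximum page size
$pages = (int) ceil($count / $itemsPerPage); // pages needed to cover every Boo

printf("%d Boos, starting at %d, across %d page(s)\n", $count, $offset, $pages);
```

With 64 Boos, a single page of 150 covers everything.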

The Boo's Data

Boos are presented in reverse chronological order - the most recent first.

Here is what an individual Boo's JSON looks like. It's fairly simple and self-explanatory.

{
"id":363171,
"title":"QR Codes at OpenTech",
"user":{
    "id":2861,
    "username":"edent",
    "counts":{
	"audio_clips":64,
	"followers":20,
	"followings":8
    },
    "urls":{
	"profile":"http://audioboo.fm/edent",
	"image":"http://www.gravatar.com/avatar/a4c65091cd258c86e6187eaaed4ef939?default=http%3A%2F%2Fd15mj6e6qmt1na.cloudfront.net%2Fassets%2Favatar_green-a51f577eaf0b191acb694b5ec3c733dc.gif"
    }
},
"duration":1173.53,
"mp3_filesize":9388160,
"uploaded_at":"2011-05-21T21:23:09Z",
"recorded_at":"2011-05-21T21:23:09Z",
"location":{
    "description":"City of London, Camden Town, United Kingdom",
    "longitude":-0.131275,
    "latitude":51.5226,
    "accuracy":29.6755
},
"counts":{
    "comments":1,
    "plays":133
},
"urls":{
    "detail":"http://audioboo.fm/boos/363171-qr-codes-at-opentech",
    "high_mp3":"http://audioboo.fm/boos/363171-qr-codes-at-opentech.mp3",
    "image":"http://audioboo.fm/files/images/0120/4984/Twitter_Search.png"
},
"tags":[
    {
	"display_tag":"edent",
	"normalised_tag":"edent",
	"url":"http://audioboo.fm/tag/edent"
    },
    {
	"display_tag":"open data",
	"normalised_tag":"opendata",
	"url":"http://audioboo.fm/tag/opendata"
    }
    ]
}

The two fields we're probably most interested in are "high_mp3", which gives you the URL of a high quality MP3 recording, and "image", which contains the picture you associated with the Boo; both sit inside "urls". As you can see, the data is fairly comprehensive. Perhaps the only thing missing is the comments: you're told how many comments were left, but not what they say.
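To see how those fields map onto PHP arrays, here is a sketch that pulls the useful parts out of one decoded Boo. The function name summariseBoo and its output keys are hypothetical, and the "??" fallbacks assume some Boos might lack an image, location or tags, which a single sample can't confirm. The test input is a cut-down copy of the Boo above.

```php
<?php
// Hypothetical helper: extract the most useful fields from one decoded Boo.
// "??" supplies a default if a field is missing (an assumption, not documented).
function summariseBoo(array $boo): array
{
    return [
        "id"       => $boo["id"],
        "title"    => $boo["title"],
        "mp3"      => $boo["urls"]["high_mp3"] ?? null,
        "image"    => $boo["urls"]["image"] ?? null,
        "place"    => $boo["location"]["description"] ?? null,
        // Keep the human-readable form of each tag
        "tags"     => array_column($boo["tags"] ?? [], "display_tag"),
        // The API gives a count of comments only, not their text
        "comments" => $boo["counts"]["comments"] ?? 0,
    ];
}

// A cut-down copy of the sample Boo above
$boo = json_decode('{"id":363171,"title":"QR Codes at OpenTech",
  "urls":{"high_mp3":"http://audioboo.fm/boos/363171-qr-codes-at-opentech.mp3",
          "image":"http://audioboo.fm/files/images/0120/4984/Twitter_Search.png"},
  "location":{"description":"City of London, Camden Town, United Kingdom"},
  "counts":{"comments":1,"plays":133},
  "tags":[{"display_tag":"edent"},{"display_tag":"open data"}]}', true);

print_r(summariseBoo($boo));
```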

Parsing The Data

Once you've called the API and got a response, it's trivial to extract your data for use.

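<?php
// $string holds the raw JSON text returned by the API call above
$json = json_decode($string, true);
$audio = $json["body"]["audio_clips"];

foreach ($audio as $boo)
{
    $mp3   = $boo["urls"]["high_mp3"];      // URL of the high quality MP3
    $image = $boo["urls"]["image"] ?? null; // picture attached to the Boo, if any
    // Do Something...
}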

Personally, I wrote the URLs to a file, then used wget to download all the mp3s and images. You could also write the data to CSV or something similar and then import it into WordPress (or your blogging platform of choice).
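A minimal sketch of both approaches follows. It writes every MP3 and image URL to a plain text file, one per line, which wget can then fetch with "wget -i urls.txt", and writes the basic metadata to a CSV file. The function name exportBoos, the file names and the choice of CSV columns are all illustrative assumptions rather than anything the API prescribes.

```php
<?php
// Hypothetical helper: write download URLs (one per line) and a CSV of metadata.
// $clips is the decoded "audio_clips" array from one or more API pages.
function exportBoos(array $clips, string $urlFile, string $csvFile): void
{
    $urls = [];
    $csv  = fopen($csvFile, "w");
    // Passing the escape argument as "" gives plain RFC 4180 style quoting
    fputcsv($csv, ["id", "title", "recorded_at", "mp3", "image"], ",", "\"", "");

    foreach ($clips as $boo) {
        $mp3   = $boo["urls"]["high_mp3"];
        $image = $boo["urls"]["image"] ?? "";
        $urls[] = $mp3;
        if ($image !== "") {
            $urls[] = $image; // only list an image if the Boo has one
        }
        fputcsv($csv, [$boo["id"], $boo["title"], $boo["recorded_at"], $mp3, $image], ",", "\"", "");
    }

    fclose($csv);
    // One URL per line, the format "wget -i" reads
    file_put_contents($urlFile, implode("\n", $urls) . "\n");
}
```

Usage would look like exportBoos($audio, "urls.txt", "boos.csv"), followed by "wget -i urls.txt" on the command line.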

Limitations

Perhaps the only minor downside is that the files are MP3s; I had thought that AudioBoo stored them as FLAC. That said, MP3s are more than adequate for voice recordings, and they come with comprehensive ID3 tags. The lack of comments is a little sad, but my Boos never attracted many comments anyway.
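If you want to check the tags on a downloaded file, the sketch below looks for an ID3v2 header at the start of an MP3 and reports its version and size. It assumes the standard ID3v2 header layout: the bytes "ID3", a major version, a revision, a flags byte, then a four-byte "syncsafe" size (seven bits per byte) that excludes the ten-byte header. The function name id3v2Header is hypothetical, and it only reads the header, not the individual tag frames.

```php
<?php
// Hypothetical helper: parse the 10-byte ID3v2 header at the start of an MP3.
// Returns null if the data does not begin with an ID3v2 tag.
function id3v2Header(string $bytes): ?array
{
    if (strlen($bytes) < 10 || substr($bytes, 0, 3) !== "ID3") {
        return null;
    }

    // Bytes 6-9 hold the tag size, 7 significant bits per byte ("syncsafe")
    $size = 0;
    for ($i = 6; $i < 10; $i++) {
        $size = ($size << 7) | (ord($bytes[$i]) & 0x7F);
    }

    return [
        "version" => "2." . ord($bytes[3]) . "." . ord($bytes[4]), // e.g. "2.3.0"
        "size"    => $size, // bytes of tag data following the header
    ];
}

// Usage on a saved file (363171.mp3 is just an example name):
// print_r(id3v2Header(file_get_contents("363171.mp3", false, null, 0, 10)));
```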

I didn't run into any download limits. I quite happily scoffed down 5GB of data as fast as they could send it to me.

So, there you have it! A simple way to download all your AudioBoo data. Here's a quick PHP file which should download your first 150 Boos - good luck!

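<?php
// Replace YOUR-USERNAME with your AudioBoo username.
$AudioBooAPIresponse = file_get_contents("http://api.audioboo.fm/audio_clips?username=YOUR-USERNAME&page[items]=150&page[number]=1");
if ($AudioBooAPIresponse === false) {
    die("Could not reach the AudioBoo API\n");
}
$json = json_decode($AudioBooAPIresponse, true);
$AudioClips = $json["body"]["audio_clips"];

foreach ($AudioClips as $key)
{
    // Metadata worth keeping (e.g. for a CSV or blog import).
    // "??" avoids warnings for fields a Boo may not have: "description"
    // doesn't appear in the sample Boo above, and not every Boo may
    // have a location or an image.
    $id           = $key["id"];
    $title        = $key["title"];
    $description  = $key["description"] ?? "";
    $recorded_at  = $key["recorded_at"];
    $location     = $key["location"]["description"] ?? "";
    $longitude    = $key["location"]["longitude"] ?? null;
    $latitude     = $key["location"]["latitude"] ?? null;
    $accuracy     = $key["location"]["accuracy"] ?? null;
    $original     = $key["urls"]["detail"];
    $mp3          = $key["urls"]["high_mp3"];
    $image        = $key["urls"]["image"] ?? null;
    $duration     = $key["duration"];
    $mp3_filesize = $key["mp3_filesize"];

    // Save the MP3, named after the Boo's ID
    $mp3_file = file_get_contents($mp3);
    if ($mp3_file !== false) {
        file_put_contents($id . ".mp3", $mp3_file);
    }

    // Save the image with its original extension (the sample Boo's image is a .png)
    if ($image !== null) {
        $extension  = pathinfo((string) parse_url($image, PHP_URL_PATH), PATHINFO_EXTENSION) ?: "jpg";
        $image_file = file_get_contents($image);
        if ($image_file !== false) {
            file_put_contents($id . "." . $extension, $image_file);
        }
    }
}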
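The script above stops at the first page of 150 Boos. If you have more, a loop can keep requesting pages until it has collected "totals.count" clips or a page comes back empty. This is a sketch under a few assumptions: the function name fetchAllBoos is hypothetical, and $fetch is any callable that takes a URL and returns the raw JSON text (or false on failure), so passing "file_get_contents" would hit the live API while a fake function can stand in for testing.

```php
<?php
// Hypothetical helper: collect every Boo for a user, one page at a time.
function fetchAllBoos(string $username, callable $fetch, int $items = 150): array
{
    $clips = [];
    for ($page = 1; ; $page++) {
        $url = sprintf(
            "http://api.audioboo.fm/audio_clips?username=%s&page[items]=%d&page[number]=%d",
            rawurlencode($username), $items, $page
        );
        $response = $fetch($url);
        if ($response === false) {
            break; // request failed; return what we have so far
        }

        $json  = json_decode($response, true);
        $batch = $json["body"]["audio_clips"] ?? [];
        $clips = array_merge($clips, $batch);
        $total = $json["body"]["totals"]["count"] ?? 0;

        // Stop when a page is empty or we've collected every Boo
        if ($batch === [] || count($clips) >= $total) {
            break;
        }
    }
    return $clips;
}

// Usage against the live API:
// $all = fetchAllBoos("YOUR-USERNAME", "file_get_contents");
```

The resulting array can then go through the same saving loop as the script above.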


3 thoughts on “Extracting Your Data from the AudioBoo API”

  1. Mark Rock says:

    The flac files are 240m down in a facility in Nevada. And whilst I'm glad we've made it that simple to retrieve all your boos, don't worry. The originals are safe with us.

  2. says:

    Thanks for sharing this, really useful - I've just downloaded all of mine and learned about parsing json in php! Ta very much 🙂

