Liberate Your YouTube Videos
If you've been following this blog, you'll know that Google unjustly shut down my YouTube channel. They've now reinstated it - but I can no longer trust them as custodians of my data.
So, here's a quick tutorial on how to download all your videos - and metadata - from YouTube.
The Official Way
Google offers a "Takeout" service which lets you package up all your YouTube videos for export.
It creates a multi-gigabyte archive - which isn't particularly suitable for hosting elsewhere. Once the archive is created, you have to download it in 2GB chunks. The archives are only available for 7 days - so if you're on a normal-speed Internet connection, you might not be able to grab everything in time.
If you do manage to download everything - you'll find another problem. The files are enormous because you're downloading the originals - not the web-optimised versions.
So, how can we download high-quality, low-filesize copies of the videos suitable for HTML5 use?
P-p-p-p-pick Up Some Python
We'll be using the excellent YouTube-DL - make sure you have the most recent version installed.
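If you installed it via pip, updating is a one-liner (this assumes a working Python with pip on your path):

pip install --upgrade youtube_dl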
The Google Takeout from above contains a file called uploads.json - it has a list of every video you've uploaded and some associated metadata:
...
,{
   "contentDetails" : {
      "videoId" : "2gIM9MzfaC8"
   },
   "etag" : "\"mPrpS7Nrk6Ggi_P7VJ8-KsEOiIw/7fgl8GbhgRwxNU3aCz9jzUB_65M\"",
   "id" : "UUAEmywW2HASHP0MohSNIN0qHLpdeAxkQv",
   "kind" : "youtube#playlistItem",
   "snippet" : {
      "channelId" : "UCyC5lCspQ5sXZ9L3ZdEEF2Q",
      "channelTitle" : "Terence Eden",
      "description" : "Look at me walk to work!",
      "playlistId" : "UUyC5lCspQ5sXZ9L3ZdEEF2Q",
      "position" : 168,
      "publishedAt" : "2010-12-02T10:03:15.000Z",
      "resourceId" : {
         "kind" : "youtube#video",
         "videoId" : "2gIM9MzfaC8"
      },
      "thumbnails" : {
         "default" : {
            "height" : 90,
            "url" : "https://i.ytimg.com/vi/2gIM9MzfaC8/default.jpg",
            "width" : 120
         },
         "high" : {
            "height" : 360,
            "url" : "https://i.ytimg.com/vi/2gIM9MzfaC8/hqdefault.jpg",
            "width" : 480
         },
         "medium" : {
            "height" : 180,
            "url" : "https://i.ytimg.com/vi/2gIM9MzfaC8/mqdefault.jpg",
            "width" : 320
         }
      },
      "title" : "Walking To Work In The Snow"
   },
   "status" : {
      "privacyStatus" : "public"
   }
}, {
...
So, we want to go through that JSON, download some web-friendly versions of the media, and save them.
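Before letting the full script loose, it's worth a quick sanity check that the file parses - a minimal sketch:

import json

with open('uploads.json') as data_file:
    data = json.load(data_file)

print "Found " + str(len(data)) + " videos"
print "First video: " + data[0]["snippet"]["title"]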
This Python script downloads the highest-resolution MP4 video and AAC audio, then muxes them together. It also downloads a slightly lower-resolution WEBM file. It then grabs the thumbnail and any subtitles which are present. Finally, a little HTML5 snippet is written.
from __future__ import unicode_literals
from datetime import datetime
import youtube_dl
import json
import os

# Read the JSON
with open('uploads.json') as data_file:
    data = json.load(data_file)

# Iterate through the JSON
for video in data:
    videoId = video["contentDetails"]["videoId"]
    description = video["snippet"]["description"]
    publishedAt = video["snippet"]["publishedAt"]
    title = video["snippet"]["title"]
    status = video["status"]["privacyStatus"]

    # Create a date object based on the video's timestamp
    date_object = datetime.strptime(publishedAt, "%Y-%m-%dT%H:%M:%S.000Z")

    # Create a filepath and filename
    #   /YYYYMMDD-HHMM/YYYYMMDD_My Video_abc123
    filepath = date_object.strftime('%Y%m%d-%H%M')
    filename = date_object.strftime('%Y%m%d') + "_" + title + "_" + videoId

    # YouTube-DL options
    ydl_opts = {
        # The highest quality MP4 should have the best resolution.
        # Best quality AAC audio - because that's the codec which will fit in an MP4.
        # A lower quality WEBM for HTML5 streaming.
        'format': 'bestvideo[ext=mp4]+bestaudio[acodec=aac],webm',
        'audioformat': 'aac',
        'merge_output_format': 'mp4',
        # Don't create multiple copies of things
        'nooverwrites': True,
        # Some videos have subtitles
        'writeautomaticsub': True,
        'writethumbnail': True,
        # Make sure the filenames don't contain weird characters
        'restrictfilenames': True,
        # Add the correct extension for each filename
        'outtmpl': filepath + '/' + filename + '.%(ext)s',
        # If there is a problem, try again
        'retries': 5,
        # 'verbose': True,
    }

    # Download the files
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        if status == "private":
            # YouTube-DL won't download private videos
            print "Skipping private video " + videoId
        else:
            print "Downloading " + videoId + " to " + filename
            ydl.download(['http://www.youtube.com/watch?v=' + videoId])

    # Write an HTML5 snippet
    html = "<video poster=\"" + filename + ".jpg\" controls>\n"
    html += " <source src=\"" + filename + ".mp4\" type=\"video/mp4; codecs=mp4a.40.2, avc1.42001E\">\n"
    html += " <source src=\"" + filename + ".webm\" type=\"video/webm; codecs=vorbis, vp8.0\">\n"
    html += " <track src=\"" + filename + ".en.vtt\" kind=\"subtitles\" srclang=\"en\" label=\"English\">\n"
    html += "</video>"

    # Write the HTML
    with open(os.path.join(filepath, "video.html"), 'w') as html_file:
        html_file.write(html)
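For the sample video above, the snippet written to 20101202-1003/video.html should look something like this:

<video poster="20101202_Walking To Work In The Snow_2gIM9MzfaC8.jpg" controls>
 <source src="20101202_Walking To Work In The Snow_2gIM9MzfaC8.mp4" type="video/mp4; codecs=mp4a.40.2, avc1.42001E">
 <source src="20101202_Walking To Work In The Snow_2gIM9MzfaC8.webm" type="video/webm; codecs=vorbis, vp8.0">
 <track src="20101202_Walking To Work In The Snow_2gIM9MzfaC8.en.vtt" kind="subtitles" srclang="en" label="English">
</video>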
The files it downloads are significantly smaller than the original uploads - with no noticeable loss of quality. The combined size of the MP4 and WEBM is around half that of the original files.
There is one major bug - occasionally the script will crap out with:
youtube_dl.utils.DownloadError: ERROR: content too short (expected 153471376 bytes and served 88079712)
This is a persistent error with YouTube-DL.
The script can simply be run again - it's smart enough to avoid re-downloading videos it has already grabbed.
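If you don't fancy babysitting it, you can wrap the download in a retry loop. Here's a minimal sketch - download_with_retries and max_attempts are my own names, not part of YouTube-DL:

from youtube_dl.utils import DownloadError

def download_with_retries(ydl, url, max_attempts=5):
    # Retry on "content too short" and similar transient errors
    for attempt in range(max_attempts):
        try:
            ydl.download([url])
            return True
        except DownloadError as error:
            print "Attempt " + str(attempt + 1) + " failed: " + str(error)
    return False

Then swap the ydl.download(...) call in the main loop for download_with_retries(ydl, 'http://www.youtube.com/watch?v=' + videoId).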
And there you have it - a quick way to grab everything you've uploaded. It's missing a few things - view counts and comments, mostly - but it's good enough for re-hosting elsewhere.
This is what it looks like
This is what happens when you put that dash of HTML into a web-page:
I think there's a problem with the subtitle track - but WebVTT isn't a finalised standard yet.