Liberate Your YouTube Videos


If you've been following this blog, you'll know that Google unjustly shut down my YouTube channel. They've now reinstated it - but I can no longer trust them as custodians of my data.

So, here's a quick tutorial on how to download all your videos - and metadata - from YouTube.

The Official Way

Google offers a "Takeout" service which allows you to package up all your YouTube videos for export.

It creates a multi-gigabyte archive - which isn't particularly suitable for hosting elsewhere. Once the archive is created, you have to download it in 2GB chunks. The archives are only available for 7 days - so if you're on a normal-speed Internet connection, you might not be able to grab everything in time.

If you do manage to download everything - you'll find another problem. The files are enormous because you're downloading the originals - not the web-optimised versions.

So, how can we download high-quality, low-filesize copies of the videos suitable for HTML5 use?

P-p-p-p-pick Up Some Python

We'll be using the excellent YouTube-DL - make sure you have the most recent version installed.

The Google Takeout from above contains a file called uploads.json - it has a list of every video you've uploaded and some associated metadata:

...
,{
  "contentDetails" : {
    "videoId" : "2gIM9MzfaC8"
  },
  "etag" : "\"mPrpS7Nrk6Ggi_P7VJ8-KsEOiIw/7fgl8GbhgRwxNU3aCz9jzUB_65M\"",
  "id" : "UUAEmywW2HASHP0MohSNIN0qHLpdeAxkQv",
  "kind" : "youtube#playlistItem",
  "snippet" : {
    "channelId" : "UCyC5lCspQ5sXZ9L3ZdEEF2Q",
    "channelTitle" : "Terence Eden",
    "description" : "Look at me walk to work!",
    "playlistId" : "UUyC5lCspQ5sXZ9L3ZdEEF2Q",
    "position" : 168,
    "publishedAt" : "2010-12-02T10:03:15.000Z",
    "resourceId" : {
      "kind" : "youtube#video",
      "videoId" : "2gIM9MzfaC8"
    },
    "thumbnails" : {
      "default" : {
        "height" : 90,
        "url" : "https://i.ytimg.com/vi/2gIM9MzfaC8/default.jpg",
        "width" : 120
      },
      "high" : {
        "height" : 360,
        "url" : "https://i.ytimg.com/vi/2gIM9MzfaC8/hqdefault.jpg",
        "width" : 480
      },
      "medium" : {
        "height" : 180,
        "url" : "https://i.ytimg.com/vi/2gIM9MzfaC8/mqdefault.jpg",
        "width" : 320
      }
    },
    "title" : "Walking To Work In The Snow"
  },
  "status" : {
    "privacyStatus" : "public"
  }
}, {
...
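Before writing the full script, it's worth a quick sanity check that the JSON parses and contains what you expect. A minimal sketch - the filename matches the Takeout export, and the key lookups follow the structure above:

```python
import json

def list_uploads(path="uploads.json"):
    """Print the ID and title of every upload; return the count."""
    with open(path) as data_file:
        uploads = json.load(data_file)
    for video in uploads:
        print(video["contentDetails"]["videoId"], "-", video["snippet"]["title"])
    return len(uploads)
```

Run against the export above, that prints lines like `2gIM9MzfaC8 - Walking To Work In The Snow`.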

So, we want to go through that JSON, download some web-friendly versions of the media, and save them.

This Python downloads a high-resolution MP4 video stream and an AAC audio stream - then muxes them together. It will also download a slightly lower-resolution WEBM file. It then grabs a thumbnail and any subtitles which are present. Finally, a little HTML5 snippet is written.

from __future__ import unicode_literals
from datetime import datetime

import youtube_dl
import json
import os

#    Read the JSON
with open('uploads.json') as data_file:
    data = json.load(data_file)

#    Iterate through the JSON
for video in data:
    videoId     = video["contentDetails"]["videoId"]
    description = video["snippet"]["description"]
    publishedAt = video["snippet"]["publishedAt"]
    title       = video["snippet"]["title"]
    status      = video["status"]["privacyStatus"]

    #    Create a date object based on the video's timestamp
    date_object = datetime.strptime(publishedAt, "%Y-%m-%dT%H:%M:%S.000Z")

    #    Create a filepath and filename
    #    /YYYYMMDD-HHMM/YYYYMMDD_My Video_abc123
    filepath = date_object.strftime('%Y%m%d-%H%M')
    filename = date_object.strftime('%Y%m%d') + "_" + title + "_" + videoId

    #    YouTube-DL options
    ydl_opts = {
        #    The highest quality MP4 should have the best resolution.
        #    Best quality AAC audio - because that's the codec which will fit in an MP4.
        #    A lower quality WEBM for HTML5 streaming.
        'format': 'bestvideo[ext=mp4]+bestaudio[acodec=aac],webm',
        'merge_output_format': 'mp4',
        #    Don't create multiple copies of things
        'nooverwrites': True,
        #    Some videos have subtitles - manually uploaded or auto-generated
        'writesubtitles': True,
        'writeautomaticsub': True,
        'writethumbnail': True,
        #    Make sure the filenames don't contain weird characters
        'restrictfilenames': True,
        #    Add the correct extension for each filename
        'outtmpl': filepath + '/' + filename + '.%(ext)s',
        #    If there is a problem, try again
        'retries': 5,
        # 'verbose': True,
    }

    #    Download the files
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        if status == "private":
            #    YouTube-DL won't download private videos
            print("Skipping private video " + videoId)
        else:
            print("Downloading " + videoId + " to " + filename)
            ydl.download(['https://www.youtube.com/watch?v=' + videoId])
            #    Write an HTML5 snippet
            html  = "<video   poster=\"" + filename + ".jpg\" controls>\n"
            html += "    <source src=\"" + filename + ".mp4\"    type=\"video/mp4; codecs=mp4a.40.2, avc1.42001E\">\n"
            html += "    <source src=\"" + filename + ".webm\"   type=\"video/webm; codecs=vorbis, vp8.0\">\n"
            html += "    <track  src=\"" + filename + ".en.vtt\" kind=\"subtitles\" srclang=\"en\" label=\"English\">\n"
            html += "</video>"
            #    Write the HTML
            with open(os.path.join(filepath, "video.html"), 'w') as html_file:
                html_file.write(html)

The files it downloads are significantly smaller than the original uploads - with no noticeable loss of quality. The combined size of the MP4 and WEBM is around half the size of the original files.
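If you want to check the saving yourself, here's a quick sketch - the directory names in the usage comment are placeholders for wherever your Takeout originals and re-downloaded copies live:

```python
import os

def dir_size(path):
    """Total size, in bytes, of every file under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# e.g. print(dir_size("Takeout"), "vs", dir_size("20101202-1003"))
```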


There is one major bug - occasionally the script will crap out with:

youtube_dl.utils.DownloadError: ERROR: content too short (expected 153471376 bytes and served 88079712)

This is a persistent error with YouTube-DL.

The script can be run again - it's smart enough to avoid re-downloading the videos.
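Rather than babysitting it, you can wrap the script in a retry loop. A minimal sketch - "liberate.py" is a made-up name for whatever you saved the script above as:

```python
import subprocess
import sys
import time

def run_until_success(cmd, max_attempts=10, delay=30):
    """Re-run cmd until it exits cleanly; give up after max_attempts."""
    for attempt in range(max_attempts):
        if subprocess.call(cmd) == 0:
            return True
        print("Attempt", attempt + 1, "failed - retrying")
        time.sleep(delay)
    return False

# run_until_success([sys.executable, "liberate.py"])
```

Because the downloader skips files it has already fetched, each retry only re-downloads whatever failed.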

And there you have it - a quick way to grab everything you've uploaded. It's missing a few things - view counts and comments, mostly - but it's good enough for re-hosting elsewhere.

This is what it looks like

This is what happens when you put that dash of HTML into a web-page:

I think there's a problem with the subtitle track - but WebVTT isn't a finalised standard yet.
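For reference, the `track` element in the generated snippet points at a WebVTT file. A minimal one - the timings and cue text here are invented - looks like this:

```
WEBVTT

00:00:01.000 --> 00:00:04.000
Look at me walk to work!
```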

