Google's Secret Screenshot API


I've been looking for a way to programmatically take screenshots of websites. Most of the solutions I've found won't work on headless servers, require complex libraries to be installed, or cost money.

So, what do we do when faced with a knotty programming problem? Hack it!

Google has a "Pagespeed" service that gives any webmaster a comprehensive report on how Google assesses their page. The report also includes a screenshot of how Google sees the webpage. Here's how my blog looks according to the service:

Pagespeed Screenshot

How It Works

The API requires no authentication - and you don't need to own the page in order to take screenshots of it.

The URL for accessing the API is simple:

https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=

Simply add a percent-encoded URL after &url=

You can remove &strategy=mobile if you want a "desktop" view.
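
As a rough sketch of the URL construction described above, here is one way to build the request address in Python 3. The helper name `build_pagespeed_url` and its `mobile` flag are invented for illustration; only the endpoint and the `screenshot`, `strategy` and `url` parameters come from the address shown earlier.

```python
from urllib.parse import quote

# Endpoint copied from the address above.
API_BASE = "https://www.googleapis.com/pagespeedonline/v1/runPagespeed"


def build_pagespeed_url(site, mobile=True):
    """Return the request URL for a screenshot of `site`.

    Hypothetical helper: the name and the `mobile` flag are illustrative only.
    """
    params = ["screenshot=true"]
    if mobile:
        # Leaving this parameter out requests the "desktop" view.
        params.append("strategy=mobile")
    # safe="" percent-encodes every reserved character, including ":" and "/".
    params.append("url=" + quote(site, safe=""))
    return API_BASE + "?" + "&".join(params)


print(build_pagespeed_url("https://twitter.com/edent"))
print(build_pagespeed_url("https://twitter.com/edent", mobile=False))
```

The first call should produce an address ending in `&url=https%3A%2F%2Ftwitter.com%2Fedent`; the second drops `strategy=mobile` entirely.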

This gets you back loads of JSON data about your page. Right at the end, you'll see something like this:

 "screenshot": {
  "data": "_9j_4AAQSkZJRgABAQAAAQABAAD_2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj_2wBDAQcHBwoIChMKChMoG
...
QrdBQpESIyiOyhSiopQhISkZPJ4A5NSqX4NIffUC5xkilin8qbw-agWOKWOfGkOgpfemgQoY7TBjQV8_k_T_wA6aKU0L9pv3AXv-T_tpoP_2Q==",
  "height": 569,
  "mime_type": "image/jpeg",
  "width": 320
 }
}

"Aha!" You may be thinking, "This is a Base64 encoding of the image!" And you would be mostly right.

For reasons best known to Google, the data uses the URL-safe Base64 alphabet: / is replaced with _, and + is replaced with -. You'll need to reverse those changes before a standard Base64 decoder will accept it.
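
A minimal Python 3 sketch of that reversal, shown next to `base64.urlsafe_b64decode`, which accepts the `_` and `-` characters as they are. The function names are hypothetical, and the sample is four made-up bytes rather than real API output.

```python
import base64


def decode_by_replacing(encoded):
    """Undo the character swap by hand, then decode as ordinary Base64."""
    standard = encoded.replace("_", "/").replace("-", "+")
    return base64.b64decode(standard)


def decode_urlsafe(encoded):
    """Let the standard library handle the "_" and "-" alphabet directly."""
    return base64.urlsafe_b64decode(encoded)


# Made-up sample: four bytes that begin like a JPEG file (FF D8 FF E0).
sample = base64.urlsafe_b64encode(b"\xff\xd8\xff\xe0").decode("ascii")
print(sample)  # _9j_4A==
print(decode_by_replacing(sample) == decode_urlsafe(sample))  # True
```

The sample encodes to `_9j_4A==`, which is how the `data` value in the JSON above begins, so both approaches should give the same bytes for a real response too.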

Limitations

This API isn't perfect.

  • Image width is 320px - not great for high-resolution snapshots.
  • Web fonts can prove tricky. Google seems to render most of them.
  • There's no way to pass authentication or cookie data - so you just get the "public" view of the page.
  • Similarly, no POST data - although GET is fine.
  • Plugins like Flash & Java may not work.
  • Complex JavaScript pages won't necessarily work.
  • It's a bit slow to generate the report.
  • Only one rendering - so you can't use it to see how Firefox compares to Chrome.

Overall, not a perfect solution - but for quickly generating a screenshot without needing to install anything, it's pretty good.

A Little Light Code Music

I've hastily cobbled together some code to automate grabbing and decoding the screenshot data. If you want to improve it, fork the code on GitHub.

import urllib2
import json
import base64
import sys

#	The website's URL
site = "https://twitter.com/edent"

#	The Google API.  Remove "&strategy=mobile" for a desktop screenshot
api = "https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=" + urllib2.quote(site, safe="")

#	Get the results from Google
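try:
    site_data = json.load(urllib2.urlopen(api))
except urllib2.URLError:
    print "Unable to retrieve data"
    sys.exit()
except ValueError:
    #	json.load raises ValueError when the response isn't valid JSON
    print "Invalid JSON encountered."
    sys.exit()

try:
    screenshot_encoded = site_data['screenshot']['data']
except KeyError:
    #	A missing key means the response contains no screenshot
    print "No screenshot data in the response."
    sys.exit()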

#	Google has a weird way of encoding the Base64 data
screenshot_encoded = screenshot_encoded.replace("_", "/")
screenshot_encoded = screenshot_encoded.replace("-", "+")

#	Decode the Base64 data
screenshot_decoded = base64.b64decode(screenshot_encoded)

#	Save the file
with open('screenshot.jpg', 'wb') as file_:
    file_.write(screenshot_decoded)

If you know of any other easy-to-use and free screenshot services, please let people know in the comments.

6 thoughts on “Google's Secret Screenshot API”

  1. Thanks for sharing this. I tried to display the Data URI without replacing those 2 characters and came here searching for a solution. Here's the code I used in my project.


    //* PageSpeed Screenshot setup
    $shot_replace = array( "_", "-" );
    $shot_real = array( "/", "+" );
    $shotdata = str_replace( $shot_replace, $shot_real, $pageshot->data );

    It's been a few years since your post, and I now see another set of data:

    "page_rect": {
    "left": 0,
    "top": 0,
    "width": 1024,
    "height": 768
    }

    Does that mean we can get a bigger resolution screenshot?

    Thank you,
    Shyam

    1. Hi Shyam - where are you seeing that `page_rect` code? I can't see it.

      Ah! You're using the V2 API. I *think* that the `1024` refers to the viewport of the screenshot. I still can't see any way to get a higher resolution. Sorry.
