Quick and dirty way to rip an eBook from Android

@edent · drm ebook MSc · 3 comments · 450 words · Viewed ~7,474 times.

I recently purchased a book for my MSc which was only available via a crappy Android app. There was no obvious way to decrypt it to read on a more sensible device, so I resorted to the ancient art of screen scraping.

This is a quick-and-dirty way to grab images of the pages and convert them to a standard PDF using Linux. There's a lot more you can do to make the end book more useful, but this'll get you started.

Lots of Screen Shots

With a USB cable plugged into my phone and laptop, I wrote a horrible little bash script:

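#!/bin/bash

# Grab 555 pages; the zero-padded counter keeps the files in page order.
for i in {00001..00555}; do
   # Save a screenshot straight from the phone as a PNG.
   adb exec-out screencap -p > "$i.png"
   # Tap the bottom right of the screen to turn the page.
   adb shell input tap 1000 2000
   # Give the page a moment to redraw.
   sleep 1
done

echo "All done"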

This runs a loop 555 times. Each pass takes a screenshot, names it after the zero-padded loop number, taps the bottom right of the screen to turn the page, then waits a second so the page has time to refresh. Slow and dull, but it works reliably.
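The hard-coded 555 assumes you already know the page count. One possible variation, sketched here, keeps going until two consecutive screenshots are byte-identical, which usually means the app has run out of pages. The helper names `grab_page` and `capture_until_repeat` and the cap of 2000 are made up for this sketch, and it will be fooled if something like the status bar clock changes between shots, so treat it as a starting point rather than a drop-in replacement.

```shell
#!/bin/bash
# Sketch only: grab_page and capture_until_repeat are hypothetical helpers.

# Take one screenshot into the given file, then tap to turn the page.
grab_page() {
   adb exec-out screencap -p > "$1"
   adb shell input tap 1000 2000
   sleep 1
}

# Capture up to $1 pages (default 2000, an arbitrary safety cap),
# stopping early when a screenshot matches the previous one.
capture_until_repeat() {
   local max=${1:-2000}
   local prev=""
   local n=1
   local file
   while [ "$n" -le "$max" ]; do
      file=$(printf '%05d.png' "$n")
      grab_page "$file"
      # cmp -s prints nothing and succeeds only if the files are identical.
      if [ -n "$prev" ] && cmp -s "$prev" "$file"; then
         rm -- "$file"
         echo "Stopped after $((n - 1)) pages"
         return 0
      fi
      prev=$file
      n=$((n + 1))
   done
   echo "Hit the cap of $max pages"
}

# Usage: capture_until_repeat 1000
```

The duplicate of the last page is deleted before the function returns, so the directory ends up with one file per page.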

Images range from 200KB to 2MB depending on complexity. Back them up before doing the next bit.
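A backup can be as simple as copying the screenshots into a sibling directory. This is a minimal sketch; the function name `backup_pages` and the default directory name `backup` are just illustrative choices.

```shell
#!/bin/bash
# Sketch only: backup_pages is a hypothetical helper.

# Copy every PNG in the current directory into $1 (default: ./backup),
# preserving timestamps and permissions.
backup_pages() {
   local dest=${1:-backup}
   mkdir -p "$dest" && cp -p -- *.png "$dest"/
}

# Usage: backup_pages originals
```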

Cropping

The screenshots are all 1080x2160, but the page only takes up part of that. Its top-left corner is at (50, 432) and its bottom-right corner is at (1028, 1726).

This command crops all the images to that region. mogrify overwrites the originals, so make sure you have a backup.

mogrify -crop 978x1294+50+432 +repage *.png
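The geometry in that command is width x height, followed by the offset of the top-left corner. The width and height are just the differences between the two corners: 1028 - 50 = 978 and 1726 - 432 = 1294. If your screen or app is different, a tiny helper can do the sums for you. `crop_geometry` is a name invented for this sketch.

```shell
#!/bin/bash
# Sketch only: crop_geometry is a hypothetical helper.

# Build an ImageMagick geometry (WxH+X+Y) from the top-left and
# bottom-right corners of the region to keep.
crop_geometry() {
   local left=$1 top=$2 right=$3 bottom=$4
   echo "$((right - left))x$((bottom - top))+${left}+${top}"
}

crop_geometry 50 432 1028 1726   # prints 978x1294+50+432
```

It can be dropped straight into the crop with `mogrify -crop "$(crop_geometry 50 432 1028 1726)" +repage *.png`.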

It's also useful to trim the images to remove any blank border left around the page. That makes for smaller files.

mogrify -trim *.png

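Images can be shrunk with pngquant. It is lossy (it reduces each image to a palette of at most 256 colours), and by default it writes new files ending in -fs8.png alongside the originals, which would put every page into the PDF twice. These options make it overwrite the originals instead:

pngquant --ext .png --force *.png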

PDF and OCR

Sticking all the images together into a single PDF is pretty easy:

convert *.png +repage output.pdf

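The +repage option discards the canvas size and offset left behind by the trim, so each PDF page takes the size of its trimmed image rather than that of the original screenshot.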
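The *.png glob relies on the zero-padded names sorting into page order, and a missing or empty screenshot is easy to overlook in a few hundred files. If adb drops the connection mid-run, the shell redirect still leaves a zero-byte file behind. Before building the PDF, something like this sketch can list the gaps. `missing_pages` is a name invented here, and it assumes the NNNNN.png naming used by the capture script.

```shell
#!/bin/bash
# Sketch only: missing_pages is a hypothetical helper.

# Print every page number from $1 to $2 whose NNNNN.png file is missing
# or empty. Pass plain numbers (1 555), not zero-padded ones.
missing_pages() {
   local n=$1
   local last=$2
   local file
   while [ "$n" -le "$last" ]; do
      file=$(printf '%05d.png' "$n")
      # -s is true only if the file exists and is larger than zero bytes.
      [ -s "$file" ] || echo "$n"
      n=$((n + 1))
   done
}

# Usage: missing_pages 1 555
```

No output means every page in the range is present and non-empty.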

But there's no text to search. There are a bunch of OCR programs on Linux; I like PDF Sandwich:

pdfsandwich -rgb -nopreproc output.pdf

That'll get you output_ocr.pdf, a colour PDF with OCR'd text embedded in it. The text is "sandwiched" behind the image of the page, so you can't see it but can search for it.

You can also use OCRmyPDF, which may result in a smaller file:

ocrmypdf -l eng output.pdf output_ocr.pdf
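Whichever tool you use, it's worth checking that the result really has a text layer. One way, assuming you have pdftotext from poppler-utils installed (it isn't used elsewhere in this post), is to extract the text and see whether anything comes out. `pdf_has_text` is a name made up for this sketch.

```shell
#!/bin/bash
# Sketch only: pdf_has_text is a hypothetical helper.

# Succeed if the PDF contains at least one letter or digit of
# extractable text. "-" tells pdftotext to write to stdout.
pdf_has_text() {
   pdftotext "$1" - | grep -q '[[:alnum:]]'
}

# Usage:
#   pdf_has_text output_ocr.pdf && echo "Searchable" || echo "No text layer"
```

This only shows that some text was recognised, not that the OCR is accurate, so a spot check by searching for a phrase you can see on the page is still a good idea.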

And that's it. I now have a searchable PDF which I can read on any device.

What have we learned?

DRM on textbooks is an annoyance. For computer science books, it's little more than a fig-leaf.
