Quick and dirty way to rip an eBook from Android


I recently purchased a book for my MSc which was only available via a crappy Android app. There was no obvious way to decrypt it to read on a more sensible device, so I resorted to the ancient art of screenscraping.

This is a quick-and-dirty way to grab images of the pages and convert them to a standard PDF using Linux. There's a lot more you can do to make the finished book more useful, but this'll get you started.

Lots of Screen Shots

With a USB cable plugged into my phone and laptop, I wrote a horrible little bash script:

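#!/bin/bash
# Capture 555 pages; zero-padded names keep the files in page order.
for i in {00001..00555}; do
   # Save a screenshot of the current page as a PNG.
   adb exec-out screencap -p > "$i.png"
   # Tap the bottom right of the screen to turn the page.
   adb shell input tap 1000 2000
   # Give the next page time to render.
   sleep 1
done
echo "All done"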

This runs a loop 555 times. Each pass takes a screenshot, names it after the loop number with padded zeros, taps the bottom right of the screen, then waits a second so the page has time to refresh. Slow and dull, but it works reliably.
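If the cable wobbles mid-run, the redirect still creates a file, just an empty one. Here's a rough sketch of a sanity check for that. The function name check_pages is my own invention, and it assumes the pages are named 00001.png upwards as in the script above.

```shell
# check_pages LAST [DIR]: print the name of every page from 1 to LAST
# that is missing or zero bytes long. (Hypothetical helper, not part of adb.)
check_pages() {
    last=$1
    dir=${2:-.}
    n=1
    while [ "$n" -le "$last" ]; do
        name=$(printf '%05d.png' "$n")
        # -s is true only when the file exists and is not empty
        [ -s "$dir/$name" ] || echo "$name"
        n=$((n + 1))
    done
}
```

Running check_pages 555 in the screenshot directory should print nothing if every page was captured. It won't spot a page that failed to turn, only ones that are missing or empty.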

Images range from 200KB to 2MB depending on complexity. Back them up before doing the next bit.
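One way to take that backup is a plain copy into another directory. This is only a sketch; backup_pages and the directory names are placeholders I've made up.

```shell
# backup_pages SRC DEST: copy every PNG in SRC into DEST, keeping timestamps.
# (Hypothetical helper; pick whatever directories suit you.)
backup_pages() {
    src=$1
    dest=$2
    mkdir -p "$dest" && cp -p "$src"/*.png "$dest"/
}
```

For example, backup_pages . ../pages-backup copies the screenshots out of the working directory, so later commands that act on *.png won't touch the copies.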

Cropping

The screenshots are all 1080x2160, but the page only takes up part of that. Its top-left corner is at (50, 432) and its bottom-right corner is at (1028, 1726), which makes the page 978 pixels wide and 1294 pixels high.
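ImageMagick wants a crop written as WIDTHxHEIGHT+X+Y, so the two corners need a bit of subtraction. Here's a small sketch of that arithmetic; crop_geometry is a made-up helper name, and the numbers are the ones from my phone.

```shell
# crop_geometry X1 Y1 X2 Y2: print WIDTHxHEIGHT+X+Y for the box
# between the top-left (X1, Y1) and bottom-right (X2, Y2) corners.
crop_geometry() {
    x1=$1; y1=$2; x2=$3; y2=$4
    echo "$((x2 - x1))x$((y2 - y1))+${x1}+${y1}"
}

crop_geometry 50 432 1028 1726   # prints 978x1294+50+432
```

Your own device will almost certainly need different corners, so measure them on one of your screenshots first.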

This command crops all the images. It is destructive, so make sure you have a backup.

mogrify -crop 978x1294+50+432 +repage *.png

It's also useful to trim any whitespace from the borders of the images, which reduces the file size.

mogrify -trim *.png

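Images can be shrunk with pngquant. By default it writes new files ending in -fs8.png alongside the originals, which would then get picked up twice by *.png in the next step, so tell it to overwrite in place:

pngquant --ext .png --force *.png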

PDF and OCR

Sticking all the images together into a single PDF is pretty easy:

convert *.png +repage output.pdf

The +repage option discards the canvas information left over from cropping and trimming, so each PDF page keeps the aspect ratio of the trimmed image.

But there's no text to search. There are a bunch of OCR programs on Linux; I like PDF Sandwich:

pdfsandwich -rgb -nopreproc output.pdf

That'll get you a colour PDF with OCR'd text embedded in it. The text is "sandwiched" behind the image of the page, so you can't see it but can search for it.

You can also use OCRmyPDF, which may result in a smaller file:

ocrmypdf -l eng output.pdf output_ocr.pdf

And that's it. I now have a searchable PDF which I can read on any device.

What have we learned?

DRM on textbooks is an annoyance. For computer science books, it's little more than a fig-leaf.

