Terence Eden

Would adding Brotli Compression help shrink ePubs?


The ePub format is the cross-platform way to package an eBook. At its heart, an ePub is just a bundled webpage with extra metadata - that makes it extremely easy to build workflows to create them and apps to read them.

Once you've finished authoring your ePub, you've got a folder full of HTML[1], CSS, metadata documents, and other resources. That folder is stored in a standard Zip file, which is then renamed to .epub. This packaging is known as the Open Container Format (OCF).

There are actually a few different compression schemes for Zip files, but the specification says:

OCF ZIP containers MUST include only stored (uncompressed) and Deflate-compressed ZIP entries within the ZIP archive.

The Deflate algorithm is venerable[2] and, while incredible for its time, has been superseded by more modern compression schemes such as Brotli.

What happens if we unzip an ePub and then recompress it with Brotli? Will that dramatically reduce the file size?

Steps

  • Unzip the book
    • unzip book.epub -d book/
  • Brotli files can't contain directories, so tar the directory without any compression
    • tar -cvf book.tar book/
  • Create a Zip file with maximum compression
    • zip -9 book.tar.zip book.tar
  • Create a Brotli file with maximum compression
    • brotli -k -q 11 book.tar
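
The steps above can be chained into one script. This is only a minimal sketch: it assumes unzip, tar, zip, and brotli are all on your PATH, and the function names size_of and compare_epub are made up for illustration.

```shell
#!/bin/sh
# Sketch: repeat the manual steps for one ePub and print the three sizes.
# size_of and compare_epub are illustrative names, not part of any tool.

# Print the size of a file in bytes.
size_of() {
    wc -c < "$1" | tr -d ' '
}

compare_epub() {
    epub=$1
    work=$(mktemp -d) || return 1

    # Unzip the book, then tar the directory without any compression.
    unzip -q "$epub" -d "$work/book" || return 1
    tar -cf "$work/book.tar" -C "$work" book || return 1

    # Deflate (via zip) and Brotli, both at maximum compression.
    (cd "$work" && zip -q -9 book.tar.zip book.tar) || return 1
    brotli -k -q 11 "$work/book.tar" || return 1

    echo "Contents: $(size_of "$work/book.tar") bytes"
    echo "Deflate:  $(size_of "$work/book.tar.zip") bytes"
    echo "Brotli:   $(size_of "$work/book.tar.br") bytes"

    rm -rf "$work"
}

# Usage: compare_epub book.epub
```

Everything happens in a temporary directory, so the original ePub is left untouched.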

Results

I took a random(ish) sample from Standard eBooks and a few from my personal stash[3].

            Book 1   Book 2   Book 3   Book 4
Contents    768KB    911KB    389KB    594KB
Deflate     250KB    248KB    103KB    175KB
Brotli      190KB    187KB    82KB     137KB

The good news is that ePubs compress pretty well already! That isn't much of a surprise - compression algorithms love the repetitious nature of HTML and human-readable text. Obviously Brotli is better but, on the file sizes we're talking about, not dramatically better. Saving 60KB is OK - but in a world of terabyte-sized SD cards does it matter?
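
To put a number on "not dramatically better", here is a quick sketch that works out how much smaller Brotli's output is than Deflate's for the four text-only books in the table above. The helper name pct_saved is made up for this example.

```shell
# pct_saved OLD NEW: how much smaller NEW is than OLD, as a percentage.
# (Illustrative helper, not part of any tool.)
pct_saved() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (old - new) * 100 / old }'
}

# Deflate vs Brotli sizes in KB, copied from the table above.
pct_saved 250 190   # Book 1 -> 24.0
pct_saved 248 187   # Book 2 -> 24.6
pct_saved 103 82    # Book 3 -> 20.4
pct_saved 175 137   # Book 4 -> 21.7
```

So Brotli trims a further fifth to a quarter off the Deflate size, which on these books works out at 21KB to 61KB each.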

Brotli is also computationally harder to decompress, which makes it slightly less attractive for low-powered eReaders.

It's also possible to make a small saving by reducing the complexity and verbosity of the CSS and HTML.

However, that's not the real problem.

I lied to you

An ePub contains more than just text and text-based metadata. It can contain web fonts, images, even music. The above books had all their fonts and media stripped out. Let's run the experiment again but, this time, including everything in the original book.

            Book 1   Book 2   Book 3   Book 4
Contents    23MB     3.8MB    0.76MB   0.93MB
Deflate     22MB     1.7MB    0.46MB   0.51MB
Brotli      22MB     1.5MB    0.43MB   0.47MB

All of a sudden, Brotli makes next to no difference. Yes, the textual compression is still there, but it is overshadowed by the huge cost of the media files.
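
One way to see where the bytes are going is to total the uncompressed size of each file type inside the ePub. This is a rough sketch that parses the listing printed by unzip -l. The name by_extension is made up, and the extension is simply read from the last word of each entry name, so treat the output as a guide rather than an audit.

```shell
# by_extension EPUB: total uncompressed bytes per file extension, largest first.
# (Illustrative helper; relies on the "LENGTH DATE TIME NAME" rows of unzip -l.)
by_extension() {
    unzip -l "$1" | awk '
        # Only entry rows start with a byte count followed by a date.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+-[0-9]+-[0-9]+$/ {
            name = $NF
            ext = "(none)"
            if (match(name, /\.[^.\/]+$/)) ext = substr(name, RSTART + 1)
            total[ext] += $1
        }
        END { for (e in total) printf "%10d %s\n", total[e], e }
    ' | sort -rn
}

# Usage: by_extension book.epub
```

If png, jpg, or ttf sits at the top of that list, a better Zip compressor is not going to help much.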

Mixed Media

The ePub 3.3 specification lays out which multimedia formats are acceptable. As well as older formats like GIF, PNG, and JPEG, newer formats like WebP are acceptable. Similarly, TTF fonts are listed in the standard along with WOFF2.

Modern image and font formats have better compression than their ancestors. Indeed, WOFF2 uses Brotli as its compression scheme.

The biggest filesize saving in ePubs comes from properly compressing images and fonts.

Can You Picture That?

It is a matter of opinion as to what resolution is best suited to an ePub. Most modern eReaders have, at best, 300ppi resolution. They're also normally monochrome. But eBooks aren't always read on low-resolution, black and white eInk screens - so it probably makes sense to have high-resolution colour images in order to future-proof books.

But the compression of those images is not a matter of opinion. Lossless compression algorithms are well supported for legacy and modern image formats.

Let's take a specific example. Twenty Years at Hull House is the 22MB book above. Less than 1MB of that is text; the rest is images.

The largest illustration in the book is a 1937x1971, transparent PNG weighing in at 1MB. Increasing the lossless compression level takes it down to 840KB. Reducing the palette to something more suitable takes it to 640KB. If you were releasing this as an ePub 3.3 file, using WebP would take the image to a hair over 600KB.

Basically, a 20%-40% filesize reduction with no loss of fidelity.

Across all the PNG images in the ePub, I was able to easily get the filesize from 20MB to 16MB.

Converting to lossless WebP got it down to 13MB.
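
The post doesn't say which tools produced those numbers, so the following is a sketch under my own assumptions: optipng for the lossless PNG pass and cwebp for lossless WebP, run over an already unzipped book. The function names total_bytes and shrink_pngs are made up. Palette reduction is left out, because it is only lossless when the image already uses few enough colours.

```shell
# Sketch: losslessly recompress every PNG in an unpacked ePub and report totals.
# Assumes optipng and cwebp are installed; both function names are illustrative.

# total_bytes DIR PATTERN: summed size in bytes of matching files under DIR.
total_bytes() {
    find "$1" -type f -name "$2" -exec cat {} + | wc -c | tr -d ' '
}

shrink_pngs() {
    dir=$1
    echo "PNG before:    $(total_bytes "$dir" '*.png') bytes"

    # Redo the PNG compression at optipng's slowest, most thorough level.
    find "$dir" -type f -name '*.png' -exec optipng -quiet -o7 {} \;
    echo "PNG optimised: $(total_bytes "$dir" '*.png') bytes"

    # Write a lossless WebP next to each PNG (the PNG itself is kept).
    find "$dir" -type f -name '*.png' -exec sh -c \
        'cwebp -quiet -lossless -z 9 "$1" -o "${1%.png}.webp"' _ {} \;
    echo "Lossless WebP: $(total_bytes "$dir" '*.webp') bytes"
}

# Usage: shrink_pngs book/
```

Note that optipng rewrites the PNGs in place, so run this on a copy. Swapping to the WebP files also means updating the image references in the HTML and the media types in the package manifest.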

What The Font?

Fonts can be shrunk in a number of ways. The most obvious way is to compress to WOFF2 which, as described above, uses Brotli compression.

Based on my quick tests, a typical ePub's TTF will see about a 50% reduction in file size. For typical "English" language fonts, that's a reduction from 30KB to 15KB. So big relative compression, but small absolute compression.

Complex decorative fonts can go from 800KB to 80KB. But it is rare for a font to exceed a megabyte.

If it does, that usually means that it has more glyphs than strictly necessary. If your book is written entirely in the Latin alphabet, do you really need all those fancy accents, Chinese ideographs, and emoji? Probably not.

I've previously written about Subsetting Fonts and the perils of excessive trimming.
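
As a rough illustration of doing both at once, the sketch below uses pyftsubset from the fontTools project to keep a limited set of glyphs and write the result as WOFF2. The function name to_woff2 and the chosen Unicode ranges (Basic Latin plus Latin-1 Supplement) are my assumptions, not a recommendation: a range this narrow will drop curly quotes, dashes, and any accented letters outside Latin-1, which is exactly the excessive trimming to watch out for.

```shell
# to_woff2 FONT: write FONT with a .woff2 extension, keeping only some glyphs.
# Assumes pyftsubset (fontTools) is installed with Brotli support for WOFF2.
# The function name and the Unicode ranges are illustrative choices.
to_woff2() {
    [ -f "$1" ] || { echo "no such font: $1" >&2; return 1; }
    pyftsubset "$1" \
        --unicodes='U+0020-007E,U+00A0-00FF' \
        --flavor=woff2 \
        --output-file="${1%.*}.woff2"
}

# Usage: to_woff2 fonts/body.ttf
```

The original font is left alone, so it is easy to compare the two sizes and check the book still renders every character it needs.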

Back to Basics

Brotli is magic - but changing the compression algorithm for the ePub standard is probably a false economy. The text portion of modern eBooks is already fairly small and compresses with reasonable efficiency.

The best compression gains come from either using next-generation image and font formats or, if legacy compatibility is necessary, using the most aggressive compression settings for traditional images.


  1. OK! It is actually XHTML, but let's not quibble.

  2. That's a fancy way of saying "old".

  3. I couldn't be bothered automating this. Go ahead and run it on every ePub if you want something more representative.


3 thoughts on “Would adding Brotli Compression help shrink ePubs?”

  1. I'd want to know how Brotli does vs. Deflate on low-spec hardware. We wouldn't want to make it take excessively longer to unpack things, thus slowing page turns unless absolutely necessary.

    1. It is unlikely to slow down page turning speed. It's possible to have every chapter be its own file - so flipping to a new chapter could be slower. But I suspect most readers decompress the book in its entirety in order to render it. So slightly slower on first open, but no impact on navigation.
