Using Pandoc to format a Dissertation from Markdown to HTML, PDF, and ePub


Metawork is so much more fun than real work. Sharpening your pencils. Colour coordinating your filing system. Creating Gantt charts of what you intend to do. Marvellous!

In that spirit, here's how I used the venerable pandoc to convert my MSc dissertation from .md into a variety of more readable formats.

Prep

I've no idea what you already have installed on your system but, at a minimum, you need the latest version of pandoc and a modern version of the weasyprint library.

I found pandoc's dependencies... interesting. Depending on your operating system, you may find yourself having to install all sorts of esoteric libraries.
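If you're not sure what is already installed, the shell can tell you. This is only a minimal sketch: have_tool is a hypothetical helper name, and it checks that a command is on your PATH, not that its version is recent enough.

```shell
#!/bin/sh
# have_tool is a hypothetical helper: it succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the two tools this post relies on.
for tool in pandoc weasyprint; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If both are found, pandoc --version and weasyprint --version will show which versions you have.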

Commands

This is what you'll need to run to generate the outputs. I'll show you the commands, then what you need in each file.

HTML:

pandoc dissertation.md \
   --citeproc --metadata-file=metadata.yml --embed-resources --standalone \
    -o ../output/dissertation.html

ePub:

pandoc dissertation.md \
   --citeproc --metadata-file=metadata.yml --epub-embed-font=fonts/font.ttf \
   -o ../output/dissertation.epub

PDF:

pandoc dissertation.md \
   --citeproc --metadata-file=metadata.yml --pdf-engine=weasyprint \
   -o ../output/dissertation.pdf
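
The three commands differ only in a couple of flags and the output extension, so they can be wrapped in a loop. This is a sketch rather than a polished build script: extra_flags and build_all are hypothetical names, and creating ../output with mkdir -p is my assumption that the directory may not exist yet.

```shell
#!/bin/sh
# Hypothetical helper: print the format-specific flags from the commands above.
extra_flags() {
    case "$1" in
        html) echo "--embed-resources --standalone" ;;
        epub) echo "--epub-embed-font=fonts/font.ttf" ;;
        pdf)  echo "--pdf-engine=weasyprint" ;;
        *)    echo "unknown format: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build every format in turn, stopping at the first failure.
build_all() {
    # Assumption: the output directory may not exist yet.
    mkdir -p ../output || return 1
    for fmt in html epub pdf; do
        # $(extra_flags ...) is left unquoted on purpose so it splits into separate flags.
        pandoc dissertation.md \
            --citeproc --metadata-file=metadata.yml $(extra_flags "$fmt") \
            -o "../output/dissertation.$fmt" || return 1
    done
}
```

Paste it into a shell (or source the file), then run build_all from the directory that holds dissertation.md.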

Metadata

You'll need a metadata file called metadata.yml. It must start and finish with ---

---
lang: en-GB
title: My awesome dissertation
bibliography: Bibliography.bib
csl: harvard-cite-them-right.csl
link-citations: true
reference-section-title: References
author: Terence Eden
creator: Terence Eden
rights: 🄯 CC BY-NC 4.0
keywords: [Some, comma, separated, keywords]
date: 2023-04-24
description: A dissertation about stuff
css: style.css
cover-image: media/cover.jpg
---
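
The start-and-finish rule is easy to check from the shell. A minimal sketch, assuming the file ends right after the closing --- (a trailing blank line will make it fail); check_delimiters is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the first and last lines are exactly ---
check_delimiters() {
    [ "$(head -n 1 "$1")" = "---" ] && [ "$(tail -n 1 "$1")" = "---" ]
}

# Example use:
#   check_delimiters metadata.yml && echo "delimiters look fine"
```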

Hopefully all those entries are self-explanatory. Now let's go into each extra file that this requires!

Bibliography

I assume that you're using Zotero or some other citation manager. Export all your citations in Better BibTeX format as Bibliography.bib.

You may need to install a Better BibTeX plugin.
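
Once the export exists, it can be worth checking that every key you cite is actually in it. The sketch below is a rough heuristic, not a BibTeX parser: cited_keys, defined_keys and missing_keys are hypothetical helper names, anything in the Markdown that looks like @key is treated as a citation (so email addresses will be false positives), and each entry in the .bib file is assumed to start on its own line.

```shell
#!/bin/sh
# Hypothetical helper: list citation keys used in a Markdown file, one per line.
cited_keys() {
    grep -oE '@[A-Za-z0-9_][A-Za-z0-9_:.-]*' "$1" \
        | sed -e 's/^@//' -e 's/[:.-]*$//' \
        | sort -u
}

# Hypothetical helper: list the keys defined in a BibTeX file, one per line.
defined_keys() {
    sed -n 's/^@[A-Za-z]*{[[:space:]]*\([^,[:space:]]*\),.*$/\1/p' "$1" | sort -u
}

# Hypothetical helper: print keys that are cited but not defined.
missing_keys() {
    md_keys=$(mktemp) && bib_keys=$(mktemp) || return 1
    cited_keys "$1" > "$md_keys"
    defined_keys "$2" > "$bib_keys"
    # comm -23 keeps lines that appear only in the first (sorted) file.
    comm -23 "$md_keys" "$bib_keys"
    rm -f "$md_keys" "$bib_keys"
}

# Example use:
#   missing_keys dissertation.md Bibliography.bib
```

No output means every cited key was found.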

Citation Style Language - CSL

Your citation manager probably already has your preferred CSL format. If not, you can download the .csl file from Zotero or from the CSL GitHub.

Layout CSS

The layout for HTML, ePub, and PDF is all controlled by CSS. Here's a minimum viable stylesheet. It's designed for A4 paper as well as screens. Figures and tables are kept whole rather than being split across printed pages. Each section also starts on a new page.

I've commented any interesting bits. Feel free to add to it for your own use:

@page {
    /* Set paper options */
    size: A4;
    margin: 1cm;
}

@media print {
    /* Stop figures & tables from being broken */
    figure, table {
      break-inside: avoid-page;
    }

    /*  Put each section on a new page*/
    h2 {
        break-before: always;
    }
}

@media screen {
    body {
        max-width: 45em;
        margin-left: auto;
        margin-right: auto;
    }
}

@font-face {
    font-family: "My-Font";
    /* WOFF2 for web, TTF for ePub */
    src:
        url("fonts/font.ttf"),
        url("fonts/font.woff2") format("woff2");
}

html {
    font-size: 1em;
}

body {
    text-align: justify;
    font-family: "My-Font", sans-serif;
}

a {
    color: #001aff;
}

img {
    max-width: 100%;
    margin: auto;
    display: block;
}

/* Make references spaced out better */
#refs > div {
    padding-bottom: 1em;
}

figure {
    border: 1px solid gray;
    max-width: fit-content;
}

figcaption {
    font-size: .9em;
    color: rgb(24, 24, 24);
    text-align: center;
}

table {
    width: 100%;
    border-collapse: collapse;
    font-size: .9em;
}
td {
    padding: .5em;
    border: 1px solid black;
    text-align: left;
}

h2 {
    text-align: left;
}

p {
    margin:.5em;
}

Fonts

If you want a custom font, you'll need it in TTF and WOFF2 format. Not all eReaders can use WOFF2, so TTF is needed as a fallback. I recommend using FontSquirrel to generate the font formats.
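
By this point there are quite a few supporting files, and pandoc's errors about a missing one aren't always obvious. A minimal preflight sketch: missing_files is a hypothetical helper name, and the file list is simply the set of names used in this post's metadata, commands and stylesheet, so adjust it if yours differ.

```shell
#!/bin/sh
# Hypothetical helper: print each supporting file that is missing from the given directory.
missing_files() {
    for f in metadata.yml Bibliography.bib harvard-cite-them-right.csl \
             style.css media/cover.jpg fonts/font.ttf fonts/font.woff2; do
        [ -e "$1/$f" ] || echo "$f"
    done
}

# Example use, from the directory that holds dissertation.md:
#   missing_files .
```

It prints nothing when everything is in place.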

Markdown

OK, now here we go! You can use a mix of Markdown and HTML.

A few points that may not be obvious:

  • No need for a # top level heading. Pandoc will insert one for you.
  • No need for a table of contents. Again, let Pandoc do it.
  • You will need to manually add lists for figures, acronyms, tables, glossary, etc.
  • I like to use internal links in Markdown. For example:

## 1. Introduction {#1-introduction}

### 1.1 Context {#1-1-context}

  • That lets you write: as can be seen [in Methodology](#2-methodology)
  • Speaking of which, I couldn't find a good way to do figures and captions in Markdown, so I reverted to HTML:

<figure id="figure-01">
   <figcaption>Figure 01 - A fictional depiction of 3D visualisations of Cybersecurity interfaces</figcaption>
   <img src="media/jurassic.jpg" width="" alt="Screenshots from the movie Jurassic Park. A young woman looks at a 3D display on a monitor. She exclaims It's a UNIX System. I know this.">
   <figcaption>[@spielbergJurassicPark1993]</figcaption>
</figure>

  • To control where the reference list appears, add a div with the id refs. Pandoc places the bibliography inside it:

<div id="refs">
## References
</div>
So, a sample document will look something like:

## 1. Introduction {#1-introduction}

### 1.1 Context {#1-1-context}
Blah blah blah

![Alt text.](media/photo.jpg)

Or

<figure id="figure-01">
   <figcaption>Figure 01</figcaption>
   <img src="media/example.jpg" width="" alt="Description.">
   <figcaption>[@spielbergJurassicPark1993]</figcaption>
</figure>

<div id="refs">
## References
</div>

## Appendix
* Some
* Bullets
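
Internal links fail silently if the target doesn't match an id exactly, and ids are case-sensitive. The sketch below lists link targets that have no matching id. It is a rough check under stated assumptions: broken_links is a hypothetical helper name, and it only knows about explicit {#id} attributes and HTML id="..." attributes, not the ids pandoc generates automatically for headings without one.

```shell
#!/bin/sh
# Hypothetical helper: print internal link targets with no matching explicit id.
broken_links() {
    ids=$(mktemp) && links=$(mktemp) || return 1
    # Explicit heading ids such as {#1-introduction}, plus HTML id="..." attributes.
    {
        grep -oE '\{#[^} ]+' "$1" | sed 's/^{#//'
        grep -oE 'id="[^"]+"' "$1" | sed -e 's/^id="//' -e 's/"$//'
    } | sort -u > "$ids"
    # Link targets such as ](#1-introduction)
    grep -oE '\]\(#[^)]+\)' "$1" | sed -e 's/^](#//' -e 's/)$//' | sort -u > "$links"
    # comm -23 keeps lines that appear only in the first (sorted) file.
    comm -23 "$links" "$ids"
    rm -f "$ids" "$links"
}

# Example use:
#   broken_links dissertation.md
```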

Is it worth it?

Well, now, there's the question! You can just export these formats from Google Docs, Office 365, etc. But this method gives you much more control over what the end-product looks like.

Isn't yak-shaving fun⸮


2 thoughts on “Using Pandoc to format a Dissertation from Markdown to HTML, PDF, and ePub”

    1. @edent says:

      If you read the MSc itself, there is an ORCiD. But I couldn't find a suitable way to put it directly in the metadata and have it show in the document.
