Using Pandoc to format a Dissertation from Markdown to HTML, PDF, and ePub
Metawork is so much more fun than real work. Sharpening your pencils. Colour coordinating your filing system. Creating Gantt charts of what you intend to do. Marvellous!
In that spirit, here's how I used the venerable pandoc to convert my MSc dissertation from .md into a variety of more readable formats.
Prep
I've no idea what you already have installed on your system but, at a minimum, you need to install the latest version of pandoc and you'll need a modern version of the weasyprint library.
I found pandoc's dependencies... interesting. Depending on your operating system, you may find yourself having to install all sorts of esoteric libraries.
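Before going further, it may help to confirm that both tools are actually on your PATH. This is a minimal sketch, assuming Python 3 is available and that WeasyPrint was installed in a way that provides a `weasyprint` executable; the function names are made up for illustration.

```python
import shutil


def missing_tools(tools=("pandoc", "weasyprint")):
    """Return the names of any tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def report(tools=("pandoc", "weasyprint")):
    """Build a one-line, human-readable summary of the check."""
    missing = missing_tools(tools)
    if missing:
        return "Missing: " + ", ".join(missing)
    return "All tools found"
```

Calling `print(report())` tells you which of the two still needs installing. It only checks that the executables exist, not that their versions are recent enough.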
Commands
This is what you'll need to run to generate the outputs. I'll show you the commands, then what you need in each file.
HTML:
pandoc dissertation.md \
--citeproc --metadata-file=metadata.yml --embed-resources --standalone \
-o ../output/dissertation.html
ePub:
pandoc dissertation.md \
--citeproc --metadata-file=metadata.yml --epub-embed-font=fonts/font.ttf \
-o ../output/dissertation.epub
PDF:
pandoc dissertation.md \
--citeproc --metadata-file=metadata.yml --pdf-engine=weasyprint \
-o ../output/dissertation.pdf
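If retyping three commands gets tedious, they can be wrapped in a small script. The sketch below only rebuilds the argument lists shown above and hands them to pandoc; the function names are hypothetical, and creating the output directory first is my assumption rather than part of the original workflow.

```python
import subprocess
from pathlib import Path

# Options shared by all three commands above.
COMMON = ["--citeproc", "--metadata-file=metadata.yml"]

# Per-format options, copied from the commands above.
EXTRA = {
    "html": ["--embed-resources", "--standalone"],
    "epub": ["--epub-embed-font=fonts/font.ttf"],
    "pdf": ["--pdf-engine=weasyprint"],
}


def pandoc_command(fmt, out_dir="../output"):
    """Return the pandoc argument list for one output format."""
    target = f"{out_dir}/dissertation.{fmt}"
    return ["pandoc", "dissertation.md", *COMMON, *EXTRA[fmt], "-o", target]


def build_all(out_dir="../output", run=subprocess.run):
    """Run pandoc once per format, stopping at the first failure."""
    # Assumption: the output directory may not exist yet, so create it.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for fmt in EXTRA:
        run(pandoc_command(fmt, out_dir), check=True)
```

Run `build_all()` from the directory that contains dissertation.md. The `run` parameter exists only so the pandoc call can be swapped out when testing.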
Metadata
You'll need a metadata file called metadata.yml. It must start and finish with ---
---
lang: en-GB
title: My awesome dissertation
bibliography: Bibliography.bib
csl: harvard-cite-them-right.csl
link-citations: true
reference-section-title: References
author: Terence Eden
creator: Terence Eden
rights: 🄯 CC BY-NC 4.0
keywords: [Some, comma, separated, keywords]
date: 2023-04-24
description: A dissertation about stuff
css: style.css
cover-image: media/cover.jpg
---
Hopefully all those entries are self-explanatory. Now let's go into each extra file that this requires!
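Several of those entries point at other files, and a mistyped path is easy to miss. Here is a rough sketch that lists any referenced file that doesn't exist. It assumes the flat `key: value` layout shown above and deliberately avoids a real YAML parser, so it won't cope with nested or multi-line values; the function names are illustrative.

```python
from pathlib import Path

# Metadata keys whose values are file paths in the example above.
FILE_KEYS = ("bibliography", "csl", "css", "cover-image")


def referenced_files(metadata_text):
    """Naively collect `key: value` pairs for the file-valued keys."""
    found = {}
    for line in metadata_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in FILE_KEYS:
            # Drop surrounding whitespace and any simple quoting.
            found[key.strip()] = value.strip().strip("'\"")
    return found


def missing_files(metadata_text, base_dir="."):
    """Return the referenced paths that do not exist under base_dir."""
    base = Path(base_dir)
    paths = referenced_files(metadata_text).values()
    return [p for p in paths if not (base / p).is_file()]
```

For example, `missing_files(Path("metadata.yml").read_text(encoding="utf-8"))` returns an empty list when everything is in place.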
Bibliography
I assume that you're using Zotero or some other citation manager. Export all your citations in Better BibTeX format as Bibliography.bib.
You may need to install a Better BibTeX plugin.
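Once the export is done, it can be worth checking that every `[@key]` citation in the Markdown has a matching entry in Bibliography.bib. The sketch below is a rough approximation: the regular expressions cover simple keys like the ones Better BibTeX generates, not the full pandoc or BibTeX grammars, and the function names are made up.

```python
import re

# A citation key after "@", not preceded by a word character (skips emails).
CITE_RE = re.compile(r"(?<!\w)@([A-Za-z0-9_][A-Za-z0-9_:.-]*)")

# The key of a BibTeX entry such as "@article{someKey2020,".
BIB_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(markdown):
    """Collect citation keys used in the Markdown text."""
    # Trailing punctuation belongs to the sentence, not the key.
    return {key.rstrip(".:-") for key in CITE_RE.findall(markdown)}


def bib_keys(bibtex):
    """Collect entry keys defined in the BibTeX text."""
    return set(BIB_RE.findall(bibtex))


def undefined_citations(markdown, bibtex):
    """Return cited keys that have no entry in the bibliography."""
    return sorted(cited_keys(markdown) - bib_keys(bibtex))
```

Feed it the contents of dissertation.md and Bibliography.bib. Anything it returns is a citation that pandoc would not be able to resolve.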
Citation Style Language - CSL
Your citation manager probably already has your preferred CSL format. If not, you can download the .csl file from Zotero or from the CSL GitHub.
Layout CSS
The layout for HTML, ePub, and PDF is all controlled by CSS. Here's a minimum viable stylesheet. It's designed for A4 paper as well as screens. Figures and tables stay on their own printed page. Sections also get a page-break.
I've commented any interesting bits. Feel free to add to it for your own use:
@page {
/* Set paper options */
size: A4;
margin: 1cm;
}
@media print {
/* Stop figures & tables from being broken */
figure, table {
break-inside: avoid-page;
}
/* Put each section on a new page*/
h2 {
break-before: always;
}
}
@media screen {
body {
max-width: 45em;
margin-left: auto;
margin-right: auto;
}
}
@font-face {
font-family: "My-Font";
/* WOFF2 for web, TTF for ePub */
src:
url("fonts/font.ttf"),
url("fonts/font.woff2") format("woff2");
}
html {
font-size: 1em;
}
body {
text-align: justify;
font-family: "My-Font", sans-serif;
}
a {
color: #001aff;
}
img {
max-width: 100%;
margin: auto;
display: block;
}
/* Make references spaced out better */
#refs > div {
padding-bottom: 1em;
}
figure {
border: 1px solid gray;
max-width: fit-content;
}
figcaption {
font-size: .9em;
color: rgb(24, 24, 24);
text-align: center;
}
table {
width: 100%;
border-collapse: collapse;
font-size: .9em;
}
td {
padding: .5em;
border: 1px solid black;
text-align: left;
}
h2 {
text-align: left;
}
p {
margin:.5em;
}
Fonts
If you want a custom font, you'll need it in TTF and WOFF2 format. Not all eReaders can use WOFF2, so TTF is needed as a fallback. I recommend using FontSquirrel to generate the font formats.
Markdown
OK, now here we go! You can use a mix of Markdown and HTML.
A few points that may not be obvious:
- No need for a # top-level heading. Pandoc will insert one for you.
- No need for a table of contents. Again, let Pandoc do it.
- You will need to manually add lists for figures, acronyms, tables, glossary, etc.
- I like to use internal links in Markdown. For example
## 1. Introduction {#1-introduction}
### 1.1 Context {#1-1-context}
- That lets you write
as can be seen [in Methodology](#2-methodology)
- Speaking of which, I couldn't find a good way to do figures and captions in Markdown, so I reverted to HTML:
<figure id="figure-01">
<figcaption>Figure 01 - A fictional depiction of 3D visualisations of Cybersecurity interfaces</figcaption>
<img src="media/jurassic.jpg" width="" alt="Screenshots from the movie Jurassic Park. A young woman looks at a 3D display on a monitor. She exclaims It's a UNIX System. I know this.">
<figcaption>[@spielbergJurassicPark1993]</figcaption>
</figure>
- Notice the way citations are done there? I, again, use Zotero and a VS Code plugin to insert citations. Don't attempt to do them by hand!
- No need to add your own list of references. Pandoc will add them where it sees:
<div id="refs">
## References
</div>
So, a sample document will look something like:
## 1. Introduction {#1-introduction}
### 1.1 Context {#1-1-context}
Blah blah blah
![Alt text.](media/photo.jpg)
Or
<figure id="figure-01">
<figcaption>Figure 01</figcaption>
<img src="media/example.jpg" width="" alt="Description.">
<figcaption>[@spielbergJurassicPark1993]</figcaption>
</figure>
<div id="refs">
## References
</div>
## Appendix
* Some
* Bullets
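Internal links are easy to break, because the fragment after `#` has to match the target id exactly, including case. Here is a small sketch that flags links with no matching id. It only knows about explicit `{#id}` heading attributes and `id="..."` attributes in raw HTML, not the ids pandoc generates automatically for headings; the names are illustrative.

```python
import re

# Explicit ids: "{#some-id}" on headings, or id="some-id" in raw HTML.
ID_RE = re.compile(r'\{#([^\s}]+)[^}]*\}|\bid="([^"]+)"')

# Targets of internal Markdown links such as "[text](#some-id)".
LINK_RE = re.compile(r"\]\(#([^)\s]+)\)")


def defined_ids(markdown):
    """Collect every explicitly declared id in the document."""
    return {heading_id or html_id for heading_id, html_id in ID_RE.findall(markdown)}


def broken_links(markdown):
    """Return internal link targets that match no declared id."""
    ids = defined_ids(markdown)
    return sorted({target for target in LINK_RE.findall(markdown) if target not in ids})
```

A link to `#2-Methodology` would be reported if the heading was declared as `{#2-methodology}`, because the comparison is case-sensitive.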
Is it worth it?
Well, now, there's the question! You can just export these formats from Google Docs, Office 365, etc. But this method gives you much more control over what the end-product looks like.
Isn't yak-shaving fun⸮
Andy Mabbett says:
No ORCID iD (or other PID) in your metadata? No URI?
@edent says:
If you read the MSc itself, there is an ORCiD. But I couldn't find a suitable way to put it directly in the metadata and have it show in the document.