Shakespeare Serif - an experimental font based on the First Folio
Disclaimer! Work In Progress! See source code.
I recently read this wonderful blog post about using 17th Century Dutch fonts on the web. And, because I'm an idiot, I decided to try and build something similar using Shakespeare's first folio as a template.
Now, before setting off on a journey, it is worth seeing if anyone else has tried this before. I found David Pustansky's First Folio Font. There's not much info about it, other than it's based on the 1623 folio. It's a nice font, but missing brackets and a few other pieces of punctuation. Also, no ligatures. And the long s is in the wrong place.
So, let's try to build a font!
You can read how it works, or skip straight to the demo.
Get some scans
There are various scans of the First Folio. I picked The Bodlian's scan as it seemed the highest resolution.
I plucked a couple of pages at random to see what I could find. Of course, a modern font can't replicate the vagaries of hot metal printing. As you can see here, each letter "y" is substantially different.
Within the plays, there are some italic characters - which could be used to make a variant font. You can also see just how poor quality some of the letters are.
There are also plenty of ligatures to choose from:
Ready? Let's go!
Extract the characters
This Python code reads in an image file. It then extracts every distinct letter, number, and punctuation mark. It then detects which character it is and saves each glyph to disk with a filename like this:
As you can see, the text detection is good, but the letter recognition is poor.
Python 3import cv2
import pytesseract
from PIL import Image
def preprocess_image(image_path):
# Load the image using OpenCV
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
# Thresholding to convert to binary image
_, binary_image = cv2.threshold(image, 128, 255, cv2.THRESH_BINARY_INV)
# Find contours to isolate individual letters
contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
return image, contours
def extract_and_save_letters(image, contours, output_directory):
# Create output directory if it doesn't exist
import os
if not os.path.exists(output_directory):
os.makedirs(output_directory)
for i, contour in enumerate(contours):
x, y, w, h = cv2.boundingRect(contour)
# Crop and save each letter as a separate image
letter_image = image[y:y + h, x:x + w]
# (Don't) Perform OCR to extract the text (letter) from the contour
letter_text = "_"
#letter_text = pytesseract.image_to_string(letter_image, config='--psm 10')
#letter_text = letter_text.strip() # Remove leading/trailing whitespace
# Create a filename with the detected letter
letter_filename = f"letter_{letter_text}_{i}.png"
letter_path = os.path.join(output_directory, letter_filename)
cv2.imwrite(letter_path, letter_image)
if __name__ == "__main__":
input_image_path = "letters.jpg"
output_directory = "/tmp/letters/"
# Preprocess the image
image, contours = preprocess_image(input_image_path)
# Perform OCR and save individual letters
extract_and_save_letters(image, contours, output_directory)
Something to note - the CHAIN_APPROX_SIMPLE
is looking for contiguous characters. So it loses the dots from i
, j
, :
, and ;
. But it is quick.
Detecting Dots
In order to get glyphs which vertically separate, we need to vertically erode the image so it looks like this:
Python 3# Erode the image vertically
kernel = np.array([[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 0]], dtype=np.uint8)
erode = cv2.erode(image, kernel,iterations = 6)
We use this eroded image for contiguous detection - but we do the actual cropping on the original image.
As you can see, it does make some character touch each other - which means you get occasional crops like this:
They can either be manually split, or ignored.
Put each letter into a folder
There's no automated way to do this. It's just a lot of tedious dragging and dropping. It's hard to tell the difference between o and O, or commas and apostrophes.
Ideally we want several of each glyph because we're about to...
Find the average letterform
Here's a selection of letter "e" images which were extracted.
I didn't want to make some rather arbitrary decisions on which letters I like best. So I cheated.
I copied all the letter "e"s into a folder. I used Python to create the average letter based on the two-dozen or so that I'd extracted. This code takes all the images in a directory, and spits out a 1bpp average letter - like this:
Python 3import os
import numpy as np
import argparse
import math
from PIL import Image
def get_arguments():
ap = argparse.ArgumentParser()
ap.add_argument('-l', '--letter', type=str,
help='The letter you want to average')
arguments = vars(ap.parse_args())
return arguments
def load_and_resize_images_from_directory(directory, target_size):
image_files = [f for f in os.listdir(directory) if f.endswith(".png")]
images = []
for image_file in image_files:
image_path = os.path.join(directory, image_file)
print("Reading " + image_path)
image = Image.open(image_path).convert("L") # Convert to grayscale
# Create a new white background image
new_size = (target_size[0], target_size[1])
new_image = Image.new("L", new_size, color=255) # White background
old_width, old_height = image.size
# Center the image
x1 = int(math.floor((target_size[0] - old_width) / 2))
y1 = int(math.floor((target_size[1] - old_height) / 2))
# Paste the image at the center
new_image.paste(image, (x1, y1, x1 + old_width, y1 + old_height))
# Make it larger to see if that improves the curve detection
new_image = new_image.resize( (600,600), Image.LANCZOS)
images.append(new_image)
return images
def calculate_average_image(images):
# Convert the list of images to numpy arrays
images_array = [np.array(img) for img in images]
# Calculate the average image along the first axis
average_image = np.mean(images_array, axis=0)
return average_image
def convert_to_1bpp(average_image, threshold=120):
# Convert the average image to 1bpp by setting a threshold value
binary_image = np.where(average_image >= threshold, 255, 0).astype(np.uint8)
return binary_image
def save_1bpp_image(binary_image, output_path):
# Convert the numpy array to a binary image
binary_image = Image.fromarray(binary_image, mode="L")
# Save the 1bpp monochrome image to the specified output path
binary_image.save(output_path)
if __name__ == "__main__":
args = get_arguments()
letter = args['letter']
input_directory = "../letters/" + letter + "/"
output_png_path = "../letters/" + letter + ".png"
target_size = (75, 75) # Set the desired target size for resizing
# Load, resize, and add border to all images from the directory
images = load_and_resize_images_from_directory(input_directory, target_size)
# Calculate the average image
average_image = calculate_average_image(images)
# Convert the average image to 1bpp
binary_image = convert_to_1bpp(average_image)
# Save the 1bpp monochrome image
save_1bpp_image(binary_image, output_png_path)
One Big Image
The next step is to create a single image which holds all of the glyphs. Our good friend ImageMagick comes to the rescue here:
montage *.png -tile 12x8 -geometry +10+10 all_glyphs.png
That takes all of the average symbol .png files and places them on a single image. It looks like this:
Trace Those Glyphs
The GlyphTracer App will take the image and generates a Spline Font Database. It isn't the most intuitive app to use. But after a bit of clicking around you can work out how to assign each image to a glyph.
GlyphTracer uses potrace which turns those raggedy rasters into smoothly curved paths.
Once done, we're on to the next step.
Forge Those Fonts!
The venerable FontForge will open the SFD and show us what the proto-font looks like:
As you can see, all the letters have been vertically centred. So double tap and edit their position - you can also adjust the curves if you like:
The final result looks something like this:
FontForge's "File" ➡ "Generate Font" will let you save the output as TTF, WOFF2, or anything else you want.
Demo!
Here's what the font looks like when rendered on the web:
Two houſeholds, both alike in dignity!
Alas poor Yorik; I knew him Horatio.
To be? Or not to be? That's the uestion.
Bump sickly, vexing wizard! Be sly, fox, and charm the dragon's breath.
TODO
- Get more sample images from the 1st Folio.
- Extract more letters, numbers, ligatures, and symbols.
- Sort symbols into sub-directories.
- Generate font with complete alphabet.
- Tidy up curves.
- Set correct height, ascenders, descenders, etc.
- Make the ligatures automatic.
- Other font stuff that I haven't even thought of yet!
Want to help out? See the source code on GitHub.
Eric Meyer said on mastodon.social:
@Edent I love older fonts (IM Fell is one of the fonts used on my site) and this is fantastic.
Also, with your post you’ve given me the tools to jump-start a font recreation project I’ve had in mind for a bit, so thank you!
Shaula Evans said on zirk.us:
@Edent My heart! This is stunning.
R.J. Faas said on mastodon.social:
@Edent Such an amazing project. Makes me wish I was still teaching typography to community college students (without making the 90 mile drive to campus).
@edent says:
Further discussion on Lemmy - https://beehaw.org/post/6867184
Ben Schmidt said on sigmoid.social:
@Edent Can't figure out how to comment on Lemmy, but--have you ever encountered the Print & Probability project by @mdlincoln and others? They have, if I understand correctly, some kind of API for accessing individual characters in 17th century English books that might interesting to use here. https://printprobdb.psc.edu/api/docs/#operation/characters_update Print & Probability
Nicole Sullivan said on front-end.social:
@Edent I love this! What a cool project.
Rockwalrus says:
You can have several variants of a glyph in a Unicode font and have them display semi-randomly. See this link for several examples: https://opentypecookbook.com/common-techniques/ (search for randomization.)
Matthew Malthouse said on mstdn.social:
@Edent My first thought was, surely someone must have… And I was really surprised that no one seems to have. Nice job!
z3z said on mastodon.scot:
@Edent This is amazing!
More comments on Mastodon.