Banish the � with Unifont

by @edent | 7 comments | Read ~4,982 times.

The GNU Unifont project is amazing. It contains every Unicode glyph in one single file! I am going to argue that you should bundle it with your apps, your operating systems, and - at a pinch - your websites.

Unifont is a perfect fallback font. If your app or website uses a Unicode character that isn't supported on a device, the user will usually see � - a replacement character. If you include Unifont, they'll see the correct character.

There are two downsides:

  1. The TTF font is 12MB.
  2. The font design is... Spartan... Each character is a maximum of 16-by-16 pixels.

It is good enough for "Latin" (Latin characters with accents), "CJK" (Chinese, Japanese, and Korean characters), and Emoji (lots of Emoji).

Compression

Using the WOFF2 format, Unifont compresses to a svelte 1.5MB. I am not arguing that you should serve an extra couple of MB on every webpage (although modern sites are so bloated it probably doesn't matter...) - but perhaps you should bundle it with your webview apps.

Use on the web

I've converted it, so you can directly download unifont-12.0.01.woff2 which covers all Basic Multilingual Plane characters. You can also download the Upper Planes for Emoji and more esoteric characters.

If anyone knows how to combine WOFF2 fonts - please let me know!

CSS

@font-face {
    /* Basic Multilingual Plane */
    font-family: "unifont";
    src: url("unifont-12.0.01.woff2") format("woff2");
}
@font-face {
    /* Upper planes: Emoji and more esoteric characters */
    font-family: "unifont-upper";
    src: url("unifont_upper-12.0.01.woff2") format("woff2");
    unicode-range: U+10000-10FFFF;
}

body {
    font-family: "unifont", "unifont-upper";
}

You can use it on your webpage: it is licensed under GPLv2 with the font embedding exception.
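If you want Unifont purely as a fallback rather than as the main typeface, something like the following should work. Treat it as a rough sketch: it assumes the two WOFF2 files sit next to the stylesheet, Georgia is only a placeholder for whatever font your site really uses, and the unicode-range on the first rule is an addition that is not in the CSS above. Giving both rules the same family name is standard CSS; the browser uses unicode-range to decide which file covers a given character.

```css
/* Both files share one family name; unicode-range says which file
   covers which code points. */
@font-face {
  font-family: "unifont";
  src: url("unifont-12.0.01.woff2") format("woff2");
  unicode-range: U+0000-FFFF;      /* Basic Multilingual Plane */
}
@font-face {
  font-family: "unifont";
  src: url("unifont_upper-12.0.01.woff2") format("woff2");
  unicode-range: U+10000-10FFFF;   /* upper planes: Emoji and friends */
}

body {
  /* Georgia is a placeholder. Unifont is only consulted for
     characters the earlier font cannot draw. */
  font-family: Georgia, "unifont", serif;
}
```

With this arrangement the stylesheet only ever mentions one name, and browsers can generally skip downloading a file whose range never appears on the page.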

Demo!

You can see typography examples on my demo page.

Does it work for every language?

Almost!

Unifont only stores one glyph per printable Unicode code point. This means that complex scripts with special forms for letter combinations including consonant combinations and floating vowel marks such as with Indic scripts (Devanagari, Bengali, Tamil, etc.) or letters that change shape depending upon their position in a word (Indic and Arabic scripts) will not render well in Unifont. In those cases, Unifont is only suitable as a font of last resort. Users wishing to properly render such complex scripts should use full OpenType fonts that faithfully display such alternate forms.
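In CSS terms, "font of last resort" just means putting Unifont at the end of the list, after fonts that know how to shape those scripts. Here is a minimal sketch, assuming the two Noto fonts named below are installed on the device or loaded by their own @font-face rules (they are examples picked for illustration, not something the Unifont project recommends), and that Georgia stands in for your real body font.

```css
body {
  font-family:
    Georgia,                 /* placeholder for the site's main font */
    "Noto Sans Devanagari",  /* assumed available: shapes conjuncts and vowel marks */
    "Noto Naskh Arabic",     /* assumed available: positional letter forms */
    "unifont",               /* last resort for the rest of the BMP */
    "unifont-upper";         /* last resort for the upper planes */
}
```

The browser walks the list per character, so Devanagari and Arabic text is drawn by the shaping-capable fonts when they are present and only falls through to Unifont when they are not.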

Are you serious?

Pretty much! Like I said earlier, including an extra couple of MB isn't always necessary - but it isn't excessive on complex sites.
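If the download size worries you, the font-display descriptor can at least stop the page from waiting on it. This is only a sketch under the assumption that the file is served from the same directory as the stylesheet; the descriptor itself is standard CSS.

```css
@font-face {
  font-family: "unifont";
  src: url("unifont-12.0.01.woff2") format("woff2");
  /* Draw text with whatever fallback is available straight away,
     then swap Unifont in once the file has arrived. */
  font-display: swap;
}
```

The trade-off is a brief flash where unsupported characters still show as �, until the font finishes loading.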

Ideally, modern operating systems would include all the characters you'd ever need and be regularly updated. Sadly, that's rarely the case.

One of the best things about Unifont is that it is regularly updated. Usually there are several updates each year - mostly as soon as there's a new Unicode release.

Sure, the font isn't the prettiest in the world. But I will always prefer a pixelated symbol to a �.

7 thoughts on “Banish the � with Unifont”

  1. mart-e says:

    If anyone knows how to combine WOFF2 fonts - please let me know!

    You can actually just use the same font-family name in two different @font-face rules and trust the unicode-range to select the right one
    https://jakearchibald.com/2017/combining-fonts/
    https://css-tricks.com/whats-deal-declaring-font-properties-font-face/#article-header-id-1

    1. @edent says:

      Thanks. But, for tidiness, I'd like to have them all in one file. I can't find a way to do that.

  2. pomax says:

    I'm curious to know how you fit every glyph, when the spec makes that impossible due to there being way more code points than fit in a USHORT, the data type used for glyph ids...

    1. @edent says:

      Magic? I don't know. I'm just going by what the Unifont website says. And the fact that FontForge runs out of memory whenever I try to manipulate them.

      1. Pomax says:

        It very explicitly does not. The project website rather clearly states "This page contains the latest release of GNU Unifont, with glyphs for every printable code point in the Unicode Basic Multilingual Plane (BMP). The BMP occupies the first 65,536 code points", which is an incredibly far cry from "The GNU Unifont project is amazing. It contains every Unicode glyph in one single file!".

    2. Matt M says:

      Looks like they split the BMP (one ushort's worth) and the astral planes into two fonts, which makes this work.

  3. Barney says:

    This is great! An ideal companion to this app I just built for typeahead character searching. Thanks! https://asapacan.github.io/charsearch/
