Banish the � with Unifont


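The GNU Unifont project is amazing. It contains every Unicode glyph in one single file! (Strictly speaking, the main file covers every printable code point in the Basic Multilingual Plane, and a companion file covers the upper planes.) I am going to argue that you should bundle it with your apps, your operating systems, and - at a pinch - your websites.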

Unifont is a perfect fallback font. If your app or website uses a Unicode character which isn't supported on a device, the user will usually see � - a replacement character. If you include Unifont, they'll see the correct character.

There are two downsides:

  1. The TTF font is 12MB.
  2. The font design is... Spartan... Each character is a maximum of 16-by-16 pixels.

Good enough for Latin (Latin characters with accents), CJK (Chinese, Japanese, and Korean pictograms), and Emoji (lots of Emoji).

Compression

Using the WOFF2 format, Unifont compresses to a svelte 1.5MB. I am not arguing that you should serve an extra couple of MB on every webpage (although modern sites are so bloated it probably doesn't matter...) - but perhaps you should bundle it with your webview apps.

Use on the web

I've converted it, so you can directly download unifont-12.0.01.woff2, which covers all Basic Multilingual Plane characters. You can also download the Upper Planes file for Emoji and more esoteric characters.

If anyone knows how to combine WOFF2 fonts - please let me know!

CSS

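/* Basic Multilingual Plane glyphs (U+0000-FFFF) */
@font-face {
    font-family: "unifont";
    src: url("unifont-12.0.01.woff2") format("woff2");
}

/* Upper planes: Emoji and more esoteric characters */
@font-face {
    font-family: "unifont-upper";
    src: url("unifont_upper-12.0.01.woff2") format("woff2");
    unicode-range: U+10000-10FFFF;
}

body {
    font-family: "unifont", "unifont-upper";
}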

You can use it on your webpage: it is licensed under GPLv2 with the font embedding exception.
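
If you want Unifont purely as a fallback rather than as the main typeface, one possible arrangement is to list your preferred font first and Unifont after it. This is only a sketch: "Example Sans" is a placeholder family name that does not come from this post, and `font-display: swap` is an assumption about the loading behaviour you want.

```css
/* Same two files as above, declared with a loading hint. */
@font-face {
    font-family: "unifont";
    src: url("unifont-12.0.01.woff2") format("woff2");
    /* Assumption: show fallback text while the font file downloads. */
    font-display: swap;
}

@font-face {
    font-family: "unifont-upper";
    src: url("unifont_upper-12.0.01.woff2") format("woff2");
    unicode-range: U+10000-10FFFF;
    font-display: swap;
}

body {
    /* "Example Sans" is a hypothetical preferred font. The browser
       tries families in order, character by character, so Unifont
       is only used for glyphs the earlier font lacks. */
    font-family: "Example Sans", "unifont", "unifont-upper";
}
```

With this ordering, ordinary text keeps its normal appearance and only otherwise-missing characters fall through to the pixelated glyphs.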

Demo!

You can see typography examples on my demo page.

Does it work for every language?

Almost!

Unifont only stores one glyph per printable Unicode code point. This means that complex scripts with special forms for letter combinations including consonant combinations and floating vowel marks such as with Indic scripts (Devanagari, Bengali, Tamil, etc.) or letters that change shape depending upon their position in a word (Indic and Arabic scripts) will not render well in Unifont. In those cases, Unifont is only suitable as a font of last resort. Users wishing to properly render such complex scripts should use full OpenType fonts that faithfully display such alternate forms.
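
One way to keep Unifont out of the way for those scripts might be to narrow its `unicode-range` so that it never claims the affected blocks, leaving them to whatever fonts the device already has. The sketch below is untested, and the choice of which blocks to skip is illustrative rather than exhaustive.

```css
/* Sketch: declare Unifont only for BMP ranges outside a few
   complex-script blocks, so the browser's own fonts handle them.
   Skipped here (illustrative choice): Arabic U+0600-06FF and the
   Indic blocks U+0900-0DFF (Devanagari through Sinhala). */
@font-face {
    font-family: "unifont";
    src: url("unifont-12.0.01.woff2") format("woff2");
    unicode-range: U+0000-05FF, U+0700-08FF, U+0E00-FFFF;
}
```

The trade-off is that a device with no font at all for a skipped block will show a missing-glyph symbol there instead of Unifont's unshaped letters, so this only makes sense if you expect your users to have proper fonts for those scripts.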

Are you serious?

Pretty much! Like I said earlier, including an extra couple of MB isn't always necessary - but it isn't excessive on complex sites.

Ideally, modern operating systems would include all the characters you'd ever need and be regularly updated. Sadly, that's rarely the case.

One of the best things about Unifont is that it is regularly updated. Usually there are several updates each year - mostly as soon as there's a new Unicode release.

Sure, the font isn't the prettiest in the world. But I will always prefer a pixelated symbol to a �.


7 thoughts on “Banish the � with Unifont”

  1. says:

    I'm curious to know how you fit every glyph, when the spec makes that impossible due to there being way more code points than fit in a USHORT, the data type used for glyph ids...

    1. says:

      Magic? I don't know. I'm just going by what the Unifont website says. And the fact that FontForge runs out of memory whenever I try to manipulate them.

      1. says:

        It very explicitly does not. The project website rather clearly states "This page contains the latest release of GNU Unifont, with glyphs for every printable code point in the Unicode Basic Multilingual Plane (BMP). The BMP occupies the first 65,536 code points", which is an incredibly far cry from "The GNU Unifont project is amazing. It contains every Unicode glyph in one single file!".

    2. Matt M says:

      Looks like they split the BMP (one ushort worth) and astral plane into two fonts which makes this work.
