Is GitHub Racist?


One of the interesting aspects of privilege is how it lays bare our unconscious assumptions about the world. A male software developer may never consider that a user would want or need to change their name. Thus they would design a product which ignored the millions of women changing their names after marriage.

It's very tempting to see software as racist when, in reality, the root cause is more likely to be unconscious assumptions.

Take, for example, GitHub. You can host all of your software projects on there - as long as you speak English.

Wait? What?

Try adding a repository which contains, say, Chinese - and all those beautiful characters will be replaced with "-".

[Image: Chinese GitHub]
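To make that behaviour concrete, here is a minimal sketch of the kind of filter that would produce it. This is not GitHub's code: the function name `ascii_only_slug` and the exact set of allowed characters are my own assumptions, chosen purely for illustration.

```python
import re


def ascii_only_slug(name: str) -> str:
    """Hypothetical sanitiser (an assumption, not GitHub's real rule):
    every character outside a small ASCII set becomes a hyphen."""
    return re.sub(r"[^A-Za-z0-9._-]", "-", name)


print(ascii_only_slug("my-project"))  # unchanged: my-project
print(ascii_only_slug("中文项目"))     # four characters in, four hyphens out: ----
```

A filter like this never fails loudly. It quietly throws away the one part of the name that mattered to the user.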

I asked GitHub about this, and quickly got this reply.

Unfortunately, at the moment, you can only use ASCII (i.e. Windows-1252) characters in Repo names. Most things on GitHub.com support non-ASCII but because of limitations in Git, the repo name isn't one of them. Sorry about the international-unfriendliness

Interestingly, that's not quite the case. Windows-1252 contains some characters with accents - they simply aren't recognised by GitHub.
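The difference between the two encodings is easy to check. The sketch below uses only Python's built-in codecs (`ascii`, `cp1252` and `utf-8`); the helper name `fits` and the sample characters are mine, picked for illustration.

```python
def fits(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# One plain letter, two accented Latin letters, one Chinese character.
for ch in ["a", "é", "ü", "中"]:
    print(ch, fits(ch, "ascii"), fits(ch, "cp1252"), fits(ch, "utf-8"))
```

The accented letters fit in Windows-1252 but not in ASCII, and the Chinese character fits in neither. Only UTF-8 copes with all four.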

[Image: Accents GitHub]

We don't live in a homogeneous world. US English is not the global language. Even if it was, ASCII is insufficient to the task of information interchange.

ASCII was first published in 1963 - nearly 50 years later and our brand new shiny kit is hamstrung by the needs of the telegraph industry! It's like that wonderful urban legend about the Space Shuttle being constrained by the size of a horse's arse.

Obviously, GitHub isn't racist. Either they or the originators of Git have assumed that their local dialect is sufficient for a service which aims to be universally acceptable. All the more strange given that Linus Torvalds, the creator of Git, is Finnish and - one presumes - knows about ääkköset (the "extra" letters in the Finnish alphabet).

At this stage in the maturity of the software industry, we should consider not supporting Unicode to be as outmoded and dangerous as assuming every year can be represented by a two-digit number.

There's a world outside our narrow viewpoint and, if we want to do business with that world, we need to speak their language.



11 thoughts on “Is GitHub Racist?”

  1. Nice point, however there are also issues with email addresses, domain names, and even Windows drive names (e.g. C:). Not everything is fully localized yet, but you are right. Browsers support domain names and TLDs in UTF-8 and there can be a transition over time.

  2. Luke Shumaer says:

    > Either they or the originators of Git have assumed that their local dialect is sufficient for a service which aims to be universally acceptable. All the more strange given that Linus Torvalds, the creator of Git, ...

    This is a limitation of GitHub, not Git itself.

    1. says:

      Interesting. I couldn't find anything in Git which suggested a limitation within it. Although that's what I understood from GitHub's message.

  3. Carl says:

    "A male software developer may never consider that a user would want or need to change their name. Thus they would design a product which ignored the millions of women changing their names after marriage."

    I don’t know if you wrote that intentionally to reinforce your point about people making "unconscious assumptions" about the world, but I feel you should address the fact that many men change their surnames when they are married and many women keep their own surnames. That’s sexism, not just an unconscious assumption.

  4. Charlie says:

    It will probably be a limitation of the underlying software stack, perhaps how repos are replicated. There are still users out there with OSes which do not do UTF-8 very well.

    Also it can still be a problem guessing the correct encoding for user-provided data (some browsers are terrible at this).

  5. says:

    It is a common mistake to think two bytes is all it takes for Unicode. Pinyin, Gugyeol and Hiragana have all the space needed for all their letters, but a significant number of languages haven’t been given enough space for all their characters, so they use a script engine (such as indic-script for Indic languages) to draw a third character using two other characters next to each other. So Unicode is not hunky-dory for most countries.

  6. says:

    There is a real limitation with Git, which is that Git doesn't recognize Unicode equivalence (https://en.wikipedia.org/wiki/Unicode_equivalence). OS X filesystems generally use NFD for Unicode decomposition of filenames, while Linux usually uses NFC. What this means is that if you git clone a repo created on a filesystem with a different Unicode normalization convention, Git might untrack any of the files with non-ASCII characters in the filename.

  7. Morris says:

    Interesting observation. As a side note: Bitbucket handles it more gracefully. There, you are allowed to create repositories with non-ASCII names. While the repo name itself will get transcribed to ASCII, the website created will maintain the UTF-8 characters.

