Is GitHub Racist?

One of the interesting aspects of privilege is how it lays bare our unconscious assumptions about the world. A male software developer may never consider that a user would want or need to change their name. Thus they would design a product which ignored the millions of women changing their names after marriage.

It's very tempting to see software as racist when, in reality, the root cause is more likely to be unconscious assumptions.

Take, for example, GitHub. You can host all of your software projects on there - as long as you speak English.

Wait? What?

Try adding a repository which contains, say, Chinese - and all those beautiful characters will be replaced with "-".
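
I don't know how GitHub implements this, but the behaviour looks like a simple whitelist substitution. Here is a minimal Python sketch of that idea. It assumes (my guess, not GitHub's actual code) that every run of characters outside ASCII letters, digits, ".", "_" and "-" is collapsed into a single "-". The function name sanitise_repo_name is hypothetical.

```python
import re

# Hypothetical whitelist: ASCII letters, digits, dot, underscore, hyphen.
# This is an assumption for illustration, not GitHub's real rule set.
_DISALLOWED = re.compile(r"[^A-Za-z0-9._-]+")


def sanitise_repo_name(name: str) -> str:
    """Replace each run of disallowed characters with a single '-'."""
    return _DISALLOWED.sub("-", name)


print(sanitise_repo_name("my-project"))  # my-project
print(sanitise_repo_name("你好世界"))      # -
```

Under that assumption, a name written entirely in Chinese carries no information at all once it has been "cleaned".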

I asked GitHub about this, and quickly got this reply.

Unfortunately, at the moment, you can only use ASCII (i.e. Windows-1252) characters in Repo names. Most things on GitHub.com support non-ASCII but because of limitations in Git, the repo name isn't one of them. Sorry about the international-unfriendliness

Interestingly, that's not quite the case. Windows-1252 contains some characters with accents - they simply aren't recognised by GitHub.
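
The gap between the two encodings is easy to check for yourself. This is a small Python sketch using the standard "ascii" and "cp1252" codecs ("cp1252" is Python's name for Windows-1252). The helper name encodable is my own.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for sample in ["cafe", "café", "你好"]:
    print(sample, encodable(sample, "ascii"), encodable(sample, "cp1252"))
```

Only "cafe" fits in ASCII. "café" is perfectly valid Windows-1252, and the Chinese sample fits in neither.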


We don't live in a homogeneous world. US English is not the global language. Even if it were, ASCII is insufficient to the task of information interchange.

ASCII was first standardised in 1963 - almost 50 years later and our brand new shiny kit is hamstrung by the needs of the telegraph industry! It's like that wonderful urban legend about the Space Shuttle being constrained by the size of a horse's arse.

Obviously, GitHub isn't racist. Either they or the originators of Git have assumed that their local dialect is sufficient for a service which aims to be universally acceptable. All the more strange given that Linus Torvalds, the creator of Git, is Finnish and - one presumes - knows about ääkköset (the "extra" letters in the Finnish alphabet).

At this stage in the maturity of the software industry, we should consider the practice of not supporting Unicode as outmoded and as dangerous as assuming every year can be represented by a two-digit number.

There's a world outside our narrow viewpoint and, if we want to do business with that world, we need to speak their language.


8 Responses to “Is GitHub Racist?”

  1. Ahmet Alp Balkan (@ahmetalpbalkan)

    Nice point, however there are also issues with email addresses, domain names, and even Windows drive names (e.g. C:\). Not everybody is entirely being localized, but you are right. Browsers support domain names and TLDs in UTF-8 and there can be a transition over time.

  2. Luke Shumaer

    > Either they or the originators of Git have assumed that their local dialect is
    > sufficient for a service which aims to be universally acceptable. All the more
    > strange given that Linus Torvalds, the creator of Git, ...

    This is a limitation of GitHub, not Git itself.

    • Terence Eden

      Interesting. I couldn't find anything in Git which suggested a limitation within it. Although that's what I understood from GitHub's message.

  3. Carl

    "A male software developer may never consider that a user would want or need to change their name. Thus they would design a product which ignored the millions of women changing their names after marriage."

    I don’t know if you wrote that intentionally to reinforce your point about people making "unconscious assumptions" about the world, but I feel you should address the fact that many men change their surnames when they are married and many women keep their own surnames. That’s sexism, not just an unconscious assumption.

  4. Charlie

    It will probably be a limitation of the underlying software stack, perhaps how repos are replicated. There are still users out there with OSes which do not do UTF-8 very well.

    Also it can still be a problem guessing the correct encoding for user-provided data (some browsers are terrible at this).

  5. Sam

    It is a common mistake to think 2 bytes is all it takes for Unicode. Pinyin, Gugyeol and Hiragana have enough space for all their letters, but a significant number of languages haven't been given enough space for all their characters, so they rely on a script engine (such as indic-script for Indic languages) to draw a third character from two other characters placed next to each other. So Unicode is not hunky-dory for most countries.

  6. yan

    There is a real limitation with git, which is that git doesn't recognize Unicode equivalence (https://en.wikipedia.org/wiki/Unicode_equivalence). OS X filesystems generally use NFD for Unicode normalization of filenames, while Linux usually uses NFC. What this means is that if you git clone a repo created on a filesystem with a different Unicode normalization convention, git might untrack any of the files with non-ASCII characters in the filename.
