Facebook Mangles Unicode URLs


Facebook rewrites URLs which have Unicode in the path - this is not best practice and could be dangerous.

It is possible to create a URL like http://bit.ly/😀 - the Unicode characters are valid in the path.

The URL Encoded representation is :

bit.ly/%F0%9F%98%80
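
If you want to check that encoding yourself, here is a minimal sketch using Python's standard library. The variable names are my own; the only assumption is that the path is encoded as UTF-8, which is what the bytes above suggest.

```python
from urllib.parse import quote, unquote

smiley = "\U0001F600"  # the 😀 character, U+1F600

# quote() percent-encodes the UTF-8 bytes of anything outside the safe ASCII set
encoded = quote(smiley)
print("bit.ly/" + encoded)  # bit.ly/%F0%9F%98%80

# unquote() reverses the process and gives back the original character
decoded = unquote(encoded)
```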

Facebook mangles these URLs in such a way that it might be possible to redirect a user to a malicious site.

Here's what's happening. When Facebook sees the "😀" character (U+1F600) in text, it rewrites it to the "󾰀" character (U+FEC00). That's a "private use character" - a code point which Unicode leaves undefined so that anyone can give it their own meaning. This means Facebook can replace the user's computer's default smiley with a Facebook-supplied image or font glyph - if it wants.
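
To see the difference between the two characters, you can ask Python's unicodedata module for their general categories. This is only a sketch: the code point U+FEC00 is what I get from decoding the bytes Facebook produces, and the variable names are mine.

```python
import unicodedata

original = "\U0001F600"     # the smiley the user typed
replacement = "\U000FEC00"  # the character Facebook swaps in

# "So" is "Symbol, other"; "Co" is "Other, private use"
original_category = unicodedata.category(original)
replacement_category = unicodedata.category(replacement)

for ch in (original, replacement):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))

# U+FEC00 sits inside Supplementary Private Use Area-A (U+F0000 to U+FFFFD)
in_private_use_plane = 0xF0000 <= ord(replacement) <= 0xFFFFD
```

A private use character has no agreed meaning outside the system which defined it, so any other site receiving it has no reason to treat it as a smiley.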

In normal text - such as "I passed my exams 😀" - changing the smiley doesn't present a problem, but Facebook also replaces the text in a URL!

So, the URL :

bit.ly/%F0%9F%98%80%F0%9F%98%80

Will point to a Facebook security page.

Facebook changes the URL to :

bit.ly/%F3%BE%B0%80%F3%BE%B0%80

Which points elsewhere - bit.ly/󾰀󾰀.
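
The effect is easy to reproduce. The sketch below is emphatically not Facebook's code - I have no idea how their substitution is implemented. The SUBSTITUTIONS table and the rewrite_everything function are hypothetical, and simply apply the same character swap to a whole string without checking whether it is a URL.

```python
from urllib.parse import quote, unquote

# Hypothetical substitution table: U+1F600 becomes the private use U+FEC00
SUBSTITUTIONS = str.maketrans({"\U0001F600": "\U000FEC00"})

def rewrite_everything(text: str) -> str:
    """Naively swap the smiley wherever it appears, URLs included."""
    return text.translate(SUBSTITUTIONS)

posted = "bit.ly/" + unquote("%F0%9F%98%80%F0%9F%98%80")
rewritten = rewrite_everything(posted)

print(quote(posted))     # bit.ly/%F0%9F%98%80%F0%9F%98%80
print(quote(rewritten))  # bit.ly/%F3%BE%B0%80%F3%BE%B0%80
```

The two strings look similar on screen, but they are different paths, so the link shortener treats them as two unrelated links.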

I performed a couple of quick experiments. It is sometimes possible to post a link which displays a preview of a "good" site, but when clicked on leads to a bad site.


The chances of this being used as a successful attack vector are slim. Tricking the user into clicking on a link which subsequently steals their password is made marginally easier if the link and link preview don't match - but I'm sure there are easier ways of deceiving the user.

The real issue here is that Facebook is altering the text that you write - and that can have unexpected consequences.

We live in a non-ASCII world now. A URL like https://莎士比亚.org/奥瑟罗 is perfectly valid. Facebook - and other sites - should not be confused by non-Latin characters.
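
For software which still needs an ASCII-only form, a URL like that can be converted without losing anything. Here is a rough sketch using only Python's standard library. Note the assumptions: the built-in "idna" codec implements the older IDNA 2003 rules, and the variable names are mine.

```python
from urllib.parse import quote, unquote, urlsplit

url = "https://莎士比亚.org/奥瑟罗"
parts = urlsplit(url)

# Host: the built-in "idna" codec produces the ASCII "xn--" (Punycode) form
ascii_host = parts.hostname.encode("idna").decode("ascii")

# Path: percent-encode the UTF-8 bytes, leaving "/" alone
ascii_path = quote(parts.path)

ascii_url = f"{parts.scheme}://{ascii_host}{ascii_path}"
print(ascii_url)

# Both conversions are reversible, so the original characters survive
round_trip_host = ascii_host.encode("ascii").decode("idna")
round_trip_path = unquote(ascii_path)
```

The point is that the conversion is lossless in both directions. Nothing needs to be swapped for a different character along the way.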
