Facebook Mangles Unicode URLs
Facebook rewrites URLs with Unicode in the path - this is not best practice and could be dangerous.
It is possible to create a URL like http://bit.ly/😀 - the Unicode characters are valid in the path.
The URL-encoded representation is:
bit.ly/%F0%9F%98%80
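To see where those bytes come from: percent-encoding simply takes the character's UTF-8 bytes and writes each one as %XX. Here is a minimal sketch using Python's standard urllib.parse module (the variable names are my own, chosen for illustration):

```python
from urllib.parse import quote, unquote

smiley = "\U0001F600"  # the 😀 character

# quote() encodes the string as UTF-8 and percent-escapes each byte.
encoded = quote(smiley)
print(encoded)                     # %F0%9F%98%80

# Decoding the escaped form gives back the original character.
print(unquote(encoded) == smiley)  # True
```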
Facebook mangles these URLs in such a way that it might be possible to redirect a user to a malicious site.
Here's what's happening. When Facebook sees the "😀" character (U+1F600) in text, it rewrites it to the "󾰀" character (U+FEC00). That's a "private use character" - a code point with no standard meaning, which each platform can define for itself. This means Facebook can replace the user's computer's default smiley with a Facebook-supplied image or font glyph - if it wants.
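A rough way to confirm what kind of character the replacement is. This sketch assumes the replacement is U+FEC00, which is what the bytes F3 BE B0 80 in the rewritten URL further down decode to; the variable names are illustrative only.

```python
import unicodedata

original = "\U0001F600"     # 😀, the standard emoji
replacement = "\U000FEC00"  # assumed: decoded from the bytes F3 BE B0 80

for ch in (original, replacement):
    code_point = f"U+{ord(ch):04X}"
    category = unicodedata.category(ch)  # "So" = symbol, "Co" = private use
    utf8_hex = ch.encode("utf-8").hex()
    print(code_point, category, utf8_hex)

# U+1F600 So f09f9880
# U+FEC00 Co f3beb080
```

The "Co" category is the important part: outside Facebook, nothing defines what that code point should look like or mean.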
In normal text - such as "I passed my exams 😀" - changing the smiley doesn't present a problem, but Facebook also replaces the text in a URL!
So, the URL:
bit.ly/%F0%9F%98%80%F0%9F%98%80
points to a Facebook security page.
Facebook changes the URL to:
bit.ly/%F3%BE%B0%80%F3%BE%B0%80
Which points elsewhere - bit.ly/.
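The effect can be reproduced with a small sketch. The rewrite_emoji function and its mapping table are hypothetical stand-ins for whatever Facebook does internally, which I have no visibility into; they only mimic the one substitution observed above.

```python
from urllib.parse import quote

# Hypothetical mapping: only the single substitution observed in this post.
EMOJI_TO_PRIVATE_USE = {"\U0001F600": "\U000FEC00"}

def rewrite_emoji(text: str) -> str:
    """Replace each mapped emoji with its private use counterpart."""
    return text.translate(str.maketrans(EMOJI_TO_PRIVATE_USE))

posted = "bit.ly/" + "\U0001F600" * 2  # the link as the user wrote it
mangled = rewrite_emoji(posted)        # the link after blind substitution

print(quote(posted))      # bit.ly/%F0%9F%98%80%F0%9F%98%80
print(quote(mangled))     # bit.ly/%F3%BE%B0%80%F3%BE%B0%80
print(posted == mangled)  # False
```

As far as the link shortener is concerned, these are two unrelated paths, so whoever controls the second one decides where the click ends up.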
I performed a couple of quick experiments. It is sometimes possible to post a link which displays a preview of a "good" site but, when clicked, leads to a bad site.
The chances of this being used as a successful attack vector are slim. Tricking the user into clicking on a link which subsequently steals their password is made marginally easier if the link and link preview don't match - but I'm sure there are easier ways of deceiving the user.
The real issue here is that Facebook is altering the text that you write - and that can have unexpected consequences.
We live in a non-ASCII world now. A URL like https://莎士比亚.org/奥瑟罗 is perfectly valid. Facebook - and other sites - should not be confused by non-Latin characters.
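For what it's worth, software that needs an ASCII-only form of such a URL can derive one without changing which resource it identifies. A minimal sketch, assuming Python's built-in "idna" codec (which implements the older IDNA 2003 rules) is acceptable for the host name; the variable names are illustrative:

```python
from urllib.parse import quote, unquote

host = "莎士比亚.org"
path = "/奥瑟罗"

# Host names use Punycode: each non-ASCII label becomes "xn--...".
ascii_host = host.encode("idna").decode("ascii")

# Paths use percent-encoded UTF-8 bytes.
ascii_path = quote(path)

print(f"https://{ascii_host}{ascii_path}")

# The conversion is reversible, so no information is lost.
print(unquote(ascii_path) == path)  # True
```

The point is that both conversions are lossless and standard, unlike swapping one character for another.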