Searching For A Smile


What happens if you search the web for the Unicode character "☺"? On the one hand, it's a symbol just like the letter A or the punctuation mark "!" - on the other, it contains semantic meaning. A smiling, happy face. I decided to look at a few popular search engines to see what they'd […] Read More

Facebook Mangles Unicode URLs


Facebook rewrites URLs with Unicode in the path - this is not best practice and could be dangerous. It is possible to create a URL like http://bit.ly/😀 - the Unicode characters are valid in the path. The URL-encoded representation is: bit.ly/%F0%9F%98%80. Facebook mangles these URLs in such a way that it might be […] Read More
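As a quick check of the encoding quoted above, here is a minimal Python sketch that percent-encodes the emoji's UTF-8 bytes using only the standard library. The variable names are illustrative, and it says nothing about how Facebook itself processes the URL.

```python
from urllib.parse import quote, unquote

# U+1F600 GRINNING FACE, the emoji used in the example URL
emoji = "\U0001F600"

# quote() percent-encodes the UTF-8 bytes of any non-ASCII character
encoded = quote(emoji)
url = "http://bit.ly/" + encoded

# unquote() reverses the encoding and recovers the original character
decoded = unquote(encoded)

print(url)
```

Decoding the path gives back the original emoji, so a well-behaved client should treat the two forms as the same URL.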

Evading Profanity Filters Using Bi-Directional Text


There are some very sensitive souls on the Internet who object to seeing swear words. To that end, a huge industry has sprung up around "Profanity Filters" - services which claim to be able to detect naughty words and automatically redact them. The approach of dumbly looking for strings of text leads to a range […] Read More
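To illustrate why plain string matching is fragile, here is a hedged sketch of a hypothetical substring filter and a bi-directional disguise that slips past it. The blocked word, the function name and the sample sentences are all assumptions made for illustration, not taken from any real filtering service.

```python
# Hypothetical block list - a mild stand-in for a real swear word
BLOCKED = {"darn"}

def naive_filter(text):
    """Return True if any blocked word appears as a plain substring."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED)

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

plain = "well darn it"

# The word is stored reversed; a bidi-aware renderer displays it
# right-to-left, so a reader still sees the blocked word.
disguised = "well " + RLO + "nrad" + PDF + " it"

print(naive_filter(plain), naive_filter(disguised))
```

The filter only ever sees the stored character order, which never contains the blocked word.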

RTL Bugs


Take a look at the following text, looks normal enough doesn't it? "Harry ‮".draziw a si ‭Potter Now, try to select the text and see what happens. WHAT WITCHCRAFT IS THIS?! If you examine the source code for this page, you'll see that I'm using the Unicode Bi-Directional characters. "Harry ‮".draziw a si ‭Potter These […] Read More
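One way to spot this trick without viewing the page source is to scan the string for directional control characters. The sketch below is a minimal Python example; the set of characters checked and the function name are my own choices, and the sample string mirrors the one above with the overrides written as escapes.

```python
import unicodedata

# Directional embedding, override, isolate and mark characters
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
    "\u200e", "\u200f",
}

def find_bidi_controls(text):
    """Return (index, Unicode name) for each bidi control in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]

sample = 'Harry \u202e".draziw a si \u202dPotter'

for index, name in find_bidi_controls(sample):
    print(index, name)
```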

Homoglyph Attacks


Homoglyphs are characters that look strikingly similar to each other. Can you quickly tell the difference between these two - O0? That's the capital letter "O" and the number 0. How about Il1|? Depending on the font used - and your attention to detail - it may be hard to spot […] Read More
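When the eye can't tell look-alike characters apart, their code points can. This is a small Python sketch, with a helper name of my own invention, that lists the code point and official Unicode name of each character in the examples above.

```python
import unicodedata

def describe(chars):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in chars
    ]

for row in describe("O0") + describe("Il1|"):
    print(row)
```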

Let's get the IEC Power Symbol into Unicode


I've just launched a campaign to get the IEC Power Symbol into Unicode! A couple of months ago, I asked this question on HackerNews. I was looking for the electrical "standby" symbol - AKA IEC5009 / IEEE1621. You know, the circle with the line through it. The one that's on every single bloody piece of […] Read More

Subsetting (Chinese) Fonts


There are loads of really delightful Simplified and Traditional Chinese TrueType fonts available on the web. There's only one issue - the file sizes are really large. In many cases, too large to effectively use as a web-font. For example, this calligraphy style font is 3.4MB. The beautiful Paper Cut Font weighs in at […] Read More

Usability of mixing LTR and RTL text?


Annoyingly, FourSquare has started to be a source of spam for me. I get friend requests from people who only like certain brands of stores, from recruitment consultants trying to work out who I'm visiting, and from cultists who are desperate for me to visit Scientology centres. I also get friend requests from people I've […] Read More