Terence Eden

Counting Invisible Strings



When is a string not a string? When it's a series of control characters! Not a particularly funny riddle, but one I've been wrestling with recently.

Imagine we want to write a program which displays a Twitter user's name. Not their @ handle, but their "real" name.

For example, instead of @POTUS, display "President Obama". Easy, right? Not quite. What happens when a user is named "️"?

Normally, we'd just say

if (null == $name) {
   // ...Do Stuff...
}

Ah! But that's not an empty string, it's ️ AKA %EF%B8%8F AKA variation selector-16.
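To see what that name really contains, here is a quick sketch that dumps its bytes and code point, and confirms that the null check doesn't fire. It assumes PHP 7.2 or later with the mbstring extension (for mb_ord); the variable name is just for illustration.

```php
<?php
// The name as it arrives: three UTF-8 bytes, no visible glyph.
$name = urldecode("%EF%B8%8F");

// Raw bytes as hex.
echo bin2hex($name), "\n";                             // efb88f

// The single code point those bytes encode (needs mbstring, PHP 7.2+).
echo strtoupper(dechex(mb_ord($name, "UTF-8"))), "\n"; // FE0F

// The loose comparison against null does not catch it:
// the string is not empty, it just has nothing to draw.
var_dump(null == $name);                               // bool(false)
```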

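Yup! Some clever wag has managed to set their Twitter name to a single invisible Unicode character: a variation selector, which only exists to change how the character before it is displayed. Interesting and annoying!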

That rather puts a spanner in the works. Something like <a href="https://twitter.com/example">&#xFE0F;</a> won't be clickable because it is not a displayable character. It is invisible.

So, how can we test to see if a Unicode string is invisible? I'm using PHP because, hey, that's what I'm using.

Can we count the characters?

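print strlen(urldecode("%EF%B8%8F"));    // counts bytes
3
print mb_strlen(urldecode("%EF%B8%8F")); // counts characters in mbstring's internal encoding
3

Nope. (If the internal encoding is UTF-8, mb_strlen reports 1 rather than 3, which is still not zero.)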

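PHP has the built-in functions ctype_print and ctype_graph, but they only tell us whether every character in the string is printable, and they work byte by byte in the current locale rather than on Unicode characters. No good for us, because a name may contain both visible and invisible characters, and plenty of perfectly visible non-ASCII names would fail too.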
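A small sketch of why ctype_print can't separate the two cases. It pins the locale to "C" so the result is predictable, and the sample names are made up for illustration. The invisible name, a mixed name, and an ordinary accented name all get the same answer.

```php
<?php
// ctype_* functions classify single bytes using the current locale,
// so pin it to "C" to keep the result predictable.
setlocale(LC_CTYPE, "C");

$invisible = urldecode("%EF%B8%8F");  // variation selector-16 on its own
$mixed     = "Terence" . $invisible;  // visible letters plus the selector
$accented  = "Zo\u{EB}";              // "Zoë": entirely visible, but not ASCII

var_dump(ctype_print("Terence"));  // bool(true)
var_dump(ctype_print($invisible)); // bool(false)
var_dump(ctype_print($mixed));     // bool(false)
var_dump(ctype_print($accented));  // bool(false)
```

The last three results are indistinguishable, so a false here says nothing about whether the name will actually show up on screen.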

Ok, can we use regex? That's what some people suggested to me - but it doesn't seem to deal with the edge-case of non-printing characters.
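For what it's worth, here is a minimal sketch of the regex route, assuming PCRE's Unicode property classes and the /u modifier. The function name is_probably_invisible is hypothetical, and the choice of categories to strip (C, Z and M) is my assumption about what counts as "invisible", not an established rule.

```php
<?php
/**
 * Heuristic: true when nothing is left after removing characters that
 * normally draw no ink of their own. is_probably_invisible() is a
 * hypothetical helper name, not a PHP built-in.
 */
function is_probably_invisible(string $name): bool
{
    // \p{C} = control, format, unassigned and similar code points;
    // \p{Z} = separators (spaces);
    // \p{M} = combining marks, which includes variation selectors.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $stripped = preg_replace('/[\p{C}\p{Z}\p{M}]+/u', '', $name);

    // preg_replace() returns null on error, e.g. malformed UTF-8.
    // Treat that as "cannot tell" rather than "invisible".
    if ($stripped === null) {
        return false;
    }

    return $stripped === '';
}

var_dump(is_probably_invisible(urldecode("%EF%B8%8F"))); // bool(true)
var_dump(is_probably_invisible("President Obama"));      // bool(false)
var_dump(is_probably_invisible(" \u{200B}\u{FE0F}"));    // bool(true)
```

This is only a heuristic, and it has holes in both directions. Blank-looking characters that Unicode classes as letters or symbols, such as U+3164 HANGUL FILLER or U+2800 BRAILLE PATTERN BLANK, slip straight through, while a name made solely of combining marks is reported as invisible even though some renderers will draw it. The result also depends on the Unicode tables in the PCRE build, since a character that build doesn't know about falls under \p{C}.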

Well, I'm stumped! If anyone knows of a good way to do this - please reveal yourself!

