A curious way to break Twitter's search results


(This isn't really a security issue, although I've disclosed it to the Twitter team.)

"Fuzzing" is a computer science term which means "sending weird data into a program and seeing what happens." It's a useful way to see how your code can break in new and unexpected ways. It's particularly good at showing what a website's search engine does when it is confused.
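To make that concrete, a fuzzer can be as small as a loop that throws random strings at a function and notes which ones make it fall over. The sketch below is only an illustration: `fuzz` and `fragile_parser` are hypothetical names made up for this example, and the "parser" is a deliberately brittle stand-in, not anything a real site runs.

```python
import random
import string


def fuzz(target, runs=1000, max_len=40, seed=0):
    """Call `target` with random printable strings and collect the inputs that raise."""
    rng = random.Random(seed)  # seeded so a failing run can be repeated
    failures = []
    for _ in range(runs):
        length = rng.randint(0, max_len)
        data = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            target(data)
        except Exception as exc:
            failures.append((data, exc))
    return failures


def fragile_parser(text):
    """Hypothetical target: assumes every "&" begins an entity that ends in ";"."""
    position = text.find("&")
    if position != -1:
        text.index(";", position)  # raises ValueError when no ";" follows
    return text


failures = fuzz(fragile_parser)
print(len(failures), "inputs broke the parser")
```

Real fuzzers are far smarter about choosing inputs, but the principle is the same: feed in things nobody planned for and watch what comes back.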

For example, here's a fairly mundane Tweet.

OK, the bot sending it appears to have had a bit of a meltdown, but that's not the interesting thing. If we search for some of the HTML elements in it, we get this hot mess:

Screenshot of a tweet. The HTML is malformed.

WTF?! Let's take a look at what the search engine is doing. Here's some of the HTML for that tweet.

&amp;<strong>lt;html</strong>&gt;

&lt;head&amp;g<strong>t;&lt;title</strong>

&gt;502 Bad Gateway&amp;l<strong>t;/title</strong>

&gt;&lt;/head&gt;

This looks to me like an off-by-one error. I suspect that the internal parser starts the highlight at the zeroth character rather than the first. Because each < is stored in its escaped form, &lt;, stepping back one character too far lands in the middle of the escaped entity and cuts it in two.

Or not. I'm not the Twitter Engineering Team. Might be dragons. Who knows?
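For what it's worth, here is a toy Python sketch of how a highlighter could bisect an entity like that. Everything in it is an assumption for illustration: the `render` function, the `slip` parameter, and especially the clean-up pass that re-escapes a stranded "&" are guesses that happen to fit the HTML above, not a description of Twitter's code.

```python
import re

# An "&" that does not begin a complete entity such as "&lt;" or "&#60;".
BARE_AMPERSAND = re.compile(r"&(?!#?\w+;)")


def render(stored, term, slip=0):
    """Highlight `term` in already-escaped text, starting `slip` characters too early."""
    match_start = stored.index(term)
    start = max(0, match_start - slip)
    end = match_start + len(term)
    marked = stored[:start] + "<strong>" + stored[start:end] + "</strong>" + stored[end:]
    # Assumed clean-up pass: re-escape any "&" left stranded by the split.
    return BARE_AMPERSAND.sub("&amp;", marked)


stored = "&lt;html&gt;"  # "<html>" as it would be stored, escaped

print(render(stored, "html"))          # &lt;<strong>html</strong>&gt;
print(render(stored, "html", slip=1))  # &amp;lt<strong>;html</strong>&gt;
print(render(stored, "html", slip=3))  # &amp;<strong>lt;html</strong>&gt;
```

In this toy model a start that slips by three characters reproduces the first fragment of the broken HTML. That number is just what fits the sample. The general point is that any offset which lands inside &lt; splits the entity, and whatever tidies up afterwards turns the orphaned "&" into &amp;.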

