A curious way to break Twitter's search results


(This isn't really a security issue, although I've disclosed it to the Twitter team.)

"Fuzzing" is a computer science term which means "sending weird data into a program and seeing what happens." It's a useful way to see how your code can break in new and unexpected ways. It's particularly good at showing what a website's search engine does when it is confused.
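To make the idea concrete, here is a minimal fuzzing sketch in Python. It is not what was run against Twitter. The target is just the standard library's `html.escape`/`html.unescape` pair, and the alphabet, the function names and the round-trip property are illustrative choices. The harness generates random strings full of HTML-special characters and records any input that doesn't survive an escape/unescape round trip unchanged.

```python
import html
import random

# Characters that tend to confuse HTML handling, plus a few ordinary ones.
ALPHABET = "<>&;#/\"' abcxyz0123456789"


def random_input(rng: random.Random, max_len: int = 20) -> str:
    """Build one random string from the 'weird' alphabet."""
    length = rng.randint(0, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def fuzz_round_trip(iterations: int = 10_000, seed: int = 0) -> list[str]:
    """Feed random strings through escape/unescape and collect any input
    that does not come back unchanged."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        s = random_input(rng)
        if html.unescape(html.escape(s)) != s:
            failures.append(s)
    return failures


if __name__ == "__main__":
    bad = fuzz_round_trip()
    print(f"{len(bad)} failing inputs")
```

Against the standard library this should find nothing. A harness like this becomes useful when it is pointed at code that handles escaping less carefully, such as a search-result highlighter.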

For example, here's a fairly mundane Tweet.

OK, the bot sending it appears to have had a bit of a meltdown, but that's not the interesting thing. If we search for some of the HTML elements in it, we get this hot mess:

Screenshot of a tweet. The HTML is malformed.

WTF?! Let's take a look at what the search engine is doing. Here's some of the HTML for that tweet.

&amp;<strong>lt;html</strong>&gt;
&lt;head&amp;g<strong>t;&lt;title</strong>
&gt;502 Bad Gateway&amp;l<strong>t;/title</strong>
&gt;&lt;/head&gt;

This looks to me like an off-by-one error. I suspect that the internal parser starts highlighting at the zeroth character rather than the first. Each < is stored in its escaped form, &lt;, so when the highlighter steps back one character too far, it cuts the escaped entity in half.
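Here is a minimal Python sketch of that failure mode. It does not reproduce Twitter's code, which isn't public. `highlight_misaligned` and `highlight_safe` are hypothetical names, and the `shift` parameter simply simulates an offset error of the kind suspected here. The first function escapes the text and then inserts `<strong>` tags at a slightly wrong index, which splits `&lt;` in two. The second finds the match in the unescaped text and escapes each piece separately, so a tag can only ever land between whole entities.

```python
import html


def highlight_misaligned(raw: str, term: str, shift: int) -> str:
    """Hypothetical buggy highlighter: escape first, then insert tags at
    the match position plus `shift` characters of error."""
    escaped = html.escape(raw, quote=False)
    needle = html.escape(term, quote=False)
    start = escaped.find(needle)
    if start == -1:
        return escaped
    end = start + len(needle)
    start += shift  # simulated off-by-one (or off-by-n) error
    return f"{escaped[:start]}<strong>{escaped[start:end]}</strong>{escaped[end:]}"


def highlight_safe(raw: str, term: str) -> str:
    """Find the match in the raw text, then escape each piece separately,
    so the <strong> tags can never fall inside an entity."""
    start = raw.find(term)
    if start == -1:
        return html.escape(raw, quote=False)
    end = start + len(term)
    before, match, after = raw[:start], raw[start:end], raw[end:]
    return (
        html.escape(before, quote=False)
        + "<strong>" + html.escape(match, quote=False) + "</strong>"
        + html.escape(after, quote=False)
    )


if __name__ == "__main__":
    text = "<html>"
    print(highlight_misaligned(text, "<html", shift=1))  # prints &<strong>lt;html</strong>&gt;
    print(highlight_safe(text, "<html"))  # prints <strong>&lt;html</strong>&gt;
```

With `shift=1` the buggy version produces the same kind of split entity seen in the first line of the search-result HTML. The design choice in the safe version is to never index into already-escaped text. Offsets are computed and applied on the raw string, and escaping happens last.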

Or not. I'm not the Twitter Engineering Team. Might be dragons. Who knows?

