A curious way to break Twitter’s search results

by @edent

(This isn’t really a security issue, although I’ve disclosed it to the Twitter team.)

“Fuzzing” is a computer science term which means “sending weird data into a program and seeing what happens.” It’s a useful way to see how your code can break in new and unexpected ways. It’s particularly good at showing what a website’s search engine does when it is confused.
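To make that concrete, here's a toy fuzzer. It is only a sketch under loud assumptions: `sloppy_escape` is a made-up function standing in for "your code", the `TOKENS` list is an arbitrary pick of awkward fragments, and the check (does the text survive an escape and unescape round trip?) is just one invariant you might choose. None of this is how Twitter's search works.

```python
import html
import random

# Fragments that tend to upset HTML-handling code (an arbitrary, illustrative choice).
TOKENS = ["<", ">", "&", "lt;", "gt;", "amp;", "a", " "]


def random_input(rng, max_tokens=8):
    """Build one weird input by gluing random fragments together."""
    return "".join(rng.choice(TOKENS) for _ in range(rng.randint(1, max_tokens)))


def sloppy_escape(text):
    """Hypothetical code under test: escapes < and > but forgets &."""
    return text.replace("<", "&lt;").replace(">", "&gt;")


def fuzz(escape, runs=200, seed=0):
    """Return every generated input that does not survive escape -> unescape."""
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    failures = []
    for _ in range(runs):
        sample = random_input(rng)
        if html.unescape(escape(sample)) != sample:
            failures.append(sample)
    return failures


failures = fuzz(sloppy_escape)
print(len(failures), "inputs broke the round trip, e.g.", failures[:3])
```

Swapping in the standard library's `html.escape` for `sloppy_escape` should make the failures disappear, which is the whole point: the weird inputs find the bug so you don't have to guess where it is.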

For example, here’s a fairly mundane Tweet.

OK, the bot sending it appears to have had a bit of a meltdown, but that’s not the interesting thing. If we search for some of the HTML elements in it, we get this hot mess:

Screenshot of a tweet. The HTML is malformed.

WTF?! Let’s take a look at what the search engine is doing. Here’s some of the HTML for that tweet.



&gt;502 Bad Gateway&amp;l<strong>t;/title</strong>


This looks to me like an off-by-one error. I suspect that the internal parser starts highlighting at the zeroth character rather than the first. Because each < is stored in its escaped form, &lt;, going backwards by one extra character bisects the escaped entity.
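Here's a minimal sketch of how that kind of bug could bisect an entity. Everything in it is an assumption for illustration: `highlight`, `start_offset` and the search term are invented, it is not Twitter's code, and it doesn't try to reproduce the exact output above.

```python
import html


def highlight(escaped_text, term, start_offset=0):
    """Wrap the first occurrence of term in <strong> tags.

    start_offset=-1 simulates the hypothesised off-by-one bug by
    starting the highlight one character too early.
    """
    index = escaped_text.find(term)
    if index == -1:
        return escaped_text
    start = max(index + start_offset, 0)
    end = index + len(term)
    return (
        escaped_text[:start]
        + "<strong>" + escaped_text[start:end] + "</strong>"
        + escaped_text[end:]
    )


# The text is stored escaped, so "<" has already become "&lt;".
stored = html.escape("502 Bad Gateway</title>")

correct = highlight(stored, "/title")
buggy = highlight(stored, "/title", start_offset=-1)

print(correct)  # 502 Bad Gateway&lt;<strong>/title</strong>&gt;
print(buggy)    # 502 Bad Gateway&lt<strong>;/title</strong>&gt;
```

In the buggy version the `;` that closes `&lt;` ends up inside the `<strong>` tag, so the entity is cut in two and the browser no longer sees a clean escaped `<`.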

Or not. I’m not the Twitter Engineering Team. Might be dragons. Who knows?
