A curious way to break Twitter's search results


(This isn't really a security issue, although I've disclosed it to the Twitter team.)

"Fuzzing" is a computer science term which means "sending weird data into a program and seeing what happens." It's a useful way to see how your code can break in new and unexpected ways. It's particularly good at showing what a website's search engine does when it is confused.
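To make that concrete, here's a minimal sketch of a fuzzer in Python. Everything in it is made up for illustration: `highlight` is a toy search highlighter, not Twitter's code, and the alphabet and the invariant are my own assumptions. It throws random strings at the highlighter and checks that un-escaping the pieces around the `<strong>` tags gives back the original text.

```python
import html
import random


def highlight(text: str, term: str) -> str:
    """Toy highlighter (hypothetical): escape the text, then wrap the first match."""
    escaped = html.escape(text)
    start = escaped.find(term)  # searches the *escaped* text, which is the flaw
    if start == -1:
        return escaped
    end = start + len(term)
    return escaped[:start] + "<strong>" + escaped[start:end] + "</strong>" + escaped[end:]


def fuzz(rounds: int = 1000, seed: int = 0) -> list[tuple[str, str]]:
    """Feed random inputs to highlight() and collect those that break the invariant."""
    rng = random.Random(seed)
    alphabet = "ab<>&;lt "  # weighted towards characters that matter to HTML escaping
    failures = []
    for _ in range(rounds):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
        term = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 3)))
        out = highlight(text, term)
        # Split the output into the parts before, inside, and after the highlight.
        before, _, rest = out.partition("<strong>")
        match, _, after = rest.partition("</strong>")
        # Invariant: un-escaping each part and joining them must restore the input.
        rebuilt = html.unescape(before) + html.unescape(match) + html.unescape(after)
        if rebuilt != text:
            failures.append((text, term))
    return failures


if __name__ == "__main__":
    for text, term in fuzz()[:5]:
        print(repr(text), repr(term), "->", highlight(text, term))
```

The toy highlighter has a deliberate flaw: searching for `lt` in the text `<` matches inside the entity and produces `&<strong>lt</strong>;`. That's exactly the kind of thing random input tends to shake out.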

For example, here's a fairly mundane Tweet.

OK, the bot sending it appears to have had a bit of a meltdown, but that's not the interesting thing. If we search for some of the HTML elements in it, we get this hot mess:

Screenshot of a tweet. The HTML is malformed.

WTF?! Let's take a look at what the search engine is doing. Here's some of the HTML for that tweet.

&amp;<strong>lt;html</strong>&gt;

&lt;head&amp;g<strong>t;&lt;title</strong>

&gt;502 Bad Gateway&amp;l<strong>t;/title</strong>

&gt;&lt;/head&gt;

This looks to me like an off-by-one error. I suspect that the internal parser starts its highlighting at the zeroth character rather than the first. Because each < is stored in its escaped form, &lt;, stepping back one character too far cuts the escaped entity in two.
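Here's a rough Python sketch of that guess. The `highlight` function and its `offset` parameter are hypothetical, purely to show the mechanism; it doesn't reproduce Twitter's exact output, and I have no idea what their real code looks like.

```python
import html


def highlight(escaped: str, term: str, offset: int = 0) -> str:
    """Wrap the first match of term in <strong>, starting `offset` characters early."""
    start = escaped.find(term)
    if start == -1:
        return escaped
    end = start + len(term)
    start = max(0, start - offset)  # offset=1 models the suspected off-by-one
    return escaped[:start] + "<strong>" + escaped[start:end] + "</strong>" + escaped[end:]


stored = html.escape("<title>")              # "&lt;title&gt;"
good = highlight(stored, "title")            # "&lt;<strong>title</strong>&gt;"
bad = highlight(stored, "title", offset=1)   # "&lt<strong>;title</strong>&gt;"

print(good)
print(bad)
```

With the extra step backwards, the `;` that closes `&lt;` ends up inside the `<strong>` tag, so the entity is split across the tag boundary.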

Or not. I'm not the Twitter Engineering Team. Might be dragons. Who knows?

