(This isn't really a security issue, although I've disclosed it to the Twitter team.)
"Fuzzing" is a computer science term which means "sending weird data into a program and seeing what happens." It's a useful way to see how your code can break in new and unexpected ways. It's particularly good at showing what a website's search engine does when it is confused.
For example, here's a fairly mundane Tweet.
🏢 HONOUR INTERNATIONAL LIMITED
🇰🇾 Cayman Islands
<head><title>502 Bad Gateway</title></head>
— Offshore A—Z (@OffshoreAZ) February 16, 2018
OK, the bot sending it appears to have had a bit of a meltdown, but that's not the interesting thing. If we search for some of the HTML elements in it, we get this hot mess:
WTF?! Let's take a look at what the search engine is doing. Here's some of the HTML for that tweet.
&<strong>lt;html</strong>&gt; &lt;head&g<strong>t;&lt;title</strong>&gt;502 Bad Gateway&l<strong>t;/title</strong>&gt;&lt;/head&gt;
This looks to me like an off-by-one error. I suspect that the internal parser is highlighting from the zeroth character rather than the first. Because each < is stored as its escaped version, &lt;, going backwards by one extra character bisects the escaped entity.
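To make that guess concrete, here's a toy Python sketch. It is emphatically not Twitter's code: the `highlight` function and its `start_adjust` parameter are names I've made up for illustration, and it only models the general idea (a highlight that starts one character too early on already-escaped text), not the exact markup shown above.

```python
import html


def highlight(escaped: str, term: str, start_adjust: int = 0) -> str:
    """Wrap the first occurrence of `term` in <strong> tags.

    `start_adjust=-1` models the suspected off-by-one: the highlight
    begins one character before the real match. (Hypothetical helper.)
    """
    pos = escaped.find(term)
    if pos == -1:
        return escaped  # nothing to highlight
    start = max(pos + start_adjust, 0)
    end = pos + len(term)
    return (
        escaped[:start]
        + "<strong>" + escaped[start:end] + "</strong>"
        + escaped[end:]
    )


tweet = "<title>502 Bad Gateway</title>"
stored = html.escape(tweet)  # assumption: the text is stored with < and > escaped

correct = highlight(stored, "title")
buggy = highlight(stored, "title", start_adjust=-1)

print(correct)  # &lt;<strong>title</strong>&gt;502 Bad Gateway&lt;/title&gt;
print(buggy)    # &lt<strong>;title</strong>&gt;502 Bad Gateway&lt;/title&gt;
```

In the buggy output the `<strong>` tag lands inside `&lt;`, splitting the entity into `&lt` and `;`. That is the same kind of damage as in the real search results, even though the real offsets are clearly larger than one.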
Or not. I'm not the Twitter Engineering Team. Might be dragons. Who knows?