Search Engine Optimisation is the (dark) art of getting a site to the top of Google's search rankings. If you're in the business of selling decorations for ponds, you want your shop to be right at the top of the results when people search for "bespoke synthetic frog spawn."
The problem is, there are lots of people all playing the same game. So, what "unusual" tactics can be used to drive sites to the top?
Yesterday, I looked at how homoglyphs like Il (capital i lower L) can be used to confuse humans. Today, let's see how they can confuse computers.
Shortly after publishing my blog post, I was the number one search result for "νⅰａｇｒа".
Why? Because, apparently, I'm the only person on the whole Web who has used those weird characters in precisely that combination.
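To see why that string is so rare, it helps to look at what it is actually made of. Here's a minimal sketch using Python's standard `unicodedata` module. The `describe` helper is just a name I've made up for illustration, and the string is pasted straight from the search above.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

spoof = "νⅰａｇｒа"   # pasted from the search query above
plain = "viagra"     # what a human thinks they are reading

for code_point, name in describe(spoof):
    print(code_point, name)

# The two strings look alike on screen but do not compare equal.
print(spoof == plain)
```

To a search index, the two strings share nothing but their appearance, so a page using the look-alike version has the results for it almost to itself.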
We can already see this sort of spam attack being used on the Web. Do a search for "Ԍооｇｌе" and you'll see a bunch of sites like these:
Evading Spam Filters
Now, no one in their right mind is going to type those peculiar characters into a search box. But they can have their uses elsewhere.
Google has a nifty little feature - the "I'm Feeling Lucky" button. Press that, and you're immediately taken to Google's first result.
Simply append &btnI to a Google search URL and you'll be taken there.
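As a rough sketch, building such a link takes a couple of lines. I'm assuming the usual `https://www.google.com/search?q=...` address here, and `lucky_url` is a hypothetical helper name; only the `&btnI` suffix comes from the description above.

```python
from urllib.parse import quote_plus

# Assumption: the standard Google search endpoint.
SEARCH_URL = "https://www.google.com/search"

def lucky_url(query):
    """Build a search URL with the &btnI flag appended."""
    # quote_plus percent-encodes the query, including any non-ASCII look-alikes.
    return f"{SEARCH_URL}?q={quote_plus(query)}&btnI"

print(lucky_url("bespoke synthetic frog spawn"))
```

The domain in the resulting link is still Google's, which is the whole point: the destination is decided by whoever ranks first for the query.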
Very few services will block a Google URL from being shared. This means, if you can claim the top spot for a particular search, you'll be able to craft a URL which will look like it takes you to Google, but will actually take you elsewhere.
Copy and paste this into your URL bar and see what happens.
What Can Be Done To Stop This?
Damned if I know!
Search engines - and others - could look at the characters being used and see if they naturally belong together. Should the presence of a Cyrillic "о" in the middle of Latin text set off alarm bells? How about the Greek "Τ" mixed in with the Cherokee "Ꭺ"?
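Here's a crude sketch of what such a check might look like. Python's standard library doesn't expose the Unicode script property, so I'm using the first word of each character's Unicode name ("LATIN", "CYRILLIC", "GREEK" and so on) as a stand-in. That is a heuristic of my own, not how any search engine is known to work, and the function names are made up for illustration.

```python
import unicodedata

def scripts_in(word):
    """Rough script labels for the letters in a word.

    Heuristic: take the first word of each letter's Unicode name.
    NFKC normalisation first folds fullwidth and similar compatibility
    forms into their ordinary equivalents.
    """
    labels = set()
    for ch in unicodedata.normalize("NFKC", word):
        if ch.isalpha():
            labels.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return labels

def mixed_script_words(text):
    """Return the words in text whose letters span more than one script."""
    return [w for w in text.split() if len(scripts_in(w)) > 1]

# "Google" with two Cyrillic o's (U+043E) in the middle.
print(sorted(scripts_in("G\u043e\u043egle")))

# Greek capital tau (U+03A4) next to a Cherokee letter (U+13AA).
print(sorted(scripts_in("\u03a4\u13aa")))
```

Checking word by word rather than across the whole page matters, because plenty of legitimate pages mix languages. Even so, a rule this blunt will produce false positives, so it is probably better used as one signal among many than as a filter.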
Perhaps computers need to get better at spotting "odd" patterns of letters - deriving a better semantic understanding of the text in context rather than just looking for literal strings.
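Well short of real semantic understanding, one simple step is to fold look-alike characters back to a common form before comparing strings. The sketch below combines Unicode's NFKC normalisation with a tiny, hand-picked substitution table. The table is hypothetical and covers only the characters used in this post; the escape sequences are my reading of the two spoofed strings from earlier. The Unicode Consortium publishes a far larger list of confusable characters that a real implementation would want to draw on.

```python
import unicodedata

# Hypothetical, hand-picked table covering only the look-alikes in this post.
CONFUSABLES = str.maketrans({
    "\u03bd": "v",  # Greek small letter nu
    "\u0430": "a",  # Cyrillic small letter a
    "\u043e": "o",  # Cyrillic small letter o
    "\u0435": "e",  # Cyrillic small letter ie
    "\u050c": "G",  # Cyrillic capital letter Komi Sje
})

def fold(text):
    """Map look-alike characters to Latin so literal matching still works."""
    # NFKC handles fullwidth letters and Roman numeral characters;
    # the table handles letters borrowed from other alphabets.
    return unicodedata.normalize("NFKC", text).translate(CONFUSABLES)

print(fold("\u050c\u043e\u043e\uff47\uff4c\u0435"))
print(fold("\u03bd\u2170\uff41\uff47\uff52\u0430"))
```

Once folded, both strings are plain Latin text, so an existing filter that looks for literal strings gets a second chance at catching them.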
"SEO Experts" are already exploiting this weakness in search engines. Do we need to find a way to nip this in the bud?