Twitter’s way of linking URLs is broken. It’s annoying to users, and a pain in the arse to developers. This quick post talks about the problem and offers a solution.
I’ve raised a bug with Twitter and I hope you’ll star it as important to you.
A common trope in programming classes is “how do you detect a valid email address?”
It should be obvious, right? A string of text, an @, a domain – probably ending in .com.
As it turns out, it’s not that simple. “who+o’firstname.lastname@example.org” is a potentially valid address, for example.
There are literally thousands of ways to detect the potentially infinite variety of email addresses.
The same is true for URLs – and slavish adherence to guidelines is killing Twitter’s usefulness.
The URL Matching Problem
Which of these strings should be turned into hyperlinks?
- www.bbc.co.uk
- example.com
- http://test
- https://test.test
- ftp://news.com
As it happens, Twitter only matches “https://test.test” and none of the others.
Twitter’s matching regex is, as far as I can tell, equivalent to this:
If it starts with http:// or https:// and has a dot in it - it's a URL
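As a rough illustration, that observed behaviour could be approximated with a pattern like the one below. This is a guess based only on the five test strings above, not Twitter's actual code; the names `OBSERVED_RULE` and `twitter_would_link` are made up for this sketch.

```python
import re

# Assumed approximation of the observed behaviour, NOT Twitter's real regex:
# "http://" or "https://", some non-dot characters, a dot, then more characters.
OBSERVED_RULE = re.compile(r"^https?://[^\s.]+\.\S+$", re.IGNORECASE)


def twitter_would_link(text: str) -> bool:
    """Return True if the text would be linked under the assumed rule."""
    return OBSERVED_RULE.match(text) is not None


candidates = [
    "www.bbc.co.uk",
    "example.com",
    "http://test",
    "https://test.test",
    "ftp://news.com",
]

for candidate in candidates:
    print(candidate, twitter_would_link(candidate))
```

Under this assumed rule only “https://test.test” comes back as linkable, which matches what I saw.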
I think this is a serious weakness. Twitter users are sharing URLs which their followers can’t click on, and Twitter is also linking to URLs which don’t exist.
Much like the email regexes, I would take a much more lax approach. Essentially, if it looks vaguely like a URL – link to it.
I would suggest the following rules:
- If it starts with a protocol – http:// ftp:// tel: etc – create a hyperlink.
- If it starts with www. – create a hyperlink.
- If it ends with a dot followed by a valid TLD – create a hyperlink.
- If it contains a valid TLD followed by a slash then some other characters – create a hyperlink.
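The four rules above can be sketched in a few lines. This is a minimal sketch under some loud assumptions: the scheme list and the TLD list are tiny hand-picked samples (a real implementation would need a full, maintained TLD list), and `looks_like_url` is a hypothetical name of my own.

```python
# Illustrative samples only; both lists are assumptions for this sketch.
SAMPLE_SCHEMES = ("http://", "https://", "ftp://", "tel:", "mailto:")
SAMPLE_TLDS = {"com", "org", "net", "uk", "io"}


def looks_like_url(token: str) -> bool:
    """Apply the lax rules to a single whitespace-free token."""
    lowered = token.strip().lower()

    # Rule 1: starts with a protocol.
    if lowered.startswith(SAMPLE_SCHEMES):
        return True

    # Rule 2: starts with "www.".
    if lowered.startswith("www."):
        return True

    # Rules 3 and 4: the part before the first slash ends in a known TLD.
    # This also accepts a bare trailing slash ("example.com/"), which is
    # slightly laxer than rule 4 as written.
    host = lowered.split("/", 1)[0]
    if "." not in host:
        return False
    tld = host.rsplit(".", 1)[1]
    return tld in SAMPLE_TLDS
```

With these rules all five of the test strings from earlier would be linked, while plain words such as “hello” would not. Ports, trailing punctuation and internationalised domains are ignored here.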
The “correct” method would then be for Twitter to perform an HTTP HEAD request to see if the URL is potentially valid. There are three drawbacks to this.
- It may place excessive load on Twitter’s servers to process and cache these requests.
- The URL may be that of an Intranet site – and thus inaccessible to Twitter.
- The URL may be valid but temporarily inaccessible.
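To make those drawbacks concrete, here is a rough sketch of what such a check might look like using Python's standard `urllib`. It is an illustration, not a claim about how Twitter works or should work: the function name `head_status`, the cache size and the timeout are all assumptions. The key design choice is that a failed request returns “unknown” rather than “invalid”, because an intranet or temporarily unreachable site looks exactly like a dead one from the outside.

```python
import http.client
import urllib.error
import urllib.request
from functools import lru_cache


@lru_cache(maxsize=10_000)  # Assumed cache size; eases the repeated-lookup load.
def head_status(url: str, timeout: float = 3.0) -> str:
    """Return "reachable" if a server answered, otherwise "unknown"."""
    # Only attempt http(s) URLs; anything else cannot be checked this way.
    if not url.lower().startswith(("http://", "https://")):
        return "unknown"
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return "reachable"
    except urllib.error.HTTPError:
        # The server replied with an error status, so the host does exist.
        return "reachable"
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, timeout, refused connection, malformed URL and so on.
        # This may be an intranet site or a temporary outage, so it is not
        # treated as proof that the URL is invalid.
        return "unknown"
```

Since “unknown” cannot safely be treated as “don’t link”, the check can only ever confirm a link, never rule one out.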
Regardless of the method, surely it’s inexcusable that “www.example.com” isn’t detected as a URL whereas “http://bork.bork.bork” is?
If you think Twitter’s approach to hyperlinks is wrong – please make your voice heard at the bug report.