Imagine, just for a moment, you were a computer. Take a look at the following sentence and try to work out where and how you should hyperlink the text.
He said “You should visit http://example.com/!”
Obvious, isn’t it? Except, of course, it’s not really that simple. There could well be a file named “!” on the webserver. In fact, there could be a file named “!”” on there.
And yet, to my tastes, it looks so ugly to write something like:
Visit my blog (http://example.com/blog )
The space before the closing parenthesis looks misplaced to me. Without it though, how are we to accurately say whether it is genuinely part of the URL or not?
This isn’t purely a theoretical problem; I’ve noticed some text parsers getting rather confused by trailing punctuation.
That’s the Unicode character for an ending double quotation mark, AKA ”. A URL parser saw it and believed it to be a genuine part of the link.
I don’t think it’s entirely fair to blame this on a programming error. Yes, the programmer could look for matching and opposite symbols, but how does one go about performing the semantic processing needed to understand which characters are superfluous?
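To make the "matching and opposite symbols" idea concrete, here is a rough Python sketch. The function name and the table of pairs are my own illustrative assumptions, not taken from any real parser, and the rule itself (drop a trailing closer only when the candidate contains no opener to match it) is just one plausible heuristic.

```python
# Hypothetical helper: the name and the pair table are illustrative assumptions.
PAIRS = {")": "(", "]": "[", "}": "{", "”": "“", "’": "‘", "»": "«"}


def strip_unmatched_closers(candidate: str) -> str:
    """Drop trailing closing symbols that have no matching opener in the candidate."""
    while candidate and candidate[-1] in PAIRS:
        closer = candidate[-1]
        opener = PAIRS[closer]
        # If the candidate holds at least as many openers as closers,
        # assume the trailing closer genuinely belongs to the URL.
        if candidate.count(opener) >= candidate.count(closer):
            break
        candidate = candidate[:-1]
    return candidate


print(strip_unmatched_closers("http://example.com/blog)"))
# → http://example.com/blog
print(strip_unmatched_closers("https://en.wikipedia.org/wiki/Rust_(programming_language)"))
# → https://en.wikipedia.org/wiki/Rust_(programming_language)
print(strip_unmatched_closers("http://example.com/!”"))
# → http://example.com/!
```

Note what happens in the last case: the stray ” is removed, but the “!” stays put. Symbol matching can't tell us whether that “!” is a file name or just enthusiasm.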
Consider these limited examples.
- I love example.com/! – the “!” should not be part of the link.
- example.com/#!something – the hashbang is an abomination, but a perfectly valid link.
- Visit “example.com/” –
- Is your domain example.com/? – The extra “?” may not do any harm, but does it mangle our semantic understanding to see it linked? Does it look ugly if we write “.com ?”?
- Click on example.com/My%20Blog – the “ ” can be escaped as “%20”. Should we therefore examine all whitespace after an obvious URL?
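For illustration, here is roughly what a naive auto-linker might do with the examples above. This is a minimal Python sketch under loud assumptions: the `CANDIDATE` pattern, the `TRAILING` character set and the `find_links` name are all hypothetical, and the regular expression is far too simple for real-world text.

```python
import re

# Assumption: a link candidate is an optional http(s) scheme, a dotted host name,
# and an optional path made of any non-whitespace characters.
CANDIDATE = re.compile(
    r"(?:https?://)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:/\S*)?", re.IGNORECASE
)

# Assumption: these characters are treated as sentence punctuation when they end a link.
TRAILING = "!?.,;:'\"”’"


def find_links(text: str) -> list[str]:
    """Return link candidates with trailing punctuation trimmed off."""
    return [m.group(0).rstrip(TRAILING) for m in CANDIDATE.finditer(text)]


print(find_links("I love example.com/!"))
# → ['example.com/']
print(find_links("example.com/#!something"))
# → ['example.com/#!something']
print(find_links("He said “You should visit http://example.com/!”"))
# → ['http://example.com/']
```

The heuristic only trims punctuation at the very end, so the hashbang survives. It is also confidently wrong whenever the server really does have a file named “!”, which is exactly the ambiguity no amount of trimming can settle.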
Four years ago, I was talking about how we should interpret hashtag validity. This is a similar, but trickier, problem. Should parsers get better? Should servers be more forgiving before serving an error? Or should humans work around the limitations of our digital servants and alter our established patterns of communication?