How Should We Punctuate on the Web?


Imagine, just for a moment, you were a computer. Take a look at the following sentence and try to work out where and how you should hyperlink the text.

He said "You should visit http://example.com/!"

Obvious, isn't it? Except, of course, it's not really that simple. There could well be a file named "!" on the webserver. In fact, there could be a file named !" (an exclamation mark followed by a quotation mark) on there.

And yet, to my tastes, it looks so ugly to write something like:

Visit my blog (http://example.com/blog )

The space before the closing parenthesis looks misplaced to me. Without it, though, how are we to say with any certainty whether the parenthesis is genuinely part of the URL?

This isn't a purely theoretical problem; I've noticed some text parsers getting rather confused by trailing punctuation.

[Image: Guardian 404 error]

That's the Unicode character for a closing double quotation mark, ” (U+201D). A URL parser saw it and believed it to be a genuine part of the link.
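Why does that produce an error rather than the intended page? A minimal Python illustration, assuming a placeholder path of `/blog`: once the curly quote is treated as part of the link, it is percent-encoded as UTF-8 bytes (as browsers do for non-ASCII path characters) and sent to the server as part of the path, so the server goes looking for a file that doesn't exist.

```python
from urllib.parse import quote

# The path as a naive parser captured it, with the closing curly
# quote (U+201D) still attached. "/blog" is a placeholder path.
captured_path = "/blog\u201d"

# Before the request is sent, non-ASCII characters in the path are
# percent-encoded as their UTF-8 bytes.
encoded = quote(captured_path)
print(encoded)  # /blog%E2%80%9D
```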

I don't think it's entirely fair to blame this on a programming error. Yes, the programmer could look for matching pairs of opening and closing symbols, but how does one go about the semantic processing needed to decide which characters are superfluous?
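A rough sketch of the "matching symbols" approach in Python. The set of trailing characters and the bracket-balancing rule are assumptions for illustration, not the behaviour of any particular parser. It copes with the parenthesis and curly-quote cases, but it has no way to know when a trailing "!" really is part of the URL.

```python
# Characters treated as probable sentence punctuation when they end a URL.
# This set is an assumption; real linkifiers each choose their own.
TRAILING = set(".,;:!?'\"\u2019\u201d")

# Each closing bracket mapped to its opening partner.
PAIRS = {")": "(", "]": "[", "}": "{"}


def trim_url(candidate: str) -> str:
    """Heuristically strip trailing punctuation from a URL-like string."""
    url = candidate
    while url:
        last = url[-1]
        if last in TRAILING:
            url = url[:-1]
        elif last in PAIRS:
            # Keep a closing bracket only if the URL contains enough openers
            # to balance it, e.g. https://en.wikipedia.org/wiki/Foo_(bar)
            if url.count(PAIRS[last]) >= url.count(last):
                break
            url = url[:-1]
        else:
            break
    return url


print(trim_url("http://example.com/blog)"))        # http://example.com/blog
print(trim_url('http://example.com/!"'))           # http://example.com/
print(trim_url("http://example.com/#!something"))  # http://example.com/#!something
```

The design choice here is to guess in favour of prose: a genuine file named "!" becomes unreachable from plain text, which is exactly the trade-off a heuristic cannot avoid.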

Consider these limited examples.

  • I love example.com/! - the "!" should not be part of the link.
  • example.com/#!something - the hashbang is an abomination, but a perfectly valid link.
  • Visit "example.com/" - the closing quotation mark should not be part of the link.
  • Is your domain example.com/? - the extra "?" may not do any harm, but does seeing it linked mangle our semantic understanding? Does it look ugly if we write ".com ?"?
  • Click on example.com/My%20Blog - a space can be escaped as "%20"; should we therefore examine all the whitespace after an obvious URL?
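Python's standard `urllib.parse` module shows how a standards-following parser treats three of these examples once the link boundaries are already known. Note that it says nothing about where a link should end in running text, which is the hard part.

```python
from urllib.parse import unquote, urlsplit

# "#!something" is an ordinary fragment: valid, and never sent to the server.
hashbang = urlsplit("http://example.com/#!something")
print(hashbang.path, hashbang.fragment)  # / !something

# A trailing "?" only adds an empty query; the path is still "/".
empty_query = urlsplit("http://example.com/?")
print(empty_query.path, repr(empty_query.query))  # / ''

# "%20" decodes to a space, so a server decoding this path sees "My Blog".
decoded = unquote("/My%20Blog")
print(decoded)  # /My Blog
```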

Four years ago, I was talking about how we should interpret hashtag validity. This is a similar, but trickier, problem. Should parsers get better? Should servers be more forgiving before serving an error? Or should humans work around the limitations of our digital servants and alter our established patterns of communication?
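One way a server could be "more forgiving" is sketched below in Python. The site contents (`KNOWN_PATHS`), the set of stray characters, and the `forgiving_lookup` function are all hypothetical, and this is not how any real server or module behaves. The idea is to serve an exact match first, so a genuine file named "!" still wins, and only then try trimming trailing punctuation and redirecting.

```python
from typing import Optional, Tuple
from urllib.parse import unquote

# Hypothetical paths this site actually serves.
KNOWN_PATHS = {"/", "/blog", "/!"}

# Trailing characters that often get swept into links by mistake (an assumption).
STRAY = ".,;:!?'\")]\u2019\u201d"


def forgiving_lookup(raw_path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, path to serve or redirect to) for a request path."""
    path = unquote(raw_path)  # e.g. "%E2%80%9D" becomes the curly quote
    if path in KNOWN_PATHS:
        # An exact match always wins, so a real file named "!" is served as-is.
        return 200, path
    trimmed = path
    while trimmed and trimmed[-1] in STRAY:
        trimmed = trimmed[:-1]
        if trimmed in KNOWN_PATHS:
            # Permanent redirect to the URL the writer most likely meant.
            return 301, trimmed
    return 404, None


print(forgiving_lookup("/blog%E2%80%9D"))  # (301, '/blog')
print(forgiving_lookup("/!"))              # (200, '/!')
```

A 301 tells browsers and search engines that the trimmed URL is the canonical one; the cost is that a mangled link silently "works", which may hide the mistake from whoever wrote it.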



2 thoughts on “How Should We Punctuate on the Web?”

  1. says:

    If it's on the web, it's written in HyperText Markup Language. Therefore, you can use the facilities offered by said language to mark up the text into a link.

    If it's in text/plain, there's already a fairly good convention from decades of email and USENET: putting angle brackets around the URL.

  2. URL resolving in servers/frameworks/whatever could have a feature where they auto-truncate & redirect commonly mistaken encoded punctuation — similar to Apache's mod_speling.

    Very Postel Principle, but in self-defense.

    I may have to try this with my personal site and see how it goes.

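The angle-bracket convention from the first comment (also the delimiter recommended in Appendix C of RFC 3986) is straightforward for a parser to honour. A minimal sketch, assuming a deliberately simple regular expression (the pattern name `BRACKETED` is hypothetical, not taken from any mail or news client): with explicit delimiters, the "!" from the opening example survives intact, and the parenthesis case needs no awkward space.

```python
import re

# "<scheme:...>" with no whitespace or angle brackets inside the URL.
BRACKETED = re.compile(r"<([A-Za-z][A-Za-z0-9+.-]*:[^\s<>]+)>")

text = 'He said "You should visit <http://example.com/!>"'
found = BRACKETED.findall(text)
print(found)  # ['http://example.com/!']

blog = BRACKETED.findall("Visit my blog (<http://example.com/blog>)")
print(blog)  # ['http://example.com/blog']
```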
