How much do you know about the humble <title> tag? It has been there since the earliest HTML specification. The 1995 spec says:
There may only be one title in any document. It should identify the content of the document in a fairly wide context.
It may not contain anchors, paragraph marks, or highlighting.
Remarkably little has changed in the intervening decades. The modern HTML5 spec defines it as only containing text. That means you can’t nest tags inside it. For example, this is invalid:
<title>I love <em>you</em>!</title>
Try it – you won’t get emphasised text.
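A rough way to picture what the parser does: everything between <title> and </title> is taken as text, so any tags inside come out as literal characters. Here is a minimal Python sketch of that behaviour. It is a model, not a real HTML parser. The title_text helper and its regular expression are made up for this illustration, and they only cope with simple, well-formed input.

```python
import html
import re
from typing import Optional

# Hypothetical helper: a rough model of how a browser reads <title>.
# Everything up to the closing </title> is kept as text. Tags inside are
# not parsed, although character references such as &amp; are decoded.
_TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def title_text(document: str) -> Optional[str]:
    """Return the text of the first <title>, or None if there isn't one."""
    match = _TITLE_RE.search(document)
    if match is None:
        return None
    return html.unescape(match.group(1))


print(title_text("<title>I love <em>you</em>!</title>"))
# prints: I love <em>you</em>!
```

The <em> survives as five literal characters rather than becoming emphasis, which is what you see in the browser tab.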
This is a problem for internationalisation and accessibility – and one the HTML5 editors have been wrestling with.
(Usual disclaimer, I’m HMG’s representative on the W3C’s Advisory Committee and I’m an editor on HTML5 – these are my personal views.)
Let’s imagine this potential page title which includes multiple languages:
<html lang="en-gb">
<head>
<title>A review of La vie en rose</title>
...
At the moment a screen reader would assume the whole title, including “la vie en rose”, is in British English (the document’s language). Hearing a computerised voice reading French in a British accent is akin to hearing Vogon poetry read in Dick Van Dyke’s “cockornay”. Unpleasant for all involved.
What we want to write is something like:
<html lang="en-gb">
<head>
<title>
  <span lang="en">A review of</span>
  <span lang="fr">La vie en rose</span>
</title>
...
This is invalid HTML. In most browsers the tags are not interpreted, so the raw markup is displayed as literal text in the tab or title bar.
Incidentally, this is a long-known limitation. How can we fix it?
- Change the element
- Make a new meta element
- Something else
Changing the element
This is impossible if we want to maintain backwards compatibility. Either we accept that older browsers will see garbled titles, or that newer browsers may get confused by older pages.
Any page with a title like <title>This page is about the <span> element</title> is going to require a work-around for new browsers.
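To get a feel for how many existing pages would be affected, you could scan their titles for text that a markup-aware parser would start reading as a tag. This is a minimal sketch of such a check. The looks_like_markup name is hypothetical, and the pattern is a heuristic of mine rather than anything defined by the spec.

```python
import re

# Heuristic, not a spec-defined test: "<" followed by an optional "/"
# and an ASCII letter is where an HTML tokenizer would begin a tag.
_TAG_LIKE = re.compile(r"</?[A-Za-z]")


def looks_like_markup(title: str) -> bool:
    """Return True if the title contains text that resembles the start of a tag."""
    return _TAG_LIKE.search(title) is not None


titles = [
    "This page is about the <span> element",
    "A review of La vie en rose",
    "Why 1 < 2",
]
for title in titles:
    print(looks_like_markup(title), title)
```

Only the first title is flagged. A bare "<" followed by a space, as in the third, would not be mistaken for a tag.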
New meta element
How about a new element like
Nope! If a browser doesn’t recognise an element in the <head>, it assumes the head has ended and prints the element’s contents in the body.
Here are two existing things which could be repurposed.
The accessibility attribute
longdesc lets an author place a “long description” elsewhere in the page. It is currently only valid on the <img> element. But let’s imagine it used elsewhere:
<title longdesc="#desc">A review of La vie en rose</title>
</head>
<body>
<h1 id="desc"><span lang="en">A review of</span> <span lang="fr">La vie en rose</span></h1>
...
That might help screen readers – but it doesn’t improve the semantics of the page.
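To make the idea concrete, here is a sketch of how a tool might follow that imagined longdesc and recover the language of each part of the title. Everything here is hypothetical: longdesc is not valid on <title>, the LongdescRuns class and title_runs function are names invented for this example, and the code assumes well-formed markup with no void elements (such as <br>) inside the target element. It only looks at the lang on <html> for the document language.

```python
from html.parser import HTMLParser


class LongdescRuns(HTMLParser):
    """Collect (language, text) runs from the element a <title longdesc> points at."""

    def __init__(self):
        super().__init__()
        self.target_id = None
        self.langs = [""]  # stack of inherited languages, document language first
        self.depth = 0     # greater than 0 while inside the target element
        self.runs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0 and tag == "html":
            self.langs[0] = attrs.get("lang") or ""
            return
        if self.depth == 0 and tag == "title":
            # Hypothetical attribute: "#desc" becomes "desc".
            self.target_id = (attrs.get("longdesc") or "").lstrip("#") or None
            return
        outside = self.target_id is None or attrs.get("id") != self.target_id
        if self.depth == 0 and outside:
            return
        # Inside the target: inherit the parent's language unless overridden.
        self.depth += 1
        self.langs.append(attrs.get("lang") or self.langs[-1])

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1
            self.langs.pop()

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.runs.append((self.langs[-1], data.strip()))


def title_runs(page):
    parser = LongdescRuns()
    parser.feed(page)
    parser.close()
    return parser.runs


PAGE = """
<html lang="en-gb"><head>
<title longdesc="#desc">A review of La vie en rose</title>
</head><body>
<h1 id="desc"><span lang="en">A review of</span> <span lang="fr">La vie en rose</span></h1>
</body></html>
"""
print(title_runs(PAGE))
```

For the example page this gives two runs, one tagged en and one tagged fr, which is the information a screen reader would need in order to switch voices.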
Here’s some Schema.org MicroData:
<html lang="en-gb">
<head itemscope itemtype="https://schema.org/WebPage">
<meta itemprop="alternativeHeadline" content="A review of <span lang='fr'>La vie en rose</span>"/>
<title>...
That might be better for semantics, but would require screen readers to support it.
We could also hope that AI improves sufficiently that it can immediately recognise when individual words within a sentence are from a different language. That is a task which befuddles most humans.
What should be done? If you’re interested in working on this, come and join in with the HTML5 development process at the W3C.