Updating all the examples in the HTML5 Spec
I'm currently helping to edit the HTML5 specification. As part of our preparations for HTML5.3 I've started going through the provided examples and improving them. This blog post explains the what, why, and when of the process. You can follow along on GitHub.
How is the Spec Written?
The spec is written using a mash-up of HTML and Markdown which is then run through Bikeshed to produce beautiful, pure, unsullied HTML.
There is a small problem with HTML... It's hard to display HTML in HTML. That is, if I want to talk about the <nav> element, I need to escape the HTML elements and write:
<pre>&lt;nav&gt;</pre>
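For anyone who wants to see that escaping step outside of a text editor, here is a minimal sketch using Python's standard `html` module. The helper name `escape_for_pre` is made up for illustration; it is not part of Bikeshed or the spec's tooling.

```python
import html


def escape_for_pre(markup: str) -> str:
    """Escape markup so it can sit inside a <pre> element as plain text."""
    # quote=False leaves " and ' alone; only &, < and > are replaced.
    return html.escape(markup, quote=False)


print(escape_for_pre("<nav>"))       # &lt;nav&gt;
print(html.unescape("&lt;nav&gt;"))  # <nav>
```

Going one way is easy. Reading the escaped result back by eye is the painful part.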
That's just about readable for short snippets. But consider this genuine (although admittedly extreme) example:
<pre><code class="lang-c"><span class="keyword">for</span> (<span class="ident">j</span> = 0; <span class="ident">j</span> < 256; <span class="ident">j</span>++) { <span class="ident">i_t3</span> = (<span class="ident">i_t3</span> & 0x1ffff) | (<span class="ident">j</span> << 17); <span class="ident">i_t6</span> = (((((((<span class="ident">i_t3</span> >> 3) ^ <span class="ident">i_t3</span>) >> 1) ^ <span class="ident">i_t3</span>) >> 8) ^ <span class="ident">i_t3</span>) >> 5) & 0xff; <span class="keyword">if</span> (<span class="ident">i_t6</span> == <span class="ident">i_t1</span>) <span class="keyword">break</span>; }</code></pre>
BLEURGH! YUK! Also, pretty hard to maintain. I've found dozens of examples which have errors in them; possibly because they're unreadable.
Can you quickly and intuitively spot the error in this example?
&lt;form&gt;&lt;div&gt;&lt;label&gt;Customer name:;lt;input&gt;&lt;/label&gt;&gt;/div&gt;&lt;/form&gt;
Luckily, there is a way we can write HTML without having to escape it. The <xmp> element!
Let's write some eXaMPles!
<xmp highlight="html">
<form>
<div><label>Customer name: lt;input></label>>/div>
</form>
</xmp>
Wow! Suddenly easier to read. Makes it quicker to edit and to find mistakes.
How to fix it.
Doing a quick grep -rinc "pre highlight=\"html" | grep -v :0 through the spec showed around 600 examples. Ideally I'd run some magic/tragic one-line Unix command and everything would be fixed. Sadly, reality got in the way!
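If grep isn't to hand, the same count can be roughly reproduced in Python. This is only a sketch of what that command does (per-file counts of matching lines, case-insensitive, files with zero hits dropped); the function name `count_matching_lines` is my own invention.

```python
from pathlib import Path

# Same search string as the grep command, lowercased for case-insensitive matching.
NEEDLE = 'pre highlight="html'


def count_matching_lines(root: str) -> dict[str, int]:
    """Map each file under root to its number of lines containing NEEDLE.

    Files with no matching lines are left out, like `grep -v :0`.
    """
    counts = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = sum(NEEDLE in line.lower() for line in text.splitlines())
        if hits:
            counts[str(path)] = hits
    return counts
```

Calling `sum(count_matching_lines(".").values())` gives the overall total. Like `grep -c`, it counts matching lines, not individual occurrences.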
Some examples use unescaped markup to highlight specific parts of the examples. Some mix CSS and HTML. Some still use UPPERCASE element names. Some haven't been updated since the Stone Age. Some are needlessly verbose and encumbered with excess verbiage which makes it, inter alia, complex to process.
So, I've been going through each example individually: converting <pre> to <xmp> where possible, updating the examples, simplifying them where necessary, and generally giving them a good old tidy-up.
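To give a flavour of the mechanical part of that job, here is a rough Python sketch of the "where possible" conversion. It is not the process I actually used (that was done by hand, example by example). It assumes every example opens with exactly `<pre highlight="html">`, and the names `convert_pre_to_xmp`, `PRE_BLOCK` and `REAL_TAG` are hypothetical.

```python
import html
import re

# Assumption: examples open with exactly this tag and attribute.
PRE_BLOCK = re.compile(r'<pre highlight="html">(.*?)</pre>', re.DOTALL)
# A literal "<" followed by a letter (or "/" and a letter) is real markup.
REAL_TAG = re.compile(r"</?[a-zA-Z]")


def convert_pre_to_xmp(source: str) -> str:
    """Turn fully escaped <pre> examples into <xmp>; leave the rest alone."""

    def replace(match: re.Match) -> str:
        body = match.group(1)
        # Unescaped markup inside the example is being used for
        # highlighting, so it needs a human to look at it.
        if REAL_TAG.search(body):
            return match.group(0)
        plain = html.unescape(body)
        # <xmp> ends at the first "</xmp", so an example that contains
        # one cannot be converted.
        if "</xmp" in plain.lower():
            return match.group(0)
        return f'<xmp highlight="html">{plain}</xmp>'

    return PRE_BLOCK.sub(replace, source)
```

A sketch like this skips exactly the awkward cases listed above. It also does nothing about UPPERCASE element names, mixed CSS, or stale content, which is why each example still needed reading individually.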
I'm indebted to two specific Atom plugins:
- escape-utils - for unescaping HTML
- atom-beautify - for automagically fixing indentation problems
You're my only hope!
I think I've done this right. But I'm not sure. I've gone through each example I've changed and compared it to the original - they look fine. Nevertheless, I'm certain I've made mistakes somewhere.
I'd be jolly grateful if you could cast your eye over this ridiculously large diff and point out where I've messed things up.
Seriously. Go here - https://github.com/w3c/html/pull/1199 - and get stuck in.
THANKS!