HTML Oddities: Is a newline just another whitespace in attribute values?


Consider these two HTML elements:

<div class="a b">…</div>

<div class="a
b">…</div>

Is there any semantic difference between them? Is there any way to target one but not the other? In other words, are they logically different?

I think the answer is no. On every browser I've tested, both behave identically. Whether using JS or CSS, there's no difference between them. You could replace every \n with a space and nothing would break.

But is that true for every attribute? Are there some attributes where a newline is *significant*?

For the vast majority of attributes, the answer is no. Consider the alt attribute for providing alternate text on images. This:

<img src="" alt="First line.
Second Line.

Fourth line.">

When rendered by a browser, the newlines become spaces. See:

First line. Second Line. Fourth line.

But there are three attributes where newlines do matter. Can you work out what they are?

Title

Hover your cursor over this text and a title will appear. It will look something like:

Title text showing multiple lines.

The HTML specification has a section on "space-separated tokens", which are separated by "ASCII whitespace". It defines that as:

  • U+0009 TAB
  • U+000A LF
  • U+000C FF
  • U+000D CR
  • U+0020 SPACE

So tab, form feed, any newline, and space are all equivalent when it comes to tokenising an attribute's value.
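
Here's a rough sketch of that tokenisation in plain JavaScript. The helper name `splitOnAsciiWhitespace` is my own invention, not something from the spec or the DOM; it just splits on the five characters listed above and drops empty tokens.

```javascript
// ASCII whitespace as listed above: TAB, LF, FF, CR, SPACE.
const ASCII_WHITESPACE = /[\t\n\f\r ]+/;

// Hypothetical helper: split an attribute value into its
// space-separated tokens, ignoring leading and trailing whitespace.
function splitOnAsciiWhitespace(value) {
  return value.split(ASCII_WHITESPACE).filter((token) => token !== "");
}

// The two class attributes from the start of this post.
const oneLine = splitOnAsciiWhitespace("a b");
const twoLines = splitOnAsciiWhitespace("a\nb");

console.log(oneLine, twoLines);
```

Both calls should produce the same two tokens, which is why the newline in the class attribute makes no difference.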

However, for title specifically:

If the title attribute's value contains U+000A LINE FEED (LF) characters, the content is split into multiple lines. Each U+000A LINE FEED (LF) character represents a line break.

3.2.6.1 The title attribute
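
As a minimal sketch of the rule quoted above, this splits a title value on LF and nothing else. `titleLines` is a made-up name for illustration; how a browser actually draws the tooltip is up to the browser.

```javascript
// Hypothetical helper: break a title value into display lines.
// Only U+000A LINE FEED counts as a line break under the quoted rule.
function titleLines(title) {
  return title.split("\n");
}

// Three lines, because there are two LF characters.
console.log(titleLines("Title text\nshowing\nmultiple lines."));

// Still one line: a lone carriage return is not a break under this rule.
console.log(titleLines("one\rtwo").length);
```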

Placeholder

There's another similar case:

The good old <textarea> element has a placeholder attribute. That also allows newlines, although in a subtly different way to the title attribute!

All U+000D CARRIAGE RETURN U+000A LINE FEED character pairs (CRLF) in the hint, as well as all other U+000D CARRIAGE RETURN (CR) and U+000A LINE FEED (LF) characters in the hint, must be treated as line breaks when rendering the hint.

4.10.11 The textarea element

Quite why carriage returns are allowed here, but not in title, I don't know!
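
To make the difference from title concrete, here's a small sketch of the textarea rule. `placeholderLines` is a hypothetical helper, not a real DOM API; it treats a CRLF pair as one break and any leftover CR or LF as a break too.

```javascript
// Hypothetical helper: break a textarea placeholder hint into lines.
// CRLF is matched first so the pair counts as a single line break.
function placeholderLines(hint) {
  return hint.split(/\r\n|\r|\n/);
}

// Every kind of newline produces a break here.
console.log(placeholderLines("a\r\nb\rc\nd"));
```

Compare that with the title rule, where only LF is listed as a line break.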

Also note, the textarea's placeholder is different from the <input>'s placeholder, which doesn't support newlines.

ID

I warn you though, this one is pretty nasty!

Consider this piece of HTML:

<p id="test
">Hello</p>

I know! What sort of sicko would include a newline in their ID?! But, it turns out, that is significant.

Try to select that element using CSS like:

#test {
  color: red;
}

It won't work! The literal ID is not test. If you run:

document.querySelector("p")

It will return <p id="test\n"> - which means you can only select it with:

document.getElementById("test\n")

Or with CSS using special character selectors:

#test\a {
  color: blue;
}
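
If you'd rather not work out the escape by hand, browsers provide CSS.escape() for exactly this. The function below is only a simplified sketch of that kind of escaping, written so it runs without a browser; `escapeIdent` is my own name for it, and it isn't guaranteed to match CSS.escape() in every edge case.

```javascript
// Simplified sketch of CSS identifier escaping.
// In a browser, prefer the built-in CSS.escape().
function escapeIdent(value) {
  const chars = Array.from(value);
  return chars
    .map((ch, index) => {
      const codePoint = ch.codePointAt(0);
      const isDigit = codePoint >= 0x30 && codePoint <= 0x39;
      const isControl =
        (codePoint >= 0x01 && codePoint <= 0x1f) || codePoint === 0x7f;
      const digitAfterLeadingHyphen =
        index === 1 && isDigit && chars[0] === "-";

      // NULL cannot be escaped; substitute the replacement character.
      if (codePoint === 0) return "\uFFFD";
      // Control characters (including LF) and leading digits become
      // a hex escape, terminated by a space.
      if (isControl || (index === 0 && isDigit) || digitAfterLeadingHyphen) {
        return "\\" + codePoint.toString(16) + " ";
      }
      // A lone hyphen is not a valid identifier on its own.
      if (index === 0 && ch === "-" && chars.length === 1) return "\\-";
      // Letters, digits, hyphen, underscore and non-ASCII pass through.
      if (codePoint >= 0x80 || /[-_0-9A-Za-z]/.test(ch)) return ch;
      // Anything else gets a plain backslash escape.
      return "\\" + ch;
    })
    .join("");
}

// Build a selector for the awkward ID used in this post.
const selector = "#" + escapeIdent("test\n");
console.log(JSON.stringify(selector));
```

The escaped newline comes out as \a followed by a space. That space ends the hex escape, which is the job the space before the { is doing in the CSS rule above.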

The spec says:

The id attribute specifies its element's unique identifier (ID).

There are no other restrictions on what form an ID can take; in particular, IDs can consist of just digits, start with a digit, start with an underscore, consist of just punctuation, etc.

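That passage doesn't mention newlines, and browsers do keep the ID exactly as written, newline and all. Note the word *other*, though: the same section of the spec also says an ID must not contain any ASCII whitespace, so this markup is non-conforming even though it works.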

Any others?

I'm pretty sure those three are the only attributes which treat newlines in their values as significant. Think I'm wrong? Please leave a comment.

