<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/rss-style.xsl" type="text/xsl"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	     xmlns:dc="http://purl.org/dc/elements/1.1/"
	   xmlns:atom="http://www.w3.org/2005/Atom"
	     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>syntax &#8211; Terence Eden’s Blog</title>
	<atom:link href="https://shkspr.mobi/blog/tag/syntax/feed/" rel="self" type="application/rss+xml" />
	<link>https://shkspr.mobi/blog</link>
	<description>Regular nonsense about tech and its effects 🙃</description>
	<lastBuildDate>Wed, 30 Oct 2024 17:40:07 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://shkspr.mobi/blog/wp-content/uploads/2023/07/cropped-avatar-32x32.jpeg</url>
	<title>syntax &#8211; Terence Eden’s Blog</title>
	<link>https://shkspr.mobi/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title><![CDATA[Is this a bug in every Markdown (Extra) parser?]]></title>
		<link>https://shkspr.mobi/blog/2024/10/is-this-a-bug-in-every-markdown-extra-parser/</link>
					<comments>https://shkspr.mobi/blog/2024/10/is-this-a-bug-in-every-markdown-extra-parser/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 30 Oct 2024 12:34:24 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[markdown]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[syntax]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=53007</guid>

					<description><![CDATA[Markdown is, I think it is fair to say, a frustrating &#34;specification&#34;. Its origins are a back-of-a-fag-packet document and a buggy Perl script - and we&#039;ve been dealing with the consequences ever since.  There are now multiple Markdown parsers, each with their own idiosyncrasies. To make matters worse, there&#039;s a set of extensions popularly known as &#34;Markdown Extra&#34;.  Extra has support for things…]]></description>
										<content:encoded><![CDATA[<p>Markdown is, I think it is fair to say, a frustrating "specification". Its origins are <a href="https://daringfireball.net/projects/markdown/">a back-of-a-fag-packet document</a> and a buggy Perl script - and we've been dealing with the consequences ever since.</p>

<p>There are now multiple Markdown parsers, each with their own idiosyncrasies. To make matters worse, there's a set of extensions popularly known as "Markdown Extra".</p>

<p>Extra has support for things like tables, footnotes, and - in some dialects - autolinks.</p>

<p>Most of the time, when an author writes the text <code>Visit https://example.com</code> they want the URL to be automatically turned into a hyperlink.  Most Markdown parsers support that. Hurrah!</p>

<p>But there's a rather nasty little edge case.</p>

<p>Markdown is explicitly designed so that <a href="https://daringfireball.net/projects/markdown/syntax#html">authors can mix and match HTML and Markdown</a> in the same document.  This is perfectly valid:</p>

<p><code>I &lt;em&gt;love&lt;/em&gt; the delicious taste of **fresh** oranges!</code></p>

<p>Which becomes:</p>

<p><code>I &lt;em&gt;love&lt;/em&gt; the delicious taste of &lt;strong&gt;fresh&lt;/strong&gt; oranges!</code></p>

<p>This is also valid:</p>

<p><code>&lt;a href="https://example.com/"&gt;Visit my *favourite* site https://example.com/&lt;/a&gt;!</code></p>

<p>The parser is smart enough to ignore the link inside the <code>href=""</code> but will process all the Markdown contents of the <code>&lt;a&gt;</code> element.</p>

<p>The text <em>favourite</em> is converted to <code>&lt;em&gt;favourite&lt;/em&gt;</code> correctly.</p>

<p>But what about the link? Should that be autolinked?</p>

<p>Here's how <a href="https://babelmark.github.io/?text=Autolink%3A+https%3A%2F%2Fexample.com%2F%0A%0AHTML%3A+%3Ca+href%3D%22https%3A%2F%2Fexample.com%2F%22%3EVisit+my+*favourite*+site+https%3A%2F%2Fexample.com%2F%3C%2Fa%3E!">a few dozen different Markdown parsers fare</a>.</p>

<p>Nearly all of the ones which support Autolink end up producing <em>broken</em> HTML. They nest an anchor within an anchor. Something explicitly forbidden by the HTML specification.</p>

<pre><code class="language-html">&lt;a href="https://example.com/"&gt;Visit my
   &lt;em&gt;favourite&lt;/em&gt; site 
   &lt;a href="https://example.com/"&gt;https://example.com/&lt;/a&gt;
&lt;/a&gt;!
</code></pre>

<p>Others break in weird and unexpected ways.</p>
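
<p>For what it's worth, avoiding the nested anchor doesn't look conceptually hard. Here's a minimal Python sketch, assuming the autolinker can see the inline HTML as a stream of tags and text. It isn't taken from any of the parsers above; the names <code>AutolinkCandidateFinder</code> and <code>find_autolink_candidates</code> are made up for illustration, and the URL pattern is deliberately crude. All it does is sort bare URLs into those which look safe to autolink and those which already sit inside an <code>&lt;a&gt;</code> element.</p>

<pre><code class="language-python">import re
from html.parser import HTMLParser

# Deliberately naive URL pattern, for illustration only.
URL_PATTERN = re.compile(r"https?://\S+")


class AutolinkCandidateFinder(HTMLParser):
    """Sorts bare URLs found in text nodes by whether they sit inside an anchor."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.anchor_depth = 0
        self.safe = []     # URLs outside any anchor: candidates for autolinking
        self.skipped = []  # URLs already inside an anchor: leave them alone

    def handle_starttag(self, tag, attrs):
        # Attribute values (such as href) never reach handle_data,
        # so only the element itself needs tracking.
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1

    def handle_data(self, data):
        target = self.skipped if self.anchor_depth else self.safe
        target.extend(URL_PATTERN.findall(data))


def find_autolink_candidates(text):
    """Returns (safe, skipped) lists of URLs for a chunk of mixed Markdown and HTML."""
    finder = AutolinkCandidateFinder()
    finder.feed(text)
    finder.close()
    return finder.safe, finder.skipped
</code></pre>

<p>Fed the example above, this should put the URL inside the <code>&lt;a&gt;</code> element in the "skipped" list, and it never considers the one in the <code>href=""</code> because attribute values aren't text nodes. A real parser would also have to cope with Markdown's own link syntax, code spans, and malformed HTML, none of which this sketch attempts.</p>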

<h2 id="is-this-a-bug"><a href="https://shkspr.mobi/blog/2024/10/is-this-a-bug-in-every-markdown-extra-parser/#is-this-a-bug">Is this a bug?</a></h2>

<p>Markdown is an excellent example of "do what I mean, not what I say" software. To the human reading the text, it might seem obvious which parts need to be transformed and which don't.</p>

<p>There are various specifications for how autolinking should work - but I couldn't find any documents which explicitly discuss where it <em>shouldn't</em> work.</p>

<p>At this point, you're probably going to leave a comment saying that it is the users who are wrong. They should wrap links in brackets, or stick to pure Markdown, or some other tosh.</p>

<p>Markdown was supposed to <strong>simplify</strong> the process of writing HTML. Anything which forces the user to write in an unnatural or confusing way is a bug.</p>

<blockquote><p>Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).</p>

<p><a href="https://daringfireball.net/projects/markdown/">Markdown Introduction</a></p></blockquote>

<p>I don't think any of the authors of Markdown parsers have been naughty here. They mostly just follow the spec. But Markdown was designed without ever being tested with real users.  And real users break things in all sorts of unexpected and delightful ways.</p>

<p>That's where the real bug is. When we don't test with users and fail to meet their expectations, we produce faulty software.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/10/is-this-a-bug-in-every-markdown-extra-parser/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
	</channel>
</rss>
