<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/rss-style.xsl" type="text/xsl"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	     xmlns:dc="http://purl.org/dc/elements/1.1/"
	   xmlns:atom="http://www.w3.org/2005/Atom"
	     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>When is a URL not a URL? &#8211; Terence Eden’s Blog</title>
	<atom:link href="https://shkspr.mobi/blog/2011/07/when-is-a-url-not-a-url/feed/" rel="self" type="application/rss+xml" />
	<link>https://shkspr.mobi/blog</link>
	<description>Regular nonsense about tech and its effects 🙃</description>
	<lastBuildDate>Thu, 28 Jul 2011 16:55:02 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://shkspr.mobi/blog/wp-content/uploads/2023/07/cropped-avatar-32x32.jpeg</url>
	<title>When is a URL not a URL? &#8211; Terence Eden’s Blog</title>
	<link>https://shkspr.mobi/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title><![CDATA[When is a URL not a URL?]]></title>
		<link>https://shkspr.mobi/blog/2011/07/when-is-a-url-not-a-url/</link>
					<comments>https://shkspr.mobi/blog/2011/07/when-is-a-url-not-a-url/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 27 Jul 2011 11:37:57 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[usability]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[urls]]></category>
		<guid isPermaLink="false">http://shkspr.mobi/blog/?p=4271</guid>

					<description><![CDATA[Summary  Twitter&#039;s way of linking URLs is broken.  It&#039;s annoying to users, and a pain in the arse to developers.  This quick post talks about the problem and offers a solution.  I&#039;ve raised a bug with Twitter and I hope you&#039;ll star it as important to you.   Preamble  A common trope in programming classes is &#34;how do you detect a valid email address?&#34;  It should be obvious, right?  A string of text,…]]></description>
										<content:encoded><![CDATA[<h2 id="summary"><a href="https://shkspr.mobi/blog/2011/07/when-is-a-url-not-a-url/#summary">Summary</a></h2>

<p>Twitter's way of linking URLs is broken.  It's annoying to users, and a pain in the arse to developers.  This quick post talks about the problem and offers a solution.</p>

<p><a href="http://code.google.com/p/twitter-api/issues/detail?id=2240">I've raised a bug with Twitter</a> and I hope you'll star it as important to you.
<span id="more-4271"></span></p>

<h2 id="preamble"><a href="https://shkspr.mobi/blog/2011/07/when-is-a-url-not-a-url/#preamble">Preamble</a></h2>

<p>A common trope in programming classes is "<a href="http://www.regular-expressions.info/email.html">how do you detect a valid email address</a>?"</p>

<p>It should be obvious, right?  A string of text, an @, a domain - probably ending in .com.
As it turns out, it's not that simple.  "who+o'toole@invalid.museum" is a potentially valid address, for example.
There are literally thousands of ways to detect the potentially infinite variety of email addresses.</p>

<p>The same is true for URLs - and slavish adherence to guidelines is killing Twitter's usefulness.</p>

<h2 id="the-url-matching-problem"><a href="https://shkspr.mobi/blog/2011/07/when-is-a-url-not-a-url/#the-url-matching-problem">The URL Matching Problem</a></h2>

<p>Which of these strings should be turned into hyperlinks?</p>

<pre>www.bbc.co.uk

example.com

http://test

https://test.test

ftp://news.com
</pre>

<p>As it happens, Twitter turns only "https://test.test" into a link and leaves the others as plain text.</p>

<p><a href="https://twitter.com/edent/status/96172785436590080"><img src="https://shkspr.mobi/blog/wp-content/uploads/2011/07/URL-test-1.jpg" alt="" title="URL test 1" width="514" height="216" class="aligncenter size-full wp-image-4274"></a></p>

<p>Twitter's matching rule is, as far as I can tell, this:</p>

<pre>If it starts with http:// or https:// and has a dot in it - it's a URL</pre>
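
<p>As a rough sketch, that rule can be written as a regular expression in Python. The pattern and the name <code>strict_match</code> are my own guesses at the behaviour described above, not Twitter's actual code.</p>

```python
import re

# Assumed approximation of the rule described above: an http:// or https://
# prefix followed by non-space characters that include at least one dot.
# This is a guess at the behaviour, not Twitter's real pattern.
STRICT_URL = re.compile(r"^https?://\S*\.\S*$")


def strict_match(text: str) -> bool:
    """Return True if the text would be linked under the assumed rule."""
    return STRICT_URL.match(text) is not None


candidates = [
    "www.bbc.co.uk",
    "example.com",
    "http://test",
    "https://test.test",
    "ftp://news.com",
]
linked = [c for c in candidates if strict_match(c)]
print(linked)  # prints ['https://test.test']
```

<p>Run against the five test strings, only "https://test.test" survives, which agrees with what the screenshot shows.</p>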

<p>I think this is a serious weakness.  Twitter users are sharing URLs which their followers can't click on, while Twitter itself creates links to URLs which don't exist.</p>

<p>I've picked these examples more or less at random.
<a href="https://twitter.com/ianvisits/status/82712842112991232"><img src="https://shkspr.mobi/blog/wp-content/uploads/2011/07/URL-test-2.jpg" alt="" title="URL test 2" width="514" height="216" class="aligncenter size-full wp-image-4275"></a></p>

<p><a href="https://twitter.com/PeakChief/status/82722453767462912"><img src="https://shkspr.mobi/blog/wp-content/uploads/2011/07/URL-test-3.jpg" alt="" title="URL test 3" width="514" height="216" class="aligncenter size-full wp-image-4276"></a></p>

<h2 id="solution"><a href="https://shkspr.mobi/blog/2011/07/when-is-a-url-not-a-url/#solution">Solution?</a></h2>

<p>As with the email regexes, I would take a far more lax approach.  Essentially, if it looks vaguely like a URL - link to it.</p>

<p>I would suggest the following rules:</p>

<ul>
    <li>If it starts with a protocol (http://, ftp://, tel:, etc.) - create a hyperlink.</li>
    <li>If it starts with www. - create a hyperlink.</li>
    <li>If it ends with a dot followed by a <a href="http://data.iana.org/TLD/tlds-alpha-by-domain.txt">valid TLD</a> - create a hyperlink.</li>
    <li>If it contains a <a href="http://data.iana.org/TLD/tlds-alpha-by-domain.txt">valid TLD</a> followed by a slash then some other characters - create a hyperlink.</li>
</ul>
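
<p>The four rules above could be sketched in Python along these lines. This is only an illustration: <code>KNOWN_TLDS</code> is a tiny hand-picked subset standing in for the full IANA list, <code>SCHEMES</code> is an assumed set of prefixes, and <code>looks_like_url</code> is a hypothetical name.</p>

```python
import re

# Tiny illustrative subset; a real version would load the full IANA TLD list.
KNOWN_TLDS = {"com", "org", "net", "uk", "museum"}

# Assumed set of scheme prefixes to accept (rule 1).
SCHEMES = ("http://", "https://", "ftp://", "tel:")

# A dot, some letters, a slash, then at least one more character (rule 4).
TLD_THEN_PATH = re.compile(r"\.([a-z]+)/\S+")


def looks_like_url(token: str) -> bool:
    """Apply the four lax rules to a single whitespace-free token."""
    t = token.lower()
    # Rule 1: starts with a protocol.
    if t.startswith(SCHEMES):
        return True
    # Rule 2: starts with www.
    if t.startswith("www."):
        return True
    # Rule 3: ends with a dot followed by a valid TLD.
    head, dot, last = t.rpartition(".")
    if dot and head and last in KNOWN_TLDS:
        return True
    # Rule 4: contains a valid TLD followed by a slash and more characters.
    return any(m.group(1) in KNOWN_TLDS for m in TLD_THEN_PATH.finditer(t))


candidates = ["www.bbc.co.uk", "example.com", "http://test",
              "https://test.test", "ftp://news.com"]
print(all(looks_like_url(c) for c in candidates))  # prints True
```

<p>Under these rules all five test strings become links, including "http://test", while ordinary words such as "hello" or "file.txt" are left alone.</p>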

<p>The "correct" method would then be for Twitter to perform an <a href="http://en.wikipedia.org/wiki/HTTP#Request_methods">HTTP HEAD request</a> to see if the URL is potentially valid.  There are three drawbacks to this.</p>

<ol>
    <li>It may place excessive load on Twitter's servers to process and cache these requests.</li>
    <li>The URL may be that of an intranet site - and thus inaccessible to Twitter.</li>
    <li>The URL may be valid but temporarily inaccessible.</li>
</ol>
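
<p>For illustration, a HEAD check might look something like this in Python, using only the standard library. The function names, the default timeout, and the choice to assume http:// for scheme-less input are all my own assumptions. Because of the second and third drawbacks, a <code>False</code> result is best treated as a hint and not as proof that the URL is bogus.</p>

```python
import http.client
import urllib.error
import urllib.request


def build_head_request(url: str) -> urllib.request.Request:
    """Build a HEAD request, assuming http:// when no scheme is given."""
    if "://" not in url:
        url = "http://" + url
    return urllib.request.Request(url, method="HEAD")


def seems_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if some server answers the HEAD request at all."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (404, 405, ...),
        # so the host exists even if this exact page does not.
        return True
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return False
```

<p>Calling <code>seems_reachable("www.example.com")</code> sends a single HEAD request to http://www.example.com and reports whether anything answered within the timeout.</p>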

<p>Regardless of the method, surely it's inexcusable that "www.example.com" isn't detected as a URL whereas "http://bork.bork.bork" is?</p>

<h2 id="action"><a href="https://shkspr.mobi/blog/2011/07/when-is-a-url-not-a-url/#action">ACTION!</a></h2>

<p>If you think Twitter's approach to hyperlinks is wrong - please <a href="http://code.google.com/p/twitter-api/issues/detail?id=2240">make your voice heard at the bug report</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2011/07/when-is-a-url-not-a-url/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
