<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/rss-style.xsl" type="text/xsl"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	     xmlns:dc="http://purl.org/dc/elements/1.1/"
	   xmlns:atom="http://www.w3.org/2005/Atom"
	     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>Illegal Hashes &#8211; Terence Eden’s Blog</title>
	<atom:link href="https://shkspr.mobi/blog/2022/11/illegal-hashes/feed/" rel="self" type="application/rss+xml" />
	<link>https://shkspr.mobi/blog</link>
	<description>Regular nonsense about tech and its effects 🙃</description>
	<lastBuildDate>Thu, 28 Nov 2024 10:06:33 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://shkspr.mobi/blog/wp-content/uploads/2023/07/cropped-avatar-32x32.jpeg</url>
	<title>Illegal Hashes &#8211; Terence Eden’s Blog</title>
	<link>https://shkspr.mobi/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title><![CDATA[Illegal Hashes]]></title>
		<link>https://shkspr.mobi/blog/2022/11/illegal-hashes/</link>
					<comments>https://shkspr.mobi/blog/2022/11/illegal-hashes/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Mon, 28 Nov 2022 12:34:14 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[cryptography]]></category>
		<category><![CDATA[hashing]]></category>
		<category><![CDATA[NaBloPoMo]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=43570</guid>

					<description><![CDATA[To understand this blog post, you need to know two things.   There exists a class of numbers which are illegal in some jurisdictions. For example, a number may be copyrighted content, a decryption key, or other text considered illegal. There exists a class of algorithms which will take any arbitrary data and produce a fixed length text from it. This process is known as &#34;hashing&#34;. These algorithms …]]></description>
										<content:encoded><![CDATA[<p>To understand this blog post, you need to know two things.</p>

<ol>
<li>There exists a class of <a href="https://en.wikipedia.org/wiki/Illegal_number">numbers which are illegal in some jurisdictions</a>. For example, a number may be copyrighted content, a decryption key, or other text considered illegal.</li>
<li>There exists a class of algorithms which will take arbitrary data and produce a fixed-length output from it. This process is known as "<a href="https://en.wikipedia.org/wiki/Hash_function">hashing</a>". These algorithms are deterministic - that is, entering the same data will always produce the same hash.</li>
</ol>

<p>Let's take the <a href="https://en.wikipedia.org/wiki/MD5">MD5 hashing algorithm</a>. Feed it <em>any</em> data and it will produce a hash with a fixed length of 128 bits. At 8 bits per character, that's enough room for 16 ASCII characters.</p>
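
<p>A minimal sketch of those two properties, using Python's standard <code>hashlib</code> module. The helper name <code>md5_digest</code> and the sample inputs are my own, purely for illustration.</p>

```python
import hashlib

def md5_digest(data: bytes) -> bytes:
    """Return the raw 16-byte (128-bit) MD5 digest of data."""
    return hashlib.md5(data).digest()

short_input = b"a"
long_input = b"a" * 1_000_000

# Fixed length: the digest is 16 bytes whatever the size of the input.
print(len(md5_digest(short_input)), len(md5_digest(long_input)))

# Deterministic: hashing the same data twice gives the same digest.
print(md5_digest(short_input) == md5_digest(short_input))
```

<p>Note that the digest is raw bytes. Most of the time those bytes are not printable text at all, which is why hashes are normally displayed as hexadecimal.</p>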

<p>Suppose you live in a country with <i lang="fr">Lèse-majesté</i> - laws which make it treasonous to insult or threaten the monarch.</p>

<p>There exists a seemingly innocent piece of data - an image, an MP3, a text file - which when fed to MD5 produces these 128 bits:</p>

<pre><code class="language-_">01001001 00100000 01101000 01100001 
01110100 01100101 00100000 01110100 
01101000 01100101 00100000 01110001 
01110101 01100101 01100101 01101110 
</code></pre>

<p>Decoded into ASCII, that spells <code>I hate the queen</code>.</p>
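
<p>If you want to check the decoding yourself, here is a small sketch. The function name <code>bits_to_ascii</code> is hypothetical; the bits are copied from the block above.</p>

```python
BITS = """
01001001 00100000 01101000 01100001
01110100 01100101 00100000 01110100
01101000 01100101 00100000 01110001
01110101 01100101 01100101 01101110
"""

def bits_to_ascii(bit_string: str) -> str:
    """Decode whitespace-separated 8-bit groups into an ASCII string."""
    octets = bit_string.split()
    return bytes(int(octet, 2) for octet in octets).decode("ascii")

message = bits_to_ascii(BITS)
print(message)
```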

<p>128 bits is <em>probably</em> too short to be illegal in all but the most repressive of regimes. It would be hard, if not impossible, to squeeze terrorist plans into that little space.</p>

<p>But it is just enough space to store an <a href="https://en.wikipedia.org/wiki/AACS_encryption_key_controversy">encryption key for copyrighted material</a>.</p>

<p>Therefore, it is possible that there exists a file which - by pure coincidence - happens to have an MD5 hash which is illegal.</p>

<p>Because MD5 is a relatively weak algorithm, it is possible to <a href="https://github.com/corkami/collisions">create <em>deliberate</em> hash "collisions"</a>. That is, take some data and manipulate it until it has the <em>same</em> MD5 as a different piece of data.</p>

<p>Someone could, theoretically, deliberately create a file which looks unremarkable when viewed, but is illegal when hashed.</p>

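<p>The SHA-1 hashing algorithm produces 160 bits - 20 ASCII characters. It is <em>somewhat</em> <a href="https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/">cheap and easy to produce two different files which share the same SHA-1 hash</a>. Forcing a file to hash to one specific, pre-chosen value is a different and much harder problem, known as a preimage attack.</p>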

<p>The SHA-512 hashing algorithm, as its name suggests, produces a 512-bit hash. That's enough space for 64 ASCII characters. Is that long enough to contain text which is blatantly illegal? Almost certainly. But modern hashing algorithms are designed to be resistant to collision attacks. So much so that it seems like <a href="https://link.springer.com/chapter/10.1007/978-3-030-84242-0_22">theoretical quantum computers will be needed to crack them</a>. The chance of any given file having an illegal hash is infinitesimally small.</p>
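
<p>To put rough numbers on that, here is a sketch which asks <code>hashlib</code> for each digest size. The odds calculation assumes the hash behaves like a uniform random function, which is an idealisation, and both helper names are hypothetical.</p>

```python
import hashlib
from fractions import Fraction

def digest_capacity(algorithm: str) -> tuple:
    """Return (bytes, bits) produced by the named hashlib algorithm."""
    size_bytes = hashlib.new(algorithm).digest_size
    return size_bytes, size_bytes * 8

def chance_of_specific_digest(algorithm: str) -> Fraction:
    """Chance that one random input hits one chosen digest,
    assuming every digest value is equally likely."""
    _, bits = digest_capacity(algorithm)
    return Fraction(1, 2 ** bits)

for name in ("md5", "sha1", "sha512"):
    size_bytes, bits = digest_capacity(name)
    print(f"{name}: {bits} bits, room for {size_bytes} ASCII characters")
```

<p>Under that assumption, a single random file has a 1 in 2<sup>128</sup> chance of matching one particular MD5 value, and the odds only get longer as the hash gets wider.</p>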

<p>Nevertheless, it intrigues me that there may be a form of hash-steganography. How would you detect whether the hash of a file was problematic?</p>
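
<p>One naive starting point, offered as a sketch rather than an answer: hash the file and see whether the raw digest happens to read as printable ASCII. The function names are hypothetical, and this would only catch text. An encryption key is arbitrary bytes, so spotting one would need a comparison against a list of known values instead.</p>

```python
import hashlib
import string
from typing import Optional

# Printable ASCII byte values, keeping the space but dropping control whitespace.
PRINTABLE = set(string.printable.encode("ascii")) - set(b"\t\n\r\x0b\x0c")

def digest_as_text(digest: bytes) -> Optional[str]:
    """Return the digest as a string if every byte is printable ASCII."""
    if all(byte in PRINTABLE for byte in digest):
        return digest.decode("ascii")
    return None

def readable_hash(data: bytes, algorithm: str = "md5") -> Optional[str]:
    """Hash data and return the digest as text, or None if it is not readable."""
    return digest_as_text(hashlib.new(algorithm, data).digest())

# A digest made of the bytes from the example above would be flagged.
print(digest_as_text(b"I hate the queen"))

# The MD5 of empty input contains non-ASCII bytes, so nothing is flagged.
print(readable_hash(b""))
```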
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2022/11/illegal-hashes/feed/</wfw:commentRss>
			<slash:comments>8</slash:comments>
		
		
			</item>
	</channel>
</rss>
