Illegal Hashes
To understand this blog post, you need to know two things.
- There exists a class of numbers which are illegal in some jurisdictions. For example, a number may be copyrighted content, a decryption key, or other text considered illegal.
- There exists a class of algorithms which will take any arbitrary data and produce a fixed-length value from it. This process is known as "hashing". These algorithms are deterministic - that is, entering the same data will always produce the same hash.
Let's take the MD5 hashing algorithm. Feed it any data and it will produce a hash with a fixed length of 128 bits. Using an 8-bit alphabet, that's 16 human-readable characters.
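As a quick illustration of those two properties, here is a minimal sketch using Python's standard `hashlib` module. The helper name `md5_digest` and the sample inputs are my own, chosen only for the example.

```python
import hashlib


def md5_digest(data: bytes) -> bytes:
    """Return the raw MD5 digest of the given data."""
    return hashlib.md5(data).digest()


# One byte of input and one million bytes of input...
short = md5_digest(b"a")
long = md5_digest(b"a" * 1_000_000)

# ...both produce a digest of the same length: 16 bytes, or 128 bits.
print(len(short) * 8, len(long) * 8)

# Hashing the same data again gives exactly the same digest.
print(md5_digest(b"a") == short)
```

The digest is raw bytes rather than text. It is usually shown as 32 hexadecimal digits, but nothing stops you from reading the same 16 bytes as 16 characters.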
Suppose you live in a country with Lèse-majesté - laws which make it treasonous to insult or threaten the monarch.
There exists a seemingly innocent piece of data - an image, an MP3, a text file - which when fed to MD5 produces these 128 bits:
01001001 00100000 01101000 01100001
01110100 01100101 00100000 01110100
01101000 01100101 00100000 01110001
01110101 01100101 01100101 01101110
Decoded into ASCII, that spells "I hate the queen".
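If you want to check the decoding yourself, the sketch below turns those four rows of bits back into text. The function name `bits_to_ascii` is made up for this example. The last line shows how the same 16 bytes would look in the hexadecimal form that hashing tools normally print.

```python
BITS = """
01001001 00100000 01101000 01100001
01110100 01100101 00100000 01110100
01101000 01100101 00100000 01110001
01110101 01100101 01100101 01101110
"""


def bits_to_ascii(bits: str) -> str:
    """Read whitespace-separated groups of 8 bits as ASCII characters."""
    octets = bits.split()
    return bytes(int(octet, 2) for octet in octets).decode("ascii")


message = bits_to_ascii(BITS)
print(message)

# The same 128 bits written as 32 hexadecimal digits.
digest_hex = message.encode("ascii").hex()
print(digest_hex)
```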
128 bits is probably too short to be illegal in all but the most repressive of regimes. It would be hard, if not impossible, to squeeze terrorist plans into that little space.
But it is just enough space to store an encryption key for copyrighted material.
Therefore, it is possible that there exists a file which - by pure coincidence - happens to have an MD5 hash which is illegal.
Because MD5 is a relatively weak algorithm, it is possible to create deliberate hash "collisions". That is, take some data and manipulate it until it has the same MD5 as a different piece of data.
Someone could, theoretically, deliberately create a file which looks unremarkable when viewed, but is illegal when hashed.
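To get a feel for the work involved, here is a naive brute-force sketch. It does not exploit any weakness in MD5. It simply tries numbered inputs until the digest starts with a chosen prefix. The function name `find_input_with_prefix` and the `stem` text are invented for illustration, and I only ask for the first two bytes, "I " (an "I" followed by a space), so that it finishes quickly.

```python
import hashlib
from itertools import count


def find_input_with_prefix(
    prefix: bytes, stem: bytes = b"innocent file #"
) -> tuple[bytes, int]:
    """Try stem+1, stem+2, ... until the MD5 digest starts with prefix."""
    for attempts in count(1):
        candidate = stem + str(attempts).encode("ascii")
        if hashlib.md5(candidate).digest().startswith(prefix):
            return candidate, attempts


# Two matching bytes means roughly 256 * 256 attempts on average.
data, attempts = find_input_with_prefix(b"I ")
print(data, attempts)
print(hashlib.md5(data).digest()[:2])
```

Each extra byte of the target multiplies the expected work by 256, so matching all 16 bytes this way is hopeless. A deliberate attack would have to rely on weaknesses in the algorithm rather than on guessing.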
The SHA-1 hashing algorithm produces 160 bits - 20 ASCII characters. It is now relatively cheap to produce two files which share a SHA-1 hash, although crafting a file to match one specific, pre-chosen hash is much harder.
The SHA-512 hashing algorithm, as its name suggests, produces a 512-bit hash. That's enough space for 64 ASCII characters. Is that long enough to contain text which is blatantly illegal? Almost certainly. But modern hashing algorithms are designed to be resistant to collision attacks. So much so that it seems like theoretical quantum computers will be needed to crack them. The chance of any file having an illegal hash is infinitesimally small.
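The sizes above can be read straight out of `hashlib`. The sketch below also prints the odds of a single file landing on one specific hash, on the assumption (mine, for the sake of the arithmetic) that the digests behave like uniformly random bits.

```python
import hashlib

# Digest size in bits for each algorithm mentioned in this post.
sizes = {
    name: hashlib.new(name).digest_size * 8
    for name in ("md5", "sha1", "sha512")
}

for name, bits in sizes.items():
    characters = bits // 8  # one 8-bit character per byte
    # Assuming uniform output, one file hits one target 1 time in 2**bits.
    print(f"{name}: {bits} bits, {characters} characters, 1 in {2 ** bits:.3e}")
```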
Nevertheless, it intrigues me that there may be a form of hash-steganography. How would you detect whether the hash of a file was problematic?
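One naive starting point would be to check whether a digest happens to consist entirely of printable ASCII, and then have a human (or a blocklist) look at whatever text falls out. This is only a sketch under that assumption. The names `digest_as_text` and `readable_hash` are hypothetical, and it would miss anything hidden in another encoding.

```python
import hashlib
from typing import Optional


def digest_as_text(digest: bytes) -> Optional[str]:
    """Return the digest as text if every byte is printable ASCII."""
    if all(0x20 <= byte <= 0x7E for byte in digest):
        return digest.decode("ascii")
    return None


def readable_hash(data: bytes, algorithm: str = "md5") -> Optional[str]:
    """Hash the data and return the digest as text, if it is readable."""
    return digest_as_text(hashlib.new(algorithm, data).digest())


# An ordinary input almost always gives an unreadable digest.
print(readable_hash(b"an unremarkable file"))

# A digest which happened to be all printable characters would be flagged.
print(digest_as_text(b"I hate the queen"))
```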
Christian Lawson-Perfect said on mathstodon.xyz:
@Edent oh, that's a brilliant idea! A person could even put out an album of a dozen or so innocuous MP3 files whose MD5 hashes would concatenate to a long illegal string. I'm not sure this would be as much of a technicality to avoid prosecution as you'd like it to be - as easy as collisions are to deliberately find, they're still very hard to stumble across, so you'd have to go a long way to make it look accidental
iucounu said on mastodon.social:
@Edent I seem to remember this was likely the case with criminal.jpg: an innocuous doodle that would get your account instantly locked if you posted it on Twitter. The hypothesis was, it had the same hash as a banned image, and I guess Twitter checks image hashes?
Lawrence Akka KC said on mastodon.me.uk:
@christianp @Edent Not sure I follow. I guess the hypothetical law might say that stating "I hate the Queen" would be illegal. Publishing the hash would not be "stating" IHTQ. So there would be no offence. Much would turn on the wording of the actual law.
Lawrence Akka KC said on mastodon.me.uk:
@christianp @Edent If everyone started publishing the hash, and it became sufficiently known that it represented the forbidden text, then a judge would have to decide whether publishing the hash was sufficiently equivalent to 'stating' IHTQ, such that an offence had been committed.
Christian Lawson-Perfect said on mathstodon.xyz:
@law @Edent that's what I was thinking of: if it was well-known that the hash could be computed to produce the illegal string, then distributing the original file would be considered equivalent to distributing the illegal string. But I know nothing about the law, so it'd be the height of arrogance not to defer to you!
Alan says:
Not a lawyer and this is for entertainment only, but in the US I believe for it to be a crime it would require evidence of intent to use those bits for said illegal purposes. Simply having the bits is not enough to constitute a crime, since there are valid legal reasons to have those bits. More philosophically, information requires a context of interpretation and without an implied context of usage it has no meaning. The word "Beans" may have a specific precise meaning in one context, and an entirely different meaning in a different context.
Dan Q says:
Moreover, there exist theoretical files whereby hashing the file contents plus some specific part of the file metadata results in an illegal number. E.g. MD5( contents + filename ) => illegal number.
Suppose the file contents express, in a human-readable way (e.g. a document), the formula to use to generate a number... the file might not lead to an illegal number with the filename it has, but might lead to one if the file is renamed.
Is it acceptable to knowingly rename the file?
HackerNewsTop10 said on twitter.com:
Illegal Hashes Link: shkspr.mobi/blog/2022/11/i… Comments: news.ycombinator.com/item?id=337729…