Illegal Hashes


To understand this blog post, you need to know two things.

  1. There exists a class of numbers which are illegal in some jurisdictions. For example, a number may represent copyrighted content, a decryption key, or other text considered illegal.
  2. There exists a class of algorithms which will take any arbitrary data and produce a fixed-length output from it. This process is known as "hashing". These algorithms are deterministic - that is, feeding in the same data will always produce the same hash.

Let's take the MD5 hashing algorithm. Feed it any data and it will produce a hash with a fixed length of 128 bits. If every 8 bits is read as one character, that's room for 16 characters of text.
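Here's a quick way to see both properties, fixed length and determinism, using Python's standard `hashlib` module. It's a minimal sketch; `md5_digest` is just a helper name made up for this example.

```python
import hashlib


def md5_digest(data: bytes) -> bytes:
    """Return the raw 16-byte MD5 digest of `data`."""
    return hashlib.md5(data).digest()


small = md5_digest(b"hello")
large = md5_digest(b"x" * 1_000_000)  # 1 MB of input

# Whatever the size of the input, the output is always 128 bits (16 bytes).
print(len(small) * 8, len(large) * 8)  # 128 128

# Hashing is deterministic: the same input always gives the same digest.
print(md5_digest(b"hello") == small)  # True
print(small.hex())  # 5d41402abc4b2a76b9719d911017c592
```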

Suppose you live in a country with Lèse-majesté - laws which make it treasonous to insult or threaten the monarch.

There exists a seemingly innocent piece of data - an image, an MP3, a text file - which when fed to MD5 produces these 128 bits:

01001001 00100000 01101000 01100001
01110100 01100101 00100000 01110100
01101000 01100101 00100000 01110001
01110101 01100101 01100101 01101110

Decoded into ASCII, that spells "I hate the queen".
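If you want to check the decoding yourself, a few lines of Python will do it. This sketch assumes the bits are written as space-separated 8-bit groups, as above; `bits_to_text` is a made-up helper name. It also prints the same 16 bytes as hex, which is how MD5 tools usually display a hash.

```python
# The 128 bits from the example above, as sixteen 8-bit groups.
bits = """
01001001 00100000 01101000 01100001
01110100 01100101 00100000 01110100
01101000 01100101 00100000 01110001
01110101 01100101 01100101 01101110
"""


def bits_to_text(bit_string: str) -> str:
    """Decode whitespace-separated 8-bit groups into ASCII text."""
    groups = bit_string.split()
    return bytes(int(group, 2) for group in groups).decode("ascii")


message = bits_to_text(bits)
print(message)  # I hate the queen

# The same bytes written the way an MD5 tool would print them.
print(message.encode("ascii").hex())  # 4920686174652074686520717565656e
```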

128 bits is probably too short to be illegal in all but the most repressive of regimes. It would be hard, if not impossible, to squeeze terrorist plans into that little space.

But it is just enough space to store an encryption key for copyrighted material.

Therefore, it is possible that there exists a file which - by pure coincidence - happens to have an MD5 hash which is illegal.

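Because MD5 is a relatively weak algorithm, it is cheap to create deliberate hash "collisions". That is, to craft two different pieces of data which share the same MD5. Forcing a file to match one specific, pre-chosen hash (a "preimage" attack) is a much harder problem, and no practical way of doing it for MD5 is publicly known.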

Someone could, theoretically, deliberately create a file which looks unremarkable when viewed, but is illegal when hashed.
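To get a feel for why hitting a specific hash is so much harder than finding any old collision, here's a brute-force sketch. It is not the technique used in real MD5 collision attacks; it simply appends a counter to some data until the MD5 starts with a chosen prefix. `find_prefix_match` and the sample input are invented for illustration, and the sketch assumes each digest behaves like a random value.

```python
import hashlib
from itertools import count


def find_prefix_match(base: bytes, target_prefix: bytes) -> tuple[int, bytes]:
    """Append an increasing counter to `base` until its MD5 digest starts
    with `target_prefix`. Returns (attempts, winning_input)."""
    for attempts in count(1):
        candidate = base + str(attempts).encode()
        if hashlib.md5(candidate).digest().startswith(target_prefix):
            return attempts, candidate


# Matching just the first two bytes ("I ") takes about 256**2 = 65,536 tries
# on average; every extra byte multiplies the expected work by 256.
attempts, winner = find_prefix_match(b"innocent holiday photo ", b"I ")
print(attempts, winner)
print(hashlib.md5(winner).digest()[:2])  # b'I '

# Expected tries to match all 16 bytes of a chosen MD5 this way.
print(f"{256**16:.3e}")  # 3.403e+38
```

Matching all 16 bytes by brute force would take around 2^128 attempts, which is why a deliberately illegal MD5 remains theoretical.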

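The SHA-1 hashing algorithm produces 160 bits - 20 ASCII characters. With a lot of computing power, it is now feasible to produce two different files which share the same SHA-1 hash. Producing a file with one specific SHA-1 hash is still out of reach.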

The SHA-512 hashing algorithm, as its name suggests, produces a 512-bit hash. That's enough space for 64 ASCII characters. Is that long enough to contain text which is blatantly illegal? Almost certainly. But modern hashing algorithms are designed to be resistant to collision and preimage attacks. So much so that it seems like theoretical quantum computers would be needed to crack them. The chance of any file having an illegal hash is infinitesimally small.
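The rough odds can be worked out with a little arithmetic. This sketch assumes every byte of a digest is uniformly random, which is what a good hash function aims for. For each algorithm mentioned here, it prints the digest size, the odds of matching one exact value, and the odds that a digest happens to consist entirely of printable ASCII (95 characters, from space to tilde).

```python
import hashlib

PRINTABLE = 95  # printable ASCII characters, 0x20 (space) to 0x7E (~)

for name in ("md5", "sha1", "sha512"):
    size = hashlib.new(name).digest_size  # digest length in bytes
    bits = size * 8
    # Assuming each output byte is uniformly random, the chance that a
    # digest is made only of printable ASCII characters.
    p_printable = (PRINTABLE / 256) ** size
    print(f"{name:>6}: {bits} bits = {size} characters, "
          f"exact match 1 in 2**{bits}, "
          f"all printable ~{p_printable:.1e}")
```

For SHA-512 the "all printable" figure comes out at roughly 3 in 10^28, before you even ask for the characters to spell anything in particular.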

Nevertheless, it intrigues me that there may be a form of hash-steganography. How would you detect whether the hash of a file was problematic?
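One possible approach is a simple heuristic: hash the file with several algorithms and flag any digest which is on a blocklist, or which is suspiciously full of printable characters. This is only a sketch. The `BLOCKLIST` contents, the 90% `threshold`, and the function names are all hypothetical, and a real list of "illegal numbers" would depend entirely on the jurisdiction.

```python
import hashlib
import string
import tempfile
from pathlib import Path

# Hypothetical blocklist of raw digests deemed problematic. The single
# entry is this post's "I hate the queen" example value.
BLOCKLIST = {bytes.fromhex("4920686174652074686520717565656e")}

# Byte values Python considers printable (letters, digits, punctuation, whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def printable_ratio(digest: bytes) -> float:
    """Fraction of the digest's bytes that are printable ASCII."""
    return sum(b in PRINTABLE for b in digest) / len(digest)


def check_file(path: Path, threshold: float = 0.9) -> list[str]:
    """List reasons why this file's hashes might deserve a human look.

    An empty list means nothing was flagged; a non-empty one is a hint,
    not proof that anything is illegal.
    """
    data = path.read_bytes()
    reasons = []
    for name in ("md5", "sha1", "sha512"):
        digest = hashlib.new(name, data).digest()
        if digest in BLOCKLIST:
            reasons.append(f"{name} digest is on the blocklist")
        elif printable_ratio(digest) >= threshold:
            reasons.append(f"{name} digest reads as text: {digest!r}")
    return reasons


# Usage: an ordinary file should normally produce an empty list.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "holiday.jpg"
    sample.write_bytes(b"an unremarkable file")
    result = check_file(sample)
    print(result)
```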



9 thoughts on “Illegal Hashes”

  1. said on mathstodon.xyz:

    @Edent oh, that's a brilliant idea! A person could even put out an album of a dozen or so innocuous MP3 files whose MD5 hashes would concatenate to a long illegal string. I'm not sure this would be as much of a technicality to avoid prosecution as you'd like it to be - as easy as collisions are to deliberately find, they're still very hard to stumble across, so you'd have to go a long way to make it look accidental.

  2. Alan says:

    Not a lawyer and this is for entertainment only, but in the US I believe for it to be a crime it would require evidence of intent to use those bits for said illegal purposes. Simply having the bits is not enough to constitute a crime, since there are valid legal reasons to have those bits. More philosophically, information requires a context of interpretation and without an implied context of usage it has no meaning. The word "Beans" may have a specific precise meaning in one context, and an entirely different meaning in a different context.

  3. says:

    Moreover, there exist theoretical files whereby hashing the file contents plus some specific part of the file metadata results in an illegal number. E.g. MD5( contents + filename ) => illegal number.

    Suppose the file contents express, in a human-readable way (e.g. a document), the formula to use to generate a number... the file might not lead to an illegal number with the filename it has, but might lead to one if the file is renamed.

    Is it acceptable to knowingly rename the file?

