Illegal Hashes
To understand this blog post, you need to know two things.
- There exists a class of numbers which are illegal in some jurisdictions. For example, a number may be copyrighted content, a decryption key, or other text considered illegal.
- There exists a class of algorithms which will take any arbitrary data and produce a fixed length text from it. This process is known as "hashing". These algorithms are deterministic - that is, entering the same data will always produce the same hash.
Let's take the MD5 hashing algorithm. Feed it any data and it will produce hash with a fixed length of 128 bits. Using an 8 bit alphabet, that's 16 human-readable characters.
Suppose you live in a country with Lèse-majesté - laws which make it treasonous to insult or threaten the monarch.
There exists a seemingly innocent piece of data - an image, an MP3, a text file - which when fed to MD5 produces these 128 bits:
01001001 00100000 01101000 01100001
01110100 01100101 00100000 01110100
01101000 01100101 00100000 01110001
01110101 01100101 01100101 01101110
Decoded into ASCII, that spells I hate the queen
.
128 bits is probably too short to be illegal in all but the most repressive of regimes. It would be hard, if not impossible, to squeeze terrorist plans into that little space.
But it is just enough space to store an encryption key for copyrighted material.
Therefore, it is possible that there exists a file which - by pure coincidence - happens to have an MD5 hash which is illegal.
Because MD5 is a relatively weak algorithm, it is possible to create deliberate hash "collisions". That is, take some data and manipulate it until it has the same MD5 as a different piece of data.
Someone could, theoretically, deliberately create a file which looks unremarkable when viewed, but is illegal when hashed.
The SHA-1 hashing algorithm produces 160 bits - 20 ASCII characters. It is somewhat cheap and easy to produce a file with a specific SHA-1 hash.
The SHA-512 hashing algorithm, as its name suggests, produces a 512 bit hash. That's enough space for 64 ASCII characters. Is that long enough to contain text which is blatantly illegal? Almost certainly. But modern hashing algorithms are designed to be resistant to collision attacks. So much so that it seems like theoretical quantum computers will be needed to crack them. The chances of any file having an illegal hash is infinitesimally small.
Nevertheless, it intrigues me that there may be a form of hash-steganography. How would you detect whether the hash of a file was problematic?
Alan says:
More comments on Mastodon.