Lazy way to cause SHA-256 collisions for lazy evaluators
Humans are lazy. That's why we have computers; to do the boring work for us.
I recently downloaded a file. The website said the file should have a SHA-256 hash of: ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb
So I ran sha256 filename
on my machine. And then lazily compared the hashes. By which I mean "Yeah the first few characters match, as do the last few. It's probably fine."
Stupid lazy humans.
It's pretty easy to demonstrate that you can take a string, generate a hash, and then create a different string which has a hash with the same first character as the original.
This toy example starts with a string "a" and then creates a different string "b". It then pads the "b" string with spaces until the first characters of both hashes match:
PHP$s1 = "a";
$h1 = hash("sha256", $s1);
$first = substr($h1, 0, 1);
echo $first . "\n";
$s2 = "b";
for ($x=0; $x<100; $x++) {
$s2 .= " ";
$h2 = hash("sha256", $s2);
if ( $first == substr($h2,0,1) ) {
echo "{$s2}.{$h2}\n";
}
}
Assuming SHA-256 is randomly distributed, and there are 16 possible values for each character, this should produce ~6.25 matching hashes.
And, indeed it does! Four hashes appear which all have the same first character.
What about generating a "collision" of the first and last characters? 16x16 = 256. So that should be doable on modest hardware.
PHP$s1 = "a";
$h1 = hash("sha256", $s1);
$first = substr($h1, 0, 1);
$last = substr($h1, -1);
echo "{$first}-{$last}\n";
$s2 = "b";
for ($x=0; $x<2000; $x++) {
$s2 .= " ";
$h2 = hash("sha256", $s2);
if ( $first == substr($h2,0,1) && $last == substr($h2, -1) ) {
echo "{$s2}.{$h2}\n";
}
}
That spits out a bunch of strings where the first and last characters of the hash matches the first and last of the original hash.
Of course, from there it gets exponentially harder. If you want to match 2 characters at each end, you'll need around 65,536 attempts. As it happens, it takes 74,653 attempts before a "collision" of the first two and last two characters are found.
Of course, padding with spaces makes the string significantly longer. So you could pad with incrementing Unicode characters. Depending on the contents of the message, you could fill it with invisible characters which wouldn't be seen by an ordinary user.
Is this a practical way to generate full-length hashes? No, obviously not. It goes some way to show just how difficult it is to force even a partial collision.
There is a brilliant explanation of this problem space on David Buchanan's blog.
Rick said on mastodon.social:
@Edent hashing is so cool. Great post!
Julian Fietkau said on fietkau.social:
@Edent There's a topic in the HCI & security field overlap about hash visualization, saying that software may want to offer deterministic visualizations of hashes so people can perceive and compare them more holistically. The quintessential work is Perrig & Song (1999): https://users.ece.cmu.edu/~adrian/projects/validation/validation.pdf
A few years ago I got together with a colleague to survey the landscape and propose our own algorithm: https://fietkau.science/hash_visualization_password_validation See slide 8 for more prior work examples.
Fun stuff! 😀
stephen said on mastodon.online:
@Edent i'm a lazy human so I made a shasum command to calculate each of the popular sha*sum hashes. this ways I don't even need to read the fine print of which sha*sum command to run. I guess I should make it better by accepting the hash on the command line.
cesarb said on mastodon.social:
@Edent I recall this being a minor plot point in a hacking fiction story I read somewhere a long time ago (unfortunately, I can't recall where I read it; it was so long ago that the hash was probably still MD5). IIRC, the protagonist fell for that trick (the correct and fake hash had equal start and end, only differing in the middle bytes) and got hacked. Since I read that story, I've always compared not only the beginning and the end of hashes, but also a sequence somewhere in the middle.
More comments on Mastodon.