Lazy way to cause SHA-256 collisions for lazy evaluators
Humans are lazy. That's why we have computers; to do the boring work for us.
I recently downloaded a file. The website said the file should have a SHA-256 hash of:
ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb
So I ran sha256 filename
on my machine. And then lazily compared the hashes. By which I mean "Yeah the first few characters match, as do the last few. It's probably fine."
Stupid lazy humans.
It's pretty easy to demonstrate that you can take a string, generate a hash, and then create a different string which has a hash with the same first character as the original.
This toy example starts with a string "a" and then creates a different string "b". It then pads the "b" string with spaces until the first characters of both hashes match:
PHP
$s1 = "a";
$h1 = hash("sha256", $s1);
$first = substr($h1, 0, 1);
echo $first . "\n";
$s2 = "b";
for ($x=0; $x<100; $x++) {
$s2 .= " ";
$h2 = hash("sha256", $s2);
if ( $first == substr($h2,0,1) ) {
echo "{$s2}.{$h2}\n";
}
}
Assuming SHA-256 is randomly distributed, and there are 16 possible values for each character, this should produce ~6.25 matching hashes.
And, indeed it does! Four hashes appear which all have the same first character.
What about generating a "collision" of the first and last characters? 16x16 = 256. So that should be doable on modest hardware.
PHP
$s1 = "a";
$h1 = hash("sha256", $s1);
$first = substr($h1, 0, 1);
$last = substr($h1, -1);
echo "{$first}-{$last}\n";
$s2 = "b";
for ($x=0; $x<2000; $x++) {
$s2 .= " ";
$h2 = hash("sha256", $s2);
if ( $first == substr($h2,0,1) && $last == substr($h2, -1) ) {
echo "{$s2}.{$h2}\n";
}
}
That spits out a bunch of strings where the first and last characters of the hash matches the first and last of the original hash.
Of course, from there it gets exponentially harder. If you want to match 2 characters at each end, you'll need around 65,536 attempts. As it happens, it takes 74,653 attempts before a "collision" of the first two and last two characters are found.
Of course, padding with spaces makes the string significantly longer. So you could pad with incrementing Unicode characters. Depending on the contents of the message, you could fill it with invisible characters which wouldn't be seen by an ordinary user.
Is this a practical way to generate full-length hashes? No, obviously not. It goes some way to show just how difficult it is to force even a partial collision.
There is a brilliant explanation of this problem space on David Buchanan's blog.
More comments on Mastodon.