> Was just verifying your tweet's hash, and then...omg!!! I couldn't believe wha...

johndough · on Sept 11, 2023

It's not too difficult. All you need is to generate many variations of potential words and check whether the sha256 hash matches the wanted leading characters. For example, check this text.

isp · on Sept 11, 2023

    $ echo -n $'It\'s not too difficult. All you need is to generate many variations of potential words and check whether the sha256 hash matches the wanted leading characters. For example, check this text.' | sha256sum
    182a7c9d2e99162688aaaf3f97638edd7d06f8d295e456c7bb1f16abf3a8f70c  -

wppick · on Sept 11, 2023

Isn't this kind of the same thing that crypto miners do?

acchow · on Sept 11, 2023

Yes except Bitcoin does double SHA (a second round of SHA on the first result)

CobrastanJorji · on Sept 11, 2023

Well played.

More seriously, is there a really good, open source library that generates many slight variations of an input block of text?

TylerE · on Sept 11, 2023

Llama?

stainablesteel · on Sept 11, 2023

can we get this guy to pass a captcha? i've got questions

isp · on Sept 11, 2023

    $ echo -n $'Was just verifying your tweet\'s hash, and then...omg!!! I couldn\'t believe what I realised. The SHA256 of THIS tweet starts with exactly the same 7 characters as your tweet\'s hash. What are the chances of that?' | sha256sum
    182a7c9c08b2f0f9333bf23828c5fbf47addf74e815b6a22ca10825450bc2ee1  -

Checks out(!)

Source: https://twitter.com/benoconnor/status/1701057433131421935

rawling · on Sept 11, 2023

I'm confused as to how they've done this. The original message, sure, you can brute force the digits and hope you get a collision, and try a new, plausible preamble if not. But I don't see how they've found this collision without it looking like anything is brute forced.

Is it using substitute Unicode characters or something?

E: no, just hand typed it and got the same...

devit · on Sept 11, 2023

Presumably generated lots of variations of that sentence. It's 28 bits, so you only need to have around 14 places where 4 variations are possible.

For instance, it could start with "Was", "I was", "verifying" could be "checking" or "computing" or "testing", etc.

A bit tight, but it seems feasible with some work.

aib · on Sept 11, 2023

Indeed, this is how I did it. 2^28 is around 270 million. With little more than a handful options, it should be possible. Although I have to say, it turned out to be more difficult than I'd initially thought. Maybe it's better to think of it as 28 different boolean choices.

  echo -n "Indeed. This is how I managed to do it. 2^28 is around 300 million. With only a handful options, it's possible. Although, I must say that it turned out to be more difficult than I'd initially thought. Perhaps it's better to think of it as 28 different alternatives." | sha256sum

Alextigtig · on Sept 12, 2023

This is crazy! The hash of this comment once again begins 182a7c9!!! Well done indeed!

fbdab103 · on Sept 11, 2023

Throwing a loop of numbers on the end seems far more feasible. I put together a hasty Python implementation which can immediately find three hits at 5 characters. TQDM is reporting ~450k tries per second, so depending on how lucky you are, would probably want to redo in a faster language to solve for 7+

    import hashlib
    import itertools
    
    import tqdm
    
    BLOCKS_SIZE = 5
    sentence_prefix = "The SHA256 for this sentence begins with:"
    
    itos = {0x00:"zero", 0x01:"one", 0x02:"two", 0x03:"three", 0x04:"four", 0x05:"five", 0x06:"six", 0x07:"seven", 0x08:"eight", 0x09:"nine",
           0x0a:"a", 0x0b:"b", 0x0c:"c", 0x0d:"d", 0x0e:"e", 0x0f:"f"}
    for nums in tqdm.tqdm(itertools.product(itos.keys(), repeat=BLOCKS_SIZE)):
        sentence = f"{sentence_prefix} {', '.join(itos[num] for num in nums[:-1])}, and {itos[nums[-1]]}."
        hash_true = hashlib.sha256(bytes(sentence, "utf8")).hexdigest()
        guessed_prefix = "".join(f"{n:x}" for n in nums)
        true_prefix = hash_true[:BLOCKS_SIZE]
        if guessed_prefix == true_prefix:
            print("collision")
            print(sentence)
            print(hash_true)

hinkley · on Sept 11, 2023

All we are demonstrating here is why Sha256 is 256 bits and not 32 bits. We have trivially identified collisions for the first 28 bits of the output, which is only 11% of the entire hash size.

Difficulty of collisions roughly doubles for each additional bit. Imagine we had a SHA32, that would be 16 times harder to achieve a collision. SHA256 is 43 with 67 zeroes behind it more difficult than the examples here.

rawling · on Sept 11, 2023

Yeah, I forgot how many bits were in a hex digit and made it seem much harder than it really was to myself.

hughesjj · on Sept 12, 2023

I also mess up base16 and either base 256 or base64 when doing Feynman estimates

noctune · on Sept 11, 2023

7 digits of the hash makes for 16*7 possible hashes. I spot 4 potential "filler" lines in that tweet, so if you find log4(16*7)=14 candidates for each of those filler lines, then one combination would be expected to yield that hash.

petercooper · on Sept 11, 2023

I just did an extremely lazy version of that and posted a reply of "And the SHA-1 digest (in hex) of this tweet starts BEEF" - https://twitter.com/cooperx86/status/1701261047917633846

Basically I had several substitutions around words, case, punctuation, etc. and just ran it until it found some hits. Quite easy with just four characters though but was only a proof of concept.

nstbayless · on Sept 11, 2023

It's possible that the tweets were actually produced together somehow. This might buy just enough search space between the two of them.

kazinator · on Sept 11, 2023

You can generate sentences of the form "This sentence begins with: " followed by seven comma-separated english words denoting hex digits. Then search that space of digits, until you get a hit.

For each digit combination, you can try it with multiple variations of the sentence like "The SHA256 of this sentence begins with", "The SHA256 hash of this text starts with" and many more. That increases the search space without increasing the number of digits that have to match, making it more likely that a hit is found.

JamesSwift · on Sept 11, 2023

I think this is a really good usecase for chatGPT to generate a massive number of variations that you then feed into a validation function

kazinator · on Sept 11, 2023

What are the chances of that? From the perspective of one specific sentence: one in 7 digits of hex: 16 * 7 = 268,435,456.

archgoon · on Sept 11, 2023

If we don't vary the sentence ("begins with, starts with, is prefixed with" etc...) and only consider the final part, then the number of english sentences is (unsurprisingly) equal to the number of possible hashes.

So for each distinct sentence, we pick randomly from the n different hashes, and we attempt to brute force this n times. So the probability of not getting this is:

(1-1/n)^n

As n increases, this will yield e^-1, so we have about a 36.7% chance of this not happening for any given length. So this has a probability of happening of 63.3%.

So there is a decent chance that there exists a sentence "The sha256 of this sentence is ..." for even the full sha256. If you are allowed to modify the sentence to be something like "'Begins with', 'starts with', 'OMG guys, check this out':" you can get this up to almost 1. Finding it would be mildly hard though barring some novel discovery about sha256.

krackers · on Sept 11, 2023

Fyi your comment was dead on arrival, and seems many of your past ones are as well. I'd email dang and ask if you've tripped some spam filter.

mcmoor · on Sept 12, 2023

What's the difference between "this sentence" and "this tweet" in this context?