Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Was just verifying your tweet's hash, and then...omg!!! I couldn't believe what I realised. The SHA256 of THIS tweet starts with exactly the same 7 characters as your tweet's hash. What are the chances of that?

As always, the real WTF is in the comments.



It's not too difficult. All you need is to generate many variations of potential words and check whether the sha256 hash matches the wanted leading characters. For example, check this text.


    $ echo -n $'It\'s not too difficult. All you need is to generate many variations of potential words and check whether the sha256 hash matches the wanted leading characters. For example, check this text.' | sha256sum
    182a7c9d2e99162688aaaf3f97638edd7d06f8d295e456c7bb1f16abf3a8f70c  -


Isn't this kind of the same thing that crypto miners do?


Yes except Bitcoin does double SHA (a second round of SHA on the first result)


Well played.

More seriously, is there a really good, open source library that generates many slight variations of an input block of text?


Llama?


can we get this guy to pass a captcha? i've got questions


    $ echo -n $'Was just verifying your tweet\'s hash, and then...omg!!! I couldn\'t believe what I realised. The SHA256 of THIS tweet starts with exactly the same 7 characters as your tweet\'s hash. What are the chances of that?' | sha256sum
    182a7c9c08b2f0f9333bf23828c5fbf47addf74e815b6a22ca10825450bc2ee1  -
Checks out(!)

Source: https://twitter.com/benoconnor/status/1701057433131421935


I'm confused as to how they've done this. The original message, sure, you can brute force the digits and hope you get a collision, and try a new, plausible preamble if not. But I don't see how they've found this collision without it looking like anything is brute forced.

Is it using substitute Unicode characters or something?

E: no, just hand typed it and got the same...


Presumably generated lots of variations of that sentence. It's 28 bits, so you only need to have around 14 places where 4 variations are possible.

For instance, it could start with "Was", "I was", "verifying" could be "checking" or "computing" or "testing", etc.

A bit tight, but it seems feasible with some work.


Indeed, this is how I did it. 2^28 is around 270 million. With little more than a handful options, it should be possible. Although I have to say, it turned out to be more difficult than I'd initially thought. Maybe it's better to think of it as 28 different boolean choices.

  echo -n "Indeed. This is how I managed to do it. 2^28 is around 300 million. With only a handful options, it's possible. Although, I must say that it turned out to be more difficult than I'd initially thought. Perhaps it's better to think of it as 28 different alternatives." | sha256sum


This is crazy! The hash of this comment once again begins 182a7c9!!! Well done indeed!


Throwing a loop of numbers on the end seems far more feasible. I put together a hasty Python implementation which can immediately find three hits at 5 characters. TQDM is reporting ~450k tries per second, so depending on how lucky you are, would probably want to redo in a faster language to solve for 7+

    import hashlib
    import itertools
    
    import tqdm
    
    BLOCKS_SIZE = 5
    sentence_prefix = "The SHA256 for this sentence begins with:"
    
    itos = {0x00:"zero", 0x01:"one", 0x02:"two", 0x03:"three", 0x04:"four", 0x05:"five", 0x06:"six", 0x07:"seven", 0x08:"eight", 0x09:"nine",
           0x0a:"a", 0x0b:"b", 0x0c:"c", 0x0d:"d", 0x0e:"e", 0x0f:"f"}
    for nums in tqdm.tqdm(itertools.product(itos.keys(), repeat=BLOCKS_SIZE)):
        sentence = f"{sentence_prefix} {', '.join(itos[num] for num in nums[:-1])}, and {itos[nums[-1]]}."
        hash_true = hashlib.sha256(bytes(sentence, "utf8")).hexdigest()
        guessed_prefix = "".join(f"{n:x}" for n in nums)
        true_prefix = hash_true[:BLOCKS_SIZE]
        if guessed_prefix == true_prefix:
            print("collision")
            print(sentence)
            print(hash_true)


All we are demonstrating here is why Sha256 is 256 bits and not 32 bits. We have trivially identified collisions for the first 28 bits of the output, which is only 11% of the entire hash size.

Difficulty of collisions roughly doubles for each additional bit. Imagine we had a SHA32, that would be 16 times harder to achieve a collision. SHA256 is 43 with 67 zeroes behind it more difficult than the examples here.


Yeah, I forgot how many bits were in a hex digit and made it seem much harder than it really was to myself.


I also mess up base16 and either base 256 or base64 when doing Feynman estimates


7 digits of the hash makes for 16*7 possible hashes. I spot 4 potential "filler" lines in that tweet, so if you find log4(16*7)=14 candidates for each of those filler lines, then one combination would be expected to yield that hash.


I just did an extremely lazy version of that and posted a reply of "And the SHA-1 digest (in hex) of this tweet starts BEEF" - https://twitter.com/cooperx86/status/1701261047917633846

Basically I had several substitutions around words, case, punctuation, etc. and just ran it until it found some hits. Quite easy with just four characters though but was only a proof of concept.


It's possible that the tweets were actually produced together somehow. This might buy just enough search space between the two of them.


You can generate sentences of the form "This sentence begins with: " followed by seven comma-separated english words denoting hex digits. Then search that space of digits, until you get a hit.

For each digit combination, you can try it with multiple variations of the sentence like "The SHA256 of this sentence begins with", "The SHA256 hash of this text starts with" and many more. That increases the search space without increasing the number of digits that have to match, making it more likely that a hit is found.


I think this is a really good usecase for chatGPT to generate a massive number of variations that you then feed into a validation function


What are the chances of that? From the perspective of one specific sentence: one in 7 digits of hex: 16 * 7 = 268,435,456.


If we don't vary the sentence ("begins with, starts with, is prefixed with" etc...) and only consider the final part, then the number of english sentences is (unsurprisingly) equal to the number of possible hashes.

So for each distinct sentence, we pick randomly from the n different hashes, and we attempt to brute force this n times. So the probability of not getting this is:

(1-1/n)^n

As n increases, this will yield e^-1, so we have about a 36.7% chance of this not happening for any given length. So this has a probability of happening of 63.3%.

So there is a decent chance that there exists a sentence "The sha256 of this sentence is ..." for even the full sha256. If you are allowed to modify the sentence to be something like "'Begins with', 'starts with', 'OMG guys, check this out':" you can get this up to almost 1. Finding it would be mildly hard though barring some novel discovery about sha256.


Fyi your comment was dead on arrival, and seems many of your past ones are as well. I'd email dang and ask if you've tripped some spam filter.


What's the difference between "this sentence" and "this tweet" in this context?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: