SHA-0 Broken, MD5 Rumored Broken 707
An anonymous reader writes "Exciting advances in breaking hash functions this week at the CRYPTO conference. SHA-0 has
definitely been broken (collision found in the full function). Rumors are that at the informal rump session, a researcher will announce a collision in full MD5 and RIPEMD-128. And Ed Felten is speculating about collisions in SHA-1! Many systems, especially those that use cryptography for digital signatures, are most at risk here."
md5 is so weak (Score:5, Informative)
Isn't this Inevitable? (Score:4, Informative)
Kind of expected this (Score:5, Informative)
I have been expecting the MD5 break for a while; it just isn't a secure hash anymore. SHA-0 was shown to be mathematically weak back in '98, so there was no real need to brute-force it. I highly doubt that SHA-1 was cracked. If it was, we are in trouble: is there any better hash to replace it? I figured we would get quite a few more years out of SHA-1.
What we really need is a mathematically strong hash that lets the user define its strength. For example, the first byte of the hash could tell the program how strong the hash is. As the strength byte increases, the mandatory execution time of the hash increases exponentially.
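Something like this tunable work factor exists in practice as iterated ("stretched") hashing. Here is a toy Python sketch of the idea only, not any standard algorithm: a leading cost byte doubles the mandatory work per increment.

```python
import hashlib

def stretched_hash(data: bytes, cost: int) -> bytes:
    # Toy tunable-strength hash (NOT a standard construction):
    # iterate SHA-1 2**cost times, so each +1 on the cost byte
    # doubles the mandatory work. The first byte records the strength.
    digest = hashlib.sha1(data).digest()
    for _ in range(2 ** cost):
        digest = hashlib.sha1(digest).digest()
    return bytes([cost]) + digest

h = stretched_hash(b"password", cost=10)   # 1024 extra rounds
assert h[0] == 10 and len(h) == 1 + 20     # cost byte + SHA-1 digest
```

Real-world key-derivation functions like PBKDF2 follow the same principle with a tunable iteration count.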
Re:Consequences? (Score:1, Informative)
It's in the attached articles there. MD5 and others are "hash" algorithms. They generate a small "digest" (128 bits, in MD5's case) from a much larger data set. So if you run an 85-meg JPEG through MD5, you'll get a (hopefully) unique 128-bit number. The goal is to have a hash algorithm that makes it impossible for anyone to "decode" the digest back into the 85-meg JPEG it came from.
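A quick illustration with Python's standard hashlib: whatever the input size, the MD5 digest is always a fixed 128 bits (32 hex characters).

```python
import hashlib

small = b"hello"
large = b"x" * 10_000_000   # ~10 MB of input

d1 = hashlib.md5(small).hexdigest()
d2 = hashlib.md5(large).hexdigest()

# The digest is always 128 bits = 16 bytes = 32 hex characters,
# no matter how big the input was.
assert len(d1) == 32 and len(d2) == 32
print(d1)   # 5d41402abc4b2a76b9719d911017c592
```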
So what's the big deal about finding a collision? The answer is this direct quote:
The finding of a single collision in SHA-1 would not, by itself, cause much trouble, since one arbitrary collision won't do an attacker much good in practice. But history tells us that such discoveries are usually followed by a series of bigger discoveries that widen the breach, to the point that the broken primitive becomes unusable.
In other words: When people find collisions (two different datasets that result in the exact same digest), then that is the first step towards being able to "reverse" the digest process, and extract the original data from the digest, thus rendering the encryption useless.
Re:Consequences? (Score:2, Informative)
Re:Don't the laws of computing make it... (Score:1, Informative)
OTP (the one-time pad) is totally unbreakable, but it is not very practical to use.
How is this news? (Score:5, Informative)
On a side note, I would recommend 'Practical Cryptography' to anyone interested in, or working with, modern cryptography.
Re:Airplane quotes (Score:4, Informative)
BTW, it wasn't that long ago that /. reported on the Feds finally formally dropping DES. [slashdot.org]
Collision != Broken (Score:3, Informative)
All they found here were two values C, D for which H(C) = H(D). Anyone who thinks this was surprising needs to take a look at the idea of a hash:
Start out with 512 bits (2^512 possible values)
Hash it to 128 bits (2^128 possible values)
For each possible value to come out of the hash, you're going to have (2^(512-128)) = 2^384 collisions.
If they could find two data values with less than, say, 80 bits of data that both hash to the same thing, then that's something, 'cause that could be someone's password. And if two values of 80-bit data hash to the same thing, then the algorithm is bad.
I just don't think this is anything special. It's sort of like saying, "Look, I found my password in the newspaper by circling various letters and numbers throughout the articles." Or maybe not; it's still just not that great.
Re:Consequences? (Score:5, Informative)
Re:Should We Fear? (Score:5, Informative)
Carefully crafted binary data can be made to have the same checksum.
This is not a generalised attack where I can create binary data to have a CHOSEN checksum.
Therefore, if you verify your downloads by checksum, I can't generate a fake download with the same checksum.
First step is MATCHING some checksums (this has been done)
The next step is CHOOSING the checksum (aka the DEADBEEF attack)
The next step is MANIPULATING, i.e. adding junk to a given binary file to let you choose the checksum. That's the scary one!
- substitute trojaned binary
- append some binary junk so the checksum matches
- profit!!!
Nothing to worry about yet, sort of like the first proof-of-concept brute force crack of DES.
Yes, it can be done under some circumstances.
Yes, eventually processing power and methods may improve to make this a valid attack
Yes, you can sleep soundly tonight.
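The download-verification defense being discussed can be sketched in a few lines of Python (the filename and chunk size here are arbitrary, purely for illustration):

```python
import hashlib, os, tempfile

def verify_download(path: str, expected_md5: str) -> bool:
    # Hash the file in chunks and compare against the published sum.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected_md5

# Demo with a throwaway file standing in for a downloaded binary.
fd, path = tempfile.mkstemp()
os.write(fd, b"pretend this is an ISO image")
os.close(fd)
published = hashlib.md5(b"pretend this is an ISO image").hexdigest()

assert verify_download(path, published)     # genuine file
assert not verify_download(path, "0" * 32)  # tampered / wrong sum
os.unlink(path)
```

This stays safe exactly as long as an attacker cannot produce a different file matching the published sum, which is the MANIPULATING step above.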
Re:Of course! (Score:3, Informative)
There are always going to be collisions in check-sums. If that weren't the case, then we wouldn't need to distribute actual files, just check-sums.
Um, wow. Let's start from the top.
1) Yes, there are always going to be collisions in check-sums. Fewer bits in the checksum than in the data, pigeon hole principle, yada yada.
2) Even if there were no collisions, the whole point (well, one of two... more on the second below) is that they're NOT REVERSIBLE. So perhaps just distributing collisionless hashes would be a good replacement for kazaa (it's all about collection, not actually listening, doncha know)... but for those who like to be able to use files, it's kinda missing the point.
3) And that point is, no KNOWN collisions. More importantly, no way, given a known data/hash pair, to generate another data string with the same hash. If you violate that principle, you've made the cryptosystem useless for ensuring that the data is the same data that was originally hashed. And that's exactly what this article is about... a technique (who knows how general yet?) to find collisions.
Re:Consequences? (Score:5, Informative)
However, it might be possible to construct a file to match a given hash. In that case, MD5 sums become worthless for checking whether a downloaded file is what independent sources claim it is.
The security implications are quite severe. Also, P2P clients that use MD5 to identify unique files will be vulnerable to spoofing by malicious users.
Re:Collision != Broken (Score:5, Informative)
The existence of collisions is obvious. The point is that it should be so phenomenally difficult to find a collision that you can't ever come up with one in a sane amount of time.
If they could find two data values with less than, say, 80 bits of data that both hash to the same thing, then that's something, 'cause that could be someone's password.
That would be unlikely to be a big deal. For passwords, I can just brute force the password, maybe precompute a dictionary of password hashes.
MD5 Rumored broken? (Score:2, Informative)
Re:Consequences? (Score:4, Informative)
The REAL problem here is that hashes are used to store passwords and other security sensitive data. It is still impossible to recover the original data, but if a method is known that produces collisions then it would be possible to find an equivalent data set to give the same resulting hash. In other words, you may not have the exact same key but you can make one of your own that will work anyway.
One possible (and temporary) solution would be to salt the data somehow. This adds an extra layer of security because the hash you are looking at is built from an unknown password AND an unknown salt which you (theoretically) have no access to. You can generate a data set that produces the same hash, but when you submit that data set it will be salted to generate a new hash that won't work. Think of it as a single math equation with two unknowns.
This fails, of course, if you manage to get a password and its corresponding hash. Then you only have one unknown (the salt).
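A minimal Python sketch of the salting scheme described above, assuming SHA-1 and a random per-system salt (a real system would use a per-user salt and a much slower hash):

```python
import hashlib, hmac, os

def hash_password(password: str, salt: bytes) -> str:
    # Mix the secret salt into the password before hashing, so a
    # collision found against the bare hash function is useless
    # without also knowing the salt.
    return hashlib.sha1(salt + password.encode()).hexdigest()

salt = os.urandom(16)                    # per-system secret
stored = hash_password("hunter2", salt)

# Verification repeats the same mixing with the same salt.
assert hmac.compare_digest(stored, hash_password("hunter2", salt))
# A different salt (or password) yields a different hash.
assert stored != hash_password("hunter2", os.urandom(16))
```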
=Smidge=
Re:Don't the laws of computing make it... (Score:5, Informative)
In regards to the strength of 256-bit encryption:
Now, the annual energy output of our sun is about 1.21 x 10^41 ergs. This is enough to power about 2.7 x 10^56 single bit changes on our ideal computer; enough state changes to put a 187-bit counter through all its values. If we built a Dyson sphere around the sun and captured all of its energy output for 32 years, without any loss, we could power a computer to count up to 2^192. Of course, it wouldn't have the energy left over to perform any useful calculations with this counter.
But that's just one star, and a measly one at that. A typical supernova releases something like 10^51 ergs. (About a hundred times as much energy would be released in the form of neutrinos, but let them go for now.) If all this energy could be channeled into a single orgy of computation, a 219-bit counter could be cycled through all of its states.
These numbers have nothing to do with the technology of the devices; they are the maximums that thermodynamics will allow. And they strongly imply that brute-force attacks against 256-bit keys will be infeasible until computers are built from something other than matter and occupy something other than space.
Bruce Schneier, Applied Cryptography, p. 158
Re:huh? (Score:3, Informative)
md5 [wikipedia.org]
ripemd [wikipedia.org]
digital signature [wikipedia.org]
cryptograph [wikipedia.org]
crypto conference [wikipedia.org]
Ed Felten [wikipedia.org]
-jim
Re:Umm... (Score:5, Informative)
The complexity of the attack was around 2^51.
SHA-0 is a 160-bit hash. Now, you might at first think that means it would take around 2^160 tries to get two strings that collided, but because of the birthday paradox (what's the probability that two people on a football field share a birthday?) it's actually supposed to be the square root of that, or 2^80.
Unfortunately, 2^51 is much, much less than 2^80. Which means that this crack was not just brute force (which is kinda redundant, since the definition of a crack to a cryptosystem is a violation like this without using brute force, but whatever). That means they've found a mathematical technique for making this 'easy'. Generally, once the idea is out there for an easy crack, people can gnaw on it and make it more general or easier, bringing this down to every-day-concern level.
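The birthday effect is easy to see on a deliberately weakened hash. This sketch truncates MD5 to 24 bits (an arbitrary width, chosen so the search finishes instantly); a collision turns up after roughly 2^12 random tries rather than 2^24:

```python
import hashlib, os

def truncated_md5(data: bytes, bits: int = 24) -> bytes:
    # Deliberately weakened hash: keep only the first `bits` bits of MD5.
    return hashlib.md5(data).digest()[: bits // 8]

seen = {}      # digest -> message that produced it
tries = 0
while True:
    m = os.urandom(8)
    d = truncated_md5(m)
    tries += 1
    if d in seen and seen[d] != m:
        break  # two different messages, same truncated digest
    seen[d] = m

assert m != seen[d] and truncated_md5(m) == truncated_md5(seen[d])
print(f"collision after {tries} tries")  # typically a few thousand
```

Scaling that 2^(n/2) behavior up to n = 160 gives the 2^80 birthday bound for SHA-0, which is what makes the 2^51 result significant.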
Re:What are the rammifications? (Score:3, Informative)
So, the fact that there are collisions is not a big deal; we already knew that. If one is found, then that doesn't really change anything. What would change everything is if someone found a way to find a collision for an arbitrary byte string. That way you could set up a mirror for a popular piece of software and, instead of the real program, distribute something that has the exact same md5sum but which does something else entirely (most likely nothing sensible, since it'll effectively be a random string).
Re:Umm... (Score:3, Informative)
But there shouldn't be any practical effect - SHA-0 was believed to be weak, so nobody really uses it. The news about possible weaknesses in other algorithms like SHA-1 and MD5 is much more significant.
Re:Consequences? (Score:0, Informative)
A cryptographically strong hash has so much chaos in the hash function that "dialing in" the desired hash by fiddling with bits in the source (plain text) file takes years and years to produce the desired hash. A cryptographically weak hash (a "broken" hash algorithm) could be exploited to produce the desired hash in minutes or hours.
Re:Consequences? (Score:5, Informative)
You say 2^160 like it's some tiny number.
To demonstrate how large that truly is, let me do a little thing that my cryptography prof did when I took the course several years ago:
2^160 = 1.4615016373309029182036848327163e+48 possibilities. (We'll be nice to the people following along and say 1.46 x 10 ^ 48 possibilities.) Now, let's say you can successfully generate one unique hash per second; that's 1.46 x 10 ^ 48 seconds.
But just how long is that exactly?
Well, let's be nice (for the sake of making the math easy), and say that there are 100 seconds in a minute. That's 1.46 x 10^46 minutes. Now, let's do the same thing for minutes in an hour: that's ~ 1.46 x 10^44 hours.
Now we reach hours in a day. I'm feeling really generous, so we'll say 100 hours in a day. That's 1.46 x 10 ^ 42 days.
365 (or 366) days in a year is too close to 3 x 10 ^ 2, and dividing by 3s is just messy, so we'll say 1 000 days in a year. That gives us 1.46 x 10 ^ 39 years.
Chances are good that our sun will have burnt itself out long before you've managed to generate every single possible hash.
However, what makes this particular case more interesting is that they've found a way to get a collision without brute forcing their way through every possible hash. It's not particularly useful yet, as it's still a 2^51 problem, (approx 2.25 x 10 ^15), so it's hardly trivial enough for you to do on your home PC, but it's a step in the right direction.
Re:Isn't this Inevitable? (Score:3, Informative)
I'm not sure about the definition of "collision" in this case, but of course if you always reduce the number of bits, or always map to a fixed number of bits, then you will always have more than one thing (specifically, an infinite number of things for most hash algos) that maps to the same reduced number of bits.
What they mean by "broken" is that you can trivially *find* such a string.
Summary (Score:5, Informative)
First, a little about SHA and SHA-1. SHA (0) was developed as a national standard hash function. Curiously, a last-minute change was proposed by the NSA, which resulted in SHA-1. The change puzzled cryptographers in academia. Some time later, an attack was discovered on certain classes of cryptographic hashes, to which SHA-1 turned out to be curiously immune.
All that aside, one should view cryptographers a bit like plumbers. Cryptographers have a bag full of equipment (algorithms), which they combine in many ways to accomplish what they want. After much research and testing, certain ways of combining those pipes, valves, and spigots are 'proven' to work well, within given assumptions, such as expecting an L-joint not to leak. SHA-1 likewise is an integral component in many cryptographic systems.
SHA, MD5, RIPEMD, and others are cryptographic hash functions. Cryptographers expect certain things out of a hash function. Its job is to take a variable amount of input bytes and give you back a small, fixed number of bytes. Most importantly: it should be infeasible to recover an input from its digest, to find a second input matching a given digest, or to find any two inputs with the same digest.
So, in summary, this discovery is a bit like finding out that an L-joint which has been used reliably by plumbers the world over is much more likely to leak than previously thought.
But we haven't seen the results yet...
Re:Time taken (Score:1, Informative)
Re:Umm... (Score:5, Informative)
The key is not that they found a collision, but that they did so with substantially less than brute-force effort. Substantially less here means a factor of roughly 2^29 to 2^109 (about 9 to 33 decimal orders of magnitude), depending on whether you call SHA-0 a 2^160 space or a 2^80 space because of the birthday paradox mentioned in the parent article.
--Pat / zippy@cs.brandeis.edu
Re:What are the ramifications? (Score:5, Informative)
So, very simply, collisions are when a hash function takes two different input keys and produces the same hash for both as output. Going with our English-language library example from above, "Stevenson, Robert Louis" and "Sheridan, John" would both produce the same hash ("S"). That's all there is to a collision. They are inevitable when you take longer strings and turn them into shorter ones.
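The library example above, written out as a toy hash function:

```python
def first_letter_hash(name: str) -> str:
    # Toy "library catalog" hash: bucket a name by its first letter.
    return name[0].upper()

assert first_letter_hash("Stevenson, Robert Louis") == "S"
assert first_letter_hash("Sheridan, John") == "S"   # a collision
```

With a one-character output, collisions are everywhere; MD5 and SHA-1 just push the same inevitability out to 128 and 160 bits.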
Now what this has to do with security may be becoming clearer. Because SHA1 and MD5 are good, general-purpose algorithms, they are used in lots of places. They are used to store your password. (You enter your password, the system runs MD5 or whatever against it, then stores that. When you need to verify your password, it runs the same process and compares the hashed versions.) They are also used as "digital signatures" to verify that the content of an object hasn't changed. And they are, of course, used in hash tables with a lot of disparate entries.
But this is genuinely not something you should be worried about. Like I said, collisions are inevitable. But the computational effort required to find one is far from trivial. It's much easier for someone to crack your password using a dictionary-based attack or social engineering or pretty much anything else, at least for now. When you should worry is if someone finds a way to take a hash and produce an input which, when hashed, yields that same value: that is, effectively inverting the hash. MD5 and SHA1 are both allegedly one-way hashes, so you cannot ever go from the hash back to any sort of original data. This is why they're secure for passwords and the like. But if that should turn out to be wrong...
I can't speak much to specifics about MD5 and SHA1 because I don't really have the background in math to do so. (I have the background to write a computer program implementing the algorithms, but I can't explain why they work so well.) MD5 is older and has been suspected to be not-so-good for a few years now. SHA1 is newer and better. That's about all I can tell you.
collisions found in MD5 (Score:5, Informative)
Paper here:
http://eprint.iacr.org/2004/199.pdf
Independently verified here:
http://www.mail-archive.com/cryptography@m
Salting of Password Hashes (Score:2, Informative)
Generally not true. Most password hashing is done with unique salt data for the system, which is unknown to the cracker. In other words, the password is not simply hashed by itself. (That would be really stupid.) Rather, the password is mixed up (in a consistent way) with the "salt" data before the hash is generated. This prevents brute-force generation of a password based on the hash. The larger the salt data set the better, of course.
Re:Consequences? (Score:3, Informative)
Correction (Score:5, Informative)
So here's the calculations for that:
2^80 = 1.208925819614629174706176 x 10^ 24, so we use 1.2 x 10 ^ 24.
1.2 x 10 ^ 24 seconds / 100 seconds/minute = 1.2 x 10 ^ 22 minutes
1.2 x 10 ^ 22 minutes / 100 minutes/hour = 1.2 x 10 ^ 20 hours
1.2 x 10 ^ 20 hours / 100 hours/day = 1.2 x 10 ^ 18 days
1.2 x 10 ^ 18 days / 1000 days/year = 1.2 x 10 ^ 15 years
A much smaller number, but you're still not likely to get it before the sun goes out.
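The same arithmetic without the friendly rounding, in a couple of lines of Python:

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # 31,536,000

def years_to_enumerate(bits: int, hashes_per_second: float) -> float:
    # Time to step through every value of a `bits`-bit space.
    return 2 ** bits / hashes_per_second / SECONDS_PER_YEAR

# One hash per second, as in the posts above:
print(f"{years_to_enumerate(80, 1):.2e} years")   # 3.83e+16 years
print(f"{years_to_enumerate(160, 1):.2e} years")  # 4.63e+40 years
# Even at a billion hashes per second, 2^80 still takes ~38 million years.
print(f"{years_to_enumerate(80, 1e9):.2e} years")
```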
Re:Consequences? (Score:5, Informative)
Nooo.... Not even close.
Let me illustrate with a simple example. Imagine that a hash function produces a 2 bit digest value from an arbitrary input 500 bit string. It's not a good hash function, because in about three tries you find another input string that produces the same hash value. However, you don't have any hope of turning one of the hash values (0, 1, 2 or 3) back into the 500 bit input string, even though the hashing algorithm is clearly broken by any commonly accepted definition.
A more realistic example is the MD5 hashing algorithm, which produces a 128 bit digest value from an arbitrary input string. Assume, for the sake of simplicity, that you're hashing only 1024 bit strings (though the input strings could be of any length). Then there are 2^(1024-128) = 2^896 possible input strings for every hash value. Your chances of guessing the correct input string are 1 in 2^896 (i.e. zero.) Remove the restriction that the input string must be of fixed length and your chances suffer even further.
In reality, the problem with a broken hashing algorithm is that an attacker can manufacture an altered document without affecting the "tamper-resistant" seal, i.e. the hash function digest. Other applications of digest algorithms dependent on probabilistically-guaranteed uniqueness of digest values as a function of content, like the creation of content-based unique identifiers, are likewise endangered.
However, the mere existence of a few random collisions, while potentially a source of unease for system and application designers, is not in itself evidence that a hashing algorithm is broken. Brokenness implies that for a given input X and digest F(X), other inputs Y can be generated in limited time such that F(X)=F(Y).
MOD PARENT DOWN (Score:5, Informative)
http://www.google.com/search?q=%22attack+relies+on+the+idea+of+producing+duplicates%22&num=20&hl=en&lr=&ie=UTF-8&safe=off&filter=0
Re:Don't the laws of computing make it... (Score:2, Informative)
Re:Isn't this Inevitable? (Score:5, Informative)
(The pigeonhole principle says that if you shoot N pigeons with N+1 bullets, and don't miss, one pigeon has two holes... or something like that.)
Now hash functions can be classified as either weak collision resistant or strong collision resistant.
A hash function h is weak collision resistant if, given h and a bitstring x, it is computationally infeasible to find some other x' such that h(x) = h(x'). Note that x is specified.
A hash function h is strong collision resistant if it is infeasible to compute any collision (x, x') of h.
So, if a collision was found in MD5, it's no longer strong collision resistant (since the person found a pair x, x' such that h(x) = h(x')). It's when MD5 is no longer weak collision resistant that things start to get scary: meaning, given any arbitrary bit vector, I can find another bit vector that generates the same hash value in a reasonable amount of time. This means that, after you download your favorite Linux ISO, just making sure the MD5 hash checks out does not guarantee the authenticity of the file.
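The gap between the two properties is easy to feel at a toy scale. Here is a sketch of a second-preimage search against a hash truncated to 16 bits (purely illustrative): because the attacker must hit one specific value, the expected work is about 2^16 tries, versus only about 2^8 for finding any colliding pair via the birthday effect.

```python
import hashlib, itertools

def h16(data: bytes) -> int:
    # Toy hash for illustration: the first 16 bits of MD5.
    return int.from_bytes(hashlib.md5(data).digest()[:2], "big")

target = b"my-favorite-linux.iso contents"   # the "specified" x
target_hash = h16(target)

# Second-preimage search: must match ONE specific value.
for i in itertools.count():
    candidate = i.to_bytes(8, "big")
    if candidate != target and h16(candidate) == target_hash:
        break

assert h16(candidate) == h16(target) and candidate != target
print(f"second preimage found after {i + 1} tries")
```

Scale the 16 bits up to 128 and that brute-force route becomes hopeless, which is why only an analytic shortcut (like the one announced at CRYPTO) matters.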
But no worries. There's still SHA-1. And if that isn't good enough, there's its bigger cousins, SHA-256, SHA-384, and SHA-512 [nist.gov].
- sm
Re:Consequences? (Score:4, Informative)
attack...
For the sake of this post, I will blatantly paste in great wodges of content from http://www.security-forums.com/forum/viewtopic.ph
Re:Don't the laws of computing make it... (Score:5, Informative)
Sorry, didn't mean to mock, it's just amusing whenever these one-time pad things come up and everyone starts jumping up and down yelling "unbreakable" and others start going "no, 'cos, like, we could brute-force it.."
You can't brute-force a one-time pad. That's the point. There are many weaknesses to OTP, related to key exchange, but you can't brute force it, because you have no way of knowing if you're right, or even if you're close. The possible set of plaintexts from a properly OTP encrypted message is the complete set of possible plaintexts of that size (or smaller, plausibly).
Let us take the following ciphertext:
aaa
I have encrypted this with a one time pad. Now, it's a pretty short message. We could brute force all the possible combinations on your regular computer pretty much instantly. Anyone care to guess what the message might be?
Of course not, because it could be *any* 3-letter combination, assuming that I'm sticking to letters. Any attempt to contextually analyse it is flawed because you will never be able to prove you got it right. Let's say that we know the message is English, and we can therefore reduce the number of possibilities down to all 3-letter English words.
Woohoo. It doesn't help; it doesn't get us any closer to knowing exactly what it is, because there is no next step. The only information that can aid us in the decryption of a one-time pad is information from "outside" the decryption. In this case, two items of information are available to us, the length (3 letters) and the fact that it is English, but the actual ciphertext itself is of no value whatsoever. It doesn't matter if those a's were z's or q's or anything else; we can't do anything with them unless we have the OTP.
"Decrypt candidates with "bad" and "moo" in them would definitely merit further analysis"
This is always the point where things go wrong
abskjhsglkjlssdkglkjsfdlkgjfld
Now let's imagine that we knew it was coming from bank robbers. Sweet. So, what can we do? Again, the *only* information we have is the length. It could say this:
I am going to rob a bank in WA
Or this:
I am going to rob a bank in CA
Or this:
I am going to rob a bank in NZ
And there is no way to prove what it actually says at all. It might say:
I am going to buy some flowers
Again, you'll never know unless you have the key, there's just no way to tell.
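That "any message of the same length fits" property takes only a few lines to demonstrate: for any ciphertext and any candidate plaintext of matching length, there is a key connecting them, and nothing distinguishes it from the real key.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = b"I am going to rob a bank in WA"
key = os.urandom(len(plaintext))          # the true one-time pad
ciphertext = xor(plaintext, key)

# An eavesdropper can "decrypt" the ciphertext to ANY message of
# the same length just by choosing a different (equally random-
# looking) key, so the ciphertext alone proves nothing:
guess = b"I am going to buy some flowers"
fake_key = xor(ciphertext, guess)
assert xor(ciphertext, fake_key) == guess
assert xor(ciphertext, key) == plaintext
```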
SHA-1 is not SHA-0 (Score:4, Informative)
I don't really know what Ed Felten means regarding "weaker cousins" of SHA-1 being the only other popular hashes. SHA-256 is a cousin, but has a larger hash size and is generally considered to be stronger. SHA (SHA-0?) is a cousin, but I've never seen it deployed anywhere. MD4/MD5 are still mildly popular, but they're at best design ancestors and not cousins; there's no way a break in SHA-1 is going to make them any less secure (they have their own issues, of course). RIPEMD-160 and Tiger are the only two other hashes I've seen in the field, and they too have nothing to do with SHA-1.
There might be something here -- the left shift could have been a band-aid solution to an architectural fault in the SHA design, and there may be lots of curiosity about whether the new SHA-0 attack routes neatly around the fix. We'll see.
Hash Clash is nothing new. (Score:3, Informative)
"HashClash" has been around since the first hash function was written.
Re:Don't the laws of computing make it... (Score:3, Informative)
He's got a point, but he's assuming many, many things.
Re:Don't the laws of computing make it... (Score:4, Informative)
OTPs are provably secure, as long as the key isn't compromised, e.g. by reusing it....
Here is a good link that answers the question: Why Are One-Time Pads Perfectly Secure? [std.com]
-a
Re:Should We Fear? (Score:2, Informative)
Of course, even though the hash space is much smaller than the potential plaintext space, it is still large enough that it is infeasible to find such a collision.
For MD5, the plaintext space is infinite, and the hash space is 128 bits. Thus, for all inputs longer than 128 bits, there must, by necessity, be collisions.
Re:Don't the laws of computing make it... (Score:5, Informative)
"I killed John Smith"
"I did not kill John"
There is no way to tell which is the real plaintext. Since every single 19 character sequence will appear, every 19 character English sentence and every 19 character sentence in every other language that uses the same character set needs to be checked. You can eliminate as many garbage results as you like, there'll still be a huge number of non-garbage results that you have no way of choosing between. In fact, you might as well not even look at the cipher text - it can tell you nothing. Just enumerate all 19 character sentences and work from there.
Re:Don't the laws of computing make it... (Score:5, Informative)
Yes, the OTP is the way to go -- sequence of random bytes, which you simply XOR with your message. Dump out /dev/random to a CD-R or DVD-R, make a copy for your friend, and you've both got nice one-time-pads that will probably last you quite some time.
What's interesting is that quantum physics offers several new things that will help implement excellent OTP systems... over existing fiberoptic telecom systems, no less! This is really exciting stuff.
First, quantum physics offers us a new way to generate truly random numbers for your OTP. Your rand() function sucks, I guarantee you. /dev/random is very good, but slow... /dev/urandom uses hash mixing so isn't nearly as random. Both rely on physical events, time intervals, and possibly thermal noise. In comparison, a quantum random number generator [idquantique.com] in theory gives you random bits that are totally un-influencable.
So now you've got your random bitstream... what do you do with it? Well, you hook up the OTP stream to a laser-based system that sends essentially single photons down an optical fiber. The idea being that single photons are either received by your friend or intercepted (absorbed) by your enemy. They can't be copied. Anyway several factors complicate this process but the basic idea remains. It's for real [nowwireless.com].
So your computer can generate a random OTP, securely send it to your friend (without fear of interception), and now you can both exchange classical data encrypted with your OTP. Repeat as necessary. If the physics behind this is sound, we shouldn't have to worry about algorithmic attacks in the future. Here's a rather complete [aip.org] article describing everything.
Re:collisions found in MD5 (Score:5, Informative)
Re:Don't the laws of computing make it... (Score:5, Informative)
but it assumes something which is impossible, that is, perfect algorithms.
No, it doesn't. Schneier is discussing the computational complexity of scanning a uniform keyspace, which depends only on key size and doesn't touch on algorithmic quality in the slightest. Now, if he were talking about the security of 256-bit ciphers against any attack, rather than just against brute force, then the quality of the algorithm would indeed be important.
MD5 not *quite* broken yet, but maybe close.... (Score:5, Informative)
Eventually I gave up trying to reproduce the hashes, and went to looking online. I found a good summary explaining the mistake the authors made (endianness problems, mostly) at this website [rtfm.com].
The end result is that they didn't break MD5 -- yet. But their result can probably be modified to break the real MD5. Looks like we have a few more days till the world ends. ;)
Re:How is this news? (Score:5, Informative)
NIST (working with the NSA, but in a GOOD way) develops the SHS, of which SHA is part, during their effort to develop the DSS (of which the DSA standard is part). The standard is published, and everybody rejoices.
A little while later, NIST says, "Hey! There's a new version of SHA! We're calling it SHA-1. There are, uh, 'security concerns' with the old SHA. We can't tell you what it is, but don't use it. By the way, the main difference between SHA and SHA-1 is the introduction of a shift instruction. Buh-bye!"
The obvious inference here is that the NSA's crypto folks found a nasty problem with the original algorithm, and had it changed in the interest of keeping the Secure Hash Algorithm, well, secure.
Now, this catches the interest of cryptologists. They spend a bunch of time analyzing the algorithm, and a couple of folks found that there are a few attacks-- very theoretical, very impractical-- that could find a collision in slightly less time than a brute force search or a birthday attack.
With that, everybody stops trusting the original SHA. Most people decide to use SHA-1 or MD5 instead-- but SHA-1 has a longer hash length than MD5 (SHA-1 is designed for ~80 bits of security, MD5 designed for ~64 bits), so a lot of people decide to trust SHA-1 instead.
What's happened at CRYPTO '04 is a significant improvement on the attacks discovered previously. A birthday attack on the SHA-0 algorithm would normally be on the order of 2^80 memory and 2^80 time. This attack used 2^51 time and significantly less memory-- in other words, this attack is somewhere in the area of 500 million times faster than previously known possible.
As for the idea that "SHA-1 is not to be trusted", well-- maybe the more paranoid folks avoid it, but a lot of applications still use SHA-1:
For the record, SHA-1 is used in PGP (but not exclusively, and not necessarily as the default hash algorithm), in SSL (again, this can be configured), in a number of TripWire-like programs, by Gentoo's "emerge" system, the *BSDs' "ports" trees, and-- this is important, here-- as the only officially recognized hash for use in the digital signature standard (DSS).
So an attack on SHA-1 would be VERY significant.
Re:Don't the laws of computing make it... (Score:5, Informative)
First let's start with something that might return some "sensible" (i.e. not ridiculously high) numbers. On the Apple II, Basic programmers had access to an incredibly low resolution mode with 40x40 pixels and 16 colors. Assuming we only use 2 colors (say, black and white), there are:
2^(40*40) = 4.44 x 10^481 possible screen images.
Whew! Already far beyond the 2^256 limit discussed. But out of curiosity, we can look at some other numbers. Using the full 16-color support of this low-res mode:
16^(40*40) = 3.90 x 10^1926
How many possible terminal screens are there, assuming only alphanumeric (and space) characters?
(26+26+10+1)^(80*24) = 5.41 x 10^3454
And some other modes of interest:
320 x 200, 2 colors: 8.31 x 10^19265
320 x 200, 256 colors: 2.27 x 10^154127
640 x 480, 256 colors: 2.07 x 10^739811
After this, direct computation was far too slow, but we can get rough estimates:
640 x 480, 16-bit color:
65536^(640*480) ≈ 10^(640*480*log10(65536)) ≈ 10^1479622
800 x 600, 16-bit color:
65536^(800*600) ≈ 10^2311910
1024 x 768, 16-bit color: 10^3787833
And finally...
1024 x 768, 32-bit color: 10^7575677
Yep, a 1 with 7.5 million zeroes behind it. So we may all have to wait awhile before we see a computer sequentially generate a picture of alternate post-Caesar Earth. Still, an interesting thought.
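All of the counts above follow one formula, colors^(width*height), and Python's arbitrary-precision integers can reproduce the small cases exactly rather than via logarithms:

```python
# Exact count of distinct screen images: colors^(width*height).
def screen_count(width: int, height: int, colors: int) -> int:
    return colors ** (width * height)

lowres_bw = screen_count(40, 40, 2)   # the 40x40 two-color case
print(len(str(lowres_bw)))            # 482 digits, i.e. ~10^481
print(str(lowres_bw)[:3])             # leading digits 444, matching 4.44 x 10^481
```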
Re:breaking BitTorrent (Score:4, Informative)
To put two different pieces with the same SHA-1 checksum into a file, you would have to construct that file yourself, and odds are it would be garbage anyway. You could not do this to someone else's file (as you mention, a copyrighted file).
You could, however, very simply send incorrect data (on purpose) that matches the hash in the torrent file. Users' BitTorrent clients would believe the file to be complete when it is in fact corrupted. This would result in 100% of the users who downloaded that segment from you getting a corrupted file.
crypto news flash... (Score:5, Informative)
Let's start with SHA-0 collisions: methods for finding collisions in the original
SHA algorithm have been known since '98. In '95 the NSA issued a modification to
SHA-0 (the original algorithm) which became SHA-1; the modification was to counter an attack,
then unknown to open researchers, known as parallel shifting. The two French researchers who found a
collision in SHA-0 in '98 were the first open researchers to encounter this attack method.
A side point I would like to make is that the NSA made a similar change back in the early
'70s to the IBM DES algorithm, and until '89-'90 it was not publicly known why such a change was needed.
The attack that modification was protecting against was differential cryptanalysis. In the early '90s
there was a 20-year difference in knowledge; by the late '90s there was only a 3-year difference. GO FIGURE!
So in theory the SHA news is old news. As for MD5 and RIPEMD, anyone with the
slightest clue about the mathematics behind message digests will know that all MDs derive from
the same logic and the same mathematics, and that a flaw found in one algorithm can easily be
transferred to another if that particular algorithm hasn't already dealt with
that specific attack/flaw.
And on a final note:
"Many systems, especially those that use cryptography for digital signatures are most at risk here."
Name me one system in the world that uses cryptography and does not make use of hash functions.
Arash Partow
_______________________________________________
Be one who knows what they don't know,
Instead of being one who knows not what they don't know,
Thinking they know everything about all things.
http://www.partow.net
Re:Don't the laws of computing make it... (Score:3, Informative)
The phone line between the Kremlin and the White House is enciphered this way.
D.
Re:Hash Clash is nothing new. (Score:4, Informative)
Re:Should We Fear? (Score:3, Informative)
Important note - the new method comes with a guaranteed lower number of operations. Instead of 2^64 (SHA-0) they used a method that always delivers 2^51. While neither of these is within Joe Average's computing power, the difference may bring a message into cracking range for people in possession of "heavy computational artillery".
Re:So? (Score:3, Informative)
Yes, I can think of one immediately.
File sharing networks, where someone can inject garbage with the correct hash - for example, in a BitTorrent network. I don't know which hash function BT uses, but if it turns out to be easy to generate collisions, you could wreak havoc against movie-sharers all over the globe.
I'm pretty sure the MPAA / RIAA would throw a gigantic party if such a break became easy.
Re:Hash Clash is nothing new. (Score:2, Informative)
Re:Summary (Score:2, Informative)
1. The output doesn't look like the input. (This is actually really, really easy -- by far the easiest part of the problem. Say you've got a 20-byte output: as soon as you've got a 21-byte input, the pigeonhole principle pretty much guarantees that your output no longer looks like your input -- you *have* to lose information. If you manage to produce a 20-byte hash algorithm from which it's possible to reconstruct 1MB of input text, you're in line for a Nobel prize and a visit from angry physicists with pitchforks and torches who now need to revise their ideas about the second law of thermodynamics.)
2. Trying to find two inputs that produce the same output should ideally take at least 2^(N/2) steps. (It's necessarily easier to find two inputs with matching outputs than it is to find a matching input for a given output. Look up "birthday paradox" on Google for why.)
3. Trying to produce an input that will hash to a given output should ideally take at least 2^N steps (where N is the bitsize of the hash).
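The birthday effect behind point 2 is easy to watch on a toy hash. The sketch below truncates SHA-1 to 16 bits (the truncation width and counter inputs are choices made here purely for speed, not anything from the parent post), so a collision should turn up after roughly 2^(16/2) = 256 inputs rather than 2^16:

```python
import hashlib

def toy_hash(data: bytes) -> bytes:
    # SHA-1 truncated to 16 bits -- deliberately weak so the demo finishes fast
    return hashlib.sha1(data).digest()[:2]

seen = {}  # maps toy hash -> first message that produced it
i = 0
while True:
    msg = i.to_bytes(8, "big")
    h = toy_hash(msg)
    if h in seen:
        break          # seen[h] and msg now collide under toy_hash
    seen[h] = msg
    i += 1

print(f"collision after {i + 1} inputs")  # typically a few hundred, ~2^8
```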
Re:Should We Fear? (Score:4, Informative)
"The next step is MANIPULATING, i.e. adding junk to a given binary file to allow you to choose the checksum. That's the scary one!"
Fortunately, manipulating a file to have the checksum you want is a hell of a lot more difficult than finding a collision between two bits of random data which don't need to match any format or rules.
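That difficulty gap can be made concrete on a scaled-down target. The sketch below brute-forces a junk suffix to hit a *chosen* checksum, using MD5 truncated to 16 bits (the truncation, file contents, and suffix scheme are all illustrative choices, not part of any real attack): a chosen-checksum match costs ~2^N tries, versus ~2^(N/2) for a free collision.

```python
import hashlib

TARGET_BITS = 16  # toy width; the real targets are 128 (MD5) or 160 (SHA-1) bits

def truncated_md5(data: bytes) -> int:
    # MD5 cut down to TARGET_BITS so the search finishes in seconds
    return int.from_bytes(hashlib.md5(data).digest()[:TARGET_BITS // 8], "big")

original = b"legitimate file contents"
target = truncated_md5(original)

# Append junk until the doctored file matches the chosen checksum:
# ~2^16 attempts expected here, versus ~2^128 against full MD5.
doctored = None
for junk in range(2 ** (TARGET_BITS + 4)):
    candidate = b"malicious file contents" + junk.to_bytes(4, "big")
    if truncated_md5(candidate) == target:
        doctored = candidate
        break

print(doctored is not None)
```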
Re:Umm... (Score:2, Informative)
Perhaps not, but it could still work...
Re:Should We Fear? (Score:4, Informative)
Re:Don't the laws of computing make it... (Score:1, Informative)
Re:I found the original paper (Score:3, Informative)
The MD4 collision has been verified [mail-archive.com].
The MD5 collision has some problems, but the researchers are trying to re-run their code before tonight's presentation.
Re:Should We Fear? (Score:3, Informative)
Of course you can do this, but the question is what people actually do. In every single case I know of where digital signatures are actually used (including X.509 certificates, signed e-mail, etc.), it's a hash that's signed. For this to be secure, both the hash and the signature method have to be secure. If people have just discovered a way to break the hash, then they can potentially break such a system.
MD5 Collision *NEARLY* Found (Score:4, Informative)
As it turns out, there was an error in their findings, and the MD5 values produced are NOT the same in this instance.
Re:Sigh... (Score:3, Informative)
That's more like three than two; two is where you generate a large random file that hashes the same as the iso (thus making the download look OK when it is actually bad). If you can start with some known data and then append something to make the hash be what you want, then yes, that's almost as bad as starting with the original and altering it while keeping the hash.
You also have other problems: the .iso you end up with really needs to be the same size, and you probably need to break both MD5 and SHA-1 on the same file, which is probably much harder.
a bad day for hash function primitives (Score:3, Informative)
Collisions were found in MD4, MD5, RIPEMD, and HAVAL (and SHA-0).
No collisions were found in SHA-1, but progress has been made in finding near collisions.
The presenter didn't give many details about how the collisions were derived, but did give some timing results. The highlight of the evening was when the slide for MD4 came up, and in the timing results section there was text similar to: "4 to 24 iterations. Can be done by hand." Ouch.
I think for MD5 it only took a couple of hours to find the collisions.
PLUS, in the talk on near collisions of SHA-0, the presenters were able to find messages that collided through ~30 rounds and were readable as English sentences...
Other talks seemed to indicate that SHA-1 *and* SHA-2 (256/512 bit) may be susceptible.
All in all, it was not a good day for hash function primitives.
Re:why not do both a md5sum and a sha-0 ? (Score:2, Informative)
What you suggest is essentially just a 288 bit hash (128+160). The question is, would you rather trust this 288 bit hash created from two weak hash functions, or one of the 256 bit hash functions designed to replace SHA-1 and MD5?
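The concatenation being described is just the two digests glued together; a minimal sketch (function name is mine, not from the post):

```python
import hashlib

def combined_hash(data: bytes) -> bytes:
    # 128-bit MD5 digest followed by a 160-bit SHA-1 digest: 288 bits total
    return hashlib.md5(data).digest() + hashlib.sha1(data).digest()

print(len(combined_hash(b"example")) * 8)  # 288
```

Worth noting: Joux's multicollision result, presented at the same CRYPTO '04, suggests that concatenating two iterated hashes buys considerably less security than the combined 288-bit length would imply.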