Sunday, July 15, 2012
Password salting: Why it matters
I'd like to take this afternoon to explain why salts matter so very much in password storage. Why am I qualified to write this? Because I write password cracking tools, and have focused on attacking large lists of unsalted hashes very efficiently. And salts make my life very difficult.
Consider this a very long response to @jimblandy's tweet this morning.
I'm not picking on him specifically - this is a sentiment I see repeated over and over, in a wide variety of places. And it's time to respond. Also, it appears to be #passwordssunday for some reason.
For the purposes of this post, I'm going to avoid referring to hash algorithms by name, because it simply does not matter - this applies equally to all of them.
$pass: This is the user's password. It's probably a very bad password - users do this.
$salt: This is a PER USER SALT. This is vital. I've seen certain websites suggest very wrong ways of salting that will be explained later.
hash(): This is my hash function. It could be MD5, SHA1, SHA256, SHA512, Stribog, whatever.
An unsalted hash type (which is far too common in the wild) is when you simply perform the following: hash($pass)
It's a one way function, so nobody can go from the hash back to the password, and it's secure! Or so we're told.
The problem with this method is that, given the same input string, a hash function will always give the same output string. This is what they're designed to do.
Unfortunately, this means that if several users choose the same password, they will always end up with the same password hash.
Password cracking tools don't "reverse the hash" by going from the hash back to the password. They simply go forwards, really fast (perhaps at a rate of 154B NTLM/sec?). The tool generates a password, runs it through the hash function, and then checks it against the list of unique hashes. The "Checking against the list of unique hashes" process can be done insanely fast, with bloom filters (often implemented as a bitmap or 4).
What this means for the attacker is that they can go through a list of 7.5M hashes not much slower than for a few dozen hashes. This is bad - users reliably choose very bad passwords, and this is why the final crack rate for unsalted hash lists approaches 95% in most cases.
Now, one thing that has been tossed around on occasion is a static site salt. This is where instead of storing hash($pass), you store hash($pass."MySiteSecretPassphrase") - but this doesn't solve the problem above. This still means that hash("passw0rd") == hash("passw0rd") - which is the problem we're trying to avoid.
It has been claimed that if an attacker does not know the site secret, they can't crack the passwords - which is true, but if an attacker has just dumped your database, they can probably get your source code if they want it. Sad, but true. Also, this doesn't help against a "static site salt" of, perhaps, "changeme". As soon as an attacker finds the pattern, they're golden.
For salted passwords to be actual salted passwords, the salt MUST be a unique, per-user value. It's usually stored alongside the password hash in the database, in some format or another.
What this means is that for two users with the same password, they will not have the same hash.
hash("passw0rd"."12345") != hash("passw0rd"."67890")
This is the key to a salt. It is unique per user. What this means for the password cracker (and why salting is so powerful) is that it can no longer hash each password once and compare it against the list of hashes. It has to hash the password for each unique salt value!
Let me repeat that, because it's the key to salts: The password cracker has to hash the password for each unique salt value.
What does this mean if you have a million users? It means, if you actually have a unique salt for each user, the password cracker is a million times slower - without adding any work to the defender!
If a recent leak of around 7 million unsalted SHA1 hashes were, instead, salted SHA1 (with nothing else - no password stretching, just a simple salt added to the password), it would mean the password crackers were 7 MILLION TIMES SLOWER at the beginning. This is the power of salting!
Now, as the passwords are found, the salts can be removed from the list of salts that are tried, and things speed up. But it's still a huge factor slower, for no additional workload on the defender.
Another thing that is absolutely vital is that the salts be unique per user. There are some older schemes in which there are a limited number of possible salt values. The slowdown is proportional to the number of unique salts, so multiple users with the same salt does not help.
As an example, the "per site static salt" can be modeled as a salted password hash with one salt shared across all the users. Since the slowdown is proportional to the number of unique salts, the slowdown is by a factor of 1 - or nothing.
If you used a 2 digit number (00-99) as a salt, no matter how many users you have, the slowdown is only a factor of 100. This is no good with millions of users.
Again, for emphasis: The salt must be a unique, per-user value.
Another comment - salting is totally independent of password stretching (using time or memory hard algorithms). Those help by making it take longer (or use more memory) to generate a hash, but they affect the defender and the attacker roughly equally. They should definitely be used, but are a separate issue which I may deal with in the future.
So, salt your passwords.
And if you don't understand anything of what I just wrote, that's fine - just use bcrypt for now. Please don't try to invent your own password storage system.
Clarifications and Corrections
I'm going to add comments at the bottom to help clarify things based on feedback I receive and other thoughts I have.
The per-user salt is stored in the database with the hash. In a Unix shadow file, the salt is part of the hash line, and in many web applications, the salt is stored in the same table as the hash, or as part of the hash, as in "hash:salt" or "salt:hash" - either is fine. The attacker knows the salts, and the slowdowns still apply. I still have to try each unique password/salt combination, which is where the slowdowns come from.
Salts also accomplish the wonderful goal of eliminating the use of rainbow tables. Rainbow tables work wonderfully on unsalted hashes (though if you have enough hashes, brute force is far faster). Salts totally eliminate the threat of rainbow tables unless you do something silly, like...
Username as a salt
Some people wonder about using the username as a salt - it's a unique, per-user value, right? It fits the definition, but it's still a bad idea (that trips up people/companies as impressive as Microsoft). The problem is, depending on your environment, there may be "special" users. Precomputed tables can be made for a static salt. The worst offender here is the domain cached credentials. They're salted with the username - and almost every domain on the planet has some user called "Administrator" floating around. Plus, in many domains, "Administrator" has logged on to nearly every system at some point to set the system up (though this is less true in larger domains). So domain cached credential rainbow tables for Administrator are most definitely useful.
Single hash speed
It is occasionally stated that salting is useless, as it doesn't affect the speed to crack a single hash. This is true - but misses the point. Salting is far from useless! It depends on the threat model you are interested in. If your threat model is "An attacker gains one hash and wishes to crack it," then salting is useless. Password stretching is useful here. However, if you are a website, the threat model looks a lot more like "The attacker(s) wish to recover as many passwords as possible in the shortest time possible." This is what happens with website dumps most of the time, and the salting helps tremendously here!