Never trust any system that can tell you your own password.
When you type your username and password into a computer system, that system has to compare that information to a list to see if it’s valid. There’s no other way to do it- to have password protection, you need to store the correct answer. But such a list would be the ultimate cracker* target. Present on every system, and holding the keys to every door. To get around this, each entry in this list is hashed, and hopefully salted.
What’s a hash? A hash is a function that is executed on data to confuse it. Everyone knows a few hashes- Pig Latin is probably the easiest string hash. Take a word, move the last letter to the front of the word, and append “ay” to the end of it. Since every bit of data in a computer is a number (even if you have to go into the raw bits to get it) most hashes use mathematics. Add one, then multiply by two perhaps. Electronic hashes are leaps and bounds more complicated than this, but fundamentally the same, with one big difference. Those operations are as easy to perform one way as they are another- if my hash algorithm is “Add five, divide by two” then it takes me just as long to hash my password as it does you to dehash it. Yes, that requires an attacker to know your hash algorithm, but remember that this attacker is already in your system. They’re reading your password.txt file. But there are operations that are easy to do, but hard to undo- the simplest decent hash looks something like “$password mod $largePrimeNumber” but they can quickly get fiendish. The key is, fast to do, hard to undo.
But if it takes a long time to undo, how does the system look up your password when you enter it? Simply put, when the system gets the password, it does the hash operation on it, and looks for the result. That’s what’s stored in password.txt, and if you did your hash right, there’s no quick way to figure out what the original password was. But this is the information age. Throw enough processing into systematically trying every possible password through the hash, and wait until you get a familiar hash. (This is what the term ‘collision’ means, and why it’s a bad thing. Imagine my hash function was “Divide by two, round down.” Every hashed result would have two passwords that would look exactly the same. That’s a collision, if you have a lot of them it means guessing a password takes much less time.) Eventually, won’t you get all the passwords? Yes, but this would take a prohibitive amount of time. You could store each hashed value in one big table, and then reuse that table, but this takes a prohibitive amount of memory. (Seriously. Even a simple hash like the one above can generate a few dozen gigs of results if you try every possibility. Imagine a 10GB .txt file and you can see why this is a better system) And this is where rainbow tables come into it.
Lets say you do have a dehashing function. (Usually called a reduction function.) Neither of these are helpful names, to start. A good hash simply cannot be reversed. You can get plaintext back out of it, but it’s not the plaintext you put in.You can hash something, look for the hash in the list of final hashes, if it is there you’re done. If not, turn your hash through the reduction function again, to give you some more plaintext. Repeat until you get a match. Each turn of the screw is another column of data in a giant table- the Rainbow Table. And here’s a bit of irony- Collisions, which are terrible at the first level of security, are an excellent way to defeat someone using a rainbow table. Each collision will wind up giving a false trail, something that, since we’re essentially running around the world to get back to where we started since we can’t turn around, can hilariously screw us up by pointing us just a tiny bit off course. If you have a huge number of collisions, eventually the rainbow table starts getting the exact same answer for every password. Because of all these collisions, our train might derail long before we get a correct plaintext password. It’s dealing with these collisions that differentiates a rainbow table from similar forms of attack.
Every time a rainbow table notices it does not have a unique result, it stops following the track. This way, the table does not get derailed forever. The term “Rainbow Table” comes from the fact that each column uses a different reduction function, each going down a different path looking for a solid plaintext answer.
So that’s what a Rainbow Table is. A gigantic table of precomputed plaintext guesses derived from hashed values, searched through for matches, reporting a match when it finds one. So why haven’t crackers taken over the world yet?
Larger character sets make rainbow tables bigger and take longer to generate. Longer passwords do likewise. Better hashes do wonders for you. But to shut down a rainbow table, add salt.
Lets say my hash is “$password mod $primeNumber.” And lets say that since my genius hash is widely used, someone makes a rainbow table for it. But rainbow tables all work on a single hash. (Different reduction functions, but the same hash) Changing the hash forces an attacker to remake their rainbow table, a process which takes a long time. How about if I change the hash just a little bit each time? I know how many accounts I’ve set up. Why not grab the next largest prime every time I make a new account? Finding out the next largest prime is a trivial bit of computing, and it’s pretty easy to figure out what hash got used on each password. (Remember, I need to be able to go from plaintext to hash fast enough that the users don’t complain about lagging systems. (Sidenote: I miss the old registration system. I like to hear the screams of the freshmen all over campus, like obscene morning songbirds.)) Just try to use a rainbow table on that though- You’d need a different table for every single password you wanted. Each table is over a gig, and to clock through looking at my accounts you need every one of them in memory. This gets expensive fast, because each hash is different. That difference between the hashes? That’s called salt. And that’s why one rainbow table just isn’t enough.
*Bit of terminology. A hacker is someone who enjoys exploring the details of systems and how to stretch their capabilities. (Paraphrased from the jargon file. http://www.catb.org/jargon/html/H/hacker.html) A cracker is someone who does so to steal or vandalize.