The bad guys don’t need to know your password; they’ll figure it out themselves. In this video, you’ll learn the techniques that the bad guys use to reverse-engineer your password.
To gain access to an account, you need to authenticate by providing your username and your password. The username is usually something that is easily seen. Something you can see on the screen. It’s often sent in plain text over the network, which means it’s your password that is really the most important piece.
It’s something that is not sent in the clear over the network. It’s often hashed, which means you have no way to reverse engineer what that original password happened to be. So the idea that the bad guys have is they’ll try every possible scenario to try to determine what your password might be. This is called a brute force attack.
They will very methodically go through every possible combination of lowercase letters, uppercase letters, numbers, and special characters to try to determine what your password happens to be. And if it sounds like this could be a very involved and very long process then you would be correct. This takes a lot of time to be able to try every possible combination.
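To get a feel for how long "every possible combination" really is, here is a minimal sketch that counts the candidates for a given password length. The alphabet is an assumption: Python's built-in lowercase, uppercase, digit, and punctuation sets, which add up to 94 printable characters. The function name `keyspace` is just an illustrative choice.

```python
import string

# Character classes named in the text: lowercase, uppercase, numbers, special characters.
# Assumption: Python's built-in ASCII sets, 26 + 26 + 10 + 32 = 94 characters.
ALPHABET = (
    string.ascii_lowercase
    + string.ascii_uppercase
    + string.digits
    + string.punctuation
)

def keyspace(length: int, alphabet_size: int = len(ALPHABET)) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length

# Each extra character multiplies the work by the alphabet size.
for n in (5, 8, 12):
    print(f"{n} characters: {keyspace(n):,} combinations")
```

Every added character multiplies the number of guesses by 94, which is why a full brute force gets slow so quickly.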
If you’ve ever forgotten your own password to a corporate account or an online account, one of the things you’ll notice is after a certain number of attempts it locks the account completely. And at that point, you have to receive an email or call somebody to unlock the account and reset the password. And that’s to prevent these brute force attacks.
The bad guys know that this process of over and over typing in a username and a password is not only slow, but as you can see, your account will quickly become locked out. That’s why the bad guys gain access to the servers and gather the hashes directly from the authentication database itself. So they’ll get a list of all the usernames and all of the hashes associated with those usernames. Each of those hashes is the hashed version of the password that was set for that user on that system.
And what they will do is begin their own calculations. Well, they’ll take their random guess. They will hash it. And then they’ll compare that hash to what was in the database. And they’ll do this over and over and over.
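The guess, hash, and compare loop described here can be sketched in a few lines. This is only an illustration under some assumptions: SHA-256 stands in for whatever hash the real system uses, the search is limited to short lowercase guesses so it finishes quickly, and the names `hash_guess` and `brute_force` are made up for this example.

```python
import hashlib
import string
from itertools import product

def hash_guess(guess: str) -> str:
    # Assumption: SHA-256 as a stand-in for the target system's hash function.
    return hashlib.sha256(guess.encode("utf-8")).hexdigest()

def brute_force(target_hash, alphabet=string.ascii_lowercase, max_len=4):
    """Try every combination up to max_len; return the matching guess or None."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            guess = "".join(chars)
            # Hash the guess and compare it to the stolen hash.
            if hash_guess(guess) == target_hash:
                return guess
    return None

# Pretend this hash came out of a stolen authentication database.
stolen_hash = hash_guess("cab")
print(brute_force(stolen_hash))
```

Nothing in this loop ever talks to the login server, so there is no lockout counter to trip.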
Because they have the hashes, they don’t have to worry about locking out an account. They can keep trying this as much as they’d like. Brute forcing the hash has its own disadvantages, of course, because the bad guys have to actually calculate what that hash is. And that requires a number of CPU cycles.
If you’re simply typing in a username and a password and trying to authenticate, there’s not a lot of CPU cycles involved there. But if you’re trying a million different calculations, you’re trying a million different password iterations for a user, there’s a number of CPU cycles that will be used there. So it’s not a cakewalk for the bad guys. They still have to do some significant work to be able to compare their calculated hash with what the stored hash might be.
Different applications and different operating systems will store this authentication hash in different ways. It’s pretty standard across the application or across the operating system, but you can’t take a hash from a Linux device and compare it to a hash from a Mac OS X or a Windows device because those operating systems may store the hashes in different formats. This is an example of hashes that I grabbed from my Windows 7 device.
So I’ve got these usernames. Jumper Bay, Carter, Jackson, O’Neill, and Teal’c. Those are on my Windows 7 machine. There are some IDs associated with those usernames.
And everything there at the end– that long string of hexadecimal numbers and letters there– that is the hash that is created by Windows 7. So if somebody grabbed this database, they could then take that information and begin their process of brute forcing those hashes.
If you’re trying to determine someone’s password, you don’t necessarily have to go through every possible iteration of lowercase letters, uppercase letters, numbers, and special characters. It might be a little faster if you simply chose words out of a dictionary. That tends to be what we use for passwords. It’s something that we can remember. It’s a word or a phrase that makes sense to us.
So a dictionary is the perfect place to go. And we can even make it easier by starting with a list of passwords that are most commonly used. Words like password is one of the most common passwords and probably not a good idea to use for your authentication. But there’s other words, like ninja and football that are just as familiar and just as commonly used.
There are large databases on the internet that you can download from many different places, and they contain a list of the most popular passwords or a list of every word that happens to be in the dictionary. And the bad guys are going to use those lists first before they move on to trying random combinations that include special characters or uppercase letters.
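A dictionary attack is the same hash-and-compare idea, just with a word list instead of every combination. Here is a rough sketch. The word list only contains the three examples mentioned above, SHA-256 is again an assumed stand-in hash, and the usernames `user_a`, `user_b`, and `user_c` are hypothetical.

```python
import hashlib

# The three common passwords mentioned in the text; a real list would be far longer.
COMMON_PASSWORDS = ["password", "ninja", "football"]

def sha256_hex(text: str) -> str:
    # Assumption: SHA-256 as a stand-in for the target system's hash function.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def dictionary_attack(stolen_hashes: dict, wordlist: list) -> dict:
    """Return {username: password} for every stolen hash that matches a word."""
    cracked = {}
    for word in wordlist:
        word_hash = sha256_hex(word)  # hash each word once
        for user, stored_hash in stolen_hashes.items():
            if stored_hash == word_hash:
                cracked[user] = word
    return cracked

# Hypothetical stolen database: two weak passwords and one that is not in the list.
stolen = {
    "user_a": sha256_hex("ninja"),
    "user_b": sha256_hex("football"),
    "user_c": sha256_hex("T7!qz#Lm0p"),
}
print(dictionary_attack(stolen, COMMON_PASSWORDS))
```

Only three hashes get calculated here, compared with the billions a full brute force would need.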
The smarter people are going to use those more complex passwords. But a dictionary attack is definitely going to grab the folks that are trying to use a password that’s simple to remember, which unfortunately, is a very simple password for the bad guys to guess as well. Of course, not everybody uses a very common word as their password. Most people are going to add special characters or numbers, or replace some of the letters in a word with numbers.
And one way that the bad guys can use to find these passwords is something called a hybrid attack where they’re going to use a very common set of dictionary words, but they’re going to change them up just a little bit and try different variations of those words. So it’s not uncommon to see somebody that might use the password apple, but they’ll put the number 1 after the apple. Or they might use ninja and put the number 9. Or they might change a T to a 7 within those words.
And the bad guys recognize that you’re going to do this so there are options in their brute forcing programs where they can use a dictionary, but change it and add numbers to the end, replace the letter I with the number 1 to see if perhaps you’re doing the same thing with your password. This is, obviously, going to take a little bit more time. You have to try one dictionary word and then many different iterations of that dictionary word, but obviously, it’s going to be a lot easier than going through every possible iteration of every character and every number and every special character as well.
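The variations described here are easy to generate in code. This is a minimal sketch of the idea, not how any particular cracking tool does it: it only applies the two letter swaps mentioned above (I to 1 and T to 7) and appends a single digit. The function name `variants` is hypothetical.

```python
def variants(word: str) -> set:
    """Generate simple hybrid-attack variations of one dictionary word."""
    # Letter swaps taken from the examples in the text.
    substitutions = {"i": "1", "t": "7"}
    swapped = "".join(substitutions.get(ch, ch) for ch in word)

    # Start with the plain word and the letter-swapped word.
    bases = {word, swapped}
    candidates = set(bases)

    # Add one digit to the end of each base word.
    for base in bases:
        for digit in "0123456789":
            candidates.add(base + digit)
    return candidates

print(sorted(variants("ninja")))
```

Each candidate would then be hashed and compared just like a plain dictionary word, so one word turns into a couple dozen guesses instead of one.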
This is something the bad guys can easily configure inside of their password cracking or brute force software so don’t think that simply changing a number or changing a single letter is going to protect you because the bad guys have already thought of that. There are a number of cryptographic theories the bad guys can use to help determine what your password might be or to be able to duplicate a hash. One of these is called the birthday attack.
And the way that this works is best described by this. In a classroom of 23 students, what is the chance of two of those students sharing a birthday? Now if you’re thinking about this offhand, you may think it’s 1 in 365, but in the example we’re using here, every student is comparing their birthday to every other student. So the reality is if you get 23 people in the room, there’s about a 50% chance that two of those students are going to share a birthday.
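The 50% figure can be checked with a short calculation. The sketch below assumes 365 equally likely birthdays and ignores leap years. It works out the chance that everyone has a different birthday and subtracts that from 1.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday."""
    p_all_different = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

for n in (10, 23, 50):
    print(n, round(shared_birthday_probability(n), 3))
```

For 23 people the result is roughly 0.507, which is where the "about 50%" comes from.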
In the world of cryptography, this is called a hash collision. This is something that really should not be happening. But unfortunately, a number of the hash algorithms that we use have the potential for colliding.
A hash collision is when you have one plain text that creates a hash, and you have a completely different plain text that creates the same hash as well. This isn’t something that’s supposed to happen. We design our hash algorithms so that we don’t have these types of collisions. But unfortunately, these do exist with certain types of hashes.
So this is a great thing to find for the bad guys because they may not necessarily need your original text, and they might be able to use a different kind of text, but still have exactly the same hash at the end of the day. This is something that the attackers will then use to create multiple versions of this plain text, especially if they can only slightly modify some plain text and then have exactly the same hash on the other side.
They might be able to make changes to a document being sent across the network. On the other end, you can look at the hash in the digital signature and it matches what was sent, but the reality is that the bad guys changed something between one side and the other. And of course, this can be used for much more than just digital signatures. It could be used for the certificates that are used to encrypt data on a web server. So you might think you’re going to a trusted web server, but the bad guys have found a hash collision that allows them to build a certificate that looks like yours, but it’s really owned by the bad guys.
One thing that can help prevent these hash collisions is to use very large hashes. The larger the hash, the more difficult it will be to find a collision, and the safer that plain text is going to be. We talked earlier about brute force attacks with hashes where the bad guy would try to guess what your password was, they would calculate a hash, and then compare that hash to what was stored.
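You can see the effect of hash size with a small experiment. This sketch deliberately shrinks a hash by keeping only the first few bits of SHA-256, then counts how many inputs it takes to find two that collide. The truncation is purely an assumption for demonstration, and the helper names are made up; nobody would use a 16-bit hash in practice.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the first `bits` bits of a SHA-256 digest (demo only)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int):
    """Hash 0, 1, 2, ... until two different inputs share a truncated hash.

    Returns (first_input, second_input, number_of_inputs_tried).
    """
    seen = {}
    for i in count():
        message = str(i).encode("ascii")
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message

for bits in (8, 16, 24):
    first, second, tries = find_collision(bits)
    print(f"{bits}-bit hash: {first!r} and {second!r} collide after {tries} inputs")
```

Because of the birthday effect, the number of inputs needed grows with roughly the square root of the number of possible hash values, so every extra bit of hash makes the search noticeably longer.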
What if all of the hashes, though, were already precalculated and all of those hashes were not only precalculated but also optimized to be able to store and find them very quickly? This type of database is called a rainbow table, and it’s used quite often to try to reverse engineer those hashes. This could be a very, very fast way to determine what a password is, especially when you’re dealing with larger and larger numbers of characters in a password because those take a lot longer to go through and try every possible iteration.
It’s very easy to go through a set of five-character passwords and try every possible combination, but what if it was a 12-character password? That takes a lot more time. Well, if you’ve already done the calculations and stored them in a rainbow table, now you can very, very quickly simply search through the table for exactly what you’re looking for.
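The "calculate once, look up many times" idea can be sketched with an ordinary dictionary. To be clear, this is a simplification: a real rainbow table uses chains of hashes to save storage space, while this example just stores every hash directly. SHA-256 and the three-letter lowercase keyspace are assumptions chosen to keep the example small.

```python
import hashlib
import string
from itertools import product

def sha256_hex(text: str) -> str:
    # Assumption: SHA-256 as a stand-in for the target system's hash function.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_table(alphabet: str, length: int) -> dict:
    """Precalculate {hash: password} for every password of the given length."""
    table = {}
    for chars in product(alphabet, repeat=length):
        password = "".join(chars)
        table[sha256_hex(password)] = password
    return table

# The slow part: done once, ahead of time (26**3 = 17,576 entries).
table = build_table(string.ascii_lowercase, 3)

# The fast part: each stolen hash is now a single lookup.
stolen_hash = sha256_hex("fox")
print(table.get(stolen_hash))
```

All of the hashing cost is paid when the table is built, so recovering a password afterward is only as slow as the lookup.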
Although having these rainbow tables can greatly speed up the process, you still need a separate table, a separate database of these rainbow tables for each type of technology. That’s because Windows 7 uses one way to hash and MySQL might use a completely different way to hash these passwords. That means you need to build two completely separate sets of rainbow tables.
So this hard work has to be done well before you go through the process of trying to reverse engineer. And if there are many different technologies that you need to reverse engineer, you will need a separate rainbow table for each one of those. Since we recognize that the bad guys can easily reverse engineer these hashes with a rainbow table, one thing that our application developers are doing is salting the passwords. They’re adding some random bit of information to our password before it’s hashed and storing that information on the server. That way even if you are able to obtain my username and my salted hash, you would not be able to reverse engineer this with something like a rainbow table.
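Here is a minimal sketch of what salting looks like. It assumes a 16-byte random salt that is simply placed in front of the password before a SHA-256 hash. That is only meant to show the idea; it is not a description of how any particular operating system or application stores passwords, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

def hash_password(password: str, salt: bytes = None):
    """Return (salt, salted_hash). A new random salt is made if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)  # assumption: 16 random bytes per user
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def verify_password(password: str, salt: bytes, stored_hash: str) -> bool:
    """Re-hash the attempt with the stored salt and compare."""
    _, attempt_hash = hash_password(password, salt)
    return hmac.compare_digest(attempt_hash, stored_hash)

# Two users pick the same password but end up with different stored hashes.
salt_1, hash_1 = hash_password("ninja")
salt_2, hash_2 = hash_password("ninja")
print(hash_1 == hash_2)
print(verify_password("ninja", salt_1, hash_1))
```

Because each user gets a different salt, a table that was precalculated without that salt won't contain the stored hash, and the bad guys would need a separate table for every salt value.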