Keeping data safe is a challenge when storing information in a database. In this video, you’ll learn about tokenization, hashing, and using a salt.
When we store data, we’re often putting it into some type of database. And obviously, we need to protect the data that is stored in that database, as well as the data that is transmitted to and from that database.
We know that the data that we’re storing is incredibly valuable. In some cases, entire businesses are built around data that is stored inside of a database. We also have to manage the data inside of that database so that it complies with rules and regulations such as PCI DSS, HIPAA, GDPR, and other compliance rules.
Having strong security for our database helps ensure that the data is always available, and therefore that the business is always able to operate. A breach of the database not only disrupts the business, but can also be extremely expensive to repair.
One way to protect data inside of a database is to not store the real data in the database at all. Instead, we use a technique called tokenization, where we replace sensitive data with a token that has no mathematical relationship to the original value. Only the tokenization system keeps track of which token maps to which original value.
For example, a Social Security number such as 266-12-1112 might be stored in the database as 691-61-8539. That token is a completely different number and bears no relationship to the original Social Security number.
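A minimal sketch of the idea, assuming a hypothetical in-memory `TokenVault` class that stands in for a real tokenization service. The token is generated randomly, so nothing about it can be computed back into the original number; only the vault's mapping connects the two.

```python
import secrets


class TokenVault:
    """Hypothetical token vault: holds the only mapping from tokens back to real values."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Generate a random 9-digit token formatted like an SSN (XXX-XX-XXXX).
        # Because it is random, the token carries no information about the original.
        while True:
            digits = "".join(secrets.choice("0123456789") for _ in range(9))
            token = f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
            if token not in self._token_to_value and token != value:
                break
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only systems with access to the vault can recover the original value.
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("266-12-1112")
print(token)                    # a random value in the form XXX-XX-XXXX
print(vault.detokenize(token))  # 266-12-1112
```

The database would store only `token`; an attacker who reads the database without the vault learns nothing about the real number.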
We also see tokenization used when we’re storing credit card numbers. We need to have credit card information stored so that we can use it during our normal purchase process, but we also want to protect ourselves from anyone else gaining access to that credit card number.
So instead of storing the actual credit card number on our device, we store a temporary credit card number, or temporary token, on our device and use that during the purchase process. That token is sent to a server, which validates it during the purchase. The token is then thrown away, and a different token is used for the next purchase.
That means that if an attacker does gain access to that token during the purchase process, they won’t be able to use it for any subsequent purchases. Being able to limit the use of each token is a valuable part of this tokenization process.
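The single-use behavior can be sketched with a hypothetical `PaymentTokenServer` class. This is an illustration under simple assumptions (an in-memory table, random hex tokens), not how any particular payment network implements it. Redeeming a token removes it, so a captured token cannot be replayed.

```python
import secrets


class PaymentTokenServer:
    """Hypothetical server that issues single-use tokens for a stored card."""

    def __init__(self) -> None:
        self._active: dict[str, str] = {}  # token -> card number, kept server-side only

    def issue_token(self, card_number: str) -> str:
        token = secrets.token_hex(8)       # random, unrelated to the card number
        self._active[token] = card_number
        return token

    def redeem(self, token: str) -> bool:
        # pop() removes the token, so each token can be used at most once.
        return self._active.pop(token, None) is not None


server = PaymentTokenServer()
t1 = server.issue_token("4111111111111111")
print(server.redeem(t1))  # True: the first purchase succeeds
print(server.redeem(t1))  # False: a captured token cannot be reused
t2 = server.issue_token("4111111111111111")
print(t1 != t2)           # True: the next purchase uses a different token
```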
An interesting part of this tokenization process is that we’re not hashing any of our sensitive data, and we’re not encrypting any of our sensitive data. So there’s no overhead associated with any cryptographic function. We’re simply replacing one value with a tokenized value and using that token for our transactions.
Another way to store information in a database is to store it as a hash. This is commonly done with passwords, because we can store the password as a message digest, which is a fixed-length string of text. Storing that hash instead of the password means that an attacker who gains access to the data would not be able to read the original password directly.
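A minimal sketch of storing a digest instead of a password, assuming a hypothetical `users` dictionary as the database table and SHA-256 as the hash. This version is unsalted; the salt described later in this section would be added on top of it.

```python
import hashlib


def register(users: dict, username: str, password: str) -> None:
    # Store only the SHA-256 message digest, never the password itself.
    users[username] = hashlib.sha256(password.encode("utf-8")).hexdigest()


users = {}
register(users, "alice", "123456")
register(users, "bob", "a much longer passphrase than alice's")

for name, stored in users.items():
    # Each stored value is a fixed-length hex string, whatever the password length.
    print(name, stored)
```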
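An important aspect of a good hash algorithm is that different inputs produce different hash values. Two different inputs that produce the same hash value are called a collision, and with a strong hash algorithm, finding a collision should be computationally infeasible.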
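Because a hash has a fixed length while inputs can be any length, collisions must exist in principle; the protection comes from how hard they are to find. A rough worked estimate for SHA-256, using the standard birthday-bound approximation (a generic estimate that applies to any ideal 256-bit hash, not a result specific to SHA-256’s design):

```latex
\begin{aligned}
\text{possible SHA-256 outputs} &= 2^{256} \approx 1.16 \times 10^{77} \\
\text{hashes needed for a likely collision (birthday bound)} &\approx \sqrt{2^{256}} = 2^{128} \approx 3.4 \times 10^{38}
\end{aligned}
```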
We also know that there’s no way to retrieve the password from a hashed value. The hash is a fingerprint of the original information, not some type of encrypted representation. Hashing is a one-way trip: there’s no way to reverse the process and recover the password from the hash that’s stored inside of a database.
Here’s an example of some hashes that are based on some passwords. These are SHA-256 hashes, which are 256-bit hash values. We take the original password, for example, 123456. Not a great password, but we’re still able to hash it. And the hash that we would store in our database is this very long 256-bit value.
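The size of that value can be checked with Python’s standard `hashlib` module. A SHA-256 digest is 256 bits, or 32 bytes, which is usually written as 64 hexadecimal characters because each hex character encodes 4 bits.

```python
import hashlib

h = hashlib.sha256("123456".encode("utf-8"))
print(h.digest_size * 8)   # 256: SHA-256 output is 256 bits (32 bytes)
print(len(h.hexdigest()))  # 64: each hex character encodes 4 bits
print(h.hexdigest())       # the value a database would store for "123456"
```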
You can see that this value bears no visible relationship to the password 123456, and there’s no way to reverse the process to return to that original password value.
During the login process, the password you enter is also hashed and compared to the hashed value that’s stored in the database. You can see that the hash for 1234567 is very different from the hash for 123456, and the hashes for qwerty and password are also very different values.
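The login check can be sketched as follows, assuming a hypothetical stored record that holds only the unsalted SHA-256 hash of 123456. The comparison uses `hmac.compare_digest` from the standard library, which compares in constant time so the check doesn’t leak timing information. The last lines count how many of the 256 bits differ between two nearly identical passwords.

```python
import hashlib
import hmac


def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical stored record: only the hash of the password "123456" is kept.
stored_hash = sha256_hex("123456")


def login(entered: str) -> bool:
    # Hash what the user typed and compare it with the stored hash.
    return hmac.compare_digest(sha256_hex(entered), stored_hash)


print(login("123456"))   # True
print(login("1234567"))  # False

# Similar or common passwords produce unrelated-looking hashes.
for pw in ["123456", "1234567", "qwerty", "password"]:
    print(pw, sha256_hex(pw)[:16], "...")

# Count how many of the 256 bits differ between two nearly identical inputs.
a = int(sha256_hex("123456"), 16)
b = int(sha256_hex("1234567"), 16)
print(bin(a ^ b).count("1"), "of 256 bits differ")
```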
If an attacker somehow obtained our database filled with these hashes, they could not read the original passwords from them, and they would have no easy way to reverse the hashes back into those original passwords.
To add even more randomization to these hashes that we’re creating, we would add some additional information during the hashing process. That additional information is called a salt. If you have multiple users storing passwords, every user is going to have a different salt associated with their account.
We combine that randomized salt with the password the user has chosen, and then store the resulting hash value in our database. This means that an attacker won’t be able to use rainbow tables to quickly determine what the original password might have been.
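A minimal sketch of salted password storage, under stated assumptions: a 16-byte random salt per account, prepended to the password before a single SHA-256 hash, and hypothetical helper names (`create_account`, `check_password`). The salt is not secret; it is commonly stored next to the hash so the login check can repeat the same calculation. Production systems typically use a deliberately slow function such as PBKDF2, bcrypt, scrypt, or Argon2 rather than one SHA-256 pass; SHA-256 is kept here to match this section.

```python
import hashlib
import hmac
import secrets


def hash_with_salt(password: str, salt: bytes) -> str:
    # Combine the per-user salt with the password before hashing.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def create_account(password: str) -> dict:
    salt = secrets.token_bytes(16)  # new random salt for every account
    # Store the salt alongside the hash; it is needed to verify logins later.
    return {"salt": salt.hex(), "hash": hash_with_salt(password, salt)}


def check_password(record: dict, entered: str) -> bool:
    salt = bytes.fromhex(record["salt"])
    return hmac.compare_digest(hash_with_salt(entered, salt), record["hash"])


record = create_account("dragon")
print(check_password(record, "dragon"))  # True
print(check_password(record, "Dragon"))  # False
```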
A rainbow table is a pre-computed set of hashes and their original values. If you use a salt during the hashing process to create more randomization, those pre-computed rainbow tables won’t be very useful. The attacker would instead have to fall back to a normal brute force attack to determine the original password. That is a much slower process than a rainbow table lookup, which might take only a matter of seconds.
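The effect of a salt on pre-computed attacks can be sketched with a plain dictionary of pre-computed hashes. This is a simplified stand-in: a real rainbow table stores hash chains built with reduction functions to save space, but it shares the same weakness shown here, since it only works for hashes computed without an unknown salt. The short password list is a hypothetical attacker wordlist.

```python
import hashlib
import secrets


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Attacker's pre-computed table: hash of each common password -> the password.
common = ["123456", "password", "qwerty", "dragon", "letmein"]
lookup = {sha256_hex(pw.encode()): pw for pw in common}

# Unsalted hash from a breached database: recovered instantly by lookup.
unsalted = sha256_hex(b"dragon")
print(lookup.get(unsalted))  # dragon

# Salted hash: the pre-computed table has no entry for it.
salt = secrets.token_bytes(16)
salted = sha256_hex(salt + b"dragon")
print(lookup.get(salted))    # None

# The attacker must now hash every guess again, combined with this specific salt.
cracked = next((pw for pw in common if sha256_hex(salt + pw.encode()) == salted), None)
print(cracked)               # dragon, found only by re-hashing each guess with the salt
```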
Here’s how a salt might change the hash value that we store in a database. Let’s say that our users all use exactly the same password on their accounts, and that password is dragon. You can see the hash value for dragon listed in our database. Our first user stores the password dragon, but with some random salt added to it. You can see that the stored hash value is very different from the hash value created by using the password dragon alone.
Every user that uses the password dragon has a different salt, and therefore a different hash stored in the database. If an attacker does gain access to this database, these will look like very different passwords. In reality, they’re exactly the same password with different random salts added. This makes brute forcing the original passwords a much longer process, because the attacker has to attack each user’s hash separately.
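The dragon example can be reproduced in a short sketch. As a swapped-in technique, it uses PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac` in the standard library), which repeats the hash many times so that each brute force guess costs more. The iteration count below is an illustrative assumption, not a recommendation, and the user names are hypothetical.

```python
import hashlib
import secrets

ITERATIONS = 100_000  # illustrative work factor; check current guidance before choosing one


def salted_hash(password: str, salt: bytes) -> str:
    # PBKDF2-HMAC-SHA256 combines the salt and password and iterates the hash.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS).hex()


plain = hashlib.sha256(b"dragon").hexdigest()
print("unsalted:", plain)

table = {}
for user in ["user1", "user2", "user3"]:
    salt = secrets.token_bytes(16)  # each user gets their own random salt
    table[user] = (salt.hex(), salted_hash("dragon", salt))
    print(user, table[user][1])

# Same password, yet every stored hash is different.
hashes = [h for _, h in table.values()]
print(len(set(hashes)))  # 3
```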