Hashing and Digital Signatures – CompTIA Security+ SY0-701 – 1.4

The hashing process can provide integrity, authentication, and non-repudiation. In this video, you’ll learn how hashes are created and how to use digital signatures.


A cryptographic hash is used to represent data as a short string of text. Sometimes you’ll hear this referred to as a message digest or a fingerprint. Just like our fingerprints that can represent us, a digital fingerprint can represent data that is being stored elsewhere. Keep in mind that this cryptographic hash is not encryption. You can’t somehow recreate the data if the only thing you have is the hash. For the same reason, you can’t recreate a person when all you have is their fingerprint.

In practical terms, we can use these hashes to verify that a document that we’ve downloaded matches the original document that was posted on a website. This provides us with integrity. We can also use these hashes during the process of creating a digital signature. And these digital signatures are used for authentication, non-repudiation, and integrity.

Let’s create some hashes. We’re going to use a very common hashing algorithm called the SHA256 hashing algorithm. This will produce 256 bits of information that we will represent as 64 hexadecimal characters. So let’s create a hash from a very simple text string. This text string says, “My name is Professor Messer.” And there’s a period at the end of that sentence. If we were to put this into an application to create a SHA256 hash from that sentence, we would get this long string of characters that you see right here.

Let’s now make one change to this sentence. This now says, “My name is Professor Messer!” Instead of ending in a period, it now ends in an exclamation mark, so there’s really only one character that’s been changed. This is a very common characteristic of hashing: make one minor change to the input text, and the output hash is completely different.
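If you’d like to try this yourself, here’s a minimal sketch using Python’s standard hashlib module. The helper name sha256_hex is just something made up for this example, and the text is assumed to be encoded as UTF-8 before hashing.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The two sentences differ only in their final character.
first = sha256_hex("My name is Professor Messer.")
second = sha256_hex("My name is Professor Messer!")

print(first)
print(second)

# Count how many of the 64 hex positions differ between the two digests.
differing = sum(a != b for a, b in zip(first, second))
print(f"{differing} of {len(first)} hex characters differ")
```

When you run this, you should see two 64-character strings that have very little in common, even though the inputs differ by a single character.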

One of the things we’d like to ensure when creating a hash is that the hashes are very different for all types of input. In practical use, we should never run into a situation where a hash is duplicated. If we’re putting different inputs into the hashing algorithm, we should expect to see different outputs as well. If, for some reason, we do have different inputs and those inputs create exactly the same hash value, then we’ve found a collision.

In practical use, you’re probably never going to run into one of these collisions. And your hashing algorithm should be created so that collisions are an extremely rare occurrence. Unfortunately, there have been hashing algorithms, through the years, that did have problems with collisions.

One good example of this is the hashing algorithm MD5. This collision problem was found in 1996. And because of that, we highly recommend that you use a different hashing algorithm than MD5.

Here’s how this MD5 collision works. Here, we have a string of input. This is text that we’re going to put into a hashing algorithm. And we’re going to take another string of text that’s almost the same. You can see these almost match up.

But every place there is a red character means there’s a slight difference between each of these inputs. But if we take both of those inputs and put them into the MD5 algorithm, we get exactly the same hash. This is a collision. And this is the reason we no longer recommend using MD5 as a hashing algorithm.
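To get a feel for what a collision looks like without needing the actual MD5 inputs, here’s a rough sketch that deliberately weakens a hash. The function tiny_hash is hypothetical: it keeps only the first 4 hexadecimal characters (16 bits) of a SHA-256 digest, which leaves so few possible outputs that two different inputs will share one very quickly. This is not how the MD5 collision was found. It only illustrates the idea of two inputs producing one hash.

```python
import hashlib


def tiny_hash(data: bytes) -> str:
    """Hypothetical weak hash: only the first 4 hex characters of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:4]


def find_collision() -> tuple[bytes, bytes]:
    """Try inputs one by one until two different inputs share a tiny_hash."""
    seen: dict[str, bytes] = {}
    counter = 0
    while True:
        candidate = f"message-{counter}".encode("utf-8")
        digest = tiny_hash(candidate)
        if digest in seen:
            # A different, earlier input already produced this digest.
            return seen[digest], candidate
        seen[digest] = candidate
        counter += 1


first, second = find_collision()
print(first, second, tiny_hash(first))
```

With only 65,536 possible outputs, the search is guaranteed to finish. With the full 256 bits of SHA-256, the same search is not expected to find anything in any practical amount of time.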

We use hashing for many different purposes. And you might run into hashing multiple times through a normal workday. For example, you may need to verify that a file that you’ve downloaded matches the file that happens to be posted on a website.

You’ll often see this on sites where you’re downloading very important files like a Linux distribution. And you can see that each distribution has been associated with a particular hash. This means that you can download the ISO file, run the same hashing algorithm on the file you downloaded, and compare it to the hash that’s posted on the website. If your hash matches the one that’s on the website, then you’ve downloaded the same file that exists on that site.
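Here’s a minimal sketch of that download check, assuming the site publishes a SHA-256 hash as hexadecimal text. The function names and the temporary demo file are made up for this example. With a real download, you’d pass in the path to the ISO and the hash copied from the website.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a large ISO never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's hash to the hash posted on the website."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, published_hex.strip().lower())


# Demo with a small stand-in file instead of a real Linux distribution.
contents = b"pretend this is a Linux ISO"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)
    demo_path = tmp.name

published = hashlib.sha256(contents).hexdigest()
ok = matches_published_hash(demo_path, published)
bad = matches_published_hash(demo_path, "0" * 64)
os.remove(demo_path)

print("matches published hash:", ok)
print("matches a wrong hash:", bad)
```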

Another common use of hashing is to store passwords. Ideally, we would never store someone’s password in plain text. And we would not encrypt passwords, because then someone could potentially decrypt them and gain access to your passwords. Instead, we store a hash of each password. In reality, we hash the password along with a little extra information called a salt, and the result is a salted hash. This way, we’re able to store everyone’s password as a hash, which means we have no idea what the actual password might be.

During the login process, the password you input is hashed and compared to the hash that’s stored on the server. If they match, you’ve gained access to that system. I mentioned earlier that when we’re storing passwords, we might want to add some additional information to make the hashes more difficult to brute force. We refer to this extra information as a salt. This is random information that we add during the hashing process to modify or randomize the resulting hash.
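The store-and-compare login process might look something like this sketch. The record layout and the function names are assumptions made for illustration, and a single round of SHA-256 is used only to keep the example short. Real systems generally use a deliberately slow password hashing function instead.

```python
import hashlib
import hmac
import secrets


def hash_password(password: str, salt: bytes) -> str:
    """Hash the salt followed by the password (illustration only)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def create_record(password: str) -> dict:
    """What the server stores: a random salt and the salted hash, never the password."""
    salt = secrets.token_bytes(16)
    return {"salt": salt, "hash": hash_password(password, salt)}


def check_login(record: dict, attempt: str) -> bool:
    """Hash the attempt with the stored salt and compare it to the stored hash."""
    candidate = hash_password(attempt, record["salt"])
    return hmac.compare_digest(record["hash"], candidate)


record = create_record("dragon")
print(check_login(record, "dragon"))   # correct password
print(check_login(record, "dragon1"))  # wrong password
```

Notice that the server needs to keep the salt alongside the hash, because the same salt has to be added again when checking each login attempt.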

Every user gets a different random salt to go along with their password, which means that even if everyone’s using the same password, we’ll still see very different hashes stored for every single user. There’s a technique for reverse engineering hashes called a rainbow table. This is a precomputed set of possible inputs and the hashes associated with those inputs. This makes it very easy for someone to take a non-salted hash and very quickly determine what the original password might be.

But if you’re adding a random salt to everyone’s password, these rainbow tables will no longer work. The attacker has to fall back to a brute force attack on each password, and that certainly slows things down. A rainbow table can find this information in a matter of seconds. But brute forcing can take days, weeks, or even longer to find someone’s password.
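Here’s a simplified sketch of why a precomputed table stops working once a salt is involved. This is a plain lookup dictionary rather than a true rainbow table, which uses a more compact chained structure, and the short password list is made up for this example.

```python
import hashlib
import secrets


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical list of common passwords the attacker hashed ahead of time.
COMMON_PASSWORDS = ["123456", "password", "dragon", "qwerty", "letmein"]
lookup = {sha256_hex(p.encode("utf-8")): p for p in COMMON_PASSWORDS}

# An unsalted hash is found in the precomputed table immediately.
stolen_unsalted = sha256_hex(b"dragon")
recovered_unsalted = lookup.get(stolen_unsalted)

# The same password hashed with a random salt is not in the table.
salt = secrets.token_bytes(16)
stolen_salted = sha256_hex(salt + b"dragon")
recovered_salted = lookup.get(stolen_salted)

print("unsalted lookup:", recovered_unsalted)
print("salted lookup:", recovered_salted)
```

To recover the salted password, the attacker would have to hash every guess again with that particular user’s salt, which is the brute force work the table was supposed to avoid.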

Let’s take some user passwords. We’ll add some salt to each password. And let’s see what the resulting hash looks like.

Let’s take the password of “dragon.” If we’re not using any salting, this is the hash that results from that password of dragon. But now, let’s add some additional random text onto this password of dragon. As we add a different random salt for each one of these, you can see that we have a very different hash that we’re storing. If someone were to gain access to our hashed database, they would think that there were five different passwords being used, when in reality, there’s a single password with a number of different salts added to that password.
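You can reproduce the idea of this example with a short sketch. The salts here are generated randomly on every run, so the values you see will not match any particular slide, and placing the salt in front of the password is just one assumed way of combining them.

```python
import hashlib
import secrets

password = "dragon"

# The hash with no salt at all is the same every time.
unsalted = hashlib.sha256(password.encode("utf-8")).hexdigest()
print("no salt ", unsalted)

# Five users, one password, five random salts.
salted_hashes = []
for _ in range(5):
    salt = secrets.token_hex(8)  # 16 random hex characters
    digest = hashlib.sha256((salt + password).encode("utf-8")).hexdigest()
    salted_hashes.append((salt, digest))
    print(salt, digest)
```

Every line of output shows a different hash, even though the password behind each one is the same.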

Hashes are also used during the process of creating a digital signature. A digital signature is very similar to a signature you might use on any other document. But this digital version proves that the message you received was not changed while it was being sent to you.

From that perspective, digital signatures provide integrity. The digital signature will also help you prove the source of the message. This provides authentication. And because anyone can use the digital signature to confirm that the message really was sent by the person who signed it, the sender can’t later deny having sent it. That’s referred to as non-repudiation.

The process for creating a digital signature is almost the opposite of encrypting data. For a digital signature, the person signing the document will use their private key to create the digital signature. When that signature is sent to another party, they’re able to confirm that the private key was used by verifying the signature with the public key for that user. If we receive a digital signature, go through the verification process, and find that the public key of the sender is not able to verify the digital signature, then either something in that document has changed or it wasn’t signed with the sender’s private key. Either way, we can no longer trust the information that we’ve received.

If you’re using a digital signature process built into your email system, or you’re using a third party utility to provide digital signatures, then it’s as simple as clicking a button or checkbox to include a digital signature with the information that you’re sending. But behind the scenes, there’s a great deal of cryptography that’s going on. Let’s step through the process of creating a digital signature so that you can see what happens when you select that checkbox.

We’ll start with Alice, who would like to send a message to Bob that says, “You’re hired, Bob.” We refer to this original message as the plaintext. Alice is going to click the checkbox that tells her email program to include a digital signature with this email message. Behind the scenes, the email client is going to look at the plaintext of, “You’re hired, Bob,” and send it through a hashing algorithm to create a hash of that plaintext. The email application is then going to encrypt that hash with Alice’s private key.

And since Alice is the only one that has her private key, she’s the only one that could have created this final digital signature. Just like a handwritten signature is a bit of information you add to the end of a document, we can do exactly the same thing with this email. So, “You’re hired, Bob,” is still sent through the network in plaintext. We’re not doing any type of encryption of the message itself in this specific example.

But we do include the digital signature, usually as an attachment or at the end of the email. Bob now checks his email. And he’s got a message from Alice that says, “You’re hired, Bob,” and it includes that same digital signature. Now, Bob wants to verify that the message he received is really the message that was originally sent. And he wants to confirm that it really came from Alice.

The first thing he’s going to do is load that message into his email client. And generally, the email client will recognize there’s a digital signature and will perform a verification and tell Bob that this is either verified or not verified. Behind the scenes, what’s really happening is that the email client is looking at the digital signature. And it decrypts that digital signature using Alice’s public key.

Remember that the keys are mathematically related. So if you encrypt with one key, you can decrypt with the other. The result of this decryption ends up being the hash of the original plaintext. Now Bob’s email client simply performs the same hash on the plaintext it received to see what the results are. And if both of those hashes match, then we know that the digital signature verifies, and that not only is the document exactly what was originally sent, but we can confirm that it really came from Alice.
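The whole Alice and Bob exchange can be sketched in a few lines using textbook RSA math. Please treat this strictly as a toy: the key below is an assumption built from two publicly known Mersenne primes, so it provides no security at all, and real signature schemes add padding and other protections that are left out here. It exists only to show the hash, the private key operation, and the public key check fitting together.

```python
import hashlib

# TOY KEY, NOT SECURE: both primes are publicly known Mersenne primes.
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q                           # public modulus (part of Alice's public key)
E = 65537                           # public exponent (part of Alice's public key)
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Alice's private key)


def message_hash(message: str) -> int:
    """SHA-256 hash of the plaintext, as an integer smaller than N."""
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def sign(message: str) -> int:
    """Alice: hash the plaintext, then transform the hash with the private key."""
    return pow(message_hash(message), D, N)


def verify(message: str, signature: int) -> bool:
    """Bob: undo the signature with the public key and compare the two hashes."""
    recovered_hash = pow(signature, E, N)
    return recovered_hash == message_hash(message)


plaintext = "You're hired, Bob"
signature = sign(plaintext)

print(verify(plaintext, signature))            # untouched message
print(verify("You're fired, Bob", signature))  # message changed in transit
```

The first check succeeds because the recovered hash matches the hash of the message Bob received. The second fails because changing even one word changes the hash, so the two values no longer agree.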