Protecting Data – CompTIA Security+ SY0-701 – 3.3

Protecting data can take many different forms. In this video, you’ll learn about geographic restrictions, encryption, hashing, obfuscation, tokenization, and more.


One way to protect data is to make policy decisions on where the data is located and where you, as the user, might be located. We refer to these as geographic restrictions. One way to tell where someone may be located is to see what subnet they might be connected to using their IP address.

This might be relatively accurate if someone’s connected to an internal private network where we know exactly where those subnets might be. But it’s much more difficult to determine the location of a wireless device based on an IP subnet, because those mobile phones and tablets could be located virtually anywhere.
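As a rough sketch of that subnet check, here’s how it might look in Python using the standard ipaddress module. The subnet ranges and site names are made up for illustration, and locate_by_subnet is a hypothetical helper rather than part of any product.

```python
import ipaddress
from typing import Optional

# Hypothetical map of internal subnets to physical sites (assumed values).
SUBNET_LOCATIONS = {
    ipaddress.ip_network("10.1.0.0/16"): "Headquarters",
    ipaddress.ip_network("10.2.0.0/16"): "Branch office",
}


def locate_by_subnet(ip: str) -> Optional[str]:
    """Return the site whose subnet contains this address, or None if unknown."""
    address = ipaddress.ip_address(ip)
    for network, site in SUBNET_LOCATIONS.items():
        if address in network:
            return site
    return None


print(locate_by_subnet("10.1.42.7"))    # an address on a known internal subnet
print(locate_by_subnet("203.0.113.9"))  # an address outside every known subnet
```

An address outside the known subnets returns None, which is the case where the additional geolocation checks would be needed.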

For those devices, we might want to use geolocation to provide some additional checks of where a person might be. Geolocation can work with GPS, especially on mobile devices, to get a very accurate description of where a person might be. You might also use 802.11 wireless.

There are wireless databases that know all of the SSIDs of the wireless networks in your area. And if that’s cross-referenced against the list of SSIDs that your mobile device can see, we can get a relatively accurate representation of where you may be located. And as we’ve mentioned with mobile devices like these, an IP address may not be the most accurate way to provide geolocation.

Once you know where a person is and where your data is, you can start making decisions on what type of access someone might have to that data. When you’re basing this on location, we refer to this as Geofencing. For example, there may be certain types of data that should only be accessed if someone is inside one of the corporate facilities. And if they’re outside the facility, they should have no access to that data.

We can determine whether or not somebody is located inside of a building based on the network location or the geolocation, and then we can associate Geofencing policies with that location.
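A geofencing decision like this can be sketched in a few lines of Python. This is only a minimal illustration: the facility coordinates and the 200 meter radius are assumed values, and a real policy engine would also consider network location and other signals.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

# Hypothetical corporate facility and fence size (assumed values).
FACILITY = (40.0, -75.0)  # latitude, longitude in degrees
FENCE_RADIUS_M = 200


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(lat, lon):
    """True if the reported position falls within the facility's fence."""
    return distance_m(lat, lon, *FACILITY) <= FENCE_RADIUS_M


def can_access_restricted_data(lat, lon):
    """Policy from the text: access only from inside the facility."""
    return inside_geofence(lat, lon)


print(can_access_restricted_data(40.001, -75.0))  # roughly 111 m from the facility
print(can_access_restricted_data(40.01, -75.0))   # roughly 1.1 km from the facility
```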

Protecting data is one of the primary goals of any IT professional. If your organization loses its data, it will effectively be out of business. One of the challenges around the management and protection of this data is that there is a lot of data, and it’s located in many different places.

It could be on a storage drive inside of a laptop or a mobile device. It might be traversing a network, either a private network or the public internet, or it might be in use inside of a machine, in the CPU and memory.

Any of these locations could be targeted in an attack against your data, so we need to make sure that we’re using the proper amount of data protection regardless of where this data might be. So a good security administrator might apply some encryption to the data if possible, or associate security policies with the data as it’s traversing the network.

You can also set individual permissions on the data so that you would only allow access to the data by those people who may be authorized.

One common way to protect data is by encrypting the data. That encryption process is one that takes data that is open or in the clear and makes it into a form that is completely unreadable.

We refer to this in-the-clear data as plaintext, and once we encrypt the data, the resulting encrypted data is referred to as ciphertext. If you’re using technology that can encrypt this data into something unreadable, there’s also a method to decrypt the data back into its original form.

You need to make sure that you’re using the proper implementation to be able to both encrypt and decrypt this data, and then you also have to make sure that you have the proper decryption key.

One of the things you’ll notice when you encrypt data is that the resulting encryption is very different than the original data. We refer to this difference as confusion, where your encrypted data has a dramatic change over what the original data appeared to be. For example, let’s look at some data that we can encrypt.

Let’s take the simple sentence, Hello, world. And I’d like to provide some encryption of that data. So I’ll use PGP as my encryption method, in this example, and it creates a PGP message with all of this information on the inside. This message contains information that helps us with the decryption process and it contains an encrypted version of our plaintext, Hello, world. It’s fair to say that if we sent this information to a third party, they would have no idea what was contained within that ciphertext.
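The PGP output itself isn’t reproduced in this text. As a much simpler stand-in, here’s a toy sketch of the plaintext-to-ciphertext round trip using a one-time pad built on XOR and only the Python standard library. This is not PGP and isn’t meant for real-world use; xor_bytes is a hypothetical helper name, and the random key is generated just for this example.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (one-time pad)."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"Hello, world"

# A random key as long as the message, to be used only once.
key = secrets.token_bytes(len(plaintext))

ciphertext = xor_bytes(plaintext, key)  # encrypt
recovered = xor_bytes(ciphertext, key)  # decrypt with the same key

print(ciphertext.hex())  # unreadable bytes; different on every run
print(recovered)         # the original plaintext again
```

Without the key, the ciphertext gives a third party nothing to work with. With the key, the same operation brings the plaintext back.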

Another way to protect data is by using hashing. A hash is a way to represent data as a string of text. This is sometimes referred to as a message digest or a fingerprint. And a fingerprint is probably a good description of what a hash is. Just as you can take a fingerprint and tie that fingerprint back to an individual, you can do the same thing with data.

Although a fingerprint can be tied back to a particular individual, we can’t somehow recreate an entire person if all we have is their fingerprint. The same thing applies to hashing. If we have a hash, we can’t somehow recreate the original data that was used to produce that hash.

Because you can’t recreate the original plaintext from a hash, it makes a perfect mechanism for storing certain types of data. For example, we commonly store passwords using a hashing function. You might also see hashes used when you’re on a page to download a file. For example, if you try to download a Linux distribution, it’s very common that they would include a hash right next to the distribution. This allows you to download the file, run your own hashing algorithm against that file, and see if it matches the hash that’s listed online.

If both of those match, then the integrity of this file has been maintained all the way through the download process. We can also use hashing in conjunction with public key cryptography to create digital signatures. This allows us to authenticate the sender of a particular message, ensuring that the person who sent the message really is who they say they are, and it provides integrity so we can be sure that what we’ve received is really what the original author sent.
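The download check described above can be sketched with Python’s hashlib module. This assumes the site publishes a SHA-256 hash in hexadecimal; the function names and the file name in the usage comment are placeholders.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare our own hash of the file with the hash listed on the download page."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())


# Usage (placeholder file name and hash value):
# ok = matches_published_hash("distribution.iso", "<hash copied from the site>")
```

If the function returns False, the file changed somewhere between the publisher and you, and it shouldn’t be trusted.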

One general rule of hashing is that if you have two different inputs and you provide a hash of each of those inputs, the two outputs should be very different. If we do happen to have two different inputs that create exactly the same hash output, then we have created a collision. And in the past we have stopped using certain hashing algorithms that tend to have problems with collisions, and instead have opted to use newer or better hashing algorithms.

Here’s an example of how this hashing algorithm should work. Let’s use an algorithm called the SHA-256 hashing algorithm. It provides us with a hashing output of 256 bits, or what we represent on the screen as 64 hexadecimal characters. Here’s our first input: a sentence that says my name is Professor Messer, and it ends in a period. And if we use the SHA-256 hashing algorithm, we get the output that you see here.

Now, let’s change one character in that sentence and see what the resulting hash might be. The sentence now reads my name is Professor Messer but it ends in an exclamation mark, instead of a period, and you’ll notice that the SHA-256 hash output is very different than the one that was created previously.
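You can try the same comparison yourself with Python’s hashlib. The exact capitalization of the sentence used on screen is an assumption here, so these hashes may not match the ones in the video, but the effect is the same: one changed character produces a very different output.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of the text as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("My name is Professor Messer.")
second = sha256_hex("My name is Professor Messer!")

# Count how many of the 256 output bits differ between the two hashes.
differing_bits = bin(int(first, 16) ^ int(second, 16)).count("1")

print(first)
print(second)
print(differing_bits, "of 256 bits differ")
```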

This comparison between these two sentences shows that these hashing algorithms can really help us protect data because we’re able to create very different hashing output, even though the input may be very similar.

Another security technique that helps protect data is obfuscation. This is a process that takes something that is perfectly understandable and turns it into something that’s very difficult for humans to recognize.

For example, a developer could write some code and it’s something that’s very easy to follow and understand as you step through the code base, but some developers will run that code through an obfuscation process and then give you the obfuscated code. This helps them protect their code base but still allows them to work with the original code. Both the original code and the obfuscated code work exactly the same way. It’s just that one is much easier to read than the other.
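Here’s a small sketch of that idea in Python rather than PHP. It uses Base64 encoding as a very simple obfuscation technique, which is an assumption for illustration; real obfuscators are far more elaborate, and this one is trivial to reverse.

```python
import base64

# The readable, original one-line program.
original_source = 'print("Hello, world")'

# Hide the source inside a Base64 string that is decoded and run at execution time.
encoded = base64.b64encode(original_source.encode("utf-8")).decode("ascii")
obfuscated_source = (
    f'import base64;exec(base64.b64decode("{encoded}").decode("utf-8"))'
)

print(obfuscated_source)     # much harder to read than the original

exec(original_source, {})    # prints Hello, world
exec(obfuscated_source, {})  # prints Hello, world as well
```

Both versions behave identically when run. Only the readability of the source has changed.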

You might also see attackers use obfuscation to hide exactly what’s going on within some malicious code or malicious script. This is also why you might see tools that are able to deobfuscate certain types of code, so that you can better understand what’s happening in that code base.

Here’s one line of code. This code says echo and then inside of quotes it has Hello, world. We can see from this very simple line of PHP that this is simply going to put the words Hello, world on our screen. If you applied an obfuscation technique to that one line of code you can create a script that looks like this.

This is valid PHP code that has taken our original one line of PHP and obfuscated it into something very different. Just so you can see that this obfuscation does really work, I pasted it into a PHP shell and you can see that it produced the output, Hello, world, which is exactly the same as that one line of PHP that we were looking at earlier.

Another type of obfuscation is data masking. Masking takes some original data and then hides some of that data to help protect it. Normally, this is used to protect personally identifiable information or perhaps sensitive financial details.

For example, if you look at a receipt where you’ve paid for something with a credit card, you’ll notice that it has a bank card on the receipt, but it may only show the last four digits of that bank card. Instead, the rest of the card is now hidden or masked, in many cases with asterisks. Of course, behind the scenes, the company that processed the credit card can see the entire credit card number, but anything that is presented to you or to the end user would be masked or obfuscated with those asterisks.
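A receipt-style mask like that is straightforward to sketch in Python. The function name and the sample card number (a commonly used test number) are only for illustration.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide every digit except the last few, as a printed receipt would."""
    digits = [c for c in card_number if c.isdigit()]  # ignore spaces and dashes
    hidden = max(len(digits) - visible, 0)
    return mask_char * hidden + "".join(digits[hidden:])


print(mask_card_number("4111 1111 1111 1111"))  # ************1111
```

The full number still exists in the payment processor’s systems. Only the copy presented to the end user is masked.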

There are many different techniques that can be used to mask data. Some of this information may be shuffled around, or encrypted, or it may be simply masking out with asterisks like we see here.

If you’ve ever paid for something in a store and you’ve used your phone or your smartwatch as a payment method, then you’ve probably used a data protection method referred to as tokenization. Tokenization takes sensitive information and replaces that information with a token. For example, you could take a Social Security number that you see here, and that number is changed, so that anything sent across the network has a completely different set of numbers than the original, much more sensitive Social Security number.

If you’re paying for something in person with your phone or your smartwatch, then you’re using tokenization for the credit card number. This sends a temporary token during the payment process instead of transmitting your actual credit card number. That token is a one-time use token. So if somebody does capture that information being sent across the network, they wouldn’t be able to replay or reuse that token because once it’s used the first time, it can never be used again.

What’s interesting about this technique is that there are no encryption algorithms involved. You don’t have to worry about hashing any data. You’re simply sending information across the network the way that you would send any other type of information. The difference, of course, is that the data that you’re sending across the network and your original credit card number are two very different values and you aren’t able to derive one number from the other.

When you first set up a credit card on your mobile phone to be used for checkout, you go through a registration process on your local phone and then those details are sent to a remote token service server. Once the server receives your credit card number, it creates a series of tokens that are then sent down to your phone so that you can use them during checkout. When you then check out at a store and use your phone or your smartwatch using near-field communication, it will transfer one of those tokens to the store’s system.

The store will then use that token in place of the credit card number and send it for validation to the remote token service server. That server knows which tokens have been issued to you, confirms that this is a correct token, and validates it so that it can be used to purchase products at that store.
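The registration and checkout flow above can be modeled with a toy token service in Python. Everything here is a simplified assumption: the TokenService class, its method names, and the token format are hypothetical, and real payment tokenization involves far more than this sketch.

```python
import secrets


class TokenService:
    """Toy stand-in for the remote token service server."""

    def __init__(self):
        self._unused_tokens = {}  # token -> real card number

    def register_card(self, card_number, count=3):
        """Create random one-time tokens for a card and return them to the phone."""
        tokens = []
        for _ in range(count):
            token = secrets.token_hex(8)  # random value, not derived from the card
            self._unused_tokens[token] = card_number
            tokens.append(token)
        return tokens

    def redeem(self, token):
        """Validate a token from a store. Each token works only once."""
        return self._unused_tokens.pop(token, None)


service = TokenService()
phone_tokens = service.register_card("4111111111111111")

token = phone_tokens.pop()                 # the phone presents one token at checkout
print(service.redeem(token) is not None)   # first use is accepted
print(service.redeem(token) is not None)   # a replayed token is rejected
```

Because the tokens are random, there’s nothing in a captured token that leads back to the card number, and a replayed token has already been removed from the server’s list.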

You might have previously heard news stories of enormous data breaches with very large corporations. Some of those breaches contain a very large amount of data for every single one of the customers of that company. And in many of those cases, the attackers were able to get such a large amount of data because it was all collected in one single database. This means once the attacker gets on the inside, they effectively have access to all of the data and they can transfer all of that outside the organization, and use it as they see fit.

Instead of having all of your data in one large database, you might want to consider using segmentation. This will separate the data into smaller pieces and put it into different locations. This will now make it much harder for an attacker, because they would need to breach each individual database to gain access to all of the data. This might also allow you to set different security for these different databases depending on the information that’s contained within.
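As a rough illustration of segmentation, here’s a Python sketch that splits one customer record into separate stores. The field names, segment names, and sample record are all made up, and in practice each segment would live in its own database with its own security controls.

```python
# Hypothetical mapping of segments to the fields each one is allowed to hold.
SEGMENTS = {
    "directory": {"name"},
    "billing": {"card_number"},
    "health": {"diagnosis_code"},
}


def segment_record(record, key_field="customer_id"):
    """Split one record into per-segment pieces that share only the key."""
    pieces = {}
    for segment, fields in SEGMENTS.items():
        piece = {key_field: record[key_field]}
        for field in fields:
            if field in record:
                piece[field] = record[field]
        pieces[segment] = piece
    return pieces


record = {
    "customer_id": 1001,
    "name": "Pat Example",
    "card_number": "4111111111111111",
    "diagnosis_code": "J10",
}

for segment, piece in segment_record(record).items():
    print(segment, piece)
```

An attacker who breaches only the directory store would get names and customer IDs, but none of the financial or health details.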

For example, if a database simply contains someone’s name, you could put a minimum amount of security on that particular database. But if the database contains health care information or financial details, you might want to include additional security checks to protect that database from any attacker.

A type of data protection that we use every day is permission restrictions.

This means when you provide your username and password, there’s a series of rights and permissions that are associated with that particular account. This starts with the authentication process itself. We need to be sure that we have safe and secure authentication. So we should have minimum password policies.

Maybe there are additional authentication factors that have to be used during the login process, and you might have additional checks and balances that occur when someone initially logs in. Once someone logs in, there may be additional security associated with that account. For example, there may be groups or file permissions that limit the type of data that particular user has access to once they log in.
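A group-based permission check like the one described here might look something like this in Python. The users, groups, and file names are hypothetical, and a real operating system or directory service enforces this for you.

```python
# Hypothetical group permissions: group -> resource -> allowed actions.
GROUP_PERMISSIONS = {
    "finance": {"payroll.xlsx": {"read", "write"}},
    "staff": {"handbook.pdf": {"read"}},
}

# Hypothetical group memberships for authenticated users.
USER_GROUPS = {
    "alice": {"finance", "staff"},
    "bob": {"staff"},
}


def is_allowed(user, resource, action):
    """Allow the action only if one of the user's groups grants it."""
    for group in USER_GROUPS.get(user, set()):
        if action in GROUP_PERMISSIONS.get(group, {}).get(resource, set()):
            return True
    return False  # default deny: no matching permission was found


print(is_allowed("alice", "payroll.xlsx", "write"))  # alice is in finance
print(is_allowed("bob", "payroll.xlsx", "read"))     # bob is only in staff
```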