There are many techniques that can be used to keep data private. In this video, you’ll learn about tokenization, data minimization, data masking, and anonymization techniques.
Application developers have many techniques they can use to help keep our data safe. And in this video, we’ll look at ways of enhancing your privacy.
One way to use your personal data, without actually using your personal data, is through the use of tokenization. This is when we take data that normally would be sensitive, and we replace it with a completely different bit of data that we call a token.
For example, if you have a Social Security number that is 266-12-1112, we can store that data in a database and display it on the screen as 691-61-8539. We’re able to tie the original Social Security number back to this new tokenized version because we have a single database that matches those up, and that database is one that’s kept relatively private.
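To make that matching database concrete, here’s a minimal sketch of a token vault. The `TokenVault` class, its method names, and the sample number are made up for illustration, and a real vault would be a hardened, access-controlled data store rather than a Python dictionary.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault that maps tokens back to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, ssn: str) -> str:
        # Reuse the existing token so one value always maps to one token.
        if ssn in self._value_to_token:
            return self._value_to_token[ssn]
        while True:
            # Random digits: the token is not computed from the original value.
            digits = "".join(secrets.choice("0123456789") for _ in range(9))
            token = f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
            if token != ssn and token not in self._token_to_value:
                break
        self._token_to_value[token] = ssn
        self._value_to_token[ssn] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can tie a token back to the original value.
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("123-45-6789")  # made-up sample value
print(token)                           # random each run, same 3-2-4 layout
print(vault.detokenize(token))         # the original value, from the lookup
```

Because the token is random, the only way back to the original value is the lookup table, which is why that table is the part that has to stay private.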
You may use tokenization many times a day and not even realize it. It’s commonly used with credit card processing, especially if you’re using your mobile phone or your smartwatch to be able to pay for goods at checkout.
This Apple Pay, or Google Pay, process doesn’t actually send your credit card number over this NFC connection and through the network. Instead, it uses a token of your credit card number, which means if somebody were to capture that token for this single transaction, they would not be able to use it to purchase anything else.
It’s important to understand that this isn’t hashing, and it’s not encryption. There’s no way to derive your original credit card number from the tokenized version of your credit card number, because there is no mathematical relationship between the two. And because there’s no encryption, we don’t have to worry about processing, CPU, memory, or any other type of overhead.
Here’s how tokenization works on your mobile phone. You first add your credit card number to your mobile phone, and once that’s registered, your mobile phone communicates with a Remote Token Service Server. When that server receives the request, it sends back to your phone a token that will be used instead of your credit card number. That token is then used at a store when you check out using near field communication, at which point the store’s Merchant Payment Processing Server sends a validation request to that Token Service Server. The Token Service Server confirms that the token is legitimate, and the transaction is approved by the Payment Processing Server.
This token may change every time you perform a transaction, which means that if somebody were to find this token and try to use it for future transactions, they would find that those transactions would not be approved.
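The flow between the phone, the merchant, and the token service can be sketched in a few lines. This is only an illustration under simple assumptions: the `TokenService` class and its methods are hypothetical, tokens are single-use, and the card number is a common test value rather than a real account.

```python
import secrets


class TokenService:
    """Hypothetical stand-in for the remote token service server."""

    def __init__(self):
        self._active_tokens = {}  # token -> card number, known only to the service

    def issue_token(self, card_number: str) -> str:
        # The phone asks for a token to use instead of the card number.
        token = secrets.token_hex(8)
        self._active_tokens[token] = card_number
        return token

    def validate(self, token: str) -> bool:
        # The merchant's payment processor asks whether the token is legitimate.
        # pop() retires the token, so a captured token fails when it is replayed.
        return self._active_tokens.pop(token, None) is not None


service = TokenService()
token = service.issue_token("4111 1111 1111 1111")  # test number, not a real card

print(service.validate(token))  # True: the first use is approved
print(service.validate(token))  # False: the same token is rejected afterward
```

The merchant only ever sees the token. The card number stays with the token service.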
Another way to enhance privacy is through the use of data minimization. This means that we would only collect data that would be used to perform the needed function. If you look at HIPAA regulations, you’ll see that it has a minimum necessary rule. And in the European Union, with GDPR, you have “personal data shall be adequate, relevant, and not excessive in relation to the purpose, or purposes, for which they are processed.” Which in lawyer speak means that we’re only going to collect the information that’s needed.
This means if you’re on a registration page, or you’re paying for something online, it might ask you for a telephone number or address. And the question would be, is this information required to be able to perform this transaction? If the credit card verification doesn’t require this information, then we may have the option to remove that from the checkout page.
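One way an application could enforce this is to keep only the fields the transaction actually needs and never store the rest. In this sketch the field names and the `REQUIRED_FIELDS` set are assumptions chosen for illustration; which fields are truly required depends on the payment processor.

```python
# Assumed set of fields needed to complete the purchase (illustrative only).
REQUIRED_FIELDS = {"card_number", "expiration", "postal_code"}


def minimize(form_data: dict) -> dict:
    """Keep only the required fields and drop everything else before storing."""
    return {key: value for key, value in form_data.items() if key in REQUIRED_FIELDS}


submitted = {
    "card_number": "4111 1111 1111 1111",  # test number, not a real card
    "expiration": "12/30",
    "postal_code": "00000",
    "telephone": "555-0100",
    "address": "1 Example Street",
}

print(minimize(submitted))
# {'card_number': '4111 1111 1111 1111', 'expiration': '12/30', 'postal_code': '00000'}
```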
If you work for an organization, you should first only have access to the data that’s necessary to perform your task. And you should also be restricted from browsing through data that has nothing to do with your current tasks at hand.
There are many hospital workers who have been fired for looking through medical records, which normally they would have access to, but in this case they were looking through the medical records of a celebrity who might have been checked into the hospital.
One way to protect data is to simply not display it. This is called data masking, and it’s a way to obfuscate data in a way that shows the data exists, but doesn’t allow you to see any of it. This can protect your personally identifiable information, your financial details, or anything else that might be sensitive.
It may be that the information that we’re viewing does exist in its complete form inside the database, but only part of that data is displayed to us on the screen, or on a piece of paper. Exactly what’s displayed on the screen, or on the piece of paper, is usually controlled by the permissions of the person using that application.
There are many different techniques for masking data. You can shift data from one place to the other, or shuffle numbers around. Or in the case of credit card receipts, we can mask out the data with some asterisks and only show the last few numbers of the credit card number.
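The asterisk style of masking used on receipts is simple to sketch. The `mask_card` function and its `visible` parameter are hypothetical names, and the card number is a common test value.

```python
def mask_card(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with asterisks."""
    digits = [ch for ch in card_number if ch.isdigit()]  # ignore spaces and dashes
    masked_count = max(len(digits) - visible, 0)
    return "*" * masked_count + "".join(digits[masked_count:])


print(mask_card("4111 1111 1111 1111"))  # ************1111
```

The full number still exists wherever it is stored. Only the displayed copy is masked.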
There may be times when the data is protected by not displaying anything associated with that data, and we refer to that as anonymization. This is when we take existing data, and we make it impossible to identify anything associated with the original data that was saved. There are many different ways to anonymize this data. We could hash the data so it would be unreadable, or we could use masking techniques to put asterisks in place of the actual data.
We can also anonymize some data but leave other pieces of data in place, especially if we want to perform some type of analysis. So if you want to analyze customer purchase information, you could remove all of the customer details, such as their name, their address, and their phone number, but leave the information that was purchased. So you can analyze what products were purchased, how many, what the total was, and what the sale date was.
This way, you could perform your analysis of product sales, but leave customer data completely private.
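As a rough sketch of that idea, the customer fields can be dropped from each record before the analysis runs. The field names and the sample purchases below are invented for illustration.

```python
from collections import Counter

# Fields that identify the customer (assumed names for this example).
CUSTOMER_FIELDS = {"name", "address", "phone"}


def anonymize(purchase: dict) -> dict:
    """Return a copy of the purchase record without the customer details."""
    return {key: value for key, value in purchase.items() if key not in CUSTOMER_FIELDS}


def units_per_product(purchases: list) -> Counter:
    """Total the quantity sold for each product."""
    totals = Counter()
    for purchase in purchases:
        totals[purchase["product"]] += purchase["quantity"]
    return totals


purchases = [
    {"name": "Customer A", "address": "1 Example St", "phone": "555-0100",
     "product": "widget", "quantity": 2, "total": 19.98, "sale_date": "2024-01-05"},
    {"name": "Customer B", "address": "2 Example St", "phone": "555-0101",
     "product": "widget", "quantity": 1, "total": 9.99, "sale_date": "2024-01-06"},
]

anonymized = [anonymize(p) for p in purchases]
print(anonymized[0])                  # product, quantity, total, and sale_date only
print(units_per_product(anonymized))  # Counter({'widget': 3})
```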
Another important aspect of anonymization is there’s no way to convert back to the actual data once the information has been anonymized. It’s stored in the database with these hashes, or with this masking, and we have no way to convert that back to its original form.
If you need to have some type of data protection, but still maintain the statistical relationships between the data that you’re using, then you’ll want to use pseudo-anonymization, or pseudonymization. Unlike the anonymization we looked at earlier, pseudonymization has a way to convert the data back if we need to provide it for other processes. This means we might see one thing on our screen, but the original data would still be available in the database.
There are many different ways to implement this style of data protection. One of them is to present a different name on the screen each time a record is accessed. So a record that’s associated with my name might show the name Jack O’Neill on the screen. And if the record is accessed again later in the day, the name might be Sam Carter. And if someone accesses that same record later that night, they might see the name Daniel Jackson. All of those refer back to my name, but they’re different each time that record is displayed.
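Here’s a minimal sketch of that random replacement approach. The `RandomReplacementView` class, the record ID, and the stored name are placeholders. The point is that the original value stays in the underlying data while the displayed name is drawn at random on every access.

```python
import secrets

# Replacement names shown on screen in place of the real one.
PSEUDONYMS = ["Jack O'Neill", "Sam Carter", "Daniel Jackson"]


class RandomReplacementView:
    """Hypothetical display layer that shows a random pseudonym on every access."""

    def __init__(self, records: dict):
        self._records = records  # record id -> real name, kept in the database

    def display_name(self, record_id: int) -> str:
        if record_id not in self._records:
            raise KeyError(record_id)
        # A fresh random choice each time, so repeated views may differ.
        return secrets.choice(PSEUDONYMS)

    def real_name(self, record_id: int) -> str:
        # The original is still available to processes that are allowed to see it.
        return self._records[record_id]


view = RandomReplacementView({1001: "Original Name"})  # placeholder record
print(view.display_name(1001))  # one of the three pseudonyms, chosen at random
print(view.real_name(1001))     # Original Name
```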
If you need to maintain consistency each time this particular record is accessed, then you might use a consistent replacement. This means each time someone accesses a record that has my name in it, it will instead consistently show the name George Hammond. It doesn’t matter who brings up the record, or how many times, that same relationship will always be in use. And that will be the only name displayed on the screen.
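A consistent replacement can be sketched as a lookup table that assigns each real name one pseudonym and keeps using it. The `ConsistentPseudonymizer` class and the stored names are placeholders. A real system would keep this mapping in a protected store, since it is what allows the data to be converted back.

```python
class ConsistentPseudonymizer:
    """Hypothetical mapping that always shows the same pseudonym for a given name."""

    def __init__(self, pool):
        self._pool = list(pool)   # unused pseudonyms
        self._real_to_alias = {}
        self._alias_to_real = {}

    def alias_for(self, real_name: str) -> str:
        # Assign a pseudonym on first access, then reuse it every time after.
        if real_name not in self._real_to_alias:
            if not self._pool:
                raise RuntimeError("no unused pseudonyms left")
            alias = self._pool.pop(0)
            self._real_to_alias[real_name] = alias
            self._alias_to_real[alias] = real_name
        return self._real_to_alias[real_name]

    def real_name_for(self, alias: str) -> str:
        # Pseudonymization is reversible for whoever holds the mapping.
        return self._alias_to_real[alias]


names = ConsistentPseudonymizer(["George Hammond", "Jack O'Neill"])
print(names.alias_for("Original Name"))       # George Hammond
print(names.alias_for("Original Name"))       # George Hammond, every time
print(names.real_name_for("George Hammond"))  # Original Name
```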