They’re Not Just Long Words: Anonymization and Pseudonymization Protect Data-driven Business

March 29, 2021

anonymization  \ ə-​ˌnä-​nə-​mə-​ˈzā-​shən  

pseudonymization \ so͞odənimə-​ˈzā-​shən  

Spelling, let alone pronouncing, “anonymization” and “pseudonymization” is just the beginning.

Vocabulary, however, will be the least of the challenges for organizations that ignore the business value created through the use of these data protection methods. Anonymization and pseudonymization are two ways to de-identify sensitive data, and each has a distinct purpose in the tightrope balance between fully using and fully protecting data and data privacy.

The amount of data keeps increasing, exponentially, and it moves quickly between applications and systems in the cloud and on-premises. With more governments passing data-compliance laws—and with more individuals asserting their right to data privacy—organizations, understandably, feel squeezed between monetizing and securing data.  

Hold on to data too tightly, by restricting access and inevitably slowing projects such as AI-supported analytics, and you’ve already lost the potential insights it can deliver. Protect data minimally, and you run the risk of violating privacy laws and losing the business of customers and clients who expect their data to be safeguarded.

Anonymization and pseudonymization offer ways to balance privacy and business needs. Both methods de-identify data by removing or obscuring its sensitive elements, allowing data-driven organizations to simultaneously protect data and extract its value.

Data-Privacy Regulations Set the Tone

Implemented in 2018, GDPR (General Data Protection Regulation) set a worldwide expectation that personally identifiable information (PII), no matter how it is used and stored, must be protected to minimize intrusions on privacy and lessen misuse. GDPR codified a set of standards for how organizations, regardless of location, should handle the data of individuals in the European Union. It explicitly encourages pseudonymization as a safeguard and treats truly anonymized data as outside its scope. GDPR was so sweeping that companies outside of the EU nonetheless honor it just to avoid unforeseen entanglements.

Since GDPR went into effect, several countries (including Australia and Brazil) and US states (including California with its pioneering Consumer Privacy Act, or CCPA, and Virginia with its recent Consumer Data Protection Act, or CDPA) have enacted data-privacy regulations aimed at safeguarding consumer data and ensuring privacy. They join a long list of other regulations that have enshrined data privacy: the New York SHIELD Act, GLBA, COPPA, the Fair Credit Reporting Act, HIPAA, and PCI-DSS.

On the surface, all of these measures might seem draconian or, worse yet, difficult to honor. How can businesses, particularly those with limited budgets and staffing, possibly follow the many differing dictates of all of these regulations, not to mention seamlessly adhere to any new ones that will undoubtedly come along?  

Well, there’s a four-word answer: a data-protection platform. It makes sense: technology solving the challenges of data privacy. But not just any data-protection platform will do. The platform must accomplish two goals: manage access to data and protect data.

The platform should be able to see where data resides and what its purpose is, two steps toward understanding whether data is sensitive and how it needs to be protected. Once users have control of their data, they should also have the ability to choose how they can protect it. That’s why a platform should offer a selection of data-protection methods that, on their own and collectively, align with compliance and customer expectations for data security and privacy.

When organizations can either anonymize or pseudonymize data, they’re adding a third “C” to protection. With “control” and “choice” comes “confidence”: a confidence that enables businesses to prove that they are, in fact, closely following regulations and are future-proofing for new privacy laws. Here’s how anonymization and pseudonymization work to deliver the three “C”s.

Pseudonymize Sensitive Data Elements to Advance Business

Pseudonymization hides elements of data by replacing information fields with artificial identifiers, or pseudonyms. It’s an ideal way to protect operational and transactional data.

This approach comes in handy when only parts of data need to be protected, typically for lines of business where some employees can have complete or mostly complete access to data, while many others have only limited access. For instance, the doctors and nurses of a medical office need full access to a patient’s health records but usually not billing data, whereas the business staff needs to see only the latter.

There are two effective ways to pseudonymize: encryption and tokenization. Encryption uses mathematical algorithms and cryptographic keys to change data into unreadable ciphertext. Tokenization substitutes cleartext data with a randomly generated string of characters, called a token, that consistently stands in for the same original value.

Organizations that have unstructured data fields or datasets that aren’t often shared and sit in only one system will typically encrypt data. Structured data that’s often accessed and shared—personal information, such as Social Security numbers or financial data—is usually tokenized. Tokenization enables customer service representatives, for example, to have just enough information to assist a customer and otherwise see only artificial identifiers obscuring most other details.

Pseudonymizing data doesn’t mean it’s gone forever. The process is reversible, allowing authorized users to view and manage the protected data afterwards.  
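To make the idea concrete, here is a minimal Python sketch of tokenization and its reversal. Everything in it is an assumption for illustration: the TokenVault class, the tok_ prefix, and the sample record are hypothetical, and the in-memory dictionaries stand in for what would be a hardened, access-controlled store in a real platform.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault (illustration only)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same cleartext always maps to the same token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)  # random, not derived from the value
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str, authorized: bool) -> str:
        # Reversal is only allowed for authorized callers.
        if not authorized:
            raise PermissionError("caller may not view cleartext")
        return self._value_by_token[token]


vault = TokenVault()
record = {"name": "Jane Doe", "ssn": "123-45-6789", "balance": 1520.75}

# Only the sensitive field is replaced; the rest of the record stays usable.
protected = {**record, "ssn": vault.tokenize(record["ssn"])}
```

Calling vault.detokenize(protected["ssn"], authorized=True) returns the original number, while an unauthorized call raises an error. That is the reversibility described above, limited to the people who are allowed to use it.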

Anonymized Data Supports AI and Analytics Initiatives

Anonymization has become a popular method of protection because it can advance data-intensive business applications, such as analytics. Anonymized data satisfies stringent data regulations and high customer expectations on privacy—thresholds that can either float or sink a company’s AI pursuits.

By setting one or several privacy models to a user’s specifications, an effective data-protection platform can strip bare data elements that shouldn’t be seen by data analysts, business partners, or data marketplaces. It removes the individual from analytics about people, letting companies capture insight into how people live, spend money, work, heal, and entertain themselves—all without identifying who they are.

A CMO, for example, can anonymize aspects of a customer’s personal information so that a data-analytics team sees only partial addresses along with incomes and purchase histories, but not customer names and Social Security numbers. These steps should soothe customers who are concerned about privacy, and they still drive analytics that identify narrow demographic groups that can be targeted with personalized marketing messages.

Similarly, a research hospital can anonymize sensitive personal data but keep intact health data when running AI-driven analytics to study preventative medicine or experimental treatments.  
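The CMO example above might look something like the following Python sketch. The field names, the three-digit ZIP prefix, and the income banding are all assumptions made for illustration (the banding in particular is an extra generalization step not described in this article), and dropping or coarsening fields this way is only a starting point, not a guarantee that nobody can be re-identified.

```python
def income_band(income: int, width: int = 25_000) -> str:
    """Replace an exact income with a coarse range (hypothetical helper)."""
    low = (income // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(customer: dict) -> dict:
    """Return a new row with direct identifiers left out and other fields generalized."""
    return {
        # name, ssn, and street are deliberately not copied over
        "zip3": customer["zip"][:3],  # partial address: first three ZIP digits only
        "income_band": income_band(customer["income"]),
        "purchases": list(customer["purchases"]),
    }


customer = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "street": "12 Elm St",
    "zip": "30301",
    "income": 82_000,
    "purchases": ["running shoes", "yoga mat"],
}

analytics_row = anonymize(customer)
```

Unlike the pseudonymized record, analytics_row keeps no link back to the customer: there is no token to look up and nothing to reverse.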

Anonymization Vs. Pseudonymization

So, which one is best to use? There is no one-size-fits-all answer; it depends on how you need to use your data.

A bank, for example, may employ a marketing company to analyze customer data to enhance the messaging of its products or services. To analyze general customer trends, no individual or machine at the marketing company needs to see data that could identify individual customers. Therefore, anonymization would be the best method for de-identifying the data shared with the marketing company’s third-party system. If the data is breached or mishandled at any time, it would be useless to bad actors.

Let’s say another bank has a help desk for its online users to troubleshoot login issues. The service agent may need only a few identifiers to assist the customer (e.g., address or the last four digits of a Social Security number). Only the necessary data housed in the bank’s system needs to be viewable for the service agent to help the customer. However, the bank’s manager might need to see additional customer data, such as statement details or a credit report, to assist customers with more complex banking needs. Because the bank needs to set permissions for viewing customer data based on each staff member’s level and type of employment, pseudonymization would be the better form of data protection.
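The help-desk scenario can be sketched as a simple role-based view in Python. The role names, field names, and the "****" placeholder are hypothetical. In a real deployment the hidden fields would typically hold tokens or ciphertext, and the permissions would come from the organization’s access-control system rather than a hard-coded dictionary.

```python
# Hypothetical roles and the fields each one may see in cleartext.
VISIBLE_FIELDS = {
    "service_agent": {"address", "ssn_last4"},
    "manager": {"address", "ssn_last4", "statement_details", "credit_report"},
}


def view_for(role: str, record: dict) -> dict:
    """Return the record with every field outside the role's permissions masked."""
    allowed = VISIBLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {
        field: (value if field in allowed else "****")
        for field, value in record.items()
    }


customer_record = {
    "address": "12 Elm St, Atlanta",
    "ssn_last4": "6789",
    "statement_details": "March statement",
    "credit_report": "score 710",
}

agent_view = view_for("service_agent", customer_record)
manager_view = view_for("manager", customer_record)
```

The same underlying record serves both employees. Only the view changes with the role, which is why a reversible method fits this case better than anonymization.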

Protected Data Drives Revenue and Satisfies Customers

Anonymization and pseudonymization answer organizations’ pressing imperatives to keep sensitive data private, while keeping it open enough to inform corporate decision making, product development, customer service, and just about every aspect of business.  

Stripping data of its sensitive elements protects it as it moves across diverse cloud and on-premises environments, enabling organizations to use data to inform AI, machine learning, analytics, and many other data-driven initiatives that offer an edge in a highly competitive digital world.

Effective data security begets successful data privacy. Organizations that have been ahead of the privacy curve can testify to the gains they’re making through the use of that data because of the confidence they have in data protection. Seventy percent of organizations surveyed by Cisco said they have seen “significant” business benefits—including operational efficiency, agility, and innovation—from prioritizing data privacy. Cisco also found that GDPR-ready companies have shorter sales delays—roughly three weeks as opposed to more than five weeks for those that aren’t compliant—because they’re actively addressing data-privacy concerns.

Securing data lets organizations demonstrate to their customers, clients, and employees that they take compliance and privacy seriously. But just as importantly, quickly providing access to secure data allows organizations to generate revenue, innovate, reduce costs, and create better products and services faster. That’s one bit of information that should be openly shared with everyone.