
Coast Capital Savings Credit Union: Meeting Canadian Regulatory Standards for Compliance with Protegrity

By Muneeb Hasan, Senior Partner Solution Engineer, Protegrity
May 5, 2023

Summary

  • Coast Capital Savings Credit Union (CSS) needed to protect critical PII stored in its AWS cloud environment: Amazon Redshift, Amazon EMR, and Amazon S3
  • The credit union leveraged Protegrity for its data compliance requirements
  • Protegrity met the rigorous data protection standards set by Coast Capital Savings Credit Union, giving it the ability to store customers’ Personally Identifiable Information (PII) in a protected (tokenized) format

One of Canada’s largest credit unions, Coast Capital Savings Credit Union (CSS), with over 50 branches across the country, needed to protect critical PII stored in its AWS cloud environment: Amazon Redshift, Amazon EMR, and Amazon S3. CSS had to meet Canadian regulatory compliance standards in 2021, including the Personal Information Protection and Electronic Documents Act (PIPEDA), while also undertaking a data transformation initiative around its architecture. Without a robust tokenization and encryption solution, the company would not meet regulatory standards.

Coast Capital Savings Credit Union leveraged Protegrity for its data compliance requirements. Protegrity, the global leader in data security, provides data tokenization across on-premises and cloud platforms: databases, data warehouses, file systems, mainframes, and application SDKs for Java, Python, C/C++, .NET, and Go, along with cloud platforms such as AWS, Azure, GCP, Snowflake, and Databricks. This post focuses mainly on AWS, where Protegrity provides data protection for Amazon services using a cloud-native, serverless architecture.

The solution scales elastically to keep pace with the on-demand, intensive workloads of Amazon Redshift and Amazon S3. Serverless tokenization with Protegrity delivers data security with the performance and on-demand scale that organizations need to protect sensitive data.

Protegrity Tokenization

Tokenization is a non-mathematical approach to protecting data while preserving its type, format, and length. Tokens appear similar to the original value and can keep sensitive data fully or partially visible for data processing and analytics. Historically, vault-based tokenization uses a database table to create lookup pairs that associate a token with encrypted sensitive information.

Protegrity Vaultless Tokenization (PVT) uses innovative techniques to eliminate data management and scalability problems typically associated with vault-based tokenization. Using Amazon Redshift with Protegrity, data can be tokenized or de-tokenized (re-identified) with SQL depending on the user’s role and the governing Protegrity security policy.

Here’s an example of tokenized (de-identified) personally identifiable information (PII) that preserves its potential for analytics. The email is tokenized while the domain name is kept in the clear. The date of birth (DOB) is tokenized except for the year. Other fields in green are fully tokenized. This example tokenization strategy makes it possible to run age-based analytics on balance, credit, and medical data.

Figure 1 – Example tokenized data.
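To make the partial tokenization idea concrete, here is a minimal Python sketch. It is not Protegrity Vaultless Tokenization, whose algorithm is not described in this post; it swaps in a simple keyed HMAC substitution that is one-way (it cannot be de-tokenized) purely to show how type, format, and length can be preserved while part of a value stays in the clear. The names `SECRET_KEY`, `toy_token`, `tokenize_email`, and `tokenize_dob` are hypothetical.

```python
import hashlib
import hmac
import string

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key for this sketch


def toy_token(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace each ASCII letter or digit with a keyed stand-in of the same class.

    Separators and length are preserved. This is a one-way illustration,
    not a reversible tokenization scheme.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        else:
            out.append(ch)  # keep separators such as '.', '-', '/'
    return "".join(out)


def tokenize_email(email: str) -> str:
    """Tokenize the local part of an email and keep the domain in the clear."""
    local, sep, domain = email.partition("@")
    return toy_token(local) + sep + domain


def tokenize_dob(dob: str) -> str:
    """Tokenize month and day of a YYYY-MM-DD date and keep the year."""
    return dob[:4] + toy_token(dob[4:])
```

Calling `tokenize_email("jane.doe@example.com")` returns a string of the same length that still ends in `@example.com`, and `tokenize_dob("1985-07-14")` keeps `1985` so age-based grouping still works. Note that the substituted month and day in this toy version are not guaranteed to form a valid calendar date.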

Protegrity for Amazon Services

Protegrity, a global leader in data security, provides data tokenization for AWS services such as Amazon Redshift, Athena, S3, Glue, EMR, Kinesis, and RDS by employing a cloud-native, serverless architecture.
The solution scales elastically to keep pace with the on-demand, intensive workloads of these Amazon services. Serverless tokenization with Protegrity delivers data security with the performance and on-demand scale that organizations need to protect sensitive data.

Solution Overview

To comply with Canadian regulations, CSS wanted a solution that could scale with its growing data usage and be deployed across the enterprise on different platforms with a single pane of protection. Because the sensitive data existed across on-premises and cloud platforms, the solution needed the flexibility to protect data anywhere and unprotect it anywhere. It also needed role-based access control, so that sensitive data would be unprotected only for individuals authorized to see it in the clear.

After working through discovery sessions with Protegrity, CSS decided to deploy Protegrity’s Enterprise Security Administrator (ESA) along with the Data Security Gateway (DSG) appliance, which provides SFTP, HTTP intercept, and REST API capabilities, among other data protection features. Sensitive data resided on an MSSQL cluster; from there it flowed through an S3 bucket and was persisted into Amazon EMR and Amazon Redshift clusters for analytics.

With Protegrity’s Database Protector for MSSQL, data flowing from MSSQL to the cloud was protected within the MSSQL environment before it moved to the cloud. Unprotected data landing directly in Amazon S3 was protected using Protegrity’s Cloud Protect API. For unprotection on AWS, Big Data Protector was used for Amazon EMR, and the cloud database protector was used for Amazon Redshift.

Solution Architecture for Amazon S3

Protegrity built an ETL process that uses two Amazon S3 buckets to separate input data from output data: a landing zone for incoming sensitive data and a processed zone that stores the resulting protected data.
The S3 protector is triggered as new files land in the landing zone bucket, and it reads and processes the data based on a configuration file. The protected data, meanwhile, is written to a file in the processed zone bucket. The protected data can be the basis of a secure data lake for Amazon Athena or Amazon EMR, or loaded into a data warehouse such as Amazon Redshift.
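The landing-zone-to-processed-zone flow can be sketched as a small event handler. This is an illustration under stated assumptions, not Protegrity’s S3 protector: the bucket name `PROCESSED_BUCKET` and the injected `read_object`, `write_object`, and `protect` callables are hypothetical stand-ins for the S3 reads and writes and for the call to the protection API. The event shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys) follows the standard S3 event notification format.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import unquote_plus

PROCESSED_BUCKET = "example-processed-zone"  # hypothetical bucket name


def handle_s3_event(
    event: Dict[str, Any],
    read_object: Callable[[str, str], bytes],
    write_object: Callable[[str, str, bytes], None],
    protect: Callable[[bytes], bytes],
) -> List[str]:
    """Protect each newly landed object and write it to the processed zone.

    Returns the keys written to the processed bucket.
    """
    written = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        data = read_object(bucket, key)
        write_object(PROCESSED_BUCKET, key, protect(data))
        written.append(key)
    return written
```

Injecting the I/O and protection functions keeps the sketch self-contained and easy to test; in a real deployment they would wrap S3 client calls and the protection service, and the configuration file mentioned above would decide which fields to protect.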

Protegrity’s Cloud Protect offers protectors for these services and enables authorized users to unprotect the data on read.
The S3 protector supports the following file formats:

  • Text formats (comma-delimited, tab-delimited, custom)
  • Parquet
  • JSON
  • Excel

* Files may be optionally gzipped.

The Cloud Protect S3 solution is deployed on AWS Lambda and invokes the Protegrity Cloud API on AWS to protect the data.

Figure 2 – Protegrity S3 file protector architecture diagram.

The solution scales to process thousands of files in parallel, up to regional AWS quotas. A separate Lambda instance processes each file, so the maximum file size is bounded by the Lambda timeout period, which works out to files of approximately 3 GB. Larger files can be split to provide greater parallelism and to ensure processing completes within the maximum 15-minute Lambda timeout.
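Splitting a large delimited file before it lands can be as simple as the sketch below. It is an assumption-laden illustration rather than part of the Protegrity tooling: the function name `split_csv_lines` is hypothetical, and it assumes one record per line (no quoted fields containing newlines) with a header on the first line.

```python
from typing import Iterator, List


def split_csv_lines(lines: List[str], max_rows: int) -> Iterator[List[str]]:
    """Yield chunks of at most max_rows data rows, each prefixed with the header."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    if not lines:
        return
    header, rows = lines[0], lines[1:]
    for start in range(0, len(rows), max_rows):
        # Repeat the header so every chunk is a valid standalone CSV file.
        yield [header] + rows[start:start + max_rows]
```

Each chunk can then be written to the landing zone as its own object, so that each one is processed by a separate Lambda invocation.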

Below are example benchmarks for different CSV file sizes:

Figure 3 – Benchmark example.

Solution Architecture for Amazon Redshift

Amazon Redshift Lambda UDFs are architected to perform efficiently and securely. When you execute a Lambda UDF, each slice in the Amazon Redshift cluster batches the applicable rows after filtering and sends those batches to your Lambda function.
The federated user identity is included in the payload, which Lambda compares with the Protegrity security policy to determine whether partial or full access to the data is permitted.
The number of parallel requests to Lambda scales linearly with the number of slices on your Amazon Redshift cluster, with up to 10 invocations per slice in parallel. To learn more, see the Amazon Redshift architecture documentation.
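The batch-and-check flow described above can be sketched as a Lambda handler. This is a hedged illustration, not Protegrity’s implementation: the `POLICY` table, `toy_detokenize`, and the masking rule for partial access are hypothetical, and the payload field names (`user`, `arguments`, `success`, `num_records`, `results`, `error_msg`) reflect my understanding of the Redshift Lambda UDF JSON interface and should be checked against the AWS documentation.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical stand-in for a security policy: access level per database user.
POLICY = {"analyst_full": "full", "analyst_masked": "partial"}


def toy_detokenize(token: str) -> str:
    """Placeholder for a real de-tokenization call; it just reverses the string."""
    return token[::-1]


def unprotect_for_user(token: str, access: str,
                       detokenize: Callable[[str], str]) -> str:
    if access == "full":
        return detokenize(token)
    if access == "partial":
        clear = detokenize(token)
        # Show only the last four characters in the clear.
        return "*" * max(len(clear) - 4, 0) + clear[-4:]
    return token  # no access: the token is returned unchanged


def lambda_handler(event: Dict[str, Any], context: Any = None,
                   detokenize: Callable[[str], str] = toy_detokenize) -> str:
    """Process one batch of rows sent by a Redshift slice."""
    try:
        access = POLICY.get(event["user"], "none")
        results = []
        for row in event["arguments"]:
            value = row[0] if row else None  # one argument per row in this sketch
            results.append(
                None if value is None
                else unprotect_for_user(value, access, detokenize)
            )
        return json.dumps(
            {"success": True, "num_records": len(results), "results": results}
        )
    except Exception as exc:
        return json.dumps({"success": False, "error_msg": str(exc)})
```

Because the access decision is made once per batch from the user in the payload, the same SQL query can return clear, partially masked, or tokenized values depending on who runs it.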

Figure 4 – Amazon Redshift and Protegrity architecture.

The external UDF integration with Lambda efficiently scales with cluster size and workload. The following table shows real benchmarks with Protegrity and Amazon Redshift with throughput exceeding 180M token operations per second (6B token operations / 33.1 seconds).

Median Query Time(s) – Cluster vs. # Token Operations

Figure 5 – Token operation benchmark.

Conclusion

With an agile, scalable solution, Protegrity met the rigorous data protection standards set by Coast Capital Savings Credit Union, giving it the ability to store customers’ Personally Identifiable Information (PII) in a protected (tokenized) format. Additionally, customers are able to view and access their protected PII in the clear. The solution was deployed seamlessly across the enterprise’s different platforms, meeting compliance regulations and giving customers peace of mind that their PII is well protected within the CSS environment.
