Protegrity Blog

Protecting Data in AWS? Start with S3

Author: Protegrity


As more CIOs establish public-cloud-first approaches, or lift and shift their existing applications to the cloud, Amazon Web Services' S3 emerges as a critical platform anchoring a cloud presence as it evolves. AWS S3 offers storage for cloud-native applications, bulk storage for data lakes managed by Amazon EMR (Amazon's managed Hadoop), backup and recovery, and disaster recovery. While these are all great S3 use cases, if you are in a regulated industry with compliance requirements, you are likely only dipping your toe into the Amazon waters.

An important computing pattern in AWS is the de-coupling of storage and compute: data is persisted on S3 while computing platforms spin up and down, drawing data from and depositing data to S3. This raises three questions: (1) With so much data on S3, how do we protect that data so that even if S3 is hacked, the data is safe? (2) If data is to flow to other platforms that are spinning up and down at cloud speed, do we need to constantly change protections as data moves? (3) If speed is an advantage we hope to leverage in our move to the cloud, does protecting the data properly impede the speed of innovation?

Let’s explore the answers to these questions. As data flows in the cloud, often using S3 as an interchange, we not only want to ensure that data is protected at rest on S3, we also want to make sure the data can flow freely and securely to other platforms. Protecting data at rest on S3, or on any other specific platform, is not difficult: technologies abound, from disk encryption to volume encryption to file encryption. Letting protected data flow to other platforms is less straightforward. If a data item is protected using a platform-specific mechanism, we must unprotect it as it leaves S3 and then decide how to protect it on whatever platform it is going to. This decision point slows us down: data cannot flow freely, because security must be chosen and deployed at the destination before we let data go there.

Ideally, a solution that enables the above follows these guidelines:

  1. Apply data protection as soon as possible, before or immediately after the data lands.
  2. The protection thus applied should “follow” the data; that is, the protected data can flow. Even better, if the protected data can be used by some analytics “as is,” then it can flow even more freely. Minimally, the protected data should be able to move to a different platform and be unprotected there, based on centrally managed policies. This way, data can flow freely yet securely.
  3. Unprotect the data as needed, as close as possible to consumption.
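These three guidelines can be pictured in a short sketch. The HMAC-based tokenizer, the in-memory token vault, and the role name below are illustrative stand-ins, not Protegrity's actual protection mechanism:

```python
import hmac
import hashlib

SECRET_KEY = b"demo-key"  # stand-in for centrally managed key material
TOKEN_VAULT: dict = {}    # illustrative reverse lookup; real systems vary

def protect(value: str) -> str:
    """Guideline 1: apply fine-grained protection as soon as data lands."""
    token = "tok_" + hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]
    TOKEN_VAULT[token] = value
    return token

def protect_record(record: dict, sensitive_fields: set) -> dict:
    """Protect only the sensitive fields; the record can then flow freely (guideline 2)."""
    return {k: protect(v) if k in sensitive_fields else v for k, v in record.items()}

def unprotect(token: str, role: str) -> str:
    """Guideline 3: unprotect only close to consumption, gated by policy."""
    if role != "authorized_consumer":  # hypothetical role name
        raise PermissionError("role not permitted to unprotect")
    return TOKEN_VAULT[token]

rec = protect_record({"name": "Alice", "ssn": "123-45-6789"}, {"ssn"})
```

Because only the sensitive field is replaced with a token, the rest of the record stays usable wherever it flows, and the cleartext is recoverable only at an authorized point of consumption.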

A technology solution to this problem has several components:

  1. The ability to apply protection and unprotection at strategic junctures of data flow;
  2. Platform agnostic, fine-grained protection mechanisms; and
  3. Centrally managed policies.
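One way to picture the third component is a single, centrally managed policy table consulted at every protect/unprotect juncture. The roles, operations, and field names below are illustrative, not Protegrity's actual policy model:

```python
# Illustrative central policy: which role may perform which operation on which field.
POLICY = {
    ("security_officer", "define_policy"): {"*"},           # writes rules, never reads data
    ("ingest_job", "protect"): {"ssn", "card_number"},
    ("analyst", "read_protected"): {"ssn", "card_number"},  # sees tokens only
    ("claims_adjuster", "unprotect"): {"ssn"},              # least privilege: only what the job needs
}

def is_allowed(role: str, operation: str, field: str) -> bool:
    """Every protect/unprotect call, on any platform, checks the same central policy."""
    fields = POLICY.get((role, operation), set())
    return "*" in fields or field in fields
```

Note that no role in this table holds both `define_policy` and `unprotect`: the policy maker cannot see sensitive data, which is the separation-of-duties property discussed later.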

Obviously, this strategy applies on-premises, in the cloud, and in a hybrid topology. Below we illustrate this type of solution using S3 as an example, based on Protegrity technology, which provides all of the above dimensions and therefore supports a well-formed, holistic data security strategy for AWS.

When should you apply protection to data in S3?

A distinctive cloud computing pattern is the de-coupling of storage from compute. In this model, it is valuable to allow compute loads to go up and down while data is persisted in cloud storage and provided to computing platforms as needed. With this said, it is relevant to ask: how do you get your data into cloud storage in the first place?

There are two possibilities. The first is to protect the data in transit, before it lands on S3. The second is direct ingestion into what we can call a “data refinery.” Amazon EMR, for example, is often used as an ingestion engine: data is protected by EMR and then deposited into Amazon S3 in a protected state. Here we use EMR to land and refine data, including applying protection, and then deposit the data into S3, protected and ready for consumption.
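The refine-then-deposit pattern can be sketched as below. A simple hash-based token stands in for real protection, and an in-memory CSV buffer stands in for the S3 deposit (in a real pipeline that buffer would be uploaded to an S3 bucket); both are assumptions, not the EMR/Protegrity integration itself:

```python
import csv
import hashlib
import io

def tokenize(value: str) -> str:
    # Stand-in for fine-grained protection applied inside the refinery.
    return "tok_" + hashlib.sha256(value.encode()).hexdigest()[:12]

def refine(raw_rows, sensitive_fields):
    """Protect sensitive fields during ingestion, before data reaches storage."""
    for row in raw_rows:
        yield {k: tokenize(v) if k in sensitive_fields else v for k, v in row.items()}

def deposit(rows, fieldnames):
    """Serialize protected rows; only protected data ever reaches the storage layer."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

raw = [{"customer": "Alice", "ssn": "123-45-6789"}]
protected_csv = deposit(refine(raw, {"ssn"}), ["customer", "ssn"])
```

The key property is ordering: protection happens in the refinery step, so the object that lands in storage never contains the cleartext sensitive values.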

How should you protect data in Amazon S3?

If you are in a regulated industry such as financial services, insurance, telecom, retail, or healthcare, you need to take steps to protect the sensitive data flowing into cloud storage. You have likely already deployed native Amazon access control and encryption to ensure only authorized users can see sensitive protected data. This is clearly a necessary and good first step.

However, for many industries, the Achilles’ heel of access control and disk encryption represents too large a risk. The problem is that access control does not police those who have privileged administrative rights. At the same time, you are left with limited or no control over how Amazon manages its security policies and approaches. This opens a business risk of internal misuse and, in a world of social engineering, creates a target for those trying to steal information from your organization. Attackers know that storage systems and big data are where they want to go, and they can spend months stealing credentials until they have in-the-clear access to everything you put in the public cloud. This means you need to move from protecting system access to protecting the data itself.

The reality is that the bad guys have gotten smarter. They are either aiming at Amazon itself or using your employees with privileged administrative access to get into your systems through phishing and other forms of social engineering. For them, these approaches are far easier than trying to break through all the intrusion protection and encryption you have deployed. The whole game changes, however, if hackers can pass as one of your employees with administrative, in-the-clear access. Simply encrypting a database and then providing access control does not offer enough protection for this use case; limiting the access of those with administrative rights is just not enough.

Using encryption to provide only coarse-grained protection does not provide the risk mitigation needed to respond to today’s internal and external threats. For these situations, two principles make sense for your business to adopt: (1) segregation of duties, which argues that those who can see data should not be able to create access rules, and (2) least privilege, which holds that business users should only see the sensitive data needed to perform their job. Here, a better answer to protecting data is to de-identify it via fine-grained protection. Fine-grained protected data can be used for analytics without access control constraints. It can flow freely across an extended corporate ecosystem: anyone accessing the data will not see the sensitive data values, but can see information sufficient for analytics, especially aggregated trending analysis.
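Why does analytics still work on de-identified data? Because a deterministic token maps the same cleartext to the same token, aggregation and trending give the same answers on protected data as on the original. A small sketch, where the hash-based tokenizer is an illustrative stand-in for a real fine-grained protection method:

```python
import hashlib
from collections import Counter

def tokenize(value: str) -> str:
    # Deterministic: equal inputs yield equal tokens, so group-by still works.
    return "tok_" + hashlib.sha256(value.encode()).hexdigest()[:12]

purchases = [
    {"card_number": "4111-1111", "amount": 20},
    {"card_number": "4111-1111", "amount": 35},
    {"card_number": "4222-2222", "amount": 10},
]

# The analyst only ever receives the protected view.
protected = [{"card_number": tokenize(p["card_number"]), "amount": p["amount"]}
             for p in purchases]

# Purchases per card: same result as on cleartext,
# without exposing a single card number to the analyst.
per_card = Counter(p["card_number"] for p in protected)
```

The analyst learns that one card made two purchases and another made one, which is exactly the aggregated trend, while the card numbers themselves never leave the protected state.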

Centralized policy management enables role-based protection and unprotection as well as separation of duties, which holds that the data policy makers do not have access to sensitive data. If AWS S3 is a cornerstone of your cloud deployment, you need a strategy to ensure the data sitting in S3 is protected and ready to be dispensed.

Parting remarks


It makes business sense to use the native protections of Amazon S3, but to add to them the ability to directly protect data within S3 using fine-grained data protection. In other words, if hackers break into Amazon, they do not get your data. For those in regulated industries, this provides the kind of protection needed to start securely lifting and shifting to the cloud. To learn more, please download the white paper, “Safely Lifting and Shifting Enterprise IT to the Public Cloud,” which digs into the opportunity to protect data as it moves to the cloud.
