UpGuard reported last week that sensitive data retrieved from Facebook by third-party apps was leaked through AWS S3, exposing 540 million user records. We have seen similar sensitive data leaks through AWS S3 before, such as the Verizon breach, the GOP voter data breach and the Uber breach.
Leaving aside the ethics of Facebook’s practices of sharing personal data (e.g. the Cambridge Analytica scandal), could AWS S3’s built-in security have prevented this? Yes and no. This incident happened because third-party app developers irresponsibly placed content retrieved from Facebook in publicly accessible AWS S3 buckets. One of these apps stored user passwords in the clear in these buckets.
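To make "publicly accessible" concrete, here is a minimal sketch of a check for public grants in a bucket ACL. It assumes the ACL is a dictionary in the shape that boto3's get_bucket_acl call returns; the function name and the sample ACL are hypothetical and written by hand for illustration. It does not look at bucket policies or Block Public Access settings, which can also make a bucket public or private.

```python
# URIs of the predefined S3 groups that make a grant effectively public.
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl: dict) -> list:
    """Return (group URI, permission) pairs for grants open to everyone."""
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUP_URIS:
            found.append((grantee["URI"], grant.get("Permission")))
    return found


# Hand-written sample ACL (illustrative data, not taken from any real bucket).
sample_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {
                "Type": "Group",
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
            },
            "Permission": "READ",
        },
    ]
}

print(public_grants(sample_acl))
```

Any non-empty result means at least one grant opens the bucket to the world, or to any AWS account holder.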
Technically, these breaches are not really AWS’ fault. AWS or any cloud IaaS or PaaS provider will remind you of the ‘shared responsibility model’, yet such data breaches from cloud storage keep happening.
In this case, the app developers who allowed the leak to happen had little incentive to keep people’s personal data secure. For commercial, regulated enterprises, however, cloud-based data breaches through AWS S3 or similar cloud IaaS and PaaS offerings are a critical concern.
Someone left the door open…
The issue of data security in this event is deep-rooted. AWS S3 provides server-side-encryption as a defense against attacks on physical data storage, but that is ineffective against misconfigured access control or keys that are not properly safeguarded. For instance, in the Uber breach, sensitive data was stolen from private S3 buckets. Keys to those buckets had been left exposed, embedded in source code on GitHub. Clearly, neither S3 server-side-encryption nor the most sophisticated Identity and Access Management (IAM) policies would have been effective in that case.
We all tend to treat precious data like precious personal belongings such as cash or jewelry. We think that if we lock the vault where we keep our precious belongings, they will be safe. While that’s true for our assets in the physical world, it is not really true for our data assets.
Data is not a physical object. Access to data is controlled through other data in computer and networking systems, called ‘access control’, which is itself vulnerable to misconfigurations or breaches. Data can be copied to ‘n’ different places while it is still resident in its original location. Data is not stolen by removing its physical presence from a physical medium; it is copied. You may not even detect that your data has been breached until you analyze your data access logs.
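Keys embedded in source code can often be caught before they reach a public repository. The sketch below is a simple heuristic, not a complete secret scanner. It relies on the fact that long-term AWS access key IDs are 20 characters long and start with AKIA; it does not detect secret access keys or temporary credentials (which start with ASIA). The function name is hypothetical, and the sample key is the placeholder value AWS uses in its own documentation.

```python
import re

# Long-term AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
ACCESS_KEY_ID_RE = re.compile(r"(?<![A-Z0-9])AKIA[0-9A-Z]{16}(?![A-Z0-9])")


def find_access_key_ids(text: str) -> list:
    """Return (line number, key ID) pairs for every candidate key ID in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Illustrative source snippet using AWS's documented placeholder key ID.
sample_source = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'

print(find_access_key_ids(sample_source))
```

A check like this could run as a pre-commit hook, so that a hard-coded key is flagged on the developer's machine rather than discovered on GitHub.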
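As a rough illustration of what analyzing access logs can look like, the sketch below scans S3 server access log lines for successful object reads made by unauthenticated requesters. It assumes server access logging is enabled and that lines follow the standard space-delimited layout, in which the requester field is "-" for anonymous requests. Only the first ten fields are parsed. The function name and the two sample lines are synthetic, made up for this example.

```python
import re

# Leading fields of an S3 server access log record (later fields are ignored).
LOG_RE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+)'
)


def anonymous_reads(log_lines):
    """Return (time, ip, bucket, key) for successful unauthenticated object GETs."""
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not look like access log records
        if (m["requester"] == "-"
                and m["operation"] == "REST.GET.OBJECT"
                and m["status"] == "200"):
            hits.append((m["time"], m["ip"], m["bucket"], m["key"]))
    return hits


# Synthetic sample records (not real log data).
sample_logs = [
    'ownerid examplebucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 - REQ1EXAMPLE '
    'REST.GET.OBJECT users.csv "GET /examplebucket/users.csv HTTP/1.1" 200 - 1024 1024 7 6 "-" "curl/7.61" -',
    'ownerid examplebucket [06/Feb/2019:00:01:02 +0000] 198.51.100.7 '
    'arn:aws:iam::123456789012:user/analyst REQ2EXAMPLE '
    'REST.GET.OBJECT users.csv "GET /examplebucket/users.csv HTTP/1.1" 200 - 1024 1024 9 8 "-" "aws-cli" -',
]

print(anonymous_reads(sample_logs))
```

Only the first record is reported, because the second was made by an authenticated IAM user.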
Is Client-side-encryption a solution?
AWS S3 data breaches keep happening because people keep leaving the door open. AWS recommends that customers employ client-side-encryption to counter such human errors. However, client-side-encryption comes at a huge cost in terms of data usability. For instance, if you are building a data lake on AWS S3, client-side encrypted data cannot be processed by downstream services, so you lose the usability of that data completely. Commercial enterprises need to balance data security with data usability, so this may not really be an effective solution.
A practical alternative to client-side-encryption is data-centric protection. This means protecting the data itself in a way that keeps it usable, by preserving its format and referential integrity, and allowing it to be unprotected only on a least-privilege basis. Data protected in this way can move across different applications and data-stores (silos) and yet remain protected through most of its lifecycle.
In an AWS S3 scenario, this means protecting the sensitive data itself so that it remains usable in analytical systems such as AWS QuickSight, Athena, EMR, Redshift Spectrum and even third-party data warehouses like Snowflake.
This is not really a new concept. It has been practiced in the PCI world for years to protect PANs. But somehow (possibly awaiting regulations such as CCPA in the US, which will impose financial penalties on companies for not securing personal data) a critical mass of PII holders has not yet caught up with this practice, and breaches like these are the result.
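The properties described above can be illustrated with a toy example. The sketch below is a hypothetical in-memory token vault, not a production design and not any particular vendor's method. It derives a token that keeps the length, separators and character classes of the original value (format), returns the same token for the same input (referential integrity), and reverses the token only for allowed roles (least privilege). The class name, method names and role names are all assumptions made for illustration; a real deployment would more likely use a vetted format-preserving encryption scheme or a hardened tokenization service.

```python
import hashlib
import hmac
import string


class TokenVault:
    """Toy in-memory token vault (illustrative only)."""

    def __init__(self, key: bytes, allowed_roles):
        self._key = key
        self._allowed_roles = set(allowed_roles)
        self._token_to_value = {}

    def protect(self, value: str) -> str:
        """Return a deterministic token with the same format as value."""
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).digest()
        out = []
        for i, ch in enumerate(value):
            b = digest[i % len(digest)]
            if ch in string.digits:
                out.append(str(b % 10))            # digit stays a digit
            elif ch in string.ascii_uppercase:
                out.append(chr(ord("A") + b % 26))  # uppercase stays uppercase
            elif ch in string.ascii_lowercase:
                out.append(chr(ord("a") + b % 26))  # lowercase stays lowercase
            else:
                out.append(ch)                      # separators pass through
        token = "".join(out)
        # Remember the mapping so authorized callers can reverse it later.
        if self._token_to_value.setdefault(token, value) != value:
            raise ValueError("token collision; a real system must handle this")
        return token

    def unprotect(self, token: str, role: str) -> str:
        """Return the original value, but only for an allowed role."""
        if role not in self._allowed_roles:
            raise PermissionError(f"role {role!r} may not unprotect data")
        return self._token_to_value[token]


vault = TokenVault(key=b"demo-key-not-for-production", allowed_roles={"fraud-analyst"})
token = vault.protect("4111-1111-1111-1111")
print(token)
print(vault.unprotect(token, role="fraud-analyst"))
```

Because the token has the same shape as the original value and is stable across calls, downstream joins and aggregations keep working on protected data, while only an allowed role can recover the real value.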
Find out how your brand can better protect data in the cloud.