GAINING A SENSE OF SECURITY ABOUT SECURE DATA AND ALGORITHMS

Jul 30, 2021

Summary

AI and ML systems require a more secure framework, which starts with locking down data and algorithms. Secure data involves several techniques such as data tokenization, homomorphic encryption, differential privacy, and synthetic data.
Secure algorithms must be trained on verified, secure data and kept secure when not in use. Protecting AI and ML integrity goes beyond a data science concern, and organizations must focus on educating and training teams to understand protection strategies and data contamination methods.

CONSTRUCTING A MORE SECURE FRAMEWORK FOR AI STARTS WITH A RECOGNITION

There’s an overwhelming temptation to view cybersecurity through a conventional lens: that tossing more devices and software solutions at it will somehow improve it. Yet, as digital technology evolves and artificial intelligence (AI) matures, this thinking falls woefully short. The ability to secure AI and machine learning (ML) data, along with the algorithms used in these systems, is increasingly at the center of security.

Over the last few years, reports of AI breaches and failures have surfaced with growing regularity. Google, Amazon, Microsoft, and Tesla have all been hit with failures at one point or another. Currently, at least nine different methods exist to attack AI. These include data poisoning, which corrupts training data; model stealing, which recreates a model by repeatedly querying it; model inversion, which recovers and reconstructs training data; and attacks on the ML supply chain, which tamper with models before they are deployed.

Constructing a more secure framework for AI starts with a recognition that locking down data and algorithms is critical. As organizations wade deeper into AI, traditional defense strategies don’t go away, but they increasingly fade into the backdrop. It’s important to understand how AI and ML change the dynamics, requiring redefined trust boundaries, a greater focus on identifying sources of training data, and better detection methods.

Secure AI Defined

A starting point for any foray into AI and ML is to understand the two building blocks of a data framework: secure data and secure algorithms.

Secure Data

Any AI data that has confidential attributes and requires some level of anonymization or, at the least, pseudonymization qualifies as secure data. It encompasses areas such as regulations, privacy, trust, and ethics. There’s a growing need for data to be accessible yet secure for AI and ML processes. This task can be complicated, and at times frustrating, often involving trade-offs between extracting value from data and addressing security and privacy concerns.

There are several techniques that can be used in isolation or combination to promote secure data. They include data tokenization, which substitutes an actual value with a placeholder; k-anonymity, which generalizes quasi-identifiers so that each record is indistinguishable from others in a group; differential privacy, which adds calibrated “noise” to data or query results; homomorphic encryption, which encrypts the data while allowing it to be analyzed without decrypting it; and an emerging space called synthetic data, which generates artificial records that preserve the statistical properties of the original dataset.
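
Two of these techniques can be illustrated in a few lines. The Python sketch below is a minimal illustration under stated assumptions, not a production design: TokenVault and dp_count are hypothetical names, the vault lives only in memory, and the differential privacy part uses the standard Laplace mechanism for a simple counting query.

```python
import random
import secrets


class TokenVault:
    """Hypothetical in-memory vault that swaps sensitive values for random tokens."""

    def __init__(self):
        self._value_to_token = {}
        self._token_to_value = {}

    def tokenize(self, value: str) -> str:
        # Reuse the token so the same value always maps to the same placeholder.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only a holder of the vault can map a token back to the real value.
        return self._token_to_value[token]


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count of matching records using the Laplace mechanism."""
    true_count = sum(1 for record in records if predicate(record))
    # A counting query has sensitivity 1, so the Laplace scale is 1 / epsilon.
    # The difference of two exponential draws with rate epsilon is Laplace(0, 1 / epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Illustrative usage with made-up records.
vault = TokenVault()
rows = [
    {"id": vault.tokenize("123-45-6789"), "age": 34},
    {"id": vault.tokenize("987-65-4321"), "age": 58},
]
noisy_over_40 = dp_count(rows, lambda r: r["age"] >= 40, epsilon=0.5, rng=random.Random())
```

A smaller epsilon adds more noise and gives stronger privacy, which is the value-versus-privacy trade-off described earlier.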

Secure Algorithms

In the AI world, the term “secure algorithms” refers to secure instruction sets and accompanying data designed for a business use case. Essentially, the algorithm is available to process the desired data, but it has built-in protection from undesired access. To maximize protection, it’s critical that the algorithm be trained on verified and secure data and kept secure when it isn’t in use. Secure algorithms are increasingly used for “secure inference,” which allows them to work with APIs and process data in batch mode or in real time. The approach ratchets up the possibilities for AI and ML, but it also introduces risks such as malicious attacks.
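
One small piece of keeping an algorithm secure when it isn’t in use is detecting whether a stored model has been altered before it is loaded again. The sketch below shows one possible approach using an HMAC tag from Python’s standard library. The function names and the idea of storing the tag next to the model are illustrative assumptions, not a method described by the article, and key management is out of scope.

```python
import hashlib
import hmac


def sign_artifact(artifact: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for a serialized model (hypothetical helper)."""
    return hmac.new(key, artifact, hashlib.sha256).hexdigest()


def verify_artifact(artifact: bytes, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare in constant time before loading the model."""
    actual_tag = sign_artifact(artifact, key)
    return hmac.compare_digest(actual_tag, expected_tag)


# Illustrative usage: the bytes stand in for a serialized model file.
secret_key = b"replace-with-a-key-from-a-secrets-manager"
model_bytes = b"serialized model weights"
stored_tag = sign_artifact(model_bytes, secret_key)

if not verify_artifact(model_bytes, secret_key, stored_tag):
    raise RuntimeError("Model artifact failed its integrity check; refusing to load.")
```

A check like this detects tampering with the stored file. It does not protect against a model that was trained on poisoned data in the first place.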

AI Under Assault

AI data frameworks and algorithms may be transformative, but they aren’t without real-world dangers. In fact, attacks on AI are becoming more common and more serious. Gartner predicted in its Top 10 Strategic Technology Trends for 2020 that, through 2022, 30 percent of all AI cyberattacks would leverage training-data poisoning, AI model theft, or adversarial samples to attack AI-powered systems. Although it’s tempting to view these risks as somewhere over the horizon, they are already here.

For example, researchers in Germany discovered a way to embed hidden audio commands in files that are imperceptible to human ears but detected by voice assistants like Alexa. The team reported in 2018 that the flaw isn’t a bug; it’s a problem in the way AI is designed. The adversarial attack method, which they dubbed “psychoacoustic hiding,” demonstrated that it’s possible to control the device by feeding it a sound, which can even seem like a bird chirping. The same technique could be used to manipulate a machine vision system in an autonomous vehicle so that it thinks it’s seeing a stop sign when one doesn’t exist, for example.
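
Psychoacoustic hiding itself is too involved for a short example, but the general idea of an adversarial input can be shown with a different, simpler technique: a sign-of-gradient perturbation against a toy linear classifier. The Python sketch below is illustrative only. The weights, input, and function names are made up, and real attacks target far larger models.

```python
def score(weights, bias, x):
    """Linear decision score: positive means class 1, negative means class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) >= 0 else 0


def sign(value):
    return (value > 0) - (value < 0)


def adversarial_example(weights, bias, x, epsilon):
    """Shift every feature by at most epsilon in the direction that hurts the current class.

    For a linear model, the gradient of the score with respect to the input
    is the weight vector, so the sign of each weight gives the direction.
    """
    direction = -1 if predict(weights, bias, x) == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


# Illustrative usage with made-up numbers.
weights = [0.5, -0.25, 0.75]
bias = -0.1
original = [0.2, 0.1, 0.1]
perturbed = adversarial_example(weights, bias, original, epsilon=0.1)

label_before = predict(weights, bias, original)
label_after = predict(weights, bias, perturbed)
```

In this toy case a change of at most 0.1 per feature is enough to flip the predicted class, which mirrors how a barely noticeable change to audio or an image can redirect a model.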

Meanwhile, Google’s image recognition neural net framework was tricked by rotating images slightly off register, something that changed trucks into bobsleds; Tesla’s autopilot function was fooled simply by placing stickers on the road, which caused cars to swerve into the wrong lane; and a Microsoft chatbot function was taught how to be racist only a few hours after it was launched when users fed it Twitter posts, including tweets saying Adolf Hitler supported Donald Trump for president.

Protection is Key

All of this chaos points to a basic but critical fact: Even the most secure systems are vulnerable to attacks that may not be obvious or even visible to security experts. People can tweak data, poison data, or modify data residing on devices—or trick systems into handling data incorrectly.

As more and more decisions are based on algorithms—including those used on the edge and within the Internet of Things (IoT)—we must be on guard for data and algorithmic manipulation and data weaponization. Not only is it important to know where data came from and who has touched it, but organizations can also seek the support of methods such as advanced telemetry to detect anomalies in training data and more advanced lattice-based (mathematical) techniques to construct algorithms.
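
As a rough illustration of anomaly detection on training data, the Python sketch below flags values that sit far from the rest of a feature column using the modified z-score, which is based on the median and the median absolute deviation. It is a deliberately simple stand-in for the telemetry described above. The function name, the sample values, and the 3.5 cutoff (a commonly used default) are assumptions.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return indexes of values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    abs_deviations = [abs(v - median) for v in values]
    mad = statistics.median(abs_deviations)
    if mad == 0:
        # At least half the values are identical; this score is not usable here.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        index
        for index, deviation in enumerate(abs_deviations)
        if 0.6745 * deviation / mad > threshold
    ]


# Illustrative usage: one sensor reading in the training set looks implausible.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 55.0, 9.7]
suspect_rows = flag_outliers(readings)
```

Flagged rows are candidates for review rather than proof of poisoning. Carefully crafted poison can stay within normal ranges, which is why knowing where data came from still matters.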

While researchers continue to develop tools and technologies to combat AI and ML attacks, it’s also critical to recognize that AI and ML integrity is more than a data science concern. Organizations must also focus on educating and training teams cross-functionally to understand data generation methods, protection strategies, and how data contamination can take place. Tools such as Microsoft’s Bug Bar, which outlines AI attack methods, can help. In addition, MITRE offers an array of resources on AI risks on GitHub.

Although AI risks may seem futuristic, the problem has already arrived, and conventional security tools aren’t up to the task.