Threat actors can “fairly easily” alter AI systems, NIST warns


The National Institute of Standards and Technology (NIST) has identified four major cyberattacks that can alter the behavior of AI systems.

NIST has issued new guidance on cyberattacks that manipulate the behavior of AI systems. These attacks exploit AI vulnerabilities by introducing untrustworthy data, causing the systems to malfunction.

One prominent type of attack is the “evasion” attack, in which an adversary manipulates an input to confuse an AI system’s decision-making. For instance, misleading road markings could cause a driverless car to veer into oncoming traffic.
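As a rough illustration (not drawn from the NIST publication), the sketch below shows the evasion idea on a toy linear classifier: a small, deliberately chosen change to the input flips the model’s decision. The weights, the input, and the step size epsilon are all invented for the example.

```python
# Minimal sketch of an evasion attack on a toy linear classifier (NumPy only).
# The model, data, and epsilon value are illustrative assumptions, not taken
# from the NIST report.
import numpy as np

# Toy "deployed" model: logistic regression with fixed weights.
w = np.array([1.5, -2.0])
b = 0.25

def predict(x):
    """Return the model's probability that input x belongs to class 1."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# A legitimate input the model classifies confidently as class 1.
x_clean = np.array([2.0, 0.5])
print("clean score:", predict(x_clean))          # well above 0.5

# Evasion: nudge the input against the gradient of the score so the
# prediction flips while the perturbation stays small (FGSM-style step).
epsilon = 0.9
grad = predict(x_clean) * (1 - predict(x_clean)) * w   # d(score)/dx
x_adv = x_clean - epsilon * np.sign(grad)

print("adversarial score:", predict(x_adv))      # pushed below 0.5 (class 0)
print("perturbation applied:", x_adv - x_clean)
```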

The publication, titled Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations, is part of NIST’s efforts to develop trustworthy AI.

One of the paper’s authors, NIST computer scientist Apostol Vassilev, urged the community to come up with better defenses, as the existing ones “lack robust assurances that fully mitigate the risks.”

AI systems are integral to modern society, functioning in diverse roles such as driving vehicles, assisting in medical diagnoses, and serving as online chatbots.

These systems are trained on vast data sets, which can include unreliable sources like public interactions and websites. This opens the door for threat actors to corrupt the data, leading to undesirable AI behavior, such as chatbots adopting abusive language.

“For the most part, software developers need more people to use their product so it can get better with exposure,” Vassilev said. “But there is no guarantee the exposure will be good. A chatbot can spew out bad or toxic information when prompted with carefully designed language.”

Since AI training datasets are too large for effective human monitoring and filtering, completely shielding AI from misdirection remains a challenge.

In addition to evasion, the NIST report identifies three other major types of cyberattack: poisoning, privacy, and abuse attacks.

While evasion attacks occur post-deployment, altering inputs to change AI responses, poisoning attacks happen during training, introducing corrupted data to influence future AI behavior.
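The difference can be seen in a toy example. The sketch below, again an illustrative assumption rather than anything from the report, flips training labels for a simple one-dimensional threshold classifier so that the boundary it learns is dragged away from its clean position; the dataset, the budget of 80 flipped examples, and the classifier itself are all invented for the example.

```python
# Minimal sketch of a training-time poisoning attack (label flipping) on a
# toy 1-D threshold classifier, using only NumPy. Dataset sizes, the flip
# budget, and the classifier are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

# Clean training data: class 0 clusters near 0, class 1 clusters near 3.
x = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(3.0, 0.5, 200)])
y = np.concatenate([np.zeros(200), np.ones(200)])

def train_threshold(x, y):
    """Fit the single threshold that best separates the two labels."""
    candidates = np.sort(x)
    accs = [np.mean((x > t) == y) for t in candidates]
    return candidates[int(np.argmax(accs))]

clean_t = train_threshold(x, y)

# Poisoning: the attacker flips labels on the class-1 examples closest to
# the boundary, dragging the learned threshold upward at training time.
y_poisoned = y.copy()
class1_idx = np.where(y == 1)[0]
nearest = class1_idx[np.argsort(x[class1_idx])[:80]]   # 80 lowest class-1 points
y_poisoned[nearest] = 0

poisoned_t = train_threshold(x, y_poisoned)

print("threshold trained on clean data:   ", round(clean_t, 2))
print("threshold trained on poisoned data:", round(poisoned_t, 2))
```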

Privacy attacks aim to extract sensitive information about the AI or the data it was trained on. Abuse attacks involve inserting incorrect information into a legitimate source that the AI later absorbs, in an attempt to repurpose the system’s intended use.
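A common form of privacy attack is membership inference: guessing whether a particular record was part of the training data from how the model responds to it. The sketch below is a deliberately contrived illustration, assuming an overfit stand-in “model” that memorizes its training set; the data, the confidence function, and the 0.5 decision threshold are all invented for the example.

```python
# Minimal sketch of a privacy (membership-inference) attack: guessing whether
# a record was in the training set from how confidently an overfit model
# scores it. The model, data, and threshold are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

# "Training" records and unseen records drawn from the same distribution.
train = rng.normal(0.0, 1.0, (50, 5))
unseen = rng.normal(0.0, 1.0, (50, 5))

# A deliberately overfit stand-in "model": it memorizes its training set and
# returns high confidence for records it has seen almost exactly.
def model_confidence(record, memorized=train):
    dists = np.linalg.norm(memorized - record, axis=1)
    return np.exp(-dists.min())

# Membership inference: records scoring above the threshold are guessed
# to have been part of the training data.
threshold = 0.5
flagged_train = [model_confidence(r) > threshold for r in train]
flagged_unseen = [model_confidence(r) > threshold for r in unseen]

print("flagged as members (true members):", np.mean(flagged_train))
print("flagged as members (never seen):  ", np.mean(flagged_unseen))
```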

“Most of these attacks are fairly easy to mount and require minimum knowledge of the AI system and limited adversarial capabilities,” said co-author Alina Oprea, a professor at Northeastern University.

Once executed, such attacks can be challenging to undo, researchers said, warning against overconfidence in AI security.