The growing threat of adversarial attacks

Recently I attended the launch of Dark Data, the latest book by Imperial College’s emeritus professor of mathematics, David Hand, in which he outlines the ways in which the data of our big data era may be insufficient to support the kinds of decisions we hope it can inform. He explores how we can be blind to missing data, and how that blindness can lead us to conclusions and actions that are mistaken, dangerous, or even disastrous.

The book is undoubtedly fascinating, especially for anyone interested in data and statistics, but for me the most interesting aspect was, appropriately, something omitted from the book. Closing the event was Professor Sir Adrian Smith, head of the Alan Turing Institute and one of the architects of the recently published report on AI and ethics for the UK government. He raised the increasingly pertinent issue of adversarial data: the deliberate manipulation of the data on which our AI systems depend.

As artificial intelligence has blossomed in recent years, securing the data on which AI lives and breathes has become more important, or at least it has on paper. Data from the Capgemini Research Institute last year showed that just one in five businesses were implementing any form of AI cybersecurity, and while the same survey found that around two-thirds were planning to do so by the end of this year, you do wonder just how seriously we’re taking the problem.

Trusting the data

There have already been numerous examples of AI-based systems that have gone astray because they were trained on poor, often biased, data, frequently with discriminatory results. It’s likely that an even greater number of systems produce poor-quality outcomes for the same underlying reason: the poor quality of the data they’re built on.

In these cases, the vendors are at least partly complicit in the poor data quality, whereas adversarial attacks involve someone deliberately manipulating data to distort the performance of AI systems. Such attacks typically come in two main kinds: targeted and untargeted.

A targeted attack aims to create a specific distortion within the AI system, such as ensuring that input X is classified as Y. An untargeted attack has no such specific aim and merely seeks to distort the system’s outputs so that they’re misleading. While untargeted attacks are understandably less powerful, they’re somewhat easier to implement.
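To make the distinction concrete, here is a minimal Python sketch of both kinds of attack against a tiny linear classifier. It uses the fast gradient sign method (FGSM), one well-known way of crafting adversarial inputs, swapped in purely for illustration; the article doesn't name a specific technique. The three-class model, its weights, and every function name below are hypothetical. The untargeted step nudges the input to increase the loss on its true label, while the targeted step nudges it to decrease the loss on a chosen label.

```python
import math

# Hypothetical three-class linear classifier: logits = W.x + b.
# These weights are invented for illustration, not taken from a trained model.
W = [[2.0, -1.0],   # class 0
     [-1.0, 2.0],   # class 1
     [-1.0, -1.0]]  # class 2
b = [0.0, 0.0, 0.5]


def predict_proba(x):
    """Softmax probabilities over the three classes."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x):
    p = predict_proba(x)
    return max(range(len(p)), key=p.__getitem__)


def loss_gradient(x, label):
    """Gradient of the cross-entropy loss -log p[label] with respect to the input x.

    For logits z = W.x + b, dL/dz_k = p_k - [k == label], so dL/dx = W^T (p - onehot(label)).
    """
    p = predict_proba(x)
    dz = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    return [sum(W[k][j] * dz[k] for k in range(len(W))) for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def untargeted_fgsm(x, true_label, eps):
    """One step that *increases* the loss on the true label: any wrong class will do."""
    g = loss_gradient(x, true_label)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def targeted_fgsm(x, target_label, eps):
    """One step that *decreases* the loss on a chosen label: push X towards Y."""
    g = loss_gradient(x, target_label)
    return [xi - eps * sign(gi) for xi, gi in zip(x, g)]


x = [1.0, 0.2]                                  # clean input, classified as class 0
print(predict(x))
print(predict(untargeted_fgsm(x, 0, eps=0.6)))  # untargeted: pushed out of class 0
print(predict(targeted_fgsm(x, 2, eps=0.6)))    # targeted: not yet class 2
print(predict(targeted_fgsm(x, 2, eps=1.0)))    # targeted: class 2 with a bigger budget
```

In this toy case, a perturbation of size 0.6 is enough for the untargeted attack to change the prediction, but the targeted attack needs a larger budget of 1.0 to land on its chosen class. That is one small illustration of why untargeted attacks tend to be easier to pull off.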

Adversarial attacks

Ordinarily, training a machine learning model means minimizing the loss between the target label and the predicted label. The trained model is then tested on held-out data to check how accurately it predicts the target labels, with the error rate measuring how often the predicted and target labels disagree. Adversarial attackers instead alter the input at query time, often by an amount too small for a human to notice, so that the prediction changes.
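The contrast between training and attacking can be written out in symbols. The notation here is ours, not the author's: $f_\theta$ is a model with parameters $\theta$, $\mathcal{L}$ is the loss, $(x_i, y_i)$ are training pairs, $\hat{y}_j$ is the predicted label for test input $x_j$, $\delta$ is the attacker's perturbation, and $\epsilon$ is an assumed budget that keeps the change small. Training and attacking optimise the same loss, but over different variables.

```latex
% Training: choose parameters that minimise the average loss
\theta^\star = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\big(f_\theta(x_i), y_i\big)

% Testing: error rate on m held-out examples
\mathrm{err} = \frac{1}{m} \sum_{j=1}^{m} \mathbb{1}\big[\hat{y}_j \neq y_j\big]

% Untargeted attack: a small perturbation that maximises the loss on the true label y
\delta^\star = \arg\max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}\big(f_{\theta^\star}(x + \delta), y\big)

% Targeted attack: a small perturbation that minimises the loss on a chosen label t \neq y
\delta^\star = \arg\min_{\|\delta\|_\infty \le \epsilon} \mathcal{L}\big(f_{\theta^\star}(x + \delta), t\big)

% One-step approximations (fast gradient sign method)
x'_{\text{untargeted}} = x + \epsilon \, \mathrm{sign}\big(\nabla_x \mathcal{L}(f_{\theta^\star}(x), y)\big)
x'_{\text{targeted}} = x - \epsilon \, \mathrm{sign}\big(\nabla_x \mathcal{L}(f_{\theta^\star}(x), t)\big)
```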

It perhaps goes without saying that in many instances attackers will have no idea which machine learning model the AI system is using, which you might imagine would make distorting it very difficult. In reality, however, adversarial attacks remain highly effective even when the model is unknown, due in large part to transferability: adversarial inputs crafted against one model often fool other models trained for similar tasks. This means that attackers can rehearse on one model before attacking a second, reasonably confident that the attack will still prove disruptive.
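Transferability can be illustrated with a toy Python sketch, assuming an attacker who trains a surrogate logistic regression on its own data and a victim who privately uses a 1-nearest-neighbour classifier on different data. Both datasets, both model choices, and all function names are hypothetical. The perturbation is computed only from the surrogate's gradient (an FGSM-style step) and then handed to the victim model, which the attacker never inspects.

```python
import math

# Toy 2-D datasets, invented for illustration.
# The attacker trains a surrogate on its own data...
attacker_data = [((-1.0, -1.2), 0), ((-1.3, -0.8), 0), ((-0.7, -1.0), 0),
                 ((1.0, 1.1), 1), ((1.2, 0.9), 1), ((0.8, 1.0), 1)]
# ...while the victim's data and model family are hidden from the attacker.
victim_data = [((-1.0, -1.0), 0), ((-0.5, -0.8), 0),
               ((0.9, 1.0), 1), ((0.6, 0.7), 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data, lr=0.1, steps=500):
    """Surrogate model: logistic regression fitted by batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(steps):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0] / n
            gw[1] += err * x[1] / n
            gb += err / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


def logistic_predict(model, x):
    w, b = model
    return int(w[0] * x[0] + w[1] * x[1] + b > 0)


def nearest_neighbour_predict(data, x):
    """Victim model: 1-nearest-neighbour, a different model family entirely."""
    _, label = min(data, key=lambda item: math.dist(item[0], x))
    return label


def fgsm_on_surrogate(model, x, y, eps):
    """Perturb x using only the surrogate: d(BCE)/dx = (p - y) * w."""
    w, b = model
    p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
    grad = [(p - y) * wj for wj in w]
    return tuple(xj + eps * ((g > 0) - (g < 0)) for xj, g in zip(x, grad))


surrogate = train_logistic(attacker_data)
x, y = (0.5, 0.4), 1
x_adv = fgsm_on_surrogate(surrogate, x, y, eps=0.6)
print("surrogate:", logistic_predict(surrogate, x), "->", logistic_predict(surrogate, x_adv))
print("victim:   ", nearest_neighbour_predict(victim_data, x), "->",
      nearest_neighbour_predict(victim_data, x_adv))
```

In this deliberately simple setup, the input crafted against the surrogate also flips the victim's prediction from 1 to 0, because both models learned broadly similar decision boundaries. Real-world transfer rates vary and are not guaranteed; the sketch only shows the mechanism.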

The question is, can we still trust machine learning? Research has suggested that a good way to protect against adversarial attacks is to train systems to detect adversarial inputs automatically and repair them at the same time. One approach to achieving this is known as denoising, which requires developing methods that remove noise from the data. Ordinarily this noise might simply be Gaussian, but by using an ensemble of denoisers, each suited to a distinct type of noise, it’s possible to strip out each kind. The aim is to return the data as close as possible to the original, uncorrupted version, and thus allow the AI to continue functioning properly. The next step is to use a verification ensemble that reviews the denoised data and re-classifies it. This acts as a simple verification layer to confirm that the denoising has worked.
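The denoise-then-verify pipeline can be sketched in a few lines of Python, under heavy simplifying assumptions: the input is a 1-D signal, the "model" labels a signal 1 if its values are high on average, and the attacker injects a couple of large spikes. The three denoisers (median filter, moving average, clipping), the two verifier classifiers, the majority vote, and the 0.75 agreement threshold are all hypothetical stand-ins, not the specific methods from the research the paragraph alludes to.

```python
from statistics import mean, median


def _pad(x, r):
    # Repeat the edge values so every window has a full k samples.
    return [x[0]] * r + list(x) + [x[-1]] * r


# --- Hypothetical denoiser ensemble: one denoiser per assumed noise type ---
def median_filter(x, k=3):
    """Suits impulse ("salt-and-pepper") noise: isolated spikes are discarded."""
    padded = _pad(x, k // 2)
    return [median(padded[i:i + k]) for i in range(len(x))]


def moving_average(x, k=3):
    """Suits Gaussian noise: averaging shrinks zero-mean jitter."""
    padded = _pad(x, k // 2)
    return [mean(padded[i:i + k]) for i in range(len(x))]


def clip(x, lo=0.0, hi=1.0):
    """Suits out-of-range perturbations: values outside the valid range are clamped."""
    return [min(max(v, lo), hi) for v in x]


DENOISERS = [median_filter, moving_average, clip]


# --- Hypothetical verification ensemble: classifiers that re-check the denoised data ---
def mean_classifier(x):
    return int(mean(x) > 0.5)


def majority_classifier(x):
    return int(sum(v > 0.5 for v in x) > len(x) / 2)


VERIFIERS = [mean_classifier, majority_classifier]


def denoise_and_verify(x, min_agreement=0.75):
    """Denoise x with every denoiser, re-classify with every verifier, and vote.

    Returns (label, agreement, suspicious), where low agreement flags the input.
    """
    votes = [clf(d(x)) for d in DENOISERS for clf in VERIFIERS]
    label = max(set(votes), key=votes.count)  # ties go to the smaller label
    agreement = votes.count(label) / len(votes)
    return label, agreement, agreement < min_agreement


clean = [0.3] * 9
attacked = list(clean)
attacked[2] = attacked[6] = 2.5          # the attacker's spikes
print(mean_classifier(attacked))         # the undefended classifier is fooled
print(denoise_and_verify(attacked))
print(denoise_and_verify(clean))
```

Here the undefended classifier labels the attacked signal 1, while the ensemble recovers label 0 and also flags the input as suspicious, because the moving average smears the spikes instead of removing them and so disagrees with the other two denoisers. Using several denoisers plus a verification vote is what lets one poorly matched denoiser be outvoted rather than trusted.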

Suffice it to say, these defensive tactics are themselves still at an experimental stage, and it’s clear that more needs to be done to ensure that, as AI becomes a growing part of everyday life, we can rely on it to deliver sound, effective outcomes free from the distorting effects of hackers. There’s a strong sense that early biased outputs have eroded any inclination to trust these systems blindly, but there is perhaps more work to be done to truly convince vendors to tackle adversarial attacks properly, and indeed for regulators to ensure such measures are in place.
