How MIT Students Fooled A Google Algorithm

You and I see a cute picture of a dog. But Google’s neural network sees guacamole. The trickery behind this new way to fool AI is a bigger deal than you might think.

Machine learning algorithms, which use large amounts of data to power everything from your email to language translation, are being heralded as the next big thing in technology. The only problem is, they’re vulnerable.


Over the last few years, researchers have shown that one type of machine learning algorithm, the image classifier (think of it as a program you can show a picture of your pet, and it will tell you whether it's a dog or a cat), is weak in a surprising way. These programs are susceptible to attacks from something called "adversarial examples." An adversarial example occurs when you show the algorithm what is clearly an image of a dog, but instead of seeing a dog, a glitch that human eyes can't detect makes the classifier see a picture of guacamole instead.
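To make the idea concrete, here's a minimal sketch of the classic white-box version of the trick (not LabSix's method): nudge every pixel a tiny amount in the direction that raises the classifier's score for the wrong label. It assumes a trained PyTorch classifier called `model`, an image tensor `dog_image` with pixel values in [0, 1], and a made-up target class ID `GUACAMOLE_ID`; all of those names are illustrative.

```python
# Minimal white-box sketch of an adversarial example (fast gradient sign method).
# Assumes `model` is a trained PyTorch classifier and `image` is a (1, 3, H, W)
# tensor with pixel values in [0, 1]; both are stand-ins, not LabSix's code.
import torch
import torch.nn.functional as F

def fgsm_targeted(model, image, target_label, epsilon=0.01):
    """One-step targeted perturbation toward `target_label`."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), torch.tensor([target_label]))
    loss.backward()
    # Step *against* the loss gradient so the target label becomes more likely,
    # keeping the per-pixel change small enough to be invisible to people.
    adversarial = image - epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Hypothetical usage:
# adversarial_dog = fgsm_targeted(model, dog_image, target_label=GUACAMOLE_ID)
```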

Researchers initially thought these attacks were highly theoretical, more of a demonstration than something to worry about. That changed earlier this year, when a group of MIT students from the student organization LabSix showed that they could create three-dimensional objects that algorithms would also misclassify, proof that adversarial examples are a threat in the real world. The students' work was limited in one key way: they still needed access to the inner workings of the algorithm to create their adversarial examples.

Today, those same students announced that they've moved beyond that limitation, a troubling insight into the vulnerabilities of the AI already at work in our world.

In a new paper, the authors describe their newfound ability to create adversarial examples when they know very little about the algorithm they're attacking (they were also able to complete the attack significantly faster than any previous method). To demonstrate the effectiveness of their technique, they successfully attacked and fooled the Google Cloud Vision API, a standard commercial image classification algorithm that's used all over the internet. All they knew about Cloud Vision was what it produces when it looks at an image: its top few label choices, along with its confidence in each one.

Not having basic information about the neural network made creating an adversarial example to fool it a huge challenge, as Andrew Ilyas, one of the students in LabSix, explains. “Normally what you want to do when you construct these adversarial examples, you start with an image of a dog that you want to turn into guacamole,” says Ilyas. “It’s important, traditionally, that I have access to the probability at all times that this picture is guacamole. But with Google Cloud Vision, it’s not going to tell you anything about how likely that dog is going to be guacamole. It’s only going to tell me how confident it is that it’s a dog.”
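Here's a toy illustration of the problem Ilyas is describing: the service reports only its top few labels and their scores, so the score for "guacamole" simply isn't there until that label climbs into the top of the list. The `query_cloud_vision` function below is a stand-in with a hard-coded response, not the real Google client library.

```python
# Toy sketch of the "partial information" setting: the attacker sees only the
# top-k (label, confidence) pairs, never the full probability distribution.
def query_cloud_vision(image):
    # Pretend response for a dog photo: top labels and confidences only.
    return [("dog", 0.92), ("canine", 0.89), ("snout", 0.71)]

def target_score(image, target="guacamole"):
    scores = dict(query_cloud_vision(image))
    # Returns None when the target label isn't ranked at all, which is
    # exactly the lack of feedback that makes this attack hard.
    return scores.get(target)
```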

[Image: LabSix]
To get around this problem, the team used a method from another area of computer science to estimate how much each pixel of the dog image needed to shift so that the algorithm would think the image was of guacamole. Then they used a pair of algorithms working together to nudge those pixels, submitting the image to the Cloud Vision API thousands or even millions of times as it gradually morphed from a dog into guacamole. Normally this could take upwards of 5 million queries, but Ilyas and his team's method is much faster: it took them only about 1 million queries to create a working adversarial example for the Google Cloud Vision image classifier, the guacamole that human eyes would never see.
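The sketch below shows the general flavor of that query-based strategy, not the authors' exact algorithm: probe the classifier with many slightly perturbed copies of the image, use the score changes to estimate how each pixel should shift, and repeat. The `score_fn` parameter stands in for whatever number the attacker can extract from the API for the target label, and the step counts and step sizes are illustrative knobs.

```python
# Generic sketch of a query-only ("black-box") attack: estimate the gradient
# of the target-label score from API responses alone, then climb it.
import numpy as np

def estimate_gradient(score_fn, image, n_samples=50, sigma=0.001):
    """Estimate d(score)/d(pixels) by probing with small random perturbations."""
    grad = np.zeros_like(image)
    for _ in range(n_samples):
        noise = np.random.randn(*image.shape)
        # Each pair of queries reveals how the score changes along one random
        # direction; averaging many of them approximates the full gradient
        # without ever seeing the model's internals.
        delta = score_fn(image + sigma * noise) - score_fn(image - sigma * noise)
        grad += delta * noise
    return grad / (2 * sigma * n_samples)

def black_box_attack(score_fn, image, steps=1000, lr=0.01):
    """Repeatedly nudge the image toward a higher target-label score."""
    x = image.copy()
    for _ in range(steps):
        x = np.clip(x + lr * estimate_gradient(score_fn, x), 0.0, 1.0)
    return x
```

Every call to `score_fn` is one API query, which is why an attack like this can rack up hundreds of thousands or millions of requests; the team's contribution was making that query budget far smaller than before.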


It’s a much more efficient mode of attack, one that could make it easier for people with malicious intent to trick any number of commercial image classifiers used online. The LabSix team emphasizes that they didn’t choose Google for any particular reason; many other companies, including Amazon and Microsoft, offer these types of algorithms. For instance, the comments platform Disqus uses an image classification service from Clarifai to weed out inappropriate images from the comments sections of websites.

There are broader implications. For instance, defense companies and criminal investigators also use cloud-based learning systems to sort through large piles of images. A skilled coder could craft an image that would look innocuous to the human eye but read as dangerous to the machine–and vice versa.

[Image: LabSix]
“This is yet another result that real-world systems are at risk and we’re making progress toward breaking practical systems,” says Anish Athalye, another student on the LabSix team. “This is a system people hadn’t attacked before. Even if things are commercial, closed proprietary systems, they are easy to break.”

As adversarial examples move into the real world, researchers still haven’t found a robust way to guard against them. This could have devastating consequences in the future as these algorithms continue to colonize our online–and offline–world. But Ilyas and Athalye are hopeful that if researchers can find vulnerabilities before these technologies are too widespread, they’ll have a chance at patching the algorithms’ holes–before the bad guys exploit them.

About the author

Katharine Schwab is an associate editor at Co.Design based in New York who covers technology, design, and culture.