2015-02-11

It’s not the most gorgeous photography you’ve ever seen. ImageNet features 1.2 million pictures of mundane items: a photocopier shoved in the corner of an office, a bowl of oatmeal on a table, a pile of logs, a giant sign shaped like an ear of corn, an elbow. But ImageNet is important. It’s the central collection of images that scientists around the world use to train their image-recognition software, and then to test it.

Every year, algorithms get better at identifying what’s in these images. Now Microsoft Research has announced a major milestone: its software identified the contents of 100,000 test images in ImageNet with a 4.94% top-5 error rate, while humans have previously scored a 5.1% error rate on the same test. In other words, Microsoft hasn’t just beaten every competitor in the industry; it has beaten humans at their own game.
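ImageNet classification results like these are usually reported as top-5 error: an image counts as correctly classified if its true label appears anywhere among the model’s five highest-ranked guesses. A minimal sketch of that metric follows, assuming the model’s output for each image is a plain list of per-class scores. The function name `top_k_error` is hypothetical and illustrative only; it is not taken from Microsoft’s code or the official ImageNet evaluation kit.

```python
from typing import Sequence


def top_k_error(scores: Sequence[Sequence[float]],
                labels: Sequence[int],
                k: int = 5) -> float:
    """Fraction of images whose true label is NOT among the k top-scoring classes.

    scores[i][c] is the (hypothetical) model's score for class c on image i;
    labels[i] is the index of the correct class for image i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one image")

    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ranked from highest to lowest score; keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)
```

With k=5, a model can “miss” its single best guess and still be scored correct, which is why top-5 error is a more forgiving number than top-1 error on the same images.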

“That is the current best [result] I have heard about,” confirms Alex Berg, an assistant professor at UNC Chapel Hill who helps manage the ImageNet dataset. He noted, though, that Baidu, with its 5.33% error rate, is very close to Microsoft Research’s milestone, and that results may be approaching the theoretical peak of the test itself. “There is some noise and ambiguity in the dataset, and so further small improvements in accuracy may not be meaningful.”

The advantage Microsoft’s system has over humans comes largely down to what the researchers call “fine-grained” categories, like distinguishing among 120 different breeds of dogs. But error rates and theoretical peaks aside, the real takeaway is that software is getting extremely good at recognizing everyday objects with remarkable specificity. And that is a key development for the future of interfaces.

As digital glasses like Microsoft HoloLens and Magic Leap make their way to market, they’ll rely heavily on the promise of augmenting our reality: adding interface and information to all of the mundane objects around us. There are really two ways these systems can do that without attaching RFID tags to every box of cereal on the grocery store shelf.

The first is geolocation. The HoloLens patent application describes building a cloud-connected map of the entire world. So if you walk through a park, say, every tree would already be indexed and tagged in the map’s database, and the glasses could deliver relevant information on the fly as you pass each one.
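The patent application describes the map, not a method for querying it. As a rough sketch of the lookup side only, assume a hypothetical in-memory index of tagged objects, each with a latitude and longitude, and return everything within some radius of the wearer, nearest first, using the standard haversine great-circle distance. The `TaggedObject` type, the `nearby` function, and the sample trees are all invented for illustration; a real world-scale map would use a spatial index rather than this linear scan.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass(frozen=True)
class TaggedObject:
    """A hypothetical entry in a location-indexed map of the world."""
    name: str
    lat: float  # degrees
    lon: float  # degrees
    info: str   # the overlay text the glasses would display


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(index: Iterable[TaggedObject], lat: float, lon: float,
           radius_m: float) -> List[TaggedObject]:
    """Objects within radius_m of (lat, lon), sorted nearest first."""
    hits = [(haversine_m(lat, lon, o.lat, o.lon), o) for o in index]
    hits.sort(key=lambda h: h[0])  # sort on distance only
    return [o for dist, o in hits if dist <= radius_m]
```

For example, a wearer standing at the park entrance could call `nearby(park_index, lat, lon, 50)` every few seconds and show the `info` text for whatever comes back.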

The second is image recognition, the same sort of technology Facebook uses to tag your friends’ faces. In this scenario, if you looked at a stop sign while wearing augmented reality glasses, the glasses would simply know it’s a stop sign, much as a human would, through their own visual logic.