If the PS4 is a souped-up Fiat, as my colleague John Brownlee wrote today, the Xbox One is a flying car in need of a parachute. On sale for $499 this week, it’s a vision of the future worthy of a World’s Fair booth or a Popular Mechanics cover. There’s just one problem: While you’re enjoying the view from above the clouds, sometimes the engines cut. And as you find yourself and your family plummeting to Earth, you’ll be shouting:
“XBOX [beat] FIRE EJECTOR SEAT!”
“XBOXXOXOBXBX! JIMMY HOLD ON! HONEY, I LOVE YOU.” [transmission goes silent]
More than any platform before it, the Xbox One promises an interface driven by multimodal interaction: use the controller, use a tablet, gesture, or just speak to the console to do whatever you like. It’s the most groundbreaking product in Microsoft’s portfolio, and yet it’s held back by one simple fact. Its crown jewel, the newly engineered Kinect (a body-tracking, heartbeat-reading, voice-understanding camera and microphone array), still doesn’t hear the human voice well enough for a living room environment.
The Xbox One wants to be the ultimate living room entertainment device. Whereas its predecessor, the Xbox 360, played games and streamed movies, the Xbox One sucks in your cable signal via the ubiquitous HDMI cable, and its interface allows seamless switching between games, television, apps, and the Xbox home screen. One day, it might even control your Internet of Things objects, too. The hardware supports all of this through the new Kinect, which, aside from listening at all times, paints your room in invisible infrared light both to track your body and to control your speakers and television, much like a smart remote. The new Kinect was completely re-engineered from the Xbox 360’s version, which no doubt emboldened Microsoft to build voice commands deeper into the Xbox One’s interface. Whereas the Xbox 360 let you say a handful of commands, usually ones displayed onscreen, the One encourages you to learn a vocabulary that works anytime, anywhere within the UI.
In theory, this means you can walk into your living room and simply say “Xbox On” to turn on your Xbox, as well as your television and receiver. Then you can say “Xbox play Forza Motorsport 5...Xbox watch TV...Xbox watch HBO...Xbox answer [Skype video call]...Xbox go to Netflix,” etc. Video games automatically pause mid-battle, and the software switches as fast as multitasking should, no matter the input.
In practice, you might say “Xbox On” three or four times before the system wakes. You might say “Watch HGTV” only to be taken to ABC. In a room of perfect silence, you might say any number of things that simply don’t seem to do anything at all. And in a room filled with the explosions from an action flick, you might say something and be heard flawlessly.
In my own testing, I’d estimate that voice commands work about 50% of the time. The most common commands, like “Watch TV” or “Go Home,” almost always work, while many others seem to fail more often than they succeed. So why don’t I just give up and use my Xbox controller? Or change channels with my TiVo’s remote? I certainly do sometimes (how else could I ever watch HGTV?). But voice is an omnipresent crutch that, floating in the ether, tempts me again and again.
Because when the Kinect does work, Microsoft has created magic that feels so good I can’t give it up. I’m old enough to remember the clocks on VCRs that perpetually blinked 12:00 because some engineer built a clock interface that no human could decipher. How can I not appreciate the godliness of having my hands full of Chinese takeout boxes and, much like asking my wife to grab plates, asking my Xbox to turn on ESPN?
The appeal of speaking to the Xbox is that it can control the system, not just rattle off trivia. Take a similar system, Apple’s Siri. When Siri Googles “sushi” for you, it’s relatively unremarkable. But when she handles an actual task, like making a reservation at a sushi restaurant, the feeling of empowerment is incredible. And thanks to the Xbox One’s underlying connections to so many different media (your cable, Internet connection, Skype account, and so on), the console is always chauffeuring me through the interface to someplace where the action is, like a game, channel, or movie. Talking to the Xbox One, when it works, offers an unprecedented feeling of control over a digital entity.
In this sense, the Xbox One is a huge experiential leap from the Xbox 360, which, most of the time, only let you read aloud a list of commands shown on the screen. Voice done properly, with a deep understanding of our pronunciation and the vocabulary of our dialect, can open wormholes within the interface, transporting you straight to whims that the best 2-D UI designers could never anticipate. The Xbox One opens enough vocal wormholes that I can’t go back to punching in all of my coordinates by hand.
It would be easy to hedge by predicting that, in five to 10 years, Microsoft will iron out the kinks (many through free firmware updates). But that’s exactly the sort of thing we’ve all been saying about voice recognition and flying cars for decades. So who knows? The Xbox One, for all that it begs us to talk, may never get better at listening than it is today.
That said, the One still offers a prescient glimpse of the user interfaces to come: a world where, sure, some controls are imperfect, but there are alternatives. Because as billions of dumb objects come alive and combine into the Internet of Things, do you think one interface can possibly rule them all?
The future of interface isn’t speech, but it isn’t touch, typing, or gesture, either. It’s all of these specialties and more, each standing at the ready to showcase its talents as context requires.
So buckle up, there may be turbulence. I promise the view is worth it.