Co.Design

The Xbox One Is Microsoft's Glitchy Vision Of The Future

The Xbox One is a tease of our future--one that's as frustrating as it is empowering.

If the PS4 is a souped-up Fiat, as my colleague John Brownlee wrote today, the Xbox One is a flying car in need of a parachute. On sale for $499 this week, it’s a vision of the future worthy of a World’s Fair booth or a Popular Mechanics cover. There’s just one problem: While you’re enjoying the view from above the clouds, sometimes the engines cut out. And as you find yourself and your family plummeting to Earth, you’ll be shouting:

“XBOX EJECT!”

“XBOX [beat] FIRE EJECTOR SEAT!”

“XBOXXOXOBXBX! JIMMY HOLD ON! HONEY, I LOVE YOU.” [transmission goes silent]

More than any platform before it, the Xbox One promises an interface driven by multi-modal interactions--use the controller, use a tablet, gesture, or just speak to the console to do whatever you like. It’s the most groundbreaking product in Microsoft’s portfolio, and yet it’s held back by the simple fact that its crown jewel--the newly engineered Kinect, a body-tracking, heartbeat-reading, voice-understanding camera and microphone array--still doesn't hear the human voice well enough for a living room.

The Promise

The Xbox One wants to be the ultimate living room entertainment device. Whereas its predecessor, the Xbox 360, played games and streamed movies, the Xbox One sucks in your cable signal via the ubiquitous HDMI cable, and its interface allows seamless switching between games, television, apps, and the Xbox home screen. One day, it might even control your Internet of Things objects, too. The hardware supports all this through the new Kinect, which, aside from listening at all times, paints your room in invisible infrared light both to track your body and to control your speakers and television much like a smart remote. The new Kinect was completely re-engineered from the Xbox 360's version, which no doubt emboldened Microsoft to build voice commands deeper into the Xbox One's interface. Whereas the Xbox 360 let you say a few commands, generally ones displayed onscreen, the One encourages you to learn a vocabulary that works anytime, anywhere within the UI.

In theory, this means you can walk into your living room and simply say “Xbox On” to turn on your Xbox, as well as your television and receiver. Then you can say “Xbox play Forza Motorsport 5...Xbox watch TV...Xbox watch HBO...Xbox answer [Skype video call]...Xbox go to Netflix,” etc. Video games automatically pause mid-battle, and the software switches as fast as multitasking should, no matter the input.

The Practice

In practice, you might say “Xbox On” three or four times before the system wakes. You might say “Watch HGTV” only to be taken to ABC. In a room of perfect silence, you might say any number of things that simply don’t seem to do anything at all. And in a room filled with the explosions of an action flick, you might say something and be heard flawlessly.

In my own testing, I’d estimate that voice commands work about 50% of the time: the most common commands, like "Watch TV" or "Go Home," almost always work, while many others seem to fail more often than they succeed. So why don’t I just give up and use my Xbox controller? Or change channels with my TiVo’s remote? I certainly do sometimes (how else could I ever watch HGTV?). But voice is an omnipresent crutch that, floating in the ether, tempts me again and again.

Because when the Kinect does work, Microsoft has created magic that feels so good I can’t give it up. I’m old enough to remember the clocks on VCRs that perpetually blinked 12:00 because some engineer built a clock interface that no human could decipher. How can I not appreciate the godliness of having my hands full of Chinese takeout boxes and, much like asking my wife to grab plates, asking my Xbox to turn on ESPN?

The appeal of speaking to the Xbox is that it can control the system, not just rattle off trivia. Take a similar system, Apple's Siri. When Siri Googles "sushi" for you, it's relatively unremarkable. But when she handles an actual task, like making a reservation at a sushi restaurant, the feeling of empowerment is incredible. And thanks to the Xbox One's underlying connectivity to so many media sources--your cable, Internet connection, Skype account, etc.--the console is always chauffeuring me through the interface to someplace where the action is, like a game, channel, or movie. Talking to the Xbox One, when it works, offers an unprecedented feeling of control over a digital entity.

And in this sense, the Xbox One is a huge experiential leap from the Xbox 360, which, most of the time, only let you read aloud commands listed on the screen. Voice done properly, with a deep understanding of our pronunciation and the vocabulary of our dialect, can open wormholes within the interface, transporting you straight to whims that the best 2-D UI designers could never anticipate. The Xbox One opens enough vocal wormholes that I can’t go back to punching in all of my coordinates by hand.

Eventually…

It would be easy to hedge that, in five to 10 years, Microsoft will iron out the kinks (many through free firmware updates). But that’s exactly the sort of thing we’ve all been saying about voice recognition technologies and flying cars for decades. So who knows? The Xbox One, for all that it begs us to talk, may very well never get better at listening than it is today.

That said, the One is still prescient about the world of user interfaces to come--one where, sure, some controls are imperfect, but there are alternatives. Because as billions of dumb objects come alive and combine into the Internet of Things, do you think one interface can possibly rule them all?

The future of interface isn’t speech, but it isn’t touch, typing, or gesture, either. It’s all of these specialties and more, each standing at the ready to showcase its talents as context requires.

So buckle up: there may be turbulence. I promise the view is worth it.

12 Comments

  • erictan

    Some very interesting and appealing features, but 1) I don't play video games, and 2) I'm in Hong Kong, so I doubt most of them work as designed... So yes, who is this device aimed at: owners of previous models of the Xbox?

  • Daniel Carrapa

    Two problems I have with the Xbox One.

    Who is the target market for this device? It may appeal to a wider base of consumers than the strictly "gamer" crowd. Still, who are the real buyers? Gamers will be interested in the next-gen gaming console, either the X1 or the PS4. Will non-gamers step in as well, to buy what is ultimately seen as a gaming console, in search of a multi-modal entertainment device?

    The second problem comes from the fact that, even if Microsoft is eventually right about the future, the future of the Xbox brand depends on this console's success. Meaning, the Xbox One's vision may be right on target for the consumer base ten years from now, but not for today. And if the PS4 wins this generation by a wide margin, Microsoft may lose momentum as other players step into the multi-modal TV-system market a few years from now.

    Microsoft may be right about the future, but they may be trying to sell it too soon.

    Also, as a gamer, one has to wonder how much of the console's processing power is compromised with the multi-modal (at the ready) interface, and how much that will be detrimental to the gaming experience.

  • Vicenç Sallés

    Wouldn't a better title be "Xbox One voice recognition is glitchy" instead of the current dramatic one? You are not talking about the all-in-one living room entertainment box of the future, which is an interesting topic... You are just complaining about a technical issue. Too big a title for such a simple complaint.

  • Fabian Galon

    "The future of interface isn’t speech, but it isn’t touch, typing, or gesture, either. It’s all of these specialties and more, each standing at the ready to showcase its talents as context requires."

    This is so amazingly true. I get so dismayed whenever I hear colleagues touting one type of interaction as the ultimate solution for everything. That's like arguing that every interface should consist solely of drop-down menus.