It’s not just Siri, Cortana, and Alexa that want you to talk to them anymore. At this year’s Consumer Electronics Show, Samsung debuted a refrigerator you can speak to, and AT&T revealed a voice assistant for home automation. Volkswagen showed off a concept car you can drive through vocal prompts, and Intel introduced a domestic hoverboard robot that will obey your commands.
You might call this another CES trend, the smart photo frame feature creep of 2016. But in reality, it’s a talkpocalypse. We’re reaching a critical threshold where every device maker wants in on the next big thing in UI: natural, spoken language. And they’re going to wage battle through a hundred different ears that constantly listen to your life—to dim the lights or rearrange your calendar or buy you new shoes—ever-eavesdropping with the goal of serving your needs before a competitor can.
But the reason companies want their gadgets listening to us isn’t simply the quest for good design. It’s not just ease of use driving their decision to incorporate this technology. Rather, voice control is a means for hardware manufacturers to create and control their own platforms instead of ceding them to the traditional Silicon Valley titans, all while opening up new revenue streams that scrape small but significant profits from providing services to you.
For example, Samsung could have just allowed Apple's Siri and HomeKit to control its new talkie fridge. Instead it bought its own speech system. From across the kitchen, you can speak to your fridge, adding milk to your shopping list. Samsung gets to double dip on this feature. Because not only might voice commands help to sell you their new fridge, but presumably, Samsung also gets a cut every time you order milk via its integrated shopping app, Groceries by MasterCard, which will bring groceries to your door from FreshDirect and ShopRite.
While all of this spoken convenience may inevitably become cacophony, can you blame them? Because if it’s not Samsung’s fridge listening to you for that transaction, you’d just order from Amazon’s Alexa instead.
Voice Has The Power To Democratize "The Platform" As We Know It
Back when the word "platform" really just referred to operating systems on desktop computers, the stakes were a lot lower. Microsoft made money selling Windows and not much else. As hardware became more specialized—call it the Xbox era or iPod era—Microsoft got an extra cut off of licensing games to play on its console, and Apple found a means to profit off the record industry via iTunes. Apple upped the ante further with its App Store, which takes 30% of every transaction both for and within apps, but by and large, being a "platform" still meant you could make licensing fees. And unless you could be the biggest, baddest platform on the block, it made no sense to even attempt it. So if you’re, say, a phone manufacturer, you’re still better off adopting Android than building your own OS.
But as the Internet and mobile world mature, everyone’s getting a lot more savvy. Google has started pointing you to its partner, Uber, in Google Maps. Want to edit an Instagram photo into a collage? Instagram will link you to another (Instagram-owned) app, Layout, for that. From Amazon affiliates to Facebook's app-linking, there’s a new stream of money in referrals and self-promotion. If someone shows up on your doorstep, you can make a buck sending them to the next door.
In that world, convenience becomes the platform. Because when you combine voice control (the most naturalistic form of human expression, one that sidesteps the layers of taxonomy and UI that make operating systems so hard to design) with domestic hardware that’s always going to be part of the physical world, the doorsteps change. Google’s invincible search engine, spewing out endless streams of referral dollars, hypothetically wields no more presence than the Samsung fridge from which you're grabbing your breakfast. This phenomenon is, perhaps, a second, better definition for the Internet of Things: Every object has a secondary profit model introduced by the web.
Now, hardware manufacturers have remembered their value, and CES’s obsession with voice proves it. It’s why Ford is working hard to stave off Google and Apple’s attempts to take over its in-car OS, and even got Toyota on board with its platform instead of siding with the Valley alternatives. If Ford has worked so hard to sell you its car, why would it let Google get the cut of every gas station it points you to? It’s just bad business...even if it's rumored to be inevitable.
Too Many Companies Are Listening To What They Want To Hear
But the problem with a scenario in which you can talk to anything is that you’re no longer talking to one thing. Only so many ears can live in one room. If I muse aloud that I need more shampoo in the shower, what hears me? Is it my iPhone sitting at the sink? Alexa networked in my apartment? Some new smart water nozzle from Kohler? And even if that bit is sorted out, I’m left with a bigger problem than the surface UI: Navigating the social dynamics, and the motivations, of everything that can hear me.
Because of the aforementioned business model at their core, each object that hears me is inherently biased. Make no mistake: Amazon’s Alexa is not Iron Man’s Jarvis. Jarvis answers to one guy, Tony Stark, because Tony Stark is the only guy he needs to keep happy. Alexa answers to all of Wall Street, because Amazon is a publicly traded company that needs to ship you things to live. And so Alexa’s not going to hunt around to find you the best deal on shampoo, much like Google Maps will never refer you to Lyft (unless an anti-trust suit catches up with it).
As consumers, we’re caught in the middle of the convenience. Do we choose to side with Siri, Alexa, or Cortana, and talk only to her, despite looming bias and the risk of growing dependent on a single voice, one that could take advantage of us? Or do we side with a free market that gives a voice to every stupid overzealous object in our lives, however confusing that may be, in a world where ordering milk becomes a bidding war on a commodities trading floor?
Which future do you root for? They both sound horrible.
The way I see it, in another five years, we’ll all live like kings, having every whim taken care of on spoken demand. But we’ll be more like the kings of King’s Landing, where the ears hide everywhere, and our closest advisors can scarcely be trusted.