If there’s one dominant technological paradigm we’ll remember about 2016, it’s voice. From chatbots to Amazon Echo to conversational interfaces, our voices—and how we use them—are quickly becoming the primary way we interact with computers.
Often lacking from this picture, though, is data. Because many tech companies play their cards close to the vest when it comes to adoption and user growth rates, it’s been hard to know exactly how many people are really using voice interfaces, why, and how. This week, Mary Meeker, the Silicon Valley venture capitalist at Kleiner Perkins Caufield & Byers and techno-cultural clairvoyant, released her annual report on the state of the Internet, a perennially delightful beacon of empirical data about the way humans are using the web and technology at large.
Unsurprisingly, Meeker devotes a significant chunk of the report to voice UI. It’s a refreshing (and in some cases, surprising) look at the numbers—here are a few highlights.
There’s a fairly diverse range of reasons voice makes sense: You might not have control of your digits. You might be driving. You might just be lazy. So which is the most common use case? According to the report, 60% of people who use voice features do so because it’s easier when their hands or eyes are occupied, mostly either at home or in the car, where 36% of voice features were used.
But at the same time, nearly a quarter of users choose voice because it’s tough to type on some devices—suggesting that accessibility among disabled people is a major driver of user adoption. No surprise there: Many voice control features were originally developed for people with motor impairment, as Google demonstrates as it incubates new voice control features in its accessibility team.
Meanwhile, 22% of people said they use voice control because it’s "fun," which sounds unsettlingly like the first few minutes of Her.
What do people actually say when they talk to their devices? When Google rolled out new voice commands in early 2014, it used a common example: "Call mom." In this graphic, you can see how the use of the command spiked around that time. It’s stayed strong since then, spiking and dipping alongside another common command, "Navigate home." Sorry, Dad.
A big chunk of the report is devoted to one voice-control device in particular: Amazon Echo. The numbers are startling: A whopping 5% of all Amazon customers now own an Echo, and 61% are aware of it.
Why does this matter? Consider that Amazon has 44 million Prime subscribers—and one of Echo’s strongest points is making it easier to reorder supplies and new items from the site. Mumbling "buy more paper towels" into the air in your kitchen is about as frictionless as a user experience can get—compared to opening up Amazon on your computer or phone, searching for paper towels, adding them to your cart, and checking out.
That makes Alexa—the machine-borne personality that lives inside Echo devices—the ultimate salesperson, and she’s just getting started. Meeker draws a stark comparison between Echo and the iPhone as proof. While global shipments of iPhones have increased for almost a decade, in early 2016 we saw those numbers drop for the first time. Meanwhile, Echo sales have skyrocketed, with Amazon shipping about a million units in Q1 of 2016.
Does that mean Echo is the new iPhone? Definitely not, and it's still unclear whether users are regularly using their Echoes, or if the growth was due to marketing on Amazon's part or other forces. Rather, the numbers suggest that we’re exiting the era of unlimited smartphone growth—and entering a world where hardware that’s subtly embedded in our environment is increasingly common.
Whatever the long-term consequences, it’s certainly good news for Amazon.
In the barrage of stories about voice UI, it’s easy to lose sight of how long it’s taken the technology to get here. In 1970, machines could recognize words with just 10% accuracy, according to Google. By 2010, accuracy had grown to 70%. In 2016, it jumped to 90%.
But the final percentage points are both the hardest to gain and the most important. Andrew Ng, the chief scientist at Chinese search giant Baidu, gives us the clearest picture of why in the report:
As speech recognition accuracy goes from say 95% to 99%, all of us in the room will go from barely using it today to using it all the time. Most people underestimate the difference between 95% and 99% accuracy—99% is a game changer . . . No one wants to wait 10 seconds for a response. Accuracy, followed by latency, are the two key metrics for a production speech system . . .
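To get a rough feel for why those last few points matter, here is a back-of-the-envelope sketch in Python. It is not from the report: it assumes that "accuracy" means per-word accuracy, that word errors are independent of one another, and that a typical spoken command is 10 words long. All three are simplifications chosen only for illustration.

```python
def sentence_success_rate(word_accuracy: float, words: int) -> float:
    """Chance that every word in a command is recognized correctly.

    Assumption (not from the report): errors on each word are independent,
    so the per-word accuracies simply multiply.
    """
    return word_accuracy ** words


# Compare the two accuracy levels Ng mentions for a hypothetical 10-word command.
for accuracy in (0.95, 0.99):
    rate = sentence_success_rate(accuracy, words=10)
    print(f"{accuracy:.0%} per word -> {rate:.0%} of 10-word commands fully correct")
```

Under those assumptions, roughly 60% of 10-word commands come through without a single error at 95% accuracy, versus roughly 90% at 99%. Put another way, the error rate per word falls from 5% to 1%, a fivefold drop.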
Ng also predicts that within the next four years the move toward voice will see exponential growth. By 2020 over half of the total searches performed online won’t be text queries at all: They’ll be images or voice.
You can check out the full report here.