Microsoft Google Agency

Navigating the current hurdles of voice search


By Lisa Lacy

August 5, 2016 | 6 min read

There was a time, not long ago, when asking your Xbox to order a pizza seemed like an impossible dream. But the future, as they say, is now.

And the possibilities for voice-enabled technology certainly don’t end there.

In fact, voice may very well help search as we know it evolve into a more predictive, conversational experience in which all of a consumer’s needs in a given moment (like, say, “I need to book a flight to Las Vegas on October 10 for $300”) are resolved by a voice assistant within a single ecosystem. But let’s not get ahead of ourselves.

Despite the whiz-bang things consumers can do with voice in this day and age, insiders don’t expect voice search to overtake text in the same way mobile searches eventually surpassed desktop. And that’s in part because voice still has some inherent limitations.

Here’s a closer look at these constraints:


For starters, voice search is limited by technology that may still not deliver the best results, said David Lau, vice president of media at digital marketing agency iCrossing.

In other words, no matter how consumers search, they expect the first results to be most relevant. And Tom Anthony, head of research and development at online marketing agency Distilled, agreed that voice is not always the ideal interface to return information.

“With voice as output [in devices like Amazon Echo, but soon Google Home and likely the rumored Siri-powered Apple Home Appliance], you can no longer rely on web search for meaningful answers,” Anthony said. “In the same way as ‘Intelligent Personal Assistants’ [like Google Now, Siri, Cortana, Hound and Viv] already respond to many queries with some form of direct answer, which requires them picking just one answer – [and not providing] ten blue links, a device that speaks the answer will need to make a decision about what answer to give you. This is part of the move towards data-driven search.”

At the same time, voice searches also carry tone, which is often a better indication of intent, and the longer queries typical of voice offer additional clues about what the consumer wants, noted Purna Virji, senior manager of PPC training at Microsoft.


However, another issue for brands, marketers, search engines and devices alike is that recognizing speech is much more complicated than processing text. Beyond the words themselves, they must now grapple with accents and dialects, not to mention mispronunciations.

In January, in an effort to improve the experience for Australian consumers, Google announced its app “speaks ‘Strayan’”, including an Australian voice that speaks back to users, along with pronunciations of “all those wonderful and complex Aussie place names out there.” And Virji noted it will simply take additional time for search engines to cater to other dialects and languages.

Further, Michael Bonfils, managing director of SEM International, noted intent and voice tonality can mean entirely different things in different languages, which further complicates search marketing.

“So it’s understanding that and driving toward how do you optimize it and trying to think of additional challenges for behavior and mispronunciations like ‘Porsche.’ Some [people] say, ‘Porch,’” Virji said. “And if you do a search on Google or Bing for common mispronunciations, look at Givenchy. How do these brands prepare for that? For semi-complicated brand names, it’s not just misspellings – it’s mispronunciations.”

But the good news is it will get better.

According to Virji, the more consumers interact with voice-enabled devices, the more those devices will improve, much like children who learn when their parents correct their language mistakes.

“This technology needs more and more interaction to learn, so I think as it’s growing, the accuracy will be better and that’s the cycle – more and more people will use it and the accuracy will get better,” she said.

Consumer habits

But consumers will have to evolve, too.

Virji cited comScore figures projecting that 50 percent of all search queries will be made by voice by 2020, which she thinks is a realistic timeframe for consumers to grow accustomed to it.

“When the iPhone came out [we were cautious], but now we take 800 selfies a day – we got trained,” Virji said. “The younger generation…they are going to grow up and it’s normal. But we’re so used to two- to three-word queries, so we’re still getting used to talking. But look at Bluetooth – everyone felt like huge idiots at first, but now you can talk to your watch and it’s fine…and as more devices come out with voice as the only UI, it can speed up the timeline.”

Further, she pointed to her nine-year-old son, who she said is perfectly comfortable having conversations with Siri or Alexa and asking for what he wants, like Batman cartoons.

Indeed, a 2014 Google study found more than half of teens use voice search daily.

“He’s really comfortable, whereas [adults] are still being trained,” she added. “But we will evolve.”

At the same time, Lau questioned how many consumers actually use voice search in the years since it first emerged in 2011 with Siri and Google Now. What’s more, he noted he can frequently get what he needs faster by typing.

Lau even likened voice search to visual search in some respects, pointing to a consumer who searches with a picture of a dress to find out who designed it, but who instead receives shopping results.

“When I search for something nuanced, I don’t get the immediate results I desire,” he said. “With voice, it seems like a hassle.”

Privacy concerns

Further, Lau said he thinks old habits die hard and pointed to the dual limitations of psychology and technology holding voice back.

He also cited privacy concerns, particularly for consumers in public spaces.

“If you’re injured or disabled, it’s a godsend. Voice absolutely helps you out,” Lau said. “And Alexa…is Amazon’s most successful product and it goes to show there is something to be said about having a virtual assistant on standby that you can activate by talking into the air. It works at home, so there is a degree of privacy, but if you’re on a crowded subway train, are you going to search via voice or type it in?”
