From birth we use language as our primary method of communication, and throughout life we hone this skill. We are adept at speech and language; yet we input data and requests into technology by tapping at screens and typing on keyboards.
It’s easy to see why so many of us are eager to communicate with computers and technology in the way we are most comfortable - speech.
While voice technology has come a long way in the last few years, voice assistants today still lack the basic conversational foundations that create a smooth, personalised experience. We use wake words, commands and stilted vocabulary that would sound alien in a real human conversation.
It’s obvious that talking to a Voice User Interface (VUI) is still not quite like talking to a human. Many may argue that this isn’t necessary, and that being able to bark fragmented command words at a disembodied voice gets the job done. For now, maybe that’s all we need, but let’s think about the many future improvements that could be made to voice technology.
The subject of context is one of the most talked about within the VUI community. In the future we can expect devices to be able to hold context over much longer periods of time than they can currently.
Google has the ability to keep context for a few commands, e.g. 'Who is Britney Spears?', 'Who is her mother?', 'Where was she born?', etc. This works well with the new Continued Conversation feature that Google has recently rolled out across the US (and soon the UK).
Continued Conversation allows the device to continue listening after it relays information, so that it can capture any follow-up questions from the user. This means users don’t need to say wake words at the beginning of every sentence, much like normal conversations. Since this feature relies on multi-sentence conversations, hopefully we will see an increase in Google’s ability to hold onto context for longer periods of time.
An example of this would be:
“How long will it take me to get to Gatwick airport?”
“In current traffic, it would take 55 minutes.”
A few hours later:
“Book me a taxi.”
“Sure. Would you like me to book you a taxi to Gatwick airport or somewhere else?”
This small use of context saves the user time by drawing on the assistant’s memory of past requests.
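One way to picture how an assistant could hold onto context like this is a small time-limited memory of recent "slots" (destination, topic, and so on). The sketch below is purely illustrative - the class, slot names and time-to-live are assumptions, not how any real assistant is implemented.

```python
from dataclasses import dataclass, field
from time import time


@dataclass
class ContextMemory:
    """Hypothetical short-term conversational context store."""
    ttl_seconds: float = 4 * 60 * 60          # forget facts after a few hours
    facts: dict = field(default_factory=dict)  # slot name -> (value, timestamp)

    def remember(self, slot: str, value: str) -> None:
        self.facts[slot] = (value, time())

    def recall(self, slot: str):
        entry = self.facts.get(slot)
        if entry is None:
            return None
        value, stamp = entry
        if time() - stamp > self.ttl_seconds:
            del self.facts[slot]               # context has gone stale
            return None
        return value


memory = ContextMemory()
# "How long will it take me to get to Gatwick airport?"
memory.remember("destination", "Gatwick airport")

# Hours later: "Book me a taxi" arrives with no destination mentioned.
destination = memory.recall("destination")
prompt = (f"Sure. Would you like me to book you a taxi to {destination} or somewhere else?"
          if destination else "Sure. Where would you like the taxi to go?")
```

The time-to-live matters as much as the memory itself: as the article notes next, context that lingers too long stops being helpful and becomes a barrier.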
Although context is a highly anticipated feature it needs to be used intuitively. VUIs need to be able to understand when context is helpful to the user and when it would be a barrier to interactions.
Knowledge of a device’s current situation, location and recent interactions forms the illusion of awareness. With this we are able to build trust with Personal Assistants since we know that they will respond appropriately in any situation.
If the device knows that the user is at home rather than at work, a location search can be more relevant. If the device knows that the user has been looking at maps and directions to London, chances are that they want to go to London. If it can cross-reference that against the user’s calendar, it can verify the date that they are visiting.
In the near future your smart device could give you everything you needed for the day based on your past requests and calendar information, e.g. when to leave the house, which train to get, which restaurants to try, when to leave for home and so on.
Alexa recently gained the ability to understand the room that it is in. You can now ask it to turn off the lights without being specific about which lights you want turned off.
This is the very starting point for device awareness. Potentially, in the future, a device could monitor noise levels and respond at an appropriate volume, or change lighting depending on current light levels in the room.
The way that humans interact with each other can be influenced in many ways. We manipulate our conversations, our words and even intonation based on the person we are speaking to.
We talk to children with simpler words and shorter sentences so that they can understand us. We have already seen a move towards this with the Google Home’s Pretty Please, a feature that can be enabled to encourage children to be polite when talking to Google. The ability for a VUI to do this automatically would break down the barriers of language even further, making sure that the user and the VUI are “speaking the same language”.
VUIs can sometimes interpret a momentary pause in speech as a cue to start answering a question. Not only is this frustrating for the user but the assistant won’t be able to surface the correct information from an unfinished sentence. By learning speech patterns, devices will be able to understand when a user has paused or finished their request. This could also go a step further and communicate differently from user to user based on their personality, mood and age.
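One simple way to imagine "learning speech patterns" for endpointing is to adapt the silence cutoff to each user's own pausing habits rather than using one fixed timeout. Real systems combine acoustic and language-model signals; this sketch, with assumed class and method names, illustrates only that one idea.

```python
from statistics import mean


class EndpointDetector:
    """Hypothetical sketch: has the user finished speaking, or just paused?"""

    def __init__(self, default_cutoff: float = 0.7):
        self.pauses: list[float] = []      # observed mid-sentence pauses, seconds
        self.default_cutoff = default_cutoff

    def observe_pause(self, seconds: float) -> None:
        """Record a pause that turned out to be mid-sentence, not the end."""
        self.pauses.append(seconds)

    def cutoff(self) -> float:
        if not self.pauses:
            return self.default_cutoff
        # Wait a little longer than this user's typical mid-sentence pause.
        return 1.5 * mean(self.pauses)

    def is_finished(self, silence_seconds: float) -> bool:
        return silence_seconds >= self.cutoff()


detector = EndpointDetector()
for pause in (0.4, 0.6, 0.5):              # this user pauses ~0.5 s mid-sentence
    detector.observe_pause(pause)

detector.is_finished(0.6)   # within their normal pausing range -> keep listening
detector.is_finished(1.2)   # long enough to treat as the end of the request
```

A slow, deliberate speaker would accumulate longer observed pauses and get a more patient cutoff, while a fast talker would be answered sooner - which is the per-user adaptation the paragraph above describes.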
Some VUIs can handle multiple commands at once. Earlier this year Google started rolling out support for multiple commands to Google Home. It can now support up to three requests in the same sentence.
"Hey Google, give me the weather, the news and then play some music."
Google is currently able to understand basic commands strung together in one sentence. However, it struggles with complex commands and multi-clause sentences.
Even if VUIs are able to handle complex questions, a pitfall is that they need to be asked in a very specific way. If the user cannot guess the correct way in which to ask the question, they will generally give up instead of battling with the interface.
Alexa can set weekday alarms; however, if I ask it to “Set alarms Monday to Friday at 7am”, I get a response that it can’t do that. If I rephrase this to “Set a recurring alarm for weekdays at 7am”, it processes my request. Users should not have to waste time working out how to phrase a question in order to be understood.
The danger of this is that our expectations of smart speakers are lowered every time we hit a wall. If a Personal Assistant is unable to process my request, there is not much chance I’ll check whether an update has fixed it a week later.
In the future we can expect that Personal Assistants will be able to understand long or complex questions phrased in a multitude of ways.
Throughout all of these points we have touched on the idea of Personal Assistants being able to learn patterns in user behaviour. Once VUIs understand how we want to be served our information, they will be much better equipped to handle our requests in an appropriate way.
There are millions of smart speaker devices in people’s homes, which could be used to generate patterns that will help inform VUIs. Using machine-learning technology, smart speakers can start making sense of this data. It could be used to learn individual or nationwide patterns to enrich the user’s experience.
We are already on the back foot when it comes to voice design as users’ expectations have already been lowered. Because of this we might have to wait for future generations to start using this technology natively to reap the benefits. Since the next generation won’t have witnessed the amount of confusion, command failures and oddly structured sentences, voice interfaces might feel more natural and more fluid to them.
We are clearly still in the discovery and learning phase of VUIs and Personal Assistants and these are common teething problems when it comes to new technology. However I believe we should remain cautiously optimistic about the future of voice.
Caroline Richardson is a UX designer at DotLabel