The UX factor: Why AI in voice interfaces needs experience design

By Miguel Alvarez, Director of Technology Services

November 20, 2017 | 6 min read

The most exciting projects I’ve worked on, the ones that had the greatest impact on the people using them, had several things in common. There was a good understanding of the technologies involved, their limitations and the foundations needed for a proper solution. Most importantly, though, experience design played a big role, creating a clear sense of who the audience was and how people would be using the interface.

Great advances have been made in the field of natural language processing or NLP (the application of computational techniques to the analysis and synthesis of natural language and speech), but to progress further, we need experience design professionals to work side-by-side with those defining the algorithms and coding the solutions – to close the gap between the maths and computation, and the way people interact with these interfaces. It’s one thing to have a clever system, such as speech recognition, but what do you do with it once it’s been developed?

Spoken language is a complex field. It requires comprehending individual words, their connection within a sentence, and the context, such as location, as well as the speed at which words are spoken, accent and tone of voice. NLP has made sense of all this through mathematical equations that can identify, process and classify patterns. However, achieving the right balance between the equations and computational power of a system and the efficiency of how it’s used is still a challenge, so it’s important to design interfaces that find the sweet spot between the most cutting-edge tech and what the user wants to achieve and how. This is where experience designers come in.

So how can UX be integrated into the process? I believe a common language that can be used across the disciplines would be a great step forward. It would help experience designers understand the complexities of the systems, so they can better enhance the user experience.

Below are a few of the key terms used by the maths and tech teams that could form a common language and help UX designers contribute more fully to future projects.

Weight

Artificial neural networks are inspired by our understanding of the brain. How they work is complex, but one of the key features is that they rely on the notion of weight – giving priority to the connections that matter most.
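To make the idea of weight concrete, here is a minimal sketch of a single artificial neuron computing a weighted sum. The signal names and weight values are invented for illustration and don’t come from any real voice platform; in a real network the weights would be learned from data rather than set by hand.

```python
def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of inputs: a larger weight gives a connection more influence."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Hypothetical signals for a voice experience, each scored between 0 and 1.
signals = {"brand_keyword": 1.0, "tone_match": 0.5, "filler_words": 1.0}

# Hypothetical weights: what matters most gets the largest weight,
# and a weight of 0 means the signal is ignored entirely.
weights = {"brand_keyword": 0.75, "tone_match": 0.25, "filler_words": 0.0}

names = list(signals)
score = neuron_output([signals[n] for n in names], [weights[n] for n in names])
print(score)
```

Even though the filler words are fully present in the input, they contribute nothing to the score because their weight is zero.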

When thinking of designing experiences for voice, it would be beneficial to match what matters the most with each experience. This could be the brand, tone of voice and keywords, as well as the language to be used. Not everything can have an identical priority.

Vectors and correlations

An important aspect of NLP is understanding not only how the words we use differ, but also how they relate to each other. For example, even though the words ‘cat’ and ‘dog’ are different, the way they’re used in sentences can be extremely similar. Each word is represented as a vector of numbers, and the correlations between those vectors capture how similar the words are. There’s a trade-off, though: if we add more data to represent each word, the accuracy of understanding speech goes up, but so does the computational power required.
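As a rough illustration of how similarity between words can be measured, the sketch below compares word vectors using cosine similarity, one common measure. The three-number vectors are made up for this example; real systems learn vectors with hundreds of dimensions, which is where the extra computational cost comes from.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only; they are not from a trained model.
word_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.1],
    "invoice": [0.1, 0.0, 0.9],
}

cat_dog = cosine_similarity(word_vectors["cat"], word_vectors["dog"])
cat_invoice = cosine_similarity(word_vectors["cat"], word_vectors["invoice"])

print(f"cat vs dog:     {cat_dog:.3f}")
print(f"cat vs invoice: {cat_invoice:.3f}")
```

With these toy values, ‘cat’ and ‘dog’ score close to 1, while ‘cat’ and ‘invoice’ score much lower.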

Experience designers should be aware of this and understand there needs to be a balance between design and complexity.

Abstraction

Abstraction helps to predict the future by turning something complex into something easier to represent. Experience design is great at predicting how an interface will be used so the solution created delivers fully to the public. For example, when Uber gets redesigned, there are experience design experts making assumptions and predicting the best way of creating an experience that connects with the customers.

Energy-based models as a way to enhance NLP

In energy-based models, an energy score works as a reward mechanism for machine learning: good answers are given low energy and bad answers high energy. This helps to quickly and accurately organise the results and ease the process of speech recognition. The most important thing with machine learning is teaching both what’s right and what’s wrong, in order to show the algorithms the way.

For example, with a finance interface, if we want to give a user options to transfer money, then the ideal scenario is ‘transfer X amount of money to Y’. The worst scenario would be ‘make a positive transaction between X and Y where X is the source and Y the receiver and with Z the amount of money to be sent’.
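The sketch below shows the general shape of this idea with the money-transfer example: every candidate interpretation gets an energy, and the one with the lowest energy wins. The energy function, intent names and keywords here are hand-written assumptions for illustration; in a real energy-based model the energy function would be learned from examples of right and wrong answers.

```python
def energy(utterance_words, intent_keywords):
    """Toy energy: the number of intent keywords missing from the utterance.

    Lower energy means a better match. This hand-written rule stands in
    for an energy function that a real model would learn from data.
    """
    return sum(1 for keyword in intent_keywords if keyword not in utterance_words)


# Hypothetical intents for a finance interface.
intents = {
    "transfer_money": {"transfer", "to"},
    "check_balance": {"balance"},
}

utterance = "transfer 50 pounds to Ana".lower().split()

# Score every candidate, then pick the one with the lowest energy.
energies = {name: energy(utterance, keywords) for name, keywords in intents.items()}
best_intent = min(energies, key=energies.get)

print(energies)
print(best_intent)
```

A short, predictable phrasing such as ‘transfer X amount of money to Y’ makes it easy for one interpretation to stand out with a clearly lower energy than the rest.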

Latent variables

There are variables that, if they were known, would make it easier for a machine to find the right answer. In speech recognition, for example, one such variable is who said what. So an extra input field at the beginning, or some sort of reinforcement on the interface to make sure it knows who’s talking, can have a positive impact.

Convolution and pooling

Convolution and pooling are used within convolutional neural networks, which are great at finding patterns and the predominant characteristics needed to classify objects, sounds, etc. At their heart, they use convolution to filter and pooling to extract the most important characteristics. To put this in human terms: it’s in our nature to take in everything that surrounds us and compare it with what we’ve learned, but this process of managing uncertainty doesn’t come naturally to a computer, which lives within 0s and 1s. Artificial intelligence depends on making computers embrace uncertainty, with extra layers of programmed mathematical equations that can abstract information and make a decision with a certain level of approximation.
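Here is a minimal one-dimensional sketch of the two steps, using a made-up signal and a hand-picked filter. In a real convolutional neural network the filters are learned and there are many of them, usually over two-dimensional inputs such as images or spectrograms of sound.

```python
def convolve1d(signal, kernel):
    """Slide the kernel along the signal and take a weighted sum at each position.

    As in most neural network libraries, the kernel is not flipped, so this is
    strictly speaking a cross-correlation. Only positions where the kernel
    fits entirely inside the signal are computed.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(values, size):
    """Keep only the largest value in each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Made-up signal: 1 might mean 'sound present', 0 'silence'.
signal = [0, 0, 1, 1, 1, 0, 0, 1, 0]

# Hand-picked filter that responds where the signal rises from 0 to 1.
edge_kernel = [-1, 1]

features = convolve1d(signal, edge_kernel)  # filtering step
pooled = max_pool1d(features, 2)            # keep the strongest responses

print(features)  # [0, 1, 0, 0, -1, 0, 1, -1]
print(pooled)    # [1, 0, 0, 1]
```

The pooled output is shorter than the input but still records roughly where each burst of sound begins, which is the kind of predominant characteristic a later layer can use to classify what it is hearing.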

In a fast-moving field such as speech recognition, which pushes both the mathematical equations behind AI and the available computational power to their limits (constantly balancing accuracy against the time algorithms take to respond), one factor is extremely important: always considering how users will interact with interfaces. Experience designers are good at representing users, but for this field to advance in a balanced way, they need to increase their knowledge of how speech recognition is created, and a common language across the disciplines is a good way to start. In the end, we want to make machines more human, and UX is great at thinking about the human factor.

Miguel Alvarez is the director of technology services at AnalogFolk London
