This promoted content is produced by a publishing partner of Open Mic, a paid-for membership product that lets partners of The Drum self-publish their news, opinions and insights on thedrum.com.
The Fundamentals of Designing Voice Apps for Alexa
6 July 2020 15:58pm
When people think about design, they naturally think about visuals and graphics. Although many Voice-enabled devices are beginning to combine Voice technology with screens, designing for Voice differs from designing a traditional graphical user interface. This is because, for the most part, the user will not be interacting with their eyes, but instead listening for instructions, prompts and audio cues to engage with a Voice application.
This article will examine the core elements to consider when designing for Voice.
Wake Words, Launch Phrases and Invocation Names
To begin using a Voice experience, a user must say a Wake Word to their Voice-enabled device. This is the word we say - "Alexa" or "Hey Google" - to activate our Smart Speakers and wake them from their slumber. When the user says this word, the device starts actively listening for the user's input.
Next, if the user wants to interact with a specific Voice App - known as Voice Skills for Amazon's Alexa - we have to use what's called a Launch Phrase. A Launch Phrase is a connecting word that tells Alexa to open a Skill.
Take our fictional example Alexa Voice Skill, Moe's Diner. The user would say "Alexa, [Launch Phrase] Moe's Diner". The Launch Phrase can vary, including but not limited to "open", "launch", "start" and "ask".
Finally, we require an Invocation Name. The Invocation Name is the actual name of a Skill - this lets Alexa know which Skill it needs to launch. In the example above, this Invocation Name would be "Moe's Diner".
So, to recap - our Wake Word (in this case "Alexa") tells the device to start listening to the user's input. Our Launch Phrase tells Alexa to open a Skill when said in conjunction with the Invocation Name.
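The three parts described above can be sketched as a small parser. This is purely illustrative - real wake-word detection and Skill routing happen on the device and inside Amazon's platform, and the launch-phrase list here is a hypothetical subset:

```python
# Hypothetical sketch of how an Alexa-style launch request decomposes
# into Wake Word, Launch Phrase and Invocation Name. Real detection
# and routing happen inside Amazon's platform, not in Skill code.

WAKE_WORD = "alexa"
LAUNCH_PHRASES = ("open", "launch", "start", "ask")

def parse_launch(utterance: str):
    """Split 'Alexa, open Moe's Diner' into its three components."""
    words = utterance.replace(",", "").split()
    if not words or words[0].lower() != WAKE_WORD:
        return None  # no Wake Word, so the device never starts listening
    if len(words) < 3 or words[1].lower() not in LAUNCH_PHRASES:
        return None  # no recognisable Launch Phrase
    return {
        "wake_word": words[0],
        "launch_phrase": words[1].lower(),
        "invocation_name": " ".join(words[2:]),
    }

print(parse_launch("Alexa, open Moe's Diner"))
# {'wake_word': 'Alexa', 'launch_phrase': 'open', 'invocation_name': "Moe's Diner"}
```

Saying "Hey Google, open Moe's Diner" to this sketch returns nothing, mirroring how an Alexa device ignores another assistant's Wake Word.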
Welcome Message and Getting Started
The steps above will allow your user to activate an Alexa Skill; however, this is just the beginning. Now we have to let the user know the Skill has launched. Typically we do this by giving the user a welcome message, followed by a summary of what the user can actually do within the Skill. This gives the user a basic understanding of what is available to them and reduces the risk of the user getting lost in the Skill. In our Moe's Diner Skill example, we could see a conversation like this:
User: Alexa, open Moe's Diner
Alexa: Hello, welcome to Moe's Diner. From juicy steaks to delicious milkshakes, experience classic American dining.
Would you like to reserve a table, order a takeaway, or hear our opening times?
It is always best to keep this section of the conversation as short as possible. This allows the user to interact with the experience quickly, ensuring they can receive the information they require without friction and stay engaged with the Skill.
At this point, the Skill is listening for the user's input, as we have just asked a question. In an ideal world the user answers the question and continues along the journey; however, even though this question may seem pretty straightforward, we have to account for the following factors.
If a user does not hear the question or fails to respond, you must give them a reprompt to move the conversation forward. So for Moe's Diner, our Skill could reprompt the user by asking them the same question.
Alexa: Let's try that one again! Would you like to reserve a table, order a takeaway, or hear our opening times?
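In practice, the reprompt travels alongside the main speech in the Skill's response payload, and only plays if the user stays silent. A minimal sketch of that shape (simplified from Alexa's real response JSON; treat field names as illustrative):

```python
import json

def build_response(speech: str, reprompt: str) -> dict:
    """Build a simplified Alexa-style response: the main speech plays
    immediately; the reprompt plays only if the user does not answer."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt}
            },
            # Keep the session open so the Skill keeps listening.
            "shouldEndSession": False,
        },
    }

welcome = build_response(
    "Hello, welcome to Moe's Diner. Would you like to reserve a table, "
    "order a takeaway, or hear our opening times?",
    "Let's try that one again! Would you like to reserve a table, "
    "order a takeaway, or hear our opening times?",
)
print(json.dumps(welcome, indent=2))
```

Bundling the reprompt with the original question means the Skill never ends a turn without a way to move the conversation forward.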
If the user says something unexpected that the Skill is not programmed to answer or the Skill mishears the user, it is important to bring the conversation back on track. This is known as error handling. Here's one way to do this:
Alexa: Would you like to reserve a table, order a takeaway, or hear our opening times?
User: Do you have outdoor seating at the restaurant?
Alexa: I'm sorry, I'm not sure I can help with that. Would you like to hear your options again?
In the example above, Moe's Diner has not been programmed to be able to answer the question about outdoor seating and therefore tries to bring the conversation back to the questions it can answer. However, the above example is also quite robotic and error handling is a great opportunity to inject a brand's personality into the response. This could look a bit like this:
Alexa: Umm... I'm sorry, I'm not sure. I'd have to check with Big Moe but he's busy in the kitchen cooking up a storm. I'll tell you what I can help you with - would you like to reserve a table, order a takeaway, or hear our opening times?
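One common way to implement this kind of error handling is a catch-all "fallback" that fires whenever no known intent matches - on Alexa this role is played by the built-in AMAZON.FallbackIntent. A simplified dispatch sketch, with illustrative intent names:

```python
# Simplified dispatch sketch: route each recognised intent to its
# handler, and send everything else to a fallback response that
# steers the conversation back on track.

FALLBACK_SPEECH = (
    "Umm... I'm sorry, I'm not sure. I'd have to check with Big Moe, "
    "but he's busy in the kitchen. Would you like to reserve a table, "
    "order a takeaway, or hear our opening times?"
)

HANDLERS = {
    "ReserveTableIntent": lambda: "Sure, let's book you a table.",
    "OrderTakeawayIntent": lambda: "Great, what would you like to order?",
    "OpeningTimesIntent": lambda: "We're open from noon until late.",
}

def handle(intent_name: str) -> str:
    # Unrecognised input falls through to the error-handling response.
    handler = HANDLERS.get(intent_name)
    return handler() if handler else FALLBACK_SPEECH

print(handle("ReserveTableIntent"))
print(handle("OutdoorSeatingQuestion"))  # unknown, so the fallback plays
```

Note that the fallback does two jobs at once: it admits the Skill cannot help with that request, and it immediately restates the options the Skill can handle.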
Intents, Utterances & Slots
An intent describes the overall aim of the user when interacting with a Skill. Utterances are the words uttered by the user to state their intent, and there can be multiple utterances for one intent. A user's intent could be to make a reservation, but this could come in a variety of utterances:
User: Can I book a table?
User: Please can I reserve a table?
User: I'd like to make a reservation.
When a user has said one of these utterances, our Voice Assistant will know their intent is to make a reservation, and we then know what the Skill needs to ask the user next to fulfil this request. Not all people state their intent in the same way, and therefore, when developing a Voice application, it is important to account for as many utterances as possible.
In our Moe's Diner example, this would ensure that when a user makes a request that they would like to reserve a table, regardless of how they say it, we are still able to map their request to the Skill's 'reserve a table intent'. This part is really important, because if you don't account for common utterances you could cause your user a lot of frustration.
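In the Skill's interaction model, this mapping is expressed by listing many sample utterances under one intent. The sketch below loosely follows the shape of Alexa's interaction-model JSON, with a naive exact-match resolver standing in for Alexa's natural-language understanding (which generalises well beyond the listed samples):

```python
# Sketch of an interaction-model fragment mapping several sample
# utterances onto a single, illustratively named ReserveTableIntent.

INTERACTION_MODEL = {
    "intents": [
        {
            "name": "ReserveTableIntent",
            "samples": [
                "can i book a table",
                "please can i reserve a table",
                "i'd like to make a reservation",
            ],
        }
    ]
}

def resolve_intent(utterance: str):
    """Naive exact matcher; Alexa's NLU generalises far beyond this."""
    text = utterance.lower().strip(" ?.!")
    for intent in INTERACTION_MODEL["intents"]:
        if text in intent["samples"]:
            return intent["name"]
    return None

print(resolve_intent("Can I book a table?"))  # ReserveTableIntent
```

An utterance not covered by any sample resolves to nothing - which is exactly the case the fallback handling described earlier exists to catch.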
It is here that we can utilise what are known as slots on the Alexa platform. Slots allow us to create variables that we can use in utterances. So, in our example, we could create a slot called [date slot]. In this slot we could include the following variables - tomorrow, tonight, today. So, if a user said something like:
User: "I'd like to reserve a table for tonight"
You don't need to waste valuable time writing out all the possible variations of utterances such as:
"I'd like to book a table for today"
"I'd like to book a table for tonight"
"I'd like to book a table for tomorrow"
Instead you can create one utterance that includes this slot type:
"I'd like to book a table for [date slot]"
Alexa will know that if a user says today, tonight or tomorrow, they are referring to a date, as this has been set in the date slot. Luckily for us, Amazon has already created predefined slots for common variables such as regions, times and dates, saving us having to create slots from scratch.
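In interaction-model terms, the slot appears in the sample utterance as a placeholder, and its type can point at one of Amazon's predefined types such as AMAZON.DATE. A hypothetical sketch, with a small regex matcher standing in for Alexa's slot filling:

```python
import re

# Sketch: one templated sample utterance covers every value the
# slot accepts, instead of one utterance per value.
RESERVE_INTENT = {
    "name": "ReserveTableIntent",
    "samples": ["i'd like to book a table for {date}"],
    # A predefined type like AMAZON.DATE saves enumerating values;
    # a custom slot would instead list them: today, tonight, tomorrow.
    "slots": [{"name": "date", "type": "AMAZON.DATE"}],
}

def match_sample(utterance: str, sample: str):
    """Turn the {date} placeholder into a named capture group and
    try to fill it from the user's utterance."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>.+)", sample)
    m = re.fullmatch(pattern, utterance.lower())
    return m.groupdict() if m else None

print(match_sample("I'd like to book a table for tonight",
                   RESERVE_INTENT["samples"][0]))
# {'date': 'tonight'}
```

One templated utterance plus a slot type replaces the three near-identical utterances written out above.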
We can also build out intents that help the user fulfil their requests rapidly. For example a user might utter an intent that features multiple slot values, such as the [day] and [time], like in the conversation below:
User: Alexa, ask Moe's diner for a table reservation [tonight] at [6pm]
Alexa: Sure thing, let me just check our booking availability this evening. How many people would you like to book for?
User: Three, please.
Alexa: Three people - ok let's have a look. Ah yes, we have a table available! So that is a table of three, for tonight, at 6pm. Would you like me to go ahead and confirm the reservation?
User: Yes please
Alexa: Great that's all booked, and we look forward to seeing you here later tonight.
In this scenario, the Skill checks to see what slot values it has collected and understands that it still needs to know the number of people the user would like to make the booking for. That is why the Skill asks the additional question "How many people would you like to book for?".
Once the Skill has collected all the information it requires to create a reservation for the user, we introduce a confirmation prompt. This is where the Skill reads back the user's reservation details and asks if they would like the reservation to be made. This gives the user a chance to review the reservation before confirming, ensuring there are no incorrect details.
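The slot-filling and confirmation steps above can be sketched as a simple loop over required slots. This is illustrative only - Alexa's dialog management can automate the same pattern through required slots and confirmation prompts in the interaction model - and the slot and prompt names are made up for the example:

```python
# Sketch of slot filling: ask for the first missing required slot,
# and once everything is collected, read it back for confirmation.

REQUIRED_SLOTS = ["date", "time", "party_size"]

PROMPTS = {
    "date": "What day would you like to book for?",
    "time": "What time would you like to book for?",
    "party_size": "How many people would you like to book for?",
}

def next_response(collected: dict) -> str:
    """Ask for the first missing slot; once complete, confirm."""
    for slot in REQUIRED_SLOTS:
        if slot not in collected:
            return PROMPTS[slot]
    return (
        f"So that is a table of {collected['party_size']}, "
        f"for {collected['date']}, at {collected['time']}. "
        "Would you like me to go ahead and confirm the reservation?"
    )

# The user supplied date and time up front, so the Skill asks for size:
print(next_response({"date": "tonight", "time": "6pm"}))
```

Because the loop only asks for what is still missing, a user who volunteers several slot values in one breath skips straight to the remaining question, exactly as in the Moe's Diner dialogue above.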
Designing for Voice can be challenging, as there is rarely a straightforward, linear path through a conversation. A user can ask a Voice Skill anything they want, at any time. As Voice designers we must do our best to understand the user's intent and reply with the most relevant response at that moment.
In this article we have covered the basics of designing for Voice. Remember, Voice technology is constantly evolving, improving and increasing in functionality. If you are interested in understanding where Voice technology can fit into your wider marketing strategy, get in touch with FX Digital today.