Building a more intuitive Alexa skill for Boston

Working from Code for Boston’s initial Alexa Skill deployment, we teamed up with General Assembly students to improve the design of our new voice application.

In 2019, we partnered with Code for Boston, a Code for America Brigade and grassroots organization, to create and launch a new Alexa skill for City of Boston services. The skill allowed anyone with an Amazon Alexa stand-alone home device, or with the Alexa app on their phone, to:

hear daily alerts from the City,
find food truck locations, and
learn about the latest BOS:311 requests.

The skill allowed users to find information about City services, like trash pickup times or parking updates, based on their address.

After the launch, the COVID-19 emergency began to unfold in Boston. We found ourselves faced with new challenges and goals around sharing information. At this time, we had an opportunity to revisit our Alexa skill and ask ourselves an important set of questions:

How do users want to engage with the Boston Info skill ability?
What information do they feel should be accessible amid COVID-19?
Can the format of information that Alexa gives users be improved?
Can we use analytics to evaluate the usefulness of our Alexa skill and find other valuable skill abilities?
What can we learn from other municipalities and states with their own Alexa skills?

To answer these questions, we worked with a cohort of students from the General Assembly User Experience Design Immersive (UXDI) program. We felt that these open questions about our Alexa skill, and how to optimize it for the challenges of the moment, would be an excellent project. In early April, we joined forces with Brian Collura, Mike Anagnostakos, and Ted Macdonald to tackle this challenge. Their account of the project is written below:

What is this skill, and who are our users?

Perhaps the most important phase in the Design process is the first: research. Who are the people who will be using this product? What are their needs? Do they have problems that could be solved by the Alexa Skill? Without knowing the needs of our user base, we cannot design a solution for them.

We divided users into two major buckets:

those who live and interact with the City of Boston, and
those who own or interact with an Alexa device.

Of course, we were primarily concerned with users who fell into both buckets. But, learning about each separately was enlightening nonetheless.

Research Venn diagram — *“Boston Info” users are those who interact with the City of Boston, and also with their Amazon Alexa device.*

Because of the pandemic, we could not leave our homes to find users to research. But, we still had a number of techniques up our sleeves for finding the information we needed to know. We ended up conducting eight personal interviews to learn more about real people’s experiences with the City, or with their Alexa devices. We created a survey that asked participants to rank the importance of current Boston Info skill abilities and other alerts found on Boston.gov. And we also asked Reddit users with Alexa devices how they’re currently using their devices, helping us to collect examples of positive Alexa experiences.

As part of our work, we analyzed seven Alexa skills produced for different city and state governments. We noted common features and the quality of each feature. We also dug into Boston.gov’s analytics to determine if there were any insights or trends that may apply to the skill. Some major insights include:

Coronavirus information pages were much more popular than other pages.
“Trash and Recycling Schedule” and “Food Trucks Schedule”, which are among the most popular pages, are already represented in the Skill.

Boston.gov research — *Yearly traffic data from April 1, 2019, to March 31, 2020. Even though the captured data comes mostly from before COVID-19, COVID-19 is still the second-most popular page.*

Drawing conclusions from our research

With the results of our user research in hand, it was time to start organizing the insights we gathered. One common tool for visualizing this qualitative data is called an “Affinity Map”. This board of written notes, one insight per note, can be physically reshaped and rearranged to bring common trends to the surface. To achieve this while working remotely, we used Miro to create the note board online.

Below, you can see our Affinity Map. We put all of our research insights onto virtual “sticky notes” and segmented them into logical groups to discover actionable trends and patterns.

Our research gave us a number of leads, and plenty of insights that we’d revisit throughout the process. We found that users interact with Boston.gov mainly to:

pay parking tickets
check snow emergency parking info, and
look up their trash and recycling day schedule.

Currently, and in the past several months, they also frequently look up COVID-19 updates.

City services and skill abilities that were very important or nice to know included:

Public transit
Trash and recycling
Food Trucks and Farmers Markets
Parking-meter information, and
311 Reports.

In researching how people used their Alexa devices, we found that they are often at home and in the middle of other tasks. Interactions are quick. Alexa is also commonly used for coordinating smart home devices, asking one-off questions, or for simple entertainment.

We believe that by refining the current Alexa Skill to be more digestible, navigable, and specific, we will create a more convenient way for Bostonians to keep updated with the City. We will know this to be true when we improve the skill’s System Usability Scale (SUS) score and see a high sequence completion rate in Alexa’s Analytics.

Designing for voice

At this point in a usual design process, the high-level analysis takes a pause, and pencil finally gets a chance to meet paper! This usually means:

drawing sketches of the interface
trying out button locations and navigation contents, and
generally laying out where all the elements go.

But with no visual interface to craft, how can we begin to design the product?

We decided to seek some expert advice, and found it in Dr. Robert Moore, a Lead Conversation Analyst and Researcher at IBM. Dr. Moore works on IBM’s own Watson-based voice assistant, and he essentially wrote the book on this very topic of Conversational UX Design. He was gracious enough to sit down with us (remotely, of course) and discuss best practices, new horizons, and challenges with voice assistant technology.

Two major takeaways from our discussion with Dr. Moore were the concepts of Conversation Navigation, and Agent Persona:

Conversation Navigation

Many of the ways we interact with an app or website have a parallel in a conversational setting. For instance, asking someone, “Could you repeat that?”, is like going back to a previous page on a visual interface. Importantly, a conversation isn’t one-sided, even when only one person is supplying information. We supply information in digestible bits, and use feedback from the listener to adjust what we’re saying. An app or website would use visible buttons and menus to provide this kind of interactivity. Here’s a table that shows the conversational and visual interface equivalents of some common actions:

Action	In conversation	On a visual interface
See Capabilities	“What can you do?”	Menu, Help
Repeat	“What’d you say?”	Back
End	“Okay” / “Thanks!”	Close / “X” button

This is conversation navigation.
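
One way to picture the table above in code is the small Python sketch below, which remembers the last thing the assistant said so that it can be repeated on request. The phrase lists, the help text, and the `Conversation` class are hypothetical and chosen only for illustration. The intent names are Amazon's built-in `AMAZON.HelpIntent`, `AMAZON.RepeatIntent`, and `AMAZON.StopIntent`, which cover these three actions in a real skill.

```python
# Map each navigation action from the table to example phrases (hypothetical)
# and to the Alexa built-in intent that covers it.
NAVIGATION = {
    "AMAZON.HelpIntent": ["what can you do", "help"],
    "AMAZON.RepeatIntent": ["what'd you say", "could you repeat that"],
    "AMAZON.StopIntent": ["okay", "thanks"],
}

# Hypothetical help message listing what the skill can do.
HELP_TEXT = "You can ask about City alerts, food trucks, or 311 requests."


class Conversation:
    """Remembers the last reply so the user can navigate the conversation."""

    def __init__(self):
        self.last_reply = None
        self.ended = False

    def say(self, text):
        """Record and return a reply, so it can be repeated later."""
        self.last_reply = text
        return text

    def navigate(self, phrase):
        """Handle a navigation phrase; return None if it is not one."""
        cleaned = phrase.lower().strip(" ?!.")
        for intent, phrases in NAVIGATION.items():
            if cleaned in phrases:
                break
        else:
            return None  # Not a navigation request.
        if intent == "AMAZON.HelpIntent":
            return self.say(HELP_TEXT)
        if intent == "AMAZON.RepeatIntent":
            # Fall back to the help text if nothing has been said yet.
            return self.last_reply or HELP_TEXT
        self.ended = True
        return "Goodbye!"
```

Keeping the last reply in one place is what makes “repeat” cheap to support: every answer goes through `say`, so the conversation always has something to go back to.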

Agent Persona

Because Alexa is “person-like”, we automatically tend to make certain assumptions about what it can do. This is based on what we’d expect a person in its position to know. Dr. Moore has a strategy for predicting what someone might expect a voice agent like Alexa to be able to do: creating a job description for it, called an “Agent Persona”. By drafting the job description of a Boston Information help desk assistant, for instance, we can better predict the sorts of questions users will naturally want to ask it.

Utterances and expansion

Interactions with your Alexa essentially boil down to two components: “Utterances” and “Intents”. Utterances are the words and phrases you say to Alexa, and the responses it gives back to you. Intents are the functions that are triggered by the user’s Utterance. For example, if I ask Alexa, “How tall is Mount Everest?”, that will trigger the “give the height of Mount Everest” command, or Intent, and it will answer with the proper Utterance: “Mount Everest is 29,029 feet tall”.
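
The relationship between Utterances and Intents can be sketched in a few lines of Python. This is a minimal illustration, not how Alexa itself resolves speech: the intent names, sample phrases, and responses below are hypothetical, and a real skill relies on Amazon's language understanding rather than exact string matching.

```python
import string

# Hypothetical sample utterances, grouped by the intent they should trigger.
SAMPLE_UTTERANCES = {
    "GetTrashDayIntent": [
        "when is my trash day",
        "what day is trash pickup",
    ],
    "GetFoodTrucksIntent": [
        "where are the food trucks",
        "find food trucks near me",
    ],
}

# Hypothetical responses: one spoken reply per intent.
RESPONSES = {
    "GetTrashDayIntent": "Your trash is picked up on Tuesday.",
    "GetFoodTrucksIntent": "There are three food trucks open today.",
}

FALLBACK = "Sorry, I didn't catch that. You can ask about trash day or food trucks."


def normalize(utterance):
    """Lowercase the utterance, strip punctuation, and collapse whitespace."""
    cleaned = utterance.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def resolve_intent(utterance):
    """Return the intent whose sample utterances include this phrase, else None."""
    phrase = normalize(utterance)
    for intent, samples in SAMPLE_UTTERANCES.items():
        if phrase in samples:
            return intent
    return None


def respond(utterance):
    """Return the reply for the matched intent, or a fallback prompt."""
    return RESPONSES.get(resolve_intent(utterance), FALLBACK)
```

Notice that two differently worded questions land on the same intent. Collecting those alternative phrasings is most of the content work in a skill.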

Clearly, a large part of designing an Alexa Skill comes down to content. This is true not only for creating the responses that Alexa can give and collecting all the questions that can be asked of Alexa, but also for capturing all the ways in which those questions can be asked. While Amazon has some ways to help with these sorts of phrasing issues, much of the work is still in the hands of the designer.

Our first iteration

Boston’s existing Alexa Skill was a strong jumping-off point. For our first iteration, we took what we’d learned about conversational UX and identified the parts of the skill that could be improved most, along with the features that could be added to address our research insights. Three particular issues stood out as areas for improvement:

Unresolved inquiries: If Alexa doesn’t understand or can’t accept your response, the conversation ends.
Unable to follow up or repeat: Most of the skill’s responses allow for no follow-up: the user asks a question, Alexa answers, and that’s the end of the interaction.
Long responses: Certain responses from Alexa went on for a very long time.
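
As a rough sketch of how these three issues might be addressed, the Python below splits a long reply into shorter pieces, keeps the conversation open with a follow-up question, and reprompts instead of ending the session. The function names, the 25-word limit, and the wording of the prompts are assumptions for illustration and are not taken from the actual skill.

```python
import re

MAX_WORDS = 25  # Assumed limit for one spoken chunk; tune through testing.


def split_into_chunks(text, max_words=MAX_WORDS):
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue  # Skip empty input.
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks


def build_reply(answer_chunks, position):
    """Speak one chunk, then keep the conversation open with a follow-up."""
    reply = answer_chunks[position]
    if position + 1 < len(answer_chunks):
        return reply + " Would you like to hear more?"
    return reply + " Is there anything else I can help with?"


def reprompt():
    """Used when a request was not understood, instead of ending the session."""
    return "Sorry, I didn't get that. You can say 'alerts' or 'food trucks'."
```

Splitting on sentence boundaries rather than on a fixed word count keeps each spoken piece a complete thought.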

With our goals in mind, we created our own versions of the utterances and intents the skill would offer. We took the best parts of these designs, and formed one combined prototype that we would be able to test.

Alexa graphics — *Our first prototypes were made as simple Excel spreadsheets, as we collected all of the skill’s Utterances and Intents.*

Usability Testing

There are currently no established best practices for testing for voice, beyond writing a paper script to be read by the subject. We needed to develop a test that suited our needs and unique constraints. We determined that in order to meet our needs the test would have to:

Capture Verbal Syntax, as opposed to written syntax. We speak differently than we write. These differences would confound any results we could get from written responses.
Be consistent in tone, timing, emphasis, etc. Making sure our “Test Alexa” is consistent across all trials keeps these factors from becoming noise in our results.
Be interacted with, just as a person would with a real Alexa device. If the user has follow-up questions to ask, the prototype needs to be able to respond.

The test also needed to be conducted under the following constraints:

We could not meet in person, due to COVID-19.
We couldn’t use an actual Alexa, because our designs hadn’t yet been turned into the code needed to run on the device.

Testing was a topic of many brainstorming sessions, over many days. Finally, in a moment of inspiration, we came up with the idea to create a soundboard.

The soundboard test

The test would run like this: You, the test taker, would join a video call with one of us, the testers. We would share our screen, presenting one of two slideshows: one that you, the test taker, see, and one that only we, the testers, see. You would see a prompt describing a scenario, and you’d be tasked with asking Alexa, in your own words, for some information that you might need to resolve the scenario.

Meanwhile, the tester would have the second slideshow open on their screen. This slideshow has the same number of slides, but each slide is composed entirely of links to audio files. These are the Alexa responses. When you ask “Alexa” a question, the tester clicks the appropriate response, and you will hear “Alexa” respond to your question.

User vs. presenter view — *Illustrated example of our Soundboard setup. The user sees a written prompt (left), and we also see the Soundboard (right) with clickable recorded responses.*

SUS score and results

To quantitatively measure the user’s impressions of our prototype, we employed the “System Usability Scale”, or SUS. This is a standardized 10-question survey given to a participant after our test. For example, we asked test takers to rate whether the system was easy to use on a scale from “Strongly Agree” to “Strongly Disagree”. We take the participants’ answers to these questions and combine them into a final “score.” The average score is a 68, and anything above about an 80 is considered a very good score.
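
For readers curious how the answers become a single number, here is a short Python sketch of the standard SUS scoring rule. Each of the 10 answers is coded from 1 (Strongly Disagree) to 5 (Strongly Agree). Odd-numbered items contribute their value minus 1, even-numbered items contribute 5 minus their value, and the total is multiplied by 2.5 to land on a 0 to 100 scale. The function name and the sample answers are made up for illustration and are not data from our tests.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one participant's 10 answers."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("Each response must be an integer from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1  # Odd items are positively worded.
        else:
            total += 5 - response  # Even items are negatively worded.
    return total * 2.5


# Illustrative answers only, not a real participant.
example_answers = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(sus_score(example_answers))  # prints 85.0
```

A group's score is then simply the mean of the individual participants' scores.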

We ran our soundboard test seven times on the initial version of the skill, plus six times on our first iteration on it, and four times on our second iteration. You can see below that although the initial version of the skill scored a respectable 70, there was a marked improvement after several iterations. Our final version’s score was 83.5. We do acknowledge that this score does not reflect using the skill on an actual Alexa device, where Alexa’s processing of commands will be different than our own.

SUS score graphic — *SUS scores, shown as a percentile rank. A score of 70 is more usable than about 55% of systems. A score of 83.5 is more usable than about 90% of systems.*

Project conclusions

At the end of our work, we had so many ideas for improvements to the City of Boston’s Alexa Skill that we couldn’t include all of them in our final deliverable. But, we wanted to at least acknowledge some of the ideas that didn’t make it into our final product design because we think that they are promising:

Expanding transit alerts to be more specific may help them meet the high expectations of many of our test users.
Considering commonly found features from other municipal Alexa Skills, like public notices and event calendars, may be a way to borrow good ideas from others.

Being able to see our work put in place so quickly, driven in part by such an important need like COVID-19 information, really shows how meaningful user experience design can be. It’s been so rewarding to work for a group of people who care so much about the citizens their work supports. We’re thankful to have been able to help the City of Boston better serve residents!

ABOUT THE GENERAL ASSEMBLY TEAM

General Assembly offers accelerated courses for in-demand skills in tech. The students we worked with were part of the Spring 2020 User Experience Design Immersive. Their work on the Boston Info Alexa skill was a final project that they completed over three weeks. The students who took part in the project were Brian Collura, Mike Anagnostakos, and Ted Macdonald.

Last updated: October 20, 2020

Published by: Boston Digital Service

Project partners:
