Lessons in UX: How to design a great voice interface system


One of my first jobs in user experience was designing an interactive voice response system for my company’s customer support line. It was more challenging than you might think: we had to understand the mental model of users who would call such a number, what their priority needs would be, how much information a caller can keep in his or her head at one time, and so on. Such systems are still around, obviously, with some far better designed than others (I think the one I had a hand in was pretty good).

Now, however, we have an even greater challenge: how to design true voice interface systems, systems like Apple’s Siri that let users do almost anything. I’ve been thinking for a while about some “best practices” that can be applied to such interactions. Here’s my first swag at some good advice on how to make a good voice interface:

Be conversational. Respond in a way that is personal and polite, not monotonous or robotic. Just because it’s a computer doesn’t mean it needs to be a computerized voice. Be friendly, but…

Don’t get too personal. Avoid words or phrases that are overly critical or overly praising; they can come off as cloying and phony. While it was designed in a different domain, I once evaluated an ATM UI that used completely informal phrases, and it was off-putting and inappropriate.

Recover gracefully. When mistakes are made, learn from them. Apologize, and remember. Design a learning system. Yes, I know, this is easier said than done, but it’s important.
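The “apologize, and remember” idea can be sketched very simply. Here is a minimal, hypothetical correction memory (all names and phrases are invented for illustration): when the user corrects a misrecognized command, the system stores the fix so the same misrecognition resolves automatically next time.

```python
# Hypothetical sketch of a tiny "learning" layer for a voice assistant.
# Real systems would do this statistically; this just shows the idea.

class CorrectionMemory:
    def __init__(self):
        self._fixes = {}  # misheard phrase -> corrected phrase

    def remember(self, heard, corrected):
        # Store a user-supplied correction for a misheard utterance.
        self._fixes[heard.lower()] = corrected

    def resolve(self, heard):
        # Return the learned correction if we have one, else the raw input.
        return self._fixes.get(heard.lower(), heard)

memory = CorrectionMemory()
memory.remember("call bob marsh", "call Bob Marsh")
print(memory.resolve("call bob marsh"))   # -> call Bob Marsh
print(memory.resolve("what time is it"))  # -> what time is it
```

The point is not the data structure but the behavior: the second time around, the system gets it right without asking.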

Respond quickly. Latency is DEATH to a voice control system, because the user expects a response immediately. Every moment the system spends “thinking” reduces the user’s confidence.

Be ready for ANYTHING. Build out “trees” that account for as many eventualities as you can think of. Be thorough; defaulting to a web search is not a positive outcome. In creating a voice interface system you should aim for a complete solution and deliver it, and users will use the system with great confidence.
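One way to picture these “trees” is as an ordered set of branches with an explicit, designed fallback. This is a deliberately simplified sketch (the patterns and replies are invented): the key property is that the fallback tells the user what the system can do, rather than dumping them into a web search.

```python
import re

# Each branch pairs an utterance pattern with a response; order encodes priority.
BRANCHES = [
    (re.compile(r"\bweather\b", re.I), "Here is today's forecast."),
    (re.compile(r"\btimer\b", re.I), "Starting a timer."),
    (re.compile(r"\brecord\b", re.I), "Which show should I record?"),
]

def respond(utterance):
    for pattern, reply in BRANCHES:
        if pattern.search(utterance):
            return reply
    # The fallback is itself a designed branch: guide the user back in.
    return "I didn't catch that. You can ask about weather, timers, or recording."

print(respond("what's the weather like?"))  # -> Here is today's forecast.
```

Real assistants use statistical intent classifiers rather than regexes, but the design principle is the same: every path, including failure, should land somewhere you planned.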

Be reliable. A voice response system that doesn’t respond… well, that’s not a very good thing, is it? Make sure the system is like the dial tone when you pick up the phone: always on, always ready.

Provide alternate inputs, but use voice first. One thing Siri does badly is confirmation: most of the time the user has to tap a “Yes” or “OK” button. Why? You’re already talking to the device; the device should KEEP LISTENING.

Avoid HAL 9000 jokes. I know it’s tempting, but it’s played out. Do pay attention to how movies and shows like Moon, 2001, and Star Trek present voice control systems, because that is the “training” all of us have had. Understanding the expectations users bring to these systems can help you design them better.

Get out of the way whenever possible. Interacting with a voice system is cool now, but in the future it will become as normal as driving a car or watching TV. Once the novelty wears off, people will be more results- and outcome-focused… so don’t be cute. Provide an interactive experience that obeys commands and gets out of the way, with no editorial comments.

In closing, I have always said that we are living in the future; it just isn’t the future we were expecting. The very fact that I can write this type of article as a serious piece of guidance for designers, instead of a blue-sky puff piece, proves my point. The future has arrived.

The death of the UI…?


Based on the presentation given at UPA 2012 this June.

I have spent a lot of time over the past month looking at alternate methods of interaction, different ways people can engage with technology other than the traditional mouse/keyboard paradigm. This focus was triggered by watching demo videos of The Leap, a new computer peripheral coming in November that provides gesture control that makes the Kinect look like a kindergarten toy (go ahead, google it and watch – I’ll still be here when you get back).

The videos are amazing, with gesture recognition down to the centimeter level, and The Leap is definitely a significant… err, step forward. We are finally at the point we saw years ago in the movie Minority Report, where the user controlled and engaged with the computer without touching a thing. The future will soon be here, faster than expected.

Gestural interaction is finally coming into its own, but what about voice? We have had some form of “voice control” of our computers for years, starting with Dragon dictation software over fifteen years ago… and after that we had interactive voice systems for automated services such as banking, as well as assistive technology for visually impaired computer users. The thing is, until this year these systems were not very “smart”: they had preset options and didn’t “learn” very well (or at all).

All this changed with the release of Siri for the iPhone last year, when Apple rolled out a very “smart” personal assistant, with artificial intelligence more advanced than anything that had ever shipped in a consumer device. Majel, a similar system being worked on by Google for its Android platform, is also reported to have very good AI. The next version of Siri, just announced at Apple’s World Wide Developers Conference, looks to have even more advanced features and functions. An already pretty good system is getting better.

So, with gesture and voice becoming more powerful and available to customers on multiple platforms, will these new interaction methods mean the death of the UI? Are user experience designers going to be out of a job in a few years? No… but we have to adapt. As design professionals we have to not only get used to this technology, but know how to design experiences around and using it.

When looking at a design problem, very soon it will be possible to design your solutions to support voice and/or gesture as alternate controls, or to make them the primary way to interact. There may be use cases where voice or gesture is the optimal interaction method, or where it doesn’t work at all. We have to think about factors such as reliability, ease of use, and context; all of them inform the decision as to how we want users to engage with our designs.

A quick example: recording a TV show. The current process works like this: the user picks up the remote, browses the channel listing, finds “Breaking Bad,” and clicks the record button. The TV UI refreshes to show an icon next to the show, confirming to the user that his or her choice was set. A potential process in the future could go something like this: the user says “TV, record the new Breaking Bad,” and the TV replies “It’s set to record on Wednesday at 9.”
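The future flow above boils down to two steps: extract the show name from the utterance, then look it up in the schedule. Here is a minimal sketch of that pipeline; the regex, the listings data, and the `handle` function are all hypothetical stand-ins for what a real TV platform would provide.

```python
import re

# Faked schedule data standing in for a real programming guide.
FAKE_LISTINGS = {"breaking bad": ("Wednesday", "9:00 PM")}

def handle(utterance):
    # Pull the show name out of a "record ..." command.
    match = re.search(r"record (?:the )?(?:new )?(?P<show>.+)", utterance, re.I)
    if not match:
        return "Sorry, I didn't understand that."
    show = match.group("show").strip().rstrip(".")
    slot = FAKE_LISTINGS.get(show.lower())
    if slot is None:
        return f"I couldn't find {show} in the listings."
    day, time = slot
    return f"It's set to record on {day} at {time}."

print(handle("TV, record the new Breaking Bad"))
# -> It's set to record on Wednesday at 9:00 PM.
```

Notice that the spoken confirmation replaces the on-screen icon entirely; the reply itself is the feedback loop.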

The example I cite uses voice, but you can easily create another example using gestural interaction: the user could swipe through shows on a TV listing and move the preferred one to a “record” target area. Or you could look at something like a computer-aided drafting program, where instead of using a mouse and pointer to smooth or stretch an object, the system could recognize the user’s own fingers as they “mold” the object. And so on…

In closing, we should always focus on designing the experience, not a UI. We need to embrace and use these new interaction methods, because users may (and probably will) adopt them FAST. We need to be ready.