
Nuance Interactive Natural Assistant

The Brief:

Create the user experience for a speech assistant to be embedded in existing iOS and Android applications, with a minimum of disruption to the customer's application. The natural language technology was ready, and the first banking customers were experimenting with voice control in their mobile app. But the creative team was blocked on the user experience: taking over the entire screen (the Siri approach) was too limiting.

 

The strategic issues: 

  • Ensure that existing applications can incorporate the assistant without a complete rewrite

  • Illustrate to customers, through coaching and sample designs, how their applications could work

  • Work with the development team to create sample apps and an SDK to woo customers

The result: 

Shipped on mobile applications for USAA, GEICO, Verizon, US Bank, Garanti Bank, Domino's Pizza, and others.

Meet Nina!
Step 1. How does speech work?

Natural language processing is declarative - you teach it what you want it to 'know.' A natural language assistant learns language by being fed a huge amount of text. Text analysis informs the processor that certain syllables, together, might mean something. It looks for patterns.

 

A natural assistant also needs to learn the domain it will support: facts, conditions, specific actions the user or the system can take, and any business logic that might apply.

We created sample applications and reference dialog design for banking, airline travel, and web customer support. We also created sample dialog design for pharmacy/medical, energy, and other industries.

A multimodal user experience requires designers to think about how to support complementary as well as competing instructions to the system: how does speech support, replace, or conflict with touch events? We created this diagram to explain the interplay to customers.

Step 2. Create a 'stateful' user experience

Users respond to personality. The company had already built a full-screen 'assistant' user experience. But it wasn't satisfying: it generalized every use of speech to the same universal interface. Customers, we found, wanted to invest an assistant with the flavor of their own brand.

 

Our reference user experience presents a human-like face that clearly represents different states: listening, speaking, processing, sleeping. In addition, we added an 'authenticating' state to reinforce our biometric voice print processing.
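The states named above can be sketched as a small state machine. This is a minimal illustration, not the shipped Nina implementation: the state names come from the text, but the `Agent` class and the transition table are assumptions made for the example.

```python
from enum import Enum


class AgentState(Enum):
    # State names taken from the reference user experience described above.
    SLEEPING = "sleeping"
    LISTENING = "listening"
    PROCESSING = "processing"
    SPEAKING = "speaking"
    AUTHENTICATING = "authenticating"


# Hypothetical transition table: which states may follow which.
# The real product's rules are not described in the text.
ALLOWED = {
    AgentState.SLEEPING: {AgentState.LISTENING},
    AgentState.LISTENING: {
        AgentState.PROCESSING,
        AgentState.AUTHENTICATING,
        AgentState.SLEEPING,
    },
    AgentState.AUTHENTICATING: {AgentState.PROCESSING, AgentState.SPEAKING},
    AgentState.PROCESSING: {AgentState.SPEAKING},
    AgentState.SPEAKING: {AgentState.LISTENING, AgentState.SLEEPING},
}


class Agent:
    """Tracks the current state so the face can show the matching animation."""

    def __init__(self):
        self.state = AgentState.SLEEPING

    def go(self, new_state):
        # Reject transitions the table does not allow.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
        return self.state
```

Keeping the transitions in one table makes it easy for a customer to swap in branded animation assets per state without touching the logic.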

To illustrate the agent, we thought through three models for what happens when speech is invoked:

  1. full screen - escape the mobile application and return to it 

  2. overlay - show the agent on top of the existing application

  3. embed - reserve a specific part of the screen

Our solution combined #2 and #3: Nina requires a persistent invoke trigger to avoid 'always listening' problems, and appears in a dismissible drawer. We gave Nina expressive eyes and a few 'vanity' states to reinforce positive and negative statements.

Further, we created 'transaction sheets' to inform the user what information the agent needs to complete a task. The speech agent 'listens' to discover the user's intent. Each understood intent may require several pieces of information before the system can act on it. For example, 'pay my bill' requires the following information (or slots).

The transaction sheet shown here is layered over the native application, and presents 4 slots to fill to pay a bill. When all 4 slots have a valid value, the transaction may be submitted.
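A transaction sheet can be modeled as an intent plus a fixed set of slots that must all hold a value before submission. The sketch below is illustrative only: the four slot names are hypothetical (the text does not list them), and "valid" is reduced to "non-empty" for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical slot names; the original sheet's four slots are not
# listed in the text.
PAY_BILL_SLOTS = ("payee", "amount", "from_account", "date")


@dataclass
class TransactionSheet:
    intent: str
    slot_names: tuple
    values: dict = field(default_factory=dict)

    def fill(self, slot, value):
        """Store a value for a slot, whether it came from speech or touch."""
        if slot not in self.slot_names:
            raise KeyError(f"unknown slot: {slot}")
        # Simplified validity check: any non-empty value is accepted.
        if value is None or str(value).strip() == "":
            raise ValueError(f"empty value for slot: {slot}")
        self.values[slot] = value

    def missing(self):
        """Slots the agent still needs to ask about, in sheet order."""
        return [s for s in self.slot_names if s not in self.values]

    def can_submit(self):
        """True only when every slot has a value."""
        return not self.missing()
```

For example, after filling `payee` and `amount`, `missing()` returns the remaining two slots, which is what the agent would prompt for next.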

Transaction sheets are constructed according to the model below. The 50-pixel tall Nina box at the bottom of the screen can be flicked out of the way or pulled up for hints.

Step 3. Design the dialog for reference application tasks

Speech systems have specific 'coverage' - the speech designer creates dialog to respond to a specific set of use cases, 'matching' what the user says to what the system does. Dialog designers need input from real users, because there are typically many different ways to ask the same question. Some notation formats help express variety, such as the JGF notation format below.

  • 'Send a new message'

  • 'Create SMS'

  • 'Create new text message'

  • 'Text message'

  • 'New SMS'
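The idea of mapping many phrasings to one intent can be sketched as a lookup table. This is a deliberately naive exact-match example, not the statistical natural language processing described earlier; the utterances come from the list above, while the intent name `create_text_message` and the helper functions are assumptions for illustration.

```python
import re

# Sample utterances taken from the list above; the intent name is hypothetical.
INTENT_UTTERANCES = {
    "create_text_message": [
        "Send a new message",
        "Create SMS",
        "Create new text message",
        "Text message",
        "New SMS",
    ],
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


# Reverse index: normalized utterance -> intent name.
LOOKUP = {
    normalize(utterance): intent
    for intent, utterances in INTENT_UTTERANCES.items()
    for utterance in utterances
}


def match_intent(spoken):
    """Return the intent for an exact (normalized) match, else None."""
    return LOOKUP.get(normalize(spoken))
```

Anything outside the table returns `None`, which is a simple way to see what 'coverage' means: the system only responds to phrasings the designer anticipated.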

Dialog designers translate features into intents

Each intent - something that the user wants to do - has multiple possible inputs (utterances users speak), possible conditions, possible business rules, and therefore multiple possible outcomes.

For each sample application, we created flows, screen maps, sample data, and scripts to illustrate interaction in animated examples. Below are a physical flow and some sample questions users might ask at each screen while booking and checking on airline reservations.

  • We surveyed dozens of mobile applications in banking and airline travel and compared features and layouts

  • We compiled screenshots of hundreds of variations to find patterns, key use cases, brand differentiators, and gaps

  • We described business rules, use cases, and dialog design

  • We created a logical flow, physical flow, and screen maps

  • We developed sample data and scripts to illustrate use cases as movies for customers, and as guides for developers

  • We progressively showcased more and more technology in shorter, more impactful stories

  • Marketing, sales, and customers all wanted to see the latest movies we made to illustrate a new business domain 

Good Sample Data Makes Believable Demos

To make our sample applications believable, we designed many sets of sample data. Our use cases were focused, and our sample applications could be used with the appropriate domain Starter Kit to develop robust, customized multimodal applications for iOS and Android.

We created sample applications for fictitious businesses, supporting all top tier features as analyzed through competitive review:

  • flight search, reservations, and check in on TransMobile Airlines

  • billpay and private banking at Sphere Bank

  • shopping, insurance, web support, pharmacy, and others

The example below is the full screen map for TransMobile Airlines, our fictitious, prototypical airline. The user experience presents all transactions in cards exposing the necessary fields ('slots') to complete the request. We created sample user and account profiles, flight patterns and options, and optimized flight paths, and recreated airport and seat maps for well-known aircraft. We also had to invent business context - such as conditions - that showcases the system's ability to disambiguate both the recognition and the interpretation of user intent.

Screen map: TransMobile Airlines
Main Pages
'Book a flight...
I want to check my flight status...
Show me my reservations.
Where are my bags?
How do I ...?'
My Flights
'Show me my next flight.
I want to see my itinerary.
My reservations.'
About Us
'What's my account balance?
I want to change my profile.
I have a question about my past reservations.' 
Flight Search & Purchase
'I need a flight to ...
When is the next flight to...
I want to fly from ...
I need to buy a ticket ...'
Flight Check In
'Find my reservation and check me in.
I want to check in to my next flight.
I want to upgrade my seat.
Reserve a meal for me.
Please put me on the upgrade list.'
Flight Status
'Is my flight on time? Is my next flight on time?
Can I upgrade my seat? 
How much time do I have between flights?
Is food available on the flight?'
Airport Terminal Maps
'Show me... Tell me... How can I find... Where is...
... the nearest restroom? 
... a restaurant?
... a shop?'

We used live demos and speech interaction tutorials to show customers how to build, deploy, and use Nina. Customers appreciated that reference applications in their domain already covered many user intents, giving them a head start on scoping their Nina project. 

  • Flight search, filter

  • Select flight 

  • Select seat

  • Purchase ticket

Insight: Compound requests and sustained context are complex problems to solve. The system needs to confirm slot by slot.

Insight: Round trips to process speech take time; let the user finish quickly by touching the screen instead of speaking commands.

  • Confirm flight

  • Ask about food, purchase offer

  • Check layover info

  • Check for upgrade

  • Send email, notification
  • Ask for Lounge location

  • Automatic check in on trigger

  • Register bags

  • Mobile boarding pass

  • Notify check in for next flight

This sample movie shows a method to hand off between dictation and interpretation, which are processed differently. 

Insight: Users prefer to receive clear signals about what the system needs, even if they don't understand why the system needs it.

Insight: The burden of recognition is on the system and its ability to add to its repertoire.

Step 4. Showcase!

I accompanied Advance Sales to show customers how they could develop their own speech agent persona. I brought customized movies to potential customers showing interaction in their domain. Below are sample movies and some real-world counterparts.

I made this movie to show how a fictitious sandwich shop could incorporate Nina into its iOS app. It charmed Domino's Pizza, and helped win the account.

'I just tried using “Dom,” and he was pretty responsive, whether it was the green peppers I wanted on my large pizza or the extra bottle of Sprite I asked him to add to my order.

Domino’s notes that “Dom” can take orders for carryout or delivery, access saved orders, suggest additions, and find coupons. The company also included quotes from “Dom” in a press release today.

“My motherboard and fatherboard raised me right,” Dom said.'

Geekwire, October 6, 2014 at 11:21 am

Check out a video of Dom in action. This Shorty Awards entry was a 2014 finalist.

Dom uses Nina's states and feedback framework, with their own branded character animation assets.

They even included vanity commands!

This movie takes aim at the future of private banking.

 

We did some competitive investigation and invented the private banking branch of our fictitious reference bank, Sphere Bank.

 

Sample data and believable use cases, with a concise script, helped us make the case for connected investors to use Nina to get the information they needed to make fast decisions.

Garanti Bank (Istanbul) hired Fjord, an international design studio, to design several different personae for different online activities.

Each persona had its own look and feel, including voice and phrasing, and dialog style. Each had strictly bounded transaction areas, improving recognition for users' requests.

I worked with Garanti and Fjord to adapt the Nina animation and state machine to their needs.

This movie shows how a customer can identify their user from any entry point, and pick up on a continuous conversation.

This movie addresses record-intensive applications such as Customer Relationship Management systems.

We developed more than 20 different sets of animations in three series:

  • Character

  • Geometric

  • Organic

 

We experimented with various animation factors to determine how much animation is necessary to draw the user's attention without being annoying.

 

We called this animation set 'Kaleida,' one of many in our Geometric series.

© 2025 by elizabeth dykstra-erickson
