Context is one of the most important facets of a holistic conversation.
In conversations between grown adults you often see context at work. You see it in how they weave their words, how they present themselves and how they absorb information. Politicians often speak in sound bites, because what they say is reported in sound bites. By eliminating context, they keep their answers consistent and clear of scrutiny.
Scientists do the exact opposite: they account for every rational possibility and go into as much detail as possible, which is why NASA has to release articles to counter badly reported stories.
Further evidence that context is the crux of all adult conversation comes from looking at children. Next time someone’s trying to describe something to a child, listen carefully and you’ll pick up on an important social cue: repetition.
"Look Tommy! Isn't that a cute dog? Look at the cute dog. What color is the cute dog? Tommy, what color is the dog? It's white, isn't it? It's a cute white dog, Tommy."
And therein lies the problem.
Chatbots are the conversational equivalents of five-year-old kids.
Often repetitive, bots are mindless drones that can only engage with a singular motive. Like five-year-olds, they can’t multitask or juggle conversations. Context is often lost on them and they struggle with emotions.
The problem of recognizing spoken input has largely been solved, but a new challenge has arisen: how to build a user experience that’s modeled after a natural human conversation.
“Twenty-five years from now, no one will be clicking on drop-down menus, but everyone will still be pointing at maps and correcting each other’s sentences. It’s fundamental. Good information software reflects how humans, not computers, deal with information.”
Bret Victor, Magic Ink
We’ve talked about the importance of better Natural Language Understanding, and a lot of readers replied with a question that we’ve found ourselves asking more often as we continue to build Verloop.
Where do we go from here?
And today, we have an answer.
Consider this: Conversation has advanced our civilization to where it is today. All human inventions are born from the ideas we communicate through spoken words — an ability we evolved over a very long time. Over 100,000 years in fact. Compare that to the roughly 5,000-year-old infancy of writing, let alone computing.
Google, in all its power and wisdom, identifies six basic steps to conversations.
- Open a channel to set up common ground — Speaker A sends a message to speaker B
- Commit to engage — B commits to the conversation with A
- Construct meaning — A and B connect through a set of structured ideas and (often unspoken) contexts
- Evolve – A or B (or both) learn or gain something based on their interaction
- Converge on an agreement — If everything works, A and B have reached an agreement; if not, both may move to repair the situation
- Act, or interact — Functional action may follow as a result of the conversation, or some unconscious goal may be reached (being less lonely counts)
How to make your bot more natural
Through our time designing and testing bots, we’ve zeroed in on four distinct conversational flaws that plague modern automation. These four “-ions” matter because the inability of bots to comprehend these nuances is often the reason humans shun conversational automation.
The crux of a conversation comes from the context it implies, but often in conversations, the things we don’t say hold more meaning than the things we do.
Say, for example, you and your friend are having a conversation and you say the following.
“I heard about this great burger place on the West Side, want to come with me on Friday evening?”
but she says,
“I work the evening shift.”
Individually the sentences have nothing to do with each other, but what she’s implying is evident to you. She can’t be at two places at the same time, so you infer, as logic would dictate, that she isn’t coming.
Take another example. You’ve found another friend to go with you, so you call ahead to make a reservation at the restaurant, and this happens.
“Certainly sir, how many people do you want to make this reservation for?”
and you say,
“Oh, just me and my friend.”
Again, the implication is evident to humans, but not to bots. To the bot behind a reservation app, that sentence has no correlation to its question, so it will ask the same question again until it gets the numeric value it requires.
Humans need these presumptions and principles operating in the background; without them, all human conversations would be hyper-literal.
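To make the reservation example concrete, here is a minimal sketch of how a bot might infer a head count from a reply that never states a number. The function name `party_size` and the small word lists are our own assumptions for illustration, not part of any real platform, and the heuristic is deliberately naive.

```python
import re

# Hypothetical lookup tables for this sketch; a real bot would rely on a fuller NLU layer.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
SELF_WORDS = {"me", "myself"}


def party_size(reply):
    """Return the head count implied by a reply, or None if nothing can be inferred."""
    text = reply.lower()
    # Explicit digits win: "a table for 4".
    digits = re.search(r"\b\d+\b", text)
    if digits:
        return int(digits.group())
    tokens = re.findall(r"[a-z']+", text)
    # Spelled-out numbers come next: "three of us".
    for token in tokens:
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    # Otherwise count the people mentioned: the speaker plus each "my <someone>".
    count = 1 if any(token in SELF_WORDS for token in tokens) else 0
    count += len(re.findall(r"\bmy \w+", text))
    return count or None


print(party_size("Oh, just me and my friend."))
```

A reply the function cannot interpret returns `None`, which is the signal to ask again rather than to guess.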
Bots can’t comprehend nullification like humans can because they’ve been built on a single-keyword basis. Humans understand both affirmative forms, which express the truth of a basic assertion, and negative forms, which express its falsity. Bots can’t, and this has real-world ramifications.
“I want a barbeque burger with no onions” (Only nullifies onions)
“We don’t want any drinks” (Nullifies the whole event).
“I’m not sure… I’ll take a milkshake” (It doesn’t nullify the main event).
A Burger King bot, for example, would struggle to understand a phrase as simple as, “I want a barbeque burger, with no onions” by virtue of a simple design failure.
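One rough way to picture the difference is to classify the scope of a negation before acting on it. The sketch below is an assumption-laden illustration: the function `negation_scope` and its two regular expressions are hypothetical and cover only the three sentences above, not negation in general.

```python
import re


def negation_scope(utterance):
    """Classify what, if anything, a negation cancels in an order utterance."""
    # Normalise curly apostrophes so "don’t" and "don't" are treated alike.
    text = utterance.lower().replace("’", "'")
    # Refusing the request itself cancels the whole event.
    if re.search(r"\b(?:don't|do not) want\b", text):
        return {"scope": "event", "excluded": []}
    # "no X" or "without X" only strikes X from an otherwise valid request.
    excluded = re.findall(r"\b(?:no|without)\s+(\w+)", text)
    if excluded:
        return {"scope": "ingredient", "excluded": excluded}
    # Hesitation such as "I'm not sure" negates nothing.
    return {"scope": "none", "excluded": []}


print(negation_scope("I want a barbeque burger with no onions"))
```

A single-keyword bot sees "no" or "not" and stops there. Separating the scope first lets the order go through with the onions removed.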
The ability to coordinate and handle more than one request at once is one of the most used elements in how humans communicate, and after some research, we found out that most relevant platforms do not support a request where elements are joined by a coordinator.
“[[I want a barbeque burger] and [my wife will have a Peri Peri burger]].” (two main events)
“I’ll have a barbeque burger [with [extra cheese] and [onion]].” (two changes in ingredients)
“I’ll take [[a barbeque burger [with [extra cheese] and [onion]]] and [a Peri Peri burger]].” (two burgers, the first one with two ingredients)
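The bracketed structures above can be approximated with a small heuristic: split on the coordinator, then decide whether each conjunct opens a new dish or modifies the previous one. This is only a sketch. The two-item `MENU` and the function `parse_order` are hypothetical, and a production parser would need real syntactic analysis.

```python
import re

# Hypothetical menu for this sketch.
MENU = ("barbeque burger", "peri peri burger")


def parse_order(utterance):
    """Split a coordinated request into dishes, each with its own list of extras."""
    items = []
    for chunk in re.split(r"\band\b", utterance.lower()):
        dish = next((name for name in MENU if name in chunk), None)
        if dish:
            # A conjunct that names a dish opens a new main event.
            item = {"dish": dish, "extras": []}
            items.append(item)
            # Anything after "with" is the first modifier of that dish.
            extra = chunk.partition(" with ")[2].strip(" .,")
            if extra:
                item["extras"].append(extra)
        elif items and chunk.strip(" .,"):
            # A conjunct without a dish modifies the most recent one.
            items[-1]["extras"].append(chunk.strip(" .,"))
    return items


print(parse_order("I'll take a barbeque burger with extra cheese and onion and a Peri Peri burger."))
```

The same "and" is resolved two ways depending on what follows it, which is exactly the distinction the brackets encode.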
Connections between different phrases
Bots at large are driven by tree models of conversation: you pick and choose, level by level, until you reach a predetermined end goal. This means a user can’t change their request halfway through a conversation, as they would in real life, and is forced to start over.
As a solution, we propose the usage of connectors as in the following examples:
“I want a barbeque burger with onion… Actually, add extra cheese” (adds info to the first sentence: adds an ingredient).
“I want a Peri Peri with extra chicken. However, I prefer it with no bacon.” (also adds info to the previous one).
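Here is a minimal sketch of what connector handling could look like, assuming the order is kept as mutable state rather than as a position in a tree. The connector list, the `REQUEST` pattern and the `take_turn` function are all hypothetical names chosen for this illustration.

```python
import re

# Hypothetical connector list and request pattern for this sketch.
CONNECTOR = re.compile(r"^(?:actually|however|but|also)\b[ ,]*")
REQUEST = re.compile(r"want an? (?P<dish>.+?)(?: with (?P<extra>.+))?$")


def new_order():
    return {"dish": None, "extras": [], "without": []}


def amend(order, clause):
    """Apply an 'add X', 'with X' or 'no X' clause to the order in place."""
    removed = re.search(r"\b(?:no|without) (.+)$", clause)
    added = re.search(r"\b(?:add|with) (.+)$", clause)
    if removed:
        order["without"].append(removed.group(1))
    elif added:
        order["extras"].append(added.group(1))


def take_turn(order, utterance):
    """Update the order from one utterance, clause by clause, and return it."""
    text = utterance.lower().replace("…", ".")
    for clause in re.split(r"[.!?]+", text):
        clause = clause.strip(" ,")
        if not clause:
            continue
        connector = CONNECTOR.match(clause)
        request = REQUEST.search(clause)
        if connector and order["dish"]:
            # A connector ties this clause to the order already in progress.
            amend(order, clause[connector.end():])
        elif request:
            order["dish"] = request.group("dish")
            if request.group("extra"):
                order["extras"].append(request.group("extra"))
    return order


print(take_turn(new_order(), "I want a Peri Peri with extra chicken. However, I prefer it with no bacon."))
```

Because the order lives outside the dialogue tree, a later turn such as "Actually, add extra cheese" amends it instead of forcing a restart.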
Conversation, regardless of its participants, should be a principled, mutual process of collaboration and negotiation. All parties involved create and agree upon meanings and operate against a background of rich, nuanced context. Understanding this can give you a theoretical model for designing your own conversational UI.
And all this is just the groundwork for what the future of conversational automation holds: voice interaction.
Voice-based conversational interfaces
Undoubtedly, one of the largest marketing trends of 2017 was the invasion of intelligent personal assistants (IPAs) into households across the globe. Amazon is the unquestioned leader in the world of conversational marketing, having sold over 15 million units of the Alexa-powered ‘Echo’, good for 75% of the U.S. market.
The plan for the immediate future seems to be, “If you can’t beat ’em, join ’em”.
Shortly after the Echo was launched, Google introduced the ‘Home Assistant’. Demand was so overwhelming that Google sold one of its Home Assistants every second during last year’s holiday season, totaling over 6 million units across its current range of the Home, Home Max, and Home Mini. According to DigiTimes, Facebook is also planning to launch a speaker of its own; the 15-inch touchscreen device is slated for release in early 2018. Apple joined the race as well when it launched its Siri-powered $349 ‘HomePod’.
According to research by Forrester, the market for ‘smart speakers’ will continue to grow rapidly, with an estimated user base of 224 million by 2022. The upside is tremendous.
The advantage of conversational UIs is that there is no learning curve: people already know how to talk. A well-designed user interface is intuitive; commands don’t have to be taught, unlike the meaning of a button in a visual interface or the keys on a touch-tone phone system.
Imagine your out-of-touch-with-tech mother. She frowns at smartphones, calls you to troubleshoot her iPad, looks down on those new-fangled Kindles and doesn’t understand Netflix. Her prejudices against technology are understandable. Even most kids take a while to get accustomed to a Mac after years on Windows. This Christmas, instead of getting your mom the new Oculus Rift or Pokémon Go, gift her an Alexa and watch how quickly she adopts it as her own.
If she or any of your users, however, do require help, the UI shouldn’t try to “teach” them what to say to protect them from veering off the so-called “happy path.” Instruction is irrelevant for those who aren’t having problems — which should be most people if you’ve designed an intuitive UI. Instead, give instructions on fallback paths and in repair (error) prompts, as in the following example. This way, you optimize relevance for people who don’t need help but offer help when someone seems to be stuck.
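A minimal sketch of escalating repair prompts, assuming a hypothetical reservation bot that counts how many replies in a row it failed to understand. The prompt wording and the name `next_prompt` are ours, for illustration only.

```python
# Hypothetical prompts: the happy path asks plainly, repairs add more guidance each time.
PROMPTS = [
    "How many people is the reservation for?",
    "Sorry, how many people will be dining?",
    "Sorry, I still didn't catch that. You can say a number, like 'two' or 'a table for four'.",
]


def next_prompt(failed_attempts):
    """Pick a prompt based on how many replies in a row were not understood."""
    # Clamp so that repeated failures keep getting the most explicit prompt.
    return PROMPTS[min(failed_attempts, len(PROMPTS) - 1)]


for attempts in range(3):
    print(next_prompt(attempts))
```

Users who answer the first question never hear an instruction. Only someone who is stuck gets told what a valid answer looks like.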