CS 124 / LING 180 From Languages to Information
Dan Jurafsky, Winter 2019
Week 9: Group Exercises on Chatbots/Dialog Agents on Mobile Devices
Mar 5, 2019
Your goal today, now that you are experts in chatbot technology,
is to explore the state of the art for chatbots on
phones. You are going to evaluate the
conversational and QA features of
the personal assistants on your phone,
Siri, Google Now, Alexa, Cortana, etc.,
keeping in mind the kinds of things we considered in the rubric for our PA6 chatbot.
It's best if
you have more than one personal
assistant in your group, so you can compare and contrast!
If not, that's OK; just work with one.
After each of the following four questions, we will report back some
of your results to the whole room, so make note of your group's interesting results.
Speech Recognition Performance
Write a couple of texts or emails using voice. Answer these questions with your group:
- Does the speech recognition system allow barge-in (you interrupting/talking over the system)?
- What is the speech recognition word error rate?
The word error rate is the edit distance in words between what was recognized and what you intended,
normalized by length (i.e., count the substitutions + deletions + insertions needed to turn the correct
word string into the recognized one, and divide that sum by the total number of words in the correct string).
- Can you characterize what might be the cause of some of the errors
and group them into some sort of classes of errors?
- What does the system do if it is unsure about what you said?
What seems to be the confirmation strategy (e.g., explicit vs. implicit)?
Does it vary?
- Task performance: Test out some tasks like making/canceling some calendar appointments
or getting recommendations for a business (a restaurant, etc.).
Analyze any errors and note interesting examples.
For example, did they fail because of speech recognition (the wrong words were recognized)
or natural language understanding (the words were right, but the system still didn't understand)?
Or is the problem in the recommendation engine?
Are there disambiguation errors?
- Chatbot and Discourse/Dialog Performance: Does the system feel conversational?
For example, can it make use of the discourse context (e.g., some
previous turns in the conversation)? Does it remember things you told it earlier?
Can it use dialogue for disambiguation?
Does it have a personality? Does it ever
surprise you in a positive way?
- Building the better conversational agent: Let's try to design a better dialogue system. Discuss in your groups some
abilities you'd like to add, especially those that require real dialogue
(i.e. dialogues that have more than one turn each, not just simple command and control). I'm hoping you'll bring in some ideas that
you learned in building your own chatbots!
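
If you would rather not count word errors by hand for the word error rate question under Speech Recognition Performance, here is a minimal Python sketch of the calculation described there. The function name word_error_rate is made up for this handout, and the sketch assumes you lowercase both strings and split on whitespace without stripping punctuation; your group may prefer a different normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance from reference to hypothesis, divided by reference length.

    Assumption for this sketch: words are lowercased and split on whitespace;
    punctuation is left attached to words.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion: reference word was dropped
                curr[j - 1] + 1,                  # insertion: extra word was recognized
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr

    return prev[-1] / len(ref)
```

For example, if you intended "call mom at noon" and the system recognized "call tom at noon", there is one substitution over four reference words, so word_error_rate returns 0.25. Because insertions are counted but the denominator is the length of the correct string, the rate can exceed 1 when the recognizer adds many extra words.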