CS 124 / LING 180 From Languages to Information
Dan Jurafsky, Winter 2020
Week 8: Group Exercises on Chatbots/Dialogue Agents on Mobile Devices
Feb 25, 2020
Your goal today, as part of thinking deeply about chatbot implementation,
is to step back and explore the state of the art for chatbots on phones.
You will evaluate the conversational and QA features of the personal
assistants on your phones (Siri, Google Now, Alexa, Cortana, etc.),
keeping in mind the kinds of things in the rubric for your PA6 chatbot.
It's best if your group has more than one personal assistant, so you can
compare and contrast! But if not, that's OK; just work with one.
After each of the following questions, we will report some of your results
back to the whole room, so make note of your group's interesting results.
- Speech Recognition Performance: Write a couple of texts or emails by voice, then answer these questions with your group:
  - Does the speech recognition system allow barge-in (you interrupting or talking over the system)?
  - Can you find sentences with a high word error rate?
    The word error rate (WER) is the word-level edit distance between what was recognized and what you intended:
    the minimum number of substitutions + deletions + insertions needed to turn the recognized string into the correct one,
    divided by the total number of words in the correct string, i.e. WER = (S + D + I) / N.
  - Can you characterize the likely causes of some of the errors
    and group them into classes of errors?
  - What does the system do if it is unsure about what you said?
    What seems to be the confirmation strategy (e.g., explicit vs. implicit)?
    Does it vary?
- Task performance: Test out some tasks, like making or canceling calendar appointments
or getting recommendations for a business (a restaurant, etc.).
Analyze any errors and note interesting examples.
For example, did a task fail because of speech recognition (the wrong words were recognized)
or natural language understanding (the words were right, but the system still didn't understand)?
Or is the problem in the recommendation engine?
Are there disambiguation errors?
- Chatbot and Discourse/Dialogue Performance: Does the system feel conversational?
Fluent?
For example, can it make use of the discourse context (e.g., previous turns
in the conversation)? Does it remember things you told it earlier?
Can it use dialogue for disambiguation?
What about personality? Does it ever
surprise you in a positive way?
- Ethical Issues: As I discussed in the chatbot lecture
and the textbook chapter, ethical issues are especially relevant for chatbots,
since they interact with humans and are often trained on human data.
Discuss some issues in your group, such as privacy (chatbots 'overhearing' private information),
toxicity (chatbots saying toxic things), sexism (female chatbot servants
reinforcing traditional gender roles), or any others.
- Building a better conversational agent: Let's try to design a better dialogue system. In your group, discuss some
abilities you'd like to add, especially ones that require real dialogue
(i.e., exchanges in which each side takes more than one turn, not just single-shot command and control). I'm hoping you'll bring in some ideas
you learned while building your own chatbots!
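The word error rate from the Speech Recognition Performance question can be computed with the standard dynamic-programming edit distance, applied to words instead of characters. Below is a minimal Python sketch you could use to score your dictated texts. The function name word_error_rate and the example sentences are made up for illustration, and the sketch assumes simple lowercasing and whitespace tokenization (punctuation is not stripped, so "late." and "late" count as different words).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, where N is the number of words in the
    reference (what you intended to say). Assumes lowercase + whitespace
    tokenization; punctuation is NOT removed."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum edits turning the first j hypothesis words
    # into the first i reference words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions (reference words never recognized)
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions (extra recognized words)

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: two substitutions (mom -> mum, i'll -> isle) out of 10 words
intended = "send a text to my mom saying I'll be late"
recognized = "send a text to my mum saying isle be late"
print(word_error_rate(intended, recognized))  # 0.2
```

Because insertions count as errors, WER can exceed 1.0: for example, word_error_rate("hi", "hi there you") is 2.0.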