CS 124 / LING 180 From Languages to Information
Dan Jurafsky, Winter 2020

Week 8: Group Exercises on Chatbots/Dialogue Agents on Mobile Devices
Feb 25, 2020

Your goal today, as part of thinking deeply about chatbot implementation, is to step back and explore the state of the art for chatbots on phones. You are going to evaluate the conversational and QA features of the personal assistants on your phone (Siri, Google Now, Alexa, Cortana, etc.), keeping in mind the kinds of things that are in the rubric for your PA6 chatbot.

It's best if you have more than one personal assistant in your group, so you can compare and contrast! But if not, it's ok, just work on one.

After each of the following questions, we will report back some of your results to the whole room, so make note of your group's interesting results.

Speech Recognition Performance: Write a couple of texts or emails using voice. Answer these questions with your group:
- Does the speech recognition system allow barge-in (you interrupting/talking over the system)?
- Can you find sentences that have a high word error rate? The word error rate is the word-level edit distance between what was recognized and what you intended, normalized by length: the number of substitutions + deletions + insertions needed to turn the correct word string into the recognized one, divided by the total number of words in the correct string.
- Can you characterize what might be causing some of the errors, and group the errors into classes?
- What does the system do if it is unsure about what you said? What seems to be the confirmation strategy (e.g., explicit vs. implicit)? Does it vary?
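
If you want to put a number on the word error rate for the sentences you try, here is a minimal sketch in Python. The function name word_error_rate, the lowercasing, and the whitespace tokenization are assumptions made for illustration; real evaluations usually also normalize punctuation and number formats.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase and split on whitespace; no other normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# What you intended vs. what the assistant transcribed (made-up example).
intended = "set an alarm for seven thirty"
recognized = "set alarm for eleven thirty"
print(round(word_error_rate(intended, recognized), 3))
```

In the made-up example, one word is deleted ("an") and one is substituted ("seven" became "eleven"), so there are 2 errors over 6 intended words. Note that the rate can exceed 1.0 when the system inserts many extra words.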
Task performance: Test out some tasks like making/canceling calendar appointments or getting recommendations for a business (a restaurant, etc.). Analyze any errors and note interesting examples. For example, did they fail because of speech recognition (the wrong words were recognized) or natural language understanding (the words were right, but the system still didn't understand)? Or is the problem in the recommendation engine? Are there disambiguation errors?
Chatbot and Discourse/Dialogue Performance: Does the system feel conversational? Fluent? For example, can it make use of the discourse context (e.g. some previous turns in the conversation)? Does it remember things you told it earlier? Can it use dialogue for disambiguation? How about personality? Does it ever surprise you in a positive way?
Ethical Issues: As I discussed in the chatbot lecture and the textbook chapter, ethical issues are especially relevant in chatbots, given that they interface with humans and are often trained on human data. Discuss in your group some issues, such as privacy (chatbots 'overhearing' private info), toxicity (chatbots saying toxic stuff), sexism (female chatbot servants reinforcing traditional gender roles), or any other.
Building the better conversational agent: Let's try to design a better dialogue system. Discuss in your groups some abilities you'd like to add, especially those that require real dialogue (i.e. dialogues that have more than one turn each, not just simple command and control). I'm hoping you'll bring in some ideas that you learned in building your own chatbots!
