CS 294S, CS 294W: A Project Course on Building the Best Virtual Assistant


Course Description

CS 294S/CS 294W this year is an experimental course to offer students a remote research experience.

This course studies how to create the best virtual assistant (VA), as voice will become a common interface to the internet and to all our IoT devices, in all human languages. This paradigm shift creates many research opportunities in programming languages (PL), natural language processing (NLP), machine learning (ML), and human-computer interaction (HCI). Considering that there are 23 million web developers today, this work can have an impact on 23 million voice interface developers in the future.

This course consists of lectures on the latest VA research results, hands-on tutorials on assistant development tools, interactive class discussions, small-group mentorships, and group presentations. Based on research by the OVAL lab, we have identified many interesting quarter-long projects which can be conducted on our open virtual assistant infrastructure. Groups of 2 or 3 can choose among proposed projects in PL, NLP, ML, or HCI, and may also propose their own projects.

No prior ML knowledge or research experience is required, but students must have taken at least two computer science courses. We limit this course’s enrollment to 20 students to ensure a quality educational experience. You can take this course multiple times for credit. CS 294S can be taken to fulfill the CS 194 requirement. Please sign up for CS 294W if you wish to fulfill your writing requirement as well.

Grading

Attendance is mandatory; please let us know if you can’t make it to class.
  • Class Participation: 15%
  • Homework: 15%
  • Final Project: 70%
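
As a worked illustration of the weights above, the sketch below combines three component scores into one course score. It assumes each component is scored from 0 to 100 and that the components are combined linearly; the page does not state either, and the function and key names are hypothetical.

```python
# Weights taken from the grading breakdown above, in percent.
WEIGHTS = {"participation": 15, "homework": 15, "final_project": 70}


def course_score(scores):
    """Return the weighted course score on a 0-100 scale.

    Assumes every component score is on a 0-100 scale (an assumption,
    not something the course page specifies).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Weighted sum with integer percent weights, divided by 100 at the end.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


example = course_score(
    {"participation": 100, "homework": 80, "final_project": 90}
)
print(example)  # (15*100 + 15*80 + 70*90) / 100 = 90.0
```

Because the final project carries 70% of the weight, a 10-point change in the project score moves the total by 7 points, while the same change in homework moves it by 1.5.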

Course Links

Research Areas

For detailed project ideas see this.

Multidisciplinary

  • Teaching autistic kids social skills: Develop a Minecraft chatbot to leverage autistic kids’ interests in the game to teach them social skills. This is a project to be jointly advised by Dr. Fung, an expert in autism.
  • VA-first search experience: Can we replace time-consuming search with a VA that lets users express their search criteria explicitly? For example, “find me a hotel room in Maui and I’m willing to pay $30 more for an ocean view.”

Systems

  • Conversational templates: Today dialogue flows require users to painstakingly handcode possible conversations sentence by sentence. Applying the concept of reuse in programming systems, we have shown that a generic transaction conversational template can be used to build many different reservation agents easily. Can we identify and create other conversational templates? Examples include (1) navigating a search space such as shopping, (2) helping users create and maintain a knowledge base, such as todo lists, diaries, preferences, etc.
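
The reuse idea behind conversational templates can be sketched in a few lines. This is a minimal illustration only, not the OVAL lab's implementation: the class `TransactionTemplate`, its fields, and the example prompts are all hypothetical. One generic "collect the required details, then confirm" flow is written once and instantiated for two different reservation agents.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TransactionTemplate:
    """Hypothetical generic transaction flow: fill slots, then confirm."""

    action: str            # what the agent does, e.g. "book a table"
    slots: Dict[str, str]  # slot name -> question that asks for it

    def next_turn(self, filled: Dict[str, str]) -> str:
        # Ask for the first required slot the user has not provided yet.
        for name, question in self.slots.items():
            if name not in filled:
                return question
        # Every slot is filled: summarize and ask for confirmation.
        details = ", ".join(f"{k}: {v}" for k, v in filled.items())
        return f"I am about to {self.action} ({details}). Shall I proceed?"


# Two agents built from the same template; only the data differs.
restaurant = TransactionTemplate(
    action="book a table",
    slots={
        "restaurant": "Which restaurant would you like?",
        "party_size": "How many people?",
        "time": "What time?",
    },
)
hotel = TransactionTemplate(
    action="reserve a room",
    slots={
        "city": "Which city?",
        "check_in": "What is your check-in date?",
        "nights": "How many nights?",
    },
)

print(restaurant.next_turn({}))
print(hotel.next_turn({"city": "Maui"}))
```

The dialogue logic lives in `next_turn` and is never rewritten per domain. A template for a different kind of conversation, such as navigating a search space, would need a different flow but could be reused in the same way.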

Natural Language Processing (NLP)

  • NLP capabilities for VAs: How to train a single model that can understand all skills, so users can seamlessly flow from one skill to the next? How can VAs detect out-of-distribution commands to let users know of their limitations? How to use pre-trained language models to generate understandable confirmations and questions in a VA?
  • Neural dialogue acts: Dialogues are complex, and can take exponentially many paths. How can we understand all conversations, without seeing them all? How can the neural network generalize to dialogue acts and state transitions unseen in training?
  • Multilingual assistants: How do we quickly teach assistants many languages? Can we use pre-trained machine translation models to transfer assistant knowledge to different languages? Can we build a restaurant assistant that knows 20 languages with 20 native speakers in a week?
  • Internet-scale Assistants: How to create an assistant conversant in thousands of domains and over a billion websites, without engineering it one website or domain at a time? How to use pre-trained language models like BERT to (1) learn the vocabulary of new domains in zero-shot learning or (2) transfer knowledge to new domains?

Human-Computer Interaction (HCI)

  • Multimodal virtual assistants: Can we add voice to GUIs on phones to improve tasks involving multiple apps? Can we use voice on websites to let consumers automate repeated digital tasks?
  • Social Assistants: With shelter-in-place, can we use communicating virtual assistants to improve our social life? Examples include games, matching people with common interests (36 questions to fall in love), and an online “Hypehouse” for TikTok.
  • Rethinking skills: How do we go beyond utilities like playing music or telling news stories? What if the music assistant can answer questions about the artists’ history and upcoming concerts? What if the news assistant can answer questions about the people and events in the news stories?
  • Communicative assistants: VAs are slave-like today; they either obey a command or respond with “I don’t understand”. What if VAs can suggest related actions and ask for clarifications? What if VAs can chat with the users to learn their profile so as to be more effective and helpful?

Schedule

    The course meets Tuesday and Thursday, from 10:30am to 11:50am Pacific Time via Zoom. Please see Canvas for Zoom links.

    This schedule is tentative and subject to change. Please pay attention to emails sent to the student list.


    Date Description Course Materials Events Deadlines (10:30am PT)
    Tue April 7 Course Introduction
    [slides]
    Suggested Readings:
    1. Almond: The Architecture of an Open, Crowdsourced, Privacy-Preserving, Programmable Virtual Assistant
    2. Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands
    Student Profiles out
    [link] (requires Stanford login)
    Thu April 9 Schema → QA (HW1)
    [slides]
    Suggested Readings:
    1. Schema2QA: Answering Complex Queries on the Structured Web with a Neural Model
    HW1 out
    [link]
    Student Profiles due
    Tue April 14 Schema → Dialogues
    [slides]
    Suggested Readings:
    1. Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking
    Thu April 16 Tutorial and Discussion
    Tue April 21 Project Discussions
    • Multi-Language Support for Virtual Assistants [slides]
    • Multi-Domain Transactional Dialogues [slides]
    • Controllable and Natural Response Generation for Virtual Assistants [slides]
    • Result Representation: Charts for Data Sequences [slides]
    HW2 out
    [link]
    HW1 due
    Thu April 23 Project Discussions
    Tue April 28 Project Discussions
    Thu April 30 Project Discussions
    Sat May 2 HW2 due at 11:59pm PT
    Tue May 5 ML for NLP Primer
    [slides]
    Suggested Readings:
    1. CS224n's Slides for the topic you are interested in
    Thu May 7 Proposal Presentations
    • Food Ordering [slides]
    • Improving Visual and Verbal User Interaction with Virtual Assistants [slides]
    • Multi-Domain Transactional Dialogues [slides]
    • Controllable and Natural Response Generation for Virtual Assistants [slides]
    • Multi-Language Support for Virtual Assistants [slides]
    • BuddyBot
    Project Proposal due
    [See Projects page for guidelines]
    Tue May 12 Weekly Group Meetings
    Weekly Update due on 5/11
    Thu May 14 Students' Mini-lectures
    • BuddyBot
    • Dialogue Datasets [slides]
    Tue May 19 Weekly Group Meetings
    Weekly Update due on 5/18
    Thu May 21 Students' Mini-lectures
    • Multi-Language Support for Virtual Assistants [slides]
    • Charts: Personality [slides]
    Tue May 26 Weekly Group Meetings
    Weekly Update due on 5/25
    Thu May 28 Students' Mini-lectures
    [slides]
    Tue June 2 Weekly Group Meetings
    Weekly Update due on 6/1
    Thu June 4 Students' Mini-lectures
    [slides]
    Tue June 9 Final Project Presentations
    [slides]
    Thu June 11 Project Report due at 11:59pm PT

    Resources

    Almond Virtual Assistant

    1. Almond: The Architecture of an Open, Crowdsourced, Privacy-Preserving, Programmable Virtual Assistant
      Giovanni Campagna, Rakesh Ramesh, Silei Xu, Michael Fischer, and Monica S. Lam.
      In Proceedings of the 26th World Wide Web Conference - WWW 2017.

    2. Controlling Fine-Grain Sharing in Natural Language with a Virtual Assistant
      Giovanni Campagna, Silei Xu, Rakesh Ramesh, Michael Fischer, and Monica S. Lam.
      In Proceedings of the 2018 ACM International Joint Conference on Pervasive and Ubiquitous Computing - Ubicomp 2018.

    3. Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands
      Giovanni Campagna, Silei Xu, Mehrad Moradshahi, Richard Socher, and Monica S. Lam.
      In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2019.

    Dialogue State Tracking

    1. Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking
      Giovanni Campagna, Agata Foryciarz, Mehrad Moradshahi, and Monica S. Lam
      To appear in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) - ACL 2020.

    2. MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines
      Mihail Eric, Rahul Goel, Shachi Paul, Adarsh Kumar, Abhishek Sethi, Peter Ku, Anuj Kumar Goyal, Sanchit Agarwal, Shuyang Gao, Dilek Hakkani-Tur
      In arXiv preprint - 2019.

    Semantic Parsing

    1. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
      Victor Zhong, Caiming Xiong, and Richard Socher
      In arXiv preprint - 2017

    2. Neural Semantic Parsing with Type Constraints for Semi-Structured Tables
      Jayant Krishnamurthy, Pradeep Dasigi, and Matt Gardner.
      In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing - EMNLP 2017

    3. Data Recombination for Neural Semantic Parsing
      Robin Jia and Percy Liang.
      In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics - ACL 2016

    4. Language to Logical Form with Neural Attention
      Li Dong and Mirella Lapata.
      In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics - ACL 2016

    5. SQLNet: Generating Structured Queries From Natural Language without Reinforcement Learning. [Github]
      Xiaojun Xu, Chang Liu, and Dawn Song.
      In arXiv preprint - 2017

    6. Learning a Neural Semantic Parser from User Feedback. [Github]
      Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, Luke Zettlemoyer.
      In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics - ACL 2017

    7. The Alexa Meaning Representation Language
      Thomas Kollar, Danielle Berry, Lauren Stuart, Karolina Owczarzak, Tagyoung Chung, Lambert Mathias, Michael Kayser, Bradford Snow, Spyros Matsoukas
      In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers) - NAACL - 2018

    Question Answering Over Knowledge Base

    1. Schema2QA: Answering Complex Queries on the Structured Web with a Neural Model
      Silei Xu, Giovanni Campagna, Jian Li, Monica S. Lam
      In arXiv preprint - 2020

    2. Learning a Natural Language Interface with Neural Programmer
      Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, and Dario Amodei
      In Proceedings of the 5th International Conference on Learning Representations - ICLR 2017

    3. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
      Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao.
      In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics - ACL 2017

    4. SEMPRE: Semantic Parsing with Execution
      Jonathan Berant, Percy Liang et al. 2013 - 2017

    Question Answering Over Free Text

    1. Reading Wikipedia to Answer Open-Domain Questions. [Github]
      Danqi Chen, Adam Fisch, Jason Weston and Antoine Bordes.
      In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics - ACL 2017

    2. SQuAD: 100,000+ Questions for Machine Comprehension of Text
      Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang.
      In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing - EMNLP 2016
      [SQuAD leaderboard]

    Brassau

    1. Brassau: Automatically Generating Graphical User Interfaces for Virtual Assistants
      Michael Fischer, Giovanni Campagna, Silei Xu, and Monica S. Lam.
      In Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services - MobileHCI 2018

    Data Programming

    1. Snorkel: Fast Training Set Generation for Information Extraction
      Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, and Chris Ré.
      In Proceedings of the 2017 ACM International Conference on Management of Data - SIGMOD 2017

    2. Fonduer: Knowledge Base Construction from Richly Formatted Data
      Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, and Chris Ré
      In Proceedings of the 2018 ACM International Conference on Management of Data - SIGMOD 2018

    Others

    1. World of Bits: An Open-Domain Platform for Web-Based Agents
      Tim (Tianlin) Shi, Andrej Karpathy, Jim Fan, Jonathan Hernandez, Percy Liang
      In Proceedings of the 34th International Conference on Machine Learning - ICML 2017

    2. Projects from previous iterations (2018)
    3. Projects from previous iterations (2017)

    Teaching Staff

    Instructor

    Monica Lam

    Professor


    Office hours by appointment via Zoom

    Teaching Assistant

    Sina Semnani


    Office hours via Zoom
    (See Canvas for times and links)