CS 294S, CS 294W

Course Description

CS294S/CS294W is designed to offer students a remote research experience, in contrast to online lecture-style courses.

This course focuses on building a state-of-the-art social-good virtual assistant, a project supported by the Alfred P. Sloan Foundation to protect privacy and open access to knowledge. Opportunities to continue the research after the Fall Quarter are available.

State of the art. We can now automatically generate neural-based dialogue agents from a knowledge base, without painstakingly writing dialogue trees or collecting annotated utterances. This opens up many research opportunities in NLP, HCI, and systems.
Social good. We aim to give consumers an open-source, privacy-preserving assistant whose skills are all openly available in Thingpedia, a nonproprietary crowdsourced repository. This project provides a much-needed alternative to the emerging oligopoly of proprietary assistants and skill repositories.

This course is open to all undergraduate, master's, and PhD students who have taken at least two courses in Computer Science. Students will learn:

The latest voice-assistant technology.
Interdisciplinary research in NLP, systems, and HCI.
Experience with a large-scale social-good project.
Technology to disrupt surveillance capitalism.

This course consists of a few lectures on the latest technology, hands-on tutorials, small-group mentorships, interactive class discussions, and group presentations. Groups of 2 or 3 will choose or propose projects in subjects such as:

AI: machine learning, natural language processing, and knowledge extraction.
HCI: crowdsourced research, virtual assistant design, multimodal interface design, and user studies.
Systems: formal knowledge representation and semantics, programming in natural language, and internet of things.

You can take this course multiple times for credit. CS294S can be taken to fulfill the CS194 requirement. Please sign up for CS294W if you wish to fulfill your writing requirement as well. (CS294W requires students to meet with an advisor from the Technical Communication Program.)

Grading

Attendance is mandatory; please let us know if you can’t make it to class.

Class Participation: 15%
Homework: 15%
Final Project: 70%

Course Links

Schedule

The course meets Tuesday and Thursday, from 10:30am to 11:50am Pacific Time via Zoom. Please see Canvas for Zoom links.

This schedule is tentative and subject to change. Please pay attention to emails sent to the student list. The schedule for the previous school year (spring 2020) can be found here.

Date	Description	Course Materials	Events	Deadlines
Tue September 15	Course Introduction [slides]	Suggested Readings: Almond: The Architecture of an Open, Crowdsourced, Privacy-Preserving, Programmable Virtual Assistant Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands	Student Profiles out [link] (requires Stanford login) Signup spreadsheet out [link] (requires Stanford login)
Thu September 17	Lecture: Schema → Q&A [slides]	Suggested Readings: Schema2QA: High-Quality Low-Cost Q&A Agents for the Structured Web AutoQA: From Databases To Q&A Semantic Parsers With Only Synthetic Training Data	HW1 (See Canvas for more details) out [link]	Student Profiles due
Tue September 22	Project Discussions	Project Pitches Q&A on Wikidata (Silei Xu) Auto-IoT: Build Semantic Parser for IoTs Automatically (Silei Xu) Multimodal Chatbots (Nancy Xu) Error Detection: Know What You Don't Know (Sina Semnani) Natural Response Generation for Virtual Assistants (Sina Semnani)
Thu September 24	Lecture: Schema → Dialogues [slides]	Suggested Readings: State-Machine-Based Dialogue Agents with Few-Shot Contextual Semantic Parsers Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking		Homework due (11:59PM PT)
Tue September 29	Project Discussions	Project Pitches Are We Conversational Yet?: A Design Study and Empirical Evaluation of Multi-Turn Dialogues for Virtual Assistants (Giovanni Campagna) Multi-Language Support for Virtual Assistants (Mehrad Moradshahi) Teaching End-User Programming (Monica Lam) Representing Human Knowledge for Common Types (Giovanni Campagna) Let Me Know: A Dialogue for Notification (Giovanni Campagna)
Thu October 1	Project Discussion
Tue October 6	Project Proposals	Project Proposals Q&A on Wikidata (Naoki Yamamura) Zero-shot Multi-Modal Automation with QA (Michael Du, Sam Harris Masling, Nancy Xu)
Thu October 8	Project Proposals	Project Proposals Are We Conversational Yet? (Alejandrina G.R. and Kat McNeill) Blind IoT Devices (Nathan Lee) Safe Election FAQ System (Krithika Iyer) Finance Assistant (James F Zhuang)
Tue October 13	Project Proposals	Project Proposals Error ~~Detection~~ Preemption in Almond (Ammar Al-Qatari, Trey Connelly) Natural Response Generation for Virtual Assistants (Dhara Yu) Improving Multi-Language Support for Virtual Assistants (Tyler Hong, Pablo Ocampo, Eugene Tian)
Thu October 15	Lecture: An Overview of NLP [slides]	Suggested Readings: CS224n's Slides for the topic you are interested in
Tue October 20	Weekly Group Meetings			Weekly Updates due
Thu October 22	Students' Mini-Lecture	Mini-lectures Knowledge Base Question-Answering (Naoki Yamamura) Confidence Modeling for Deep Neural Networks (Ammar Al-Qatari, Trey Connelly)
Tue October 27	Weekly Group Meetings			Weekly Updates due
Thu October 29	Students' Mini-Lecture	Mini-lectures Preserve Factual Correctness for Neural Text Generation (Dhara Yu) WebAgent: Automatic Generation of a Conversational Agent from Web Instructions (Michael Du)
Tue November 3	Office Hour
Thu November 5	Students' Mini-Lecture	Mini-lectures Designing Empathetic Responses (Nathan Lee, Yoni Lerner) WebAgent: Automatic Generation of a Conversational Agent from Web Instructions (Related Work) (Sam Masling) The Value of Virtual Assistant for Personal Finance (James Zhuang)		Weekly Updates due
Tue November 10	Weekly Group Meetings			Weekly Updates due
Thu November 12	Students' Mini-Lecture	Mini-lectures Are We Conversational Yet? (Kat McNeill, Alejandrina G.R.) Neural Machine Translation for Question Answering (Pablo Ocampo, Tyler Hong, Eugene Tian) How do Deep Learning Models Answer Questions? (Krithika Iyer)
Tue November 17	Final Project Presentations	Final Project Presentations Better Error Detection with Calibrated Neural Confidence Modeling (Ammar Alqatari, Trey Connelly) Improving Conversational Fluency for Virtual Assistants (Dhara Yu) Personal Finance Virtual Assistant (James Zhuang)
Thu November 19	Final Project Presentations	Final Project Presentations Are we conversational yet? A Design Study and Evaluation of Multi-Turn Dialogues for Virtual Assistants (Kat McNeill, Alejandrina G.R.) Dialog Q&A with context over CSQA dataset (Naoki Yamamura) WebAgent (Michael Du, Sam Masling, Nancy Xu) Multilingual Paraphrasing for Question Answering (Pablo Ocampo, Tyler Hong, Eugene Tian) Safe Voting Q&A System (Krithika Iyer) [Best Project Award] Habitual Logging Using an IoT Device (Nathan Lee, Yoni Lerner)
Fri November 20	Project Report due at 11:59PM PT

Resources

Almond Virtual Assistant

Almond: The Architecture of an Open, Crowdsourced, Privacy-Preserving, Programmable Virtual Assistant
Giovanni Campagna, Rakesh Ramesh, Silei Xu, Michael Fischer, and Monica S. Lam.
In Proceedings of the 26th World Wide Web Conference - WWW 2017.

Controlling Fine-Grain Sharing in Natural Language with a Virtual Assistant
Giovanni Campagna, Silei Xu, Rakesh Ramesh, Michael Fischer, and Monica S. Lam.
In Proceedings of the 2018 ACM International Joint Conference on Pervasive and Ubiquitous Computing - Ubicomp 2018.

Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands
Giovanni Campagna, Silei Xu, Mehrad Moradshahi, Richard Socher, and Monica S. Lam.
In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2019.

Dialogue State Tracking

State-Machine-Based Dialogue Agents with Few-Shot Contextual Semantic Parsers
Giovanni Campagna, Sina J. Semnani, Ryan Kearns, Lucas Jun Koba Sato, Monica S. Lam
arXiv preprint - 2020.

Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking
Giovanni Campagna, Agata Foryciarz, Mehrad Moradshahi, and Monica S. Lam
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) - ACL 2020.

MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines
Mihail Eric, Rahul Goel, Shachi Paul, Adarsh Kumar, Abhishek Sethi, Peter Ku, Anuj Kumar Goyal, Sanchit Agarwal, Shuyang Gao, Dilek Hakkani-Tur
In arXiv preprint - 2019.

Semantic Parsing

Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
Victor Zhong, Caiming Xiong, and Richard Socher
In arXiv preprint - 2017

Neural Semantic Parsing with Type Constraints for Semi-Structured Tables
Jayant Krishnamurthy, Pradeep Dasigi, and Matt Gardner.
In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing - EMNLP 2017

Data Recombination for Neural Semantic Parsing
Robin Jia and Percy Liang.
In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics - ACL 2016

Language to Logical Form with Neural Attention
Li Dong and Mirella Lapata.
In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics - ACL 2016

SQLNet: Generating Structured Queries From Natural Language without Reinforcement Learning. [Github]
Xiaojun Xu, Chang Liu, and Dawn Song.
In arXiv preprint - 2017

Learning a Neural Semantic Parser from User Feedback. [Github]
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, Luke Zettlemoyer.
In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics - ACL 2017

The Alexa Meaning Representation Language
Thomas Kollar, Danielle Berry, Lauren Stuart, Karolina Owczarzak, Tagyoung Chung, Lambert Mathias, Michael Kayser, Bradford Snow, Spyros Matsoukas
In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers) - NAACL - 2018

Question Answering Over Knowledge Base

Schema2QA: High-Quality and Low-Cost Q&A Agents for the Structured Web
Silei Xu, Giovanni Campagna, Jian Li, Monica S. Lam
To appear in Proceedings of the 29th ACM International Conference on Information and Knowledge Management - CIKM - 2020

AutoQA: From Databases To Q&A Semantic Parsers With Only Synthetic Training Data
Silei Xu, Sina J. Semnani, Giovanni Campagna, Monica S. Lam
To appear in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing - EMNLP - 2020.

Localizing Q&A Semantic Parsers for Any Language In a Day
Mehrad Moradshahi, Giovanni Campagna, Sina J. Semnani, Silei Xu, Monica S. Lam
To appear in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing - EMNLP - 2020.

Learning a Natural Language Interface with Neural Programmer
Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, and Dario Amodei
In Proceedings of the 5th International Conference on Learning Representations - ICLR 2017

Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao.
In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics - ACL 2017

SEMPRE: Semantic Parsing with Execution
Jonathan Berant, Percy Liang, et al. 2013 - 2017

Question Answering Over Free Text

Reading Wikipedia to Answer Open-Domain Questions. [Github]
Danqi Chen, Adam Fisch, Jason Weston and Antoine Bordes.
In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics - ACL 2017

SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang.
In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing - EMNLP 2016
[SQuAD leaderboard]

Brassau

Brassau: Automatically Generating Graphical User Interfaces for Virtual Assistants
Michael Fischer, Giovanni Campagna, Silei Xu, and Monica S. Lam.
In Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services - MobileHCI 2018

Data Programming

Snorkel: Fast Training Set Generation for Information Extraction
Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, and Chris Ré.
In Proceedings of the 2017 ACM International Conference on Management of Data - SIGMOD 2017

Fonduer: Knowledge Base Construction from Richly Formatted Data
Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, and Chris Ré
In Proceedings of the 2018 ACM International Conference on Management of Data - SIGMOD 2018

Others

World of Bits: An Open-Domain Platform for Web-Based Agents
Tim (Tianlin) Shi, Andrej Karpathy, Jim Fan, Jonathan Hernandez, Percy Liang
In Proceedings of the 34th International Conference on Machine Learning - ICML 2017

Projects from previous iterations (2018)
Projects from previous iterations (2017)

Democratizing Virtual Assistants:
A Social-Good Research Project Course

Teaching Staff

Instructor

Monica Lam

Teaching Assistant

Silei Xu
