CS 224V

Conversational Virtual Assistants with Deep Learning

Fall 2024

Course Schedule

| Date | Topic | Description | Events | Deadlines |
|---|---|---|---|---|
| 9/23 | Introduction | What can we do with LLMs? Understanding LLMs (their strengths, their weaknesses, and how to grow them); Architecture of an agent (external corpora, NLP primitives, agent initiatives); Taxonomy of knowledge-oriented tasks; State-of-the-art results; Course design and outline. | | |
| 9/25 | Knowledge Curation | How to use LLMs to curate knowledge in an open domain? Research in the pre-writing stage by iteratively searching and reading from different perspectives; adding interactivity to allow personalization. | Homework 1 out | Student intro form due |
| 9/30 | Building a Task-Oriented Agent | How to create an agent using the Genie Worksheet language? Genie Worksheet is the first high-level specification language for task-oriented agents. It lets users focus on the task to be done and leaves low-level dialogue details to the language implementation. | | |
| 10/2 | Research Project Ideas | What ongoing research projects can students participate in? Knowledge curation (Wikipedia); DataTalk (election data); Task-oriented agents (FAFSA, Courses, ServiceNow); Knowledge discovery (news, original historical corpora, drug-disease interactions); Multilingual (news analysis); Advanced knowledge curation (specialized domains such as arXiv, customizable writing schemas, data-driven curation); Understanding large corpora with expert feedback (automatic technical document schemas, personalized filtering); Knowledge distillation of agentic approaches (SPARQL, game tutor); Formal reasoning using theorem proving (degree programs, compliance in finance). | Homework 2 out | Homework 1 due |
| 10/7 | Grounding Agents on Small Databases | How to create a hallucination-free conversational bot grounded on structured data? Semantic parsing; Databases; Expressiveness of database queries; Few-shot prompting on small schemas; Handling enumerated types; Comparison with human annotations. Example: Yelp. | | |
| 10/9 | Student Project Ideas | Students pitch preliminary project ideas. | Project Proposal assignment out | Homework 2 + Project Intent due |
| 10/14 | Project Proposal/Discussion | Students are invited to pitch projects that need partners. | | |
| 10/16 | Project Proposals | Groups present their proposals. | | Project Proposal due |
| 10/21 | Project Proposals | Groups present their proposals. | | |
| 10/23 | Grounding Agents on Free Text | How to create a hallucination-free conversational bot grounded on free text? Text retrieval; Summarization; Verifying generation; Response generation; Evaluation methodology; Fine-tuning small language models. Examples: Bing Chat, WikiChat. | | |
| 10/28 | Structured / Unstructured Query Language | How to answer questions that combine structured and unstructured data? SUQL language design; Automatic schema creation; Evaluation methodology. | | |
| 10/30 | Task-Oriented Agent Generation | How to easily scale the creation of effective and reliable agents across different domains? Implementation of the Genie Worksheet; formal dialogue state representation; semantic parsing; dialogue state tracking; response generation. | | |
| 11/4 | Reactive Agents for Knowledge Graph Queries | How to handle complex knowledge tasks using the agentic approach, e.g., generating SPARQL queries for Wikidata? Action set design; experimental approach. | | |
| 11/6 | Knowledge Discovery | How to discover knowledge in a large corpus of unstructured data? Qualitative coding; Top-down deductive coding; Self-learning with instruction refinement; Bottom-up inductive coding; Curiosity-driven browsing with an evaluation function; Learning from experts as an assistant. | | |
| 11/11 | Formal Reasoning | How do we use LLMs in formal reasoning? Theorem proving; satisfiability modulo theories; applications. | | |
| 11/13 | Multimodal Applications | How to build a multimodal app that supports complex commands? Motivation; Arbitrary composition of APIs in a program by voice; Combining graphical and voice outputs; Showing voice command results in native graphical outputs; How to discover features; ReactGenie framework, GenieWizard. | | |
| 11/18 | NLP Building Blocks | What building blocks are used under the hood in natural language processing? Information retrieval techniques; entity linking. | | |
| 11/20 | Training LLMs | How do we create LLMs? Instruction-following models like text-davinci and ChatGPT; training data. | | |
| | Thanksgiving Break | | | |
| 12/2 | No class | | | |
| 12/4 | Final Project Presentation | Groups present their final projects (3-hour class). | | Final Project Presentation + Poster |
You can check last year's (2023) lectures here.