The lecture slides, labs, and assignments will be posted here as the course progresses.
Lecture times are 3pm-4:20pm PST on Tuesdays and Thursdays. All deadlines are at 11:59pm PST.
This schedule is subject to change according to the pace of the class.
Date | Description | Materials | Events |
---|---|---|---|
Week 1 | |||
Tue Sept 24 | Course overview and logistics; introduction to trustworthiness of LLMs; project overview, with examples of projects from last year | Slides. References: TrustLLM: Trustworthiness in Large Language Models (Sections 2.2 and 3) | |
Thu Sept 26 | Overview and structure of the LLM tech stack; RAG architecture: tools needed for RAG (LlamaIndex, TruLens); evaluation of RAGs; presentation of Homework 1; fine-tuning concept and tools, anticipating Homework 2 | Slides | Homework 1 Out; Homework 1 Colab; Due Oct 8th. Supplemental Materials: LlamaIndex, TruLens |
Week 2 | |||
Tue Oct 1 | Evaluation of models and apps | Slides. References: Grounding and Evaluation for Large Language Models | |
Thu Oct 3 | Project directions; sample application areas & evaluations | Slides | Final project group formations due Friday, October 4th. More info available on Ed. |
Week 3: Project Proposals and Feedback | |||
Tue Oct 8 | Project Proposal Presentations | ||
Wed Oct 9 | Homework 1 Due | ||
Thu Oct 10 | Project Proposal Presentations | ||
Week 4 | |||
Tue Oct 15 | Grounding and Factuality 1. Focus: generating grounded responses; fine-tuning (for factuality) | References: Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey); Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models; Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks; TRUE: Re-evaluating Factual Consistency Evaluation; Do Language Models Know When They're Hallucinating References?; RARR: Researching and Revising What Language Models Say, Using Language Models; The Internal State of an LLM Knows When It's Lying; SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models; Measuring Reliability of Large Language Models through Semantic Consistency | Homework 2 Out; Due Date TBD |
Thu Oct 17 | Guest Lecture. Focus: verification and guardrails; checking factuality; rewriting | References: Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey) | |
Week 5 | |||
Tue Oct 22 | Confidence, Calibration, Uncertainty: Yarin Gal’s work on uncertainty; self-consistency, GD-consistency, prompt-consistency, and other topics | References: Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models | |
Thu Oct 24 | Explainability; data quality for supervised fine-tuning (SFT) and RL (RLHF, RLAIF) | References: Towards Monosemanticity: Decomposing Language Models With Dictionary Learning; What Makes Good Data for Alignment? | |
Week 6 | |||
Tue Oct 29 | Agents 1 | References: Cortex Analyst – An Agentic App for text2sql | |
Thu Oct 31 | Agents 2: Evaluation | References: Berkeley Function Calling Leaderboard; Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | |
Week 7: Mid-term Project Presentations with Feedback | |||
Tue Nov 5 | No Class (Democracy Day) | ||
Thu Nov 7 | Project mid-term presentations (all groups) | ||
Week 8 | |||
Tue Nov 12 | Guest Lecture: Mert Yuksekgonul (Stanford University) on TextGrad | ||
Thu Nov 14 | Guest Lecture | ||
Week 9: Project Presentations (dry-runs, with feedback) | |||
Tue Nov 19 | Project Presentations | ||
Thu Nov 21 | Project Presentations | ||
Thanksgiving Break (Nov 26, Nov 28) | |||
Week 10: Final Project Fair and Presentations | |||
Tue Dec 3 OR Thu Dec 5, 3pm-5:30pm | Final Project Fair | ||
Finals Week: Final Project Report Due at the end of the scheduled exam time, which is on Thursday, December 12 at 3:15 PM. |