The lecture slides, labs, and assignments will be posted here as the course progresses.
Lecture times are 4:30-5:50 PM Pacific Time on Tuesdays and Thursdays. All deadlines are at 11:59 PM
Pacific Time unless otherwise specified.
This schedule is subject to change according to the pace of the class.
| Date | Description | Materials | Events |
|---|---|---|---|
| Week 1 | |||
| Tue Sep 23 | Course introduction and overview: Course structure and logistics; Project examples from 2023 and 2024; Background on foundation models and LLMs; Approaches: prompting, fine-tuning, agentic AI | Slides; Reading: LangGraph, Agent Architectures; OpenAI, A Practical Guide to Building Agents | |
| Thu Sep 25 | Models, Prompting and RAG: LLM power and limitations; Prompting techniques; Retrieval-Augmented Generation (RAG) | Slides; Reading: Grounding and Evaluation for Large Language Models; The Prompt Report: A Systematic Survey of Prompt Engineering Techniques; Survey of Hallucination in Natural Language Generation | Homework 1: Homework 1 Handout; Homework 1 Supplemental Materials: Building and Evaluating Data Agents (deeplearning.ai) |
| Week 2 | |||
| Tue Sep 30 | Agentic AI: Foundations of AI agents; Sample agentic systems; Evaluation criteria and methods: goals, plans, and actions | Slides; Reading: Anthropic, Building effective agents; What is your Agent's GPA? Evaluating and improving agent trustworthiness (blog) | |
| Thu Oct 2 | Recap, further discussion, and planning for projects: Review topics from lectures 2 and 3 in more depth; Discuss promising directions for course projects | Slides; Reading: Together, Open Research Cookbook | |
| Week 3 | |||
| Tue Oct 7 | Project proposals and feedback - Group 1 | | |
| Thu Oct 9 | Project proposals and feedback - Group 2 | | |
| Week 4 | |||
| Tue Oct 14 | Systematic evaluation of agentic systems: Goals, Plans, Actions; Refining the LLM judge; Evaluation with and without ground truth; Guest Speaker: Allison Jia | Slides; Reading: Evaluation and Benchmarking of LLM Agents: A Survey; Agent-as-a-Judge: Evaluate Agents with Agents; What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment | Homework 1 due (4:30 PM) |
| Thu Oct 16 | MCP, effective tool descriptions, and evals; Guest Speakers: Yusuf Ozuysal & Ishita Gupta | Slides; Slides (Ishita); Reading: Introducing the Model Context Protocol; MCP Documentation; Writing effective tools for agents — with agents | |
| Week 5 | |||
| Tue Oct 21 | Deep dive into LLMs | Slides; Reading: Deep Dive into LLMs like ChatGPT (YouTube, Andrej Karpathy) | |
| Thu Oct 23 | Data curation and RL to optimize agents; Guest speaker: Trung Vu (Bespoke Labs) | Slides; Reading: Improving Multi-Turn Tool Use with Reinforcement Learning; Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use; RL environments: SWE-bench: Can Language Models Resolve Real-World GitHub Issues?; SWE-smith: Scaling Data for Software Engineering Agents; ARE: Scaling up Agent Environments and Evaluations (GAIA2 blog); OSWorld; Terminal-Bench; RL: DAPO: An Open-Source LLM Reinforcement Learning System at Scale; ScaleRL; Optional: OpenThoughts: Data Recipes for Reasoning Models; NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks | |
| Week 6 | |||
| Tue Oct 28 | GEPA; Guest speaker: Lakshya Agrawal | Slides; Reading: GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning; DSPy; Notebooks: GEPA Notebook (Colab); GEPA Agent Discovery - ARC AGI | |
| Thu Oct 30 | Security for agents: adversarial attacks and guardrails; Guest Speakers: Matt Fredrikson & Andy Zou (CMU & Gray Swan AI) | Slides; Reading (offense): Universal and Transferable Adversarial Attacks on Aligned Language Models; Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition; Reading (defense): Representation Engineering: A Top-Down Approach to AI Transparency; Improving Alignment and Robustness with Circuit Breakers; Benchmarks: HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal; AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents; Gray Swan Arena | |
| Week 7 | |||
| Tue Nov 4 | Democracy Day: Day of Civic Service (no classes) | | |
| Thu Nov 6 | Mid-term project presentations and feedback - Group 1 | | |
| Week 8 | |||
| Tue Nov 11 | Mid-term project presentations and feedback - Group 2 | | |
| Thu Nov 13 | Multi-modal frontier models with agentic capabilities; Guest Speaker: Deepak Ramachandran (Google DeepMind) | Reading: TBD | |
| Week 9 | |||
| Tue Nov 18 | Project semi-final presentations and feedback - Group 1 | | |
| Thu Nov 20 | Project semi-final presentations and feedback - Group 2 | | |
| Thanksgiving Break (Nov 25, Nov 27) | |||
| Week 10 | |||
| Dec 2 or 4, 2025 | Final Project Fair: one 2.5-hour meeting this week, on Tuesday or Thursday, 3:30-6:00 PM; Each team gives an opening 2-3 minute presentation; Group poster/demo session for all; Guests welcome | | |
| Week 11 | |||
| | No exam: written project report due at the final exam time | | |
|