Schedule & Syllabus

The lecture slides, labs, and assignments will be posted here as the course progresses.
Lectures are held 4:30pm-5:50pm Pacific Time on Tuesdays and Thursdays. All deadlines are at 11:59pm Pacific Time unless otherwise specified.

This schedule is subject to change according to the pace of the class.

Date Description Materials Events
Week 1
Tue Sep 23 Course introduction and overview
Course structure and logistics
Project examples from 2023 and 2024
Background on foundation models, LLMs
Approaches: prompting, fine-tuning, agentic AI
Slides
Reading:
LangGraph, Agent Architectures
OpenAI, A Practical Guide to Building Agents


Thu Sep 25 Models, Prompting and RAG
LLM power and limitations
Prompting techniques
Retrieval-Augmented Generation (RAG)
Slides
Reading:
Grounding and Evaluation for Large Language Models
The Prompt Report: A Systematic Survey of Prompt Engineering Techniques
Survey of Hallucination in Natural Language Generation
Homework 1:
Homework 1 Handout
Homework 1 Supplemental Materials:
Building and Evaluating Data Agents (deeplearning.ai)
Week 2
Tue Sep 30 Agentic AI
Foundations of AI agents
Sample agentic systems
Evaluation criteria and methods: goals, plans, and actions
Slides
Reading:
Anthropic, Building effective agents
What is your Agent's GPA? Evaluating and improving agent trustworthiness (blog)


Thu Oct 2 Recap, further discussion, and planning for projects
Review topics from lectures 2 and 3 in more depth
Discuss promising directions for course projects
Slides
Reading:
Together, Open Research Cookbook


Week 3
Tue Oct 7 Project proposals and feedback - Group 1




Thu Oct 9 Project proposals and feedback - Group 2




Week 4
Tue Oct 14 Systematic evaluation of agentic systems
Goals, Plans, Actions
Refining the LLM judge
Evaluation with and without ground truth
Guest Speaker: Allison Jia
Slides
Reading:
Evaluation and Benchmarking of LLM Agents: A Survey
Agent-as-a-Judge: Evaluate Agents with Agents
What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
Homework 1 Due (4:30 PM)
Thu Oct 16 MCP, effective tool descriptions, evals
Guest Speakers: Yusuf Ozuysal & Ishita Gupta
Slides
Slides (Ishita)
Reading:
Introducing the Model Context Protocol
MCP Documentation
Writing effective tools for agents — with agents


Week 5
Tue Oct 21 Deep dive into LLMs
Slides
Reading:
Deep Dive into LLMs like ChatGPT (YouTube | Andrej Karpathy)


Thu Oct 23 Data curation and RL to optimize agents
Guest Speaker: Trung Vu (Bespoke Labs)
Slides
Reading:
Improving Multi-Turn Tool Use with Reinforcement Learning
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
RL Environments:
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
SWE-smith: Scaling Data for Software Engineering Agents
ARE: Scaling up Agent Environments and Evaluations (GAIA2 blog)
OSWorld
Terminal-Bench
RL:
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
ScaleRL
Optional:
OpenThoughts: Data Recipes for Reasoning Models
NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks


Week 6
Tue Oct 28 GEPA
Guest Speaker: Lakshya Agrawal
Slides
Reading:
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
DSPy
Notebooks:
GEPA Notebook (Colab)
GEPA Agent Discovery - ARC AGI


Thu Oct 30 Security for agents: adversarial attacks and guardrails
Guest Speakers: Matt Fredrikson & Andy Zou
(CMU & Gray Swan AI)
Slides
Reading:
Offense:
Universal and Transferable Adversarial Attacks on Aligned Language Models
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
Defense:
Representation Engineering: A Top-Down Approach to AI Transparency
Improving Alignment and Robustness with Circuit Breakers
Benchmarks:
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Gray Swan Arena


Week 7
Tue Nov 4 Democracy Day: Day of Civic Service (no classes)




Thu Nov 6 Mid-term project presentations and feedback - Group 1




Week 8
Tue Nov 11 Mid-term project presentations and feedback - Group 2




Thu Nov 13 Multi-modal frontier models with agentic capabilities
Guest Speaker: Deepak Ramachandran
(Google DeepMind)
Reading:
TBD


Week 9
Tue Nov 18 Semi-final project presentations and feedback - Group 1




Thu Nov 20 Semi-final project presentations and feedback - Group 2




Thanksgiving Break (Nov 25, Nov 27)
Week 10
Dec 2 or 4, 2025 Final Project Fair
One 2.5-hour meeting this week, on Tuesday or Thursday, 3:30pm-6:00pm
Each team gives an opening 2-3 minute presentation
Group poster/demo session for all
Guests welcome




Week 11
No exam: the written project report is due at the scheduled final exam time