CS321M: AI Measurement Science

Course Description

Artificial Intelligence (AI) measurement science provides frameworks and methodologies for evaluating, benchmarking, and understanding AI systems. As AI systems become more powerful and are deployed in high-stakes domains, rigorous measurement has become increasingly important. Current measurement approaches are often ad hoc, lack theoretical grounding, and fail to connect to real-world use cases. This has led to a measurement crisis characterized by benchmark saturation, inconsistent evaluation methodologies, and difficulty in making valid claims about AI capabilities. This course develops AI measurement science through three connected themes:

- Measurement as Predictive Modeling: probabilistic models of evaluation data (item-wise and pairwise response models, latent variable models), modeling benchmark response matrices, scaling laws, and sample-efficient measurement.
- Measurement Validity and Reliability: validity theory applied to AI evaluation (content, criterion, construct, external, and consequential validity), operationalizing constructs in AI systems, and reliability analysis, including noise models and sources of measurement error.
- Design, Governance, and Applications: benchmark and instrument design, synthetic and adversarial evaluation, incentive-aware leaderboard design, and governance and policy considerations around AI measurement.
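
To give a flavor of the first theme, here is a minimal sketch of an item-wise response model fit to a benchmark response matrix. It is an illustration only, not course material: it assumes a one-parameter (Rasch-style) model in which the probability that a model answers an item correctly depends on a latent ability minus a latent item difficulty, and it uses simulated data. The function names (`simulate_responses`, `fit_rasch`), the matrix sizes, and the step size are all hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate_responses(ability, difficulty, rng):
    """Draw a binary response matrix: rows are AI models, columns are benchmark items."""
    p_correct = sigmoid(ability[:, None] - difficulty[None, :])
    return (rng.random(p_correct.shape) < p_correct).astype(float)


def fit_rasch(responses, n_steps=2000, lr=0.5):
    """Estimate abilities and difficulties by gradient ascent on the Bernoulli log-likelihood."""
    n_models, n_items = responses.shape
    ability = np.zeros(n_models)
    difficulty = np.zeros(n_items)
    for _ in range(n_steps):
        # Residual = observed response minus predicted probability of a correct answer.
        residual = responses - sigmoid(ability[:, None] - difficulty[None, :])
        # Averaging (rather than summing) keeps the step size independent of matrix shape.
        ability += lr * residual.mean(axis=1)
        difficulty -= lr * residual.mean(axis=0)
        # The model only identifies ability - difficulty, so pin the mean difficulty at zero
        # by shifting both parameter vectors by the same amount.
        shift = difficulty.mean()
        ability -= shift
        difficulty -= shift
    return ability, difficulty


# Simulated data (an assumption for illustration): 40 models, 300 items.
rng = np.random.default_rng(0)
true_ability = np.linspace(-1.5, 1.5, 40)
true_difficulty = rng.uniform(-1.5, 1.5, size=300)
responses = simulate_responses(true_ability, true_difficulty, rng)

ability_hat, difficulty_hat = fit_rasch(responses)
print("ability correlation:", np.corrcoef(true_ability, ability_hat)[0, 1])
print("difficulty correlation:", np.corrcoef(true_difficulty, difficulty_hat)[0, 1])
```

Treating the response matrix as data to be modeled, rather than averaging it into a single accuracy number, is what makes it possible to ask questions such as which items are most informative or how few items are needed to rank models reliably.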

This is a graduate-level course. By the end of the course, students will be able to understand, implement, and critique state-of-the-art AI measurement approaches and be prepared to conduct research in these areas.

Given the rapid growth of this field, the course will consist of weekly lectures and student-led discussions of assigned papers. Graded work includes two homeworks focused on implementing and analyzing measurement approaches, three quizzes, and a final project in which students develop a novel measurement approach or analysis for an AI system or capability.

If you are a CS PhD student at Stanford, this course counts toward the breadth requirement for "Learning and Modeling".

Teaching Staff

Contact

Personal matters: cs321m-spr2526-staff@lists.stanford.edu

Logistics

Syllabus: Syllabus
Textbook: aimslab.stanford.edu
Lectures: Monday and Wednesday, 11:30 AM – 12:50 PM in CoDa B60
Assignments: 2 homeworks, 3 quizzes, 1 final project.
Prerequisites:
- Machine Learning (e.g., CS 221, CS 229, CS 230, CS 224N)
- Probability & Statistics (e.g., CS 109 or equivalent)
- Linear Algebra & Calculus (e.g., MATH 51, CME 100)
- Proficiency in Python

Stanford · Spring 2026
