Artificial Intelligence (AI) measurement science provides frameworks and methodologies for evaluating, benchmarking, and understanding AI systems. As AI systems grow more powerful and are deployed in high-stakes domains, rigorous measurement has become increasingly important. Current measurement approaches are often ad hoc, lack theoretical grounding, and fail to connect to real-world use cases. This has led to a measurement crisis characterized by benchmark saturation, inconsistent evaluation methodologies, and difficulty in making valid claims about AI capabilities. This course develops AI measurement science through three connected themes:
This is a graduate-level course. By the end of the course, students will be able to understand, implement, and critique state-of-the-art AI measurement approaches and be prepared to conduct research in these areas.
Given the rapid growth of this field, the course will consist of weekly lectures and student-led discussions of assigned papers. Graded work includes two homework assignments focused on implementing and analyzing measurement approaches, three quizzes, and a final project in which students develop a novel measurement approach or analysis for an AI system or capability.
If you are a CS PhD student at Stanford, this course counts toward the breadth requirement for "Learning and Modeling".