What is safe AI, and how do we build it? CS120 explores this question, focusing on the technical challenges of creating reliable, ethical, and aligned AI systems. We distinguish between model-specific and systemic safety issues, examining topics from fairness and data limitations to adversarial vulnerabilities and embedding desired behavior in AI. While we primarily focus on current solutions and their limitations through CS publications, we will also discuss socio-technical concerns of modern AI deployment, what oversight of intelligence could look like, and what future risks we might face.
Topics will span reinforcement learning, computer vision, and natural language processing, with a focus on interpretability, robustness, and evaluations. Through lectures, readings, quizzes, and a final project, you will gain insight into why ensuring AI safety and reliability is so challenging. This course aims to prepare you to critically assess and contribute to safe AI development, equipping you with knowledge of cutting-edge research and ongoing debates in the field.
Week | Date | Lecturer | Topic |
---|---|---|---|
Week 1 | 09/24/24 | Max | What Does Safe AI Mean Anyways? |
 | 09/26/24 | Max | [Optional] Technical AI/Machine Learning Recap |
Week 2 | 10/01/24 | Max | Reward Functions, Alignment, and Human Preferences |
 | 10/03/24 | Max | Encoding Human Preferences in AI |
Week 3 | 10/08/24 | Sang Truong | [Guest] Efficient Alignment and Evaluation for Language Models |
 | 10/10/24 | Max | Data Is All You Need - The Impact of Data |
Week 4 | 10/15/24 | Max | AI Vulnerabilities: Robustness and Adversaries |
 | 10/17/24 | Max | Needs for AI Safety Today: Beyond the Hype |
Week 5 | 10/22/24 | Robert Moss | [Guest] Sequential Decision Making for Safety-Critical Applications |
 | 10/24/24 | Max | Interpretability I |
Week 6 | 10/29/24 | Jared Moore | [Guest] Multivalue Alignment and AI Ethics |
 | 10/31/24 | Sydney Katz | [Guest] Validation of AI Systems |
Week 7 | 11/05/24 | | [No class] Election Day |
 | 11/07/24 | Anka Reuel | [Guest] Technical AI Governance |
Week 8 | 11/12/24 | Max | Interpretability II |
 | 11/14/24 | Max | Troubles of Anthropomorphizing AI |
Week 9 | 11/19/24 | Min Wu | [Guest] Verified Explainability |
 | 11/21/24 | Max | Electric Sheep: What Is Intelligence and Does It Want? |
Week 10 | 11/26/24 | | [No class] Thanksgiving |
 | 11/28/24 | | [No class] Thanksgiving |
Week 11 | 12/03/24 | Max | Attributing Model Behavior at Scale |
 | 12/05/24 | Max | Scalable Oversight: How to Supervise Advanced AI? |
This form is completely anonymous and a way for you to share your thoughts, concerns, and ideas with the CS 120 teaching team.
You are welcome to audit the class! Please reach out to me (Max) if you want to audit, so we can make sure we do not exceed the classroom's capacity.
Please note that auditing is only allowed for matriculated undergraduates, matriculated graduate/professional students, postdoctoral scholars, visiting scholars, Stanford faculty, and Stanford staff. After checking with me, please fill out this form and submit it. Non-Stanford students cannot audit the course. The current Stanford auditing policy is stated here.
Also, if you are auditing the class, please be aware that audited courses are not recorded on an academic transcript and no official records are maintained for auditors; there will be no record that you audited the course.
Violating the Honor Code is a serious offense, even when the violation is unintentional. The Honor Code is available here. Students are responsible for understanding the University rules regarding academic integrity. In brief, conduct prohibited by the Honor Code includes all forms of academic dishonesty, including representing as one's own work the work of another. If students have any questions about these matters, they should contact their section instructor.
Much of the writing on existential risk produced in the last few decades, especially the notion of longtermism and its implications, has been authored by white male residents of high income countries. Diverse perspectives on threats to the future of humanity enrich our understanding and improve creative problem-solving. We have intentionally pulled work from a broader range of scholars. We encourage students to consider not only the ideas offered by various authors, but also how their social, economic and political position informs their views.
This class provides a setting where individuals of all visible and nonvisible differences – including but not limited to race, ethnicity, national origin, cultural identity, gender, gender identity, gender expression, sexual orientation, physical ability, body type, socioeconomic status, veteran status, age, and religious, philosophical, and political perspectives – are welcome. Each member of this learning community is expected to contribute to creating and maintaining a respectful, inclusive environment for all the other members. If students have any concerns, please reach out to Professor Barrett.
Students who need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: http://oae.stanford.edu).
Each week, students are expected to do the required readings and submit quizzes. Towards the end of the quarter, students will need to submit a final project (later quizzes will be adjusted and reduced in scope). Final projects can range from running experiments to writing literature reviews or policy recommendations, to accommodate different backgrounds. The grading breakdown is:
Letter Grade | Percentage |
---|---|
A | 89-100% |
A- | 86-88% |
B+ | 83-85% |
B | 80-82% |
B- | 76-78% |
C+ | 73-75% |
C | 69-72% |
C- | 66-68% |
D+ | 63-65% |
D | 59-62% |
D- | 56-58% |
F | 0-55% |
While it's possible to receive an A+, only a few outstanding students will earn this grade.
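As an illustration of the breakdown above, here is a minimal sketch (not an official grading tool) that maps a final percentage to a letter grade. The percentage is rounded to the nearest whole number before lookup; treating a score that falls between listed bands (e.g., 79%) as belonging to the lower band is an assumption, not official policy.

```python
# Illustrative sketch only: letter-grade lookup based on the cutoffs in the
# grading table above. Scores between listed bands fall to the lower grade
# (an assumption), and percentages are rounded to whole numbers first.
GRADE_CUTOFFS = [
    (89, "A"), (86, "A-"), (83, "B+"), (80, "B"), (76, "B-"), (73, "C+"),
    (69, "C"), (66, "C-"), (63, "D+"), (59, "D"), (56, "D-"), (0, "F"),
]

def letter_grade(percentage: float) -> str:
    """Return the first grade whose lower cutoff the rounded score meets."""
    pct = round(percentage)
    for cutoff, grade in GRADE_CUTOFFS:
        if pct >= cutoff:
            return grade
    return "F"  # negative inputs (shouldn't occur) default to F
```

For example, `letter_grade(87)` falls in the 86-88% band and returns `"A-"`.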
Submit each weekly quiz on Gradescope. Quizzes are released by Thursday 5 pm each week and are due when the next quiz is released (i.e., 7 days later). The quizzes are based on the content of the previous lectures and the listed readings; they will not cover readings marked as “optional”, unless those were explicitly covered in the lectures.
A third of the final grade will be determined by a final project, which must be submitted by 5 PM PST on 12/06/2024.
The first two pages should consist of:
It is not sufficient to only cite papers from the curriculum. You are expected to explore further related works. A good starting point could be to examine the references in a lecture paper or look up which works cite that paper online. The last half page should be a
The middle section (pages 3 onward) will depend on the nature of your project. We encourage you to study different papers from the reading list to get a better feel for how they approach their topics.
For project ideas, you can also study recent publications from different conferences and workshops:
Here are a few example project ideas.
We do not expect you to write a final project on par with any of these publications. If you are unsure about the appropriate project scope, but have a topic in mind, we can discuss details after class or in office hours. Quiz 6 will also help you find a final project topic.
A twelfth of the final grade will be determined by the quality of two peer reviews.
All students get 6 late days at the start of the course.
Attendance is mandatory for all classes.
The rest of this document contains the schedule of assigned and optional readings for each week. Course slides, lecture recordings, and quizzes will be linked below, though we do not guarantee that all lectures will be recorded, especially with guest speakers.
Readings are subject to change throughout the course, but will be finalized at least 14 days in advance. Please check the curriculum here rather than a printed or duplicated copy.
Week 1, 09/24/24, Max: What Does Safe AI Mean Anyways?
Readings (Required)

09/26/24, Max: [Optional] Technical AI/Machine Learning Recap
Readings (Required)

Week 2, 10/01/24, Max: Reward Functions, Alignment, and Human Preferences
Readings (Required)

10/03/24, Max: Encoding Human Preferences in AI
Readings (Required)

Week 3, 10/08/24, Sang Truong: [Guest] Efficient Alignment and Evaluation for Language Models
Lecture Slides (No Recording)
Readings (Required)

10/10/24, Max: Data Is All You Need - The Impact of Data
Readings (Required)

Week 4, 10/15/24, Max: AI Vulnerabilities: Robustness and Adversaries
Readings (Required)

10/17/24, Max: Needs for AI Safety Today: Beyond the Hype
Readings (Required)

Week 5, 10/22/24, Robert Moss: [Guest] Sequential Decision Making for Safety-Critical Applications
Lecture (No Slides) (No Recording)
Readings (Required)

10/24/24, Max: Interpretability I
Readings (Required)

Week 6, 10/29/24, Jared Moore: [Guest] Multivalue Alignment and AI Ethics
Readings (Required)

10/31/24, Sydney Katz: [Guest] Validation of AI Systems
Readings (Required)

Week 7, 11/05/24: [No class] Election Day

11/07/24, Anka Reuel: [Guest] Technical AI Governance
Readings (Required)

Week 8, 11/12/24, Max: Interpretability II
Readings (Required)

11/14/24, Max: Troubles of Anthropomorphizing AI
Readings (Required)

Week 9, 11/19/24, Min Wu: [Guest] Verified Explainability
Readings (Required)

11/21/24, Max: Electric Sheep: What Is Intelligence and Does It Want?
Readings (Required)

Week 10, 11/26/24: [No class] Thanksgiving
11/28/24: [No class] Thanksgiving

Week 11, 12/03/24, Max: Attributing Model Behavior at Scale
Readings (Required)

12/05/24, Max: Scalable Oversight: How to Supervise Advanced AI?
Readings (Required)