Overview

What is safe AI, and how do we build it? CS 120 explores this question, focusing on the technical challenges of creating reliable, ethical, and aligned AI systems. We distinguish between model-specific and systemic safety issues, ranging from fairness and data limitations to adversarial vulnerabilities and the embedding of desired behavior in AI. While we focus primarily on current solutions and their limitations through CS publications, we will also discuss the socio-technical concerns of modern AI deployment, what oversight of intelligence could look like, and what future risks we might face.

Topics will span reinforcement learning, computer vision, and natural language processing, with an emphasis on interpretability, robustness, and evaluations. Through lectures, readings, quizzes, and a final project, you will gain insight into why ensuring AI safety and reliability is challenging. This course aims to prepare you to critically assess and contribute to safe AI development, equipping you with knowledge of cutting-edge research and ongoing debates in the field.

Instructor

Course Assistant

Course Assistant

Week    | Date     | Lecturer    | Topic
Week 1  | 09/24/24 | Max         | What Does Safe AI Mean Anyways?
        | 09/26/24 | Max         | [Optional] Technical AI/Machine Learning Recap
Week 2  | 10/01/24 | Max         | Reward Functions, Alignment, and Human Preferences
        | 10/03/24 | Max         | Encoding Human Preferences in AI
Week 3  | 10/08/24 | Sang Truong | [Guest] Efficient Alignment and Evaluation for the Language Models
        | 10/10/24 | Max         | Data Is All You Need - The Impact of Data
Week 4  | 10/15/24 | Max         | AI Vulnerabilities: Robustness and Adversaries
        | 10/17/24 | Max         | Needs for AI Safety Today: Beyond the Hype
Week 5  | 10/22/24 | Robert Moss | [Guest] Sequential decision making for safety-critical applications
        | 10/24/24 | Max         | Full Access: Inner Interpretability Methods
Week 6  | 10/29/24 | Jared Moore | [Guest] Multivalue Alignment and AI Ethics
        | 10/31/24 | Sydney Katz | [Guest] Validation of AI Systems
Week 7  | 11/05/24 |             | [No class] Election Day
        | 11/07/24 | Max         | What If I Have A Black Box? Explainability and Interpretability
Week 8  | 11/12/24 | Max         | Troubles of Anthropomorphizing AI
        | 11/14/24 | Anka Reuel  | [Guest] Technical AI Governance
Week 9  | 11/19/24 | TBD         | [Guest] TBD
        | 11/21/24 | Max         | Electric Sheep: What Is Intelligence and Does It Want?
Week 10 | 11/26/24 |             | [No class] Thanksgiving
        | 11/28/24 |             | [No class] Thanksgiving
Week 11 | 12/03/24 | Max         | Attributing Model Behavior at Scale
        | 12/05/24 | Max         | Scalable Oversight: How to Supervise Advanced AI?

Logistics

Class Information

Anonymous Feedback

This form is completely anonymous; it gives you a way to share your thoughts, concerns, and ideas with the CS 120 teaching team.

Auditing The Class

You are welcome to audit the class! Please reach out to me (Max) beforehand so that we do not exceed the capacity of the classroom.

Please note that auditing is only allowed for matriculated undergraduates, matriculated graduate/professional students, postdoctoral scholars, visiting scholars, Stanford faculty, and Stanford staff. After checking with me, please fill out this form and submit it. Non-Stanford students cannot audit the course. The current Stanford auditing policy is stated here.

Also, if you are auditing the class, please be aware that audited courses are not recorded on an academic transcript and that no official records are maintained for auditors; there will be no record that you audited the course.

Academic Integrity and the Honor Code

Violating the Honor Code is a serious offense, even when the violation is unintentional. The Honor Code is available here. Students are responsible for understanding the University rules regarding academic integrity. In brief, conduct prohibited by the Honor Code includes all forms of academic dishonesty, including representing the work of another as one's own. If students have any questions about these matters, they should contact their section instructor.

Diversity, Equity and Inclusion

Much of the writing on existential risk produced in the last few decades, especially on the notion of longtermism and its implications, has been authored by white male residents of high-income countries. Diverse perspectives on threats to the future of humanity enrich our understanding and improve creative problem-solving, so we have intentionally pulled work from a broader range of scholars. We encourage students to consider not only the ideas offered by various authors, but also how each author's social, economic, and political position informs their views.

This class provides a setting where individuals of all visible and nonvisible differences (including but not limited to race, ethnicity, national origin, cultural identity, gender, gender identity, gender expression, sexual orientation, physical ability, body type, socioeconomic status, veteran status, age, and religious, philosophical, and political perspectives) are welcome. Each member of this learning community is expected to contribute to creating and maintaining a respectful, inclusive environment for all the other members. If you have any concerns, please reach out to Professor Barrett.

Students with Documented Disabilities

Students who need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: http://oae.stanford.edu).

Grading

Each week, students are expected to do the required readings and submit a quiz. Towards the end of the quarter, students will also submit a final project (later quizzes will be adjusted and reduced in scope to make room for it). Final projects can range from running experiments to writing literature reviews or policy recommendations, to accommodate different backgrounds. The grading breakdown is as follows:

Quizzes

Submit each weekly quiz on Canvas before the following quiz is released (i.e., 7 days later); one quiz is released per week, by Friday at 3 pm. For example, a quiz released on Friday 09/27 at 3 pm is due before the next release on Friday 10/04 at 3 pm. The quizzes are based on the content of the preceding lectures and the listed readings. They will not cover readings marked as "optional", unless those were explicitly covered in the lectures.

Final Projects

A third of the final grade will be determined by a final project, which must be submitted by (TBD).

Peer Review

A tenth of the final grade will be determined by the quality of two peer reviews.

Late Days

All students get 6 late days at the start of the course.

Attendance

Attendance is mandatory for all classes.

Curriculum

The rest of this document contains the schedule of assigned and optional readings for each week. Course slides, lecture recordings, and quizzes will be linked below, though we do not guarantee that every lecture will be recorded, especially those given by guest speakers.

Readings are subject to change throughout the course, but any changes will be made at least 14 days in advance. Please check the curriculum on this page rather than relying on a printed or duplicated copy.

How to Read Research Papers

The readings in this course are mainly technical AI research papers. If you haven't read AI research papers before, we recommend checking out these resources:

Deadlines

Week 1

09/24/24 | Max | What Does Safe AI Mean Anyways?

Lecture Slides + Recording

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

09/26/24 | Max | [Optional] Technical AI/Machine Learning Recap

Lecture Slides + Recording

Readings (Required)
    None
Optional Readings (Not Required)
    None

Week 2

10/01/24 | Max | Reward Functions, Alignment, and Human Preferences

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

10/03/24 | Max | Encoding Human Preferences in AI

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

Week 3

10/08/24 | Sang Truong | [Guest] Efficient Alignment and Evaluation for the Language Models

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

10/10/24 | Max | Data Is All You Need - The Impact of Data

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

Week 4

10/15/24 | Max | AI Vulnerabilities: Robustness and Adversaries

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

10/17/24 | Max | Needs for AI Safety Today: Beyond the Hype

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

Week 5

10/22/24 | Robert Moss | [Guest] Sequential decision making for safety-critical applications

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

10/24/24 | Max | Full Access: Inner Interpretability Methods

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

Week 6

10/29/24 | Jared Moore | [Guest] Multivalue Alignment and AI Ethics

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

10/31/24 | Sydney Katz | [Guest] Validation of AI Systems

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

Week 7

11/05/24 | [No class] Election Day

11/07/24 | Max | What If I Have A Black Box? Explainability and Interpretability

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

Week 8

11/12/24 | Max | Troubles of Anthropomorphizing AI

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

11/14/24 | Anka Reuel | [Guest] Technical AI Governance

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

Week 9

11/19/24 | TBD | [Guest] TBD

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

11/21/24 | Max | Electric Sheep: What Is Intelligence and Does It Want?

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

Week 10

11/26/24 | [No class] Thanksgiving

11/28/24 | [No class] Thanksgiving

Week 11

12/03/24 | Max | Attributing Model Behavior at Scale

Readings (Required)
    TBD
Optional Readings (Not Required)
    None

12/05/24 | Max | Scalable Oversight: How to Supervise Advanced AI?

Readings (Required)
    TBD
Optional Readings (Not Required)
    None