MS&E338/CS338 Aligning Superintelligence

Within a couple of decades, or less, it is plausible that humans will create an AI that is much smarter than humans in practically all domains of human activity. We refer to such an AI as a superintelligence. The alignment problem is how to make sure that such a superintelligence acts according to its human designer's intent. This course is intended for a technical audience interested in thinking about this problem.

But why would an AI not act according to its human designer's intent? And if the AI were to misbehave, wouldn't the designer just modify it or shut it down? Furthermore, even if we accept that the AI will not always behave as intended, why should this be considered a major source of risk, let alone a catastrophic risk? Why are some people saying that these risks should be a global priority on par with pandemics and nuclear war, while others are saying that these concerns are overhyped?

In this course, we will discuss:

  • Why might a superintelligence become misaligned with its designer's intent?
  • Might misalignment pose a catastrophic risk?
  • What are proposed solutions to the alignment problem?

Guest lectures will be delivered by alignment researchers.

Course information

There will be a Google Doc (link) that contains additional details about the course. This document is the main hub for course information and will be updated throughout the course. Anyone with a Stanford email has access. If you are curious about the course but not at Stanford, please reach out and we will invite you.

Prerequisites

The course will place special emphasis on formalizing ideas. About ⅔ of the course will be theoretical and ⅓ empirical. To have the background to participate, we recommend that each student have taken:

  • one graduate-level machine learning course
  • one course that models decision making (e.g., AI, RL, decision analysis, economics)

What this course is not about

This course will focus on the alignment of future superintelligence, rather than the alignment of current systems. There are many challenges that the course will not address. These include:

  • Use of AI by ill-intentioned humans. Such situations represent misalignment between humans, rather than between a human and an AI.
  • Aggregation of conflicting preferences across humans.
  • Minimization of bias in AI products.
  • How to organize society in a post-superintelligence world (governance, redistribution, retooling).
  • How to deal with misinformation and track provenance.
  • The moral status of future superintelligence.

Logistics

3-4:20pm Mondays and Wednesdays in 370-370. This is in building 370 by the Main Quad.

Course Assistants

Semyon Lomasov, slomasov AT stanford DOT edu.
Benlin Gan, bgan2 AT stanford DOT edu.