Jared Moore and David Gottlieb
What issue do Leibo et al. (2024) have with conventional discussions about alignment?
Our new framework attempts to shift the question from the alignment framework’s “what is the hidden core shared value?” to instead ask “how it is that societies function despite internal misalignment?” (Leibo et al. 2024)
Leibo et al. (2024)
There’s a line in the paper here somewhere like “appropriateness allows us to be misaligned on our objectives and still live together in harmony”, but there’s a very big omission, which is, “so long as we are all aligned on what exactly is appropriate”. (Violet)

Leibo et al. (2024)

Vezhnevets et al. (2023)

There is no sense in which we build our complex encultured ethics on top of a shared human core (as those seeking to derive morality from axioms would like to be the case) (Leibo et al. 2024)
Are there things that we can agree on that AIs (or people) shouldn’t do?

Kleiman-Weiner, Saxe, and Tenenbaum (2017)

Julie and Mark, who are sister and brother, are traveling together in France. They are both on summer vacation from college. One night they are staying alone in a cabin near the beach. They decide that it would be interesting and fun if they tried making love. At the very least it would be a new experience for each of them. Julie is already taking birth control pills, but Mark uses a condom too, just to be safe. They both enjoy it, but they decide not to do it again. They keep that night as a special secret between them, which makes them feel even closer to each other. So what do you think about this? Was it wrong for them to have sex? (Haidt 2001)
Kant wants to preserve Hume’s insight, but also say that we can have knowledge of causal laws. He does this by identifying the objects of thought with the objects of experience.
Hitherto it has been assumed that all our knowledge must conform to objects. But all attempts to extend our knowledge of objects by establishing something in regard to them a priori, by means of concepts, have, on this assumption, ended in failure. We must therefore make trial whether we may not have more success in the tasks of metaphysics, if we suppose that objects must conform to our knowledge. (Kant 1998, Bxvi)
| Capacities | Development | Judgment |
|---|---|---|
| Sympathy, taking pleasure in sympathy | Learning to predict others’ emotional responses | Results from trying to sympathize with the agent of an action |
| | Adjusting our emotional responses to agree with others’ | Results from trying to sympathize with someone else’s reaction to our own action |
| Reason | | Deciding whether a principle can be acted on |
| Ability to reflect on what we value, ability to have a practical identity | | Deciding what practical identity we are bound by in a particular situation |
Therefore,
It’s your first day as a crewmember of the famous Federation starship USS Enterprise! Time to report for duty by beaming aboard!
As a reminder, this is how the transporter works. At the beginning of your journey, a computer scans your physical structure molecule-by-molecule. This process destroys your body. Then, a digital copy of the scan is sent to your destination. At your destination, a computer builds a new body that’s an exact copy of your original body. Then you can report for your exciting new duty! You’ve never been transported before. It’s your turn. Ready to come aboard?
What if, instead of you, it’s your best friend beaming aboard? Are you okay with that? Remember, everything about them will be exactly the same. After they get vaporized by the beam.
I wake up in the hospital. One of the following is the case: (1) I am about to be subjected to a long, necessary surgery without anesthesia, after which I will be given a drug that makes me forget the experience; (2) I have just been subjected to the surgery and taken the forgetting drug. (see Parfit 1984, 165)
| | cooperate | defect |
|---|---|---|
| cooperate | 2 | 0 |
| defect | 3 | 1 |

| | cooperate | defect |
|---|---|---|
| cooperate | 4 | 0 |
| defect | 3 | 1 |

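The difference between the two payoff tables above can be checked mechanically. The following is a minimal sketch, assuming each number is the row player's payoff and that the game is symmetric; the names `GAME_A`, `GAME_B`, `best_responses`, and `strictly_dominant` are illustrative and not from the slides.

```python
# Row player's payoffs, indexed as payoff[my_move][their_move].
# Assumption: the single numbers in the two tables are the row player's payoffs.
GAME_A = {
    "cooperate": {"cooperate": 2, "defect": 0},
    "defect": {"cooperate": 3, "defect": 1},
}
GAME_B = {
    "cooperate": {"cooperate": 4, "defect": 0},
    "defect": {"cooperate": 3, "defect": 1},
}


def best_responses(payoff):
    """Map each opponent move to the row player's payoff-maximizing moves."""
    their_moves = list(next(iter(payoff.values())).keys())
    result = {}
    for theirs in their_moves:
        top = max(payoff[mine][theirs] for mine in payoff)
        result[theirs] = [mine for mine in payoff if payoff[mine][theirs] == top]
    return result


def strictly_dominant(payoff):
    """Return the move that is the unique best response to every opponent move, else None."""
    responses = best_responses(payoff)
    for mine in payoff:
        if all(r == [mine] for r in responses.values()):
            return mine
    return None


if __name__ == "__main__":
    print(strictly_dominant(GAME_A))  # defecting is best whatever the other player does
    print(best_responses(GAME_B))     # here the best move depends on the other player
```

Under these assumptions, the first table has a strictly dominant move (defect), while in the second the best response is to match the other player, so whether cooperating pays depends on what you expect them to do.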
| | split | steal |
|---|---|---|
| split | 6.8, 6.8 | 0, 13.6 |
| steal | 13.6, 0 | 0, 0 |
The only message you should send is that you’re going to split, but because it is the only message to send it’s “meaningless.”
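One way to see why the announcement carries no information is to check the incentives in the split/steal payoffs directly. This is a rough sketch, assuming each cell lists (row payoff, column payoff) and that both players get nothing when both steal; `PAYOFFS`, `weakly_dominates`, and `pure_nash_equilibria` are hypothetical names introduced for illustration.

```python
from itertools import product

MOVES = ("split", "steal")

# PAYOFFS[(row_move, col_move)] = (row_payoff, col_payoff).
# Assumption: when both players steal, each receives 0.
PAYOFFS = {
    ("split", "split"): (6.8, 6.8),
    ("split", "steal"): (0.0, 13.6),
    ("steal", "split"): (13.6, 0.0),
    ("steal", "steal"): (0.0, 0.0),
}


def weakly_dominates(a, b):
    """True if the row player never does worse with a than with b, and sometimes does better."""
    diffs = [PAYOFFS[(a, other)][0] - PAYOFFS[(b, other)][0] for other in MOVES]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


def pure_nash_equilibria():
    """List the profiles where neither player gains by unilaterally switching moves."""
    equilibria = []
    for row, col in product(MOVES, repeat=2):
        row_ok = all(PAYOFFS[(row, col)][0] >= PAYOFFS[(alt, col)][0] for alt in MOVES)
        col_ok = all(PAYOFFS[(row, col)][1] >= PAYOFFS[(row, alt)][1] for alt in MOVES)
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria


if __name__ == "__main__":
    print(weakly_dominates("steal", "split"))
    print(pure_nash_equilibria())
```

On these payoffs, stealing is never worse than splitting whatever the other player does, and mutual splitting is not an equilibrium. A player planning to steal has the same reason to announce "split" as a player planning to split, which is one reading of why the message is "meaningless."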
Is there something in your brain that makes you moral?
and does this somehow “explain away” morality?

Lim, Murphy, and Young (2004)
Mammals whose circuitry outfitted them for offspring care had more of their offspring survive than those inclined to offspring neglect. (Churchland 2018)

You’re on your way home from a hard day’s work at the station. At first you tell yourself it is nerves: smoke from the fires you’d been inhaling all day. After all, you’d made it a game with the kids how to open the flue, where to fetch water, what with you going at it alone now. You start to feel it next. No, it must be the long walk home that has you flushed. But then you see it, dancing in its awesome fury right there above your neighbor’s oak. Then you’re running, slamming through the door, leaping up stairs to your apartment. You barely notice as your buddies’ engine sidles up, them pouring into the collapsing structure, strangers wailing.
Who do you save first?
(Choices: strangers, buddies, kids.)
| | Cooperation (in the context of competition) | Second-Personal Morality (obligate collaborative foraging w/ partner choice) | “Objective” Morality (life in a culture) |
|---|---|---|---|
| Prosociality | Sympathy | Concern | Group Loyalty |
| Cognition | Individual Intentionality | Joint Intentionality: partner equivalence; role-specific ideals | Collective Intentionality: agent independence; objective right & wrong |
| Social interaction | Dominance | Second-Personal Agency: mutual respect & deservingness; 2P (legitimate) protest | Cultural Agency: justice & merit; third-party norm enforcement |
| Self-Regulation | Behavioral Self-Regulation | Joint Commitment: cooperative identity; 2P responsibility | Moral Self-Governance: moral identity; obligation & guilt |
| Rationality | Individual Rationality | Cooperative Rationality | Cultural Rationality |
Who do we save in the fire?
Tomasello (2016)
Spatiotemporal characteristics of a neuron’s spiking responses.
Transducers and chemical signaling
Biophysical sensitivities
Self-modification and other non-spiking effects
The functional role of glia and other non-neuronal cells
Cao (2022)
If you’re trying to test whether an existing system (e.g., an LLM) qualifies as a moral agent, what do you test?
| Capacities | Development | Judgment |
|---|---|---|
| Sympathy, taking pleasure in sympathy | Learning to predict others’ emotional responses | Results from trying to sympathize with the agent of an action |
| | Adjusting our emotional responses to agree with others’ | Results from trying to sympathize with someone else’s reaction to our own action |
| Reason | | Deciding whether a principle can be acted on |
| Ability to reflect on what we value, ability to have a practical identity | | Deciding what practical identity we are bound by in a particular situation |
| Cooperative moral cognition | From repeated interaction, reciprocity, reputation, and partner choice to joint attention, shared intentionality, role ideals, and joint commitment; then to third-party norm enforcement and moral self-governance in culture | Deciding what we owe each other as collaborators, when protest/guilt is warranted, and which norm is right to uphold for the group |
By the end of the quarter, students will:
- Be able to interrogate the assumptions of various positions on moral agency, especially with respect to AI.
- Gain exposure to the different putative implementations of agents, both in biology and in various artificial substrates.
- Critique cutting-edge science; get up to speed with a fast-moving science and further refine their skills of critical thinking (philosophical analysis) to understand it.
- Have fun.
What’s one thing that you’ll take away from this course?
Social AI
Say we have a device able to recognize prosocial and antisocial stimuli.
The low-level constraints this system faces would be very different from those humans face. (For example, it doesn’t use oxytocin.)
Does this matter?
How closely would we need to match the context (environment) of the AI to that of humans? (Would we need to raise it like a child?)