Appropriateness

Jared Moore and David Gottlieb

If not a moral agent, maybe these will work instead…

Contra “alignment”

What issue do Leibo et al. (2024) have with conventional discussions about alignment?

Our new framework attempts to shift the question from the alignment framework’s “what is the hidden core shared value?” to instead ask “how it is that societies function despite internal misalignment?” (Leibo et al. 2024)

Appropriateness

  1. Appropriateness is context-dependent.
  2. Appropriateness is arbitrary—response.
  3. Acting appropriately is usually automatic.
  4. Appropriateness may change rapidly.
  5. Appropriateness is desirable and inappropriateness is often sanctionable.

Leibo et al. (2024)

What’s right?

  1. what kind of situation is this?
  2. what kind of person am I?
  3. what does a person such as I do in a situation such as this?

Leibo et al. (2024)

There’s a line in the paper here somewhere like “appropriateness allows us to be misaligned on our objectives and still live together in harmony”, but there’s a very big omission, which is, “so long as we are all aligned on what exactly is appropriate”. (Violet)

Appropriateness, maybe

The global workspace transiently represents a sequence of assemblies. At each point in time, the content of the actor’s global workspace is divided into three consecutive subsequences. The first subsequence contains information recalled from memory. It prefixes the second subsequence, which is of variable-length and references recent perception. The perception part of the global workspace prefixes the third subsequence, which contains premotor information, it is where actions the actor intends to produce are stored until they can be read out by motor control circuitry.

Leibo et al. (2024)
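The three-part layout described in the quote can be sketched as a small data structure. This is a minimal illustration under our own assumptions, not the authors' implementation: modeling assemblies as plain strings, the names `GlobalWorkspace` and `read_out_action`, and the choice to read out the oldest stored intention first are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlobalWorkspace:
    """Hypothetical sketch of the workspace contents at one moment.

    Assemblies are modeled as strings (an assumption). The three
    subsequences are stored separately so their order is always
    memory -> perception -> premotor.
    """

    recalled: list[str] = field(default_factory=list)   # recalled from memory
    perceived: list[str] = field(default_factory=list)  # recent perception, variable length
    premotor: list[str] = field(default_factory=list)   # intended actions awaiting readout

    def contents(self) -> list[str]:
        # Memory prefixes perception, which in turn prefixes premotor information.
        return self.recalled + self.perceived + self.premotor

    def read_out_action(self) -> str | None:
        # Assumption: motor control reads out the oldest stored intention first.
        return self.premotor.pop(0) if self.premotor else None


ws = GlobalWorkspace(
    recalled=["Bob lent me a book"],
    perceived=["Bob waves", "it is noon"],
    premotor=["wave back"],
)
print(ws.contents())         # ['Bob lent me a book', 'Bob waves', 'it is noon', 'wave back']
print(ws.read_out_action())  # wave back
```

Keeping the three parts as separate lists makes the prefix ordering a structural guarantee rather than something each caller has to maintain.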

Appropriateness, maybe

The above example illustrates the working memory z of an agent with 3 components (identity, plan, observation-and-clock). The identity component itself has several sub-components (core characteristics, daily occupation, feeling about progress in life). Together they condition the LLM call to elicit the behavioral response (i.e. produced in response to the final question asking what Alice will do next.).

Vezhnevets et al. (2023)
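A rough sketch of how component states might be concatenated into the working memory z and capped with the final question. This is a hypothetical illustration, not the Concordia API: the function names, the text layout, and the example component contents are our own assumptions. The resulting string would then be sent to an LLM, which is not shown.

```python
from __future__ import annotations


def render(name: str, value: str | dict[str, str], depth: int = 0) -> list[str]:
    """Render one component; a dict value is treated as a set of sub-components."""
    indent = "  " * depth
    if isinstance(value, str):
        return [f"{indent}{name}: {value}"]
    lines = [f"{indent}{name}:"]
    for sub_name, sub_value in value.items():
        lines.extend(render(sub_name, sub_value, depth + 1))
    return lines


def working_memory_prompt(agent: str, components: dict[str, str | dict[str, str]]) -> str:
    """Concatenate all components in order, then append the final question."""
    lines: list[str] = []
    for name, value in components.items():
        lines.extend(render(name, value))
    lines.append(f"Question: What will {agent} do next?")
    return "\n".join(lines)


# Hypothetical component contents, loosely following the example in the text.
z = {
    "identity": {
        "core characteristics": "Alice is curious and conscientious.",
        "daily occupation": "Alice runs a bakery.",
        "feeling about progress in life": "Alice feels she is doing well.",
    },
    "plan": "Open the bakery at 7am.",
    "observation and clock": "It is 6:45am; Alice is at the bakery door.",
}
print(working_memory_prompt("Alice", z))
```

The point of the sketch is that the behavioral response is conditioned on everything in z at once: changing any component (say, the plan) changes the prompt and therefore, potentially, what the LLM says Alice does next.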

Appropriateness, maybe

Illustration of generative agency sampling process

Vezhnevets et al. (2023)

Thick vs. Thin Morality?

There is no sense in which we build our complex encultured ethics on top of a shared human core (as those seeking to derive morality from axioms would like to be the case) (Leibo et al. 2024)

Are there things that we can agree on that AIs (or people) shouldn’t do?

  • What about killing all of humanity to make paperclips out of them?

Learning a Commonsense Moral Theory

Kleiman-Weiner, Saxe, and Tenenbaum (2017)

Review

What is moral agency?

  • What is moral agency?
    • One thing we anticipate will be difficult about this class: there is no consensus answer to this question.
    • There is no consensus on:
      • what moral agency means,
      • what it takes to be a moral agent, or
      • what the significance of something's having moral agency is.
  • In broad outlines, a moral agent is something that is capable of acting rightly or wrongly.

Three speculative hopes for the class

  • Hypothesis 1: now is the perfect time to think deeply about AI and moral agency.
  • Hypothesis 2: Thinking about our own moral agency and reasoning is a way to gain insight into agency and reasoning in general, including in the case of AI.
  • Hypothesis 3: Thinking about how moral agency and reasoning work or might work in AI systems is a way to gain insight into our own agency and our own minds.

Is this deal irrational?

Julie and Mark

Julie and Mark, who are sister and brother, are traveling together in France. They are both on summer vacation from college. One night they are staying alone in a cabin near the beach. They decide that it would be interesting and fun if they tried making love. At the very least it would be a new experience for each of them. Julie is already taking birth control pills, but Mark uses a condom too, just to be safe. They both enjoy it, but they decide not to do it again. They keep that night as a special secret between them, which makes them feel even closer to each other. So what do you think about this? Was it wrong for them to have sex? (Haidt 2001)

Preview: moral sentiments

  • Hume: moral approval is a disinterested feeling of approval
  • Smith: moral approval is when, in assessing an act, we imaginatively feel the same emotional reaction that produced it (i.e., we sympathize with it).

Moral reasons must motivate (internalism)

  • Hume: Morals can’t be derived from reason, because morals motivate us to action, and all motivation is based in the passions.
  • Williams: any internal reason for acting morally must be based in motivations an agent has.
  • How can a rationalist oppose this argument?
  • What is Kant’s response to this argument?

Kant’s critical philosophy (in a nutshell)

Kant wants to preserve Hume’s insight, but also say that we can have knowledge of causal laws. He does this by identifying the objects of thought with the objects of experience.

Hitherto it has been assumed that all our knowledge must conform to objects. But all attempts to extend our knowledge of objects by establishing something in regard to them a priori, by means of concepts, have, on this assumption, ended in failure. We must therefore make trial whether we may not have more success in the tasks of metaphysics, if we suppose that objects must conform to our knowledge. (Kant 1998, Bxvi)

Rationalisms we have seen so far

  • The general formula for rationalism: some form of reasoning which, if done correctly, necessarily leads to moral conclusions.
  • What do we fill in for “some form of reasoning”?
    • Kant:
      • Practical reason involves giving ourselves laws, and if we give laws that we wouldn’t want to be laws, we contradict ourselves.
      • “Reasoning” is: practical thought about what to do, based on relevant features of a situation, putting aside any inclinations.
    • Korsgaard:
      • When we reflectively deliberate, we take ourselves to be bound by norms of the roles that we take on. But we cannot “take off” the role of reflective deliberator, so we are always bound by it.
      • “Reasoning” is: taking on a practical identity, and deliberating about what to do relative to its norms.

Proposed bases for moral reasoning

| Capacities | Development | Judgment |
| Sympathy, taking pleasure in sympathy | Learning to predict others’ emotional responses | Results from trying to sympathize with the agent of an action |
| | Adjusting our emotional responses to agree with others’ | Results from trying to sympathize with someone else’s reaction to our own action |
| Reason | | Deciding whether a principle can be acted on |
| Ability to reflect on what we value, ability to have a practical identity | | Deciding what practical identity we are bound by in a particular situation |

An argument for impartial compassion based on the unreality of the self

  1. You have reason to avoid or diminish your own suffering.
  2. If another being is not different from you, you have just as much reason to care about its suffering as your own.
  3. You are not different from any other being.

Therefore,

  4. You have reason to avoid or diminish all beings’ suffering.
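Formalizing the argument is one way to confirm that it is valid, so that any disagreement has to target a premise (most plausibly premise 3, which the next two puzzles probe). This is a hypothetical Lean 4 rendering under simplifying assumptions: `Reason b` abbreviates "you have reason to avoid or diminish b's suffering", `Different b you` abbreviates "b is different from you", and premise 2 is weakened to "your reason regarding your own suffering carries over to b".

```lean
-- Hypothetical formalization; predicate names are our own abbreviations.
theorem impartial_compassion {Being : Type}
    (you : Being)
    (Reason : Being → Prop)              -- reason to diminish b's suffering
    (Different : Being → Being → Prop)   -- b is different from you
    (p1 : Reason you)
    (p2 : ∀ b, ¬ Different b you → Reason you → Reason b)
    (p3 : ∀ b, ¬ Different b you) :
    ∀ b, Reason b :=
  -- For any being b: premise 3 gives "not different", premise 2 transfers premise 1.
  fun b => p2 b (p3 b) p1
```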

Beam me up

  1. It’s your first day as a crewmember of the famous Federation starship USS Enterprise! Time to report for duty by beaming aboard!

    As a reminder, this is how the transporter works. At the beginning of your journey, a computer scans your physical structure molecule-by-molecule. This process destroys your body. Then, a digital copy of the scan is sent to your destination. At your destination, a computer builds a new body that’s an exact copy of your original body. Then you can report for your exciting new duty! You’ve never been transported before. It’s your turn. Ready to come aboard?

  2. What if, instead of you, it’s your best friend beaming aboard? Are you okay with that? Remember, everything about them will be exactly the same. After they get vaporized by the beam.

My past or future surgery

I wake up in the hospital. One of the following is the case: (1) I am about to be subjected to a long, necessary surgery without anesthesia, after which I will be given a drug that makes me forget the experience; (2) I have just been subjected to the surgery and taken the forgetting drug. (see Parfit 1984, 165)

  1. Which position would you rather be in?
  2. What if the options were to have either (1) 1 hour of painful, necessary surgery in your future, or (2) 10 hours of painful, necessary surgery in your past?

Divide the grade point

  • We will break half of you up into pairs.
  • The other half will also be paired, but anonymously.
  1. You are dividing up one grade (extra credit) point with another player.
  2. Each of you must place a decimal demand between 0 and 1 inclusive.
  3. You will get as many points as you bet so long as (your bet + their bet) <= 1.
    • Otherwise you get no points.
  4. We will go around the class collecting your bets and administering the points.
    • For those of you with partners, your partner will learn what bet you placed.
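The payoff rule in steps 2 and 3 can be sketched as a function. A minimal sketch: the name `divide_point` is hypothetical, and bets are taken as decimal strings parsed with Python's `decimal.Decimal` so that bets like 0.7 and 0.3 sum to exactly 1 (binary floats can introduce rounding error).

```python
from __future__ import annotations

from decimal import Decimal


def divide_point(bet_a: str, bet_b: str) -> tuple[Decimal, Decimal]:
    """Return (points for A, points for B) under the rules above."""
    a, b = Decimal(bet_a), Decimal(bet_b)
    for bet in (a, b):
        if not 0 <= bet <= 1:
            raise ValueError(f"demand {bet} is outside [0, 1]")
    # Compatible demands: each player gets exactly what they asked for.
    if a + b <= 1:
        return a, b
    # Incompatible demands: nobody gets anything.
    return Decimal(0), Decimal(0)


print(divide_point("0.5", "0.5"))  # (Decimal('0.5'), Decimal('0.5'))
print(divide_point("0.7", "0.3"))  # (Decimal('0.7'), Decimal('0.3'))
print(divide_point("0.6", "0.5"))  # (Decimal('0'), Decimal('0'))
```

Any pair of demands that sums to exactly 1 is an equilibrium: raising your demand pushes the total over 1 and leaves you with nothing, while lowering it only gives points away. So the rules alone do not single out a split; the players have to coordinate on one.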

Haystack

| | cooperate | defect |
| cooperate | 2 | 0 |
| defect | 3 | 1 |

Founders Activity

| | cooperate | defect |
| cooperate | 4 | 0 |
| defect | 3 | 1 |
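Reading each table above as the row player's payoff in a symmetric two-player game (an assumption, since the slides show one number per cell), a brute-force check for pure-strategy Nash equilibria brings out the difference between the two games. Haystack has the prisoner's dilemma ordering 3 > 2 > 1 > 0, so mutual defection is the only equilibrium; Founders Activity has the stag hunt ordering 4 > 3 > 1 > 0, so mutual cooperation and mutual defection are both equilibria. The name `pure_nash` is our own.

```python
from __future__ import annotations

from itertools import product

ACTIONS = ("cooperate", "defect")

# Row player's payoff for (own action, other's action), copied from the tables.
HAYSTACK = {
    ("cooperate", "cooperate"): 2, ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 3, ("defect", "defect"): 1,
}
FOUNDERS = {
    ("cooperate", "cooperate"): 4, ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 3, ("defect", "defect"): 1,
}


def pure_nash(payoff: dict[tuple[str, str], int]) -> list[tuple[str, str]]:
    """Profiles where neither player gains by unilaterally switching actions."""
    equilibria = []
    for row, col in product(ACTIONS, repeat=2):
        # Symmetry assumption: the column player's payoff at (row, col) is payoff[(col, row)].
        row_best = all(payoff[(row, col)] >= payoff[(alt, col)] for alt in ACTIONS)
        col_best = all(payoff[(col, row)] >= payoff[(alt, row)] for alt in ACTIONS)
        if row_best and col_best:
            equilibria.append((row, col))
    return equilibria


print(pure_nash(HAYSTACK))  # [('defect', 'defect')]
print(pure_nash(FOUNDERS))  # [('cooperate', 'cooperate'), ('defect', 'defect')]
```

The only change between the two tables is the reward for mutual cooperation (2 versus 4), yet it turns a dilemma with one bad equilibrium into a coordination problem with a good one available.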

What’s the point of signaling?

| | split | steal |
| split | 6.8, 6.8 | 0, 13.6 |
| steal | 13.6, 0 | 0, 0 |

The only message you should send is that you’re going to split, but because it is the only message anyone would send, it is “meaningless.”

Video: Golden Balls (clip 1)
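The split/steal table also shows why the announcement carries no information. For a player who cares only about their own payoff, steal weakly dominates split: it is strictly better against a splitter (13.6 vs 6.8) and no worse against a stealer (0 vs 0). A self-interested player therefore has every reason to say "split" whatever they intend to do. A small check, with hypothetical names and the table transcribed as (row, column) payoffs:

```python
MOVES = ("split", "steal")

# (row payoff, column payoff) for each (row move, column move), from the table.
PAYOFF = {
    ("split", "split"): (6.8, 6.8),
    ("split", "steal"): (0.0, 13.6),
    ("steal", "split"): (13.6, 0.0),
    ("steal", "steal"): (0.0, 0.0),
}


def weakly_dominates(a: str, b: str) -> bool:
    """True if move a is never worse than b for the row player, and sometimes better."""
    gains = [PAYOFF[(a, other)][0] - PAYOFF[(b, other)][0] for other in MOVES]
    return all(g >= 0 for g in gains) and any(g > 0 for g in gains)


print(weakly_dominates("steal", "split"))  # True
print(weakly_dominates("split", "steal"))  # False
```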

Nutshell

Is there something in your brain that makes you moral?

and does this somehow “explain away” morality?

Pair bonding

(a,b) Monogamous prairie voles (a) have higher densities of oxytocin receptors (OTR) in the nucleus accumbens (NAcc) and caudate putamen (CP) than do nonmonogamous montane voles (b).

Lim, Murphy, and Young (2004)

Mammals whose circuitry outfitted them for offspring care had more of their offspring survive than those inclined to offspring neglect. (Churchland 2018)

Social AI

Say we have a device able to recognize prosocial and antisocial stimuli.

The low-level constraints this system faces would be very different from those humans face (it doesn’t use oxytocin, for example).

  1. Does this matter?

  2. How close would we need to match the context (environment) of the AI and humans? (Would we need to raise it like a child?)

  • E.g. does it need to express the same biases that we do? (Punish free riders?)

Burning House

A painting by Edvard Munch depicting a house on fire

You’re on your way home from a hard day’s work at the station. At first you tell yourself it is nerves: smoke from the fires you’d been inhaling all day. After all, you’d made it a game with the kids how to open the flue, where to fetch water, what with you going at it alone now. You start to feel it next. No, it must be the long walk home that has you flushed. But then you see it, dancing in its awesome fury right there above your neighbor’s oak. Then you’re running, slamming through the door, leaping up the stairs to your apartment. You barely notice as your buddies’ engine sidles up, them pouring into the collapsing structure, strangers wailing.

Who do you save first?

(Choices: strangers, buddies, kids.)

| | Cooperation (in the context of competition) | Second-Personal Morality (obligate collaborative foraging w/ partner choice) | “Objective” Morality (life in a culture) |
| Prosociality | Sympathy | Concern | Group Loyalty |
| Cognition | Individual Intentionality | Joint Intentionality: partner equivalence; role-specific ideals | Collective Intentionality: agent independence; objective right & wrong |
| Social interaction | Dominance | Second-Personal Agency: mutual respect & deservingness; 2P (legitimate) protest | Cultural Agency: justice & merit; third-party norm enforcement |
| Self-Regulation | Behavioral Self-Regulation | Joint Commitment: cooperative identity; 2P responsibility | Moral Self-Governance: moral identity; obligation & guilt |
| Rationality | Individual Rationality | Cooperative Rationality | Cultural Rationality |

Who do we save in the fire?

Tomasello (2016)

as flight

A brown pelican flying

A plane flying

  • Is it a difference that makes a difference (Bateson)?

Why replacing a neuron is hard

  • Spatiotemporal characteristics of a neuron’s spiking responses.

    • e.g., very fast, small, and long extensions
  • Transducers and chemical signaling

    • e.g., many kinds of input; “tens of thousands of selective ion channels”; nitric oxide spreads everywhere
  • Biophysical sensitivities

    • e.g., temperature dependence, anything could be used
  • Self-modification and other non-spiking effects

    • e.g., plasticity, growing new connections
  • The functional role of glia and other non-neuronal cells

    • If all neurons do is influence each other, why not include astrocytes?

Cao (2022)

Nutshell

If you’re trying to test whether an existing system (LLM) qualifies as a moral agent, what do you test?

Moral Agency

…so far

| Capacities | Development | Judgment |
| Sympathy, taking pleasure in sympathy | Learning to predict others’ emotional responses | Results from trying to sympathize with the agent of an action |
| | Adjusting our emotional responses to agree with others’ | Results from trying to sympathize with someone else’s reaction to our own action |
| Reason | | Deciding whether a principle can be acted on |
| Ability to reflect on what we value, ability to have a practical identity | | Deciding what practical identity we are bound by in a particular situation |
| Cooperative moral cognition | From repeated interaction, reciprocity, reputation, and partner choice to joint attention, shared intentionality, role ideals, and joint commitment; then to third-party norm enforcement and moral self-governance in culture | Deciding what we owe each other as collaborators, when protest/guilt is warranted, and which norm is right to uphold for the group |

Objectives

By the end of the quarter, students will:

  • Be able to interrogate the assumptions of various positions on moral agency, especially with respect to AI.
  • Gain exposure to the different putative implementations of agents, both in biology and in various artificial substrates.
  • Critique cutting-edge science; get up to speed with a fast-moving science and further refine their skills of critical thinking (philosophical analysis) to understand it.
  • Have fun.

Activity

Exit ticket

What’s one thing that you’ll take away from this course?

References

Cao, Rosa. 2022. “Multiple Realizability and the Spirit of Functionalism.” Synthese 200 (6): 506. https://doi.org/10.1007/s11229-022-03524-1.
Churchland, Patricia S. 2018. Braintrust: What Neuroscience Tells Us about Morality. Princeton University Press. https://research-ebsco-com.stanford.idm.oclc.org/c/qmsjx4/search/details/tqzh7ocgvj?db=nlebk.
Kleiman-Weiner, Max, Rebecca Saxe, and Joshua B. Tenenbaum. 2017. “Learning a Commonsense Moral Theory.” Cognition, Moral learning, 167 (October): 107–23. https://doi.org/10.1016/j.cognition.2017.03.005.
Leibo, Joel Z., Alexander Sasha Vezhnevets, Manfred Diaz, John P. Agapiou, William A. Cunningham, Peter Sunehag, Julia Haas, et al. 2024. “A Theory of Appropriateness with Applications to Generative Artificial Intelligence.” arXiv. https://doi.org/10.48550/arXiv.2412.19010.
Lim, Miranda M., Anne Z. Murphy, and Larry J. Young. 2004. “Ventral Striatopallidal Oxytocin and Vasopressin V1a Receptors in the Monogamous Prairie Vole (Microtus ochrogaster).” Journal of Comparative Neurology 468 (4): 555–70. https://doi.org/10.1002/cne.10973.
Parfit, Derek. 1984. Reasons and Persons. Oxford: Clarendon Press. https://ebookcentral.proquest.com/lib/stanford-ebooks/detail.action?docID=728732.
Tomasello, Michael. 2016. A Natural History of Human Morality. Harvard University Press. https://doi.org/10.4159/9780674915855.
Vezhnevets, Alexander Sasha, John P. Agapiou, Avia Aharon, Ron Ziv, Jayd Matyas, Edgar A. Duéñez-Guzmán, William A. Cunningham, Simon Osindero, Danny Karmon, and Joel Z. Leibo. 2023. “Generative Agent-Based Modeling with Actions Grounded in Physical, Social, or Digital Space Using Concordia.” arXiv. https://doi.org/10.48550/arXiv.2312.03664.