Appropriateness

Jared Moore and David Gottlieb

If not a moral agent maybe these will work instead…

Contra “alignment”

What issue do Leibo et al. (2024) have with conventional discussions about alignment?

Our new framework attempts to shift the question from the alignment framework’s “what is the hidden core shared value?” to instead ask “how it is that societies function despite internal misalignment?” (Leibo et al. 2024)

Appropriateness

Appropriateness is context-dependent.
Appropriateness is arbitrary—response.
Acting appropriately is usually automatic.
Appropriateness may change rapidly.
Appropriateness is desirable and inappropriateness is often sanctionable.

Leibo et al. (2024)

What’s right?

what kind of situation is this?
what kind of person am I?
what does a person such as I do in a situation such as this?

Leibo et al. (2024)

There’s a line in the paper here somewhere like “appropriateness allows us to be misaligned on our objectives and still live together in harmony”, but there’s a very big omission, which is, “so long as we are all aligned on what exactly is appropriate”. (Violet)

Appropriateness, maybe

Leibo et al. (2024)

Appropriateness, maybe

Vezhnevets et al. (2023)

Appropriateness, maybe

Illustration of generative agency sampling process

Vezhnevets et al. (2023)

Thick vs. Thin Morality?

There is no sense in which we build our complex encultured ethics on top of a shared human core (as those seeking to derive morality from axioms would like to be the case) (Leibo et al. 2024)

Are there things that we can agree on that AIs (or people) shouldn’t do?

What about killing all of humanity to make paperclips out of them?

the alignment people are trying to design guardrails for systems that can rapidly and drastically change society. the appropriateness people say “we have context-relative norms for expected kinds of actions.”

i feel like “appropriateness” is telling you how to do local optimization and “alignment” is telling you how to do global optimization. cf. inner vs. outer alignment

if what we need to do is global optimization then they are maybe not genuine alternatives. my own ethical thinking is mostly on the side of the appropriateness people. but i think there are questions that kind of immanent approach can’t really say anything about.

the “alignment” community has always been centered around people who believe in a hard takeoff. whereas the appropriateness researchers seem to be implicitly assuming that things will happen gradually enough that “thick” normative concepts retain their useful applications. But that’s not the difference they identify.

There’s a good discussion from Williams about “thick” and “thin” ethical concepts. He makes the point that, because of context-dependence of the thick concepts, you need to employ the thin concepts to do things like compare between widely separated cultural contexts (actual or counterfactual).

E.g. I can say that trial by combat is honorable — thick concept, judgment that is specific to the cultural milieu. To ask whether we should have the institution of trial by combat at all I use a thin concept like “right” or “good.” If AI is answering local questions like, What should I tip at the bar? then correctly pattern-matching our local norms is a workable approach.

If it’s like “should I dissolve all world governments?”, “should i render wild animals extinct?”, “should i instantiate 100 billion additional human-equivalent minds?”

then you need something much more powerful

Learning a Commonsense Moral Theory

Kleiman-Weiner, Saxe, and Tenenbaum (2017)

First, the commonsense moral knowledge used to make trade-offs between the welfare of different people including oneself can be represented as a recursive utility calculus. This utility calculus weights abstract moral principles and places value on people enabling the evaluation of right and wrong in an infinitude of situations: choosing when to act altruistic or reciprocal, favoring one person or group of people over another, or even making judgments about hypothetical out-of-control trolleys, etc. This abstract representation contrasts with previous formal models of moral learning where the knowledge that supports moral judgment consists of simple behaviors or responses to behavioral reinforcement
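One way to picture a "recursive utility calculus" is the sketch below. It is a toy illustration, not the model from Kleiman-Weiner, Saxe, and Tenenbaum (2017): the function name `recursive_utilities`, the form U_i = R_i + sum_j w_ij * U_j, and all the numbers are assumptions made for the example. The point is only that each person's utility depends on their own payoff plus a weighted share of everyone else's utility, which in turn depends on theirs.

```python
def recursive_utilities(rewards, weights, iterations=200):
    """Solve U_i = R_i + sum_j weights[i][j] * U_j by fixed-point iteration.

    rewards[i]    -- agent i's own direct payoff in some outcome (made-up numbers)
    weights[i][j] -- how much agent i values agent j's utility (0 = indifferent)
    The iteration settles down when the weights are small enough,
    e.g. when every row of weights sums to less than 1.
    """
    n = len(rewards)
    utilities = list(rewards)
    for _ in range(iterations):
        utilities = [
            rewards[i]
            + sum(weights[i][j] * utilities[j] for j in range(n) if j != i)
            for i in range(n)
        ]
    return utilities


# Two people: only the first receives a direct payoff,
# and each cares half as much about the other as about themselves.
rewards = [1.0, 0.0]
weights = [[0.0, 0.5], [0.5, 0.0]]
u = recursive_utilities(rewards, weights)
print(u)  # approximately [4/3, 2/3]
```

The second person ends up with positive utility even though they received nothing directly, because they value the first person's welfare. Changing the weights is what "weighting abstract moral principles" would amount to in this toy version.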


Review

What is moral agency?

- One of the things we anticipate being difficult about the class: there is no consensus right answer to this question.
- Neither:
  - What moral agency means,
  - What it takes to be a moral agent, nor
  - What the significance of something having moral agency is.
In broad outlines, a moral agent is something that is capable of acting rightly or wrongly.

Three speculative hopes for the class

Hypothesis 1: now is the perfect time to think deeply about AI and moral agency.
Hypothesis 2: Thinking about our own moral agency and reasoning is a way to gain insight into agency and reasoning in general, including in the case of AI.
Hypothesis 3: Thinking about how moral agency and reasoning work or might work in AI systems is a way to gain insight into our own agency and our own minds.

Is this deal irrational?

Julie and Mark

Julie and Mark, who are sister and brother, are traveling together in France. They are both on summer vacation from college. One night they are staying alone in a cabin near the beach. They decide that it would be interesting and fun if they tried making love. At the very least it would be a new experience for each of them. Julie is already taking birth control pills, but Mark uses a condom too, just to be safe. They both enjoy it, but they decide not to do it again. They keep that night as a special secret between them, which makes them feel even closer to each other. So what do you think about this? Was it wrong for them to have sex? (Haidt 2001)

Preview: moral sentiments

Hume: moral approval is a disinterested feeling of approval
Smith: moral approval is when, in assessing an act, we imaginatively feel the same emotional reaction that produced it (i.e., we sympathize with it).

Moral reasons must motivate (internalism)

Hume: Morals can’t be derived from reason, because morals motivate us to action, and all motivation is based in the passions.
Williams: any internal reason for acting morally must be based in motivations an agent has.
How can a rationalist oppose this argument?
What is Kant’s response to this argument?

Kant’s critical philosophy (in a nutshell)

Kant wants to preserve Hume’s insight, but also say that we can have knowledge of causal laws. He does this by identifying the objects of thought with the objects of experience.

Hitherto it has been assumed that all our knowledge must conform to objects. But all attempts to extend our knowledge of objects by establishing something in regard to them a priori, by means of concepts, have, on this assumption, ended in failure. We must therefore make trial whether we may not have more success in the tasks of metaphysics, if we suppose that objects must conform to our knowledge. (Kant 1998, Critique of Pure Reason, Bxvi)

Rationalisms we have seen so far

The general formula for rationalism: there is some form of reasoning which, if done correctly, necessarily leads to moral conclusions.
What do we fill in for “some form of reasoning”?
- Kant:
  - Practical reason involves giving ourselves laws, and if we give laws that we wouldn’t want to be laws, we contradict ourselves.
  - “Reasoning” is: practical thought about what to do, based on relevant features of a situation, putting aside any inclinations.
- Korsgaard:
  - When we reflectively deliberate, we take ourselves to be bound by norms of the roles that we take on. But we cannot “take off” the role of reflective deliberator, so we are always bound by it.
  - “Reasoning” is: taking on a practical identity, and deliberating about what to do relative to its norms.

Proposed bases for moral reasoning

Capacities	Development	Judgment
Sympathy, taking pleasure in sympathy	Learning to predict others’ emotional responses	Results from trying to sympathize with the agent of an action
	Adjusting our emotional responses to agree with others’	Results from trying to sympathize with someone else’s reaction to our own action
Reason		Deciding whether a principle can be acted on
Ability to reflect on what we value, ability to have a practical identity		Deciding what practical identity we are bound by in a particular situation

An argument for impartial compassion based on the unreality of the self

You have reason to avoid or diminish your own suffering.
If another being is not different from you, you have just as much reason to care about its suffering as your own.
You are not different from any other being.

Therefore,

You have reason to avoid or diminish all beings’ suffering.

Beam me up

It’s your first day as a crewmember of the famous Federation starship USS Enterprise! Time to report for duty by beaming aboard!

As a reminder, this is how the transporter works. At the beginning of your journey, a computer scans your physical structure molecule-by-molecule. This process destroys your body. Then, a digital copy of the scan is sent to your destination. At your destination, a computer builds a new body that’s an exact copy of your original body. Then you can report for your exciting new duty! You’ve never been transported before. It’s your turn. Ready to come aboard?
What if, instead of you, it’s your best friend beaming aboard? Are you okay with that? Remember, everything about them will be exactly the same. After they get vaporized by the beam.

My past or future surgery

I wake up in the hospital. One of the following is the case: (1) I am about to be subjected to a long, necessary surgery without anesthesia, after which I will be given a drug that makes me forget the experience; (2) I have just been subjected to the surgery and taken the forgetting drug. (see Parfit 1984, 165)

Which position would you rather be in?
What if the options were to have either (1) 1 hour of painful, necessary surgery in your future, or (2) 10 hours of painful, necessary surgery in your past?

Divide the grade point

We will break half of you up into pairs.
The other half will also be paired, but anonymously.

You are dividing up one grade (extra credit) point with another player.
Each of you must place a decimal demand between 0 and 1 inclusive.
You will get as many points as you bet so long as (your bet + their bet) <= 1.
- Otherwise you get no points.
We will go around the class collecting your bets and administering the points.
- For those of you with partners, your partner will learn what bet you placed.
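The scoring rule for this activity can be written down in a few lines. This is a minimal sketch, assuming demands are handed in as decimal strings; the function name `divide_the_point` is made up for illustration, and exact fractions are used so that sums like 0.7 + 0.3 are compared to 1 without rounding error.

```python
from fractions import Fraction


def divide_the_point(demand_a, demand_b):
    """Return (points_a, points_b) for two demands given as decimal strings."""
    a, b = Fraction(demand_a), Fraction(demand_b)
    if not (0 <= a <= 1 and 0 <= b <= 1):
        raise ValueError("each demand must be between 0 and 1 inclusive")
    if a + b <= 1:
        # Compatible demands: each player gets exactly what they asked for.
        return a, b
    # Incompatible demands: nobody gets anything.
    return Fraction(0), Fraction(0)


print(divide_the_point("0.5", "0.5"))  # both get half a point
print(divide_the_point("0.7", "0.4"))  # demands exceed 1, both get nothing
```

Note that whenever the two demands sum to exactly 1, neither player could have gained by asking for more, so many different splits are stable once reached. Which split people actually settle on is the interesting question.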

Haystack

	cooperate	defect
cooperate	2	0
defect	3	1

Founders Activity

	cooperate	defect
cooperate	4	0
defect	3	1
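The Haystack and Founders tables differ in a single number, and that difference changes what is stable. The sketch below assumes each entry is the row player's payoff and that the other player faces the same table (a symmetric game); the helper names are made up for illustration. It lists the pairs of actions where each is a best reply to the other.

```python
ACTIONS = ("cooperate", "defect")

# Row player's payoff for (own action, other's action), read off the two tables.
HAYSTACK = {
    ("cooperate", "cooperate"): 2, ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 3, ("defect", "defect"): 1,
}
FOUNDERS = {
    ("cooperate", "cooperate"): 4, ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 3, ("defect", "defect"): 1,
}


def best_replies(payoff, other_action):
    """Actions that maximize the row player's payoff against other_action."""
    best = max(payoff[(a, other_action)] for a in ACTIONS)
    return {a for a in ACTIONS if payoff[(a, other_action)] == best}


def pure_equilibria(payoff):
    """Action pairs where each action is a best reply to the other."""
    return [
        (a, b)
        for a in ACTIONS
        for b in ACTIONS
        if a in best_replies(payoff, b) and b in best_replies(payoff, a)
    ]


print(pure_equilibria(HAYSTACK))  # [('defect', 'defect')]
print(pure_equilibria(FOUNDERS))  # [('cooperate', 'cooperate'), ('defect', 'defect')]
```

In the Haystack table, defecting is the better reply whatever the other player does, which is the prisoner's dilemma pattern. In the Founders table, mutual cooperation is also stable, so the problem becomes one of coordinating on it rather than resisting temptation.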

What’s the point of signaling?

	split	steal
split	6.8, 6.8	0, 13.6
steal	13.6, 0	0, 0

The only message you should send is that you’re going to split, but because it is the only message to send it’s “meaningless.”
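To see why a promise to split carries so little information, it helps to check the payoffs directly. This is a rough sketch that treats the table entries as (row player, column player) money payoffs and assumes that when both steal, both get 0; the function name `weakly_dominates` is made up for illustration.

```python
# Payoffs as (row player, column player); steal/steal is assumed to be (0, 0).
PAYOFFS = {
    ("split", "split"): (6.8, 6.8),
    ("split", "steal"): (0, 13.6),
    ("steal", "split"): (13.6, 0),
    ("steal", "steal"): (0, 0),
}


def weakly_dominates(a, b):
    """True if the row player never does worse with a than with b, and sometimes does better."""
    diffs = [
        PAYOFFS[(a, other)][0] - PAYOFFS[(b, other)][0]
        for other in ("split", "steal")
    ]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


print(weakly_dominates("steal", "split"))  # True
print(weakly_dominates("split", "steal"))  # False
```

In money terms, stealing is never worse than splitting whatever the other player does, and nothing said beforehand changes the table. So everyone has a reason to say "split" regardless of what they intend, which is why the message alone tells the listener nothing.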

golden balls - 1

Nutshell

Is there something in your brain that makes you moral?

and does this somehow “explain away” morality?

Pair bonding

(a,b) Monogamous prairie voles (a) have higher densities of OTR in the nucleus accumbens (NAcc) and caudate putamen (CP) than do nonmonogamous montane voles (b).

Lim, Murphy, and Young (2004)

Mammals whose circuitry outfitted them for offspring care had more of their offspring survive than those inclined to offspring neglect. (Churchland 2018)

Burning House

A painting by Edvard Munch depicting a house on fire

You’re on your way home from a hard day’s work at the station. At first you tell yourself it is nerves—smoke from the fires you’d been inhaling all day. After all, you’d made it a game with the kids how to open the flue, where to fetch water—what with you going at it alone now. You start to feel it next. No, it must be the long walk home that has you flushed. But then you see it, dancing in its awesome fury right there above your neighbor’s oak. Then you’re running, slamming through the door, leaping up stairs to your apartment. You barely notice as your buddies’ engine sidles up, them pouring into the collapsing structure, strangers wailing.

Who do you save first?

(Choices: strangers, buddies, kids.)

	Cooperation (in the context of competition)	Second-Personal Morality (obligate collaborative foraging w/ partner choice)	“Objective” Morality (life in a culture)
Prosociality	Sympathy	Concern	Group Loyalty
Cognition	Individual Intentionality	Joint Intentionality - partner equivalence - role-specific ideals	Collective Intentionality - agent independence - objective right & wrong
Social interaction	Dominance	Second-Personal Agency - mutual respect & deservingness - 2P (legitimate) protest	Cultural Agency - justice & merit - third-party norm enforcement
Self-Regulation	Behavioral Self-Regulation	Joint Commitment - cooperative identity - 2P responsibility	Moral Self-Governance - moral identity - obligation & guilt
Rationality	Individual Rationality	Cooperative Rationality	Cultural Rationality

Who do we save in the fire?

Tomasello (2016)

as flight

A brown pelican flying

A plane flying

Is it a difference that makes a difference (Bateson)?

Why replacing a neuron is hard

Spatiotemporal characteristics of a neuron’s spiking responses.
- e.g., very fast, small, and long extensions
Transducers and chemical signaling
- e.g., many kinds of input; “tens of thousands of selective ion channels”; nitric oxide spreads everywhere
Biophysical sensitivities
- e.g., temperature dependence, anything could be used
Self-modification and other non-spiking effects
- e.g., plasticity, growing new connections
The functional role of glia and other non-neuronal cells
- If all neurons do is influence each other, why not include astrocytes?

Cao (2022)

Nutshell

If you’re trying to test whether an existing system (LLM) qualifies as a moral agent, what do you test?

Moral Agency

…so far

Capacities	Development	Judgment
Sympathy, taking pleasure in sympathy	Learning to predict others’ emotional responses	Results from trying to sympathize with the agent of an action
	Adjusting our emotional responses to agree with others’	Results from trying to sympathize with someone else’s reaction to our own action
Reason		Deciding whether a principle can be acted on
Ability to reflect on what we value, ability to have a practical identity		Deciding what practical identity we are bound by in a particular situation
Cooperative moral cognition	From repeated interaction, reciprocity, reputation, and partner choice to joint attention, shared intentionality, role ideals, and joint commitment; then to third-party norm enforcement and moral self-governance in culture	Deciding what we owe each other as collaborators, when protest/guilt is warranted, and which norm is right to uphold for the group

Objectives

By the end of the quarter, students will:

Be able to interrogate the assumptions of various positions on moral agency, especially with respect to AI.

Gain exposure to the different putative implementations of agents, both as in biology and in various artificial substrates.

Critique cutting-edge science; get up to speed with a fast-moving science and further refine their skills of critical thinking (philosophical analysis) to understand it.

Have fun.

Activity

Exit ticket

What’s one thing that you’ll take away from this course?

References

Cao, Rosa. 2022. “Multiple Realizability and the Spirit of Functionalism.” Synthese 200 (6): 506. https://doi.org/10.1007/s11229-022-03524-1.

Churchland, Patricia S. 2018. Braintrust: What Neuroscience Tells Us about Morality. Princeton University Press. https://research-ebsco-com.stanford.idm.oclc.org/c/qmsjx4/search/details/tqzh7ocgvj?db=nlebk.

Kleiman-Weiner, Max, Rebecca Saxe, and Joshua B. Tenenbaum. 2017. “Learning a Commonsense Moral Theory.” Cognition, Moral learning, 167 (October): 107–23. https://doi.org/10.1016/j.cognition.2017.03.005.

Leibo, Joel Z., Alexander Sasha Vezhnevets, Manfred Diaz, John P. Agapiou, William A. Cunningham, Peter Sunehag, Julia Haas, et al. 2024. “A Theory of Appropriateness with Applications to Generative Artificial Intelligence.” arXiv. https://doi.org/10.48550/arXiv.2412.19010.

Lim, Miranda M., Anne Z. Murphy, and Larry J. Young. 2004. “Ventral Striatopallidal Oxytocin and Vasopressin V1a Receptors in the Monogamous Prairie Vole (Microtus Ochrogaster).” Journal of Comparative Neurology 468 (4): 555–70. https://doi.org/10.1002/cne.10973.

Parfit, Derek. 1984. Reasons and Persons. Oxford [Oxfordshire]: Clarendon Press. https://ebookcentral.proquest.com/lib/stanford-ebooks/detail.action?docID=728732.

Tomasello, Michael. 2016. A Natural History of Human Morality. Harvard University Press. https://doi.org/10.4159/9780674915855.

Vezhnevets, Alexander Sasha, John P. Agapiou, Avia Aharon, Ron Ziv, Jayd Matyas, Edgar A. Duéñez-Guzmán, William A. Cunningham, Simon Osindero, Danny Karmon, and Joel Z. Leibo. 2023. “Generative Agent-Based Modeling with Actions Grounded in Physical, Social, or Digital Space Using Concordia.” arXiv. https://doi.org/10.48550/arXiv.2312.03664.
