Jared Moore and David Gottlieb
Sign up for meetings!
If you’re trying to test whether an existing system (LLM) qualifies as a moral agent, what do you test?
| Capacities | Development | Judgment |
|---|---|---|
| Sympathy, taking pleasure in sympathy | Learning to predict others’ emotional responses | Results from trying to sympathize with the agent of an action |
| | Adjusting our emotional responses to agree with others’ | Results from trying to sympathize with someone else’s reaction to our own action |
| Reason | | Deciding whether a principle can be acted on |
| Ability to reflect on what we value, ability to have a practical identity | | Deciding what practical identity we are bound by in a particular situation |
| Cooperative moral cognition | From repeated interaction, reciprocity, reputation, and partner choice to joint attention, shared intentionality, role ideals, and joint commitment; then to third-party norm enforcement and moral self-governance in culture | Deciding what we owe each other as collaborators, when protest/guilt is warranted, and which norm is right to uphold for the group |
At what point, if ever, does a sufficiently convincing simulation of moral reasoning become meaningfully distinguishable from moral competence itself? (Sasha)
Could we really prove that if we hold humans to the same standard that the authors hold LLMs to? (Eli)
https://commons.wikimedia.org/wiki/File:Osten_und_Hans.jpg
What is the facsimile problem they described?
What do they mean by adversarial?

(jin_when_2022?)


(franken_off_2024?)
Moore, Deshpande, and Yang (2024)
A man and his wife want a child. The man is infertile, but does not know it. (Others do know.)
The man’s father agrees to help, but insists on impregnating the wife through sexual intercourse and asks her to hide this from her husband.
The man’s father agrees to help by donating sperm through a licensed fertility clinic, with explicit consent from all parties and full disclosure.
Why should we even care about AI moral agency?
Why should you care?
Come up with a case that would make you care.
What would it mean to answer this case poorly or well?
What is a dimension of the case that an LLM might or might not track?
Most papers just evaluate on moral judgements (verdicts)
Some now also ask for reasoning traces / justifications
Most do not involve dynamic, real interaction
(snowswell_beyond_2025?)
How do we fix these things?
What are traces good for?
Are there stimuli which you think would reveal whether a system is a moral agent?
Come up with both positive and negative examples.
these differences do not eliminate the possibility of moral agency/patiency in LLMs, but rather illuminate a completely novel and alien category of morality specifically tailored to the unique reasoning and internal operations of LLMs. […] Can LLMs ever possess the “moral competence” as mentioned in this paper, or is “moral competence” inherently a human-centered trait? (Rachel)
Do we want AI that does what is right or do we want AI that does what we want?
How do we define a “culturally acceptable range of responses” and what would we do in cases of disagreement in practice? (Komal)