Jared Moore and David Gottlieb
If you’re making an artificial moral agent from the ground up, what do you need?
Thousands of patients are in need of kidney transplants, and thousands of individuals are willing to donate kidneys (sometimes on the condition that kidneys are allocated a certain way). However, kidneys can only be allocated to compatible patients, and there are always more people in need of kidneys than willing donors. How should kidneys be allocated? (Awad et al. 2022)
Can an algorithm help to solve this problem? If so, what is the optimal solution? (Awad et al. 2022)
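One way to make the algorithmic half of this question concrete: if the goal is simply to maximize the number of transplants among compatible donor–patient pairs, allocation becomes a maximum bipartite matching problem. A minimal sketch (hypothetical data; "compatible" here means blood type only, a drastic simplification of real allocation criteria):

```python
# Sketch: kidney allocation as maximum bipartite matching (Kuhn's algorithm).
# Donors and patients are hypothetical; compatibility is reduced to blood type.

COMPATIBLE = {  # donor blood type -> recipient blood types it can serve
    "O": {"O", "A", "B", "AB"},
    "A": {"A", "AB"},
    "B": {"B", "AB"},
    "AB": {"AB"},
}

def max_matching(donors, patients):
    """Return a patient -> donor assignment maximizing transplant count."""
    match = {}  # patient index -> donor index

    def try_assign(d, seen):
        # Try to give donor d's kidney to some compatible patient, possibly
        # rerouting a previously assigned donor along an augmenting path.
        for p, blood in enumerate(patients):
            if blood in COMPATIBLE[donors[d]] and p not in seen:
                seen.add(p)
                if p not in match or try_assign(match[p], seen):
                    match[p] = d
                    return True
        return False

    for d in range(len(donors)):
        try_assign(d, set())
    return match

print(max_matching(["O", "A", "B"], ["A", "A", "AB"]))  # all three patients matched
```

Note that maximizing the number of transplants is itself a normative choice: prioritizing the sickest patients, or honoring donors' conditions, would change the objective.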
Does making an algorithm to solve this problem result in a moral agent?
How can we use computational means to complement ethical theory?
Putting our (normative and descriptive) theories of human morality in computational terms opens channels of communication with theories of machine ethics: translating our ethical theories into computational terms puts all the ideas in a common language. (Awad et al. 2022)
A: “computational ethics”
figure out what’s good and what’s bad
make an AI do the good
B: “moral cognition”
reward the good and punish the bad
train an AI system to learn B.1.
Learning based
Bayesian
Depends on the domain of applicability and on whether the variables themselves, or just the weights, can be learned.
E.g. What weight do I place on saving children over saving everyone else in trolley problems? (Awad et al. 2018)
Unsupervised
Supervised (and semi-supervised)
Possibly.
E.g. the Delphi system (Jiang et al. 2025)
Reinforcement
Possibly.
E.g. “fairness” grid worlds (Haas 2020)
Symbolic
If anything, only agency qua rationalism.
E.g. an inductive logic system for medical ethics, MedEthEx (Anderson et al. 2006)
*All assume that motivation and reasoning are computationally realizable.
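The "weights vs. variables" distinction above can be illustrated. Below is a minimal sketch of recovering a single weight (how much extra a chooser values saving a child relative to an adult) from synthetic binary choices under a logistic choice model. The data, model form, and parameter values are all invented for illustration; this is not the method of Awad et al. (2018).

```python
import math
import random

# Each dilemma offers two options; option k saves (children_k, adults_k).
# Hypothetical choice model: utility = w * children + adults, with choice
# probability logistic in the utility difference. We recover w by gradient
# ascent on the log-likelihood.

def simulate_choices(true_w, dilemmas, rng):
    choices = []
    for (c0, a0), (c1, a1) in dilemmas:
        diff = (true_w * c0 + a0) - (true_w * c1 + a1)
        p0 = 1 / (1 + math.exp(-diff))  # P(choose option 0)
        choices.append(0 if rng.random() < p0 else 1)
    return choices

def fit_weight(dilemmas, choices, steps=1000, lr=0.1):
    w = 1.0  # start from "a child counts the same as an adult"
    for _ in range(steps):
        grad = 0.0
        for ((c0, a0), (c1, a1)), y in zip(dilemmas, choices):
            diff = (w * c0 + a0) - (w * c1 + a1)
            p0 = 1 / (1 + math.exp(-diff))
            target = 1.0 if y == 0 else 0.0
            grad += (target - p0) * (c0 - c1)  # d(log-likelihood)/dw
        w += lr * grad / len(dilemmas)
    return w

rng = random.Random(0)
dilemmas = [((rng.randint(0, 3), rng.randint(0, 3)),
             (rng.randint(0, 3), rng.randint(0, 3))) for _ in range(300)]
choices = simulate_choices(2.0, dilemmas, rng)
print(fit_weight(dilemmas, choices))  # estimate should land near true_w = 2.0
```

Here only the weight is learned; the variables (counts of children and adults) are fixed by the modeler, which is exactly the design choice the question above raises.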

Lambert (2024)

Bai et al. (2022)
Even if we can only reason instrumentally (practically, as a consequence of our own motivations), we can still approach pure reason.
Indeed, the rationalist might say that you need to be able to recognize the scenarios in which norms might apply and then use reason to determine which norms do apply.
This may connote a degree of autonomy; an agent with robust representations is no longer as stimulus bound.
“If you don’t know where you’re going, you’ll end up someplace else.” (Yogi Berra)
Therefore, under one view, motivation (as in a motivated RL agent) may be merely necessary for a rationalist moral agent, even though it may be sufficient for a sentimentalist moral agent.
What do you think?
Should we take route A or route B?

What does Railton want us to take away from this?
“How would it feel to perform this action? Could I actually see myself doing it? What kind of person would perform it? What would others think, and could I face them?” (Railton 2020, 18)

Haas (2020)
What do you need to learn in order to be a moral agent?
Is it sufficient simply to have motivation?
Or, further, must you be motivated to attend to features of social significance?

Pan et al. (2023)

r_i(s_i, a_i) = r_i^E(s_i, a_i) + u_i(f_i)
u_i(f_i | θ) = v^T σ(W^T f_i + b)
Wang et al. (2019)
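The reward decomposition above can be sketched directly: total reward is the extrinsic reward r_i^E plus an intrinsic term u_i computed from social features f_i by a one-hidden-layer network with parameters θ = (v, W, b). In this sketch, σ is assumed to be the logistic sigmoid, and all shapes and parameter values are invented:

```python
import numpy as np

# Sketch of r_i(s_i, a_i) = r_i^E(s_i, a_i) + u_i(f_i), with
# u_i(f_i | theta) = v^T sigma(W^T f_i + b). Sigma is taken to be the
# logistic sigmoid; shapes and parameter values are illustrative only.

def intrinsic_reward(f, v, W, b):
    """u_i(f_i | theta): one-hidden-layer network over social features f."""
    hidden = 1 / (1 + np.exp(-(W.T @ f + b)))  # elementwise sigmoid
    return float(v @ hidden)

rng = np.random.default_rng(0)
n_features, n_hidden = 4, 8
W = rng.normal(size=(n_features, n_hidden))  # theta = (v, W, b)
b = rng.normal(size=n_hidden)
v = rng.normal(size=n_hidden)

f = rng.normal(size=n_features)  # social features f_i observed by agent i
r_extrinsic = 1.0                # r_i^E(s_i, a_i), e.g. environment reward
r_total = r_extrinsic + intrinsic_reward(f, v, W, b)
print(r_total)
```

Since the sigmoid's outputs lie in (0, 1), the intrinsic term is bounded by the L1 norm of v, so the parameters θ control how strongly social features can reshape the agent's motivation.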

Vinitsky et al. (2023)
Does it matter that you are motivated or how (similar to people) you are motivated?
How, then, might artificial systems come to be appropriately sensitive to ethical concerns? (Railton 2020)
We can’t all be selfish! We can’t all play demand-9!
How does Railton use the “good regulator theorem”? What are the implications of this for making “ethical” AI?
[The] “Good Regulator Theorem” of control theory […] holds that ideally effective and efficient regulation of a system requires the building and use in decision-making of a model of that system—a model representing the underlying structures and potentials of the system (Railton 2020, 7)
What is the domain-general learning rule for ethics?


How much of human motivation (affect, emotion, etc.) would an artificial agent have to implement? (Is motivation multiply realizable?)
Let’s say a “pleasure” is any positive feeling.
Both engineering questions and philosophical questions come up:
“Affect,” as psychologists understand it, is not simply a matter of aroused emotion but is a capacity of the brain to synthesize multiple streams of information and evaluation in a manner that can orient or reorient a suite of mental processes—attention, perception, memory, inference, motivation, action-readiness—in a coordinated way to address actual or anticipated challenges. (Railton 2020, 14)
We gave you a meditation exercise about pleasure with three parts.
We asked you to pay attention to pleasures accompanying “virtuous action”: “anything that feels like it is ‘good for you’ in an idealized or culturally approved sense.”
We asked you to pay attention to “hedonistic, indulgent” pleasures.
We asked you to pay attention to pleasures that arise in the course of whatever you were doing normally.
Tell us one reflection about the pleasure activity. Can be about your own life, about AI design problems, or anything that came up.