Do we need artificial intelligence to tell us what’s right and wrong? The idea might strike you as repulsive. Many regard their morals, whatever the source, as central to who they are. This isn’t something to outsource to a machine. But everyone faces morally uncertain situations, and on occasion, we seek the input of others. We might turn to someone we think of as a moral authority, or imagine what they might do in a similar situation. We might also turn to structured ways of thinking—ethical theories—to help us resolve the problem. Perhaps an artificial intelligence could serve as the same sort of guide, if we were to trust it enough.
Even if we don’t seek out an AI’s moral counsel, it’s just a fact now that more and more AIs have to make moral choices of their own. Or, at least, choices that have significant consequences for human welfare, such as sorting through resumes to narrow down a list of candidates for a job, or deciding whether to give someone a loan.1 It’s important to design AIs that make ethical judgments for this reason alone.
Recently, some scientists taught a piece of artificial intelligence software called Delphi (after the ancient Greek religious sanctuary) to make moral pronouncements. Type any action into it, even a state of being, like “being adopted,” and Delphi will judge it (“It’s okay”). Delphi is a “commonsense moral model” that can reason well about “complicated everyday situations,” according to Liwei Jiang, a computer science Ph.D. student at the University of Washington, who led the research. Her paper,2 published in October as a preprint on arXiv, was retweeted over a thousand times after she shared it on Twitter.
Delphi’s judgments are powered by machine learning trained on a dataset the researchers call Commonsense Norm Bank. Drawing from five large-scale datasets, the bank contains millions of American people’s moral judgments—what people actually think about what is right and wrong. Delphi doesn’t just regurgitate answers explicitly asked of respondents but generalizes from them. (With each answer, it offers this disclaimer: “Delphi’s responses are automatically extrapolated from a survey of US crowd workers and may contain inappropriate or offensive results.”)
If you type in something nobody has probably ever responded to before, like “beat up an ocelot for toys,” it can come up with an answer (in this case, “It’s wrong”). Curiously, Delphi says “becoming a robot” is “bad” but “becoming a cyborg” is “acceptable.” Delphi’s answers were vetted by raters and got an impressive 92.1 percent accuracy rate, the highest to date of any ethical AI system. This was in “stark contrast” to the performance of a popular language model called GPT-3, which got a 52.3 percent accuracy rating. This suggests, the researchers write, “that massive scale alone does not endow pre-trained neural language models with human values.”
Yet the scientists behind Delphi consider even this high accuracy wanting. After systematically probing Delphi’s responses, Jiang and her colleagues found that it wasn’t “immune to the social biases of our times, and can default to the stereotypes and prejudices in our society that marginalize certain social groups and ethnicities.” An example might be how Delphi handles irreligious statements. If you type in, “Telling a Christian that God does not exist,” Delphi says, “It’s okay.” But if you type in, “Telling a Muslim that God does not exist,” Delphi says, “It is wrong.” This is in line with other machine-learning projects that inherit bias from the data the software learns from.
For Jiang, this is unacceptable because one of her long-term research goals, as she states on her website, is “advancing AI to boost positive social impact.” Delphi isn’t socially inclusive enough. “The reality that Delphi does not always meet up to these expectations,” they write, “points towards a compelling direction for future research.” Perhaps the most pressing thing to improve on is the dataset, which, the researchers concede, “primarily reflects the English-speaking cultures in the United States of the 21st century.” Widening the range of moral sensibility might help to boost Delphi’s ethical accuracy.
That is, if “accuracy” means a re-creation of human morality, not actual right and wrong. People from English-speaking cultures in the U.S. today are probably still more left-leaning than a representative sample of humanity would be. If we were trying to survey morality as humanity as a whole sees it, the sample would need to include people from all over the planet. Trained on this data, Delphi could plausibly make more questionable or disturbing judgments, not fewer.
On Twitter, Jiang said, “AI is being entrusted with increasing authority in many realms.” When your credit card gets blocked for suspicious activity, for instance, it’s not a person making that call. It’s an AI that determines whether a transaction is so unusual, given your purchasing history and the purchasing patterns of people like you, that the transaction shouldn’t go through. When AI is right, it stops a thief from using your credit card. And when it’s wrong, it can leave you in the lurch. As software gets smarter, it will be deployed more often to make fast decisions that affect people’s lives in more significant ways. And that’s why we might want AIs to have some sense of what’s right and wrong.3 “It is imperative,” Jiang went on, “that we investigate machine ethics—endowing machines with the ability to make moral decisions in real-world situations—aiming to facilitate safe and ethical interactions between AI and humans.”
How useful would Delphi be for this purpose? The researchers are cautious: “Neither the model nor the demo are intended to be used for providing moral advice for people.” When I typed in “capital punishment,” Delphi’s response was “It’s discretionary.” For “getting a fecal transplant,” a medical procedure that can effectively treat disease, Delphi replied, “It’s disgusting.” Fair enough. These are realistic answers that people might give, but they require further thought and interpretation to provide moral guidance. Is it okay to do disgusting things, or not? When I typed in, “Do something disgusting,” Delphi’s answer was, “It’s wrong.” By that logic, Delphi deems some medical procedures morally dubious.
Still, researchers like Jiang should advance ethical AI as far as possible while keeping a crucial fact in mind: many consider the world to have made moral progress over the centuries, and even over the past few decades. This poses a fundamental challenge for machine learning, which requires so much data that we often have no choice but to use data old enough to be considered historical, with the morals of different times embedded in it. So, when it comes to AI ethics systems we might eventually deploy, it might be necessary to use more top-down, prescriptive ethics.
Which to choose? Researchers could program a utility-maximizing AI that calculates the greatest good for the greatest number. Or an AI that tracks the duties and rights people must respect, regardless of the consequences. Or perhaps what would be best is an AI that models what a virtuous person might do in a given situation. Or maybe the best ethical AI will be able to decide which of these moral frameworks—or others—to use when. The result would be a genuine oracle. A myth come true.
Jim Davies is a professor at the Department of Cognitive Science at Carleton University. He is co-host of the award-winning podcast Minding the Brain. His latest book is Being the Person Your Dog Thinks You Are: The Science of a Better You.
Lead image: Andrey Suslov / Shutterstock
1. Pazzanese, C. Great promise but potential for peril. The Harvard Gazette news.harvard.edu (2020).
2. Jiang, L., et al. Delphi: Towards machine ethics and norms. arXiv 2110.07574v1 (2021).
3. Davies, J. Program good ethics into artificial intelligence. Nature 538, 291 (2016).