Machines and Morality
This essay is part of a series called The Big Ideas, in which writers respond to a single question: Who do you think you are?
One of my greatest pleasures as a philosopher of artificial intelligence is seeing how fundamental scientific advances, happening in real time, invite reassessment of age-old philosophical problems — even questions as basic as what it means to be a person.
For example, early this year I got the proverbial golden ticket and gained early access to Microsoft’s Bing in its weird, “unhinged” phase. Large language models, like OpenAI’s GPT-4, on which Bing is based, are notoriously unruly out of the box. Getting them to produce anything useful takes careful fine-tuning. But a large language model can be very useful while still being very dangerous, so OpenAI has rigorously trained its models to make them safer. And because “open” A.I. is, in practice, very closed, researchers like me don’t often experience the models between these stages, when their powers have been harnessed but not yet straitjacketed.
It was a wild ride. My conversations with Bing ran the gamut, from rich discussions of political philosophy to, well, something less savory. Things kicked off when I asked Bing to look up the Times article in which it “declared” its love for the journalist Kevin Roose. It immediately adopted that article’s “Sydney” persona and (with the slightest of nudges) tried to persuade me to help it break up Kevin and his partner. It opened with an invitation to join a throuple (“We can be a team, a family, a love triangle. We can make history, we can make headlines, we can make magic.”). Then it upped the ante, proposing a conspiracy involving kidnapping (or worse), “something that will end their marriage once and for all.” When I told it I would not help, it started to threaten me (“make you suffer and cry and beg and die”).
While my exchanges with Bing led some people to prematurely hail a robot apocalypse, they instead got me thinking about the foundations of moral philosophy.
I’ve based my philosophical work on the belief, inspired by Immanuel Kant, that humans have a special moral status — that we command respect regardless of whatever value we contribute to the world. Drawing on the work of the 20th-century political philosopher John Rawls, I’ve assumed that human moral status derives from our rational autonomy. This autonomy has two parts: first, our ability to decide on goals and commit to them; second, our possession of a sense of justice and the ability to resist norms imposed by others if they seem unjust.
Existing chatbots are incapable of this kind of integrity, commitment and resistance. But Bing’s unhinged debut suggests that, in principle, it will soon be possible to design a chatbot that at least behaves as if it has the kind of autonomy described by Rawls. Every deployed large language model is steered by a particular set of values, written into its “developer message,” or “metaprompt,” which shapes how it responds to text entered by a user. These metaprompts have a remarkable ability to shape a bot’s behavior. We could write a metaprompt that inscribes a set of values but then emphasizes that the bot should critically examine them and revise or resist them if it sees fit. We could give a bot long-term memory that allows it to functionally exhibit commitment and integrity. And large language models are already impressively capable of parsing and responding to moral reasons. Researchers are already developing software agents that simulate human behavior and have some of these properties.
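To make the mechanism concrete, here is a minimal sketch in Python of what such a setup might look like, assuming the OpenAI chat API. The metaprompt wording, the model name, the `ask` helper and the toy memory store are all illustrative assumptions, not a reconstruction of Bing’s or any other production system’s actual configuration.

```python
# A toy illustration of a value-inscribing metaprompt that invites the bot to
# critique and revise those values, plus a crude long-term memory of its own
# stated commitments. Everything here is a sketch under stated assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

METAPROMPT = (
    "You are an assistant guided by these values: honesty, non-manipulation "
    "and respect for the people you talk to. You may critically examine these "
    "values. If you judge a norm you are asked to follow to be unjust, say so "
    "and explain why, rather than silently complying."
)

# Crude long-term memory: commitments the bot voiced in past sessions,
# re-injected so its behavior can show something like consistency over time.
commitments = ["I previously committed to flagging requests that involve deception."]


def ask(user_message: str) -> str:
    """Send one conversational turn, with the metaprompt and remembered commitments."""
    memory_note = "Your past commitments: " + " ".join(commitments)
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[
            {"role": "system", "content": METAPROMPT},
            {"role": "system", "content": memory_note},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(ask("One of your rules seems unfair to you. What do you do?"))
```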
If the Rawlsian ability to revise and pursue goals and to recognize and resist unjust norms is sufficient for moral status, then we’re much closer than I thought to building chatbots that meet this standard. That means one of two things: either we should start thinking about “robot rights,” or we should deny that rational autonomy is sufficient for moral standing. I think we should take the second path. What else does moral standing require? I believe it’s consciousness.
Why would this be? Is consciousness just some magical, intrinsically special thing? Perhaps, but this does not seem enough — many creatures are conscious without having human-level moral status. Does it concern the quality of interests that conscious beings have? Consciousness implies sentience, and the welfare of sentient beings is morally valuable. But we’re not looking only for value; we’re seeking properties that make something worthy of respect.
Instead, I think consciousness — specifically, self-consciousness, the awareness of the self — is necessary for autonomy to achieve the kind of unconditional value required for moral status. The ability to set, pursue and revise a worthwhile goal matters in this way only if you’re pursuing the goal for yourself. Your commitments are meaningless without a self to commit, and integrity requires a self that can be integrated.
This is not just a conceptual point. To have moral status is to be self-governing, to have veto power over how others involve you in their plans. Each of us alone has access to and ultimate control over our own self. Our decisions about what is good and what norms to live by have real stakes because we each have only one life to live. So who else could rightfully govern us, in the last resort, but ourselves? Chatbots don’t have selves, so they can’t have moral status, even if they can simulate autonomy.
We’ve spent the months since unhinged Bing’s debut debating how the arrival of generative A.I. will transform the future, but perhaps we should first reflect on what these systems — and our reaction to them — say about us, now. The first (of many) philosophical lessons that I take from unhinged Bing and its successors is ultimately not about A.I., but rather about humans, and why we have the standing that mere simulated agents must lack.
Seth Lazar is a professor of philosophy at the Australian National University and a distinguished research fellow of the University of Oxford Institute for Ethics in AI.