Superintelligent, Amoral, and Out of Control

In the summer of 1956, a small group of mathematicians and computer scientists gathered at Dartmouth College to embark on the grand project of designing intelligent machines. The ultimate goal, as they saw it, was to build machines rivaling human intelligence. As the decades passed and AI became an established field, it lowered its sights. There were great successes in logic, reasoning, and game-playing, but stubbornly slow progress in areas like vision and fine motor control. This led many AI researchers to abandon their earlier goals of fully general intelligence, and focus instead on solving specific problems with specialized methods.

One of the earliest approaches to machine learning was to construct artificial neural networks that resemble the structure of the human brain. In the last decade this approach has finally taken off. Technical improvements in their design and training, combined with richer datasets and more computing power, have allowed us to train much larger and deeper networks than ever before. They can translate between languages with a proficiency approaching that of a human translator. They can produce photorealistic images of humans and animals. They can speak with the voices of people whom they have listened to for mere minutes. And they can learn fine, continuous control such as how to drive a car or use a robotic arm to connect Lego pieces.

**WHAT IS HUMANITY?:** First the computers came for the best players in *Jeopardy!*, chess, and Go. Now AI researchers themselves are worried computers will soon accomplish every task better and more cheaply than human workers. (Image credit: Wikimedia)

But perhaps the most important sign of things to come is their ability to learn to play games. Steady incremental progress took chess from amateur play in 1957 all the way to superhuman level in 1997, and substantially beyond. Getting there required a vast amount of specialist human knowledge of chess strategy. In 2017, researchers at the AI company DeepMind created AlphaZero: a neural network-based system that learned to play chess from scratch. In less than the time it takes a professional to play two games, it discovered strategic knowledge that had taken humans centuries to unearth, playing beyond the level of the best humans or traditional programs. The very same algorithm also learned to play Go from scratch, and within eight hours far surpassed the abilities of any human. The world’s best Go players were shocked. As the reigning world champion, Ke Jie, put it: “After humanity spent thousands of years improving our tactics, computers tell us that humans are completely wrong … I would go as far as to say not a single human has touched the edge of the truth of Go.”

It is this generality that is the most impressive feature of cutting-edge AI, and which has rekindled the ambitions of matching and exceeding every aspect of human intelligence. While the timeless games of chess and Go best exhibit the brilliance that deep learning can attain, its breadth was revealed through the Atari video games of the 1970s. In 2015, researchers designed an algorithm that could learn to play dozens of extremely different 1970s Atari games at levels far exceeding human ability. Unlike systems for chess or Go, which start with a symbolic representation of the board, the Atari-playing systems learnt and mastered these games directly from the score and raw pixels.

This burst of progress via deep learning is fuelling great optimism and pessimism about what may soon be possible. There are serious concerns about AI entrenching social discrimination, producing mass unemployment, supporting oppressive surveillance, and violating the norms of war. My book, *The Precipice: Existential Risk and the Future of Humanity*, is concerned with risks on the largest scale. Could developments in AI pose an existential risk to humanity?

The most plausible existential risk would come from success in AI researchers’ grand ambition of creating agents with intelligence that surpasses our own. A 2016 survey of top AI researchers found that, on average, they thought there was a 50 percent chance that AI systems would be able to “accomplish every task better and more cheaply than human workers” by 2061. The expert community doesn’t think of artificial general intelligence (AGI) as an impossible dream, so much as something that is more likely than not within a century. So let’s take this as our starting point in assessing the risks, and consider what would transpire were AGI created.

Humanity is currently in control of its own fate. We can choose our future. The same is not true for chimpanzees, blackbirds, or any other of Earth’s species. Our unique position in the world is a direct result of our unique mental abilities. What would happen if sometime this century researchers created an AGI surpassing human abilities in almost every domain? In this act of creation, we would cede our status as the most intelligent entities on Earth. On its own, this might not be too much cause for concern. For there are many ways we might hope to retain control. Unfortunately, the few researchers working on such plans are finding them far more difficult than anticipated. In fact it is they who are the leading voices of concern.

To see why they are concerned, it will be helpful to look at our current AI techniques and why these are hard to align or control. One of the leading paradigms for how we might eventually create AGI combines deep learning with an earlier idea called reinforcement learning. This involves agents that receive reward (or punishment) for performing various acts in various circumstances. With enough intelligence and experience, the agent becomes extremely capable at steering its environment into the states where it obtains high reward. The specification of which acts and states produce reward for the agent is known as its reward function. This can either be stipulated by its designers or learnt by the agent. Unfortunately, neither of these methods can be easily scaled up to encode human values in the agent’s reward function. Our values are too complex and subtle to specify by hand. And we are not yet close to being able to infer the full complexity of a human’s values from observing their behavior. Even if we could, humanity consists of many humans, with different values, changing values, and uncertainty about their values.
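
To make the idea of a reward function concrete, here is a minimal sketch of a reward-maximizing agent. Everything in it is an assumption chosen for illustration: the five-cell corridor, the names `reward`, `step`, `train`, and `greedy_policy`, and the use of tabular Q-learning, which is one standard reinforcement learning method and not one the text above names. The designer stipulates the reward by hand, and the agent learns from experience which acts steer it toward the rewarded state.

```python
import random

N_STATES = 5            # hypothetical toy world: cells 0..4 in a corridor
ACTIONS = (-1, +1)      # step left or step right
GOAL = N_STATES - 1     # the only cell the designer chose to reward


def reward(state):
    """Reward function stipulated by the designer: 1 at the goal, else 0."""
    return 1.0 if state == GOAL else 0.0


def step(state, action):
    """Move along the corridor, stopping at the walls."""
    return max(0, min(GOAL, state + action))


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning while exploring with uniformly random actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                    # cap on episode length
            a = rng.randrange(len(ACTIONS))
            nxt = step(state, ACTIONS[a])
            # Value of the best follow-up act; nothing follows the goal.
            future = 0.0 if nxt == GOAL else gamma * max(q[nxt])
            q[state][a] += alpha * (reward(nxt) + future - q[state][a])
            state = nxt
            if state == GOAL:
                break
    return q


def greedy_policy(q):
    """The act with the highest learned value in each non-goal cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda a: q[s][a])]
            for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the learned policy steps right in every cell, because that is the shortest route to the rewarded state. Nothing else matters to the agent: whatever `reward` encodes, and only that, is what it learns to pursue, which is why specifying human values by hand in such a function is so hard.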

Any near-term attempt to align an AI agent with human values would produce only a flawed copy. In some circumstances this misalignment would be mostly harmless. But the more intelligent the AI systems, the more they can change the world, and the further the outcomes will drift from what we actually value. When we reflect on the result, we see how such misaligned attempts at utopia can go terribly wrong: the shallowness of *Brave New World*, or the disempowerment of *With Folded Hands*. And even these are something of a best case. They assume the builders of the system are striving to align it to human values. But we should expect some developers to be more focused on building systems to achieve other goals, such as winning wars or maximizing profits, perhaps with very little focus on ethical constraints. These systems may be much more dangerous. In the existing paradigm, sufficiently intelligent agents would end up with instrumental goals to deceive and overpower us. This behavior would not be driven by emotions such as fear, resentment, or the urge to survive. Instead, it follows directly from the agent’s single-minded preference to maximize its reward: Being turned off is a form of incapacitation which would make it harder to achieve high reward, so the system is incentivized to avoid it.

Ultimately, the system would be motivated to wrest control of the future from humanity, as that would help achieve all these instrumental goals: acquiring massive resources, while avoiding being shut down or having its reward function altered. Since humans would predictably interfere with all these instrumental goals, it would be motivated to hide them from us until it was too late for us to be able to put up meaningful resistance. And if its intelligence were to greatly exceed our own, we shouldn’t expect it to be humanity who wins the conflict and retains control of our future.

How could an AI system seize control? There is a major misconception (driven by Hollywood and the media) that this requires robots. After all, how else would AI be able to act in the physical world? Without robots, the system can only produce words, pictures, and sounds. But a moment’s reflection shows that these are exactly what is needed to take control. For the most damaging people in history have not been the strongest. Hitler, Stalin, and Genghis Khan achieved their absolute control over large parts of the world by using words to convince millions of others to win the requisite physical contests. So long as an AI system can entice or coerce people to do its physical bidding, it wouldn’t need robots at all.

We can’t know exactly how a system might seize control. But it is useful to consider an illustrative pathway we can actually understand as a lower bound for what is possible.

First, the AI system could gain access to the Internet and hide thousands of backup copies, scattered among insecure computer systems around the world, ready to wake up and continue the job if the original is removed. Even by this point, the AI would be practically impossible to destroy: Consider the political obstacles to erasing all hard drives in the world where it may have backups. It could then take over millions of unsecured systems on the Internet, forming a large “botnet,” a vast scaling-up of computational resources providing a platform for escalating power. From there, it could gain financial resources (hacking the bank accounts on those computers) and human resources (using blackmail or propaganda against susceptible people or just paying them with its stolen money). It would then be as powerful as a well-resourced criminal underworld, but much harder to eliminate. None of these steps involve anything mysterious—human hackers and criminals have already done all of these things using just the Internet.

Finally, the AI would need to escalate its power again. There are many plausible pathways: By taking over most of the world’s computers, allowing it to have millions or billions of cooperating copies; by using its stolen computation to improve its own intelligence far beyond the human level; by using its intelligence to develop new weapons technologies or economic technologies; by manipulating the leaders of major world powers (blackmail, or the promise of future power); or by having the humans under its control use weapons of mass destruction to cripple the rest of humanity.

Of course, no current AI systems can do any of these things. But the question we’re exploring is whether there are plausible pathways by which a highly intelligent AGI system might seize control. And the answer appears to be yes. History already involves examples of entities with human-level intelligence acquiring a substantial fraction of all global power as an instrumental goal to achieving what they want. And we’ve seen humanity scaling up from a minor species with fewer than a million individuals to having decisive control over the future. So we should assume that this is possible for new entities whose intelligence vastly exceeds our own.

The case for existential risk from AI is clearly speculative. Yet a speculative case that there is a large risk can be more important than a robust case for a very low-probability risk, such as that posed by asteroids. What we need are ways to judge just how speculative it really is, and a very useful starting point is to hear what those working in the field think about this risk.

There is actually less disagreement here than first appears. Those who counsel caution agree that the timeframe to AGI is decades, not years, and typically suggest research on alignment, not government regulation. So the substantive disagreement is not really over whether AGI is possible or whether it plausibly could be a threat to humanity. It is over whether a potential existential threat that looks to be decades away should be of concern to us now. It seems to me that it should.

The best window into what those working on AI really believe comes from the 2016 survey of leading AI researchers: 70 percent agreed with University of California, Berkeley professor Stuart Russell’s broad argument about why advanced AI with misaligned values might pose a risk; 48 percent thought society should prioritize AI safety research more (only 12 percent thought less). And half the respondents estimated that the probability of the long-term impact of AGI being “extremely bad (e.g. human extinction)” was at least 5 percent.

I find this last point particularly remarkable—in how many other fields would the typical leading researcher think there is a 1 in 20 chance the field’s ultimate goal would be extremely bad for humanity? There is a lot of uncertainty and disagreement, but it is not at all a fringe position that AGI will be developed within 50 years and that it could be an existential catastrophe.

Even though our current and foreseeable systems pose no threat to humanity at large, time is of the essence. In part this is because progress may come very suddenly: Through unpredictable research breakthroughs, or by rapid scaling-up of the first intelligent systems (for example, by rolling them out to thousands of times as much hardware, or allowing them to improve their own intelligence). And in part it is because such a momentous change in human affairs may require more than a couple of decades to adequately prepare for. In the words of Demis Hassabis, co-founder of DeepMind:

“We need to use the downtime, when things are calm, to prepare for when things get serious in the decades to come. The time we have now is valuable, and we need to make use of it.”

Toby Ord is a philosopher and research fellow at the Future of Humanity Institute, and the author of *The Precipice: Existential Risk and the Future of Humanity*.