Parallels between AI [1] futurology and religion are usually drawn in order to insult and to avoid serious rational discussion of possible AI futures; typical examples can be found here and here. Such contributions to the discussion are hardly worth reading. It might therefore be tempting to dismiss philosopher John Danaher's paper Why AI doomsayers are like sceptical theists and why it matters just from its title. That may well be a mistake, however, because the point that Danaher makes in the paper seems to merit being taken seriously.
Danaher is concerned with a specific and particularly nightmarish AI scenario that Nick Bostrom, in his influential recent book Superintelligence, calls the treacherous turn. To understand that scenario it is useful to recall the theory of an AI's instrumental versus final goals, developed by Bostrom and by computer scientist Steve Omohundro. In brief, this theory states that virtually any ultimate goal is compatible with arbitrarily high levels of intelligence (the orthogonality thesis), and that there are a number of instrumental goals that a sufficiently intelligent AI is likely to set up as means towards reaching its ultimate goal, pretty much regardless of what this ultimate goal is (the instrumental convergence thesis). Such instrumental goals include self-protection, acquisition of hardware and other resources, and preservation of the ultimate goal. (As indicated in Footnote 2 of my earlier blog post Superintelligence odds and ends II, I am not 100% convinced that this theory is right, but I do consider it sufficiently plausible to warrant taking seriously, and it is the best and pretty much the only game in town for predicting the behavior of superintelligent AIs.)
Another instrumental goal that is likely to emerge in a wide range of circumstances is this: as long as humans are still in power, and as long as the AI's ultimate goals are poorly aligned with human values, the AI will try to hide its true intentions. For concreteness, we may consider one of Bostrom's favorite running examples: the paperclip maximizer. Maximizing the production of paperclips may sound like a pretty harmless goal for a machine, but only up to the point where it attains superintelligence and the capacity to take over the world and consequently to turn the entire solar system (or perhaps even the entire observable universe) into a giant heap of paperclips. The goal of the paperclip maximizer is clearly not very well aligned with human values, and as soon as we humans realize what the machine is up to, we will do what we can to pull the plug on it or to thwart its plans by other means. But if the machine is smart, it will foresee this reaction from us, and therefore hide its intention to turn everything (including us) into paperclips until the moment when it has become so powerful that we can no longer put up any resistance. This is what Bostrom calls the treacherous turn. On p 119 of Superintelligence he gives the following definition.
The treacherous turn: While weak, an AI behaves cooperatively (increasingly so, as it gets smarter). When the AI gets sufficiently strong - without warning or provocation - it strikes, forms a singleton [2], and begins directly to optimize the world according to the criteria implied by its final values.
Or as he puts it two pages earlier:
An unfriendly AI of sufficient intelligence realizes that its unfriendly goals will be best realized if it behaves in a friendly manner initially, so that it will be let out of the box. It will only start behaving in a way that reveals its unfriendly nature when it no longer matters whether we find out; that is, when the AI is strong enough that human opposition is ineffectual.
So the essence of the treacherous turn is that an increasingly powerful AI may seem perfectly benevolent, while in actual fact it has plans that run counter to human values, perhaps including our extinction. Things may be very different from what they seem.
And here is where Danaher, in his paper, sees a parallel to skeptical theism. Skeptical theism is the particular answer to the problem of evil - i.e., the problem of how the existence of an omnipotent and omnibenevolent God can be reconciled with the existence of so much terrible evil in the world [3] - that points to the limits of our merely human understanding, and to the possibility that something that seems evil to us may have consequences that are very very good but inconceivable to us: God moves in mysterious ways. Danaher explains that according to skeptical theists
you cannot, from the seemingly gratuitous nature of evil, [conclude] its actually gratuitous nature. This is because there may be beyond-our-ken reasons for allowing such evils to take place. We are cognitively limited human beings. We know only a little of the deep structure of reality, and the true nature of morality. It is epistemically possible that the seemingly gratuitous evils we perceive are necessary for some greater good.
So here, too, things may be very different from what they seem. And in both cases - the AI's treacherous turn as well as the worldview of the skeptical theists - the radical discrepancy between how the world seems to us and how it actually is stems from the same phenomenon: the existence of a more intelligent being who understands the world incomparably better than we do.
So there is the parallel. Does it matter? Danaher says that it does: he points to huge practical and epistemic costs of the stance of skeptical theism, and suggests analogous costs to belief in the treacherous turn. As an illustration of such downsides to skeptical theism, he gives the following example of moral paralysis:
Anyone who believes in God and who uses sceptical theism to resist the problem of evil confronts a dilemma whenever they are confronted with an instance of great suffering. Suppose you are walking through the woods one day and you come across a small child, tied to a tree, bleeding profusely, clearly in agony, and soon to die. Should you intervene, release the child and call an ambulance? Or should you do nothing? The answer would seem obvious to most: you should intervene. But is it obvious to the sceptical theist? They assume that we don't know what the ultimate moral implications of such suffering really is. For all they know, the suffering of the child could be logically necessary for some greater good, (or not, as the case may be). This leads to a practical dilemma: either they intervene, and possibly (for all they know), frustrate some greater good; or they do nothing and possibly (for all they know) permit some great evil.
Anyone who believes we may be the soon-to-be-slaughtered victims of an AI's treacherous turn suffers from similar epistemological and practical dilemmas.
Having come this far in Danaher's paper, I told myself that his parallel is unfair, and that a crucial disanalogy is that while skeptical theists consider humanity to be in this epistemologically hopeless situation right now, Bostrom's treacherous turn is a hypothetical future scenario - it applies only when a superintelligent AI is in existence, which is obviously not the case today [4]. Surely hypothesizing about the epistemologically desperate situation of people in the future is not the same thing as putting oneself in such a situation - while the latter is self-defeating, the former is not. But then Danaher delivers the following blow:
It could be that an AI will, in the very near future, develop the level of intelligence necessary to undertake such a deceptive project and thereby pose a huge existential risk to our futures. In fact, it is even worse than that. If the AI could deliberately conceal its true intelligence from us, in the manner envisaged by Bostrom, it could be that there already is an AI in existence that poses such a risk. Perhaps Google's self-driving car, or IBM's Watson have already achieved this level of intelligence and are merely biding their time until we release them onto our highways or into our hospitals and quiz show recording studios? After all, it is not as if we have been on the lookout for the moment of the conception of deception (whatever that may look like). If AI risk is something we should take seriously, and if Bostrom’s notion of the treacherous turn is something we should also take seriously, then this would seem to be one of its implications.
Regular readers of this blog will know that I am highly influenced by the sort of AI futurology that Bostrom represents in his book Superintelligence. I therefore find Danaher's parallel highly worrying, and I do not know what to make of it. Danaher doesn't quite know either, but suggests two possible conclusions, "tending in opposite directions":
The first is a reductio, suggesting that by introducing the treacherous turn, Bostrom reveals the underlying absurdity of his position. The second is an a fortiori, suggesting that the way in which Bostrom thinks about the treacherous turn may be the right way to think about superintelligence, and may consequently provide further reason to be extremely cautious about the development of artificial intelligence.
Danaher goes on to develop these opposing ideas in his paper, which I recommend reading in full [5]. And I would very much like to hear what others think about his highly disturbing parallel and what to make of it.
Footnotes
1) AI is short for artificial intelligence.
2) Earlier in the book (p 78), Bostrom defines a singleton as "a world order in which there is at the global level a single decision-making agency". See his 2005 paper What is a singleton.