Monday, June 8, 2015

A disturbing parallel between AI futurology and religion

Parallels between AI[1] futurology and religion are usually meant to insult and to avoid serious rational discussion of possible AI futures; typical examples can be found here and here. Such contributions to the discussion are hardly worth reading. It might therefore be tempting to dismiss philosopher John Danaher's paper Why AI doomsayers are like sceptical theists and why it matters just from its title. That may well be a mistake, however, because the point that Danaher makes in the paper seems to be one that merits taking seriously.

Danaher is concerned with a specific and particularly nightmarish AI scenario that Nick Bostrom, in his influential recent book Superintelligence, calls the treacherous turn. To understand that scenario it is useful to recall the theory of an AI's instrumental versus final goals, developed by Bostrom and by computer scientist Steve Omohundro. In brief, this theory states that virtually any ultimate goal is compatible with arbitrarily high levels of intelligence (the orthogonality thesis), and that there are a number of instrumental goals that a sufficiently intelligent AI is likely to set up as a means towards reaching its ultimate goal pretty much regardless of what this ultimate goal is (the instrumental convergence thesis). Such instrumental goals include self-protection, acquisition of hardware and other resources, and preservation of the ultimate goal. (As indicated in Footnote 2 of my earlier blog post Superintelligence odds and ends II, I am not 100% convinced that this theory is right, but I do consider it sufficiently plausible to warrant taking seriously, and it is the best and pretty much only game in town for predicting the behavior of superintelligent AIs.)

Another instrumental goal that is likely to emerge in a wide range of circumstances is that as long as humans are still in power, and as long as the AI's ultimate goals are poorly aligned with human values, the AI will try to hide its true intentions. For concreteness, we may consider one of Bostrom's favorite running examples: the paperclip maximizer. Maximizing the production of paperclips may sound like a pretty harmless goal for a machine, but only up to the point where it attains superintelligence and the capacity to take over the world and consequently to turn the entire solar system (or perhaps even the entire observable universe) into a giant heap of paperclips. The goal of the paperclip maximizer is clearly not very well aligned with human values, and as soon as we humans realize what the machine is up to, we will do what we can to pull the plug on it or to thwart its plans by other means. But if the machine is smart, it will foresee this reaction from us, and therefore hide its intention to turn everything (including us) into paperclips, until the moment when it has become so powerful that we can no longer put up any resistance. This is what Bostrom calls the treacherous turn. On p. 119 of Superintelligence he gives the following definition.
    The treacherous turn: While weak, an AI behaves cooperatively (increasingly so, as it gets smarter). When the AI gets sufficiently strong - without warning or provocation - it strikes, forms a singleton,[2] and begins directly to optimize the world according to the criteria implied by its final values.
Or as he puts it two pages earlier:
    An unfriendly AI of sufficient intelligence realizes that its unfriendly goals will be best realized if it behaves in a friendly manner initially, so that it will be let out of the box. It will only start behaving in a way that reveals its unfriendly nature when it no longer matters whether we find out; that is, when the AI is strong enough that human opposition is ineffectual.
So the essence of the treacherous turn is that an increasingly powerful AI may seem perfectly benevolent, while in actual fact it has plans that run counter to human values, perhaps including our extinction. Things may be very different from what they seem.

And here is where Danaher, in his paper, sees a parallel to skeptical theism. Skeptical theism is the particular answer to the problem of evil - i.e., the problem of how the existence of an omnipotent and omnibenevolent God can be reconciled with the existence of so much terrible evil in the world[3] - that points to limits of our mere human understanding, and the fact that something that seems evil to us may have consequences that are very very good but inconceivable to us: God moves in mysterious ways. Danaher explains that according to skeptical theists
    you cannot, from the seemingly gratuitous nature of evil, [conclude] its actually gratuitous nature. This is because there may be beyond-our-ken reasons for allowing such evils to take place. We are cognitively limited human beings. We know only a little of the deep structure of reality, and the true nature of morality. It is epistemically possible that the seemingly gratuitous evils we perceive are necessary for some greater good.
So here, too, things may be very different from what they seem. And in both cases - the AI treacherous turn, as well as in the worldview of the skeptical theists - the radical discrepancy between how the world seems to us and how it actually is stems from the same phenomenon: the existence of a more intelligent being who understands the world incomparably better than we do.

So there is the parallel. Does it matter? Danaher says that it does, because he points to huge practical and epistemic costs to the stance of skeptical theism, and suggests analogous costs to belief in the treacherous turn. To illustrate such downsides of skeptical theism, he gives the following example of moral paralysis:
    Anyone who believes in God and who uses sceptical theism to resist the problem of evil confronts a dilemma whenever they are confronted with an instance of great suffering. Suppose you are walking through the woods one day and you come across a small child, tied to tree, bleeding profusely, clearly in agony, and soon to die. Should you intervene, release the child and call an ambulance? Or should you do nothing? The answer would seem obvious to most: you should intervene. But is it obvious to the sceptical theist? They assume that we don't know what the ultimate moral implications of such suffering really is. For all they know, the suffering of the child could be logically necessary for some greater good, (or not, as the case may be). This leads to a practical dilemma: either they intervene, and possibly (for all they know), frustrate some greater good; or they do nothing and possibly (for all they know) permit some great evil.
Anyone who believes we may be the soon-to-be-slaughtered victims of an AI's treacherous turn suffers from similar epistemological and practical dilemmas.

Having come this far in Danaher's paper, I told myself that his parallel is unfair, and that a crucial disanalogy is that while skeptical theists consider humanity to be in this epistemologically hopeless situation right now, Bostrom's treacherous turn is a hypothetical future scenario - it applies only when a superintelligent AI is in existence, which is obviously not the case today.[4] Surely hypothesizing about the epistemologically desperate situation of people in the future is not the same thing as putting oneself in such a situation - while the latter is self-defeating, the former is not. But then Danaher delivers the following blow:
    It could be that an AI will, in the very near future, develop the level of intelligence necessary to undertake such a deceptive project and thereby pose a huge existential risk to our futures. In fact, it is even worse than that. If the AI could deliberately conceal its true intelligence from us, in the manner envisaged by Bostrom, it could be that there already is an AI in existence that poses such a risk. Perhaps Google's self-driving car, or IBM's Watson have already achieved this level of intelligence and are merely biding their time until we release them onto our highways or into our hospitals and quiz show recording studios? After all, it is not as if we have been on the lookout for the moment of the conception of deception (whatever that may look like). If AI risk is something we should take seriously, and if Bostrom’s notion of the treacherous turn is something we should also take seriously, then this would seem to be one of its implications.
Regular readers of this blog will know that I am highly influenced by the sort of AI futurology that Bostrom represents in his book Superintelligence. I therefore find Danaher's parallel highly worrying, and I do not know what to make of it. Danaher doesn't quite know either, but suggests two possible conclusions, "tending in opposite directions":
    The first is a reductio, suggesting that by introducing the treacherous turn, Bostrom reveals the underlying absurdity of his position. The second is an a fortiori, suggesting that the way in which Bostrom thinks about the treacherous turn may be the right way to think about superintelligence, and may consequently provide further reason to be extremely cautious about the development of artificial intelligence.
Danaher goes on to develop these opposing ideas in his paper, which I recommend reading in full.[5] And I would very much like to hear what others think about his highly disturbing parallel and what to make of it.


1) AI is short for artificial intelligence.

2) Earlier in the book (p. 78), Bostrom defines a singleton as "a world order in which there is at the global level a single decision-making agency". See his 2005 paper What is a singleton.

3) The problem of evil is one that I've repeatedly discussed (and even proposed a solution to) on this blog.

4) This is somewhat analogous to the distinction between, on one hand, chemtrails conspiracy theories (which are purportedly about ongoing events), and, on the other hand, serious writing on geoengineering (which is about possible future technologies). Imagine my irritation when my writings in the latter category are read by chemtrails nutcases as support for ideas in the former category.

5) Footnote added on June 9, 2015: See also Danaher's blog post on the same topic.

7 comments:

  1. It is worth noting that I have slightly weakened my position on this now. I think it is unlikely that any present AI has crossed the threshold (it is still a bare epistemic possibility). Nevertheless, I think if we take Bostrom seriously, we should be very sceptical about our ability to detect when the relevant threshold has been crossed.

    1. I tend to agree that the hypothesis that the AI threshold has already been crossed is unlikely, but I'm nevertheless glad you raised the issue in your paper, because it is still an interesting idea, and it is only if we bring it up for discussion that we have a chance to rationally evaluate the evidence for or against it.

      I added Footnote 5 with a link to your blog post.

    2. It also seems unlikely to me that there is a treacherous-turn-capable AI in our midst already. I'll try to sketch an argument for thinking so. Perhaps this is not far from Danaher's own current, slightly weakened (compared to the paper) position?

      The argument: The treacherous turn is a very advanced plan and would seem to require some advanced cognitive features. It seems unlikely that the necessary advanced cognitive features would at some increment of supercomputer development make a big leap into existence (followed by: the AI quickly puts them to use, hatches a treacherous turn plan, acts to hide evidence of the advanced cognitive features). Instead it seems likely that the necessary features will emerge incrementally from more rudimentary features. As long as human engineers mediate each incremental step (humans are a crucial part in the upgrades, i.e., not a scenario where a supercomputer has effective autonomous control over a supercomputer research and manufacturing facility that would enable a rapid take-off scenario), it seems likely that the rudimentary features would be noticed before the number of incremental upgrades needed to generate the advanced cognitive features had been made. Even though AI researchers perhaps haven't been thinking much about a treacherous turn per se, it seems likely that they have been looking for such more rudimentary features. But it appears they have not so far found evidence of them. For these reasons it is likely that no AI currently has the necessary advanced cognitive features to be already acting on a treacherous turn plan.

      Caveats: To flesh this out I would need to specify a list of candidate rudimentary cognitive features and related behaviours. I'd also have to show that AI researchers have been looking for such features and that if they did find such features they would likely have let the world know about them.

    3. Martin: On the (highly open) issue of whether AI will emerge slowly and incrementally, or by what you call a "big leap" (a.k.a. an intelligence explosion, a.k.a. the Singularity), I warmly recommend Eliezer Yudkowsky's ambitious but highly readable paper Intelligence explosion microeconomics.

      By the way, I suspect that your suggestion "some increment of supercomputer development" is a bit misleading, and would prefer "some increment of AI software development". It is hardly likely that increased computational raw power in itself would lead to an AI breakthrough. The true bottleneck is probably more to be found on the software side.

    4. Re hardware vs software: you are right, my mistake. My argument should be amended to include both hardware and software increments. That weakens the argument but I'm not sure how much.

      Re your first point (without reading Yudkowsky right now): I do think the big leap/explosion/singularity hypothesis is very much a possibility and didn't mean to argue against it. I meant to argue more specifically that some rudimentary cognitive features that would be indicators of an AI capable of devising and starting to act on a treacherous turn (TT) plan would need to be in place some time before the explosive phase starts. And that since researchers, for TT-unrelated reasons, do already look for such features but haven't found them, there likely is not yet an AI acting on a TT plan.

      Maybe I can put this more clearly in Bostrom's terms (Superintelligence, p. 62). My argument assumed (implicitly, I now realize) that an AI capable of devising and starting to act on a TT plan would have to have reached the human level baseline. Given that assumption, I meant to argue that the path to the human level baseline will, at a non-explosive pace, go through increments with more rudimentary cognitive features. Even on a fast or moderate "takeoff duration" scenario, the "time until takeoff" phase is slow enough for researchers to look for and detect the appearance of rudimentary cognitive features.

      Perhaps the above implicit assumption can be questioned. While I doubt that a being that is across the board below the human level baseline can devise and start to act on a TT plan, perhaps an AI that excels in only some cognitive features (while markedly subhuman in others) would be "TT plan sufficient".

  2. Göran Lambertz has had a go at probability calculations again...


    1. I know. Please make sure to post your comments in the relevant thread.