My review of Nick Bostrom's important book Superintelligence: Paths, Dangers, Strategies appeared first in Axess 6/2014, and then, in English translation, in a blog post here on September 10. The present blog post is the first in a series of five in which I offer various additional comments on the book (here is an index page for the series).
The control problem, in Bostrom's terminology, is the problem of turning the intelligence explosion into a controlled detonation, with benign consequences for humanity. The difficulties seem daunting, and certainly beyond our current knowledge and capabilities, but Bostrom does a good job (or seemingly so) of systematically partitioning the problem into subtasks. One such subtask is to work out what values to instill in the AI that we expect to become our first superintelligence. What should it want to do?
Specifying an explicit collection of values that does not admit what Bostrom calls perverse instantiation seems hard, perhaps impossible.1 He therefore focuses mainly on various indirect methods, and the one he seems most inclined to tentatively endorse is Eliezer Yudkowsky's so-called coherent extrapolated volition (CEV):
- Coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
CEV is an interesting idea. Maybe it can work, maybe it can't, but it does seem to be worth thinking more about. (Yudkowsky has thought about it hard for a decade now, and has attracted a sizeable community of followers.) Here's one thing that worries me:
Human values exhibit, at least on the surface, plenty of incoherence. That much is hardly controversial. But what if the incoherence goes deeper, and is fundamental in such a way that any attempt to untangle it is bound to fail? Perhaps any search for our CEV is bound to lead to more and more glaring contradictions? Of course any value system can be modified into something coherent, but perhaps not every value system can be so modified without sacrificing some of its most central tenets? And perhaps human values have that property?
Let me offer a candidate for what such a fundamental contradiction might consist in. Imagine a future where all humans are permanently hooked up to life-support machines, lying still in beds with no communication with each other, but with electrodes connected to the pleasure centers of our brains in such a way as to constantly give us the most pleasurable experiences possible (given our brain architectures). I think nearly everyone would attach a low value to such a future, deeming it absurd and unacceptable (thus agreeing with Robert Nozick). The reason we find it unacceptable is that in such a scenario we no longer have anything to strive for, and therefore no meaning in our lives. So we want instead a future where we have something to strive for. Imagine such a future F1. In F1 we have something to strive for, so there must be something missing in our lives. Now let F2 be similar to F1, the only difference being that that something is no longer missing in F2; almost by definition, F2 is better than F1 (because otherwise that something wouldn't be worth striving for). And as long as there is still something worth striving for in F2, there's an even better future F3 that we should prefer. And so on. What if any such procedure quickly takes us to an absurd and meaningless scenario with life-support machines and electrodes, or something along those lines? Then no future will be good enough for our preferences, so not even a superintelligence will have anything to offer us that aligns acceptably with our values.2
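The shape of the alleged contradiction can be made concrete with a toy model (my own illustration, not anything from Bostrom or Yudkowsky; the "pool of strivings" and the numerical values are arbitrary assumptions chosen only to exhibit the structure of the argument):

```python
# Toy model of the F1, F2, F3, ... regress. A future is a frozenset of
# as-yet-unmet strivings. Meeting one more striving is, almost by
# definition, an improvement -- yet the limit of the chain, the future
# with nothing left to strive for, is the wirehead scenario we rank lowest.

POOL = {"art", "science", "love", "exploration"}  # arbitrary illustrative strivings

def value(unmet):
    if not unmet:                    # nothing left to strive for:
        return -1.0                  # the life-support-and-electrodes future
    return len(POOL) - len(unmet)    # each met striving makes the future better

# Build the chain F1, F2, ...: each step meets one more striving.
chain = [frozenset(POOL)]
while chain[-1]:
    chain.append(frozenset(sorted(chain[-1])[1:]))

values = [value(f) for f in chain]
# Every step before the last is a strict improvement...
assert all(values[i] < values[i + 1] for i in range(len(values) - 2))
# ...but the endpoint of all that improvement is worse than where we started.
assert values[-1] < values[0]
```

The point of the sketch is only that a preference ordering can be locally consistent ("one step better" at every stage) while its limit is globally abhorrent; that is precisely the kind of deep incoherence that would frustrate an extrapolation procedure like CEV.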
Now, I don't know how serious this particular problem is. Perhaps there is some way to gently circumvent its contradictions. But even then, there might be some other fundamental inconsistency in our values - one that cannot be circumvented. If that is the case, it will throw a spanner in the works of CEV. And perhaps not only for CEV, but for any serious attempt to set up a long-term future for humanity that aligns with our values, with or without a superintelligence.
1) Bostrom gives many examples of perverse instantiations. Here's one: If we instill the value "make us smile", the AI might settle for paralyzing human facial musculature in such a way that we go around endlessly smiling (regardless of mood).
2) And what about replacing "superintelligence" with "God" in this last sentence? I have often mocked Christians for their inability to solve the problem of evil: with an omnipotent and omnibenevolent God, how can there still be suffering in the world? Well, perhaps here we have stumbled upon an answer, and upon God's central dilemma. On one hand, he cannot go for a world of eternal life-support and electrical stimulation of our pleasure centers, because that would leave us bereft of any meaning in our lives. On the other hand, the alternative of going for one of the suboptimal worlds such as F1 or F2 would leave us complaining. In such a world, in order for us to be motivated to do anything at all, there must be some variation in our level of well-being. Perhaps it is the case that no matter how high that general level is, we will perceive any dip in well-being as suffering, and be morally outraged at the idea of a God who allows it. But he still had to choose some such level, and here we are.
(Still, I think there is something to be said about the level of suffering God chose for us. He opted for the Haiti 2010 earthquake and the Holocaust, when he could have gone for something on the level of, say, the irritation of a dust speck in the eye. How can we not conclude that this makes him evil?)