My review of Nick Bostrom's important book Superintelligence: Paths, Dangers, Strategies appeared first in Axess 6/2014, and then, in English translation, in a blog post here on September 10. The present blog post is the first in a series of five in which I offer various additional comments on the book (here is an index page for the series).
The control problem, in Bostrom's terminology, is the problem of turning the intelligence explosion into a controlled detonation, with benign consequences for humanity. The difficulties seem daunting, and certainly beyond our current knowledge and capabilities, but Bostrom does a good job (or seemingly so) of systematically partitioning the problem into subtasks. One such subtask is to work out what values to instill in the AI that we expect to become our first superintelligence. What should it want to do?
Specifying an explicit collection of values that does not admit what Bostrom calls perverse instantiation seems hard, perhaps impossible.1 He therefore focuses mainly on various indirect methods, and the one he seems most inclined to tentatively endorse is Eliezer Yudkowsky's so-called coherent extrapolated volition (CEV):
- Coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
CEV is an interesting idea. Maybe it can work, maybe it can't, but it does seem to be worth thinking more about. (Yudkowsky has thought about it hard for a decade now, and has attracted a sizeable community of followers.) Here's one thing that worries me:
Human values exhibit, at least on the surface, plenty of incoherence. That much is hardly controversial. But what if the incoherence goes deeper, and is fundamental in such a way that any attempt to untangle it is bound to fail? Perhaps any search for our CEV is bound to lead to more and more glaring contradictions? Of course any value system can be modified into something coherent, but perhaps not every value system can be so modified without sacrificing some of its most central tenets? And perhaps human values have that property?
Let me offer a candidate for what such a fundamental contradiction might consist in. Imagine a future where all humans are permanently hooked up to life-support machines, lying still in beds with no communication with each other, but with electrodes connected to the pleasure centers of our brains in such a way as to constantly give us the most pleasurable experiences possible (given our brain architectures). I think nearly everyone would attach a low value to such a future, deeming it absurd and unacceptable (thus agreeing with Robert Nozick). The reason we find it unacceptable is that in such a scenario we no longer have anything to strive for, and therefore no meaning in our lives. So we want instead a future where we have something to strive for. Imagine such a future F1. In F1 we have something to strive for, so there must be something missing in our lives. Now let F2 be similar to F1, the only difference being that that something is no longer missing in F2; almost by definition, F2 is better than F1 (because otherwise that something wouldn't be worth striving for). And as long as there is still something worth striving for in F2, there's an even better future F3 that we should prefer. And so on. What if any such procedure quickly takes us to an absurd and meaningless scenario with life-support machines and electrodes, or something along those lines? Then no future will be good enough for our preferences, so not even a superintelligence will have anything to offer us that aligns acceptably with our values.2
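The shape of the alleged contradiction can be made concrete with a toy model (my own illustration, not anything from Bostrom or Yudkowsky; the "pool of strivings" and the numerical values are arbitrary assumptions chosen only to exhibit the structure of the argument):

```python
# Toy model of the F1, F2, F3, ... regress. A future is a frozenset of
# as-yet-unmet strivings. Meeting one more striving is, almost by
# definition, an improvement -- yet the limit of the chain, the future
# with nothing left to strive for, is the wirehead scenario we rank lowest.

POOL = {"art", "science", "love", "exploration"}  # arbitrary illustrative strivings

def value(unmet):
    if not unmet:                    # nothing left to strive for:
        return -1.0                  # the life-support-and-electrodes future
    return len(POOL) - len(unmet)    # each met striving makes the future better

# Build the chain F1, F2, ...: each step meets one more striving.
chain = [frozenset(POOL)]
while chain[-1]:
    chain.append(frozenset(sorted(chain[-1])[1:]))

values = [value(f) for f in chain]
# Every step before the last is a strict improvement...
assert all(values[i] < values[i + 1] for i in range(len(values) - 2))
# ...but the endpoint of all that improvement is worse than where we started.
assert values[-1] < values[0]
```

The point of the sketch is only that a preference ordering can be locally consistent ("one step better" at every stage) while its limit is globally abhorrent; that is precisely the kind of deep incoherence that would frustrate an extrapolation procedure like CEV.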
Now, I don't know how serious this particular problem is. Perhaps there is some way to gently circumvent its contradictions. But even then, there might be some other fundamental inconsistency in our values - one that cannot be circumvented. If that is the case, it will throw a spanner in the works of CEV. And perhaps not only for CEV, but for any serious attempt to set up a long-term future for humanity that aligns with our values, with or without a superintelligence.
1) Bostrom gives many examples of perverse instantiations. Here's one: If we instill the value "make us smile", the AI might settle for paralyzing human facial musculature in such a way that we go around endlessly smiling (regardless of mood).
2) And what about replacing "superintelligence" with "God" in this last sentence? I have often mocked Christians for their inability to solve the problem of evil: with an omnipotent and omnibenevolent God, how can there still be suffering in the world? Well, perhaps here we have stumbled upon an answer, and upon God's central dilemma. On one hand, he cannot go for a world of eternal life-support and electrical stimulation of our pleasure centers, because that would leave us bereft of any meaning in our lives. On the other hand, the alternative of going for one of the suboptimal worlds such as F1 or F2 would leave us complaining. In such a world, in order for us to be motivated to do anything at all, there must be some variation in our level of well-being. Perhaps it is the case that no matter how high that general level is, we will perceive any dip in well-being as suffering, and be morally outraged at the idea of a God who allows it. But he still had to choose some such level, and here we are.
(Still, I think there is something to be said about the level of suffering God chose for us. He opted for the Haiti 2010 earthquake and the Holocaust, when he could have gone for something on the level of, say, the irritation of a dust speck in the eye. How can we not conclude that this makes him evil?)