Saturday, September 13, 2014

Superintelligence odds and ends II: The Milky Way preserve

My review of Nick Bostrom's important book Superintelligence: Paths, Dangers, Strategies appeared first in Axess 6/2014, and then, in English translation, in a blog post here on September 10. The present blog post is the second in a series of five in which I offer various additional comments on the book (here is an index page for the series).


Concerning the crucial problem of what values we should try to instill into an AI that may turn into a superintelligence, Bostrom discusses several approaches. In part I of this Superintelligence odds and ends series I focused on Eliezer Yudkowsky's so-called coherent extrapolated volition, which Bostrom puts forward as a major option worthy of further consideration. Today, let me focus on the alternative that Bostrom calls moral rightness, and introduces on p 217 of his book. The idea is that a superintelligence might be successful at the task (where we humans have so far failed) of figuring out what is objectively morally right. It should then take objective morality to heart as its own values.¹,²

Bostrom sees a number of pros and cons of this idea. A major concern is that objective morality may not be in humanity's best interest. Suppose for instance (not entirely implausibly) that objective morality is a kind of hedonistic utilitarianism, where "an action is morally right (and morally permissible) if and only if, among all feasible actions, no other action would produce a greater balance of pleasure over suffering" (p 219). Some years ago I offered a thought experiment to demonstrate that such a morality is not necessarily in humanity's best interest. Bostrom reaches the same conclusion via a different thought experiment, which I'll stick with here in order to follow his line of reasoning.³ Here is his scenario:
    The AI [...] might maximize the surfeit of pleasure by converting the accessible universe into hedonium, a process that may involve building computronium and using it to perform computations that instantiate pleasurable experiences. Since simulating any existing human brain is not the most efficient way of producing pleasure, a likely consequence is that we all die.
Bostrom is reluctant to accept such a sacrifice for "a greater good", and goes on to suggest a compromise:
    The sacrifice looks even less appealing when we reflect that the superintelligence could realize a nearly-as-great good (in fractional terms) while sacrificing much less of our own potential well-being. Suppose that we agreed to allow almost the entire accessible universe to be converted into hedonium - everything except a small preserve, say the Milky Way, which would be set aside to accommodate our own needs. Then there would still be a hundred billion galaxies devoted to the maximization of pleasure. But we would have one galaxy within which to create wonderful civilizations that could last for billions of years and in which humans and nonhuman animals could survive and thrive, and have the opportunity to develop into beatific posthuman spirits.

    If one prefers this latter option (as I would be inclined to do) it implies that one does not have an unconditional lexically dominant preference for acting morally permissibly. But it is consistent with placing great weight on morality. (p 219-220)

What? Is it? Is it "consistent with placing great weight on morality"? Imagine Bostrom in a situation where he does the final bit of programming of the coming superintelligence, to decide between these two worlds, i.e., the all-hedonium one versus the all-hedonium-except-in-the-Milky-Way-preserve.⁴ And imagine that he goes for the latter option. The only difference it makes to the world is in what happens in the Milky Way, so what happens elsewhere is irrelevant to the moral evaluation of his decision.⁵ This may mean that Bostrom opts for a scenario where, say, 10²⁴ sentient beings will thrive in the Milky Way in a way that is sustainable for trillions of years, rather than a scenario where, say, 10⁴⁵ sentient beings will be even happier for a comparable amount of time. Wouldn't that be an act of immorality that dwarfs all other immoral acts carried out on our planet, by many, many orders of magnitude? How could that be "consistent with placing great weight on morality"?⁶
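To make the arithmetic behind this tension concrete, here is a minimal Python sketch. It takes Bostrom's "hundred billion galaxies" as 10^11 galaxies besides the Milky Way, and uses the illustrative figures of 10^24 beings in the preserve versus 10^45 in a hedonium Milky Way. That every other galaxy would also host 10^45 beings, and that simply counting beings is a fair stand-in for total well-being, are simplifying assumptions made only for this illustration; the function name is likewise just a label chosen here.

```python
from fractions import Fraction

# Illustrative figures; the per-galaxy uniformity is an assumption of this sketch.
OTHER_GALAXIES = 10**11        # "a hundred billion galaxies" outside the Milky Way
HEDONIUM_PER_GALAXY = 10**45   # beings per galaxy if converted to hedonium (assumed uniform)
PRESERVE_BEINGS = 10**24       # beings thriving in a Milky Way preserve


def preserve_cost(other_galaxies, per_galaxy, preserve):
    """Return (absolute_loss, fractional_loss) of choosing the preserve option."""
    all_hedonium = (other_galaxies + 1) * per_galaxy
    with_preserve = other_galaxies * per_galaxy + preserve
    absolute_loss = all_hedonium - with_preserve
    # Exact rational arithmetic, so nothing is lost to floating-point rounding.
    fractional_loss = Fraction(absolute_loss, all_hedonium)
    return absolute_loss, fractional_loss


absolute_loss, fractional_loss = preserve_cost(
    OTHER_GALAXIES, HEDONIUM_PER_GALAXY, PRESERVE_BEINGS
)
print(f"absolute loss:   about {float(absolute_loss):.3e} beings")
print(f"fractional loss: about {float(fractional_loss):.3e} of the total")
```

Under these assumptions the fractional loss is below one part in a hundred billion, which is what makes the preserve look cheap "in fractional terms", while the absolute loss is still on the order of 10^45 beings, which is what the moral evaluation of the decision itself has to answer for.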


1) It may well turn out (as I am inclined to believe) that no objective morality exists or that the notion does not make sense. We may instruct the AI to, in case it discovers that to be the case, shut itself down or to carry out some other default action that we have judged to be harmless.

2) A possibility that Bostrom does not consider is that perhaps any sufficiently advanced superintelligence will do so, i.e., it will discover objective morality and go on to act upon it. Perhaps there is some as yet unknown principle of nature that dictates that any sufficiently intelligent creature will do so. In my experience, many people who are not used to thinking about superintelligence the way, e.g., Bostrom and Yudkowsky do, suggest that something like this might be the case. If I had to make a guess, I'd say this is probably not the case, but on the other hand it doesn't seem so implausible as to be ruled out. It would contradict Bostrom's so-called orthogonality thesis (introduced in Chapter 7 of the book and playing a central role in much of the rest of the book), which says (roughly) that almost any values are compatible with arbitrarily high intelligence. It would also contradict the principle of goal-content integrity (also defended in Bostrom's Chapter 7), stating (again roughly) that any sufficiently advanced intelligence will act to conserve its ultimate goal and value function. While I do think both the orthogonality thesis and the goal-content integrity principle are plausible, they have by no means been deductively demonstrated, and either of them (or both) might simply be false.⁷

3) A related scenario is this: Suppose that the AI figures out that hedonistic utilitarianism is the objectively true morality, and that it also figures out that any sentient being always comes out negatively on its "pleasure minus suffering" balance, so that the world's grand total of "pleasure minus suffering" will always sum up to something negative, except in the one case where there are no sentient creatures at all in the world. This one case of course yields a balance of zero, which turns out to be optimal. Such an AI would proceed to do its best to exterminate all sentient beings in the world.

But could such a sad statement about the set of possible "pleasure minus suffering" balances really be true? Well, why not? I am well aware that many people (including myself) report being mostly happy, and experiencing more pleasure than suffering. But are such reports trustworthy? Mightn't evolution have shaped us into having highly delusional views about our own happiness? I don't see why not.

4) For some hints about the kinds of lives Bostrom hopes we might live in this preserve, I recommend his 2006 essay Why I want to be a posthuman when I grow up.

5) Bostrom probably disagrees with me here, because his talk of "nearly-as-great good (in fractional terms)" suggests that the amount of hedonium elsewhere has an impact on what we can do in the Milky Way while still acting in a way "consistent with placing great weight on morality". But maybe such talk is as misguided as it would be (or so it seems) to justify murder with reference to the fact that there will still be over 7 billion other humans alive and well?

6) I'm not claiming that if it were up to me, rather than Bostrom, I'd go for the all-hedonium option. I do share his intuitive preference for the all-hedonium-except-in-the-Milky-Way-preserve option. I don't know what I'd do under these extreme circumstances. Perhaps I'd even be seduced by the "in fractional terms" argument that I condemned in Footnote 5. But the issue here is not what I would do, or what Bostrom would do. The issue is what is "consistent with placing great weight on morality".

7) For a very interesting critique of the goal-content integrity principle, see Max Tegmark's very recent paper Friendly Artificial Intelligence: the Physics Challenge and his subsequent discussion with Eliezer Yudkowsky.

1 comment:

  1. It is tiresome that so much academic philosophizing about values becomes worthless because one assumes that the greatest good is a pleasant feeling. We can never arrive at a tenable morality on the basis of such a theory. One might as well give up on ethical thinking every time one gets onto the track that the good is a pleasant feeling, because the result will only ever be an utterly hopeless mess of an ethical theory.

    It is a mildly interesting thought that one could set a supercomputer to search for the answer to the question of an objectively correct theory of value and morality, but I do not think we have time to wait for anything like that.

    Nor do I think we can bring about a superintelligence capable of answering the question of the good until we have understood ourselves, and once we have done that we already know the correct answer to the question. A supercomputer can at most confirm the answers we have arrived at, on whose basis we were able to build a supercomputer capable of answering, or rather confirming the answer to, the question of the good.