In my opinion, anyone who seriously wants to engage in issues about the potential for disruptive consequences of future technologies, and about the long-term survival of humanity, needs to know about the kind of apocalyptic scenarios in connection with a possible breakthrough in artificial intelligence (AI) that Oxford philosopher Nick Bostrom discusses in his 2014 book
Superintelligence. For those readers who consider
my review of the book too brief, while not having gotten around to reading the book itself, and who lack the patience to wait for my upcoming
Here Be Dragons: Science, Technology and the Future of Humanity (to be released by Oxford University Press in January), I recommend two recent essays published in high-profile venues, both discussing Bostrom's arguments in
Superintelligence at some length:
Geist and Koch express very different views on the subject, and I therefore strongly recommend (a) reading both essays, and (b) resisting the temptation to quickly, lightheartedly, and without deeper understanding of the issues involved, take sides in favor of the view that best fits one's own personal intuitions and biases. As for myself, I have more or less taken sides, but only after having thought about the issues for years, and having read many books and many papers on the topic.
I have relatively little to say about Koch's essay, because he and I (and Bostrom) seem to be mostly in agreement. I disagree with his (mild) criticism of Bostrom for a tendency to go off on tangents concerning far-fetched scenarios, an example being (according to Koch) the AI going on to colonize the universe.1 And I have some minor qualms concerning the last few paragraphs, where Koch emphasizes the scientific study of the human brain and how it gives rise to intelligence and consciousness, as the way to move forward on the issues raised by Bostrom: while I do think that area is important and highly relevant, holding it forth as the only possible (or even as the obviously best) way forward seems unwarranted.2
My disagreements with Geist are more far-reaching. When he summarizes his opinion about Bostrom's book with the statement that "Superintelligence is propounding a solution that will not work to a problem that probably does not exist", I disagree with both the "a problem that probably does not exist" part and the "a solution that will not work" part. Let me discuss them one at a time.
What Geist means when he speaks of
"a problem that probably does not exist" is that a machine with superhuman intelligence will probably never be possible to build. He invokes the somewhat tired but often sensible rule of thumb "extraordinary claims require extraordinary evidence" when he holds forth that
"the extraordinary claim that machines can become so intelligent as to gain demonic powers requires extraordinary evidence". If we take the derogatory term
"demonic" to refer to superhuman intelligence, then how extraordinary is the claim? This issue separates neatly in two parts, namely:
(a) Is the human brain a near-optimal arrangement of matter for producing intelligence, or are there arrangements that give rise to vastly higher intelligence?
(b) If the answer to (a) is that such superhumanly intelligent arrangements of matter do exist, will it ever be within the powers of human technology to construct them?
To me, it seems pretty clear that the likely answer to (a) is that such superhumanly intelligent arrangements of matter do exist, based on how absurd it seems to think that Darwinian evolution, with all its strange twists and turns, its ad hoc solutions and its peacock tails, would have arrived, with the advent of the present-day human brain, at anything like a global intelligence optimum.3
This leaves question (b), which in my judgement is much more open. If we accept both naturalism and the Church-Turing thesis, then it is natural to think that intelligence is essentially an algorithmic property, so that if there exist superhumanly intelligent arrangements of matter, then there are computer programs that implement such intelligence. A nice framework for philosophizing over whether we could ever produce such a program is computer scientist Thore Husfeldt's recent image of
the Library of Turing. Husfeldt used it to show that blind search in that library would no more be able to find the desired program than a group of monkeys with typewriters would be able to produce
Hamlet. But might we be able to do it by methods more sophisticated than blind search? That is an open question, but I am leaning towards thinking that we can. Nature used Darwinian evolution to succeed at the Herculean task of navigating the Library of Mendel to find genomes corresponding to advanced organisms such as us, and we ourselves used intelligent design for navigating the Library of Babel to find such masterpieces as
Hamlet and
Reasons and Persons (plus
The Da Vinci Code and a whole lot of other junk). And now, for searching the Library of Turing in the hope of finding a superintelligent program, we have the luxury of being able to
combine Darwinian evolution with intelligent design (on top of which we have the possibility of looking into our own brains and using them as a template or at least an inspiration for engineering solutions). To suggest that a successful such search is within the grasp of future human technology seems a priori no more extraordinary than to suggest that it isn't, but one of these suggestions is true, so they cannot both be dismissed as extraordinary.
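To make the contrast between blind search and a guided search a little more tangible, here is a deliberately crude toy sketch of my own (the bitstring "program space", the fitness function and all parameters are invented purely for illustration, and needless to say the real problem comes with no handy fitness function): blind sampling in a space of 2^64 candidates gets essentially nowhere, while mutation plus selection, seeded with a hand-designed starting template (the "intelligent design" ingredient), typically climbs all the way, or very nearly, to the target.

```python
import random

# Toy illustration: search for a 64-bit "program" (a stand-in for something
# valuable) in a space of 2**64 candidates. Blind sampling rarely gets far;
# mutation plus selection, seeded with an informed template, climbs steadily.
TARGET = [1] * 64
random.seed(0)

def fitness(candidate):
    """Number of positions agreeing with the target (a toy proxy for quality)."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def blind_search(tries=100_000):
    """Pure random sampling, analogous to the monkeys with typewriters."""
    return max(fitness([random.randint(0, 1) for _ in range(64)])
               for _ in range(tries))              # typically well below 64

def guided_search(generations=200, pop_size=50, mutation_rate=0.02):
    """Selection plus mutation, started from a hand-designed template."""
    template = [1] * 32 + [0] * 32                 # "intelligent design": half right
    population = [template[:] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 5]      # keep the fittest fifth
        new_population = [p[:] for p in parents]   # elitism: retain the best
        while len(new_population) < pop_size:
            parent = random.choice(parents)
            child = [bit if random.random() > mutation_rate else 1 - bit
                     for bit in parent]            # point mutations
            new_population.append(child)
        population = new_population
    return max(fitness(ind) for ind in population)

print("blind search, best fitness: ", blind_search())    # far from 64
print("guided search, best fitness:", guided_search())   # at or near 64
```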
Geist does invoke a few slightly more concrete arguments for his claim that a superintelligent machine will probably never be built, but they fail to convince. He spends much ink on the history of AI and the observation that despite decades of work no superintelligence has been found, which he takes as an indication that it will
never be found, but the conclusion simply does not follow. His statement that
"creating intelligence [...]
grows increasingly harder the smarter one tries to become" is a truism if we fix the level that we start from, but the question becomes more interesting if we ask whether it is easier or harder to improve from a high level of intelligence than from a low level of intelligence. Or
in Eliezer Yudkowsky's words:
"The key issue [is]
returns on cognitive reinvestment - the ability to invest more computing power, faster computers, or improved cognitive algorithms to yield cognitive
labor which produces larger brains, faster brains, or better mind designs." Does cognitive reinvestment yield increasing or decreasing returns? In his paper
Intelligence explosion microeconomics, Yudkowsky tries to review the evidence, and finds it mixed, but comes out with the tentative conclusion that on balance it points towards increasing returns.
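Yudkowsky's analysis does not reduce to any single formula, but a toy recurrence of my own (with arbitrary constants, and in no way a model taken from his paper) may help make vivid why the distinction matters so much: with diminishing returns the capability curve stays tame, while with increasing returns it runs away.

```python
# Toy recurrence (arbitrary constants, purely illustrative): capability I is
# reinvested each step, yielding an increment proportional to I**alpha.
# alpha < 1 models diminishing returns, alpha > 1 increasing returns.
def trajectory(alpha, steps=30, c=0.1, start=1.0):
    capability = start
    for _ in range(steps):
        capability += c * capability ** alpha   # reinvest current capability
    return capability

print("diminishing returns (alpha=0.5):", round(trajectory(alpha=0.5), 1))
print("increasing returns (alpha=1.5): ", round(trajectory(alpha=1.5), 1))
# The first stays in the single digits; the second grows explosively.
```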
Let me finally discuss the
"a solution that will not work" part of Geist's summary statement about Bostrom's
Superintelligence. The solution in question is, in short, to instill the AI that will later attain superintelligence level with values compatible with human flourishing. On this matter, there are two key passages in Geist's essay. First this:
Bostrom believes that superintelligences will retain the same goals they began with, even after they have increased astronomically in intelligence. "Once unfriendly superintelligence exists," he warns, "it would prevent us from replacing it or changing its preferences." This assumption - that superintelligences will do whatever is necessary to maintain their "goal-content integrity" - undergirds his analysis of what, if anything, can be done to prevent artificial intelligence from destroying humanity. According to Bostrom, the solution to this challenge lies in building a value system into AIs that will remain human-friendly even after an intelligence explosion, but he is pessimistic about the feasibility of this goal. "In practice," he warns, "the control problem ... looks quite difficult," but "it looks like we will only get one chance."
And then, later in the essay, this:
[Our experience with] knowledge-based reasoning programs indicates that even superintelligent machines would struggle to guard their "goal-content integrity" and increase their intelligence simultaneously. Obviously, any superintelligence would grossly outstrip humans in its capacity to invent new abstractions and reconceptualize problems. The intellectual advantages of inventing new higher-level concepts are so immense that it seems inevitable that any human-level artificial intelligence will do so. But it is impossible to do this without risking changing the meaning of its goals, even in the course of ordinary reasoning. As a consequence, actual artificial intelligences would probably experience rapid goal mutation, likely into some sort of analogue of the biological imperatives to survive and reproduce (although these might take counterintuitive forms for a machine). The likelihood of goal mutation is a showstopper for Bostrom’s preferred schemes to keep AI "friendly," including for systems of sub-human or near-human intelligence that are far more technically plausible than the godlike entities postulated in his book.
The idea of goal-content integrity, going back to papers by
Omohundro and by
Bostrom, is roughly this: Suppose an AI has the ultimate goal of maximizing future production of paperclips, and contemplates the idea of switching to the ultimate goal of maximizing the future production of thumbtacks. Should it switch? In judging whether it is a good idea to switch, it tries to figure out whether switching will promote its ultimate goal - which, since it hasn't yet switched, is paperclip maximization - and it will reason that no, switching to the thumbtacks goal is unlikely to benefit future production of paperclips, so it will not switch. The generalization to other ultimate goals seems obvious.
On the other hand, the claim that a sufficiently intelligent AI will exhibit reliable goal-content integrity is not written in stone, as in a rigorously proven mathematical theorem. It might fail. It might even fail for the very reason suggested by Geist, which is similar to a recent idea by physicist Max Tegmark.4 But we do not know this. Geist says that "it is impossible to [invent new higher-level concepts] without risking changing the meaning of its goals". Maybe it is. Then again, maybe it isn't, as it could also be that a superintelligent AI would understand how to take extra care in the formation of new higher-level concepts, so as to avoid corruption of its goal content. Geist cannot possibly know this, and it seems to me that when he confidently asserts that the path advocated by Bostrom is "a solution that will not work", he vastly overestimates the reliability of his own intuitions when speculating about how agents far, far more intelligent than him (or anyone he has ever met) will reason and act. To exhibit a more reasonable level of epistemic humility, he ought to downgrade his statement about "a solution that will not work" to one about "a solution that might not work" - which would bring him into agreement with Bostrom (and with me). And to anyone engaging seriously with this important field of enquiry, pessimism about the success probability of Bostrom's preferred scheme for avoiding AI Armageddon should be a signal to roll up one's sleeves and try to improve on the scheme. I hope to make a serious attempt in that direction myself, but am not sure whether I have the smarts for it.
Footnotes
1) The AI-colonizes-the-universe scenario is not at all far-fetched. For it to happen, we need (a) that the AI is capable of interstellar and intergalactic travel and colonization, and (b) that it has the desire to do so. Concerning (a), a 2013 paper by Stuart Armstrong and Anders Sandberg makes a strong case that full-scale colonization of the visible universe at close to the speed of light will become feasible. Concerning (b), arguments put forth in the seminal 1998 Great Filter paper by Robin Hanson strongly suggest that colonizing the universe is not an unlikely choice by an agent that has the capability to do so.
3) Although very much superfluous, we might add to this argument all the deficiencies in the human cognitive machinery that we are aware of. For instance, quoting Koch:
Homo sapiens is plagued by superstitions and short-term thinking (just watch politicians, many drawn from our elites, to whom we entrust our long-term future). To state the obvious, humanity's ability to calmly reason - its capacity to plan and build unperturbed by emotion (in short, our intelligence) - can improve. Indeed, it is entirely possible that over the past century, average intelligence has increased somewhat, with improved access to good nutrition and stimulating environments early in childhood, when the brain is maturing.
4) In his paper
Friendly artificial intelligence: the physics challenge, Tegmark suggests that the concepts involved in the AI's goal content might turn out to be meaningless when the AI discovers the fundamental physics underlying reality, whereupon it will realize that its ultimate goal is incoherent. Then what will it do? Hard to say. Here is Tegmark:
For example, suppose we program a friendly AI to maximize the number of humans whose souls go to
heaven in the afterlife. First it tries things like increasing people's compassion and church attendance. But suppose it then attains a complete scientific understanding of humans and human consciousness, and discovers that there is no such thing as a soul. Now what? In the same way, it is possible that any other goal we give it based on our current understanding of the world ("maximize the
meaningfulness of human life", say) may eventually be discovered by the AI to be undefined.