Friday, June 28, 2024

On optimism and pessimism standing at the brink of the great AI breakthrough

Sometimes in discussions about technology, the term techno-optimism is reserved for the belief that technology will develop rapidly, while techno-pessimism is used for the belief that it will move slowly or come to a halt. This is not the meaning of optimism and pessimism intended here. Throughout this blog post, the terms will refer to beliefs about the consequences of this technology: are they likely to be good or bad?

Last time I wrote about the concepts of optimism and pessimism in the context of future AI advances and their ramifications, way back in 2017, I advocated for a sober and unbiased outlook on the world, and held both optimism and pessimism as biased distortions of the evidence at hand.1,2
I still hold this view, but I nevertheless think it is worth revisiting the issue to add some nuance,3 now in 2024 when we seem to be standing closer than ever to the brink of the great AI breakthrough. To suggest intentionally introducing either an optimism or a pessimism bias still sounds bad to me, but we can reframe the issue and make it less blatantly pro-distortion by admitting that all our judgements about our future with AI will necessarily be uncertain, and asking whether there might be an asymmetry in the badness of erring on the optimistic or the pessimistic side. Is excessive optimism worse than excessive pessimism, or vice versa?

There are obvious arguments to make in either direction. On one hand, erring on the side of optimism may induce decision-makers to recklessly move forward with unsafe technologies, whereas the extra caution that may result from undue pessimism is less obviously catastrophic. On the other hand, an overly pessimistic message may be disheartening and cause AI researchers, decision-makers and the general public to stop trying to create a better world and just give up.

The latter aspect came into focus for me when Eliezer Yudkowsky, after having begun in 2021 to open up publicly about his dark view of humanity's chances of surviving upcoming AI developments, went all-in on this with his 2022 Death with dignity and AGI ruin blog posts. After all, there are all these AI safety researchers working hard to save humanity by solving the AI alignment problem - researchers who rightly admire Yudkowsky as the brilliant pioneer who during the 00s almost single-handedly created this research area and discovered many of its crucial challenges,4 and to whom it may be demoralizing to hear that this great founder no longer believes the project has much chance of success. In view of this, shouldn't Yudkowsky at least have exhibited a bit more epistemic humility about his current position?

I now look more favorably upon Yudkowsky's forthrightness. What made me change my mind is a graphic published in June 2023 by Chris Olah, one of the leading AI safety researchers at Anthropic. The x-axis of Olah's graph represents the level of difficulty of solving the AI alignment problem, ranging from trivial via steam engine, Apollo project and P vs NP to impossible, and his core messages are (a) that since uncertainty is huge about the true difficulty level we should rationally represent our belief about this as some probability distribution over his scale, and (b) that it is highly important to try to reduce the uncertainty and improve the precision in our belief, so as to better be able to work in the right kind of way with the right kind of solutions. It is with regard to (b) that I changed my mind on what Yudkowsky should or shouldn't say. If AI alignment is as difficult as Yudkowsky thinks, based on his unique experience of decades of working hard on the problem, then it is good that he speaks out about this, so as to help the rest of us move our probability mass towards P vs NP or beyond. If instead he held back and played along with the more common view that the problem is a good deal easier - likely somewhere around steam engine or Apollo project - he would contribute to a consensus that might, e.g., cause a future AI developer to wreak havoc by releasing an AI that looked safe in the lab but failed to have that property in the wild. This is not to say that I entirely share Yudkowsky's view of the matter (which does look overconfident to me), but that is mostly beside the point, because all he can reasonably be expected to do is to deliver his own expert judgement.
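Olah's point (a) can be made concrete with a toy calculation. The following sketch is my own illustration - the prior probabilities and the likelihood weights are entirely made up, not taken from Olah or Yudkowsky - but it shows the mechanism: a belief about alignment difficulty represented as a probability distribution over the five levels, updated on an expert report pointing towards high difficulty.

```python
# Olah's difficulty scale, from easiest to hardest.
LEVELS = ["trivial", "steam engine", "Apollo project", "P vs NP", "impossible"]

# A hypothetical prior belief, with most mass around the middle of the scale.
belief = {"trivial": 0.05, "steam engine": 0.30, "Apollo project": 0.35,
          "P vs NP": 0.25, "impossible": 0.05}

def update(prior, likelihood):
    """Bayesian update: weight each level by how well it explains the evidence,
    then renormalize so the probabilities sum to one."""
    posterior = {lvl: prior[lvl] * likelihood[lvl] for lvl in prior}
    total = sum(posterior.values())
    return {lvl: p / total for lvl, p in posterior.items()}

# Hypothetical likelihoods for an expert report of decades of failed attempts:
# such a report is more probable if the problem is in fact very hard.
evidence = {"trivial": 0.1, "steam engine": 0.5, "Apollo project": 1.0,
            "P vs NP": 2.0, "impossible": 1.5}

posterior = update(belief, evidence)
```

Under these made-up numbers, the mass on P vs NP rises from 0.25 to about 0.46 - which is the sense in which an honest pessimistic expert report helps the rest of us recalibrate.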

At this point, I'd like to zoom in a bit more on Yudkowsky's list of lethalities in his AGI ruin post, and note that most of the items on the list express reasons for not putting much hope in one or the other of the following two things.
    (1) Our ability to solve the technical AI alignment problem.

    (2) Our ability to collectively decide not to build an AI that might wipe out Homo sapiens.

It is important for a number of reasons to distinguish pessimism about (1) from pessimism about (2),5 such as how a negative outlook on (1) gives us more reason to try harder to solve (2), and vice versa. However, the reason I'd like to mainly highlight here is that unlike (1), (2) is a mostly social phenomenon, so that beliefs about the feasibility of (2) can influence this very feasibility. To collectively decide not to build an existentially dangerous AI is very much a matter of curbing a race dynamic, be it between tech companies or between nations. Believing that others will not hold back may disincentivize a participant in the race from themselves holding back. This is why undue pessimism about (2) can become self-fulfilling, and for this reason I believe such pessimism about (2) to be much more lethal than a corresponding misjudgement about (1).6
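The self-fulfilling character of pessimism about (2) can be illustrated with a toy stag-hunt-style model of the race dynamic. This is my own sketch - the two-player framing and the payoff numbers are invented for illustration, not drawn from any source - but it captures the point: racing becomes each side's best response as soon as it expects the other side to race.

```python
HOLD, RACE = "hold", "race"

# payoff[my_action][their_action] -> my payoff (hypothetical numbers):
# mutual restraint is best for everyone, but racing hedges against
# the other side racing.
payoff = {
    HOLD: {HOLD: 3, RACE: 0},   # restraint pays only if reciprocated
    RACE: {HOLD: 2, RACE: 1},
}

def best_response(p_other_holds):
    """Expected-payoff-maximizing action, given my subjective probability
    that the other participant holds back."""
    ev_hold = p_other_holds * payoff[HOLD][HOLD] + (1 - p_other_holds) * payoff[HOLD][RACE]
    ev_race = p_other_holds * payoff[RACE][HOLD] + (1 - p_other_holds) * payoff[RACE][RACE]
    return HOLD if ev_hold > ev_race else RACE
```

With these payoffs, holding back is the best response exactly when one believes the other side holds back with probability above 1/2 - so spreading the belief that restraint is impossible is itself what destroys restraint.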

This brings me to former OpenAI employee Leopold Aschenbrenner's recent and stupendously interesting report Situational Awareness: The Decade Ahead.7 Nowhere else can we currently access a more insightful report about what is going on at the leading AI labs, how researchers there see the world, and the rough ride during the coming decade that the rest of the world can expect as a consequence of this. What I don't like, however, is the policy recommendations, which include the United States racing ahead as fast as possible towards AGI over the next few years. Somewhat arbitrarily (or at least with insufficiently explained reasons), Aschenbrenner expresses optimism about (1) but extreme pessimism about (2): the idea that the Chinese Communist Party might want to hold back from a world-destroying project is declared simply impossible unless their arm is twisted hard enough by an obviously superior United States. So while on one level I applaud Aschenbrenner's report for giving us outsiders this very valuable access to the inside view, on another level I fear that it will be counterproductive for solving the crucial global coordination problem in (2). And the combination of overoptimism regarding (1) and overpessimism regarding (2) seems super dangerous to me.

Footnotes

1) This was in the proceedings of a meeting held at the EU parliament on October 19, 2017. My discussion of the concepts of optimism and pessimism was provoked by how prominently these terms were used in the framing and marketing of the event.

2) Note here that in the quoted phrase I take both optimism and pessimism as deviations from what is justified by evidence - for instance, I don't here mean that taking the probability of things going well to be 99% automatically counts as optimistic. This is a bit of a deviation from standard usage, to which I will revert in what follows, instead using phrases like "overly optimistic" to indicate optimism in the sense I gave the term in 2017.

3) To be fair to my 2017 self, I did add some nuance already then: the acceptance of "a different kind of optimism which I am more willing to label as rational, namely to have an epistemically well-calibrated view of the future and its uncertainties, to accept that the future is not written in stone, and to act upon the working assumption that the chances for a good future may depend on what actions we take today".

4) As for myself, I discovered Yudkowsky's writings in 2008 or 2009, and insofar as I can point to any single text having convinced me about the unique importance of AI safety, it's his 2008 paper Artificial intelligence as a positive and negative factor in global risk, which despite all the water under the bridge is still worthy of inclusion on any AI safety reading list.

5) Yudkowsky should be credited with making this distinction. In fact, when the Overton window on AI risk shifted drastically in early 2023, he took that as a sufficiently hopeful sign to change his mind in the direction of a somewhat less pessimistic view regarding (2) - see his much-discussed March 2023 Time Magazine article.

6) I don't deny that, due to the aforementioned demoralization phenomenon, pessimism about (1) might also be self-fulfilling to an extent. I don't think, however, that this holds to anywhere near the same extent as for (2), where our ability to coordinate is more or less constituted by the trust that the various participants in the race have that it can work. Regarding (1), even if a grim view of its feasibility becomes widespread, I think AI researchers will still remain interested in making progress on the problem, because along with its potentially enormous practical utility, surely this is one of the most intrinsically interesting research questions one can possibly ask (up there with understanding biogenesis or the Big Bang or the mystery of consciousness): what is the nature of advanced intelligence, and what determines its goals and motivations?

7) See also Dwarkesh Patel's four-and-a-half-hour interview with Aschenbrenner, and Zvi Mowshowitz's detailed commentary.