Wednesday, June 18, 2025

Pro tip on discussions about AI xrisk: don't get sidetracked

In my experience, a fairly frequent dynamic in discussions about existential risk (xrisk) from AI is that my conversation partner remains skeptical about the reality of such risk, but before that issue has been even the slightest bit resolved they propose to change the subject to some neighboring topic, such as one of the following.
  • But if (for the sake of argument) the risk is actually real, is there anything at all we can do about it?
  • But doesn't this whole xrisk issue just distract from more pressing near-term AI risks which we ought to discuss instead?
  • But evolution moves on, so what's the big deal anyway if humanity is replaced by some superior new kind of beings?
In all three cases, my recommendation (based on years of experience from having these discussions) is to avoid getting sidetracked and to insist on getting clear on the is-AI-xrisk-a-real-thing issue before moving on, and I will explain why. The explanations will be a bit different in the three cases, so I'm not sure how well my advice generalizes to other change-of-topic proposals, and I can in fact easily think of other, more benign such proposals where I would happily oblige, such as "OK, let's agree to disagree, but how about if we go grab a beer and discuss [X]", where X could be anything ranging from tonight's football game to the scandal of Douglas Adams never having been awarded the Nobel Prize in literature. On to the three cases:

But is there anything we can do?

I think the question of what we can do to mitigate or entirely avoid AI xrisk is very important and very difficult, and I am super happy to discuss it, provided my discussion partner is on board with the idea that the risk may be real. If he or she is not, I will politely decline to discuss this topic, because from their point of view there is no real problem to be solved, and if I agree to discuss it anyway their contribution to the discussion therefore tends not to be very constructive. When we enter the realm of AI governance (as such discussions nowadays tend to do fairly quickly, because unlike just 4-5 years ago I no longer believe that technical AI alignment on its own has much chance of saving us from AI catastrophe without assistance from politics and legislation), they will bombard me with questions such as "What about China?", "What about Trump?", "What about the relentless market forces?", and so on. These are all valid questions, but as the deck is currently stacked they are also extremely difficult, to the extent that even a moderately clever discussion partner who is not interested in actually solving the problem but merely in playing devil's advocate is likely to win the argument and triumphantly conclude that I have no clear and feasible plan for avoiding catastrophe, so why am I wasting people's time by going on and on about AI xrisk?

And here's the thing. Many of those who play the devil's advocate in this way will be aiming for exactly that turn of the conversation, and will implicitly and possibly unconsciously believe that at that point, they have arrived at a reductio ad absurdum where the assumption that AI xrisk is real has been shown to be absurd and therefore false. But the reasoning leading to this reductio doesn't work, because it relies on (something like) the assumption that the universe is a sufficiently benign place not to put humanity in a situation where we are utterly doomed. Although this assumption is central to various Christian thinkers, it is in fact unwarranted, a horrible realization which is core to the so-called Deep Atheism of Eliezer Yudkowsky, further elaborated in recent work by Joe Carlsmith.

To reiterate, I do think that the AI governance questions on how to stop actors from building an apocalyptically dangerous AI are important, and I am very interested in discussing them. They are also difficult - difficult enough that I don't know of any path forward that will clearly work, yet we are far from having exhausted all such possibilities, so the task cannot at this stage be dismissed as impossible. I want to explore potential ways forward in intellectual exchanges, but am only prepared to do so with someone who actually wants to help, because the field is so full of real difficulties, of which we who work in it are so highly aware, that our primary need is not for additional devil's advocates to repeat these difficulties to us. Our primary need is for the discussions to be serious and constructive, and for that we need discussion partners who take seriously the possibility of AI xrisk being real.

But distraction?

So what about the suggestion to put the AI xrisk issue aside, on account of it just being a distraction from more pressing concerns coming out of present-day AI systems? These are concerns about things like AI bias, copyright issues, deepfakes and the AI-driven poisoning of our epistemic infrastructure. I have two problems with this suggestion.

The first is terminological. Calling those kinds of more down-to-Earth AI issues "near-term", in contrast to AI xrisk which is called "long-term", may have had some logic to it in the bygone era of the 2010s, when most of us working on AI xrisk thought that crucial events such as an intelligence explosion and/or the extinction of humanity were at least decades away. Now that there seems to be a very serious possibility that these may happen within the next five years or so (see, e.g., Daniel Kokotajlo et al.'s seminal AI 2027), insisting on this near-term vs long-term terminology has become highly misleading. My near-term survival may well depend on preventing an existential AI catastrophe!

My second problem with the change-of-topic suggestion is more substantial, and lies in whether the term "just" (as in "AI xrisk is just a distraction") is justified. Well, I claim that in order to judge that, we need to work out whether or not AI xrisk is a real thing. If it is not a real thing, then of course discussing it is just a distraction from more pressing real-world issues, and we should switch topics, whereas if it is a real thing, then of course it is a topic that warrants discussion, and not "just a distraction". Hence, to judge the case for changing topics on account of AI xrisk being "just a distraction", we have no choice but to continue discussing AI xrisk until we have reached a verdict on whether or not it is a real thing. As long as we disagree about that, the suggested change of topic is premature.

To avoid any misunderstanding here, let me emphasize that I think many down-to-Earth problems with present-day AI are important to discuss. But there are plenty of times and places to do so, in parallel with discussions elsewhere on AI xrisk. There really isn't any need to abort the few discussions taking place about AI xrisk to leave room for those other AI discussions. See my paper On the troubled relation between AI ethics and AI safety for more on this.

But evolution?

So what about the question of whether humanity being replaced by a society of advanced AIs is a good thing or a bad thing? This is an interesting and serious philosophical question, involving whether to employ an ethics that takes the point of view of humanity, or a more objective one that takes something closer to the point of view of the universe. There are surprisingly many thinkers within the AI sphere, including names like Larry Page, Robin Hanson, Hugo de Garis and Richard Sutton, who claim it is not a bad thing at all; see, e.g., this tweet by Andrew Critch and Section 7 of my paper Our AI future and the need to stop the bear. And yes, I am happy to discuss it. But not with someone who is not yet on board with AI xrisk being a real thing, because to them the issue is merely theoretical, making them less capable of seeing the seriousness of the matter and of feeling the moral force of wanting to prevent the omnicidal murder by AIs of you and me and all our loved ones, along with the remaining eight billion humans. If I agree to the proposed change of discussion topic, I run the risk of assisting my discussion partner, who was perhaps originally just driven by a lust for philosophical sophistication or contrarianism, in painting themselves into a philosophical corner, making it more difficult for them to wake up to the horror of omnicide once they realize it may actually happen.

Concluding remarks

Every discussion context is unique, and most of them are challenging in their own ways. In particular, standing on stage during a Q&A session with a room full of skeptics is usually more challenging than a one-on-one. I therefore fully expect to sometimes be sidetracked in future discussions in precisely the directions that I above recommend avoiding, and the same thing might well happen to the reader. But even when that happens, I believe having thought through the meta-concerns I raise above may be beneficial for the continued discussion.

Explaining AI xrisk convincingly to skeptics is not an easy thing to do, even if one's basic reasoning on the matter is correct. One reason for this is that people have such wildly varying intuitions on this topic, and tend to get hung up on very different issues - or in Liron Shapira's colorful language, there are many "stops where you could get off the Doom Train". Consequently, there are many different places where the conversation can go astray. Ideally, there would be a handbook cataloguing all of them, along with instructions on how to avoid or escape these traps, but for the time being we'll have to make do with more scattered and less systematic treatments of some of them, which is the kind of thing I try to do in this blog post.

If the reader aspires to become a better communicator of AI xrisk ideas, what can he or she do? I think this is very much a case where mastery comes with practice. It may take a long time, and after nearly 15 years of such practice I am still working on it. Along with that, it also helps to listen to masters like Rob Miles and the aforementioned Liron Shapira, and to read some modern classics such as the exemplary synthesis The Compendium by Connor Leahy and coauthors.
