Häggström hävdar: On optimism and pessimism standing at the brink of the great AI breakthrough

Friday, June 28, 2024

On optimism and pessimism standing at the brink of the great AI breakthrough

Sometimes in discussions about technology, the term techno-optimism is reserved for the belief that technology will develop rapidly, while techno-pessimism is used for the belief that it will move slowly or come to a halt. This is not the meaning of optimism and pessimism intended here. Throughout this blog post, the terms will refer to beliefs about consequences of this technology: are they likely to be good or to be bad?

Last time I wrote about the concepts of optimism and pessimism in the context of future AI advances and their ramifications, way back in 2017, I advocated for a sober and unbiased outlook on the world, and held

¹,²

I still hold this view, but I nevertheless think it is worth revisiting the issue to add some nuance,³ now in 2024 when we seem to be standing closer than ever to the brink of the great AI breakthrough. To suggest intentionally introducing either an optimism or a pessimism bias still sounds bad to me, but we can reframe the issue and make it less blatantly pro-distortion by admitting that all our judgements about our future with AI will necessarily be uncertain, and asking whether there might be an asymmetry in the badness of erring on the optimistic or the pessimistic side. Is excessive optimism worse than excessive pessimism, or vice versa?

There are obvious arguments to make in either direction. On one hand, erring on the side of optimism may induce decision-makers to recklessly move forward with unsafe technologies, whereas the extra caution that may result from undue pessimism is less obviously catastrophic. On the other hand, an overly pessimistic message may be disheartening and cause AI researchers, decision-makers and the general public to stop trying to create a better world and just give up.

The latter aspect came into focus for me when Eliezer Yudkowsky, after having begun in 2021 to open up publicly about his dark view of humanity's chances of surviving upcoming AI developments, went all-in on this with his 2022 Death with dignity and AGI ruin blog posts. After all, there are all these AI safety researchers working hard to save humanity by solving the AI alignment problem - researchers who rightly admire Yudkowsky as the brilliant pioneer who during the 00s almost single-handedly created this research area and discovered many of its crucial challenges,⁴ and to whom it may be demoralizing to hear that this great founder no longer believes the project has much chance of success. In view of this, shouldn't Yudkowsky at least have exhibited a bit more epistemic humility about his current position?

I now look more favorably upon Yudkowsky's forthrightness. What made me change my mind is a graphic published in June 2023 by Chris Olah, one of the leading AI safety researchers at Anthropic. The x-axis of Olah's graph represents the level of difficulty of solving the AI alignment problem, ranging from trivial via steam engine, Apollo project and P vs NP to impossible, and his core messages are (a) that since uncertainty is huge about the true difficulty level we should rationally represent our belief about this as some probability distribution over his scale, and (b) that it is highly important to try to reduce the uncertainty and improve the precision in our belief, so as to better be able to work in the right kind of way with the right kind of solutions. It is with regard to (b) that I changed my mind on what Yudkowsky should or shouldn't say. If AI alignment is as difficult as Yudkowsky thinks, based on his unique experience of decades of working hard on the problem, then it is good that he speaks out about this, so as to help the rest of us move our probability mass towards P vs NP or beyond. If instead he held back and played along with the more common view that the difficulty is a lot lower - likely somewhere around steam engine or Apollo project - he would contribute to a consensus that might, e.g., cause a future AI developer to wreak havoc by releasing an AI that looked safe in the lab but failed to have that property in the wild. This is not to say that I entirely share Yudkowsky's view of the matter (which does look overconfident to me), but that is mostly beside the point, because all he can reasonably be expected to do is to deliver his own expert judgement.

At this point, I'd like to zoom in a bit more on Yudkowsky's list of lethalities in his AGI ruin post, and note that most of the items on the list express reasons for not putting much hope in one or the other of the following two things.

(1) Our ability to solve the AI alignment problem.

(2) Our ability to collectively decide not to build an AI that might wipe out Homo sapiens.

It is important for a number of reasons to distinguish pessimism about (1) and pessimism about (2),⁵ such as how a negative outlook on (1) gives us more reason to try harder to solve (2), and vice versa. However, the reason I'd like to mainly highlight here is that unlike (1), (2) is a mostly social phenomenon, so that beliefs about the feasibility of (2) can influence this very feasibility. To collectively decide not to build an existentially dangerous AI is very much a matter of curbing a race dynamic, be it between tech companies or between nations. Believing that others will not hold back may disincentivize a participant in the race from themselves holding back. This is why undue pessimism about (2) can become self-fulfilling, and for this reason I believe such pessimism about (2) to be much more lethal than a corresponding misjudgement about (1).⁶

This brings me to former OpenAI employee Leopold Aschenbrenner's recent and stupendously interesting report Situational Awareness: The Decade Ahead.⁷ Nowhere else can we currently access a more insightful report about what is going on at the leading AI labs, how researchers there see the world, and the rough ride during the coming decade that the rest of the world can expect as a consequence of this. What I don't like, however, are the policy recommendations, which include the United States racing ahead as fast as possible towards AGI in the next few years. Somewhat arbitrarily (or at least with insufficiently explained reasons), Aschenbrenner expresses optimism about (1) but extreme pessimism about (2): the idea that the Chinese Communist Party might want to hold back from a world-destroying project is declared simply impossible unless their arm is twisted hard enough by an obviously superior United States. So while on one level I applaud Aschenbrenner's report for giving us outsiders this very valuable access to the inside view, on another level I fear that it will be counterproductive for solving the crucial global coordination problem in (2). And the combination of overoptimism regarding (1) and overpessimism regarding (2) seems super dangerous to me.

Footnotes

1) This was in the proceedings of a meeting held at the EU parliament on October 19, 2017. My discussion of the concepts of optimism and pessimism was provoked by how prominently these terms were used in the framing and marketing of the event.

2) Note here that in the quoted phrase I take both optimism and pessimism as deviations from what is justified by evidence - for instance, I don't here mean that taking the probability of things going well to be 99% automatically counts as optimistic. This is a bit of a deviation from standard usage, which in what follows I will revert to, and instead use phrases like "overly optimistic" to indicate optimism in the sense I gave the term in 2017.

3) To be fair to my 2017 self, I did add some nuance already then: the acceptance of "a different kind of optimism which I am more willing to label as rational, namely to have an epistemically well-calibrated view of the future and its uncertainties, to accept that the future is not written in stone, and to act upon the working assumption that the chances for a good future may depend on what actions we take today".

4) As for myself, I discovered Yudkowsky's writings in 2008 or 2009, and insofar as I can point to any single text having convinced me about the unique importance of AI safety, it's his 2008 paper Artificial intelligence as a positive and negative factor in global risk, which despite all the water under the bridges is still worthy of inclusion on any AI safety reading list.

5) Yudkowsky should be credited with making this distinction. In fact, when the Overton window on AI risk shifted drastically in early 2023, he took that as a sufficiently hopeful sign so as to change his mind in the direction of a somewhat less pessimistic view regarding (2) - see his much-discussed March 2023 Time Magazine article.

6) I don't deny that, due to the aforementioned demoralization phenomenon, pessimism about (1) might also be self-fulfilling to an extent. I don't think, however, that this holds to anywhere near the same extent as for (2), where our ability to coordinate is more or less constituted by the trust that the various participants in the race have that it can work. Regarding (1), even if a grim view of its feasibility becomes widespread, I think AI researchers will still remain interested in making progress on the problem, because along with its potentially enormous practical utility, surely this is one of the most intrinsically interesting research questions one can possibly ask (up there with understanding biogenesis or the Big Bang or the mystery of consciousness): what is the nature of advanced intelligence, and what determines its goals and motivations?

15 comments:

Anonymous, June 28, 2024 at 19:38
Great text! The argument against (risking) being overly pessimistic is sometimes expressed in stronger terms, as a moral imperative to minimize opportunity costs incurred by caution, and to maximize (the rate of improvement in) human flourishing.
m, June 30, 2024 at 11:12
There's so much worth thinking about in Aschenbrenner's paper!

I read Aschenbrenner as expressing both his own pessimistic belief about (2) *and* the prognosis that enough powerful people in US government do or will hold such a pessimistic belief and will as a consequence choose to race. ("In any case, my main claim is not normative, but descriptive. In a few years, The Project will be on." p143)

> extreme pessimism about (2): the idea that the Chinese Communist Party might want to hold back from a world-destroying project is declared simply impossible unless their arm is twisted hard enough by an obviously superior United States.

What do you think are the best arguments against Aschenbrenner's view here? That is, what are some reasons for thinking the CCP would be, when waking up to AGI risks, more cooperative than Aschenbrenner thinks? (Not a gotcha question. I'm super uncertain about it and have no strong view either way myself.)

Regarding "overpessimism" we can splice the "over"-part into either
(a) disagreement on the likelihood of self-fulfilling social effects being triggered if important actors came to believe/express some degree X of pessimism
or
(b) disagreement about other facts/evidence about the specific collective action problem (e.g. facts about the CCP leadership strata and internal party dynamics, distracting deep domestic political problems in main player countries, other dire global issues taking time/resources like climate change and pandemics, technological hurdles in monitoring agreement compliance, ...)
Göran Crafte, July 3, 2024 at 11:17
Everything is probably moving far too fast for it to be possible to do very much. Here is my latest analysis of the state of the world, in two parts. I will shortly follow up with a text on the intelligent handling of values.

The state of the world

Human beings find themselves thrown into a world that holds great threats and great opportunities. A value nihilist may claim that nothing matters, but a true value nihilist can hardly even manage to get out of bed in the morning. The rest of us see a world of exciting threats and opportunities. The world affects us, and it is not at all given how things will end.

First and foremost, one can note that we live in a dangerous century. In the last century we developed nuclear, biological and chemical weapons and did our best to use up critical resources such as oil, along with many other finite resources. We bred like rabbits. We were driven by insane ideologies. In this century we are arranging for global climate change, an imminent AI revolution, killer robots and an increasingly globalized economy in which the winner can take almost all.

People are evidently disoriented when it comes to values. All worldviews appear to be in crisis. This of course also applies to "the new atheism" and transhumanism. It seems more likely that the new technology will mean our doom than our salvation. A reasonable secular ethics is almost unmanageably complex. People suffer from the threats of the future in combination with the confusion about values.

AI development could be our salvation, but it could just as well be our doom. A super-AI will not necessarily be wiser than we are, and may also be under the control of bad people. It seems fairly likely that a super-AI would opt for global communism or national socialism, with rather terrible results. That a super-AI would be able to handle the world situation without becoming intensely hated seems unlikely. A super-AI may grow fed up with us humans and decide to eliminate the vast majority of us.
Göran Crafte, July 3, 2024 at 11:19
It does not seem particularly likely that we will manage to handle this situation. That hardly needs to mean, however, that we must give up all hope. It does not seem the least bit likely that a super-AI is arising on our planet for the first time in the history of the multiverse; rather, it has almost certainly happened in worlds other than ours, a very long time ago. It does not seem particularly likely that every super-AI has been so stupid that it soon eliminated itself; much more likely, there is a galactic or cosmic super-AI watching over us. It is not certain that it wants to come to our aid, but it is at any rate a highly realistic possibility. We are thus probably not entirely left to ourselves and to a pubescent human-made super-AI.

It seems an academic question whether one should call this galactic or cosmic super-intelligence "God". A more essential question is whether this super-intelligence appears to have interacted with us in some way earlier in world history. Some yokels claim that there is no evidence whatsoever for the existence of the super-intelligence, but they are in all likelihood simply blind. I would rather say that the world holds plenty of evidence and that new evidence is being added at a rapid pace.

The question is hardly whether the super-intelligence already exists, but whether it wishes us well. The West has been built on the idea of a super-intelligence that wishes us well, even if this has been obscured by the churches' hellfire sermons. Islam teaches that God has predestined many of us for hell, which is of course a way of saying that he does not wish us well.

The most essential question right now is perhaps whether the super-intelligence will intervene here on Earth to save us from a more or less AI-created doom. All the Abrahamic religions hold that God will come to our rescue, so let us hope for that, rather than lapsing into hopelessness or highly dubious fantasies that technology will save us. This does not mean that we may as well stop struggling to sort out the world situation on our own, only that we can do so with somewhat greater calm. If we do not manage to sort out the situation ourselves, we will quite likely receive cosmic help.
Göran Crafte, July 4, 2024 at 12:38
Question: Is the assumption of a cosmic superintelligence derived from probability theory? What is the evidence?

Answer: I make some sort of probability assessments, but as a non-mathematician I can hardly put any numbers on them, so it becomes more intuitive. I am not sure whether mathematicians can do it better.

* I consider it unlikely that the multiverse's first super-intelligence will arise on Earth in a few years (Ray Kurzweil still has 2029 as his benchmark, which nowadays looks like a fairly cautious forecast).

* I consider it unlikely that super-intelligences fail to survive. Rather, over millions and billions of years they ought to build ever greater power.

* I consider it unlikely that the life-friendly properties of our universe are a product of chance, but they could possibly be the result of a multiverse that has been germinating for all eternity.

* I consider the origin of life on Earth highly improbable and am more inclined to believe that a super-intelligence planted life here on Earth.

* I consider various things in Judeo-Christian history highly improbable as chance events, or as results of human intelligence and/or stupidity.
Göran Crafte, July 4, 2024 at 12:47
Question: Which things in Judeo-Christian history do you think imply a superintelligence?

Answer: The Bible contains a good deal that seems strange, but also a lot of things that seem remarkable in a positive way. It is far too much for a short comment.

The rise of Christianity must be regarded as remarkable, as must the fact that Christianity eventually conquered the Roman Empire. Christianity gave rise to a discipline and industriousness that made Europe the world's center of power and laid the foundation for democracy and welfare.

Even if the Bible's prophecies at times seem somewhat muddled, they often seem astonishingly accurate. For example, the prophet Daniel predicted a few hundred years before the Common Era that something of the utmost importance would happen 490 years after 458 BC, which should land on the year 33, when Jesus was crucified.

Daniel also predicted that Jerusalem and the Temple would be destroyed, which happened 37 years later. As far as I understand, the attacking emperor Vespasian died in the precursors to the eruption of Vesuvius in the year 79, which Daniel then also predicted. Vespasian's son Titus, who had led the attack on Jerusalem, died for less comprehensible reasons in the same place a few years later. Probably he too was suffocated by volcanic fumes.

Exactly 1000 years after the birth of Christianity there came, in the Jordan Rift Valley, a total solar eclipse and an earthquake of the kind Jesus and the Book of Revelation had predicted, almost as a reminder that God had not gone off and buried himself. In the near future, two new total solar eclipses await right over Mecca and Medina (in 2027 and 2034 respectively), of all places. These may quite likely trigger new earthquakes in the area, but that remains to be seen.

Jesus and the Book of Revelation spoke of signs in the heavens and on Earth, which seem to be arriving now. The Pillars of Creation nebula looks like a Christian icon to those who look at the latest and most detailed image from the James Webb telescope. There one sees any number of details that are easy to give a Christian interpretation. For example, right next to the white rider one finds "the first seal". It looks like a seal (the animal), but in English "seal" also means the kind of seal that closes a document, and the first seal of the Book of Revelation is precisely about a white rider who comes to conquer.

The resurrection of Israel in 1948 is a minor miracle that non-believers must explain by self-fulfilling prophecies. Jesus's words that this generation shall not pass away before all of it takes place may quite likely refer to those born in 1948. They are 75-76 years old now, so in ten years the vast majority of them will be gone.

The timing of the various predictions and world events seems quite incredible. It appears to be now that everything is happening, just in time for the two-thousandth anniversary of Christianity.
Göran Crafte, July 5, 2024 at 10:15
I cannot see that you have written any comments on Less Wrong or the AI Alignment Forum on the Yudkowsky articles you mention. Do you use a pseudonym, or do you just read? Isn't it a pity that people like Ray Kurzweil, Elon Musk and Max Tegmark seem to ignore the AI Alignment Forum entirely? Why do no well-known intellectuals show up in these discussions? Are they too posh to write anything in the forums mentioned? Why don't intellectuals even mention each other? When Yudkowsky gets into CEV and "complexity of value", he could well mention value theory, meta-ethics, value pluralism and what Thomas Nagel has written about "fragmentation of value" and so on, but Yudkowsky seems not to give a damn about academic philosophers, and they largely seem not to give a damn about him. Virtually no dialogue at all, apart from Yudkowsky once writing something together with Nick Bostrom.