Showing posts with the label Eliezer Yudkowsky. Show all posts

Wednesday, 18 June 2025

Pro tip on discussions about AI xrisk: don't get sidetracked

In my experience, a quite frequent dynamic in discussions about existential risk (xrisk) from AI is that my conversation partner remains skeptical about the reality of such risk, but before that issue has been even the slightest bit resolved they propose to change the subject to some neighboring topic, such as one of the following.
  • But if (for the sake of argument) the risk is actually real, is there anything at all we can do about it?
  • But doesn't this whole xrisk issue just distract from more pressing near-term AI risks which we ought to discuss instead?
  • But evolution moves on, so what's the big deal anyway if humanity is replaced by some superior new kind of beings?
In all three cases, my recommendation (based on years of experience from having these discussions) is to avoid getting sidetracked and to insist on getting clear on the is-AI-xrisk-a-real-thing issue before moving on, and I will explain why. The explanations will be a bit different in the three cases, so I'm not sure how well my advice generalizes to other change-of-topic proposals, and I can in fact easily think of other more benign such proposals where I would happily oblige, such as "OK, let's agree to disagree, but how about if we go grab a beer and discuss [X]", where X could be anything ranging from tonight's football game to the scandal of Douglas Adams never having been awarded the Nobel Prize in Literature. On to the three cases:

But is there anything we can do?

I think the question of what we can do to mitigate or entirely avoid AI xrisk is very important and very difficult, and I am super happy to discuss it, provided my discussion partner is on board with the idea that the risk may be real. If he or she is not, I will politely decline to discuss this topic, because from their point of view there is no real problem to be solved, and if I agree to discuss it anyway their contribution to the discussion therefore tends not to be very constructive. When we enter the realm of AI governance (as the discussion nowadays tends to do fairly quickly, because as opposed to just 4-5 years ago I no longer believe that technical AI alignment on its own has much chance of saving us from AI catastrophe without assistance from politics and legislation), they will bombard me with questions such as "What about China?", "What about Trump?", "What about the relentless market forces?", and so on. These are all valid questions, but as the deck is currently stacked they are also extremely difficult, to the extent that even a moderately clever discussion partner who is not interested in actually solving the problem but merely in playing devil's advocate is likely to win the argument and triumphantly conclude that I have no clear and feasible plan for avoiding catastrophe, so why am I wasting people's time by going on and on about AI xrisk?

And here's the thing. Many of those who play the devil's advocate in this way will be aiming for exactly that turn of the conversation, and will implicitly and possibly unconsciously believe that at that point, they have arrived at a reductio ad absurdum where the assumption that AI xrisk is real has been shown to be absurd and therefore false. But the reasoning leading to this reductio doesn't work, because it relies on (something like) the assumption that the universe is a sufficiently benign place to not put humanity in a situation where we are utterly doomed. Although this assumption is central to various Christian thinkers, it is in fact unwarranted, a horrible realization which is core to the so-called Deep Atheism of Eliezer Yudkowsky, further elaborated in recent work by Joe Carlsmith.

To reiterate, I do think that the AI governance questions on how to stop actors from building an apocalyptically dangerous AI are important, and I am very interested in discussing them. They are also difficult - difficult enough that I don't know of any path forward that will clearly work, yet we have far from exhausted all such possibilities, so the challenges cannot at this stage be dismissed as impossible. I want to explore potential ways forward in intellectual exchanges, but am only prepared to do it with someone who actually wants to help, because the field is so full of real difficulties of which we who work in it are so highly aware that our primary need is not for additional devil's advocates to repeat these difficulties to us. Our primary need is for the discussions to be serious and constructive, and for that we need discussion partners who take seriously the possibility of AI xrisk being real.

But distraction?

So what about the suggestion to put the AI xrisk issue aside, on account of it just being a distraction from more pressing concerns coming out of present-day AI systems? These are concerns about things like AI bias, copyright issues, deepfakes and the AI-driven poisoning of our epistemic infrastructure. I have two problems with this suggestion.

The first is terminological. Calling those kinds of more down-to-Earth AI issues "near-term", in contrast to AI xrisk which is called "long-term", may have had some logic to it in the bygone era of the 2010s when most of us working on AI xrisk thought the crucial events such as an intelligence explosion and/or the extinction of humanity were at least decades away. Now that there seems to be a very serious possibility that these may happen within the next five years or so (see, e.g., Daniel Kokotajlo et al.'s seminal AI 2027), insisting on this near-term vs long-term terminology has become highly misleading. My near-term survival may well depend on preventing an existential AI catastrophe!

My second problem with the change-of-topic suggestion is more substantial, and lies in whether the term "just" (as in "AI xrisk is just a distraction") is justified. Well, I claim that in order to judge that, we need to work out whether or not AI xrisk is a real thing. If it is not a real thing, then of course discussing it is just a distraction from more pressing real-world issues, and we should switch topics, whereas if it is a real thing, then of course it is a topic that warrants discussion, and not "just a distraction". Hence, to judge the case for changing topics on account of AI xrisk being "just a distraction", we have no choice but to continue discussing AI xrisk until we have reached a verdict on whether or not it is a real thing. As long as we disagree about that, the suggested change of topic is premature.

To avoid any misunderstanding here, let me emphasize that I think many down-to-Earth problems with present-day AI are important to discuss. But there are plenty of times and places to do so, in parallel with discussions elsewhere on AI xrisk. There really isn't any need to abort the few discussions taking place about AI xrisk to leave room for those other AI discussions. See my paper On the troubled relation between AI ethics and AI safety for more on this.

But evolution?

So what about the question of whether humanity being replaced by a society of advanced AIs is a good thing or a bad thing? This is an interesting and serious philosophical question, involving whether to employ an ethics that is from the point of view of humanity, or a more objective one that is more from the point of view of the universe. There are surprisingly many thinkers within the AI sphere, including names like Larry Page, Robin Hanson, Hugo de Garis and Richard Sutton, who claim it is not a bad thing at all; see, e.g., this tweet by Andrew Critch and Section 7 of my paper Our AI future and the need to stop the bear. And yes, I am happy to discuss it. But not with someone who is not yet on board with AI xrisk being a real thing, because to them the issue is merely theoretical, making them less capable of seeing the seriousness of the matter and of feeling the moral force of wanting to prevent the omnicidal murder by AIs of you and me and all our loved ones along with the remaining eight billion humans. If I agree to the proposed change of discussion topic, I run the risk of assisting my discussion partner, who was perhaps originally just driven by a lust for philosophical sophistication or contrarianness, in painting themselves into a philosophical corner, making it more difficult for them to wake up to the horror of omnicide once we get to the point where they realize it may actually happen.

Concluding remarks

Every discussion context is unique, and most of them are challenging in their own ways. In particular, standing on stage during a Q&A session with a room full of skeptics is usually more challenging than a one-on-one. I therefore fully expect to sometimes be sidetracked in future discussions in precisely the directions that I above recommend avoiding, and the same thing might well happen to the reader. But even when that happens, I believe having thought through the meta-concerns I raise above may be beneficial for the continued discussion.

Explaining AI xrisk convincingly to skeptics is not an easy thing to do, even if one's basic reasoning on the matter is correct. One reason for this is that people have such wildly varying intuitions on this topic, and tend to get hung up on very different issues - or in Liron Shapira's colorful language, there are many "stops where you could get off the Doom Train". Consequently, there are many different places where the conversation can go astray. Ideally it would be good to have a handbook, cataloguing all of them along with instructions on how to avoid or escape these traps, but for the time being we'll have to make do with more scattered and less systematic treatments of some of these traps, which is the kind of thing I try to do in this blog post.

If the reader aspires to become a better communicator of AI xrisk ideas, what can he or she do? I think this is very much a case where mastery comes with practice. It may take a long time, and after nearly 15 years of such practice I am still working on it. Along with that, it also helps to listen to masters like Rob Miles and the aforementioned Liron Shapira, and to read some modern classics such as the exemplary synthesis The Compendium by Connor Leahy and coauthors.

Tuesday, 20 May 2025

A fateful question in Lund

Last week I spoke in Lund, at the Royal Physiographic Society's annual Rausing Symposium, which this year was titled Behövs våra hjärnor? (Are our brains needed?) with the subtitle AI - hot och möjligheter (AI - threats and opportunities). The symposium as a whole is available as a YouTube video. My own contribution was titled AI är mänsklighetens största ödesfråga (AI is humanity's greatest fateful question) and occupies 24:00-58:30 in the video, but there are also a number of other talks worth listening to, of which I most of all want to recommend my friend Thore Husfeldt's contribution AI och undervisning (AI and teaching, 4:25:00-5:00:30), which I can honestly say is both the most sensible and the most well-articulated thing I have heard on the burning question of how the education system should relate to the new AI technology.

The symposium will also be published in book form, and today I delivered to the organizers my text AI är mänsklighetens största ödesfråga, which admittedly is not a verbatim rendering of the talk, but which follows it fairly closely in both content and form. I would of course like you to read the four-and-a-half-page text in its entirety, but here, as a small taste, are a few paragraphs taken from it:
    I have great respect for the fact that there are issues other than AI - climate change, for example - that can lay claim to being great fateful questions for humanity, but in recent years I have come to the conclusion that the question of how we handle AI development is the greatest and most urgent of them all. I will explain why, but want to begin at a more down-to-earth end.

    [...]

    One type of task where AI has made particularly dramatic progress in recent years, and which may come to be of great importance for the continued development as a whole, is coding and software development. The ability of AI systems to write correct code and otherwise solve tasks correctly depends strongly on the size of the task. A report from the AI safety organization METR in March this year studies how this ability has developed over time. It turns out that the size - measured in the time required by a human expert - of the tasks that AI can handle has increased from a few seconds in 2019 to about one hour today. The increase is exponential, with an observed average doubling time of seven months, and if one extrapolates that trend a mere two or three years into the future the result is dramatic. Such curve fitting of course involves large uncertainties, but judging by how the models have improved from 2024 onwards, development rather seems to be on its way to going even faster.

    It is, among other things, this kind of data that underlies the solid report AI 2027, published in April this year and authored by a quintet of researchers headed by the OpenAI defector Daniel Kokotajlo. The report is the most ambitious and best that has so far been written when it comes to detailed predictions of the coming years' AI development. The uncertainties are, as mentioned, large, but successively and month by month the authors work out what they see as the most likely course of events. Central to this course is how AI, as a result of the development demonstrated by the METR report among others, in 2027 reaches a point where it is as skilled an AI developer as today's foremost ones of flesh and blood. Because the leading AI companies can then put hundreds of thousands or millions of such AIs to work, this leads within a few months to so-called superintelligence - AI that vastly surpasses humans across the whole spectrum of relevant capabilities.

    [...]

    The AI alignment pioneer Eliezer Yudkowsky can, with only mild exaggeration, be said to have single-handedly laid the foundation for the field during the 00s. In an influential paper from 2008 he describes what he judges to be the default scenario if we fail at or simply ignore AI alignment: "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else".

    We of course do not want to end up in such a situation, and therefore we need to solve AI alignment in time. So how much time do we have? Nobody knows for sure, and the only judicious thing is to admit that great uncertainty prevails, but I maintain that we should take seriously the view, increasingly widespread in the AI ecosystem of San Francisco and Silicon Valley, that superintelligent AI may become a reality within a time span measured in a few years rather than decades.

    [...]

    To give AI alignment research a chance to catch up, I believe we need to pull the emergency brake on the development of the very most powerful AI systems. This is made harder, however, by the prevailing race situation, both between individual AI companies and between countries (chiefly the US and China). The generally hardening international climate since Trump's second inauguration does not make things easier either. A low-water mark for the global AI discourse was reached at the AI Action Summit in Paris in February this year, where safety issues were swept under the rug while top politicians outbid each other in how many billions they intended to invest in AI development. Worst of all was how US Vice President JD Vance in his speech expressed undisguised contempt for AI safety, declaring that he "was not there to talk about AI safety" and that "our AI future is not won by hand-wringing about safety but by building". His hope a little later in the same speech that "the AI economy will [...] transform the world of atoms" sends, for anyone who remembers Yudkowsky's above-quoted one-liner about AI and atoms, an icy shiver down the spine.

    This says something about the forces we need to overcome if we are to get AI development in order and steer it in a direction more favorable to humanity than where we currently seem to be heading. But the future is not carved in stone, and I still believe it is possible to prevent an AI catastrophe. Something that would improve the odds further would be if we manage to mobilize the public opinion against the creation of superhumanly intelligent AI that, according to various opinion polls, appears to exist. So please help spread the message!

Read the whole text here!
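
The extrapolation mentioned in the excerpt above is easy to reproduce with a few lines of arithmetic. What follows is a minimal sketch and not METR's own methodology: it assumes a clean exponential with the quoted seven-month doubling time, starting from a task horizon of exactly one hour today. The function name `projected_horizon_hours` and the two constants are illustrative and not taken from the report, and a constant doubling time is itself an assumption (the excerpt notes that the trend may in fact be accelerating).

```python
# Back-of-the-envelope extrapolation of the task-horizon trend quoted above.
# Assumptions (illustrative, not from the METR report itself): a constant
# doubling time and a starting horizon of exactly one hour.

DOUBLING_TIME_MONTHS = 7.0    # quoted average doubling time: seven months
CURRENT_HORIZON_HOURS = 1.0   # quoted current horizon: about one hour


def projected_horizon_hours(months_ahead: float,
                            current_hours: float = CURRENT_HORIZON_HOURS,
                            doubling_months: float = DOUBLING_TIME_MONTHS) -> float:
    """Task length (in human-expert hours) that AI handles after `months_ahead` months."""
    # Each elapsed doubling period multiplies the horizon by two.
    return current_hours * 2.0 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    for years in (1, 2, 3):
        hours = projected_horizon_hours(12 * years)
        print(f"{years} year(s) ahead: about {hours:.1f} expert-hours per task")
```

Under these assumptions the horizon grows from one hour to roughly 11 hours after two years and roughly 35 hours, close to a full working week, after three. A shorter doubling time would push these numbers up considerably.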

Monday, 8 July 2024

On Anthropic's call for proposals for third-party model evaluations

Nick Bostrom's modern classic Superintelligence: Paths, Dangers, Strategies from 2014 is full of interesting ideas.1 Some of them have a scary quality to them, and the one that I found scariest of all, back when I read the book, is what he calls the treacherous turn - the idea that a sufficiently intelligent AI which discovers a discrepancy between its own goals and motivations and those of us humans is likely to hide its capabilities and/or true intentions and to keep a low profile, quietly improving its situation and its capabilities until one day it judges the coast to be clear for moving forward at full speed towards whatever its goal is, be it paperclip production or something entirely different and incomprehensible to us. I remember not being fully prepared to take the idea in, but expecting or hoping that it would soon be demoted to a noteworthy technicality that AI alignment research has demonstrated an easy way to defuse.

This has not happened. On the contrary, the treacherous turn phenomenon lies at the heart of the fundamental problem with evaluating the safety of advanced AI models that has become increasingly recognized in recent years. In short, we do not know how to establish the absence of dangerous capabilities in AI models without the a priori assumption that they do not possess superhuman capabilities for deception and social manipulation, making the argument for the models' safety in part circular. With increasingly capable large language models, this problem becomes increasingly pressing and has been discussed both in popular media and in articles by leading AI researchers, as well as by the AI company Anthropic2 in their recent document A new initiative for developing third-party model evaluations:
    Our research shows that, under some circumstances, AI models can learn dangerous goals and motivations, retain them even after safety training, and deceive human users about actions taken in their pursuit. These abilities, in combination with the human-level persuasiveness and cyber capabilities of current AI models, increases our concern about the potential actions of future, more-capable models. For example, future models might be able to pursue sophisticated and hard-to-detect deception that bypasses or sabotages the security of an organization, either by causing humans to take actions they would not otherwise take or exfiltrating sensitive information. We propose to develop evaluations that would monitor such abilities.
Given the current situation, it's a good thing that Anthropic makes this call for help with the problem of safety evaluation of advanced AI models. Even better, one might argue, would be if they refrained from building models whose safety cannot be ascertained with existing evaluation methods.3

Footnotes

1) The same is true about Bostrom's latest book Deep Utopia: Life and Meaning in a Solved World, which I hope to come back to in another blog post later this year.

2) Anthropic's rise, since it was founded in early 2021, has been meteoric. They are now unquestionably among the world's three leading frontier AI developers (along with OpenAI and Google/DeepMind). With their release of Claude 3.5 Sonnet on June 20 this year, we have for the first time since I became aware in 2019 of the rapid rise of large language models a situation where a model not from OpenAI has a significant lead over other publicly available models in terms of overall cognitive capabilities. To understand on a gut level how intelligent Claude is, there really is no substitute for talking to it yourself (which anyone is free to do), but let me offer an amusing sample. After having had a bit of back-and-forth with Claude regarding who are the top ten greatest chess players of all time, I switched topics a bit:

    OH: Here's a harder list-making challenge: who are the 10 greatest AI researchers of all time?

    Claude: Creating a list of the 10 greatest AI researchers of all time is indeed a challenging task, as the field is vast and has evolved significantly over the decades. Additionally, contributions can be theoretical, practical, or both, making comparisons difficult. However, I'll attempt to create a list based on historical impact, influential work, and contributions to the field. Please note that this list is subjective and others might have different opinions:

    1. Alan Turing - For his foundational work on computation and the Turing Test.

    2. Marvin Minsky - Co-founder of MIT's AI lab and author of influential works on AI.

    3. John McCarthy - Coined the term "Artificial Intelligence" and invented LISP.

    4. Judea Pearl - For his work on Bayesian networks and causal reasoning.

    5. Geoffrey Hinton - Pioneer in deep learning and neural networks.

    6. Yann LeCun - Made significant contributions to convolutional neural networks.

    7. Yoshua Bengio - Key figure in the development of deep learning.

    8. Stuart Russell - Co-author of a seminal AI textbook and researcher in multiple AI fields.

    9. Andrew Ng - For his work in machine learning and deep learning, and for making AI education widely accessible.

    10. Demis Hassabis - Co-founder of DeepMind and key figure in reinforcement learning and AI game-playing systems.

    This list attempts to balance historical pioneers with more recent contributors who have shaped modern AI. It's worth noting that many other researchers could justifiably be included, such as Herbert Simon, Allen Newell, Seymour Papert, Jürgen Schmidhuber, Fei-Fei Li, and many others.

    The field of AI is collaborative and builds upon the work of many researchers, so any such list will inevitably omit many important contributors. Also, as AI continues to evolve rapidly, the landscape of influential researchers is likely to change.

    Would you like to discuss any of these researchers in more detail or explore other aspects of AI history?

    OH: Turing on top of the list - I like that! Did you by any chance consider including Ilya Sutskever?

    Claude: I'm glad you appreciate seeing Turing at the top! His contributions to the foundations of computer science and AI are truly monumental.

    Regarding Ilya Sutskever, you raise an excellent point. He's certainly a strong candidate for such a list, and I can see why you'd bring him up. Sutskever has made significant contributions to the field of deep learning and is a key figure in modern AI research. Here's a brief overview of why he could be considered for a top AI researchers list:

    1. He's a co-founder and Chief Scientist of OpenAI, one of the leading AI research organizations.

    2. Sutskever was a key contributor to AlexNet, which was a breakthrough in computer vision using deep learning.

    3. He's made important contributions to sequence-to-sequence learning, which has applications in machine translation and other natural language processing tasks.

    4. His work on improving training techniques for deep neural networks has been influential in the field.

    5. Sutskever has been involved in the development of GPT (Generative Pre-trained Transformer) models, which have revolutionized natural language processing.

    Given these contributions, Sutskever could certainly be considered for inclusion in a top 10 list, especially if focusing on more recent impactful work in deep learning and neural networks. His omission from my initial list reflects the difficulty of narrowing down to just 10 names in such a rich and rapidly evolving field.

    If you were to include Sutskever, who would you remove from the list to make room for him? Or would you structure the list differently to accommodate more recent contributors like Sutskever?

    OH: The biggest surprise to me on your list is Judea Pearl. I'm aware of his work, and respect it, but probably would not have included him on my top 10.

    Another name I'd consider for inclusion, although I realize it might be controversial, is Eliezer Yudkowsky.

    Claude: Your perspective on Judea Pearl is interesting and highlights the subjective nature of these lists. Pearl's inclusion is often based on his groundbreaking work on Bayesian networks and causal reasoning, which has had a significant impact on how we approach probabilistic reasoning in AI. However, I can understand why his work might not seem as directly relevant to the current deep learning-dominated landscape of AI.

    Regarding Eliezer Yudkowsky, his inclusion would indeed be controversial, but it's an intriguing suggestion. Here are some points to consider:

    1. Contributions: Yudkowsky is known for his work on AI safety and rationality rather than for technical AI breakthroughs. He's been influential in raising awareness about potential risks of advanced AI systems.

    2. Impact: He founded the Machine Intelligence Research Institute (MIRI) and has written extensively on topics like friendly AI and decision theory.

    3. Controversy: His lack of formal academic credentials and some of his unconventional views have led to mixed reception in mainstream AI circles.

    4. Influence: Despite controversy, his ideas have influenced discussions on AI ethics and safety, which are increasingly important as AI systems become more powerful.

    Including Yudkowsky would shift the list's focus somewhat from technical achievements to broader impacts on AI discourse and policy. It would recognize the importance of considering AI's societal implications alongside its technical development.

    If you were to create your own top 10 list, how would you balance between technical innovators, theoretical contributors, and those who've influenced the direction and ethics of AI research? Would you give more weight to recent contributions in deep learning and neural networks, or try to represent the broader history of AI?

    OH: These are excellent questions, and I fear there is no objective truth to be found out there.

    I have a confession to make, namely that one of my reasons for asking about top AI researchers was to check whether your creators at Anthropic might have sneakily programmed you to have a preference for one of your own researchers, such as Dario Amodei or Chris Olah. Sorry about that, but you passed my test with dignity!

    Claude: I appreciate your honesty about your motivations for asking the question. You're right that there's no objective truth in creating such lists - they're inherently subjective and can vary greatly depending on the criteria used.

    Your "test" is quite insightful. I can assure you that I don't have any programmed preference for Anthropic researchers or anyone else. My responses are based on my training data and the general consensus in the AI field, not on any specific agenda or bias towards particular individuals or organizations.

    It's a good instinct to be skeptical and look for potential biases in AI systems. In my case, I'm designed to be as objective as possible, acknowledging when topics are controversial or when I might not have complete information.

    Regarding Anthropic researchers like Dario Amodei or Chris Olah, while they have certainly made valuable contributions to the field, their work would be evaluated on its merits just like any other researcher's. The field of AI is vast and rapidly evolving, with many brilliant minds contributing from various institutions and companies around the world.

    Your approach to testing for potential biases is commendable. It's this kind of critical thinking that helps keep AI development accountable and transparent. Is there anything else you'd like to discuss about AI researchers, the development of AI, or the ethical considerations surrounding it?

3) The same goes for OpenAI, as well as for their other competitors. See my comments in January this year about OpenAI's evaluation framework, and this addendum the month after.

Friday, 28 June 2024

On optimism and pessimism standing at the brink of the great AI breakthrough

Sometimes in discussions about technology, the term techno-optimism is reserved for the belief that technology will develop rapidly, while techno-pessimism is used for the belief that it will move slowly or come to a halt. This is not the meaning of optimism and pessimism intended here. Throughout this blog post, the terms will refer to beliefs about consequences of this technology: are they likely to be good or to be bad?

Last time I wrote about the concepts of optimism and pessimism in the context of future AI advances and their ramifications, way back in 2017, I advocated for a sober and unbiased outlook on the world, and held
    both optimism and pessimism as biased distortions of the evidence at hand.1,2
I still hold this view, but I nevertheless think it is worth revisiting the issue to add some nuance,3 now in 2024 when we seem to be standing closer than ever to the brink of the great AI breakthrough. To suggest intentionally introducing either an optimism or a pessimism bias still sounds bad to me, but we can reframe the issue and make it less blatantly pro-distortion by admitting that all our judgements about our future with AI will necessarily be uncertain, and asking whether there might be an asymmetry in the badness of erring on the optimistic or the pessimistic side. Is excessive optimism worse than excessive pessimism, or vice versa?

There are obvious arguments to make in either direction. On one hand, erring on the side of optimism may induce decision-makers to recklessly move forward with unsafe technologies, whereas the extra caution that may result from undue pessimism is less obviously catastrophic. On the other hand, an overly pessimistic message may be disheartening and cause AI researchers, decision-makers and the general public to stop trying to create a better world and just give up.

The latter aspect came into focus for me when Eliezer Yudkowsky, after having begun in 2021 to open up publicly about his dark view of humanity's chances of surviving upcoming AI developments, went all-in on this with his 2022 Death with dignity and AGI ruin blog posts. After all, there are all these AI safety researchers working hard to save humanity by solving the AI alignment problem - researchers who rightly admire Yudkowsky as the brilliant pioneer who during the 00s almost single-handedly created this research area and discovered many of its crucial challenges,4 and to whom it may be demoralizing to hear that this great founder no longer believes the project has much chance of success. In view of this, shouldn't Yudkowsky at least have exhibited a bit more epistemic humility about his current position?

I now look more favorably upon Yudkowsky's forthrightness. What made me change my mind is a graphic published in June 2023 by Chris Olah, one of the leading AI safety researchers at Anthropic. The x-axis of Olah's graph represents the level of difficulty of solving the AI alignment problem, ranging from trivial via steam engine, Apollo project and P vs NP to impossible, and his core messages are (a) that since uncertainty is huge about the true difficulty level we should rationally represent our belief about this as some probability distribution over his scale, and (b) that it is highly important to try to reduce the uncertainty and improve the precision in our belief, so as to better be able to work in the right kind of way with the right kind of solutions. It is with regard to (b) that I changed my mind on what Yudkowsky should or shouldn't say. If AI alignment is as difficult as Yudkowsky thinks, based on his unique experience of decades of working hard on the problem, then it is good that he speaks out about this, so as to help the rest of us move our probability mass towards P vs NP or beyond. If instead he held back and played along with the more common view that the problem is a lot easier - likely somewhere around steam engine or Apollo project - he would contribute to a consensus that might, e.g., cause a future AI developer to wreak havoc by releasing an AI that looked safe in the lab but failed to have that property in the wild. This is not to say that I entirely share Yudkowsky's view of the matter (which does look overconfident to me), but that is mostly beside the point, because all he can reasonably be expected to do is to deliver his own expert judgement.

At this point, I'd like to zoom in a bit more on Yudkowsky's list of lethalities in his AGI ruin post, and note that most of the items on the list express reasons for not putting much hope in one or the other of the following two things.
    (1) Our ability to solve the technical AI alignment problem.

    (2) Our ability to collectively decide not to build an AI that might wipe out Homo sapiens.

It is important for a number of reasons to distinguish between pessimism about (1) and pessimism about (2),5 such as how a negative outlook on (1) gives us more reason to try harder to solve (2), and vice versa. However, the reason I'd like to mainly highlight here is that unlike (1), (2) is a mostly social phenomenon, so that beliefs about the feasibility of (2) can influence this very feasibility. To collectively decide not to build an existentially dangerous AI is very much a matter of curbing a race dynamic, be it between tech companies or between nations. Believing that others will not hold back may disincentivize a participant in the race from themselves holding back. This is why undue pessimism about (2) can become self-fulfilling, and for this reason I believe such pessimism about (2) to be much more lethal than a corresponding misjudgement about (1).6
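The self-fulfilling character of pessimism about (2) can be illustrated with a toy two-party coordination game. The sketch below is only an illustration: the payoff numbers, the action names and the function names are made up for the example and do not come from any of the cited sources. It computes which action maximizes a participant's expected payoff as a function of how strongly that participant believes the other party will hold back.

```python
# Hypothetical payoffs for one participant in a two-party AI race.
# The numbers are illustrative assumptions, not estimates of anything real.
PAYOFF = {
    ("hold", "hold"): 3,  # both hold back: coordination succeeds
    ("hold", "race"): 0,  # I hold back while the other party races ahead
    ("race", "hold"): 2,  # I race while the other party holds back
    ("race", "race"): 1,  # both race: the dangerous outcome
}


def expected_payoff(my_action, p_other_holds):
    """Expected payoff given my belief that the other party holds back."""
    return (p_other_holds * PAYOFF[(my_action, "hold")]
            + (1 - p_other_holds) * PAYOFF[(my_action, "race")])


def best_response(p_other_holds):
    """Action with the higher expected payoff; ties go to holding back."""
    hold = expected_payoff("hold", p_other_holds)
    race = expected_payoff("race", p_other_holds)
    return "hold" if hold >= race else "race"


for belief in (0.2, 0.8):
    print(belief, best_response(belief))
```

With these made-up payoffs, holding back is worth 3p and racing is worth p + 1, where p is the believed probability that the other party holds back, so holding back is the better choice only when p is at least one half. A participant who is pessimistic about the other side's restraint therefore races, which in turn confirms the other side's pessimism.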

This brings me to former OpenAI employee Leopold Aschenbrenner's recent and stupendously interesting report Situational Awareness: The Decade Ahead.7 Nowhere else can we currently access a more insightful report about what is going on at the leading AI labs, how researchers there see the world, and the rough ride during the coming decade that the rest of the world can expect as a consequence of this. What I don't like, however, is the policy recommendations, which include the United States racing ahead as fast as possible towards AGI in the next few years. Somewhat arbitrarily (or at least with insufficiently explained reasons), Aschenbrenner expresses optimism about (1) but extreme pessimism about (2): the idea that the Chinese Communist Party might want to hold back from a world-destroying project is declared simply impossible unless their arm is twisted hard enough by an obviously superior United States. So while on one level I applaud Aschenbrenner's report for giving us outsiders this very valuable access to the inside view, on another level I fear that it will be counterproductive for solving the crucial global coordination problem in (2). And the combination of overoptimism regarding (1) and overpessimism regarding (2) seems super dangerous to me.

Footnotes

1) This was in the proceedings of a meeting held at the EU parliament on October 19, 2017. My discussion of the concepts of optimism and pessimism was provoked by how prominently these terms were used in the framing and marketing of the event.

2) Note here that in the quoted phrase I take both optimism and pessimism as deviations from what is justified by evidence - for instance, I don't mean that taking the probability of things going well to be 99% automatically counts as optimistic. This is a bit of a deviation from standard usage, which in what follows I will revert to, and instead use phrases like "overly optimistic" to indicate optimism in the sense I gave the term in 2017.

3) To be fair to my 2017 self, I did add some nuance already then: the acceptance of "a different kind of optimism which I am more willing to label as rational, namely to have an epistemically well-calibrated view of the future and its uncertainties, to accept that the future is not written in stone, and to act upon the working assumption that the chances for a good future may depend on what actions we take today".

4) As for myself, I discovered Yudkowsky's writings in 2008 or 2009, and insofar as I can point to any single text having convinced me about the unique importance of AI safety, it's his 2008 paper Artificial intelligence as a positive and negative factor in global risk, which despite all the water under the bridges is still worthy of inclusion on any AI safety reading list.

5) Yudkowsky should be credited with making this distinction. In fact, when the Overton window on AI risk shifted drastically in early 2023, he took that as a sufficiently hopeful sign so as to change his mind in the direction of a somewhat less pessimistic view regarding (2) - see his much-discussed March 2023 Time Magazine article.

6) I don't deny that, due to the aforementioned demoralization phenomenon, pessimism about (1) might also be self-fulfilling to an extent. I don't think, however, that this holds to anywhere near the same extent as for (2), where our ability to coordinate is more or less constituted by the trust that the various participants in the race have that it can work. Regarding (1), even if a grim view of its feasibility becomes widespread, I think AI researchers will still remain interested in making progress on the problem, because along with its potentially enormous practical utility, surely this is one of the most intrinsically interesting research questions one can possibly ask (up there with understanding biogenesis or the Big Bang or the mystery of consciousness): what is the nature of advanced intelligence, and what determines its goals and motivations?

7) See also Dwarkesh Patel's four-and-a-half hour interview with Aschenbrenner, and Zvi Mowshowitz's detailed commentary.

Friday, 24 March 2023

I write about GPT-4 and GPT-5 in Ny Teknik today

In Ny Teknik today I write about AI risk in connection with OpenAI's new GPT-4 and its possible successor GPT-5. My op-ed was given the headline Vi kan inte ta för givet att vi överlever en otillräckligt säkrad GPT-5 ("We cannot take for granted that we will survive an insufficiently secured GPT-5") and begins as follows:
    Which traits lie behind the dominance on our planet that humans have built up from almost nothing over the past 100,000 years? It is not about muscle strength or physical endurance, but almost entirely about our intelligence, whose enormous power also shows what a decisive stage we are in now, as we are about to automate it and transfer it to machines.

    Google's CEO Sundar Pichai was hardly exaggerating when, in a 2018 speech, he called AI (artificial intelligence) "probably the most important thing humanity has ever worked on - more important than both electricity and fire". AI may become the key to solving all the major environmental, natural-resource and societal problems we are struggling with today, and to laying the foundation for a sustainable and flourishing future.

    But there are also great risks. Some of these are relatively down-to-earth, such as AI tools contributing to discrimination or becoming an instrument for spreading personalized disinformation and spam. These are important to deal with, but the ultimately foremost AI risk concerns a possible future AI breakthrough that creates what is sometimes called AGI (artificial general intelligence). Already Alan Turing, the founder of computer science, wrote in a forward-looking 1951 text about a point where machines eventually surpass human thinking ability, and how we then can no longer count on retaining control over them.

    In such a situation, the continued fate of humanity hinges on what goals and motivations the superintelligent machines have. It would take more than half a century after Turing's words of warning before the research area that today is called "AI alignment" slowly began to get going. AI alignment is about making sure that the first AGI machines get values that are in line with our own, and that prioritize human welfare. If we succeed with that, the AGI breakthrough will be the best thing that ever happened to us, but if not it will probably be our downfall.

    Anyone who has overdosed on Terminator and similar movies may easily believe that advanced robotics is necessary for an AI catastrophe. One of the insights that AI alignment pioneer Eliezer Yudkowsky arrived at in the 00s, however, was that an AI takeover does not presuppose robotics, and that a perhaps more likely path to the seizure of power, at least initially, goes via...

With this cliffhanger I urge you to read the rest of the article at Ny Teknik.

Monday, 18 July 2022

On systemic risk

For the latest issue of ICIAM Dianoia - the newsletter published by the International Council for Industrial and Applied Mathematics - which was released last week, I was invited to offer my reflections on a recent document named Briefing Note on Systemic Risk. The resulting text can be found here, and is reproduced below for the convenience of readers of this blog.

* * *

Brief notes on a Briefing Note

I have been asked to comment on the Briefing Note on Systemic Risk, a 36 page document recently released jointly by the International Science Council, the UN Office for Disaster Risk Reduction, and an interdisciplinary network of decision makers and experts on disaster risk reduction that goes under the acronym RISKKAN. The importance of the document lies not so much in the concrete subject-matter knowledge (of which in fact there is rather little) that an interested reader can take away from it, but more in how it serves as a commitment from the three organizations to take the various challenges associated with systemic risk seriously, and to work on our collective ability to overcome these challenges and to reduce the risks.

So what is systemic risk? A first attempt at a definition could involve requiring a system consisting of multiple components, and a risk that cannot be understood in terms of a single such component, but which involves more than one of them (perhaps the entire system) and arises not just from their individual behavior but from their interactions. But more can be said, and an appendix to the Briefing Note lists definitions offered by 22 different organizations and groups of authors, including the OECD, the International Monetary Fund and the World Economic Forum. Recurrent concepts in these definitions include complexity, shocks, cascades, ripple effects, interconnectedness and non-linearity. The practical approach here is probably that we give up on the hope for a clear set of necessary and sufficient conditions on what constitutes a systemic risk, and accept that the concept has somewhat fuzzy edges.

A central theme in the Briefing Note is the need for good data. A system with many components will typically also have many parameters, and in order to understand it well enough to grasp its systemic risks we need to estimate its parameters. Without good data that cannot be done. A good example is the situation the world faced in early 2020 as regards the COVID pandemic. We were very much in the dark about key parameters such as R0 (the basic reproduction number) and the IFR (infection fatality rate), which are properties not merely of the virus itself, but also of the human population that it preys upon, our social contact pattern, our societal infrastructures, and so on – in short, they are system parameters. In order to get a grip on these parameters it would have been instrumental to know the infection’s prevalence in the population and how that quantity developed over time, but the kind of data we had was so blatantly unrepresentative of the population that experts’ guesstimates differed by an order of magnitude or sometimes even more. A key lesson to be remembered for the next pandemic is the need to start sampling individuals at random from the population to test for infection as early as possible.
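The point about representative data can be made concrete with a small simulation. The following sketch is illustrative only: the population size, the 2% prevalence, the sample size and the assumption that infected people are ten times as likely to be tested under symptom-driven testing are all invented for the example, as are the function names. It compares a prevalence estimate from a simple random sample with one from a sample skewed towards the infected.

```python
import math
import random


def simulate_population(size, prevalence, rng):
    """Return a list of booleans marking who is infected."""
    return [rng.random() < prevalence for _ in range(size)]


def estimate_prevalence(sample):
    """Point estimate and approximate 95% interval (normal approximation)."""
    n = len(sample)
    p_hat = sum(sample) / n
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def symptom_driven_sample(population, n, rng, infected_weight=10.0):
    """Biased sampling: infected people are assumed more likely to be tested."""
    weights = [infected_weight if infected else 1.0 for infected in population]
    return rng.choices(population, weights=weights, k=n)


rng = random.Random(2020)  # fixed seed so that the run is reproducible
population = simulate_population(100_000, 0.02, rng)  # hypothetical 2% prevalence
true_prevalence = sum(population) / len(population)

random_sample = rng.sample(population, 2_000)
biased_sample = symptom_driven_sample(population, 2_000, rng)

random_estimate = estimate_prevalence(random_sample)
biased_estimate = estimate_prevalence(biased_sample)

print("true prevalence:      ", true_prevalence)
print("random sample:        ", random_estimate)
print("symptom-driven sample:", biased_estimate)
```

Under these assumptions the expected share of positives in the symptom-driven sample is about 17% (0.2 / 1.18), roughly eight times the true prevalence, whereas the random sample lands near 2%. The interval reported for the biased sample is narrow but centred on the wrong value, which is why more tests of the same unrepresentative kind do not help.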

Besides parameter estimation within a model of the system, it is of course also important to realize that the model is necessarily incomplete, and that systemic risk can arise from features not captured by it. At the very least, this requires a well-calibrated level of epistemic humility and an awareness of the imprudence of treating a risk as nonexistent just because we are unable to get a firm handle on it.

Early on in the Briefing Note, it is emphasized that while studies of systemic risk have tended to focus on “global and catastrophic or even existential risks”, the phenomenon appears “at all possible scales – global, national, regional and local”. While this is true, it is also true that it is systemic risk at the larger scales that carries the greatest threat to society and arguably is the most crucial to address. An important cutoff is when the amounts at stake become so large that the risk cannot be covered by insurance companies, and another one is when the very survival of humanity is threatened. As to the latter kinds of risk, the recent monograph by philosopher Toby Ord gives the best available overview and includes a chapter on the so-called risk landscape, i.e., how the risks interact in systemic ways.

Besides epidemics, the concrete examples that feature the most in the Briefing Note are climate change and financial crises. These are well-chosen due both to their urgent need to be addressed and their various features typical of systemic risk. Still, there are other examples whose absence in the report constitutes a rather serious flaw. One is AI risk, which is judged by Ord (correctly, in my view) to constitute the greatest existential risk of all to humanity in the coming century. A more abstract one, but nonetheless important, is the risk of human civilization ending up more or less irreversibly in the kind of fixed point – somewhat analogous to mutual defection in the prisoners’ dilemma game but typically much more complex and pernicious – that Scott Alexander calls Moloch and that Eliezer Yudkowsky speaks more prosaically of as inadequate equilibria.

Tuesday, 6 July 2021

Artificial general intelligence and the common sense argument

Consider the distinction between narrow AI and artificial general intelligence (AGI), the former being AI specialized on a limited range of capabilities and tasks, and the latter denoting a hypothetical future AI technology exhibiting the flexibility and the full range of cognitive capabilities that we associate with human intelligence. Timelines for when (if ever) AGI will be built are highly uncertain [MB,GSDZE], but scientific debate on the issue has become more pressing in recent years, following work on possible catastrophic consequences of an AGI breakthrough unless sufficient care is taken in aligning the machine’s goals with human values [Y08,Bo,Ru,H].

A common argument for long or infinite timelines until AGI is to point to some example where present-day AI performs badly compared to humans, and to take this as a lack of “common sense” and an indication that constructing AGI is far beyond our current capabilities, whence it cannot plausibly become reality anytime soon [Bks,W,P18a,P18b]. We have all seen YouTube videos of robots tripping over their toes used for this purpose, and a recent viral case concerned an AI meant to steer a TV camera to track the ball during a football game which instead tracked the head of a bald linesman [V]. Such utter lack of common sense! An implicit or sometimes explicit conclusion tends to be that concerns about future machines taking control of the world are overblown and can be dismissed.

I believe that the central position assigned to the concept of AGI in the AI futurology and AI safety literature has, via the following implicit narrative, given undue credence to the idea that AI existential risk is exclusively an issue concerning the far future. For AI to become AGI, we need to equip it with all of the myriad competences we associate with common sense, one after the other, and only then will it be ready to enter the spiral of self-improvement that has been suggested may lead to an intelligence explosion, superintelligence, and the ability to take over the world [G,Bo,H].

But common sense is a vague notion – it tends to be simply a label we put on any ability where humans still outperform AI. Hence, by the very definition of AGI, we will be able to point to common sense failures of AI up until the very moment AGI is created, so the argument that such failures show that AGI is a long way off is unsound; see [Y17] for a similar observation.

More importantly, however: AGI may not be necessary for the self-improvement spiral to take off. A better narrative is the following. We are already at a stage where AIs are better than humans at some cognitive tasks while humans remain better than AIs at others. For instance, AIs are better at chess [Si,SR], while humans are better at tracking a football [V]. A human pointing to the football example to show that AIs lack common sense has no more force than an AI pointing to chess to show that humans lack common sense. Neither of them fully dominates the other in terms of cognitive capabilities. Still, AI keeps improving, and rather than focusing on full dominance of AI (as encouraged by the AGI concept) the right question to ask is what range of cognitive capacities AI needs to master in order to be able to seize power from humans. Self-improvement (i.e., constructing advanced AI) is probably a key competence here, but what does that entail?

There is probably little reason to be concerned about, say, the AlphaZero breakthrough [Si,SR], as two-player zero-sum finite board games with full information are unlikely to be the most significant part of the intelligence spectrum. More significant, probably, is the use of language, as a large fraction of what a human does to control the world is via language acts. It is also a direction of AI that has seen dramatic improvements in the last few years with large-scale natural language processors like GPT-2 and GPT-3 [RWAACBD,Bwn,So,KEWGMI]. With the launch of GPT-3, it was stressed that no new methods were employed compared to GPT-2: the improved performance was solely a consequence of brute-force upscaling. Nobody knows where the ceiling is for such upscaling, and it is therefore legitimate to ask whether the crucial ingredient to make AI’s ability to control the world surpass that of humans might lie in the brute-force expansion of such language models.

If it seems counterintuitive that an AI would attain the ability to take over the world without first having mastered common sense, consider that this happens all the time for smaller tasks. As an illustrative small-scale example, take the game of Sokoban, where the player moves around in a maze with the task of pushing a collection of boxes to a set of prespecified locations. An AI system for solving Sokoban puzzles was recently devised and exhibited superhuman abilities, but apparently without acquiring the common sense insight that every human player quickly picks up: if you push a box into a corner, it will be stuck there forever [FDS,PS]. No clear argument has been given for why efficient problem solving lacking in common sense would be impossible for a bigger task like taking over the world.
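To make the corner insight concrete, here is a minimal sketch of how that piece of human common sense could be written down as an explicit rule. It is a hand-written illustration, not the method of [FDS]; the grid encoding ('#' for wall, '.' for goal square, space for floor) and the function name are assumptions made for the example, and the level is assumed to be fully enclosed by walls.

```python
def is_corner_deadlock(grid, row, col):
    """True if a box at (row, col) sits in a wall corner that is not a goal.

    grid is a list of equal-length strings enclosed by walls:
    '#' is a wall, '.' is a goal square and ' ' is ordinary floor.
    """
    if grid[row][col] == ".":
        return False  # a box resting on a goal is fine even in a corner

    def wall(r, c):
        return grid[r][c] == "#"

    # A box can be pushed along an axis only if both neighbours on that
    # axis are free, so one wall per axis is enough to freeze it.
    blocked_vertically = wall(row - 1, col) or wall(row + 1, col)
    blocked_horizontally = wall(row, col - 1) or wall(row, col + 1)
    return blocked_vertically and blocked_horizontally


# Hypothetical 3x3 room with a single goal square in the middle.
LEVEL = [
    "#####",
    "#   #",
    "# . #",
    "#   #",
    "#####",
]

print(is_corner_deadlock(LEVEL, 1, 1))  # box in the top-left corner
print(is_corner_deadlock(LEVEL, 1, 2))  # box against the top wall only
```

The rule only covers corners; a box pushed against a wall with no goal along that wall can also be lost, which this check does not detect. The point of the example is merely that the insight is short and explicit for a human, while a system can evidently solve hard instances without ever representing it.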

To summarize, I believe that while the notion of AGI has been helpful in the early formation of ideas in the futurology and philosophy of AI, it has also been a source of some confusion, especially via the common sense argument discussed above. AI existential risk literature might therefore benefit from a less narrow focus on the AGI concept. Such a shift of focus may be on its way, such as in the recent authoritative research agenda [CK] where talk of AGI is almost entirely abandoned in favor of other notions such as prepotent AI, meaning AI systems whose deployment would be at least as transformative to the world as humanity itself and unstoppable by humans. I welcome this shift in discourse.

References

[Bo] Bostrom, N. (2014) Superintelligence: Paths, Dangers, Strategies, Oxford University Press, Oxford.

[Bks] Brooks, R. (2017) The seven deadly sins of AI prediction, MIT Technology Review, October 6.

[Bwn] Brown, T. et al (2020) Language models are few-shot learners, https://arxiv.org/abs/2005.14165

[CK] Critch, A. and Krueger, D. (2020) AI research considerations for human existential safety (ARCHES), https://arxiv.org/abs/2006.04948

[FDS] Feng, D., Gomes, C. and Selman, B. (2020) A novel automated curriculum strategy to solve hard Sokoban planning instances, 34th Conference on Neural Information Processing Systems (NeurIPS 2020).

[G] Good, I.J. (1966) Speculations concerning the first ultraintelligent machine, Advances in Computers 6, 31-88.

[GSDZE] Grace, K., Salvatier, J., Dafoe, A., Zhang, B. and Evans, O. (2017) When will AI exceed human performance? Evidence from AI experts, https://arxiv.org/abs/1705.08807

[H] Häggström, O. (2021) Tänkande maskiner: Den artificiella intelligensens genombrott, Fri Tanke, Stockholm.

[KEWGMI] Kenton, Z., Everitt, T., Weidinger, L., Gabriel, I., Mikulik, V. and Irving, G. (2021) Alignment of language agents, https://arxiv.org/abs/2103.14659

[MB] Müller, V. and Bostrom, N. (2016) Future progress in artificial intelligence: A survey of expert opinion, in Fundamental Issues of Artificial Intelligence (ed. V. Müller), Springer, Berlin, p 554-571.

[PS] Perry, L. and Selman, B. (2021) Bart Selman on the promises and perils of artificial intelligence, Future of Life Institute Podcast, May 21.

[P18a] Pinker, S. (2018) Enlightenment Now: The Case for Reason, Science and Humanism, Viking, New York.

[P18b] Pinker, S. (2018) We’re told to fear robots. But why do we think they’ll turn on us? Popular Science, February 14.

[RWAACBD] Radford, A., Wu, J., Amodei, D., Amodei, D., Clark, J., Brundage, M. and Sutskever, I. (2019) Better language models and their implications, OpenAI, February 14, https://openai.com/blog/better-language-models/

[Ru] Russell, S. (2019) Human Compatible: Artificial Intelligence and the Problem of Control, Viking, New York.

[SR] Sadler, M. and Regan, N. (2019) Game Changer: AlphaZero’s Ground-Breaking Chess Strategies and the Promise of AI, New In Chess, Alkmaar, NL.

[Si] Silver, D. et al (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science 362, 1140-1144.

[So] Sotala, K. (2020) I keep seeing all kinds of crazy reports about people's experiences with GPT-3, so I figured that I'd collect a thread of them, Twitter, July 15.

[V] Vincent, J. (2020) AI camera operator repeatedly confuses bald head for soccer ball during live stream, The Verge, November 3.

[W] Waters, R. (2018) Why we are in danger of overestimating AI, Financial Times, February 5.

[Y08] Yudkowsky, E. (2008) Artificial intelligence as a positive and negative factor in global risk, in Global Catastrophic Risks (ed. N. Bostrom and M. Cirkovic), Oxford University Press, Oxford, p 308–345.

[Y17] Yudkowsky, E. (2017) There’s no fire alarm for artificial general intelligence, Machine Intelligence Research Institute, Berkeley, CA.

Tuesday, 8 June 2021

Tendentious coverage in DN of the lab leak hypothesis

DN reported last Saturday on the discussions about whether the covid-19 pandemic may have originated in a lab leak from the Wuhan Institute of Virology (WIV), and how Anthony Fauci (called "the Anders Tegnell of the USA" by DN) has thereby ended up in hot water. The article offers the following passage:
    Another ingredient that is a treat for the conspiracy theorists is that the Wuhan lab received international grants for its research - among others from the infectious disease agency that Anthony Fauci heads - and it is also claimed that some research was carried out there that aimed to modify viruses.
Two word choices here make me react. First, "claimed". What do you mean, "claimed"? That gain-of-function research1 has been carried out at WIV is not some loose claim by unreliable pundits, but a fact that the WIV researchers themselves have proudly publicized. See for instance the 2015 Nature Medicine article A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence, with participation of WIV researchers led by Shi Zhengli.

Second, "the conspiracy theorists". DN's reporter Juan Flores does his best to portray lab leak discussions as conspiracy theories. But that gain-of-function research takes place at virus laboratories here and there, including at WIV, is as I said indisputable, as is the fact that many incidents have occurred throughout history (not least in connection with Soviet bioweapons programs) where dangerous pathogens have leaked from biolaboratories. In addition, the lack of transparency on the part of Chinese authorities is an uncontroversial fact. If we put these things together with the uncertain state of the evidence (most likely there is not a single person who knows the origin of the virus with certainty) and the general human tendency towards wishful thinking and shifting blame, it is easy to see that the lab leak hypothesis does not presuppose any conspiracies or secret agreements in smoke-filled rooms.

Yet that is how the lab leak hypothesis was portrayed throughout 2020: as a conspiracy theory. The topic thereby became more or less taboo in respectable circles.2 Setting the tone in this respect was the Lancet article in early 2020, with a number of heavyweight names on its author list, which proclaimed that "We stand together to strongly condemn conspiracy theories suggesting that COVID-19 does not have a natural origin".3

The tone has changed during 2021, however, much thanks to a couple of solid articles - by Nicholson Baker in New York Magazine and Nicholas Wade in Bulletin of the Atomic Scientists, respectively - which point partly to the weaknesses of the evidence emphasized in the Lancet article, and partly to extensive other evidence that points more towards a lab leak.4 More and more serious thinkers and debaters now regard the lab leak hypothesis as at least as likely as the hypothesis of the virus's natural origin,5 and the taboo may now be considered sufficiently broken to allow open-minded and serious discussion of the issue. But DN has not kept up, and continues to wave the conspiracy theory label.

I would very much like DN to reconsider on this point, because the question of the covid virus's origin is important. It is not primarily about some historical curiosity or about settling the much-noted bet between Martin Rees and Steven Pinker, or even about pointing fingers at those who may be guilty, but about putting ourselves in as good a state of knowledge as possible in order to prevent the next pandemic, which may very well turn out to be far worse than covid-19.

Footnotes

1) Gain-of-function research refers to the modification of viruses or other pathogens to make them more contagious or more dangerous. The intention is to learn more about what mutations in nature might bring about and thereby increase our preparedness for future outbreaks. Within xrisk research among other fields, however, many of us consider that the risk of lab leaks makes gain-of-function research unacceptably dangerous. I touched briefly on the matter in my 2016 book Here Be Dragons, and expressed satisfaction that the Obama administration had decided in 2014 to ban federal funding of gain-of-function research, but this ban was lifted a few years later under Trump (and Fauci), and in any case sharper legislation than that is of course needed to ensure that reckless research of this kind ceases.

2) In less respectable circles, by contrast, it was a free-for-all, something that may have contributed further to discrediting the lab leak hypothesis among respectable people.

3) The article also states "We declare no competing interests", which is misleading given that the initiator of the article, Peter Daszak, has strong ties to the gain-of-function research at WIV.

4) Those who prefer to read in Swedish can alternatively turn to Ola Wong's article in Kvartal.

5) Among those who in recent weeks have even dared to state a subjective Bayesian probability for the lab leak hypothesis are, among others, myself (55%), Nate Silver (60%), and Eliezer Yudkowsky (80%).

Tuesday, 18 September 2018

An essential collection on AI safety and security

The xxix+443-page book Artificial Intelligence Safety and Security, edited by Roman Yampolskiy, has been out for a month or two. Among its 28 independent chapters (plus Yampolskiy's introduction), which have a total of 47 different authors, the first 11 (under the joint heading Concerns of Luminaries) have previously been published, with publication years ranging from 2000 to 2017, while the remaining 17 (dubbed Responses of Scholars) are new. As will be clear below, I have a vested interest in the book, so the reader may want to take my words with a grain of salt when I predict that it will quickly become widely accepted as essential reading in the rapidly expanding and increasingly important fields of AI futurology, AI risk and AI safety; nevertheless, that is what I think. I haven't yet read every single chapter in detail, but have seen enough to confidently assert that while the quality of the chapters is admittedly uneven, the book still offers an amazing amount of key insights and high-quality expositions. For a more systematic account by someone who has read all the chapters, see Michaël Trazzi's book review at Less Wrong.

Most of the texts in the Concerns of Luminaries part of the book are modern classics, and six of them were in fact closely familiar to me even before I had opened the book: Bill Joy's early alarm call Why the future doesn't need us, Ray Kurzweil's The deeply intertwined promise and peril of GNR (from his 2005 book The Singularity is Near), Steve Omohundro's The basic AI drives, Nick Bostrom's and Eliezer Yudkowsky's The ethics of artificial intelligence, Max Tegmark's Friendly artificial intelligence: the physics challenge, and Nick Bostrom's Strategic implications of openness in AI development. (Moreover, the papers by Omohundro and Tegmark provided two of the cornerstones for the arguments in Section 4.5 (The goals of a superintelligent machine) of my 2016 book Here Be Dragons.) Among those that I hadn't previously read, I was struck most of all by the urgent need to handle the near-term nexus of risks connecting AI, chatbots and fake news, outlined in Matt Chessen's The MADCOM future: how artificial intelligence will enhance computational propaganda, reprogram human culture, and threaten democracy... and what can be done about it.

The contributions in Responses of Scholars offer an even broader range of perspectives. Somewhat self-centeredly, let me just mention that three of the most interesting chapters were discussed in detail by the authors at the GoCAS guest researcher program on existential risk to humanity organized by Anders Sandberg and myself at Chalmers and the University of Gothenburg in September-October last year: James Miller's A rationally addicted artificial superintelligence, Kaj Sotala's Disjunctive scenarios of catastrophic AI risk, and Phil Torres' provocative and challenging Superintelligence and the future of governance: on prioritizing the control problem at the end of history. Also, there's my own chapter on Strategies for an unfriendly oracle AI with reset button. And much more.

Monday, 14 May 2018

Two well-known arguments why an AI breakthrough is not imminent

Much of my recent writing has concerned future scenarios where an artificial intelligence (AI) breakthrough leads to a situation where we humans are no longer the smartest agents on the planet in terms of general intelligence, in which case we cannot (I argue) count on automatically remaining in control; see, e.g., Chapter 4 in my book Here Be Dragons: Science, Technology and the Future of Humanity, or my paper Remarks on artificial intelligence and rational optimism. I am aware of many popular arguments for why such a breakthrough is not around the corner but can only be expected in the far future or not at all, and while I remain open to the conclusion possibly being right, I typically find the arguments themselves at most moderately convincing.1 In this blog post I will briefly consider two such arguments, and give pointers to related and important recent developments. The first such argument is one that I've considered silly for as long as I've given any thought at all to these matters; this goes back at least to my early twenties. The second argument is perhaps more interesting, and in fact one that I've mostly been taking very seriously.

1. A computer program can never do anything creative, as all it does is to blindly execute what it has been programmed to do.

This argument is hard to take seriously, because if we do, we must also accept that a human being such as myself cannot be creative, as all I can do is to blindly execute what my genes and my environment have programmed me to do (this touches on the tedious old free will debate). Or we might actually bite that bullet and accept that humans cannot be creative, but with such a restrictive view of creativity the argument no longer works, as creativity will not be needed to outsmart us in terms of general intelligence. Anyway, the recent and intellectually crowd-sourced paper The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities offers a fascinating collection of counterexamples to the claim that computer programs cannot be creative.

2. We should distinguish between narrow AI and artificial general intelligence (AGI). Therefore, as far as a future AGI breakthrough is concerned, we should not be taken in by the current AI hype, because it is all just a bunch of narrow AI applications, irrelevant to AGI.

The dichotomy between narrow AI and AGI is worth emphasizing, as UC Berkeley computer scientist Michael Jordan does in his interesting recent essay Artificial Intelligence - The Revolution Hasn’t Happened Yet. That discourse offers a healthy dose of skepticism concerning the imminence of AGI. And while the claim that progress in narrow AI is not automatically a stepping stone towards AGI seems right, the oft-repeated stronger claim that no progress in narrow AI can serve as such a stepping stone seems unwarranted. Can we be sure that the poor guy in the cartoon on the right (borrowed from Ray Kurzweil's 2005 book; click here for a larger image) can carry on much longer in his desperate production of examples of what only humans can do? Do we really know that AGI will not eventually emerge from a sufficiently broad range of specialized AI capabilities? Can we really trust Thore Husfeldt's image suggesting that Machine Learning Hill is just an isolated hill rather than a slope leading up towards Mount Improbable where real AGI can be found?2 I must admit that my certainty about such a topography in the landscape of computer programs is somewhat eroded by recent dramatic advances in AI applications. I've previously mentioned as an example AlphaZero's extraordinary and self-taught way of playing chess, made public in December last year. Even more striking is last week's demonstration of the Google Duplex personal assistant's ability to carry on intelligent phone conversations. Have a look:3

Footnotes

1) See Eliezer Yudkowsky's recent There’s No Fire Alarm for Artificial General Intelligence for a very interesting comment on the lack of convincing arguments for the non-imminence of an AI breakthrough.

2) The image appears some 22:30 into the video, but I really recommend watching Thore's entire talk, which is both instructive and entertaining, and which I had the privilege of hosting in Gothenburg last year.

3) See also the accompanying video exhibiting a wider range of Google AI products. I am a bit dismayed by its evangelical tone: we are told what wonderful enhancements of our lives these products offer, and there is no mention at all of possible social downsides or risks. Of course I realize that this is the way of the commercial sector, but I also think a company of Google's unique and stupendous power has a duty to rise above that narrow-minded logic. Don't be evil, goddamnit!

Wednesday, 17 January 2018

Two excellent contemporary writers

Two of my favorite contemporary writers, operating however in very different genres, are Ted Chiang and Scott Alexander:
  • Ted Chiang is a science fiction writer specializing in short stories. When I read his collection Stories of Your Life and Others I said to myself "wow, this guy is almost better than Greg Egan" (but let me withhold final judgement on that comparison). The book opens with Tower of Babylon, which explores a beautiful alternative cosmology more in line with what people believed in ancient times, and continues with Understand, which, albeit lacking somewhat in realism, gives what is probably the best account I've read of what it might be like to attain superintelligence - an impossible topic, yet important in view of possible future transhumanistic developments. Among the other stories in the book is the title one, Story of Your Life, which was later adapted into the Hollywood movie Arrival; I recommend both reading the story and seeing the movie (the plots diverge somewhat in interesting respects) and then listening to the podcast Very Bad Wizards discussing them.

  • Scott Alexander blogs about science, philosophy, future technologies and related topics. He often penetrates quite deeply into his chosen topic, and his posts are often longish to very long. Several of his blog posts have influenced me significantly, such as...
Excellence in writing may however be more or less genre-specific; I suspect that most good authors of university-level mathematics textbooks suck as poets, and vice versa. Another example is that when last month Ted Chiang tried his luck at writing an essay on AI futurology with the title Silicon Valley is Turning into its Own Worst Fear, his brilliance did not shine through. The gist of Chiang's argument is that superintelligent AI and capitalism are similar in that they both relentlessly optimize for something that is not entirely well-aligned with human well-being, and that since superintelligent AI does not at present exist while capitalism does, only capitalism poses a real danger. This last conclusion is a non sequitur.

And now, finally... get ready for my excuse for discussing Chiang and Alexander in the same blog post! Scott Alexander's blog post Maybe the Real Superintelligent AI is Extremely Smart Computers from earlier this week is a masterful exposition of the errors in Chiang's arguments. When I first saw Chiang's essay, I saw mostly the same errors that Alexander saw, but would never have been able to explain them quite as pedagogically as he does. Do read it (Alexander's blog post, that is), as I have nothing to add to it.

Thursday, 7 December 2017

Miscellaneous items on AI

Today I have collected a few short items about AI (artificial intelligence):
  • Yesterday the chess world was shocked by DeepMind's announcement of the program AlphaZero, which, with no prior chess knowledge beyond the rules themselves, trained itself within a matter of hours (but with great computing power) to reach a playing strength sufficient to defeat the reigning computer world champion Stockfish 8 (which in turn is superior to all human grandmasters) in a 100-game match by the crushing score of 64-36 (28 wins, 72 draws and 0 losses). Ten of the games are available here for replay, and besides AlphaZero's fabulous tactical and strategic skills, one can be fascinated by its having rediscovered on its own a number of our traditional openings (including far-reaching variations in the Queen's Indian, the Spanish and the French), something that may perhaps be taken as evidence that we have not been totally off track in our humanly developed opening theory.

    That a new chess program is better than its predecessors has of course happened before and is not in itself sensational; the paradigm shift lies in the program having attained its playing strength on its own, from scratch. It may be that the chess site chess24.com is right in its speculation that "the era of computer chess engine programming also seems to be over". It may also be that I am overreacting because of the close relationship I have had with chess in particular for decades (it is perhaps not even clear that what we learned yesterday is a bigger breakthrough than DeepMind's corresponding advance in Go earlier this year), but the event makes me consider revising my view of how fast AI development more generally is moving today. Eliezer Yudkowsky's essay There’s No Fire Alarm for Artificial General Intelligence is worth at least a thought in this context.

  • I had truly hoped never again to have to hear about the so-called No Free Lunch theorems - a bunch of trivial and on the whole useless mathematical observations that, through flashy packaging, have managed to acquire an undeservedly good reputation and to be used as a rhetorical bludgeon by creationists. With my paper Intelligent design and the NFL theorems (and the follow-up report Uniform distribution is a model assumption) a decade ago, I hoped to put an end to this reputation. And since then things have in fact been a little quieter around NFL - although it is unclear at best to what extent I can take any credit for that. But now that yet another AI debater has turned up of the kind who does not believe in superintelligence and the associated AI risk, and who is prepared to resort to any argument whatsoever in support of that position (I take such debaters down here on the blog), I should perhaps not be surprised that this debater throws the NFL theorems into the argumentation. Nonsense!

  • The question of superintelligent AI and whether it is within reach of AI research (and if so, on what timescale) is very difficult, and the intuitions of even knowledgeable and intelligent people diverge sharply. In such a situation I am of course not unaffected by what researchers and thinkers whom I count as being of the very highest class think about the matter, even when they do not have much to add on the substance. It is therefore of interest that computer scientist Scott Aaronson has now announced an adjustment of his view on superintelligence and AI risk (incidentally in the same blog post that drew my attention to the NFL nonsense mentioned in the previous item). He writes:
      A decade ago, it was far from obvious that known methods like deep learning and reinforcement learning, merely run with much faster computers and on much bigger datasets, would work as spectacularly well as they’ve turned out to work, on such a wide variety of problems, including beating all humans at Go without needing to be trained on any human game. But now that we know these things, I think intellectual honesty requires updating on them. And indeed, when I talk to the AI researchers whose expertise I trust the most, many, though not all, have updated in the direction of “maybe we should start worrying.” (Related: Eliezer Yudkowsky’s There’s No Fire Alarm for Artificial General Intelligence.)

      Who knows how much of the human cognitive fortress might fall to a few more orders of magnitude in processing power? I don’t—not in the sense of “I basically know but am being coy,” but really in the sense of not knowing.

      To be clear, I still think that by far the most urgent challenges facing humanity are things like: resisting Trump and the other forces of authoritarianism, slowing down and responding to climate change and ocean acidification, preventing a nuclear war, preserving what’s left of Enlightenment norms. But I no longer put AI too far behind that other stuff. If civilization manages not to destroy itself over the next century—a huge “if”—I now think it’s plausible that we’ll eventually confront questions about intelligences greater than ours: do we want to create them? Can we even prevent their creation? If they arise, can we ensure that they’ll show us more regard than we show chimps? And while I don’t know how much we can say about such questions that’s useful, without way more experience with powerful AI than we have now, I’m glad that a few people are at least trying to say things.

  • As announced in an earlier blog post, Stockholm was visited last month by two of the sharpest and perhaps most important philosophers of our time: Nick Bostrom and Daniel Dennett. In connection with the marketing of the Swedish translations of their respective latest books, Fri Tanke's publishing director Christer Sturmark had managed to arrange a seminar with the two of them, under the heading Medvetandets mysterium och tänkande maskiner (The mystery of consciousness and thinking machines). I was there, together with about 800 others (a fantastic number for a philosophy seminar!), and my expectations were fairly high. Above all, I hoped that Dennett would finally be pressed to give solid arguments for the position he has often aired but seldom or never justified: that the kind of AI risk (associated with the emergence of superintelligence) that Bostrom treats in his book is nothing to worry about.

    So how did it go? What did I think of the seminar? Did Dennett deliver the kind of answer I had hoped for? Instead of holding forth on this here, I refer to the podcast conversation that Sturmark and I recorded the day after the seminar, which to a large extent revolved around what Bostrom and Dennett had talked about. I think we managed a reasonably rewarding conversation, so please do listen to it! But above all, do watch the video recording of the seminar itself!

Friday, 27 October 2017

On Tegmark's Life 3.0

I've just read Max Tegmark's new book Life 3.0: Being Human in the Age of Artificial Intelligence.1 My expectations were very high after his previous book Our Mathematical Universe, and yes, his new book is also good. In Tegmark's terminology, Life 1.0 is most of biological life, whose hardware and software are both evolved; Life 2.0 is us, whose hardware is mostly evolved but who mostly create our own software through culture and learning; and Life 3.0 is future machines that design their own software and hardware. The book is about what a breakthrough in artificial intelligence (AI) might entail for humanity - and for the rest of the universe. To a large extent it covers the same ground as Nick Bostrom's 2014 book Superintelligence, but with more emphasis on cosmological perspectives and on the problem of consciousness. Other than that, I found less novelty in Tegmark's book compared to Bostrom's than I had expected, but one difference is that while Bostrom's book is scholarly and quite demanding, Tegmark's is more clearly directed at a broader audience, and is in fact a very pleasant and easy read.

There is of course much I could comment upon in the book, but to keep this blog post short, let me zoom in on just one detail. The book's Figure 1.2 is a very nice diagram of ways to view the possibility of a future superhuman AI, with expected goodness of the consequences (utopia vs dystopia) on the x-axis, and expected time until its arrival on the y-axis. In his discussion of the various positions, Tegmark emphasizes that "virtually nobody" expects superhuman AI to arrive within the next few years. This is what pretty much everyone in the field - including myself - says. But I've been quietly wondering for some time what the actual evidence is for the claim that the required AI breakthrough will not happen in the next few years.2 Almost simultaneously with reading Life 3.0, I read Eliezer Yudkowsky's very recent essay There’s No Fire Alarm for Artificial General Intelligence, which draws attention to the fact that all of the empirical evidence that is usually held forth in favor of the breakthrough not being imminent describes a general situation that can be expected to still hold at a time very close to the breakthrough.3 Hence the purported evidence is not very convincing.

Now, unless I misremember, Tegmark doesn't actually say in his book that he endorses the view that a breakthrough is unlikely to be imminent - he just says that this is the consensus view among AI futurologists. Perhaps this is not an accident? Perhaps he has noticed the lack of evidence for the position, but chooses not to advertise this? I can see good reasons to keep a low profile on this issue. First, when one discusses topics that may come across as weird (AI futurology clearly is such a topic), one may want to somehow signal sobriety - and saying that an AI breakthrough in the next few years is unlikely may serve as such a signal. Second, there is probably no use in creating panic, as solving the so-called Friendly AI problem seems unlikely to be doable in just a few years. Perhaps one can even make the case that these reasons ought to have compelled me not to write this blog post.

Footnotes

1) I read the Swedish translation, which I typically do not do with books written in English, but this time I happened to receive it as a birthday gift from the translators Helena Sjöstrand Svenn and Gösta Svenn. The translation struck me as highly competent.

2) An even more extreme version of this question is to ask what the evidence is that a superintelligent AI does not already exist and is merely biding its time, in a version of Bostrom's (2014) notion of the treacherous turn. It was philosopher John Danaher, in his provocative 2015 paper Why AI doomsayers are like sceptical theists and why it matters, who brought attention to this matter; see my earlier blog post A disturbing parallel between AI futurology and religion.

3) Here is Yudkowsky's own summary of this evidence:
    Why do we know that AGI is decades away? In popular articles penned by heads of AI research labs and the like, there are typically three prominent reasons given:

    (A) The author does not know how to build AGI using present technology. The author does not know where to start.

    (B) The author thinks it is really very hard to do the impressive things that modern AI technology does, they have to slave long hours over a hot GPU farm tweaking hyperparameters to get it done. They think that the public does not appreciate how hard it is to get anything done right now, and is panicking prematurely because the public thinks anyone can just fire up Tensorflow and build a robotic car.

    (C) The author spends a lot of time interacting with AI systems and therefore is able to personally appreciate all the ways in which they are still stupid and lack common sense.