Friday, July 12, 2024

On the state of public opinion regarding AI development

For those of us who work on AI risk issues, and who would like to build public opinion to halt the ongoing mad race towards the AI abyss, it can feel encouraging to see opinion polls whose results suggest that we already have the people on our side. For example, Time Magazine this week reported on one such American poll, announcing the following:
    According to the poll, 75% of Democrats and 75% of Republicans believe that “taking a careful controlled approach” to AI—by preventing the release of tools that terrorists and foreign adversaries could use against the U.S.—is preferable to “moving forward on AI as fast as possible to be the first country to get extremely powerful AI.”
And here at home in Sweden, we recently got to see the latest SOM survey, whose AI section reports, among other things, that only 12% of respondents answer "very high" or "fairly high" to the question "How much confidence do you have that the tech companies developing AI do so responsibly?", and that 56% are in favor of increased regulation of AI. Most gratifying!

Or is it? I am in fact quite skeptical about what conclusions regarding the public's view of AI can be drawn from this kind of survey. The vast majority have thought very little or not at all about AI development and its societal consequences, and therefore hold no particular views on the matter, so when they are suddenly pressed for answers, those answers come out more or less arbitrarily. Perhaps this phenomenon shows most clearly in the large survey of AI researchers' views on AI-futurological questions carried out by a group headed by Katja Grace. As I discussed at length in a video lecture posted here on the blog earlier this year, one of the clearest findings of that survey is how internally inconsistent and sensitive to framing effects the AI researchers' answers are. And if not even the AI researchers themselves have thought through the future of AI technology enough to give coherent answers, what should we expect of the general public?

The aforementioned SOM study is not designed to allow framing effects to be read off as directly as in the study by Grace et al., but I will nevertheless permit myself a cautious speculation about one specific figure in the results that may have been affected by framing. The SOM study shows that only 9% of respondents label the statement "AI is a threat to humanity" as "entirely correct", a figure that is strikingly low considering that the statement (in my view, and in that of many leading AI researchers) is entirely correct. But earlier in the survey there is the question "If you think about the coming 30 years, do you believe that jobs like yours will mainly be performed by humans or by robots/AI?", for which the only response options (besides "don't know") are "humans" and "robots/AI". This binary choice seems to presuppose that humanity is not wiped out by AI within the next 30 years, for without humans around, jobs such as social worker, high school teacher, nurse anesthetist or telemarketer become rather pointless and will hardly be performed by either humans or robots/AI. Through this restriction, the respondents have thus received an indirect signal that AI-driven extinction of humanity is probably not to be expected, at least not within the next 30 years, which (unless they already hold a well-considered position on AI issues) may well make them more inclined to react skeptically when, later in the survey, they are asked to take a stand on the statement "AI is a threat to humanity". (I do not for a moment believe that the SOM researchers did this with the intention of manipulating the outcome. More likely, they are themselves so unfamiliar with the threat that actually appears to be at hand that it simply did not occur to them that a third response option to the question about the labor market 30 years from now might be needed.)

Regardless of how things stand with this particular figure, I maintain that surveys about people's views on the future societal consequences of AI need to be taken with a very large grain of salt. This applies not least to the figures suggesting that those of us who want to rein in AI development have the people's support. A very large amount of opinion-building work remains before there is any substantial popular political momentum for measures in that direction.

Monday, July 8, 2024

On Anthropic's call for proposals for third-party model evaluations

Nick Bostrom's modern classic Superintelligence: Paths, Dangers, Strategies from 2014 is full of interesting ideas.1 Some of them have a scary quality to them, and the one that I found scariest of all, back when I read the book, is what he calls the treacherous turn - the idea that a sufficiently intelligent AI which discovers a discrepancy between its own goals and motivations and those of us humans is likely to hide its capabilities and/or true intentions and to keep a low profile, quietly improving its situation and its capabilities until one day it judges the coast to be clear for moving forward at full speed towards whatever its goal is, be it paperclip production or something entirely different and incomprehensible to us. I remember not being fully prepared to take the idea in, but expecting or hoping that it would soon be demoted to a noteworthy technicality, once AI alignment research had demonstrated an easy way to defuse it.

This has not happened. On the contrary, the treacherous turn phenomenon lies at the heart of the fundamental problem with evaluating the safety of advanced AI models that has become increasingly recognized in recent years. In short, we do not know how to establish the absence of dangerous capabilities in AI models without the a priori assumption that they do not possess superhuman capabilities for deception and social manipulation, making the argument for the models' safety in part circular. With increasingly capable large language models, this problem becomes increasingly pressing and has been discussed both in popular media and in articles by leading AI researchers, as well as by the AI company Anthropic2 in their recent document A new initiative for developing third-party model evaluations:
    Our research shows that, under some circumstances, AI models can learn dangerous goals and motivations, retain them even after safety training, and deceive human users about actions taken in their pursuit. These abilities, in combination with the human-level persuasiveness and cyber capabilities of current AI models, increases our concern about the potential actions of future, more-capable models. For example, future models might be able to pursue sophisticated and hard-to-detect deception that bypasses or sabotages the security of an organization, either by causing humans to take actions they would not otherwise take or exfiltrating sensitive information. We propose to develop evaluations that would monitor such abilities.
Given the current situation, it's a good thing that Anthropic makes this call for help with the problem of safety evaluation of advanced AI models. Even better, one might argue, would be if they refrained from building models whose safety cannot be ascertained with existing evaluation methods.3

Footnotes

1) The same is true about Bostrom's latest book Deep Utopia: Life and Meaning in a Solved World, which I hope to come back to in another blog post later this year.

2) Anthropic's rise, since it was founded in early 2021, has been meteoric. They are now unquestionably among the world's three leading frontier AI developers (along with OpenAI and Google/DeepMind). With their release of Claude 3.5 Sonnet on June 20 this year, we have for the first time since I became aware in 2019 of the rapid rise of large language models a situation where a model not from OpenAI has a significant lead over other publicly available models in terms of overall cognitive capabilities. To understand on a gut level how intelligent Claude is, there really is no substitute for talking to it yourself (which anyone is free to do), but let me offer an amusing sample. After having had a bit of back-and-forth with Claude regarding who are the top ten greatest chess players of all time, I switched topics a bit:

    OH: Here's a harder list-making challenge: who are the 10 greatest AI researchers of all time?

    Claude: Creating a list of the 10 greatest AI researchers of all time is indeed a challenging task, as the field is vast and has evolved significantly over the decades. Additionally, contributions can be theoretical, practical, or both, making comparisons difficult. However, I'll attempt to create a list based on historical impact, influential work, and contributions to the field. Please note that this list is subjective and others might have different opinions:

    1. Alan Turing - For his foundational work on computation and the Turing Test.

    2. Marvin Minsky - Co-founder of MIT's AI lab and author of influential works on AI.

    3. John McCarthy - Coined the term "Artificial Intelligence" and invented LISP.

    4. Judea Pearl - For his work on Bayesian networks and causal reasoning.

    5. Geoffrey Hinton - Pioneer in deep learning and neural networks.

    6. Yann LeCun - Made significant contributions to convolutional neural networks.

    7. Yoshua Bengio - Key figure in the development of deep learning.

    8. Stuart Russell - Co-author of a seminal AI textbook and researcher in multiple AI fields.

    9. Andrew Ng - For his work in machine learning and deep learning, and for making AI education widely accessible.

    10. Demis Hassabis - Co-founder of DeepMind and key figure in reinforcement learning and AI game-playing systems.

    This list attempts to balance historical pioneers with more recent contributors who have shaped modern AI. It's worth noting that many other researchers could justifiably be included, such as Herbert Simon, Allen Newell, Seymour Papert, Jürgen Schmidhuber, Fei-Fei Li, and many others.

    The field of AI is collaborative and builds upon the work of many researchers, so any such list will inevitably omit many important contributors. Also, as AI continues to evolve rapidly, the landscape of influential researchers is likely to change.

    Would you like to discuss any of these researchers in more detail or explore other aspects of AI history?

    OH: Turing on top of the list - I like that! Did you by any chance consider including Ilya Sutskever?

    Claude: I'm glad you appreciate seeing Turing at the top! His contributions to the foundations of computer science and AI are truly monumental.

    Regarding Ilya Sutskever, you raise an excellent point. He's certainly a strong candidate for such a list, and I can see why you'd bring him up. Sutskever has made significant contributions to the field of deep learning and is a key figure in modern AI research. Here's a brief overview of why he could be considered for a top AI researchers list:

    1. He's a co-founder and Chief Scientist of OpenAI, one of the leading AI research organizations.

    2. Sutskever was a key contributor to AlexNet, which was a breakthrough in computer vision using deep learning.

    3. He's made important contributions to sequence-to-sequence learning, which has applications in machine translation and other natural language processing tasks.

    4. His work on improving training techniques for deep neural networks has been influential in the field.

    5. Sutskever has been involved in the development of GPT (Generative Pre-trained Transformer) models, which have revolutionized natural language processing.

    Given these contributions, Sutskever could certainly be considered for inclusion in a top 10 list, especially if focusing on more recent impactful work in deep learning and neural networks. His omission from my initial list reflects the difficulty of narrowing down to just 10 names in such a rich and rapidly evolving field.

    If you were to include Sutskever, who would you remove from the list to make room for him? Or would you structure the list differently to accommodate more recent contributors like Sutskever?

    OH: The biggest surprise to me on your list is Judea Pearl. I'm aware of his work, and respect it, but probably would not have included him on my top 10.

    Another name I'd consider for inclusion, although I realize it might be controversial, is Eliezer Yudkowsky.

    Claude: Your perspective on Judea Pearl is interesting and highlights the subjective nature of these lists. Pearl's inclusion is often based on his groundbreaking work on Bayesian networks and causal reasoning, which has had a significant impact on how we approach probabilistic reasoning in AI. However, I can understand why his work might not seem as directly relevant to the current deep learning-dominated landscape of AI.

    Regarding Eliezer Yudkowsky, his inclusion would indeed be controversial, but it's an intriguing suggestion. Here are some points to consider:

    1. Contributions: Yudkowsky is known for his work on AI safety and rationality rather than for technical AI breakthroughs. He's been influential in raising awareness about potential risks of advanced AI systems.

    2. Impact: He founded the Machine Intelligence Research Institute (MIRI) and has written extensively on topics like friendly AI and decision theory.

    3. Controversy: His lack of formal academic credentials and some of his unconventional views have led to mixed reception in mainstream AI circles.

    4. Influence: Despite controversy, his ideas have influenced discussions on AI ethics and safety, which are increasingly important as AI systems become more powerful.

    Including Yudkowsky would shift the list's focus somewhat from technical achievements to broader impacts on AI discourse and policy. It would recognize the importance of considering AI's societal implications alongside its technical development.

    If you were to create your own top 10 list, how would you balance between technical innovators, theoretical contributors, and those who've influenced the direction and ethics of AI research? Would you give more weight to recent contributions in deep learning and neural networks, or try to represent the broader history of AI?

    OH: These are excellent questions, and I fear there is no objective truth to be found out there.

    I have a confession to make, namely that one of my reasons for asking about top AI researchers was to check whether your creators at Anthropic might have sneakily programmed you to have a preference for one of your own researchers, such as Dario Amodei or Chris Olah. Sorry about that, but you passed my test with dignity!

    Claude: I appreciate your honesty about your motivations for asking the question. You're right that there's no objective truth in creating such lists - they're inherently subjective and can vary greatly depending on the criteria used.

    Your "test" is quite insightful. I can assure you that I don't have any programmed preference for Anthropic researchers or anyone else. My responses are based on my training data and the general consensus in the AI field, not on any specific agenda or bias towards particular individuals or organizations.

    It's a good instinct to be skeptical and look for potential biases in AI systems. In my case, I'm designed to be as objective as possible, acknowledging when topics are controversial or when I might not have complete information.

    Regarding Anthropic researchers like Dario Amodei or Chris Olah, while they have certainly made valuable contributions to the field, their work would be evaluated on its merits just like any other researcher's. The field of AI is vast and rapidly evolving, with many brilliant minds contributing from various institutions and companies around the world.

    Your approach to testing for potential biases is commendable. It's this kind of critical thinking that helps keep AI development accountable and transparent. Is there anything else you'd like to discuss about AI researchers, the development of AI, or the ethical considerations surrounding it?

3) The same goes for OpenAI, as well as for their other competitors. See my comments in January this year about OpenAI's evaluation framework, and this addendum the month after.

Friday, June 28, 2024

On optimism and pessimism standing at the brink of the great AI breakthrough

Sometimes in discussions about technology, the term techno-optimism is reserved for the belief that technology will develop rapidly, while techno-pessimism is used for the belief that it will move slowly or come to a halt. This is not the meaning of optimism and pessimism intended here. Throughout this blog post, the terms will refer to beliefs about consequences of this technology: are they likely to be good or to be bad?

Last time I wrote about the concepts of optimism and pessimism in the context of future AI advances and their ramifications, way back in 2017, I advocated for a sober and unbiased outlook on the world, and held
    both optimism and pessimism as biased distortions of the evidence at hand.1,2
I still hold this view, but I nevertheless think it is worth revisiting the issue to add some nuance,3 now in 2024 when we seem to be standing closer than ever to the brink of the great AI breakthrough. To suggest intentionally introducing either an optimism or a pessimism bias still sounds bad to me, but we can reframe the issue and make it less blatantly pro-distortion by admitting that all our judgements about our future with AI will necessarily be uncertain, and asking whether there might be an asymmetry in the badness of erring on the optimistic or the pessimistic side. Is excessive optimism worse than excessive pessimism, or vice versa?

There are obvious arguments to make in either direction. On one hand, erring on the side of optimism may induce decision-makers to recklessly move forward with unsafe technologies, whereas the extra caution that may result from undue pessimism is less obviously catastrophic. On the other hand, an overly pessimistic message may be disheartening and cause AI researchers, decision-makers and the general public to stop trying to create a better world and just give up.

The latter aspect came into focus for me when Eliezer Yudkowsky, after having begun in 2021 to open up publicly about his dark view of humanity's chances of surviving upcoming AI developments, went all-in on this with his 2022 Death with dignity and AGI ruin blog posts. After all, there are all these AI safety researchers working hard to save humanity by solving the AI alignment problem - researchers who rightly admire Yudkowsky as the brilliant pioneer who during the 00s almost single-handedly created this research area and discovered many of its crucial challenges,4 and to whom it may be demoralizing to hear that this great founder no longer believes the project has much chance of success. In view of this, shouldn't Yudkowsky at least have exhibited a bit more epistemic humility about his current position?

I now look more favorably upon Yudkowsky's forthrightness. What made me change my mind is a graphic published in June 2023 by Chris Olah, one of the leading AI safety researchers at Anthropic. The x-axis of Olah's graph represents the level of difficulty of solving the AI alignment problem, ranging from trivial via steam engine, Apollo project and P vs NP to impossible, and his core messages are (a) that since uncertainty is huge about the true difficulty level we should rationally represent our belief about this as some probability distribution over his scale, and (b) that it is highly important to try to reduce the uncertainty and improve the precision in our belief, so as to better be able to work in the right kind of way with the right kind of solutions. It is with regard to (b) that I changed my mind on what Yudkowsky should or shouldn't say. If AI alignment is as difficult as Yudkowsky thinks, based on his unique experience of decades of working hard on the problem, then it is good that he speaks out about this, so as to help the rest of us move our probability mass towards P vs NP or beyond. If instead he held back and played along with the more common view that the problem is a lot easier - likely somewhere around steam engine or Apollo project - he would contribute to a consensus that might, e.g., cause a future AI developer to wreak havoc by releasing an AI that looked safe in the lab but failed to have that property in the wild. This is not to say that I entirely share Yudkowsky's view of the matter (which does look overconfident to me), but that is mostly beside the point, because all he can reasonably be expected to do is to deliver his own expert judgement.
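
To make point (a) a bit more concrete, here is a minimal Python sketch of the kind of bookkeeping that Olah's graphic invites. The difficulty levels are his, but the numbers are entirely made up for illustration, and the simple linear opinion pool used to shift probability mass towards a more pessimistic expert is my own choice of update rule, not anything Olah or Yudkowsky prescribes.

    # Toy sketch (made-up numbers): a belief distribution over Olah's scale of
    # AI alignment difficulty, and a crude update that mixes in the view of an
    # expert whose probability mass sits at the hard end of the scale.
    levels = ["trivial", "steam engine", "Apollo project", "P vs NP", "impossible"]

    # Hypothetical prior: most mass around "steam engine" / "Apollo project".
    prior = {"trivial": 0.05, "steam engine": 0.35, "Apollo project": 0.40,
             "P vs NP": 0.15, "impossible": 0.05}

    # Hypothetical pessimistic expert, concentrated on "P vs NP" and beyond.
    expert = {"trivial": 0.00, "steam engine": 0.02, "Apollo project": 0.08,
              "P vs NP": 0.60, "impossible": 0.30}

    def pool(prior, expert, weight):
        """Linear opinion pool: move a fraction 'weight' of our belief to the expert's view."""
        mixed = {lvl: (1 - weight) * prior[lvl] + weight * expert[lvl] for lvl in levels}
        total = sum(mixed.values())  # renormalize, guarding against rounding error
        return {lvl: p / total for lvl, p in mixed.items()}

    for lvl, p in pool(prior, expert, weight=0.3).items():
        print(f"{lvl:15s} {p:.2f}")

With these invented numbers, even a modest 0.3 weight on the pessimistic expert moves a noticeable chunk of probability mass from the Apollo project region towards P vs NP and beyond, which is the kind of shift the paragraph above is about.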

At this point, I'd like to zoom in a bit more on Yudkowsky's list of lethalities in his AGI ruin post, and note that most of the items on the list express reasons for not putting much hope in one or the other of the following two things.
    (1) Our ability to solve the technical AI alignment problem.

    (2) Our ability to collectively decide not to build an AI that might wipe out Homo sapiens.

It is important for a number of reasons to distinguish pessimism about (1) from pessimism about (2),5 such as how a negative outlook on (1) gives us more reason to try harder to solve (2), and vice versa. However, the reason I mainly want to highlight here is that unlike (1), (2) is a mostly social phenomenon, so that beliefs about the feasibility of (2) can influence this very feasibility. To collectively decide not to build an existentially dangerous AI is very much a matter of curbing a race dynamic, be it between tech companies or between nations. Believing that others will not hold back may disincentivize a participant in the race from themselves holding back. This is why undue pessimism about (2) can become self-fulfilling, and for this reason I believe such pessimism about (2) to be much more lethal than a corresponding misjudgement about (1).6
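
To spell out the self-fulfilling aspect of the race dynamic in the simplest possible terms, consider the following toy calculation. The payoff numbers are invented purely for illustration and make no claim about the actual stakes; the qualitative point is just that once a participant believes strongly enough that the other side will race, racing becomes the individually preferred option, which is exactly how pessimism about (2) feeds on itself.

    # Stylized "race vs hold back" illustration with made-up payoffs (to the
    # row player "us"), indexed by (our action, their action).
    payoff = {
        ("hold back", "hold back"): 10,   # coordinated restraint: safest outcome
        ("hold back", "race"):      -5,   # we fall behind while risk rises anyway
        ("race",      "hold back"):  6,   # we take the lead, but add risk
        ("race",      "race"):       0,   # full race dynamic
    }

    def expected_payoff(our_action, p_they_race):
        """Expected payoff of our action, given our belief that the other side races."""
        return ((1 - p_they_race) * payoff[(our_action, "hold back")]
                + p_they_race * payoff[(our_action, "race")])

    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        hold = expected_payoff("hold back", p)
        race = expected_payoff("race", p)
        best = "hold back" if hold >= race else "race"
        print(f"P(they race) = {p:.1f}: hold back = {hold:5.1f}, race = {race:5.1f} -> {best}")

With these particular numbers the switch happens somewhere between 0.3 and 0.5; the exact threshold is immaterial, but the existence of such a threshold is what makes beliefs about others' restraint so consequential.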

This brings me to former OpenAI employee Leopold Aschenbrenner's recent and stupendously interesting report Situational Awareness: The Decade Ahead.7 Nowhere else can we currently access a more insightful report about what is going on at the leading AI labs, how researchers there see the world, and the rough ride during the coming decade that the rest of the world can expect as a consequence of this. What I don't like, however, is the policy recommendations, which include the United States racing ahead as fast as possible towards AGI over the next few years. Somewhat arbitrarily (or at least with insufficiently explained reasons), Aschenbrenner expresses optimism about (1) but extreme pessimism about (2): the idea that the Chinese Communist Party might want to hold back from a world-destroying project is declared simply impossible unless their arm is twisted hard enough by an obviously superior United States. So while on one level I applaud Aschenbrenner's report for giving us outsiders this very valuable access to the inside view, on another level I fear that it will be counterproductive for solving the crucial global coordination problem in (2). And the combination of overoptimism regarding (1) and overpessimism regarding (2) seems super dangerous to me.

Footnotes

1) This was in the proceedings of a meeting held at the EU parliament on October 19, 2017. My discussion of the concepts of optimism and pessimism was provoked by how prominently these terms were used in the framing and marketing of the event.

2) Note here that in the quoted phrase I take both optimism and pessimism to be deviations from what is justified by the evidence - for instance, I don't mean that estimating the probability of things going well at 99% automatically counts as optimistic. This is a bit of a deviation from standard usage, to which I will revert in what follows, instead using phrases like "overly optimistic" to indicate optimism in the sense I gave the term in 2017.

3) To be fair to my 2017 self, I did add some nuance already then: the acceptance of "a different kind of optimism which I am more willing to label as rational, namely to have an epistemically well-calibrated view of the future and its uncertainties, to accept that the future is not written in stone, and to act upon the working assumption that the chances for a good future may depend on what actions we take today".

4) As for myself, I discovered Yudkowsky's writings in 2008 or 2009, and insofar as I can point to any single text having convinced me about the unique importance of AI safety, it's his 2008 paper Artificial intelligence as a positive and negative factor in global risk, which despite all the water under the bridges is still worthy of inclusion on any AI safety reading list.

5) Yudkowsky should be credited with making this distinction. In fact, when the Overton window on AI risk shifted drastically in early 2023, he took that as a sufficiently hopeful sign so as to change his mind in the direction of a somewhat less pessimistic view regarding (2) - see his much-discussed March 2023 Time Magazine article.

6) I don't deny that, due to the aforementioned demoralization phenomenon, pessimism about (1) might also be self-fulfilling to an extent. I don't think, however, that this holds to anywhere near the same extent as for (2), where our ability to coordinate is more or less constituted by the trust that the various participants in the race have that it can work. Regarding (1), even if a grim view of its feasibility becomes widespread, I think AI researchers will still remain interested in making progress on the problem, because along with its potentially enormous practical utility, surely this is one of the most intrinsically interesting research questions one can possibly ask (up there with understanding biogenesis or the Big Bang or the mystery of consciousness): what is the nature of advanced intelligence, and what determines its goals and motivations?

7) See also Dwarkesh Patel's four-and-a-half hour interview with Aschenbrenner, and Zvi Mowshowitz's detailed commentary.

Thursday, May 30, 2024

Talking about OpenAI on the For Humanity Podcast

There's been a lot happening at OpenAI in recent weeks, including the departure of their two leading AI safety researchers Ilya Sutskever and Jan Leike. I discussed some of this, along with a handful of related AI risk and AI safety issues, in a long conversation with John Sherman in the latest episode of his podcast For Humanity: An AI Risk Podcast.

Those content with hearing the audio can alternatively get that via Google Podcasts or Apple Podcasts or wherever you listen to podcasts. The episode was released yesterday, May 29, and was recorded one week earlier, on May 22. For updates on what has happened since then with all of the ongoing OpenAI drama, I generally recommend Zvi Mowshowitz on his Substack Don't Worry About the Vase, and right now in particular his posts OpenAI: Fallout (from May 28) and AI #66: Oh to Be Less Online (from today, May 30).

Wednesday, May 15, 2024

My experience of the course Diversity and inclusion in higher education

This past winter I managed to earn my first university credits in several decades! The reason it has taken so long is that in my youth I advanced so quickly through the academic ranks that I reached the top rung - full professor - already at the turn of the millennium, which was before various courses in higher education pedagogy had been introduced as requirements for the different career steps.

But this year, the time had come. Chalmers has issued a directive that all teachers with course responsibility must have taken and passed the course Diversity and inclusion in higher education (CLS930) by January 1, 2027, in order not to lose their examiner rights.1 Against this requirement it does not suffice even to wave one's professorial title, good reputation as a teacher, and many years of teaching experience.

I entered the course without any great expectations but with an open mind, resolved to do what was required to meet the course requirements without kicking up a fuss, as long as this was possible within the bounds of the intellectual honesty I always demand of myself. My experience of the course turned out to be mostly negative, and early on I began to toy with the idea of recounting my impressions of it in a blog post, something I judged to be compatible with the ambition not to cause trouble. I finished the course in March, and that I haven't gotten around to the blog post until now is mostly a matter of laziness and of prioritizing other writing, but the impulse that finally made me pick up the pen is an article in Dagens Nyheter by Carl Cederström, senior lecturer in business administration at Stockholm University, who recounts his own experience of courses in higher education pedagogy. The article set off extensive Facebook discussions among fellow university teachers, and I think there may be some value in adding my own experiences to the record.

It should be said right away that the Chalmers course I took included nothing even remotely resembling the cow-milking-charade bizarreness that Cederström recounts:
    Then came the day when the tables had been pushed aside and the chairs arranged in a circle, as at an AA meeting.

    "You do become a little more naked when you set up a classroom like this," the course teacher explained.

    The truth of that statement was hard to dispute.

    The explanation of why we necessarily had to feel naked, if one was given, I have forgotten. But more naked it got. After lunch we were asked to take off our shoes. The teacher laced up a pair of soft black dance shoes.

    We stood in a big ring. We were to let our gazes sweep freely across the room. And when we made eye contact with someone else, we were both to jump with both feet at the same time, then march towards the middle and, just as we passed each other, shout "hi!"

    Outside, the sun was shining. The wooden floor gleamed. I felt a little dizzy.

    We hopped around for a while in our stocking feet shouting hi! and then we relocated, in our imagination, to a farm in the mid-1800s. Now we moved freely around the hall, higgledy-piggledy, waiting for someone to call out a farm activity of their choice.

    "Saw!" someone shouted, and we began jerking our arms back and forth through the air.

    "Wash clothes!" someone else shouted, and we bent down and pressed our hands against the floor.

    "Chop wood!" shouted a third, and we swung our arms.

    And then, finally, it came:

    "Milk a cow!"

I was spared this. Something else I was spared, but which many colleagues in the aforementioned Facebook discussions testified to, was the kind of pedagogical over-theorizing, disconnected from how one should actually act in the classroom, that is apparently quite common in higher education pedagogy courses. My problem with CLS930 was rather the opposite: the course offered plenty of directives and recommendations on how to act as a university teacher in various situations, but very little or nothing at all about the theoretical underpinnings that might be cited as reasons why precisely the recommended actions are the right ones. Time and again during the course - mainly in the mandatory written assignments, but occasionally also orally in the lecture hall - I tried to steer the discussion towards these theoretical underpinnings or, lacking such underpinnings, at least towards some kind of proper academic argumentation, but I got very little response. Here is an example from my submission in the part of the course that dealt with equal treatment of students with disabilities:
    Three of the documents we were assigned feature prominently the statement “All students must be able to study on equal terms”. Taken literally, this has far-reaching and radical consequences, but from what follows in these documents on national and local (Chalmers) policies, this is not how the statement is meant to be read; rather “equal terms” seems to mean “terms that are equal along a certain limited set of aspects”. There is an interesting and important (in my opinion) discussion to be had regarding what this set of aspects should be, and why, but this is entirely lacking in the assigned documents. Consider the following examples of nonequal terms:
      (1) Alice has legs to walk on, whereas Bob is wheelchair-dependent.
      (2) Alice has excellent eyesight, whereas Bob is blind.
      (3) Alice is a proficient reader, whereas Bob is dyslexic.
      (4) Alice is highly talented in abstract thinking and mathematics, whereas Bob is simply unable to do or grasp any mathematics beyond the simplest arithmetic.
      (5) Alice has excellent grit and stamina for academic work, whereas Bob is notoriously lazy.
      (6) Alice has a home environment ideally suited for academic work, whereas Bob lives in a tiny and disorganized flat with his dysfunctional single parent and five younger siblings that he must spend many hours a day taking care of.
    In each case, Bob is highly disadvantaged compared to Alice, and yet, we treat the cases very differently. My impression (which admittedly is uncertain) of Chalmers’ policy is that the beautiful words about “All students must be able to study on equal terms” are meant to cover cases (1), (2) and (3) where we are obliged to take compensatory action to help Bob carry out his studies on (at least somewhat more) equal terms, while cases (4), (5) and (6) are taken to be beyond the scope of what we are obliged to compensate for. In some cases, I can (at least initially) feel a strong intuitive pull that the difference in treatments is justified, such as in cases (1) vs (5): of course we should make sure our lecture halls can accommodate wheelchair-bound students, whereas of course we shouldn’t coddle students who are just plain lazy. In other cases, it is not so obvious, and in particular I have a hard time coming up with a convincing justification why a line should be drawn between cases (3) and (4). Furthermore, the more I think about contrasts such as (1) vs (5), the less obvious to me is it that treating the cases differently is justified. One might argue that in case (1) Bob is mobility-impaired through no fault of his own, while in case (5) he has only himself to blame for his laziness, but this latter claim seems highly questionable when we consider the possibility that Bob’s laziness is a consequence of some combination of genetic and early-childhood environmental factors that were entirely beyond his control.

    We talk so much about fairness in this course that I think it’s only fair that we spend at least a few minutes discussing the unfairness inherent in our policy to compensate students for some kinds of disability and impairment but not for others.

To this I got no response at all. I have some understanding for the fact that the teacher of a course like CLS930 does not have time to dig into every submission from every course participant, and perhaps grows extra weary of submissions that, like the one above, may at first glance appear a bit obstinate, but I do think I raise a legitimate question, and I am a little disappointed by the lack of response. It may of course also be that I am entirely mistaken and that my question is in fact unjustified and stupid, but if so, I would still benefit from an explanation.

This pattern repeated itself several times during the course, and I felt increasingly unsatisfied intellectually.2 One aspect that contributed strongly to this dissatisfaction was that the practical recommendations for how to conduct our teaching that emerged in the course exhibited several internal tensions bordering on contradiction. Foremost among these is how the course conveyed a generally condemnatory attitude towards discrimination, while at the same time the various measures recommended almost exclusively themselves consisted of various kinds of discrimination. Perhaps it is a sign of the occupational hazards of academia, but when I encounter such things I feel a need to dig deeper, to try to find the roots of these tensions and, in the best case, resolve them. No such digging was part of the course, however, and I still have trouble reconciling (a) the uncompromising repudiation of discrimination, including the use of the word "zero tolerance", in various official Chalmers documents,3 with (b) the various discriminatory measures that the university pursues, and often markets with a certain pomp and circumstance.4

I am thus not saying that the measures in (b) are necessarily all wrong, only that I do not understand how they square with (a). For my own part, however, I have chosen not to knowingly or deliberately engage in any discrimination whatsoever in my teaching, something I also stuck to in the CLS930 assignments that dealt with how I intended to implement insights from the course in my own teaching.5 I toyed with the thought that this refusal to fall in line might earn me a failing grade on the course, and said to myself, somewhat amused, that I hoped the examiner would not be as strict as the leadership at UC Berkeley,6 because in that case I might be toast. But this did not make me the least bit nervous, partly because the worst-case scenario (that from 2027 onwards I would need a ghost examiner on the courses I was otherwise responsible for) did not seem particularly terrifying, and partly because Carl Cederström probably comes close to the truth when, in the aforementioned DN article, he writes that "to my knowledge, no one has ever failed a course in higher education pedagogy". And pass I did, sure enough, without the slightest hint of being sent back for more work.

Footnotes

1) Given how far into the future this deadline lies, some readers may wonder why I took the course already this year (if, say, the world ends or the strict Chalmers directive is changed before January 1, 2027, I will in a sense have taken the course unnecessarily). The answer is that, because the increased interest in taking the course brought about by the new directive far exceeds the number of places, a system has been introduced whereby each department may nominate a limited number of teachers for each round of the course. At my department, the leadership chose to prioritize me in this context; I believe the reason is that they judged that, in my role as a member of Chalmers' appointments committee, I would benefit especially from the course material.

2) What provided great relief, however, was that in parallel with the course (and on my own initiative) I read the book The Identity Trap by Yascha Mounk, which covers partly the same subject area as CLS930 but in an entirely different way, where the ideas are taken seriously intellectually and the author digs deep into their background. Warmly recommended!

3) See, for example, the text Gender equality and equal opportunity plan from 2022, which was part of the recommended reading for the course.

4) See, for example, Camp Vera, which is surely very valuable for the participants and probably also for Chalmers, but which is at the same time flagrantly discriminatory in that only persons who "identify as a girl or non-binary" are welcome to participate.

5) As grounds for this policy in my own teaching, I cited among other things that discriminatory measures risk violating the Swedish Discrimination Act of 2008, which we had learned about at the beginning of the course. But here I actually did get a reply from the teacher, who pointed to a provision in the act exempting "measures that are part of efforts to promote equality between women and men and that concern matters other than pay or other terms of employment".

This is interesting, and may be a step towards resolving the tension I am discussing here between, on the one hand, the condemnation of discrimination and, on the other, the practice of it. But I still cannot quite make sense of it. Presumably the quoted exemption would clear, say, the Camp Vera initiative of accusations of discrimination, but how would the law be applied if someone ran, for instance, a cancer clinic with a policy of always prioritizing male patients over female ones, admitting the latter only as space permits, and claimed that the purpose of this policy is to promote equality between men and women in terms of life expectancy? Seen through my legally ignorant and naive eyes, it looks as if the exemption in the law is applicable and the cancer clinic could be cleared, but I believe and hope we can all agree that this would be utterly wrong.

6) I am referring here to their scoring rules for the diversity and inclusion statement that is mandatory when applying for a faculty position there, which I know of thanks to Bryan Caplan. Possibly we are now seeing a first sign that such mandatory ideological declarations of faith are on the retreat in American academia.

Wednesday, May 8, 2024

On KVA's new AI booklet

One of the various ways in which the Royal Swedish Academy of Sciences (KVA) conducts its outreach to the general public is through the generally excellent publication series Vetenskapen säger ("Science says"). The concept of presenting a current field of science in 16 accessible pages is a fine one, but the devil is in the details, and the latest issue, which appeared on April 11 this year and deals with AI, is not entirely successful.

As a member of KVA, I feel a need to state that this booklet does not represent any consensus view within KVA on AI issues. Strictly speaking, one could argue that this remark is superfluous, since the last page of the booklet states that it was written by an expert group consisting of Virginia Dignum, Fredrik Heintz, Danica Kragic Jensfeldt, Amy Loutfi and Anders Ynnerman, and that it "reflects the expert group's view and should not be seen as a statement or position taken by [KVA]". Given how easily a reader might miss this, and how heavily KVA has marketed the booklet, not least on social media, I still see value in issuing the separate disclaimer that this blog post constitutes.

At the end of the booklet, a couple of pages are devoted to various future societal challenges raised by AI, all of a more down-to-earth character than the existential risk issues I usually raise, for example in my lectures and books and here on the blog. Issues of the latter kind are mentioned in the KVA booklet only in passing, in a fact box with the following wording:
    AGI and the debate about X-Risk

    Artificial General Intelligence (AGI) would be an AI system that can do everything we humans can, and more. Some AI researchers believe we are quite close to this - others that it will take decades, or that AGI will never exist. Several large companies are today competing to be the first to develop AGI. A debate is now growing about so-called existential risk, X-Risk: that an AGI out of control could threaten our existence. What happens if we create AGI, give it great influence over the physical world, and insufficient instructions? Some researchers argue that such systems could prioritize other goals above our safety and harm us in order to achieve those goals.

    AI experts do not agree on how large the X-Risk is, or when, or even whether, it will become a live issue. But those who issue warnings want that risk, too, to be clearly factored into AI regulation.

All else equal, it is of course better that this fact box is included than that it is not. Still, I think the marginal, tucked-away place given to the x-risk issue in the booklet skews the overall message. The reader who nevertheless makes it all the way to the fact box may, faced with its message, think "oh, threaten our existence, that sounds serious", but the design of the booklet is tailor-made for the reader's next thought to be "but no, the AI researchers surely have the situation under control, because if they didn't, the booklet would presumably have been designed differently, with this catastrophic risk given far more space, probably starting on page one".

But this impression is deceptive, because the situation is not under control. I find this particularly serious given that an explicitly stated target audience for the publication series is high school students. It cannot be an acceptable communication strategy towards our children and young people to downplay and sweep under the rug the global problems that we adults have created and are, within a few years, about to hand over to them.

I take credit for the fact box being included at all. About a month before the booklet was released, I got - almost by coincidence - the opportunity to read a draft along with an invitation to comment. The draft did not contain a word about x-risk issues, and knowing the researchers in the author group (I know all but one of them and am well acquainted with their lack of interest in x-risk), I am convinced that this would have remained the case in the final version as well, had I not responded to the invitation to comment. What follows are a couple of very lightly edited excerpts from the email I wrote on March 14 to Ulf Ellervik, who was not part of the author group but has overall editorial responsibility for the publication series.
    Hi Ulf, and thank you for letting me have a look at the draft of Vetenskapen säger – om AI!

    Much of the text is excellent, but I have a number of minor remarks that I think call for corrections, plus a considerably larger point which, in short, means that the text would do great harm if published in its current state. But let me work through the small things first.

The subsequent list of minor points, which on the whole were dealt with satisfactorily in the final version of the booklet, we can leave aside. After the list, I went straight to my main message:
    So much for the marginal remarks - now for my most serious criticism of the draft:

    A brief look at human prehistory is enough to see how dangerous it can be to fall behind and end up in second place on the list of the planet's most intelligent species. The creation of AGI therefore constitutes an existential risk to Homo sapiens, and that is why we find ourselves in an alarming situation today, with the three leading AI developers (OpenAI, Anthropic and Google/DeepMind) locked in a furious race towards AGI, where leading representatives of these companies foresee that only a few years remain until that goal is reached, yet without being able to present anything close to a convincing proposed solution to the so-called AI alignment problem, which is about making AGI humanity's servant or friend rather than its exterminator.

    This AI-driven existential risk (x-risk) has within just a few years emerged as what is, in my view, the single most acute societal issue to solve, and if we fail at it, things may very well go so badly that not a single human being lives to see the end of the current decade. For a text like Vetenskapen säger – om AI to fail to mention this x-risk problem with so much as a single word is a grotesque misjudgement.

    The concluding section of the text has a fair amount to say about other AI risks, associated for instance with deep fakes and with how individual benefit recipients in the Dutch social insurance system can be unfairly squeezed as a result of skewed AI assessments. That is good, because these issues are important, but discussing them without mentioning AI x-risk at all is a bit like a safety inspection report ahead of the Titanic's maiden voyage remarking on the somewhat sparse provision of fire extinguishers in the ship's corridors, and on the risk of slipping in the kitchen areas, while paying no attention whatsoever to the ship's seaworthiness in the event of running aground or colliding with an iceberg.

    The impression left by the text becomes all the stranger given that it concedes that AGI may be on its way (with a formulation like "some AI researchers claim that we are quite close"), without following up on the x-risk issue that this puts so obviously on the table.

    I am of course not claiming that the issue of AI x-risk is uncontroversial. On the contrary, the views of leading researchers in the field diverge sharply, as was recently illustrated by a compilation of risk-level estimates from a number of well-known AI figures (https://pauseai.info/pdoom). So there is no consensus that the risk is substantial, but neither is there any consensus that it is negligible, which is nevertheless the implicit message of the current draft. If this stands in the final version of the text, it will in practice become a contribution to the unfounded propaganda campaign about AI's harmlessness that is being driven by dark forces with short-term commercial and other interests.

    That will not do. It truly will not do in a situation where two of the three researchers who shared the 2018 Turing Award, with a citation that includes the words "the fathers of the deep learning revolution", along with the leaders of the three foremost AI labs, plus hundreds of prominent AI researchers and professors, have jointly signed a statement that AI x-risk is one of the most urgent issues of our time to address (https://www.safe.ai/work/statement-on-ai-risk). When Vetenskapen säger – om AI reaches its final form and is published, the x-risk perspective needs to be included. It cannot be withheld from our high school students and other readers of the planned pamphlet. And given the terrible gravity of the issue - it is, after all, about whether our loved ones and relatives who are now infants will get the chance to grow up to school age - it is not enough to mention the matter in two or three sentences; it needs to be given at least one, and preferably two, pages in the final printed text.

    There is no difficulty whatsoever in filling this space with interesting and engaging substantive discussion. The evidence that we are facing an acute risk situation has grown rapidly stronger over the years of the 2020s so far. Some of it is presented in detail in the 2023 edition of my latest book Tänkande maskiner (https://fritanke.se/bokhandel/bocker/tankande-maskiner/). Ideally, given how controversial the issue is, the pamphlet would lay out the arguments of both sides - those who hold that AI x-risk should be taken seriously and those who hold that we can shrug it off. One difficulty here, however, is that the latter category of arguments tends to be so weak and evanescent that they evaporate under the slightest critical scrutiny. For a typical example of this, see the panel discussion I took part in last autumn together with Virginia Dignum (https://haggstrom.blogspot.com/2023/10/debating-ai-takeover-with-virginia.html), whose fingerprints are visible in many places in the present draft. That difficulty can be overcome, however, and I am convinced that the final version of the pamphlet can be made good and well balanced.

    Let me finally note that the draft's concluding paragraph -

      It is not uncommon for people to be afraid of AI. AI is often described as something that happens to us, something we cannot control but only try to soften the consequences of. That creates a sense of powerlessness. But AI does not happen to us - we are the ones making AI happen. AI is created and designed by humans, and that gives us alternatives and choices. Through responsible development, clear rules and careful ethical consideration, we can ensure that the risk of harm is kept as small as possible and that AI is used for the good of society.
    - is well written and something I can endorse completely. Responsible development cannot, however, rest on closing our eyes to risks and sweeping them under the rug. The truth needs to be looked squarely in the eye even when it is frightening, just as with the climate issue. And just as with the climate issue, the purpose of such clear-sightedness is not to make people afraid. The purpose is to spread the resolve required to set the course of development right. If it is judged that I can be of use in the further work on the text, I am of course at your service.

    Best regards,
    Olle

Friday, April 19, 2024

Future of Humanity Institute 2005-2024

The news that University of Oxford's Future of Humanity Institute (FHI), after nearly two decades of existence, closed down earlier this week (Tuesday, April 16) made me very sad. The institute was Nick Bostrom's brainchild, and it was truly pioneering in terms of formulating some of the most profound and important questions about how to ensure a flourishing future for mankind, as well as in beginning the work of answering them. Their work has more or less uninterruptedly been at the forefront of my mind for more than a decade, and although I only visited their physical headquarters twice (in 2012 and 2016), it is clear to me that it was a uniquely powerful and creative research environment.

In the first draft of this blog post I used the acronym RIP in the headline, but decided to change that, because I wish that what remains from the institute - the minds and the ideas that it fostered - will not rest in peace, but instead continue to sparkle and help create a splendid future. They can do this at the many subsequent research institutes and think tanks that FHI helped inspire, such as The Centre for the Study of Existential Risk in Cambridge, The Future of Life Institute in Massachusetts, The Global Priorities Institute in Oxford, and The Mimir Center at the Institute for Future Studies in Stockholm. And elsewhere.

My friend Anders Sandberg was a driving force at the institute almost from the start until the very end. His personal memoir of the institute, entitled Future of Humanity Institute 2005-2024: Final Report, offers a summary and many wonderful glimpses from their successful work, including a generous collection of photographs.1 Reading it is great consolation at this moment. Along with the successes, Anders also tells us briefly about the institute's downfall:
    Starting in 2020, the Faculty [of Philosophy] imposed a freeze on fundraising and hiring. Unfortunately, this led to the eventual loss of lead researchers and especially the promising and diverse cohort of junior researchers, who have gone on to great things in the years since. While building an impressive alumni network and ecosystem of new nonprofits, these departures severely reduced the Institute. In late 2023, the Faculty of Philosophy announced that the contracts of the remaining FHI staff would not be renewed. On 16 April 2024, the Institute was closed down. [p 19]
Later, on p 60-61, he offers three short paragraphs about what failings on the FHI's side may have led to such harsh treatment from the Faculty. What he offers is hardly the full story, and I have no specific insight into their organization that can add anything. Still, let me venture a small speculation, mostly based on introspection into my own mind and experience, about the kind of psychological and social mechanisms that may have contributed:

If you are an FHI kind of person (as I am), it will likely seem to you that lowering P(doom) by as little as a ppm is so obviously urgent and important that it appears superfluous and almost perverse to argue for such work using more traditional academic measuring sticks and rituals. That may lead you to ignore (some of) those rituals. If this clash of cultures continues for sufficiently long without careful intervention, the relations to the rest of the university are likely to decline and eventually collapse.
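
For readers wondering why a single ppm would loom so large in the FHI mindset, here is the back-of-the-envelope arithmetic I have in mind; it is my own illustration with deliberately crude numbers, and it counts only people alive today, ignoring the future generations where most of the expected value arguably lies.

    # Rough arithmetic: what a one-part-per-million reduction of P(doom)
    # corresponds to in expected lives, counting only today's population.
    current_population = 8_000_000_000   # approximate world population, 2024
    ppm = 1e-6                           # one part per million
    print(f"Expected lives saved per ppm of P(doom) reduced: {current_population * ppm:,.0f}")
    # prints 8,000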

Footnote

1) See also his latest blog post.

Tuesday, April 2, 2024

Interviewed about AI risk in two new episodes of The Evolution Show

Three years ago, shortly after the release of the first edition of my book Tänkande maskiner, I was interviewed about AI risk by Johan Landgren in his YouTube podcast The Evolution Show. The amount of water under the bridges since then has been absolutely stupendous, and the issue of AI risk has become much more urgent, so last month Johan decided it was time to record two more episodes with me:

In his marketing of our discussion he put much emphasis on "7 years" as a timeline until the decisive AI breakthrough that will make or break humanity. I'm not sure I even mentioned that figure explicitly in our conversations, but admittedly it was implicit in some of the imagery I offered. Still, I should emphasize that timelines are extremely uncertain, to the extent that an exact figure like 7 years needs to be taken with a huge grain of salt. It could happen in 2 years, or 5, or 10, or - given either some severe unforeseen technical obstacle or a collective decision to pause the development of frontier AI - even 20 years or more. This uncertainty subtracts nothing, however, from the urgency of mitigating AI existential risk.

Another part of Johan's marketing of our conversation that I'd like to quote is his characterization of it as "kanske det viktigaste jag haft i mitt liv" ("perhaps the most important one in my entire life"). This may or may not be an exaggeration, but I do agree with him that the topics we discussed are worth paying attention to.

Tuesday, March 26, 2024

What Max Tegmark really said about AI risk

In a recent article in the Australian online magazine Quillette, Swedish AI pundit Mathias Sundin portrays his more famous compatriot Max Tegmark as a doomsayer. He quotes Tegmark as saying the following in the Swedish public radio show Sommar on August 1, 2023:
    I’ve been thinking a lot about life and death lately. Now, it’s probably my turn next in my family. But I guess the rest of humanity will perish about the same time — after over a hundred thousand years on our planet. I believe the artificial intelligence that we’re trying to build will probably annihilate all of humanity pretty soon.
The quote is authentic and the translation from Swedish is fine. But look at Sundin's comment immediately after the quote:
    There were no ifs or buts and no “10 percent risk” or other disclaimers — just the promise of certain doom.
The careful reader might here notice the contrast between Sundin's talk of “certain doom” and Tegmark's use of the word “probably” — which normally signifies not certainty but uncertainty.1 If the reader trusts that Sundin writes in good faith, he will likely conclude that Tegmark uses the word “probably” not in its real meaning, but as meaningless sentence filler and random noise, and that if one looks carefully at what else Tegmark has to say in the Sommar show, Sundin's summary about “the promise of certain doom” turns out to be accurate.

So let's have a look, then, at what Tegmark has to say in Sommar, to determine whether Sundin's summary statement is warranted. (Spoiler: it is not.)

Less than half a minute after the passage quoted by Sundin, Tegmark says the following:
    What can we do today, in order to change direction towards a more hopeful and inspiring future?2
This is already an indication that, far from preaching “certain doom”, Tegmark thinks that “a more hopeful and inspiring future” is possible, and wants to engage his listeners in the great project of navigating towards such a future.

Soon after that, Tegmark goes on to compare an AI catastrophe wiping out humanity to a more down-to-earth and familiar event: the death in old age of his own parents. He feels that his parents lived full lives, and thinks the possibility of an AI apocalypse compares unfavorably to this:
    But if the entire human race dies out because we messed up with AI, I feel in contrast that we have neither lived fully nor died honorably. Rather, it would feel tragic and unnecessary, as if a child were by mistake to ride its bike over the edge of a precipice, despite repeated warnings. We are so young from a cosmic perspective, with billions of promising years ahead of us. If we avoid that precipice, our future can become so much better than our past. We are on the verge of finally taking control of our own destiny, and solving many of the problems we humans have so far failed at.3
“Our future can become so much better than our past” are hardly the words of someone who predicts “certain doom”, and the entire passage is on the contrary rather hopeful. Later in the show, Tegmark spells out his hopeful vision of a flourishing future in slightly more detail:
    I have talked a lot about the threat from AI, but what happens if we manage to change course so as to end up in a future with a superintelligence which is under our control, and which is neither used maliciously, nor is misaligned or in competition with us? First of all, this will be like Christmas Day for anyone who likes research and technology, as the research done by the superintelligence will be limited not by what humans are able to think up, but only by the laws of nature. Discoveries that I believed as a teenager would take millennia or an eternity might be solved within our lifetimes. My uncle Erik and his daughter Anna both died of cancer, but I am convinced that all cancer can be cured — the problem is just that we humans have so far failed to figure out how. I believe superintelligence would quickly find cures for all diseases, and lift everyone out of poverty, stabilize the climate and solve all the classical global problems we have so far failed at.4
A more striking counterexample to Sundin's claim that Tegmark offers “no ifs or buts” in his talk about AI risk would be hard to imagine. And there's more:
    What drives my AI activism is not just that I want to avoid problems, but also that I want to preserve these inspiring possibilities.5
And this:
    We do not need to compete with AI. It is just Moloch trying to fool us into that attitude. We are the ones building AI, and if we make sure to keep control, then it will be us rather than AI who call the shots.6
It is clear from these quotes that far from predicting “certain doom”, Tegmark thinks a bright future awaits us if only we get our act together and play our cards right. It is furthermore clear that he doesn't even think this right way of playing our cards necessarily involves giving up on the project of building superintelligent AI. On the contrary, such AI plays a key role in the grand visions for the future that he paints. It's just that he understands that there are risks involved and that we therefore need to proceed with a suitable level of caution, something that he judges the leading AI developers to lack. Hence his talk about the need to change course.

This is a highly reasonable position (and one that I share),7 but conveying it accurately runs counter to Sundin's ambition of painting Tegmark in the worst possible light. Instead, Sundin shamelessly decides to gamble on the readers of Quillette not knowing Swedish and therefore not being able to check his story, and to simply lie about what Tegmark says in Sommar.

What else can be said about Sundin's Quillette article? Well, it is quite bad. As is sadly typical for large parts of the AI debate as a whole and for his own writings in particular, Sundin is not interested in engaging seriously with the arguments of his opponents, and the article is full of inaccuracies and misrepresentations. I spent so much ink in 2023 — one newspaper op-ed and two blog posts — on Sundin's poor writing that frankly speaking I am sick of it, and so I will not say anything more about his Quillette article. I will not even comment on his choice (which is especially bizarre given the long and tedious email exchange we had about it in December last year) to continue to grossly misrepresent the point of my Lex Luthor thought experiment.

This post is mostly not about Sundin himself, but is intended rather as an example of the abysmally low level of discourse among the category of accelerationists and AI risk deniers that he represents. I do worry, however, that his being chosen in December last year as a member of the Swedish government's AI commission is a small sign of governmental dysfunction.

Footnotes

1) The word used in the Swedish original is “antagligen” (01:37 into the radio show), which in the present context has very much the same connotations as the English “probably”.

2) The Swedish original, beginning at 02:05: Vad kan vi göra idag, för att ändra riktning mot en mer hopfull och inspirerande framtid?

3) The Swedish original, at 03:48: Men om hela mänskligheten dör ut, för att vi klantat till det med artificiell intelligens, så känner jag däremot att vi varken levt färdigt eller dött värdigt. Snarare skulle det kännas tragiskt och onödigt, som om ett barn av misstag cyklar över kanten på ett stup, trots många varningar. Vi är så unga ur ett kosmiskt perspektiv, med miljarder lovande år framför oss. Om vi undviker det där stupet så kan vår framtid dessutom bli så mycket bättre än vårt förflutna. Vi är på vippen att äntligen ta kontroll över vårt eget öde, och lösa många av de största problem som vi människor hittills gått bet på.

4) The Swedish original, at 54:45: Jag har pratat mycket om hotet från artificiell intelligens, men vad händer om vi lyckas ändra kurs, och hamnar i en framtid med en superintelligens under vår kontroll, som inte används illvilligt, är felriktad eller konkurrerar med oss? Först och främst blir det som julafton för alla som gillar forskning och teknik, eftersom superintelligensens forskning blir begränsad inte av vad vi människor kan lista ut, utan bara av vad som är fysiskt möjligt. Upptäckter som jag som tonåring trodde skulle ta tusentals år eller en evighet skulle kunna ske under vår livstid. Både min morbror Erik och hans dotter Anna dog t.ex. av cancer. Jag är övertygad om att all cancer går att bota, problemet är att vi människor hittills inte lyckats lista ut hur. Jag tror att superintelligens snabbt skulle hitta botemedel för alla sjukdomar, och lyfta alla ur fattigdom, stabilisera klimatet, och lösa alla de klassiska världsproblemen om vi hittills gått bet på.

5) The Swedish original, at 56:10: Min AI-aktivism drivs inte bara av att jag vill undvika problem utan också av att jag vill bevara de här inspirerande möjligheterna.

6) The Swedish original, at 01:04:00: Vi behöver inte konkurrera med AI. Det är bara Moloch som försöker lura i oss det. Det är nämligen vi som bygger AI, och om vi ser till att vi behåller kontrollen så blir det vi och inte AI som bestämmer var skåpet skall stå.

7) For a summary statement of where I stand on AI issues, readers who know Swedish are encouraged to consult the bonus chapter (downloadable for free) in the 2023 edition of my book Tänkande maskiner. For those who prefer English, there are various video recordings of talks I've given that can serve a similar purpose; see here, here and here.

torsdag 29 februari 2024

On OpenAI's report on biorisk from their large language models

Aligning AIs with whatever values it is we need them to have in order to ensure good outcomes is a difficult task. Already today's state-of-the-art Large Language Models (LLMs) present alignment challenges that their developers are unable to meet, and yet they release their poorly aligned models in their crazy race with each other, where first prize is a potentially stupendously profitable position of market dominance. Over the past two weeks, we have witnessed a particularly striking example of this inability, with Google's release of their Gemini 1.5 and the bizarre results of their attempts to make sure that images produced by the model exhibit an appropriate amount of demographic diversity among the people portrayed. This turned into quite a scandal, which quickly spread from the model's image generation to likewise bizarre behavior in parts of its text generation.1

But while the incident is a huge embarrassment to Google, it is unlikely to do much real damage to society or the world at large. This can quickly change with future more capable LLMs and other AIs. The extreme speed at which AI capabilities are currently advancing is therefore a cause for concern, especially as alignment is expected to become not easier but more difficult as the AIs become more capable at tasks such as planning, persuasion and deception.2 I think it's fair to say that the AI safety community is nowhere near a solution to the problem of aligning a superhumanly capable AI. As an illustration of how dire the situation is, consider that when OpenAI in July last year announced their Superalignment four-year plan for solving the alignment problem in a way that scales all the way up to such superhumanly capable AI, the core of their plan turned out to be essentially "since no human knows how to make advanced AI safe, let's build an advanced AI to which we can delegate the task of solving the safety problem".3 It might work, of course, but there's no denying that it's a huge leap into the dark, and it's certainly not a project whose success we should feel comfortable hinging the future survival of our species upon.

Given the lack of clear solutions to the alignment problem on the table, in combination with how rapidly AI capabilities are advancing, it is important that we have mechanisms for carefully monitoring these advances and making sure that they do not cross a threshold where they become able to cause catastrophe. Ideally this monitoring and prevention should come from a state or (even better) intergovernmental actor, but since for the time being no such state-sanctioned mechanisms are in place, it is very welcome that some of the leading AI developers are now publicizing their own formalized protocols for this. Anthropic pioneered this in September last year with their so-called Responsible Scaling Policy,4 and just three months later OpenAI followed with their counterpart, the Preparedness Framework.5

Since I recorded a lecture about OpenAI's Preparedness Framework last month - concluding that the framework is much better than nothing, yet way too lax to reliably protect us from global catastrophe - I can be brief here. The framework is based on evaluating their frontier models on a four-level risk scale (low risk, medium risk, high risk, and critical risk) along each of four dimensions: Cybersecurity, CBRN (Chemical, Biological, Radiological, Nuclear), Persuasion and Model Autonomy.6 The overall risk level of a model (which then determines how OpenAI may proceed with deployment and/or further capabilities development) is taken to be the maximum among the four dimensions. All four dimensions are in my opinion highly relevant and in fact indispensable in the evaluation of the riskiness of a frontier model, but the one to be discussed in what follows is CBRN, which is all about the model's ability to create or deploy (or assist in the creation or deployment of) non-AI weapons of mass destruction.
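
To make the aggregation rule concrete, here is a minimal Python sketch of the "maximum over the four tracked categories" rule described above. The category names follow the framework, but the data structure and the function are my own illustration and have nothing to do with OpenAI's actual tooling.

    # A minimal illustrative sketch (not OpenAI's code): the overall risk level of a
    # model is taken to be the highest level reached in any of the four tracked categories.
    RISK_LEVELS = ["low", "medium", "high", "critical"]   # ordered from least to most severe

    def overall_risk(assessment: dict[str, str]) -> str:
        """Return the highest risk level among the per-category assessments."""
        return max(assessment.values(), key=RISK_LEVELS.index)

    # Hypothetical assessment of a frontier model (made-up values):
    print(overall_risk({
        "Cybersecurity": "medium",
        "CBRN": "high",
        "Persuasion": "medium",
        "Model Autonomy": "low",
    }))   # -> "high"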

The Preparedness Framework report contains no concrete risk analysis of GPT-4, the company's current flagship LLM. A partial such analysis did, however, appear later, in the report Building an early warning system for LLM-aided biological threat creation, released in January this year. The report is concerned with the B aspect of the CBRN risk factor - biological weapons. It describes an ambitious study in which 100 subjects (50 biology undergraduates and 50 Ph.D.-level experts in microbiology and related fields) are given tasks relevant to biological threat creation, and randomly assigned either access to both GPT-4 and the Internet, or access to the Internet alone. The question is whether GPT-4 access makes subjects more capable at their tasks. The answer seems to be yes, but the report remains inconclusive.
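
As a concrete (and entirely invented) picture of this design, the following toy sketch randomizes 100 subjects into a GPT-4-plus-Internet arm and an Internet-only arm and estimates the uplift as the difference in mean task scores; the score distributions are made up for illustration and bear no relation to the study's actual data.

    # A toy sketch (invented numbers throughout) of a randomized two-arm comparison.
    import random

    random.seed(0)
    subjects = list(range(100))            # 50 undergraduates + 50 experts in the real study
    random.shuffle(subjects)
    gpt4_arm, internet_arm = subjects[:50], subjects[50:]

    def task_score(has_gpt4: bool) -> float:
        """Hypothetical score on a biothreat-relevant task (made-up distribution)."""
        return random.gauss(5.5 if has_gpt4 else 5.0, 1.0)

    uplift = (sum(task_score(True) for _ in gpt4_arm) / len(gpt4_arm)
              - sum(task_score(False) for _ in internet_arm) / len(internet_arm))
    print(f"estimated uplift from GPT-4 access: {uplift:.2f}")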

It's an interesting and important study, and many aspects of it deserve praise. Here, however, I will focus on two aspects where I am more critical.

The first aspect is how the report goes on and on about whether the observed positive effect of GPT-4 on subjects' skills in biological threat creation is statistically significant.7 Of course, this obsession with statistical significance is shared by a kazillion other research reports in virtually all empirically oriented disciplines, but in this particular setting it is especially misplaced. Let me explain.

In all scientific studies meant to detect whether some effect is present or not, there are two distinct ways in which the result can come out erroneous. A type I error is to deduce the presence of a nonzero effect when it is in fact zero, while a type II error is to fail to recognize a nonzero effect which is in fact present. The concept of statistical significance is designed to control the risk for type I errors; roughly speaking, employing statistical significance methodology at significance level 0.05 means making sure that if the effect is zero, the probability of erroneously concluding a nonzero effect is at most 0.05. This gives a kind of primacy to avoiding type I errors over avoiding type II errors, laying the burden of proof on whoever argues for the existence of a nonzero effect. This makes a good deal of sense in a scientific community where an individual scientific career tends to consist largely of discoveries of various previously unknown effects, creating an incentive that, in the absence of a rigorous system for avoiding type I errors, might overwhelm scientific journals with a flood of erroneous claims about such discoveries.8 In a risk analysis context such as in the present study, however, it makes no sense at all, because here it is type II errors that mainly need to be avoided - they may lead to irresponsible deployment and global catastrophe, whereas the consequences of a type I error are comparatively trivial. The burden of proof in this kind of risk analysis needs to be laid on whoever argues that risk is zero or negligible, whence the primary focus on type I errors that follows implicitly from a statistical significance testing methodology gets things badly backwards.9
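
To see the asymmetry in numbers, here is a small simulation of my own (effect sizes, sample sizes and score distributions all invented): a two-sample t-test at significance level 0.05 keeps false alarms near 5% when the true effect is zero, but with a modest real effect and a small sample it misses the effect most of the time, and in a risk analysis setting that miss, the type II error, is the costly one.

    # A toy simulation (my own, with arbitrary numbers) of type I versus type II errors
    # in significance testing at level 0.05.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)

    def detection_rate(true_effect: float, n: int, reps: int = 10_000) -> float:
        """Fraction of simulated studies that declare a statistically significant effect."""
        hits = 0
        for _ in range(reps):
            control = rng.normal(0.0, 1.0, n)           # scores without GPT-4 access
            treated = rng.normal(true_effect, 1.0, n)   # scores with GPT-4 access
            if ttest_ind(treated, control).pvalue < 0.05:
                hits += 1
        return hits / reps

    print(detection_rate(true_effect=0.0, n=25))   # about 0.05: type I errors are controlled
    print(detection_rate(true_effect=0.3, n=25))   # far below 1: a real effect is usually missed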

Here it should also be noted that, given how widely useful GPT-4 has turned out to be across a wide range of intellectual tasks, the null hypothesis that it would be of zero use to a rogue actor wishing to build biological weapons is highly far-fetched. The failure of the results in the study to exhibit statistical significance is best explained not by the absence of a real effect but by the smallness of the sample size. To the extent that the failure to produce statistical significance is a real problem (rather than a red herring, as I think it is), it is exacerbated by another aspect of the study design, namely the use of multiple measurements on each subject. I am not at all against such multiple measurements, but they create dependencies in the data which, for anyone fixated on the statistical significance methodology, force the statistical analyst to employ highly conservative10 p-value calculations, as well as multiple inference adjustments. Both of these complications lead to worse prospects for statistically significant detection of nonzero effects.
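
The power loss from conservative multiple-inference adjustments can also be made concrete. The following sketch (again with invented numbers, and using a simple Bonferroni correction merely as a stand-in for whatever adjustment the analysts would actually use) computes the approximate power of a two-sided test for a modest real effect, before and after adjusting for several outcome measures per subject.

    # A toy illustration (invented numbers) of how multiple-inference adjustments
    # lower the chance of declaring a real effect statistically significant.
    from scipy.stats import norm

    alpha = 0.05
    n_per_group = 25
    true_effect = 0.3                      # standardized uplift, made up for illustration
    se = (2 / n_per_group) ** 0.5          # standard error of the difference in means

    def power(per_test_level: float) -> float:
        """Approximate power of a two-sided z-test at the given per-test level."""
        z = norm.ppf(1 - per_test_level / 2)
        return 1 - norm.cdf(z - true_effect / se) + norm.cdf(-z - true_effect / se)

    print(round(power(alpha), 3))          # single unadjusted test
    for k in (5, 10):                      # Bonferroni adjustment over k outcome measures
        print(k, round(power(alpha / k), 3))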

The second aspect is what the study does to its test subjects. I'm sure most of them are fine, but what is the risk that one of them gets a wrong idea and picks up inspiration from the study to later go on to develop and deploy their own biological pathogens? I expect the probability to be low, but given the stakes at hand, the expected cost might be large. In practice, the study can serve as a miniature training camp for potential bioterrorists. Before being admitted to the study, test subjects were screened, e.g., for criminal records. That is a good thing of course, but it would be foolish to trust OpenAI (or anyone else) to have an infallible way of catching any would-be participant with a potential bioterrorist living in the back of their mind.

One possible reaction OpenAI might have to the above discussion about statistical aspects is that in order to produce more definite results they will scale up their study with more participants. To which I would say please don't increase the number of young biologists exposed to your bioterrorism training. To which they might reply by insisting that this is something they need to do to evaluate their models' safety, and surely I am not opposed to such safety precautions? To which my reply would be that if you've built something you worry might cause catastrophe if you deploy it, a wise move would be to not deploy it. Or even better, to not build it in the first place.

Footnotes

1) See the twin blogposts (first, second) by Zvi Mowshowitz on the Gemini incident for an excellent summary of what has happened so far.

2) Regarding the difficulty of AI alignment, see, e.g., Roman Yampolskiy's brand-new book AI: Unexplainable, Unpredictable, Uncontrollable for a highly principled abstract approach, and the latest 80,000 Hours Podcast conversation with Ajeya Cotra for an equally incisive but somewhat more practically oriented discussion.

3) As with so much else happening in the AI sphere at present, Zvi Mowshowitz has some of the most insightful comments on the Superalignment announcement.

4) I say "so-called" here because the policy is not stringent enough to make their use of the term "responsible" anything other than Orwellian.

5) The third main competitor, besides OpenAI and Anthropic, in the race towards superintelligent AI is Google/DeepMind, who have not publicized any corresponding framework. However, in a recent interview with Dwarkesh Patel, their CEO Demis Hassabis assures us that they employ similar frameworks in their internal work, and that they will go public with these some time this year.

6) Wisely, they emphasize the preliminary nature of the framework, and in particular the possibility of adding further risk dimensions that turn out to be relevant in future work.

7) Statistical significance is mentioned 20 times in the report.

8) Unfortunately, this has worked less well than one might have hoped; hence the ongoing replication crisis in many areas of science.

9) The burden of proof location issue here is roughly similar to the highly instructive Fermi-Szilard disagreement regarding the development of nuclear fission, which I've written about elsewhere, and where Szilard was right and Fermi was wrong.

10) "Conservative" is here in the Fermian rather than the Szilardian sense; see Footnote 9.

torsdag 11 januari 2024

Video talk about the Grace et al surveys on what AI researchers think about AI risk

There is much disagreement out there regarding what future progress in AI may have in store for us, including AI risk. What could then be more natural than asking AI researchers what they think? This is what Katja Grace and various collaborators have done in a series of surveys in 2016, 2022 and 2023. The results are interesting, partly in unexpected ways. Back in 2017, Katja gave an excellent talk about the first of those surveys, at a Gothenburg meeting that I hosted. Today I recorded a video lecture in which I comment on her findings back then and in the later surveys, with particular emphasis on whether and how much we can trust these AI researchers to make sound judgements about our future with AI:

fredag 5 januari 2024

Video talk on OpenAI's Preparedness Framework

Here's my latest video lecture on what OpenAI does regarding AI risk and AI safety, following up on the ones from December 2022 and March 2023. This time I focus on the so-called Preparedness Framework that they announced last month, and criticize it along lines heavily influenced by Zvi Mowshowitz's writing on the same topic.