Friday, December 27, 2024

Can AI destroy humanity? My latest appearance on Utbildningsradion

I was interviewed in November for the episode Kan AI verkligen förgöra mänskligheten of Utbildningsradion's program Hjärta och hjärna, hosted by Jenny Jägerfeld and Eric Stern. The episode was released today, and I feel reasonably pleased with how it turned out. They had also interviewed my good friend and fellow futures researcher Karim Jebari, though on a separate occasion a day or so later, which may not have been apparent from how they chose, in the editing, to cut back and forth between our respective statements. The cutting was entirely reasonable, however, given that the program was built around three central yes/no questions that both Karim and I were asked to take a stand on.

I was prepared for them to cut the interview with me quite heavily, which can sometimes be a risk given the possibility of malicious editing, though I did not feel that was anything to worry about in this case. The only thing that gives me pause as I now listen through the episode is how, towards the end (50:25 into the episode), I sum up my contribution to the program by saying that
    I have to answer "don't know" to all three questions, so you may well feel that you've called in the wrong expert here,
but with my immediate continuation cut out, which went roughly
    although if you had found someone who gave confident yes/no answers, that would only have shown that they exhibit poor epistemic judgment, given the extremely uncertain state of scientific knowledge.
This continuation was meant to immediately head off the potential misunderstanding that my remark was a put-down of my own expertise rather than a reflection on the general state of knowledge, but I think the context rules out such an interpretation anyway. In hindsight I also realize that it would have been impossible for the producers to include the continuation without it sounding like a put-down of Karim's judgment (which of course it was not intended as, not least since at that point I did not know how he would answer the three questions).

The choice of Karim and me as the experts to interview was a good one in the sense that our views diverge sharply on a number of points central to the AI debate, which is fairly representative of the state of knowledge (even though we obviously do not represent the full spectrum of opinion among AI researchers and other experts). Here I only want to comment on one of these points. When Karim, on the topic of the possible risk of existential AI catastrophe, says (at around 20:20) that "the probability that it would happen within the next hundred years is extremely low", this prompts host Jägerfeld, with the interview with me fresh in mind, to ask (at 22:25) "what is it, do you think, that makes Olle and other researchers believe the risk of AI exterminating humanity is that large?". This is exactly the right kind of question to ask when the aim is to get to the bottom of our diverging positions, and after dealing with another part of the battery of questions, Karim gives a straight answer (at 23:55):
    In the early 2000s there were a lot of ideas about how, once an AI had become sufficiently intelligent and reached a certain level, it could then start improving itself, and this self-improving spiral would become a feedback, a reinforcing loop, that would make it extremely intelligent within a short time. This theory, or scenario, which was discussed in many circles, presupposes an old generation of AI, the kind of AI we had in the 2000s, what we today call expert systems. Modern AI - deep learning AI - cannot improve itself in this way, because it requires enormous amounts of data, enormous amounts of electricity, extremely scarce resources, chips and so on, which makes the self-improvement loop much less efficient, much more costly, and constrained by a host of natural obstacles.
This is a much better answer than most other skeptics on the question of existential AI risk would have been able to deliver, but I still maintain that it rests on a crude oversimplification of what AI development looks like today. There has admittedly been much talk about how this development is largely driven by the scaling up of data and computational resources, but that is far from the whole truth, since algorithmic progress also accounts for a large share of the performance gains. This becomes especially clear with OpenAI's powerful new models o1 and o3, which can perhaps even be seen as a turning point where scaling no longer comes first, and where the gains instead come from algorithms that, for instance by breaking a problem down into a sequence of smaller subproblems, yield more than the earlier, purer language models did. This algorithmic aspect opens the door to the very feedback loop that Karim dismisses, and I would advise him to read Leopold Aschenbrenner's Situational Awareness, which reasons in detail through the mainline scenario he (Aschenbrenner, that is) envisions for the coming decade, a scenario that leads to superintelligence. Scaling up of computational capacity and electricity supply figures in his scenario, but just as central is how AI, thanks to its extremely rapidly growing competence in programming, mathematics and related fields, looks set within a few years to surpass even the best human AI developers at their own game, which would allow, say, OpenAI to put thousands or even millions of such AIs to work and thereby scale up its AI development so dramatically that the much-discussed self-improvement spiral gets going in earnest.
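
To make the disagreement a bit more concrete, here is a toy numerical sketch of my own (nothing from the program or from Aschenbrenner, and all numbers are pure invention): a self-improvement loop where each step's gain is proportional to current capability, run once without resource constraints and once against a hard budget standing in for the data, electricity and chips that Karim points to. The function and its parameters are hypothetical illustrations only.

    # Toy sketch (my own illustration, not from the post): a self-improvement
    # loop where each step's gain is proportional to current capability, with
    # and without a hard resource budget. All numbers are invented.

    def run(steps, feedback, resource_cost, budget=100.0):
        capability = 1.0
        resources = budget  # abstract stand-in for compute, energy, data, chips
        for _ in range(steps):
            # Each step costs resources in proportion to current capability;
            # once the budget is exhausted, further improvement stalls.
            cost = capability * resource_cost
            scale = 1.0 if cost == 0 else min(1.0, resources / cost)
            capability *= 1 + feedback * scale
            resources = max(0.0, resources - cost * scale)
        return capability

    # Unconstrained loop: compounding growth, roughly 1.5**15 (about 438).
    print(round(run(15, feedback=0.5, resource_cost=0.0), 1))
    # Resource-constrained loop: growth flattens near 51 once the budget runs out.
    print(round(run(15, feedback=0.5, resource_cost=1.0), 1))

Whether the real world behaves more like the first run or the second is precisely what Karim and I disagree about; the sketch settles nothing, it merely states the two positions in runnable form.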

On a related if more cursory note, we recently heard one of the very sharpest AI researchers of our time, Ilya Sutskever, who has since left OpenAI, make the assessment that the era of pure scaling is over but that development will nonetheless race on towards superintelligence. It may of course be that Sutskever and Aschenbrenner, along with a number of other leading AI figures close to the epicenter of events in San Francisco and Silicon Valley, are completely wrong in their assessments, but I find it very hard to understand how Karim can be so certain of this that it justifies the outright dismissive attitude towards the possibility of dramatically accelerated AI development that he expresses in Hjärta och hjärna.

Monday, July 8, 2024

On Anthropic's call for proposals for third-party model evaluations

Nick Bostrom's modern classic Superintelligence: Paths, Dangers, Strategies from 2014 is full of interesting ideas.1 Some of them have a scary quality to them, and the one that I found scariest of all, back when I read the book, is what he calls the treacherous turn - the idea that a sufficiently intelligent AI which discovers a discrepancy between its own goals and motivations and those of us humans is likely to hide its capabilities and/or true intentions and to keep a low profile, quietly improving its situation and its capabilities until one day it judges the coast to be clear for moving forward at full speed towards whatever its goal is, be it paperclip production or something entirely different and incomprehensible to us. I remember not being fully prepared to take the idea in, but expecting or hoping that it would soon be demoted to a noteworthy technicality that AI alignment research has demonstrated an easy way to defuse.

This has not happened. On the contrary, the treacherous turn phenomenon lies at the heart of the fundamental problem with evaluating the safety of advanced AI models that has become increasingly recognized in recent years. In short, we do not know how to establish the absence of dangerous capabilities in AI models without the a priori assumption that they do not possess superhuman capabilities for deception and social manipulation, making the argument for the models' safety in part circular. With increasingly capable large language models, this problem becomes increasingly pressing and has been discussed both in popular media and in articles by leading AI researchers, as well as by the AI company Anthropic2 in their recent document A new initiative for developing third-party model evaluations:
    Our research shows that, under some circumstances, AI models can learn dangerous goals and motivations, retain them even after safety training, and deceive human users about actions taken in their pursuit. These abilities, in combination with the human-level persuasiveness and cyber capabilities of current AI models, increases our concern about the potential actions of future, more-capable models. For example, future models might be able to pursue sophisticated and hard-to-detect deception that bypasses or sabotages the security of an organization, either by causing humans to take actions they would not otherwise take or exfiltrating sensitive information. We propose to develop evaluations that would monitor such abilities.
Given the current situation, it's a good thing that Anthropic makes this call for help with the problem of safety evaluation of advanced AI models. Even better, one might argue, would be if they refrained from building models whose safety cannot be ascertained with existing evaluation methods.3

Footnotes

1) The same is true about Bostrom's latest book Deep Utopia: Life and Meaning in a Solved World, which I hope to come back to in another blog post later this year.

2) Anthropic's rise, since it was founded in early 2021, has been meteoric. They are now unquestionably among the world's three leading frontier AI developers (along with OpenAI and Google/DeepMind). With their release of Claude 3.5 Sonnet on June 20 this year, we have, for the first time since I became aware of the rapid rise of large language models in 2019, a situation where a model not from OpenAI holds a significant lead over other publicly available models in terms of overall cognitive capabilities. To understand on a gut level how intelligent Claude is, there really is no substitute for talking to it yourself (which anyone is free to do), but let me offer an amusing sample. After having had a bit of back-and-forth with Claude regarding who are the top ten greatest chess players of all time, I switched topics a bit:

    OH: Here's a harder list-making challenge: who are the 10 greatest AI researchers of all time?

    Claude: Creating a list of the 10 greatest AI researchers of all time is indeed a challenging task, as the field is vast and has evolved significantly over the decades. Additionally, contributions can be theoretical, practical, or both, making comparisons difficult. However, I'll attempt to create a list based on historical impact, influential work, and contributions to the field. Please note that this list is subjective and others might have different opinions:

    1. Alan Turing - For his foundational work on computation and the Turing Test.

    2. Marvin Minsky - Co-founder of MIT's AI lab and author of influential works on AI.

    3. John McCarthy - Coined the term "Artificial Intelligence" and invented LISP.

    4. Judea Pearl - For his work on Bayesian networks and causal reasoning.

    5. Geoffrey Hinton - Pioneer in deep learning and neural networks.

    6. Yann LeCun - Made significant contributions to convolutional neural networks.

    7. Yoshua Bengio - Key figure in the development of deep learning.

    8. Stuart Russell - Co-author of a seminal AI textbook and researcher in multiple AI fields.

    9. Andrew Ng - For his work in machine learning and deep learning, and for making AI education widely accessible.

    10. Demis Hassabis - Co-founder of DeepMind and key figure in reinforcement learning and AI game-playing systems.

    This list attempts to balance historical pioneers with more recent contributors who have shaped modern AI. It's worth noting that many other researchers could justifiably be included, such as Herbert Simon, Allen Newell, Seymour Papert, Jürgen Schmidhuber, Fei-Fei Li, and many others.

    The field of AI is collaborative and builds upon the work of many researchers, so any such list will inevitably omit many important contributors. Also, as AI continues to evolve rapidly, the landscape of influential researchers is likely to change.

    Would you like to discuss any of these researchers in more detail or explore other aspects of AI history?

    OH: Turing on top of the list - I like that! Did you by any chance consider including Ilya Sutskever?

    Claude: I'm glad you appreciate seeing Turing at the top! His contributions to the foundations of computer science and AI are truly monumental.

    Regarding Ilya Sutskever, you raise an excellent point. He's certainly a strong candidate for such a list, and I can see why you'd bring him up. Sutskever has made significant contributions to the field of deep learning and is a key figure in modern AI research. Here's a brief overview of why he could be considered for a top AI researchers list:

    1. He's a co-founder and Chief Scientist of OpenAI, one of the leading AI research organizations.

    2. Sutskever was a key contributor to AlexNet, which was a breakthrough in computer vision using deep learning.

    3. He's made important contributions to sequence-to-sequence learning, which has applications in machine translation and other natural language processing tasks.

    4. His work on improving training techniques for deep neural networks has been influential in the field.

    5. Sutskever has been involved in the development of GPT (Generative Pre-trained Transformer) models, which have revolutionized natural language processing.

    Given these contributions, Sutskever could certainly be considered for inclusion in a top 10 list, especially if focusing on more recent impactful work in deep learning and neural networks. His omission from my initial list reflects the difficulty of narrowing down to just 10 names in such a rich and rapidly evolving field.

    If you were to include Sutskever, who would you remove from the list to make room for him? Or would you structure the list differently to accommodate more recent contributors like Sutskever?

    OH: The biggest surprise to me on your list is Judea Pearl. I'm aware of his work, and respect it, but probably would not have included him on my top 10.

    Another name I'd consider for inclusion, although I realize it might be controversial, is Eliezer Yudkowsky.

    Claude: Your perspective on Judea Pearl is interesting and highlights the subjective nature of these lists. Pearl's inclusion is often based on his groundbreaking work on Bayesian networks and causal reasoning, which has had a significant impact on how we approach probabilistic reasoning in AI. However, I can understand why his work might not seem as directly relevant to the current deep learning-dominated landscape of AI.

    Regarding Eliezer Yudkowsky, his inclusion would indeed be controversial, but it's an intriguing suggestion. Here are some points to consider:

    1. Contributions: Yudkowsky is known for his work on AI safety and rationality rather than for technical AI breakthroughs. He's been influential in raising awareness about potential risks of advanced AI systems.

    2. Impact: He founded the Machine Intelligence Research Institute (MIRI) and has written extensively on topics like friendly AI and decision theory.

    3. Controversy: His lack of formal academic credentials and some of his unconventional views have led to mixed reception in mainstream AI circles.

    4. Influence: Despite controversy, his ideas have influenced discussions on AI ethics and safety, which are increasingly important as AI systems become more powerful.

    Including Yudkowsky would shift the list's focus somewhat from technical achievements to broader impacts on AI discourse and policy. It would recognize the importance of considering AI's societal implications alongside its technical development.

    If you were to create your own top 10 list, how would you balance between technical innovators, theoretical contributors, and those who've influenced the direction and ethics of AI research? Would you give more weight to recent contributions in deep learning and neural networks, or try to represent the broader history of AI?

    OH: These are excellent questions, and I fear there is no objective truth to be found out there.

    I have a confession to make, namely that one of my reasons for asking about top AI researchers was to check whether your creators at Anthropic might have sneakily programmed you to have a preference for one of your own researchers, such as Dario Amodei or Chris Olah. Sorry about that, but you passed my test with dignity!

    Claude: I appreciate your honesty about your motivations for asking the question. You're right that there's no objective truth in creating such lists - they're inherently subjective and can vary greatly depending on the criteria used.

    Your "test" is quite insightful. I can assure you that I don't have any programmed preference for Anthropic researchers or anyone else. My responses are based on my training data and the general consensus in the AI field, not on any specific agenda or bias towards particular individuals or organizations.

    It's a good instinct to be skeptical and look for potential biases in AI systems. In my case, I'm designed to be as objective as possible, acknowledging when topics are controversial or when I might not have complete information.

    Regarding Anthropic researchers like Dario Amodei or Chris Olah, while they have certainly made valuable contributions to the field, their work would be evaluated on its merits just like any other researcher's. The field of AI is vast and rapidly evolving, with many brilliant minds contributing from various institutions and companies around the world.

    Your approach to testing for potential biases is commendable. It's this kind of critical thinking that helps keep AI development accountable and transparent. Is there anything else you'd like to discuss about AI researchers, the development of AI, or the ethical considerations surrounding it?

3) The same goes for OpenAI, as well as for their other competitors. See my comments in January this year about OpenAI's evaluation framework, and this addendum the month after.

Thursday, May 30, 2024

Talking about OpenAI on the For Humanity Podcast

There's been a lot happening at OpenAI in recent weeks, including the departure of their two leading AI safety researchers Ilya Sutskever and Jan Leike. I discussed some of this, along with a handful of related AI risk and AI safety issues, in a long conversation with John Sherman in the latest episode of his podcast For Humanity: An AI Risk Podcast.

Those content with hearing the audio can alternatively get that via Google Podcasts or Apple Podcasts or wherever you listen to podcasts. The episode was released yesterday, May 29, and was recorded one week earlier, on May 22. For updates on what has happened since then with all of the ongoing OpenAI drama, I generally recommend Zvi Mowshowitz on his Substack Don't Worry About the Vase, and right now in particular his posts OpenAI: Fallout (from May 28) and AI #66: Oh to Be Less Online (from today, May 30).