Tuesday, January 24, 2023

This text on ChatGPT cannot be read in DN

OpenAI's launch last November of ChatGPT (which I discussed, among other things, in my YouTube lecture on December 16) has led to something of a general awakening to the pace of AI development. For instance, Lars Strannegård (president of the Stockholm School of Economics and younger brother of Claes Strannegård) wrote about the matter from an educational perspective in DN Kultur last week. I saw this as an opportunity to join in and try to broaden the discussion to a wider societal perspective, and therefore sent the text below to DN Kultur. It was rejected, however, so I offer it here on the blog instead. For a somewhat longer development of my concluding point in the text, see the aforementioned YouTube video.

* * *

Development in artificial intelligence (AI) is proceeding at breakneck speed, and most recently the Californian AI company OpenAI's text robot ChatGPT has taken the world by storm. This conversational AI can carry on astonishingly intelligent and human-sounding conversations, but, like us humans, it often stumbles and blurts out falsehoods and outright bullshit. As Lars Strannegård reports (DN Kultur, 17/1), an intense discussion is currently under way among us university teachers about what this means for our teaching and examination of students, now that they can delegate much of their essay writing and other homework to ChatGPT.

I do not think it is meaningful to discuss the significance that ChatGPT and other so-called large language models have for the education system without at the same time considering what they can do in other sectors of society. This AI technology looks set to reshape, among other things, all administrative and communicative work faster and far more thoroughly than the computerization of the late 20th century. Strannegård is therefore right to point out how deeply problematic it would be if our universities tried to isolate themselves and their students from the new technology, which would create an education obsolete relative to the labor market.

What we also need to take into account is that ChatGPT is not the endpoint of AI development, but will be followed by a steady stream of even more powerful and intelligent AI systems. This means that the solution Strannegård suggests – identifying and then investing in the uniquely human abilities that lie beyond what an AI can do – risks needing ever more frequent updates as AI makes new advances. Finding examples of human abilities that ChatGPT lacks is not hard today, but it is much harder to point to anything insurmountable for future AI. The AI advances of recent years have dealt roughly with predictions of that kind, and within a naturalistic worldview there seems to be no room for declaring anything uniquely human to lie permanently beyond AI's capabilities.

In practice there are no limits at all to what AI can eventually accomplish, which brings us to the words of warning that the father of computer science, Alan Turing, formulated in a well-known 1951 essay: about how a sufficiently advanced AI might begin to improve itself without further human involvement, and how this could lead on to a situation where it takes over power from us humans. We would then end up in the same kind of precarious position that gorillas are in today, and humanity's fate would hinge on what goals and drives the machines have. The breakthrough Turing foresaw has always seemed rather distant, but lately, in light of the rapid developments, more and more commentators have pointed to 10-20 years from now as a fully possible timetable, and today's two leading AI developers – OpenAI and London-based DeepMind – both have the ambition of reaching such a breakthrough, under the label artificial general intelligence (AGI).

All of this means that it is of the utmost importance for humanity's future to figure out how to ensure that such an AGI has goals aligned with human values. This research area was ignored for half a century after Turing's prediction, but has now begun to take shape under the name AI alignment. The pioneer of the field, Eliezer Yudkowsky, has said that "the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else" – words that show how catastrophically badly things can go if we fail at this project. Unfortunately, AI alignment research has so far mostly demonstrated how hard the problem is, and no one can today point to a safe way forward.

Especially ominous in this context is how OpenAI has failed to make ChatGPT behave well. They have gone to great lengths to design it so that it avoids lying, expressing racism, or sharing dangerous knowledge such as recipes for Molotov cocktails. Yet gleeful users around the world have shown how easy it is to provoke ChatGPT into precisely these behaviors. If OpenAI cannot even solve AI alignment for a system like ChatGPT, what chance do they have of doing so for an AGI?

In his article, Lars Strannegård focuses on how we can adapt to the new AI developments, which is all well and good, but I believe we should not stop there: we should also consider how we can steer them in a favorable direction. Perhaps it is time to build the political momentum needed to put pressure on OpenAI and other leading tech companies to stop playing Russian roulette with humanity's survival at stake, and to slow down the race toward AGI while there is still time?

Monday, January 23, 2023

Artificial general intelligence and scientific reasoning: an exchange with David Sumpter

My friend David Sumpter (a professor of applied mathematics at Uppsala University) and I have had stimulating discussions and deep disagreements on AI and related topics off and on since (at least) 2013. Each year since 2019, we have taken part in a seminar organized by Frances Sprei at Chalmers, with an audience of computer science students. This year, as usual, we zoomed in quickly on our diametrically opposed views on whether or not the creation of artificial general intelligence (AGI) is a realistic prospect in the present century. David thinks the answer is no, and pointed to the deep divide between present-day narrow AI and the (so far exclusively hypothetical) notion of AGI, to the historical record of how difficult technology is to predict, and to our inability to faithfully simulate even very primitive biological organisms. I, on the other hand, think the answer may well be yes, and pointed to Ajeya Cotra's more ambitious comparisons with biological systems, and to the fallaciousness of the so-called common sense argument which I believe drives much of AGI skepticism. I don't think having seen our seminar is a necessary prerequisite for enjoying the following correspondence, but for those who wish to see it, it is available on YouTube.

In earlier discussions, David has noted (quite correctly) my resistance to and blatant disrespect for the norm that science is a gradual process that does not deal with phenomena (such as AGI) that are far outside of the envelope of what is already known and understood. It was with reference to this that he made the friendly-yet-provocative remark during the Q&A (not included in the video) that ''Olle does not understand how science works'', which triggered me to write to him afterwards with an attempt at a clarification regarding my view of science. This led to a further exchange between the two of us regarding AGI and the broader issue of how one should think about scientific and rational reasoning. We have not reached the end of our discussion, but David has indicated that he needs more time to carefully lay out his position, so we agreed that it makes sense to post what we already have, and leave readers with a bit of a cliffhanger until a later blog post. Here goes:

*******

From OH to DS, November 25:

Hi David,

I still feel our debates are productive in that we’re zooming in (slowly) towards the crux of our disagreement. When I pressed you about prediction, you said some things can be predicted (such as your prediction about AGI), and some cannot (such as Cotra’s prediction about AGI). What makes that answer unacceptable in my view is that the two of you are asking the same question (when will we get AGI?), so what you’re doing here is to say that some answers to that question are scientifically legitimate, while others are not. Doing so introduces the strongest possible bias in the search for the correct answer to that question. I disagree vehemently with such approaches, and in particular with one-sidedly invoking “burden of evidence” in cases whose complexity goes far beyond Popper’s idealized “are all swans white?” situation; for more detail on this, see my old blog post on “Vulgopopperianism”.

Relatedly, when you say “Olle does not understand how science works”, I appreciate the witticism, but my reaction to it depends on whether you mean “how science works” as a sociological observation about how science is actually practiced today or as a normative claim about how we ought to go about it. As a sociological observation, I am totally on board with your view of how (most) science works, but normatively, I reject it. And we must be clear about the distinction between the descriptive and the normative questions: “this is how science is practiced today” is not a satisfactory answer to “how should we do science?”.

Olle

*******

From DS to OH, November 26:

Hi Olle,

Thanks for this. Very useful. You wrote:
    When I pressed you about prediction, you said some things can be predicted (such as your prediction about AGI), and some cannot (such as Cotra’s prediction about AGI).
I suppose I do think that I am not making a prediction. I am rather saying that the future is highly uncertain and no-one has presented any evidence that AGI is on its way. A bit like the aliens example. There is no evidence presented for the existence of aliens who can travel to our planet. But who knows what will happen. So maybe they set off as soon as they saw our first signs of technology. Or there is no evidence that we are living in a computer simulation. But we have all seen The Matrix, so it might be true. So here we have two ideas that are far out in the same way that Terminator is far out. And they should be treated in the same way. So I am not making a prediction in the same way as Cotra is making a prediction. She is describing a distribution of likely year outcomes. This, to me, doesn’t pass the test of methods which have been successful in the past. For example, in Outnumbered I wrote about election forecasting, but we could take a lot of other things: economic forecasting, forecasting life outcomes. See for example this PNAS paper. We just don’t have methods for predicting outcomes of these sorts. So I do give less value to these types of prediction methods.

I feel similar about your argument about exponential growth (which for me is the linear model, btw!). Without previous instances available, it is impossible to extrapolate in that way.

So, if I am not making a prediction, what am I doing? Well, this year I broke it down into three stages. The first is a look at the machine learning methods available today. Surely, that should be a big part of our focus here (especially since you now think that AGI is very likely on a shorter timescale). Even on the timescale of Cotra’s predictions we should be able to find some signal of its arrival? And I just can’t see it. Markov’s first application of a Markov chain, in 1913, was in predicting the next letter (vowel or consonant) in Pushkin. We can now construct massive Markov chains over text, which is now done, and the same results come out on a larger scale. That is all that has happened here. The same is true of each application. Chess is a combinatorially massive (but closed, finite) problem. Go is slightly bigger. With big enough computers, we were always going to solve it with the right representation. Protein folding is an impressive piece of engineering, as are many of the ways in which machine learning is used. But they all come from the underlying maths. I suppose the scope of the debate doesn’t give me time to break all of these down, but I think that for any given AI development, I could break it down for you in terms of early or mid-20th century maths. And I don’t see where the new maths might lie that supersedes it.
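[To make the Markov-chain picture concrete for readers: the following sketch is my illustrative code, not anything from our exchange. It trains an order-2 character-level Markov chain on a toy corpus and samples from it, producing locally plausible but fact-free text; the view under discussion is that modern language models are this, at vastly larger scale.]

```python
import random
from collections import defaultdict

def train_markov(text, order=2):
    """Count, for each context of `order` characters, which characters follow it."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts

def generate(counts, seed, length=40, rng=None):
    """Sample a continuation character by character from the learned counts."""
    rng = rng or random.Random(0)
    out = seed
    for _ in range(length):
        context = out[-len(seed):]
        followers = counts.get(context)
        if not followers:  # context never seen in training: stop
            break
        chars, weights = zip(*followers.items())
        out += rng.choices(chars, weights=weights)[0]
    return out

# Toy corpus standing in for "all of Pushkin"
corpus = "the cat sat on the mat and the cat ate the rat "
model = train_markov(corpus, order=2)
print(generate(model, "th", length=30))
```

Every character the sampler emits was seen following the same two-character context in training; nothing beyond those local statistics is stored.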

The rapid discovery of new maths would undermine the argument above (the ten year argument). But again, Cotra’s argument should account for this lack of new maths? There is no abundance of new computational ideas waiting to be inserted into AI. Instead, the AGI argument is usually made around scale (back to exponential growth). If we scale up Markov’s predictions with pencil and paper to computing today, we increase the range over time we can predict a text. There is a belief (a wrong-headed one, I think) that this type of scaling can be used to predict the evolution of AI (see for example this paper for the scaling argument). But even from Markov’s first paper, we would have been able to predict that AI scales in this way! Of course it does. The more symbols we read and predict, the longer it can keep churning out syntactically correct text. But this is fact-free text. It is nonsense.
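[For readers unfamiliar with the scaling argument referred to here: it is usually stated as an empirical power law, loss falling roughly as L(N) = c·N^(-alpha) in model size N. The sketch below uses synthetic numbers built to lie exactly on such a curve, not real measurements, and shows how the law is fitted by ordinary least squares in log-log coordinates.]

```python
import math

# Hypothetical (model_size, loss) pairs lying exactly on L(N) = 10 * N**-0.3;
# real scaling-law papers fit curves of this form to measured losses.
data = [(1e6, 10 * 1e6 ** -0.3), (1e7, 10 * 1e7 ** -0.3), (1e8, 10 * 1e8 ** -0.3)]

# A power law L = c * N**-alpha is a straight line in log-log coordinates:
# log L = log c - alpha * log N, so least squares recovers alpha as minus the slope.
xs = [math.log(n) for n, _ in data]
ys = [math.log(l) for _, l in data]
k = len(data)
mean_x, mean_y = sum(xs) / k, sum(ys) / k
alpha = -sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
c = math.exp(mean_y + alpha * mean_x)
print(f"alpha = {alpha:.3f}, c = {c:.1f}")  # recovers the parameters we built in
```

Whether such a curve, fitted to past losses, licenses any extrapolation about future capabilities is exactly the point in dispute.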

My fifty-year biological argument is simply another way of explaining the above, without going into the maths. It is a way of gauging what maths and computing allow us to do and what they (strangely) don’t.

Eventually, you are then forced, and at some point you did this, to move toward the idea that “humans might just be like language models”. This is to me a religious belief. You can think it, if you want, but it does not square with my experience of the world. Nor does it align with how I see you as an individual. I see our relationship as uniquely human. I can’t understand it in any other way. While those are mystical statements, I see any appeal to "we are just neural networks" as an equally mystical statement. I at least, in this context, am prepared to admit my mysticism.

I don’t think that reality is a computation. So we are exempt from the Church-Turing thesis. I think computation is one limited part of the rich ways in which the world manifests itself. Ultimately, this is a philosophical position. But it is self-consistent. Your position is very strongly one of logical positivism and mine is more one of (scientific) realism.

You should put away Popper and read Hilary Putnam, for example. I think this might help.

Have a good weekend,

David.

*******

From OH to DS, November 29:

Hi David, and many thanks for your thoughtful response!

But you seem to want to have your cake and eat it too. When you confidently state that AGI will not be developed in the foreseeable future, and simultaneously insist on not having made a prediction about AGI, you’re making two incompatible claims. If you say that AGI is very unlikely in the foreseeable future, then that is a prediction, no matter how you arrived at it. Conversely, if you insist on not having made a prediction, then it doesn’t make sense to respond to the AGI timeline and feasibility issue in any other way than by shrugging and saying “I have no clue”.

Your argument for your having-the-cake-and-eating-it is precisely the kind of burden-of-proof-asymmetry gambit that I do not accept. You say:
    I am rather saying the future is highly uncertain and no-one has presented any evidence that AGI is on its way.
Assuming for the moment that your claim here that no evidence is available for AGI being on its way (a claim I will return to and contest below) is correct, why in the world would this settle the matter, with zero regard to whether or not there is any evidence for the opposed statement that AGI is not on its way?

Simplifying just a little bit, there are basically two ways the world can be as regards the AGI issue: either
    (a) AGI is attainable within the next 100 years, or

    (b) there is some secret sauce about human cognition that makes AGI unattainable within the next 100 years.

(Secret sauce here need not literally be some substance, but could denote whatever aspect of our minds that cannot be replicated in AI in the given time frame.)

You seem to be saying that if the collected evidence for (a) and the collected evidence for (b) are both weak, then we should treat (b) as true by default. I strongly disagree. To require that defenders of (a) should provide, e.g., a concrete plan for how to attain AGI within the given time frame, while at the same time absolving defenders of (b) from having to point to what the secret sauce consists in, is to rig the discussion on (a) vs (b) so much in (b)’s favor that it becomes almost pointless. If we are to carry on our debates without short-circuiting them from the start, we have to take both (a) and (b) seriously, and treat neither of them as correct by default.

Speaking of plans for how to attain AGI, you said in the Q&A session at our seminar that any argument for (a) would have to be accompanied by such a plan in order to be taken seriously, but that no such plan has been put forth. This last claim is however false, as multiple such plans have been suggested, one of which you have in fact alluded to yourself, namely the scaling approach of building bigger and bigger GPT-n systems until eventually one of them turns out to be an AGI. Does this plan guarantee success? Of course not. But neither is it guaranteed to fail. And it is (I maintain) unreasonable to require the existence of a 100% fail-safe plan for building an AGI as a prerequisite for attaching a nonzero probability to (a) being true.

I get the impression that your failure to notice that the scaling approach is a plan for building AGI is a consequence of your deeply held belief that it just cannot work – a belief that in turn rests on your strong belief in (b). Please correct me if I am misreading you here, but in case I am not, you surely see the problem: if we are to have a meaningful and non-circular discussion on (a) vs (b), you cannot appeal to arguments that presuppose (b).

Let me now go back to your claim that there is no evidence that AGI is on its way. I think this claim is simply false. The more broadly competent and intelligent AI we produce, the more this points towards AGI being on its way. Various increasingly widely competent products from DeepMind such as MuZero constitute such evidence, and the same goes for the large GPT-like models from OpenAI and others. None of this is definite proof of (a), of course, and if you consider it to be weak evidence then I respect your judgement (while, depending on the meaning of “weak”, perhaps not quite sharing it), but if you go as far as to insist that it constitutes zero evidence, then I can only take that as a sign that you are dogmatically presupposing the secret sauce-hypothesis (b) as a means to disqualifying all the evidence, again inserting a nonproductive circularity into the discussion.

To back up your skepticism about the evidential value of the performance of GPT-3 and other language models, you state that all they produce is “syntactically correct […] nonsense”. But how do you come to such a conclusion? Let me give an example. Suppose you quiz me with the following:
    Trevor has wanted to see the mountain with all of the heads on it for a long time, so he finally drove out to see it. What is the capital of the state that is directly east of the state that Trevor is currently in?
And this is my reply:
    The mountain with all of the heads on it is Mount Rushmore. Mount Rushmore is in South Dakota. The state directly east of South Dakota is Minnesota. The capital of Minnesota is St. Paul. The answer is "St. Paul".
Would you dismiss my reply as “nonsense”? Of course not: you would (correctly) conclude that, far from talking nonsense, I am systematically working out the answer to your quiz.

Now modify the thought experiment by leaving the dialogue intact but replacing me by Google’s GPT-like language model PaLM. Actually, this is exactly how PaLM answers this quiz question. How do you then judge its answer? Is it systematic reasoning or just nonsense? If you dismiss it as nonsense, I am again led to suspect that you are short-circuiting the discussion by falling back on the secret sauce-hypothesis (b). You might retort that no, that’s not what you’re doing, and explain that your conclusion is based on the empirical fact that PaLM responds to a bunch of other questions by obvious bullshitting and nonsense. Well, that doesn’t quite work, because the same holds for me: if you quiz me with sufficiently confusing questions, my answers will degenerate into nonsense. So you might qualify by saying that you count my answer as sense and PaLM’s as nonsense because PaLM gives obviously nonsensical answers more often than I do. That’s actually a stance that I am somewhat sympathetic to, but note that it points towards a quantitative rather than qualitative difference between me and PaLM, so while it can be taken to support the view “PaLM is not AGI”, it seems to me much more suspect to read it as supporting “PaLM is not even a step towards AGI”.

My reply is becoming longer than I intended, but before closing I want to clarify something on the issue of “how science works” that I addressed in my previous email. Namely, I have found that people with a strong science background are often prone to the following kind of reasoning; I’m not sure if this literally applies to you as well, but it seems to me that you have operated in its vicinity:

They argue the claim “near-term AGI is unfalsifiable”, from which they conclude “near-term AGI is an unscientific issue” and from that finally arrive at “near-term AGI will not happen”. Or they apply the same chain of reasoning to go from “near-term contact with extraterrestrial aliens is unfalsifiable” to “near-term contact with extraterrestrial aliens will not happen”, or from “the simulation hypothesis is unfalsifiable” to “the simulation hypothesis is false”. I believe all these inferences are non sequiturs based on a category error: abstract rules and conventions for what constitutes proper science cannot on their own answer substantial questions about the world. To me, the AGI question and the aliens question and the simulation question are all important questions about the world, and I want to be able to address them without having my reasoning short-circuited by conventions about unfalsifiability. Hence, if they insist that science is built on conventions that either allow for the inference from unfalsifiability to falsehood, or prohibit reasoning about AGI/aliens/simulation altogether, then I say to heck with (their notion of) science – I will address these questions outside of the narrow confines of science, and instead within the realm of rational reasoning!

I value our exchange highly, David, and I have great respect for you. This respect comes from your mathematical and scientific brilliance, and from your kindness and overall lovely personality – but not from any hypothetical secret sauce lurking somewhere deep in your mind. I am hoping that you can extend the same secret sauce-independence in your view of me, because I am not at all sure that I have any.

Best,

Olle

Wednesday, January 11, 2023

Leo Szilard the satirist

Of all the 20th century physicists who contributed to the intellectually magnificent project of putting human civilization on the brink of destruction, and who reflected afterwards on the ethics of doing so, my favorite is probably Leo Szilard. It has now been brought to my attention (by Jason Crawford) that Szilard was also an excellent satirist. Here's his 1948 (very) short story The Mark Gable Foundation:
    "I have earned a very large sum of money" - said Mr. Gable, turning to me, "with very little work. And now I am thinking of setting up a Trust Fund. I want to do something that will really contribute to the happiness of the mankind; but it is very difficult to know what to do with money. When Mr. Rosenblatt told me that you would be here tonight I asked the Mayor to invite me. I certainly would value your advice."

    "Would you intend to do anything for the advancement of science?" I asked.

    "No," Mark Gable said, "I believe scientific progress is too fast as it is."

    "I share your feeling about this point," I said with the fervor of conviction, "but then, why not do something about the retardation of scientific progress?"

    "That I would very much like to do," Mark Gable said, "but how do I go about it?"

    "Well," I said, "I think that should not be very difficult. As a matter of fact, I think it should be quite easy. You could set up a Foundation, with an annual endowment of thirty million dollars. Research workers in need of funds could apply for grants, if they could make out a convincing case. Have ten committees, each composed of twelve scientists, appointed to pass on these applications. Take the most active scientists out of the laboratory and make them members of these committees. And the very best men in the field should be appointed as Chairmen at salaries of $50,000 each. Also have about twenty prizes of $100,000 each for the best scientific papers of the year. This is just about all you would have to do. Your lawyers could easily prepare a Charter for the Foundation. As a matter of fact, any of the National Science Foundation Bills which had been introduced in the 79th and 80th Congress could perfectly well serve as a model."

    "I think you had better explain to Mr. Gable why this Foundation would in fact retard the progress of science," said a bespectacled young man sitting at the far end of the table, whose name I didn't get at the time of introduction.

    "It should be obvious" - I said - "First of all, the best scientists would be removed from their laboratories and kept busy on committees passing on applications for funds. Secondly, the scientific workers in need of funds will concentrate on problems which are considered promising and are pretty certain to lead to publishable results. For a few years there may be a great increase in scientific output; but by going after the obvious, pretty soon Science will dry out. Science will become something like a parlor game. Some things will be considered interesting, others will not. There will be fashions. Those who follow the fashion will get grants. Those who won't, will not, and pretty soon they will learn to follow the fashion too."

Friday, December 23, 2022

Inquiry into medical age assessment shut down

Shortly after the then government in the summer of 2020 appointed an inquiry into the National Board of Forensic Medicine's (RMV) method for medical age assessment, I was appointed a member of the expert group assisting the sole inquiry chair Maria Eka in her work. That assignment came to an abrupt end yesterday, December 22, 2022, when the current government decided to shut down the inquiry. The news dismayed me, as it signals (in my opinion entirely wrongly) that questions of legal certainty and related matters in the medical age assessment of unaccompanied refugee minors are not considered important enough to merit continued attention.

Although I am cautiously critical of the inquiry's work on a few points,1 I believe the only reasonable course would have been to let it complete its work, and that the sudden shutdown amounts to a kind of epistemic vandalism. Fortunately, the inquiry's interim report from October last year at least remains, but there will be no final report.

Footnote

1) My main critical points are three in number:
  • The inquiry was given a timetable that was from the outset so drawn out (the deadline for the final report was set at May 31, 2024, nearly four years after the inquiry's launch) that it is hard to imagine a reasonable substantive justification for it. Rather, I believe the point was to give then Minister of Justice Morgan Johansson plenty of breathing room on the issue.
  • The expert group included people from a range of organizations and government agencies with fingers in the very activity the inquiry was set up to scrutinize. The reason is obvious: these organizations hold knowledge of interest to the inquiry. But there are equally obvious drawbacks, which were raised early in the inquiry's work in, among other places, Läkartidningen. Without naming names or going into details of what was said in the expert group, I can with hindsight note that the intellectual level of the discussions often suffered because some members regarded the open-minded and impartial consideration of the substantive issues as at best a secondary task, and instead saw their primary task as fending off criticism of their own organization. I think this experience is worth bearing in mind for future government inquiries with similar issues; hopefully ways can be found to organize them that are more conducive to the quality of their work.
  • The inquiry was, in my view, too narrowly focused on RMV's method for medical age assessment, at the expense of the broader question of what constitutes an (ethically, legally, metrologically and decision-theoretically) appropriate or even optimal such method. Directives are directives, one might argue, but the question of whether RMV's method is appropriate cannot be decoupled from that of whether better alternatives exist.

Friday, December 16, 2022

ChatGPT and AI Alignment

Just over two weeks ago, on November 30, OpenAI released its ChatGPT, and since then, everyone's talking about it. Me too. In this newly recorded 47-minute lecture I offer some thoughts on what it means for the AI Alignment problem.

Tuesday, November 29, 2022

Scott Aaronson on AI safety

Many times over the last few years I've attempted lectures offering an overview of what kinds of futures for humanity the unfolding saga of AI development may lead to, and of what is being done within the still small but rapidly growing field of AI alignment with the goal of improving the odds for a benign outcome. Some of these attempts are available in video format in earlier Häggström hävdar blog posts and elsewhere. They have varied in quality, and in a few instances I've walked away from the podium with something resembling a sense of having nailed the topic as best as I can hope to, but to tell the truth I have never come even close to the level exhibited by the brilliant and inimitable Scott Aaronson in the talk he gave a couple of weeks ago, on November 14, 2022, hosted by UT Austin's EA (Effective Altruism) group. Warmly recommended!

Thursday, November 24, 2022

Slaughterbots and Diplomacy

The most vivid and pedagogically useful explanation of the near-term dangers implied by the development of AI technology for military drones that I know of has been the famous Slaughterbots video, produced in 2017 by the Future of Life Institute and featuring leading AI safety researcher Stuart Russell. I've been showing it to my students every year since 2018, along with proclamations such as ''Potential dangers in AI lurk everywhere, but if I were to single out one application area as the absolutely most reckless one as a playground for development of cutting-edge AI technology, I'd say killer robots''.

Just this week, two events have come to my attention that cause me to update on this and perhaps modify my lectures slightly:

1. A large part of the Slaughterbots video is a fictional promotion event for a new autonomous killer drone product. But why stick to fiction when we can have the real thing? The Israeli defense electronics company Elbit Systems has released a promotion video for their latest lethal AI product. Its tone and content are reminiscent of the Slaughterbots video, but without any hint of irony or admission that AI technology for killing people could be dangerous or bad. Here's what we get when reality imitates fiction:

2. Facebook's (or Meta's, as they now call themselves) AI lab just announced how they built an AI that performs on (at least) human level at the game of Diplomacy. Given the role of Bostrom's treacherous turn concept and related deception problems (as outlined in the recent very important report by Ajeya Cotra) in exacerbating AI existential risk, I should perhaps consider modifying the proclamation above to something like ''Potential dangers in AI lurk everywhere, but if I were to single out one application area as the absolutely most reckless one as a playground for development of cutting-edge AI technology, I'd say games where the winning moves consist of manipulating human opponents into acting in your interest and then stabbing them in the back''.