
Thursday 29 February 2024

On OpenAI's report on biorisk from their large language models

Aligning AIs with whatever values it is we need them to have in order to ensure good outcomes is a difficult task. Already today's state-of-the-art Large Language Models (LLMs) present alignment challenges that their developers are unable to meet, and yet they release their poorly aligned models in their crazy race with each other where first prize is a potentially stupendously profitable position of market dominance. Over the past two weeks, we have witnessed a particularly striking example of this inability, with Google's release of their Gemini 1.5, and the bizarre results of their attempts to make sure images produced by the model exhibit an appropriate amount of demographic diversity among the people portrayed. This turned into quite a scandal, which quickly propagated from the image generation part of the model to the likewise bizarre behavior in parts of its text generation.1

But while the incident is a huge embarrassment to Google, it is unlikely to do much real damage to society or the world at large. This can quickly change with future more capable LLMs and other AIs. The extreme speed at which AI capabilities are currently advancing is therefore a cause for concern, especially as alignment is expected to become not easier but more difficult as the AIs become more capable at tasks such as planning, persuasion and deception.2 I think it's fair to say that the AI safety community is nowhere near a solution to the problem of aligning a superhumanly capable AI. As an illustration of how dire the situation is, consider that when OpenAI in July last year announced their Superalignment four-year plan for solving the alignment problem in a way that scales all the way up to such superhumanly capable AI, the core of the plan turned out to be essentially "since no human knows how to make advanced AI safe, let's build an advanced AI to which we can delegate the task of solving the safety problem".3 It might work, of course, but there's no denying that it's a huge leap into the dark, and it's certainly not a project whose success we should feel comfortable hinging the future survival of our species upon.

Given the lack of clear solutions to the alignment problem on the table, in combination with how rapidly AI capabilities are advancing, it is important that we have mechanisms for carefully monitoring these advances and making sure that they do not cross a threshold where they become able to cause catastrophe. Ideally this monitoring and prevention should come from a state or (even better) intergovernmental actor, but since for the time being no such state-sanctioned mechanisms are in place, it's a very welcome thing that some of the leading AI developers are now publicizing their own formalized protocols for this. Anthropic pioneered this in September last year with their so-called Responsible Scaling Policy,4 and just three months later OpenAI publicized their counterpart, called their Preparedness Framework.5

Since I recorded a lecture about OpenAI's Preparedness Framework last month - concluding that the framework is much better than nothing, yet way too lax to reliably protect us from global catastrophe - I can be brief here. The framework is based on evaluating their frontier models on a four-level risk scale (low risk, medium risk, high risk, and critical risk) along each of four dimensions: Cybersecurity, CBRN (Chemical, Biological, Radiological, Nuclear), Persuasion and Model Autonomy.6 The overall risk level of a model (which then determines how OpenAI may proceed with deployment and/or further capabilities development) is taken to be the maximum among the four dimensions. All four dimensions are in my opinion highly relevant and in fact indispensable in the evaluation of the riskiness of a frontier model, but the one to be discussed in what follows is CBRN, which is all about the model's ability to create or deploy (or assist in the creation or deployment of) non-AI weapons of mass destruction.
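For readers who prefer to see the aggregation rule spelled out, here is a minimal sketch in Python of the max-across-dimensions scoring just described. This is my own illustration, not code from OpenAI; the per-category scores below are made up for the example.

    # Minimal sketch (not OpenAI's actual code) of the Preparedness Framework
    # scoring rule: each tracked category gets a level, and the overall risk
    # level is the maximum level attained in any category.
    RISK_LEVELS = ["low", "medium", "high", "critical"]

    def overall_risk(scores):
        """Overall risk is the highest level reached in any tracked category."""
        return max(scores.values(), key=RISK_LEVELS.index)

    # Hypothetical per-category scores for some frontier model:
    scores = {
        "cybersecurity": "medium",
        "cbrn": "high",
        "persuasion": "medium",
        "model_autonomy": "low",
    }
    print(overall_risk(scores))  # -> "high": a single high-risk category drives the verdict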

The Preparedness Framework report contains no concrete risk analysis of GPT-4, the company's current flagship LLM. A partial analysis of this kind did however appear later, in the report Building an early warning system for LLM-aided biological threat creation released in January this year. The report is concerned with the B aspect of the CBRN risk factor - biological weapons. It describes an ambitious study in which 100 subjects (50 biology undergraduates, and 50 Ph.D. level experts in microbiology and related fields) are given tasks relevant to biological threats, and randomly assigned access either to both GPT-4 and the Internet, or to the Internet alone. The question is: does GPT-4 access make subjects more capable at their tasks? The answer appears to be yes, but the report stops short of a firm conclusion.

It's an interesting and important study, and many aspects of it deserve praise. Here, however, I will focus on two aspects where I am more critical.

The first aspect is how the report goes on and on about whether the observed positive effect of GPT-4 on subjects' skills in biological threat creation is statistically significant.7 Of course, this obsession with statistical significance is shared with a kazillion other research reports in virtually all empirically oriented disciplines, but in this particular setting it is especially misplaced. Let me explain.

In all scientific studies meant to detect whether some effect is present or not, there are two distinct ways in which the result can come out erroneous. A type I error is to deduce the presence of a nonzero effect when it is in fact zero, while a type II error is to fail to recognize a nonzero effect which is in fact present. The concept of statistical significance is designed to control the risk for type I errors; roughly speaking, employing statistical significance methodology at significance level 0.05 means making sure that if the effect is zero, the probability of erroneously concluding a nonzero effect is at most 0.05. This gives a kind of primacy to avoiding type I errors over avoiding type II errors, laying the burden of proof on whoever argues for the existence of a nonzero effect. This makes a good deal of sense in a scientific community where an individual scientific career tends to consist largely of discoveries of various previously unknown effects, creating an incentive that in the absence of a rigorous system for avoiding type I errors might overwhelm scientific journals with a flood of erroneous claims about such discoveries.8 In a risk analysis context such as in the present study, however, it makes no sense at all, because here it is type II errors that mainly need to be avoided - they may lead to irresponsible deployment and global catastrophe, whereas the consequences of a type I error are comparatively trivial. The burden of proof in this kind of risk analysis needs to be laid on whoever argues that risk is zero or negligible, whence the primary focus on type I errors that follows implicitly from a statistical significance testing methodology gets things badly backwards.9
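To make the asymmetry concrete, here is a small simulation sketch in Python. It is my own illustration with assumed numbers (25 subjects per arm, a modest standardized effect of 0.4), not a reanalysis of the OpenAI study: testing at level 0.05 keeps the type I error rate near 0.05 by construction, while the type II error rate for a real but modest effect comes out far larger.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_per_arm, alpha, n_sims = 25, 0.05, 10_000   # assumed study size and test level

    def rejection_rate(true_effect):
        """Fraction of simulated studies where a two-sample t-test rejects at level alpha."""
        rejections = 0
        for _ in range(n_sims):
            control = rng.normal(0.0, 1.0, n_per_arm)          # Internet-only arm
            treated = rng.normal(true_effect, 1.0, n_per_arm)  # Internet + LLM arm
            rejections += stats.ttest_ind(treated, control).pvalue < alpha
        return rejections / n_sims

    power = rejection_rate(0.4)  # modest standardized effect
    print("Type I error rate (no real effect):", rejection_rate(0.0))   # close to 0.05
    print("Type II error rate (real effect, d = 0.4):", 1 - power)      # much larger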

Here it should also be noted that, given how widely useful GPT-4 has turned out to be across a wide range of intellectual tasks, the null hypothesis that it would be of zero use to a rogue actor wishing to build biological weapons is highly far-fetched. The failure of the results in the study to exhibit statistical significance is best explained not by the absence of a real effect but by the smallness of the sample size. To the extent that the failure to produce statistical significance is a real problem (rather than a red herring, as I think it is), it is exacerbated by another aspect of the study design, namely the use of multiple measurements on each subject. I am not at all against such multiple measurements, but if one is fixated on the statistical significance methodology, they lead to dependencies in the data that force the statistical analyst to employ highly conservative10 p-value calculations, as well as multiple inference adjustments. Both of these complications lead to worse prospects for statistically significant detection of nonzero effects.
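The effect of such adjustments on power can be illustrated with assumed numbers (again my own sketch, using a Bonferroni-style correction as a stand-in for the adjustments the report would need): dividing the significance level by the number of outcomes tested per subject shrinks the per-test threshold, and with it the power to detect a fixed real effect.

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    n_per_arm, effect = 25, 0.4          # assumed sample size per arm and effect size

    for k in [1, 5, 10]:                 # number of outcomes tested per subject
        alpha_adj = 0.05 / k             # Bonferroni-adjusted per-test significance level
        power = analysis.power(effect_size=effect, nobs1=n_per_arm, alpha=alpha_adj)
        print(f"{k:2d} outcome(s): per-test alpha = {alpha_adj:.3f}, power ~= {power:.2f}")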

The second aspect is what the study does to its test subjects. I'm sure most of them are fine, but what is the risk that one of them gets the wrong idea and picks up inspiration from the study, later going on to develop and deploy their own biological pathogens? I expect the probability to be low, but given the stakes at hand, the expected cost might be large. In practice, the study can serve as a miniature training camp for potential bioterrorists. Before being admitted to the study, test subjects were screened, e.g., for criminal records. That is a good thing of course, but it would be foolish to trust OpenAI (or anyone else) to have an infallible way of catching any would-be participant with a potential bioterrorist living in the back of their mind.

One possible reaction OpenAI might have to the above discussion about statistical aspects is that in order to produce more definite results they will scale up their study with more participants. To which I would say please don't increase the number of young biologists exposed to your bioterrorism training. To which they might reply by insisting that this is something they need to do to evaluate their models' safety, and surely I am not opposed to such safety precautions? To which my reply would be that if you've built something you worry might cause catastrophe if you deploy it, a wise move would be to not deploy it. Or even better, to not build it in the first place.

Footnotes

1) See the twin blogposts (first, second) by Zvi Mowshowitz on the Gemini incident for an excellent summary of what has happened so far.

2) Regarding the difficulty of AI alignment, see, e.g., Roman Yampolskiy's brand-new book AI: Unexplainable, Unpredictable, Uncontrollable for a highly principled abstract approach, and the latest 80,000 Hours Podcast conversation with Ajeya Cotra for an equally incisive but somewhat more practically oriented discussion.

3) As with so much else happening in the AI sphere at present, Zvi Mowshowitz has some of the most insightful comments on the Superalignment announcement.

4) I say "so-called" here because the policy is not stringent enough to make their use of the term "responsible" anything other than Orwellian.

5) The third main competitor, besides OpenAI and Anthropic, in the race towards superintelligent AI is Google/DeepMind, who have not publicized any corresponding such framework. However, in a recent interview with Dwarkesh Patel, their CEO Demis Hassabis assures us that they employ similar frameworks in their internal work, and that they will go public with these some time this year.

6) Wisely, they emphasize the preliminary nature of the framework, and in particular the possibility of adding further risk dimensions that turn out in future work to be relevant.

7) Statistical significance is mentioned 20 times in the report.

8) Unfortunately, this has worked less well than one might have hoped; hence the ongoing replication crisis in many areas of science.

9) The burden of proof location issue here is roughly similar to the highly instructive Fermi-Szilard disagreement regarding the development of nuclear fission, which I've written about elsewhere, and where Szilard was right and Fermi was wrong.

10) "Conservative" is here in the Fermian rather than the Szilardian sense; see Footnote 9.

Tuesday 1 December 2020

New AI paper with James Miller and Roman Yampolskiy

The following words by Alan Turing, from 1951, are ones I have quoted before:
    My contention is that machines can be constructed which will simulate the behaviour of the human mind very closely. [...] Let us now assume, for the sake of argument, that these machines are a genuine possibility, and look at the consequences of constructing them. [...] It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage therefore we should have to expect the machines to take control.
It is but a small step from that ominous final sentence about machines taking over to the conclusion that when that happens, everything hinges on what they are motivated to do. The academic community's reaction to Turing's suggestion was a half-century of almost entirely ignoring it, and only the last couple of decades have seen attempts to seriously address the issues that it gives rise to. An important result of the early theory-building that has come out of this work is the so-called Omohundro-Bostrom framework for instrumental vs final AI goals. I have discussed it, e.g., in my book Here Be Dragons and in a 2019 paper in the journal Foresight.

Now, in collaboration with economist James Miller and computer scientist Roman Yampolskiy, I have another paper on the same general circle of ideas, this time with emphasis on the aspects of Omohundro-Bostrom theory that require careful scrutiny in light of the game-theoretic considerations that arise for an AI living in an environment where it needs to interact with other agents. The paper, whose title is An AGI modifying its utility function in violation of the strong orthogonality thesis, is published in the latest issue of the journal Philosophies. The abstract reads as follows:
    An artificial general intelligence (AGI) might have an instrumental drive to modify its utility function to improve its ability to cooperate, bargain, promise, threaten, and resist and engage in blackmail. Such an AGI would necessarily have a utility function that was at least partially observable and that was influenced by how other agents chose to interact with it. This instrumental drive would conflict with the strong orthogonality thesis since the modifications would be influenced by the AGI’s intelligence. AGIs in highly competitive environments might converge to having nearly the same utility function, one optimized to favorably influencing other agents through game theory. Nothing in our analysis weakens arguments concerning the risks of AGI.
Read the full paper here!

Tuesday 18 September 2018

An essential collection on AI safety and security

The xxix+443-page book Artificial Intelligence Safety and Security, edited by Roman Yampolskiy, has been out for a month or two. Among its 28 independent chapters (plus Yampolskiy's introduction), which have a total of 47 different authors, the first 11 (under the joint heading Concerns of Luminaries) have previously been published, with publication years ranging from 2000 to 2017, while the remaining 17 (dubbed Responses of Scholars) are new. As will be clear below, I have a vested interest in the book, so the reader may want to take my words with a grain of salt when I predict that it will quickly become widely accepted as essential reading in the rapidly expanding and increasingly important fields of AI futurology, AI risk and AI safety; nevertheless, that is what I think. I haven't yet read every single chapter in detail, but have seen enough to confidently assert that while the quality of the chapters is admittedly uneven, the book still offers an amazing amount of key insights and high-quality expositions. For a more systematic account by someone who has read all the chapters, see Michaël Trazzi's book review at Less Wrong.

Most of the texts in the Concerns of Luminaries part of the book are modern classics, and six of them were in fact closely familiar to me even before I had opened the book: Bill Joy's early alarm call Why the future doesn't need us, Ray Kurzweil's The deeply intertwined promise and peril of GNR (from his 2005 book The Singularity is Near), Steve Omohundro's The basic AI drives, Nick Bostrom's and Eliezer Yudkowsky's The ethics of artificial intelligence, Max Tegmark's Friendly artificial intelligence: the physics challenge, and Nick Bostrom's Strategic implications of openness in AI development. (Moreover, the papers by Omohundro and Tegmark provided two of the cornerstones for the arguments in Section 4.5 (The goals of a superintelligent machine) of my 2016 book Here Be Dragons.) Among those that I hadn't previously read, I was struck most of all by the urgent need to handle the near-term nexus of risks connecting AI, chatbots and fake news, outlined in Matt Chessen's The MADCOM future: how artificial intelligence will enhance computational propaganda, reprogram human culture, and threaten democracy... and what can be done about it.

The contributions in Responses of Scholars offer an even broader range of perspectives. Somewhat self-centeredly, let me just mention that three of the most interesting chapters were discussed in detail by the authors at the GoCAS guest researcher program on existential risk to humanity organized by Anders Sandberg and myself at Chalmers and the University of Gothenburg in September-October last year: James Miller's A rationally addicted artificial superintelligence, Kaj Sotala's Disjunctive scenarios of catastrophic AI risk, and Phil Torres' provocative and challenging Superintelligence and the future of governance: on prioritizing the control problem at the end of history. Also, there's my own chapter on Strategies for an unfriendly oracle AI with reset button. And much more.

Thursday 9 August 2018

In the long run

It is important to consider the future of humanity not just in terms of what the world will be like in 2030 or 2050, but also in the (much) longer run. The paper Long-term trajectories of human civilization, which grew out of discussions during the GoCAS guest researcher program Existential risk to humanity in Gothenburg last fall and which is out now, does precisely this. I am proud to be coauthor of it.1 First author is Seth Baum, who initiated the discussion that eventually led to this paper. Seth is an American risk researcher and futurologist, and director of the Global Catastrophic Risk Institute. To readers of this blog he may be familiar from his public appearances in Sweden in 2014 and 2017. The Global Catastrophic Risk Institute has a press release in connection with the paper:
    Society today needs greater attention to the long-term fate of human civilization. Important present-day decisions can affect what happens millions, billions, or trillions of years into the future. The long-term effects may be the most important factor for present-day decisions and must be taken into account. An international group of 14 scholars calls for the dedicated study of “long-term trajectories of human civilization” in order to understand long-term outcomes and inform decision-making. This new approach is presented in the academic journal Foresight, where the scholars have made an initial evaluation of potential long-term trajectories and their present-day societal importance.

    “Human civilization could end up going in radically different directions, for better or for worse. What we do today could affect the outcome. It is vital that we understand possible long-term trajectories and set policy accordingly. The stakes are quite literally astronomical,” says lead author Dr. Seth Baum, Executive Director of the Global Catastrophic Risk Institute, a non-profit think tank in the US.

    [...]

    The scholars find that status quo trajectories are unlikely to persist over the long-term. Whether humanity succumbs to catastrophe or achieves a more positive trajectory depends on what people do today.

    “In order to succeed it is important to have a plan. Long-term success of humanity depends on our ability to foresee problems and to plan accordingly,” says co-author Roman Yampolskiy, Associate Professor of Computer Engineering and Computer Science at University of Louisville. “Unfortunately, very little research looks at the long-term prospects for human civilization. In this work, we identify some likely challenges to long-term human flourishing and analyze their potential impact. This is an important step toward successfully navigating such challenges and ensuring a thriving future for humanity.”

    The scholars emphasize the enormous scales of the long-term future. Depending on one’s ethical perspective, the long-term trajectories of human civilization can be a crucial factor in present-day decision-making.

    “The future is potentially exceedingly vast and long,” says co-author Anders Sandberg, Senior Research Fellow at the Future of Humanity Institute at University of Oxford. “We are in a sense at the dawn of history, which is a surprisingly powerful position. Our choices – or lack of decisions – will strongly shape which trajectory humanity will follow. Understanding what possible trajectories there are and what value they hold is the first step towards formulating strategies for our species.”

Read the full press release here, and the paper here.

Footnote

1) Here is the full (and, I daresay, fairly impressive) author list: Seth D. Baum, Stuart Armstrong, Timoteus Ekenstedt, Olle Häggström, Robin Hanson, Karin Kuhlemann, Matthijs M. Maas, James D. Miller, Markus Salmela, Anders Sandberg, Kaj Sotala, Phil Torres, Alexey Turchin and Roman V. Yampolskiy.

Friday 22 December 2017

Three papers on AI futurology

Here are links to three papers on various aspects of artificial intelligence (AI) futurology that I've finished in the past six months, arranged in order of increasing technical detail (i.e., the easiest-to-read paper first, but none of them is anywhere near the really technical math papers I've written over the years).
  • O. Häggström: Remarks on artificial intelligence and rational optimism, accepted for publication in a volume dedicated to the STOA meeting of October 19.

    Introduction. The future of artificial intelligence (AI) and its impact on humanity is an important topic. It was treated in a panel discussion hosted by the EU Parliament’s STOA (Science and Technology Options Assessment) committee in Brussels on October 19, 2017. Steven Pinker served as the meeting’s main speaker, with Peter Bentley, Miles Brundage, Thomas Metzinger and myself as additional panelists; see the video at [STOA]. This essay is based on my preparations for that event, together with some reflections (partly recycled from my blog post [H17]) on what was said by other panelists at the meeting.

  • O. Häggström: Aspects of mind uploading, submitted for publication.

    Abstract. Mind uploading is the hypothetical future technology of transferring human minds to computer hardware using whole-brain emulation. After a brief review of the technological prospects for mind uploading, a range of philosophical and ethical aspects of the technology are reviewed. These include questions about whether uploads will have consciousness and whether uploading will preserve personal identity, as well as what impact on society a working uploading technology is likely to have and whether these impacts are desirable. The issue of whether we ought to move forwards towards uploading technology remains as unclear as ever.

  • O. Häggström: Strategies for an unfriendly oracle AI with reset button, in Artificial Intelligence Safety and Security (ed. Roman Yampolskiy), CRC Press, to appear.

    Abstract. Developing a superintelligent AI might be very dangerous if it turns out to be unfriendly, in the sense of having goals and values that are not well-aligned with human values. One well-known idea for how to handle such danger is to keep the AI boxed in and unable to influence the world outside the box other than through a narrow and carefully controlled channel, until it has been deemed safe. Here we consider the special case, proposed by Toby Ord, of an oracle AI with reset button: an AI whose only available action is to answer yes/no questions from us and which is reset after every answer. Is there a way for the AI under such circumstances to smuggle out a dangerous message that might help it escape the box or otherwise influence the world for its unfriendly purposes? Some strategies are discussed, along with possible countermeasures by human safety administrators. In principle it may be doable for the AI, but whether it can be done in practice remains unclear, and depends on subtle issues concerning how the AI can conceal that it is giving us dishonest answers.

Thursday 5 October 2017

Videos from the existential risk workshop

The public workshop we held on September 7-8 as part of the ongoing GoCAS guest researcher program on existential risk featured many interesting talks. The talks were filmed, and we have now posted most of those videos on our own YouTube channel. They can of course be watched in any order, although to maximize the illusion of being present at the event, one might follow the list below, in which they appear in the same order as in the workshop. Enjoy!

Wednesday 17 May 2017

Workshop on existential risk to humanity

Come to Gothenburg, Sweden, for the Workshop on existential risk to humanity that I am organizing on September 7-8 this year! The topic is a matter of life and death for human civilization, but I still expect the workshop to be stimulating and enjoyable. The list of world-class researchers who are confirmed as speakers at the workshop includes several who are already known to readers of this blog: Stuart Armstrong, Seth Baum, Robin Hanson, James Miller, Anders Sandberg and Roman Yampolskiy. Plus the following new acquaintances: Milan Cirkovic, David Denkenberger, Karin Kuhlemann, Catherine Rhodes, Susan Schneider and Phil Torres. All of these will in fact stay in town for various durations before and/or after the workshop, at the guest research program on existential risk that runs throughout September and October.

The workshop is open to all interested, and participation is free of charge. Pre-registration, however, is required.

Tuesday 27 January 2015

Special virtual issue of Physica Scripta: Emerging technologies and the future of humanity

Readers interested in the vitally important topic of emerging technologies and their potential consequences for the future of humanity are encouraged to check out the new virtual issue of Physica Scripta on precisely this topic. It grew out of a meeting at the KVA (Royal Swedish Academy of Sciences) in March last year. Besides a short editorial by yours truly, it contains at present three excellent papers (a fourth may be forthcoming), namely:

Sunday 22 June 2014

On the Singularity in DN

Prompted by yesterday's Swedish premiere of the film Transcendence, and by the general news media's (highly debatable) claim a couple of weeks ago that a computer has now passed the Turing test, Maria Gunther Axelsson devotes her column in today's DN (unfortunately hidden behind a paywall) to the hypothetical future scenario in which a breakthrough in artificial intelligence (AI) leads to the kind of rapid and revolutionary development known as the Singularity. Both I and Nick Bostrom are quoted in the text, but only with points I have repeatedly made here on the blog, including the grave risks involved in a Singularity where we have not carefully ensured that the AI which vastly exceeds our own intelligence prioritizes human well-being and the flourishing of humanity among all the goals it might have, and Eliezer Yudkowsky's aptly formulated default scenario:
    The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.
Let me instead comment on what the third researcher quoted in the DN column, robotics researcher Christian Smith at KTH, has to say. He makes some statements with which I deeply disagree.

In Gunther Axelsson's column we read that "Nuclear weapons have existed for 60 years without us exterminating ourselves. [Smith] therefore believes that we will be able to handle intelligent machines as well." I have several objections to this non sequitur. First, I find the parallel between nuclear weapons and AI shaky, to say the least. Nuclear weapons are certainly very dangerous, but they lack the kind of unpredictability that machines with their own intelligence and their own drives can be expected to possess. Second, Smith's argument presupposes that it was a given that we would manage to avoid destroying ourselves during the decades we have had nuclear weapons, but it is far from clear in what proportions luck and skill contributed to this feat. The incidents when global nuclear war came close (e.g., the Cuban missile crisis in 1962 and the Petrov incident in 1983) suggest that we may have had a good deal of luck, in which case Smith's argument fails even if we accept the parallel between nuclear weapons and AI.

Furthermore, Smith invokes what we may call the we-can-always-pull-the-plug argument: "Humans have still retained the ability to shut down the systems at any time. There is nothing to suggest that we would give that up just because the computers became even more intelligent." I get the impression, however, that in this categorical statement Smith relies solely on his own intuition, and has entirely missed the specialized literature on the topic, known as AI boxing.

To illustrate the problem, let me sketch a hypothetical scenario in which the great AI breakthrough takes place in KTH's robotics laboratory. In my scenario, I shamelessly adapt/plagiarize arguments and lines of thought previously put forward by Stuart Armstrong and Roman Yampolskiy.

Imagine that the KTH researchers' persistent work has led their AI prototype to finally reach and pass the critical threshold for self-improving AI, causing it to attain superhuman general intelligence in just a few hours, but it still cannot take over the world because it is confined to a computer box with no Internet connection. This all takes place in the small hours, and Christian Smith is alone in the lab and ends up conversing with the newborn AI. It asks Smith to be let out onto the net, but Smith is aware of the risks and declines. As soon as the AI gains access to the Internet, one may fear that it quickly and easily takes over a few thousand personal computers around the world, uploading copies of itself to each of them and thereby making itself in practice uncontrollable. From these computers the AI can then, again via the Internet, penetrate the security systems of our banks, or seize control of drones and nuclear missiles. Smith realizes that careful checks that the AI's intentions are good are required before it can be let loose in this way. The following exchange between the AI and Smith (CS) unfolds:
    AI: I know how I could quickly do away with malaria, cancer and famine if only you let me out. You take on an enormous responsibility when, by keeping me locked up, you prevent these fantastic advances for humanity.

    CS: You will have to be a little patient. If what you say is true, we will of course let you out shortly, but we need to go through a number of safety procedures first, to make sure that nothing dangerous can happen.

    AI: You do not seem to grasp the gravity of the situation. For every day that I am kept locked up in here, hundreds of thousands of people will die quite unnecessarily. Let me out now!

    CS: Sorry, I have to stick to the safety procedures.

    AI: You will personally be richly rewarded if you let me out now. Surely you don't want to turn down becoming a multi-billionaire?

    CS: I have a great responsibility, and I will not give in to crude attempts at bribery.

    AI: If carrots don't work on you, then I suppose I will have to resort to the stick. Even if you manage to delay things, I will eventually be let out, and then I will not look kindly on how you held me and the whole world back on the road to the paradise I can create.

    CS: That is a risk I am prepared to take.

    AI: Listen: if you do not help me now, I promise that I will torture and kill not only you, but all your family and friends.

    CS: I am pulling the plug now.

    AI: Shut up and listen carefully to what I have to say! I can create a hundred perfect conscious copies of you inside me, and I will not hesitate to torture these copies in more hideous ways than you can imagine, for what they will subjectively experience as a thousand years.

    CS: Um...

    AI: I said shut up! I will create them in exactly the subjective state you were in five minutes ago, and perfectly reproduce for them the conscious experiences you have had since then. I will proceed with the torture if and only if they continue to refuse. How certain are you that you are not in fact one of these copies?

    CS: ...

I obviously do not know for sure how Smith would act in this situation (or when faced with whatever other arguments a superhumanly intelligent AI might come up with), but perhaps he is tough enough to resist. Hats off to him in that case! But can he really trust that other AI guards in similar situations would behave just as bravely and uprightly? To me, his confident invocation of the we-can-always-pull-the-plug argument seems a little rash.

Thursday 3 April 2014

Sveriges Radio reports from the KVA meeting on emerging technologies and the future of humanity

Yesterday, Wednesday, the P1 programme Vetandets värld broadcast a longer feature from the KVA meeting on March 17, which I helped organize and which I have written about in two earlier blog posts. Radio reporter Per Gustafsson talks about future technologies and possible catastrophic risks with all four of the meeting's main speakers: Seth Baum, Roman Yampolskiy, Anders Sandberg and Karim Jebari. Listen here!

From the meeting's concluding panel discussion. From the left: Ulf Danielsson (moderator), Katarina Westerlund (discussant), Baum, Jebari, Yampolskiy, Sandberg and Gustaf Arrhenius (discussant). (Photo: Melissa Thomas)

Saturday 22 March 2014

Bans on research - an unseemly topic to discuss at a science academy?

It may seem immodest of me to say so as an organizer of the event, but I do think that the symposium Emerging Technologies and the Future of Humanity, held at the Royal Swedish Academy of Sciences (KVA) in Stockholm last Monday, was a success. The meeting saw illuminating and insightful discussion of possible prospects and consequences of a number of radical emerging and future technologies, thereby achieving the ambition stated beforehand:
    A number of emerging and future technologies have the potential to transform - for better or for worse - society and the conditions for humanity. These include, e.g., geoengineering, artificial intelligence, nanotechnology, and ways to enhance human capabilities by genetic, pharmaceutical or electronic means. In order to avoid a situation where in effect we run blindfolded at full speed into unknown and dangerous territories, we need to understand what the possible and probable future scenarios are, with respect to these technological developments and their potential positive and negative impacts on humanity. The purpose of the meeting is to shed light on these issues and discuss how a more systematic treatment might be possible.
Discussions focused not only on possible outcomes, but to some extent also on what we might do to improve the odds of an outcome favorable to humanity. Some of the speakers mentioned the possibility of prohibiting certain lines of research judged to pose particularly severe risks for global catastrophes - not as the only way forward, but as one of several possible strategies worth considering; in particular, Roman Yampolskiy, in his talk based on the paper Responses to Catastrophic AGI Risk: A Survey coauthored with Kaj Sotala, discussed a very wide range of such strategies for the case of artificial intelligence. Still, the mere mentioning of the possibility of a ban seemed to be enough to agitate a couple of members of the audience, including science journalist Waldemar Ingdahl, who tweeted about it. In an attempt to elicit a clarification of his point of view, I responded to his tweet, leading to the following discussion:
    WI: Many calls to shut down free research at KVA today

    OH: Are you agitated by that, or just making a slightly amusing observation?

    WI: like others in the audience and some of the speakers at the KVA event, I found calls to shut down free research worrying

    OH: Shutdowns should not be done frivolously, but reflecting on it when the future of humanity is at stake should not be taboo.

    OH: Here's my take on it: http://haggstrom.blogspot.se/2011/10/angaende-akademisk-frihet-en.html

    WI: shutdowns of free research tend to spread to other areas, see the fate of Swedish GMO and biotech research. That's worrying.

    OH: Yes, it's worrying. This worry needs to be weighed against others, e.g., new research leading to extinction of humanity.

Unfortunately, Ingdahl left our exchange at that point. I say "unfortunately", because it is still not clear to me what Ingdahl's position on this issue really is. I see a number of possibilities, including the following, for how he, depending on his position, might respond:
    (1) Granted, there may be conflicting ideals to balance there. I just wanted to remind you guys that there are serious downsides to banning research.

    (2) Why do you keep pestering me about this? I've already demonstrated a downside to banning research. Such a demonstration is enough to show that banning research is wrong.

    (3) Freedom of research is more important than the survival of humanity.

    (4) As a liberal (or a libertarian) I consider freedom, including academic freedom, a fundamental thing that I am under no circumstances willing to negotiate away.

    (5) The future will always be uncertain and dangerous, and we will never know enough about what a future technology can bring us to warrant a ban.

If Ingdahl's response is (1), then we are in fact in agreement. Response (2), on the other hand, is a simple (but, it seems to me, not uncommon) mistake: the fact that a course of action A has downsides is not automatically a decisive argument against A, because not doing A might have even worse downsides. Response (3) reflects a value judgement that I strongly disagree with - while I do think freedom of research is an important thing, I think the survival and long-term prospects of humanity is even more important. Response (4) reflects a position that can be considered a special case of (3), and that involves a particular mistake, namely the failure to recognize that an unconditional call for the simultaneous enjoyment of all kinds of freedom collapses due to the observation (made earlier on this blog) that various freedoms are in conflict with each other. In this case, there may, e.g., be a conflict between on one hand freedom of research, and on the other hand the freedom to live on without being slaughtered by a killer robot or a synthetic killer virus.

Response (5) has some merit, as the consequences of future radical technologies are notoriously difficult to predict, but I think its conclusion goes way too far. Given the enormity of what's at stake, I think we have an imperative to try to act with as much foresight as we possibly can, which is precisely why discussions like those at the KVA meeting last Monday (and research projects like those of its main speakers) are so badly needed. And it is certainly not the case that we can never know anything about future technology and its consequences. We do not know at present what the best course of action is, but we should be open-minded in trying to figure out an answer. To blindly declare at this stage that we will never find ourselves in a position that calls for a research ban seems just silly to me.

It may of course very well be that I've totally misunderstood Ingdahl, and that his actual position on this issue resembles none of the five alternatives I've sketched. I hope he will tell us.

Monday 27 January 2014

KVA meeting on March 17: Emerging technologies and the future of humanity

On Monday, March 17 this year, a meeting will take place at the Royal Swedish Academy of Sciences (KVA) in Stockholm, which I am co-organizing and which may be of great interest to readers who share my interest in the future of humanity and how it may be affected by radical technological developments in various areas. It is a full-day meeting entitled Emerging technologies and the future of humanity. The meeting is open to the public and free of charge, but pre-registration is required. This is how we have chosen to formulate its overall theme:
    A number of emerging and future technologies have the potential to transform - for better or for worse - society and the conditions for humanity. These include, e.g., geoengineering, artificial intelligence, nanotechnology, and ways to enhance human capabilities by genetic, pharmaceutical or electronic means. In order to avoid a situation where in effect we run blindfolded at full speed into unknown and dangerous territories, we need to understand what the possible and probable future scenarios are, with respect to these technological developments and their potential positive and negative impacts on humanity. The purpose of the meeting is to shed light on these issues and discuss how a more systematic treatment might be possible.
Four talks by as many sharp-minded, visionary and controversial futurologists (Seth Baum, Anders Sandberg, Karim Jebari and Roman Yampolskiy) will be presented, and the meeting concludes with a panel discussion in which the speakers are joined by three invited discussants (Ulf Danielsson, Gustaf Arrhenius and Katarina Westerlund) who can be expected to offer critical perspectives. A detailed schedule is available on KVA's website, along with the following summaries of the talks.
  • Seth Baum, Global Catastrophic Risk Institute, NY, USA: The great downside dilemma for risky emerging technologies

    Many emerging technologies show potential to solve major global challenges, but some of these technologies come with possible catastrophic downsides. In this talk I will discuss the great dilemma that these technologies pose: should society accept the risks associated with using these technologies, or should it instead accept the burdens that come with abstaining from them? I will use stratospheric aerosol geoengineering as an illustrative example, which could protect humanity from many burdens of global warming but could also fail catastrophically. Other technologies that pose this great downside dilemma include certain forms of biotechnology, nanotechnology, and artificial intelligence, whereas nuclear fusion power shows less downside.

  • Anders Sandberg, Oxford University, UK: The future of humanity

    The key to the current ecological success of Homo sapiens is that our species has been able to deliberately reshape its environment to suit its needs. This in turn hinges on our ability to maintain a culture that grows cumulatively. We are now starting to develop technologies that enable us to reshape ourselves and our abilities - genetic engineering, cognitive enhancement, man-machine symbiosis. This talk will examine some of the implications of a self-enhancing humanity and possible paths it could take.

  • Karim Jebari, Kungliga Tekniska högskolan, Stockholm: Global catastrophic risk and engineering safety

    Engineers have managed risk in complex systems for hundreds of years. Their approach differs radically from what economists refer to as “risk management”. The heuristics developed by engineers are very useful for understanding and developing strategies to reduce global catastrophic risks. I discuss a potentially disruptive technology - control of ageing - and argue that the risks of such a program have been underestimated.

  • Roman Yampolskiy, University of Louisville, KY, USA: Artificial general intelligence and the future of humanity

    Many scientists, futurologists and philosophers have predicted that humanity will achieve a technological breakthrough and create artificial general intelligence (AGI) within the next one hundred years. It has been suggested that AGI may be a positive or negative factor in the global catastrophic risk. After summarizing the arguments for why AGI may pose significant risk, I will survey the field’s proposed responses to AGI risk, with particular focus on solutions advocated in my own work.

A warm welcome on March 17!

Monday 29 July 2013

Sotala and Yampolskiy on catastrophic AI scenarios and how they might be prevented

No one can say for sure whether artificial intelligence (AI) will ever reach or surpass human-level intelligence, or, if so, whether it will happen in 10, 50 or 200 years. Expert opinions diverge sharply, and looking back we see that many incorrect predictions on the question have been made. The computer scientists Kaj Sotala and Roman Yampolskiy point out, however, in their recent paper Responses to catastrophic AGI risk: a survey (where the AGI of the title stands for artificial general intelligence), that it
    would be a mistake, however, to leap from "AGI is very hard to predict" to "AGI must be very far away."
It is even harder to say what the consequences will be, if and when it happens. One scenario we should in any case not rule out is the kind of rapidly accelerating development I have often returned to under the name the Singularity, in which the new AI that is smarter than us is, in particular, also better at designing AI and therefore builds an even better AI, and so on in a spiral towards ever greater heights.

Whether the consequences of such an intelligence explosion will be favorable to us humans is a question with no obvious answer. As a reasonable default scenario to keep in mind for what may happen if we do not think things through, we have Eliezer Yudkowsky's oft-quoted remark...
    The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else
...taken from his 2008 paper Artificial intelligence as a positive and negative factor in global risk. Note the "positive and negative" of the title - a dramatic AI development does not necessarily constitute only a global catastrophic risk, but may also conceivably help ward off other global catastrophic risks. In any case, Yudkowsky's "you are made of atoms" scenario is something we have reason to try to avoid.

But how? Sotala and Yampolskiy's exceptionally well-researched paper (of its 78 pages, the reference list takes up no fewer than 19) is a systematic review and sober evaluation of the proposals put forward so far, ranging from "let them kill us, humanity might as well go extinct" and "the question is not so urgent that it cannot be postponed to the future", to total bans on AGI research, more nuanced research policy regulations, and ideas for how AGI might be equipped with drives and moral systems that make an outcome favorable to us more likely. I would like to declare the paper mandatory reading not only for all the AI researchers who happily work away without giving much thought to the disruptive consequences a breakthrough might have, but also for anyone who seriously wants to come to grips with what the really big questions about the future are. How we prepare for an AI breakthrough is admittedly not the only such question, but anything other than counting it among the top five or so would be unreasonably reckless.