Häggström hävdar: Zvi Mowshowitz

Showing posts with the label Zvi Mowshowitz.

Friday, 10 October 2025

If Anyone Builds It, Everyone Dies: my review

In the mathematics community, there is a popular joke about inflation in recommendation letters that goes as follows. A professor is happy about his PhD student, whom we may call Alex, and writes in a recommendation letter that Alex is arguably the most talented mathematician since Gauss. The next year, the professor has another PhD student, Robin, and writes in an even more enthusiastic letter that even though Alex is good, Robin is much better.

I am reminded of this as I now set out to review Eliezer Yudkowsky's and Nate Soares' new book If Anyone Builds It, Everyone Dies (henceforth, IABIED), and think back to my review of Nick Bostrom's 2014 book Superintelligence back when that book had just come out. The final sentence of that review reads "If this book gets the reception that it deserves, it may turn out the most important alarm bell since Rachel Carson's Silent Spring from 1962, or ever". Those are strong words, and I stand by them, and yet now I am tempted to announce that IABIED is better and more important than Bostrom's book.

Max Tegmark's praise for the book is more measured and restrained than mine.

The comparison is, however, unfair to Bostrom in two ways. First, Bostrom's book was written during the dawn of the deep learning revolution when it was not yet clear that it was about to become the paradigm that allowed AI development to really take off, and several years before the enormous breakthrough of large language models and other generative AI; while Yudkowsky's and Soares' book is jam-packed with insights coming from those recent developments, Bostrom's is obviously not. Second, while Superintelligence mostly exhibits a terse, academic style, IABIED is written with a broader audience in mind.

This last point should not be read as a disrecommendation of IABIED for AI researchers. Despite its popular style, the book argues quite forcefully and with a good deal of rigor for its central claims that (a) we seem to be on track to create superhumanly capable AIs within one or (at most) two decades, and that (b) with the current rush and consequent neglect of the safety aspect, creation of such AIs will likely spell the end of the human race. To the many AI researchers who are still unfamiliar with the central arguments for these claims and who in many cases simply deny the risk,¹ the book is potentially a very valuable read to get them more on board with the state-of-the-art in AI risk. And to those of us who are already on board with the central message, the book is valuable for a different reason, in that it offers a wealth of pedagogical devices that we can use when we explain AI risk to other audiences.

The authors are highly qualified in the field of AI safety, which Yudkowsky pioneered in the 00s.²,³ Soares came later into the playing field, but is nevertheless one of its veterans, and currently the president of the Machine Intelligence Research Institute (MIRI) that Yudkowsky co-founded in 2000 and still works at. They have both worked for many years on the so-called AI alignment problem - that of making sure that the first really powerful AIs have goals that are aligned with ours - but most of the fruits of this labor have been not blueprints for aligning AIs, but negative results, indicating how difficult the problem is, and how creating superintelligent AI without first having solved AI alignment spells disaster. This, unfortunately, reflects the situation that the entire field is facing.

In recent years (most visibly since 2021, but I suspect the insight goes back a bit further), Yudkowsky and Soares have converged on the conclusion that with the shortening of timelines until the creation of superintelligence (a time span increasingly often estimated in a single-digit number of years rather than in decades), we are very unlikely to solve AI alignment in time to avert existential catastrophe. Hence the stark book title If Anyone Builds It, Everyone Dies. They really mean it - and to emphasize this, one of the recurrent slogans during their book promotion work has been "We wish we were exaggerating". I mostly buy their message, albeit with less certainty than the authors; if I had written the book, a more suitable title would have been If Anyone Builds It, Probably Everyone Dies. But of course, they are right to present their true judgements without softening or downplaying them, and to gesture towards what they think is the only viable solution: to pull the brakes, via binding international agreements, on frontier AI development. They are under no illusion that achieving this is easy, but insist that if we firmly decide to save our species from obliteration, it can be done.

The book is remarkably easy to read, and I have been very happy to put it in the hands of a number of non-expert friends, and to urge them to read it. The authors' most consistently recurrent pedagogical device is the use of colorful analogies and metaphors. One of my favorite passages of the book is a detailed description of how a nuclear energy plant works and what went wrong in the Chernobyl 1986 disaster. A comparison between this and advanced AI development reveals far-reaching similarities, but also differences in that the engineers at Chernobyl had a far better grasp of the principles underlying the nuclear reactions and how to stay safe - in particular by knowing the exact critical fraction of released neutrons triggering another fission event that is needed for a runaway chain reaction, along with the time frames involved - compared to present-day AI researchers who can at most make educated guesses about the corresponding runaway AI dynamics.

The authors' favorite source of analogies is not nuclear physics, however, but biological evolution. Early in the book (on pp. 17-18) we are treated to the following lovely illustration of the virtually unlimited powers of intelligence:

"It's going to take me a few more moves," said the hominid-god, "but I think I've got this game in the bag."

There was a confused silence, as many gods looked over the gameboard trying to see what they had missed. The scorpion-god said, “How? Your ‘hominid’ family has no armor, no claws, no poison.”

“Their brain,” said the hominid-god.

“I infect them and they die,” said the smallpox-god.

“For now,” said the hominid-god. “Your end will come quickly, Smallpox, once their brains learn how to fight you.”

“They don’t even have the largest brains around!” said the whale-god.

“It’s not all about size,” said the hominid-god. “The design of their brain has something to do with it too. Give it two million years and they will walk upon their planet’s moon.”

“I am really not seeing where the rocket fuel gets produced inside this creature’s metabolism,” said the redwood-god. “You can’t just think your way into orbit. At some point, your species needs to evolve metabolisms that purify rocket fuel—and also become quite large, ideally tall and narrow—with a hard outer shell, so it doesn’t puff up and die in the vacuum of space. No matter how hard your ape thinks, it will just be stuck on the ground, thinking very hard.”

“Some of us have been playing this game for billions of years,” a bacteria-god said with a sideways look at the hominid-god. “Brains have not been that much of an advantage up until now.”

“And yet,” said the hominid-god.

The book has dozens and dozens of similarly quotable gems. I love it, and I want everyone to read it.

Many other reviews of IABIED have been written. Those that resonate best with me include one by Scott Alexander, and one by Zvi Mowshowitz, who also offers a broad annotated collection of (positive as well as negative) reactions from others.

Footnotes

1) Indeed, there is still plenty of such ignorance or even denialism around in the AI research community. As an illustrative example, Swedish readers may have a look at the denialism pushed in public debate in August this year by a group of colleagues of mine at the Chalmers University of Technology.

2) Nick Bostrom's 2014 book can to no small extent be said to stand on Yudkowsky's shoulders.

3) Hostile critics sometimes counter this with the claim that Yudkowsky's highest academic merit is that of being a high-school dropout, which is formally true but conveys a lack of understanding of the importance of distinguishing between the social game of formal qualifications and the reality of actual competence.

Saturday, 5 April 2025

Recommending the AI 2027 report by Kokotajlo and collaborators

Most people, including many AI experts, struggle to grasp (if they're even aware of it) the increasingly dominant view emerging in the epicenter of AI development in San Francisco and Silicon Valley: that extreme developments (on a scale surpassing even the Industrial Revolution) may well occur within the present decade. It is therefore entirely reasonable that there's growing demand for concrete scenarios describing what might happen. For almost exactly ten months, from early June 2024 onward, my go-to reference for such scenarios was Leopold Aschenbrenner’s Situational Awareness.

As of today, however, I recommend instead the brand-new AI 2027, by Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland and Romeo Dean. The report is an ambitious, detailed, remarkably competent and highly readable account of where things may be headed over the next few years. I commend the authors for putting in the considerable amount of work needed to produce the report, and I share their hope that the report will stimulate others in the AI sphere to react to it, for instance by challenging the various assumptions underlying the suggested scenario, or by proposing alternative scenarios.

Anyone with an interest in societal issues and a desire to ground their discussions in a realistic situational assessment is warmly encouraged to read it. In addition to the report itself, there is a highly informative three-hour podcast episode where Dwarkesh Patel interviews the first two authors, plus a short blog post by Scott Alexander introducing the project.

[Edit, April 9: Here are Scott Alexander's personal takeaways from the project. Here and here are two relevant blog posts by Zvi Mowshowitz: the first is a detailed reading of the aforementioned podcast discussion, and the second is his summary of various people's initial reactions to the report, on Twitter and elsewhere. All of this is well worth reading, and I am certain a lot more will follow.]

Friday, 28 June 2024

On optimism and pessimism standing at the brink of the great AI breakthrough

Sometimes in discussions about technology, the term techno-optimism is reserved for the belief that technology will develop rapidly, while techno-pessimism is used for the belief that it will move slowly or come to a halt. This is not the meaning of optimism and pessimism intended here. Throughout this blog post, the terms will refer to beliefs about consequences of this technology: are they likely to be good or to be bad?

Last time I wrote about the concepts of optimism and pessimism in the context of future AI advances and their ramifications, way back in 2017, I advocated for a sober and unbiased outlook on the world, and held

¹,²

I still hold this view, but I nevertheless think it is worth revisiting the issue to add some nuance,³ now in 2024 when we seem to be standing closer than ever to the brink of the great AI breakthrough. To suggest intentionally introducing either an optimism or a pessimism bias still sounds bad to me, but we can reframe the issue and make it less blatantly pro-distortion by admitting that all our judgements about our future with AI will necessarily be uncertain, and asking whether there might be an asymmetry in the badness of erring on the optimistic or the pessimistic side. Is excessive optimism worse than excessive pessimism, or vice versa?

There are obvious arguments to make in either direction. On one hand, erring on the side of optimism may induce decision-makers to recklessly move forward with unsafe technologies, whereas the extra caution that may result from undue pessimism is less obviously catastrophic. On the other hand, an overly pessimistic message may be disheartening and cause AI researchers, decision-makers and the general public to stop trying to create a better world and just give up.

The latter aspect came into focus for me when Eliezer Yudkowsky, after having begun in 2021 to open up publicly about his dark view of humanity's chances of surviving upcoming AI developments, went all-in on this with his 2022 Death with dignity and AGI ruin blog posts. After all, there are all these AI safety researchers working hard to save humanity by solving the AI alignment problem - researchers who rightly admire Yudkowsky as the brilliant pioneer who during the 00s almost single-handedly created this research area and discovered many of its crucial challenges,⁴ and to whom it may be demoralizing to hear that this great founder no longer believes the project has much chance of success. In view of this, shouldn't Yudkowsky at least have exhibited a bit more epistemic humility about his current position?

I now look more favorably upon Yudkowsky's forthrightness. What made me change my mind is a graphic published in June 2023 by Chris Olah, one of the leading AI safety researchers at Anthropic. The x-axis of Olah's graph represents the level of difficulty of solving the AI alignment problem, ranging from trivial via steam engine, Apollo project and P vs NP to impossible, and his core messages are (a) that since uncertainty is huge about the true difficulty level we should rationally represent our belief about this as some probability distribution over his scale, and (b) that it is highly important to try to reduce the uncertainty and improve the precision in our belief, so as to better be able to work in the right kind of way with the right kind of solutions. It is with regard to (b) that I changed my mind on what Yudkowsky should or shouldn't say. If AI alignment is as difficult as Yudkowsky thinks, based on his unique experience of decades of working hard on the problem, then it is good that he speaks out about this, so as to help the rest of us move our probability mass towards P vs NP or beyond. If instead he held back and played along with the more common view that the problem is a lot easier - likely somewhere around steam engine or Apollo project - he would contribute to a consensus that might, e.g., cause a future AI developer to wreak havoc by releasing an AI that looked safe in the lab but failed to have that property in the wild. This is not to say that I entirely share Yudkowsky's view of the matter (which does look overconfident to me), but that is mostly beside the point, because all he can reasonably be expected to do is to deliver his own expert judgement.

At this point, I'd like to zoom in a bit more on Yudkowsky's list of lethalities in his AGI ruin post, and note that most of the items on the list express reasons for not putting much hope in one or the other of the following two things.

(1) Our ability to solve the AI alignment problem.

(2) Our ability to collectively decide not to build an AI that might wipe out Homo sapiens.

It is important for a number of reasons to distinguish pessimism about (1) and pessimism about (2),⁵ such as how a negative outlook on (1) gives us more reason to try harder to solve (2), and vice versa. However, the reason I'd like to mainly highlight here is that unlike (1), (2) is a mostly social phenomenon, so that beliefs about the feasibility of (2) can influence this very feasibility. To collectively decide not to build an existentially dangerous AI is very much a matter of curbing a race dynamic, be it between tech companies or between nations. Believing that others will not hold back may disincentivize a participant in the race from themselves holding back. This is why undue pessimism about (2) can become self-fulfilling, and for this reason I believe such pessimism about (2) to be much more lethal than a corresponding misjudgement about (1).⁶

This brings me to former OpenAI employee Leopold Aschenbrenner's recent and stupendously interesting report Situational Awareness: The Decade Ahead.⁷ Nowhere else can we currently access a more insightful report about what is going on at the leading AI labs, how researchers there see the world, and the rough ride during the coming decade that the rest of the world can expect as a consequence of this. What I don't like, however, is the policy recommendations, which include the United States racing ahead as fast as possible towards AGI in the next few years. Somewhat arbitrarily (or at least with insufficiently explained reasons), Aschenbrenner expresses optimism about (1) but extreme pessimism about (2): the idea that the Chinese Communist Party might want to hold back from a world-destroying project is declared simply impossible unless their arm is twisted hard enough by an obviously superior United States. So while on one level I applaud Aschenbrenner's report for giving us outsiders this very valuable access to the inside view, on another level I fear that it will be counterproductive for solving the crucial global coordination problem in (2). And the combination of overoptimism regarding (1) and overpessimism regarding (2) seems super dangerous to me.

Footnotes

1) This was in the proceedings of a meeting held at the EU parliament on October 19, 2017. My discussion of the concepts of optimism and pessimism was provoked by how prominently these terms were used in the framing and marketing of the event.

2) Note here that in the quoted phrase I take both optimism and pessimism as deviations from what is justified by evidence - for instance, I don't here mean that taking the probability of things going well to be 99% automatically counts as optimistic. This is a bit of a deviation from standard usage, which in what follows I will revert to, and instead use phrases like "overly optimistic" to indicate optimism in the sense I gave the term in 2017.

3) To be fair to my 2017 self, I did add some nuance already then: the acceptance of "a different kind of optimism which I am more willing to label as rational, namely to have an epistemically well-calibrated view of the future and its uncertainties, to accept that the future is not written in stone, and to act upon the working assumption that the chances for a good future may depend on what actions we take today".

4) As for myself, I discovered Yudkowsky's writings in 2008 or 2009, and insofar as I can point to any single text having convinced me about the unique importance of AI safety, it's his 2008 paper Artificial intelligence as a positive and negative factor in global risk, which despite all the water under the bridges is still worthy of inclusion on any AI safety reading list.

5) Yudkowsky should be credited with making this distinction. In fact, when the Overton window on AI risk shifted drastically in early 2023, he took that as a sufficiently hopeful sign so as to change his mind in the direction of a somewhat less pessimistic view regarding (2) - see his much-discussed March 2023 Time Magazine article.

6) I don't deny that, due to the aforementioned demoralization phenomenon, pessimism about (1) might also be self-fulfilling to an extent. I don't think, however, that this holds to anywhere near the same extent as for (2), where our ability to coordinate is more or less constituted by the trust that the various participants in the race have that it can work. Regarding (1), even if a grim view of its feasibility becomes widespread, I think AI researchers will still remain interested in making progress on the problem, because along with its potentially enormous practical utility, surely this is one of the most intrinsically interesting research questions one can possibly ask (up there with understanding biogenesis or the Big Bang or the mystery of consciousness): what is the nature of advanced intelligence, and what determines its goals and motivations?

Thursday, 30 May 2024

Talking about OpenAI on the For Humanity Podcast

There's been a lot happening at OpenAI in recent weeks, including the departure of their two leading AI safety researchers Ilya Sutskever and Jan Leike. I discussed some of this, along with a handful of related AI risk and AI safety issues, in a long conversation with John Sherman in the latest episode of his podcast For Humanity: An AI Risk Podcast.

Those content with hearing the audio can alternatively get that via Google Podcasts or Apple Podcasts or wherever you listen to podcasts. The episode was released yesterday, May 29, and was recorded one week earlier, on May 22. For updates on what has happened since then with all of the ongoing OpenAI drama, I generally recommend Zvi Mowshowitz on his Substack Don't Worry About the Vase, and right now in particular his posts OpenAI: Fallout (from May 28) and AI #66: Oh to Be Less Online (from today, May 30).

Thursday, 29 February 2024

On OpenAI's report on biorisk from their large language models

Aligning AIs with whatever values it is we need them to have in order to ensure good outcomes is a difficult task. Already today's state-of-the-art Large Language Models (LLMs) present alignment challenges that their developers are unable to meet, and yet they release their poorly aligned models in their crazy race with each other where first prize is a potentially stupendously profitable position of market dominance. Over the past two weeks, we have witnessed a particularly striking example of this inability, with Google's release of their Gemini 1.5, and the bizarre results of their attempts to make sure images produced by the model exhibit an appropriate amount of demographic diversity among the people portrayed. This turned into quite a scandal, which quickly propagated from the image generation part of the model to the likewise bizarre behavior in parts of its text generation.¹

But while the incident is a huge embarrassment to Google, it is unlikely to do much real damage to society or the world at large. This can quickly change with future more capable LLMs and other AIs. The extreme speed at which AI capabilities are currently advancing is therefore a cause for concern, especially as alignment is expected to become not easier but more difficult as the AIs become more capable at tasks such as planning, persuasion and deception.² I think it's fair to say that the AI safety community is nowhere near a solution to the problem of aligning a superhumanly capable AI. As an illustration of how dire the situation is, consider that when OpenAI in July last year announced their Superalignment four-year plan for solving the alignment problem in a way that scales all the way up to such superhumanly capable AI, the core of their plan turned out to be essentially "since no human knows how to make advanced AI safe, let's build an advanced AI to which we can delegate the task of solving the safety problem".³ It might work, of course, but there's no denying that it's a huge leap into the dark, and it's certainly not a project whose success we should feel comfortable hinging the future survival of our species upon.

Given the lack of clear solutions to the alignment problem on the table, in combination with how rapidly AI capabilities are advancing, it is important that we have mechanisms for carefully monitoring these advances and making sure that they do not cross a threshold where they become able to cause catastrophe. Ideally this monitoring and prevention should come from a state or (even better) intergovernmental actor, but since for the time being no such state-sanctioned mechanisms are in place, it's a very welcome thing that some of the leading AI developers are now publicizing their own formalized protocols for this. Anthropic pioneered this in September last year with their so-called Responsible Scaling Policy,⁴ and just three months later OpenAI publicized their counterpart, called their Preparedness Framework.⁵

Since I recorded a lecture about OpenAI's Preparedness Framework last month - concluding that the framework is much better than nothing, yet way too lax to reliably protect us from global catastrophe - I can be brief here. The framework is based on evaluating their frontier models on a four-level risk scale (low risk, medium risk, high risk, and critical risk) along each of four dimensions: Cybersecurity, CBRN (Chemical, Biological, Radiological, Nuclear), Persuasion and Model Autonomy.⁶ The overall risk level of a model (which then determines how OpenAI may proceed with deployment and/or further capabilities development) is taken to be the maximum among the four dimensions. All four dimensions are in my opinion highly relevant and in fact indispensable in the evaluation of the riskiness of a frontier model, but the one to be discussed in what follows is CBRN, which is all about the model's ability to create or deploy (or assist in the creation or deployment of) non-AI weapons of mass destruction.
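The aggregation rule described above (overall risk equals the worst level across the four dimensions) can be sketched in a few lines of Python. This is only an illustration: the enum, the function name and the example scores are assumptions made up here, not taken from OpenAI's framework document.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Four-level risk scale, ordered so that max() picks the worst level."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def overall_risk(scores):
    """Return the highest risk level found among the tracked dimensions."""
    return max(scores.values())


# Hypothetical scores, invented purely for illustration.
example_scores = {
    "Cybersecurity": Risk.LOW,
    "CBRN": Risk.HIGH,
    "Persuasion": Risk.MEDIUM,
    "Model Autonomy": Risk.LOW,
}

print(overall_risk(example_scores).name)
```

Taking the maximum rather than, say, an average means that a single high-risk dimension cannot be diluted by low scores on the other three.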

The Preparedness Framework report contains no concrete risk analysis of GPT-4, the company's current flagship LLM. A partial such analysis did however appear later, in the report Building an early warning system for LLM-aided biological threat creation released in January this year. The report is concerned with the B aspect of the CBRN risk factor - biological weapons. It describes an ambitious study in which 100 subjects (50 biology undergraduates, and 50 Ph.D. level experts in microbiology and related fields) are given tasks relevant to biological threats, and randomly assigned to have either access to both GPT-4 and the Internet, or just to the Internet. The question is: does GPT-4 access make subjects more capable at their tasks? It seems so, but the report remains inconclusive.

It's an interesting and important study, and many aspects of it deserve praise. Here, however, I will focus on two aspects where I am more critical.

The first aspect is how the report goes on and on about whether the observed positive effect of GPT-4 on subjects' skills in biological threat creation is statistically significant.⁷ Of course, this obsession with statistical significance is shared with a kazillion other research reports in virtually all empirically oriented disciplines, but in this particular setting it is especially misplaced. Let me explain.

In all scientific studies meant to detect whether some effect is present or not, there are two distinct ways in which the result can come out erroneous. A type I error is to deduce the presence of a nonzero effect when it is in fact zero, while a type II error is to fail to recognize a nonzero effect which is in fact present. The concept of statistical significance is designed to control the risk for type I errors; roughly speaking, employing statistical significance methodology at significance level 0.05 means making sure that if the effect is zero, the probability of erroneously concluding a nonzero effect should be at most 0.05. This gives a kind of primacy to avoiding type I errors over avoiding type II errors, laying the burden of proof on whoever argues for the existence of a nonzero effect. This makes a good deal of sense in a scientific community where an individual scientific career tends to consist largely of discoveries of various previously unknown effects, creating an incentive that in the absence of a rigorous system for avoiding type I errors might overwhelm scientific journals with a flood of erroneous claims about such discoveries.⁸ In a risk analysis context such as in the present study, however, it makes no sense at all, because here it is type II errors that mainly need to be avoided - because they may lead to irresponsible deployment and global catastrophe, whereas consequences of a type I error are comparatively trivial. The burden of proof in this kind of risk analysis needs to be laid on whoever argues that risk is zero or negligible, whence the primary focus on type I errors that follows implicitly from a statistical significance testing methodology gets things badly backwards.⁹
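The asymmetry between the two error types can be made concrete with a small calculation. The following Python sketch computes both error rates for a two-sided two-sample z-test. Everything specific in it is an assumption chosen for illustration and not taken from the OpenAI report: the z-test itself, a known common standard deviation, 25 subjects per group, and a true effect of 0.3 standard deviations.

```python
from statistics import NormalDist


def error_rates(effect_size, n_per_group, alpha=0.05):
    """Return (type I rate, type II rate) for a two-sided two-sample z-test.

    effect_size is the true difference in means, measured in units of the
    common standard deviation (assumed known). The type I rate is alpha by
    construction; the type II rate is the chance of missing the effect.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Under the alternative, the test statistic is N(shift, 1).
    shift = effect_size * (n_per_group / 2) ** 0.5
    power = (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
    return alpha, 1 - power


# Hypothetical numbers: 25 subjects per group, true effect of 0.3 SD.
type_1, type_2 = error_rates(effect_size=0.3, n_per_group=25)
print(f"type I error rate:  {type_1:.2f}")
print(f"type II error rate: {type_2:.2f}")
```

Under these assumed numbers the test holds the type I error rate at 0.05 while missing a real effect far more often than not, which is exactly the wrong trade-off when the type II error is the dangerous one.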

Here it should also be noted that, given how widely useful GPT-4 has turned out to be across a wide range of intellectual tasks, the null hypothesis that it would be of zero use to a rogue actor wishing to build biological weapons is highly far-fetched. The failure of the results in the study to exhibit statistical significance is best explained not by the absence of a real effect but by the smallness of the sample size. To the extent that the failure to produce statistical significance is a real problem (rather than a red herring, as I think it is), it is exacerbated by another aspect of the study design, namely the use of multiple measurements on each subject. I am not at all against such multiple measurements, but if one is fixated on the statistical significance methodology, it leads to dependencies in the data that force the statistical analyst to employ highly conservative¹⁰ p-value calculations, as well as to multiple inference adjustments. Both of these complications lead to worse prospects for statistically significant detection of nonzero effects.
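How much the sample size and the multiple inference adjustments matter can be illustrated with the standard normal-approximation formula for the sample size of a two-sample comparison. The sketch below is an assumption-laden illustration rather than a reanalysis of the study: it uses a Bonferroni adjustment as a stand-in for whatever adjustment the report applied, ignores the within-subject dependencies mentioned above, and the effect size of 0.3 standard deviations and the 80% power target are made up.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, power=0.8, alpha=0.05, comparisons=1):
    """Approximate subjects needed per group for a two-sided two-sample z-test.

    Uses n = 2 * ((z_alpha + z_power) / effect_size)^2, with the significance
    level divided by the number of comparisons (Bonferroni adjustment).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / (2 * comparisons))
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect size of 0.3 SD; vary the number of adjusted comparisons.
for m in (1, 5, 20):
    print(f"{m:>2} comparisons: {n_per_group(0.3, comparisons=m)} per group")
```

With an assumed effect of this size, the required group size is far above what a study of 100 subjects split across cohorts and arms can offer, and it grows further with every additional comparison that has to be adjusted for.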

The second aspect is what the study does to its test subjects. I'm sure most of them are fine, but what is the risk that one of them gets the wrong idea and picks up inspiration from the study to later go on to develop and deploy their own biological pathogens? I expect the probability to be low, but given the stakes at hand, the expected cost might be large. In practice, the study can serve as a miniature training camp for potential bioterrorists. Before being admitted to the study, test subjects were screened, e.g., for criminal records. That is a good thing of course, but it would be foolish to trust OpenAI (or anyone else) to have an infallible way of catching any would-be participant with a potential bioterrorist living in the back of their mind.

One possible reaction OpenAI might have to the above discussion about statistical aspects is that in order to produce more definite results they will scale up their study with more participants. To which I would say please don't increase the number of young biologists exposed to your bioterrorism training. To which they might reply by insisting that this is something they need to do to evaluate their models' safety, and surely I am not opposed to such safety precautions? To which my reply would be that if you've built something you worry might cause catastrophe if you deploy it, a wise move would be to not deploy it. Or even better, to not build it in the first place.

Footnotes

1) See the twin blogposts (first, second) by Zvi Mowshowitz on the Gemini incident for an excellent summary of what has happened so far.

2) Regarding the difficulty of AI alignment, see, e.g., Roman Yampolskiy's brand-new book AI: Unexplainable, Unpredictable, Uncontrollable for a highly principled abstract approach, and the latest 80,000 Hours Podcast conversation with Ajeya Cotra for an equally incisive but somewhat more practically oriented discussion.

3) As with so much else happening in the AI sphere at present, Zvi Mowshowitz has some of the most insightful comments on the Superalignment announcement.

4) I say "so-called" here because the policy is not stringent enough to make their use of the term "responsible" anything other than Orwellian.

5) The third main competitor, besides OpenAI and Anthropic, in the race towards superintelligent AI is Google/DeepMind, who have not publicized any corresponding framework. However, in a recent interview with Dwarkesh Patel, their CEO Demis Hassabis assures us that they employ similar frameworks in their internal work, and that they will go public with these some time this year.

6) Wisely, they emphasize the preliminary nature of the framework, and in particular the possibility of adding further risk dimensions that turn out in future work to be relevant.

7) Statistical significance is mentioned 20 times in the report.

8) Unfortunately, this has worked less well than one might have hoped; hence the ongoing replication crisis in many areas of science.

9) The question of where the burden of proof lies here is roughly similar to the highly instructive Fermi-Szilard disagreement regarding the development of nuclear fission, which I've written about elsewhere, and where Szilard was right and Fermi was wrong.

10) "Conservative" is here in the Fermian rather than the Szilardian sense; see Footnote 9.

Friday, 5 January 2024

Video talk on OpenAI's Preparedness Framework

Here's my latest video lecture on what OpenAI does regarding AI risk and AI safety, following up on the ones from December 2022 and March 2023. This time I focus on the so-called Preparedness Framework that they announced last month, and criticize it along lines heavily influenced by Zvi Mowshowitz's writing on the same topic.

Wednesday, 13 December 2023

The AI question: what is to be done?

So what do we need to do about the AI problem that I insist on holding forth about in blog post after blog post?¹ Believe it or not, over the course of the autumn the matter has actually begun to become somewhat clearer. Zvi Mowshowitz (who in 2023, with his newsletter Don't Worry About the Vase, has become an indispensable news source on AI matters) summarizes:

Policy now has to lay the groundwork for a regime where we have visibility into the training of frontier models. We collectively must gain the power to restrict such training once a model’s projected capabilities become existentially dangerous until we are confident we know what it would take to proceed safely.

That means registration of large data centers and concentrations of compute. That means, at minimum, registration of any training runs above some size threshold, such as the 10²⁶ flops chosen in the executive order. In the future, it means requiring additional safeguards, up to and including halting until we figure out sufficient additional procedures, with those procedures ramping up alongside projected and potential model capabilities. Those include computer security requirements, safeguards against mundane harm and misuse, and, most importantly, protection against existential risks.

I believe that such a regime will automatically be a de facto ban on sufficiently capable open source frontier models. Alignment of such models is impossible; it can easily be undone. The only way to prevent misuse of an open source model is for the model to lack the necessary underlying capabilities. Securing the weights of such models is obviously impossible by construction. Releasing a sufficiently capable base model now risks releasing the core of a future existential threat when we figure out the right way to scaffold on top of that release—a release that cannot be taken back.

Finally, we will need an international effort to extend such standards everywhere.

Yes, I know there are those who say that this is impossible, that it cannot be done. To them, I say: the alternative is unthinkable, and many similarly impossible things happen when there is no alternative.

Yes, I know there are those who would call such a policy various names, with varying degrees of accuracy. I do not care.

I would also say to such objections that the alternative is worse. Failure to regulate at the model level, even if not directly fatal, would then require regulation at the application level, when the model implies the application for any remotely competent user.

Give every computer on the planet the power to do that which must be policed, and you force the policing of every computer. Let’s prevent the need for that.

There are also many other incrementally good policies worth pursuing. I am happy to help prevent mundane harms and protect mundane utility, and explore additional approaches. This can be a ‘yes, and’ situation.

A reasonable follow-up question from the reader who does not consider themselves to possess either relevant cutting-edge expertise or any particular position of power is this: "OK, but what can little old me do to help make this actually happen?"

Well, it comes down to the usual battery of measures when a political reorientation is needed: talk to people, in break rooms, at mingles, in classrooms and in meeting rooms; take to the streets (preferably in a group) and chant slogans; contact a member of the Riksdag, a member of the European Parliament, or a member of Kristersson's new AI commission²; write a letter to the editor, a blog post, a Facebook post or why not a book; start an action group or a study circle; join a political party and push the issue there; phone in to P1; etc., etc.

Various excuses are, as usual, at hand for anyone who wants to avoid getting involved. Let me mention one of them: the one about how it surely makes no difference what we in Sweden do. To which my answer is that it certainly does. What needs to be created is political momentum and a global consensus around the idea that it is not OK for the leading Californian AI companies, in their internal race towards AI dominance, to play Russian roulette with the survival of humanity. For Sweden to stand on the sidelines of this movement will not do. We can and we shall contribute in various ways, including via organizations such as the UN, the EU and NATO, alongside countless more informal networks and contexts. Cast a glance across the Atlantic and consider the political situation over there: surely we damn well cannot leave it to the US alone to decide the fate of humanity?

Footnotes

1) For a summary of the situation, see, e.g., the 73-page bonus chapter (freely downloadable as a pdf) to the new edition of my book Tänkande maskiner.

2) Do not be discouraged by the fact that the commission's chair has uttered inanities about AI being "love and empathy" and about "rather too many alarmist reports" having been spread on the issue: this only makes it all the more urgent to influence the commission's work!

Thursday, 2 November 2023

An intense week in AI policy

It is still only Thursday, and yet more has happened this week in terms of governmental and intergovernmental AI policy initiatives than we normally see in... I don't even know what time span to throw in here, because political interest in AI issues is so newly awakened that there is no steady state to which the word "normally" can be related. The two major events I have in mind are the following.

On Monday: President Biden's executive order on Safe, Secure, and Trustworthy Artificial Intelligence.
Yesterday and today: The first global AI Safety Summit, at Bletchley Park and with UK Prime Minister Rishi Sunak as initiator and host, with participation both by top politicians (led by Kamala Harris and Ursula von der Leyen) and by AI and tech industry figures (Yoshua Bengio, Geoffrey Hinton, Sam Altman, Elon Musk, ...).

Already yesterday, on the first day of the Bletchley Park meeting, they released their Bletchley Declaration, signed by representatives of the EU, the US, China, India, the United Kingdom and a number of other countries, and with formulations such as this:

There is potential for serious, even catastrophic, harm, either deliberate or unintentional, stemming from the most significant capabilities of [frontier] AI models. Given the rapid and uncertain rate of change of AI, and in the context of the acceleration of investment in technology, we affirm that deepening our understanding of these potential risks and of actions to address them is especially urgent.

Biden's executive order speaks of requirements that...

companies developing any foundation model that poses a serious risk to national security, national economic security, or national public health and safety must notify the federal government when training the model, and must share the results of all red-team safety tests. These measures will ensure AI systems are safe, secure, and trustworthy before companies make them public,

where I like to think that "national security, national economic security, or national public health and safety" is a kind of placeholder for "existential risk to humanity", which does not yet quite fit within the Overton window at this political level.

Although both documents are diluted with talk of AI issues of comparatively secondary importance, and although in both cases we are dealing not with anything having the status of regulation or binding agreement but merely with declarations of intent and grand ambitions, I see the formulations quoted above as evidence of how incredibly far we have managed to move the Overton window for public AI discussions during 2023. I believe that the two open letters I signed this spring (the one organized by FLI in March and the one by CAIS in May) have played a not insignificant role in this. Despite the remarkably rapid shift in the climate of discussion, I still feel a lingering worry that things may not be moving fast enough to avert catastrophe, but in a week like this I cannot help feeling happier and more hopeful than the week before.

I have not had time to digest the contents of the documents sufficiently to comment on them in more detail, but as regards Biden's executive order, the always readable and thorough Zvi Mowshowitz has been quick off the mark with two extensive texts that I am broadly inclined to agree with: On the executive order and Reactions to the executive order. If I know him right, we can expect a roughly equally ambitious reaction from him to the Bletchley Declaration within a day or two.

I would like to take the opportunity to mention that, as an engaged spectator of the Bletchley Park meeting, I have made my voice heard in a couple of contexts:

When Svenska Dagbladet reported ahead of the meeting the day before yesterday, some interview answers from me were included.
I am also among the signatories of yet another open letter, Urging an International AI Treaty: An Open Letter, and just as with the two letters mentioned above, this time too there are many prominent names whose company I have the pleasure of sharing.

Edit 8 November 2023: The anticipated text by Zvi Mowshowitz on the Bletchley Park meeting is now available.