A citizen and mathematician offers views on societal issues, literature and science.
(2) Our ability to collectively decide not to build an AI that might wipe out Homo sapiens.
1) This was in the proceedings of a meeting held at the EU Parliament on October 19, 2017. My discussion of the concepts of optimism and pessimism was provoked by how prominently these terms were used in the framing and marketing of the event.
2) Note here that in the quoted phrase I take both optimism and pessimism to be deviations from what is justified by evidence - for instance, I don't mean that taking the probability of things going well to be 99% automatically counts as optimistic. This is a bit of a deviation from standard usage, to which I will revert in what follows, instead using phrases like "overly optimistic" to indicate optimism in the sense I gave the term in 2017.
3) To be fair to my 2017 self, I did add some nuance already then: the acceptance of "a different kind of optimism which I am more willing to label as rational, namely to have an epistemically well-calibrated view of the future and its uncertainties, to accept that the future is not written in stone, and to act upon the working assumption that the chances for a good future may depend on what actions we take today".
4) As for myself, I discovered Yudkowsky's writings in 2008 or 2009, and insofar as I can point to any single text having convinced me of the unique importance of AI safety, it's his 2008 paper Artificial intelligence as a positive and negative factor in global risk, which despite all the water under the bridge is still worthy of inclusion on any AI safety reading list.
5) Yudkowsky should be credited with making this distinction. In fact, when the Overton window on AI risk shifted drastically in early 2023, he took that as a hopeful enough sign to change his mind in the direction of a somewhat less pessimistic view regarding (2) - see his much-discussed March 2023 Time Magazine article.
6) I don't deny that, due to the aforementioned demoralization phenomenon, pessimism about (1) might also be self-fulfilling to an extent. I don't think, however, that this holds to anywhere near the same extent as for (2), where our ability to coordinate is more or less constituted by the trust that the various participants in the race have that it can work. Regarding (1), even if a grim view of its feasibility becomes widespread, I think AI researchers will still remain interested in making progress on the problem, because along with its potentially enormous practical utility, surely this is one of the most intrinsically interesting research questions one can possibly ask (up there with understanding biogenesis or the Big Bang or the mystery of consciousness): what is the nature of advanced intelligence, and what determines its goals and motivations?
7) See also Dwarkesh Patel's four-and-a-half hour interview with Aschenbrenner, and Zvi Mowshowitz's detailed commentary.
1) See the twin blogposts (first, second) by Zvi Mowshowitz on the Gemini incident for an excellent summary of what has happened so far.
2) Regarding the difficulty of AI alignment, see, e.g., Roman Yampolskiy's brand-new book AI: Unexplainable, Unpredictable, Uncontrollable for a highly principled abstract approach, and the latest 80,000 Hours Podcast conversation with Ajeya Cotra for an equally incisive but somewhat more practically oriented discussion.
3) As with so much else happening in the AI sphere at present, Zvi Mowshowitz has some of the most insightful comments on the Superalignment announcement.
4) I say "so-called" here because the policy is not stringent enough to make their use of the term "responsible" anything other than Orwellian.
5) The third main competitor, besides OpenAI and Anthropic, in the race towards superintelligent AI is Google/DeepMind, who have not made any corresponding framework public. However, in a recent interview with Dwarkesh Patel, their CEO Demis Hassabis assures us that they employ similar frameworks in their internal work, and that they will go public with these some time this year.
6) Wisely, they emphasize the preliminary nature of the framework, and in particular the possibility of adding further risk dimensions that turn out to be relevant in future work.
7) Statistical significance is mentioned 20 times in the report.
8) Unfortunately, this has worked less well than one might have hoped; hence the ongoing replication crisis in many areas of science.
9) The question of where to place the burden of proof here is roughly similar to the highly instructive Fermi-Szilard disagreement regarding the development of nuclear fission, which I've written about elsewhere, and where Szilard was right and Fermi was wrong.
10) "Conservative" is here in the Fermian rather than the Szilardian sense; see Footnote 9.
Policy now has to lay the groundwork for a regime where we have visibility into the training of frontier models. We collectively must gain the power to restrict such training once a model’s projected capabilities become existentially dangerous until we are confident we know what it would take to proceed safely.
That means registration of large data centers and concentrations of compute. That means, at minimum, registration of any training runs above some size threshold, such as the 10^26 flops chosen in the executive order. In the future, it means requiring additional safeguards, up to and including halting until we figure out sufficient additional procedures, with those procedures ramping up alongside projected and potential model capabilities. Those include computer security requirements, safeguards against mundane harm and misuse, and, most importantly, protection against existential risks.
I believe that such a regime will automatically be a de facto ban on sufficiently capable open source frontier models. Alignment of such models is impossible; it can easily be undone. The only way to prevent misuse of an open source model is for the model to lack the necessary underlying capabilities. Securing the weights of such models is obviously impossible by construction. Releasing a sufficiently capable base model now risks releasing the core of a future existential threat when we figure out the right way to scaffold on top of that release—a release that cannot be taken back.
Finally, we will need an international effort to extend such standards everywhere.
Yes, I know there are those who say that this is impossible, that it cannot be done. To them, I say: the alternative is unthinkable, and many similarly impossible things happen when there is no alternative.
Yes, I know there are those who would call such a policy various names, with varying degrees of accuracy. I do not care.
I would also say to such objections that the alternative is worse. Failure to regulate at the model level, even if not directly fatal, would then require regulation at the application level, when the model implies the application for any remotely competent user.
Give every computer on the planet the power to do that which must be policed, and you force the policing of every computer. Let’s prevent the need for that.
There are also many other incrementally good policies worth pursuing. I am happy to help prevent mundane harms and protect mundane utility, and explore additional approaches. This can be a ‘yes, and’ situation.
2) Don't be discouraged by the fact that the commission's chair has made fatuous remarks to the effect that AI is "love and empathy" and that "rather too many alarmist reports" have been circulating on the issue - this only makes it all the more urgent to influence the commission's work!