My friend David Sumpter
(a professor of applied mathematics at Uppsala University) and I have had stimulating discussions and deep disagreements on AI and related topics off and on since (at least) 2013
. Each year since 2019, we have taken part in a seminar organized by Frances Sprei at Chalmers, with an audience of computer science students. This year, as usual, we zoomed in quickly on our diametrically opposed views on whether or not the creation of artificial general intelligence (AGI) is a realistic prospect in the present century. David thinks the answer is no, and pointed to the deep divide between present-day narrow AI and (the so far exclusively hypothetical) notion of AGI, to the historical record of how difficult technology is to predict, and to our inability to faithfully simulate even very primitive biological organisms. I, on the other hand, think the answer may well be yes, and pointed to Ajeya Cotra's more ambitious comparisons with biological systems
, and to the fallaciousness of the so-called common sense argument
which I believe drives much of AGI skepticism. I don't think having seen our seminar is a prerequisite for enjoying the following correspondence, but for those who wish to see it, it is available on YouTube.
In earlier discussions, David has noted (quite correctly) my resistance to and blatant disrespect for the norm that science is a gradual process that does not deal with phenomena (such as AGI) far outside the envelope of what is already known and understood. It was with reference to this that he made the friendly-yet-provocative remark during the Q&A (not included in the video) that "Olle does not understand how science works", which triggered me to write to him afterwards with an attempt at a clarification regarding my view of science. This led to a further exchange between the two of us regarding AGI and the broader issue of how one should think about scientific and rational reasoning. We have not reached the end of our discussion, but David has indicated that he needs more time to carefully lay out his position, so we agreed that it makes sense to post what we already have, and leave readers with a bit of a cliffhanger until a later blog post. Here goes:
From OH to DS, November 25:
I still feel our debates are productive in that we’re zooming in (slowly) towards the crux of our disagreement. When I pressed you about prediction, you said some things can be predicted (such as your prediction about AGI), and some cannot (such as Cotra’s prediction about AGI). What makes that answer unacceptable in my view is that the two of you are asking the same question (when will we get AGI?), so what you’re doing here is saying that some answers to that question are scientifically legitimate, while others are not. Doing so introduces the strongest possible bias in the search for the correct answer to that question. I disagree vehemently with such approaches, and in particular with one-sidedly invoking “burden of evidence” in cases whose complexity goes far beyond Popper’s idealized “are all swans white?” situation; for more detail on this, see my old blog post on “Vulgopopperianism”.
Relatedly, when you say “Olle does not understand how science works”, I appreciate the witticism, but my reaction to it depends on whether you mean “how science works” as a sociological observation about how science is actually practiced today or as a normative claim about how we ought to go about it. As a sociological observation, I am totally on board with your view of how (most) science works, but normatively, I reject it. And we must be clear about the distinction between the descriptive and the normative questions: “this is how science is practiced today” is not a satisfactory answer to “how should we do science?”.
From DS to OH, November 26:
Thanks for this. Very useful. You wrote:
When I pressed you about prediction, you said some things can be predicted (such as your prediction about AGI), and some cannot (such as Cotra’s prediction about AGI).
I suppose I do think that I am not making a prediction. I am rather saying that the future is highly uncertain and no-one has presented any evidence that AGI is on its way. A bit like the aliens example. There is no evidence presented for the existence of aliens who can travel to our planet. But who knows what will happen. So maybe they are already on their way, having seen our first signs of technology. Or there is no evidence that we are living in a computer simulation. But we have all seen The Matrix, so it might be true. So here we have two ideas that are far out in the same way that Terminator is far out. And they should be treated in the same way. So I am not making a prediction in the same way as Cotra is making a prediction. She is describing a distribution of likely year outcomes. This, to me, doesn’t pass the test of methods which have been successful in the past. For example, in Outnumbered I wrote about election forecasting, but we could take a lot of other things: economic forecasting, forecasting life outcomes. See for example this PNAS paper
. We just don’t have methods for predicting outcomes of these sorts. So I do give less value to these types of prediction methods.
I feel similar about your argument about exponential growth (which for me is the linear model, btw!). Without previous instances available, it is impossible to extrapolate in that way.
So, if I am not making a prediction, what am I doing? Well, this year I broke it down into three stages. The first is a look at the machine learning methods available today. Surely, that should be a big part of our focus here (especially since you now think that AGI is very likely on a shorter timescale). Even on the timescale of Cotra’s predictions, we should be able to find some signal of its arrival? And I just can’t see it. Markov’s first application of a Markov chain in 1907 was in predicting the next letter (vowel or consonant) in Pushkin. Yes, we can now construct massive Markov chains of text, and the same results come out on a larger scale. That is all that has happened here. The same is true of each application. Chess is a combinatorially massive (but closed, finite) problem. Go is slightly bigger. With big enough computers, we were always going to solve it with the right representation. Protein folding is an impressive piece of engineering, as are many of the ways in which machine learning is used. But they all come from the underlying maths. I suppose the scope of the debate doesn’t give me time to break all of these down, but I think that for any given AI development, I could break it down for you in terms of early or mid-20th century maths. And I don’t see where the new maths might lie that supersedes it.
The rapid discovery of new maths would undermine the argument above (the ten year argument). But again, shouldn’t Cotra’s argument account for this lack of new maths? There is no abundance of new computational ideas waiting to be inserted into AI. Instead, the AGI argument is usually made around scale (back to exponential growth). If we scale up Markov’s predictions with pencil and paper to computing today, we increase the range over time we can predict a text. There is a belief (a wrong-headed one, I think) that this type of scaling can be used to predict the evolution of AI (see for example this paper
for the scaling argument). But even from Markov’s first paper, we would have been able to predict that AI scales in this way! Of course it does. The more symbols we read and predict, the longer it can keep churning out syntactically correct text. But this is fact-free text. It is nonsense.
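The kind of character-level Markov model David has in mind can be illustrated with a small sketch (a toy illustration of my own; the corpus, order, and function names here are arbitrary choices, not Markov's actual setup or anything from our exchange). The model counts which characters follow each short context in a training text and then samples continuations:

```python
import random
from collections import defaultdict

def train_markov(text, order=3):
    """For each context of `order` characters, record the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model

def generate(model, seed, length=60, rng=None):
    """Extend the seed one character at a time by sampling from the context counts."""
    rng = rng or random.Random(0)
    out = seed
    order = len(seed)
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:  # dead end: this context never occurred in the training text
            break
        out += rng.choice(followers)
    return out

corpus = ("the quick brown fox jumps over the lazy dog and the dog barks at the fox "
          "while the fox runs over the hill and the dog sleeps under the tree")
model = train_markov(corpus, order=3)
print(generate(model, "the"))
```

The output is locally plausible word-like text, but it carries no meaning beyond the statistics of its training corpus, which is precisely the "syntactically correct but fact-free" point at issue; scaling up the corpus and the context length changes how long the text stays coherent, not what kind of thing is being produced.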
My fifty year biological argument is simply another way into explaining the above, without going into the maths. It is a way of gauging what maths and computing allow us to do and what they (strangely) don’t.
Eventually, you are then forced, and at some point you did this, to move toward the idea that “humans might just be like language models”. This is to me a religious belief. You can think it, if you want, but it does not square with my experience of the world. Nor does it align with how I see you as an individual. I see our relationship as uniquely human. I can’t understand it in any other way. So while those are mystical statements on my part, I see any appeal to "we are just neural networks" as an equally mystical statement. I, at least, in this context, am prepared to admit my mysticism.
I don’t think that reality is a computation. So we are exempt from the Church-Turing thesis. I think computation is one limited part of the rich ways in which the world manifests itself. Ultimately, this is a philosophical position. But it is self-consistent. Your position is very strongly one of logical positivism, and mine is more one of (scientific) realism.
You should put away Popper and read Hilary Putnam, for example. I think this might help.
Have a good weekend,
From OH to DS, November 29:
Hi David, and many thanks for your thoughtful response!
But you seem to want to have your cake and eat it too. When you confidently state that AGI will not be developed in the foreseeable future, and simultaneously insist on not having made a prediction about AGI, you’re making two incompatible claims. If you say that AGI is very unlikely in the foreseeable future, then that is a prediction, no matter how you arrived at it. Conversely, if you insist on not having made a prediction, then it doesn’t make sense to respond to the AGI timeline and feasibility issue in any other way than by shrugging and saying “I have no clue”.
Your argument for your having-the-cake-and-eating-it is precisely the kind of burden-of-proof-asymmetry gambit that I do not accept. You say:
I am rather saying the future is highly uncertain and no-one has presented any evidence that AGI is on its way.
Assuming for the moment that your claim here that no evidence is available for AGI being on its way (a claim I will return to and contest below) is correct, why in the world would this settle the matter, with zero regard to whether or not there is any evidence for the opposed statement that AGI is not on its way?
Simplifying just a little bit, there are basically two ways the world can be as regards the AGI issue: either
(a) AGI is attainable within the next 100 years, or
(b) there is some secret sauce about human cognition that makes AGI unattainable within the next 100 years.
(Secret sauce here need not literally be some substance, but could denote whatever aspect of our minds that cannot be replicated in AI in the given time frame.)
You seem to be saying that if the collected evidence for (a) and the collected evidence for (b) are both weak, then we should treat (b) as true by default. I strongly disagree. To require that defenders of (a) should provide, e.g., a concrete plan for how to attain AGI within the given time frame, while at the same time absolving defenders of (b) from having to point to what the secret sauce consists in, is to rig the discussion on (a) vs (b) so much in (b)’s favor that it becomes almost pointless. If we are to carry on our debates without short-circuiting them from the start, we have to take both (a) and (b) seriously, and treat neither of them as correct by default.
Speaking of plans for how to attain AGI, you said in the Q&A session at our seminar that any argument for (a) would have to be accompanied by such a plan in order to be taken seriously, but that no such plan has been put forth. This last claim is however false, as multiple such plans have been suggested, one of which you have in fact alluded to yourself, namely the scaling approach of building bigger and bigger GPT-n systems until eventually one of them turns out to be an AGI. Does this plan guarantee success? Of course not. But neither is it guaranteed to fail. And it is (I maintain) unreasonable to require the existence of a 100% fail-safe plan for building an AGI as a prerequisite for attaching a nonzero probability to (a) being true.
I get the impression that your failure to notice that the scaling approach is a plan for building AGI is a consequence of your deeply held belief that it just cannot work – a belief that in turn rests on your strong belief in (b). Please correct me if I am misreading you here, but in case I am not, you surely see the problem: if we are to have a meaningful and non-circular discussion on (a) vs (b), you cannot appeal to arguments that presuppose (b).
Let me now go back to your claim that there is no evidence that AGI is on its way. I think this claim is simply false. The more broadly competent and intelligent AI we produce, the more this points towards AGI being on its way. Various increasingly widely competent products from DeepMind such as MuZero constitute such evidence, and the same goes for the large GPT-like models from OpenAI and others. None of this is definite proof of (a), of course, and if you consider it to be weak evidence then I respect your judgement (while, depending on the meaning of “weak”, perhaps not quite sharing it), but if you go as far as to insist that it constitutes zero evidence, then I can only take that as a sign that you are dogmatically presupposing the secret sauce-hypothesis (b) as a means to disqualifying all the evidence, again inserting a nonproductive circularity into the discussion.
To back up your skepticism about the evidential value of the performance of GPT-3 and other language models, you state that all they produce is “syntactically correct […] nonsense”. But how do you come to such a conclusion? Let me give an example. Suppose you quiz me with the following:
Trevor has wanted to see the mountain with all of the heads on it for a long time, so he finally drove out to see it. What is the capital of the state that is directly east of the state that Trevor is currently in?
And this is my reply:
The mountain with all of the heads on it is Mount Rushmore. Mount Rushmore is in South Dakota. The state directly east of South Dakota is Minnesota. The capital of Minnesota is St. Paul. The answer is "St. Paul".
Would you dismiss my reply as “nonsense”? Of course not: you would (correctly) conclude that, far from talking nonsense, I am systematically working out the answer to your quiz.
Now modify the thought experiment by leaving the dialogue intact but replacing me by Google’s GPT-like language model PaLM. Actually, this is exactly how PaLM answers this quiz question. How do you then judge its answer? Is it systematic reasoning or just nonsense? If you dismiss it as nonsense, I am again led to suspect that you are short-circuiting the discussion by falling back on the secret sauce-hypothesis (b). You might retort that no, that’s not what you’re doing, and explain that your conclusion is based on the empirical fact that PaLM responds to a bunch of other questions by obvious bullshitting and nonsense. Well, that doesn’t quite work, because the same holds for me: if you quiz me with sufficiently confusing questions, my answers will degenerate into nonsense. So you might qualify by saying that you count my answer as sense and PaLM’s as nonsense because PaLM gives obviously nonsensical answers more often than I do. That’s actually a stance that I am somewhat sympathetic to, but note that it points towards a quantitative rather than qualitative difference between me and PaLM, so while it can be taken to support the view “PaLM is not AGI”, it seems to me much more suspect to read it as supporting “PaLM is not even a step towards AGI”.
My reply is becoming longer than I intended, but before closing I want to clarify something on the issue of “how science works” that I addressed in my previous email. Namely, I have found that people with a strong science background are often prone to the following kind of reasoning; I’m not sure if this literally applies to you as well, but it seems to me that you have operated in its vicinity:
They start from the claim “near-term AGI is unfalsifiable”, from which they conclude “near-term AGI is an unscientific issue” and from that finally arrive at “near-term AGI will not happen”. Or they apply the same chain of reasoning to go from “near-term contact with extraterrestrial aliens is unfalsifiable” to “near-term contact with extraterrestrial aliens will not happen”, or from “the simulation hypothesis is unfalsifiable” to “the simulation hypothesis is false”. I believe all these inferences are non sequiturs based on a category error: abstract rules and conventions for what constitutes proper science cannot on their own answer substantial questions about the world. To me, the AGI question and the aliens question and the simulation question are all important questions about the world, and I want to be able to address them without having my reasoning short-circuited by conventions about unfalsifiability. Hence, if they insist that science is built on conventions that either allow for the inference from unfalsifiability to falsehood, or prohibit reasoning about AGI/aliens/simulation altogether, then I say to heck with (their notion of) science – I will address these questions outside of the narrow confines of science, and instead within the realm of rational reasoning!
I value our exchange highly, David, and I have great respect for you. This respect comes from your mathematical and scientific brilliance, and from your kindness and overall lovely personality – but not from any hypothetical secret sauce lurking somewhere deep in your mind. I am hoping that you can extend the same secret sauce-independence in your view of me, because I am not at all sure that I have any.