Tuesday, September 20, 2022

Catastrophic risks from new technologies: two recent appearances

If I mention catastrophic risks from new technologies, faithful readers will recognize it as perhaps the most frequent theme of my op-eds and other writings in recent years, and I have recently made two appearances that both fall under this heading and that can be accessed online:
  • Vi är inte rustade att möta de extrema risker vi står inför (We are not equipped to face the extreme risks we confront) is the title of the op-ed, co-written with Max Tegmark and Anders Sandberg, that we got into Göteborgs-Posten this past Thursday, September 15. Compared to the English-language literature on global catastrophic risks, we do not have much new to offer beyond a careful adaptation to Swedish conditions of the recommendations that Toby Ord, Angus Mercer and Sophie Dannreuther make for the United Kingdom in their report Future Proof, but the subject is immensely important, and what we say about the need for preventive work bears repeating.
  • Today, the Irish philosopher John Danaher released episode 12 of his podcast The Ethics of Academia, entitled Olle Häggström on Romantics vs Vulgarists in Scientific Research - available wherever you get your podcasts. Our conversation takes as its starting point my text Vetenskap på gott och ont (also available in English translation under the title Science for good and science for bad) and the somewhat caricatured division of prevailing views on research ethics that I draw when contrasting the academic-romantic with the economistic-vulgar. Rather than choosing between these two, I advocate a third view which, unlike the other two, pays due attention to the risks that potential research advances may bring.

Thursday, September 8, 2022

Another presumptuous philosopher

In his wonderfully rich 2002 book Anthropic Bias: Observation Selection Effects in Science and Philosophy, Nick Bostrom addresses deep and difficult questions about how to take our own existence and position in the universe into account in inference about the world. At the heart of the book are two competing principles: the SSA (Self-Sampling Assumption) and the SIA (Self-Indication Assumption).1 An adherent of SSA starts from a physical/objective/non-anthropic Bayesian prior for what the world is like, and then updates based on the information of finding themselves inhabiting the body and the position in space-time that they do, under the assumption that they are a random sample from the class of relevant observers.2 An adherent of SIA does the same thing, except that before updating they reweight the prior through a kind of size-bias: each possible world has its probability multiplied by the number of relevant observers and then renormalized.
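
To make the size-biasing step concrete, here is a minimal sketch in Python (the function name sia_reweight and the two-world example are my own illustration, not anything from Bostrom's book):

    # A minimal sketch of the SIA reweighting described above (notation mine):
    # each possible world w has a non-anthropic prior p(w) and an observer count n(w);
    # SIA replaces p(w) by p(w)*n(w) and renormalizes.
    def sia_reweight(prior, n_observers):
        weighted = {w: prior[w] * n_observers[w] for w in prior}
        total = sum(weighted.values())
        return {w: weight / total for w, weight in weighted.items()}

    # Two worlds with equal non-anthropic prior but very different observer counts:
    print(sia_reweight({"A": 0.5, "B": 0.5}, {"A": 10, "B": 1000}))
    # World A ends up with posterior weight about 0.01, world B with about 0.99.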

It is far from clear that either SSA or SIA produces correct reasoning, but they are the main candidates on the table. Bostrom offers many arguments for and against each, but ends up favoring SSA, largely due to the presumptuousness of sticking to SIA that is revealed by the following thought experiment.
    The presumptuous philosopher

    It is the year 2100 and physicists have narrowed down the search for a theory of everything to only two remaining plausible candidate theories: T1 and T2 (using considerations from super-duper symmetry). According to T1 the world is very, very big but finite and there are a total of a trillion trillion observers in the cosmos. According to T2, the world is very, very, very big but finite and there are a trillion trillion trillion observers. The super-duper symmetry considerations are indifferent between these two theories. Physicists are preparing a simple experiment that will falsify one of the theories. Enter the presumptuous philosopher: ''Hey guys, it is completely unnecessary for you to do the experiment, because I can already show you that T2 is about a trillion times more likely to be true than T1!''

This does indeed serve as an intuition against SIA, but it only takes a minor modification to produce another equally plausible thought experiment that serves equally strongly as an intuition against SSA.3 Taken together, the two thought experiments should not push us in any particular direction as to which of the two principles is preferable. Here goes:
    Another presumptuous philosopher

    It is the year 2100 and humanity has succeeded in the twin feats of (a) establishing that we are alone in the universe, and (b) once and for all solving xrisk, so that there is no longer any risk of premature extinction of humanity: our civilization will persist until the end of time. Physicists are on the verge of accomplishing a third feat, namely (c) finding the true and final theory of everything. They have narrowed down the search to only two remaining plausible candidate theories: T1 and T2 (using considerations from super-duper symmetry). According to T1 the world will last for a very, very long but finite amount of time, and there will be a total of a trillion trillion observers in the cosmos. According to T2, the world will last for a very, very, very long but finite amount of time, and there will be a trillion trillion trillion observers. The super-duper symmetry considerations are indifferent between these two theories. Physicists are preparing a simple experiment that will falsify one of the theories. Enter the SSA-adhering presumptuous philosopher: ''Hey guys, it is completely unnecessary for you to do the experiment, because I can already show you that T1 is about a trillion times more likely to be true than T2!''

Note that the SSA-adhering presumptuous philosopher's reasoning is exactly the same as in Brandon Carter's Doomsday Argument; I have merely changed the narrative details a bit in order to emphasize the similarity with Bostrom's presumptuous philosopher. Note also that if the two philosophers (the one favoring SIA in Bostrom's example, and the one favoring SSA in mine) trade places, then neither of them will be inclined to suggest any anthropic modification at all of the physicists' credence in T1 and T2.
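
To spell out the symmetry numerically, here is a toy calculation (the observer counts and the 1:1 prior odds are those stipulated in the two thought experiments; the framing as a likelihood-ratio calculation is mine):

    # SIA in Bostrom's example vs SSA in the modified example, with prior odds T1:T2 = 1:1.
    N1, N2 = 10**24, 10**36   # observer counts under T1 and T2

    # SIA weights each theory by its observer count, so the posterior odds favor T2 by N2/N1.
    print(f"SIA favors T2 over T1 by a factor {N2 / N1:.0e}")              # 1e+12, about a trillion

    # SSA (Doomsday-style): the probability of having your particular birth rank, as a
    # uniformly random observer, is 1/N under a theory with N observers, so the
    # likelihood ratio favors T1 by the same factor N2/N1.
    print(f"SSA favors T1 over T2 by a factor {(1 / N1) / (1 / N2):.0e}")  # 1e+12 again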

In my opinion, the two thought experiments taken side by side serve to illustrate that we are still confused about how to do anthropic reasoning. Both SIA and SSA produce appallingly presumptuous inferences. Is there some third possibility that avoids all of that? Maybe yes, but I suspect probably no, and that we will need to bite the bullet somewhere.

Footnotes

1) The names are terrible but have stuck.

2) I am here glossing over what ''random'' means (typically a uniform distribution, either over observers or so-called observer-moments), and even more so the meaning of ''relevant'', but the reader can rest assured that both Bostrom and 20 years of commentators treat these issues at length.

3) Olum (2002) suggests, for the same purpose, a different modification of Bostrom's original thought experiment. Here, super-duper symmetry comes with a vaguely Occam-like principle where, a priori, a theory of everything has probability inversely proportional to the size of the resulting universe. Bostrom and Cirkovic (2003), however, dismiss the example as far-fetched. I am not sure I find that dismissal convincing, but be that as it may, I still hope the modification I propose here is more to their taste.

Wednesday, September 7, 2022

New preprint on the Hinge of History

My latest preprint, entitled The Hinge of History and the Choice between Patient and Urgent Longtermism, is out now. Some terminological explanation and further context:
  • The Hinge of History means, roughly, the most important time in all of human history (past, present and future), in which we either get our act together or do something really bad that destroys all or most of future value. This can be made more precise in various ways discussed in the preprint. An increasingly popular idea is that the Hinge of History is now. Will MacAskill pushes back against this idea in a recent paper called Are we Living at the Hinge of History?, and in my preprint I push back against his pushback.
  • Longtermism, in the words of MacAskill in his recent book What We Owe the Future, is ''the idea that positively influencing the longterm future is a key moral priority of our time''.
  • Talk of urgent vs patient longtermism refers to whether this positive influence is best achieved via concrete object-level action now or via saving resources for such action at later times.
  • The media attention around What We Owe the Future has been stupendous, not only in intellectually oriented and/or effective altruism-adjacent outlets such as Astral Codex Ten and podcasts by Tyler Cowen, Sean Carroll and Sam Harris, but also in mainstream media such as The New Yorker, Time Magazine, The Guardian and the BBC. I share the widespread enthusiasm for the book, and intend soon to help fix the relative shortage of reviews in Swedish.
  • It has been pointed out that MacAskill sometimes defends far-reaching positions in academic papers and backs down to more moderate stances in his book, an example being how in a paper with Hilary Greaves he lays out the case for strong longtermism, defined as the view that ''far-future effects are the most important determinant of the value of our options [today]'', but is content in the book with the somewhat more watered-down longtermism defined as above. Another example of this is his defense of patient longtermism in Are we Living at the Hinge of History?, which is toned down in What We Owe the Future almost to the point of pressing the mute button. One may raise an eyebrow at this inconsistency, but in my opinion it is perfectly reasonable to explore principled positions in theoretical discussions taking place in seminar rooms and the academic literature, while choosing not to defend them in broader contexts.
Click here to find my new preprint!

Wednesday, August 24, 2022

I will be appearing in Oslo on September 14 and 15

Attention, all friends in the Oslo area! In the middle of next month I will be giving two talks that you can attend free of charge: On both occasions I intend to leave plenty of room for questions and audience discussion.

Monday, August 1, 2022

Mixed feelings about two publications on probability and statistics

The purpose of the present blog post is to report on my mixed feelings about two publications on probability and statistical inference that have come to my attention. In both cases, I concur with the ultimate message conveyed, but object to a probability calculation made along the path towards that ultimate message.

Also, in both cases I am a bit late to the party. My somewhat lame excuse for this is that I cannot react to a publication before learning about its existence. In any case, here are my reactions:

*

The first case is the 2019 children's picture book Bayesian Probability for Babies by Chris Ferrie and Sarah Kaiser, a read-aloud version of which is available on YouTube. The reading of the entire book takes exactly 2:00 minutes:

As you can see, the idea of the book is to explain Bayes' Theorem through a simple and fully worked-out example. In the example, the situation at hand is that the hero of the story has taken a bite of a cookie which may or may not contain candies, and the task is to work out the posterior probability that the cookie contains candies given the data that the bite has no candies. This is just lovely...

...were it not for the following defect, which shows up halfway through the video (at the 1:00 mark). Before introducing the prior (the proportion of cookies in the jar having candies), expressions for P(D|C) and P(D|N) are presented, where D is the data from the bite, C is the event that the cookie has candies, and N is the event that it does not: we learn that P(D|C)=1/3 and P(D|N)=1, the authors note that the latter expression is larger, and conclude that "the no-candy bite probably came from a no-candy cookie!".

This conclusion is plain wrong, because it commits a version of the fallacy of the transposed conditional: a comparison between P(D|C) and P(D|N) is confused with one between P(C|D) and P(N|D). In fact, no probability that the cookie at hand has candies can be obtained before the prior has been laid out. The asked-for probability P(C|D) can, depending on the prior, land anywhere in the interval [0,1]. A more generous reader than I might object that the authors immediately afterwards do introduce a prior, which is sufficiently biased towards cookies containing candies to reverse the initial judgement and arrive at the (perhaps counterintuitive) result that the cookie is more likely than not to contain candies: P(C|D)=3/4. This is to some extent a mitigating circumstance, but I am not impressed, because the preliminary claim that "the no-candy bite probably came from a no-candy cookie" sends the implicit message that as long as no prior has been specified, it is OK to proceed as if the prior is uniform, i.e., puts equal probability on the two possible states C and N. But in the absence of specific arguments (perhaps something based on symmetry), it simply isn't. As I emphasized back in 2007, a uniform distribution is a model assumption, and there is no end to the crazy conclusions one risks ending up with if one fails to realize this.
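
To make the prior-dependence explicit, here is a minimal sketch (the likelihoods 1/3 and 1 are the book's; the priors tried below are my own illustrative choices, except that 0.9 is the one consistent with the book's stated answer of 3/4):

    # Posterior probability that the cookie has candies, given a candy-free bite,
    # as a function of the prior proportion of candy cookies in the jar.
    def posterior_candies(prior_c, p_d_given_c=1/3, p_d_given_n=1.0):
        p_d = p_d_given_c * prior_c + p_d_given_n * (1 - prior_c)
        return p_d_given_c * prior_c / p_d

    print(posterior_candies(0.5))   # uniform prior: 0.25, so "probably no candies"
    print(posterior_candies(0.9))   # candy-rich jar: 0.75, the book's answer
    print(posterior_candies(0.1))   # candy-poor jar: about 0.036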

*

The second case is the 2014 British Medical Journal article Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data by John Wood, Nick Freemantle, Michael King and Irwin Nazareth. The purpose of the paper is to warn against overly strong interpretations of moderately small p-values in statistical hypothesis testing. This is a mission I wholeheartedly agree with. In particular, the authors complain about other authors' habit, when, say, a significance level of 0.05 is employed but a disappointingly lame p-value of 0.08 is obtained, of describing it as "trending towards significance" or some similar expression. This, Wood et al remark, gives the misleading suggestion that if only the sample size had been bigger, significance would have been attained - misleading because there is far from any guarantee that this would happen as a consequence of a larger sample size. This is all fine and dandy...

...were it not for the probability calculations that Wood et al provide to support their claim. The setting discussed is the standard case of testing for a difference between two groups, and the article offers a bunch of tables where we can read, e.g., that if a p-value of 0.08 is obtained, and 50% more data is added to the sample, then there is still a 39.7% chance that the outcome remains non-significant (p>0.05). The problem here is that such probabilities cannot be calculated, because they depend on the true but unknown effect size. If the true effect size is large, then the p-value is likely to improve with the increased sample size (and in fact it would with probability 1 approach 0 as the sample size goes to infinity), whereas if the effect size is 0, then we should expect the p-value to regress towards the mean (and it would with probability 1 keep fluctuating forever over the entire interval [0,1] as the sample size goes to infinity).
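
To illustrate this dependence, here is a sketch in a stylized z-test setting of my own (not the exact two-group setup of Wood et al): condition on an observed z-value of about 1.75 (two-sided p = 0.08), add 50% more data, and compute the probability of remaining non-significant for a few values of the true standardized effect.

    # How the probability of remaining non-significant after adding 50% more data
    # depends on the unknown true effect. Model: the z-statistic based on the original
    # sample is N(theta, 1), where theta is the true "drift" at that sample size; the
    # z-statistic of the extra half-sample alone is then N(theta*sqrt(1/2), 1), and the
    # combined statistic is z_total = sqrt(2/3)*z1 + sqrt(1/3)*z_add.
    import numpy as np
    from scipy.stats import norm

    z1 = norm.ppf(1 - 0.08 / 2)        # observed z for two-sided p = 0.08, about 1.75
    z_crit = norm.ppf(1 - 0.05 / 2)    # 1.96, the two-sided 5% cutoff

    def prob_remain_nonsignificant(theta):
        mean = np.sqrt(2 / 3) * z1 + np.sqrt(1 / 3) * theta * np.sqrt(1 / 2)
        sd = np.sqrt(1 / 3)
        return norm.cdf(z_crit, mean, sd) - norm.cdf(-z_crit, mean, sd)

    for theta in [0.0, 1.75, 4.0]:
        print(f"true drift {theta:4.2f}: P(still non-significant) = {prob_remain_nonsignificant(theta):.3f}")
    # Roughly 0.82, 0.37 and 0.03, respectively: the answer hinges on the unknown effect.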

At this point, the alert reader may ask: can't we just assume the effect size to be random, and calculate the desired probabilities as the corresponding weighted average over the possible effect sizes? In fact, that is what Wood et al do, but they barely mention it in the article, and hide away the specification of that prior distribution in an appendix that only a minority of their readers can be expected to ever lay their eyes on.

What is an appropriate prior on the effect size? That is very much context-dependent. If the statistical study is in, say, the field of parapsychology, which has tried without success to demonstrate nonzero effects for a century or so, then a reasonable prior would put a point mass of 0.99 or more at effect size zero, with the remaining probability spread out near zero. If on the other hand (and to take another extreme) the study is about test subjects shown pictures of red or blue cars and asked to determine their color, and the purpose of the study is to find out whether the car being red increases the probability of the subject answering "red" compared to if the car is blue, then the reasonable thing to do is obviously to put most of the prior on large effect sizes.

None of this context-dependence is discussed by the authors. This omission serves to create the erroneous impression that if a study has sample size 100 and produces a p-value of 0.08, then the probability that the outcome remains non-significant if the sample size is increased to 150 can unproblematically be calculated to be 0.397.

So what is the prior used by Wood et al in their calculations? When we turn to the appendix, we actually find out. They use a kind of improper prior that treats all possible effect sizes equally, at the cost of the total "probability" mass being infinity rather than 1 (as it should be for a proper probability distribution); mathematically this works neatly because a proper probability distribution is obtained as soon as one conditions on some data, but it creates problems in coherently interpreting the resulting numbers, such as the 0.397 above, as actual probabilities. This is not among my two main problems with the prior, however. One of those I have already mentioned: the authors' failure to show that this particular choice of prior leads to relevant probabilities in practice, and their sweeping under the carpet of the very fact that there is a choice to be made. The other main problem is that with this prior, the probability of zero effect is exactly zero. In other words, their choice of prior amounts to dogmatically assuming that the effect size is nonzero (and thereby that the p-value will tend to 0 as the sample size increases towards infinity). For a study of what happens in other studies meant to shed light on whether or not a nonzero effect exists, this particular model assumption strikes me as highly unsuitable.
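
For what it is worth, the 39.7% figure does seem reproducible once the improper flat prior is taken on board. Here is a sketch under the same stylized z-test reconstruction as above (my own reconstruction, not code from Wood et al): with a flat prior on the effect, the posterior for the drift given the observed z-value is a unit-variance normal centered at that value, and averaging over it gives the table entry.

    import numpy as np
    from scipy.stats import norm

    z1 = norm.ppf(1 - 0.08 / 2)        # two-sided p = 0.08, so z1 is about 1.75
    z_crit = norm.ppf(1 - 0.05 / 2)    # 1.96

    # z_total = sqrt(2/3)*z1 + sqrt(1/3)*z_add, with z_add | theta ~ N(theta*sqrt(1/2), 1)
    # and, under the flat prior, theta | z1 ~ N(z1, 1). Marginally z_total is normal:
    mean = (np.sqrt(2 / 3) + np.sqrt(1 / 6)) * z1
    sd = np.sqrt(1 / 3 + 1 / 6)        # sampling noise plus posterior spread of theta
    p_nonsig = norm.cdf(z_crit, mean, sd) - norm.cdf(-z_crit, mean, sd)
    print(f"Flat prior:  P(still non-significant) = {p_nonsig:.3f}")       # about 0.397

    # A dogmatic prior putting all mass at zero effect gives a very different answer:
    mean0, sd0 = np.sqrt(2 / 3) * z1, np.sqrt(1 / 3)
    p_nonsig0 = norm.cdf(z_crit, mean0, sd0) - norm.cdf(-z_crit, mean0, sd0)
    print(f"Zero effect: P(still non-significant) = {p_nonsig0:.3f}")      # about 0.82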

Saturday, July 30, 2022

What are the chances of that?

Some months ago I was asked by the journal The Mathematical Intelligencer to review a recent popular science introduction to probability: Andrew Elliot's What are the Chances of that? (Oxford University Press, 2021). My review has now been published, and here is how it begins:
    A stranger approaches you in a bar and offers a game. This situation recurs again and again in Andrew Elliot’s book What are the Chances of that? How to Think About Uncertainty, which attempts to explain probability and uncertainty to a broad audience. In one of the instances of the bar scene, the stranger asks how many pennies you have in your wallet. You have five, and the stranger goes on to explain the rules of the game. On each round, three fair dice are rolled, and if a total of 10 comes up you win a penny from the stranger, whereas if the total is 9 he wins a penny from you, while all other sums lead to no transaction. You then move on to the next round, and so on until one of you is out of pennies. Should you accept to play? To analyze the game, we first need to understand what happens in a single round. Of the 6^3 = 216 equiprobable outcomes of the three dice, 25 of them result in a total of 9 while 27 of them result in a total of 10, so your expected gain from each round is (27-25)/216 ≈ 0.009 pennies. But what does this mean for the game as a whole? More on this later.

    The point of discussing games based on dice, coin tosses, roulette wheels and cards when introducing elementary probability is not that hazard games are a particularly important application of probability, but rather that they form an especially clean laboratory in which to perform calculations: we can quickly agree on the model assumptions on which to build the calculations. In coin tossing, for instance, the obvious approach is to work with a model where each coin toss, independently of all previous ones, comes up heads with probability 1/2 and tails with probability 1/2. This is not to say that the assumptions are literally true in real life (no coin is perfectly symmetric, and no croupier knows how to pick up the coin in a way that entirely erases the memory of earlier tosses), but they are sufficiently close to being true that it makes sense to use them as starting points for probability calculations.

    The downside of such focus on hazard games is that it can give the misleading impression that mathematical modelling is easy and straightforward – misleading because in the messy real world such modelling is not so easy. This is why most professors, including myself, who teach a first (or even a second or a third) course on probability like to alternate between simple examples from the realm of games and more complicated real-world examples involving plane crashes, life expectancy tables, insurance policies, stock markets, clinical trials, traffic jams and the coincidence of encountering an old high school friend during your holiday in Greece. The modelling of these kinds of real-world phenomena is nearly always a more delicate matter than the subsequent step of doing the actual calculations.

    Elliot, in his book, alternates similarly between games and the real world. I think this is a good choice, and the right way to teach probability and stochastic modelling regardless of...

Click here to read the full review!
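
As a small aside (not part of the review), the single-round arithmetic in the excerpt above is easy to verify by brute-force enumeration:

    # Count three-dice totals of 9 and 10, and compute the expected gain per round.
    from itertools import product

    totals = [sum(dice) for dice in product(range(1, 7), repeat=3)]
    n9, n10 = totals.count(9), totals.count(10)
    print(n9, n10, len(totals))           # 25 27 216
    print((n10 - n9) / len(totals))       # expected gain per round, about 0.009 pennies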

Monday, July 18, 2022

On systemic risk

For the latest issue of ICIAM Dianoia - the newsletter published by the International Council for Industrial and Applied Mathematics - which was released last week, I was invited to offer my reflections on a recent document named Briefing Note on Systemic Risk. The resulting text can be found here, and is reproduced below for the convenience of readers of this blog.

* * *

Brief notes on a Briefing Note

I have been asked to comment on the Briefing Note on Systemic Risk, a 36-page document recently released jointly by the International Science Council, the UN Office for Disaster Risk Reduction, and an interdisciplinary network of decision makers and experts on disaster risk reduction that goes under the acronym RISKKAN. The importance of the document lies not so much in the concrete subject-matter knowledge (of which in fact there is rather little) that an interested reader can take away from it, but more in how it serves as a commitment from the three organizations to take the various challenges associated with systemic risk seriously, and to work on our collective ability to overcome these challenges and to reduce the risks.

So what is systemic risk? A first attempt at a definition could involve requiring a system consisting of multiple components, and a risk that cannot be understood in terms of any single such component, but which involves several of them (perhaps the entire system) and arises not just from their individual behavior but from their interactions. But more can be said, and an appendix to the Briefing Note lists definitions offered by 22 different organizations and groups of authors, including the OECD, the International Monetary Fund and the World Economic Forum. Recurrent concepts in these definitions include complexity, shocks, cascades, ripple effects, interconnectedness and non-linearity. The practical approach here is probably to give up the hope for a clear set of necessary and sufficient conditions for what constitutes a systemic risk, and to accept that the concept has somewhat fuzzy edges.

A central theme in the Briefing Note is the need for good data. A system with many components will typically also have many parameters, and in order to understand it well enough to grasp its systemic risks we need to estimate its parameters. Without good data that cannot be done. A good example is the situation the world faced in early 2020 as regards the COVID pandemic. We were very much in the dark about key parameters such as R0 (the basic reproduction number) and the IFR (infection fatality rate), which are properties not merely of the virus itself, but also of the human population that it preys upon, our social contact pattern, our societal infrastructures, and so on – in short, they are system parameters. In order to get a grip on these parameters it would have been instrumental to know the infection’s prevalence in the population and how that quantity developed over time, but the kind of data we had was so blatantly unrepresentative of the population that experts’ guesstimates differed by an order of magnitude or sometimes even more. A key lesson to be remembered for the next pandemic is the need to start sampling individuals at random from the population to test for infection as early as possible.
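
As a minimal illustration of how far even a modest random sample goes (the numbers below are hypothetical, not taken from the Briefing Note), here is a sketch of the kind of prevalence estimate such sampling would have provided:

    # 95% confidence interval for infection prevalence from n random tests with k positives
    # (normal approximation; for small counts an exact or Wilson interval would be preferable).
    import math

    def prevalence_ci(k, n, z=1.96):
        p_hat = k / n
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
        return p_hat, max(0.0, p_hat - half_width), p_hat + half_width

    print(prevalence_ci(23, 1000))   # e.g. 23 positives out of 1000: roughly 1.4% to 3.2%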

Besides parameter estimation within a model of the system, it is of course also important to realize that the model is necessarily incomplete, and that systemic risk can arise from features not captured by it. At the very least, this requires a well-calibrated level of epistemic humility and an awareness of the imprudence of treating a risk as nonexistent just because we are unable to get a firm handle on it.

Early on in the Briefing Note, it is emphasized that while studies of systemic risk have tended to focus on “global and catastrophic or even existential risks”, the phenomenon appears “at all possible scales – global, national, regional and local”. While this is true, it is also true that it is systemic risk at the larger scales that carries the greatest threat to society and arguably is the most crucial to address. An important cutoff is when the amounts at stake become so large that the risk cannot be covered by insurance companies, and another one is when the very survival of humanity is threatened. As to the latter kind of risk, the recent monograph by philosopher Toby Ord gives the best available overview and includes a chapter on the so-called risk landscape, i.e., how the risks interact in systemic ways.

Besides epidemics, the concrete examples that feature most prominently in the Briefing Note are climate change and financial crises. These are well-chosen, due both to the urgency of addressing them and to their many features typical of systemic risk. Still, there are other examples whose absence from the report constitutes a rather serious flaw. One is AI risk, which is judged by Ord (correctly, in my view) to constitute the greatest existential risk of all to humanity in the coming century. A more abstract but nonetheless important one is the risk of human civilization ending up more or less irreversibly in the kind of fixed point – somewhat analogous to mutual defection in the prisoners' dilemma game but typically much more complex and pernicious – that Scott Alexander calls Moloch and that Eliezer Yudkowsky, more prosaically, calls inadequate equilibria.