Häggström hävdar: Artificial intelligence and Solomonoff induction: what to read?

tisdag 20 augusti 2013

Artificial intelligence and Solomonoff induction: what to read?

I am deeply concerned about the future of humanity.¹ There are several factors that may have a huge impact (for good and/or for bad) but are mostly overlooked in mainstream public discourse, and one of my ambitions is to try to understand these, as best as I can - with the ultimate goal of suggesting courses of action that might improve the odds of a favorable future. One of these factors is the possibility of a breakthrough in AGI - Artificial General Intelligence.² Hence it is important to me to try to stay informed about major developments in AGI (notwithstanding the extreme difficulty even for experts to forecast in this area).

One concept that is sometimes claimed to be of central importance in contemporary AGI research is the so-called AIXI formalism. I came across a recent presentation named Uncertainty & Induction in AGI where AIXI's inventor Marcus Hutter presents its basic ingredients, which, very briefly, are as follows:

An AIXI agent is (like all of us) situated in an environment which is partly unknown to it, and wishes to act in such a way as to maximize some pre-defined utility. Based on what it observes it updates its beliefs about the environment, and unlike most of us, it does so using Bayesian updating. Before it has made any observations, the agents starts from a particular prior distribution known as Solomonoff's universal prior, defined by assigning to every possible environment H probability 2^-K(H), where K(H) is the Kolmogorov complexity of H, which in turned is defined as the length of the shortest program that produces H on a given universal Turing machine. The AIXI also acts, and it does so based by plugging its utility function and its Bayesian beliefs about the environment into the Bellman equations of standard decision theory, in order to maximize its futire utility (with some finite time horizon).

In his presentation, Hutter advices, in fairly strong terms, AGI researchers to take AIXI seriously. It provides, he says, "the best conceptual solutions of the induction/AGI problem so far". The approach he advocates outperforms all earlier attempts at theories of uncertainty and induction, including "Popper’s denial of induction, frequentist statistics, much of statistical learning theory, subjective Bayesianism, Carnap’s confirmation theory, the data paradigm, eliminative induction, pluralism, and deductive and other approaches". Moreover, "AIXI is the most intelligent environmental independent, i.e. universally optimal, agent possible". If nothing else, this last word "possible" is a bit of an exaggeration, because AIXI is in fact not possible to implement, as Kolmogorov complexity is not computable). Hutter's tone then moves on from immodest to outright condescending, when he writes that "cranks who have not understood the giants and try to reinvent the wheel from scratch can safely be ignored".³

Concluding from Hutter's boasting tone that Solomonoff induction and AIXI are of no interest would, of course, be a non sequitur. But Uncertainty & Induction in AGI is just a PowerPoint presentation and can only give so much detail, so I felt I needed to look further. In the presentation, Hutter advices us to consult his book Universal Artificial Intelligence. Before embarking on that, however, I decided to try one of the two papers that he also directs us to in the presentation, namely his A philosophical treatise of universal induction, coauthored with Samuel Rathmanner and published in the journal Entropy in 2011. After reading the paper, I have moved the reading of Hutter's book far down my list of priorities, because gerneralizing from the paper leads me to suspect that the book is not so good.

I find the paper bad. There is nothing wrong with the ambition - to sketch various approaches to induction from Epicurus and onwards, and to try to argue how it all culminates in the concept of Solomonoff induction. There is much to agree with in the paper, such as the untenability of relying on uniform priors and the limited interest of the so-called No Free Lunch Theorems (points I've actually made myself in a different setting). The authors' emphasis on the difficulty of defending induction without resorting to circularity (see the well-known anti-induction joke for a drastic illustration) is laudable. And it's a nice perspective to view Solomonoff's prior as a kind of compromise between Epicurus and Ockham, but does this particular point need to be made in quite so many words? Judging from the style of the paper, the word "philosophical" in the title seems to mean something like "characterized by lack of rigor and general verbosity".⁴ Here are some examples of my more specific complaints:

The definition of Kolmogorov complexity depends on the choice of universal Turing machine, of which there are infinitely many. The same ambiguity is therefore true of Solomonoff induction, and the authors admit this. So far, so good. But how, then, should we choose which particular machine to use?
From p 1111 and onwards, the authors talk about choosing a machine that is "unbiased", i.e. it doesn't favor certain objects blatantly more than other universal Turing machines do. It is not at all clear (at least not to the present reader) that such a notion of unbiasedness can be made sense of. On p 1112, the vague notion of natural universal Turing machines is introduced, meaning machines that do not exhibit such biases, and the authors offers this:
This does not make any progress towards solving the ambiguity problem, due to the obvious circularity: We should choose a machine that is unbiased in comparison to a "universally agreed upon reference machine", but how should that machine be chosen? Perhaps it should be chosen in such a way as to be unbiased in comparison to some absolutely honest-to-God universally agreed upon reference machine? And so on.⁵
Solomonoff induction involves the model assumption that the environment is computable. On p 1118, the authors address this assumption, and we find the following passage:
All right, thank you for the "not quite mathematically correct", but I still think this is downplaying the problem in a misleading way. It's not the case that von Neumann's statement could be turned into something similar in spirit but correct - the problem is that the statement is downright false. It would have been easy for the authors at this point to provide a counterexample by sketching Turing's diagonalization argument for the halting problem.
In the same paragraph, they say that "according to the Church-Turing thesis, the class of all computable measures includes essentially any conceivable natural environment". Here it would have been suitable for the authors to modestly admit that the procedure they advocate (Solomonoff prediction) is in fact not computable and therefore impossible in "any conceivable natural environment". (Such a discussion comes much later in the paper.)
Of course it is OK and meaningful to discuss, as most of this paper does, the hypothetical scenario of having a machine that performs Solomonoff induction. But to insist, in this scenario, that the environment must be computable, is to introduce an outright contradiction and thus to enter the realm of the meaningless.
The black raven paradox is a major issue that any theory of induction ought to address. To accept induction is to accept that the observation of a black raven will (usually) strengthen our belief in the hypothesis H that all ravens are black. Note, however, that "black" and "raven" are just arbitrarily chosen examples of properties here, and might as well be replaced by, e.g., "non-raven" and "non-black", so that accepting induction equally well entails accepting that the observation of a non-black non-raven will (usually) strengthen our belief in the hypothesis H^* that all non-black items are non-ravens. But H^* and H are logically equivalent, and surely logically equivalent hypoteses must be supported by the same evidence, so we seem to be forced to accept that the observation of a non-black non-raven (such as a yellow banana) will (usually) strengthen our belief in the hypothesis H that all ravens are black. But this seems paradoxical: most people will intuitively resist the idea that the observation of a yellow banana should have any evidential impact on the credibility of a hypothesis concerning the color of ravens.
Rathmanner and Hutter suggest that Solomonoff induction makes progress on the black raven paradox, but their case is very weak, for the following reasons:
- Nowhere in their discussion about Solomonoff induction and the black raven paradox on p 1122-1124 of their paper is there any hint of any asymmetry in how Solomonoff induction would treat black ravens versus non-black non-ravens as regards the the all-ravens-are-black hypothesis. No such hint means no argument that Solomonoff induction helps to solve the paradox.
- The highlight, supposedly, of the authors' argument on Solomonoff induction and the black raven paradox is the displayed formula near the top of p 1124, which states that when observing objects, one after another, the conditional probability of never ever seeing a black raven given what has been seen so far after n observations tends to 1 as n→∞ almost surely on the event that no black raven is ever seen. This is a nice property, but has little or nothing to do with Solomonoff induction, because the same is true for any Bayesian model incorporating such an infinite sequence of observations.⁶
- There is a difference between the statements "all ravens are black" and "all ravens ever appearing in our sequence of observations are black". The statement we're originally interested in is the former, whereas the aforementioned highlight on p 1124 is about the latter. The preceding displayed formula, near the bottom of p 1123, bridges this difference. That formula, however, is not about Solomonoff induction but only about a certain simple class of i.i.d. mixtures.
To summarize, there is nothing in the paper to back up its penultimate sentence (on p 1134) saying that that the authors have shown "how Solomonoff induction deals with the black raven paradox and argued that it should give the desired result".
A bit further down on p 1124, in Section 8.1, the authors suggest that a certain property of Solomonoff's univeral prior M implies "fast convergence in practice". I resent the term "in practice" here, because there is no practice: Solomonoff's prior is not computable. (The paper is full of similar slips.)
In the introductory section, on p 1078, we learn that Solomonoff induction means that "it can be argued that the problem of formalizing optimal inductive inference is solved", and from then on, we hear about this optimality again and again. But I have trouble understanding the concept of e procedure that is both optimal and (if we accept the aforementioned Church-Turing thesis) impossible. Here's another induction procedure which is equally impossible, and arguably even more optimal: whenever we wish to know something about the world, ask an infallible oracle!
Presumably, what the authors mean by Solomonoff induction being "optimal" is that no real-world system can perform better at the given task. But there are many such "optimal" levels, as my oracle example shows, so a better term than "optimal" would be "upper bound on performance". The upper bound on performance provided by Solomonoff induction is obviously tighter than mine, and therefore better. But how tight is it, compared to what is actually achievable? Is there room for better bounds? There'd better not be, in order for the authors' statement about "the problem of formalizing optimal inductive inference [being] solved" not to look silly, but on this issue they are silent.
In Section 10.4 entitled "Conclusions", the authors claim on p 1134 that Solomonoff induction "gives us new insights into the mechanics of our own thinking". Well, isn't that nice? Even nicer would have been if the paper had contained at least one example of such an insight that can plausibly be attributed to Solomonoff induction.

I still consider it plausible to think that Kolmogorov complexity and Solomonoff induction are relevant to AGI⁷ (as well as to statistical inference and the theory of science), but the experience of reading Uncertainty & Induction in AGI and A philosophical treatise of universal induction strongly suggests that Hutter's writings are not the place for me to go in order to learn more about this. But where, then? Can the readers of this blog offer any advice?

Footnotes

1) For the benefit of Samuel Rathmanner and Marcus Hutter, and to give them a chance to comment, I write this blog post in English.

2) What such a breakthrough might entail is extremely difficult to predict, but I have discussed various scenarios en earlier blog posts, e.g., here, here and here.

3) By "giants", he means "Bayes, Shannon, Turing, Kolmogorov, Solomonoff, Wallace, Rissanen [and] Bellman".

4) I have great respect for the subject of philosophy. Naturally, then, I disapprove of Rathmanner's and Hutter's useage of the term "philosophical".

5) There is a structural similarity between, on one hand, this failure to make progress by introducing yet another reference machine, and, on the other hand, the postulation that everything that exists must be created and thus have a creator, and pretending to solve the problem by introducing God Almighty. (Who created God Almighty?)

6) One interesting strengthening concering Solomonoff induction would be if the "almost surely" qualifier might be dropped in the asymptotic result considered here. This would be less trivial, because the corresponding statement in the generality of Bayesian models incorporating an infinite sequence of observations would be false.

7) I am not a computer scientist, so the following should perhaps be taken with a grain of salt. While I do think that computability and concepts derived from it such as Kolmogorov complexity may be relevant to AGI, I have the feeling that the somewhat more down-to-earth issue of computability in polynomial time is even more likely to be of crucial importance.

23 kommentarer:

David Sumpter20 augusti 2013 kl. 21:32
I think it is highly improbable that AGI will arise from anything based on some theoretical results from information theory. The basic reason for me saying this is empirical. Most machine learning researchers use information theory and Bayesian updating in the algorithms they develop, yet they can't get even the slightest bit of intelligence out of a computer. They are just simple search algorithms. It seems daft then to investigate the formal basis of the reasoning with a hope of getting to the bottom of the key components of what artificial intelligence looks like.

I heard Josh Tennenbaum (http://web.mit.edu/cocosci/josh.html) give an excellent talk recently where first he explained what they had achieved in showing how people reason using Bayesian updating about the likilhood which of their various models of the world. He then spent the second half of his talk saying why this answered only a small set of problems. His latest research is looking at whether the brain runs 'physics simulators' to make predictions about future states. This is in my opinion is pushing the limits of what can be done and is where you should start. Start with the actual science.

I appreciate the honesty in your post. But I feel forced to point out that you are picking soft targets. Clearly this guy was some sort of crackpot, and in a way it is bad taste to attack him.
SvaraRadera
Svar
Monica Anderson21 augusti 2013 kl. 02:31
You might like the stuff at http://syntience.com/links
and especially http://syntience.com/rch.pdf that provides a more bioplausible view of a dual process cognitive system that focuses on Reduction rather than Induction. Induction is "too logical" to be bioplausible.

The video "A New Direction in AI Research" contains the main points but later results and more domain-specific examples are available in the other videos.
SvaraRadera
Svar
Anonym21 augusti 2013 kl. 12:28
Praktiska algoritmer inspirerade av Solomonoff-induktion:
http://arxiv.org/abs/0909.0801
http://arxiv.org/abs/1007.2049

/Erik
SvaraRadera
Svar
Olle Häggström22 augusti 2013 kl. 08:53
I notified Marcus Hutter about my blog post, and received the following response. I find it thoroughly unsatisfactory:

"Hi Olle,

Thanks for your interest in my work. In general I don't have time to engage in blog discussions, especially if they are non-constructive. I am happy to leave such discussions to others. Just two brief comments:

(1) If you don't like the verbose style of [RH11], you should consult my book and/or technical papers at http://www.hutter1.net/official/bib.htm

(2) "Hutter's writings are not the place for me to go in order to learn more about this. But where, then?" Your question implies that [RH11] contains "the best conceptual solutions of the induction/AGI problem" you are aware of. In case you find or develop something better, i.e. which better withstands comparable scrutiny, I would be interested to hear about it.

Best regards,

Marcus"

Hutter begins with declaring that he declines to participate in scientific discussions unless they take place in traditional formats (i.e., not blogs). For someone who primarily values scientific progress (as opposed to superficial things like career-boosting) such format issues really ought to be secondary compared to the actual scientific content.

He then criticizes me for being "non-constructive". But when Hutter claims to make progress on problems like the choice of reference machine or the black raven paradox, and I point out the hollowness of his claims, it's more than a little disingenious to shrug off my criticism by noting that I haven't provided alternative approaches to these deep and difficult unsolved problems.

In his comment (1), Hutter pretends that my main criticism concerns style rather than content.

In his comment (2), he draws a conclusion about what my question "...to learn more about this. But where, then?" implies. This conclusion is an obvious non sequitur, even if we should ignore that he pretends that my word "this" refers to the general "induction/AGI problem" rather than (as is entirely clear from my blog post's final paragraph, from which he quoted my question) the specific approach of using Solomonoff induction.
SvaraRadera
Svar
Anonym22 augusti 2013 kl. 11:49
Angående svarta korpar:

* Rathmanner & Hutter diskuterar detta exempel på flera ställen i sin artikel. Speciellt avsnittet omkring s. 1106 där de diskuterar Mahers text ger intrycket att problemet de främst menar sig lösa är att på ett naturligt sätt ge hypoteser som "alla korpar är svarta" en ändlig och icke-försumbar sannolikhet. Om problemformuleringen är sådan att antalet korpar ska betraktas som oändligt, så ger de mest direkta angreppssätten att hypotesen "alla korpar är svarta" antingen alltid har sannolikhet 0 eller särbehandlas från andra hypoteser.

* Bland bayesianer är den vanligasta uppfattningen att en icke-svart icke-korp faktiskt ger pyttelite empiriskt stöd till hypotesen att alla korpar är svarta. Med Rathmanner & Hutter egna ord är asymmetrin mellan observation av en svart korp resp. en icke-svart icke-korp:

"The solution to this problem lies in the relative degree of confirmation. The above result only states that the belief in the hypothesis must increase after observing either a black raven or a white sock, it says nothing about the size of this increase. If the size of the increase is inversely proportional to the proportion of this object type in the relevant object population then the result can become quite consistent and intuitive." (s. 1109)

"This also solves the problem of our intuition regarding black ravens. Black ravens make up a vanishingly small proportion off all possible objects, so the observation of a black raven gives an enormously greater degree of confirmation to “all ravens are black” than a non-black non-raven. So much so that the observation of a non-black non-raven has a negligible affect on our belief in the statement.
Unfortunately no formal inductive system has been shown to formally give this desired result so far. It is believed that Solomonoff Induction may be able to achieve this result but is has not been shown rigorously. Later we will argue the case for Solomonoff induction." (s. 1109)

* Ditt påstående "Nowhere in their discussion about Solomonoff induction and the black raven paradox on p 1122-1124 of their paper is there any hint of any asymmetry..." är falskt. Det finns visst antydningar om asymmetri. Men framför allt så skriver de mycket explicit om detta på andra ställen i artikeln. I kommentaren ovan skriver du att Hutters e-mail svar är "more than a little disingenious". Om något så ser ditt selektiva refererande av Hutters diskussion av "svarta korpar"-exemplet mer ohederligt ut än Hutters e-mail svar till dig. Men framförallt så borde man inte slänga ur sig sådana anklagelser på lösa grunder.

/Erik
SvaraRadera
Svar

Lägg till kommentar