- The definition of Kolmogorov complexity depends on the choice of universal Turing machine, of which there are infinitely many. The same ambiguity is therefore true of Solomonoff induction, and the authors admit this. So far, so good. But how, then, should we choose which particular machine to use?
From p 1111 and onwards, the authors talk about choosing a machine that is "unbiased", i.e. it doesn't favor certain objects blatantly more than other universal Turing machines do. It is not at all clear (at least not to the present reader) that such a notion of unbiasedness can be made sense of. On p 1112, the vague notion of natural universal Turing machines is introduced, meaning machines that do not exhibit such biases, and the authors offers this:
- One possible criterion is that a reference machine is natural if there is a short interpreter/compiler for it on some predetermined and universally agreed upon reference machine. If a machine did have an in-built bias for any complex strings then there could not exist a short interpreter/compiler. If there is no bias then we assume it is always possible to find a short compiler.
- Solomonoff induction involves the model assumption that the environment is computable. On p 1118, the authors address this assumption, and we find the following passage:
- John von Neumann once stated
“If you will tell me precisely what it is that a machine cannot do, then I can always make a machine
which will do just that”. This is because, given any “precise” description of a task we can design an algorithm to complete this task and therefore the task is computable. Although slightly tongue in cheek and not quite mathematically correct, this statement nicely captures the universality of the concept of computability.
In the same paragraph, they say that "according to the Church-Turing thesis, the class of all computable measures includes essentially any conceivable natural environment". Here it would have been suitable for the authors to modestly admit that the procedure they advocate (Solomonoff prediction) is in fact not computable and therefore impossible in "any conceivable natural environment". (Such a discussion comes much later in the paper.)
Of course it is OK and meaningful to discuss, as most of this paper does, the hypothetical scenario of having a machine that performs Solomonoff induction. But to insist, in this scenario, that the environment must be computable, is to introduce an outright contradiction and thus to enter the realm of the meaningless.
- The black raven paradox is a major issue that any theory of induction ought to address. To accept induction is to accept that the observation of a black raven will (usually) strengthen our belief in the hypothesis H that all ravens are black. Note, however, that "black" and "raven" are just arbitrarily chosen examples of properties here, and might as well be replaced by, e.g., "non-raven" and "non-black", so that accepting induction equally well entails accepting that the observation of a non-black non-raven will (usually) strengthen our belief in the hypothesis H* that all non-black items are non-ravens. But H* and H are logically equivalent, and surely logically equivalent hypoteses must be supported by the same evidence, so we seem to be forced to accept that the observation of a non-black non-raven (such as a yellow banana) will (usually) strengthen our belief in the hypothesis H that all ravens are black. But this seems paradoxical: most people will intuitively resist the idea that the observation of a yellow banana should have any evidential impact on the credibility of a hypothesis concerning the color of ravens.
Rathmanner and Hutter suggest that Solomonoff induction makes progress on the black raven paradox, but their case is very weak, for the following reasons:
- Nowhere in their discussion about Solomonoff induction and the black raven paradox on p 1122-1124 of their paper is there any hint of any asymmetry in how Solomonoff induction would treat black ravens versus non-black non-ravens as regards the the all-ravens-are-black hypothesis. No such hint means no argument that Solomonoff induction helps to solve the paradox.
- The highlight, supposedly, of the authors' argument on Solomonoff induction and the black raven paradox is the displayed formula near the top of p 1124, which states that when observing objects, one after another, the conditional probability of never ever seeing a black raven given what has been seen so far after n observations tends to 1 as n→∞ almost surely on the event that no black raven is ever seen. This is a nice property, but has little or nothing to do with Solomonoff induction, because the same is true for any Bayesian model incorporating such an infinite sequence of observations.6
- There is a difference between the statements "all ravens are black" and "all ravens ever appearing in our sequence of observations are black". The statement we're originally interested in is the former, whereas the aforementioned highlight on p 1124 is about the latter. The preceding displayed formula, near the bottom of p 1123, bridges this difference. That formula, however, is not about Solomonoff induction but only about a certain simple class of i.i.d. mixtures.
- A bit further down on p 1124, in Section 8.1, the authors suggest that a certain property of Solomonoff's univeral prior M implies "fast convergence in practice". I resent the term "in practice" here, because there is no practice: Solomonoff's prior is not computable. (The paper is full of similar slips.)
- In the introductory section, on p 1078, we learn that Solomonoff induction means that "it can be argued that the problem of formalizing optimal inductive inference is solved", and from then on, we hear about this optimality again and again. But I have trouble understanding the concept of e procedure that is both optimal and (if we accept the aforementioned Church-Turing thesis) impossible. Here's another induction procedure which is equally impossible, and arguably even more optimal: whenever we wish to know something about the world, ask an infallible oracle!
Presumably, what the authors mean by Solomonoff induction being "optimal" is that no real-world system can perform better at the given task. But there are many such "optimal" levels, as my oracle example shows, so a better term than "optimal" would be "upper bound on performance". The upper bound on performance provided by Solomonoff induction is obviously tighter than mine, and therefore better. But how tight is it, compared to what is actually achievable? Is there room for better bounds? There'd better not be, in order for the authors' statement about "the problem of formalizing optimal inductive inference [being] solved" not to look silly, but on this issue they are silent.
- In Section 10.4 entitled "Conclusions", the authors claim on p 1134 that Solomonoff induction "gives us new insights into the mechanics of our own thinking". Well, isn't that nice? Even nicer would have been if the paper had contained at least one example of such an insight that can plausibly be attributed to Solomonoff induction.