Bayesian Updating and Evolutionary Search: On Bayes, Darwin and Wallace

Cosma Shalizi over at Three-Toed Sloth:

[T]here are many situations where Bayesian learning does seem to work reasonably effectively, which in light of the Diaconis-Freedman results needs explaining, ideally in a way which gives some guidance as to when we can expect it to work. This is the origin of the micro-field of Bayesian consistency or Bayesian nonparametrics, and it's here that I find I've written a paper, rather to my surprise.

I never intended to work on this. In the spring of 2003, I was going to the statistics seminar in Ann Arbor, and one week the speaker happened to be Yoav Freund, talking about this paper (I think) on model averaging for classifiers. I got hung up on why the weights of different models went down exponentially with the number of errors they'd made. It occurred to me that this was what would happen in a very large genetic algorithm, if a solution's fitness was inversely proportional to the number of errors it made, and there was no mutation or cross-over. The model-averaged prediction would just be voting over the population. This made me feel better about why model averaging was working, because using a genetic algorithm to evolve classifier rules was something I was already pretty familiar with.
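A minimal sketch of that picture, under assumptions of my own choosing (binary threshold rules as the "population", a weight multiplier `beta` per error, toy data; none of these names come from Freund's paper): each rule's weight falls by a constant factor every time it errs, so after $t$ rounds a rule with $k$ mistakes has weight $\beta^k$, and the ensemble predicts by weighted vote over the whole population.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "population" of classifier rules: fixed thresholds on a 1-D feature.
thresholds = np.linspace(-2, 2, 41)
weights = np.ones_like(thresholds)      # uniform initial weights
beta = 0.5                              # weight multiplier per error, 0 < beta < 1

x = rng.normal(size=200)
y = (x > 0.3).astype(int)               # true rule: threshold at 0.3

mistakes = 0
for xi, yi in zip(x, y):
    preds = (xi > thresholds).astype(int)
    # Model-averaged prediction: weighted-majority vote over the population.
    vote = int(weights @ preds > weights.sum() / 2)
    mistakes += (vote != yi)
    # Exponential update: a rule's weight shrinks by beta for each error,
    # i.e. weight = beta ** (errors so far), exactly selection with no
    # mutation or cross-over when the weights are read as population shares.
    weights *= np.where(preds == yi, 1.0, beta)

shares = weights / weights.sum()
print("ensemble mistakes:", mistakes)
print("best-weighted threshold:", thresholds[np.argmax(shares)])
```

Normalizing the weights into shares makes the genetic-algorithm reading literal: rules with fewer errors than average grow their share of the vote, the rest dwindle geometrically.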

The next day it struck me that this story would work just as well for Bayesian model averaging, with weights depending on the likelihood rather than the number of errors. In fact, I realized, Bayes's rule just is the discrete-time replicator equation, with different hypotheses being so many different replicators, and the fitness function being the conditional likelihood.
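In symbols (my notation, not the post's): write $p_t(h)$ for the weight on hypothesis $h$ after $t$ observations and $L(x_{t+1} \mid h)$ for the conditional likelihood of the next data point. Then Bayes's rule is

$$
p_{t+1}(h) \;=\; \frac{p_t(h)\, L(x_{t+1} \mid h)}{\sum_{h'} p_t(h')\, L(x_{t+1} \mid h')},
$$

which is the discrete-time replicator equation $q_{t+1}(i) = q_t(i)\, f_t(i) / \bar{f}_t$ once you read hypotheses as replicator types, the weights $p_t(h)$ as population shares $q_t(i)$, the conditional likelihood as the fitness $f_t(i)$, and the normalizing marginal likelihood as the mean fitness $\bar{f}_t = \sum_j q_t(j)\, f_t(j)$.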

As you know, Bob, the replicator dynamic is a mathematical representation of the basic idea of natural selection. There are different kinds of things, the kinds being called “replicators”, because things of one kind cause more things of that same kind to come into being. The average number of descendants per individual is the replicator's fitness; this can depend not only on the properties of the replicator and on time and chance, but also on the distribution of replicators in the population; in that case the fitness is “frequency dependent”. In its basic form, fitness-proportional selection is the only evolutionary mechanism: no sampling, no mutation, no cross-over, and of course no sex. The result is that replicators with above-average fitness increase their share of the population, while replicators with below-average fitness dwindle.
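Here is a sketch of that update in the simplest case (discrete hypothesis space, frequency-independent fitness; the coin-bias setup and all names are illustrative): one generation of fitness-proportional selection per observation, with fitness set to the conditional likelihood, reproduces the Bayesian posterior exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Replicator types = hypotheses about a coin's bias; shares = prior.
hypotheses = np.array([0.2, 0.5, 0.8])
shares = np.ones(3) / 3                 # uniform prior / initial population

flips = rng.random(50) < 0.8            # data from the bias-0.8 coin

for flip in flips:
    # Fitness of each replicator = conditional likelihood of the observation.
    fitness = np.where(flip, hypotheses, 1 - hypotheses)
    # Discrete-time replicator step: share scales by fitness / mean fitness.
    shares = shares * fitness / np.dot(shares, fitness)

for h, s in zip(hypotheses, shares):
    print(f"share of bias-{h} replicator: {s:.4f}")

# Direct Bayes computation for comparison: posterior ∝ prior × likelihood.
heads = flips.sum()
likelihood = hypotheses**heads * (1 - hypotheses)**(len(flips) - heads)
posterior = likelihood / likelihood.sum()   # uniform prior cancels
print(np.allclose(shares, posterior))       # True: selection-only = Bayes
```

The final check makes the identity concrete: iterating selection observation by observation and conditioning on the whole data set in one step land on the same distribution.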

This is a pretty natural way of modeling half of the mechanism Darwin and Wallace realized was behind evolution, the “selective retention” part — what it leaves out is “blind variation”.