Tuesday, November 24, 2009

the three laws of inference

If I hadn't believed it, I wouldn't have seen it --- Anon

The problem of making inferences in the presence of uncertainty is ubiquitous both in science and in society. Just look at the controversy over the trends in Earth temperature and how such data influence public policy. What most people don't realize is how deeply uncertainty permeates even the simplest measurements (which are almost never direct), and hence the whole chain of measurements that goes into even the simplest problem. Here are some simple ideas to keep in mind when interpreting claims in the news:



1) with big enough error bars, any model will fit the data.
2) with enough degrees of freedom in the model, you can fit any data.
3) it is impossible to achieve finite uncertainty without using quantifiable prior information.
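Law 2 has a concrete textbook instance: through any n data points there is a polynomial of degree n-1 that passes through every one of them exactly. A minimal sketch in Python, using Lagrange interpolation (the data here are invented for illustration):

```python
def lagrange_fit(x_pts, y_pts, x0):
    """Evaluate the degree-(n-1) polynomial through n points at x0."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x_pts, y_pts)):
        term = yi
        for j, xj in enumerate(x_pts):
            if j != i:
                term *= (x0 - xj) / (xi - xj)
        total += term
    return total

# Five arbitrary, scattered "measurements" -- pure noise, no underlying law.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.1, -2.0, 7.7, 0.4, 5.5]

# A 5-coefficient model reproduces all 5 points with zero residual.
residuals = [y[i] - lagrange_fit(x, y, x[i]) for i in range(len(x))]
print(max(abs(r) for r in residuals))  # ~0: a "perfect" fit that predicts nothing
```

The point is that a vanishing residual says nothing about the model being right; with as many free parameters as data points, a perfect fit is guaranteed.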

Here's a simple example. If you see a bunch of points on a graph with no error bars, and someone claims that 'a straight line fits the data', then without an explicit account of the uncertainties, it is impossible to verify the statement. The straight line WILL fit the data, by definition, if the error bars are big enough. On the other hand, if the error bars become negligibly small, then the straight line CANNOT fit the data, unless the data were constructed directly from the straight line, which, of course, won't happen in practice.
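The straight-line example can be made quantitative with a chi-squared statistic: the same line, compared to the same data, passes or fails depending only on the assumed size of the error bars. A sketch with made-up numbers:

```python
def chi_squared(y_data, y_model, sigma):
    """Total misfit measured in units of a common error bar sigma."""
    return sum((d - m) ** 2 for d, m in zip(y_data, y_model)) / sigma ** 2

x = [0, 1, 2, 3, 4]
y = [0.0, 1.0, 4.0, 9.0, 16.0]   # secretly y = x**2, not a line at all
line = [4.0 * xi for xi in x]     # the straight line through the endpoints

# Rule of thumb: chi-squared of order the number of points means a good fit.
print(chi_squared(y, line, sigma=0.1))    # ~3400: the line fails badly
print(chi_squared(y, line, sigma=10.0))   # ~0.34: the very same line "fits"
```

Nothing about the data or the model changed between the two calls; only the claimed uncertainty did. That is law 1 in action.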

Further, to say that you believe something carries, in itself, no quantitative significance. This is the problem Bayesians run into all the time: they make so-called a priori assumptions but rarely try to justify them rigorously, and the results can thus be highly misleading. Some even believe, for instance, that knowing a parameter is between a and b is equivalent to saying it's equally likely to be any number between a and b. Obviously this is absurd. So, if you want to play the Bayesian game, you need to either a) justify your prior rigorously, or b) make sure your posterior inferences don't depend strongly on the prior.
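Option b) is a sensitivity check you can actually compute. The sketch below is a toy example with invented numbers: a single Gaussian measurement of a positive parameter known only to lie in [0.1, 10], analyzed on a grid under two different translations of that knowledge into a prior, uniform in the parameter versus uniform in its logarithm (a 1/theta weight):

```python
import math

def posterior_mean(prior_weight):
    """Grid-based posterior mean for one Gaussian datum d with error sigma."""
    grid = [0.1 + (10.0 - 0.1) * i / 999 for i in range(1000)]
    d, sigma = 1.2, 1.0   # made-up measurement
    post = [prior_weight(t) * math.exp(-0.5 * ((d - t) / sigma) ** 2)
            for t in grid]
    norm = sum(post)
    return sum(t * p for t, p in zip(grid, post)) / norm

flat = posterior_mean(lambda t: 1.0)          # "equally likely anywhere in [a, b]"
log_uniform = posterior_mean(lambda t: 1.0 / t)  # uniform in log(theta)

print(flat, log_uniform)  # noticeably different answers from the same datum
```

Both priors are consistent with "the parameter is between 0.1 and 10", yet they yield posterior means that differ by roughly half a unit here. When the two answers disagree this much, the data alone have not settled the question.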

Definition:  a priori in this context means whatever information you have before you analyze your data.  Once you've analyzed the data in the light of your prior, the combined state of knowledge becomes the posterior.