Turning Scientific Perplexity into Ordinary Statistical Uncertainty

Cosma Shalizi in American Scientist:

D. R. Cox published his first major book, Planning of Experiments, in 1958; he has been making major contributions to the theory and practice of statistics for as long as most current statisticians have been alive. He is now in a reflective phase of his career, and this book, coauthored with the distinguished biostatistician Christl A. Donnelly, is a valuable distillation of his experience of applied work. It stands as a summary of an entire tradition of using statistics to address scientific problems.

Statistics is a branch of applied mathematics that studies how to draw reliable inferences from partial or noisy data. The field as we know it arose from several strands of scholarship. The word “statistics,” coined in the 1770s, originally referred to the study of the human populations of states and the resources those populations offered: how many men, in what physical condition, with what life expectancies, what wealth and so on. Practitioners soon learned that there was always variation within populations, that there were stable patterns to this variation and that there were relations between these variables. (For instance, richer men tended to be taller and live longer.) Another component strand was formed when scientists began to systematically analyze or “reduce” scientific data from multiple observers or observations (especially astronomical data). It became obvious from this research that there was always variation from one observation to the next, even in controlled experiments, but again, there were patterns to the variation. In both cases, probability theory provided very useful models of the variation. Statistics was born from the weaving together of these three strands: population variability, experimental noise and probability models. The field’s mathematical problems are about how, within a probability model, one might soundly infer something about a given process from the data the model generates, and at the same time quantify how uncertain that inference is.

Applied statistics, in the sense that Cox and Donnelly profess, is about turning vexed scientific (or engineering) questions into statistical problems, and then turning those problems’ solutions into answers to the original questions. The sometimes conflicting aims are to make sure that the statistical problem is well posed enough that it can be solved, and that its solution still helps resolve the original, substantive dilemma—which is, after all, the point.

Rather than spoiling any of Cox and Donnelly’s examples, I will sketch one that recently came up in my department.

More here.