Monday, February 13, 2017
All models are wrong, some are useful
by Hari Balasubramanian
Thoughts on the differences in math applied to the physical and social sciences.
The quote in the title is attributed to the statistician George Box. The term ‘model' could refer to a single equation, a set of equations, or an algorithm that takes an input and carries out some calculations. Box's point is that you can never capture a physical or biological or social system entirely within a mathematical or algorithmic framework; you always have to leave something out. Put another way, reality yields itself to varying degrees but never completely; something always remains that is unknown or not easily described.
And in any case, for the practical matter of achieving a certain outcome, that extra effort may not be necessary. If the goal is to put a satellite into orbit, the equations that define Newton's laws of motion and gravity, though not 100% correct, are more than sufficient; you don't need Einstein's theories of relativity, though they would provide a more accurate description. But if the goal is to determine a GPS device's location on earth, you do need relativity. This is because, for an observer on earth, a clock on an orbiting satellite ticks at a different speed than a clock on the ground, and if the necessary adjustments are not made, your phone's location estimate will be inaccurate.
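To give a rough sense of the stakes, here is a back-of-the-envelope sketch using the commonly cited textbook figures for GPS clock drift (roughly +45 microseconds per day from gravitational time dilation, −7 from orbital speed); the specific numbers are my addition, not from the essay:

```python
# Approximate, commonly cited figures for GPS satellite clock drift:
# clocks run fast by ~45 us/day (weaker gravity in orbit) and slow
# by ~7 us/day (orbital speed), for a net drift of ~38 us/day.
C = 299_792_458              # speed of light, m/s
gr_us_per_day = 45.0         # gravitational time dilation (general relativity)
sr_us_per_day = -7.0         # velocity time dilation (special relativity)
net_s_per_day = (gr_us_per_day + sr_us_per_day) * 1e-6

# A timing error becomes a ranging error at the speed of light.
range_error_m_per_day = net_s_per_day * C
print(f"net clock drift: {net_s_per_day * 1e6:.0f} us/day")
print(f"position error if uncorrected: {range_error_m_per_day / 1000:.1f} km/day")
```

Left uncorrected, that drift accumulates to roughly 11 km of position error per day, which is why the relativistic adjustments are built into the system.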
So there is this art in modeling, this choosing of some aspects and ignoring others, trying to create the right approximations. As Box notes: "there is no need to ask the question 'Is the model true?'. If 'truth' is to be the 'whole truth' the answer must be 'No'. The only question of interest is 'Is the model illuminating and useful?'"
Models vary widely in the amount of truth they capture. In the engineering disciplines that exploit physical laws – mechanical, chemical, civil, electronics and communications engineering – the test of a model is whether the mathematical answers match empirical observations to the degree of precision needed and whether the results can be reproduced again and again.
Standards are high: if equations or computer simulations describing some physical phenomena do not match empirical observations, they eventually will be abandoned or modified. Evidence of this precision and repeatability is all around us – consider that, for the most part, the light comes on when you turn on the switch, a bridge is able to withstand loads, sensors are able to measure accurately, and images and voices and messages can be searched and transmitted at near-instant speeds. Indeed, the evidence is so pervasive that it is often taken for granted.
Contrast this with mathematical models in what we can call social domains – economics, healthcare delivery, election polling, psychology and human behavior. In these fields, you can't get – at least not yet – the kind of precision and repeated successes that you get in physics. You can use the models to sharpen your thought process; you can predict general trends and derive insights. But predicting the precise value of a future quantity is quite challenging. For example, models in economics often assume rational actions when of course there is always a rogue factor in how individuals and groups behave, throwing off chances of an accurate prediction. Indeed, one use of such models is to show that equations and theorems, however elegant, have little basis in reality. Friedrich Hayek captured the spirit of this beautifully in his quote: "The curious task of economics is to demonstrate to men how little they really know about what they imagine they can design."
Another difficulty in social systems is that cause and effect are not so neatly separated. This is particularly true when you try to analyze historical data. Consider multiple regression and other statistical models – very much in fashion these days: they are easy to use at the click of a button, a dizzying array of graphs and numbers pops out, and the users may not be aware of the nitty-gritty details that generated these results. It could well be the case that the effects have little to do with the hypothesized causes. A complex matter gets reduced to p values, percentage improvements or other newly defined metrics, meant to highlight the modeler's principal claims while inadvertently masking deficiencies. Sometimes the deficiencies can't be detected: the datasets are so large and have so many variables that they are not easily visualized; so anything goes. Perhaps this is why one study may find that such and such is true – media outlets enhance the effect by providing attention-grabbing titles – while another discovers the opposite result.
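One concrete way this happens is the multiple-comparisons problem: test enough variables and some will look 'significant' by chance alone. The following sketch is my own illustration, not from the essay; it correlates pure noise with pure noise and still finds "effects":

```python
# With enough predictors, pure noise yields "significant" correlations.
# We test 200 random predictors against a random outcome at the usual
# 5% level and count how many clear the bar by luck alone (~10 expected).
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_predictors = 100, 200
X = rng.standard_normal((n_samples, n_predictors))   # noise "predictors"
y = rng.standard_normal(n_samples)                   # noise "outcome"

# Sample correlation of each predictor column with the outcome.
r = (X - X.mean(0)).T @ (y - y.mean()) / (n_samples * X.std(0) * y.std())
critical = 1.96 / np.sqrt(n_samples)   # approximate 5% two-sided cutoff
false_hits = int(np.sum(np.abs(r) > critical))
print(f"{false_hits} of {n_predictors} predictors look 'significant'")
```

None of these predictors has anything to do with the outcome, yet a handful will typically pass the conventional significance threshold – exactly the "anything goes" hazard in large, many-variable datasets.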
Once in a while we get something remarkably successful like Nate Silver's election forecasts, which aggregate and weight various polls. So successful that we are lulled into thinking there is a rock-solid science of predicting how a population will vote in an election. This perception lasts until an election like 2016 comes along. Looking at the comments section of Silver's website on Nov 9, 2016, you could feel the anger – and the anger turned on the pollsters and statisticians: how could everyone get it wrong?
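The core idea of poll aggregation can be sketched in a few lines – though this toy version, with made-up polls and an invented weighting scheme (sample size and recency), is far simpler than Silver's actual models:

```python
# Toy poll aggregator: weight each poll by sample size and recency.
# The polls and the weighting scheme here are invented for illustration.
def aggregate(polls, half_life_days=14):
    """polls: list of (candidate_share, sample_size, age_in_days)."""
    num = den = 0.0
    for share, n, age in polls:
        # Larger samples count more; older polls decay exponentially.
        w = (n ** 0.5) * 0.5 ** (age / half_life_days)
        num += w * share
        den += w
    return num / den

polls = [(0.48, 1200, 2), (0.52, 800, 10), (0.45, 2000, 25)]
print(f"aggregated share: {aggregate(polls):.3f}")
```

The aggregate is still only a weighted average of samples – every input poll carries sampling error and possible systematic bias, and no weighting scheme can remove error that all the polls share.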
Like stories, models have a kind of seductive power: we get psychologically attached to them more than is warranted, forgetting that they may be more wrong than we think. If we are clear-eyed about what these models really are – in Silver's case, projections based on samples where errors could easily creep in – perhaps we wouldn't be so surprised. The advent of big data and machine learning only seems to have made us more confident, more triumphant – there's a feeling that social systems, human behavior and consciousness will finally yield themselves to massive computing power and advanced statistics, that algorithms of the same status as the great laws of physics are about to be unveiled. Maybe we really are on the verge. Already the impact of big data, both for beneficial and nefarious purposes, is undeniable. But there is also reason to be skeptical; there is no substitute to looking closely under the hood of the new algorithms, on a case by case basis, to note whether the expectations are unrealistic to begin with.
Many of the thoughts in this piece come from my own modeling experience in a field called operations research. A relatively new branch of mathematics and engineering, operations research is concerned with ‘optimal' ways of running organizations and making things more ‘efficient'. (I use quotes here since these terms are far more difficult to define and achieve than it seems at first glance.) An airline needs to match its pilots, crews and passengers to flights each day and reschedule in case of unexpected events; a corporation like Intel or Apple needs to manage its far-flung supply chains so that its products are delivered on time; Doctors Without Borders needs to deploy its clinical staff and equipment on short notice during infectious disease outbreaks such as Ebola. From a computational viewpoint, these problems become difficult quickly; it is not unusual for there to be billions or trillions of candidate solutions – or many orders of magnitude more – more than even the fastest computers can parse through. Without models and search algorithms, finding good answers in a reasonable amount of time would be impossible.
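To see how quickly the search space explodes, consider the simplest version of crew scheduling: matching n crews to n flights, which has n! possible assignments. A brute-force sketch (with hypothetical costs, my illustration):

```python
# Assigning n crews to n flights has n! possible matchings.
# Brute force is feasible only for tiny n; real instances need
# optimization models and search algorithms.
import itertools
import math
import random

random.seed(1)
n = 7
# Hypothetical cost of assigning crew i to flight j.
cost = [[random.randint(1, 20) for _ in range(n)] for _ in range(n)]

# Exhaustive search over all n! = 5040 assignments.
best = min(sum(cost[i][p[i]] for i in range(n))
           for p in itertools.permutations(range(n)))
print(f"optimal cost over {math.factorial(n)} assignments: {best}")

# The same exhaustive search at airline scale is hopeless:
print(f"20 crews, 20 flights: {math.factorial(20):,} assignments")
```

At just 20 crews and 20 flights the count already exceeds two quintillion, which is why the field leans on structure-exploiting algorithms rather than enumeration.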
But in the end, the optimizations that are carried out in an abstracted mathematical world have to be implemented in situations where human behavior – all those messy, inexplicable, contradictory, delightful, mischievous things we do – plays a non-trivial role. And so the results, I've noticed, are far from optimal in practice. Some groups may feel unfairly treated; unintended consequences pop up sooner or later in parts of the system that the model did not consider; or the environment changes to an extent that the so-called globally optimal solutions, obtained with great effort, turn out to be short-sighted.
On a personal front, I find myself – and this is the side of me that is drawn to the humanities – rebelling against excessive quantification, metrics and buzzwords such as ‘predictive analytics'. "Don't count too much," my aunt said to me sternly last August in India when I kept telling her how many of her delicious sweets I'd greedily consumed, and which I was feeling guilty about. "Don't count too much – just tell me whether you enjoyed them." That comment has stayed with me and it seemed to summarize what I'd been feeling: that perhaps we are overdoing models and analyses in situations where numbers are far from the full story.
Posted by Hari Balasubramanian at 12:35 AM