Psychological Science: The [Non-]Theory of Psychological Testing

“Psychological Science: The [Non-]Theory of Psychological Testing – Part 1” can be found HERE.

Q & A

Q. If Psychological Test Theory (PTT) is not a theory but a tautology, then what should be substituted in its place?

A. How about replacing it with a scientific, or observational, theory?*

* I hope those who believe PTT is a scientific theory will indulge me in my elaboration, below.

The story so far

No modern science begins with the assumption, explicitly or implicitly, of the reality of Plato's World of Ideal Forms. The one exception is testing and measurement in the social sciences, particularly psychological or mental testing. What is not appreciated by many, if not most, social scientists is that PTT assumptions like True Score, or Latent Trait, are not like literary dramatic license that gives weight and impact to the narrative. From the point of view of the philosophy of science, they are indistinguishable from Plato's Ideal Forms, and have no place in modern science.

Mathematical argument is found in all modern science. The social sciences are no exception. Scientists use mathematical argument in three ways:

1. It is used as a way to analyze, understand, and communicate data from observation;
2. Mathematical argument helps one hypothesize about data not yet observed; and
3. In the service of supplementing 1 and 2, properties of mathematical inventions and constructions are used as convenient substitutes for the undetermined properties of observed or hypothesized data.

PTT, however, tends to use mathematical inventions and constructions, not as a supplement to mathematical argument based on observation, but as a near total substitute for it. This is the tradition, handed down to Western civilization from Pythagoras, that Carl Sagan both praises and laments in his book and Public Broadcasting Service (PBS) video series, “Cosmos”.

“Pythagoras…developed a method of mathematical deduction…. The modern tradition of mathematical argument, essential to all of science, owes much to Pythagoras.” Pp. 149-150.

“In the recognition by Pythagoras and Plato that the Cosmos is knowable, that there is a mathematical underpinning to nature, they greatly advanced the cause of science. But in the suppression of disquieting facts, the sense that science should be kept for a small elite, the distaste for experiment [emphasis mine], the embrace of mysticism…, they set back the human enterprise.” P. 155.

The great contributions of Peter Abelard (1079 CE – 1142 CE) to philosophy, and Galileo Galilei (1564 CE – 1642 CE) to science and the process of science, freed Western man from the Plato-bound Scholastics of the Church who believed in the reality of Ideal Forms, and that knowledge was yours just for the thinking without the need to observe. Ideas had no separate existence from man's ability to create them, conceptualize them, and communicate them to others. The arbiter of all truth about nature – the world we experience – would be science.

What makes PTT a tautology?

I thought you'd never ask. A tautology is a self-consistent logical system, wherein all statements are true. It's also called circular reasoning. Here's a very good example from a text of religious instruction for a Christian faith community. The instruction is in the form of questions and answers that were memorized by believers:

Q. Who made me?

A. God made me.

Q. Who is God?

A. God is the infinitely supreme Being who made all things.

What makes this a tautology is that the logic is circular and self-consistent. God is defined in terms of creation, and creation is defined in terms of God. All statements within a tautology are always true: God made me; and, God is the infinitely supreme Being who made all things. Another example of a tautology is Freud's concepts of Eros and Thanatos, the creative life force and the destructive death force, respectively, in the psyche of the individual. Each concept, Eros and Thanatos, is defined by the absence of the other. If you are a creative artist in the throes of productive output, you have a lot of Eros and your Thanatos is low. If your latest showing at the Museum of Modern Art, in New York City, got a bad review in the New York Times, and you committed suicide, your Thanatos flared up and your Eros dropped off to nothing.

What makes modern PTT a tautology is captured in the title of one of the most influential texts on the subject, Frederic Lord's and Melvin Novick's (1968) book on Classical Test Theory (CTT), with four chapters on Item Response Theory (IRT) written by Allan Birnbaum: “Statistical Theories of Mental Test Scores.” It is a statistical theory, not a scientific theory, nor an observational theory. By definition, a statistical theory of mental testing is a tautology. PTT is captured in a closed, self-consistent, logical system of assumptions and derivations. Within a statistical theory of mental testing, all statements are true.

Among the elements of PTT are various distributions, of which the most easily recognized is the Euler-Gauss Normal Distribution. There is also a good supply of parameters, and of assumptions about the properties of those parameters. Let's take a look at the Euler-Gauss bell curve. The shape of the familiar curve comes from plotting the values of the left side of a very interesting equation:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

and if we assume a mean of 0 and a standard deviation of 1, the equation reduces to:

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$$

and the only variable is x. There are two constants, π and e. The constant e is Euler's number, often called Euler's constant (confusingly, there are actually two Euler's constants; the other is γ), and it is the base of the natural logarithm. [Here's a bit of trivia: what is the relationship between e and Google's IPO, its Initial Public Offering?] The problem with most social scientists is that they assume the Euler-Gauss formula is derived from nature, that it is based upon data from systematic observation, and that it reflects nature. It is a purely mathematical invention and construction that has practical utility for some areas of science, BUT IT IS NOT A DESCRIPTION OF NATURE. It appears to social scientists to represent reality because some observed distributions, like height, shoe size, and SAT scores, look like a normal curve.
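To make the point about the one variable and the two constants concrete, here is a minimal Python sketch of the standard normal density (mean 0, standard deviation 1) as written above. The function name is illustrative; `statistics.NormalDist` from the standard library is used only as a cross-check.

```python
import math
from statistics import NormalDist

def standard_normal_pdf(x):
    """Height of the standard normal curve at x: e^(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# x is the only input; pi and e (inside math.exp) are fixed constants.
for x in (-2, -1, 0, 1, 2):
    print(f"x = {x:+d}   f(x) = {standard_normal_pdf(x):.4f}   NormalDist: {NormalDist().pdf(x):.4f}")

# The total area under the curve is 1; a fine Riemann sum over [-8, 8] checks it numerically.
step = 0.001
area = sum(standard_normal_pdf(-8 + i * step) * step for i in range(16000))
print(f"area = {area:.6f}")
```

The printed heights are symmetric about 0 and peak at 1/√(2π), about 0.3989. Nothing in the function refers to any data; every value follows from the formula alone.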

The reason many sciences use the normal distribution is that its properties are already known. Why waste time determining, empirically, the properties of the distribution of scores on a test of depression? If it looks like a normal distribution, then, what the hell, let's use the properties of the mathematical invention and save ourselves a lot of time. This doesn't bother the mathematician or the mathematical statistician. Nor should it. This doesn't bother the social science researcher, either, for completely different reasons. For the most part, social scientists do not understand that PTT is a tautology, let alone appreciate the limitations it puts on PTT as a science.

A reader responded to my observation in Part 1 of this article, that the normal curve is not a representation of nature, by suggesting I look up the Central Limit Theorem (CLT). The CLT states that, for independent random samples drawn from a population with finite variance, the distribution of the sample means tends toward a normal distribution as the size of each sample grows. Here's the problem with that comment. The CLT is part of the larger statistical theory, a tautology, that forms the basis for PTT. Since every statement of a tautology is always true, the reader was trying to prove his assertion of the ubiquity of the normal distribution in nature by asserting another element of the same tautology. That which is tautologically true cannot be assumed to be a result of empirical discovery, no matter the common-sense appeal, the long-time use, the prestige of the author(s), or the sales of their books. I am quite confident that the authors of the standard texts would agree with much of my discourse. Their intellectual caution and reservations, however, do not seem, in my personal opinion, to infiltrate the graduate and undergraduate curricula of educational and psychological testing.
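What the CLT asserts can be seen in a small simulation. A minimal sketch, assuming an exponential population (strongly right-skewed, with skewness 2) and using only Python's standard library: draw 5,000 random samples at several sample sizes and measure the skewness of the resulting sample means, which is 0 for a normal curve. The function names are illustrative, and the draws come from a pseudo-random number generator, not from observation.

```python
import random
import statistics

def sample_means(sample_size, n_samples, rng):
    """Means of n_samples random samples, each of sample_size exponential draws (mean 1)."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

def skewness(values):
    """Population skewness: 0 for a symmetric shape such as the normal curve."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(2024)  # fixed seed so the run is repeatable
for size in (1, 5, 50):
    means = sample_means(size, 5000, rng)
    print(f"sample size {size:>2}: skewness of the 5000 means = {skewness(means):.2f}")
```

The skewness of the means should shrink toward 0 as the sample size rises (for this population the theoretical value is 2/√n), while the number of samples stays fixed at 5,000. It is the size of each sample, not the count of samples, that drives the result.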

Here's a more comedic example from an old Abbott and Costello routine. Lou Costello, accompanied by Bud Abbott, goes into a bank to apply for a loan. The bank manager asks for identification to prove that he is Lou Costello. He doesn't have any identification with him. The bank manager says he can't begin to consider him for a loan until he can prove he is Lou Costello. Costello turns to Bud Abbott and says, “Tell the bank manager who I am.” Abbott speaks directly to the bank manager and says, with great moral force and confidence, “This man is Lou Costello.”

The limitations of PTT

On the level of common sense observation and utility, the normal curve APPEARS to be in line with many distributions that are observed in nature. However, the normal curve is one of a number of distributions used by PTT. There are distributions for both continuous variables and discrete variables. Each distribution curve APPEARS to approximate at least some distributions found in nature. All of these distributions, though they may APPEAR to mimic some aspect of nature, are purely mathematical constructions, and are not derived from systematic observation. These distributions, and the tautologies they rode in on, may have utility but they do not describe reality in the natural world.

Here's an example from my own experience. It's a bit of a shaggy-dog story, but it is relevant, so please indulge me. Early in my IBM career I worked on the computer simulation of pedestrian and automobile traffic for the sprawling IBM complex in Poughkeepsie, NY. It had manufacturing facilities, research and development laboratories, administration buildings, programming centers, and monstrous computer operations. We had about 10,000 people in the main complex on three shifts. Everyone commuted by car, as there was no mass transit serving the facility. We had scores of buildings, and hundreds of entrance and exit doors for employees. There were huge parking lots, many access and exit points, several miles of intra-complex roads, and guaranteed traffic jams at the end of the workday for first-shift employees. The General Manager wanted to change the staggered start-stop times for all employees on all shifts, reduce the 45-minute lunch period to 30 minutes (allowable by New York State law at the time), and have all first-shift operations ended by 3:30 pm. What I am about to report is the literal, absolute truth about why the General Manager was taking these actions, though it was known to only a few of us. He was an avid motorcyclist in good weather and a snowmobile enthusiast in the winter. He wanted to leave his office by 3:30 on Fridays and head to his personal retreat and playground in the mountains. The GM wanted to be sure that he was not creating traffic delays that were worse than the current situation.

One of the first things we had to do was determine the patterns of exit behavior of departing employees for all 200-plus exit doors on all three shifts for the whole friggin Poughkeepsie complex. When I say patterns of exit behavior, I mean the frequency distributions of exiting employees following their stop-time. If your shift ended at 4:12 pm, along with all 127 people in your corner of the building, you would 'punch out' no earlier than 4:12 pm and then make a dash for the exit door. We trained observers to stand at the exit doors with stopwatch and clipboard. We had no prior data on this phenomenon, and no idea what the frequency distributions would be like. So we decided not to sample. Instead we collected data for every friggin door, for every friggin stop-time, for every friggin building, for every friggin shift, for every friggin day of the week. Considering the consequences to the GM, and to us (under the principle that shit flows downhill), should he make the wrong decision and produce a worse traffic problem, we had to do as complete a job of data gathering as possible.

Siméon Denis Poisson (1781-1840)

Now let's get back to the point at hand. We need to know the shapes of the frequency distributions of exit behavior of the employees. The software, General Purpose Simulation System (GPSS), could accept empirically derived data (from observation), or could use one of many idealized frequency distributions (like the normal curve) and save us a lot of programming time. When we started plotting all the friggin data, by friggin this, and by friggin that, we noticed something consistent in almost all of the distributions. The greatest frequency of exits was in the first few minutes following the formal stop-time, with only a trickle of people exiting after 6 or 7 minutes. The project manager, an industrial engineer with good training in statistics, realized they all LOOKED LIKE Poisson distributions, not normal distributions, not Chi-square distributions, not flat distributions, not bimodal distributions. They LOOKED LIKE Poisson distributions. So he said the hell with inputting empirical data, we will use the Poisson function in GPSS to generate the distributions of exit behavior for all employees under all conditions.
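The project manager's call was made by eye. One way to make the same comparison explicit is to estimate the Poisson mean λ from the data (for a Poisson distribution, the sample mean is the maximum-likelihood estimate of λ) and set the fitted probabilities beside the observed relative frequencies. A minimal sketch, assuming exit delays recorded in whole minutes after the stop-time; the function names are illustrative, and the delays below are invented for illustration, not the project's data.

```python
import math
from collections import Counter

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def compare_to_poisson(minutes_after_stop):
    """Fit a Poisson by its mean and line it up against observed relative frequencies."""
    n = len(minutes_after_stop)
    lam = sum(minutes_after_stop) / n  # maximum-likelihood estimate of the Poisson mean
    counts = Counter(minutes_after_stop)
    rows = [(k, counts.get(k, 0) / n, poisson_pmf(k, lam))
            for k in range(max(minutes_after_stop) + 1)]
    return lam, rows

# Hypothetical whole-minute exit delays for one door (made up for illustration).
observed = [0, 0, 0, 0, 1, 0, 1, 2, 0, 1, 0, 3, 0, 1, 0, 0, 2, 1, 0, 6]
lam, rows = compare_to_poisson(observed)
print(f"estimated lambda = {lam:.2f} minutes")
for k, emp, fit in rows:
    print(f"{k} min   observed {emp:.2f}   Poisson {fit:.2f}")
```

With real door-by-door data, a formal goodness-of-fit test (chi-square, for example) would usually follow; that step is left out of this sketch.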

This example makes three important points:

1. The Poisson distribution was not reflecting nature, nor was it derived from nature. We observed nature, and on a rational basis, decided to use the convenience of a known frequency distribution with known properties. This is the exact same process we use in deciding to use a normal distribution; not because the normal distribution is derived from nature, but because our observations from nature appear to us to look like a normal distribution.
2. No statistical theory could have predicted the shape of the frequency distribution associated with employee exit behavior, a never-before-observed phenomenon.
3. The use of an idealized distribution, the selection of which is rationally determined, rather than an empirically derived distribution, can have high utility.
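The third point comes down to a choice between two ways of feeding a simulation: resample the observed delays directly, or draw from an idealized distribution that needs only its parameter. A minimal Python sketch of both, assuming a small set of invented exit delays (not the project's data). This is not GPSS code and does not reproduce GPSS's own Poisson facility; Knuth's multiplication method is swapped in as a simple, standard way to draw Poisson values, and the function names are hypothetical.

```python
import math
import random

def poisson_variate(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:  # keep multiplying uniforms until the product falls below e^-lam
        k += 1
        product *= rng.random()
    return k

def empirical_variate(observed, rng):
    """Draw one value by resampling the observed data directly."""
    return rng.choice(observed)

# Hypothetical observed exit delays in minutes (illustrative, not project data).
observed = [0, 0, 0, 0, 1, 0, 1, 2, 0, 1, 0, 3, 0, 1, 0, 0, 2, 1, 0, 6]
lam = sum(observed) / len(observed)  # 0.9 minutes

rng = random.Random(1)
idealized = [poisson_variate(lam, rng) for _ in range(10000)]
resampled = [empirical_variate(observed, rng) for _ in range(10000)]
print("idealized mean:", sum(idealized) / len(idealized))
print("resampled mean:", sum(resampled) / len(resampled))
```

Both generators produce delays averaging about 0.9 minutes. The idealized one needs only λ and can produce values never seen in the data (4 or 5 minutes, say), while resampling can only repeat what was observed.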

Our computer simulations, using far more data than exit door distributions, showed that lines of cars at intersections would be longer, but that more cars would be processed through the traffic patterns in a given time period. The conclusion from the simulation was that the overall length of time for any car to exit the complex was unchanged. Staking our next raises on our findings, we recommended the GM proceed as planned. We got our next raise and the GM left early on Fridays.

The seeming ubiquity of the Normal Distribution may be more apparent than real; it may be more artifact than substance. Students of PTT, developers of mental tests, and practitioners learn how to create testing procedures and materials that are more likely to produce normal distributions in the data they collect. It is not a function of nature, nor an accident, that SAT scores – and like measures of aptitude, abilities, and achievement – look like the ideal Euler-Gauss bell curve. Test developers work hard to produce measurement tools that yield normal curves. Is this a dishonest manipulation? No. They are simply justifying the use of the properties of the Euler-Gauss distribution for their data. As a ready-made substitute for a time-consuming and costly empirical investigation of the properties of raw score distributions, the normal curve can't be beat. It has the added benefits of keeping things standardized and predictable, of being easily communicated to constituents, and of continuity with history.
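Test developers mostly pursue normality through item writing and selection, which is hard to show in a few lines. A related and commonly used norming step is the area (rank-based inverse-normal) transformation, which maps raw scores onto normal z-scores through their percentile ranks. This may not be what the author has in mind; it is offered only as one concrete way a bell shape gets imposed on scores. A minimal sketch, assuming mid-rank percentiles; the function name and the sample scores are hypothetical.

```python
from statistics import NormalDist

def normalized_z_scores(raw_scores):
    """Map each raw score to the standard-normal z at its mid-rank percentile."""
    n = len(raw_scores)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_scores = []
    for x in raw_scores:
        below = sum(1 for s in raw_scores if s < x)
        equal = sum(1 for s in raw_scores if s == x)
        percentile = (below + 0.5 * equal) / n  # always strictly between 0 and 1
        z_scores.append(std_normal.inv_cdf(percentile))
    return z_scores

# Hypothetical, strongly skewed raw scores.
raw = [0, 0, 1, 1, 2, 9, 40]
for score, z in zip(raw, normalized_z_scores(raw)):
    print(f"raw {score:>3} -> z {z:+.2f}")
```

Whatever the shape of the raw scores, the output follows the percentiles of the standard normal curve: the long gap between the raw values 9 and 40 becomes just one step in rank.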

However, in my forty years of social science research, I found the distributions of actual research data, collected in the field, to be anything but predictably normal. Our get-out-of-jail-free card was, in some cases, the Central Limit Theorem. Sometimes we copped a reprieve by using F-tests, which rely on the F-distribution and are considered relatively robust in the face of non-normal distributions. Where did we get the notion that F-tests are robust? You guessed it. It came from the same tautological statistical theory as the Central Limit Theorem.
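Claims about the F-test's robustness are typically checked by Monte Carlo simulation: generate many experiments in which the null hypothesis is true and count how often the test rejects. A minimal sketch, assuming a one-way analysis of variance with three groups of ten, a 5% level, and normal versus exponential (skewed) populations. The names and settings are illustrative choices, not anything from the author's research, and the critical value is hard-coded from standard F tables to keep the sketch within the standard library.

```python
import random
import statistics

# Upper 5% point of the F distribution with (2, 27) degrees of freedom,
# from standard F tables (about 3.35).
F_CRIT_2_27 = 3.354

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    means = [statistics.fmean(g) for g in groups]
    all_values = [x for g in groups for x in g]
    grand = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rejection_rate(draw, reps, rng):
    """Share of null-true experiments (3 groups of 10, one population) that F rejects at 5%."""
    rejections = 0
    for _ in range(reps):
        groups = [[draw(rng) for _ in range(10)] for _ in range(3)]
        if one_way_f(groups) > F_CRIT_2_27:
            rejections += 1
    return rejections / reps

rng = random.Random(11)
print("normal data:     ", rejection_rate(lambda r: r.gauss(0.0, 1.0), 4000, rng))
print("exponential data:", rejection_rate(lambda r: r.expovariate(1.0), 4000, rng))
```

If skewness badly distorted the test, the exponential rate would drift well away from 5%. The populations here are themselves mathematical distributions, not field data.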

The most serious limitation of a statistical theory (tautology) of mental testing is that it cannot predict the properties of the distribution of scores of an hypothesized, but not yet observed, aspect of behavior or mental functioning in animals and humans. This alone should inform the social scientist that PTT is NOT a scientific or observational theory. Let's examine what we mean by a scientific theory. First, we need to draw a distinction between the common usage of the word theory and the term scientific theory. In common usage, theory relates to an educated guess, an hypothesis, a conclusion not well supported, a hunch, and the like. In common usage, theory has little to do with facts, or, at best, has to do with incomplete, even contradictory facts. A scientific theory is a coherent narrative of some aspect of nature that is based upon facts from systematic observation; it can accommodate new data; it can predict events that were previously unknown or never observed; from new data it can be modified, extended, limited, made more general, or even completely thrown out.

Evolutionary biology is a very good example of a scientific theory. It is a coherent story based on facts; it applies to new data; it can predict transitional forms; and it can be supported, modified, or disproved by new facts. None of this applies to a statistical theory of mental testing. None of this applies to any tautology. [Evolutionary biology is often criticized, dismissively, with the comment that it is only a theory. Such a remark betrays a lack of scientific knowledge, and no understanding of the distinction between the common usage of theory and the correct understanding of a scientific theory.]

Next time

In my third and final installment of PTT, I'm going to discuss how today's PTT is creating its own reality in the behavior and thinking of test developers, practitioners, and those who are tested; how society is being drawn into the tautology and perpetuating the illusion that we are observing nature; and how we might move toward a scientific or observational theory of mental testing. Please come back on August 17 and continue to comment to your heart's content.