Psychological Science: The [Non-]Theory of Psychological Testing

“Psychological Science: The [Non-]Theory of Psychological Testing – Part 1” can be found HERE.

“Psychological Science: The [Non-]Theory of Psychological Testing – Part 2” can be found HERE.

Note: My views in these three articles on Psychological Test Theory (PTT) are limited to psychological science, particularly what we know as the statistical theory of psychological testing: Classical Test Theory (CTT) and Item Response Theory (IRT). While I do not cover, explicitly, classical inferential statistics in psychological research, some of my ideas would extend to that domain, particularly on Plato's Ideal Forms, and the tautological nature of some psychological statistics. I have nothing to say about how my views apply, or not, to engineering, quantum physics, and neural activity in the brain. At times, I use 'overstatement' as a rhetorical device to make a point.

“If a thing exists, it exists in some amount; and if it exists in some amount, it can be measured.” *

* –E. L. Thorndike (1874-1949), Introduction to the Theory of Mental and Social Measurements (1904)

“Thus, if we perceive the presence of some attribute, we can infer that there must also be present an existing thing or substance to which it may be attributed.” **

“For I freely acknowledge that I recognize no matter in corporeal things apart from that which the geometers call quantity, and take as the object of their demonstrations, ….” **

** –Rene Descartes (1596-1650), Principles of Philosophy. I:52 and II:64 (1644).

More philosophical embarrassments for psychology

These oft-quoted, or paraphrased, ideas have been unfortunate for psychological science. E. L. Thorndike's pioneering contributions to educational, social, general, and industrial psychology, and animal behavior are substantial, without dispute. However, this forceful attempt to establish, with 'common sense', a justification for psychological testing, was no more than a restatement of Plato's Ideal Forms. At the beginning of Thorndike's illustrious career, psychology and philosophy were commonly administered in the same college and university departments. During his lifetime we saw the ascendancy of psychological science as a discipline separate from philosophy, but with a vestige of relationship issues from the prior marriage of long standing.

Descartes gave us another problem, frustrating when we look back on it, that limited progress in science and philosophy for nearly 400 years. When it came to mental life (thinking, reasoning, cognition, memory), there was a clear line of demarcation between humans and the rest of the entire animal world. Humans could think, plan, imagine, reason, and solve complex problems; animals functioned at the level of instinct and base neural connections. Thorndike reinforced this notion by a refusal to see the possibility of human-like thought processes in research on animals. The problem of mind and body, since Descartes, advanced only by putting a hyphen between the two words, 'Mind-Body'. Fortunately, philosophy has stopped asking itself questions that can't be answered.

Alright, not all scientists and philosophers are perfect. The point I wish to make, though, is that the same philosophical and mathematical assumptions that help to spur advances in psychological science can also limit its future development. If psychology, as we know it, does not get its scientific-philosophical-mathematical act together, it will be eclipsed by neuro-cognitive science, fMRI, genetics, biology, endocrinology, pharmacology, and the commercial testing industry. The prefix 'psycho' may be sprinkled, amply, through course catalogs, but we might be hard pressed to justify administering it as a separate discipline, and distributing research dollars on a par with other departments. It is my personal view that we have less than one generation to shake ourselves loose of an entrenched failure to reform our scientific shortcomings.

The Revolutions of 1996-7

I was completely oblivious to the opening salvos, and the barricades, when a few anarchist psychologists and disillusioned social science researchers went into the streets and called for an end to psychological and inferential statistics as we knew them. The voices of discontent and progress were battling the entrenched department heads who thought their own research, and that of their students, would not reach publication. Sequences of courses were breached, content was revised, and a few of the old guard went, voluntarily, to reeducation camps (pre-convention seminars). In April 1996, I started a social science research company in Brewster, NY. I was preoccupied with hiring staff, renting office space, buying equipment, marketing, writing proposals, and funding my own start-up business. I don't think I read a professional journal for a couple of years. Certainly, there was no time or money to spend the better part of a week at the annual conventions of the American Psychological Association (APA), and the Society for Industrial/Organizational Psychology (SIOP).

When I finally discovered what happened, the smoke and the barricades were gone, and there was no obvious trace of the birth of an important movement. I had to go looking for it. It's always chancy when you try to pin down the one event that started it all, especially when many factors, over time, may have made that one moment an auspicious one. So, here's my candidate for the shot heard round the world of psychological statistics. It was an article by Frank L. Schmidt, “Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for Training of Researchers,” in Psychological Methods, 1996, 1, 115-129. The following year, 1997, saw a great deal of focus at the APA convention on some of the issues discussed by Schmidt. At the risk of oversimplifying the central ideas that were debated, I make the following observations: The concept of the effective, well-conducted primary study is an illusion; statistical testing in the single study is virtually worthless; the value of the primary or single study is only assessed, years later, in a meta-analysis. The revolutionaries wanted to ban all reporting of significance results. The brave comrades who manned and womaned the barricades did change some of the rules of peer-reviewed journals in psychology for the better.

What the hell was going on that led bookish, nerdy researchers to take to the streets, and put their jobs and reputations on the line? My view is that the fundamental suppositions of psychology as a science were fatally flawed. They were not rebelling at psychology, nor were they rebelling at the science of psychology. They were simply saying that what they had learned in graduate school, and continued to teach their students, was frustrating their progress as social science researchers. They were wondering, in my opinion, if psychology as a science could hold its own with the other more successful sciences. I think many of the courageous warriors DID NOT understand that the philosophy and statistical theories undergirding their science were dooming them to failure. They didn't appreciate this problem nor articulate it this way; they just knew it wasn't working. The result of brave men and women standing up to the tyranny of the past and the established order of things was to deemphasize statistical tests in refereed journals, and provide results that were more descriptive rather than purely inferential.

Pre-revolution history

The pre-revolution history shows that this moment in time was inevitable. Psychology, as a science, was given a huge boost by the psychologists who cut their teeth on psychological testing and psychological research in the U. S. Army Air Corps during WWII. It is not an exaggeration to say that the science of psychology in the second half of the twentieth century is indebted, immeasurably, to the Army Air Corps. Among many truly outstanding psychologists was Robert L. Thorndike (1910-1990), who followed his father, E. L. Thorndike, into Teachers College, Columbia University, in New York City. He was one of the best psychologists and psychometricians of the twentieth century. Few, however, took note that he was very clear that PTT was a tautology. Many of my colleagues will probably bristle at this and find it implausible. After all, he was one of the giants in the field of PTT, and contributed mightily to the literature, and the texts that are still used today. Is there a paradox or contradiction here? No. He was intelligent enough, and so well versed in the statistical theory of mental tests, that he understood it for what it was: a highly useful tool for society (as it still is) that was based on a tautology.

R. L. Thorndike demonstrated the tautological nature of PTT with a simple example. First, we need a little background. The two most important cornerstones of PTT are the concepts of validity and reliability. Validity is a property that is imputed to a psychological or educational test, if it can be demonstrated that it is measuring what it is intended to measure. For example, a school district wants to use a standardized test to assess mathematics achievement among its eighth graders. How do the school superintendent, principals, and parents, know that a particular test really measures mathematics achievement as it relates to their educational requirements? Test publishers claim a test, in their catalog, measures eighth grade math achievement; upon inspection it may even look like a test of eighth grade math. Is it valid as a test of eighth grade math achievement? It is valid (it has validity) only if the test is subjected to specific kinds of research and examination that are spelled out in a document called, “The Standards for Educational and Psychological Testing.”

Reliability is a property that is imputed to a psychological or educational test, if it can be demonstrated that it yields consistent results with repeated use, all other things being equal. The statistical determination of the property of reliability is founded upon the concept of parallel tests. Achievement Test A, and Achievement Test B, are parallel if the content is essentially the same with, possibly, some variation. For example, Test A asks a student to solve for the unknown in the equation, 24 = 4 + x. Test B uses the equation, 13 = 3 + x. If there is only one test form available, the items of the single test could be divided into two tests of equal numbers of items. Thus, we have a Test A, and a Test B, administered at the same time on one form, and in one sitting. These parallel tests are referred to as split-half, parallel tests. It is also possible to use a single test as its own parallel test, with two different administrations of the same test. Like validity, the determination of a test's reliability is the result of prescribed research and statistics found in “The Standards for Educational and Psychological Testing.” Reliability is an indispensable, but not sufficient, condition for the validity of a test.
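To make the split-half idea concrete, here is a minimal sketch in Python. The response matrix is invented for illustration (six hypothetical students, six right/wrong items), and the odd/even split is just one common way to divide a single form into a "Test A" and a "Test B". The final Spearman-Brown step is a standard textbook adjustment that is not discussed above; it is included only because a half-length test correlation understates the consistency of the full-length form.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)

# Hypothetical data: one row per student, one column per item (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# "Test A" = items 1, 3, 5; "Test B" = items 2, 4, 6, taken in one sitting.
half_a = [sum(row[0::2]) for row in responses]
half_b = [sum(row[1::2]) for row in responses]

r_half = pearson_r(half_a, half_b)
r_full = spearman_brown(r_half)
print(f"split-half r = {r_half:.3f}, stepped-up estimate = {r_full:.3f}")
```

Notice that the number this produces is only meaningful if the two halves really are interchangeable, which is exactly the assumption the next paragraph takes issue with.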

Now, how does Thorndike demonstrate the tautological nature of PTT? He does it very simply. Reliability is defined in terms of parallel tests, and parallel tests are defined in terms of reliability. If you want to determine a test's reliability, then create a parallel test and follow the recipe in the “Standards.” Tests are parallel, if they can be used to measure reliability. What do we have? We have circular reasoning, also known as a tautology: Reliability cannot be imputed without parallel tests, and parallel tests, as a concept, do not exist apart from their use in determining reliability. All statements in a tautology are necessarily true.
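For readers who want to see where the circle closes, here is the usual Classical Test Theory bookkeeping in LaTeX. The symbols (X and X' for the two observed scores, T for the true score, E and E' for the errors) are the conventional textbook ones and are my assumption; they do not appear in the discussion above.

```latex
% Two forms are called parallel when they share the true score T
% and have equal, mutually uncorrelated error terms.
\begin{align}
X &= T + E, \qquad X' = T + E' \\
\sigma^2_{E} &= \sigma^2_{E'}, \qquad
\operatorname{Cov}(T,E) = \operatorname{Cov}(T,E') = \operatorname{Cov}(E,E') = 0 \\
\rho_{XX'} &= \frac{\operatorname{Cov}(X,X')}{\sigma_X\,\sigma_{X'}}
            = \frac{\sigma^2_T}{\sigma^2_X}
\end{align}
```

The reliability coefficient on the last line is computed as a correlation between parallel forms, while the conditions that make two forms parallel are stated in terms of T and E, neither of which can be observed. In practice, the evidence offered that two forms are parallel is that they behave the way reliable parallel forms should.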

Some of the leading, early psychometricians recognized this tautological problem as early as the 1930s – definitely in the 1940s and 1950s. The brilliant psychologist, Jane Loevinger (1918-2008), was probably the first (and for a long time the only) psychologist to make a stink about the fact that there was no non-circular definition of test reliability. She was ignored by the big name psychometricians of her day, but her assertion stands, and has never been challenged, successfully. Her personal history is fascinating. In spite of blatant gender discrimination for decades, hers was an exceptional career as scholar, teacher, and researcher. TRIVIA ALERT: Jane Loevinger singlehandedly created the academic area of women's studies in the university. Please say a prayer of thanks, or give a moment of reflection, for her gift to all of us, women and men.

The biggest antecedent to the revolution occurred 37 years earlier. The philosopher H. Feigl published the article, “Philosophical Embarrassments of Psychology,” in the APA's flagship publication, American Psychologist, 1959, 14, 115-128. Feigl was one of the most influential philosophers in America, following his immigration just prior to the outbreak of WWII, and his appointment at the University of Minnesota. He is associated with ideas like philosophical analysis, logical empiricism, and scientific empiricism. With my penchant for oversimplification, I would like to say that much of his thinking, and influence, were summed up in two humble questions: “What do you mean?” and “How do you know?” Continuing with great brevity, I would like to say that his paper had two intended effects, and two that were unintended. First, Feigl correctly pointed out the serious flaws in the psychoanalytic traditions that still believed they were doing the Lord's work as good scientists. Their pretenses to empirical science were – shall we say – embarrassing. The second, and probably intended, result was to give succor, of a philosophical kind, to researchers who had had enough of the hitherto arrogant psychoanalytic pretenders to science. One of the unintended consequences was to give justification to the positivist behaviorists to seize the offices of the recently deposed, arrogant pretenders. Subsequently, it was harder to get your paper published if it spoke of cognitive function, mental process, meaningful verbal learning, and object relations theory in ego psychology. Scientific speculation resulted in the 666 branding of the foreheads of the incorrigible researchers. The second unintended result was the cumulative frustration, thirty-five years later, of the psychological research community, after three decades of a free hand with a positivist, reductionist research model did not get them any closer to answering important questions.

Richard Feynman, in an interview, summed up the lack of satisfying scientific progress in psychology, very nicely. He said psychology had adopted the proper scientific form, but we were not producing any laws of nature. Until we do, he said, psychology was a pseudo-science. Personally, as a psychologist and a scientist, that hurts. There are a few fundamental aspects about mental life and behavior that we have described, but, in the main, I have to agree with him. The solution is clear: get our philosophy and science right before we go ahead. Otherwise, we will lead each other down a path to more frustration.

What got us here, and how are we going to get out?

The Revolution confirmed the inadequacies of the established regime. What was so familiar to them in the past was, and still is, hard to see as wrong. For example, classical inferential statistics for psychological research appears to be joined at the hip to the experimental design that we teach in psychological research methods classes. The classical model presupposes an ongoing accumulation of data that stand apart from the researcher, who is objective and dispassionate. A Bayesian model of statistical inference compensates for many of the limitations of classical statistical inference. I won't go into all the goodies associated with a Bayesian approach. I want to focus, instead, on the major roadblock to incorporating Bayesian inference by social science researchers. What is untenable, if not downright unnatural for the classically trained, is that the Bayesian approach functions by modifying the beliefs and projected assumptions of the researcher. The researcher is prompted, constantly in the research process, to take a position on what is likely to happen, based on prior data. What happened to the objective, detached psychological scientist? The Bayesian focus of shaping the belief system of the researcher just doesn't compute for most investigators in the social sciences.

Those not familiar with a Bayesian approach to statistical inference assume there must be a corollary in the classical model. There isn't. I gave a talk on Bayesian sampling, some years ago, to our graduate and post-doc interns in industrial psychology at IBM. One of our very bright interns suggested that we could change the value of alpha, the probability of making a Type I error (incorrectly rejecting the null hypothesis when, in fact, it was true all along), using a classical model of statistical inference. I asked him if he would like to report to the executive of compensation and benefits that the percent of employees who were satisfied with their pay was 38 percent, plus or minus 41 percent. He would get thrown out of the executive's office, and asked to pack his bags and head back to the University of South Florida. That is the consequence of trying to use classical inference when Bayesian is more appropriate. It is a very different process that sounds like make believe to the uninitiated. It will be at least a generation, if it happens at all, before those trained in classical inferential statistics consider using a Bayesian approach.
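One way to read the pay-satisfaction anecdote is as a small-sample problem, and a rough sketch in Python shows the contrast. Everything numeric here is hypothetical: the sample of 8 employees, the prior of Beta(40, 60) standing in for "last year's survey said about 40 percent," and the use of a normal approximation (mean plus or minus 1.96 standard deviations) for the posterior interval. It is a conjugate Beta-Binomial update, not a reconstruction of the sampling method from the IBM talk.

```python
import math

def wald_margin(successes, n, z=1.96):
    """Classical (Wald) margin of error for a sample proportion."""
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)

def beta_update(prior_a, prior_b, successes, n):
    """Conjugate Beta-Binomial update; returns the posterior (a, b)."""
    return prior_a + successes, prior_b + (n - successes)

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Hypothetical new sample: 3 of 8 employees say they are satisfied with pay.
successes, n = 3, 8
margin = wald_margin(successes, n)

# Hypothetical prior belief, worth about 100 earlier respondents at 40%.
post_a, post_b = beta_update(40, 60, successes, n)
post_mean, post_sd = beta_mean_sd(post_a, post_b)

print(f"classical: {successes / n:.0%} +/- {margin:.0%}")
print(f"bayesian:  {post_mean:.0%} +/- {1.96 * post_sd:.0%} (normal approx.)")
```

The narrower interval is not free: it comes entirely from the researcher committing to a prior, which is the step that feels unnatural to the classically trained.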

Another thing we must do is to understand the stifling effect on progress in psychological science by: 1. Our philosophy that fails to relinquish the World of Ideal Forms; 2. the tautology of the statistical theory of mental tests; and 3. the assumptions that our models of the distribution of traits are depictions of reality. No matter how closely an observed distribution of a measured psychological trait APPEARS TO LOOK LIKE a Normal Bell Curve, or another idealized distribution, we must understand that the curve is a mathematical model, a human construction, that is used because it has utility. The model of the Normal Bell Curve is no more a depiction of reality than Ptolemy's model of the universe. Ptolemy's model was accepted as the truth of reality because reality, as it was perceived, fit it perfectly. Ptolemy's description of reality allowed western civilization to make very accurate calendars, and predict events that were so important to sustaining civilization, like when to plant. Observation fit his idealized curve, exactly. It was so successful, and accepted as obviously real, that it obviated the need to explore a different model of reality for many centuries. When it was virtually synonymous with truth, as determined by the church, the arbiter of all truth, investigation into new ideas was aborted, discouraged, or persecuted.

The problem for PTT is not that it isn't very useful, in the way that Ptolemy's model of the universe was very useful – in fact essential to the survival of whole peoples. The problem is that the current state of PTT limits progress in psychological science. Here's how. Let's look at the philosophical straitjacket that is Plato's Ideal Forms. One of the fundamental assumptions of Forms is that, since they can't be observed directly, we can only observe their manifestation in the World of Experience as successive approximations of the REAL THING, which REALLY EXISTS. Personal experience over a lifetime gets us ever closer to the truth because, by definition, life is an accumulation of closer, successive approximations. So what the hell is wrong with that, you ask. Plenty! Assuming an unchanging reality that we continue to approximate completely shuts off the option to chuck the whole thing, say it was all bullshit, and start over with something (Monty Python) completely different. This was the quandary that Kepler and Bruno were in. Since we all know the earth is at the center of God's creation, then there must be something wrong with our data – worse still, we have to discount them as an illusion.

Let's take a look at Isaac Newton's work on physics and astronomy, resulting in his Principia. The mathematical principles of the day, influenced greatly by Archimedes, Pythagoras, and Hindu and Islamic scholars, were insufficient for the work he was doing. So he invented a new system of mathematics that would work for him – Calculus. (Calculus was also invented independently and contemporaneously by Leibniz.) Try to imagine Newton trying to do his research with only Pythagorean mathematics, back in the time when Pythagoras was keeping secret his discovery of irrational numbers, and solid geometric constructions like the dodecahedron. Newton would have nowhere to go. Imagine the Church saying, this is the extent of truth, and there is nowhere else to go, anyway. This is the highly circumscribed situation we find ourselves in regarding psychological science, in general, and PTT in particular. We accept representational models as actual reality. We can't see the tautologies for the forest, because we've become too accustomed to using them as if they were legitimate scientific theories based on observation. Einstein's general relativity was not an extension, elaboration, or refined approximation of Newton's work on gravity and motion. Einstein threw it all out. Newton was almost completely wrong. Sometimes scientific progress is incremental, but let's not confine ourselves in a scientific prison with highly circumscribed assumptions before we even begin.

Thank you for taking the time to read and, hopefully, comment on my ideas. At another time I will return to the intricacies of PTT and discuss them in more technical detail. However, that will not be for a while. Next month I will return to a more familiar genre of non-fiction. All I will reveal at this time is the title, “My Life as an Observer: Target Practice.” See you on September 14, 2009.