Monday, December 07, 2009
Psychological Science: Measurement, Uncertainty, and Determinism – Part 1
by Norman Costa
Scientific Psychology, an Oxymoron?
In the minds of many, including scientists from the more successful sciences, the field of psychology is not a science and may never be a science. The Nobel Laureate, Richard Feynman, was kind in his criticism of psychology as a science when he said that we have the form [of science] down, but we are not producing any laws of nature. In my view, psychology as a science has made some important contributions to describing mental life and behavior in animals and humans, but, on the whole, I tend to agree with Feynman.
I care about psychology as a science, very deeply. But why should I care? Why should anyone care? Those of us in scientific psychology must care if we are to have confidence in our discipline as a science and in ourselves as scientists, and to be respected by the larger scientific community as colleagues on an equal footing. For anyone who doubts that psychological science has a serious credibility problem, consider the following:
- The American Psychological Association (APA) did not issue a position in support of science in the classroom in the recent Dover, PA school case (Kitzmiller v. Dover Area School District). The case involved the integrity of the school district's science curriculum for the teaching of evolutionary biology, and against the introduction of faith-based pretenses to science.
- The ascendancy of the Association for Psychological Science (APS) is due, in no small way, to the failure of mainstream psychology to embrace the mantle of science.
- The de facto secession of Division 14 from APA and the creation of an independent professional association, the Society for Industrial and Organizational Psychology (SIOP).
Students today are not as [insert your own text] as when I was a student.
I began teaching psychological research methods at the undergraduate and graduate levels after two careers in research psychology – one at IBM Corporation, and another as a social science research consultant. The new mission I gave myself was the training of the next generation of research psychologists. Very quickly, I fell in love with teaching, and I fell in love with my students (not inappropriately, I might add). For the father in me, it felt like having my children back home again.
So it is with some embarrassment and guilt that I confess to entertaining the thought that my students didn't know anything. I couldn't believe they were as ignorant as they appeared. Is it really true that students today are not as good as students when I was an undergraduate?
I mentioned this to a colleague and friend, more seasoned than I, who, quite a while before I did, entered college teaching after a long and successful corporate career. In the first semester of his new role as educator of the neonate professional, he came to the same conclusion as I. He complained to his wife that his students were ignorant of practically everything that mattered. They didn't know a damn thing. The light of perspective descended upon him when she suggested he put himself back in time to when he was an undergraduate. How much did he know about anything that mattered? He came to the conclusion that he had been equally ignorant. He didn't know a damn thing either at that age. I considered myself lucky that he passed on his wisdom to me at the time that I could best profit from it.
I became comfortable with the fact that my students were as smart as I was at their age, if not a lot smarter. If they were ignorant about important matters in life and scientific psychology, then it was my job to impart the requisite knowledge and skills. They might not know a damn thing coming into my classes, but they were sure as hell going to know a lot when they went out. At the end of each semester, I would praise my students for the fine work they did. I told them, however, that I had one complaint about them: they did not argue enough with their teachers, including me. I didn't want them to be disputatious for its own sake; rather, I wanted them to question their own willing accession to 'truth delivered by the professor', and to challenge ideas that didn't make sense to them.
Academic Psychology, Today
Unfortunately, my newfound appreciation for, and confidence in, my students were not matched in my rediscovery of academic psychology. In my earlier “Psychological Science” articles, I described significant shortcomings in psychological test theory. The problems were as fundamental as they get:
- A philosophy of science that is founded on Plato's Ideal Forms;
- Regarding mathematical constructs, like the normal curve, as if they were derived from nature and actually represented nature;
- Failing to see psychological test theory as a tautology and not a real scientific theory;
- And equating the high utility of statistical models with scientific evidence of truth about nature.
My disillusionment, the grating on my intellectual nerves, and my uppity reactions had three sources: the many errors in some psychological research methods texts, my role as a reviewer of technical papers for one of my professional associations, and the indifference of some of my colleagues in various institutions. For some colleagues, the indifference was so forced as to thinly veil a seething hostility. On my part, as an example, I suppose it didn't help that I was quick to describe a long-used and treasured text as a 'piece of crap.'
I remember, very well, my frustration with the undergraduate text I was using. I didn't like the examples. The beginning of each chapter seemed to read well, but the latter half diluted the earlier clarity and became confusing. Some of the graphics had style elements that, I thought, were supposed to contribute something to conveying meaning. They didn't. They were gratuitous and distracting. There were substantive errors throughout the text, and statements by the author that were just wrong.
The moment of truth came for me in the chapter on measurement scales. I told my students that my frustration level had reached the limit, and it was no more Mr. Nice Guy when it came to the shortcomings of the text. In the five prior chapters we dealt with fundamental concepts of science and experimental design. In chapter six we were getting down to the meat and potatoes of all scientific research.
Q. What makes science different from other approaches to understanding nature and ourselves?
A. Observation and the recording of data.
Q. And what is it about the recording of data from observation that allows science to be science?
A. The concept of measurement.
And here is where the fun starts for scientific research.
But there was no fun in that chapter on measurement. “The mighty Casey had struck out.” I was mad as hell and I wasn't going to take it anymore. The textbook author expounded upon different types of measurement scales, and the use of frequency distributions, percentiles, measures of central tendency, and standard scores. Never once did he bother to define the concept of Measurement. There was no entry for Measurement in the book's glossary. So what was a frustrated and impatient research methods teacher supposed to do?
The Madeline Theory of Measurement
It was at this point that I first developed my Madeline lecture on Measurement. I opened my lecture on chapter six with the opening stanza of the classic illustrated children's story, “Madeline,” by Ludwig Bemelmans:
"In an old house in Paris
that was covered with vines
lived twelve little girls in two straight lines.
They broke their bread
and brushed their teeth and went to bed.
They left the house at half past nine
in two straight lines in rain or shine
—the smallest one was Madeline."
This wonderful children's story captured the attention of any student, no matter the circumstances of their upbringing. Almost everyone could associate the story with warm memories of being read to as a child. But my use of the story's opening had more of a function than eliciting childhood memories. I focused on the opening lines, “In an old house in Paris that was covered with vines...”. I began the substantive part of the lecture by saying that in an old government building in Paris, a half dozen levels below street level, were a number of objects that were guarded day and night, and stored under strictly controlled conditions of temperature and humidity. Among those objects were a platinum-iridium bar and a platinum-iridium cylinder. The bar was the universal standard of length for the meter. The cylinder was the universal standard of mass for the kilogram. They were deemed universal standards as a result of international treaties going back more than a century, professional associations, industry groups, and international standards bodies.
Measurement is a comparison to a standard. Without this understanding, there is no measurement. This raises the questions of how standards are determined, and who or what makes the determination. Not a few of my students were miffed, initially, at the answer to the first question: We make them up! Part of my lecture discusses the first person to articulate the notion of relativity. It was Galileo, not Einstein. Galileo was the first to articulate the idea that there is no absolute frame of reference for position and motion. Pick a point of reference that makes sense, and you have a standard. This leads to some interesting examples about settling on a standard for length based upon the distance from the King's nose to the tip of his longest finger on an outstretched arm. I tell my students that if they and some of their friends agree to use some arbitrary determination of length, as an example, then they have a standard. If lots of people agree to use their standard, then they have a unit of measurement.
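The core idea of the lecture can be put in a few lines of code. This is only an illustrative sketch: the "king's reach" unit, its numeric value, and the function name are invented for the example, standing in for any arbitrary reference quantity a community agrees to use.

```python
# Measurement as comparison to a standard: a magnitude becomes a
# measurement only when expressed as a ratio to an agreed-upon reference.

# An invented unit for illustration: suppose a community fixes the
# "king's reach" as its standard of length.
KINGS_REACH = 0.9  # the standard, in some underlying physical magnitude

def measure(magnitude, standard=KINGS_REACH):
    """Express a magnitude as a multiple of the agreed standard."""
    return magnitude / standard

# A plank of underlying magnitude 1.8 measures 2.0 king's reaches.
print(measure(1.8))  # 2.0
```

Change the standard and every reported number changes, but the thing measured does not; widespread agreement on the standard is all that turns it into a unit of measurement.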
Finally, I make the point in my lecture that standards do not last. The old house in Paris, the depository for many of the world's standards of measurement, will someday become a reliquary. Electronic and atomic-level standards are being developed and adopted, and will render the old artifacts obsolete. The meter, in fact, has already been redefined in terms of the distance light travels in a fixed fraction of a second.
Enter The Graduate Students and Their Instructors
At the level of graduate study in the social sciences, and among research psychologists with doctoral degrees, the problem of defining Measurement can become downright ludicrous, if not professionally embarrassing. I see this in the technical papers I review for professional conferences. The definition most often cited is from a 1946 paper by S. S. Stevens, “On the theory of scales of measurement,” Science, 103, 677-680. It reads:
“…[M]easurement, in the broadest sense, is defined as the assignment of numerals to objects or events according to rules.” (p. 677)
Stevens' definition of Measurement is totally useless. I could go even further: it is one of the worst definitions I've ever seen. In common parlance, “It really sucks!” Upon closer scrutiny, it is not a definition of anything that has value or utility for the social sciences. I can assign numerals to phenomena according to 'rules' that have nothing to do with comparisons to standards. The result would be a collection of assigned scores that are totally meaningless for any measurement purpose in psychological research.
For example, I approach a person on the street and ask if I may assign that person a number. Given a positive response, I put my hand into a bag full of coins of different monetary values (U.S. legal tender coinage) and pull out a single coin. I determine the face value of the coin and assign that numeral to the individual. I pull out a nickel (a U.S. five-cent piece) and say, “You are a five.” As I leave to approach another person on the street, my graduate student assistant asks the individual to “...pick a number, any number, between 1 and 10.” The picked number is recorded along with the coin number. A final score (a “measurement”) is determined by multiplying the coin number by the picked number. The picked number was four, so the final measurement value is 20 (5 × 4 = 20). Yes, this definition really sucks!
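The street-corner procedure can even be written out as a rule-governed assignment, which is exactly what makes the point: the sketch below follows Stevens' definition to the letter, yet compares nothing to any standard. The names and the seeding are mine, purely for illustration.

```python
import random

COIN_VALUES = [1, 5, 10, 25, 50]  # face values of U.S. coins, in cents

def assign_score(rng):
    """Assign a numeral 'according to rules' -- but to no standard."""
    coin = rng.choice(COIN_VALUES)   # draw a coin from the bag
    picked = rng.randint(1, 10)      # "pick a number, any number, 1 to 10"
    return coin * picked             # the final 'measurement'

rng = random.Random(0)               # seeded only to make the run repeatable
scores = [assign_score(rng) for _ in range(5)]
print(scores)  # perfectly rule-governed numerals that measure nothing
```

Every score is produced by an explicit, repeatable rule, so it satisfies the cited definition; none of the scores says anything about the person it was assigned to.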
The only way the Stevens definition could work is if the 'rules' involved a comparison to a standard. I have seen a few authors who appended the idea of 'a comparison to a standard' to the Stevens definition. When that happens, the centerpiece of Stevens' definition, 'rules', becomes superfluous. The definition reduces to, “...the assignment of numerals to objects or events according to...[a comparison to a standard.]”
Most authors of technical papers and books who cite the Stevens definition of Measurement do not realize that this is not Stevens' definition at all. In fact, there is some doubt that he accepted it as a definition of Measurement, or, at the very least, that he accepted it as a good one. To understand this you have to go back to the work of the Ferguson Committee, established by the British Association for the Advancement of Science in 1932. The purpose of the committee was to determine whether or not real scientific measurement was possible for the psychological sciences. In other words: Is psychology a real science or not? The Ferguson Committee, dominated by N. R. Campbell, an important figure in the philosophy of science for the physical sciences, answered the question with a resounding “No!” Of course, the report put its response in a highly technical treatment, focusing on Campbell's theory of scientific measurement based on physical additivity, the structural additivity of the mathematician Otto Hölder, and, in my personal view, the fact that Campbell would rather die from eating bad shellfish than recognize psychology as a science.
Stevens' definition of Measurement was, as he states, a paraphrasing of Campbell's definition, although he does not give us Campbell's definition from the final Committee report. In Campbell's view, measurement involved the assignment of numerical values to phenomena according to scientific laws. This meant that psychologists had to conduct experiments to demonstrate the properties of physical and structural additivity in psycho-physical, psycho-social, and psychological measurement. Physical additivity is akin to taking many one-foot rulers and laying them end to end alongside a much longer object to be measured: add up the number of rulers, and you have a measure of the length of the object. Structural additivity was a set of mathematical axioms developed by Otto Hölder and published in 1901. Today we see these axioms in our first courses in algebra. For example,
- For any magnitudes a and b, exactly one of a = b, a < b, or a > b holds.
- For any magnitudes a and b, a + b > a.
- Order does not matter (commutativity): a + b = b + a.
- Grouping does not matter (associativity): a + (b + c) = (a + b) + c.
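Since concatenating rigid rods end to end behaves like numeric addition of their lengths, these properties can be checked mechanically. A minimal sketch, with the function name and sample magnitudes invented for the example:

```python
from itertools import product

def satisfies_additivity(magnitudes):
    """Check Hölder-style additivity properties on every triple."""
    for a, b, c in product(magnitudes, repeat=3):
        if sum([a < b, a == b, a > b]) != 1:  # trichotomy: exactly one holds
            return False
        if not a + b > a:                     # adding a magnitude increases it
            return False
        if a + b != b + a:                    # commutativity
            return False
        if a + (b + c) != (a + b) + c:        # associativity
            return False
    return True

# Positive rod lengths (chosen to be exact in binary floating point):
print(satisfies_additivity([0.5, 1.0, 2.25, 3.0]))  # True
# A zero-length 'rod' violates a + b > a:
print(satisfies_additivity([0.0, 1.0]))             # False
```

The point of the demonstration is that physical concatenation of lengths satisfies the axioms; the open question for psychology was whether any psychological attribute does.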
Stevens ignored the report of the Ferguson Committee. In short, he dismissed the matter entirely and felt that they simply got it wrong. He did not bother to answer the call for experiments that would demonstrate additivity. It fell to later psychological researchers and statisticians, in the 1960s, working on what became the theory of additive conjoint measurement, to develop the mathematical proofs that scientific measurement was clearly in the domain of psycho-physical, psycho-social, and psychological science.
Stevens' Theory of Measurement Numerical Scales
Stevens went on to develop his theory of measurement scales, which is well known to all students of psychological research methods. On this page is the graphic of Table 1 from his 1946 article. He asserted, correctly, that different types of measurement scales derive from the different measurement operations we use to produce them. This is of utmost importance to psychological science because, depending upon the type of measurement scale a researcher is using, different decisions must be made about how to analyze the data. For example, you can't compute averages for nominal measurement scales like religious affiliation. Suppose the numeral '1' represents a Southern Baptist, '2' represents a Zen Buddhist, and '3' represents a Sufi Muslim. It is meaningless to compute an average religious-affiliation score for a sample of people for which you have data.
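The religious-affiliation example can be made concrete. In the sketch below (the sample responses are invented), the arithmetic mean of the nominal codes is computable but labels no one, while the mode respects what a nominal scale actually is:

```python
from statistics import mode

# Nominal codes: numerals that merely label categories.
AFFILIATION = {1: "Southern Baptist", 2: "Zen Buddhist", 3: "Sufi Muslim"}

sample = [1, 1, 3, 2, 1, 3]  # invented responses

# Computable, but meaningless -- 1.83 is nobody's religion:
bogus_mean = sum(sample) / len(sample)

# The mode is the only defensible 'central tendency' for nominal data:
typical = AFFILIATION[mode(sample)]
print(round(bogus_mean, 2), typical)  # 1.83 Southern Baptist
```

Swap the arbitrary code numbers around (say, 3 for Southern Baptist and 1 for Sufi Muslim) and the mean changes while the sample does not; the mode is unaffected, which is exactly why the scale type constrains the analysis.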
Stevens thought he was developing a theory of measurement that would yield a definition of Measurement based upon the operations required to produce measurements. He was greatly influenced by the concept of operationalism in the work of his fellow Harvard faculty member Percy Bridgman, a Nobel Laureate in Physics. In the end, he did no such thing. A close examination of his table shows that his theory of measurement is really a self-contained, mathematical description of the properties of different numerical scales. What he developed was a theory of numerical scales, not a theory and definition of Measurement. He constrained himself to the confines of the internal mathematics involved, and never ventured to examine the relationship of a fundamental or derived measure to a standard. He was stuck on the fact that the differing operations [different 'rules'] that were applied would impute different properties to the assigned numerals. He was correct, as far as that went. But we still don't have a definition of Measurement.
Throughout Stevens' discussions of Nominal, Ordinal, Interval, and Ratio scales, Measurement as a comparison to a standard is implied to the point that it almost jumps out and bites you on the nose. He was so focused on the operational aspects and internal mathematical properties of the resulting scales, that he simply missed the essence of what he was talking about. He couldn't see the forest [the concept of Measurement] for the trees [the scale properties that are contingent on the operations of measuring]. If he thought it was too obvious to mention, which I highly doubt, then he should have said it out loud. The fact is that he already committed himself to a useless definition of Measurement. Stevens redeemed himself, thankfully, by giving psychological and behavioral sciences an understanding of different types of numerical scales that serves us to this day.
Measurement is a comparison to a standard
E. L. Thorndike, whose career bridged the 19th and 20th centuries, and his contemporaries were keen on developing the scientific foundations of psychological and behavioral research. He was the first psychologist to attempt a codification of the properties of scientific measurement in the social sciences. Though his effort fell short – he never captured the idea of comparison to a standard, for example – it was an impetus for others who followed, and who sought to elevate psychology to a science. We are not there yet. But we can get there if we get our fundamentals in order.
How have we come this far without a consensus for a good definition of Measurement?
Is it really true that so many of us haven't a clue, that as a science, and scientists, we are missing something very important?
In Part 2 of this article, I will discuss the issues of Uncertainty, and Determinism in the science of psychology. I think a good subtitle for Part 2 might be: “How Psychologists Quote Heisenberg and Drive Physicists Up the Friggin' Wall!”
Thank you for reading. Please COMMENT, PRAISE, CRITICIZE, SUPPORT, DENOUNCE, ARGUE, and DEFEND as you are inclined to do. All of your observations help me, enormously, in developing my ideas. I'll see you here at 3Quarksdaily.com on January 3, 2010. Happy Holidays and Happy New Year!
Posted by Norman Costa at 12:03 AM