All too frequently I come across a research paper in which the author asks (and answers) the important question of test validity this way: “Smith and Jones (2011) found that Skitch's test of fluency has adequate reliability and validity.” That's all, no detail, just a magical incantation and voila! Skitch gets a gold star.
This reminds me of a joke my family and I use during road trips. One of us will say, “Hey, I know a place that has the world's best cup of coffee!” The rest of us then ask, “How do you know it's the world's best cup of coffee?” and the person will answer, “Because look, it says so right on the sign!”
The same thing is going on in the example of the research paper. Can't Smith and Jones (2011) hold up a sign to assure us that the “test is valid” and thereby settle the question?
No, they can't. At least, not without a lot more discussion.
A test, strictly speaking, is never valid. A shocking thing to say, no doubt, because it seems to go against what we remember being taught about test validity in high school and college. But it's true; a test is never valid.
Now, I realize that validity is a shorthand term that refers to something more substantive than the test itself. But how many people know what that “something more substantive” is? My goal in this blog post is to help you think more clearly about validity.
The word valid is an adjective. And an adjective, we all know, modifies a noun, a something. And that something is not the test in and of itself.
Let me introduce you to an explanation of validity from the latest (and in my opinion greatest) edition of the Standards for Educational and Psychological Testing, published by the American Educational Research Association (AERA). This publication includes nothing short of a master seminar on truly clear thinking about the question, “What is validity?” Here is their definition:
Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests.
Let that sink in. Validity is about evidence. It is about the reasons you give to justify using the test. When a psychometrician refers to test validity, what they have in mind is the complete set of articles published in professional journals, internal technical reports, and white papers that describe in detail how well the test scores do the thing they are intended to do.
For example, the Scholastic Aptitude Test (SAT) has been widely studied to determine how accurately it predicts college GPA. A study of its predictive accuracy is called a validity study, and based on such a validity study, one can make the following circumscribed claim: the SAT is valid for the purpose of predicting college GPA. In truth, even this circumscribed claim is far too broad to be of much use, but at least we're getting somewhere.
The main thing you need to know about validity is that the root adjective “valid” does not stand alone. It has to point to something very concrete. Here's a little thought experiment to convey my meaning. When you have an impulse to use the word valid in reference to an educational test, try to substitute the word suitable in its place, and notice the tension it leaves in your mind.
Imagine an acquaintance invites you to a social function, saying that you would probably be a suitable companion for the occasion. Are you at all curious what that means, exactly? You want to say, “suitable for what?” To impress people? To shock them? To make a statement? To fill an empty chair you couldn't get anyone else to fill? If the goal is not stated explicitly, then your friend's calling you a suitable companion is at the very least coy, and at the very worst, a big red flag.
When talking about the concept of suitability, we easily recognize that there is an end goal in mind. Although it can be less readily apparent, the same is true of the concept of validity. And when talking about test validity, pinning down that end goal is essential.
In the world of testing, here are two answers you should never give to the question “suitable for what?”
Unsatisfactory answer #1: “The test is just suitable, that’s all.”
Even worse answer #2: “The test is suitable for anything at all.”
Let's flesh out this concept of validity a little further. When we use the word valid, we have an object in mind. A test is valid with respect to a concrete situation. A concrete situation, I propose, can always be said to implicitly involve a Who, What, When, Where, and How:
Who do we propose will take the test? What grades do the items cover? Can the test be taken out of grade level, and if so, by which grades? Is there evidence that the test applies fairly to different groups? If not, for which groups are inferences about test scores trustworthy?
What is being inferred about students on the basis of their test scores? Are they being placed in a Special Education program? Are they admitted to college? Are they denied a driver’s license? Is their end-of-year test performance predicted? Are we predicting success in an advanced placement curriculum? Are we diagnosing a psychological disability?
When is the test being used? Is it during the fall when school is starting? Is it in the spring after learning has occurred? Is it after high school? During high school? Is the test meant to be given immediately after a curriculum lesson?
Where is the test being administered? Is it assumed that the student is in a non-distracting environment? Is the test proctored? Is it possible to cheat or consult notes? Can the student take breaks and leave the premises?
How is the test administered? Is it an oral test? A computer-based test? A paper-and-pencil test? Is it given in the student’s native language? How was it translated? Was the essay question scored online by a machine algorithm or offline by a teacher?
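One way to internalize this checklist is to notice that a defensible validity claim is really a record with all five fields filled in, never a bare label on the test itself. The sketch below makes that concrete; the `ValidityClaim` type and the example values are hypothetical illustrations, not anything from the Standards.

```python
# Hypothetical sketch: a validity claim modeled as a record tied to a
# concrete situation (Who, What, When, Where, How), rather than a
# property of the test in and of itself.
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidityClaim:
    who: str    # intended test-taker population
    what: str   # inference drawn from the scores
    when: str   # point in the student's trajectory
    where: str  # administration environment
    how: str    # mode of administration

# Example: the circumscribed SAT claim from earlier, spelled out in full
sat_gpa_claim = ValidityClaim(
    who="college-bound high-school seniors",
    what="predicting first-year college GPA",
    when="spring of senior year",
    where="proctored testing center",
    how="timed, computer-based administration",
)
print(sat_gpa_claim)
```

The point of the exercise: leave any field blank and the claim is no longer checkable, which is exactly the problem with a bare assertion that "the test is valid."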
When you consider all the combinations of these questions, it’s easy to see that there are an infinite number of ways in which a test can be suitable, or valid. This is why test makers never want to communicate a blanket claim of test validity. A test is a living, breathing thing that begins its life as a blank slate, but gradually proves itself (or not) by means of how it performs when it is actually put to use in specific situations.
The only way to establish validity and understand it more comprehensively over time is to keep asking questions. And as educators, we're all good at that.
Interested in learning more about Edmentum’s approach to testing, validity, and efficacy? Check out this post on Study Island Benchmarks & Predictive Validity! You can also take a look at our white paper on Construct Validity in Study Island.