Today’s post is the first in a three-part series about benchmark testing. Many of us are familiar with benchmark tests. They are often given at the start of the school year, which is why now is a good time to have this discussion. If your school is like many others, benchmark tests will be administered not only in the fall but also several times throughout the year to gauge where students stand relative to well-defined outcomes.
Despite their frequent use, there’s a lot of confusion surrounding benchmarks. What are they? What purpose do they serve? Are there different kinds? How can they be used most effectively? These are all questions that we in Edmentum’s research department field regularly and, frankly, grapple with ourselves. Our goal with this series is to explore some of the nuances of benchmark assessment. Let’s start by reviewing the basics and clarifying key definitions.
Why administer benchmark assessments?
Educators know that it’s all about student outcomes. One might say that the purpose of a benchmark is to influence well-defined outcomes: benchmarks are a form of assessment for learning, given in preparation for the assessment of learning that summative tests eventually provide. Notice the preposition in the phrasing; benchmark assessments are for learning, not of learning. A benchmark is given when there is still time to intervene, and it should alert the teacher to how things are going, much as a checkup at the doctor’s office is meant to promote good health in the future. Teachers administer benchmark tests to check that students are on track to achieve the relevant learning outcomes; if students are not on track, it’s not too late for the teacher to adjust instruction. That’s the point of benchmarks in a nutshell.
To better grasp the very basic distinction we’re making between benchmarks and outcomes, think of your college classes. You had a final exam to judge whether you met the required outcomes. You usually had a midterm test to see if you were on track to pass the class. In that picture, the final exam was the outcome test, and the midterm was the benchmark test.
Are there different types of benchmark assessments?
Of course, K–12 education is quite different from college, and the simple distinction between outcome and benchmark is too coarse. For K–12, we need one more distinction: benchmarks can be divided into two kinds, interim tests and formative tests. Here we follow the terminology provided by Dr. Marianne Perie, director of the Center for Assessment and Accountability Research and Design (CAARD). For more detail on evaluating benchmarks, see the article by Dr. Perie and her colleagues Scott Marion and Brian Gong, “Moving Toward a Comprehensive Assessment System: A Framework for Considering Interim Assessments.”
An interim assessment is administered during instruction to measure knowledge and skills, with the aim of informing educator or policymaker decisions at the state, district, or classroom level. Results are reported at the aggregate level. For example, the Measures of Academic Progress (MAP) and the Renaissance Star assessments have been adopted by districts and states as interim assessments. Throughout the year, schools and districts report scores averaged across students, schools, and districts.
The emphasis of an interim assessment tilts toward administration and accountability. Is there another kind of benchmark that is perhaps more organic than interim assessments? Something that feels more intimately related to instruction? Well, some teachers give quizzes frequently during instruction. A quiz is like an interim assessment in that it comes before the final and is meant to positively influence the results on the final assessment. But quiz scores are not generally used beyond the classroom level, and they are typically not aggregated to the school or district level. Quizzes are examples of a different kind of assessment known as formative assessments.
A formative assessment is typically used solely by the teacher to check if instruction is having its expected result on individual students, to verify that they are on track to achieve all relevant outcomes, and to diagnose gaps in learning. Thus, formative assessment might be thought of as an essential part of an individualized learning infrastructure. That’s not always the case, but thinking about it this way gives us insight into formative assessments.
One more thing needs to be said about formative assessments. Many years ago, the classroom quiz was certainly the prototype of a formative assessment. But educational technology has matured to the point that we can now automate the preparation and administration of high-quality formative tests. These tests can be as reliable and valid as the well-known end-of-year tests or the third-party interim assessments, yet they are used strictly at the classroom level and only to assist the teacher.
So, one should keep in mind that simplicity is not what makes an assessment formative. What makes an assessment formative is that it pertains strictly to the individual student and lacks the administrative and accountability aspects of interim assessments.
Formative assessments span a diverse range of techniques, including exit tickets (brief, ungraded questions given at the end of class), quizzes, thought papers, oral preliminary exams in graduate school, and computer-assisted exams. These are widely different in aims, complexity, time, and implementation.
Is one type of benchmark assessment best?
These definitions would seem to suggest that interim tests are of higher quality, or more reliable and valid, than formative tests. But as we’ve seen, it is a mistake to draw that conclusion. In fact, a formative assessment can be a better choice for some settings. Whereas interim assessments tend to be given under highly standardized conditions, formative tests, which lack the high-stakes implications, can be administered across a wide variety of technology platforms and settings. Because the purpose is not to compare one child to another, but rather to support the unique needs of each child, formative testing may be the right choice for teachers, schools, districts, and states in the many situations where determining student needs and tailoring instruction are the goals.
There is one caveat. Keep in mind that whether a benchmark test is properly designated as interim or formative is determined primarily by use: when the test is administered, how often, under what conditions, where the scores go, and what decisions are made from the data. Indeed, many interim or summative end-of-year testing programs release their retired items for use in prep courses and practice testing materials. The same item can thus serve different uses.
The goal of this blog was to dig into benchmark testing to clarify the terminology surrounding these tests. We tried to show two very different use cases for benchmark tests. Now that we can speak fluently about formative and interim assessments, we can look forward to the next installments of this blog series where we will look at the different ways that these tests can be administered and used. We will find that it all comes down to what the state, district, school, and teacher hope to accomplish with benchmark testing. And, hopefully, by the time we finish the series, teachers will feel more knowledgeable in making decisions about what kinds of assessments they should choose to best help their students.