EMERGING DIALOGUES IN ASSESSMENT

In Defense of Standardization: Let Us Move On to an Actual Villain

February 4, 2019

Caroline Prendergast & Keston Fulcher

This paper continues a conversation begun at last year’s AALHE conference in Salt Lake City. Natasha Jankowski, Javarro Russell, and Keston Fulcher (this paper’s second author) were asked a series of questions by panel moderator Ross Markle. When the topic of standardization was raised, the crowd exhibited discomfort. A panelist asked the audience: what is the first word that comes to mind when you hear "standardization"? The first response was "bulls**t." Other responses followed. The negative theme persisted. We concluded that many higher education assessment practitioners feel that standardization is bad while non-standardization is good.

However, our own experiences have conflicted with this knee-jerk “standardization is bad” sentiment. We’ve witnessed multiple situations where a lack of standardization produced meaningless results. For instance, Fulcher once worked on a writing assessment where short poems were evaluated alongside 20-page technical biology reports and everything in between. The same rubric was applied to all papers. The subsequent “writing scores” were averaged and then reported as [the institution's] evidence of students' writing proficiency. These results neither reflected students' writing proficiency nor were helpful in thinking about how the institution could improve students’ writing. The most glaring methodological flaw was the lack of standardization regarding writing products. Poetry versus technical biology lab reports? It’s a higher education example of comparing apples to oranges.

In this paper, we argue that careful standardization is necessary for quality assessment practice. We also challenge assessment practitioners to think more carefully about the topic. We first provide common criticisms of standardization. Second, we argue that the general arguments against standardization stem from a lack of stasis, or common ground, in our discussions. Third, we suggest that we first think about the intended inferences and uses of our assessment, and then think through where and to what degree standardization is appropriate. Fourth, we provide several points in the assessment process where one should consider what degree of standardization is appropriate. Lastly, we hope to re-direct assessment professionals’ wrath to a true enemy of good assessment: practices that do not align with goals.

The question of finding the "sweet spot" of standardization was brought up on the ASSESS LISTSERV.

We wish to acknowledge the contributors[1], whose comments helped us frame the paper. (To view the conversation, follow this link and search for "sweet spot": https://lsv.uky.edu/scripts/wa.exe?A1=ind1812&L=ASSESS.)


The Loose Conceptualization of Standardization

The loose gestalt of assessment standardization in higher education looks something like this: an external entity - like a regional accreditor - requires assessment. As a response, a mid-level administrator chooses a multiple-choice test from a testing company like Educational Testing Service (ETS) or ACT. Faculty have little say in the test selection process, nor do they buy into the decision. If lucky, the program in question is able to obtain a representative sample of seniors. They take the test—out of class—with little personal consequence. The school receives a fancy quantitative report, comparing School X’s program with user-based norms. This information is reported out to the regional accreditor, but no one internal to the university believes the information is useful. Faculty state that the test is not aligned with their curriculum and note that most students gave little effort on the test.

This dark scenario of "standardization" resembles what Ewell (2009) referred to as the exo-skeletal assessment model that prioritizes compliance and reporting. Nevertheless, the above scenario covers tremendous territory, not just standardization. Larry J. King from the AALHE listserv commented that standardization needed to be better defined. He asked if we were referring to standardized (commercial) tests, standardized assignments, or just standardization in general. As outlined in the scenario above, standardization could mean any of these things individually or in combination.

 

Roots of the Problem: Finding Agreement

The rhetorical concept of stasis is useful in understanding how the argument about standardization has become so polarized. Stasis is the process of questioning the basic tenets of an argument prior to its discussion (Raign, 1994). Through stasis, we clarify what it is that we are discussing. We also determine the scope of the discussion as well as the areas in which we agree. Stasis theory requires consideration of the nature of the issue at hand by intentionally pausing our discussion. During this time, we can whittle away at the things that may distract from the argument: What do we know to be true? Upon what do we disagree? When these questions are answered, they can help us engage in more productive arguments by identifying the areas where we agree and the areas where we diverge. If these questions are ignored, we can end up lost in unproductive arguments.

Allow us to pause, then, to consider what we are discussing when we talk about standardization. What do we mean by the term? Where do we agree? Where do we disagree? In reviewing the ways in which we discuss standardization in higher education assessment, it seems clear that we do not agree on the very nature of the issue. Without this baseline agreement about what constitutes standardization (let alone when and how it poses problems), we cannot move forward with a useful discussion about standardization. For the following discussion, let us define "standardization" as any methodical control exerted over the assessment process to ensure some degree of comparability, similarity, or agreement. In doing so, we acknowledge that the forms and degrees of standardization vary widely.

Below, we will lay out a few of the various ways in which assessment practitioners conceptualize standardization. Our hope is to demonstrate that standardization itself is not problematic. Instead, standardization poses problems when it does not serve our goals for assessment.

 

How Should We Think About Standardization?

Our discussions about standardization may become more nuanced when we contextualize the conversation within the goals of our assessment processes. Therefore, our first question should not be, "how standardized is our assessment?" Instead, we might begin with the following questions:

  • What is the purpose of the assessment?
  • What are the inferences we’d like to make from the results?
  • How will the results be used?

The answers to these questions should drive the construction and implementation of the assessment process. For example, consider an assessment with the goal of determining which medical students are competent enough to become doctors. It is likely that we will require a standardized definition of competency, as well as a common measure that is applied to all students who have met a certain set of qualifications in their medical training. Finally, we would develop appropriate scoring criteria given the type of items included in the measure (e.g., a key for selected response items or a scoring guide for constructed response items). Given the stakes of the measure, we would likely create a standardized method of training raters for any performance tasks included in the measure. Therefore, we would standardize at least five things: our definition of the construct, the measure of competency, the eligibility requirements to take the assessment, the scoring criteria, and the rater training.

Alternatively, consider an assessment that seeks to identify History 101 students’ strengths and weaknesses with respect to writing. Faculty intend to use results to make curricular decisions. In this case, we will likely want to provide some constraints on the writing process (e.g., providing a prompt and setting a maximum and/or minimum word count). Additionally, we are likely to design a rubric for use in evaluating students’ writing. Therefore, we would standardize at least two things: the task provided to the students and the rubric. Depending upon our goals, we may also standardize the setting in which the test is administered by requiring all students to attend a two-hour testing session, during which they write the essay.

Finally, consider a classroom assessment seeking to provide diagnostic feedback on students’ oral presentation skills. The assessment is given in a single section of a single course, and students will be graded based on their performance. In this case, we are again likely to require a rubric, and we may also desire constraints on the oral presentation (e.g., a five- to seven-minute presentation about a current public policy debate). However, we are unlikely to standardize most other elements of the scenario. The low stakes and small-scale nature of the assessment are unlikely to warrant an expensive rater training process, for example.

The stakes, scope, intended inferences, and intended uses vary wildly across these three examples. Although fictional, they are comparable to the disparate examples of standardization that were discussed on the ASSESS listserv. At their core, these are examples of student learning assessment in higher education. However, they differ in the types of standardization they require. Really, then, there is no such thing as a non-standardized assessment. All assessment processes require some degree of standardization at some point in the process.

Things go awry, though, when the type and degree of standardization are misaligned with the intended purposes, uses, and inferences of the assessment process. Misalignment occurs when the measures and methods of the assessment process do not match the goals of the assessment process. This problem is bidirectional. Assessments employing inappropriately lax standardization (such as measuring writing skill without using a rubric) are likely to be criticized for their lack of rigor. Similarly, assessments employing inappropriately rigid standardization (such as measuring writing skill across a number of genres and disciplines using the exact same rubric) are likely to be criticized for their over-simplification of the target construct. Both of these criticisms are warranted, but neither is rooted in standardization itself. Instead, they are rooted in a misalignment between what we want assessment to do and how the assessment process is actually implemented.

We therefore propose that this misalignment, not standardization itself, should be the target of our collective ire. Assessment systems should always be built and evaluated within the context of the inferences we intend to draw from their results. Limiting our understanding of alignment problems solely to issues of standardization misses the broader point. Useful, valid inferences cannot be drawn from measures that are misaligned with our intended purposes and uses. While appropriate types and degrees of standardization should certainly play a role in evaluating alignment, we should not let our vision narrow only to this. Instead, we should engage in a more nuanced conversation about the appropriate applications of standardization within the larger goal of creating better alignment between our practices and our intentions.

 

Notes

[1] Fiona Chrystall, Summer DeProw, Matthew DeSantis, David Dirlam, Jeffrey Freels, Joan Hawthorne, Jarod Hightower-Mills, Larry King, Tom Leary, Kara Maloney, Tisha Paredes, Karen Pain, Jeremy Penn, Jane Marie Souza, Claudia Stanny, Reuben Ternes.

 

References

Ewell, P. (2004). General education and the assessment reform agenda. Washington, DC: Association of American Colleges and Universities.

Raign, K. A. (1994). Teaching stones to talk: Using stasis theory to teach students the art of dialectic. Rhetoric Society Quarterly, 24 (3/4), 88-95.