A combination of new federal accountability measures, states' plans to comply with them, and new commercial testing products threatens students, teachers and schools with a new wave of inappropriate high-stakes testing.

So cautions Teachers College measurement-evaluation expert Madhabi Chatterji in a publication released this week by the National Education Policy Center, based at the University of Colorado, Boulder.

Chatterji's warning – and a series of guidelines for preventing the scenarios she fears – comes as 44 states begin implementing their federally approved plans for meeting the testing and accountability requirements of the Every Student Succeeds Act (ESSA), enacted in 2015 under President Obama.

Many of these states are planning to use "statistically derived indices from test-based data to rank, rate or examine growth of schools or education systems to fulfill ESSA's requirements," writes Chatterji, Professor of Measurement, Evaluation & Education, in the guide. "However, measurement experts, researchers and professional associations (such as the American Educational Research Association and the American Statistical Association) have cautioned against several of these – particularly 'student growth percentiles,' 'value-added' growth models, and multi-indicator 'composite' scores."

Misuse of test information in this way, Chatterji writes, is akin to "misreading a Fahrenheit thermometer in degrees Celsius."

Chatterji, who is also founding director of Teachers College's Assessment and Evaluation Research Initiative, says her "Consumer's Guide" is not a critique of particular standardized tests or testing programs, but instead a "'tool kit' for state, national, and district policymakers (and the assessment specialists/researchers who assist them) to help avert the most common pitfalls and adverse consequences of inappropriate test information use for students, families and concerned stakeholders." A key message – which Chatterji has delivered in many past writings – is that "validity is not a fixed property" that can be built into tests. Rather, she writes, "the extent to which tests yield meaningful or valid information on student learning, or the quality of schooling, depends on how appropriately test results are put to use in decision-making contexts."

The nation's recent track record in that regard has not been encouraging. For example, the old SAT was not designed to measure schools' effectiveness – but in 2012, under the No Child Left Behind Act (ESSA's predecessor), many school districts used it as the basis for identifying exceptional schools and practices.

Nor have test developers helped the situation. Rather than simply providing students "raw" scores on standardized tests (the total points a student earns for providing correct answers), makers of standardized tests typically provide "scaled scores" – scores that have been transformed to enable comparisons among students who took different levels or forms of a test. The statistical wizardry involved can be so complex that such "derived" scores become a "black box" to most test users, increasing the likelihood that they will be misused for policy purposes.
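To see what such a transformation looks like in its very simplest form, consider the toy sketch below. It is purely illustrative – the `scale_score` function, the 200–800 reporting range, and the linear mapping are all invented here, and bear no resemblance to any vendor's actual scaling method:

```python
# A toy linear rescaling -- operational testing programs use item response
# theory and equating procedures far more complex than this.
def scale_score(raw: int, raw_max: int,
                scaled_min: int = 200, scaled_max: int = 800) -> int:
    """Map a raw score (0..raw_max) onto a hypothetical 200-800 reporting scale."""
    return round(scaled_min + (raw / raw_max) * (scaled_max - scaled_min))

print(scale_score(30, 50))  # a raw score of 30 out of 50 becomes 560
```

Even this trivial mapping hides the original count of correct answers from the reader of the score report; real equating across test forms is far harder for non-specialists to unwind – hence the "black box" effect Chatterji describes.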

The new ESSA guidelines further increase the chances for testing misuse, Chatterji says, because they place heavy pressure and tight restrictions on states to meet self-set goals – including long-term "growth-related" targets.

On the broadest level, Chatterji recommends that all test users specify, up front, the kinds of inferences they intend to draw from test data; that they avoid "multi-purposing" tests in ways that go beyond either the test's intended use or the reported evidence; that they justify their uses and inferences of test-based data by referring to specific appropriate criteria for validity, reliability and utility; and that they seek out expert technical review before using tests for accountability purposes.

Among her other, more specific recommendations, Chatterji calls for the use of "descriptive quality profiles" – reports that present locally valued indicators of student and school success separately – instead of complex statistical indices.
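The contrast can be sketched in a few lines. The indicator names, values, and weights below are hypothetical, chosen only to show how a weighted composite collapses distinct measures into one opaque number, while a descriptive profile reports each indicator on its own terms:

```python
# Hypothetical school data and policy weights (invented for illustration).
school = {"proficiency": 0.72, "growth": 0.55, "attendance": 0.93}
weights = {"proficiency": 0.5, "growth": 0.3, "attendance": 0.2}

# Composite index: one number, and the weighting choices vanish from view.
composite = sum(school[k] * weights[k] for k in school)
print(f"composite index: {composite:.2f}")  # prints "composite index: 0.71"

# Descriptive profile: each locally valued indicator reported separately.
for indicator, value in school.items():
    print(f"{indicator}: {value:.0%}")
```

Two schools with very different strengths can land on the same composite number, which is one reason measurement experts prefer reporting the indicators side by side.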

The "high stakes" of states' ESSA rollout plans go beyond the immediate impact on schools and students. Past assessments have prompted major backlashes, Chatterji notes – for example, the Opt Out movement (parents who refuse to let their kids take standardized tests in public schools) and the concurrent decision by many states to opt out of the two national consortia that are implementing assessments geared to the Common Core State Standards. Such fragmentation can result in individual states adopting policies based on their own tests, with similar patterns of misuse of testing data.

And yet, Chatterji says, "there is a political demand for high-stakes uses of test data that is likely to continue.

"Regardless of the recent backlash, the public still seeks standardized test scores – not only for students, but also for a better gauge of their local schools. Combined, these factors create conditions for some of the recurring testing issues that this guide identifies."