
Proficiency on state tests is subjective

Grading is 2-step process

by Brenda Bernet | April 23, 2012 at 3:27 a.m.

Earning a “proficient” score on the state’s Benchmark or End-of-Course exams isn’t like making a 70 on a classroom test.

Rather, students’ scores are analyzed and evaluated each year in a complex process involving testing-industry experts to separate the proficient students from those who are not.

“This is large-scale assessment,” said Gayle Potter, director of student assessment for the Arkansas Department of Education. “It’s not like a classroom test. When standard setting is done, they don’t think in percentages of points.”

School districts are in the midst of the annual spring testing season. Students in the third through eighth grades finished taking state tests in literacy, math and science in April. High school students will finish exams in geometry, biology and algebra in early May.

The federally mandated tests are a gauge of whether students have mastered skills and concepts the state expects. Students who do not earn a “proficient” score, the passing mark, can be held back or required to repeat a course if they do not finish a plan for remediation.

Though determining proficiency involves lots of expert opinions and calculations, the process is a subjective one with the passing line changing every year.

Determining whether a student earned a proficient score, meaning the student is on grade level, involves a two-step grading process: determining the number of raw points a student earns for correctly answering multiple-choice and open-response questions, and equating those points with a fixed range of “scale” scores, Potter said.

“It’s really the scale scores that correlate to our performance levels,” Potter said.


Test questions can be harder on the test one year and easier the next, Potter said. The department must account for variation in the level of difficulty to ensure the state is expecting the same level of performance from students each year, she said.

Converting raw points to scale scores allows for consistent measurement of student understanding of the required skills and concepts from year to year, Potter said.

The department involves content experts, testing experts and the state’s contracted testing company throughout the process of developing the exams, grading them and determining what constitutes proficiency, Potter said.

The state Board of Education has defined ranges of scale scores, a measurement tool, that determine how a student performed, Potter said. Students scoring within the below basic and basic ranges are considered below grade level, while those scoring within the proficient and advanced ranges are on grade level. The dividing lines among the four categories differ by grade and subject, but they don’t change from year to year.

The scale ranges for the Benchmark Exams in math and literacy are from zero to 999. For the third-grade Benchmark Exam in math, scale scores from zero to 408 are below basic, scale scores of 409 to 499 are basic, scale scores from 500 to 585 are proficient, and scale scores from 586 to 999 are advanced.
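The fixed ranges above amount to a simple lookup. The sketch below is illustrative only and is not the department's software; the function names are hypothetical, and the only data used are the third-grade math ranges quoted in this article.

```python
from bisect import bisect_right

# Third-grade Benchmark math ranges as quoted in the article.
# Each entry in LOWER_BOUNDS is the lowest scale score of the matching level.
LEVELS = ["below basic", "basic", "proficient", "advanced"]
LOWER_BOUNDS = [0, 409, 500, 586]


def performance_level(scale_score: int) -> str:
    """Return the performance level for a scale score from 0 to 999."""
    if not 0 <= scale_score <= 999:
        raise ValueError("scale scores run from 0 to 999")
    # bisect_right counts how many lower bounds are <= the score.
    return LEVELS[bisect_right(LOWER_BOUNDS, scale_score) - 1]


def on_grade_level(scale_score: int) -> bool:
    """Proficient and advanced scores count as on grade level."""
    return performance_level(scale_score) in ("proficient", "advanced")


if __name__ == "__main__":
    for score in (408, 409, 499, 500, 585, 586):
        print(score, performance_level(score), on_grade_level(score))
```

Because the ranges differ by grade and subject, a fuller version would need one table of lower bounds per exam; only the third-grade math table is given here.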

What does change each year is the number of raw points that equal a scale score, Potter said. After students complete their exams, the department convenes with experts who evaluate the results of the exams and determine how many points equal each scale score, setting what is known as a “cut” score.

On the third-grade math test in 2011, earning 25 of 80 points was equivalent to earning a scale score of 500, the minimum to show proficiency. In 2010, third-graders needed to earn 32 of 80 points for a scale score of 500.
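Expressed as percentages, those two cut points are 31.25 percent of the available points in 2011 and 40 percent in 2010. The short sketch below works through that arithmetic. It uses only the two figures reported here, the names are hypothetical, and it assumes that any raw total at or above the cut converts to a scale score of at least 500, which the article does not state explicitly.

```python
# Raw points needed for a scale score of 500 on the third-grade math test,
# as reported in the article. No other years are known here.
RAW_CUT_FOR_500 = {2010: 32, 2011: 25}
MAX_RAW_POINTS = 80


def raw_cut_percent(year: int) -> float:
    """Share of the available raw points needed to reach a scale score of 500."""
    return 100 * RAW_CUT_FOR_500[year] / MAX_RAW_POINTS


def meets_cut(year: int, raw_points: int) -> bool:
    """Assumption: raw totals at or above the cut map to scale scores >= 500."""
    return raw_points >= RAW_CUT_FOR_500[year]


if __name__ == "__main__":
    for year in sorted(RAW_CUT_FOR_500):
        print(year, RAW_CUT_FOR_500[year], f"{raw_cut_percent(year):.2f}%")
    # The same raw total can fall on different sides of the line by year.
    print(meets_cut(2011, 28), meets_cut(2010, 28))
```

A student with 28 raw points would have cleared the line in 2011 but not in 2010, which is the year-to-year shift the article describes.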

“Those performance level designations and cut score designation are a target with a bull’s eye,” Potter said. “They tell [school districts] what we expect in terms of rigor and how good is good enough.”


The federal No Child Left Behind Act, in place for a decade, required every state to set standards for classroom instruction in math and literacy and to track student mastery of those standards on statewide assessments. The law fostered the development of a different set of standards, assessments and definitions of proficiency in all 50 states.

A proficient student could move to a neighboring state and test below proficient there, experts say.

The assessments in Arkansas are the Augmented Benchmark Exam for third through eighth grades and End-of-Course Exams for high school students. Arkansas also tests students in science in fifth and seventh grades and in biology.

To adhere to the new requirements, the Education Department established a technical advisory committee with about a half-dozen experts in fields such as measurement, psychometrics and testing, said Ray Simon, who ran the department from 1997-2003. He later became a deputy secretary of education for the U.S. Department of Education.

The committee is involved in equating raw scores to scale scores that determine proficiency, Simon said.

“Testing itself and the measurement is not an exact science,” Simon said.

The group of experts advises the department on the best option for creating a system that is fair to students and that upholds the state standards for education, Simon said.

But no mathematical formula exists to determine the number of correct questions or points necessary for students to achieve proficiency, said Ron Dietel, assistant director for research use and communications for the National Center for Research on Evaluation, Standards & Student Testing, based at the University of California, Los Angeles.

“It ultimately comes down to human judgment where to set the cut score,” Dietel said.

Point ranges will vary depending on the difficulty of the exam, Dietel said. Committees equating raw points with scale scores may be more generous when the test questions are difficult.

“If those items are really difficult, only getting 30 percent to 40 percent right for proficiency is OK,” he said.

The concept is different from the public’s general understanding of testing, in which anything less than 70 percent warrants a D or an F, Dietel said.

While a classroom test is designed so every student would have the chance to score 100, standardized tests are meant to show a distribution of performance that distinguishes high-performing students from low-performing students and those in the middle, said Russ Whitehurst, director of the Brown Center on Education Policy for the Brookings Institution. The Brookings Institution is a nonprofit public policy organization in Washington, D.C.

“You want a spread of scores,” he said. “On a well-designed test, it would be rare for any student to get a perfect score.”

Standards for math in one grade differ from those for English, and that causes differences in the points necessary for proficiency, he said.


Other states commonly follow an approach like Arkansas’, Whitehurst said. Content experts meet and sort questions by level of difficulty.

Other experts review the distribution of performance and make decisions on where to set levels of performance, similar to setting a curve, Whitehurst said. Those experts and policy makers weigh concerns of the public when setting levels of performance. They could set a low level of performance, so that most students pass. Or they could set the bar so high that few pass, which could result in pushback from teachers and principals.

The decisions consider how many non-proficient children a state’s education system can handle and what level of proficiency represents an achievable goal for schools, he said.

“You’re seeing a behind-the-scenes mystical process,” he said. “It’s complicated.”

No test is without measurement error, meaning that it’s possible a student could take the test one day and score in the proficient range and take the same test another day and fall in the basic range, Potter said. The extensive process to develop the test, grade it and determine performance levels is intended to minimize those errors.

“We’re very confident about the scoring, about the equating, about making sure with all of our committees that we actually measure our state standards,” Potter said.

Northwest Arkansas, Pages 7 on 04/23/2012

Print Headline: Proficiency on state tests is subjective

