Why Achievement Test Scores are Poor Indicators of Student Learning and Teacher Effectiveness

The U.S. Department of Education (ED) has established a single variable, the achievement test score, as the way to reward and punish schools, teachers, students and their parents.  The fact that I have used the terms “rewards” and “punishments” is evidence enough that the ED is stuck in 19th century psychology.

In 2001, Congress approved the No Child Left Behind Act (NCLB), which mandated the testing of all students in reading and math.  This immediately set in motion a devastating narrowing of the elementary school curriculum, with nearly all of the emphasis placed on reading and math.

In 2009, Congress approved the Race to the Top Fund (RT3), which earmarked about $4.5 billion for a U.S. competition among the 50 states and the District of Columbia.  Of these entities, only 18 were winners.  The rest lost, except for four states which chose not to compete.

The Race to the Top, in my view, is even worse for education than the NCLB.  In the RT3, achievement test scores are given even more importance because those states that got the money were required to tie student test scores to teacher evaluation using the Value Added Modeling (VAM) system.

Many states, even those that did not receive RT3 money, now require that at least 50% of a teacher’s evaluation be based on the VAM scores generated by a mythical statistical model.  If you think I am kidding, here is the formula for determining a teacher’s worth as measured by adding value to student learning.


Figure 1. The statistical value-added model (covariate adjustment model) used to evaluate Florida teachers.

VAM scores are unreliable, and the scores of very competent teachers often end up at the bottom of the list.  Further, the tests upon which the VAM is calculated measure only a very small aspect of student learning.  In fact, much of what we think is really important in school–communication skills, the ability to work collaboratively with others to solve problems, creative thinking, empathy, and ethics–is not measured on achievement tests.

Why does the ED insist on this simple and behavioristic model of teaching?  It does so because it thinks that school is like a factory, and runs much like a machine.  Some call this mechanistic thinking.  Everything can be broken down into components, such as teacher behavior, teacher training, computers in the classroom, number of students in the class, access to technology, standards, academic tests, courses, homework, etc.   Mechanistic thinking leads to a “fix it” mentality.  That is, we can fix the problem of schooling by changing one or more of these variables.

The big problem in the minds of the mechanistic thinkers, whom I am also going to call the Neo-School Reformers, such as Bill Gates, Michelle Rhee, Joe Klein, and Arne Duncan, is that they believe that American schools are inferior to schools in other nations, especially Finland and most of the Asian nations.  Our schools are inferior, they say, and they prove it by citing test scores on PISA and other international tests.  But they don’t tell you the rest of the story.

The Neo-School Reformers’ solution to what ails our schools is the Global Education Reform Movement (GERM).  Although not named by Gates and associates, it was described by one of Finland’s leading educators, Dr. Pasi Sahlberg.

There is a growing body of research that shows that the GERM model is an ineffective model of educational reform.  As Sahlberg points out, GERM is practiced primarily by the North Atlantic Alliance of Schools (the U.S., Europe, and Australia).

Indeed, if you compare the PISA test results of these nations, it’s difficult to distinguish one from the other.

Thinking In Terms of Systems Theory

The Neo-School Reformers are “heads in the sand” reformers.  They fail to look around.  They can’t.  Their necks are stuck in the muck of their own arrogance and ignorance.  They fail to take their heads out of the box of a classroom or a school, and think about the larger ecosystem in which the school is placed.  They really get mad at teachers or education researchers who bring up out-of-school factors that might affect student achievement.  They have a code or a motto: No Excuses Education (NEE).

Here is the thing. I’ve learned from a group of scholars, including Ed Johnson, Diane Ravitch, Russell Ackoff, Peter Barnard, W. Edwards Deming, and Lisa Delpit, that there is another and more humane way to look at schools.

When we try to isolate the effect of teachers on any of the outputs of the school, we are sure to fail.  Think about learning as a system.

Ed Johnson, a scholar and activist in Atlanta, has taught me this.  When we try to break the system apart, it loses its essential properties. In this case the output as measured by student test scores is the product of the system, which is due to interactions and interdependencies of which the teacher is only one small part.

To ignore the effects of the “system” on student achievement is to ignore the large body of research on the effects of poverty on the emotional and social aspects of childhood, acute and chronic stressors, cognitive lags, and health and safety issues.

Just ask any teacher about his or her students.  Ask how the achievement of those students is affected by inadequate school resources, living in poverty, not having a home, parents who struggle to earn a living, the size of the school and district, the location of the school, coming to school each day hungry or inadequately fed, school policies, and so on.

Systems of Achievement in Race to the Top States

Take a look at Figure 2.  I’ve selected seven winners of the Race to the Top competition, and plotted their math achievement level (at or above proficient) as measured by the National Assessment of Educational Progress (NAEP).  In addition to the seven winners (Florida, Georgia, Massachusetts, New York, North Carolina, Tennessee, District of Columbia), I have also included data for the United States.

The RT3 funding began in 2010, and is now in its fourth year for many of the winning states.  Notice, however, that five of the states hover near the U.S. average, but Massachusetts and the District of Columbia lie above and below the other states, respectively.  Why is this?

Now take a look at Figure 3. It’s the same graph, but in this case it’s marked up.  The six states and DC each received from $75 to $700 million to improve education.  In all cases, the single variable used to check the effectiveness of the system is student achievement scores.  In Figure 3, we examine the results from a systems point of view, a method that I learned from Ed Johnson.

Figure 3. 8th Grade Math as a System. All states except Massachusetts fall within the Upper and Lower Control Limits.  Any variation within this zone is due to system causes, and not special causes.  Source: The Annie E. Casey Foundation, KIDS COUNT Data Center, datacenter.kidscount.org

In the graph below, most of the state scores fall within expected limits (the upper control limit, UCL, and the lower control limit, LCL).  Any variation in scores for North Carolina, New York, Florida, Georgia, and Tennessee was for the most part random, but there is evidence that some special causes were at work in Massachusetts, and we might hypothesize that special cause effects were also at work in DC.
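
For readers who want to see how control limits of this kind can be computed, here is a minimal sketch in Python. It assumes an individuals (XmR) control chart, one common approach, in which the limits sit 2.66 average moving ranges above and below the mean. This is only an illustration; the limits in the figure may have been computed differently. The function names are hypothetical, and the numbers are made up for the example. They are not NAEP data.

```python
from statistics import mean


def control_limits(values):
    """Return (center, lcl, ucl) for an individuals (XmR) control chart.

    `values` are measurements in time order, e.g. percent proficient
    for successive test years. Assumption: XmR limits, where
    2.66 = 3 / 1.128 converts the average moving range to 3-sigma limits.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    center = mean(values)
    # Moving range: absolute change between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center, center - spread, center + spread


def special_cause_points(values):
    """Return the indexes of values that fall outside the control limits."""
    _, lcl, ucl = control_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


if __name__ == "__main__":
    # Hypothetical percent-proficient figures, NOT real NAEP results.
    scores = [30, 31, 29, 32, 30, 51]
    center, lcl, ucl = control_limits(scores)
    print(f"center={center:.1f}  LCL={lcl:.1f}  UCL={ucl:.1f}")
    print("points outside the limits:", special_cause_points(scores))
```

Points inside the limits are treated as ordinary variation produced by the system itself. Only points outside the limits suggest that a special cause is worth looking for.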

Georgia, Florida, Tennessee, New York and North Carolina are U.S. examples of what Finnish educator Pasi Sahlberg calls the Global Education Reform Movement.  GERM has spread across each of these states, and we see classic GERM conditions, including the adoption of common standards, a narrowing of the curriculum to focus on math, writing and reading, high-stakes testing, a data-driven corporate management model, and a system of accountability based on student test scores.

The graph below shows that the GERM model for most states is ineffective in changing math achievement.  I’ve examined reading in the same states during the same period, and the graphs are nearly identical.

The reforms that are in place in Georgia and other Race to the Top states will not affect student achievement in real ways.  The reforms are narrow and they ignore the ecology of learning by not seeing the school as part of a larger system.  For example, I asked in the last post why there was very little mention of poverty in Georgia’s reporting of their new method of grading schools.

Here is one reason.  Here is another graph of the same states, but this time showing poverty.  The graph is almost an inverse of the graphs shown in Figures 2 and 3. Notice that the level of children living in poverty in most states, except for Massachusetts (15%), has converged to the U.S. average, which is about 23%.  What is the effect of poverty on student learning?  Until we look at the effects of the system on learning, we’ll make little progress in learning.

Achievement scores are a poor indicator of student learning, and an even worse basis for teacher evaluation.

What do you think about the reforms that have been put into place as part of the Race to the Top?

About Jack Hassard

Jack Hassard is a writer, a former high school teacher, and Professor Emeritus of Science Education, Georgia State University.
