Why Achievement Test Scores are Poor Indicators of Student Learning and Teacher Effectiveness

The U.S. Department of Education (ED) has established a single variable as the way to reward and punish schools, teachers, students and their parents.  The fact that I have used the terms “rewards” and “punishments” is evidence enough that the ED is stuck in 19th-century psychology.

In 2001, the Congress approved the No Child Left Behind Act (NCLB), which mandated the testing of all students in reading and math.  This immediately had a devastating effect on elementary schools: it narrowed the curriculum by putting so much emphasis on reading and math.

In 2009, the Congress approved the Race to the Top Fund (RT3), which earmarked about $4.35 billion for a U.S. competition among the 50 states and the District of Columbia.  Of these entities, only 18 were winners.  The rest lost (except for four states which chose not to compete).

The Race to the Top, in my view, is even worse for education than the NCLB.  In the RT3, achievement test scores are given even more importance because those states that got the money were required to tie student test scores to teacher evaluation using the Value Added Modeling (VAM) system.

Many states, even those that did not receive RT3 money, now require that at least 50% of a teacher’s evaluation be based on the VAM scores generated by a mythical statistical model.  If you think I am kidding, here is the formula for determining a teacher’s worth as measured by adding value to student learning.

Figure 1. The statistical value-added model (covariate adjustment model) used to evaluate Florida teachers.

VAM scores are unreliable, and the scores of very competent teachers often end up at the bottom of the list.  Further, the tests upon which the VAM is calculated measure only a very small aspect of student learning.  In fact, much of what we think is really important in school (communication skills, the ability to work collaboratively with others to solve problems, creative thinking, empathy, and ethics) is not measured on achievement tests.

Why does the ED insist on this simple and behavioristic model of teaching?  It does so because it thinks that school is like a factory, and runs much like a machine.  Some call this mechanistic thinking.  Everything can be broken down into components, such as teacher behavior, teacher training, computers in the classroom, number of students in the class, access to technology, standards, academic tests, courses, homework, etc.   Mechanistic thinking leads to a “fix it” mentality.  That is, we can fix the problem of schooling by changing one or more of these variables.

The big problem in the minds of the mechanistic thinkers, whom I am also going to call the Neo-School Reformers, such as Bill Gates, Michelle Rhee, Joe Klein, and Arne Duncan, is that American schools are inferior to schools in other nations, especially Finland and most of the Asian nations.  Our schools are inferior, they say, and they prove it by citing test scores on PISA and other international tests.  But they don’t tell you the rest of the story.

The Neo-School Reformers’ solution to what ails our schools is the Global Education Reform Movement (GERM).  Although not named by Gates and associates, it was described by one of Finland’s leading educators, Dr. Pasi Sahlberg.

There is a growing body of research that shows that the GERM model is an ineffective model of educational reform.  As Sahlberg points out, GERM is primarily practiced by the North Atlantic Alliance of Schools (primarily the U.S., Europe, and Australia).

Indeed, if you compare the PISA test results of these nations, it’s difficult to distinguish one from the other.

Thinking In Terms of Systems Theory

The Neo-School Reformers are “heads in the sand” reformers.  They fail to look around.  They can’t.  Their necks are stuck in the muck of their own arrogance and ignorance.  They fail to take their heads out of the box of a classroom or a school, and think about the larger ecosystem in which the school is placed.  They really get mad at teachers or education researchers who bring up out-of-school factors that might affect student achievement.  They have a code or a motto: No Excuses Education (NEE).

Here is the thing. I’ve learned from a group of scholars, including Ed Johnson, Diane Ravitch, Russell Ackoff, Peter Barnard, W. Edwards Deming, and Lisa Delpit, that there is another and more humane way to look at schools.

When we try to isolate the effect of teachers on any of the outputs of the school, we are sure to fail.  Think about learning as a system.

Ed Johnson, a scholar and activist in Atlanta, has taught me this.  When we try to break the system apart, it loses its essential properties. In this case the output as measured by student test scores is the product of the system, which is due to interactions and interdependencies of which the teacher is only one small part.

To ignore the effects of the “system” on student achievement is to ignore the large body of research on the effects of poverty on the emotional and social aspects of childhood, acute and chronic stressors, cognitive lags, and health and safety issues.

Just ask any teacher about his or her students.  Ask how the achievement of those students is affected by inadequate school resources, living in poverty, not having a home, parents who struggle to earn a living, the size of the school and district, the location of the school, students coming to school each day hungry or inadequately fed, school policies, and so on.

Systems of Achievement in Race to the Top States

Take a look at Figure 2.  I’ve selected seven winners of the Race to the Top competition, and plotted their math achievement level (at or above proficient) as measured by the National Assessment of Educational Progress (NAEP).  In addition to the seven winners (Florida, Georgia, Massachusetts, New York, North Carolina, Tennessee, District of Columbia), I have also included data for the United States.

The RT3 funding began in 2010, and is now in its fourth year for many of the winning states.  Notice, however, that five of the states hover near the U.S. average, while Massachusetts and the District of Columbia lie above and below the other states, respectively.  Why is this?


Now take a look at Figure 3. It’s the same graph, but in this case it’s marked up.  The six states and DC received from $75 to $700 million to improve education in their respective states.  In all cases, the single variable used to check effectiveness of the system is student achievement scores.  In Figure 3, we examine the results from a systems point of view, a method that I learned from Ed Johnson.

Figure 3. 8th Grade Math as a System. All states except Massachusetts fall within the Upper and Lower Control Limits. Any variation within this zone is due to system causes, and not special causes. Source: The Annie E. Casey Foundation, KIDS COUNT Data Center, datacenter.kidscount.org

In this graph, most of the state scores fall within expected limits (between the upper control limit, UCL, and the lower control limit, LCL).  Any variation in scores for North Carolina, New York, Florida, Georgia, and Tennessee was for the most part random, but there is evidence that some special causes were at work in Massachusetts, and we might hypothesize that special causes were also at work in DC.
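The post does not say how the control limits in the figure were calculated. As a rough sketch only, one common way to draw such limits is an individuals (XmR) control chart, where the limits sit 2.66 average moving ranges above and below the mean. The function names and the numbers below are made up for illustration; they are not NAEP data and not necessarily the method behind the figure.

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lcl, ucl) for an individuals (XmR) control chart.

    Assumption: limits are center +/- 2.66 * average moving range,
    the conventional XmR constant (3 / 1.128).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to compute a moving range")
    center = mean(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center, center - spread, center + spread


def points_outside_limits(values):
    """Return the indices of values that fall outside the control limits.

    Points inside the limits are treated as ordinary (system) variation;
    points outside are candidates for a special cause.
    """
    _, lcl, ucl = xmr_limits(values)
    return [i for i, v in enumerate(values) if not lcl <= v <= ucl]


# Hypothetical "percent proficient" values, invented for this example.
hypothetical_scores = [30, 31, 29, 30, 32, 30, 31, 50]
center, lcl, ucl = xmr_limits(hypothetical_scores)
flagged = points_outside_limits(hypothetical_scores)
print(f"center={center:.2f} LCL={lcl:.2f} UCL={ucl:.2f} flagged={flagged}")
```

The point of the sketch is the reading rule rather than the arithmetic: a score that moves around inside the limits tells us about the system as a whole, and only a score outside them is worth investigating for a cause of its own.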

Georgia, Florida, Tennessee, New York and North Carolina are U.S. examples of what Finnish educator Pasi Sahlberg calls the Global Education Reform Movement.  GERM has spread across each of these states, and we see classic GERM conditions, including the adoption of common standards, a narrowing of curriculum to focus on math, writing and reading, high-stakes testing, a data-driven corporate management model, and a system of accountability based on student test scores.

The graph below shows that the GERM model for most states is ineffective in changing math achievement.  I’ve examined reading in the same states during the same period, and the graphs are nearly identical.

The reforms that are in place in Georgia and other Race to the Top states will not affect student achievement in real ways.  The reforms are narrow and they ignore the ecology of learning by not seeing the school as part of a larger system.  For example, I asked in the last post why there was very little mention of poverty in Georgia’s reporting of its new method of grading schools.

Here is one reason.  Here is another graph of the same states, but this time showing poverty.  The graph is almost an inverse of the graphs shown in Figures 2 and 3. Notice that the level of children living in poverty in most states, except for Massachusetts (15%), has converged to the U.S. average, which is about 23%.  What is the effect of poverty on student learning?  Until we look at the effects of the system on learning, we’ll make little progress in learning.

Achievement scores are a poor indicator of student learning, and an even worse basis for teacher evaluation.

What do you think about the reforms that have been put into place as part of the Race to the Top?

Using Achievement Scores to Support Myths and Build Fear

There was an interesting discussion in Yong Zhao’s book, Catching Up, or Leading the Way: American Education in the Age of Globalization, about how John F. Kennedy used the launching of Sputnik to suggest that a “Missile Gap” existed between the United States and Soviet Union, and that the United States was behind. It turns out that the so-called Missile Gap did exist, but it favored the United States. The myth of the gap became a truism in American culture, and it started us on a long path of using myths to feed the fear factor, and of using that fear to satisfy political, economic or educational goals. In Kennedy’s case, it helped him get elected in 1960.

In today’s culture, politicians, and especially business leaders, have perpetuated the myth that academic achievement in a few subjects is the most important outcome of schooling, and that there is a huge gap between the achievement of students in the United States and their counterparts in other industrialized nations. Furthermore, these same politicians and business leaders would have us believe that there is a serious decline in the supply of high-quality students from the beginning (the end of high school) to the end of the Science & Engineering “pipeline.” Both of these claims are myths: that U.S. students do not achieve at high levels, and that there is a serious shortage of high-quality persons for science & engineering. They are perpetuated to fulfill the needs and desires of officials whose best interests are served by claiming such weaknesses in the American educational system (see Lowell & Salzman).

The Race to the Top Fund showcases these myths, and uses them to determine the criteria by which proposals submitted by the states (one per state) will be judged winners and therefore eligible for some of the $4.35 billion.

Even the Bill & Melinda Gates Foundation is providing 20 states a quarter of a million dollars in expertise and funds to prepare Race to the Top proposals due January 19th. It’s also been announced that those states that did not submit a proposal to be judged in phase one will receive funds for June 1 submissions.

There is a lot of hysteria around the Race to the Top. For example in Georgia, a writer in the Atlanta Business Chronicle said “Georgia must win ‘Race to the Top’”, and once again connected the economic well-being of the state or nation, you take your pick, with the achievement scores of students in school. The problem is there is no clear evidence that student achievement scores are directly related to the nation’s or state’s economy, as much as the business community would have us believe it.

Since 1983, when the Federal report “A Nation at Risk: The Imperative for Educational Reform” was published by the U.S. Department of Education, these same business and political leaders have used the rationale that there are major economic “threats” to the United States because of the performance of American students compared to students in other countries. Although U.S. students perform at high levels on these international tests, business and political leaders focus on countries that score higher than the U.S. in math, and then conclude that these countries pose a “threat” to the U.S. economy. These countries include Singapore, Latvia, and Belgium. Threats? I don’t think so.

I am not suggesting that financial support in the form of grants from the Federal government should not be provided. Quite the opposite. I am, however, arguing that the underlying principles upon which states will be funded in the Race to the Top Fund are flawed. They are based on myths, and they feed on the fear that America’s educational system is in a race with other nations, and that we should fear a few smaller countries’ scores on very narrow achievement tests.

On any indicator used by these same business and political leaders, American students have continued to show improving scores: on the SAT, ACT, and NAEP, and on international tests including TIMSS and PISA.

So what is going on? The desire of the power brokers is to control and manage education, to use a simplistic business model to regulate schooling, to assume that the fundamental purpose of schooling is to increase test scores in math and reading, and to hold teachers and administrators accountable for these same test scores. This leads America toward a more deeply authoritarian approach to education, the antithesis of what education should be in a democratic society. What is needed is a paradigm that would foster critical and creative thinking, innovation, and a focus on helping students learn how to learn. We need a broad curriculum, not a narrow one.  We need a curriculum that values the arts just as much as the sciences. We need to go well beyond math and reading and engage our students in real problems that are set in their lived experience.  This is of course the paradigm of learning that I have been advocating on this website: the humanistic science paradigm of learning.

I’ll explore these ideas in more detail in the days ahead. In the meantime, I invite comments on these ideas. Am I out of sync? What do you think?