Two Strikes Against Teacher Evaluation Schemes

"Creative Commons Strike One" by Eli Christman is licensed under CC BY 2.0.

Two curve balls were thrown at the movement to evaluate teachers using student test scores and classroom observations. Both were strikes!

Attempts to evaluate teachers have focused on classroom observations of teacher performance, and the contributions (value added) teachers make to student test score gains.

Two recently published studies cast doubt on the use of classroom observations and value-added models (VAM) as strong indicators of teacher quality.

The two studies overlap, and their results raise further questions about the nature of teaching. One of the seven deadly diseases in the management of any organization identified by W. Edwards Deming is evaluation of performance, merit rating, or annual review.  He doesn’t advocate any of these. Perhaps the research reported here will support him.

I’ll come back to this later.

But, let’s take a look at the studies.

Evaluating Teachers with Classroom Observations (TCO)

Nearly all teachers are evaluated to some extent based on classroom observations conducted by administrators or trained technical observers. In many states classroom observations contribute up to 50% of a teacher’s evaluation, and in some districts they count for even more.

Are classroom observations valid and reliable enough to use for rating teachers?

In a study conducted by the Brown Center at the Brookings Institution, entitled Evaluating Teachers with Classroom Observations, researchers found that classroom observation systems favor teachers with top students, and those teaching in large districts. They also found that observations by outside observers were more valid than those conducted by school administrators.

Using student test scores and classroom observation data, the researchers obtained individual student achievement data linked to each teacher from four urban districts’ databases.  None of the names of the districts were revealed because of “political sensitivities” surrounding teacher evaluation systems.  I’d say so!

Reporting correlations of 0.33 to 0.38, the authors of the Brookings report claim a statistically significant and robust predictive relationship with the ability of teachers to raise student test scores in an adjacent year.

It is obvious that the researchers have a bias toward the use of student test scores to generate VAM scores, and believe this method is better than a teacher’s paper credentials and types of experience. To say that a correlation of 0.33 is “robust” is a misinterpretation of this value. A correlation of 0.33 is moderate at best and borders on a weak positive relationship. To use moderate to weak correlations to evaluate teachers and make decisions that affect their lives is unwarranted.
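One way to put these correlations in perspective is to square them. Under the usual linear interpretation, r squared is the share of variation in one measure that is accounted for by the other. The short sketch below is my own illustration, not a calculation from the Brookings report, and the function name is made up for this example.

```python
def variance_explained(r: float) -> float:
    """Return r squared: the share of variance one measure accounts for in the other."""
    return r ** 2


# Correlation range quoted above from the Brookings report.
reported = [0.33, 0.38]
shares = {r: variance_explained(r) for r in reported}

for r, share in shares.items():
    print(f"r = {r:.2f} -> r^2 = {share:.3f} "
          f"({share:.1%} explained, {1 - share:.1%} unexplained)")
```

Read this way, a correlation of 0.33 accounts for roughly 11% of the variation, and even 0.38 accounts for only about 14%, leaving well over 80% unexplained in both cases.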

This study throws a curve at using VAM and classroom observations to rate teachers.

Strike One!

Evaluating Teachers with the Value Added Model (VAM)

In a study, Instructional Alignment as a Measure of Teaching Quality, published in Educational Evaluation and Policy Analysis, authors Morgan S. Polikoff (University of Southern California) and Andrew C. Porter (University of Pennsylvania) found very weak associations between teachers’ instructional alignment and contributions to student learning (VAM). They were interested in finding out if there was any relationship between teachers’ alignment of instruction to the standards and assessments and the value they add to student achievement. The study was funded by the Gates Foundation.

Their research was guided by OTL, or “students’ opportunity to learn.” According to the authors, the OTL literature indicates that teaching practices aligned to the standards, and the quality of the methods used, affect student learning.

But as the authors point out, their study is one of the first to associate VAM scores with both instructional alignment and pedagogical quality.  They also have data across state lines, and they are among the first to use VAM measures as dependent variables.  The teachers who participated in their study (a survey) were drawn from the MET study (the Gates Measures of Effective Teaching study).  In all, 701 teachers from six MET partner districts were contacted to take part.  Of those contacted, 388 responded, and 278 actually completed the survey (39% participation rate).
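The participation figures are easy to check from the counts given above. This is my own arithmetic, not a table from the paper, and the variable names are just for illustration.

```python
# Counts as reported in the paragraph above.
contacted = 701   # teachers contacted in the six MET partner districts
responded = 388   # teachers who responded
completed = 278   # teachers who completed the survey


def rate(part: int, whole: int) -> float:
    """Return part as a fraction of whole."""
    return part / whole


response_rate = rate(responded, contacted)
completion_rate = rate(completed, contacted)
completed_given_response = rate(completed, responded)

print(f"Responded:            {response_rate:.1%} of those contacted")
print(f"Completed:            {completion_rate:.1%} of those contacted")
print(f"Completed if replied: {completed_given_response:.1%} of responders")
```

The completion figure works out to about 39.7% of those contacted, which matches the reported 39% if that number was truncated rather than rounded (an assumption on my part). Either way, roughly six in ten of the teachers contacted are not represented in the results.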

Here are some of their findings:

  • The researchers found very low correlations between the alignment of teachers’ instruction with state standards and VAM scores on state and alternative assessments. For math, the overall correlation was r = .16. In ELA, they found a significant negative relationship (r = -.24) between instruction-standards alignment and state test VAM for one district, and a very low correlation overall (r = .14).
  • In short, there is no evidence of relationships between alignment and a composite measure of effectiveness.
  • Overall, the correlations do not show evidence of strong relationships between alignment or pedagogical quality and VAM scores.
  • The finding of a lack of relationship between FFT (Danielson’s Framework for Teaching) and Tripod (student surveys) and VAM scores is in contrast to the statistically significant relationships found in analyses of the same correlations in the full study database (Bill & Melinda Gates Foundation). Still, the size of the relationships both here and in the full study was small.

The authors’ conclusions are very important given that policy makers have decided to use VAM to evaluate teachers without evidence to support its use, not to mention the AERA’s condemnation of using high-stakes tests to decide people’s fates.

Take a look at these conclusions the authors make about using VAM as a measure of teacher quality:

  • Overall, the results are disappointing. Based on our obtained sample, we would conclude that there are very weak associations of content alignment with student achievement gains and no associations with the composite measure of effective teaching… we are left with the conclusion that there are simply weak relationships.
  • Another interpretation the authors make is that the tests used for calculating VAM are not able to detect differences in the content or quality of teaching. Since these are standardized tests, how could they really be used to detect differences among teachers? (Well, what do you know!)
  • But here is a powerful conclusion they make. These results suggest it may be fruitless for teachers to use state test VAMs to inform adjustments to their instruction. And they add, if VAMs are not meaningfully associated with either the content or quality of instruction, what are they measuring?

These are significant findings. Most states evaluate teacher effectiveness using a very complicated statistical analysis of student scores to predict the effect teachers should have on student test scores. In this study the researchers found very weak correlations (and in one case a negative relationship) between teacher alignment to the state standards and student learning gains.

Yet states have insisted on tying teacher evaluation to student scores. This study suggests that using student test scores as a measure of teacher quality is questionable.

In Georgia, where I live, the state legislature mandated that 50% of a teacher’s annual evaluation be based on Georgia’s version of VAM. And, for the third time in a few years, Georgia will be using a new standardized testing system, and they’ll raise the bar even further.

The Polikoff/Porter study is peer-reviewed research that policy makers should use to put the brakes on using student test score gains to rate teachers.

Strike Two!

Rating Teachers Does Not Improve Learning

Most school reformers subscribe to the use of test scores to rate teachers, and they also support (financially) the unbelievably complicated systems of classroom observation that have been developed and used in school districts.

The research reported here calls into question the use of these devices to rate teachers.

But I want to suggest that to improve learning, to improve the education of children and youth, we need to remove all systems of rating teachers, merit systems, and bonuses for performance. None of these has been shown to do anything for student learning. Instead, they create a competitive system of rewards and punishments, not a cooperative system of learning and life-long development.

To continue to think of education as a machine-age system in which teachers are workers whose job is to improve student performance as measured on high-stakes tests, which are made ever more difficult to pass, will lead to a failed and corrupt system.

Research by W. Edwards Deming, Russell L. Ackoff, Peter Barnard, Lisa Delpit, John Dewey, and Peter Senge points to the concept of “schools that learn.” Schools that learn think differently than the standards-based, high-stakes accountability schools of today. These schools see teachers as members of a team whose collaborative work will build a shared vision to foster a humane and energetic system or organization. Schools that learn are focal points of learning in the communities around them, says Peter Senge.

However, Atlanta has just hired a new superintendent (Dr. Meria Carstarphen), and according to public statements she made yesterday, she seeks “a culture change” by focusing on student achievement and graduation rates. The problem is that this is exactly what Dr. Beverly Hall did when she was superintendent of Atlanta, creating a “culture of fear” (according to Governor Deal’s report) that resulted in the biggest cheating scandal in the country. Doesn’t Carstarphen realize she is heading in the same direction?

Atlanta, if it follows the same path we’ve been on for a decade or so, will not necessarily witness wide swings in achievement test scores or graduation rates.

Continuing with standardized tests and teacher evaluation systems will only result in a quick fix, and in the long run will simply perpetuate the mechanistic solutions to schooling that remove any sense of creativity and student-teacher collaboration.

Bring teachers in to sit at the table, and invite them to plan the future.  Don’t rely on hired guns brought from another city. The resources to think differently about student learning and school improvement already exist in the community.