E-valuating Teaching: It Doesn’t Add Up–An Art of Teaching Science Inquiry

Latest Story

Research: You might want to visit this site to see the research on value-added modeling.

A Teacher Speaks Out: John Spencer, an Arizona middle school teacher, wrote a post describing his experience with the value-added model. He reports that one of his students said to him, “You look stressed.” You might want to read his post, in which he explains how his stress stemmed from Arizona’s use of VAM scores to rate teachers.

Key Studies: Two studies were recently published that spell bad news for advocates of the current models used to rate teachers.

Last week, I introduced four Inquiries on the Art of Teaching Science blog. You can find the inquiries on the sidebar of my blog’s home page, or follow these links:

You can navigate each inquiry from its landing page.

This blog post introduces the fifth inquiry, which focuses on the use of value-added measures to rate teachers.

E-valuating Teaching: It Doesn’t Add Up

Why the Use of Student Achievement Tests Is an Absurd System to Evaluate the Practice of Teaching

Teacher bashing has become a contact sport played out by many U.S. governors. The rules of the game are stacked against teachers by using measures that have not been substantiated scientifically. For many governors and mayors, it is fair play to release the names of every teacher in the city along with their value-added scores, determined by analyzing student achievement test scores. None of the published data has been scientifically validated, and in fact the data is uneven and unreliable from one year to the next.

A VAM score is a number that is derived using a covariate adjustment equation (Figure 1). The idea is to rate teachers using student test scores. For example, in the Florida VAM big data release, VAM scores are reported for teachers who taught math and reading, and for those who didn’t teach math or reading. Next to each teacher’s name, the release reported a score that indicates the learning gains students made above or below what they were expected to learn (based on earlier performance, with OTHER teachers).

Here is the equation used to figure teachers’ “value added effect.”


Figure 1. The statistical value-added model used to check teachers.
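
To make the idea of a covariate adjustment concrete, here is a minimal sketch in Python. It is not the Florida equation, which is considerably more elaborate. It assumes a single prior-year score as the only predictor, and the teacher names, the scores, and the function names (`fit_line`, `value_added`) are all hypothetical. In this simplified version, a teacher's "value added" is just the average gap between what the teacher's students scored and what a regression line expected them to score.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def value_added(records):
    """records: list of (teacher, prior_score, current_score).

    Returns {teacher: mean of (actual - expected)} over that teacher's students,
    where "expected" comes from a line fitted to all students together.
    """
    priors = [r[1] for r in records]
    currents = [r[2] for r in records]
    a, b = fit_line(priors, currents)

    gaps = {}
    for teacher, prior, current in records:
        expected = a + b * prior
        gaps.setdefault(teacher, []).append(current - expected)
    return {teacher: mean(values) for teacher, values in gaps.items()}


# Hypothetical data: (teacher, last year's score, this year's score).
records = [
    ("Teacher A", 300, 312),
    ("Teacher A", 320, 335),
    ("Teacher A", 340, 349),
    ("Teacher B", 300, 305),
    ("Teacher B", 320, 322),
    ("Teacher B", 340, 345),
]

vam_scores = value_added(records)
for teacher, score in vam_scores.items():
    print(teacher, round(score, 2))
```

With these made-up numbers, Teacher A comes out at about +4 points and Teacher B at about -4. Notice that nothing in the calculation looks at what either teacher actually did in the classroom.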

Using student achievement scores to compute a number which claims to find what a teacher “adds” to student learning simply doesn’t add up.  This is what this inquiry is about.

For years now, I’ve written about the nonsense attributed to using student achievement scores to assess teachers. But there are others who have written more powerfully about it. I want to direct you to two websites where you can find important information about why using student test scores to evaluate teachers doesn’t add up.

  • Dr. Cathy O’Neil’s Blog (Mathbabe). Cathy O’Neil is the Program Director of The Lede Program at Columbia University’s Journalism School. Prior to Columbia, she was a data scientist in the New York startup scene and co-authored the book Doing Data Science. She blogs daily at mathbabe.org, appears weekly on Slate’s The Big Money podcast, and is active in Occupy Wall Street’s Alternative Banking group. She has written considerably on education, and in particular on the use of value-added modeling to evaluate teaching. You can read her value-added posts here.
  • Audrey Amrein-Beardsley at Vamboozled! This blog, founded by Dr. Amrein-Beardsley as the lead blogger, focuses on research-based analyses of teacher evaluation, teacher accountability, and value-added models used in the nation’s public schools. She is Associate Professor of Education at Arizona State University. As a university professor, she has taken the lead in contributing to public debates about education, especially the use of value-added models to rate teachers. In addition to the Vamboozled blog, I recommend Dr. Amrein-Beardsley’s book, Rethinking Value-Added Models in Education: Critical Perspectives on Tests and Assessment-Based Accountability (Library Copy).


E-valuating Teaching: It Doesn’t Add Up

In this inquiry, we ask whether using student achievement test scores as a measure of teacher effectiveness is a practical system. Is the system viable, and will such a system have a detrimental effect on student learning?

The following are a few articles that were posted on the Art of Teaching Science blog that focus on these questions. You can find many more of these articles here and here.


The Absurdity of Teacher Evaluation Systems


Creative Commons, Gander? by Jack Hassard Licensed Under CC BY-NC-SA

There was an article today in the Atlanta Journal-Constitution that really got my gander up.  The article, written by AJC blogger Maureen Downey, was entitled Grading on a curve.  The article was about teacher evaluation systems.  Downey’s article focused on classroom observation systems, indicating that only 22% of teachers will be evaluated with student test scores.  This is the first error in the article.  In Georgia, for instance, 50% of each teacher’s evaluation will be based on student test scores.  And again, that’s every teacher.  Even teachers who do not teach courses in which standardized tests are used will be evaluated by how all teachers in a school do.

She then reviews the study published by the Brookings Institution, which examined teacher observation systems in four school districts. As she reports, teacher observation systems seem to be biased in favor of teachers teaching high-performing students, and unfair to teachers teaching low-performing students.

But then she cites Sandi Jacobs of the National Council on Teacher Quality (NCTQ).  We’ve debunked the NCTQ here, as have many other bloggers, and so I was disappointed that Downey would refer to NCTQ in her article.  The last group to go to for advice on improving teaching is the NCTQ.  Most of the reporting done by the NCTQ is junk science.

Twin Methods of Teacher Evaluation

The twin methods that are put together to form a teacher evaluation system are absurd, muddled, and unreasonable. Even more, the assumptions used to evaluate teachers are rooted in false claims about what effective teaching is, and how one knows when effective teaching happens. At its stupidest level, bureaucrats who sit in front of their computer screens, and who’ve consulted with agronomists, believe they have the algorithms that will actually measure, in some quantifiable way, just how much a teacher adds to student academic achievement.

Framework of Classroom Teaching

Then there is the group that believes it is possible to quantify teacher effectiveness by observing teachers in action in the classroom. One of the common systems to measure teacher classroom effectiveness is Danielson’s Framework for Teaching. The mantra from the Danielson group is that the “framework” is comprehensive and coherent and based on those aspects of teaching (behaviors) that promote student learning.

But here is the thing: the “framework” reduces teaching to 22 components and 76 smaller elements organized into four domains of teaching (Planning and Preparation, the Environment, Delivery of Services, and Professional Responsibilities). This is a classic example of reductionism. And for reductionist researchers, the use of this kind of framework of teaching makes sense.

The Danielson Framework is not a new idea. For decades, educational researchers have developed and implemented tens of “instruments” to see and quantify teacher behavior.  Most of these instruments were analytic–teacher behavior was divided into categories or clusters of performance, as is done in the Danielson Framework.


And of course the most extreme reductionist measure is the quantification of learning by means of achievement test scores. Using the same logic used in evaluating teacher performance, student performance is measured using standardized tests which are based on content components and smaller elements that are organized into domains of content in fields such as science, mathematics, social studies, and English/language arts.

Teachers should not be scored or rated as if they were in a competition to win or lose something. The use of these systems borders on a sinister view of teachers, who for some reason need to be poked, prodded, and measured. If you read the value-added technical documents, such as this one in Florida, you will probably have a nervous breakdown. And you will wonder how the algorithms used have anything to do with teaching. Figure 1 is the algorithm used to figure VAM (for Florida teachers).

Figure 1. Value Added Model for the State of Florida. Source: Florida Value-Added Model Technical Report, American Institutes for Research


Using VAM scores is part of a larger plan to use standardization and high-stakes test accountability to privatize public education, and to reduce teaching to “teaching to the test.”

We have to keep in mind that public education has become a place where the locus of success is based on student achievement test scores.  The system of accountability is like mildew, a thin layer covering any sense of creativity and innovativeness, that results in a smell much like the fungus that created it in the first place.

Student achievement gains, according to VAM folks, can be traced back to a teacher’s contribution using an algorithm that most people who work at state departments of education cannot explain to teachers. They have no idea how to use the results of VAM to help teachers improve. All these scores do is offer a story for newspapers, which list the VAM scores of teachers and hang them out to dry.

Dr. Cathy O’Neil, a mathematician and professor at Columbia University, where she is director of the Lede Program at the Journalism School, blogs at mathbabe (exploring and venting about quantitative issues). I’ve read her blog regularly for the past year, and she’s brought me into areas that I know very little about, but because of the way she writes, I’ve found a number of her ideas applicable to this blog.

Her interest in teaching is quite clear on her blog. If you search “teaching” on her blog you will find articles that are very pertinent to this blog post. She has a collection of articles on VAM, and discussions of VAM from a perspective that is crucial to efforts to fight against the use of VAM, let alone the shaming of teachers by posting VAM scores publicly. She discussed in one of her articles how detestable it was when New York City teachers’ VAM scores were released.

But read what she said about the nature of the VAM score.  What is “underneath” the VAM score?  What does it mean?  She writes:


Just to be clear, the underlying test doesn’t actually use a definition of a good teacher beyond what the score is. In other words, this model isn’t being trained by looking at examples of what is a “good teacher”. Instead, it derived from another model which predicts students’ test scores taking into account various factors. At the very most you can say the teacher model measures the ability teachers have to get their kids to score better or worse than expected on some standardized tests. Call it a “teaching to the test model”. Nothing about learning outside the test. Nothing about inspiring their students or being a role model or teaching how to think or preparing for college.

A “wide margin of error” on this value-added model then means they have trouble actually deciding if you are good at teaching to the test or not. It’s an incredibly noisy number and is affected by things like whether this year’s standardized tests were similar to last year’s. (O’Neil, C. Teaching scores released, mathbabe, Feb. 26, 2012, extracted May 19, 2014.)
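
A rough sense of how wide that margin can be comes from ordinary sampling arithmetic. The sketch below is only an illustration, not O’Neil’s calculation or any state’s method. It treats a teacher’s score as the average of student-level gaps between actual and expected scores, and the standard deviation of 20 points, the class size of 25, and the teacher score of +4 are assumed values chosen for the example.

```python
import math


def margin_of_error(student_sd, class_size, z=1.96):
    """Approximate 95% margin of error for a class-average gap.

    Uses the standard error of a mean: sd / sqrt(n).
    """
    return z * student_sd / math.sqrt(class_size)


# Assumed values, for illustration only.
estimate = 4.0  # hypothetical teacher score, in test-score points
moe = margin_of_error(student_sd=20.0, class_size=25)

low, high = estimate - moe, estimate + moe
includes_zero = low <= 0.0 <= high

print(round(moe, 2), includes_zero)  # 7.84 True
```

Under these assumptions, a teacher scored at +4 has an interval running from roughly -4 to +12, so the score cannot distinguish that teacher from one who is exactly average.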


Big Mistake

We are making a serious mistake to condone the use of VAM, and I was disturbed by the lack of any criticism of VAM in Maureen Downey’s article. She did point out some of the shortcomings of using classroom observation systems, but here is the thing. A classroom visit, especially by a colleague or someone who is informed about providing feedback to help improve instruction, is a much more valuable tool to improve teaching. The concern I have for the way classroom observation systems are being used is that these observations will result in a calculation or a number which will be used with VAM scores to rate, grade, and judge teachers.

This needs to be prevented.

If we want to improve teaching, then it needs to be accomplished in a collaborative, collegial way. Teachers, to take risks with their teaching style and methods, need to trust the people who visit their classroom to see them at work.

Trust. How can teachers trust the system when it uses complicated algorithms to rate them based on dubious academic achievement (standardized) tests that may or may not “test” the content that was part of their curriculum?

The system of teacher evaluation that is prevalent in most states is absurd.

How do you think teachers should be evaluated?

If You Think Student Output as Measured by Achievement Tests Is a Way to Evaluate Teachers, You’d Be Plug Wrong!

What will it take to convince school boards, departments of education, and administrators that using student achievement scores, one of the outputs that we constantly measure in American schools, is neither a scientific nor an ethical way to evaluate teachers? To do so is to ignore the research on this issue, and to perpetuate the myth that using a student test score is a valid way to determine the effectiveness of teachers.

To carry out this plan, which will be implemented in the Cobb County Schools (where I live) and the rest of Georgia’s schools by 2015, reinforces the machine age conception of our schools. The machine age gave rise to factories, which became the model used to build and organize schools. The outputs of a factory, such as a shoe, a dress, a pot or pan, are analogous to the outputs of schools such as grade point average, drop out rate, or student achievement. In this machine age example, many people believe that the outputs are explained by a cause-effect relationship. In our world of education there is the belief that student achievement as an output is caused (or added to) by the teacher. This is a false belief. And by the way, if a factory produced “bad” shoes, you can’t pin it on the factory workers, either.

If teachers don’t affect student achievement scores in substantial ways, what does? To answer this will require us to be willing to think in a different way. Albert Einstein is quoted by Russell Ackoff about thinking in different ways:

You can’t solve the problems created by the current pattern of thought using the current pattern of thought.

The current pattern of thought, based on causal thinking, derives from the acceptance of a cause as enough for its effect.  In the case of student achievement, this pattern of thought means that the teacher effect can be taken to explain rises or falls in student achievement.  Nothing else needs to be taken into account.  As Russell Ackoff has said, “Machine-Age thinking was, to a large extent, environment-free; it tried to develop understanding of natural phenomena without using the concept of environment.”

But here is the thing.

We’ve left the machine age. Or perhaps it might be safer to say we are in the midst of a transformation from the machine/factory age of thinking to another way of viewing the world. This transformation is to an ecological, interdisciplinary or systems view of the world, with writers from many fields describing this new way of thinking, including Rachel Carson (ecology), W. Edwards Deming (economics and business), Russell L. Ackoff (management), and Peter Barnard (systems thinking schools).

We need to think about school as a whole.  It’s a school system, and a more powerful way to look at schooling is to think of it as a system.  A system (according to many researchers in this field) is a whole that cannot be divided into independent parts.  Indeed, every part of a system has properties that it loses when separated from the system, and every system has some properties–its essential ones–that none of its parts do.

In order to improve school, we have to stand back and look at the school system.  As we look at school as a system, researchers such as W. Edwards Deming suggest that 94% of the variation we see in the school system is due to the nature of the system, not the people who work or make the system work.  For many of us, this doesn’t make any sense.  But if we are willing to move away from the linear factory model, and move to a vertical or system view, then we are led to ask what are some causes of the variation.  What causes variation in student achievement, in drop-out rates, and the achievement gap?

The Importance of Understanding Variation

There are two types of variation in a system: common-cause (accounting for 94%) and special-cause. Common-cause variation is the noise in a system. It’s there in the background. It’s part of the natural pattern of the system. Special-cause variation is a clear signal, an unnatural pattern, an assignable cause. Variation falling within statistical limits means that any variation we see (test scores, graduation rates, achievement gaps) is the result of the natural behavior of the system, and as such, we cannot point to one reason that caused higher scores, lower graduation rates, or decreases in the achievement gaps. We need to accept the fact that student achievement scores are subject to the behavior of the system, and if you do the math, teachers have almost no control over this. So why do we continue to put the blame on teachers for kids learning or not learning?

In research on the Trial Urban District Assessment which was reported here based on Ed Johnson’s analysis of TUDA for the years 2002 – 2013, there was very little variation in test scores over this period for 21 urban districts.  In fact, except for four instances at the 4th grade reading system, all the variation in test scores at the 4th and 8th grade in math and reading was due to common causes.

Figure 1. TUDA, Reading, 4th Grade Control Chart Showing Long Term Achievement Scores Across 21 Urban Districts. Source Ed Johnson, NAEP TUDA 2002 – 2011 Study
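
For readers who want to see how statistical limits of this kind can be drawn, here is a minimal sketch of an individuals (XmR) control chart calculation in Python. It is not Ed Johnson’s analysis: the scores are hypothetical, and the function names (`xmr_limits`, `special_cause_points`) are only illustrative. The constant 2.66 is the standard XmR scaling factor applied to the average moving range. Points inside the limits are treated as common-cause noise, and points outside them as special-cause signals.

```python
from statistics import mean


def xmr_limits(values):
    """Lower limit, center line, and upper limit for an individuals (XmR) chart."""
    center = mean(values)
    # Moving range: absolute change from each point to the next.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def special_cause_points(values):
    """Return (index, value) pairs that fall outside the natural process limits."""
    lower, _, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical average scores for one district over eight test cycles.
district_scores = [210, 212, 209, 211, 213, 210, 212, 211]

print(special_cause_points(district_scores))          # []
print(special_cause_points(district_scores + [225]))  # [(8, 225)]
```

In the first series every point sits inside the limits, so the ups and downs are noise and no single cause can be assigned to them. Only the added score of 225 in the second series stands out as a signal worth investigating.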

When we try to isolate the effect of teachers on any of the outputs of the school, we are sure to fail. When we try to break the system apart, it loses its essential properties. In this case the output as measured by student test scores is the product of the system, which is due to interactions and interdependencies of which the teacher is only one small part. How is student achievement affected by inadequate resources, living in poverty, not having a home, parents who struggle to earn a living, the size of the school and district, the location of the school, students coming to school each day hungry or inadequately fed, school policies, and so on?


Which Model Describes the Real World?

For example, Mike Stoecklein wrote a guest post on the W. Edwards Deming Institute Blog. According to researchers in the field of systems thinking, the performance of a person cannot be separated from the system, and so is unknown. The relationship between the individual (a teacher in this case) and the system (the school system) is important to understand if we are to try to evaluate teachers based on some measure of students’ performance. Stoecklein presents three models developed by a colleague of Dr. Deming, Heero Hacquebord. They are shown in Figure 1.

In World I the individual is independent of the system, and performance is independent, and in this model, pay for performance, ranking and rating makes sense.  But is it the real world? Of course not.  In World II, the person is immersed in the system, and totally dependent on the system.  All outcomes are attributable only to the system.  Does this world exist? No.  World III is a model in which the individual interacts with the system, performance of the individual can not be separated from the system, and is unknown.  Performance pay or ranking makes no sense.  Performance is only improved by focusing on the union of the system and the person.  Stoecklein believes this is the real world.

Figure 1. Three World Views showing the Interaction between the System and the Individual by Hacquebord, in Mike Stoecklein’s blog post.  (Stoecklein, Mike. “We Need to Understand Variation to Manage Effectively.” Deming Blog. W. Edwards Deming Institute, 07 Feb. 2013. Web. 26 Jan. 2014)

If the world of school was depicted as shown in World I, then using VAM scores might be valid. But World I is not real. Teachers are not separate from the school system any more than are students. So why does the state insist that teacher performance can be measured by student performance? It doesn’t make any sense. World II might be closer to the truth. But surely teachers have some sense of independence, and are not totally dependent on the system.

So we come to World III, where teacher performance is the result of an interaction between the individual and the system. Yet, even in this model, it is not possible to dissect how the system affects performance, any more than student achievement can be used as the reason to judge teacher performance. There are too many other variables and interactions that affect the performance of teachers and students. If we want to improve teacher performance, then we must focus on the union between the system and person. In this model we have to make the assumption that one’s ability as a teacher is related not only to his or her pedagogical abilities, but also to one’s interaction with the system. We could ask, What’s the contribution of the individual to the system? What’s the contribution of the system in which the teacher works? These are not easy questions to answer. To continue to believe student achievement score gains are directly related to personal teacher performance is a falsehood. It’s a misrepresentation of the complexity of teaching and learning.

Yet, in Georgia (and other Race to the Top winning states), large sums of money are being spent on hiring consultants to tell school districts how to manage their people. Heero Hacquebord made an important point about this in a comment on Mike Stoecklein’s blog post:

Our systems are cancerous diseases that consultants do not seem to have the courage to address, because that terminates their client contracts!!!  “Performance appraisal:, “pay for performance”, “bonuses”, “productivity measurements” for nurses and physicians, are sold by consultants at great costs to the health care systems. We talk about respect for people, but then we destroy them by the systems we use. We do not motivate people, we only activate them, which means they do what leadership want them to do because of the consequences if they did not? We end up with fear and intimidation, and people have to go along to put bread on the table (note, substitute the word nurses and physicians with administrators and teachers).  (Stoecklein, Mike. “We Need to Understand Variation to Manage Effectively.” Deming Blog. W. Edwards Deming Institute, 07 Feb. 2013. Web. 26 Jan. 2014)

To Stoecklein, Hacquebord, and others, because system leaders do not understand variation, they continue to lack the knowledge to manage humanely; instead they prod along, tampering with the system. Because of this lack of understanding of systems theory, they think that most of the problems of schools can be put on the shoulders of teachers, and they continue to think that simple causal relationships define the teacher-student relationship. Nothing could be further from the truth.

What is the effect of using student test scores to evaluate teachers? It’s demoralizing to teachers, and imagine the kid who says to herself, “today I am going to take a test that will decide if my teacher is hired or fired!” What’s the effect of this on the school culture? How would you approach the curriculum if you knew that student scores would affect your performance rating and job stability? Wouldn’t you teach to the test? Using pre-test vs post-test scores, Value Added Measures, and high-stakes tests are unsubstantiated methods that have very low reliability on the one hand, and are simply invalid on the other. How can school board members vote to carry out such a plan in their own school district? What are they thinking if they do this?

Last year, a group of Georgia university professors, who are experts in the field of educational evaluation, posted a letter to Governor Deal, State School Superintendent Barge, as well as key politicians in the Georgia Legislature, and superintendents of school districts participating in Georgia’s Race to the Top. The researchers provided detailed evidence that the teacher evaluation system that the Georgia Department of Education has created is not based on supporting research. They recommended that using student achievement scores to evaluate teachers be postponed. Their concerns included the following:

  1. Value Added Models are not proven;
  2. GA is not prepared to implement this evaluation model;
  3. This model is not the most useful way to spend education funds;
  4. Students will be adversely affected by this Value Added Model.

We need not only to suspend the use of teacher evaluation systems based on student achievement gains; we also need to think differently about schools. We need to heed Einstein’s warning that we can’t solve the problems created by the current pattern of thought using the current pattern of thought.

My dear colleagues, school board members, school leaders, if you think student output as measured by achievement tests is a way to evaluate teacher effectiveness, please consider that you might be wrong.


A High School Principal Tells How One Great Teacher Was Wronged by Flawed Evaluation System

Guest Post by Dr. Carol Burris

Carol Corbett Burris has served as principal of South Side High School in the Rockville Centre School District in NY since 2000. Prior to becoming a principal, she was a teacher at both the middle and high school level. She received her doctorate from Teachers College, Columbia University, and her dissertation, which studied her district’s detracking reform in math, received the 2003 National Association of Secondary School Principals Middle Level Dissertation of the Year Award. Dr. Burris has for some time been chronicling the consequences of standardized test-driven reform in her state, which you can read about (here, and here and here, for example).

Burris was named New York’s 2013 High School Principal of the Year by the School Administrators Association of New York State and the National Association of Secondary School Principals, and in 2010 was tapped as the New York State Outstanding Educator by the School Administrators Association of New York State. She is the co-author of the New York Principals letter of concern about the evaluation of teachers by student test scores. It has been signed by more than 1,535 New York principals and more than 6,500 teachers, parents, professors, administrators and citizens. You can read the letter by clicking here.

In this new post, Burris tells the story of a New York state teacher who was just unfairly smacked by the state’s flawed new teacher and principal evaluation system, known as APPR, which in part uses student standardized test scores to evaluate educators. The method isn’t reliable or valid, as Burris shows here.

This article first appeared on the Answer Sheet.  The article is published with permission of Valerie Strauss and Carol C. Burris.

If you are a teacher or principal in Georgia, then you will find this article quite disturbing given that the Georgia legislature passed HB244, which will use a similar system to evaluate educators in Georgia.

How One Great Teacher Was Wronged by Flawed Evaluation System

By Carol Burris

Jenn is a teacher of middle-school students. Her school is in a small city district that has limited resources. The majority of kids in the school receive free or reduced-price lunch and about 40% are black or Latino. Many are English language learners. Lots of them are homeless.

After learning that she was rated less than effective because of her students’ standardized test scores, she wrote to Diane Ravitch, who posted her letter on her blog. She wrote:

I’m actually questioning whether I can teach for the next 20 years. It’s all I’ve ever wanted to do, but this APPR garbage is effectively forcing out some of the best teachers I’ve worked with. I may be next.

I contacted Jenn to better understand her story. I encountered the kind of teacher I love to hire. She has never imagined herself as anything but a teacher—teaching is not a stepping stone to a career in law or business. She does all the extras. She comes in early and leaves late. She coaches. She understands that she must be a counselor, a nurse and a surrogate parent to her students—the most at-risk students in the seventh-grade. Jenn is their support teacher for English Language Arts.

Growth Score diagram showing the three components on which teachers are judged. Source: NYSUT

She is valued by her principal who gave her 58 out of 60 points on the measure of teaching behaviors—instruction, lesson plans, professional obligations, understanding of child development, communication with parents—all of the things that matter and that Jenn can truly control.

Student Test Scores

And then came the test score measures. The grade-level teachers and the principal had to create a local measure of student performance. They chose a group measure based on reading growth on a standardized test. They were required to set targets from a pre-test given in the winter to a post-test given in the spring. The targets were a guess on the part of the teachers and principal. How could they not be? The team was shooting in the dark—making predictions without any long-term data. Such measures can never be reliable or valid.

The state of Massachusetts requires that measures of student learning be piloted and that teachers be evaluated not by one set of scores, but rather by trends over time. That state’s evaluation model will not be fully implemented for several years because they are building it using phase-in and revision. But New York does not believe in research or caution. New York is the state where the powerful insist that teachers “perform,” as though they were trained circus seals. There is no time for a pilot in the Empire State. Our students and we must jump, as our chancellor advises, “into the deep end of the pool.” In New York, our commissioner warns that we can never let the perfect be the enemy of the good. We don’t even let nonsense be the enemy of the good. And so Jenn hoped that she and her colleagues made a reasonable gamble when they set those targets.

The Students’ Tests Don’t Count or Are Too Hard

Many of the seventh-grade students did not take the standardized reading test seriously. Middle schoolers are savvy—they knew the test didn’t count. So they quickly filled in the bubbles as teachers watched in horror. Luckily, enough students took their time so that their teachers were able to get 10/20 points on that local measure of learning, which put Jenn in the Effective range.

The final piece in her evaluation was her score from new Common Core-aligned tests that the state gave to students this past spring. The tests were far too difficult for Jenn’s Academic Intervention Services (AIS) students. They were too long. The reading passages were dense and many of the questions were confusing. We know that only about 1 in 5 students across the state, who are like the majority of Jenn’s students, scored proficient on the Common Core tests. Even more importantly, we know that about half of all students like Jenn’s scored in level 1—below Basic. These are the students who, overwhelmed by frustration, give up or guess. The test did not measure their learning—it measured noise.

It Doesn’t Add Up

So Jenn’s students’ scores, along with all the other seventh-grade scores on the Common Core tests, were put in a regression model and the statisticians cranked the model, and they entered their covariates and set confidence levels and scratched their heads over Beta reports and did all kinds of things that make most folks’ eyes glaze over. And all of that cranking and computing finally spit out all of the teachers’ and principals’ places on the modified bell curve. Jenn got 5 points out of 20 on her state growth score along with the label, Developing.

When all of the points were added up, it did not matter that she received 58/60 points in the most important category of all, which is based on New York’s teaching standards. And it did not matter that she was Effective in the local measure of student learning. 5+10+58 = 73, which meant that Jenn was two points short of being an Effective teacher. Jenn was labeled Developing and put on a mandated improvement plan.
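The arithmetic behind that label can be laid out in a few lines. This is only a sketch of the sum described above, not the state's actual scoring procedure. The component maximums (20, 20, and 60) come from the scores reported in the text, and the cutoff of 75 for Effective is an assumption inferred from the statement that 73 was "two points short."

```python
# Component scores as reported in the text: (points earned, points possible)
components = {
    "state growth score": (5, 20),
    "local measure of student learning": (10, 20),
    "principal's rating on teaching standards": (58, 60),
}

# Assumption: the text says 73 was "two points short" of Effective,
# so the cutoff is taken to be 75. It is not stated directly.
EFFECTIVE_CUTOFF = 75

total = sum(earned for earned, _ in components.values())
possible = sum(maximum for _, maximum in components.values())
shortfall = max(0, EFFECTIVE_CUTOFF - total)

print(f"Composite: {total}/{possible}")
print(f"Points short of Effective: {shortfall}")

# Show each component as a share of its own maximum
for name, (earned, maximum) in components.items():
    print(f"{name}: {earned}/{maximum} = {earned / maximum:.0%}")
```

Seen this way, the component the principal actually observed sits near the top of its scale, while the two test-based components, worth 40 points together, pull the composite below the cutoff.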

This seven-year dedicated teacher feels humiliated. She knows that parents will know and possibly lose confidence in her. She is angry because the label is unfair. She will be under scrutiny for a year. Time she would spend on her students and her lessons will be wasted in meetings and improvement plan measurement. The joy of teaching is gone. It has been replaced by discouragement and fear.

Her principal also knows it is not fair—she gave Jenn 58/60 points. Over time, however, she may begin to doubt her own judgment—the scores may influence how she rates teachers. After all, principals get a growth score too, and the teachers with low scores will become a threat to principals’ own job security over time. Those who created this system put Machiavelli to shame.

Flawed Model

Jenn is not alone. There are hundreds, if not thousands, of good teachers and principals across the state who are receiving poor ratings they do not deserve based on a flawed model and flawed tests. Slowly, stories will come out as they gain the courage to speak out. There will be others who suffer in silence, and still others who leave the profession in disgust. None of this is good for children.

During the July 2013 hearing of the Governor’s New Education Reform Commission, David Steiner, the previous New York State Commissioner of Education, said,

There is a risk, and I want to be honest about this, that very, very, mature, effective teachers are saying you are treating me like a kid. In the name of getting rid of the bottom 5 percent, we risk losing the top 5 percent….We do not want to infantilize the profession in order to save it.

Steiner directed those remarks to his former deputy, now state commissioner, John B. King. Did King understand what his former mentor was trying to tell him? Because he did not respond to Steiner’s observation, we do not know.

John King told districts to use caution when using this year’s scores to evaluate teachers and principals. He claimed that the tests did not negatively impact teachers’ accountability ratings. Perhaps he should ask Jenn if she agrees. We already know that 1 in 4 teachers and principals moved down at least one growth score category from last year, hardly the hallmark of a reliable system.

What Should Be Done

There is much that King and the Board of Regents can do. They can ask the governor to pass legislation so that the evaluations remain private. They can request that teachers like Jenn, who are more than effective in the eyes of their principals, be spared an improvement plan this year. I hold no hope, however, that John King will do that. He lives in “the fierce urgency of now.” But for Jenn and her students, now quickly becomes tomorrow. The risk that David Steiner explained is real. We need to make sure that we have our best teachers tomorrow and not lose them in the deep end of the pool.

What do you think? Could this be the fate of many of Georgia’s teachers? And if you are a principal, how does this affect you?

In Marietta, GA, Teachers Might Be Scammed by the Use of VAM

Latest Story

In 2010, Georgia was one of the winners of the Race to the Top competition. The prize was half a billion dollars from the federal government to, among other things, adopt the Common Core standards and base teacher evaluation on student test scores.

Some more facts:

In 2012, the Georgia Department of Education applied for an NCLB waiver (full report), and again agreed to the use of student test scores as a significant part of teacher and principal evaluation.

Then,  the esteemed Georgia Legislature passed HB244 (Annual Performance Evaluations) this year.  What does this mean?

To put it nicely, another nail is now placed in the educator’s coffin.  This law, which will apply to all teachers and principals in Georgia, says:

Growth in student achievement/academic achievement shall be the priority measuring stick and shall count for at least 50% of the evaluation.
Basically, it is saying that if your students had a bad year, then you or your principal caused this, and you should be punished.


States Using Value-Added Model to Wreak Havoc on Schooling

This simple view of learning is based on a very old and stale explanation for how our kids learn.   The teacher causes the student to learn.  If the student learns, then the student is rewarded.  If the student does not learn, then the student is punished.  And now the brilliant Georgia legislature, which meets for only 40 days each year, has decided that the teacher is the major determiner of student learning.

But hold on. According to Georgia HB244, teachers will be punished if their students’ scores are low, or might be rewarded if their students’ scores increase. It’s sort of like a mother telling a child who has finished her work, and asks for dessert, “We’ll see.”
Unfortunately for all of us, a lot of policy makers, legislators, school board members, and citizens think that what a child learns is directly caused by the teacher. We now ask, “How much does a teacher add to the learning of students in a class?” Probably a lot, but the method used to measure it is called VAM (Value-Added Model), which rhymes with SCAM.
And this is just what it is, a SCAM. If you don’t believe me, then read this article on Anthony Cody’s blog, Living in Dialogue, written by a Florida teacher who explains why she thinks VAM is a scam.


So why would a highly rated school system, such as the Marietta City Schools, pay a group (Education Resource Strategies) from Massachusetts to tell them how to spend their money and test their rather successful school faculty and administration?
The superintendent of Marietta City Schools said in an article in the Marietta Daily Journal that “compensation redesign is something that’s long overdue in our profession.” I would agree with her. But why would she throw out a system that is based on experience and qualifications and replace it with a system that is untested, unscientific, unethical, and, some would say, immoral?
If you look at some of America’s most prestigious organizations, experience and education level are key factors used to decide employee salaries.  Yes, performance evaluation is part of their strategy, but evaluation is not used to penalize the employee, but to improve the employee’s ability to be a better professional and contribute to the target goals and aspirations of the organization or company.
Instead, Georgia will institute a competitive system of rewards and punishments based on how well our students do, and then use these test scores to praise or degrade teachers and principals. How immoral is that?
Instead of a system that advocates a dog-eat-dog world, why not base it on principles of equity and high performance, in which teachers are held accountable for providing the highest quality educational environment, one in which children thrive and are not held as pawns in an education marketplace that uses student test scores as the “bottom line”?
To carry out this kind of teacher performance evaluation is not only shameful, it will result in many unintended consequences.  Here are a few:
Teachers and principals will do an outstanding job with our students without threats, penalties, and the kinds of rigid controls that are described in HB244. If you are a parent, you know that when your kids come home from school, they have a more ingenious way of evaluating their teachers. They tell you, because they trust you.
Now we need to send the School Board of Marietta a message asking them to vote against the concept of a pay-for-performance plan for teachers and principals.

What would you tell the School Board of Marietta?