There was an article today in the Atlanta Journal-Constitution that really got my dander up. The article, written by AJC blogger Maureen Downey, was entitled Grading on a curve, and it was about teacher evaluation systems. Downey focused on classroom observation systems, indicating that only 22% of teachers will be evaluated with student test scores. This is the first error in the article. In Georgia, for instance, 50% of each teacher’s evaluation will be based on student test scores. And again, that’s every teacher. Even teachers who do not teach courses in which standardized tests are used will be evaluated by how all teachers in a school do.
She then reviews a study published by the Brookings Institution which examined teacher observation systems in four school districts. As she reports, teacher observation systems seem to be biased in favor of teachers teaching high-performing students, and unfair to teachers teaching low-performing students.
But then she cites Sandi Jacobs of the National Council on Teacher Quality (NCTQ). We’ve debunked the NCTQ here, as have many other bloggers, and so I was disappointed that Downey would refer to NCTQ in her article. The last group to go to for advice on improving teaching is the NCTQ. Most of the reporting done by the NCTQ is junk science.
Twin Methods of Teacher Evaluation
The twin methods that are put together to form a teacher evaluation system are absurd, muddled, and unreasonable. Even more, the assumptions used to evaluate teachers are rooted in false claims about what effective teaching is, and how one knows when effective teaching happens. At its stupidest level, bureaucrats who sit in front of their computer screens, and who’ve consulted with agronomists, believe they have the algorithms that will actually measure, in some quantifiable way, just how much a teacher adds to student academic achievement.
Framework of Classroom Teaching
Then there is the group that believes it is possible to quantify teacher effectiveness by observing teachers in action in the classroom. One of the most common systems for measuring teacher classroom effectiveness is Charlotte Danielson’s Framework for Teaching. The mantra from the Danielson group is that the “framework” is comprehensive and coherent and based on those aspects of teaching (behaviors) that promote student learning.
But here is the thing: the “framework” reduces teaching to 22 components and 76 smaller elements organized into four domains of teaching (Planning and Preparation, the Environment, Delivery of Services, and Professional Responsibilities). This is a classic example of reductionism. And for reductionist researchers, the use of this kind of framework of teaching makes sense.
The Danielson Framework is not a new idea. For decades, educational researchers have developed and implemented dozens of “instruments” to observe and quantify teacher behavior. Most of these instruments were analytic: teacher behavior was divided into categories or clusters of performance, as is done in the Danielson Framework.
And of course the most extreme reductionist measure is the quantification of learning by means of achievement test scores. Using the same logic used in evaluating teacher performance, student performance is measured using standardized tests which are based on content components and smaller elements that are organized into domains of content in fields such as science, mathematics, social studies, and English/language arts.
Teachers should not be scored or rated as if they were in a competition to win or lose something. The use of these systems borders on a sinister view of teachers, who, for some reason, need to be poked, prodded, and measured. If you read the value-added technical documents, such as this one in Florida, you will probably have a nervous breakdown. And you will wonder how the algorithms used have anything to do with teaching. Figure 1 is the algorithm used to compute VAM (for Florida teachers).
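To see what these documents boil down to, here is a toy sketch of the basic covariate-adjustment logic behind most value-added models. This is my own simplified illustration, not Florida’s actual specification: the real models use many more covariates and a far more elaborate statistical machinery. The core idea is just this: predict each student’s score from prior scores, then credit (or blame) the teacher with the average leftover.

```python
def toy_vam(prior, current, teacher_ids):
    """Toy covariate-adjustment VAM (illustrative only).

    Fit a least-squares line predicting current scores from prior
    scores, then average each teacher's student residuals.
    """
    n = len(prior)
    mean_p = sum(prior) / n
    mean_c = sum(current) / n
    # Least-squares slope and intercept for: current ≈ a * prior + b
    a = sum((p - mean_p) * (c - mean_c) for p, c in zip(prior, current)) / \
        sum((p - mean_p) ** 2 for p in prior)
    b = mean_c - a * mean_p
    # Residual = how far each student landed from the prediction
    residuals = [c - (a * p + b) for p, c in zip(prior, current)]
    # Teacher "value added" = mean residual of that teacher's students
    by_teacher = {}
    for t, r in zip(teacher_ids, residuals):
        by_teacher.setdefault(t, []).append(r)
    return {t: sum(rs) / len(rs) for t, rs in by_teacher.items()}

# Hypothetical example: two teachers, four students each
prior    = [50, 60, 70, 80, 50, 60, 70, 80]
current  = [55, 66, 74, 85, 52, 61, 72, 81]
teachers = ["A"] * 4 + ["B"] * 4
scores = toy_vam(prior, current, teachers)
```

Teacher A’s students land above the shared prediction line and teacher B’s land below it, so A gets a positive number and B a negative one. Notice that nothing in the calculation looks at teaching at all; it only looks at test scores.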
Using VAM scores is part of a larger plan to use standardization and high-stakes test accountability to privatize public education, and to reduce teaching to “teaching to the test.”
We have to keep in mind that public education has become a place where the locus of success is student achievement test scores. The system of accountability is like mildew, a thin layer covering any sense of creativity and innovation, one that gives off a smell much like the fungus that created it in the first place.
Student achievement gains, according to VAM folks, can be traced back to a teacher’s contribution using an algorithm that most people who work at state departments of education cannot explain to teachers. They have no idea how to use the results of VAM to help teachers improve. All these scores do is offer a story for newspapers that list the VAM scores of teachers, and hang them out to dry.
Dr. Cathy O’Neil, a mathematician and professor at Columbia University, where she is director of the Lede Program at the Journalism School, blogs at mathbabe (exploring and venting about quantitative issues). I’ve read her blog regularly for the past year, and she’s brought me into a world that has pushed me into areas I know very little about, but because of the way she writes, I’ve found a number of her ideas applicable to this blog.
Her interest in teaching is quite clear on her blog. If you search “teaching” on her blog you will find articles that are very pertinent to this blog post. She has a collection of articles on VAM, and discussions of VAM from a perspective that is crucial to efforts to fight against the use of VAM, let alone shaming teachers by posting VAM scores publicly. She discussed in one of her articles how detestable it was when New York City teachers’ VAM scores were released.
But read what she said about the nature of the VAM score. What is “underneath” the VAM score? What does it mean? She writes:
Just to be clear, the underlying test doesn’t actually use a definition of a good teacher beyond what the score is. In other words, this model isn’t being trained by looking at examples of what is a “good teacher”. Instead, it derived from another model which predicts students’ test scores taking into account various factors. At the very most you can say the teacher model measures the ability teachers have to get their kids to score better or worse than expected on some standardized tests. Call it a “teaching to the test model”. Nothing about learning outside the test. Nothing about inspiring their students or being a role model or teaching how to think or preparing for college.
A “wide margin of error” on this value-added model then means they have trouble actually deciding if you are good at teaching to the test or not. It’s an incredibly noisy number and is affected by things like whether this year’s standardized tests were similar to last year’s. (O’Neil, C. Teaching scores released, mathbabe, Feb. 26, 2012, extracted May 19, 2014.)
We are making a serious mistake to condone the use of VAM, and I was disturbed by Maureen Downey’s article’s lack of any criticism of VAM. She did point out some of the shortcomings of using classroom observation systems, but here is the thing. A classroom visit, especially by a colleague or someone who is informed about providing feedback to help improve instruction, is a much more valuable tool to improve teaching. The concern I have for the way classroom observation systems are being used is that these observations will result in a calculation or a number which will be used with VAM scores to rate, grade, and judge teachers.
This needs to be prevented.
If we want to improve teaching, then it needs to be accomplished in a collaborative, collegial way. Teachers, in order to take risks with their teaching style and methods, need to trust the people who visit their classrooms to see them at work.
Trust. How can teachers trust the system when it uses complicated algorithms to rate them based on dubious academic achievement (standardized) tests that may or may not “test” the content that was part of their curriculum?
The system of teacher evaluation that is prevalent in most states is absurd.
How do you think teachers should be evaluated?