Why Bill Gates Defends the Common Core & Other Top 2014 Posts

In 2014, there were 100 new posts added to the Art of Teaching Science blog, as shown in the graphic below. I’ve made links to the top five posts for 2014.  As you can see, our examination of how the Gates Foundation has used its billions to influence the Common Core State Standards was in the most-viewed category.  The reduction in funding has also had a marked influence on scientific research.  And speaking of research, too many educators, especially at the top (Sec. A. Duncan), believe there is research to support the use of Value Added Models to evaluate teacher performance.  And gaining momentum is the absurdity of following graduates of teacher education institutions to track the test scores of the K-12 students they teach.  The U.S. Department of Education will propose these regulations next year.  You can read about this here on Diane Ravitch’s blog and comment here.

In fifth place was a post I wrote about the advantages and disadvantages of the theory of plate tectonics and the theory of gravity in response to politicians who want to promote “critical thinking” by imposing their will on teachers and insisting that they critically look at the theory of evolution, origins of life, global warming, and climate change.

Figure 1. 2014 Distribution of blog posts on the Art of Teaching Science


Why Bill Gates Defends the Common Core.  At a national conference, Gates said he was concerned with people who oppose implementing the Common Core State Standards. We explore why in this post.

Why Are Scientists Abandoning Their Research.  A survey was sent to 67,454 researchers holding grants from the National Institutes of Health (NIH) or the National Science Foundation (NSF). Results and implications are discussed.

Top 20 Organizations Receiving Common Core Grants from the Gates Foundation. In this post, I report on those organizations that were funded by the Bill & Melinda Gates Foundation to make the Common Core State Standards a reality.  The Council of Chief State School Officers leads the pack. But it looks like the Common Core is running into a brick wall.

The Absurdity of Teacher Evaluation Systems.  An article in the Atlanta Journal Constitution got my gander up about teacher evaluation systems. I vent here.

Advantages & Disadvantages of Plate Tectonics Theory & the Theory of Gravity. When politicians enter the arena of education and curriculum, and especially fields such as science, they are on a slippery slope. If, however, they simply want the facts (on climate change or global warming) taught in science class, they might go here.

Thank you for visiting my blog in 2014.

Happy New Year.

The Absurdity of Teacher Evaluation Systems


Creative Commons, Gander? by Jack Hassard Licensed Under CC BY-NC-SA

There was an article today in the Atlanta Journal-Constitution that really got my gander up.  The article, written by AJC blogger Maureen Downey, was entitled Grading on a curve.  The article was about teacher evaluation systems.  Downey’s article focused on classroom observation systems, indicating that only 22% of teachers will be evaluated with student test scores.  This is the first error in the article.  In Georgia, for instance, 50% of each teacher’s evaluation will be based on student test scores.  And again, that’s every teacher.  Even teachers who do not teach courses in which standardized tests are used will be evaluated by how all teachers in a school do.

She then reviews the study published by the Brookings Institution which examined teacher observation systems in four school districts.  As she reports, teacher observation systems seem to be biased in favor of teachers teaching high-performing students, and unfair to teachers teaching low-performing students.

But then she cites Sandi Jacobs of the National Council on Teacher Quality (NCTQ).  We’ve debunked the NCTQ here, as have many other bloggers, and so I was disappointed that Downey would refer to NCTQ in her article.  The last group to go to for advice on improving teaching is the NCTQ.  Most of the reporting done by the NCTQ is junk science.

Twin Methods of Teacher Evaluation

The twin methods that are put together to form a teacher evaluation system are absurd, muddled, and unreasonable.  Even more, the assumptions which are used to evaluate teachers are rooted in false claims about what is effective teaching, and how one knows when effective teaching happens.  At its stupidest level, bureaucrats who sit in front of their computer screens, and who’ve consulted with agronomists, believe they have the algorithms that will actually measure, in some quantifiable way, just how much a teacher adds to student academic achievement.

Framework of Classroom Teaching

Then there is the group that believes it is possible to quantify teacher effectiveness by observing teachers in action in the classroom.  One of the common systems to measure teacher classroom effectiveness is Danielson’s Framework for Teaching.  The mantra from the Danielson group is that the “framework” is comprehensive and coherent and based on those aspects of teaching (behaviors) that promote student learning.

But here is the thing: the “framework” reduces teaching to 22 components and 76 smaller elements organized into four domains of teaching (Planning and Preparation, the Environment, Delivery of Services, and Professional Responsibilities).  This is a classic example of reductionism.  And for reductionist researchers, the use of this kind of framework of teaching makes sense.

The Danielson Framework is not a new idea. For decades, educational researchers have developed and implemented tens of “instruments” to see and quantify teacher behavior.  Most of these instruments were analytic–teacher behavior was divided into categories or clusters of performance, as is done in the Danielson Framework.


And of course the most extreme reductionist measure is the quantification of learning by means of achievement test scores. Using the same logic used in evaluating teacher performance, student performance is measured using standardized tests which are based on content components and smaller elements that are organized into domains of content in fields such as science, mathematics, social studies, and English/language arts.

Teachers should not be scored or rated as if they were in a competition to win or lose something.  The use of these systems borders on a sinister view of teachers, who for some reason need to be poked, prodded, and measured.  If you read the value added technical documents, such as this one in Florida, you will probably have a nervous breakdown.  And you will wonder how the algorithms used have anything to do with teaching.  Figure 1 is the algorithm used to figure VAM (for Florida teachers).

Figure 1. Value Added Model for the State of Florida. Source: Florida Value-Added Model Technical Report, American Institutes for Research


Using VAM scores is part of a larger plan to use standardization and high-stakes test accountability to privatize public education, and cut teaching to “teaching to the test.”

We have to keep in mind that public education has become a place where the locus of success is based on student achievement test scores.  The system of accountability is like mildew, a thin layer covering any sense of creativity and innovativeness, that results in a smell much like the fungus that created it in the first place.

Student achievement gains, according to VAM folks, can be traced back to a teacher’s contribution using an algorithm that most people who work at state departments of education cannot explain to teachers.  They have no idea how to use the results of VAM to help teachers improve.  All these scores do is offer a story for newspapers to list the VAM scores of teachers, and leave them out to dry.

Dr. Cathy O’Neil, a mathematician and professor at Columbia University, where she is director of the Lede Program at the Journalism School, writes a blog at mathbabe (exploring and venting about quantitative issues).  I’ve read her blog regularly for the past year, and she’s brought me into a world that has pushed me into areas that I know very little about, but because of the way she writes, I’ve found a number of her ideas applicable to this blog.

Her interest in teaching is quite clear on her blog.  If you search “teaching” on her blog you will find articles that are very pertinent to this blog post.  She has a collection of articles on VAM, and discussions of VAM from a perspective that is crucial to efforts to fight against the use of VAM, let alone shaming teachers by posting VAM scores publicly.  She discussed in one of her articles how detestable it was when New York City teachers’ VAM scores were released.

But read what she said about the nature of the VAM score.  What is “underneath” the VAM score?  What does it mean?  She writes:


Just to be clear, the underlying test doesn’t actually use a definition of a good teacher beyond what the score is. In other words, this model isn’t being trained by looking at examples of what is a “good teacher”. Instead, it derived from another model which predicts students’ test scores taking into account various factors. At the very most you can say the teacher model measures the ability teachers have to get their kids to score better or worse than expected on some standardized tests. Call it a “teaching to the test model”. Nothing about learning outside the test. Nothing about inspiring their students or being a role model or teaching how to think or preparing for college.

A “wide margin of error” on this value-added model then means they have trouble actually deciding if you are good at teaching to the test or not. It’s an incredibly noisy number and is affected by things like whether this year’s standardized tests were similar to last year’s. (O’Neil, C. Teaching scores released, mathbabe, Feb. 26, 2012, extracted May 19, 2014.)
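To make O’Neil’s description concrete, here is a minimal toy sketch of the general idea: predict each student’s score from a prior score, then average the “better or worse than expected” gaps by teacher. This is not Florida’s model or any state’s actual formula; the function names, the single predictor (prior score), and the data are all made up for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x (one predictor only)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def toy_value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples.

    Returns {teacher: average of (actual score - predicted score)}.
    A toy stand-in for a value-added score, NOT a real state model.
    """
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    a, b = fit_line(priors, currents)
    gaps = {}
    for teacher, prior, current in records:
        predicted = a + b * prior
        gaps.setdefault(teacher, []).append(current - predicted)
    return {teacher: mean(values) for teacher, values in gaps.items()}

# Made-up scores for two hypothetical teachers with three students each.
records = [
    ("Teacher A", 50, 55), ("Teacher A", 60, 66), ("Teacher A", 70, 77),
    ("Teacher B", 50, 49), ("Teacher B", 60, 58), ("Teacher B", 70, 69),
]
scores = toy_value_added(records)
for teacher, score in scores.items():
    print(f"{teacher}: {score:+.2f}")
```

Notice that nothing in the sketch knows anything about what happened in the classroom. The “score” is only the leftover from a prediction, which is exactly O’Neil’s point.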


Big Mistake

We are making a serious mistake to condone the use of VAM, and I was disturbed by Maureen Downey’s article’s lack of any criticism of VAM.  She did point out some of the shortcomings of using classroom observation systems, but here is the thing.  A classroom visit, especially by a colleague or someone who is informed about providing feedback to help improve instruction, is a much more valuable tool to improve teaching.  The concern I have for the way classroom observation systems are being used is that these observations will result in a calculation or a number which will be used with VAM scores to rate, grade, judge teachers.

This needs to be prevented.

If we want to improve teaching, then it needs to be accomplished in a collaborative, collegial way.  Teachers, to take risks with their teaching style and methods, need to trust the people who visit their classroom to see them at work.

Trust.  How can teachers trust the system when it uses complicated algorithms to rate them based on dubious academic achievement (standardized) tests that may or may not “test” the content that was part of their curriculum?

The system of teacher evaluation that is prevalent in most states is absurd.

How do you think teachers should be evaluated?

Third Strike Against Teacher Evaluation Schemes: Brave New Parents Opt Out

Creative Commons Little League Strike Three by rundnd Licensed under CC BY-ND 2.0

The headline in Thursday’s Atlanta Journal-Constitution was “Parents push back on required testing.”

Could the movement to Opt Out of high-stakes testing be the third strike against using high stakes testing to rate teachers? In an earlier post, two studies were reviewed that cast doubt on the use of VAM scores  (which are based on student achievement scores) and classroom observation systems to rate teachers.

There is a movement that begins in the homes of children whose parents have had it with their children being subjected to tests that not only create high anxiety, but are a dubious snapshot of student learning.

Marietta, Georgia

In Marietta, Georgia, where I live, a courageous family decided not to allow their children in elementary school to take the state CRCTs two weeks ago. Met by the police claiming that they would be trespassing if they entered the school and their children did not take the CRCTs, the family went home, and stood their ground against the Marietta School District.

School officials use scare tactics claiming that there is no provision for parents to opt their children out of exams. But, there is no provision preventing parents from opting out. So, in Marietta, the parents were told their kids didn’t have to take the CRCTs.

Meg Norris, on a United Opt Out website, has created a quick-reference guide for Georgia parents who want to opt out or refuse to have their children take state-mandated high-stakes tests.

In the article mentioned at the top of this post, Meg Norris was quoted as saying:

Georgia parents have been told they must withdraw their child from school if they do not wish them tested.  Georgia parents have been told they will brought up in front of tribunals, sent to court, referred to DFACS for keeping children at home.  Children have been left out of parties and humiliated in front of their classmates.

But as you will see as you read ahead, there is support out there for parents who are brave enough to weather the resistance they will get, as the Marietta family did.


Edy Chamness, a former teacher, and parent in Austin, Texas, and professor Julie Westerlund founded the Texas chapter of the Opt Out Movement.  I came in contact with Chamness and Westerlund when I reached out to Joyce Murdock Feilke to find out about what she called “psychological abuse” created by the state-wide obsession with high-stakes testing in an Austin elementary school where she was a school counselor.

Joyce reported her observations to authorities in the state and district and the Austin American-Statesman, but in the end her concerns were dismissed by the superintendent (Dr. Meria Carstarphen, Atlanta’s new superintendent).  You can read Joyce’s report here.

Edy Chamness and Julie Westerlund were professional colleagues of Joyce’s and provided more and compelling evidence that children are being used in an experiment, rooted in punitive classical conditioning, to meet the goals of the school district, which are to increase student test scores and eventually graduation rates.

Edy Chamness wrote to me that Joyce and Julie are not exaggerating when they described the horrible bullying practices used in Austin ISD.  She says:

I was banned from my kids’ elementary school for sharing information about the Opt Out Movement with parents. Our son started a new school this year and the administration seems decent. Unfortunately, everything at the school is totally geared toward test prep and practice testing. The vast majority of our son’s assignments are worksheet packets. Tons of work is assigned as nightly homework; most of it is skills-based instruction and memorization. The two elementary schools in my neighborhood, Mills & Kiker, are horrible. No enrichment programs, no literature-based reading instruction, no games for reinforcement or outdoor education. NOTHING but practice tests and worksheets. The only thing that matters is test scores.

Now Director of Texas Parents Opt Out of State Tests, Edy Chamness is one of many leaders of the Opt Out movement.  The Texas Opt Out group is very active, and offers a great deal of support to parents and teachers who acknowledge that the testing craze needs to be stopped, and that one way to do this is to take action by not participating in any high-stakes tests.

The Nation

Opting Out, or refusing to have students take state standardized tests, is part of a larger movement of a number of groups including FairTest, Parents Across America, Save our Schools, and the Network for Public Education.

Testing has become perverse.  It doesn’t have to be.  But we’ve gone too far, and the testing that is being mandated is unnecessary.  One Georgia Department of Education official said that we needed testing to find out how schools are doing, because, after all, the state is spending a lot of money on public education.

If we want to know how our schools are doing, there are better ways to answer that question than forcing millions of American students to spend their school days either preparing for tests, or sitting in front of a computer or at a desk to answer questions written by hired guns at corporations that charge at least $30 per student to do this!

The state of Georgia has just agreed to pay McGraw-Hill $110 million to develop a new battery of tests (to be called the GMAP, for Georgia Measures of Academic Progress) to measure the academic progress of students on the Common Core.

Maybe the students should be paid for participating in these experiments. Maybe students should unionize, much like college students are doing who happen to be athletes.

We already have the “data” we need to assess the performance of schools.  In 1969, the National Assessment of Educational Progress, also called The Nation’s Report Card, was established and has been using low-stakes testing in reading, writing, math, science and other subjects to assess student achievement.

Not only do we have access to reports as they come out, but the NAEP has conducted long-term studies for us to look at, and more recently designed and carried out the Trial Urban District Assessment, bi-annual tests in reading, writing, math and science.  Using sampling and stratification, the NAEP selects a sample of students from public and private schools large enough to estimate performance by state.  Only about 60 students per school selected are tested.  No student has to sit for the entire test.  Each student takes part of the test, and scores are aggregated to set up averages.
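The sampling idea described above can be sketched in a few lines. This is only an illustration of the general approach (sample some students, give each one part of the test, report averages), not NAEP’s actual procedure: the stratification step is left out, and the number of test blocks, the school size, and the scoring function are made-up assumptions. Only the figure of about 60 students per school comes from the description above.

```python
import random
from statistics import mean

def assign_blocks(student_ids, sample_size, n_blocks, seed=0):
    """Draw a simple random sample and give each sampled student ONE block of the test."""
    rng = random.Random(seed)
    sampled = rng.sample(student_ids, sample_size)
    # Round-robin assignment keeps the blocks roughly equal in size.
    return {student: i % n_blocks for i, student in enumerate(sampled)}

def block_averages(assignments, score_fn):
    """Average scores block by block; no individual student gets a total score."""
    by_block = {}
    for student, block in assignments.items():
        by_block.setdefault(block, []).append(score_fn(student, block))
    return {block: mean(scores) for block, scores in by_block.items()}

def made_up_score(student, block):
    """Hypothetical stand-in for a student's result on one block of the test."""
    return 50 + (student + block) % 10

school = list(range(500))  # hypothetical school of 500 students
assignments = assign_blocks(school, sample_size=60, n_blocks=4)  # 4 blocks is an assumption
averages = block_averages(assignments, made_up_score)
print(averages)
```

In this sketch 440 of the 500 students are never tested at all, and the 60 who are tested each see only a quarter of the test, yet the school still gets an average for every block.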

By the way, the state of Georgia could use the same research design methods as NAEP to assess students in the state.  In fact, Georgia could use the CRCT, or the new GMAP, in a low-stakes approach, thereby informing the state whether it’s getting its money’s worth out of teachers and students, and also telling the school districts about their performance.  And there would be no need to test every kid.

If high-stakes testing is revoked, we will make one of the most important decisions in the lives of students and their families, and the educators who practice in our public schools.  Banning tests, throwing them out, eliminating them, whatever you wish to call it, will open the door to more innovative and creative teaching, and an infusion of collaborative and problem solving projects that will really prepare students for career and college.

Making kids endure adult anger is not what public education is about.  Why in the world are we so angry and willing to take it out on K-12 students?  Why do we put the blame on children and youth, and if they don’t live up to a set of unsubstantiated and unscientific standards and statistics, we take it out on teachers?

The best thing for students is to throw the bums (tests) out.  The next best thing will be for teachers because without standardized test scores, there will be no way to calculate VAM scores as a method to evaluate teachers.

Strike Three!

Two Strikes Against Teacher Evaluation Schemes

"Creative Commons Strike One" by Eli Christman is licensed under CC BY 2.0.

Two curve balls were thrown at the movement to evaluate teachers using student test scores and classroom observations.  Both were strikes!

Attempts to evaluate teachers have focused on classroom observations of teacher performance, and the contributions (value added) teachers make to student test score gains.

Two studies were published recently, casting doubt on the use of classroom observations and VAM as strong indicators of teacher quality.

The two studies overlap, and their results raise further questions about the nature of teaching. One of the seven deadly diseases in the management of any organization identified by W. Edwards Deming is evaluation of performance, merit rating, or annual review.  He doesn’t advocate any of these. Perhaps the research reported here will support him.

I’ll come back to this later.

But, let’s take a look at the studies.

Evaluating Teachers with Classroom Observations (TCO)

Nearly all teachers are evaluated to some extent based on classroom observations conducted by administrators or trained technical observers.  In many states classroom observations contribute up to 50% of a teacher’s evaluation; in some districts they count even more.

Are classroom observations valid and reliable to use to rate teachers?

In a study conducted by the Brown Center at the Brookings Institution, entitled Evaluating Teachers with Classroom Observations, researchers found that classroom observation systems favor teachers with top students, and those teaching in large districts.  They also found that observations by outside observers were more valid than those conducted by school administrators.

Using student test scores and classroom observation data, the researchers obtained individual student achievement data linked to each teacher from four urban districts’ databases.  None of the names of the districts were revealed because of “political sensitivities” surrounding teacher evaluation systems.  I’d say so!

Citing correlations of 0.33 to 0.38, the authors of the Brookings report claim a statistically significant and robust predictive relationship with the ability of teachers to raise student test scores in an adjacent year.

It is obvious that the researchers have a bias toward the use of student test scores to generate VAM scores, and believe this method is better than a teacher’s paper credentials and types of experience.  To say that a correlation of 0.33 is “robust” is a misinterpretation of this value.  A correlation of 0.33 is moderate at best and borders close to a weak positive relationship.  Then, to use moderate to weak correlations to evaluate teachers and make life decisions is unwarranted.
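One common way to see why a correlation of 0.33 is not “robust” is to square it. For a simple linear relationship, r squared is the share of the variation in one measure that is accounted for by the other. The short sketch below does that arithmetic for the two values quoted from the Brookings report; the function name is just an illustrative choice.

```python
def shared_variance(r):
    """Square a correlation coefficient to get the proportion of variance
    that one measure accounts for in the other (simple linear case)."""
    return r * r

for r in (0.33, 0.38):
    share = shared_variance(r)
    print(f"r = {r:.2f}  ->  r squared = {share:.3f}  (about {share:.0%} of the variance)")
```

That works out to roughly 11% and 14%, which leaves well over 80% of the variation in test score gains unaccounted for by the observation scores.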

This study throws a curve at using VAM and Classroom Observations to rate teachers.

Strike one!

Evaluating Teachers with the Value Added Model (VAM)

In a study, Instructional Alignment as a Measure of Teaching Quality, published in Educational Evaluation and Policy Analysis, authors Morgan S. Polikoff (University of Southern California) and Andrew C. Porter (University of Pennsylvania) found very weak associations between teachers’ instructional alignment and contributions to student learning (VAM).  They were interested in finding out if there was any relationship between the teacher’s alignment of instruction to the standards and assessments and the value added to student achievement.  The study was funded by the Gates Foundation.

Their research was guided by OTL, or “students’ opportunity to learn.” According to the authors, the OTL literature holds that teaching practices aligned to the standards, and the quality of the methods used, affect student learning.

But as the authors point out, their study is one of the first to associate VAM scores with both instructional alignment and pedagogical quality.  They also have data across state lines, and they are among the first to use VAM measures as dependent variables.  The teachers who participated in their study (a survey) were drawn from the MET study (the Gates Measures of Effective Teaching study).  In all, 701 teachers from six MET partner districts were contacted to take part.  Of those contacted, 388 responded, and 278 actually completed the survey (39% participation rate).
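The participation figures above are easy to check. The sketch below simply redoes the arithmetic from the three counts reported in the paragraph; the variable names are illustrative.

```python
contacted = 701   # teachers contacted
responded = 388   # teachers who responded
completed = 278   # teachers who completed the survey

response_rate = responded / contacted
completion_rate = completed / contacted
completed_given_response = completed / responded

print(f"Responded:  {response_rate:.1%} of those contacted")
print(f"Completed:  {completion_rate:.1%} of those contacted")
print(f"Completed:  {completed_given_response:.1%} of those who responded")
```

The completion figure comes to a little under 40% of those contacted, in line with the 39% participation rate quoted above (which appears to be truncated rather than rounded). In other words, about six in ten of the teachers contacted are not represented in the results.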

Here are some of their findings:

  • The researchers found very low correlations between the alignment of teachers’ instruction with state standards and VAM scores on state and alternative assessments.  For math the overall correlation was r = .16, and in ELA, they found a significant negative relationship (r = -.24) between instruction-standards alignment and state test VAM for one district, and a very low correlation overall (r = .14).
  • In short, there is no evidence of relationships between alignment and a composite measure of effectiveness.
  • Overall, the correlations do not show evidence of strong relationships between alignment or pedagogical quality and VAM scores.
  • The finding of lack of relationship between FFT (Danielson’s Framework for Teaching) and Tripod (student surveys) to VAM scores is in contrast to the statistically significant relationships found in analyses of the same correlations in the full study database (Bill & Melinda Gates Foundation). Still, the size of the relationships both here and in the full study was small.

The authors’ conclusions are very important given that policy makers have made decisions to use VAM to evaluate teachers without evidence to support its use, and in spite of the AERA’s condemnation of using high-stakes tests to decide people’s fates.

Take a look at these conclusions the authors make about using VAM as a measure of teacher quality:

  • Overall, the results are disappointing. Based on our obtained sample, we would conclude that there are very weak associations of content alignment with student achievement gains and no associations with the composite measure of effective teaching….we are left with the conclusion that there are simply weak relationships.
  • Another interpretation the authors make is that the tests used for calculating VAM are not able to detect differences in the content or quality of teaching.  Since these are standardized tests, how could they really be used to detect differences among teachers?  (Well, what do you know!)
  • But here is a powerful conclusion they make.  These results suggest it may be fruitless for teachers to use state test VAMs to inform adjustments to their instruction.  And they add, if VAMs are not meaningfully associated with either the content or quality of instruction, what are they measuring?

These are significant findings. Most states evaluate teacher effectiveness using a very complicated statistical analysis of student scores to predict the effect teachers should have on student test scores. In this study the researchers found very weak correlations (and in one case a negative relationship) between teacher alignment to the state standards and student learning gains.

Yet, states have insisted on tying teacher evaluation to student scores. This study suggests that using student test scores as a measure of teacher quality is questionable.

In Georgia, where I live, the state legislature mandated that 50% of a teacher’s annual evaluation be based on Georgia’s version of VAM.  And, for the third time in a few years, Georgia will be using a new standardized testing system, and they’ll raise the bar even further.

The Polikoff/Porter study is peer-reviewed research that policy makers should use to put the brakes on using student test score gains to rate teachers.

Strike Two!

Rating Teachers Does Not Improve Learning

Most school reformers subscribe to the use of test scores to rate teachers, and they also support (financially) the unbelievably complicated systems of teacher classroom observations that have been generated and used in school districts.

The research reported here calls into question the use of these devices to rate teachers.

But, I want to suggest that to improve learning, to improve the education of children and youth, we need to remove all systems of rating teachers, or the use of merit systems, and bonuses for performance.  None of these has been shown to do anything for student learning.  And it creates a competitive system of rewards and punishments, and not a cooperative system of learning and life-long development.

To continue to think of education as a machine-age system in which teachers are workers, whose job is to improve student performance as measured on high-stakes tests, which are made more difficult to pass, will lead to a failed and corrupt system.

Research by W. Edwards Deming, Russell L. Ackoff, Peter Barnard, Lisa Delpit, John Dewey, and Peter Senge points to the concept of “schools that learn.”  Schools that learn think differently than the standards-based, high-stakes accountability schools of today.  These schools see teachers as members of a team whose collaborative work will build a shared vision to foster a humane and energetic system or organization.  Schools that learn are focal points of learning in the communities around them, says Peter Senge.

However, Atlanta has just hired a new superintendent (Dr. Meria Carstarphen) and according to public statements she made yesterday, she seeks “a culture change,” by focusing on student achievement and graduation rates.  But the problem is that this is exactly what Dr. Beverly Hall did when she was superintendent of Atlanta, producing a “culture of fear” (according to Governor Deal’s report) that resulted in the biggest cheating scandal in the country.  Doesn’t Carstarphen realize she is heading in the same direction?

Atlanta, if it follows the same path we’ve been on for a decade or so, will not necessarily witness wide swings in achievement test scores or graduation rates.

Continuing with standardized tests, and teacher evaluation systems will only result in a quick fix, and in the long run simply perpetuate the mechanistic solutions to schooling that remove any sense of creativity, and student-teacher collaboration.

Bring teachers in to sit at the table, and invite them to plan the future.  Don’t rely on hired guns brought from another city. The resources to think differently about student learning and school improvement already exist in the community.

Will Georgia Follow Florida and Release Teacher VAM-Like Scores?


Will Georgia follow Florida by releasing teacher VAM-like scores?  In Florida’s case, the Florida Times-Union released links to all 116,723 teachers’ VAM scores in an extraordinarily unethical move that has also happened in other locations, including New York and California.

The Georgia Department of Education (GADOE) will have two or more sets of data on every teacher in Georgia, and by law (Georgia House Bill 244) has to base 50% of the score on student growth measures (achievement tests).  The drive for a VAM-like system of teacher evaluation is a result of Georgia’s Race to the Top $400 million grant in 2010.  One of the chief components of the RT3 is the building of data systems about student and teacher performance.  As part of the RT3 grant, 26 Georgia districts have collaborated with the GADOE on the grant, and especially a teacher evaluation system.

Here is how it will work in Georgia.  Each teacher will have a summative score generated from three data sources (Figure 1) including assessment of performance based on classroom observations, assessment of teaching based on student surveys, and assessment of student growth using a Value Added Model (VAM).   This system is known as the Teacher Keys Effectiveness System (TKES).  There is a parallel system for administrators (LKES), although it is not discussed here.

Keep in mind that at the end of the day, each teacher will have a score computed by some form of mathematical formula based on these three subsystems, classroom performance, student surveys, and student growth.  It will be out there ready to be published for all to see.
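To picture what “some form of mathematical formula” might look like, here is a minimal sketch of a weighted composite score. It is not the GADOE’s actual TKES formula. The only number taken from the description above is the 50% weight on student growth; the other two weights, the common rating scale, and the sample ratings are hypothetical values chosen purely for illustration.

```python
def summative_score(components, weights):
    """Weighted average of component ratings that share a common scale."""
    if set(components) != set(weights):
        raise ValueError("components and weights must use the same names")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must add up to 1")
    return sum(components[name] * weights[name] for name in components)

weights = {
    "student_growth": 0.50,         # 50% is required by law, as described above
    "classroom_observation": 0.40,  # hypothetical weight, not from the GADOE
    "student_survey": 0.10,         # hypothetical weight, not from the GADOE
}

# Hypothetical ratings for one teacher on a made-up common scale.
ratings = {
    "student_growth": 2.0,
    "classroom_observation": 3.5,
    "student_survey": 3.0,
}

score = summative_score(ratings, weights)
print(f"Summative score: {score:.2f}")
```

In this made-up case the teacher is rated well by observers and students, yet the low growth number drags the total down, because student growth carries as much weight as the other two sources combined.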

Figure 1. Georgia Teacher Effectiveness System. From the Georgia Department of Education Office of School Improvement Teacher and Leader Keys Effectiveness Division


Let’s take a look at the three parts of Georgia’s teacher evaluation system.

Teacher Assessment on Performance Standards (TAPS)

TAPS is the first subsystem, and it will be based on classroom observations and “documentation.”

Each teacher will be observed in the classroom and evaluated based on the Teacher Assessment on Performance Standards (TAPS).  “Trained” observers will visit (announced or unannounced) six times: four short walk-abouts and two full observations of 30 minutes each, for a total of about 100 minutes. Figure 2 is a list of 10 classroom standards, each of which is judged by a trained observer.

Your TAPS score will be based on the ratings on each performance standard listed in Figure 2.  The “trained observer” will use performance appraisal rubrics to rate you.

But here is the deal.  There are only ten performance standards in Figure 2.  But you will be rated on at least five performance indicators for each standard.  For example, according to the GADOE documentation on TKES, seven indicators are used to rate your professional knowledge.  My count based on the TKES document is that there are at least 69 performance indicators in the observational system.  Each of these performance indicators is a behavior that you must show consistently (regular intervals) or continually (frequently–every day, every class).  I did not make this up.  It is in the TKES documentation.

So, according to the state, teaching can be broken into about 69 behavioral performances which are grouped into ten categories or standards.  The GADOE puts it this way:

A Summative Performance Evaluation shall be completed for each teacher which establishes a final rating on all ten Performance Standards. These ratings shall take into account ALL data sources available including student perception data generated by the Surveys of Instructional Practice.

Ratings of Level IV, Level III, Level II, or Level I shall be provided for each of the ten performance standards using the performance appraisal rubrics. The evaluator will rate each of the ten Performance Standards based on the totality of evidence and consistency of practice.  (TKES Implementation Handbook, Georgia Department of Education, Office of School Improvement)

Figure 2. TAPS Performance Standards, Georgia Department of Education.  Each standard will be rated using multiple indicators, and a final score will be generated at the end of the year.
Figure 3. You must either consistently or continually do or show the performance indicators to get a passing rank in the TKES system.

Student Surveys of Instructional Practice

The second subsystem will ask students for their opinions of their teachers in three grade bands: Grades 3-5, Grades 6-8, and Grades 9-12.  This will give the state student perception data on each teacher in Grades 3-12.  The data will be used as part of the rating of teacher classroom performance.

The claim here is that student perception surveys should be “aligned” (as educators, we like this term) with the performance standards.  The student data will be uploaded to the GADOE Electronic Platform or similar data system.  Another set of data on teachers for the local newspapers to go after.

One section of the documentation that is a bit disturbing here is that student survey data must be included for standards 3, 4, 7, and 8, and if the TAPS ratings and student survey results are inconsistent, then the evaluator must give justification for the difference.  The state is assuming there is a direct correlation between student perceptions and teacher performance.  Where is the evidence?

Here are a couple of sample student survey questions:

  • My teacher encourages me to participate in class, rather than just sitting and listening.
  • My teacher encourages me to ask questions in class.
Figure 3. Survey results by mean for the four key standards (3, 4, 7, 8).

According to the state, if a teacher’s student survey data differ substantially from the mean scores for these standards (Figure 3), then the evaluator must justify the discrepancy.

This appears to be a nightmare.

VAM: Student Growth and Academic Performance

The state department uses “student growth percentiles” (SGPs) as a measure of student growth.  Don’t be fooled here.  This is no different from VAM.  It is VAM.

Here is what the state says about student growth percentiles:

Student Growth Percentiles (SGPs) shall be used as the student growth component of the Teacher Effectiveness Measure (TEM) for teachers of tested subjects. SGPs describe a student’s growth relative to his/her academically similar peers – other students with a similar prior achievement (i.e., those with similar history of scores). A growth percentile can range from 1 to 99. Lower percentiles indicate lower academic growth and higher percentiles indicate higher academic growth. From the 1st to the 99th percentile, growth is possible for all students regardless of previous achievement scores. Annual calculations of student growth for tested courses are based on state assessment data (grades 4-8 CRCT and high school EOCT). (TKES Implementation Handbook, Georgia Department of Education, Office of School Improvement)

But you need to realize that none of this is based on sound science.  As Diane Ravitch says, this is junk science, and of course I agree.  Here is what the state does.  It takes two years of earlier test data, and calls this data a pretest.  So if you teach fifth grade math, then the pretest data for YOUR CURRENT students are generated using these students’ 3rd and 4th grade CRCT results.  It is against this data that you will be evaluated as adding value to student performance in your 5th grade math class.
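To make the idea of “academically similar peers” concrete, here is a rough sketch in Python. It is not the state’s model (the handbook quoted above does not spell out the computation). It simply ranks a student’s current score among peers whose earlier scores were close, and keeps the result between 1 and 99 as the handbook describes. The function name, the `tolerance` parameter, and all of the scores are made up for illustration.

```python
def growth_percentile(student, peers, tolerance=5):
    """Simplified growth percentile (illustrative only, not the state's method).

    Ranks the student's current score among peers whose prior score is
    within `tolerance` points of the student's prior score.
    """
    similar = [p for p in peers if abs(p["prior"] - student["prior"]) <= tolerance]
    if not similar:
        return None  # no academically similar peers to compare against
    below = sum(1 for p in similar if p["current"] < student["current"])
    pct = round(100 * below / len(similar))
    return min(99, max(1, pct))  # handbook: percentiles range from 1 to 99


# Hypothetical scores; "prior" stands in for a summary of 3rd and 4th grade results.
peers = [
    {"prior": 800, "current": 810},
    {"prior": 802, "current": 820},
    {"prior": 798, "current": 805},
    {"prior": 801, "current": 830},
    {"prior": 850, "current": 860},  # not a similar peer: prior score too far away
]
student = {"prior": 800, "current": 825}

print(growth_percentile(student, peers))  # 75
```

Notice that the result depends entirely on who counts as a “similar” peer. Change the tolerance or the peer group and the same student gets a different percentile.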

But hold on.  Get this.  The SGP model will give the state a wealth of student, classroom, school, district, and state growth information based on Criterion-Referenced Competency Tests (CRCT) and End of Course Tests (EOCT).

But what if you teach a course or subject that is not tested using a CRCT or EOCT?  There is no pre-test data available, but that is really no problem here.  Teachers or groups can make up a pre-test, which will be given at the beginning of the course, and then a post-test will be given at the end of the course.  Student gains can then be determined.

The problem is that this data is unreliable and of questionable validity.  The Pre-Post Test model is a valid model of research if there is a control group available for comparison.  But there isn’t.  Any gains or losses in scores can be due to history, maturation, characteristics of students enrolled in your class, changes in the sample of students in your class over the year, effects of pre-testing, reliability and validity of the test, and so forth.
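A small sketch in Python shows why the missing control group matters. All of the scores below are invented, and the comparison group is hypothetical: it is exactly what the pre-test/post-test setup described above does not have.

```python
def mean(values):
    return sum(values) / len(values)


def mean_gain(pre, post):
    """Average post-test score minus average pre-test score."""
    return mean(post) - mean(pre)


# Invented scores for one teacher's class.
class_pre = [40, 50, 60]
class_post = [55, 65, 75]

# Invented scores for a comparison group (hypothetical; not part of the
# system described in this post).
comparison_pre = [42, 48, 60]
comparison_post = [54, 60, 72]

raw_gain = mean_gain(class_pre, class_post)
comparison_gain = mean_gain(comparison_pre, comparison_post)

print(raw_gain)                    # 15.0
print(comparison_gain)             # 12.0
print(raw_gain - comparison_gain)  # 3.0
```

In this made-up case, most of the 15-point gain also shows up in the comparison group, so it could come from maturation, test practice, or anything else that happened during the year. Without a comparison group, only the raw 15 points is visible, and there is no way to tell how much of it belongs to the teacher.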

50 – 50

You probably know that our state legislature has mandated that the student growth percentile count for 50% of each teacher’s evaluation.  The Teacher Assessment on Performance Standards (TAPS) is weighted at the other 50%.  Combined, these two scores are your evaluation. Teachers will receive a score and a rating level of Exemplary, Proficient, Needs Development, or Ineffective.
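Here is a minimal sketch in Python of what a 50-50 combination could look like. The weights and the four rating names come from the description above. Everything else is an assumption for illustration: the 0-100 scale for both components, the function names, and especially the cutoff values, which are invented and are not the state’s.

```python
# Invented cutoffs for illustration only; these are NOT the state's values.
HYPOTHETICAL_CUTOFFS = [
    (85, "Exemplary"),
    (65, "Proficient"),
    (45, "Needs Development"),
]


def summative_score(taps_score, growth_score, taps_weight=0.5, growth_weight=0.5):
    """Weighted combination of two component scores (assumed 0-100 scale)."""
    return taps_weight * taps_score + growth_weight * growth_score


def rating_level(score, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Map a combined score to a rating name using the given cutoffs."""
    for minimum, label in cutoffs:
        if score >= minimum:
            return label
    return "Ineffective"


# A teacher rated well in the classroom but with a low growth score.
score = summative_score(taps_score=80, growth_score=40)
print(score)                # 60.0
print(rating_level(score))  # Needs Development
```

With these invented numbers, a teacher with a strong classroom rating of 80 lands in Needs Development because of the growth half of the formula. Move the cutoffs by a few points and the label changes, which is why the choice of cutoffs matters so much.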

Mind you, there is no scientific basis for establishing scores for these four rating levels.  But the state will tell you that they do have the data.

Teacher evaluation is NOT nonsense.  As teachers, we welcome assessment, especially if it can enhance our professional abilities and professionalism.  But this system, as rigorous as it appears in the state’s documentation, is no different from the VAM nonsense that we saw unveiled by the Florida Times-Union.

I predict that the Atlanta Journal-Constitution (AJC) will seek and get all of this data, and publish it in the newspaper.

What do you think?  Will Georgia teachers be put on display as were all of Florida’s teachers?

Photo of Natick High School, Natick, Massachusetts