Is the Smarter Balanced Assessment Consortium Smart or Just Dumb?

Is the Smarter Balanced Assessment Consortium Smart or Just Dumb?  That’s the question we’ll try to address in this blog post.

The Smarter Balanced Assessment Consortium (Smarter Balanced) released scale scores for math and ELA (English Language Arts) aligned to the Common Core State Standards.

In their release to the public on November 17, Smarter Balanced announced that:

Members of the Smarter Balanced Assessment Consortium have voted to approve initial achievement levels for the mathematics and English language arts/literacy (ELA) assessments that will be administered in 17 states and one territory this school year. The vote marks an important milestone in the development of the assessment system (emphasis mine).

So, a vote was taken (according to their press release) to approve a set of scale scores that will be used next year to evaluate students in 17 states when they sit at computers to take tests in math and ELA in grades 3 – 8 and high school.  Smarter Balanced explains that because the Common Core content standards set higher expectations for kids, the new computer-based tests will be more difficult.  Why?  Well, Smarter Balanced simply raised the bar, and they have no problem in stating that:

It’s not surprising that fewer students could score at Level 3 or higher. However, over time the performance of students will improve.

Fewer students experiencing success is another perfect set up for failure.

Is the Smarter Balanced Assessment Consortium smart, or is it dumb?

The answer to this lies in reading their comments about what they have done to set up a testing program that is based on false claims.  For example, they tell us that even though kids will not do very well when the tests come online, scores are sure to improve over time.   They don't improve over time, and we have more than a decade of results to show this.  Furthermore, raising the bar (supposedly making the standards more difficult, rigorous, or demanding; choose your own descriptor) does not affect achievement test scores, as measured by the National Assessment of Educational Progress (NAEP).  In a study looking at the relationship between the quality of standards and student NAEP scores, the correlations ranged from -0.60 to 0.08.  We interpret these correlations as ranging from a moderate downhill (negative) relationship to a weak uphill (positive) relationship.
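For readers who want to see what such correlations look like in practice, here is a minimal sketch of computing and interpreting a Pearson correlation coefficient. The ratings and scores below are invented for illustration; they are not the study's data.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def describe(r):
    """Rough verbal interpretation of a correlation's direction and strength."""
    direction = "positive (uphill)" if r > 0 else "negative (downhill)"
    size = abs(r)
    if size < 0.3:
        strength = "weak"
    elif size < 0.6:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction}"

# Invented standards-quality ratings (1-5) and average test scores:
ratings = [1, 2, 3, 4, 5]
scores = [155, 153, 152, 150, 149]
r = pearson_r(ratings, scores)
print(round(r, 2), "->", describe(r))
```

The point of the exercise is simply that a correlation near zero, or even negative, means the "quality" of a state's standards tells you essentially nothing useful about how its students score.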

That said, shouldn't we conclude that Smarter Balanced should be renamed the Dumb and Dumber Unbalanced Assessment?

And one more thing.

I have reported in earlier posts on this blog that many researchers have concluded that we should not expect much from the Common Core State Standards.  In an interesting discussion of the implications of his findings, Tom Loveless, the author of the report, cautions us to be careful about being drawn into thinking that standards represent a kind of system of "weights and measures." Loveless tells us that standards reformers use the word "benchmarks" as a synonym for standards, and he says that they use it too often. In science education, we've had a long history of using the word benchmarks, and Loveless reminds us that there are no real, measured benchmarks in any content area. Yet, when you read the standards (Common Core or science), there is the implication that we really know, almost in a measured way, what standards should be met at a particular grade level.

Voting on the Scale Scores: What’s this mean?

It amazes me that the members of an organization can vote on scale scores (real numbers) and think that this has meaning.  For instance, Figure 1 shows the mathematics threshold scale scores for grades 3 – 11.  It's a nice graph, isn't it?  And the graph is accompanied in the Smarter Balanced press release by a very colorful chart estimating the percentage of students who will score at each level by grade.

Figure 1. Mathematics: Threshold Scale Scores set by the Smarter Balanced Assessment Consortium, November 14, 2014. Source: media@smarterbalanced.org. Retrieved November 17.

Here is the graph that displays the percent of students who will fail or pass.

Figure 2. How students will score at each level by grade. Note that between 27 and 40 percent of students will fail to reach proficiency. Can you believe that? Source: media@smarterbalanced.org. Retrieved November 17, 2014.

Are Standards and Aligned Assessments Scientific?

It's a fair question. It's a fair question because most of the 17 states will feed student test scores into a mathematical algorithm called the Value-Added Model (VAM) to judge the efficacy and quality of a teacher, and then use this number to decide the "grade," or evaluation, of the teacher. In some states, more than 50% of a teacher's evaluation is based on this mathematical algorithm.

So, are standards and the aligned assessments scientific?

No, they are not.

In her groundbreaking book, Reign of Error: The Hoax of the Privatization Movement and the Danger to America's Public Schools, Diane Ravitch takes on this issue. Here is what she says:

All definitions of education standards are subjective. People who set standards use their own judgment to decide what students ought to know and how well they should know it. People use their own judgment to decide the passing mark on a test. None of this is science. It is human judgment, subject to error and bias; the passing mark may go up or down, and the decision about what students should know in which grades may change, depending on who is making the decisions and whether they want the test to be hard or easy or just right. All of these are judgmental decisions, not science. (Ravitch, Diane (2013-09-17). Reign of Error: The Hoax of the Privatization Movement and the Danger to America’s Public Schools (Kindle Locations 1033-1035). Knopf Doubleday Publishing Group. Kindle Edition).

The Common Core State Standards and the Smarter Balanced Assessment Consortium (one of two Common Core-aligned assessment consortia), along with their private corporate sponsors and neo-liberal foundations such as Gates, Walton, and Broad, have set up the perfect trap: fail millions of students, blame and then fire teachers, and then bring in privately run charter school management systems.

Think I’m kidding?  What do you think?

Clueless in Atlanta; Not So in Seattle

Maureen Downey is the education blogger at Get Schooled on the Atlanta Journal-Constitution (AJC) website, and writes occasional education editorials for the newspaper. In her post today, she wonders why the teachers in Seattle are protesting by refusing to administer a test they are required to give three times per year to all students in their classes. She puts it this way:

What’s odd to me is the test Seattle teachers are choosing to protest, which is the Measure of Academic Progress (MAP). The high performing City of Decatur Schools uses MAP testing as well, giving it three times a year to see where students begin, where they are mid-year and where they are at the end of the year.

My kids attend Decatur schools and are not intimidated by MAP testing as it has been part of their education for a long time. Nor are they overly concerned with the scores, which they get instantly as the test is taken on a computer. I would be interested in what other Decatur parents out there think about MAP.

Downey clearly doesn't understand the reasons for teachers boycotting the exam.  The MAP purports to measure students' academic performance in reading, math, language, and science.  It is a product of the Northwest Evaluation Association, a testing company in Portland, Oregon.  MAP is a computer-adaptive test that adjusts to student responses.  Downey claims that her children have no problem with the test as it is used at Decatur High School, located next to Atlanta.  That may be so, but her reasoning about why the teachers in Seattle refuse to give the test is flawed.

Here’s the deal.  Teachers at Garfield High School in Seattle announced their refusal to administer the standardized test, MAP.  The teachers believe it wastes time, money and resources.  According to one report, the test is useless for Algebra I students since the test is about probability and statistics and geometry, which are not in the curriculum.  Because students are told that the results on the test will not affect their grade or graduation, many do not take the test seriously.

But the real reason is that the teachers know that the test results do not offer formative assessment information that benefits them or their students.   In fact, some of the teachers want to replace the MAP standardized test with portfolios and tests that are related to their curriculum.

Seattle Public Schools paid $4 million to the company on whose board of directors its superintendent served.  If the district spends this much on a test that doesn't affect students' grades or graduation, imagine what it pays for the other required high-stakes standardized tests.

What Downey misses here is that teachers in Seattle are not clueless about evaluation.  They know that assessment should be for learning.  The use of a test such as MAP DOES NOT promote student learning.  It has little bearing on specific students' needs or on teachers' expectations.

Downey needs to understand that assessment for learning is formative assessment. Formative assessments are everyday methods that teachers use to help students improve their learning and understanding, and to inform and improve their teaching. Formative assessment methods have been studied by many researchers, and one study, funded by the National Science Foundation, found that teachers who use formative methods take steps to find the gap between a student's current work and the desired aim, and then together with the student figure out how the gap can be bridged.

Formative assessment is multidimensional, and unlike high-stakes testing, is integrated into the curriculum. The assessments are authentic–that is to say, teachers use a variety of real activities to assess student progress–laboratory activities, writing essays, participating in a debate, classroom questions, and indeed simply observing and interacting with students.

Although banning high-stakes testing needs to be done, assessment for learning is not a simple idea; it requires a multidimensional approach to assessment in the service of student learning.

Teachers are willing to take the risk and act on their professional knowledge that these tests are not pedagogically valid.  Like their colleagues in Chicago, the Seattle teachers are willing to say no.

What do you think about this issue?  Are the teachers in Seattle acting in the interest of their students?

Anthony Cody: Designer of Value-Added Tests a Skeptic About Current Test Mania

Guest Post by Anthony Cody

Follow Anthony on Twitter at @AnthonyCody

Defenders of our current obsession over test scores claim that new, better tests will rescue us from the educational stagnation caused by a test prep curriculum. And one of those new types of tests is an adaptive test, which adjusts the difficulty of questions as students work, so that students are always challenged. This gives a better measure of student ability than a traditional test, and can be given in the fall and spring to measure student growth over the year. This approach is increasingly being used to determine the “value” individual teachers add to their students’ academic ability, which is then used as a significant factor in teacher evaluation — as required by the Department of Education as a condition for relief from No Child Left Behind.

One might expect the designer of these tests to be happy with the many uses now being found for the data they produce. But Jim Angermeyr, one of the architects of the value-added assessment, is not so thrilled. He worked with the Northwest Evaluation Association to develop tests, and more recently served as director of research and evaluation with the Bloomington Public Schools. In this fascinating interview with the Minneapolis Post, he shares some of his concerns as he prepares to retire from the field.

His first concern is the way test scores are being used to rate teachers:

We [test designers] have a healthy respect for error and how to measure it. And always a certain amount of caution when you’re interpreting results.
That caution grows as the groups get smaller, like looking at a classroom instead of a whole school. And that caution grows even more when the stakes increase because increasing the stakes can lead to all kinds of distortions, whether it’s the cheating that goes on in some of schools that you’ve been reading about around the country, or whether it’s just the general over-emphasis on testing to the exclusion of other things.

Dr. Angermeyr helps us put testing in its place. He says,

Where the distortion comes in is that you can only test a limited amount of the domain. Even if it’s a domain like mathematics, you can’t cover everything. And so you make assumptions about kids’ skills in that broader domain. Do we have eighth graders who are good readers based on a pretty small sample of questions and items?
Testing professionals know that you’re just sampling the domain and you don’t try to make inferences further than that. But nonprofessionals do that all the time. “American students are 51st in the world in reading.” There are a lot of assumptions that are made before you can get to that conclusion, but people leap right over that.

If I was running the world, I would severely reduce the accountability stakes for tests. I would certainly eliminate things like No Child Left Behind. I would probably take away the current waiver. Even if it looks better, sometimes it’s still really the same wolf in different clothing.

I would do away with standards, to be honest. Even though on paper they sound kind of cool, they assume all kids are the same and they all make progress the same way and move in lockstep. And that’s just not accurate. Standards distort individual differences among kids. And that’s bad.
I would put testing back as a local control issue in school districts. I would take the emphasis off of evaluating and [compensating] teachers. I would put the emphasis on good training for principals and curriculum specialists and teachers on how to interpret data and use it for the kind of diagnosis and assessment that it was originally intended for.

This resonates powerfully with what teachers have been saying since the beginning of No Child Left Behind. It reminds me especially of the work that Doug Christensen led in Nebraska several years back, focused on developing local control of testing and standards.

But Jim Angermeyr is also aware of the power of data to provide our leaders with the ability to simplify complex issues.

It’s politicians and some policymakers who believe tests can do more than they really can. And there’s not enough people stopping and saying wait a minute. When you can summarize a whole bunch of complicated things in a single number, that has a lot of power and it’s hard to ignore, especially when it tells a story that you want to promote. And that’s where it gets really twisted.

There are quite a few of us saying “wait a minute.”   There is a National Resolution on High Stakes Testing that has gathered the support of hundreds of organizations and thousands of individuals.

This message is also echoed in the latest news out of Florida, where the state School Board Association recently adopted a resolution condemning the over-use of high stakes tests, and objecting to their use as the primary basis for evaluating teachers, administrators, schools and districts.

Perhaps if those designing the tests raise their voices alongside those of us who are giving the tests, and the students taking the tests, and their parents as well, we can bring about the change we need.

What do you think? Can we return testing to its proper place as a diagnostic tool? 

Anthony Cody spent 24 years working in Oakland schools, 18 of them as a science teacher at a high needs middle school. He is National Board certified, and now leads workshops with teachers focused on Project Based Learning. With education at a crossroads, he invites you to join him in a dialogue on education reform and teaching for change and deep learning. For additional information on Cody’s work, visit his Web site, Teachers Lead. Or follow him on Twitter.  This post was published with Anthony’s permission.

Science Scores on NAEP for 8th-Graders Not So Bad

The National Assessment of Educational Progress (NAEP) published Science 2011, science results for grade 8.

A representative sample of 122,000 8th-graders was involved in the 2011 NAEP science assessment.  No student took the entire test.  Instead, the 144 questions that made up the test were divided into nine 25-minute sessions of between 14 and 18 questions each.  Each student responded to two sections. NAEP reported that no hands-on or computer tasks were administered.
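This design is a form of matrix sampling: the item pool is split into blocks, and each student answers only a couple of them, so the full pool is covered across the sample while no individual takes the whole test. Here is a minimal sketch of such an assignment scheme; the purely random pairing is an illustrative assumption, as NAEP actually uses a carefully balanced block design.

```python
import random

def assign_blocks(n_students, n_blocks=9, blocks_per_student=2, seed=0):
    """Matrix sampling sketch: assign each student a small number of
    distinct question blocks drawn from the full pool."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(n_blocks), blocks_per_student))
            for _ in range(n_students)]

assignments = assign_blocks(1000)
# Every student sees exactly two distinct blocks...
assert all(len(set(a)) == 2 for a in assignments)
# ...while collectively the sample covers all nine blocks.
assert {b for a in assignments for b in a} == set(range(9))
```

This is why NAEP can report on a broad domain without burdening any one student with a three-hour test: the estimates describe the population, not individual students.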

The NAEP test assessed physical science (30%), life science (30%), and Earth and space science (40%).  The test included multiple-choice and constructed-response (open-ended) questions.

Analysis of test results is reported as the percentage of students performing at or above Basic and Proficient and at the Advanced level.

Results

The average eighth-grade science score increased from 150 in 2009 to 152 in 2011. The percentages of students performing at or above the Basic and Proficient levels were higher in 2011 than in 2009. There was no significant change from 2009 to 2011 in the percentage of students at the Advanced level.

As seen in Figure 1, the average score of eighth-graders improved from 2009 to 2011.

National Center for Education Statistics (2012). The Nation's Report Card: Science 2011 (NCES 2012–465). Institute of Education Sciences, U.S. Department of Education, Washington, D.C.

Continue reading “Science Scores on NAEP for 8th-Graders Not So Bad”

Anthony Cody Writes: At the Department of Education, Warm Snow Falls Up

Guest Post by Anthony Cody

As the Simpson family prepared to travel south of the equator to Brazil, Homer revealed some misconceptions. In opposite land, according to Bart’s father, “warm snow falls up.” Reading the latest press releases and speeches from the Department of Education, sometimes I feel as if this is where we have arrived.

For the past two years, the Department of Education policies have been roundly criticized by teachers. The latest response from Arne Duncan is a big public relations push bearing the title RESPECT — Recognizing Educational Success, Professional Excellence and Collaborative Teaching.

However, as in Homer’s opposite-land, everything seems to be upside down.

In his speech launching the project last week, Secretary Duncan laid out what he feels are the problems afflicting the teaching profession.

The Department has solutions to each of these problems – but it has often pursued policies that actually make things worse. Here are the problems, and the solutions the Department of Ed has offered – many of which are mandatory if states wish to qualify for Race to the Top or escape the ravages of NCLB:

Problem #1: “Many of our schools of education are mediocre at best. A staggering 62 percent of young teachers say they felt unprepared to enter the classroom.”
Continue reading “Anthony Cody Writes: At the Department of Education, Warm Snow Falls Up”