We Have Low Expectations for American Students in Math & Science!

Who the #@!% would make such a statement? Why would such a statement be made about America’s youth?

If you go to the Broad Foundation's Education page you will find the answer to the first question.  This is the first of four statements about American youth, followed by "stark" statistics.  The Broad Foundation says:

We have low expectations for American students.

Shame on them!

Image attributed to http://www.tagxedo.com

This is the foundation that has channeled over $400 million into education, primarily in charter schools, training of administrators, and online education.  It’s a very good time to be in the business of influencing and undermining public education these days, especially if you run a very well endowed foundation or corporation.

For years now, these same foundations and corporations have been using statistics that misrepresent and pervert what is actually the case.  Data from tests, especially international test results, are used by politicians, foundation heads, the media, and even the U.S. Department of Education to make proclamations about the status of the country's educational system.  Needless to say, American youth are beaten over the head for not meeting someone else's expectations.

TIMSS and PISA: The Super Bowls of Education

The two international assessments are the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA).  Each of these tests students in mathematics, reading, and science.  PISA studies 15-year-olds, while TIMSS assesses students in grades 4 and 8.  TIMSS claims to assess students' performance on the curriculum, whereas PISA claims to test students' abilities to apply what they have learned to real-world problems.  But please keep in mind: these are low-stakes bubble tests composed of a pool of questions that are, in general, without a context.

Since about 65 countries participate in these assessments, there is a general feeling that the results are important and provide us with a glimpse of the nature of science education in these various nations.  Some would agree; others would argue that the real issues facing any nation's educational system are masked by looking at average scores and simple rankings.  Still others report that the findings are inconsistent.  For example, a country might score low on TIMSS, yet higher on PISA.  Most researchers urge that we use caution when interpreting the results, and not rely on simple averages (now someone's thinking) to make judgments about student performance.

That said, Dr. Svein Sjoberg, Professor of Science Education, University of Oslo, and Director of the ROSE project (The Relevance of Science Education), an international comparative research project that gathers information about attitudes of students toward science & technology, makes this point regarding PISA:

the main focus in the public reporting is in the form of simple ranking, often in the form of league tables for the participating countries. Here, the mean scores of the national samples in different countries are published. These league tables are nearly the only results that appear in the mass media. Although the PISA researchers take care to explain that many differences (say, between a mean national score of 567 and 572) are not statistically significant, the placement on the list gets most of the public attention. It is somewhat similar to sporting events: The winner takes it all. If you become no 8, no one asks how far you are from the winner, or how far you are from no 24 at any event. Moving up or down some places in this league table from PISA2000 to PISA2003 is awarded great importance in the public debate, although the differences may be non-significant statistically as well as educationally.
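Sjoberg's point about non-significant differences is easy to demonstrate with a quick simulation.  The sample size and standard deviation below are hypothetical, chosen only to resemble PISA-scale scores (which are centered near 500 with a standard deviation of roughly 100), not actual PISA data:

```python
import math
import random

random.seed(42)

# Hypothetical national samples. The "true" means differ by only 5 points,
# like the 567 vs. 572 example in the quote above.
n = 400
country_a = [random.gauss(567, 100) for _ in range(n)]
country_b = [random.gauss(572, 100) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def two_sample_z(a, b):
    """Approximate z-statistic for the difference between two sample means."""
    ma, mb = mean(a), mean(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variance of a
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)  # sample variance of b
    se = math.sqrt(va / len(a) + vb / len(b))          # standard error of the difference
    return (mb - ma) / se

z = two_sample_z(country_a, country_b)
print(f"difference in sample means: {mean(country_b) - mean(country_a):.1f}")
print(f"z-statistic: {z:.2f} (|z| below about 1.96 is not significant at the 5% level)")
```

With noisy samples of this size, a 5-point gap typically produces a z-statistic well inside the "not significant" zone, yet in a league table one of these two countries would still be ranked above the other.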

If a team doesn’t win the Super Bowl, is that team a failure?  What do you think?  What does the public think?

Are our schools failing?  Is it a fair claim to say we have low expectations for American students?  The answer is no!

Let’s take a look.

The Math and Science Conundrum

It is easy to make a quick decision about what you think about math and science education when you read headlines in the newspaper that report that the sky is falling on our educational system, or that we are experiencing another Sputnik moment.  But the teaching and learning of mathematics or science, as seen by practicing teachers and collaborating researchers, is much more complex (and interesting) than the questions that make up the tests that PISA or TIMSS uses to assess mathematics and science in more than 60 nations.

The conundrum is this.  The vision of science that each of these tests measures gives meaning to scientific literacy that looks inward at the canon of orthodox science—the concepts, processes and products of science.  Science is seen through the lens of the content of science (call this Vision I).  But add to this the fact that we have a second vision of science.  This second vision (Vision II) includes public understanding of science and science literacy about science-related situations.  In this vision we are more interested in the context of learning, as well as the meaning that students attach to science and mathematics, and how it relates to their world.  The lens we use here to view science is within the framework of socioscientific issues (SSI).

TIMSS, because it is tied to the current traditional curriculum, is likely measuring the outcomes of Vision I.  PISA claims to be measuring students' abilities to apply what they have learned to real situations.  But science education researchers Troy Sadler and Dana Zeidler disagree, and suggest that the test items that have been released publicly seem quite removed from the intent of the SSI movement.

Given this analysis, we are quite safe to claim that these tests are measuring Vision I of science education, and do not provide a full picture of what is actually happening in many classrooms, schools and districts.  Science education is more than learning terms and concepts.  It should include problem-solving, inquiry, and investigations into problems that are relevant to students' lived experiences.


Where do we stand?

PISA and TIMSS are favorite sources of data for foundations and corporations, and especially the U.S. Department of Education (ED), to use to show how poorly American students are doing in mathematics and science.  The Program for International Student Assessment (PISA) is a system of international assessments that tests 15-year-olds in reading, math and science in 65 countries every three years.  The latest results available are from 2009; the next assessment will be administered in 2012.

Using scores from tests such as PISA or TIMSS to evaluate and assess science education misleads the public into thinking that science learning has actually been assessed in the first place.  For instance, in the United States there are more than 15,000 independent school systems, and a mean score on a science test such as PISA or TIMSS does not describe the qualities or inequalities inherent in the U.S.A.'s schools.  Furthermore, as we showed above, there are at least two visions for teaching science, and these tests seem to assess Vision I, ignoring perhaps more relevant and interesting science learning that is taking place in many science classrooms.  That said, let's look at two interpretations of data from these international tests.

Interpretation 1.

For example, take a look at these statistics that you can find here on the Broad Foundation website, most of which were based on PISA results from past years.

  1. American students rank 25th in math and 21st in science compared to students in 30 industrialized countries.
  2. America's top math students rank 25th out of 30 countries when compared with top students elsewhere in the world. [PISA Math Assessment, 2006]
  3. By the end of 8th grade, U.S. students are two years behind in the math being studied by peers in other countries. [Schmidt, W., 2003, presentation]
  4. Sixty-eight percent of 8th graders can't read at their grade level, and most will never catch up.

The Broad Foundation paints a picture of American education as a broken system, with little hope for many students, especially those who the Broad Foundation claims cannot read at their grade level.

Interpretation 2.

Let’s take a look at another way to examine these data.  I have gone to the ED site that presents PISA data, and downloaded Highlights from PISA 2009 in reading, math and science to provide another view of the results.  Here is another interpretation, point by point.

  1. In mathematics, the only country of similar size and demographics that scored higher than the U.S. was Canada.  Most of the other countries that scored significantly higher were small European or Asian (Korea, Japan) countries.  The U.S. score was above the average score of OECD countries.  Although there were 12 countries that scored significantly higher, only three are similar to the U.S. in size and demographics.  We are not ranked 25th in math and 21st in science.  (source: PISA Data 2009)
  2. America's top students' performance places them near the top of all students tested by PISA.  For example, Dr. Gerald Tirozzi, Executive Director of the National Association of Secondary School Principals, analyzed the PISA data through the lens of poverty, as measured by the percentage of students receiving government free or reduced-price lunches.  Tirozzi found that in schools where less than 10% of the students get a free lunch, the reading score would place them number 2 in the ranking of countries.  This is very far from being 25th as reported by the Broad Foundation.
  3. Are we two years behind in the math content being studied by 8th graders?  There is no statistical analysis that would support such a claim.  Curriculum varies greatly from one country to another.  As in other countries, curriculum is implemented in American schools, now based on the Common Core State Standards in mathematics and the high-stakes tests that are used in each state.
  4. It is not true that 68% of 8th graders can't read at their grade level.  In the 2009 NAEP reading achievement-level results, 76% of American 8th graders were at or above the Basic level of performance.  The graph below shows 8th grade reading results, 1969–2011.  Yes, we have work to do, but the claim that 68% of 8th graders cannot read at grade level is not justified.
NAEP Eighth-Grade Reading Achievement Results 1969 - 2011


Trends in Performance

Here is the truth.

I have provided  graphs showing trends in science, mathematics and reading for American students as measured by National Assessment of Educational Progress (NAEP).  You will find that the trends reported by NAEP do not support the Broad Foundation’s opinions of American youth.

Science. U.S. students significantly improved on the PISA science test from 2006 to 2009, as shown in the graph below.  This trend is a positive sign, and disputes the claim that expectations for American students are low.  One of the ways in which data are perverted is to claim that American education, including science education, is broken, and that the cause probably has to do with the poor performance of "bad" teachers.  It is an unsubstantiated claim.

Average scores of 15-year-olds in the U.S. and OECD countries in science. Source: Fleischman, H.L., Hopstock, P.J., Pelczar, M.P., and Shelley, B.E. (2010). Highlights From PISA 2009: Performance of U.S. 15-Year-Old Students in Reading, Mathematics, and Science Literacy in an International Context (NCES 2011-004). U.S. Department of Education, National Center for Education Statistics. Washington, DC: U.S. Government Printing Office.


Student performance is affected by a number of factors, including gender, race/ethnicity, type of school, and family income level.  The figure below shows Grade 4 results on the 2009 NAEP science assessment.  The graph shows the relationship between family income (as measured by eligibility for reduced-price or free lunch) and science scores.  Note that students from lower-income families score lower than students from higher-income families.  This is an important factor when we interpret test scores, as Dr. Gerald Tirozzi found when he analyzed the PISA data through the lens of poverty.

Grade 4 Science Results, NAEP 2009 by Family Income. Click on the figure to explore this data in more detail.
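Tirozzi's finding is a reminder that a single national mean is just a weighted average of very different subgroups.  A toy calculation makes the point; every number below is invented for illustration and is not actual NAEP or PISA data:

```python
# Hypothetical subgroups defined by free/reduced-price lunch eligibility.
# Each entry is (mean score, share of students); shares sum to 1.0.
subgroups = {
    "<10% free/reduced lunch": (551, 0.20),
    "10-50% free/reduced lunch": (527, 0.45),
    ">50% free/reduced lunch": (471, 0.35),
}

# The national mean is the share-weighted average of the subgroup means.
overall = sum(score * share for score, share in subgroups.values())

print(f"overall mean: {overall:.1f}")
for group, (score, share) in subgroups.items():
    print(f"  {group}: mean {score}, {share:.0%} of students")
```

In this made-up example the national mean lands near 512 even though the lowest-poverty schools score 551; ranking the country by the single overall number hides the high-performing subgroup entirely, which is exactly the effect Tirozzi isolated.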

Mathematics.  According to NAEP results, mathematics scores for 9- and 13-year-olds were higher in 2008 than in previous years.  There was no significant change in the White–Black or White–Hispanic score gaps compared to 2004.  However, since 1973, Black and Hispanic students have made greater gains than White students.

Trend in mathematics scores for 9- and 13-year-olds, 1973–2008. SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), various years, 1973–2008 Long-Term Trend Mathematics Assessments.

Reading. Overall, the national trend in reading showed improvement from 2004 to 2008 for students at three ages (9, 13, and 17). The average reading score for White, Black and Hispanic students was higher than in previous assessments.

Trend in fourth- and eighth-grade NAEP reading scores 1992 - 2011

Have you visited any of the educators in your community who teach science?  Have you heard about any of the projects they are doing with their students?  What do you think about the Broad Foundation's crummy assessment of American students' performance in math and science, and its claim that we have low expectations for them?

What are they thinking?

High-Stakes Testing = Negative Effects on Student Achievement

In earlier posts, I have advocated banning high-stakes testing as a means of making significant decisions about student performance (achievement in a course, passing a course through end-of-year tests, being promoted, and graduating from high school).  I suggested this because the research evidence does not support continuing the practice in American schools.

The research reported here sheds light on high-stakes testing, and shows why such tests should not be used to make decisions about students' achievement or teachers' performance, or to impose sanctions or offer rewards to schools.

Research from the National Academies

The Board on Testing and Assessment of the National Research Council issued a report entitled Incentives and Test-Based Accountability in Education.  The report concludes that using test-based (high-stakes testing) incentives has not produced positive effects on student achievement.  It says that school incentives such as those of the No Child Left Behind Act produce some of the highest effects among the programs studied, but only in elementary mathematics, and the improvements were minuscule.  Exit exams, which are used in 25 states and typically given in each of the major content areas at the end of the year, have actually decreased graduation rates.

What do tests measure?

We rely on tests to inform us about academic learning, but we fail to consider not only what tests don't measure, but the limitations on what they do measure.  We get ourselves in real trouble when we think that a score on an NCLB test, or a CRCT-type test, is actually a good measure of student academic learning.  We get ourselves in further trouble when we believe that the score represents what students know, and we dig the hole deeper when we think that changes in student test scores (positive or negative) can be attributed to the performance of teachers.

The authors of the National Research Council report on Incentives and Test-Based Accountability in Education had this to say about tests:

The tests that are typically used to measure performance in education fall short of providing a complete measure of desired educational outcomes in many ways. This is important because the use of incentives for performance on tests is likely to reduce emphasis on the outcomes that are not measured by the test.

Collateral Effects

One collateral effect of testing is that the curriculum narrows as teachers "teach to the test" and consequently stray from activities that might be interesting or, in the case of science, involve students in project-based work or hands-on collaborative activities.  Projects and hands-on activities take away time needed to drill students on the content of the test or, in the case of elementary schools, time to teach math and reading/language arts.

One of the constraints of test-based incentives is that many goals of teaching, such as curiosity, persistence, and the ability to solve problems or collaborate, are not measured by bubble tests.  Yet these might be as important as the content that is tested.

But as the Board on Testing and Assessment report reveals, the tests we use do not do a great job of measuring performance even in the tested areas, such as science, mathematics, English, or social studies.  Since the tests in these areas are based on the outline of content represented in the content standards of each subject, there simply is not enough time to test students on each content standard.

Constructing a Test is not So Simple

For example, in the National Science Education Standards for grades K–8, there are seven major areas of standards (Science as Inquiry, Physical Science, Life Science, Earth and Space Science, Science and Technology, Science in Personal and Social Perspectives, and History and Nature of Science).  In these seven areas there are 64 content standards just for grades K–8.  If you then look at the details of any one of the 64 content standards, you find at least three fundamental concepts and principles underlying it.  So, at the least, we have 192 concepts to measure on a test.  What is a test maker to do?

National Science Education Standards "Content Standards" Grades 5-8

If you were to develop a test for Grade 5, you would need to develop a domain chart that included about 96 concepts.  If you wrote one test item for each concept, the test would be 96 items long.  But that's too long a test, so the number of items must be reduced to, say, 30 or 40, meaning that not all of the content standards have been measured.  And what is worse, we are using only one test item to "measure" performance on each standard.  Wouldn't it be more valid to use two or more test items to "measure" each standard?  If we do, we end up testing fewer standards.  So high-stakes tests fall short of measuring the standards in most content areas, yet we continue to use them to make decisions about student, teacher and school performance.
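The arithmetic above can be made concrete.  Assuming the hypothetical Grade 5 domain chart of about 96 concepts described in the preceding paragraph, a realistic test length covers only a fraction of them:

```python
# Back-of-the-envelope coverage calculation for a hypothetical domain chart
# of 96 concepts versus practical test lengths of 30 or 40 items.
concepts_in_domain = 96

for items in (30, 40):
    # Best case: one item per concept covers at most items/96 of the domain.
    one_item_coverage = min(items, concepts_in_domain) / concepts_in_domain
    # More valid measurement (2 items per concept) halves the coverage.
    two_item_coverage = (items // 2) / concepts_in_domain
    print(f"{items}-item test: "
          f"{one_item_coverage:.0%} of concepts with 1 item each, "
          f"only {two_item_coverage:.0%} with 2 items each")
```

Even before questions of item quality arise, a 40-item test with two items per concept can touch barely a fifth of the domain; the score is then generalized to the whole curriculum anyway.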

As the National Research Council report suggested:

…tests also fall short in measuring performance in the tested subjects and grades in important ways. Some aspects of performance in many tested subjects are difficult or even impossible to assess with current tests. As a result, tests can measure only a subset of the content of a tested subject.

We can define what a test measures, but in the current era of high-stakes testing, the tests being used to measure performance in any subject (math, science, English) do not represent the full scope of the curriculum, and have been shown to be ineffective in increasing student achievement.  End-of-year tests, such as those given in Georgia, are high-stakes tests, and should not be used to determine whether a student graduates.  The evidence is that end-of-year exit exams actually decrease graduation rates.


The authors of the Incentives and Test-Based Accountability in Education report recommend that, since we do not yet know how to use test-based incentives to produce consistently positive effects, policy makers should support and examine alternative evaluation models.  Furthermore, policy makers should make use of basic research and choose from a number of options.  They go on to say:

We call on researchers, policy makers, and educators to examine the evidence in detail and not to reduce it to a simple thumbs-up or thumbs-down verdict. The school reform effort will move forward to the extent that everyone, from policy makers to parents, learns from a thorough and balanced analysis of each success and each failure.

We wish that policy makers would use the report to put a moratorium on using high-stakes tests to make decisions about students, teachers and schools.
Do you think this will happen? Comment and tell us what you think.