TIMSS: Always Claiming the Grass is Greener on the Other Side of the Globe

International science and environmental education have been a major focus of my professional work, and so when results on international comparisons are released by TIMSS (Math and Science), PIRLS (Reading), or PISA (Math, Reading & Science), I am eager to write about what these results mean.

On this website there have been many posts devoted to an analysis of international test results and the comparisons that fill the airwaves, the Internet and newspapers.  In the United States  (and in other countries as well) the perception of science (and mathematics) education is driven by published rankings based on science or mathematics achievement test scores.

A few days ago TIMSS and PIRLS released 2011 data on worldwide assessments in mathematics, reading and science.

In the 1980s through 2000 I was involved in a global or international science and environmental science project that began as a collaboration with Soviet science teachers, researchers, and professors of science and ecology.  I’ve written about it here and here. Over the years I traveled more than 25 times to Russia and to other republics in the former Soviet Union, and was involved in teaching in Russian schools, as well as collaborating with other American and Russian educators to create the Global Thinking Project, an international science program.  It was an inquiry-based environmental science project in which teachers worked with their students on local environmental issues and questions, and used the Internet to collaborate, report, and discuss their findings with peers in other countries.  With time, many other countries joined our effort, including Argentina, Australia, Botswana, the Czech Republic, Japan, and Spain.

The TIMSS assessments began in 1995 and have continued through the 2012 report; they will continue into the foreseeable future, with the next cycle in 2015.  In 1995, at a National Science Teachers Association (NSTA) annual meeting, there was a session on TIMSS at which the directors of the project, many of whom represented different countries, spoke about the new international assessment.  I happened to be in attendance, and after the session I expressed serious doubts about such assessments and the comparisons that would later be made.  Even though the directors will tell you that comparisons are really not a valid exercise, it seems to me, and to many other educators, that comparisons are all the media seem to report.


In the media, TIMSS and other international (or national) test results are reported using the sports metaphor of league standings.  TIMSS is somewhat analogous to the Olympics in that the main reason for competing is to take home the gold.  In fact, according to the historical record of TIMSS, U.S. students have never scored high enough to merit even a bronze medal.  If your country’s students don’t score high enough to get into the top three or four, reports fly in national and local newspapers claiming that the sky is falling and that the nation’s education is in perilous condition.

The 2012 TIMSS report immediately identifies East Asian countries among the top performers in TIMSS 2011, and high percentages of East Asian students reach the TIMSS international benchmarks.  Benchmarks are classified by score as low, intermediate, high, and advanced.  These cut points are arbitrary and have no basis in research; they are simply a way to differentiate and classify score ranges.  The media focus on findings such as these and leave the impression that comparisons across countries are valid and helpful.  They are not.

Leader Board Based on TIMSS Scale Score

Nearly all reports in the media use the data in the first two columns of the chart that I created based on TIMSS 2011: the name of the nation and the TIMSS scale score (either 4th or 8th grade).  Bear in mind that 49 countries participated in TIMSS 2011; I’ve listed only the top 19.  Note that the top five positions are held by East Asian nations and Finland.  The U.S. is in tenth place.

Figure 1. TIMSS Leaderboard based on Science Test Scores, Grade 8

Leader Board Based on Poverty

Very few reports discuss the issue of poverty and its relationship to TIMSS scores.  Here the countries are listed in descending order based on CIA statistics on poverty levels in the participating nations.  Of the top 19 scoring countries, the U.S. has the 5th highest poverty level.  According to the most recent data, the U.S. has a poverty rate of 15.1 percent; 46.2 million people in the U.S. live in poverty.  Why we continue to compare countries using average achievement test scores is beyond me.  As many researchers have pointed out, if you don’t take into consideration the poverty concentration of schools, comparisons are meaningless.

The fact is, such comparisons are simply meaningless.

Figure 2. Nations Listed By Poverty Levels

What is the relationship between students’ scores on TIMSS and poverty? The Pearson correlation coefficient was used to determine the strength of the linear association between poverty and TIMSS scores in 8th grade science.  As we can see here, there is an inverse relationship between 8th grade science scores and the poverty levels in these nations.  The poverty levels, which ranged from 0.9 percent to 24 percent, show a clear negative relationship with science scores.
The Pearson r for the TIMSS 8th grade science score and poverty level is -0.39, a moderate negative relationship; in this case, the higher the poverty level, the lower the science scale score.
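The correlation reported above is straightforward to compute.  The sketch below shows the calculation; the (poverty, score) pairs are hypothetical values invented for illustration, not the actual TIMSS 2011 data:

```python
# Minimal sketch of the Pearson r calculation described above.
# The data pairs are HYPOTHETICAL, not the actual TIMSS 2011 figures.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative values: as the poverty level rises, scores tend to fall.
poverty = [0.9, 4.0, 6.0, 15.1, 24.0]   # percent in poverty
scores = [560, 552, 535, 525, 510]      # 8th grade science scale scores

r = pearson_r(poverty, scores)
print(round(r, 2))  # a negative r indicates an inverse relationship
```

With the real 49-country data the same function would produce the -0.39 reported above; the toy data here merely demonstrate the sign and the mechanics.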

Unless we acknowledge the effect of poverty on achievement, comparisons among countries are not valid.  And if you take a look at Figure 1 again, notice the variation in population among the top 19 countries.  The countries at the top of the leader board have populations that range from 2.6 to 15 million, except for Japan, which has 127 million people.  Compare that to the population of the U.S., which is 311 million. How can an average score for a country like the U.S. make any sense? The U.S. has more than 13,000 school districts.

Figure 3. Graph of the Relationship between 8th Grade Science Scores and Poverty Levels


There are many factors that affect student performance on international tests.  Here are several that play a major role in student achievement as measured on the TIMSS assessment.

Home Resources

As other researchers have reported, socioeconomic factors such as the resources available to students greatly affect success in school.  TIMSS used parents’ reports to determine the level of home resources.  Notice that the same inverse relationship with poverty is present here.

Relationship between home resources and science scores at the 4th and 8th grade levels.


Effect of Affluence, Safety and Bullying

It is not surprising that students living in affluence and attending schools in affluent neighborhoods do better on tests than students who live and attend schools in poor neighborhoods.  The TIMSS report, however, barely discusses the implications of poverty for test scores; in fact, the word “poverty” is used only once in the 517-page report.  Instead of the concept of poverty, the managers of TIMSS use the term “disadvantaged.”  Again, affluence was directly and positively related to science achievement.

Students also do well on achievement tests when they are in safe environments, where there are hardly any discipline problems and very little bullying.

Student Attitudes

TIMSS reported that students who said they liked science did well, and students who had confidence in their ability to learn science also did well. In Project ROSE (The Relevance of Science Education), researchers at the University of Oslo studied the factors of importance to the learning of science and technology (S&T), as perceived by learners.  One finding of the ROSE project was a difference in the attitudes of students in Western industrialized nations compared to students in developing countries.  Students everywhere valued science and technology, but students in developing nations had more positive attitudes toward learning science, while students from Western nations tended to develop more negative attitudes toward school the longer they were in school.

Teaching and Teacher Preparation

According to the TIMSS 2011 report, teacher preparation and teaching experience were significant factors in student achievement.  Teachers who had a strong background in content and pedagogy were more successful in helping students learn science and math.  And teachers with at least 10 years of experience were the most successful in teaching reading at the 4th grade level, and mathematics and science at the 4th and 8th grade levels.

This is a significant finding in light of the burgeoning business of Teach for America (TFA), which claims to be successful at preparing teachers to teach in the most difficult schools.  According to TFA, only five weeks of summer training are required to get elite students ready to teach in urban schools.  Anthony Cody, over on Living in Dialogue, writes that Wendy Kopp, TFA founder and CEO, claims that on average TFA recruits stay in teaching for eight years.  Independent research studies show this not to be true: TFA teachers stay in the classroom for a little more than two years on average.

One More Thing

Is the grass greener in other countries?  Is it a valid and useful exercise to test nearly a million kids every few years to publish a report that only pushes the buttons of politicians and policy makers?  What do you think?



Misconceptions about International Math & Science Test Scores

Why is it that the perception of science education in the U.S. (and other countries as well) is driven by rankings of students on international test score comparisons?  The perception is that U.S. students are not competitive in the global marketplace because of their position in the rankings of the scores obtained on tests such as PISA and TIMSS.

Yet, as Iris C. Rotberg has shown in her analysis of educational reforms on a global scale, most of the conclusions that we make based on international studies are not supported either by the findings or by research in general.

Student Test Score Rankings & Global Competitiveness

For example, the most visible conclusion drawn from the international studies is that “test-score rankings are linked to a country’s economic competitiveness.”  Rotberg uses data from the World Economic Forum’s 2010-2011 global-competitiveness report to show that student test score rankings do not correlate with a nation’s economic competitiveness.  On the 2009 Programme for International Student Assessment (PISA), U.S. students did not rank in the top 10 member countries in any of the areas tested: the United States ranked 30th in mathematics, 23rd in science, and 17th in reading.

Yet, in 2011, the United States was in 4th place in the global-competitiveness rankings of 139 countries (dropping from the number 2 position the previous year).  The comparisons across countries are made using 12 pillars of competitiveness, including basic requirements (institutions, infrastructure, etc.), efficiency enhancers (higher education, goods market, labor market, financial market, etc.), and innovation and sophistication factors (business sophistication, innovation).

What is making the United States less competitive, according to the report?  Could it be the way math or science is taught in our schools?  Could it be that our teachers are not competent to teach math or science?

The major factors the World Economic Forum’s analysis identifies as contributing to the U.S. becoming less competitive (i.e., dropping from 2nd to 4th) are really not surprising.  The evaluation of institutions, the fact that the public does not trust politicians, and the business community’s concern about government interference with business are three for starters.

Here are some more: the business community thinks the government spends its money too freely.  There is also increasing concern about the nature of the auditing of private companies, as well as the downward trend of business ethics.  And one of the major factors causing the United States to become less competitive is the “burgeoning levels of public indebtedness.”  The debt ceiling fiasco that we all witnessed in Washington led to the United States losing its prized AAA credit rating from S&P.

Iris Rotberg concludes that student test scores are not a valid basis for understanding a nation’s competitiveness.

A nation’s competitiveness is too complicated, and it is affected by many other variables, identified above and put rather nicely by Rotberg as follows:

Other variables, such as outsourcing to gain access to lower-wage employees, the climate and incentives for innovation, tax rates, health-care and retirement costs, the extent of government subsidies or partnerships, protectionism, intellectual-property enforcement, natural resources, and exchange rates overwhelm mathematics and science scores in predicting economic competitiveness.

Taking the Lead

One of the outcomes of reading the international test reports is that corporate leaders, politicians, and influential policy makers continue the cry that U.S. education is lacking in math and science, and that our place in the world of economic prosperity is being challenged.  There is no evidence to support this, other than the forces identified above that are challenging America’s prosperity.

For example, for the past year or so, the Carnegie Foundation has funded and will continue to fund a process that will lead to a new generation of science standards.  In the summer of 2011, the National Research Council announced the publication of A Framework for K-12 Science Education, which will be used by Achieve, Inc., to write a new set of K-12 science standards to be published in 2012 or 2013.  Achieve, which also wrote the Common Core standards in math and reading/language arts, has begun the process, which involves cooperation with the AAAS and NSTA.  In one of their documents, which are online at their site, they had this to say about the relationship between the new standards and the United States’ position in the world:

Conditions are right for the United States to take the lead internationally in forging a new conceptual framework for science, and next generation science standards. The NRC framework and aligned science standards will create a fresh vision for science education and new directions for teaching, learning, and assessment that could contribute significantly to improving student understanding and achievement. Seizing the opportunity that this moment presents will bring us a step closer to moving the United States into the vanguard of international science education reform.

You might wonder what the problem is with a new conceptual framework, a fresh vision for science education. Actually, this process might be valuable if it were tied directly to curriculum development and teacher education.  The problem, however, is that these new standards will be part of a continuing effort to reform science education along the lines of the NCLB Act, in which achievement test scores are used as the marker for measuring what students have learned in schools and how well teachers and schools are performing.  The standardization of the science curriculum seems to me to be the antithesis of innovation, which is one of the 12 pillars used to assess the competitiveness of a nation’s economy.

Science teaching has much to offer society, especially from science teachers who embrace innovation, creativity, and inquiry as core to their teaching approaches with their students.

Continuing to use data in meaningless and unsupportable ways to achieve the ends of a few corporate leaders and policy makers is not in the best interest of American science education.

The Lens of Poverty

A report this week indicated that the poverty rate in the U.S. had increased and that one out of six Americans lives in poverty (46.2 million people).  The poverty rates among African-Americans and Hispanics, 27.4 and 26.6 percent respectively, are more than double that of whites, which is 9.9 percent.

According to separate research analyses by Rotberg and Tirozzi, examining international (or national) test results through the lens of poverty uncovers quite a different picture.  Each researcher has reported that poverty, and concentrations of poverty in schools, have adverse effects on student performance (in all countries).

For example, on the PISA test results, socioeconomic status accounts for more than 80 percent of the difference in performance.  Tirozzi, using free or reduced lunch data as a marker of poverty, found that the U.S. has the largest percentage of students living in poverty (21.7 percent), and that the only other nations taking part in PISA with poverty levels close to the United States were the U.K. and New Zealand.  U.S. schools with less than 10% poverty rank first in the world; those with 10-25% poverty rank third, behind Korea and Finland; and U.S. schools with 25-50% poverty are tenth in the world.
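The disaggregation Tirozzi describes can be illustrated with a toy weighted-average calculation.  The shares and scores below are hypothetical numbers invented for illustration, not actual PISA figures; they simply show how a country whose low-poverty schools score at the top can still post a middling national average:

```python
# Hypothetical illustration of how a single national average can mask
# band-level performance.  These are NOT actual PISA figures.
bands = [
    # (share of students, average score) by school poverty concentration
    (0.30, 550),  # schools with < 10% poverty: world-leading scores
    (0.30, 520),  # schools with 10-25% poverty
    (0.40, 460),  # schools with > 25% poverty
]

# The national average is the enrollment-weighted mean of the bands.
overall = sum(share * score for share, score in bands)
print(overall)
```

The weighted average lands well below the top band, so a league table built on national averages reports nothing about the 550-scoring low-poverty schools or the concentration of students in high-poverty ones.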

The recent cheating scandal in the Atlanta Public Schools is an unfortunate example of what happens when education becomes deterministic, based on a set of policies that drive schools and systems to “create a culture of fear” to make sure that schools meet accountability standards that are not based on supporting documentation or research.

There are many misconceptions surrounding the use of achievement test results in making claims about the quality of science education.

Suggested Readings:

PISA Test Results Through the Lens of Poverty, Art of Teaching Science Weblog

International Test Scores, Irrelevant Policies by Iris C. Rotberg, Education Week, September 13, 2011

Balancing Change and Tradition in Global Education Reform by Iris C. Rotberg, et al.

The Competitiveness Report 2010-2011, World Economic Forum

The Economics Behind International Education Rankings, Cynthia McCabe

Next Generation Science Standards


Why Do We Teach Science, Anyway? The Democratic Argument

There are at least two interpretations that emerge when we explore why we teach science from the democratic argument.  The first interpretation is that we should be teaching science to help students become informed citizens in an increasingly technocratic and scientific world, and to provide them with the tools to intelligently discuss, vote on, and make decisions about “modern life, politics and society” (Turner, p. 10).  But we can also interpret the democratic argument in the context of democratic schools, that is, schools in which students and teachers participate equally in shared decision-making on matters related to the organization of the school, the curriculum, and related matters.

I am going to focus on the first argument here, namely that school science should be in the service of helping students become informed citizens.  In science education, there is an interesting history of curriculum projects and efforts at the school level aimed at a science education that is context-based. (See Judith Bennett for a synthesis of the research on context-based science.)  Helping students become informed citizens is also the subject of Science-Technology-Society-Environment (STSE) education, environmental education, social responsibility, public understanding of science, humanistic science, and citizen science.

In the democratic paradigm of science education, contexts and applications are the starting places for learning about science, in contrast to the traditional approach to science teaching, which chiefly attends to the structure of the disciplines of science and their subject matter knowledge in curriculum design.  This is clearly a very different approach than is used in the design and construction of standards in science.  The 1996 NSES and the Conceptual Framework for a New Generation of Science Standards start with the key concepts or core ideas in the disciplines of science: earth science, life science, and physical science (engineering and technology were added as a fourth area in the 2010 Conceptual Framework).  If you want to find examples of STS or context-based science standards, you have to mine the standards to find instances of STS.

The democratic argument creates a curriculum that is potentially more interesting to students.  In fact, in a synthesis of research on STS and context-based science programs, Judith Bennett and colleagues reported on:

detailed research evidence from 17 experimental studies undertaken in eight different countries on the effects of context-based and STS approaches, drawing on the findings of two systematic reviews of the research literature. The review findings indicate that context-based/STS approaches result in improvement in attitudes to science and that the understanding of scientific ideas developed is comparable to that of conventional approaches.

This is an important finding.  In a very large study involving more than 40 countries, researchers of the ROSE project (The Relevance of Science Education) surveyed the attitudes of thousands of 15-year-old students to find out the status of science education.  Under the direction of Svein Sjøberg and Camilla Schreiner (University of Oslo), the ROSE project seeks to address:

mainly the affective dimensions of how young learners relate to S&T.  The purpose of ROSE is to gather and analyze information from learners about several factors that have a bearing on their attitudes to S&T and their motivation to learn S&T.  Examples are: A variety of S&T-related out-of-school experiences, interests in learning different S&T topics in different contexts, prior experiences with and views on school science, views and attitudes to science and scientists in society, future hopes, priorities and aspirations as well as young peoples’ feeling of empowerment with regards to environmental challenges, etc.

The findings of the ROSE study are important to the democratic argument because the researchers sought to find out about students’ attitudes toward the science curriculum and toward science in their lives and society. As the researchers claim, developing a positive attitude about science is an important goal of science teaching, and it would appear important to know what attitudes students hold.  Most large-scale assessments of students focus on the “knowledge” students have, as reported by TIMSS and PISA.  ROSE researchers point out that

It is a worrying observation that in many countries where students are on top of the international TIMSS and PISA score tables, they tend to score very low on interest for science and attitudes to science.  These negative attitudes may be long lasting and in effect rather harmful to how people later in life related to S&T as citizens.

Designing a science curriculum around STSE not only will further the democratic argument, but it might contribute to more positive student attitudes about science.  In Bennett’s research, it was found that in context-based science programs, students achieved at the same content levels as students in more traditional science courses.  We could argue that context-based programs might serve not only the students, but also contribute to an improvement of science teaching in general.

Moving ahead with a context-based or STSE approach to the science curriculum is not without problems.  Are there significant context-based themes that could be used with young students, say in grades K-4?  Is this approach more applicable to students in middle and high school?  There is also the problem of teacher education.  Some researchers suggest that teachers are reluctant to move away from the content of their discipline and entertain social and contextual issues as a basis for curriculum.

But there are many examples of context-based science programs that are successful with students and teachers: ChemCom (Chemistry in the Community), a context-based high school chemistry course; SEPUP (Science Education for Public Understanding); Project Learning Tree; and Project Wild, just to name a few.

Students need to see relevance and connections between their lived experiences and the science content (or any content, for that matter) that they learn in school.  The democratic argument for why we teach science appears to foster these connections.

Coming next: Why do we teach science? The Skills argument.

Progressive Science Education

I have been reading and have referenced on this weblog the October 2009 special issue of the Journal of Research in Science Teaching (JRST) on the topic/theme “Scientific Literacy and Contexts in PISA Science.”  The articles in the special issue provide a broad view of international testing as conceived in PISA, as well as the TIMSS.

One of the articles (by Sadler and Zeidler), which focused on PISA and socioscientific discourse, used the term progressive science education to describe a vision of science education that includes public understanding of science, humanistic science education, context-based science teaching, S-T-S, and socioscientific issues.  As pointed out by the authors, the term progressive science education was used by George DeBoer in 1991.

A bit of background.  In a paper written by Douglas A. Roberts (Scientific Literacy/Science Literacy) that appeared in the 2007 Handbook of Research on Science Education, the author introduced two visions to explore the notions of scientific and science literacy, namely Vision I and Vision II.  In Roberts’s view, Vision I gives meaning to scientific literacy by “looking inward at the canon of orthodox natural science, that is, the products and processes of science itself.”  As Roberts states, this approach envisions literacy (or, perhaps, thorough knowledgeability) within science.  He goes on to point out that the Benchmarks for Science Literacy by the AAAS approximates his view of Vision I.  I would add that the National Science Education Standards (NSES) impart Vision I as well.

To Roberts, there is a contrasting and quite different vision of science, Vision II, which gets its meaning from “the character of situations with a scientific component, situations that students are likely to encounter as citizens.”  Roberts defines this vision as literacy (again, read thorough knowledgeability) about science-related situations.  In my view, a very good description and discussion of Vision II is by Glen Aikenhead in his book, Science for Everyday Life.

We might think of Vision I as traditional science education; Vision II as progressive science education.

In the JRST special issue on PISA Science, some of the authors suggest that most of the documents produced in the past 20 years under the “standards movement era” tend to support Vision I.  Indeed, we could also suggest that most state standards are written as Vision I science literacy.  At the US national level, the NAEP assessments focus on Vision I.  At the international assessment level, we might identify TIMSS as a Vision I marker.

There is some suggestion in the JRST issue that PISA 2006 aligns very closely with the Vision II view of science literacy described by Roberts.  This view was suggested by the editors of the special issue, but Sadler and Zeidler wrote that they have serious concerns about the extent to which PISA supports progressive science education.  Can progressive science education, or Vision II science literacy, be “measured” by a standardized assessment such as PISA?

The answer to this question is probably not.  As much as the authors of PISA would like us to believe that the test measures contextualized and controversial topics, others argue that the items are really decontextualized.  I found the items on PISA to be quite complicated; they required a lot of reading, and in some cases what the students were asked to read had little or nothing to do with the questions that were asked.  One thoughtful evaluation of the PISA assessment program was written by Svein Sjøberg (see PISA and “Real Life” Challenges: Mission Impossible).  He suggests that, although PISA claims to test “real-life skills and competencies in authentic contexts,” such a goal is impossible in a traditional testing environment as described in the PISA documentation.

Progressive science education (humanistic science education) will require a different form of assessment, one that relies on observation and on the active assessment of learning in the context of classrooms by science teachers and researchers.  The most effective classroom assessments, those that contribute to our understanding of student learning and indeed help students improve, are formative assessments, not summative assessments in the form of PISA or TIMSS.

Yet, in the USA, where science education has actually made a great deal of progress (see Lowell & Salzman), the winds of change are aimed at further standardizing teaching by the “common standards movement.”  This will be followed by the development of “common tests” which will be used to compare and contrast schools, school districts, states, and individuals, including teachers and principals.

More to come on this topic.

Students Lag in Science So Says the National Center for Education Statistics

There was a story on cnn.com today that caught my attention, entitled U.S. students behind in math, science, analysis says.  The analysis was written by the National Center for Education Statistics and was a summary of several international assessments, including the Trends in International Mathematics and Science Study (TIMSS) and the Program for International Student Assessment (PISA, 2006 results).

The story was a report of a brief talk given by the U.S. Secretary of Education (Arne Duncan) in which he used the results of the “Condition of Education” report issued by the National Center for Education Statistics.  You can see the full report by clicking on the previous link.  The basic question in the report was: How do U.S. students compare with their peers in other countries?  For all the details that you can examine, the analysis comes down to this:

The performance of U.S. students neither leads nor trails the world in reading, mathematics, or science at any grade or age (quote from report’s summary).

The Secretary of Education uses the results of the analysis to say that “we are lagging the rest of the world, and we are lagging it in pretty substantial ways.  I think we have become complacent. We’ve sort of lost our way.”  Unfortunately, politicians believe that the data represent an accurate picture of student learning, and they use it to drum up support for their policies. Yet U.S. scores have not changed since 2000.

If you look at the PISA results, which are for 2006, U.S. 15-year-old students scored higher than some peers and lower than others in the major areas tested by PISA: overall scientific literacy, identifying scientific issues, explaining phenomena, and using scientific evidence.  Rank-ordering the countries by score (similar to the way we rank competitive sports), in overall scientific literacy Finland leads the way with a score of 563 (500 is average), the U.S. scores 489 (21st), and Mexico scores 410 (30th).

Is the sky falling?  Have we lost our way?  Should we pay math and science teachers more?  Can we educate our way to a better economy?

I’ve written before that the results of international comparisons and other large-scale assessments need to be carefully scrutinized before making sweeping generalizations about the fitness of a country’s or state’s educational system.  For example, the U.S. has more than 15,000 independent school systems; an average score based on a sit-down test of 48 to 60 items cannot represent the students in all these schools, and it doesn’t describe the qualities or inequalities inherent in any country’s schools.

Results as reported by PISA and TIMSS help shape the public image of science education (and mathematics education), and it is unfortunate that educators allow this to happen.  Dr. Svein Sjøberg of the University of Oslo, in a publication entitled PISA and “Real Life” Challenges: Mission Impossible, questions the use of tests such as PISA and TIMSS.  He informs us that:

The PISA project sets the educational agenda internationally as well as within the participating countries. PISA results and advice are often considered as objective and value-free scientific truths, while they are, in fact embedded in the overall political and economic aims and priorities of the OECD. Through media coverage PISA results create the public perception of the quality of a country’s overall school system. The lack of critical voices from academics as well as from media gives authority to the images that are presented.

PISA measures only three areas of the curriculum (math, science, reading), according to Dr. Sjøberg, and the implication is that these are the most important areas, and that areas such as history, geography, social science, ethics, foreign language, practical skills, arts, and aesthetics are not as important to the goals of PISA.  TIMSS, according to his analysis (and I would agree), is based on a science curriculum that many science educators want to replace, yet it uses test items that could have been used 50 years ago.  In general, the public is convinced that these international tests are valid ways of measuring learning and that the results can be used to draw significant conclusions about the effectiveness of teaching and learning.

If you live in the world of psychometrics and modeling, the results gathered by these international testing bodies are a dream come true. Sjøberg puts it this way:

PISA (and even more so TIMSS) is dominated and driven by psychometric concerns, and much less by educational. The data that emerge from these studies provides a fantastic pool of social and educational data, collected under strictly controlled conditions – a playground for psychometricians and their models. In fact, the rather complicated statistical design of the studies decreases the intelligibility of the studies. It is, even for experts, rather difficult to understand the statistical and sampling procedures, the rationale and the models that underlie the emergence of even test scores. In practice, one has to take the results at face value and on trust, given that some of our best statisticians are involved. But the advanced statistics certainly reduce the transparency of the study and hinder publicly informed debate.