Summary of critical reports on the PISA test
INTERNATIONAL TEST SCORE COMPARISONS AND EDUCATIONAL POLICY: A REVIEW OF THE CRITIQUES
Martin Carnoy, Stanford University
In this brief, we review the main critiques that have been made of international tests, as well as the rationales and education policy analyses accompanying these critiques, particularly the policy analyses generated by the Program for International Student Assessment (PISA) of the Organization for Economic Cooperation and Development (OECD).
We first focus on four main critiques of analyses that use average PISA scores as a
comparative measure of student learning:
Critique #1: Whereas the explicit purpose of ranking countries by average test score is to allow for inferences about the quality of national educational systems, the ranking is misleading because the samples of students in different countries have different levels of family academic resources (FAR).
Critique #2: Students in a number of countries, including the United States, made large FAR-adjusted gains on the Trends in International Mathematics and Science Study (TIMSS) test between 1999 and 2011, administered by the International Association for the Evaluation of Educational Achievement (IEA). However, they have shown much smaller, or no, gains on the FAR-adjusted PISA test. This raises issues about whether one test or the other is a more valid measure of student knowledge.
Critique #3: The error terms of the test scores are considerably larger than the testing agencies care to admit. As a result, the international country rankings are much more in flux than they appear.
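The statistical point behind Critique #3 can be illustrated with a minimal simulation. This sketch is my own illustration, not an analysis from the brief; the country labels, the 4-point gaps between true mean scores, and the assumed 5-point standard error are all hypothetical numbers chosen only to mimic the tight mid-table clustering typical of PISA league tables.

```python
# Illustrative sketch: when the standard error of a country's sampled mean
# score is comparable to the score gaps between countries, league-table
# ranks are unstable across hypothetical re-samplings, even though the
# countries' true scores never change.
import random

random.seed(0)

# Hypothetical true mean scores for ten countries, 4 points apart.
true_means = {f"country_{i}": 500 - 4 * i for i in range(10)}
STD_ERROR = 5.0  # assumed standard error of each country's sampled mean

def simulated_ranking():
    """Rank countries by one noisy draw of each country's sampled mean."""
    draws = {c: random.gauss(m, STD_ERROR) for c, m in true_means.items()}
    return sorted(draws, key=draws.get, reverse=True)

# Track each country's apparent rank over 1,000 simulated administrations.
ranks = {c: set() for c in true_means}
for _ in range(1000):
    for position, country in enumerate(simulated_ranking(), start=1):
        ranks[country].add(position)

# The range of ranks each country occupies shows how much the published
# ordering depends on sampling noise rather than real differences.
rank_spread = {c: (min(r), max(r)) for c, r in ranks.items()}
unstable = sum(1 for lo, hi in rank_spread.values() if hi - lo >= 2)
print(f"{unstable} of 10 countries span 3 or more different ranks")
```

Under these assumed numbers, nearly every country's rank moves around from one simulated administration to the next, which is the sense in which rankings are "in flux" when error terms are large relative to score gaps.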
Critique #4: The OECD has repeatedly held up Shanghai students and the Shanghai educational system as a model for the rest of the world and as representative of China, yet the sample is not representative even of the Shanghai 15-year-old population and certainly not of China. In addition, Shanghai schools systematically exclude migrant youth. These issues should have kept Shanghai scores out of any OECD comparison group and raise serious questions about the OECD's brand as an international testing agency.
This brief also discusses a set of critiques around the underlying social meaning and educational policy value of international test comparisons. These comparisons indicate how students in various countries score on a particular test, but do they carry a larger meaning? There are four main critiques in this regard.
First, claims that the average national scores on mathematics tests are good predictors of future economic growth are, at best, subject to serious questions and, at worst, a gross misuse of correlational analysis. The U.S. case appears to be a major counterexample to these claims. Japan is another.
Second, data from international tests and their accompanying surveys are of limited use for drawing educational policy lessons. This is because cross-sectional surveys such as the TIMSS and PISA are not amenable to estimating the causal effects of school inputs on student achievement gains. Further, unlike TIMSS, PISA neither directly measures teacher characteristics and practices, nor can it associate particular teachers with particular students. Yet, again in the case of the OECD, there seems to be no end of asserted policy lessons proposed by the same agency that developed and applied the test: none with appropriate causal inference analysis, many based on questionable data, and others largely anecdotal.
Third, critiques have pointed to the conflict of interest that arises because the OECD (and its member governments) acts simultaneously as testing agency, data analyst, and interpreter of results for policy purposes.
Fourth, a recent critique questions the relevance of nation-level test score comparisons of countries with national educational systems to other countries with more diverse and complex systems, such as the United States, with its 51 (including the District of Columbia) highly autonomous geographic educational administrations. This newest critique goes beyond the questions raised about the validity of international test comparisons, and even beyond the careless way results are used to draw conclusions about good educational policies. PISA and TIMSS scores for U.S. states show large variation in student performance among states. PISA results for U.S. states are available only for 2012, but FAR-adjusted TIMSS scores are available for a number of U.S. states over more than a decade. These show large performance gains for some states and smaller gains for others.
The critique suggests that, from the standpoint of U.S. educational analysts and politicians, it would seem much more relevant and interesting to employ state-level test results over time to understand the policies that high-gaining states implemented in the past 20 years than to examine the educational policies of other countries (if, indeed, it is their educational policies that are behind any large test score gains made in the decade of the 2000s).