After PISA: Standards, Assessment, and Accountability in Germany and in the U.S.

Table of contents

1. Introduction

2. 2001: The “PISA shock” hits Germany and the U.S.
2.1 After PISA: Reforms in Germany
2.2 After PISA: Reforms in the U.S.

3. The development of the results in the following PISA studies

4. Conclusion and future outlook

5. References

In 2000, the OECD conducted the first of what are now four studies of the Program for International Student Assessment (henceforth PISA), international surveys testing the skills and knowledge of 15-year-old students in mathematics, science, and reading. Students from 28 OECD member states and four non-member states took part, among them Germany and the U.S.

Of all participating states, Germany was ranked 21st in both mathematics and science and 22nd in reading, which gave it an overall ranking of 21st. The U.S. was ranked 20th in mathematics, 15th in science, and 16th in reading, and thus 17th overall. Both countries therefore scored below average overall. Besides ranking student performance, the results also shed light on the correlation between socioeconomic status and academic achievement. Germany displayed the highest correlation between student performance and social background of all countries tested. In addition, student performance varied strongly between the school types Gymnasium, Hauptschule, and Realschule. With regard to educational equality, this finding challenged Germany’s tripartite school system. “Socio-economic disadvantage has a notable impact on student performance in the United States”, too (cf. OECD 2011), albeit far less strongly than in Germany. In light of the high child poverty rate in the U.S.[1], this finding could have been anticipated. Nevertheless, it was alarming.

PISA can be, and rightly has been, criticized for its objective, its methodology, and the interpretation of its results. It is, for example, problematic to test and compare the achievement of 15-year-old students while disregarding the type of school they attend at the time of the assessment. Moreover, the fact that the students tested by PISA were schooled in completely different school systems calls the comparability of the results into question. Nonetheless, PISA has a great social impact. When the results of the first PISA study were published in 2001, Germany and the U.S. perceived them as a “shock” and reacted similarly: with major reforms of their education systems that would not otherwise have been possible. The changes were immense, particularly with regard to the assessment of educational progress, teacher accountability, and the trans-regional comparability of educational standards. These changes had a significant impact on education. How did this affect the results of the second PISA study, which was conducted in 2003? While the nature and purpose of both countries' reform agendas were questionable, the U.S. reforms in particular were doomed to fail. They were inadequate for improving the education system and thus the PISA survey results. Based on a comparison of the two reform agendas, ideas on how to improve education adequately will be presented in the conclusion.

The results of the first PISA study placed Germany below the OECD average in all three areas of competence tested: rank 21 of 34 in mathematics, 21 in reading, and 22 in scientific literacy. When these results were published in November 2001, they triggered a huge media response and sent a shockwave through the nation. How could the “land of poets and thinkers”, a leading industrial nation, receive such poor marks for its education system? A closer look at the history of school assessment reveals that poor results in student performance tests were in fact nothing new: Germany had scored poorly in the first Trends in International Mathematics and Science Study (TIMSS) in 1995. At that time, unlike after PISA, the German press and public hardly took any notice (cf. OECD 2011: 208).

Although it did slightly better than Germany, the U.S. scored poorly in the first PISA study, too. The American public was similarly shocked, and the government was forced to take action. It also introduced reforms of its education system. The centerpiece of the reform agendas introduced in Germany and the U.S. was a significant increase in assessment and accountability. These reforms, and hence the PISA results, had washback effects in both countries.

What is meant by washback (WB)? In applied linguistics, the term has been defined as "the effect of testing on teaching and learning" (Hughes 1989: 1). Hughes's definition focuses on the micro-level of WB, that is, its impact on individual teachers and students. WB on the macro-level refers to the impact on society and educational systems (cf. Bailey 1999: 4). Messick (1996: 241) also refers to the micro-level when defining WB as "the extent to which a test influences language teachers and learners to do things they would not necessarily otherwise do that promote or inhibit language learning". While some researchers view WB merely as one part of the impact of a test, on either the micro- or the macro-level (cf. Bachman & Palmer 1996: 29f.), others regard it critically as the

[…] natural tendency for both teachers and students to tailor their classroom activities to the demands of the test, especially when the test is very important to the future of the students, and pass rates are used as a measure for teacher success.

(Buck 1988: 17)

Thus tests, particularly high-stakes ones, determine how and what teachers teach and how and what students learn. Buck adds that “[t]his influence of the test on the classroom […] is, of course, very important; this washback effect can either be beneficial or harmful” (ibid.).

Alderson and Wall (1993) set out in fifteen theses on WB how a test can influence teaching and learning: what and how teachers teach, what and how learners learn, the rate and sequence of learning, the attitudes to learning and teaching, and the methods used. They focus on the question “How directly, according to the washback hypothesis, do tests bring about change in teaching and learning?” (ibid. 18). This question is highly relevant to the reforms prompted by the PISA results, which put the focus of education partially in Germany, and completely in the U.S., on high-stakes testing.

While there was hardly any public reaction to the poor TIMSS results in 1995, they did have an impact on Germany’s educational policy. In 1997, the Standing Conference of the Ministers of Education and Cultural Affairs of the Länder in the Federal Republic of Germany, henceforth KMK[2], began to prepare the ground for the PISA assessments (cf. OECD 2011: 207). The Länder, represented by the Council of Ministers, put forth a national report on education that has been published every two years since then. According to the DIPF, the German Institute for International Pedagogical Research, these reports served as “a major instrument of education monitoring in Germany alongside the international student performance surveys PISA, TIMSS and PIRL” (Weishaupt et al. 2012: 2). The introduction of the reports also marked the beginning of the cooperation between the ministers of the federal government and those of the Länder on a reform agenda. In 2001, Federal Minister of Education and Research Edelgard Bulmahn (SPD) proposed a major reform program comprising the introduction of all-day schools[3], the development and implementation of national education standards, and the creation of a new national report on education. In addition, language support programs for children with a poor command of German and their families were implemented. While not all sixteen ministers agreed on this agenda at first, the all-day school program was passed in 2003, followed by the national education standards in 2004. Furthermore, the Länder agreed to another national report on education (cf. OECD 2011: 208).

Agreement on the reform agenda by the education ministers of all states was only possible because of the PISA results. The political left and right[4] endorsed the agenda for different reasons. In a nutshell, the Social Democrats (SPD) focused on national educational standards, on strengthening kindergartens, and on increasing funds for special language training for nonnative children and their parents, for transforming schools into all-day schools, and for teacher training. The Christian Democrats (CDU) aimed at holding school staff accountable and at economizing schools by managing them like business organizations. In this way, school staff would get more autonomy in exchange for more accountability for their performance. Surprisingly, accountability was endorsed by the SPD as well. Due to Germany’s political system, each side had effectively blocked the other for years, resulting in a standstill in educational policy. The “PISA shock” changed that. For the first time in years, substantial changes to the system became possible. The huge media response and public reaction forced politicians to act, so that the states implemented the reforms through the KMK, while the national government held the ultimate legislative power. The reform agenda reflected the agendas of both the right and the left parties, a novelty which made it possible to push through a common agenda in all states, SPD- and CDU-governed alike (cf. ibid.).

Some elements of the reform agenda focused on assessment and accountability, such as the national educational standards, which

formulate demands on teachers and pupils […], pick up general education goals [and] designate the competencies that school has to impart to its pupils so that the key educational goals can be attained. The educational standards stipulate which competencies the children or young people are to acquire up to a specific school year level. The competencies are described […] in such a way that they can be implemented in setting tasks and can be documented […] with the help of testing procedures. (Klieme in Weisseno 2005: 4f.)

Introducing standards implied higher performance demands on teachers as well as on students. The competencies introduced together with the standards were designed in such a way that performance could be assessed against them. In this way, the standards made teachers and students subject to more federal control.

In addition, more Länder introduced centralized Abitur exams, henceforth Zentralabitur, in response to the PISA results. While only seven states had introduced a Zentralabitur prior to PISA, the remaining Länder followed afterwards[5]. It was extended to all Länder with the purpose of making Abitur performance comparable, and because “if an external standard is to be met at the end of the school career, students have no incentives to establish a low achievement cartel in class, possibly with the tacit consent of the teachers” (Jürges et al. 2003: 2). Hence, it also served the purpose of monitoring the performance of teachers and students.

Besides educational standards and standardized assessment at Abitur level, the reform agenda introduced all-day schools as well as language promotion programs, starting in preschool, for children with a weak command of German and their families. The standards and the standardized Abitur assessment meant stricter performance demands on schools, teachers, and students. The language promotion programs and the all-day schools were aimed at promoting educational equality.

Although it did slightly better than Germany, the U.S. scored poorly in the first PISA study, too. The U.S. government reacted to the results with major reforms of its education system. In 2001, President Bush introduced the No Child Left Behind Act (NCLB), an education-reform bill which became law in 2002. Even though testing and accountability have had a longer tradition in the U.S.[6], this opened up a new era of standardized, nationwide high-stakes testing and accountability. One WB effect on the macro-level was the introduction of annual “proficiency tests”, standardized, nationwide tests allegedly intended to measure students’ progress in reading and mathematics.

Bush presented NCLB as a revolution of the education system, particularly as a means of improving the academic achievement of the nation's poor and minority children. However, it was nothing completely new, but a reiteration of the 1965 Elementary and Secondary Education Act[7] (cf. Ravitch 2010: 94). The way for NCLB was paved in the 1990s, when both political parties showed interest in testing and accountability. On these grounds, the “Education President” Bush Sr. proposed America 2000 in 1991, an act comprising several programs intended to create voluntary national standards and voluntary national tests in English, mathematics, science, history, and geography in grades four, eight, and twelve (cf. ibid.). Its voluntary character contrasts with the tests mandated by the NCLB agenda. Besides, the tests were not conducted annually, but every fourth year.

Under President Clinton, there were also national standards and tests (cf. ibid. 95). At that time, Bush Jr., as governor, successfully implemented a model of accountability in Texas. When he became president in 2001, he had this experience on his record and thus won strong bipartisan support for his school reform plans, so that Congress passed a program closely aligned with the Texas model. Bush’s pledge to “[…] make sure every child is educated” and that “no child will be left behind – not one single child” (cf. ibid. 94) was to be fulfilled on the basis of four simple principles: every child should be tested every year in grades three through eight, with state rather than national tests. School reforms should be administered by the states, not by the national government. Low-performing schools should receive state funds in order to improve, and students in weak or failing schools should be enabled to transfer to better schools. An accountability plan was set up: First, all states were required to choose their own tests. They had to adopt three performance levels such as “basic”, “proficient”, and “advanced”, and find a definition for “proficient” (cf. ibid. 97). Secondly, mandatory tests in grades three through eight were stipulated for all public schools receiving federal funding, and schools were to separate scores by race, ethnicity, low-income status, disability status, and limited English proficiency. Thirdly, states were expected to establish timelines showing how all of their students would reach proficiency in reading and mathematics by 2014. Finally, all schools and school districts had to make Adequate Yearly Progress (AYP) in order to reach the goal of 100 percent proficiency by 2014.


[1] The U.S. has the second-highest child poverty rate of 35 industrialized countries, surpassed only by Romania, according to a UNICEF report (cf. UNICEF 2012: 4).

[2] According to the German abbreviation for Ständige Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland.

[3] All-day schools are schools at primary or lower secondary level that provide afternoon service for at least seven hours on at least three days a week (cf. OECD 2011: 209).

[4] Political left refers to the Social Democratic Party of Germany, SPD, and to the Green Party, Bündnis 90/Die Grünen, which were in charge of the Federal Government at the time of the agenda. Political right refers to the Christian Democratic/Christian Social Union, CDU/CSU, which was in charge of some Länder governments at that time.

[5] Except for Rhineland-Palatinate, which still conducts decentralized Abitur exams.

[6] It dates back at least to Lyndon B. Johnson’s presidency (cf. Ravitch 2010: 4).

[7] The Elementary and Secondary Education Act will not be elaborated on, since this would go beyond the scope of this paper.

