Standardized Learning Tests: An Overview

Standardized learning tests play a significant role in education systems worldwide. They are designed to provide a consistent and objective measure of student performance, informing decisions at various levels, from individual student support to broader educational policies. This article explores the definition, history, purpose, and ongoing debates surrounding standardized tests, aiming to provide a comprehensive understanding of this important aspect of education.

Defining Standardized Tests

A standardized test is "any test that’s administered, scored, and interpreted in a standard, predetermined manner," ensuring consistency across all test takers, according to W. James Popham, former president of the American Educational Research Association. This means that every student answers the same questions (or a selection from a common bank), under the same conditions, and their responses are evaluated using the same scoring criteria. This uniformity aims to create a level playing field, allowing for fair comparisons of performance among individuals or groups of students.

While multiple-choice questions are common due to their ease of automated scoring, standardized tests can also include true-false, short-answer, essay questions, or a mix of question types. These tests do not necessarily need to be high-stakes, time-limited, or academic in nature. Any assessment administered and scored uniformly qualifies as standardized.

A Historical Perspective

The concept of standardized testing has a long history, dating back to ancient civilizations.

Early Origins in China

The earliest evidence of standardized testing can be traced back to China during the Han dynasty. Imperial examinations covered the "Six Arts," including music, archery, horsemanship, arithmetic, writing, and knowledge of rituals and ceremonies. Later, sections on military strategies, civil law, revenue and taxation, agriculture, and geography were added. The famed "Eight-Legged Essay" became a standard part of civil service tests in the Ming Dynasty (1368-1644), testing applicants' rote learning of Confucian philosophy.

The Industrial Revolution and the Rise of Modern Testing

Further technological development spurred a still greater need for testing and standardized exams. The Industrial Revolution (mid-1700s to the early 1900s) encouraged school-aged farmhands and factory workers to go to school. In the mid-1800s, Boston school reformers Horace Mann and Samuel Gridley Howe, modeling their efforts on the centralized Prussian school system, introduced standardized testing to Boston schools. The Kansas Silent Reading Test (1914-15), developed by Frederick J. Kelly, a Kansas school director, is the earliest known published multiple-choice test. Kelly created the test to reduce “time and effort” in administration and scoring.

Inspired by the Chinese system, British company managers in British India used standardized exams in the early 19th century for hiring and promotions to ensure fairness and prevent corruption. This practice was later adopted in mainland Britain and spread throughout the British Commonwealth, Europe, and America.

20th Century Developments

World War I (1914-18) also played a key role in popularizing standardized testing in the United States. Given to new recruits, the Army’s “intelligence tests,” developed by Princeton psychologist Carl Brigham, were deeply biased, reflecting the prejudices and racism of the day. This wartime emphasis on standardized tests influenced the founding of the Scholastic Aptitude Test (SAT) in 1926. Created by Carl Brigham for the College Board for the expansion of access to higher education, the SAT became a standard exam for acceptance into college in the post-World War II era.

In 1934, International Business Machines Corporation (IBM) hired teacher and inventor Reynold B. Johnson (best known for creating the world’s first commercial computer disk drive) to create a production model of his prototype test scoring machine. The IBM 805, announced in 1938 and marketed until 1963, graded answer sheets by detecting the electrical current flowing through graphite pencil marks. The Elementary and Secondary Education Act of 1965 required some standardized testing in public schools.

Individual states began testing large numbers of children and teenagers through the public school systems in the 1970s. The need for the federal government to make meaningful comparisons across a highly de-centralized (locally controlled) public education system encouraged the use of large-scale standardized testing.


The Rise of High-Stakes Testing

The 1983 release of A Nation at Risk: The Imperative for Educational Reform, a report by Pres. Ronald Reagan’s National Commission on Excellence in Education, warned of a crisis in American education and an urgent need to raise academic standards. Successive administrations attempted to implement national school reform following the release of A Nation at Risk. Pres. Bill Clinton’s Goals 2000 Act and Improving America’s Schools Act (IASA), both passed in 1994, shared the aim of making American students the top in the world in math and science by 2000. Many of the principles behind this legislation reflected an outcome-based approach to education, which has been criticized for over-emphasizing standardized test scores and leading to the negative consequences associated with high-stakes testing, such as narrowing the curriculum and “teaching to the test” at the expense of art, music, or social studies.

Clinton’s Goals 2000 Act is often seen as the precursor to Pres. George W. Bush’s No Child Left Behind Act (NCLB), which passed overwhelmingly in the U.S. Senate (87-10), and was signed into law on January 8, 2002. The legislation, modeled on Bush’s education policy as governor of Texas, mandated annual testing in reading and math (and later science) in grades 3-8 and again in grade 10. If schools did not show sufficient Adequate Yearly Progress (AYP), they faced sanctions and the possibility of being taken over by the state or closed. The goal of NCLB was that all students be “proficient” on state reading and math tests by 2014, which was regarded as an impossible target by many testing opponents. According to the Pew Center on the States, annual state spending on standardized tests rose from $423 million before NCLB to almost $1.1 billion in 2008 (a 160 percent increase).

On February 17, 2009, Pres. Barack Obama’s Race to the Top program was signed into law, inviting states to compete for $4.35 billion in extra funding based on the strength of their student test scores. On March 13, 2010, Obama proposed an overhaul of Bush’s No Child Left Behind, promising further incentives to states if they develop improved assessments tied more closely to state standards, and emphasizing other indicators like pupil attendance, graduation rates, and learning climate in addition to test scores.

The No Child Left Behind Act of 2001 further tied some types of public school funding to the results of standardized testing. Under these federal laws, the school curriculum was still set by each state, but the federal government required states to assess how well schools and teachers were teaching the state-chosen material with standardized tests. The results of large-scale standardized tests were used to allocate funds and other resources to schools, and to close poorly performing schools.

Contemporary Testing Landscape

The Every Student Succeeds Act (ESSA), signed into law in 2015, replaced NCLB and returned some authority over testing methods and teacher evaluations to the states. While ESSA still requires yearly testing in math and reading after Grade 3 and requires schools to report demographic statistics, it allows states more flexibility in determining how to use test results.

In Australia, all students in Years 3, 5, 7, and 9 are assessed using national tests. The program provides student-level reports designed to enable parents to see their child's progress over the course of their schooling, and to help teachers improve individual learning opportunities for their students. Student- and school-level data are also provided to the appropriate school system on the understanding that they can be used to target specific supports and resources to the schools that need them most.

Colombia has several standardized tests that assess the level of education in the country. Students in third grade, fifth grade, and ninth grade take the "Saber 3°5°9°" exam. Upon leaving high school, students take the "Saber 11" exam, which allows them to enter different universities in the country. Canada leaves education, and standardized testing as a result, under the jurisdiction of the provinces.

Impact of COVID-19

COVID-19 Interrupts Testing

On March 20, 2020, Education Secretary Betsy DeVos announced that states could cancel standardized testing for the 2019-2020 school year due to school closures related to the COVID-19 pandemic. As DeVos stated, “Students need to be focused on staying healthy and continuing to learn. Teachers need to be able to focus on remote learning and other adaptations. Neither students nor teachers need to be focused on high-stakes tests during this difficult time.”

On November 25, 2020, the National Center for Education Statistics (NCES) announced that National Assessment of Educational Progress (NAEP) reading and math tests would be postponed until 2022 in light of the ongoing pandemic. The Biden Administration announced on February 22, 2021, that states must resume annual math and reading standardized testing in spring 2021.

Post-Pandemic Testing

Standardized testing scores suffered after the pandemic. The tests given in the fall of 2022 show the lowest math scores since 1990 and the lowest reading scores since 2003 for 13-year-olds on the National Assessment of Educational Progress (NAEP). A February 7, 2024, Forbes report found that students in Massachusetts, Utah, New Jersey, New Hampshire, and Connecticut maintained the highest scores from fourth through eighth grade. Mississippi, Alabama, West Virginia, New Mexico, and Oklahoma showed sharp declines in scores from fourth to eighth grade.

The 2024 NAEP results in fourth-grade math showed minor improvement from 2022 (up three points), but scores had not yet rebounded to pre-pandemic levels. Overall, only 39 percent of fourth graders performed at or above the NAEP “proficient” level in math. Eighth-grade math scores remained at 2022 levels, which were lower than pre-pandemic scores. Overall, only 28 percent of eighth graders performed at or above the NAEP “proficient” level in math. Reading scores for both fourth and eighth graders continued to decline post-pandemic (down two points for each), with only 31 percent of fourth graders and 30 percent of eighth graders performing at or above the NAEP “proficient” level. By late 2025, it was clear that a post-pandemic rebound was still not happening. The NAEP’s “Nation’s Report Card” found a decrease of four points in eighth-grade science scores since 2019, and a three-point decrease in 12th-grade math and reading scores. Compared to the 1992 reading assessment, 12th-grade average scores were down 10 points, a historic low.

Purposes of Standardized Tests

Standardized tests serve various purposes within the education system:

Measuring Student Achievement: Standardized achievement tests measure how much students have already learned about a school subject and determine the academic progress they have made over a period of time.
Assessing Aptitude: Standardized aptitude tests measure students' abilities to learn in school and how well they are likely to do in future school work. They measure a broad range of abilities or skills considered important to success in school, such as verbal, mechanical, creative, clerical, or abstract reasoning.
Evaluating School Effectiveness: Test results help schools measure how students in a given class, school, or school system perform in relation to other students who take the same test. They also evaluate the effectiveness of schools and teachers.
Informing Instruction: The results from aptitude tests help teachers plan instruction that is appropriate for the students' levels. Standardized tests can help teachers and administrators make decisions regarding the instructional program.
Accountability: Standardized achievement tests have become an increasingly prominent part of public schooling in the United States to hold schools and educators accountable for educational results and student performance.
Identifying Achievement Gaps: They identify achievement gaps among different student groups, including students of color, students who are not proficient in English, students from low-income households, and students with physical or learning disabilities.
College Admissions: College-admissions tests are used in the process of deciding which students will be admitted to a collegiate program.
International Comparisons: International-comparison tests are administered periodically to representative samples of students in a number of countries, including the United States, for the purposes of monitoring achievement trends in individual countries and comparing educational performance across countries.
Psychological Assessment: Psychological tests, including IQ tests, are used to measure a person’s cognitive abilities and mental, emotional, developmental, and social characteristics.

Norm-Referenced vs. Criterion-Referenced Tests

Standardized tests can be either norm-referenced or criterion-referenced:

Norm-referenced score interpretations compare test takers to a sample of peers, ranking them as better or worse than others. This type of test identifies whether the test taker performed better or worse than other people taking this test. An IQ test is a norm-referenced standardized test. Comparing against others makes norm-referenced standardized tests useful for admissions purposes in higher education, where a school is trying to compare students from across the nation or across the world.
Criterion-referenced score interpretations compare test takers to a criterion (a formal definition of content), regardless of the scores of other examinees. These may also be described as standards-based assessments, as they are aligned with the standards-based education reform movement. In this case, the objective is simply to see whether the test taker can answer the questions correctly.
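
The difference between the two interpretations can be illustrated with a minimal sketch. The scores, the cut score of 80, and the function names below are hypothetical and chosen only for illustration; real testing programs use more elaborate norming and standard-setting procedures.

```python
from bisect import bisect_left


def percentile_rank(score, norm_group):
    """Norm-referenced reading: percent of the norm group scoring strictly below `score`."""
    ordered = sorted(norm_group)
    below = bisect_left(ordered, score)  # count of scores strictly less than `score`
    return 100.0 * below / len(ordered)


def meets_criterion(score, cut_score):
    """Criterion-referenced reading: compare the score to a fixed standard only."""
    return score >= cut_score


# Hypothetical data: a small norm group and an assumed proficiency cut score.
norm_group = [52, 61, 67, 70, 74, 78, 81, 85, 90, 96]
CUT_SCORE = 80
student_score = 78

print(percentile_rank(student_score, norm_group))  # position relative to peers
print(meets_criterion(student_score, CUT_SCORE))   # position relative to the standard
```

In this sketch the same score of 78 sits in the middle of the norm group yet falls short of the assumed cut score, which is why the two interpretations can lead to different conclusions about the same test taker.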

Pros and Cons of Standardized Tests

Standardized tests are a subject of ongoing debate, with proponents and critics raising valid points.

Arguments in Favor

Objective Measurement: Standardized tests offer an objective measurement of education because they assess students based on a similar set of questions, are given under nearly identical testing conditions, and are graded by a machine or blind reviewer. They help schools measure how students in a given class, school, or school system perform in relation to other students who take the same test.
Help for Marginalized Groups: Standardized tests help students in marginalized groups, whether by race, learning disability, or other difference, because advocates can use testing data to prove a problem exists and to help solve the problem via more funding, development of programs, or other solutions.
Indicators of Success: Standardized test scores are good indicators of college and job success. Standardized tests can promote and offer evidence of academic rigor, which is invaluable in college as well as in students’ careers.
Useful Metrics for Teacher Evaluations: Standardized tests are useful metrics for teacher evaluations because they help schools and districts make sure students are meeting learning goals. If test scores show certain areas need improvement, schools can make changes to help students succeed.

Arguments Against

Limited Scope: Standardized tests only determine which students are good at taking tests, ignoring skills like creative thinking and problem-solving. They don't measure everything students are expected to learn in school.
Potential for Bias: Standardized tests are racist, classist, and sexist because they are unable to account for cultural, ethnic, or social differences. As a system of assessment, they perpetuate the educational disadvantages that people of color continue to face, including in school quality and available resources.
Not Predictors of Future Success: Standardized test scores are not predictors of future success. While they may indicate academic rigor, they don't guarantee success in college or careers.
Unfair Metrics for Teacher Evaluations: Standardized tests are unfair metrics for teacher evaluations. Critics dispute the claim that standardized tests, and the consequences attached to low scores, hold schools, educators, and students to higher standards and improve the quality of public education, pointing instead to consequences such as narrowing the curriculum and “teaching to the test.”

Ensuring Validity and Reliability

The considerations of validity and reliability typically are viewed as essential elements for determining the quality of any standardized test. Validity refers to the accuracy of the test in measuring what it is intended to measure. Reliability refers to the consistency of the test results.

In standardized testing, measurement error (a consistent pattern of errors and biases in scoring the test) is easy to determine. When a score depends upon the graders' individual preferences, test takers' grades depend upon who grades the test. By applying the same scoring criteria to every test taker, standardized tests also remove grader bias in assessment.
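
Reliability, described above as the consistency of test results, is often summarized with a correlation between two sets of scores from the same test takers. The following is a minimal sketch of one such check (a test-retest correlation); the score lists are invented for illustration, and this is only one of several ways reliability can be estimated.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five students on two administrations of one test.
first_sitting = [70, 82, 65, 90, 75]
second_sitting = [72, 80, 68, 91, 74]

reliability = pearson_r(first_sitting, second_sitting)
print(round(reliability, 3))  # values near 1.0 suggest consistent results
```

A high correlation here says only that the test ranks students consistently; it says nothing about validity, that is, whether the test measures what it is intended to measure.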

Standardized Testing Controversies

The 2010 documentary Waiting for Superman gave the testing and accountability movement a nationally recognized spokesperson in Michelle Rhee, then-Chancellor of Washington, D.C., public schools. Rhee, appointed by D.C. Mayor Adrian Fenty in June 2007, became a lightning rod for testing opponents after she enacted a strict policy of teacher and school accountability based on standardized test scores.

In August 2010, the Los Angeles Times spurred a national debate when the newspaper published the names of about 6,000 Los Angeles elementary school teachers (grades 3-5), alongside calculations of their students’ gains and losses on standardized tests during the school year, in a publicly searchable database. Known as the “value added” method of evaluating teacher effectiveness, it has been mandated by several hundred school districts in some 20 states. For example, up to 40 percent of New York teachers’ evaluations were tied to value-added test score analyses, as of the 2011-2012 school year.
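
The idea of summarizing students’ gains and losses per teacher can be sketched in its simplest form as an average of score changes. This is a deliberately simplified illustration with hypothetical teacher labels and scores; actual value-added models statistically adjust for prior achievement and other factors, and nothing here reflects the specific method used by the newspaper or any district.

```python
from collections import defaultdict


def mean_gain_by_teacher(records):
    """Average (end-of-year minus start-of-year) score change per teacher.

    `records` is an iterable of (teacher, start_score, end_score) tuples.
    """
    gains = defaultdict(list)
    for teacher, start_score, end_score in records:
        gains[teacher].append(end_score - start_score)
    return {teacher: sum(g) / len(g) for teacher, g in gains.items()}


# Hypothetical records: (teacher, score at start of year, score at end of year).
records = [
    ("Teacher A", 60, 72),
    ("Teacher A", 75, 80),
    ("Teacher B", 68, 66),
    ("Teacher B", 82, 85),
]

print(mean_gain_by_teacher(records))
```

Even in this toy version, a teacher's average depends heavily on which students happen to be in the class, which is one reason the approach is debated.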

Education Secretary Arne Duncan told Congress that 82 percent of American schools could fail to meet NCLB’s goal of 100 percent proficiency on standardized tests by 2014. Individual states have cast similar doubts on their ability to satisfy NCLB’s Adequate Yearly Progress (AYP) goals. A 2008 study published in the peer-reviewed journal Science forecast “nearly 100 percent failure” of California schools to meet AYP in 2014. In 2015, parents across the country staged an “opt-out movement,” refusing to allow their children to be included in standardized testing, and children as young as 11 protested testing. The 2019 Nation’s Report Card (National Assessment of Educational Progress) reported that fourth- and eighth-grade reading and math scores had remained largely the same for a decade, despite stronger academic standards.
