Standardized Tests in Education: Definition, Types, and Historical Context

A standardized test is designed and administered to ensure consistency and fairness in evaluating test takers. It is a method of assessment that requires all test takers to answer the same questions, or a selection of questions from a common bank, in the same way, and that is scored in a "standard" or consistent manner, which makes it possible to compare the relative performance of individual students or groups of students. This approach aims to provide an objective measure of knowledge or skills, reducing potential biases in grading.

Defining Standardized Tests

A standardized test is administered and scored uniformly for all test takers. Any test that is given in the same manner to all test takers and graded in the same manner for everyone is a standardized test. This means that the test is given under the same conditions to all participants and that the scoring process is consistent, so that all test takers who answer a question in the same way receive the same score for that question. This uniformity is intended to create a level playing field, allowing for fair comparisons between individuals or groups.

While standardized tests do not need to be high-stakes tests, time-limited tests, multiple-choice tests, academic tests, or tests given to large numbers of test takers, they are often perceived as fairer than non-standardized tests because everyone gets the same test and the same grading system. Such tests are often thought of as more objective than a system in which some test takers get an easier test and others get a more difficult test.

The definition of a standardized test has evolved over time. In 1960, standardized tests were defined as those in which the conditions and content were equal for everyone taking the test, regardless of when, where, or by whom the test was given or graded. By the beginning of the 21st century, the focus shifted away from a strict sameness of conditions towards equal fairness of testing conditions. Changing the testing conditions in a way that improves fairness with respect to a permanent or temporary disability, but without undermining the main point of the assessment, is called an accommodation. For example, a test taker with a broken wrist might write more slowly because of the injury, and it would be more equitable and produce a more reliable understanding of the test taker's actual knowledge if that person were given a few more minutes to write down the answers to a time-limited test.

Types of Standardized Tests

Standardized tests can take various forms, including written, oral, or practical assessments. A standardized test can be composed of multiple-choice questions, true-false questions, essay questions, authentic assessments, or nearly any other form of assessment. Not all standardized tests involve answering questions. An authentic assessment for athletic skills could take the form of running for a set amount of time or dribbling a ball for a certain distance. Similarly, healthcare professionals must pass tests proving that they can perform medical procedures, and candidates for driver's licenses must pass a standardized test showing that they can drive a car. Multiple-choice and true-false items are often chosen for tests that are taken by thousands of people because they can be given and scored inexpensively, quickly, and reliably, either with special answer sheets that can be read by a computer or via computer-adaptive testing.

Here's a breakdown of common types:

Achievement Tests

Standardized achievement tests measure how much students have already learned about a school subject. K-12 achievement tests are designed to assess what students have learned in a specific content area. These tests include those specifically designed by states to assess mastery of state academic content standards as well as general tests such as the California Achievement Tests, the Comprehensive Tests of Basic Skills, the Iowa Tests of Basic Skills, the Metropolitan Achievement Tests, and the Stanford Achievement Tests. These general tests are designed to be used across the nation and so will not be as closely aligned with state content standards as specifically designed tests. Standardized achievement tests are designed to be used for students in kindergarten through high school. For young children, questions are presented orally, students may respond by pointing to pictures, and the subtests are often not timed.

Aptitude Tests

Standardized aptitude tests measure students’ abilities to learn in school, that is, how well they are likely to do in future school work. Instead of measuring knowledge of subjects taught in school, these tests measure a broad range of abilities or skills that are considered important to success in school. Aptitude tests, like achievement tests, measure what students have learned, but rather than focusing on specific subject matter learned in school (e.g., math, science, English, or social studies), the test items focus on verbal, quantitative, and problem-solving abilities that are learned in school or in the general culture. These tests are typically shorter than achievement tests and can be useful in predicting general school achievement.

Diagnostic Tests

Some standardized tests are designed to diagnose strengths and weaknesses in skills, typically reading or mathematics skills. For example, an elementary school child may have difficulty in reading, and one or more diagnostic tests would provide detailed information about three components: (1) word recognition, which includes phonological awareness (pronunciation), decoding, and spelling; (2) comprehension, which includes vocabulary as well as reading and listening comprehension; and (3) fluency. Diagnostic tests are often administered individually by school psychologists, following standardized procedures.

College-Admissions Tests

College-admissions tests are used in the process of deciding which students will be admitted to a collegiate program. The SAT (originally the Scholastic Aptitude Test, later the Scholastic Assessment Test) is a standardized college-admissions test that many students take every year. The SAT aims to assess students’ basic school knowledge in two main sections: math, and reading and writing. The ACT (originally American College Testing) is another standardized test whose scores colleges and universities use to make admission decisions.

International-Comparison Tests

International-comparison tests are administered periodically to representative samples of students in a number of countries, including the United States, for the purposes of monitoring achievement trends in individual countries and comparing educational performance across countries.

Psychological Tests

Psychological tests, including IQ tests, are used to measure a person’s cognitive abilities and mental, emotional, developmental, and social characteristics. Trained professionals, such as school psychologists, typically administer the tests, which may require students to perform a series of tasks or solve a set of problems.

Norm-Referenced vs. Criterion-Referenced Tests

Standardized tests can be further categorized into norm-referenced and criterion-referenced tests.

Norm-Referenced Tests (NRT)

A norm-referenced test (NRT) is a type of test, assessment, or evaluation which yields an estimate of the position of the tested individual in a predefined population. The estimate is derived from the analysis of test scores and other relevant data from a sample drawn from the population. This type of test identifies whether the test taker performed better or worse than other people taking this test. Norm-referenced score interpretations compare test takers to a sample of peers. The goal is to rank test takers as being better or worse than others. An IQ test is a norm-referenced standardized test. Comparing against others makes norm-referenced standardized tests useful for admissions purposes in higher education, where a school is trying to compare students from across the nation or across the world.
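
As a rough illustration of a norm-referenced interpretation, the Python sketch below places one raw score relative to a sample of peers by computing a percentile rank. The function name `percentile_rank` and the sample scores are hypothetical and chosen only for illustration; real testing programs rely on carefully drawn norming samples and more elaborate scaling.

```python
def percentile_rank(score, norm_sample):
    """Return the percentage of the norm sample scoring strictly below `score`."""
    if not norm_sample:
        raise ValueError("norm_sample must not be empty")
    below = sum(1 for s in norm_sample if s < score)
    return 100.0 * below / len(norm_sample)


# Hypothetical norming sample of raw scores (illustrative data only).
norm_sample = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]

rank = percentile_rank(26, norm_sample)
print(f"A raw score of 26 is above {rank:.0f}% of the norm sample")
```

The result depends entirely on who is in the sample: the same raw score would earn a different rank against a stronger or weaker group of peers.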

Criterion-Referenced Tests (CRT)

A criterion-referenced test (CRT) is a style of test which uses test scores to show how well test takers performed on a given task, not how well they performed compared to other test takers. Criterion-referenced score interpretations compare test takers to a criterion (a formal definition of content), regardless of the scores of other examinees. These may also be described as standards-based assessments, as they are aligned with the standards-based education reform movement. Criterion-referenced score interpretations are concerned solely with whether or not this particular student's answer is correct and complete. Most tests and quizzes that are written by school teachers are criterion-referenced tests. In this case, the objective is simply to see whether the test taker can answer the questions correctly.
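
By contrast, a criterion-referenced interpretation can be sketched as a comparison against a fixed standard. In the minimal Python example below, the 80 percent cut score, the function name `meets_criterion`, and the student data are all assumptions made for illustration; actual cut scores are set by whoever defines the standard.

```python
def meets_criterion(points_earned, points_possible, cut_percent=80):
    """Return True if the score reaches the cut score, ignoring other examinees."""
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return points_earned * 100 >= cut_percent * points_possible


# Hypothetical results on a 20-point test (illustrative data only).
scores = {"student_a": 18, "student_b": 15, "student_c": 16}
results = {name: meets_criterion(points, 20) for name, points in scores.items()}
print(results)
```

Each student's result is decided independently: every student could pass, or none could, because no one is being ranked against anyone else.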

Historical Overview of Standardized Testing

The concept of standardized testing is not new, with roots tracing back to ancient China.

Early Origins

The earliest evidence of standardized testing comes from China during the Han dynasty, where the imperial examinations covered the Six Arts: music, archery, horsemanship, arithmetic, writing, and knowledge of the rituals and ceremonies of both public and private life. Later, sections on military strategies, civil law, revenue and taxation, agriculture, and geography were added to the testing.

Western Adoption

Inspired by the Chinese use of standardized testing, British company managers in the early 19th century used standardized exams for hiring and promotions to keep the process fair and free from corruption or favoritism. This practice was adopted in mainland Britain in the late 19th century. Standardized testing spread from Britain not only throughout the British Commonwealth but also to Europe and then America. Its spread was fueled by the Industrial Revolution: the increase in the number of school students as a result of compulsory education laws decreased the use of open-ended assessments, which were harder to mass-produce and assess objectively.

Prior to this adoption, standardized testing was not traditionally a part of Western pedagogy. Based on the skeptical and open-ended tradition of debate inherited from Ancient Greece, Western academia favored non-standardized assessments using essays written by students. Because of this, the first European implementation of standardized testing did not occur in Europe proper, but in British India.

20th Century Developments

During World War I, the Army Alpha and Beta tests were developed to help place new recruits in appropriate assignments based upon their assessed intelligence levels. The first edition of a modern standardized test for IQ, the Stanford-Binet Intelligence Test, appeared in 1916. The College Board then designed the SAT (Scholastic Aptitude Test) in 1926. Individual states began testing large numbers of children and teenagers through the public school systems in the 1970s. The need for the federal government to make meaningful comparisons across a highly decentralized (locally controlled) public education system encouraged the use of large-scale standardized testing.

Federal Legislation

The Elementary and Secondary Education Act of 1965 required some standardized testing in public schools. The No Child Left Behind Act of 2001 further tied some types of public school funding to the results of standardized testing. Under these federal laws, the school curriculum was still set by each state, but the federal government required states to assess how well schools and teachers were teaching the state-chosen material with standardized tests. The results of large-scale standardized tests were used to allocate funds and other resources to schools and to close poorly performing schools.

International Examples

Colombia has several standardized tests that assess the level of education in the country. Students in third grade, fifth grade and ninth grade take the "Saber 3°5°9°" exam. Upon leaving high school, students take the "Saber 11" exam, which allows them to enter different universities in the country. In Australia, all students in Years 3, 5, 7 and 9 are assessed using national tests. Canada leaves education, and standardized testing as a result, under the jurisdiction of the provinces. Young adults in Poland sit for their Matura exams.

Advantages and Disadvantages of Standardized Testing

Standardized tests, while widely used, are subject to ongoing debate regarding their effectiveness and fairness.

Advantages

Objectivity: Standardized tests aim to provide an objective measure of student performance, reducing the potential for bias in grading. The standardization ensures that all of the students are being tested equally.
Comparability: They allow for comparisons between individuals, schools, and districts, providing insights into relative performance.
Accountability: Standardized tests can hold schools and educators accountable for educational results and student performance.
Identification of Gaps: They can help identify gaps in student learning and academic progress.
Resource Allocation: Test results can be used to allocate funds and resources to schools based on performance.

Disadvantages

Narrow Focus: Critics argue that standardized tests can only evaluate a narrow range of achievement using inherently limited methods.
Potential for Bias: Subjective human judgment enters into the testing process at various stages, for example in the selection and presentation of questions, or in the subject matter and phrasing of both questions and answers. Some critics argue that standardized tests are culturally and socially biased.
Teaching to the Test: Educators may focus on teaching to the test, rather than supporting students’ critical-thinking skills.
Stress and Anxiety: The high stakes associated with standardized tests can lead to stress and anxiety for students and teachers.
Misleading Indicators: Critics question whether numerical scores on a standardized test are reliable indicators of student learning, since a score reflects only the narrow range of achievement that the test is able to measure.

Validity and Reliability

Validity and reliability are typically viewed as essential elements for determining the quality of any standardized test. In the field of psychometrics, the Standards for Educational and Psychological Testing set standards for validity and reliability, along with errors of measurement and issues related to the accommodation of individuals with disabilities. Validity refers to the accuracy of the test in measuring what it is intended to measure, while reliability refers to the consistency of the test results.
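
One common way to put a number on consistency is to correlate scores from two sittings of the same test. The Python sketch below computes a Pearson correlation for that purpose; treating reliability as a test-retest correlation is an assumption made here for illustration (it is one of several approaches), and the function name `pearson_r` and the scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for the same six test takers on two sittings.
first_sitting = [52, 61, 70, 75, 83, 90]
second_sitting = [55, 60, 72, 73, 85, 88]

print(f"Test-retest correlation: {pearson_r(first_sitting, second_sitting):.2f}")
```

A value near 1 suggests that test takers keep roughly the same relative standing across sittings. It says nothing about validity: a test can be highly consistent and still measure the wrong thing.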

Scoring and Grading

Standardized tests have a consistent, uniform method for scoring. This means that all test takers who answer a test question in the same way will get the same score for that question. Since the latter part of the 20th century, large-scale standardized testing has been shaped in part by the ease and low cost of grading multiple-choice tests by computer. Human graders score items that cannot easily be scored by computer (such as essays).
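
The uniform-scoring rule can be sketched in a few lines of Python. The answer key, question labels, and function name `score_responses` below are hypothetical; the point is only that the score depends on the responses and the key, never on who the test taker is or who does the grading.

```python
# Hypothetical answer key for a three-item multiple-choice test.
ANSWER_KEY = {"q1": "B", "q2": "D", "q3": "A"}


def score_responses(responses, answer_key=ANSWER_KEY):
    """Award one point per question whose response matches the key."""
    return sum(
        1 for question, correct in answer_key.items()
        if responses.get(question) == correct
    )


# Two test takers who give identical answers receive identical scores.
taker_1 = {"q1": "B", "q2": "C", "q3": "A"}
taker_2 = {"q1": "B", "q2": "C", "q3": "A"}

print(score_responses(taker_1), score_responses(taker_2))
```

Unanswered questions simply earn no points in this sketch; real tests may define other rules, such as penalties for wrong answers.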

Human Scoring

Human scoring is relatively expensive and often variable, which is why computer scoring is preferred when feasible. For example, some critics say that poorly paid employees will score tests badly. Agreement between scorers can vary between 60 and 85 percent, depending on the test and the scoring session.
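
Agreement between scorers can be quantified in several ways. The Python sketch below uses the simplest one, exact percent agreement between two raters; the source does not say which measure underlies the quoted range, so this choice, the function name `percent_agreement`, and the essay scores are all assumptions for illustration.

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of items on which two raters gave exactly the same score."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both raters must score the same number of items")
    if not scores_a:
        raise ValueError("at least one scored item is required")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)


# Hypothetical rubric scores (1 to 5) from two raters on the same ten essays.
rater_1 = [4, 3, 5, 2, 4, 3, 1, 5, 4, 2]
rater_2 = [4, 3, 4, 2, 4, 2, 1, 5, 3, 2]

print(f"Exact agreement: {percent_agreement(rater_1, rater_2):.0f}%")
```

Exact agreement is easy to read but does not correct for agreement that would occur by chance, which is why chance-corrected statistics are often reported alongside it.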

Computer Scoring

Though the process is more difficult than grading multiple-choice tests electronically, essays can also be graded by computer. In other instances, essays and other open-ended responses are graded according to a pre-determined assessment rubric by trained graders. Using a rubric is meant to increase fairness when the test taker's performance is evaluated.

Measurement Error and Bias

In standardized testing, measurement error (a consistent pattern of errors and biases in scoring the test) is comparatively easy to detect. Standardized tests also reduce grader bias in assessment. In non-standardized assessment, graders have more individual discretion and are therefore more likely to produce unfair results through unconscious bias.

Standardized Testing Today

Proponents of testing have continued to argue that the government has a responsibility to ensure that educational funding is given to schools with the greatest need, and that the government must rely on some testing procedure to ensure that federal funding is being effectively used. The modern era of standardized testing is linked to the No Child Left Behind policy developed under President George W. Bush and implemented in 2002. Whereas previous legislation supported and funded assessment tests, NCLB made standardized testing a requirement in certain grades. In addition, all states were required to include district test results in their funding requests.

After the COVID-19 pandemic began to spread globally in early 2020, daily school operations were significantly disrupted by measures put in place to bring the increasingly transmissible virus under control. The pandemic also accelerated a shift, already underway at some universities, away from using standardized tests such as the SAT as a key factor in college admissions in the US.
