Intelligence Testing: Past and Present

Welcome to this blog post about the history of intelligence testing. If the numbers associated with IQ tests are a mystery to you, or you are anxious about how results might be used, read on.

After defining intelligence testing, we will review the historical problems associated with such testing, and the challenges of designing a useful and fair assessment of intelligence.

Intelligence testing is a process that aims to quantify intelligence.  As simple as this may sound, defining “intelligence” is a complex matter, and debate continues around whether intelligence is a unitary ability, or the aggregate of several distinct abilities.

Intelligence testing aims to measure one’s underlying mental ability, rather than achievement (as in academic achievement), and to report these results in a useful manner.   The term “intelligence quotient” (or IQ) used in such testing is meant to reflect one’s overall mental ability.

IQ was first thought of as referring to a subject’s “mental age”, based on test results, as compared to results documented for their actual age group. Modern tests retain this procedure of reporting scores relative to one’s age group, although the metric of “mental age” is no longer used. Rather, modern scores are reported in terms of where they fall in the distribution of scores for the sample representing the subject’s peer group. For example, scores for 12-year-olds are compared to those of other 12-year-olds.

IQ scores are “scaled”, or reported in reference to the mean and standard deviation of peers in the general population. The mean standardized score typically used to report IQ results is 100 (50th percentile, Average range), and other scores are scaled based on their distance from that mean, measured in units of standard deviation. IQ scores follow a “normal” distribution, in which the mean and the median coincide, and are scaled so that a score one full standard deviation from the mean corresponds to 15 IQ points higher or lower than 100. By definition this means that about two-thirds (roughly 68%) of the population have IQ scores between 85 and 115, and about 95% of all people have scores between 70 and 130. That said, let us explore the history of IQ testing.
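Before turning to that history, the scaling described above can be made concrete with a short sketch. It is only an illustration: the function names, the example raw score, and the peer-group mean and standard deviation are all hypothetical, and published tests convert raw scores using norm tables rather than a single formula like this one. The only values taken from this post are the mean of 100 and the standard deviation of 15.

```python
from statistics import NormalDist

# Scale constants taken from the text: mean 100, standard deviation 15.
IQ_MEAN = 100.0
IQ_SD = 15.0


def raw_to_iq(raw_score, peer_mean, peer_sd):
    """Convert a raw score to the IQ scale (hypothetical helper).

    peer_mean and peer_sd describe the raw scores of the subject's
    age group; the values used below are invented for illustration.
    """
    z = (raw_score - peer_mean) / peer_sd  # distance from the mean in SD units
    return IQ_MEAN + IQ_SD * z


def iq_percentile(iq):
    """Percentage of a normally distributed population scoring at or below iq."""
    return 100.0 * NormalDist(IQ_MEAN, IQ_SD).cdf(iq)


def share_between(low, high):
    """Fraction of a normally distributed population scoring between low and high."""
    dist = NormalDist(IQ_MEAN, IQ_SD)
    return dist.cdf(high) - dist.cdf(low)


if __name__ == "__main__":
    # A made-up raw score one standard deviation above a made-up peer mean.
    print(raw_to_iq(34, peer_mean=30, peer_sd=4))
    print(round(iq_percentile(100), 1))
    print(round(share_between(85, 115), 3))
    print(round(share_between(70, 130), 3))
```

Under the assumption of a perfectly normal distribution, the last two calls give roughly 0.68 and 0.95, which is where the "about two-thirds" and "95%" figures come from.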

The History of Intelligence Testing

Humans have endeavored for centuries to understand the nature of intellect (our basic abilities to focus, use language, reason through words and pictures, etc.). Alfred Binet, a French psychologist, began his study of intelligence in the tradition of the nineteenth century, with craniometry. Craniometry is essentially the idea that an individual’s mental capabilities are related to, and measurable by, the size of their cranium (skull). Binet himself, however, cast doubt on the ultimate validity of his craniometric investigations. He recognized that even the barely significant relationship he found between intelligence and cranial volume was most likely due to his own biases, as he conducted the testing and measured the skulls of subjects with full awareness of his own hypothesis.

Binet turned instead to psychological measures, eventually designing the first real intelligence test. His test consisted of an array of carefully constructed tasks, designed to “measure specific and independent ‘faculties’ of mind” with the goal of assessing innate capabilities rather than learned skills (cf. Gould, 1996). Binet developed his test to help identify children whose “mental age” was lower than their chronological age, so they could be given special help in school. Binet felt that a single number (such as overall IQ) could not properly represent a person’s intellectual capabilities. Although he explicitly discouraged the use of his scale to quantify intelligence as if it were a single capacity, this did not stop others from doing precisely that.

H.H. Goddard served as the director of research at the “Vineland Training School for Feeble-Minded Girls and Boys.” Goddard translated the works of Binet, and used the latter’s scale to “recognize limits, segregate, and curtail breeding [of ‘morons’] to prevent further deterioration of an American stock, threatened by immigration from without and by prolific reproduction of its feeble-minded within” (Gould, 1996). Readers of this blog can decide for themselves whether Goddard’s perspective was rooted in naive ignorance, social Darwinism, racism, or some combination of these, but its influence is still felt. Goddard defined “morons” as adults whose scores reflected a mental age between 8 and 12 years, and he attributed low intelligence entirely to genetic inferiority (Gould, 1996). He did not acknowledge the extent to which low scores might have been the result of pre- or perinatal complications, psychosocial trauma, improper test administration or interpretation, or other such factors. His logic led him to conclude that the “moron” must be institutionalized and forbidden to reproduce.

In 1912, Goddard arrived at Ellis Island to test new immigrants. He assigned female associates to use their ‘intuition’ to identify and test feeble-minded individuals (Gould, 1996). Resulting scores were often very low. While the results could be partly explained by the harshness specific to Goddard’s translation and interpretation of the Binet test, it is likely that the conditions (hurried and biased testing of frightened immigrants, frequently administered in English to non-English speakers) skewed the results considerably. Goddard’s “findings” and those of other eugenicists were influential in legal efforts to tighten standards for U.S. immigration, specifically to keep out the ‘mentally deficient’, ostensibly to protect the American gene pool.

Goddard also studied a community of paupers and “ne’er-do-wells”, and explained their lives based on a story about their ancestor, whom he called Martin Kallikak (198). The story went that Martin Kallikak had unlawfully married a feeble-minded woman, with whom he had feeble-minded children, and later married a “worthy Quakeress”, with whom he produced respectable offspring (Gould, 1996). In the books in which Goddard wrote about the Kallikaks, it is evident that the pictures of the ‘feeble-minded’ individuals were retouched with dark marks around their mouths and eyes, to give them a “diabolical” appearance (Gould, 1996).

Like Goddard, some who came after him grossly misused Binet’s scale, and their own versions of intelligence testing.  Many tests relied on familiarity with American culture, putting foreigners at a disadvantage. Testing was frequently held in environments that were not conducive to test-taking.  One overall flaw in the work of Goddard, and some who followed him, was the assumption that intelligence was a single, stable, hereditary trait.

As the chief of psychology at Bellevue Psychiatric Hospital in 1932, David Wechsler had experience with the testing frequently administered at the “mental hygiene clinic” to a patient population of varying backgrounds and needs. This experience inspired his vision for a more accessible, standardized and statistical test (cf. Boake, 2002). Pulling largely from the content of existing tests, but revising the scale and scoring methods, Wechsler published the Wechsler-Bellevue Intelligence Scale in 1939, altering the course of intelligence testing in the United States. The Wechsler Intelligence Scale for Children (WISC) followed in 1949, the Wechsler Adult Intelligence Scale (WAIS) in 1955, and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI) in 1967; each of these has been revised several times since.

It is now generally accepted that intelligence is the result of both nature and nurture. Unlike the intelligence testers of the past, we understand that there is a degree of plasticity or fluidity to many domains of intelligence. Once difficulties have been identified, recommendations can be made to increase the client’s ability to function in his or her environment. Some problems may be solved by interventions that build capacity, while other challenges may be overcome by learning compensatory skills. Additionally, modern testing identifies both strengths and weaknesses.

Standardization, efforts to remove culturally specific test content, and efforts to ensure uniform testing conditions have increased the validity of modern IQ testing. Yet as history shows, it is very difficult, if not impossible, to design a test that is equally fair and accessible for people from all walks of life. Due to these well-known limitations, it is important that all test results be interpreted by an expert who is aware of the shortcomings of the tests, and who brings this knowledge and a sensitivity to individual differences to interpreting the results and writing the test report.

Overall, intelligence testing can be a useful tool for identifying both an individual’s mental strengths and difficulties, so that appropriate steps can be taken towards helping the individual, and improving their quality of life.

Leslie Sachs, Metrowest Neuropsychology, 2011 Summer Intern, with Dr. Jeff Gaines, Metrowest Clinical Director.

Works Cited
Boake, C. (2002). From the Binet-Simon to the Wechsler-Bellevue: Tracing the history of intelligence testing. Journal of Clinical and Experimental Neuropsychology, 24(3), 383-405. doi:10.1076/jcen.24.3.383.981
Gould, S. J. (1996). The mismeasure of man. New York: W. W. Norton.