
The Flynn Effect: Why Average IQ Scores Have Been Rising for a Century

If you took a 1950s IQ test today, you would almost certainly score higher than your grandparents did. Much higher. Across most of the twentieth century, average raw scores on standardized intelligence tests rose at a remarkably steady clip — roughly three points per decade — in country after country, on test after test. The pattern has been documented on every inhabited continent where long-run testing data exists, and it is now considered one of the most consequential findings in the history of psychometrics.

Does that mean we are getting smarter than our ancestors? The honest answer is more interesting than a simple yes or no. The phenomenon is real, the data is robust, and the debate over what it actually means is still very much alive. This is the story of the Flynn Effect — what it is, how we discovered it, why it probably is not what it looks like, and why it matters every time someone sits down to take an IQ test today.

Meet the Flynn Effect

The Flynn Effect is the observation that average raw scores on standardized intelligence tests have been rising, generation after generation, for most of the twentieth century. The rate is surprisingly consistent: roughly three IQ points per decade, which works out to about a full standard deviation — fifteen points — over fifty years. The trend shows up on Wechsler scales, Stanford-Binet batteries, Raven's Progressive Matrices, and the other major psychometric instruments, and it has been observed across many different countries with different languages, school systems, and economies.

The phenomenon is named after James R. Flynn (1934-2020), an American-born political scientist who spent most of his career at the University of Otago in New Zealand and who documented the pattern in a series of papers in the 1980s. Flynn was not the first person to notice that test scores were drifting upward — test publishers had been quietly adjusting their norms for decades — but he was the first to stitch the evidence together into a coherent, international picture and insist that the field take it seriously.

How We Know Scores Are Rising

Here is the paradox at the heart of the story: if average IQ is always 100 by definition, how can anyone say it has gone up?

The answer lies in how IQ tests are scored. An IQ score is not a measurement of a fixed quantity the way a temperature reading is. It is a ranking relative to a reference group. When a test is first published, the authors give it to a large, carefully selected sample — the "standardization sample" — and then rescale the raw number of correct answers so that the sample's average becomes 100 and the standard deviation becomes 15. Every future test-taker's score is calculated against those fixed anchors. If you want the full explanation of the normal-curve math behind this, our guide to how IQ scoring and the bell curve work walks through it step by step.
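To make that rescaling concrete, here is a minimal sketch in Python. The raw-score mean and standard deviation below are invented for illustration; real standardization samples involve thousands of test-takers and age-stratified norms.

```python
# Convert a raw score (number of correct answers) into an IQ score by
# rescaling against a standardization sample. The sample statistics
# below are made-up numbers, purely for illustration.
SAMPLE_MEAN_RAW = 34.0  # hypothetical mean raw score of the norming sample
SAMPLE_SD_RAW = 6.0     # hypothetical standard deviation of raw scores

def raw_to_iq(raw_score: float) -> float:
    """Rescale a raw score so the norming sample has mean 100, SD 15."""
    z = (raw_score - SAMPLE_MEAN_RAW) / SAMPLE_SD_RAW  # standard (z) score
    return 100 + 15 * z

print(raw_to_iq(34))  # a raw score at the sample mean maps to exactly 100.0
print(raw_to_iq(40))  # one sample SD above the mean maps to 115.0
```

Notice that the function says nothing about ability in any absolute sense: change the reference sample and the same raw score maps to a different IQ, which is exactly the mechanism behind the Flynn Effect.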

Because the scoring is pinned to the reference sample, any rise in underlying ability is invisible at first glance — the new mean is still 100 by construction. The rise only becomes visible in two situations. The first is when researchers go back and compare raw scores across decades: the same number of correct answers that would have placed you at the average in 1947 might place you below average in 2007. The second is when test publishers issue a new edition and re-norm the test against a fresh standardization sample. When they do, they typically discover that the new sample outperforms the old one: a level of performance that earned a 100 under the old norms now falls below the new average. Test-takers measured against the new norms end up with lower scores than they would have gotten on the old version, even though their actual cognitive performance has not changed.

That re-norming drift is exactly what Flynn pieced together. By collecting the data that publishers had been generating for their own internal purposes and lining it up against older editions, he could estimate how much the average had been moving over time. The answer — about three points per decade — was far larger than anyone had expected.
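The drift estimate itself is simple arithmetic: score the same group of test-takers against both an old edition's norms and a new edition's, and divide the gap between the two mean scores by the years separating the two standardization samples. A sketch, with illustrative numbers rather than figures from any real re-norming study:

```python
# Estimate Flynn-style norm drift from a re-norming comparison.
# All numbers in the example call are illustrative, not real data.
def drift_per_decade(mean_on_old_norms: float, mean_on_new_norms: float,
                     old_norm_year: int, new_norm_year: int) -> float:
    """Points gained per decade, inferred from scoring the same
    group of test-takers against two sets of norms."""
    gap_points = mean_on_old_norms - mean_on_new_norms
    gap_years = new_norm_year - old_norm_year
    return 10 * gap_points / gap_years

# A hypothetical group averaging 100.0 on fresh 1972 norms
# scores 107.5 against norms set in 1947:
print(drift_per_decade(107.5, 100.0, 1947, 1972))  # -> 3.0 points per decade
```

This is essentially the calculation Flynn ran, at scale, across every pair of test editions he could find data for.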

James Flynn: The Political Scientist Who Noticed

James Flynn was not a traditional cognitive psychologist. He was a political philosopher by training, based in New Zealand, and he came to psychometrics through a broader interest in the debate around intelligence testing and social policy. What started as a skeptical look at the literature turned into a career-defining finding.

In a 1984 paper, Flynn reported that white Americans had been gaining on standardized IQ tests for decades. Three years later, in a 1987 follow-up titled "Massive IQ gains in 14 nations: What IQ tests really measure," he expanded the analysis to more than a dozen countries and showed the same pattern everywhere he looked. That second paper turned a curiosity into a field-defining phenomenon. In their 1994 book The Bell Curve, Richard Herrnstein and Charles Murray named it the "Flynn Effect," and the label stuck.

Flynn himself was surprised by his own discovery. He was a careful skeptic by temperament, and he expected to find either no trend or a small one. Finding an increase this large — the equivalent of a full standard deviation over half a century — forced him to rethink what the tests were actually measuring. His conclusion, which he returned to repeatedly in later books, was that the tests were picking up something real, but not quite the same thing most people meant by "intelligence."

The Seven Proposed Explanations

Why have scores been rising? No single cause has ever won the argument, and most researchers now think the Flynn Effect is a product of several overlapping forces rather than one dominant driver. The leading candidates, roughly in order of how often they get cited, are:

  • Better nutrition. Particularly in the first half of the twentieth century, childhood diets improved dramatically across the industrialized world. Adequate calories, protein, iodine, iron, and other micronutrients in early development are known to support brain growth, and it is hard to believe the nutritional revolution of the last century had no effect on cognitive performance.
  • Expanded schooling. Average years of formal education climbed sharply in most countries over the same period. School teaches the specific skills that IQ tests reward — abstract reasoning, multiple-choice problem solving, sitting still and focusing on written material, thinking about hypotheticals. More school means more practice at exactly the sort of tasks the tests measure.
  • Smaller family sizes. As birth rates fell, the average child grew up with more parental attention, more conversation, and more cognitive stimulation per person in the household. This has been proposed as a contributor to gains in verbal and reasoning skills in particular.
  • Greater environmental complexity. Flynn himself placed heavy weight on this one. Over the twentieth century, daily life became saturated with abstract symbols, diagrams, maps, menus, schedules, and visual puzzles — the kind of thinking that matches what IQ tests demand. Urban life, industrial work, mass media, and eventually computers all pushed people to manipulate abstract categories constantly.
  • Test familiarity. Modern test-takers are used to the multiple-choice format, the time-pressured sections, and the style of trick questions that psychometric tests throw at them. Earlier generations often encountered this format for the first time in the testing room itself, which would have depressed their scores regardless of underlying ability.
  • Reduced infectious disease burden. Childhood infections — parasitic, bacterial, and viral — can impair cognitive development by diverting energy that would otherwise go to the brain. Vaccination, sanitation, antibiotics, and cleaner water dramatically reduced that burden over the twentieth century, and some researchers argue this is a major underappreciated driver of the gains.
  • Better prenatal care. Improvements in maternal health, obstetric practice, and care for premature infants mean that more children now start life in good cognitive shape, rather than losing ground before they are even born.

None of these explanations is fully satisfying on its own. Better nutrition cannot explain why the gains continued long after childhood diets in wealthy countries had stabilized. More schooling cannot explain why the biggest gains have come on tests that do not rely on school content. Environmental complexity is hard to measure and harder to prove. The honest answer is that the Flynn Effect is probably the joint signature of many different twentieth-century changes all pushing in the same direction at once.

Why Fluid Intelligence Gained More

One of the most striking features of the Flynn Effect is that the gains are not evenly spread across every kind of test. The biggest rises have come on measures of fluid intelligence — tests of abstract reasoning, pattern recognition, and novel problem-solving that deliberately try to minimize the role of learned knowledge. The classic example is Raven's Progressive Matrices, where test-takers look at visual patterns and work out which piece completes the sequence. On Raven's-style tests, the generational gains have been enormous.

On measures of crystallized intelligence — vocabulary, general knowledge, arithmetic facts — the gains are much smaller, and in some subtests they are negligible. If anything, vocabulary scores have been surprisingly flat in some datasets, which is not what you would expect if the Flynn Effect were simply a story about more education or more reading.

That uneven pattern is a serious clue. Fluid-reasoning tests were originally designed to be as culture-free as possible — to measure pure on-the-spot reasoning rather than learned content. For many decades, psychologists treated them as the cleanest window into raw cognitive ability. If anything, Raven's-style tests were supposed to be the most resistant to environmental influence. That the biggest gains have come there is one of the strangest facts about the Flynn Effect, and it has pushed researchers to reconsider what these "culture-fair" tests are really measuring — a topic our guide to culture-fair IQ tests takes up in detail.

What Flynn Himself Thought

Flynn was remarkably careful about what he believed his own data showed. He did not think the rising scores proved that people today are simply smarter than their grandparents. He thought the tests were picking up a genuine change, but a change in how people approach cognitive problems rather than a change in raw mental horsepower.

His favorite way of explaining this was to imagine asking someone from a pre-industrial rural society what a dog and a rabbit have in common. A modern schoolchild will almost always answer "they are both mammals" or "they are both animals" — a classification into abstract categories. A farmer from a century ago might answer "you use dogs to catch rabbits." Both answers are correct. The farmer's answer is arguably more practical and more grounded in lived experience. But only one of them is what an IQ test scores as "right."

For Flynn, the twentieth century taught people to think more in abstract categories, to manipulate hypotheticals ("if all dogs were purple and..."), and to treat logic as something you apply to symbols rather than to concrete situations. Schools drilled this habit of mind, scientific culture reinforced it, and modern work demanded it. IQ tests had been designed from the beginning to reward exactly that style of thinking. So as people got more practice at it, their scores went up — but their underlying cognitive capacity was not necessarily any greater than their great-grandparents'. Those earlier generations, Flynn insisted, were not dumber. They were just asking different questions about the world.

The Reversal

For most of its history, the Flynn Effect looked like a one-way march. Then, starting in the 2000s, researchers in several Northern European countries began reporting something strange: the gains had stalled, and in some places the trend had actually reversed. Work by Bratsberg and Rogeberg on Norwegian military conscript data, published in 2018, showed a clear decline in average scores among more recent cohorts. Similar patterns have been reported in several other Northern European populations since then.

This "negative Flynn Effect" or "reverse Flynn Effect" is now an active and unsettled area of research. Several different explanations have been proposed, and it is fair to say none of them has won consensus:

  • Changing educational priorities — curricula that shifted away from the kinds of abstract reasoning tasks that IQ tests reward.
  • Digital media habits — less sustained reading, more short-form content, different attention patterns across a generation.
  • Environmental factors — from diet quality and sleep patterns to possible exposures to endocrine disruptors.
  • Demographic and methodological changes — shifts in who gets tested, how tests are administered, and whether the samples being compared are really comparable across decades.
  • Fertility-pattern arguments — a contentious line of reasoning that has attracted more controversy than evidence.

It is important to keep the scale honest. The reported declines are modest, they are concentrated in a small number of countries, and the broader global picture is much less clear. In many places the data is too thin to say whether the Flynn Effect is still running, plateauing, or reversing. This is a live debate, not a settled story, and the next decade of data will matter a lot.

Why This Matters for IQ Tests Today

The Flynn Effect is not just a historical curiosity. It has direct practical consequences for anyone taking an IQ test in the twenty-first century.

The most important consequence is that IQ tests must be periodically re-normed. If a test is scored against a standardization sample from the 1970s but administered to a test-taker in the 2020s, the resulting score will be inflated relative to the actual population. That is why major instruments like the Wechsler scales go through regular revisions — WAIS, WAIS-R, WAIS-III, WAIS-IV, and onwards — each anchored to a fresh sample. A score of 100 on one edition does not necessarily correspond to a score of 100 on the next. Someone who scored in the "average" range on an older test could easily land below average on a freshly normed one without any change in underlying ability.

This has real stakes. Clinical diagnoses that depend on IQ thresholds — for learning disabilities, giftedness, or intellectual-disability determinations — can flip based on which edition of a test is used. High-IQ societies like Mensa have had to adjust their qualifying percentiles as tests have been re-normed. Historical comparisons of cohorts, countries, or age groups that naively compare old and new scores without accounting for norm drift can be badly misleading. Understanding which edition you took and when it was normed is part of understanding what your score actually means, as our comparison of the major IQ tests explains in detail.

What It Doesn't Mean

Because the Flynn Effect sounds dramatic, it is often dragged into arguments where it does not belong. A few things worth being careful about:

It doesn't mean our ancestors were cognitively disabled. Naively extrapolating the three-points-per-decade figure backwards makes it look like people in 1900 would have had average IQs of around 70 — the threshold modern tests use for intellectual disability. That conclusion is obviously absurd, and the absurdity is itself evidence that raw IQ scores are not measuring a stable, universal quantity. People in 1900 built bridges, ran businesses, wrote books, and navigated complex lives. What changed is not their mental capacity but the kinds of problems our tests reward.

It doesn't mean future generations will be geniuses. Extrapolating forward is just as misleading as extrapolating back. The trend has slowed or reversed in some populations, and even where it is still running there is no reason to think it will continue forever. Historical trends do not guarantee future ones.

It doesn't mean IQ tests are worthless. Within a given cohort at a given moment, well-constructed IQ tests remain among the most reliable and best-validated instruments psychology has ever produced. They correlate with academic and occupational outcomes, they have stable internal properties, and they measure something real. The Flynn Effect complicates historical and cross-generational comparisons. It does not invalidate the within-cohort ranking that a properly normed test provides.

It doesn't settle the nature of intelligence. The Flynn Effect has been used to argue for every imaginable position on the nature-nurture debate, and none of those arguments is clean. The effect tells us that something environmental matters — the gains are too fast to be explained by genetic change — but it does not tell us exactly what, how much, or in which direction the causal arrows point. It is a clue, not a conclusion.

Try Our IQ Tests

All of this is why understanding the context of an IQ score matters almost as much as the number itself. A score means something different depending on which test you took, when the test was normed, and which population you are being compared to. Our free online IQ test is scored against a modern Wechsler-style distribution — mean 100, standard deviation 15 — and uses the kinds of abstract-reasoning tasks that today's psychometric research recognizes as core to fluid intelligence. It reflects where cognitive norms sit right now, not where they sat in 1955.

If you are curious how you would compare against a contemporary reference sample — and what your score actually means once you understand the Flynn Effect, the bell curve, and a century of psychometric history — our test is a good starting point.

Try Our IQ Test

Take a free online IQ test with 18 timed questions across pattern recognition, number sequences, verbal analogies, and logical reasoning. Get your estimated IQ score, percentile rank, bell curve visualization, and score comparison across Wechsler, Stanford-Binet, and Cattell scales.
