
Raven's Progressive Matrices: The 1936 Test That Still Works

In 1936, a British psychologist named John Carlyle Raven introduced a test that no one expected would still be in routine clinical and research use ninety years later. It had no words. It had no numbers. It asked the taker to look at a grid of abstract shapes with one piece missing and choose the correct completion from a set of candidates. That was it. That was the whole test.

Most psychological instruments from the 1930s are museum pieces today — replaced, retired, or quietly forgotten. Raven's Progressive Matrices is not. It is still published, still administered, still cited in research papers, and still considered one of the purest measures of general intelligence ever devised. The question worth asking is why. Why does a test built from black-and-white squares and triangles continue to outlive the more elaborate assessments that came after it?

Who Was John C. Raven?

John Carlyle Raven (1902–1970) was a British psychologist whose career coincided with one of the most turbulent periods in the history of intelligence testing. When he started graduate work in psychology in the early 1930s, the field was dominated by verbally loaded instruments imported from the United States and France — descendants of the Binet-Simon scale and its American revision, the Stanford-Binet. These tests relied heavily on vocabulary, cultural knowledge, and verbal comprehension, which meant that anyone with limited schooling, limited English, or a hearing impairment was at a significant disadvantage before the test even began.

Raven was influenced by Charles Spearman's theory of a general intelligence factor — what Spearman called g — which, Spearman argued, underlay performance across virtually every kind of mental test. If Spearman was right, then in principle you should be able to measure intelligence without words at all, so long as you could design a task that demanded the same kind of abstract reasoning that verbal tests were trying to tap. That was the project Raven took on: build a nonverbal instrument that could measure g without relying on language or cultural knowledge. The result, which appeared in 1936, was the Progressive Matrices.

Raven continued working on and refining the test for the rest of his life, producing the three main versions that are still in use today. After his death, his family and collaborators kept the instrument in publication, and it eventually passed to Pearson Assessment, which publishes it in its current editions.

The Core Idea: Visual Pattern Completion

The structure of a Raven's item is simple enough to describe in a single sentence, which is part of what makes the test so portable. The taker is shown a matrix — usually a three-by-three grid of abstract visual patterns — with the bottom-right cell left blank. Below or beside the matrix sits a panel of candidate pieces. The taker's job is to figure out which piece belongs in the missing cell.

What makes this deceptively difficult is that the relationship between the cells can follow any number of rules, and the taker has to discover which rule is operating before they can pick the answer. A single matrix might test whether you notice that shapes are rotating by a fixed amount as you move across a row. The next might add and subtract features as you move down a column. The next might alternate between two different transformations. Later items combine multiple rules simultaneously, so that the missing cell has to satisfy a rotation rule and a color or shading rule and a counting rule at the same time.
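To make the idea of "discovering the rule" concrete, here is a minimal sketch in Python. It is purely illustrative — real Raven's items are drawings, not numbers, and the item content is proprietary. Here each cell is reduced to a single rotation angle in degrees, and the assumed rule is "each step across a row rotates the shape by a fixed amount"; the `grid`, `candidates`, and function names are all invented for the example.

```python
# Illustrative sketch only: cells are represented as rotation angles
# (degrees), and the hypothesized rule is a constant rotation per step
# across each row.

def infer_rotation_step(row):
    """Return the constant rotation step across a row, or None if the
    differences between adjacent cells are not all equal."""
    steps = {(b - a) % 360 for a, b in zip(row, row[1:])}
    return steps.pop() if len(steps) == 1 else None

def solve_matrix(grid, candidates):
    """grid: 3x3 list of angles with grid[2][2] unknown (None).
    Pick the candidate that completes the rotation rule."""
    # Learn the step from the two complete rows.
    steps = {infer_rotation_step(row) for row in grid[:2]}
    if len(steps) != 1 or None in steps:
        return None  # rows disagree: some other rule must be operating
    step = steps.pop()
    target = (grid[2][1] + step) % 360
    return next((c for c in candidates if c == target), None)

grid = [
    [0, 45, 90],
    [90, 135, 180],
    [180, 225, None],  # missing bottom-right cell
]
print(solve_matrix(grid, candidates=[90, 180, 270, 315]))  # → 270
```

A harder item would require stacking several such checks — rotation, shading, and a count of features — and accepting only the candidate that satisfies all of them at once, which is exactly what makes the later items demanding.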

Because the content is purely visual, the taker does not need to read instructions in any particular language to understand what is being asked. A demonstration with two or three worked examples is enough for anyone who can see the page to understand the task. That has made the test unusually portable across languages, cultures, education levels, and age groups — which, in turn, is a large part of why it has stayed in use for so long.

The Three Versions: SPM, CPM, APM

Raven did not build one test. He built a family of three, each targeting a different population and ability level. All three share the same underlying task — complete the matrix — but they differ in difficulty, item count, presentation, and norming.

  • Standard Progressive Matrices (SPM) — The original and most widely used version, designed for the general adult population. The SPM contains 60 items divided into five sets (labeled A through E), with each set becoming progressively harder. It is the version most people are referring to when they say "Raven's Matrices" without qualification.
  • Coloured Progressive Matrices (CPM) — A shorter, easier version designed for children roughly aged 5 to 11, for elderly adults, and for individuals with intellectual disabilities or cognitive impairments. The items are printed on a colored background to help keep younger and older takers engaged, and the difficulty ceiling is lower than the SPM.
  • Advanced Progressive Matrices (APM) — A harder version built for high-ability adolescents and adults, used in selective academic, professional, and occupational contexts where the SPM ceiling is too low to discriminate among top performers. The items in the APM are more complex, requiring takers to track several rules at once.

Between them, these three versions cover ages five through adulthood and ability levels from significant intellectual disability through near-ceiling giftedness. That range is one reason clinicians have continued to reach for the instrument for ninety years: they can assess a five-year-old and a PhD candidate using the same underlying task, with the same theoretical grounding, just on different forms.

Progressive Difficulty

The word "Progressive" in the test's name is not decorative. It describes the defining feature of how items are arranged: they start simple and get systematically harder. The first item in the SPM's first set is almost trivial — anyone who understands the instructions will solve it in a few seconds. By the end of the fifth set, the items are challenging enough that only a small fraction of the adult population can solve them at all, and many of those who can still make errors.

That ordering is not an accident of test construction — it is a design choice with a specific purpose. A test that starts with easy items and builds up lets every taker demonstrate some competence, which matters both for motivation and for measurement. The taker's score is effectively a record of how far up the difficulty ladder they were able to climb before their accuracy collapsed. That ceiling is a more accurate estimate of reasoning capacity than you would get from a test where every item is the same difficulty, because it measures where the taker's ability actually runs out rather than just how many average-difficulty items they happened to get right.
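The "ceiling on the difficulty ladder" idea can be sketched in a few lines of Python. To be clear, this is not the official Raven's scoring — the published test is simply scored as the total number of correct answers — it is only a hypothetical illustration of where accuracy collapses on a list of pass/fail responses ordered from easiest item to hardest; the function name and parameters are invented for the example.

```python
# Hypothetical illustration of the difficulty-ladder idea, NOT the
# official Raven's scoring (which just counts correct answers).

def ability_ceiling(responses, window=4, threshold=0.5):
    """Given pass/fail responses ordered easy -> hard, return the index
    of the first item where accuracy over the next `window` items drops
    below `threshold`; returns len(responses) if it never collapses."""
    for i in range(len(responses) - window + 1):
        chunk = responses[i:i + window]
        if sum(chunk) / window < threshold:
            return i
    return len(responses)

# 1 = correct, 0 = wrong, items ordered from easiest to hardest
responses = [1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(ability_ceiling(responses))  # → 8
```

The point of the sketch is the contrast the paragraph above draws: two takers with the same raw total can have very different collapse points, and it is the collapse point, not the total alone, that tracks where ability runs out.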

It also means that people at very different ability levels can take the same test without the experience feeling lopsided. A taker with modest abstract reasoning skills will find the early items manageable and the late items hard but interpretable. A taker with very high abstract reasoning skills will breeze through the early items and find the late ones genuinely challenging. Both walk away having encountered items that were informative about their ability — which is a remarkably hard design goal to achieve, and one that a uniform-difficulty test simply cannot.

Fluid vs Crystallized Intelligence

To understand what Raven's test is actually measuring, it helps to know the distinction that psychometricians draw between two broad kinds of intelligence: fluid and crystallized. The terms come from Raymond Cattell, writing in the decades after Raven's test appeared, but they neatly describe the theoretical territory Raven was aiming at.

Fluid intelligence, often written as Gf, is the ability to reason about novel problems — to notice patterns you have never seen before, hold multiple relationships in mind at once, and deduce rules from incomplete information. It is the kind of intelligence you would need if you were handed a game with rules you had to figure out as you went. Crystallized intelligence, or Gc, is the stock of knowledge, vocabulary, facts, and well-practiced procedures you have accumulated over a lifetime of learning. Crystallized intelligence is what a vocabulary test, a general knowledge quiz, or a verbal comprehension subtest taps into.

Raven's Matrices is widely regarded as one of the purest available measures of fluid intelligence, and by extension one of the purest measures of Spearman's g. The test gives the taker no vocabulary to rely on, no facts to recall, no procedures to apply. Every item is a novel problem that has to be solved from scratch using the raw ability to see patterns. That is fluid intelligence in something close to its pure form.

The Wechsler scales and the Stanford-Binet, by contrast, measure a broader blend of fluid and crystallized abilities. They include vocabulary, general information, arithmetic, and similarities subtests alongside more Raven-like matrix reasoning tasks. That breadth is useful when you want a comprehensive cognitive profile, but it also means the final IQ score mixes together genuinely different things. Raven's, by narrowing its focus, gets you a cleaner read on one specific thing.

Why It's Still Used 90 Years Later

A lot of psychological tests from the 1930s have been retired, revised beyond recognition, or quietly dropped from the field's standard toolkit. Raven's has not. When you look at the reasons, most of them come back to a small set of properties that the test happens to have in unusual combination.

  • It is almost entirely nonverbal. A test that can be administered without a shared language is enormously valuable in a multilingual clinical practice, in cross-cultural research, with deaf takers, with young children who have not yet developed verbal fluency, and with anyone whose primary language differs from the examiner's.
  • It correlates strongly with general intelligence. Among tests that are simple enough to administer in an afternoon, Raven's is consistently near the top of the pack in terms of its loading on Spearman's g. It is a remarkably efficient way to get a lot of information about fluid reasoning ability.
  • It is well-normed. Decades of research have produced norms from many countries and populations, which makes it one of the more useful tools for cross-cultural comparative work and for interpreting individual scores against a relevant reference group.
  • It is easy to administer. There is no scripted verbal patter, no complicated setup, and minimal equipment beyond the test booklet or screen. An examiner can be trained to administer it in a short time, and there is relatively little room for examiner effects to contaminate the score.
  • It does not depend on specific educational content. A student who was taught algebra is not advantaged on a Raven's item the way they would be on a math subtest. The information the taker needs is contained entirely within the matrix on the page.

The test has found its way into clinical assessment for intellectual disability and giftedness, educational placement decisions, military selection (where nonverbal reasoning tests have a long history), research on cognitive development and aging, and occupational assessment for roles where abstract reasoning is central to the job.

The Culture-Fair Debate

Raven's Matrices is often described as a "culture-fair" test, sometimes even a "culture-free" test. The first label is defensible with qualifications. The second is not, and it is worth understanding why.

Compared with a vocabulary or general knowledge test, Raven's is dramatically less culturally loaded. It does not ask the taker to know what a particular word means, or to have heard of a particular historical figure, or to have encountered a particular fable. In that sense, it strips away many of the most obvious cultural advantages that more traditional IQ tests confer on takers from the culture that built them. If you are comparing how a French adult and an English adult score on a vocabulary test administered in English, the comparison is not meaningful. If you are comparing how they score on Raven's, it is at least much closer to being meaningful.

But researchers who have looked closely have repeatedly pointed out that the test is not free of cultural influence — just more subtle about it. Exposure to abstract visual symbols in print, familiarity with the conventions of formal testing (sitting quietly, selecting one answer from several, working down a numbered list), and the implicit expectation that patterns mean something and can be deciphered are all things that formal schooling tends to cultivate. A taker who has spent twelve years in a classroom has had extensive practice with the exact kind of symbolic, decontextualized reasoning Raven's rewards. A taker who has not had that experience is approaching the task somewhat cold, even if the specific content is novel for both.

The honest summary is that Raven's is more culture-fair than the alternatives that existed when it was designed, and more culture-fair than most of the instruments that came after it, but it is not culture-free in an absolute sense. That is a meaningful distinction for researchers doing cross-cultural comparisons and clinicians working with takers from backgrounds very different from the norming sample — not a reason to discard the test, but a reason to interpret its results with the same care any psychological measurement deserves.

Timed vs Untimed

The original Raven's protocol was relatively generous with time. Raven was interested in measuring reasoning ability, not processing speed, and he designed the test so that most takers could finish without feeling rushed. Clinical administration to this day is often untimed, or only loosely timed, for exactly this reason — if the point is to find out what someone can figure out, there is no obvious benefit to cutting them off at the moment they are about to do so.

That said, timed variants exist, and they measure something real and related but distinct. When you put a stopwatch on matrix reasoning, the score you get is no longer a pure measure of reasoning capacity — it now also reflects how quickly the taker can apply that capacity under pressure. Processing speed is itself a well-studied dimension of cognitive ability, and it correlates with fluid intelligence without being identical to it. A taker with excellent reasoning ability but a methodical, deliberate style can score lower on a timed matrix test than on an untimed one, even though both tests are tapping similar underlying abilities.

Neither protocol is automatically "correct." The right choice depends on what you are trying to learn. If you want a ceiling estimate of pure pattern reasoning, untimed administration is the classical approach. If you want a measure that is closer to real-world exam conditions — where time pressure is real and unavoidable — a timed variant gives you a more relevant score.

Try Our Raven's-Style Tests

We offer two free tests inspired by the Raven's format. Neither is the actual Standard, Coloured, or Advanced Progressive Matrices — those are copyrighted instruments published by Pearson Assessment and administered by qualified examiners. Our tests use pattern-only items drawn from our question bank, twelve questions per sitting, built to give a sense of what nonverbal matrix reasoning feels like and to provide a rough estimate of ability in that specific domain.

The untimed Raven's-Style Matrix Test is the one to take if you want a clean measurement of pure pattern reasoning. It has no time limit on any question, so you can sit with a difficult matrix for as long as you need, working through possible rules and testing candidate answers before you commit. This is the closest our format gets to the original Raven's philosophy of giving the taker enough room to think.

The timed Raven's-Style Matrix Test uses the same twelve-question pattern-only format but imposes a strict sixty-second limit on every question. That changes the experience considerably. You are no longer just testing whether you can solve a matrix — you are testing whether you can solve it before the clock runs out. It is a closer simulation of how cognitive assessments are often actually administered in real selection and admissions contexts, where the test is both a reasoning task and a speed task at once.

Taking both, in sequence, is a useful exercise for anyone curious about their own profile. If your score drops noticeably under time pressure, you know that processing speed is a bigger factor for you than raw reasoning capacity. If it holds up, you know the opposite. Either is informative, and neither makes you worse at the underlying skill — it just tells you something about how you deploy it.

Try Our IQ Test

Take a free online IQ test with 18 timed questions across pattern recognition, number sequences, verbal analogies, and logical reasoning. Get your estimated IQ score, percentile rank, bell curve visualization, and score comparison across Wechsler, Stanford-Binet, and Cattell scales.
