## Simpson’s Paradox Definition

Simpson’s paradox is a statistical phenomenon in which the total data set shows one trend while subsets of the data show the opposite trend or none at all. It goes by many names, among them the Yule-Simpson effect, the reversal paradox, and the amalgamation effect. This statistical illusion comes from ignoring or overlooking causal relations. It often appears in medical or social science studies of large populations whose members differ from one another in a way significant to the study. If a significant variable is overlooked, the results of the analysis are skewed. When these causal relations and variables are fully accounted for, however, the false trend can be eliminated from the analysis.

## History of Simpson’s Paradox

Though Simpson’s paradox takes its name from the 1951 work of Edward H. Simpson, it had been noted by statisticians years before. One study that brought the phenomenon to academic attention was conducted by Ernest Nagel and Morris Cohen in 1934. They were looking into the 1910 death rates due to tuberculosis in two cities: New York and Richmond. They found that the death rate was lower for African Americans in Richmond than for those in New York in that year. Similarly, the death rate for Caucasians was lower in Richmond than in New York. However, when the two populations were combined, the death rate due to tuberculosis was higher in Richmond than in New York.

This was only the most recent of several studies to draw attention to the matter. Simpson’s paradox had been recognized in a 1903 study performed by Udny Yule and another by Karl Pearson in 1899. Statisticians were left with a quandary. At the time, they preferred to avoid any talk of causal relations as unscientific, or at least unsuited to scientific inquiry and the construction of theories. However, this paradox demonstrated that certain causal factors could not be ignored if the statistical analysis was to be valid.

Finally, in 1951, Simpson published a technical paper that explored this phenomenon in the studies mentioned above. In addition to careful mathematical analysis of the original data, Simpson included clear illustrations of the phenomenon and discussions in easily accessible terms. It was this clear and simple description that brought the phenomenon into the realm of common knowledge, popularizing the paradox and laying the groundwork for avoiding it in later studies. In 1972, more than two decades later, Colin Blyth coined the term Simpson’s paradox.

Several later works explored Simpson’s paradox further. One key work was published by Martin Gardner in 1976. This article, published in the Mathematical Games column of *Scientific American*, discussed the dangers of misapplied statistics and brought the phenomenon to the attention of the layperson. Next, in 1979, Nancy Cartwright applied the phenomenon to the modern understanding of science. Cartwright emphasized the need to take causal factors and relations into account when constructing scientific theories. This was significant in that it turned the tables on previous attitudes regarding causal relations in scientific inquiry.

## Simpson’s Paradox Example

One reason Simpson’s paradox is important is that it illustrates the misjudgments and erroneous assumptions that can arise from mishandling data. A classic and highly publicized case involved the University of California, Berkeley, in 1973. Berkeley was suspected of gender discrimination in its graduate admissions. In this instance, the total number of male applicants was compared with the number admitted for the fall of 1973. The same comparison was made between female applicants and females admitted. The numbers showed that 44% of men were admitted, compared with only 35% of women. Though this may seem to be a small difference, from a statistical standpoint it is too large to be attributed to chance.

However, the data look very different when the applicants are broken down by department:

After assessing the admission percentages by department, it is apparent that admissions for the fall of 1973 showed a small (yet statistically significant) bias in favor of women. In this instance, the mishandling of the data misrepresented Berkeley’s admissions practices over that period and could have resulted in a major discrimination lawsuit. And yet, the discrimination was nothing more than a statistical illusion. As with all instances of Simpson’s paradox, it was caused by ignoring a significant variable. In this case, the variable was the department to which people were applying.
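The reversal in the Berkeley case can be reproduced with a minimal sketch. The two departments and all counts below are hypothetical, chosen only to show the pattern; they are not the Berkeley figures. Women have the higher acceptance rate in each department, yet the lower rate overall, because most women apply to the department that accepts few applicants.

```python
# Hypothetical data: (applicants, accepted) per department and gender.
# These are illustrative numbers, NOT the actual Berkeley figures.
data = {
    "Dept X": {"men": (800, 480), "women": (100, 65)},
    "Dept Y": {"men": (200, 20), "women": (900, 108)},
}


def rate(applicants, accepted):
    """Acceptance rate for a single group."""
    return accepted / applicants


def aggregate_rate(data, group):
    """Acceptance rate for one gender with all departments pooled."""
    total_applicants = sum(dept[group][0] for dept in data.values())
    total_accepted = sum(dept[group][1] for dept in data.values())
    return total_accepted / total_applicants


# Per-department rates: women are ahead in both departments.
for name, dept in data.items():
    print(name, {g: round(rate(*dept[g]), 3) for g in dept})

# Pooled rates: men are ahead once the departments are combined.
print("Overall", {g: round(aggregate_rate(data, g), 3) for g in ("men", "women")})
```

The department acts as the overlooked variable: the pooled rate is a weighted average of the department rates, and the weights differ sharply between men and women.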

Though this case is compelling in itself, it is but one of several that have come to public attention. Simpson’s paradox occurs frequently in medical trials, where statistics are of supreme importance. They can show the relative efficacy of a certain medicine or treatment and lead to decisions on medical policy. In one real-life example from the field, the relative efficacy of two treatments was compared for patients with kidney stones. For this case, it is important to remember that cases differ in severity. In less severe cases, the stones are smaller, while in more severe cases, the stones are larger.

The case in question was a comparison between open surgery and percutaneous nephrolithotomy. The latter involves a small puncture rather than a major surgery. When the success rate was examined for large and small stones combined, it appeared that percutaneous nephrolithotomy was more effective. However, when the population was divided by severity, into patients with larger stones and patients with smaller stones, the statistics clearly demonstrated that open surgery was more successful in both cases. The following table gives the actual figures for this study:

If Simpson’s paradox had been ignored in this study, a less effective form of treatment would have been adopted for the treatment of kidney stones. While this may seem a small thing, it is measured in lives and quality of life for the patients involved.
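The arithmetic behind the kidney stone case can be checked with a short sketch. The counts below are the figures commonly quoted for this study (Charig et al., 1986); they are supplied here as an assumption, since the table is not reproduced in the text. In these figures, open surgery has the higher success rate within each stone-size group, while percutaneous nephrolithotomy has the higher rate when the groups are pooled.

```python
# (successes, patients) by treatment and stone size.
# Assumption: figures as commonly quoted from Charig et al. (1986).
outcomes = {
    "open surgery": {"small": (81, 87), "large": (192, 263)},
    "percutaneous nephrolithotomy": {"small": (234, 270), "large": (55, 80)},
}


def success_rate(successes, patients):
    """Success rate for one treatment within one stone-size group."""
    return successes / patients


def overall_rate(groups):
    """Success rate for one treatment with both stone sizes pooled."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return successes / patients


for treatment, groups in outcomes.items():
    by_size = {size: f"{success_rate(*counts):.0%}" for size, counts in groups.items()}
    print(treatment, by_size, f"overall {overall_rate(groups):.0%}")
```

The pooled comparison is distorted by who received which treatment: open surgery was used mostly on large stones (263 of 350 patients), which are harder to treat, while percutaneous nephrolithotomy was used mostly on small stones (270 of 350 patients).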

## Challenges of Simpson’s Paradox

There are numerous challenges regarding Simpson’s paradox, not least that it runs contrary to surface logic and intuition. We expect that, if a trend holds for the total population, it should hold for each subpopulation as well. Are we to regard the aggregated total or the partitioned results as the “correct” result? Unfortunately, there is no straight answer to this. The key is that, for the statistics to be accurate, the population must be divided according to the significant variable(s). If the data set is divided in one way, the results may be very different than if it is divided according to a different variable. Furthermore, there is no end to the divisions available, and there is no absolute way to know which variable is significant in any given situation.

To make this clear with an example, consider the kidney stone trial described above. Instead of dividing the population by severity, we might divide it by sex or by age group. We could even divide it by something seemingly arbitrary, such as eye color, blood type, or ethnicity. Each way of grouping the sample might give a different result, and each might be the significant variable that provides accurate results in a given situation. Because of this, we are forced to wonder which results to follow. In some instances, the aggregated total is more accurate than the partitioned results.
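The point that different partitions can tell different stories can be sketched as follows. Everything here is hypothetical: the treatments `A` and `B`, the patient counts, and the `severity` and `eye_color` fields are invented for illustration and are not taken from any study. The helper `success_rates` is likewise an assumption of this sketch; it groups the same records by whichever variable is named.

```python
from collections import defaultdict


def make_patients(n, successes, treatment, severity):
    """Build n hypothetical patient records, the first `successes` of them cured.
    Eye color simply alternates, so it is unrelated to treatment or severity."""
    return [
        {
            "treatment": treatment,
            "severity": severity,
            "eye_color": "brown" if i % 2 == 0 else "blue",
            "success": 1 if i < successes else 0,
        }
        for i in range(n)
    ]


def success_rates(records, split_key=None):
    """Return {(stratum, treatment): success rate}.
    With split_key=None, all patients fall into the single stratum "all"."""
    totals = defaultdict(lambda: [0, 0])  # [successes, patients]
    for r in records:
        stratum = r[split_key] if split_key else "all"
        cell = totals[(stratum, r["treatment"])]
        cell[0] += r["success"]
        cell[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Hypothetical trial: A is given mostly to severe cases, B mostly to mild ones.
records = (
    make_patients(10, 9, "A", "mild")
    + make_patients(40, 24, "A", "severe")
    + make_patients(40, 32, "B", "mild")
    + make_patients(10, 5, "B", "severe")
)

for split in (None, "severity", "eye_color"):
    print(split, dict(sorted(success_rates(records, split).items())))
```

In this constructed data, treatment B looks better in the pooled totals and within each eye color, while treatment A looks better within each severity level. The code cannot say which partition to trust; that judgment depends on knowing that severity, unlike eye color, influenced both the choice of treatment and the outcome.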

## Final Words

Simpson’s paradox highlights the challenge of causal relations and causal inference. Hume’s work demonstrated that all inference is based on one thing happening before another. It is a feature of our perception that infers causation from association. This is why scientists in the 19th century avoided causal inference in scientific inquiry. And yet, if we were to ignore cause, our thinking and decision making would be crippled. We would be ineffective at handling the situations we encounter from one day to the next. However, the real-life cases described above show that there is a danger in being ignorant of causal relations. In the end, Simpson’s paradox encourages us as thinkers and scientists to be careful when identifying the significant variables of any situation.
