Transcript
The night of the elections, there was, of course, the one big surprise. But there was also an exit poll finding that some pundits found surprising.
The narrative had always been that Trump supporters were angry working-class voters, who were mad as hell about income inequality and globalization.
[“I’m as mad as hell and I’m not going to take this any more!”]
So poorer voters should have been more likely to vote for Trump, right?
Yet exit polls showed that nationwide, among voters with annual family incomes of $50,000 or more, 48% voted Trump. But among voters with incomes below $50,000, only 41% did so! In the three decisive states, the story was the same: richer voters were more likely to vote Trump.
How do we reconcile these exit poll data with the “angry working-class” narrative?
It turns out that these data probably involve Simpson’s Paradox, a statistical fallacy named in honor of this guy. Just kidding. It was named after some boring statistician.
Coulrophobia is the fear of clowns. We have 10,000 coulrophobic patients. A new drug claims to cure coulrophobia. 6,700 take it, while the remaining 3,300 do not.
Among drug-takers, 49% are cured. In contrast, among non-drug-takers, only 36% are cured. A higher percentage of drug-takers are cured. And so we conclude that the drug works!
Mr. Boring Statistician now comes along. “Hold your horses! You forgot to control for weight!” Of the 6,700 drug-takers, 200 were overweight and 6,500 were not. Of the 3,300 non-drug-takers, 1,800 were overweight and 1,500 were not.
Among overweight patients, 5% of drug-takers and 7% of non-drug-takers were cured. Among non-overweight patients, 50% of drug-takers and 70% of non-drug-takers were cured. Within each weight category, those who take the drug were less likely to be cured. So we conclude that the drug doesn’t work.
Which is the exact opposite conclusion from before! Altogether then, if we know whether the patient is overweight, she should not take the drug. But if we don't know, she should! This absurdity is Simpson's Paradox.
If this seems unbelievable, you can pause here and check for yourself that the numbers add up correctly. You can use this spreadsheet. Again, the surprising paradox here is that the drug works in the aggregate data, but not in the disaggregate data.
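If you'd rather check the arithmetic with code than with the spreadsheet, here is a minimal Python sketch. The group sizes are the ones above; the cured counts are derived from the stated percentages.

```python
# The transcript's numbers, as (patients, cured) per group. Cured counts are
# derived from the stated percentages (5%, 50%, 7%, 70%).
data = {
    ("drug",    "overweight"):     (200,  10),    # 5% cured
    ("drug",    "not overweight"): (6500, 3250),  # 50% cured
    ("no drug", "overweight"):     (1800, 126),   # 7% cured
    ("no drug", "not overweight"): (1500, 1050),  # 70% cured
}

def rate(*groups):
    patients = sum(data[g][0] for g in groups)
    cured = sum(data[g][1] for g in groups)
    return cured / patients

# Aggregate data: drug-takers are cured more often (about 49% vs 36%)...
agg_drug = rate(("drug", "overweight"), ("drug", "not overweight"))
agg_none = rate(("no drug", "overweight"), ("no drug", "not overweight"))
assert agg_drug > agg_none

# ...yet within EACH weight category, drug-takers are cured less often.
assert rate(("drug", "overweight")) < rate(("no drug", "overweight"))
assert rate(("drug", "not overweight")) < rate(("no drug", "not overweight"))
```

All three assertions pass: the drug "works" in the aggregate and "fails" in each stratum.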
Simpson’s Paradox is itself actually an example of two broader fallacies: the fallacy of division and the fallacy of composition. What is true of the whole is not necessarily true of the parts. And what is true of the parts is not necessarily true of the whole.
Gender Discrimination Example
For the next example, I’ll reuse the exact same numbers, but in a different context.
A small university has just two graduate departments. In total, 6,700 men and 3,300 women apply for admission. The university admits 49% of men but only 36% of women. It thus appears as if the university is biased in favor of men.
But when we look at each individual department, the bias is now in favor of women! The English department admits 7% of women but only 5% of men. The Math department admits 70% of women but only 50% of men.
And so both genders have grounds for discrimination lawsuits, depending on which data they look at!
Double Simpson’s Paradox
In case your head isn’t spinning yet, let’s try a double Simpson’s Paradox. Even if you’ve learnt about the paradox before, this may be new to you. Continuing with the same grad school example, say we look also at whether applicants are blonde.
We can partition applicants into four categories: Blonde Math applicants, non-blonde Math applicants, blonde English applicants, and non-blonde English applicants. It is entirely possible that a higher percentage of men are accepted, in all four of these categories!
Altogether then, when looking at the university as a whole, men are more likely to be admitted. But when looking at each of the two departments, women are more likely to be admitted. And when looking at each of the four blonde-department categories, men are once again more likely to be admitted. This is an example of a double Simpson’s Paradox.
Again, there is absolutely no trick here. You should pause and take as long as you need to study and be amazed by these numbers.
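The transcript does not give cell-level counts for the double paradox, so the numbers below are invented for illustration. They are chosen to reproduce the per-department rates above (English: 5% of men, 7% of women; Math: 50% of men, 70% of women) while exhibiting all three reversals at once.

```python
# Hypothetical (applied, admitted) counts for each (department, hair, gender)
# cell. Invented for illustration; they match the transcript's per-department
# admission rates and produce a double Simpson's Paradox.
cells = {
    ("Math",    "blonde",     "M"): (300, 270),  # 90%
    ("Math",    "blonde",     "F"): (80,  68),   # 85%
    ("Math",    "non-blonde", "M"): (600, 180),  # 30%
    ("Math",    "non-blonde", "F"): (20,  2),    # 10%
    ("English", "blonde",     "M"): (20,  4),    # 20%
    ("English", "blonde",     "F"): (600, 60),   # 10%
    ("English", "non-blonde", "M"): (80,  1),    # 1.25%
    ("English", "non-blonde", "F"): (300, 3),    # 1%
}

def rate(gender, dept=None, hair=None):
    applied = admitted = 0
    for (d, h, g), (a, x) in cells.items():
        if g == gender and dept in (None, d) and hair in (None, h):
            applied += a
            admitted += x
    return admitted / applied

# Level 1: in all four department-hair cells, men do better.
for d in ("Math", "English"):
    for h in ("blonde", "non-blonde"):
        assert rate("M", dept=d, hair=h) > rate("F", dept=d, hair=h)

# Level 2: within each department (pooling hair), women do better.
for d in ("Math", "English"):
    assert rate("F", dept=d) > rate("M", dept=d)

# Level 3: university-wide, men do better again.
assert rate("M") > rate("F")
```

The trick at each level is the same: the group that "loses" in every subgroup is concentrated in the subgroups with high admission rates.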
Simpson’s Paradox happens everywhere. There are plenty of real-world medical examples. A gender discrimination example happened at Berkeley.
Here’s one from basketball. In 2015, Mo Williams made 87.2% of his free throws and Kevin Durant made only 85.4%. In 2016, Williams made 90.5% of his free throws, while Durant made only 89.8%.
In each year, Williams made a higher percentage of his free throws than Durant. Yet paradoxically, when we add up the numbers across the two years, we find that Durant made a higher percentage! 88.6% vs only 87.8% for Williams!
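The transcript reports only percentages, not attempt counts, so the counts below are hypothetical, chosen so that each season's percentage matches. They illustrate one way the reversal can arise: Durant takes most of his attempts in the better year, Williams in the worse one.

```python
# Hypothetical (made, attempted) free-throw counts. The attempt volumes are
# invented; only the season percentages are taken from the transcript.
williams = {"2015": (327, 375),   # 87.2%
            "2016": (57, 63)}     # 90.5%
durant   = {"2015": (41, 48),     # 85.4%
            "2016": (485, 540)}   # 89.8%

def pct(made, att):
    return made / att

# Each year, Williams shoots the higher percentage...
for year in ("2015", "2016"):
    assert pct(*williams[year]) > pct(*durant[year])

# ...yet pooled across both years, Durant comes out ahead.
w_made, w_att = map(sum, zip(*williams.values()))
d_made, d_att = map(sum, zip(*durant.values()))
assert pct(d_made, d_att) > pct(w_made, w_att)
```

With these counts, most of Durant's attempts fall in the high-percentage year, which drags his pooled figure above Williams's.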
Simpson’s Paradox was already described in 1899. But surprisingly, even today, there still isn’t consensus as to how exactly to solve it.
And so in most textbooks and YouTube videos, Simpson’s Paradox is simply presented as a freaky anomaly, accompanied by some boilerplate warning: “Statistics can be misleading!” “Don’t jump to conclusions!” The discussion goes no further and this video would usually end right here. The paradox is left unresolved. In particular, when the aggregate and disaggregate data disagree, which should we use?
Fortunately, in the past couple of decades, one academic, Judea Pearl, has worked quite a bit on the paradox. And in 2014, he even wrote that “we can safely proclaim Simpson’s paradox ‘resolved.’”
His solution boils down to this: Come up with the correct causal model. Or more simply, come up with the correct story behind the data. Using this correct story, we can then decide which data to use.
Illustrating Pearl’s Solution to the Paradox
Let’s use the weight-drug example to illustrate. Suppose we believe the only three relevant variables are: whether the patient takes the drug, whether she’s overweight, and whether she’s cured. Then here are two possible stories we might consider.
In Story 1, the drug’s effect is purely harmful. Taking it simply makes patients more coulrophobic.
Next, for reasons unknown, non-overweight patients are more likely to take the drug.
Also, they are more likely to be cured, even without any help.
Altogether, it so happens that most drug-takers are non-overweight. But non-overweight folks are also more likely to be cured, even without any help. And so even if the drug has zero effect, the data will suggest that drug-takers are more likely to be cured. This is what statisticians call a spurious correlation.
In this story, weight is the lurking variable, the troll under the bridge. It affects both the likelihood of being cured and the likelihood of taking the drug. It thereby confounds the true effect of the drug.
We should therefore lock up the troll. In other words, control for weight. Which means the correct data to look at are the disaggregate data. And sure enough, these data reveal the drug’s purely harmful effect and patients should NOT take the drug.
In summary, if Story 1 is correct, then we should control for weight; we should use the disaggregate data; and patients should not take the drug.
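Story 1 is easy to simulate. In the Monte-Carlo sketch below, all the probabilities are invented; the only structural assumptions are the ones in the story: weight influences both drug-taking and cure, and the drug itself only hurts.

```python
import random

# Monte-Carlo sketch of Story 1 (weight is a confounder). All probabilities
# are invented for illustration.
random.seed(0)
N = 100_000
stats = {}  # (took_drug, overweight) -> (patients, cured)

for _ in range(N):
    overweight = random.random() < 0.4
    # Non-overweight patients are, for unknown reasons, likelier to take the drug.
    took = random.random() < (0.1 if overweight else 0.7)
    # Cure depends on weight; the drug only HURTS (subtracts 5 points).
    p_cure = (0.10 if overweight else 0.60) - (0.05 if took else 0.0)
    cured = random.random() < p_cure
    n, c = stats.get((took, overweight), (0, 0))
    stats[(took, overweight)] = (n + 1, c + cured)

def rate(took, overweight=None):
    picked = [v for k, v in stats.items()
              if k[0] == took and overweight in (None, k[1])]
    return sum(c for _, c in picked) / sum(n for n, _ in picked)

# Aggregate data: drug-takers look better (spurious correlation)...
assert rate(True) > rate(False)
# ...but within each weight stratum, the drug's harm shows.
assert rate(True, overweight=True) < rate(False, overweight=True)
assert rate(True, overweight=False) < rate(False, overweight=False)
```

Even though the drug is harmful by construction, the aggregate data make it look beneficial, exactly because most drug-takers are non-overweight and the non-overweight get cured more anyway.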
From Story 1 alone, it is tempting to conclude that whenever we’re faced with a Simpson’s Paradox, we should look at the disaggregate data. But this is not so. We can easily come up with a different but equally plausible story, Story 2, in which we should not control for weight; we should use the aggregate data; and patients should take the drug.
Here’s how Story 2 goes. Again, the drug’s effect is harmful. And again, the non-overweight are more likely to be cured, even without any help.
But this time, weight has no influence on the patient’s likelihood of taking the drug. All else equal, an overweight and a non-overweight patient are equally likely to take the drug.
Also, the drug now has the effect of reducing weight. Since the non-overweight are more likely to be cured, this means the drug actually helps to cure coulrophobia through the indirect mechanism of weight-reduction.
In this second story, weight is no troll. Instead, it is an important mechanism through which the drug works. And so if we control for weight, we’d be neglecting this important mechanism. We should therefore leave weight alone and not control for it. Which means the correct data to look at are the aggregate data.
Now, from the story alone, it is ambiguous whether the direct harmful effect or the indirect beneficial effect is larger. The net effect of the drug is therefore ambiguous.
This is where the data come in handy. What the aggregate data tell us is that the drug’s net effect is beneficial. And so patients should take the drug.
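Story 2 differs from Story 1 by a single arrow: weight no longer influences drug-taking; instead, the drug influences weight. A Monte-Carlo sketch (all probabilities invented) shows that flipping this one arrow flips which data set tells the truth.

```python
import random

# Monte-Carlo sketch of Story 2 (weight is a mediator, not a confounder).
# All probabilities are invented for illustration.
random.seed(1)
N = 100_000
stats = {}  # (took_drug, overweight) -> (patients, cured)

for _ in range(N):
    took = random.random() < 0.5  # weight does NOT influence drug-taking
    # The drug reduces weight: takers are rarely overweight.
    overweight = random.random() < (0.1 if took else 0.6)
    # Cure depends on weight; the drug's DIRECT effect is still harmful.
    p_cure = (0.10 if overweight else 0.60) - (0.05 if took else 0.0)
    cured = random.random() < p_cure
    n, c = stats.get((took, overweight), (0, 0))
    stats[(took, overweight)] = (n + 1, c + cured)

def rate(took, overweight=None):
    picked = [v for k, v in stats.items()
              if k[0] == took and overweight in (None, k[1])]
    return sum(c for _, c in picked) / sum(n for n, _ in picked)

# Within each weight stratum, the drug looks harmful (the direct effect)...
assert rate(True, overweight=True) < rate(False, overweight=True)
assert rate(True, overweight=False) < rate(False, overweight=False)
# ...but the aggregate data reveal the NET effect: takers are cured more often,
# because the weight-loss pathway outweighs the direct harm.
assert rate(True) > rate(False)
```

The observed data pattern is the same as in Story 1; only the causal story, and hence the right analysis, has changed. Here, controlling for weight would cut off the very mechanism through which the drug helps.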
Statistics is More Art Than Science
Story 1 and Story 2 give exact opposite conclusions. So which should we believe?
It is sometimes said that statistics is “more art than science.” Meaning that statistics involves subjective judgment calls, especially when deciding the correct story or causal model to believe. Now, “subjective” doesn’t mean “everyone’s entitled to their own opinion”.
Instead, we must use the best available evidence and theory, along with a good dose of wisdom and a pinch of intuition, to subjectively decide which story is better.
And so for example, if we already suspected that the drug reduces weight and that coulrophobia is linked to obesity, then we might prefer Story 2. But if instead there is no reason to believe the drug has any effect on weight, then we might prefer Story 1.
Back to the Election
Back to the election. Exit polls showed that richer voters were more likely to vote Trump. Some pundits were confused by these data. Some even considered these data to be proof that the “angry working-class” narrative was wrong.
Here’s a CNN interview done a few days after the election.
[Poppy Harlow: “Hillary Clinton won lower-income Americans. $50,000 incomes and below. And Donald Trump won, you know, middle-class and wealthier Americans. What is the Trump movement then? If it’s not necessarily about income inequality, what is it?”]
This journalist on the left may have been confused. But Old Man on the right wasn’t.
[Warren Buffett: “Well Trump won the $50,000-and-under white — particularly white male vote — big time. So that $50,000-and-under has a disproportionate number of minorities that went for Hillary. So you have to segment that further.”]
Old Man nailed it. We must control for race.
Researchers haven’t released more detailed data. So for now, here’s a fictitious example, just to illustrate how Simpson’s Paradox might have arisen. I’ll assume there are only two races — whites and blacks — and only two income categories — rich and poor. I’ll also reuse the same numbers from previous examples.
49% of rich voters voted Trump, but only 36% of poor voters did so. The aggregate data thus suggest that rich voters were more likely than poor voters to vote Trump.
But when we control for race, we have Simpson’s Paradox. 70% of poor whites voted for Trump, but only 50% of rich whites did so. 7% of poor blacks voted for Trump, but only 5% of rich blacks did so. And so the disaggregate data suggest that poor voters were more likely than rich voters to vote Trump. Which is the exact opposite conclusion from before!
So which data should we believe?
As before, we must come up with the correct story behind the data. And this time, one story stands out as being particularly plausible.
Unless you’re Michael Jackson, race is fixed at birth. And so it couldn’t possibly be that race depends on income or voting. Instead, if there are any links between race and income or voting, such links must flow from the former to the latter.
And we do know of two links. First, blacks are poorer. Second, blacks tend to vote against Republicans.
Altogether, it so happens that many of the poor are black. But blacks tend to vote against Republicans. And so the data will automatically suggest that the poor tend to vote against Republicans, even if income has no effect whatsoever on voting. Again: spurious correlation.
In this story, race is the troll under the bridge. We must “lock her up!”
We must control for race and so the correct data to look at are the disaggregate data. And these data are perfectly consistent with the “angry working-class” narrative — poorer voters were more likely to vote Trump.
Simpson’s Paradox. What is true of the overall population is not necessarily true of any individual subpopulation. Conversely, what is true of every individual subpopulation is not necessarily true of the overall population.
So which data should we use? The aggregate or the disaggregate data? The economist’s answer to every question is: “It depends.” Which also happens to be the answer here. Which data we use depends on what we believe to be the causal model or story behind the data.
Researchers haven’t released more detailed data. And until we have such data, we can’t be sure that Simpson’s Paradox actually arose in the 2016 election.
But even from the data currently available, what we can say is that there was a large swing in poorer voters and that this swing won Trump the Presidency.
The three decisive states were Michigan, Pennsylvania, and Wisconsin. Trump’s margin of victory in these three states was razor-thin. And had Hillary won these states, she would also have won the Presidency.
So let’s compare the election results of these three states for 2012 and 2016. In 2012, Romney was handily defeated in all three states.
[Table: Republican candidate’s share of the vote, by family income (< $50,000 vs ≥ $50,000), and Republican minus Democratic share of the total vote]
Among voters with family incomes of $50,000 or more, Romney’s shares of the vote were 51%, 56%, and 53%. In 2016, Trump did not improve on these numbers. Indeed, in Wisconsin and Pennsylvania, he did even worse than Romney! So how on earth did Trump win these states?
The answer is, of course, that there was a substantial swing in the poorer vote. Among voters with family incomes below $50,000, Romney won only 36%, 37%, and 31%. Trump improved on these numbers by 6 percentage points in Michigan, 8 in Wisconsin, and 11 in Pennsylvania. Had these swings in the poorer vote been just a tiny bit smaller, Trump would have lost.
There may not have been a dramatic working class revolution. But there was a substantial swing in the poorer vote and this was enough to win Trump the election.
Pearl, Glymour, and Jewell (2016, p. 2) go further by noting that “The reversals may continue indefinitely, switching back and forth as we consider more and more attributes.” That is, there can be triple, quadruple, etc. Simpson’s Paradoxes.
 Judea Pearl has written quite a few academic articles on Simpson’s Paradox. The 2014 piece (mentioned here) thoroughly explains and dissects the paradox. Pearl is also author of Causality: Models, Reasoning and Inference (2009, 2e) and co-author of Causal Inference in Statistics: A Primer (2016). The latter is a gentler introduction to his ideas on causality.