Simpson’s Paradox in the 2016 Election: Quick Follow-Up Transcript


In December, I speculated that Simpson’s Paradox may have been at work in the election. We can now back this up a little, using data from the American National Election Studies, or ANES.

Each presidential election, ANES interviews several thousand adult US citizens. ANES is hailed by some political scientists as “the gold standard among public opinion surveys”.[1]

The 2016 ANES study[2] had 4,271 respondents. We’ll keep only those respondents who reported their family income, reported their presidential vote,[3] and reported being either black or white.[4] This leaves us with a sample of 2,513 respondents. Of these, we have 1,425 rich whites, 811 poor whites, 107 rich blacks, and 170 poor blacks.[5]

53.4% of poor whites voted Trump. But only 50.1% of rich whites did so.

4.7% of poor blacks voted Trump. But only 3.7% of rich blacks did so.

Group   Income  Respondents  Voted Trump (%)  Voted Trump (#)
Whites  Rich          1,425            50.1%              714
Whites  Poor            811            53.4%              433
Blacks  Rich            107             3.7%                4
Blacks  Poor            170             4.7%                8
Total   Rich          1,532            46.9%              718
Total   Poor            981            45.0%              441

We see that poor whites were more likely than rich whites to vote Trump. Similarly, poor blacks were more likely than rich blacks to vote Trump.

Now. When we add up whites and blacks, something funny happens. 46.9% of rich whites and blacks voted Trump. But only 45.0% of poor whites and blacks did so.

This is Simpson’s Paradox. In each subgroup, the poor were more likely than the rich to vote Trump. But paradoxically, when aggregated, the rich were more likely than the poor to vote Trump.
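A worked decomposition using the counts in the table above makes the mechanism explicit. Each pooled rate is a weighted average of the white rate and the black rate, and the weights are the racial mix of that income group:

```latex
\begin{aligned}
P(\text{Trump} \mid \text{rich}) &= \underbrace{\tfrac{1425}{1532}}_{\approx 0.930}\cdot\tfrac{714}{1425} + \underbrace{\tfrac{107}{1532}}_{\approx 0.070}\cdot\tfrac{4}{107} = \tfrac{718}{1532} \approx 0.469 \\
P(\text{Trump} \mid \text{poor}) &= \underbrace{\tfrac{811}{981}}_{\approx 0.827}\cdot\tfrac{433}{811} + \underbrace{\tfrac{170}{981}}_{\approx 0.173}\cdot\tfrac{8}{170} = \tfrac{441}{981} \approx 0.450
\end{aligned}
```

Rich respondents are about 93% white, poor respondents only about 83% white, and whites voted Trump at a far higher rate than blacks. That difference in racial mix is large enough to outweigh the rich-poor gaps within each race.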

Endnote. All we’ve shown here is that in a sample of 2,513 respondents, Simpson’s Paradox was at work. Ideally, we’d like to extrapolate from this sample and say that Simpson’s Paradox was also at work in the actual voting population. But unfortunately, we can’t, because the sample is too small for these rich-poor differences to be statistically significant.

If you’re interested in more of the gory details, check out my Slightly More Detailed Analysis listed in the description below.

Footnotes

[1] Aldrich & McGraw, 2012, Chapter I, Improving Public Opinion Surveys, p. 3.

[2] 2016 ANES Time Series Study.

[3] I kept only those respondents who specified the candidate they voted for.

[4] I also dropped 44 respondents who were both white and black.

[5] As I did in the previous video, I simply label those with annual family incomes above $50K “rich”, and those with lower family incomes “poor”. I hope no one gets too annoyed by these labels.

Slightly More Detailed Analysis of the ANES and CCES Data

The American National Election Studies 2016 Time Series Study had 4,271 respondents. (3,090 of these were interviewed over the internet and the remaining 1,181 face-to-face.)

  • Out of 4,271, only 4,158 reported their family income.
  • Out of 4,158, only 2,749 voted for President AND reported their vote.
  • Out of 2,749, 277 self-identified as black (but not white) and 2,236 self-identified as white (but not black). (The 44 who self-identified as both black and white were excluded. The remaining 192 did not self-identify as being either black or white.)

This leaves us with 2,513 respondents and the following breakdown in Table 1 (which was shown in the video). Poor whites were more likely than rich whites to vote Trump. Likewise, poor blacks were more likely than rich blacks to vote Trump.

But paradoxically, when aggregated, rich whites and blacks were more likely than poor whites and blacks to vote Trump. This is Simpson’s Paradox.

Table 1.

Group   Income  Respondents  Voted Trump (%)  Voted Trump (#)
Whites  Rich          1,425            50.1%              714
Whites  Poor            811            53.4%              433
Blacks  Rich            107             3.7%                4
Blacks  Poor            170             4.7%                8
Total   Rich          1,532            46.9%              718
Total   Poor            981            45.0%              441
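The reversal in Table 1 can be checked directly from the counts. Below is a minimal sketch in standard-library Python; the function names are illustrative only and are not taken from the author's Stata scripts. Fractions keep the shares exact, so the comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction

# Unweighted ANES 2016 counts from Table 1: (respondents, voted Trump).
TABLE_1 = {
    ("white", "rich"): (1425, 714),
    ("white", "poor"): (811, 433),
    ("black", "rich"): (107, 4),
    ("black", "poor"): (170, 8),
}


def trump_share(n, trump):
    """Exact share of respondents who voted Trump."""
    return Fraction(trump, n)


def pooled_share(table, income):
    """Pool every race within one income group, then take the Trump share."""
    cells = [counts for (race, inc), counts in table.items() if inc == income]
    return trump_share(sum(n for n, _ in cells), sum(t for _, t in cells))


def shows_reversal(table):
    """True if poor > rich in every race, yet rich > poor once races are pooled."""
    races = {race for race, _ in table}
    poor_higher_within = all(
        trump_share(*table[(race, "poor")]) > trump_share(*table[(race, "rich")])
        for race in races
    )
    rich_higher_pooled = pooled_share(table, "rich") > pooled_share(table, "poor")
    return poor_higher_within and rich_higher_pooled


for income in ("rich", "poor"):
    print(f"{income}: {float(pooled_share(TABLE_1, income)):.1%}")
print("reversal present:", shows_reversal(TABLE_1))
```

Running it prints pooled shares of 46.9% (rich) and 45.0% (poor), matching the Total rows, and reports the reversal as present.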

The catch, though, is that ANES does not use a simple random sample. As stated in DeBell (2010), “How to Analyze ANES Survey Data” (PDF),

ANES data analysis should be done with weights for any analysis that is intended to generalize to the population, because unweighted ANES data do not represent the population as well as weighted data.

The above data are unweighted, meaning we cannot generalize from them to the population and say that 50.1% of rich whites, 53.4% of poor whites, etc. voted Trump. To make such generalizations, we must use the appropriate weights.
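For a simple proportion, "using the weights" means each respondent counts in proportion to their survey weight instead of counting once. A minimal sketch, assuming hypothetical toy respondents and weights (these are made-up values, not ANES data). A full survey analysis would also account for the sampling design when computing standard errors, which this sketch does not do.

```python
def weighted_share(voted_trump, weights):
    """Weighted proportion sum(w_i * y_i) / sum(w_i), where y_i = 1 marks a Trump vote."""
    if len(voted_trump) != len(weights):
        raise ValueError("need exactly one weight per respondent")
    return sum(w * y for y, w in zip(voted_trump, weights)) / sum(weights)


# Hypothetical toy sample (NOT ANES data): 1 = voted Trump, 0 = did not.
votes = [1, 0, 1, 0, 0]
weights = [0.5, 1.0, 2.0, 1.5, 1.0]  # hypothetical survey weights

print("unweighted:", sum(votes) / len(votes))  # 2 of 5 respondents
print("weighted:  ", weighted_share(votes, weights))  # 2.5 / 6.0
```

Here the two Trump voters carry weights 0.5 and 2.0 out of a total weight of 6.0, so the weighted share (about 41.7%) differs from the raw 40%.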

The appropriate weights to use are given on pp. iii-iv of “User’s Guide and Codebook for the ANES 2016 Time Series Study” (PDF). Once we use these weights, we get the following numbers, which are slightly different from the above, but again tell the Simpson’s Paradox story:

Table 2.

Group   Income  Voted Trump (%)
Whites  Rich              50.7%
Whites  Poor              53.7%
Blacks  Rich               4.4%
Blacks  Poor               6.7%
Total   Rich              47.0%
Total   Poor              44.0%

The weighted data tell us that poor whites were 3.0 percentage points more likely than rich whites to vote Trump (p = 0.264); poor blacks were 2.3 percentage points more likely than rich blacks to vote Trump (p = 0.590); and poor whites and blacks were 3.0 percentage points less likely than rich whites and blacks to vote Trump (p = 0.231). This again is Simpson’s Paradox.
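The replication instructions at the end say these differences come from regressions. One plausible reading, and it is only an assumption since the exact specification lives in ANES.txt, is a weighted linear probability model of a Trump-vote indicator on a "poor" indicator. With a single binary regressor, the weighted least-squares slope equals the poor-minus-rich difference in weighted shares. The toy check below uses hypothetical data and NumPy; it ignores the survey design, so it says nothing about the p-values.

```python
import numpy as np

# Hypothetical toy data: Trump vote (1/0), poor indicator (1/0), survey weight.
y = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)
poor = np.array([1, 1, 1, 0, 0, 0, 0, 1], dtype=float)
w = np.array([1.2, 0.8, 1.0, 0.5, 1.5, 2.0, 1.0, 0.9])

# Weighted least squares by rescaling rows with sqrt(w):
# minimizes sum_i w_i * (y_i - a - b * poor_i)^2.
X = np.column_stack([np.ones_like(poor), poor])
sqrt_w = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
intercept, slope = coef

# The same quantity computed directly as a difference in weighted shares.
share_poor = np.average(y[poor == 1], weights=w[poor == 1])
share_rich = np.average(y[poor == 0], weights=w[poor == 0])

print(f"WLS slope on poor:          {slope:+.4f}")
print(f"poor minus rich (weighted): {share_poor - share_rich:+.4f}")
```

The intercept likewise equals the weighted Trump share among the rich, so the regression is just a convenient way to get the difference together with a standard error.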

But unfortunately, none of these differences are statistically significant at the 5% level. Indeed, one huge drawback of ANES is its small sample size, which gives us very little statistical power.
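To get a feel for how little power a sample this size has, here is a rough two-proportion z-test on the unweighted Table 1 counts for whites. This is only an approximation: it ignores the weights and the complex sampling design, so it will not reproduce the p-values quoted above.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Unpooled two-sample z-test for p1 - p2; returns (difference, std. error, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, se, p_value


# Unweighted ANES whites (Table 1): poor 433 of 811, rich 714 of 1,425 voted Trump.
diff, se, p = two_proportion_z_test(433, 811, 714, 1425)
print(f"difference = {diff:.3f}, SE = {se:.3f}, p = {p:.2f}")
```

Even the white rich-poor gap of about 3 percentage points is smaller than 1.96 standard errors at this sample size, so it is not significant at the 5% level.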

CCES

The Cooperative Congressional Election Study (CCES) is the upstart rival to ANES. ANES began life in 1948 (as the Michigan Election Studies). In contrast, CCES began only in 2006.

CCES is conducted entirely online and is an opt-in survey. But CCES does have one huge advantage — a much larger sample size. The 2016 CCES had 64,600 respondents. That’s more than 15 times the size of the ANES sample.
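A back-of-the-envelope calculation shows what the larger sample buys. For a simple random sample, the standard error of an estimated proportion shrinks with the square root of the sample size. Neither survey is a simple random sample, so treat this as an approximation only:

```latex
\mathrm{SE}(\hat p) = \sqrt{\frac{p(1-p)}{n}}, \qquad
\frac{\mathrm{SE}_{\text{CCES}}}{\mathrm{SE}_{\text{ANES}}} \approx \sqrt{\frac{4{,}271}{64{,}600}} \approx \frac{1}{\sqrt{15.1}} \approx 0.26
```

So, other things equal, CCES standard errors should be roughly a quarter the size of ANES standard errors.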

We can repeat much the same analysis with the CCES data. As with ANES, the CCES data need to be appropriately weighted.

As before, let’s start with the unweighted CCES data:

Table 3.

Group   Income  Respondents  Voted Trump (%)  Voted Trump (#)
Whites  Rich         18,785            45.8%            8,600
Whites  Poor         12,152            48.3%            5,872
Blacks  Rich          1,879             8.2%              154
Blacks  Poor          1,948             4.8%               93
Total   Rich         20,664            42.4%            8,754
Total   Poor         14,100            42.3%            5,965

In contrast to the ANES data, we no longer have Simpson’s Paradox. Poor whites were more likely than rich whites to vote Trump. However, poor blacks were less likely than rich blacks to vote Trump.

And when aggregated, poor whites and blacks were just a whisker less likely than rich whites and blacks to vote Trump.
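The same arithmetic on the unweighted Table 3 counts shows why there is no paradox here: the rich-poor gaps point in opposite directions for whites and blacks. A minimal sketch computing each poor-minus-rich gap in percentage points (the helper names are illustrative):

```python
# Unweighted CCES 2016 counts from Table 3: (respondents, voted Trump).
TABLE_3 = {
    "white": {"rich": (18785, 8600), "poor": (12152, 5872)},
    "black": {"rich": (1879, 154), "poor": (1948, 93)},
}


def share(cell):
    """Trump share for one (respondents, voted Trump) cell."""
    n, trump = cell
    return trump / n


def gap_pp(rich, poor):
    """Poor-minus-rich difference in Trump share, in percentage points."""
    return 100 * (share(poor) - share(rich))


gaps = {race: gap_pp(**cells) for race, cells in TABLE_3.items()}

# Pool both races within each income level, then compute the pooled gap.
pooled = {
    income: tuple(sum(TABLE_3[race][income][i] for race in TABLE_3) for i in (0, 1))
    for income in ("rich", "poor")
}
gaps["total"] = gap_pp(**pooled)

for group, gap in gaps.items():
    print(f"{group:>5}: poor minus rich = {gap:+.1f} pp")
```

The white gap is positive, the black gap negative, and the pooled gap a small fraction of a percentage point below zero.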

The story’s the same when we look at the weighted CCES data:

Table 4.

Group   Income  Voted Trump (%)
Whites  Rich              51.8%
Whites  Poor              55.4%
Blacks  Rich              12.3%
Blacks  Poor               6.4%
Total   Rich              47.5%
Total   Poor              46.2%

The weighted data tell us that poor whites were 3.7 percentage points more likely than rich whites to vote Trump (p < 0.001); poor blacks were 5.9 percentage points less likely than rich blacks to vote Trump (p = 0.002); and poor whites and blacks were 1.3 percentage points less likely than rich whites and blacks to vote Trump (p = 0.157).

Conclusion

The ANES 2016 data suggest that Simpson’s Paradox was at work. However, none of the rich-poor differences were statistically significant at the 5% level.

In contrast, the CCES 2016 data do not suggest that Simpson’s Paradox was at work. In particular, they show that poor blacks were less likely than rich blacks to vote Trump.

Nonetheless, the CCES 2016 data do show that poor whites were significantly more likely than rich whites to vote Trump. This is in marked contrast to previous US presidential elections. And here we can again make the point that was already made at the end of the first video: it was this conspicuous swing in the poor white vote that won Trump the presidency.

Instructions for Replicating the Above Results

(I assume you have Stata.)

ANES:

  1. Open this Stata dataset: anes_timeseries_2016.dta. (Originally downloaded from this URL. Note though that ANES periodically updates the dataset. The version I used was that of May 2nd, 2017.)
  2. Copy-paste-enter in Stata the text in this text file: ANES.txt.
  3. The two tables midway through the Stata output contain the numbers for Table 1 (those shown in the video).
  4. The regressions at the end contain the numbers for Table 2.

CCES:

  1. Open this Stata dataset: CCES2016_Common_FirstRelease.dta. (Originally downloaded from this URL. Note though that this is merely the first release of March 3rd, 2017. Newer, updated data will probably be released in the future.)
  2. Copy-paste-enter in Stata the text in this text file: CCES.txt.
  3. The two tables midway through the Stata output contain the numbers for Table 3 (those shown in the video).
  4. The regressions at the end contain the numbers for Table 4.