In our modern, wired world, scientists now have access to massive amounts of data, and it’s become popular to mine these datasets for correlations and spin the results into a science story. For example, countries with grammar rules that strongly differentiate between the present and the future have lower savings rates. Ergo, language affects how close or distant the future seems, which in turn affects how much people decide to save for the future.
It’s an elegant tale, but remember the first rule of critical thinking (pay attention, future GRE essay section takers!): correlation does not equal causation. Could it be that such correlations, which seem logical (especially after we have found them and have had time to come up with a plausible-sounding explanation for them), are in fact just accidental? Scientists call these “spurious correlations”; the math seems to check out (more on this later), but they’re not meaningful.
Seán Roberts and James Winters recently published a cautionary tale of such correlation hunting in the open access journal PLoS One. Using standard techniques that have often been applied to such giant datasets, they found several spurious correlations.
My favorite (thanks to my other blog) is shown in Figure 10, below: per capita chocolate consumption correlates with per capita number of serial and rampaging killers!
I can think of 4 explanations for this finding:
1) Chocolate consumption causes people to become rampaging killers.
2) Rampaging killers consume massive quantities of chocolate, enough to up a whole country’s per capita chocolate consumption.
3) Some third variable causes both chocolate consumption to increase and people to become rampaging killers.
4) This statistically significant correlation was, in fact, a statistical fluke.
I think we can pretty safely rule out 1 and 2 as ranging from patently ridiculous to nigh impossible. Explanation 3 has some potential and is fun for speculation. Maybe people with the means to buy and eat chocolate, a luxury good, also have the means to go on killing rampages. Being a serial killer takes some time and investment!
I’m going to go ahead and guess that this particular correlation falls under Explanation 4: a statistical fluke. In statistics, we’re never able to prove that something is true. Instead, we can only calculate an X percent chance that something, such as a correlation, could have happened due to random chance alone. When X is sufficiently small, we decide that the correlation didn’t happen due to random chance and instead reflects a meaningful relationship.
In the chocolate and rampaging killers example above, the statistics said that there was just a 2% chance that random chance alone would produce a correlation this strong. That’s a pretty tiny chance, so it seems unlikely that it just came about randomly, right?
If you’re only looking at one correlation, then yes. But the problem with these giant datasets is that they let scientists look for lots and lots of correlations between lots of different variables. When you look at 100 correlations between unrelated variables, a 2% chance means that, on average, about 2 of those correlations would look statistically significant but would in fact be totally random. If you’re looking for relationships between hundreds of variables, that could be thousands and thousands of correlations, and all of a sudden all sorts of random relationships come out as being statistically significant, like a link between chocolate and serial killing.
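To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It assumes every test is independent and that none of the variables are truly related, which real datasets won’t satisfy exactly; the function names and the 200-variable figure are mine, purely for illustration.

```python
from math import comb

def expected_flukes(num_tests, alpha):
    """Average number of 'significant' results when nothing real is going on."""
    return num_tests * alpha

def chance_of_at_least_one_fluke(num_tests, alpha):
    """Probability that at least one test comes out 'significant' by luck.
    Assumes the tests are independent (a simplification)."""
    return 1 - (1 - alpha) ** num_tests

def num_pairs(num_variables):
    """Number of distinct pairs of variables you could correlate."""
    return comb(num_variables, 2)

if __name__ == "__main__":
    alpha = 0.02  # the 2% threshold from the chocolate example

    print("Expected flukes in 100 tests:", expected_flukes(100, alpha))
    print("Chance of at least one fluke in 100 tests:",
          round(chance_of_at_least_one_fluke(100, alpha), 3))

    # Hypothetical dataset size, chosen only to illustrate the scaling.
    pairs = num_pairs(200)
    print("Pairs among 200 variables:", pairs)
    print("Expected flukes among those pairs:", expected_flukes(pairs, alpha))
```

The takeaway is that the number of possible pairs grows much faster than the number of variables, so the pile of accidental “findings” grows right along with it.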
This is a pretty common problem in science, and it’s one that’s not limited to giant datasets. If you have 100 labs investigating the same thing, say a link between a gene A and disease B, two of those labs may find a statistically significant relationship that’s really just accidental. The two labs that find a significant result publish their data, the 98 labs that found no significant relationships don’t, and the written record only shows the positive findings that gene A is related to disease B.
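If you’d like to watch this happen, here is a toy Python simulation of the 100 labs. It is a sketch, not a model of real research: it assumes gene A truly has nothing to do with disease B, and it stands in for each lab’s whole study with a single random number, relying on the fact that when there is no real effect a p-value is equally likely to land anywhere between 0 and 1. The function name and the seed are made up for the example.

```python
import random

def count_lucky_labs(num_labs=100, alpha=0.02, rng=None):
    """Count labs that get a 'significant' result when there is no real effect.

    Each lab's study is reduced to one p-value drawn uniformly from [0, 1),
    which is how p-values behave when the null hypothesis is true.
    """
    rng = rng or random.Random()
    return sum(1 for _ in range(num_labs) if rng.random() < alpha)

if __name__ == "__main__":
    rng = random.Random(42)  # arbitrary seed so the run is repeatable

    lucky = count_lucky_labs(rng=rng)
    print("Labs with a publishable fluke this time:", lucky)
    print("Labs with nothing to publish:", 100 - lucky)

    # Repeat the whole 100-lab scenario many times to see the long-run average.
    runs = [count_lucky_labs(rng=rng) for _ in range(2000)]
    print("Average lucky labs per 100:", sum(runs) / len(runs))
```

Any single run may give zero lucky labs or several, but the long-run average hovers around two, and those are the ones that end up in the journals.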
How do we fix this problem? In some cases, we can’t; we just have to be aware of what statistically significant really means and interpret data with a discerning eye. In other cases, we can turn to the good old scientific method.
In our chocolate consumption and rampaging killers example, say we wanted to test if Explanation 1 (chocolate consumption causes people to become rampaging killers) were true. We could take 1000 pairs of identical twins, with each pair reared as identically as possible. Give 1 twin in each pair a chocolate bar a day, don’t let the other twin have any chocolate, and wait and see which twins turn into rampaging killers.
Impractical? Yes. Unethical? If we really thought chocolate could turn people into rampaging killers, it would be highly unethical! But scientifically sound? Yes, because we’re only manipulating a single variable and holding all else constant!
Seán Roberts & James Winters (2013). Linguistic Diversity and Traffic Accidents: Lessons from Statistical Studies of Cultural Traits. PLoS One. DOI: 10.1371/journal.pone.0070902