Probably Approximately Correct

MSc Machine Learning @ UCL | Alumnus @ IIT Madras | Google DeepMind Scholar | Interests: Machine learning

Wednesday, May 24, 2023

Correlation ≠ Causation (#POST2)


Scientists all over the world generally agree that correlation does not imply causation, yet in daily life people often confuse the two. Messerli (2012), for example, found an interesting positive correlation between chocolate consumption and the number of Nobel laureates per country, and more recent data have reportedly yielded an even stronger relationship. Although his intention is unclear, Messerli is careful not to present this relationship as one of cause and effect. This contrasts with the statement attributed to “the chocolate industry”, which claimed that “eating chocolate leads to Nobel prize winners” (Nieburg, 2012).

The presence of such a correlation between two variables does not automatically mean that there is a causal relation, because statistical association alone cannot establish whether a causal relationship exists or in which direction it runs. For example, rather than chocolate consumption producing Nobel laureates, it may be that more Nobel laureates lead to more chocolate consumption, through what Messerli (2012) referred to as celebrations connected with winning the prize. There might also be unobserved variables, such as socio-economic status or education quality, that influence both the number of Nobel prizes won and the amount of chocolate consumed, making the correlation non-causal, or spurious. These possibilities are summarized by Reichenbach's (1956) common cause principle:

If two random variables X and Y are statistically dependent, then (1) X causes Y, (2) Y causes X, or (3) a third variable Z causes both X and Y. In case (3), X and Y become independent given Z, denoted as X ⫫ Y | Z.
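Case (3) can be illustrated with a small simulation. What follows is only a minimal sketch under stated assumptions: the common cause Z, the coefficient 0.8, and the sample size are all made up for illustration and are not taken from Messerli's data. Because the toy model is linear with Gaussian noise, a partial correlation near zero is used here as a stand-in for the conditional independence X ⫫ Y | Z; that shortcut does not hold for arbitrary distributions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000  # hypothetical sample size

# Hypothetical common cause Z (think of a standardized "wealth" score).
z = rng.normal(size=n)

# X and Y each depend on Z plus independent noise.
# Neither variable has any effect on the other.
x = 0.8 * z + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)


def residual(v, z):
    """Return v with its best linear fit on z removed."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)


# Plain correlation between X and Y (ignores Z).
marginal_r = np.corrcoef(x, y)[0, 1]

# Partial correlation: correlate what is left of X and Y after removing Z.
partial_r = np.corrcoef(residual(x, z), residual(y, z))[0, 1]

print(f"corr(X, Y)           = {marginal_r:.3f}")
print(f"corr(X, Y) given Z   = {partial_r:.3f}")
```

In this toy model X and Y are clearly correlated even though neither causes the other, and the correlation all but vanishes once Z is accounted for.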

Interventions are used to disentangle causation from mere association when the causal relationship is uncertain. For instance, we could study whether increased chocolate consumption leads to more Nobel Prize laureates by forcing the citizens of Austria to consume more chocolate. In most cases, though, such interventions are impractical for ethical or logistical reasons. Similar challenges arise when conducting randomized controlled trials on smoking and lung cancer. Nevertheless, when certain assumptions are met, we can make causal inferences without running a true experiment. These assumptions grow stronger as we move up the levels of the causal hierarchy.
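To see why an intervention helps, here is a rough sketch of a toy world in which chocolate consumption X has no effect at all on Nobel prizes Y, and both are driven by a hidden factor Z. This is an assumption made purely for illustration, not a claim about the real data, and the function name `simulate`, the coefficient 0.8, and the sample size are hypothetical. Observing the system shows a correlation, while assigning X at random, independently of Z, removes it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000  # hypothetical sample size


def simulate(intervene):
    """Return corr(X, Y) in a toy model where X never affects Y."""
    z = rng.normal(size=n)  # hidden common cause
    if intervene:
        # Intervention: X is set by randomization and ignores Z entirely.
        x = rng.normal(size=n)
    else:
        # Observation: X is partly driven by Z.
        x = 0.8 * z + rng.normal(size=n)
    # Y depends on Z and noise only, never on X.
    y = 0.8 * z + rng.normal(size=n)
    return np.corrcoef(x, y)[0, 1]


observed_r = simulate(intervene=False)
intervened_r = simulate(intervene=True)

print(f"observational corr(X, Y)  = {observed_r:.3f}")
print(f"interventional corr(X, Y) = {intervened_r:.3f}")
```

If X really did cause Y, the dependence would survive the randomization; here it does not, which is what a non-causal correlation looks like under intervention.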


Vigen (2015), Spurious Correlations:



According to Vigen's data, the annual number of people who drown in swimming pools is significantly correlated with the number of films featuring Nicolas Cage [above figure]. This correlation raises intriguing questions: Are people's swimming habits somehow linked to Nicolas Cage's films? Does he want to act in more movies when he sees many people drowning? Is there some other explanation? All of these possibilities seem highly unlikely, which suggests a spurious correlation with no causal relationship.

The next post will delve into the levels of association, intervention, and counterfactuals, which correspond to observing, implementing, and imagining, respectively. We will also use another example to demonstrate how spurious correlations can emerge, which should make the concept easier to understand. Look out for an even clearer and more interesting discussion.
