David Hume defined a cause as an object A followed by another object B such that, if A had not existed, B would never have existed. Karl Pearson, by contrast, stressed the importance of correlation in scientific research and regarded causation as merely a special case of correlation. Although Pearson distinguished ‘genuine’ from ‘spurious’ correlations, modern causal inference goes further. It recognizes that Hume's definition contains a counterfactual element (what would have happened without A), and such counterfactuals can never be observed directly.
To address this problem, Jerzy Neyman formalized potential outcomes for the analysis of randomized experiments, and Donald Rubin later extended the framework to observational settings. In this framework every individual has an outcome under each treatment they could have received, yet we only observe the outcome for the treatment they actually received. Judea Pearl introduced another approach to causal reasoning, which uses Directed Acyclic Graphs (DAGs) to represent causal relationships. The Structural Causal Models (SCMs) underpinning this approach support interventional and counterfactual statements.
Both frameworks have made valuable contributions to causal inference and have found applications in epidemiology, medical statistics, economics, artificial intelligence and other fields. In this blog post we focus specifically on the potential outcomes model and its implications.
FUNDAMENTAL PROBLEM OF CAUSAL INFERENCE
Consider two scenarios:

Scenario 1: A person is unhappy and considers getting a dog in order to become happy. Suppose they will be happy if they get a dog, and also happy if they do not. Their happiness after getting a dog then says nothing about a causal effect of the dog, because they would have been happy anyway.
Scenario 2: The setup changes only slightly. Now the person becomes happy if they get a dog but stays unhappy if they do not. In this case the dog has a fairly strong claim to being the cause of their happiness.
Both scenarios can be described with the potential outcomes framework. The outcome of interest, happiness, is denoted Y: Y = 1 if the person is happy and Y = 0 if they are unhappy. The treatment variable T indicates whether the person gets a dog: T = 1 means getting a dog and T = 0 means not getting one. We write Y(1) for the potential outcome (happiness) if the person gets a dog (T = 1) and Y(0) for the potential outcome if they do not (T = 0). In scenario 1, Y(1) = 1 and Y(0) = 1: the person is happy whether or not they get a dog. In scenario 2, Y(1) = 1 and Y(0) = 0: the person is happy only if they get a dog. Note that the potential outcome Y(t) is the outcome that would occur if the treatment were set to t, which need not be what is actually observed. Consequently, not every potential outcome is observed; which one we see depends on the actual value of T.
The individual treatment effect (ITE) or individual causal effect is defined as below: τi ≜ Yi(1) - Yi(0)
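As a minimal sketch, the two dog scenarios can be written as pairs of potential outcomes and the ITE computed directly. The class name PotentialOutcomes and its fields are hypothetical, chosen only for illustration. Because a real analyst can never fill in both fields for the same person, this works only as a thought experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PotentialOutcomes:
    """Hypothetical container for one individual's potential outcomes."""
    y1: int  # Y_i(1): happiness if the individual gets a dog (T = 1)
    y0: int  # Y_i(0): happiness if the individual does not get a dog (T = 0)

    def ite(self) -> int:
        """Individual treatment effect tau_i = Y_i(1) - Y_i(0)."""
        return self.y1 - self.y0


# Scenario 1: happy with or without a dog.
scenario_1 = PotentialOutcomes(y1=1, y0=1)
# Scenario 2: happy only with a dog.
scenario_2 = PotentialOutcomes(y1=1, y0=0)

print(scenario_1.ite())  # 0: the dog has no causal effect
print(scenario_2.ite())  # 1: the dog causes happiness
```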
When we consider a population of many people, the potential outcome Y(t) is a random variable, because different individuals can have different potential outcomes under treatment t. The individual potential outcome Yi(t), on the other hand, is usually treated as non-random. The subscript i denotes a specific individual in a specific context, so Yi(t) is a fixed (though possibly unknown) quantity rather than a random one. This lets us reason about the causal effect for that unit in that setting. In scenario 2, getting a dog has a positive causal effect on happiness: Y(1) - Y(0) = 1 - 0 = 1 > 0, so getting a dog is worthwhile. In scenario 1, by contrast, getting a dog does not make the person any happier: Y(1) - Y(0) = 1 - 1 = 0. So in scenario 1, deciding against a dog costs nothing, because the person's happiness does not depend on having one. The observed outcome can be written as:
Yi = Ti * Yi(1) + (1 - Ti) * Yi(0)
This equation states that the observed outcome Yi for individual i is determined by the treatment assignment Ti. If Ti = 1, the observed outcome equals the potential outcome under treatment, Yi(1). If Ti = 0, it equals the potential outcome without treatment, Yi(0).
The dog example above follows the presentation of potential outcomes by Brady Neal, a prominent author on causality. The framework goes back to Neyman, whose approach treats observational studies as if they were experiments with proper controls. Neyman's model is an urn model: units are drawn at random into treatment, which makes it similar to the "as-if randomized" natural experiments of the social and health sciences. This nonparametric model allows only a finite number of treatment levels. Later work by Holland, Rubin and others considers continuous treatment variables and parametric models, such as linear causal relationships.
Consider the simplest kind of experiment, with only a treatment group and a control group drawn from a population of many subjects. A subset of the population is selected at random and assigned to the treatment group; the remaining subjects form the control group. In the Neyman-Holland-Rubin model, each subject has two potential responses: one if assigned to treatment and one if assigned to control. In practice, however, only one of these two responses can be observed for any given subject. The study population has three key parameters: 1) the average response if all units were treated, 2) the average response if no units were treated, and 3) the difference between these two averages.
Further reading: https://www.kurims.kyoto-u.ac.jp/~kyodo/kokyuroku/contents/pdf/1703-09.pdf
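The two-group experiment and its three population parameters can be simulated in a few lines. This is a sketch under loudly stated assumptions: the population, its potential outcomes (Y(0) drawn around 10, a unit-level effect drawn around 1.5) and the group sizes are all made up, and assignment uses complete randomization via Python's random.Random.sample. It shows that, averaged over many random assignments, the treated-minus-control difference in means lands on the true difference between parameters 1) and 2). Requires Python 3.8+ for statistics.fmean.

```python
import random
import statistics


def make_population(n, seed=0):
    """Hypothetical population: each unit gets fixed potential outcomes (y1, y0)."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        y0 = rng.gauss(10.0, 2.0)      # made-up response without treatment
        y1 = y0 + rng.gauss(1.5, 1.0)  # made-up response with treatment; effect varies by unit
        population.append((y1, y0))
    return population


def population_parameters(population):
    """The three parameters: mean Y(1), mean Y(0), and their difference (the ATE)."""
    mean_y1 = statistics.fmean(y1 for y1, _ in population)
    mean_y0 = statistics.fmean(y0 for _, y0 in population)
    return mean_y1, mean_y0, mean_y1 - mean_y0


def run_experiment(population, n_treated, rng):
    """Draw n_treated units at random (the 'urn'), treat them, return the difference in means."""
    indices = list(range(len(population)))
    treated = set(rng.sample(indices, n_treated))
    # Only Y(1) is seen for treated units and only Y(0) for control units.
    treated_outcomes = [population[i][0] for i in treated]
    control_outcomes = [population[i][1] for i in indices if i not in treated]
    return statistics.fmean(treated_outcomes) - statistics.fmean(control_outcomes)


population = make_population(1000)
_, _, ate = population_parameters(population)
rng = random.Random(1)
estimates = [run_experiment(population, 500, rng) for _ in range(200)]
print(f"true ATE: {ate:.3f}")
print(f"mean of difference-in-means estimates: {statistics.fmean(estimates):.3f}")
```

A single experiment gives one noisy estimate; repeating the random draw shows that random assignment makes the simple difference in means center on the ATE, even though no unit ever reveals both of its responses.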
In the dog example, we could observe the potential outcome Y(1) by getting a dog and then assessing the person's happiness. Similarly, we could observe Y(0) by not getting a dog and assessing their happiness. However, observing both Y(1) and Y(0) for the same person is impossible without going back in time and choosing the other treatment. Adopting a dog, observing Y(1), then returning it and observing how its absence affects happiness does not work either: the time that has passed, and everything that happened in between (including having had the dog), would affect the second observation. This inherent limitation is called the fundamental problem of causal inference: we cannot directly observe the causal effect Yi(1) - Yi(0), because we never have access to both potential outcomes.
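The observed-outcome equation Yi = Ti * Yi(1) + (1 - Ti) * Yi(0) and the fundamental problem can be illustrated together. In this minimal sketch, the function names observed_outcome and analyst_view are hypothetical; the second one returns what an analyst would actually record, with the unobserved potential outcome replaced by None.

```python
from typing import Optional, Tuple


def observed_outcome(t: int, y1: int, y0: int) -> int:
    """Observed outcome Y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0)."""
    if t not in (0, 1):
        raise ValueError("treatment t must be 0 or 1")
    return t * y1 + (1 - t) * y0


def analyst_view(t: int, y1: int, y0: int) -> Tuple[int, int, Optional[int], Optional[int]]:
    """Return (T, Y, Y(1), Y(0)) as seen in data, hiding the unobserved outcome."""
    y = observed_outcome(t, y1, y0)
    seen_y1 = y1 if t == 1 else None  # Y(1) is observed only when treated
    seen_y0 = y0 if t == 0 else None  # Y(0) is observed only when untreated
    return t, y, seen_y1, seen_y0


# Scenario 2 (Y(1) = 1, Y(0) = 0) under each possible treatment choice.
print(analyst_view(1, y1=1, y0=0))  # (1, 1, 1, None): got a dog, Y(0) unseen
print(analyst_view(0, y1=1, y0=0))  # (0, 0, None, 0): no dog, Y(1) unseen
```

Whichever treatment is chosen, one of the two potential outcomes comes back as None, so Yi(1) - Yi(0) cannot be computed from the observed record alone.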
COUNTERFACTUALS
Unobserved potential outcomes are called counterfactuals because they run counter to what actually happened; they are also referred to as counterfactual outcomes. The observed potential outcome, in contrast, is sometimes called the factual outcome. Note that "factual" and "counterfactual" can only be defined once an outcome has been observed. Before that, there are only potential outcomes.
The Average Treatment Effect (ATE), or Average Causal Effect, is the average of the Individual Treatment Effects τi over the population, that is, of the differences between each unit's potential outcomes with and without treatment:
τ ≜ E[Yi(1) - Yi(0)] = E[Y(1) - Y(0)]
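With a small, entirely hypothetical population whose potential outcomes are all known (something only possible in a simulation), the ATE is just the mean of the ITEs. The sketch also checks the linearity identity E[Y(1) - Y(0)] = E[Y(1)] - E[Y(0)].

```python
from statistics import fmean

# Hypothetical potential outcomes (Y(1), Y(0)) for four people.
# In real data both values are never available for the same person.
people = [(1, 1), (1, 0), (0, 0), (1, 0)]

ites = [y1 - y0 for y1, y0 in people]  # tau_i = Y_i(1) - Y_i(0)
ate = fmean(ites)                      # E[Y(1) - Y(0)]
# Linearity of expectation: E[Y(1)] - E[Y(0)] gives the same number.
ate_via_linearity = fmean(y1 for y1, _ in people) - fmean(y0 for _, y0 in people)

print(ites)                    # [0, 1, 0, 1]
print(ate, ate_via_linearity)  # 0.5 0.5
```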
A natural quantity that may come to mind is the associational difference, which compares the expected outcome among units with T = 1 to that among units with T = 0: E[Y|T=1] - E[Y|T=0]. This captures differences in outcomes associated with the treatment variable, but not necessarily causal ones. By linearity of expectation, the ATE can be written as ATE = E[Y(1) - Y(0)] = E[Y(1)] - E[Y(0)]. This resembles the associational difference, but the two compare different things: E[Y(1)] - E[Y(0)] compares the whole population under treatment with the whole population without treatment, whereas E[Y|T=1] - E[Y|T=0] compares two different subgroups, those who happened to be treated and those who were not. In general, the associational difference E[Y|T=1] - E[Y|T=0] and the causal difference E[Y(1)] - E[Y(0)] are not equal; if they always were, causation would reduce to association. A main reason they differ is confounding.
In our previous post, we addressed confounding in the two scenarios. As a reminder, that post shows how a variable X acts as a confounder by influencing both the treatment T and the outcome Y. The resulting non-causal association flows along the path Y <- X -> T.
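A small, fully made-up population illustrates how the path Y <- X -> T drives a wedge between the associational difference and the ATE. The numbers are hypothetical and chosen so the arithmetic can be checked by hand: every unit's true effect is exactly 1, but units with X = 1 are both more likely to be treated and have higher outcomes regardless of treatment.

```python
from statistics import fmean

# Hypothetical units as (x, t, y1, y0).
# X raises the outcome (Y(0) = 2X, Y(1) = 2X + 1) AND makes treatment more
# likely (3 of 4 units with X = 1 are treated vs 1 of 4 with X = 0).
units = [
    (0, 1, 1, 0), (0, 0, 1, 0), (0, 0, 1, 0), (0, 0, 1, 0),
    (1, 1, 3, 2), (1, 1, 3, 2), (1, 1, 3, 2), (1, 0, 3, 2),
]

# Causal quantity: needs both potential outcomes, known only in a simulation.
ate = fmean(y1 - y0 for _, _, y1, y0 in units)

# Associational quantity: uses only observed Y = T*Y(1) + (1 - T)*Y(0).
observed = [(t, t * y1 + (1 - t) * y0) for _, t, y1, y0 in units]
assoc = fmean(y for t, y in observed if t == 1) - fmean(y for t, y in observed if t == 0)

print(ate)    # 1.0
print(assoc)  # 2.0
```

The extra 1.0 in the associational difference is not caused by T at all; it is the non-causal association flowing through X.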
In subsequent posts, we will explore ways to address the fundamental problem of causal inference and present Python code for estimating average treatment effects. After that, we will introduce research articles and briefly discuss them with specialist readers in mind.
Slides: https://scholar.princeton.edu/sites/default/files/jmummolo/files/po_model_jm.pdf
Papers:
Splawa-Neyman, J. (1990). On the Application of Probability Theory to Agricultural Experiments: Essay on Principles, Section 9. (Original work published in 1923)
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies.
Sekhon, J. S. (2008). The Neyman-Rubin Model of Causal Inference and Estimation via Matching Methods.