Probably Approximately Correct

MSc Machine Learning @ UCL | Alumnus @ IIT Madras | Google DeepMind Scholar | Interests: Machine learning

Wednesday, May 24, 2023

Why Causal Inference? (#POST1)

Simpson's Paradox: Shedding Light on the 'Causal' Phenomenon

                                                                               
                                                                                               

In a speculative future, humanity faces a novel hypothetical disease known as Echovirus-39. Two treatments are available to counter it: treatment A and treatment B. Treatment B, however, is scarce, and like any scarce resource it is expensive. This post uses that thought experiment to examine Simpson's Paradox, a long-studied phenomenon in which the apparently better treatment changes depending on how the data are grouped, and to show why causal inference is needed to choose the right treatment.
Suppose we have data on mortality among Echovirus-39 patients, classified by severity of disease and by type of treatment.

First glances can be deceptive. The overall death rate under Treatment A is 16%, slightly lower than Treatment B’s 19%. That might lead us to favor Treatment A. However, when the mild and severe cases are examined separately, the outcome is different.

Among patients with a mild condition, Treatment A has a death rate of 15%, while Treatment B has a death rate of only 10%. Among patients with a severe condition, Treatment A has a death rate of 30%, while Treatment B has a death rate of 20%.

Table for this tale:

Treatment    Mild              Severe            Total
A            15% (210/1400)    30% (30/100)      16% (240/1500)
B            10% (5/50)        20% (100/500)     19% (105/550)

The table summarizes these figures so that we can see what Simpson's Paradox really means. Each percentage is the share of patients who died among those given that treatment, so lower numbers mean better outcomes; the counts (deaths/patients) are in brackets. The paradox is that treatment A appears better over the whole population, yet treatment B comes out on top within each subpopulation.

The Key: Causality and Understanding:

According to the “Total” column, one would be tempted to go with treatment A, since it has the lower overall death rate. But the “Mild” and “Severe” columns show the same pattern as each other, and both favor treatment B. The reversal is due to an unequal distribution of patients across treatments. Of the 1,500 patients treated with A, 1,400 had mild cases, whereas of the 550 patients treated with B, 500 had severe cases. Death rates are lower for mild cases, so treatment A's total is dominated by low-risk patients and looks better than it would if mild and severe cases were distributed equally between the treatments. Treatment B's total is dominated by high-risk patients, which pushes its overall death rate up, so the total and the per-severity comparisons point in opposite directions.
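The reversal can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the patient counts and death rates are the ones quoted in this post, the death counts are derived from them (rate times patients), and the names `counts`, `stratum_rate`, `total_rate` and `mild_share` are made up for illustration.

```python
# (deaths, patients) per treatment and severity. Patient counts and death
# rates are from the post; the death counts are derived as rate * patients.
counts = {
    "A": {"mild": (210, 1400), "severe": (30, 100)},
    "B": {"mild": (5, 50), "severe": (100, 500)},
}

def stratum_rate(treatment, severity):
    """Death rate within one severity group."""
    deaths, patients = counts[treatment][severity]
    return deaths / patients

def total_rate(treatment):
    """Death rate over all patients given this treatment."""
    deaths = sum(d for d, _ in counts[treatment].values())
    patients = sum(n for _, n in counts[treatment].values())
    return deaths / patients

def mild_share(treatment):
    """Fraction of this treatment's patients who had a mild case."""
    patients = sum(n for _, n in counts[treatment].values())
    return counts[treatment]["mild"][1] / patients

for t in ("A", "B"):
    print(f"Treatment {t}: mild={stratum_rate(t, 'mild'):.0%}, "
          f"severe={stratum_rate(t, 'severe'):.0%}, "
          f"total={total_rate(t):.0%}, mild share={mild_share(t):.0%}")
```

Each total is a weighted average of the two per-severity rates, with weights given by how many mild and severe patients received that treatment. Treatment A's weights lean heavily toward mild cases and treatment B's toward severe cases, which is what lets the totals disagree with both per-severity comparisons.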

This interaction between severity and treatment means that an informed decision requires understanding the underlying causality. The data alone cannot settle the question: which treatment is best depends on the causal structure that generated the data. Depending on that structure, either treatment A or treatment B could be the better choice against Echovirus-39.

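Let us consider two scenarios, each described by a directed acyclic graph (DAG). Here T denotes the treatment, C the condition (mild or severe), and Y the outcome (mortality).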

Scenario 1

If condition C is a cause of treatment T, then treatment B is more effective at reducing the death rate than treatment A. This happens, for example, when doctors give treatment A to patients with mild cases and reserve the expensive and scarce treatment B for patients with severe cases. Because a severe condition both increases the risk of death (C -> Y) and increases the chance of receiving treatment B (C -> T), treatment B is associated with higher mortality in the aggregate. The condition is a common cause of treatment and mortality, that is, a confounder of the relationship between them. Patient groups with different prior conditions have different risk profiles, so when all patients are pooled without stratifying by severity, treatment B looks worse even though it is protective within each group. The comparison that reflects the causal effect is therefore the one made within each severity level, and it favors treatment B.
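One standard way to make this comparison concrete is to adjust for the confounder: average each treatment's per-severity death rates using the severity mix of the whole population rather than the mix among that treatment's own patients. The sketch below assumes the Scenario 1 graph (C -> T, C -> Y, T -> Y) with no other confounders. It uses only the rates and patient counts quoted in this post, and the function names are hypothetical.

```python
# Per-severity death rates P(Y=1 | T, C) and patient counts from the post.
death_rate = {
    ("A", "mild"): 0.15, ("A", "severe"): 0.30,
    ("B", "mild"): 0.10, ("B", "severe"): 0.20,
}
patients = {
    ("A", "mild"): 1400, ("A", "severe"): 100,
    ("B", "mild"): 50, ("B", "severe"): 500,
}

def severity_share(severity):
    """P(C = severity) over all patients, regardless of treatment."""
    total = sum(patients.values())
    in_group = sum(n for (_, c), n in patients.items() if c == severity)
    return in_group / total

def adjusted_death_rate(treatment):
    """Death rate if everyone received `treatment`, assuming C is the
    only confounder: sum over c of P(Y=1 | T, C=c) * P(C=c)."""
    return sum(death_rate[(treatment, c)] * severity_share(c)
               for c in ("mild", "severe"))

for t in ("A", "B"):
    print(f"Treatment {t}: adjusted death rate = {adjusted_death_rate(t):.3f}")
```

With these numbers the adjusted death rate is about 19.4% for treatment A and about 12.9% for treatment B, so under this causal structure treatment B is the better choice.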


Scenario 2

If treatment T is a cause of condition C, then treatment A is more effective. Here the scarcity of treatment B means a long wait between prescription and administration, a problem that treatment A avoids. Patients prescribed treatment B who start with a mild case of Echovirus-39 can deteriorate while they wait, and the worsened condition raises their risk of death. So although treatment B is effective once administered (positive effect along T -> Y), its overall effectiveness is reduced because prescribing it worsens the condition (negative effect along T -> C -> Y). Because treatment B is costlier, it is prescribed with probability 0.27, while treatment A is prescribed with probability 0.73. In this scenario the treatment decision is made regardless of the patient's condition, so the condition does not confound the comparison, and the “Total” column is the one that reflects each treatment's overall effect.
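The same numbers can be read through the Scenario 2 graph. The sketch below assumes that graph (T -> C, T -> Y, C -> Y) and that treatment is assigned independently of anything else affecting mortality. Under those assumptions the condition is a consequence of the treatment, so each per-severity death rate is weighted by how often that treatment leads to that severity. The rates and counts are the ones quoted in this post, and the function names are hypothetical.

```python
# Per-severity death rates P(Y=1 | T, C) and patient counts from the post.
death_rate = {
    ("A", "mild"): 0.15, ("A", "severe"): 0.30,
    ("B", "mild"): 0.10, ("B", "severe"): 0.20,
}
patients = {
    ("A", "mild"): 1400, ("A", "severe"): 100,
    ("B", "mild"): 50, ("B", "severe"): 500,
}

def treatment_share(treatment):
    """P(T = treatment): how often this treatment is prescribed."""
    treated = sum(n for (t, _), n in patients.items() if t == treatment)
    return treated / sum(patients.values())

def severity_given_treatment(treatment, severity):
    """P(C = severity | T = treatment): the condition after prescription."""
    treated = sum(n for (t, _), n in patients.items() if t == treatment)
    return patients[(treatment, severity)] / treated

def total_effect_death_rate(treatment):
    """Death rate along both paths, T -> Y and T -> C -> Y:
    sum over c of P(Y=1 | T, C=c) * P(C=c | T)."""
    return sum(death_rate[(treatment, c)] * severity_given_treatment(treatment, c)
               for c in ("mild", "severe"))

for t in ("A", "B"):
    print(f"Treatment {t}: prescribed with probability {treatment_share(t):.2f}, "
          f"overall death rate = {total_effect_death_rate(t):.3f}")
```

The weighted sums recover the overall rates of 16% for treatment A and about 19% for treatment B, and the prescription shares come out at roughly 0.73 and 0.27. No adjustment for severity is applied, because adjusting for a consequence of the treatment would block the T -> C -> Y path, which is part of the effect being measured.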

From the above, the effectiveness of each treatment depends on the causal structure. In scenario 1 (where C causes T), treatment B is more effective. In scenario 2 (where T causes C), treatment A is more effective. Understanding causality is therefore crucial to resolving Simpson's paradox: as long as one ignores cause and effect, the paradox remains, but once the causal structure is taken into account, there is no paradox anymore.
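The difference between the two scenarios can be written compactly. The block below is a sketch in Pearl's do-notation, which this post has not introduced: do(T = t) means "set the treatment to t for everyone". It also assumes that each graph contains no variables beyond T, C and Y.

```latex
% Scenario 1: C -> T, C -> Y, T -> Y. C is a confounder, so adjust for it.
P(Y = 1 \mid \mathrm{do}(T = t)) = \sum_{c} P(Y = 1 \mid T = t, C = c)\, P(C = c)

% Scenario 2: T -> C, T -> Y, C -> Y. C is a mediator, so do not adjust for it.
P(Y = 1 \mid \mathrm{do}(T = t)) = \sum_{c} P(Y = 1 \mid T = t, C = c)\, P(C = c \mid T = t) = P(Y = 1 \mid T = t)
```

The only difference is the weight on each severity level: P(C = c) in scenario 1 and P(C = c | T = t) in scenario 2. The same data give different answers because of that choice, which is why the causal structure decides which treatment is better.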

In this blog series, we work through the causal hierarchy step by step, discussing fundamental issues in causal inference and giving examples. As an initial series of posts, it does not purport to treat every aspect exhaustively; rather, it is meant to provide a basic understanding of these concepts. The aim is to strengthen readers' causal modeling skills and bring them up to date with what is happening in the field today.
