Simpson's Paradox
This notebook follows the famous example from Causality (Pearl, 2009).
import pyagrum as gum
import pyagrum.lib.notebook as gnb
import pyagrum.causal as csl
import pyagrum.causal.notebook as cslnb

In a statistical study about a drug, we try to evaluate its efficiency among a population of men and women. Let's note:
- $Drug$: drug taking
- $Patient$: cured patient
- $Gender$: patient's gender
The model built from the observed data is as follows:
m1 = gum.fastBN("Gender{F|M}->Drug{Without|With}->Patient{Sick|Healed}<-Gender")
m1.cpt("Gender")[:] = [0.5, 0.5]
m1.cpt("Drug")[:] = [
    [0.25, 0.75],  # Gender=F
    [0.75, 0.25],  # Gender=M
]

m1.cpt("Patient")[{"Drug": "Without", "Gender": "F"}] = [0.2, 0.8]  # No drug, female -> healed in 0.8 of cases
m1.cpt("Patient")[{"Drug": "Without", "Gender": "M"}] = [0.6, 0.4]  # No drug, male -> healed in 0.4 of cases
m1.cpt("Patient")[{"Drug": "With", "Gender": "F"}] = [0.3, 0.7]  # Drug, female -> healed in 0.7 of cases
m1.cpt("Patient")[{"Drug": "With", "Gender": "M"}] = [0.8, 0.2]  # Drug, male -> healed in 0.2 of cases
gnb.flow.row(m1, m1.cpt("Gender"), m1.cpt("Drug"), m1.cpt("Patient"))

| F | M |
|---|---|
| 0.5000 | 0.5000 |

| Gender | Without | With |
|---|---|---|
| F | 0.2500 | 0.7500 |
| M | 0.7500 | 0.2500 |

| Gender | Drug | Sick | Healed |
|---|---|---|---|
| F | Without | 0.2000 | 0.8000 |
| F | With | 0.3000 | 0.7000 |
| M | Without | 0.6000 | 0.4000 |
| M | With | 0.8000 | 0.2000 |

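The network encodes the joint distribution as the product $P(Gender)\,P(Drug \mid Gender)\,P(Patient \mid Drug, Gender)$. As a rough cross-check outside pyAgrum, the sketch below rebuilds that joint in plain Python from the CPT values above. The dictionary and function names (`p_gender`, `p_drug`, `p_healed`, `joint`) are illustrative assumptions, not part of the pyAgrum API.

```python
from itertools import product

# CPT values copied from the model above (names are illustrative).
p_gender = {"F": 0.5, "M": 0.5}
p_drug = {  # P(Drug | Gender), keyed by gender then drug
    "F": {"Without": 0.25, "With": 0.75},
    "M": {"Without": 0.75, "With": 0.25},
}
p_healed = {  # P(Patient=Healed | Drug, Gender), keyed by (drug, gender)
    ("Without", "F"): 0.8,
    ("With", "F"): 0.7,
    ("Without", "M"): 0.4,
    ("With", "M"): 0.2,
}


def joint(gender, drug, patient):
    """P(Gender, Drug, Patient) from the chain-rule factorization of the network."""
    p_h = p_healed[(drug, gender)]
    p_patient = p_h if patient == "Healed" else 1.0 - p_h
    return p_gender[gender] * p_drug[gender][drug] * p_patient


# Enumerate the 8 configurations of the three binary variables.
table = {
    (g, d, p): joint(g, d, p)
    for g, d, p in product(["F", "M"], ["Without", "With"], ["Sick", "Healed"])
}
total = sum(table.values())

for key, value in table.items():
    print(key, round(value, 4))
print("sum =", round(total, 4))  # a valid joint distribution sums to 1
```

Every observational or interventional quantity discussed in this notebook can in principle be derived from these eight numbers plus the graph structure.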
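def getCuredObservedProba(m1, evs):
    # Posterior probability of Patient=Healed for each value of Drug,
    # given the additional evidence in evs.
    evs0 = dict(evs)
    evs1 = dict(evs)
    evs0["Drug"] = "Without"
    evs1["Drug"] = "With"

    return (
        gum.Tensor()
        .add(m1["Drug"])
        .fillWith(
            [
                gum.getPosterior(m1, target="Patient", evs=evs0)[1],
                gum.getPosterior(m1, target="Patient", evs=evs1)[1],
            ]
        )
    )

gnb.sideBySide(
    getCuredObservedProba(m1, {}),
    getCuredObservedProba(m1, {"Gender": "F"}),
    getCuredObservedProba(m1, {"Gender": "M"}),
    captions=[
        r"$P(Patient = Healed \mid Drug )$<br/>Taking $Drug$ is observed as efficient to cure",
        r"$P(Patient = Healed \mid Gender=F,Drug)$<br/>except if the $gender$ of the patient is female",
        r"$P(Patient = Healed \mid Gender=M,Drug)$<br/>... or male.",
    ],
)

Those results form a paradox called Simpson's paradox:

$$P(Patient = Healed \mid Drug = Without) = 0.5 < P(Patient = Healed \mid Drug = With) = 0.575$$

$$P(Patient = Healed \mid Gender = F, Drug = Without) = 0.8 > P(Patient = Healed \mid Gender = F, Drug = With) = 0.7$$

$$P(Patient = Healed \mid Gender = M, Drug = Without) = 0.4 > P(Patient = Healed \mid Gender = M, Drug = With) = 0.2$$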
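The reversal can be reproduced by hand from the CPT values of the model. The following is a minimal sketch in plain Python, not the pyAgrum inference itself; the names `p_gender`, `p_drug`, `p_healed` and `p_healed_given_drug` are illustrative assumptions. It applies Bayes' rule and sums $Gender$ out to obtain $P(Patient = Healed \mid Drug)$.

```python
# CPT values copied from the model above (names are illustrative).
p_gender = {"F": 0.5, "M": 0.5}
p_drug = {  # P(Drug | Gender)
    "F": {"Without": 0.25, "With": 0.75},
    "M": {"Without": 0.75, "With": 0.25},
}
p_healed = {  # P(Patient=Healed | Drug, Gender), keyed by (drug, gender)
    ("Without", "F"): 0.8,
    ("With", "F"): 0.7,
    ("Without", "M"): 0.4,
    ("With", "M"): 0.2,
}


def p_healed_given_drug(drug):
    """Observational P(Healed | Drug): the weights are P(Gender | Drug)."""
    numerator = sum(
        p_gender[g] * p_drug[g][drug] * p_healed[(drug, g)] for g in p_gender
    )
    denominator = sum(p_gender[g] * p_drug[g][drug] for g in p_gender)
    return numerator / denominator


observed = {d: p_healed_given_drug(d) for d in ("Without", "With")}
print({d: round(p, 4) for d, p in observed.items()})
# {'Without': 0.5, 'With': 0.575}

# Within each gender the ordering is the opposite of the aggregated one.
for g in p_gender:
    print(g, p_healed[("Without", g)], ">", p_healed[("With", g)])
```

The aggregated figure favours the drug only because women, who heal more often regardless of treatment, are also far more likely to take it: the weights $P(Gender \mid Drug)$ differ between the two drug values.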
Actually, giving the drug is not an observation in our model but rather an intervention. What if we use intervention instead of observation?
How to compute causal impacts on the patient's health?
We propose this causal model.
d1 = csl.CausalModel(m1)
cslnb.showCausalModel(d1)
cslnb.showCausalImpact(d1, "Patient", doing="Drug", values={"Drug": "Without"})

| Sick | Healed |
|---|---|
| 0.4000 | 0.6000 |

We have,

$$P(Patient = Healed \mid \text{do}(Drug = Without)) = 0.6$$

d1 = csl.CausalModel(m1)
cslnb.showCausalImpact(d1, "Patient", "Drug", values={"Drug": "With"})

| Sick | Healed |
|---|---|
| 0.5500 | 0.4500 |

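And then:

$$P(Patient = Healed \mid \text{do}(Drug = With)) = 0.45$$

Therefore:

$$P(Patient = Healed \mid \text{do}(Drug = Without)) > P(Patient = Healed \mid \text{do}(Drug = With))$$
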
Which means that taking this drug would not enhance the patient’s healing process, and it is better not to prescribe this drug for treatment.
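Since $Gender$ is the only common cause of $Drug$ and $Patient$ in this model, the interventional values above can be cross-checked with the back-door adjustment $P(Patient \mid \text{do}(Drug)) = \sum_{g} P(Gender = g)\,P(Patient \mid Drug, Gender = g)$. The sketch below is a hand computation in plain Python under that assumption; it is not what `csl.causalImpact` runs internally, and the names `p_gender`, `p_healed` and `p_healed_do` are illustrative.

```python
# CPT values copied from the model above (names are illustrative).
p_gender = {"F": 0.5, "M": 0.5}
p_healed = {  # P(Patient=Healed | Drug, Gender), keyed by (drug, gender)
    ("Without", "F"): 0.8,
    ("With", "F"): 0.7,
    ("Without", "M"): 0.4,
    ("With", "M"): 0.2,
}


def p_healed_do(drug):
    """Back-door adjustment on Gender: weights are P(Gender), not P(Gender | Drug)."""
    return sum(p_gender[g] * p_healed[(drug, g)] for g in p_gender)


do_effect = {d: p_healed_do(d) for d in ("Without", "With")}
print({d: round(p, 4) for d, p in do_effect.items()})
# {'Without': 0.6, 'With': 0.45}
```

Because both interventional values are averages over the same weights $P(Gender)$, the aggregated ordering can no longer contradict the per-gender ordering.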
Simpson paradox solved by interventions
So to summarize, the paradox appears when wrongly dealing with observations on $Drug$:
gnb.sideBySide(
    getCuredObservedProba(m1, {}),
    getCuredObservedProba(m1, {"Gender": "F"}),
    getCuredObservedProba(m1, {"Gender": "M"}),
    captions=[
        r"$P(Patient = Healed \mid Drug )$<br/>Taking $Drug$ is observed as efficient to cure",
        r"$P(Patient = Healed \mid Gender=F,Drug)$<br/>except if the $gender$ of the patient is female",
        r"$P(Patient = Healed \mid Gender=M,Drug)$<br/>... or male.",
    ],
)

… and disappears when dealing with intervention on $Drug$:
gnb.sideBySide(
    csl.causalImpact(d1, on="Patient", doing="Drug", values={"Patient": "Healed"})[1],
    csl.causalImpact(d1, on="Patient", doing="Drug", knowing={"Gender"}, values={"Patient": "Healed", "Gender": "F"})[1],
    csl.causalImpact(d1, on="Patient", doing="Drug", knowing={"Gender"}, values={"Patient": "Healed", "Gender": "M"})[1],
    captions=[
        r"$P(Patient = Healed \mid \text{do}(Drug) )$<br/>Effectively $Drug$ taking is not efficient to cure",
        r"$P(Patient = Healed \mid \text{do}(Drug), gender=F )$<br/>, the $gender$ of the patient being female",
        r"$P(Patient = Healed \mid \text{do}(Drug), gender=M )$<br/>, ... or male.",
    ],
)
