
Walking Example (p135)

Creative Commons License | aGrUM | interactive online version

Authors: Aymen Merrouche and Pierre-Henri Wuillemin.

This notebook follows the example from “The Book of Why” (Pearl, 2018), chapter 4, page 135.

import pyagrum as gum
import pyagrum.lib.notebook as gnb
import pyagrum.causal as csl
import pyagrum.causal.notebook as cslnb

In 1998, a study reported a correlation between physical exercise and longevity among nonsmoking retired men. What we really want to know is whether exercising more makes these men live longer, that is, whether the relationship is causal. The study measurements are shown at the end of this notebook.

The corresponding causal diagram is the following:

## We create the causal diagram
we = gum.fastBN("Walking{casual|normal|intense}->Mortality{dead|alive}")
## We fill the CPTs
we.cpt("Walking")[:] = [151 / 707, 379 / 707, 177 / 707]
we.cpt("Mortality")[{"Walking": "casual"}] = [0.43, 0.57]
we.cpt("Mortality")[{"Walking": "intense"}] = [0.215, 0.785]
we.cpt("Mortality")[{"Walking": "normal"}] = [0.277, 0.723]
gnb.sideBySide(
    we,
    we.cpt("Walking") * we.cpt("Mortality"),
    we.cpt("Walking"),
    we.cpt("Mortality"),
    captions=["the BN", "the joint distribution", "the marginal for $Walking$", "the CPT for $Mortality$"],
)
the BN: Walking -> Mortality

the joint distribution
            Walking
Mortality   casual   normal   intense
dead        0.0918   0.1485   0.0538
alive       0.1217   0.3876   0.1965

the marginal for $Walking$
casual   normal   intense
0.2136   0.5361   0.2504

the CPT for $Mortality$
           Mortality
Walking    dead     alive
casual     0.4300   0.5700
normal     0.2770   0.7230
intense    0.2150   0.7850

The study showed that after 12 years, 43% of casual walkers had died, while only 21.5% of intense walkers had died.
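As a sanity check, the joint distribution can be recomputed by hand from the marginal of Walking and the CPT of Mortality. The following is a minimal plain-Python sketch (no pyAgrum needed). The dictionary names are illustrative, and the numbers are the ones entered in the CPTs of the model.

```python
# Marginal P(Walking), from the counts used to fill the CPT of the model.
p_walking = {"casual": 151 / 707, "normal": 379 / 707, "intense": 177 / 707}

# Conditional P(Mortality = dead | Walking), as entered in the CPT of the model.
p_dead_given_walking = {"casual": 0.43, "normal": 0.277, "intense": 0.215}

# Chain rule: P(Walking, dead) = P(Walking) * P(dead | Walking).
joint_dead = {w: p_walking[w] * p_dead_given_walking[w] for w in p_walking}
joint_alive = {w: p_walking[w] * (1 - p_dead_given_walking[w]) for w in p_walking}

# Marginalizing Walking out gives the overall death rate in the cohort.
p_dead = sum(joint_dead.values())

for w in p_walking:
    print(f"{w:8s} dead={joint_dead[w]:.4f} alive={joint_alive[w]:.4f}")
print(f"overall P(dead) = {p_dead:.4f}")
```

The printed rows should match the joint distribution displayed by pyAgrum up to rounding.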

Causal effect of walking on mortality in this model:

weModele = csl.CausalModel(we)
cslnb.showCausalImpact(weModele, "Mortality", doing="Walking", values={})
Causal Model: Walking -> Mortality

\begin{equation*}P(Mortality \mid \text{do}(Walking)) = P\left(Mortality \mid Walking\right)\end{equation*}

Explanation: Do-calculus computations

Impact
           Mortality
Walking    dead     alive
casual     0.4300   0.5700
normal     0.2770   0.7230
intense    0.2150   0.7850

Before jumping to any conclusions, we should consider possible confounders. We need to ask the following question: what distinguishes intense walkers from casual walkers?
Without abandoning the idea of a possible cause-and-effect relationship between walking and mortality, we introduce a third variable, a “confounder”: a common cause of the two variables that could explain the correlation between them. Our aim is to separate the causal effect of walking on mortality (if there is one) from the bias induced by this third variable. For this purpose, we need to adjust for it.

weModele1 = csl.CausalModel(we, [("confounder", ["Walking", "Mortality"])], True)
gnb.show(weModele1)

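[Figure: causal diagram with a latent confounder pointing to both Walking and Mortality, and Walking pointing to Mortality]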

cslnb.showCausalImpact(weModele1, "Mortality", "Walking", values={"Walking": "intense"})
Causal Model: confounder -> Walking, confounder -> Mortality, Walking -> Mortality

Hedge Error: G={'Walking', 'Mortality'}, G[S]={'Mortality'}
Impossible

Impact: No result
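A hedge signals that the causal effect is not identifiable: with an unobserved common cause of Walking and Mortality, different causal stories can produce exactly the same observations. The plain-Python sketch below illustrates this with two hypothetical worlds. The binary simplification (casual vs. intense), the share of intense walkers and the function names are assumptions made for the illustration. Only the two death rates come from the study.

```python
# Two hypothetical worlds over binary Walking (casual/intense) and Mortality.
# Both yield the same observational P(dead | Walking) but disagree on
# P(dead | do(Walking)), so observations alone cannot settle the causal effect.

P_INTENSE = 0.25                              # illustrative share of intense walkers
P_DEAD = {"casual": 0.43, "intense": 0.215}   # observed death rates from the study


def world_a():
    """No confounding: Walking itself drives Mortality."""
    observed = dict(P_DEAD)      # P(dead | Walking)
    intervened = dict(P_DEAD)    # P(dead | do(Walking)) is identical
    return observed, intervened


def world_b():
    """Pure confounding: a latent U sets both Walking and Mortality."""
    p_u = {"casual": 1 - P_INTENSE, "intense": P_INTENSE}  # P(U)
    # Walking copies U, so observing Walking reveals U exactly.
    observed = {w: P_DEAD[w] for w in P_DEAD}
    # Forcing Walking leaves U untouched, so Mortality depends on P(U) only.
    p_dead_do = sum(p_u[u] * P_DEAD[u] for u in p_u)
    intervened = {w: p_dead_do for w in P_DEAD}
    return observed, intervened


obs_a, do_a = world_a()
obs_b, do_b = world_b()
print("same observations:", obs_a == obs_b)
print("do() in world A:", do_a)
print("do() in world B:", do_b)
```

In world B, forcing someone to walk more changes nothing, even though the observed death rates are the same as in world A. Naming and measuring the confounder is what makes the effect computable.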

We want to measure the causal effect of walking on mortality. Confounding bias arises when a third variable, called a “confounding variable”, influences both walking and mortality.
An obvious confounder is age: younger subjects exercise more and have more time left to live. (There are other confounders.)

## fastBN fills every CPT with random values; they are not fitted to the study data
wea = gum.fastBN("Age{cat1|cat2|cat3}->Walking{casual|normal|intense}->Mortality{dead|alive}<-Age{cat1|cat2|cat3}")
gnb.sideBySide(
    wea,
    wea.cpt("Age"),
    wea.cpt("Walking"),
    wea.cpt("Mortality"),
    captions=["the BN", "the marginal for $Age$", "the CPT for $Walking$", "the CPT for $Mortality$"],
)
the BN: Age -> Walking, Age -> Mortality, Walking -> Mortality

the marginal for $Age$
cat1     cat2     cat3
0.4098   0.3557   0.2345

the CPT for $Walking$
        Walking
Age     casual   normal   intense
cat1    0.3935   0.3978   0.2086
cat2    0.5396   0.3272   0.1332
cat3    0.2616   0.4733   0.2651

the CPT for $Mortality$
                  Mortality
Age     Walking   dead     alive
cat1    casual    0.3069   0.6931
cat1    normal    0.4508   0.5492
cat1    intense   0.3586   0.6414
cat2    casual    0.3457   0.6543
cat2    normal    0.5842   0.4158
cat2    intense   0.6148   0.3852
cat3    casual    0.4517   0.5483
cat3    normal    0.7280   0.2720
cat3    intense   0.4739   0.5261

Causal effect of walking on mortality with age as a confounder:

weModele2 = csl.CausalModel(wea)
cslnb.showCausalImpact(weModele2, "Mortality", "Walking", values={})
Causal Model: Age -> Walking, Age -> Mortality, Walking -> Mortality

\begin{equation*}P(Mortality \mid \text{do}(Walking)) = \sum_{Age}{P\left(Mortality \mid Age, Walking\right) \cdot P\left(Age\right)}\end{equation*}

Explanation: backdoor ['Age'] found.

Impact
           Mortality
Walking    dead     alive
casual     0.3546   0.6454
normal     0.5633   0.4367
intense    0.4768   0.5232

We adjusted for Age using the back-door criterion: Age blocks every back-door path from Walking to Mortality, so within each age category, setting Walking = “intense” has the same effect on Mortality as conditioning on Walking = “intense”.
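The back-door formula reported by pyAgrum can be checked by hand. The sketch below is plain Python and uses the rounded values displayed in the output tables of this model (the marginal for Age and the CPT for Mortality). The names p_age, p_dead and backdoor_dead are illustrative.

```python
# Values copied (rounded to 4 decimals) from the output tables of the model.
p_age = {"cat1": 0.4098, "cat2": 0.3557, "cat3": 0.2345}

# P(Mortality = dead | Age, Walking)
p_dead = {
    "cat1": {"casual": 0.3069, "normal": 0.4508, "intense": 0.3586},
    "cat2": {"casual": 0.3457, "normal": 0.5842, "intense": 0.6148},
    "cat3": {"casual": 0.4517, "normal": 0.7280, "intense": 0.4739},
}


def backdoor_dead(walking):
    """P(dead | do(Walking=walking)) = sum over Age of P(dead | Age, walking) * P(Age)."""
    return sum(p_dead[age][walking] * p_age[age] for age in p_age)


adjusted = {w: backdoor_dead(w) for w in ("casual", "normal", "intense")}
for w, p in adjusted.items():
    print(f"P(dead | do(Walking={w})) = {p:.4f}")
```

Up to rounding of the inputs, the results agree with the “dead” column of the impact table.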

In the study's age-adjusted measurements, 40.5% of casual walkers died (43% unadjusted), whereas only 23.8% of intense walkers died (21.5% unadjusted). Note that the impact table above is computed from the randomly generated CPTs of wea, so its values are illustrative and differ from these figures. The correlation induced by Age between the two variables therefore accounts for only a small part of the gap.
Even after adjusting for all plausible confounders and removing the confounding bias they induce, Walking is still associated with Mortality. Unless we missed other confounders, in which case the remaining uncertainty is proportional to the correlation induced by these hidden variables, we can say that intentional walking prolongs life among the studied population.
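To make the comparison concrete, the quoted death rates can be turned into effect sizes. This is a small plain-Python sketch. The helper names are illustrative, and the four rates are the unadjusted and age-adjusted study figures quoted in the text.

```python
# Death rates after 12 years, as quoted in the text (fractions, not percent).
unadjusted = {"casual": 0.43, "intense": 0.215}
age_adjusted = {"casual": 0.405, "intense": 0.238}


def risk_difference(rates):
    """Absolute gap in death rate between casual and intense walkers."""
    return rates["casual"] - rates["intense"]


def risk_ratio(rates):
    """How many times more likely casual walkers were to die than intense walkers."""
    return rates["casual"] / rates["intense"]


for label, rates in (("unadjusted", unadjusted), ("age-adjusted", age_adjusted)):
    print(f"{label:13s} difference={risk_difference(rates):.3f} ratio={risk_ratio(rates):.2f}")
```

Adjusting for age narrows the gap from 21.5 to 16.7 percentage points but does not remove it.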

In an observational study, adjusting for confounding factors is a routine step when measuring the causal effect of a treatment on an outcome.

Study measurements both unadjusted and age-adjusted:


[Figure: study measurements, unadjusted and age-adjusted]