This notebook follows the famous example from Causality (Pearl, 2009).
A correlation has been observed between Smoking and Cancer, represented by this Bayesian network:
import pyagrum as gum
import pyagrum.lib.notebook as gnb
import pyagrum.causal as csl
import pyagrum.causal.notebook as cslnb
obs1 = gum.fastBN( "Smoking->Cancer" )
obs1.cpt( "Smoking" )[:] = [ 0.6 , 0.4 ]
obs1.cpt( "Cancer" )[{ "Smoking" : 0 }] = [ 0.9 , 0.1 ]
obs1.cpt( "Cancer" )[{ "Smoking" : 1 }] = [ 0.7 , 0.3 ]
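## show the BN, the joint distribution, the marginal and the CPT side by side
gnb.flow.row(
    obs1,
    obs1.cpt( "Smoking" ) * obs1.cpt( "Cancer" ),
    obs1.cpt( "Smoking" ),
    obs1.cpt( "Cancer" ),
    captions = [ "the BN" , "the joint distribution" , "the marginal for $smoking$" , "the CPT for $cancer$" ],
)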
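| Cancer | Smoking = 0 | Smoking = 1 |
|--------|-------------|-------------|
| 0      | 0.5400      | 0.2800      |
| 1      | 0.0600      | 0.1200      |

the joint distribution

| Smoking = 0 | Smoking = 1 |
|-------------|-------------|
| 0.6000      | 0.4000      |

the marginal for $smoking$

| Smoking | Cancer = 0 | Cancer = 1 |
|---------|------------|------------|
| 0       | 0.9000     | 0.1000     |
| 1       | 0.7000     | 0.3000     |

the CPT for $cancer$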
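As a sanity check on these tables, the joint distribution can be recomputed by hand from the two CPTs, using $P(Smoking, Cancer) = P(Smoking) \cdot P(Cancer \mid Smoking)$. The sketch below uses plain Python lists rather than pyAgrum tensors; the variable names are illustrative and are not part of the pyAgrum API.

```python
# Plain-Python recomputation of the joint table.
# The names below are illustrative only, not pyAgrum objects.
p_smoking = [0.6, 0.4]                     # P(Smoking)
p_cancer_given_smoking = [[0.9, 0.1],      # P(Cancer | Smoking=0)
                          [0.7, 0.3]]      # P(Cancer | Smoking=1)

# joint[s][c] = P(Smoking=s, Cancer=c) = P(Smoking=s) * P(Cancer=c | Smoking=s)
joint = [[p_smoking[s] * p_cancer_given_smoking[s][c] for c in range(2)]
         for s in range(2)]

for s in range(2):
    for c in range(2):
        print(f"P(Smoking={s}, Cancer={c}) = {joint[s][c]:.4f}")
```

The four printed values should match the joint distribution displayed by the notebook, and they sum to 1 as any joint distribution must.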
The very strong observed correlation between smoking and lung cancer suggests a causal relationship, as the Surgeon General asserted in 1964. The proposed model is then as follows:
## the Bayesian network is causal
modele1 = csl.CausalModel(obs1)
cslnb.showCausalImpact(modele1, "Cancer" , "Smoking" , values = { "Smoking" : 1 })
Causal Model (graph): Smoking -> Cancer

\begin{equation*}P(Cancer \mid \text{do}(Smoking)) = P\left(Cancer \mid Smoking\right)\end{equation*}

Explanation : Do-calculus computations
This model is strongly contested by the tobacco industry, which responds by proposing a different model in which Smoking and Cancer are both caused by a common factor, the Genotype (or some other latent variable):
## a latent variable exists between Smoking and Cancer in the causal model
modele2 = csl.CausalModel(obs1, [( "Genotype" , [ "Smoking" , "Cancer" ])])
cslnb.showCausalImpact(modele2, "Cancer" , "Smoking" , values = { "Smoking" : 1 })
Causal Model (graph): Genotype -> Smoking, Genotype -> Cancer

\begin{equation*}P(Cancer \mid \text{do}(Smoking)) = P\left(Cancer\right)\end{equation*}

Explanation : No causal effect of X on Y, because they are d-separated (conditioning on the observed variables if any).
## just check P(Cancer) in the bn `obs1`
(obs1.cpt( "Smoking" ) * obs1.cpt( "Cancer" )).sumIn([ "Cancer" ])
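The two causal models give different answers from the same observations, and the arithmetic is short enough to spell out. In the sketch below (plain Python, with illustrative names that are not pyAgrum objects), the first model equates $P(Cancer \mid \text{do}(Smoking))$ with the conditional $P(Cancer \mid Smoking)$, while the second equates it with the marginal $P(Cancer)$.

```python
# Same observational parameters as obs1, written as plain lists (illustrative names).
p_smoking = [0.6, 0.4]
p_cancer_given_smoking = [[0.9, 0.1], [0.7, 0.3]]

# Model 1 (Smoking -> Cancer, no confounder): intervening equals conditioning.
do_model1 = p_cancer_given_smoking[1]      # P(Cancer | do(Smoking=1))

# Model 2 (latent Genotype only): the intervention has no effect,
# so the answer is the marginal P(Cancer), summed over Smoking.
p_cancer = [sum(p_smoking[s] * p_cancer_given_smoking[s][c] for s in range(2))
            for c in range(2)]
do_model2 = p_cancer                       # P(Cancer | do(Smoking=1))

print("model 1:", [round(p, 4) for p in do_model1])
print("model 2:", [round(p, 4) for p in do_model2])
```

Under the first model, forcing everyone to smoke raises the probability of cancer to 0.3; under the second it stays at the population rate of 0.18.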
In a diplomatic effort, both parties agree that there must be some truth in both models:
## a latent variable exists between Smoking and Cancer but the direct causal relation exists also
modele3 = csl.CausalModel(obs1, [( "Genotype" , [ "Smoking" , "Cancer" ])], True )
cslnb.showCausalImpact(modele3, "Cancer" , "Smoking" , values = { "Smoking" : 1 })
Causal Model (graph): Genotype -> Smoking, Genotype -> Cancer, Smoking -> Cancer

Hedge Error: G={'Smoking', 'Cancer'}, G[S]={'Cancer'}

Explanation : Impossible

Impact : No result
Smoking’s causal effect on Cancer is no longer computable in such a model: from the observations alone, we cannot separate the impact of the two causes (the Genotype and Smoking itself).
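To see why the observations cannot settle the question, it helps to build two "worlds" that are both special cases of this mixed model and that produce exactly the same joint distribution over (Smoking, Cancer). The sketch below is plain Python. World A reuses the parameters of `obs1`. The Genotype parameters of world B are hypothetical numbers, chosen here only so that the observed joint matches; they do not come from the notebook.

```python
from itertools import product

# World A: pure causation, the Genotype has no influence (parameters of obs1).
p_s = [0.6, 0.4]
p_c_given_s = [[0.9, 0.1], [0.7, 0.3]]
joint_a = {(s, c): p_s[s] * p_c_given_s[s][c]
           for s, c in product(range(2), repeat=2)}

# World B: pure confounding, Smoking has no influence on Cancer.
# HYPOTHETICAL parameters, picked so that the observed joint is identical.
p_g = [0.5, 0.5]                              # P(Genotype)
p_s_given_g = [[0.9, 0.1], [0.3, 0.7]]        # P(Smoking | Genotype)
p_c_given_g = [[0.98, 0.02], [0.66, 0.34]]    # P(Cancer | Genotype)
joint_b = {(s, c): sum(p_g[g] * p_s_given_g[g][s] * p_c_given_g[g][c]
                       for g in range(2))
           for s, c in product(range(2), repeat=2)}

# Causal effect P(Cancer=1 | do(Smoking=1)) in each world.
effect_a = p_c_given_s[1][1]                                   # do() equals conditioning
effect_b = sum(p_g[g] * p_c_given_g[g][1] for g in range(2))   # do() changes nothing

for key in sorted(joint_a):
    print(key, round(joint_a[key], 4), round(joint_b[key], 4))
print("effect in world A:", round(effect_a, 4))
print("effect in world B:", round(effect_b, 4))
```

Both worlds generate the same observations but disagree on the effect of the intervention, so no formula built from the observed distribution alone can return the right answer in both.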
We introduce an auxiliary factor between Smoking and Cancer: tobacco causes cancer because of the tar deposits in the lungs.
obs2 = gum.fastBN( "Smoking->Tar->Cancer;Smoking->Cancer" )
obs2.cpt( "Smoking" )[:] = [ 0.6 , 0.4 ]
obs2.cpt( "Tar" )[{ "Smoking" : 0 }] = [ 0.9 , 0.1 ]
obs2.cpt( "Tar" )[{ "Smoking" : 1 }] = [ 0.7 , 0.3 ]
obs2.cpt( "Cancer" )[{ "Tar" : 0 , "Smoking" : 0 }] = [ 0.9 , 0.1 ]
obs2.cpt( "Cancer" )[{ "Tar" : 1 , "Smoking" : 0 }] = [ 0.8 , 0.2 ]
obs2.cpt( "Cancer" )[{ "Tar" : 0 , "Smoking" : 1 }] = [ 0.7 , 0.3 ]
obs2.cpt( "Cancer" )[{ "Tar" : 1 , "Smoking" : 1 }] = [ 0.6 , 0.4 ]
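## show the BN and its three CPTs side by side
gnb.flow.row(
    obs2,
    obs2.cpt( "Smoking" ),
    obs2.cpt( "Tar" ),
    obs2.cpt( "Cancer" ),
    captions = [ "" , "$P(Smoking)$" , "$P(Tar|Smoking)$" , "$P(Cancer|Tar,Smoking)$" ],
)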
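Graph: Smoking -> Tar, Tar -> Cancer, Smoking -> Cancer

| Smoking | Tar = 0 | Tar = 1 |
|---------|---------|---------|
| 0       | 0.9000  | 0.1000  |
| 1       | 0.7000  | 0.3000  |

$P(Tar|Smoking)$

| Smoking | Tar | Cancer = 0 | Cancer = 1 |
|---------|-----|------------|------------|
| 0       | 0   | 0.9000     | 0.1000     |
| 0       | 1   | 0.8000     | 0.2000     |
| 1       | 0   | 0.7000     | 0.3000     |
| 1       | 1   | 0.6000     | 0.4000     |

$P(Cancer|Tar,Smoking)$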
modele4 = csl.CausalModel(obs2, [( "Genotype" , [ "Smoking" , "Cancer" ])])
cslnb.showCausalModel(modele4)
cslnb.showCausalImpact(modele4, "Cancer" , "Smoking" , values = { "Smoking" : 1 })
Causal Model (graph): Genotype -> Smoking, Genotype -> Cancer, Smoking -> Tar, Tar -> Cancer

\begin{equation*}P(Cancer \mid \text{do}(Smoking)) = \sum_{Tar}{P\left(Tar \mid Smoking\right) \cdot \left(\sum_{Smoking'}{P\left(Cancer \mid Smoking', Tar\right) \cdot P\left(Smoking'\right)}\right)}\end{equation*}

Explanation : frontdoor ['Tar'] found.
In this model, we are again able to compute the causal impact of Smoking on Cancer, because Tar satisfies the front-door criterion relative to the pair (Smoking, Cancer).
## just check P(Cancer|do(smoking)) in the bn `obs2`
((obs2.cpt( "Cancer" ) * obs2.cpt( "Smoking" )).sumOut([ "Smoking" ]) * obs2.cpt( "Tar" )).sumOut([ "Tar" ]).putFirst( "Cancer" )
| Smoking | Cancer = 0 | Cancer = 1 |
|---------|------------|------------|
| 0       | 0.8100     | 0.1900     |
| 1       | 0.7900     | 0.2100     |
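The same front-door computation can be spelled out term by term without pyAgrum, which makes it easy to compare with naive conditioning on Smoking. The sketch below uses plain Python containers holding the parameters of `obs2`; the function names `front_door` and `observational` are illustrative helpers, not library calls.

```python
# Parameters of obs2 as plain Python containers (illustrative names).
p_s = [0.6, 0.4]                                    # P(Smoking)
p_t_given_s = [[0.9, 0.1], [0.7, 0.3]]              # p_t_given_s[s][t] = P(Tar=t | Smoking=s)
p_c_given_st = {(0, 0): [0.9, 0.1], (0, 1): [0.8, 0.2],
                (1, 0): [0.7, 0.3], (1, 1): [0.6, 0.4]}   # key is (smoking, tar)

def front_door(s, c):
    """P(Cancer=c | do(Smoking=s)) via the front-door formula with mediator Tar."""
    return sum(
        p_t_given_s[s][t]
        * sum(p_c_given_st[(s2, t)][c] * p_s[s2] for s2 in range(2))
        for t in range(2)
    )

def observational(s, c):
    """P(Cancer=c | Smoking=s), i.e. plain conditioning in obs2."""
    return sum(p_t_given_s[s][t] * p_c_given_st[(s, t)][c] for t in range(2))

for s in range(2):
    print(f"Smoking={s}: do -> {front_door(s, 1):.4f}, "
          f"conditioning -> {observational(s, 1):.4f}")
```

With these numbers, conditioning on Smoking=1 gives a cancer probability of 0.33, whereas the front-door formula gives 0.21: part of the observed association is attributed to the latent Genotype rather than to smoking itself.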
cslnb.showCausalImpact(modele4, "Smoking" , doing = "Cancer" , knowing = { "Tar" }, values = { "Cancer" : 1 , "Tar" : 1 })
Causal Model (graph): Genotype -> Smoking, Genotype -> Cancer, Smoking -> Tar, Tar -> Cancer

\begin{equation*}P(Smoking \mid \text{do}(Cancer), Tar) = P\left(Smoking \mid Tar\right)\end{equation*}

Explanation : No causal effect of X on Y, because they are d-separated (conditioning on the observed variables if any).
cslnb.showCausalImpact(modele4, "Smoking" , doing = "Cancer" , values = { "Cancer" : 1 })
Causal Model (graph): Genotype -> Smoking, Genotype -> Cancer, Smoking -> Tar, Tar -> Cancer

\begin{equation*}P(Smoking \mid \text{do}(Cancer)) = P\left(Smoking\right)\end{equation*}

Explanation : Do-calculus computations
cslnb.showCausalImpact(modele4, "Smoking" , doing = { "Cancer" , "Tar" }, values = { "Cancer" : 1 , "Tar" : 1 })
Causal Model (graph): Genotype -> Smoking, Genotype -> Cancer, Smoking -> Tar, Tar -> Cancer

\begin{equation*}P(Smoking \mid \text{do}(Tar), \text{do}(Cancer)) = P\left(Smoking\right)\end{equation*}

Explanation : Do-calculus computations
cslnb.showCausalImpact(modele4, "Tar" , doing = { "Cancer" , "Smoking" }, values = { "Cancer" : 1 , "Smoking" : 1 })
Causal Model (graph): Genotype -> Smoking, Genotype -> Cancer, Smoking -> Tar, Tar -> Cancer

\begin{equation*}P(Tar \mid \text{do}(Cancer), \text{do}(Smoking)) = P\left(Tar \mid Smoking\right)\end{equation*}

Explanation : Do-calculus computations
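The last three results (the ones without `knowing`) can be read off the graph with a simple piece of "graph surgery": an intervention cuts every arc pointing into the intervened variable, and an intervened variable that has no directed path to the target in the cut graph cannot influence it. The sketch below is a simplified reading under that assumption, in plain Python with hypothetical helper names. It is not the algorithm pyAgrum runs, and it does not handle conditioning on observed variables.

```python
def mutilate(arcs, intervened):
    """Graph surgery: drop every arc that points into an intervened variable."""
    return [(a, b) for (a, b) in arcs if b not in intervened]

def ancestors(arcs, node):
    """Return all nodes that have a directed path to `node`."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for a, b in arcs:
            if b == current and a not in found:
                found.add(a)
                stack.append(a)
    return found

def relevant_interventions(arcs, target, intervened):
    """Intervened variables that can still reach `target` after the surgery."""
    return set(intervened) & ancestors(mutilate(arcs, intervened), target)

# Arcs of the causal model modele4, including the latent Genotype.
arcs = [("Genotype", "Smoking"), ("Genotype", "Cancer"),
        ("Smoking", "Tar"), ("Tar", "Cancer")]

print(relevant_interventions(arcs, "Smoking", {"Cancer"}))
print(relevant_interventions(arcs, "Smoking", {"Cancer", "Tar"}))
print(relevant_interventions(arcs, "Tar", {"Cancer", "Smoking"}))
```

The first two calls return an empty set, which agrees with the results reducing to $P(Smoking)$; the third keeps only Smoking, which agrees with the result depending on Smoking alone.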
csl.causalImpact(modele1, on = "Cancer" , doing = "Smoking" )[ 0 ],
csl.causalImpact(modele2, on = "Cancer" , doing = "Smoking" )[ 0 ],
csl.causalImpact(modele3, on = "Cancer" , doing = "Smoking" )[ 0 ],
csl.causalImpact(modele4, on = "Cancer" , doing = "Smoking" )[ 0 ],
Graph: Smoking -> Cancer

$P(Cancer \mid \text{do}(Smoking)) = P\left(Cancer \mid Smoking\right)$

Graph: Genotype -> Smoking, Genotype -> Cancer

$P(Cancer \mid \text{do}(Smoking)) = P\left(Cancer\right)$

Graph: Genotype -> Smoking, Genotype -> Cancer, Smoking -> Cancer

None

Graph: Genotype -> Smoking, Genotype -> Cancer, Smoking -> Tar, Tar -> Cancer

$P(Cancer \mid \text{do}(Smoking)) = \sum_{Tar}{P\left(Tar \mid Smoking\right) \cdot \left(\sum_{Smoking'}{P\left(Cancer \mid Smoking', Tar\right) \cdot P\left(Smoking'\right)}\right)}$