Back-Door Criterion (p150)

Authors: Aymen Merrouche and Pierre-Henri Wuillemin.

This notebook follows the example from “The Book Of Why” (Pearl, 2018), chapter 4, page 150.

import pyagrum as gum
import pyagrum.lib.notebook as gnb
import pyagrum.causal as csl
import pyagrum.causal.notebook as cslnb

In a causal diagram, confounding bias is due to the flow of non-causal information between treatment $X$ and outcome $Y$ through back-door paths. To neutralize this bias, we need to block these paths.
To block a non-causal path, we adjust for a variable or a set of variables that stops the flow of information along that path. Such a set of variables satisfies what we call the “back-door” criterion. A set of variables $Z$ satisfies the back-door criterion for $(X, Y)$ if and only if:

  • $Z$ blocks all back-door paths between $X$ and $Y$. A “back-door path” is any path in the causal diagram between $X$ and $Y$ that starts with an arrow pointing into $X$.
  • No variable in $Z$ is a descendant of $X$ on a causal path. Adjusting for such a variable would block a path that carries causal information, so the estimated causal effect of $X$ on $Y$ would be biased.

If a set of variables $Z$ satisfies the back-door criterion for $(X, Y)$, the causal effect of $X$ on $Y$ is given by the formula: $P(y \mid do(x)) = \sum_{z}{P(y \mid x,z) \times P(z)}$
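To make the adjustment formula concrete, here is a minimal sketch in plain Python (no pyAgrum) on a hypothetical three-variable model $Z \rightarrow X$, $Z \rightarrow Y$, $X \rightarrow Y$, in which $Z$ satisfies the back-door criterion. The probability values and the helper names (`bern`, `prob`, `observational`, `backdoor_adjusted`) are assumptions chosen for illustration; they come neither from the book nor from the models built below.

```python
from itertools import product

# Hypothetical CPTs for the confounded model Z -> X, Z -> Y, X -> Y (all binary).
p_z1 = 0.5                                   # P(Z=1)
p_x1_given_z = {0: 0.2, 1: 0.8}              # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.3,   # P(Y=1 | X=x, Z=z), keyed by (x, z)
                 (0, 1): 0.6, (1, 1): 0.8}


def bern(p1, value):
    """Probability of `value` for a binary variable with P(value=1) = p1."""
    return p1 if value == 1 else 1.0 - p1


# Full joint distribution P(x, y, z) obtained from the chain rule.
joint = {
    (x, y, z): bern(p_z1, z) * bern(p_x1_given_z[z], x) * bern(p_y1_given_xz[(x, z)], y)
    for x, y, z in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Probability of the event described by the keyword arguments x=, y=, z=."""
    total = 0.0
    for (x, y, z), p in joint.items():
        state = {"x": x, "y": y, "z": z}
        if all(state[name] == value for name, value in fixed.items()):
            total += p
    return total


def observational(y, x):
    """Plain conditioning: P(y | x)."""
    return prob(x=x, y=y) / prob(x=x)


def backdoor_adjusted(y, x):
    """Back-door adjustment: sum over z of P(y | x, z) * P(z)."""
    return sum(prob(x=x, y=y, z=z) / prob(x=x, z=z) * prob(z=z) for z in (0, 1))


print("P(Y=1 | X=1)     =", round(observational(1, 1), 4))
print("P(Y=1 | do(X=1)) =", round(backdoor_adjusted(1, 1), 4))
```

With these made-up numbers, conditioning gives $P(Y=1 \mid X=1) = 0.70$ while the adjustment gives $P(Y=1 \mid do(X=1)) = 0.55$; the gap is the confounding bias carried by the back-door path $X \leftarrow Z \rightarrow Y$.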

e1 = gum.fastBN("X->A->Y;A->B")
e1
m1 = csl.CausalModel(e1)
cslnb.showCausalImpact(m1, "Y", doing="X", values={})
Causal Model: [figure: causal diagram of m1]

\begin{equation*}P( Y \mid \text{do}(X)) = \sum_{A}{P\left(A\mid X\right) \cdot \left(\sum_{X'}{P\left(Y\mid A\right) \cdot P\left(X'\right)}\right)}\end{equation*}

Explanation : frontdoor ['A'] found.

Impact:

         Y=0      Y=1
  X=0    0.5159   0.4841
  X=1    0.5055   0.4945

## This function returns the set of variables which satisfies the back-door criterion for (X, Y)
## None if there are no back-door paths.
setOfVars = m1.backDoor("X", "Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : None

There are no incoming arrows into $X$, so there are no back-door paths between $X$ and $Y$ (the graph already looks as it would after the graph surgery performed by the do operator). The only path between them is the directed causal path $X \rightarrow A \rightarrow Y$.

e2 = gum.fastBN("A->B->C;A->X->E->Y;B<-D->E")
e2
m2 = csl.CausalModel(e2)
gnb.show(m2)

cslnb.showCausalImpact(m2, "Y", doing="X", values={})
Causal Model: [figure: causal diagram of m2]

\begin{equation*}P( Y \mid \text{do}(X)) = \sum_{E}{P\left(E\mid X\right) \cdot \left(\sum_{X'}{P\left(Y\mid E\right) \cdot P\left(X'\right)}\right)}\end{equation*}

Explanation : frontdoor ['E'] found.

Impact:

         Y=0      Y=1
  X=0    0.2473   0.7527
  X=1    0.2099   0.7901

## This function returns the set of variables which satisfies the back-door criterion for (X, Y)
## None if there are no back-door paths.
setOfVars = m2.backDoor("X", "Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : None

There is one back-door path from $X$ to $Y$: $X \leftarrow A \rightarrow B \leftarrow D \rightarrow E \rightarrow Y$. We don’t need to control for any set of variables: this back-door path is already blocked by the collider node $B$ (two incoming arrows, $A \rightarrow B \leftarrow D$). Controlling for the collider $B$ would open this non-causal path (controlling for colliders increases bias). The causal effect flows through the directed path $X \rightarrow E \rightarrow Y$.
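The path-by-path reasoning used here can be sketched in plain Python, independently of pyAgrum. The edge list below transcribes the graph `A->B->C;A->X->E->Y;B<-D->E`; the helper names (`neighbours`, `simple_paths`, `descendants`, `is_blocked`) are hypothetical and only cover the first condition of the criterion (blocking back-door paths), not the check that the adjustment set avoids descendants of $X$.

```python
# Directed edges (parent, child) of the graph "A->B->C;A->X->E->Y;B<-D->E".
EDGES = {("A", "B"), ("B", "C"), ("A", "X"), ("X", "E"),
         ("E", "Y"), ("D", "B"), ("D", "E")}


def neighbours(node):
    """Nodes adjacent to `node`, ignoring edge direction."""
    return sorted({b for a, b in EDGES if a == node} | {a for a, b in EDGES if b == node})


def simple_paths(start, goal, path=None):
    """Yield every undirected path from `start` to `goal` without repeated nodes."""
    path = [start] if path is None else path
    if path[-1] == goal:
        yield list(path)
        return
    for nxt in neighbours(path[-1]):
        if nxt not in path:
            yield from simple_paths(start, goal, path + [nxt])


def descendants(node):
    """All nodes reachable from `node` by following arrows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for a, b in EDGES:
            if a == current and b not in found:
                found.add(b)
                stack.append(b)
    return found


def is_blocked(path, z):
    """True if `path` is blocked given the conditioning set `z` (d-separation rules)."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if collider:
            # A collider blocks unless it, or one of its descendants, is in z.
            if mid not in z and not (descendants(mid) & z):
                return True
        elif mid in z:
            # A chain or fork node blocks as soon as it is conditioned on.
            return True
    return False


# A back-door path starts with an arrow pointing into X.
backdoor_paths = [p for p in simple_paths("X", "Y") if (p[1], p[0]) in EDGES]

for p in backdoor_paths:
    for z in (set(), {"B"}, {"C"}):
        print(p, "given", sorted(z), "->", "blocked" if is_blocked(p, z) else "open")
```

Under these rules the single back-door path is blocked by the empty set and becomes open when conditioning on the collider $B$ or on its descendant $C$.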

e3 = gum.fastBN("B->X->Y;X->A<-B->Y")
e3
m3 = csl.CausalModel(e3)
cslnb.showCausalImpact(m3, "Y", doing="X", values={})
Causal Model: [figure: causal diagram of m3]

\begin{equation*}P( Y \mid \text{do}(X)) = \sum_{B}{P\left(Y\mid B,X\right) \cdot P\left(B\right)}\end{equation*}

Explanation : backdoor ['B'] found.

Impact:

         Y=0      Y=1
  X=0    0.6552   0.3448
  X=1    0.1871   0.8129

## This function returns the set of variables which satisfies the back-door criterion for (X, Y)
## None if there are no back-door paths.
setOfVars = m3.backDoor("X", "Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : {'B'}

There is one back-door path from $X$ to $Y$: $X \leftarrow B \rightarrow Y$. We need to block it by controlling for $B$, which satisfies the back-door criterion.

e4 = gum.fastBN("X<-A->B<-C->Y")
e4
m4 = csl.CausalModel(e4)
cslnb.showCausalImpact(m4, "Y", doing="X", values={})
Causal Model: [figure: causal diagram of m4]

\begin{equation*}P( Y \mid \text{do}(X)) = P\left(Y\right)\end{equation*}

Explanation : No causal effect of X on Y, because they are d-separated (conditioning on the observed variables if any).

Impact:

  Y=0      Y=1
  0.5043   0.4957

## This function returns the set of variables which satisfies the back-door criterion for (X, Y)
## None if there are no back-door paths.
setOfVars = m4.backDoor("X", "Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : None

There is one back-door path from $X$ to $Y$: $X \leftarrow A \rightarrow B \leftarrow C \rightarrow Y$. We don’t need to control for any set of variables: this back-door path is blocked by the collider node $B$, so the two variables are d-separated (deconfounded, independent). Controlling for the collider $B$ would make them dependent (introducing M-bias).
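The M-bias effect can be checked numerically with a small exact computation in plain Python. This is only a sketch: the parameters are hypothetical (fair coins for $A$ and $C$, a 0.9 chance that $X$ copies $A$ and that $Y$ copies $C$, and a deterministic collider $B = A \oplus C$) and are unrelated to the random CPTs that `gum.fastBN` generates for `e4`.

```python
from itertools import product


def bern(p1, value):
    """Probability of `value` for a binary variable with P(value=1) = p1."""
    return p1 if value == 1 else 1.0 - p1


def build_joint():
    """Exact joint over (a, b, c, x, y) for the graph X <- A -> B <- C -> Y."""
    table = {}
    for a, c, x, y in product((0, 1), repeat=4):
        b = a ^ c                                   # hypothetical collider: B = A xor C
        p = 0.25                                    # P(A=a) * P(C=c), fair coins
        p *= bern(0.9 if a else 0.1, x)             # X copies A with probability 0.9
        p *= bern(0.9 if c else 0.1, y)             # Y copies C with probability 0.9
        table[(a, b, c, x, y)] = p
    return table


def p_y1(table, x, b=None):
    """P(Y=1 | X=x), or P(Y=1 | X=x, B=b) when b is given."""
    numerator = denominator = 0.0
    for (_, b_val, _, x_val, y_val), p in table.items():
        if x_val != x or (b is not None and b_val != b):
            continue
        denominator += p
        if y_val == 1:
            numerator += p
    return numerator / denominator


joint = build_joint()
marginal_gap = p_y1(joint, 1) - p_y1(joint, 0)                 # without adjustment
conditional_gap = p_y1(joint, 1, b=0) - p_y1(joint, 0, b=0)    # after conditioning on B

print("P(Y=1|X=1) - P(Y=1|X=0)           =", round(marginal_gap, 4))
print("P(Y=1|X=1,B=0) - P(Y=1|X=0,B=0)   =", round(conditional_gap, 4))
```

In this toy model the unadjusted difference is zero, as d-separation predicts, while conditioning on $B = 0$ creates a difference of 0.64 even though $X$ has no causal effect on $Y$.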

e5 = gum.fastBN("X<-B<-A->X->Y<-C->B")
e5
m5 = csl.CausalModel(e5)
cslnb.showCausalImpact(m5, "Y", doing="X", values={})
Causal Model: [figure: causal diagram of m5]

\begin{equation*}P( Y \mid \text{do}(X)) = \sum_{C}{P\left(Y\mid C,X\right) \cdot P\left(C\right)}\end{equation*}

Explanation : backdoor ['C'] found.

Impact:

         Y=0      Y=1
  X=0    0.9698   0.0302
  X=1    0.8708   0.1292


[Figure: Game 4 and 5]

## This function returns the set of variables which satisfies the back-door criterion for (X, Y)
## None if there are no back-door paths.
setOfVars = m5.backDoor("X", "Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : {'C'}

The difference between this example and the previous one is that we added an arrow between $B$ and $X$ ($B \rightarrow X$). This opens a new back-door path between $X$ and $Y$ that isn’t blocked by any collider: $X \leftarrow B \leftarrow C \rightarrow Y$. We need to block the non-causal information that flows through it. Controlling for $B$ closes this back-door path (it prevents information from flowing between $X$ and $C$). However, $B$ is also the collider that was blocking the other back-door path, $X \leftarrow A \rightarrow B \leftarrow C \rightarrow Y$, so adjusting for $B$ opens that path. In this case, in addition to $B$, we would also have to control for $C$ or for $A$ to re-block the path we opened.

Another solution is to control for $C$ alone (it prevents information from flowing between $B$ and $Y$). $C$ satisfies the back-door criterion: it blocks the new path without reopening the one that is blocked by the collider $B$.
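The candidate adjustment sets discussed for this graph can be tested with a small stand-alone checker. The sketch below is not pyAgrum's implementation; it assumes the usual graphical formulation (no member of $Z$ is a descendant of $X$, and $Z$ d-separates $X$ from $Y$ once the arrows leaving $X$ are removed) and tests d-separation with the moralized ancestral graph method. All function names are hypothetical.

```python
from itertools import combinations

# Directed edges (parent, child) of the graph "X<-B<-A->X->Y<-C->B".
EDGES = {("B", "X"), ("A", "B"), ("A", "X"), ("X", "Y"), ("C", "Y"), ("C", "B")}


def parents(node, edges):
    return {a for a, b in edges if b == node}


def reach(starts, step):
    """Closure of `starts` under `step(node) -> set of nodes` (starts included)."""
    seen, stack = set(starts), list(starts)
    while stack:
        for nxt in step(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def d_separated(x, y, z, edges):
    """Moralized-ancestral-graph test for 'x and y are d-separated given z'."""
    # 1. Keep only x, y, z and their ancestors.
    keep = reach({x, y} | z, lambda n: parents(n, edges))
    sub = {(a, b) for a, b in edges if a in keep and b in keep}
    # 2. Moralize: drop directions and link parents that share a child.
    links = {frozenset(edge) for edge in sub}
    for node in keep:
        for p, q in combinations(sorted(parents(node, sub)), 2):
            links.add(frozenset((p, q)))

    # 3. Remove z and test whether y is still reachable from x.
    def step(n):
        return {m for link in links if n in link for m in link if m != n and m not in z}

    return y not in reach({x}, step)


def satisfies_backdoor(x, y, z, edges=EDGES):
    """Check both conditions of the back-door criterion for the set `z`."""
    descendants = reach({x}, lambda n: {b for a, b in edges if a == n}) - {x}
    if z & descendants:
        return False
    without_out_of_x = {(a, b) for a, b in edges if a != x}
    return d_separated(x, y, z, without_out_of_x)


for candidate in (set(), {"B"}, {"C"}, {"A", "B"}, {"B", "C"}):
    print(sorted(candidate), "->", satisfies_backdoor("X", "Y", candidate))
```

On this graph the checker rejects the empty set and $\{B\}$ and accepts $\{C\}$, $\{A, B\}$ and $\{B, C\}$, in line with the discussion above.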

e6 = gum.fastBN("A->X;A->B;D->A;B->X;C->B;C->E;C->Y;D->C;E->Y;E->X;F->C;F->X;F->Y;G->X;G->Y;X->Y")
e6
m6 = csl.CausalModel(e6)
cslnb.showCausalImpact(m6, "Y", doing="X", values={})
Causal Model: [figure: causal diagram of m6]

\begin{equation*}P( Y \mid \text{do}(X)) = \sum_{C,E,F,G}{P\left(Y\mid C,E,F,G,X\right) \cdot P\left(C,E,F,G\right)}\end{equation*}

Explanation : backdoor ['C', 'E', 'F', 'G'] found.

Impact:

         Y=0      Y=1
  X=0    0.6867   0.3133
  X=1    0.5960   0.4040

## This function returns the set of variables which satisfies the back-door criterion for (X, Y)
## None if there are no back-door paths.
setOfVars = m6.backDoor("X", "Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : {'F', 'C', 'G', 'E'}

Back-door paths are:

    • (1) $X \leftarrow G \rightarrow Y$
    • (2) $X \leftarrow E \rightarrow Y$ and any other back-door paths that go through $E$
    • (3) $X \leftarrow F \rightarrow Y$ and any other back-door paths that go through $F$
    • (4) $X \leftarrow A \rightarrow B \leftarrow C \rightarrow Y$, blocked by collider $B$; any other back-door paths that go through $A$ also go through $C$
    • (5) $X \leftarrow B \leftarrow C \rightarrow Y$; any other back-door paths that go through $B$ also go through $C$

Two sets of variables that satisfy the back-door criterion are:

  • $\{C, E, F, G\}$, blocking (1), (2), (3) and (5)
  • $\{A, B, E, F, G\}$, blocking (1), (2), (3) and (5), opening (4) and re-blocking it.