The tobacco industry and some skeptical statisticians advanced the theory that smokers are genetically different from nonsmokers. A smoking gene could be a confounder that would explain the observed association.
Explanation: no causal effect of X on Y, because they are d-separated (conditioning on the observed variables, if any).
Impact: Lung Cancer = 0: 0.5824; Lung Cancer = 1: 0.4176
This constitutional hypothesis was untestable: the human genome could not be sequenced at the time.
However, this hypothesis wasn’t plausible, because the observed association was far too strong to be explained by a confounder alone.
To explain this association, another hypothesis was that a smoking gene could be a confounder but that there was still a direct causal effect of smoking on lung cancer:
Let’s now suppose that smoking causes cancer only through tar deposits, which are entirely due to the physical action of cigarettes. The causal diagram becomes:
Even if the smoking gene is unobservable, we can assess the causal effect of Smoking on Lung Cancer using the front-door method. In this case, the front-door is: Smoking→Tar→LungCancer
It consists of variables that we have observed:
We can measure the causal effect of Smoking on Tar because there are no open back-door paths between the two: Smoking←SmokingGene→LungCancer←Tar is blocked by the collider node LungCancer.
P(Tar∣do(Smoking))=P(Tar∣Smoking)
We can measure the causal effect of Tar on LungCancer: we just need to adjust for Smoking to block the “back-door path” Tar←Smoking←SmokingGene→LungCancer.
P(LungCancer∣do(Tar)) = ∑_{Smoking} P(LungCancer∣Tar,Smoking) × P(Smoking)
We can now combine these two pieces of information to obtain the causal effect of Smoking on LungCancer, reducing P(LungCancer∣do(Smoking)) to quantities that we have observed:
P(LungCancer∣do(Smoking)) = ∑_{Tar} ( P(Tar∣Smoking) × ∑_{Smoking′} P(LungCancer∣Tar,Smoking′) × P(Smoking′) )
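The front-door formula can be checked numerically on a toy model. The sketch below is only an illustration: every probability in it is made up, the variables are assumed binary, and the names (`front_door`, `ground_truth`, `observed`) are hypothetical. It builds a full model that includes the smoking gene, hides the gene, and then applies the formula to the observed joint distribution of Smoking, Tar and LungCancer only.

```python
import numpy as np

# Hypothetical toy parameters; index 0 = "no", 1 = "yes" for every variable.
p_g = np.array([0.7, 0.3])                  # P(Gene)
p_s_given_g = np.array([[0.8, 0.2],         # P(Smoking | Gene=0)
                        [0.3, 0.7]])        # P(Smoking | Gene=1)
p_t_given_s = np.array([[0.9, 0.1],         # P(Tar | Smoking=0)
                        [0.2, 0.8]])        # P(Tar | Smoking=1)
p_c_given_tg = np.array([[[0.95, 0.05],     # P(Cancer | Tar=0, Gene=0)
                          [0.80, 0.20]],    # P(Cancer | Tar=0, Gene=1)
                         [[0.70, 0.30],     # P(Cancer | Tar=1, Gene=0)
                          [0.40, 0.60]]])   # P(Cancer | Tar=1, Gene=1)

# Full joint P(G, S, T, C); summing out G gives what we can actually observe.
joint = np.einsum("g,gs,st,tgc->gstc",
                  p_g, p_s_given_g, p_t_given_s, p_c_given_tg)
observed = joint.sum(axis=0)                # P(S, T, C), gene hidden


def front_door(obs):
    """P(C | do(S)) from the observed joint P(S, T, C), indexed [s, c]."""
    p_s = obs.sum(axis=(1, 2))              # P(S)
    p_st = obs.sum(axis=2)                  # P(S, T)
    p_t_s = p_st / p_s[:, None]             # P(T | S), indexed [s, t]
    p_c_st = obs / p_st[:, :, None]         # P(C | S, T), indexed [s, t, c]
    # Inner sum: sum over s' of P(C | T, s') * P(s'), indexed [t, c]
    inner = np.einsum("stc,s->tc", p_c_st, p_s)
    # Outer sum over T, weighted by P(T | S)
    return p_t_s @ inner


def ground_truth():
    """P(C | do(S)) computed with access to the hidden gene, indexed [s, c]."""
    p_c_do_t = np.einsum("tgc,g->tc", p_c_given_tg, p_g)
    return p_t_given_s @ p_c_do_t


# Plain conditioning P(C | S), which is biased by the hidden confounder.
naive = observed.sum(axis=1) / observed.sum(axis=(1, 2))[:, None]

print("front-door  P(C=1 | do(S)):", front_door(observed)[:, 1])
print("ground truth P(C=1 | do(S)):", ground_truth()[:, 1])
print("naive        P(C=1 | S):    ", naive[:, 1])
```

In this toy model the front-door estimate matches the ground truth even though the gene never enters the computation, while plain conditioning on Smoking does not.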
Studies have shown that babies of smoking mothers tend to weigh less than average. Other studies have shown that low-birth-weight babies have a higher mortality rate than normal-birth-weight babies. The corresponding causal diagram is the following:
However, the data also showed that low-birth-weight babies of smoking mothers had lower mortality rates than low-birth-weight babies of non-smoking mothers.
An explanation for this paradoxical situation is that low birth weight is due either to a smoking mother or to a birth defect that is much more threatening to the baby’s health. The causal diagram becomes:
Pinpointing the source of this paradoxical situation becomes easy thanks to this causal diagram: it is “collider bias”. “Low Birth Weight” is a collider!
The data only concerned low-birth-weight babies (it is as if we were adjusting for “Low Birth Weight”). Knowing that the mother doesn’t smoke increases our belief that a birth defect is the cause of the low birth weight, and a birth defect is more threatening to the baby’s health. Conditioning on the collider opens the formerly blocked path Smoking→LowBirthWeight←BirthDefect→Mortality and lets non-causal information flow from Smoking to Mortality, introducing a bias.
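The collider effect can be reproduced with exact arithmetic on a small toy model. The sketch below is an illustration only: all probabilities are invented, the variables are assumed binary, Smoking and BirthDefect are assumed independent, and Mortality is assumed to depend only on LowBirthWeight and BirthDefect (smoking has no protective effect anywhere in the model).

```python
import numpy as np

# Hypothetical toy parameters; index 0 = "no", 1 = "yes".
p_s = np.array([0.7, 0.3])            # P(Smoking)
p_d = np.array([0.95, 0.05])          # P(BirthDefect), independent of Smoking
# P(LowBirthWeight=1 | Smoking=s, BirthDefect=d), indexed [s, d]
p_l1 = np.array([[0.02, 0.80],
                 [0.20, 0.85]])
# P(Mortality=1 | LowBirthWeight=l, BirthDefect=d), indexed [l, d]
p_m1 = np.array([[0.005, 0.30],
                 [0.02, 0.40]])

# P(LowBirthWeight | Smoking, BirthDefect), indexed [s, d, l]
p_l = np.stack([1 - p_l1, p_l1], axis=-1)

# Joint P(S, D, L) and joint P(S, D, L, Mortality=1)
joint = np.einsum("s,d,sdl->sdl", p_s, p_d, p_l)
joint_m1 = np.einsum("s,d,sdl,ld->sdl", p_s, p_d, p_l, p_m1)

# Mortality restricted to low-birth-weight babies, by smoking status
mort_lbw = joint_m1[:, :, 1].sum(axis=1) / joint[:, :, 1].sum(axis=1)
# Mortality over all babies, by smoking status
mort_all = joint_m1.sum(axis=(1, 2)) / joint.sum(axis=(1, 2))
# P(BirthDefect=1 | LowBirthWeight=1, Smoking)
defect_given_lbw = joint[:, 1, 1] / joint[:, :, 1].sum(axis=1)

print("P(M=1 | LBW, S)      :", mort_lbw)
print("P(M=1 | S)           :", mort_all)
print("P(Defect=1 | LBW, S) :", defect_given_lbw)
```

With these numbers, smoking raises mortality over all babies, yet among low-birth-weight babies the smokers' children show lower mortality, because a low-birth-weight baby of a non-smoker is far more likely to have a birth defect.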