
Learning a CLG

One of the main features of this library is the ability to learn a CLG from data.

More precisely, what can be learned is:

  • The dependency graph of a CLG
  • The parameters of a CLG: the mu and sigma of each variable, and the coefficients of the arcs

Learning the graph

To learn the graph of a CLG (i.e. the dependencies between variables), we use a modified PC algorithm based on the work of Diego Colombo, Marloes H. Maathuis: Order-Independent Constraint-Based Causal Structure Learning (2014).

The independence test used is based on the work of Dario Simionato, Fabio Vandin: Bounding the Family-Wise Error Rate in Local Causal Discovery using Rademacher Averages (2022).

class pyagrum.clg.learning.CLGLearner(filename, *, n_sample=15, fwer_delta=0.05)


Uses Rademacher averages to guarantee the FWER (Family-Wise Error Rate) of the independence tests (see “Bounding the Family-Wise Error Rate in Local Causal Discovery using Rademacher Averages”, Dario Simionato, Fabio Vandin, 2022).

  • Parameters:
    • filename (str)
    • n_sample (int)
    • fwer_delta (float)

This function is the first step of the PC algorithm: Adjacency Search. It repeatedly applies indep_test() to remove the edges between conditionally independent variables.

  • Parameters:
    • order (List[NodeId]) – A particular order of the nodes.
    • verbose (bool) – Whether to print the process of Adjacency Search.
  • Returns:
    • C (Dict[NodeId, Set[NodeId]]) – The temporary skeleton.
    • sepset (Dict[Tuple[NodeId, NodeId], Set[NodeId]]) – The sepset (which will be used in Steps 2 and 3 of the PC algorithm).
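The adjacency-search loop can be sketched as follows. This is a self-contained illustration of the general PC step 1, not the library's implementation; `indep_test` is a stand-in oracle supplied by the caller.

```python
from itertools import combinations

def adjacency_search(nodes, indep_test):
    """Sketch of PC step 1: start from the complete undirected skeleton and
    delete the edge x-y whenever some subset Z of x's other neighbours
    makes x and y conditionally independent."""
    C = {x: set(nodes) - {x} for x in nodes}  # complete skeleton
    sepset = {}
    l = 0
    # Grow the conditioning-set size l while some node still has
    # enough neighbours to condition on.
    while any(len(C[x] - {y}) >= l for x in nodes for y in C[x]):
        for x in nodes:
            for y in list(C[x]):
                # Try every conditioning set Z of size l among x's other neighbours.
                for Z in combinations(sorted(C[x] - {y}), l):
                    if indep_test(x, y, set(Z)):
                        # x and y are separated by Z: delete the edge, record Z.
                        C[x].discard(y)
                        C[y].discard(x)
                        sepset[(x, y)] = sepset[(y, x)] = set(Z)
                        break
        l += 1
    return C, sepset
```

For instance, with an oracle that declares 0 and 2 independent given {1}, the chain skeleton 0 – 1 – 2 is recovered from the complete graph on three nodes.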

This function is an advanced version of the PC algorithm: Indep_test_Rademacher() replaces indep_test(), and the remaining undirected edges of the skeleton C are oriented by comparing the variances of the two nodes.

  • Parameters:
    • order (List[NodeId]) – A particular order of the nodes.
    • verbose (bool) – Whether to print the process of the PC algorithm.
  • Returns: C – A DAG representing the causal structure.
  • Return type: Dict[NodeId, Set[NodeId]]

Estimate Pearson’s linear correlation (using linear regression when Z is not empty).

  • Parameters:
    • X (NodeId) – The id of the first variable tested.
    • Y (NodeId) – The id of the second variable tested.
    • Z (Set[NodeId]) – The conditioned variables’ id set.

Find the Markov Boundary of variable T with FWER lower than Delta.

  • Parameters: T (NodeId) – The id of the target variable T.
  • Returns: MB – The Markov Boundary of variable T with FWER lower than Delta.
  • Return type: Set[NodeId]

Find the Parent-Children of variable T with FWER lower than Delta.

  • Parameters: T (NodeId) – The id of the target variable T.
  • Returns: The Parent-Children of variable T with FWER lower than Delta.
  • Return type: Set[NodeId]

This function is the second part of Step 1 of the PC algorithm.

  • Parameters:
    • order (List[NodeId]) – The order of the variables.
    • C (Dict[NodeId, Set[NodeId]]) – The temporary skeleton.
    • l (int) – The size of the sepset.
    • verbose (bool) – Whether to print.
  • Returns: found_edge – True if a new edge is found, False if not.
  • Return type: bool

This function is the fourth step of the PC algorithm. It orients the remaining undirected edges by comparing the variances of the two nodes.

  • Parameters:
    • C (Dict[NodeId, Set[NodeId]]) – The temporary skeleton.
    • verbose (bool) – Whether to print the process of Step4.
  • Returns:
    • C (Dict[NodeId, Set[NodeId]]) – The final skeleton (of Step 4).
    • new_oriented (bool) – Whether there is a new edge oriented in the fourth step.

This function is used to estimate the parameters of the CLG model.

  • Parameters: C (Dict[NodeId, Set[NodeId]]) – A DAG representing the causal structure.
  • Returns:
    • id2mu (Dict[NodeId, float]) – The estimated mean of each node.
    • id2sigma (Dict[NodeId, float]) – The estimated variance of each node.
    • arc2coef (Dict[Tuple[NodeId, NodeId], float]) – The estimated coefficients of each arc.

In this function, we fit the parameters of the CLG model.

  • Parameters: clg (CLG) – The CLG model whose parameters will be set.

Find all the possible combinations of X, Y and Z.

  • Returns: All the possible combinations of X, Y and Z.
  • Return type: List[Tuple[Set[NodeId], Set[NodeId]]]

Generator that iterates over all the subsets of S (from the smallest to the biggest).

  • Parameters: S (Set[NodeId]) – The set of variables.
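Such a size-ordered subset generator is a one-liner with itertools; a sketch (names illustrative, not the library API):

```python
from itertools import combinations

def subsets(S):
    """Yield every subset of S, from the empty set up to S itself,
    ordered by increasing size."""
    items = sorted(S)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)
```

For example, `subsets({1, 2})` yields `set()`, `{1}`, `{2}`, `{1, 2}` in that order.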

First uses the PC algorithm to learn the skeleton of the CLG model, then estimates the parameters, and finally creates and returns the CLG model.

  • Returns: learned_clg – The learned CLG model.
  • Return type: CLG

r_XYZ : Dict[Tuple[FrozenSet[NodeId], FrozenSet[NodeId]], List[float]]


sepset : Dict[Tuple[NodeId, NodeId], Set[NodeId]]


Use the n-MCERA to get the supremum deviation.

  • Parameters:
    • n_sample (int) – The number n of Monte-Carlo trials in the n-MCERA.
    • fwer_delta (float ∈ (0, 1]) – Threshold.
  • Returns: SD – The supremum deviation.
  • Return type: float
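The core quantity can be sketched directly from its definition: average, over n independent draws of random signs, the supremum over the function family of the signed empirical mean. This is an illustrative stand-alone computation of the n-MCERA itself (the supremum deviation is then derived from it via the paper's concentration bounds, not shown here); names are not the library API.

```python
import random

def n_mcera(family, n_sample, seed=0):
    """n-samples Monte-Carlo Empirical Rademacher Average.
    `family` is a list of same-length vectors, one per function f,
    holding the values f(x_1), ..., f(x_m) on the sample."""
    rng = random.Random(seed)
    m = len(family[0])
    total = 0.0
    for _ in range(n_sample):
        # One Monte-Carlo draw of Rademacher signs sigma in {-1, +1}^m.
        signs = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the family of the signed empirical mean.
        total += max(sum(s * v for s, v in zip(signs, f)) / m for f in family)
    return total / n_sample
```

The family containing only the zero function has n-MCERA 0, and for a family of ±1-valued constant functions the estimate stays within [0, 1].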

Perform a standard statistical test and use Bonferroni correction to correct for multiple hypothesis testing.

  • Parameters:
    • X (NodeId) – The id of the first variable tested.
    • Y (NodeId) – The id of the second variable tested.
    • Z (Set[NodeId]) – The conditioned variables’ id set.
  • Returns: True if X and Y are independent given Z, False otherwise.
  • Return type: bool
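The Bonferroni-corrected decision rule is simple to state: with n hypotheses tested, run each individual test at level fwer_delta / n, which bounds the family-wise error rate by fwer_delta. A sketch, assuming the raw test produces a p-value (the function name and signature are illustrative, not the library API):

```python
def bonferroni_indep(p_value, n_tests, fwer_delta=0.05):
    """Return True ('independent') when the p-value does NOT fall below
    the Bonferroni-corrected threshold fwer_delta / n_tests, i.e. the
    null hypothesis of independence is not rejected."""
    return p_value >= fwer_delta / n_tests
```

With 10 tests at fwer_delta = 0.05, a p-value of 0.03 no longer rejects independence (threshold 0.005), while 0.001 still does.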

This function is the third step of the PC algorithm. It orients as many of the remaining undirected edges as possible by repeated application of the three orientation rules.

  • Parameters:
    • C (Dict[NodeId, Set[NodeId]]) – The temporary skeleton.
    • verbose (bool) – Whether to print the process of this function.
  • Returns: C – The final skeleton (of Step 3).
  • Return type: Dict[NodeId, Set[NodeId]]
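To make the rule mechanics concrete, here is a sketch of the first of these orientation rules (Meek's R1) on a mixed graph stored in the same Dict[NodeId, Set[NodeId]] shape, under the assumed convention that an edge x–y is undirected when y ∈ C[x] and x ∈ C[y], and directed x → y when only y ∈ C[x]. This is an illustration of the classic rule, not the library's code.

```python
def meek_rule_1(C):
    """Meek R1: if a -> b is directed, b - c is undirected and a, c are
    not adjacent, orient b -> c (otherwise a new v-structure a -> b <- c
    would be created).  Mutates C in place; returns True if an edge was
    oriented."""
    changed = False
    for a in C:
        for b in C[a]:
            if a in C[b]:
                continue  # a-b is undirected, R1 needs a directed a -> b
            for c in list(C[b]):
                undirected_bc = b in C[c] and c in C[b]
                adjacent_ac = c in C[a] or a in C[c]
                if undirected_bc and c != a and not adjacent_ac:
                    C[c].discard(b)  # orient b -> c
                    changed = True
    return changed
```

For example, with 0 → 1 directed and 1 – 2 undirected (0 and 2 non-adjacent), the rule orients 1 → 2.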