Learning a CLG
One of the main features of this library is the possibility to learn a CLG.
More precisely, what can be learned is:
- The dependency graph of a CLG
- The parameters of a CLG: the mu and sigma of each variable, the coefficients of the arcs
Learning the graph
To learn the graph of a CLG (i.e. the dependencies between variables), we use a modified PC algorithm based on the work of Diego Colombo and Marloes H. Maathuis: Order-Independent Constraint-Based Causal Structure Learning (2014).
The independence test used is based on the work of Dario Simionato and Fabio Vandin: Bounding the Family-Wise Error Rate in Local Causal Discovery using Rademacher Averages (2022).
class pyagrum.clg.learning.CLGLearner(filename, *, n_sample=15, fwer_delta=0.05)
Uses Rademacher averages to guarantee the FWER (Family-Wise Error Rate) in independence tests (see “Bounding the Family-Wise Error Rate in Local Causal Discovery using Rademacher Averages”, Dario Simionato, Fabio Vandin, 2022).
- Parameters:
- filename (str)
- n_sample (int)
- fwer_delta (float)
Adjacency_search(order, verbose=False)
This function performs the first step of the PC algorithm, the adjacency search, using indep_test() as the independence test.
- Parameters:
- order (List[NodeId]) – A particular order of the nodes.
- verbose (bool) – Whether to print the process of Adjacency Search.
- Returns:
- C (Dict[NodeId, Set[NodeId]]) – The temporary skeleton.
- sepset (Dict[Tuple[NodeId, NodeId], Set[NodeId]]) – The separating sets (which will be used in Steps 2 and 3 of the PC algorithm).
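To make the adjacency search more concrete, here is a minimal, self-contained sketch of an order-independent adjacency search in the spirit of Colombo and Maathuis. It is not the library's implementation: the function name `adjacency_search_sketch`, the pluggable `indep` callable, and the choice to store each separating set under both `(x, y)` and `(y, x)` are assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

NodeId = int
# Hypothetical signature for an independence test: indep(x, y, Z) -> bool
IndepTest = Callable[[NodeId, NodeId, Set[NodeId]], bool]


def adjacency_search_sketch(
    order: List[NodeId], indep: IndepTest
) -> Tuple[Dict[NodeId, Set[NodeId]], Dict[Tuple[NodeId, NodeId], Set[NodeId]]]:
    # Start from the complete undirected graph over the nodes.
    C: Dict[NodeId, Set[NodeId]] = {x: {y for y in order if y != x} for x in order}
    sepset: Dict[Tuple[NodeId, NodeId], Set[NodeId]] = {}
    l = 0
    # Continue while some node still has at least l other neighbours to condition on.
    while any(len(C[x]) - 1 >= l for x in order):
        # Order independence: conditioning sets are drawn from the adjacency
        # sets as they were at the start of this level.
        frozen = {x: set(C[x]) for x in order}
        for x in order:
            for y in sorted(frozen[x]):
                if y not in C[x]:
                    continue  # this edge was already removed at this level
                for Z in combinations(sorted(frozen[x] - {y}), l):
                    if indep(x, y, set(Z)):
                        C[x].discard(y)
                        C[y].discard(x)
                        sepset[(x, y)] = set(Z)
                        sepset[(y, x)] = set(Z)
                        break
        l += 1
    return C, sepset


# Toy oracle for the chain 0 - 1 - 2: nodes 0 and 2 are independent given 1.
def chain_oracle(x: NodeId, y: NodeId, Z: Set[NodeId]) -> bool:
    return {x, y} == {0, 2} and 1 in Z


skeleton, separators = adjacency_search_sketch([0, 1, 2], chain_oracle)
```

With the toy oracle, the edge between 0 and 2 is removed at level l = 1 and the separating set {1} is recorded for that pair.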
PC_algorithm(order, verbose=False)
This function is an advanced version of the PC algorithm. Indep_test_Rademacher() replaces indep_test() in the PC algorithm, and the undirected edges remaining in the skeleton C are oriented by comparing the variances of the two nodes.
- Parameters:
- order (List[NodeId]) – A particular order of the nodes.
- verbose (bool) – Whether to print the process of the PC algorithm.
- Returns: C – A directed acyclic graph (DAG) representing the causal structure.
- Return type: Dict[NodeId, Set[NodeId]]
Pearson_coeff(X, Y, Z)
Estimate Pearson’s linear correlation (using linear regression when Z is not empty).
- Parameters:
- X – The id of the first variable tested.
- Y – The id of the second variable tested.
- Z – The set of ids of the conditioning variables.
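As an illustration of a conditional (partial) Pearson correlation, the sketch below uses the classical recursive formula instead of an explicit linear regression; for linear relationships both approaches give the same partial correlation. The names `pearson` and `partial_corr` and the `data` dictionary layout (node id mapped to a list of samples) are assumptions for this example, not the library's API.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    # Plain Pearson correlation between two equally long samples.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(data: Dict[int, List[float]], x: int, y: int, Z: List[int]) -> float:
    # Recursive formula: remove one conditioning variable at a time.
    if not Z:
        return pearson(data[x], data[y])
    z, rest = Z[0], Z[1:]
    r_xy = partial_corr(data, x, y, rest)
    r_xz = partial_corr(data, x, z, rest)
    r_yz = partial_corr(data, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Toy data: variables 0 and 1 both depend on variable 2 plus orthogonal noise.
z = [-1, -1, -1, -1, 1, 1, 1, 1]
e1 = [-1, -1, 1, 1, -1, -1, 1, 1]
e2 = [-1, 1, -1, 1, -1, 1, -1, 1]
data = {
    0: [a + b for a, b in zip(z, e1)],
    1: [a + b for a, b in zip(z, e2)],
    2: [float(v) for v in z],
}
marginal = partial_corr(data, 0, 1, [])
conditional = partial_corr(data, 0, 1, [2])
```

In this toy data set the two variables are correlated marginally, but the correlation vanishes (up to rounding) once the common cause is conditioned on.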
RAveL_MB(T)
Find the Markov Boundary of variable T with FWER lower than Delta.
- Parameters: T (NodeId) – The id of the target variable T.
- Returns: MB – The Markov Boundary of variable T with FWER lower than Delta.
- Return type: Set[NodeId]
RAveL_PC(T)
Find the Parent-Children of variable T with FWER lower than Delta.
- Parameters: T (NodeId) – The id of the target variable T.
- Returns: The Parent-Children of variable T with FWER lower than Delta.
- Return type: Set[NodeId]
Repeat_II(order, C, l, verbose=False)
This function is the second part of Step 1 of the PC algorithm.
- Parameters:
- order (List[NodeId]) – The order of the variables.
- C (Dict[NodeId, Set[NodeId]]) – The temporary skeleton.
- l (int) – The size of the sepset.
- verbose (bool) – Whether to print.
- Returns: found_edge – True if a new edge is found, False if not.
- Return type: bool
Step4(C, verbose=False)
This function is the fourth step of the PC algorithm. It orients the remaining undirected edges by comparing the variances of the two nodes.
- Parameters:
- C (Dict[NodeId, Set[NodeId]]) – The temporary skeleton.
- verbose (bool) – Whether to print the process of Step4.
- Returns:
- C (Dict[NodeId, Set[NodeId]]) – The final skeleton (of Step4).
- new_oriented (bool) – Whether there is a new edge oriented in the fourth step.
estimate_parameters(C)
This function is used to estimate the parameters of the CLG model.
- Parameters: C (Dict[NodeId, Set[NodeId]]) – A directed acyclic graph (DAG) representing the causal structure.
- Returns:
- id2mu (Dict[NodeId, float]) – The estimated mean of each node.
- id2sigma (Dict[NodeId, float]) – The estimated variance of each node.
- arc2coef (Dict[Tuple[NodeId, NodeId], float]) – The estimated coefficients of each arc.
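The following sketch illustrates one way such parameters could be estimated by ordinary least squares, restricted to the special case where every node has at most one parent. It rests on several assumptions that are not stated in this documentation: `C` is taken to map each node to the set of its children, the value stored per node in `id2mu` is the regression intercept, and the value stored in `id2sigma` is the standard deviation of the residuals. The function name and the `samples` argument are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

NodeId = int


def estimate_parameters_sketch(
    C: Dict[NodeId, Set[NodeId]], samples: Dict[NodeId, List[float]]
) -> Tuple[Dict[NodeId, float], Dict[NodeId, float], Dict[Tuple[NodeId, NodeId], float]]:
    # Assumption: C maps each node to its children; invert it to get parents.
    parents: Dict[NodeId, List[NodeId]] = {n: [] for n in C}
    for p, children in C.items():
        for c in children:
            parents[c].append(p)

    id2mu, id2sigma, arc2coef = {}, {}, {}
    for node, pa in parents.items():
        y = samples[node]
        n = len(y)
        mean_y = sum(y) / n
        if not pa:
            # Root node: the intercept is the sample mean.
            intercept = mean_y
            resid = [v - mean_y for v in y]
        elif len(pa) == 1:
            # Simple linear regression of the node on its single parent.
            x = samples[pa[0]]
            mean_x = sum(x) / n
            sxx = sum((u - mean_x) ** 2 for u in x)
            sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
            slope = sxy / sxx
            intercept = mean_y - slope * mean_x
            resid = [v - intercept - slope * u for u, v in zip(x, y)]
            arc2coef[(pa[0], node)] = slope
        else:
            raise NotImplementedError("this sketch handles at most one parent per node")
        id2mu[node] = intercept
        id2sigma[node] = math.sqrt(sum(r * r for r in resid) / n)
    return id2mu, id2sigma, arc2coef


# Toy example: node 1 is exactly 1 + 2 * node 0.
dag = {0: {1}, 1: set()}
toy_samples = {0: [0.0, 1.0, 2.0, 3.0], 1: [1.0, 3.0, 5.0, 7.0]}
mu, sigma, coef = estimate_parameters_sketch(dag, toy_samples)
```

A node with several parents would need a multiple regression, which this sketch deliberately leaves out.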
fitParameters(clg)
This function fits the parameters of the CLG model.
- Parameters: clg (CLG) – The CLG model whose parameters are to be updated.
static generate_XYZ(l)
Find all the possible combinations of X, Y and Z.
- Returns: All the possible combinations of X, Y and Z.
- Return type: List[Tuple[Set[NodeId], Set[NodeId]]]
static generate_subsets(S)
Generator that iterates on all the subsets of S (from the smallest to the biggest).
- Parameters: S (Set[NodeId]) – The set of variables.
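A generator with this behaviour can be sketched with `itertools.combinations`. The name `generate_subsets_sketch` is hypothetical, and whether the library's version also yields the empty set is an assumption here.

```python
from itertools import combinations
from typing import Iterator, Set


def generate_subsets_sketch(S: Set[int]) -> Iterator[Set[int]]:
    # Enumerate subsets by increasing size, so smaller subsets come first.
    items = sorted(S)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


subsets = list(generate_subsets_sketch({1, 2}))
```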
id2samples : Dict[NodeId, List]
learnCLG()
First use the PC algorithm to learn the skeleton of the CLG model. Then estimate the parameters of the CLG model. Finally, create a CLG model and return it.
- Returns: learned_clg – The learned CLG model.
- Return type: CLG
r_XYZ : Dict[Tuple[FrozenSet[NodeId], FrozenSet[NodeId]], List[float]]
sepset : Dict[Tuple[NodeId, NodeId], Set[NodeId]]
supremum_deviation(n_sample, fwer_delta)
Use the n-MCERA (n-sample Monte Carlo Empirical Rademacher Average) to get the supremum deviation.
- Parameters:
- n_sample (int) – The MC number n in n-MCERA.
- fwer_delta (float ∈ (0, 1]) – Threshold.
- Returns: SD – The supremum deviation.
- Return type: float
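For readers unfamiliar with the n-MCERA, the sketch below computes it for a finite family of functions, each given as the list of its values on the sample. It only shows the Monte Carlo average itself; how the supremum deviation bound is then derived from it and from fwer_delta is not reproduced here. The function name `n_mcera`, the `values` layout, and the `seed` argument are assumptions for illustration.

```python
import random
from typing import Sequence


def n_mcera(values: Sequence[Sequence[float]], n_sample: int, seed: int = 0) -> float:
    # values[k][i] is the value of the k-th function on the i-th sample point.
    rng = random.Random(seed)
    m = len(values[0])
    total = 0.0
    for _ in range(n_sample):
        # Draw one vector of Rademacher variables (+1 or -1 with equal probability).
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the family of the sigma-weighted empirical mean.
        total += max(sum(s * v for s, v in zip(sigma, f)) / m for f in values)
    return total / n_sample


# Toy family: a constant function and its negation, evaluated on 4 points.
family = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
estimate = n_mcera(family, n_sample=15, seed=1)
```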
test_indep(X, Y, Z)
Perform a standard statistical test and use Bonferroni correction to correct for multiple hypothesis testing.
- Parameters:
- X (NodeId) – The id of the first variable tested.
- Y (NodeId) – The id of the second variable tested.
- Z (Set[NodeId]) – The set of ids of the conditioning variables.
- Returns: True if X and Y are independent given Z, False otherwise.
- Return type: bool
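The Bonferroni correction itself can be sketched as follows: with m hypotheses tested at overall level alpha, each individual test is run at level alpha / m. Which statistical test the library uses to obtain the p-values, and how it counts the number of hypotheses, is not stated here, so the function name and its inputs are assumptions.

```python
from typing import List


def bonferroni_independent(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    # Each of the m tests is run at level alpha / m.
    threshold = alpha / len(p_values)
    # True means independence is NOT rejected for that test.
    return [p >= threshold for p in p_values]


decisions = bonferroni_independent([0.001, 0.02, 0.5], alpha=0.05)
```

In this example the second p-value (0.02) would be significant at the uncorrected level 0.05, but not at the corrected level 0.05 / 3.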
three_rules(C, verbose=False)
This function is the third step of the PC algorithm. It orients as many of the remaining undirected edges as possible by repeated application of the three rules.
- Parameters:
- C (Dict[NodeId, Set[NodeId]]) – The temporary skeleton.
- verbose (bool) – Whether to print the process of this function.
- Returns: C – The final skeleton (of Step3).
- Return type: Dict[NodeId, Set[NodeId]]
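As an illustration of this kind of orientation rule, the sketch below applies only the first of the classical PC orientation rules: if a → b, b — c, and a and c are not adjacent, then orient b → c. It assumes a particular encoding that this documentation does not confirm: `y in C[x]` means an edge from x to y, and an undirected edge is stored in both directions. The function name `apply_rule1` is hypothetical, and the other two rules are not shown.

```python
from typing import Dict, Set

NodeId = int


def apply_rule1(C: Dict[NodeId, Set[NodeId]]) -> Dict[NodeId, Set[NodeId]]:
    changed = True
    while changed:  # repeat until no more edges can be oriented
        changed = False
        for a in C:
            for b in list(C[a]):
                if a in C[b]:
                    continue  # a - b is undirected; the rule needs a -> b
                for c in list(C[b]):
                    if c == a or b not in C[c]:
                        continue  # the rule needs an undirected edge b - c
                    if c in C[a] or a in C[c]:
                        continue  # a and c must not be adjacent
                    C[c].discard(b)  # orient b -> c by dropping c -> b
                    changed = True
    return C


# Toy graph: 0 -> 1, 1 - 2, 2 - 3 (undirected edges stored both ways).
graph = {0: {1}, 1: {2}, 2: {1, 3}, 3: {2}}
oriented = apply_rule1(graph)
```

In the toy graph, orienting 1 → 2 makes the rule applicable again, which in turn orients 2 → 3.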