Learning a CTBN

One of the main features of this library is the possibility to learn a CTBN.

More precisely, what can be learned is:

- The dependency graph of a CTBN
- The CIMs of a CTBN
- (The variables and their labels from a sample)

Learning requires tools to extract data from samples. This is the role of the class pyagrum.ctbn.Trajectory and the function pyagrum.ctbn.CTBNFromData().

Before introducing the algorithms, here are some definitions:

- $M_{xx'|u}$ is the number of times a variable X goes from a state x to a state x', conditioned by an instance u of its parents. It is filled using samples.
- $M_{x|u}$ is the number of times X leaves state x, conditioned by u, i.e. $M_{x|u} = \sum_{x' \neq x} M_{xx'|u}$.
- $T_{x|u}$ is the time spent in state x, conditioned by an instance u of its parents.
- $M_{xx'|y,u}$ and $T_{x|y,u}$ are the same quantities with an additional conditioning variable Y in state y.

These quantities can be stored in pyagrum.Tensor objects.

Being conditioned by an instance means that the extracted data comes from time intervals where conditioning variables take specific values.
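As an illustration of these definitions, the following minimal sketch counts $M_{xx'|u}$ and accumulates $T_{x|u}$ for a variable X with a single parent U from one trajectory. It is plain Python and not the library's implementation: the helper name `sufficient_stats` is hypothetical, and the convention that each tuple `(time, var, state)` marks the instant at which `var` enters `state` is an assumption made for the example.

```python
from collections import defaultdict

def sufficient_stats(events, x, u, horizon):
    """Return (M, T) for variable x conditioned on the single parent u.

    ASSUMPTION: each event (time, var, state) marks the instant at which
    `var` enters `state`; events are sorted by time and start at time 0.
    M[(u_state, x_from, x_to)] counts transitions of x while u is in u_state.
    T[(u_state, x_state)] is the time x spends in x_state while u is in u_state.
    """
    M = defaultdict(int)
    T = defaultdict(float)
    current = {}      # last known state of each variable
    last_time = 0.0
    for time, var, state in events:
        # credit the elapsed interval to the (u, x) states held during it
        if x in current and u in current:
            T[(current[u], current[x])] += time - last_time
        # a change of x is a transition conditioned by the current state of u
        if var == x and x in current and u in current and current[x] != state:
            M[(current[u], current[x], state)] += 1
        current[var] = state
        last_time = time
    # close the last interval at the end of the observation window
    if x in current and u in current:
        T[(current[u], current[x])] += horizon - last_time
    return dict(M), dict(T)

events = [
    (0.0, "U", "off"), (0.0, "X", "a"),
    (1.0, "X", "b"),
    (2.5, "U", "on"),
    (3.0, "X", "a"),
    (4.0, "X", "b"),
]
M, T = sufficient_stats(events, "X", "U", horizon=5.0)
```

Here the interval during which U is "on" contributes only to the entries indexed by "on", which is what conditioning by an instance means.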

Learning parameters : learning the CIMs

Goal: find the coefficients $q_{i,j|u}$ of each CIM, i.e. $q_{x|u}$ and $q_{x \rightarrow x'|u}$.

Idea: $q_{x|u} = \frac{M_{x|u}}{T_{x|u}}$ and $P_X(x \rightarrow x') = \frac{M_{xx'|u}}{M_{x|u}} = \frac{q_{x \rightarrow x'|u}}{q_{x|u}}$. Therefore $q_{x \rightarrow x'|u} = \frac{M_{xx'|u}}{T_{x|u}}$.
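The ratios above translate directly into code. The sketch below builds the intensity matrix of one variable for a single, fixed parent instance u from plain dictionaries. The function name `cim_from_stats` and the data layout are hypothetical; it also assumes the usual CTBN convention that the diagonal entry is $-q_{x|u}$ so that each row sums to zero. It is not the code behind pyagrum.ctbn.computeCIMFromStats.

```python
def cim_from_stats(states, M, T):
    """Build the intensity matrix Q of one variable for a fixed parent instance u.

    M[(x, x2)] is the number of observed transitions x -> x2,
    T[x] is the total time spent in x. Names are illustrative only.
    """
    Q = {}
    for x in states:
        time_in_x = T.get(x, 0.0)
        leaving = 0.0
        for x2 in states:
            if x2 == x:
                continue
            # q_{x -> x'|u} = M_{xx'|u} / T_{x|u}; left at 0 if x was never visited
            rate = M.get((x, x2), 0) / time_in_x if time_in_x > 0 else 0.0
            Q[(x, x2)] = rate
            leaving += rate
        # diagonal entry: -q_{x|u}, so that each row sums to zero
        Q[(x, x)] = -leaving
    return Q

states = ["a", "b", "c"]
M = {("a", "b"): 3, ("a", "c"): 1, ("b", "a"): 2, ("c", "a"): 4}
T = {"a": 2.0, "b": 4.0, "c": 8.0}
Q = cim_from_stats(states, M, T)
print(Q[("a", "b")], Q[("a", "c")], Q[("a", "a")])  # 1.5 0.5 -2.0
```

States that were never visited get a row of zeros here; that is a choice of this sketch, and a real learner may handle missing data differently.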

Learning the graph

To learn the graph of a CTBN (i.e. the dependencies between variables), we use the CTPC algorithm from Bregoli et al. [BSS20], which builds on Nodelman et al. [NSK02]. The independence test used is based on Fisher (F) and chi2 tests to compare exponential distributions.
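To give an idea of how a PC-style search of this kind proceeds, here is a simplified sketch: each variable starts with every other variable as a candidate parent, and a candidate is dropped as soon as some separating set of the current size makes it independent. The function `ctpc_sketch`, the callable `is_independent` and the toy oracle are hypothetical names introduced for the example; the oracle ignores the conditioning set, which is a deliberate simplification. This is not the implementation used by the library.

```python
from itertools import combinations

def ctpc_sketch(variables, is_independent):
    """Return {X: set of parents of X} found by a PC-style parent search.

    is_independent(X, Y, S) -> bool is a caller-supplied test
    (hypothetical here; it stands in for a statistical or oracle test).
    """
    parents = {}
    for x in variables:
        # start from the complete graph: every other variable is a candidate parent
        candidates = [v for v in variables if v != x]
        size = 0
        # grow the size of the separating sets while enough candidates remain
        while size < len(candidates):
            for y in list(candidates):
                others = [v for v in candidates if v != y]
                for subset in combinations(others, size):
                    if is_independent(x, y, list(subset)):
                        candidates.remove(y)  # drop the arc y -> x
                        break
            size += 1
        parents[x] = set(candidates)
    return parents

# Toy oracle built from a known graph (simplified: the conditioning set is ignored).
true_parents = {"A": set(), "B": {"A"}, "C": {"B"}}

def oracle(x, y, conditioning):
    return y not in true_parents[x]

learned = ctpc_sketch(["A", "B", "C"], oracle)
print(learned == true_parents)  # True
```

Passing the test as a callable keeps the search independent of how independence is decided, so the same loop can run with an oracle or with a statistical test.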

class pyagrum.ctbn.Learner(source)

Class used to learn a CTBN (independence between variables and CIMs) using samples.

Parameters: source (str | Dict[int, List[Tuple[float, str, str]]]) – Path to the CSV file containing the samples (trajectories), or the trajectories themselves as a Python dict.

fitParameters(ctbn)

Learns the parameters of ctbn’s CIMs.

Parameters: ctbn (CTBN) – CTBN containing the CIMs to learn.

learnCTBN(template=None)

Learns a CTBN using the CTPC (continuous-time PC) algorithm. Reference: A. Bregoli, M. Scutari, F. Stella, Constraint-Based Learning for Continuous-Time Bayesian Networks, arXiv:2007.03248, 2020.

Parameters: template (CTBN) – CTBN used to find variables. If not given, variables are searched for inside the trajectories (if the trajectory is very short, some variables can be missed).
Returns: The learned ctbn.
Return type: CTBN

pyagrum.ctbn.readTrajectoryCSV(filename)

Reads trajectories from a CSV file. Storage format: {IdSample, time, var, state}.

Parameters: filename (str) – Path to the file.
Returns: The trajectories, one trajectory per sample index.
Return type: Dict[int, List[Tuple[float, str, str]]]
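To make the storage format and the returned structure concrete, the sketch below parses rows of the form (IdSample, time, var, state) into a dict of trajectories using only the standard library. It is an illustration, not the code of readTrajectoryCSV: the helper name `read_trajectories` is hypothetical, and the presence of a header row naming the four columns is an assumption.

```python
import csv
import io
from collections import defaultdict

def read_trajectories(handle):
    """Parse rows (IdSample, time, var, state) into {id: [(time, var, state), ...]}.

    ASSUMPTION: the file starts with a header row naming these four columns.
    """
    trajectories = defaultdict(list)
    for row in csv.DictReader(handle):
        trajectories[int(row["IdSample"])].append(
            (float(row["time"]), row["var"], row["state"])
        )
    return dict(trajectories)

# An in-memory file stands in for a CSV file on disk.
sample = io.StringIO(
    "IdSample,time,var,state\n"
    "0,0.0,X,a\n"
    "0,1.2,X,b\n"
    "1,0.0,X,b\n"
)
trajectories = read_trajectories(sample)
print(trajectories[0])  # [(0.0, 'X', 'a'), (1.2, 'X', 'b')]
```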

pyagrum.ctbn.CTBNFromData(data)

Constructs a CTBN and adds the corresponding variables found in the trajectories.

Warning

If data is too short, some variables or state labels might be missed.

Parameters: data (Dict[int, List[Tuple[float, str, str]]]) – The trajectories used to look for variables.
Returns: The resulting CTBN.
Return type: CTBN
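The warning about short data can be seen with a small stand-in for the discovery step. The helper `discover_variables` below is hypothetical and only collects variable names and state labels from trajectories; it does not build a CTBN and is not the library's code.

```python
def discover_variables(trajectories):
    """Map each variable name to the sorted list of labels seen in the data."""
    labels = {}
    for events in trajectories.values():
        for _time, var, state in events:
            labels.setdefault(var, set()).add(state)
    return {var: sorted(seen) for var, seen in labels.items()}

full = {0: [(0.0, "X", "a"), (0.0, "Y", "0"), (1.0, "X", "b"), (2.0, "X", "c")]}
short = {0: full[0][:3]}  # same trajectory truncated after the third event

print(discover_variables(full))   # {'X': ['a', 'b', 'c'], 'Y': ['0']}
print(discover_variables(short))  # {'X': ['a', 'b'], 'Y': ['0']}
```

The truncated trajectory never reaches state "c" of X, so that label is missing from the result. Passing a template CTBN that already declares all variables and labels avoids this problem.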

pyagrum.ctbn.computeCIMFromStats(X, M, T)

Computes a CIM (Conditional Intensity Matrix) using stats from a trajectory. Variables in the tensor are not copied but directly used in the result to avoid memory issues.

Parameters:
- X (str) – Name of the variable to compute CIM for.
- M (pyagrum.Tensor) – Tensor containing the number of transitions for each pair of X’s states.
- T (pyagrum.Tensor) – Tensor containing the time spent in each state of X before a transition.
Returns: The resulting tensor, X’s CIM.
Return type: pyagrum.Tensor

class pyagrum.ctbn.Trajectory(source, ctbn=None)

Tools to extract useful information from a trajectory. It is used for parameter and graph learning. It can be created from a dict of trajectories or from a file that contains one.

Parameters:
- source (str | Dict[int, List[Tuple[float, str, str]]]) – The path to a CSV file containing the samples, or the dict of trajectories itself.
- ctbn (CTBN) – Links the variable names in the trajectory to their pyAgrum variables. If not given, a new CTBN is created with the variables and labels found in the trajectory (warning: if the trajectory is short, some variables or labels may not be found).

data

The samples.

Type: Dict[int, List[Tuple[float, str, str]]]

ctbn

The CTBN used to link the names in the trajectory to pyAgrum variables.

Type: CTBN

timeHorizon

The time length of the trajectory.

Type: float

computeAllCIMs()

Computes the CIMs of the variables in self.ctbn. Conditioning is given by the graph of self.ctbn.

computeStats(X, U)

Computes the time spent in each state of X and its number of transitions, and returns them as pyagrum.Tensor objects.

Parameters:
- X (str) – Name of the variable.
- U (List[str]) – List of the conditioning variables' names.
Returns: The resulting tensors.
Return type: Tuple[pyagrum.Tensor, pyagrum.Tensor]

computeStatsForTests(X, Y, U)

Computes the time spent in each state of X and its number of transitions when conditioned by Y and U, and returns them as pyagrum.Tensor objects. Used for independence testing.

Parameters:
- X (str) – Name of the variable.
- Y (str) – Name of a conditioning variable not in U.
- U (List[str]) – List of the conditioning variables' names.
Returns: The resulting tensors.
Return type: Tuple[pyagrum.Tensor, pyagrum.Tensor, pyagrum.Tensor]

setStatValues(X, inst_u, Txu, Mxu)

Fills the tensors given.

Parameters:
- X (str) – Name of the variable.
- inst_u (Dict[str, str]) – Instance of conditioning variables.
- Txu (pyagrum.Tensor) – Tensor to fill. Contains the time spent in each state.
- Mxu (pyagrum.Tensor) – Tensor to fill. Contains the number of transitions between each pair of states.

setStatsForTests(X, Y, inst_u, Txu, Txyu, Mxyu)

Fills the tensors given. They are used for independence testing.

Parameters:
- X (str) – Name of the variable.
- Y (str) – Name of a conditioning variable.
- inst_u (Dict[str, str]) – Instance of conditioning variables.
- Txu (pyagrum.Tensor) – Tensor to fill. Contains the time spent in each state. Conditioned by variables in inst_u.
- Txyu (pyagrum.Tensor) – Tensor to fill. Contains the time spent in each state. Conditioned by Y and variables in inst_u.
- Mxyu (pyagrum.Tensor) – Tensor to fill. Contains the number of transitions between each pair of states. Conditioned by Y and variables in inst_u.

class pyagrum.ctbn.Stats(trajectory, X, Y, par)

Stores all tensors used for learning.

Parameters:
- trajectory (Trajectory) – Samples used to find stats.
- X (str) – Name of the variable to study.
- Y (str) – Name of the variable used for conditioning variable X.
- par (List[str]) – List of conditioning variables of X.

Mxy

Tensor containing the number of transitions the variable X makes from each of its states, for every instance of its parents and of the variable Y.

Type: pyagrum.Tensor

Mx

Tensor containing the number of transitions the variable X makes from each of its states, for every instance of its parents.

Type: pyagrum.Tensor

Tx

Tensor containing the time spent by X in each state before transitioning to another, for every instance of its parents.

Type: pyagrum.Tensor

Txy

Tensor containing the time spent by X in each state before transitioning to another, for every instance of its parents and of Y.

Type: pyagrum.Tensor

Qx

Conditional Intensity Matrix (CIM) of X.

Type: pyagrum.Tensor

QxY

Conditional Intensity Matrix (CIM) of X that includes the conditioning variable Y.

Type: pyagrum.Tensor

class pyagrum.ctbn.StatsIndepTest.FChi2Test(tr)

Bases: IndepTest

This class uses 2 independence tests: the Fisher test (F-test) and the chi2 test. To test independence between 2 variables, we first assume that they are independent. Independence holds until one of the 2 tests (F or chi2) contradicts the independence hypothesis. If the hypothesis is not rejected by either test, the variables are considered independent.

Parameters: tr (Trajectory) – Samples used to extract stats.
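The decision rule described for this class can be sketched as follows. The function `combined_independence` and its two callables are hypothetical stand-ins for the F and chi2 tests, and running the F-test first is an assumption of this sketch; only the rule "independent unless one test rejects" comes from the description.

```python
def combined_independence(f_rejects, chi2_rejects):
    """Return True when neither test rejects the independence hypothesis.

    f_rejects and chi2_rejects are zero-argument callables returning True
    when their test rejects independence (hypothetical stand-ins).
    """
    # independence is the default; the first rejection is enough to drop it
    if f_rejects():
        return False
    if chi2_rejects():
        return False
    return True

print(combined_independence(lambda: False, lambda: False))  # True
print(combined_independence(lambda: False, lambda: True))   # False
```

Because the second test runs only when the first does not reject, its statistics need not be computed when the first test already settles the question.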

addVariables(X, Y, U)

Saves variables X and Y and the conditioning set U, and generates stats to be used in statistical tests.

Parameters:
- X (str) – Name of the variable.
- Y (str) – Name of the variable to test independence from, not in U.
- U (List[str]) – List of conditioning variables.

computeChi2()

Computes the chi2-test value for every instance of the variables.

Returns: chi2-test value.
Return type: pyagrum.Tensor
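For readers unfamiliar with chi2 statistics on transition counts, the sketch below computes the textbook two-sample chi-square statistic for two histograms with unequal totals, for example the counts of transitions out of a state x with and without conditioning on Y. Whether computeChi2() uses exactly this form is an assumption; the function name `two_sample_chi2` is hypothetical, and the comparison with a critical value is left out.

```python
import math

def two_sample_chi2(r_counts, s_counts):
    """Chi-square statistic comparing two histograms with unequal totals.

    Uses sum_i (K * R_i - S_i / K)^2 / (R_i + S_i) with K = sqrt(sum(S) / sum(R)).
    Bins that are empty in both histograms are skipped.
    """
    total_r, total_s = sum(r_counts), sum(s_counts)
    if total_r == 0 or total_s == 0:
        raise ValueError("both histograms need at least one observation")
    k = math.sqrt(total_s / total_r)
    stat = 0.0
    for r, s in zip(r_counts, s_counts):
        if r + s == 0:
            continue  # no information in this bin
        stat += (k * r - s / k) ** 2 / (r + s)
    return stat

same_shape = two_sample_chi2([10, 20], [20, 40])  # proportional counts, close to 0
opposite = two_sample_chi2([10, 0], [0, 10])      # disjoint counts, large value
```

Proportional histograms give a statistic near zero, while histograms concentrated on different target states give a large one, which is the kind of evidence a test of this type uses against independence.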

computeF()

Computes the F-test value for every instance of the variables.

Returns: F-test value.
Return type: pyagrum.Tensor

getMxxGivenU(M, Y)

Parameters:
- M (pyagrum.Tensor) – A matrix M_{x, x’ | y, U}, for some instantiation U of the conditioning set and y of a specific parent.
- Y (str) – A parent.
Returns: The tensor M_{x, x’ | U} by summing over all values of y.
Return type: pyagrum.Tensor

nullStateToStateTransitionHypothesisChi2(X, Y, _)

Decides if the null state to state transition hypothesis is rejected using chi2-test.

Parameters:
- X (str) – A random variable.
- Y (str) – A parent of X.
- _ (List[str]) – A subset of the parents of X that does not contain Y.
Returns: False if X is not independent of Y given the conditioning set U.
Return type: bool

nullTimeToTransitionHypothesisF(X, Y, _)

Decides if the null time to transition hypothesis is rejected using F-test.

Parameters:
- X (str) – A random variable.
- Y (str) – A parent of X.
- _ (List[str]) – A subset of the parents of X that does not contain Y.
Returns: False if X is not independent of Y given the conditioning set U.
Return type: bool

testIndep(X, Y, U)

Parameters:
- X (str) – Name of the variable.
- Y (str) – Name of the variable to test independence from, not in U.
- U (List [**str ]) – List of conditioning variables.
Returns: True if X is independent of Y given U, False otherwise.
Return type: bool

class pyagrum.ctbn.StatsIndepTest.IndepTest

Bases: object

Base class used to test independence between 2 variables given some other parents.

abstractmethod testIndep(X, Y, U)

Parameters:
- X (str) – Head of the arc we want to test.
- Y (str) – Tail of the arc we want to test.
- U (List [**str ]) – Known parents.
Return type: bool

class pyagrum.ctbn.StatsIndepTest.Oracle(ctbn)

Bases: IndepTest

Oracle testing tools: the independence test is answered from the graph of a known CTBN.

Parameters: ctbn (CTBN)

testIndep(X, Y, U)

Parameters:
- X (str) – Head of the arc we want to test.
- Y (str) – Tail of the arc we want to test.
- U (List [**str ]) – Known parents.
Returns: False if there is an arc from Y to X knowing U, True otherwise.
Return type: bool

pyagrum.ctbn.StatsIndepTest.sqrtTensor(tensor)

Applies sqrt function to all values inside the tensor.

Parameters: tensor (pyagrum.Tensor) – Tensor to apply sqrt to.
Returns: sqrt of tensor.
Return type: pyagrum.Tensor

Bibliography for CTBN