
Fast implementation of SHAP Values in pyAgrum>=2.3.0


This notebook illustrates the computational time improvement achieved by the new version of the SHAP values implementation in the pyAgrum library.

These experiments are certainly not sufficient to provide a comprehensive analysis of the behavior of our new implementations. In particular, the evaluation has so far been limited to relatively small Bayesian networks. Nevertheless, the results offer valuable insights into the potential improvements achievable in the computation of SHAP values. They highlight both the computational gains made possible by exploiting graphical model factorization and the promising scalability of the proposed approach when applied to larger and more complex networks.

Note: SHAP stands for SHapley Additive exPlanations. In general, SHAP aims to provide an additive decomposition of a predictive function $f(x_1, \ldots, x_n) = P(Y = 1 \mid x_1, \ldots, x_n)$. A Bayesian network defined over the variables $(Y, X_1, \ldots, X_n)$ can indeed be used to efficiently compute such conditional probabilities. However, it can also be employed to estimate other types of probabilities, such as the likelihood of an observation vector $P(x_1, \ldots, x_n)$. We propose to extend the SHAP framework to this case by introducing (to the best of our knowledge) the original concept of SHALL values (SHapley Additive LogLikelihood), which provide an additive explanation of the likelihood function itself.

Introduction

Recall that SHAP values are an explanation method that assigns to each feature of a model an importance score representing the effect of including that feature on the model's prediction. This importance score is given by the formula:

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right]$$

where $F$ is the set of all features of the model, $f$ is the model itself, and $x$ is the instance we aim to explain. Note that if we assume that a model call on a subset of features $S$, denoted $f_S(x_S)$, has a constant execution time $\tau$ regardless of the subset $S$ (which is not completely true), then we can upper bound the total explanation time by $2^{|F|} \cdot |F| \cdot N \cdot \tau$, where $N$ is the number of instances in the dataset we aim to explain. We now know which parameters affect the execution time, namely: the number of features and the size of the dataset.
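The subset enumeration behind this formula can be made concrete with a short brute-force sketch. The snippet below is pure Python, not pyAgrum code: `shapley_value` and the additive `effects` toy model are illustrative stand-ins for $f_S(x_S)$, chosen only to show how the weights and subsets combine.

```python
from itertools import combinations
from math import factorial

def shapley_value(i, features, value):
    """Brute-force Shapley value of feature i.

    `value(S)` plays the role of f_S(x_S): the model restricted to the
    feature subset S. Every subset of F \\ {i} is enumerated, exactly as
    in the formula above."""
    others = [f for f in features if f != i]
    n = len(features)
    phi = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += weight * (value(frozenset(S) | {i}) - value(frozenset(S)))
    return phi

# Toy additive stand-in for the model: f_S(x_S) = sum of per-feature effects.
effects = {"A": 0.1, "B": 0.25, "C": -0.05}
value = lambda S: sum(effects[f] for f in S)

phi = {f: shapley_value(f, list(effects), value) for f in effects}
```

For this purely additive toy model, each $\phi_i$ recovers the feature's own effect, and the $2^{|F|-1}$ subset evaluations per feature are exactly what makes the naive approach scale as $2^{|F|} \cdot |F| \cdot N \cdot \tau$.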

The computation of SHAP values within Bayesian networks can be substantially simplified by exploiting the factorization properties inherent to graphical models. Bayesian networks explicitly encode conditional dependencies among variables, allowing the joint probability distribution to be decomposed into a product of local conditional probabilities. This structural factorization enables the isolation of variable-specific contributions to the prediction without requiring an exhaustive evaluation over the entire input space, which would otherwise be computationally prohibitive.

The new version of the algorithm leverages this property more comprehensively by identifying relevant subgraphs for each explanatory variable and computing SHAP values only over the conditional dependencies that are truly informative. As a result, the overall computation becomes significantly faster and more stable, since the estimation relies on localized, probabilistically consistent subsets of variables. By integrating the factorization structure of Bayesian networks into the SHAP computation process, the updated approach achieves a notable improvement in computational efficiency while preserving the interpretability and fidelity of the explanations.
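As a minimal illustration of the factorization argument (a hand-rolled toy chain network with made-up CPTs, not pyAgrum code), the joint distribution of a chain $A \to B \to C$ decomposes into local factors, so marginalizing $C$ only touches the single factor $P(C \mid B)$ rather than the full joint:

```python
from itertools import product

# Toy chain Bayesian network A -> B -> C with binary variables (illustrative CPTs).
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P_B_given_A[a][b]
P_C_given_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # P_C_given_B[b][c]

def joint(a, b, c):
    # The factorization encoded by the graph: P(a, b, c) = P(a) P(b|a) P(c|b).
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

def marginal_AB(a, b):
    # Summing out C only involves the local factor P(c|b): no need to
    # re-enumerate the full joint, which is the property exploited here.
    return P_A[a] * P_B_given_A[a][b] * sum(P_C_given_B[b][c] for c in (0, 1))

# Brute-force check over the full joint, for comparison.
brute = {(a, b): sum(joint(a, b, c) for c in (0, 1))
         for a, b in product((0, 1), repeat=2)}
```

The localized computation and the brute-force enumeration agree, but the former only ever manipulates small local tables, which is what drives the speed-up as networks grow.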

The impact of the dataset size

To assess the impact of dataset size, we generated a Bayesian network of fixed size ($|F| = 4$) and computed SHAP values using datasets of increasing size, ranging from $1$ to $1{,}000$ instances. We repeated this process on different Bayesian networks of size $4$ to avoid potential biases. The computation times are shown in the next figures.

Conditional SHAP Value
Marginal SHAP Value (note the log-scaled Y axis)
The empirical mean execution time for computing SHAP Values across the Bayesian networks, along with the $95\%$ confidence interval, versus dataset size.

A quick aside: if you’re expecting the execution time to grow roughly linearly, you’re absolutely right. The deviation comes from a deduplication step applied to the dataset (keeping only unique rows). We use a Bayesian network of size 4 with discrete variables (one of which is the target), each taking 5 possible values. This results in $5^3 = 125$ possible feature combinations. So, with a sufficiently large dataset, it’s very likely that all combinations will appear, effectively reducing the dataset to 125 unique instances.
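This saturation effect is easy to reproduce in a few lines of pure Python (the dataset here is synthetic and drawn uniformly, which only serves to illustrate the counting argument):

```python
import random

random.seed(0)
# 3 explanatory variables, each with 5 modalities -> at most 5**3 = 125 distinct rows.
n_features, n_values = 3, 5
dataset = [tuple(random.randrange(n_values) for _ in range(n_features))
           for _ in range(10_000)]

# The deduplication step applied before explaining: keep only unique rows.
unique_rows = set(dataset)
assert len(unique_rows) <= n_values ** n_features
# With 10,000 draws it is overwhelmingly likely that all 125 combinations occur,
# so the effective dataset size saturates at 125 and the timing curve flattens.
```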

The execution time for computing both Conditional and Marginal SHAP values has significantly decreased. The computation of Causal SHAP values is a different story. Since the current implementation in pyAgrum is incorrect, a direct comparison of execution times is not meaningful. Although our approach results in a longer execution time, it provides a correct computation. Nonetheless, we provide the execution time comparison in the next figure:

Causal SHAP Value

One notable aspect of the new version, beyond being faster, is that the execution time does not depend on the specific instance, but rather on its size, which is indeed expected. What is surprising, however, is that the previous version shows a high standard deviation across multiple instances. Recall that the width of a confidence interval depends only on the standard deviation and the chosen confidence level ($2 \cdot z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}$, where $z_{\alpha}$ is the normal critical value).
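For reference, that interval width can be sketched as a one-liner (with $z_{\alpha} = 1.96$ for a $95\%$ interval; the function name is ours, not from any library):

```python
from math import sqrt

def ci_width(sigma, n, z=1.96):
    """Width of a normal-approximation confidence interval for a mean:
    2 * z * sigma / sqrt(n), with z = 1.96 for ~95% coverage."""
    return 2 * z * sigma / sqrt(n)

# Doubling the per-instance standard deviation (as observed for the old
# version) doubles the width of the band around the mean execution time.
wide = ci_width(sigma=2.0, n=100)
narrow = ci_width(sigma=1.0, n=100)
```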

The impact of the background data

Previously, when writing the theoretical complexity formula, we assumed that a model call on a subset of features $S$, denoted $f_S(x_S)$, has a constant execution time $\tau$, regardless of the subset $S$. As we mentioned, this assumption is not strictly correct. However, this does not affect the validity of the theoretical formula, since it represents an upper bound: it is sufficient to take $\tau$ as the maximum execution time of a single classifier call. For Marginal and Causal SHAP values, the execution time depends on the size of the background dataset. To assess its impact, we fix a Bayesian network ($|F| = 4$) and generate a dataset to explain with a fixed size ($N = 10$). Then, we create multiple explainers using background datasets of varying sizes. We repeated this process on different Bayesian networks of size 4 to avoid potential biases. The computation times are shown in the following figure.

The empirical mean execution time for computing the Marginal SHAP values across 50 Bayesian networks, along with the $95\%$ confidence interval, versus background data size.

Same idea as before: if you’re expecting the execution time to grow roughly linearly, you’re absolutely right. The deviation comes from a deduplication step applied to the background dataset, while of course preserving the row frequencies in order to estimate the probabilities.
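A frequency-preserving deduplication of this kind can be sketched with a `Counter` (toy binary features and variable names of our own choosing, purely for illustration):

```python
from collections import Counter
import random

random.seed(1)
# Synthetic background sample over 2 binary features (4 possible rows).
background = [tuple(random.randrange(2) for _ in range(2)) for _ in range(1_000)]

# Collapse to unique rows *with weights*, so the empirical frequencies,
# and hence the probability estimates, are preserved.
counts = Counter(background)
probs = {row: c / len(background) for row, c in counts.items()}
# Any expectation over the background can now be taken over at most 4
# weighted rows instead of 1,000 raw rows, which flattens the timing curve.
```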

Quick analysis w.r.t. the number of nodes

We now present a preliminary analysis of the computation time of SHAP values as a function of the number of nodes in the Bayesian network.

The empirical mean execution time for computing the Conditional SHAP Values across $50$ random Bayesian networks for each size.

As expected, the computational complexity increases exponentially with network size (i.e. the number of nodes). However, the results once again demonstrate the substantial performance gains achieved by the new implementation, confirming its ability to mitigate part of this exponential growth through more efficient exploitation of the model’s factorization properties.

Conclusion

In conclusion, the new implementation of SHAP values in pyAgrum is faster than the previous one, except in the case of CausalShapleyValues, for which a fair speed comparison is not meaningful, as the earlier version does not compute them correctly.