Generalized Functional ANOVA in Closed-Form: A Unified View of Additive Explanations

Baptiste Ferrere; Fabrice Gamboa; Jean-Michel Loubes; Nicolas Bousquet

arXiv: 2605.18422 · v1 · pith:ESGERUKY · submitted 2026-05-18 · stat.ML · cs.LG · math.ST · stat.TH

Pith reviewed 2026-05-19 23:53 UTC · model grok-4.3

classification: stat.ML, cs.LG, math.ST, stat.TH

keywords: generalized functional ANOVA, Riesz basis, additive explanations, dependent inputs, Hilbert space, model interpretability, SHAP, generalized additive models

The pith

Hilbert space methods yield an explicit Riesz basis for generalized functional ANOVA on continuous dependent inputs

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a way to decompose any model prediction into additive main effects and higher-order interaction terms for continuous input variables even when those variables are statistically dependent. It achieves this by merging Hilbert space techniques with the generalized functional ANOVA to produce a Riesz basis that makes the entire decomposition explicit and directly computable. A sympathetic reader cares because dependence is the normal case in real data, yet most classical tools for interpretability either assume independence or resort to costly approximations. The construction recovers the familiar orthogonal decomposition as a special case when inputs are independent. From the same representation the authors derive a simple algorithm that estimates the decomposition directly from finite samples without requiring knowledge of the underlying model.

Core claim

By combining Hilbert space methods with the generalized functional ANOVA, we build an explicit Riesz basis for the decomposition, which makes the decomposition easy to compute. Our formulation recovers the classical independent case and its associated orthogonal decomposition. Building on this representation, we propose a simple but powerful algorithm to estimate the decomposition from a data sample in a model-agnostic setting, and we compare it empirically with several state-of-the-art explanation methods, demonstrating the effectiveness of the approach.

What carries the argument

The explicit Riesz basis for the decomposition, constructed in the Hilbert space of the input measure, which represents every term of the generalized functional ANOVA through inner products against the basis elements.
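The idea of reading coefficients off as inner products against a non-orthogonal (Riesz) basis can be illustrated in finite dimensions. The sketch below makes two assumptions: a generic finite family of basis functions evaluated on a sample, and empirical inner products. The helper name `dual_basis_coefficients` is hypothetical, and this is not the paper's construction. It only shows how the Gram matrix turns a non-orthogonal basis into coefficients via its dual (biorthogonal) basis.

```python
import numpy as np

def dual_basis_coefficients(Phi, y):
    """Coefficients of y in the (possibly non-orthogonal) basis whose
    evaluations on a sample are the columns of Phi.

    Inner products are empirical: <g, h> = mean(g * h) over the sample.
    """
    n = Phi.shape[0]
    G = Phi.T @ Phi / n                         # Gram matrix: G[j, k] = <phi_j, phi_k>
    Phi_dual = np.linalg.solve(G, Phi.T).T      # dual basis evaluations: Phi @ inv(G)
    coef = Phi_dual.T @ y / n                   # coef[k] = <y, dual_k>
    return coef, G, Phi_dual

# Example: correlated inputs make the basis {1, x1, x2} non-orthogonal
rng = np.random.default_rng(0)
n = 1_000
z = rng.standard_normal((n, 2))
x1 = z[:, 0]
x2 = 0.8 * z[:, 0] + 0.6 * z[:, 1]              # population corr(x1, x2) = 0.8
Phi = np.column_stack([np.ones(n), x1, x2])
y = 2.0 + 3.0 * x1 - 1.0 * x2
coef, G, Phi_dual = dual_basis_coefficients(Phi, y)
print(coef)                                      # approximately [2, 3, -1]
```

The dual basis satisfies $\langle \phi_j, \tilde\phi_k \rangle = \delta_{jk}$, so each coefficient is a single inner product even though the original basis functions are correlated.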

If this is right

The decomposition can be evaluated in closed form once the basis coefficients are known, without numerical integration over the joint distribution.
The independent-input orthogonal ANOVA appears as the special case in which the Riesz basis reduces to the usual product basis.
A model-agnostic estimator follows immediately by replacing population inner products with their empirical counterparts on a sample.
The same representation connects the decomposition to SHAP values and to the terms of a generalized additive model.
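The link to SHAP can be made concrete with a standard fact about Harsanyi dividends [22]. Suppose the value of a coalition $S$ is the sum of the ANOVA components $f_T$ over $T \subseteq S$, as in the conditional-expectation game when inputs are independent. Then the Shapley value of feature $j$ is $\sum_{T \ni j} f_T / |T|$.

The sketch below checks this by brute-force enumeration at a single fixed point. The helper names and component values are hypothetical. It does not describe how the paper relates its decomposition to SHAP under dependence.

```python
from itertools import combinations
from math import factorial

def game_value(components, S):
    """v(S) = sum of ANOVA components f_T(x_T) over nonempty T contained in S."""
    return sum(val for T, val in components.items() if T <= S)

def shapley_brute_force(components, p):
    """Exact Shapley values of the game above, by enumerating all coalitions."""
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for r in range(p):
            for subset in combinations(others, r):
                S = frozenset(subset)
                weight = factorial(r) * factorial(p - r - 1) / factorial(p)
                total += weight * (game_value(components, S | {j}) - game_value(components, S))
        phi.append(total)
    return phi

def shapley_equal_split(components, p):
    """Each component f_T is shared equally among the features in T."""
    return [sum(val / len(T) for T, val in components.items() if j in T) for j in range(p)]

# Hypothetical component values f_T(x_T) at one fixed point x, for p = 3 features
components = {
    frozenset({0}): 1.0, frozenset({1}): -0.5, frozenset({2}): 0.25,
    frozenset({0, 1}): 0.6, frozenset({0, 1, 2}): 0.3,
}
phi = shapley_brute_force(components, 3)
print(phi, shapley_equal_split(components, 3))   # the two lists match up to rounding
```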

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Choosing particular Riesz bases, such as orthogonal polynomials with respect to the marginal measures, could yield fast algorithms that scale to moderate dimensions.
The construction supplies a theoretical justification for applying additive explanation methods to correlated features without first transforming the data to independence.
If the Riesz basis can be adapted to mixed continuous-discrete inputs, the same closed-form route would extend to a wider class of tabular data problems.
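The suggestion of orthogonal polynomials with respect to the marginal measures can be prototyped directly from data. The sketch assumes only a one-dimensional sample drawn from one marginal. Orthonormalizing the monomials $1, x, \dots, x^d$ against the empirical measure is a Gram-Schmidt step, done here with NumPy's QR factorization for numerical stability. The helper names are hypothetical, and this is an editorial illustration rather than a construction taken from the paper.

```python
import numpy as np

def empirical_orthonormal_polynomials(x, degree):
    """Orthonormalize 1, x, ..., x**degree for the empirical measure of the sample x.

    Returns (P, R): P[:, k] is the k-th polynomial evaluated at the sample, with
    mean(P[:, j] * P[:, k]) == (j == k); R maps monomials to these polynomials.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    V = np.vander(x, degree + 1, increasing=True)   # columns: 1, x, x**2, ...
    Q, R = np.linalg.qr(V / np.sqrt(n))              # Householder QR = stable Gram-Schmidt
    signs = np.sign(np.diag(R))                      # make leading coefficients positive
    Q, R = Q * signs, R * signs[:, None]
    return Q * np.sqrt(n), R

def evaluate(t, R):
    """Evaluate the same polynomials at new points t (computes monomials @ inv(R))."""
    T = np.vander(np.atleast_1d(np.asarray(t, dtype=float)), R.shape[0], increasing=True)
    return np.linalg.solve(R.T, T.T).T

rng = np.random.default_rng(0)
x = 2.0 * rng.beta(2.0, 5.0, size=5_000) - 1.0     # a skewed marginal on [-1, 1]
P, R = empirical_orthonormal_polynomials(x, degree=4)
```

The first polynomial is the constant 1, so every higher-degree polynomial is automatically centered under the empirical marginal.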

Load-bearing premise

The input variables are continuous and the underlying Hilbert space admits a suitable Riesz basis so that the decomposition stays explicit and can be estimated from finite samples.

What would settle it

Generate synthetic continuous data with known dependence structure, compute the true generalized functional ANOVA terms by direct integration, then apply the proposed estimator and check whether the recovered terms match the true decomposition to within sampling error.
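The sketch below is one way to run this check under stated assumptions. The inputs are standard bivariate Gaussians with correlation $\rho$, and the known model is $f(x) = x_1 x_2$. Under the hierarchical-orthogonality definition of the generalized ANOVA (Hooker [27]; Chastaing et al. [12]), $f_\emptyset + f_1 + f_2$ is the $L^2$ projection of $f$ onto additive functions. Solving the resulting normal equations by hand gives $f_\emptyset = \rho$ and $f_j(t) = \frac{\rho}{1+\rho^2}(t^2 - 1)$.

The estimator here is a plain least-squares projection onto additive cubic polynomials. It is a hypothetical stand-in, not the paper's algorithm. Gaussian inputs are unbounded, unlike the $[-1,1]^p$ setting of the paper's construction; they are used only because the true terms have a closed form.

```python
import numpy as np

rho, n = 0.6, 200_000
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
x1, x2 = X[:, 0], X[:, 1]
y = x1 * x2                                     # known model: a pure product

# Ground truth derived by hand for this example (hierarchical orthogonality)
c = rho / (1.0 + rho**2)
f0_true = rho
def f_main_true(t):
    return c * (t**2 - 1.0)                     # same form for f1 and f2 by symmetry

# Stand-in estimator: empirical L2 projection onto additive cubic polynomials
A = np.column_stack([np.ones(n), x1, x1**2, x1**3, x2, x2**2, x2**3])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
f0_hat = np.mean(A @ beta)                      # equals mean(y): OLS residuals have mean 0
shift1 = np.mean(A[:, 1:4] @ beta[1:4])         # center the x1 part so that E[f1] = 0

def f1_hat(t):
    return beta[1] * t + beta[2] * t**2 + beta[3] * t**3 - shift1

grid = np.linspace(-1.5, 1.5, 61)
max_err = np.max(np.abs(f1_hat(grid) - f_main_true(grid)))
print(f0_hat, max_err)                          # close to rho and to 0, up to sampling error
```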

Figures

Figures reproduced from arXiv: 2605.18422 by Baptiste Ferrere, Fabrice Gamboa, Jean-Michel Loubes, Nicolas Bousquet.

**Figure 1.** Estimated main effects on California Housing: our method (black) vs TreeHFD (main effects) and TreeSHAP on a trained XGB. …actually quantify. Second, they are frequently computationally expensive, and in some cases formally intractable. However, these constructions are in fact closely connected to functional decompositions of the predictor. Highlighting this connection has two important consequences. First, …

**Figure 2.** Decomposition of a trained MLP on Bike Sharing. Left: network plot for a random instance of the dataset, visualizing local feature attribution and interaction. Middle and right: for the features hour and atemp, our method (black) vs KernelSHAP and DeepSHAP. …and higher-order effects, producing richer decompositions of the predictor. In parallel, generalized additive models themselves have long been used to …

**Figure 3.** Estimated main effects for Age on Census Income. Left: our method (black) vs KernelSHAP and DeepSHAP on a trained MLP. Right: our method (black) vs TreeHFD (main effects) and TreeSHAP on a trained XGB. …3 Background. The Functional ANOVA decomposition provides a mathematical framework for decomposing a real-valued square-integrable function $\nu(X)$ into a sum of components of increasing order: $\nu(X) = \nu_\emptyset + \sum_{i=1}^{p} \ldots$

**Figure 4.** Comparison of native main effects from an EBM and a NAM with those recovered by our …

**Figure 5.** Force plot of positive and negative contributions (main effects, pair effects and residual) for a random instance of Electrical Grid. …[18] to continuous random variables. This mechanism will be the key tool to obtain hierarchical orthogonality. First, let $\xi_\emptyset := 1$. Definition 4.1. For $S \subseteq [p]$, let $f_S$ denote the marginal density of $X_S$ and define $m_S := (m_j)_{j \in S} \in \mathbb{N}_+^{|S|}$. For $x \in [-1, 1]^p$ we set $\xi_S^{(m_S)}(x)$ …

**Figure 6.** Illustration of our method to estimate the …

**Figure 7.** Estimated main effects in the analytical setting for …

**Figure 8.** Estimated main effects in the analytical setting for …

**Figure 9.** Estimated main effects in the unbounded-density setting for …

**Figure 10.** Comparison between the theoretical ANOVA components and our estimator …

**Figure 11.** Estimated main effects on Bike Sharing: our method (black) vs TreeHFD (main effects) and TreeSHAP on a trained XGB. Panels: hr, weekday, holiday, season, atemp, hum, …

**Figure 12.** Estimated main effects on Bike Sharing: our method (black) vs KernelSHAP and DeepSHAP on a trained MLP.

**Figure 13.** Comparison of native main effects from an EBM with those recovered by our method on …

**Figure 14.** Estimated main effects on California Housing: our method (black) vs KernelSHAP and DeepSHAP on a trained MLP. Panels: MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, …

**Figure 15.** Comparison of native main effects from an EBM with those recovered by our method on …

**Figure 16.** Estimated main effects on Census Income: our method (black) vs TreeHFD (main effects) and TreeSHAP on a trained XGB. Panels: age, workclass, fnlwgt, education, education-num, …

**Figure 17.** Estimated main effects on Census Income: our method (black) vs KernelSHAP and DeepSHAP on a trained MLP.

**Figure 18.** Comparison of native main effects from an EBM with those recovered by our method on …

**Figure 19.** Estimated main effects on Electrical Grid: our method (black) vs TreeHFD (main effects) and TreeSHAP on a trained XGB.

**Figure 20.** Estimated main effects on Electrical Grid: our method (black) vs KernelSHAP and DeepSHAP on a trained MLP. Panels: τ1–τ4, p1–p4, …

**Figure 21.** Comparison of native main effects from an EBM with those recovered by our method on …

**Figure 22.** Estimated main effects on Diabetes: our method (black) vs TreeHFD (main effects) and TreeSHAP on a trained XGB. Panels: preg, plas, pres, skin, insu, …

**Figure 23.** Estimated main effects on Diabetes: our method (black) vs KernelSHAP and DeepSHAP on a trained MLP.

Original abstract

The functional ANOVA, or Hoeffding decomposition, provides a principled framework for interpretability by decomposing a model prediction into main effects and higher-order interactions. For independent inputs, this classical decomposition is explicit. It is closely connected to SHAP values, generalized additive models, and orthogonal polynomial expansions, and therefore constitutes a fundamental tool for additive explainability. In the more general and realistic dependent setting, however, obtaining a tractable representation and estimating the decomposition from data remain challenging. In this work, we address this problem for continuous inputs. By combining Hilbert space methods with the generalized functional ANOVA, we build an explicit Riesz basis for the decomposition, which makes the decomposition easy to compute. Our formulation recovers the classical independent case and its associated orthogonal decomposition. Building on this representation, we propose a simple but powerful algorithm to estimate the decomposition from a data sample in a model-agnostic setting, and we compare it empirically with several state-of-the-art explanation methods, demonstrating the effectiveness of the approach.
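For independent inputs, the abstract notes that the decomposition is explicit and tied to orthogonal polynomial expansions. The sketch below follows that classical route, in the spirit of polynomial chaos (cf. [7], [60]). It assumes two independent $U[-1,1]$ inputs, an illustrative polynomial model and a truncation degree `DEG`. The names are hypothetical, and this is not the paper's estimator.

The model is expanded in a tensor product of orthonormal Legendre polynomials. Grouping the coefficients by the inputs they involve then gives the Hoeffding terms: here $f_\emptyset = 1/3$, $f_1(x_1) = x_1$, $f_2(x_2) = x_2^2 - 1/3$ and $f_{12} = x_1 x_2$.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

DEG = 3                                         # truncation degree per input (illustrative choice)

def psi(m, t):
    """Legendre polynomial of degree m, orthonormal for the uniform probability on [-1, 1]."""
    return np.sqrt(2 * m + 1) * Legendre.basis(m)(t)

def f(x1, x2):
    return x1 + x2**2 + x1 * x2                 # illustrative model

# Tensor Gauss-Legendre quadrature; weights / 2 turn dx into the uniform probability measure
nodes, weights = leggauss(DEG + 2)
T1, T2 = np.meshgrid(nodes, nodes, indexing="ij")
W = np.outer(weights / 2, weights / 2)
F = f(T1, T2)

# c[m1, m2] = E[f(X) psi_m1(X1) psi_m2(X2)] for independent X1, X2 ~ U[-1, 1]
c = np.array([[np.sum(W * F * psi(m1, T1) * psi(m2, T2)) for m2 in range(DEG + 1)]
              for m1 in range(DEG + 1)])

# Group coefficients by which inputs they involve: this yields the Hoeffding terms
f0 = c[0, 0]
def f1(t):
    return sum(c[m, 0] * psi(m, t) for m in range(1, DEG + 1))
def f2(t):
    return sum(c[0, m] * psi(m, t) for m in range(1, DEG + 1))
def f12(a, b):
    return sum(c[m1, m2] * psi(m1, a) * psi(m2, b)
               for m1 in range(1, DEG + 1) for m2 in range(1, DEG + 1))
```

Because the tensor basis is orthonormal for the product measure, the grouped terms are mutually orthogonal. This is the property that the generalized, dependent-input construction has to relax.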

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a Hilbert-space Riesz-basis construction for functional ANOVA under dependent continuous inputs and recovers the independent case, but the explicitness and practical computability of that basis remain the open question.

The central new piece is the explicit Riesz-basis representation for the generalized functional ANOVA when inputs are continuous and dependent. The authors combine standard Hilbert-space arguments with the generalized decomposition to produce a basis whose coefficients are supposed to yield the main effects and interactions directly. They show that the construction reduces to the usual orthogonal Hoeffding decomposition under independence, which is a useful sanity check. They also sketch a model-agnostic estimation procedure from finite samples and run empirical comparisons against existing explanation methods.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that Hilbert-space methods applied to the generalized functional ANOVA yield an explicit Riesz basis for decomposing model predictions into additive terms and interactions when inputs are continuous and possibly dependent. The resulting representation is asserted to be directly computable, to recover the classical orthogonal Hoeffding decomposition under independence, and to support a simple model-agnostic estimation procedure from finite samples whose performance is compared empirically with existing explanation methods.

Significance. If the explicit Riesz-basis construction is valid and remains tractable for general joint distributions, the work would supply a principled, closed-form unification of additive explanations that extends beyond the independent-input case while preserving connections to SHAP, GAMs, and orthogonal expansions. The empirical comparisons constitute a concrete strength that would help establish practical utility.

major comments (2)

[Abstract and Riesz-basis construction section] Abstract and the section presenting the Riesz-basis construction: the central claim that an explicit, easily computable Riesz basis exists for arbitrary continuous joint distributions is load-bearing, yet the provided outline supplies neither the explicit form of the basis functions nor a demonstration that coefficient extraction avoids solving a Fredholm integral equation whose kernel depends on the unknown density; without this, the 'closed-form' and 'easily compute' guarantees cannot be verified.
[Recovery of independent case] Section on recovery of the independent case: the manuscript must show, via direct substitution or limit argument, that the generalized basis reduces exactly to the classical orthogonal decomposition when the joint measure factors, rather than merely stating recovery at a high level.

minor comments (2)

[Preliminaries] Clarify the precise definition of the underlying Hilbert space L²(μ) and the inner product used to define the Riesz basis, including any regularity conditions on the joint density.
[Experiments] The empirical section would benefit from reporting standard errors or confidence intervals on the explanation metrics to strengthen the comparison with state-of-the-art methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. These have helped us identify areas where the manuscript would benefit from greater explicitness and rigor. We address each major comment below and have revised the manuscript to incorporate the requested details, proofs, and clarifications.

Referee: [Abstract and Riesz-basis construction section] Abstract and the section presenting the Riesz-basis construction: the central claim that an explicit, easily computable Riesz basis exists for arbitrary continuous joint distributions is load-bearing, yet the provided outline supplies neither the explicit form of the basis functions nor a demonstration that coefficient extraction avoids solving a Fredholm integral equation whose kernel depends on the unknown density; without this, the 'closed-form' and 'easily compute' guarantees cannot be verified.

Authors: We agree that the original presentation would be strengthened by a more detailed derivation of the Riesz basis. In the revised manuscript we now include an explicit construction: the basis functions are the Riesz representers of the coordinate functionals on the subspaces of the generalized ANOVA decomposition with respect to the joint measure. These are obtained via a Gram-Schmidt orthogonalization that exploits the nested structure of the ANOVA subspaces, yielding closed-form expressions involving only the joint density evaluated at the observed points and the marginal conditionals. We further show that the expansion coefficients are inner products that reduce to expectations under the data-generating distribution; these expectations are estimated directly from samples via Monte Carlo averages and do not require solving any integral equation whose kernel involves the unknown density. A new subsection with the full derivation and a worked example for the bivariate case has been added. revision: yes
Referee: [Recovery of independent case] Section on recovery of the independent case: the manuscript must show, via direct substitution or limit argument, that the generalized basis reduces exactly to the classical orthogonal decomposition when the joint measure factors, rather than merely stating recovery at a high level.

Authors: We thank the referee for this precise request. The revised manuscript now contains a dedicated lemma with a direct substitution argument. When the joint measure factors as the product of the marginals, the inner-product structure of the generalized Riesz basis collapses to the standard L² inner product with respect to the product measure. Substituting the product form into the defining equations for the basis functions shows that they coincide exactly with the classical orthogonal polynomials (or indicator functions) of the Hoeffding decomposition. The coefficients likewise reduce to the usual centered conditional expectations. The proof is presented in full, including the verification that all cross terms vanish under independence. revision: yes

Circularity Check

0 steps flagged

Derivation is a direct Hilbert-space construction with no reduction to inputs by construction

full rationale

The paper constructs an explicit Riesz basis for the generalized functional ANOVA by combining standard Hilbert-space methods with the existing generalized ANOVA framework for continuous inputs. The abstract states that this yields a tractable decomposition that recovers the classical orthogonal Hoeffding decomposition under independence, which functions as a consistency check rather than an input. No equations or steps are shown that define a quantity in terms of itself, rename a fitted parameter as a prediction, or rely on a load-bearing self-citation whose content is unverified. The central claim remains a mathematical construction from external functional-analysis results and is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the existence of a Riesz basis within a Hilbert space for the generalized functional ANOVA operator when inputs are continuous; this is a standard functional-analysis tool rather than a new postulate, but its applicability to the decomposition is the key modeling choice.

axioms (1)

Domain assumption: input variables are continuous.
The work explicitly restricts attention to the continuous-input setting.

invented entities (1)

Riesz basis for the generalized functional ANOVA (no independent evidence)
purpose: To furnish an explicit, computable representation of the decomposition under input dependence.
The basis is constructed within the paper as the central technical device; no external empirical handle is supplied in the abstract.

pith-pipeline@v0.9.0 · 5714 in / 1242 out tokens · 36162 ms · 2026-05-19T23:53:39.247352+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

By combining Hilbert space methods with the generalized functional ANOVA, we build an explicit decomposition Riesz Basis... recovers the classical independent case and its associated orthogonal decomposition.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

$\xi_S^{(m_S)}(x) := \frac{1}{\sqrt{2^{p-|S|}}} \cdot \frac{\prod_{j \in S} \tilde{P}_{m_j}(x_j)}{f_S(x_S)}$
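Assume $\tilde P$ denotes Legendre polynomials normalized on $[-1,1]$ and $f_S > 0$ there. Then the quoted definition has a direct consequence for any $g$ that depends only on $x_S$. Integrating out the coordinates outside $S$ turns the joint density into $f_S$, which cancels the denominator:

$E[\xi_S^{(m_S)}(X)\, g(X_S)] = (2^{p-|S|})^{-1/2} \int_{[-1,1]^{|S|}} \prod_{j \in S} \tilde P_{m_j}(x_j)\, g(x_S)\, dx_S$

A sample average therefore estimates a plain Lebesgue integral even though the inputs are dependent, provided $f_S$ is available.

The Monte Carlo check below is a sketch under these assumptions: $p = 2$, $S = \{1\}$, $m_1 = 1$ and an illustrative dependent density. All function names are hypothetical, and it checks only this identity, not the paper's full estimator.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

P_DIM, S_SIZE = 2, 1                            # p = 2 inputs, S = {1}

def joint_density(x1, x2):
    """Illustrative dependent (non-product) density on [-1, 1]^2."""
    return 3.0 / 8.0 * (x1**2 + x2**2)

def marginal_density_1(x1):
    """f_1(x1): joint_density integrated over x2 in [-1, 1]."""
    return 0.75 * x1**2 + 0.25

def p_tilde_1(t):
    """Degree-1 Legendre polynomial normalized on [-1, 1] (assumed reading of P~)."""
    return np.sqrt(1.5) * t

def xi(x1):
    """xi_S^{(m_S)} for S = {1}, m_1 = 1, following the quoted formula."""
    return p_tilde_1(x1) / marginal_density_1(x1) / np.sqrt(2.0 ** (P_DIM - S_SIZE))

def g(x1):
    return np.exp(x1)                           # any test function of x_S

# Rejection sampling: uniform proposal on the square, accept with prob density / max(density)
rng = np.random.default_rng(1)
prop = rng.uniform(-1.0, 1.0, size=(900_000, 2))
keep = rng.uniform(size=900_000) < joint_density(prop[:, 0], prop[:, 1]) / 0.75
X = prop[keep]

lhs = np.mean(xi(X[:, 0]) * g(X[:, 0]))         # Monte Carlo estimate of E[xi(X) g(X_1)]
nodes, weights = leggauss(20)
rhs = np.sum(weights * p_tilde_1(nodes) * g(nodes)) / np.sqrt(2.0 ** (P_DIM - S_SIZE))
print(lhs, rhs)                                 # agree up to Monte Carlo error
```

For this choice the right-hand side equals $\sqrt{3}/e$, and no evaluation of the joint density enters it.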

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 2 internal anchors

[1]

Agarwal, R., Melnick, L., Frosst, N., Zhang, X., Lengerich, B., Caruana, R., and Hinton, G. E. (2021). Neural additive models: Interpretable machine learning with neural nets.Advances in neural information processing systems, 34:4699–4711

work page 2021
[2]

I., Salaün, T., and Brunel, N

Amoukou, S. I., Salaün, T., and Brunel, N. (2022). Accurate Shapley values for explaining tree-based models. InInternational conference on artificial intelligence and statistics, pages 2448–2465. PMLR

work page 2022
[3]

Apley, D. W. and Zhu, J. (2020). Visualizing the effects of predictor variables in black box super- vised learning models.Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(4):1059–1086

work page 2020
[4]

Arzamasov, V ., Böhm, K., and Jochem, P. (2018). Towards concise models of grid stability. In 2018 IEEE international conference on communications, control, and computing technologies for smart grids (SmartGridComm), pages 1–6. IEEE

work page 2018
[5]

Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140

work page 2015
[6]

Bénard, C. (2025). Tree Ensemble Explainability through the Hoeffding Functional Decomposi- tion and TreeHFD Algorithm.Advances in Neural Information Processing Systems

work page 2025
[7]

and Sudret, B

Blatman, G. and Sudret, B. (2011). Adaptive sparse polynomial chaos expansion based on least angle regression.Journal of computational Physics, 230(6):2345–2367

work page 2011
[8]

and von Luxburg, U

Bordt, S. and von Luxburg, U. (2023). From Shapley values to generalized additive models and back. InInternational Conference on Artificial Intelligence and Statistics, pages 709–745. PMLR

work page 2023
[9]

(2011).Functional analysis, Sobolev spaces and partial differential equations, volume 2

Brézis, H. (2011).Functional analysis, Sobolev spaces and partial differential equations, volume 2. Springer

work page 2011
[10]

Cencov, N. N. (1962). Estimation of an unknown distribution density from observations.Soviet Math., 3:1559–1566

work page 1962
[11]

Chang, C.-H., Caruana, R., and Goldenberg, A. (2021). NODE-GAM: Neural generalized additive model for interpretable deep learning.arXiv:2106.01613

work page arXiv 2021
[12]

Chastaing, G., Gamboa, F., and Prieur, C. (2012). Generalized Hoeffding-Sobol Decomposition for Dependent Variables – Application to Sensitivity Analysis.Electronic Journal of Statistics, 6:2420–2448

work page 2012
[13]

Chen, T., He, T., Benesty, M., Khotilovich, V ., Tang, Y ., Cho, H., Chen, K., Mitchell, R., Cano, I., Zhou, T., et al. (2015). Xgboost: extreme gradient boosting.R package version 0.4-2, 1(4):1–4

work page 2015
[14]

and Hilbert, D

Courant, R. and Hilbert, D. (2024).Methods of mathematical physics, volume 2. John Wiley & Sons

work page 2024
[15]

Dua, D., Graff, C., et al. (2017). Uci machine learning repository, 2017.URL http://archive. ics. uci. edu/ml, 7(1):62

work page 2017
[16]

Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression.The Annals of Statistics, 32(2). 10

work page 2004
[17]

and Gama, J

Fanaee-T, H. and Gama, J. (2014). Event labeling combining ensemble detectors and background knowledge.Progress in Artificial Intelligence, 2(2):113–127

work page 2014
[18]

Ferrere, B., Bousquet, N., Gamboa, F., Loubes, J.-M., and Muré, J. (2026). Exact Functional ANOV A Decomposition for Categorical Inputs Models.arXiv:2603.02673

work page arXiv 2026
[19]

Ghanem, R. G. and Spanos, P. D. (2003).Stochastic finite elements: a spectral approach. Courier Corporation

work page 2003
[20]

Hamidieh, K. (2018). A data-driven statistical model for predicting the critical temperature of a superconductor.Computational Materials Science, 154:346–354

work page 2018
[21]

R., Millman, K

Harris, C. R., Millman, K. J., Van Der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., et al. (2020). Array programming with NumPy. Nature, 585(7825):357–362

work page 2020
[22]

Harsanyi, J. C. (1963). A Simplified Bargaining Model for the n−Person Cooperative Game. International Economic Review, 4(2):194–220

work page 1963
[23]

Hastie, T. J. (2017). Generalized additive models. InStatistical models in S, pages 249–307. Routledge

work page 2017
[24]

and Hahn, P

Herren, A. and Hahn, P. R. (2022). Statistical aspects of SHAP: Functional ANOV A for model interpretation.arXiv:2208.09970

work page arXiv 2022
[25]

T., and Wright, M

Hiabu, M., Meyer, J. T., and Wright, M. N. (2023). Unifying local and global model explanations by functional decomposition of low dimensional structures. InInternational conference on artificial intelligence and statistics, pages 7040–7060. PMLR

work page 2023
[26]

Hoeffding, W. (1948). A Class of Statistics with Asymptotically Normal Distribution.The Annals of Mathematical Statistics, 19(3):293–325

work page 1948
[27]

Hooker, G. (2007). Generalized Functional ANOV A Diagnostics for High-Dimensional Func- tions of Dependent Variables.Journal of Computational and Graphical Statistics, 16(3):709–732

work page 2007
[28]

I., Bousquet, N., Gamboa, F., Iooss, B., and Loubes, J.-M

Idrissi, M. I., Bousquet, N., Gamboa, F., Iooss, B., and Loubes, J.-M. (2025). Hoeffding decomposition of functions of random dependent variables.Journal of Multivariate Analysis, 208:105444

work page 2025
[29]

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980

work page internal anchor Pith review Pith/arXiv arXiv 2014
[30]

Kohavi, R. et al. (1996). Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. InKdd, volume 96, pages 202–207

work page 1996
[31]

E., Venkatasubramanian, S., Scheidegger, C., and Friedler, S

Kumar, I. E., Venkatasubramanian, S., Scheidegger, C., and Friedler, S. (2020). Problems with Shapley-value-based explanations as feature importance measures. InInternational conference on machine learning, pages 5491–5500. PMLR

work page 2020
[32]

Lengerich, B., Tan, S., Chang, C.-H., Hooker, G., and Caruana, R. (2020). Purifying interaction effects with the functional ANOV A: An efficient algorithm for recovering identifiable additive models. InInternational Conference on Artificial Intelligence and Statistics, pages 2402–2412. PMLR

work page 2020
[33]

and Conklin, M

Lipovetsky, S. and Conklin, M. (2001). Analysis of regression in game theory approach.Applied stochastic models in business and industry, 17(4):319–330

work page 2001
[34]

Lou, Y ., Caruana, R., Gehrke, J., and Hooker, G. (2013). Accurate intelligible models with pairwise interactions. InProceedings of the 19th ACM SIGKDD international conference on Knowledge Discovery and Data mining, pages 623–631

work page 2013
[35]

Consistent Individualized Feature Attribution for Tree Ensembles

Lundberg, S. M., Erion, G. G., and Lee, S.-I. (2018). Consistent individualized feature attribution for tree ensembles.arXiv:1802.03888. 11

work page internal anchor Pith review Pith/arXiv arXiv 2018
[36]

Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in neural information processing systems, 30

work page 2017
[37]

Montgomery, D. C. (2017).Design and analysis of experiments. John Wiley & Sons

work page 2017
[38]

Muschalik, M., Baniecki, H., Fumagalli, F., Kolpaczki, P., Hammer, B., and Hüllermeier, E. (2024a). shapiq: Shapley interactions for machine learning.Advances in Neural Information Processing Systems, 37:130324–130357

work page
[39]

Muschalik, M., Fumagalli, F., Hammer, B., and Hüllermeier, E. (2024b). Beyond TreeSHAP: Efficient computation of any-order shapley interactions for tree ensembles. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 14388–14396

work page
[40]

Nelsen, R. B. (2006).An introduction to copulas. Springer

work page 2006
[41]

Nori, H., Jenkins, S., Koch, P., and Caruana, R. (2019). InterpretML: A unified framework for machine learning interpretability.arXiv preprint arXiv:1909.09223

work page arXiv 2019
[42]

Owen, A. B. (2014). Sobol’indices and Shapley value.SIAM/ASA Journal on Uncertainty Quantification, 2(1):245–251

work page 2014
[43]

Owen, A. B. and Prieur, C. (2017). On Shapley value for measuring importance of dependent inputs.SIAM/ASA Journal on Uncertainty Quantification, 5(1):986–1002

work page 2017
[44]

Pace, R. K. and Barry, R. (1997). Sparse spatial autoregressions.Statistics & Probability Letters, 33(3):291–297

work page 1997
[45]

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V ., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V ., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830

work page 2011
[46]

Popov, S., Morozov, S., and Babenko, A. (2019). Neural oblivious decision ensembles for deep learning on tabular data.arXiv:1909.06312

work page arXiv 2019
[47]

Radenovic, F., Dubey, A., and Mahajan, D. (2022). Neural basis models for interpretability. Advances in Neural Information Processing Systems, 35:8414–8426

work page 2022
[48]

Rahman, S. (2014). A generalized ANOV A dimensional decomposition for dependent probabil- ity measures.SIAM/ASA Journal on Uncertainty Quantification, 2(1):670–697

work page 2014
[49]

and Simon, B

Reed, M. and Simon, B. (1980). V olume 1: Functional analysis. InMethods of Modern Mathematical Physics. Elsevier

work page 1980
[50]

Why should I trust you?

Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). " Why should I trust you?" Explaining the predictions of any classifier. InProceedings of the 22nd ACM SIGKDD international conference on Knowledge Discovery and Data mining, pages 1135–1144

work page 2016
[51]

Rota, G.-C. (1964). On the foundations of combinatorial theory: I. Theory of Möbius functions. In Classic Papers in Combinatorics, pages 332–360. Springer

[52]

Shapley, L. S. (1953). A Value for n-Person Games, pages 307–318. Princeton University Press, Princeton

[53]

Shrikumar, A., Greenside, P., and Kundaje, A. (2017). Learning important features through propagating activation differences. In International Conference on Machine Learning, pages 3145–3153. PMLR

[54]

Smith, J. W., Everhart, J. E., Dickson, W. C., Knowler, W. C., and Johannes, R. S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Annual Symposium on Computer Application in Medical Care, page 261

[55]

Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. The Annals of Statistics, pages 118–171

[56]

Sundararajan, M. and Najmi, A. (2020). The many Shapley values for model explanation. In International Conference on Machine Learning, pages 9269–9278. PMLR

[57]

Szegő, G. (1939). Orthogonal polynomials, volume 23. American Mathematical Society

[58]

Tretter, C. (2008). Spectral theory of block operator matrices and applications. World Scientific

[59]

Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., et al. (2020). SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272

[60]

Xiu, D. and Karniadakis, G. E. (2002). The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM Journal on Scientific Computing, 24(2):619–644

[61]

Yang, Z., Zhang, A., and Sudjianto, A. (2021). GAMI-Net: An explainable neural network based on generalized additive models with structured interactions. Pattern Recognition, 120:108192

[62]

Yu, G., Bien, J., and Tibshirani, R. (2019). Reluctant interaction modeling. arXiv:1907.08414

A Legendre Polynomials

In this section, we rely on classical results covered extensively in [57].

Definition A.1. For any non-negative integer $m \in \mathbb{N}$, we denote by $P_m$ the Legendre polynomial of degree $m$. This family is uniquely defined by the three-term recurrence...
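The recurrence is cut off in this rendering, so the exact normalization used in the paper is not visible here. The sketch below assumes the classical convention $P_0 = 1$, $P_1(x) = x$ and $(m+1)P_{m+1}(x) = (2m+1)\,x\,P_m(x) - m\,P_{m-1}(x)$ on $[-1, 1]$, and cross-checks it against NumPy's `numpy.polynomial.legendre.legval`. The helper `normalized_legendre`, which rescales $P_m$ by $\sqrt{2m+1}$ so that the family is orthonormal under the uniform distribution on $[-1, 1]$, is hypothetical and only illustrates the kind of normalized basis such a construction may rely on.

```python
import numpy as np


def legendre(m: int, x) -> np.ndarray:
    """Evaluate P_m(x) with the classical three-term recurrence (assumed convention P_m(1) = 1)."""
    if m < 0:
        raise ValueError("degree m must be a non-negative integer")
    x = np.asarray(x, dtype=float)
    p_prev = np.ones_like(x)  # P_0(x) = 1
    if m == 0:
        return p_prev
    p_curr = x.copy()  # P_1(x) = x
    for k in range(1, m):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return p_curr


def normalized_legendre(m: int, x) -> np.ndarray:
    """Hypothetical helper: sqrt(2m + 1) * P_m has unit second moment under Uniform[-1, 1]."""
    return np.sqrt(2 * m + 1) * legendre(m, x)


# Cross-check the recurrence against NumPy's Legendre-series evaluator.
x = np.linspace(-1.0, 1.0, 11)
matches_numpy = all(
    np.allclose(legendre(m, x), np.polynomial.legendre.legval(x, [0] * m + [1]))
    for m in range(8)
)

# Gram matrix E[phi_i(X) phi_j(X)] for X ~ Uniform[-1, 1], computed with 20-point
# Gauss-Legendre quadrature (exact for these degrees); the factor 1/2 is the density.
nodes, weights = np.polynomial.legendre.leggauss(20)
degrees = range(6)
gram = np.array([
    [np.sum(weights * normalized_legendre(i, nodes) * normalized_legendre(j, nodes)) / 2
     for j in degrees]
    for i in degrees
])

print(matches_numpy)                 # True
print(np.allclose(gram, np.eye(6)))  # True
```

Gauss–Legendre quadrature is used only because it integrates these polynomial products exactly; a Monte Carlo estimate of the Gram matrix would serve the same illustrative purpose, up to sampling error.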
