pith. sign in

arxiv: 2604.26266 · v1 · submitted 2026-04-29 · 💻 cs.IR

Explaining the "Why": A Unified Framework for the Additive Attribution of Changes in Arbitrary Measures

Pith reviewed 2026-05-07 12:36 UTC · model grok-4.3

classification 💻 cs.IR
keywords attributioncooperative game theoryroot cause analysismeasure classificationdata analyticsSimpson's paradoxadditive attributionnon-additive measures
0
0 comments X

The pith

A classification of measures by mathematical structure yields a spectrum of attribution algorithms from approximations to exact closed-form solutions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a unified framework for explaining changes in arbitrary aggregated measures by reframing the attribution task as a cooperative game. It introduces a classification of measures according to their mathematical structure, which determines whether general approximation methods or exact closed-form solutions apply. A reader would care because prior methods fail to combine generality across measure types, holism over data dimensions and compositions, and rigorous interpretability in one approach. If the classification works as described, analysts gain a principled way to select algorithms that balance accuracy and efficiency when diagnosing why metrics shift.

Core claim

The central claim is that classifying measures based on their mathematical structure enables a spectrum of algorithms—from general approximations to exact, closed-form solutions—that offer a principled trade-off between generality and performance. This is shown through simulations confirming numerical accuracy and generality for non-additive measures, a Simpson's Paradox case study demonstrating unique interpretability, and experiments where the framework significantly outperforms existing root cause analysis systems.

What carries the argument

The classification of measures according to mathematical structure, which selects the appropriate attribution algorithm from approximations to exact solutions within the cooperative game reframing.

If this is right

  • Exact closed-form attribution becomes available for measures whose structure matches the classification criteria.
  • Non-additive measures receive consistent approximate attributions that preserve the cooperative game properties.
  • The same framework applies uniformly across data dimensions and measure compositions without custom per-measure adjustments.
  • Root cause analysis systems built on this approach achieve higher accuracy than prior methods on both synthetic and real tasks.
  • Interpretability improves in paradoxical cases such as Simpson's Paradox because attributions respect the full game structure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The classification may suggest analogous structure-based breakdowns for attribution problems in model interpretability or causal analysis.
  • Implementers could benchmark the exact solutions against sampling methods on common measures such as sums, ratios, and products to quantify speed gains.
  • The cooperative game view could be extended to dynamic settings where measures evolve over time, treating successive snapshots as sequential games.

Load-bearing premise

That reframing attribution as a cooperative game over arbitrary measures produces a holistic and rigorous solution that existing methods lack, and that the structure-based classification reliably enables both generality and performance.

What would settle it

Run the exact closed-form algorithm on a measure the classification assigns to that category and observe whether the attributed contributions sum exactly to the observed change in the aggregated measure on held-out simulated data.

Figures

Figures reproduced from arXiv: 2604.26266 by Changsheng Zhou, Dajun Chen, Peng Di, Wei Jiang, Yong Li, Zhitao Shen.

Figure 2
Figure 2. Figure 2: Accuracy against decay factor. Error bars indicate view at source ↗
read the original abstract

Explaining why aggregated measures change is a critical challenge in data analytics that existing systems struggle to address. While current attribution methods exist, they lack a unified solution that is simultaneously general for arbitrary measures, holistic across both data dimensions and measure composition, and rigorous in its interpretability. To bridge this gap, we introduce a principled framework that reframes attribution through the powerful lens of cooperative game theory. Our key contribution is a classification of measures based on their mathematical structure, which enables a spectrum of algorithms-from general approximations to exact, closed-form solutions-that offer a principled trade-off between generality and performance. We demonstrate our framework's superiority through a multi-faceted evaluation: simulations first confirm its numerical accuracy and then its generality for non-additive measures; a case study on Simpson's Paradox showcases its unique interpretability; and a final experiment proves its practical utility by significantly outperforming existing root cause analysis systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces a unified framework for attributing changes in arbitrary aggregated measures by reframing the attribution problem as a cooperative game. Its central contribution is a classification of measures according to their mathematical structure, which supports a spectrum of algorithms ranging from general approximations to exact closed-form solutions and provides a principled generality-performance trade-off. The framework is evaluated via simulations confirming numerical accuracy and applicability to non-additive measures, a Simpson's Paradox case study demonstrating interpretability, and experiments showing outperformance relative to existing root cause analysis systems.

Significance. If the central claims hold, the work would offer a significant advance in explainable data analytics by supplying a general, holistic, and rigorous method for measure attribution that extends beyond additive cases and unifies disparate approaches through game-theoretic modeling. The structure-based classification and resulting algorithmic spectrum could enable practical improvements in root cause analysis while maintaining interpretability guarantees. Strengths include the explicit handling of non-additive measures and the multi-faceted evaluation design.

major comments (1)
  1. §3 (Framework and value function construction): The definition of the characteristic function v(S) — the measure change attributable to coalition S — is not uniquely determined for arbitrary (especially non-additive) measures. Different choices of baseline, marginal contribution ordering, or interaction encoding produce different games and thus different attributions for the same data. The classification of measures addresses computational tractability after v is fixed but does not resolve this prior modeling choice, so the claims of a 'unified,' 'holistic,' and 'rigorous' solution remain conditional on an unstated canonical construction of v that may not exist without additional assumptions.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comment raises an important point about the construction of the value function, which we address below by clarifying the role of our measure classification in guiding this choice. We have revised the paper to strengthen the presentation of this aspect.

read point-by-point responses
  1. Referee: §3 (Framework and value function construction): The definition of the characteristic function v(S) — the measure change attributable to coalition S — is not uniquely determined for arbitrary (especially non-additive) measures. Different choices of baseline, marginal contribution ordering, or interaction encoding produce different games and thus different attributions for the same data. The classification of measures addresses computational tractability after v is fixed but does not resolve this prior modeling choice, so the claims of a 'unified,' 'holistic,' and 'rigorous' solution remain conditional on an unstated canonical construction of v that may not exist without additional assumptions.

    Authors: We agree that constructing v(S) involves modeling decisions, especially for non-additive measures where interactions must be encoded. Section 3 of the manuscript defines v(S) explicitly as the attributable change in the target measure for coalition S relative to a fixed baseline (typically the reference period or population), with the precise formulation determined by the measure's mathematical structure per our classification. For additive measures, v(S) reduces to the sum of marginal contributions; for non-additive cases (e.g., ratios or products), it incorporates higher-order terms via the structure-specific encoding detailed in §3.2–3.4. This classification therefore does more than address tractability: it supplies the canonical construction of v(S) for each class, ensuring the resulting game yields additive attributions that are unique within the chosen structure. Different baselines or orderings are possible in principle, but our framework restricts them to structure-preserving choices that maintain the additivity guarantee. To address the concern directly, we have expanded §3 with a new paragraph and example table illustrating how the classification dictates the v(S) definition, thereby reinforcing the unified and rigorous character of the approach. revision: yes

Circularity Check

0 steps flagged

No circularity: new classification and game-theoretic reframing are self-contained

full rationale

The paper introduces a classification of measures by mathematical structure to support a spectrum of attribution algorithms derived from cooperative game theory. No equations, fitting procedures, or self-citations are presented that reduce any claimed result to its own inputs by construction. The central reframing and classification steps are presented as novel contributions rather than tautological renamings or fitted predictions, with evaluations (simulations, Simpson's Paradox case, and root-cause comparisons) serving as external checks rather than internal re-derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is abstract-only, so the ledger records only the high-level premises stated in the abstract.

axioms (1)
  • domain assumption Measures admit a classification by mathematical structure that enables both exact and approximate attribution algorithms.
    This classification is presented as the key enabler of the spectrum of algorithms and the trade-off between generality and performance.

pith-pipeline@v0.9.0 · 5465 in / 1173 out tokens · 29873 ms · 2026-05-07T12:36:40.945611+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 3 canonical work pages

  1. [1]

    Explaining individual predictions when features are dependent: More accurate approximations to shapley values

    Kjersti Aas, Martin Jullum, and Anders Løland. Explaining individual predictions when features are dependent: More accurate approximations to shapley values. Artificial Intelligence, 298:103502, 2021

  2. [2]

    B.W. Ang. Lmdi decomposition approach: A guide for implementation.Energy Policy, 86:233–238, 2015

  3. [3]

    Factorizing changes in energy and environmental indicators through decomposition.Energy, 23(6):489–495, 1998

    B.W Ang, F.Q Zhang, and Ki-Hong Choi. Factorizing changes in energy and environmental indicators through decomposition.Energy, 23(6):489–495, 1998

  4. [4]

    A logarithmic mean divisia index decomposition of co 2 emissions from energy use in romania

    Mariana Carmelia Balanica-Dragomir, Gabriel Murariu, and Lucian Puiu Georgescu. A logarithmic mean divisia index decomposition of co 2 emissions from energy use in romania. Papers 2403.04354, arXiv.org, Mar 2024

  5. [5]

    Adtributor: Rev- enue debugging in advertising systems

    Ranjita Bhagwan, Rahul Kumar, Ramachandran Ramjee, George Varghese, Sur- jyakanta Mohapatra, Hemanth Manoharan, and Piyush Shah. Adtributor: Rev- enue debugging in advertising systems. InSymposium on Networked Systems Design and Implementation, 2014

  6. [6]

    Causal structure-based root cause analysis of outliers

    Kailash Budhathoki, Lenon Minorics, Patrick Bloebaum, and Dominik Janzing. Causal structure-based root cause analysis of outliers. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Resear...

  7. [7]

    Covert, Scott M

    Hugh Chen, Ian C. Covert, Scott M. Lundberg, and Su-In Lee. Algorithms to estimate shapley value feature attributions, 2022

  8. [8]

    Lundberg, and Su-In Lee

    Hugh Chen, Scott M. Lundberg, and Su-In Lee. Explaining a series of models by propagating shapley values.Nature Communications, 13(1), August 2022

  9. [9]

    1973 berkeley graduate admissions data

    Data Science Discovery. 1973 berkeley graduate admissions data. https: //discovery.cs.illinois.edu/dataset/berkeley/. Accessed: 2025-08-15

  10. [10]

    Root cause analysis of failures in microservices through causal discovery

    Azam Ikram, Sarthak Chakraborty, Subrata Mitra, Shiv Saini, Saurabh Bagchi, and Murat Kocaoglu. Root cause analysis of failures in microservices through causal discovery. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Information Processing Systems, volume 35, pages 31158–31170. Curran Associates, Inc., 2022

  11. [11]

    Feature relevance quantification in explainable ai: A causal perspective

    Dominik Janzing, Lenon Minorics, and Patrick Blöbaum. Feature relevance quantification in explainable ai: A causal perspective. InProceedings of the 23rd International Conference on Artificial Intelligence and Statistics, volume 108 of PMLR, pages 3227–3237, 2020

  12. [12]

    Fastshap: Real-time shapley value estimation

    Neil Jethani, Mukund Sudarshan, Rumen Watcher, and Ramesh Raskar. Fastshap: Real-time shapley value estimation. InProceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 7131–7139, 2021

  13. [13]

    Autoroot: A novel fault localization schema of multi-dimensional root causes

    Pengkun Jing, Yanni Han, Jiyan Sun, Tao Lin, and Yanjie Hu. Autoroot: A novel fault localization schema of multi-dimensional root causes. In2021 IEEE Wireless Communications and Networking Conference (WCNC), pages 1–7, 2021

  14. [14]

    Riskloc: Localization of multi-dimensional root causes by weighted risk, 2022

    Marcus Kalander. Riskloc: Localization of multi-dimensional root causes by weighted risk, 2022

  15. [15]

    Riskloc: Localization of multi-dimensional root causes by weighted risk.arXiv preprint arXiv:2205.10004, 2022

    Marcus Kalander. Riskloc: Localization of multi-dimensional root causes by weighted risk.arXiv preprint arXiv:2205.10004, 2022

  16. [16]

    Causal inference-based root cause analysis for online service systems with intervention recognition

    Mingjie Li, Zeyan Li, Kanglin Yin, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, and Dan Pei. Causal inference-based root cause analysis for online service systems with intervention recognition. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, page 3230–3240, New York, NY, USA, 2022. Association for Computing Machinery

  17. [17]

    Generic and robust root cause localization for multi-dimensional data in online service systems, 2023

    Zeyan Li, Junjie Chen, Yihao Chen, Chengyang Luo, Yiwei Zhao, Yongqian Sun, Kaixin Sui, Xiping Wang, Dapeng Liu, Xing Jin, Qi Wang, and Dan Pei. Generic and robust root cause localization for multi-dimensional data in online service systems, 2023

  18. [18]

    idice: Prob- lem identification for emerging issues

    Qingwei Lin, Jian-Guang Lou, Hongyu Zhang, and Dongmei Zhang. idice: Prob- lem identification for emerging issues. In2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pages 214–224, 2016

  19. [19]

    A unified approach to interpreting model predic- tions, 2017

    Scott Lundberg and Su-In Lee. A unified approach to interpreting model predic- tions, 2017

  20. [20]

    Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M

    Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. Ex- plainable ai for trees: From local explanations to global understanding, 2019

  21. [21]

    Lundberg, Gabriel G

    Scott M. Lundberg, Gabriel G. Erion, and Su-In Lee. Consistent individualized feature attribution for tree ensembles, 2019

  22. [22]

    Anomaly detection and fault localization an automated process for advertising systems, 1 2018

    Moa Persson and Linnea Rudenius. Anomaly detection and fault localization an automated process for advertising systems, 1 2018

  23. [23]

    why should i trust you?

    Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "why should i trust you?": Explaining the predictions of any classifier. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 1135–1144. ACM, 2016

  24. [24]

    The cascading analysts algorithm

    Matthias Ruhl, Mukund Sundararajan, and Qiqi Yan. The cascading analysts algorithm. InProceedings of the 2018 International Conference on Management of Data, SIGMOD ’18, page 1083–1096, New York, NY, USA, 2018. Association for Computing Machinery

  25. [25]

    Columbus Salley and E. F. Codd. Providing olap to user-analysts: An it mandate. 1998

  26. [26]

    Explaining differences in multidimensional aggregates

    Sunita Sarawagi. Explaining differences in multidimensional aggregates. In Proceedings of the 25th International Conference on Very Large Data Bases, VLDB ’99, page 42–53, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc

  27. [27]

    idiff: Informative summarization of differences in multidi- mensional aggregates.Data Mining and Knowledge Discovery, 5(4):255–276, 10 2001

    Sunita Sarawagi. idiff: Informative summarization of differences in multidi- mensional aggregates.Data Mining and Knowledge Discovery, 5(4):255–276, 10 2001

  28. [28]

    A value for n-person games

    Lloyd S Shapley. A value for n-person games. In Harold W. Kuhn and Albert W. Tucker, editors,Contributions to the Theory of Games II, pages 307–317. Princeton University Press, Princeton, 1953

  29. [29]

    Minglin Shen, Yanping Hou, Keying Liang, Wenjing Zhu, Chin Hao Chong, Yuejing Bin, Xiaoyong Zhou, and Linwei Ma. Energy-system characteristic shifts and their quantitative impacts on china’s co2 trajectory: Evidence from a high-resolution energy allocation analysis–lmdi sectoral decomposition.Energy, 335:137905, 2025

  30. [30]

    Learning important features through propagating activation differences

    Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. InProceedings of the 34th International Conference on Machine Learning, volume 70 ofPMLR, pages 3145– 3153, 2017

  31. [31]

    Robust anomaly clue localization of multi-dimensional derived measure for online video services.IEEE Transactions on Services Computing, 16(2):1387–1401, 2023

    Yongqian Sun, Daguo Cheng, Pengxiang Jin, Quan Ding, Shenglin Zhang, Xu Chen, Yuzhi Zhang, Minghan Liang, Dan Pei, Jianyan Zheng, Sen Luo, and Xinyu Tang. Robust anomaly clue localization of multi-dimensional derived measure for online video services.IEEE Transactions on Services Computing, 16(2):1387–1401, 2023

  32. [32]

    Hotspot: Anomaly localization for additive kpis with multi-dimensional attributes.IEEE Access, 6:10909–10923, 2018

    Yongqian Sun, Youjian Zhao, Ya Su, Dapeng Liu, Xiaohui Nie, Yuan Meng, Shiwen Cheng, Dan Pei, Shenglin Zhang, Xianping Qu, and Xuanyou Guo. Hotspot: Anomaly localization for additive kpis with multi-dimensional attributes.IEEE Access, 6:10909–10923, 2018

  33. [33]

    The many shapley values for model explanation

    Mukund Sundararajan and Amir Najmi. The many shapley values for model explanation. InProceedings of the 37th International Conference on Machine Learning, ICML’20. JMLR.org, 2020

  34. [34]

    Axiomatic attribution for deep networks

    Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. InProceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, page 3319–3328. JMLR.org, 2017

  35. [35]

    RADICE: causal graph based root cause analysis for system performance diagnostic.CoRR, abs/2501.11545, 2025

    Andrea Tonon, Meng Zhang, Bora Caglayan, Fei Shen, Tong Gui, MingXue Wang, and Rong Zhou. RADICE: causal graph based root cause analysis for system performance diagnostic.CoRR, abs/2501.11545, 2025

  36. [36]

    Incremental causal graph learning for online root cause analysis

    Dongjie Wang, Zhengzhang Chen, Yanjie Fu, Yanchi Liu, and Haifeng Chen. Incremental causal graph learning for online root cause analysis. InProceedings 10 Explaining the “Why”: A Unified Framework for the Additive Attribution of Changes in Arbitrary Measures Conference’17, July 2017, Washington, DC, USA of the 29th ACM SIGKDD Conference on Knowledge Disco...

  37. [37]

    Cmmd: Cross-metric multi-dimensional root cause analysis

    Shifu Yan, Caihua Shan, Wenyi Yang, Bixiong Xu, Dongsheng Li, Lili Qiu, Jie Tong, and Qi Zhang. Cmmd: Cross-metric multi-dimensional root cause analysis. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, page 4310–4320, New York, NY, USA, 2022. Association for Computing Machinery

  38. [38]

    Fast treeshap: Accelerating shap value computation for trees, 2022

    Jilei Yang. Fast treeshap: Accelerating shap value computation for trees, 2022

  39. [39]

    Mulan: Multi- modal causal structure learning and root cause analysis for microservice systems

    Lecheng Zheng, Zhengzhang Chen, Jingrui He, and Haifeng Chen. Mulan: Multi- modal causal structure learning and root cause analysis for microservice systems. InProceedings of the ACM Web Conference 2024, WWW ’24, page 4107–4116, New York, NY, USA, 2024. Association for Computing Machinery. 11