arxiv: 2503.14469 · v5 · submitted 2025-03-18 · 💻 cs.DB · cs.AI

Causality-Based Scores Alignment in Explainable Data Management

Pith reviewed 2026-05-22 23:32 UTC · model grok-4.3

classification 💻 cs.DB cs.AI
keywords: attribution scores · causality · explainable data management · query syntax · tuple ranking · score alignment · exogenous tuples · database explanations

The pith

Causality-based attribution scores for database tuples always induce compatible rankings for some query classes but not for others, and the presence of exogenous tuples is the key factor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates alignment between pairs of attribution scores (causal responsibility, the Shapley value, the Banzhaf power index, and the causal effect) when ranking database tuples that contribute to query answers. It shows that a syntactic classification of queries divides them into two groups: one where every pair of these scores produces compatible rankings on every database, and another where misalignment can occur. The presence of exogenous tuples, which belong to the database but are treated as fixed background facts rather than as candidate causes (only endogenous tuples receive scores), turns out to make a crucial difference to whether alignment holds. This matters for explainable data management because practitioners often use one score as a proxy for another when generating explanations.
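To make the four scores concrete, here is a minimal brute-force sketch in Python; it is an illustration under stated assumptions, not the authors' implementation. It assumes a hypothetical encoding of facts and Boolean conjunctive queries, the common wealth function v(S) = Q(S ∪ D^x) − Q(D^x) over endogenous tuples for the Shapley and Banzhaf values, causal responsibility via smallest contingency sets, and a causal effect in which every other endogenous tuple is kept independently with probability p = 1/2 (our assumption). Exogenous tuples are always present and never scored. The query R(x) ∧ S(x, y), the instance, and all function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

# Hypothetical encoding: a fact is (relation, constants); a Boolean conjunctive
# query is a list of atoms (relation, variables). Q: exists x, y. R(x) and S(x, y).
QUERY = [("R", ("x",)), ("S", ("x", "y"))]


def holds(query, facts):
    """True iff some assignment of the variables maps every atom into `facts`."""
    def extend(i, binding):
        if i == len(query):
            return True
        rel, variables = query[i]
        for fact_rel, consts in facts:
            if fact_rel != rel or len(consts) != len(variables):
                continue
            new = dict(binding)
            if all(new.setdefault(v, c) == c for v, c in zip(variables, consts)):
                if extend(i + 1, new):
                    return True
        return False
    return extend(0, {})


def subsets(items):
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def marginal(query, S, t, exo):
    """1 if adding t to coalition S (plus all exogenous facts) makes Q true, else 0."""
    return int(holds(query, set(S) | {t} | exo)) - int(holds(query, set(S) | exo))


def shapley(query, endo, exo, t):
    n, others = len(endo), [u for u in endo if u != t]
    return sum(Fraction(factorial(len(S)) * factorial(n - len(S) - 1), factorial(n))
               * marginal(query, S, t, exo) for S in subsets(others))


def banzhaf(query, endo, exo, t):
    others = [u for u in endo if u != t]
    return Fraction(sum(marginal(query, S, t, exo) for S in subsets(others)),
                    2 ** len(others))


def causal_effect(query, endo, exo, t, p=Fraction(1, 2)):
    """E[Q | do(t in)] - E[Q | do(t out)]; other endogenous facts kept w.p. p (assumed)."""
    others = [u for u in endo if u != t]
    return sum(p ** len(S) * (1 - p) ** (len(others) - len(S)) * marginal(query, S, t, exo)
               for S in subsets(others))


def responsibility(query, endo, exo, t):
    """1 / (1 + |Gamma|) for a smallest contingency set Gamma of endogenous facts."""
    db, others = set(endo) | exo, [u for u in endo if u != t]
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            rest = db - set(gamma)
            if holds(query, rest) and not holds(query, rest - {t}):
                return Fraction(1, 1 + k)
    return Fraction(0)  # t is not an actual cause


SCORES = {"resp": responsibility, "shapley": shapley,
          "banzhaf": banzhaf, "ces": causal_effect}


def all_scores(endo, exo):
    return {name: {t: f(QUERY, endo, exo, t) for t in endo} for name, f in SCORES.items()}


R_a, R_b = ("R", ("a",)), ("R", ("b",))
S_ac, S_ad, S_be = ("S", ("a", "c")), ("S", ("a", "d")), ("S", ("b", "e"))

all_endo = all_scores([R_a, R_b, S_ac, S_ad, S_be], set())   # every fact endogenous
with_exo = all_scores([R_a, R_b, S_ac, S_ad], {S_be})        # S(b, e) exogenous

for label, table in [("all endogenous", all_endo), ("S(b,e) exogenous", with_exo)]:
    print(label)
    for name, vals in table.items():
        print(f"  {name}: " + ", ".join(f"{r}({','.join(c)})={v}" for (r, c), v in vals.items()))
```

Enumeration is exponential in the number of endogenous tuples, so this only suits tiny examples. With p = 1/2 every coalition gets weight 1/2^(n-1), so the causal-effect sketch coincides with the Banzhaf sketch. On this toy instance, declaring S(b, e) exogenous reverses the Shapley order of R(a) and R(b): 11/30 versus 1/5 with all facts endogenous, 1/4 versus 7/12 afterwards.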

Core claim

Pairs of causality-based scores are always aligned on one side of a syntactic dichotomy for queries and not always aligned on the other side, with the presence of exogenous tuples making a crucial difference.

What carries the argument

The syntactic dichotomy of queries that separates classes where score pairs induce compatible tuple rankings on every database from classes where they may not.

Load-bearing premise

The standard definitions of the four attribution scores apply without change to the query classes studied, and syntactic query properties alone fix alignment behavior for every possible database.
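For reference, one common way to write the standard definitions this premise relies on, following the usual formulations in the database causality and Shapley-value literature. The notation (D^n for endogenous tuples, D^x for exogenous tuples, v for the wealth function), the Banzhaf normalization, the probability 1/2 in the causal effect, and the reading of alignment as "no pair of tuples strictly ordered in opposite directions" are our assumptions; the paper's own formulations may differ in detail.

```latex
% Assumed notation: D = D^n \cup D^x, Boolean query Q with Q(I) \in \{0,1\},
% D \models Q, n = |D^n|, \tau \in D^n.
\begin{align*}
\rho(\tau) &= \frac{1}{1 + \min\{\,|\Gamma| : \Gamma \subseteq D^n \setminus \{\tau\},\;
  D \setminus \Gamma \models Q,\; D \setminus (\Gamma \cup \{\tau\}) \not\models Q \,\}}
  \quad (\rho(\tau) = 0 \text{ if no such } \Gamma)\\
v(S) &= Q(S \cup D^x) - Q(D^x), \qquad S \subseteq D^n\\
\mathrm{Shapley}(\tau) &= \sum_{S \subseteq D^n \setminus \{\tau\}}
  \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl(v(S \cup \{\tau\}) - v(S)\bigr)\\
\mathrm{Banzhaf}(\tau) &= \frac{1}{2^{n-1}} \sum_{S \subseteq D^n \setminus \{\tau\}}
  \bigl(v(S \cup \{\tau\}) - v(S)\bigr)\\
\mathrm{CES}(\tau) &= \mathbb{E}\bigl[Q \mid \mathit{do}(\tau \in D)\bigr]
  - \mathbb{E}\bigl[Q \mid \mathit{do}(\tau \notin D)\bigr]
  \quad \text{(other endogenous tuples present independently w.p. } 1/2\text{)}\\
\mathrm{aligned}(Sc_1, Sc_2) &\iff \neg\exists\, \tau, \tau' \in D^n :\;
  Sc_1(\tau) < Sc_1(\tau') \,\wedge\, Sc_2(\tau) > Sc_2(\tau')
\end{align*}
```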

What would settle it

A concrete database and query belonging to the 'always aligned' syntactic class in which two of the scores produce different orderings of the same tuples.
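Once score tables are available, a candidate counterexample of this kind can be checked mechanically. The minimal sketch below assumes that "compatible rankings" means no pair of tuples is strictly ordered in opposite directions by the two scores, so a tie in one score is compatible with any order in the other; the paper's formal definition may be phrased differently. The function names and score tables are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def misaligned_pairs(score1, score2):
    """Pairs (t, u) that the two scores order strictly in opposite directions.

    Assumed reading of 'compatible rankings': ties in one score are
    compatible with any order in the other score.
    """
    return [
        (t, u)
        for t, u in combinations(sorted(score1), 2)
        if (score1[t] - score1[u]) * (score2[t] - score2[u]) < 0
    ]


def aligned(score1, score2):
    if score1.keys() != score2.keys():
        raise ValueError("both scores must rate the same tuples")
    return not misaligned_pairs(score1, score2)


# Hypothetical score tables over the same three tuples.
resp = {"t1": Fraction(1, 2), "t2": Fraction(1, 2), "t3": Fraction(1, 3)}
shap = {"t1": Fraction(1, 5), "t2": Fraction(7, 12), "t3": Fraction(1, 12)}
other = {"t1": Fraction(1, 3), "t2": Fraction(1, 6), "t3": Fraction(1, 2)}

print(aligned(resp, shap))            # True: t1/t2 tie under resp; both put t3 last
print(misaligned_pairs(resp, other))  # [('t1', 't3'), ('t2', 't3')]
```

A settling counterexample for the dichotomy would be a query in the "always aligned" class plus a database whose computed score tables make `misaligned_pairs` return a non-empty list.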

Figures

Figures reproduced from arXiv: 2503.14469 by Felipe Azua, Leopoldo Bertossi.

Figure 1: Query classification (Theorem 1), giving a full characterization of the space of SJF BCQs (self-join-free Boolean conjunctive queries).
Original abstract

Different attribution scores have been proposed to quantify the relevance of database tuples for query answering in databases; e.g. Causal Responsibility, the Shapley Value, the Banzhaf Power-Index, and the Causal Effect. They have been analyzed in isolation. This work is a first investigation of score alignment depending on the query and the database; i.e. on whether they induce compatible rankings of tuples. We concentrate mostly on causality-based scores; and provide a syntactic dichotomy result for queries: on one side, pairs of scores are always aligned, on the other, they are not always aligned. It turns out that the presence of exogenous tuples makes a crucial difference in this regard.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper investigates alignment between pairs of causality-based attribution scores (Causal Responsibility, Shapley Value, Banzhaf Power-Index, Causal Effect) for database tuples. It claims a syntactic dichotomy over query classes: on one side pairs are always aligned (compatible rankings), on the other side they are not always aligned, with the presence or absence of exogenous tuples making a crucial difference in the alignment behavior.

Significance. If the dichotomy is correct, the result supplies a useful theoretical classification of queries that determines when different scores can be used interchangeably for ranking tuples. This would be a meaningful contribution to explainable data management, as it links query syntax directly to score compatibility and highlights the role of exogenous data.

major comments (1)
  1. [Abstract] Abstract: the central syntactic dichotomy asserts that certain query classes are 'always aligned' for every database instance. Because the abstract states that exogenous tuples 'make a crucial difference,' the proof must demonstrate that no counter-example database (with suitably chosen exogenous tuples) exists for the 'always' class. The stress-test concern that syntax alone may not guarantee universality across arbitrary exogenous tuples is therefore load-bearing; without an explicit argument or exhaustive case analysis addressing this, the dichotomy claim is not yet secured.
minor comments (1)
  1. The abstract would be clearer if it named the specific syntactic classes (e.g., the query fragments or operators) that fall on each side of the dichotomy.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed reading and for highlighting the need to ensure the 'always aligned' side of the dichotomy holds universally, including over databases containing exogenous tuples. We address the major comment below.

Point-by-point responses
  1. Referee: [Abstract] The 'always aligned' side of the dichotomy must hold for every database instance, including any choice of exogenous tuples (major comment 1 above).

    Authors: The referee is correct that the 'always aligned' direction requires the property to hold for every database instance. The manuscript establishes the syntactic dichotomy by proving that, for the query classes placed on the 'always' side, ranking compatibility between the pairs of scores follows directly from the query syntax and holds for arbitrary database instances. The proofs (Sections 4–5) proceed via exhaustive case analysis on the syntactic forms (e.g., certain classes of conjunctive queries and their extensions); each case shows that the relative ordering induced by the scores is preserved regardless of which additional tuples, exogenous or otherwise, are present in the instance. Consequently, no counter-example database exists for these classes. The 'crucial difference' made by exogenous tuples appears only on the complementary side of the dichotomy, where they are used to construct the counter-examples that separate the classes. If the referee finds the independence from exogenous tuples insufficiently foregrounded, we can add an explicit corollary and a short remark in the proof overview.

    Revision: partial.

Circularity Check

0 steps flagged

No circularity; syntactic dichotomy derived from standard score definitions

full rationale

The paper presents a theoretical syntactic dichotomy classifying queries into 'always aligned' vs 'not always aligned' for pairs of attribution scores (responsibility, Shapley, Banzhaf, causal effect). This rests on the standard definitions of the four scores and query syntax, with exogenous tuples handled as part of the case analysis. No equations, fitted parameters, or self-citations are load-bearing in the provided material; the result does not reduce to its inputs by construction. The derivation is self-contained as a characterization over query classes and database instances.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, background axioms, or new entities introduced by the paper.

pith-pipeline@v0.9.0 · 5631 in / 911 out tokens · 34312 ms · 2026-05-22T23:32:05.781523+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Sufficient Explanations in Databases and their Connections to Database Repairs

    cs.DB 2025-11 unverdicted novelty 5.0

    The paper introduces sufficient explanations and a sufficiency-degree attribution score for database tuples in query answering, connects them to database repairs and causality explanations, and demonstrates computatio...
