Causality-Based Scores Alignment in Explainable Data Management
Felipe Azúa and Leopoldo Bertossi
Pith reviewed 2026-05-22 23:32 UTC · model grok-4.3
The pith
Causality-based attribution scores for database tuples always align in rankings for some query classes but not others, with exogenous tuples as the key factor.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pairs of causality-based scores are always aligned on one side of a syntactic dichotomy for queries and not always aligned on the other side, with the presence of exogenous tuples making a crucial difference.
What carries the argument
The syntactic dichotomy of queries that separates classes where score pairs induce identical tuple rankings from classes where they do not.
Load-bearing premise
The standard definitions of the four attribution scores apply without change to the query classes studied, and syntactic query properties alone fix alignment behavior for every possible database.
What would settle it
A concrete database and query belonging to the 'always aligned' syntactic class in which two of the scores produce different orderings of the same tuples.
Original abstract
Different attribution scores have been proposed to quantify the relevance of database tuples for query answering in databases; e.g. Causal Responsibility, the Shapley Value, the Banzhaf Power-Index, and the Causal Effect. They have been analyzed in isolation. This work is a first investigation of score alignment depending on the query and the database; i.e. on whether they induce compatible rankings of tuples. We concentrate mostly on causality-based scores; and provide a syntactic dichotomy result for queries: on one side, pairs of scores are always aligned, on the other, they are not always aligned. It turns out that the presence of exogenous tuples makes a crucial difference in this regard.
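To make the notion of alignment concrete, the following Python sketch computes three of the scores named in the abstract by brute force on a toy instance and checks whether pairs of scores rank the tuples compatibly. Everything in it is an assumption made for illustration and is not taken from the paper: the example query (there exist x, y with R(x) and S(x, y)), the three-tuple database, the function names, the textbook forms of the Shapley value, the Banzhaf index and responsibility, and the reading of "aligned" as "both scores induce the same strict order on every pair of tuples".

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield all subsets of items as frozensets, smallest first."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def gain(query, exo, s, tau):
    """Marginal contribution of tau to the query answer on top of s and exo."""
    return int(query(s | {tau} | exo)) - int(query(s | exo))


def shapley(query, endo, exo, tau):
    # Players are the endogenous tuples; exogenous tuples are always present.
    n = len(endo)
    total = Fraction(0)
    for s in subsets(endo - {tau}):
        weight = Fraction(factorial(len(s)) * factorial(n - len(s) - 1), factorial(n))
        total += weight * gain(query, exo, s, tau)
    return total


def banzhaf(query, endo, exo, tau):
    others = endo - {tau}
    return Fraction(sum(gain(query, exo, s, tau) for s in subsets(others)), 2 ** len(others))


def responsibility(query, endo, exo, tau):
    # Smallest contingency set gamma such that the query holds without gamma
    # and fails once tau is removed as well; 0 if tau is not a cause.
    for gamma in subsets(endo - {tau}):
        rest = (endo - gamma) | exo
        if query(rest) and not query(rest - {tau}):
            return Fraction(1, 1 + len(gamma))
    return Fraction(0)


def aligned(score_a, score_b):
    """Assumed reading of alignment: identical strict order on every pair."""
    return all((score_a[s] < score_a[t]) == (score_b[s] < score_b[t])
               for s in score_a for t in score_a)


def query(facts):
    """Hypothetical Boolean query: exists x, y with R(x) and S(x, y)."""
    r_values = {f[1] for f in facts if f[0] == "R"}
    return any(f[0] == "S" and f[1] in r_values for f in facts)


# Toy instance (an assumption): all three tuples endogenous, none exogenous.
endo = frozenset({("R", "a"), ("S", "a", 1), ("S", "a", 2)})
exo = frozenset()
scores = {
    name: {t: fn(query, endo, exo, t) for t in endo}
    for name, fn in [("shapley", shapley), ("banzhaf", banzhaf), ("resp", responsibility)]
}
for name, values in scores.items():
    print(name, {t: str(v) for t, v in sorted(values.items())})
print(aligned(scores["shapley"], scores["banzhaf"]),
      aligned(scores["banzhaf"], scores["resp"]))
```

On this particular instance all three scores place R(a) strictly above the two S tuples, so both checks succeed. That says nothing about other instances of the same query, which is exactly the question the dichotomy addresses. Exact rational arithmetic via Fraction is used so that ties between scores are detected reliably.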
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates alignment between pairs of causality-based attribution scores (Causal Responsibility, Shapley Value, Banzhaf Power-Index, Causal Effect) for database tuples. It claims a syntactic dichotomy over query classes: on one side pairs are always aligned (compatible rankings), on the other side they are not always aligned, with the presence or absence of exogenous tuples making a crucial difference in the alignment behavior.
Significance. If the dichotomy is correct, the result supplies a useful theoretical classification of queries that determines when different scores can be used interchangeably for ranking tuples. This would be a meaningful contribution to explainable data management, as it links query syntax directly to score compatibility and highlights the role of exogenous data.
major comments (1)
- [Abstract] Abstract: the central syntactic dichotomy asserts that certain query classes are 'always aligned' for every database instance. Because the abstract states that exogenous tuples 'make a crucial difference,' the proof must demonstrate that no counter-example database (with suitably chosen exogenous tuples) exists for the 'always' class. The stress-test concern that syntax alone may not guarantee universality across arbitrary exogenous tuples is therefore load-bearing; without an explicit argument or exhaustive case analysis addressing this, the dichotomy claim is not yet secured.
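The stress test described in this comment can be approximated empirically, although only a proof can establish the "always aligned" side. The sketch below enumerates every sub-database of a small tuple universe together with every split into endogenous and exogenous tuples, and reports the first instance on which two scores order a pair of tuples differently. The choice of the Banzhaf index and responsibility as the pair, the example query, the tuple universe, all function names, and the strict-order reading of alignment are assumptions for illustration and are not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def subsets(items):
    """Yield all subsets of items as frozensets, smallest first."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def banzhaf(query, endo, exo, tau):
    others = endo - {tau}
    total = sum(int(query(s | {tau} | exo)) - int(query(s | exo))
                for s in subsets(others))
    return Fraction(total, 2 ** len(others))


def responsibility(query, endo, exo, tau):
    # 1 / (1 + size of a minimum contingency set), or 0 if tau is not a cause.
    for gamma in subsets(endo - {tau}):
        rest = (endo - gamma) | exo
        if query(rest) and not query(rest - {tau}):
            return Fraction(1, 1 + len(gamma))
    return Fraction(0)


def find_misalignment(query, universe):
    """Return (endo, exo, t1, t2) for the first disagreement found, else None."""
    for db in subsets(universe):
        for exo in subsets(db):
            endo = db - exo
            b = {t: banzhaf(query, endo, exo, t) for t in endo}
            r = {t: responsibility(query, endo, exo, t) for t in endo}
            for t1, t2 in combinations(sorted(endo), 2):
                if ((b[t1] < b[t2]) != (r[t1] < r[t2])
                        or (b[t2] < b[t1]) != (r[t2] < r[t1])):
                    return endo, exo, t1, t2
    return None


def query(facts):
    """Hypothetical Boolean query: exists x, y with R(x) and S(x, y)."""
    r_values = {f[1] for f in facts if f[0] == "R"}
    return any(f[0] == "S" and f[1] in r_values for f in facts)


# Hypothetical tuple universe; every subset and every exogenous split is tried.
universe = {("R", "a"), ("R", "b"), ("S", "a", 1), ("S", "b", 1), ("S", "b", 2)}
print(find_misalignment(query, universe))
```

A returned instance would be a concrete counter-example for that query and score pair. A result of None only means that no disagreement exists within the chosen universe, so it cannot replace the case analysis the referee asks for. The search is exponential in the universe size and is meant for very small inputs.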
minor comments (1)
- The abstract would be clearer if it named the specific syntactic classes (e.g., the query fragments or operators) that fall on each side of the dichotomy.
Simulated Author's Rebuttal
We thank the referee for the detailed reading and for highlighting the need to ensure the 'always aligned' side of the dichotomy holds universally, including over databases containing exogenous tuples. We address the major comment below.
Point-by-point responses
- Referee: [Abstract] Abstract: the central syntactic dichotomy asserts that certain query classes are 'always aligned' for every database instance. Because the abstract states that exogenous tuples 'make a crucial difference,' the proof must demonstrate that no counter-example database (with suitably chosen exogenous tuples) exists for the 'always' class. The stress-test concern that syntax alone may not guarantee universality across arbitrary exogenous tuples is therefore load-bearing; without an explicit argument or exhaustive case analysis addressing this, the dichotomy claim is not yet secured.
Authors: The referee is correct that the 'always aligned' direction requires the property to hold for every database instance. The manuscript establishes the syntactic dichotomy by proving that, for the query classes placed on the 'always' side, ranking compatibility between the pairs of scores follows directly from the query syntax and holds for arbitrary database instances. The proofs (Sections 4–5) proceed via exhaustive case analysis on the syntactic forms (e.g., certain classes of conjunctive queries and their extensions); each case shows that the relative ordering induced by the scores is preserved regardless of which additional tuples (exogenous or otherwise) are present in the instance. Consequently, no counter-example database exists for these classes. The 'crucial difference' made by exogenous tuples appears only on the complementary side of the dichotomy, where they are used to construct the counter-examples that separate the classes. If the referee finds the independence from exogenous tuples insufficiently foregrounded, we can add an explicit corollary and a short remark in the proof overview.
Revision: partial
Circularity Check
No circularity; syntactic dichotomy derived from standard score definitions
The paper presents a theoretical syntactic dichotomy classifying queries into 'always aligned' vs 'not always aligned' for pairs of attribution scores (responsibility, Shapley, Banzhaf, causal effect). This rests on the standard definitions of the four scores and query syntax, with exogenous tuples handled as part of the case analysis. No equations, fitted parameters, or self-citations are load-bearing in the provided material; the result does not reduce to its inputs by construction. The derivation is self-contained as a characterization over query classes and database instances.
Forward citations
Cited by 1 Pith paper
- Sufficient Explanations in Databases and their Connections to Database Repairs
The paper introduces sufficient explanations and a sufficiency-degree attribution score for database tuples in query answering, connects them to database repairs and causality explanations, and demonstrates computatio...