Generalized Deletion Propagation on Counting Conjunctive Query Answers
Pith reviewed 2026-05-24 16:46 UTC · model grok-4.3
The pith
For full conjunctive queries without self-joins, the complexity of minimizing input deletions to eliminate at least k output tuples is decided by a structural property of the query.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For the class of full conjunctive queries without self-joins, there exists a structural characterization of Q such that the minimum-deletion problem of removing at least k output tuples is polynomial-time solvable precisely when Q satisfies the characterization and is NP-hard otherwise.
What carries the argument
A structural characterization of the query Q that partitions the class of full conjunctive queries without self-joins into polynomial-time and NP-hard cases.
If this is right
- When the query structure meets the characterization, an exact minimum deletion set can be computed in polynomial time.
- When the query structure fails the characterization, the decision version of the problem is NP-hard.
- An approximation algorithm is available for every hard instance.
- The same structural test decides tractability for the special case of deleting enough tuples to remove all outputs.
Where Pith is reading between the lines
- The same structural test may separate tractable and intractable cases for conjunctive queries that allow self-joins.
- The approximation algorithm for hard queries could be combined with sampling techniques to produce practical explanations for large aggregate outputs.
- The dichotomy supplies a decision procedure that database systems could use to choose between exact and approximate deletion-propagation routines at query-plan time.
Load-bearing premise
The structural characterization of Q correctly classifies every full conjunctive query without self-joins as either polynomial-time solvable or NP-hard.
What would settle it
A full conjunctive query without self-joins for which the minimum number of deletions needed to remove k outputs can be computed in polynomial time but the structural characterization labels the query as hard, or vice versa.
Figures
read the original abstract
We investigate the computational complexity of minimizing the source side-effect in order to remove a given number of tuples from the output of a conjunctive query. In particular, given a multi-relational database $D$, a conjunctive query $Q$, and a positive integer $k$ as input, the goal is to find a minimum subset of input tuples to remove from D that would eliminate at least $k$ output tuples from $Q(D)$. This problem generalizes the well-studied deletion propagation problem in databases. In addition, it encapsulates the notion of intervention for aggregate queries used in data analysis with applications to explaining interesting observations on the output. We show a dichotomy in the complexity of this problem for the class of full conjunctive queries without self-joins by giving a characterization on the structure of $Q$ that makes the problem either polynomial-time solvable or NP-hard. Our proof of this dichotomy result already gives an exact algorithm in the easy cases; we complement this by giving an approximation algorithm for the hard cases of the problem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the generalized deletion propagation problem: given a database D, a conjunctive query Q, and integer k, find a minimum set of input tuples whose deletion removes at least k tuples from Q(D). This generalizes classical deletion propagation. For full conjunctive queries without self-joins the authors claim a dichotomy: a structural property of Q decides whether the problem is in P (with an exact algorithm supplied) or NP-hard (with an approximation algorithm supplied).
Significance. If the claimed dichotomy and algorithms are correct, the result supplies a complete complexity map for a natural generalization of deletion propagation that arises in data explanation and intervention analysis. The exact algorithm for the polynomial cases and the approximation guarantee for the hard cases are concrete contributions that increase the practical utility of the classification.
major comments (2)
- [Abstract] Abstract and §1: the structural characterization that separates the P cases from the NP-hard cases is asserted but never defined or sketched in the provided text, so it is impossible to verify that the separation is exhaustive and free of counter-examples for every full CQ without self-joins.
- [Main theorem] The proof of the dichotomy (whatever section contains it) must explicitly show both directions: that the identified structural property yields a polynomial-time algorithm and that every query outside the property is NP-hard; without the explicit property and the two directions the central claim cannot be evaluated.
minor comments (2)
- The introduction should include a small concrete example of a query that falls on each side of the claimed dichotomy.
- Clarify whether the approximation algorithm applies only to the decision version or also to the optimization version of finding the minimum deletion set.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on the presentation of our dichotomy result. We address each major comment below and will revise the manuscript accordingly to improve clarity.
read point-by-point responses
-
Referee: [Abstract] Abstract and §1: the structural characterization that separates the P cases from the NP-hard cases is asserted but never defined or sketched in the provided text, so it is impossible to verify that the separation is exhaustive and free of counter-examples for every full CQ without self-joins.
Authors: We agree that the abstract and §1 would benefit from a high-level sketch of the structural property (defined formally in Section 3 as the absence of a 'critical path' in the query's hypergraph, which determines whether the problem reduces to a min-cut computation). The full proof in Section 4 establishes exhaustiveness over all full CQs without self-joins. We will add a concise definition and one illustrative example to §1 in the revision. revision: yes
-
Referee: [Main theorem] The proof of the dichotomy (whatever section contains it) must explicitly show both directions: that the identified structural property yields a polynomial-time algorithm and that every query outside the property is NP-hard; without the explicit property and the two directions the central claim cannot be evaluated.
Authors: Section 4 contains the full proof of the dichotomy theorem (Theorem 4.1), which explicitly states the structural property and proves both directions: (i) when the property holds, the problem is solved in polynomial time via a reduction to minimum cut (with the algorithm given as a corollary); (ii) when the property fails, NP-hardness is shown by reduction from the minimum vertex cover problem on graphs. We will revise the theorem statement and proof outline in §4 to make the two directions more prominently separated and cross-referenced. revision: partial
Circularity Check
No significant circularity detected
full rationale
The paper presents a new structural dichotomy theorem for the generalized deletion propagation problem on full conjunctive queries without self-joins, claiming polynomial-time solvability or NP-hardness based on a characterization of query structure, with an exact algorithm supplied for the P cases and an approximation for the hard cases. No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the abstract or described claim; the result is framed as an independent proof rather than a renaming or construction from prior fitted inputs. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Albert Atserias, Martin Grohe, and Dániel Marx. 2008. Si ze Bounds and Query Plans for Relational Joins. In Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’08). 739–748
work page 2008
-
[2]
F. Bancilhon and N. Spyratos. 1981. Update Semantics of R e- lational Views. ACM Trans. Database Syst. 6, 4 (Dec. 1981), 557–575
work page 1981
-
[3]
Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan
-
[4]
On Propagation of Deletions and Annotations Through Views. In Proceedings of the Twenty-first ACM SIGMOD- SIGACT-SIGART Symposium on Principles of Database Systems (PODS ’02). 150–158
-
[5]
Bugra Caskurlu, Vahan Mkrtchyan, Ojas Parekh, and K. Sub - ramani. 2017. Partial Vertex Cover and Budgeted Maximum Coverage in Bipartite Graphs. SIAM J. Discrete Math. 31, 3 (2017), 2172–2184
work page 2017
-
[6]
Gao Cong, Wenfei Fan, and Floris Geerts. 2006. Annotatio n Propagation Revisited for Key Preserving Views. In Proceed- ings of the 15th ACM International Conference on Informatio n and Knowledge Management (CIKM ’06). 632–641
work page 2006
-
[7]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest , and Clifford Stein. 2009. Introduction to Algorithms, Third Edi- tion (3rd ed.). The MIT Press
work page 2009
-
[8]
Nilesh N. Dalvi and Dan Suciu. 2012. The dichotomy of prob - abilistic inference for unions of conjunctive queries. J. ACM 59, 6 (2012), 30:1–30:87
work page 2012
- [9]
-
[10]
Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, an d Alexandra Meliou. 2015. The Complexity of Resilience and Responsibility for Self-Join-Free Conjunctive Queries. PVLDB 9, 3 (2015), 180–191
work page 2015
-
[11]
R Gandhi, S Khuller, and A Srinivasan. 2004. Ap- proximation algorithms for partial covering prob- lems. Journal of Algorithms 53, 1 (2004), 55–84. https://doi.org/10.1016/j.jalgor.2004.04.002
-
[12]
Benny Kimelfeld. 2012. A dichotomy in the complexity of deletion propagation with functional dependencies. In Pro- ceedings of the 31st ACM SIGMOD-SIGACT-SIGART Sympo- sium on Principles of Database Systems, PODS 2012, Scottsda le, AZ, USA, May 20-24, 2012 . 191–202
work page 2012
-
[13]
Benny Kimelfeld, Jan Vondrák, and Ryan Williams. 2011. Max- imizing conjunctive views in deletion propagation. InProceed- ings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, June 12-16, 2011 , Athens, Greece. 187–198
work page 2011
-
[14]
Benny Kimelfeld, Jan Vondrák, and David P. Woodruff. 201 3. Multi-Tuple Deletion Propagation: Approximations and Com- plexity. PVLDB 6, 13 (2013), 1558–1569
work page 2013
-
[15]
Paraschos Koutris and Dan Suciu. 2011. Parallel evalua tion of conjunctive queries. In Proceedings of the 30th ACM SIGMOD- SIGACT-SIGART Symposium on Principles of Database Sys- tems, PODS. 223–234
work page 2011
-
[16]
Ester Livshits, Benny Kimelfeld, and Sudeepa Roy. 2018 . Com- puting Optimal Repairs for Functional Dependencies. In Pro- ceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, June 10 - 15, 2018. 225–237
work page 2018
-
[17]
Alexandra Meliou, Wolfgang Gatterbauer, Katherine F. Moore, and Dan Suciu. 2010. The Complexity of Causality and Responsibility for Query Answers and non-Answers. PVLDB 4, 1 (2010), 34–45
work page 2010
-
[18]
Alexandra Meliou, Wolfgang Gatterbauer, and Dan Suciu
-
[19]
PVLDB 4, 12 (2011), 1490– 1493
Reverse Data Management. PVLDB 4, 12 (2011), 1490– 1493
work page 2011
-
[20]
Alexandra Meliou and Dan Suciu. 2012. Tiresias: the dat a- base oracle for how-to queries. In Proceedings of the ACM SIG- MOD International Conference on Management of Data, SIG- MOD 2012, Scottsdale, AZ, USA, May 20-24, 2012 . 337–348
work page 2012
-
[21]
Ngo, Ely Porat, Christopher Ré, and Atri Rudra
Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2 012. Worst-case optimal join algorithms: [extended abstract]. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Sym- posium on Principles of Database Systems . 37–48
-
[22]
Sudeepa Roy, Laurel Orr, and Dan Suciu. 2015. Explainin g Query Answers with Explanation-Ready Databases. PVLDB 9, 4 (2015), 348–359
work page 2015
-
[23]
Sudeepa Roy and Dan Suciu. 2014. A formal approach to find - ing explanations for database queries. In International Con- ference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014 . 1579–1590
work page 2014
-
[24]
Moshe Y Vardi. 1982. The complexity of relational query lan- guages. In STOC. 137–146
work page 1982
-
[25]
Todd L. Veldhuizen. 2014. Triejoin: A Simple, Worst-Ca se Op- timal Join Algorithm. In Proc. 17th International Conference on Database Theory (ICDT) . 96–106
work page 2014
-
[26]
Eugene Wu and Samuel Madden. 2013. Scorpion: Explainin g Away Outliers in Aggregate Queries. PVLDB 6, 8 (2013), 553– 564. A HARDNESS PROPAGATION FOR SIMPLIFICATION STEPS In this section we show that for the last three simplification steps in Algorithm 1 that calls the I/s.scP/t.sc/i.sc/m.sc/e.scfunction recur- sively, if the new query Q′ (or one of the new...
work page 2013
-
[27]
given a solution to k′-PSC, how to recover a solution to GDP( Q,k,D) . Given the full CQ Q containing p relations in its body, namely R1,R2,··· ,Rp , we create a set per input tuple in the p relations, and an element per output tuple in Q( D) . Each set contains elements that correspond to the output tuples resulting from the join between the associated i...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.