pith. sign in

arxiv: 1907.10125 · v1 · pith:EF7HFSNPnew · submitted 2019-07-23 · 💻 cs.DB

Generalized Deletion Propagation on Counting Conjunctive Query Answers

Pith reviewed 2026-05-24 16:46 UTC · model grok-4.3

classification 💻 cs.DB
keywords deletion propagationconjunctive queriescomputational complexitydichotomy theoremNP-hardnessapproximation algorithmsdatabase interventions
0
0 comments X

The pith

For full conjunctive queries without self-joins, the complexity of minimizing input deletions to eliminate at least k output tuples is decided by a structural property of the query.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the problem of selecting the smallest set of input tuples to delete so that a conjunctive query on a multi-relational database loses at least k output tuples. This generalizes classical deletion propagation and supports intervention-style explanations for aggregate query results. The authors prove a dichotomy for full conjunctive queries without self-joins: a structural characterization of the query determines whether the minimization task is solvable in polynomial time or is NP-hard. They supply an exact algorithm for the tractable side of the dichotomy and an approximation algorithm for the intractable side.

Core claim

For the class of full conjunctive queries without self-joins, there exists a structural characterization of Q such that the minimum-deletion problem of removing at least k output tuples is polynomial-time solvable precisely when Q satisfies the characterization and is NP-hard otherwise.

What carries the argument

A structural characterization of the query Q that partitions the class of full conjunctive queries without self-joins into polynomial-time and NP-hard cases.

If this is right

  • When the query structure meets the characterization, an exact minimum deletion set can be computed in polynomial time.
  • When the query structure fails the characterization, the decision version of the problem is NP-hard.
  • An approximation algorithm is available for every hard instance.
  • The same structural test decides tractability for the special case of deleting enough tuples to remove all outputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same structural test may separate tractable and intractable cases for conjunctive queries that allow self-joins.
  • The approximation algorithm for hard queries could be combined with sampling techniques to produce practical explanations for large aggregate outputs.
  • The dichotomy supplies a decision procedure that database systems could use to choose between exact and approximate deletion-propagation routines at query-plan time.

Load-bearing premise

The structural characterization of Q correctly classifies every full conjunctive query without self-joins as either polynomial-time solvable or NP-hard.

What would settle it

A full conjunctive query without self-joins for which the minimum number of deletions needed to remove k outputs can be computed in polynomial time but the structural characterization labels the query as hard, or vice versa.

Figures

Figures reproduced from arXiv: 1907.10125 by Debmalya Panigrahi, Shweta Patwa, Sudeepa Roy.

Figure 1
Figure 1. Figure 1: Database schema and instance from Ex￾ample 2.1 and the answers for query Q(A, B,C, E) : −R1(A, B), R2(B,C), R3(C, E). Full conjunctive queries without self-joins. We con￾sider the class of full conjunctive queries (CQ) without self￾joins. Such a CQ represents the natural join among the given relations, and has the following form: Q(A) : −R1(A1), R2(A2), · · · , Rp (Ap ) We will call the above queryQ the fu… view at source ↗
Figure 2
Figure 2. Figure 2: Tree T when Algorithm 1 is run on Q0 from Example 3.2, which simplification step has been ap￾plied is shown next to each edge. 3.1 Hardness The proof of hardness is divided into two parts. In the first part we argue that for the last three simplification steps that call the IsPtime algorithm recursively, hardness of the query is preserved (proved in Appendix A due to space constraints). In particular, the … view at source ↗
Figure 3
Figure 3. Figure 3: An example construction from Lemma 3.6: for the given instance of PVCB in the figure, and query Q2(· · · ) : − R1(A), R2(B), R3(A,C), R4(E, B), R5(C, E), R6(C, F ) from the introduction, we create D for GDP(Q2, k,D) as shown. First R1 ← U, R2 ← V . Then R3 ← U, R4 ← V . Then R5 ← E and R6 ← U . Now we claim that PVCB has a solution of size M if and only if GDP(Q, k,D) has a solution of size M. Note that th… view at source ↗
Figure 4
Figure 4. Figure 4: An example construction for Lemma 3.7. Consider the PVCB instance from [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: An example construction for Lemma A.4: Consider the full CQQ0 from [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗
read the original abstract

We investigate the computational complexity of minimizing the source side-effect in order to remove a given number of tuples from the output of a conjunctive query. In particular, given a multi-relational database $D$, a conjunctive query $Q$, and a positive integer $k$ as input, the goal is to find a minimum subset of input tuples to remove from D that would eliminate at least $k$ output tuples from $Q(D)$. This problem generalizes the well-studied deletion propagation problem in databases. In addition, it encapsulates the notion of intervention for aggregate queries used in data analysis with applications to explaining interesting observations on the output. We show a dichotomy in the complexity of this problem for the class of full conjunctive queries without self-joins by giving a characterization on the structure of $Q$ that makes the problem either polynomial-time solvable or NP-hard. Our proof of this dichotomy result already gives an exact algorithm in the easy cases; we complement this by giving an approximation algorithm for the hard cases of the problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies the generalized deletion propagation problem: given a database D, a conjunctive query Q, and integer k, find a minimum set of input tuples whose deletion removes at least k tuples from Q(D). This generalizes classical deletion propagation. For full conjunctive queries without self-joins the authors claim a dichotomy: a structural property of Q decides whether the problem is in P (with an exact algorithm supplied) or NP-hard (with an approximation algorithm supplied).

Significance. If the claimed dichotomy and algorithms are correct, the result supplies a complete complexity map for a natural generalization of deletion propagation that arises in data explanation and intervention analysis. The exact algorithm for the polynomial cases and the approximation guarantee for the hard cases are concrete contributions that increase the practical utility of the classification.

major comments (2)
  1. [Abstract] Abstract and §1: the structural characterization that separates the P cases from the NP-hard cases is asserted but never defined or sketched in the provided text, so it is impossible to verify that the separation is exhaustive and free of counter-examples for every full CQ without self-joins.
  2. [Main theorem] The proof of the dichotomy (whatever section contains it) must explicitly show both directions: that the identified structural property yields a polynomial-time algorithm and that every query outside the property is NP-hard; without the explicit property and the two directions the central claim cannot be evaluated.
minor comments (2)
  1. The introduction should include a small concrete example of a query that falls on each side of the claimed dichotomy.
  2. Clarify whether the approximation algorithm applies only to the decision version or also to the optimization version of finding the minimum deletion set.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on the presentation of our dichotomy result. We address each major comment below and will revise the manuscript accordingly to improve clarity.

read point-by-point responses
  1. Referee: [Abstract] Abstract and §1: the structural characterization that separates the P cases from the NP-hard cases is asserted but never defined or sketched in the provided text, so it is impossible to verify that the separation is exhaustive and free of counter-examples for every full CQ without self-joins.

    Authors: We agree that the abstract and §1 would benefit from a high-level sketch of the structural property (defined formally in Section 3 as the absence of a 'critical path' in the query's hypergraph, which determines whether the problem reduces to a min-cut computation). The full proof in Section 4 establishes exhaustiveness over all full CQs without self-joins. We will add a concise definition and one illustrative example to §1 in the revision. revision: yes

  2. Referee: [Main theorem] The proof of the dichotomy (whatever section contains it) must explicitly show both directions: that the identified structural property yields a polynomial-time algorithm and that every query outside the property is NP-hard; without the explicit property and the two directions the central claim cannot be evaluated.

    Authors: Section 4 contains the full proof of the dichotomy theorem (Theorem 4.1), which explicitly states the structural property and proves both directions: (i) when the property holds, the problem is solved in polynomial time via a reduction to minimum cut (with the algorithm given as a corollary); (ii) when the property fails, NP-hardness is shown by reduction from the minimum vertex cover problem on graphs. We will revise the theorem statement and proof outline in §4 to make the two directions more prominently separated and cross-referenced. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents a new structural dichotomy theorem for the generalized deletion propagation problem on full conjunctive queries without self-joins, claiming polynomial-time solvability or NP-hardness based on a characterization of query structure, with an exact algorithm supplied for the P cases and an approximation for the hard cases. No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the abstract or described claim; the result is framed as an independent proof rather than a renaming or construction from prior fitted inputs. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, invented entities, or non-standard axioms are mentioned in the abstract; the result is a complexity classification resting on standard assumptions of P vs NP.

pith-pipeline@v0.9.0 · 5710 in / 1043 out tokens · 23183 ms · 2026-05-24T16:46:56.833429+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  1. [1]

    Albert Atserias, Martin Grohe, and Dániel Marx. 2008. Si ze Bounds and Query Plans for Relational Joins. In Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’08). 739–748

  2. [2]

    Bancilhon and N

    F. Bancilhon and N. Spyratos. 1981. Update Semantics of R e- lational Views. ACM Trans. Database Syst. 6, 4 (Dec. 1981), 557–575

  3. [3]

    Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan

  4. [4]

    In Proceedings of the Twenty-first ACM SIGMOD- SIGACT-SIGART Symposium on Principles of Database Systems (PODS ’02)

    On Propagation of Deletions and Annotations Through Views. In Proceedings of the Twenty-first ACM SIGMOD- SIGACT-SIGART Symposium on Principles of Database Systems (PODS ’02). 150–158

  5. [5]

    Sub - ramani

    Bugra Caskurlu, Vahan Mkrtchyan, Ojas Parekh, and K. Sub - ramani. 2017. Partial Vertex Cover and Budgeted Maximum Coverage in Bipartite Graphs. SIAM J. Discrete Math. 31, 3 (2017), 2172–2184

  6. [6]

    Gao Cong, Wenfei Fan, and Floris Geerts. 2006. Annotatio n Propagation Revisited for Key Preserving Views. In Proceed- ings of the 15th ACM International Conference on Informatio n and Knowledge Management (CIKM ’06). 632–641

  7. [7]

    Cormen, Charles E

    Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest , and Clifford Stein. 2009. Introduction to Algorithms, Third Edi- tion (3rd ed.). The MIT Press

  8. [8]

    Dalvi and Dan Suciu

    Nilesh N. Dalvi and Dan Suciu. 2012. The dichotomy of prob - abilistic inference for unions of conjunctive queries. J. ACM 59, 6 (2012), 30:1–30:87

  9. [9]

    Bernstein

    Umeshwar Dayal and Philip A. Bernstein. 1982. On the Cor- rect Translation of Update Operations on Relational Views. ACM Trans. Database Syst. 7, 3 (Sept. 1982), 381–416

  10. [10]

    Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, an d Alexandra Meliou. 2015. The Complexity of Resilience and Responsibility for Self-Join-Free Conjunctive Queries. PVLDB 9, 3 (2015), 180–191

  11. [11]

    R Gandhi, S Khuller, and A Srinivasan. 2004. Ap- proximation algorithms for partial covering prob- lems. Journal of Algorithms 53, 1 (2004), 55–84. https://doi.org/10.1016/j.jalgor.2004.04.002

  12. [12]

    Benny Kimelfeld. 2012. A dichotomy in the complexity of deletion propagation with functional dependencies. In Pro- ceedings of the 31st ACM SIGMOD-SIGACT-SIGART Sympo- sium on Principles of Database Systems, PODS 2012, Scottsda le, AZ, USA, May 20-24, 2012 . 191–202

  13. [13]

    Benny Kimelfeld, Jan Vondrák, and Ryan Williams. 2011. Max- imizing conjunctive views in deletion propagation. InProceed- ings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, June 12-16, 2011 , Athens, Greece. 187–198

  14. [14]

    Benny Kimelfeld, Jan Vondrák, and David P. Woodruff. 201 3. Multi-Tuple Deletion Propagation: Approximations and Com- plexity. PVLDB 6, 13 (2013), 1558–1569

  15. [15]

    Paraschos Koutris and Dan Suciu. 2011. Parallel evalua tion of conjunctive queries. In Proceedings of the 30th ACM SIGMOD- SIGACT-SIGART Symposium on Principles of Database Sys- tems, PODS. 223–234

  16. [16]

    Ester Livshits, Benny Kimelfeld, and Sudeepa Roy. 2018 . Com- puting Optimal Repairs for Functional Dependencies. In Pro- ceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, June 10 - 15, 2018. 225–237

  17. [17]

    Moore, and Dan Suciu

    Alexandra Meliou, Wolfgang Gatterbauer, Katherine F. Moore, and Dan Suciu. 2010. The Complexity of Causality and Responsibility for Query Answers and non-Answers. PVLDB 4, 1 (2010), 34–45

  18. [18]

    Alexandra Meliou, Wolfgang Gatterbauer, and Dan Suciu

  19. [19]

    PVLDB 4, 12 (2011), 1490– 1493

    Reverse Data Management. PVLDB 4, 12 (2011), 1490– 1493

  20. [20]

    Alexandra Meliou and Dan Suciu. 2012. Tiresias: the dat a- base oracle for how-to queries. In Proceedings of the ACM SIG- MOD International Conference on Management of Data, SIG- MOD 2012, Scottsdale, AZ, USA, May 20-24, 2012 . 337–348

  21. [21]

    Ngo, Ely Porat, Christopher Ré, and Atri Rudra

    Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2 012. Worst-case optimal join algorithms: [extended abstract]. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Sym- posium on Principles of Database Systems . 37–48

  22. [22]

    Sudeepa Roy, Laurel Orr, and Dan Suciu. 2015. Explainin g Query Answers with Explanation-Ready Databases. PVLDB 9, 4 (2015), 348–359

  23. [23]

    Sudeepa Roy and Dan Suciu. 2014. A formal approach to find - ing explanations for database queries. In International Con- ference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014 . 1579–1590

  24. [24]

    Moshe Y Vardi. 1982. The complexity of relational query lan- guages. In STOC. 137–146

  25. [25]

    Veldhuizen

    Todd L. Veldhuizen. 2014. Triejoin: A Simple, Worst-Ca se Op- timal Join Algorithm. In Proc. 17th International Conference on Database Theory (ICDT) . 96–106

  26. [26]

    Eugene Wu and Samuel Madden. 2013. Scorpion: Explainin g Away Outliers in Aggregate Queries. PVLDB 6, 8 (2013), 553– 564. A HARDNESS PROPAGATION FOR SIMPLIFICATION STEPS In this section we show that for the last three simplification steps in Algorithm 1 that calls the I/s.scP/t.sc/i.sc/m.sc/e.scfunction recur- sively, if the new query Q′ (or one of the new...

  27. [27]

    Given the full CQ Q containing p relations in its body, namely R1,R2,··· ,Rp , we create a set per input tuple in the p relations, and an element per output tuple in Q( D)

    given a solution to k′-PSC, how to recover a solution to GDP( Q,k,D) . Given the full CQ Q containing p relations in its body, namely R1,R2,··· ,Rp , we create a set per input tuple in the p relations, and an element per output tuple in Q( D) . Each set contains elements that correspond to the output tuples resulting from the join between the associated i...