pith. sign in

arxiv: 2303.05327 · v4 · submitted 2023-03-09 · 💻 cs.DB · cs.DS· cs.LO

Direct Access for Answers to Conjunctive Queries with Aggregation

Pith reviewed 2026-05-24 09:16 UTC · model grok-4.3

classification 💻 cs.DB cs.DScs.LO
keywords conjunctive queriesaggregationdirect accessfine-grained complexitysemiringcount-distinctlexicographic orderannotated databases
0
0 comments X

The pith

Past tractability results for direct access to conjunctive query answers extend to annotated databases with aggregation when annotations are excluded from the order.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates evaluating conjunctive queries with grouping and aggregation by building a data structure for fast direct access to ordered answers. For standard aggregates like min, max, count, and sum, queries are represented using commutative semirings. The authors show that previous sufficient conditions for tractability remain valid for such annotated databases if the annotation values do not enter the lexicographic ordering. They separately derive the tractability condition for count-distinct, which cannot use semirings, and explore how complexity shifts when aggregate values participate in the ordering or when most relations carry the multiplicative identity annotation.

Core claim

All past results on direct access continue to hold for annotated databases assuming the annotation does not participate in the lexicographic order. For count-distinct aggregation, which has no efficient semiring representation, the corresponding tractability condition is established. The complexity changes when the aggregate and annotation values are included in the order, and special cases arise with unit annotations and idempotent addition.

What carries the argument

The construction of a loglinear-time data structure providing logarithmic-time direct access to answers, extended via commutative semiring annotations for aggregation.

Load-bearing premise

The annotation itself does not participate in the lexicographic order.

What would settle it

Finding a conjunctive query without self-joins, a lexicographic order excluding annotations, and an annotated database where no loglinear-time data structure for direct access exists, contradicting the extension of past results.

Figures

Figures reproduced from arXiv: 2303.05327 by Benny Kimelfeld, Idan Eldar, Nofar Carmeli.

Figure 1
Figure 1. Figure 1: An example of a Q-database over the numerical semiring constructed to evaluate the AggCQ Q(c, Sum(t)) :− Teams(p, c), Goals(g, p, t), Replays(g, t). rational numbers rather than all real numbers to avoid issues of numerical presentation in the computational model. Avg can be computed using Sum and Count. CountD (count distinct) cannot be captured by a semiring, as the result of ⊕ cannot be computed from tw… view at source ↗
Figure 2
Figure 2. Figure 2: Example of QR and DR when N = 4 and d = 3. The input (D, τ ) is defined by D = {R(2), R(13), R(64), R(192)} where τ (R(a)) = a for every fact R(a). hardness when the computed value is within an AggCQ, say using Count. For example, direct access for Q(Count(), x, y):− R(x), S(y) is clearly in ⟨loglinear, log⟩ since the computed value has no impact (as it is always 1). Nevertheless, we can show that incorpor… view at source ↗
Figure 3
Figure 3. Figure 3: Example of the construction in the proof of Proposition 5.12: direct access for the AggCQ Q(Count(), x, x′ ) :− R(x, w), R′ (x ′ , w′ ). also phrase Lemma 5.4 over the Cartesian product Q(Count(), x, y):− R(x), S(y) or alike. Clearly, incorporating Count in R(x)×S(y) would be meaningless since every answer appears exactly once (and has a count of 1). The next theorem shows that the reason goes deeper: even… view at source ↗
Figure 4
Figure 4. Figure 4: An example for the construction from Lemma 6.9 on the query Q(x1, x2):− R(w1, w2), S(w2, x1), T(x1, x2, w3), U(x2, w4, w5) for R-annotated databases. Here, x1 is the carrying variable, Rcarry = S|free(Q) , and V = {w1, w2}. D that agrees with f ′ on the free variables and has a0 in place of any existential variables. We conclude that the claimed homomorphism exists, and ⃗a is also an answer for Q(D). It is… view at source ↗
read the original abstract

We study the fine-grained complexity of conjunctive queries with grouping and aggregation. For common aggregate functions (e.g., min, max, count, sum), such a query can be phrased as an ordinary conjunctive query over a database annotated with a suitable commutative semiring. We investigate the ability to evaluate such queries by constructing in loglinear time a data structure that provides logarithmic-time direct access to the answers ordered by a given lexicographic order. This task is nontrivial since the number of answers might be larger than loglinear in the size of the input, so the data structure needs to provide a compact representation of the space of answers. In the absence of aggregation and annotation, past research established a sufficient tractability condition on queries and orders. For queries without self-joins, this condition is not just sufficient, but also necessary (under conventional lower-bound assumptions in fine-grained complexity). We show that all past results continue to hold for annotated databases, assuming that the annotation itself does not participate in the lexicographic order. Yet, past algorithms do not apply to the count-distinct aggregation, which has no efficient representation as a commutative semiring; for this aggregation, we establish the corresponding tractability condition. We then show how the complexity of the problem changes when we include the aggregate and annotation value in the order. We also study the impact of having all relations but one annotated by the multiplicative identity (one), as happens when we translate aggregate queries into semiring annotations, and having a semiring with an idempotent addition, such as the case of min, max, and count-distinct over a logarithmic-size domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies the fine-grained complexity of conjunctive queries with grouping and aggregation. It shows that prior results on constructing loglinear-time data structures for logarithmic-time direct access to answers in lexicographic order extend to semiring-annotated databases (covering aggregates like min, max, count, sum), provided the annotation does not participate in the order. For count-distinct (which lacks an efficient commutative semiring representation), a separate tractability condition is derived. The work further examines the complexity when aggregate/annotation values are included in the order, as well as special cases such as most relations annotated by the multiplicative identity and idempotent semirings.

Significance. If the central claims hold, the results meaningfully extend the tractability landscape for direct-access evaluation of CQs to the aggregated setting that is central to database query processing. The explicit handling of the non-participation assumption for annotations, the separate treatment of count-distinct, and the analysis of order inclusion and idempotence are useful contributions that build directly on prior fine-grained complexity results for non-aggregated CQs.

major comments (2)
  1. [Abstract] Abstract (paragraph on annotated databases): The extension of past results is conditioned on the assumption that 'the annotation itself does not participate in the lexicographic order.' This assumption is load-bearing for the claim that 'all past results continue to hold'; the main theorems should explicitly restate the precise query classes (e.g., self-join-free) and order conditions under which the extension applies, rather than relying on the abstract phrasing.
  2. [Abstract] Abstract (paragraph on count-distinct): The tractability condition for count-distinct is asserted but not stated; the corresponding section should give the exact syntactic condition on the query and order (analogous to the non-aggregated case) and prove it is both sufficient and necessary under standard assumptions.
minor comments (2)
  1. Clarify in the preliminaries whether the semiring operations are required to be computable in constant time or logarithmic time for the loglinear preprocessing bound to hold.
  2. The discussion of translating aggregate queries into semiring annotations (most relations annotated by 1) should include a brief example showing how the direct-access structure is constructed in this restricted case.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the two major comments below and will make the requested clarifications to the abstract.

read point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph on annotated databases): The extension of past results is conditioned on the assumption that 'the annotation itself does not participate in the lexicographic order.' This assumption is load-bearing for the claim that 'all past results continue to hold'; the main theorems should explicitly restate the precise query classes (e.g., self-join-free) and order conditions under which the extension applies, rather than relying on the abstract phrasing.

    Authors: We agree that the abstract should be more explicit. We will revise it to state that the extension applies to self-join-free conjunctive queries under the same lexicographic order conditions as the non-annotated case (with annotations excluded from the order). The body of the paper already contains the precise theorems with these conditions; the abstract revision will align the high-level claim with them. revision: yes

  2. Referee: [Abstract] Abstract (paragraph on count-distinct): The tractability condition for count-distinct is asserted but not stated; the corresponding section should give the exact syntactic condition on the query and order (analogous to the non-aggregated case) and prove it is both sufficient and necessary under standard assumptions.

    Authors: We will revise the abstract to explicitly state the syntactic tractability condition for count-distinct (the same as the non-aggregated case). The corresponding section already provides the exact condition together with proofs of sufficiency and necessity under standard fine-grained assumptions; the abstract change will make this visible at the summary level. revision: yes

Circularity Check

0 steps flagged

Minor self-citation of prior tractability results; extensions for annotated databases and count-distinct are independent

full rationale

The paper explicitly conditions the extension of prior results on the assumption that annotations do not participate in the lexicographic order and separately derives a new tractability condition for count-distinct aggregation (which lacks a semiring representation). These steps are stated directly in the abstract and do not reduce any new claim to a fitted parameter, self-definition, or unverified self-citation chain. The reference to 'past research' on the non-aggregated case is a standard citation of independent prior work and is not load-bearing for the novel annotated or count-distinct contributions. No equations or derivations in the provided text equate a claimed result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on prior results about conjunctive queries without aggregation and on conventional lower-bound assumptions from fine-grained complexity; no free parameters, new axioms beyond domain standards, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Conventional lower-bound assumptions in fine-grained complexity
    Invoked to establish necessity of the tractability condition for queries without self-joins.

pith-pipeline@v0.9.0 · 5832 in / 1364 out tokens · 40350 ms · 2026-05-24T09:16:38.858916+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

  1. [1]

    On acyclic conjunctive queries and constant delay enumeration

    Guillaume Bagan, Arnaud Durand, and Etienne Grandjean. On acyclic conjunctive queries and constant delay enumeration. In Jacques Duparc and Thomas A. Henzinger, editors, Computer Science Logic , pages 208--222, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg

  2. [2]

    Computing the JTH solution of a first-order query

    Guillaume Bagan, Arnaud Durand, Etienne Grandjean, and Fr \'e d \'e ric Olive. Computing the JTH solution of a first-order query . RAIRO - Theoretical Informatics and Applications (RAIRO: ITA) , 42:147--164, 2008. URL: https://hal.archives-ouvertes.fr/hal-00221730

  3. [3]

    Answering FO+MOD queries under updates on bounded degree databases

    Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. Answering FO+MOD queries under updates on bounded degree databases. ACM Trans. Database Syst. , 43(2):7:1--7:32, 2018

  4. [4]

    De la pertinence de l' \'e num \'e ration : complexit \'e en logiques propositionnelle et du premier ordre

    Johann Brault-Baron. De la pertinence de l' \'e num \'e ration : complexit \'e en logiques propositionnelle et du premier ordre . Theses, Universit \'e de Caen , April 2013

  5. [5]

    Tight fine-grained bounds for direct access on join queries

    Karl Bringmann, Nofar Carmeli, and Stefan Mengel. Tight fine-grained bounds for direct access on join queries. In PODS , pages 427--436. ACM , 2022

  6. [6]

    Tight fine-grained bounds for direct access on join queries

    Karl Bringmann, Nofar Carmeli, and Stefan Mengel. Tight fine-grained bounds for direct access on join queries. CoRR , abs/2201.02401, 2022

  7. [7]

    Tractable orders for direct access to ranked answers of conjunctive queries

    Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld, and Mirek Riedewald. Tractable orders for direct access to ranked answers of conjunctive queries. In PODS , pages 325--341. ACM , 2021

  8. [8]

    Answering (unions of) conjunctive queries using random access and random-order enumeration

    Nofar Carmeli, Shai Zeevi, Christoph Berkholz, Alessio Conte, Benny Kimelfeld, and Nicole Schweikardt. Answering (unions of) conjunctive queries using random access and random-order enumeration. ACM Trans. Database Syst. , 47(3):9:1--9:49, 2022

  9. [9]

    A course in computational algebraic number theory , volume 138 of Graduate texts in mathematics

    Henri Cohen. A course in computational algebraic number theory , volume 138 of Graduate texts in mathematics . Springer, 1993

  10. [10]

    Deciding equivalences among conjunctive aggregate queries

    Sara Cohen, Werner Nutt, and Yehoshua Sagiv. Deciding equivalences among conjunctive aggregate queries. J. ACM , 54(2):5–es, apr 2007

  11. [11]

    On a class of o(n2) problems in computational geometry

    Anka Gajentaan and Mark H Overmars. On a class of o(n2) problems in computational geometry. Computational Geometry , 5(3):165--185, 1995. URL: https://www.sciencedirect.com/science/article/pii/0925772195000222, https://doi.org/https://doi.org/10.1016/0925-7721(95)00022-2 doi:https://doi.org/10.1016/0925-7721(95)00022-2

  12. [12]

    Which arithmetic operations can be performed in constant time in the ram model with addition?, 2022

    Étienne Grandjean and Louis Jachiet. Which arithmetic operations can be performed in constant time in the ram model with addition?, 2022. URL: https://arxiv.org/abs/2206.13851, https://doi.org/10.48550/ARXIV.2206.13851 doi:10.48550/ARXIV.2206.13851

  13. [13]

    Green, Grigoris Karvounarakis, and Val Tannen

    Todd J. Green, Grigoris Karvounarakis, and Val Tannen. Provenance semirings. In PODS , PODS '07, page 31–40, New York, NY, USA, 2007. Association for Computing Machinery

  14. [14]

    The dynamic yannakakis algorithm: Compact and efficient query processing under updates

    Muhammad Idris, Martin Ugarte, and Stijn Vansummeren. The dynamic yannakakis algorithm: Compact and efficient query processing under updates. In SIGMOD , page 1259–1274, New York, NY, USA, 2017. Association for Computing Machinery

  15. [15]

    Trotter Jr

    Leslie E. Trotter Jr. Algorithmic graph theory and perfect graphs, by martin c. golumbic, academic, new york, 284 pp. price: 34.00. Networks , 13(2):304--305, 1983. https://doi.org/10.1002/net.3230130214 doi:10.1002/net.3230130214

  16. [16]

    Curtin, Benjamin Moseley, Hung Q

    Mahmoud Abo Khamis, Ryan R. Curtin, Benjamin Moseley, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich. Functional aggregate queries with additive inequalities. ACM Trans. Database Syst. , 45(4):17:1--17:41, 2020

  17. [17]

    Ngo, and Atri Rudra

    Mahmoud Abo Khamis, Hung Q. Ngo, and Atri Rudra. FAQ: questions asked frequently. In PODS , pages 13--28. ACM , 2016

  18. [18]

    Factorized databases

    Dan Olteanu and Maximilian Schleich. Factorized databases. SIGMOD Rec. , 45(2):5--16, 2016

  19. [19]

    Size bounds for factorised representations of query results

    Dan Olteanu and Jakub Z\' a vodn\' y . Size bounds for factorised representations of query results. ACM Trans. Database Syst. , 40(1), mar 2015

  20. [20]

    Towards polynomial lower bounds for dynamic problems

    Mihai Patrascu. Towards polynomial lower bounds for dynamic problems. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing , STOC '10, page 603–610, New York, NY, USA, 2010. Association for Computing Machinery. https://doi.org/10.1145/1806689.1806772 doi:10.1145/1806689.1806772

  21. [21]

    The trichotomy of HAVING queries on a probabilistic database

    Christopher R \' e and Dan Suciu. The trichotomy of HAVING queries on a probabilistic database. VLDB J. , 18(5):1091--1116, 2009

  22. [22]

    Dynamic complexity under definable changes

    Thomas Schwentick, Nils Vortmeier, and Thomas Zeume. Dynamic complexity under definable changes. ACM Trans. Database Syst. , 43(3):12:1--12:38, 2018

  23. [23]

    Hardness of easy problems: Basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk)

    Virginia Vassilevska Williams. Hardness of easy problems: Basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk). In IPEC . Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2015

  24. [24]

    Algorithms for acyclic database schemes

    Mihalis Yannakakis. Algorithms for acyclic database schemes. In VLDB , VLDB '81, page 82–94. VLDB Endowment, 1981