Query Optimization and Evaluation via Information Theory: A Tutorial

Dan Suciu; Hung Q. Ngo; Mahmoud Abo Khamis

arxiv: 2604.04893 · v1 · submitted 2026-04-06 · 💻 cs.DB · cs.IT· math.IT

Query Optimization and Evaluation via Information Theory: A Tutorial

Mahmoud Abo Khamis , Hung Q. Ngo , Dan Suciu This is my paper

Pith reviewed 2026-05-10 19:30 UTC · model grok-4.3

classification 💻 cs.DB cs.ITmath.IT

keywords conjunctive queriesquery optimizationinformation theoryPANDA frameworkcardinality boundsquery evaluationdatabase theoryconstraint satisfaction

0 comments

The pith

The PANDA framework derives conjunctive query plans directly from proofs of information-theoretic bounds on intermediate cardinalities to match specialized algorithm runtimes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper is a tutorial on the PANDA framework for conjunctive query optimization and evaluation. It shows how information theory supplies tight upper bounds on the sizes of intermediate results, and how those bounds both set plan costs and dictate the structure of the plans themselves. This produces a single general method that works across problems generalizing graph pattern matching, constraint satisfaction, and statistical inference. A reader would care because the approach claims to deliver runtimes competitive with or better than hand-tuned algorithms for each specific case, including ones that use fast matrix multiplication.

Core claim

By computing information-theoretically tight upper bounds on the cardinalities of intermediate relations and constructing query plans directly from the mathematical proofs of those bounds, the PANDA framework yields a generic algorithm for conjunctive query evaluation whose performance matches or subsumes that of specialized algorithms for the same problems.

What carries the argument

The PANDA framework, which translates information-theoretically tight cardinality upper bounds into executable query plans by following the structure of their proofs.

If this is right

A single optimizer can achieve asymptotically optimal runtimes for many different conjunctive queries without requiring separate code for each problem class.
Query plans can be generated automatically by mirroring the steps in the proof of a cardinality bound.
Techniques from database theory, constraint satisfaction, and graph algorithms become interchangeable under the same bounding method.
Performance can equal or exceed that of algorithms relying on fast matrix multiplication for suitable queries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Database systems could adopt bound-derivation techniques to improve performance on mixed workloads without maintaining many specialized code paths.
The same proof-to-plan coupling might be tested on query classes beyond conjunctive queries, such as those with aggregates or disjunctions.
Practical implementations would need to measure the exact overhead of bound computation on real data to confirm the claimed advantage holds outside theory.

Load-bearing premise

The information-theoretically tight upper bounds on intermediate cardinalities can be computed and turned into executable plans with overhead low enough to preserve the asymptotic runtime advantage.

What would settle it

A concrete conjunctive query together with input statistics for which a PANDA-derived plan runs asymptotically slower than a published specialized algorithm, or for which computing the bounds takes longer than the evaluation savings they provide.

Figures

Figures reproduced from arXiv: 2604.04893 by Dan Suciu, Hung Q. Ngo, Mahmoud Abo Khamis.

**Figure 1.** Figure 1: Query Q□ from Eq. (2) (or Qfull □ from Eq. (1)), along with the two free-connex tree decompositions. (1) The bags form an acyclic query Q(F) :- V B∈bags(T ) QB(B), where QB is a new relation symbol associated with B. (2) Each atom R(X) ∈ atoms(Q) is contained in some bag, i.e., satisfies X ⊆ B for some B ∈ bags(T ). The TD T is called free-connex if the acyclic query Q(F) :- V B∈bags(T ) QB(B) is free-conn… view at source ↗

**Figure 2.** Figure 2: An input database instance for the 4-cycle query Qfull □ from Eq. (1) along with the corresponding output. Each output tuple is annotated with its probability (the red numbers to the right) under a uniform distribution. Similarly, each input tuple is annotated with its marginal probability. where QB is the output of rule (13). Consequently, the first thing we need to do is to estimate supD|=S |QB(D)| for e… view at source ↗

read the original abstract

Database theory is exciting because it studies highly general and practically useful abstractions. Conjunctive query (CQ) evaluation is a prime example: it simultaneously generalizes graph pattern matching, constraint satisfaction, and statistical inference, among others. This generality is both the strength and the central challenge of the field. The query optimization and evaluation problem is fundamentally a "meta-algorithm" problem: given a query $Q$ and statistics $\cal S$ about the input database, how should one best answer $Q$? Because the problem is so general, it is often impossible for such a meta-algorithm to match the runtimes of specialized algorithms designed for a fixed query -- or so it seemed. The past fifteen years have witnessed an exciting development in database theory: a general framework, called PANDA, that emerged from advances in database theory, constraint satisfaction problems (CSP), and graph algorithms, for evaluating conjunctive queries given input data statistics. The key idea is to derive information-theoretically tight upper bounds on the cardinalities of intermediate relations produced during query evaluation. These bounds determine the costs of query plans, and crucially, the query plans themselves are derived directly from the mathematical proof of the upper bound. This tight coupling of proof and algorithm is what makes PANDA both principled and powerful. Remarkably, this generic algorithm matches -- and in some cases subsumes -- the runtimes of specialized algorithms for the same problems, including algorithms that exploit fast matrix multiplication. This paper is a tutorial on the PANDA framework. We illustrate the key ideas through concrete examples, conveying the main intuitions behind the theory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a tutorial explaining the existing PANDA framework for information-theoretic query optimization, with clear examples but no new theorems or experiments.

read the letter

The main takeaway is that this paper is an expository tutorial on the PANDA framework developed over the last fifteen years, not a source of new results. It walks through how information-theoretic bounds on intermediate cardinalities can be turned into query plans for conjunctive queries, and shows that those plans can match or beat specialized algorithms in some cases, including ones using fast matrix multiplication. The examples are the strongest part: they make the link between the entropy-based proofs and the actual evaluation steps feel concrete and usable. That tight coupling of bound proof to plan structure is the core idea the authors want readers to carry away. The writing stays focused on intuition rather than heavy formalism, which fits the tutorial format. On the downside, the paper supplies no proofs, no counterexamples, and no runtime measurements of its own. The abstract claims asymptotic matching, but without data or a worked complexity analysis of the bound-computation step itself, it is hard to judge whether the overhead of solving the linear programs or materializing the plans stays low enough to preserve the advantage. The stress-test concern about super-linear costs in the LP or plan-generation phase is not addressed here, so any reader interested in production use would need to dig into the cited prior papers for those details. This piece is aimed at database researchers, graduate students, or practitioners who already know conjunctive queries and want a single narrative that connects the CSP, graph-algorithm, and statistical-inference angles. It does not replace the original technical papers, but it lowers the barrier to understanding the overall approach. I would send it out for peer review at a venue that publishes tutorials or surveys; the framework is important enough that a clear, self-contained exposition adds value even without fresh theorems.

Referee Report

2 major / 2 minor

Summary. The manuscript is a tutorial on the PANDA framework for conjunctive query optimization and evaluation. It describes how information-theoretically tight upper bounds on the cardinalities of intermediate relations are derived from entropy inequalities and used to construct query plans directly from the mathematical proofs of these bounds. The tutorial claims that this approach allows a generic algorithm to match or subsume the runtimes of specialized algorithms for the same problems, including those that use fast matrix multiplication, and illustrates the ideas with concrete examples.

Significance. If the tutorial accurately conveys the PANDA framework's ability to achieve tight bounds and efficient plans, it could be significant for the database theory community by making advanced results from the intersection of database theory, CSP, and graph algorithms more accessible. The tight coupling of proofs to algorithms is a notable strength that could inspire new query optimization techniques.

major comments (2)

[Discussion of runtime matching and PANDA framework] The central claim that PANDA matches the runtimes of specialized algorithms (including fast matrix multiplication) is load-bearing, but the tutorial does not address the computational overhead of computing the optimal information-theoretic bounds (via linear programs over entropy variables, which can be exponential in size for general CQs) or of translating the proof structure into an executable plan. This overhead must be shown to be asymptotically negligible to substantiate the claim.
[Examples illustrating the key ideas] In the concrete examples used to illustrate the key ideas, include explicit calculations of the bound derivation (e.g., the LP or entropy inequalities used) and the resulting plan, along with a comparison to the runtime of the specialized algorithm it is claimed to match.

minor comments (2)

[Notation and definitions] Ensure consistent use of notation for statistics S and query Q throughout the tutorial.
[References] Add references to the original PANDA papers for readers who wish to delve deeper into the proofs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive view of the tutorial's potential impact. We address the major comments point by point below and will revise the manuscript accordingly to strengthen the presentation.

read point-by-point responses

Referee: The central claim that PANDA matches the runtimes of specialized algorithms (including fast matrix multiplication) is load-bearing, but the tutorial does not address the computational overhead of computing the optimal information-theoretic bounds (via linear programs over entropy variables, which can be exponential in size for general CQs) or of translating the proof structure into an executable plan. This overhead must be shown to be asymptotically negligible to substantiate the claim.

Authors: We agree that an explicit discussion of overhead is needed to fully substantiate the runtime claims. In the PANDA framework the entropy LP is solved once per query; under standard data-complexity assumptions (fixed query) its size is constant and therefore asymptotically negligible relative to input size. Plan construction follows the proof structure and runs in time polynomial in the (query-dependent) proof size. We will add a dedicated subsection on these complexity considerations, clarifying that the matching runtimes refer to the data-dependent evaluation phase and that the overhead is acceptable for the fixed-query setting in which specialized algorithms are also analyzed. revision: yes
Referee: In the concrete examples used to illustrate the key ideas, include explicit calculations of the bound derivation (e.g., the LP or entropy inequalities used) and the resulting plan, along with a comparison to the runtime of the specialized algorithm it is claimed to match.

Authors: We appreciate this suggestion for making the tutorial more self-contained. The current examples were kept concise to convey intuition, but we will expand them to show the explicit entropy inequalities (or the corresponding LP), the query plan derived directly from the proof, and a side-by-side runtime comparison with the specialized algorithm (including fast-matrix-multiplication baselines). These additions will be placed in the relevant example sections without altering the overall tutorial structure. revision: yes

Circularity Check

0 steps flagged

No circularity: tutorial presents prior PANDA results without self-referential derivations

full rationale

The paper is explicitly an expository tutorial on the established PANDA framework from prior publications. It describes the derivation of information-theoretic cardinality bounds and their translation into query plans, but introduces no new equations, predictions, or first-principles results within this text that reduce to fitted parameters, self-definitions, or self-citation chains by construction. Central claims about matching specialized runtimes (including fast matrix multiplication) are attributed to the framework's existing properties rather than being re-derived here in a tautological manner. No load-bearing step equates output to input via the paper's own equations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The tutorial rests on the prior development of the PANDA framework; no new free parameters, axioms, or invented entities are introduced in the abstract.

axioms (1)

domain assumption Information-theoretically tight upper bounds on intermediate cardinalities exist and can be derived for conjunctive queries given input statistics.
This is the central premise stated in the abstract as the key idea of PANDA.

pith-pipeline@v0.9.0 · 5592 in / 1296 out tokens · 49776 ms · 2026-05-10T19:30:16.850402+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The key idea is to derive information-theoretically tight upper bounds on the cardinalities of intermediate relations... turning each proof step into an atomic relational operator
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

submodular width... Shannon-flow inequalities

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages

[1]

Abo Khamis and H

M. Abo Khamis and H. Chen,Jaguar: A Primal Algorithm for Conjunctive Query Evaluation in Submodular-Width Time, Proc. ACM Manag. Data, (2026)

work page 2026
[2]

Abo Khamis, R

M. Abo Khamis, R. R. Curtin, B. Moseley, H. Q. Ngo, X. Nguyen, D. Olteanu, and M. Schleich,Functional aggregate queries with additive inequalities, ACM Trans. Database Syst., 45 (2020)

work page 2020
[3]

Abo Khamis, V

M. Abo Khamis, V. Nakos, D. Olteanu, and D. Suciu,Join size bounds using lp-norms on degree sequences, Proc. ACM Manag. Data, 2 (2024)

work page 2024
[4]

Abo Khamis, H

M. Abo Khamis, H. Q. Ngo, and A. Rudra,FAQ: questions asked frequently, in Proceedings of the 35th ACM PODS 2016, San Francisco, CA, USA, June 26 - July 01, 2016, ACM, 2016, pp. 13–28

work page 2016
[5]

Abo Khamis, H

M. Abo Khamis, H. Q. Ngo, and D. Suciu,Computing join queries with functional dependencies, in Proceedings of the 35th ACM PODS 2016, San Francisco, CA, USA, June 26 - July 01, 2016, ACM, 2016, pp. 327–342

work page 2016
[6]

Abo Khamis, H

M. Abo Khamis, H. Q. Ngo, and D. Suciu,What Do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog Have to Do with One Another?, in Proceedings of the 36th ACM PODS 2017, Chicago, IL, USA, May 14-19, 2017, ACM, 2017, pp. 429–444. Extended version available athttp://arxiv.org/abs/1612.02503

work page arXiv 2017
[7]

Abo Khamis, H

M. Abo Khamis, H. Q. Ngo, and D. Suciu,PANDA: Query Evaluation in Submodular Width, TheoretiCS, Volume 4 (2025)

work page 2025
[8]

Abo Khamis, H

M. Abo Khamis, H. Q. Ngo, and D. Suciu,PANDAExpress: a Simpler and Faster PANDA Algorithm, Proc. ACM Manag. Data, (2026). 14We also need to replace the maximum in Eq. (42) by the supremum since the set of entropic functions is not closed. 15In general, we cannot perform FMM over the ( min, +) semiring because there is no subtraction. Also, full queries ca...

work page 2026
[9]

Alon,On the number of subgraphs of prescribed type of graphs with a given number of edges, Israel J

N. Alon,On the number of subgraphs of prescribed type of graphs with a given number of edges, Israel J. Math., 38 (1981), pp. 116–130. [11]N. Alon, R. Yuster, and U. Zwick,Finding and counting given length cycles, Algorithmica, 17 (1997), pp. 209–223

work page 1981
[10]

Atserias, M

A. Atserias, M. Grohe, and D. Marx,Size bounds and query plans for relational joins, SIAM J. Comput., 42 (2013), pp. 1737–1767

work page 2013
[11]

Bagan, A

G. Bagan, A. Durand, and E. Grandjean,On acyclic conjunctive queries and constant delay enumeration, in Computer Science Logic, 21st International Workshop, CSL 2007, 16th Annual Conference of the EACSL, Lausanne, Switzerland, September 11-15, 2007, Proceedings, vol. 4646 of Lecture Notes in Computer Science, Springer, 2007, pp. 208–222

work page 2007
[12]

Berkholz and H

C. Berkholz and H. Vinall-Smeeth,Factorised Representations of Join Queries: Tight Bounds and a New Dichotomy, arXiv e-prints, (2025), p. arXiv:2503.20438

work page arXiv 2025
[13]

Bringmann and E

K. Bringmann and E. Gorbachev,A fine-grained classification of subquadratic patterns for subgraph listing and friends, in Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC 2025, Prague, Czechia, June 23-27, 2025, ACM, 2025, pp. 2145–2156

work page 2025
[14]

T. H. Chan and R. W. Yeung,On a relation between information inequalities and group theory, IEEE Transactions on Information Theory, 48 (2002), pp. 1992–1995

work page 2002
[15]

A. K. Chandra and P. M. Merlin,Optimal implementation of conjunctive queries in relational data bases, in Proceedings of the 9th Annual ACM Symposium on Theory of Computing, May 4-6, 1977, Boulder, Colorado, USA, ACM, 1977, pp. 77–90

work page 1977
[16]

S. Chaudhuri,An overview of query optimization in relational systems, in Proceedings of the Seventeenth ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems, June 1-3, 1998, Seattle, Washington, USA, ACM Press, 1998, pp. 34–43

work page 1998
[17]

F. R. K. Chung, R. L. Graham, P. Frankl, and J. B. Shearer,Some intersection theorems for ordered sets and graphs, J. Combin. Theory Ser. A, 43 (1986), pp. 23–37

work page 1986
[18]

Dalirrooyfard, S

M. Dalirrooyfard, S. Mathialagan, V. V. Williams, and Y. Xu,Towards optimal output-sensitive clique listing or: Listing cliques from smaller cliques, in Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, New York, NY, USA, 2024, Association for Computing Machinery, p. 923–934

work page 2024
[19]

Dalirrooyfard, T

M. Dalirrooyfard, T. Vuong, and V. Vassilevska Williams,Graph pattern detection: Hardness for all induced patterns and faster noninduced cycles, SIAM J. Comput., 50 (2021), pp. 1627–1662. [22]R. Dechter,Bucket elimination: A unifying framework for reasoning, Artif. Intell., 113 (1999), pp. 41–85

work page 2021
[20]

,Bucket elimination: A unifying framework for reasoning, Artificial Intelligence, 113 (1999), pp. 41–85

work page 1999
[21]

Deeds and T

K. Deeds and T. C. Merkl,Partition Constraints for Conjunctive Queries: Bounds and Worst-Case Optimal Joins, in 28th International Conference on Database Theory (ICDT 2025), vol. 328 of Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany, 2025, Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Informatik, pp. 17:1–17:18

work page 2025
[22]

S. Deep, H. Zhao, A. Z. Fan, and P. Koutris,Output-sensitive conjunctive query evaluation, Proc. ACM Manag. Data, 2 (2024)

work page 2024
[23]

B. Ding, V. R. Narasayya, and S. Chaudhuri,Extensible query optimizers in practice, Found. Trends Databases, 14 (2024), pp. 186–402

work page 2024
[24]

Durand and S

A. Durand and S. Mengel,Structural tractability of counting of solutions to conjunctive queries, in Proceedings of the 16th International Conference on Database Theory, ICDT ’13, New York, NY, USA, 2013, Association for Computing Machinery, p. 81–92

work page 2013
[25]

A. Z. Fan, P. Koutris, and H. Zhao,The Fine-Grained Complexity of Boolean Conjunctive Queries and Sum-Product Problems, in 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023), vol. 261 of Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany, 2023, Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Informatik, p...

work page 2023
[26]

ACM Manag

,Tight bounds of circuits for sum-product queries, Proc. ACM Manag. Data, 2 (2024). [30]E. Friedgut,Hypergraphs, entropy, and inequalities, Amer. Math. Monthly, 111 (2004), pp. 749–760

work page 2024
[27]

Friedgut and J

E. Friedgut and J. Kahn,On the number of copies of one hypergraph in another, Israel J. Math., 105 (1998), pp. 251–256

work page 1998
[28]

,On the number of copies of one hypergraph in another, Israel Journal of Mathematics, 105 (1998), pp. 251–256

work page 1998
[29]

Gottlob, G

G. Gottlob, G. Greco, N. Leone, and F. Scarcello,Hypertree decompositions: Questions and answers, in Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2016, San Francisco, CA, USA, June 26 - July 01, 2016, ACM, 2016, pp. 57–74

work page 2016
[30]

Gottlob, S

G. Gottlob, S. T. Lee, and G. Valiant,Size and treewidth bounds for conjunctive queries, in Proceedings of the Twenty-Eigth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2009, June 19 - July 1, 2009, Providence, Rhode Island, USA, ACM, 2009, pp. 45–54

work page 2009
[31]

Gottlob, S

G. Gottlob, S. T. Lee, G. Valiant, and P. Valiant,Size and treewidth bounds for conjunctive queries, J. ACM, 59 (2012), pp. 16:1–16:35

work page 2012
[32]

Gottlob, N

G. Gottlob, N. Leone, and F. Scarcello,Hypertree decompositions and tractable queries, Journal of Computer and System Sciences, 64 (2002), pp. 579–627

work page 2002
[33]

T. J. Green, G. Karvounarakis, and V. Tannen,Provenance semirings, in Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’07, New York, NY, USA, 2007, Association for Computing Machinery, p. 31–40. 25

work page 2007
[34]

T. J. Green and V. Tannen,The semiring framework for database provenance, in Proceedings of the 36th ACM SIGMOD- SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017, Chicago, IL, USA, May 14-19, 2017, ACM, 2017, pp. 93–99

work page 2017
[35]

Grohe and D

M. Grohe and D. Marx,Constraint solving via fractional edge covers, in Proceedings of the Seventeenth Annual ACM- SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, ACM Press, 2006, pp. 289–298

work page 2006
[36]

Algorithms, 11 (2014), pp

,Constraint solving via fractional edge covers, ACM Trans. Algorithms, 11 (2014), pp. 4:1–4:20. [41]X. Hu,Output-optimal algorithms for join-aggregate queries, Proc. ACM Manag. Data, 3 (2025)

work page 2014
[37]

S. Im, B. Moseley, H. Q. Ngo, and K. Pruhs,Efficient algorithms for cardinality estimation and conjunctive query evaluation with simple degree constraints, Proc. ACM Manag. Data, 3 (2025), pp. 96:1–96:26

work page 2025
[38]

Itai and M

A. Itai and M. Rodeh,Finding a minimum circuit in a graph, in Proceedings of the 9th Annual ACM Symposium on Theory of Computing, May 4-6, 1977, Boulder, Colorado, USA, ACM, 1977, pp. 1–10

work page 1977
[39]

M. A. Khamis, X. Hu, and D. Suciu,Fast matrix multiplication meets the submodular width, Proc. ACM Manag. Data, 3 (2025), pp. 98:1–98:26

work page 2025
[40]

M. A. Khamis, H. Q. Ngo, C. R ´e, and A. Rudra,Joins via geometric resolutions: Worst case and beyond, ACM Trans. Database Syst., 41 (2016)

work page 2016
[41]

P. G. Kolaitis, A. E. Romashchenko, M. Studen ´y, D. Suciu, and T. A. Boege,Algorithmic Aspects of Information Theory (Dagstuhl Seminar 22301), Dagstuhl Reports, 12 (2023), pp. 180–204

work page 2023
[42]

P. G. Kolaitis and M. Y. Vardi,Conjunctive-query containment and constraint satisfaction, in Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 1-3, 1998, Seattle, Washington, USA, ACM Press, 1998, pp. 205–213. [48]D. Koller and N. Friedman,Probabilistic Graphical Models - Principles and Techniques, M...

work page 1998
[43]

V. Leis, A. Gubichev, A. Mirchev, P. A. Boncz, A. Kemper, and T. Neumann,How good are query optimizers, really?, Proc. VLDB Endow., 9 (2015), pp. 204–215. [50]D. Maier,Theory of Relational Databases, Computer Science Pr, 1983

work page 2015
[44]

Marx,Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries, J

D. Marx,Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries, J. ACM, 60 (2013), pp. 42:1–42:51

work page 2013
[45]

H. Q. Ngo, E. Porat, C. R ´e, and A. Rudra,Worst-case optimal join algorithms: [extended abstract], in Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24, 2012, ACM, 2012, pp. 37–48. [53]H. Q. Ngo, E. Porat, C. R ´e, and A. Rudra,Worst-case optimal join algorithms, J. ACM...

work page 2012
[46]

H. Q. Ngo, C. R ´e, and A. Rudra,Skew strikes back: new developments in the theory of join algorithms, SIGMOD Rec., 42 (2013), pp. 5–16. [55]R. Ramakrishnan and J. Gehrke,Database management systems (3. ed.), McGraw-Hill, 2003

work page 2013
[47]

T. L. Veldhuizen,Triejoin: A simple, worst-case optimal join algorithm, in Proc. 17th International Conference on Database Theory (ICDT), Athens, Greece, March 24-28, 2014, OpenProceedings.org, 2014, pp. 96–106

work page 2014
[48]

Vikneshwar Mani Jayaraman, C

S. Vikneshwar Mani Jayaraman, C. Ropell, and A. Rudra,Worst-case Optimal Binary Join Algorithms under General ℓp Constraints, arXiv e-prints, (2021), p. arXiv:2112.01003

work page arXiv 2021
[49]

V. V. Williams, Y. Xu, Z. Xu, and R. Zhou,New Bounds for Matrix Multiplication: from Alpha to Omega, pp. 3792–3835

work page
[50]

M. Yannakakis,Algorithms for acyclic database schemes, in Very Large Data Bases, 7th International Conference, September 9-11, 1981, Cannes, France, Proceedings, IEEE Computer Society, 1981, pp. 82–94

work page 1981
[51]

Yuster and U

R. Yuster and U. Zwick,Detecting short directed cycles using rectangular matrix multiplication and dynamic programming., in SODA, vol. 4, 2004, pp. 254–260

work page 2004
[52]

N. L. Zhang and D. Poole,Exploiting causal independence in bayesian network inference, J. Artif. Int. Res., 5 (1996), p. 301–328

work page 1996
[53]

N. X. Zhang and D. Poole,A simple approach to bayesian network computations, in Proceedings of the Canadian AI Conference, Springer, 1994, pp. 207–214

work page 1994
[54]

Zhang and R

Z. Zhang and R. W. Yeung,A non-shannon-type conditional inequality of information quantities, IEEE Trans. Information Theory, 43 (1997), pp. 1982–1986

work page 1997
[55]

Zhang and R

Z. Zhang and R. W. Yeung,On characterization of entropy function via information inequalities, IEEE Transactions on Information Theory, 44 (1998), pp. 1440–1452. 26

work page 1998

[1] [1]

Abo Khamis and H

M. Abo Khamis and H. Chen,Jaguar: A Primal Algorithm for Conjunctive Query Evaluation in Submodular-Width Time, Proc. ACM Manag. Data, (2026)

work page 2026

[2] [2]

Abo Khamis, R

M. Abo Khamis, R. R. Curtin, B. Moseley, H. Q. Ngo, X. Nguyen, D. Olteanu, and M. Schleich,Functional aggregate queries with additive inequalities, ACM Trans. Database Syst., 45 (2020)

work page 2020

[3] [3]

Abo Khamis, V

M. Abo Khamis, V. Nakos, D. Olteanu, and D. Suciu,Join size bounds using lp-norms on degree sequences, Proc. ACM Manag. Data, 2 (2024)

work page 2024

[4] [4]

Abo Khamis, H

M. Abo Khamis, H. Q. Ngo, and A. Rudra,FAQ: questions asked frequently, in Proceedings of the 35th ACM PODS 2016, San Francisco, CA, USA, June 26 - July 01, 2016, ACM, 2016, pp. 13–28

work page 2016

[5] [5]

Abo Khamis, H

M. Abo Khamis, H. Q. Ngo, and D. Suciu,Computing join queries with functional dependencies, in Proceedings of the 35th ACM PODS 2016, San Francisco, CA, USA, June 26 - July 01, 2016, ACM, 2016, pp. 327–342

work page 2016

[6] [6]

Abo Khamis, H

M. Abo Khamis, H. Q. Ngo, and D. Suciu,What Do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog Have to Do with One Another?, in Proceedings of the 36th ACM PODS 2017, Chicago, IL, USA, May 14-19, 2017, ACM, 2017, pp. 429–444. Extended version available athttp://arxiv.org/abs/1612.02503

work page arXiv 2017

[7] [7]

Abo Khamis, H

M. Abo Khamis, H. Q. Ngo, and D. Suciu,PANDA: Query Evaluation in Submodular Width, TheoretiCS, Volume 4 (2025)

work page 2025

[8] [8]

Abo Khamis, H

M. Abo Khamis, H. Q. Ngo, and D. Suciu,PANDAExpress: a Simpler and Faster PANDA Algorithm, Proc. ACM Manag. Data, (2026). 14We also need to replace the maximum in Eq. (42) by the supremum since the set of entropic functions is not closed. 15In general, we cannot perform FMM over the ( min, +) semiring because there is no subtraction. Also, full queries ca...

work page 2026

[9] [9]

Alon,On the number of subgraphs of prescribed type of graphs with a given number of edges, Israel J

N. Alon,On the number of subgraphs of prescribed type of graphs with a given number of edges, Israel J. Math., 38 (1981), pp. 116–130. [11]N. Alon, R. Yuster, and U. Zwick,Finding and counting given length cycles, Algorithmica, 17 (1997), pp. 209–223

work page 1981

[10] [10]

Atserias, M

A. Atserias, M. Grohe, and D. Marx,Size bounds and query plans for relational joins, SIAM J. Comput., 42 (2013), pp. 1737–1767

work page 2013

[11] [11]

Bagan, A

G. Bagan, A. Durand, and E. Grandjean,On acyclic conjunctive queries and constant delay enumeration, in Computer Science Logic, 21st International Workshop, CSL 2007, 16th Annual Conference of the EACSL, Lausanne, Switzerland, September 11-15, 2007, Proceedings, vol. 4646 of Lecture Notes in Computer Science, Springer, 2007, pp. 208–222

work page 2007

[12] [12]

Berkholz and H

C. Berkholz and H. Vinall-Smeeth,Factorised Representations of Join Queries: Tight Bounds and a New Dichotomy, arXiv e-prints, (2025), p. arXiv:2503.20438

work page arXiv 2025

[13] [13]

Bringmann and E

K. Bringmann and E. Gorbachev,A fine-grained classification of subquadratic patterns for subgraph listing and friends, in Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC 2025, Prague, Czechia, June 23-27, 2025, ACM, 2025, pp. 2145–2156

work page 2025

[14] [14]

T. H. Chan and R. W. Yeung,On a relation between information inequalities and group theory, IEEE Transactions on Information Theory, 48 (2002), pp. 1992–1995

work page 2002

[15] [15]

A. K. Chandra and P. M. Merlin,Optimal implementation of conjunctive queries in relational data bases, in Proceedings of the 9th Annual ACM Symposium on Theory of Computing, May 4-6, 1977, Boulder, Colorado, USA, ACM, 1977, pp. 77–90

work page 1977

[16] [16]

S. Chaudhuri,An overview of query optimization in relational systems, in Proceedings of the Seventeenth ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems, June 1-3, 1998, Seattle, Washington, USA, ACM Press, 1998, pp. 34–43

work page 1998

[17] [17]

F. R. K. Chung, R. L. Graham, P. Frankl, and J. B. Shearer,Some intersection theorems for ordered sets and graphs, J. Combin. Theory Ser. A, 43 (1986), pp. 23–37

work page 1986

[18] [18]

Dalirrooyfard, S

M. Dalirrooyfard, S. Mathialagan, V. V. Williams, and Y. Xu,Towards optimal output-sensitive clique listing or: Listing cliques from smaller cliques, in Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, New York, NY, USA, 2024, Association for Computing Machinery, p. 923–934

work page 2024

[19] [19]

Dalirrooyfard, T

M. Dalirrooyfard, T. Vuong, and V. Vassilevska Williams,Graph pattern detection: Hardness for all induced patterns and faster noninduced cycles, SIAM J. Comput., 50 (2021), pp. 1627–1662. [22]R. Dechter,Bucket elimination: A unifying framework for reasoning, Artif. Intell., 113 (1999), pp. 41–85

work page 2021

[20] [20]

,Bucket elimination: A unifying framework for reasoning, Artificial Intelligence, 113 (1999), pp. 41–85

work page 1999

[21] [21]

Deeds and T

K. Deeds and T. C. Merkl,Partition Constraints for Conjunctive Queries: Bounds and Worst-Case Optimal Joins, in 28th International Conference on Database Theory (ICDT 2025), vol. 328 of Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany, 2025, Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Informatik, pp. 17:1–17:18

work page 2025

[22] [22]

S. Deep, H. Zhao, A. Z. Fan, and P. Koutris,Output-sensitive conjunctive query evaluation, Proc. ACM Manag. Data, 2 (2024)

work page 2024

[23] [23]

B. Ding, V. R. Narasayya, and S. Chaudhuri,Extensible query optimizers in practice, Found. Trends Databases, 14 (2024), pp. 186–402

work page 2024

[24] [24]

Durand and S

A. Durand and S. Mengel,Structural tractability of counting of solutions to conjunctive queries, in Proceedings of the 16th International Conference on Database Theory, ICDT ’13, New York, NY, USA, 2013, Association for Computing Machinery, p. 81–92

work page 2013

[25] [25]

A. Z. Fan, P. Koutris, and H. Zhao,The Fine-Grained Complexity of Boolean Conjunctive Queries and Sum-Product Problems, in 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023), vol. 261 of Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany, 2023, Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Informatik, p...

work page 2023

[26] [26]

ACM Manag

,Tight bounds of circuits for sum-product queries, Proc. ACM Manag. Data, 2 (2024). [30]E. Friedgut,Hypergraphs, entropy, and inequalities, Amer. Math. Monthly, 111 (2004), pp. 749–760

work page 2024

[27] [27]

Friedgut and J

E. Friedgut and J. Kahn,On the number of copies of one hypergraph in another, Israel J. Math., 105 (1998), pp. 251–256

work page 1998

[28] [28]

,On the number of copies of one hypergraph in another, Israel Journal of Mathematics, 105 (1998), pp. 251–256

work page 1998

[29] [29]

Gottlob, G

G. Gottlob, G. Greco, N. Leone, and F. Scarcello,Hypertree decompositions: Questions and answers, in Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2016, San Francisco, CA, USA, June 26 - July 01, 2016, ACM, 2016, pp. 57–74

work page 2016

[30] [30]

Gottlob, S

G. Gottlob, S. T. Lee, and G. Valiant,Size and treewidth bounds for conjunctive queries, in Proceedings of the Twenty-Eigth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2009, June 19 - July 1, 2009, Providence, Rhode Island, USA, ACM, 2009, pp. 45–54

work page 2009

[31] [31]

Gottlob, S

G. Gottlob, S. T. Lee, G. Valiant, and P. Valiant,Size and treewidth bounds for conjunctive queries, J. ACM, 59 (2012), pp. 16:1–16:35

work page 2012

[32] [32]

Gottlob, N

G. Gottlob, N. Leone, and F. Scarcello,Hypertree decompositions and tractable queries, Journal of Computer and System Sciences, 64 (2002), pp. 579–627

work page 2002

[33] [33]

T. J. Green, G. Karvounarakis, and V. Tannen,Provenance semirings, in Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’07, New York, NY, USA, 2007, Association for Computing Machinery, p. 31–40. 25

work page 2007

[34] [34]

T. J. Green and V. Tannen,The semiring framework for database provenance, in Proceedings of the 36th ACM SIGMOD- SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017, Chicago, IL, USA, May 14-19, 2017, ACM, 2017, pp. 93–99

work page 2017

[35] [35]

Grohe and D

M. Grohe and D. Marx,Constraint solving via fractional edge covers, in Proceedings of the Seventeenth Annual ACM- SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, ACM Press, 2006, pp. 289–298

work page 2006

[36] [36]

Algorithms, 11 (2014), pp

,Constraint solving via fractional edge covers, ACM Trans. Algorithms, 11 (2014), pp. 4:1–4:20. [41]X. Hu,Output-optimal algorithms for join-aggregate queries, Proc. ACM Manag. Data, 3 (2025)

work page 2014

[37] [37]

S. Im, B. Moseley, H. Q. Ngo, and K. Pruhs,Efficient algorithms for cardinality estimation and conjunctive query evaluation with simple degree constraints, Proc. ACM Manag. Data, 3 (2025), pp. 96:1–96:26

work page 2025

[38] [38]

Itai and M

A. Itai and M. Rodeh,Finding a minimum circuit in a graph, in Proceedings of the 9th Annual ACM Symposium on Theory of Computing, May 4-6, 1977, Boulder, Colorado, USA, ACM, 1977, pp. 1–10

work page 1977

[39] [39]

M. A. Khamis, X. Hu, and D. Suciu,Fast matrix multiplication meets the submodular width, Proc. ACM Manag. Data, 3 (2025), pp. 98:1–98:26

work page 2025

[40] [40]

M. A. Khamis, H. Q. Ngo, C. R ´e, and A. Rudra,Joins via geometric resolutions: Worst case and beyond, ACM Trans. Database Syst., 41 (2016)

work page 2016

[41] [41]

P. G. Kolaitis, A. E. Romashchenko, M. Studen ´y, D. Suciu, and T. A. Boege,Algorithmic Aspects of Information Theory (Dagstuhl Seminar 22301), Dagstuhl Reports, 12 (2023), pp. 180–204

work page 2023

[42] [42]

P. G. Kolaitis and M. Y. Vardi,Conjunctive-query containment and constraint satisfaction, in Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 1-3, 1998, Seattle, Washington, USA, ACM Press, 1998, pp. 205–213. [48]D. Koller and N. Friedman,Probabilistic Graphical Models - Principles and Techniques, M...

work page 1998

[43] [43]

V. Leis, A. Gubichev, A. Mirchev, P. A. Boncz, A. Kemper, and T. Neumann,How good are query optimizers, really?, Proc. VLDB Endow., 9 (2015), pp. 204–215. [50]D. Maier,Theory of Relational Databases, Computer Science Pr, 1983

work page 2015

[44] [44]

Marx,Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries, J

D. Marx,Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries, J. ACM, 60 (2013), pp. 42:1–42:51

work page 2013

[45] [45]

H. Q. Ngo, E. Porat, C. R ´e, and A. Rudra,Worst-case optimal join algorithms: [extended abstract], in Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24, 2012, ACM, 2012, pp. 37–48. [53]H. Q. Ngo, E. Porat, C. R ´e, and A. Rudra,Worst-case optimal join algorithms, J. ACM...

work page 2012

[46] [46]

H. Q. Ngo, C. R ´e, and A. Rudra,Skew strikes back: new developments in the theory of join algorithms, SIGMOD Rec., 42 (2013), pp. 5–16. [55]R. Ramakrishnan and J. Gehrke,Database management systems (3. ed.), McGraw-Hill, 2003

work page 2013

[47] [47]

T. L. Veldhuizen,Triejoin: A simple, worst-case optimal join algorithm, in Proc. 17th International Conference on Database Theory (ICDT), Athens, Greece, March 24-28, 2014, OpenProceedings.org, 2014, pp. 96–106

work page 2014

[48] [48]

Vikneshwar Mani Jayaraman, C

S. Vikneshwar Mani Jayaraman, C. Ropell, and A. Rudra,Worst-case Optimal Binary Join Algorithms under General ℓp Constraints, arXiv e-prints, (2021), p. arXiv:2112.01003

work page arXiv 2021

[49] [49]

V. V. Williams, Y. Xu, Z. Xu, and R. Zhou,New Bounds for Matrix Multiplication: from Alpha to Omega, pp. 3792–3835

work page

[50] [50]

M. Yannakakis,Algorithms for acyclic database schemes, in Very Large Data Bases, 7th International Conference, September 9-11, 1981, Cannes, France, Proceedings, IEEE Computer Society, 1981, pp. 82–94

work page 1981

[51] [51]

Yuster and U

R. Yuster and U. Zwick,Detecting short directed cycles using rectangular matrix multiplication and dynamic programming., in SODA, vol. 4, 2004, pp. 254–260

work page 2004

[52] [52]

N. L. Zhang and D. Poole,Exploiting causal independence in bayesian network inference, J. Artif. Int. Res., 5 (1996), p. 301–328

work page 1996

[53] [53]

N. X. Zhang and D. Poole,A simple approach to bayesian network computations, in Proceedings of the Canadian AI Conference, Springer, 1994, pp. 207–214

work page 1994

[54] [54]

Zhang and R

Z. Zhang and R. W. Yeung,A non-shannon-type conditional inequality of information quantities, IEEE Trans. Information Theory, 43 (1997), pp. 1982–1986

work page 1997

[55] [55]

Zhang and R

Z. Zhang and R. W. Yeung,On characterization of entropy function via information inequalities, IEEE Transactions on Information Theory, 44 (1998), pp. 1440–1452. 26

work page 1998