Towards Output-Optimal Uniform Sampling and Approximate Counting for Join-Project Queries

Jinchao Huang; Xiao Hu

arxiv: 2603.12560 · v3 · submitted 2026-03-13 · 💻 cs.DB

Towards Output-Optimal Uniform Sampling and Approximate Counting for Join-Project Queries

Xiao Hu , Jinchao Huang This is my paper

Pith reviewed 2026-05-15 12:22 UTC · model grok-4.3

classification 💻 cs.DB

keywords join-project queriesuniform samplingapproximate countingcommunication complexityrejection samplingoutput optimalitydatabase query processing

0 comments

The pith

First asymptotically optimal algorithms for uniform sampling and approximate counting on matrix, star, and chain join-project queries

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the first asymptotically optimal algorithms for uniform sampling and approximate counting on join-project queries for the classes of matrix, star, and chain shapes. Earlier propose-and-verify methods incurred high costs whenever projections shrank the output size relative to the join. A rejection-based sampling strategy paired with hybrid counting reductions delivers polynomial improvements in running time. Matching lower bounds from communication complexity establish that the results are tight, and these bounds continue to apply even when fast matrix multiplication is allowed. The work is relevant because join-project queries appear often in database workloads and better primitives directly speed up query optimization and approximate query processing.

Core claim

We present the first asymptotically optimal algorithms for fundamental classes of join-project queries, including matrix, star, and chain queries. By leveraging a novel rejection-based sampling strategy and a hybrid counting reduction, we achieve polynomial speedups over the state of the art. We establish the optimality of our results through matching communication complexity lower bounds, which hold even against algebraic techniques like fast matrix multiplication.

What carries the argument

Rejection-based sampling strategy combined with a hybrid counting reduction that produces output-optimal running time

If this is right

Matrix and star queries admit efficient sublinear-time algorithms.
Sublinear algorithms are impossible for chain queries in general.
The lower bounds remain valid even when fast matrix multiplication is permitted.
Optimal algorithms for unrestricted join-project queries stay open.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The rejection technique may extend to other query shapes that share similar structural reduction properties.
Communication-complexity arguments could set limits for sampling in distributed database systems.
Improved primitives may raise performance in practical approximate query processing engines.

Load-bearing premise

The optimality claims and polynomial speedups hold only for the restricted classes of matrix, star, and chain queries.

What would settle it

An algorithm that samples or counts chain join-project queries in sublinear time, or a communication-complexity lower bound that the presented upper bound fails to match

Figures

Figures reproduced from arXiv: 2603.12560 by Jinchao Huang, Xiao Hu.

read the original abstract

Uniform sampling and approximate counting are fundamental primitives for modern database applications, ranging from query optimization to approximate query processing. While recent breakthroughs have established optimal sampling and counting algorithms for full join queries, a significant gap remains for join-project queries, which are ubiquitous in real-world workloads. The state-of-the-art ``propose-and-verify'' framework \cite{chen2020random} for these queries suffers from fundamental inefficiencies, often yielding prohibitive complexity when projections significantly reduce the output size. In this paper, we present the first asymptotically optimal algorithms for fundamental classes of join-project queries, including matrix, star, and chain queries. By leveraging a novel rejection-based sampling strategy and a hybrid counting reduction, we achieve polynomial speedups over the state of the art. We establish the optimality of our results through matching communication complexity lower bounds, which hold even against algebraic techniques like fast matrix multiplication. Finally, we delineate the theoretical limits of the problem space. While matrix and star queries admit efficient sublinear-time algorithms, we establish a significantly stronger lower bound for chain queries, demonstrating that sublinear algorithms are impossible in general.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first output-optimal sampling and counting algorithms for matrix, star, and chain join-project queries with matching communication-complexity lower bounds.

read the letter

The core advance here is a pair of algorithms that achieve output-optimal time for uniform sampling and approximate counting on matrix, star, and chain join-project queries. The authors replace the earlier propose-and-verify approach with a rejection-sampling strategy plus a hybrid counting reduction, which removes the blow-up that occurs when projections shrink the output size. They back the upper bounds with communication-complexity lower bounds that continue to hold even if fast matrix multiplication is allowed. That combination is the main new piece of work. The paper is also careful about its scope. It states outright that the general join-project case is still open and proves a stronger lower bound showing that sublinear time is impossible for chain queries. That honesty about where the results stop is useful. The main soft spot is that the concrete constructions for rejection sampling and the hybrid reduction are only sketched in the abstract; without the full proofs it is hard to judge how tight the constants are or whether hidden logarithmic factors appear in practice. The lower-bound arguments look external and non-circular, which is a plus. This paper is for people working on the complexity of join queries and on approximate query engines. A reader who already knows the full-join results will see exactly where the projection step creates new difficulties and how the authors handle them. I would send it to peer review. The claims are scoped tightly enough and the technical direction is clear enough that referees can check the details without wasting time on vague promises.

Referee Report

0 major / 3 minor

Summary. The paper claims to present the first asymptotically optimal algorithms for uniform sampling and approximate counting over fundamental classes of join-project queries (matrix, star, and chain). It introduces a rejection-based sampling strategy and a hybrid counting reduction that yield polynomial speedups relative to the prior propose-and-verify framework, and proves optimality by exhibiting matching communication-complexity lower bounds that hold even against algebraic methods such as fast matrix multiplication. The work further delineates theoretical limits, showing that sublinear-time algorithms exist for matrix and star queries but are impossible for chain queries in general.

Significance. If the claimed upper and lower bounds hold, the result is significant: it closes a long-standing gap between full-join and join-project primitives that are ubiquitous in database workloads. The matching communication-complexity bounds supply tight characterizations for the restricted classes, the rejection-sampling and hybrid-counting constructions are technically novel, and the stronger impossibility result for chain queries usefully maps the boundary of what is possible. These contributions advance both the theory and the practical design of sampling/counting engines.

minor comments (3)

Abstract: the phrase 'polynomial speedups' is stated without concrete exponents or degree; adding the precise improvement factors (e.g., from O(n^{2}) to O(n^{1.5})) would strengthen the claim.
§1 (Introduction): the communication-complexity lower-bound argument is summarized at a high level; a short paragraph recalling the precise reduction (e.g., from set-disjointness or index) would help readers verify the matching claim without consulting the full proof.
References: the citation chen2020random appears in the abstract but is not expanded in the provided reference list; ensure the full bibliographic entry is present.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the paper, accurate summary of our contributions on asymptotically optimal uniform sampling and approximate counting for join-project queries, and recommendation for minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives asymptotically optimal algorithms for uniform sampling and approximate counting on restricted classes of join-project queries (matrix, star, chain) via a rejection-based sampling strategy and hybrid counting reduction. Optimality is established by matching lower bounds drawn from external communication-complexity arguments that apply even to algebraic methods such as fast matrix multiplication; these bounds are independent of the paper's own constructions and do not reduce to fitted parameters or self-referential quantities. The manuscript explicitly scopes its positive results to the listed query classes, states that general join-project queries remain open, and proves a stronger impossibility result for sublinear algorithms on chain queries. No derivation step collapses by construction to the inputs, and no load-bearing claim relies on self-citation chains or ansatzes smuggled from prior work by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the work appears to rest on standard complexity-theoretic assumptions and communication-complexity reductions.

pith-pipeline@v0.9.0 · 5490 in / 1040 out tokens · 25458 ms · 2026-05-15T12:22:32.917803+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1]

Acharya, P

S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. InSIGMOD, pages 275–286, 1999

work page 1999
[2]

Agarwal, B

S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. Blinkdb: queries with bounded errors and bounded response times on very large data. InProceedings of the 8th ACM European Conference on Computer Systems, pages 29–42. ACM, 2013

work page 2013
[3]

N. Alon, R. Yuster, and U. Zwick. Color-coding.Journal of the ACM (JACM), 42(4):844–856, 1995

work page 1995
[4]

R. R. Amossen and R. Pagh. Faster join-projects and sparse matrix multiplications. InICDT, pages 121–126. ACM, 2009

work page 2009
[5]

Arenas, L

M. Arenas, L. A. Croquevielle, R. Jayaram, and C. Riveros. When is approximate counting for conjunctive queries tractable? InProceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1015–1027, 2021. Towards Output-Optimal Uniform Sampling and Approximate Counting for Join-Project Queries 31

work page 2021
[6]

Assadi, M

S. Assadi, M. Kapralov, and S. Khanna. A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling. InProc. Innovations in Theoretical Computer Science, pages 6:1–6:20, 2019

work page 2019
[7]

Atserias, M

A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins.SIAM Journal on Computing, 42 (4):1737–1767, 2013

work page 2013
[8]

Bagan, A

G. Bagan, A. Durand, and E. Grandjean. On acyclic conjunctive queries and constant delay enumeration. InInternational Workshop on Computer Science Logic, pages 208–222. Springer, 2007

work page 2007
[9]

Bar-Yossef, T

Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. Information theory methods in communication complexity. InProceedings 17th IEEE Annual Conference on Computational Complexity, pages 93–102. IEEE, 2002

work page 2002
[10]

Censor-Hillel, T

K. Censor-Hillel, T. Even, and V. Vassilevska Williams. Fast approximate counting of cycles. InICALP 2024, 2024

work page 2024
[11]

Censor-Hillel, T

K. Censor-Hillel, T. Even, and V. Vassilevska Williams. Output-sensitive approximate counting via a measure-bounded hyperedge oracle, or: How asymmetry helps estimate k-clique counts faster. InProceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 1985–1996, 2025

work page 1985
[12]

Chaudhuri, R

S. Chaudhuri, R. Motwani, and V. Narasayya. On random sampling over joins.ACM SIGMOD Record, 28(2):263–274, 1999

work page 1999
[13]

Chen and K

Y. Chen and K. Yi. Random sampling and size estimation over cyclic joins. InICDT, 2020

work page 2020
[14]

Cohen and H

E. Cohen and H. Kaplan. Summarizing data using bottom-k sketches. In I. Gupta and R. Wattenhofer, editors,Proc. ACM Symposium on Principles of Distributed Computing, pages 225–234. ACM, 2007

work page 2007
[15]

Cohen and H

E. Cohen and H. Kaplan. Tighter estimation using bottom k sketches.Proceedings of the VLDB Endowment, 1(1): 213–224, 2008

work page 2008
[16]

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein.Introduction to algorithms. MIT press, 2022

work page 2022
[17]

Curticapean, H

R. Curticapean, H. Dell, and D. Marx. Homomorphisms are a good basis for counting small subgraphs. InProceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 210–223, 2017

work page 2017
[18]

Dalirrooyfard, S

M. Dalirrooyfard, S. Mathialagan, V. Vassilevska Williams, and Y. Xu. Towards optimal output-sensitive clique listing or: Listing cliques from smaller cliques. InProceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, pages 923–934, 2024

work page 2024
[19]

S. Deng, S. Lu, and Y. Tao. On join sampling and hardness of combinatorial output-sensitive join algorithms. InPODS, 2023

work page 2023
[20]

T. Eden, A. Levi, D. Ron, and C. Seshadhri. Approximately counting triangles in sublinear time.SIAM Journal on Computing, 46(5):1603–1646, 2017

work page 2017
[21]

Eisenbrand and F

F. Eisenbrand and F. Grandoni. On the complexity of fixed parameter clique and dominating set.Theoretical Computer Science, 326(1-3):57–67, 2004

work page 2004
[22]

R. A. Fisher and F. Yates.Statistical tables for biological, agricultural and medical research. Hafner Publishing Company, 1953

work page 1953
[23]

X. Hu. Cover or pack: New upper and lower bounds for massively parallel joins. InProceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 181–198, 2021

work page 2021
[24]

Hu and K

X. Hu and K. Yi. Parallel algorithms for sparse matrix multiplication and join-aggregate queries. InProc. ACM Symposium on Principles of Database Systems, pages 411–425, 2020

work page 2020
[25]

Huang, Y

J. Huang, Y. Tao, and S. Wang. Acyclic join sampling under selections: Dichotomy, union sampling, and enumeration. InProc. International Conference on Database Theory, 2026

work page 2026
[26]

N. L. Johnson, A. W. Kemp, and S. Kotz.Univariate discrete distributions. John Wiley & Sons, 2005

work page 2005
[27]

Kalyanasundaram and G

B. Kalyanasundaram and G. Schintger. The probabilistic communication complexity of set intersection.SIAM Journal on Discrete Mathematics, 5(4):545–557, 1992

work page 1992
[28]

K. Kim, J. Ha, G. Fletcher, and W.-S. Han. Guaranteeing the ˜𝑂(AGM/OUT) runtime for uniform sampling and size estimation over joins. InPODS, page 113–125, 2023

work page 2023
[29]

Kloks, D

T. Kloks, D. Kratsch, and H. Müller. Finding and counting small induced subgraphs efficiently.Inf. Process. Lett., 74 (3-4):115–121, 2000

work page 2000
[30]

D. E. Knuth.The art of computer programming, Volume II: Seminumerical Algorithms, 3rd Edition. Addison-Wesley Professional, 1998

work page 1998
[31]

Kowaluk, A

M. Kowaluk, A. Lingas, and E. Lundell. Counting and detecting small subgraphs via equations.SIAM J. Discret. Math., 27(2):892–909, 2013

work page 2013
[32]

H. Q. Ngo. Worst-case optimal join algorithms: Techniques, results, and open problems. InProc. ACM Symposium on Principles of Database Systems, pages 111–124. ACM, 2018

work page 2018
[33]

H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms. InProc. ACM Symposium on Principles of Database Systems, pages 37–48, 2012

work page 2012
[34]

H. Q. Ngo, C. Ré, and A. Rudra. Skew strikes back: new developments in the theory of join algorithms.Acm Sigmod Record, 42(4):5–16, 2014. 32 X. Hu and J. Huang Algorithm 13:SampleJoinProjectpRq Input: An instance R for the join-project query Q“ pV,E, yq, with some auxiliary indices built forRif needed. Output:A uniform sample of the query resultQpRq. 1whi...

work page 2014
[35]

Olken and D

F. Olken and D. Rotem. Simple random sampling from relational databases. InProceedings of the VLDB Endowment, pages 160–169, 1986

work page 1986
[36]

Olken and D

F. Olken and D. Rotem. Random sampling from databases: a survey.Statistics and Computing, 5(1):25–42, 1995

work page 1995
[37]

A. A. Razborov. On the distributional complexity of disjointness. InInternational Colloquium on Automata, Languages, and Programming, pages 249–253. Springer, 1990

work page 1990
[38]

J. Tětek. Approximate Triangle Counting via Sampling and Fast Matrix Multiplication. In49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), pages 107:1–107:20, 2022

work page 2022
[39]

Yannakakis

M. Yannakakis. Algorithms for acyclic database schemes. InProc. International Conference on Very Large Data Bases, pages 82–94, 1981

work page 1981
[40]

AGMpQ ¯y,t|𝑅 𝑒 | : 𝑒PEuq . Note that 𝜋 V´y𝑠 is a valid join result of Q¯ypR𝑠 q (a “success

Z. Zhao, R. Christensen, F. Li, X. Hu, and K. Yi. Random sampling over joins revisited. InSIGMOD, pages 1525–1539, 2018. A Extension to General Join-Project Queries Our framework introduced for matrix and star queries can be easily extended to arbitrary join- project queries. We first recall the AGM bound [7]. For a join query 𝑞“ pV,Eq , letN: EÑZ ` be th...

work page 2018

[1] [1]

Acharya, P

S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. InSIGMOD, pages 275–286, 1999

work page 1999

[2] [2]

Agarwal, B

S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. Blinkdb: queries with bounded errors and bounded response times on very large data. InProceedings of the 8th ACM European Conference on Computer Systems, pages 29–42. ACM, 2013

work page 2013

[3] [3]

N. Alon, R. Yuster, and U. Zwick. Color-coding.Journal of the ACM (JACM), 42(4):844–856, 1995

work page 1995

[4] [4]

R. R. Amossen and R. Pagh. Faster join-projects and sparse matrix multiplications. InICDT, pages 121–126. ACM, 2009

work page 2009

[5] [5]

Arenas, L

M. Arenas, L. A. Croquevielle, R. Jayaram, and C. Riveros. When is approximate counting for conjunctive queries tractable? InProceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1015–1027, 2021. Towards Output-Optimal Uniform Sampling and Approximate Counting for Join-Project Queries 31

work page 2021

[6] [6]

Assadi, M

S. Assadi, M. Kapralov, and S. Khanna. A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling. InProc. Innovations in Theoretical Computer Science, pages 6:1–6:20, 2019

work page 2019

[7] [7]

Atserias, M

A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins.SIAM Journal on Computing, 42 (4):1737–1767, 2013

work page 2013

[8] [8]

Bagan, A

G. Bagan, A. Durand, and E. Grandjean. On acyclic conjunctive queries and constant delay enumeration. InInternational Workshop on Computer Science Logic, pages 208–222. Springer, 2007

work page 2007

[9] [9]

Bar-Yossef, T

Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. Information theory methods in communication complexity. InProceedings 17th IEEE Annual Conference on Computational Complexity, pages 93–102. IEEE, 2002

work page 2002

[10] [10]

Censor-Hillel, T

K. Censor-Hillel, T. Even, and V. Vassilevska Williams. Fast approximate counting of cycles. InICALP 2024, 2024

work page 2024

[11] [11]

Censor-Hillel, T

K. Censor-Hillel, T. Even, and V. Vassilevska Williams. Output-sensitive approximate counting via a measure-bounded hyperedge oracle, or: How asymmetry helps estimate k-clique counts faster. InProceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 1985–1996, 2025

work page 1985

[12] [12]

Chaudhuri, R

S. Chaudhuri, R. Motwani, and V. Narasayya. On random sampling over joins.ACM SIGMOD Record, 28(2):263–274, 1999

work page 1999

[13] [13]

Chen and K

Y. Chen and K. Yi. Random sampling and size estimation over cyclic joins. InICDT, 2020

work page 2020

[14] [14]

Cohen and H

E. Cohen and H. Kaplan. Summarizing data using bottom-k sketches. In I. Gupta and R. Wattenhofer, editors,Proc. ACM Symposium on Principles of Distributed Computing, pages 225–234. ACM, 2007

work page 2007

[15] [15]

Cohen and H

E. Cohen and H. Kaplan. Tighter estimation using bottom k sketches.Proceedings of the VLDB Endowment, 1(1): 213–224, 2008

work page 2008

[16] [16]

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein.Introduction to algorithms. MIT press, 2022

work page 2022

[17] [17]

Curticapean, H

R. Curticapean, H. Dell, and D. Marx. Homomorphisms are a good basis for counting small subgraphs. InProceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 210–223, 2017

work page 2017

[18] [18]

Dalirrooyfard, S

M. Dalirrooyfard, S. Mathialagan, V. Vassilevska Williams, and Y. Xu. Towards optimal output-sensitive clique listing or: Listing cliques from smaller cliques. InProceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, pages 923–934, 2024

work page 2024

[19] [19]

S. Deng, S. Lu, and Y. Tao. On join sampling and hardness of combinatorial output-sensitive join algorithms. InPODS, 2023

work page 2023

[20] [20]

T. Eden, A. Levi, D. Ron, and C. Seshadhri. Approximately counting triangles in sublinear time.SIAM Journal on Computing, 46(5):1603–1646, 2017

work page 2017

[21] [21]

Eisenbrand and F

F. Eisenbrand and F. Grandoni. On the complexity of fixed parameter clique and dominating set.Theoretical Computer Science, 326(1-3):57–67, 2004

work page 2004

[22] [22]

R. A. Fisher and F. Yates.Statistical tables for biological, agricultural and medical research. Hafner Publishing Company, 1953

work page 1953

[23] [23]

X. Hu. Cover or pack: New upper and lower bounds for massively parallel joins. InProceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 181–198, 2021

work page 2021

[24] [24]

Hu and K

X. Hu and K. Yi. Parallel algorithms for sparse matrix multiplication and join-aggregate queries. InProc. ACM Symposium on Principles of Database Systems, pages 411–425, 2020

work page 2020

[25] [25]

Huang, Y

J. Huang, Y. Tao, and S. Wang. Acyclic join sampling under selections: Dichotomy, union sampling, and enumeration. InProc. International Conference on Database Theory, 2026

work page 2026

[26] [26]

N. L. Johnson, A. W. Kemp, and S. Kotz.Univariate discrete distributions. John Wiley & Sons, 2005

work page 2005

[27] [27]

Kalyanasundaram and G

B. Kalyanasundaram and G. Schintger. The probabilistic communication complexity of set intersection.SIAM Journal on Discrete Mathematics, 5(4):545–557, 1992

work page 1992

[28] [28]

K. Kim, J. Ha, G. Fletcher, and W.-S. Han. Guaranteeing the ˜𝑂(AGM/OUT) runtime for uniform sampling and size estimation over joins. InPODS, page 113–125, 2023

work page 2023

[29] [29]

Kloks, D

T. Kloks, D. Kratsch, and H. Müller. Finding and counting small induced subgraphs efficiently.Inf. Process. Lett., 74 (3-4):115–121, 2000

work page 2000

[30] [30]

D. E. Knuth.The art of computer programming, Volume II: Seminumerical Algorithms, 3rd Edition. Addison-Wesley Professional, 1998

work page 1998

[31] [31]

Kowaluk, A

M. Kowaluk, A. Lingas, and E. Lundell. Counting and detecting small subgraphs via equations.SIAM J. Discret. Math., 27(2):892–909, 2013

work page 2013

[32] [32]

H. Q. Ngo. Worst-case optimal join algorithms: Techniques, results, and open problems. InProc. ACM Symposium on Principles of Database Systems, pages 111–124. ACM, 2018

work page 2018

[33] [33]

H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms. InProc. ACM Symposium on Principles of Database Systems, pages 37–48, 2012

work page 2012

[34] [34]

H. Q. Ngo, C. Ré, and A. Rudra. Skew strikes back: new developments in the theory of join algorithms.Acm Sigmod Record, 42(4):5–16, 2014. 32 X. Hu and J. Huang Algorithm 13:SampleJoinProjectpRq Input: An instance R for the join-project query Q“ pV,E, yq, with some auxiliary indices built forRif needed. Output:A uniform sample of the query resultQpRq. 1whi...

work page 2014

[35] [35]

Olken and D

F. Olken and D. Rotem. Simple random sampling from relational databases. InProceedings of the VLDB Endowment, pages 160–169, 1986

work page 1986

[36] [36]

Olken and D

F. Olken and D. Rotem. Random sampling from databases: a survey.Statistics and Computing, 5(1):25–42, 1995

work page 1995

[37] [37]

A. A. Razborov. On the distributional complexity of disjointness. InInternational Colloquium on Automata, Languages, and Programming, pages 249–253. Springer, 1990

work page 1990

[38] [38]

J. Tětek. Approximate Triangle Counting via Sampling and Fast Matrix Multiplication. In49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), pages 107:1–107:20, 2022

work page 2022

[39] [39]

Yannakakis

M. Yannakakis. Algorithms for acyclic database schemes. InProc. International Conference on Very Large Data Bases, pages 82–94, 1981

work page 1981

[40] [40]

AGMpQ ¯y,t|𝑅 𝑒 | : 𝑒PEuq . Note that 𝜋 V´y𝑠 is a valid join result of Q¯ypR𝑠 q (a “success

Z. Zhao, R. Christensen, F. Li, X. Hu, and K. Yi. Random sampling over joins revisited. InSIGMOD, pages 1525–1539, 2018. A Extension to General Join-Project Queries Our framework introduced for matrix and star queries can be easily extended to arbitrary join- project queries. We first recall the AGM bound [7]. For a join query 𝑞“ pV,Eq , letN: EÑZ ` be th...

work page 2018