An Approach Based on Bayesian Networks for Query Selectivity Estimation

Frank Morvan; Max Halford; Philippe Saint-Pierre

arxiv: 1907.06295 · v1 · pith:6SUUJKTFnew · submitted 2019-07-14 · 💻 cs.DB

An Approach Based on Bayesian Networks for Query Selectivity Estimation

Max Halford , Philippe Saint-Pierre , Frank Morvan This is my paper

Pith reviewed 2026-05-24 21:19 UTC · model grok-4.3

classification 💻 cs.DB

keywords query selectivity estimationChow-Liu treesBayesian networksattribute correlationsdatabase cost modelsTPC-DS benchmarkquery optimizationindependence assumption

0 comments

The pith

Chow-Liu trees capture attribute correlations to produce selectivity estimates an order of magnitude more accurate than the independence assumption.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that building a Chow-Liu tree for each relation lets a cost model approximate the joint distribution of attribute values instead of treating them as independent. This change reduces the errors that arise when the optimizer is given badly wrong row-count predictions. On the TPC-DS benchmark the resulting estimates are roughly ten times more precise while the extra time and space cost stays modest. A reader would care because these estimates directly determine which join order, index, or algorithm the database chooses, so even modest gains in accuracy can improve end-to-end query speed. The work therefore relaxes one of the oldest simplifying assumptions in query optimization without destroying the practicality of the cost model.

Core claim

We propose a novel approach based on a particular type of Bayesian networks called Chow-Liu trees to approximate the distribution of attribute values inside each relation of a database. Our results on the TPC-DS benchmark show that our method is an order of magnitude more precise than other approaches whilst remaining reasonably efficient in terms of time and space.

What carries the argument

Chow-Liu trees, tree-structured Bayesian networks that encode the strongest pairwise dependencies between attributes and thereby approximate their joint distribution for selectivity calculation.

If this is right

Query optimizers receive row-count estimates whose errors are an order of magnitude smaller, so they more often select efficient join orders and access methods.
The space and time overhead of maintaining one Chow-Liu tree per relation stays low enough for practical deployment inside existing cost models.
The independence assumption can be dropped for any relation whose attributes exhibit tree-like pairwise correlations without redesigning the rest of the optimizer.
TPC-DS workloads, which contain realistic correlation patterns, serve as a concrete demonstration that the precision gain is achievable on standard decision-support data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same trees could be refreshed incrementally when data changes, turning the method into a lightweight online statistics collector.
Because the trees are small, they could be shipped with data exports or used inside federated query engines that lack global statistics.
If higher-order dependencies prove important on some workloads, the Chow-Liu construction could be replaced by a slightly denser graphical model while preserving the same interface to the cost model.

Load-bearing premise

That Chow-Liu trees built from sampled data will capture the correlations that actually matter for selectivity estimation across the full range of queries without introducing new sources of error that offset the gains.

What would settle it

Measure selectivity error on a new benchmark containing known higher-order attribute dependencies; if the error is no longer reduced by roughly a factor of ten relative to the independence baseline, the central claim fails.

Figures

Figures reproduced from arXiv: 1907.06295 by Frank Morvan, Max Halford, Philippe Saint-Pierre.

**Figure 1.** Figure 1: Possible factorisations of P(hair, nationality, gender) The goal of structure learning is to find a BN that closely matches P(X) whilst preserving a low computational complexity. Indeed, for any given BN, the cost of storing it and of computing a marginal distribution P(Xi) depend on it’s structure. The classic approach to structure learning is to define a scoring function that determines how good a BN is … view at source ↗

**Figure 2.** Figure 2: Steiner tree in blue containing nodes G, N, and H nee [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Construction time 4.3 Cardinality estimates We then compared methods based on their average accuracy. In other words we ran each method against each query and measured the average error. Although our method improves the accuracy of cardinality estimation for queries on a single relation; our goal in this benchmark is to measure how much this will impact the overall accuracy for general queries over multipl… view at source ↗

**Figure 4.** Figure 4: Average errors Unsurprisingly, the textbook method produces estimates that are off by several orders of magnitude. What is more interesting is that Bayesian networks are significantly better than sampling. The reason this occurs is because the sampling method doesn’t place any uncertainty as to if a value is present in a relation or not. A value is either in a 8 [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Cardinality estimation time As can be seen the main pitfall of the sampling method is that it takes a considerable amount of time to estimate a cardinality. This is expected because for each set of predicates a full pass has to be made on the according sample. Whats more we didn’t even take into account the fact that the necessary samples have to be loaded in memory beforehand. As for the textbook method,… view at source ↗

read the original abstract

The efficiency of a query execution plan depends on the accuracy of the selectivity estimates given to the query optimiser by the cost model. The cost model makes simplifying assumptions in order to produce said estimates in a timely manner. These assumptions lead to selectivity estimation errors that have dramatic effects on the quality of the resulting query execution plans. A convenient assumption that is ubiquitous among current cost models is to assume that attributes are independent with each other. However, it ignores potential correlations which can have a huge negative impact on the accuracy of the cost model. In this paper we attempt to relax the attribute value independence assumption without unreasonably deteriorating the accuracy of the cost model. We propose a novel approach based on a particular type of Bayesian networks called Chow-Liu trees to approximate the distribution of attribute values inside each relation of a database. Our results on the TPC-DS benchmark show that our method is an order of magnitude more precise than other approaches whilst remaining reasonably efficient in terms of time and space.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Chow-Liu trees per relation give a practical way to relax attribute independence in selectivity estimation, but the order-of-magnitude TPC-DS claim rests on details not visible in the abstract.

read the letter

The paper's core move is to learn a Chow-Liu tree from samples for each relation and then use the resulting tree-structured distribution to compute selectivities that respect pairwise correlations. This is a direct, lightweight way to drop the independence assumption that most cost models still rely on. The approach stays efficient because the tree has linear size and the probability calculations can be done with standard message passing or simple traversal. That part is cleanly motivated and technically sound on its own terms. The reported order-of-magnitude precision lift on TPC-DS is the headline result, and if the experiments properly compare against both independence and other correlation-aware baselines while controlling for sampling rate, it would be a useful data point for optimizer work. The main soft spot is that the abstract supplies no information on how the trees were built from samples, which baselines were reimplemented, how error was measured across query mixes, or whether the gains survive changes in data distribution or predicate selectivity. Without those controls the central empirical claim stays hard to evaluate. The sampling concern in the stress-test note is real in principle—finite samples can miss rare but high-impact correlations—but the paper would need to show it does not dominate the reported numbers. This is aimed at database systems people who maintain or extend cost models. A reader already working on selectivity or cardinality estimation would get concrete value from the method even if the numbers require confirmation. I would send it to peer review so the experimental setup can be checked in detail.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes using Chow-Liu trees (a restricted form of Bayesian network) to approximate the joint distribution of attribute values within each database relation. This relaxes the ubiquitous attribute-independence assumption in query cost models while aiming to keep estimation time and space costs modest. The central empirical claim is that the resulting selectivity estimates are an order of magnitude more precise than competing methods on the TPC-DS benchmark.

Significance. If the reported precision gains prove robust under controlled measurement and sensitivity checks, the work would supply a concrete, tree-structured mechanism for incorporating limited attribute correlations into selectivity estimation without incurring the cost of full joint distributions or high-dimensional histograms. Such a technique could be directly integrated into existing DBMS cost models.

major comments (2)

[Abstract] Abstract: the statement that results show the method is 'an order of magnitude more precise than other approaches' supplies no information on the error metric employed (q-error, relative error, etc.), the concrete baseline implementations, the query workload size or mix, or any statistical measures such as standard deviation or confidence intervals. This directly undercuts the central empirical claim.
[Evaluation] Evaluation (results section): no sensitivity analysis is presented on sample size used to build the Chow-Liu trees or on the fidelity of the learned tree structure to the predicate correlations actually appearing in TPC-DS queries. Because the approach relies on finite samples to recover both structure and parameters, any mismatch between the empirical distribution and the true distribution is load-bearing for the claimed accuracy improvement.

minor comments (1)

[Abstract] Abstract: the phrase 'reasonably efficient in terms of time and space' is not accompanied by any concrete figures or comparison tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and outline revisions that will strengthen the manuscript's clarity and empirical support.

read point-by-point responses

Referee: [Abstract] Abstract: the statement that results show the method is 'an order of magnitude more precise than other approaches' supplies no information on the error metric employed (q-error, relative error, etc.), the concrete baseline implementations, the query workload size or mix, or any statistical measures such as standard deviation or confidence intervals. This directly undercuts the central empirical claim.

Authors: We agree that the abstract would benefit from additional context to substantiate the central claim. In the revised version we will expand the abstract to specify that precision is quantified via q-error, that the baselines comprise attribute independence as well as the other methods evaluated in the paper, that the workload consists of the full set of TPC-DS query templates, and that results include mean q-error together with standard deviation across repeated trials. These additions will be kept concise while supplying the requested information. revision: yes
Referee: [Evaluation] Evaluation (results section): no sensitivity analysis is presented on sample size used to build the Chow-Liu trees or on the fidelity of the learned tree structure to the predicate correlations actually appearing in TPC-DS queries. Because the approach relies on finite samples to recover both structure and parameters, any mismatch between the empirical distribution and the true distribution is load-bearing for the claimed accuracy improvement.

Authors: The observation is correct: the current evaluation fixes a single sample size and does not report sensitivity or structure validation against TPC-DS predicates. We will insert a new subsection that varies sample size (0.1 %, 1 %, 5 %, 10 % of each relation) and plots the resulting q-error, and we will add a brief analysis of the learned tree edges versus the most frequent selection and join predicates appearing in the benchmark queries. This directly addresses the robustness concern while remaining within reasonable space limits. revision: yes

Circularity Check

0 steps flagged

Empirical evaluation on external benchmark; no derivation reduces to its inputs by construction

full rationale

The paper presents a method that builds Chow-Liu trees from sampled data to approximate joint distributions and then measures selectivity estimation error on the TPC-DS benchmark. The central claim (order-of-magnitude improvement) is an empirical comparison against ground-truth selectivities computed from the benchmark data, not a quantity obtained by fitting parameters inside the same experiment and then renaming the fit as a prediction. No equations are shown that define the reported precision in terms of the model's own parameters, no self-citation is invoked as a uniqueness theorem, and the approach does not rename a known result under new coordinates. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method implicitly relies on standard assumptions of Bayesian network structure learning and the representativeness of TPC-DS data.

pith-pipeline@v0.9.0 · 5694 in / 1101 out tokens · 30352 ms · 2026-05-24T21:19:54.866637+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose a novel approach based on a particular type of Bayesian networks called Chow-Liu trees to approximate the distribution of attribute values inside each relation of a database.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Chow-Liu trees minimise the KL divergence... retrieving a marginal distribution from a tree can be done in linear time

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 2 internal anchors

[1]

Ac- cess path selection in a relational database management sys tem

P Grifﬁths Selinger, Morton M Astrahan, Donald D Chamber lin, Raymond A Lorie, and Thomas G Price. Ac- cess path selection in a relational database management sys tem. In Proceedings of the 1979 ACM SIGMOD international conference on Management of data , pages 23–34. ACM, 1979

work page 1979
[2]

On the propagation of errors in the size of join results , vol- ume 20

Y annis E Ioannidis and Stavros Christodoulakis. On the propagation of errors in the size of join results , vol- ume 20. ACM, 1991

work page 1991
[3]

Looking Back at Postgres

Joseph M Hellerstein. Looking back at postgres. arXiv preprint arXiv:1901.01973, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901
[4]

Presto: Interacting with petabytes of data at facebook

Martin Traverso. Presto: Interacting with petabytes of data at facebook. Retrieved February, 4:2014, 2013

work page 2014
[5]

Spark sql: Rel ational data processing in spark

Michael Armbrust, Reynold S Xin, Cheng Lian, Yin Huai, Da vies Liu, Joseph K Bradley, Xiangrui Meng, Tomer Kaftan, Michael J Franklin, Ali Ghodsi, et al. Spark sql: Rel ational data processing in spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1383–1394. ACM, 2015

work page 2015
[6]

Improved histograms for selectivity estimation of range predicates

Viswanath Poosala, Peter J Haas, Y annis E Ioannidis, and Eugene J Shekita. Improved histograms for selectivity estimation of range predicates. In ACM Sigmod Record, volume 25, pages 294–305. ACM, 1996

work page 1996
[7]

Self-tunin g, gpu-accelerated kernel density models for mul- tidimensional selectivity estimation

Max Heimel, Martin Kiefer, and V olker Markl. Self-tunin g, gpu-accelerated kernel density models for mul- tidimensional selectivity estimation. In Proceedings of the 2015 ACM SIGMOD International Conferenc e on Management of Data, pages 1477–1492. ACM, 2015

work page 2015
[8]

Equi-depth multidim ensional histograms

M Muralikrishna and David J DeWitt. Equi-depth multidim ensional histograms. In ACM SIGMOD Record , volume 17, pages 28–36. ACM, 1988

work page 1988
[9]

Accura te estimation of the number of tuples satisfying a condi- tion

Gregory Piatetsky-Shapiro and Charles Connell. Accura te estimation of the number of tuples satisfying a condi- tion. ACM Sigmod Record, 14(2):256–276, 1984

work page 1984
[10]

Query size esti mation by adaptive sampling

Richard J Lipton and Jeffrey F Naughton. Query size esti mation by adaptive sampling. In Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 40–46. ACM, 1990

work page 1990
[11]

Join size estimation subject to ﬁlter conditions

David V engerov, Andre Cavalheiro Menck, Mohamed Zait, and Sunil P Chakkappen. Join size estimation subject to ﬁlter conditions. Proceedings of the VLDB Endowment , 8(12):1530–1541, 2015

work page 2015
[12]

Cardinality estimation done right: Index-based join sampling

Viktor Leis, Bernharde Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. Cardinality estimation done right: Index-based join sampling. In CIDR, 2017

work page 2017
[13]

Two-level sampling for join size estim ation

Y u Chen and Ke Yi. Two-level sampling for join size estim ation. In Proceedings of the 2017 ACM International Conference on Management of Data , pages 759–774. ACM, 2017

work page 2017
[14]

Learned Cardinalities: Estimating Correlated Joins with Deep Learning

Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis , Peter Boncz, and Alfons Kemper. Learned cardinali- ties: Estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[15]

How good are query optimizers, really? Proceedings of the VLDB Endowment , 9(3):204–215, 2015

Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Bo ncz, Alfons Kemper, and Thomas Neumann. How good are query optimizers, really? Proceedings of the VLDB Endowment , 9(3):204–215, 2015

work page 2015
[16]

Selec tivity estimation using probabilistic models

Lise Getoor, Benjamin Taskar, and Daphne Koller. Selec tivity estimation using probabilistic models. In ACM SIGMOD Record, volume 30, pages 461–472. ACM, 2001

work page 2001
[17]

Lightweight graphical models for selectivity estima- tion without independence assumptions

Kostas Tzoumas, Amol Deshpande, and Christian S Jensen . Lightweight graphical models for selectivity estima- tion without independence assumptions. Proceedings of the VLDB Endowment , 4(11):852–863, 2011. 10 A PREPRINT - J ULY 16, 2019

work page 2011
[18]

An introduction to Bayesian networks , volume 210

Finn V Jensen. An introduction to Bayesian networks , volume 210. UCL press London, 1996

work page 1996
[19]

Neil Robertson and Paul D. Seymour. Graph minors. ii. al gorithmic aspects of tree-width. Journal of algorithms, 7(3):309–322, 1986

work page 1986
[20]

The optimization of queries in rela tional databases

Robert Philip Kooi. The optimization of queries in rela tional databases. 1980

work page 1980
[21]

Optim al histograms for limiting worst-case error propagation in the size of join results

Y annis E Ioannidis and Stavros Christodoulakis. Optim al histograms for limiting worst-case error propagation in the size of join results. ACM Transactions on Database Systems (TODS) , 18(4):709–748, 1993

work page 1993
[22]

St holes: a multidimensional workload-aware histogram

Nicolas Bruno, Surajit Chaudhuri, and Luis Gravano. St holes: a multidimensional workload-aware histogram. In ACM SIGMOD Record, volume 30, pages 211–222. ACM, 2001

work page 2001
[23]

On rectangular partitionings in two dimensions: Algo- rithms, complexity and applications

S Muthukrishnan, Viswanath Poosala, and Torsten Suel. On rectangular partitionings in two dimensions: Algo- rithms, complexity and applications. Database Theory—ICDT’99 , pages 236–256, 1999

work page 1999
[24]

The computational complexity of prob abilistic inference using bayesian belief networks

Gregory F Cooper. The computational complexity of prob abilistic inference using bayesian belief networks. Artiﬁcial intelligence, 42(2-3):393–405, 1990

work page 1990
[25]

On random sampling over joins

Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayy a. On random sampling over joins. In ACM SIGMOD Record, volume 28, pages 263–274. ACM, 1999

work page 1999
[26]

Random sampling from databases

Frank Olken. Random sampling from databases . PhD thesis, University of California, Berkeley, 1993

work page 1993
[27]

Wander join: O nline aggregation via random walks

Feifei Li, Bin Wu, Ke Yi, and Zhuoyue Zhao. Wander join: O nline aggregation via random walks. In Proceedings of the 2016 International Conference on Management of Data , pages 615–629. ACM, 2016

work page 2016
[28]

Join synopses for approxi- mate query answering

Swarup Acharya, Phillip B Gibbons, Viswanath Poosala, and Sridhar Ramaswamy. Join synopses for approxi- mate query answering. In ACM SIGMOD Record, volume 28, pages 275–286. ACM, 1999

work page 1999
[29]

Leo-db2’s learning optimizer

Michael Stillger, Guy M Lohman, V olker Markl, and Mokht ar Kandil. Leo-db2’s learning optimizer. In VLDB, volume 1, pages 19–28, 2001

work page 2001
[30]

Adaptive selectivity estimation using query feedback , vol- ume 23

Chungmin Melvin Chen and Nick Roussopoulos. Adaptive selectivity estimation using query feedback , vol- ume 23. ACM, 1994

work page 1994
[31]

Le arning bayesian networks: The combination of knowledge and statistical data

David Heckerman, Dan Geiger, and David M Chickering. Le arning bayesian networks: The combination of knowledge and statistical data. Machine learning, 20(3):197–243, 1995

work page 1995
[32]

Learning bayesian network structure using lp relaxations

Tommi Jaakkola, David Sontag, Amir Globerson, and Mari na Meila. Learning bayesian network structure using lp relaxations. In Proceedings of the Thirteenth International Conference on Artiﬁcial Intelligence and Statistics, pages 358–365, 2010

work page 2010
[33]

Approximating discrete probabilit y distributions with dependence trees

C Chow and Cong Liu. Approximating discrete probabilit y distributions with dependence trees. IEEE transac- tions on Information Theory , 14(3):462–467, 1968

work page 1968
[34]

Probabilistic networks and expert systems: Exact computational methods for Bayesian n etworks

Robert G Cowell, Philip Dawid, Steffen L Lauritzen, and David J Spiegelhalter. Probabilistic networks and expert systems: Exact computational methods for Bayesian n etworks. Springer Science & Business Media, 2006

work page 2006
[35]

The Steiner tree problem, volume 53

Frank K Hwang, Dana S Richards, and Pawel Winter. The Steiner tree problem, volume 53. Elsevier, 1992

work page 1992
[36]

Why you should run tpc-ds: a workload analysis

Meikel Poess, Raghunath Othayoth Nambiar, and David Wa lrath. Why you should run tpc-ds: a workload analysis. In Proceedings of the 33rd international conference on V ery large data bases, pages 1138–1149. VLDB Endowment, 2007

work page 2007
[37]

Preventing bad plans by bounding the impact of cardinality estimation errors

Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. Preventing bad plans by bounding the impact of cardinality estimation errors. Proceedings of the VLDB Endowment , 2(1):982–993, 2009

work page 2009
[38]

Query optimization through the looking glass, and what we found running the join order benchmark

Viktor Leis, Bernhard Radke, Andrey Gubichev, Atanas M irchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. Query optimization through the looking glass, and what we found running the join order benchmark. The VLDB Journal , pages 1–26, 2018. 11

work page 2018

[1] [1]

Ac- cess path selection in a relational database management sys tem

P Grifﬁths Selinger, Morton M Astrahan, Donald D Chamber lin, Raymond A Lorie, and Thomas G Price. Ac- cess path selection in a relational database management sys tem. In Proceedings of the 1979 ACM SIGMOD international conference on Management of data , pages 23–34. ACM, 1979

work page 1979

[2] [2]

On the propagation of errors in the size of join results , vol- ume 20

Y annis E Ioannidis and Stavros Christodoulakis. On the propagation of errors in the size of join results , vol- ume 20. ACM, 1991

work page 1991

[3] [3]

Looking Back at Postgres

Joseph M Hellerstein. Looking back at postgres. arXiv preprint arXiv:1901.01973, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901

[4] [4]

Presto: Interacting with petabytes of data at facebook

Martin Traverso. Presto: Interacting with petabytes of data at facebook. Retrieved February, 4:2014, 2013

work page 2014

[5] [5]

Spark sql: Rel ational data processing in spark

Michael Armbrust, Reynold S Xin, Cheng Lian, Yin Huai, Da vies Liu, Joseph K Bradley, Xiangrui Meng, Tomer Kaftan, Michael J Franklin, Ali Ghodsi, et al. Spark sql: Rel ational data processing in spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1383–1394. ACM, 2015

work page 2015

[6] [6]

Improved histograms for selectivity estimation of range predicates

Viswanath Poosala, Peter J Haas, Y annis E Ioannidis, and Eugene J Shekita. Improved histograms for selectivity estimation of range predicates. In ACM Sigmod Record, volume 25, pages 294–305. ACM, 1996

work page 1996

[7] [7]

Self-tunin g, gpu-accelerated kernel density models for mul- tidimensional selectivity estimation

Max Heimel, Martin Kiefer, and V olker Markl. Self-tunin g, gpu-accelerated kernel density models for mul- tidimensional selectivity estimation. In Proceedings of the 2015 ACM SIGMOD International Conferenc e on Management of Data, pages 1477–1492. ACM, 2015

work page 2015

[8] [8]

Equi-depth multidim ensional histograms

M Muralikrishna and David J DeWitt. Equi-depth multidim ensional histograms. In ACM SIGMOD Record , volume 17, pages 28–36. ACM, 1988

work page 1988

[9] [9]

Accura te estimation of the number of tuples satisfying a condi- tion

Gregory Piatetsky-Shapiro and Charles Connell. Accura te estimation of the number of tuples satisfying a condi- tion. ACM Sigmod Record, 14(2):256–276, 1984

work page 1984

[10] [10]

Query size esti mation by adaptive sampling

Richard J Lipton and Jeffrey F Naughton. Query size esti mation by adaptive sampling. In Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 40–46. ACM, 1990

work page 1990

[11] [11]

Join size estimation subject to ﬁlter conditions

David V engerov, Andre Cavalheiro Menck, Mohamed Zait, and Sunil P Chakkappen. Join size estimation subject to ﬁlter conditions. Proceedings of the VLDB Endowment , 8(12):1530–1541, 2015

work page 2015

[12] [12]

Cardinality estimation done right: Index-based join sampling

Viktor Leis, Bernharde Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. Cardinality estimation done right: Index-based join sampling. In CIDR, 2017

work page 2017

[13] [13]

Two-level sampling for join size estim ation

Y u Chen and Ke Yi. Two-level sampling for join size estim ation. In Proceedings of the 2017 ACM International Conference on Management of Data , pages 759–774. ACM, 2017

work page 2017

[14] [14]

Learned Cardinalities: Estimating Correlated Joins with Deep Learning

Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis , Peter Boncz, and Alfons Kemper. Learned cardinali- ties: Estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[15] [15]

How good are query optimizers, really? Proceedings of the VLDB Endowment , 9(3):204–215, 2015

Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Bo ncz, Alfons Kemper, and Thomas Neumann. How good are query optimizers, really? Proceedings of the VLDB Endowment , 9(3):204–215, 2015

work page 2015

[16] [16]

Selec tivity estimation using probabilistic models

Lise Getoor, Benjamin Taskar, and Daphne Koller. Selec tivity estimation using probabilistic models. In ACM SIGMOD Record, volume 30, pages 461–472. ACM, 2001

work page 2001

[17] [17]

Lightweight graphical models for selectivity estima- tion without independence assumptions

Kostas Tzoumas, Amol Deshpande, and Christian S Jensen . Lightweight graphical models for selectivity estima- tion without independence assumptions. Proceedings of the VLDB Endowment , 4(11):852–863, 2011. 10 A PREPRINT - J ULY 16, 2019

work page 2011

[18] [18]

An introduction to Bayesian networks , volume 210

Finn V Jensen. An introduction to Bayesian networks , volume 210. UCL press London, 1996

work page 1996

[19] [19]

Neil Robertson and Paul D. Seymour. Graph minors. ii. al gorithmic aspects of tree-width. Journal of algorithms, 7(3):309–322, 1986

work page 1986

[20] [20]

The optimization of queries in rela tional databases

Robert Philip Kooi. The optimization of queries in rela tional databases. 1980

work page 1980

[21] [21]

Optim al histograms for limiting worst-case error propagation in the size of join results

Y annis E Ioannidis and Stavros Christodoulakis. Optim al histograms for limiting worst-case error propagation in the size of join results. ACM Transactions on Database Systems (TODS) , 18(4):709–748, 1993

work page 1993

[22] [22]

St holes: a multidimensional workload-aware histogram

Nicolas Bruno, Surajit Chaudhuri, and Luis Gravano. St holes: a multidimensional workload-aware histogram. In ACM SIGMOD Record, volume 30, pages 211–222. ACM, 2001

work page 2001

[23] [23]

On rectangular partitionings in two dimensions: Algo- rithms, complexity and applications

S Muthukrishnan, Viswanath Poosala, and Torsten Suel. On rectangular partitionings in two dimensions: Algo- rithms, complexity and applications. Database Theory—ICDT’99 , pages 236–256, 1999

work page 1999

[24] [24]

The computational complexity of prob abilistic inference using bayesian belief networks

Gregory F Cooper. The computational complexity of prob abilistic inference using bayesian belief networks. Artiﬁcial intelligence, 42(2-3):393–405, 1990

work page 1990

[25] [25]

On random sampling over joins

Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayy a. On random sampling over joins. In ACM SIGMOD Record, volume 28, pages 263–274. ACM, 1999

work page 1999

[26] [26]

Random sampling from databases

Frank Olken. Random sampling from databases . PhD thesis, University of California, Berkeley, 1993

work page 1993

[27] [27]

Wander join: O nline aggregation via random walks

Feifei Li, Bin Wu, Ke Yi, and Zhuoyue Zhao. Wander join: O nline aggregation via random walks. In Proceedings of the 2016 International Conference on Management of Data , pages 615–629. ACM, 2016

work page 2016

[28] [28]

Join synopses for approxi- mate query answering

Swarup Acharya, Phillip B Gibbons, Viswanath Poosala, and Sridhar Ramaswamy. Join synopses for approxi- mate query answering. In ACM SIGMOD Record, volume 28, pages 275–286. ACM, 1999

work page 1999

[29] [29]

Leo-db2’s learning optimizer

Michael Stillger, Guy M Lohman, V olker Markl, and Mokht ar Kandil. Leo-db2’s learning optimizer. In VLDB, volume 1, pages 19–28, 2001

work page 2001

[30] [30]

Adaptive selectivity estimation using query feedback , vol- ume 23

Chungmin Melvin Chen and Nick Roussopoulos. Adaptive selectivity estimation using query feedback , vol- ume 23. ACM, 1994

work page 1994

[31] [31]

Le arning bayesian networks: The combination of knowledge and statistical data

David Heckerman, Dan Geiger, and David M Chickering. Le arning bayesian networks: The combination of knowledge and statistical data. Machine learning, 20(3):197–243, 1995

work page 1995

[32] [32]

Learning bayesian network structure using lp relaxations

Tommi Jaakkola, David Sontag, Amir Globerson, and Mari na Meila. Learning bayesian network structure using lp relaxations. In Proceedings of the Thirteenth International Conference on Artiﬁcial Intelligence and Statistics, pages 358–365, 2010

work page 2010

[33] [33]

Approximating discrete probabilit y distributions with dependence trees

C Chow and Cong Liu. Approximating discrete probabilit y distributions with dependence trees. IEEE transac- tions on Information Theory , 14(3):462–467, 1968

work page 1968

[34] [34]

Probabilistic networks and expert systems: Exact computational methods for Bayesian n etworks

Robert G Cowell, Philip Dawid, Steffen L Lauritzen, and David J Spiegelhalter. Probabilistic networks and expert systems: Exact computational methods for Bayesian n etworks. Springer Science & Business Media, 2006

work page 2006

[35] [35]

The Steiner tree problem, volume 53

Frank K Hwang, Dana S Richards, and Pawel Winter. The Steiner tree problem, volume 53. Elsevier, 1992

work page 1992

[36] [36]

Why you should run tpc-ds: a workload analysis

Meikel Poess, Raghunath Othayoth Nambiar, and David Wa lrath. Why you should run tpc-ds: a workload analysis. In Proceedings of the 33rd international conference on V ery large data bases, pages 1138–1149. VLDB Endowment, 2007

work page 2007

[37] [37]

Preventing bad plans by bounding the impact of cardinality estimation errors

Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. Preventing bad plans by bounding the impact of cardinality estimation errors. Proceedings of the VLDB Endowment , 2(1):982–993, 2009

work page 2009

[38] [38]

Query optimization through the looking glass, and what we found running the join order benchmark

Viktor Leis, Bernhard Radke, Andrey Gubichev, Atanas M irchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. Query optimization through the looking glass, and what we found running the join order benchmark. The VLDB Journal , pages 1–26, 2018. 11

work page 2018