Enabling Data Dependency-based Query Optimization

Daniel Lindner; Daniel Ritter; Felix Naumann

arxiv: 2406.06886 · v2 · submitted 2024-06-11 · 💻 cs.DB

Pith reviewed 2026-05-24 00:24 UTC · model grok-4.3

keywords: data dependencies · query optimization · database systems · TPC-DS · JOB benchmark · primary keys · foreign keys · performance improvement

The pith

An automated system discovers and validates additional data dependencies to optimize queries without manual declarations or SQL rewrites.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that data dependencies beyond primary and foreign keys can be identified automatically and used to improve query performance in analytical databases. It first establishes the potential gains through experiments with rewritten SQL queries across multiple systems and benchmarks. It then describes a complete system that finds candidate dependencies, checks them efficiently, and feeds valid ones into the optimizer. If correct, this removes the need for experts to declare or maintain such dependencies while delivering speedups comparable to hand-tuned rewrites.

Core claim

The paper claims that an integrated system can recognize dependency candidates, validate them for optimization use, and apply them in query plans, matching the performance of dedicated SQL rewrites. Compared to PKs and FKs alone, it reports geometric mean speedups of 35% on TPC-DS and 29% on JOB, with some queries improving more than 90%, and discovery costs far below the gains from one workload run.

What carries the argument

The automated pipeline that recognizes dependency candidates, validates their applicability to queries, and integrates them into existing query optimizers without manual input.
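
The validation step can be illustrated with a minimal sketch. The Python below checks two kinds of candidates by brute force: a unique column combination (UCC) and a unary order dependency (OD). It is an illustration under stated assumptions, not the paper's method: the function names `holds_ucc` and `holds_od`, the row representation, and the sample data are all hypothetical, and the paper describes its own validation as efficient and metadata-aware, which this naive scan does not attempt to reproduce.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]  # hypothetical row representation: column name -> value


def holds_ucc(rows: Sequence[Row], columns: Sequence[str]) -> bool:
    """Return True if no two rows share the same values in `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False  # duplicate found, so the candidate is invalid
        seen.add(key)
    return True


def holds_od(rows: Sequence[Row], lhs: str, rhs: str) -> bool:
    """Return True if sorting by `lhs` also sorts by `rhs` (lhs orders rhs)."""
    ordered: List[Row] = sorted(rows, key=lambda r: r[lhs])
    for prev, curr in zip(ordered, ordered[1:]):
        if prev[lhs] == curr[lhs]:
            # Equal lhs values must map to equal rhs values.
            if prev[rhs] != curr[rhs]:
                return False
        elif prev[rhs] > curr[rhs]:
            # A larger lhs value must not map to a smaller rhs value.
            return False
    return True


# Made-up sample data for illustration only.
rows = [
    {"date_id": 1, "year": 2000},
    {"date_id": 2, "year": 2000},
    {"date_id": 3, "year": 2001},
]

print(holds_ucc(rows, ["date_id"]))
print(holds_ucc(rows, ["year"]))
print(holds_od(rows, "date_id", "year"))
print(holds_od(rows, "year", "date_id"))
```

On this sample, `date_id` is unique and orders `year`, while `year` is neither unique nor able to order `date_id`. A real system would only accept a candidate after such a check succeeds on the full data.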

If this is right

Queries achieve geometric mean speedups of 35% on TPC-DS and 29% on JOB over PK/FK-only optimization.
Individual query latencies can drop by more than 90% when valid dependencies are applied.
Dependency discovery overhead remains orders of magnitude smaller than the improvement from executing a workload once.
The gains appear across a range of analytical database systems when dependencies are used without SQL rewrites.
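
How a geometric mean speedup might be computed from per-query latencies can be sketched as follows. This assumes that "speedup" means relative latency reduction, that is, 1 minus the geometric mean of the optimized-to-baseline latency ratios. The summary does not state the paper's exact definition, the function names are hypothetical, and the latencies are made-up illustration values rather than benchmark results.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive values, computed in log space."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_latency_reduction(
    baseline_ms: Sequence[float], optimized_ms: Sequence[float]
) -> float:
    """Assumed definition: 1 - geomean(optimized / baseline) over all queries."""
    if len(baseline_ms) != len(optimized_ms):
        raise ValueError("need one optimized latency per baseline latency")
    ratios = [opt / base for base, opt in zip(baseline_ms, optimized_ms)]
    return 1.0 - geometric_mean(ratios)


# Illustrative numbers only: one query halves its latency, one is unchanged.
reduction = geomean_latency_reduction([100.0, 200.0], [50.0, 200.0])
print(f"{reduction:.1%}")
```

The geometric mean weights each query's relative change equally, so a single long-running query does not dominate the aggregate the way it would in an arithmetic mean of absolute latencies.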

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The low overhead suggests the approach remains practical even when queries run repeatedly on the same data.
Because no manual declaration is needed, the technique could extend to environments where schema changes frequently.
If validation scales with data size, similar automation might apply to larger analytical workloads beyond the tested benchmarks.

Load-bearing premise

Target datasets contain additional data dependencies that can be found and checked efficiently enough for the performance gains to outweigh the discovery cost.

What would settle it

Running the system on datasets that lack extra dependencies beyond PKs and FKs, or where validation time exceeds the latency savings on a workload, would show no net benefit.
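
The break-even condition described above can be written down as a small sketch. The helper names `net_benefit_ms` and `breakeven_runs` are hypothetical, and the numbers in the usage lines are placeholders, not measurements from the paper.

```python
import math
from typing import Optional


def net_benefit_ms(
    baseline_ms: float, optimized_ms: float, discovery_ms: float, runs: int = 1
) -> float:
    """Workload latency saved over `runs` executions minus the one-time discovery cost."""
    return runs * (baseline_ms - optimized_ms) - discovery_ms


def breakeven_runs(
    baseline_ms: float, optimized_ms: float, discovery_ms: float
) -> Optional[int]:
    """Smallest number of workload runs that pays back discovery, or None if never."""
    saving_per_run = baseline_ms - optimized_ms
    if saving_per_run <= 0:
        return None  # no extra dependencies helped, so the overhead is never recovered
    return math.ceil(discovery_ms / saving_per_run)


# Placeholder values for illustration only.
print(net_benefit_ms(1000.0, 700.0, 30.0))
print(breakeven_runs(1000.0, 700.0, 30.0))
print(breakeven_runs(1000.0, 1000.0, 30.0))
```

A negative `net_benefit_ms` for a single run, or a `None` from `breakeven_runs`, corresponds to the falsifying outcome: the dataset offers no exploitable dependencies, or validation costs more than it saves.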

Figures

Figures reproduced from arXiv: 2406.06886 by Daniel Lindner, Daniel Ritter, Felix Naumann.

**Figure 2.** Original query plan and versions successively rewritten using O-1, O-2, and O-3. Edges are annotated with the data ...

**Figure 3.** Architectural overview of the automatic depen...

**Figure 4.** Example of dependency propagation in the query ...

**Figure 5.** Metadata-aware UCC validation using the on-the...

**Figure 7.** Latencies with and without dependency-based op...

**Figure 10.** Average candidate validation times for four bench...

Original abstract

Primary key (PK) and foreign key (FK) constraints are widely used for query optimization. Knowledge about additional data dependencies, such as order dependencies, enables further substantial performance improvements. However, such dependencies are not maintained by database systems or are even unknown to the user. Identifying and validating relevant dependencies automatically and efficiently remains an unsolved problem. This paper presents a system that (i) recognizes dependency candidates for optimization, (ii) efficiently validates their applicability, and (iii) optimizes query plans using valid dependencies. First, we demonstrate the performance impact of optimization techniques using data dependencies additional to PKs and FKs. Using rewritten SQL queries, we empirically show that data dependencies improve performance for a wide range of analytical database systems and benchmarks. Second, we present how to integrate data dependencies into a system to use them without (i) manual declaration and maintenance or (ii) SQL rewrites. Our integrated and fully automated system matches the performance of dedicated SQL rewrites: compared to using only PKs and FKs, queries improve with geometric mean speedups of 35 % for TPC-DS and 29 % for JOB. Individual query latencies drop by more than 90 %. The dependency discovery overhead is orders of magnitude lower than the latency improvement of a single workload execution.
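
To make concrete what "using rewritten SQL queries" can mean, the sketch below shows one textbook dependency-based rewrite: dropping a functionally dependent column from a GROUP BY list. It is a generic illustration and is not claimed to be one of the paper's specific rewrites. The `item` and `sales` tables are hypothetical, and SQLite is used only as a convenient engine to show that both queries return the same rows.

```python
import sqlite3

# Hypothetical toy schema. No PRIMARY KEY is declared on item.item_id,
# yet the data happens to make item_id unique (an undeclared UCC),
# so item_id functionally determines category.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (item_id INTEGER, category TEXT);
    CREATE TABLE sales (item_id INTEGER, amount INTEGER);
    INSERT INTO item VALUES (1, 'books'), (2, 'music'), (3, 'books');
    INSERT INTO sales VALUES (1, 10), (1, 5), (2, 7), (3, 2);
    """
)

# Original query groups by both columns.
ORIGINAL = """
    SELECT i.item_id, i.category, SUM(s.amount)
    FROM sales s JOIN item i ON s.item_id = i.item_id
    GROUP BY i.item_id, i.category
    ORDER BY i.item_id
"""

# Rewritten query groups by item_id only; MIN() picks the single
# category value that the dependency guarantees per group.
REWRITTEN = """
    SELECT i.item_id, MIN(i.category), SUM(s.amount)
    FROM sales s JOIN item i ON s.item_id = i.item_id
    GROUP BY i.item_id
    ORDER BY i.item_id
"""

original_rows = conn.execute(ORIGINAL).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()
conn.close()

print(original_rows == rewritten_rows)
```

The rewrite is only safe while `item_id` stays unique. If a duplicate `item_id` with a different category were inserted, the two queries would diverge, which is why a dependency has to be validated before an optimizer may rely on it.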

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper delivers a working automated pipeline for spotting and using extra data dependencies in query optimization, with reported speedups on TPC-DS and JOB, but the claim that it matches hand-written rewrites rests on an assumption the abstract does not demonstrate: that discovery finds every dependency the rewrites exploit.

The new piece is the full stack: candidate recognition, efficient validation, and direct optimizer integration, all without manual declarations or SQL rewrites. Earlier work either assumed the dependencies were known or relied on query rewrites; this tries to close the loop automatically.

They first demonstrate the upside with rewritten queries across systems, then show the automated version producing geometric mean speedups of 35% on TPC-DS and 29% on JOB versus PK/FK only, with some queries over 90% faster and discovery overhead low enough to pay for itself in one run. That empirical framing is straightforward and addresses the practical barrier of overhead directly. The integration details and benchmark numbers give a concrete sense of what is achievable today.

The soft spot is the matching claim. The abstract states the automated system matches dedicated SQL-rewrite performance, yet supplies no evidence that the discovered dependencies are the same ones the rewrites exploited or that the resulting plans are equivalent. If discovery misses key cases or the optimizer integration applies them differently, the headline numbers could overstate what automation actually delivers. Experimental controls, statistical tests, and dependency counts are also missing from the abstract, which leaves room for post-hoc selection concerns.

This is for database systems researchers focused on query optimization who want to move past PK/FK. A reader interested in practical dependency exploitation would get usable numbers and a system sketch. It is grounded enough in real benchmarks and a clear problem to deserve referee time, even if the integration evidence needs strengthening.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a system for automatically recognizing, validating, and integrating data dependencies (beyond PK/FK constraints, including order dependencies) into query optimizers for analytical workloads. It first empirically demonstrates performance gains from such dependencies via hand-written SQL rewrites across database systems and benchmarks, then claims an integrated automated pipeline that matches those gains without manual declaration or rewrites, reporting geometric-mean speedups of 35% on TPC-DS and 29% on JOB (with some queries improving >90%) and discovery overhead orders of magnitude below query latency savings.

Significance. If the automated recognition+validation+integration pipeline is shown to surface the same dependencies and produce equivalent plan changes as the manual rewrites, the result would be significant: it would make dependency-based optimizations practical at scale without user intervention. The low-overhead claim and cross-system empirical gains (if reproducible) would strengthen the case for extending optimizers beyond PK/FK.

major comments (2)

[Abstract] Abstract: the central claim that the 'integrated and fully automated system matches the performance of dedicated SQL rewrites' is load-bearing yet unsupported; no evidence is supplied that the discovery pipeline surfaces exactly the dependencies exploited by the rewrites or that the optimizer integration reproduces the same plan deltas.
[Abstract] Abstract / experimental evaluation: the reported geometric-mean speedups (35% TPC-DS, 29% JOB) and individual >90% latency drops are presented without any description of experimental controls, statistical significance testing, workload selection criteria, or safeguards against post-hoc dependency selection, limiting verification of the performance claims.

minor comments (1)

[Abstract] The abstract mentions 'order dependencies' as an example but does not enumerate the full set of dependency types handled by the system; a brief enumeration would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript where the concerns identify opportunities for clarification or additional evidence.

Referee: [Abstract] Abstract: the central claim that the 'integrated and fully automated system matches the performance of dedicated SQL rewrites' is load-bearing yet unsupported; no evidence is supplied that the discovery pipeline surfaces exactly the dependencies exploited by the rewrites or that the optimizer integration reproduces the same plan deltas.

Authors: The manuscript reports that the automated pipeline produces the same geometric-mean speedups as the hand-written rewrites (35% on TPC-DS, 29% on JOB). We agree, however, that an explicit side-by-side comparison of discovered dependencies and resulting plan deltas would make the equivalence claim more direct. We will add such a comparison (e.g., a table listing dependencies used in the manual rewrites versus those surfaced by the pipeline, together with optimizer plan differences) to the revised evaluation section.

revision: yes

Referee: [Abstract] Abstract / experimental evaluation: the reported geometric-mean speedups (35% TPC-DS, 29% JOB) and individual >90% latency drops are presented without any description of experimental controls, statistical significance testing, workload selection criteria, or safeguards against post-hoc dependency selection, limiting verification of the performance claims.

Authors: The abstract is intentionally concise; the full experimental section describes the TPC-DS and JOB workloads, query selection, and the automated discovery/validation pipeline. We will nevertheless revise the abstract to include a short statement of the benchmarks used and a pointer to the detailed methodology. We will also add any missing statistical significance results and an explicit description of how the candidate-generation step avoids post-hoc selection (candidates are enumerated from schema and data statistics independently of the query workload).

revision: partial

Circularity Check

0 steps flagged

No circularity: empirical system benchmarks with direct measurements

The paper describes a practical system for auto-discovering, validating, and integrating data dependencies into query optimizers, evaluated via direct runtime benchmarks on TPC-DS and JOB workloads. Speedups (geometric means 35% and 29%) and latency reductions are reported as measured outcomes from the implemented pipeline, not as quantities derived from equations, fitted parameters, or self-referential definitions. No load-bearing derivations, uniqueness theorems, or ansatzes appear; the central claims rest on experimental comparison to PK/FK baselines and hand-written rewrites rather than any reduction to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard database assumptions that query optimizers can exploit additional dependencies when present and that such dependencies occur in real analytical workloads; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: Database query optimizers can exploit data dependencies beyond primary and foreign keys when they are known and valid.
Invoked in the opening sentences of the abstract as the basis for performance improvements.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages

[1]

Abadi, Samuel Madden, and Nabil Hachem

Daniel J. Abadi, Samuel Madden, and Nabil Hachem. 2008. Column-stores vs. row-stores: how different are they really?. In Proceedings of the International Conference on Management of Data (SIGMOD) . 967–980

work page 2008
[2]

Ziawasch Abedjan, Lukasz Golab, and Felix Naumann. 2015. Profiling relational data: a survey. The VLDB Journal 24, 4 (2015), 557–581

work page 2015
[3]

Lee, Andrew Witkowski, Dinesh Das, Hong Su, Mohamed Zaït, and Thierry Cruanes

Rafi Ahmed, Allison W. Lee, Andrew Witkowski, Dinesh Das, Hong Su, Mohamed Zaït, and Thierry Cruanes. 2006. Cost-Based Query Transformation in Oracle. In Proceedings of the International Conference on Very Large Databases (VLDB) . 1026–1036

work page 2006
[4]

RJ Atwal, Peter Boncz, Ryan Boyd, Antony Courtney, Till Döhmen, Florian Ger- linghoff, Jeff Huang, Joseph Hwang, Raphael Hyde, Elena Felder, Jacob Lacouture, Yves LeMaout, Boaz Leskes, Yao Liu, Alex Monahan, Dan Perkins, Tino Tereshko, Jordan Tigani, Nick Ursa, Stephanie Wang, and Yannick Welsch. 2024. Mother- Duck: DuckDB in the cloud and in the client. ...

work page 2024
[5]

Maximilian Bandle, Jana Giceva, and Thomas Neumann. 2021. To Partition, or Not to Partition, That is the Join Question in a Real System. In Proceedings of the International Conference on Management of Data (SIGMOD) . 168–180

work page 2021
[6]

Yuanzhe Bei, Thao Pham, Akshay Aggarwal, Nga Tran, Jaimin Dave, Chuck Bear, and Michael Leuchtenburg. 2019. Vertica Flattened Tables and Live Aggregate Projections: A Column-based Alternative to Materialized Views for Analytics. In Proceedings of the International Conference on Big Data (BigData) . 1749–1758

work page 2019
[7]

Siegfried Bell. 1997. Dependency Mining in Relational Databases. In Proceedings of the International Joint Conference on Qualitative and Quantitative Practical Reasoning (ECSQARU-FAPR). 16–29

work page 1997
[8]

Siegfried Bell and Peter Brockhausen. 1995. Discovery of Data Dependencies in Relational Databases. Technical Report. University Dortmund. 6 pages

work page 1995
[9]

Srikanth Bellamkonda, Rafi Ahmed, Andrew Witkowski, Angela Amor, Mohamed Zaït, and Chun Chieh Lin. 2009. Enhanced Subquery Optimizations in Oracle. Proceedings of the VLDB Endowment (PVLDB) 2, 2 (2009), 1366–1377

work page 2009
[10]

Carsten Binnig, Stefan Hildenbrand, and Franz Färber. 2009. Dictionary-based order-preserving string compression for main memory column stores. InProceed- ings of the International Conference on Management of Data (SIGMOD) . 283–296

work page 2009
[11]

Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting Set Enumeration with Partial Information for Unique Column Combination Discovery. Proceedings of the VLDB Endowment (PVLDB) 13, 11 (2020), 2270–2283

work page 2020
[12]

Boncz, Thomas Neumann, and Orri Erling

Peter A. Boncz, Thomas Neumann, and Orri Erling. 2013. TPC-H Analyzed: Hid- den Messages and Lessons Learned from an Influential Benchmark. InProceedings of the TPC Technology Conference (TPCTC) . 61–76

work page 2013
[13]

Casanova, Luiz Tucherman, and Antonio L

Marco A. Casanova, Luiz Tucherman, and Antonio L. Furtado. 1988. Enforcing Inclusion Dependencies and Referencial Integrity. In VLDB. 38–49

work page 1988
[14]

Edgar F. Codd. 1971. Further Normalization of the Data Base Relational Model . Research Report RJ909. IBM. 33 pages

work page 1971
[15]

C. J. Date and Hugh Darwen. 1992. Relational Database Writings 1989-1991 . Addison-Wesley, Chapter The Role of functional Dependence in Query Decom- position, 133–150

work page 1992
[16]

Markus Dreseler, Martin Boissier, Tilmann Rabl, and Matthias Uflacker. 2020. Quantifying TPC-H Choke Points and Their Optimizations. Proceedings of the VLDB Endowment (PVLDB) 13, 8 (2020), 1206–1220

work page 2020
[17]

Markus Dreseler, Jan Kossmann, Martin Boissier, Stefan Klauck, Matthias Uflacker, and Hasso Plattner. 2019. Hyrise Re-engineered: An Extensible Database System for Research in Relational In-Memory Data Management. In Proceed- ings of the International Conference on Extending Database Technology (EDBT) . 313–324

work page 2019
[18]

Falco Dürsch, Axel Stebner, Fabian Windheuser, Maxi Fischer, Tim Friedrich, Nils Strelow, Tobias Bleifuß, Hazar Harmouch, Lan Jiang, Thorsten Papenbrock, and Felix Naumann. 2019. Inclusion Dependency Discovery: An Experimental Evaluation of Thirteen Algorithms. In Proceedings of the International Conference on Information and Knowledge Management (CIKM) . 219–228

work page 2019
[19]

Ronald Fagin and Moshe Y. Vardi. 1984. The Theory of Data Dependencies - An Overview. In Proceedings of the International Colloquium on Automata, Languages and Programming (ICALP). 1–22

work page 1984
[20]

Wenfei Fan, Floris Geerts, and Xibei Jia. 2008. Semandaq: a data quality sys- tem based on conditional functional dependencies. Proceedings of the VLDB Endowment (PVLDB) 1, 2 (2008), 1460–1463

work page 2008
[21]

Franz Färber, Sang Kyun Cha, Jürgen Primsch, Christof Bornhövd, Stefan Sigg, and Wolfgang Lehner. 2011. SAP HANA database: data management for modern business applications. SIGMOD Record 40, 4 (2011), 45–51

work page 2011
[22]

Franz Färber, Norman May, Wolfgang Lehner, Philipp Große, Ingo Müller, Hannes Rauhe, and Jonathan Dees. 2012. The SAP HANA Database – An Architecture Overview. IEEE Data Engineering Bulletin 35, 1 (2012), 28–33

work page 2012
[23]

Ganski and Harry K

Richard A. Ganski and Harry K. T. Wong. 1987. Optimization of Nested SQL Queries Revisited. In Proceedings of the International Conference on Management of Data (SIGMOD). 23–33

work page 1987
[24]

Goetz Graefe, Ross Bunker, and Shaun Cooper. 1998. Hash Joins and Hash Teams in Microsoft SQL Server. In Proceedings of the International Conference on Very Large Databases (VLDB). 86–97

work page 1998
[25]

Haas, Johann Christoph Freytag, Guy M

Laura M. Haas, Johann Christoph Freytag, Guy M. Lohman, and Hamid Pirahesh

work page
[26]

InProceedings of the International Conference on Management of Data (SIGMOD)

Extensible Query Processing in Starburst. InProceedings of the International Conference on Management of Data (SIGMOD) . 377–388

work page
[27]

Knoblock

Chun-Nan Hsu and Craig A. Knoblock. 1996. Using Inductive Learning To Generate Rules for Semantic Query Optimization. In Advances in Knowledge Discovery and Data Mining . AAAI/MIT Press, 425–445

work page 1996
[28]

Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, and Hannu Toivonen. 1999. TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependen- cies. Comput. J. 42, 2 (1999), 100–111

work page 1999
[29]

Sjoerd Mullender, and Martin L

Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, K. Sjoerd Mullender, and Martin L. Kersten. 2012. MonetDB: Two Decades of Research in Column- oriented Database Architectures. IEEE Data Engineering Bulletin 35, 1 (2012), 40–45

work page 2012
[30]

International Organization for Standardization. 2023. Information technology – Database languages SQL – Part 2: Foundation (SQL/Foundation) . Standard Specification ISO/IEC 9075-2:2023(E)

work page 2023
[31]

Ioannidis

Yannis E. Ioannidis. 1996. Query Optimization. Comput. Surveys 28, 1 (1996), 121–123

work page 1996
[32]

Won Kim. 1982. On Optimizing an SQL-like Nested Query. ACM Transactions on Database Systems (TODS) 7, 3 (1982), 443–469

work page 1982
[33]

Jonathan J. King. 1980. Modelling Concepts for Reasoning About Access to Knowledge. In Proceedings of the Workshop on Data Abstraction, Databases and Conceptual Modelling. 138–140

work page 1980
[34]

Jan Kossmann, Daniel Lindner, Felix Naumann, and Thorsten Papenbrock. 2022. Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization. In Proceedings of the Conference on Innovative Data Systems Research (CIDR) . 7 pages

work page 2022
[35]

Jan Kossmann, Thorsten Papenbrock, and Felix Naumann. 2022. Data dependen- cies for query optimization: a survey. The VLDB Journal 31, 1 (2022), 1–22

work page 2022
[36]

Hanson, Weiyun Huang, Michal Nowakiewicz, and Vassilis Papadimos

Per-Åke Larson, Adrian Birka, Eric N. Hanson, Weiyun Huang, Michal Nowakiewicz, and Vassilis Papadimos. 2015. Real-Time Analytical Process- ing with SQL Server. Proceedings of the VLDB Endowment (PVLDB) 8, 12 (2015), 1740–1751

work page 2015
[37]

Patel, and Mike Zwilling

Per-Åke Larson, Spyros Blanas, Cristian Diaconu, Craig Freedman, Jignesh M. Patel, and Mike Zwilling. 2011. High-Performance Concurrency Control Mecha- nisms for Main-Memory Databases. Proceedings of the VLDB Endowment (PVLDB) 5, 4 (2011), 298–309

work page 2011
[38]

Boncz, Alfons Kem- per, and Thomas Neumann

Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kem- per, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? Proceedings of the VLDB Endowment (PVLDB) 9, 3 (2015), 204–215

work page 2015
[39]

Maurizio Lenzerini. 2002. Data Integration: A Theoretical Perspective. In Pro- ceedings of the Symposium on Principles of Database Systems (PODS) . 233–246

work page 2002
[40]

Mark Levene and George Loizou. 2003. Why is the snowflake schema a good data warehouse design? Information Systems (IS) 28, 3 (2003), 225–240

work page 2003
[41]

Xiaoxuan Liu, Shuxian Wang, Mengzhu Sun, Sicheng Pan, Ge Li, Siddharth Jha, Cong Yan, Junwen Yang, Shan Lu, and Alvin Cheung. 2023. Leveraging Application Data Constraints to Optimize Database-Backed Web Applications. Proceedings of the VLDB Endowment (PVLDB) 16, 6 (2023), 1208–1221

work page 2023
[42]

Lucchesi and Sylvia L

Claudio L. Lucchesi and Sylvia L. Osborn. 1978. Candidate Keys for Relations. J. Comput. System Sci. 17, 2 (1978), 270–279

work page 1978
[43]

Zhenghua Lyu, Huan Hubert Zhang, Gang Xiong, Gang Guo, Haozhou Wang, Jinbao Chen, Asim Praveen, Yu Yang, Xiaoming Gao, Alexandra Wang, Wen Lin, Ashwin Agrawal, Junfeng Yang, Hao Wu, Xiaoliang Li, Feng Guo, Jiang Wu, Jesse Zhang, and Venkatesh Raghavan. 2021. Greenplum: A Hybrid Database for Transactional and Analytical Workloads. In Proceedings of the Int...

work page 2021
[44]

Bernstein, and Erhard Rahm

Jayant Madhavan, Philip A. Bernstein, and Erhard Rahm. 2001. Generic Schema Matching with Cupid. InProceedings of the International Conference on Very Large Databases (VLDB). 49–58

work page 2001
[45]

Norman May, Alexander Böhm, and Wolfgang Lehner. 2017. SAP HANA - The Evolution of an In-Memory DBMS from Pure OLAP Processing Towards Mixed Workloads. In Proceedings of the Conference Datenbanksysteme in Business, Technologie und Web Technik (BTW). 545–563

work page 2017
[46]

Niloy Mukherjee, Shasank Chavan, Maria Colgan, Dinesh Das, Mike Gleeson, Sanket Hase, Allison Holloway, Hui Jin, Jesse Kamp, Kartik Kulkarni, Tirthankar Lahiri, Juan Loaiza, Neil MacNaughton, Vineet Marwah, Atrayee Mullick, Andy Witkowski, Jiaqi Yan, and Mohamed Zaït. 2015. Distributed Architecture of Oracle Database In-memory. Proceedings of the VLDB End...

work page 2015
[47]

Thomas Neumann. 2014. Engineering High-Performance Database Engines. Proceedings of the VLDB Endowment (PVLDB) 7, 13 (2014), 1734–1741

work page 2014
[48]

Thomas Neumann and Michael J. Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In Proceedings of the Conference on Innovative Data Systems Research (CIDR) . 7 pages

work page 2020
[49]

Anisoara Nica, Reza Sherkat, Mihnea Andrei, Xun Chen, Martin Heidel, Christian Bensberg, and Heiko Gerwens. 2017. Statisticum: Data Statistics Management in SAP HANA. Proceedings of the VLDB Endowment (PVLDB) 10, 12 (2017), Daniel Lindner, Daniel Ritter, and Felix Naumann 1658–1669

work page 2017
[50]

O’Neil, Elizabeth J

Patrick E. O’Neil, Elizabeth J. O’Neil, and Xuedong Chen. 2009. Star Schema Benchmark. Standard Specification Revision 3. https://www.cs.umb.edu/~poneil/ StarSchemaB.PDF (accessed April 9, 2024)

work page 2009
[51]

O’Neil, Elizabeth J

Patrick E. O’Neil, Elizabeth J. O’Neil, Xuedong Chen, and Stephen Revilak. 2009. The Star Schema Benchmark and Augmented Fact Table Indexing. InProceedings of the TPC Technology Conference (TPCTC) . 237–252

work page 2009
[52]

Oracle. [n. d.]. MySQL 8.0 Reference Manual – Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations . https://dev.mysql.com/doc/refman/8. 0/en/semijoins.html (accessed April 9, 2024)

work page 2024
[53]

Orr, Srikanth Kandula, and Surajit Chaudhuri

Laurel J. Orr, Srikanth Kandula, and Surajit Chaudhuri. 2019. Pushing Data- Induced Predicates Through Joins in Big-Data Clusters. Proceedings of the VLDB Endowment (PVLDB) 13, 3 (2019), 252–265

work page 2019
[54]

Thorsten Papenbrock, Jens Ehrlich, Jannik Marten, Tommy Neubert, Jan-Peer Rudolph, Martin Schönberg, Jakob Zwiener, and Felix Naumann. 2015. Func- tional Dependency Discovery: An Experimental Evaluation of Seven Algorithms. Proceedings of the VLDB Endowment (PVLDB) 8, 10 (2015), 1082–1093

work page 2015
[55]

Thorsten Papenbrock and Felix Naumann. 2017. A Hybrid Approach for Effi- cient Unique Column Combination Discovery. In Proceedings of the Conference Datenbanksysteme in Business, Technologie und Web Technik (BTW) . 195–204

work page 2017
[56]

Mowry, Matthew Perron, Ian Quah, Siddharth San- turkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang

Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth San- turkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self-Driving Database Management Sys- tems. In Proceedings of the Conference on Innovative Data Syste...

work page 2017
[57]

Eduardo H. M. Pena, Erik Falk, Jorge Augusto Meira, and Eduardo Cunha de Almeida. 2018. Mind Your Dependencies for Semantic Query Optimization. J. Inf. Data Manag. 9, 1 (2018), 3–19

work page 2018
[58]

Mark Raasveldt and Hannes Mühleisen. 2019. DuckDB: an Embeddable Analytical Database. In Proceedings of the International Conference on Management of Data (SIGMOD). 1981–1984

work page 2019
[59]

Attaluri, Ronald Barber, Naresh Chainani, David Kalmuk, Vincent KulandaiSamy, Jens Leenstra, Sam Lightstone, Shaorong Liu, Guy M

Vijayshankar Raman, Gopi K. Attaluri, Ronald Barber, Naresh Chainani, David Kalmuk, Vincent KulandaiSamy, Jens Leenstra, Sam Lightstone, Shaorong Liu, Guy M. Lohman, Tim Malkemus, René Müller, Ippokratis Pandis, Berni Schiefer, David Sharpe, Richard Sidle, Adam J. Storm, and Liping Zhang. 2013. DB2 with BLU Acceleration: So Much More than Just a Column St...

work page 2013
[60]

Aref, Ahmed K

El Kindi Rezig, Mourad Ouzzani, Walid G. Aref, Ahmed K. Elmagarmid, Ahmed R. Mahmood, and Michael Stonebraker. 2021. Horizon: Scalable Dependency-driven Data Cleaning. Proceedings of the VLDB Endowment (PVLDB) 14, 11 (2021), 2546– 2554

work page 2021
[61]

Philipp Schirmer, Thorsten Papenbrock, Sebastian Kruse, Felix Naumann, Dennis Hempfing, Torben Mayer, and Daniel Neuschäfer-Rube. 2019. DynFD: Functional Dependency Discovery in Dynamic Datasets. In Proceedings of the International Conference on Extending Database Technology (EDBT) . 253–264

work page 2019
[62]

Shashi Shekhar, Babak Hamidzadeh, Ashim Kohli, and Mark Coyle. 1993. Learn- ing Transformation Rules for Semantic Query Optimization: A Data-Driven Approach. IEEE Transactions on Knowledge and Data Engineering (TKDE) 5, 6 (1993), 950–964

work page 1993
[63]

Siegel, Edward Sciore, and Sharon C

Michael D. Siegel, Edward Sciore, and Sharon C. Salveter. 1992. A Method for Automatic Rule Derivation to Support Semantic Query Optimization. ACM Transactions on Database Systems (TODS) 17, 4 (1992), 563–600

work page 1992
[64]

Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C

Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, and Rhonda Baldwin. 2014. Orca: a modular query optimizer architecture for big data. In Proceedings of the Interna...

work page 2014
[65]

Jaroslaw Szlichta, Parke Godfrey, and Jarek Gryz. 2012. Fundamentals of Order Dependencies. Proceedings of the VLDB Endowment (PVLDB) 5, 11 (2012), 1220– 1231

work page 2012
[66]

Jaroslaw Szlichta, Parke Godfrey, Jarek Gryz, Wenbin Ma, Przemyslaw Pawluk, and Calisto Zuzarte. 2011. Queries on dates: fast yet not blind. In Proceedings of the International Conference on Extending Database Technology (EDBT) . 497–502

work page 2011
[67]

2021.TPC Benchmark DS

Transaction Processing Performance Council. 2021.TPC Benchmark DS. Standard Specification Version 3.2.0. http://tpc.org/tpc_documents_current_versions/pdf/ tpc-ds_v3.2.0.pdf (accessed April 9, 2024)

work page 2021
[68]

Transaction Processing Performance Council. 2022. TPC Benchmark H. Standard Specification Revision 3.0.1. http://tpc.org/tpc_documents_current_versions/ pdf/tpc-h_v3.0.1.pdf (accessed April 9, 2024)

work page 2022
[69]

Jeffrey D. Ullman. 1988. Principles of Database and Knowledge-Base Systems, Volume I. Principles of computer science series, Vol. 14. Computer Science Press

work page 1988
[70]

J. Beau W. Webber. 2013. A bi-symmetric log transformation for wide-range data. Measurement Science and Technology 24, 2 (2013), 3 pages

work page 2013
[71]

Yu and Wei Sun

Clement T. Yu and Wei Sun. 1989. Automatic Knowledge Acquisition and Main- tenance for Semantic Query Optimization. IEEE Transactions on Knowledge and Data Engineering (TKDE) 1, 3 (1989), 362–375

work page 1989
[72]

Xinyu Zeng, Yulong Hui, Jiahong Shen, Andrew Pavlo, Wes McKinney, and Huanchen Zhang. 2023. An Empirical Evaluation of Columnar Storage Formats. Proceedings of the VLDB Endowment (PVLDB) 17, 2 (2023), 148–161

work page 2023
[73]

Mohamed Ziauddin, Andrew Witkowski, You Jung Kim, Janaki Lahorani, Dmitry Potapov, and Murali Krishna. 2017. Dimensions Based Data Clustering and Zone Maps. Proceedings of the VLDB Endowment (PVLDB) 10, 12 (2017), 1622–1633

