pith. sign in

arxiv: 1906.08687 · v1 · pith:2IW4POWEnew · submitted 2019-06-20 · 💻 cs.DB

A Layered Aggregate Engine for Analytics Workloads

Pith reviewed 2026-05-25 18:49 UTC · model grok-4.3

classification 💻 cs.DB
keywords aggregate computationdatabase optimizationmachine learning over databasesdata cubesquery optimizationin-memory analyticsbatch aggregatesjoin computation
0
0 comments X

The pith

LMFAO computes batches of aggregates over database joins orders of magnitude faster than commercial databases and ML systems by applying layered optimizations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LMFAO as an in-memory engine for batches of aggregates computed over the join of input database relations. Many analytics workloads, such as training linear regression models, decision trees, Bayesian networks, and constructing data cubes, reduce to these group-by aggregates. LMFAO applies successive layers of logical and code optimizations that share computation across aggregates, exploit parallelism, and generate specialized code. Benchmarks on four datasets show speedups of several orders of magnitude against a commercial database, MonetDB, TensorFlow, Scikit-learn, R, and AC/DC. This matters because it lets users perform model learning and exploration directly inside the database engine rather than exporting data to separate tools.

Core claim

LMFAO is a layered engine whose logical and code optimizations systematically share computation, parallelism, and specialization when evaluating batches of aggregates over joins; this decomposition covers ridge linear regression, classification and regression trees, Chow-Liu trees for Bayesian networks, and data cubes, and produces orders-of-magnitude speedups over both database systems and machine-learning frameworks on the tested workloads.

What carries the argument

LMFAO's layers of logical and code optimizations that exploit sharing of computation, parallelism, and code specialization for aggregate batches.

If this is right

  • Model training for linear regression, trees, and Bayesian networks can be performed directly over database joins without data export.
  • Data-cube exploration in warehousing becomes feasible at larger scales due to shared aggregate computation.
  • Multiple aggregate queries in a batch benefit from common subexpression elimination across the join.
  • In-memory execution with code specialization reduces per-aggregate overhead compared with general-purpose query engines.
  • The same layered approach applies uniformly to both relational analytics and statistical model learning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Workloads outside the four examples might still gain if their heavy steps fit the aggregate-over-join pattern.
  • Embedding LMFAO inside existing database servers could let users invoke it via standard SQL extensions.
  • The performance edge might allow interactive model selection loops that were previously too slow.
  • Similar layering of sharing and specialization could be applied to other batch query problems such as frequent itemset mining.

Load-bearing premise

The data-intensive parts of the listed analytics workloads can be decomposed into group-by aggregates over the join of the input relations.

What would settle it

A workload whose main computation cannot be expressed as group-by aggregates over joins, or a new dataset on which LMFAO shows no substantial speedup over the compared systems.

Figures

Figures reproduced from arXiv: 1906.08687 by Dan Olteanu, Hung Q. Ngo, Mahmoud Abo Khamis, Maximilian Schleich, XuanLong Nguyen.

Figure 1
Figure 1. Figure 1: ‡e Optimization Layers of LMFAO. Œe Find Roots layer is novel and a‚ects the design of all subsequent layers. By default, LMFAO computes each group-by aggregate in one boŠom-up pass over the join tree, by decomposing the aggregate into views computed along each edge in the join tree. We allow for di‚erent traversals of the join tree: di‚erent aggregates may be computed over the same join tree rooted at di‚… view at source ↗
Figure 2
Figure 2. Figure 2: Example of a regression tree. Classi€cation [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: (le ) ‡e schema for the Favorita dataset. (middle) A join tree for this schema with directional views [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Multi-output execution plan to compute Q4, Q5, and Q6 in the example of Section 3.5. Q6(item;д(item) · f (units) · α1 · α4 · α8) += VT (date,store; α8), S(item, date,store, units, promo),VH (date; α4),VI (item; α1) Join attribute order. Œe scan uses a total order on the join aŠributes of the relation S and sees S logically as a (partial) trie, grouped by the €rst join aŠribute and so on until the last join… view at source ↗
Figure 5
Figure 5. Figure 5: Performance impact of optimizations in LM [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Join Trees used in experiments for the Retailer, Favorita, Yelp and TPC-DS datasets. [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Snippet of code generated by LMFAO that shows how aggregates are stored and computed. query processing that pipelines tuples between relational operators in the execution plan for one query. Œe three IRs exploit information about the workload at compile time to specialize the generated code. We next ex￾plain some of these optimizations; further code optimizations have been already mentioned at the end of S… view at source ↗
read the original abstract

This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. The primary motivation for this work stems from the observation that for a variety of analytics over databases, their data-intensive tasks can be decomposed into group-by aggregates over the join of the input database relations. We exemplify the versatility and competitiveness of LMFAO for a handful of widely used analytics: learning ridge linear regression, classification trees, regression trees, and the structure of Bayesian networks using Chow-Liu trees; and data cubes used for exploration in data warehousing. LMFAO consists of several layers of logical and code optimizations that systematically exploit sharing of computation, parallelism, and code specialization. We conducted two types of performance benchmarks. In experiments with four datasets, LMFAO outperforms by several orders of magnitude on one hand, a commercial database system and MonetDB for computing batches of aggregates, and on the other hand, TensorFlow, Scikit, R, and AC/DC for learning a variety of models over databases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces LMFAO, an in-memory engine for batches of group-by aggregates over database joins. It claims that analytics workloads including ridge regression, decision/regression trees, Chow-Liu trees for Bayesian networks, and data cubes can be decomposed into such aggregates, allowing layered logical and code optimizations (sharing, parallelism, specialization) to deliver orders-of-magnitude speedups over commercial DBMSs, MonetDB, TensorFlow, Scikit-learn, R, and AC/DC on four datasets.

Significance. If the workload decompositions are complete and the experimental claims are substantiated, the work would offer a unified, highly optimized aggregate engine that bridges OLAP and ML over relational data, with potential for broad impact in analytics systems.

major comments (2)
  1. [Abstract] Abstract: the central claim that ridge regression, decision trees, Chow-Liu trees, and data cubes reduce exactly to batches of group-by aggregates over the input join (with no material extra work) is presented as an observation but supplies no explicit reductions, complexity arguments, or completeness proofs; this decomposition is load-bearing for the TensorFlow/Scikit/R comparisons.
  2. [Experiments] Experiments section (implied by abstract claims): the abstract reports orders-of-magnitude gains but provides no details on experimental setup, exact queries, hardware, statistical significance, or how the ML workloads were mapped to aggregates; without these, the performance claims against external baselines cannot be verified.
minor comments (1)
  1. [Abstract] Abstract: the phrasing 'on one hand ... and on the other hand' is slightly awkward and could be clarified for readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and experimental presentation. We address each major comment below and will revise the manuscript accordingly where the points identify areas for improved clarity.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that ridge regression, decision trees, Chow-Liu trees, and data cubes reduce exactly to batches of group-by aggregates over the input join (with no material extra work) is presented as an observation but supplies no explicit reductions, complexity arguments, or completeness proofs; this decomposition is load-bearing for the TensorFlow/Scikit/R comparisons.

    Authors: The full manuscript provides explicit reductions for each workload in dedicated sections (ridge regression in Section 4 via the normal equations reducing to aggregates such as SUM(x_i * x_j) and SUM(x_i * y) over the join; decision/regression trees in Section 5 via sufficient statistics for splits; Chow-Liu trees in Section 6 via mutual information aggregates; data cubes in Section 7). These are presented as direct mappings with no additional materialization beyond the aggregates. We agree the abstract presents this too concisely without referencing the sections and will revise it to briefly note the reductions and point readers to the relevant sections. Complexity arguments follow from the fact that the number of aggregates is polynomial in the number of features (independent of join size after factorization), but we will add a short summary paragraph. Full completeness proofs are not included because the reductions follow from standard statistical formulations of these models; the paper's focus is the aggregate engine rather than re-deriving ML theory. We will add a clarifying sentence in the abstract and introduction. revision: partial

  2. Referee: [Experiments] Experiments section (implied by abstract claims): the abstract reports orders-of-magnitude gains but provides no details on experimental setup, exact queries, hardware, statistical significance, or how the ML workloads were mapped to aggregates; without these, the performance claims against external baselines cannot be verified.

    Authors: The experiments section of the full manuscript describes the four datasets, the two benchmark types (aggregate batches vs. ML libraries), and reports wall-clock times. However, we acknowledge that explicit mappings from each ML workload to the precise aggregate queries, hardware specifications, and measures of statistical significance (e.g., standard deviation across runs) are not presented with sufficient granularity. We will revise the experiments section to include (1) a table listing the exact aggregate queries generated for each model on each dataset, (2) hardware details (processor, memory, OS), and (3) run-time variance where multiple executions were performed. The abstract itself is intentionally high-level per convention and will remain so, but the experiments section will be made self-contained. revision: yes

Circularity Check

0 steps flagged

No circularity; claims rest on external benchmarks and stated observation

full rationale

The paper presents LMFAO as a new engine with layered optimizations for batches of aggregates over joins, motivated by the observation that analytics workloads decompose into such aggregates. Performance claims are supported by direct comparisons to external systems (MonetDB, TensorFlow, Scikit, R, AC/DC) on four datasets rather than any internal fitted parameters or self-referential derivations. No equations, predictions, or load-bearing self-citations are shown that reduce results to inputs by construction; the decomposition is presented as a motivating observation without being derived within the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a systems and engineering paper; the abstract introduces no mathematical free parameters, domain axioms beyond standard database assumptions, or new postulated entities.

pith-pipeline@v0.9.0 · 5721 in / 1137 out tokens · 26016 ms · 2026-05-25T18:49:38.716193+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages

  1. [1]

    Mart´ın Abadi, Paul Barham, Jianmin Chen, and et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI. 265–283

  2. [2]

    Aberger, Susan Tu, Kunle Olukotun, and Christopher R´e

    Christopher R. Aberger, Susan Tu, Kunle Olukotun, and Christopher R´e. 2016. EmptyHeaded: A Relational Engine for Graph Processing. In SIGMOD. 431–446

  3. [3]

    Abiteboul, R

    S. Abiteboul, R. Hull, and V. Vianu. 1995. Foundations of Databases . Addison-Wesley. 14

  4. [4]

    Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich

    Mahmoud Abo Khamis, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich. 2018. AC/DC: In-Database Learning /T_hunderstruck. InDEEM. 8:1–8:10

  5. [5]

    Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich

    Mahmoud Abo Khamis, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich. 2018. In-Database Learning with Sparse Tensors. In PODS. 325–340

  6. [6]

    Ngo, and Atri Rudra

    Mahmoud Abo Khamis, Hung Q. Ngo, and Atri Rudra. 2016. FAQ: /Q_uestions Asked Frequently. InPODS. 13–28

  7. [7]

    S. M. Aji and R. J. McEliece. 2006. /T_he Generalized Distributive Law. IEEE Trans. Inf. /T_heor.46, 2 (2006), 325–343

  8. [8]

    Nurzhan Bakibayev, Tom´as Kocisk´y, Dan Olteanu, and Jakub Z´avodn´y

  9. [9]

    PVLDB 6, 14 (2013), 1990–2001

    Aggregation and Ordering in Factorised Databases. PVLDB 6, 14 (2013), 1990–2001

  10. [10]

    Nurzhan Bakibayev, Dan Olteanu, and Jakub Z´avodn´y. 2012. FDB: A /Q_uery Engine for Factorised Relational Databases.PVLDB 5, 11 (2012), 1232–1243

  11. [11]

    Ma/t_thias Boehm et al. 2016. SystemML: Declarative Machine Learning on Spark. PVLDB 9, 13 (2016), 1425–1436

  12. [12]

    Breiman, J

    L. Breiman, J. Friedman, R. Olshen, and C. Stone. 1984. Classi/f_ication and Regression Trees. Wadsworth and Brooks, Monterey, CA

  13. [13]

    Surajit Chaudhuri. 1998. Data Mining and Database Systems: Where is the Intersection? IEEE Data Eng. Bull. 21, 1 (1998), 4–8

  14. [14]

    Fayyad, and Jeff Bernhardt

    Surajit Chaudhuri, Usama M. Fayyad, and Jeff Bernhardt. 1999. Scal- able Classi/f_ication over SQL Databases. InICDE. 470–479

  15. [15]

    Naughton, and Jignesh M

    Lingjiao Chen, Arun Kumar, Jeffrey F. Naughton, and Jignesh M. Patel

  16. [16]

    PVLDB 10, 11 (2017), 1214–1225

    Towards Linear Algebra over Normalized Data. PVLDB 10, 11 (2017), 1214–1225

  17. [17]

    Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In KDD. 785–794

  18. [18]

    Chow and C

    C. Chow and C. Liu. 2006. Approximating Discrete Probability Distri- butions with Dependence Trees. IEEE Trans. Inf. /T_heor.14, 3 (2006), 462–467

  19. [19]

    Ev- /f_imievski, Shirish Tatikonda, Berthold Reinwald, and Prithviraj Sen

    Tarek Elgamal, Shangyu Luo, Ma/t_thias Boehm, Alexandre V. Ev- /f_imievski, Shirish Tatikonda, Berthold Reinwald, and Prithviraj Sen

  20. [20]

    SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning. In CIDR

  21. [21]

    Corporacion Favorita. 2017. Corp. Favorita Grocery Sales Forecasting: Can you accurately predict sales for a large grocery chain? (2017). h/t_tps://www.kaggle.com/c/favorita-grocery-sales-forecasting/

  22. [22]

    Xixuan Feng, Arun Kumar, Benjamin Recht, and Christopher R´e. 2012. Towards a uni/f_ied architecture for in-RDBMS analytics. InSIGMOD. 325–336

  23. [23]

    Gao, Shangyu Luo, Luis Leopoldo Perez, and Chris Jermaine

    Zekai J. Gao, Shangyu Luo, Luis Leopoldo Perez, and Chris Jermaine

  24. [24]

    In SIGMOD

    /T_he BUDS Language for Distributed Bayesian Machine Learning. In SIGMOD. 961–976

  25. [25]

    Georgios Giannikis, Darko Makreshanski, Gustavo Alonso, and Don- ald Kossmann. 2014. Shared Workload Optimization. PVLDB 7, 6 (2014), 429–440

  26. [26]

    Jim Gray, Adam Bosworth, Andrew Layman, and Hamid Pirahesh

  27. [27]

    Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. InICDE. 152–159

  28. [28]

    Ga¨el Guennebaud, Benoˆıt Jacob, et al. 2010. Eigen v3. (2010). h/t_tp: //eigen.tuxfamily.org

  29. [29]

    Venky Harinarayan, Anand Rajaraman, and Jeffrey D. Ullman. 1996. Implementing Data Cubes Efficiently. In SIGMOD. 205–216

  30. [30]

    Hellerstein, Christopher R´e, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar

    Joseph M. Hellerstein, Christopher R´e, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar. 2012. /T_he MADlib Analytics Library or MAD Skills, the SQL. PVLDB 5, 12 (2012), 1700– 1711

  31. [31]

    Sjoerd Mullender, and Martin L

    Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, K. Sjoerd Mullender, and Martin L. Kersten. 2012. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 35, 1 (2012), 40–45

  32. [32]

    Kaggle. 2018. Kaggle ML & DB Survey. (2018). h/t_tps://www.kaggle. com/kaggle/kaggle-survey-2018

  33. [33]

    Timo Kersten, Viktor Leis, Alfons Kemper, /T_homas Neumann, Andrew Pavlo, and Peter Boncz. 2018. Everything You Always Wanted to Know About Compiled and Vectorized /Q_ueries but Were Afraid to Ask. PVLDB 11, 13 (2018), 2209–2222

  34. [34]

    Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. /T_he Tensor Algebra Compiler.OOPSLA 1, Article 77 (2017), 77:1–77:29 pages

  35. [35]

    Naughton, and Jignesh M

    Arun Kumar, Jeffrey F. Naughton, and Jignesh M. Patel. 2015. Learning Generalized Linear Models Over Normalized Data. In SIGMOD. 1969– 1984

  36. [36]

    Naughton, Jignesh M

    Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, and Xiaojin Zhu

  37. [37]

    In SIGMOD

    To Join or Not to Join?: /T_hinking Twice about Joins before Feature Selection. In SIGMOD. 19–34

  38. [38]

    Gao, Michael N

    Shangyu Luo, Zekai J. Gao, Michael N. Gubanov, Luis Leopoldo Perez, and Christopher M. Jermaine. 2018. Scalable Linear Algebra on a Relational Database System. SIGMOD Rec. 47, 1 (2018), 24–31

  39. [39]

    D´aniel Marx. 2010. Approximating Fractional Hypertree Width. ACM Trans. Algorithms 6, 2, Article 29 (April 2010), 17 pages

  40. [40]

    Brendan McMahan and et al

    H. Brendan McMahan and et al. 2013. Ad Click Prediction: A View from the Trenches. In KDD. 1222–1230

  41. [41]

    Xiangrui Meng, Joseph Bradley, et al. 2016. MLlib: Machine Learning in Apache Spark. J. Mach. Learn. Res. 17, 1 (2016), 1235–1241

  42. [42]

    Jan Motl and Oliver Schulte. 2015. /T_he CTU Prague Relational Learning Repository. (2015). arXiv:cs.LG/1511.03086

  43. [43]

    Inderpal Singh Mumick, Dallan /Q_uass, and Barinderpal Singh Mumick

  44. [44]

    In SIGMOD

    Maintenance of Data Cubes and Summary Tables in a Warehouse. In SIGMOD. 100–111

  45. [45]

    Raghunath Othayoth Nambiar and Meikel Poess. 2006. /T_he Making of TPC-DS. In PVLDB. 1049–1058

  46. [46]

    /T_homas Neumann. 2011. Efficiently Compiling Efficient /Q_uery Plans for Modern Hardware. PVLDB 4, 9 (2011), 539–550

  47. [47]

    Dan Olteanu and Maximilian Schleich. 2016. Factorized Databases. SIGMOD Rec. 45, 2 (2016), 5–16

  48. [48]

    Judea Pearl. 1982. Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach. In AAAI. 133–136

  49. [49]

    Fabian Pedregosa, Ga ¨el Varoquaux, Alexandre Gramfort, and et al

  50. [50]

    Scikit-learn: Machine Learning in Python. J. Machine Learning Research 12 (2011), 2825–2830

  51. [51]

    Holger Pirk, Oscar Moll, Matei Zaharia, and Samuel Madden. 2016. Voodoo - A Vector Algebra for Portable Database Performance on Modern Hardware. PVLDB 9 (2016), 1707–1718

  52. [52]

    Chengjie Qin and Florin Rusu. 2015. Speculative Approximations for Terascale Distributed Gradient Descent Optimization. In DanaC. 1:1–1:10

  53. [53]

    R Core Team. 2013. R: A Language and Environment for Statistical Computing. R Foundation for Stat. Comp., www.r-project.org

  54. [54]

    Maximilian Schleich, Dan Olteanu, and Radu Ciucanu. 2016. Learning Linear Regression Models over Factorized Joins. In SIGMOD. 3–18

  55. [55]

    Timos K. Sellis. 1988. Multiple-query Optimization. ACM Trans. Database Syst. 13, 1 (1988), 23–52

  56. [56]

    Amir Shaikhha, Yannis Klonatos, and Christoph Koch. 2018. Build- ing Efficient /Q_uery Engines in a High-Level Language.ACM Trans. Database Syst. 43, 1, Article 4 (2018), 45 pages

  57. [57]

    Amir Shaikhha, Yannis Klonatos, Lionel Parreaux, Lewis Brown, Mo- hammad Dashti, and Christoph Koch. 2016. How to Architect a /Q_uery Compiler. In SIGMOD. 1907–1922

  58. [58]

    Spampinato and Markus P ¨uschel

    Daniele G. Spampinato and Markus P ¨uschel. 2016. A basic linear algebra compiler for structured matrices. In CGO. 117–127. 15

  59. [59]

    Tahboub, Gr´egory M

    Ruby Y. Tahboub, Gr´egory M. Essertel, and Tiark Rompf. 2018. How to Architect a /Q_uery Compiler, Revisited. InSIGMOD. 307–322

  60. [60]

    /T_he StatsModels development team. 2012. StatsModels: Statistics in Python, h/t_tp://statsmodels.sourceforge.net. (2012)

  61. [61]

    Veldhuizen

    Todd L. Veldhuizen. 2014. Triejoin: A Simple, Worst-Case Optimal Join Algorithm. In ICDT. 96–106

  62. [62]

    Abdul Wasay, Xinding Wei, Niv Dayan, and Stratos Idreos. 2017. Data Canopy: Accelerating Exploratory Statistical Analysis. In SIGMOD. 557–572

  63. [63]

    Yan and Per-˚Ake Larson

    Weipeng P. Yan and Per-˚Ake Larson. 1995. Eager Aggregation and Lazy Aggregation. In VLDB. 345–357

  64. [64]

    Yelp. 2017. Yelp Dataset Challenge. (2017). h/t_tps://www.yelp.com/ dataset/challenge/

  65. [65]

    Matei Zaharia, Mosharaf Chowdhury, et al. 2012. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Com- puting. In NSDI. 2–2

  66. [66]

    Marcin Zukowski, Mark van de Wiel, and Peter Boncz. 2012. Vector- wise: A Vectorized Analytical DBMS. In ICDE. 1349–1350. A DATASETS Figure 6 gives the join trees for the four datasets used in the experiments in Section 4. Retailer has /f_ive relations:Inventory stores the number of inventory units for each date, store, and stock keeping unit (sku); Locat...