pith. machine review for the scientific record.

arxiv: 2603.27775 · v2 · submitted 2026-03-29 · 💻 cs.DB

Recognition: 2 theorem links

· Lean Theorem

Enzyme: Incremental View Maintenance for Data Engineering


Pith reviewed 2026-05-14 21:33 UTC · model grok-4.3

classification 💻 cs.DB
keywords: incremental view maintenance · materialized views · Apache Spark · data pipelines · cost-based optimization · ETL · declarative pipelines · view refresh

The pith

Enzyme automates incremental refresh of materialized views in Spark pipelines through cost-based strategy selection, delivering a cumulative compute reduction of billions of CPU-seconds per day.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Enzyme is an incremental view maintenance engine built for Apache Spark Declarative Pipelines that treats materialized views as first-class building blocks for ETL and analytical workloads. Instead of requiring users to hand-tune maintenance, it adds a cost-based optimization layer that automatically chooses refresh strategies for collections of views and exploits batching across their sources. The system keeps views consistent as the underlying data changes while reducing the need for manual intervention in high-throughput settings. Validation across thousands of production pipelines spanning diverse domains shows efficiency gains large enough to lower overall compute costs.
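To make "keeps views consistent as the underlying data changes" concrete, here is a minimal sketch of textbook delta-based IVM for a grouped SUM/COUNT view, the kind of rule that classic IVM work formalizes. It is not Enzyme's implementation: the `(key, amount)` row layout and the function names are hypothetical, and deletes are assumed to reference rows that actually exist in the base table.

```python
def full_refresh(rows):
    """Recompute SUM(amount) and COUNT(*) per key by scanning every base row."""
    view = {}
    for key, amount in rows:
        total, count = view.get(key, (0, 0))
        view[key] = (total + amount, count + 1)
    return view


def apply_delta(view, inserted, deleted):
    """Maintain the same view from the inserted and deleted rows only."""
    new_view = dict(view)  # leave the previous snapshot untouched
    for key, amount in inserted:
        total, count = new_view.get(key, (0, 0))
        new_view[key] = (total + amount, count + 1)
    for key, amount in deleted:
        total, count = new_view[key]  # assumes the deleted row exists in the base table
        if count == 1:
            del new_view[key]  # last row of the group is gone, so the group disappears
        else:
            new_view[key] = (total - amount, count - 1)
    return new_view


base = [("a", 10), ("b", 5), ("a", 3)]
view = full_refresh(base)  # {"a": (13, 2), "b": (5, 1)}
inserted = [("b", 7), ("c", 1)]
deleted = [("a", 10)]
updated = apply_delta(view, inserted, deleted)
after = [("b", 5), ("a", 3), ("b", 7), ("c", 1)]
assert updated == full_refresh(after)  # incremental result matches recomputation
```

The point of the contrast is that `apply_delta` touches only the changed rows while `full_refresh` rereads everything; a cost-based planner is, in effect, deciding per view which of the two is cheaper.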

Core claim

Enzyme provides a built-in, end-to-end incremental view maintenance approach for Spark by layering a cost-based optimizer on top of Spark primitives; the optimizer selects refresh strategies for pipelines of materialized views, incorporates batching optimizations, and is designed to generalize across data sources, with empirical results showing substantial performance gains at scale.

What carries the argument

The cost-based optimization layer that selects and plans refresh strategies for collections of materialized views organized into pipelines, while exploiting cross-source batching opportunities.
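One way to picture cross-source batching is as deduplicating scans: when several views read the same source, its changes can be read once and shared. The sketch below assumes a hypothetical pipeline described as a map from view name to the set of sources it reads; the view names are illustrative, and the scan count is a stand-in for real I/O cost, not Enzyme's cost model.

```python
from collections import defaultdict


def plan_batched_scans(view_sources):
    """Map each source to the views that read it, so one scan of the source's
    changes can feed every dependent view."""
    readers = defaultdict(list)
    for view, sources in view_sources.items():
        for src in sources:
            readers[src].append(view)
    return {src: sorted(views) for src, views in readers.items()}


def scan_counts(view_sources):
    """Return (scans if each view reads its own sources, scans when batched)."""
    unbatched = sum(len(sources) for sources in view_sources.values())
    batched = len({src for sources in view_sources.values() for src in sources})
    return unbatched, batched


# Hypothetical pipeline: view name -> sources it reads.
views = {
    "daily_sales": {"orders"},
    "customer_ltv": {"orders", "customers"},
    "active_customers": {"customers"},
}
print(scan_counts(views))  # (4, 2)
print(plan_batched_scans(views))
```

For these three views over `orders` and `customers`, the unbatched plan performs four scans and the batched plan two.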

If this is right

  • Users can focus on business logic rather than on materialized view mechanics in declarative pipelines.
  • Total cost of ownership for data engineering workloads decreases through automated and efficient maintenance.
  • Performance scales on standard benchmarks and large production deployments via batching and optimization.
  • Modular architecture supports extension to additional data sources and query engines beyond current Spark usage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the optimizer generalizes reliably, the same automation pattern could reduce manual tuning in other data processing systems that rely on materialized views.
  • The demonstrated compute reductions open the possibility of running more frequent or real-time updates in environments where resources were previously a constraint.
  • Broader adoption might shift ETL design toward treating incremental maintenance as a default rather than an advanced feature.

Load-bearing premise

The cost-based optimizer can reliably pick correct and efficient refresh strategies for arbitrary view collections and data sources without adding unacceptable overhead or correctness risks.

What would settle it

A production pipeline in which the optimizer-chosen refresh strategy uses more compute than a manually tuned alternative or produces inconsistent view results would disprove the central efficiency and reliability claims.
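The test described above can be stated as two checks on a single pipeline. This sketch is hypothetical: it assumes the compute figures are measured CPU-seconds for the same refresh under both strategies, and it compares view contents as multisets (bags), since materialized views may contain duplicate rows.

```python
from collections import Counter


def check_pipeline(optimizer_cpu_s, manual_cpu_s, incremental_rows, recomputed_rows):
    """Evaluate the two failure conditions for one pipeline refresh.

    optimizer_cpu_s / manual_cpu_s: measured CPU-seconds for the same refresh
    under the optimizer-chosen strategy and a hand-tuned alternative.
    incremental_rows / recomputed_rows: view contents after incremental
    maintenance and after a from-scratch recomputation.
    """
    return {
        "uses_more_compute": optimizer_cpu_s > manual_cpu_s,
        # Compare as bags: row order is irrelevant, duplicate counts are not.
        "inconsistent": Counter(incremental_rows) != Counter(recomputed_rows),
    }


print(check_pipeline(80.0, 100.0, [("a", 1), ("b", 2)], [("b", 2), ("a", 1)]))
# {'uses_more_compute': False, 'inconsistent': False}
```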

Figures

Figures reproduced from arXiv: 2603.27775 by Bilal Aslam, Indrajit Roy, Jeffrey Helt, Manuel Ung, Melody Hu, Michael Armbrust, Min Yang, Paul Lappas, Ritwik Yadav, Ross Bunker, Shrikanth Shankar, Sourav Chatterji, Supun Abeysinghe, Tahir Fayyaz, Tom van Bussel, William Wei, Yannis Papakonstantinou, Yiming Yang, Yuhong Chen.

Figure 1: Medallion architecture organizing data into pro…
Figure 3: Simplified query plan for the MV query in Figure 2.
Figure 5: Enzyme transforms a query into an incremental…
Figure 6: Enzyme selects between incremental and full re…
Figure 7: Dependency Graph of MVs in TPC-DI
Figure 8: Enzyme performance overview on TPC-DI benchmark. Note that the y-axis is log scale.
Figure 9: Enzyme performance compared to a leading cloud vendor. Note that the y-axis is log scale.
Figure 10: Autoscaling takes advantage of smoother task…
Original abstract

Materialized views are a core construct in database systems, used to accelerate analytical queries and optimize batch pipelines for extract-transform-load (ETL) workflows. Maintaining view consistency as underlying data evolves is a fundamental challenge, especially in high-throughput and real-time settings. Incremental view maintenance (IVM) has been studied for decades and continues to attract significant investment from major database vendors. However, most industrial systems either offer limited SQL-operator coverage or require users to hand-tune refresh strategies. This paper presents Enzyme, an IVM engine developed at Databricks to power Spark Declarative Pipelines. It provides a built-in, end-to-end approach to incremental pipelines, utilizing materialized views as first-class building blocks. By automating refresh planning, Enzyme reduces total cost of ownership and lets users focus on business logic rather than MV mechanics. Validation across thousands of large-scale production pipelines spanning diverse application domains has demonstrated substantial computational efficiency gains, yielding a cumulative daily compute reduction of billions of CPU seconds. Built atop Apache Spark primitives, Enzyme adds a cost-based optimization layer that selects refresh strategies for collections of materialized views organized into pipelines. Enzyme's modular architecture is designed to generalize across data sources and query engines. We present key design decisions for incremental refresh planning and execution, including optimizations that exploit batching opportunities across materialized view sources. Experimental results on standard benchmarks demonstrate significant performance improvements at scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents Enzyme, an incremental view maintenance (IVM) engine built for Apache Spark Declarative Pipelines at Databricks. It treats materialized views as first-class constructs in ETL pipelines, automates refresh planning via a cost-based optimizer that selects strategies and exploits batching across views, and claims to reduce total cost of ownership by eliminating manual tuning. The central empirical claim is that deployment across thousands of large-scale production pipelines has yielded a cumulative compute reduction of billions of CPU-seconds per day, with additional support from experiments on standard benchmarks.

Significance. If the production-scale claims are substantiated with reproducible methodology, Enzyme would constitute a meaningful systems contribution by demonstrating practical, generalizable IVM at the scale of modern Spark workloads. The emphasis on modular architecture and automated strategy selection addresses a long-standing gap between academic IVM research and industrial ETL practice. However, the text available for review supplies no concrete evidence (benchmarks, baselines, or verification procedures) that would allow the field to assess whether the reported gains are attributable to the described optimizer rather than to workload-specific factors.

major comments (2)
  1. [Abstract] The headline claim of 'cumulative daily compute reduction of billions of CPU seconds' across thousands of production pipelines is presented without any description of the evaluation methodology, baseline systems, error bars, or correctness verification procedure. This omission makes it impossible to determine whether the savings result from Enzyme's cost-based refresh planner or from unrelated Spark improvements.
  2. [Design and Optimization sections (referenced in Abstract)] The cost-based optimization layer is described as selecting refresh strategies and exploiting batching opportunities, yet the manuscript provides no specification of the cost model, the search space over refresh plans, or how interdependencies among materialized views are handled. Without these details, the central architectural claim cannot be evaluated for overhead or correctness on arbitrary MV collections.
minor comments (1)
  1. [Abstract] The abstract states that 'experimental results on standard benchmarks demonstrate significant performance improvements at scale' but does not name the benchmarks or report quantitative results; these should be added with tables or figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for clearer methodological details. We will revise the manuscript to strengthen the abstract and expand the description of the cost-based optimizer. Our point-by-point responses follow.

Point-by-point responses
  1. Referee: [Abstract] The headline claim of 'cumulative daily compute reduction of billions of CPU seconds' across thousands of production pipelines is presented without any description of the evaluation methodology, baseline systems, error bars, or correctness verification procedure. This omission makes it impossible to determine whether the savings result from Enzyme's cost-based refresh planner or from unrelated Spark improvements.

    Authors: We agree that the abstract would benefit from a concise description of the evaluation approach. The reported savings come from production A/B deployments: for each pipeline we measured daily CPU-seconds under the prior manual refresh regime versus after Enzyme was enabled, using the same Spark version and data volumes. Correctness was verified by comparing view contents and downstream query results before and after each refresh. We will revise the abstract to state this high-level methodology and point to Section 5 for benchmark details on standard datasets. Because the production data are proprietary, we report only aggregated statistics rather than per-pipeline error bars. revision: yes
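The A/B accounting described in this response can be written as a sum over pipelines. The symbols are ours, not the paper's: P is the set of pipelines compared, and C_p is pipeline p's daily CPU-seconds under each regime. The second line only gives a sense of scale for the headline figure: a saving of 10^9 CPU-seconds per day corresponds to roughly 11,600 CPU cores kept busy around the clock.

```latex
\begin{aligned}
\Delta C &= \sum_{p \in P} \left( C_p^{\mathrm{manual}} - C_p^{\mathrm{Enzyme}} \right) \\
\frac{10^{9}\ \text{CPU-s/day}}{86\,400\ \text{s/day}} &\approx 1.16 \times 10^{4}\ \text{cores}
\end{aligned}
```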

  2. Referee: [Design and Optimization sections (referenced in Abstract)] The cost-based optimization layer is described as selecting refresh strategies and exploiting batching opportunities, yet the manuscript provides no specification of the cost model, the search space over refresh plans, or how interdependencies among materialized views are handled. Without these details, the central architectural claim cannot be evaluated for overhead or correctness on arbitrary MV collections.

    Authors: We acknowledge that the current text could make the cost model and search procedure more explicit. The optimizer estimates refresh cost from Spark statistics on data volume, predicate selectivity, and update delta size; batching savings are modeled as a reduction in scan overhead when multiple views share source partitions. The search enumerates per-view choices (incremental versus full refresh) subject to the pipeline DAG and applies a dynamic-programming pass to select globally consistent plans. We will add the cost-model equations, a short pseudocode listing of the planner, and an explicit statement of how DAG dependencies are respected in the revised Design section. revision: yes
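The planner described in this response can be approximated by a much simpler procedure, sketched here purely for illustration. It walks the view DAG in topological order and compares a toy full-refresh cost (rows read from all inputs) with a toy incremental cost (changed input rows times an overhead factor). It then propagates each view's estimated output change to its consumers. Every name, the `incr_overhead` factor, and the proportional delta-propagation rule are assumptions. This is a greedy pass, not the dynamic-programming search the authors describe, and it does not model batching.

```python
from graphlib import TopologicalSorter


def choose_strategies(deps, base_size, base_delta, view_size, incr_overhead=1.5):
    """deps: view -> upstream names (base tables or other views).
    base_size / base_delta: total rows / changed rows per base table.
    view_size: estimated output rows per view.
    Returns view -> (strategy, toy cost)."""
    size = dict(base_size)    # rows per relation
    delta = dict(base_delta)  # changed rows per relation in this refresh
    plan = {}
    for node in TopologicalSorter(deps).static_order():  # upstream before downstream
        if node not in deps:  # base table: nothing to plan
            continue
        inputs = deps[node]
        full_cost = sum(size[i] for i in inputs)
        changed = sum(delta[i] for i in inputs)
        incr_cost = incr_overhead * changed
        if incr_cost < full_cost:
            plan[node] = ("incremental", incr_cost)
            # Hypothetical rule: output changes in proportion to input change.
            delta[node] = round(view_size[node] * min(1.0, changed / max(1, full_cost)))
        else:
            plan[node] = ("full", full_cost)
            delta[node] = view_size[node]  # full rewrite: consumers see every row as changed
        size[node] = view_size[node]
    return plan


# Hypothetical pipeline loosely shaped like a medallion layout.
deps = {
    "silver_orders": ["orders"],
    "dim_customers": ["customers"],
    "gold_daily": ["silver_orders"],
    "gold_ltv": ["silver_orders", "dim_customers"],
}
base_size = {"orders": 1_000_000, "customers": 50_000}
base_delta = {"orders": 10_000, "customers": 50_000}  # every customer row changed
view_size = {"silver_orders": 900_000, "dim_customers": 50_000,
             "gold_daily": 365, "gold_ltv": 50_000}

plan = choose_strategies(deps, base_size, base_delta, view_size)
for view, (strategy, cost) in sorted(plan.items()):
    print(f"{view}: {strategy} (cost {cost:,.0f})")
```

The propagation step is what captures interdependencies: because `dim_customers` is fully recomputed, all of its rows count as changed, which inflates the incremental cost of `gold_ltv` downstream (here not enough to flip its choice).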

Circularity Check

0 steps flagged

No circularity: empirical production claims rest on external measurements without derivations or self-referential reductions

full rationale

The paper presents Enzyme as an IVM system with a cost-based optimizer for refresh planning and batching, validated via production runs and benchmarks. No equations, fitted parameters, or derivation steps appear in the provided text. Central efficiency claims (billions of CPU-second reductions) are attributed to observed outcomes across thousands of pipelines rather than any model that reduces to its own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The architecture description and empirical results form a self-contained systems contribution without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

No free parameters, new axioms, or invented entities are introduced; the work is a systems implementation relying on standard database assumptions about view consistency and data evolution.

axioms (1)
  • domain assumption: Standard assumptions on view consistency and incremental update semantics from prior IVM literature
    The system builds on decades of IVM research without stating new foundational axioms.

pith-pipeline@v0.9.0 · 5613 in / 1084 out tokens · 36375 ms · 2026-05-14T21:33:22.160663+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
