arxiv: 2603.27775 · v2 · submitted 2026-03-29 · 💻 cs.DB

Recognition: 2 theorem links


Enzyme: Incremental View Maintenance for Data Engineering

Ritwik Yadav, Supun Abeysinghe, Min Yang, Jeffrey Helt, Manuel Ung, Yuhong Chen, Melody Hu, William Wei, Yiming Yang, Tom van Bussel, Sourav Chatterji, Indrajit Roy, Paul Lappas, Yannis Papakonstantinou, Tahir Fayyaz, Bilal Aslam, Ross Bunker, Michael Armbrust, Shrikanth Shankar


Pith reviewed 2026-05-14 21:33 UTC · model grok-4.3

classification 💻 cs.DB

keywords: incremental view maintenance, materialized views, Apache Spark, data pipelines, cost-based optimization, ETL, declarative pipelines, view refresh


The pith

Enzyme automates incremental refresh of materialized views in Spark pipelines through cost-based strategy selection, delivering billions of daily CPU-second savings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Enzyme is an incremental view maintenance engine built for Apache Spark Declarative Pipelines that treats materialized views as first-class building blocks for ETL and analytical workloads. It adds a cost-based optimization layer that automatically chooses refresh strategies for collections of views, exploiting batching across sources instead of requiring users to hand-tune maintenance. The system maintains consistency as underlying data changes while reducing the need for manual intervention in high-throughput settings. Production validation across thousands of diverse pipelines shows large efficiency gains that lower overall compute costs.
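
To make the idea of incremental refresh concrete, here is a minimal sketch of textbook incremental maintenance for a grouped SUM view. It is not Enzyme's implementation; the function names, the `(key, amount, sign)` change-set layout, and the sample rows are all assumptions chosen for illustration. The view keeps a per-group row count next to the sum so that a group can be dropped once its last row is deleted.

```python
from collections import defaultdict


def full_refresh(rows):
    """Recompute the view from scratch: per-key (sum, count)."""
    view = defaultdict(lambda: [0, 0])
    for key, amount in rows:
        view[key][0] += amount
        view[key][1] += 1
    return {k: tuple(v) for k, v in view.items()}


def incremental_refresh(view, delta):
    """Apply a change set to an existing view without rescanning the base data.

    `delta` is a list of (key, amount, sign) tuples, with sign +1 for an
    inserted row and -1 for a deleted row (hypothetical layout).
    """
    updated = {k: list(v) for k, v in view.items()}
    for key, amount, sign in delta:
        total, count = updated.get(key, [0, 0])
        total += sign * amount
        count += sign
        if count == 0:
            # The last row of this group was deleted, so the group disappears.
            updated.pop(key, None)
        else:
            updated[key] = [total, count]
    return {k: tuple(v) for k, v in updated.items()}


# Illustrative data: the base table before and after one batch of changes.
base = [("eu", 10), ("eu", 5), ("us", 7)]
delta = [("us", 3, +1), ("eu", 5, -1), ("apac", 4, +1)]
new_base = [("eu", 10), ("us", 7), ("us", 3), ("apac", 4)]

view = full_refresh(base)
refreshed = incremental_refresh(view, delta)
```

The incremental path touches three delta rows instead of rescanning the whole base table, and `refreshed` equals `full_refresh(new_base)`, which is the consistency property an IVM engine has to preserve.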

Core claim

Enzyme provides a built-in end-to-end incremental view maintenance approach for Spark by layering a cost-based optimizer on top of Spark primitives; the optimizer selects refresh strategies for pipelines of materialized views, incorporates batching optimizations, and generalizes across data sources, with empirical results confirming substantial performance gains at scale.

What carries the argument

The cost-based optimization layer that selects and plans refresh strategies for collections of materialized views organized into pipelines, while exploiting cross-source batching opportunities.
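
One way to picture cross-source batching is to invert the view-to-source mapping so that each source's change set is read once and shared by every view that consumes it. The sketch below only counts scans under that assumption; the view and table names are hypothetical, and how Enzyme actually batches work is not specified in the text reproduced here.

```python
from collections import defaultdict


def plan_batched_scans(view_sources):
    """Invert {view: [sources]} into {source: [views that read it]}."""
    consumers = defaultdict(list)
    for view, sources in view_sources.items():
        for source in sources:
            consumers[source].append(view)
    return {source: sorted(views) for source, views in consumers.items()}


def scan_counts(view_sources):
    """Return (scans without batching, scans with one shared scan per source)."""
    unbatched = sum(len(sources) for sources in view_sources.values())
    batched = len(plan_batched_scans(view_sources))
    return unbatched, batched


# Hypothetical pipeline: three views over two base tables.
pipeline = {
    "daily_sales": ["orders"],
    "top_customers": ["orders", "customers"],
    "churn": ["customers"],
}
batches = plan_batched_scans(pipeline)
counts = scan_counts(pipeline)
```

In this toy pipeline the four per-view source reads collapse into two shared reads, one per base table.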

If this is right

Users focus on business logic rather than materialized view mechanics in declarative pipelines.
Total cost of ownership for data engineering workloads decreases through automated and efficient maintenance.
Performance scales on standard benchmarks and large production deployments via batching and optimization.
Modular architecture supports extension to additional data sources and query engines beyond current Spark usage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the optimizer generalizes reliably, the same automation pattern could reduce manual tuning in other data processing systems that rely on materialized views.
The demonstrated compute reductions open the possibility of running more frequent or real-time updates in environments where resources were previously a constraint.
Broader adoption might shift ETL design toward treating incremental maintenance as a default rather than an advanced feature.

Load-bearing premise

The cost-based optimizer can reliably pick correct and efficient refresh strategies for arbitrary view collections and data sources without adding unacceptable overhead or correctness risks.

What would settle it

A production pipeline in which the optimizer-chosen refresh strategy uses more compute than a manually tuned alternative or produces inconsistent view results would disprove the central efficiency and reliability claims.

Figures

Figures reproduced from arXiv: 2603.27775 by Bilal Aslam, Indrajit Roy, Jeffrey Helt, Manuel Ung, Melody Hu, Michael Armbrust, Min Yang, Paul Lappas, Ritwik Yadav, Ross Bunker, Shrikanth Shankar, Sourav Chatterji, Supun Abeysinghe, Tahir Fayyaz, Tom van Bussel, William Wei, Yannis Papakonstantinou, Yiming Yang, Yuhong Chen.

**Figure 3.** Simplified query plan for the MV query in Figure 2.

**Figure 5.** Enzyme transforms a query into an incremental…

**Figure 6.** Enzyme selects between incremental and full re…

**Figure 7.** Dependency Graph of MVs in TPC-DI

**Figure 8.** Enzyme performance overview on TPC-DI benchmark. Note that the y-axis is log scale.

**Figure 9.** Enzyme performance compared to a leading cloud vendor. Note that the y-axis is log scale.

**Figure 10.** Autoscaling takes advantage of smoother task…

Original abstract

Materialized views are a core construct in database systems, used to accelerate analytical queries and optimize batch pipelines for extract-transform-load (ETL) workflows. Maintaining view consistency as underlying data evolves is a fundamental challenge, especially in high-throughput and real-time settings. Incremental view maintenance (IVM) has been studied for decades and continues to attract significant investment from major database vendors. However, most industrial systems either offer limited SQL-operator coverage or require users to hand-tune refresh strategies. This paper presents Enzyme, an IVM engine developed at Databricks to power Spark Declarative Pipelines. It provides a built-in, end-to-end approach to incremental pipelines, utilizing materialized views as first-class building blocks. By automating refresh planning, Enzyme reduces total cost of ownership and lets users focus on business logic rather than MV mechanics. Validation across thousands of large-scale production pipelines spanning diverse application domains has demonstrated substantial computational efficiency gains, yielding a cumulative daily compute reduction of billions of CPU seconds. Built atop Apache Spark primitives, Enzyme adds a cost-based optimization layer that selects refresh strategies for collections of materialized views organized into pipelines. Enzyme's modular architecture is designed to generalize across data sources and query engines. We present key design decisions for incremental refresh planning and execution, including optimizations that exploit batching opportunities across materialized view sources. Experimental results on standard benchmarks demonstrate significant performance improvements at scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Enzyme is a practical IVM system for Spark Declarative Pipelines that automates refresh planning and claims large production savings, but the paper gives almost no details on the cost model or how the gains were measured.


Enzyme adds an automated IVM layer on top of Spark for Databricks' Declarative Pipelines. It treats materialized views as first-class objects, picks refresh strategies with a cost-based planner, and tries to batch work across views. That is the core new piece: an end-to-end industrial implementation that removes the hand-tuning users normally have to do for incremental pipelines at scale. The modular design and focus on Spark primitives are sensible choices for a system meant to run on real workloads rather than micro-benchmarks.

Referee Report

2 major / 1 minor

Summary. The paper presents Enzyme, an incremental view maintenance (IVM) engine built for Apache Spark Declarative Pipelines at Databricks. It treats materialized views as first-class constructs in ETL pipelines, automates refresh planning via a cost-based optimizer that selects strategies and exploits batching across views, and claims to reduce total cost of ownership by eliminating manual tuning. The central empirical claim is that deployment across thousands of large-scale production pipelines has yielded billions of daily CPU-second reductions, with additional support from experiments on standard benchmarks.

Significance. If the production-scale claims are substantiated with reproducible methodology, Enzyme would constitute a meaningful systems contribution by demonstrating practical, generalizable IVM at the scale of modern Spark workloads. The emphasis on modular architecture and automated strategy selection addresses a long-standing gap between academic IVM research and industrial ETL practice. However, the current manuscript supplies no concrete evidence (benchmarks, baselines, or verification procedures) that would allow the field to assess whether the reported gains are attributable to the described optimizer rather than workload-specific factors.

major comments (2)

[Abstract] The headline claim of 'cumulative daily compute reduction of billions of CPU seconds' across thousands of production pipelines is presented without any description of the evaluation methodology, baseline systems, error bars, or correctness verification procedure. This omission makes it impossible to determine whether the savings result from Enzyme's cost-based refresh planner or from unrelated Spark improvements.
[Design and Optimization sections (referenced in Abstract)] The cost-based optimization layer is described as selecting refresh strategies and exploiting batching opportunities, yet the manuscript provides no specification of the cost model, the search space over refresh plans, or how interdependencies among materialized views are handled. Without these details, the central architectural claim cannot be evaluated for overhead or correctness on arbitrary MV collections.

minor comments (1)

[Abstract] The abstract states that 'experimental results on standard benchmarks demonstrate significant performance improvements at scale' but does not name the benchmarks or report quantitative results; these should be added with tables or figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for clearer methodological details. We will revise the manuscript to strengthen the abstract and expand the description of the cost-based optimizer. Our point-by-point responses follow.


Referee: [Abstract] The headline claim of 'cumulative daily compute reduction of billions of CPU seconds' across thousands of production pipelines is presented without any description of the evaluation methodology, baseline systems, error bars, or correctness verification procedure. This omission makes it impossible to determine whether the savings result from Enzyme's cost-based refresh planner or from unrelated Spark improvements.

Authors: We agree that the abstract would benefit from a concise description of the evaluation approach. The reported savings come from production A/B deployments: for each pipeline we measured daily CPU-seconds under the prior manual refresh regime versus after Enzyme was enabled, using the same Spark version and data volumes. Correctness was verified by comparing view contents and downstream query results before and after each refresh. We will revise the abstract to state this high-level methodology and point to Section 5 for benchmark details on standard datasets. Because the production data are proprietary, we report only aggregated statistics rather than per-pipeline error bars.

Revision: yes
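
The two checks described in this response can be sketched as follows. This is an illustrative reading, not the authors' harness: `same_contents` compares two view snapshots as multisets of rows (order-insensitive but duplicate-aware), and `aggregate_savings` sums per-pipeline CPU-second differences. The function names, the row layout, and every number below are made up for the example.

```python
from collections import Counter


def same_contents(view_a, view_b):
    """True if both views hold the same rows with the same multiplicities."""
    return Counter(view_a) == Counter(view_b)


def aggregate_savings(measurements):
    """Sum CPU-second savings across pipelines.

    `measurements` maps a pipeline name to (baseline_cpu_s, candidate_cpu_s).
    A pipeline that got slower contributes a negative amount.
    """
    return sum(before - after for before, after in measurements.values())


# Hypothetical snapshots of one view produced by two refresh strategies.
incremental_result = [("eu", 10), ("us", 10), ("us", 10)]
full_result = [("us", 10), ("eu", 10), ("us", 10)]
consistent = same_contents(incremental_result, full_result)

# Hypothetical daily CPU-seconds per pipeline (before, after).
measurements = {"pipeline_a": (1200.0, 300.0), "pipeline_b": (800.0, 850.0)}
total_saved = aggregate_savings(measurements)
```

Comparing multisets rather than sets matters for views that can contain duplicate rows, and reporting only the aggregate hides regressions such as `pipeline_b` above, which is one reason the referee asks for per-pipeline detail.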
Referee: [Design and Optimization sections (referenced in Abstract)] The cost-based optimization layer is described as selecting refresh strategies and exploiting batching opportunities, yet the manuscript provides no specification of the cost model, the search space over refresh plans, or how interdependencies among materialized views are handled. Without these details, the central architectural claim cannot be evaluated for overhead or correctness on arbitrary MV collections.

Authors: We acknowledge that the current text could make the cost model and search procedure more explicit. The optimizer estimates refresh cost from Spark statistics on data volume, predicate selectivity, and update delta size; batching savings are modeled as a reduction in scan overhead when multiple views share source partitions. The search enumerates per-view choices (incremental versus full refresh) subject to the pipeline DAG and applies a dynamic-programming pass to select globally consistent plans. We will add the cost-model equations, a short pseudocode listing of the planner, and an explicit statement of how DAG dependencies are respected in the revised Design section.

Revision: yes
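
A toy version of the per-view choice described in this response might look like the sketch below. It is a greedy pass in topological order, a simpler stand-in for the dynamic-programming pass the authors mention, and its cost formulas are assumptions: a full refresh costs the total size of a view's inputs, an incremental refresh costs the changed rows in those inputs times a hypothetical overhead factor `incr_factor`, and a view that was fully recomputed is treated as entirely changed for its downstream consumers. All names and numbers are illustrative.

```python
from graphlib import TopologicalSorter


def plan_refresh(deps, size, delta, incr_factor=3.0):
    """Choose 'incremental' or 'full' for each view, in dependency order.

    deps:  {view: [inputs]}, where inputs are base tables or other views.
    size:  {name: estimated row count} for every table and view.
    delta: {table: estimated changed rows} for base tables.
    incr_factor: assumed per-row overhead of delta processing vs a plain scan.
    """
    changed = dict(delta)  # estimated changed rows per table or view
    plan = {}
    for name in TopologicalSorter(deps).static_order():
        if name not in deps:  # base table: nothing to refresh
            continue
        inputs = deps[name]
        input_changes = sum(changed.get(i, 0) for i in inputs)
        full_cost = sum(size[i] for i in inputs)
        incr_cost = incr_factor * input_changes
        if incr_cost < full_cost:
            plan[name] = "incremental"
            # Crude assumption: output changes are bounded by input changes.
            changed[name] = min(size[name], input_changes)
        else:
            plan[name] = "full"
            # After a full recompute, downstream views see every row as changed.
            changed[name] = size[name]
    return plan


# Hypothetical pipeline DAG and statistics.
deps = {
    "clean_orders": ["orders"],
    "daily_sales": ["clean_orders", "customers"],
    "audit": ["events"],
}
size = {"orders": 1_000_000, "customers": 50_000, "events": 10_000,
        "clean_orders": 900_000, "daily_sales": 365, "audit": 10_000}
delta = {"orders": 1_000, "customers": 0, "events": 9_000}

plan = plan_refresh(deps, size, delta)
```

With these numbers the two views fed by a small `orders` delta are refreshed incrementally, while `audit` falls back to a full refresh because almost all of `events` changed. A real planner would also have to account for predicate selectivity and the batching savings mentioned above, which this sketch ignores.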

Circularity Check

0 steps flagged

No circularity: empirical production claims rest on external measurements without derivations or self-referential reductions


The paper presents Enzyme as an IVM system with a cost-based optimizer for refresh planning and batching, validated via production runs and benchmarks. No equations, fitted parameters, or derivation steps appear in the provided text. Central efficiency claims (billions of CPU-second reductions) are attributed to observed outcomes across thousands of pipelines rather than any model that reduces to its own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The architecture description and empirical results form a self-contained systems contribution without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

No free parameters, new axioms, or invented entities are introduced; the work is a systems implementation relying on standard database assumptions about view consistency and data evolution.

axioms (1)

Domain assumption: Standard assumptions on view consistency and incremental update semantics from prior IVM literature.
The system builds on decades of IVM research without stating new foundational axioms.

pith-pipeline@v0.9.0 · 5613 in / 1084 out tokens · 36375 ms · 2026-05-14T21:33:22.160663+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear
Paper passage: "Enzyme adds a cost-based optimization layer that selects refresh strategies for collections of materialized views organized into pipelines"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear
Paper passage: "operator-level delta plan construction... Δ(G_{k,agg}(T)) = ..."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

SIGMOD Companion '26, May 31-June 05, 2026, Bengaluru, India. Ritwik Yadav et al.