Declarative Data Services: Structured Agentic Discovery for Composing Data Systems

Duo Lu; Shanshan Ye

arxiv: 2605.20690 · v1 · pith:2LNMEWO5new · submitted 2026-05-20 · 💻 cs.AI

Pith reviewed 2026-05-21 05:14 UTC · model grok-4.3

keywords: agentic discovery · data system composition · declarative services · typed contracts · runtime feedback · LLM agents · trading backend

The pith

Structured agentic discovery using four typed contracts lets data-system compositions converge where unbounded search fails.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that agentic discovery for composing heterogeneous data backends needs structure to succeed, because open-ended LLM iteration on failure logs does not reliably produce working stacks. Declarative Data Services decomposes the task into four successive typed contracts—intent, operator DAG, per-system skills, and runtime attribution—so that specialized sub-agents perform bounded searches while knowledge moves forward through inline skill citations and backward through typed error signals. On a trading-backend workload the method reaches consistent convergence and converts runtime failures into patches cited by later deployments. A reader would care because real data systems are assembled from multiple components whose composition knowledge is poorly captured in pretraining, and current agents lose direction in the resulting search space.

Core claim

Declarative Data Services owns four typed contracts at successive layers (intent, operator DAG, per-system skills, runtime attribution) that decompose the global search into bounded sub-searches performed by sub-agents. The framework supplies channels for knowledge to flow forward as inline skill citations and for errors to route backward as typed signals. In a proof-of-life demonstration on a trading-backend workload, this architecture converges to working stacks where unbounded agentic discovery does not, and runtime failures become skill patches cited inline in the next deployment.

What carries the argument

The four typed contracts (intent, operator DAG, per-system skills, runtime attribution) that break the global composition search into bounded sub-searches and route knowledge via inline citations and typed error signals.
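
One way to picture the four contracts is as a chain of typed records, each consumed by the next layer. The sketch below is illustrative only: the class names (`Intent`, `OperatorDAG`, `Skill`, `RuntimeSignal`), their fields, the `validate_dag` helper, and the three operator names are assumptions made for this example, not definitions taken from the paper.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class Intent:
    """Layer 1 (hypothetical shape): what the user wants, stated declaratively."""
    workload: str
    requirements: dict[str, str]


@dataclass(frozen=True)
class OperatorDAG:
    """Layer 2 (hypothetical shape): operators mapped to their upstream operators."""
    edges: dict[str, set[str]]


@dataclass(frozen=True)
class Skill:
    """Layer 3 (hypothetical shape): per-system knowledge a sub-agent can cite."""
    skill_id: str
    system: str
    guidance: str


@dataclass(frozen=True)
class RuntimeSignal:
    """Layer 4 (hypothetical shape): a typed observation from a deployed stack."""
    kind: str
    system: str
    detail: str


def validate_dag(dag: OperatorDAG) -> list[str]:
    """Return operators in dependency order, or raise ValueError on a cycle.

    Rejecting an ill-formed DAG at the contract boundary is one way a layer
    could keep the downstream sub-search bounded.
    """
    try:
        return list(TopologicalSorter(dag.edges).static_order())
    except CycleError as exc:
        raise ValueError(f"operator DAG has a cycle: {exc.args[1]}") from exc


# Illustrative three-operator chain; the operator names are made up.
order = validate_dag(
    OperatorDAG({"ingest": set(), "store": {"ingest"}, "query": {"store"}})
)
```

The point of the typed boundary in this sketch is that a sub-agent working on per-system skills only ever receives a DAG that has already passed validation, so it never has to reason about malformed plans.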

If this is right

Runtime failures are captured as reusable skill patches that later deployments cite directly.
Sub-agents can succeed at their narrower, typed search spaces even when the overall composition problem remains large.
Composition knowledge accumulates across deployments through the framework's citation and signal channels rather than depending solely on pretraining.
Declarative user intent can drive end-to-end composition of heterogeneous data systems without requiring the agent to maintain the entire search space in one context.
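
The first and third consequences describe a feedback channel: a failure is attributed to a system, stored as a patch, and cited by the next deployment that touches that system. A minimal sketch follows, assuming a simple in-memory store; `SkillRegistry`, `record_patch`, and `cite_for` are hypothetical names, and the system labels and lesson texts are invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SkillPatch:
    patch_id: str
    system: str
    lesson: str


@dataclass
class SkillRegistry:
    """Hypothetical store of per-system lessons learned from runtime failures."""
    patches: list[SkillPatch] = field(default_factory=list)

    def record_patch(self, system: str, lesson: str) -> SkillPatch:
        """Turn an attributed runtime failure into a reusable, citable patch."""
        patch = SkillPatch(f"{system}-{len(self.patches) + 1}", system, lesson)
        self.patches.append(patch)
        return patch

    def cite_for(self, systems: list[str]) -> list[str]:
        """Render inline citations for every patch relevant to a planned stack."""
        return [
            f"[{p.patch_id}] {p.lesson}"
            for p in self.patches
            if p.system in systems
        ]


registry = SkillRegistry()
# Invented lessons, standing in for whatever a real deployment would surface.
registry.record_patch("queue", "create topics before starting consumers")
registry.record_patch("database", "wait for readiness before running migrations")

# A later deployment that uses only the queue cites only the queue patch.
citations = registry.cite_for(["queue"])
```

Keeping the patch identifier in the citation text is what would let a later reader trace a deployment decision back to the failure that motivated it.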

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same layered-contract pattern could be applied to composing infrastructure stacks or scientific workflows where components also interact through typed interfaces.
Runtime attribution might reduce reliance on exhaustive pretraining by letting the system learn system-specific behaviors from actual deployments.
Extending the contracts to include cost or latency objectives could turn the current convergence result into an optimization method.
The approach suggests that many agentic discovery tasks become tractable once the search is factored into contract-defined layers rather than left fully open-ended.

Load-bearing premise

That the four contracts can be defined and maintained so sub-agents reliably perform their bounded searches and that inline citations plus typed errors suffice to carry useful knowledge across iterations.

What would settle it

Run repeated trials of unbounded discovery versus DDS on the identical trading-backend workload and observe whether DDS produces a working stack in every trial while unbounded discovery continues to fail to converge.
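
The comparison described above amounts to estimating a convergence rate per strategy over repeated trials. A minimal harness might look like the following, assuming each strategy is wrapped as a function that takes a trial index and reports whether a working stack came up. The two stub strategies are placeholders with made-up outcomes, included only to exercise the harness; they are not the paper's systems and say nothing about its results.

```python
from typing import Callable

Strategy = Callable[[int], bool]  # trial index -> did a working stack come up?


def convergence_rate(strategy: Strategy, trials: int) -> float:
    """Fraction of trials in which the strategy produced a working stack."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    successes = sum(1 for i in range(trials) if strategy(i))
    return successes / trials


def compare(strategies: dict[str, Strategy], trials: int) -> dict[str, float]:
    """Run every strategy for the same number of trials and report each rate."""
    return {name: convergence_rate(fn, trials) for name, fn in strategies.items()}


# Placeholder strategies with made-up outcomes, only to show the harness shape.
def stub_always_succeeds(trial: int) -> bool:
    return True


def stub_succeeds_on_even_trials(trial: int) -> bool:
    return trial % 2 == 0


rates = compare(
    {"strategy_a": stub_always_succeeds, "strategy_b": stub_succeeds_on_even_trials},
    trials=10,
)
```

In a real comparison each callable would launch a deployment and apply the same health check to both strategies. Reporting the rate together with the trial count keeps a single success from being read as consistent convergence.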

Figures

Figures reproduced from arXiv: 2605.20690 by Duo Lu, Shanshan Ye.

**Figure 2.** The four DDS layers (L1–L4), each carrying a typed contract. L0 in the figure is an …

**Figure 3.** The L4 attribution loop: deploy, observe, attribute, patch. Each runtime signal is routed to …

**Figure 4.** Live-data proof of life on a DDS-generated stack. A Coinbase public WebSocket feed (20 …

**Figure 6.** Trading-workload intent populating all six L1 dimensions (§3). The framework emits one …

Original abstract

Agentic discovery has shown that LLM-driven search can find novel algorithms, designs, and code under benchmark conditions. Translating the paradigm to multi-system data backends surfaces a harder problem: the search space is heterogeneous, the verifier is whether a deployed stack actually runs, and composition knowledge is unevenly captured in pretraining. Unbounded agentic discovery, a coding agent iterating on failure-log feedback, fails to converge consistently on a working stack even when iteration and explicit composition knowledge are added. We propose Declarative Data Services (DDS), an architecture for structured agentic discovery of data-system compositions from declarative user intent. The framework owns four typed contracts at successive layers (intent, operator DAG, per-system skills, runtime attribution) that decompose the global search into bounded sub-searches; sub-agents search each typed space, while the framework provides the channels by which knowledge flows forward as inline skill citations and errors route backward as typed signals. As a proof of life on a trading-backend workload, DDS converges where unbounded discovery does not; runtime failures become skill patches that the next deployment cites inline. We position this as an early prototype reporting lessons from real-world data-system composition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

DDS proposes a four-layer typed contract structure to bound agentic search over heterogeneous data-system compositions, but the single proof-of-life case gives no numbers or ablations to show the contracts actually drive the reported convergence.

The main takeaway is that this paper gives a concrete architecture for making agentic discovery work on real data backends instead of letting the agent wander. They split the problem into four typed layers—user intent, operator DAG, per-system skills, and runtime attribution—and let sub-agents search inside each one while passing knowledge forward through inline citations and routing errors back as typed signals. On a trading-backend example the structured version reaches a working stack while plain unbounded search does not, and failures get turned into patches that later runs cite directly.
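
The backward channel mentioned in the letter can be pictured as a dispatch step: each runtime signal carries a type, and the type decides which layer receives it. The sketch below assumes a fixed mapping from signal kinds to layers. The kind names, the `Layer` enum, the `ATTRIBUTION` table, and `route` are all hypothetical, and the failure details are invented to illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    INTENT = "intent"
    OPERATOR_DAG = "operator DAG"
    SKILLS = "per-system skills"


@dataclass(frozen=True)
class Signal:
    kind: str    # typed category, e.g. "config_error" (hypothetical)
    detail: str  # raw evidence from the failed deployment


# Hypothetical attribution table: which layer owns which kind of failure.
ATTRIBUTION = {
    "unsatisfiable_requirement": Layer.INTENT,
    "missing_operator": Layer.OPERATOR_DAG,
    "config_error": Layer.SKILLS,
}


def route(signals: list[Signal]) -> dict[Layer, list[Signal]]:
    """Group signals by the layer that should handle them.

    Unknown kinds raise instead of being passed along as generic log text,
    so every signal that travels backward stays typed.
    """
    routed: dict[Layer, list[Signal]] = {}
    for signal in signals:
        if signal.kind not in ATTRIBUTION:
            raise KeyError(f"unattributed signal kind: {signal.kind}")
        routed.setdefault(ATTRIBUTION[signal.kind], []).append(signal)
    return routed


routed = route([
    Signal("config_error", "port already in use"),
    Signal("missing_operator", "no sink for aggregated output"),
    Signal("config_error", "auth method rejected"),
])
```

Failing loudly on an unattributed kind is a design choice in this sketch: it forces new failure modes to be classified before they can influence a later search, which is the contrast with iterating on raw failure logs.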

Referee Report

2 major / 2 minor

Summary. The paper proposes Declarative Data Services (DDS), an architecture for structured agentic discovery of data-system compositions from declarative user intent using four typed contracts (intent, operator DAG, per-system skills, runtime attribution). These contracts decompose the heterogeneous search space into bounded sub-searches performed by sub-agents, with forward knowledge flow via inline skill citations and backward propagation of typed error signals. As a proof-of-life demonstration on a trading-backend workload, DDS is reported to converge on a working stack where unbounded agentic discovery (even with iteration and composition knowledge) fails, converting runtime failures into reusable skill patches cited in subsequent deployments.

Significance. If the central claim holds, the work offers a structured alternative to unbounded LLM-driven search for practical multi-system data backend composition, where verifiers are deployment success and pretraining knowledge is uneven. The explicit layering of contracts and bidirectional knowledge channels could generalize to other heterogeneous composition tasks; the positioning as an early prototype reporting real-world lessons is a constructive contribution even at this stage.

major comments (2)

[Abstract and Evaluation, proof-of-life section] The claim that DDS converges where unbounded discovery does not rests on a single unreported workload run without iteration counts, success rates, failure-mode distributions, baseline comparisons, or logs showing how the four contracts produced bounded sub-searches. This is load-bearing for the central architectural claim and leaves open whether convergence arises from the contract structure, workload simplicity, or unstated human tuning.
[Architecture, contract definitions] The assumption that the four typed contracts can be maintained so that sub-agents reliably bound their searches and that inline citations plus typed runtime attribution signals propagate root-cause knowledge (rather than generic errors) is asserted but not supported by ablation or tracing of signal flow across iterations in the reported case.

minor comments (2)

[Introduction/Architecture] Add a dedicated subsection early in the paper that formally defines the interfaces and invariants of each of the four typed contracts to improve readability for readers unfamiliar with the layered approach.
[Related Work] Expand the related-work discussion to include recent agentic discovery systems and data-system composition frameworks for clearer positioning.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments correctly identify areas where the proof-of-life demonstration would benefit from greater transparency. We address each major comment below and outline the revisions we will make.

Referee: [Abstract and Evaluation, proof-of-life section] The claim that DDS converges where unbounded discovery does not rests on a single unreported workload run without iteration counts, success rates, failure-mode distributions, baseline comparisons, or logs showing how the four contracts produced bounded sub-searches. This is load-bearing for the central architectural claim and leaves open whether convergence arises from the contract structure, workload simplicity, or unstated human tuning.

Authors: We acknowledge that the current evaluation presents only a qualitative proof-of-life on one trading-backend workload and does not report quantitative metrics such as iteration counts, success rates, or detailed logs. The manuscript positions this as an early prototype illustrating that the contract structure enabled convergence where an unbounded baseline did not. In the revised manuscript we will expand the evaluation section to include available iteration counts, observed failure modes in the unbounded case, and a step-by-step description of how the four contracts produced bounded sub-searches in the reported run. We will also add a brief discussion of potential human tuning and workload characteristics to address the concern that convergence may not generalize from the contract design alone.

Revision: yes

Referee: [Architecture, contract definitions] The assumption that the four typed contracts can be maintained so that sub-agents reliably bound their searches and that inline citations plus typed runtime attribution signals propagate root-cause knowledge (rather than generic errors) is asserted but not supported by ablation or tracing of signal flow across iterations in the reported case.

Authors: We agree that the manuscript asserts the utility of the four contracts and the bidirectional knowledge channels without providing explicit tracing or ablation evidence. The proof-of-life example shows the outcome but does not walk through signal propagation. In the revision we will add a new figure and accompanying text that traces the forward flow of inline skill citations and the backward propagation of typed error signals for the reported deployment. We will also include a short discussion of observed challenges in maintaining contract consistency and how root-cause attribution differed from generic error logs in the case study.

Revision: yes

Circularity Check

0 steps flagged

No circularity: architectural proposal without derivation chain or self-referential reduction

The manuscript proposes Declarative Data Services as an architectural framework that decomposes agentic search into four typed contracts (intent, operator DAG, per-system skills, runtime attribution) to bound sub-searches and route knowledge via inline citations and typed error signals. This is presented as an original design with a proof-of-life demonstration on a trading-backend workload rather than any numerical derivation, fitted-parameter prediction, or equation that reduces to its own inputs by construction. No self-citation load-bearing steps, uniqueness theorems imported from prior author work, or ansatz smuggling appear in the description; the central claim of convergence where unbounded discovery fails is asserted via empirical illustration without reducing to a tautology or renamed known result. The proposal remains self-contained as an engineering architecture.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the assumption that LLM agents can exploit the typed layers effectively and that runtime feedback can be turned into reusable skill patches without additional mechanisms.

axioms (1)

Domain assumption: LLM agents can perform effective bounded sub-searches when supplied with typed contracts and bidirectional knowledge channels.
Invoked to justify why decomposing the search space leads to convergence.

invented entities (1)

Four typed contracts (intent, operator DAG, per-system skills, runtime attribution): no independent evidence
Purpose: Decompose global search into bounded sub-searches with knowledge flow.
New structure introduced by the paper; no independent evidence supplied beyond the single workload example.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

The framework owns four typed contracts at successive layers (intent, operator DAG, per-system skills, runtime attribution) that decompose the global search into bounded sub-searches

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 10 internal anchors

[1]

Alexander Novikov, Ngân V˜u, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. AlphaEvolve: A coding agent for scientific and algor...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

Pan, Alexander Du, Kurt Keutzer, Alvin Cheung, Alexandros G

Shu Liu, Shubham Agarwal, Monishwaran Maheswaran, Mert Cemri, Zhifei Li, Qiuyang Mang, Ashwin Naren, Ethan Boneh, Audrey Cheng, Melissa Z. Pan, Alexander Du, Kurt Keutzer, Alvin Cheung, Alexandros G. Dimakis, Koushik Sen, Matei Zaharia, and Ion Stoica. EvoX: Meta- evolution for automated discovery, 2026. URLhttps://arxiv.org/abs/2602.23413

work page arXiv 2026
[3]

AdaEvolve: Adaptive LLM driven zeroth-order optimization, 2026

Mert Cemri, Shubham Agrawal, Akshat Gupta, Shu Liu, Audrey Cheng, Qiuyang Mang, Ashwin Naren, Lutfi Eren Erdogan, Koushik Sen, Matei Zaharia, Alex Dimakis, and Ion Stoica. AdaEvolve: Adaptive LLM driven zeroth-order optimization, 2026. URL https: //arxiv.org/abs/2602.20133

work page arXiv 2026
[4]

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl- Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. GEPA: Reflective prompt evolution can outperform reinforcement learning, 2026. URL https: //a...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[5]

Glia: A human-inspired AI for automated systems design and optimization, 2026

Pouya Hamadanian, Pantea Karimi, Arash Nasr-Esfahany, Kimia Noorbakhsh, Joseph Chandler, Ali ParandehGheibi, Mohammad Alizadeh, and Hari Balakrishnan. Glia: A human-inspired AI for automated systems design and optimization, 2026. URL https://arxiv.org/abs/2510. 27176

work page 2026
[6]

Claude Code: An agentic coding tool

Anthropic. Claude Code: An agentic coding tool. https://www.anthropic.com/ claude-code, 2025. Accessed April 2026

work page 2025
[7]

Elmore, Michael Stonebraker, Magda Balazinska, Bill Howe, Jeremy Kepner, Sam Madden, David Maier, Tim Mattson, and Stan Zdonik

Jennie Duggan, Aaron J. Elmore, Michael Stonebraker, Magda Balazinska, Bill Howe, Jeremy Kepner, Sam Madden, David Maier, Tim Mattson, and Stan Zdonik. The BigDAWG polystore system.SIGMOD Rec., 44(2):11–16, August 2015. ISSN 0163-5808. doi: 10.1145/2814710. 2814713. URLhttps://doi.org/10.1145/2814710.2814713

work page doi:10.1145/2814710 2015
[8]

Gordon, and Bohan Zhang

Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. Automatic database management system tuning through large-scale machine learning. InProceedings of the 2017 ACM International Conference on Management of Data, SIGMOD ’17, pages 1009–1024, New York, NY , USA, 2017. Association for Computing Machinery. ISBN 9781450341974. doi: 10.1145/303591...

work page doi:10.1145/3035918.3064029 2017
[9]

Accessed April 2026

dbt (data build tool).https://www.getdbt.com/, . Accessed April 2026

work page 2026
[10]

Accessed April 2026

Airbyte.https://airbyte.com/. Accessed April 2026

work page 2026
[11]

Accessed April 2026

Fivetran.https://www.fivetran.com/. Accessed April 2026

work page 2026
[12]

Lee, Ashish Motivala, Abdul Q

Benoit Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, Allison W. Lee, Ashish Motivala, Abdul Q. Munir, Steven Pelley, Peter Povinec, Greg Rahn, Spyridon Triantafyllis, and Philipp Unterbrunner. The Snowflake elastic data warehouse. InProceedings o...

work page doi:10.1145/2882903.2903741 2016
[13]

Lakehouse: A new generation of open platforms that unify data warehousing and advanced analytics

Michael Armbrust, Ali Ghodsi, Reynold Xin, and Matei Zaharia. Lakehouse: A new generation of open platforms that unify data warehousing and advanced analytics. InConference on Innovative Data Systems Research, 2021. URL https://vldb.org/cidrdb/papers/2021/ cidr2021_paper17.pdf

work page 2021
[14]

Accessed April 2026

DB-Engines ranking.https://db-engines.com/en/ranking. Accessed April 2026. 10

work page 2026
[15]

Gonzalez, and Aditya G

Shu Liu, Soujanya Ponnapalli, Shreya Shankar, Sepanta Zeighami, Alan Zhu, Shubham Agarwal, Ruiqi Chen, Samion Suwito, Shuo Yuan, Ion Stoica, Matei Zaharia, Alvin Cheung, Natacha Crooks, Joseph E. Gonzalez, and Aditya G. Parameswaran. Supporting our ai overlords: Redesigning data systems to be agent-first. 2025. URL https://arxiv.org/abs/2509. 00997

work page 2025
[16]

Accessed April 2026

Pulumi AI.https://www.pulumi.com/ai/. Accessed April 2026

work page 2026
[17]

https://developer.hashicorp.com/terraform/docs/tools/ mcp-server

Terraform MCP server. https://developer.hashicorp.com/terraform/docs/tools/ mcp-server. Accessed April 2026

work page 2026
[18]

Park, George S

Patrick Tser Jern Kon, Jiachen Liu, Yiming Qiu, Weijun Fan, Ting He, Lei Lin, Hao- ran Zhang, Owen M. Park, George S. Elengikal, Yuxin Kang, Ang Chen, Mosharaf Chowdhury, Myungjin Lee, and Xinyu Wang. Iac-eval: A code generation benchmark for cloud infrastructure-as-code programs. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and...

work page doi:10.52202/079017-4273 2024
[19]

O’Reilly Media, 2017

Martin Kleppmann.Designing Data-Intensive Applications. O’Reilly Media, 2017. ISBN 978-1449373320

work page 2017
[20]

One size fits all

Michael Stonebraker and U˘gur Çetintemel."One size fits all": an idea whose time has come and gone, pages 441–462. Association for Computing Machinery and Morgan & Claypool,

work page
[21]

URLhttps://doi.org/10.1145/3226595.3226636

ISBN 9781947487192. URLhttps://doi.org/10.1145/3226595.3226636

work page doi:10.1145/3226595.3226636
[22]

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z. Pan, Shuyi Yang, Lakshya A. Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, Matei Zaharia, Joseph E. Gonzalez, and Ion Stoica. Why do multi-agent LLM systems fail?, 2025. URL https://arxiv.org/abs/2503.13657

work page internal anchor Pith review Pith/arXiv arXiv 2025
[23]

Multi-agent teams hold experts back, 2026

Aneesh Pappu, Batu El, Hancheng Cao, Carmelo di Nolfo, Yanchao Sun, Meng Cao, and James Zou. Multi-agent teams hold experts back, 2026. URL https://arxiv.org/abs/ 2602.01011

work page arXiv 2026
[24]

Dimakis, Matei Zaharia, and Ion Stoica

Shu Liu, Mert Cemri, Shubham Agarwal, Alexander Krentsel, Ashwin Naren, Qiuyang Mang, Zhifei Li, Akshat Gupta, Monishwaran Maheswaran, Audrey Cheng, Melissa Pan, Ethan Boneh, Kannan Ramchandran, Koushik Sen, Alexandros G. Dimakis, Matei Zaharia, and Ion Stoica. SkyDiscover: A flexible framework for AI-driven scientific and algorithmic discovery, 2026. URL...

work page 2026
[25]

OpenEvolve: an open-source evolutionary coding agent, 2025

Asankhaya Sharma. OpenEvolve: an open-source evolutionary coding agent, 2025. URL https://github.com/algorithmicsuperintelligence/openevolve

work page 2025
[26]

ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin. ShinkaEvolve: Towards open-ended and sample-efficient program evolution, 2025. URL https://arxiv.org/abs/2509.19349

work page internal anchor Pith review Pith/arXiv arXiv 2025
[27]

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, and Kunle Olukotun. Agentic context engineering: Evolving contexts for self-improving language models, 2026. URLhttps://arxiv.org/abs/2510.04618

work page internal anchor Pith review Pith/arXiv arXiv 2026
[28]

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. DSPy: Compiling declarative language model calls into self-improving pipelines, 2023. URLhttps://arxiv.org/abs/2310.03714

work page internal anchor Pith review Pith/arXiv arXiv 2023
[29]

Meta-Harness: End-to-End Optimization of Model Harnesses

Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, and Chelsea Finn. Meta-Harness: End-to-end optimization of model harnesses, 2026. URL https://arxiv. org/abs/2603.28052. 11

work page internal anchor Pith review Pith/arXiv arXiv 2026
[30]

Melissa Z. Pan, Negar Arabzadeh, Riccardo Cogo, Yuxuan Zhu, Alexander Xiong, Lakshya A Agrawal, Huanzhi Mao, Emma Shen, Sid Pallerla, Liana Patel, Shu Liu, Tianneng Shi, Xiaoyuan Liu, Jared Quincy Davis, Emmanuele Lacavalla, Alessandro Basile, Shuyi Yang, Paul Castro, Daniel Kang, Joseph E. Gonzalez, Koushik Sen, Dawn Song, Ion Stoica, Matei Zaharia, and ...

work page 2026
[31]

Semantic operators and their optimization: Enabling llm-based data processing with accuracy guarantees in lotus.Proc

Liana Patel, Siddharth Jha, Melissa Pan, Harshit Gupta, Parth Asawa, Carlos Guestrin, and Matei Zaharia. Semantic operators and their optimization: Enabling llm-based data processing with accuracy guarantees in lotus.Proc. VLDB Endow., 18(11):4171–4184, July 2025. ISSN 2150-8097. doi: 10.14778/3749646.3749685. URL https://doi.org/10.14778/3749646. 3749685

work page doi:10.14778/3749646.3749685 2025
[32]

Parameswaran, and Eugene Wu

Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, and Eugene Wu. DocETL: Agentic query rewriting and evaluation for complex document processing.Proc. VLDB Endow., 18(9):3035–3048, May 2025. ISSN 2150-8097. doi: 10.14778/3746405.3746426. URLhttps://doi.org/10.14778/3746405.3746426

work page doi:10.14778/3746405.3746426 2025
[33]

A declarative system for optimizing ai workloads, 2024

Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baille Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, and Gerardo Vitagliano. A declarative system for optimizing ai workloads, 2024. URLhttps://arxiv.org/abs/2405.14696

work page arXiv 2024
[34]

Accessed April 2026

dbt Mesh.https://www.getdbt.com/product/dbt-mesh, . Accessed April 2026

work page 2026
[35]

Accessed April 2026

Apache Iceberg.https://iceberg.apache.org/. Accessed April 2026

work page 2026
[36]

Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press

John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: agent-computer interfaces enable automated soft- ware engineering. InProceedings of the 38th International Conference on Neural Information Processing Systems, NIPS ’24, Red Hook, NY , USA, 2024. Curran Associates Inc. ISBN 9798331314385

work page 2024
[37]

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, and Aleksander M ˛ adry. MLE-bench: Evaluating machine learning agents on machine learning engineering, 2025. URL https://arxiv.org/abs/2410.07095

work page internal anchor Pith review Pith/arXiv arXiv 2025
[38]

DS-1000: a natural and reliable benchmark for data science code generation

Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Wen-tau Yih, Daniel Fried, Sida Wang, and Tao Yu. DS-1000: a natural and reliable benchmark for data science code generation. InProceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023

work page 2023
[39]

Presto: SQL on everything

Raghav Sethi, Martin Traverso, Dain Sundstrom, David Phillips, Wenlei Xie, Yutian Sun, Nezih Yegitbasi, Haozhun Jin, Eric Hwang, Nileema Shingte, and Christopher Berner. Presto: SQL on everything. In2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1802–1813, 2019. doi: 10.1109/ICDE.2019.00196

work page doi:10.1109/icde.2019.00196 2019
[40]

Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K

Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei Zaharia. Spark SQL: Relational data processing in Spark. InProceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD ’15, pages 1383–1394, New York, NY , USA,

work page 2015
[41]

ISBN 9781450327589

Association for Computing Machinery. ISBN 9781450327589. doi: 10.1145/2723372. 2742797. URLhttps://doi.org/10.1145/2723372.2742797

work page doi:10.1145/2723372
[42]

HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots

Alfons Kemper and Thomas Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In2011 IEEE 27th International Conference on Data Engineering, pages 195–206, 2011. doi: 10.1109/ICDE.2011.5767867

work page doi:10.1109/icde.2011.5767867 2011
[43]

SAP HANA database: data management for modern business applications.SIGMOD Rec., 40(4):45–51, January 2012

Franz Färber, Sang Kyun Cha, Jürgen Primsch, Christof Bornhövd, Stefan Sigg, and Wolfgang Lehner. SAP HANA database: data management for modern business applications.SIGMOD Rec., 40(4):45–51, January 2012. ISSN 0163-5808. doi: 10.1145/2094114.2094126. URL https://doi.org/10.1145/2094114.2094126. 12


[1] Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. AlphaEvolve: A coding agent for scientific and algor... (2025)

[2] Shu Liu, Shubham Agarwal, Monishwaran Maheswaran, Mert Cemri, Zhifei Li, Qiuyang Mang, Ashwin Naren, Ethan Boneh, Audrey Cheng, Melissa Z. Pan, Alexander Du, Kurt Keutzer, Alvin Cheung, Alexandros G. Dimakis, Koushik Sen, Matei Zaharia, and Ion Stoica. EvoX: Meta-evolution for automated discovery, 2026. URL https://arxiv.org/abs/2602.23413

[3] Mert Cemri, Shubham Agrawal, Akshat Gupta, Shu Liu, Audrey Cheng, Qiuyang Mang, Ashwin Naren, Lutfi Eren Erdogan, Koushik Sen, Matei Zaharia, Alex Dimakis, and Ion Stoica. AdaEvolve: Adaptive LLM driven zeroth-order optimization, 2026. URL https://arxiv.org/abs/2602.20133

[4] Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. GEPA: Reflective prompt evolution can outperform reinforcement learning, 2026. URL https://a...

[5] Pouya Hamadanian, Pantea Karimi, Arash Nasr-Esfahany, Kimia Noorbakhsh, Joseph Chandler, Ali ParandehGheibi, Mohammad Alizadeh, and Hari Balakrishnan. Glia: A human-inspired AI for automated systems design and optimization, 2026. URL https://arxiv.org/abs/2510.27176

[6] Anthropic. Claude Code: An agentic coding tool. https://www.anthropic.com/claude-code, 2025. Accessed April 2026.

[7] Jennie Duggan, Aaron J. Elmore, Michael Stonebraker, Magda Balazinska, Bill Howe, Jeremy Kepner, Sam Madden, David Maier, Tim Mattson, and Stan Zdonik. The BigDAWG polystore system. SIGMOD Rec., 44(2):11–16, August 2015. ISSN 0163-5808. doi: 10.1145/2814710.2814713. URL https://doi.org/10.1145/2814710.2814713

[8] Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD ’17, pages 1009–1024, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450341974. doi: 10.1145/3035918.3064029. ...

[9] dbt (data build tool). https://www.getdbt.com/. Accessed April 2026.

[10] Airbyte. https://airbyte.com/. Accessed April 2026.

[11] Fivetran. https://www.fivetran.com/. Accessed April 2026.

[12] Benoit Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, Allison W. Lee, Ashish Motivala, Abdul Q. Munir, Steven Pelley, Peter Povinec, Greg Rahn, Spyridon Triantafyllis, and Philipp Unterbrunner. The Snowflake elastic data warehouse. In Proceedings o... (doi: 10.1145/2882903.2903741, 2016)

[13] Michael Armbrust, Ali Ghodsi, Reynold Xin, and Matei Zaharia. Lakehouse: A new generation of open platforms that unify data warehousing and advanced analytics. In Conference on Innovative Data Systems Research, 2021. URL https://vldb.org/cidrdb/papers/2021/cidr2021_paper17.pdf

[14] DB-Engines ranking. https://db-engines.com/en/ranking. Accessed April 2026.

[15] Shu Liu, Soujanya Ponnapalli, Shreya Shankar, Sepanta Zeighami, Alan Zhu, Shubham Agarwal, Ruiqi Chen, Samion Suwito, Shuo Yuan, Ion Stoica, Matei Zaharia, Alvin Cheung, Natacha Crooks, Joseph E. Gonzalez, and Aditya G. Parameswaran. Supporting our AI overlords: Redesigning data systems to be agent-first. 2025. URL https://arxiv.org/abs/2509.00997

[16] Pulumi AI. https://www.pulumi.com/ai/. Accessed April 2026.

[17] Terraform MCP server. https://developer.hashicorp.com/terraform/docs/tools/mcp-server. Accessed April 2026.

[18] Patrick Tser Jern Kon, Jiachen Liu, Yiming Qiu, Weijun Fan, Ting He, Lei Lin, Haoran Zhang, Owen M. Park, George S. Elengikal, Yuxin Kang, Ang Chen, Mosharaf Chowdhury, Myungjin Lee, and Xinyu Wang. IaC-Eval: A code generation benchmark for cloud infrastructure-as-code programs. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and... (doi: 10.52202/079017-4273, 2024)

[19] Martin Kleppmann. Designing Data-Intensive Applications. O’Reilly Media, 2017. ISBN 978-1449373320.

[20] Michael Stonebraker and Uğur Çetintemel. "One size fits all": an idea whose time has come and gone, pages 441–462. Association for Computing Machinery and Morgan & Claypool. ISBN 9781947487192. URL https://doi.org/10.1145/3226595.3226636

[22] Mert Cemri, Melissa Z. Pan, Shuyi Yang, Lakshya A. Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, Matei Zaharia, Joseph E. Gonzalez, and Ion Stoica. Why do multi-agent LLM systems fail?, 2025. URL https://arxiv.org/abs/2503.13657

[23] Aneesh Pappu, Batu El, Hancheng Cao, Carmelo di Nolfo, Yanchao Sun, Meng Cao, and James Zou. Multi-agent teams hold experts back, 2026. URL https://arxiv.org/abs/2602.01011

[24] Shu Liu, Mert Cemri, Shubham Agarwal, Alexander Krentsel, Ashwin Naren, Qiuyang Mang, Zhifei Li, Akshat Gupta, Monishwaran Maheswaran, Audrey Cheng, Melissa Pan, Ethan Boneh, Kannan Ramchandran, Koushik Sen, Alexandros G. Dimakis, Matei Zaharia, and Ion Stoica. SkyDiscover: A flexible framework for AI-driven scientific and algorithmic discovery, 2026. URL...

[25] Asankhaya Sharma. OpenEvolve: an open-source evolutionary coding agent, 2025. URL https://github.com/algorithmicsuperintelligence/openevolve

[26] Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin. ShinkaEvolve: Towards open-ended and sample-efficient program evolution, 2025. URL https://arxiv.org/abs/2509.19349

[27] Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, and Kunle Olukotun. Agentic context engineering: Evolving contexts for self-improving language models, 2026. URL https://arxiv.org/abs/2510.04618

[28] Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. DSPy: Compiling declarative language model calls into self-improving pipelines, 2023. URL https://arxiv.org/abs/2310.03714

[29] Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, and Chelsea Finn. Meta-Harness: End-to-end optimization of model harnesses, 2026. URL https://arxiv.org/abs/2603.28052

[30] Melissa Z. Pan, Negar Arabzadeh, Riccardo Cogo, Yuxuan Zhu, Alexander Xiong, Lakshya A Agrawal, Huanzhi Mao, Emma Shen, Sid Pallerla, Liana Patel, Shu Liu, Tianneng Shi, Xiaoyuan Liu, Jared Quincy Davis, Emmanuele Lacavalla, Alessandro Basile, Shuyi Yang, Paul Castro, Daniel Kang, Joseph E. Gonzalez, Koushik Sen, Dawn Song, Ion Stoica, Matei Zaharia, and ... (2026)

[31] Liana Patel, Siddharth Jha, Melissa Pan, Harshit Gupta, Parth Asawa, Carlos Guestrin, and Matei Zaharia. Semantic operators and their optimization: Enabling LLM-based data processing with accuracy guarantees in LOTUS. Proc. VLDB Endow., 18(11):4171–4184, July 2025. ISSN 2150-8097. doi: 10.14778/3749646.3749685. URL https://doi.org/10.14778/3749646.3749685

[32] Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, and Eugene Wu. DocETL: Agentic query rewriting and evaluation for complex document processing. Proc. VLDB Endow., 18(9):3035–3048, May 2025. ISSN 2150-8097. doi: 10.14778/3746405.3746426. URL https://doi.org/10.14778/3746405.3746426

[33] Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baille Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, and Gerardo Vitagliano. A declarative system for optimizing AI workloads, 2024. URL https://arxiv.org/abs/2405.14696

[34] dbt Mesh. https://www.getdbt.com/product/dbt-mesh. Accessed April 2026.

[35] Apache Iceberg. https://iceberg.apache.org/. Accessed April 2026.

[36] John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: agent-computer interfaces enable automated software engineering. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS ’24, Red Hook, NY, USA, 2024. Curran Associates Inc. ISBN 9798331314385.

[37] Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, and Aleksander Mądry. MLE-bench: Evaluating machine learning agents on machine learning engineering, 2025. URL https://arxiv.org/abs/2410.07095

[38] Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Wen-tau Yih, Daniel Fried, Sida Wang, and Tao Yu. DS-1000: a natural and reliable benchmark for data science code generation. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023.

[39] Raghav Sethi, Martin Traverso, Dain Sundstrom, David Phillips, Wenlei Xie, Yutian Sun, Nezih Yegitbasi, Haozhun Jin, Eric Hwang, Nileema Shingte, and Christopher Berner. Presto: SQL on everything. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1802–1813, 2019. doi: 10.1109/ICDE.2019.00196.

[40] Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei Zaharia. Spark SQL: Relational data processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD ’15, pages 1383–1394, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450327589. doi: 10.1145/2723372.2742797. URL https://doi.org/10.1145/2723372.2742797

[42] Alfons Kemper and Thomas Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In 2011 IEEE 27th International Conference on Data Engineering, pages 195–206, 2011. doi: 10.1109/ICDE.2011.5767867.

[43] Franz Färber, Sang Kyun Cha, Jürgen Primsch, Christof Bornhövd, Stefan Sigg, and Wolfgang Lehner. SAP HANA database: data management for modern business applications. SIGMOD Rec., 40(4):45–51, January 2012. ISSN 0163-5808. doi: 10.1145/2094114.2094126. URL https://doi.org/10.1145/2094114.2094126.

[44]

https://engineering.fb.com/2022/05/04/data-infrastructure/delta/

Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, and Himani Apte. F1: a distributed SQL database that scales. Proc. VLDB Endow., 6(11):1068–1079, August 2013. ISSN 2150-8097. doi: 10.14778/2536222.2536232. URL https...

[45]

A New Presumed Commit Optimization for Two Phase Commit

James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak,... (doi: 10.1145/2491245, 2013)

[46] Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. Self-driving database management systems. In Conference on Innovative Data Systems Research, 2017. URL https://...

[47] Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, Zhili Xiao, Bin Cheng, Jiashu Xing, Yangtao Wang, Tianheng Cheng, Li Liu, Minwei Ran, and Zekang Li. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD ’19, pages 415–432, New York, NY, USA, 2019. A... (doi: 10.1145/3299869.3300085)

[48] Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. URL https://arxiv.org/abs/1809.08887

[50] Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin C.C. Chang, Fei Huang, Reynold Cheng, and Yongbin Li. Can LLM already serve as a database interface? a big bench for large-scale database grounded text-to-SQLs. In Proceedings of the 37th International ... (2023)

[51] Mohammadreza Pourreza and Davood Rafiei. DIN-SQL: decomposed in-context learning of text-to-SQL with self-correction. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY, USA, 2023. Curran Associates Inc.

[52] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387, June 1970. ISSN 0001-0782. doi: 10.1145/362384.362685. URL https://doi.org/10.1145/362384.362685

[53] Donald D. Chamberlin and Raymond F. Boyce. Sequel: A structured English query language. In Proceedings of the 1974 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control, SIGFIDET ’74, pages 249–264, New York, NY, USA, 1974. Association for Computing Machinery. ISBN 9781450374156. doi: 10.1145/800296.811515. URL https://doi.org/10.1145...

[54] Batu El, Mert Yuksekgonul, and James Zou. Inefficiencies of meta agents for agent design. URL https://arxiv.org/abs/2510.06711

[56] Audrey Cheng, Shu Liu, Melissa Pan, Zhifei Li, Bowen Wang, Alex Krentsel, Tian Xia, Mert Cemri, Jongseok Park, Shuo Yang, Jeff Chen, Lakshya Agrawal, Aditya Desai, Jiarong Xing, Koushik Sen, Matei Zaharia, and Ion Stoica. Barbarians at the gate: How AI is upending systems research, 2025. URL https://arxiv.org/abs/2510.06189

[57] Audrey Cheng, Shu Liu, Melissa Pan, Zhifei Li, Shubham Agarwal, Mert Cemri, Bowen Wang, Alexander Krentsel, Tian Xia, Jongseok Park, Shuo Yang, Jeff Chen, Lakshya Agrawal, Ashwin Naren, Shulu Li, Ruiying Ma, Aditya Desai, Jiarong Xing, Koushik Sen, Matei Zaharia, and Ion Stoica. Let the barbarians in: How AI can accelerate systems performance research, 2025. ...

[58] Mehmet Hamza Erol, Batu El, Mirac Suzgun, Mert Yuksekgonul, and James Zou. Cost-of-Pass: An economic framework for evaluating language models, 2026. URL https://arxiv.org/abs/2504.13359

[59] Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models. URL https://arxiv.org/abs/2305.16291

[61] Anthropic. Equipping agents for the real world with agent skills. https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills. Accessed April 2026.

A Example Agent Skill: ClickHouse

Figure A shows a trimmed excerpt of the clickhouse.yaml skill with one representative entry per block. The dated comments are real attribution-log entries: each was added after a specific failure during the learning-loop experiment (§4.3), which is the traceability property cited in §3.

Figure A (recoverable fragment of the excerpt): ... 24.3" operator_types: [STORE, TRANSFORM] capabilities: data_models: [columnar, time_series, event] access_patterns: [olap, streaming] max_throughput: ...

B Per-run detail f...