pith. machine review for the scientific record.

arxiv: 2604.09944 · v2 · submitted 2026-04-10 · 💻 cs.DB

Recognition: unknown

PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:36 UTC · model grok-4.3

classification 💻 cs.DB
keywords query optimization · semantic operators · hybrid query plans · dynamic programming · LLM integration · cost model · database systems

The pith

PLOP reduces hybrid semantic-relational query planning to semantic filter placement and uses dynamic programming to select positions that minimize the combined LLM and relational processing costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to decide where to locate semantic operators that rely on large language models inside plans that also contain ordinary relational operators such as joins. Early placement filters data before expensive relational steps but triggers more model calls, while late placement reduces model calls via deduplication yet forces relational operators to handle larger volumes. PLOP converts the placement problem into a choice of filter positions through two rewrites that leave query meaning unchanged, then solves it with dynamic programming that picks the position minimizing a weighted sum of model invocations and relational work. A reader would care because database systems are increasingly adding natural-language predicates to classical queries, and poor placement quickly makes the total expense impractical. If the approach holds, hybrid queries can run substantially faster and cheaper without losing output quality on semantic benchmarks.
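The trade-off described above can be sketched in a few lines. The cost shape, row counts, selectivity, and the weight `alpha` below are illustrative stand-ins, not the paper's actual cost model.

```python
# Toy model of the placement trade-off PLOP optimizes. The cost shape,
# row counts, and selectivity are illustrative, not the paper's model.

def plan_cost(p, rows, distinct_inputs, selectivity, alpha):
    """Cost of placing one semantic filter at plan position p.

    rows[i]            -- tuples entering stage i before any filtering
    distinct_inputs[i] -- distinct filter inputs surviving to stage i;
                          under function caching, one LLM call each
    """
    llm = distinct_inputs[p]                  # calls = distinct inputs seen
    rel = sum(rows[: p + 1])                  # upstream work is unaffected
    rel += sum(r * selectivity for r in rows[p + 1:])  # downstream shrinks
    return alpha * llm + rel

def best_placement(rows, distinct_inputs, selectivity, alpha):
    return min(
        range(len(rows)),
        key=lambda p: plan_cost(p, rows, distinct_inputs, selectivity, alpha),
    )

rows = [1_000, 50_000, 200_000]   # joins inflate intermediate results
distinct = [1_000, 600, 300]      # later stages drop rows, so the filter
                                  # sees fewer distinct inputs when deferred
print(best_placement(rows, distinct, 0.1, alpha=1_000))  # -> 2: defer (pricey LLM)
print(best_placement(rows, distinct, 0.1, alpha=0.01))   # -> 0: push down (cheap LLM)
```

Sweeping `alpha` flips the chosen position, which is exactly the hardware- and pricing-dependence the cost model is meant to capture.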

Core claim

PLOP reduces hybrid query planning to semantic filter placement via two equivalence-preserving rewrites. It proves that deferring all semantic filters to the latest possible position minimizes LLM invocations under function caching, but shows that this can cause relational processing costs to dominate on complex multi-table queries. To balance LLM cost against relational cost, PLOP uses a dynamic-programming-based cost model that finds the placement minimizing their weighted sum.
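One of the rewrites, decomposing a semantic join so a cheap relational predicate prunes pairs before the LLM predicate runs (cf. Figure 3), can be illustrated on toy data. The function, predicates, and rows below are all invented for illustration, not taken from the paper.

```python
# Hypothetical sketch of the semantic-join decomposition rewrite: the
# cheap relational predicate prunes the cross product before the LLM
# predicate runs. Data, predicates, and names are all invented.

def decomposed_semantic_join(left, right, rel_pred, sem_pred):
    """Cross product -> relational filter -> semantic filter."""
    llm_calls = 0
    out = []
    for l in left:
        for r in right:
            if not rel_pred(l, r):   # sigma placed below the semantic filter
                continue
            llm_calls += 1           # LLM evaluated only on surviving pairs
            if sem_pred(l, r):
                out.append((l, r))
    return out, llm_calls

movies = [("Dune", 2021), ("Heat", 1995)]
reviews = [("loved Dune", 2022), ("old classic", 1990)]
pairs, n = decomposed_semantic_join(
    movies, reviews,
    rel_pred=lambda m, r: r[1] >= m[1],   # sigma: review year >= movie year
    sem_pred=lambda m, r: m[0] in r[0],   # stand-in for "mentions {m}?"
)
print(n)   # -> 2: only 2 of the 4 pairs reach the LLM predicate
```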

What carries the argument

Dynamic-programming cost model that selects positions for semantic filters after reducing the plan via two equivalence-preserving rewrites.

If this is right

  • Deferring semantic filters to the latest position minimizes LLM calls when function caching is present.
  • On multi-table queries the relational costs can outweigh the savings from fewer LLM calls, requiring the cost model to choose earlier placements.
  • The selected placements preserve output quality measured as F1 score against both unoptimized baselines and human ground truth.
  • The overall approach yields lower total cost than heuristic placement while keeping the highest accuracy among compared systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same rewrite-plus-cost-model pattern could apply to other systems that combine expensive learned functions with classical relational operators.
  • Varying the weight between LLM and relational costs in the model would let users tune for different hardware or pricing regimes.
  • Extending the model to account for partial caching or varying model latency could improve predictions when queries run repeatedly.

Load-bearing premise

The dynamic-programming cost model must accurately estimate real LLM invocation costs including caching effects together with relational processing costs, and the two rewrites must leave query semantics and output quality unchanged.

What would settle it

Running the 44 test queries on actual hardware with measured LLM and relational times to check whether the placement chosen by the cost model produces lower total cost and runtime than the late-deferral placement or simple heuristics.

Figures

Figures reproduced from arXiv: 2604.09944 by Aditya Parameswaran, Alvin Cheung, Hanchen Li, Hangrui Zhou, Jiaxiang Yu, Qiuyang Mang, Runyuan He, Yufan Xiang.

Figure 1. Three hybrid query plans for Listing 1.

Figure 2. Pulling up SP and its dependent σ. (a) SP computes a sentiment score on all reviews before the join. (b) After pull-up, SP evaluates only reviews that have a matching book.

Figure 3. Decomposing a semantic join. (a) σ sits above the monolithic SJ, which evaluates the semantic predicate on all pairs. (b) After decomposition, σ is placed between × and SF, filtering pairs before LLM evaluation.

Figure 4. Illustrating the DP cost model from Alg. 2 at a join node.

Figure 5. Hybrid query benchmark: operator composition per query. Each stacked bar shows the count of …

Figure 6. Per-query latency (s) and LLM cost ($) on log scale across the 30-query hybrid benchmark. Subfigures (a, c) show …

Figure 7. Sensitivity to α. Left: LLM calls and latency as α varies. Right: plan trees under three settings. (a) Large α: both filters pushed down, 11 calls but 28.2 s. (b) Moderate α: SF1 pulled above the top join, 7 calls at 16.7 s. (c) Small α: both filters pulled up, 8 calls but 97.1 s as unfiltered joins dominate.

Figure 8. Sensitivity to selectivity estimates. (a) LLM cost …

Figure 9. Optimizer overhead by number of SFs. Percentages above bars show the optimizer's share of total query execution time. The optimizer takes under 0.12 s in all cases.
read the original abstract

Recent database systems have introduced semantic operators that leverage large language models (LLMs) to filter, join, and project over structured data using natural language predicates. In practice, these operators are combined with traditional relational operators, e.g., equi-joins, producing hybrid query plans whose execution cost depends on both expensive LLM calls and conventional database processing. A key optimization question is where to place each semantic operator relative to the relational operators in the plan: placing them earlier reduces the data that subsequent operators process, but requires more LLM calls; placing them later reduces LLM calls through deduplication, but forces relational operators to process larger intermediate data. Existing systems either ignore this placement question or apply simple heuristics without considering the full cost trade-off. We present PLOP, a plan-level optimizer for hybrid semantic-relational queries. PLOP reduces hybrid query planning to semantic filter placement via two equivalence-preserving rewrites. We prove that deferring all semantic filters to the latest possible position minimizes LLM invocations under function caching, but show that this can cause relational processing costs to dominate on complex multi-table queries. To balance LLM cost against relational cost, PLOP uses a dynamic-programming-based cost model that finds the placement minimizing their weighted sum. On 44 semantic SQL queries across five schemas and two benchmarks, PLOP achieves up to 1.5× speedup and 4.29× cost reduction while maintaining high output quality: an average F1 of 0.85 against the unoptimized baseline and 0.84 against human-annotated ground truth on SemBench. Overall, PLOP achieves a significant cost reduction while preserving the highest accuracy among six publicly available systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. PLOP is a plan-level optimizer for hybrid queries combining semantic operators (LLM-based filters, joins, projections) with traditional relational operators. It reduces placement decisions to semantic filter positioning via two equivalence-preserving rewrites, proves that maximal deferral of filters minimizes LLM invocations under function caching, and employs dynamic programming to select the placement minimizing the weighted sum of LLM and relational costs. On 44 semantic SQL queries across five schemas and two benchmarks, it reports up to 1.5× speedup and 4.29× cost reduction while preserving output quality (average F1 of 0.85 vs. unoptimized baseline and 0.84 vs. human ground truth on SemBench).

Significance. If the cost model accurately captures real LLM and relational execution costs (including caching), PLOP provides a principled, extensible method for optimizing hybrid semantic-relational plans that existing systems handle only heuristically. The equivalence-preserving rewrites and the stated proof that maximal deferral minimizes LLM calls under caching are clear technical strengths, as is the evaluation spanning multiple schemas and benchmarks with quality metrics against both baselines and ground truth. This work could inform query optimizers in LLM-augmented database systems.

major comments (2)
  1. [§4] §4 (dynamic-programming cost model): the estimates of per-call LLM costs, caching hit rates (which depend on post-relational deduplication), and relational operator costs are not validated against measured execution times or real LLM latency distributions. Because the reported 1.5× speedup and 4.29× cost reduction rest on the DP optimizer selecting placements that truly minimize the weighted cost, absence of this validation is load-bearing for the central empirical claim.
  2. [§5] §5 (evaluation): no sensitivity analysis is reported for the free weighting parameter that trades off LLM versus relational costs, nor are error bars or per-query variance provided for the speedup and cost-reduction figures. This leaves open whether the gains hold across different cost ratios or are sensitive to the specific weight chosen.
minor comments (3)
  1. [Abstract] Abstract: the phrase 'up to 1.5× speedup' does not indicate the specific query/schema achieving the maximum or any variance; adding this context would strengthen the summary claim.
  2. [§3] §3 (rewrites): while the deferral proof is stated, a short worked example showing how one of the two rewrites transforms a concrete hybrid query plan would improve readability for readers unfamiliar with the semantic operators.
  3. [§4] Notation: the weighted cost function used inside the DP recurrence should be defined with an explicit equation number at its first appearance rather than described only in prose.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address the two major comments point by point below, acknowledging the points where the manuscript can be strengthened and outlining the revisions we will make.

read point-by-point responses
  1. Referee: [§4] §4 (dynamic-programming cost model): the estimates of per-call LLM costs, caching hit rates (which depend on post-relational deduplication), and relational operator costs are not validated against measured execution times or real LLM latency distributions. Because the reported 1.5× speedup and 4.29× cost reduction rest on the DP optimizer selecting placements that truly minimize the weighted cost, absence of this validation is load-bearing for the central empirical claim.

    Authors: We agree that validating the cost-model parameters against real measurements is necessary to fully support the empirical claims. The manuscript derives LLM per-call costs from standard API pricing and average observed latencies, caching hit rates from the deduplication effects in the rewritten plans, and relational costs from conventional database micro-benchmarks; the DP then optimizes placements under this model. However, we did not include a direct comparison of predicted versus measured end-to-end execution times or latency distributions. In the revised manuscript we will add a dedicated validation subsection in §4 that reports measured runtimes and latencies on the benchmark queries and compares them to the model's predictions, thereby confirming that the relative cost trade-offs used by the optimizer are accurate. revision: yes

  2. Referee: [§5] §5 (evaluation): no sensitivity analysis is reported for the free weighting parameter that trades off LLM versus relational costs, nor are error bars or per-query variance provided for the speedup and cost-reduction figures. This leaves open whether the gains hold across different cost ratios or are sensitive to the specific weight chosen.

    Authors: We acknowledge that sensitivity analysis and variance reporting would strengthen the evaluation. The weighting parameter was selected to reflect typical cost ratios in our experimental environment, but the manuscript does not vary this parameter or report per-query variance or error bars. In the revised §5 we will add a sensitivity study that sweeps the weighting parameter over a representative range of values and will include error bars together with per-query variance for the reported aggregate speedup and cost-reduction figures. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper reduces hybrid planning to semantic filter placement via two equivalence-preserving rewrites, proves (via caching and deduplication logic) that maximal deferral minimizes LLM calls, and applies standard dynamic programming to minimize a weighted sum of estimated LLM and relational costs. No step reduces by construction to its own inputs, fitted parameters, or self-citation chains; the rewrites and proof are presented as independent logical arguments, the cost model is an estimator rather than a tautology, and reported speedups/cost reductions are measured empirical outcomes on 44 queries rather than forced predictions. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The approach rests on standard query optimization assumptions plus one tunable cost weight and the caching model; no new entities are postulated.

free parameters (1)
  • cost weight between LLM and relational costs
    The weighting factor in the objective minimized by dynamic programming; its value determines the placement chosen and is not derived from first principles.
axioms (2)
  • domain assumption Two equivalence-preserving rewrites reduce hybrid planning to semantic filter placement
    Invoked to simplify the search space; assumed to preserve semantics for all semantic filters.
  • domain assumption LLM function caching makes repeated calls on identical inputs free after the first
    Central to the proof that maximal deferral minimizes LLM invocations.
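The caching axiom can be made concrete with a toy memoized predicate. The function name, the review strings, and the use of `functools.lru_cache` are all stand-ins; a real system's caching layer would look different, but the accounting is the same.

```python
import functools

llm_calls = 0  # counts actual (non-cached) model invocations

@functools.lru_cache(maxsize=None)
def semantic_filter(text: str) -> bool:
    """Stand-in for an LLM predicate; lru_cache models function caching,
    so a repeated identical input is free after the first call."""
    global llm_calls
    llm_calls += 1
    return "great" in text

reviews = ["great book", "terrible", "great book", "great book"]
kept = [r for r in reviews if semantic_filter(r)]
print(llm_calls)   # -> 2: two distinct inputs, two model calls, not four
```

Under this axiom, deferring the filter past operators that duplicate or drop rows changes only how many distinct inputs survive, which is why maximal deferral minimizes invocations.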

pith-pipeline@v0.9.0 · 5629 in / 1507 out tokens · 75300 ms · 2026-05-10T15:36:58.506504+00:00 · methodology


Reference graph

Works this paper leans on

37 extracted references · 15 canonical work pages · 3 internal anchors
