PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans
Pith reviewed 2026-05-10 15:36 UTC · model grok-4.3
The pith
PLOP reduces hybrid semantic-relational query planning to semantic filter placement and uses dynamic programming to select positions that minimize the combined LLM and relational processing costs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PLOP reduces hybrid query planning to semantic filter placement via two equivalence-preserving rewrites. It proves that deferring all semantic filters to the latest possible position minimizes LLM invocations under function caching, but shows that this can cause relational processing costs to dominate on complex multi-table queries. To balance LLM cost against relational cost, PLOP uses a dynamic-programming-based cost model that finds the placement minimizing their weighted sum.
What carries the argument
Dynamic-programming cost model that selects positions for semantic filters after reducing the plan via two equivalence-preserving rewrites.
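The mechanism can be sketched in a few lines: enumerate candidate positions for each semantic filter and score every placement by a weighted sum of LLM and relational cost. The sketch below is a toy exhaustive search over the same space the paper's dynamic program prunes; all cardinalities, selectivities, and prices are invented, and none of the names come from PLOP itself:

```python
from itertools import product

# Toy left-deep plan: base_card[i] is the estimated row count flowing into
# step i if no semantic filter has run yet. All numbers are invented.
base_card = [10_000, 50_000, 5_000]
REL_COST_PER_ROW = 1e-6        # hypothetical relational cost per row
LLM_COST_PER_CALL = 1e-3       # hypothetical price of one LLM invocation

filters = {                    # semantic filters and their selectivities
    "is_positive_review": 0.2,
    "mentions_battery": 0.5,
}

def plan_cost(placement, weight=1.0):
    """Weighted cost of one placement, a dict {filter_name: step_index}."""
    llm = rel = 0.0
    for step, card in enumerate(base_card):
        for name, pos in placement.items():
            if pos < step:                 # earlier filters shrink this step
                card *= filters[name]
        rel += card * REL_COST_PER_ROW
        for name, pos in placement.items():
            if pos == step:                # one LLM call per surviving row
                llm += card * LLM_COST_PER_CALL
    return weight * llm + rel

# Exhaustive search over the placement space the paper's DP prunes.
steps = range(len(base_card))
best = min(
    (dict(zip(filters, pos)) for pos in product(steps, repeat=len(filters))),
    key=plan_cost,
)
print(best, round(plan_cost(best), 4))
```

With these invented numbers the search picks the last position for both filters, since the post-join cardinality is smallest there; the real optimizer replaces the toy cardinality model with estimated caching and relational costs.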
If this is right
- Deferring semantic filters to the latest position minimizes LLM calls when function caching is present.
- On multi-table queries the relational costs can outweigh the savings from fewer LLM calls, requiring the cost model to choose earlier placements.
- The selected placements preserve output quality measured as F1 score against both unoptimized baselines and human ground truth.
- The overall approach yields lower total cost than heuristic placement while keeping the highest accuracy among compared systems.
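The first two bullets can be grounded in a toy two-table plan where a relational predicate and function caching interact with one semantic filter over product descriptions. Every count below is invented for illustration:

```python
# Toy two-table plan: products joined to reviews, plus one relational
# predicate and one semantic filter on the product description.
products = 1_000
fanout = 50          # reviews joined to each product
kept_by_rel = 600    # products surviving the relational predicate
kept_by_sem = 200    # products the semantic filter would keep

# Early placement: the semantic filter runs first, so the LLM sees every
# product, but the join only processes the rows the filter kept.
early_llm_calls = products                 # 1,000 calls
early_join_rows = kept_by_sem * fanout     # 10,000 rows through the join

# Late placement (maximal deferral): join and relational predicate run
# first; function caching deduplicates identical filter inputs, so the
# LLM sees only the distinct products that survived.
late_llm_calls = kept_by_rel               # 600 calls
late_join_rows = products * fanout         # 50,000 rows through the join

print(early_llm_calls, late_llm_calls)     # deferral saves 400 LLM calls
print(early_join_rows, late_join_rows)     # but joins 5x as many rows
```

Which side wins depends on the relative prices, which is exactly the weighted sum the cost model arbitrates.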
Where Pith is reading between the lines
- The same rewrite-plus-cost-model pattern could apply to other systems that combine expensive learned functions with classical relational operators.
- Varying the weight between LLM and relational costs in the model would let users tune for different hardware or pricing regimes.
- Extending the model to account for partial caching or varying model latency could improve predictions when queries run repeatedly.
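The tuning idea in the second bullet is easy to make concrete: sweeping the weight flips the model's choice between an LLM-heavy and a relational-heavy placement. A minimal sketch with invented unit costs:

```python
# Two candidate placements for the same query, with invented unit costs:
# "early" pays more LLM calls but prunes data for the relational engine;
# "late" pays fewer LLM calls but joins larger intermediates.
early = {"llm": 1.00, "rel": 0.10}
late = {"llm": 0.60, "rel": 0.40}

def total(p, w):
    """The weighted sum the cost model minimizes: w * LLM + relational."""
    return w * p["llm"] + p["rel"]

# Sweeping w emulates different pricing regimes: a cheap local model
# (low w) favors early placement; a metered API (high w) favors deferral.
for w in (0.1, 1.0, 10.0):
    choice = "early" if total(early, w) < total(late, w) else "late"
    print(w, choice)
```

Under these numbers the crossover sits between w = 0.1 and w = 1.0, which is why a sensitivity study over the weight (as the referee below also requests) matters.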
Load-bearing premise
The dynamic-programming cost model must accurately estimate real LLM invocation costs including caching effects together with relational processing costs, and the two rewrites must leave query semantics and output quality unchanged.
What would settle it
Running the 44 test queries on actual hardware with measured LLM and relational times to check whether the placement chosen by the cost model produces lower total cost and runtime than the late-deferral placement or simple heuristics.
Original abstract
Recent database systems have introduced semantic operators that leverage large language models (LLMs) to filter, join, and project over structured data using natural language predicates. In practice, these operators are combined with traditional relational operators, e.g., equi-joins, producing hybrid query plans whose execution cost depends on both expensive LLM calls and conventional database processing. A key optimization question is where to place each semantic operator relative to the relational operators in the plan: placing them earlier reduces the data that subsequent operators process, but requires more LLM calls; placing them later reduces LLM calls through deduplication, but forces relational operators to process larger intermediate data. Existing systems either ignore this placement question or apply simple heuristics without considering the full cost trade-off. We present PLOP, a plan-level optimizer for hybrid semantic-relational queries. PLOP reduces hybrid query planning to semantic filter placement via two equivalence-preserving rewrites. We prove that deferring all semantic filters to the latest possible position minimizes LLM invocations under function caching, but show that this can cause relational processing costs to dominate on complex multi-table queries. To balance LLM cost against relational cost, PLOP uses a dynamic-programming-based cost model that finds the placement minimizing their weighted sum. On 44 semantic SQL queries across five schemas and two benchmarks, PLOP achieves up to 1.5× speedup and 4.29× cost reduction while maintaining high output quality: an average F1 of 0.85 against the unoptimized baseline and 0.84 against human-annotated ground truth on SemBench. Overall, PLOP achieves a significant cost reduction while preserving the highest accuracy among six publicly available systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. PLOP is a plan-level optimizer for hybrid queries combining semantic operators (LLM-based filters, joins, projections) with traditional relational operators. It reduces placement decisions to semantic filter positioning via two equivalence-preserving rewrites, proves that maximal deferral of filters minimizes LLM invocations under function caching, and employs dynamic programming to select the placement minimizing the weighted sum of LLM and relational costs. On 44 semantic SQL queries across five schemas and two benchmarks, it reports up to 1.5× speedup and 4.29× cost reduction while preserving output quality (average F1 of 0.85 vs. unoptimized baseline and 0.84 vs. human ground truth on SemBench).
Significance. If the cost model accurately captures real LLM and relational execution costs (including caching), PLOP provides a principled, extensible method for optimizing hybrid semantic-relational plans that existing systems handle only heuristically. The equivalence-preserving rewrites and the stated proof that maximal deferral minimizes LLM calls under caching are clear technical strengths, as is the evaluation spanning multiple schemas and benchmarks with quality metrics against both baselines and ground truth. This work could inform query optimizers in LLM-augmented database systems.
major comments (2)
- [§4] §4 (dynamic-programming cost model): the estimates of per-call LLM costs, caching hit rates (which depend on post-relational deduplication), and relational operator costs are not validated against measured execution times or real LLM latency distributions. Because the reported 1.5× speedup and 4.29× cost reduction rest on the DP optimizer selecting placements that truly minimize the weighted cost, absence of this validation is load-bearing for the central empirical claim.
- [§5] §5 (evaluation): no sensitivity analysis is reported for the free weighting parameter that trades off LLM versus relational costs, nor are error bars or per-query variance provided for the speedup and cost-reduction figures. This leaves open whether the gains hold across different cost ratios or are sensitive to the specific weight chosen.
minor comments (3)
- [Abstract] Abstract: the phrase 'up to 1.5× speedup' does not indicate the specific query/schema achieving the maximum or any variance; adding this context would strengthen the summary claim.
- [§3] §3 (rewrites): while the deferral proof is stated, a short worked example showing how one of the two rewrites transforms a concrete hybrid query plan would improve readability for readers unfamiliar with the semantic operators.
- [§4] Notation: the weighted cost function used inside the DP recurrence should be defined with an explicit equation number at its first appearance rather than described only in prose.
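One plausible explicit form of that objective, in symbols of our own choosing rather than the paper's, would be:

```latex
% Illustrative notation only: F is the set of semantic filters, P maps
% each filter to a plan position, and \lambda is the free cost weight.
C(P) = \lambda\, C_{\mathrm{LLM}}(P) + C_{\mathrm{rel}}(P),
\qquad
C_{\mathrm{LLM}}(P) = \sum_{f \in F} c_f \cdot n_f(P),
\qquad
P^{*} = \operatorname*{arg\,min}_{P} C(P)
```

where $c_f$ is the per-call price of filter $f$ and $n_f(P)$ is the number of distinct (i.e., post-caching) inputs that reach $f$ at its assigned position.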
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. We address the two major comments point by point below, acknowledging the points where the manuscript can be strengthened and outlining the revisions we will make.
Point-by-point responses
Referee: [§4] §4 (dynamic-programming cost model): the estimates of per-call LLM costs, caching hit rates (which depend on post-relational deduplication), and relational operator costs are not validated against measured execution times or real LLM latency distributions. Because the reported 1.5× speedup and 4.29× cost reduction rest on the DP optimizer selecting placements that truly minimize the weighted cost, absence of this validation is load-bearing for the central empirical claim.
Authors: We agree that validating the cost-model parameters against real measurements is necessary to fully support the empirical claims. The manuscript derives LLM per-call costs from standard API pricing and average observed latencies, caching hit rates from the deduplication effects in the rewritten plans, and relational costs from conventional database micro-benchmarks; the DP then optimizes placements under this model. However, we did not include a direct comparison of predicted versus measured end-to-end execution times or latency distributions. In the revised manuscript we will add a dedicated validation subsection in §4 that reports measured runtimes and latencies on the benchmark queries and compares them to the model's predictions, thereby confirming that the relative cost trade-offs used by the optimizer are accurate. revision: yes
Referee: [§5] §5 (evaluation): no sensitivity analysis is reported for the free weighting parameter that trades off LLM versus relational costs, nor are error bars or per-query variance provided for the speedup and cost-reduction figures. This leaves open whether the gains hold across different cost ratios or are sensitive to the specific weight chosen.
Authors: We acknowledge that sensitivity analysis and variance reporting would strengthen the evaluation. The weighting parameter was selected to reflect typical cost ratios in our experimental environment, but the manuscript does not vary this parameter or report per-query variance or error bars. In the revised §5 we will add a sensitivity study that sweeps the weighting parameter over a representative range of values and will include error bars together with per-query variance for the reported aggregate speedup and cost-reduction figures. revision: yes
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper reduces hybrid planning to semantic filter placement via two equivalence-preserving rewrites, proves (via caching and deduplication arguments) that maximal deferral minimizes LLM calls, and applies standard dynamic programming to minimize a weighted sum of estimated LLM and relational costs. No step depends on its own outputs, fitted parameters, or self-citation chains: the rewrites and the deferral proof are independent logical arguments, the cost model is an estimator rather than a tautology, and the reported speedups and cost reductions are empirical measurements on 44 queries rather than model predictions. The derivation chain is checked against external benchmarks rather than being self-referential.
Axiom & Free-Parameter Ledger
free parameters (1)
- cost weight between LLM and relational costs
axioms (2)
- domain assumption: Two equivalence-preserving rewrites reduce hybrid planning to semantic filter placement
- domain assumption: LLM function caching makes repeated calls on identical inputs free after the first
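The second axiom is ordinary memoization keyed on the filter's input. A minimal sketch, with `llm_filter` as a stand-in for a real model call rather than any actual API:

```python
from functools import lru_cache

backend_calls = 0   # counts actual (uncached) model invocations

@lru_cache(maxsize=None)
def llm_filter(predicate: str, value: str) -> bool:
    """Stand-in for a real LLM call, memoized on (predicate, value)."""
    global backend_calls
    backend_calls += 1
    return "battery" in value.lower()   # placeholder for the model's answer

# A join fans the same product description out to many rows:
rows = ["Battery pack"] * 5 + ["USB cable"] * 3
kept = [r for r in rows if llm_filter("mentions a battery?", r)]
print(len(kept), backend_calls)   # 8 rows scanned, only 2 backend calls
```

Deferring the filter past joins and relational predicates cannot increase, and usually decreases, the number of distinct inputs, which is the substance of the deferral proof.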
Reference graph
Works this paper leans on
- [1] Samuel Arch, Yuchen Liu, Todd C. Mowry, Jignesh M. Patel, and Andrew Pavlo. 2024. The Key to Effective UDF Optimization: Before Inlining, First Perform Outlining. Proceedings of the VLDB Endowment 18, 1 (2024).
- [3] Konstantinos Chasialis, Yannis Foufoulas, Alkis Simitsis, and Yannis Ioannidis. 2025. Optimizing UDF Queries in SQL Data Engines. (2025).
- [5] Surajit Chaudhuri and Kyuseok Shim. 1999. Optimization of Queries with User-Defined Predicates. ACM Transactions on Database Systems 24, 2 (1999), 177–228.
- [6] Anas Dorbani, Sunny Yasser, Jimmy Lin, and Amine Mhedhbi. 2025. Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB. arXiv preprint arXiv:2504.01157 (2025).
- [7] Tom Ebergen. 2022. Join Order Optimization with (Almost) No Statistics. Master's thesis. Vrije Universiteit Amsterdam.
- [8] Kai Franz, Samuel Arch, Denis Hirn, Torsten Grust, Todd C. Mowry, and Andrew Pavlo. 2024. Dear User-Defined Functions, Inlining Isn't Working Out So Great for Us. Let's Try Batching to Make Our Relationship Work. Sincerely, SQL. In CIDR.
- [9] Victor Giannakouris and Immanuel Trummer. 2025. λ-Tune: Harnessing Large Language Models for Automated Database System Tuning. Proceedings of the ACM on Management of Data 3, 1 (2025), 1–26.
- [10] Joseph M. Hellerstein and Michael Stonebraker. 1993. Predicate Migration: Optimizing Queries with Expensive Predicates. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. ACM, 267–276.
- [12] Saehan Jo and Immanuel Trummer. 2024. ThalamusDB: Approximate Query Processing on Multi-Modal Data. Proceedings of the ACM on Management of Data 2, 3 (2024), 1–26.
- [13] Daniel Kang, Edward Gan, Peter Bailis, Tatsunori Hashimoto, and Matei Zaharia.
- [15] Udesh Kumarasinghe, Tyler Liu, Chunwei Liu, and Walid G. Aref. 2026. iPDB: Optimizing SQL Queries with ML and LLM Predicates. arXiv preprint arXiv:2601.16432 (2026).
- [16] Jiale Lao, Andreas Zimmerer, Olga Ovcharenko, Tianji Cong, Matthew Russo, Gerardo Vitagliano, Michael Cochez, Fatma Özcan, Gautam Gupta, Thibaud Hottelier, H. V. Jagadish, Kris Kissel, Sebastian Schelter, Andreas Kipf, and Immanuel Trummer. 2026. SemBench: A Benchmark for Semantic Query Processing Engines. arXiv:2511.01716 [cs.DB]. https://arxiv.org/abs/...
- [18] Yiming Lin, Madelon Hulsebos, Ruiying Ma, Shreya Shankar, Sepanta Zeighami, and Aditya G. Parameswaran. 2025. Towards Accurate and Efficient Document Analytics with Large Language Models. In Proceedings of the IEEE International Conference on Data Engineering (ICDE).
- [19] Paweł Liskowski, Benjamin Han, Paritosh Aggarwal, Bowei Chen, Boxin Jiang, Nitish Jindal, Zihan Li, Aaron Lin, Kyle Schmaus, Jay Tayade, Weicheng Zhao, Anupam Datta, Nathan Wiegand, and Dimitris Tsirogiannis. 2025. Cortex AISQL: A Production SQL Engine for Unstructured Data. arXiv:2511.07663 [cs.DB].
- [20] Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baile Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, Rana Shahout, et al. 2025. Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing. In Proceedings of the Conference on Innovative Database Research (CIDR).
- [21] Shu Liu, Asim Biswal, Amog Kamsetty, Audrey Cheng, Luis G. Schroeder, Liana Patel, Shiyi Cao, Xiangxi Mo, Ion Stoica, Joseph E. Gonzalez, et al. 2025. Optimizing LLM Queries in Relational Data Analytics Workloads. Proceedings of Machine Learning and Systems 7 (2025).
- [23] Qiuyang Mang, Runyuan He, Suyang Zhong, Xiaoxuan Liu, Huanchen Zhang, and Alvin Cheung. 2026. Automated Discovery of Test Oracles for Database Management Systems Using LLMs. Proceedings of the ACM on Management of Data 4, 3 (2026), 1–40.
- [24] Liana Patel, Siddharth Jha, Melissa Pan, Harshit Gupta, Parth Asawa, Carlos Guestrin, and Matei Zaharia. 2025. Semantic Operators and Their Optimization: Enabling LLM-Based Data Processing with Accuracy Guarantees in LOTUS. Proceedings of the VLDB Endowment 18, 11 (2025), 4171–4184.
- [26] Mark Raasveldt and Hannes Mühleisen. 2019. DuckDB: An Embeddable Analytical Database. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD). ACM, 1981–1984.
- [28] Dario Satriani, Enzo Veltri, Donatello Santoro, Sara Rosato, Simone Varriale, and Paolo Papotti. 2025. Logical and Physical Optimizations for SQL Query Execution over Large Language Models. Proceedings of the ACM on Management of Data 3, 3 (2025), 1–28.
- [30] Mihail Stoian and Andreas Kipf. 2024. DPconv: Super-Polynomially Faster Join Ordering. Proceedings of the ACM on Management of Data 2, 6 (2024), 1–26.
- [32] Transaction Processing Performance Council. 2024. TPC-H Benchmark Specification. https://www.tpc.org/tpch/. Accessed 2026-03-31.
- [33] Johannes Wehrstein, Tiemo Bang, Roman Heinrich, and Carsten Binnig. 2025. GRACEFUL: A Learned Cost Estimator for UDFs. In 2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE, 2450–2463.
- [36] Xuanhe Zhou, Zhaoyan Sun, and Guoliang Li. 2024. DB-GPT: Large Language Model Meets Database. Data Science and Engineering 9, 1 (2024), 102–111.