LASER: A Data-Centric Method for Low-Cost and Efficient SQL Rewriting based on SQL-GRPO
Pith reviewed 2026-05-10 17:10 UTC · model grok-4.3
The pith
LASER trains small language models on MCTS-generated slow queries using SQL-GRPO to rewrite SQL for better execution efficiency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By constructing SQL-MCTS, a large-scale corpus of complex slow queries evolved from seeds using rule-guided anti-patterns and LLM mutations, and applying SQL-GRPO with anchored group advantage and complexity-adaptive dynamic rollouts, small models can autonomously learn execution-verified rewriting patterns that deliver superior efficiency and robust zero-shot transferability.
What carries the argument
SQL-GRPO, an adaptation of Group Relative Policy Optimization that integrates Anchored Group Advantage for refined advantage estimation and Complexity-Adaptive Dynamic Rollout for efficient exploration, teaching the model latency-aware rewriting.
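Anchored Group Advantage is only named here, not defined. One plausible reading, sketched under that assumption with illustrative names, folds an anchor reward (e.g. the measured reward of the unrewritten query) into the GRPO group statistics, so a rollout is credited only to the extent it beats the anchor-augmented group:

```python
from statistics import mean, pstdev

def anchored_group_advantage(rollout_rewards, anchor_reward):
    """Hypothetical anchored variant of GRPO's group-relative advantage.

    Standard GRPO normalizes each rollout's reward against the group mean
    and std; here the anchor is folded into those statistics, so rollouts
    that merely match the anchor earn ~zero advantage.
    """
    group = list(rollout_rewards) + [anchor_reward]
    mu, sigma = mean(group), pstdev(group)
    if sigma == 0.0:  # all rewards identical: no learning signal
        return [0.0 for _ in rollout_rewards]
    return [(r - mu) / sigma for r in rollout_rewards]

# A rollout matching the anchor reward of 1.0 gets ~zero advantage.
advs = anchored_group_advantage([1.0, 1.4, 0.6], anchor_reward=1.0)
```

Whether the paper anchors on the original query's reward, a reference rewrite, or something else entirely is unknown; the point is only that an anchor reshapes the zero point of the advantage.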
If this is right
- Compact models outperform rule-based systems and LLMs on execution efficiency for rewritten queries.
- Zero-shot transferability reduces the need for domain-specific retraining when facing new query workloads.
- Minimal inference overhead makes the approach suitable for production database environments.
- Data generation through hybrid MCTS expansion provides a scalable way to create training examples without manual annotation.
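The hybrid expansion can be caricatured in a few lines. This is a minimal sketch, not the paper's implementation: the anti-pattern mutators, the LLM stand-in, and the length-as-latency proxy are all invented, and the UCB selection and backpropagation of a full MCTS are omitted; only the mutate-then-execution-verify gating survives:

```python
import random

# Illustrative rule-guided anti-pattern mutators (the paper's actual rule
# set is not public); each wraps the query so composition stays valid SQL.
RULE_MUTATORS = [
    lambda q: f"SELECT * FROM ({q}) AS s ORDER BY 1",    # needless sort
    lambda q: f"SELECT DISTINCT * FROM ({q}) AS s",      # needless dedup
    lambda q: f"SELECT * FROM ({q}) AS s WHERE 1 = 1",   # opaque predicate
]

def llm_mutate(q):
    # Stand-in for an LLM mutation call: a join that can never match.
    return f"SELECT s.* FROM ({q}) AS s LEFT JOIN ({q}) AS s2 ON 1 = 0"

def measured_latency(q):
    # Stand-in for real execution; crude proxy: more operators, slower plan.
    return len(q)

def expand_one_path(seed_sql, steps=5, rng_seed=0):
    """One greedy expansion path of the hybrid strategy: mutate, then
    execution-verify, keeping only variants measured as slower."""
    rng = random.Random(rng_seed)
    best, t_best = seed_sql, measured_latency(seed_sql)
    for _ in range(steps):
        mutate = rng.choice(RULE_MUTATORS + [llm_mutate])
        candidate = mutate(best)
        t = measured_latency(candidate)
        if t > t_best:  # execution-verified slower variant
            best, t_best = candidate, t
    return best

slow = expand_one_path("SELECT id FROM t JOIN u ON t.id = u.id")
```

The real pipeline would measure wall-clock latency on a live database and use tree search rather than a single greedy path, but the verify-before-keep discipline is the part that matters for data quality.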
Where Pith is reading between the lines
- Integrating such models into query optimizers could automate performance tuning in databases used by non-experts.
- Extending the MCTS data synthesis to other optimization problems like index selection or join ordering might yield similar gains.
- Lower model size could enable on-device or edge database query optimization in resource-constrained settings.
Load-bearing premise
The MCTS-generated synthetic slow queries capture the variety of performance bottlenecks present in real database workloads sufficiently well for the learned rewriting rules to apply broadly.
What would settle it
Evaluating the LASER-trained model on a collection of actual production SQL queries from diverse database systems and comparing the resulting execution times against unoptimized and baseline-rewritten versions would test if the claimed improvements hold.
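That experiment can be miniaturized with SQLite from the standard library: spot-check row-level equivalence, then compare median latencies of a non-sargable predicate against its sargable rewrite. The schema and query pair below are invented for illustration, not taken from the paper:

```python
import sqlite3
import statistics
import time

def median_latency(conn, sql, runs=7):
    """Median wall-clock time to fully materialize a query's result set."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        times.append(time.perf_counter() - start)
    return statistics.median(times)

def same_rows(conn, a, b):
    """Semantic-equivalence spot check: identical row multisets."""
    return sorted(conn.execute(a).fetchall()) == sorted(conn.execute(b).fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("CREATE INDEX idx_x ON t (x)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(50_000)])

slow = "SELECT x FROM t WHERE x + 0 = 49999"   # non-sargable: forces a scan
fast = "SELECT x FROM t WHERE x = 49999"       # sargable: can use the index

equivalent = same_rows(conn, slow, fast)
speedup = median_latency(conn, slow) / median_latency(conn, fast)
```

A production evaluation would add warm/cold cache control, multiple engines, and statistical tests, but even this toy harness enforces the two criteria that matter: equivalence first, latency second.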
Original abstract
Query rewriting, the process of transforming queries into semantically equivalent yet more efficient variants, is crucial for database optimization. Existing solutions predominantly rely on either rule-based heuristics or Large Language Models (LLMs). However, traditional rule-based methods lack adaptability, while LLM-based approaches incur prohibitive inference costs and privacy risks. In contrast, Small Language Models (SLMs) present a compelling middle ground, potentially offering both flexibility and efficiency. However, the development of such compact models is severely bottlenecked by the scarcity of high-quality, domain-specific training data. To bridge this gap, we introduce LASER, a data-centric framework designed to empower small models for robust SQL optimization. First, to address the scarcity of existing benchmarks and the limited optimization headroom of generic synthetic queries, we construct SQL-MCTS, a large-scale corpus of complex slow queries. We employ an MCTS-based hybrid expansion strategy that combines rule-guided anti-patterns with LLM mutations to evolve structurally expressive seeds into execution-verified slow variants. Second, to enable the model to autonomously discover latency-aware rewriting patterns, we propose SQL-GRPO, a specialized alignment strategy adapted from Group Relative Policy Optimization. By integrating Anchored Group Advantage to refine advantage estimation and Complexity-Adaptive Dynamic Rollout to efficiently allocate exploration budgets, this approach effectively empowers compact models to master execution-based optimization logic. Implemented on Qwen3 models, LASER significantly outperforms rule-based systems and LLMs in execution efficiency, while exhibiting robust zero-shot transferability with minimal overhead.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces LASER, a data-centric framework for low-cost SQL query rewriting with small language models. It first builds SQL-MCTS, a large corpus of complex slow queries, via a hybrid MCTS expansion strategy that combines rule-guided anti-patterns with LLM mutations on seed queries followed by execution verification. It then applies SQL-GRPO, an adaptation of Group Relative Policy Optimization that incorporates Anchored Group Advantage for refined advantage estimation and Complexity-Adaptive Dynamic Rollout for efficient exploration, to train compact models (Qwen3) to discover latency-aware rewriting patterns. The central claim is that LASER significantly outperforms both rule-based systems and LLMs in execution efficiency while showing robust zero-shot transferability with minimal overhead.
Significance. If the experimental results and generalization claims hold, LASER would offer a practical middle ground between rigid rule-based optimizers and high-cost LLM-based rewriters, directly addressing data scarcity for domain-specific SLM training in databases. The explicit use of execution-verified synthetic data generation and the two GRPO adaptations (Anchored Group Advantage, Complexity-Adaptive Dynamic Rollout) constitute concrete, reusable contributions to applying RL-style alignment to query optimization.
Major comments (2)
- [Abstract] The claims that LASER 'significantly outperforms rule-based systems and LLMs in execution efficiency' and exhibits 'robust zero-shot transferability' are presented without numerical results, baseline names, latency-reduction percentages, success rates, dataset sizes, or statistical tests, making it impossible to judge the magnitude or reliability of the central performance claims.
- [SQL-MCTS construction] The hybrid MCTS strategy (rule-guided anti-patterns + LLM mutations + execution verification) is load-bearing for the zero-shot transfer claims, yet the manuscript supplies no analysis comparing the generated query distribution (structural mutations, latency profiles) against real production slow-query logs or external benchmarks such as TPC-DS or industry traces; without such validation, the risk that learned patterns exploit generation artifacts rather than general optimization logic remains unaddressed.
Minor comments (2)
- [Throughout] Ensure that all acronyms (SLM, MCTS, GRPO, SQL-GRPO) are defined at first use and used consistently in equations and figure captions.
- [Experimental figures] Figure captions for any latency or success-rate plots should explicitly state the number of queries, number of runs, and error bars or statistical tests used.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights opportunities to strengthen the clarity and validation of our claims. We address each major comment below and outline the corresponding revisions.
Point-by-point responses
- Referee: [Abstract] The claims that LASER 'significantly outperforms rule-based systems and LLMs in execution efficiency' and exhibits 'robust zero-shot transferability' are presented without numerical results, baseline names, latency-reduction percentages, success rates, dataset sizes, or statistical tests, making it impossible to judge the magnitude or reliability of the central performance claims.
Authors: We agree that the abstract would be more informative with explicit quantitative support for the performance claims. In the revised manuscript we will expand the abstract to report key metrics from our experiments, including average latency reductions relative to rule-based baselines and LLMs, success rates on the evaluated workloads, dataset sizes used for training and testing, and references to the specific benchmarks (e.g., TPC-DS variants). This change will make the magnitude and reliability of the results immediately apparent to readers. revision: yes
- Referee: [SQL-MCTS construction] The hybrid MCTS strategy (rule-guided anti-patterns + LLM mutations + execution verification) is load-bearing for the zero-shot transfer claims, yet the manuscript supplies no analysis comparing the generated query distribution (structural mutations, latency profiles) against real production slow-query logs or external benchmarks such as TPC-DS or industry traces; without such validation, the risk that learned patterns exploit generation artifacts rather than general optimization logic remains unaddressed.
Authors: We acknowledge the value of distributional validation for the SQL-MCTS corpus. Our current evaluation already demonstrates zero-shot transfer on standard benchmarks including TPC-DS, and the execution-verification step ensures only genuinely slow queries are retained. However, the manuscript does not include an explicit side-by-side comparison of structural features or latency profiles against TPC-DS or external traces. We will add a dedicated analysis subsection that quantifies these aspects (e.g., query complexity distributions, anti-pattern coverage, and latency histograms) relative to TPC-DS and any available public query logs, thereby addressing the concern about potential generation artifacts. revision: partial
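A minimal version of the distributional analysis the authors promise could look like the following. This is a sketch under stated assumptions: regex-level structural features stand in for a real AST-based comparison (e.g. via SQLGlot, which the paper already uses), and the feature set and gap metric are invented for illustration:

```python
import re
from collections import Counter

# Crude structural features; a real analysis would parse query ASTs.
FEATURES = {
    "join": r"\bJOIN\b",
    "subquery": r"\(\s*SELECT\b",
    "group_by": r"\bGROUP\s+BY\b",
    "order_by": r"\bORDER\s+BY\b",
}

def feature_profile(queries):
    """Fraction of queries in a workload exhibiting each feature."""
    hits = Counter()
    for q in queries:
        for name, pattern in FEATURES.items():
            if re.search(pattern, q, re.IGNORECASE):
                hits[name] += 1
    n = max(len(queries), 1)
    return {name: hits[name] / n for name in FEATURES}

def profile_gap(p, q):
    """Max absolute per-feature gap between two workload profiles;
    0 means the profiles coincide on every tracked feature."""
    return max(abs(p[k] - q[k]) for k in FEATURES)

synthetic = ["SELECT a FROM t JOIN u ON t.id = u.id ORDER BY a",
             "SELECT x FROM (SELECT x FROM t) AS s"]
reference = ["SELECT a FROM t JOIN u ON t.id = u.id GROUP BY a"]
gap = profile_gap(feature_profile(synthetic), feature_profile(reference))
```

Run over SQL-MCTS on one side and TPC-DS or production slow-query logs on the other, a per-feature gap table (plus latency histograms) would directly address the generation-artifact concern.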
Circularity Check
No significant circularity; empirical pipeline is self-contained
Full rationale
The paper's core contributions are the SQL-MCTS data generation procedure (MCTS hybrid expansion + execution verification) and the SQL-GRPO alignment method (Anchored Group Advantage + Complexity-Adaptive Dynamic Rollout, adapted from published GRPO). Neither reduces by construction to its inputs: synthetic queries are generated and then filtered by runtime measurement, while the policy is trained and evaluated on held-out splits with external baselines. No equations equate a claimed performance gain to a fitted constant, no uniqueness theorem is imported from the authors' prior work, and no ansatz is smuggled via self-citation. The zero-shot transfer claims rest on experimental results rather than definitional equivalence, making the derivation chain non-circular.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: MCTS hybrid expansion with rule-guided anti-patterns and LLM mutations produces structurally diverse, execution-verified slow queries that are useful for training.
- Domain assumption: Execution latency provides a reliable, low-noise reward signal for policy optimization in SQL rewriting.
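The second assumption is the shakier one, since wall-clock latency is notoriously noisy. A sketch of one standard mitigation (hypothetical here, not taken from the paper) combines median-of-runs, a log-ratio reward, and clipping:

```python
import math
import statistics

def latency_reward(base_times, rewrite_times, clip=3.0):
    """Noise-damped latency reward (illustrative, not the paper's).

    Median over repeated executions absorbs run-to-run jitter, the log of
    the speedup ratio treats gains and regressions symmetrically around 0,
    and clipping keeps one lucky or timed-out rollout from dominating the
    advantage group.
    """
    speedup = statistics.median(base_times) / statistics.median(rewrite_times)
    return max(-clip, min(clip, math.log(speedup)))

# A ~2x median speedup scores positively despite one noisy base run.
r = latency_reward([1.0, 1.1, 5.0], [0.50, 0.55, 0.52])
```

If the paper instead relies on single-shot latency measurements, the axiom is doing substantially more work than this kind of damping would require.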