SQLConductor: Search-to-Policy Learning for Step-wise Text-to-SQL Orchestration

Boyan Li; Yizhang Zhu; Yuyu Luo; Zhangyang Peng

arxiv: 2606.23537 · v1 · pith:ZD7TWKPTnew · submitted 2026-06-22 · 💻 cs.DB · cs.AI· cs.LG

SQLConductor: Search-to-Policy Learning for Step-wise Text-to-SQL Orchestration

Yizhang Zhu , Zhangyang Peng , Boyan Li , Yuyu Luo This is my paper

Pith reviewed 2026-06-26 06:17 UTC · model grok-4.3

classification 💻 cs.DB cs.AIcs.LG

keywords Text-to-SQLorchestration policyMonte Carlo Tree Searchworkflow compositionexecution accuracygeneralizationreinforcement learning

0 comments

The pith

A learned policy selects Text-to-SQL actions step-by-step using stability signals from tree search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SQLConductor as a framework that learns a policy for choosing the next subtask in Text-to-SQL workflows instead of using fixed pipelines or committing to complete plans in advance. It applies Monte Carlo Tree Search to explore possible sequences of actions and derives stable supervision signals to train the policy model through weighted fine-tuning and curriculum reinforcement learning. The resulting compact policy coordinates larger frozen models while adapting to intermediate results and feedback. A sympathetic reader would care because the method reports 73.2 percent execution accuracy on BIRD-Dev together with generalization to out-of-distribution datasets. If the approach holds, systems could handle complex database queries more flexibly without retraining entire large models for every stage.

Core claim

SQLConductor formulates Text-to-SQL subtasks as specialized actions for workflow composition and trains a policy model to select the next action based on intermediate artifacts and feedback. Search-to-Policy Learning uses Monte Carlo Tree Search to explore candidate workflows and stability estimation to identify robust supervision. The policy is trained with Stability-weighted Supervised Fine-tuning to prioritize high-quality orchestration patterns and further enhanced through Curriculum Reinforcement Learning. This transforms offline workflow search into a deployable policy for step-wise orchestration at inference time, reaching 73.2 percent execution accuracy on BIRD-Dev while coordinating

What carries the argument

Search-to-Policy Learning, which uses Monte Carlo Tree Search to explore workflows and stability estimation to produce supervision signals for training the step-wise orchestration policy.

If this is right

The orchestration policy adapts to intermediate artifacts and feedback rather than committing to a full workflow in advance.
A compact policy can coordinate frozen larger Text-to-SQL models to achieve higher execution accuracy than methods that train comparable or larger backbones directly.
The learned policy generalizes to out-of-distribution datasets and handles diverse query demands without predefined stage orders.
Step-wise orchestration becomes practical at inference time after offline search converts into an online policy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Keeping large action models frozen while training only the policy may reduce the cost of deploying Text-to-SQL systems on new hardware.
The same search-to-policy pattern could apply to other adaptive multi-stage reasoning tasks such as code generation or multi-hop question answering.
Further tests on databases with rapidly changing schemas would show whether the stability signals remain reliable when action models are swapped.

Load-bearing premise

Stability estimation derived from Monte Carlo Tree Search produces supervision signals that allow the learned policy to generalize to out-of-distribution datasets and diverse query demands.

What would settle it

Applying the trained policy to a held-out database schema or query distribution and measuring whether execution accuracy falls below that of prior direct-training baselines would test the generalization claim.

Figures

Figures reproduced from arXiv: 2606.23537 by Boyan Li, Yizhang Zhu, Yuyu Luo, Zhangyang Peng.

**Figure 2.** Figure 2: The framework overview of SQLConductor [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Statistics of workflow exploration and curation. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Instruction-tuning format for subsequent training. [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Rollout and reward computation in curriculum [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: BIRD-Dev EX vs. trained parameter size for training [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Workflow-length distributions across BIRD-Dev [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 9.** Figure 9: Case study. should be viewed as a policy-controlled capability rather than an inherent advantage of dynamic workflows. Policy learning is more effective than raw scaling. The trained 4B policy surpasses the non-trained 32B policy (70.3% vs. 69.7%), despite being roughly 8× smaller. This indicates that Search-toPolicy learning contributes orchestration behavior that cannot be explained by policy capacity a… view at source ↗

read the original abstract

Text-to-SQL enables users to access relational databases via natural language, but real-world settings remain challenging due to coordinated reasoning over complex database environments. Existing systems often use multi-stage pipelines or reasoning models specialized for individual stages. However, fixed pipelines rely on predefined stage orders, limiting their adaptivity to query demands and intermediate evidence. Recent orchestration-based methods provide flexibility by composing specialized modules for each query, but typical plan-then-execute approaches still commit to a complete workflow before execution and cannot adapt to intermediate artifacts and feedback. In this paper, we propose SQLConductor, a step-wise orchestration learning framework for Text-to-SQL. SQLConductor formulates Text-to-SQL subtasks as specialized actions for workflow composition and trains a policy model to select the next action based on intermediate artifacts and feedback. To learn this policy, SQLConductor introduces Search-to-Policy Learning, which uses Monte Carlo Tree Search to explore candidate workflows and stability estimation to identify robust supervision. The policy model is trained with Stability-weighted Supervised Fine-tuning to prioritize high-quality orchestration patterns and further enhanced through Curriculum Reinforcement Learning. This transforms offline workflow search into a deployable policy for step-wise orchestration at inference time. Experiments on BIRD-Dev and out-of-distribution datasets show that SQLConductor achieves superior execution accuracy and strong generalization, reaching 73.2% EX on BIRD-Dev with a compact orchestration policy coordinating frozen larger action models, outperforming prior methods that directly train comparable or larger Text-to-SQL backbones. Further analyses show that the learned policy adapts orchestration to diverse query demands.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SQLConductor turns MCTS workflow search into a step-wise policy via stability-weighted SFT and curriculum RL, but the OOD generalization claim rests on unverified assumptions about what the stability signal actually captures.

read the letter

The core move here is converting offline Monte Carlo Tree Search over Text-to-SQL action sequences into a compact policy that chooses the next subtask at each step. The recipe combines stability estimation to filter supervision, weighted supervised fine-tuning, and then curriculum reinforcement learning. That specific loop is not a direct restatement of prior plan-then-execute work.

It correctly notes that fixed pipelines cannot react to intermediate results and that one-shot planning commits too early. Framing the problem as learning when to invoke which frozen action model based on artifacts and feedback is a reasonable response to those limits. The engineering choice to keep the orchestration policy small while leaving the heavy models untouched also makes sense for practical use.

The soft spot is the transfer story. Stability is computed from success rates in the same search process used to generate training data; nothing described shows that this metric isolates coordination patterns that survive distribution shift rather than BIRD-specific schema or query regularities. The 73.2% execution accuracy figure is stated without protocol details, baseline definitions, or variance, so it is hard to tell how much the method actually improves on existing orchestration baselines.

This is for people already working on natural-language database interfaces who want more adaptive control than current multi-stage systems. A reader focused on RL or search for planning would get value from the training details. The work is coherent enough on its own terms to deserve referee time, though the experiments will need closer scrutiny.

Send it for peer review.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces SQLConductor, a step-wise Text-to-SQL orchestration framework. It formulates subtasks as actions and trains a policy model using Search-to-Policy Learning: Monte Carlo Tree Search explores workflows, stability estimation identifies robust supervision signals, followed by Stability-weighted Supervised Fine-Tuning and Curriculum Reinforcement Learning. The resulting compact policy coordinates frozen larger action models. The paper claims this achieves 73.2% execution accuracy on BIRD-Dev, outperforming prior methods that train larger backbones directly, and shows strong generalization on out-of-distribution datasets.

Significance. If the reported performance and generalization results are substantiated with rigorous experimental protocols, this work could be significant for the Text-to-SQL field. It demonstrates a way to achieve high performance with a small orchestration policy rather than large end-to-end models, and provides adaptivity through step-wise decisions based on intermediate feedback, addressing limitations of fixed pipelines and plan-then-execute approaches.

major comments (3)

Abstract: The abstract asserts a 73.2% execution accuracy figure and outperformance but supplies no experimental protocol, baseline descriptions, statistical tests, error bars, or data-exclusion rules, so it is impossible to determine whether the reported numbers support the claim. This is load-bearing for all performance claims.
Search-to-Policy Learning section: The training loop uses search-generated trajectories to supervise the policy; it is unclear whether the stability metric introduces circular dependence on the same search process used for evaluation, which is load-bearing for the generalization claims.
Experiments on OOD datasets: Stability estimation derived from Monte Carlo Tree Search produces supervision signals claimed to allow the learned policy to generalize to out-of-distribution datasets and diverse query demands. However, no independent verification that the selected supervision signals are invariant across distribution shifts is described, which is central to the OOD generalization result.

minor comments (1)

Notation in the stability estimation description could be clarified with an explicit equation or pseudocode for the Monte Carlo estimate.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The concerns about experimental clarity, potential circularity in our training procedure, and verification of OOD generalization are important. We address each major comment below with clarifications and proposed revisions.

read point-by-point responses

Referee: Abstract: The abstract asserts a 73.2% execution accuracy figure and outperformance but supplies no experimental protocol, baseline descriptions, statistical tests, error bars, or data-exclusion rules, so it is impossible to determine whether the reported numbers support the claim. This is load-bearing for all performance claims.

Authors: We agree that the abstract should better contextualize the performance claims. In the revised version we will expand the abstract to include a concise statement of the evaluation protocol (BIRD-Dev benchmark, standard execution accuracy metric, comparison against prior Text-to-SQL methods), and we will explicitly reference Section 4 for full details on baselines, statistical reporting, error bars, and data handling. This addresses the load-bearing nature of the claim while respecting abstract length constraints. revision: yes
Referee: Search-to-Policy Learning section: The training loop uses search-generated trajectories to supervise the policy; it is unclear whether the stability metric introduces circular dependence on the same search process used for evaluation, which is load-bearing for the generalization claims.

Authors: Stability is computed from the variance of execution outcomes across multiple independent MCTS rollouts (different random seeds and simulation depths) performed on the same training query; this intra-query consistency measure is then used only to weight supervision for SFT. Final policy evaluation occurs on a completely disjoint test set (BIRD-Dev and OOD datasets) with no reuse of the search trajectories. We will add an explicit paragraph in the Search-to-Policy Learning section describing this separation and the independence of the stability computation from test-time evaluation. revision: yes
Referee: Experiments on OOD datasets: Stability estimation derived from Monte Carlo Tree Search produces supervision signals claimed to allow the learned policy to generalize to out-of-distribution datasets and diverse query demands. However, no independent verification that the selected supervision signals are invariant across distribution shifts is described, which is central to the OOD generalization result.

Authors: The empirical OOD results on separate datasets already demonstrate that policies trained on stability-weighted signals transfer better than alternatives. However, we acknowledge that an explicit analysis isolating invariance of the selected signals (e.g., correlation between stability scores and OOD performance or cross-distribution stability statistics) is not currently provided. In revision we will add a dedicated subsection with such verification metrics and, if needed, additional controlled experiments. revision: partial

Circularity Check

0 steps flagged

No circularity: MCTS generates offline supervision for independent policy

full rationale

The paper's derivation uses Monte Carlo Tree Search to explore workflows and produce stability-weighted trajectories as training data for a separate policy model via SFT and curriculum RL. This is a standard offline search-to-policy distillation where the search step creates the dataset and the learned policy is deployed without search at inference. No equations, self-citations, or definitions in the abstract or method description reduce any claimed prediction or generalization result back to the inputs by construction. The approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, invented entities, or additional axioms are stated beyond the core modeling choice that subtasks can be treated as actions.

axioms (1)

domain assumption Text-to-SQL subtasks can be formulated as specialized actions whose selection can be learned from intermediate artifacts and feedback
This premise underpins the entire policy model and is stated directly in the abstract as the basis for step-wise orchestration.

pith-pipeline@v0.9.1-grok · 5827 in / 1416 out tokens · 38888 ms · 2026-06-26T06:17:19.846311+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

70 extracted references · 23 canonical work pages · 3 internal anchors

[1]

David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. 2019. Mixmatch: A holistic approach to semi-supervised learning.Advances in neural information processing systems32 (2019)

2019
[2]

Cameron B Browne, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samoth- rakis, and Simon Colton. 2012. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games4, 1 (2012), 1–43

2012
[3]

Zhenbiao Cao, Yuanlei Zheng, Zhihao Fan, Xiaojin Zhang, Wei Chen, and Xiang Bai. 2024. RSL-SQL: Robust Schema Linking in Text-to-SQL Generation.CoRR abs/2411.00073 (2024)

work page arXiv 2024
[4]

Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, et al
[5]

The danger of overthinking: Examining the reasoning-action dilemma in agentic tasks.arXiv preprint arXiv:2502.08235(2025)

work page arXiv 2025
[6]

Xuemei Dong, Chao Zhang, Yuhang Ge, Yuren Mao, Yunjun Gao, Jinshu Lin, Dongfang Lou, et al. 2023. C3: Zero-shot text-to-sql with chatgpt.arXiv preprint arXiv:2307.07306(2023)

work page arXiv 2023
[7]

Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yujia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, and Wanjun Zhong. 2026. ReTool: Reinforcement Learning for Strategic Tool Use in LLMs. InThe Fourteenth International Confer- ence on Learning Representations. https://openreview.net/forum?id=tRk1nofSmz

2026
[8]

Dawei Gao, Haibin Wang, Yaliang Li, Xiuyu Sun, Yichen Qian, Bolin Ding, and Jingren Zhou. 2024. Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation.Proc. VLDB Endow.17, 5 (2024), 1132–1145

2024
[9]

Hongcheng Gao, Yue Liu, Yufei He, Longxu Dou, Chao Du, Zhijie Deng, Bryan Hooi, Min Lin, and Tianyu Pang. 2025. Flowreasoner: Reinforcing query-level meta-agents.arXiv preprint arXiv:2504.15257(2025)

work page arXiv 2025
[10]

Yuxiang Guo, Zhuoran Du, Nan Tang, Kezheng Tang, Congcong Ge, and Yunjun Gao. 2026. DTBench: A Synthetic Benchmark for Document-to-Table Extraction. InProceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining

2026
[11]

Mingqian He, Yongliang Shen, Wenqi Zhang, Qiuying Peng, Jun Wang, and Weiming Lu. 2025. Star-sql: Self-taught reasoner for text-to-sql. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 24365–24375

2025
[12]

Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, and Xiao Huang. 2025. Next-generation database interfaces: A survey of llm-based text-to-sql.IEEE Transactions on Knowledge and Data Engineering (2025)

2025
[13]

Ruilin Hu, Yuyu Luo, Guoliang Li, Shuangqiao Wu, and Yun Luo. 2026. OpenSQL: Data-Efficient Text-to-SQL for Open-Source LLMs via Synthesized Intermediate Supervision.Proc. VLDB Endow.(2026)

2026
[14]

Levente Kocsis and Csaba Szepesvári. 2006. Bandit based monte-carlo planning. InEuropean conference on machine learning. Springer, 282–293

2006
[15]

Chia-Hsuan Lee, Oleksandr Polozov, and Matthew Richardson. 2021. KaggleD- BQA: Realistic Evaluation of Text-to-SQL Parsers. InACL/IJCNLP (1). Association for Computational Linguistics, 2261–2273

2021
[16]

Boyan Li, Chong Chen, Zhujun Xue, Yinan Mei, and Yuyu Luo. 2026. DeepEye- SQL: A Software-Engineering-Inspired Text-to-SQL Framework.Proc. ACM Manag. Data4, 3 (2026). https://doi.org/10.1145/3802035

work page doi:10.1145/3802035 2026
[17]

Boyan Li, Ou Ocean Kun Hei, Yue Yu, and Yuyu Luo. 2026. DPC: Training- Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency. https: //doi.org/10.48550/arXiv.2604.15163 arXiv:2604.15163 [cs.DB] Accepted to ACL 2026 Main Track

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.15163 2026
[18]

Boyan Li, Yuyu Luo, Chengliang Chai, Guoliang Li, and Nan Tang. 2024. The Dawn of Natural Language to SQL: Are We Fully Ready? [Experiment, Analysis & Benchmark].Proc. VLDB Endow.17, 11 (2024), 3318–3331

2024
[19]

Boyan Li, Yiran Peng, Yupeng Xie, Sirong Lu, Yizhang Zhu, Xing Mu, Xinyu Liu, and Yuyu Luo. 2026. DeepEye: A Steerable Self-driving Data Agent System. InCompanion of the International Conference on Management of Data(India) (SIGMOD Companion ’26). Association for Computing Machinery, New York, NY, USA, 74–77. https://doi.org/10.1145/3788853.3801612

work page doi:10.1145/3788853.3801612 2026
[20]

Boyan Li, Jiayi Zhang, Ju Fan, Yanwei Xu, Chong Chen, Nan Tang, and Yuyu Luo. 2025. Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search. InForty-second International Conference on Machine Learning

2025
[21]

Haoyang Li, Shang Wu, Xiaokang Zhang, Xinmei Huang, Jing Zhang, Fuxin Jiang, Shuai Wang, Tieying Zhang, Jianjun Chen, Rui Shi, Hong Chen, and Cuiping Li. 2025. OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale.Proc. VLDB Endow.18, 11 (2025), 4695–4709

2025
[22]

Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, and Hong Chen. 2024. CodeS: Towards Building Open-source Language Models for Text-to-SQL.Proc. ACM Manag. Data2, 3 (2024), 127

2024
[23]

Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin Chen-Chuan Chang, Fei Huang, Reynold Cheng, and Yongbin Li. 2023. Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs. InNeurIPS

2023
[24]

Weibin Liao, Xin Gao, Tianyu Jia, Rihong Qiu, Yifan Zhu, Yang Lin, Xinyu Ma, Junfeng Zhao, and Yasha Wang. 2026. LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models. InThe Fourteenth International Conference on Learning Representations. https://openreview.net/ forum?id=q6kXd8Gpfj

2026
[25]

Xiaotian Lin, Yanlin Qi, Yizhang Zhu, Themis Palpanas, Chengliang Chai, Nan Tang, and Yuyu Luo. 2025. LEAD: iterative data selection for efficient LLM instruction tuning.Proceedings of the VLDB Endowment19, 3 (2025), 426–439

2025
[26]

Shih-Yang Liu, Xin Dong, Ximing Lu, Shizhe Diao, Peter Belcak, Mingjie Liu, Min-Hung Chen, Hongxu Yin, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Yejin Choi, Jan Kautz, and Pavlo Molchanov. 2026. GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization. InForty- third International Conference on Machine Learning

2026
[27]

Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, and Yuyu Luo. 2025. A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?IEEE Transactions on Knowledge and Data Engineering(2025)

2025
[28]

Xinyu Liu, Shuyu Shen, Boyan Li, Nan Tang, and Yuyu Luo. 2025. NL2SQL- BUGs: A Benchmark for Detecting Semantic Errors in NL2SQL Translation. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2(Toronto ON, Canada)(KDD ’25). Association for Computing Machinery, New York, NY, USA, 5662–5673. https://doi.org/10.1145/37...

work page doi:10.1145/3711896 2025
[29]

Yifu Liu, Yin Zhu, Yingqi Gao, Zhiling Luo, Xiaoxia Li, Xiaorong Shi, Yuntao Hong, Jinyang Gao, Yu Li, Bolin Ding, and Jingren Zhou. 2026. XiYan-SQL: A Novel Multi-Generator Framework for Text-to-SQL.IEEE Trans. Knowl. Data Eng.38, 4 (2026), 2474–2487

2026
[30]

Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, and Yuyu Luo. 2026. nvBench 2.0: Resolving Ambiguity in Text-to- Visualization through Stepwise Reasoning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https: //openreview.net/forum?id=PuzbYHf1GR

2026
[31]

Yuyu Luo, Guoliang Li, Ju Fan, Chengliang Chai, and Nan Tang. 2025. Natural language to sql: State of the art and open problems.Proceedings of the VLDB Endowment18, 12 (2025), 5466–5471

2025
[32]

Yuyu Luo, Guoliang Li, Ju Fan, and Nan Tang. 2026. Data Agents: Levels, State of the Art, and Open Problems.arXiv preprint arXiv:2602.04261(2026)

work page arXiv 2026
[33]

Yuyu Luo, Xuedi Qin, Nan Tang, and Guoliang Li. 2018. DeepEye: Towards Automatic Data Visualization. In34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018. IEEE Computer Society, 101–112. https://doi.org/10.1109/ICDE.2018.00019

work page doi:10.1109/icde.2018.00019 2018
[34]

Shuai Lyu, Haoran Luo, Ripeng Li, Zhonghong Ou, Jiangfeng Sun, Yang Qin, Xiaoran Shang, Meina Song, and Yifan Zhu. 2025. Sql-o1: A self-reward heuristic dynamic search method for text-to-sql.arXiv preprint arXiv:2502.11741(2025)

work page arXiv 2025
[35]

Da Ma, Ziyue Yang, Hongshen Xu, Haotian Fang, Kai Yu, and Lu Chen. 2026. Empowering LLM Tool Invocation with Tool-call Reward Model. InThe Four- teenth International Conference on Learning Representations. https://openreview. net/forum?id=LnBEASInVr

2026
[36]

Peixian Ma, Boyan Li, Runzhi Jiang, Ju Fan, Nan Tang, and Yuyu Luo. 2024. A Plug-and-Play Natural Language Rewriter for Natural Language to SQL.arXiv preprint arXiv:2412.17068(2024)

work page arXiv 2024
[37]

Stefan Nielsen, Edoardo Cetin, Peter Schwendeman, Qi Sun, Jinglue Xu, and Yujin Tang. 2026. Learning to Orchestrate Agents in Natural Language with the Conductor. InThe Fourteenth International Conference on Learning Representations. https://openreview.net/forum?id=U23A2BUKYt

2026
[38]

Ma Peixian, Xialie Zhuang, Chengjin Xu, Xuhui Jiang, Ran Chen, and Jian Guo
[39]

InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

SQL-R1: Training natural language to sql reasoning model by reinforcement learning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems
[40]

Mohammadreza Pourreza, Hailong Li, Ruoxi Sun, Yeounoh Chung, Shayan Talaei, Gaurav Tarlok Kakkar, Yu Gan, Amin Saberi, Fatma Ozcan, and Sercan Ö. Arik
[41]

CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL. InICLR. OpenReview.net
[42]

Mohammadreza Pourreza and Davood Rafiei. 2023. DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. InThirty-seventh Con- ference on Neural Information Processing Systems

2023
[43]

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun, Xingchen Wan, Hailong Li, Azalia Mirhoseini, Amin Saberi, and Sercan O Arik. 2025. Reasoning-SQL: Rein- forcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL. InSecond Conference on Language Modeling

2025
[44]

Yang Qin, Chao Chen, Zhihang Fu, Ze Chen, Dezhong Peng, Peng Hu, and Jieping Ye. 2025. ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL. InThe Thirteenth International Conference on Learning Representations

2025
[45]

Jianhao Ruan, Zhihao Xu, Yiran Peng, Fashen Ren, Zhaoyang Yu, Xinbing Liang, Jinyu Xiang, Yongru Chen, Bang Liu, Chenglin Wu, Yuyu Luo, and Jiayi Zhang
[46]

Yizhang Zhu, Zhangyang Peng, Boyan Li, and Yuyu Luo* arXiv:2602.03786 [cs.AI] https://arxiv.org/abs/2602.03786

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration. Yizhang Zhu, Zhangyang Peng, Boyan Li, and Yuyu Luo* arXiv:2602.03786 [cs.AI] https://arxiv.org/abs/2602.03786

work page arXiv
[47]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. 2024. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[48]

Lei Sheng and Xu Shuai Shuai. 2025. CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning. InIJCNLP-AACL (Findings). The Asian Federation of Natural Language Processing and The Association for Computa- tional Linguistics, 1473–1496

2025
[49]

Lei Sheng, Shuai-Shuai Xu, and Wei Xie. 2025. BASE-SQL: A powerful open source Text-To-SQL baseline approach.arXiv preprint arXiv:2502.10739(2025)

work page arXiv 2025
[50]

Chang-Yu Tai, Ziru Chen, Tianshu Zhang, Xiang Deng, and Huan Sun. 2023. Exploring chain of thought style prompting for text-to-sql. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 5376–5393

2023
[51]

Shayan Talaei, Mohammadreza Pourreza, Yu-Chen Chang, Azalia Mirhoseini, and Amin Saberi. 2024. CHESS: Contextual Harnessing for Efficient SQL Synthesis. CoRRabs/2405.16755 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[52]

Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, LinZheng Chai, Zhao Yan, Qian-Wen Zhang, Di Yin, Xing Sun, and Zhoujun Li. 2025. MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL. InProceedings of the 31st International Conference on Computational Linguistics

2025
[53]

Pengfei Wang, Baolin Sun, Xuemei Dong, Yaxun Dai, Hongwei Yuan, Mengdie Chu, Yingqi Gao, Xiang Qi, Peng Zhang, and Ying Yan. 2025. Agentar- Scale-SQL: Advancing Text-to-SQL through Orchestrated Test-Time Scaling. arXiv:2509.24403 [cs.CL] https://arxiv.org/abs/2509.24403

work page arXiv 2025
[54]

Yihan Wang, Peiyu Liu, Runyu Chen, Jiaxing Pu, and Wei Xu. 2025. Squrve: A Unified and Modular Framework for Complex Real-World Text-to-SQL Tasks. arXiv preprint arXiv:2510.24102(2025)

work page arXiv 2025
[55]

Yihan Wang, Peiyu Liu, Runyu Chen, and Wei Xu. 2026. Beyond Static Pipelines: Learning Dynamic Workflows for Text-to-SQL.arXiv preprint arXiv:2602.15564 (2026)

work page arXiv 2026
[56]

Xiangjin Xie, Guangwei Xu, Lingyan Zhao, and Ruijie Guo. 2025. OpenSearch- SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Align- ment.Proc. ACM Manag. Data3, 3 (2025), 194:1–194:24

2025
[57]

Zhewei Yao, Guoheng Sun, Lukasz Borchmann, Gaurav Nuti, Zheyu Shen, Ming- hang Deng, Bohan Zhai, Hao Zhang, Ang Li, and Yuxiong He. 2025. Arctic- text2sql-r1: Simple rewards, strong reasoning in text-to-sql.arXiv preprint arXiv:2505.20315(2025)

work page arXiv 2025
[58]

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir R. Radev. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. InEMNLP. Association for Computational Linguistics, 3911–3921

2018
[59]

Shuozhi Yuan, Liming Chen, Miaomiao Yuan, and Zhao Jin. 2026. MCTS-SQL: Light-Weight LLMs Can Master the Text-to-SQL Through Monte Carlo Tree Search. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 34521–34529

2026
[60]

Sandy L Zabell. 1989. The rule of succession.Erkenntnis31, 2 (1989), 283–321

1989
[61]

Fan Zhang, Vireo Zhang, Shengju Qian, Haoxuan Li, Hao Wu, Jinyang Wu, Donghao Zhou, Zhihong Zhu, Zheng Lian, Xin Wang, and Pheng-Ann Heng
[62]

arXiv:2606.13707 [cs.AI] https://arxiv.org/abs/2606.13707

Orchestra-o1: Omnimodal Agent Orchestration. arXiv:2606.13707 [cs.AI] https://arxiv.org/abs/2606.13707

work page arXiv
[63]

Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xiong-Hui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, and Chenglin Wu. 2025. AFlow: Automating Agentic Workflow Generation. InThe Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=z5uVAKwmjf

2025
[64]

Yi Zhang, Jan Deriu, George Katsogiannis-Meimarakis, Catherine Kosten, Geor- gia Koutrika, and Kurt Stockinger. 2023. ScienceBenchmark: A Complex Real- World Benchmark for Evaluating Natural Language to SQL Systems.Proc. VLDB Endow.17, 4 (2023), 685–698

2023
[65]

Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Guoliang Li, Bin Wu, and Wenchao Zhou. 2026. Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards.Proceedings of the ACM on Management of Data4, 3 (SIGMOD (2026), 1–27

2026
[66]

Yabo Zhang, Yihan Zeng, Qingyun Li, Zhen Hu, Kavin Han, and Wangmeng Zuo
[67]

arXiv preprint arXiv:2509.12867(2025)

Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use. arXiv preprint arXiv:2509.12867(2025)

work page arXiv 2025
[68]

Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, and Nan Tang. 2024. Are large language models good statisticians?Advances in Neural Information Processing Systems37 (2024), 62697–62731

2024
[69]

Yizhang Zhu, Runzhi Jiang, Boyan Li, Nan Tang, and Yuyu Luo. 2025. EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing. InSecond Conference on Language Modeling

2025
[70]

Yizhang Zhu, Liangwei Wang, Chenyu Yang, Xiaotian Lin, Boyan Li, Wei Zhou, Xinyu Liu, Zhangyang Peng, Tianqi Luo, Yu Li, et al . 2025. A Survey of Data Agents: Emerging Paradigm or Overstated Hype?arXiv preprint arXiv:2510.23587 (2025)

work page arXiv 2025

[1] [1]

David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. 2019. Mixmatch: A holistic approach to semi-supervised learning.Advances in neural information processing systems32 (2019)

2019

[2] [2]

Cameron B Browne, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samoth- rakis, and Simon Colton. 2012. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games4, 1 (2012), 1–43

2012

[3] [3]

Zhenbiao Cao, Yuanlei Zheng, Zhihao Fan, Xiaojin Zhang, Wei Chen, and Xiang Bai. 2024. RSL-SQL: Robust Schema Linking in Text-to-SQL Generation.CoRR abs/2411.00073 (2024)

work page arXiv 2024

[4] [4]

Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, et al

[5] [5]

The danger of overthinking: Examining the reasoning-action dilemma in agentic tasks.arXiv preprint arXiv:2502.08235(2025)

work page arXiv 2025

[6] [6]

Xuemei Dong, Chao Zhang, Yuhang Ge, Yuren Mao, Yunjun Gao, Jinshu Lin, Dongfang Lou, et al. 2023. C3: Zero-shot text-to-sql with chatgpt.arXiv preprint arXiv:2307.07306(2023)

work page arXiv 2023

[7] [7]

Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yujia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, and Wanjun Zhong. 2026. ReTool: Reinforcement Learning for Strategic Tool Use in LLMs. InThe Fourteenth International Confer- ence on Learning Representations. https://openreview.net/forum?id=tRk1nofSmz

2026

[8] [8]

Dawei Gao, Haibin Wang, Yaliang Li, Xiuyu Sun, Yichen Qian, Bolin Ding, and Jingren Zhou. 2024. Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation.Proc. VLDB Endow.17, 5 (2024), 1132–1145

2024

[9] [9]

Hongcheng Gao, Yue Liu, Yufei He, Longxu Dou, Chao Du, Zhijie Deng, Bryan Hooi, Min Lin, and Tianyu Pang. 2025. Flowreasoner: Reinforcing query-level meta-agents.arXiv preprint arXiv:2504.15257(2025)

work page arXiv 2025

[10] [10]

Yuxiang Guo, Zhuoran Du, Nan Tang, Kezheng Tang, Congcong Ge, and Yunjun Gao. 2026. DTBench: A Synthetic Benchmark for Document-to-Table Extraction. InProceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining

2026

[11] [11]

Mingqian He, Yongliang Shen, Wenqi Zhang, Qiuying Peng, Jun Wang, and Weiming Lu. 2025. Star-sql: Self-taught reasoner for text-to-sql. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 24365–24375

2025

[12] [12]

Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, and Xiao Huang. 2025. Next-generation database interfaces: A survey of llm-based text-to-sql.IEEE Transactions on Knowledge and Data Engineering (2025)

2025

[13] [13]

Ruilin Hu, Yuyu Luo, Guoliang Li, Shuangqiao Wu, and Yun Luo. 2026. OpenSQL: Data-Efficient Text-to-SQL for Open-Source LLMs via Synthesized Intermediate Supervision.Proc. VLDB Endow.(2026)

2026

[14] [14]

Levente Kocsis and Csaba Szepesvári. 2006. Bandit based monte-carlo planning. InEuropean conference on machine learning. Springer, 282–293

2006

[15] [15]

Chia-Hsuan Lee, Oleksandr Polozov, and Matthew Richardson. 2021. KaggleD- BQA: Realistic Evaluation of Text-to-SQL Parsers. InACL/IJCNLP (1). Association for Computational Linguistics, 2261–2273

2021

[16] [16]

Boyan Li, Chong Chen, Zhujun Xue, Yinan Mei, and Yuyu Luo. 2026. DeepEye- SQL: A Software-Engineering-Inspired Text-to-SQL Framework.Proc. ACM Manag. Data4, 3 (2026). https://doi.org/10.1145/3802035

work page doi:10.1145/3802035 2026

[17] [17]

Boyan Li, Ou Ocean Kun Hei, Yue Yu, and Yuyu Luo. 2026. DPC: Training- Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency. https: //doi.org/10.48550/arXiv.2604.15163 arXiv:2604.15163 [cs.DB] Accepted to ACL 2026 Main Track

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.15163 2026

[18] [18]

Boyan Li, Yuyu Luo, Chengliang Chai, Guoliang Li, and Nan Tang. 2024. The Dawn of Natural Language to SQL: Are We Fully Ready? [Experiment, Analysis & Benchmark].Proc. VLDB Endow.17, 11 (2024), 3318–3331

2024

[19] [19]

Boyan Li, Yiran Peng, Yupeng Xie, Sirong Lu, Yizhang Zhu, Xing Mu, Xinyu Liu, and Yuyu Luo. 2026. DeepEye: A Steerable Self-driving Data Agent System. InCompanion of the International Conference on Management of Data(India) (SIGMOD Companion ’26). Association for Computing Machinery, New York, NY, USA, 74–77. https://doi.org/10.1145/3788853.3801612

work page doi:10.1145/3788853.3801612 2026

[20] [20]

Boyan Li, Jiayi Zhang, Ju Fan, Yanwei Xu, Chong Chen, Nan Tang, and Yuyu Luo. 2025. Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search. InForty-second International Conference on Machine Learning

2025

[21] [21]

Haoyang Li, Shang Wu, Xiaokang Zhang, Xinmei Huang, Jing Zhang, Fuxin Jiang, Shuai Wang, Tieying Zhang, Jianjun Chen, Rui Shi, Hong Chen, and Cuiping Li. 2025. OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale.Proc. VLDB Endow.18, 11 (2025), 4695–4709

2025

[22] [22]

Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, and Hong Chen. 2024. CodeS: Towards Building Open-source Language Models for Text-to-SQL.Proc. ACM Manag. Data2, 3 (2024), 127

2024

[23] [23]

Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin Chen-Chuan Chang, Fei Huang, Reynold Cheng, and Yongbin Li. 2023. Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs. InNeurIPS

2023

[24] [24]

Weibin Liao, Xin Gao, Tianyu Jia, Rihong Qiu, Yifan Zhu, Yang Lin, Xinyu Ma, Junfeng Zhao, and Yasha Wang. 2026. LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models. InThe Fourteenth International Conference on Learning Representations. https://openreview.net/ forum?id=q6kXd8Gpfj

2026

[25] [25]

Xiaotian Lin, Yanlin Qi, Yizhang Zhu, Themis Palpanas, Chengliang Chai, Nan Tang, and Yuyu Luo. 2025. LEAD: iterative data selection for efficient LLM instruction tuning.Proceedings of the VLDB Endowment19, 3 (2025), 426–439

2025

[26] [26]

Shih-Yang Liu, Xin Dong, Ximing Lu, Shizhe Diao, Peter Belcak, Mingjie Liu, Min-Hung Chen, Hongxu Yin, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Yejin Choi, Jan Kautz, and Pavlo Molchanov. 2026. GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization. InForty- third International Conference on Machine Learning

2026

[27] [27]

Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, and Yuyu Luo. 2025. A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?IEEE Transactions on Knowledge and Data Engineering(2025)

2025

[28] [28]

Xinyu Liu, Shuyu Shen, Boyan Li, Nan Tang, and Yuyu Luo. 2025. NL2SQL- BUGs: A Benchmark for Detecting Semantic Errors in NL2SQL Translation. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2(Toronto ON, Canada)(KDD ’25). Association for Computing Machinery, New York, NY, USA, 5662–5673. https://doi.org/10.1145/37...

work page doi:10.1145/3711896 2025

[29] [29]

Yifu Liu, Yin Zhu, Yingqi Gao, Zhiling Luo, Xiaoxia Li, Xiaorong Shi, Yuntao Hong, Jinyang Gao, Yu Li, Bolin Ding, and Jingren Zhou. 2026. XiYan-SQL: A Novel Multi-Generator Framework for Text-to-SQL.IEEE Trans. Knowl. Data Eng.38, 4 (2026), 2474–2487

2026

[30] [30]

Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, and Yuyu Luo. 2026. nvBench 2.0: Resolving Ambiguity in Text-to- Visualization through Stepwise Reasoning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https: //openreview.net/forum?id=PuzbYHf1GR

2026

[31] [31]

Yuyu Luo, Guoliang Li, Ju Fan, Chengliang Chai, and Nan Tang. 2025. Natural language to sql: State of the art and open problems.Proceedings of the VLDB Endowment18, 12 (2025), 5466–5471

2025

[32] [32]

Yuyu Luo, Guoliang Li, Ju Fan, and Nan Tang. 2026. Data Agents: Levels, State of the Art, and Open Problems.arXiv preprint arXiv:2602.04261(2026)

work page arXiv 2026

[33] [33]

Yuyu Luo, Xuedi Qin, Nan Tang, and Guoliang Li. 2018. DeepEye: Towards Automatic Data Visualization. In34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018. IEEE Computer Society, 101–112. https://doi.org/10.1109/ICDE.2018.00019

work page doi:10.1109/icde.2018.00019 2018

[34] [34]

Shuai Lyu, Haoran Luo, Ripeng Li, Zhonghong Ou, Jiangfeng Sun, Yang Qin, Xiaoran Shang, Meina Song, and Yifan Zhu. 2025. Sql-o1: A self-reward heuristic dynamic search method for text-to-sql.arXiv preprint arXiv:2502.11741(2025)

work page arXiv 2025

[35] [35]

Da Ma, Ziyue Yang, Hongshen Xu, Haotian Fang, Kai Yu, and Lu Chen. 2026. Empowering LLM Tool Invocation with Tool-call Reward Model. InThe Four- teenth International Conference on Learning Representations. https://openreview. net/forum?id=LnBEASInVr

2026

[36] [36]

Peixian Ma, Boyan Li, Runzhi Jiang, Ju Fan, Nan Tang, and Yuyu Luo. 2024. A Plug-and-Play Natural Language Rewriter for Natural Language to SQL.arXiv preprint arXiv:2412.17068(2024)

work page arXiv 2024

[37] [37]

Stefan Nielsen, Edoardo Cetin, Peter Schwendeman, Qi Sun, Jinglue Xu, and Yujin Tang. 2026. Learning to Orchestrate Agents in Natural Language with the Conductor. InThe Fourteenth International Conference on Learning Representations. https://openreview.net/forum?id=U23A2BUKYt

2026

[38] [38]

Ma Peixian, Xialie Zhuang, Chengjin Xu, Xuhui Jiang, Ran Chen, and Jian Guo

[39] [39]

InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

SQL-R1: Training natural language to sql reasoning model by reinforcement learning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

[40] [40]

Mohammadreza Pourreza, Hailong Li, Ruoxi Sun, Yeounoh Chung, Shayan Talaei, Gaurav Tarlok Kakkar, Yu Gan, Amin Saberi, Fatma Ozcan, and Sercan Ö. Arik

[41] [41]

CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL. InICLR. OpenReview.net

[42] [42]

Mohammadreza Pourreza and Davood Rafiei. 2023. DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. InThirty-seventh Con- ference on Neural Information Processing Systems

2023

[43] [43]

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun, Xingchen Wan, Hailong Li, Azalia Mirhoseini, Amin Saberi, and Sercan O Arik. 2025. Reasoning-SQL: Rein- forcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL. InSecond Conference on Language Modeling

2025

[44] [44]

Yang Qin, Chao Chen, Zhihang Fu, Ze Chen, Dezhong Peng, Peng Hu, and Jieping Ye. 2025. ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL. InThe Thirteenth International Conference on Learning Representations

2025

[45] [45]

Jianhao Ruan, Zhihao Xu, Yiran Peng, Fashen Ren, Zhaoyang Yu, Xinbing Liang, Jinyu Xiang, Yongru Chen, Bang Liu, Chenglin Wu, Yuyu Luo, and Jiayi Zhang

[46] [46]

Yizhang Zhu, Zhangyang Peng, Boyan Li, and Yuyu Luo* arXiv:2602.03786 [cs.AI] https://arxiv.org/abs/2602.03786

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration. Yizhang Zhu, Zhangyang Peng, Boyan Li, and Yuyu Luo* arXiv:2602.03786 [cs.AI] https://arxiv.org/abs/2602.03786

work page arXiv

[47] [47]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. 2024. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[48] [48]

Lei Sheng and Xu Shuai Shuai. 2025. CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning. InIJCNLP-AACL (Findings). The Asian Federation of Natural Language Processing and The Association for Computa- tional Linguistics, 1473–1496

2025

[49] [49]

Lei Sheng, Shuai-Shuai Xu, and Wei Xie. 2025. BASE-SQL: A powerful open source Text-To-SQL baseline approach.arXiv preprint arXiv:2502.10739(2025)

work page arXiv 2025

[50] [50]

Chang-Yu Tai, Ziru Chen, Tianshu Zhang, Xiang Deng, and Huan Sun. 2023. Exploring chain of thought style prompting for text-to-sql. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 5376–5393

2023

[51] [51]

Shayan Talaei, Mohammadreza Pourreza, Yu-Chen Chang, Azalia Mirhoseini, and Amin Saberi. 2024. CHESS: Contextual Harnessing for Efficient SQL Synthesis. CoRRabs/2405.16755 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[52] [52]

Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, LinZheng Chai, Zhao Yan, Qian-Wen Zhang, Di Yin, Xing Sun, and Zhoujun Li. 2025. MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL. InProceedings of the 31st International Conference on Computational Linguistics

2025

[53] [53]

Pengfei Wang, Baolin Sun, Xuemei Dong, Yaxun Dai, Hongwei Yuan, Mengdie Chu, Yingqi Gao, Xiang Qi, Peng Zhang, and Ying Yan. 2025. Agentar- Scale-SQL: Advancing Text-to-SQL through Orchestrated Test-Time Scaling. arXiv:2509.24403 [cs.CL] https://arxiv.org/abs/2509.24403

work page arXiv 2025

[54] [54]

Yihan Wang, Peiyu Liu, Runyu Chen, Jiaxing Pu, and Wei Xu. 2025. Squrve: A Unified and Modular Framework for Complex Real-World Text-to-SQL Tasks. arXiv preprint arXiv:2510.24102(2025)

work page arXiv 2025

[55] [55]

Yihan Wang, Peiyu Liu, Runyu Chen, and Wei Xu. 2026. Beyond Static Pipelines: Learning Dynamic Workflows for Text-to-SQL.arXiv preprint arXiv:2602.15564 (2026)

work page arXiv 2026

[56] [56]

Xiangjin Xie, Guangwei Xu, Lingyan Zhao, and Ruijie Guo. 2025. OpenSearch- SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Align- ment.Proc. ACM Manag. Data3, 3 (2025), 194:1–194:24

2025

[57] [57]

Zhewei Yao, Guoheng Sun, Lukasz Borchmann, Gaurav Nuti, Zheyu Shen, Ming- hang Deng, Bohan Zhai, Hao Zhang, Ang Li, and Yuxiong He. 2025. Arctic- text2sql-r1: Simple rewards, strong reasoning in text-to-sql.arXiv preprint arXiv:2505.20315(2025)

work page arXiv 2025

[58] [58]

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir R. Radev. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. InEMNLP. Association for Computational Linguistics, 3911–3921

2018

[59] [59]

Shuozhi Yuan, Liming Chen, Miaomiao Yuan, and Zhao Jin. 2026. MCTS-SQL: Light-Weight LLMs Can Master the Text-to-SQL Through Monte Carlo Tree Search. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 34521–34529

2026

[60] [60]

Sandy L Zabell. 1989. The rule of succession.Erkenntnis31, 2 (1989), 283–321

1989

[61] [61]

Fan Zhang, Vireo Zhang, Shengju Qian, Haoxuan Li, Hao Wu, Jinyang Wu, Donghao Zhou, Zhihong Zhu, Zheng Lian, Xin Wang, and Pheng-Ann Heng

[62] [62]

arXiv:2606.13707 [cs.AI] https://arxiv.org/abs/2606.13707

Orchestra-o1: Omnimodal Agent Orchestration. arXiv:2606.13707 [cs.AI] https://arxiv.org/abs/2606.13707

work page arXiv

[63] [63]

Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xiong-Hui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, and Chenglin Wu. 2025. AFlow: Automating Agentic Workflow Generation. InThe Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=z5uVAKwmjf

2025

[64] [64]

Yi Zhang, Jan Deriu, George Katsogiannis-Meimarakis, Catherine Kosten, Geor- gia Koutrika, and Kurt Stockinger. 2023. ScienceBenchmark: A Complex Real- World Benchmark for Evaluating Natural Language to SQL Systems.Proc. VLDB Endow.17, 4 (2023), 685–698

2023

[65] [65]

Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Guoliang Li, Bin Wu, and Wenchao Zhou. 2026. Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards.Proceedings of the ACM on Management of Data4, 3 (SIGMOD (2026), 1–27

2026

[66] [66]

Yabo Zhang, Yihan Zeng, Qingyun Li, Zhen Hu, Kavin Han, and Wangmeng Zuo

[67] [67]

arXiv preprint arXiv:2509.12867(2025)

Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use. arXiv preprint arXiv:2509.12867(2025)

work page arXiv 2025

[68] [68]

Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, and Nan Tang. 2024. Are large language models good statisticians?Advances in Neural Information Processing Systems37 (2024), 62697–62731

2024

[69] [69]

Yizhang Zhu, Runzhi Jiang, Boyan Li, Nan Tang, and Yuyu Luo. 2025. EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing. InSecond Conference on Language Modeling

2025

[70] [70]

Yizhang Zhu, Liangwei Wang, Chenyu Yang, Xiaotian Lin, Boyan Li, Wei Zhou, Xinyu Liu, Zhangyang Peng, Tianqi Luo, Yu Li, et al . 2025. A Survey of Data Agents: Emerging Paradigm or Overstated Hype?arXiv preprint arXiv:2510.23587 (2025)

work page arXiv 2025