Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards

Shihao Zhang; Weining Qian; Xiaoman Wang; Yuan Liu; Yunshi Lan

arXiv: 2606.06825 · v1 · pith:O5PC5SRQ · submitted 2026-06-05 · 💻 cs.CL · cs.AI


Pith reviewed 2026-06-27 22:25 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords Text-to-SQL · Reinforcement Learning · Progressive Rewards · Diagnostic Tree · Multi-turn Refinement · SQL Generation · Benchmark Evaluation


The pith

Progress-SQL improves Text-to-SQL by using progressive rewards in multi-turn reinforcement learning guided by an Oracle Diagnostic Tree.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a new reinforcement learning method for generating SQL from text that operates over multiple turns rather than in a single shot. It introduces an Oracle-guided Diagnostic Tree that decomposes SQL queries into clause-level structures and provides diagnostic feedback. It then defines rewards that track how much the SQL improves from the first turn to the last, plus bonuses for reaching correct SQL early and for recovering from invalid SQL. The method is evaluated on the BIRD and Spider benchmarks and on Spider robustness variants, with reported gains in both settings.

Core claim

By abstracting SQL into clause-level structural profiles with the Oracle-guided Diagnostic Tree and combining structural alignment with lexical alignment into a progressive reward, plus latency and execution rewards, the framework supplies denser signals that better support iterative correction of SQL queries in reinforcement learning.
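One plausible reading of this claim is that each SQL state receives a single alignment score that blends structural and lexical agreement with the gold query. The sketch below assumes lexical alignment is token-level F1 over a crude SQL tokenizer and that the blend is a convex combination with a hypothetical weight `alpha`. The structural score is passed in as a number (for instance, from a clause-level comparison). The function names and weighting are illustrative assumptions, not the paper's formulas, which this page does not reproduce.

```python
import re
from collections import Counter

# Crude SQL tokenizer (assumption): identifiers, numbers, string literals, operators, punctuation.
_TOKEN = re.compile(r"[a-z_][a-z0-9_.]*|\d+(?:\.\d+)?|'[^']*'|[<>=!]+|[*(),]")

def sql_tokens(sql: str) -> list[str]:
    """Lowercase the query and split it into tokens."""
    return _TOKEN.findall(sql.lower())

def lexical_alignment(pred: str, gold: str) -> float:
    """Token-level F1 between predicted and gold SQL, counting repeated tokens."""
    p, g = Counter(sql_tokens(pred)), Counter(sql_tokens(gold))
    overlap = sum((p & g).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(g.values())
    return 2 * precision * recall / (precision + recall)

def state_score(structural: float, lexical: float, alpha: float = 0.5) -> float:
    """Convex blend of structural and lexical alignment; alpha is a hypothetical weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * structural + (1 - alpha) * lexical
```

For example, `lexical_alignment("SELECT a FROM t", "SELECT b FROM t")` gives 0.75, because three of the four tokens on each side match. A lexical term like this still gives partial credit when a query is structurally close but not yet executable.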

What carries the argument

The Oracle-guided Diagnostic Tree (ODT) that abstracts SQL queries into clause-level structural profiles and produces diagnostic feedback for refinement.
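To make "clause-level structural profile" concrete, the sketch below splits a flat SQL query into its top-level clauses and reports which clauses disagree with the gold (oracle) query. It also computes the fraction of clauses that match. This is a deliberately simplified, hypothetical stand-in. The names `clause_profile`, `diagnose`, and `structural_alignment` are invented here. The regex splitter ignores subqueries, set operations, and keywords inside string literals. The paper's ODT is a tree whose exact structure and feedback wording this page does not describe.

```python
import re

# Top-level clause keywords in canonical order (simplifying assumption: no subqueries).
CLAUSES = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]
_SPLIT = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in CLAUSES) + r")\b", re.IGNORECASE
)

def clause_profile(sql: str) -> dict[str, str]:
    """Map each clause keyword to its whitespace- and case-normalized body."""
    parts = _SPLIT.split(sql.strip().rstrip(";"))
    profile = {}
    # re.split with one capturing group yields [prefix, kw1, body1, kw2, body2, ...]
    for kw, body in zip(parts[1::2], parts[2::2]):
        profile[" ".join(kw.upper().split())] = " ".join(body.lower().split())
    return profile

def diagnose(pred: str, gold: str) -> list[str]:
    """Clause-level feedback messages; an empty list means the profiles match."""
    p, g = clause_profile(pred), clause_profile(gold)
    feedback = []
    for clause in CLAUSES:
        if clause in g and clause not in p:
            feedback.append(f"{clause}: missing")
        elif clause in p and clause not in g:
            feedback.append(f"{clause}: unexpected")
        elif clause in p and p[clause] != g[clause]:
            feedback.append(f"{clause}: differs from reference")
    return feedback

def structural_alignment(pred: str, gold: str) -> float:
    """Fraction of clauses (present in either query) whose bodies match exactly."""
    p, g = clause_profile(pred), clause_profile(gold)
    keys = set(p) | set(g)
    if not keys:
        return 0.0
    return sum(p.get(k) == g.get(k) for k in keys) / len(keys)
```

For example, `diagnose("SELECT name FROM users", "SELECT name FROM users WHERE age > 30")` returns `['WHERE: missing']`, and the corresponding structural alignment is 2/3. This sketch names only the faulty clause rather than echoing the gold SQL. The assumption is that feedback should not leak the reference answer into the next prompt.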

If this is right

Multi-turn SQL refinement receives dense signals measuring structural and lexical progress.
Rewards encourage reaching correct SQL earlier in the process.
Models are incentivized to recover from invalid SQL states.
Performance improves on both standard and robustness evaluations for Text-to-SQL tasks.
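The reward properties listed above can be sketched as one trajectory-level reward. This sketch makes three assumptions:

- The progressive term is the difference between the alignment scores of the final and initial SQL.
- The latency term is a per-turn decay `gamma ** (t - 1)`, applied at the first turn `t` whose SQL is correct. Figure 3's caption mentions a per-turn decay, but its exact form is not given on this page.
- The execution-status term gives a bonus when an invalid initial SQL ends as a valid one.

All names and weights (`Turn`, `trajectory_reward`, `gamma`, `w_prog`, `w_lat`, `w_exec`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    score: float   # alignment score of this turn's SQL in [0, 1] (e.g. structural + lexical)
    correct: bool  # execution result matches the gold result
    valid: bool    # SQL executed without error

def trajectory_reward(turns: list[Turn], gamma: float = 0.8,
                      w_prog: float = 1.0, w_lat: float = 1.0, w_exec: float = 0.5) -> float:
    """Hypothetical multi-turn reward: progressive + latency + execution-status terms."""
    if not turns:
        raise ValueError("trajectory must contain at least one turn")
    first, last = turns[0], turns[-1]
    # Progressive term: improvement from the initial SQL to the final SQL.
    progress = last.score - first.score
    # Latency term: decays with the index of the first correct turn; 0 if never correct.
    first_correct = next((t for t, turn in enumerate(turns, start=1) if turn.correct), None)
    latency = gamma ** (first_correct - 1) if first_correct is not None else 0.0
    # Execution-status term: bonus for recovering from an invalid initial SQL.
    execution = 1.0 if (not first.valid and last.valid) else 0.0
    return w_prog * progress + w_lat * latency + w_exec * execution
```

With these defaults, consider a three-turn trajectory that starts invalid with score 0.2 and first becomes correct at turn 3 with score 1.0. It earns a progress of 0.8, a latency term of 0.8² = 0.64, and an execution bonus of 0.5, for a total of 1.94. The latency term is also what keeps an already-correct first attempt from scoring zero, since its progressive term is 0.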

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could extend to other structured generation tasks like code or query languages beyond SQL.
Diagnostic trees might help in debugging or explaining model outputs in related domains.
Combining this with larger models or different base RL algorithms could yield further gains.

Load-bearing premise

The Oracle-guided Diagnostic Tree must produce accurate clause-level profiles and feedback that actually helps guide useful refinements.

What would settle it

Running the same RL setup but replacing the ODT feedback with random or fixed uninformative signals and observing whether the performance gains disappear on the BIRD and Spider benchmarks.
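The proposed check amounts to training several arms that differ only in the feedback shown to the policy between turns. The RL algorithm, reward, base model, and turn budget are held fixed across arms. Below is a minimal sketch of arm selection. It assumes feedback is a list of text messages and that `odt_feedback` is whatever the real diagnostic engine returns; any callable with the same signature can stand in for it. The arm names and the fixed and random strategies are hypothetical choices, not the paper's protocol.

```python
import random
from typing import Callable

# (predicted_sql, gold_sql) -> feedback messages shown to the policy on the next turn
Feedback = Callable[[str, str], list[str]]

CLAUSES = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]

def make_feedback_arm(kind: str, odt_feedback: Feedback, seed: int = 0) -> Feedback:
    """Return the feedback function for one ablation arm: 'odt', 'fixed', or 'random'."""
    if kind == "odt":
        return odt_feedback  # informative diagnostics from the real engine
    if kind == "fixed":
        # Uninformative: identical message regardless of the query.
        return lambda pred, gold: ["The query may contain errors; please revise it."]
    if kind == "random":
        rng = random.Random(seed)  # seeded so the arm is reproducible across runs

        def random_feedback(pred: str, gold: str) -> list[str]:
            # Same format as clause-level feedback, but clauses are picked at random
            # and never derived from the query, so the signal carries no information.
            flagged = rng.sample(CLAUSES, rng.randint(1, 3))
            return [f"{c}: differs from reference" for c in flagged]

        return random_feedback
    raise ValueError(f"unknown ablation arm: {kind!r}")
```

Keeping the random arm's message format identical to the clause-level format means that a remaining gap between the `odt` and `random` arms can be attributed to the content of the feedback rather than its shape.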

Figures

Figures reproduced from arXiv: 2606.06825 by Shihao Zhang, Weining Qian, Xiaoman Wang, Yuan Liu, Yunshi Lan.

**Figure 1.** Comparison of reward paradigms. (a) Single-turn Rollout: the policy model generates a single SQL query and receives a reward signal after execution. (b) Multi-turn Rollout with Progressive Reward (Ours): the policy model iteratively refines its SQL over T turns, guided by the ODT engine. The progressive reward measures improvement from the first SQL to the final SQL.

**Figure 2.** Overall framework of Progress-SQL, our multi-turn reinforcement learning method for Text-to-SQL. The …

**Figure 3.** Effects of per-turn decay and interaction budget. (a) Removing per-turn decay leads to less stable …

**Figure 4.** Training dynamics of reward and response …

Original abstract

Reinforcement learning has recently shown promise in improving large language models for Text-to-SQL generation, yet existing methods typically optimize one-shot rewards defined over a single SQL state. Such rewards provide limited guidance for iterative SQL correction and are insufficient to capture the improvement of multi-turn SQL refinement. In this paper, we propose Progress-SQL, a multi-turn reinforcement learning framework with progressive rewards for Text-to-SQL. Our approach introduces an Oracle-guided Diagnostic Tree (ODT), which abstracts SQL queries into clause-level structural profiles and produces diagnostic feedback for next-turn refinement. To provide dense and robust reward signals, we combine ODT-based structural alignment with lexical alignment and define a progressive reward that measures the improvement from the initial SQL to the final SQL. We further incorporate a progression latency reward that favors earlier correctness and an execution status reward that encourages recovery from the invalid SQL. Experiments on BIRD, Spider, and Spider robustness variants demonstrate that our method consistently improves Text-to-SQL performance across both primary and robustness evaluations.
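The multi-turn setup described in the abstract and in Figure 1(b) can be pictured as a rollout loop. The policy proposes SQL, the SQL is executed, and if it is not yet correct, diagnostic feedback is passed to the next turn, up to a budget of T turns. The sketch below runs that loop against an in-memory SQLite database. Several pieces are assumptions:

- The policy is a hypothetical stub that replays canned queries.
- The feedback is a placeholder string.
- Correctness is judged by comparing result multisets, a common execution-accuracy convention that is not necessarily the paper's.

```python
import sqlite3
from collections import Counter
from typing import Callable, Optional

def execute(conn: sqlite3.Connection, sql: str) -> Optional[Counter]:
    """Run SQL and return its rows as a multiset, or None if execution fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None

def rollout(policy: Callable[[Optional[str]], str], gold_sql: str,
            conn: sqlite3.Connection, max_turns: int = 3) -> list[dict]:
    """Hypothetical multi-turn rollout: generate SQL, execute, give feedback, refine."""
    gold_rows = execute(conn, gold_sql)
    feedback: Optional[str] = None
    trajectory = []
    for turn in range(1, max_turns + 1):
        sql = policy(feedback)
        rows = execute(conn, sql)
        correct = rows is not None and rows == gold_rows
        trajectory.append({"turn": turn, "sql": sql, "valid": rows is not None, "correct": correct})
        if correct:
            break  # stop early once the execution result matches the gold result
        # Placeholder for ODT-style diagnostic feedback passed to the next turn.
        feedback = "execution error" if rows is None else "result differs from reference"
    return trajectory

# Toy usage: a stub "policy" that replays canned attempts and ignores the feedback.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("ann", 35), ("bob", 25)])
attempts = iter(["SELEC name FROM users",
                 "SELECT name FROM users",
                 "SELECT name FROM users WHERE age > 30"])
traj = rollout(lambda feedback: next(attempts), "SELECT name FROM users WHERE age > 30", conn)
```

Here the first attempt fails to parse, the second runs but returns too many rows, and the third matches, so the trajectory ends at turn 3. The per-turn `valid` and `correct` flags are the kind of information that execution-status and latency rewards would consume. Comparing multisets ignores row order, which is a simplification: queries whose meaning depends on `ORDER BY` would need an ordered comparison.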

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Progress-SQL adds an ODT for clause-level diagnostics and progressive rewards in multi-turn Text-to-SQL RL, with reported gains on Spider and BIRD, but the ODT's accuracy and contribution lack direct checks.


Hey,

The main new piece is the Oracle-guided Diagnostic Tree that turns SQL into clause-level structural profiles for diagnostic feedback, plus a progressive reward that scores improvement across turns instead of just the final output. They layer this with lexical alignment, a latency term that rewards earlier correctness, and an execution status signal for recovering from invalid SQL.

The experiments run the method on BIRD, Spider, and robustness variants and claim consistent gains on both primary and robustness metrics. That evaluation setup is straightforward and covers the right datasets for this task.

The soft spot is exactly what the stress-test note flags: no human evaluation, error analysis, or inter-annotator check on whether the ODT profiles are accurate or the feedback actually useful. Without ablations that isolate the ODT component, the gains could be coming from the execution or latency rewards alone. The abstract gives no sign that this validation was done.

Everything else looks standard for RL Text-to-SQL work, with rewards tied to an external oracle, so no obvious circularity.

This is for people already working on RL or multi-turn refinement for structured generation in NLP and databases. A reader focused on reward shaping would pick up the progressive reward design.

It should go to peer review because the benchmarks are standard and the multi-turn framing is a reasonable step forward, even if the ODT part needs more scrutiny in revision.

Referee Report

2 major / 0 minor

Summary. The paper proposes Progress-SQL, a multi-turn RL framework for Text-to-SQL that defines progressive rewards via an Oracle-guided Diagnostic Tree (ODT) for clause-level structural alignment, combined with lexical alignment, a progression latency term favoring earlier correctness, and an execution status term for recovery from invalid SQL. It claims that this yields consistent gains over prior methods on BIRD, Spider, and Spider robustness variants.

Significance. If the empirical claims hold after proper validation, the progressive-reward formulation could address a recognized limitation of one-shot rewards in iterative Text-to-SQL refinement by supplying denser, multi-turn signals. The ODT mechanism for producing structural profiles is a potentially reusable idea for diagnostic feedback in structured generation tasks.

major comments (2)

[Abstract] The central claim that the method 'consistently improves Text-to-SQL performance' is presented without any experimental details, baselines, ablation results, number of runs, or statistical tests. This prevents assessment of whether observed gains are attributable to the ODT component or to the execution-status and latency terms alone.
[ODT description] (abstract paragraph on the Oracle-guided Diagnostic Tree) No human evaluation, inter-annotator agreement, accuracy metrics, or error analysis is reported for the clause-level structural profiles or diagnostic feedback. Because the progressive reward is defined in terms of ODT alignment, the absence of direct validation of ODT accuracy is load-bearing for the claim that the structural component drives the reported improvements.
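One concrete form for the validation this comment asks for is an agreement study. Two annotators (or an annotator and the ODT itself) label each clause-level diagnostic, for example as `correct` or `incorrect`. Agreement is then summarized with Cohen's kappa, which discounts the agreement expected by chance. This is a minimal sketch; the label scheme and the pairing of raters are assumptions, not something the paper or the referee specifies.

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa between two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical label throughout; kappa undefined
    return (p_o - p_e) / (1 - p_e)
```

Consider two raters who agree on 3 of 4 items, with label marginals of 2/2 and 1/3. Their observed agreement is 0.75 and their chance agreement is 0.5, which gives a kappa of 0.5.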

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to incorporate additional details and validation where appropriate.


Referee: [Abstract] The central claim that the method 'consistently improves Text-to-SQL performance' is presented without any experimental details, baselines, ablation results, number of runs, or statistical tests. This prevents assessment of whether observed gains are attributable to the ODT component or to the execution-status and latency terms alone.

Authors: We agree that the abstract is high-level and would benefit from more specifics to help readers evaluate the source of improvements. In the revised version we will expand the abstract to report key quantitative gains on BIRD and Spider, name the primary baselines, and note that ablations isolate the contribution of the ODT structural term versus the latency and execution-status components. Full experimental details, including run counts and any statistical tests, remain in the Experiments section; the abstract revision will be kept concise while addressing the attribution concern. revision: yes
Referee: [ODT description] (abstract paragraph on the Oracle-guided Diagnostic Tree) No human evaluation, inter-annotator agreement, accuracy metrics, or error analysis is reported for the clause-level structural profiles or diagnostic feedback. Because the progressive reward is defined in terms of ODT alignment, the absence of direct validation of ODT accuracy is load-bearing for the claim that the structural component drives the reported improvements.

Authors: The referee is correct that the current manuscript provides no direct human evaluation, inter-annotator agreement, or accuracy metrics for the ODT profiles. Because the ODT is constructed by direct comparison to the oracle (gold) SQL, its structural profiles are definitionally faithful to the reference; however, we acknowledge that this does not substitute for explicit validation of the diagnostic feedback quality. We will add an error-analysis subsection with qualitative examples of ODT alignments and any available quantitative checks on profile accuracy. Ablation results that isolate the structural-alignment reward term already provide indirect evidence for its contribution, but we agree that the requested direct metrics would strengthen the paper. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper introduces Progress-SQL as a multi-turn RL framework whose progressive reward is explicitly constructed from three external signals (ODT structural alignment, lexical alignment, execution status) plus a latency term. These components are defined relative to an oracle and ground-truth SQL rather than to the model's own predictions or fitted parameters. No equations reduce a claimed prediction to a fitted input by construction, no self-citations are invoked as load-bearing uniqueness theorems, and the ODT is presented as a new module whose outputs are not defined circularly in terms of the reward itself. Experiments on independent benchmarks (BIRD, Spider) supply external falsifiability. The derivation therefore remains self-contained against the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Only the abstract is available; no equations, hyperparameters, or background assumptions can be extracted or audited.

invented entities (1)

Oracle-guided Diagnostic Tree (ODT): no independent evidence
purpose: Abstracts SQL queries into clause-level structural profiles to produce diagnostic feedback for refinement
Introduced in the abstract as the core mechanism for dense structural rewards.

pith-pipeline@v0.9.1-grok · 5706 in / 1209 out tokens · 26865 ms · 2026-06-27T22:25:41.255759+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 9 canonical work pages
