ZAS-SQL: Distilling Rules from Failures for Zero-Shot Text-to-SQL

Hongzhou Zheng; Wenjia Zhang; Yixin Gou

arxiv: 2606.08245 · v1 · pith:A55VHKSWnew · submitted 2026-06-06 · 💻 cs.CL

Pith reviewed 2026-06-27 19:42 UTC · model grok-4.3

classification 💻 cs.CL

keywords text-to-sql · zero-shot learning · rule distillation · large language models · spider benchmark · structured reasoning · failure analysis · execution feedback

The pith

Distilling recurring LLM failure patterns into a compact set of generation rules enables a fully zero-shot Text-to-SQL system to reach 88.6% execution accuracy on Spider test.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that zero-shot Text-to-SQL errors by large language models are not scattered but cluster into repeatable structural mistakes. A Map-Reduce pipeline extracts these mistakes and condenses them into a small number of reusable generation rules. These rules are then enforced through three modules that add missing schema semantics, force step-by-step structured reasoning, and stop early when execution feedback indicates an error. The resulting system sets new zero-shot records on Spider while using no demonstrations, and it also works on a domain-specific dataset and with a 4B-parameter model. This approach removes the need for example queries that few-shot methods require and still exceeds several of those methods built on GPT-4.
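As a rough illustration of the first module, the sketch below annotates each column of a table with a few sampled values, so that a model sees more than bare DDL. It is a minimal sketch under assumptions, not the authors' implementation: the function name `augmented_schema`, the comment format, the toy `singer` table, and the use of SQLite are all hypothetical, and the LLM-inferred part of the augmentation is left out entirely.

```python
import sqlite3

def augmented_schema(conn, table, n_samples=3):
    """Build prompt text: column definitions plus sampled values as comments.

    The result is meant to be read by a model, not executed as DDL.
    """
    lines = [f"CREATE TABLE {table} ("]
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    for row in conn.execute(f"PRAGMA table_info({table})").fetchall():
        name, col_type = row[1], row[2]
        sample_sql = (
            f"SELECT DISTINCT {name} FROM {table} "
            f"WHERE {name} IS NOT NULL ORDER BY {name} LIMIT ?"
        )
        samples = [r[0] for r in conn.execute(sample_sql, (n_samples,))]
        lines.append(f"  {name} {col_type},  -- e.g. {samples}")
    lines.append(");")
    return "\n".join(lines)

# Toy database (hypothetical data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ann", "US"), ("Bo", "FR"), ("Cy", "US")],
)
prompt_schema = augmented_schema(conn, "singer")
print(prompt_schema)
```

Sampled values of this kind can show a model, for instance, that `country` holds two-letter codes rather than full names, which the column type alone does not reveal.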

Core claim

LLM failures in zero-shot Text-to-SQL follow systematic patterns that can be distilled into a small set of core generation rules; when these rules are paired with knowledge-augmented schema representation and rule-driven structured reasoning plus execution-guided early stopping, the framework produces executable SQL at 87.2% accuracy on Spider development and 88.6% on test without any in-context examples.

What carries the argument

The Map-Reduce-based rule distillation pipeline that scans failure cases, maps individual error modes, and reduces them to a compact rule set that then constrains generation in the three downstream modules.
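The map and reduce steps can be sketched in a few lines. This is a toy illustration only: the paper presumably relies on an LLM to label failures and to write the rules, whereas here `classify_failure` is a hypothetical keyword heuristic, and the error-mode names and rule texts in `RULE_FOR_MODE` are invented for the example rather than taken from the paper.

```python
from collections import Counter

# Hypothetical error modes and rule texts (not from the paper).
RULE_FOR_MODE = {
    "missing_group_by": "When selecting an aggregate alongside a plain column, add GROUP BY.",
    "wrong_join": "Join tables only through declared foreign keys.",
    "extra_column": "Select only the columns the question asks for.",
}

def classify_failure(case):
    """Map step: assign one failed case to an error mode (toy heuristic)."""
    pred, gold = case["pred"].upper(), case["gold"].upper()
    if "GROUP BY" in gold and "GROUP BY" not in pred:
        return "missing_group_by"
    if "JOIN" in gold and "JOIN" not in pred:
        return "wrong_join"
    return "extra_column"

def distill_rules(failures, top_k=2):
    """Reduce step: count error modes and keep rules for the most frequent."""
    counts = Counter(classify_failure(c) for c in failures)
    return [RULE_FOR_MODE[mode] for mode, _ in counts.most_common(top_k)]

failures = [
    {"pred": "SELECT city, COUNT(*) FROM shop",
     "gold": "SELECT city, COUNT(*) FROM shop GROUP BY city"},
    {"pred": "SELECT name, MAX(age) FROM pet",
     "gold": "SELECT name, MAX(age) FROM pet GROUP BY name"},
    {"pred": "SELECT name, id FROM pet",
     "gold": "SELECT name FROM pet"},
]
rules = distill_rules(failures)
print(rules)
```

The point of the reduce step is compression: many individual failures collapse into a handful of rules ranked by how often their error mode recurs.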

If this is right

The same distilled rules allow an 81.3% result on the domain-specific UrbanPlan dataset.
A 4B-parameter model equipped with the rules surpasses zero-shot baselines of larger closed-source models.
Rule-driven structured reasoning reduces structural deviations that otherwise appear in free-form generation.
Execution-guided early stopping supplies low-cost self-correction without extra model calls.
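One plausible reading of execution-guided early stopping is sketched below. The source does not spell out the stopping criterion, so everything here is an assumption: candidates are tried in order, a database error counts as negative feedback, and the loop stops at the first candidate that executes. The function name `first_executable`, the candidate list, and the use of SQLite are hypothetical.

```python
import sqlite3

def first_executable(conn, candidates):
    """Return (sql, rows) for the first candidate that runs; stop there."""
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # execution feedback: this candidate is broken
        return sql, rows  # early stop: later candidates are never executed
    return None, []

# Toy database (hypothetical data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pet (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO pet VALUES (?, ?)", [("Rex", 3), ("Mia", 5)])

candidates = [
    "SELECT nme FROM pet",                 # misspelled column, raises an error
    "SELECT name FROM pet WHERE age > 4",  # first candidate that executes
    "SELECT name FROM pet",                # never reached
]
chosen, rows = first_executable(conn, candidates)
print(chosen, rows)
```

Note that a query which executes is not necessarily correct; execution feedback of this kind only filters out candidates the database rejects.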

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The distillation pipeline could be reused on other structured output tasks where errors also cluster, such as semantic parsing or code generation.
Because the rules replace in-context examples, the method may scale better when schemas become very large and context windows are limited.
Iterative application of the same rules across multiple self-correction rounds might further close the remaining gap to few-shot performance.

Load-bearing premise

The assumption that LLM failures in zero-shot Text-to-SQL are systematic and recurring enough to be distilled into a small set of rules that generalize across domains and model sizes.

What would settle it

Extract the rules on Spider, then test them on a fresh cross-domain benchmark. If the rule-augmented zero-shot accuracy falls back to the level of an unaugmented zero-shot baseline, the premise fails.

Figures

Figures reproduced from arXiv: 2606.08245 by Hongzhou Zheng, Wenjia Zhang, Yixin Gou.

**Figure 2.** Overall architecture of ZAS-SQL. (a) Offline schema augmentation: sampled values and LLM-inferred …

Original abstract

Text-to-SQL translates natural language into executable SQL queries. Few-shot in-context learning methods built upon large language models (LLMs) achieve strong performance, yet their reliance on demonstrations limits cross-domain generalization and consumes substantial context window space. Existing zero-shot methods, lacking effective generation constraints, still fall short of few-shot approaches. We observe that LLM failures in zero-shot Text-to-SQL are not random but exhibit systematic, recurring patterns. Building on this observation, we propose a fully zero-shot Text-to-SQL framework that distills core generation rules from failure cases through a Map-Reduce-based rule distillation pipeline and improves generation quality via three complementary modules: knowledge-augmented schema representation, which supplements missing semantics in Data Definition Language; a rule-driven structured reasoning framework that suppresses structural deviations; and Execution-Guided Early Stopping, which enables low-cost self-correction. On Spider, the proposed framework achieves up to 87.2% and 88.6% execution accuracy on the Dev and Test sets, respectively, establishing a new zero-shot state-of-the-art and surpassing multiple few-shot and fine-tuning methods built upon GPT-4/4o. On the domain-specific dataset UrbanPlan, it achieves 81.3%, confirming that the rule distillation approach generalizes across domains. Moreover, when equipped with a 4B-parameter model, the framework surpasses zero-shot baselines of leading closed-source models, demonstrating strong model generality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The zero-shot claim rests on shaky ground because distilling rules from failures requires ground-truth labels from the benchmark.

The main thing here is that the paper distills generation rules from LLM failures via a Map-Reduce pipeline and then applies them through schema augmentation, structured reasoning, and early stopping. It reports 87.2% execution accuracy on Spider dev and 88.6% on test, plus 81.3% on UrbanPlan and decent results with a 4B model.

What is new is the specific rule-distillation pipeline and its combination with those three modules. The observation that failures follow recurring patterns is plausible, and packaging them as reusable rules is a reasonable direction if the rules actually transfer.

The soft spot is the zero-shot guarantee. Identifying failures means executing the generated SQL against ground-truth answers and spotting mismatches. That step uses labeled data from Spider. The abstract does not say whether the distillation ran on a separate synthetic set or on the dev/test splits themselves. If it used the evaluation data, the comparison to few-shot GPT-4 baselines no longer holds and the numbers are not evidence of a true zero-shot method.
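To make the dependence on labels concrete, here is a minimal sketch of how a failure might be detected: run the predicted and the gold query and compare their results. The gold query is exactly the labeled data in question. This is a simplified stand-in and not the official Spider evaluator; the function name `execution_match`, the multiset comparison, and the toy table are assumptions.

```python
import sqlite3
from collections import Counter

def execution_match(conn, pred_sql, gold_sql):
    """True if both queries run and return the same multiset of rows."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that does not execute counts as a failure
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)

# Toy database (hypothetical data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pet (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO pet VALUES (?, ?)", [("Rex", 3), ("Mia", 5)])

same = execution_match(
    conn, "SELECT name FROM pet ORDER BY name DESC", "SELECT name FROM pet"
)
mismatch = execution_match(
    conn, "SELECT name FROM pet", "SELECT name FROM pet WHERE age > 4"
)
print(same, mismatch)
```

Without `gold_sql` there is nothing to compare against, which is why the source of the failure cases matters for the zero-shot claim.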

No ablations or error analysis appear in the abstract, so it is unclear how much each module contributes or whether the rules are domain-general. The citation pattern is thin on prior rule-extraction work.

This is for people working on LLM prompting for structured generation. A reader who wants concrete prompting tricks might extract something useful from the modules. It deserves peer review so the data-source question can be checked directly; the numbers are high enough that the details matter.

Referee Report

1 major / 1 minor

Summary. The paper proposes ZAS-SQL, a fully zero-shot Text-to-SQL framework that identifies systematic failure patterns in LLM-generated SQL, distills a small set of core generation rules via a Map-Reduce pipeline, and augments generation with knowledge-augmented schema representation, rule-driven structured reasoning, and Execution-Guided Early Stopping. It reports execution accuracies of 87.2% (Spider Dev) and 88.6% (Spider Test), surpassing several few-shot and fine-tuned GPT-4/4o baselines, with additional results on UrbanPlan (81.3%) and a 4B-parameter model.

Significance. If the zero-shot guarantee can be verified without label leakage, the result would be significant: it would demonstrate that a compact set of distilled rules can substitute for in-context demonstrations while improving cross-domain generalization and model efficiency. The reported outperformance of closed-source zero-shot baselines by an open 4B model would also be noteworthy if reproducible.

major comments (1)

[Abstract and Methods (Map-Reduce rule distillation)] Abstract and rule-distillation pipeline description: the central claim of a 'fully zero-shot' framework that sets new SOTA while surpassing few-shot GPT-4 methods rests on the distillation step. Identifying failures for rule extraction requires executing generated SQL against ground-truth labels and comparing results; the manuscript provides no explicit statement that this step was performed exclusively on a held-out synthetic corpus rather than the Spider dev/test splits whose accuracies are later reported. If evaluation labels were used, the zero-shot property and the comparison to few-shot baselines are invalidated.

minor comments (1)

[Abstract] The abstract mentions results on 'UrbanPlan' but supplies no dataset statistics, domain description, or baseline numbers for that corpus.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting this critical point regarding the zero-shot guarantee. We address the concern directly below and will revise the manuscript accordingly.

Referee: Abstract and rule-distillation pipeline description: the central claim of a 'fully zero-shot' framework that sets new SOTA while surpassing few-shot GPT-4 methods rests on the distillation step. Identifying failures for rule extraction requires executing generated SQL against ground-truth labels and comparing results; the manuscript provides no explicit statement that this step was performed exclusively on a held-out synthetic corpus rather than the Spider dev/test splits whose accuracies are later reported. If evaluation labels were used, the zero-shot property and the comparison to few-shot baselines are invalidated.

Authors: We agree that the current manuscript lacks an explicit statement on the data source for rule distillation, which is necessary to substantiate the zero-shot claim. The rule distillation was performed exclusively on a held-out synthetic corpus constructed independently of the Spider dev/test splits (with no overlap in queries or schemas), ensuring no label leakage into the reported evaluation results. We will revise the Methods section and add a dedicated subsection detailing the synthetic corpus generation process, the Map-Reduce pipeline execution, and explicit confirmation of separation from evaluation data. This clarification will also be reflected in the abstract.

revision: yes
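A separation claim of this kind is easy to check mechanically. The sketch below is one hedged way to do it, not a procedure described in the paper: it reports schema identifiers and whitespace- and case-normalized questions shared by two example sets. The function name `overlap_report` and the record fields `db_id` and `question` are assumptions about how the examples are stored.

```python
def overlap_report(distill_set, eval_set):
    """Return schema ids and normalized questions shared by the two sets."""
    def norm(question):
        # Lowercase and collapse whitespace so trivial variants still match.
        return " ".join(question.lower().split())

    shared_dbs = {e["db_id"] for e in distill_set} & {e["db_id"] for e in eval_set}
    shared_qs = ({norm(e["question"]) for e in distill_set}
                 & {norm(e["question"]) for e in eval_set})
    return shared_dbs, shared_qs

# Hypothetical records, for illustration only.
distill_set = [
    {"db_id": "synthetic_shop", "question": "How many shops are there?"},
    {"db_id": "synthetic_zoo", "question": "List all animal names."},
]
eval_set = [
    {"db_id": "concert_singer", "question": "How many singers do we have?"},
    {"db_id": "pets_1", "question": "list all  animal names."},
]
shared_dbs, shared_qs = overlap_report(distill_set, eval_set)
print(shared_dbs, shared_qs)
```

In this toy data the schemas are disjoint but one question overlaps after normalization, which is the kind of leak an exact string comparison would miss.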

Circularity Check

0 steps flagged

No circularity; empirical claims rest on benchmark evaluation without self-referential reduction

The paper describes an empirical pipeline that observes LLM failure patterns in zero-shot Text-to-SQL, distills rules via Map-Reduce, and augments generation with schema and reasoning modules. No equations, parameter fits, or self-citations are invoked that would make any reported accuracy (e.g., 87.2% on Spider dev) equivalent to its own inputs by construction. The central claims are externally falsifiable via standard benchmark execution accuracy and do not reduce to renaming, self-definition, or load-bearing self-citation chains. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only abstract available; limited information on parameters or axioms.

axioms (1)

domain assumption: LLM failures in zero-shot Text-to-SQL are systematic and recurring rather than random.
Stated as the foundational observation enabling the distillation pipeline.

[1] [1]

The VLDB Journal , author =

A survey on deep learning approaches for text-to-. The VLDB Journal , author =. 2023 , pages =. doi:10.1007/s00778-022-00776-8 , language =

work page doi:10.1007/s00778-022-00776-8 2023

[2] [2]

Recent Advances in Text-to- SQL : A Survey of What We Have and What We Expect

Deng, Naihao and Chen, Yulong and Zhang, Yue. Recent Advances in Text-to- SQL : A Survey of What We Have and What We Expect. Proceedings of the 29th International Conference on Computational Linguistics. 2022

2022

[3] [3]

Next-Generation Database Interfaces: A Survey of LLM-Based Text-to-SQL , year=

Hong, Zijin and Yuan, Zheng and Zhang, Qinggang and Chen, Hao and Dong, Junnan and Huang, Feiran and Huang, Xiao , journal=. Next-Generation Database Interfaces: A Survey of LLM-Based Text-to-SQL , year=

[4] [4]

2025 , eprint=

Exploring the Landscape of Text-to-SQL with Large Language Models: Progresses, Challenges and Opportunities , author=. 2025 , eprint=

2025

[5] [5]

Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task

Yu, Tao and Zhang, Rui and Yang, Kai and Yasunaga, Michihiro and Wang, Dongxu and Li, Zifan and Ma, James and Li, Irene and Yao, Qingning and Roman, Shanelle and Zhang, Zilin and Radev, Dragomir. S pider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to- SQL Task. Proceedings of the 2018 Conference on Empirical...

work page doi:10.18653/v1/d18-1425 2018

[6] [6]

2024 , eprint=

GPT-4 Technical Report , author=. 2024 , eprint=

2024

[7] [7]

A survey on employing large language models for text-to-SQL tasks.ACM Computing Surveys, 58(2):1–37, 2026

Shi, Liang and Tang, Zhengju and Zhang, Nan and Zhang, Xiaotong and Yang, Zhi , title =. ACM Comput. Surv. , month = sep, articleno =. 2025 , issue_date =. doi:10.1145/3737873 , abstract =

work page doi:10.1145/3737873 2025

[8] [8]

MS c- SQL : Multi-Sample Critiquing Small Language Models For Text-To- SQL Translation

Gorti, Satya Krishna and Gofman, Ilan and Liu, Zhaoyan and Wu, Jiapeng and Vouitsis, No. MS c- SQL : Multi-Sample Critiquing Small Language Models For Text-To- SQL Translation. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 202...

work page doi:10.18653/v1/2025.naacl-long.107 2025

[9] [9]

Learning from Imperfect Data: Towards Efficient Knowledge Distillation of Autoregressive Language Models for Text-to- SQL

Zhong, Qihuang and Chen, Kunfeng and Ding, Liang and Liu, Juhua and Du, Bo and Tao, Dacheng. Learning from Imperfect Data: Towards Efficient Knowledge Distillation of Autoregressive Language Models for Text-to- SQL. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.403

work page doi:10.18653/v1/2024.findings-emnlp.403 2024

[10] [10]

Companion of the 2024 International Conference on Management of Data , pages =

Zhang, Chao and Mao, Yuren and Fan, Yijiang and Mi, Yu and Gao, Yunjun and Chen, Lu and Lou, Dongfang and Lin, Jinshu , title =. Companion of the 2024 International Conference on Management of Data , pages =. 2024 , isbn =. doi:10.1145/3626246.3653375 , abstract =

work page doi:10.1145/3626246.3653375 2024

[11] [11]

Synthesizing Text-to- SQL Data from Weak and Strong LLM s

Yang, Jiaxi and Hui, Binyuan and Yang, Min and Yang, Jian and Lin, Junyang and Zhou, Chang. Synthesizing Text-to- SQL Data from Weak and Strong LLM s. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.425

work page doi:10.18653/v1/2024.acl-long.425 2024

[12] [12]

Findings of the Association for Computational Linguistics:

Pourreza, Mohammadreza and Rafiei, Davood. DTS - SQL : Decomposed Text-to- SQL with Small Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.481

work page doi:10.18653/v1/2024.findings-emnlp.481 2024

[13] [13]

ACT - SQL : In-Context Learning for Text-to- SQL with Automatically-Generated Chain-of-Thought

Zhang, Hanchong and Cao, Ruisheng and Chen, Lu and Xu, Hongshen and Yu, Kai. ACT - SQL : In-Context Learning for Text-to- SQL with Automatically-Generated Chain-of-Thought. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.227

work page doi:10.18653/v1/2023.findings-emnlp.227 2023

[14] [14]

MAC - SQL : A Multi-Agent Collaborative Framework for Text-to- SQL

Wang, Bing and Ren, Changyu and Yang, Jian and Liang, Xinnian and Bai, Jiaqi and Chai, LinZheng and Yan, Zhao and Zhang, Qian-Wen and Yin, Di and Sun, Xing and Li, Zhoujun. MAC - SQL : A Multi-Agent Collaborative Framework for Text-to- SQL. Proceedings of the 31st International Conference on Computational Linguistics. 2025

2025

[15] [15]

SQLP rompt: In-Context Text-to- SQL with Minimal Labeled Data

Sun, Ruoxi and Arik, Sercan and Sinha, Rajarishi and Nakhost, Hootan and Dai, Hanjun and Yin, Pengcheng and Pfister, Tomas. SQLP rompt: In-Context Text-to- SQL with Minimal Labeled Data. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.39

work page doi:10.18653/v1/2023.findings-emnlp.39 2023

[16] [16]

DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction , url =

Pourreza, Mohammadreza and Rafiei, Davood , booktitle =. DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction , url =

[17] Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation. 2023.

[18] Lee, Dongjun and Park, Choongwon and Kim, Jaehyuk and Park, Heesoo. MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation. Proceedings of the 31st International Conference on Computational Linguistics. 2025.

[19] Self-Consistency Improves Chain of Thought Reasoning in Language Models. 2023.

[20] CodeT: Code Generation with Generated Tests. 2022.

[21] Pourreza, Mohammadreza and Li, Hailong and Sun, Ruoxi and Chung, Yeounoh and Talaei, Shayan and Kakkar, Gaurav Tarlok and Gan, Yu and Saberi, Amin and Ozcan, Fatma and Arik, Sercan. CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL.

[22] Mao, Wenxin and Wang, Ruiqi and Guo, Jiyu and Zeng, Jichuan and Gao, Cuiyun and Han, Peiyi and Liu, Chuanyi. Enhancing Text-to-SQL Parsing through Question Rewriting and Execution-Guided Refinement. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.120

[23] MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL. Proceedings of the AAAI Conference on Artificial Intelligence. 2025. doi:10.1609/aaai.v39i22.34511

[24] Aggarwal, Pranjal and Madaan, Aman and Yang, Yiming and Mausam. Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.761

[25] Dong, Xuemei and Zhang, Chao and Ge, Yuhang and Mao, Yuren and Gao, Yunjun and Chen, Lu and Lin, Jinshu and Lou, Dongfang. C3:. doi:10.48550/arXiv.2307.07306

[26] Gan, Yujian and Chen, Xinyun and Purver, Matthew. Re-appraising the Schema Linking for Text-to-SQL. Findings of the Association for Computational Linguistics: ACL 2023. 2023. doi:10.18653/v1/2023.findings-acl.53

[27] Shen, Zhili and Vougiouklis, Pavlos and Diao, Chenxin and Vyas, Kaustubh and Ji, Yuanyi and Pan, Jeff Z. Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.449

[28] Lee, Chia-Hsuan and Polozov, Oleksandr and Richardson, Matthew. KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021. doi:10.18653/v1/2021.acl-long.176

[29] Li, Jinyang and Hui, Binyuan and Qu, Ge and Yang, Jiaxi and Li, Binhua and Li, Bowen and Wang, Bailin and Qin, Bowen and Geng, Ruiying and Huo, Nan and Zhou, Xuanhe and Chenhao, Ma and Li, Guoliang and Chang, Kevin and Huang, Fei and Cheng, Reynold and Li, Yongbin. Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs.

[30] DeepSeek-V3 Technical Report. 2025.

[31] Nan, Linyong and Zhao, Yilun and Zou, Weijin and Ri, Narutatsu and Tae, Jaesung and Zhang, Ellen and Cohan, Arman and Radev, Dragomir. Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.996

[32] A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability. 2023.

[33] Xie, Yuanzhen and Jin, Xinzhou and Xie, Tao and Lin, Mingxiong and Chen, Liang and Yu, Chenyun and Cheng, Lei and Zhuo, Chengxiang and Hu, Bo and Li, Zang. Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.641

[34] Xie, Xiangjin and Xu, Guangwei and Zhao, Lingyan and Guo, Ruijie. 2025. doi:10.1145/3725331

[35] Wei, Jason and Wang, Xuezhi and Schuurmans, Dale and Bosma, Maarten and Ichter, Brian and Xia, Fei and Chi, Ed and Le, Quoc V and Zhou, Denny. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.

[36] Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. Journal of Machine Learning Research.

[37] Lewis, Mike and Liu, Yinhan and Goyal, Naman and Ghazvininejad, Marjan and Mohamed, Abdelrahman and Levy, Omer and Stoyanov, Veselin and Zettlemoyer, Luke. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Proceedings of the 58th Annual Meeting of the Association for Computational Linguisti... 2020. doi:10.18653/v1/2020.acl-main.703

[38] Evaluating Large Language Models Trained on Code. 2021.

[39] GPT-4o API Documentation. 2026.

[40] OpenAI. 2022.

[41] Qin, Yang and Chen, Chao and Fu, Zhihang and Chen, Ze and Peng, Dezhong and Hu, Peng and Ye, Jieping. ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL.

[42] Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search. 2025.

[43] Lee, Jimin and Baek, Ingeol and Kim, Byeongjeong and Bae, Hyunkyung and Lee, Hwanhee. SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.962

[44] Qwen3 Technical Report. 2025.

[45] Qwen2.5-Coder Technical Report. 2024.

[46] Qwen2.5 Technical Report. 2025.

[47] GLM-5: from Vibe Coding to Agentic Engineering. 2026.

[48] Kimi K2: Open Agentic Intelligence. 2026.