pith. machine review for the scientific record.

arxiv: 2605.09295 · v1 · submitted 2026-05-10 · 💻 cs.CL

Recognition: 2 Lean theorem links

LEAF-SQL: Level-wise Exploration with Adaptive Fine-graining for Text-to-SQL Skeleton Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:27 UTC · model grok-4.3

classification 💻 cs.CL
keywords Text-to-SQL · SQL skeleton prediction · tree search · large language models · BIRD benchmark · coarse-to-fine exploration · adaptive refinement

The pith

Reframing SQL skeleton prediction as a coarse-to-fine tree search enables more accurate generation of complex database queries from natural language.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LEAF-SQL to address limitations in existing Text-to-SQL methods that rely on single structural hypotheses for SQL skeletons. By using a three-level hierarchy to guide progressive exploration, it generates diverse candidate skeletons and prunes them efficiently with dedicated agents. This approach leads to better performance on challenging benchmarks like BIRD. A sympathetic reader would care because accurate Text-to-SQL systems allow non-experts to interact with databases without writing code, especially for intricate queries involving nesting and multiple conditions. If the central claim holds, it supplies a more reliable way to build query structures before final generation.

Core claim

LEAF-SQL reframes skeleton prediction as a coarse-to-fine tree search process. It employs a three-level skeleton hierarchy to guide the search, a Skeleton Formulation Agent to generate diverse candidates, and a Skeleton Evaluation Agent to efficiently prune the search space. This integrated design yields skeleton candidates that are both structurally diverse and granularity-adaptive, providing a stronger foundation for SQL generation.
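The three-level hierarchy can be illustrated with a hypothetical example. The level names (Base, Expanded, Detailed) follow the paper's figure captions, but the skeleton strings below are invented for exposition, not taken from the paper:

```python
# Hypothetical illustration: one natural-language question represented at
# three skeleton granularity levels, coarse to fine.
question = "List names of schools whose average score exceeds the state average."

skeleton_hierarchy = {
    # Base: only the top-level clause layout
    "base": "SELECT _ FROM _ WHERE _",
    # Expanded: clause slots opened up, nesting made explicit
    "expanded": "SELECT col FROM tbl WHERE col > ( SELECT _ FROM _ )",
    # Detailed: columns and aggregates fixed, literal values still abstract
    "detailed": "SELECT name FROM schools WHERE avg_score > ( SELECT AVG(avg_score) FROM schools )",
}

# A coarse-to-fine search refines each level into candidates at the next,
# so every level constrains the set of plausible children.
for level in ("base", "expanded", "detailed"):
    print(f"{level:>8}: {skeleton_hierarchy[level]}")
```

Each coarse skeleton admits many finer-grained children; the point of the hierarchy is that a wrong structural commitment can be caught at the cheap, coarse level before it multiplies.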

What carries the argument

The three-level skeleton hierarchy, the Skeleton Formulation Agent, and the Skeleton Evaluation Agent, which together turn skeleton prediction into a level-wise tree search with adaptive refinement.
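That search loop can be sketched minimally, with the two agents reduced to toy stand-in functions. The real system prompts an LLM at each step; the function names, the beam width, and the scoring rule here are assumptions for illustration only:

```python
from typing import Callable, List

def level_wise_search(
    question: str,
    formulate: Callable[[str, str, int], List[str]],  # stand-in for the Skeleton Formulation Agent
    evaluate: Callable[[str, str], float],            # stand-in for the Skeleton Evaluation Agent
    levels: int = 3,
    beam: int = 2,
) -> List[str]:
    """Level-wise exploration: at each level, every surviving skeleton is
    expanded into finer-grained candidates, then the evaluation scores
    prune the pool back to a fixed beam."""
    frontier = [""]  # root: an empty (maximally coarse) skeleton
    for _ in range(levels):
        candidates = [c for s in frontier for c in formulate(question, s, beam)]
        candidates.sort(key=lambda c: evaluate(question, c), reverse=True)
        frontier = candidates[:beam]  # pruning step
    return frontier

# Toy stand-ins, purely to make the loop runnable.
def toy_formulate(q, skeleton, k):
    return [skeleton + f"|{i}" for i in range(k + 1)]

def toy_evaluate(q, skeleton):
    return -len(skeleton)  # prefer simpler structures, for illustration only

best = level_wise_search("toy question", toy_formulate, toy_evaluate)
print(best)
```

The pruning at each level is what keeps the tree tractable: without it, the candidate pool grows geometrically with depth.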

If this is right

  • LEAF-SQL consistently improves performance when used with various LLM backbones for Text-to-SQL tasks.
  • On the BIRD benchmark hidden test set, the method reaches 71.6% execution accuracy and exceeds leading search-based and skeleton-based approaches.
  • Complex queries that contain deeply nested logic or multiple clauses are handled more effectively than methods limited to one structural hypothesis.
  • The combination of progressive refinement and pruning balances structural diversity with computational efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same level-wise exploration pattern could be tested on other structured-generation problems such as program synthesis or formal proof construction where multiple valid structures exist.
  • Replacing the fixed evaluation rules with a learned reward model might allow the pruning step to improve automatically from execution feedback.
  • The results indicate that explicit multi-level decomposition helps large language models more than flat, single-pass generation when the output must satisfy strict syntactic constraints.

Load-bearing premise

That the Skeleton Evaluation Agent can reliably prune away bad structural hypotheses while keeping the correct ones, and that the three-level hierarchy plus adaptive fine-graining covers the needed query variety without excessive cost.

What would settle it

Compare full LEAF-SQL accuracy against versions that disable the Skeleton Evaluation Agent or the adaptive fine-graining step on the official BIRD hidden test set; absence of a clear accuracy drop would undermine the necessity of those components.

Figures

Figures reproduced from arXiv: 2605.09295 by Changxuan Wan, Dexi Liu, Qing Shu, Qizhi Wan, Xiping Liu, Zhao Tan.

Figure 1: Comparison of LEAF-SQL with prior works. LEAF …
Figure 2: The proposed three-level skeleton hierarchy (Base, …
Figure 3: The prevailing multi-stage pipeline for Text-to-SQL in …
Figure 4: Overview of LEAF-SQL. As illustrated in …
Figure 5: An illustration of the Level-wise Skeleton Search. The search has three phases (Base, Expanded, Detailed), each with …
Figure 6: An illustration of how a three-level skeleton hierarchy …
Figure 7: Overall architecture of Skeleton Evaluation Agent.
Figure 8: An example of the prompt used for SQL Generation.
Figure 9: Performance comparison between using Oracle Skele…
Figure 10: Granularity distribution of skeletons generated by …
Figure 11: Average consumption of time and tokens on BIRD.
Figure 12: Case studies from the BIRD benchmark illustrating the outputs of LEAF-SQL. For each question of varying difficulty …
read the original abstract

Text-to-SQL translates natural language questions into executable SQL queries, enabling intuitive database access for non-experts. While large language models achieve strong performance on Text-to-SQL with prompting, they still struggle with complex queries that involve deeply nested logic or multiple clauses. A widely used approach employs SQL skeletons--intermediate representations of query logic--to streamline generation, but existing methods are limited by their reliance on a single structural hypothesis and lack of progressive reasoning. To overcome these limitations, we propose LEAF-SQL, a novel framework that reframes skeleton prediction as a coarse-to-fine tree search process. LEAF-SQL enables systematic exploration of diverse structural hypotheses with adaptive refinement. Several key techniques are employed in LEAF-SQL: (1) a three-level skeleton hierarchy to guide the search, (2) a Skeleton Formulation Agent to generate diverse candidates, and (3) a Skeleton Evaluation Agent to efficiently prune the search space. This integrated design yields skeleton candidates that are both structurally diverse and granularity-adaptive, providing a stronger foundation for the SQL generation. Extensive experiments show that LEAF-SQL consistently improves the performance of various LLM backbones. On the official hidden test set of the challenging BIRD benchmark, our method achieves 71.6 execution accuracy, which outperforms leading search-based and skeleton-based methods, affirming its effectiveness for complex queries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes LEAF-SQL, a framework that reframes Text-to-SQL skeleton prediction as a coarse-to-fine tree search process. It employs a three-level skeleton hierarchy to guide exploration, a Skeleton Formulation Agent to generate diverse structural candidates, and a Skeleton Evaluation Agent to prune the search space via LLM prompting. The approach is evaluated on standard benchmarks with various LLM backbones, claiming consistent improvements and a new state-of-the-art of 71.6% execution accuracy on the official hidden test set of the BIRD benchmark, outperforming prior search-based and skeleton-based methods.

Significance. If the empirical gains hold under scrutiny, the work offers a structured way to increase diversity in skeleton hypotheses for complex, nested queries, which could strengthen LLM-based Text-to-SQL systems on schema-diverse datasets like BIRD. The integration of level-wise search with adaptive agents represents a practical advance over single-hypothesis skeleton methods, provided the pruning and coverage mechanisms are shown to be reliable.

major comments (2)
  1. [§3] §3 (Method), Skeleton Evaluation Agent description: No quantitative evaluation of the agent's pruning reliability (e.g., precision, recall, or false-negative rate on gold skeletons) is reported. This is load-bearing for the 71.6% BIRD claim, as unverified false negatives on valid skeletons would directly undermine the outperformance over baselines.
  2. [§4] §4 (Experiments), BIRD results and hierarchy discussion: The manuscript provides no coverage statistics or ablation on how the three-level hierarchy plus adaptive fine-graining enumerates structural variants for BIRD's complex queries (e.g., nested clauses across diverse schemas). Without this, the assumption that the search sufficiently covers query diversity remains unverified and central to the headline result.
minor comments (2)
  1. [Abstract] Abstract and §1: The claim of 'consistent improvements' across 'various LLM backbones' would benefit from explicit listing of the backbones and exact baseline comparisons in the abstract for immediate clarity.
  2. [Tables] Figure captions and tables: Ensure all tables reporting execution accuracy include standard deviations or multiple runs to support the reported gains.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and commit to incorporating the suggested analyses in the revised version.

read point-by-point responses
  1. Referee: [§3] §3 (Method), Skeleton Evaluation Agent description: No quantitative evaluation of the agent's pruning reliability (e.g., precision, recall, or false-negative rate on gold skeletons) is reported. This is load-bearing for the 71.6% BIRD claim, as unverified false negatives on valid skeletons would directly undermine the outperformance over baselines.

    Authors: We thank the referee for pointing this out. The Skeleton Evaluation Agent is intended to prune invalid or low-quality skeletons to focus the search. While our end-to-end results on BIRD support the overall approach, we agree that reporting quantitative metrics on the pruning step, such as precision, recall, and false-negative rates relative to gold skeletons, would provide important validation. In the revision, we will add a dedicated subsection with these metrics computed on a sample of BIRD queries to demonstrate the agent's reliability and address concerns about potential false negatives. revision: yes

  2. Referee: [§4] §4 (Experiments), BIRD results and hierarchy discussion: The manuscript provides no coverage statistics or ablation on how the three-level hierarchy plus adaptive fine-graining enumerates structural variants for BIRD's complex queries (e.g., nested clauses across diverse schemas). Without this, the assumption that the search sufficiently covers query diversity remains unverified and central to the headline result.

    Authors: We appreciate this feedback. The level-wise exploration with adaptive fine-graining is designed to handle the diversity in BIRD's complex queries by starting with coarse skeletons and refining them. However, we acknowledge the absence of explicit coverage statistics and ablations in the current manuscript. We will include additional experiments in the revised paper, such as statistics on the number of skeletons explored at each level and an ablation showing performance with and without the hierarchy, to verify sufficient coverage of structural variants for nested and schema-diverse queries. revision: yes
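The pruning-reliability metrics the rebuttal commits to (precision, recall, and false-negative rate against gold skeletons) amount to simple set bookkeeping over keep/prune decisions. A minimal sketch, with hypothetical skeleton IDs in place of real BIRD candidates:

```python
def pruning_metrics(kept: set, pruned: set, gold_valid: set) -> dict:
    """Score 'keep' as the positive decision against the set of skeletons
    known (via gold SQL) to be structurally valid."""
    tp = len(kept & gold_valid)    # valid skeletons correctly kept
    fp = len(kept - gold_valid)    # invalid skeletons kept anyway
    fn = len(pruned & gold_valid)  # valid skeletons wrongly pruned
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0  # false-negative rate on valid skeletons
    return {"precision": precision, "recall": recall, "false_negative_rate": fnr}

# Hypothetical decisions, purely to show the bookkeeping.
m = pruning_metrics(
    kept={"s1", "s2", "s3"},
    pruned={"s4", "s5"},
    gold_valid={"s1", "s2", "s4"},
)
print(m)
```

The false-negative rate is the quantity the referee flags as load-bearing: every valid skeleton pruned early is a query the downstream SQL generator can never recover.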

Circularity Check

0 steps flagged

No circularity: empirical framework evaluated on external benchmarks

full rationale

The paper proposes LEAF-SQL as a practical coarse-to-fine skeleton search framework using LLM agents and a three-level hierarchy. Performance claims rest on execution accuracy measured on the external BIRD hidden test set, not on any internal equations, fitted parameters, or self-referential definitions. No derivation chain reduces a result to its own inputs by construction, and the method is presented as an engineering proposal rather than a mathematical derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The method builds on standard LLM prompting and benchmark evaluation practices; the abstract introduces no new free parameters, mathematical axioms, or invented entities beyond the proposed search framework itself.

pith-pipeline@v0.9.0 · 5551 in / 1077 out tokens · 75908 ms · 2026-05-12T04:27:14.737646+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 7 internal anchors

  1. [1]

    Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task

    T. Yu, R. Zhang, K. Yang, M. Yasunaga, D. Wang, Z. Li, J. Ma, I. Li, Q. Yao, S. Roman, Z. Zhang, and D. Radev, “Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Comput...

  2. [2]

    Can llm already serve as a database interface? A big bench for large-scale database grounded text-to-sqls

    J. Li, B. Hui, G. Qu, J. Yang, B. Li, B. Li, B. Wang, B. Qin, R. Geng, N. Huo, X. Zhou, C. Ma, G. Li, K. C. Chang, F. Huang, R. Cheng, and Y. Li, “Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls,” in Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS)...

  3. [3]

    Sql-o1: A self-reward heuristic dynamic search method for text-to-sql

    S. Lyu, H. Luo, R. Li, Z. Ou, J. Sun, Y. Qin, X. Shang, M. Song, and Y. Zhu, “Sql-o1: A self-reward heuristic dynamic search method for text-to-sql,” arXiv preprint arXiv:2502.11741, 2025

  4. [4]

    Learnat: Learning nl2sql with ast-guided task decomposition for large language models

    W. Liao, X. Gao, T. Jia, R. Qiu, Y. Zhu, Y. Lin, X. Chu, J. Zhao, and Y. Wang, “Learnat: Learning nl2sql with ast-guided task decomposition for large language models,” arXiv preprint arXiv:2504.02327, 2025

  5. [5]

    Few-shot text-to-sql translation using structure and content prompt learning

    Z. Gu, J. Fan, N. Tang, L. Cao, B. Jia, S. Madden, and X. Du, “Few-shot text-to-sql translation using structure and content prompt learning,” Proc. ACM Manag. Data, vol. 1, no. 2, pp. 138–166, 2023

  6. [6]

    Resdsql: Decoupling schema linking and skeleton parsing for text-to-sql

    H. Li, J. Zhang, C. Li, and H. Chen, “Resdsql: Decoupling schema linking and skeleton parsing for text-to-sql,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 11, 2023, pp. 13067–13075

  7. [7]

    Combining small language models and large language models for zero-shot nl2sql

    J. Fan, Z. Gu, S. Zhang, Y. Zhang, Z. Chen, L. Cao, G. Li, S. Madden, X. Du, and N. Tang, “Combining small language models and large language models for zero-shot nl2sql,” Proc. VLDB Endow., vol. 17, no. 11, pp. 2750–2763, 2024

  8. [8]

    Dac: Decomposed automation correction for text-to-sql

    D. Wang, L. Dou, X. Zhang, Q. Zhu, and W. Che, “Dac: Decomposed automation correction for text-to-sql,” arXiv preprint arXiv:2408.08779, 2024

  9. [9]

    Ucs-sql: Uniting content and structure for enhanced semantic bridging in text-to-sql

    Z. Wu, Z. Li, J. JieZhangChinaTele, Z. He, J. Yang, Y. Zhao, R. Fang, B. Wang, H. Xie, S. Song, and Z. Li, “Ucs-sql: Uniting content and structure for enhanced semantic bridging in text-to-sql,” in Findings of the Association for Computational Linguistics: ACL 2025. Vienna, Austria: Association for Computational Linguistics, 2025, pp. 8156–8168

  10. [10]

    Purple: Making a large language model a better sql writer

    T. Ren, Y. Fan, Z. He, R. Huang, J. Dai, C. Huang, Y. Jing, K. Zhang, Y. Yang, and X. S. Wang, “Purple: Making a large language model a better sql writer,” arXiv preprint arXiv:2403.20014, 2024

  11. [11]

    Chain of thought prompting elicits knowledge augmentation

    D. Wu, J. Zhang, and X. Huang, “Chain of thought prompting elicits knowledge augmentation,” in Findings of the Association for Computational Linguistics: ACL 2023. Toronto, Canada: Association for Computational Linguistics, 2023, pp. 6519–6534

  12. [12]

    Tree of thoughts: deliberate problem solving with large language models

    S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, and K. Narasimhan, “Tree of thoughts: deliberate problem solving with large language models,” in Proceedings of the 37th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2023, pp. 8812–8825

  13. [13]

    Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning

    V. Zhong, C. Xiong, and R. Socher, “Seq2sql: Generating structured queries from natural language using reinforcement learning,” arXiv preprint arXiv:1709.00103, 2017

  14. [14]

    Spider 2.0: Evaluating language models on real-world enterprise text-to-sql workflows

    F. Lei, J. Chen, Y. Ye, R. Cao, D. Shin, H. Su, Z. Suo, H. Gao, W. Hu, P. Yin, V. Zhong, C. Xiong, R. Sun, Q. Liu, S. Wang, and T. Yu, “Spider 2.0: Evaluating language models on real-world enterprise text-to-sql workflows,” arXiv preprint arXiv:2411.07763, 2025

  15. [15]

    A survey of text-to-sql in the era of llms: Where are we, and where are we going?

    X. Liu, S. Shen, B. Li, P. Ma, R. Jiang, Y. Zhang, J. Fan, G. Li, N. Tang, and Y. Luo, “A survey of text-to-sql in the era of llms: Where are we, and where are we going?” IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 5735–5754, 2025

  16. [16]

    The dawn of natural language to sql: Are we fully ready?

    B. Li, Y. Luo, C. Chai, G. Li, and N. Tang, “The dawn of natural language to sql: Are we fully ready?” Proc. VLDB Endow., vol. 17, no. 11, pp. 3318–3331, 2024

  17. [17]

    Rsl-sql: Robust schema linking in text-to-sql generation

    Z. Cao, Y. Zheng, Z. Fan, X. Zhang, W. Chen, and X. Bai, “Rsl-sql: Robust schema linking in text-to-sql generation,” arXiv preprint arXiv:2411.00073, 2024

  18. [18]

    Enhancing text-to-sql parsing through question rewriting and execution-guided refinement

    W. Mao, R. Wang, J. Guo, J. Zeng, C. Gao, P. Han, and C. Liu, “Enhancing text-to-sql parsing through question rewriting and execution-guided refinement,” in Findings of the Association for Computational Linguistics: ACL 2024. Bangkok, Thailand: Association for Computational Linguistics, 2024, pp. 2009–2024

  19. [19]

    Share: An slm-based hierarchical action correction assistant for text-to-sql

    G. Qu, J. Li, B. Qin, X. Li, N. Huo, C. Ma, and R. Cheng, “Share: An slm-based hierarchical action correction assistant for text-to-sql,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria: Association for Computational Linguistics, 2025, pp. 11268–11292

  20. [20]

    Teaching Large Language Models to Self-Debug

    X. Chen, M. Lin, N. Schärli, and D. Zhou, “Teaching large language models to self-debug,” arXiv preprint arXiv:2304.05128, 2023

  21. [21]

    Alpha-sql: Zero-shot text-to-sql using monte carlo tree search

    B. Li, J. Zhang, J. Fan, Y. Xu, C. Chen, N. Tang, and Y. Luo, “Alpha-sql: Zero-shot text-to-sql using monte carlo tree search,” arXiv preprint arXiv:2502.17248, 2025

  22. [22]

    Large language model instruction following: A survey of progresses and challenges

    R. Lou, K. Zhang, and W. Yin, “Large language model instruction following: A survey of progresses and challenges,” Computational Linguistics, vol. 50, no. 3, pp. 1053–1095, 2024

  23. [23]

    Exploring chain of thought style prompting for text-to-sql

    C.-Y. Tai, Z. Chen, T. Zhang, X. Deng, and H. Sun, “Exploring chain of thought style prompting for text-to-sql,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Singapore: Association for Computational Linguistics, 2023, pp. 5376–5393

  24. [24]

    Text-to-sql empowered by large language models: A benchmark evaluation

    D. Gao, H. Wang, Y. Li, X. Sun, Y. Qian, B. Ding, and J. Zhou, “Text-to-sql empowered by large language models: A benchmark evaluation,” Proc. VLDB Endow., vol. 17, no. 5, pp. 1132–1145, 2024

  25. [25]

    Mac-sql: A multi-agent collaborative framework for text-to-sql

    B. Wang, C. Ren, J. Yang, X. Liang, J. Bai, L. Chai, Z. Yan, Q.-W. Zhang, D. Yin, X. Sun, and Z. Li, “Mac-sql: A multi-agent collaborative framework for text-to-sql,” in Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, UAE, 2025, pp. 540–557

  26. [26]

    Csc-sql: Corrective self-consistency in text-to-sql via reinforcement learning

    L. Sheng and S.-S. Xu, “Csc-sql: Corrective self-consistency in text-to-sql via reinforcement learning,” arXiv preprint arXiv:2505.13271, 2025

  27. [27]

    Mcts-sql: Light-weight llms can master the text-to-sql through monte carlo tree search

    S. Yuan, L. Chen, M. Yuan, and J. Zhao, “Mcts-sql: Light-weight llms can master the text-to-sql through monte carlo tree search,” arXiv preprint arXiv:2501.16607, 2025

  28. [28]

    XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL

    Y. Liu, Y. Zhu, Y. Gao, Z. Luo, X. Li, X. Shi, Y. Hong, J. Gao, Y. Li, B. Ding, and J. Zhou, “Xiyan-sql: A novel multi-generator framework for text-to-sql,” arXiv preprint arXiv:2507.04701, 2025

  29. [29]

    Dcg-sql: Enhancing in-context learning for text-to-sql with deep contextual schema link graph

    J. Lee, J.-S. Lee, J. Lee, Y. Choi, and J.-H. Lee, “Dcg-sql: Enhancing in-context learning for text-to-sql with deep contextual schema link graph,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, 2025, pp. 15397–15412

  30. [30]

    Parsql: Enhancing text-to-sql through sql parsing and reasoning

    Y. Dai, H. Yang, M. Hao, and P. Chao, “Parsql: Enhancing text-to-sql through sql parsing and reasoning,” in Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, 2025, pp. 661–681

  31. [31]

    Synthesizing text-to-sql data from weak and strong llms

    J. Yang, B. Hui, M. Yang, J. Yang, J. Lin, and C. Zhou, “Synthesizing text-to-sql data from weak and strong llms,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 2024, pp. 7864–7875

  32. [32]

    Towards robustness of text-to-sql models against synonym substitution

    Y. Gan, X. Chen, Q. Huang, M. Purver, J. R. Woodward, J. Xie, and P. Huang, “Towards robustness of text-to-sql models against synonym substitution,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 2021, pp. 2505–2515

  33. [33]

    Structure-grounded pretraining for text-to-sql

    X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun, and M. Richardson, “Structure-grounded pretraining for text-to-sql,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 2021, pp. 1337–1350

  34. [34]

    Exploring underexplored limitations of cross-domain text-to-sql generalization

    Y. Gan, X. Chen, and M. Purver, “Exploring underexplored limitations of cross-domain text-to-sql generalization,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 2021, pp. 8926–8931

  35. [35]

    Qwen3 Technical Report

    A. Yang, A. Li, B. Yang, B. Zhang, et al., “Qwen3 technical report,” arXiv preprint arXiv:2505.09388, 2025

  36. [36]

    GPT-4 Technical Report

    OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, et al., “Gpt-4 technical report,” arXiv preprint arXiv:2303.08774, 2024

  37. [37]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    G. Comanici, E. Bieber, M. Schaekermann, I. Pasupat, N. Sachdeva, et al., “Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities,” arXiv preprint arXiv:2507.06261, 2025

  38. [38]

    Qwen2.5 Technical Report

    A. Yang, B. Yang, B. Zhang, B. Hui, B. Zheng, et al., “Qwen2.5 technical report,” arXiv preprint arXiv:2412.15115, 2024