Learnat: Learning nl2sql with ast-guided task decomposition for large language models

Learnat: Learning nl2sql with ast-guided task decomposition for large language models , author= · 2025 · cs.CL · arXiv 2504.02327

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

Natural Language to SQL (NL2SQL) aims to translate natural language queries into executable SQL statements, offering non-expert users intuitive access to databases. While recent approaches leveraging large-scale private LLMs such as GPT-4 have achieved state-of-the-art results, they face two critical challenges: the lack of openness and reproducibility, and the prohibitive computational cost of test-time scaling. To address these issues, we explore improving the model-level performance of small-scale public LLMs in NL2SQL under resource-constrained settings. Our exploratory experiments reveal the potential of task decomposition for enhancing NL2SQL performance, but also highlight the difficulty of enabling LLMs to decompose queries effectively. Motivated by these findings, we propose LearNAT, a novel framework designed to enhance decomposition capabilities of LLM. LearNAT introduces (1) a Decomposition Synthesis Procedure, which leverages AST-guided search with pruning strategies to generate verifiable and efficient decompositions, and (2) Margin-Aware Reinforcement Learning, which provides fine-grained preference optimization for multi-step reasoning beyond standard DPO. Extensive experiments on benchmark datasets demonstrate that LearNAT significantly improves the performance of small-scale LLMs, achieving results comparable to GPT-4 with only a 7B parameter model. These results validate the effectiveness of verifiable decomposition and fine-grained preference learning in advancing NL2SQL towards openness, transparency, and efficiency. Our code is publicly available at https://github.com/MrBlankness/LearNAT.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LEAF-SQL: Level-wise Exploration with Adaptive Fine-graining for Text-to-SQL Skeleton Prediction

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

LEAF-SQL uses level-wise exploration with adaptive fine-graining and dual agents to generate diverse SQL skeletons, reaching 71.6% execution accuracy on the BIRD benchmark and outperforming prior search- and skeleton-based methods.

Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

cs.AI · 2026-06-26 · unverdicted · novelty 6.0 · 2 refs

DynaSteer is a dynamic representation editing framework that uses pattern clustering, Fisher-LDA, and lookahead entropy monitoring to steer LLM reasoning trajectories toward truth on MATH and coding tasks.

The Tell-Tale Norm: $\ell_2$ Magnitude as a Signal for Reasoning Dynamics in Large Language Models

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

The L2 norm of LLM hidden states signals reasoning intensity, with a theoretical bound on SAE feature activations, enabling three new test-time scaling techniques that boost performance.

citing papers explorer

Showing 3 of 3 citing papers after filters.

LEAF-SQL: Level-wise Exploration with Adaptive Fine-graining for Text-to-SQL Skeleton Prediction cs.CL · 2026-05-10 · unverdicted · none · ref 4 · internal anchor
LEAF-SQL uses level-wise exploration with adaptive fine-graining and dual agents to generate diverse SQL skeletons, reaching 71.6% execution accuracy on the BIRD benchmark and outperforming prior search- and skeleton-based methods.
Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories cs.AI · 2026-06-26 · unverdicted · none · ref 43 · 2 links · internal anchor
DynaSteer is a dynamic representation editing framework that uses pattern clustering, Fisher-LDA, and lookahead entropy monitoring to steer LLM reasoning trajectories toward truth on MATH and coding tasks.
The Tell-Tale Norm: $\ell_2$ Magnitude as a Signal for Reasoning Dynamics in Large Language Models cs.CL · 2026-06-04 · unverdicted · none · ref 48 · internal anchor
The L2 norm of LLM hidden states signals reasoning intensity, with a theoretical bound on SAE feature activations, enabling three new test-time scaling techniques that boost performance.

Learnat: Learning nl2sql with ast-guided task decomposition for large language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer