
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning

Victor Zhong, Caiming Xiong, Richard Socher · 2017 · cs.CL · arXiv 1709.00103

Mixed citation behavior. The most common role is background (62% of classified citations).

42 Pith papers cite it.


abstract

A significant amount of the world's knowledge is stored in relational databases. However, the ability for users to retrieve facts from a database is limited due to a lack of understanding of query languages such as SQL. We propose Seq2SQL, a deep neural network for translating natural language questions to corresponding SQL queries. Our model leverages the structure of SQL queries to significantly reduce the output space of generated queries. Moreover, we use rewards from in-the-loop query execution over the database to learn a policy to generate unordered parts of the query, which we show are less suitable for optimization via cross entropy loss. In addition, we will publish WikiSQL, a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia. This dataset is required to train our model and is an order of magnitude larger than comparable datasets. By applying policy-based reinforcement learning with a query execution environment to WikiSQL, our model Seq2SQL outperforms attentional sequence to sequence models, improving execution accuracy from 35.9% to 59.4% and logical form accuracy from 23.4% to 48.3%.
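The abstract's point about execution rewards and unordered query parts can be illustrated with a small sketch. It is not the authors' implementation: the function name `execution_reward`, the toy `players` table, and the reward values (-2 for invalid SQL, -1 for a wrong result, +1 for a matching result) are assumptions chosen for illustration, and SQLite stands in for the query execution environment.

```python
import sqlite3


def execution_reward(conn, predicted_sql, gold_sql):
    """Hypothetical reward: -2 invalid SQL, -1 wrong result, +1 matching result."""
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # The generated query could not be parsed or executed.
        return -2
    # Compare results as multisets so row order does not matter.
    return 1 if sorted(pred_rows) == sorted(gold_rows) else -1


# Toy table (assumed for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ana", "Reds", 24), ("Bo", "Reds", 31), ("Cy", "Blues", 24)],
)

gold = "SELECT name FROM players WHERE team = 'Reds' AND age = 24"
# Same conditions in a different order: a different token sequence,
# which a token-level cross entropy loss would penalize.
swapped = "SELECT name FROM players WHERE age = 24 AND team = 'Reds'"
wrong = "SELECT name FROM players WHERE team = 'Blues'"
broken = "SELECT name FROM players WHERE"

rewards = {
    "swapped": execution_reward(conn, swapped, gold),
    "wrong": execution_reward(conn, wrong, gold),
    "broken": execution_reward(conn, broken, gold),
}
print(rewards)
```

In this sketch the swapped query earns the same reward as the gold query (+1) even though the two strings differ, while the wrong and broken queries earn -1 and -2. That is the sense in which the order of WHERE conditions is irrelevant to execution but not to a sequence-level loss.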


citation-role summary

background: 5 · dataset: 2 · method: 1

citation-polarity summary

background: 5 · use dataset: 2 · use method: 1

representative citing papers

ODUTQA-MDC: A Task for Open-Domain Underspecified Tabular QA with Multi-turn Dialogue-based Clarification

cs.CL · 2026-04-11 · conditional · novelty 8.0

Introduces the ODUTQA-MDC task with a 25k-pair benchmark and MAIC-TQA multi-agent framework for detecting and clarifying underspecified open-domain tabular questions via dialogue.

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

cs.CL · 2023-05-12 · conditional · novelty 8.0

Tiny language models under 10M parameters trained on a synthetic children's story dataset generate fluent, consistent, multi-paragraph English text with near-perfect grammar and reasoning.

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

cs.CL · 2022-02-25 · accept · novelty 8.0

Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.

GS-QA: A Benchmark for Geospatial Question Answering

cs.DB · 2026-05-21 · unverdicted · novelty 7.0

GS-QA is a new benchmark of 2,800 QA pairs on 28 templates using OSM and Wikipedia data to evaluate LLMs on spatial predicates, multi-source reasoning, and diverse answer types including distances and counts.

LEAF-SQL: Level-wise Exploration with Adaptive Fine-graining for Text-to-SQL Skeleton Prediction

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

LEAF-SQL uses level-wise exploration with adaptive fine-graining and dual agents to generate diverse SQL skeletons, reaching 71.6% execution accuracy on the BIRD benchmark and outperforming prior search- and skeleton-based methods.

RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners

cs.CL · 2026-04-30 · conditional · novelty 7.0

RSAT uses SFT on verified traces followed by GRPO with NLI faithfulness rewards to make 1-8B models produce verifiable table reasoning with cell citations, raising faithfulness 3.7x to 0.826.

NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

cs.DB · 2026-04-13 · conditional · novelty 7.0

NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.

Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis

cs.DB · 2026-03-03 · unverdicted · novelty 7.0

The authors define a taxonomy for LLM-enhanced relational operators categorized into Select, Match, Impute, Cluster and Order, and release LROBench to evaluate single and multi-operator queries on semantic database processing.

Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

cs.CV · 2025-09-09 · conditional · novelty 7.0

Visual-TableQA is a new open-domain benchmark of rendered table images and complex QA pairs created via multi-LLM collaborative generation, with fine-tuned models showing robust generalization to external tests.

FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

cs.CV · 2025-04-14 · unverdicted · novelty 7.0

FLARE is a vision-language model family using text-guided vision encoding, context-aware alignment decoding, dual-semantic mapping loss, and text-driven VQA synthesis to achieve deep cross-modal integration, outperforming larger models with only 630 vision tokens at 3B scale.

LoRA: Low-Rank Adaptation of Large Language Models

cs.CL · 2021-06-17 · accept · novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

Cost-Effective Model Evaluation with Meta-Learning

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

MetaEvaluator applies meta-learning over reference models to deliver label-free performance estimates for unseen models across architectures and modalities on unlabeled datasets.

$\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

ξ-DPO rewrites the preference objective as minimizing distance to optimal margins and defines reward as a chosen-to-rejected ratio, yielding a bounded, interpretable margin ξ set directly from the initial reward-gap distribution.

Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

FineStep adds step-level process rewards and credit assignment to tool-augmented Text-to-SQL, achieving 3.25% higher execution accuracy than GRPO on BIRD while cutting redundant tool calls.

FINER-SQL: Boosting Small Language Models for Text-to-SQL

cs.DB · 2026-05-05 · unverdicted · novelty 6.0

FINER-SQL boosts 3B-parameter small language models to 67.73% and 85% execution accuracy on BIRD and Spider benchmarks via dense memory and atomic rewards in group relative policy optimization, matching larger LLMs at lower latency.

EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement

cs.DB · 2026-05-01 · unverdicted · novelty 6.0

EGRefine optimizes column renamings via execution-grounded verification and view materialization to recover Text-to-SQL accuracy lost to schema naming issues while guaranteeing query equivalence.

LeGo-Code: Can Modular Curriculum Learning Advance Complex Code Generation? Insights from Text-to-SQL

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Modular curriculum learning with tier-specific adapters outperforms standard fine-tuning on complex Text-to-SQL queries in Spider and BIRD benchmarks by avoiding catastrophic forgetting.

ReCoQA: A Benchmark for Tool-Augmented and Multi-Step Reasoning in Real Estate Question and Answering

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

ReCoQA is a new large-scale benchmark for multi-step tool-augmented reasoning in real estate QA, accompanied by the HIRE-Agent hierarchical understand-plan-execute baseline.

SQL Query Engine: A Self-Healing LLM Pipeline for Natural Language to PostgreSQL Translation

cs.DB · 2026-04-15 · unverdicted · novelty 6.0

A self-healing LLM pipeline for natural language to PostgreSQL translation achieves up to 9.3 percentage point accuracy gains on benchmarks through error diagnosis and anti-regression mechanisms.

AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views

cs.DB · 2026-04-08 · unverdicted · novelty 6.0

AV-SQL uses a pipeline of LLM agents to generate intermediate CTE views that decompose complex Text-to-SQL queries, reaching 70.38% execution accuracy on Spider 2.0.

OmniTQA: A Cost-Aware System for Hybrid Query Processing over Semi-Structured Data

cs.DB · 2026-04-02 · unverdicted · novelty 6.0

OmniTQA integrates LLM semantic reasoning as a first-class query operator with classical relational operators in a cost-aware planner for hybrid structured and semi-structured data.

Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive Overview of Methods, Taxonomy, and Future Directions

cs.DB · 2026-03-24 · unverdicted · novelty 6.0

A literature survey that taxonomizes methods, datasets, and evaluation practices for natural language interfaces to geospatial and temporal databases while identifying recurring trends and future directions.

TabXEval: Why this is a Bad Table? An eXhaustive Rubric for Table Evaluation

cs.CL · 2025-05-28 · conditional · novelty 6.0

TabXEval is a rubric-based two-phase framework using structural alignment (TabAlign) and semantic-syntactic comparison (TabCompare) to evaluate tables more precisely than standard metrics.

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

cs.CV · 2024-12-18 · unverdicted · novelty 6.0

VPiT enables pretrained LLMs to perform both visual understanding and generation by predicting discrete text tokens and continuous visual tokens, with understanding data proving more effective than generation-specific data.

citing papers explorer

Showing 42 of 42 citing papers.

ODUTQA-MDC: A Task for Open-Domain Underspecified Tabular QA with Multi-turn Dialogue-based Clarification cs.CL · 2026-04-11 · conditional · none · ref 3 · internal anchor
Introduces the ODUTQA-MDC task with a 25k-pair benchmark and MAIC-TQA multi-agent framework for detecting and clarifying underspecified open-domain tabular questions via dialogue.
TinyStories: How Small Can Language Models Be and Still Speak Coherent English? cs.CL · 2023-05-12 · conditional · none · ref 33 · internal anchor
Tiny language models under 10M parameters trained on a synthetic children's story dataset generate fluent, consistent, multi-paragraph English text with near-perfect grammar and reasoning.
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? cs.CL · 2022-02-25 · accept · none · ref 81 · internal anchor
Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.
GS-QA: A Benchmark for Geospatial Question Answering cs.DB · 2026-05-21 · unverdicted · none · ref 69 · internal anchor
GS-QA is a new benchmark of 2,800 QA pairs on 28 templates using OSM and Wikipedia data to evaluate LLMs on spatial predicates, multi-source reasoning, and diverse answer types including distances and counts.
LEAF-SQL: Level-wise Exploration with Adaptive Fine-graining for Text-to-SQL Skeleton Prediction cs.CL · 2026-05-10 · unverdicted · none · ref 13 · internal anchor
LEAF-SQL uses level-wise exploration with adaptive fine-graining and dual agents to generate diverse SQL skeletons, reaching 71.6% execution accuracy on the BIRD benchmark and outperforming prior search- and skeleton-based methods.
RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners cs.CL · 2026-04-30 · conditional · none · ref 8 · internal anchor
RSAT uses SFT on verified traces followed by GRPO with NLI faithfulness rewards to make 1-8B models produce verifiable table reasoning with cell citations, raising faithfulness 3.7x to 0.826.
NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions cs.DB · 2026-04-13 · conditional · none · ref 83 · internal anchor
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis cs.DB · 2026-03-03 · unverdicted · none · ref 60 · internal anchor
The authors define a taxonomy for LLM-enhanced relational operators categorized into Select, Match, Impute, Cluster and Order, and release LROBench to evaluate single and multi-operator queries on semantic database processing.
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images cs.CV · 2025-09-09 · conditional · none · ref 48 · internal anchor
Visual-TableQA is a new open-domain benchmark of rendered table images and complex QA pairs created via multi-LLM collaborative generation, with fine-tuned models showing robust generalization to external tests.
FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding cs.CV · 2025-04-14 · unverdicted · none · ref 83 · internal anchor
FLARE is a vision-language model family using text-guided vision encoding, context-aware alignment decoding, dual-semantic mapping loss, and text-driven VQA synthesis to achieve deep cross-modal integration, outperforming larger models with only 630 vision tokens at 3B scale.
LoRA: Low-Rank Adaptation of Large Language Models cs.CL · 2021-06-17 · accept · none · ref 62 · internal anchor
Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.
Cost-Effective Model Evaluation with Meta-Learning cs.LG · 2026-05-22 · unverdicted · none · ref 105 · internal anchor
MetaEvaluator applies meta-learning over reference models to deliver label-free performance estimates for unseen models across architectures and modalities on unlabeled datasets.
$\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin cs.LG · 2026-05-09 · unverdicted · none · ref 41 · internal anchor
ξ-DPO rewrites the preference objective as minimizing distance to optimal margins and defines reward as a chosen-to-rejected ratio, yielding a bounded, interpretable margin ξ set directly from the initial reward-gap distribution.
Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL cs.CL · 2026-05-06 · unverdicted · none · ref 8 · internal anchor
FineStep adds step-level process rewards and credit assignment to tool-augmented Text-to-SQL, achieving 3.25% higher execution accuracy than GRPO on BIRD while cutting redundant tool calls.
FINER-SQL: Boosting Small Language Models for Text-to-SQL cs.DB · 2026-05-05 · unverdicted · none · ref 44 · internal anchor
FINER-SQL boosts 3B-parameter small language models to 67.73% and 85% execution accuracy on BIRD and Spider benchmarks via dense memory and atomic rewards in group relative policy optimization, matching larger LLMs at lower latency.
EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement cs.DB · 2026-05-01 · unverdicted · none · ref 29 · internal anchor
EGRefine optimizes column renamings via execution-grounded verification and view materialization to recover Text-to-SQL accuracy lost to schema naming issues while guaranteeing query equivalence.
LeGo-Code: Can Modular Curriculum Learning Advance Complex Code Generation? Insights from Text-to-SQL cs.AI · 2026-04-20 · unverdicted · none · ref 26 · internal anchor
Modular curriculum learning with tier-specific adapters outperforms standard fine-tuning on complex Text-to-SQL queries in Spider and BIRD benchmarks by avoiding catastrophic forgetting.
ReCoQA: A Benchmark for Tool-Augmented and Multi-Step Reasoning in Real Estate Question and Answering cs.CL · 2026-04-20 · unverdicted · none · ref 2 · internal anchor
ReCoQA is a new large-scale benchmark for multi-step tool-augmented reasoning in real estate QA, accompanied by the HIRE-Agent hierarchical understand-plan-execute baseline.
SQL Query Engine: A Self-Healing LLM Pipeline for Natural Language to PostgreSQL Translation cs.DB · 2026-04-15 · unverdicted · none · ref 1 · internal anchor
A self-healing LLM pipeline for natural language to PostgreSQL translation achieves up to 9.3 percentage point accuracy gains on benchmarks through error diagnosis and anti-regression mechanisms.
AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views cs.DB · 2026-04-08 · unverdicted · none · ref 53 · internal anchor
AV-SQL uses a pipeline of LLM agents to generate intermediate CTE views that decompose complex Text-to-SQL queries, reaching 70.38% execution accuracy on Spider 2.0.
OmniTQA: A Cost-Aware System for Hybrid Query Processing over Semi-Structured Data cs.DB · 2026-04-02 · unverdicted · none · ref 46 · internal anchor
OmniTQA integrates LLM semantic reasoning as a first-class query operator with classical relational operators in a cost-aware planner for hybrid structured and semi-structured data.
Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive Overview of Methods, Taxonomy, and Future Directions cs.DB · 2026-03-24 · unverdicted · none · ref 1 · internal anchor
A literature survey that taxonomizes methods, datasets, and evaluation practices for natural language interfaces to geospatial and temporal databases while identifying recurring trends and future directions.
TabXEval: Why this is a Bad Table? An eXhaustive Rubric for Table Evaluation cs.CL · 2025-05-28 · conditional · none · ref 3 · internal anchor
TabXEval is a rubric-based two-phase framework using structural alignment (TabAlign) and semantic-syntactic comparison (TabCompare) to evaluate tables more precisely than standard metrics.
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning cs.CV · 2024-12-18 · unverdicted · none · ref 133 · internal anchor
VPiT enables pretrained LLMs to perform both visual understanding and generation by predicting discrete text tokens and continuous visual tokens, with understanding data proving more effective than generation-specific data.
Encoding Database Schemas with Relation-Aware Self-Attention for Text-to-SQL Parsers cs.LG · 2019-06-27 · unverdicted · none · ref 25 · internal anchor
Relation-aware self-attention encodes schema structure for text-to-SQL, raising exact-match accuracy on Spider from 18.96% to 42.94%.
SecureMCP: A Policy-Enforced LLM Data Access Framework for AIoT Systems via Model Context Protocol cs.CR · 2026-05-06 · unverdicted · none · ref 7 · internal anchor
SecureMCP integrates RBAC with five sequential defense modules in an MCP server to achieve 82.3% policy compliance against adversarial LLM SQL queries in AIoT while preserving execution accuracy.
SCOPE: Planning for Hybrid Querying over Clinical Trial Data cs.CL · 2026-04-28 · unverdicted · none · ref 3 · internal anchor
SCOPE uses explicit multi-LLM planning to improve accuracy on 1,500 hybrid reasoning questions over clinical trial tables compared to zero-shot, few-shot, CoT, and agent baselines.
A Demonstration of SQLyzr: A Platform for Fine-Grained Text-to-SQL Evaluation and Analysis cs.DB · 2026-04-23 · unverdicted · none · ref 21 · internal anchor
SQLyzr is a new evaluation platform that adds diverse metrics, realistic settings, query classification, and analysis features to overcome the single-score limitations of existing text-to-SQL benchmarks.
FD-NL2SQL: Feedback-Driven Clinical NL2SQL that Improves with Use cs.CL · 2026-04-17 · unverdicted · none · ref 3 · internal anchor
FD-NL2SQL is a feedback-driven clinical NL2SQL system that decomposes questions, retrieves exemplars via embeddings, synthesizes SQL, and expands its example bank from user edits plus logic-based mutations to improve without new annotations.
Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning cs.CL · 2026-04-11 · unverdicted · none · ref 68 · internal anchor
APMPO boosts average Pass@1 scores on math reasoning benchmarks by 3 points over GRPO by using an adaptive power-mean policy objective and feedback-driven clipping bounds in RLVR training.
Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs cs.CL · 2026-04-11 · unverdicted · none · ref 83 · internal anchor
FREIA applies free energy principles and adaptive advantage shaping to unsupervised RL, outperforming baselines by 0.5-3.5 Pass@1 points on math reasoning with a 1.5B model.
HFX: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling cs.DC · 2025-08-21 · unverdicted · none · ref 56 · internal anchor
HFX jointly designs scheduling and scaling for multi-SLO LLM serving, achieving up to 4.44x higher SLO attainment, 65.82% lower latency, and 49.81% lower cost than prior systems on multi-task workloads.
StarCoder: may the source be with you! cs.CL · 2023-05-09 · accept · none · ref 181 · internal anchor
StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.
CraftAssist: A Framework for Dialogue-enabled Interactive Agents cs.AI · 2019-07-19 · unverdicted · none · ref 21 · internal anchor
CraftAssist supplies a Minecraft bot, dialogue interface, and data-recording platform intended to support research on agents that execute tasks specified through conversation.
Knowledge Distillation for Low-Resource Open-source Text-to-SQL Model cs.CL · 2026-05-13 · unverdicted · none · ref 8 · internal anchor
A knowledge-aware Text-to-SQL framework constructs domain knowledge bases to generate synthetic data and enhance inference, claiming substantial gains on seven benchmarks especially in low-resource settings.
Retrieve Only Relevant Tables Whether Few or Many: Adaptive Table Retrieval Method cs.IR · 2026-04-12 · unverdicted · none · ref 72 · internal anchor
An adaptive thresholding mechanism combined with sliding-window reranking retrieves a query-dependent number of tables from large corpora, improving retrieval and downstream text-to-SQL performance on Spider, BIRD, and Spider 2.0.
M3: Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis cs.IR · 2025-06-27 · accept · none · ref 21 · internal anchor
M3 uses LLMs to translate natural language into SQL for the MIMIC-IV database, achieving 93-94% accuracy on benchmark questions with support for local privacy-preserving deployment.
Why Build an Assistant in Minecraft? cs.AI · 2019-07-22 · unverdicted · none · ref 103 · internal anchor
A rationale is presented for developing an assistant in Minecraft to advance natural language understanding and dialogue learning.
MLFriend: Interactive Prediction Task Recommendation for Event-Driven Time-Series Data cs.LG · 2019-06-28 · unverdicted · none · ref 24 · internal anchor
MLFriend enumerates prediction tasks for event-driven time-series data and interactively recommends useful ones, with evaluation on three datasets yielding 2885 tasks of which 722 were deemed useful by experts.
Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation cs.CL · 2025-10-08 · unverdicted · none · ref 19 · internal anchor
A survey that categorizes TQA benchmarks and LLM modeling strategies by challenges while identifying underexplored areas such as reinforcement learning.
Large Language Models: A Survey cs.CL · 2024-02-09 · accept · none · ref 186 · internal anchor
The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
Neural Machine Translating from Natural Language to SPARQL cs.CL · 2019-06-21 · unverdicted · none · ref 21 · internal anchor
Eight NMT models are evaluated for natural language to SPARQL translation, with CNN-based models reaching BLEU up to 98 and accuracy up to 94% on high-quality datasets.
