Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao

Mohammadreza Pourreza et al · 2023 · arXiv 2304.11015

11 Pith papers cite this work.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

cs.CL · 2023-10-05 · conditional · novelty 8.0

DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.

Memory Architectures for Multi-Turn Text-to-SQL: A Benchmark and Empirical Study

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

EnterpriseMem-Bench shows that stateless multi-turn Text-to-SQL accuracy drops to zero by turn 3, that working memory is the main driver of gains, and that additional memory components yield model- and dataset-dependent effects ranging from +14 to -16 percentage points.

SANE: Schema-aware Natural-language Evaluation of Biological Data

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

SANE is a new schema-aware benchmark paradigm for text-to-SQL evaluation, showing that few-shot LLMs with structured prompting can generate accurate queries on constrained biological data schemas without fine-tuning.

FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents

cs.CL · 2026-05-04 · unverdicted · novelty 6.0

FlexSQL reaches 65.4% on Spider2-Snow by allowing agents to flexibly explore schemas, generate diverse plans, choose SQL or Python execution, and apply two-tiered repair.

Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

TeCoD improves Text-to-SQL execution accuracy by up to 36% over in-context learning and cuts latency by 2.2x on matched queries by extracting templates from historical question-SQL pairs and enforcing them with constrained decoding.

Exploring the Semantic Gap in Agentic Data Systems: A Formative Study of Operationalization Failures in Analytical Workflows

cs.DB · 2026-07-01 · unverdicted · novelty 5.0

A formative study across three domains identifies five recurring classes of operationalization failures in agent-generated analytical workflows.

Schema-First Retrieval: Embedding Catalogs for Natural Language Analytics

cs.IR · 2026-06-23 · unverdicted · novelty 5.0

Schema-First Retrieval embeds catalog metadata rather than rows and uses parallel retrieval plus reranking to raise table and column recall and cut SQL execution errors on three benchmarks.

MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL

cs.CL · 2025-11-02 · unverdicted · novelty 5.0

MARS-SQL trains a multi-agent RL system with ReAct-style interaction and generative validation to produce SQL queries, reaching 77.84% execution accuracy on BIRD dev and 89.75% on Spider test.

BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning

cs.AI · 2026-06-01 · unverdicted · novelty 4.0

BADGER is a new enterprise evaluation framework that adds LLM-assisted SQL component extraction and a Hybrid-EX metric validated on 150 human-annotated queries to existing text-to-SQL and agentic assessment methods.

Knowledge Distillation for Low-Resource Open-source Text-to-SQL Model

cs.CL · 2026-05-13 · unverdicted · novelty 4.0

A knowledge-aware Text-to-SQL framework constructs domain knowledge bases to generate synthetic data and enhance inference, claiming substantial gains on seven benchmarks especially in low-resource settings.

LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting

cs.AI · 2026-01-25 · unverdicted · novelty 4.0

SSEV reaches 85.5-86.4% execution accuracy on Spider benchmarks and 66.3% on BIRD-Dev through self-refinement and voting; ReCAPAgent-SQL achieves 31% on initial Spider 2.0-Lite queries via agent collaboration.

citing papers explorer

Showing 10 of 10 citing papers after filters.

Memory Architectures for Multi-Turn Text-to-SQL: A Benchmark and Empirical Study cs.CL · 2026-05-25 · unverdicted · none · ref 7
SANE Schema-aware Natural-language Evaluation of Biological Data cs.CL · 2026-06-03 · unverdicted · none · ref 11
FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents cs.CL · 2026-05-04 · unverdicted · none · ref 17
Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding cs.CL · 2026-04-30 · unverdicted · none · ref 24
Exploring the Semantic Gap in Agentic Data Systems: A Formative Study of Operationalization Failures in Analytical Workflows cs.DB · 2026-07-01 · unverdicted · none · ref 19
Schema-First Retrieval: Embedding Catalogs for Natural Language Analytics cs.IR · 2026-06-23 · unverdicted · none · ref 46
MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL cs.CL · 2025-11-02 · unverdicted · none · ref 24
BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning cs.AI · 2026-06-01 · unverdicted · none · ref 14
Knowledge Distillation for Low-Resource Open-source Text-to-SQL Model cs.CL · 2026-05-13 · unverdicted · none · ref 5
LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting cs.AI · 2026-01-25 · unverdicted · none · ref 8