DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao
11 Pith papers cite this work.
citing papers explorer
- Memory Architectures for Multi-Turn Text-to-SQL: A Benchmark and Empirical Study
EnterpriseMem-Bench shows stateless multi-turn Text-to-SQL accuracy drops to zero by turn 3, working memory is the main driver of gains, and additional memory components yield model- and dataset-dependent effects from +14 to -16 percentage points.
- SANE Schema-aware Natural-language Evaluation of Biological Data
SANE is a new schema-aware benchmark paradigm for text-to-SQL evaluation that demonstrates few-shot LLMs with structured prompting can generate accurate queries on constrained biological data schemas without fine-tuning.
- FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
FlexSQL reaches 65.4% on Spider2-Snow by allowing agents to flexibly explore schemas, generate diverse plans, choose SQL or Python execution, and apply two-tiered repair.
- Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding
TeCoD improves Text-to-SQL execution accuracy by up to 36% over in-context learning and cuts latency 2.2x on matched queries by extracting templates from historical pairs and enforcing them with constrained decoding.
- Exploring the Semantic Gap in Agentic Data Systems: A Formative Study of Operationalization Failures in Analytical Workflows
Formative study across three domains identifies five recurring classes of operationalization failures in agent-generated analytical workflows.
- Schema-First Retrieval: Embedding Catalogs for Natural Language Analytics
Schema-First Retrieval embeds catalog metadata rather than rows and uses parallel retrieval plus reranking to raise table and column recall and cut SQL execution errors on three benchmarks.
- MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL
MARS-SQL trains a multi-agent RL system with ReAct-style interaction and generative validation to produce SQL queries, reaching 77.84% execution accuracy on BIRD dev and 89.75% on Spider test.
- BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning
BADGER is a new enterprise evaluation framework that adds LLM-assisted SQL component extraction and a Hybrid-EX metric validated on 150 human-annotated queries to existing text-to-SQL and agentic assessment methods.
- Knowledge Distillation for Low-Resource Open-source Text-to-SQL Model
A knowledge-aware Text-to-SQL framework constructs domain knowledge bases to generate synthetic data and enhance inference, claiming substantial gains on seven benchmarks especially in low-resource settings.
- LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting
SSEV reaches 85.5-86.4% execution accuracy on Spider benchmarks and 66.3% on BIRD-Dev through self-refinement and voting; ReCAPAgent-SQL achieves 31% on initial Spider 2.0-Lite queries via agent collaboration.