Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation

2023 · arXiv 2308.15363

17 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: background (3)

citation-polarity summary: background (3)

representative citing papers

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

cs.CR · 2025-07-14 · unverdicted · novelty 8.0

ExCyTIn-Bench is the first benchmark of 7542 questions from Microsoft Sentinel threat investigation graphs, where the best LLM agent achieves a reward of 0.606.

Residual Skill Optimization for Text-to-SQL Ensembles

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

Residual skill optimization creates complementary Text-to-SQL agents by training each new skill on prior ensemble failures, yielding accuracy gains on Spider2-Lite and transfer to other dialects and tasks.

CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.

FINER-SQL: Boosting Small Language Models for Text-to-SQL

cs.DB · 2026-05-05 · unverdicted · novelty 6.0

FINER-SQL boosts 3B-parameter small language models to 67.73% and 85% execution accuracy on the BIRD and Spider benchmarks, respectively, via dense memory and atomic rewards in group relative policy optimization (GRPO), matching larger LLMs at lower latency.

EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement

cs.DB · 2026-05-01 · unverdicted · novelty 6.0

EGRefine optimizes column renamings via execution-grounded verification and view materialization to recover Text-to-SQL accuracy lost to schema naming issues while guaranteeing query equivalence.

Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

TeCoD improves Text-to-SQL execution accuracy by up to 36% over in-context learning and cuts latency 2.2x on matched queries by extracting templates from historical pairs and enforcing them with constrained decoding.

Querying Structured Data Through Natural Language Using Language Models

cs.CL · 2026-04-03 · conditional · novelty 6.0

Fine-tuning an 8B LLM on synthetic data enables accurate natural-language querying of structured datasets, such as accessibility services in Spain, and generalizes to new locations.

Access Paths for Efficient Ordering with Large Language Models

cs.DB · 2025-08-30 · unverdicted · novelty 6.0

Introduces the LLM ORDER BY semantic operator with algorithmic improvements, a semantic-aware external merge sort, and a budget-aware optimizer that selects near-optimal access paths for LLM-based ordering.

Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning

cs.CL · 2025-05-20 · unverdicted · novelty 6.0

N-rep consistency achieves comparable BIRD benchmark scores for text-to-SQL at $0.039 per query by combining multiple schema representations, without chain-of-thought reasoning or fine-tuning.

Intelligent Drill-Down: Large Language Model-Driven Drill-Down Technique for Human-AI Collaborative Visual Exploration

cs.HC · 2026-04-18 · unverdicted · novelty 5.0

An LLM-based framework recommends drill-down paths in visual analytics by approximating a greedy algorithm, interpreting user intent, and managing exploration branches to reduce cognitive load.

From Business Events to Auditable Decisions: Ontology-Governed Graph Simulation for Enterprise AI

cs.AI · 2026-04-08 · unverdicted · novelty 5.0

LOM-action uses business events to drive ontology-governed graph simulations that generate auditable decisions, reporting 93.82% accuracy and 98.74% tool-chain F1 versus 24-36% F1 for frontier LLMs.

MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL

cs.CL · 2025-11-02 · unverdicted · novelty 5.0

MARS-SQL trains a multi-agent RL system with ReAct-style interaction and generative validation to produce SQL queries, reaching 77.84% execution accuracy on BIRD dev and 89.75% on Spider test.

XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL

cs.CL · 2025-07-07 · unverdicted · novelty 5.0

XiYan-SQL achieves state-of-the-art Text-to-SQL accuracy by combining schema filtering, a multi-generator ensemble fine-tuned on varied SQL formats, and a selection model.

CHESS: Contextual Harnessing for Efficient SQL Synthesis

cs.LG · 2024-05-27 · conditional · novelty 5.0

CHESS deploys four LLM agents to retrieve information, prune schemas, generate refined SQL candidates, and validate via unit tests, reporting up to 71.10% accuracy on BIRD with 83% fewer calls than leading proprietary baselines.

Retrieve Only Relevant Tables Whether Few or Many: Adaptive Table Retrieval Method

cs.IR · 2026-04-12 · unverdicted · novelty 4.0

An adaptive thresholding mechanism combined with sliding-window reranking retrieves a query-dependent number of tables from large corpora, improving retrieval and downstream text-to-SQL performance on Spider, BIRD, and Spider 2.0.

LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting

cs.AI · 2026-01-25 · unverdicted · novelty 4.0

SSEV reaches 85.5-86.4% execution accuracy on Spider benchmarks and 66.3% on BIRD-Dev through self-refinement and voting; ReCAPAgent-SQL achieves 31% on initial Spider 2.0-Lite queries via agent collaboration.

LLM Agents Are the Antidote to Walled Gardens

cs.LG · 2025-06-30 · unverdicted · novelty 4.0

LLM agents enable universal interoperability by serving as automatic translators and adapters between proprietary digital services.

citing papers explorer

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation cs.CR · 2025-07-14 · unverdicted · none · ref 10
Residual Skill Optimization for Text-to-SQL Ensembles cs.CL · 2026-05-20 · unverdicted · none · ref 12
CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation cs.CL · 2026-05-08 · unverdicted · none · ref 19
FINER-SQL: Boosting Small Language Models for Text-to-SQL cs.DB · 2026-05-05 · unverdicted · none · ref 91
EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement cs.DB · 2026-05-01 · unverdicted · none · ref 4
Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding cs.CL · 2026-04-30 · unverdicted · none · ref 9
Querying Structured Data Through Natural Language Using Language Models cs.CL · 2026-04-03 · conditional · none · ref 6
Access Paths for Efficient Ordering with Large Language Models cs.DB · 2025-08-30 · unverdicted · none · ref 18
Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning cs.CL · 2025-05-20 · unverdicted · none · ref 2
Intelligent Drill-Down: Large Language Model-Driven Drill-Down Technique for Human-AI Collaborative Visual Exploration cs.HC · 2026-04-18 · unverdicted · none · ref 17
From Business Events to Auditable Decisions: Ontology-Governed Graph Simulation for Enterprise AI cs.AI · 2026-04-08 · unverdicted · none · ref 7
MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL cs.CL · 2025-11-02 · unverdicted · none · ref 8
XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL cs.CL · 2025-07-07 · unverdicted · none · ref 13
CHESS: Contextual Harnessing for Efficient SQL Synthesis cs.LG · 2024-05-27 · conditional · none · ref 65
Retrieve Only Relevant Tables Whether Few or Many: Adaptive Table Retrieval Method cs.IR · 2026-04-12 · unverdicted · none · ref 24
LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting cs.AI · 2026-01-25 · unverdicted · none · ref 9
LLM Agents Are the Antidote to Walled Gardens cs.LG · 2025-06-30 · unverdicted · none · ref 31