Benchmarking and Improving Text-to-SQL Generation under Ambiguity

Benchmarking and Improving Text-to-SQL Generation under Ambiguity · 2023 · DOI 10.18653/v1/2023.emnlp-main.436

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Draft-Refine-Optimize: Self-Evolved Learning for Natural Language to MongoDB Query Generation

cs.DB · 2026-03-11 · unverdicted · novelty 7.0

EvoMQL uses iterative Draft-Refine-Optimize cycles with execution feedback to reach 76.6% accuracy on EAI and 83.1% on TEND benchmarks for natural language to MongoDB query generation.

SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

SPENCE shows older NL2SQL benchmarks like Spider have high performance sensitivity to syntactic changes, indicating likely training contamination, while newer ones like BIRD show little sensitivity and appear largely clean.

citing papers explorer

Showing 2 of 2 citing papers.

Draft-Refine-Optimize: Self-Evolved Learning for Natural Language to MongoDB Query Generation · cs.DB · 2026-03-11 · unverdicted · none · ref 3
SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks · cs.CL · 2026-04-20 · unverdicted · none · ref 37
