SPENCE shows that performance on older NL2SQL benchmarks such as Spider is highly sensitive to syntactic changes, which suggests training contamination, whereas newer benchmarks such as BIRD show little sensitivity and appear largely clean.
and Xie, Jinxia and Huang, Pengsheng
2 Pith papers cite this work.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Progress-SQL introduces a multi-turn RL framework with ODT-based structural alignment and progressive rewards that measure improvement across refinement turns, yielding gains on BIRD, Spider, and robustness benchmarks.
citing papers explorer
- SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks
  SPENCE shows older NL2SQL benchmarks like Spider have high performance sensitivity to syntactic changes, indicating likely training contamination, while newer ones like BIRD show little sensitivity and appear largely clean.
- Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards
  Progress-SQL introduces a multi-turn RL framework with ODT-based structural alignment and progressive rewards that measure improvement across refinement turns, yielding gains on BIRD, Spider, and robustness benchmarks.