E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL
8 Pith papers cite this work.
citing papers explorer
- CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation
CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.
- NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
- PV-SQL: Synergizing Database Probing and Rule-based Verification for Text-to-SQL Agents
PV-SQL boosts Text-to-SQL execution accuracy by 5% and valid efficiency by 20.8% on BIRD benchmarks via database probing and rule-based SQL verification while using fewer tokens.
- Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning
N-rep consistency achieves comparable BIRD benchmark scores for text-to-SQL at $0.039 per query by combining multiple schema representations, without chain-of-thought reasoning or fine-tuning.
- Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation
KaSLA applies knapsack optimization hierarchically to schema linking for LLM text-to-SQL, claiming better results than large models and improved SQL generation on Spider and BIRD.
- Extending Confidence-Based Text2Cypher with Grammar and Schema Aware Filtering
Post-generation grammar and schema filtering on top of confidence scoring raises syntactic validity and execution success for Text2Cypher but increases empty outputs and lowers coverage.
- Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning
APMPO boosts average Pass@1 scores on math reasoning benchmarks by 3 points over GRPO by using an adaptive power-mean policy objective and feedback-driven clipping bounds in RLVR training.
- Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs
FREIA applies free energy principles and adaptive advantage shaping to unsupervised RL, outperforming baselines by 0.5-3.5 Pass@1 points on math reasoning with a 1.5B model.