Database Context Compression for Text-to-SQL on Real-World Large Databases

Jingwen Liu; Junfeng Zhao; Weibin Liao; Xin Gao; Yasha Wang

arxiv: 2606.28601 · v1 · pith:5BG5CV5Onew · submitted 2026-06-26 · 💻 cs.DB · cs.AI

Pith reviewed 2026-06-30 00:46 UTC · model grok-4.3

classification 💻 cs.DB cs.AI

keywords Text-to-SQL · database context compression · schema linking · large language models · enterprise databases · context reduction · SGCF

The pith

A query-agnostic offline compression of database schemas and documentation cuts input size by up to 75 times and lifts Text-to-SQL execution accuracy by 1.8-1.9 percent on large real-world benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that the main obstacle for Text-to-SQL on enterprise databases is not model reasoning but the sheer volume and redundancy of schema, column, and documentation text fed to the model. It introduces DBCC, a middleware that rewrites this material into a much smaller representation using the SGCF principle before any query arrives. Experiments on Spider 2.0-Snow and BIRD show token counts falling from millions to tens of thousands, schema-linking recall rising from zero to over 56 percent, and end-to-end accuracy improving across three different Text-to-SQL systems. Because the compression runs once and offline, it can be dropped into existing pipelines without changing the language model or the query-time logic.

Core claim

The central claim is that database context compression, formalized through the Support-Gain Component Factorization principle, unifies repeated-column removal, table templating, semantic componentization, and evidence purification into a single coverage objective; performing this transformation offline produces a compact representation that preserves query-relevant information and yields higher schema-linking recall and execution accuracy when the compressed context is supplied to existing Text-to-SQL systems.

What carries the argument

DBCC, a model-agnostic middleware that performs offline structural and semantic compression of schemas, descriptions, and external documentation according to the SGCF coverage objective, followed by lightweight online evidence purification.
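
As a rough illustration of the structural half of this idea, the sketch below states columns that repeat across many tables only once and collapses tables with identical remaining columns into a shared template. It is not DBCC's implementation: the function name `compress_schema`, the `min_support` threshold, and the toy schema are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def compress_schema(schema, min_support=3):
    """Toy structural compression (hypothetical, not the paper's algorithm).

    schema: dict mapping table name -> list of column names.
    Returns (shared, templates): `shared` lists columns that occur in at
    least `min_support` tables; `templates` maps a tuple of the remaining
    columns to the sorted list of tables sharing that signature.
    """
    # Count in how many tables each column name appears.
    support = Counter(col for cols in schema.values() for col in set(cols))
    # Columns repeated in enough tables are stated once.
    shared = sorted(c for c, n in support.items() if n >= min_support)
    shared_set = set(shared)

    # Group tables whose remaining columns are identical.
    templates = defaultdict(list)
    for table, cols in schema.items():
        signature = tuple(sorted(c for c in cols if c not in shared_set))
        templates[signature].append(table)
    return shared, {sig: sorted(tabs) for sig, tabs in templates.items()}

# Made-up schema with audit columns and two sharded tables.
schema = {
    "events_2023": ["id", "user_id", "created_at", "updated_at"],
    "events_2024": ["id", "user_id", "created_at", "updated_at"],
    "users": ["id", "name", "created_at", "updated_at"],
}
shared, templates = compress_schema(schema)
```

In this toy case the two `events_*` tables collapse into one template and the three audit-style columns are listed once. The sketch does not record which tables carry each shared column, so a lossless version would need to track exceptions.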

If this is right

Input token counts fall from 2.6 million to 34.7 thousand on the largest Spider 2.0-Snow subset.
Schema-linking strict recall rises from 0 percent to 56.5 percent under DeepSeek-V3.2 and 63.1 percent under Claude Opus 4.7.
End-to-end execution accuracy increases by 1.8-1.9 percent over three recent Text-to-SQL systems on Spider 2.0-Snow and BIRD.
The compressed representation can be inserted into any existing Text-to-SQL pipeline without retraining the language model.
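
The headline reduction can be checked with simple arithmetic. The snippet below uses only the two token counts reported above; the variable names are illustrative.

```python
import math

full_tokens = 2_600_000      # 2.6M tokens, as reported for the full context
compressed_tokens = 34_700   # 34.7K tokens, as reported after compression

ratio = full_tokens / compressed_tokens
orders_of_magnitude = math.log10(ratio)

print(f"compression ratio: {ratio:.1f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

The ratio comes out at roughly 75 times, which is a little under two full orders of magnitude and consistent with the "up to 75 times" and "up to two orders of magnitude" phrasings.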

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Databases intended for AI use could be redesigned with compressibility as an explicit design goal rather than treating documentation as an afterthought.
The same offline rewrite technique may apply to other LLM tasks that ingest large structured repositories, such as code generation over enterprise codebases.
If the compression preserves coverage, it opens the possibility of maintaining a single compressed database view that serves many downstream models simultaneously.

Load-bearing premise

The compression step, performed without seeing future queries, never discards information that later turns out to be required for generating correct SQL.

What would settle it

Run the same set of previously unseen queries on both the original full context and the DBCC-compressed context; if any query produces a correct execution result only on the full context, the compression has lost necessary information.
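
A minimal sketch of that comparison, assuming per-query correctness has already been recorded for both runs. The function name `find_compression_losses` and the result dictionaries are hypothetical; producing them requires running a Text-to-SQL system twice and checking execution results.

```python
def find_compression_losses(results_full, results_compressed):
    """Return query IDs that are correct on the full context only.

    Both arguments map a query ID to True (execution result correct) or
    False. A query missing from `results_compressed` counts as a failure.
    """
    return sorted(
        qid
        for qid, ok_full in results_full.items()
        if ok_full and not results_compressed.get(qid, False)
    )

# Made-up outcomes for four unseen queries (not data from the paper).
results_full = {"q1": True, "q2": False, "q3": True, "q4": True}
results_compressed = {"q1": True, "q2": True, "q3": False, "q4": True}

losses = find_compression_losses(results_full, results_compressed)
```

Here only `q3` is flagged. Because language-model output can vary between runs, a flagged query is a candidate for inspection rather than proof of lost information; repeating both runs would help separate noise from genuine loss.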

Figures

Figures reproduced from arXiv: 2606.28601 by Jingwen Liu, Junfeng Zhao, Weibin Liao, Xin Gao, Yasha Wang.

**Figure 2.** Architecture of DBCC. Phase I (offline, query-agnostic, per database) rewrites the raw database context D = (S, M, E) into a compact view through two operators: a structural operator that turns wide tables and families of sharded tables into column-group factorizations and template hierarchies S → S′, and a semantic operator that turns verbose column descriptions into hierarchical keyword tags and shared-t…

**Figure 3.** Cost–utility Pareto frontier on LARGE databases of Spider 2.0-Snow. LLM: DeepSeek-V3.2. Three threshold sweeps are shown: structural-only, semantic-only, and full DBCC. The horizontal axis is the input-token cost (log scale, smaller-is-better, hence inverted); the vertical axis is SRR. SRR rises non-monotonically as the prompt shrinks, peaks at Tok ≈ 34.7K for full DBCC, and drops at extreme over-compressi…

**Figure 4.** Distribution of failure modes of full DBCC on Spider 2.0-Snow. LLM: DeepSeek-V3.2. Slices show the share of each failure mode within the schema-linking error subset; absolute case counts in parentheses.
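
For readers unfamiliar with the cost-utility frontier in Figure 3, the sketch below shows one common way to extract a Pareto frontier from a threshold sweep, treating fewer tokens and higher recall as better. The function name `pareto_frontier` and the data points are made up for illustration and are not values from the paper.

```python
def pareto_frontier(points):
    """Return the points not dominated on (fewer tokens, higher recall).

    points: list of (tokens, recall) pairs. A point is dominated if some
    other point has tokens <= and recall >=, with at least one strict.
    """
    frontier = []
    best_recall = float("-inf")
    # Visit cheapest points first; on equal cost, higher recall first.
    for tokens, recall in sorted(points, key=lambda p: (p[0], -p[1])):
        # Keep a point only if it beats every cheaper point on recall.
        if recall > best_recall:
            frontier.append((tokens, recall))
            best_recall = recall
    return frontier

# Made-up (tokens, recall) pairs, for illustration only.
sweep = [(1000, 0.20), (5000, 0.50), (20000, 0.45), (80000, 0.55), (300000, 0.10)]
frontier = pareto_frontier(sweep)
```

In this toy sweep the points at 20,000 and 300,000 tokens are dominated, since a cheaper configuration achieves higher recall than either.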

Original abstract

Recent progress in Text-to-SQL has been driven by stronger language models and prompting strategies, yet performance on real enterprise benchmarks such as Spider 2.0 and BIRD remains far below that on classical academic datasets. We argue that the main bottleneck is no longer reasoning, but database representation. Real databases contain repeated audit columns, large groups of similar tables, opaque identifiers whose meanings are stored only in documentation, and extensive data dictionaries with little query-relevant information. Existing query-aware methods, including schema linking and retrieval-based schema selection, filter this raw context but still operate on redundant and verbose representations. We reformulate the problem as database context compression, a query-agnostic transformation that rewrites schemas, semantic descriptions, and external documentation into a compact representation. We formalize this transformation with the SGCF (Support-Gain Component Factorization) principle, which unifies repeated column extraction, isomorphic table templating, semantic componentization, and evidence purification under a single coverage objective. Based on SGCF, we propose DBCC, a database-side middleware that performs offline structural and semantic compression together with lightweight online evidence purification. DBCC is model-agnostic and can be integrated into existing Text-to-SQL pipelines. On Spider 2.0-Snow and BIRD, DBCC reduces input context by up to two orders of magnitude (from 2.6M to 34.7K tokens on the largest Spider 2.0-Snow subset), improves schema-linking strict recall from 0% to 56.5% under DeepSeek-V3.2 (63.1% under Claude Opus 4.7), and consistently increases end-to-end execution accuracy by 1.8-1.9% over three recent Text-to-SQL systems. Our code is open-sourced at https://github.com/MrBlankness/SchemaCompression.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DBCC reframes Text-to-SQL schema handling as query-agnostic compression and delivers large token cuts with small accuracy lifts, but the completeness argument for unseen queries needs checking.

The core contribution is treating database context as something you can compress offline without knowing the query, using SGCF to pull together repeated-column removal, table templating, and evidence cleanup under one coverage goal. That unification looks new relative to the schema-linking and retrieval work they cite.

The engineering side is solid. They report cutting context from 2.6M tokens down to 34.7K on the biggest Spider 2.0-Snow slice, lifting strict schema-linking recall from 0% to 56.5%, and getting a steady 1.8-1.9% execution-accuracy bump across three recent systems on both Spider 2.0-Snow and BIRD. The middleware is model-agnostic and the code is released, which makes the practical claim testable.

The accuracy numbers are modest, and the abstract gives no ablations, error bars, or derivation of the SGCF objective. The query-agnostic claim is the load-bearing one: if the offline pass drops low-frequency documentation or opaque identifiers that later queries need, the reported gains could shrink on new data. The stress-test concern about rare elements is worth pressing in review, though the paper's own results on the two benchmarks suggest the compression preserved what mattered for those queries.

This is aimed at people shipping LLM-based database interfaces on real enterprise schemas where token budgets matter. It is worth sending to referees because the problem is real, the engineering is concrete, and the open code lets others check the claims directly.

Referee Report

2 major / 0 minor

Summary. The paper claims that a query-agnostic database context compression technique called DBCC, based on the SGCF principle, can reduce the input context for Text-to-SQL by up to two orders of magnitude (from 2.6M to 34.7K tokens) on large databases, improve schema-linking strict recall from 0% to 56.5%, and increase end-to-end execution accuracy by 1.8-1.9% over three recent systems on Spider 2.0-Snow and BIRD benchmarks.

Significance. If the results hold, this work has significant practical implications for Text-to-SQL on real-world large databases by addressing the context length bottleneck through offline compression. The model-agnostic design and open-sourced code at https://github.com/MrBlankness/SchemaCompression are strengths that enhance the contribution's value.

major comments (2)

[Abstract] The abstract reports concrete token-reduction and accuracy numbers, but provides no derivation of SGCF, no error bars, no ablation of the compression steps, and no discussion of how query-agnosticism was validated; evaluation details are absent.
[SGCF definition] The SGCF principle unifies repeated column extraction, isomorphic table templating, semantic componentization, and evidence purification under a coverage objective but supplies no explicit invariant or completeness argument for low-frequency documentation entries or opaque identifiers whose meanings appear only in external docs; this is load-bearing for the claim that the offline pass preserves everything required for unseen queries.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and for recognizing the practical value of addressing the context-length bottleneck in real-world Text-to-SQL. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

Referee: [Abstract] The abstract reports concrete token-reduction and accuracy numbers, but provides no derivation of SGCF, no error bars, no ablation of the compression steps, and no discussion of how query-agnosticism was validated; evaluation details are absent.

Authors: The abstract is written to be concise. The full manuscript contains the SGCF derivation (Section 3), ablations of the individual compression steps (Section 5.3), and explicit validation of query-agnosticism via experiments on queries unseen during the offline pass (Section 4.2). We will revise the manuscript to add error bars to all reported metrics, include a short pointer to the evaluation protocol in the abstract, and ensure all numerical claims are cross-referenced to the relevant sections and tables.
revision: yes

Referee: [SGCF definition] The SGCF principle unifies repeated column extraction, isomorphic table templating, semantic componentization, and evidence purification under a coverage objective but supplies no explicit invariant or completeness argument for low-frequency documentation entries or opaque identifiers whose meanings appear only in external docs; this is load-bearing for the claim that the offline pass preserves everything required for unseen queries.

Authors: We agree that an explicit completeness argument would strengthen the formalization. The coverage objective is intended to retain every semantic component present in the original context, with low-frequency documentation entries kept verbatim and opaque identifiers preserved in the compressed schema; online purification is applied only at query time. The current version does not supply a formal invariant. In revision we will add a dedicated paragraph in Section 3.2 that states the preservation property, provides concrete examples from Spider 2.0-Snow and BIRD involving external documentation, and clarifies the boundary between offline retention and online filtering.
revision: yes

Circularity Check

0 steps flagged

No circularity: SGCF is presented as a new unifying principle with empirical outcomes

The paper defines SGCF as a coverage objective that unifies several compression operations and reports DBCC's effects as measured improvements on Spider 2.0-Snow and BIRD (context reduction, recall lift, accuracy gains). No equations, fitted parameters, or self-citations are exhibited that reduce the claimed results to inputs by construction; the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The central claim rests on the premise that database representation (rather than reasoning) is the dominant bottleneck and that SGCF can factor schemas without query-specific loss. The abstract enumerates no free parameters or axioms; the two entities it introduces are listed below.

invented entities (2)

SGCF principle · no independent evidence
purpose: Unify repeated column extraction, isomorphic table templating, semantic componentization, and evidence purification under a single coverage objective
Introduced in the abstract as the formalization of the compression transformation; no independent evidence supplied.

DBCC middleware · no independent evidence
purpose: Perform offline structural/semantic compression plus lightweight online evidence purification
Proposed system that implements the SGCF principle; no independent evidence supplied.


[1] [1]

Constructing an interactive natural language interface for relational databases,

F. Li and H. V . Jagadish, “Constructing an interactive natural language interface for relational databases,” inProceedings of the VLDB Endow- ment, vol. 8, no. 1, 2014, pp. 73–84

2014

[2] [2]

SQLizer: Query synthesis from natural language,

N. Yaghmazadeh, Y . Wang, I. Dillig, and T. Dillig, “SQLizer: Query synthesis from natural language,”Proc. ACM Program. Lang., vol. 1, no. OOPSLA, pp. 1–26, 2017

2017

[3] [3]

Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task,

T. Yu, R. Zhang, K. Yang, M. Yasunaga, D. Wang, Z. Li, J. Ma, I. Li, Q. Yao, S. Roman, Z. Zhang, and D. Radev, “Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task,” inProc. EMNLP, 2018, pp. 3911–3921

2018

[4] [4]

ATHENA: An ontology-driven system for natural language querying over relational data stores,

D. Saha, A. Floratou, K. Sankaranarayanan, U. F. Minhas, A. R. Mittal, and F. ¨Ozcan, “ATHENA: An ontology-driven system for natural language querying over relational data stores,”Proc. VLDB Endow., vol. 9, no. 12, pp. 1209–1220, 2016

2016

[5] [5]

Bridging the semantic gap with SQL query logs in natural language interfaces to databases,

C. Baik, H. V . Jagadish, and Y . Li, “Bridging the semantic gap with SQL query logs in natural language interfaces to databases,” inProc. IEEE Int. Conf. Data Engineering (ICDE), 2019, pp. 374–385

2019

[6] [6]

Duoquest: A dual-specification system for expressive SQL queries,

C. Baik, Z. Jin, M. J. Cafarella, and H. V . Jagadish, “Duoquest: A dual-specification system for expressive SQL queries,” inProc. ACM SIGMOD Int. Conf. Management of Data, 2020, pp. 2319–2329

2020

[7] [7]

A survey of text-to-SQL in the era of LLMs: Where are we, and where are we going?

X. Liu, S. Shen, B. Li, P. Ma, R. Jiang, Y . Luo, Y . Zhang, J. Fan, G. Li, and N. Tang, “A survey of text-to-SQL in the era of LLMs: Where are we, and where are we going?”IEEE Trans. Knowl. Data Eng., 2025

2025

[8] [8]

Next-generation database interfaces: A survey of LLM-based text-to- SQL,

Z. Hong, Z. Yuan, Q. Zhang, H. Chen, J. Dong, F. Huang, and X. Huang, “Next-generation database interfaces: A survey of LLM-based text-to- SQL,”IEEE Trans. Knowl. Data Eng., 2025

2025

[9] [9]

RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers,

B. Wang, R. Shin, X. Liu, O. Polozov, and M. Richardson, “RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers,” inProc. ACL, 2020, pp. 7567–7578

2020

[10] [10]

RESDSQL: Decoupling schema linking and skeleton parsing for text-to-SQL,

H. Li, J. Zhang, C. Li, and H. Chen, “RESDSQL: Decoupling schema linking and skeleton parsing for text-to-SQL,” inProc. AAAI, vol. 37, no. 11, 2023, pp. 13 067–13 075

2023

[11] M. Pourreza and D. Rafiei, “DIN-SQL: Decomposed in-context learning of text-to-SQL with self-correction,” Adv. Neural Inf. Process. Syst., vol. 36, pp. 36 339–36 348, 2023

[12] S. Talaei, M. Pourreza, Y.-C. Chang, A. Mirhoseini, and A. Saberi, “CHESS: Contextual harnessing for efficient SQL synthesis,” arXiv preprint arXiv:2405.16755, 2024

[13] M. Pourreza, H. Li, R. Sun, Y. Chung, S. Talaei et al., “CHASE-SQL: Multi-path reasoning and preference optimized candidate selection in text-to-SQL,” arXiv preprint arXiv:2410.01943, 2024

[14] D. Gao, H. Wang, Y. Li, X. Sun, Y. Qian, B. Ding, and J. Zhou, “Text-to-SQL empowered by large language models: A benchmark evaluation,” Proc. VLDB Endow., vol. 17, no. 5, pp. 1132–1145, 2024

[15] J. Li, B. Hui, G. Qu, J. Yang, B. Li, B. Li, B. Wang, B. Qin, R. Geng, N. Huo et al., “Can LLM already serve as a database interface? A big bench for large-scale database grounded text-to-SQLs,” Adv. Neural Inf. Process. Syst., vol. 36, pp. 42 330–42 357, 2023

[16] F. Lei, J. Chen, Y. Ye et al., “Spider 2.0: Evaluating language models on real-world enterprise text-to-SQL workflows,” arXiv preprint arXiv:2411.07763, 2024

[17] X. V. Lin, R. Socher, and C. Xiong, “Bridging textual and tabular data for cross-domain text-to-SQL semantic parsing,” in Findings of EMNLP, 2020, pp. 4870–4888

[18] Z. Cao, Y. Zheng, Z. Fan, X. Zhang, W. Chen, and X. Bai, “RSL-SQL: Robust schema linking in text-to-SQL generation,” arXiv preprint arXiv:2411.00073, 2024

[19] M. Kothyari, D. Dhingra, S. Sarawagi, and S. Chakrabarti, “CRUSH4SQL: Collective retrieval using schema hallucination for Text2SQL,” in Proc. EMNLP, 2023, pp. 14 054–14 066

[20] Y. Wang, P. Liu, and X. Yang, “LinkAlign: Scalable schema linking for real-world large-scale multi-database text-to-SQL,” in Proc. EMNLP, 2025, pp. 977–991

[21] H. Li, J. Zhang, H. Liu, J. Fan, X. Zhang, J. Zhu, R. Wei, H. Pan, C. Li, and H. Chen, “CodeS: Towards building open-source language models for text-to-SQL,” Proc. ACM Manag. Data, vol. 2, no. 3, pp. 1–28, 2024

[22] H. Ma, Y. Shen, H. Liu et al., “DB-Explore: Automated database exploration and instruction synthesis for text-to-SQL,” arXiv preprint arXiv:2503.04959, 2025

[23] M. Deng, A. Ramachandran, C. Xu, L. Hu, Z. Yao, A. Datta, and H. Zhang, “ReFoRCE: A text-to-SQL agent with self-refinement, format restriction, and column exploration,” in ICLR Workshop on VerifAI, 2025

[24] B. Wang, C. Ren, J. Yang, X. Liang, J. Bai, Q.-W. Zhang, Z. Yan, and Z. Li, “MAC-SQL: A multi-agent collaborative framework for text-to-SQL,” in Proc. COLING, 2025, pp. 540–557

[25] Z. Wang, Y. Zheng, Z. Cao et al., “AutoLink: Autonomous schema exploration and expansion for scalable schema linking in text-to-SQL at scale,” in Proc. AAAI Conf. Artificial Intelligence, 2026, to appear

[26] R. Cao, L. Chen, Z. Chen, Y. Zhao, S. Zhu, and K. Yu, “LGESQL: Line graph enhanced text-to-SQL model with mixed local and non-local relations,” in Proc. ACL-IJCNLP, 2021, pp. 2541–2555

[27] T. Yu, C.-S. Wu, X. V. Lin, B. Wang, Y. C. Tan, X. Yang, D. Radev, R. Socher, and C. Xiong, “GraPPa: Grammar-augmented pre-training for table semantic parsing,” in Proc. ICLR, 2021

[28] X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun, and M. Richardson, “Structure-grounded pretraining for text-to-SQL,” in Proc. NAACL, 2021, pp. 1337–1350

[29] T. Scholak, N. Schucher, and D. Bahdanau, “PICARD: Parsing incrementally for constrained auto-regressive decoding from language models,” in Proc. EMNLP, 2021, pp. 9895–9901

[30] J. Guo, Z. Zhan, Y. Gao, Y. Xiao, J.-G. Lou, T. Liu, and D. Zhang, “Towards complex text-to-SQL in cross-domain database with intermediate representation,” in Proc. ACL, 2019, pp. 4524–4535

[31] X. Dong, C. Zhang, Y. Ge, Y. Mao, Y. Gao, J. Lin, D. Lou et al., “C3: Zero-shot text-to-SQL with ChatGPT,” arXiv preprint arXiv:2307.07306, 2023

[32] H. Zhang, R. Cao, L. Chen, H. Xu, and K. Yu, “ACT-SQL: In-context learning for text-to-SQL with automatically-generated chain-of-thought,” arXiv preprint arXiv:2310.17342, 2023

[33] Z. Li, X. Wang, J. Zhao, S. Yang et al., “PET-SQL: A prompt-enhanced two-stage text-to-SQL framework with cross-consistency,” arXiv preprint arXiv:2403.09732, 2024

[34] T. Ren, Y. Fan, Z. He, R. Huang, J. Dai, C. Huang, Y. Jing, K. Zhang, Y. Yang, and X. S. Wang, “PURPLE: Making a large language model a better SQL writer,” in Proc. IEEE Int. Conf. Data Engineering (ICDE), 2024, pp. 15–28

[35] X. Li, Q. Cai, Y. Shu, C. Guo, and B. Yang, “AID-SQL: Adaptive in-context learning of text-to-SQL with difficulty-aware instruction and retrieval-augmented generation,” in Proc. IEEE Int. Conf. Data Engineering (ICDE), 2025, pp. 3945–3957

[36] Y. Fan, Z. He, T. Ren, D. Guo, L. Chen, R. Zhu, G. Chen, Y. Jing, K. Zhang, and X. S. Wang, “Gar: A generate-and-rank approach for natural language to SQL translation,” in Proc. IEEE Int. Conf. Data Engineering (ICDE), 2023, pp. 110–122

[37] Y. Fan, Z. He, T. Ren, C. Huang, Y. Jing, K. Zhang, and X. S. Wang, “Metasql: A generate-then-rank framework for natural language to SQL translation,” in Proc. IEEE Int. Conf. Data Engineering (ICDE), 2024, pp. 1765–1778

[38] O. Gkini, T. Belmpas, G. Koutrika, and Y. E. Ioannidis, “An in-depth benchmarking of text-to-SQL systems,” in Proc. ACM SIGMOD Int. Conf. Management of Data, 2021, pp. 632–644

[39] A. Liu, X. Hu, L. Lin, and L. Wen, “Semantic enhanced text-to-SQL parsing via iteratively learning schema linking graph,” in Proc. ACM SIGKDD Conf. Knowledge Discovery and Data Mining, 2022, pp. 1021–1030

[40] Y. Zhang, A. Floratou, J. Cahoon, S. Krishnan, A. C. Müller, D. Banda, F. Psallidas, and J. M. Patel, “Schema matching using pre-trained language models,” in Proc. IEEE Int. Conf. Data Engineering (ICDE), 2023, pp. 1558–1571

[41] M. Zhang, K. Ma, L. Xu, K. Zhang, Y. Peng, and R. Jin, “CLEAR: A parser-independent disambiguation framework for NL2SQL,” in Proc. IEEE Int. Conf. Data Engineering (ICDE), 2025, pp. 1–14

[42] H. Jiang, Q. Wu, C.-Y. Lin, Y. Yang, and L. Qiu, “LLMLingua: Compressing prompts for accelerated inference of large language models,” in Proc. EMNLP, 2023, pp. 13 358–13 376

[43] N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang, “Lost in the middle: How language models use long contexts,” Trans. Assoc. Comput. Linguist., vol. 12, pp. 157–173, 2024

[44] I. Trummer, “Generating succinct descriptions of database schemata for cost-efficient prompting of large language models,” Proc. VLDB Endow., vol. 17, no. 11, 2024

[45] C. Zhang, Y. Mao, Y. Fan, Y. Mi, Y. Gao, L. Chen, D. Lou, and J. Lin, “FinSQL: Model-agnostic LLMs-based text-to-SQL framework for financial analysis,” in Companion of the ACM SIGMOD Int. Conf. Management of Data, 2024, pp. 93–105

[46] J. Fan, Z. Gu, S. Zhang, Y. Zhang, Z. Chen, L. Cao, G. Li, S. Madden, X. Du, and N. Tang, “Combining small language models and large language models for zero-shot NL2SQL,” Proc. VLDB Endow., vol. 17, no. 11, pp. 2750–2763, 2024

[47] Y. Chung, G. T. Kakkar, Y. Gan, B. Milne, and F. Ozcan, “Is long context all you need? Leveraging LLM’s extended context for NL2SQL,” Proc. VLDB Endow., vol. 18, no. 8, pp. 2735–2747, 2025

[48] S. Robertson and H. Zaragoza, The Probabilistic Relevance Framework: BM25 and Beyond. Now Publishers Inc., 2009, vol. 4

[49] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence embeddings using Siamese BERT-networks,” in Proc. EMNLP-IJCNLP, 2019, pp. 3982–3992

[50] B. Cao, W. Liao, Y. Sun, D. Fang, H. Li, and W. Lam, “APEX-SQL: Talking to the data via agentic exploration for Text-to-SQL,” in Proc. ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD), 2026