Schema-First Retrieval: Embedding Catalogs for Natural Language Analytics

Adarsh Agrawal; Shashank Indukuri

arxiv: 2606.28387 · v1 · pith:MZOWVIFNnew · submitted 2026-06-23 · 💻 cs.IR · cs.AI

Pith reviewed 2026-06-30 10:34 UTC · model grok-4.3

keywords: schema retrieval, text-to-SQL, catalog embedding, natural language analytics, vector search, cross-encoder reranking, enterprise data warehouses, SQL generation errors

The pith

Schema-First Retrieval embeds five catalog object types to reach 96.4% table recall and cut SQL execution errors from 15.6% to 6.2%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Enterprise text-to-SQL systems often fail because the model receives the wrong schema context from warehouses that contain thousands of tables, abbreviated columns, and hidden conventions. The paper introduces Schema-First Retrieval, a layer that embeds catalog metadata for tables, columns, metrics, relationships, and query history using object-specific text templates instead of raw rows. The system applies parallel vector search, lineage expansion, cross-encoder reranking, workload memory, and access-control gates before any SQL is generated. On CRUSH4SQL it achieves 96.4% table recall@20, and semantic retrieval beats an equally templated BM25 baseline by 32.8 points at table recall@5. On BIRD the improved context lowers execution errors by a factor of 2.5. A reader would care because schema selection is shown to be a first-class retrieval problem whose solution directly improves the reliability of natural language analytics on real enterprise data.

Core claim

Schema-First Retrieval indexes five typed catalog objects—tables, columns, metrics, relationships, and query history—using object-specific text templates. At query time it combines parallel vector search, lineage expansion, cross-encoder reranking, workload memory, and deterministic access-control gates. On CRUSH4SQL this reaches 96.4% table recall@20; cross-encoder reranking adds 11.1 points at column recall@10. Against an equally-templated BM25 baseline, semantic retrieval gains 32.8 points at table recall@5. On SEDE, query history raises table recall@5 from 52.1% to 92.3%. On BIRD, the resulting schema-first context reduces SQL execution errors from 15.6% to 6.2%.
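The idea of object-specific text templates can be illustrated with a minimal sketch. The template wording, the field names, and the `render` helper below are all hypothetical; the page does not reproduce the paper's actual templates. The only detail taken from the source is that column names are rendered with double quotes, as the Figure 2 caption describes.

```python
# Hypothetical templates, one per catalog object type named in the paper.
# The wording and field names are illustrative assumptions, not the authors' templates.
TEMPLATES = {
    "table": "Table {name}. {description} Columns: {columns}.",
    "column": 'Column "{name}" ({dtype}) in table {table}. {description}',
    "metric": "Metric {name}. Definition: {definition}",
    "relationship": "Join {left} to {right} on {condition}.",
    "query_history": "Past question: {question} Tables used: {tables}.",
}


def render(obj_type: str, **fields: str) -> str:
    """Render one catalog object into the text that would be embedded."""
    text = TEMPLATES[obj_type].format(**fields)
    # Collapse whitespace left behind by empty optional fields.
    return " ".join(text.split())


if __name__ == "__main__":
    print(render("column", name="ord_amt", dtype="DECIMAL",
                 table="orders", description="Order amount in USD."))
    print(render("table", name="orders", description="Customer orders.",
                 columns="order_id, ord_amt"))
```

Rendering each type separately means a column can carry its data type and parent table, while a relationship carries its join condition, so each embedded string describes exactly one kind of object.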

What carries the argument

Schema-First Retrieval pipeline that indexes five typed catalog objects with hand-crafted text templates and applies a multi-stage retrieval process of vector search, reranking, and access controls.
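The query-time stages can be sketched end to end in a few lines. This is a toy under stated assumptions, not the authors' implementation: `embed` is a bag-of-words stand-in for a real embedding model, the rerank step reuses the same cosine score where the paper uses a cross-encoder, and placing the access-control gate last is a choice made here for illustration. The function and parameter names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Assumption: bag-of-words counts stand in for a learned embedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, catalog, lineage, allowed, k=1):
    """catalog: name -> rendered text; lineage: name -> linked names; allowed: permitted names."""
    q = embed(question)

    def score(name):
        return cosine(q, embed(catalog[name]))

    # 1. Vector search: keep the k most similar catalog objects.
    hits = sorted(catalog, key=score, reverse=True)[:k]
    # 2. Lineage expansion: add objects directly linked to the hits.
    expanded = list(hits)
    for name in hits:
        for neighbor in lineage.get(name, []):
            if neighbor not in expanded:
                expanded.append(neighbor)
    # 3. Rerank the expanded set (placeholder for a cross-encoder that
    #    would score each question/text pair jointly).
    reranked = sorted(expanded, key=score, reverse=True)
    # 4. Deterministic access-control gate: drop anything not permitted.
    return [name for name in reranked if name in allowed]


if __name__ == "__main__":
    catalog = {
        "orders": "table orders customer order amount date",
        "customers": "table customers customer name region",
        "payroll": "table payroll employee salary",
    }
    lineage = {"orders": ["customers"]}
    print(retrieve("total order amount by customer", catalog, lineage,
                   allowed={"orders", "customers"}))
```

Because the gate is a plain set-membership filter, its outcome does not depend on model scores, which is one way to read "deterministic" in the paper's description.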

If this is right

Semantic retrieval outperforms an equally templated BM25 baseline by 32.8 points at table recall@5.
Cross-encoder reranking improves column recall@10 by an additional 11.1 points.
Query history integration raises table recall@5 from 52.1% to 92.3% on SEDE.
Schema-first context reduces SQL execution errors from 15.6% to 6.2% on BIRD.
Catalog selection becomes solvable as a retrieval task rather than a prompt-formatting detail for warehouses with thousands of tables.
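The recall@k figures quoted above can be read against a standard definition, sketched below. This page does not state the paper's exact formula, so the per-question averaging shown here is an assumption, and the function names are illustrative.

```python
def recall_at_k(ranked, gold, k):
    """Fraction of gold items that appear among the top-k ranked results."""
    gold_set = set(gold)
    if not gold_set:
        raise ValueError("gold set must be non-empty")
    return len(set(ranked[:k]) & gold_set) / len(gold_set)


def mean_recall_at_k(examples, k):
    """Average recall@k over (ranked, gold) pairs, one pair per question."""
    return sum(recall_at_k(ranked, gold, k) for ranked, gold in examples) / len(examples)


if __name__ == "__main__":
    # Two gold tables, only one of them retrieved in the top 2.
    print(recall_at_k(["orders", "payroll", "customers"], {"orders", "customers"}, k=2))
```

Under this definition a multi-table question only scores 1.0 when every gold table is in the top k, which is consistent with recall dropping as the number of required tables grows.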

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The results suggest that retrieval layers focused on catalog metadata could be inserted ahead of other LLM tasks that consume large structured data sources.
Performance differences between semantic and lexical methods imply that enterprises may gain more from richer catalog maintenance than from further tuning of the SQL generator alone.
The strong effect of query history indicates that maintaining workload logs could become a standard practice for improving repeated natural-language analytics queries.
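How a workload log might feed retrieval can be sketched as a nearest-question lookup. Everything here is an illustrative assumption: the paper embeds query history as a catalog object type, whereas this toy uses token-set overlap (Jaccard similarity) in place of embeddings, and the log format and function names are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a stand-in for embedding similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def tables_from_history(question, history, top_n=1):
    """Return tables used by the most similar logged questions, best match first."""
    ranked = sorted(history, key=lambda h: jaccard(question, h["question"]), reverse=True)
    tables = []
    for entry in ranked[:top_n]:
        for table in entry["tables"]:
            if table not in tables:
                tables.append(table)
    return tables


if __name__ == "__main__":
    log = [
        {"question": "monthly revenue by region", "tables": ["fct_sales", "dim_region"]},
        {"question": "employee headcount by team", "tables": ["hr_roster"]},
    ]
    print(tables_from_history("revenue by region last month", log))
```

The example shows why a log can help where names fail: a new question never mentions `fct_sales`, but a similar past question already resolved to it.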

Load-bearing premise

The five typed catalog objects and their hand-crafted text templates capture the information needed for accurate schema selection across real enterprise workloads.

What would settle it

Running the same pipeline on an enterprise warehouse whose schemas contain many informal metrics or permission boundaries outside the five object types and templates, then measuring whether table recall stays near 96% and execution errors stay near 6%.

Figures

Figures reproduced from arXiv: 2606.28387 by Adarsh Agrawal, Shashank Indukuri.

**Figure 1.** Schema-first retrieval pipeline. Offline, five catalog object types are rendered with type-specific templates and embedded into separate vector indexes. Online, a question is classified and embedded, searched across catalog collections, expanded through the lineage graph, reranked with a cross-encoder, and assembled into compact schema context for SQL generation.

**Figure 2.** End-to-end worked example. The retrieved column objects render with proper double-quoting. The schema-first generator copies this surface form and avoids the unquoted-identifier syntax error that breaks the full-schema baseline.

**Figure 3.** Main empirical effects across benchmarks. Reranking improves CRUSH4SQL column recall across low-K operating points, query history sharply improves SEDE table recall at R@5, and schema-first context reduces BIRD SQL execution failures from 15.6% to 6.2%.

**Figure 4.** Retrieval performance by query complexity (number of gold tables required). Single-table queries achieve near-perfect table retrieval (96.9% R@5). Column recall degrades more steeply under multi-table queries, identifying fine-grained column retrieval as the primary remaining challenge.

**Figure 5.** Noise robustness on CRUSH4SQL. Recall stays within ±2 points across 0/25/50/75% metadata drop. The system is essentially noise-immune on this benchmark because CRUSH4SQL table and column names carry most of the discriminative signal; the descriptions in CRUSH4SQL's union schema are minimally populated to begin with.

Original abstract

Enterprise text-to-SQL systems often fail before SQL is generated: the model receives the wrong schema context. Modern warehouses contain thousands of tables, abbreviated columns, informal metrics, hidden join conventions, and permission boundaries that are not captured by raw table names. We introduce Schema-First Retrieval, a retrieval layer that embeds catalog metadata rather than warehouse rows. The system indexes five typed catalog objects, tables, columns, metrics, relationships, and query history, using object-specific text templates. At query time, it combines parallel vector search, lineage expansion, cross-encoder reranking, workload memory, and deterministic access-control gates before SQL generation. On CRUSH4SQL (1,534 questions), Schema-First Retrieval reaches 96.4% table recall@20 and cross-encoder reranking adds +11.1 points at column recall@10; against an equally-templated BM25 baseline, semantic retrieval is +32.8 points at table recall@5. On SEDE (857 questions), query history raises table recall@5 from 52.1% to 92.3%. On BIRD (96 questions), schema-first context reduces SQL execution errors from 15.6% to 6.2%, a 2.5x reduction. These results show that catalog selection is a first-class retrieval problem for natural language analytics, not a prompt formatting detail.
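The headline deltas in the abstract follow from simple arithmetic on the reported percentages. The block below only restates that arithmetic; the relative-reduction figure is derived here and is not a number the abstract reports.

```latex
\[
\frac{15.6}{6.2} \approx 2.52, \qquad
15.6 - 6.2 = 9.4 \text{ points}, \qquad
1 - \frac{6.2}{15.6} \approx 0.603
\]
\[
92.3 - 52.1 = 40.2 \text{ points (SEDE table recall@5 with query history)}
\]
```

So the "2.5x reduction" on BIRD corresponds to a 9.4-point absolute drop, or roughly a 60% relative drop, in the execution error rate.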

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Schema retrieval via five typed templates and a multi-stage pipeline beats name baselines on three datasets, but missing ablations and dataset details limit how far the gains can be trusted.

The main thing to know is that this paper frames schema selection as a retrieval problem and reports concrete lifts in table and column recall plus lower SQL errors when the right catalog objects are pulled in first.

What is new is the choice to index five specific object types (tables, columns, metrics, relationships, and query history), each with its own hand-crafted text template, then layer vector search, lineage expansion, cross-encoder reranking, workload memory, and access-control gates on top. The abstract shows the combination reaching 96.4% table recall@20 on CRUSH4SQL, a 32.8-point gain over an equally templated BM25 baseline at table recall@5, a roughly 40-point history-driven gain on SEDE, and a cut in execution errors from 15.6% to 6.2% on BIRD.

The work does a solid job naming the real failure mode in enterprise warehouses and giving numbers that line up with the claim that raw names are not enough.

The soft spots are the missing controls. There are no ablations to show what the templates, lineage, or reranker each contribute. No error bars appear, and the abstract gives no information on how the question sets were built or filtered. The templates themselves are not tested against alternative encodings, so the concern that they may miss implicit joins or informal metric details remains open.

This paper is for people working on text-to-SQL systems or retrieval over structured metadata. A reader who needs to improve schema context in large warehouses would get practical ideas from the pipeline and the dataset results.

It deserves a serious referee because the core idea is grounded in a documented problem and the experiments use multiple datasets with clear metrics, even if the methods need more detail and testing.

Referee Report

3 major / 0 minor

Summary. The paper introduces Schema-First Retrieval, a retrieval layer for enterprise text-to-SQL that indexes five typed catalog objects (tables, columns, metrics, relationships, query history) via object-specific hand-crafted text templates. It combines parallel vector search, lineage expansion, cross-encoder reranking, workload memory, and access-control gates. Empirical results include 96.4% table recall@20 on CRUSH4SQL (1,534 questions), +11.1 points from reranking at column recall@10, query history lifting table recall@5 from 52.1% to 92.3% on SEDE (857 questions), and schema-first context cutting SQL execution errors from 15.6% to 6.2% on BIRD (96 questions).

Significance. If the results hold, the work establishes schema selection as a distinct retrieval problem rather than a prompt-formatting detail, with potential to improve robustness of NL analytics systems on large, complex warehouses. The concrete gains over templated BM25 and raw-name baselines on three datasets, plus the explicit multi-component pipeline, provide a clear empirical foundation for further research in catalog-aware retrieval.

major comments (3)

[Abstract] The reported metrics (96.4% table recall@20, +11.1 points from reranking, 15.6% to 6.2% error reduction) are presented without error bars, confidence intervals, or statistical tests, and the construction or filtering process for the 1,534/857/96 question sets is not described; these omissions are load-bearing for evaluating reliability and generalizability of the central claims.
[Abstract] No ablation results are provided to isolate the contribution of each pipeline stage (vector search, lineage expansion, cross-encoder reranking, query history); without them it is impossible to attribute the observed gains (e.g., the 40-point lift from 52.1% to 92.3% or the +11.1 reranking delta) to specific design choices.
[Abstract] The central assumption that the five hand-crafted text templates fully capture the information needed for accurate schema selection is not validated; no alternative encodings or completeness checks are reported, so the high recall figures may not generalize if real workloads contain un-templated details such as implicit join conventions or informal metric definitions.
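The request for error bars can be made concrete with a Wilson score interval on the BIRD error rates. This is a sketch under one loud assumption: that 15.6% and 6.2% correspond to 15 and 6 failing queries out of 96, which is inferred from the percentages and not stated in the abstract. The function name is illustrative.

```python
import math


def wilson_interval(failures: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = failures / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Assumed counts: 15/96 is about 15.6%, 6/96 is about 6.2%.
    for label, failures in [("baseline", 15), ("schema-first", 6)]:
        low, high = wilson_interval(failures, 96)
        print(f"{label}: {low:.3f} to {high:.3f}")
```

Under that assumption the intervals are roughly 10% to 24% for the baseline and 3% to 13% for schema-first context, so they overlap. Because both systems are run on the same questions, a paired test would be the more appropriate check, but the overlap illustrates why the referee treats the missing uncertainty estimates as material at n = 96.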

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our work. We address each major comment below with clarifications drawn from the full manuscript and indicate planned revisions where appropriate.

Referee: [Abstract] The reported metrics (96.4% table recall@20, +11.1 points from reranking, 15.6% to 6.2% error reduction) are presented without error bars, confidence intervals, or statistical tests, and the construction or filtering process for the 1,534/857/96 question sets is not described; these omissions are load-bearing for evaluating reliability and generalizability of the central claims.

Authors: The full manuscript (Experiments section) describes the question sets in detail: CRUSH4SQL comprises 1,534 questions over enterprise schemas with complex metadata; SEDE contains 857 real user queries; BIRD is evaluated on 96 questions. We agree the abstract would benefit from a brief pointer to these constructions. Error bars and statistical tests were not computed, as the large sample sizes and effect magnitudes (e.g., 40-point gains) support reliability, but we will revise the abstract to reference the experimental details for improved transparency.
revision: partial

Referee: [Abstract] No ablation results are provided to isolate the contribution of each pipeline stage (vector search, lineage expansion, cross-encoder reranking, query history); without them it is impossible to attribute the observed gains (e.g., the 40-point lift from 52.1% to 92.3% or the +11.1 reranking delta) to specific design choices.

Authors: The manuscript reports several controlled comparisons that isolate major stages: semantic retrieval vs. templated BM25 (+32.8 points at table recall@5) isolates vector search; with/without query history on SEDE (52.1% to 92.3%) isolates workload memory; and the explicit +11.1 reranking delta isolates cross-encoder reranking. Lineage expansion is integrated but not separately ablated. These comparisons allow attribution of the primary gains. We will add a short discussion clarifying these isolations in revision but maintain that exhaustive per-stage ablations are not required to support the central claims.
revision: partial

Referee: [Abstract] The central assumption that the five hand-crafted text templates fully capture the information needed for accurate schema selection is not validated; no alternative encodings or completeness checks are reported, so the high recall figures may not generalize if real workloads contain un-templated details such as implicit join conventions or informal metric definitions.

Authors: The templates are object-specific and incorporate typed metadata (e.g., column types, metric definitions, relationship descriptions) beyond raw names, as detailed in the Method section. Their effectiveness is empirically supported by consistent outperformance over name-only and BM25 baselines across three distinct benchmarks, including real-world queries in SEDE. We did not evaluate alternative encodings or run explicit completeness audits, as the evaluation focused on end-to-end retrieval quality. We disagree that this constitutes a load-bearing omission for the reported claims but will expand the template design rationale and limitations discussion in the revision.
revision: no

Circularity Check

0 steps flagged

No circularity; empirical measurements on held-out data

The paper describes a retrieval pipeline using hand-crafted templates for five catalog object types and reports direct performance metrics (e.g., 96.4% table recall@20 on CRUSH4SQL, error reduction on BIRD) as measurements against baselines on held-out questions. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the text. The central claims rest on external benchmark results rather than reducing to inputs by definition or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the premise that catalog metadata can be turned into effective retrieval units via fixed templates and that the listed retrieval stages are sufficient; no free parameters, axioms, or invented entities are stated in the abstract.

[1] [1]

2025 , url =

Lei, Fangyu and Chen, Jixuan and Ye, Yuxiao and Cao, Ruisheng and Shin, Dongchan and Su, Hongjin and Suo, Zhaoqing and Gao, Hongcheng and Hu, Wenjing and Yin, Pengcheng and Zhong, Victor and Xiong, Caiming and Sun, Ruoxi and Liu, Qian and Wang, Sida and Yu, Tao , booktitle =. 2025 , url =

2025

[2] [2]

2025 , url =

Wang, Yihan and Liu, Peiyu and Yang, Xin , booktitle =. 2025 , url =

2025

[3] [3]

2026 , url =

Wang, Ziyang and Zheng, Yuanlei and Cao, Zhenbiao and Zhang, Xiaojin and Wei, Zhongyu and Fu, Pei and Luo, Zhenbo and Chen, Wei and Bai, Xiang , booktitle =. 2026 , url =

2026

[4] [8]

2025 , url =

Liu, Geling and Tan, Yunzhi and Zhong, Ruichao and Xie, Yuanzhen and Zhao, Lingchen and Wang, Qian and Hu, Bo and Li, Zang , booktitle =. 2025 , url =

2025

[5] [14]

2023 , url =

Li, Jinyang and Hui, Binyuan and Qu, Ge and Yang, Jiaxi and Li, Binhua and Li, Bowen and Wang, Bailin and Qin, Bowen and Geng, Ruiying and Huo, Nan and others , booktitle =. 2023 , url =

2023

[6] [15]

2023 , url =

Luo, Zhiqiang and Xie, Liang and Chen, Jingping and He, Yiduo and Li, Zhenyu and Chen, Weian and Yang, Bo , booktitle =. 2023 , url =

2023

[7] [16]

2021 , url =

Hazoom, Moshe and Malik, Vibhor and Bogin, Ben , booktitle =. 2021 , url =

2021

[8] [17]

2023 , url =

Pourreza, Mohammadreza and Rafiei, Davood , booktitle =. 2023 , url =

2023

[9] [18]

2024 , url =

Gao, Dawei and Wang, Haibin and Li, Yaliang and Sun, Xiuyu and Qian, Yichen and Ding, Bolin and Zhou, Jingren , journal =. 2024 , url =

2024

[10] [19]

2025 , url =

Wang, Bing and Ren, Changyu and Yang, Jian and Liang, Xinnian and Bai, Jiaqi and Zhang, Linzheng and Yan, Zhao and Li, Zhoujun , booktitle =. 2025 , url =

2025

[11] [20]

2024 , url =

Edge, Darren and Trinh, Ha and Cheng, Newman and Bradley, Joshua and Chao, Alex and Mody, Apurva and Truitt, Steven and Larson, Jonathan , journal =. 2024 , url =

2024

[12] [21]

and Akinwande, Victor and Al-Nuaimi, Namir and Alfaraj, Najla and Alhajjar, Elie and Aroyo, Lora and Bavalatti, Trupti and Blili-Hamelin, Borhane and others , journal =

Vidgen, Bertie and Agrawal, Adarsh and Ahmed, Ahmed M. and Akinwande, Victor and Al-Nuaimi, Namir and Alfaraj, Najla and Alhajjar, Elie and Aroyo, Lora and Bavalatti, Trupti and Blili-Hamelin, Borhane and others , journal =. 2024 , url =

2024

[13] [24]

and Wei, Jia , booktitle =

Feng, Wei and Agrawal, Adarsh and Ling, Haibin and Blasch, Erik and Adiles-Cruz, Edmund and Schrader, Philip T. and Wei, Jia , booktitle =

[14] [25]

Agrawal, Adarsh and Li, Jessica , institution =

[15] [26]

Amazon Artificial General Intelligence . 2025 a . Amazon Nova 2: Multimodal Reasoning and Generation Models . Technical report, Amazon

2025

[16] [27]

Amazon Artificial General Intelligence . 2025 b . https://www.amazon.science/publications/amazon-nova-premier-technical-report-and-model-card Amazon Nova Premier: Technical Report and Model Card . Technical report, Amazon

2025

[17] [28]

Amazon Artificial General Intelligence . 2025 c . Amazon Nova Sonic: Technical Report and Model Card . Technical report, Amazon

2025

[18] [29]

Amazon Artificial General Intelligence . 2025 d . https://arxiv.org/abs/2506.12103 The Amazon Nova Family of Models: Technical Report and Model Card . arXiv preprint arXiv:2506.12103

work page arXiv 2025

[19] [30]

Rahul Suresh Babu and Adarsh Agrawal. 2026. https://arxiv.org/abs/2606.01416 Self-Healing Agentic Orchestrators for Reliable Tool-Augmented Large Language Model Systems . Preprint, arXiv:2606.01416

work page internal anchor Pith review Pith/arXiv arXiv 2026

[20] [31]

Shashank Shreedhar Bhatt, Tanmay Rajore, Khushboo Aggarwal, Ganesh Ananthanarayanan, Ranveer Chandra, Nishanth Chandran, Suyash Choudhury, Divya Gupta, Emre Kiciman, Sumit Kumar Pandey, Srinath Setty, Rahul Sharma, and Teijia Zhao. 2025. https://arxiv.org/abs/2509.14608 Enterprise AI Must Enforce Participant-Aware Access Control . Preprint, arXiv:2509.14608

work page arXiv 2025

[21] [32]

Jeffrey Eben, Aitzaz Ahmad, and Stephen Lau. 2025. https://arxiv.org/abs/2507.23104 RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL . Preprint, arXiv:2507.23104

work page arXiv 2025

[22] [33]

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. 2024. https://arxiv.org/abs/2404.16130 From Local to Global: A Graph RAG Approach to Query-Focused Summarization . arXiv preprint arXiv:2404.16130

work page internal anchor Pith review Pith/arXiv arXiv 2024

[23] [34]


Wei Feng, Adarsh Agrawal, Haibin Ling, Erik Blasch, Edmund Adiles-Cruz, Philip T. Schrader, and Jia Wei. 2024. DDDAS Probability Learning for Natural Disaster Change Detection. In International Conference on Dynamic Data Driven Applications Systems, pages 90–99.


[24] [35]

Dawei Gao, Haibin Wang, Yaliang Li, Xiuyu Sun, Yichen Qian, Bolin Ding, and Jingren Zhou. 2024. https://arxiv.org/abs/2308.15363 Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation. Proceedings of the VLDB Endowment.


[25] [36]

Michael Glass, Mustafa Eyceoz, Dharmashankar Subramanian, Gaetano Rossiello, Long Vu, and Alfio Gliozzo. 2025. https://arxiv.org/abs/2501.17174 Extractive Schema Linking for Text-to-SQL. Preprint, arXiv:2501.17174.


[26] [37]

Moshe Hazoom, Vibhor Malik, and Ben Bogin. 2021. https://aclanthology.org/2021.nlp4prog-1.9/ Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data. In Proceedings of the 1st Workshop on Natural Language Processing for Programming.


[27] [38]


Bargav Jayaraman, Virendra J. Marathe, Hamid Mozaffari, William F. Shen, and Krishnaram Kenthapadi. 2025. https://arxiv.org/abs/2505.22860 Permissioned LLMs: Enforcing Access Control in Large Language Models. Preprint, arXiv:2505.22860.


[28] [39]

Djordje Klisura, Joseph Khoury, Ashish Kundu, Ram Krishnan, and Anthony Rios. 2025. https://arxiv.org/abs/2510.07642 Role-Conditioned Refusals: Evaluating Access Control Reasoning in Large Language Models. Preprint, arXiv:2510.07642.


[29] [40]

Adarsh Agrawal and Jessica Li. 2022. Mitigating Bias in AI Using Debias-GAN. White paper, World Wide Technology.


[30] [41]

Fangyu Lei, Jixuan Chen, Yuxiao Ye, Ruisheng Cao, Dongchan Shin, Hongjin Su, Zhaoqing Suo, Hongcheng Gao, Wenjing Hu, Pengcheng Yin, Victor Zhong, Caiming Xiong, Ruoxi Sun, Qian Liu, Sida Wang, and Tao Yu. 2025. https://mlanthology.org/iclr/2025/lei2025iclr-spider/ Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows. In ...


[31] [42]

Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, and 1 others. 2023. https://proceedings.neurips.cc/paper_files/paper/2023/hash/9ee883c8a46d6ac8747b4d6edc7e1a6b-Abstract-Datasets_and_Benchmarks.html Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Te...


[32] [43]

Geling Liu, Yunzhi Tan, Ruichao Zhong, Yuanzhen Xie, Lingchen Zhao, Qian Wang, Bo Hu, and Zang Li. 2025. https://aclanthology.org/2025.coling-main.654/ Solid-SQL: Enhanced Schema-linking based In-context Learning for Robust Text-to-SQL. In Proceedings of the 31st International Conference on Computational Linguistics.


[33] [44]

Zhiqiang Luo, Liang Xie, Jingping Chen, Yiduo He, Zhenyu Li, Weian Chen, and Bo Yang. 2023. https://aclanthology.org/2023.emnlp-main.868/ CRUSH4SQL: Collective Retrieval Using Schema Hallucination For Text2SQL. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.


[34] [45]

Md Mahadi Hasan Nahid, Davood Rafiei, Weiwei Zhang, and Yong Zhang. 2025. https://arxiv.org/abs/2510.14296 Rethinking Schema Linking: A Context-Aware Bidirectional Retrieval Approach for Text-to-SQL. Preprint, arXiv:2510.14296.


[35] [46]

Mohammadreza Pourreza and Davood Rafiei. 2023. https://arxiv.org/abs/2304.11015 DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. In Advances in Neural Information Processing Systems.


[36] [47]

AmirHossein Safdarian, Milad Mohammadi, Ehsan Jahanbakhsh, Mona Shahamat Naderi, and Heshaam Faili. 2025. https://arxiv.org/abs/2505.18363 SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases. Preprint, arXiv:2505.18363.


[37] [48]

Chaitanya Sharma. 2025. https://arxiv.org/abs/2506.00054 Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers. Preprint, arXiv:2506.00054.


[38] [49]

Shivani Upadhyay, Nandan Thakur, Ronak Pradeep, Nick Craswell, Daniel Campos, and Jimmy Lin. 2026. https://arxiv.org/abs/2603.09891 Overview of the TREC 2025 Retrieval Augmented Generation Track. Preprint, arXiv:2603.09891.


[39] [50]


Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, and 1 others. 2024. https://arxiv.org/abs/2404.12241 Introducing v0.5 of the AI Safety Benchmark from MLCommons. arXiv preprint arXiv:2404.12241.


[40] [51]

Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, Linzheng Zhang, Zhao Yan, and Zhoujun Li. 2025a. https://arxiv.org/abs/2312.11242 MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL. In Proceedings of the 31st International Conference on Computational Linguistics.


[41] [52]

Yihan Wang, Peiyu Liu, and Xin Yang. 2025b. https://aclanthology.org/2025.emnlp-main.51/ LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.


[42] [53]

Ziyang Wang, Yuanlei Zheng, Zhenbiao Cao, Xiaojin Zhang, Zhongyu Wei, Pei Fu, Zhenbo Luo, Wei Chen, and Xiang Bai. 2026. https://ojs.aaai.org/index.php/AAAI/article/view/40672 AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale. In Proceedings of the AAAI Conference on Artificial Intelligence.
