
arxiv: 2405.16755 · v3 · pith:7PWO22UI · submitted 2024-05-27 · 💻 cs.LG · cs.AI · cs.DB

CHESS: Contextual Harnessing for Efficient SQL Synthesis

Pith reviewed 2026-05-19 11:19 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.DB
keywords text-to-SQL · multi-agent systems · large language models · SQL synthesis · schema pruning · database catalogs · query validation · industrial-scale databases

The pith

CHESS deploys four LLM agents to prune massive database schemas and validate SQL outputs for accurate text-to-SQL on industrial-scale data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CHESS to convert natural language questions into working SQL even when databases contain thousands of tables and columns. It splits the work across an Information Retriever that pulls relevant facts, a Schema Selector that trims oversized catalogs, a Candidate Generator that builds and refines queries, and a Unit Tester that checks correctness with LLM-written natural-language tests. The authors demonstrate that this structure supports very large real-world schemas, delivers leading accuracy among open-source systems, and scales to 71.10 percent on the BIRD benchmark while using far fewer model calls than closed alternatives. A reader would care because reliable text-to-SQL removes the need for experts to write queries by hand and makes complex data accessible under tight cost and privacy limits.

Core claim

CHESS is an LLM-based multi-agent framework with four agents that together solve the core difficulties of text-to-SQL: the Information Retriever extracts relevant data, the Schema Selector prunes large schemas, the Candidate Generator produces and iteratively refines queries, and the Unit Tester validates functional correctness through LLM-generated natural-language unit tests.
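The division of labour described above can be sketched as a chain of functions that pass a shared context along. This is only an illustration of the control flow, not the paper's implementation: the `Context` fields, the function names, and the keyword-matching logic are all hypothetical stand-ins for what would be LLM calls in the real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Context:
    """Shared state handed from one agent to the next (hypothetical layout)."""
    question: str
    schema: Dict[str, List[str]]              # table name -> column names
    evidence: List[str] = field(default_factory=list)
    candidates: List[str] = field(default_factory=list)
    chosen: Optional[str] = None


def information_retriever(ctx: Context) -> Context:
    # Stand-in for retrieval: columns whose names appear in the question.
    words = {w.strip("?,.").lower() for w in ctx.question.split()}
    ctx.evidence = sorted(
        col for cols in ctx.schema.values() for col in cols if col.lower() in words
    )
    return ctx


def schema_selector(ctx: Context) -> Context:
    # Stand-in for pruning: keep tables holding at least one evidence column.
    ctx.schema = {
        table: cols for table, cols in ctx.schema.items() if set(cols) & set(ctx.evidence)
    }
    return ctx


def candidate_generator(ctx: Context) -> Context:
    # Stand-in for generation: one trivial SELECT per surviving table.
    for table, cols in ctx.schema.items():
        picked = [c for c in cols if c in ctx.evidence]
        ctx.candidates.append(f"SELECT {', '.join(picked)} FROM {table}")
    return ctx


def unit_tester(ctx: Context) -> Context:
    # Stand-in for validation: a candidate must mention every evidence column.
    passing = [q for q in ctx.candidates if all(c in q for c in ctx.evidence)]
    ctx.chosen = passing[0] if passing else None
    return ctx


PIPELINE: List[Callable[[Context], Context]] = [
    information_retriever,
    schema_selector,
    candidate_generator,
    unit_tester,
]


def run_pipeline(ctx: Context, agents: List[Callable[[Context], Context]]) -> Context:
    for agent in agents:
        ctx = agent(ctx)
    return ctx
```

Because each stage only reads and writes the shared context, any single stand-in could be swapped for a model-backed version without touching the others, which is the modularity the core claim relies on.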

What carries the argument

Four-agent LLM system in which the Schema Selector narrows large catalogs and the Unit Tester checks candidate queries with natural-language tests.
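To make the idea of narrowing a catalog concrete, here is a minimal sketch of schema pruning with a token count before and after. It assumes a naive keyword-overlap heuristic in place of the LLM-driven Schema Selector, and the helper names (`prune_schema`, `token_count`) and the toy schema are invented for illustration.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Crude tokenizer: lowercase alphanumeric runs, so "order_date" -> ["order", "date"].
    return re.findall(r"[a-z0-9]+", text.lower())


def render_schema(schema: Dict[str, List[str]]) -> str:
    # Serialize the schema the way it might be pasted into a prompt.
    return "\n".join(f"{table}({', '.join(cols)})" for table, cols in schema.items())


def prune_schema(question: str, schema: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Keep a column only if its name shares a token with the question;
    # drop tables that end up with no columns.
    question_tokens = set(tokenize(question))
    pruned = {}
    for table, cols in schema.items():
        kept = [c for c in cols if question_tokens & set(tokenize(c))]
        if kept:
            pruned[table] = kept
    return pruned


def token_count(schema: Dict[str, List[str]]) -> int:
    return len(tokenize(render_schema(schema)))


# Toy schema, purely illustrative.
SCHEMA = {
    "orders": ["order_id", "customer_id", "order_date", "total_amount"],
    "customers": ["customer_id", "name", "email", "country"],
    "audit_log": ["event_id", "payload", "created_at"],
}
QUESTION = "What is the total amount of orders per country?"
PRUNED = prune_schema(QUESTION, SCHEMA)
```

On this toy input the heuristic shrinks the prompt considerably, but it also discards `customer_id`, the join key between the two surviving tables. That is exactly the kind of lost critical column the ledger entry further down flags as an unverified assumption.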

If this is right

  • The Schema Selector raises accuracy by roughly 2 percent and cuts LLM token usage by a factor of five on large schemas.
  • CHESS reaches state-of-the-art accuracy among open-source methods on standard text-to-SQL benchmarks.
  • With additional compute the system attains 71.10 percent accuracy on the BIRD test set while using about 83 percent fewer LLM calls than the leading proprietary approach.
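The two efficiency figures above are stated in different forms (a factor for tokens, a percentage for calls), so it can help to convert them to a common footing. The helpers below are plain arithmetic on the numbers quoted in the abstract; the function names are made up for this note.

```python
def remaining_fraction(percent_fewer: float) -> float:
    # "83 percent fewer calls" leaves 17 percent of the baseline's calls.
    return 1.0 - percent_fewer / 100.0


def percent_saved(reduction_factor: float) -> float:
    # "Tokens cut by a factor of five" means 1/5 remain, so 80 percent are saved.
    return 100.0 * (1.0 - 1.0 / reduction_factor)


def reduction_factor(percent_fewer: float) -> float:
    # The same saving expressed as "baseline uses N times as many".
    return 1.0 / remaining_fraction(percent_fewer)


calls_left = remaining_fraction(83.0)      # about 0.17 of the baseline's calls
tokens_saved = percent_saved(5.0)          # about 80 percent of tokens saved
calls_factor = reduction_factor(83.0)      # baseline makes roughly 5.9x as many calls
```

Read this way, the token saving and the call saving are of similar size, roughly a fivefold to sixfold reduction each, although they are measured against different baselines.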

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The modular agent design could be reused for other structured generation tasks such as converting language to Python or other query languages.
  • Keeping all agents on open-source models reduces the need to transmit sensitive database content to external services.
  • The large drop in model calls suggests the approach may support lower-latency interactive query assistants in production environments.

Load-bearing premise

The Unit Tester is assumed to catch functional errors in SQL candidates through LLM-generated natural-language unit tests without systematic false negatives.

What would settle it

A benchmark set of queries where many candidates pass the Unit Tester's natural-language checks yet return incorrect results on actual database execution would show the validation step does not reliably ensure correctness.
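A check of this kind could be set up roughly as follows: run every candidate the validator accepts against a real database and count how many disagree with the gold query's result. The sketch uses SQLite from the Python standard library; the `validator` argument is a placeholder for the Unit Tester's verdict, and the table, queries, and string-matching validator are toy assumptions rather than anything taken from the paper.

```python
import sqlite3
from typing import Callable, List, Optional


def execute(conn: sqlite3.Connection, sql: str) -> Optional[list]:
    # Return rows in a canonical order, or None if the query fails to run.
    try:
        return sorted(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def false_acceptance_rate(
    conn: sqlite3.Connection,
    gold_sql: str,
    candidates: List[str],
    validator: Callable[[str], bool],
) -> float:
    # Share of validator-accepted candidates whose execution result differs from gold.
    gold_rows = execute(conn, gold_sql)
    accepted = [sql for sql in candidates if validator(sql)]
    if not accepted:
        return 0.0
    wrong = [sql for sql in accepted if execute(conn, sql) != gold_rows]
    return len(wrong) / len(accepted)


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE emp(name TEXT, dept TEXT, salary INT);"
        "INSERT INTO emp VALUES ('Ann','eng',120),('Bob','eng',100),('Cy','ops',90);"
    )
    return conn


GOLD = "SELECT name FROM emp WHERE dept = 'eng' AND salary > 100"
CANDIDATES = [
    "SELECT name FROM emp WHERE dept = 'eng' AND salary > 100",   # correct
    "SELECT name FROM emp WHERE dept = 'eng' AND salary >= 100",  # off-by-one bound
    "SELECT name FROM emp WHERE salary > 100",                    # missing dept filter
    "SELECT dept FROM emp",                                       # unrelated
]


def toy_validator(sql: str) -> bool:
    # Placeholder for a natural-language check such as
    # "filters on the eng department and considers salary".
    return "dept = 'eng'" in sql and "salary" in sql


demo_rate = false_acceptance_rate(build_demo_db(), GOLD, CANDIDATES, toy_validator)
```

In the toy data the validator accepts the off-by-one candidate even though it returns an extra row, so half of the accepted candidates are wrong. A rate well above zero on a realistic benchmark would be the evidence this section describes.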

Original abstract

Translating natural language questions into SQL queries, known as text-to-SQL, is a long-standing research problem. Effective text-to-SQL synthesis can become very challenging due to (i) the extensive size of database catalogs (descriptions of tables and their columns) and database values, (ii) reasoning over large database schemas, (iii) ensuring the functional validity of the generated queries, and (iv) navigating the ambiguities of natural language questions. We introduce CHESS, a Large Language Model (LLM) based multi-agent framework for efficient and scalable SQL synthesis, comprising four specialized agents, each targeting one of the aforementioned challenges: the Information Retriever (IR) extracts relevant data, the Schema Selector (SS) prunes large schemas, the Candidate Generator (CG) generates high-quality candidates and refines queries iteratively, and the Unit Tester (UT) validates queries through LLM-based natural language unit tests. Our framework offers configurable features that adapt to various deployment constraints, including 1) Supporting industrial-scale databases: leveraging the Schema Selector agent, CHESS efficiently narrows down very large database schemas into manageable sub-schemas, boosting system accuracy by approximately $2\%$ and reducing the number of LLM tokens by a factor of $5$. 2) State-of-the-Art privacy-preserving performance: Among the methods using open-source models, CHESS achieves state-of-the-art performance, resulting in a high-performing, privacy-preserving system suitable for industrial deployment. 3) Scalability with additional compute budget: In settings with high computational budgets, CHESS achieves $71.10\%$ accuracy on the BIRD test set, within $2\%$ of the leading proprietary method, while requiring approximately $83\%$ fewer LLM calls.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CHESS, a multi-agent LLM framework for text-to-SQL synthesis with four agents (Information Retriever, Schema Selector, Candidate Generator, and Unit Tester) that respectively extract relevant data, prune large schemas, generate and refine candidates, and validate queries via LLM-generated natural-language unit tests. It claims support for industrial-scale schemas (with ~2% accuracy boost and 5x token reduction), SOTA performance among open-source methods, and 71.10% accuracy on the BIRD test set (within 2% of leading proprietary systems) while using ~83% fewer LLM calls under higher compute budgets.

Significance. If the empirical results and validation protocol hold, the work demonstrates a practical, configurable multi-agent approach that scales text-to-SQL to large industrial schemas while preserving privacy through open-source models and reducing LLM usage, which could inform efficient agentic systems for database interfaces.

major comments (2)
  1. [§3.4 and §4] §3.4 (Unit Tester description) and §4 (evaluation protocol): The headline accuracy figures (including 71.10% on BIRD) rest on the Unit Tester accepting candidate queries as functionally correct. The UT generates natural-language tests from the question and uses an LLM to check satisfaction, yet the manuscript provides no execution-based oracle, human verification of UT decisions, or analysis of false-negative rates. This directly undermines the functional-validity claim and the reported accuracies, as incomplete tests or checker errors would count incorrect SQL as successes.
  2. [§4] §4 (experimental results): The abstract and results sections report headline accuracy and token-reduction numbers without error bars, standard deviations across runs, or detailed ablation studies isolating the contribution of each agent (e.g., Schema Selector vs. full CHESS). In addition, the counting of “post-hoc query fixes” is not described, making it impossible to judge whether the claimed gains over baselines are robust.
minor comments (2)
  1. [Figure 2 and §3.2] Figure 2 and §3.2: The schema-pruning example would benefit from an explicit before/after table size comparison to illustrate the claimed 5x token reduction.
  2. [§5] §5 (related work): A few recent open-source text-to-SQL baselines that also use schema linking or self-consistency are cited only briefly; expanding the comparison table would strengthen the SOTA claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We address each major comment point by point below and outline the revisions we will make to strengthen the paper.

Point-by-point responses
  1. Referee: [§3.4 and §4] §3.4 (Unit Tester description) and §4 (evaluation protocol): The headline accuracy figures (including 71.10% on BIRD) rest on the Unit Tester accepting candidate queries as functionally correct. The UT generates natural-language tests from the question and uses an LLM to check satisfaction, yet the manuscript provides no execution-based oracle, human verification of UT decisions, or analysis of false-negative rates. This directly undermines the functional-validity claim and the reported accuracies, as incomplete tests or checker errors would count incorrect SQL as successes.

    Authors: We appreciate the referee highlighting this key aspect of our evaluation. The Unit Tester is intentionally designed as an LLM-based natural-language validator to support privacy-preserving and execution-restricted industrial deployments where direct SQL execution may be infeasible or undesirable. We acknowledge, however, that the current manuscript lacks an execution-based oracle, human verification, or explicit false-negative analysis, which limits the strength of the functional-validity claims. In the revised version we will add: (i) a manual inspection of a sampled subset of UT decisions to estimate false-negative rates, (ii) a comparison of UT outcomes against execution results on the portion of BIRD where execution is feasible, and (iii) an explicit limitations discussion of LLM-based validation. These additions will better substantiate the reported accuracies. revision: yes

  2. Referee: [§4] §4 (experimental results): The abstract and results sections report headline accuracy and token-reduction numbers without error bars, standard deviations across runs, or detailed ablation studies isolating the contribution of each agent (e.g., Schema Selector vs. full CHESS). In addition, the counting of “post-hoc query fixes” is not described, making it impossible to judge whether the claimed gains over baselines are robust.

    Authors: We agree that greater statistical transparency and component-level analysis would improve the results section. We will expand the experimental evaluation to include error bars and standard deviations obtained from multiple runs with different random seeds for the primary metrics. We will also add more detailed ablation studies that isolate the contribution of each agent (including Schema Selector versus full CHESS). Finally, we will provide a precise description of the post-hoc query fixes procedure, including the criteria used, how fixes are counted, and their quantitative effect on accuracy. These changes will make the robustness of the reported gains clearer. revision: partial
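The statistical reporting promised in the second response amounts to summarizing repeated runs by their mean and sample standard deviation. A minimal sketch with the standard library follows; the accuracy values are placeholders chosen for easy arithmetic and are not results from CHESS or any other system.

```python
import statistics
from typing import List, Tuple


def summarize_runs(accuracies: List[float]) -> Tuple[float, float]:
    # Mean and sample standard deviation (n - 1 in the denominator).
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std


def format_summary(accuracies: List[float]) -> str:
    mean, std = summarize_runs(accuracies)
    return f"{mean:.2f} ± {std:.2f} (n={len(accuracies)})"


# Placeholder accuracies (percent) from three hypothetical seeds.
PLACEHOLDER_RUNS = [60.0, 62.0, 64.0]
summary = format_summary(PLACEHOLDER_RUNS)
```

Reporting the run count alongside the spread matters here: with only a handful of seeds, a gain of about two points, like the one attributed to the Schema Selector, can sit inside the run-to-run noise.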

Circularity Check

0 steps flagged

No circularity: performance claims rest on external benchmark evaluations, not internal derivations or self-referential fits.

Full rationale

The paper describes an LLM multi-agent system (IR, SS, CG, UT) for text-to-SQL and reports empirical accuracy numbers on BIRD and similar benchmarks. These are direct experimental comparisons against external baselines and prior methods, with no equations, fitted parameters, or derivations that reduce claimed results to quantities defined inside the paper itself. The Unit Tester component relies on LLM-generated tests, but this is an empirical assumption affecting validity rather than a circular reduction in any derivation chain. No load-bearing self-citations or ansatzes are invoked to force the central claims.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the empirical premise that current LLMs can be prompted to perform reliable schema pruning and unit testing; no new mathematical axioms or physical entities are introduced.

axioms (1)
  • domain assumption Large language models can be prompted to extract relevant database values, prune schemas without losing critical columns, generate valid SQL candidates, and write effective natural-language unit tests.
    All four agents depend on this capability; the paper provides no independent verification that the prompts succeed across arbitrary schemas.



Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

    cs.CR 2025-07 unverdicted novelty 8.0

    ExCyTIn-Bench is the first benchmark of 7542 questions from Microsoft Sentinel threat investigation graphs, where the best LLM agent achieves a reward of 0.606.

  2. NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

    cs.DB 2026-04 conditional novelty 7.0

    NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.

  3. Agentic Jackal: Live Execution and Semantic Value Grounding for Text-to-JQL

    cs.CL 2026-04 unverdicted novelty 7.0

    Jackal is the first execution-verified benchmark for text-to-JQL with 100k pairs, and Agentic Jackal with JiraAnchor semantic retrieval lifts categorical value accuracy from 48.7% to 71.7% and overall execution accura...

  4. Draft-Refine-Optimize: Self-Evolved Learning for Natural Language to MongoDB Query Generation

    cs.DB 2026-03 unverdicted novelty 7.0

    EvoMQL uses iterative Draft-Refine-Optimize cycles with execution feedback to reach 76.6% accuracy on EAI and 83.1% on TEND benchmarks for natural language to MongoDB query generation.

  5. SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints

    cs.DB 2026-03 unverdicted novelty 7.0

    SpotIt+ uses verification to find realistic counterexample databases that expose discrepancies between generated and gold SQL queries missed by standard test-based evaluation on the BIRD dataset.

  6. DeepEye-SQL: A Software-Engineering-Inspired Text-to-SQL Framework

    cs.DB 2025-10 unverdicted novelty 7.0

    DeepEye-SQL applies SDLC-inspired orchestration to Text-to-SQL, achieving 73.5% on BIRD-Dev, 75.07% on BIRD-Test, and 89.8% on Spider-Test with ~30B MoE models.

  7. Data-aware candidate selection in NL2SQL translation via small separating instances

    cs.DB 2026-05 unverdicted novelty 6.0

    A selection technique based on separating instances and provenance outperforms baselines for choosing among 2-3 NL2SQL candidates on a BIRD-DEV subset without consistency scores.

  8. FINER-SQL: Boosting Small Language Models for Text-to-SQL

    cs.DB 2026-05 unverdicted novelty 6.0

    FINER-SQL boosts 3B-parameter small language models to 67.73% and 85% execution accuracy on BIRD and Spider benchmarks via dense memory and atomic rewards in group relative policy optimization, matching larger LLMs at...

  9. FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents

    cs.CL 2026-05 unverdicted novelty 6.0

    FlexSQL reaches 65.4% on Spider2-Snow by allowing agents to flexibly explore schemas, generate diverse plans, choose SQL or Python execution, and apply two-tiered repair.

  10. EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement

    cs.DB 2026-05 unverdicted novelty 6.0

    EGRefine optimizes column renamings via execution-grounded verification and view materialization to recover Text-to-SQL accuracy lost to schema naming issues while guaranteeing query equivalence.

  11. SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models

    cs.DB 2026-04 unverdicted novelty 6.0

    SEMA-SQL formalizes Hybrid Relational Algebra to let users pose natural language questions answered by automatically generated queries that combine relational operators with LLM semantic reasoning, cutting LLM calls b...

  12. SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models

    cs.DB 2026-04 unverdicted novelty 6.0

    SEMA-SQL automates natural language to efficient hybrid queries combining relational algebra with LLM semantic operations via a new Hybrid Relational Algebra abstraction.

  13. SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis

    cs.AI 2026-04 unverdicted novelty 6.0

    SemanticAgent introduces a three-stage semantic analysis, synthesis, and verification process that produces higher-quality text-to-SQL training data than prior execution-only methods.

  14. AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views

    cs.DB 2026-04 unverdicted novelty 6.0

    AV-SQL uses a pipeline of LLM agents to generate intermediate CTE views that decompose complex Text-to-SQL queries, reaching 70.38% execution accuracy on Spider 2.0.

  15. SecureMCP: A Policy-Enforced LLM Data Access Framework for AIoT Systems via Model Context Protocol

    cs.CR 2026-05 unverdicted novelty 5.0

    SecureMCP integrates RBAC with five sequential defense modules in an MCP server to achieve 82.3% policy compliance against adversarial LLM SQL queries in AIoT while preserving execution accuracy.

  16. Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning

    cs.CL 2026-04 unverdicted novelty 5.0

    APMPO boosts average Pass@1 scores on math reasoning benchmarks by 3 points over GRPO by using an adaptive power-mean policy objective and feedback-driven clipping bounds in RLVR training.

  17. Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

    cs.CL 2026-04 unverdicted novelty 5.0

    FREIA applies free energy principles and adaptive advantage shaping to unsupervised RL, outperforming baselines by 0.5-3.5 Pass@1 points on math reasoning with a 1.5B model.

  18. MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL

    cs.CL 2025-11 unverdicted novelty 5.0

    MARS-SQL trains a multi-agent RL system with ReAct-style interaction and generative validation to produce SQL queries, reaching 77.84% execution accuracy on BIRD dev and 89.75% on Spider test.

  19. XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL

    cs.CL 2025-07 unverdicted novelty 5.0

    XiYan-SQL achieves SOTA Text-to-SQL accuracy by combining schema filtering, a multi-generator ensemble fine-tuned on varied SQL formats, and a selection model.
