CHESS: Contextual Harnessing for Efficient SQL Synthesis
Pith reviewed 2026-05-19 11:19 UTC · model grok-4.3
The pith
CHESS deploys four LLM agents to prune massive database schemas and validate SQL outputs for accurate text-to-SQL on industrial-scale data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CHESS is an LLM-based multi-agent framework with four agents that together solve the core difficulties of text-to-SQL: the Information Retriever extracts relevant data, the Schema Selector prunes large schemas, the Candidate Generator produces and iteratively refines queries, and the Unit Tester validates functional correctness through LLM-generated natural-language unit tests.
What carries the argument
Four-agent LLM system in which the Schema Selector narrows large catalogs and the Unit Tester checks candidate queries with natural-language tests.
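The data flow between the four agents can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the `Context` type, the function names, and the keyword-matching stand-ins are hypothetical and replace the LLM calls the paper describes. Only the ordering of the four roles is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """Hypothetical shared state passed from agent to agent."""
    question: str
    schema: dict[str, list[str]]                 # table name -> column names
    hints: list[str] = field(default_factory=list)
    candidates: list[str] = field(default_factory=list)


def information_retriever(ctx: Context) -> Context:
    # Stand-in for retrieval: keep question words that match a column name.
    columns = {c for cols in ctx.schema.values() for c in cols}
    ctx.hints = [w for w in ctx.question.lower().split() if w in columns]
    return ctx


def schema_selector(ctx: Context) -> Context:
    # Stand-in for pruning: keep only tables that contain a hinted column.
    ctx.schema = {t: cols for t, cols in ctx.schema.items()
                  if any(h in cols for h in ctx.hints)}
    return ctx


def candidate_generator(ctx: Context) -> Context:
    # Stand-in for LLM generation: one trivial query per surviving table.
    ctx.candidates = [f"SELECT {', '.join(ctx.hints)} FROM {t}"
                      for t in ctx.schema]
    return ctx


def unit_tester(ctx: Context) -> Context:
    # Stand-in for LLM-judged tests: keep candidates that mention every hint.
    ctx.candidates = [q for q in ctx.candidates
                      if all(h in q for h in ctx.hints)]
    return ctx


# The agent order follows the description above; the list itself is illustrative.
PIPELINE: list[Callable[[Context], Context]] = [
    information_retriever, schema_selector, candidate_generator, unit_tester,
]


def run(ctx: Context) -> Context:
    for agent in PIPELINE:
        ctx = agent(ctx)
    return ctx


result = run(Context(
    question="list the name of users",
    schema={"orders": ["id", "total"], "users": ["id", "name"], "logs": ["ts", "msg"]},
))
print(list(result.schema), result.candidates)
```

In this toy run the selector discards `orders` and `logs` before any candidate is generated, which is the property the token-saving claim depends on.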
If this is right
- The Schema Selector raises accuracy roughly 2 percent and cuts LLM tokens by a factor of five on large schemas.
- CHESS reaches state-of-the-art accuracy among open-source methods on standard text-to-SQL benchmarks.
- With additional compute the system attains 71.10 percent accuracy on the BIRD test set while using about 83 percent fewer LLM calls than the leading proprietary approach.
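The two efficiency figures above are stated in different units (a multiplicative factor for tokens, a percentage for calls). A short arithmetic sketch puts them on a common scale. The helper names are hypothetical, and the only inputs are the factor of five and the 83 percent quoted above.

```python
def fraction_saved(reduction_factor: float) -> float:
    """A k-fold reduction leaves 1/k of the original, so it saves 1 - 1/k."""
    return 1.0 - 1.0 / reduction_factor


def reduction_factor(saved: float) -> float:
    """Saving a fraction s leaves 1 - s, which is a 1 / (1 - s) fold reduction."""
    return 1.0 / (1.0 - saved)


token_saving = fraction_saved(5)        # factor of five on tokens
call_factor = reduction_factor(0.83)    # 83 percent fewer LLM calls

print(f"5x fewer tokens  = {token_saving:.0%} saved")
print(f"83% fewer calls  = {call_factor:.2f}x reduction")
```

Read this way, a fivefold token reduction corresponds to 80 percent fewer tokens, and 83 percent fewer calls corresponds to a reduction of roughly 5.9 times.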
Where Pith is reading between the lines
- The modular agent design could be reused for other structured generation tasks such as converting language to Python or other query languages.
- Keeping all agents on open-source models reduces the need to transmit sensitive database content to external services.
- The large drop in model calls suggests the approach may support lower-latency interactive query assistants in production environments.
Load-bearing premise
The Unit Tester is assumed to catch functional errors in SQL candidates through LLM-generated natural-language unit tests without systematic false negatives, that is, without incorrect queries passing the tests.
What would settle it
A benchmark set of queries where many candidates pass the Unit Tester's natural-language checks yet return incorrect results on actual database execution would show the validation step does not reliably ensure correctness.
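A check of this kind could be sketched as follows, assuming a small SQLite database, a gold query per question, and a recorded accept/reject verdict from the Unit Tester. The table, the queries, and the verdicts are invented for illustration and are not data from the paper. The comparison treats results as order-insensitive row lists, which is a simplification.

```python
import sqlite3


def rows(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute a query and return its rows in sorted (order-insensitive) form."""
    return sorted(conn.execute(sql).fetchall())


def accepted_but_wrong_rate(conn: sqlite3.Connection, cases) -> float:
    """cases holds (candidate_sql, gold_sql, ut_accepted) triples.

    Returns the share of accepted candidates whose execution result
    differs from the gold query's result.
    """
    accepted = [(cand, gold) for cand, gold, ok in cases if ok]
    if not accepted:
        return 0.0
    wrong = sum(rows(conn, cand) != rows(conn, gold) for cand, gold in accepted)
    return wrong / len(accepted)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, name TEXT, age INTEGER);
INSERT INTO users VALUES (1, 'Ann', 31), (2, 'Bob', 17), (3, 'Cy', 45);
""")

# Hypothetical verdicts: the third field is what the Unit Tester decided.
cases = [
    ("SELECT name FROM users WHERE age >= 18",
     "SELECT name FROM users WHERE age > 17", True),    # accepted, correct
    ("SELECT name FROM users WHERE age > 31",
     "SELECT name FROM users WHERE age >= 31", True),   # accepted, wrong
    ("SELECT id FROM users",
     "SELECT name FROM users", False),                  # rejected, not counted
]

rate = accepted_but_wrong_rate(conn, cases)
print(f"accepted but wrong: {rate:.0%}")
```

A high rate on a realistic benchmark would be the evidence described above; a rate near zero would support the premise.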
Original abstract
Translating natural language questions into SQL queries, known as text-to-SQL, is a long-standing research problem. Effective text-to-SQL synthesis can become very challenging due to (i) the extensive size of database catalogs (descriptions of tables and their columns) and database values, (ii) reasoning over large database schemas, (iii) ensuring the functional validity of the generated queries, and (iv) navigating the ambiguities of natural language questions. We introduce CHESS, a Large Language Model (LLM) based multi-agent framework for efficient and scalable SQL synthesis, comprising four specialized agents, each targeting one of the aforementioned challenges: the Information Retriever (IR) extracts relevant data, the Schema Selector (SS) prunes large schemas, the Candidate Generator (CG) generates high-quality candidates and refines queries iteratively, and the Unit Tester (UT) validates queries through LLM-based natural language unit tests. Our framework offers configurable features that adapt to various deployment constraints, including 1) Supporting industrial-scale databases: leveraging the Schema Selector agent, CHESS efficiently narrows down very large database schemas into manageable sub-schemas, boosting system accuracy by approximately $2\%$ and reducing the number of LLM tokens by a factor of $5$. 2) State-of-the-Art privacy-preserving performance: Among the methods using open-source models, CHESS achieves state-of-the-art performance, resulting in a high-performing, privacy-preserving system suitable for industrial deployment. 3) Scalability with additional compute budget: In settings with high computational budgets, CHESS achieves $71.10\%$ accuracy on the BIRD test set, within $2\%$ of the leading proprietary method, while requiring approximately $83\%$ fewer LLM calls.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CHESS, a multi-agent LLM framework for text-to-SQL synthesis with four agents (Information Retriever, Schema Selector, Candidate Generator, and Unit Tester) that respectively extract relevant data, prune large schemas, generate and refine candidates, and validate queries via LLM-generated natural-language unit tests. It claims support for industrial-scale schemas (with ~2% accuracy boost and 5x token reduction), SOTA performance among open-source methods, and 71.10% accuracy on the BIRD test set (within 2% of leading proprietary systems) while using ~83% fewer LLM calls under higher compute budgets.
Significance. If the empirical results and validation protocol hold, the work demonstrates a practical, configurable multi-agent approach that scales text-to-SQL to large industrial schemas while preserving privacy through open-source models and reducing LLM usage, which could inform efficient agentic systems for database interfaces.
major comments (2)
- [§3.4 and §4] §3.4 (Unit Tester description) and §4 (evaluation protocol): The headline accuracy figures (including 71.10% on BIRD) rest on the Unit Tester accepting candidate queries as functionally correct. The UT generates natural-language tests from the question and uses an LLM to check satisfaction, yet the manuscript provides no execution-based oracle, human verification of UT decisions, or analysis of false-negative rates. This directly undermines the functional-validity claim and the reported accuracies, as incomplete tests or checker errors would count incorrect SQL as successes.
- [§4] §4 (experimental results): The abstract and results sections report headline accuracy and token-reduction numbers without error bars, standard deviations across runs, or detailed ablation studies isolating the contribution of each agent (e.g., Schema Selector vs. full CHESS). In addition, the counting of “post-hoc query fixes” is not described, making it impossible to judge whether the claimed gains over baselines are robust.
minor comments (2)
- [Figure 2 and §3.2] Figure 2 and §3.2: The schema-pruning example would benefit from an explicit before/after table size comparison to illustrate the claimed 5x token reduction.
- [§5] §5 (related work): A few recent open-source text-to-SQL baselines that also use schema linking or self-consistency are cited only briefly; expanding the comparison table would strengthen the SOTA claim.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We address each major comment point by point below and outline the revisions we will make to strengthen the paper.
-
Referee: [§3.4 and §4] §3.4 (Unit Tester description) and §4 (evaluation protocol): The headline accuracy figures (including 71.10% on BIRD) rest on the Unit Tester accepting candidate queries as functionally correct. The UT generates natural-language tests from the question and uses an LLM to check satisfaction, yet the manuscript provides no execution-based oracle, human verification of UT decisions, or analysis of false-negative rates. This directly undermines the functional-validity claim and the reported accuracies, as incomplete tests or checker errors would count incorrect SQL as successes.
Authors: We appreciate the referee highlighting this key aspect of our evaluation. The Unit Tester is intentionally designed as an LLM-based natural-language validator to support privacy-preserving and execution-restricted industrial deployments where direct SQL execution may be infeasible or undesirable. We acknowledge, however, that the current manuscript lacks an execution-based oracle, human verification, or explicit false-negative analysis, which limits the strength of the functional-validity claims. In the revised version we will add: (i) a manual inspection of a sampled subset of UT decisions to estimate false-negative rates, (ii) a comparison of UT outcomes against execution results on the portion of BIRD where execution is feasible, and (iii) an explicit limitations discussion of LLM-based validation. These additions will better substantiate the reported accuracies.
revision: yes
-
Referee: [§4] §4 (experimental results): The abstract and results sections report headline accuracy and token-reduction numbers without error bars, standard deviations across runs, or detailed ablation studies isolating the contribution of each agent (e.g., Schema Selector vs. full CHESS). In addition, the counting of “post-hoc query fixes” is not described, making it impossible to judge whether the claimed gains over baselines are robust.
Authors: We agree that greater statistical transparency and component-level analysis would improve the results section. We will expand the experimental evaluation to include error bars and standard deviations obtained from multiple runs with different random seeds for the primary metrics. We will also add more detailed ablation studies that isolate the contribution of each agent (including Schema Selector versus full CHESS). Finally, we will provide a precise description of the post-hoc query fixes procedure, including the criteria used, how fixes are counted, and their quantitative effect on accuracy. These changes will make the robustness of the reported gains clearer.
revision: partial
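The multi-seed reporting promised in this response could take a form like the sketch below. The accuracy values are placeholders chosen for easy arithmetic, not results from the paper, and the `summarize` helper is hypothetical.

```python
from statistics import mean, stdev


def summarize(accuracies: list[float]) -> tuple[float, float]:
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    return mean(accuracies), stdev(accuracies)


# Placeholder per-seed accuracies, NOT measurements from the paper.
runs = [0.50, 0.60, 0.70]

m, s = summarize(runs)
print(f"accuracy: {m:.3f} ± {s:.3f} over {len(runs)} seeds")
```

Reporting the spread alongside the mean shows whether a gain of about 2 percent, as claimed for the Schema Selector, exceeds run-to-run variation.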
Circularity Check
No circularity: performance claims rest on external benchmark evaluations, not internal derivations or self-referential fits.
The paper describes an LLM multi-agent system (IR, SS, CG, UT) for text-to-SQL and reports empirical accuracy numbers on BIRD and similar benchmarks. These are direct experimental comparisons against external baselines and prior methods, with no equations, fitted parameters, or derivations that reduce claimed results to quantities defined inside the paper itself. The Unit Tester component relies on LLM-generated tests, but this is an empirical assumption affecting validity rather than a circular reduction in any derivation chain. No load-bearing self-citations or ansatzes are invoked to force the central claims.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Large language models can be prompted to extract relevant database values, prune schemas without losing critical columns, generate valid SQL candidates, and write effective natural-language unit tests.
Forward citations
Cited by 19 Pith papers
-
ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation
ExCyTIn-Bench is the first benchmark of 7542 questions from Microsoft Sentinel threat investigation graphs, where the best LLM agent achieves a reward of 0.606.
-
NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
-
Agentic Jackal: Live Execution and Semantic Value Grounding for Text-to-JQL
Jackal is the first execution-verified benchmark for text-to-JQL with 100k pairs, and Agentic Jackal with JiraAnchor semantic retrieval lifts categorical value accuracy from 48.7% to 71.7% and overall execution accura...
-
Draft-Refine-Optimize: Self-Evolved Learning for Natural Language to MongoDB Query Generation
EvoMQL uses iterative Draft-Refine-Optimize cycles with execution feedback to reach 76.6% accuracy on EAI and 83.1% on TEND benchmarks for natural language to MongoDB query generation.
-
SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints
SpotIt+ uses verification to find realistic counterexample databases that expose discrepancies between generated and gold SQL queries missed by standard test-based evaluation on the BIRD dataset.
-
DeepEye-SQL: A Software-Engineering-Inspired Text-to-SQL Framework
DeepEye-SQL applies SDLC-inspired orchestration to Text-to-SQL, achieving 73.5% on BIRD-Dev, 75.07% on BIRD-Test, and 89.8% on Spider-Test with ~30B MoE models.
-
Data-aware candidate selection in NL2SQL translation via small separating instances
A selection technique based on separating instances and provenance outperforms baselines for choosing among 2-3 NL2SQL candidates on a BIRD-DEV subset without consistency scores.
-
FINER-SQL: Boosting Small Language Models for Text-to-SQL
FINER-SQL boosts 3B-parameter small language models to 67.73% and 85% execution accuracy on BIRD and Spider benchmarks via dense memory and atomic rewards in group relative policy optimization, matching larger LLMs at...
-
FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
FlexSQL reaches 65.4% on Spider2-Snow by allowing agents to flexibly explore schemas, generate diverse plans, choose SQL or Python execution, and apply two-tiered repair.
-
EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement
EGRefine optimizes column renamings via execution-grounded verification and view materialization to recover Text-to-SQL accuracy lost to schema naming issues while guaranteeing query equivalence.
-
SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models
SEMA-SQL formalizes Hybrid Relational Algebra to let users pose natural language questions answered by automatically generated queries that combine relational operators with LLM semantic reasoning, cutting LLM calls b...
-
SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models
SEMA-SQL automates natural language to efficient hybrid queries combining relational algebra with LLM semantic operations via a new Hybrid Relational Algebra abstraction.
-
SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis
SemanticAgent introduces a three-stage semantic analysis, synthesis, and verification process that produces higher-quality text-to-SQL training data than prior execution-only methods.
-
AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views
AV-SQL uses a pipeline of LLM agents to generate intermediate CTE views that decompose complex Text-to-SQL queries, reaching 70.38% execution accuracy on Spider 2.0.
-
SecureMCP: A Policy-Enforced LLM Data Access Framework for AIoT Systems via Model Context Protocol
SecureMCP integrates RBAC with five sequential defense modules in an MCP server to achieve 82.3% policy compliance against adversarial LLM SQL queries in AIoT while preserving execution accuracy.
-
Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning
APMPO boosts average Pass@1 scores on math reasoning benchmarks by 3 points over GRPO by using an adaptive power-mean policy objective and feedback-driven clipping bounds in RLVR training.
-
Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs
FREIA applies free energy principles and adaptive advantage shaping to unsupervised RL, outperforming baselines by 0.5-3.5 Pass@1 points on math reasoning with a 1.5B model.
-
MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL
MARS-SQL trains a multi-agent RL system with ReAct-style interaction and generative validation to produce SQL queries, reaching 77.84% execution accuracy on BIRD dev and 89.75% on Spider test.
-
XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL
XiYan-SQL achieves SOTA Text-to-SQL accuracy by combining schema filtering, a multi-generator ensemble fine-tuned on varied SQL formats, and a selection model.