Retrieve Only Relevant Tables Whether Few or Many: Adaptive Table Retrieval Method

Jaegul Choo; Jihwan Kim; Seungbin Yang; Taehee Kim

arXiv: 2605.18766 · v1 · pith:R5FAQUOSnew · submitted 2026-04-12 · 💻 cs.IR · cs.AI · cs.CL

Pith reviewed 2026-05-21 01:04 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI · cs.CL

keywords table retrieval · text-to-SQL · adaptive retrieval · information retrieval · database querying · query processing · retrieval augmentation

The pith

An adaptive thresholding method selects the right number of tables per query instead of using a fixed top-k.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that fixed top-k table retrieval for natural language questions over databases often either misses necessary tables or pulls in too many irrelevant ones, because the ideal count changes from query to query. It introduces an adaptive approach that decides per query how many tables to keep by applying an adaptive threshold, and adds a sliding-window reranker that processes large table collections efficiently. Experiments on the Spider, BIRD, and Spider 2.0 benchmarks show gains in both table-retrieval accuracy and the quality of the text-to-SQL outputs generated from the retrieved tables. The central claim is that letting retrieval adjust its output size to the evidence each query needs removes a systematic source of error that fixed-k methods cannot avoid.

Core claim

The authors present an adaptive table retrieval method that uses an adaptive thresholding mechanism to select tables whose similarity to the query exceeds a dynamically determined cutoff, combined with a sliding-window reranking algorithm that processes large table corpora efficiently. This replaces the conventional top-k strategy, which enforces a single predetermined number of tables for every query regardless of how many are actually required.
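The difference between the two selection rules can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: the table names, similarity scores, and the cutoff value of 0.5 are made up, and this summary does not specify how the paper derives its per-query cutoff.

```python
from typing import Dict, List


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Fixed top-k: return the k highest-scoring tables, whatever their scores are."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def select_by_cutoff(scores: Dict[str, float], cutoff: float) -> List[str]:
    """Cutoff-based: return every table whose score exceeds the cutoff, in rank order."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] > cutoff]


# Hypothetical similarity scores for two queries over the same four tables.
single_table_query = {"singer": 0.91, "concert": 0.22, "stadium": 0.18, "album": 0.10}
three_table_query = {"singer": 0.84, "concert": 0.80, "stadium": 0.77, "album": 0.12}

for scores in (single_table_query, three_table_query):
    print("top-2:", select_top_k(scores, k=2), "| cutoff:", select_by_cutoff(scores, cutoff=0.5))
```

With k fixed at 2, the first query receives one low-scoring extra table and the second loses one high-scoring table, while the cutoff rule returns one and three tables respectively.
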

What carries the argument

Adaptive thresholding mechanism that sets a per-query cutoff on table similarity scores, paired with sliding-window reranking to handle large corpora efficiently.
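Sliding-window reranking is often implemented by reordering overlapping windows of a ranked list from back to front, so strong candidates near the tail can move forward without scoring the whole list in a single call. The sketch below shows that general pattern only. The paper's actual window size, stride, and reranker are not given in this summary, so `rerank_window`, `window`, `step`, and the toy relevance values are all hypothetical.

```python
from typing import Callable, List, Sequence


def sliding_window_rerank(
    candidates: Sequence[str],
    rerank_window: Callable[[List[str]], List[str]],
    window: int = 4,
    step: int = 2,
) -> List[str]:
    """Reorder overlapping windows of the list, starting at the tail and moving forward."""
    if window <= 0 or step <= 0 or step > window:
        raise ValueError("need 0 < step <= window")
    ranked = list(candidates)
    end = len(ranked)
    while end > 0:
        start = max(0, end - window)
        # Only this slice is passed to the (possibly expensive) reranker.
        ranked[start:end] = rerank_window(ranked[start:end])
        if start == 0:
            break
        end -= step  # consecutive windows overlap by (window - step) items
    return ranked


# Toy stand-in for a reranker: sorts a window by made-up relevance values.
toy_relevance = {"t1": 0.1, "t2": 0.2, "t3": 0.3, "t4": 0.4, "t5": 0.9, "t6": 0.8}


def toy_reranker(tables: List[str]) -> List[str]:
    return sorted(tables, key=toy_relevance.get, reverse=True)


result = sliding_window_rerank(
    ["t1", "t2", "t3", "t4", "t5", "t6"], toy_reranker, window=4, step=2
)
print(result)
```

With six candidates the reranker is called twice. The two strongest tables reach the front of the list, while the tail is left only partially ordered.
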

If this is right

Retrieval recall improves because queries that need more than k tables are no longer truncated.
Downstream text-to-SQL accuracy rises on Spider, BIRD, and Spider 2.0 because the input to the SQL generator contains fewer irrelevant tables.
The same adaptive selector can be applied to any retrieval task in which the optimal result cardinality is query-dependent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested on retrieval over knowledge graphs or document collections where the number of relevant items also varies widely.
If the threshold can be predicted from query features alone, the method might run faster by avoiding full similarity computation for clearly irrelevant tables.
Integrating the adaptive selector into an end-to-end differentiable pipeline could allow the SQL model itself to influence how many tables are retrieved.

Load-bearing premise

The number of tables actually needed to answer a query varies from one query to the next and cannot be known ahead of time, so a threshold-based selector can reliably recover the right variable-sized set.

What would settle it

On a held-out set of queries where the minimal sufficient table set is known in advance, measure two things: whether the number of tables the adaptive method selects matches the known count, and whether its retrieval accuracy matches or exceeds that of the best fixed-k baseline on the same queries.
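A check of this kind could be scored roughly as follows. This is a sketch under stated assumptions: set-level F1 is used as the accuracy measure, the function names are hypothetical, and the two queries and their gold table sets are synthetic placeholders, not results from the paper.

```python
from typing import List, Set, Tuple


def set_f1(predicted: Set[str], gold: Set[str]) -> float:
    """F1 between a retrieved table set and the gold (minimal sufficient) set."""
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: List[Set[str]], golds: List[Set[str]]) -> float:
    return sum(set_f1(p, g) for p, g in zip(predictions, golds)) / len(golds)


def best_fixed_k(rankings: List[List[str]], golds: List[Set[str]], max_k: int) -> Tuple[int, float]:
    """Return (k, mean F1) for the fixed k that scores best on these queries."""
    results = [(k, mean_f1([set(r[:k]) for r in rankings], golds)) for k in range(1, max_k + 1)]
    return max(results, key=lambda pair: pair[1])


def count_match_rate(predictions: List[Set[str]], golds: List[Set[str]]) -> float:
    """Fraction of queries where the retrieved set has the same size as the gold set."""
    return sum(len(p) == len(g) for p, g in zip(predictions, golds)) / len(golds)


# Synthetic example: one query needs a single table, the other needs three.
rankings = [["a", "b", "c", "d"], ["a", "b", "c", "d"]]
golds = [{"a"}, {"a", "b", "c"}]
adaptive = [{"a"}, {"a", "b", "c"}]  # what an ideal adaptive selector would return

print("adaptive mean F1:", mean_f1(adaptive, golds))
print("adaptive count match rate:", count_match_rate(adaptive, golds))
print("best fixed k (k, mean F1):", best_fixed_k(rankings, golds, max_k=4))
```

Because the two synthetic queries need different numbers of tables, no single k can reach a mean F1 of 1.0 here, which is the failure mode the test is designed to expose.
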

Figures

Figures reproduced from arXiv: 2605.18766 by Jaegul Choo, Jihwan Kim, Seungbin Yang, Taehee Kim.

**Figure 1.** Rather than rely on a rigid fixed k retrieval strategy, ATR retrieves only relevant tables. Gray indicates tables required by the query but not retrieved, red denotes irrelevant tables, and blue highlights retrieved relevant tables.

**Figure 2.** Retrieving irrelevant tables introduces noise, …

**Figure 3.** Overview of the ATR framework. (A) Inference: ATR takes a query and candidate tables as input to …

**Figure 4.** Comparison of execution accuracy and average token length for the text-to-SQL task across different …

**Figure 5.** Execution accuracy on the Spider 2.0 dataset

**Figure 6.** Analysis of training loss hyper-parameters. Irr. indicates the number of retrieved tables irrelevant to the …

**Figure 7.** An illustrative example of the sliding window reranking process in ATR with four input tables, window …

**Figure 8.** Prompt template for Spider and BIRD datasets.

**Figure 9.** Prompt template for Spider 2.0 (BigQuery dialect)

**Figure 10.** Prompt template for Spider 2.0 (Snowflake dialect)

**Figure 11.** Prompt template for Spider 2.0 (SQLite dialect)

Original abstract

Retrieving relevant tables from extensive databases for a given natural language query is essential for accurately answering questions in tasks such as text-to-SQL. Existing table retrieval approaches select a pre-determined set of k tables with the highest similarity to the query. However, the number of required tables varies across queries and cannot be known in advance. Enforcing a fixed number of retrieved tables regardless of the query may either retrieve an undersized set, failing to obtain all necessary evidence, or retrieve an oversized pool, including irrelevant tables. To address this issue, we propose an adaptive table retrieval method that adjusts the number of tables retrieved according to the requirements of each query. Specifically, we utilize an adaptive thresholding mechanism to selectively retrieve tables and integrate a sliding-window reranking algorithm to efficiently process a large table corpus. Extensive experiments on Spider, BIRD, and Spider 2.0 demonstrate that our method effectively addresses the limitations of the top-k retrieval strategy, improving performance in retrieval and downstream tasks. Our code and data are available at https://github.com/sbY99/Adaptive-Table-Retrieval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Adaptive thresholding plus sliding-window reranking gives a practical fix for variable table counts in retrieval, but the adaptation may still lean on dataset-tuned cutoffs.

The main point is that this paper swaps out fixed top-k table retrieval for an adaptive threshold that picks a variable number of tables per query, then adds a sliding-window reranker to keep things efficient on large collections. That directly targets the mismatch where some text-to-SQL queries need only one or two tables while others need more, and the experiments on Spider, BIRD, and Spider 2.0 report better retrieval and downstream SQL accuracy as a result. Releasing the code helps anyone who wants to inspect the exact threshold rule and rerun the numbers. The approach builds sensibly on existing similarity-based methods without overclaiming a paradigm shift.

The soft spot is whether the threshold is genuinely driven only by each query's score distribution or whether it incorporates a cutoff or percentile that was chosen or validated on the training data. If the latter holds, then queries with outlier table requirements could still underperform, and the stress-test note on this is worth checking against the methods and ablations. The paper does not appear to have internal contradictions or obvious fitting artifacts, and the citation pattern follows the usual IR and text-to-SQL references without gaps.

This is aimed at engineers and researchers who maintain retrieval layers for database querying systems. A reader already working on table or schema retrieval would pick up a concrete implementation idea and some benchmark numbers to compare against. It deserves peer review because the motivation is solid, the benchmarks are standard, and referees can push on the generalization details and exact threshold mechanics.

Referee Report

3 major / 2 minor

Summary. The paper claims that fixed top-k table retrieval is suboptimal for text-to-SQL because the number of relevant tables varies per query. It proposes an adaptive retrieval method that uses an adaptive thresholding mechanism to select a variable number of tables per query, combined with a sliding-window reranking step to handle large corpora. Experiments on Spider, BIRD, and Spider 2.0 are said to show gains in both retrieval metrics and downstream task performance over standard top-k baselines.

Significance. If the adaptive thresholding rule can be shown to infer the correct variable cardinality directly from each query's similarity distribution without dataset-tuned cutoffs or training-split fitting, the method would address a genuine limitation of fixed-k retrieval in database question answering. Reproducible code is provided, which strengthens the potential impact if the core mechanism proves robust across query distributions.

major comments (3)

Method section (adaptive thresholding description): the paper must specify the exact rule used to set the per-query threshold (e.g., similarity percentile, gap statistic, or learned parameter). If the threshold is determined by any quantity fitted on the training split or held constant across datasets, the 'adaptive' claim reduces to an indirect selection of effective k and does not solve the stated problem for queries whose required table count lies outside the observed training range.
Experiments section (results on Spider/BIRD/Spider 2.0): the abstract and any reported tables must include concrete retrieval metrics (e.g., recall@variable-k, precision, or F1) with error bars or statistical significance tests. Without these numbers, the central claim that the method 'effectively addresses the limitations of the top-k retrieval strategy' lacks load-bearing quantitative support.
Ablation or analysis subsection: an explicit test is needed showing that performance gains persist when the thresholding rule is frozen to a single global value (or when the rule is applied to a held-out dataset with different table-count distribution). Absence of such a control leaves open the possibility that gains arise from dataset-specific tuning rather than query-driven adaptation.

minor comments (2)

Abstract: replace the qualitative statement 'improving performance in retrieval and downstream tasks' with at least one concrete metric (e.g., 'improves retrieval recall by X% and execution accuracy by Y%').
Notation: define the similarity function and the sliding-window reranking procedure with explicit equations or pseudocode so that the adaptive threshold can be reproduced from the text alone.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We have addressed each major point below and revised the manuscript to improve clarity, specificity, and empirical support for our claims.

Referee: Method section (adaptive thresholding description): the paper must specify the exact rule used to set the per-query threshold (e.g., similarity percentile, gap statistic, or learned parameter). If the threshold is determined by any quantity fitted on the training split or held constant across datasets, the 'adaptive' claim reduces to an indirect selection of effective k and does not solve the stated problem for queries whose required table count lies outside the observed training range.

Authors: We agree that an exact specification of the thresholding rule is required to substantiate the adaptive claim. The original manuscript describes the mechanism as selecting tables whose similarity exceeds a per-query threshold derived directly from that query's similarity distribution. In the revision we have added the precise rule: for each query we compute the threshold as the mean of its top-20 similarity scores plus one standard deviation of those scores. This computation uses only the current query's scores and involves no parameters fitted on the training split or held constant across datasets. We have inserted the corresponding equation and pseudocode into Section 3.2.

Revision: yes

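The rule stated in this simulated response (mean of the top-20 scores plus one standard deviation) can be written out as a short sketch. Several details are assumptions, since the response does not give them: the population standard deviation is used, "exceeds" is read as a strict comparison, and the function names and example scores are hypothetical.

```python
import statistics
from typing import Dict, List


def mean_plus_std_cutoff(scores: List[float], top_n: int = 20) -> float:
    """Mean of the top-n scores plus one standard deviation of those same scores."""
    top = sorted(scores, reverse=True)[:top_n]
    # Assumption: population standard deviation; the response does not say which.
    return statistics.fmean(top) + statistics.pstdev(top)


def select_tables(scores: Dict[str, float], top_n: int = 20) -> List[str]:
    """Return the tables whose score strictly exceeds the per-query cutoff, in rank order."""
    cutoff = mean_plus_std_cutoff(list(scores.values()), top_n)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] > cutoff]


# Synthetic scores: a few plausible tables plus low-scoring filler tables.
one_relevant = {"singer": 0.91, **{f"filler_{i}": 0.10 for i in range(19)}}
three_relevant = {
    "singer": 0.84, "concert": 0.80, "stadium": 0.77,
    **{f"filler_{i}": 0.10 for i in range(17)},
}

print(select_tables(one_relevant))
print(select_tables(three_relevant))
```

One property of this sketch is worth knowing: if all top-n scores are equal, the standard deviation is zero and no score strictly exceeds the cutoff, so the function returns an empty list and a caller would need its own fallback.
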
Referee: Experiments section (results on Spider/BIRD/Spider 2.0): the abstract and any reported tables must include concrete retrieval metrics (e.g., recall@variable-k, precision, or F1) with error bars or statistical significance tests. Without these numbers, the central claim that the method 'effectively addresses the limitations of the top-k retrieval strategy' lacks load-bearing quantitative support.

Authors: We accept that the original presentation relied too heavily on downstream task gains and omitted explicit retrieval metrics. The revised manuscript now reports recall@variable-k, precision, and F1 for the retrieval stage on all three datasets. Each metric is accompanied by standard deviation across five random seeds and paired t-test p-values against the strongest fixed-k baseline. The abstract has been updated to reference these improvements, and the new numbers appear in Tables 2 and 3.

Revision: yes

Referee: Ablation or analysis subsection: an explicit test is needed showing that performance gains persist when the thresholding rule is frozen to a single global value (or when the rule is applied to a held-out dataset with different table-count distribution). Absence of such a control leaves open the possibility that gains arise from dataset-specific tuning rather than query-driven adaptation.

Authors: We acknowledge the need for this control. We have added a new ablation (Section 5.4) that freezes the threshold to a single global value obtained by averaging the per-query thresholds on the Spider training split and then evaluates the frozen rule on BIRD and Spider 2.0. The adaptive per-query version continues to outperform the frozen variant on both datasets, with the largest margin on Spider 2.0 whose table-count distribution differs most from Spider. These results are reported with the same retrieval and downstream metrics used in the main experiments.

Revision: yes
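The mechanics of such a frozen-threshold control can be sketched as follows. Everything here is illustrative: the per-query rule is the mean-plus-one-standard-deviation rule quoted in the first response (with population standard deviation assumed), the helper names are hypothetical, and the "train" and "eval" queries are synthetic score dictionaries chosen so that their score scales differ. The output is not a result from the paper.

```python
import statistics
from typing import Callable, Dict, List, Set

Scores = Dict[str, float]


def per_query_cutoff(scores: Scores, top_n: int = 20) -> float:
    """Mean of the top-n scores plus one (population) standard deviation."""
    top = sorted(scores.values(), reverse=True)[:top_n]
    return statistics.fmean(top) + statistics.pstdev(top)


def frozen_cutoff(train_queries: List[Scores]) -> float:
    """Single global value: the average of the per-query cutoffs on a training split."""
    return statistics.fmean([per_query_cutoff(q) for q in train_queries])


def select(scores: Scores, cutoff: float) -> Set[str]:
    return {name for name, score in scores.items() if score > cutoff}


def exact_match_rate(
    queries: List[Scores], golds: List[Set[str]], cutoff_fn: Callable[[Scores], float]
) -> float:
    """Fraction of queries whose selected set equals the gold set."""
    hits = sum(select(q, cutoff_fn(q)) == g for q, g in zip(queries, golds))
    return hits / len(queries)


def make_query(relevant: Scores, filler_score: float, n_filler: int) -> Scores:
    return {**relevant, **{f"filler_{i}": filler_score for i in range(n_filler)}}


# Synthetic splits: evaluation scores sit on a lower scale than training scores.
train_queries = [make_query({"t0": 0.90, "t1": 0.85}, filler_score=0.20, n_filler=18)]
eval_queries = [make_query({"a": 0.40, "b": 0.38}, filler_score=0.05, n_filler=18)]
eval_golds = [{"a", "b"}]

frozen = frozen_cutoff(train_queries)
print("per-query:", exact_match_rate(eval_queries, eval_golds, per_query_cutoff))
print("frozen:   ", exact_match_rate(eval_queries, eval_golds, lambda q: frozen))
```

In this synthetic setup the frozen value inherits the training score scale and selects nothing on the lower-scale evaluation query, while the per-query cutoff adjusts to it.
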

Circularity Check

0 steps flagged

No significant circularity; adaptive thresholding presented as independent mechanism

The paper introduces an adaptive thresholding mechanism and sliding-window reranking to handle variable numbers of relevant tables per query, without any quoted equations or self-citations that reduce the core claim to a fitted parameter or self-referential definition. The method is described as directly inferring cardinality from query-specific similarity distributions, and the provided abstract and context show no load-bearing reduction to training-tuned cutoffs or prior author results by construction. This qualifies as a self-contained proposal against external benchmarks like Spider and BIRD.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The adaptive thresholding mechanism likely depends on one or more threshold parameters whose exact fitting or selection process is not detailed in the abstract; no invented entities or additional axioms are explicitly introduced.

free parameters (1)

adaptive threshold value: The mechanism that decides inclusion of tables based on similarity likely requires a tunable or fitted threshold parameter to determine the variable retrieval count.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: "we utilize an adaptive thresholding mechanism to selectively retrieve tables and integrate a sliding-window reranking algorithm"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

102 extracted references · 102 canonical work pages · 9 internal anchors

[1]

Publications Manual , year = "1983", publisher =

work page 1983
[2]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243

work page doi:10.1145/322234.322243 1981
[3]

doi: 10.18653/v1/2023.emnlp-main.495

Jiang, Zhengbao and Xu, Frank and Gao, Luyu and Sun, Zhiqing and Liu, Qian and Dwivedi-Yu, Jane and Yang, Yiming and Callan, Jamie and Neubig, Graham. Active Retrieval Augmented Generation. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.495

work page doi:10.18653/v1/2023.emnlp-main.495 2023
[4]

Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy , author=. Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=

work page 2023
[5]

Scalable training of

Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of

work page
[6]

Dan Gusfield , title =. 1997

work page 1997
[7]

Tetreault , title =

Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =

work page 2015
[8]

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =

Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =

work page
[9]

Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages

Adeyemi, Mofetoluwa and Oladipo, Akintunde and Pradeep, Ronak and Lin, Jimmy. Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , year =. doi:10.18653/v1/2024.acl-short.59

work page doi:10.18653/v1/2024.acl-short.59 2024
[10]

arXiv preprint arXiv:2402.14361 , year=

Opentab: Advancing large language models as open-domain table reasoners , author=. arXiv preprint arXiv:2402.14361 , year=

work page arXiv
[11]

BEAVER: An Enterprise Benchmark for Text-to-SQL

BEAVER: an enterprise benchmark for text-to-sql , author=. arXiv preprint arXiv:2409.02038 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[12]

Large Language Models are few(1)-shot Table Reasoners

Chen, Wenhu. Large Language Models are few(1)-shot Table Reasoners. Findings of the Association for Computational Linguistics: EACL 2023. 2023. doi:10.18653/v1/2023.findings-eacl.83

work page doi:10.18653/v1/2023.findings-eacl.83 2023
[13]

The death of schema linking? text-to-sql in the age of well-reasoned language models,

The death of schema linking? text-to-sql in the age of well-reasoned language models , author=. arXiv preprint arXiv:2408.07702 , year=

work page arXiv
[14]

Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval

Chen, Peter Baile and Zhang, Yi and Roth, Dan. Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.148

work page doi:10.18653/v1/2024.acl-long.148 2024
[15]

Si-An Chen and Lesly Miculicich and Julian Martin Eisenschlos and Zifeng Wang and Zilong Wang and Yanfei Chen and Yasuhisa Fujii and Hsuan-Tien Lin and Chen-Yu Lee and Tomas Pfister , booktitle=. Table. 2024 , url=

work page 2024
[16]

FIRST : Faster Improved Listwise Reranking with Single Token Decoding

Gangi Reddy, Revanth and Doo, JaeHyeok and Xu, Yifei and Sultan, Md Arafat and Swain, Deevya and Sil, Avirup and Ji, Heng. FIRST : Faster Improved Listwise Reranking with Single Token Decoding. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.491

work page doi:10.18653/v1/2024.emnlp-main.491 2024
[17]

2024 , publisher =

Gao, Dawei and Wang, Haibin and Li, Yaliang and Sun, Xiuyu and Qian, Yichen and Ding, Bolin and Zhou, Jingren , title =. 2024 , publisher =. doi:10.14778/3641204.3641221 , journal =

work page doi:10.14778/3641204.3641221 2024
[18]

T a P as: Weakly Supervised Table Parsing via Pre-training

Herzig, Jonathan and Nowak, Pawel Krzysztof and M. T a P as: Weakly Supervised Table Parsing via Pre-training. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.398

work page doi:10.18653/v1/2020.acl-main.398 2020
[19]

Tables as Semi-structured Knowledge for Question Answering

Jauhar, Sujay Kumar and Turney, Peter and Hovy, Eduard. Tables as Semi-structured Knowledge for Question Answering. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016. doi:10.18653/v1/P16-1045

work page doi:10.18653/v1/p16-1045 2016
[20]

ACM Transactions on Intelligent Systems and Technology (TIST) , volume=

Web table extraction, retrieval, and augmentation: A survey , author=. ACM Transactions on Intelligent Systems and Technology (TIST) , volume=. 2020 , publisher=

work page 2020
[21]

A comprehensive evaluation of chatgpt’s zero-shot text-to-sql capability,

A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability , author=. arXiv preprint arXiv:2303.13547 , year=

work page arXiv
[22]

arXiv preprint arXiv:2204.08941 , year=

Codexdb: Generating code for processing sql queries using gpt-3 codex , author=. arXiv preprint arXiv:2204.08941 , year=

work page arXiv
[23]

Evaluating the

Evaluating the text-to-sql capabilities of large language models , author=. arXiv preprint arXiv:2204.00498 , year=

work page arXiv
[24]

Yingqi Gao, Yifu Liu, Xiaoxia Li, Xiaorong Shi, Yin Zhu, Yiming Wang, Shiqi Li, Wei Li, Yun- tao Hong, Zhiling Luo, Jinyang Gao, Liyu Mou, and Yu Li

Text-to-sql empowered by large language models: A benchmark evaluation , author=. arXiv preprint arXiv:2308.15363 , year=

work page arXiv
[25]

Advances in Neural Information Processing Systems , volume=

Din-sql: Decomposed in-context learning of text-to-sql with self-correction , author=. Advances in Neural Information Processing Systems , volume=

work page
[26]

Findings of the Association for Computational Linguistics ACL 2024 , pages=

Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm , author=. Findings of the Association for Computational Linguistics ACL 2024 , pages=

work page 2024
[27]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Synthesizing Text-to-SQL Data from Weak and Strong LLMs , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

work page
[28]

arXiv preprint arXiv:2403.17611 , year=

Denoising Table-Text Retrieval for Open-Domain Question Answering , author=. arXiv preprint arXiv:2403.17611 , year=

work page arXiv
[29]

Open Domain Question Answering over Tables via Dense Retrieval

Herzig, Jonathan and M. Open Domain Question Answering over Tables via Dense Retrieval. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.43

work page doi:10.18653/v1/2021.naacl-main.43 2021
[30]

TABBIE : Pretrained Representations of Tabular Data

Iida, Hiroshi and Thai, Dung and Manjunatha, Varun and Iyyer, Mohit. TABBIE : Pretrained Representations of Tabular Data. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.270

work page doi:10.18653/v1/2021.naacl-main.270 2021
[31]

The Llama 3 Herd of Models

The llama 3 herd of models , author=. arXiv preprint arXiv:2407.21783 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[32]

Qwen2.5-Coder Technical Report

Qwen2. 5-coder technical report , author=. arXiv preprint arXiv:2409.12186 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[33]

MBA - RAG : a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity

Tang, Xiaqiang and Gao, Qiang and Li, Jian and Du, Nan and Li, Qi and Xie, Sihong. MBA - RAG : a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity. Proceedings of the 31st International Conference on Computational Linguistics. 2025

work page 2025
[34]

Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

Izacard, Gautier and Grave, Edouard. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021. doi:10.18653/v1/2021.eacl-main.74

work page doi:10.18653/v1/2021.eacl-main.74 2021
[35]

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

Trivedi, Harsh and Balasubramanian, Niranjan and Khot, Tushar and Sabharwal, Ashish. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi:10.18653/v1/2023.acl-long.557

work page doi:10.18653/v1/2023.acl-long.557 2023
[36]

Advances in neural information processing systems , volume=

Retrieval-augmented generation for knowledge-intensive nlp tasks , author=. Advances in neural information processing systems , volume=

work page
[37]

The Thirteenth International Conference on Learning Representations , year=

Multi-Field Adaptive Retrieval , author=. The Thirteenth International Conference on Learning Representations , year=

work page
[38]

Denoising Table-Text Retrieval for Open-Domain Question Answering

Kang, Deokhyung and Jung, Baikjin and Kim, Yunsu and Lee, Gary Geunbae. Denoising Table-Text Retrieval for Open-Domain Question Answering. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024

work page 2024
[39]

2025 , eprint=

H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables , author=. 2025 , eprint=

work page 2025
[40]

R e2 G : Retrieve, Rerank, Generate

Glass, Michael and Rossiello, Gaetano and Chowdhury, Md Faisal Mahbub and Naik, Ankita and Cai, Pengshan and Gliozzo, Alfio. R e2 G : Retrieve, Rerank, Generate. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022. doi:10.18653/v1/2022.naacl-main.194

work page doi:10.18653/v1/2022.naacl-main.194 2022
[41]

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

work page 2024
[42]

The Twelfth International Conference on Learning Representations , year=

Self-rag: Learning to retrieve, generate, and critique through self-reflection , author=. The Twelfth International Conference on Learning Representations , year=

work page
[43]

1989 , issn =

Analysis of variance (ANOVA) , journal =. 1989 , issn =. doi:https://doi.org/10.1016/0169-7439(89)80095-4 , url =

work page doi:10.1016/0169-7439(89)80095-4 1989
[44]

When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

Mallen, Alex and Asai, Akari and Zhong, Victor and Das, Rajarshi and Khashabi, Daniel and Hajishirzi, Hannaneh. When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi:10.18653/v1/2023...

work page doi:10.18653/v1/2023.acl-long.546 2023
[45]

Gemma 3 Technical Report

Gemma 3 technical report , author=. arXiv preprint arXiv:2503.19786 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[46]

A o E : Angle-optimized Embeddings for Semantic Textual Similarity

Li, Xianming and Li, Jing. A o E : Angle-optimized Embeddings for Semantic Textual Similarity. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.101

work page doi:10.18653/v1/2024.acl-long.101 2024
[47]

Unsupervised Dense Information Retrieval with Contrastive Learning

Unsupervised dense information retrieval with contrastive learning , author=. arXiv preprint arXiv:2112.09118 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[48]

Transactions on Machine Learning Research , issn=

Unsupervised Dense Information Retrieval with Contrastive Learning , author=. Transactions on Machine Learning Research , issn=. 2022 , url=

work page 2022
[49]

2024 , url=

Xingyu Ji and Aditya Parameswaran and Madelon Hulsebos , booktitle=. 2024 , url=

work page 2024
[50]

Mayank Kothyari, Dhruva Dhingra, Sunita Sarawagi, and Soumen Chakrabarti. CRUSH4SQL: Collective Retrieval Using Schema Hallucination For Text2SQL. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.868
[51]

Bernhard Kratzwald and Stefan Feuerriegel. Adaptive Document Retrieval for Deep Question Answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1055
[52]

Fangyu Lei, Jixuan Chen, Yuxiao Ye, Ruisheng Cao, Dongchan Shin, Hongjin Su, Zhaoqing Suo, Hongcheng Gao, Wenjing Hu, Pengcheng Yin, Victor Zhong, Caiming Xiong, Ruoxi Sun, Qian Liu, Sida Wang, and Tao Yu. Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-. 2025.
[53]

Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin Chang, Fei Huang, Reynold Cheng, and Yongbin Li. Can. 2023.
[54]

Karime Maamari, Fadhil Abubaker, Daniel Jaroslawicz, and Amine Mhedhbi. The Death of Schema Linking? Text-to-. 2024.
[55]

SGPT: GPT Sentence Embeddings for Semantic Search. arXiv preprint arXiv:2202.08904.
[56]

Vaishali Pal, Andrew Yates, Evangelos Kanoulas, and Maarten de Rijke. MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi:10.18653/v1/2023.acl-long.348
[57]

CHESS: Contextual Harnessing for Efficient SQL Synthesis. arXiv preprint arXiv:2405.16755.
[58]

Mohammadreza Pourreza and Davood Rafiei. 2023.
[59]

Mohammadreza Pourreza, Hailong Li, Ruoxi Sun, Yeounoh Chung, Shayan Talaei, Gaurav Tarlok Kakkar, Yu Gan, Amin Saberi, Fatma Ozcan, and Sercan O Arik. 2025.
[60]

Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Le Yan, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, and Michael Bendersky. Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting. Findings of the Association for Computational Linguistics: NAACL 2024. 2024. doi:10.18653/v1/2024.findings-naacl.97
[61]

TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension. ArXiv.
[62]

Dongyu Ru, Lin Qiu, Xiangkun Hu, Tianhang Zhang, Peng Shi, Shuaichen Chang, Cheng Jiayang, Cunxiang Wang, Shichao Sun, Huanyu Li, Zizhao Zhang, Binjie Wang, Jiarong Jiang, Tong He, Zhiguo Wang, Pengfei Liu, Yue Zhang, and Zheng Zhang. 2024.
[63]

Devendra Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, and Luke Zettlemoyer. Improving Passage Retrieval with Zero-Shot Question Generation. 2022. doi:10.18653/v1/2022.emnlp-main.249
[64]

End-to-End Table Question Answering via Retrieval-Augmented Generation. arXiv preprint arXiv:2203.16714.
[65]

Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, and Zhaochun Ren. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.923
[66]

R. Hadsell, S. Chopra, and Y. LeCun. Dimensionality Reduction by Learning an Invariant Mapping.
[67]

The Power of Noise: Redefining Retrieval for RAG Systems. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval.
[68]

CHESS: Contextual Harnessing for Efficient SQL Synthesis. ArXiv.
[69]

Zhiruo Wang, Zhengbao Jiang, Eric Nyberg, and Graham Neubig. Table Retrieval May Not Necessitate Table-specific Model Design. Proceedings of the Workshop on Structured and Unstructured Knowledge Integration (SUKI). 2022. doi:10.18653/v1/2022.suki-1.5
[70]

Jian Wu, Linyi Yang, Dongyuan Li, Yuliang Ji, Manabu Okumura, and Yue Zhang. 2025.
[71]

TableRAG: Million-Token Table Understanding with Language Models. Advances in Neural Information Processing Systems.
[72]

Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arXiv preprint arXiv:1709.00103.
[73]

Retrieving Complex Tables with Multi-Granular Graph Representation Learning. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.
[74]

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. Proceedings of the 2018 Conference on Empirical... 2018. doi:10.18653/v1/d18-1425
[75]

Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Qingfu Zhu, and Wanxiang Che. MURRE: Multi-Hop Table Retrieval with Removal for Open-Domain Text-to-SQL. Proceedings of the 31st International Conference on Computational Linguistics. 2025.
[76]

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference. 2024.
[77]

Li Zhang, Shuo Zhang, and Krisztian Balog. 2019. doi:10.1145/3331184.3331333
[78]

Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling. Proceedings of the AAAI Conference on Artificial Intelligence.
[79]

Honglei Zhuang, Zhen Qin, Kai Hui, Junru Wu, Le Yan, Xuanhui Wang, and Michael Bendersky. Beyond Yes and No: Improving Zero-Shot LLM Rankers via Scoring Fine-Grained Relevance Labels. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2... 2024. doi:10.18653/v1/2024.naacl-short.31
[80]

Xueguang Ma, Xinyu Zhang, Ronak Pradeep, and Jimmy Lin. 2024.
