Recognition: unknown
Dynamic Ranked List Truncation for Reranking Pipelines via LLM-generated Reference-Documents
Pith reviewed 2026-05-10 16:22 UTC · model grok-4.3
The pith
LLM-generated reference documents serve as pivots to dynamically truncate ranked lists and accelerate listwise reranking.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLMs can generate reference documents that act as reliable pivots between relevant and non-relevant documents; these documents enable dynamic ranked list truncation and adaptive batch processing during listwise reranking, outperforming static truncation and fixed-stride baselines on TREC benchmarks.
What carries the argument
LLM-generated reference documents that function as pivots separating relevant from non-relevant documents in a ranked list.
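One way to picture the pivot mechanism (the paper's exact scoring is not reproduced here; the embedding-cosine proxy and the fallback-to-one rule below are assumptions for illustration): embed the query, the ranked documents, and the generated reference in a shared vector space, then cut the list at the position where the reference itself would rank.

```python
import numpy as np

def truncate_at_pivot(doc_embs: np.ndarray, ref_emb: np.ndarray,
                      query_emb: np.ndarray) -> int:
    """Cut the ranked list where the reference document would rank.

    doc_embs:  (n, d) embeddings of the ranked list, in rank order.
    ref_emb:   (d,) embedding of the LLM-generated reference document.
    query_emb: (d,) embedding of the query.
    All vectors are assumed L2-normalized, so a dot product is a cosine.
    """
    doc_scores = doc_embs @ query_emb   # relevance proxy per document
    ref_score = ref_emb @ query_emb     # score the pivot itself achieves
    keep = np.nonzero(doc_scores >= ref_score)[0]
    # Keep everything ranked at least as relevant as the pivot; fall back
    # to a single document if nothing clears it.
    return int(keep[-1]) + 1 if keep.size else 1
```

Under this reading, no fixed cutoff depth or tuned threshold is needed: the truncation point moves per query with wherever the generated reference lands.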
If this is right
- Ranked list truncation no longer requires topic-agnostic fixed cutoffs or hand-tuned hyperparameters.
- Listwise reranking can switch from sequential fixed-stride batches to parallel non-overlapping windows or adaptive-stride overlapping windows.
- The same reference documents improve the efficiency of existing listwise reranking frameworks without changing their internal scoring logic.
- In-domain and out-of-domain TREC-style benchmarks show up to a 66 percent acceleration of LLM-based listwise reranking, i.e., a corresponding cut in LLM inference cost.
- Performance gains appear on standard relevance metrics while latency decreases.
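The batching change described above can be sketched in a few lines: the conventional setup slides a fixed-size window with a fixed stride sequentially (typically from the bottom of the list to the top), while non-overlapping windows can be dispatched to the reranker in parallel. The window and stride values and the back-to-front order are illustrative assumptions, not the paper's configuration.

```python
from typing import Iterator

def fixed_stride_windows(n: int, window: int, stride: int) -> Iterator[range]:
    """Baseline: overlapping windows slid sequentially from the bottom
    of the ranked list to the top (the usual listwise-reranking order)."""
    start = max(n - window, 0)
    while True:
        yield range(start, min(start + window, n))
        if start == 0:
            break
        start = max(start - stride, 0)

def parallel_windows(n: int, window: int) -> list[range]:
    """Alternative: non-overlapping windows, so every batch can be sent
    to the LLM reranker concurrently instead of one after another."""
    return [range(i, min(i + window, n)) for i in range(0, n, window)]
```

An adaptive stride would replace the fixed `stride` argument with a per-window value, for example derived from where the reference document falls inside the previous window.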
Where Pith is reading between the lines
- The reference-document technique could be tested on non-LLM rerankers such as dense retrievers or cross-encoders to measure whether the pivot effect is model-agnostic.
- If the generated documents encode relevance signals cleanly, they might serve as synthetic training data for smaller ranking models.
- Adaptive windowing might generalize to other sequential processing tasks where context length is a bottleneck, such as long-document summarization.
- The method invites direct comparison of LLM-generated references against human-written relevance passages on the same collections.
Load-bearing premise
Large language models can produce documents whose semantic content reliably distinguishes relevant from non-relevant items using only relevance signals.
What would settle it
A controlled experiment in which the generated reference documents produce truncation points or reranking scores no better than random selection on a held-out TREC collection would falsify the central claim.
Figures
Original abstract
Large Language Models (LLM) have been widely used in reranking. Computational overhead and large context lengths remain a challenging issue for LLM rerankers. Efficient reranking usually involves selecting a subset of the ranked list from the first stage, known as ranked list truncation (RLT). The truncated list is processed further by a reranker. For LLM rerankers, the ranked list is often partitioned and processed sequentially in batches to reduce the context length. Both these steps involve hyperparameters and topic-agnostic heuristics. Recently, LLMs have been shown to be effective for relevance judgment. Equivalently, we propose that LLMs can be used to generate reference documents that can act as a pivot between relevant and non-relevant documents in a ranked list. We propose methods to use these generated reference documents for RLT as well as for efficient listwise reranking. While reranking, we process the ranked list in either parallel batches of non-overlapping windows or overlapping windows with adaptive strides, improving the existing fixed stride setup. The generated reference documents are also shown to improve existing efficient listwise reranking frameworks. Experiments on TREC Deep Learning benchmarks show that our approach outperforms existing RLT-based approaches. In-domain and out-of-domain benchmarks demonstrate that our proposed methods accelerate LLM-based listwise reranking by up to 66% compared to existing approaches. This work not only establishes a practical paradigm for efficient LLM-based reranking but also provides insight into the capability of LLMs to generate semantically controlled documents using relevance signals.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes generating reference documents via LLMs from relevance signals to serve as pivots for dynamic ranked list truncation (RLT) and efficient listwise reranking. It introduces parallel non-overlapping batch windows and overlapping windows with adaptive strides to reduce context length and computation in LLM rerankers, claiming these outperform prior RLT methods and yield up to 66% acceleration on TREC Deep Learning in-domain and out-of-domain benchmarks while establishing a paradigm for semantically controlled document generation.
Significance. If the experimental claims hold after proper controls, the work offers a practical route to scale LLM reranking by cutting overhead without effectiveness loss, and the reference-document pivot idea could generalize beyond RLT to other retrieval pipelines. The reported speedups and outperformance would be notable contributions to efficient IR if substantiated with reproducible baselines and diagnostics.
major comments (2)
- [Abstract and Experiments] The claims of outperformance over existing RLT approaches and of 66% acceleration rest on benchmark results, yet the manuscript supplies no details on the exact baselines, statistical significance tests, hyperparameter selection for batch window sizes and adaptive strides, or how reference-document quality was validated (e.g., no similarity distributions or oracle truncation alignment).
- [Proposed Method] The load-bearing assumption that LLM-generated reference documents reliably separate relevant from non-relevant items (equivalence to human relevance judgments) lacks direct supporting diagnostics; without evidence that proximity to the reference outperforms chance or heuristic baselines, gains may derive from the window mechanics alone, especially in out-of-domain settings.
minor comments (2)
- [Method] Clarify notation for adaptive strides versus fixed strides and how reference documents are constructed from relevance signals.
- [Related Work] Add missing references to recent LLM relevance judgment work to better support the equivalence claim.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will incorporate the suggested clarifications and additional analyses into the revised manuscript to strengthen the experimental reporting and validation of the core assumptions.
Point-by-point responses
Referee: [Abstract and Experiments] The claims of outperformance over existing RLT approaches and of 66% acceleration rest on benchmark results, yet the manuscript supplies no details on the exact baselines, statistical significance tests, hyperparameter selection for batch window sizes and adaptive strides, or how reference-document quality was validated (e.g., no similarity distributions or oracle truncation alignment).
Authors: We agree that the current manuscript lacks sufficient detail on these aspects, which is necessary for full reproducibility and to substantiate the claims. In the revised version, we will expand the Experiments section with: (1) explicit descriptions of all baselines, including their sources, configurations, and any modifications; (2) statistical significance tests (e.g., paired t-tests or Wilcoxon tests with p-values reported for key comparisons); (3) a dedicated subsection on hyperparameter selection for batch window sizes and adaptive strides, detailing the search space, validation procedure, and chosen values; and (4) reference-document quality validation, including similarity distributions (e.g., cosine similarities to relevant vs. non-relevant documents) and alignment metrics with oracle truncation points. These additions will directly address the concerns and allow readers to evaluate the sources of the reported gains. revision: yes
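The similarity-distribution validation the authors promise in (4) could start from a diagnostic like this sketch; the embedding representation and the pairwise-win statistic are our assumptions, not the authors' protocol.

```python
import numpy as np

def pivot_separation(ref_emb: np.ndarray, rel_embs: np.ndarray,
                     nonrel_embs: np.ndarray) -> float:
    """AUC-style diagnostic of how cleanly the generated reference
    separates judged-relevant from judged-non-relevant documents.

    Returns the fraction of (relevant, non-relevant) pairs in which the
    relevant document is closer to the reference: 1.0 is a perfect
    pivot, 0.5 is no better than chance.  Vectors are L2-normalized.
    """
    rel_sims = rel_embs @ ref_emb       # cosine to each relevant doc
    non_sims = nonrel_embs @ ref_emb    # cosine to each non-relevant doc
    return float((rel_sims[:, None] > non_sims[None, :]).mean())
```

Reporting this statistic per topic, alongside the raw similarity histograms, would make the "pivot" claim directly inspectable.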
Referee: [Proposed Method] The load-bearing assumption that LLM-generated reference documents reliably separate relevant from non-relevant items (equivalence to human relevance judgments) lacks direct supporting diagnostics; without evidence that proximity to the reference outperforms chance or heuristic baselines, gains may derive from the window mechanics alone, especially in out-of-domain settings.
Authors: We acknowledge that direct diagnostics are needed to confirm the reference documents' role in separation rather than attributing gains solely to the batching mechanics. In the revision, we will add supporting analyses in the Proposed Method and Experiments sections. These will include quantitative comparisons of truncation and reranking performance using proximity to the LLM-generated reference versus chance (random) and heuristic baselines (e.g., query embedding or document centroid). Results will be broken down by in-domain and out-of-domain settings, with metrics such as truncation precision and separation effectiveness. This will demonstrate that the reference documents provide benefits beyond the window mechanics and address the concern for out-of-domain generalization. revision: yes
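The promised chance and heuristic comparisons could be wired up as a small harness like the following sketch; the specific heuristic (median score of the top half of the list) and the keep-everything-above-the-pivot cutoff rule are illustrative placeholders, not the paper's baselines.

```python
import numpy as np

def cutoff_from_pivot(doc_scores: np.ndarray, pivot_score: float) -> int:
    """Keep every ranked document scoring at least as high as the pivot;
    keep at least one document if nothing clears it."""
    keep = np.nonzero(doc_scores >= pivot_score)[0]
    return int(keep[-1]) + 1 if keep.size else 1

def compare_pivots(doc_scores: np.ndarray, ref_score: float,
                   rng: np.random.Generator) -> dict[str, int]:
    """Cutoffs under the generated reference, a chance pivot (a random
    document's score), and a simple heuristic pivot."""
    return {
        "reference": cutoff_from_pivot(doc_scores, ref_score),
        "random": cutoff_from_pivot(
            doc_scores, float(rng.choice(doc_scores))),
        "median_top_half": cutoff_from_pivot(
            doc_scores, float(np.median(doc_scores[: len(doc_scores) // 2]))),
    }
```

Averaging the downstream relevance metric at each of these cutoffs over a held-out collection would show whether the reference pivot beats both chance and cheap heuristics, which is exactly the evidence the referee asks for.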
Circularity Check
No circularity: empirical proposal validated externally
Full rationale
The paper proposes LLM-generated reference documents as pivots for dynamic ranked list truncation and adaptive listwise reranking, with claims resting on TREC DL benchmark experiments showing outperformance and up to 66% acceleration. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the derivation; the method is presented as a practical construction whose value is assessed via independent external results rather than internal self-reference or definition. The equivalence to relevance judgments is an explicit proposal, not a hidden tautology.
Axiom & Free-Parameter Ledger
free parameters (1)
- batch window sizes and adaptive strides
axioms (1)
- domain assumption LLMs can generate semantically controlled documents using relevance signals
invented entities (1)
- LLM-generated reference documents (no independent evidence)
Reference graph
Works this paper leans on
- [1] Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, and Andrew Tomkins. 2020. Choppy: Cut Transformer for Ranked List Truncation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 1513–1516. doi:10.1145/339...
- [2] Zijian Chen, Ronak Pradeep, and Jimmy Lin. 2025. Accelerating Listwise Reranking: Reproducing and Enhancing FIRST. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (Padua, Italy) (SIGIR '25). Association for Computing Machinery, New York, NY, USA, 3165–3172. doi:10.1145/3726302.3730287
- [3]
- [4]
- [5] Zhuyun Dai, Vincent Y. Zhao, Ji Ma, Yi Luan, Jianmo Ni, Jing Lu, Anton Bakalov, Kelvin Guu, Keith B. Hall, and Ming-Wei Chang. 2022. Promptagator: Few-shot Dense Retrieval From 8 Examples. arXiv:2209.11755 [cs.CL] https://arxiv.org/abs/2209.11755
- [6] Guglielmo Faggioli, Laura Dietz, Charles L. A. Clarke, Gianluca Demartini, Matthias Hagen, Claudia Hauff, Noriko Kando, Evangelos Kanoulas, Martin Potthast, Benno Stein, and Henning Wachsmuth. 2023. Perspectives on Large Language Models for Relevance Judgment. In Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieva...
- [7] Guglielmo Faggioli, Laura Dietz, Charles L. A. Clarke, Gianluca Demartini, Matthias Hagen, Claudia Hauff, Julia Kiseleva, Gabriella Pasi, Martin Potthast, and Maarten de Rijke. 2023. Perspectives on Large Language Models for Relevance Judgment. In ICTIR '23: Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval. AC...
- [8] Thibault Formal, Benjamin Piwowarski, and Stéphane Clinchant. 2021. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, Canada) (SIGIR '21). Association for Computing Machinery, New York, NY, USA, 2288–2292. doi:1...
- [9] Revanth Gangi Reddy, JaeHyeok Doo, Yifei Xu, Md Arafat Sultan, Deevya Swain, Avirup Sil, and Heng Ji. 2024. FIRST: Faster Improved Listwise Reranking with Single Token Decoding. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational...
- [10]
- [11]
- [13] Yen-Chieh Lien, Daniel Cohen, and W. Bruce Croft. 2019. An Assumption-Free Approach to the Dynamic Truncation of Ranked Lists. In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval (Santa Clara, CA, USA) (ICTIR '19). Association for Computing Machinery, New York, NY, USA, 79–82. doi:10.1145/3341981.3344234
- [14]
- [15] Xueguang Ma, Xinyu Zhang, Ronak Pradeep, and Jimmy Lin. 2023. Fine-Tuning LLaMA for Multi-Stage Text Retrieval. In SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, Taipei, Taiwan, 2659–2669. https://arxiv.org/abs/2310.08319
- [16] Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, and Maarten de Rijke. 2024. Ranked List Truncation for Large Language Model-based Re-Ranking. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (Washington DC, USA) (SIGIR '24). Association for Computing Machinery, New York, NY,...
- [17]
- [18] Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2017. MS MARCO: A Human-Generated MAchine Reading COmprehension Dataset. https://openreview.net/forum?id=Hk1iOLcle
- [19] Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT. arXiv 2019. https://arxiv.org/abs/1901.04085
- [20] Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document Ranking with a Pretrained Sequence-to-Sequence Model. In Findings of the Association for Computational Linguistics: EMNLP 2020, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 708–718. doi:10.18653/v1/2020.findings-emnlp.63
- [21]
- [22]
- [23] Ronak Pradeep, Rodrigo Nogueira, and Jimmy Lin. 2021. The Expresso Library for Ranking at Scale. In SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, Montréal, QC, Canada, 2447–2450.
- [24]
- [25]
- [26] Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Le Yan, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, and Michael Bendersky. 2024. Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting. In Findings of the Association for Computational Linguistics: NAACL 2024, Kevin Duh, Helena Gomez, and Ste...
- [27] Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, and Emine Yilmaz. 2024. LLM4Eval: Large Language Model for Evaluation in IR. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (Washington DC, USA) (S...
- [28] Ruiyang Ren, Yuhao Wang, Kun Zhou, Wayne Xin Zhao, Wenjie Wang, Jing Liu, Ji-Rong Wen, and Tat-Seng Chua. 2025. Self-Calibrated Listwise Reranking with Large Language Models. In Proceedings of the ACM on Web Conference 2025 (Sydney NSW, Australia) (WWW '25). Association for Computing Machinery, New York, NY, USA, 3692–3701. doi:10.1145/3696410.3714658
- [29] Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, and Mike Gatford. 1995. Okapi at TREC-3. In Overview of the Third Text REtrieval Conference (TREC-3). NIST Special Publication 500-225, Gaithersburg, MD, 109–126.
- [30]
- [31]
- [32] Nilanjan Sinhababu, Andrew Parry, Debasis Ganguly, Debasis Samanta, and Pabitra Mitra. 2024. Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model. In Findings of the Association for Computational Linguistics: EMNLP 2024, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguistics, Miam...
- [33] Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, and Zhaochun Ren. 2023. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for ...
- [34] Raphael Tang, Crystina Zhang, Xueguang Ma, Jimmy Lin, and Ferhan Ture
- [35] Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). Association for Computational Lin...
- [36] Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models. arXiv:2104.08663 [cs.IR] https://arxiv.org/abs/2104.08663
- [37] Paul Thomas, Seth Spielman, Nick Craswell, and Bhaskar Mitra. 2024. Large Language Models can Accurately Predict Searcher Preferences. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (Washington DC, USA) (SIGIR '24). Association for Computing Machinery, New York, NY, USA, 1930–1940. doi:10.11...
- [38]
- [39] Dong Wang, Jianxin Li, Tianchen Zhu, Haoyi Zhou, Qishan Zhu, Yuxin Wen, and Hongming Piao. 2022. MtCut: A Multi-Task Framework for Ranked List Truncation. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (Virtual Event, AZ, USA) (WSDM '22). Association for Computing Machinery, New York, NY, USA, 1054–1062. doi:10.114...
- [40] Liang Wang, Nan Yang, and Furu Wei. 2023. Query2doc: Query Expansion with Large Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 9414–9423. doi:10.18653/v1/2023.emnlp-main.585
- [41]