Layer-wise Token Compression for Efficient Document Reranking
Pith reviewed 2026-05-22 09:15 UTC · model grok-4.3
The pith
Applying adaptive token pooling at intermediate transformer layers speeds up cross-encoder rerankers without loss of ranking quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that adaptive token pooling works when it is inserted at intermediate transformer layers rather than at the initial embedding layer. Placed there, it shortens the sequence seen by later layers while preserving the cross-encoder's capacity to model query-document interactions, which raises inference throughput on both passage and document ranking tasks without degrading standard effectiveness metrics.
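To make the throughput argument concrete, the following is a rough cost model, not the paper's own analysis. It assumes a standard transformer block (four d x d attention projections, a feed-forward network of width 4d, dense self-attention) and ignores the cost of the pooling step itself. The names `layer_cost`, `relative_cost`, `layers_before` and `factor` are hypothetical.

```python
import math


def layer_cost(n: int, d: int) -> int:
    """Rough multiply-accumulate count for one transformer block.

    Assumption: four d x d attention projections, a feed-forward network
    with hidden size 4d, and dense self-attention over n tokens.
    """
    linear = 12 * n * d * d    # Q, K, V, output projections (4) + FFN (8)
    attention = 2 * n * n * d  # QK^T scores plus the weighted sum over values
    return linear + attention


def relative_cost(n: int, d: int, num_layers: int,
                  layers_before: int, factor: int) -> float:
    """Cost of a compressed forward pass relative to the uncompressed one.

    `layers_before` blocks run on all n tokens; the remaining blocks run on
    ceil(n / factor) tokens. Pooling overhead is ignored.
    """
    if not 0 <= layers_before <= num_layers:
        raise ValueError("layers_before must lie in [0, num_layers]")
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n_short = math.ceil(n / factor)
    full = num_layers * layer_cost(n, d)
    compressed = (layers_before * layer_cost(n, d)
                  + (num_layers - layers_before) * layer_cost(n_short, d))
    return compressed / full
```

Under these assumptions, compressing later (a larger `layers_before`) always saves less compute than compressing earlier, so the model only captures the efficiency side of the trade-off; the quality side is what the paper's ablations address.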
What carries the argument
Layer-wise Token Compression (LTC), which inserts adaptive token pooling operations at selected intermediate layers to shrink the token sequence before the remaining transformer blocks.
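The control flow described here can be sketched as follows. This is a minimal illustration, not the authors' implementation: the adaptive pooling operator is not specified in the material above, so fixed-window mean pooling is swapped in as a stand-in, and `mean_pool_tokens`, `run_with_compression`, `compress_at` and `factor` are hypothetical names.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def mean_pool_tokens(hidden: List[Vector], factor: int) -> List[Vector]:
    """Average each run of `factor` adjacent token vectors into one vector.

    Stand-in for the paper's adaptive pooling, which is not specified here.
    A trailing window shorter than `factor` is averaged over its own length.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    pooled = []
    for start in range(0, len(hidden), factor):
        window = hidden[start:start + factor]
        dim = len(window[0])
        pooled.append([sum(vec[i] for vec in window) / len(window)
                       for i in range(dim)])
    return pooled


def run_with_compression(hidden: List[Vector], layers: List[Layer],
                         compress_at: int, factor: int) -> List[Vector]:
    """Apply `layers` in order, pooling right after layer index `compress_at`."""
    for index, layer in enumerate(layers):
        hidden = layer(hidden)
        if index == compress_at:
            # Every later layer now sees a shorter sequence.
            hidden = mean_pool_tokens(hidden, factor)
    return hidden
```

Setting `compress_at` to a middle index corresponds to the placement the paper reports as successful; the initial-embedding variant it reports as ineffective would pool before any layer runs.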
If this is right
- Ranking effectiveness on MS MARCO passage and document tasks stays comparable to the uncompressed models.
- Inference throughput rises by up to 25 percent for passage reranking and up to 116 percent for document reranking.
- The identical compression pattern applies directly to listwise LLM rerankers and yields larger relative speed gains on long inputs.
- Models trained with compression outperform their uncompressed counterparts when used for long-document ranking tasks.
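The percentage gains above are stated in queries per second. As a small arithmetic aid, the helper below (a hypothetical name, `latency_ratio`) converts a relative QPS gain into the implied average time per query, assuming throughput and per-query time are simple reciprocals.

```python
def latency_ratio(qps_gain_percent: float) -> float:
    """Average time per query after the gain, relative to the baseline.

    Assumes time per query is the reciprocal of queries per second.
    """
    if qps_gain_percent <= -100.0:
        raise ValueError("a gain of -100% or less leaves no throughput")
    return 1.0 / (1.0 + qps_gain_percent / 100.0)
```

On this reading, a 25 percent QPS gain corresponds to 0.8 of the baseline time per query, and a 116 percent gain to 1/2.16, or a little under half.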
Where Pith is reading between the lines
- Early-layer compression appears to destroy interaction patterns that middle layers would otherwise build, which explains why only later placement succeeds.
- The length-invariance benefit observed on long documents suggests compression acts as implicit regularization against over-reliance on document length cues.
- The same selective reduction could be explored in other transformer pipelines where early layers capture coarse features and later layers refine them.
- Learned or query-dependent policies for choosing compression layers might further improve the speed-quality trade-off.
Load-bearing premise
The approach assumes that query-document matching signals are already sufficiently formed by the middle layers so that later token reduction does not erase critical interaction information.
What would settle it
The claim that ranking quality is preserved would be refuted by an experiment that applies middle-layer compression on the MS MARCO document ranking task and finds a statistically significant drop in NDCG@10 relative to the uncompressed baseline at matched computational cost.
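One way such a significance check could be run is a paired randomization (sign-flip) test over per-query NDCG@10 scores. The sketch below is an assumption about how the test might be set up, not a procedure taken from the paper; `paired_randomization_test` and its parameters are hypothetical names.

```python
import random
from typing import Sequence


def paired_randomization_test(baseline: Sequence[float],
                              compressed: Sequence[float],
                              trials: int = 10000,
                              seed: int = 0) -> float:
    """Two-sided p-value for the mean per-query difference.

    Under the null hypothesis the two systems are exchangeable per query,
    so each per-query difference is equally likely to have either sign.
    """
    if len(baseline) != len(compressed) or not baseline:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [c - b for b, c in zip(baseline, compressed)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (extreme + 1) / (trials + 1)
```

Both score lists must be aligned on the same queries. A small p-value combined with a negative mean difference would be the outcome that counts against the claim.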
Original abstract
Transformer-based document cross-encoder rerankers are a central component of modern information retrieval systems. Despite their success, these models suffer from high computational costs due to processing long query-document sequences at inference time. A known approach to improve efficiency is token compression, which consists of aggregating groups of tokens together in the initial embedding layer, reducing the effective number of tokens, and making the computation faster. While token compression has proven to be successful for bi-encoder retrievers, we empirically observed that this approach may be ineffective for cross-encoder rerankers. In this paper, we propose Layer-wise Token Compression (LTC), which applies adaptive token pooling at intermediate transformer layers. Through extensive ablation studies on MS MARCO passage and document ranking tasks, we demonstrate that compression at middle layers preserves ranking quality while increasing inference QPS by up to 25% for passage ranking and up to 116% for document ranking. We also extend LTC to listwise LLM rerankers and show that the same approach can be easily applied to long-context listwise reranking, where the QPS improvements are even greater. More surprisingly, when applying rerankers trained on short passages to long-document ranking tasks, models trained with compression outperform their uncompressed counterparts, suggesting that compression may act as a beneficial regularizer that encourages length-invariant representations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Layer-wise Token Compression (LTC) for transformer-based cross-encoder rerankers. Rather than applying token compression at the initial embedding layer (observed to degrade cross-encoder performance), LTC performs adaptive token pooling at selected intermediate layers. Ablation studies on MS MARCO passage and document ranking tasks show that middle-layer compression preserves ranking quality while yielding inference QPS gains of up to 25% (passages) and 116% (documents). The approach is extended to listwise LLM rerankers with larger gains, and models trained with compression outperform uncompressed baselines when transferred from short-passage to long-document ranking, suggesting a regularizing effect toward length-invariant representations.
Significance. If the empirical findings prove robust, the work offers practical value for efficient neural reranking in production IR pipelines, especially for long-context and LLM-based listwise settings. The key insight that layer position critically affects compression viability for cross-encoders (unlike bi-encoders) and the incidental regularizer benefit are useful observations. The paper supplies ablation results across tasks and an extension to LLMs, which bolsters applicability. However, the reported efficiency and effectiveness numbers come without variance estimates or statistical tests, which weakens the central efficiency-quality trade-off claim.
major comments (3)
- [§4.2 and Table 2] The reported QPS gains (25% for passage ranking, 116% for document ranking) are presented as point estimates without error bars, standard deviations across runs, or statistical significance tests. Because the central claim is that quality is preserved while efficiency improves, the lack of these controls makes it impossible to determine whether the gains exceed experimental noise.
- [§3.1 and §3.2] The precise definition of the adaptive pooling operation, the criterion used to choose which intermediate layers receive compression, and the exact compression ratios tested are not fully specified. These details are load-bearing for reproducing the reported result that middle-layer compression succeeds while initial-layer compression fails.
- [§5.3] The interpretation that LTC functions as a regularizer producing length-invariant representations rests solely on improved transfer performance from short to long documents. No supporting measurements (e.g., length-score correlation or representation similarity across lengths) are provided, leaving the mechanistic claim under-supported relative to its prominence in the abstract.
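The length-score correlation mentioned in the last comment could be measured along the following lines. This is a sketch of one possible diagnostic (a plain Pearson correlation between document length and the reranker's relevance score), not an analysis from the paper; `pearson` is a hypothetical helper name.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold document lengths in tokens and `ys` the scores a model assigns to those documents. A correlation closer to zero for the compressed model than for the baseline would be consistent with, though not proof of, the length-invariance reading.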
minor comments (3)
- [Abstract] The phrases 'up to 25%' and 'up to 116%' should be accompanied by the specific compression ratios and layer indices that achieve these maxima so readers can assess the operating range.
- [Related Work] A short paragraph contrasting why initial-layer pooling harms cross-encoder query-document interaction modeling (while succeeding for bi-encoders) would sharpen the motivation for LTC.
- [§3.2] Notation: The manuscript uses 'adaptive token pooling' without an explicit equation; adding a concise formal definition (e.g., in §3.2) would improve clarity for readers unfamiliar with the pooling variant.
Simulated Author's Rebuttal
Thank you for the constructive feedback and the recommendation for minor revision. We appreciate the points raised regarding the robustness of our efficiency results, the need for greater methodological detail, and the support for our interpretation of the regularizer effect. We address each major comment below, indicating planned revisions where appropriate.
- Referee: [§4.2 and Table 2] The reported QPS gains (25% for passage ranking, 116% for document ranking) are presented as point estimates without error bars, standard deviations across runs, or statistical significance tests. Because the central claim is that quality is preserved while efficiency improves, the lack of these controls makes it impossible to determine whether the gains exceed experimental noise.
  Authors: We agree that variance estimates and statistical tests would strengthen the central efficiency-quality trade-off claim. In the revised manuscript, we will rerun the key experiments across multiple random seeds (at least 3 runs per configuration), report standard deviations for both NDCG/MRR and QPS values, and include paired statistical significance tests (e.g., t-tests) comparing compressed and baseline models. This will allow readers to assess whether the reported gains exceed experimental variability.
  Revision: yes
- Referee: [§3.1 and §3.2] The precise definition of the adaptive pooling operation, the criterion used to choose which intermediate layers receive compression, and the exact compression ratios tested are not fully specified. These details are load-bearing for reproducing the reported result that middle-layer compression succeeds while initial-layer compression fails.
  Authors: We thank the referee for highlighting this reproducibility gap. In the revised version, we will expand Sections 3.1 and 3.2 with: (1) the full mathematical definition of the adaptive pooling, including the token importance scoring function and aggregation rule; (2) the explicit layer-selection criterion (derived from preliminary ablations showing early-layer degradation); and (3) the precise compression ratios (e.g., pooling factors of 2× or 4×) applied at each chosen layer. We will also include pseudocode for the LTC procedure to ensure full reproducibility.
  Revision: yes
- Referee: [§5.3] The interpretation that LTC functions as a regularizer producing length-invariant representations rests solely on improved transfer performance from short to long documents. No supporting measurements (e.g., length-score correlation or representation similarity across lengths) are provided, leaving the mechanistic claim under-supported relative to its prominence in the abstract.
  Authors: We acknowledge that the regularizer interpretation currently relies primarily on the transfer results. While these results provide suggestive evidence, we agree additional mechanistic support would be valuable. In the revision, we will either tone down the language in the abstract and §5.3 to emphasize the observational nature of the finding, or add a brief supporting analysis (e.g., length-relevance score correlations or representation similarity metrics across document lengths) if space allows. We view this as an interesting direction for future work rather than a fully substantiated mechanism.
  Revision: partial
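The multi-seed reporting and paired t-tests promised in the first response could be computed roughly as follows. This is a sketch using only the Python standard library, with hypothetical helper names (`summarize`, `paired_t_statistic`); it returns the t statistic only, and the p-value would come from a t-distribution with n - 1 degrees of freedom via a table or a statistics package.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across runs (for example, seeds)."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(baseline: Sequence[float],
                       compressed: Sequence[float]) -> float:
    """Paired t statistic for compressed minus baseline measurements.

    The two sequences must be aligned, for example by seed or by query.
    """
    if len(baseline) != len(compressed) or len(baseline) < 2:
        raise ValueError("need two aligned sequences of length >= 2")
    diffs = [c - b for b, c in zip(baseline, compressed)]
    spread = statistics.stdev(diffs)
    if spread == 0:
        raise ValueError("t statistic is undefined when all differences are equal")
    return statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))
```

With only three runs per configuration, as proposed, the test has two degrees of freedom, so it would take a large and consistent difference to reach conventional significance thresholds.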
Circularity Check
No significant circularity detected
The paper advances an empirical method for layer-wise token compression in cross-encoder rerankers, validated through ablation studies on MS MARCO passage and document ranking tasks. Central claims rest on observed performance preservation at intermediate layers (with QPS gains) and extension to LLM rerankers, without any mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the result to the paper's own inputs by construction. The findings are externally falsifiable via standard IR benchmarks and do not invoke uniqueness theorems or ansatzes from prior author work.
Axiom & Free-Parameter Ledger
invented entities (1)
- Layer-wise Token Compression (LTC): no independent evidence