
arxiv: 2605.19628 · v1 · pith:HFTB57IUnew · submitted 2026-05-19 · 💻 cs.IR

Understanding Wacky Weights: A Dissection of SPLADE's Learned Term Importance

Pith reviewed 2026-05-20 02:23 UTC · model grok-4.3

classification 💻 cs.IR
keywords SPLADE · wacky weights · learned sparse retrieval · term expansion · interpretability · in-domain effectiveness · out-of-domain generalization · sparsity regularization

The pith

SPLADE models use wacky weights primarily to boost retrieval inside the training domain rather than to generalize to new domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors reproduce SPLADE-v2 and test it under many loss functions, datasets, and backbone transformers to study wacky weights, which are expansion terms that appear unrelated to the query. They introduce a formal definition of wackiness based on lexical utility of those terms and a new way to measure their prevalence across different vocabularies and sparsity settings. Experiments show that bigger vocabularies produce more wacky tokens while stricter sparsity regularizers produce fewer. The central finding is that these weights improve effectiveness mainly on data from the same domain used in training and add little to performance on new domains.

Core claim

Reproducing SPLADE-v2 across varied training conditions shows that wacky weights, defined formally by the lexical utility of expansion terms, appear more often with larger vocabularies and less often under stronger sparsity constraints. These weights contribute to higher retrieval effectiveness inside the training domain and provide limited help for generalization to unseen domains.

What carries the argument

Formal definition of wackiness based on lexical utility of expansion terms, together with a normalized prevalence measure that allows comparison across models with different vocabularies and sparsity levels.

If this is right

  • Larger vocabularies are associated with higher prevalence of wacky tokens.
  • Stricter sparsity regularizers reduce the prevalence of wacky weights.
  • Wacky weights support in-domain retrieval effectiveness more than out-of-domain generalization.
  • Choice of loss function, training dataset, and backbone transformer affects how many wacky weights appear.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interpretability advantages claimed for learned sparse retrievers may shrink when models are applied outside their original training domain.
  • Tuning vocabulary size and regularization strength offers a direct way to control the number of wacky weights.
  • The same pattern of domain-specific wacky terms may occur in other learned sparse retrieval models that expand queries.
  • Accepting some drop in peak in-domain performance may be necessary if the goal is to reduce wacky weights for better generalization.

Load-bearing premise

Lexical utility of an expansion term accurately identifies terms that are semantically unrelated to the query without many false positives or negatives.

What would settle it

Finding that wacky weights improve out-of-domain performance at least as much as in-domain performance, or that vocabulary size shows no relation to wacky-token prevalence, would contradict the main results.

Figures

Figures reproduced from arXiv: 2605.19628 by Carsten Eickhoff, Gregory Polyakov, Harrisen Scells.

Figure 1: Our wackiness score assigns a quantifiable value…

Figure 2: Impact of removing top-𝑁 wacky tokens vs. random tokens on MS MARCO. The blue line corresponds to effectiveness after removing the top-𝑁 wacky tokens. The orange region corresponds to the average ± two standard deviations of effectiveness after the removal of random expansion tokens. The red and green lines correspond to the cases where no expansion tokens are used for retrieval and where all of the SPLADE…

Figure 3: Impact of removing top-𝑁 wacky tokens vs. random tokens on BEIR. The blue lines correspond to effectiveness after removing the top-𝑁 wacky tokens. The orange regions correspond to the average ± two standard deviations of effectiveness after removing random expansion tokens. The red and green lines correspond to the cases where no expansion tokens are used for retrieval and where all of the SPLADE expansion…

Figure 4: Normalized Wackiness Curves for different ver…
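The ablation described in the captions of Figures 2 and 3 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the per-token `wackiness` scores, and the `evaluate` callable (standing in for a real retrieval metric computed after re-running retrieval) are all assumptions made for the example.

```python
import random
import statistics

def remove_top_n_wacky(expansion, wackiness, n):
    """Drop the n expansion tokens with the highest wackiness score."""
    ranked = sorted(expansion, key=lambda t: wackiness[t], reverse=True)
    dropped = set(ranked[:n])
    return {t: w for t, w in expansion.items() if t not in dropped}

def remove_random(expansion, n, rng):
    """Drop n expansion tokens chosen uniformly at random."""
    dropped = set(rng.sample(sorted(expansion), min(n, len(expansion))))
    return {t: w for t, w in expansion.items() if t not in dropped}

def random_removal_band(expansion, n, evaluate, trials=20, seed=0):
    """Return (mean - 2*std, mean, mean + 2*std) of the metric over random removals."""
    rng = random.Random(seed)
    scores = [evaluate(remove_random(expansion, n, rng)) for _ in range(trials)]
    center = statistics.mean(scores)
    spread = 2 * statistics.stdev(scores)
    return center - spread, center, center + spread

# Hypothetical expansion weights and wackiness scores for one query.
expansion = {"feline": 0.7, "tax": 0.4, "kitten": 0.6, "pet": 0.5}
wackiness = {"feline": 0.1, "tax": 0.9, "kitten": 0.2, "pet": 0.3}

# Toy stand-in for a retrieval metric: the total remaining expansion weight.
toy_metric = lambda e: sum(e.values())

kept = remove_top_n_wacky(expansion, wackiness, 1)
low, center, high = random_removal_band(expansion, 1, toy_metric)
```

If the metric after targeted removal falls outside the band obtained from random removal, the removed tokens matter more (or less) than an arbitrary expansion token would; that is the comparison the blue line and orange region appear to encode.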
Original abstract

Learned sparse retrieval models such as SPLADE combine the effectiveness of neural architectures with the efficiency of inverted indices. As these models assign weights to terms from a fixed vocabulary, interpretability is often touted as a major benefit of these models. However, the emergence of wacky weights, i.e., expansion terms that appear semantically unrelated to the input, limits interpretability. While prior research has anecdotally observed this phenomenon, there is a lack of systematic understanding regarding their origins, prevalence, and contribution to retrieval effectiveness. In this paper, we reproduce SPLADE-v2 to systematically investigate wacky weights across the SPLADE family of models. We present a comprehensive dissection of wacky weights, providing a formal definition of wackiness based on the lexical utility of expansion terms. Furthermore, we introduce a novel measure to compare the prevalence of these tokens across models with varying vocabularies and sparsity levels. Beyond reproducing the original SPLADE-v2, we train it with various loss functions, datasets, and backbone transformers to isolate the factors contributing to wackiness. Our results show that larger vocabularies are associated with a higher prevalence of wacky tokens, while stricter sparsity regularizers are associated with lower prevalence. Finally, we find that wacky weights are used primarily for in-domain effectiveness rather than out-of-domain generalization.
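To make "weights assigned to terms from a fixed vocabulary" and "expansion terms" concrete, here is a minimal sketch assuming the max-pooled, log-saturated formulation commonly given for SPLADE-v2, in which the weight of vocabulary entry j is the maximum over input positions i of log(1 + ReLU(logit_ij)). The function names, the four-word vocabulary, and the logits are illustrative assumptions; the paper's reproduction may differ in details.

```python
import math

def splade_term_weights(logits):
    """Max-pool log(1 + ReLU(logit)) over input positions, per vocabulary entry.

    logits: list of rows, one per input token position; each row holds one
    masked-language-model logit per vocabulary entry.
    """
    vocab_size = len(logits[0])
    weights = [0.0] * vocab_size
    for row in logits:
        for j, value in enumerate(row):
            saturated = math.log1p(max(value, 0.0))
            if saturated > weights[j]:
                weights[j] = saturated
    return weights

def split_expansion_terms(weights, vocab, input_tokens):
    """Separate non-zero weights into input terms and expansion terms."""
    present = set(input_tokens)
    original, expansion = {}, {}
    for token, weight in zip(vocab, weights):
        if weight > 0.0:
            target = original if token in present else expansion
            target[token] = weight
    return original, expansion

# Toy example: hypothetical vocabulary and logits for two input positions.
vocab = ["cat", "feline", "the", "tax"]
logits = [
    [2.0, 1.0, -3.0, 0.5],
    [0.1, -1.0, -2.0, -0.5],
]
weights = splade_term_weights(logits)
original, expansion = split_expansion_terms(weights, vocab, ["cat"])
```

In this toy case "feline" and "tax" both receive non-zero weight without appearing in the input. A term like "tax" is the kind of expansion that would look unrelated to a human reader, which is the phenomenon the paper sets out to quantify.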

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript reproduces SPLADE-v2 and systematically investigates wacky weights (expansion terms semantically unrelated to the query) in learned sparse retrieval. It introduces a formal definition of wackiness based on lexical utility of expansion terms, a novel prevalence measure across varying vocabularies and sparsity levels, and experiments training SPLADE variants with different loss functions, datasets, and backbone transformers. Key results indicate that larger vocabularies correlate with higher wacky token prevalence, stricter sparsity regularizers with lower prevalence, and that wacky weights contribute primarily to in-domain effectiveness rather than out-of-domain generalization.

Significance. If the central claims hold, the work supplies a needed systematic dissection of an interpretability limitation in SPLADE-style models. The reproduction plus controlled ablations over losses, data, and backbones constitute a strength, as does the introduction of a prevalence measure that normalizes across vocabulary sizes. The ID/OOD distinction, if robust, could inform training regimes that reduce reliance on domain-specific lexical artifacts while preserving effectiveness.

major comments (2)
  1. [§3.2] §3.2 (definition of wackiness): The formal definition ties wackiness directly to lexical utility of expansion terms. If this utility is measured on the same in-domain collections used for the effectiveness ablations (as implied by the use of standard MS MARCO-style splits), the subsequent claim that wacky weights drive in-domain gains more than OOD gains risks circularity; terms are labeled wacky precisely because they supply useful lexical signals in that domain. Please clarify the data used for utility computation and provide an explicit argument or control experiment showing the ID/OOD differential is not an artifact of the labeling procedure.
  2. [§5] §5 (experimental results on ID vs OOD): The central claim that wacky weights are used primarily for in-domain effectiveness rests on comparisons of effectiveness deltas with and without wacky terms. No statistical significance tests or confidence intervals are reported for these deltas, and the construction of the out-of-domain splits is not detailed. Both omissions are load-bearing for the generalization conclusion.
minor comments (3)
  1. [§3.3] The exact formula or pseudocode for the novel prevalence measure should be given as an equation rather than described only in prose.
  2. [Tables/Figures] Table 2 and Figure 4: axis labels and legends use inconsistent terminology ('wacky tokens' vs 'wacky weights'); standardize throughout.
  3. [§3.2] The paper should report the precise lexical-utility threshold or cutoff used to label a term as wacky, and whether it is fixed or tuned per model.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address the major concerns regarding the definition of wackiness and the experimental analysis of ID vs OOD effectiveness below. We will make revisions to clarify and strengthen these aspects.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (definition of wackiness): The formal definition ties wackiness directly to lexical utility of expansion terms. If this utility is measured on the same in-domain collections used for the effectiveness ablations (as implied by the use of standard MS MARCO-style splits), the subsequent claim that wacky weights drive in-domain gains more than OOD gains risks circularity; terms are labeled wacky precisely because they supply useful lexical signals in that domain. Please clarify the data used for utility computation and provide an explicit argument or control experiment showing the ID/OOD differential is not an artifact of the labeling procedure.

    Authors: We appreciate this observation on potential circularity. In the manuscript, the lexical utility for defining wackiness is computed using the MS MARCO training set, which is the in-domain data. To address the concern, we will revise §3.2 to explicitly state the data used and add a control where we compute utility on a separate out-of-domain collection (e.g., one of the BEIR datasets). We will also include an argument that the ablation experiments measure the marginal contribution to retrieval metrics separately from the labeling, and show that the differential effect holds even when controlling for the definition. This will demonstrate that the ID/OOD distinction is not merely an artifact. revision: yes

  2. Referee: [§5] §5 (experimental results on ID vs OOD): The central claim that wacky weights are used primarily for in-domain effectiveness rests on comparisons of effectiveness deltas with and without wacky terms. No statistical significance tests or confidence intervals are reported for these deltas, and the construction of the out-of-domain splits is not detailed. Both omissions are load-bearing for the generalization conclusion.

    Authors: We agree that reporting statistical significance and detailing the OOD splits are necessary for robust conclusions. We will update §5 to include statistical tests (such as paired t-tests with p-values) and bootstrap confidence intervals for the effectiveness deltas. We will also expand the description of the out-of-domain evaluation, naming the collections used (e.g., BEIR subsets) and explaining how the splits were constructed so that they are out-of-domain relative to the training data. revision: yes
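The two analyses promised in this response, a paired test and a bootstrap confidence interval on per-query effectiveness deltas, could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' procedure: the per-query scores are made up, the function names are hypothetical, and the interval uses a plain percentile bootstrap.

```python
import math
import random
import statistics

def paired_t_statistic(with_wacky, without_wacky):
    """t statistic for the mean per-query difference (with minus without)."""
    deltas = [a - b for a, b in zip(with_wacky, without_wacky)]
    stderr = statistics.stdev(deltas) / math.sqrt(len(deltas))
    return statistics.mean(deltas) / stderr

def bootstrap_ci(with_wacky, without_wacky, resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean per-query delta."""
    rng = random.Random(seed)
    deltas = [a - b for a, b in zip(with_wacky, without_wacky)]
    means = []
    for _ in range(resamples):
        # Resample queries with replacement, keeping each pair together.
        sample = rng.choices(deltas, k=len(deltas))
        means.append(statistics.mean(sample))
    means.sort()
    lower = means[int((alpha / 2) * resamples)]
    upper = means[int((1 - alpha / 2) * resamples) - 1]
    return lower, upper

# Made-up per-query metric values for the same six queries.
with_wacky = [0.42, 0.31, 0.55, 0.60, 0.48, 0.39]
without_wacky = [0.40, 0.30, 0.50, 0.58, 0.47, 0.35]

t_stat = paired_t_statistic(with_wacky, without_wacky)
ci_low, ci_high = bootstrap_ci(with_wacky, without_wacky)
```

Running the same computation separately on in-domain and out-of-domain queries would show whether the two intervals overlap, which is the evidence the referee asks for. Converting the t statistic to a p-value needs the t distribution with n − 1 degrees of freedom, which the standard library does not provide.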

Circularity Check

0 steps flagged

Empirical reproduction and external benchmarks; definition of wackiness does not force the ID/OOD claim by construction

full rationale

The paper is an empirical dissection that reproduces SPLADE-v2, trains variants with different losses/datasets/backbones, and measures prevalence and contribution on standard collections (MS MARCO and others). The formal definition of wackiness uses lexical utility of expansion terms as a proxy, but this is applied to analyze observed behavior rather than deriving the central ID-vs-OOD result tautologically from the labeling procedure itself. No equations reduce a reported prediction to a fitted input by construction, and no load-bearing self-citation chain is present. Minor risk exists that utility computed on the same collections could bias labeling, but this remains an empirical measurement choice rather than a mathematical equivalence, keeping circularity low.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claims rest on the validity of the lexical-utility definition of wackiness and on the assumption that the chosen training variations (loss, dataset, backbone) isolate the factors that control wacky-term prevalence; no new physical or mathematical entities are postulated.

free parameters (2)
  • vocabulary size
    Larger vocabularies are reported to increase wacky-token prevalence; this hyper-parameter is varied across experiments.
  • sparsity regularizer strength
    Stricter regularizers are reported to decrease wacky-token prevalence; this hyper-parameter is varied across experiments.
axioms (1)
  • domain assumption: Lexical utility of an expansion term is a valid and sufficient criterion for labeling the term as wacky.
    Invoked when the paper introduces its formal definition of wackiness.
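As context for the "sparsity regularizer strength" parameter in the ledger above, the following is a minimal sketch of the FLOPS regularizer that SPLADE-style models commonly use: the sum, over vocabulary entries, of the squared mean activation across a batch. It assumes this is among the regularizers the paper varies, which the summary here does not confirm; the function names and the toy batch are illustrative.

```python
def flops_regularizer(batch_weights):
    """Sum over vocabulary entries of the squared mean activation in the batch."""
    n = len(batch_weights)
    vocab_size = len(batch_weights[0])
    total = 0.0
    for j in range(vocab_size):
        mean_activation = sum(row[j] for row in batch_weights) / n
        total += mean_activation ** 2
    return total

def regularized_loss(ranking_loss, batch_weights, strength):
    """Ranking loss plus the FLOPS penalty scaled by a strength hyper-parameter."""
    return ranking_loss + strength * flops_regularizer(batch_weights)

# Toy batch: two representations over a three-entry vocabulary.
batch = [
    [1.0, 0.0, 0.5],
    [1.0, 0.0, 0.0],
]
penalty = flops_regularizer(batch)  # 1.0**2 + 0.0**2 + 0.25**2 = 1.0625
```

Raising `strength` penalizes terms that are active across many inputs, which pushes the model toward fewer non-zero weights. This is one plausible mechanism behind the reported drop in wacky-token prevalence under stricter regularization.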

pith-pipeline@v0.9.0 · 5770 in / 1429 out tokens · 63271 ms · 2026-05-20T02:23:32.763434+00:00 · methodology



Understanding Wacky Weights: A Dissection of SPLADE’s Learned Term Importance SIGIR ’26, July 20–24, 202...