Understanding Wacky Weights: A Dissection of SPLADE's Learned Term Importance

Carsten Eickhoff; Gregory Polyakov; Harrisen Scells

arxiv: 2605.19628 · v1 · pith:HFTB57IUnew · submitted 2026-05-19 · 💻 cs.IR

Pith reviewed 2026-05-20 02:23 UTC · model grok-4.3

classification 💻 cs.IR

keywords: SPLADE, wacky weights, learned sparse retrieval, term expansion, interpretability, in-domain effectiveness, out-of-domain generalization, sparsity regularization

The pith

SPLADE models use wacky weights primarily to boost retrieval inside the training domain rather than to generalize to new domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors reproduce SPLADE-v2 and test it under many loss functions, datasets, and backbone transformers to study wacky weights, which are expansion terms that appear unrelated to the query. They introduce a formal definition of wackiness based on lexical utility of those terms and a new way to measure their prevalence across different vocabularies and sparsity settings. Experiments show that bigger vocabularies produce more wacky tokens while stricter sparsity regularizers produce fewer. The central finding is that these weights improve effectiveness mainly on data from the same domain used in training and add little to performance on new domains.

Core claim

Reproducing SPLADE-v2 across varied training conditions shows that wacky weights, defined formally by the lexical utility of expansion terms, appear more often with larger vocabularies and less often under stronger sparsity constraints. These weights contribute to higher retrieval effectiveness inside the training domain and provide limited help for generalization to unseen domains.

What carries the argument

Formal definition of wackiness based on lexical utility of expansion terms, together with a normalized prevalence measure that allows comparison across models with different vocabularies and sparsity levels.
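
As a rough illustration of how such a definition could be computed, the sketch below implements the formula quoted in the Lean-theorems section of this page, WackinessScore(t) = 1 − (1/|X_t|) Σ_{x∈X_t} S(t, x). It assumes, hypothetically, that X_t is the set of inputs whose expansion contains term t and that S(t, x) is a lexical-utility score in [0, 1]. This page defines neither, so `toy_utility` and the example data are placeholders, not the authors' implementation.

```python
from typing import Callable, Iterable


def wackiness_score(
    term: str,
    inputs: Iterable[str],
    utility: Callable[[str, str], float],
) -> float:
    """Return 1 minus the mean utility S(t, x) over the inputs X_t."""
    xs = list(inputs)
    if not xs:
        raise ValueError("X_t must contain at least one input")
    return 1.0 - sum(utility(term, x) for x in xs) / len(xs)


def toy_utility(term: str, text: str) -> float:
    # Hypothetical stand-in for S(t, x): 1.0 if the term occurs
    # verbatim in the input text, 0.0 otherwise.
    return 1.0 if term in text.lower().split() else 0.0


# Assumed X_t: inputs for which the model produced `term` as an expansion.
X_t = [
    "what is a neural ranker",
    "sparse retrieval with splade",
    "neural sparse models",
]
score = wackiness_score("neural", X_t, toy_utility)
print(f"wackiness('neural') = {score:.3f}")
```

Under this toy utility, a term that never appears lexically in any input receives the maximum score of 1.0, which matches the intuition that wacky terms are those with little lexical grounding.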

If this is right

Larger vocabularies are associated with higher prevalence of wacky tokens.
Stricter sparsity regularizers reduce the prevalence of wacky weights.
Wacky weights support in-domain retrieval effectiveness more than out-of-domain generalization.
Choice of loss function, training dataset, and backbone transformer affects how many wacky weights appear.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Interpretability advantages claimed for learned sparse retrievers may shrink when models are applied outside their original training domain.
Tuning vocabulary size and regularization strength offers a direct way to control the number of wacky weights.
The same pattern of domain-specific wacky terms may occur in other learned sparse retrieval models that expand queries.
Accepting some drop in peak in-domain performance may be necessary if the goal is to reduce wacky weights for better generalization.

Load-bearing premise

Lexical utility of an expansion term accurately identifies terms that are semantically unrelated to the query without many false positives or negatives.

What would settle it

Finding that wacky weights improve out-of-domain performance at least as much as in-domain performance, or that vocabulary size shows no relation to wacky-token prevalence, would contradict the main results.

Figures

Figures reproduced from arXiv: 2605.19628 by Carsten Eickhoff, Gregory Polyakov, Harrisen Scells.

**Figure 2.** Impact of removing top-𝑁 wacky tokens vs. random tokens on MS MARCO. The blue line corresponds to effectiveness after removing the top-𝑁 wacky tokens. The orange region corresponds to the average ± two standard deviations of effectiveness after the removal of random expansion tokens. The red and green lines correspond to the cases where no expansion tokens are used for retrieval and where all of the SPLADE…

**Figure 3.** Impact of removing top-𝑁 wacky tokens vs. random tokens on BEIR. The blue lines correspond to effectiveness after removing the top-𝑁 wacky tokens. The orange regions correspond to the average ± two standard deviations of effectiveness after removing random expansion tokens. The red and green lines correspond to the cases where no expansion tokens are used for retrieval and where all of the SPLADE expansion…

**Figure 4.** Normalized Wackiness Curves for different ver…
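
The ablation behind Figures 2 and 3 can be pictured with a minimal sketch: drop the N expansion tokens with the highest wackiness from a sparse representation, and compare against dropping N randomly chosen expansion tokens. All names here (`remove_top_n_wacky`, `remove_random`, the toy weights and scores) are hypothetical, and the retrieval and evaluation steps are omitted. This is an illustration under those assumptions, not the authors' code.

```python
import random


def expansion_tokens(rep, input_tokens):
    """Tokens the model added, i.e. those not present in the input text."""
    return [t for t in rep if t not in input_tokens]


def remove_top_n_wacky(rep, input_tokens, wackiness, n):
    """Drop the n expansion tokens with the highest wackiness score."""
    exp = expansion_tokens(rep, input_tokens)
    ranked = sorted(exp, key=lambda t: wackiness.get(t, 0.0), reverse=True)
    drop = set(ranked[:n])
    return {t: w for t, w in rep.items() if t not in drop}


def remove_random(rep, input_tokens, n, rng):
    """Baseline: drop n expansion tokens chosen uniformly at random."""
    exp = expansion_tokens(rep, input_tokens)
    drop = set(rng.sample(exp, min(n, len(exp))))
    return {t: w for t, w in rep.items() if t not in drop}


# Toy sparse representation: token -> learned weight (made-up values).
rep = {"splade": 2.1, "weights": 1.4, "sparse": 0.9, "##ing": 0.3, "river": 0.2}
input_tokens = {"splade", "weights"}
# Made-up wackiness scores for the expansion tokens only.
wackiness = {"sparse": 0.1, "##ing": 0.8, "river": 0.95}

pruned = remove_top_n_wacky(rep, input_tokens, wackiness, n=2)
baseline = remove_random(rep, input_tokens, n=2, rng=random.Random(0))
print(sorted(pruned), sorted(baseline))
```

Repeating `remove_random` with different seeds and evaluating each pruned representation would give the mean and spread that the orange regions in the figures summarize.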

Abstract

Learned sparse retrieval models such as SPLADE combine the effectiveness of neural architectures with the efficiency of inverted indices. As these models assign weights to terms from a fixed vocabulary, interpretability is often touted as a major benefit of these models. However, the emergence of wacky weights, i.e., expansion terms that appear semantically unrelated to the input, limits interpretability. While prior research has anecdotally observed this phenomenon, there is a lack of systematic understanding regarding their origins, prevalence, and contribution to retrieval effectiveness. In this paper, we reproduce SPLADE-v2 to systematically investigate wacky weights across the SPLADE family of models. We present a comprehensive dissection of wacky weights, providing a formal definition of wackiness based on the lexical utility of expansion terms. Furthermore, we introduce a novel measure to compare the prevalence of these tokens across models with varying vocabularies and sparsity levels. Beyond reproducing the original SPLADE-v2, we train it with various loss functions, datasets, and backbone transformers to isolate the factors contributing to wackiness. Our results show that larger vocabularies are associated with a higher prevalence of wacky tokens, while stricter sparsity regularizers are associated with lower prevalence. Finally, we find that wacky weights are used primarily for in-domain effectiveness rather than out-of-domain generalization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript reproduces SPLADE-v2 and systematically investigates wacky weights (expansion terms semantically unrelated to the query) in learned sparse retrieval. It introduces a formal definition of wackiness based on lexical utility of expansion terms, a novel prevalence measure across varying vocabularies and sparsity levels, and experiments training SPLADE variants with different loss functions, datasets, and backbone transformers. Key results indicate that larger vocabularies correlate with higher wacky token prevalence, stricter sparsity regularizers with lower prevalence, and that wacky weights contribute primarily to in-domain effectiveness rather than out-of-domain generalization.

Significance. If the central claims hold, the work supplies a needed systematic dissection of an interpretability limitation in SPLADE-style models. The reproduction plus controlled ablations over losses, data, and backbones constitute a strength, as does the introduction of a prevalence measure that normalizes across vocabulary sizes. The ID/OOD distinction, if robust, could inform training regimes that reduce reliance on domain-specific lexical artifacts while preserving effectiveness.

major comments (2)

[§3.2] (definition of wackiness): The formal definition ties wackiness directly to lexical utility of expansion terms. If this utility is measured on the same in-domain collections used for the effectiveness ablations (as implied by the use of standard MS MARCO-style splits), the subsequent claim that wacky weights drive in-domain gains more than OOD gains risks circularity; terms are labeled wacky precisely because they supply useful lexical signals in that domain. Please clarify the data used for utility computation and provide an explicit argument or control experiment showing the ID/OOD differential is not an artifact of the labeling procedure.
[§5] (experimental results on ID vs OOD): The central claim that wacky weights are used primarily for in-domain effectiveness rests on comparisons of effectiveness deltas with and without wacky terms. No statistical significance tests or confidence intervals are reported for these deltas, and the construction of the out-of-domain splits is not detailed. Both omissions are load-bearing for the generalization conclusion.

minor comments (3)

[§3.3] The exact formula or pseudocode for the novel prevalence measure should be given as an equation rather than described only in prose.
[Tables/Figures] Table 2 and Figure 4: axis labels and legends use inconsistent terminology ('wacky tokens' vs 'wacky weights'); standardize throughout.
[§3.2] The paper should report the precise lexical-utility threshold or cutoff used to label a term as wacky, and whether it is fixed or tuned per model.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address the major concerns regarding the definition of wackiness and the experimental analysis of ID vs OOD effectiveness below. We will make revisions to clarify and strengthen these aspects.

Referee: [§3.2] (definition of wackiness): The formal definition ties wackiness directly to lexical utility of expansion terms. If this utility is measured on the same in-domain collections used for the effectiveness ablations (as implied by the use of standard MS MARCO-style splits), the subsequent claim that wacky weights drive in-domain gains more than OOD gains risks circularity; terms are labeled wacky precisely because they supply useful lexical signals in that domain. Please clarify the data used for utility computation and provide an explicit argument or control experiment showing the ID/OOD differential is not an artifact of the labeling procedure.

Authors: We appreciate this observation on potential circularity. In the manuscript, the lexical utility for defining wackiness is computed using the MS MARCO training set, which is the in-domain data. To address the concern, we will revise §3.2 to explicitly state the data used and add a control where we compute utility on a separate out-of-domain dataset (e.g., using TREC DL or other collections). We will also include an argument that the ablation experiments measure the marginal contribution to retrieval metrics separately from the labeling, and show that the differential effect holds even when controlling for the definition. This will demonstrate that the ID/OOD distinction is not merely an artifact.

Revision: yes

Referee: [§5] (experimental results on ID vs OOD): The central claim that wacky weights are used primarily for in-domain effectiveness rests on comparisons of effectiveness deltas with and without wacky terms. No statistical significance tests or confidence intervals are reported for these deltas, and the construction of the out-of-domain splits is not detailed. Both omissions are load-bearing for the generalization conclusion.

Authors: We agree that reporting statistical significance and detailing the OOD splits are necessary for robust conclusions. We will update §5 to include statistical tests (such as paired t-tests with p-values) and bootstrap confidence intervals for the effectiveness deltas. Additionally, we will expand the description of the out-of-domain evaluation splits, specifying the datasets (e.g., which OOD collections like BEIR subsets or others were used) and how the splits were constructed to ensure they are truly out-of-domain relative to the training data.

Revision: yes
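
The kind of analysis promised in this response could look roughly like the following: a percentile bootstrap confidence interval over per-query effectiveness deltas (with versus without wacky terms). The per-query scores are made-up placeholders, the function name `bootstrap_ci` is hypothetical, and only the Python standard library is used. It is a sketch of one common procedure, not the analysis the authors will actually run.

```python
import random
import statistics


def bootstrap_ci(deltas, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired per-query deltas."""
    rng = random.Random(seed)
    n = len(deltas)
    # Resample queries with replacement and record the mean delta each time.
    means = sorted(
        statistics.fmean(rng.choices(deltas, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Placeholder per-query scores (e.g. nDCG@10) for the same eight queries.
with_wacky = [0.42, 0.38, 0.51, 0.47, 0.33, 0.45, 0.40, 0.36]
without_wacky = [0.40, 0.35, 0.50, 0.41, 0.33, 0.42, 0.37, 0.36]

deltas = [a - b for a, b in zip(with_wacky, without_wacky)]
mean_delta = statistics.fmean(deltas)
ci_low, ci_high = bootstrap_ci(deltas)
print(f"mean delta = {mean_delta:.4f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

Running the same procedure separately on in-domain and out-of-domain query sets would show whether the two intervals overlap, which is the comparison the referee's comment asks for.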

Circularity Check

0 steps flagged

Empirical reproduction and external benchmarks; definition of wackiness does not force the ID/OOD claim by construction

full rationale

The paper is an empirical dissection that reproduces SPLADE-v2, trains variants with different losses/datasets/backbones, and measures prevalence and contribution on standard collections (MS MARCO and others). The formal definition of wackiness uses lexical utility of expansion terms as a proxy, but this is applied to analyze observed behavior rather than deriving the central ID-vs-OOD result tautologically from the labeling procedure itself. No equations reduce a reported prediction to a fitted input by construction, and no load-bearing self-citation chain is present. Minor risk exists that utility computed on the same collections could bias labeling, but this remains an empirical measurement choice rather than a mathematical equivalence, keeping circularity low.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claims rest on the validity of the lexical-utility definition of wackiness and on the assumption that the chosen training variations (loss, dataset, backbone) isolate the factors that control wacky-term prevalence; no new physical or mathematical entities are postulated.

free parameters (2)

vocabulary size
Larger vocabularies are reported to increase wacky-token prevalence; this hyper-parameter is varied across experiments.
sparsity regularizer strength
Stricter regularizers are reported to decrease wacky-token prevalence; this hyper-parameter is varied across experiments.
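
To make "regularizer strength" concrete: SPLADE-v2 as originally published controls sparsity with the FLOPS regularizer (Paria et al., 2020), which sums, over vocabulary entries, the squared mean absolute activation across a batch. The sketch below shows that penalty in plain Python together with a hypothetical strength parameter `lam`. Whether every variant studied in the paper uses this exact regularizer is not stated on this page, and the helper names and numbers are illustrative only.

```python
def flops_penalty(batch_weights):
    """FLOPS regularizer: sum over vocab of (mean |weight| over the batch)^2."""
    n = len(batch_weights)
    vocab_size = len(batch_weights[0])
    return sum(
        (sum(abs(row[j]) for row in batch_weights) / n) ** 2
        for j in range(vocab_size)
    )


def regularized_loss(ranking_loss, batch_weights, lam):
    # `lam` is the (hypothetical) regularizer strength: larger values push
    # more term weights toward zero, i.e. sparser representations.
    return ranking_loss + lam * flops_penalty(batch_weights)


# Toy batch: two representations over a three-term vocabulary.
batch = [
    [1.0, 0.0, 0.5],
    [3.0, 0.0, 0.5],
]
penalty = flops_penalty(batch)
total = regularized_loss(ranking_loss=1.0, batch_weights=batch, lam=0.5)
print(penalty, total)
```

Because the penalty squares the batch-mean activation of each term, terms that fire across many inputs are penalized most, which is one plausible route by which a stricter setting could suppress low-utility expansion terms.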

axioms (1)

domain assumption: Lexical utility of an expansion term is a valid and sufficient criterion for labeling the term as wacky.
Invoked when the paper introduces its formal definition of wackiness.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

formal definition of wackiness based on the lexical utility of expansion terms... $\mathrm{WackinessScore}(t) = 1 - \frac{1}{|X_t|} \sum_{x \in X_t} S(t, x)$

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

Normalized Wackiness Curve... W-AUC score... larger vocabularies... stricter sparsity regularizers

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Avishek Anand, Lijun Lyu, Maximilian Idahl, Yumeng Wang, Jonas Wallat, and Zijian Zhang. 2022. Explainable Information Retrieval: A Survey.CoRR abs/2211.02405 (2022). arXiv:2211.02405 doi:10.48550/ARXIV.2211.02405

work page doi:10.48550/arxiv.2211.02405 2022

[2] [2]

Yang Bai, Xiaoguang Li, Gang Wang, Chaoliang Zhang, Lifeng Shang, Jun Xu, Zhaowei Wang, Fangshan Wang, and Qun Liu. 2020. SparTerm: Learning Term- based Sparse Representation for Fast Text Retrieval.CoRRabs/2010.00768 (2020). arXiv:2010.00768 https://arxiv.org/abs/2010.00768

work page arXiv 2020

[3] [3]

Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Y

Andrei Z. Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Y. Zien. 2003. Efficient query evaluation using a two-level retrieval process. In Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management, New Orleans, Louisiana, USA, November 2-8, 2003. ACM, 426–434. doi:10.1145/956863.956944

work page doi:10.1145/956863.956944 2003

[4] [4]

Shane Culpepper, Jimmy Lin, Joel M

Matt Crane, J. Shane Culpepper, Jimmy Lin, Joel M. Mackenzie, and Andrew Trotman. 2017. A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation. InProceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, Cambridge, United Kingdom, February 6-10, 2017. ACM, 201–210. doi:10.1145/3018661.3018726

work page doi:10.1145/3018661.3018726 2017

[5] [5]

Zhuyun Dai and Jamie Callan. 2019. Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval.CoRRabs/1910.10687 (2019). arXiv:1910.10687 http://arxiv.org/abs/1910.10687

work page arXiv 2019

[6] [6]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Associa- tion for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, ...

work page doi:10.18653/v1/n19-1423 2019

[7] [7]

Shuai Ding and Torsten Suel. 2011. Faster top-k document retrieval using block- max indexes. InProceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, July 25-29, 2011. ACM, 993–1002. doi:10.1145/2009916.2010048

work page doi:10.1145/2009916.2010048 2011

[8] [8]

Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant

work page

[9] [9]

doi:10.48550/ARXIV.2109.10086

SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval. CoRRabs/2109.10086 (2021). arXiv:2109.10086 https://arxiv.org/abs/2109.10086

work page arXiv 2021

[10] [10]

Furnas, Thomas K

George W. Furnas, Thomas K. Landauer, Louis M. Gomez, and Susan T. Dumais

work page

[11] [11]

ACM30, 11 (1987), 964–971

The Vocabulary Problem in Human-System Communication.Commun. ACM30, 11 (1987), 964–971. doi:10.1145/32206.32212

work page doi:10.1145/32206.32212 1987

[12] [12]

Luyu Gao, Zhuyun Dai, and Jamie Callan. 2021. COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11,

work page 2021

[13] [14]

Sebastian Hofstätter, Sophia Althammer, Michael Schröder, Mete Sertkan, and Allan Hanbury. 2020. Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation.CoRRabs/2010.02666 (2020). arXiv:2010.02666 https://arxiv.org/abs/2010.02666

work page arXiv 2020

[14] [15]

Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, and Allan Hanbury. 2021. Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling. InSIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021. ACM, 113–122. doi:10.1145/3...

work page doi:10.1145/3404835.3462891 2021

[15] [16]

Nasreen Abdul Jaleel, James Allan, W. Bruce Croft, Fernando Diaz, Leah S. Larkey, Xiaoyan Li, Mark D. Smucker, and Courtney Wade. 2004. UMass at TREC 2004: Novelty and HARD. In Proceedings of the Thirteenth Text REtrieval Conference, TREC 2004, Gaithersburg, Maryland, USA, November 16-19, 2004 (NIST Special Publication). National Institute of Standards and...

[16] [17]

Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, Jheng-Hong Yang, and Jimmy Lin. 2024. Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (Washington DC, USA) (SIGIR ’24). Association for Computing Ma... doi:10.1145/3626772.3657862

[17] [18]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020. Association for Computational Linguistics, 6769–6781. doi:10.18653/V1/2020.EMNLP-MAIN.550

[19] [20]

Carlos Lassance, Hervé Déjean, Thibault Formal, and Stéphane Clinchant. 2024. SPLADE-v3: New baselines for SPLADE. CoRR abs/2403.06789 (2024). arXiv:2403.06789 doi:10.48550/ARXIV.2403.06789

[21] [22]

Jimmy Lin and Xueguang Ma. 2021. A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques. CoRR abs/2106.14807 (2021). arXiv:2106.14807 https://arxiv.org/abs/2106.14807

[22] [23]

Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, and Ophir Frieder. 2020. Expansion via prediction of importance with contextualization. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval. 1573–1576.

[23] [24]

Joel Mackenzie, Matthias Petri, and Alistair Moffat. 2022. Anytime Ranking on Document-Ordered Indexes. ACM Trans. Inf. Syst. 40, 1 (2022), 13:1–13:32. doi:10.1145/3467890

[24] [25]

Joel Mackenzie, Andrew Trotman, and Jimmy Lin. 2021. Wacky Weights in Learned Sparse Representations and the Revenge of Score-at-a-Time Query Evaluation. CoRR abs/2110.11540 (2021). arXiv:2110.11540 https://arxiv.org/abs/2110.11540

[25] [26]

Joel Mackenzie, Shengyao Zhuang, and Guido Zuccon. 2023. Exploring the Representation Power of SPLADE Models. In Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR 2023, Taipei, Taiwan, 23 July 2023. ACM, 143–147. doi:10.1145/3578337.3605129

[26] [27]

Antonio Mallia, Omar Khattab, Torsten Suel, and Nicola Tonellotto. 2021. Learning Passage Impacts for Inverted Indexes. In SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021. ACM, 1723–1727. doi:10.1145/3404835.3463030

[27] [28]

Antonio Mallia, Giuseppe Ottaviano, Elia Porciani, Nicola Tonellotto, and Rossano Venturini. 2017. Faster BlockMax WAND with Variable-sized Blocks. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017. ACM, 625–634. doi:10.1145/3077136.3080780

[28] [29]

Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. In Proceedings of the Workshop on Cognitive Computation: Integrating neural and symbolic approaches 2016 co-located with the 30th Annual Conference on Neural Information Processing Systems...

[29] [30]

Rodrigo Nogueira, Zhiying Jiang, and Jimmy Lin. 2020. Document Ranking with a Pretrained Sequence-to-Sequence Model. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL). Association for Computational Linguistics, 708–718. doi:10.18653/V1/2020.FINDINGS-EMNLP.63

[30] [31]

Rodrigo Nogueira, Jimmy Lin, and AI Epistemic. 2019. From doc2query to docTTTTTquery. (2019)


[31] [32]

Rodrigo Nogueira, Wei Yang, Jimmy Lin, and Kyunghyun Cho. 2019. Document Expansion by Query Prediction. CoRR abs/1904.08375 (2019). arXiv:1904.08375 http://arxiv.org/abs/1904.08375

[32] [33]

Biswajit Paria, Chih-Kuan Yeh, Ian En-Hsu Yen, Ning Xu, Pradeep Ravikumar, and Barnabás Póczos. 2020. Minimizing FLOPs to Learn Efficient Sparse Representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=SygpC6Ntvr

[33] [34]

Aldo Porco, Dhruv Mehra, Igor Malioutov, Karthik Radhakrishnan, Moniba Keymanesh, Daniel Preotiuc-Pietro, Sean MacAvaney, and Pengxiang Cheng. 2025. An Alternative to FLOPS Regularization to Effectively Productionize SPLADE-Doc. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025... ACM, 2789–2793. doi:10.1145/3726302.3730163

[35] [36]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 21 (2020), 140:1–140:67. https://jmlr.org/papers/v21/20-074.html

[36] [37]

Stephen E. Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3, 4 (2009), 333–389. doi:10.1561/1500000019

[37] [38]

Andrew Trotman and Matt Crane. 2019. Micro- and macro-optimizations of SaaT search. Softw. Pract. Exp. 49, 5 (2019), 942–950. doi:10.1002/SPE.2683

[38] [39]

Aäron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation Learning with Contrastive Predictive Coding. CoRR abs/1807.03748 (2018). arXiv:1807.03748 http://arxiv.org/abs/1807.03748

[39] [40]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 5998–6008.

[40] [41]

Liang Wang, Nan Yang, and Furu Wei. 2023. Query2doc: Query Expansion with Large Language Models. (2023), 9414–9423. doi:10.18653/V1/2023.EMNLP-MAIN.585

[41] [42]

Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, and Arnold Overwijk. 2021. Approximate nearest neighbor negative contrastive learning for dense text retrieval. (2021). https://openreview.net/forum?id=zeFrfgyZln

Understanding Wacky Weights: A Dissection of SPLADE’s Learned Term Importance SIGIR ’26, July 20–24, 202...

[42] [43]

Honglei Zhuang, Zhen Qin, Rolf Jagerman, Kai Hui, Ji Ma, Jing Lu, Jianmo Ni, Xuanhui Wang, and Michael Bendersky. 2023. Rankt5: Fine-tuning t5 for text ranking with ranking losses. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2308–2313. doi:10.1145/3539618.3592047