SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant · 2021 · arXiv 2107.05720

10 Pith papers cite this work. Polarity classification is still being indexed.



Citation-role summary: background 2, method 1

Citation-polarity summary: background 2, use method 1

Representative citing papers

Continual Model Routing in Evolving Model Hubs

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

Formalizes continual model routing (CMR), releases CMRBench with over 2000 models, and presents CARvE, which outperforms retrieval, fine-tuning and adapter-merging baselines on model/family/domain accuracy.

On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability

cs.IR · 2026-04-17 · unverdicted · novelty 7.0

LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

cs.CL · 2024-02-05 · unverdicted · novelty 7.0

M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.

Test-Time Compute for Frozen Embedding Models through Agentic Program Search

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Agentic program search over a frozen encoder API yields retrieval programs that improve nDCG@10 on held-out tasks and unseen encoder families with no per-domain training.

Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces

cs.LG · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

KAHM yields a compute-efficient query encoder that outperforms matched learned adapters in reconstructing a frozen Mixedbread embedding space on an Austrian-law retrieval task while delivering an 8.53x CPU speedup.

A Human-Centric Framework for Data Attribution in Large Language Models

cs.CY · 2026-02-11 · unverdicted · novelty 6.0

Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.

Improving the Efficiency and Effectiveness of LLM Knowledge Distillation for Conversational Search

cs.IR · 2026-06-03 · unverdicted · novelty 5.0

Combining contrastive loss with KLD distillation and adding sparsity regularization improves effectiveness and reduces FLOPS by 2x in conversational search with minimal recall loss.

Fairness-Aware Retrieval Optimization for Retrieval-Augmented Generation

cs.DB · 2026-05-15 · unverdicted · novelty 5.0

Introduces FARO, a scalable quadratic optimization approach for fairness-aware top-k retrieval in RAG that mitigates generation bias via controlled reranking and position-aware propagation modeling.

uva-irlab-conv at SemEval-2026 Task 8: Multi-Turn RAG with Learned Sparse Retrieval and Listwise Reranking

cs.CL · 2026-06-10 · unverdicted · novelty 2.0

A multi-turn RAG system combines learned sparse retrieval with LLM-conditioned rewriting, listwise reranking, and generation to handle conversational QA and unanswerable queries across four domains.

From Tokens to Concepts: Leveraging SAE for SPLADE

cs.IR · 2026-04-23
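The conversational-search distillation summary above names three training ingredients: a contrastive loss, KL-divergence (KLD) distillation, and a sparsity regularizer. The sketch below shows one way these could be combined for SPLADE-style sparse term weights. It assumes an InfoNCE-style contrastive loss over in-batch negatives, KL(teacher || student) over each query's candidate scores, and the FLOPS regularizer that SPLADE uses for sparsity. The function names and the weights `lam_kl`, `lam_q`, `lam_d` are hypothetical illustrations, not the cited authors' code or settings.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def contrastive_loss(q, d):
    # q, d: (batch, vocab) non-negative term weights. d[i] is the positive
    # for q[i]; every other d[j] in the batch serves as an in-batch negative.
    scores = q @ d.T                      # (batch, batch) dot-product scores
    logp = log_softmax(scores, axis=1)
    return -np.mean(np.diag(logp))

def kl_distillation_loss(student_scores, teacher_scores):
    # KL(teacher || student) over each query's candidate list, averaged.
    log_p_t = log_softmax(teacher_scores, axis=1)
    log_p_s = log_softmax(student_scores, axis=1)
    return np.mean(np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=1))

def flops_regularizer(w):
    # FLOPS regularizer: sum over vocabulary terms of the squared mean
    # absolute weight across the batch. Terms that are active in many
    # texts are penalized most, which pushes representations toward sparsity.
    return np.sum(np.mean(np.abs(w), axis=0) ** 2)

def total_loss(q, d, teacher_scores, lam_kl=1.0, lam_q=1e-3, lam_d=1e-3):
    # Hypothetical weighting; in practice these coefficients are tuned.
    student_scores = q @ d.T
    return (contrastive_loss(q, d)
            + lam_kl * kl_distillation_loss(student_scores, teacher_scores)
            + lam_q * flops_regularizer(q)
            + lam_d * flops_regularizer(d))

# Usage with random placeholder data (not real model outputs).
rng = np.random.default_rng(0)
q = np.maximum(rng.normal(size=(4, 50)), 0)  # ReLU keeps weights non-negative
d = np.maximum(rng.normal(size=(4, 50)), 0)
teacher = rng.normal(size=(4, 4))
print(total_loss(q, d, teacher))
```

Separate query and document coefficients are used because query-side sparsity mainly affects latency, while document-side sparsity mainly affects index size. Raising either one trades some effectiveness for fewer FLOPS.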

Citing papers explorer

Showing 10 of 10 citing papers.

Continual Model Routing in Evolving Model Hubs cs.AI · 2026-05-27 · unverdicted · none · ref 8
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability cs.IR · 2026-04-17 · unverdicted · none · ref 21
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation cs.CL · 2024-02-05 · unverdicted · none · ref 24
Test-Time Compute for Frozen Embedding Models through Agentic Program Search cs.LG · 2026-05-12 · unverdicted · none · ref 5 · 2 links
Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces cs.LG · 2026-05-01 · unverdicted · none · ref 9 · 2 links
A Human-Centric Framework for Data Attribution in Large Language Models cs.CY · 2026-02-11 · unverdicted · none · ref 72
Improving the Efficiency and Effectiveness of LLM Knowledge Distillation for Conversational Search cs.IR · 2026-06-03 · unverdicted · none · ref 8
Fairness-Aware Retrieval Optimization for Retrieval-Augmented Generation cs.DB · 2026-05-15 · unverdicted · none · ref 32
uva-irlab-conv at SemEval-2026 Task 8: Multi-Turn RAG with Learned Sparse Retrieval and Listwise Reranking cs.CL · 2026-06-10 · unverdicted · none · ref 36
From Tokens to Concepts: Leveraging SAE for SPLADE cs.IR · 2026-04-23 · unreviewed · ref 19
