Improving efficient neural ranking models with cross-architecture knowledge distillation

Sebastian Hofstätter, Sophia Althammer, Michael Schröder, Mete Sertkan, Allan Hanbury · 2020 · arXiv 2010.02666

8 Pith papers cite this work.

representative citing papers

Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval

cs.IR · 2026-04-26 · accept · novelty 7.0

Prism-Reranker models output relevance, contribution statements, and evidence passages to support agentic retrieval beyond scalar scoring.

A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation

cs.IR · 2026-04-15 · unverdicted · novelty 7.0

A single model unifies retrieval and context compression for on-device RAG via shared representations, matching traditional RAG performance at 1/10 context size with no extra storage.

Understanding Wacky Weights: A Dissection of SPLADE's Learned Term Importance

cs.IR · 2026-05-19 · conditional · novelty 6.0

SPLADE models produce wacky expansion terms whose prevalence rises with larger vocabularies and falls with stricter sparsity; these terms primarily aid in-domain retrieval rather than out-of-domain generalization.

LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations

cs.IR · 2025-09-16 · conditional · novelty 6.0

LEAF distills teacher-aligned student embedding models that achieve new SOTA results on BEIR and MTEB for their size class while requiring only modest data and compute.

Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval

cs.IR · 2026-04-06 · unverdicted · novelty 5.0

Stratified sampling preserving teacher score distribution outperforms hard-negative mining as a robust baseline for knowledge distillation in dense retrieval.

jina-embeddings-v5-text: Task-Targeted Embedding Distillation

cs.CL · 2026-02-17 · unverdicted · novelty 5.0

A distillation-plus-task-contrastive training regimen yields compact embedding models that match or exceed state-of-the-art performance for their size while supporting 32k-token contexts and quantization.

The Role of Vocabularies in Learning Sparse Representations for Ranking

cs.IR · 2025-09-20 · unverdicted · novelty 5.0

Larger (100K) vocabularies in SPLADE models, especially when initialized with ESPLADE pretraining, improve retrieval effectiveness after pruning compared with 32K-vocabulary baselines while keeping similar efficiency.

Unified Supervision for Walmart's Sponsored Search Retrieval via Joint Semantic Relevance and Behavioral Engagement Modeling

cs.IR · 2026-04-09 · unverdicted · novelty 4.0

A hybrid supervision method for bi-encoder retrievers combines graded relevance from teacher models, production retrieval priors, and selective engagement to improve relevance and NDCG over Walmart's current sponsored search system.
