ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia · 2022 · Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies · DOI 10.18653/v1/2022.naacl-main.272

Canonical reference. 100% of citing Pith papers cite this work as background.

24 Pith papers citing it

214 external citations · Crossref

Background 100% of classified citations

citation-role summary

background 8

citation-polarity summary

background 8

representative citing papers

Closing the Calibration Gap in Semantic Caching

cs.IR · 2026-06-18 · unverdicted · novelty 7.0

Introduces P-CHR AUC and CRR metrics to demonstrate that semantic caching model selection is limited by calibration quality rather than ranking performance.

Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation

cs.CL · 2026-06-17 · unverdicted · novelty 7.0

DICE aggregates independently encoded document chunks into a single vector to reduce evidence dilution in long-document dense retrieval, reporting gains on LongEmbed especially beyond 4k tokens.

IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.

NumColBERT: Non-Intrusive Numeracy Injection for Late-Interaction Retrieval Models

cs.IR · 2026-05-11 · unverdicted · novelty 7.0

NumColBERT improves ColBERT performance on numerical query conditions non-intrusively via gating and contrastive learning, outperforming fine-tuning while matching or exceeding separate text-number scoring methods.

When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models

cs.IR · 2026-04-29 · unverdicted · novelty 7.0

ReaLM-Retrieve uses step-level uncertainty to trigger retrievals during reasoning, achieving 10.1% better F1 scores and 47% fewer calls on multi-hop QA benchmarks.

HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads

cs.IR · 2026-04-19 · unverdicted · novelty 7.0

HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.

On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability

cs.IR · 2026-04-17 · unverdicted · novelty 7.0

LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,

Extending Context Window of Large Language Models via Positional Interpolation

cs.CL · 2023-06-27 · conditional · novelty 7.0

Position Interpolation linearly down-scales position indices to extend RoPE context windows to 32768 tokens with 1000-step fine-tuning, delivering strong long-context results on LLaMA 7B-65B while preserving short-context quality.

Test-Time Compute for Frozen Embedding Models through Agentic Program Search

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Agentic program search over a frozen encoder API yields retrieval programs that improve nDCG@10 on held-out tasks and unseen encoder families with no per-domain training.

The Interference Gap: Comparing Retrieval Bounds in Human Memory and RAG Systems

cs.IR · 2026-05-09 · unverdicted · novelty 6.0

Unified SDT model finds humans less sensitive to interference (α/σ=0.41) than dense passage retrieval (0.67), with HippoRAG intermediate (0.44), backed by N=112 experiments and simulations favoring logarithmic over power-law decline.

DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

cs.IR · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

DiffRetriever uses parallel masked tokens in diffusion LMs for retrieval representations, outperforming DiffEmbed and other baselines on aggregate effectiveness while supporting efficient multi-representation matching.

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.

A Replicability Study of XTR

cs.IR · 2026-05-01 · accept · novelty 6.0

XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.

Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing

cs.IR · 2026-04-30 · unverdicted · novelty 6.0

TACHIOM speeds up multivector retrieval by up to 247x in clustering and 9.8x in retrieval on MS-MARCOv1 and LoTTE benchmarks using token-distribution-aware centroid allocation and a graph-plus-PQ index, with comparable effectiveness to prior systems.

NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence Chains

cs.IR · 2026-04-30 · unverdicted · novelty 6.0

NeocorRAG uses Evidence Chains to achieve SOTA retrieval quality in RAG on HotpotQA, 2WikiMultiHopQA, MuSiQue, and NQ for 3B and 70B models while using under 20% of the tokens of comparable methods.

ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation

cs.IR · 2026-04-14 · unverdicted · novelty 6.0

ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.

A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models

cs.IR · 2026-03-10 · unverdicted · novelty 6.0

A Voronoi cell estimation framework in embedding space enables principled token pruning for late-interaction models, reducing index size while retaining retrieval quality.

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

cs.CL · 2024-01-31 · unverdicted · novelty 6.0

RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.

Improving Long-Context Retrieval with Multi-Prefix Embedding

cs.IR · 2026-06-22 · unverdicted · novelty 5.0

Multi-Prefix Embedding extracts per-chunk embeddings from a single forward pass over EOS-separated document chunks and matches via MaxSim while training only on document-level labels.

Spike Hijacking in Late-Interaction Retrieval

cs.IR · 2026-04-06 · unverdicted · novelty 5.0

Hard maximum similarity pooling in late-interaction models induces higher patch-level gradient concentration and greater length sensitivity than top-k or softmax alternatives.

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

cs.CL · 2024-12-18 · unverdicted · novelty 5.0

ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.

Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval

cs.IR · 2026-04-29 · conditional · novelty 3.0

Reproducibility study confirms Hypencoder's non-linear query-specific scoring improves retrieval over bi-encoders on standard benchmarks but standard methods remain faster and hard-task results are mixed due to implementation issues.

Overview of the EReL@MIR 2025 Multimodal Document Retrieval Challenge (Track 1)

cs.CV · 2026-06-02 · unverdicted · novelty 1.0

The EReL@MIR 2025 Track 1 challenge evaluates single systems on two multimodal retrieval tasks and finds that Qwen2-VL decoder-based embedders dominate, with a training-free entry within 0.1 points of the fine-tuned winner.

Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces

cs.LG · 2026-05-01

citing papers explorer

Closing the Calibration Gap in Semantic Caching · cs.IR · 2026-06-18 · unverdicted · none · ref 29
Introduces P-CHR AUC and CRR metrics to demonstrate that semantic caching model selection is limited by calibration quality rather than ranking performance.
Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation · cs.CL · 2026-06-17 · unverdicted · none · ref 21
DICE aggregates independently encoded document chunks into a single vector to reduce evidence dilution in long-document dense retrieval, reporting gains on LongEmbed especially beyond 4k tokens.
IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions · cs.CL · 2026-05-21 · unverdicted · none · ref 47
IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.
NumColBERT: Non-Intrusive Numeracy Injection for Late-Interaction Retrieval Models · cs.IR · 2026-05-11 · unverdicted · none · ref 36
NumColBERT improves ColBERT performance on numerical query conditions non-intrusively via gating and contrastive learning, outperforming fine-tuning while matching or exceeding separate text-number scoring methods.
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models · cs.IR · 2026-04-29 · unverdicted · none · ref 26
ReaLM-Retrieve uses step-level uncertainty to trigger retrievals during reasoning, achieving 10.1% better F1 scores and 47% fewer calls on multi-hop QA benchmarks.
HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads · cs.IR · 2026-04-19 · unverdicted · none · ref 15
HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability · cs.IR · 2026-04-17 · unverdicted · none · ref 58
LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,
Test-Time Compute for Frozen Embedding Models through Agentic Program Search · cs.LG · 2026-05-12 · unverdicted · none · ref 25 · 2 links
Agentic program search over a frozen encoder API yields retrieval programs that improve nDCG@10 on held-out tasks and unseen encoder families with no per-domain training.
The Interference Gap: Comparing Retrieval Bounds in Human Memory and RAG Systems · cs.IR · 2026-05-09 · unverdicted · none · ref 20
Unified SDT model finds humans less sensitive to interference (α/σ=0.41) than dense passage retrieval (0.67), with HippoRAG intermediate (0.44), backed by N=112 experiments and simulations favoring logarithmic over power-law decline.
DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models · cs.IR · 2026-05-08 · unverdicted · none · ref 5 · 2 links
DiffRetriever uses parallel masked tokens in diffusion LMs for retrieval representations, outperforming DiffEmbed and other baselines on aggregate effectiveness while supporting efficient multi-representation matching.
Retrieval from Within: An Intrinsic Capability of Attention-Based Models · cs.LG · 2026-05-07 · unverdicted · none · ref 30 · 2 links
Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
A Replicability Study of XTR · cs.IR · 2026-05-01 · accept · none · ref 25
XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.
Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing · cs.IR · 2026-04-30 · unverdicted · none · ref 30
TACHIOM speeds up multivector retrieval by up to 247x in clustering and 9.8x in retrieval on MS-MARCOv1 and LoTTE benchmarks using token-distribution-aware centroid allocation and a graph-plus-PQ index, with comparable effectiveness to prior systems.
NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence Chains · cs.IR · 2026-04-30 · unverdicted · none · ref 38
NeocorRAG uses Evidence Chains to achieve SOTA retrieval quality in RAG on HotpotQA, 2WikiMultiHopQA, MuSiQue, and NQ for 3B and 70B models while using under 20% of the tokens of comparable methods.
ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation · cs.IR · 2026-04-14 · unverdicted · none · ref 90
ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.
A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models · cs.IR · 2026-03-10 · unverdicted · none · ref 28
A Voronoi cell estimation framework in embedding space enables principled token pruning for late-interaction models, reducing index size while retaining retrieval quality.
Improving Long-Context Retrieval with Multi-Prefix Embedding · cs.IR · 2026-06-22 · unverdicted · none · ref 18
Multi-Prefix Embedding extracts per-chunk embeddings from a single forward pass over EOS-separated document chunks and matches via MaxSim while training only on document-level labels.
Spike Hijacking in Late-Interaction Retrieval · cs.IR · 2026-04-06 · unverdicted · none · ref 4
Hard maximum similarity pooling in late-interaction models induces higher patch-level gradient concentration and greater length sensitivity than top-k or softmax alternatives.
Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval · cs.IR · 2026-04-29 · conditional · none · ref 48
Reproducibility study confirms Hypencoder's non-linear query-specific scoring improves retrieval over bi-encoders on standard benchmarks but standard methods remain faster and hard-task results are mixed due to implementation issues.
Overview of the EReL@MIR 2025 Multimodal Document Retrieval Challenge (Track 1) · cs.CV · 2026-06-02 · unverdicted · none · ref 21
The EReL@MIR 2025 Track 1 challenge evaluates single systems on two multimodal retrieval tasks and finds that Qwen2-VL decoder-based embedders dominate, with a training-free entry within 0.1 points of the fine-tuned winner.
Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces · cs.LG · 2026-05-01 · unreviewed · ref 41
