Canonical reference

ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia · 2022 · Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies · DOI 10.18653/v1/2022.naacl-main.272

Canonical reference. 100% of citing Pith papers cite this work as background.

19 Pith papers citing it

214 external citations · Crossref

Background 100% of classified citations


citation-role summary: background (8)

citation-polarity summary: background (8)

representative citing papers

IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.

NumColBERT: Non-Intrusive Numeracy Injection for Late-Interaction Retrieval Models

cs.IR · 2026-05-11 · unverdicted · novelty 7.0

NumColBERT improves ColBERT performance on numerical query conditions non-intrusively via gating and contrastive learning, outperforming fine-tuning while matching or exceeding separate text-number scoring methods.

When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models

cs.IR · 2026-04-29 · unverdicted · novelty 7.0

ReaLM-Retrieve uses step-level uncertainty to trigger retrievals during reasoning, achieving a 10.1% higher F1 score and 47% fewer retrieval calls on multi-hop QA benchmarks.

HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads

cs.IR · 2026-04-19 · unverdicted · novelty 7.0

HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.

On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability

cs.IR · 2026-04-17 · unverdicted · novelty 7.0

LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,

Extending Context Window of Large Language Models via Positional Interpolation

cs.CL · 2023-06-27 · conditional · novelty 7.0

Position Interpolation linearly down-scales position indices to extend RoPE context windows to 32768 tokens with 1000-step fine-tuning, delivering strong long-context results on LLaMA 7B-65B while preserving short-context quality.
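Position Interpolation, as summarized above, keeps a RoPE model inside the position range it was trained on by scaling each position index m by L/L', where L is the pretrained window and L' the extended one. The following is a minimal pure-Python sketch, assuming the standard RoPE frequency base of 10000 and, for the usage lines, a 2048-token pretrained window extended to 32768 tokens. The helper names `interpolated_position` and `apply_rope` are hypothetical and do not come from the paper's code.

```python
import math


def interpolated_position(m: int, trained_len: int, target_len: int) -> float:
    """Linearly down-scale position m so target_len positions fit in [0, trained_len)."""
    if target_len <= trained_len:
        return float(m)  # no extension requested: keep the original index
    return m * trained_len / target_len


def apply_rope(vec: list[float], position: float, base: float = 10000.0) -> list[float]:
    """Rotate consecutive (x, y) pairs of vec by angle position * theta_i (RoPE)."""
    dim = len(vec)
    if dim % 2:
        raise ValueError("RoPE needs an even embedding dimension")
    out = list(vec)
    for i in range(dim // 2):
        theta = position * base ** (-2 * i / dim)  # frequency theta_i = base^(-2i/d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out


# Hypothetical setting: extend a 2048-token window to 32768 tokens (16x interpolation).
pos = interpolated_position(32767, trained_len=2048, target_len=32768)
print(pos)  # 2047.9375, inside the trained range
q_rotated = apply_rope([1.0, 0.0, 0.5, 2.0], pos)
```

Every index is shrunk by the same factor, so relative distances stay proportional and no position ever leaves the range seen during pretraining. Because the RoPE rotation preserves vector norms, only the angles change.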

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.

Kernel Affine Hull Machines for Compute-Efficient Query-Side Semantic Encoding

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

Kernel Affine Hull Machines map lexical features to semantic embeddings via RKHS and least-mean-squares, outperforming adapters in reconstruction and retrieval metrics while reducing latency 8.5-fold on a legal benchmark.

A Replicability Study of XTR

cs.IR · 2026-05-01 · accept · novelty 6.0

XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.

Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing

cs.IR · 2026-04-30 · unverdicted · novelty 6.0

TACHIOM speeds up multivector retrieval by up to 247x in clustering and 9.8x in retrieval on MS-MARCOv1 and LoTTE benchmarks using token-distribution-aware centroid allocation and a graph-plus-PQ index, with comparable effectiveness to prior systems.

NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence Chains

cs.IR · 2026-04-30 · unverdicted · novelty 6.0

NeocorRAG uses Evidence Chains to achieve SOTA retrieval quality in RAG on HotpotQA, 2WikiMultiHopQA, MuSiQue, and NQ for 3B and 70B models while using under 20% of the tokens of comparable methods.

ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation

cs.IR · 2026-04-14 · unverdicted · novelty 6.0

ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.

A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models

cs.IR · 2026-03-10 · unverdicted · novelty 6.0

A Voronoi cell estimation framework in embedding space enables principled token pruning for late-interaction models, reducing index size while retaining retrieval quality.

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

cs.CL · 2024-01-31 · unverdicted · novelty 6.0

RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.

Spike Hijacking in Late-Interaction Retrieval

cs.IR · 2026-04-06 · unverdicted · novelty 5.0

Hard maximum similarity pooling in late-interaction models induces higher patch-level gradient concentration and greater length sensitivity than top-k or softmax alternatives.
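For context, ColBERTv2 scores a query against a passage by late interaction. Each query token embedding is compared with every passage token embedding, the maximum similarity (MaxSim) is kept, and those maxima are summed over query tokens. The sketch below contrasts that hard max with two softer poolings of the kind the summary above mentions. It is illustrative pure Python. The function name, the `pooling`, `k`, and `temperature` parameters, and the exact top-k-mean and softmax-weighted formulas are assumptions, not the cited paper's definitions. ColBERT normally L2-normalizes embeddings so the dot product is a cosine similarity. The toy vectors here are left unnormalized to keep the arithmetic easy to follow.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(query, doc, pooling="max", k=2, temperature=1.0):
    """Sum over query tokens of a pooled similarity to the document tokens.

    pooling="max" is ColBERT-style MaxSim; "topk" and "softmax" are
    hypothetical soft alternatives used here for comparison.
    """
    total = 0.0
    for q in query:
        sims = [dot(q, d) for d in doc]
        if pooling == "max":
            total += max(sims)  # only the single best document token counts
        elif pooling == "topk":
            top = sorted(sims, reverse=True)[:k]
            total += sum(top) / len(top)  # mean of the k best matches
        elif pooling == "softmax":
            peak = max(sims)  # subtract the max for numerical stability
            weights = [math.exp((s - peak) / temperature) for s in sims]
            total += sum(w * s for w, s in zip(weights, sims)) / sum(weights)
        else:
            raise ValueError(f"unknown pooling: {pooling}")
    return total


query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
print(late_interaction_score(query, doc))                  # 3.0
print(late_interaction_score(query, doc, pooling="topk"))  # 2.0
```

With the hard max, the derivative of each query token's term with respect to its similarities is one-hot. The whole training signal for that query token therefore flows to a single document token. The softer poolings spread it across several tokens, which is the contrast the summary draws.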

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

cs.CL · 2024-12-18 · unverdicted · novelty 5.0

ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.

Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval

cs.IR · 2026-04-29 · conditional · novelty 3.0

Reproducibility study confirms Hypencoder's non-linear query-specific scoring improves retrieval over bi-encoders on standard benchmarks but standard methods remain faster and hard-task results are mixed due to implementation issues.

Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models

cs.LG · 2026-05-12

DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

cs.IR · 2026-05-08

citing papers explorer

Showing 19 of 19 citing papers.

IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions cs.CL · 2026-05-21 · unverdicted · none · ref 47
NumColBERT: Non-Intrusive Numeracy Injection for Late-Interaction Retrieval Models cs.IR · 2026-05-11 · unverdicted · none · ref 36
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models cs.IR · 2026-04-29 · unverdicted · none · ref 26
HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads cs.IR · 2026-04-19 · unverdicted · none · ref 15
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability cs.IR · 2026-04-17 · unverdicted · none · ref 58
Extending Context Window of Large Language Models via Positional Interpolation cs.CL · 2023-06-27 · conditional · none · ref 15
Retrieval from Within: An Intrinsic Capability of Attention-Based Models cs.LG · 2026-05-07 · unverdicted · none · ref 30 · 2 links
Kernel Affine Hull Machines for Compute-Efficient Query-Side Semantic Encoding cs.LG · 2026-05-01 · unverdicted · none · ref 41
A Replicability Study of XTR cs.IR · 2026-05-01 · accept · none · ref 25
Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing cs.IR · 2026-04-30 · unverdicted · none · ref 30
NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence Chains cs.IR · 2026-04-30 · unverdicted · none · ref 38
ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation cs.IR · 2026-04-14 · unverdicted · none · ref 90
A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models cs.IR · 2026-03-10 · unverdicted · none · ref 28
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval cs.CL · 2024-01-31 · unverdicted · none · ref 160
Spike Hijacking in Late-Interaction Retrieval cs.IR · 2026-04-06 · unverdicted · none · ref 4
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference cs.CL · 2024-12-18 · unverdicted · none · ref 180
Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval cs.IR · 2026-04-29 · conditional · none · ref 48
Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models cs.LG · 2026-05-12 · unreviewed · ref 25
DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models cs.IR · 2026-05-08 · unreviewed · ref 5
