Introduces P-CHR AUC and CRR metrics to demonstrate that semantic caching model selection is limited by calibration quality rather than ranking performance.
hub Canonical reference
C ol BERT v2: Effective and Efficient Retrieval via Lightweight Late Interaction
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
roles
background 8polarities
background 8representative citing papers
DICE aggregates independently encoded document chunks into a single vector to reduce evidence dilution in long-document dense retrieval, reporting gains on LongEmbed especially beyond 4k tokens.
IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.
NumColBERT improves ColBERT performance on numerical query conditions non-intrusively via gating and contrastive learning, outperforming fine-tuning while matching or exceeding separate text-number scoring methods.
ReaLM-Retrieve uses step-level uncertainty to trigger retrievals during reasoning, achieving 10.1% better F1 scores and 47% fewer calls on multi-hop QA benchmarks.
HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.
LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,
Position Interpolation linearly down-scales position indices to extend RoPE context windows to 32768 tokens with 1000-step fine-tuning, delivering strong long-context results on LLaMA 7B-65B while preserving short-context quality.
Agentic program search over a frozen encoder API yields retrieval programs that improve nDCG@10 on held-out tasks and unseen encoder families with no per-domain training.
Unified SDT model finds humans less sensitive to interference (α/σ=0.41) than dense passage retrieval (0.67), with HippoRAG intermediate (0.44), backed by N=112 experiments and simulations favoring logarithmic over power-law decline.
DiffRetriever uses parallel masked tokens in diffusion LMs for retrieval representations, outperforming DiffEmbed and other baselines on aggregate effectiveness while supporting efficient multi-representation matching.
Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.
TACHIOM speeds up multivector retrieval by up to 247x in clustering and 9.8x in retrieval on MS-MARCOv1 and LoTTE benchmarks using token-distribution-aware centroid allocation and a graph-plus-PQ index, with comparable effectiveness to prior systems.
NeocorRAG uses Evidence Chains to achieve SOTA retrieval quality in RAG on HotpotQA, 2WikiMultiHopQA, MuSiQue, and NQ for 3B and 70B models while using under 20% of the tokens of comparable methods.
ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.
A Voronoi cell estimation framework in embedding space enables principled token pruning for late-interaction models, reducing index size while retaining retrieval quality.
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
Multi-Prefix Embedding extracts per-chunk embeddings from a single forward pass over EOS-separated document chunks and matches via MaxSim while training only on document-level labels.
Hard maximum similarity pooling in late-interaction models induces higher patch-level gradient concentration and greater length sensitivity than top-k or softmax alternatives.
ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.
Reproducibility study confirms Hypencoder's non-linear query-specific scoring improves retrieval over bi-encoders on standard benchmarks but standard methods remain faster and hard-task results are mixed due to implementation issues.
The EReL@MIR 2025 Track 1 challenge evaluates single systems on two multimodal retrieval tasks and finds that Qwen2-VL decoder-based embedders dominate, with a training-free entry within 0.1 points of the fine-tuned winner.
citing papers explorer
-
Closing the Calibration Gap in Semantic Caching
Introduces P-CHR AUC and CRR metrics to demonstrate that semantic caching model selection is limited by calibration quality rather than ranking performance.
-
Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation
DICE aggregates independently encoded document chunks into a single vector to reduce evidence dilution in long-document dense retrieval, reporting gains on LongEmbed especially beyond 4k tokens.
-
IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions
IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.
-
NumColBERT: Non-Intrusive Numeracy Injection for Late-Interaction Retrieval Models
NumColBERT improves ColBERT performance on numerical query conditions non-intrusively via gating and contrastive learning, outperforming fine-tuning while matching or exceeding separate text-number scoring methods.
-
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models
ReaLM-Retrieve uses step-level uncertainty to trigger retrievals during reasoning, achieving 10.1% better F1 scores and 47% fewer calls on multi-hop QA benchmarks.
-
HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads
HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.
-
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability
LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,
-
Test-Time Compute for Frozen Embedding Models through Agentic Program Search
Agentic program search over a frozen encoder API yields retrieval programs that improve nDCG@10 on held-out tasks and unseen encoder families with no per-domain training.
-
The Interference Gap: Comparing Retrieval Bounds in Human Memory and RAG Systems
Unified SDT model finds humans less sensitive to interference (α/σ=0.41) than dense passage retrieval (0.67), with HippoRAG intermediate (0.44), backed by N=112 experiments and simulations favoring logarithmic over power-law decline.
-
DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models
DiffRetriever uses parallel masked tokens in diffusion LMs for retrieval representations, outperforming DiffEmbed and other baselines on aggregate effectiveness while supporting efficient multi-representation matching.
-
Retrieval from Within: An Intrinsic Capability of Attention-Based Models
Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
-
A Replicability Study of XTR
XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.
-
Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
TACHIOM speeds up multivector retrieval by up to 247x in clustering and 9.8x in retrieval on MS-MARCOv1 and LoTTE benchmarks using token-distribution-aware centroid allocation and a graph-plus-PQ index, with comparable effectiveness to prior systems.
-
NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence Chains
NeocorRAG uses Evidence Chains to achieve SOTA retrieval quality in RAG on HotpotQA, 2WikiMultiHopQA, MuSiQue, and NQ for 3B and 70B models while using under 20% of the tokens of comparable methods.
-
ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation
ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.
-
A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models
A Voronoi cell estimation framework in embedding space enables principled token pruning for late-interaction models, reducing index size while retaining retrieval quality.
-
Improving Long-Context Retrieval with Multi-Prefix Embedding
Multi-Prefix Embedding extracts per-chunk embeddings from a single forward pass over EOS-separated document chunks and matches via MaxSim while training only on document-level labels.
-
Spike Hijacking in Late-Interaction Retrieval
Hard maximum similarity pooling in late-interaction models induces higher patch-level gradient concentration and greater length sensitivity than top-k or softmax alternatives.
-
Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval
Reproducibility study confirms Hypencoder's non-linear query-specific scoring improves retrieval over bi-encoders on standard benchmarks but standard methods remain faster and hard-task results are mixed due to implementation issues.
-
Overview of the EReL@MIR 2025 Multimodal Document Retrieval Challenge (Track 1)
The EReL@MIR 2025 Track 1 challenge evaluates single systems on two multimodal retrieval tasks and finds that Qwen2-VL decoder-based embedders dominate, with a training-free entry within 0.1 points of the fine-tuned winner.
- Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces