Reducing the footprint of multi-vector retrieval with minimal performance impact via token pooling. arXiv preprint arXiv:2409.14683
5 Pith papers cite this work. Polarity classification is still indexing.
Citation summary: years: 2026 (5); roles: background (1); polarities: background (1).
Citing papers

- Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval
  ColChunk adaptively chunks visual document patches into contextual multi-vectors via clustering, cutting storage by over 90% while raising average nDCG@5 by 9 points.
- Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework
  Prune-then-Merge combines adaptive pruning of low-signal patches with hierarchical merging to achieve higher compression rates and better performance than prior single-stage methods in visual document retrieval.
- A Replicability Study of XTR
  XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.
- A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models
  A Voronoi cell estimation framework in embedding space enables principled token pruning for late-interaction models, reducing index size while retaining retrieval quality.
- LEMUR: Learned Multi-Vector Retrieval
  LEMUR accelerates multi-vector retrieval by learning a neural network approximation to MaxSim and reducing it to single-vector search in latent space.