ColChunk adaptively chunks visual document patches into contextual multi-vectors via clustering, cutting storage by over 90% while raising average nDCG@5 by 9 points.
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026: 3
verdicts
UNVERDICTED: 3
roles
dataset: 1
polarities
use dataset: 1
representative citing papers
A scalable training-free pipeline using video segmentation, filtering, and off-the-shelf multimodal models creates DenseStep2M, a dataset of 100K videos and 2M detailed instructional steps that improves dense captioning, step grounding, and cross-modal retrieval.
UniSID jointly optimizes embeddings and Semantic IDs end-to-end with multi-granularity contrastive learning and summary-based reconstruction, outperforming RQ-based methods by up to 4.62% in Hit Rate for ad recommendation.
citing papers explorer
- Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval
  ColChunk adaptively chunks visual document patches into contextual multi-vectors via clustering, cutting storage by over 90% while raising average nDCG@5 by 9 points.
- DenseStep2M: A Scalable, Training-Free Pipeline for Dense Instructional Video Annotation
  A scalable training-free pipeline using video segmentation, filtering, and off-the-shelf multimodal models creates DenseStep2M, a dataset of 100K videos and 2M detailed instructional steps that improves dense captioning, step grounding, and cross-modal retrieval.
- End-to-End Semantic ID Generation for Generative Advertisement Recommendation
  UniSID jointly optimizes embeddings and Semantic IDs end-to-end with multi-granularity contrastive learning and summary-based reconstruction, outperforming RQ-based methods by up to 4.62% in Hit Rate for ad recommendation.