In: CVPR (2025), https://arxiv.org/abs/2412.01720

Yikun Liu, Pingan Chen, Jiayin Cai, Xiaolong Jiang, Yao Hu, Jiangchao Yao, Yanfeng Wang, Weidi Xie · 2024 · arXiv 2412.01720

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Bottleneck Tokens for Unified Multimodal Retrieval

cs.LG · 2026-04-13 · unverdicted · novelty 7.0

Bottleneck Tokens paired with a masked generative objective achieve state-of-the-art unified multimodal retrieval performance among 2B-scale models on the MMEB-V2 benchmark with 78 datasets.

Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering

cs.CV · 2025-08-31 · unverdicted · novelty 6.0

PMSR progressively constructs structured reasoning trajectories with dual-scope queries and compositional reasoning to improve knowledge acquisition and answer accuracy in knowledge-intensive VQA.

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

cs.CV · 2025-07-07 · unverdicted · novelty 5.0

VLM2Vec-V2 is a multimodal embedding model trained on an extended MMEB-V2 benchmark that adds video and visual document tasks and reports gains on both new and prior image benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

Bottleneck Tokens for Unified Multimodal Retrieval cs.LG · 2026-04-13 · unverdicted · none · ref 15
Bottleneck Tokens paired with a masked generative objective achieve state-of-the-art unified multimodal retrieval performance among 2B-scale models on the MMEB-V2 benchmark with 78 datasets.
Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering cs.CV · 2025-08-31 · unverdicted · none · ref 26
PMSR progressively constructs structured reasoning trajectories with dual-scope queries and compositional reasoning to improve knowledge acquisition and answer accuracy in knowledge-intensive VQA.
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents cs.CV · 2025-07-07 · unverdicted · none · ref 17
VLM2Vec-V2 is a multimodal embedding model trained on an extended MMEB-V2 benchmark that adds video and visual document tasks and reports gains on both new and prior image benchmarks.

In: CVPR (2025), https://arxiv.org/abs/2412.01720

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer