Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Vila: On pre-training for visual language models , author=

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding

cs.CV · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

Response-G1 uses query-guided scene graphs, memory retrieval, and augmented prompting to improve when Video-LLMs decide to respond during streaming videos.

AFMRL: Attribute-Enhanced Fine-Grained Multi-Modal Representation Learning in E-commerce

cs.CL · 2026-04-22 · unverdicted · novelty 5.0

AFMRL uses MLLM-generated attributes in attribute-guided contrastive learning and retrieval-aware reinforcement to achieve SOTA fine-grained multimodal retrieval on e-commerce datasets.

Think before Go: Hierarchical Reasoning for Image-goal Navigation

cs.RO · 2026-04-19 · unverdicted · novelty 5.0

HRNav decomposes image-goal navigation into VLM-based short-horizon planning and RL-based execution with a wandering suppression penalty to improve performance in complex unseen settings.

Dual-Anchoring: Addressing State Drift in Vision-Language Navigation

cs.CV · 2026-04-19

citing papers explorer

Showing 4 of 4 citing papers.

Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding cs.CV · 2026-05-08 · unverdicted · none · ref 49 · 2 links
Response-G1 uses query-guided scene graphs, memory retrieval, and augmented prompting to improve when Video-LLMs decide to respond during streaming videos.
AFMRL: Attribute-Enhanced Fine-Grained Multi-Modal Representation Learning in E-commerce cs.CL · 2026-04-22 · unverdicted · none · ref 23
AFMRL uses MLLM-generated attributes in attribute-guided contrastive learning and retrieval-aware reinforcement to achieve SOTA fine-grained multimodal retrieval on e-commerce datasets.
Think before Go: Hierarchical Reasoning for Image-goal Navigation cs.RO · 2026-04-19 · unverdicted · none · ref 93
HRNav decomposes image-goal navigation into VLM-based short-horizon planning and RL-based execution with a wandering suppression penalty to improve performance in complex unseen settings.
Dual-Anchoring: Addressing State Drift in Vision-Language Navigation cs.CV · 2026-04-19 · unreviewed · ref 90

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

fields

years

verdicts

representative citing papers

citing papers explorer