Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Vila: On pre-training for visual language models , author=

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding

cs.CV · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

Response-G1 uses query-guided scene graphs, memory retrieval, and augmented prompting to improve when Video-LLMs decide to respond during streaming videos.

AFMRL: Attribute-Enhanced Fine-Grained Multi-Modal Representation Learning in E-commerce

cs.CL · 2026-04-22 · unverdicted · novelty 5.0

AFMRL uses MLLM-generated attributes in attribute-guided contrastive learning and retrieval-aware reinforcement to achieve SOTA fine-grained multimodal retrieval on e-commerce datasets.

Think before Go: Hierarchical Reasoning for Image-goal Navigation

cs.RO · 2026-04-19 · unverdicted · novelty 5.0

HRNav decomposes image-goal navigation into VLM-based short-horizon planning and RL-based execution with a wandering suppression penalty to improve performance in complex unseen settings.

Dual-Anchoring: Addressing State Drift in Vision-Language Navigation

cs.CV · 2026-04-19

citing papers explorer

Showing 1 of 1 citing paper after filters.

Think before Go: Hierarchical Reasoning for Image-goal Navigation cs.RO · 2026-04-19 · unverdicted · none · ref 93
HRNav decomposes image-goal navigation into VLM-based short-horizon planning and RL-based execution with a wandering suppression penalty to improve performance in complex unseen settings.

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

fields

years

verdicts

representative citing papers

citing papers explorer