International Journal of Computer Vision (IJCV)123(1), 32–73 (2017),https: //link.springer.com/article/10.1007/S11263-016-0981-7

Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A · 2017 · DOI 10.1007/s11263-016-0981-7

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

open at publisher browse 5 citing papers

citation-role summary

background 1 dataset 1

citation-polarity summary

background 1 use dataset 1

representative citing papers

WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents

cs.CV · 2026-05-19 · accept · novelty 7.0

WildRoadBench provides a professionally annotated UAV corpus and dual-track protocol showing frontier VLMs and LLM agents achieve limited performance on wild aerial road-damage grounding under unified metrics.

HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

Training on automatically generated hard negative captions improves vision-language models' zero-shot detection of fine-grained image-text mismatches and robustness to noisy inputs.

MG$^2$-RAG: Multi-Granularity Graph for Multimodal Retrieval-Augmented Generation

cs.IR · 2026-04-04 · unverdicted · novelty 6.0

MG²-RAG proposes a multi-granularity graph RAG framework that constructs hierarchical multimodal nodes via entity-driven visual grounding and performs structured retrieval, delivering SOTA results on four multimodal tasks with 43.3× faster graph construction.

ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance

cs.CV · 2026-04-29 · unverdicted · novelty 5.0

ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.

AutoVQA-G: Self-Improving Agentic Framework for Automated Visual Question Answering and Grounding Annotation

cs.CV · 2026-04-19 · unverdicted · novelty 5.0

AutoVQA-G is a self-improving framework that generates VQA-G datasets with higher visual grounding accuracy than leading multimodal LLMs via iterative CoT verification and prompt refinement.

citing papers explorer

Showing 5 of 5 citing papers.

WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents cs.CV · 2026-05-19 · accept · none · ref 15
WildRoadBench provides a professionally annotated UAV corpus and dual-track protocol showing frontier VLMs and LLM agents achieve limited performance on wild aerial road-damage grounding under unified metrics.
HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities cs.CL · 2026-05-06 · unverdicted · none · ref 32
Training on automatically generated hard negative captions improves vision-language models' zero-shot detection of fine-grained image-text mismatches and robustness to noisy inputs.
MG$^2$-RAG: Multi-Granularity Graph for Multimodal Retrieval-Augmented Generation cs.IR · 2026-04-04 · unverdicted · none · ref 36
MG²-RAG proposes a multi-granularity graph RAG framework that constructs hierarchical multimodal nodes via entity-driven visual grounding and performs structured retrieval, delivering SOTA results on four multimodal tasks with 43.3× faster graph construction.
ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance cs.CV · 2026-04-29 · unverdicted · none · ref 10
ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.
AutoVQA-G: Self-Improving Agentic Framework for Automated Visual Question Answering and Grounding Annotation cs.CV · 2026-04-19 · unverdicted · none · ref 16
AutoVQA-G is a self-improving framework that generates VQA-G datasets with higher visual grounding accuracy than leading multimodal LLMs via iterative CoT verification and prompt refinement.

International Journal of Computer Vision (IJCV)123(1), 32–73 (2017),https: //link.springer.com/article/10.1007/S11263-016-0981-7

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer