arXiv 2509.22647 · 2025

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

DetailVerifyBench supplies 1,000 images and densely annotated long captions to evaluate precise hallucination localization in multimodal large language models.

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

VideoSeeker integrates agentic reasoning and visual prompts into LVLMs via automated data synthesis, cold-start supervision, and RL training, yielding +13.7% gains on instance-level video tasks over baselines including GPT-4o.

BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

BalCapRL applies balanced multi-objective RL with GDPO-style normalization and length-conditional masking to improve MLLM image captioning, reporting gains of up to +13.6 DCScore, +9.0 CaptionQA, and +29.0 CapArena on LLaVA and Qwen models.

DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

DiffCap-Bench supplies a diverse IDC benchmark with ten categories and LLM judging grounded in human difference lists to evaluate MLLMs more robustly than prior lexical metrics.

CharTide: Data-Centric Chart-to-Code Generation via Tri-Perspective Tuning and Inquiry-Driven Evolution

cs.CV · 2026-04-24 · unverdicted · novelty 6.0

CharTide decouples chart-to-code data into three perspectives and uses inquiry-driven RL with atomic QA verification to let smaller VLMs surpass GPT-4o on chart-to-code tasks.

citing papers explorer

Showing 5 of 5 citing papers.

DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions

cs.CV · 2026-04-07 · unverdicted · none · ref 48

DetailVerifyBench supplies 1,000 images and densely annotated long captions to evaluate precise hallucination localization in multimodal large language models.

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation

cs.CV · 2026-05-15 · unverdicted · none · ref 40

VideoSeeker integrates agentic reasoning and visual prompts into LVLMs via automated data synthesis, cold-start supervision, and RL training, yielding +13.7% gains on instance-level video tasks over baselines including GPT-4o.

BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

cs.CV · 2026-05-08 · unverdicted · none · ref 3

BalCapRL applies balanced multi-objective RL with GDPO-style normalization and length-conditional masking to improve MLLM image captioning, reporting gains of up to +13.6 DCScore, +9.0 CaptionQA, and +29.0 CapArena on LLaVA and Qwen models.

DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning

cs.CV · 2026-05-06 · unverdicted · none · ref 36

DiffCap-Bench supplies a diverse IDC benchmark with ten categories and LLM judging grounded in human difference lists to evaluate MLLMs more robustly than prior lexical metrics.

CharTide: Data-Centric Chart-to-Code Generation via Tri-Perspective Tuning and Inquiry-Driven Evolution

cs.CV · 2026-04-24 · unverdicted · none · ref 33

CharTide decouples chart-to-code data into three perspectives and uses inquiry-driven RL with atomic QA verification to let smaller VLMs surpass GPT-4o on chart-to-code tasks.
