Title resolution pending

2024

3 Pith papers cite this work. Polarity classification is still indexing.

The title metadata for this work has not finished resolving. This hub is built from the citation graph, and the title resolver will retry the DOI and OpenAlex lookups on its next pass.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Are We on the Right Way for Evaluating Large Vision-Language Models?

cs.CV · 2024-03-29 · conditional · novelty 6.0

Current LVLM benchmarks overestimate capabilities because many questions can be answered without images due to design flaws or data leakage; MMStar is a human-curated set of 1,500 vision-indispensable samples across 6 capabilities and 18 axes with new metrics for leakage and true multi-modal gain.

CogVLM2: Visual Language Models for Image and Video Understanding

cs.CV · 2024-08-29 · conditional · novelty 5.0

CogVLM2 family achieves state-of-the-art results on image and video understanding benchmarks through improved visual expert architecture, higher resolution inputs, and automated temporal grounding for videos.

VLM-AutoDrive: Post-Training Vision-Language Models for Safety-Critical Autonomous Driving Events

cs.CV · 2026-03-18 · unverdicted · novelty 4.0

VLM-AutoDrive adapts pretrained VLMs via metadata captions, LLM descriptions, VQA, and CoT supervision, lifting collision F1 from 0.00 to 0.69 and accuracy from 35.35% to 77.27% on Nexar dashcam videos.

citing papers explorer

Showing 3 of 3 citing papers.

Are We on the Right Way for Evaluating Large Vision-Language Models?
cs.CV · 2024-03-29 · conditional · none · ref 25
CogVLM2: Visual Language Models for Image and Video Understanding
cs.CV · 2024-08-29 · conditional · none · ref 42
VLM-AutoDrive: Post-Training Vision-Language Models for Safety-Critical Autonomous Driving Events
cs.CV · 2026-03-18 · unverdicted · none · ref 8
