A survey of state of the art large vision language models: Benchmark evaluations and challenges

Zongxia Li, Xiyang Wu, Hongyang Du, Fuxiao Liu, Huy Nghiem, Guangyao Shi · 2025

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

AgroTools: A Benchmark for Tool-Augmented Multimodal Agents in Agriculture

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

AgroTools is a new benchmark for tool-augmented multimodal agents in agriculture featuring 539 QA pairs, 1,097 images, five task families, and 14 tools, with evaluations showing major limitations in current models' tool planning and execution.

FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

cs.CV · 2026-04-08 · conditional · novelty 7.0

FORGE benchmark shows domain-specific knowledge, not visual grounding, is the main bottleneck for MLLMs in manufacturing, with SFT on a 3B model delivering up to 90.8% relative accuracy improvement on held-out scenarios.

citing papers explorer

AgroTools: A Benchmark for Tool-Augmented Multimodal Agents in Agriculture · cs.CV · 2026-05-21 · unverdicted · none · ref 28

FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios · cs.CV · 2026-04-08 · conditional · none · ref 25
