A survey of state of the art large vision language models: Benchmark evaluations and challenges

2 Pith papers cite this work.

fields

cs.CV 2

years

2026 2

representative citing papers

AgroTools: A Benchmark for Tool-Augmented Multimodal Agents in Agriculture

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

AgroTools is a new benchmark for tool-augmented multimodal agents in agriculture featuring 539 QA pairs, 1,097 images, five task families, and 14 tools, with evaluations showing major limitations in current models' tool planning and execution.

citing papers explorer

  • AgroTools: A Benchmark for Tool-Augmented Multimodal Agents in Agriculture
    cs.CV · 2026-05-21 · unverdicted · none · ref 28

    AgroTools is a new benchmark for tool-augmented multimodal agents in agriculture featuring 539 QA pairs, 1,097 images, five task families, and 14 tools, with evaluations showing major limitations in current models' tool planning and execution.

  • FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
    cs.CV · 2026-04-08 · conditional · none · ref 25

    The FORGE benchmark shows that domain-specific knowledge, not visual grounding, is the main bottleneck for MLLMs in manufacturing; SFT on a 3B model delivers up to a 90.8% relative accuracy improvement on held-out scenarios.