FindIt is the first comprehensive benchmark for evaluating generalist MLLMs on promptable object detection, referring expression detection, instance-level detection, and video detection with standardized parsable outputs.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2verdicts
UNVERDICTED 2representative citing papers
A new leaf-instance dataset for soybean-cotton detection and segmentation collected across growth stages and conditions from commercial farms is presented and validated with YOLOv11.
citing papers explorer
-
FindIt: A Format-Informed Visual Detection Benchmark for Generalist Multimodal LLMs
FindIt is the first comprehensive benchmark for evaluating generalist MLLMs on promptable object detection, referring expression detection, instance-level detection, and video detection with standardized parsable outputs.