The StemBind benchmark diagnoses MLLM failures in abstract visual reasoning by separating perception, rule induction, and answer selection on shared stems, and finds a persistent rule-to-instance binding gap even when perception and the induced rule are both correct.
VRIQ: Benchmarking and analyzing visual-reasoning IQ of VLMs
2 Pith papers cite this work. Polarity classification is still indexing.