Why are visually-grounded language models bad at image classification? NeurIPS, 2024

Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung-Levy · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Specificity-aware reinforcement learning for fine-grained open-world classification

cs.CV · 2026-03-03 · unverdicted · novelty 6.0

SpeciaRL applies a dynamic verifier-based reward in reinforcement learning to steer reasoning LMMs toward correct and specific predictions on fine-grained open-world image classification tasks.

Can Textual Reasoning Improve the Performance of MLLMs on Fine-grained Visual Classification?

cs.CV · 2026-01-11 · unverdicted · novelty 5.0

Longer textual reasoning chains degrade MLLM accuracy on fine-grained visual tasks; a new normalization and constrained-reward training framework mitigates the effect and sets new SOTA numbers.

citing papers explorer

Showing 2 of 2 citing papers.

Specificity-aware reinforcement learning for fine-grained open-world classification · cs.CV · 2026-03-03 · unverdicted · none · ref 62
Can Textual Reasoning Improve the Performance of MLLMs on Fine-grained Visual Classification? · cs.CV · 2026-01-11 · unverdicted · none · ref 46
