Visual-RFT: Visual Reinforcement Fine-Tuning
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 2
years
2026 2
verdicts
UNVERDICTED 2
roles
background 1
polarities
background 1
representative citing papers
RaPO reduces catastrophic forgetting in visual continual learning by shaping rewards around policy drift and stabilizing advantages with cross-task exponential moving averages during reinforcement fine-tuning of multimodal models.
citing papers explorer
- FIKA-Bench: From Fine-grained Recognition to Fine-Grained Knowledge Acquisition
  FIKA-Bench is a leakage-aware benchmark of 311 instances showing that even the best large multimodal models and tool-equipped agents reach only 25.1% accuracy on fine-grained recognition questions that require external evidence search and verification.
- Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning
  RaPO reduces catastrophic forgetting in visual continual learning by shaping rewards around policy drift and stabilizing advantages with cross-task exponential moving averages during reinforcement fine-tuning of multimodal models.