RL post-training on hallucination-forced multimodal data improves reasoning performance and can outperform standard training.
Llamav-o1: Rethinking step-by-step vi- sual reasoning in llms
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
SignReasoner decomposes traffic signs into functional structure units and uses a two-stage VLM post-training pipeline to achieve state-of-the-art compositional reasoning on a new benchmark.
citing papers explorer
-
Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models
RL post-training on hallucination-forced multimodal data improves reasoning performance and can outperform standard training.
-
SignReasoner: Compositional Reasoning for Complex Traffic Sign Understanding via Functional Structure Units
SignReasoner decomposes traffic signs into functional structure units and uses a two-stage VLM post-training pipeline to achieve state-of-the-art compositional reasoning on a new benchmark.