Flow-OPD applies on-policy distillation to Flow Matching models through specialized teachers, cold-start initialization, task routing, and manifold regularization, lifting GenEval from 63 to 92 and OCR from 59 to 94 on Stable Diffusion 3.5 Medium.
Vision-r1: Incentivizing reasoning capability in multimodal large language models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
RAC adds ranking-aware group loss and clean-corrupted pairwise loss to RL post-training to boost both accuracy and calibration in multimodal reasoning without extra annotations.
citing papers explorer
-
Flow-OPD: On-Policy Distillation for Flow Matching Models
Flow-OPD applies on-policy distillation to Flow Matching models through specialized teachers, cold-start initialization, task routing, and manifold regularization, lifting GenEval from 63 to 92 and OCR from 59 to 94 on Stable Diffusion 3.5 Medium.
-
Ranking-Aware Calibration for Reliable Multimodal Reinforcement Learning
RAC adds ranking-aware group loss and clean-corrupted pairwise loss to RL post-training to boost both accuracy and calibration in multimodal reasoning without extra annotations.