We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
2 Pith papers cite this work.
Years: 2026 (2)
Citing papers
- MIRL: Mutual Information-Guided Reinforcement Learning for Vision-Language Models
  MIRL uses mutual information to guide trajectory selection and provide separate rewards for visual perception in RLVR for VLMs, achieving 70.22% average accuracy with 25% fewer full trajectories.
- Compatibility-Aware Dynamic Fine-Tuning for Large Language Models
  CADFT improves supervised fine-tuning of large language models by dynamically down-weighting training samples whose low model-likelihood indicates high gradient variance, yielding better stability and generalization.