Latent Visual Reasoning enables autoregressive generation of latent visual states that reconstruct critical image tokens, yielding gains on perception-heavy VQA benchmarks, for example reaching 71.67% on MMVP.
APO: Enhancing reasoning ability of MLLMs via asymmetric policy optimization. arXiv preprint arXiv:2506.21655
3 Pith papers cite this work.
verdicts
UNVERDICTED: 3
representative citing papers
PS-PPO samples prefixes of trajectories in critic-free RLHF and uses importance-weighted updates to reduce compute and memory while claiming to preserve the full-trajectory objective.
Hidden-Align adds an auxiliary loss to align hidden states of correct reasoning paths at the pre-answer token in RLVR, improving pass@1 by 3.8-6.2 points over DAPO on eight math benchmarks for Qwen3 models of 1.7B-14B scale.
citing papers explorer
- PS-PPO: Prefix-Sampling PPO for Critic-Free RLHF
- Right Makes Might: Aligning Verified Hidden States Empowers RL Reasoning