pith.

APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization. arXiv preprint arXiv:2506.21655

3 Pith papers cite this work. Polarity classification is still indexing.


fields: cs.LG (2), cs.CV (1)

years: 2026 (2), 2025 (1)

verdicts: UNVERDICTED (3)


representative citing papers

Latent Visual Reasoning

cs.CV · 2025-09-29 · unverdicted · novelty 7.0

Latent Visual Reasoning enables autoregressive generation of latent visual states that reconstruct critical image tokens, yielding gains on perception-heavy VQA benchmarks such as 71.67% on MMVP.

PS-PPO: Prefix-Sampling PPO for Critic-Free RLHF

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

PS-PPO samples prefixes of trajectories in critic-free RLHF and uses importance-weighted updates to reduce compute and memory while claiming to preserve the full-trajectory objective.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • PS-PPO: Prefix-Sampling PPO for Critic-Free RLHF cs.LG · 2026-06-29 · unverdicted · none · ref 20

    PS-PPO samples prefixes of trajectories in critic-free RLHF and uses importance-weighted updates to reduce compute and memory while claiming to preserve the full-trajectory objective.

  • Right Makes Might: Aligning Verified Hidden States Empowers RL Reasoning cs.LG · 2026-06-02 · unverdicted · none · ref 10

    Hidden-Align adds an auxiliary loss to align hidden states of correct reasoning paths at the pre-answer token in RLVR, improving pass@1 by 3.8-6.2 points over DAPO on eight math benchmarks for Qwen3 models of 1.7B-14B scale.