Adavip: Aligning multi-modal LLMs via adaptive vision-enhanced preference optimization, 2025a

1 Pith paper cites this work. Polarity classification is still indexing.

fields

cs.LG 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Experience Augmented Policy Optimization for LLM Reasoning

cs.LG · 2026-06-29 · unverdicted · novelty 5.0

EAPO reuses prior RL policy experience adaptively at decision points in LLM rollouts with adapted importance sampling and reports gains over prior RLVR methods on math benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

  • Experience Augmented Policy Optimization for LLM Reasoning cs.LG · 2026-06-29 · unverdicted · none · ref 9
