Minjae Oh

Identifiers

No identifiers captured yet.

Papers (4)

  1. KL for a KL: On-Policy Distillation with Control Variate Baseline · cs.LG · 2026 · author #1
  2. Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States · cs.LG · 2026 · author #4
  3. ThinkBrake: Efficient Reasoning via Log-Probability Margin Guided Decoding · cs.CL · 2025 · author #2
  4. Future Policy Approximation for Offline Reinforcement Learning Improves Mathematical Reasoning · cs.CL · 2025 · author #1

Mentions

No mention provenance yet.

Frequent Coauthors