Binghai Wang

Identifiers

  • name variant Binghai Wang 0.60 · backfill

Papers (3)

  1. EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training cs.LG · 2026 · author #9
  2. MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning cs.CL · 2026 · author #3
  3. Secrets of RLHF in Large Language Models Part I: PPO cs.CL · 2023 · author #6

Mentions

  • 2307.04964 #6 · arxiv_oai · confidence 0.70 Binghai Wang

Frequent Coauthors