Soichiro Nishimori

Identifiers

  • name variant Soichiro Nishimori 0.60 · backfill

Papers (3)

  1. Finite-Time Regret Analysis of Retry-Aware Bandits · cs.LG · 2026 · author #3
  2. Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX · cs.AI · 2026 · author #1
  3. Mitigating Reward Hacking in RLHF via Advantage Sign Robustness · cs.LG · 2026 · author #3

Mentions

  • 2605.20854 #3 · arxiv_oai · confidence 0.70 · Soichiro Nishimori
  • 2605.20577 #1 · arxiv_oai · confidence 0.70 · Soichiro Nishimori

Frequent Coauthors