Pgx: Hardware-Accelerated Parallel Game Simulators for Reinforcement Learning

Sotetsu Koyamada, Shinri Okano, Soichiro Nishimori, Yu Murata, Keigo Habara, Haruka Kita, Shin Ishii · 2023

1 Pith paper cites this work.



Representative citing papers

Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search
cs.LG · 2025-12-25 · unverdicted · novelty 7.0 · ref 22

Inverse-RPO derives two variance-aware prior-based UCT policies from UCB-V that outperform PUCT on benchmarks with no extra cost.