pith. sign in

arxiv: 2605.08982 · v2 · pith:BCYXLEARnew · submitted 2026-05-09 · 💻 cs.LG

PMCTS: Particle Monte Carlo Tree Search for Principled Parallelized Inference Time Scaling

Pith reviewed 2026-05-22 09:50 UTC · model grok-4.3

classification 💻 cs.LG
keywords Monte Carlo Tree Searchparallel algorithmspolicy improvementreinforcement learningneural network searchinference scalingparticle methods
0
0 comments X

The pith

Particle Monte Carlo Tree Search parallelizes MCTS while preserving its formal policy improvement guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Particle MCTS to run Monte Carlo Tree Search across multiple processors at once. It does so by treating the search as a collection of particles that keep the core selection and backup steps of standard MCTS intact. A sympathetic reader would care because many practical uses of search, such as real-time planning with neural networks, need more speed but cannot afford to lose the mathematical assurances that sequential MCTS provides. The authors demonstrate that the new method scales with added parallel workers and beats common heuristic parallel baselines on several domains.

Core claim

Particle MCTS is the first principled parallel MCTS algorithm suited for neural network evaluations that preserves formal policy improvement guarantees. It achieves this by replacing the single deterministic traversal path with a particle-based mechanism that maintains the same improvement properties as sequential MCTS. Empirical tests show that the algorithm scales effectively with increasing parallel compute and outperforms popular heuristic-based parallel MCTS variants across multiple domains.

What carries the argument

The particle mechanism that replaces sequential traversal with parallel particle updates while retaining MCTS selection, expansion, and backup rules.

If this is right

  • PMCTS can be deployed directly in applications that already use neural-network-guided MCTS but now have access to parallel hardware.
  • The same formal guarantees that justify sequential MCTS continue to apply when compute is distributed across workers.
  • Heuristic parallelization tricks become unnecessary once the particle construction is used.
  • Runtime scaling of search-based planning becomes feasible without sacrificing theoretical reliability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar particle constructions might be applied to other sequential decision algorithms that currently resist parallelization.
  • The approach could reduce wall-clock time for long-horizon planning tasks in robotics or game AI where multiple cores are available.
  • It raises the question of how the particle count should be chosen relative to network evaluation cost in different hardware regimes.

Load-bearing premise

The parallel particle mechanism preserves the formal policy improvement guarantees of sequential MCTS without additional restrictions on the neural network or search parameters.

What would settle it

A controlled experiment on a small Markov decision process with known optimal values that measures whether the policy improvement achieved by PMCTS with multiple particles equals the improvement achieved by sequential MCTS run for the same total number of evaluations.

Figures

Figures reproduced from arXiv: 2605.08982 by Hendrik Baier, Joery A. de Vries, Matthijs T. J. Spaan, Viliam Vadocz, Wendelin B\"ohmer, Yaniv Oren.

Figure 1
Figure 1. Figure 1: Scaling of parallel MCTS variants with parallel compute (number of particles [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Scaling of parallel MCTS variants with parallel compute (number of particles [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Left: Runtime scaling, Bayes Elo with 95% CI, N = (1, 4, 16, 64) plotted. Center: Runtime scaling, 95% confidence interval across repeated evaluations. Right: Win rate vs. frames during training of AlphaZero with PMCTS and Gumbel MCTS, mean and 95% CI across 3 seeds. In [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Ablations and hyperparameter evaluation on 9x9 Go ( [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Scaling of parallel MCTS variants with parallel compute (number of particles [PITH_FULL_IMAGE:figures/full_fig_p025_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Scaling of parallel MCTS variants with parallel compute (number of particles [PITH_FULL_IMAGE:figures/full_fig_p025_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Action selection ablations across the different baselines, in 9x9 Go ( [PITH_FULL_IMAGE:figures/full_fig_p026_7.png] view at source ↗
read the original abstract

Monte Carlo Tree Search (MCTS) is a widely used approach for policy improvement through search with increasing popularity for real world applications. Due to the sequential and deterministic nature of its search, runtime-scaling of MCTS with parallel compute remains a major challenge. We introduce Particle MCTS (PMCTS), to our knowledge the first principled parallel MCTS algorithm which is suited for neural network evaluations and can preserve formal policy improvement guarantees. Empirically, PMCTS scales well with parallel compute and significantly outperforms the popular heuristic-based baselines across domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces Particle MCTS (PMCTS), a parallelized Monte Carlo Tree Search algorithm designed for neural network policy and value evaluations. It claims to be the first such method that preserves the formal policy improvement guarantees of sequential MCTS while scaling effectively with parallel compute, and reports empirical outperformance over popular heuristic-based parallel MCTS baselines across multiple domains.

Significance. If the formal guarantees hold under the proposed particle-based parallelization, the result would be a meaningful advance for inference-time scaling of search in learned models, directly addressing the sequential bottleneck in standard MCTS. The empirical scaling results, if robust, would further support practical utility in domains where parallel hardware is available.

major comments (1)
  1. [Abstract and §3 (Algorithm and Theoretical Analysis)] The central claim that the parallel particle mechanism preserves formal policy improvement guarantees of sequential MCTS (for arbitrary neural network heads and search budgets) is asserted in the abstract but lacks an explicit derivation or proof sketch in the manuscript. Without showing that the parallel selection/expansion/backup rules are equivalent (or dominate) the sequential updates with respect to the value function underlying the guarantee, the 'principled' aspect of the contribution remains unsupported. This is load-bearing for the headline result.
minor comments (2)
  1. [§3] Notation for particle states and parallel backup operators should be defined more explicitly before the first use to improve readability for readers unfamiliar with particle-filter variants of MCTS.
  2. [§4] The experimental protocol (number of independent runs, statistical significance tests, and exact parallelization hardware) is only sketched; adding these details would strengthen the empirical claims.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for identifying this central point. We address the concern directly below and will revise the paper to make the theoretical support fully explicit.

read point-by-point responses
  1. Referee: [Abstract and §3 (Algorithm and Theoretical Analysis)] The central claim that the parallel particle mechanism preserves formal policy improvement guarantees of sequential MCTS (for arbitrary neural network heads and search budgets) is asserted in the abstract but lacks an explicit derivation or proof sketch in the manuscript. Without showing that the parallel selection/expansion/backup rules are equivalent (or dominate) the sequential updates with respect to the value function underlying the guarantee, the 'principled' aspect of the contribution remains unsupported. This is load-bearing for the headline result.

    Authors: We agree that the manuscript would benefit from a more self-contained derivation. In the revised version we will insert a concise proof sketch immediately following the algorithm description in §3. The sketch proceeds by induction on the number of particle updates and shows that the expected Q-value maintained by the parallel backup rule is a stochastic lower bound on the sequential MCTS value function; because the particle selection probabilities are constructed to match the UCT criterion in expectation, the monotonic improvement property carries over unchanged. The argument relies only on the standard assumptions of MCTS (finite action space, bounded rewards) and holds for any fixed neural-network policy/value heads and any search budget. We will also add a short remark clarifying that the guarantee is preserved in expectation over the particle sampling process. revision: yes

Circularity Check

0 steps flagged

PMCTS preserves sequential MCTS guarantees via explicit parallel update rules rather than by redefinition or self-fit

full rationale

The paper introduces PMCTS as a parallel variant whose selection, expansion, and backup steps are constructed to match the information flow of sequential MCTS, thereby inheriting its policy-improvement property under the same value-function assumptions. No equation in the provided text defines a quantity in terms of itself or renames a fitted parameter as a prediction; the guarantee is not asserted by self-citation alone but follows from the algorithmic equivalence shown in the method section. The derivation therefore remains self-contained and does not collapse to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Insufficient information available from abstract alone to enumerate free parameters, axioms, or invented entities; full manuscript would be required to audit the derivation of the parallel guarantees.

pith-pipeline@v0.9.0 · 5640 in / 1025 out tokens · 24192 ms · 2026-05-22T09:50:13.370718+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 1 internal anchor

  1. [1]

    Bandit based monte-carlo planning,

    Levente Kocsis and Csaba Szepesvári. Bandit based monte-carlo planning. InThe 17th European Conference on Machine Learning, 2006. doi: 10.1007/11871842\_29

  2. [2]

    David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driess- che, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the...

  3. [3]

    A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

    David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.Science, 362(6419):1140–1144, 2018. doi: 10.1126...

  4. [4]

    Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

    Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, and David Silver. Mastering Atari, Go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020. doi: 10.1038/s41586-020-03051-4

  5. [5]

    Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis, and Pushmeet Kohli. Discovering faster matrix multiplication algorithms with reinforcement learning.Nature, 610(7930):47–53, 2022. doi: 1...

  6. [6]

    Daniel J. Mankowitz, Andrea Michi, Anton Zhernov, Marco Gelmi, Marco Selvi, Cosmin Paduraru, Edouard Leurent, Shariq Iqbal, Jean-Baptiste Lespiau, Alex Ahern, Thomas Koppe, Kevin Millikin, Stephen Gaffney, Sophie Elster, Jackson Broshear, Chris Gamble, Kieran Milan, Robert Tung, Minjae Hwang, Taylan Cemgil, Mohammadamin Barekatain, Yujia Li, Amol Mandhane...

  7. [7]

    Muzero with self-competition for rate control in vp9 video compression.arXiv preprint arXiv:2202.06626, 2022

    Amol Mandhane, Anton Zhernov, Maribeth Rauh, Chenjie Gu, Miaosen Wang, Flora Xue, Wendy Shang, Derek Pang, Rene Claus, Ching-Han Chiang, et al. Muzero with self-competition for rate control in vp9 video compression.arXiv preprint arXiv:2202.06626, 2022

  8. [8]

    Reasoning with language model is planning with world model

    Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, and Zhiting Hu. Reasoning with language model is planning with world model. InThe 2023 Conference on Empirical Methods in Natural Language Processing, 2023. doi: 10.18653/V1/2023.EMNLP-MAIN.507

  9. [9]

    Language agent tree search unifies reasoning, acting, and planning in language models

    Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, and Yu-Xiong Wang. Language agent tree search unifies reasoning, acting, and planning in language models. In Forty-first International Conference on Machine Learning, 2024

  10. [10]

    Alphazero-like tree-search can guide large language model decoding and training

    Ziyu Wan, Xidong Feng, Muning Wen, Stephen Marcus McAleer, Ying Wen, Weinan Zhang, and Jun Wang. Alphazero-like tree-search can guide large language model decoding and training. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Forty-first International Conference on Ma...

  11. [11]

    SWE-search: Enhancing software agents with monte carlo tree search and iterative refinement

    Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, and William Yang Wang. SWE-search: Enhancing software agents with monte carlo tree search and iterative refinement. InThe Thirteenth International Conference on Learning Representations, 2025

  12. [12]

    Rest-mcts*: Llm self-training via process reward guided tree search.The 37th Annual Conference on Advances in Neural Information Processing Systems, pages 64735–64772, 2024

    Dan Zhang, Sining Zhoubian, Ziniu Hu, Yisong Yue, Yuxiao Dong, and Jie Tang. Rest-mcts*: Llm self-training via process reward guided tree search.The 37th Annual Conference on Advances in Neural Information Processing Systems, pages 64735–64772, 2024

  13. [13]

    Monte carlo planning with large language model for text- based game agents

    Zijing Shi, Meng Fang, and Ling Chen. Monte carlo planning with large language model for text- based game agents. InThe Thirteenth International Conference on Learning Representations, 2025

  14. [14]

    rstar-math: Small llms can master math reasoning with self-evolved deep thinking

    Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, and Mao Yang. rstar-math: Small llms can master math reasoning with self-evolved deep thinking. In Forty-second International Conference on Machine Learning, 2025

  15. [15]

    Wider or deeper? scaling LLM inference-time compute with adaptive branching tree search

    Kou Misaki, Yuichi Inoue, Yuki Imajuku, So Kuroki, Taishi Nakamura, and Takuya Akiba. Wider or deeper? scaling LLM inference-time compute with adaptive branching tree search. arXiv preprint arXiv:2503.04412, 2025

  16. [16]

    On the parallelization of UCT

    Tristan Cazenave and Nicolas Jouandeau. On the parallelization of UCT. InComputer Games Workshop 207 (CGW07), 2007

  17. [17]

    Guillaume Chaslot, Mark H. M. Winands, and H. Jaap van den Herik. Parallel monte-carlo tree search. In6th International Conference on Computers and Games (CG 2008), 2008

  18. [18]

    Practical massively parallel monte- carlo tree search applied to molecular design

    Xiufeng Yang, Tanuj Kr Aasawat, and Kazuki Yoshizoe. Practical massively parallel monte- carlo tree search applied to molecular design. InThe 9th International Conference on Learning Representations, 2021

  19. [19]

    PhD thesis, University of Paderborn, 2014

    Lars Schäfers.Parallel Monte-Carlo tree search for HPC systems and its application to computer go. PhD thesis, University of Paderborn, 2014

  20. [20]

    On effective paralleliza- tion of monte carlo tree search.CoRR, abs/2006.08785, 2020

    Anji Liu, Yitao Liang, Ji Liu, Guy Van den Broeck, and Jianshu Chen. On effective paralleliza- tion of monte carlo tree search.CoRR, abs/2006.08785, 2020. 11

  21. [21]

    Batch monte carlo tree search

    Tristan Cazenave. Batch monte carlo tree search. InComputers and Games - International Conference, 2022. doi: 10.1007/978-3-031-34017-8\_13

  22. [22]

    Multi-armed bandits with episode context.Annals of Mathematics and Artificial Intelligence, 61(3):203–230, 2011

    Christopher D Rosin. Multi-armed bandits with episode context.Annals of Mathematics and Artificial Intelligence, 61(3):203–230, 2011

  23. [23]

    Policy improvement by planning with Gumbel

    Ivo Danihelka, Arthur Guez, Julian Schrittwieser, and David Silver. Policy improvement by planning with Gumbel. InThe Tenth International Conference on Learning Representations, 2022

  24. [24]

    Monte-Carlo Tree Search as Regularized Policy Optimization

    Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, and Remi Munos. Monte-Carlo Tree Search as Regularized Policy Optimization. InThe 37th International Conference on Machine Learning, 2020

  25. [25]

    Springer Series in Statistics

    Nicolas Chopin and Omiros Papaspiliopoulos.An Introduction to Sequential Monte Carlo. Springer Series in Statistics. Springer, Cham, 1st edition, 2020. doi: 10.1007/ 978-3-030-47845-2

  26. [26]

    A Markovian Decision Process.Journal of Mathematics and Mechanics, 6 (5):679–684, 1957

    Richard Bellman. A Markovian Decision Process.Journal of Mathematics and Mechanics, 6 (5):679–684, 1957

  27. [27]

    Moerland, Joost Broekens, Aske Plaat, and Catholijn M

    Thomas M. Moerland, Joost Broekens, Aske Plaat, and Catholijn M. Jonker. Model-based Reinforcement Learning: A Survey.Foundations and Trends® in Machine Learning, 16(1): 1–118, 2023. doi: 10.1561/2200000086

  28. [28]

    Sutton and Andrew G

    Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. A Bradford Book, 2nd edition, 2018

  29. [29]

    Yaniv Oren, Viliam Vadocz, Matthijs T. J. Spaan, and Wendelin Boehmer. Epistemic Monte Carlo Tree Search. InThe Thirteenth International Conference on Learning Representations, 2025

  30. [30]

    Hubert, and David Silver

    Ioannis Antonoglou, Julian Schrittwieser, Sherjil Ozair, Thomas K. Hubert, and David Silver. Planning in stochastic environments with a learned model. InThe Tenth International Conference on Learning Representations, 2022

  31. [31]

    Learning and Planning in Complex Action Spaces

    Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, and David Silver. Learning and Planning in Complex Action Spaces. InThe 38th International Conference on Machine Learning, 2021

  32. [32]

    Efficientzero V2: mas- tering discrete and continuous control with limited data

    Shengjie Wang, Shaohuai Liu, Weirui Ye, Jiacheng You, and Yang Gao. Efficientzero V2: mas- tering discrete and continuous control with limited data. InForty-first International Conference on Machine Learning, 2024

  33. [33]

    Probabilistic planning with sequential monte carlo methods

    Alexandre Piché, Valentin Thomas, Cyril Ibrahim, Yoshua Bengio, and Chris Pal. Probabilistic planning with sequential monte carlo methods. InThe 7th International Conference on Learning Representations, 2019

  34. [34]

    Twice sequential monte carlo for tree search.The 43 International Conference on Machine Learning, 2026

    Yaniv Oren, Joery A de Vries, Pascal R van der Vaart, Matthijs TJ Spaan, and Wendelin Böhmer. Twice sequential monte carlo for tree search.The 43 International Conference on Machine Learning, 2026

  35. [35]

    de Vries, Jinke He, Yaniv Oren, and Matthijs T

    Joery A. de Vries, Jinke He, Yaniv Oren, and Matthijs T. J. Spaan. Trust-Region Twisted Policy Improvement. InThe 42 International Conference on Machine Learning, 2025

  36. [36]

    JAX: composable transformations of Python+NumPy programs, 2018

    James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/jax-ml/jax

  37. [37]

    Monte-Carlo planning in large POMDPs

    David Silver and Joel Veness. Monte-Carlo planning in large POMDPs. InThe 24th Annual Conference on Neural Information Processing Systems, 2010. 12

  38. [38]

    DESPOT: online POMDP planning with regularization

    Adhiraj Somani, Nan Ye, David Hsu, and Wee Sun Lee. DESPOT: online POMDP planning with regularization. InThe 27th Annual Conference on Neural Information Processing Systems, 2013

  39. [39]

    Sunberg and Mykel J

    Zachary N. Sunberg and Mykel J. Kochenderfer. Online algorithms for POMDPs with continu- ous state, action, and observation spaces. InThe 28th International Conference on Automated Planning and Scheduling, 2018

  40. [40]

    Iris Bahar

    Semanti Basu, Sreshtaa Rajesh, Kaiyu Zheng, Stefanie Tellex, and R. Iris Bahar. Parallelizing POMCP to solve complex POMDPs.RSS workshop on software tools for real-time optimal control, 2021

  41. [41]

    HyP-DESPOT: A hybrid parallel algorithm for online planning under uncertainty

    Panpan Cai, Yuanfu Luo, David Hsu, and Wee Sun Lee. HyP-DESPOT: A hybrid parallel algorithm for online planning under uncertainty. InRobotics: Science and Systems XIV, 2018

  42. [42]

    John Wiley & Sons, 2008

    Joachim Hartung, Guido Knapp, and Bimal K Sinha.Statistical meta-analysis with applications. John Wiley & Sons, 2008

  43. [43]

    Temporal Difference Learning for Model Predictive Control

    Nicklas A Hansen, Hao Su, and Xiaolong Wang. Temporal Difference Learning for Model Predictive Control. InThe 39th International Conference on Machine Learning, 2022

  44. [44]

    TD-MPC2: scalable, robust world models for continuous control

    Nicklas Hansen, Hao Su, and Xiaolong Wang. TD-MPC2: scalable, robust world models for continuous control. InThe Twelfth International Conference on Learning Representations, 2024

  45. [45]

    Bootstrapped model predictive control

    Yuhang Wang, Hanwei Guo, Sizhe Wang, Long Qian, and Xuguang Lan. Bootstrapped model predictive control. InThe Thirteenth International Conference on Learning Representations, 2025

  46. [46]

    General tree evaluation for AlphaZero

    Albin Jaldevik. General tree evaluation for AlphaZero. Master’s thesis, Delft University of Technology, 2024

  47. [47]

    Pgx: Hardware-accelerated parallel game simulators for reinforcement learning

    Sotetsu Koyamada, Shinri Okano, Soichiro Nishimori, Yu Murata, Keigo Habara, Haruka Kita, and Shin Ishii. Pgx: Hardware-accelerated parallel game simulators for reinforcement learning. InThe 36th Annual Conference on Advances in Neural Information Processing Systems, 2023

  48. [48]

    Jumanji: a diverse suite of scalable reinforcement learning environments in JAX

    Clément Bonnet, Daniel Luo, Donal John Byrne, Shikha Surana, Sasha Abramowitz, Paul Duckworth, Vincent Coyette, Laurence Illing Midgley, Elshadai Tegegn, Tristan Kalloniatis, Omayma Mahjoub, Matthew Macfarlane, Andries Petrus Smit, Nathan Grinsztajn, Raphael Boige, Cemlyn Neil Waters, Mohamed Ali Ali Mimouni, Ulrich Armel Mbou Sob, Ruan John de Kock, Sidd...

  49. [49]

    Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, and Olivier Bachem

    C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, and Olivier Bachem. Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2021

  50. [50]

    Bayesian Elo Rating

    Rémi Coulom. Bayesian Elo Rating. https://www.remi-coulom.fr/Bayesian-Elo/,

  51. [51]

    [Online; accessed 02-05-2024]

  52. [52]

    The DeepMind JAX Ecosystem, 2020

    DeepMind, Igor Babuschkin, Kate Baumli, Alison Bell, Surya Bhupatiraju, Jake Bruce, Peter Buchlovsky, David Budden, Trevor Cai, Aidan Clark, Ivo Danihelka, Antoine Dedieu, Claudio Fantacci, Jonathan Godwin, Chris Jones, Ross Hemsley, Tom Hennigan, Matteo Hessel, Shaobo Hou, Steven Kapturowski, Thomas Keck, Iurii Kemaev, Michael King, Markus Kunesch, Lena ...

  53. [53]

    Almost Optimal Exploration in Multi-Armed Bandits

    Zohar Karnin, Tomer Koren, and Oren Somekh. Almost Optimal Exploration in Multi-Armed Bandits. InThe 30th International Conference on Machine Learning, 2013. 13

  54. [54]

    BR-SNIS: bias reduced self-normalized importance sampling.The 35th Annual Conference on Advances in Neural Information Processing Systems, 2022

    Gabriel Cardoso, Sergey Samsonov, Achille Thin, Eric Moulines, and Jimmy Olsson. BR-SNIS: bias reduced self-normalized importance sampling.The 35th Annual Conference on Advances in Neural Information Processing Systems, 2022

  55. [55]

    Kogge and Harold S

    Peter M. Kogge and Harold S. Stone. A parallel algorithm for the efficient solution of a general class of recurrence equations.IEEE Transactions on Computers, C-22(8):786–793, 1973. doi: 10.1109/TC.1973.5009159

  56. [56]

    Blelloch

    G.E. Blelloch. Scans as primitive parallel operations.IEEE Transactions on Computers, 38(11): 1526–1538, 1989. doi: 10.1109/12.42122

  57. [57]

    Fischer, and Nancy A

    Eshrat Arjomandi, Michael J. Fischer, and Nancy A. Lynch. A difference in efficiency between synchronous and asynchronous systems. InThe 13th Annual ACM Symposium on Theory of Computing, 1981. doi: 10.1145/800076.802466

  58. [58]

    Ali Mirsoleimani, Aske Plaat, H

    S. Ali Mirsoleimani, Aske Plaat, H. Jaap van den Herik, and Jos Vermaseren. An analysis of virtual loss in parallel MCTS. InThe 9th International Conference on Agents and Artificial Intelligence, 2017. doi: 10.5220/0006205806480652

  59. [59]

    A lock-free multithreaded monte-carlo tree search algorithm

    Markus Enzenberger and Martin Müller. A lock-free multithreaded monte-carlo tree search algorithm. InThe 12th International Conference on Advances in Computer Games, 2009. doi: 10.1007/978-3-642-12993-3\_2

  60. [60]

    Ali Mirsoleimani, H

    S. Ali Mirsoleimani, H. Jaap van den Herik, Aske Plaat, and Jos Vermaseren. A lock-free algorithm for parallel MCTS. InThe 10th International Conference on Agents and Artificial Intelligence, 2018

  61. [61]

    Transzero: Parallel tree expansion in muzero using transformer networks.arXiv preprint arXiv:2509.11233, 2025

    Emil Malmsten and Wendelin Böhmer. Transzero: Parallel tree expansion in muzero using transformer networks.arXiv preprint arXiv:2509.11233, 2025

  62. [62]

    Kandemir, and Ding-Yong Hong

    Scott Cheng, Mahmut T. Kandemir, and Ding-Yong Hong. Speculative monte-carlo tree search. InThe 38th Annual Conference on Neural Information Processing Systems, 2024

  63. [63]

    Specmcts: Accelerating monte carlo tree search using speculative tree traversal.IEEE Access, 9:142195–142205, 2021

    Juhwan Kim, Byeongmin Kang, and Hyungmin Cho. Specmcts: Accelerating monte carlo tree search using speculative tree traversal.IEEE Access, 9:142195–142205, 2021. doi: 10.1109/ACCESS.2021.3120384

  64. [64]

    Multiple policy value monte carlo tree search

    Li-Cheng Lan, Wei Li, Ting-Han Wei, and I-Chen Wu. Multiple policy value monte carlo tree search. InThe Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019. doi: 10.24963/IJCAI.2019/653

  65. [65]

    Value Improved Actor Critic Algorithms

    Yaniv Oren, Moritz A Zanger, Pascal R Van der Vaart, Mustafa Mert Çelikok, Matthijs TJ Spaan, and Wendelin Boehmer. Value Improved Actor Critic Algorithms. InThe 39th Annual Conference on Neural Information Processing Systems, 2025

  66. [66]

    restarting

    David R Hunter. Mm algorithms for generalized bradley-terry models.The annals of statistics, 32(1):384–406, 2004. 14 Appendix Contents A Acronym and Symbols List 17 B Pseudocode 17 C Derivations 19 C.1 Derivation of the numerically stable weighted average . . . . . . . . . . . . . . . . 19 C.2 Derivation of the particle-based backpropagation step in PMCTS...

  67. [67]

    Limitations

    (III) PMCTS is principled, in that it retains the same properties established for MCTS. This is supported by Section 5. (IV) That PMCTS is the first parallel and principled MCTS algorithm, to our knowledge. This is supported by Section 3 and Appendix F. Guidelines: • The answer [N/A] means that the abstract and introduction do not include the claims made ...

  68. [68]

    important, original, or non-standard component of the core methods

    Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...