Using Common Random Numbers for Simulation-based Planning with Rollouts

Frederic J Maliakkal; Harshad Khadilkar; Sandarbh Yadav; Shivaram Kalyanakrishnan

arxiv: 2605.04732 · v1 · submitted 2026-05-06 · 💻 cs.LG

Pith reviewed 2026-05-08 18:24 UTC · model grok-4.3

classification 💻 cs.LG

keywords: common random numbers, rollout planning, variance reduction, simulation-based planning, stochastic environments, UCT, Monte Carlo planning

The pith

Using common random numbers in rollout simulations provably reduces variance in relative utility estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how sharing the same random numbers when simulating different actions in rollout-based planning lowers the variance of their estimated utility differences. This matters because more stable relative comparisons let the planner pick better actions in stochastic settings without extra samples or computation. The reduction is provable once simulations switch to a rollout policy after an initial depth. Synthetic experiments and two applications, single-step planning for pension disbursement and UCT in Ludo, show the scheme improves final task performance.

Core claim

When the sampling model invokes a rollout policy beyond some depth, applying common random numbers across the trajectory generations for different actions provably reduces the variance of the relative utility estimates (the variance is never higher than under independent sampling), improving action selection in the planning loop.

What carries the argument

Common random numbers applied to the sampling model so that trajectories for competing actions share the same randomness and produce correlated utility estimates.
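A minimal sketch of this mechanism in a one-step lookahead planner, under loudly stated assumptions: `simulate`, `estimate_utilities`, `plan`, and the toy additive-noise dynamics are hypothetical illustrations, not the paper's sampling model. The only point shown is that every action is evaluated on the same list of seeds.

```python
import random


def simulate(action, seed, horizon=5):
    """Hypothetical sampling model: return of one trajectory that starts with `action`.

    All randomness comes from a generator built from `seed`, so two calls
    with the same seed consume the same noise sequence.
    """
    rng = random.Random(seed)
    total = 0.1 * action              # assumed deterministic effect of the first action
    for _ in range(horizon):          # stand-in for steps taken by a rollout policy
        total += rng.gauss(0.0, 1.0)
    return total


def estimate_utilities(actions, seeds):
    """Average return per action, reusing the same seeds for every action (CRN)."""
    return {a: sum(simulate(a, s) for s in seeds) / len(seeds) for a in actions}


def plan(actions, num_simulations, base_seed=0):
    """Pick the action with the highest estimated utility."""
    seeds = [base_seed + i for i in range(num_simulations)]
    utilities = estimate_utilities(actions, seeds)
    return max(actions, key=lambda a: utilities[a])
```

Because the toy noise is purely additive, the shared draws cancel exactly in every pairwise difference, so `plan([0, 1, 2], 10)` selects the largest action regardless of the noise. A realistic sampling model would usually give only partial cancellation.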

Load-bearing premise

The sampling model must generate trajectories whose relative utilities can be compared under shared randomness without introducing bias.

What would settle it

An experiment that measures the variance of the difference between two action utilities and finds no reduction (or an increase) when common random numbers replace independent draws in the rollout phase.
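Such an experiment could look roughly like the following sketch. Everything here is an assumption made for illustration: the multiplicative toy dynamics, the `RISK` table, and the helper names `rollout_return` and `difference_variance` are not taken from the paper. The code compares the sample variance of the utility difference when the two actions use separate seeds versus a shared seed.

```python
import random
import statistics

# Hypothetical exposure of each action to the random shocks.
RISK = {"a": 0.5, "b": 1.0}


def rollout_return(action, seed, horizon=5):
    """Toy rollout: a value compounded by seeded random shocks."""
    rng = random.Random(seed)
    value = 1.0
    for _ in range(horizon):
        shock = rng.gauss(0.05, 0.2)
        value *= 1.0 + RISK[action] * shock
    return value


def difference_variance(num_pairs, common):
    """Sample variance of return(b) - return(a) over `num_pairs` paired rollouts."""
    diffs = []
    for i in range(num_pairs):
        seed_a = 2 * i
        # CRN: both actions share a seed; otherwise each action gets its own seed.
        seed_b = seed_a if common else 2 * i + 1
        diffs.append(rollout_return("b", seed_b) - rollout_return("a", seed_a))
    return statistics.variance(diffs)


var_independent = difference_variance(2000, common=False)
var_crn = difference_variance(2000, common=True)
```

In this toy setting the shared shocks push both returns in the same direction, so `var_crn` should come out well below `var_independent`. Observing the opposite on a real sampling model would be the kind of evidence described above.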

Figures

Figures reproduced from arXiv: 2605.04732 by Frederic J Maliakkal, Harshad Khadilkar, Sandarbh Yadav, Shivaram Kalyanakrishnan.

**Figure 1.** Figure (a) shows the MDP defined in the proof of Proposition …

**Figure 2.** Performance metrics against the number of simulations on synthetic tasks. Results (here …

**Figure 3.** Figure (a) explains the sequence of steps in the FTVAF task, while Figure (b) records the …

**Figure 4.** Ludo: Figure (a) shows the board, and Figure (b) the performance of simulation-based …

Original abstract

Simulation-based planning with rollouts is a widely-deployed technique for decision making in stochastic environments. The primary instrument of simulation-based planning is a sampling model, which is repeatedly called to generate trajectories and estimate the utilities of available actions. Among the actions thus explored, one with the maximum estimated utility is then executed. In this paper, we examine the effect of using common random numbers in the simulation process. We obtain a simple recipe for (provably) reducing variance in relative utility when simulations invoke a rollout policy beyond some depth. Experiments on synthetic tasks confirm that our scheme improves task performance. The broader significance of our innovation is apparent from two practical applications: (1) single-step lookahead planning in a pension-disbursement task, and (2) a deployment of the well-known UCT algorithm for the game of Ludo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper gives a clean, provable way to apply common random numbers to the rollout phase so relative utilities have lower variance, and the experiments back it without obvious holes.

The core result is a straightforward recipe that shares random numbers across simulations once the rollout policy takes over, which provably shrinks the variance of the differences between action values. Common random numbers are not a new idea in simulation; the novelty here is the targeted proof and the way it slots into existing planners. The synthetic tasks show the variance drop in practice, and the two real applications, one in a pension-disbursement problem and one running UCT on Ludo, demonstrate that the lower variance translates into better final decisions without extra bias or cost.

Referee Report

0 major / 4 minor

Summary. The manuscript proposes applying common random numbers (CRN) to the post-depth rollout phase of sampling models in simulation-based planning. It claims this yields a simple, provable reduction in the variance of relative action utilities (without biasing comparisons), leading to better action selection. The claim is supported by experiments on synthetic tasks showing improved performance, plus two applications: single-step lookahead in a pension-disbursement task and UCT for Ludo.

Significance. If the variance-reduction claim holds under the stated conditions, the technique offers a low-cost way to improve sample efficiency in rollout-based planners such as UCT/MCTS variants. The two practical applications provide concrete evidence of utility beyond synthetic benchmarks. The work correctly identifies and exploits the controllable randomness already present in many simulators.

minor comments (4)

[§3] §3 (or wherever the variance argument appears): the derivation would benefit from an explicit side-by-side comparison of Var(U_i - U_j) under independent sampling versus CRN, including the covariance term, to make the reduction factor transparent.
[Experiments] Experiments section: report the number of independent trials, standard errors, and any statistical tests for the performance gains on synthetic tasks and Ludo; without these the magnitude of improvement is hard to judge.
[Notation/§2] Notation: introduce the rollout-depth parameter d explicitly when first used and keep its symbol consistent throughout.
[Figures] Figure captions: ensure each figure states the number of simulations per action and whether error bars represent standard error or standard deviation.
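The side-by-side comparison requested in the first minor comment can be sketched with the standard variance identity. Here U_i and U_j denote the utility estimates for two actions; the subscripts "ind" and "CRN" are notation introduced for this sketch, not the paper's.

```latex
\begin{align*}
\operatorname{Var}(U_i - U_j)
  &= \operatorname{Var}(U_i) + \operatorname{Var}(U_j)
     - 2\,\operatorname{Cov}(U_i, U_j), \\[4pt]
\text{independent draws:}\quad
\operatorname{Var}_{\mathrm{ind}}(U_i - U_j)
  &= \operatorname{Var}(U_i) + \operatorname{Var}(U_j), \\[4pt]
\text{CRN:}\quad
\operatorname{Var}_{\mathrm{CRN}}(U_i - U_j)
  &= \operatorname{Var}_{\mathrm{ind}}(U_i - U_j)
     - 2\,\operatorname{Cov}_{\mathrm{CRN}}(U_i, U_j).
\end{align*}
```

Sharing randomness leaves each estimate's own variance unchanged, so the whole reduction is the term 2 Cov_CRN(U_i, U_j). It is a reduction only when that covariance is non-negative, which is the condition the variance argument has to establish.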

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report accurately captures the core contribution regarding common random numbers for variance reduction in post-depth rollouts.

Circularity Check

0 steps flagged

No significant circularity; standard CRN variance reduction applied to rollouts

The paper's central claim is a direct application of the well-known common random numbers (CRN) technique to reduce variance of relative utilities in post-depth rollouts. This follows from the standard property that shared randomness lowers the variance of the difference of two estimators, relative to independent sampling, whenever the two estimators are positively correlated; it does so without introducing bias, provided the simulator exposes controllable randomness. No equations reduce to self-definition, no fitted parameters are relabeled as predictions, and no self-citation chain or uniqueness theorem is invoked to force the result. The derivation is self-contained against external simulation benchmarks and probabilistic facts about CRN.
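The "without bias" part of this rationale rests on linearity of expectation, which can be written out as a one-line sketch. The subscripts "ind" and "CRN" are notation assumed here to distinguish the two sampling schemes.

```latex
\begin{equation*}
\mathbb{E}_{\mathrm{CRN}}[U_i - U_j]
  = \mathbb{E}[U_i] - \mathbb{E}[U_j]
  = \mathbb{E}_{\mathrm{ind}}[U_i - U_j].
\end{equation*}
```

The equality holds because CRN changes only the joint distribution of the two estimates, not the distribution of either one on its own.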

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly relies on standard assumptions of stochastic simulation models.

axioms (1)

domain assumption: The environment admits a sampling model that can generate trajectories under shared randomness.
Required for any simulation-based planning method described.

pith-pipeline@v0.9.0 · 5447 in / 1111 out tokens · 28150 ms · 2026-05-08T18:24:25.733346+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Statistics/MDP variance analysis, orthogonal to Cost.FunctionalEquation and Foundation forcing chain · none applicable · unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 2: var(X_DD) ≤ var(X_I) ... cov(V^{π1}_{M1}(s,t), V^{π2}_{M3}(s,t)) ≥ 0 by induction.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

2015 , publisher=

Goals-based wealth management: An integrated and practical approach to changing the structure of wealth advisory practices , author=. 2015 , publisher=

work page 2015

[2] [2]

International conference on machine learning , pages=

Asynchronous methods for deep reinforcement learning , author=. International conference on machine learning , pages=. 2016 , organization=

work page 2016

[3] [3]

Journal of political economy , volume=

The pricing of options and corporate liabilities , author=. Journal of political economy , volume=. 1973 , publisher=

work page 1973

[4] [4]

1992 , publisher=

Aalen, Odd O , journal=. 1992 , publisher=

work page 1992

[5] [5]

2006 , address =

Elizabeth Arias , title =. 2006 , address =

work page 2006

[6] [6]

Journal of portfolio management , volume=

The sharpe ratio , author=. Journal of portfolio management , volume=. 1994 , publisher=

work page 1994

[7] [7]

Nature , volume=

Human-level control through deep reinforcement learning , author=. Nature , volume=. 2015 , publisher=

work page 2015

[8] [8]

1997 , publisher=

Monte Carlo Simulation , author=. 1997 , publisher=

work page 1997

[9] [9]

Advances in neural information processing systems , volume=

Actor-critic algorithms , author=. Advances in neural information processing systems , volume=

work page

[10] [10]

Journal of financial and quantitative analysis , volume=

An analytic derivation of the efficient portfolio frontier , author=. Journal of financial and quantitative analysis , volume=. 1972 , publisher=

work page 1972

[11] [11]

Computational Management Science , volume=

Dynamic portfolio allocation in goals-based wealth management , author=. Computational Management Science , volume=. 2020 , publisher=

work page 2020

[12] [12]

2012 , publisher=

Dynamic programming and optimal control: Volume I , author=. 2012 , publisher=

work page 2012

[13] [13]

2007 , publisher=

Hidden Markov Models in Finance , author=. 2007 , publisher=

work page 2007

[14] [14]

2011 , publisher=

B. 2011 , publisher=

work page 2011

[15] [15]

Sutton, Richard S and Barto, Andrew G , year=

work page

[16] [16]

2024 , organization=

Das, Sanjiv R and Ostrov, Daniel and Mittal, Sukrit and Radhakrishnan, Anand and Srivastav, Deep Ratna and Wang, Hungjen , booktitle=. 2024 , organization=

work page 2024

[17] [17]

2011 , publisher=

Bacinello, Anna Rita and Millossovich, Pietro and Olivieri, Annamaria and Pitacco, Ermanno , journal=. 2011 , publisher=

work page 2011

[18] [18]

The Review of Economics and Statistics , urldate =

Lifetime Portfolio Selection By Dynamic Stochastic Programming , author =. The Review of Economics and Statistics , urldate =. 1969 , pages =

work page 1969

[19] [19]

2018 , title =

Das, Sanjiv Ranjan and Ostrov, Daniel N and Radhakrishnan, Anand and Srivastav, Deep , journal =. 2018 , title =. doi:10.2139/ssrn.3117765 , url =

work page doi:10.2139/ssrn.3117765 2018

[20] [20]

Journal of Banking & Finance , author =

Dynamic optimization for multi-goals wealth management , author =. Journal of Banking; Finance , publisher =. 2022 , month =. doi:10.1016/j.jbankfin.2021.106192 , url =

work page doi:10.1016/j.jbankfin.2021.106192 2022

[21] [21]

Jucker and Jorge Alberto Garcia Gomez , volume =

James V. Jucker and Jorge Alberto Garcia Gomez , volume =. 1975 , pages =

work page 1975

[22] [22]

Scientific Reports , publisher =

Quantifying the randomness of the stock markets , author =. Scientific Reports , publisher =. 2019 , month =. doi:10.1038/s41598-019-49320-9 , url =

work page doi:10.1038/s41598-019-49320-9 2019

[23] [23]

and Fleet, David J

Wang, Jack M. and Fleet, David J. and Hertzmann, Aaron , number =. ACM Transactions on Graphics , publisher =. 2010 , title =

work page 2010

[24] [24]

, publisher =

Spall, James C. , publisher =. 2003 , title =

work page 2003

[25] [25]

Ng, Andrew and Jordan, Michael , booktitle=

work page

[26] [26]

Annual Review of Statistics and Its Application , volume=

A review of reinforcement learning in financial applications , author=. Annual Review of Statistics and Its Application , volume=. 2025 , publisher=

work page 2025

[27] [27]

Available at SSRN 5289956 , year=

A Pre-trained Reinforcement Learning Approach to Goals-Based Wealth Management , author=. Available at SSRN 5289956 , year=

work page

[28] [28]

Expert systems with applications , volume=

Decision-making for financial trading: A fusion approach of machine learning and portfolio selection , author=. Expert systems with applications , volume=. 2019 , publisher=

work page 2019

[29] [29]

arXiv preprint arXiv:1301.7380 , year=

Solving POMDPs by searching in policy space , author=. arXiv preprint arXiv:1301.7380 , year=

work page arXiv

[30] [30]

Blackmore and B

L. Blackmore and B. Williams , journal =. 2007 , title =

work page 2007

[31] [31]

2003 , title =

Andrew Ng and H-jin Kim and Michael Jordan and Shankar Sastry , journal =. 2003 , title =

work page 2003

[32] [32]

Yao , volume =

Paul Glasserman and David D. Yao , volume =. 1992 , pages =

work page 1992

[33] [33]

2017 , title =

Tim Salimans and Jonathan Ho and Xi Chen and Ilya Sutskever , volume =. 2017 , title =

work page 2017

[34] [34]

2021 , pages =

Recent advances in reinforcement learning in finance , author =. 2021 , pages =

work page 2021

[35] [35]

2015 , title =

Phelim, Boyle and Mary, Hardy and Anne, MacKay and David, Saunders , journal =. 2015 , title =

work page 2015

[36] [36]

2003 , title =

Ng, Andrew , journal =. 2003 , title =

work page 2003

[37] [37]

2016 , title =

Volodymyr Mnih and Adri. 2016 , title =

work page 2016

[38] [38]

2021 , pages =

Stable-Baselines3: Reliable Reinforcement Learning Implementations , author =. 2021 , pages =

work page 2021

[39] [39]

1952 , pages =

PORTFOLIO SELECTION , author =. 1952 , pages =. doi:https://doi.org/10.1111/j.1540-6261.1952.tb01525.x , url =. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1540-6261.1952.tb01525.x , number =

work page doi:10.1111/j.1540-6261.1952.tb01525.x 1952

[40] [40]

and Saffell, M

Moody, J. and Saffell, M. , journal=. Learning to trade via direct reinforcement , year=

work page

[41] [41]

Sharing Longevity Risk: Why Governments Should Issue Longevity Bonds , volume =

Blake, David and Boardman, Tom and Cairns, Andrew , year =. Sharing Longevity Risk: Why Governments Should Issue Longevity Bonds , volume =. North American Actuarial Journal , doi =

work page

[42] [42]

Reinforcement learning for optimized trade execution , volume =

Nevmyvaka, Yuriy and Feng, Yi and Kearns, Michael , year =. Reinforcement learning for optimized trade execution , volume =. ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning , doi =

work page 2006

[43] [43]

2000 , journal =

Optimal execution of portfolio trans-actions , author=. 2000 , journal =

work page 2000

[44] [44]

2017 , eprint=

A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem , author=. 2017 , eprint=

work page 2017

[45] [45]

2019 , eprint=

QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds , author=. 2019 , eprint=

work page 2019

[46] [46]

2022 , eprint=

Deep Hedging: Continuous Reinforcement Learning for Hedging of General Portfolios across Multiple Risk Aversions , author=. 2022 , eprint=

work page 2022

[47] [47]

2011 , organization=

Alvi, Faisal and Ahmed, Moataz , booktitle=. 2011 , organization=

work page 2011

[48] [48]

2012 IEEE Conference on Computational Intelligence and Games (CIG) , pages=

TD ( ) and Q-learning based Ludo players , author=. 2012 IEEE Conference on Computational Intelligence and Games (CIG) , pages=. 2012 , organization=

work page 2012

[49] [49]

2023 7th IEEE Congress on Information Science and Technology (CiSt) , pages=

Incorporating Feature Penalty in Reinforcement Learning for Ludo Game , author=. 2023 7th IEEE Congress on Information Science and Technology (CiSt) , pages=. 2023 , organization=

work page 2023

[50] [50]

Vittori, Edoardo and Likmeta, Amarildo and Restelli, Marcello , booktitle=

work page

[51] [51]

2011 IEEE Conference on Computational Intelligence and Games (CIG'11) , pages=

Monte-Carlo tree search for the game of Scotland Yard , author=. 2011 IEEE Conference on Computational Intelligence and Games (CIG'11) , pages=. 2011 , organization=

work page 2011

[52] Hedging of financial derivative contracts via Monte Carlo tree search. Journal of Computational Finance.

[53] FinRDDL: Can AI planning be used for quantitative finance problems? FinPlan 2023.

[54] Silver, David; Huang, Aja; Maddison, Chris J; Guez, Arthur; Sifre, Laurent; Van Den Driessche, George; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; et al. 2016.

[55] Silver, David; Schrittwieser, Julian; Simonyan, Karen; Antonoglou, Ioannis; Huang, Aja; Guez, Arthur; Hubert, Thomas; Baker, Lucas; Lai, Matthew; Bolton, Adrian; et al. 2017.

[56] Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; et al. 2018.

[57] Model-based reinforcement learning: A survey. Foundations and Trends in Machine Learning, 2023.

[58] Discovering faster matrix multiplication algorithms with reinforcement learning. Nature, 2022.

[59] Faster sorting algorithms discovered using deep reinforcement learning. Nature, 2023.

[60] Dam, Tuan; Chalvatzaki, Georgia; Peters, Jan; Pajarinen, Joni. 2022.

[61] Energy-aware multi-goal motion planning guided by Monte Carlo search. In 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), 2020.

[62] Simon L.B. Sorensen. 2023.

[63] Sinclair, Sean R; Frujeri, Felipe Vieira; Cheng, Ching-An; Marshall, Luke; Barbalho, Hugo De Oliveira; Li, Jingling; Neville, Jennifer; Menache, Ishai; Swaminathan, Adith. 2023.

[64] Mao, Hongzi; Venkatakrishnan, Shaileshh Bojja; Schwarzkopf, Malte; Alizadeh, Mohammad.

[65] Chong, Edwin KP; Givan, Robert L; Chang, Hyeong Soo. 2000.

[66] Decision making under uncertainty: theory and application. 2015.

[67] Efroni, Yonathan; Foster, Dylan J; Misra, Dipendra; Krishnamurthy, Akshay; Langford, John. 2022.

[68] Discovering and removing exogenous state variables and rewards for reinforcement learning. In International Conference on Machine Learning, 2018.

[69] Stout, Natasha K; Goldie, Sue J. 2008.

[70] Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research.

[71] Using Monte-Carlo variance reduction in statistical tolerance synthesis. Computer-Aided Design, 1997.

[72] Hammersley, John Michael; Morton, Keith William. 1956.

[73] Separable Monte Carlo combined with importance sampling for variance reduction. International Journal of Reliability and Safety, 2013.

[74] Variance reduction three approaches to control variates. 2007.

[75] Glynn, Peter W; Szechtman, Roberto. 2002.

[76] Monte Carlo variance reduction with deterministic importance functions. Progress in Nuclear Energy, 2003.

[77] Variance Reduction in Monte Carlo Option Pricing: A Comparative Analysis of Control Variates, Multiple Control Variates and Antithetic Variates. Science and Technology of Engineering, Chemistry and Environmental Protection.

[78] A study of stratified sampling in variance reduction techniques for parametric yield estimation. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 2002.

[79] Monte Carlo gradient estimation in machine learning. Journal of Machine Learning Research.

[80] Lavenberg, Stephen S; Welch, Peter D. 1981.