Multilevel Monte-Carlo for Solving POMDPs Online

Alberto Elfes; Hanna Kurniawati; Marcus Hoerger

arxiv: 1907.09673 · v1 · pith:SGG6MY7Anew · submitted 2019-07-23 · 💻 cs.RO

Multilevel Monte-Carlo for Solving POMDPs Online

Marcus Hoerger , Hanna Kurniawati , Alberto Elfes This is my paper

Pith reviewed 2026-05-24 17:54 UTC · model grok-4.3

classification 💻 cs.RO

keywords POMDPMonte Carlo Tree SearchMultilevel Monte Carloonline planningroboticspartial observabilitymotion planning

0 comments

The pith

MLPP combines multilevel Monte Carlo with tree search to solve POMDPs with complex dynamics faster.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Multilevel POMDP Planner (MLPP) as a new online solver for Partially Observable Markov Decision Processes. It integrates the concept of multilevel Monte Carlo into Monte Carlo tree search to reduce the cost of forward simulations needed for estimating action outcomes. This is particularly useful for robotic systems with non-linear dynamics that are expensive to simulate and for problems with long planning horizons. A reader would care because it makes it feasible to plan under uncertainty in more realistic robot scenarios where previous methods were too slow.

Core claim

The central claim is that multilevel Monte Carlo can be combined with Monte Carlo tree search to produce faster yet sufficiently accurate value estimates for POMDPs with complex dynamics, enabling the generation of approximately optimal solutions online. The resulting MLPP method is shown through experiments on four problems to substantially outperform state-of-the-art POMDP solvers in torque control, navigation, and grasping tasks.

What carries the argument

The Multilevel POMDP Planner (MLPP), which merges multilevel Monte Carlo sampling with Monte Carlo tree search to accelerate computation of expected outcomes in POMDPs.

If this is right

MLPP allows solving POMDPs online for systems with complex non-linear dynamics that lack closed-form solutions.
It reduces the prohibitive expense of many forward simulations required in long-horizon planning.
The approach yields robust solutions where standard Monte Carlo methods are too computationally demanding.
Experiments demonstrate superior performance over existing POMDP solvers on robotic control and navigation problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

MLPP could enable POMDP-based planning to be used in more real-time robotic applications by lowering computational demands.
This multilevel approach might be adapted to other Monte Carlo-based methods in planning under uncertainty.
Testing on a wider range of dynamics models could reveal how broadly the speed-up applies.

Load-bearing premise

That multilevel Monte Carlo combined with MCTS produces value estimates accurate enough to support approximately optimal POMDP solutions without unacceptable bias or variance.

What would settle it

Experiments on the torque control or grasping problems where MLPP takes more time or yields lower-quality policies than standard POMDP solvers.

Figures

Figures reproduced from arXiv: 1907.09673 by Alberto Elfes, Hanna Kurniawati, Marcus Hoerger.

**Figure 1.** Figure 1: Test scenarios used to evaluate MLPP. (a) 4DOF-Factory (b) KukaOffice (c) CarNavigation (d) MovoGrasping 5.1 Problem scenarios with expensive transition dynamics 4DOF-Factory A torque-controlled manipulator with 4 revolute joints must move from an initial state to a state where the end-effector lies inside a goal region (colored green in [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗

**Figure 2.** Figure 2: Average variance of Ql (solid blue line) and Ql − Ql−1 (dashed red line) for the problem scenarios (a) 4DOF-Factory, (b) KukaOffice, (c) CarNavigation and (d) MovoGrasp. The x-axis represents the level l and the y-axis represents the variance. Average total discounted rewards [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗

**Figure 3.** Figure 3: Average total discounted reward of MLPP, ABT and POMCP on the 4DOF-Factory (a), KukaOffice (b), CarNavigation (c) and MovoGrasp (d) scenarios. The x-axis represents the level of approximation of the transition function used for planning. Note that MLPP uses all levels for planning (hence the horizontal lines), whereas ABT and POMCP use only a single level as indicated by the x-axis. For each scenario, the … view at source ↗

**Figure 4.** Figure 4: Average total discounted rewards for ABT, POMCP and MLPP for 4DOF-Factory using increasing planning times per step. The average is taken over 500 simulation runs for each planning time and algorithm. Vertical bars are the 95% confidence intervals. Increasing planning times [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

read the original abstract

Planning under partial obervability is essential for autonomous robots. A principled way to address such planning problems is the Partially Observable Markov Decision Process (POMDP). Although solving POMDPs is computationally intractable, substantial advancements have been achieved in developing approximate POMDP solvers in the past two decades. However, computing robust solutions for systems with complex dynamics remain challenging. Most on-line solvers rely on a large number of forward-simulations and standard Monte-Carlo methods to compute the expected outcomes of actions the robot can perform. For systems with complex dynamics, e.g., those with non-linear dynamics that admit no closed form solution, even a single forward simulation can be prohibitively expensive. Of course, this issue exacerbates for problems with long planning horizons. This paper aims to alleviate the above difficulty. To this end, we propose a new on-line POMDP solver, called Multilevel POMDP Planner (MLPP), that combines the commonly known Monte-Carlo-Tree-Search with the concept of Multilevel Monte-Carlo to speed-up our capability in generating approximately optimal solutions for POMDPs with complex dynamics. Experiments on four different problems of POMDP-based torque control, navigation and grasping indicate that MLPP substantially outperforms state-of-the-art POMDP solvers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MLPP combines multilevel Monte Carlo with MCTS for online POMDPs but the abstract supplies no integration details or quantitative results, leaving the performance claims uncheckable.

read the letter

The paper's core move is to replace standard Monte Carlo rollouts inside MCTS with a multilevel estimator so that POMDPs with expensive nonlinear simulations can still be solved online. The motivation is clear: long horizons and complex dynamics make each forward simulation costly, and the authors argue that multilevel sampling can cut that cost while keeping value estimates accurate enough for approximately optimal policies. That framing matches a known pain point in robotics planning under uncertainty. The four test domains (torque control, navigation, grasping) are reasonable choices for showing relevance. The claim that MLPP substantially outperforms existing solvers is stated directly. The soft spot is that none of the supporting evidence appears in the abstract. There is no description of how the multilevel levels are chosen, how bias or variance is controlled inside the tree, what the convergence argument is, or even the raw numbers from the experiments. Without those pieces it is impossible to judge whether the reported gains come from the method or from favorable problem instances. The work is therefore still at the stage of a plausible algorithmic sketch rather than a validated result. This paper is aimed at people already working on online POMDP solvers who need faster simulation handling. A reader in that group could extract the basic idea and try to implement it, but would have to fill in the missing analysis themselves. I would send it to peer review because the problem is real and the proposed direction is not obviously broken, even though the current version needs substantial additional technical content before it can be assessed properly.

Referee Report

1 major / 0 minor

Summary. The paper proposes the Multilevel POMDP Planner (MLPP), a new online solver that combines Monte-Carlo Tree Search with Multilevel Monte Carlo to generate approximately optimal solutions for POMDPs with complex dynamics more efficiently. It reports that experiments on four problems involving torque control, navigation, and grasping show that MLPP substantially outperforms state-of-the-art POMDP solvers.

Significance. If the combination of multilevel Monte Carlo with MCTS can indeed produce faster value estimates without introducing unacceptable bias or variance, this approach could enable more robust planning for autonomous robots in domains with non-linear dynamics and long horizons where standard Monte Carlo simulations are too expensive.

major comments (1)

[Abstract] Abstract: The abstract claims substantial outperformance on four problems but supplies no algorithm details, experimental design, quantitative metrics, or error analysis. This makes it impossible to evaluate the soundness of the central claim that the multilevel estimator maintains sufficient accuracy for approximately optimal solutions.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their comments. We address the major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: The abstract claims substantial outperformance on four problems but supplies no algorithm details, experimental design, quantitative metrics, or error analysis. This makes it impossible to evaluate the soundness of the central claim that the multilevel estimator maintains sufficient accuracy for approximately optimal solutions.

Authors: We agree that the abstract, due to its brevity, does not include algorithm details, experimental design, quantitative metrics, or error analysis. The manuscript body provides these elements, including the integration of multilevel Monte Carlo with MCTS, the four experimental domains, performance comparisons, and analysis of the estimator. To directly address the concern, we will revise the abstract to include key quantitative highlights and a brief statement on estimator accuracy. revision: yes

Circularity Check

0 steps flagged

No significant circularity; new algorithmic combination with independent validation

full rationale

The provided abstract and context describe MLPP as a novel combination of established MCTS and Multilevel Monte-Carlo techniques to accelerate value estimation in POMDPs with complex dynamics. No derivation equations, parameter-fitting steps, self-citations, or uniqueness claims are shown that reduce the claimed result to its inputs by construction. Experiments on four problems are presented as external validation rather than tautological outputs. This matches the default expectation of non-circular papers; the central claim remains an independent algorithmic proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to enumerate free parameters, axioms, or invented entities; the contribution is algorithmic rather than introducing new physical entities or unstated mathematical axioms beyond standard POMDP and Monte Carlo assumptions.

pith-pipeline@v0.9.0 · 5755 in / 1154 out tokens · 38012 ms · 2026-05-24T17:54:25.194441+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

[1]

In: Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ In- ternational Conference on, pp

Agha-Mohammadi, A.A., Chakravorty, S., Amato, N.M.: Firm: Feedback controller-based information-state roadmap-a framework for motion planning un- der uncertainty. In: Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ In- ternational Conference on, pp. 4284–4291. IEEE (2011)

work page 2011
[2]

Multiscale Modeling & Simula- tion 10(1), 146–179 (2012)

Anderson, D.F., Higham, D.J.: Multilevel monte carlo for continuous time markov chains, with applications in biochemical kinetics. Multiscale Modeling & Simula- tion 10(1), 146–179 (2012)

work page 2012
[3]

IEEE Transactions on signal processing 50(2), 174–188 (2002)

Arulampalam, M.S., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle ﬁlters for online nonlinear/non-gaussian bayesian tracking. IEEE Transactions on signal processing 50(2), 174–188 (2002)

work page 2002
[4]

Machine Learning 47(2-3), 235–256 (2002)

Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3), 235–256 (2002)

work page 2002
[5]

Robotics: Science and Systems VII 1, 1–8 (2012)

Bai, H., Hsu, D.: Unmanned aircraft collision avoidance using continuous-state pomdps. Robotics: Science and Systems VII 1, 1–8 (2012)

work page 2012
[6]

The International Journal of Robotics Research 33(9), 1288–1302 (2014)

Bai, H., Hsu, D., Lee, W.S.: Integrated perception and planning in the continuous space: A pomdp approach. The International Journal of Robotics Research 33(9), 1288–1302 (2014)

work page 2014
[7]

Journal of Computational Physics 314, 661–681 (2016)

Bierig, C., Chernov, A.: Approximation of probability density functions by the mul- tilevel monte carlo maximum entropy method. Journal of Computational Physics 314, 661–681 (2016)

work page 2016
[8]

Operations Research 56(3), 607–617 (2008)

Giles, M.B.: Multilevel monte carlo path simulation. Operations Research 56(3), 607–617 (2008)

work page 2008
[9]

Acta Numerica 24, 259–328 (2015)

Giles, M.B.: Multilevel monte carlo methods. Acta Numerica 24, 259–328 (2015)

work page 2015
[10]

In: Proceedings of the National Conference on Artiﬁcial Intelligence, vol

He, R., Brunskill, E., Roy, N.: Puma: Planning under uncertainty with macro- actions. In: Proceedings of the National Conference on Artiﬁcial Intelligence, vol. 2 (2010)

work page 2010
[11]

Heinrich, S.: Multilevel monte carlo methods. In: S. Margenov, J. Wa´ sniewski, P. Yalamov (eds.) Large-Scale Scientiﬁc Computing, pp. 58–67. Springer Berlin Heidelberg, Berlin, Heidelberg (2001)

work page 2001
[12]

In: Proc

Hoerger, M., Kurniawati, H., Elfes, A.: A software framework for planning under partial observability. In: Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1–9. IEEE (2018)

work page 2018
[13]

In: Proceedings of the 19th International Joint Conference on Artiﬁ- cial Intelligence, IJCAI’05, pp

Hoey, J., Poupart, P.: Solving pomdps with continuous or large discrete observa- tion spaces. In: Proceedings of the 19th International Joint Conference on Artiﬁ- cial Intelligence, IJCAI’05, pp. 1332–1338. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2005) 16 Marcus Hoerger et al

work page 2005
[14]

In: Robotics and Automation (ICRA), 2013 IEEE International Confer- ence on, pp

Horowitz, M., Burdick, J.: Interactive non-prehensile manipulation for grasping via pomdps. In: Robotics and Automation (ICRA), 2013 IEEE International Confer- ence on, pp. 3257–3264. IEEE (2013)

work page 2013
[15]

In: Robotics and Automation, 2007 IEEE International Conference on, pp

Hsiao, K., Kaelbling, L.P., Lozano-Perez, T.: Grasping pomdps. In: Robotics and Automation, 2007 IEEE International Conference on, pp. 4685–4692. IEEE (2007)

work page 2007
[16]

In: Proceedings of the Australasian Conference on Robotics and Automation (2014)

Klimenko, D., Song, J., Kurniawati, H.: Tapir: a software toolkit for approximat- ing and adapting pomdp solutions online. In: Proceedings of the Australasian Conference on Robotics and Automation (2014)

work page 2014
[17]

In: European con- ference on machine learning, pp

Kocsis, L., Szepesv´ ari, C.: Bandit based monte-carlo planning. In: European con- ference on machine learning, pp. 282–293. Springer (2006)

work page 2006
[18]

The International Journal of Robotics Research 30(3), 308–323 (2011)

Kurniawati, H., Du, Y., Hsu, D., Lee, W.S.: Motion planning under uncertainty for robotic tasks with long time horizons. The International Journal of Robotics Research 30(3), 308–323 (2011)

work page 2011
[19]

In: In Proc

Kurniawati, H., Hsu, D., Lee, W.S.: Sarsop: Eﬃcient point-based pomdp planning by approximating optimally reachable belief spaces. In: In Proc. Robotics: Science and Systems (2008)

work page 2008
[20]

In: Proc

Kurniawati, H., Yadav, V.: An online pomdp solver for uncertainty planning in dynamic environment. In: Proc. Int. Symp. on Robotics Research (2013)

work page 2013
[21]

The International Journal of Robotics Research p

Luo, Y., Bai, H., Hsu, D., Lee, W.S.: Importance sampling for online plan- ning under uncertainty. The International Journal of Robotics Research p. 0278364918780322

work page
[22]

Owen, A.B.: Monte Carlo theory, methods and examples (2013)

work page 2013
[23]

Mathematics of operations research 12(3), 441–450 (1987)

Papadimitriou, C.H., Tsitsiklis, J.N.: The complexity of markov decision processes. Mathematics of operations research 12(3), 441–450 (1987)

work page 1987
[24]

Pineau, J., Gordon, G., Thrun, S.: Point-based Value Iteration: An anytime algo- rithm for POMDPs (2003)

work page 2003
[25]

In: Proceedings of the Winter Simulation Conference, p

Rhee, C.h., Glynn, P.W.: A new approach to unbiased estimation for sde’s. In: Proceedings of the Winter Simulation Conference, p. 17. Winter Simulation Con- ference (2012)

work page 2012
[26]

In: Robotics and Automation (ICRA), 2015 IEEE International Conference on, pp

Seiler, K.M., Kurniawati, H., Singh, S.P.: An online and approximate solver for pomdps with continuous action space. In: Robotics and Automation (ICRA), 2015 IEEE International Conference on, pp. 2290–2297. IEEE (2015)

work page 2015
[27]

In: Advances in neural information processing systems, pp

Silver, D., Veness, J.: Monte-carlo planning in large POMDPs. In: Advances in neural information processing systems, pp. 2164–2172 (2010)

work page 2010
[28]

http://www.ode.org/

Smith, R.: Open dynamics engine. http://www.ode.org/

work page
[29]

Smith, T., Simmons, R.: Point-based POMDP algorithms: Improved analysis and implementation (2005)

work page 2005
[30]

In: Advances in neural information processing systems, pp

Somani, A., Ye, N., Hsu, D., Lee, W.S.: Despot: Online pomdp planning with regularization. In: Advances in neural information processing systems, pp. 1772– 1780 (2013)

work page 2013
[31]

Sondik, E.J.: The optimal control of partially observable markov decision processes. Ph.D. thesis, Stanford, California (1971)

work page 1971
[32]

Spong, M.W., Hutchinson, S., Vidyasagar, M.: Robot Modeling and Control, vol. 3. Wiley New York (2006)

work page 2006
[33]

In: Twenty-Eighth International Conference on Automated Planning and Scheduling (2018)

Sunberg, Z.N., Kochenderfer, M.J.: Online algorithms for pomdps with continuous state, action, and observation spaces. In: Twenty-Eighth International Conference on Automated Planning and Scheduling (2018)

work page 2018
[34]

MIT Press (2012)

Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press (2012)

work page 2012
[35]

In: ICAPS, pp

Wang, E., Kurniawati, H., Kroese, D.P.: An on-line planner for pomdps with large discrete action space: A quantile-based approach. In: ICAPS, pp. 273–277. AAAI Press (2018)

work page 2018

[1] [1]

In: Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ In- ternational Conference on, pp

Agha-Mohammadi, A.A., Chakravorty, S., Amato, N.M.: Firm: Feedback controller-based information-state roadmap-a framework for motion planning un- der uncertainty. In: Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ In- ternational Conference on, pp. 4284–4291. IEEE (2011)

work page 2011

[2] [2]

Multiscale Modeling & Simula- tion 10(1), 146–179 (2012)

Anderson, D.F., Higham, D.J.: Multilevel monte carlo for continuous time markov chains, with applications in biochemical kinetics. Multiscale Modeling & Simula- tion 10(1), 146–179 (2012)

work page 2012

[3] [3]

IEEE Transactions on signal processing 50(2), 174–188 (2002)

Arulampalam, M.S., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle ﬁlters for online nonlinear/non-gaussian bayesian tracking. IEEE Transactions on signal processing 50(2), 174–188 (2002)

work page 2002

[4] [4]

Machine Learning 47(2-3), 235–256 (2002)

Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3), 235–256 (2002)

work page 2002

[5] [5]

Robotics: Science and Systems VII 1, 1–8 (2012)

Bai, H., Hsu, D.: Unmanned aircraft collision avoidance using continuous-state pomdps. Robotics: Science and Systems VII 1, 1–8 (2012)

work page 2012

[6] [6]

The International Journal of Robotics Research 33(9), 1288–1302 (2014)

Bai, H., Hsu, D., Lee, W.S.: Integrated perception and planning in the continuous space: A pomdp approach. The International Journal of Robotics Research 33(9), 1288–1302 (2014)

work page 2014

[7] [7]

Journal of Computational Physics 314, 661–681 (2016)

Bierig, C., Chernov, A.: Approximation of probability density functions by the mul- tilevel monte carlo maximum entropy method. Journal of Computational Physics 314, 661–681 (2016)

work page 2016

[8] [8]

Operations Research 56(3), 607–617 (2008)

Giles, M.B.: Multilevel monte carlo path simulation. Operations Research 56(3), 607–617 (2008)

work page 2008

[9] [9]

Acta Numerica 24, 259–328 (2015)

Giles, M.B.: Multilevel monte carlo methods. Acta Numerica 24, 259–328 (2015)

work page 2015

[10] [10]

In: Proceedings of the National Conference on Artiﬁcial Intelligence, vol

He, R., Brunskill, E., Roy, N.: Puma: Planning under uncertainty with macro- actions. In: Proceedings of the National Conference on Artiﬁcial Intelligence, vol. 2 (2010)

work page 2010

[11] [11]

Heinrich, S.: Multilevel monte carlo methods. In: S. Margenov, J. Wa´ sniewski, P. Yalamov (eds.) Large-Scale Scientiﬁc Computing, pp. 58–67. Springer Berlin Heidelberg, Berlin, Heidelberg (2001)

work page 2001

[12] [12]

In: Proc

Hoerger, M., Kurniawati, H., Elfes, A.: A software framework for planning under partial observability. In: Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1–9. IEEE (2018)

work page 2018

[13] [13]

In: Proceedings of the 19th International Joint Conference on Artiﬁ- cial Intelligence, IJCAI’05, pp

Hoey, J., Poupart, P.: Solving pomdps with continuous or large discrete observa- tion spaces. In: Proceedings of the 19th International Joint Conference on Artiﬁ- cial Intelligence, IJCAI’05, pp. 1332–1338. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2005) 16 Marcus Hoerger et al

work page 2005

[14] [14]

In: Robotics and Automation (ICRA), 2013 IEEE International Confer- ence on, pp

Horowitz, M., Burdick, J.: Interactive non-prehensile manipulation for grasping via pomdps. In: Robotics and Automation (ICRA), 2013 IEEE International Confer- ence on, pp. 3257–3264. IEEE (2013)

work page 2013

[15] [15]

In: Robotics and Automation, 2007 IEEE International Conference on, pp

Hsiao, K., Kaelbling, L.P., Lozano-Perez, T.: Grasping pomdps. In: Robotics and Automation, 2007 IEEE International Conference on, pp. 4685–4692. IEEE (2007)

work page 2007

[16] [16]

In: Proceedings of the Australasian Conference on Robotics and Automation (2014)

Klimenko, D., Song, J., Kurniawati, H.: Tapir: a software toolkit for approximat- ing and adapting pomdp solutions online. In: Proceedings of the Australasian Conference on Robotics and Automation (2014)

work page 2014

[17] [17]

In: European con- ference on machine learning, pp

Kocsis, L., Szepesv´ ari, C.: Bandit based monte-carlo planning. In: European con- ference on machine learning, pp. 282–293. Springer (2006)

work page 2006

[18] [18]

The International Journal of Robotics Research 30(3), 308–323 (2011)

Kurniawati, H., Du, Y., Hsu, D., Lee, W.S.: Motion planning under uncertainty for robotic tasks with long time horizons. The International Journal of Robotics Research 30(3), 308–323 (2011)

work page 2011

[19] [19]

In: In Proc

Kurniawati, H., Hsu, D., Lee, W.S.: Sarsop: Eﬃcient point-based pomdp planning by approximating optimally reachable belief spaces. In: In Proc. Robotics: Science and Systems (2008)

work page 2008

[20] [20]

In: Proc

Kurniawati, H., Yadav, V.: An online pomdp solver for uncertainty planning in dynamic environment. In: Proc. Int. Symp. on Robotics Research (2013)

work page 2013

[21] [21]

The International Journal of Robotics Research p

Luo, Y., Bai, H., Hsu, D., Lee, W.S.: Importance sampling for online plan- ning under uncertainty. The International Journal of Robotics Research p. 0278364918780322

work page

[22] [22]

Owen, A.B.: Monte Carlo theory, methods and examples (2013)

work page 2013

[23] [23]

Mathematics of operations research 12(3), 441–450 (1987)

Papadimitriou, C.H., Tsitsiklis, J.N.: The complexity of markov decision processes. Mathematics of operations research 12(3), 441–450 (1987)

work page 1987

[24] [24]

Pineau, J., Gordon, G., Thrun, S.: Point-based Value Iteration: An anytime algo- rithm for POMDPs (2003)

work page 2003

[25] [25]

In: Proceedings of the Winter Simulation Conference, p

Rhee, C.h., Glynn, P.W.: A new approach to unbiased estimation for sde’s. In: Proceedings of the Winter Simulation Conference, p. 17. Winter Simulation Con- ference (2012)

work page 2012

[26] [26]

In: Robotics and Automation (ICRA), 2015 IEEE International Conference on, pp

Seiler, K.M., Kurniawati, H., Singh, S.P.: An online and approximate solver for pomdps with continuous action space. In: Robotics and Automation (ICRA), 2015 IEEE International Conference on, pp. 2290–2297. IEEE (2015)

work page 2015

[27] [27]

In: Advances in neural information processing systems, pp

Silver, D., Veness, J.: Monte-carlo planning in large POMDPs. In: Advances in neural information processing systems, pp. 2164–2172 (2010)

work page 2010

[28] [28]

http://www.ode.org/

Smith, R.: Open dynamics engine. http://www.ode.org/

work page

[29] [29]

Smith, T., Simmons, R.: Point-based POMDP algorithms: Improved analysis and implementation (2005)

work page 2005

[30] [30]

In: Advances in neural information processing systems, pp

Somani, A., Ye, N., Hsu, D., Lee, W.S.: Despot: Online pomdp planning with regularization. In: Advances in neural information processing systems, pp. 1772– 1780 (2013)

work page 2013

[31] [31]

Sondik, E.J.: The optimal control of partially observable markov decision processes. Ph.D. thesis, Stanford, California (1971)

work page 1971

[32] [32]

Spong, M.W., Hutchinson, S., Vidyasagar, M.: Robot Modeling and Control, vol. 3. Wiley New York (2006)

work page 2006

[33] [33]

In: Twenty-Eighth International Conference on Automated Planning and Scheduling (2018)

Sunberg, Z.N., Kochenderfer, M.J.: Online algorithms for pomdps with continuous state, action, and observation spaces. In: Twenty-Eighth International Conference on Automated Planning and Scheduling (2018)

work page 2018

[34] [34]

MIT Press (2012)

Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press (2012)

work page 2012

[35] [35]

In: ICAPS, pp

Wang, E., Kurniawati, H., Kroese, D.P.: An on-line planner for pomdps with large discrete action space: A quantile-based approach. In: ICAPS, pp. 273–277. AAAI Press (2018)

work page 2018