Hybrid Planning for Dynamic Multimodal Stochastic Shortest Paths

Mykel J. Kochenderfer; Shushman Choudhury

arxiv: 1906.09094 · v1 · pith:66IGA2PAnew · submitted 2019-06-21 · 💻 cs.AI

Hybrid Planning for Dynamic Multimodal Stochastic Shortest Paths

Shushman Choudhury , Mykel J. Kochenderfer This is my paper

Pith reviewed 2026-05-25 18:56 UTC · model grok-4.3

classification 💻 cs.AI

keywords hybrid stochastic planningdynamic multimodal stochastic shortest pathsautonomous multimodal routingMarkov decision processesheuristic searchapproximate dynamic programminghierarchical planning

0 comments

The pith

The Hybrid Stochastic Planning algorithm unifies heuristic search, approximate dynamic programming, and hierarchical execution to solve dynamic multimodal stochastic shortest path problems more effectively than prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Dynamic Multimodal Stochastic Shortest Paths as a class of Markov Decision Processes that capture uncertainty, discrete modes, continuous states, and dynamic updates in applications like vehicle routing and manipulation. It proposes the Hybrid Stochastic Planning algorithm, which relies on domain-agnostic abstractions to combine planning over discrete modes via heuristic search, stochastic planning over continuous states via approximate dynamic programming, and interleaved planning with execution via hierarchy. Empirical evaluation in autonomous multimodal routing shows HSP producing significantly higher quality solutions than Upper Confidence Trees and two-level Receding Horizon Control. A sympathetic reader would care because many practical sequential decisions involve uncertainty that deterministic planners handle poorly, and a unified method could improve reliability without separate specialized solvers.

Core claim

HSP uses domain-agnostic abstractions to efficiently unify heuristic search for planning over discrete modes, approximate dynamic programming for stochastic planning over continuous states, and hierarchical interleaved planning and execution, yielding significantly higher quality solutions for DMSSPs in the domain of autonomous multimodal routing than a state-of-the-art Upper Confidence Trees algorithm and a two-level Receding Horizon Control algorithm.

What carries the argument

The Hybrid Stochastic Planning (HSP) algorithm, which applies domain-agnostic abstractions to integrate heuristic search over discrete modes, approximate dynamic programming over continuous states, and hierarchical execution.

If this is right

HSP applies directly to other DMSSP instances such as warehouse manipulation and multi-step meal preparation.
HSP improves upon deterministic planners when uncertainty has downstream effects on mode choices and state transitions.
The hybrid structure supports reacting to dynamic updates during execution without full replanning from scratch.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same abstractions might extend to other hybrid discrete-continuous MDPs outside the paper's listed applications.
If the abstractions prove portable, HSP could serve as a template for combining search and dynamic programming in additional stochastic planning settings.
Empirical gains in routing suggest testing whether the method scales to larger mode sets or higher-dimensional continuous states.

Load-bearing premise

The chosen domain-agnostic abstractions are sufficient to unify heuristic search, approximate dynamic programming, and hierarchical execution without substantial loss of solution quality or computational tractability in the target domains.

What would settle it

Running the autonomous multimodal routing experiments and finding that HSP does not obtain significantly higher quality solutions than Upper Confidence Trees or two-level Receding Horizon Control would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 1906.09094 by Mykel J. Kochenderfer, Shushman Choudhury.

**Figure 2.** Figure 2: The global open-loop layer of HSP jointly decides the current valid mode sequence [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Our framework efficiently interleaves planning and execution. From time [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Average cost (1000 problems) for HSP (three [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Average cumulative trajectory cost to reach the goal, over [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗

read the original abstract

Sequential decision problems in applications such as manipulation in warehouses, multi-step meal preparation, and routing in autonomous vehicle networks often involve reasoning about uncertainty, planning over discrete modes as well as continuous states, and reacting to dynamic updates. To formalize such problems generally, we introduce a class of Markov Decision Processes (MDPs) called Dynamic Multimodal Stochastic Shortest Paths (DMSSPs). Much of the work in these domains solves deterministic variants, which can yield poor results when the uncertainty has downstream effects. We develop a Hybrid Stochastic Planning (HSP) algorithm, which uses domain-agnostic abstractions to efficiently unify heuristic search for planning over discrete modes, approximate dynamic programming for stochastic planning over continuous states, and hierarchical interleaved planning and execution. In the domain of autonomous multimodal routing, HSP obtains significantly higher quality solutions than a state-of-the-art Upper Confidence Trees algorithm and a two-level Receding Horizon Control algorithm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces DMSSPs and HSP as a hybrid that interleaves search, ADP, and hierarchical execution, with routing experiments showing gains over UCT and RHC, but the domain-agnostic claim needs direct checking against the setup.

read the letter

The core of this paper is a new MDP class called DMSSPs that captures planning problems with discrete modes, continuous states, and dynamic updates, plus the HSP algorithm that combines heuristic search over modes, approximate dynamic programming over the continuous parts, and hierarchical interleaved planning and execution. The routing experiments are the main evidence offered, where HSP produces higher-quality solutions than UCT and a two-level receding-horizon controller. That combination and the formalization appear new relative to the cited work. The paper does a straightforward job showing how the three established pieces can be glued together without new primitives, and the routing domain is a reasonable test case for the class. The performance numbers are the part that would interest people who actually need to solve these problems. The soft spot is exactly the stress-test point: whether the abstractions used to make HSP domain-agnostic were applied evenly to the baselines or whether they quietly gave HSP extra representational power in the routing instances. If the experiments keep the abstractions neutral, the quality gap holds; if not, the advantage may shrink under a stricter comparison. The abstract alone does not settle this, so the full experimental section matters. This is for researchers working on stochastic planning in robotics or autonomous systems who already know the three component techniques. It deserves peer review because the formalization is fresh and the empirical comparison exists, even if the evaluation details will need tightening.

Referee Report

2 major / 2 minor

Summary. The paper defines Dynamic Multimodal Stochastic Shortest Paths (DMSSPs), a class of MDPs capturing uncertainty, discrete modes, continuous states, and dynamic updates in domains such as autonomous routing. It introduces the Hybrid Stochastic Planning (HSP) algorithm, which employs domain-agnostic abstractions to combine heuristic search over modes, approximate dynamic programming over continuous states, and hierarchical interleaved planning/execution. In autonomous multimodal routing experiments, HSP is reported to yield significantly higher-quality solutions than UCT and two-level RHC.

Significance. If the central empirical claim holds and the abstractions prove domain-agnostic without hidden domain-specific tuning, the framework could provide a general, practical method for hybrid stochastic planning that bridges heuristic search and ADP in multimodal settings with dynamic updates.

major comments (2)

[Experiments / routing domain description] The central claim that HSP outperforms UCT and two-level RHC rests on the abstractions being domain-agnostic and introducing no substantial loss of quality or tractability. The manuscript must explicitly detail the construction of these abstractions for the routing domain (including any mode discretization or state aggregation choices) and demonstrate that equivalent representational power is available to the baselines; otherwise the quality gap may be an artifact of unequal modeling rather than the hybrid unification.
[Abstract and § on empirical evaluation] The abstract and evaluation sections state 'significantly higher quality solutions' but supply no quantitative metrics (e.g., expected cost, success rate), statistical tests, number of trials, or ablation results isolating the contribution of each abstraction component. These details are required to assess robustness of the performance claim.

minor comments (2)

[Formal definition] Notation for the DMSSP tuple and the abstraction operators should be introduced with a single consolidated definition rather than scattered across sections.
[Figures] Figure captions for the routing domain should include the specific parameter values used for uncertainty and mode transitions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript introducing DMSSPs and the HSP algorithm. We address each major comment below and commit to revisions that strengthen the empirical presentation without altering the core claims.

read point-by-point responses

Referee: [Experiments / routing domain description] The central claim that HSP outperforms UCT and two-level RHC rests on the abstractions being domain-agnostic and introducing no substantial loss of quality or tractability. The manuscript must explicitly detail the construction of these abstractions for the routing domain (including any mode discretization or state aggregation choices) and demonstrate that equivalent representational power is available to the baselines; otherwise the quality gap may be an artifact of unequal modeling rather than the hybrid unification.

Authors: We agree that the manuscript requires explicit details on abstraction construction to support the domain-agnostic claim and to confirm equivalent power for baselines. In the revision we will insert a new subsection in the experimental evaluation that specifies the mode discretization procedure and state aggregation choices used in the autonomous multimodal routing domain. We will also document how identical abstractions were provided to UCT and two-level RHC, thereby showing that observed quality differences stem from the hybrid unification rather than representational disparity. revision: yes
Referee: [Abstract and § on empirical evaluation] The abstract and evaluation sections state 'significantly higher quality solutions' but supply no quantitative metrics (e.g., expected cost, success rate), statistical tests, number of trials, or ablation results isolating the contribution of each abstraction component. These details are required to assess robustness of the performance claim.

Authors: We acknowledge that the current text lacks the requested quantitative details. The revised manuscript will augment both the abstract and evaluation section with tables reporting expected costs, success rates, the number of independent trials, results of statistical significance tests, and ablation experiments that isolate the contribution of each HSP component (heuristic search over modes, ADP over continuous states, and hierarchical interleaving). revision: yes

Circularity Check

0 steps flagged

No circularity detected; HSP is presented as an independent algorithmic construction

full rationale

The paper defines a new MDP class (DMSSPs) and constructs the HSP algorithm by combining heuristic search, approximate dynamic programming, and hierarchical execution through explicitly domain-agnostic abstractions. No equations, parameter fits, or self-citations are shown to reduce the claimed performance gains to tautological inputs or prior author results by construction. The empirical superiority claim rests on external comparisons (UCT, RHC) rather than any self-referential derivation, making the work self-contained against the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities can be extracted from the given text.

pith-pipeline@v0.9.0 · 5682 in / 1050 out tokens · 22269 ms · 2026-05-25T18:56:24.893928+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We develop a Hybrid Stochastic Planning (HSP) algorithm, which uses domain-agnostic abstractions to efficiently unify heuristic search for planning over discrete modes, approximate dynamic programming for stochastic planning over continuous states, and hierarchical interleaved planning and execution.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the cost-to-go function of the policy for a local MDP corresponding to the mode

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 2 internal anchors

[1]

An Analysis of Stochastic Shortest Path Problems

Dimitri P Bertsekas, John N Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems. Mathematics of Operations Research, 16(3):580–595, 1991

work page 1991
[2]

Combined Task and Motion Planning for Mobile Manipulation

Jason Wolfe, Bhaskara Marthi, and Stuart Russell. Combined Task and Motion Planning for Mobile Manipulation. In International Conference on Automated Planning and Scheduling (ICAPS), 2010

work page 2010
[3]

Integrated Task Planning and Control for Mobile Manipulators

Jindong Tan, Ning Xi, and Yuechao Wang. Integrated Task Planning and Control for Mobile Manipulators. The International Journal of Robotics Research, 22(5):337–354, 2003

work page 2003
[4]

Dynamic Real-time Multimodal Routing with Hierarchical Hybrid Planning

Shushman Choudhury and Mykel J Kochenderfer. Dynamic Real-time Multimodal Routing with Hierarchical Hybrid Planning. arXiv preprint arXiv:1902.01560, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902
[5]

Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

Thomas G Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artiﬁcial Intelligence Research, 13:227–303, 2000

work page 2000
[6]

Reinforcement Learning and Dynamic Programming using Function Approximators

Lucian Busoniu, Robert Babuska, Bart De Schutter, and Damien Ernst. Reinforcement Learning and Dynamic Programming using Function Approximators. CRC press, 2010

work page 2010
[7]

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Martin L Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 1994

work page 1994
[8]

Arbitrarily Modulated Markov Decision Processes

Jia Yuan Yu and Shie Mannor. Arbitrarily Modulated Markov Decision Processes. In IEEE Conference on Decision and Control (CDC), pages 2946–2953. IEEE, 2009

work page 2009
[9]

The Online Loop-free Stochastic Shortest- Path Problem

Gergely Neu, András György, and Csaba Szepesvári. The Online Loop-free Stochastic Shortest- Path Problem. In Conference on Learning Theory, pages 231–243, 2010

work page 2010
[10]

Solving Factored MDPs with Hybrid State and Action Variables

Branislav Kveton, Milos Hauskrecht, and Carlos Guestrin. Solving Factored MDPs with Hybrid State and Action Variables. Journal of Artiﬁcial Intelligence Research, 27:153–201, 2006

work page 2006
[11]

A Formal Basis for the Heuristic Determination of Minimum Cost Paths

Peter Hart, Nils Nilsson, and Bertram Raphael. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics, 2(4):100–107, 1968

work page 1968
[12]

Heuristics: Intelligent Search Strategies for Computer Problem Solving

Judea Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. The Addison-Wesley Series in Artiﬁcial Intelligence, Reading, Mass.: Addison-Wesley, 1985, Reprinted version, 1985

work page 1985
[13]

Artiﬁcial Intelligence: A Modern Approach

Stuart Russell and Peter Norvig. Artiﬁcial Intelligence: A Modern Approach. Prentice Hall, 2003

work page 2003
[14]

The FF Planning System: Fast Plan Generation through Heuristic Search

Jörg Hoffmann and Bernhard Nebel. The FF Planning System: Fast Plan Generation through Heuristic Search. Journal of Artiﬁcial Intelligence Research, 14:253–302, 2001

work page 2001
[15]

Dynamic Programming and Optimal Control, volume 1

Dimitri P Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientiﬁc Belmont, MA, 2005

work page 2005
[16]

University of California, Berkeley, 1998

Ronald Edward Parr and Stuart Russell.Hierarchical Control and Learning for Markov Decision Processes. University of California, Berkeley, 1998

work page 1998
[17]

Hierarchical Solution of Markov Decision Processes using Macro-actions

Milos Hauskrecht, Nicolas Meuleau, Leslie Pack Kaelbling, Thomas Dean, and Craig Boutilier. Hierarchical Solution of Markov Decision Processes using Macro-actions. In Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 220–229. Morgan Kaufmann Publishers Inc., 1998

work page 1998
[18]

Hierarchical Approaches

Bernhard Hengst. Hierarchical Approaches. In Reinforcement Learning, pages 293–323. Springer, 2012

work page 2012
[19]

Interleaving Planning and Execution

Illah Reza Nourbakhsh. Interleaving Planning and Execution. In Interleaving Planning and Execution for Autonomous Robots, pages 53–64. Springer, 1997. 9

work page 1997
[20]

Interleaving Temporal Planning and Execution in Robotics Domains

Solange Lemai and Félix Ingrand. Interleaving Temporal Planning and Execution in Robotics Domains. In AAAI Conference on Artiﬁcial Intelligence (AAAI) , volume 4, pages 617–622, 2004

work page 2004
[21]

A Hybridized Planner for Stochastic Domains

Mausam, Piergiorgio Bertoli, and Daniel S Weld. A Hybridized Planner for Stochastic Domains. In International Joint Conference on Artiﬁcial Intelligence (IJCAI), pages 1972–1978, 2007

work page 1972
[22]

Heuristic Search for Generalized Stochastic Shortest Path MDPs

Andrey Kolobov, Mausam Mausam, Daniel S Weld, and Hector Geffner. Heuristic Search for Generalized Stochastic Shortest Path MDPs. In International Conference on Automated Planning and Scheduling (ICAPS), 2011

work page 2011
[23]

Planning with Markov Decision Processes: An AI Perspective

Andrey Kolobov. Planning with Markov Decision Processes: An AI Perspective. Synthesis Lectures on Artiﬁcial Intelligence and Machine Learning, 6(1):1–210, 2012

work page 2012
[24]

Using Local Trajectory Optimizers to speed up Global Optimization in Dynamic Programming

Christopher G Atkeson. Using Local Trajectory Optimizers to speed up Global Optimization in Dynamic Programming. In Advances in Neural Information Processing Systems , pages 663–670, 1994

work page 1994
[25]

Regionally Accelerated Batch Informed Trees (RABIT*): A Framework to Integrate Local Information into Optimal Path Planning

Sanjiban Choudhury, Jonathan D Gammell, Timothy D Barfoot, Siddhartha S Srinivasa, and Sebastian Scherer. Regionally Accelerated Batch Informed Trees (RABIT*): A Framework to Integrate Local Information into Optimal Path Planning. In IEEE International Conference on Robotics and Automation (ICRA), pages 4207–4214. IEEE, 2016

work page 2016
[26]

Combined Task and Motion Planning through an Extensible Planner-independent Interface Layer

Siddharth Srivastava, Eugene Fang, Lorenzo Riano, Rohan Chitnis, Stuart Russell, and Pieter Abbeel. Combined Task and Motion Planning through an Extensible Planner-independent Interface Layer. In IEEE International Conference on Robotics and Automation (ICRA), pages 639–646. IEEE, 2014

work page 2014
[27]

An Anytime Algorithm for Task and Motion MDPs

Siddharth Srivastava, Nishant Desai, Richard Freedman, and Shlomo Zilberstein. An Anytime Algorithm for Task and Motion MDPs. arXiv preprint arXiv:1802.05835, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[28]

Integrated Task and Motion Planning in Belief Space

Leslie Pack Kaelbling and Tomás Lozano-Pérez. Integrated Task and Motion Planning in Belief Space. International Journal of Robotics Research, 32(9-10):1194–1227, 2013

work page 2013
[29]

Hierarchical Task and Motion Planning in the now

Leslie Pack Kaelbling and Tomás Lozano-Pérez. Hierarchical Task and Motion Planning in the now. In IEEE International Conference on Robotics and Automation (ICRA), pages 1470–1477. IEEE, 2011

work page 2011
[30]

Implicit Belief-space Pre-images for Hierar- chical Planning and Execution

Leslie Pack Kaelbling and Tomás Lozano-Pérez. Implicit Belief-space Pre-images for Hierar- chical Planning and Execution. In IEEE International Conference on Robotics and Automation (ICRA), pages 5455–5462. IEEE, 2016

work page 2016
[31]

Logic-Geometric Programming: An Optimization-based Approach to Com- bined Task and Motion Planning

Marc Toussaint. Logic-Geometric Programming: An Optimization-based Approach to Com- bined Task and Motion Planning. In International Joint Conference on Artiﬁcial Intelligence (IJCAI), 2015

work page 2015
[32]

Automatic Grasp Planning using Shape Primitives

AT Miller, S Knoop, HI Christensen, and PK Allen. Automatic Grasp Planning using Shape Primitives. In IEEE International Conference on Robotics and Automation (ICRA), volume 2, pages 1824–1829. IEEE, 2003

work page 2003
[33]

Landmarks, Critical Paths and Abstractions: What’s the Difference Anyway? In International Conference on Automated Planning and Scheduling (ICAPS), pages 162–169

Malte Helmert and Carmel Domshlak. Landmarks, Critical Paths and Abstractions: What’s the Difference Anyway? In International Conference on Automated Planning and Scheduling (ICAPS), pages 162–169. AAAI Press, 2009

work page 2009
[34]

Benchmarking Deep Reinforcement Learning for Continuous Control

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking Deep Reinforcement Learning for Continuous Control. In International Conference on Machine Learning (ICML), pages 1329–1338, 2016

work page 2016
[35]

Analysis of Periodic and Event-driven Rescheduling Policies in Dynamic Shops

Laura K Church and Reha Uzsoy. Analysis of Periodic and Event-driven Rescheduling Policies in Dynamic Shops. International Journal of Computer Integrated Manufacturing, 5(3):153–163, 1992

work page 1992
[36]

Bandit Based Monte-Carlo Planning

Levente Kocsis and Csaba Szepesvári. Bandit Based Monte-Carlo Planning. European Confer- ence on Machine Learning (ECML), pages 282–293, 2006. 10

work page 2006
[37]

Prost: Probabilistic Planning based on UCT

Thomas Keller and Patrick Eyerich. Prost: Probabilistic Planning based on UCT. In Interna- tional Conference on Automated Planning and Scheduling (ICAPS) , pages 119–127. AAAI Press, 2012

work page 2012
[38]

The 2014 International Planning Competition: Progress and Trends

Mauro Vallati, Lukáš Chrpa, Marek Grzes, Thomas L McCluskey, Mark Roberts, and Scott Sanner. The 2014 International Planning Competition: Progress and Trends. AI Magazine, 36 (3):90–98, 2015

work page 2014
[39]

Continuous Upper Conﬁdence Trees

Adrien Couëtoux, Jean-Baptiste Hoock, Nataliya Sokolovska, Olivier Teytaud, and Nicolas Bonnard. Continuous Upper Conﬁdence Trees. Learning and Intelligent Optimization, pages 433–445, 2011

work page 2011
[40]

POMDPs.jl: A Framework for Sequential Decision Making under Uncertainty

Maxim Egorov, Zachary N Sunberg, Edward Balaban, Tim A Wheeler, Jayesh K Gupta, and Mykel J Kochenderfer. POMDPs.jl: A Framework for Sequential Decision Making under Uncertainty. Journal of Machine Learning Research, 18(26):1–5, 2017. 11 All references are from the original bibliography. A Comments on HSP Optimality We brieﬂy stated in section 2 how the p...

work page 2017

[1] [1]

An Analysis of Stochastic Shortest Path Problems

Dimitri P Bertsekas, John N Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems. Mathematics of Operations Research, 16(3):580–595, 1991

work page 1991

[2] [2]

Combined Task and Motion Planning for Mobile Manipulation

Jason Wolfe, Bhaskara Marthi, and Stuart Russell. Combined Task and Motion Planning for Mobile Manipulation. In International Conference on Automated Planning and Scheduling (ICAPS), 2010

work page 2010

[3] [3]

Integrated Task Planning and Control for Mobile Manipulators

Jindong Tan, Ning Xi, and Yuechao Wang. Integrated Task Planning and Control for Mobile Manipulators. The International Journal of Robotics Research, 22(5):337–354, 2003

work page 2003

[4] [4]

Dynamic Real-time Multimodal Routing with Hierarchical Hybrid Planning

Shushman Choudhury and Mykel J Kochenderfer. Dynamic Real-time Multimodal Routing with Hierarchical Hybrid Planning. arXiv preprint arXiv:1902.01560, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902

[5] [5]

Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

Thomas G Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artiﬁcial Intelligence Research, 13:227–303, 2000

work page 2000

[6] [6]

Reinforcement Learning and Dynamic Programming using Function Approximators

Lucian Busoniu, Robert Babuska, Bart De Schutter, and Damien Ernst. Reinforcement Learning and Dynamic Programming using Function Approximators. CRC press, 2010

work page 2010

[7] [7]

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Martin L Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 1994

work page 1994

[8] [8]

Arbitrarily Modulated Markov Decision Processes

Jia Yuan Yu and Shie Mannor. Arbitrarily Modulated Markov Decision Processes. In IEEE Conference on Decision and Control (CDC), pages 2946–2953. IEEE, 2009

work page 2009

[9] [9]

The Online Loop-free Stochastic Shortest- Path Problem

Gergely Neu, András György, and Csaba Szepesvári. The Online Loop-free Stochastic Shortest- Path Problem. In Conference on Learning Theory, pages 231–243, 2010

work page 2010

[10] [10]

Solving Factored MDPs with Hybrid State and Action Variables

Branislav Kveton, Milos Hauskrecht, and Carlos Guestrin. Solving Factored MDPs with Hybrid State and Action Variables. Journal of Artiﬁcial Intelligence Research, 27:153–201, 2006

work page 2006

[11] [11]

A Formal Basis for the Heuristic Determination of Minimum Cost Paths

Peter Hart, Nils Nilsson, and Bertram Raphael. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics, 2(4):100–107, 1968

work page 1968

[12] [12]

Heuristics: Intelligent Search Strategies for Computer Problem Solving

Judea Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. The Addison-Wesley Series in Artiﬁcial Intelligence, Reading, Mass.: Addison-Wesley, 1985, Reprinted version, 1985

work page 1985

[13] [13]

Artiﬁcial Intelligence: A Modern Approach

Stuart Russell and Peter Norvig. Artiﬁcial Intelligence: A Modern Approach. Prentice Hall, 2003

work page 2003

[14] [14]

The FF Planning System: Fast Plan Generation through Heuristic Search

Jörg Hoffmann and Bernhard Nebel. The FF Planning System: Fast Plan Generation through Heuristic Search. Journal of Artiﬁcial Intelligence Research, 14:253–302, 2001

work page 2001

[15] [15]

Dynamic Programming and Optimal Control, volume 1

Dimitri P Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientiﬁc Belmont, MA, 2005

work page 2005

[16] [16]

University of California, Berkeley, 1998

Ronald Edward Parr and Stuart Russell.Hierarchical Control and Learning for Markov Decision Processes. University of California, Berkeley, 1998

work page 1998

[17] [17]

Hierarchical Solution of Markov Decision Processes using Macro-actions

Milos Hauskrecht, Nicolas Meuleau, Leslie Pack Kaelbling, Thomas Dean, and Craig Boutilier. Hierarchical Solution of Markov Decision Processes using Macro-actions. In Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 220–229. Morgan Kaufmann Publishers Inc., 1998

work page 1998

[18] [18]

Hierarchical Approaches

Bernhard Hengst. Hierarchical Approaches. In Reinforcement Learning, pages 293–323. Springer, 2012

work page 2012

[19] [19]

Interleaving Planning and Execution

Illah Reza Nourbakhsh. Interleaving Planning and Execution. In Interleaving Planning and Execution for Autonomous Robots, pages 53–64. Springer, 1997. 9

work page 1997

[20] [20]

Interleaving Temporal Planning and Execution in Robotics Domains

Solange Lemai and Félix Ingrand. Interleaving Temporal Planning and Execution in Robotics Domains. In AAAI Conference on Artiﬁcial Intelligence (AAAI) , volume 4, pages 617–622, 2004

work page 2004

[21] [21]

A Hybridized Planner for Stochastic Domains

Mausam, Piergiorgio Bertoli, and Daniel S Weld. A Hybridized Planner for Stochastic Domains. In International Joint Conference on Artiﬁcial Intelligence (IJCAI), pages 1972–1978, 2007

work page 1972

[22] [22]

Heuristic Search for Generalized Stochastic Shortest Path MDPs

Andrey Kolobov, Mausam Mausam, Daniel S Weld, and Hector Geffner. Heuristic Search for Generalized Stochastic Shortest Path MDPs. In International Conference on Automated Planning and Scheduling (ICAPS), 2011

work page 2011

[23] [23]

Planning with Markov Decision Processes: An AI Perspective

Andrey Kolobov. Planning with Markov Decision Processes: An AI Perspective. Synthesis Lectures on Artiﬁcial Intelligence and Machine Learning, 6(1):1–210, 2012

work page 2012

[24] [24]

Using Local Trajectory Optimizers to speed up Global Optimization in Dynamic Programming

Christopher G Atkeson. Using Local Trajectory Optimizers to speed up Global Optimization in Dynamic Programming. In Advances in Neural Information Processing Systems , pages 663–670, 1994

work page 1994

[25] [25]

Regionally Accelerated Batch Informed Trees (RABIT*): A Framework to Integrate Local Information into Optimal Path Planning

Sanjiban Choudhury, Jonathan D Gammell, Timothy D Barfoot, Siddhartha S Srinivasa, and Sebastian Scherer. Regionally Accelerated Batch Informed Trees (RABIT*): A Framework to Integrate Local Information into Optimal Path Planning. In IEEE International Conference on Robotics and Automation (ICRA), pages 4207–4214. IEEE, 2016

work page 2016

[26] [26]

Combined Task and Motion Planning through an Extensible Planner-independent Interface Layer

Siddharth Srivastava, Eugene Fang, Lorenzo Riano, Rohan Chitnis, Stuart Russell, and Pieter Abbeel. Combined Task and Motion Planning through an Extensible Planner-independent Interface Layer. In IEEE International Conference on Robotics and Automation (ICRA), pages 639–646. IEEE, 2014

work page 2014

[27] [27]

An Anytime Algorithm for Task and Motion MDPs

Siddharth Srivastava, Nishant Desai, Richard Freedman, and Shlomo Zilberstein. An Anytime Algorithm for Task and Motion MDPs. arXiv preprint arXiv:1802.05835, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[28] [28]

Integrated Task and Motion Planning in Belief Space

Leslie Pack Kaelbling and Tomás Lozano-Pérez. Integrated Task and Motion Planning in Belief Space. International Journal of Robotics Research, 32(9-10):1194–1227, 2013

work page 2013

[29] [29]

Hierarchical Task and Motion Planning in the now

Leslie Pack Kaelbling and Tomás Lozano-Pérez. Hierarchical Task and Motion Planning in the now. In IEEE International Conference on Robotics and Automation (ICRA), pages 1470–1477. IEEE, 2011

work page 2011

[30] [30]

Implicit Belief-space Pre-images for Hierar- chical Planning and Execution

Leslie Pack Kaelbling and Tomás Lozano-Pérez. Implicit Belief-space Pre-images for Hierar- chical Planning and Execution. In IEEE International Conference on Robotics and Automation (ICRA), pages 5455–5462. IEEE, 2016

work page 2016

[31] [31]

Logic-Geometric Programming: An Optimization-based Approach to Com- bined Task and Motion Planning

Marc Toussaint. Logic-Geometric Programming: An Optimization-based Approach to Com- bined Task and Motion Planning. In International Joint Conference on Artiﬁcial Intelligence (IJCAI), 2015

work page 2015

[32] [32]

Automatic Grasp Planning using Shape Primitives

AT Miller, S Knoop, HI Christensen, and PK Allen. Automatic Grasp Planning using Shape Primitives. In IEEE International Conference on Robotics and Automation (ICRA), volume 2, pages 1824–1829. IEEE, 2003

work page 2003

[33] [33]

Landmarks, Critical Paths and Abstractions: What’s the Difference Anyway? In International Conference on Automated Planning and Scheduling (ICAPS), pages 162–169

Malte Helmert and Carmel Domshlak. Landmarks, Critical Paths and Abstractions: What’s the Difference Anyway? In International Conference on Automated Planning and Scheduling (ICAPS), pages 162–169. AAAI Press, 2009

work page 2009

[34] [34]

Benchmarking Deep Reinforcement Learning for Continuous Control

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking Deep Reinforcement Learning for Continuous Control. In International Conference on Machine Learning (ICML), pages 1329–1338, 2016

work page 2016

[35] [35]

Analysis of Periodic and Event-driven Rescheduling Policies in Dynamic Shops

Laura K Church and Reha Uzsoy. Analysis of Periodic and Event-driven Rescheduling Policies in Dynamic Shops. International Journal of Computer Integrated Manufacturing, 5(3):153–163, 1992

work page 1992

[36] [36]

Bandit Based Monte-Carlo Planning

Levente Kocsis and Csaba Szepesvári. Bandit Based Monte-Carlo Planning. European Confer- ence on Machine Learning (ECML), pages 282–293, 2006. 10

work page 2006

[37] [37]

Prost: Probabilistic Planning based on UCT

Thomas Keller and Patrick Eyerich. Prost: Probabilistic Planning based on UCT. In Interna- tional Conference on Automated Planning and Scheduling (ICAPS) , pages 119–127. AAAI Press, 2012

work page 2012

[38] [38]

The 2014 International Planning Competition: Progress and Trends

Mauro Vallati, Lukáš Chrpa, Marek Grzes, Thomas L McCluskey, Mark Roberts, and Scott Sanner. The 2014 International Planning Competition: Progress and Trends. AI Magazine, 36 (3):90–98, 2015

work page 2014

[39] [39]

Continuous Upper Conﬁdence Trees

Adrien Couëtoux, Jean-Baptiste Hoock, Nataliya Sokolovska, Olivier Teytaud, and Nicolas Bonnard. Continuous Upper Conﬁdence Trees. Learning and Intelligent Optimization, pages 433–445, 2011

work page 2011

[40] [40]

POMDPs.jl: A Framework for Sequential Decision Making under Uncertainty

Maxim Egorov, Zachary N Sunberg, Edward Balaban, Tim A Wheeler, Jayesh K Gupta, and Mykel J Kochenderfer. POMDPs.jl: A Framework for Sequential Decision Making under Uncertainty. Journal of Machine Learning Research, 18(26):1–5, 2017. 11 All references are from the original bibliography. A Comments on HSP Optimality We brieﬂy stated in section 2 how the p...

work page 2017