Leveraging Experience in Lazy Search

Byron Boots; Mohak Bhardwaj; Sanjiban Choudhury; Siddhartha Srinivasa

arxiv: 1907.07238 · v1 · pith:USTJH3TAnew · submitted 2019-07-16 · 💻 cs.RO · cs.LG

Leveraging Experience in Lazy Search

Mohak Bhardwaj , Sanjiban Choudhury , Byron Boots , Siddhartha Srinivasa This is my paper

Pith reviewed 2026-05-24 20:45 UTC · model grok-4.3

classification 💻 cs.RO cs.LG

keywords lazy searchmotion planningimitation learningedge evaluationgraph searchroboticsMDP

0 comments

The pith

A policy learned by imitating oracular edge selectors solves similar motion planning problems with fewer edge evaluations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that lazy graph search for motion planning can be accelerated by learning an edge evaluation ordering from prior experience. It formulates selector choice as an MDP on the search state and trains a policy by imitating oracles that know the optimal ordering on training problems. A sympathetic reader would care because edge evaluation, typically a collision check, dominates runtime, and a learned ordering that invalidates bad paths early reduces the total number of checks. If new problems are similar enough to the training distribution, the policy transfers without hand-crafted heuristics.

Core claim

Lazy search defers edge evaluations by repeatedly finding the shortest potentially feasible path and checking its edges until a feasible path is confirmed. The central claim is that oracular selectors, which solve the MDP optimally on training instances, can supervise an imitation learning procedure that produces a policy effective on new instances drawn from a similar distribution, thereby minimizing the cumulative number of edge evaluations required.

What carries the argument

MDP on the state of the search problem, solved during training by oracular selectors and transferred via imitation learning to a runtime policy for edge ordering.

If this is right

The learned selector outperforms commonly used heuristics on a range of 2D and 7D motion planning problems when the similarity condition holds.
A good edge ordering both identifies invalid edges quickly and removes many future candidate paths from consideration.
Formulating selection as an MDP makes the problem amenable to imitation learning once oracles are available.
Solving the large MDP exactly is intractable, so the oracle-plus-imitation route is required to obtain a usable policy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same oracle-imitation pipeline could be applied to other search settings where the cost of evaluating actions dominates, such as symbolic planning with expensive precondition checks.
Collecting experience from deployed robots would allow the policy to improve continuously as the robot encounters new but related environments.
If similarity breaks down, an online adaptation mechanism that fine-tunes the policy on recently solved problems could restore performance without full retraining.

Load-bearing premise

New search problems must be sufficiently similar to the problems solved during training.

What would settle it

Apply the learned policy to a test set of motion planning problems drawn from a distribution clearly different from the training distribution and check whether the number of edge evaluations exceeds that of standard heuristics.

Figures

Figures reproduced from arXiv: 1907.07238 by Byron Boots, Mohak Bhardwaj, Sanjiban Choudhury, Siddhartha Srinivasa.

**Figure 1.** Figure 1: The LAZYSP [2] framework. LAZYSP iteratively computes the shortest path, queries a SELECTOR for an edge on the path, evaluates it and updates the graph until a feasible path is found. The number of edges evaluated depends on the choice of SELECTOR. We propose to train a SELECTOR from prior data. (verify if they are valid) and return a path ξ ∗ upon halting. Two conditions must be met: 1) The returned path … view at source ↗

**Figure 2.** Figure 2: Overview of STROLL- a training procedure for a SELECTOR to select edges to evaluate in the LAZYSP framework. In each training iteration, a world map φ is sampled. The learner is executed upto a time step to get a state st which is the set of edges evaluated and their outcomes. The learner has to decide which edge to evaluate on the current shortest path. It extracts features from every edge - we use a set … view at source ↗

**Figure 3.** Figure 3: Distribution over worlds for (a) Environment 1 and (b) Environment [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Average reward per epsiode of Tabular Q-learning. [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Edges evaluated (green valid, red invalid) on a world from B [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: (a) STROLL-R (green) vs STROLL (orange) (b) Densification of data. (a) (b) 0 25 50 75 100 125 150 175 Training Episodes -20 -40 -60 -80 -100 -120 -140 -160 Average Rewards 0 20 40 60 80 100 -40 -50 -60 -70 -80 -90 -100 -110 -120 Rewards Percentage Mixing StrOLL Q-Learning StrOLL Alternate [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: (a) Running average reward for 200 episodes of training. Q learning [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

**Figure 8.** Figure 8: Weight bins depicting relative importance of each feature learned by the learner. S [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

read the original abstract

Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid, but also eliminates future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that we can compute oracular selectors that can solve the MDP during training. With access to such oracles, we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2D and 7D problems and show that the learned selector outperforms baseline commonly used heuristics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They frame edge ordering in lazy motion planning as an MDP and train a selector by imitating oracles, but the transfer claim rests on unmeasured similarity.

read the letter

The main move is casting the choice of which edge to evaluate next as an MDP on the search state, then using oracles that know the optimal ordering during training to supervise an imitation-learned policy. That specific combination for this bottleneck looks new compared to the usual heuristics. It gives a concrete way to reuse experience from solved problems when the new instances come from a similar distribution, and the abstract reports that the resulting selector beats standard baselines on their 2D and 7D test sets. That is useful framing for anyone who already runs lazy search and wants to amortize the cost across a family of problems. The soft spot is exactly the one the stress-test note flags: success is stated to require that new problems are sufficiently similar to the training ones, yet no metric, distribution-shift test, or ablation on dissimilar instances is described. The abstract also gives no numbers, error bars, train/test protocol, or details on how the oracles or state features are constructed, so the strength of the empirical support is hard to judge from what is here. This is aimed at robotics researchers who work on motion planning with expensive edge checks. A reader in that niche could extract a practical idea if the full experiments hold up. I would send it to peer review so the details and generalization tests can be examined properly.

Referee Report

3 major / 2 minor

Summary. The paper formulates edge selection ordering in lazy graph search for motion planning as an MDP on the search state. It computes oracular selectors during training and applies imitation learning to obtain policies. The central claim is that, when new problems are sufficiently similar to those seen in training, the learned policy selects an effective ordering and solves instances faster than standard heuristics; this is supported by reported outperformance on a range of 2D and 7D problems.

Significance. If the generalization claim holds, the work offers a practical way to leverage prior experience to reduce edge evaluations, a key bottleneck in lazy planning. The oracle-based imitation learning approach is a clear strength, as it supplies high-quality training signals without requiring the policy to be defined circularly. The method could influence experience-driven planning pipelines in robotics, provided the similarity assumption can be made operational.

major comments (3)

[Abstract and §5] Abstract and §5 (Experiments): the reported outperformance on 2D/7D problems supplies no error bars, confidence intervals, or statistical tests comparing the learned selector against baselines, so the magnitude and reliability of the improvement cannot be assessed.
[Abstract and §4] Abstract (final paragraph) and §4 (MDP and policy): the transfer claim is conditioned on 'sufficiently similar' problems, yet no metric, embedding distance, or distribution-shift experiment is defined or performed; without this, it is impossible to separate genuine generalization from memorization of the training distribution.
[§5] §5 (Experiments): no train/test split protocol, cross-validation procedure, or ablation on deliberately dissimilar instances is described, leaving the central generalization assumption untested and load-bearing for the main result.

minor comments (2)

[§4] The abstract states that oracles are 'computed externally' but the manuscript never specifies the exact procedure or computational cost of obtaining them; a short paragraph in §4 would clarify this.
[§3] Notation for the state representation and similarity function is introduced without an explicit equation or pseudocode block, making the MDP definition harder to follow on first reading.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify gaps in the experimental reporting and the need to better substantiate the generalization claim. We address each point below and will make the corresponding revisions.

read point-by-point responses

Referee: [Abstract and §5] Abstract and §5 (Experiments): the reported outperformance on 2D/7D problems supplies no error bars, confidence intervals, or statistical tests comparing the learned selector against baselines, so the magnitude and reliability of the improvement cannot be assessed.

Authors: We agree. The original results reported only mean performance. In the revision we will add per-problem standard deviations across repeated runs with different random seeds and include statistical comparisons (paired Wilcoxon tests) between the learned policy and each baseline. revision: yes
Referee: [Abstract and §4] Abstract (final paragraph) and §4 (MDP and policy): the transfer claim is conditioned on 'sufficiently similar' problems, yet no metric, embedding distance, or distribution-shift experiment is defined or performed; without this, it is impossible to separate genuine generalization from memorization of the training distribution.

Authors: The manuscript states the claim conditionally. We acknowledge that the absence of a quantitative similarity measure leaves the generalization claim under-supported. We will add to §4 a concrete similarity metric based on the Wasserstein distance between obstacle-position distributions and will include a controlled distribution-shift experiment in the revised §5. revision: yes
Referee: [§5] §5 (Experiments): no train/test split protocol, cross-validation procedure, or ablation on deliberately dissimilar instances is described, leaving the central generalization assumption untested and load-bearing for the main result.

Authors: We will revise §5 to document the exact train/test generation procedure (distinct seeds, same parametric family) and add an explicit ablation that systematically increases dissimilarity (number and placement of obstacles) while measuring degradation of the learned policy relative to the oracle. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained imitation learning from external oracles

full rationale

The paper defines an MDP on search state, computes oracular selectors externally to solve the MDP at training time, then applies imitation learning to obtain a policy. This chain does not reduce any quantity to itself by definition, nor does it rename a fitted parameter as a prediction, nor rely on self-citation for a uniqueness claim. The conditional statement about similarity to training problems is an explicit generalization assumption rather than a tautological step. No load-bearing equation or premise collapses to its own input.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the ability to compute oracular selectors during training and on the transferability of the learned policy under a similarity assumption between training and test problems. No explicit free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption The state of the lazy search problem can be modeled as an MDP whose actions correspond to edge evaluations.
Abstract states: 'We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem.'

pith-pipeline@v0.9.0 · 5729 in / 1247 out tokens · 21565 ms · 2026-05-24T20:45:53.691238+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We map edge selection in lazy search to an MDP (Section II) and solve it for small graphs (Section III). We show that larger MDPs can be efficiently solved by imitating clairvoyant oracles (Section IV).
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 1 (Clairvoyant Oracle as Set Cover)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 3 internal anchors

[1]

Lazy collision checking in asymptotically- optimal motion planning

Kris Hauser. Lazy collision checking in asymptotically- optimal motion planning. In ICRA, 2015

work page 2015
[2]

A unifying formalism for shortest path problems with ex- pensive edge evaluations via lazy best-ﬁrst search over paths with edge selectors

Christopher M Dellin and Siddhartha S Srinivasa. A unifying formalism for shortest path problems with ex- pensive edge evaluations via lazy best-ﬁrst search over paths with edge selectors. In ICAPS, 2016

work page 2016
[3]

Path planning using lazy prm

Robert Bohlin and Lydia E Kavraki. Path planning using lazy prm. In ICRA, 2000

work page 2000
[4]

piano movers’

Jacob T Schwartz and Micha Sharir. On the “piano movers’” problem i. the case of a two-dimensional rigid polygonal body moving amidst polygonal barriers. Com- munications on pure and applied mathematics , 36(3): 345–398, 1983

work page 1983
[5]

Near-optimal edge evalu- ation in explicit generalized binomial graphs

Sanjiban Choudhury, Shervin JAvdani, Siddhartha Srini- vasa, and Sebastian Scherer. Near-optimal edge evalu- ation in explicit generalized binomial graphs. In NIPS, 2017

work page 2017
[6]

Choudhury, S.S

S. Choudhury, S.S. Srinivasa, and S. Scherer. Bayesian active edge evaluation on expensive graphs. In IJCAI, 2018

work page 2018
[7]

Data-driven planning via imitation learning

Sanjiban Choudhury, Mohak Bhardwaj, Sankalp Arora, Ashish Kapoor, Gireeja Ranade, Sebastian Scherer, and Debadeepta Dey. Data-driven planning via imitation learning. IJRR, 2017

work page 2017
[8]

Deeply aggrevated: Dif- ferentiable imitation learning for sequential prediction

Wen Sun, Arun Venkatraman, Geoffrey J Gordon, Byron Boots, and J Andrew Bagnell. Deeply aggrevated: Dif- ferentiable imitation learning for sequential prediction. In International Conference on Machine Learning, pages 3309–3318, 2017

work page 2017
[9]

A reduction of imitation learning and structured prediction to no-regret online learning

St ´ephane Ross, Geoffrey J Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, volume 1, page 6, 2011

work page 2011
[10]

Procaccia, Oren Salzman, and Siddhartha S

Nika Haghtalab, Simon Mackenzie, Ariel D. Procaccia, Oren Salzman, and Siddhartha S. Srinivasa. The Provable Virtue of Laziness in Motion Planning. pages 106– 113, 2018. URL https://aaai.org/ocs/index.php/ICAPS/ ICAPS18/paper/view/17726

work page 2018
[11]

Q-learning

Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–292, 1992

work page 1992
[12]

Stable function approximation in dy- namic programming

Geoffrey J Gordon. Stable function approximation in dy- namic programming. In Machine Learning Proceedings 1995, pages 261–268. Elsevier, 1995

work page 1995
[13]

Reinforcement and imitation learning via interactive no-regret learning

Stephane Ross and J Andrew Bagnell. Reinforcement and imitation learning via interactive no-regret learning. arXiv, 2014

work page 2014
[14]

Planning single-arm manipulations with n-arm robots

Benjamin Cohen, Mike Phillips, and Maxim Likhachev. Planning single-arm manipulations with n-arm robots. In Eigth Annual Symposium on Combinatorial Search , 2015

work page 2015
[15]

Lazy Receding Horizon A* for Efﬁcient Path Planning in Graphs with Expensive-to-Evaluate Edges

Aditya Mandalika, Oren Salzman, and Siddhartha Srini- vasa. Lazy Receding Horizon A* for Efﬁcient Path Planning in Graphs with Expensive-to-Evaluate Edges. pages 476–484, 2018

work page 2018
[16]

Gammell, Siddhartha S

Jonathan D. Gammell, Siddhartha S. Srinivasa, and Tim- othy D. Barfoot. Batch Informed Trees: Sampling- based optimal planning via heuristically guided search of random geometric graphs. In ICRA, 2015

work page 2015
[17]

A 2 level fuzzy prm for manipulation planning

Christian L Nielsen and Lydia E Kavraki. A 2 level fuzzy prm for manipulation planning. In IROS, 2000

work page 2000
[18]

Heuris- tic search on graphs with existence priors for expensive- to-evaluate edges

Venkatraman Narayanan and Maxim Likhachev. Heuris- tic search on graphs with existence priors for expensive- to-evaluate edges. In ICAPS, 2017

work page 2017
[19]

Learning high- dimensional mixture models for fast collision detection in rapidly-exploring random trees

Jinwook Huh and Daniel D Lee. Learning high- dimensional mixture models for fast collision detection in rapidly-exploring random trees. In ICRA, 2016

work page 2016
[20]

Pareto-optimal search over conﬁg- uration space beliefs for anytime motion planning

Shushman Choudhury, Christopher M Dellin, and Sid- dhartha S Srinivasa. Pareto-optimal search over conﬁg- uration space beliefs for anytime motion planning. In IROS, 2016

work page 2016
[21]

Free-conﬁguration biased sampling for motion planning

Joshua Bialkowski, Michael Otte, and Emilio Frazzoli. Free-conﬁguration biased sampling for motion planning. In IROS, 2013

work page 2013
[22]

Path planning in expansive conﬁguration spaces

David Hsu, J-C Latombe, and Rajeev Motwani. Path planning in expansive conﬁguration spaces. In ICRA, 1997

work page 1997
[23]

Sampling-based mo- tion planning using predictive models

Brendan Burns and Oliver Brock. Sampling-based mo- tion planning using predictive models. In ICRA, 2005

work page 2005
[24]

Burs of free c-space: a novel structure for path planning

Bakir Lacevic, Dinko Osmankovic, and Adnan Ade- movic. Burs of free c-space: a novel structure for path planning. In ICRA. IEEE, 2016

work page 2016
[25]

Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning

Wen Sun, J Andrew Bagnell, and Byron Boots. Truncated horizon policy search: Combining reinforcement learning & imitation learning. arXiv preprint arXiv:1805.11240 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[26]

Fast Policy Learning through Imitation and Reinforcement

Ching-An Cheng, Xinyan Yan, Nolan Wagener, and Byron Boots. Fast policy learning through imitation and reinforcement. arXiv preprint arXiv:1805.10413 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[27]

Learning heuristic search via imitation

Mohak Bhardwaj, Sanjiban Choudhury, and Sebastian Scherer. Learning heuristic search via imitation. In CoRL, 2017

work page 2017
[28]

Plato: Policy learning using adaptive trajectory optimization

Gregory Kahn, Tianhao Zhang, Sergey Levine, and Pieter Abbeel. Plato: Policy learning using adaptive trajectory optimization. In ICRA, 2017

work page 2017
[29]

Learning from the Hindsight Plan -- Episodic MPC Improvement

Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, and Pieter Abbeel. Learning from the hind- sight plan–episodic mpc improvement. arXiv preprint arXiv:1609.09001, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[1] [1]

Lazy collision checking in asymptotically- optimal motion planning

Kris Hauser. Lazy collision checking in asymptotically- optimal motion planning. In ICRA, 2015

work page 2015

[2] [2]

A unifying formalism for shortest path problems with ex- pensive edge evaluations via lazy best-ﬁrst search over paths with edge selectors

Christopher M Dellin and Siddhartha S Srinivasa. A unifying formalism for shortest path problems with ex- pensive edge evaluations via lazy best-ﬁrst search over paths with edge selectors. In ICAPS, 2016

work page 2016

[3] [3]

Path planning using lazy prm

Robert Bohlin and Lydia E Kavraki. Path planning using lazy prm. In ICRA, 2000

work page 2000

[4] [4]

piano movers’

Jacob T Schwartz and Micha Sharir. On the “piano movers’” problem i. the case of a two-dimensional rigid polygonal body moving amidst polygonal barriers. Com- munications on pure and applied mathematics , 36(3): 345–398, 1983

work page 1983

[5] [5]

Near-optimal edge evalu- ation in explicit generalized binomial graphs

Sanjiban Choudhury, Shervin JAvdani, Siddhartha Srini- vasa, and Sebastian Scherer. Near-optimal edge evalu- ation in explicit generalized binomial graphs. In NIPS, 2017

work page 2017

[6] [6]

Choudhury, S.S

S. Choudhury, S.S. Srinivasa, and S. Scherer. Bayesian active edge evaluation on expensive graphs. In IJCAI, 2018

work page 2018

[7] [7]

Data-driven planning via imitation learning

Sanjiban Choudhury, Mohak Bhardwaj, Sankalp Arora, Ashish Kapoor, Gireeja Ranade, Sebastian Scherer, and Debadeepta Dey. Data-driven planning via imitation learning. IJRR, 2017

work page 2017

[8] [8]

Deeply aggrevated: Dif- ferentiable imitation learning for sequential prediction

Wen Sun, Arun Venkatraman, Geoffrey J Gordon, Byron Boots, and J Andrew Bagnell. Deeply aggrevated: Dif- ferentiable imitation learning for sequential prediction. In International Conference on Machine Learning, pages 3309–3318, 2017

work page 2017

[9] [9]

A reduction of imitation learning and structured prediction to no-regret online learning

St ´ephane Ross, Geoffrey J Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, volume 1, page 6, 2011

work page 2011

[10] [10]

Procaccia, Oren Salzman, and Siddhartha S

Nika Haghtalab, Simon Mackenzie, Ariel D. Procaccia, Oren Salzman, and Siddhartha S. Srinivasa. The Provable Virtue of Laziness in Motion Planning. pages 106– 113, 2018. URL https://aaai.org/ocs/index.php/ICAPS/ ICAPS18/paper/view/17726

work page 2018

[11] [11]

Q-learning

Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–292, 1992

work page 1992

[12] [12]

Stable function approximation in dy- namic programming

Geoffrey J Gordon. Stable function approximation in dy- namic programming. In Machine Learning Proceedings 1995, pages 261–268. Elsevier, 1995

work page 1995

[13] [13]

Reinforcement and imitation learning via interactive no-regret learning

Stephane Ross and J Andrew Bagnell. Reinforcement and imitation learning via interactive no-regret learning. arXiv, 2014

work page 2014

[14] [14]

Planning single-arm manipulations with n-arm robots

Benjamin Cohen, Mike Phillips, and Maxim Likhachev. Planning single-arm manipulations with n-arm robots. In Eigth Annual Symposium on Combinatorial Search , 2015

work page 2015

[15] [15]

Lazy Receding Horizon A* for Efﬁcient Path Planning in Graphs with Expensive-to-Evaluate Edges

Aditya Mandalika, Oren Salzman, and Siddhartha Srini- vasa. Lazy Receding Horizon A* for Efﬁcient Path Planning in Graphs with Expensive-to-Evaluate Edges. pages 476–484, 2018

work page 2018

[16] [16]

Gammell, Siddhartha S

Jonathan D. Gammell, Siddhartha S. Srinivasa, and Tim- othy D. Barfoot. Batch Informed Trees: Sampling- based optimal planning via heuristically guided search of random geometric graphs. In ICRA, 2015

work page 2015

[17] [17]

A 2 level fuzzy prm for manipulation planning

Christian L Nielsen and Lydia E Kavraki. A 2 level fuzzy prm for manipulation planning. In IROS, 2000

work page 2000

[18] [18]

Heuris- tic search on graphs with existence priors for expensive- to-evaluate edges

Venkatraman Narayanan and Maxim Likhachev. Heuris- tic search on graphs with existence priors for expensive- to-evaluate edges. In ICAPS, 2017

work page 2017

[19] [19]

Learning high- dimensional mixture models for fast collision detection in rapidly-exploring random trees

Jinwook Huh and Daniel D Lee. Learning high- dimensional mixture models for fast collision detection in rapidly-exploring random trees. In ICRA, 2016

work page 2016

[20] [20]

Pareto-optimal search over conﬁg- uration space beliefs for anytime motion planning

Shushman Choudhury, Christopher M Dellin, and Sid- dhartha S Srinivasa. Pareto-optimal search over conﬁg- uration space beliefs for anytime motion planning. In IROS, 2016

work page 2016

[21] [21]

Free-conﬁguration biased sampling for motion planning

Joshua Bialkowski, Michael Otte, and Emilio Frazzoli. Free-conﬁguration biased sampling for motion planning. In IROS, 2013

work page 2013

[22] [22]

Path planning in expansive conﬁguration spaces

David Hsu, J-C Latombe, and Rajeev Motwani. Path planning in expansive conﬁguration spaces. In ICRA, 1997

work page 1997

[23] [23]

Sampling-based mo- tion planning using predictive models

Brendan Burns and Oliver Brock. Sampling-based mo- tion planning using predictive models. In ICRA, 2005

work page 2005

[24] [24]

Burs of free c-space: a novel structure for path planning

Bakir Lacevic, Dinko Osmankovic, and Adnan Ade- movic. Burs of free c-space: a novel structure for path planning. In ICRA. IEEE, 2016

work page 2016

[25] [25]

Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning

Wen Sun, J Andrew Bagnell, and Byron Boots. Truncated horizon policy search: Combining reinforcement learning & imitation learning. arXiv preprint arXiv:1805.11240 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[26] [26]

Fast Policy Learning through Imitation and Reinforcement

Ching-An Cheng, Xinyan Yan, Nolan Wagener, and Byron Boots. Fast policy learning through imitation and reinforcement. arXiv preprint arXiv:1805.10413 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[27] [27]

Learning heuristic search via imitation

Mohak Bhardwaj, Sanjiban Choudhury, and Sebastian Scherer. Learning heuristic search via imitation. In CoRL, 2017

work page 2017

[28] [28]

Plato: Policy learning using adaptive trajectory optimization

Gregory Kahn, Tianhao Zhang, Sergey Levine, and Pieter Abbeel. Plato: Policy learning using adaptive trajectory optimization. In ICRA, 2017

work page 2017

[29] [29]

Learning from the Hindsight Plan -- Episodic MPC Improvement

Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, and Pieter Abbeel. Learning from the hind- sight plan–episodic mpc improvement. arXiv preprint arXiv:1609.09001, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016