OSCAR: Obstacle Survival Curves for Adaptive Robot Navigation
Pith reviewed 2026-06-28 17:27 UTC · model grok-4.3
The pith
A robot learns class-specific obstacle clearance distributions from experience to set optimal patience thresholds for waiting or rerouting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that online learning of residual clearance-time distributions conditioned on obstacle class, incorporating right-censored observations, enables a planner to compute dynamic patience thresholds that balance waiting and rerouting costs more effectively than fixed rules or heuristics.
What carries the argument
Class-conditioned residual clearance-time distributions learned from online experience with right-censored data, used to set patience thresholds in a time-dependent graph planner.
If this is right
- The policy converges to within 1% of oracle time-to-goal after fewer than 20 observations per obstacle class in simulation.
- It outperforms all heuristic baselines in simulation.
- Real-world tests show the policy improves its patience thresholds across 50 navigation episodes.
- The planner uses the models to decide patience at each blocked edge while maintaining obstacle memory.
Where Pith is reading between the lines
- The method could be extended to cases where obstacle classes must be inferred from sensor data rather than provided at encounter.
- Similar survival modeling might improve decisions in other robotics tasks involving uncertain event durations such as task scheduling.
- Shared clearance models across multiple robots could accelerate learning for the group in collaborative settings.
Load-bearing premise
Obstacle class labels must be available at the time each blockage is encountered.
What would settle it
If the learned policy in simulation does not approach the oracle's time-to-goal within 1% after 20 observations per class, or fails to improve over heuristics, the central claim would be falsified.
Figures
read the original abstract
A mobile robot following a graph of known routes can make costly navigation errors when a temporary obstacle blocks a critical edge: waiting too long behind a parked cart wastes time, but immediately rerouting around a person who would move in a few seconds is also inefficient. Standard reactive obstacle avoidance addresses local motion around obstacles, while fixed wait-or-reroute rules ignore how long different obstacle types tend to persist. We propose OSCAR: an adaptive survival-modeling framework for graph-based navigation with temporary blockages. Assuming obstacle class labels are available at encounter time, the robot learns class-conditioned residual clearance-time distributions from online experience, including right-censored observations when it reroutes before observing clearance. These survival models are integrated into a time-dependent graph planner that maintains obstacle memory and computes a patience threshold at each blocked edge: how long to wait before taking an alternate route. The method continuously updates its clearance estimates across episodes and uses them to balance waiting against rerouting. We evaluate the approach in simulation and on a real mobile robot in a university atrium with obstacles including people, chairs, bins, and tubes. In simulation, the learned policy's time-to-goal converges to within 1% of an oracle with access to ground-truth clearance distributions after fewer than 20 observations per obstacle class, outperforming all heuristic baselines. Real-world deployment confirms that the policy improves online, adapting its patience thresholds from experience across 50 navigation episodes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes OSCAR, a framework for graph-based robot navigation that learns class-conditioned residual clearance-time distributions online from navigation episodes (including right-censored observations when rerouting occurs before clearance) and integrates these survival models into a time-dependent planner that computes patience thresholds for waiting versus rerouting at blocked edges. It assumes obstacle class labels are available at encounter and claims that, in simulation, the learned policy's time-to-goal converges to within 1% of an oracle with ground-truth distributions after fewer than 20 observations per class while outperforming heuristic baselines; real-robot experiments in a university atrium with people, chairs, bins, and tubes are reported to show online improvement across 50 episodes.
Significance. If the convergence result holds after addressing potential biases in the survival estimation, the work would offer a data-driven alternative to fixed wait-or-reroute heuristics for temporary obstacles, with the online adaptation and explicit handling of censored clearance times as notable technical contributions. The simulation-to-real transfer and explicit comparison to an oracle provide a clear benchmark for assessing practical utility.
major comments (2)
- [Abstract] Abstract: the headline simulation claim (convergence to within 1% of oracle after <20 observations per class) rests on class-conditioned clearance distributions converging to ground truth, yet the censoring mechanism is policy-dependent—the patience threshold is computed from the current running model, so early inaccurate estimates directly determine the censoring times. This violates the non-informative censoring assumption required by standard survival estimators (Kaplan-Meier or parametric), creating a feedback loop whose effect on convergence is not analyzed.
- [Abstract] Abstract: no information is supplied on the concrete survival model (parametric family, non-parametric estimator, or mixture), the exact fitting procedure across episodes, how right-censored observations are incorporated, the statistical test or confidence interval used to assert “within 1%” convergence, or the precise definition of the heuristic baselines and their parameter settings.
minor comments (1)
- [Abstract] The assumption that obstacle class labels are available at encounter time is stated but its impact on real-world applicability is not quantified.
Simulated Author's Rebuttal
We thank the referee for highlighting important methodological clarifications needed in the abstract and for raising the issue of policy-dependent censoring. We address both points below and will revise the manuscript accordingly to strengthen the presentation of the survival modeling approach and its assumptions.
read point-by-point responses
-
Referee: [Abstract] Abstract: the headline simulation claim (convergence to within 1% of oracle after <20 observations per class) rests on class-conditioned clearance distributions converging to ground truth, yet the censoring mechanism is policy-dependent—the patience threshold is computed from the current running model, so early inaccurate estimates directly determine the censoring times. This violates the non-informative censoring assumption required by standard survival estimators (Kaplan-Meier or parametric), creating a feedback loop whose effect on convergence is not analyzed.
Authors: The referee correctly identifies that the online patience threshold introduces dependence between the censoring time and the current survival estimate, which can violate the standard non-informative censoring assumption. While our empirical results demonstrate convergence to oracle performance, we did not provide a dedicated analysis of this feedback loop. We will add a new subsection in the methods and an additional simulation experiment that isolates the effect of policy-dependent censoring (e.g., by comparing against an oracle that uses fixed thresholds) and discuss the conditions under which the bias remains limited in practice. revision: partial
-
Referee: [Abstract] Abstract: no information is supplied on the concrete survival model (parametric family, non-parametric estimator, or mixture), the exact fitting procedure across episodes, how right-censored observations are incorporated, the statistical test or confidence interval used to assert “within 1%” convergence, or the precise definition of the heuristic baselines and their parameter settings.
Authors: We agree that these implementation details are essential and were omitted from the abstract. The full manuscript uses the Kaplan-Meier estimator updated incrementally per class, incorporating right-censored observations at the current patience threshold; the 1% figure is the relative difference in mean time-to-goal averaged over 100 simulation trials; baselines are fixed-wait policies at 5 s / 10 s / 20 s and a random-reroute policy. We will expand the abstract with a concise methods sentence and ensure the supplementary material or methods section contains the exact update rule, convergence metric definition, and baseline parameter values. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper describes an online learning procedure that fits class-conditioned survival models to observed clearance times (including right-censored cases produced by the current policy). The headline simulation result is an empirical statement that the resulting policy's time-to-goal approaches an external oracle after <20 observations per class. No equations, self-citations, or ansatzes are shown that reduce this convergence claim to a tautology, a fitted parameter renamed as a prediction, or a self-referential definition. The method remains self-contained against external benchmarks (ground-truth distributions and heuristic baselines).
Axiom & Free-Parameter Ledger
free parameters (1)
- class-conditioned survival distribution parameters
axioms (1)
- domain assumption Obstacle class labels are available at encounter time
Reference graph
Works this paper leans on
-
[1]
B. Lacerda, F. Faruq, D. Parker, and N. Hawes. Probabilistic planning with formal performance guarantees for mobile service robots.The International Journal of Robotics Research, 38 (9):1098–1123, 2019. doi:10.1177/0278364919856695. URLhttps://doi.org/10.1177/ 0278364919856695
-
[2]
Nardi and C
L. Nardi and C. Stachniss. Long-term robot navigation in indoor environments estimating patterns in traversability changes.2020 IEEE International Conference on Robotics and Automation (ICRA), pages 300–306, 2019. URLhttps://api.semanticscholar.org/ CorpusID:203591939
2020
-
[3]
B. Marthi. Navigation in partially observed dynamic roadmaps. 2010. URLhttps://api. semanticscholar.org/CorpusID:12577833
2010
-
[4]
Wurman, R
P. Wurman, R. D’Andrea, and M. Mountz. Coordinating hundreds of cooperative, autonomous vehicles in warehouses.AI Magazine, 29:9–20, 03 2008
2008
-
[5]
Fragapane, H.-H
G. Fragapane, H.-H. Hvolby, F. Sgarbossa, and J. O. Strandhagen. Autonomous mobile robots in hospital logistics. InAdvances in Production Management Systems. The Path to Digital Transformation and Innovation of Production Management Systems, pages 672–679, Cham,
-
[6]
Springer International Publishing
-
[7]
Hawes, C
N. Hawes, C. Burbridge, F. Jovan, L. Kunze, B. Lacerda, L. Mudrova, J. Young, J. Wyatt, D. Hebesberger, T. Kortner, et al. The strands project: Long-term autonomy in everyday environments.IEEE Robotics & Automation Magazine, 24(3):146–156, 2017
2017
-
[8]
A. R. Di ´eguez, R. Sanz, and J. L. Fern ´andez. Integrating time performance in global path planning for autonomous mobile robots. In X.-J. Jing, editor,Motion Planning, chapter 24. IntechOpen, London, 2008. doi:10.5772/5999. URLhttps://doi.org/10.5772/5999
-
[9]
S. Bing. Online canadian traveler problem with stochastic blockages recovery time.Sys- tems Engineering - Theory & Practice, 2005. URLhttps://api.semanticscholar.org/ CorpusID:156857270
2005
-
[10]
E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations.Jour- nal of the American Statistical Association, 53:457–481, 1958. URLhttps://api. semanticscholar.org/CorpusID:18549513
1958
-
[11]
D. R. Cox. Regression models and life tables (with discussion. 1972. URLhttps://api. semanticscholar.org/CorpusID:116197768
1972
-
[12]
J. P. Klein and M. L. Moeschberger.Survival Analysis: Techniques for Censored and Truncated Data. Statistics for Biology and Health. Springer, New York, 2 edition, 2003. ISBN 978-0- 387-95399-1
2003
-
[13]
K. Fujimura. Time-minimum routes in time-dependent networks.IEEE Transactions on Robotics and Automation, 11(3):343–351, 1995. doi:10.1109/70.388776
-
[14]
M. P. Wellman, M. Ford, and K. Larson. Path planning under time-dependent uncer- tainty. InConference on Uncertainty in Artificial Intelligence, 1995. URLhttps://api. semanticscholar.org/CorpusID:12281821
1995
-
[15]
Gao and I
S. Gao and I. Chabini. Optimal routing policy problems in stochastic time-dependent net- works.Transportation Research Part B-methodological, 40:93–122, 2006. URLhttps: //api.semanticscholar.org/CorpusID:121870726
2006
-
[16]
D. M. Rosen, J. Mason, and J. J. Leonard. Towards lifelong feature-based mapping in semi- static environments. In2016 IEEE International Conference on Robotics and Automation (ICRA), pages 1063–1070, 2016. doi:10.1109/ICRA.2016.7487237. 9
-
[17]
Tsamis, I
G. Tsamis, I. Kostavelis, D. Giakoumis, and D. Tzovaras. Towards life-long mapping of dynamic environments using temporal persistence modeling.2020 25th International Con- ference on Pattern Recognition (ICPR), pages 10480–10485, 2021. URLhttps://api. semanticscholar.org/CorpusID:233877760
2020
-
[18]
D. Fox, W. Burgard, and S. Thrun. The dynamic window approach to collision avoidance. IEEE Robotics & Automation Magazine, 4(1):23–33, 1997
1997
-
[19]
T. Kruse, A. K. Pandey, R. Alami, and A. Kirsch. Human-aware robot navigation: A survey. Robotics and Autonomous Systems, 61(12):1726–1743, 2013. ISSN 0921-8890. doi:https:// doi.org/10.1016/j.robot.2013.05.007. URLhttps://www.sciencedirect.com/science/ article/pii/S0921889013001048
-
[20]
P. Trautman, J. Ma, R. M. Murray, and A. Krause. Robot navigation in dense human crowds: Statistical models and experimental studies of human–robot cooperation.The International Journal of Robotics Research, 34(3):335–356, 2015. doi:10.1177/0278364914557874. URL https://doi.org/10.1177/0278364914557874
-
[21]
J. Han, M. Vanniasinghe, H. Sahak, N. Rhinehart, and T. Barfoot. Ratatouille: Imitation learning ingredients for real-world social robot navigation.arXiv preprint arXiv:2509.17204. doi:10.48550/arXiv.2509.17204
-
[22]
E. W. Dijkstra. A note on two problems in connexion with graphs.Numerische Mathematik, 1:269–271, 1959. doi:10.1007/BF01386390
-
[23]
P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths.IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107,
-
[24]
doi:10.1109/TSSC.1968.300136
-
[25]
C. Papadimitriou and M. Yannakakis. Shortest paths without a map. pages 610–620, 01 1989. ISBN 978-3-540-51371-1. doi:10.1007/BFb0035787
-
[26]
Nikolova
E. Nikolova. Exact algorithms for the canadian traveller problem on paths and trees. 01 2008
2008
-
[27]
P. Eyerich, T. Keller, and M. Helmert. High-quality policies for the canadian traveler’s prob- lem.Proceedings of the AAAI Conference on Artificial Intelligence, 24(1):51–58, Jul. 2010. doi:10.1609/aaai.v24i1.7542. URLhttps://ojs.aaai.org/index.php/AAAI/article/ view/7542
-
[28]
H. Guo and T. D. Barfoot. The robust canadian traveler problem applied to robot routing. In 2019 International Conference on Robotics and Automation (ICRA), page 5523–5529. IEEE Press, 2019. doi:10.1109/ICRA.2019.8794252. URLhttps://doi.org/10.1109/ICRA. 2019.8794252
-
[29]
J. Halpern. Shortest route with time dependent length of edges and limited delay possi- bilities in nodes.Zeitschrift f ¨ur Operations Research, 21:117–124, 1977. URLhttps: //api.semanticscholar.org/CorpusID:12304537
1977
-
[30]
R. W. Hall. The fastest path through a network with random time-dependent travel times. Transp. Sci., 20:182–188, 1986. URLhttps://api.semanticscholar.org/CorpusID: 207246327
1986
-
[31]
Orda and R
A. Orda and R. Rom. Minimum weight paths in time-dependent networks.Networks, 21: 295–319, 1991. URLhttps://api.semanticscholar.org/CorpusID:15025216
1991
-
[32]
P. Furgale and T. Barfoot. Visual teach and repeat for long-range rover autonomy.J. Field Robotics, 27:534–560, 09 2010. doi:10.1002/rob.20342
-
[33]
Gpt-5.3 instant: Smoother, more useful everyday conversations.https://openai
OpenAI. Gpt-5.3 instant: Smoother, more useful everyday conversations.https://openai. com/index/gpt-5-3-instant/, 2026. Accessed: 2026-05-27. 10 Appendix A Functional Time-Dependent Planner and Proofs This appendix gives the functional planner used to computeA clear andA avoid, and proves the prop- erties claimed in Sec. 3. Throughout, times are measured ...
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.