Safe Planning in Interactive Environments via Iterative Policy Updates and Adversarially Robust Conformal Prediction

Eliot Shekhtman; Lars Lindemann; Nikolai Matni; Omid Mirzaeedodangeh

arXiv: 2511.10586 · v2 · submitted 2025-11-13 · eess.SY · cs.RO · cs.SY

Pith reviewed 2026-05-17 22:11 UTC · model grok-4.3

classification eess.SY · cs.RO · cs.SY

keywords safe planning · interactive environments · conformal prediction · policy updates · distribution shifts · safety guarantees · autonomous agents

The pith

An iterative framework maintains valid safety guarantees for planning in reactive environments by adjusting conformal prediction for policy-induced shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish a method for keeping safety assurances intact when an autonomous agent's policy updates cause a responsive environment to change its behavior, as happens with a self-driving car among pedestrians. Existing conformal prediction techniques break down because they rely on data exchangeability, which no longer holds when the agent's actions and the environment's responses form a circular loop. The method performs standard conformal prediction on data collected under the current policy and then analytically adjusts the bounds using a sensitivity analysis of how policy changes affect trajectories, yielding a safe, episodic open-loop planner. A contraction analysis supplies conditions under which the whole process converges. This matters for real applications where agents must replan without invalidating prior safety claims.

Core claim

The authors claim an iterative framework that runs regular conformal prediction each episode on data from the current policy, then transfers the safety guarantees to a new policy by analytically correcting for the distribution shift induced by that policy change. The correction rests on a policy-to-trajectory sensitivity analysis. A separate contraction argument shows conditions under which both the conformal bounds and the policy sequence converge. The result is a safe episodic planner whose guarantees remain valid in interactive settings, with demonstrations on a two-dimensional car-pedestrian scenario and a high-dimensional quadcopter.

What carries the argument

Adversarially robust conformal prediction whose quantile is adjusted by a policy-to-trajectory sensitivity bound that quantifies and accounts for the distribution shift caused by each planned policy update.
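
A minimal sketch of this adjustment, under stated assumptions: the calibration scores are prediction errors observed under the current policy, the quantile is the standard split-CP order statistic, and the shift budget takes the form β_T‖π_new − π_old‖∞ that appears in the appendix fragment quoted on this page. The function names and all numeric values are hypothetical, and this is not the authors' implementation.

```python
import math

def conformal_quantile(scores, alpha):
    """Split-CP quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]

def shift_budget(pi_new, pi_old, beta_T):
    """Assumed sensitivity bound: beta_T * ||pi_new - pi_old||_inf."""
    return beta_T * max(abs(a - b) for a, b in zip(pi_new, pi_old))

def adjusted_radius(scores, alpha, pi_new, pi_old, beta_T):
    """Conformal quantile under the old policy plus the budget for the update."""
    return conformal_quantile(scores, alpha) + shift_budget(pi_new, pi_old, beta_T)

# Hypothetical prediction errors collected under the current policy.
scores = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6, 0.7, 0.8, 0.9]
q = conformal_quantile(scores, alpha=0.25)
r = adjusted_radius(scores, 0.25, pi_new=[1.0, 0.5], pi_old=[0.75, 0.5], beta_T=2.0)
```

The design point is the separation of concerns: the data-driven quantile never sees the new policy, and the analytic budget never sees the data.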

If this is right

Safety guarantees transfer from one policy to the next without requiring fresh exchangeable data.
Both the conformal prediction sets and the sequence of policies converge under the stated contraction conditions.
The planner produces safe open-loop trajectories for each episode while remaining valid in the presence of environment reactions.
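
To illustrate what convergence under a contraction condition means, here is a toy sketch. The affine map `T` and the factor `kappa` are made-up stand-ins (the paper's actual update map is not reproduced on this page); the only point is that a map with Lipschitz constant below one shrinks the distance to its fixed point geometrically.

```python
def iterate(T, r0, steps):
    """Return the sequence r0, T(r0), T(T(r0)), ..."""
    seq = [r0]
    for _ in range(steps):
        seq.append(T(seq[-1]))
    return seq

kappa = 0.4                        # hypothetical contraction factor, must be < 1

def T(r):
    """Made-up affine update with fixed point 0.5 / (1 - kappa)."""
    return 0.5 + kappa * r

r_star = 0.5 / (1 - kappa)
seq = iterate(T, r0=3.0, steps=20)
errors = [abs(r - r_star) for r in seq]   # shrinks by a factor kappa per step
```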

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sensitivity-adjustment idea could be applied to other sequential decision problems where one agent's actions alter the statistics of its observations.
Tighter sensitivity bounds, perhaps learned from data, might reduce the conservatism that currently limits how aggressively policies can be updated.
Hardware tests with human participants would reveal whether the derived bounds remain valid when real reaction times and perception noise are present.

Load-bearing premise

The policy-to-trajectory sensitivity analysis must produce a sufficiently tight bound on the distribution shift caused by each policy update.

What would settle it

A measured rate of safety violations after a policy update that exceeds the adjusted conformal prediction bound in an interactive simulation or hardware experiment would falsify the guarantee transfer.
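
One hedged way to operationalize such a test: count violations over independent post-update rollouts and ask how likely that count would be if the true violation probability were at most α. The sketch below uses an exact one-sided binomial tail. The function names, rollout counts, and significance threshold are illustrative assumptions, and the check ignores the extra calibration-level slack (the 1 − δ confidence) that the paper's guarantee carries.

```python
from math import comb

def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def guarantee_rejected(violations, rollouts, alpha, significance=0.01):
    """True if the observed count is implausible under violation probability <= alpha."""
    return binomial_tail(rollouts, violations, alpha) < significance

# Hypothetical outcomes of 200 rollouts with target alpha = 0.1.
consistent = guarantee_rejected(violations=12, rollouts=200, alpha=0.1)
falsifying = guarantee_rejected(violations=40, rollouts=200, alpha=0.1)
```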

Figures

Figures reproduced from arXiv: 2511.10586 by Eliot Shekhtman, Lars Lindemann, Nikolai Matni, Omid Mirzaeedodangeh.

**Figure 1.** “Chicken-and-egg” problem: A change in policy can induce a distribution shift in the environment, here a pedestrian. This shift results in a modified safety radius r′ that captures the pedestrian’s behavior under this policy. This modified safety radius in turn requires a policy update π(r′). Non-interactive approaches sequentially integrate predictions of uncontrollable agents into the design of the …

**Figure 2.** Our iterative algorithm consists of the following steps: (0) initialization via a safe (potentially …

**Figure 3.** 2D car-pedestrian example. Top left: convergence of tube radius …

Original abstract

Safe planning of an autonomous agent in interactive environments -- such as the control of a self-driving vehicle among pedestrians -- poses a major challenge as the behavior of the environment is unknown and reactive to the behavior of the autonomous agent. This coupling gives rise to interaction-driven distribution shifts where the autonomous agent's control policy may change the environment's behavior, thereby invalidating safety guarantees in existing work. Indeed, recent works have used conformal prediction (CP) to generate distribution-free safety guarantees using observed data of the environment. However, CP's assumption on data exchangeability is violated in interactive settings due to a circular dependency where a control policy update changes the environment's behavior, and vice versa. To address this gap, we propose an iterative framework that robustly maintains safety guarantees across policy updates by quantifying the potential impact of a planned policy update on the environment's behavior. We realize this via adversarially robust CP where we perform a regular CP step in each episode using observed data under the current policy, but then transfer safety guarantees across policy updates by analytically adjusting the CP result to account for distribution shifts. This adjustment is performed based on a policy-to-trajectory sensitivity analysis, resulting in a safe, episodic open-loop planner. We further conduct a contraction analysis of the system providing conditions under which both the CP results and the policy updates are guaranteed to converge. We empirically demonstrate these safety and convergence guarantees on a two-dimensional car-pedestrian and a high-dimensional quadcopter case study. To the best of our knowledge, these are the first results that provide valid safety guarantees in such interactive settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a practical way to carry conformal prediction safety bounds across policy updates in reactive environments by using sensitivity analysis to adjust for the induced shift, plus a contraction argument for convergence.

The core contribution is an iterative loop that runs standard CP on data collected under the current policy, then analytically adjusts the conformal quantile to account for how the next policy change will alter the environment's response distribution. They back this with a policy-to-trajectory sensitivity relation and show convergence conditions under which both the safety sets and the policies stabilize. On the car-pedestrian and quadcopter examples the method keeps the empirical violation rate inside the target while the policy improves, which is useful for anyone who has watched plain CP break once the controller starts affecting the other agents.

Referee Report

3 major / 2 minor

Summary. The paper proposes an iterative framework for safe planning in interactive environments (e.g., autonomous vehicles among pedestrians) where policy updates induce distribution shifts that violate standard conformal prediction (CP) exchangeability. It performs per-episode CP on observed data under the current policy and transfers guarantees across updates via an analytical policy-to-trajectory sensitivity adjustment within an adversarially robust CP step. A contraction analysis is provided to guarantee convergence of both the CP quantiles and policy iterates under suitable conditions. The approach is evaluated on a 2D car-pedestrian scenario and a high-dimensional quadcopter example, claiming the first valid safety guarantees for such reactive interactive settings.

Significance. If the sensitivity bounds are shown to dominate worst-case shifts without hidden conservatism and the contraction holds uniformly, the result would be significant: it extends distribution-free safety certificates to non-stationary interactive control problems where environment reactivity couples with policy changes. The separation of episodic CP from analytical shift adjustment, together with the empirical demonstrations on both low- and high-dimensional systems, offers a practical path forward for safe planning under distribution shift.

major comments (3)

§4.2 (Policy-to-Trajectory Sensitivity Analysis): the adjustment of the conformal quantile rests on a local linearization or Lipschitz-style bound around the current policy. In reactive settings (pedestrian avoidance or quadcopter interaction), higher-order or state-dependent terms can produce distribution shifts larger than this bound, directly invalidating the transferred safety guarantee. A concrete error term or worst-case analysis quantifying the linearization residual is needed to support the claim.
Theorem 5.1 (contraction argument): the proof that both CP results and policy updates converge assumes the adjusted sets remain valid after each update. This assumption is load-bearing; if the sensitivity bound fails to dominate the true shift (as can occur under strong reactivity), the contraction mapping property does not hold. Explicit conditions on the Lipschitz constant or reactivity level that guarantee domination should be stated.
Empirical sections (car-pedestrian and quadcopter): the reported safety rates and convergence plots do not include an ablation that deliberately violates the local-linearization assumption (e.g., by increasing pedestrian reactivity). Without such a stress test, it is unclear whether the observed safety is due to the bound being tight or merely conservative in the tested regimes.

minor comments (2)

[§3-4] Notation for the sensitivity map and the adjusted quantile should be introduced with a clear table or diagram showing how the per-episode CP set is transformed into the robust set used for planning.
[Introduction] The abstract's claim of 'first results' would benefit from a brief comparison paragraph in the introduction that cites the closest prior robust-CP or interactive-planning works to clarify the precise novelty.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which help clarify the assumptions and strengthen the presentation of our results on safe planning under interaction-induced shifts. We address each major comment below, indicating planned revisions to the manuscript.

Referee: §4.2 (Policy-to-Trajectory Sensitivity Analysis): the adjustment of the conformal quantile rests on a local linearization or Lipschitz-style bound around the current policy. In reactive settings (pedestrian avoidance or quadcopter interaction), higher-order or state-dependent terms can produce distribution shifts larger than this bound, directly invalidating the transferred safety guarantee. A concrete error term or worst-case analysis quantifying the linearization residual is needed to support the claim.

Authors: We agree that the first-order sensitivity analysis in §4.2 requires a quantifiable residual to remain valid under strong reactivity. In the revision we will augment the policy-to-trajectory mapping with an explicit Taylor remainder term (Lagrange form) under the assumption of bounded second derivatives, yielding a concrete additive error bound that is folded into the adversarially robust CP quantile. This bound will be stated as a function of the policy-update magnitude and will be used to tighten the worst-case adjustment when higher-order effects cannot be neglected.

Revision: yes

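
As a sketch of what such a remainder term could look like (an illustration under assumptions, not a result stated in the paper): write $\tau(\pi)$ for the trajectory induced by policy $\pi$, assume $\tau$ is twice differentiable with second derivative bounded in operator norm by a hypothetical constant $M$, and let $\Delta = \pi' - \pi$. The standard second-order Taylor bound then reads:

```latex
\[
\bigl\| \tau(\pi') - \tau(\pi) - D\tau(\pi)[\Delta] \bigr\| \;\le\; \frac{M}{2}\,\|\Delta\|^{2},
\qquad\text{hence}\qquad
\bigl\| \tau(\pi') - \tau(\pi) \bigr\| \;\le\; \|D\tau(\pi)\|\,\|\Delta\| + \frac{M}{2}\,\|\Delta\|^{2}.
\]
```

The quadratic term plays the role of the additive error that would be folded into the robust quantile; it shrinks faster than the first-order term as the policy update gets smaller.
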
Referee: Theorem 5.1 (contraction argument): the proof that both CP results and policy updates converge assumes the adjusted sets remain valid after each update. This assumption is load-bearing; if the sensitivity bound fails to dominate the true shift (as can occur under strong reactivity), the contraction mapping property does not hold. Explicit conditions on the Lipschitz constant or reactivity level that guarantee domination should be stated.

Authors: The contraction in Theorem 5.1 indeed rests on the adjusted CP sets remaining valid. We will revise the theorem statement to include an explicit assumption that the environment response is Lipschitz continuous with constant L_env, and we will derive the precise threshold on L_env (relative to the policy-update step size) under which the sensitivity adjustment dominates any induced shift. The proof will then be updated to invoke this condition when establishing the contraction mapping for both the quantile sequence and the policy iterates.

Revision: yes

Referee: Empirical sections (car-pedestrian and quadcopter): the reported safety rates and convergence plots do not include an ablation that deliberately violates the local-linearization assumption (e.g., by increasing pedestrian reactivity). Without such a stress test, it is unclear whether the observed safety is due to the bound being tight or merely conservative in the tested regimes.

Authors: We acknowledge that the current experiments do not stress-test the linearization assumption. In the revised manuscript we will add an ablation in the car-pedestrian scenario that systematically scales the pedestrians’ reactivity gain from the nominal value up to a level where the first-order bound is expected to be violated. We will report empirical safety rates, observed distribution-shift magnitudes, and whether the adjusted CP quantiles remain conservative or become invalid, thereby delineating the practical operating regime of the method.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

The framework collects data under the current policy, applies standard conformal prediction to obtain a quantile, and then applies an independent analytical adjustment derived from a policy-to-trajectory sensitivity bound to transfer the guarantee across updates. This sensitivity relation is obtained from first-principles local analysis (Lipschitz or linearization-style) rather than being fitted to the same data or defined in terms of the target safety set. The subsequent contraction mapping supplies separate conditions under which the adjusted sets and policy sequence converge; these conditions reference the sensitivity bound but do not presuppose the final safety claim. No step reduces by construction to a fitted input renamed as prediction, a self-definition, or a load-bearing self-citation whose validity is internal to the paper. The argument therefore rests on external conformal-prediction theory plus an explicit sensitivity model and is not circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the existence of a computable policy-to-trajectory sensitivity bound that can be used to adjust CP quantiles without introducing new fitted parameters beyond standard CP calibration; the contraction analysis invokes standard fixed-point assumptions from dynamical systems.

axioms (2)

domain assumption: A computable upper bound on the trajectory distribution shift induced by a policy update exists and can be obtained from local sensitivity analysis.
Invoked when transferring CP guarantees across policy updates in the iterative framework.
domain assumption: The overall system satisfies contraction conditions that guarantee convergence of both CP bounds and policy iterates.
Stated in the contraction analysis section referenced in the abstract.

A The Iterative Policy Update Algorithm

Algorithm 1: Iterative Safe Policy Improvement (Explicit & Implicit Forms)
1: Input: confidence levels $1-\alpha$, $1-\delta$; initial safe policy $\pi_0$.
2: Input: closed-loop gain $\kappa = \beta_T L_U$; fixed predictor...
Start with a search range $[r_{\text{low}}, r_{\text{high}}] = [q_j, r_{\max}]$.
Propose a candidate radius $r_{\text{cand}} = (r_{\text{low}} + r_{\text{high}})/2$.
Call the planner (the "pi_cand" step): solve $P[j+1; r_{\text{cand}}]$ to get the candidate policy $\pi_{\text{cand}} = \pi^\star(r_{\text{cand}})$.
Check the inequality: calculate the true required budget for this policy, $M_{\text{req}} = \beta_T \|\pi_{\text{cand}} - \pi_j\|_\infty$.
If $r_{\text{cand}} \ge q_j + M_{\text{req}}$, the radius is safe. This is a valid solution, so we try to find a smaller one: $r_{\text{high}} = r_{\text{cand}}$.
If $r_{\text{cand}} < q_j + M_{\text{req}}$, the radius is unsafe. We must search higher: $r_{\text{low}} = r_{\text{cand}}$.
Repeat until the range is sufficiently small. The final $r_{j+1}$ is $r_{\text{high}}$.

C Deferred Proofs

C.1 Proof of Lemma 2

Let us first recall the dynamics of ego and uncontrollable agents from (1) as $x_{t+1} = f_X(x_t, u_t)$, $y_{t+1} = f_Y(y_t, x_t, u_t, \nu_t)$, under the noise sequence $\{\nu_t\}_{t=0}^{T-1}$. Under Assumption 1, i.e., Lipschitz continuity of $f_X$ and $f_Y$, our goal is now to b...
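
The radius search described in the steps just before the deferred proofs can be sketched as a plain bisection. This is a minimal illustration under stated assumptions: `plan` stands in for solving P[j+1; r] and is a hypothetical callable returning a policy as a list of floats, and the toy planner, `beta_T`, `r_max`, and the tolerance are made-up values rather than anything taken from the paper.

```python
def bisect_radius(q_j, pi_j, plan, beta_T, r_max, tol=1e-6):
    """Search for a small radius r with r >= q_j + beta_T * ||plan(r) - pi_j||_inf."""
    r_low, r_high = q_j, r_max
    while r_high - r_low > tol:
        r_cand = 0.5 * (r_low + r_high)
        pi_cand = plan(r_cand)  # candidate policy for this radius
        m_req = beta_T * max(abs(a - b) for a, b in zip(pi_cand, pi_j))
        if r_cand >= q_j + m_req:
            r_high = r_cand     # safe: try a smaller radius
        else:
            r_low = r_cand      # unsafe: search higher
    return r_high

def toy_plan(r):
    """Toy planner (assumption): a larger radius yields a more cautious policy."""
    return [1.0 - 0.2 * r]

r_next = bisect_radius(q_j=0.5, pi_j=[1.0], plan=toy_plan, beta_T=1.0, r_max=5.0)
```

In this toy setup the safety condition reduces to r ≥ 0.5 + 0.2r, so the search settles near 0.625. Note that the returned `r_high` has only been verified safe if it was updated at least once or if `r_max` is itself safe, which the sketch takes for granted.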
Apply the Mean Value Theorem (MVT). Since $F_r$ is differentiable on the open interval between $q$ and $q'$, there exists a point $\xi$ between $q$ and $q'$ such that $F_r(q') - F_r(q) = f_r(\xi)\,(q' - q)$.
Use the density lower bound. Because $f_r(\xi) \ge f_\star$ by (54), we get $|p' - p| = |F_r(q') - F_r(q)| = f_r(\xi)\,|q' - q| \ge f_\star\,|q' - q|$.
Rearrange. Hence $|q' - q| \le |p' - p| / f_\star$, which is (55).

Why we need Lemma 4. In the convergence analysis (Theorem 3), we compare the population quantiles at two levels, $1 - \bar\alpha_j$ (used for calibration) and $1 - \alpha$ (the target). Lemma 4 gives the clean bound $Q_{1-\bar\alpha_j} - Q_{1-\alpha} \le |\alpha - \bar\alpha_j| / f_\star$, which is the level-shift term in the perturbation $\eta_j$.

Lemma 5 (Empirical quantile ...
Pin the target quantile and a local window. Let $q^\star = Q_p(\pi^\star(r))$, and pick a small $\Delta > 0$ that keeps $[q^\star - \Delta, q^\star + \Delta]$ inside the neighborhood where $f_r \ge f_\star$.
One-sided controls for the true CDF using the density lower bound. By the Mean Value Theorem applied to $F_r$ on $[q^\star, q^\star + \Delta]$ and $[q^\star - \Delta, q^\star]$, there exist points $\xi_+, \xi_-$ in those intervals with $F_r(q^\star + \Delta) - F_r(q^\star) = f_r(\xi_+)\,\Delta \ge f_\star \Delta$ and $F_r(q^\star) - F_r(q^\star - \Delta) = f_r(\xi_-)\,\Delta \ge f_\star \Delta$. Since $F_r(q^\star) = p$, we get $F_r(q^\star + \Delta) \ge p + f_\star \Delta$ and $F_r(q^\star - \Delta) \le p - f_\star \Delta$. (58)
Transfer these inequalities to the empirical CDF on the DKW event. Using (56), we have $P^n\{F_{r,n}(q^\star + \Delta) \ge F_r(q^\star + \Delta) - \varepsilon \ge p + f_\star \Delta - \varepsilon \ \text{and}\ F_{r,n}(q^\star - \Delta) \le F_r(q^\star - \Delta) + \varepsilon \le p - f_\star \Delta + \varepsilon\} \ge 1 - \delta$. (59)–(60)
Choose $\Delta$ to make the inequalities straddle level $p$. Set $\Delta = \varepsilon / f_\star$ so that $P^n\{F_{r,n}(q^\star + \Delta) \ge p \ \text{and}\ F_{r,n}(q^\star - \Delta) \le p\} \ge 1 - \delta$.
Use the empirical quantile definition to trap $q_{n,p}$. By definition, $q_{n,p} = \inf\{t : F_{r,n}(t) \ge p\}$. Since $F_{r,n}(q^\star - \Delta) \le p$ and $F_{r,n}(q^\star + \Delta) \ge p$, the monotonicity of $F_{r,n}$ implies $P^n\{q_{n,p} \in [q^\star - \Delta, q^\star + \Delta]\} \ge 1 - \delta$, where we used the union bound. Therefore, $P^n\{|q_{n,p} - q^\star| \le \Delta = \varepsilon / f_\star = \varepsilon_n(\delta) / f_\star\} \ge 1 - \delta$. This is exactly (57).

Why we need Lemma 5. In the convergence proof, the em...
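
A small numeric sketch of how the two quantile bounds above behave. It assumes the standard two-sided DKW radius ε_n(δ) = sqrt(ln(2/δ)/(2n)); the paper's equation (56) is not reproduced on this page, so that form is an assumption. The standard normal score distribution, the levels, and the sample size are purely illustrative, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def dkw_epsilon(n, delta):
    """Two-sided DKW radius: sup|F_n - F| <= eps with probability >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def quantile_error_bound(n, delta, f_star):
    """Lemma-5-style bound: |q_{n,p} - q*| <= eps_n(delta) / f_star."""
    return dkw_epsilon(n, delta) / f_star

def level_shift_bound(p, p_prime, f_star):
    """Lemma-4-style bound: |Q_{p'} - Q_p| <= |p' - p| / f_star."""
    return abs(p_prime - p) / f_star

dist = NormalDist()                  # illustrative score distribution
p, p_prime = 0.90, 0.92
q, q_prime = dist.inv_cdf(p), dist.inv_cdf(p_prime)
# Density floor on [q, q']: the normal pdf is decreasing for positive arguments,
# so the minimum over the interval is attained at an endpoint.
f_star = min(dist.pdf(q), dist.pdf(q_prime))

gap = abs(q_prime - q)               # actual quantile shift
bound = level_shift_bound(p, p_prime, f_star)
eps = dkw_epsilon(n=1000, delta=0.05)
```

Both bounds degrade as the density floor `f_star` shrinks, which is why the proofs restrict attention to a neighborhood where the score density stays bounded away from zero.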

[1] C. Mavrogiannis, F. Baldini, A. Wang, D. Zhao, P. Trautman, A. Steinfeld, and J. Oh, “Core challenges of social robot navigation: A survey,” ACM Transactions on Human-Robot Interaction, vol. 12, no. 3, pp. 1–39, 2023.

[2] Z. Feng, R. Xue, L. Yuan, Y. Yu, N. Ding, M. Liu, B. Gao, J. Sun, and G. Wang, “Multi-agent embodied ai: Advances and future directions,” 2025. [Online]. Available: https://arxiv.org/abs/2505.05108

[3] P. Trautman and A. Krause, “Unfreezing the robot: Navigation in dense, interacting crowds,” in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2010, pp. 797–803.

[4] N. E. Du Toit and J. W. Burdick, “Robot motion planning in dynamic, uncertain environments,” IEEE Transactions on Robotics, vol. 28, no. 1, pp. 101–115, 2011.

[5] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially compliant mobile robot navigation via inverse reinforcement learning,” The Int. Journal Robot. Research, vol. 35, no. 11, pp. 1289–1307, 2016.

[6] M. Everett, Y. F. Chen, and J. P. How, “Collision avoidance in pedestrian-rich environments with deep reinforcement learning,” IEEE Access, vol. 9, pp. 10 357–10 377, 2021.

[7] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic learning in a random world. Springer, 2005, vol. 29.

[8] A. N. Angelopoulos and S. Bates, “A gentle introduction to conformal prediction and distribution-free uncertainty quantification,” arXiv preprint arXiv:2107.07511, 2021.

[9] M. Cleaveland, I. Lee, G. J. Pappas, and L. Lindemann, “Conformal prediction regions for time series using linear complementarity programming,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 19, 2024, pp. 20 984–20 992.

[10] S. Sun and R. Yu, “Copula conformal prediction for multi-step time series forecasting,” arXiv preprint arXiv:2212.03281, 2022.

[11] M. Zecchin, S. Park, and O. Simeone, “Forking uncertainties: Reliable prediction and model predictive control with sequence models via conformal risk control,” IEEE Journal on Selected Areas in Information Theory, vol. 5, pp. 44–61, 2024.

[12] L. Lindemann, M. Cleaveland, G. Shim, and G. J. Pappas, “Safe planning in dynamic environments using conformal prediction,” IEEE Robotics and Automation Letters, vol. 8, no. 8, pp. 5116–5123, 2023.

[13] A. Dixit, L. Lindemann, S. X. Wei, M. Cleaveland, G. J. Pappas, and J. W. Burdick, “Adaptive conformal prediction for motion planning among dynamic agents,” in Proceedings of The 5th Annual Learning for Dynamics and Control Conference, ser. Proceedings of Machine Learning Research, vol. 211. PMLR, 2023, pp. 300–314. [Online]. Available: https://proceedi...

[14] J. Shin, J. Lee, and I. Yang, “Egocentric conformal prediction for safe and efficient navigation in dynamic cluttered environments,” arXiv preprint arXiv:2504.00447, 2025.

[15] J. Yao, X. Zhang, Y. Xia, Z. Wang, A. K. Roy-Chowdhury, and J. Li, “Sonic: Safe social navigation with adaptive conformal inference and constrained reinforcement learning,” arXiv preprint arXiv:2407.17460, 2024.

[16] J. Sun, Y. Jiang, J. Qiu, P. Nobel, M. J. Kochenderfer, and M. Schwager, “Conformal prediction for uncertainty-aware planning with diffusion dynamics model,” Advances in Neural Information Processing Systems, vol. 36, pp. 80 324–80 337, 2023.

[17] J. Zhang, B. Hoxha, G. Fainekos, and D. Panagou, “Conformal prediction in the loop: Risk-aware control barrier functions for stochastic systems with data-driven state estimators,” IEEE Control Systems Letters, 2025.

[18] T.-W. Hsu and H. Tsukamoto, “Statistical guarantees in data-driven nonlinear control: Conformal robustness for stability and safety,” IEEE Control Systems Letters, 2025.

[19] S. Sheng, P. Yu, D. Parker, M. Kwiatkowska, and L. Feng, “Safe pomdp online planning among dynamic agents via adaptive conformal prediction,” IEEE Robotics and Automation Letters, 2024.

[20] A. Doula, M. Mühlhäuser, and A. S. Guinea, “Safepath: Conformal prediction for safe llm-based autonomous navigation,” arXiv preprint arXiv:2505.09427, 2025.

[21] J. Wang, G. He, and Y. Kantaros, “Probabilistically correct language-based multi-robot planning using conformal prediction,” IEEE Robotics and Automation Letters, 2024.

[22] L. Lindemann, Y. Zhao, X. Yu, G. J. Pappas, and J. V. Deshmukh, “Formal verification and control with conformal prediction,” arXiv preprint arXiv:2409.00536, 2024.

[23] R. J. Tibshirani, R. Foygel Barber, E. Candes, and A. Ramdas, “Conformal prediction under covariate shift,” Advances in neural information processing systems, vol. 32, 2019.

[24] Y. Zhao, B. Hoxha, G. Fainekos, J. V. Deshmukh, and L. Lindemann, “Robust conformal prediction for stl runtime verification under distribution shift,” in 2024 ACM/IEEE 15th International Conference on Cyber-Physical Systems (ICCPS). IEEE, 2024, pp. 169–179.

[25] I. Gibbs and E. Candes, “Adaptive conformal inference under distribution shift,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 1660–1672.

[26] M. Zaffran, O. Féron, Y. Goude, J. Josse, and A. Dieuleveut, “Adaptive conformal predictions for time series,” in International Conference on Machine Learning. PMLR, 2022, pp. 25 834–25 866.

[27] Y. Yang, A. K. Kuchibhotla, and E. Tchetgen Tchetgen, “Doubly robust calibration of prediction sets under covariate shift,” Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 86, no. 4, pp. 943–965, 2024.

[28] S. Alijani and H. Najjaran, “Wqlcp: Weighted adaptive conformal prediction for robust uncertainty quantification under distribution shifts,” 2025. [Online]. Available: https://arxiv.org/abs/2505.19587

[29] M. Cauchois, S. Gupta, A. Ali, and J. C. Duchi, “Robust validation: Confident predictions even when distributions shift,” Journal of the American Statistical Association, vol. 119, no. 548, pp. 3033–3044, 2024.

[30] L. Aolaritei, Z. O. Wang, J. Zhu, M. I. Jordan, and Y. Marzouk, “Conformal prediction under levy-prokhorov distribution shifts: Robustness to local and global perturbations,” arXiv preprint arXiv:2502.14105, 2025.

[31] A. Gendler, T. Weng, L. Daniel, and Y. Romano, “Adversarially robust conformal prediction,” in Proceedings of the International Conference on Learning Representations (ICLR), 2022, openReview ID: 9L1BsI4wP1H. [Online]. Available: https://openreview.net/forum?id=9L1BsI4wP1H

[32] S. Tonkens, S. Sun, R. Yu, and S. Herbert, “Scalable safe long-horizon planning in dynamic environments leveraging conformal prediction and temporal correlations,” in Long-Term Human Motion Prediction Workshop, International Conference on Robotics and Automation, 2023.

[33] X. Yu, Y. Zhao, X. Yin, and L. Lindemann, “Signal temporal logic control synthesis among uncontrollable dynamic agents with conformal prediction,” Automatica, vol. 183, p. 112616, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0005109825005126

[34] Z. Huang, T. Ji, H. Zhang, F. C. Pouria, K. Driggs-Campbell, and R. Dong, “Interaction-aware conformal prediction for crowd navigation,” arXiv preprint arXiv:2502.06221, 2025.

[35] S. Wang, Y. Gao, and X. Yin, “Learning-based conformal tube mpc for safe control in interactive multi-agent systems,” arXiv preprint arXiv:2504.03293, 2025.

[36] V. Vovk, “Conditional validity of inductive conformal predictors,” in Asian conference on machine learning. PMLR, 2012, pp. 475–490.

A The Iterative Policy Update Algorithm

Algorithm 1 Iterative Safe Policy Improvement (Explicit & Implicit Forms)
1: Input: confidence levels $1-\alpha$, $1-\delta$; initial safe policy $\pi_0$.
2: Input: closed-loop gain $\kappa = \beta_T L_U$; fixed predictor...

1. Start with a search range $[r_{\text{low}}, r_{\text{high}}] = [q_j, r_{\max}]$.
2. Propose a candidate radius $r_{\text{cand}} = (r_{\text{low}} + r_{\text{high}})/2$.
3. Call the planner (the "pi_cand" step): solve $P[j+1; r_{\text{cand}}]$ to get the candidate policy $\pi_{\text{cand}} = \pi^\star(r_{\text{cand}})$.
4. Check the inequality: calculate the true required budget for this policy, $M_{\text{req}} = \beta_T \|\pi_{\text{cand}} - \pi_j\|_\infty$.
5. If $r_{\text{cand}} \ge q_j + M_{\text{req}}$, the radius is safe. This is a valid solution, so we try to find a smaller one: $r_{\text{high}} = r_{\text{cand}}$.
6. If $r_{\text{cand}} < q_j + M_{\text{req}}$, the radius is unsafe. We must search higher: $r_{\text{low}} = r_{\text{cand}}$.
7. Repeat until the range is sufficiently small. The final $r_{j+1}$ is $r_{\text{high}}$.

C Deferred Proofs

C.1 Proof of Lemma 2

Let us first recall the dynamics of the ego and uncontrollable agents from (1) as
$$x_{t+1} = f_X(x_t, u_t), \qquad y_{t+1} = f_Y(y_t, x_t, u_t, \nu_t),$$
under the noise sequence $\{\nu_t\}_{t=0}^{T-1}$. Under Assumption 1, i.e., Lipschitz continuity of $f_X$ and $f_Y$, our goal is now to b...
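The bisection search over the radius listed before Appendix C can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `bisect_radius`, `solve_planner`, `sup_norm_diff`, and `toy_planner` are hypothetical, policies are represented as flat lists of floats, and the sketch assumes that $r_{\max}$ itself satisfies the safety inequality and that feasibility is monotone in the radius, so that halving the interval is meaningful.

```python
from typing import Callable, Sequence


def sup_norm_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Sup-norm distance between two policies stored as flat sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def bisect_radius(
    q_j: float,
    r_max: float,
    pi_j: Sequence[float],
    solve_planner: Callable[[float], Sequence[float]],  # hypothetical planner interface
    beta_T: float,
    tol: float = 1e-6,
    max_iter: int = 100,
) -> float:
    """Return r_high after bisecting [q_j, r_max] on r >= q_j + M_req(r)."""
    r_low, r_high = q_j, r_max
    for _ in range(max_iter):
        if r_high - r_low <= tol:
            break
        r_cand = 0.5 * (r_low + r_high)
        pi_cand = solve_planner(r_cand)                # candidate policy for this radius
        m_req = beta_T * sup_norm_diff(pi_cand, pi_j)  # required budget M_req
        if r_cand >= q_j + m_req:
            r_high = r_cand                            # safe: try a smaller radius
        else:
            r_low = r_cand                             # unsafe: search higher
    return r_high


# Toy stand-in planner (an assumption for illustration only): a larger radius
# gives a policy closer to the previous one, pi_j = [0.0].
def toy_planner(r: float) -> list:
    return [1.0 / (1.0 + r)]


r_next = bisect_radius(q_j=0.5, r_max=4.0, pi_j=[0.0],
                       solve_planner=toy_planner, beta_T=1.0)
```

In the toy problem the inequality $r \ge 0.5 + 1/(1+r)$ becomes tight at $r = 1$, so the search returns a value just above 1. Returning `r_high` rather than the midpoint means the reported radius has always passed the check itself, provided `r_max` does.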

work page

[44] [44]

Apply the Mean Value Theorem (MVT).SinceF r is differentiable on the open interval betweenq andq ′, there exists a pointξbetweenqandq ′ such that Fr(q′)−F r(q) =f r(ξ) (q′ −q)

work page

[45] [45]

Use the density lower bound.Becausef r(ξ)≥f ⋆ by (54), we get |p′ −p|=|F r(q′)−F r(q)|=f r(ξ)|q ′ −q| ≥f ⋆ |q′ −q|

work page

[46] [46]

25 Why we need Lemma 4.In the convergence analysis (Theorem 3), we compare the population quantiles at twolevels,1−¯α j (used for calibration) and1−α(the target)

Rearrange.Hence |q′ −q| ≤ |p′ −p| f⋆ , which is (55). 25 Why we need Lemma 4.In the convergence analysis (Theorem 3), we compare the population quantiles at twolevels,1−¯α j (used for calibration) and1−α(the target). Lemma 4 gives the clean bound Q1−¯αj −Q 1−α ≤ |α−¯αj| f⋆ , which is the level-shift term in the perturbationη j. Lemma 5(Empirical quantile ...

work page
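The Lemma 4 bound $|q' - q| \le |p' - p| / f_\star$ can be sanity-checked numerically. The sketch below is illustrative only: it assumes a standard normal score distribution and the levels 0.90 and 0.92, neither of which comes from the paper, and it takes $f_\star$ to be the minimum density on the interval between the two quantiles.

```python
from statistics import NormalDist


def quantile_shift_bound(p: float, p_prime: float, f_star: float) -> float:
    """Right-hand side of the Lemma 4 bound: |p' - p| / f_star."""
    return abs(p_prime - p) / f_star


# Illustrative assumption: scores follow a standard normal distribution.
dist = NormalDist(mu=0.0, sigma=1.0)
p, p_prime = 0.90, 0.92
q, q_prime = dist.inv_cdf(p), dist.inv_cdf(p_prime)

# Both quantiles are positive and the normal density decreases there, so the
# minimum density on [q, q'] is attained at the larger endpoint.
f_star = dist.pdf(max(q, q_prime))

gap = abs(q_prime - q)                            # actual quantile shift
bound = quantile_shift_bound(p, p_prime, f_star)  # Lemma 4 upper bound
```

Here `gap` is roughly 0.12 and `bound` roughly 0.13. The bound is close to tight because the density varies little over such a short interval.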

Pin the target quantile and a local window. Let $q^\star = Q_p(\pi^\star(r))$, and pick a small $\Delta > 0$ that keeps $[q^\star - \Delta, q^\star + \Delta]$ inside the neighborhood where $f_r \ge f_\star$.

One-sided controls for the true CDF using the density lower bound. By the Mean Value Theorem applied to $F_r$ on $[q^\star, q^\star + \Delta]$ and $[q^\star - \Delta, q^\star]$, there exist points $\xi_+, \xi_-$ in those intervals with
$$F_r(q^\star + \Delta) - F_r(q^\star) = f_r(\xi_+)\,\Delta \ge f_\star \Delta, \qquad F_r(q^\star) - F_r(q^\star - \Delta) = f_r(\xi_-)\,\Delta \ge f_\star \Delta.$$
Since $F_r(q^\star) = p$, we get
$$F_r(q^\star + \Delta) \ge p + f_\star \Delta, \qquad F_r(q^\star - \Delta) \le p - f_\star \Delta. \tag{58}$$

Transfer these inequalities to the empirical CDF on the DKW event. Using (56), we have
$$P^n\big\{ F_{r,n}(q^\star + \Delta) \ge F_r(q^\star + \Delta) - \varepsilon \ge p + f_\star \Delta - \varepsilon \tag{59}$$
$$\text{and } F_{r,n}(q^\star - \Delta) \le F_r(q^\star - \Delta) + \varepsilon \le p - f_\star \Delta + \varepsilon \big\} \ge 1 - \delta. \tag{60}$$

Choose $\Delta$ to make the inequalities straddle level $p$. Set $\Delta = \varepsilon / f_\star$ so that
$$P^n\{ F_{r,n}(q^\star + \Delta) \ge p \text{ and } F_{r,n}(q^\star - \Delta) \le p \} \ge 1 - \delta.$$

Use the empirical quantile definition to trap $q_{n,p}$. By definition, $q_{n,p} = \inf\{t : F_{r,n}(t) \ge p\}$. Since $F_{r,n}(q^\star - \Delta) \le p$ and $F_{r,n}(q^\star + \Delta) \ge p$, the monotonicity of $F_{r,n}$ implies
$$P^n\{ q_{n,p} \in [q^\star - \Delta, q^\star + \Delta] \} \ge 1 - \delta,$$
where we used the union bound. Therefore,
$$P^n\left\{ |q_{n,p} - q^\star| \le \Delta = \frac{\varepsilon}{f_\star} = \frac{\varepsilon_n(\delta)}{f_\star} \right\} \ge 1 - \delta.$$
This is exactly (57).

Why we need Lemma 5. In the convergence proof, the em...
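The concentration statement (57) proved above can be illustrated with a small Monte Carlo experiment. This is a sketch under stated assumptions rather than anything taken from the paper: it assumes the standard two-sided DKW radius $\varepsilon_n(\delta) = \sqrt{\log(2/\delta)/(2n)}$ (the exact form of (56) is not shown in this excerpt), standard normal scores, and the hypothetical helper names `dkw_epsilon`, `empirical_quantile`, and `coverage_frequency`. The value `f_star = 0.39` is chosen by hand as a valid density lower bound on the window around the median.

```python
import math
import random
from statistics import NormalDist


def dkw_epsilon(n: int, delta: float) -> float:
    """Assumed DKW radius: sup_t |F_n(t) - F(t)| <= eps with prob. >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def empirical_quantile(samples, p: float) -> float:
    """q_{n,p} = inf{t : F_n(t) >= p}, i.e. the ceil(n*p)-th order statistic."""
    ordered = sorted(samples)
    k = max(1, math.ceil(p * len(ordered)))
    return ordered[k - 1]


def coverage_frequency(n, p, delta, f_star, trials, seed=0):
    """Fraction of trials with |q_{n,p} - q_star| <= eps_n(delta) / f_star."""
    rng = random.Random(seed)
    q_star = NormalDist().inv_cdf(p)          # population quantile
    radius = dkw_epsilon(n, delta) / f_star   # Delta = eps / f_star
    hits = 0
    for _ in range(trials):
        samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(empirical_quantile(samples, p) - q_star) <= radius:
            hits += 1
    return hits / trials, radius


# Median of a standard normal; the density stays above 0.39 on the window
# [-radius, radius], so f_star = 0.39 is a valid lower bound there.
freq, radius = coverage_frequency(n=2000, p=0.5, delta=0.1,
                                  f_star=0.39, trials=200)
```

The observed frequency `freq` should be at least $1 - \delta = 0.9$. In practice it comes out well above that, since the DKW inequality controls the whole CDF uniformly and is therefore conservative for a single quantile.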