pith.

arxiv: 2605.24463 · v1 · pith:FDZ62UWYnew · submitted 2026-05-23 · 📡 eess.SY · cs.SY

Cost-Aware Adaptive Conformal Inference for Runtime Assurance in Dynamic Environments

Pith reviewed 2026-06-30 13:17 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords conformal inference · adaptive conformal · runtime assurance · cost-aware · violation cost · dynamic environments · statistical guarantee · control synthesis

The pith

Cost-aware conformal inference bounds both violation frequency and cumulative harm.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Cost-Aware Adaptive Conformal Inference, which folds violation costs into the adaptation rule so that prediction sets widen in proportion to how harmful a miscoverage would be. This produces simultaneous long-run bounds on the average rate of violations and on the total accumulated cost of those violations, even when the underlying data distribution shifts over time. A reader would care because standard conformal methods control only one of those quantities, leaving open the possibility that rare but expensive failures accumulate unacceptable harm. The method is then embedded in a model-free control loop that trades off task performance against these two risk measures.

Core claim

Cost-Aware Adaptive Conformal Inference uses a loss function that multiplies the usual miscoverage indicator by the realized violation cost; the resulting loss sequence drives the standard adaptive conformal update, yielding a dual guarantee that the long-run fraction of violations stays below a target and the long-run average cost per step stays below a second target, all without knowledge of the time-varying distribution.
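Why an adaptive update controls a long-run average can be seen from the standard telescoping argument for adaptive conformal inference (Gibbs and Candès, 2021). The derivation below is a sketch under assumptions, not the paper's exact rule: it takes the cost-weighted loss to be L_t = c_t e_t, the update to be δ_{t+1} = δ_t + γ(α − L_t), and assumes the iterates δ_t stay in a bounded interval [δ_min, δ_max].

```latex
\begin{aligned}
L_t &= c_t\, e_t, \qquad e_t = \mathbf{1}\{ s_t > \hat{q}_t(\delta_t) \}, \qquad
\delta_{t+1} = \delta_t + \gamma\,(\alpha - L_t), \\
\delta_{T+1} &= \delta_1 + \gamma \sum_{t=1}^{T} (\alpha - L_t)
\;\;\Longrightarrow\;\;
\frac{1}{T}\sum_{t=1}^{T} L_t = \alpha - \frac{\delta_{T+1} - \delta_1}{\gamma T}, \\
\Bigl|\frac{1}{T}\sum_{t=1}^{T} L_t - \alpha\Bigr|
&\le \frac{\delta_{\max} - \delta_{\min}}{\gamma T} \;\xrightarrow[T \to \infty]{}\; 0 .
\end{aligned}
```

No distributional assumption enters this step, which is why it survives distribution shift. It pins down only the average of the weighted loss; the unweighted frequency (1/T) Σ_t e_t follows directly only when L_t ≥ e_t for every t, for example when every violation costs at least 1.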

What carries the argument

Cost-aware loss function that multiplies the miscoverage indicator by the violation cost.
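The mechanism can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the loss L_t = c_t · e_t, the update δ ← δ + γ(α − L_t), the sliding-window empirical quantile, and the names `cost_aware_aci`, `alpha`, `gamma`, and `window` are all assumptions chosen for the example.

```python
import numpy as np


def cost_aware_aci(scores, costs, alpha=0.1, gamma=0.05, window=500):
    """Run an ACI-style loop driven by a cost-weighted loss (illustrative sketch).

    Assumed (not taken from the paper): L_t = c_t * e_t and
    delta_{t+1} = delta_t + gamma * (alpha - L_t); the threshold is the
    empirical (1 - delta_t)-quantile of the most recent `window` scores.
    """
    delta = alpha
    history, errors, losses, deltas = [], [], [], []
    for s, c in zip(scores, costs):
        if not history or delta <= 0.0:
            threshold = np.inf   # no data yet, or level 1 - delta >= 1: cover everything
        elif delta > 1.0:
            threshold = -np.inf  # level 1 - delta < 0: empty prediction set
        else:
            threshold = np.quantile(history[-window:], 1.0 - delta)
        e = float(s > threshold)          # miscoverage indicator e_t
        loss = c * e                      # cost-weighted loss L_t
        delta += gamma * (alpha - loss)   # costly misses shrink delta, widening the set
        history.append(s)
        errors.append(e)
        losses.append(loss)
        deltas.append(delta)
    return np.array(errors), np.array(losses), np.array(deltas)


# Usage: nonconformity scores whose spread drifts over time.
rng = np.random.default_rng(0)
T = 5000
t = np.arange(T)
scale = 1.0 + 0.5 * np.sin(2 * np.pi * t / 1000)
scores = np.abs(rng.normal(0.0, scale))
costs = rng.uniform(0.5, 3.0, size=T)  # every violation costs more than alpha
errors, losses, deltas = cost_aware_aci(scores, costs)
print(f"average loss {losses.mean():.4f}, violation frequency {errors.mean():.4f}")
```

Because every cost here exceeds α, δ_t stays bounded and the average loss settles near α. Since each cost lies in [0.5, 3), the violation frequency is pinned between the average loss divided by 3 and the average loss divided by 0.5.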

If this is right

  • The controller expands sets more aggressively precisely when violations would be costly and keeps them tight otherwise.
  • The closed-loop system balances task performance against both reliability and total harm without an explicit plant model.
  • Prediction-set size automatically reflects severity rather than treating every violation as equal.
  • The same guarantee holds for any sequence of cost functions provided the costs remain non-negative and bounded.
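How a single costly miss widens the set more than a cheap one can be illustrated with a hypothetical helper, `threshold_after_violation`. It assumes the update δ ← δ + γ(α − c) after a violation of cost c, and takes the set threshold to be the empirical (1 − δ)-quantile of a fixed batch of scores, with the quantile level clipped to [0, 1] for simplicity.

```python
import numpy as np


def threshold_after_violation(scores, delta, cost, alpha=0.1, gamma=0.05):
    """Hypothetical helper: score threshold before and after one violation of `cost`."""
    new_delta = delta + gamma * (alpha - cost)   # the drop in delta grows linearly with cost
    before = np.quantile(scores, np.clip(1.0 - delta, 0.0, 1.0))
    after = np.quantile(scores, np.clip(1.0 - new_delta, 0.0, 1.0))
    return before, after


scores = np.linspace(0.0, 1.0, 1001)   # stand-in calibration scores
before, after_cheap = threshold_after_violation(scores, delta=0.2, cost=0.2)
_, after_costly = threshold_after_violation(scores, delta=0.2, cost=3.0)
print(before, after_cheap, after_costly)
```

With γ = 0.05 and α = 0.1, the cheap miss moves δ from 0.2 to 0.195, while the costly one moves it to 0.055, so the threshold (and hence the set) grows much further after the costly miss.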

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same weighting idea could be tried inside other adaptive inference schemes that currently track only frequency.
  • In safety-critical applications the cumulative-cost bound supplies a direct handle on expected total harm over a mission horizon.
  • One could test whether the dual guarantee remains intact when costs themselves are estimated from data rather than observed exactly.

Load-bearing premise

Weighting the miscoverage indicator by violation costs inside the conformal adaptation rule still produces valid statistical guarantees when the data distribution changes over time.

What would settle it

An experiment under a known non-stationary distribution in which, as the run grows longer, either the long-run violation frequency or the long-run average violation cost exceeds its target.
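A harness for such an experiment only needs the logged miscoverage indicators and realized costs. The sketch below is hypothetical: the function name, the `tol` slack for finite runs, and the reading of the cost target as an average cost per step are assumptions. It reports the running averages and flags whichever target is exceeded at the end of the run.

```python
import numpy as np


def dual_target_check(errors, costs, alpha, eta, burn_in=0, tol=0.0):
    """Compare long-run violation frequency and average violation cost with their targets.

    errors: 0/1 miscoverage indicators e_t; costs: realized costs c_t (same length).
    """
    e = np.asarray(errors, dtype=float)[burn_in:]
    c = np.asarray(costs, dtype=float)[burn_in:]
    steps = np.arange(1, len(e) + 1)
    freq = np.cumsum(e) / steps        # running violation frequency
    harm = np.cumsum(c * e) / steps    # running average violation cost per step
    return {
        "freq": freq,
        "harm": harm,
        "freq_ok": bool(freq[-1] <= alpha + tol),
        "harm_ok": bool(harm[-1] <= eta + tol),
    }


# Toy log: one violation every ten steps, each costing 2.0.
errors = np.tile([1.0] + [0.0] * 9, 100)
costs = np.full(errors.shape, 2.0)
report = dual_target_check(errors, costs, alpha=0.1, eta=0.15, tol=1e-9)
print(report["freq"][-1], report["harm"][-1], report["freq_ok"], report["harm_ok"])
```

In this toy log the frequency target is met while the cost target is not; the same outcome persisting in logs from the algorithm itself would count against the dual guarantee.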

Figures

Figures reproduced from arXiv: 2605.24463 by Bai Xue, Jingduo Pan, Luke Ong, Taoran Wu.

Figure 1: Comparison of selected trajectories on Vanderpol.
Figure 2: Comparison of experimental results (lower is better for all three metrics).
Figure 3: Effect of sensitivity β and W; effect of learning rate γ.
Original abstract

This paper addresses the problem of providing runtime assurance for systems operating online under unknown and potentially time-varying data distributions. We propose Cost-Aware Adaptive Conformal Inference (ACI), a novel framework that incorporates constraint violation costs directly into the conformal adaptation mechanism. Our key insight is that uncertainty margins should adapt not only to the frequency of constraint violations but also to their severity. We formalize this through a cost-aware loss function that couples the miscoverage indicator with violation costs. Unlike existing methods that regulate a single controlled metric, our approach provides a dual statistical guarantee: simultaneously bounding the long-run average violation frequencies (reliability) and cumulative violation cost (harm). By weighting prediction failures according to their severity, the algorithm enables the controller to respond proportionally to violation severity, expanding prediction sets aggressively when necessary while maintaining efficiency during nominal operation. We integrate Cost-Aware ACI into a robust control synthesis framework, creating a closed-loop system that balances task performance with runtime risk control without requiring explicit model knowledge. Experiments validate its effectiveness for online risk-aware controller synthesis.
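To make the closed-loop idea concrete, the following hypothetical sketch shows one common way a conformal threshold can enter control: as a margin that tightens a constraint on a predicted state. It is not the paper's synthesis procedure; the scalar function `predict` (a stand-in for a learned one-step predictor), the candidate grid, the bound `x_max`, and the tracking objective are all assumptions.

```python
import numpy as np


def select_input(x, candidates, predict, margin, x_max, x_ref):
    """Pick the candidate input that best tracks x_ref subject to a tightened constraint.

    Hypothetical sketch: the constraint is predict(x, u) + margin <= x_max, where
    `margin` would come from the conformal threshold at the current step. If no
    candidate is feasible, fall back to the one with the smallest predicted state.
    """
    preds = np.array([predict(x, u) for u in candidates])
    feasible = preds + margin <= x_max
    if not feasible.any():
        return candidates[int(np.argmin(preds))]
    tracking_cost = np.where(feasible, np.abs(preds - x_ref), np.inf)
    return candidates[int(np.argmin(tracking_cost))]


def predict(x, u):
    return 0.9 * x + u   # assumed one-step predictor, not a model from the paper


candidates = np.linspace(-1.0, 1.0, 21)
u_loose = select_input(0.5, candidates, predict, margin=0.0, x_max=1.0, x_ref=1.2)
u_tight = select_input(0.5, candidates, predict, margin=0.3, x_max=1.0, x_ref=1.2)
print(u_loose, u_tight)   # a wider margin yields a more conservative input
```

A larger conformal margin, as produced after costly violations, shrinks the feasible input set and moves the controller away from the constraint boundary at some cost in tracking performance.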

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes Cost-Aware Adaptive Conformal Inference (Cost-Aware ACI), which augments standard ACI by replacing the usual miscoverage loss with a cost-weighted version that couples the indicator of constraint violation with the associated violation cost. The central claim is that this yields a dual long-run statistical guarantee: the time-average violation frequency is bounded by a target α while the time-average violation cost is simultaneously bounded by a target η. The method is then embedded in a robust control synthesis loop for runtime assurance under unknown time-varying distributions, with experiments demonstrating its use for online risk-aware controller design.

Significance. If the dual guarantee can be established, the contribution would be significant for safety-critical control applications. It would extend conformal prediction beyond single-metric coverage to a setting that penalizes high-severity violations more heavily, allowing the prediction sets (and thus the controller) to respond proportionally to harm rather than only to frequency. The closed-loop integration with control synthesis is a natural and potentially useful direction.

major comments (1)
  1. [Abstract] Abstract (and wherever the dual-guarantee theorem appears): the claim that both (1/n) Σ_t I_t ≤ α and (1/n) Σ_t c_t I_t ≤ η hold simultaneously is load-bearing for the paper’s contribution. Standard ACI adapts a single threshold via a martingale or quantile-tracking argument that directly drives the unweighted miscoverage process to α. Substituting a cost-weighted loss shifts the adaptation signal to the weighted process. When violation costs are heterogeneous and time-varying, the threshold can converge to a value that meets the weighted target while leaving the unweighted frequency above α. The manuscript must either (i) state auxiliary assumptions (e.g., uniformly bounded costs, separate adaptation loops, or a proven invariance) that restore both bounds or (ii) provide an explicit proof that the weighted adaptation still controls the unweighted rate under the paper’s stated conditions.
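The referee's concern can be reproduced in a toy setting. In this hypothetical sketch the scores are i.i.d. Uniform(0, 1), so the exact (1 − δ)-quantile is 1 − δ; the loss is assumed to be c_t · e_t with the update δ ← δ + γ(α − L_t); and every violation is cheap, costing between 0.15 and 0.25. The weighted average is driven to α while the unweighted frequency settles far above α.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, gamma, T = 0.1, 0.05, 5000
delta = alpha
errors = np.zeros(T)
losses = np.zeros(T)
for t in range(T):
    s = rng.uniform()                  # score s_t ~ Uniform(0, 1)
    threshold = 1.0 - delta            # exact (1 - delta)-quantile of Uniform(0, 1)
    errors[t] = float(s > threshold)   # delta <= 0 never misses; delta > 1 always misses
    losses[t] = rng.uniform(0.15, 0.25) * errors[t]   # cheap, heterogeneous costs
    delta += gamma * (alpha - losses[t])

print(f"weighted average {losses.mean():.3f}, unweighted frequency {errors.mean():.3f}")
```

With costs between 0.15 and 0.25, matching an average loss of about 0.1 forces the violation frequency into roughly [0.4, 0.67], several times the nominal α = 0.1.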

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and substantive comment on the dual statistical guarantee, which is indeed central to the contribution. We address the concern directly below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract (and wherever the dual-guarantee theorem appears): the claim that both (1/n) Σ_t I_t ≤ α and (1/n) Σ_t c_t I_t ≤ η hold simultaneously is load-bearing for the paper’s contribution. Standard ACI adapts a single threshold via a martingale or quantile-tracking argument that directly drives the unweighted miscoverage process to α. Substituting a cost-weighted loss shifts the adaptation signal to the weighted process. When violation costs are heterogeneous and time-varying, the threshold can converge to a value that meets the weighted target while leaving the unweighted frequency above α. The manuscript must either (i) state auxiliary assumptions (e.g., uniformly bounded costs, separate adaptation loops, or a proven invariance) that restore both bounds or (ii) provide an explicit proof that the weighted adaptation still controls the unweighted rate under the paper’s stated conditions.

    Authors: We agree that the original presentation did not make the simultaneous control fully explicit. The Cost-Aware ACI update uses a single threshold driven by the cost-weighted loss, and the manuscript's theorem statement claims both long-run bounds without a self-contained argument showing why the unweighted frequency cannot exceed α when costs vary. In the revision we will supply an explicit proof (option (ii)). The paper's existing conditions are that costs are nonnegative and upper-bounded by a known constant C, and that the adaptation gain satisfies the standard step-size conditions for almost-sure convergence of the weighted process to η. The upper bound alone, 0 ≤ c_t I_t ≤ C I_t, only gives (1/n) Σ_t I_t ≥ (1/C)(1/n) Σ_t c_t I_t, which is a lower bound on the unweighted frequency. The proof will therefore also use a positive lower bound c_t ≥ c_min > 0 on violation steps, which yields (1/n) Σ_t I_t ≤ (1/c_min)(1/n) Σ_t c_t I_t plus a vanishing term and hence an explicit bound on the unweighted frequency in terms of the cost target. The proof will be inserted after the main theorem, and the abstract wording will be tightened to reference the new argument. revision: yes
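A quick numerical check of the two directions of the per-step cost bound. The helper `frequency_bounds` and the synthetic indicators and costs are hypothetical. Averaging c_min · I_t ≤ c_t · I_t ≤ C · I_t over t shows that an upper cost bound C bounds the violation frequency only from below, while bounding it from above needs a positive lower bound c_min on costs incurred at violations.

```python
import numpy as np


def frequency_bounds(indicators, costs, c_min, c_max):
    """Bounds on the violation frequency implied by the average violation cost.

    Assumes c_min <= c_t <= c_max whenever I_t = 1, with c_min > 0.
    Averaging c_min * I_t <= c_t * I_t <= c_max * I_t over t gives
    harm / c_max <= freq <= harm / c_min.
    """
    I = np.asarray(indicators, dtype=float)
    c = np.asarray(costs, dtype=float)
    freq = I.mean()
    harm = (c * I).mean()
    return harm / c_max, freq, harm / c_min


rng = np.random.default_rng(2)
I = (rng.uniform(size=10_000) < 0.2).astype(float)   # violation indicators I_t
c = rng.uniform(0.5, 4.0, size=10_000)               # costs in [c_min, C) = [0.5, 4.0)
lower, freq, upper = frequency_bounds(I, c, c_min=0.5, c_max=4.0)
print(lower, freq, upper)
```

Even with c_min available, the upper bound only implies a frequency of at most α when the cost target satisfies η ≤ c_min · α.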

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces a cost-aware loss function within the ACI adaptation mechanism and claims that this yields simultaneous long-run bounds on both unweighted violation frequency and cost-weighted harm. No equations, self-citations, or uniqueness theorems are exhibited in the abstract or description that would reduce either guarantee to a fitted parameter or prior result by construction. The adaptation is presented as a direct formal extension of standard ACI, with the dual property asserted to follow from the weighted loss without evident self-referential closure or renaming of known empirical patterns. The derivation chain therefore remains self-contained against external statistical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only; no explicit free parameters, invented entities, or ad-hoc axioms listed. Relies on background assumptions of conformal prediction.

axioms (1)
  • standard math: Conformal prediction provides valid coverage guarantees under suitable assumptions on data exchangeability or stationarity.
    Implicit foundation for any conformal inference method.

pith-pipeline@v0.9.1-grok · 5716 in / 1016 out tokens · 29556 ms · 2026-06-30T13:17:00.909470+00:00 · methodology

