Confidence Sequences for Online Statistical Model Checking of Markov Decision Processes
Pith reviewed 2026-06-25 20:05 UTC · model grok-4.3
The pith
Confidence sequences designed for online sampling yield valid statistical guarantees for Markov decision processes while using 50 times fewer samples than prior methods on average.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the presented confidence sequences remain valid in the online setting for MDPs, deliver tighter bounds than union-bound methods, and enable an implementation that requires 50 times fewer samples on average than the previous state of the art while still producing meaningful statistical guarantees about the underlying value.
What carries the argument
Confidence sequences for online statistical model checking, which supply time-uniform concentration bounds that stay valid under adaptive stopping rules.
If this is right
- Statistical conclusions about unknown transition probabilities in MDPs can be obtained without assuming fixed sample sizes in advance.
- Bounds on the true value of an MDP can be tightened enough to decide properties that union-bound methods leave inconclusive.
- Verification of cyber-physical and biological models becomes feasible under realistic sampling budgets.
- Existing tools that rely on union bounds or risk invalid online conclusions can be replaced by the new sequences.
Where Pith is reading between the lines
- The same sequences could be tested on related models such as stochastic games that add an adversarial player.
- Integration into existing probabilistic model checkers might reduce the computational cost of handling parameter uncertainty.
- The sample reduction suggests the approach could scale verification to MDPs with larger state spaces than previously practical.
- Real-time monitoring applications might use the online property to update guarantees as new observations arrive.
Load-bearing premise
The sequences stay valid and non-vacuous when applied to MDPs that mix nondeterministic choices with probabilistic transitions and when sampling stops based on observed data.
What would settle it
Execute the new implementation on a standard MDP benchmark suite and record either an incorrect statistical bound on at least one instance or an average sample count no lower than the previous best method.
Figures
read the original abstract
Markov decision processes (MDPs) are a classic model of decision making under uncertainty, exhibiting both non-deterministic choice as well as probabilistic uncertainty. Traditionally, exact knowledge of the underlying probabilities is assumed. However, this often is unrealistic, e.g.\ when modelling cyber-physical systems or biological processes. Here, statistical methods provide a way towards obtaining meaningful guarantees. The classical approach is to gather samples in the MDP, use these to draw statistical conclusions about the transition probabilities, and from there obtain bounds on the true value; then, if these bounds are too broad, repeat. However, existing implementations of this approach are either subtly incorrect or sub-optimal, and quite often both. We present several \emph{confidence sequences}, which are specifically designed for such \enquote{online} settings, implement all of them in an efficient tool, and show their practical applicability. In particular, we show that they outperform classical \enquote{union-bound} style approaches, and overall our implementation requires 50x less samples on average than previous state of the art.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes several confidence sequences specifically designed for online statistical model checking of MDPs that incorporate both nondeterministic action choices and probabilistic transitions. It implements them in an efficient tool, shows they outperform classical union-bound approaches, and claims the implementation requires 50x fewer samples on average than prior state-of-the-art methods.
Significance. If the sequences remain valid and non-vacuous under adaptive sampling and stopping, the work would offer a practical advance in sample-efficient statistical verification of MDPs for cyber-physical and biological systems. The reported 50x reduction would represent a substantial efficiency gain if supported by rigorous guarantees and reproducible experiments.
major comments (2)
- [theoretical construction / proof of validity] The central theoretical claim that the sequences are valid for online MDP settings is load-bearing for both correctness and the 50x sample-reduction result, yet the provided text supplies no explicit argument addressing the dependence between martingale increments and the stopping time when the stopping rule and sampling policy are functions of running value estimates that include a max over actions.
- [empirical evaluation] § on empirical results (or equivalent): the 50x average sample reduction is presented without visible error bars, benchmark descriptions, or controls for the specific MDP structure (nondeterminism + adaptivity), making it impossible to assess whether the bounds remain tight or whether hidden looseness from the online stopping rule is present.
minor comments (1)
- The abstract would be clearer if it briefly named the specific families of confidence sequences (e.g., via Ville, betting, or other constructions) rather than referring only to “several confidence sequences.”
Simulated Author's Rebuttal
We thank the referee for the careful review and constructive comments on the theoretical validity and empirical presentation. We address each major comment below.
read point-by-point responses
-
Referee: [theoretical construction / proof of validity] The central theoretical claim that the sequences are valid for online MDP settings is load-bearing for both correctness and the 50x sample-reduction result, yet the provided text supplies no explicit argument addressing the dependence between martingale increments and the stopping time when the stopping rule and sampling policy are functions of running value estimates that include a max over actions.
Authors: We agree that the manuscript would be strengthened by an explicit argument addressing the dependence between martingale increments and the stopping time under adaptive policies that depend on running value estimates (including the max over actions). The confidence sequences are constructed via martingale methods intended to remain valid under optional stopping and adaptive sampling, but we acknowledge the need for a dedicated clarification. In the revision we will add a subsection providing a proof outline that the relevant processes remain martingales (or supermartingales) despite the dependence induced by the policy and the max operator, thereby confirming validity of the sequences and the sample-efficiency claims. revision: yes
-
Referee: [empirical evaluation] § on empirical results (or equivalent): the 50x average sample reduction is presented without visible error bars, benchmark descriptions, or controls for the specific MDP structure (nondeterminism + adaptivity), making it impossible to assess whether the bounds remain tight or whether hidden looseness from the online stopping rule is present.
Authors: We agree that the empirical section would benefit from additional detail to allow assessment of tightness and the effect of the online stopping rule. In the revision we will add error bars to the reported sample-reduction figures, expand the benchmark descriptions to explicitly characterize the nondeterminism and adaptivity of each MDP, and include controls or ablations that isolate the contribution of the online stopping rule versus fixed-sample baselines. revision: yes
Circularity Check
No circularity: empirical performance claims rest on independent implementation and testing
full rationale
The paper introduces confidence sequences for online model checking in MDPs and reports an empirical result (50x fewer samples than prior art) obtained by implementing and testing the sequences. No equations, fitted parameters, or self-citations appear in the abstract that would reduce any claimed prediction or uniqueness result to a definition or input by construction. The derivation chain is therefore self-contained against external benchmarks; the performance metric is measured rather than defined in terms of itself.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
In: COLT
Agarwal, A., Kakade, S.M., Yang, L.F.: Model-based reinforcement learning with a generative model is minimax optimal. In: COLT. Proceedings of Machine Learning Research, vol. 125, pp. 67–83. PMLR (2020)
2020
-
[2]
Agarwal, C., Guha, S., Křetínsk` y, J., Muruganandham, P.: PAC statistical model checking of mean payoff in discrete-and continuous-time MDP. In: CAV. pp. 3–25. Springer (2022)
2022
-
[3]
In: AAAI
Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: AAAI. pp. 2669–2678. AAAI Press (2018)
2018
-
[4]
In: CAV, Part I
Ashok, P., Kretínský, J., Weininger, M.: PAC statistical model checking for Markov decision processes and stochastic games. In: CAV, Part I. LNCS, vol. 11561, pp. 497–519. Springer (2019)
2019
-
[5]
In: International Conference on Quantitative Evaluation of Systems
Bacci, G., Ingólfsdóttir, A., Larsen, K.G., Reynouard, R.: An mm algorithm to estimate parameters in continuous-time markov chains. In: International Conference on Quantitative Evaluation of Systems. pp. 82–100. Springer (2023)
2023
-
[6]
In: AAAI
Badings, T.S., Romao, L., Abate, A., Jansen, N.: Probabilities are not enough: For- mal controller synthesis for stochastic dynamical models with epistemic uncertainty. In: AAAI. pp. 14701–14710. AAAI Press (2023)
2023
-
[7]
Badings, T.S., Romao, L., Abate, A., Parker, D., Poonawala, H.A., Stoelinga, M., Jansen, N.: Robust control for dynamical systems with non-gaussian noise via formal abstractions. J. Artif. Intell. Res.76, 341–391 (2023)
2023
-
[8]
Badings, T.S., Simão, T.D., Suilen, M., Jansen, N.: Decision-making under uncer- tainty: beyond probabilities. Int. J. Softw. Tools Technol. Transf.25(3), 375–391 (2023)
2023
-
[9]
In: NFM, Lecture Notes in Computer Science, vol
Baier, C., Dubslaff, C., Wienhöft, P., Kiebel, S.J.: Strategy synthesis in Markov decision processes under limited sampling access. In: NFM, Lecture Notes in Computer Science, vol. 13903, pp. 86–103. Springer (2023)
2023
-
[10]
In: NASA Formal Methods Symposium
Baier, C., Dubslaff, C., Wienhöft, P., Kiebel, S.J.: Strategy synthesis in markov decision processes under limited sampling access. In: NASA Formal Methods Symposium. pp. 86–103. Springer (2023)
2023
-
[11]
MIT Press (2008),https: //mitpress.mit.edu/books/principles-model-checking
Baier, C., Katoen, J.: Principles of model checking. MIT Press (2008),https: //mitpress.mit.edu/books/principles-model-checking
2008
-
[12]
Bazille,H.,Genest,B.,Jégourel,C.,Sun,J.:GlobalPACboundsforlearningdiscrete time Markov chains. In: CAV (2). Lecture Notes in Computer Science, vol. 12225, pp. 304–326. Springer (2020).https://doi.org/10.1007/978-3-030-53291-8_17
-
[13]
John Wiley & Sons (2017)
Billingsley, P.: Probability and measure. John Wiley & Sons (2017)
2017
-
[14]
Cambridge university press (2004)
Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge university press (2004)
2004
-
[15]
Artificial Intelligence121(1-2), 31–47 (2000)
Brafman, R.I., Tennenholtz, M.: A near-optimal polynomial time algorithm for learning in certain classes of stochastic games. Artificial Intelligence121(1-2), 31–47 (2000)
2000
-
[16]
TheoretiCS4(2025)
Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Kretínský, J., Kwiatkowska, M., Meggendorfer, T., Parker, D., Ujma, M.: Learning algorithms for verification of markov decision processes. TheoretiCS4(2025)
2025
-
[17]
In: ATVA
Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Křetínský, J., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: ATVA. pp. 98–114. Springer (2014).https://doi.org/10.1007/ 978-3-319-11936-6_8
2014
-
[18]
Statistical Science16(2), 101 – 133 (2001).https://doi.org/10.1214/ss/ 1009213286 20 K
Brown, L.D., Cai, T.T., DasGupta, A.: Interval Estimation for a Binomial Propor- tion. Statistical Science16(2), 101 – 133 (2001).https://doi.org/10.1214/ss/ 1009213286 20 K. Kueffner et al
work page doi:10.1214/ss/ 2001
-
[19]
In: TACAS (to appear, preprint available athttps://arxiv.org/pdf/2411.00559)
Budde, C.E., Hartmanns, A., Meggendorfer, T., Weininger, M., Wienhöft, P.: Sound statistical model checking for probabilities and expected rewards. In: TACAS (to appear, preprint available athttps://arxiv.org/pdf/2411.00559). Lecture Notes in Computer Science, Springer (2025)
arXiv 2025
-
[20]
In: 25 Years of Model Check- ing
Chatterjee, K., Henzinger, T.A.: Value iteration. In: 25 Years of Model Check- ing. Lecture Notes in Computer Science, vol. 5000, pp. 107–138. Springer (2008). https://doi.org/10.1007/978-3-540-69850-0_7, https://doi.org/10. 1007/978-3-540-69850-0_7
-
[21]
Biometrika26(4), 404–413 (1934)
Clopper, C.J., Pearson, E.S.: The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika26(4), 404–413 (1934)
1934
-
[22]
In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems
Daca, P., Henzinger, T.A., Křetínsk` y, J., Petrov, T.: Faster statistical model checking for unbounded temporal properties. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. pp. 112–129. Springer (2016)
2016
-
[23]
Proceedings of the National Academy of Sciences58(1), 66–68 (1967)
Darling, D.A., Robbins, H.: Confidence sequences for mean, variance, and median. Proceedings of the National Academy of Sciences58(1), 66–68 (1967)
1967
-
[24]
In: International Symposium on Automated Technology for Verification and Analysis
El Rabih, D., Pekergin, N.: Statistical model checking using perfect simulation. In: International Symposium on Automated Technology for Verification and Analysis. pp. 120–134. Springer (2009)
2009
-
[25]
The American Statistician64(3), 242–249 (2010).https://doi.org/10.1198/tast.2010.09140
Frey, J.: Fixed-width sequential confidence intervals for a proportion. The American Statistician64(3), 242–249 (2010).https://doi.org/10.1198/tast.2010.09140
-
[26]
In: Robotics: Science and Systems (2014).https: //doi.org/10.15607/RSS.2014.X.039
Fu, J., Topcu, U.: Probably approximately correct MDP learning and control with temporal logic constraints. In: Robotics: Science and Systems (2014).https: //doi.org/10.15607/RSS.2014.X.039
-
[27]
Artificial Intelligence122(1-2), 71–109 (2000)
Givan, R., Leach, S., Dean, T.: Bounded-parameter Markov decision processes. Artificial Intelligence122(1-2), 71–109 (2000)
2000
-
[28]
Oxford university press (2020)
Grimmett, G., Stirzaker, D.: Probability and random processes. Oxford university press (2020)
2020
-
[29]
Haddad, S., Monmege, B.: Interval iteration algorithm for MDPs and IMDPs. Theor. Comput. Sci.735, 111–131 (2018).https://doi.org/10.1016/j.tcs.2016. 12.003
-
[30]
Hartmanns, A., Junges, S., Quatmann, T., Weininger, M.: A practitioner’s guide to MDP model checking algorithms. In: TACAS (1). LNCS, vol. 13993, pp. 469–488. Springer (2023).https://doi.org/10.1007/978-3-031-30823-9_24
-
[31]
STTT (to appear), Preprint: https://sjunges.github.io/files/revised_practitioners_guide.pdf (2026)
Hartmanns, A., Junges, S., Quatmann, T., Weininger, M.: The revised practi- tioner’s guide to MDP model checking algorithms. STTT (to appear), Preprint: https://sjunges.github.io/files/revised_practitioners_guide.pdf (2026)
2026
-
[32]
In: AAAI
HasanzadeZonuzy, A., Bura, A., Kalathil, D.M., Shakkottai, S.: Learning with safety constraints: Sample complexity of reinforcement learning for constrained MDPs. In: AAAI. pp. 7667–7674. AAAI Press (2021)
2021
-
[33]
He, R., Jennings, P., Basu, S., Ghosh, A.P., Wu, H.: A bounded statistical approach for model checking of unbounded until properties. In: ASE. pp. 225–234. ACM (2010)
2010
-
[34]
In: Inter- national Conference on Runtime Verification
Henzinger, T.A., Kueffner, K., Singh, V., Sun, I.: Alignment monitoring. In: Inter- national Conference on Runtime Verification. pp. 140–159. Springer (2025)
2025
-
[35]
Journal of the American Statistical Association58(301), 13–30 (1963)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association58(301), 13–30 (1963)
1963
-
[36]
Probability Surveys17, 257–317 (2020)
Howard, S.R., Ramdas, A., McAuliffe, J., Sekhon, J.: Time-uniform chernoff bounds via nonnegative supermartingales. Probability Surveys17, 257–317 (2020)
2020
-
[37]
The Annals of Statistics49(2), 1055–1080 (2021) Confidence Sequences for Online MDP-SMC 21
Howard, S.R., Ramdas, A., McAuliffe, J., Sekhon, J.: Time-uniform, nonparametric, nonasymptotic confidence sequences. The Annals of Statistics49(2), 1055–1080 (2021) Confidence Sequences for Online MDP-SMC 21
2021
-
[38]
In: Conference on Learning Theory
Jamieson, K., Malloy, M., Nowak, R., Bubeck, S.: lil’ucb: An optimal exploration algorithm for multi-armed bandits. In: Conference on Learning Theory. pp. 423–439. PMLR (2014)
2014
-
[39]
In: ICML
Jin, C., Krishnamurthy, A., Simchowitz, M., Yu, T.: Reward-free exploration for reinforcement learning. In: ICML. Proceedings of Machine Learning Research, vol. 119, pp. 4870–4879. PMLR (2020)
2020
-
[40]
Fundamenta Mathematicae6(1), 9–20 (1924)
Khintchine, A.: über einen satz der wahrscheinlichkeitsrechnung. Fundamenta Mathematicae6(1), 9–20 (1924)
1924
-
[41]
Semi- simplicial set models for distributed knowledge
Kretínský, J., Meggendorfer, T., Weininger, M.: Stopping criteria for value itera- tion on stochastic games with quantitative objectives. In: LICS. pp. 1–14. IEEE (2023). https://doi.org/10.1109/LICS56636.2023.10175771, https://doi.org/ 10.1109/LICS56636.2023.10175771
-
[42]
Kretínský, J., Michel, F., Michel, L., Pérez, G.A.: Finite-memory near-optimal learning for Markov decision processes with long-run average reward. In: UAI. Proceedings of Machine Learning Research, vol. 124, pp. 1149–1158. AUAI Press (2020)
2020
-
[43]
In: CONCUR
Křetínský, J., Pérez, G.A., Raskin, J.F.: Learning-based mean-payoff optimization in an unknown MDP under omega-regular constraints. In: CONCUR. pp. 8:1–8:18. Dagstuhl (2018)
2018
-
[44]
In: International conference on computer aided verification
Kwiatkowska, M., Norman, G., Parker, D.: Prism 4.0: Verification of probabilistic real-time systems. In: International conference on computer aided verification. pp. 585–591. Springer (2011)
2011
-
[45]
In: QEST
Kwiatkowska, M.Z., Norman, G., Parker, D.: The PRISM benchmark suite. In: QEST. pp. 203–204. IEEE Computer Society (2012).https://doi.org/10.1109/ QEST.2012.14
2012
-
[46]
In: Rein- forcement Learning, Adaptation, Learning, and Optimization, vol
Lange, S., Gabel, T., Riedmiller, M.A.: Batch reinforcement learning. In: Rein- forcement Learning, Adaptation, Learning, and Optimization, vol. 12, pp. 45–73. Springer (2012)
2012
-
[47]
Cambridge University Press (2020)
Lattimore, T., Szepesvári, C.: Bandit algorithms. Cambridge University Press (2020)
2020
-
[48]
Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. pp. 541–548. AAAI Press (1999)
1999
-
[49]
In: COLT (2009),http://www.cs.mcgill.ca/%7Ecolt2009/papers/012
Maurer, A., Pontil, M.: Empirical bernstein bounds and sample-variance penal- ization. In: COLT (2009),http://www.cs.mcgill.ca/%7Ecolt2009/papers/012. pdf#page=1
2009
-
[50]
In: ATVA
Meggendorfer, T.: PET - A partial exploration tool for probabilistic verification. In: ATVA. Lecture Notes in Computer Science, vol. 13505, pp. 320–326. Springer (2022)
2022
-
[51]
In: CAV (3)
Meggendorfer, T., Weininger, M.: Playing games with your PET: extending the partial exploration tool to stochastic games. In: CAV (3). Lecture Notes in Computer Science, vol. 14683, pp. 359–372. Springer (2024)
2024
-
[52]
Meggendorfer, T., Weininger, M., Wienhöft, P.: Solving robust markov deci- sion processes: Generic, reliable, efficient. In: AAAI. pp. 26631–26641. AAAI Press (2025). https://doi.org/10.1609/AAAI.V39I25.34865, https://doi.org/ 10.1609/aaai.v39i25.34865
-
[53]
Meggendorfer, T., Weininger, M., Wienhöft, P.: What are the odds? improving statistical model checking of markov decision processes. In: QEST+FORMATS. Lecture Notes in Computer Science, vol. 16143, pp. 195–218. Springer (2025). https://doi.org/10.1007/978-3-032-05792-1_11, https://doi.org/10.1007/ 978-3-032-05792-1_11 22 K. Kueffner et al
-
[54]
Perez, M., Somenzi, F., Trivedi, A.: A PAC learning algorithm for LTL and omega- regular objectives in mdps. In: AAAI. pp. 21510–21517. AAAI Press (2024).https: //doi.org/10.1609/AAAI.V38I19.30148
-
[55]
John Wiley and Sons (1994)
Puterman, M.L.: Markov decision processes: Discrete stochastic dynamic program- ming. John Wiley and Sons (1994)
1994
-
[56]
Foundations and Trends® in Statistics1(1-2), 1–390 (2025)
Ramdas, A., Wang, R.: Hypothesis testing with e-values. Foundations and Trends® in Statistics1(1-2), 1–390 (2025)
2025
-
[57]
In: International Conference on Computer Aided Verification
Sen, K., Viswanathan, M., Agha, G.: Statistical model checking of black-box probabilistic systems. In: International Conference on Computer Aided Verification. pp. 202–215. Springer (2004)
2004
-
[58]
In: ICML
Shi, L., Li, G., Wei, Y., Chen, Y., Chi, Y.: Pessimistic q-learning for offline rein- forcement learning: Towards optimal sample complexity. In: ICML. Proceedings of Machine Learning Research, vol. 162, pp. 19967–20025. PMLR (2022)
2022
-
[59]
In: NeurIPS (2022)
Suilen, M., Simão, T.D., Parker, D., Jansen, N.: Robust anytime learning of Markov decision processes. In: NeurIPS (2022)
2022
-
[60]
Tsybakov.Introduction to Nonparametric Estimation
Tsybakov, A.B.: Introduction to Nonparametric Estimation. Springer Series in Statistics, Springer, New York, NY (2009).https://doi.org/10.1007/b13794
-
[61]
In: COLT
Wagenmaker, A.J., Simchowitz, M., Jamieson, K.: Beyond no regret: Instance- dependent PAC reinforcement learning. In: COLT. Proceedings of Machine Learning Research, vol. 178, pp. 358–418. PMLR (2022)
2022
-
[62]
Wainwright, M.J.: High-dimensional statistics: A non-asymptotic viewpoint, vol. 48. Cambridge university press (2019)
2019
-
[63]
In: Breakthroughs in statistics: Foundations and basic theory, pp
Wald, A.: Sequential tests of statistical hypotheses. In: Breakthroughs in statistics: Foundations and basic theory, pp. 256–298. Springer (1992)
1992
-
[64]
Journal of the Royal Statistical Society Series B: Statistical Methodology 86(1), 1–27 (2024)
Waudby-Smith, I., Ramdas, A.: Estimating means of bounded random variables by betting. Journal of the Royal Statistical Society Series B: Statistical Methodology 86(1), 1–27 (2024)
2024
-
[65]
Weininger, M., Grover, K., Misra, S., Kretínský, J.: Guaranteed trade-offs in dynamic information flow tracking games. In: CDC. pp. 3786–3793. IEEE (2021). https://doi.org/10.1109/CDC45484.2021.9683447
-
[66]
IEEE Trans
Wen, M., Topcu, U.: Probably approximately correct learning in adversarial envi- ronments with temporal logic specifications. IEEE Trans. Autom. Control.67(10), 5055–5070 (2022)
2022
-
[67]
Wienhöft, P.: Statistical model checking with robust markov decision processes (2025),https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-1000969
2025
-
[68]
In: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Wienhöft, P., Suilen, M., Simão, T.D., Dubslaff, C., Baier, C., Jansen, N.: More for less: Safe policy improvement with stronger performance guarantees. In: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023. pp. 4406–4415. International Joint Conferences on Artificial Intelligence (IJCAI) (2023)
2023
-
[69]
In: SBMF
Younes, H.L.S., Clarke, E.M., Zuliani, P.: Statistical verification of probabilistic properties with unbounded until. In: SBMF. Lecture Notes in Computer Science, vol. 6527, pp. 144–160. Springer (2010)
2010
-
[70]
seen” the entire system, can apply graph-based analysis, and switch to the grey box approach, using the remaining “unspent
Younes, H.L.S., Simmons, R.G.: Probabilistic verification of discrete event systems using acceptance sampling. In: CAV. pp. 223–235. Springer (2002) Confidence Sequences for Online MDP-SMC 23 A Assumptions on Sampling Access and Knowledge In this appendix, we describe the different settings for the MDP-SMC problem that have appeared in the literature. The...
2002
-
[71]
failure event
We show this using a standard argument from information theory, i.e., we derive a Le Cam-style lower bound, which reduces this problem to binary hypothesis testing. Towards that end let p− := 1 2 −w and p+ := 1 2 + w. Notice that sincep+ −p − = 2w > w, no interval of width at mostw can contain bothp− and p+ simultaneously. We construct a hypothesis test t...
2037
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.