Confidence Sequences for Online Statistical Model Checking of Markov Decision Processes

Konstantin Kueffner; Maximilian Weininger; Patrick Wienh\"oft; Tobias Meggendorfer

arxiv: 2606.25797 · v1 · pith:UCSQL4BHnew · submitted 2026-06-24 · 💻 cs.AI

Confidence Sequences for Online Statistical Model Checking of Markov Decision Processes

Konstantin Kueffner , Tobias Meggendorfer , Maximilian Weininger , Patrick Wienh\"oft This is my paper

Pith reviewed 2026-06-25 20:05 UTC · model grok-4.3

classification 💻 cs.AI

keywords confidence sequencesstatistical model checkingMarkov decision processesonline samplingsample efficiencystatistical verificationcyber-physical systems

0 comments

The pith

Confidence sequences designed for online sampling yield valid statistical guarantees for Markov decision processes while using 50 times fewer samples than prior methods on average.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops several confidence sequences that support sequential sampling from MDPs where decisions to stop or continue depend on the data collected so far. These sequences are constructed to handle the combination of nondeterministic choices and probabilistic transitions without relying on fixed sample sizes. The authors implement the sequences in an efficient tool and compare performance against classical union-bound approaches. A sympathetic reader would care because MDPs model systems like cyber-physical devices and biological processes where exact transition probabilities are unavailable and must be estimated from samples.

Core claim

The central claim is that the presented confidence sequences remain valid in the online setting for MDPs, deliver tighter bounds than union-bound methods, and enable an implementation that requires 50 times fewer samples on average than the previous state of the art while still producing meaningful statistical guarantees about the underlying value.

What carries the argument

Confidence sequences for online statistical model checking, which supply time-uniform concentration bounds that stay valid under adaptive stopping rules.

If this is right

Statistical conclusions about unknown transition probabilities in MDPs can be obtained without assuming fixed sample sizes in advance.
Bounds on the true value of an MDP can be tightened enough to decide properties that union-bound methods leave inconclusive.
Verification of cyber-physical and biological models becomes feasible under realistic sampling budgets.
Existing tools that rely on union bounds or risk invalid online conclusions can be replaced by the new sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sequences could be tested on related models such as stochastic games that add an adversarial player.
Integration into existing probabilistic model checkers might reduce the computational cost of handling parameter uncertainty.
The sample reduction suggests the approach could scale verification to MDPs with larger state spaces than previously practical.
Real-time monitoring applications might use the online property to update guarantees as new observations arrive.

Load-bearing premise

The sequences stay valid and non-vacuous when applied to MDPs that mix nondeterministic choices with probabilistic transitions and when sampling stops based on observed data.

What would settle it

Execute the new implementation on a standard MDP benchmark suite and record either an incorrect statistical bound on at least one instance or an average sample count no lower than the previous best method.

Figures

Figures reproduced from arXiv: 2606.25797 by Konstantin Kueffner, Maximilian Weininger, Patrick Wienh\"oft, Tobias Meggendorfer.

**Figure 2.** Figure 2: Example MDP to show the problem with re-using confidence budget. [PITH_FULL_IMAGE:figures/full_fig_p026_2.png] view at source ↗

**Figure 3.** Figure 3: Bounds on uniform distributions [PITH_FULL_IMAGE:figures/full_fig_p046_3.png] view at source ↗

**Figure 4.** Figure 4: Bounds on linear distributions [PITH_FULL_IMAGE:figures/full_fig_p047_4.png] view at source ↗

**Figure 5.** Figure 5: Bounds on squared distributions [PITH_FULL_IMAGE:figures/full_fig_p048_5.png] view at source ↗

**Figure 6.** Figure 6: Bounds on one-vs-all distributions [PITH_FULL_IMAGE:figures/full_fig_p049_6.png] view at source ↗

read the original abstract

Markov decision processes (MDPs) are a classic model of decision making under uncertainty, exhibiting both non-deterministic choice as well as probabilistic uncertainty. Traditionally, exact knowledge of the underlying probabilities is assumed. However, this often is unrealistic, e.g.\ when modelling cyber-physical systems or biological processes. Here, statistical methods provide a way towards obtaining meaningful guarantees. The classical approach is to gather samples in the MDP, use these to draw statistical conclusions about the transition probabilities, and from there obtain bounds on the true value; then, if these bounds are too broad, repeat. However, existing implementations of this approach are either subtly incorrect or sub-optimal, and quite often both. We present several \emph{confidence sequences}, which are specifically designed for such \enquote{online} settings, implement all of them in an efficient tool, and show their practical applicability. In particular, we show that they outperform classical \enquote{union-bound} style approaches, and overall our implementation requires 50x less samples on average than previous state of the art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

New confidence sequences tailored to online MDP model checking with a reported 50x sample cut, but the adaptive stopping and nondeterminism handling needs explicit verification.

read the letter

The paper supplies confidence sequences built for the online loop in statistical model checking of MDPs, where samples are drawn adaptively until value bounds tighten. They implement the sequences in a tool and report that the approach uses far fewer samples than union-bound baselines, with an average 50x reduction in their tests.

What stands out is the focus on the repeated sampling setting that arises when you cannot assume known probabilities. The authors note that prior methods are often either incorrect or inefficient in this loop, and they target the combination of nondeterministic choices and probabilistic transitions that defines MDPs. The empirical comparison is the clearest practical takeaway.

The main open question is whether the sequences stay valid and reasonably tight once the stopping time depends on the running estimates. In an MDP the value involves a max over actions, so the sampling policy and the decision to stop can both feed back into the data. Standard time-uniform bounds can lose their guarantee or inflate under that dependence unless the proof includes an explicit correction. The abstract states the sequences are designed for online settings, but the stress-test concern about hidden looseness from the feedback loop is worth checking in the full derivation and experiments.

The 50x figure is an average over their benchmarks; seeing the variance, the exact test MDPs, and how the baselines were implemented would strengthen the claim. Minor implementation details like how they handle the max operator in the bounds would also help.

This is for researchers in statistical model checking and formal verification of systems with uncertain transitions. A reader already working on cyber-physical or biological MDPs would find the sequences and the tool useful. The work is coherent enough on its own terms to deserve referee time even if the adaptive guarantee requires tightening.

Referee Report

2 major / 1 minor

Summary. The paper proposes several confidence sequences specifically designed for online statistical model checking of MDPs that incorporate both nondeterministic action choices and probabilistic transitions. It implements them in an efficient tool, shows they outperform classical union-bound approaches, and claims the implementation requires 50x fewer samples on average than prior state-of-the-art methods.

Significance. If the sequences remain valid and non-vacuous under adaptive sampling and stopping, the work would offer a practical advance in sample-efficient statistical verification of MDPs for cyber-physical and biological systems. The reported 50x reduction would represent a substantial efficiency gain if supported by rigorous guarantees and reproducible experiments.

major comments (2)

[theoretical construction / proof of validity] The central theoretical claim that the sequences are valid for online MDP settings is load-bearing for both correctness and the 50x sample-reduction result, yet the provided text supplies no explicit argument addressing the dependence between martingale increments and the stopping time when the stopping rule and sampling policy are functions of running value estimates that include a max over actions.
[empirical evaluation] § on empirical results (or equivalent): the 50x average sample reduction is presented without visible error bars, benchmark descriptions, or controls for the specific MDP structure (nondeterminism + adaptivity), making it impossible to assess whether the bounds remain tight or whether hidden looseness from the online stopping rule is present.

minor comments (1)

The abstract would be clearer if it briefly named the specific families of confidence sequences (e.g., via Ville, betting, or other constructions) rather than referring only to “several confidence sequences.”

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive comments on the theoretical validity and empirical presentation. We address each major comment below.

read point-by-point responses

Referee: [theoretical construction / proof of validity] The central theoretical claim that the sequences are valid for online MDP settings is load-bearing for both correctness and the 50x sample-reduction result, yet the provided text supplies no explicit argument addressing the dependence between martingale increments and the stopping time when the stopping rule and sampling policy are functions of running value estimates that include a max over actions.

Authors: We agree that the manuscript would be strengthened by an explicit argument addressing the dependence between martingale increments and the stopping time under adaptive policies that depend on running value estimates (including the max over actions). The confidence sequences are constructed via martingale methods intended to remain valid under optional stopping and adaptive sampling, but we acknowledge the need for a dedicated clarification. In the revision we will add a subsection providing a proof outline that the relevant processes remain martingales (or supermartingales) despite the dependence induced by the policy and the max operator, thereby confirming validity of the sequences and the sample-efficiency claims. revision: yes
Referee: [empirical evaluation] § on empirical results (or equivalent): the 50x average sample reduction is presented without visible error bars, benchmark descriptions, or controls for the specific MDP structure (nondeterminism + adaptivity), making it impossible to assess whether the bounds remain tight or whether hidden looseness from the online stopping rule is present.

Authors: We agree that the empirical section would benefit from additional detail to allow assessment of tightness and the effect of the online stopping rule. In the revision we will add error bars to the reported sample-reduction figures, expand the benchmark descriptions to explicitly characterize the nondeterminism and adaptivity of each MDP, and include controls or ablations that isolate the contribution of the online stopping rule versus fixed-sample baselines. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on independent implementation and testing

full rationale

The paper introduces confidence sequences for online model checking in MDPs and reports an empirical result (50x fewer samples than prior art) obtained by implementing and testing the sequences. No equations, fitted parameters, or self-citations appear in the abstract that would reduce any claimed prediction or uniqueness result to a definition or input by construction. The derivation chain is therefore self-contained against external benchmarks; the performance metric is measured rather than defined in terms of itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. Standard assumptions of statistical model checking (i.i.d. samples, bounded rewards) are implicitly relied upon but not enumerated.

pith-pipeline@v0.9.1-grok · 5724 in / 965 out tokens · 15395 ms · 2026-06-25T20:05:02.950937+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

71 extracted references · 13 canonical work pages

[1]

In: COLT

Agarwal, A., Kakade, S.M., Yang, L.F.: Model-based reinforcement learning with a generative model is minimax optimal. In: COLT. Proceedings of Machine Learning Research, vol. 125, pp. 67–83. PMLR (2020)

2020
[2]

Agarwal, C., Guha, S., Křetínsk` y, J., Muruganandham, P.: PAC statistical model checking of mean payoff in discrete-and continuous-time MDP. In: CAV. pp. 3–25. Springer (2022)

2022
[3]

In: AAAI

Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: AAAI. pp. 2669–2678. AAAI Press (2018)

2018
[4]

In: CAV, Part I

Ashok, P., Kretínský, J., Weininger, M.: PAC statistical model checking for Markov decision processes and stochastic games. In: CAV, Part I. LNCS, vol. 11561, pp. 497–519. Springer (2019)

2019
[5]

In: International Conference on Quantitative Evaluation of Systems

Bacci, G., Ingólfsdóttir, A., Larsen, K.G., Reynouard, R.: An mm algorithm to estimate parameters in continuous-time markov chains. In: International Conference on Quantitative Evaluation of Systems. pp. 82–100. Springer (2023)

2023
[6]

In: AAAI

Badings, T.S., Romao, L., Abate, A., Jansen, N.: Probabilities are not enough: For- mal controller synthesis for stochastic dynamical models with epistemic uncertainty. In: AAAI. pp. 14701–14710. AAAI Press (2023)

2023
[7]

Badings, T.S., Romao, L., Abate, A., Parker, D., Poonawala, H.A., Stoelinga, M., Jansen, N.: Robust control for dynamical systems with non-gaussian noise via formal abstractions. J. Artif. Intell. Res.76, 341–391 (2023)

2023
[8]

Badings, T.S., Simão, T.D., Suilen, M., Jansen, N.: Decision-making under uncer- tainty: beyond probabilities. Int. J. Softw. Tools Technol. Transf.25(3), 375–391 (2023)

2023
[9]

In: NFM, Lecture Notes in Computer Science, vol

Baier, C., Dubslaff, C., Wienhöft, P., Kiebel, S.J.: Strategy synthesis in Markov decision processes under limited sampling access. In: NFM, Lecture Notes in Computer Science, vol. 13903, pp. 86–103. Springer (2023)

2023
[10]

In: NASA Formal Methods Symposium

Baier, C., Dubslaff, C., Wienhöft, P., Kiebel, S.J.: Strategy synthesis in markov decision processes under limited sampling access. In: NASA Formal Methods Symposium. pp. 86–103. Springer (2023)

2023
[11]

MIT Press (2008),https: //mitpress.mit.edu/books/principles-model-checking

Baier, C., Katoen, J.: Principles of model checking. MIT Press (2008),https: //mitpress.mit.edu/books/principles-model-checking

2008
[12]

In: CAV (2)

Bazille,H.,Genest,B.,Jégourel,C.,Sun,J.:GlobalPACboundsforlearningdiscrete time Markov chains. In: CAV (2). Lecture Notes in Computer Science, vol. 12225, pp. 304–326. Springer (2020).https://doi.org/10.1007/978-3-030-53291-8_17

work page doi:10.1007/978-3-030-53291-8_17 2020
[13]

John Wiley & Sons (2017)

Billingsley, P.: Probability and measure. John Wiley & Sons (2017)

2017
[14]

Cambridge university press (2004)

Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge university press (2004)

2004
[15]

Artificial Intelligence121(1-2), 31–47 (2000)

Brafman, R.I., Tennenholtz, M.: A near-optimal polynomial time algorithm for learning in certain classes of stochastic games. Artificial Intelligence121(1-2), 31–47 (2000)

2000
[16]

TheoretiCS4(2025)

Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Kretínský, J., Kwiatkowska, M., Meggendorfer, T., Parker, D., Ujma, M.: Learning algorithms for verification of markov decision processes. TheoretiCS4(2025)

2025
[17]

In: ATVA

Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Křetínský, J., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: ATVA. pp. 98–114. Springer (2014).https://doi.org/10.1007/ 978-3-319-11936-6_8

2014
[18]

Statistical Science16(2), 101 – 133 (2001).https://doi.org/10.1214/ss/ 1009213286 20 K

Brown, L.D., Cai, T.T., DasGupta, A.: Interval Estimation for a Binomial Propor- tion. Statistical Science16(2), 101 – 133 (2001).https://doi.org/10.1214/ss/ 1009213286 20 K. Kueffner et al

work page doi:10.1214/ss/ 2001
[19]

In: TACAS (to appear, preprint available athttps://arxiv.org/pdf/2411.00559)

Budde, C.E., Hartmanns, A., Meggendorfer, T., Weininger, M., Wienhöft, P.: Sound statistical model checking for probabilities and expected rewards. In: TACAS (to appear, preprint available athttps://arxiv.org/pdf/2411.00559). Lecture Notes in Computer Science, Springer (2025)

arXiv 2025
[20]

In: 25 Years of Model Check- ing

Chatterjee, K., Henzinger, T.A.: Value iteration. In: 25 Years of Model Check- ing. Lecture Notes in Computer Science, vol. 5000, pp. 107–138. Springer (2008). https://doi.org/10.1007/978-3-540-69850-0_7, https://doi.org/10. 1007/978-3-540-69850-0_7

work page doi:10.1007/978-3-540-69850-0_7 2008
[21]

Biometrika26(4), 404–413 (1934)

Clopper, C.J., Pearson, E.S.: The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika26(4), 404–413 (1934)

1934
[22]

In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems

Daca, P., Henzinger, T.A., Křetínsk` y, J., Petrov, T.: Faster statistical model checking for unbounded temporal properties. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. pp. 112–129. Springer (2016)

2016
[23]

Proceedings of the National Academy of Sciences58(1), 66–68 (1967)

Darling, D.A., Robbins, H.: Confidence sequences for mean, variance, and median. Proceedings of the National Academy of Sciences58(1), 66–68 (1967)

1967
[24]

In: International Symposium on Automated Technology for Verification and Analysis

El Rabih, D., Pekergin, N.: Statistical model checking using perfect simulation. In: International Symposium on Automated Technology for Verification and Analysis. pp. 120–134. Springer (2009)

2009
[25]

The American Statistician64(3), 242–249 (2010).https://doi.org/10.1198/tast.2010.09140

Frey, J.: Fixed-width sequential confidence intervals for a proportion. The American Statistician64(3), 242–249 (2010).https://doi.org/10.1198/tast.2010.09140

work page doi:10.1198/tast.2010.09140 2010
[26]

In: Robotics: Science and Systems (2014).https: //doi.org/10.15607/RSS.2014.X.039

Fu, J., Topcu, U.: Probably approximately correct MDP learning and control with temporal logic constraints. In: Robotics: Science and Systems (2014).https: //doi.org/10.15607/RSS.2014.X.039

work page doi:10.15607/rss.2014.x.039 2014
[27]

Artificial Intelligence122(1-2), 71–109 (2000)

Givan, R., Leach, S., Dean, T.: Bounded-parameter Markov decision processes. Artificial Intelligence122(1-2), 71–109 (2000)

2000
[28]

Oxford university press (2020)

Grimmett, G., Stirzaker, D.: Probability and random processes. Oxford university press (2020)

2020
[29]

Haddad, S., Monmege, B.: Interval iteration algorithm for MDPs and IMDPs. Theor. Comput. Sci.735, 111–131 (2018).https://doi.org/10.1016/j.tcs.2016. 12.003

work page doi:10.1016/j.tcs.2016 2018
[30]

In: TACAS (1)

Hartmanns, A., Junges, S., Quatmann, T., Weininger, M.: A practitioner’s guide to MDP model checking algorithms. In: TACAS (1). LNCS, vol. 13993, pp. 469–488. Springer (2023).https://doi.org/10.1007/978-3-031-30823-9_24

work page doi:10.1007/978-3-031-30823-9_24 2023
[31]

STTT (to appear), Preprint: https://sjunges.github.io/files/revised_practitioners_guide.pdf (2026)

Hartmanns, A., Junges, S., Quatmann, T., Weininger, M.: The revised practi- tioner’s guide to MDP model checking algorithms. STTT (to appear), Preprint: https://sjunges.github.io/files/revised_practitioners_guide.pdf (2026)

2026
[32]

In: AAAI

HasanzadeZonuzy, A., Bura, A., Kalathil, D.M., Shakkottai, S.: Learning with safety constraints: Sample complexity of reinforcement learning for constrained MDPs. In: AAAI. pp. 7667–7674. AAAI Press (2021)

2021
[33]

He, R., Jennings, P., Basu, S., Ghosh, A.P., Wu, H.: A bounded statistical approach for model checking of unbounded until properties. In: ASE. pp. 225–234. ACM (2010)

2010
[34]

In: Inter- national Conference on Runtime Verification

Henzinger, T.A., Kueffner, K., Singh, V., Sun, I.: Alignment monitoring. In: Inter- national Conference on Runtime Verification. pp. 140–159. Springer (2025)

2025
[35]

Journal of the American Statistical Association58(301), 13–30 (1963)

Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association58(301), 13–30 (1963)

1963
[36]

Probability Surveys17, 257–317 (2020)

Howard, S.R., Ramdas, A., McAuliffe, J., Sekhon, J.: Time-uniform chernoff bounds via nonnegative supermartingales. Probability Surveys17, 257–317 (2020)

2020
[37]

The Annals of Statistics49(2), 1055–1080 (2021) Confidence Sequences for Online MDP-SMC 21

Howard, S.R., Ramdas, A., McAuliffe, J., Sekhon, J.: Time-uniform, nonparametric, nonasymptotic confidence sequences. The Annals of Statistics49(2), 1055–1080 (2021) Confidence Sequences for Online MDP-SMC 21

2021
[38]

In: Conference on Learning Theory

Jamieson, K., Malloy, M., Nowak, R., Bubeck, S.: lil’ucb: An optimal exploration algorithm for multi-armed bandits. In: Conference on Learning Theory. pp. 423–439. PMLR (2014)

2014
[39]

In: ICML

Jin, C., Krishnamurthy, A., Simchowitz, M., Yu, T.: Reward-free exploration for reinforcement learning. In: ICML. Proceedings of Machine Learning Research, vol. 119, pp. 4870–4879. PMLR (2020)

2020
[40]

Fundamenta Mathematicae6(1), 9–20 (1924)

Khintchine, A.: über einen satz der wahrscheinlichkeitsrechnung. Fundamenta Mathematicae6(1), 9–20 (1924)

1924
[41]

Structure-aware lower bounds and broadening the horizon of tractability for QBF

Kretínský, J., Meggendorfer, T., Weininger, M.: Stopping criteria for value itera- tion on stochastic games with quantitative objectives. In: LICS. pp. 1–14. IEEE (2023). https://doi.org/10.1109/LICS56636.2023.10175771, https://doi.org/ 10.1109/LICS56636.2023.10175771

work page doi:10.1109/lics56636.2023.10175771 2023
[42]

Kretínský, J., Michel, F., Michel, L., Pérez, G.A.: Finite-memory near-optimal learning for Markov decision processes with long-run average reward. In: UAI. Proceedings of Machine Learning Research, vol. 124, pp. 1149–1158. AUAI Press (2020)

2020
[43]

In: CONCUR

Křetínský, J., Pérez, G.A., Raskin, J.F.: Learning-based mean-payoff optimization in an unknown MDP under omega-regular constraints. In: CONCUR. pp. 8:1–8:18. Dagstuhl (2018)

2018
[44]

In: International conference on computer aided verification

Kwiatkowska, M., Norman, G., Parker, D.: Prism 4.0: Verification of probabilistic real-time systems. In: International conference on computer aided verification. pp. 585–591. Springer (2011)

2011
[45]

In: QEST

Kwiatkowska, M.Z., Norman, G., Parker, D.: The PRISM benchmark suite. In: QEST. pp. 203–204. IEEE Computer Society (2012).https://doi.org/10.1109/ QEST.2012.14

2012
[46]

In: Rein- forcement Learning, Adaptation, Learning, and Optimization, vol

Lange, S., Gabel, T., Riedmiller, M.A.: Batch reinforcement learning. In: Rein- forcement Learning, Adaptation, Learning, and Optimization, vol. 12, pp. 45–73. Springer (2012)

2012
[47]

Cambridge University Press (2020)

Lattimore, T., Szepesvári, C.: Bandit algorithms. Cambridge University Press (2020)

2020
[48]

Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. pp. 541–548. AAAI Press (1999)

1999
[49]

In: COLT (2009),http://www.cs.mcgill.ca/%7Ecolt2009/papers/012

Maurer, A., Pontil, M.: Empirical bernstein bounds and sample-variance penal- ization. In: COLT (2009),http://www.cs.mcgill.ca/%7Ecolt2009/papers/012. pdf#page=1

2009
[50]

In: ATVA

Meggendorfer, T.: PET - A partial exploration tool for probabilistic verification. In: ATVA. Lecture Notes in Computer Science, vol. 13505, pp. 320–326. Springer (2022)

2022
[51]

In: CAV (3)

Meggendorfer, T., Weininger, M.: Playing games with your PET: extending the partial exploration tool to stochastic games. In: CAV (3). Lecture Notes in Computer Science, vol. 14683, pp. 359–372. Springer (2024)

2024
[52]

In: AAAI

Meggendorfer, T., Weininger, M., Wienhöft, P.: Solving robust markov deci- sion processes: Generic, reliable, efficient. In: AAAI. pp. 26631–26641. AAAI Press (2025). https://doi.org/10.1609/AAAI.V39I25.34865, https://doi.org/ 10.1609/aaai.v39i25.34865

work page doi:10.1609/aaai.v39i25.34865 2025
[53]

In: QEST+FORMATS

Meggendorfer, T., Weininger, M., Wienhöft, P.: What are the odds? improving statistical model checking of markov decision processes. In: QEST+FORMATS. Lecture Notes in Computer Science, vol. 16143, pp. 195–218. Springer (2025). https://doi.org/10.1007/978-3-032-05792-1_11, https://doi.org/10.1007/ 978-3-032-05792-1_11 22 K. Kueffner et al

work page doi:10.1007/978-3-032-05792-1_11 2025
[54]

In: AAAI

Perez, M., Somenzi, F., Trivedi, A.: A PAC learning algorithm for LTL and omega- regular objectives in mdps. In: AAAI. pp. 21510–21517. AAAI Press (2024).https: //doi.org/10.1609/AAAI.V38I19.30148

work page doi:10.1609/aaai.v38i19.30148 2024
[55]

John Wiley and Sons (1994)

Puterman, M.L.: Markov decision processes: Discrete stochastic dynamic program- ming. John Wiley and Sons (1994)

1994
[56]

Foundations and Trends® in Statistics1(1-2), 1–390 (2025)

Ramdas, A., Wang, R.: Hypothesis testing with e-values. Foundations and Trends® in Statistics1(1-2), 1–390 (2025)

2025
[57]

In: International Conference on Computer Aided Verification

Sen, K., Viswanathan, M., Agha, G.: Statistical model checking of black-box probabilistic systems. In: International Conference on Computer Aided Verification. pp. 202–215. Springer (2004)

2004
[58]

In: ICML

Shi, L., Li, G., Wei, Y., Chen, Y., Chi, Y.: Pessimistic q-learning for offline rein- forcement learning: Towards optimal sample complexity. In: ICML. Proceedings of Machine Learning Research, vol. 162, pp. 19967–20025. PMLR (2022)

2022
[59]

In: NeurIPS (2022)

Suilen, M., Simão, T.D., Parker, D., Jansen, N.: Robust anytime learning of Markov decision processes. In: NeurIPS (2022)

2022
[60]

Tsybakov.Introduction to Nonparametric Estimation

Tsybakov, A.B.: Introduction to Nonparametric Estimation. Springer Series in Statistics, Springer, New York, NY (2009).https://doi.org/10.1007/b13794

work page doi:10.1007/b13794 2009
[61]

In: COLT

Wagenmaker, A.J., Simchowitz, M., Jamieson, K.: Beyond no regret: Instance- dependent PAC reinforcement learning. In: COLT. Proceedings of Machine Learning Research, vol. 178, pp. 358–418. PMLR (2022)

2022
[62]

Wainwright, M.J.: High-dimensional statistics: A non-asymptotic viewpoint, vol. 48. Cambridge university press (2019)

2019
[63]

In: Breakthroughs in statistics: Foundations and basic theory, pp

Wald, A.: Sequential tests of statistical hypotheses. In: Breakthroughs in statistics: Foundations and basic theory, pp. 256–298. Springer (1992)

1992
[64]

Journal of the Royal Statistical Society Series B: Statistical Methodology 86(1), 1–27 (2024)

Waudby-Smith, I., Ramdas, A.: Estimating means of bounded random variables by betting. Journal of the Royal Statistical Society Series B: Statistical Methodology 86(1), 1–27 (2024)

2024
[65]

Weininger, M., Grover, K., Misra, S., Kretínský, J.: Guaranteed trade-offs in dynamic information flow tracking games. In: CDC. pp. 3786–3793. IEEE (2021). https://doi.org/10.1109/CDC45484.2021.9683447

work page doi:10.1109/cdc45484.2021.9683447 2021
[66]

IEEE Trans

Wen, M., Topcu, U.: Probably approximately correct learning in adversarial envi- ronments with temporal logic specifications. IEEE Trans. Autom. Control.67(10), 5055–5070 (2022)

2022
[67]

Wienhöft, P.: Statistical model checking with robust markov decision processes (2025),https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-1000969

2025
[68]

In: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023

Wienhöft, P., Suilen, M., Simão, T.D., Dubslaff, C., Baier, C., Jansen, N.: More for less: Safe policy improvement with stronger performance guarantees. In: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023. pp. 4406–4415. International Joint Conferences on Artificial Intelligence (IJCAI) (2023)

2023
[69]

In: SBMF

Younes, H.L.S., Clarke, E.M., Zuliani, P.: Statistical verification of probabilistic properties with unbounded until. In: SBMF. Lecture Notes in Computer Science, vol. 6527, pp. 144–160. Springer (2010)

2010
[70]

seen” the entire system, can apply graph-based analysis, and switch to the grey box approach, using the remaining “unspent

Younes, H.L.S., Simmons, R.G.: Probabilistic verification of discrete event systems using acceptance sampling. In: CAV. pp. 223–235. Springer (2002) Confidence Sequences for Online MDP-SMC 23 A Assumptions on Sampling Access and Knowledge In this appendix, we describe the different settings for the MDP-SMC problem that have appeared in the literature. The...

2002
[71]

failure event

We show this using a standard argument from information theory, i.e., we derive a Le Cam-style lower bound, which reduces this problem to binary hypothesis testing. Towards that end let p− := 1 2 −w and p+ := 1 2 + w. Notice that sincep+ −p − = 2w > w, no interval of width at mostw can contain bothp− and p+ simultaneously. We construct a hypothesis test t...

2037

[1] [1]

In: COLT

Agarwal, A., Kakade, S.M., Yang, L.F.: Model-based reinforcement learning with a generative model is minimax optimal. In: COLT. Proceedings of Machine Learning Research, vol. 125, pp. 67–83. PMLR (2020)

2020

[2] [2]

Agarwal, C., Guha, S., Křetínsk` y, J., Muruganandham, P.: PAC statistical model checking of mean payoff in discrete-and continuous-time MDP. In: CAV. pp. 3–25. Springer (2022)

2022

[3] [3]

In: AAAI

Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: AAAI. pp. 2669–2678. AAAI Press (2018)

2018

[4] [4]

In: CAV, Part I

Ashok, P., Kretínský, J., Weininger, M.: PAC statistical model checking for Markov decision processes and stochastic games. In: CAV, Part I. LNCS, vol. 11561, pp. 497–519. Springer (2019)

2019

[5] [5]

In: International Conference on Quantitative Evaluation of Systems

Bacci, G., Ingólfsdóttir, A., Larsen, K.G., Reynouard, R.: An mm algorithm to estimate parameters in continuous-time markov chains. In: International Conference on Quantitative Evaluation of Systems. pp. 82–100. Springer (2023)

2023

[6] [6]

In: AAAI

Badings, T.S., Romao, L., Abate, A., Jansen, N.: Probabilities are not enough: For- mal controller synthesis for stochastic dynamical models with epistemic uncertainty. In: AAAI. pp. 14701–14710. AAAI Press (2023)

2023

[7] [7]

Badings, T.S., Romao, L., Abate, A., Parker, D., Poonawala, H.A., Stoelinga, M., Jansen, N.: Robust control for dynamical systems with non-gaussian noise via formal abstractions. J. Artif. Intell. Res.76, 341–391 (2023)

2023

[8] [8]

Badings, T.S., Simão, T.D., Suilen, M., Jansen, N.: Decision-making under uncer- tainty: beyond probabilities. Int. J. Softw. Tools Technol. Transf.25(3), 375–391 (2023)

2023

[9] [9]

In: NFM, Lecture Notes in Computer Science, vol

Baier, C., Dubslaff, C., Wienhöft, P., Kiebel, S.J.: Strategy synthesis in Markov decision processes under limited sampling access. In: NFM, Lecture Notes in Computer Science, vol. 13903, pp. 86–103. Springer (2023)

2023

[10] [10]

In: NASA Formal Methods Symposium

Baier, C., Dubslaff, C., Wienhöft, P., Kiebel, S.J.: Strategy synthesis in markov decision processes under limited sampling access. In: NASA Formal Methods Symposium. pp. 86–103. Springer (2023)

2023

[11] [11]

MIT Press (2008),https: //mitpress.mit.edu/books/principles-model-checking

Baier, C., Katoen, J.: Principles of model checking. MIT Press (2008),https: //mitpress.mit.edu/books/principles-model-checking

2008

[12] [12]

In: CAV (2)

Bazille,H.,Genest,B.,Jégourel,C.,Sun,J.:GlobalPACboundsforlearningdiscrete time Markov chains. In: CAV (2). Lecture Notes in Computer Science, vol. 12225, pp. 304–326. Springer (2020).https://doi.org/10.1007/978-3-030-53291-8_17

work page doi:10.1007/978-3-030-53291-8_17 2020

[13] [13]

John Wiley & Sons (2017)

Billingsley, P.: Probability and measure. John Wiley & Sons (2017)

2017

[14] [14]

Cambridge university press (2004)

Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge university press (2004)

2004

[15] [15]

Artificial Intelligence121(1-2), 31–47 (2000)

Brafman, R.I., Tennenholtz, M.: A near-optimal polynomial time algorithm for learning in certain classes of stochastic games. Artificial Intelligence121(1-2), 31–47 (2000)

2000

[16] [16]

TheoretiCS4(2025)

Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Kretínský, J., Kwiatkowska, M., Meggendorfer, T., Parker, D., Ujma, M.: Learning algorithms for verification of markov decision processes. TheoretiCS4(2025)

2025

[17] [17]

In: ATVA

Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Křetínský, J., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: ATVA. pp. 98–114. Springer (2014).https://doi.org/10.1007/ 978-3-319-11936-6_8

2014

[18] [18]

Statistical Science16(2), 101 – 133 (2001).https://doi.org/10.1214/ss/ 1009213286 20 K

Brown, L.D., Cai, T.T., DasGupta, A.: Interval Estimation for a Binomial Propor- tion. Statistical Science16(2), 101 – 133 (2001).https://doi.org/10.1214/ss/ 1009213286 20 K. Kueffner et al

work page doi:10.1214/ss/ 2001

[19] [19]

In: TACAS (to appear, preprint available athttps://arxiv.org/pdf/2411.00559)

Budde, C.E., Hartmanns, A., Meggendorfer, T., Weininger, M., Wienhöft, P.: Sound statistical model checking for probabilities and expected rewards. In: TACAS (to appear, preprint available athttps://arxiv.org/pdf/2411.00559). Lecture Notes in Computer Science, Springer (2025)

arXiv 2025

[20] [20]

In: 25 Years of Model Check- ing

Chatterjee, K., Henzinger, T.A.: Value iteration. In: 25 Years of Model Check- ing. Lecture Notes in Computer Science, vol. 5000, pp. 107–138. Springer (2008). https://doi.org/10.1007/978-3-540-69850-0_7, https://doi.org/10. 1007/978-3-540-69850-0_7

work page doi:10.1007/978-3-540-69850-0_7 2008

[21] [21]

Biometrika26(4), 404–413 (1934)

Clopper, C.J., Pearson, E.S.: The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika26(4), 404–413 (1934)

1934

[22] [22]

In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems

Daca, P., Henzinger, T.A., Křetínsk` y, J., Petrov, T.: Faster statistical model checking for unbounded temporal properties. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. pp. 112–129. Springer (2016)

2016

[23] [23]

Proceedings of the National Academy of Sciences58(1), 66–68 (1967)

Darling, D.A., Robbins, H.: Confidence sequences for mean, variance, and median. Proceedings of the National Academy of Sciences58(1), 66–68 (1967)

1967

[24] [24]

In: International Symposium on Automated Technology for Verification and Analysis

El Rabih, D., Pekergin, N.: Statistical model checking using perfect simulation. In: International Symposium on Automated Technology for Verification and Analysis. pp. 120–134. Springer (2009)

2009

[25] [25]

The American Statistician64(3), 242–249 (2010).https://doi.org/10.1198/tast.2010.09140

Frey, J.: Fixed-width sequential confidence intervals for a proportion. The American Statistician64(3), 242–249 (2010).https://doi.org/10.1198/tast.2010.09140

work page doi:10.1198/tast.2010.09140 2010

[26] [26]

In: Robotics: Science and Systems (2014).https: //doi.org/10.15607/RSS.2014.X.039

Fu, J., Topcu, U.: Probably approximately correct MDP learning and control with temporal logic constraints. In: Robotics: Science and Systems (2014).https: //doi.org/10.15607/RSS.2014.X.039

work page doi:10.15607/rss.2014.x.039 2014

[27] [27]

Artificial Intelligence122(1-2), 71–109 (2000)

Givan, R., Leach, S., Dean, T.: Bounded-parameter Markov decision processes. Artificial Intelligence122(1-2), 71–109 (2000)

2000

[28] [28]

Oxford university press (2020)

Grimmett, G., Stirzaker, D.: Probability and random processes. Oxford university press (2020)

2020

[29] [29]

Haddad, S., Monmege, B.: Interval iteration algorithm for MDPs and IMDPs. Theor. Comput. Sci.735, 111–131 (2018).https://doi.org/10.1016/j.tcs.2016. 12.003

work page doi:10.1016/j.tcs.2016 2018

[30] [30]

In: TACAS (1)

Hartmanns, A., Junges, S., Quatmann, T., Weininger, M.: A practitioner’s guide to MDP model checking algorithms. In: TACAS (1). LNCS, vol. 13993, pp. 469–488. Springer (2023).https://doi.org/10.1007/978-3-031-30823-9_24

work page doi:10.1007/978-3-031-30823-9_24 2023

[31] [31]

STTT (to appear), Preprint: https://sjunges.github.io/files/revised_practitioners_guide.pdf (2026)

Hartmanns, A., Junges, S., Quatmann, T., Weininger, M.: The revised practi- tioner’s guide to MDP model checking algorithms. STTT (to appear), Preprint: https://sjunges.github.io/files/revised_practitioners_guide.pdf (2026)

2026

[32] [32]

In: AAAI

HasanzadeZonuzy, A., Bura, A., Kalathil, D.M., Shakkottai, S.: Learning with safety constraints: Sample complexity of reinforcement learning for constrained MDPs. In: AAAI. pp. 7667–7674. AAAI Press (2021)

2021

[33] [33]

He, R., Jennings, P., Basu, S., Ghosh, A.P., Wu, H.: A bounded statistical approach for model checking of unbounded until properties. In: ASE. pp. 225–234. ACM (2010)

2010

[34] [34]

In: Inter- national Conference on Runtime Verification

Henzinger, T.A., Kueffner, K., Singh, V., Sun, I.: Alignment monitoring. In: Inter- national Conference on Runtime Verification. pp. 140–159. Springer (2025)

2025

[35] [35]

Journal of the American Statistical Association58(301), 13–30 (1963)

Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association58(301), 13–30 (1963)

1963

[36] [36]

Probability Surveys17, 257–317 (2020)

Howard, S.R., Ramdas, A., McAuliffe, J., Sekhon, J.: Time-uniform chernoff bounds via nonnegative supermartingales. Probability Surveys17, 257–317 (2020)

2020

[37] [37]

The Annals of Statistics49(2), 1055–1080 (2021) Confidence Sequences for Online MDP-SMC 21

Howard, S.R., Ramdas, A., McAuliffe, J., Sekhon, J.: Time-uniform, nonparametric, nonasymptotic confidence sequences. The Annals of Statistics49(2), 1055–1080 (2021) Confidence Sequences for Online MDP-SMC 21

2021

[38] [38]

In: Conference on Learning Theory

Jamieson, K., Malloy, M., Nowak, R., Bubeck, S.: lil’ucb: An optimal exploration algorithm for multi-armed bandits. In: Conference on Learning Theory. pp. 423–439. PMLR (2014)

2014

[39] [39]

In: ICML

Jin, C., Krishnamurthy, A., Simchowitz, M., Yu, T.: Reward-free exploration for reinforcement learning. In: ICML. Proceedings of Machine Learning Research, vol. 119, pp. 4870–4879. PMLR (2020)

2020

[40] [40]

Fundamenta Mathematicae6(1), 9–20 (1924)

Khintchine, A.: über einen satz der wahrscheinlichkeitsrechnung. Fundamenta Mathematicae6(1), 9–20 (1924)

1924

[41] [41]

Structure-aware lower bounds and broadening the horizon of tractability for QBF

Kretínský, J., Meggendorfer, T., Weininger, M.: Stopping criteria for value itera- tion on stochastic games with quantitative objectives. In: LICS. pp. 1–14. IEEE (2023). https://doi.org/10.1109/LICS56636.2023.10175771, https://doi.org/ 10.1109/LICS56636.2023.10175771

work page doi:10.1109/lics56636.2023.10175771 2023

[42] [42]

Kretínský, J., Michel, F., Michel, L., Pérez, G.A.: Finite-memory near-optimal learning for Markov decision processes with long-run average reward. In: UAI. Proceedings of Machine Learning Research, vol. 124, pp. 1149–1158. AUAI Press (2020)

2020

[43] [43]

In: CONCUR

Křetínský, J., Pérez, G.A., Raskin, J.F.: Learning-based mean-payoff optimization in an unknown MDP under omega-regular constraints. In: CONCUR. pp. 8:1–8:18. Dagstuhl (2018)

2018

[44] [44]

In: International conference on computer aided verification

Kwiatkowska, M., Norman, G., Parker, D.: Prism 4.0: Verification of probabilistic real-time systems. In: International conference on computer aided verification. pp. 585–591. Springer (2011)

2011

[45] [45]

In: QEST

Kwiatkowska, M.Z., Norman, G., Parker, D.: The PRISM benchmark suite. In: QEST. pp. 203–204. IEEE Computer Society (2012).https://doi.org/10.1109/ QEST.2012.14

2012

[46] [46]

In: Rein- forcement Learning, Adaptation, Learning, and Optimization, vol

Lange, S., Gabel, T., Riedmiller, M.A.: Batch reinforcement learning. In: Rein- forcement Learning, Adaptation, Learning, and Optimization, vol. 12, pp. 45–73. Springer (2012)

2012

[47] [47]

Cambridge University Press (2020)

Lattimore, T., Szepesvári, C.: Bandit algorithms. Cambridge University Press (2020)

2020

[48] [48]

Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. pp. 541–548. AAAI Press (1999)

1999

[49] [49]

In: COLT (2009),http://www.cs.mcgill.ca/%7Ecolt2009/papers/012

Maurer, A., Pontil, M.: Empirical bernstein bounds and sample-variance penal- ization. In: COLT (2009),http://www.cs.mcgill.ca/%7Ecolt2009/papers/012. pdf#page=1

2009

[50] [50]

In: ATVA

Meggendorfer, T.: PET - A partial exploration tool for probabilistic verification. In: ATVA. Lecture Notes in Computer Science, vol. 13505, pp. 320–326. Springer (2022)

2022

[51] [51]

In: CAV (3)

Meggendorfer, T., Weininger, M.: Playing games with your PET: extending the partial exploration tool to stochastic games. In: CAV (3). Lecture Notes in Computer Science, vol. 14683, pp. 359–372. Springer (2024)

2024

[52] [52]

In: AAAI

Meggendorfer, T., Weininger, M., Wienhöft, P.: Solving robust markov deci- sion processes: Generic, reliable, efficient. In: AAAI. pp. 26631–26641. AAAI Press (2025). https://doi.org/10.1609/AAAI.V39I25.34865, https://doi.org/ 10.1609/aaai.v39i25.34865

work page doi:10.1609/aaai.v39i25.34865 2025

[53] [53]

In: QEST+FORMATS

Meggendorfer, T., Weininger, M., Wienhöft, P.: What are the odds? improving statistical model checking of markov decision processes. In: QEST+FORMATS. Lecture Notes in Computer Science, vol. 16143, pp. 195–218. Springer (2025). https://doi.org/10.1007/978-3-032-05792-1_11, https://doi.org/10.1007/ 978-3-032-05792-1_11 22 K. Kueffner et al

work page doi:10.1007/978-3-032-05792-1_11 2025

[54] [54]

In: AAAI

Perez, M., Somenzi, F., Trivedi, A.: A PAC learning algorithm for LTL and omega- regular objectives in mdps. In: AAAI. pp. 21510–21517. AAAI Press (2024).https: //doi.org/10.1609/AAAI.V38I19.30148

work page doi:10.1609/aaai.v38i19.30148 2024

[55] [55]

John Wiley and Sons (1994)

Puterman, M.L.: Markov decision processes: Discrete stochastic dynamic program- ming. John Wiley and Sons (1994)

1994

[56] [56]

Foundations and Trends® in Statistics1(1-2), 1–390 (2025)

Ramdas, A., Wang, R.: Hypothesis testing with e-values. Foundations and Trends® in Statistics1(1-2), 1–390 (2025)

2025

[57] [57]

In: International Conference on Computer Aided Verification

Sen, K., Viswanathan, M., Agha, G.: Statistical model checking of black-box probabilistic systems. In: International Conference on Computer Aided Verification. pp. 202–215. Springer (2004)

2004

[58] [58]

In: ICML

Shi, L., Li, G., Wei, Y., Chen, Y., Chi, Y.: Pessimistic q-learning for offline rein- forcement learning: Towards optimal sample complexity. In: ICML. Proceedings of Machine Learning Research, vol. 162, pp. 19967–20025. PMLR (2022)

2022

[59] [59]

In: NeurIPS (2022)

Suilen, M., Simão, T.D., Parker, D., Jansen, N.: Robust anytime learning of Markov decision processes. In: NeurIPS (2022)

2022

[60] [60]

Tsybakov.Introduction to Nonparametric Estimation

Tsybakov, A.B.: Introduction to Nonparametric Estimation. Springer Series in Statistics, Springer, New York, NY (2009).https://doi.org/10.1007/b13794

work page doi:10.1007/b13794 2009

[61] [61]

In: COLT

Wagenmaker, A.J., Simchowitz, M., Jamieson, K.: Beyond no regret: Instance- dependent PAC reinforcement learning. In: COLT. Proceedings of Machine Learning Research, vol. 178, pp. 358–418. PMLR (2022)

2022

[62] [62]

Wainwright, M.J.: High-dimensional statistics: A non-asymptotic viewpoint, vol. 48. Cambridge university press (2019)

2019

[63] [63]

In: Breakthroughs in statistics: Foundations and basic theory, pp

Wald, A.: Sequential tests of statistical hypotheses. In: Breakthroughs in statistics: Foundations and basic theory, pp. 256–298. Springer (1992)

1992

[64] [64]

Journal of the Royal Statistical Society Series B: Statistical Methodology 86(1), 1–27 (2024)

Waudby-Smith, I., Ramdas, A.: Estimating means of bounded random variables by betting. Journal of the Royal Statistical Society Series B: Statistical Methodology 86(1), 1–27 (2024)

2024

[65] [65]

Weininger, M., Grover, K., Misra, S., Kretínský, J.: Guaranteed trade-offs in dynamic information flow tracking games. In: CDC. pp. 3786–3793. IEEE (2021). https://doi.org/10.1109/CDC45484.2021.9683447

work page doi:10.1109/cdc45484.2021.9683447 2021

[66] [66]

IEEE Trans

Wen, M., Topcu, U.: Probably approximately correct learning in adversarial envi- ronments with temporal logic specifications. IEEE Trans. Autom. Control.67(10), 5055–5070 (2022)

2022

[67] [67]

Wienhöft, P.: Statistical model checking with robust markov decision processes (2025),https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-1000969

2025

[68] [68]

In: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023

Wienhöft, P., Suilen, M., Simão, T.D., Dubslaff, C., Baier, C., Jansen, N.: More for less: Safe policy improvement with stronger performance guarantees. In: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023. pp. 4406–4415. International Joint Conferences on Artificial Intelligence (IJCAI) (2023)

2023

[69] [69]

In: SBMF

Younes, H.L.S., Clarke, E.M., Zuliani, P.: Statistical verification of probabilistic properties with unbounded until. In: SBMF. Lecture Notes in Computer Science, vol. 6527, pp. 144–160. Springer (2010)

2010

[70] [70]

seen” the entire system, can apply graph-based analysis, and switch to the grey box approach, using the remaining “unspent

Younes, H.L.S., Simmons, R.G.: Probabilistic verification of discrete event systems using acceptance sampling. In: CAV. pp. 223–235. Springer (2002) Confidence Sequences for Online MDP-SMC 23 A Assumptions on Sampling Access and Knowledge In this appendix, we describe the different settings for the MDP-SMC problem that have appeared in the literature. The...

2002

[71] [71]

failure event

We show this using a standard argument from information theory, i.e., we derive a Le Cam-style lower bound, which reduces this problem to binary hypothesis testing. Towards that end let p− := 1 2 −w and p+ := 1 2 + w. Notice that sincep+ −p − = 2w > w, no interval of width at mostw can contain bothp− and p+ simultaneously. We construct a hypothesis test t...

2037