pith. sign in

arxiv: 2506.17792 · v2 · submitted 2025-06-21 · 💻 cs.AI · cs.LO· cs.SE

Accelerating Policy Synthesis in Large-Scale MDPs via Hierarchical Adaptive Refinement

Pith reviewed 2026-05-19 07:20 UTC · model grok-4.3

classification 💻 cs.AI cs.LOcs.SE
keywords Markov decision processespolicy synthesishierarchical refinementadaptive refinementlarge-scale MDPsnear-optimal policiescomputational speedup
0
0 comments X p. Extension

The pith

Hierarchical adaptive refinement produces near-optimal policies for million-state MDPs with up to 2x speedup.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an iterative method that refines only the most fragile regions of a large Markov decision process while leaving coarser areas untouched. This selective refinement balances the need for accuracy against the cost of computation, allowing policy synthesis to reach MDPs with up to one million states. A sympathetic reader cares because conventional solvers cannot handle the state spaces that arise in robotics and software product lines. The authors prove that the final composed policy stays near-optimal whenever a local solver supplies bounded error and boundary mismatches stay controllable.

Core claim

By repeatedly locating fragile MDP regions and refining them hierarchically, the method solves local sub-problems exactly and stitches the results into a global policy. Under standard assumptions the composed policy remains near-optimal, with total error bounded by the local solver tolerance plus any boundary mismatch. Experiments across diverse case studies demonstrate up to twofold speedup over PRISM on MDPs reaching one million states.

What carries the argument

Hierarchical adaptive refinement, an iterative process that selects and subdivides the most fragile MDP regions for targeted local solving.

If this is right

  • Policy synthesis becomes feasible for MDPs with up to one million states.
  • The error of the returned policy stays within the local solver tolerance plus boundary mismatch.
  • Refinement effort is spent only where it is needed, preserving efficiency.
  • Speedups reach a factor of two relative to the PRISM tool on the tested benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same selective-refinement idea could be applied to other stochastic planning problems whose state spaces grow too large for monolithic solvers.
  • Tighter control of boundary conditions at refinement interfaces would directly tighten the overall error bound.
  • Real-time robotic controllers could invoke the method on demand when new uncertainty regions appear.

Load-bearing premise

A local solver must exist that can guarantee bounded error inside each refined sub-region and boundary mismatches between refined and coarse parts must stay small enough not to destroy the overall guarantee.

What would settle it

Apply the method to a concrete one-million-state MDP, compute the realized policy error, and check whether that error exceeds the sum of the local solver tolerance and the observed boundary mismatch.

Figures

Figures reproduced from arXiv: 2506.17792 by Alexandros Evangelidis, Gricel V\'azquez, Simos Gerasimou.

Figure 1
Figure 1. Figure 1: A warehouse layout with three shelves (W1–W3), [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: SHARP overview to be maximised/minimised), and refinement parameters that control the hierarchical decomposition of the approach. The first SHARP stage (S1) involves executing an initial partitioning of the MDP into a set of partitions, where each partition is a sub-MDP. SHARP uses this partitioning to construct the hierarchy tree (S2), whose root node corresponds to the entire set of states, and the child… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of the sub-MDP creation and hierarchical refinement process underpinning SHARP. A 6 × 6 warehouse environment with shelves (W1–W3) is partitioned into nine zones. Because Block 9, red highlight, contains the goal state, we refine it further. Note that, Block 1 is a leaf node containing the start position: its boundary states (orange dashed) connect to and are influenced by the adjacent Block 2 and… view at source ↗
Figure 4
Figure 4. Figure 4: SHARP and PRISM performance comparison showing [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Performance stability analysis on random MW models. [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Performance comparison between SHARP and PRISM for W1024* instances. Box plots show execution times (in [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Scalability analysis of PRISM vs. SHARP across [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Scatter plot of SHARP hyperparameter configurations [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
read the original abstract

Software-intensive systems, such as software product lines and robotics, utilise Markov decision processes (MDPs) to capture uncertainty and analyse sequential decision-making problems. Despite the usefulness of conventional policy synthesis methods, they fail to scale to large state spaces. Our approach addresses this issue and accelerates policy synthesis in large MDPs by dynamically refining the MDP and iteratively selecting the most fragile MDP regions for refinement. This iterative procedure offers a balance between accuracy and efficiency, as refinement occurs only when necessary. We formally show that the composed policy is near-optimal under standard assumptions, with error bounded by the local solver tolerance and boundary mismatch. Across diverse case studies and MDPs up to 1M states, we demonstrate that our approach achieves up to $2\times$ speedup over PRISM, offering a competitive solution for real-world policy synthesis in large MDPs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a hierarchical adaptive refinement algorithm for policy synthesis in large MDPs. It iteratively identifies and refines fragile sub-regions of the state space while maintaining a coarse skeleton, proves that the resulting composed policy is near-optimal under standard MDP assumptions with an error bound consisting of local solver tolerance plus boundary mismatch, and reports up to 2× speedup versus PRISM on MDPs with up to 1 M states across several case studies.

Significance. If the formal error analysis can be completed with an explicit, computable bound on boundary mismatch, the method would provide a principled way to trade accuracy for scalability in MDP policy synthesis, which is relevant for robotics and software product lines. The empirical scaling results on 1 M state models are a positive indicator, but the significance hinges on whether the refinement strategy remains effective when the mismatch term is rigorously controlled.

major comments (1)
  1. [Proof of near-optimality (abstract and §4)] The central near-optimality claim (abstract and presumably the proof in §4) states that total error is bounded by local solver tolerance plus boundary mismatch and that this remains below a user threshold. However, no quantitative bound is supplied on how boundary mismatch grows with the number of refinement iterations or with the geometry of the chosen sub-regions. Without such a bound the induction that keeps the composed policy near-optimal does not close; this is load-bearing for the formal result.
minor comments (2)
  1. [Abstract] The abstract refers to “standard assumptions” without enumerating them; a short list or pointer to the precise statement in the main text would help readers assess applicability.
  2. [Experiments] The experimental section should report the exact tolerance values used for the local solver and the refinement threshold so that the claimed speed-up can be reproduced under the same error budget.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the empirical contributions on large-scale MDPs. The primary concern is the lack of a quantitative bound on boundary mismatch in the near-optimality proof. We address this directly below and commit to revisions that strengthen the formal result without altering the core claims or experimental findings.

read point-by-point responses
  1. Referee: [Proof of near-optimality (abstract and §4)] The central near-optimality claim (abstract and presumably the proof in §4) states that total error is bounded by local solver tolerance plus boundary mismatch and that this remains below a user threshold. However, no quantitative bound is supplied on how boundary mismatch grows with the number of refinement iterations or with the geometry of the chosen sub-regions. Without such a bound the induction that keeps the composed policy near-optimal does not close; this is load-bearing for the formal result.

    Authors: We agree that an explicit, computable bound on the boundary mismatch term is necessary to fully close the induction and make the overall guarantee practical. The current proof correctly decomposes the error into local solver tolerance (controlled by the user) plus boundary mismatch, but leaves the latter as a residual term. To address this, we will revise §4 by adding a new lemma that bounds the mismatch under standard assumptions (Lipschitz continuity of the value function, which is satisfied in the robotics and product-line MDPs we consider). The bound will be expressed in terms of sub-region diameter, number of boundary states, and the refinement threshold, allowing the user to select parameters that keep the total error below a desired threshold. We will also update the abstract and add a short discussion on how the bound scales with iterations. These changes will be included in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation self-contained

full rationale

The paper presents an algorithmic hierarchical refinement procedure for MDPs and claims a formal proof that the composed policy is near-optimal under standard assumptions, with error bounded by local solver tolerance plus boundary mismatch. This is supported by external empirical comparisons to PRISM on MDPs up to 1M states. No equations, self-citations, or fitted parameters are shown in the abstract or description that reduce the error bound, speedup, or near-optimality claim to a definition or input taken from the paper's own results. The derivation relies on independent local solvers and standard MDP assumptions rather than internal construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard MDP theory plus the assumption that local solvers can be composed without introducing uncontrolled error at boundaries.

axioms (1)
  • domain assumption Standard assumptions for MDP policy synthesis and near-optimality of composed policies
    Invoked when stating that the composed policy is near-optimal with bounded error.

pith-pipeline@v0.9.0 · 5682 in / 1198 out tokens · 40682 ms · 2026-05-19T07:20:42.103757+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 1 internal anchor

  1. [1]

    Edge intelligence—research opportunities for distributed computing continuum systems,

    V . C. Pujol, P. K. Donta, A. Morichetta, I. Murturi, and S. Dustdar, “Edge intelligence—research opportunities for distributed computing continuum systems,” IEEE Internet Computing , vol. 27, no. 4, pp. 53–74, 2023

  2. [2]

    A review of microservices autoscaling with formal verification perspective,

    S. N. A. Jawaddi, M. H. Johari, and A. Ismail, “A review of microservices autoscaling with formal verification perspective,” Software: Practice and Experience, vol. 52, no. 11, pp. 2476–2495, 2022

  3. [3]

    Coordinating hundreds of cooperative, autonomous vehicles in warehouses,

    P. R. Wurman, R. D’Andrea, and M. Mountz, “Coordinating hundreds of cooperative, autonomous vehicles in warehouses,” in Proceedings of the 19th National Conference on Innovative Applications of Artificial Intelligence - Volume 2, ser. IAAI’07. AAAI Press, 2007, p. 1752–1759

  4. [4]

    Probabilistic model checking of robots deployed in extreme environments,

    X. Zhao, V . Robu, D. Flynn, F. Dinmohammadi, M. Fisher, and M. Webster, “Probabilistic model checking of robots deployed in extreme environments,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 8066–8074

  5. [5]

    An overview of verification and validation challenges for inspection robots,

    M. Fisher, R. C. Cardoso, E. C. Collins, C. Dadswell, L. A. Dennis, C. Dixon, M. Farrell, A. Ferrando, X. Huang, M. Jump et al. , “An overview of verification and validation challenges for inspection robots,” Robotics, vol. 10, no. 2, p. 67, 2021

  6. [6]

    Uncertainty in self-adaptive systems: A research community perspective,

    S. M. Hezavehi, D. Weyns, P. Avgeriou, R. Calinescu, R. Mirandola, and D. Perez-Palacin, “Uncertainty in self-adaptive systems: A research community perspective,”ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 15, no. 4, pp. 1–36, 2021

  7. [7]

    Towards a research agenda for understanding and managing uncertainty in self-adaptive systems,

    D. Weyns, R. Calinescu, R. Mirandola, K. Tei, M. Acosta, N. Bencomo, A. Bennaceur, N. Boltz, T. Bures, J. Camara et al., “Towards a research agenda for understanding and managing uncertainty in self-adaptive systems,” ACM SIGSOFT Software Engineering Notes , vol. 48, no. 4, pp. 20–36, 2023

  8. [8]

    M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st ed. USA: John Wiley & Sons, Inc., 1994

  9. [10]

    Dynamic qos management and optimization in service- based systems,

    R. Calinescu, L. Grunske, M. Kwiatkowska, R. Mirandola, and G. Tamburrelli, “Dynamic qos management and optimization in service- based systems,” IEEE Trans. Softw. Eng. , vol. 37, no. 3, p. 387–409, May 2011. [Online]. Available: https://doi.org/10.1109/TSE.2010.92

  10. [11]

    Family- based modeling and analysis for probabilistic systems — featuring profeat,

    P. Chrszon, C. Dubslaff, S. Kl ¨uppelholz, and C. Baier, “Family- based modeling and analysis for probabilistic systems — featuring profeat,” in Proceedings of the 19th International Conference on Fundamental Approaches to Software Engineering - Volume 9633 . Berlin, Heidelberg: Springer-Verlag, 2016, p. 287–304. [Online]. Available: https://doi.org/10.10...

  11. [12]

    Multiple mobile robot task and motion planning: A survey,

    L. Antonyshyn, J. Silveira, S. Givigi, and J. Marshall, “Multiple mobile robot task and motion planning: A survey,” ACM Comput. Surv., vol. 55, no. 10, Feb. 2023. [Online]. Available: https://doi.org/10.1145/3564696

  12. [13]

    E. M. Clarke, W. Klieber, M. Nov ´aˇcek, and P. Zuliani, Model Checking and the State Explosion Problem . Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 1–30. [Online]. Available: https://doi.org/10.1007/978-3-642-35746-6 1

  13. [14]

    Baier and J.-P

    C. Baier and J.-P. Katoen, Principles of model checking . MIT press, 2008

  14. [15]

    Kwiatkowska, G

    M. Kwiatkowska, G. Norman, and D. Parker, Stochastic Model Checking. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007, pp. 220–270. [Online]. Available: https://doi.org/10.1007/978-3-540-72522-0 6

  15. [16]

    Abstraction-refinement for hierarchical probabilistic models,

    S. Junges and M. T. Spaan, “Abstraction-refinement for hierarchical probabilistic models,” in International Conference on Computer Aided Verification. Springer, 2022, pp. 102–123

  16. [17]

    The 10,000 facets of mdp model checking,

    C. Baier, H. Hermanns, and J.-P. Katoen, “The 10,000 facets of mdp model checking,” Computing and Software Science: State of the Art and Perspectives, pp. 420–451, 2019

  17. [18]

    On correctness, precision, and performance in quantitative verification,

    C. E. Budde, A. Hartmanns, M. Klauck, J. K ˇret´ınsk´y, D. Parker, T. Quatmann, A. Turrini, and Z. Zhang, “On correctness, precision, and performance in quantitative verification,” in Leveraging Applications of Formal Methods, Verification and Validation: Tools and Trends , T. Margaria and B. Steffen, Eds. Cham: Springer International Publishing, 2021, pp...

  18. [19]

    PRISM 4.0: Verification of probabilistic real-time systems,

    M. Kwiatkowska, G. Norman, and D. Parker, “PRISM 4.0: Verification of probabilistic real-time systems,” in Proc. 23rd International Conference on Computer Aided Verification (CAV’11), ser. LNCS, G. Gopalakrishnan and S. Qadeer, Eds., vol. 6806. Springer, 2011, pp. 585–591

  19. [20]

    The probabilistic model checker storm,

    C. Hensel, S. Junges, J.-P. Katoen, T. Quatmann, and M. V olk, “The probabilistic model checker storm,” vol. 24, no. 4, p. 589–610, Aug

  20. [21]

    Available: https://doi.org/10.1007/s10009-021-00633-z

    [Online]. Available: https://doi.org/10.1007/s10009-021-00633-z

  21. [22]

    Symbolic model checking of probabilistic processes using mtbdds and the kronecker representation,

    L. d. Alfaro, M. Z. Kwiatkowska, G. Norman, D. Parker, and R. Segala, “Symbolic model checking of probabilistic processes using mtbdds and the kronecker representation,” in Proceedings of the 6th International Conference on Tools and Algorithms for Construction and Analysis of Systems: Held as Part of the European Joint Conferences on the Theory and Pract...

  22. [23]

    Explicit model checking of very large mdp using partitioning and secondary storage,

    A. Hartmanns and H. Hermanns, “Explicit model checking of very large mdp using partitioning and secondary storage,” in Automated Technology for Verification and Analysis , B. Finkbeiner, G. Pu, and L. Zhang, Eds. Cham: Springer International Publishing, 2015, pp. 131–147

  23. [24]

    Statistical model checking for markov decision processes,

    D. Henriques, J. G. Martins, P. Zuliani, A. Platzer, and E. M. Clarke, “Statistical model checking for markov decision processes,” in Proceedings of the 2012 Ninth International Conference on Quantitative Evaluation of Systems , ser. QEST ’12. USA: IEEE Computer Society, 2012, p. 84–93. [Online]. Available: https: //doi.org/10.1109/QEST.2012.19

  24. [25]

    [Hir04] Mika Hirvensalo

    H. Hansson and B. Jonsson, “A logic for reasoning about time and reliability,” Form. Asp. Comput., vol. 6, no. 5, p. 512–535, Sep. 1994. [Online]. Available: https://doi.org/10.1007/BF01211866

  25. [26]

    Model checking of probabilistic and nondeterministic systems,

    A. Bianco and L. De Alfaro, “Model checking of probabilistic and nondeterministic systems,” in International Conference on Foundations of Software Technology and Theoretical Computer Science . Springer, 1995, pp. 499–513

  26. [27]

    Chatterjee and T

    K. Chatterjee and T. A. Henzinger, Value Iteration. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 107–138. [Online]. Available: https://doi.org/10.1007/978-3-540-69850-0 7

  27. [28]

    Bellman, Dynamic Programming

    R. Bellman, Dynamic Programming. Princeton University Press, 1957

  28. [29]

    Planning and control of autonomous mobile robots for intralogistics: Literature review and research agenda,

    G. Fragapane, R. de Koster, F. Sgarbossa, and J. O. Strandhagen, “Planning and control of autonomous mobile robots for intralogistics: Literature review and research agenda,” European Journal of Operational Research, vol. 294, no. 2, pp. 405–426, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0377221721000217

  29. [30]

    Model minimization in markov decision processes,

    T. Dean and R. Givan, “Model minimization in markov decision processes,” in Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence , ser. AAAI’97/IAAI’97. AAAI Press, 1997, p. 106–111

  30. [31]

    Reinforcement learning: a survey,

    L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” J. Artif. Int. Res. , vol. 4, no. 1, p. 237–285, May 1996

  31. [32]

    Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning,

    R. S. Sutton, D. Precup, and S. Singh, “Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning,” Artificial Intelligence, vol. 112, no. 1, pp. 181–211, 1999. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0004370299000521

  32. [33]

    AI Safety Gridworlds

    J. Leike, M. Martic, V . Krakovna, P. A. Ortega, T. Everitt, A. Lefrancq, L. Orseau, and S. Legg, “AI safety gridworlds,” CoRR, vol. abs/1711.09883, 2017. [Online]. Available: http://arxiv.org/abs/1711. 09883

  33. [34]

    Con- troller synthesis for omega-regular and steady-state specifications,

    A. Velasquez, I. Alkhouri, A. Beckus, A. Trivedi, and G. Atia, “Con- troller synthesis for omega-regular and steady-state specifications,” in Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems , ser. AAMAS ’22. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems, 2022, p. 1310–1318

  34. [35]

    Multigain 2.0: Mdp controller synthesis for multiple mean-payoff, ltl and steady-state constraints,

    S. Bals, A. Evangelidis, J. K ˇret´ınsk´y, and J. Waibel, “Multigain 2.0: Mdp controller synthesis for multiple mean-payoff, ltl and steady-state constraints,” in Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control , ser. HSCC ’24. New York, NY , USA: Association for Computing Machinery, 2024. [Online]. Availabl...

  35. [36]

    Learning-based probabilistic ltl motion planning with environment and motion uncertainties,

    M. Cai, H. Peng, Z. Li, and Z. Kan, “Learning-based probabilistic ltl motion planning with environment and motion uncertainties,” IEEE Transactions on Automatic Control , vol. 66, no. 5, pp. 2386–2392, 2021

  36. [37]

    Engineering trustworthy self-adaptive software with dynamic assurance cases,

    R. Calinescu, D. Weyns, S. Gerasimou, M. U. Iftikhar, I. Habli, and T. Kelly, “Engineering trustworthy self-adaptive software with dynamic assurance cases,” IEEE Transactions on Software Engineering , vol. 44, no. 11, pp. 1039–1069, 2017

  37. [38]

    Synthesizing tradeoff spaces with quantitative guarantees for families of software systems,

    J. C ´amara, D. Garlan, and B. Schmerl, “Synthesizing tradeoff spaces with quantitative guarantees for families of software systems,” Journal of Systems and Software , vol. 152, pp. 33–49, 2019

  38. [39]

    Markov decision process (mdp) framework for optimizing software on mobile phones,

    T. L. Cheung, K. Okamoto, F. Maker, X. Liu, and V . Akella, “Markov decision process (mdp) framework for optimizing software on mobile phones,” in Proceedings of the Seventh ACM International Conference on Embedded Software , ser. EMSOFT ’09. New York, NY , USA: Association for Computing Machinery, 2009, p. 11–20. [Online]. Available: https://doi.org/10.1...

  39. [40]

    A markov decision process- based service migration procedure for follow me cloud,

    A. Ksentini, T. Taleb, and M. Chen, “A markov decision process- based service migration procedure for follow me cloud,” in 2014 IEEE International Conference on Communications (ICC) , 2014, pp. 1350– 1354

  40. [41]

    Performance modelling and verification of cloud-based auto-scaling policies,

    A. Evangelidis, D. Parker, and R. Bahsoon, “Performance modelling and verification of cloud-based auto-scaling policies,” Future Generation Computer Systems , vol. 87, pp. 629–638, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X17312475

  41. [42]

    Proactive self-adaptation under uncertainty: a probabilistic model checking approach,

    G. A. Moreno, J. C ´amara, D. Garlan, and B. Schmerl, “Proactive self-adaptation under uncertainty: a probabilistic model checking approach,” in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering , ser. ESEC/FSE 2015. New York, NY , USA: Association for Computing Machinery, 2015, p. 1–12. [Online]. Available: https://doi.org...

  42. [43]

    Counterexample- guided abstraction refinement,

    E. Clarke, O. Grumberg, S. Jha, Y . Lu, and H. Veith, “Counterexample- guided abstraction refinement,” in Computer Aided Verification, E. A. Emerson and A. P. Sistla, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000, pp. 154–169

  43. [44]

    Probabilistic cegar,

    H. Hermanns, B. Wachter, and L. Zhang, “Probabilistic cegar,” in International Conference on Computer Aided Verification . Springer, 2008, pp. 162–175

  44. [45]

    A game- based abstraction-refinement framework for markov decision processes,

    M. Kattenbelt, M. Kwiatkowska, G. Norman, and D. Parker, “A game- based abstraction-refinement framework for markov decision processes,” Formal Methods in System Design , vol. 36, pp. 246–280, 2010

  45. [46]

    Compositional proba- bilistic model checking with string diagrams of mdps,

    K. Watanabe, C. Eberhart, K. Asada, and I. Hasuo, “Compositional proba- bilistic model checking with string diagrams of mdps,” in Computer Aided Verification - 35th International Conference, CAV 2023, Proceedings , ser. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), C. ...

  46. [47]

    C. A. C. Coello, Evolutionary algorithms for solving multi-objective problems. Springer, 2007

  47. [48]

    Search-based synthesis of probabilistic models for quality-of-service software engineering (t),

    S. Gerasimou, G. Tamburrelli, and R. Calinescu, “Search-based synthesis of probabilistic models for quality-of-service software engineering (t),” in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 319–330

  48. [49]

    Synthesis of proba- bilistic models for quality-of-service software engineering,

    S. Gerasimou, R. Calinescu, and G. Tamburrelli, “Synthesis of proba- bilistic models for quality-of-service software engineering,” Automated Software Engg., vol. 25, no. 4, p. 785–831, Dec. 2018

  49. [50]

    Haiq: Synthesis of software design spaces with structural and probabilistic guarantees,

    J. C ´amara, “Haiq: Synthesis of software design spaces with structural and probabilistic guarantees,” in Proceedings of the 8th International Conference on Formal Methods in Software Engineering , 2020, pp. 22– 33

  50. [51]

    Targeting requirements violations of autonomous driving systems by dynamic evolutionary search

    S. Gerasimou, J. C ´amara, R. Calinescu, N. Alasmari, F. Alhwikem, and X. Fang, “Evolutionary-guided synthesis of verified pareto- optimal mdp policies,” in Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering , ser. ASE ’21. IEEE Press, 2022, p. 842–853. [Online]. Available: https://doi.org/10.1109/ASE51524.2021.9678727

  51. [52]

    Feudal reinforcement learning,

    P. Dayan and G. E. Hinton, “Feudal reinforcement learning,” in Advances in Neural Information Processing Systems , S. Hanson, J. Cowan, and C. Giles, Eds., vol. 5. Morgan-Kaufmann, 1992. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/ 1992/file/d14220ee66aeec73c49038385428ec4c-Paper.pdf

  52. [53]

    Hierarchical reinforcement learning with the maxq value function decomposition,

    T. G. Dietterich, “Hierarchical reinforcement learning with the maxq value function decomposition,” J. Artif. Int. Res. , vol. 13, no. 1, p. 227–303, Nov. 2000

  53. [54]

    Abstract value iteration for hierarchical reinforcement learning,

    K. Jothimurugan, O. Bastani, and R. Alur, “Abstract value iteration for hierarchical reinforcement learning,” in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics , ser. Proceedings of Machine Learning Research, A. Banerjee and K. Fukumizu, Eds., vol. 130. PMLR, 13–15 Apr 2021, pp. 1162–1170. [Online]. Available: h...

  54. [55]

    Markovian state and action abstractions for mdps via hierarchical mcts,

    A. Bai, S. Srivastava, and S. Russell, “Markovian state and action abstractions for mdps via hierarchical mcts,” in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence , ser. IJCAI’16. AAAI Press, 2016, p. 3029–3037