Accelerating Policy Synthesis in Large-Scale MDPs via Hierarchical Adaptive Refinement

arxiv: 2506.17792 · v2 · submitted 2025-06-21 · 💻 cs.AI · cs.LO· cs.SE

Accelerating Policy Synthesis in Large-Scale MDPs via Hierarchical Adaptive Refinement

Alexandros Evangelidis , Gricel V\'azquez , Simos Gerasimou This is my paper

Pith reviewed 2026-05-19 07:20 UTC · model grok-4.3

classification 💻 cs.AI cs.LOcs.SE

keywords Markov decision processespolicy synthesishierarchical refinementadaptive refinementlarge-scale MDPsnear-optimal policiescomputational speedup

0 comments p. Extension

The pith

Hierarchical adaptive refinement produces near-optimal policies for million-state MDPs with up to 2x speedup.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an iterative method that refines only the most fragile regions of a large Markov decision process while leaving coarser areas untouched. This selective refinement balances the need for accuracy against the cost of computation, allowing policy synthesis to reach MDPs with up to one million states. A sympathetic reader cares because conventional solvers cannot handle the state spaces that arise in robotics and software product lines. The authors prove that the final composed policy stays near-optimal whenever a local solver supplies bounded error and boundary mismatches stay controllable.

Core claim

By repeatedly locating fragile MDP regions and refining them hierarchically, the method solves local sub-problems exactly and stitches the results into a global policy. Under standard assumptions the composed policy remains near-optimal, with total error bounded by the local solver tolerance plus any boundary mismatch. Experiments across diverse case studies demonstrate up to twofold speedup over PRISM on MDPs reaching one million states.

What carries the argument

Hierarchical adaptive refinement, an iterative process that selects and subdivides the most fragile MDP regions for targeted local solving.

If this is right

Policy synthesis becomes feasible for MDPs with up to one million states.
The error of the returned policy stays within the local solver tolerance plus boundary mismatch.
Refinement effort is spent only where it is needed, preserving efficiency.
Speedups reach a factor of two relative to the PRISM tool on the tested benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same selective-refinement idea could be applied to other stochastic planning problems whose state spaces grow too large for monolithic solvers.
Tighter control of boundary conditions at refinement interfaces would directly tighten the overall error bound.
Real-time robotic controllers could invoke the method on demand when new uncertainty regions appear.

Load-bearing premise

A local solver must exist that can guarantee bounded error inside each refined sub-region and boundary mismatches between refined and coarse parts must stay small enough not to destroy the overall guarantee.

What would settle it

Apply the method to a concrete one-million-state MDP, compute the realized policy error, and check whether that error exceeds the sum of the local solver tolerance and the observed boundary mismatch.

Figures

Figures reproduced from arXiv: 2506.17792 by Alexandros Evangelidis, Gricel V\'azquez, Simos Gerasimou.

**Figure 2.** Figure 2: SHARP overview to be maximised/minimised), and refinement parameters that control the hierarchical decomposition of the approach. The first SHARP stage (S1) involves executing an initial partitioning of the MDP into a set of partitions, where each partition is a sub-MDP. SHARP uses this partitioning to construct the hierarchy tree (S2), whose root node corresponds to the entire set of states, and the child… view at source ↗

**Figure 3.** Figure 3: Overview of the sub-MDP creation and hierarchical refinement process underpinning SHARP. A 6 × 6 warehouse environment with shelves (W1–W3) is partitioned into nine zones. Because Block 9, red highlight, contains the goal state, we refine it further. Note that, Block 1 is a leaf node containing the start position: its boundary states (orange dashed) connect to and are influenced by the adjacent Block 2 and… view at source ↗

**Figure 4.** Figure 4: SHARP and PRISM performance comparison showing [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Performance stability analysis on random MW models. [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Performance comparison between SHARP and PRISM for W1024* instances. Box plots show execution times (in [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Scalability analysis of PRISM vs. SHARP across [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

**Figure 8.** Figure 8: Scatter plot of SHARP hyperparameter configurations [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗

read the original abstract

Software-intensive systems, such as software product lines and robotics, utilise Markov decision processes (MDPs) to capture uncertainty and analyse sequential decision-making problems. Despite the usefulness of conventional policy synthesis methods, they fail to scale to large state spaces. Our approach addresses this issue and accelerates policy synthesis in large MDPs by dynamically refining the MDP and iteratively selecting the most fragile MDP regions for refinement. This iterative procedure offers a balance between accuracy and efficiency, as refinement occurs only when necessary. We formally show that the composed policy is near-optimal under standard assumptions, with error bounded by the local solver tolerance and boundary mismatch. Across diverse case studies and MDPs up to 1M states, we demonstrate that our approach achieves up to $2\times$ speedup over PRISM, offering a competitive solution for real-world policy synthesis in large MDPs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Hierarchical adaptive refinement gives practical speedups on large MDPs but the boundary mismatch error bound stays unquantified.

read the letter

The key takeaway is that this work accelerates policy synthesis for large MDPs using hierarchical adaptive refinement, achieving reported speedups while maintaining a formal near-optimality claim. The approach is new in how it iteratively selects fragile regions within a hierarchy for refinement, rather than uniform or static abstraction. It builds on standard MDP assumptions and local solvers to compose the final policy. The paper does well on the experimental side, testing across case studies with MDPs up to 1 million states and showing up to 2 times faster performance than PRISM. Those results give a sense of real-world applicability in areas like robotics and software product lines. The soft spots center on the error analysis. The central claim is that error stays bounded by local solver tolerance and boundary mismatch, but the development does not supply a computable bound on mismatch growth with the number of iterations or the geometry of chosen sub-regions. Without that, it is difficult to confirm that the total error remains under a user-specified threshold as refinement proceeds. This is a genuine limitation in the formal argument, though the empirical outcomes may still hold. This paper targets researchers focused on scalable verification and planning for systems with uncertainty. Readers dealing with state spaces too big for exhaustive solvers would find the method and the scale of the tests relevant. It has enough grounding to merit a serious referee who can examine the proofs and the experimental setup closely. I recommend sending it out for peer review, with attention to tightening the mismatch bound in any revision.

Referee Report

1 major / 2 minor

Summary. The paper proposes a hierarchical adaptive refinement algorithm for policy synthesis in large MDPs. It iteratively identifies and refines fragile sub-regions of the state space while maintaining a coarse skeleton, proves that the resulting composed policy is near-optimal under standard MDP assumptions with an error bound consisting of local solver tolerance plus boundary mismatch, and reports up to 2× speedup versus PRISM on MDPs with up to 1 M states across several case studies.

Significance. If the formal error analysis can be completed with an explicit, computable bound on boundary mismatch, the method would provide a principled way to trade accuracy for scalability in MDP policy synthesis, which is relevant for robotics and software product lines. The empirical scaling results on 1 M state models are a positive indicator, but the significance hinges on whether the refinement strategy remains effective when the mismatch term is rigorously controlled.

major comments (1)

[Proof of near-optimality (abstract and §4)] The central near-optimality claim (abstract and presumably the proof in §4) states that total error is bounded by local solver tolerance plus boundary mismatch and that this remains below a user threshold. However, no quantitative bound is supplied on how boundary mismatch grows with the number of refinement iterations or with the geometry of the chosen sub-regions. Without such a bound the induction that keeps the composed policy near-optimal does not close; this is load-bearing for the formal result.

minor comments (2)

[Abstract] The abstract refers to “standard assumptions” without enumerating them; a short list or pointer to the precise statement in the main text would help readers assess applicability.
[Experiments] The experimental section should report the exact tolerance values used for the local solver and the refinement threshold so that the claimed speed-up can be reproduced under the same error budget.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the empirical contributions on large-scale MDPs. The primary concern is the lack of a quantitative bound on boundary mismatch in the near-optimality proof. We address this directly below and commit to revisions that strengthen the formal result without altering the core claims or experimental findings.

read point-by-point responses

Referee: [Proof of near-optimality (abstract and §4)] The central near-optimality claim (abstract and presumably the proof in §4) states that total error is bounded by local solver tolerance plus boundary mismatch and that this remains below a user threshold. However, no quantitative bound is supplied on how boundary mismatch grows with the number of refinement iterations or with the geometry of the chosen sub-regions. Without such a bound the induction that keeps the composed policy near-optimal does not close; this is load-bearing for the formal result.

Authors: We agree that an explicit, computable bound on the boundary mismatch term is necessary to fully close the induction and make the overall guarantee practical. The current proof correctly decomposes the error into local solver tolerance (controlled by the user) plus boundary mismatch, but leaves the latter as a residual term. To address this, we will revise §4 by adding a new lemma that bounds the mismatch under standard assumptions (Lipschitz continuity of the value function, which is satisfied in the robotics and product-line MDPs we consider). The bound will be expressed in terms of sub-region diameter, number of boundary states, and the refinement threshold, allowing the user to select parameters that keep the total error below a desired threshold. We will also update the abstract and add a short discussion on how the bound scales with iterations. These changes will be included in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation self-contained

full rationale

The paper presents an algorithmic hierarchical refinement procedure for MDPs and claims a formal proof that the composed policy is near-optimal under standard assumptions, with error bounded by local solver tolerance plus boundary mismatch. This is supported by external empirical comparisons to PRISM on MDPs up to 1M states. No equations, self-citations, or fitted parameters are shown in the abstract or description that reduce the error bound, speedup, or near-optimality claim to a definition or input taken from the paper's own results. The derivation relies on independent local solvers and standard MDP assumptions rather than internal construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard MDP theory plus the assumption that local solvers can be composed without introducing uncontrolled error at boundaries.

axioms (1)

domain assumption Standard assumptions for MDP policy synthesis and near-optimality of composed policies
Invoked when stating that the composed policy is near-optimal with bounded error.

pith-pipeline@v0.9.0 · 5682 in / 1198 out tokens · 40682 ms · 2026-05-19T07:20:42.103757+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

SHARP is a hierarchical partition-based approach for MDP policy synthesis that adaptively refines MDP partitions only where needed... value spread criterion... boundary states... PROPAGATE ALLUP
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The composed policy is near-optimal under standard assumptions, with error bounded by the local solver tolerance and boundary mismatch.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 1 internal anchor

[1]

Edge intelligence—research opportunities for distributed computing continuum systems,

V . C. Pujol, P. K. Donta, A. Morichetta, I. Murturi, and S. Dustdar, “Edge intelligence—research opportunities for distributed computing continuum systems,” IEEE Internet Computing , vol. 27, no. 4, pp. 53–74, 2023

work page 2023
[2]

A review of microservices autoscaling with formal verification perspective,

S. N. A. Jawaddi, M. H. Johari, and A. Ismail, “A review of microservices autoscaling with formal verification perspective,” Software: Practice and Experience, vol. 52, no. 11, pp. 2476–2495, 2022

work page 2022
[3]

Coordinating hundreds of cooperative, autonomous vehicles in warehouses,

P. R. Wurman, R. D’Andrea, and M. Mountz, “Coordinating hundreds of cooperative, autonomous vehicles in warehouses,” in Proceedings of the 19th National Conference on Innovative Applications of Artificial Intelligence - Volume 2, ser. IAAI’07. AAAI Press, 2007, p. 1752–1759

work page 2007
[4]

Probabilistic model checking of robots deployed in extreme environments,

X. Zhao, V . Robu, D. Flynn, F. Dinmohammadi, M. Fisher, and M. Webster, “Probabilistic model checking of robots deployed in extreme environments,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 8066–8074

work page 2019
[5]

An overview of verification and validation challenges for inspection robots,

M. Fisher, R. C. Cardoso, E. C. Collins, C. Dadswell, L. A. Dennis, C. Dixon, M. Farrell, A. Ferrando, X. Huang, M. Jump et al. , “An overview of verification and validation challenges for inspection robots,” Robotics, vol. 10, no. 2, p. 67, 2021

work page 2021
[6]

Uncertainty in self-adaptive systems: A research community perspective,

S. M. Hezavehi, D. Weyns, P. Avgeriou, R. Calinescu, R. Mirandola, and D. Perez-Palacin, “Uncertainty in self-adaptive systems: A research community perspective,”ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 15, no. 4, pp. 1–36, 2021

work page 2021
[7]

Towards a research agenda for understanding and managing uncertainty in self-adaptive systems,

D. Weyns, R. Calinescu, R. Mirandola, K. Tei, M. Acosta, N. Bencomo, A. Bennaceur, N. Boltz, T. Bures, J. Camara et al., “Towards a research agenda for understanding and managing uncertainty in self-adaptive systems,” ACM SIGSOFT Software Engineering Notes , vol. 48, no. 4, pp. 20–36, 2023

work page 2023
[8]

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st ed. USA: John Wiley & Sons, Inc., 1994

work page 1994
[10]

Dynamic qos management and optimization in service- based systems,

R. Calinescu, L. Grunske, M. Kwiatkowska, R. Mirandola, and G. Tamburrelli, “Dynamic qos management and optimization in service- based systems,” IEEE Trans. Softw. Eng. , vol. 37, no. 3, p. 387–409, May 2011. [Online]. Available: https://doi.org/10.1109/TSE.2010.92

work page doi:10.1109/tse.2010.92 2011
[11]

Family- based modeling and analysis for probabilistic systems — featuring profeat,

P. Chrszon, C. Dubslaff, S. Kl ¨uppelholz, and C. Baier, “Family- based modeling and analysis for probabilistic systems — featuring profeat,” in Proceedings of the 19th International Conference on Fundamental Approaches to Software Engineering - Volume 9633 . Berlin, Heidelberg: Springer-Verlag, 2016, p. 287–304. [Online]. Available: https://doi.org/10.10...

work page doi:10.1007/978-3-662-49665-7 2016
[12]

Multiple mobile robot task and motion planning: A survey,

L. Antonyshyn, J. Silveira, S. Givigi, and J. Marshall, “Multiple mobile robot task and motion planning: A survey,” ACM Comput. Surv., vol. 55, no. 10, Feb. 2023. [Online]. Available: https://doi.org/10.1145/3564696

work page doi:10.1145/3564696 2023
[13]

E. M. Clarke, W. Klieber, M. Nov ´aˇcek, and P. Zuliani, Model Checking and the State Explosion Problem . Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 1–30. [Online]. Available: https://doi.org/10.1007/978-3-642-35746-6 1

work page doi:10.1007/978-3-642-35746-6 2012
[14]

Baier and J.-P

C. Baier and J.-P. Katoen, Principles of model checking . MIT press, 2008

work page 2008
[15]

Kwiatkowska, G

M. Kwiatkowska, G. Norman, and D. Parker, Stochastic Model Checking. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007, pp. 220–270. [Online]. Available: https://doi.org/10.1007/978-3-540-72522-0 6

work page doi:10.1007/978-3-540-72522-0 2007
[16]

Abstraction-refinement for hierarchical probabilistic models,

S. Junges and M. T. Spaan, “Abstraction-refinement for hierarchical probabilistic models,” in International Conference on Computer Aided Verification. Springer, 2022, pp. 102–123

work page 2022
[17]

The 10,000 facets of mdp model checking,

C. Baier, H. Hermanns, and J.-P. Katoen, “The 10,000 facets of mdp model checking,” Computing and Software Science: State of the Art and Perspectives, pp. 420–451, 2019

work page 2019
[18]

On correctness, precision, and performance in quantitative verification,

C. E. Budde, A. Hartmanns, M. Klauck, J. K ˇret´ınsk´y, D. Parker, T. Quatmann, A. Turrini, and Z. Zhang, “On correctness, precision, and performance in quantitative verification,” in Leveraging Applications of Formal Methods, Verification and Validation: Tools and Trends , T. Margaria and B. Steffen, Eds. Cham: Springer International Publishing, 2021, pp...

work page 2021
[19]

PRISM 4.0: Verification of probabilistic real-time systems,

M. Kwiatkowska, G. Norman, and D. Parker, “PRISM 4.0: Verification of probabilistic real-time systems,” in Proc. 23rd International Conference on Computer Aided Verification (CAV’11), ser. LNCS, G. Gopalakrishnan and S. Qadeer, Eds., vol. 6806. Springer, 2011, pp. 585–591

work page 2011
[20]

The probabilistic model checker storm,

C. Hensel, S. Junges, J.-P. Katoen, T. Quatmann, and M. V olk, “The probabilistic model checker storm,” vol. 24, no. 4, p. 589–610, Aug

work page
[21]

Available: https://doi.org/10.1007/s10009-021-00633-z

[Online]. Available: https://doi.org/10.1007/s10009-021-00633-z

work page doi:10.1007/s10009-021-00633-z
[22]

Symbolic model checking of probabilistic processes using mtbdds and the kronecker representation,

L. d. Alfaro, M. Z. Kwiatkowska, G. Norman, D. Parker, and R. Segala, “Symbolic model checking of probabilistic processes using mtbdds and the kronecker representation,” in Proceedings of the 6th International Conference on Tools and Algorithms for Construction and Analysis of Systems: Held as Part of the European Joint Conferences on the Theory and Pract...

work page 2000
[23]

Explicit model checking of very large mdp using partitioning and secondary storage,

A. Hartmanns and H. Hermanns, “Explicit model checking of very large mdp using partitioning and secondary storage,” in Automated Technology for Verification and Analysis , B. Finkbeiner, G. Pu, and L. Zhang, Eds. Cham: Springer International Publishing, 2015, pp. 131–147

work page 2015
[24]

Statistical model checking for markov decision processes,

D. Henriques, J. G. Martins, P. Zuliani, A. Platzer, and E. M. Clarke, “Statistical model checking for markov decision processes,” in Proceedings of the 2012 Ninth International Conference on Quantitative Evaluation of Systems , ser. QEST ’12. USA: IEEE Computer Society, 2012, p. 84–93. [Online]. Available: https: //doi.org/10.1109/QEST.2012.19

work page doi:10.1109/qest.2012.19 2012
[25]

[Hir04] Mika Hirvensalo

H. Hansson and B. Jonsson, “A logic for reasoning about time and reliability,” Form. Asp. Comput., vol. 6, no. 5, p. 512–535, Sep. 1994. [Online]. Available: https://doi.org/10.1007/BF01211866

work page doi:10.1007/bf01211866 1994
[26]

Model checking of probabilistic and nondeterministic systems,

A. Bianco and L. De Alfaro, “Model checking of probabilistic and nondeterministic systems,” in International Conference on Foundations of Software Technology and Theoretical Computer Science . Springer, 1995, pp. 499–513

work page 1995
[27]

Chatterjee and T

K. Chatterjee and T. A. Henzinger, Value Iteration. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 107–138. [Online]. Available: https://doi.org/10.1007/978-3-540-69850-0 7

work page doi:10.1007/978-3-540-69850-0 2008
[28]

Bellman, Dynamic Programming

R. Bellman, Dynamic Programming. Princeton University Press, 1957

work page 1957
[29]

Planning and control of autonomous mobile robots for intralogistics: Literature review and research agenda,

G. Fragapane, R. de Koster, F. Sgarbossa, and J. O. Strandhagen, “Planning and control of autonomous mobile robots for intralogistics: Literature review and research agenda,” European Journal of Operational Research, vol. 294, no. 2, pp. 405–426, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0377221721000217

work page 2021
[30]

Model minimization in markov decision processes,

T. Dean and R. Givan, “Model minimization in markov decision processes,” in Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence , ser. AAAI’97/IAAI’97. AAAI Press, 1997, p. 106–111

work page 1997
[31]

Reinforcement learning: a survey,

L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” J. Artif. Int. Res. , vol. 4, no. 1, p. 237–285, May 1996

work page 1996
[32]

Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning,

R. S. Sutton, D. Precup, and S. Singh, “Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning,” Artificial Intelligence, vol. 112, no. 1, pp. 181–211, 1999. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0004370299000521

work page 1999
[33]

AI Safety Gridworlds

J. Leike, M. Martic, V . Krakovna, P. A. Ortega, T. Everitt, A. Lefrancq, L. Orseau, and S. Legg, “AI safety gridworlds,” CoRR, vol. abs/1711.09883, 2017. [Online]. Available: http://arxiv.org/abs/1711. 09883

work page internal anchor Pith review Pith/arXiv arXiv 2017
[34]

Con- troller synthesis for omega-regular and steady-state specifications,

A. Velasquez, I. Alkhouri, A. Beckus, A. Trivedi, and G. Atia, “Con- troller synthesis for omega-regular and steady-state specifications,” in Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems , ser. AAMAS ’22. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems, 2022, p. 1310–1318

work page 2022
[35]

Multigain 2.0: Mdp controller synthesis for multiple mean-payoff, ltl and steady-state constraints,

S. Bals, A. Evangelidis, J. K ˇret´ınsk´y, and J. Waibel, “Multigain 2.0: Mdp controller synthesis for multiple mean-payoff, ltl and steady-state constraints,” in Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control , ser. HSCC ’24. New York, NY , USA: Association for Computing Machinery, 2024. [Online]. Availabl...

work page doi:10.1145/3641513.3650135 2024
[36]

Learning-based probabilistic ltl motion planning with environment and motion uncertainties,

M. Cai, H. Peng, Z. Li, and Z. Kan, “Learning-based probabilistic ltl motion planning with environment and motion uncertainties,” IEEE Transactions on Automatic Control , vol. 66, no. 5, pp. 2386–2392, 2021

work page 2021
[37]

Engineering trustworthy self-adaptive software with dynamic assurance cases,

R. Calinescu, D. Weyns, S. Gerasimou, M. U. Iftikhar, I. Habli, and T. Kelly, “Engineering trustworthy self-adaptive software with dynamic assurance cases,” IEEE Transactions on Software Engineering , vol. 44, no. 11, pp. 1039–1069, 2017

work page 2017
[38]

Synthesizing tradeoff spaces with quantitative guarantees for families of software systems,

J. C ´amara, D. Garlan, and B. Schmerl, “Synthesizing tradeoff spaces with quantitative guarantees for families of software systems,” Journal of Systems and Software , vol. 152, pp. 33–49, 2019

work page 2019
[39]

Markov decision process (mdp) framework for optimizing software on mobile phones,

T. L. Cheung, K. Okamoto, F. Maker, X. Liu, and V . Akella, “Markov decision process (mdp) framework for optimizing software on mobile phones,” in Proceedings of the Seventh ACM International Conference on Embedded Software , ser. EMSOFT ’09. New York, NY , USA: Association for Computing Machinery, 2009, p. 11–20. [Online]. Available: https://doi.org/10.1...

work page doi:10.1145/1629335.1629338 2009
[40]

A markov decision process- based service migration procedure for follow me cloud,

A. Ksentini, T. Taleb, and M. Chen, “A markov decision process- based service migration procedure for follow me cloud,” in 2014 IEEE International Conference on Communications (ICC) , 2014, pp. 1350– 1354

work page 2014
[41]

Performance modelling and verification of cloud-based auto-scaling policies,

A. Evangelidis, D. Parker, and R. Bahsoon, “Performance modelling and verification of cloud-based auto-scaling policies,” Future Generation Computer Systems , vol. 87, pp. 629–638, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X17312475

work page 2018
[42]

Proactive self-adaptation under uncertainty: a probabilistic model checking approach,

G. A. Moreno, J. C ´amara, D. Garlan, and B. Schmerl, “Proactive self-adaptation under uncertainty: a probabilistic model checking approach,” in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering , ser. ESEC/FSE 2015. New York, NY , USA: Association for Computing Machinery, 2015, p. 1–12. [Online]. Available: https://doi.org...

work page doi:10.1145/2786805.2786853 2015
[43]

Counterexample- guided abstraction refinement,

E. Clarke, O. Grumberg, S. Jha, Y . Lu, and H. Veith, “Counterexample- guided abstraction refinement,” in Computer Aided Verification, E. A. Emerson and A. P. Sistla, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000, pp. 154–169

work page 2000
[44]

Probabilistic cegar,

H. Hermanns, B. Wachter, and L. Zhang, “Probabilistic cegar,” in International Conference on Computer Aided Verification . Springer, 2008, pp. 162–175

work page 2008
[45]

A game- based abstraction-refinement framework for markov decision processes,

M. Kattenbelt, M. Kwiatkowska, G. Norman, and D. Parker, “A game- based abstraction-refinement framework for markov decision processes,” Formal Methods in System Design , vol. 36, pp. 246–280, 2010

work page 2010
[46]

Compositional proba- bilistic model checking with string diagrams of mdps,

K. Watanabe, C. Eberhart, K. Asada, and I. Hasuo, “Compositional proba- bilistic model checking with string diagrams of mdps,” in Computer Aided Verification - 35th International Conference, CAV 2023, Proceedings , ser. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), C. ...

work page 2023
[47]

C. A. C. Coello, Evolutionary algorithms for solving multi-objective problems. Springer, 2007

work page 2007
[48]

Search-based synthesis of probabilistic models for quality-of-service software engineering (t),

S. Gerasimou, G. Tamburrelli, and R. Calinescu, “Search-based synthesis of probabilistic models for quality-of-service software engineering (t),” in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 319–330

work page 2015
[49]

Synthesis of proba- bilistic models for quality-of-service software engineering,

S. Gerasimou, R. Calinescu, and G. Tamburrelli, “Synthesis of proba- bilistic models for quality-of-service software engineering,” Automated Software Engg., vol. 25, no. 4, p. 785–831, Dec. 2018

work page 2018
[50]

Haiq: Synthesis of software design spaces with structural and probabilistic guarantees,

J. C ´amara, “Haiq: Synthesis of software design spaces with structural and probabilistic guarantees,” in Proceedings of the 8th International Conference on Formal Methods in Software Engineering , 2020, pp. 22– 33

work page 2020
[51]

Targeting requirements violations of autonomous driving systems by dynamic evolutionary search

S. Gerasimou, J. C ´amara, R. Calinescu, N. Alasmari, F. Alhwikem, and X. Fang, “Evolutionary-guided synthesis of verified pareto- optimal mdp policies,” in Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering , ser. ASE ’21. IEEE Press, 2022, p. 842–853. [Online]. Available: https://doi.org/10.1109/ASE51524.2021.9678727

work page doi:10.1109/ase51524.2021.9678727 2022
[52]

Feudal reinforcement learning,

P. Dayan and G. E. Hinton, “Feudal reinforcement learning,” in Advances in Neural Information Processing Systems , S. Hanson, J. Cowan, and C. Giles, Eds., vol. 5. Morgan-Kaufmann, 1992. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/ 1992/file/d14220ee66aeec73c49038385428ec4c-Paper.pdf

work page 1992
[53]

Hierarchical reinforcement learning with the maxq value function decomposition,

T. G. Dietterich, “Hierarchical reinforcement learning with the maxq value function decomposition,” J. Artif. Int. Res. , vol. 13, no. 1, p. 227–303, Nov. 2000

work page 2000
[54]

Abstract value iteration for hierarchical reinforcement learning,

K. Jothimurugan, O. Bastani, and R. Alur, “Abstract value iteration for hierarchical reinforcement learning,” in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics , ser. Proceedings of Machine Learning Research, A. Banerjee and K. Fukumizu, Eds., vol. 130. PMLR, 13–15 Apr 2021, pp. 1162–1170. [Online]. Available: h...

work page 2021
[55]

Markovian state and action abstractions for mdps via hierarchical mcts,

A. Bai, S. Srivastava, and S. Russell, “Markovian state and action abstractions for mdps via hierarchical mcts,” in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence , ser. IJCAI’16. AAAI Press, 2016, p. 3029–3037

work page 2016

[1] [1]

Edge intelligence—research opportunities for distributed computing continuum systems,

V . C. Pujol, P. K. Donta, A. Morichetta, I. Murturi, and S. Dustdar, “Edge intelligence—research opportunities for distributed computing continuum systems,” IEEE Internet Computing , vol. 27, no. 4, pp. 53–74, 2023

work page 2023

[2] [2]

A review of microservices autoscaling with formal verification perspective,

S. N. A. Jawaddi, M. H. Johari, and A. Ismail, “A review of microservices autoscaling with formal verification perspective,” Software: Practice and Experience, vol. 52, no. 11, pp. 2476–2495, 2022

work page 2022

[3] [3]

Coordinating hundreds of cooperative, autonomous vehicles in warehouses,

P. R. Wurman, R. D’Andrea, and M. Mountz, “Coordinating hundreds of cooperative, autonomous vehicles in warehouses,” in Proceedings of the 19th National Conference on Innovative Applications of Artificial Intelligence - Volume 2, ser. IAAI’07. AAAI Press, 2007, p. 1752–1759

work page 2007

[4] [4]

Probabilistic model checking of robots deployed in extreme environments,

X. Zhao, V . Robu, D. Flynn, F. Dinmohammadi, M. Fisher, and M. Webster, “Probabilistic model checking of robots deployed in extreme environments,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 8066–8074

work page 2019

[5] [5]

An overview of verification and validation challenges for inspection robots,

M. Fisher, R. C. Cardoso, E. C. Collins, C. Dadswell, L. A. Dennis, C. Dixon, M. Farrell, A. Ferrando, X. Huang, M. Jump et al. , “An overview of verification and validation challenges for inspection robots,” Robotics, vol. 10, no. 2, p. 67, 2021

work page 2021

[6] [6]

Uncertainty in self-adaptive systems: A research community perspective,

S. M. Hezavehi, D. Weyns, P. Avgeriou, R. Calinescu, R. Mirandola, and D. Perez-Palacin, “Uncertainty in self-adaptive systems: A research community perspective,”ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 15, no. 4, pp. 1–36, 2021

work page 2021

[7] [7]

Towards a research agenda for understanding and managing uncertainty in self-adaptive systems,

D. Weyns, R. Calinescu, R. Mirandola, K. Tei, M. Acosta, N. Bencomo, A. Bennaceur, N. Boltz, T. Bures, J. Camara et al., “Towards a research agenda for understanding and managing uncertainty in self-adaptive systems,” ACM SIGSOFT Software Engineering Notes , vol. 48, no. 4, pp. 20–36, 2023

work page 2023

[8] [8]

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st ed. USA: John Wiley & Sons, Inc., 1994

work page 1994

[9] [10]

Dynamic qos management and optimization in service- based systems,

R. Calinescu, L. Grunske, M. Kwiatkowska, R. Mirandola, and G. Tamburrelli, “Dynamic qos management and optimization in service- based systems,” IEEE Trans. Softw. Eng. , vol. 37, no. 3, p. 387–409, May 2011. [Online]. Available: https://doi.org/10.1109/TSE.2010.92

work page doi:10.1109/tse.2010.92 2011

[10] [11]

Family- based modeling and analysis for probabilistic systems — featuring profeat,

P. Chrszon, C. Dubslaff, S. Kl ¨uppelholz, and C. Baier, “Family- based modeling and analysis for probabilistic systems — featuring profeat,” in Proceedings of the 19th International Conference on Fundamental Approaches to Software Engineering - Volume 9633 . Berlin, Heidelberg: Springer-Verlag, 2016, p. 287–304. [Online]. Available: https://doi.org/10.10...

work page doi:10.1007/978-3-662-49665-7 2016

[11] [12]

Multiple mobile robot task and motion planning: A survey,

L. Antonyshyn, J. Silveira, S. Givigi, and J. Marshall, “Multiple mobile robot task and motion planning: A survey,” ACM Comput. Surv., vol. 55, no. 10, Feb. 2023. [Online]. Available: https://doi.org/10.1145/3564696

work page doi:10.1145/3564696 2023

[12] [13]

E. M. Clarke, W. Klieber, M. Nov ´aˇcek, and P. Zuliani, Model Checking and the State Explosion Problem . Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 1–30. [Online]. Available: https://doi.org/10.1007/978-3-642-35746-6 1

work page doi:10.1007/978-3-642-35746-6 2012

[13] [14]

Baier and J.-P

C. Baier and J.-P. Katoen, Principles of model checking . MIT press, 2008

work page 2008

[14] [15]

Kwiatkowska, G

M. Kwiatkowska, G. Norman, and D. Parker, Stochastic Model Checking. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007, pp. 220–270. [Online]. Available: https://doi.org/10.1007/978-3-540-72522-0 6

work page doi:10.1007/978-3-540-72522-0 2007

[15] [16]

Abstraction-refinement for hierarchical probabilistic models,

S. Junges and M. T. Spaan, “Abstraction-refinement for hierarchical probabilistic models,” in International Conference on Computer Aided Verification. Springer, 2022, pp. 102–123

work page 2022

[16] [17]

The 10,000 facets of mdp model checking,

C. Baier, H. Hermanns, and J.-P. Katoen, “The 10,000 facets of mdp model checking,” Computing and Software Science: State of the Art and Perspectives, pp. 420–451, 2019

work page 2019

[17] [18]

On correctness, precision, and performance in quantitative verification,

C. E. Budde, A. Hartmanns, M. Klauck, J. K ˇret´ınsk´y, D. Parker, T. Quatmann, A. Turrini, and Z. Zhang, “On correctness, precision, and performance in quantitative verification,” in Leveraging Applications of Formal Methods, Verification and Validation: Tools and Trends , T. Margaria and B. Steffen, Eds. Cham: Springer International Publishing, 2021, pp...

work page 2021

[18] [19]

PRISM 4.0: Verification of probabilistic real-time systems,

M. Kwiatkowska, G. Norman, and D. Parker, “PRISM 4.0: Verification of probabilistic real-time systems,” in Proc. 23rd International Conference on Computer Aided Verification (CAV’11), ser. LNCS, G. Gopalakrishnan and S. Qadeer, Eds., vol. 6806. Springer, 2011, pp. 585–591

work page 2011

[19] [20]

The probabilistic model checker storm,

C. Hensel, S. Junges, J.-P. Katoen, T. Quatmann, and M. V olk, “The probabilistic model checker storm,” vol. 24, no. 4, p. 589–610, Aug

work page

[20] [21]

Available: https://doi.org/10.1007/s10009-021-00633-z

[Online]. Available: https://doi.org/10.1007/s10009-021-00633-z

work page doi:10.1007/s10009-021-00633-z

[21] [22]

Symbolic model checking of probabilistic processes using mtbdds and the kronecker representation,

L. d. Alfaro, M. Z. Kwiatkowska, G. Norman, D. Parker, and R. Segala, “Symbolic model checking of probabilistic processes using mtbdds and the kronecker representation,” in Proceedings of the 6th International Conference on Tools and Algorithms for Construction and Analysis of Systems: Held as Part of the European Joint Conferences on the Theory and Pract...

work page 2000

[22] [23]

Explicit model checking of very large mdp using partitioning and secondary storage,

A. Hartmanns and H. Hermanns, “Explicit model checking of very large mdp using partitioning and secondary storage,” in Automated Technology for Verification and Analysis , B. Finkbeiner, G. Pu, and L. Zhang, Eds. Cham: Springer International Publishing, 2015, pp. 131–147

work page 2015

[23] [24]

Statistical model checking for markov decision processes,

D. Henriques, J. G. Martins, P. Zuliani, A. Platzer, and E. M. Clarke, “Statistical model checking for markov decision processes,” in Proceedings of the 2012 Ninth International Conference on Quantitative Evaluation of Systems , ser. QEST ’12. USA: IEEE Computer Society, 2012, p. 84–93. [Online]. Available: https: //doi.org/10.1109/QEST.2012.19

work page doi:10.1109/qest.2012.19 2012

[24] [25]

[Hir04] Mika Hirvensalo

H. Hansson and B. Jonsson, “A logic for reasoning about time and reliability,” Form. Asp. Comput., vol. 6, no. 5, p. 512–535, Sep. 1994. [Online]. Available: https://doi.org/10.1007/BF01211866

work page doi:10.1007/bf01211866 1994

[25] [26]

Model checking of probabilistic and nondeterministic systems,

A. Bianco and L. De Alfaro, “Model checking of probabilistic and nondeterministic systems,” in International Conference on Foundations of Software Technology and Theoretical Computer Science . Springer, 1995, pp. 499–513

work page 1995

[26] [27]

Chatterjee and T

K. Chatterjee and T. A. Henzinger, Value Iteration. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 107–138. [Online]. Available: https://doi.org/10.1007/978-3-540-69850-0 7

work page doi:10.1007/978-3-540-69850-0 2008

[27] [28]

Bellman, Dynamic Programming

R. Bellman, Dynamic Programming. Princeton University Press, 1957

work page 1957

[28] [29]

Planning and control of autonomous mobile robots for intralogistics: Literature review and research agenda,

G. Fragapane, R. de Koster, F. Sgarbossa, and J. O. Strandhagen, “Planning and control of autonomous mobile robots for intralogistics: Literature review and research agenda,” European Journal of Operational Research, vol. 294, no. 2, pp. 405–426, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0377221721000217

work page 2021

[29] [30]

Model minimization in markov decision processes,

T. Dean and R. Givan, “Model minimization in markov decision processes,” in Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence , ser. AAAI’97/IAAI’97. AAAI Press, 1997, p. 106–111

work page 1997

[30] [31]

Reinforcement learning: a survey,

L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” J. Artif. Int. Res. , vol. 4, no. 1, p. 237–285, May 1996

work page 1996

[31] [32]

Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning,

R. S. Sutton, D. Precup, and S. Singh, “Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning,” Artificial Intelligence, vol. 112, no. 1, pp. 181–211, 1999. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0004370299000521

work page 1999

[32] [33]

AI Safety Gridworlds

J. Leike, M. Martic, V . Krakovna, P. A. Ortega, T. Everitt, A. Lefrancq, L. Orseau, and S. Legg, “AI safety gridworlds,” CoRR, vol. abs/1711.09883, 2017. [Online]. Available: http://arxiv.org/abs/1711. 09883

work page internal anchor Pith review Pith/arXiv arXiv 2017

[33] [34]

Con- troller synthesis for omega-regular and steady-state specifications,

A. Velasquez, I. Alkhouri, A. Beckus, A. Trivedi, and G. Atia, “Con- troller synthesis for omega-regular and steady-state specifications,” in Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems , ser. AAMAS ’22. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems, 2022, p. 1310–1318

work page 2022

[34] [35]

Multigain 2.0: Mdp controller synthesis for multiple mean-payoff, ltl and steady-state constraints,

S. Bals, A. Evangelidis, J. K ˇret´ınsk´y, and J. Waibel, “Multigain 2.0: Mdp controller synthesis for multiple mean-payoff, ltl and steady-state constraints,” in Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control , ser. HSCC ’24. New York, NY , USA: Association for Computing Machinery, 2024. [Online]. Availabl...

work page doi:10.1145/3641513.3650135 2024

[35] [36]

Learning-based probabilistic ltl motion planning with environment and motion uncertainties,

M. Cai, H. Peng, Z. Li, and Z. Kan, “Learning-based probabilistic ltl motion planning with environment and motion uncertainties,” IEEE Transactions on Automatic Control , vol. 66, no. 5, pp. 2386–2392, 2021

work page 2021

[36] [37]

Engineering trustworthy self-adaptive software with dynamic assurance cases,

R. Calinescu, D. Weyns, S. Gerasimou, M. U. Iftikhar, I. Habli, and T. Kelly, “Engineering trustworthy self-adaptive software with dynamic assurance cases,” IEEE Transactions on Software Engineering , vol. 44, no. 11, pp. 1039–1069, 2017

work page 2017

[37] [38]

Synthesizing tradeoff spaces with quantitative guarantees for families of software systems,

J. C ´amara, D. Garlan, and B. Schmerl, “Synthesizing tradeoff spaces with quantitative guarantees for families of software systems,” Journal of Systems and Software , vol. 152, pp. 33–49, 2019

work page 2019

[38] [39]

Markov decision process (mdp) framework for optimizing software on mobile phones,

T. L. Cheung, K. Okamoto, F. Maker, X. Liu, and V . Akella, “Markov decision process (mdp) framework for optimizing software on mobile phones,” in Proceedings of the Seventh ACM International Conference on Embedded Software , ser. EMSOFT ’09. New York, NY , USA: Association for Computing Machinery, 2009, p. 11–20. [Online]. Available: https://doi.org/10.1...

work page doi:10.1145/1629335.1629338 2009

[39] [40]

A markov decision process- based service migration procedure for follow me cloud,

A. Ksentini, T. Taleb, and M. Chen, “A markov decision process- based service migration procedure for follow me cloud,” in 2014 IEEE International Conference on Communications (ICC) , 2014, pp. 1350– 1354

work page 2014

[40] [41]

Performance modelling and verification of cloud-based auto-scaling policies,

A. Evangelidis, D. Parker, and R. Bahsoon, “Performance modelling and verification of cloud-based auto-scaling policies,” Future Generation Computer Systems , vol. 87, pp. 629–638, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X17312475

work page 2018

[41] [42]

Proactive self-adaptation under uncertainty: a probabilistic model checking approach,

G. A. Moreno, J. C ´amara, D. Garlan, and B. Schmerl, “Proactive self-adaptation under uncertainty: a probabilistic model checking approach,” in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering , ser. ESEC/FSE 2015. New York, NY , USA: Association for Computing Machinery, 2015, p. 1–12. [Online]. Available: https://doi.org...

work page doi:10.1145/2786805.2786853 2015

[42] [43]

Counterexample- guided abstraction refinement,

E. Clarke, O. Grumberg, S. Jha, Y . Lu, and H. Veith, “Counterexample- guided abstraction refinement,” in Computer Aided Verification, E. A. Emerson and A. P. Sistla, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000, pp. 154–169

work page 2000

[43] [44]

Probabilistic cegar,

H. Hermanns, B. Wachter, and L. Zhang, “Probabilistic cegar,” in International Conference on Computer Aided Verification . Springer, 2008, pp. 162–175

work page 2008

[44] [45]

A game- based abstraction-refinement framework for markov decision processes,

M. Kattenbelt, M. Kwiatkowska, G. Norman, and D. Parker, “A game- based abstraction-refinement framework for markov decision processes,” Formal Methods in System Design , vol. 36, pp. 246–280, 2010

work page 2010

[45] [46]

Compositional proba- bilistic model checking with string diagrams of mdps,

K. Watanabe, C. Eberhart, K. Asada, and I. Hasuo, “Compositional proba- bilistic model checking with string diagrams of mdps,” in Computer Aided Verification - 35th International Conference, CAV 2023, Proceedings , ser. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), C. ...

work page 2023

[46] [47]

C. A. C. Coello, Evolutionary algorithms for solving multi-objective problems. Springer, 2007

work page 2007

[47] [48]

Search-based synthesis of probabilistic models for quality-of-service software engineering (t),

S. Gerasimou, G. Tamburrelli, and R. Calinescu, “Search-based synthesis of probabilistic models for quality-of-service software engineering (t),” in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 319–330

work page 2015

[48] [49]

Synthesis of proba- bilistic models for quality-of-service software engineering,

S. Gerasimou, R. Calinescu, and G. Tamburrelli, “Synthesis of proba- bilistic models for quality-of-service software engineering,” Automated Software Engg., vol. 25, no. 4, p. 785–831, Dec. 2018

work page 2018

[49] [50]

Haiq: Synthesis of software design spaces with structural and probabilistic guarantees,

J. C ´amara, “Haiq: Synthesis of software design spaces with structural and probabilistic guarantees,” in Proceedings of the 8th International Conference on Formal Methods in Software Engineering , 2020, pp. 22– 33

work page 2020

[50] [51]

Targeting requirements violations of autonomous driving systems by dynamic evolutionary search

S. Gerasimou, J. C ´amara, R. Calinescu, N. Alasmari, F. Alhwikem, and X. Fang, “Evolutionary-guided synthesis of verified pareto- optimal mdp policies,” in Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering , ser. ASE ’21. IEEE Press, 2022, p. 842–853. [Online]. Available: https://doi.org/10.1109/ASE51524.2021.9678727

work page doi:10.1109/ase51524.2021.9678727 2022

[51] [52]

Feudal reinforcement learning,

P. Dayan and G. E. Hinton, “Feudal reinforcement learning,” in Advances in Neural Information Processing Systems , S. Hanson, J. Cowan, and C. Giles, Eds., vol. 5. Morgan-Kaufmann, 1992. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/ 1992/file/d14220ee66aeec73c49038385428ec4c-Paper.pdf

work page 1992

[52] [53]

Hierarchical reinforcement learning with the maxq value function decomposition,

T. G. Dietterich, “Hierarchical reinforcement learning with the maxq value function decomposition,” J. Artif. Int. Res. , vol. 13, no. 1, p. 227–303, Nov. 2000

work page 2000

[53] [54]

Abstract value iteration for hierarchical reinforcement learning,

K. Jothimurugan, O. Bastani, and R. Alur, “Abstract value iteration for hierarchical reinforcement learning,” in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics , ser. Proceedings of Machine Learning Research, A. Banerjee and K. Fukumizu, Eds., vol. 130. PMLR, 13–15 Apr 2021, pp. 1162–1170. [Online]. Available: h...

work page 2021

[54] [55]

Markovian state and action abstractions for mdps via hierarchical mcts,

A. Bai, S. Srivastava, and S. Russell, “Markovian state and action abstractions for mdps via hierarchical mcts,” in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence , ser. IJCAI’16. AAAI Press, 2016, p. 3029–3037

work page 2016