Model-Free Reinforcement Learning for Stochastic Games with Linear Temporal Logic Objectives

Alper Kamil Bozkurt; Michael Zavlanos; Miroslav Pajic; Yu Wang

arxiv: 2010.01050 · v1 · submitted 2020-10-02 · 💻 cs.RO · cs.LO

Model-Free Reinforcement Learning for Stochastic Games with Linear Temporal Logic Objectives

Alper Kamil Bozkurt , Yu Wang , Michael Zavlanos , Miroslav Pajic This is my paper

Pith reviewed 2026-05-24 14:48 UTC · model grok-4.3

classification 💻 cs.RO cs.LO

keywords model-free reinforcement learninglinear temporal logicstochastic gamesRabin automatoncontrol synthesismotion planningunknown environments

0 comments

The pith

Model-free reinforcement learning maximizes the probability of satisfying an LTL specification in unknown stochastic games when the derived Rabin automaton has a single accepting pair.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a reinforcement learning method that learns control strategies directly from interaction without knowing the environment model. It focuses on games where a controller competes against an adversarial environment to satisfy an LTL formula. The method works exactly for specifications whose Rabin automaton has one accepting pair and gives a guaranteed lower bound for more complex cases. This allows synthesizing high-probability satisfying strategies in settings where traditional model-based methods fail due to unknown transitions.

Core claim

We introduce a model-free reinforcement learning (RL) methodology to find a strategy that maximizes the probability of satisfying a given LTL specification when the Rabin condition of the derived DRA has a single accepting pair. We then generalize this approach to LTL formulas for which the Rabin condition has a larger number of accepting pairs, providing a lower bound on the satisfaction probability.

What carries the argument

The model-free RL algorithm applied to the product of the unknown stochastic game and the DRA, with rewards based on the single accepting pair to optimize the satisfaction probability.

If this is right

Strategies for satisfying LTL objectives can be learned through direct interaction in environments with unknown dynamics.
A lower bound on satisfaction probability is obtained for LTL formulas whose DRA has multiple accepting pairs.
The approach applies to robotic motion planning tasks where the controller must act against an adversarial environment.
No explicit model of transition probabilities or topology is required for the learning process.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The single-pair restriction may cover many common motion-planning specifications, making the exact method widely usable.
The lower-bound generalization suggests a path toward handling arbitrary LTL formulas with the same learning architecture.
Extending the reward design to continuous or high-dimensional state spaces could be tested by replacing tabular RL with function approximators.

Load-bearing premise

The problem can be modeled as a turn-based zero-sum stochastic game with fully unknown transition probabilities and topology, where the LTL goal reduces directly to a DRA acceptance condition.

What would settle it

In a small stochastic game with known transitions and a computable optimal satisfaction probability for a single-pair DRA, if the learned strategy's empirical satisfaction rate falls below the known optimum after sufficient training, the method would be falsified.

Figures

Figures reproduced from arXiv: 2010.01050 by Alper Kamil Bozkurt, Michael Zavlanos, Miroslav Pajic, Yu Wang.

**Figure 1.** Figure 1: A DRA derived from the LTL formula ϕ=♦b∨♦d. Here, B1={q1}, C2={q0} and B2={q2} are the states in the Rabin acceptance condition Acc={(∅, B1),(C2, B2)}. game G. The value of a state s under the strategy pair (µ, ν) is the expected sum of the discounted reward vµ,ν(s) = Eσ∼Gµ,ν "X∞ i=0 γ iR(σ[t+i]) [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: A k-copy game obtained from a stochastic Rabin(2) game. W(1) and W(2) denote the winning sets for the first and the second Rabin pair respectively. Intuitively, the k-copy game G ? consists of k exact copies of the original game G for each accepting pair, and a dummy state hs, i+ki for every controller state s ∈ Sµ for each copy i ∈ [k] ( [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: A stochastic Rabin game with two accepting pairs: [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Illustration of the actions of the environment for [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: The objective strategy and the estimated maximal [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: The control strategies obtained for ϕ2 from (38). The values of the states are represented by the shades of blue (the darker, the higher value), which are the estimation of how likely the controller satisfies the objective. is based on the position of the first agent as L(hs1, s2i) := ( L(s1), s1 6= s2 L(s1) ∪ {a}, s1 = s2 (37) The label a represents the state where both agents are in the same position (ad… view at source ↗

read the original abstract

We study the problem of synthesizing control strategies for Linear Temporal Logic (LTL) objectives in unknown environments. We model this problem as a turn-based zero-sum stochastic game between the controller and the environment, where the transition probabilities and the model topology are fully unknown. The winning condition for the controller in this game is the satisfaction of the given LTL specification, which can be captured by the acceptance condition of a deterministic Rabin automaton (DRA) directly derived from the LTL specification. We introduce a model-free reinforcement learning (RL) methodology to find a strategy that maximizes the probability of satisfying a given LTL specification when the Rabin condition of the derived DRA has a single accepting pair. We then generalize this approach to LTL formulas for which the Rabin condition has a larger number of accepting pairs, providing a lower bound on the satisfaction probability. Finally, we illustrate applicability of our RL method on two motion planning case studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a model-free RL method to maximize LTL satisfaction probability in fully unknown turn-based stochastic games via DRA, at least for single accepting pairs plus a lower bound for more.

read the letter

The main contribution is a model-free RL approach that treats the controller-environment interaction as an unknown zero-sum stochastic game and optimizes directly for the Rabin acceptance condition on the product with a DRA derived from the LTL spec. They start with the single-pair case to maximize satisfaction probability, then extend to a lower bound when there are multiple pairs. Two motion-planning case studies are used to illustrate the idea. This is a clear step beyond model-based methods that assume known transitions or topology. The modeling choice is standard and internally consistent, and restricting the initial claim to single pairs avoids overreaching on the harder multi-pair case. The lower-bound generalization is a reasonable, cautious move. The case studies show the method can be applied to robotic planning tasks where the environment is not fully known in advance. One soft spot is that the abstract gives no information on the specific RL algorithm, convergence arguments, or how they handle the infinite-horizon probabilistic objective in practice. Without those details it is difficult to judge whether the learned strategies actually approach the claimed probabilities or how sensitive the results are to the unknown topology. The experiments are mentioned but not described, so scaling behavior and comparison to alternatives remain open. This work is aimed at researchers combining RL with formal specifications for uncertain robotic systems. Readers already working on safe autonomy or automata-based control would get the most from it. The core idea is distinct enough and the problem is relevant that the paper deserves a serious referee, even if the theoretical and empirical sections will likely need expansion.

Referee Report

0 major / 2 minor

Summary. The paper models LTL control synthesis in unknown environments as a turn-based zero-sum stochastic game with fully unknown transition probabilities and topology. It derives a DRA from the LTL formula and introduces a model-free RL method to maximize the probability of satisfying the specification when the Rabin condition has a single accepting pair; a generalization supplies a lower bound on the satisfaction probability for formulas with multiple accepting pairs. The approach is illustrated on two motion planning case studies.

Significance. If the RL procedure is shown to converge to an optimal (or near-optimal) strategy for the product game, the result would be significant for robotics and autonomous systems, where model-free methods for temporal-logic objectives under uncertainty are needed but rarely available. The explicit restriction to single-pair Rabin conditions and the lower-bound extension for multiple pairs are clearly stated strengths that avoid overclaiming.

minor comments (2)

The abstract states that the method 'finds a strategy that maximizes the probability' but does not indicate whether the underlying RL algorithm is proved to converge or only empirically observed to do so; a brief statement on convergence assumptions would clarify the theoretical status of the central claim.
The two motion-planning case studies are referenced only at the end of the abstract; adding a short quantitative summary (e.g., achieved satisfaction probabilities or comparison to a baseline) would make the empirical support more immediately visible.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript, positive summary, and recommendation for minor revision. We are pleased that the significance of the model-free RL approach for LTL objectives in unknown stochastic games is recognized, along with the explicit handling of the single-pair Rabin restriction.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper models the LTL synthesis problem as an unknown turn-based zero-sum stochastic game whose product with a known DRA yields an (unknown) product game, then applies standard model-free RL to maximize acceptance probability under the single-pair Rabin restriction (with an explicit lower-bound extension for multiple pairs). No step reduces a claimed prediction to a fitted input by construction, no load-bearing premise rests on a self-citation chain, and no ansatz or uniqueness result is imported from prior author work. The central claim therefore remains independent of its own outputs and is grounded in externally verifiable game-theoretic and automata constructions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard LTL-to-DRA translation and stochastic game modeling with no free parameters or invented entities introduced.

axioms (2)

standard math LTL specifications can be translated to deterministic Rabin automata whose acceptance conditions capture satisfaction.
Standard result from automata theory for temporal logics.
domain assumption The interaction is a turn-based zero-sum stochastic game with fully unknown transitions and topology.
Core modeling choice stated in the abstract.

pith-pipeline@v0.9.0 · 5696 in / 1249 out tokens · 56692 ms · 2026-05-24T14:48:14.281861+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Secure Planning Against Stealthy Attacks via Model-Free Reinforcement Learning
cs.RO 2020-11 unverdicted novelty 6.0

Secure planning against stealthy attacks on robot actuators in unknown stochastic environments is achieved by modeling the problem as a stochastic game and solving it with model-free RL to satisfy a combined LTL formula.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Reinforcement Learning: An Introduction

Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA, 2nd edition, 2018

work page 2018
[2]

Human-level control through deep reinforcement learning

V olodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. nature, 518(7540):529–533, 2015

work page 2015
[3]

Mastering the game of Go with deep neural networks and tree search

David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Lau- rent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. nature, 529(7587):484–489, 2016

work page 2016
[4]

Tommaso Dreossi, Alexandre Donz ´e, and Sanjit A. Seshia. Composi- tional Falsiﬁcation of Cyber-Physical Systems with Machine Learning Components. In Clark Barrett, Misty Davies, and Temesghen Kahsai, editors, NASA Formal Methods, Lecture Notes in Computer Science, pages 357–372. Springer International Publishing, 2017

work page 2017
[5]

Formal Veriﬁcation of Neural Network Controlled Autonomous Systems

Xiaowu Sun, Haitham Khedr, and Yasser Shoukry. Formal Veriﬁcation of Neural Network Controlled Autonomous Systems. In Proceedings of the 22Nd ACM International Conference on Hybrid Systems: Computation and Control, HSCC ’19, pages 147–156, New York, NY , USA, 2019. ACM

work page 2019
[6]

Johnson, and Xenofon Koutsoukos

Hoang-Dung Tran, Feiyang Cai, Manzanas Lopez Diego, Patrick Musau, Taylor T. Johnson, and Xenofon Koutsoukos. Safety Veriﬁca- tion of Cyber-Physical Systems with Reinforcement Learning Control. ACM Transactions on Embedded Computing Systems , 18(5s):1–22, October 2019

work page 2019
[7]

Gray-box Adversarial Testing for Control Systems with Machine Learning Components

Shakiba Yaghoubi and Georgios Fainekos. Gray-box Adversarial Testing for Control Systems with Machine Learning Components. In Proceedings of the 22Nd ACM International Conference on Hybrid Systems: Computation and Control , HSCC ’19, pages 179–184, New York, NY , USA, 2019. ACM

work page 2019
[8]

Statistical veriﬁcation of learning-based cyber-physical systems

Mojtaba Zarei, Yu Wang, and Miroslav Pajic. Statistical veriﬁcation of learning-based cyber-physical systems. In ACM International Conference on Hybrid Systems: Computation and Control (HSCC) , pages 1–7, Sydney, Australia, October 2020

work page 2020
[9]

Murray, and Joel W

Richard Cheng, G ´abor Orosz, Richard M. Murray, and Joel W. Burdick. End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks. Proceedings of the AAAI Conference on Artiﬁcial Intelligence , 33(01):3387–3395, July 2019

work page 2019
[10]

Learning for Safety-Critical Control with Control Barrier Functions

Andrew Taylor, Andrew Singletary, Yisong Yue, and Aaron Ames. Learning for Safety-Critical Control with Control Barrier Functions. In Learning for Dynamics and Control , pages 708–717. PMLR, July 2020

work page 2020
[11]

Hyperproperties for robotics: Mo- tion planning via HyperLTL

Yu Wang and Miroslav Pajic. Hyperproperties for robotics: Mo- tion planning via HyperLTL. In IEEE International Conference on Robotics and Automation (ICRA) , page accepted, Paris, France, 2020

work page 2020
[12]

Reinforcement learning: Connections, surprises, challenges

Andrew G Barto. Reinforcement learning: Connections, surprises, challenges. AI Magazine, 40(1), 2019

work page 2019
[13]

Translating structured English to robot controllers

Hadas Kress-Gazit, Georgios E Fainekos, and George J Pappas. Translating structured English to robot controllers. Advanced Robotics, 22(12):1343–1359, 2008

work page 2008
[14]

Revising motion planning under linear temporal logic speciﬁcations in partially known workspaces

Meng Guo, Karl H Johansson, and Dimos V Dimarogonas. Revising motion planning under linear temporal logic speciﬁcations in partially known workspaces. In 2013 IEEE International Conference on Robotics and Automation , pages 5025–5032. IEEE, 2013

work page 2013
[15]

Receding horizon temporal logic planning

Tichakorn Wongpiromsarn, Ufuk Topcu, and Richard M Murray. Receding horizon temporal logic planning. IEEE Transactions on Automatic Control, 57(11):2817–2830, 2012

work page 2012
[16]

Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning

Alper Kamil Bozkurt, Yu Wang, Michael M Zavlanos, and Miroslav Pajic. Control synthesis from linear temporal logic speciﬁcations using model-free reinforcement learning, 2019. arXiv:1909.07299 [cs.RO]

work page internal anchor Pith review Pith/arXiv arXiv 2019
[17]

Reinforcement learning for temporal logic control synthesis with probabilistic satis- faction guarantees, 2019

Mohammadhosein Hasanbeig, Yiannis Kantaros, Alessandro Abate, Daniel Kroening, George J Pappas, and Insup Lee. Reinforcement learning for temporal logic control synthesis with probabilistic satis- faction guarantees, 2019. arXiv:1909.05304 [cs.LO]

work page arXiv 2019
[18]

Omega-regular objectives in model-free reinforcement learning

Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. Omega-regular objectives in model-free reinforcement learning. In Tom ´aˇs V ojnar and Lijun Zhang, editors, Tools and Algorithms for the Construction and Anal- ysis of Systems , pages 395–412, Cham, 2019. Springer International Publishing

work page 2019
[19]

Solving games without determinization

Thomas A Henzinger and Nir Piterman. Solving games without determinization. In International Workshop on Computer Science Logic, pages 395–410. Springer, 2006

work page 2006
[20]

Probably approximately correct learning in stochastic games with temporal logic speciﬁcations

Min Wen and Ufuk Topcu. Probably approximately correct learning in stochastic games with temporal logic speciﬁcations. In Proceed- ings of the Twenty-Fifth International Joint Conference on Artiﬁcial Intelligence, IJCAI’16, page 3630–3636. AAAI Press, 2016

work page 2016
[21]

PAC sta- tistical model checking for Markov decision processes and stochastic games

Pranav Ashok, Jan K ˇret´ınsk`y, and Maximilian Weininger. PAC sta- tistical model checking for Markov decision processes and stochastic games. In International Conference on Computer Aided Veriﬁcation , pages 497–519. Springer, 2019

work page 2019
[22]

Principles of Model Checking

Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT Press, Cambridge, MA, USA, 2008

work page 2008
[23]

Stochastic games

Lloyd S Shapley. Stochastic games. Proceedings of the national academy of sciences , 39(10):1095–1100, 1953

work page 1953
[24]

Markov games as a framework for multi-agent reinforcement learning

Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In Machine learning proceedings 1994, pages 157–163. Elsevier, 1994

work page 1994
[25]

Nash q-learning for general-sum stochastic games

Junling Hu and Michael P Wellman. Nash q-learning for general-sum stochastic games. Journal of machine learning research, 4(Nov):1039– 1069, 2003

work page 2003
[26]

An analysis of stochastic game theory for multiagent reinforcement learning

Michael Bowling and Manuela Veloso. An analysis of stochastic game theory for multiagent reinforcement learning. Technical report, Carnegie-Mellon Univ Pittsburgh Pa School of Computer Science, 2000

work page 2000
[27]

Henzinger

Krishnendu Chatterjee and Thomas A. Henzinger. A survey of stochastic ω-regular games. Journal of Computer and System Sciences, 78(2):394 – 413, 2012. Games in Veriﬁcation

work page 2012
[28]

A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic. CSRL, 2020. https://gitlab.oit.duke.edu/cpsl/csrl

work page 2020
[29]

Rabinizer 4: from LTL to your favourite deterministic automaton

Jan K ˇret´ınsk`y, Tobias Meggendorfer, Salomon Sickert, and Christo- pher Ziegler. Rabinizer 4: from LTL to your favourite deterministic automaton. In International Conference on Computer Aided Veriﬁca- tion, pages 567–577. Springer, 2018

work page 2018

[1] [1]

Reinforcement Learning: An Introduction

Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA, 2nd edition, 2018

work page 2018

[2] [2]

Human-level control through deep reinforcement learning

V olodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. nature, 518(7540):529–533, 2015

work page 2015

[3] [3]

Mastering the game of Go with deep neural networks and tree search

David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Lau- rent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. nature, 529(7587):484–489, 2016

work page 2016

[4] [4]

Tommaso Dreossi, Alexandre Donz ´e, and Sanjit A. Seshia. Composi- tional Falsiﬁcation of Cyber-Physical Systems with Machine Learning Components. In Clark Barrett, Misty Davies, and Temesghen Kahsai, editors, NASA Formal Methods, Lecture Notes in Computer Science, pages 357–372. Springer International Publishing, 2017

work page 2017

[5] [5]

Formal Veriﬁcation of Neural Network Controlled Autonomous Systems

Xiaowu Sun, Haitham Khedr, and Yasser Shoukry. Formal Veriﬁcation of Neural Network Controlled Autonomous Systems. In Proceedings of the 22Nd ACM International Conference on Hybrid Systems: Computation and Control, HSCC ’19, pages 147–156, New York, NY , USA, 2019. ACM

work page 2019

[6] [6]

Johnson, and Xenofon Koutsoukos

Hoang-Dung Tran, Feiyang Cai, Manzanas Lopez Diego, Patrick Musau, Taylor T. Johnson, and Xenofon Koutsoukos. Safety Veriﬁca- tion of Cyber-Physical Systems with Reinforcement Learning Control. ACM Transactions on Embedded Computing Systems , 18(5s):1–22, October 2019

work page 2019

[7] [7]

Gray-box Adversarial Testing for Control Systems with Machine Learning Components

Shakiba Yaghoubi and Georgios Fainekos. Gray-box Adversarial Testing for Control Systems with Machine Learning Components. In Proceedings of the 22Nd ACM International Conference on Hybrid Systems: Computation and Control , HSCC ’19, pages 179–184, New York, NY , USA, 2019. ACM

work page 2019

[8] [8]

Statistical veriﬁcation of learning-based cyber-physical systems

Mojtaba Zarei, Yu Wang, and Miroslav Pajic. Statistical veriﬁcation of learning-based cyber-physical systems. In ACM International Conference on Hybrid Systems: Computation and Control (HSCC) , pages 1–7, Sydney, Australia, October 2020

work page 2020

[9] [9]

Murray, and Joel W

Richard Cheng, G ´abor Orosz, Richard M. Murray, and Joel W. Burdick. End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks. Proceedings of the AAAI Conference on Artiﬁcial Intelligence , 33(01):3387–3395, July 2019

work page 2019

[10] [10]

Learning for Safety-Critical Control with Control Barrier Functions

Andrew Taylor, Andrew Singletary, Yisong Yue, and Aaron Ames. Learning for Safety-Critical Control with Control Barrier Functions. In Learning for Dynamics and Control , pages 708–717. PMLR, July 2020

work page 2020

[11] [11]

Hyperproperties for robotics: Mo- tion planning via HyperLTL

Yu Wang and Miroslav Pajic. Hyperproperties for robotics: Mo- tion planning via HyperLTL. In IEEE International Conference on Robotics and Automation (ICRA) , page accepted, Paris, France, 2020

work page 2020

[12] [12]

Reinforcement learning: Connections, surprises, challenges

Andrew G Barto. Reinforcement learning: Connections, surprises, challenges. AI Magazine, 40(1), 2019

work page 2019

[13] [13]

Translating structured English to robot controllers

Hadas Kress-Gazit, Georgios E Fainekos, and George J Pappas. Translating structured English to robot controllers. Advanced Robotics, 22(12):1343–1359, 2008

work page 2008

[14] [14]

Revising motion planning under linear temporal logic speciﬁcations in partially known workspaces

Meng Guo, Karl H Johansson, and Dimos V Dimarogonas. Revising motion planning under linear temporal logic speciﬁcations in partially known workspaces. In 2013 IEEE International Conference on Robotics and Automation , pages 5025–5032. IEEE, 2013

work page 2013

[15] [15]

Receding horizon temporal logic planning

Tichakorn Wongpiromsarn, Ufuk Topcu, and Richard M Murray. Receding horizon temporal logic planning. IEEE Transactions on Automatic Control, 57(11):2817–2830, 2012

work page 2012

[16] [16]

Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning

Alper Kamil Bozkurt, Yu Wang, Michael M Zavlanos, and Miroslav Pajic. Control synthesis from linear temporal logic speciﬁcations using model-free reinforcement learning, 2019. arXiv:1909.07299 [cs.RO]

work page internal anchor Pith review Pith/arXiv arXiv 2019

[17] [17]

Reinforcement learning for temporal logic control synthesis with probabilistic satis- faction guarantees, 2019

Mohammadhosein Hasanbeig, Yiannis Kantaros, Alessandro Abate, Daniel Kroening, George J Pappas, and Insup Lee. Reinforcement learning for temporal logic control synthesis with probabilistic satis- faction guarantees, 2019. arXiv:1909.05304 [cs.LO]

work page arXiv 2019

[18] [18]

Omega-regular objectives in model-free reinforcement learning

Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. Omega-regular objectives in model-free reinforcement learning. In Tom ´aˇs V ojnar and Lijun Zhang, editors, Tools and Algorithms for the Construction and Anal- ysis of Systems , pages 395–412, Cham, 2019. Springer International Publishing

work page 2019

[19] [19]

Solving games without determinization

Thomas A Henzinger and Nir Piterman. Solving games without determinization. In International Workshop on Computer Science Logic, pages 395–410. Springer, 2006

work page 2006

[20] [20]

Probably approximately correct learning in stochastic games with temporal logic speciﬁcations

Min Wen and Ufuk Topcu. Probably approximately correct learning in stochastic games with temporal logic speciﬁcations. In Proceed- ings of the Twenty-Fifth International Joint Conference on Artiﬁcial Intelligence, IJCAI’16, page 3630–3636. AAAI Press, 2016

work page 2016

[21] [21]

PAC sta- tistical model checking for Markov decision processes and stochastic games

Pranav Ashok, Jan K ˇret´ınsk`y, and Maximilian Weininger. PAC sta- tistical model checking for Markov decision processes and stochastic games. In International Conference on Computer Aided Veriﬁcation , pages 497–519. Springer, 2019

work page 2019

[22] [22]

Principles of Model Checking

Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT Press, Cambridge, MA, USA, 2008

work page 2008

[23] [23]

Stochastic games

Lloyd S Shapley. Stochastic games. Proceedings of the national academy of sciences , 39(10):1095–1100, 1953

work page 1953

[24] [24]

Markov games as a framework for multi-agent reinforcement learning

Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In Machine learning proceedings 1994, pages 157–163. Elsevier, 1994

work page 1994

[25] [25]

Nash q-learning for general-sum stochastic games

Junling Hu and Michael P Wellman. Nash q-learning for general-sum stochastic games. Journal of machine learning research, 4(Nov):1039– 1069, 2003

work page 2003

[26] [26]

An analysis of stochastic game theory for multiagent reinforcement learning

Michael Bowling and Manuela Veloso. An analysis of stochastic game theory for multiagent reinforcement learning. Technical report, Carnegie-Mellon Univ Pittsburgh Pa School of Computer Science, 2000

work page 2000

[27] [27]

Henzinger

Krishnendu Chatterjee and Thomas A. Henzinger. A survey of stochastic ω-regular games. Journal of Computer and System Sciences, 78(2):394 – 413, 2012. Games in Veriﬁcation

work page 2012

[28] [28]

A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic. CSRL, 2020. https://gitlab.oit.duke.edu/cpsl/csrl

work page 2020

[29] [29]

Rabinizer 4: from LTL to your favourite deterministic automaton

Jan K ˇret´ınsk`y, Tobias Meggendorfer, Salomon Sickert, and Christo- pher Ziegler. Rabinizer 4: from LTL to your favourite deterministic automaton. In International Conference on Computer Aided Veriﬁca- tion, pages 567–577. Springer, 2018

work page 2018