Learning Optimal Strategies for Temporal Tasks in Stochastic Games

Alper Kamil Bozkurt; Michael M. Zavlanos; Miroslav Pajic; Yu Wang

arxiv: 2102.04307 · v3 · submitted 2021-02-08 · 💻 cs.AI · cs.LG· cs.LO· cs.RO

Learning Optimal Strategies for Temporal Tasks in Stochastic Games

Alper Kamil Bozkurt , Yu Wang , Michael M. Zavlanos , Miroslav Pajic This is my paper

Pith reviewed 2026-05-24 13:08 UTC · model grok-4.3

classification 💻 cs.AI cs.LGcs.LOcs.RO

keywords reinforcement learninglinear temporal logicstochastic gamesmodel-free learningcontroller synthesisparity automatontemporal specificationsadversarial environments

0 comments

The pith

Deriving rewards and discount factors from a parity automaton reduces worst-case LTL satisfaction in stochastic games to a standard discounted reward problem.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a way to learn optimal controller strategies for linear temporal logic specifications in completely unknown stochastic environments that may contain adversaries. It builds a product game from the specification's deterministic parity automaton and assigns rewards and discounts directly from the automaton's acceptance condition. This conversion lets model-free reinforcement learning algorithms solve for strategies that maximize the probability of satisfying the specification against the worst possible environment behavior. Additional techniques handle large acceptance sets by generating colors lazily or focusing on one color at a time. The result is a scalable method demonstrated on multiple case studies that outperforms prior approaches for this class of problems.

Core claim

By constructing the product stochastic game with the deterministic parity automaton translated from the LTL formula and deriving distinct rewards and discount factors from the DPA acceptance condition, the maximization of the worst-case probability of satisfying the LTL specification is reduced to the maximization of a discounted reward objective; this reduction enables direct application of model-free RL algorithms to learn an optimal controller strategy, with lazy color generation and single-color approximation provided for scalability when the number of colors is large.

What carries the argument

The product stochastic game formed by the controller, adversarial environment, and deterministic parity automaton, with rewards and discount factors assigned from the automaton's acceptance condition (colors).

If this is right

Model-free RL algorithms become applicable to LTL controller synthesis in unknown stochastic games.
Lazy color generation and single-color approximation extend the method to LTL formulas with large acceptance conditions.
Learned strategies maximize satisfaction probability against the worst-case environment.
The approach applies to a wide range of LTL formulas without requiring an explicit environment model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The reduction could support online policy adaptation when the environment changes slowly over time.
Similar reward encodings might apply to other omega-regular specifications beyond parity conditions.
Combining the method with function approximation could address very large or continuous state spaces.

Load-bearing premise

The specific rewards and discount factors derived from the DPA acceptance condition correctly encode the worst-case probability of LTL satisfaction as a discounted reward objective.

What would settle it

Running model-free RL on the product game and then measuring the empirical satisfaction probability of the resulting strategy in the original game against an adversarial environment; a systematic mismatch between the achieved probability and the computed discounted value would falsify the reduction.

Figures

Figures reproduced from arXiv: 2102.04307 by Alper Kamil Bozkurt, Michael M. Zavlanos, Miroslav Pajic, Yu Wang.

**Figure 2.** Figure 2: The robot navigation environments and the obtained learning curves. The shaded regions are the quarter of the standard [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

**Figure 3.** Figure 3: The DPA constructed from ϕ1. The figure is generated by using Spot [39]. The circles represent the DPA states; the Boolean formulas and the encircled numbers on the edges represent the labels and the colors of transitions respectively [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗

**Figure 4.** Figure 4: A robot trying to repeatedly reach the green and [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: Learning curve showing the estimated satisfaction [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

read the original abstract

Synthesis from linear temporal logic (LTL) specifications provides assured controllers for systems operating in stochastic and potentially adversarial environments. Automatic synthesis tools, however, require a model of the environment to construct controllers. In this work, we introduce a model-free reinforcement learning (RL) approach to derive controllers from given LTL specifications even when the environment is completely unknown. We model the problem as a stochastic game (SG) between the controller and the adversarial environment; we then learn optimal control strategies that maximize the probability of satisfying the LTL specifications against the worst-case environment behavior. We first construct a product game using the deterministic parity automaton (DPA) translated from the given LTL specification. By deriving distinct rewards and discount factors from the acceptance condition of the DPA, we reduce the maximization of the worst-case probability of satisfying the LTL specification into the maximization of a discounted reward objective in the product game; this enables the use of model-free RL algorithms to learn an optimal controller strategy. To deal with the common scalability problems when the number of sets defining the acceptance condition of the DPA (usually referred as colors), is large, we propose a lazy color generation method where distinct rewards and discount factors are utilized only when needed, and an approximate method where the controller eventually focuses on only one color. In several case studies, we show that our approach is scalable to a wide range of LTL formulas, significantly outperforming existing methods for learning controllers from LTL specifications in SGs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper reduces worst-case LTL satisfaction in unknown stochastic games to discounted RL via an explicit DPA product construction with per-color rewards and discounts, and the equivalence holds.

read the letter

The main takeaway is that this work gives a concrete model-free route from LTL specs to controllers in stochastic games. They translate the spec to a DPA, form the product game, and assign distinct rewards plus discount factors drawn directly from the acceptance condition so that the discounted value equals the max-min satisfaction probability. They prove the equivalence on the product SG, which lets standard RL algorithms apply without a model of the environment or the adversary. Two engineering additions address the usual blow-up from many colors: lazy generation that only materializes the reward structure when a color appears, and an approximation that eventually collapses attention to one color. Case studies show the combination runs on a useful range of formulas and improves on earlier LTL-to-RL baselines in SG settings. The reduction itself is the clearest advance; the lazy and approximate tricks are practical but secondary. The proof of equivalence is the part that matters most, and the stress-test note is right that no hidden bounded-color or non-adversarial assumption is required. The soft spot is the approximation step, which relaxes the objective and therefore the guarantee; the paper reports good empirical behavior but does not appear to give tight bounds on the probability loss against strong adversaries. The lazy method is mostly an implementation convenience and does not affect the core claim. No circularity or invented entities show up. This is aimed at people who need temporal-logic guarantees inside RL loops for robotics or autonomous systems. A reader already working on safe RL or formal synthesis will find usable algorithms and enough grounding to build on. It is worth sending to peer review because the central mapping is explicit, the proof is supplied, and the scalability methods are new enough to test.

Referee Report

0 major / 3 minor

Summary. The paper claims that LTL controller synthesis in unknown stochastic games (SGs) can be reduced to model-free RL by translating the LTL formula to a deterministic parity automaton (DPA), forming the product SG, and assigning rewards and per-color discount factors derived from the DPA acceptance condition so that the discounted value equals the worst-case satisfaction probability; this enables standard RL algorithms. Scalability for DPAs with many colors is addressed via a lazy color generation method and an approximation that focuses on one color. Case studies demonstrate applicability to a range of LTL formulas and outperformance of prior methods.

Significance. If the central reduction holds, the work provides a practical route to assured, model-free learning of strategies for temporal specifications against adversarial environments, bridging formal methods and RL. Credit is due for the explicit reward/discount mapping from the DPA acceptance condition, the proof of equivalence on the product SG, and the absence of free parameters or circular reductions. The lazy and approximate color-handling techniques address a known bottleneck without altering the core claim.

minor comments (3)

[§3] §3 (product construction and reward assignment): the statement that the derived discounts are 'distinct' per color should include an explicit statement of the range of possible discount values and confirmation that they remain strictly less than 1 for all finite color sets.
[§4.2] §4.2 (approximate method): the claim that the controller 'eventually focuses on only one color' lacks a precise termination or switching criterion; without it, the approximation guarantee relative to the exact worst-case probability is difficult to assess.
[Case studies] Table 1 and case-study section: the reported runtimes and success rates should be accompanied by the number of independent trials and standard deviations to allow statistical comparison with baselines.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our contribution and the recommendation of minor revision. No specific major comments appear in the provided report, so we have no individual points requiring rebuttal or clarification at this stage.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central reduction—constructing a product SG from an LTL-derived DPA and assigning rewards plus per-color discount factors so that the discounted value equals the worst-case satisfaction probability—is presented as an explicit, self-contained mapping whose equivalence is claimed to be proven directly in the product game. This relies on the standard external LTL-to-DPA translation and existing RL algorithms rather than any fitted parameters, self-definitional loops, or load-bearing self-citations. No step reduces by construction to its own inputs or renames a prior result of the same authors as a new derivation; the approach is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard LTL-to-DPA translation and the validity of the reward/discount reduction from the parity acceptance condition; no free parameters or invented entities are introduced in the abstract.

axioms (2)

standard math Linear temporal logic specifications can be translated to deterministic parity automata
Standard result in formal methods and automata theory.
domain assumption Rewards and discount factors derived from the DPA acceptance condition reduce worst-case LTL satisfaction probability to discounted reward maximization
This is the key technical reduction claimed but not derived in the abstract.

pith-pipeline@v0.9.0 · 5810 in / 1252 out tokens · 26295 ms · 2026-05-24T13:08:31.633720+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Model-Free Learning of Safe yet Effective Controllers
cs.RO 2021-03 unverdicted novelty 5.0

A model-free RL algorithm learns policies by sequentially maximizing safety probability, then LTL satisfaction probability, then discounted QoC rewards in unknown MDPs.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Model-free reinforcement learning for stochastic games with linear temporal logic objectives,

A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic, “Model-free reinforcement learning for stochastic games with linear temporal logic objectives,” in Proceedings of the 36th International Conference on Robotics and Automation (ICRA) , 2021, pp. 10 649–10 655

work page 2021
[2]

Baier and J.-P

C. Baier and J.-P. Katoen, Principles of Model Checking . Cambridge, MA, USA: MIT Press, 2008

work page 2008
[3]

Temporal logic planning and control of robotic swarms by hierarchical abstractions,

M. Kloetzer and C. Belta, “Temporal logic planning and control of robotic swarms by hierarchical abstractions,” IEEE Transactions on Robotics, vol. 23, no. 2, pp. 320–330, 2007

work page 2007
[4]

Temporal-logic-based reactive mission and motion planning,

H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas, “Temporal-logic-based reactive mission and motion planning,” IEEE Transactions on Robotics, vol. 25, no. 6, pp. 1370–1381, 2009

work page 2009
[5]

Automatic deployment of distributed teams of robots from temporal logic motion speciﬁcations,

M. Kloetzer and C. Belta, “Automatic deployment of distributed teams of robots from temporal logic motion speciﬁcations,”IEEE Transactions on Robotics, vol. 26, no. 1, pp. 48–61, 2010

work page 2010
[6]

Formal approach to the deployment of distributed robotic teams,

Y . Chen, X. C. Ding, A. Stefanescu, and C. Belta, “Formal approach to the deployment of distributed robotic teams,” IEEE Transactions on Robotics, vol. 28, no. 1, pp. 158–171, 2012

work page 2012
[7]

Temporal logic motion planning and control with probabilistic satisfaction guarantees,

M. Lahijanian, S. B. Andersson, and C. Belta, “Temporal logic motion planning and control with probabilistic satisfaction guarantees,” IEEE Transactions on Robotics , vol. 28, no. 2, pp. 396–409, 2012

work page 2012
[8]

Optimal control of markov decision processes with linear temporal logic constraints,

X. Ding, S. L. Smith, C. Belta, and D. Rus, “Optimal control of markov decision processes with linear temporal logic constraints,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1244–1257, 2014

work page 2014
[9]

Temporal logic motion planning for dynamic robots,

G. E. Fainekos, A. Girard, H. Kress-Gazit, and G. J. Pappas, “Temporal logic motion planning for dynamic robots,” Automatica, vol. 45, no. 2, pp. 343–352, 2009

work page 2009
[10]

Failure diagnosis of discrete-event systems with linear-time temporal logic speciﬁcations,

S. Jiang and R. Kumar, “Failure diagnosis of discrete-event systems with linear-time temporal logic speciﬁcations,” IEEE Transactions on Automatic Control, vol. 49, no. 6, pp. 934–945, 2004

work page 2004
[11]

Linear time logic control of discrete- time linear systems,

P. Tabuada and G. J. Pappas, “Linear time logic control of discrete- time linear systems,” IEEE Transactions on Automatic Control , vol. 51, no. 12, pp. 1862–1877, 2006

work page 2006
[12]

A fully automated framework for control of linear systems from temporal logic speciﬁcations,

M. Kloetzer and C. Belta, “A fully automated framework for control of linear systems from temporal logic speciﬁcations,” IEEE Transactions on Automatic Control , vol. 53, no. 1, pp. 287–297, 2008

work page 2008
[13]

Receding horizon temporal logic planning,

T. Wongpiromsarn, U. Topcu, and R. M. Murray, “Receding horizon temporal logic planning,” IEEE Transactions on Automatic Control , vol. 57, no. 11, pp. 2817–2830, 2012

work page 2012
[14]

Temporal logic control of discrete-time piecewise afﬁne systems,

B. Yordanov, J. Tumova, I. Cerna, J. Barnat, and C. Belta, “Temporal logic control of discrete-time piecewise afﬁne systems,” IEEE Transac- tions on Automatic Control , vol. 57, no. 6, pp. 1491–1504, 2012

work page 2012
[15]

Synthesis of reactive switching protocols from temporal logic speciﬁcations,

J. Liu, N. Ozay, U. Topcu, and R. M. Murray, “Synthesis of reactive switching protocols from temporal logic speciﬁcations,” IEEE Transac- tions on Automatic Control , vol. 58, no. 7, pp. 1771–1785, 2013

work page 2013
[16]

Symbolic control of stochastic systems via approximately bisimilar ﬁnite abstractions,

M. Zamani, P. Mohajerin Esfahani, R. Majumdar, A. Abate, and J. Lygeros, “Symbolic control of stochastic systems via approximately bisimilar ﬁnite abstractions,” IEEE Transactions on Automatic Control , vol. 59, no. 12, pp. 3135–3150, 2014

work page 2014
[17]

Probably approximately correct MDP learning and control with temporal logic constraints,

J. Fu and U. Topcu, “Probably approximately correct MDP learning and control with temporal logic constraints,” in Robotics: Science and Systems Conference, 2014

work page 2014
[18]

Veriﬁcation of Markov decision processes using learning algorithms,

T. Br ´azdil, K. Chatterjee, M. Chmelik, V . Forejt, J. K ˇret´ınsk`y, M. Kwiatkowska, D. Parker, and M. Ujma, “Veriﬁcation of Markov decision processes using learning algorithms,” in Proceedings of the 12th International Symposium on Automated Technology for Veriﬁcation and Analysis, 2014, pp. 98–114

work page 2014
[19]

Deep imitative reinforcement learning for temporal logic robot motion planning with noisy semantic observations,

Q. Gao, M. Pajic, and M. M. Zavlanos, “Deep imitative reinforcement learning for temporal logic robot motion planning with noisy semantic observations,” in Proceedings of the 35th International Conference on Robotics and Automation (ICRA) , 2020, pp. 8490–8496

work page 2020
[20]

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction . MIT press, 2018

work page 2018
[21]

Omega-regular objectives in model-free reinforcement learning,

E. M. Hahn, M. Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wojtczak, “Omega-regular objectives in model-free reinforcement learning,” in Proceedings of the 25th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS) , 2019, pp. 395–412. 14

work page 2019
[22]

Control synthesis from linear temporal logic speciﬁcations using model-free rein- forcement learning,

A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic, “Control synthesis from linear temporal logic speciﬁcations using model-free rein- forcement learning,” in Proceedings of the 35th International Conference on Robotics and Automation (ICRA) , 2020, pp. 10 349–10 355

work page 2020
[23]

Model-free learning of safe yet effective controllers,

A. K. Bozkurt, Y . Wang, and M. Pajic, “Model-free learning of safe yet effective controllers,” inProceedings of the 60th Conference on Decision and Control (CDC) , 2021

work page 2021
[24]

Lazy probabilistic model checking without determinisation,

E. M. Hahn, G. Li, S. Schewe, A. Turrini, and L. Zhang, “Lazy probabilistic model checking without determinisation,” in Proceedings of the 26th International Conference on Concurrency Theory (CONCUR) , 2015, p. 354

work page 2015
[25]

Secure estimation and control for cyber-physical systems under adversarial attacks,

H. Fawzi, P. Tabuada, and S. Diggavi, “Secure estimation and control for cyber-physical systems under adversarial attacks,” IEEE Transactions on Automatic Control, vol. 59, no. 6, pp. 1454–1467, 2014

work page 2014
[26]

A survey of stochastic ω-regular games,

K. Chatterjee and T. A. Henzinger, “A survey of stochastic ω-regular games,” Journal of Computer and System Sciences , vol. 78, no. 2, pp. 394–413, 2012

work page 2012
[27]

Filar and K

J. Filar and K. Vrieze, Competitive Markov Decision Processes . Springer, 1997

work page 1997
[28]

Neyman and S

A. Neyman and S. Sorin, Stochastic Games and Applications . Dor- drecht, The Netherlands: Kluwer Academic Publishers, 2003, vol. 570

work page 2003
[29]

Security-aware synthesis using delayed-action games,

M. Elfar, Y . Wang, and M. Pajic, “Security-aware synthesis using delayed-action games,” in Computer Aided Veriﬁcation (CAV). Springer International Publishing, 2019, pp. 180–199

work page 2019
[30]

Security-aware synthesis of human-uav protocols,

M. Elfar, H. Zhu, M. L. Cummings, and M. Pajic, “Security-aware synthesis of human-uav protocols,” in 2019 International Conference on Robotics and Automation (ICRA) , May 2019, pp. 8011–8017

work page 2019
[31]

Secure planning against stealthy attacks via model-free reinforcement learning,

A. K. Bozkurt, Y . Wang, and M. Pajic, “Secure planning against stealthy attacks via model-free reinforcement learning,” in Proceedings of the 36th International Conference on Robotics and Automation (ICRA) , 2021, pp. 10 656–10 662

work page 2021
[32]

Probably approximately correct learning in adversarial environments with temporal logic speciﬁcations,

M. Wen and U. Topcu, “Probably approximately correct learning in adversarial environments with temporal logic speciﬁcations,” IEEE Transactions on Automatic Control , pp. 1–1, 2021

work page 2021
[33]

Pac statistical model check- ing for markov decision processes and stochastic games,

P. Ashok, J. K ˇret´ınsk`y, and M. Weininger, “Pac statistical model check- ing for markov decision processes and stochastic games,” in Proceedings of the 31st International Conference on Computer Aided Veriﬁcation , 2019, pp. 497–519

work page 2019
[34]

Model-free reinforcement learning for stochastic parity games,

E. M. Hahn, M. Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wo- jtczak, “Model-free reinforcement learning for stochastic parity games,” in Proceedings of the 31st International Conference on Concurrency Theory (CONCUR), vol. 171, 2020, p. 21

work page 2020
[35]

From LTL and limit-deterministic B ¨uchi automata to deterministic parity automata,

J. Esparza, J. K ˇret´ınsk`y, J.-F. Raskin, and S. Sickert, “From LTL and limit-deterministic B ¨uchi automata to deterministic parity automata,” in Proceedings of the 23rd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2017, pp. 426–442

work page 2017
[36]

Markov games as a framework for multi-agent reinforce- ment learning,

M. L. Littman, “Markov games as a framework for multi-agent reinforce- ment learning,” in Machine learning proceedings , 1994, pp. 157–163

work page 1994
[37]

Owl: A library for ω- words, automata, and LTL,

J. Kret ´ınsk´y, T. Meggendorfer, and S. Sickert, “Owl: A library for ω- words, automata, and LTL,” in Proceedings of the 16th International Symposium on Automated Technology for Veriﬁcation and Analysis (ATVA), ser. LNCS, vol. 11138, 2018, pp. 543–550

work page 2018
[38]

PRISM 4.0: Veriﬁcation of probabilistic real-time systems,

M. Kwiatkowska, G. Norman, and D. Parker, “PRISM 4.0: Veriﬁcation of probabilistic real-time systems,” in Proceedings of the 23rd Inter- national Conference on Computer Aided Veriﬁcation (CAV) , 2011, pp. 585–591

work page 2011
[39]

Spot 2.0 — a framework for LTL and ω-automata manipulation,

A. Duret-Lutz, A. Lewkowicz, A. Fauchille, T. Michaud, E. Renault, and L. Xu, “Spot 2.0 — a framework for LTL and ω-automata manipulation,” in Proceedings of the 14th International Symposium on Automated Technology for Veriﬁcation and Analysis (ATVA’16), ser. Lecture Notes in Computer Science, vol. 9938. Springer, Oct. 2016, pp. 122–129

work page 2016
[40]

Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

M. Plappert, M. Andrychowicz, A. Ray, B. McGrew, B. Baker, G. Powell, J. Schneider, J. Tobin, M. Chociej, P. Welinder, V . Kumar, and W. Zaremba, “Multi-goal reinforcement learning: Challenging robotics environments and request for research,” 2018. [Online]. Available: https://arxiv.org/abs/1802.09464

work page internal anchor Pith review Pith/arXiv arXiv 2018
[41]

Openai gym,

G. Brockman, V . Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “Openai gym,” 2016

work page 2016
[42]

Mujoco: A physics engine for model-based control,

E. Todorov, T. Erez, and Y . Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems , 2012, pp. 5026–5033

work page 2012
[43]

Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor,

T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80. PMLR, 10–15 Jul 2018, pp. 1861–1870. [Online]. Avai...

work page 2018

[1] [1]

Model-free reinforcement learning for stochastic games with linear temporal logic objectives,

A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic, “Model-free reinforcement learning for stochastic games with linear temporal logic objectives,” in Proceedings of the 36th International Conference on Robotics and Automation (ICRA) , 2021, pp. 10 649–10 655

work page 2021

[2] [2]

Baier and J.-P

C. Baier and J.-P. Katoen, Principles of Model Checking . Cambridge, MA, USA: MIT Press, 2008

work page 2008

[3] [3]

Temporal logic planning and control of robotic swarms by hierarchical abstractions,

M. Kloetzer and C. Belta, “Temporal logic planning and control of robotic swarms by hierarchical abstractions,” IEEE Transactions on Robotics, vol. 23, no. 2, pp. 320–330, 2007

work page 2007

[4] [4]

Temporal-logic-based reactive mission and motion planning,

H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas, “Temporal-logic-based reactive mission and motion planning,” IEEE Transactions on Robotics, vol. 25, no. 6, pp. 1370–1381, 2009

work page 2009

[5] [5]

Automatic deployment of distributed teams of robots from temporal logic motion speciﬁcations,

M. Kloetzer and C. Belta, “Automatic deployment of distributed teams of robots from temporal logic motion speciﬁcations,”IEEE Transactions on Robotics, vol. 26, no. 1, pp. 48–61, 2010

work page 2010

[6] [6]

Formal approach to the deployment of distributed robotic teams,

Y . Chen, X. C. Ding, A. Stefanescu, and C. Belta, “Formal approach to the deployment of distributed robotic teams,” IEEE Transactions on Robotics, vol. 28, no. 1, pp. 158–171, 2012

work page 2012

[7] [7]

Temporal logic motion planning and control with probabilistic satisfaction guarantees,

M. Lahijanian, S. B. Andersson, and C. Belta, “Temporal logic motion planning and control with probabilistic satisfaction guarantees,” IEEE Transactions on Robotics , vol. 28, no. 2, pp. 396–409, 2012

work page 2012

[8] [8]

Optimal control of markov decision processes with linear temporal logic constraints,

X. Ding, S. L. Smith, C. Belta, and D. Rus, “Optimal control of markov decision processes with linear temporal logic constraints,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1244–1257, 2014

work page 2014

[9] [9]

Temporal logic motion planning for dynamic robots,

G. E. Fainekos, A. Girard, H. Kress-Gazit, and G. J. Pappas, “Temporal logic motion planning for dynamic robots,” Automatica, vol. 45, no. 2, pp. 343–352, 2009

work page 2009

[10] [10]

Failure diagnosis of discrete-event systems with linear-time temporal logic speciﬁcations,

S. Jiang and R. Kumar, “Failure diagnosis of discrete-event systems with linear-time temporal logic speciﬁcations,” IEEE Transactions on Automatic Control, vol. 49, no. 6, pp. 934–945, 2004

work page 2004

[11] [11]

Linear time logic control of discrete- time linear systems,

P. Tabuada and G. J. Pappas, “Linear time logic control of discrete- time linear systems,” IEEE Transactions on Automatic Control , vol. 51, no. 12, pp. 1862–1877, 2006

work page 2006

[12] [12]

A fully automated framework for control of linear systems from temporal logic speciﬁcations,

M. Kloetzer and C. Belta, “A fully automated framework for control of linear systems from temporal logic speciﬁcations,” IEEE Transactions on Automatic Control , vol. 53, no. 1, pp. 287–297, 2008

work page 2008

[13] [13]

Receding horizon temporal logic planning,

T. Wongpiromsarn, U. Topcu, and R. M. Murray, “Receding horizon temporal logic planning,” IEEE Transactions on Automatic Control , vol. 57, no. 11, pp. 2817–2830, 2012

work page 2012

[14] [14]

Temporal logic control of discrete-time piecewise afﬁne systems,

B. Yordanov, J. Tumova, I. Cerna, J. Barnat, and C. Belta, “Temporal logic control of discrete-time piecewise afﬁne systems,” IEEE Transac- tions on Automatic Control , vol. 57, no. 6, pp. 1491–1504, 2012

work page 2012

[15] [15]

Synthesis of reactive switching protocols from temporal logic speciﬁcations,

J. Liu, N. Ozay, U. Topcu, and R. M. Murray, “Synthesis of reactive switching protocols from temporal logic speciﬁcations,” IEEE Transac- tions on Automatic Control , vol. 58, no. 7, pp. 1771–1785, 2013

work page 2013

[16] [16]

Symbolic control of stochastic systems via approximately bisimilar ﬁnite abstractions,

M. Zamani, P. Mohajerin Esfahani, R. Majumdar, A. Abate, and J. Lygeros, “Symbolic control of stochastic systems via approximately bisimilar ﬁnite abstractions,” IEEE Transactions on Automatic Control , vol. 59, no. 12, pp. 3135–3150, 2014

work page 2014

[17] [17]

Probably approximately correct MDP learning and control with temporal logic constraints,

J. Fu and U. Topcu, “Probably approximately correct MDP learning and control with temporal logic constraints,” in Robotics: Science and Systems Conference, 2014

work page 2014

[18] [18]

Veriﬁcation of Markov decision processes using learning algorithms,

T. Br ´azdil, K. Chatterjee, M. Chmelik, V . Forejt, J. K ˇret´ınsk`y, M. Kwiatkowska, D. Parker, and M. Ujma, “Veriﬁcation of Markov decision processes using learning algorithms,” in Proceedings of the 12th International Symposium on Automated Technology for Veriﬁcation and Analysis, 2014, pp. 98–114

work page 2014

[19] [19]

Deep imitative reinforcement learning for temporal logic robot motion planning with noisy semantic observations,

Q. Gao, M. Pajic, and M. M. Zavlanos, “Deep imitative reinforcement learning for temporal logic robot motion planning with noisy semantic observations,” in Proceedings of the 35th International Conference on Robotics and Automation (ICRA) , 2020, pp. 8490–8496

work page 2020

[20] [20]

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction . MIT press, 2018

work page 2018

[21] [21]

Omega-regular objectives in model-free reinforcement learning,

E. M. Hahn, M. Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wojtczak, “Omega-regular objectives in model-free reinforcement learning,” in Proceedings of the 25th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS) , 2019, pp. 395–412. 14

work page 2019

[22] [22]

Control synthesis from linear temporal logic speciﬁcations using model-free rein- forcement learning,

A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic, “Control synthesis from linear temporal logic speciﬁcations using model-free rein- forcement learning,” in Proceedings of the 35th International Conference on Robotics and Automation (ICRA) , 2020, pp. 10 349–10 355

work page 2020

[23] [23]

Model-free learning of safe yet effective controllers,

A. K. Bozkurt, Y . Wang, and M. Pajic, “Model-free learning of safe yet effective controllers,” inProceedings of the 60th Conference on Decision and Control (CDC) , 2021

work page 2021

[24] [24]

Lazy probabilistic model checking without determinisation,

E. M. Hahn, G. Li, S. Schewe, A. Turrini, and L. Zhang, “Lazy probabilistic model checking without determinisation,” in Proceedings of the 26th International Conference on Concurrency Theory (CONCUR) , 2015, p. 354

work page 2015

[25] [25]

Secure estimation and control for cyber-physical systems under adversarial attacks,

H. Fawzi, P. Tabuada, and S. Diggavi, “Secure estimation and control for cyber-physical systems under adversarial attacks,” IEEE Transactions on Automatic Control, vol. 59, no. 6, pp. 1454–1467, 2014

work page 2014

[26] [26]

A survey of stochastic ω-regular games,

K. Chatterjee and T. A. Henzinger, “A survey of stochastic ω-regular games,” Journal of Computer and System Sciences , vol. 78, no. 2, pp. 394–413, 2012

work page 2012

[27] [27]

Filar and K

J. Filar and K. Vrieze, Competitive Markov Decision Processes . Springer, 1997

work page 1997

[28] [28]

Neyman and S

A. Neyman and S. Sorin, Stochastic Games and Applications . Dor- drecht, The Netherlands: Kluwer Academic Publishers, 2003, vol. 570

work page 2003

[29] [29]

Security-aware synthesis using delayed-action games,

M. Elfar, Y . Wang, and M. Pajic, “Security-aware synthesis using delayed-action games,” in Computer Aided Veriﬁcation (CAV). Springer International Publishing, 2019, pp. 180–199

work page 2019

[30] [30]

Security-aware synthesis of human-uav protocols,

M. Elfar, H. Zhu, M. L. Cummings, and M. Pajic, “Security-aware synthesis of human-uav protocols,” in 2019 International Conference on Robotics and Automation (ICRA) , May 2019, pp. 8011–8017

work page 2019

[31] [31]

Secure planning against stealthy attacks via model-free reinforcement learning,

A. K. Bozkurt, Y . Wang, and M. Pajic, “Secure planning against stealthy attacks via model-free reinforcement learning,” in Proceedings of the 36th International Conference on Robotics and Automation (ICRA) , 2021, pp. 10 656–10 662

work page 2021

[32] [32]

Probably approximately correct learning in adversarial environments with temporal logic speciﬁcations,

M. Wen and U. Topcu, “Probably approximately correct learning in adversarial environments with temporal logic speciﬁcations,” IEEE Transactions on Automatic Control , pp. 1–1, 2021

work page 2021

[33] [33]

Pac statistical model check- ing for markov decision processes and stochastic games,

P. Ashok, J. K ˇret´ınsk`y, and M. Weininger, “Pac statistical model check- ing for markov decision processes and stochastic games,” in Proceedings of the 31st International Conference on Computer Aided Veriﬁcation , 2019, pp. 497–519

work page 2019

[34] [34]

Model-free reinforcement learning for stochastic parity games,

E. M. Hahn, M. Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wo- jtczak, “Model-free reinforcement learning for stochastic parity games,” in Proceedings of the 31st International Conference on Concurrency Theory (CONCUR), vol. 171, 2020, p. 21

work page 2020

[35] [35]

From LTL and limit-deterministic B ¨uchi automata to deterministic parity automata,

J. Esparza, J. K ˇret´ınsk`y, J.-F. Raskin, and S. Sickert, “From LTL and limit-deterministic B ¨uchi automata to deterministic parity automata,” in Proceedings of the 23rd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2017, pp. 426–442

work page 2017

[36] [36]

Markov games as a framework for multi-agent reinforce- ment learning,

M. L. Littman, “Markov games as a framework for multi-agent reinforce- ment learning,” in Machine learning proceedings , 1994, pp. 157–163

work page 1994

[37] [37]

Owl: A library for ω- words, automata, and LTL,

J. Kret ´ınsk´y, T. Meggendorfer, and S. Sickert, “Owl: A library for ω- words, automata, and LTL,” in Proceedings of the 16th International Symposium on Automated Technology for Veriﬁcation and Analysis (ATVA), ser. LNCS, vol. 11138, 2018, pp. 543–550

work page 2018

[38] [38]

PRISM 4.0: Veriﬁcation of probabilistic real-time systems,

M. Kwiatkowska, G. Norman, and D. Parker, “PRISM 4.0: Veriﬁcation of probabilistic real-time systems,” in Proceedings of the 23rd Inter- national Conference on Computer Aided Veriﬁcation (CAV) , 2011, pp. 585–591

work page 2011

[39] [39]

Spot 2.0 — a framework for LTL and ω-automata manipulation,

A. Duret-Lutz, A. Lewkowicz, A. Fauchille, T. Michaud, E. Renault, and L. Xu, “Spot 2.0 — a framework for LTL and ω-automata manipulation,” in Proceedings of the 14th International Symposium on Automated Technology for Veriﬁcation and Analysis (ATVA’16), ser. Lecture Notes in Computer Science, vol. 9938. Springer, Oct. 2016, pp. 122–129

work page 2016

[40] [40]

Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

M. Plappert, M. Andrychowicz, A. Ray, B. McGrew, B. Baker, G. Powell, J. Schneider, J. Tobin, M. Chociej, P. Welinder, V . Kumar, and W. Zaremba, “Multi-goal reinforcement learning: Challenging robotics environments and request for research,” 2018. [Online]. Available: https://arxiv.org/abs/1802.09464

work page internal anchor Pith review Pith/arXiv arXiv 2018

[41] [41]

Openai gym,

G. Brockman, V . Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “Openai gym,” 2016

work page 2016

[42] [42]

Mujoco: A physics engine for model-based control,

E. Todorov, T. Erez, and Y . Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems , 2012, pp. 5026–5033

work page 2012

[43] [43]

Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor,

T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80. PMLR, 10–15 Jul 2018, pp. 1861–1870. [Online]. Avai...

work page 2018