arxiv: 2603.07094 · v2 · submitted 2026-03-07 · 💻 cs.GT · cs.FL· cs.MA

Recognition: 2 theorem links

· Lean Theorem

Randomise Alone, Reach as a Team

L\'eonard Brice , Thomas A. Henzinger , Alipasha Montaseri , Ali Shafiee , K. S. Thejaswini

Authors on Pith no claims yet

Pith reviewed 2026-05-15 15:46 UTC · model grok-4.3

classification 💻 cs.GT cs.FLcs.MA

keywords concurrent graph gamesdistributed randomizationmemoryless strategiesthreshold problemNP-hardnessalmost-sure reachabilityIRATL

0 comments

The pith

Memoryless strategies suffice for teams to exceed winning probability thresholds in games with private randomness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies concurrent graph games in which several players cooperate against an opponent to reach a set of target states while each must randomize using a private source that remains hidden from the opponent and from teammates. It proves that for deciding whether the team can force a win probability strictly above any given threshold, it suffices for every player to choose actions based only on the current position. This sufficiency places the decision problem in the existential theory of the reals and permits value-iteration algorithms that compute the relevant probabilities. The authors additionally establish that the threshold problem is NP-hard while the special case of almost-sure reachability is NP-complete. They introduce Individually Randomised Alternating-time Temporal Logic to express such probabilistic team goals under distributed randomization.

Core claim

Memoryless strategies are sufficient for the threshold problem in concurrent graph games with distributed randomisation. The result embeds the decision problem in the existential theory of the reals and supports value-iteration algorithms. The threshold problem is NP-hard; almost-sure reachability is NP-complete. Individually Randomised Alternating-time Temporal Logic supplies the semantics for coalitions that lack a shared randomness source.

What carries the argument

Memoryless strategies under the distributed-randomisation model, which reduce the threshold decision to an existential formula over the reals.

If this is right

Value-iteration algorithms can compute winning probabilities for the threshold problem.
The threshold problem is NP-hard, so polynomial-time algorithms are unlikely unless P equals NP.
Almost-sure reachability admits an NP decision procedure and is therefore complete for that class.
IRATL provides a logical language for specifying probability thresholds when randomness is private to each player.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If players could observe one another's random outcomes, memory might again become necessary and the complexity could rise.
Verification tools for multi-agent systems can therefore prioritise memoryless implementations when randomness sources are known to be private.
The same reduction technique may apply to related objectives such as safety or parity when the underlying graph remains finite.

Load-bearing premise

Each player's random source is completely private and stays hidden from the opponent and from all other team members throughout the game.

What would settle it

A finite graph game in which the optimal probability above a chosen threshold can be achieved only by a strategy that remembers previous random outcomes or positions would falsify the memoryless sufficiency claim.

Figures

Figures reproduced from arXiv: 2603.07094 by Alipasha Montaseri, Ali Shafiee, K. S. Thejaswini, L\'eonard Brice, Thomas A. Henzinger.

**Figure 1.** Figure 1: The two agents choose among actions tL, Ru and the environment chooses among actions tL, Ru . Consider first the setting where R2D2 and C3PO have access to a shared source of randomness, which can’t be observed by the adversary. Such games can be viewed as standard two-player concurrent games, where the team acts as a single meta-player. By playing the joint actions pL, Lq or pR, Rq with probability 1 2 ea… view at source ↗

**Figure 2.** Figure 2: An example game for opponent with no memory. [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗

**Figure 3.** Figure 3: Execution time of the algorithms across three benchmarks. The plots [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: The plot compares the execution time (in seconds, logarithmic scale) [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗

**Figure 5.** Figure 5: Visualizations of the six scenarios used for evaluation. Scenarios 1–4 in [PITH_FULL_IMAGE:figures/full_fig_p043_5.png] view at source ↗

**Figure 6.** Figure 6: Visualizations of the seven scenarios used for the almost-sure experiment. [PITH_FULL_IMAGE:figures/full_fig_p044_6.png] view at source ↗

**Figure 7.** Figure 7: The four benchmark scenarios for the robot game. Robots are shown as [PITH_FULL_IMAGE:figures/full_fig_p045_7.png] view at source ↗

read the original abstract

We study concurrent graph games where n players cooperate against an opponent to reach a set of target states. Unlike traditional settings, we study distributed randomisation: team players do not share a source of randomness, and their private random sources are hidden from the opponent and from each other. We show that memoryless strategies are sufficient for the threshold problem (deciding whether there is a strategy for the team that ensures winning with probability that exceeds a threshold), a result that not only places the problem in the Existential Theory of the Reals (\exists\mathbb{R}) but also enables the construction of value iteration algorithms. We additionally show that the threshold problem is NP-hard. For the almost-sure reachability problem, we prove NP-completeness. We introduce Individually Randomised Alternating-time Temporal Logic (IRATL). This logic extends the standard ATL framework to reason about probability thresholds, with semantics explicitly designed for coalitions that lack a shared source of randomness. On the practical side, we implement and evaluate a solver for the threshold and almost-sure problem based on the algorithms that we develop.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Memoryless strategies suffice for threshold reachability under private per-player randomness in concurrent games, which places the problem in ∃R and supports value iteration plus NP-hardness results.

read the letter

The main thing here is that memoryless strategies work for deciding whether a team can force target reachability above a given probability threshold, even when each player draws from their own hidden random source. This holds in finite turn-based concurrent graphs against an opponent, and it lets them encode the existence of suitable action distributions as a system of polynomial inequalities over the reals. That move puts the threshold problem in ∃R and opens the way for value iteration. They also show NP-hardness for thresholds and NP-completeness for the almost-sure case, plus they define IRATL to reason about these probabilities with individual randomization semantics.

Referee Report

2 major / 1 minor

Summary. The paper studies concurrent graph games in which a coalition of n players cooperates against an opponent to reach a target set, using distributed randomization in which each player has a private random source hidden from the opponent and from the other players. The central claims are that memoryless strategies suffice for the threshold reachability problem (placing it in ∃R and enabling value iteration), that the threshold problem is NP-hard, and that almost-sure reachability is NP-complete. The authors introduce Individually Randomised Alternating-time Temporal Logic (IRATL) whose semantics are defined for coalitions without shared randomness and present an implemented solver for the decision problems.

Significance. If the memoryless-sufficiency result holds, the work supplies a complexity classification (∃R membership together with NP-hardness) and a practical algorithmic route (value iteration) for reachability under private, hidden randomness; this extends the standard shared-randomness model of stochastic games and ATL. The NP-completeness result for almost-sure reachability gives a tight bound, and the implemented solver provides an immediate practical contribution for verification of multi-agent systems with individual randomization.

major comments (2)

[Abstract] Abstract: the claim that memoryless strategies suffice for the threshold problem rests on an encoding of per-state action distributions as existential real variables whose induced reachability probabilities satisfy a system of polynomial inequalities; no explicit system of inequalities, no proof sketch, and no verification that the encoding preserves the private-randomness semantics are supplied, rendering the ∃R membership and value-iteration claims unverifiable from the given text.
[Abstract] Abstract: the NP-hardness reduction for the threshold problem and the NP-completeness proof for almost-sure reachability are asserted without describing the source problem, the gadget construction, or how the reduction respects the finite turn-based concurrent graph and hidden-private-randomness model; these details are load-bearing for the hardness claims.

minor comments (1)

The syntax and semantics of IRATL are introduced but not displayed; a short formal definition or example formula would improve readability for readers outside the ATL community.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the significance of the memoryless-sufficiency result and the complexity classifications. We address each major comment below with clarifications drawn from the full manuscript and indicate where revisions will strengthen the presentation.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that memoryless strategies suffice for the threshold problem rests on an encoding of per-state action distributions as existential real variables whose induced reachability probabilities satisfy a system of polynomial inequalities; no explicit system of inequalities, no proof sketch, and no verification that the encoding preserves the private-randomness semantics are supplied, rendering the ∃R membership and value-iteration claims unverifiable from the given text.

Authors: The explicit system of polynomial inequalities (encoding independent per-player action distributions at each state together with the resulting reachability probabilities) and the proof that this encoding preserves private-randomness semantics appear in Section 3.2 of the full manuscript; the ∃R membership follows immediately from existential quantification over the real variables, and value iteration is justified by the memoryless property established in Theorem 3. We agree the abstract is too terse and will revise it to include a concise sketch of the encoding and its semantic preservation. revision: partial
Referee: [Abstract] Abstract: the NP-hardness reduction for the threshold problem and the NP-completeness proof for almost-sure reachability are asserted without describing the source problem, the gadget construction, or how the reduction respects the finite turn-based concurrent graph and hidden-private-randomness model; these details are load-bearing for the hardness claims.

Authors: The NP-hardness reduction for threshold reachability (from 3-SAT) and the NP-completeness proof for almost-sure reachability (from SAT) are fully detailed in Sections 4.1 and 4.2, respectively, with explicit gadget constructions that enforce the concurrent turn-based structure and independent private randomness. We acknowledge that the abstract omits even a high-level description of the source problems and key gadget properties; we will add a brief overview of both reductions to the revised abstract. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The central claim that memoryless strategies suffice for the threshold problem in finite turn-based concurrent graph games with private hidden randomness is presented as a direct result of encoding per-state action distributions as existential real variables satisfying polynomial inequalities. No equations, self-citations, or reductions in the abstract or provided text equate the claimed sufficiency to a fitted parameter, renamed input, or load-bearing self-citation chain. The placement in ∃R and NP-hardness follow from standard complexity reductions in game theory without circular self-definition. The model assumptions (private randomness sources, no shared/observable randomness) are stated explicitly and do not presuppose the memoryless result.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claims rest on the standard definition of concurrent graph games and the assumption that randomness is strictly private; no free parameters or invented physical entities are introduced.

axioms (2)

standard math Finite turn-based concurrent graph games with reachability objectives
Invoked in the opening sentence of the abstract as the base model.
domain assumption Private random sources are independent and hidden from all other agents
Explicitly stated as the distinguishing feature of the setting.

invented entities (1)

IRATL logic no independent evidence
purpose: To reason about probability thresholds for coalitions without shared randomness
New syntax and semantics introduced to capture the distributed-randomization setting.

pith-pipeline@v0.9.0 · 5513 in / 1418 out tokens · 41251 ms · 2026-05-15T15:46:18.285425+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We show that memoryless strategies are sufficient for the threshold problem... places the problem in the Existential Theory of the Reals (∃R) but also enables the construction of value iteration algorithms.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce Individually Randomised Alternating-time Temporal Logic (IRATL)... semantics explicitly designed for coalitions that lack a shared source of randomness.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 2 internal anchors

[1]

In: CONCUR 2001 — Concurrency Theory

de Alfaro, L., Henzinger, T.A., Jhala, R.: Compositional methods for probabilistic systems. In: CONCUR 2001 — Concurrency Theory. pp. 351–365. Springer Berlin Heidelberg, Berlin, Heidelberg (2001)

work page 2001
[2]

Alur, R., Henzinger, T.A., Kupferman, O.: Alternating-time temporal logic. J. ACM49(5), 672–713 (2002).https://doi.org/10.1145/585265.585270,https: //doi.org/10.1145/585265.585270

work page doi:10.1145/585265.585270 2002
[3]

Attoui, A.: Multi-agent based method for reactive systems formal specification and validation. IFAC Proceedings Volumes29(2), 1–6 (1996).https://doi.org/ https://doi.org/10.1016/S1474-6670(17)43768-2, 5th IFAC/IFIP/GI/GMA Workshop on Experience With the Management of Software Projects 1995 (MSP ’95)

work page doi:10.1016/s1474-6670(17)43768-2 1996
[4]

CoRRabs/2510.02984(2025).https://doi.org/10

Bertrand, N., Bouyer, P., Lapointe, L., Mascle, C.: Reach together: How popula- tions win repeated games. CoRRabs/2510.02984(2025).https://doi.org/10. 48550/ARXIV.2510.02984,https://doi.org/10.48550/arXiv.2510.02984

work page doi:10.48550/arxiv.2510.02984 2025
[5]

In: FSTTCS 2019

Bertrand, N., Bouyer, P., Majumdar, A.: Concurrent parameterized games. In: FSTTCS 2019. LIPIcs, vol. 150, pp. 31:1–31:15. Schloss Dagstuhl - Leibniz- Zentrum für Informatik (2019).https://doi.org/10.4230/LIPICS.FSTTCS.2019. 31

work page doi:10.4230/lipics.fsttcs.2019 2019
[6]

In: FSTTCS 2020

Bertrand, N., Bouyer, P., Majumdar, A.: Synthesizing safe coalition strategies. In: FSTTCS 2020. LIPIcs, vol. 182, pp. 39:1–39:17. Schloss Dagstuhl - Leibniz- Zentrum für Informatik (2020).https://doi.org/10.4230/LIPICS.FSTTCS.2020. 39

work page doi:10.4230/lipics.fsttcs.2020 2020
[7]

The Annals of Mathematical Statis- tics33(2), 719–726 (1962)

Blackwell, D.: Discrete dynamic programming. The Annals of Mathematical Statis- tics33(2), 719–726 (1962)

work page 1962
[8]

Brice, L., Henzinger, T.A., Thejaswini, K.S.: Dicey games: Shared sources of ran- domness in distributed systems (2026),https://arxiv.org/abs/2601.18303

work page internal anchor Pith review Pith/arXiv arXiv 2026
[9]

In: Pro- ceedings of the Twentieth Annual ACM Symposium on Theory of Computing STOC

Canny, J.: Some algebraic and geometric computations in PSPACE. In: Pro- ceedings of the Twentieth Annual ACM Symposium on Theory of Computing STOC. p. 460–467. STOC ’88, Association for Computing Machinery (1988). https://doi.org/10.1145/62212.62257

work page doi:10.1145/62212.62257 1988
[10]

Strategy Improvement for Concurrent Reachability and Safety Games

Chatterjee, K., de Alfaro, L., Henzinger, T.A.: Strategy improvement for con- current reachability and safety games. CoRRabs/1201.2834(2012),https: //arxiv.org/abs/1201.2834, preprint version of the combined QEST/SODA re- sults

work page internal anchor Pith review Pith/arXiv arXiv 2012
[11]

In: TACAS 2013

Chen, T., Forejt, V., Kwiatkowska, M.Z., Parker, D., Simaitis, A.: Prism-games: A model checker for stochastic multi-player games. In: TACAS 2013. Lecture Notes in Computer Science, vol. 7795, pp. 185–191. Springer (2013).https://doi.org/ 10.1007/978-3-642-36742-7_13

work page doi:10.1007/978-3-642-36742-7_13 2013
[12]

In: Proceedings of the 18th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS)

Chen, T., Forejt, V., Kwiatkowska, M., Parker, D., Simaitis, A.: Automatic verifi- cation of competitive stochastic systems. In: Proceedings of the 18th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). Lecture Notes in Computer Science, vol. 7214, pp. 315–330. Springer (2012).https://doi.org/10.1007/97...

work page doi:10.1007/978-3-642-28756-5_22 2012
[13]

In: Lei, J

Chen, T., Lu, J.: Probabilistic alternating-time temporal logic and model checking algorithm. In: Lei, J. (ed.) FSKD 2007. pp. 35–39. IEEE Computer Society (2007)

work page 2007
[14]

Theoretical Computer Science386(3), 188–217 (2007).https://doi.org/https: //doi.org/10.1016/j.tcs.2007.07.008, expressiveness in Concurrency 20 L

de Alfaro, L., Henzinger, T.A., Kupferman, O.: Concurrent reachability games. Theoretical Computer Science386(3), 188–217 (2007).https://doi.org/https: //doi.org/10.1016/j.tcs.2007.07.008, expressiveness in Concurrency 20 L. Brice, T. A. Henzinger, A. Montaseri, A. Shafiee, and K. S. Thejaswini

work page doi:10.1016/j.tcs.2007.07.008 2007
[15]

In: Nonlinear model predictive control: towards new challenging applications, pp

Diehl, M., Ferreau, H.J., Haverbeke, N.: Efficient numerical methods for nonlin- ear mpc and moving horizon estimation. In: Nonlinear model predictive control: towards new challenging applications, pp. 391–417. Springer (2009)

work page 2009
[16]

Etessami, K., Yannakakis, M.: Recursive markov decision processes and recursive stochastic games. J. ACM62(2) (May 2015).https://doi.org/10.1145/2699431, https://doi.org/10.1145/2699431

work page doi:10.1145/2699431 2015
[17]

Forejt, V., Kwiatkowska, M., Norman, G., Parker, D.: Automated Verifica- tion Techniques for Probabilistic Systems, pp. 53–113. Springer Berlin Heidel- berg, Berlin, Heidelberg (2011).https://doi.org/10.1007/978-3-642-21455-4_ 3,https://doi.org/10.1007/978-3-642-21455-4_3

work page doi:10.1007/978-3-642-21455-4_ 2011
[18]

In: Advances in Neural Information Processing Systems (NIPS)

Guestrin, C., Koller, D., Parr, R.: Multiagent planning with factored MDPs. In: Advances in Neural Information Processing Systems (NIPS). vol. 14, pp. 1523–

work page
[19]

In: Lahiri, S.K., Wang, C

Gutierrez, J., Najib, M., Perelli, G., Wooldridge, M.J.: EVE: A tool for temporal equilibrium analysis. In: Lahiri, S.K., Wang, C. (eds.) Automated Technology for Verification and Analysis - 16th International Symposium, ATVA 2018, Los An- geles, CA, USA, October 7-10, 2018, Proceedings. Lecture Notes in Computer Science, vol. 11138, pp. 551–557. Springer...

work page doi:10.1007/978-3-030-01090-4_35 2018
[20]

In: Computer Science – Theory and Applications

Hansen, K.A., Ibsen-Jensen, R., Miltersen, P.B.: The complexity of solving reach- ability games using value and strategy iteration. In: Computer Science – Theory and Applications. Springer (2011)

work page 2011
[21]

Hensel, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: The probabilis- tic model checker storm. Int. J. Softw. Tools Technol. Transf.24(4), 589–610 (Aug 2022).https://doi.org/10.1007/s10009-021-00633-z,https://doi.org/ 10.1007/s10009-021-00633-z

work page doi:10.1007/s10009-021-00633-z 2022
[22]

Immerman, N.: Number of quantifiers is better than number of tape cells. J. Comput. Syst. Sci.22, 384–406 (1981),https://api.semanticscholar.org/ CorpusID:40753086

work page 1981
[23]

Journal of Machine Learning Research (JMLR)7(65), 1789–1828 (2006)

Kok, J.R., Vlassis, N.: Collaborative multiagent reinforcement learning by pay- off propagation. Journal of Machine Learning Research (JMLR)7(65), 1789–1828 (2006)

work page 2006
[24]

Kraft, D.: A software package for sequential quadratic programming. Tech. Rep. Forschungsbericht FB 88-28, Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt (DFVLR), Köln, Germany (1988)

work page 1988
[25]

In: Computer Aided Ver- ification - 32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21-24, 2020, Proceedings, Part II

Kwiatkowska, M., Norman, G., Parker, D., Santos, G.: Prism-games 3.0: Stochastic game verification with concurrency, equilibria and time. In: Computer Aided Ver- ification - 32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21-24, 2020, Proceedings, Part II. Lecture Notes in Computer Science, vol. 12225, pp. 475–487. Springer (2020).http...

work page doi:10.1007/978-3-030-53291-8_ 2020
[26]

Formal Methods in System Design58(1), 188–250 (2021)

Kwiatkowska, M., Norman, G., Parker, D., Santos, G.: Automatic verification of concurrent stochastic systems. Formal Methods in System Design58(1), 188–250 (2021)

work page 2021
[27]

Křetínský,J.,Meggendorfer,T.,Weininger,M.:Stoppingcriteriaforvalueiteration onstochasticgameswithquantitativeobjectives.In:202338thAnnualACM/IEEE Symposium on Logic in Computer Science (LICS). pp. 1–14 (2023).https://doi. org/10.1109/LICS56636.2023.10175771

work page doi:10.1109/lics56636.2023.10175771 2023
[28]

Operations Research29(5), 971–982 (1981),http://www.jstor.org/stable/170234 Randomise Alone, Reach as a Team 21

Lefèvre, C.: Optimal control of a birth and death epidemic process. Operations Research29(5), 971–982 (1981),http://www.jstor.org/stable/170234 Randomise Alone, Reach as a Team 21

work page 1981
[29]

Lomuscio, A., Qu, H., Raimondi, F.: MCMAS: an open-source model checker for the verification of multi-agent systems. Int. J. Softw. Tools Technol. Transf.19(1), 9–30 (2017).https://doi.org/10.1007/S10009-015-0378-X,https://doi.org/ 10.1007/s10009-015-0378-x

work page doi:10.1007/s10009-015-0378-x 2017
[30]

Procedia Manufacturing51, 1200–1206 (2020).https://doi.org/https://doi.org/10.1016/j.promfg.2020

Lyu, G., Fazlirad, A., Brennan, R.W.: Multi-agent modeling of cyber-physical systems for iec 61499 based distributed automation. Procedia Manufacturing51, 1200–1206 (2020).https://doi.org/https://doi.org/10.1016/j.promfg.2020. 10.168, 30th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM2021)

work page doi:10.1016/j.promfg.2020 2020
[31]

de Moura, L., Bjørner, N.: Z3: an efficient smt solver. vol. 4963, pp. 337–340 (04 2008).https://doi.org/10.1007/978-3-540-78800-3_24

work page doi:10.1007/978-3-540-78800-3_24 2008
[32]

In: Logics in Artificial Intelligence

Murano, A., Neider, D., Zimmermann, M.: Robust alternating-time temporal logic. In: Logics in Artificial Intelligence. pp. 796–813. Springer Nature Switzerland, Cham (2023)

work page 2023
[33]

In: Theory and Applications of Graphs: Proceedings, Michigan May 11–15, 1976, pp

Parsons, T.D.: Pursuit-evasion in a graph. In: Theory and Applications of Graphs: Proceedings, Michigan May 11–15, 1976, pp. 426–441. Springer (2006)

work page 1976
[34]

John Wiley & Sons (1994)

Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Pro- gramming. John Wiley & Sons (1994)

work page 1994
[35]

McGraw-Hill, New York, 3rd edn

Rudin, W.: Principles of Mathematical Analysis. McGraw-Hill, New York, 3rd edn. (1976)

work page 1976
[36]

CoRRabs/2407.18006(2024).https://doi.org/ 10.48550/ARXIV.2407.18006,https://doi.org/10.48550/arXiv.2407.18006

Schaefer, M., Cardinal, J., Miltzow, T.: The existential theory of the reals as a com- plexity class: A compendium. CoRRabs/2407.18006(2024).https://doi.org/ 10.48550/ARXIV.2407.18006,https://doi.org/10.48550/arXiv.2407.18006

work page doi:10.48550/arxiv.2407.18006 2024
[37]

IEEE Transactions on Automatic Control 62(1), 65–80 (2016)

Shi, D., Elliott, R.J., Chen, T.: On finite-state stochastic modeling and secure estimation of cyber-physical systems. IEEE Transactions on Automatic Control 62(1), 65–80 (2016)

work page 2016
[38]

http://baldur

Sorensson, N.: Minisat 2.2 and minisat++ 1.1. http://baldur. iti. uka. de/sat-race- 2010/descriptions/solver_25+ 26. pdf (2010)

work page 2010
[39]

Nature Methods17, 261–272 (2020).https://doi.org/10.1038/ s41592-019-0686-2

Virtanen, P., Gommers, R., Oliphant, T.E., Haberland, M., Reddy, T., Courna- peau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S.J., Brett, M., Wilson, J., Millman, K.J., Mayorov, N., Nelson, A.R.J., Jones, E., Kern, R., Larson, E., Carey, C.J., Polat, İ., Feng, Y., Moore, E.W., Vander- Plas, J., Laxalde, D., Perktold, J., Cim...

work page 2020
[40]

Handbook of reinforcement learning and con- trol pp

Zhang, K., Yang, Z., Başar, T.: Multi-agent reinforcement learning: A selective overview of theories and algorithms. Handbook of reinforcement learning and con- trol pp. 321–384 (2021)

work page 2021
[41]

2023 IEEE International Confer- ence on Robotics and Automation (ICRA) pp

Zhu, E.L., Borrelli, F.: A sequential quadratic programming approach to the so- lution of open-loop generalized nash equilibria. 2023 IEEE International Confer- ence on Robotics and Automation (ICRA) pp. 3211–3217 (2022),https://api. semanticscholar.org/CorpusID:247793988 A The existential theory of the reals Definition 3 (Existential theory of the reals)...

work page 2023
[42]

LetV λ σ˚ be the value of the game under collective memoryless strategy¯σ˚ Π with discount factorλ

It was shown in [7], that (this intuition is actually correct) asλÑ1´, the expected reward is equal to the actual reachability probability. LetV λ σ˚ be the value of the game under collective memoryless strategy¯σ˚ Π with discount factorλ. We haveV˚ “lim λÑ1´ V λ σ˚ ěV 1. SinceV 1 is strictly greater thant, we haveV ˚ ąt. By the definition of the limit, t...

work page
[43]

Lemma 6.If the graphGcontains a clique of sizek, the team has a memoryless strategy to reachJwith probability1

We show that the team can win with proba- bility greater thantif and only if the graph has a clique of sizek. Lemma 6.If the graphGcontains a clique of sizek, the team has a memoryless strategy to reachJwith probability1. Proof.Assume that the graphGhas a clique of sizek. Lettc 1, c2, . . . , ckube the vertices of this clique. We define a memoryless colle...

work page
[44]

The game moves toWk´1 with a positive probability on the next turn

work page
[45]

Letηbe the strategy that plays this memoryless strategy at each states

The game remains inWwith probability 1 on the next turn. Letηbe the strategy that plays this memoryless strategy at each states. We show in the following lemma that for each statesPW, strategyηguarantees reachingTwith probability 1. Lemma 9.If the team players play the memoryless collective strategyη, for every statesPW, the probability of reachingTis 1. ...

work page
[46]

Furthermore, we force that each state hasat mostone rank

Consistency of winning set and ranks.A statesis in the winning setWif and only if it has a rank. Furthermore, we force that each state hasat mostone rank. These two ensure that every state inWhas exactly one rank. Target states must have rank 0. We also forces0 to be inW. Φrank :“ ľ sPS ˜ ws Ø Nł k“0 rs,k ¸ ľ sPS ľ 0ďjăkďN ␣prs,j ^r s,kq ľ sPT rs,0 ľ ws0

work page
[47]

Φsupport :“ ľ sPS ľ PiPΠ ˜ ws Ñ ł aPAi us,i,a ¸

Non-Empty strategy support.If a state is inW, every team player must have at least one valid action in their support. Φsupport :“ ľ sPS ľ PiPΠ ˜ ws Ñ ł aPAi us,i,a ¸

work page
[48]

If the team plays actions from their supports, then for any opponent action b, all possible next states must be inW

Safety inW.The strategy must guarantee that the game never leaves W. If the team plays actions from their supports, then for any opponent action b, all possible next states must be inW. Φsafe :“ ľ sPS ľ ¯aPś i Avipsq ľ bPAO ¨ ˝ ˜ ľ PiPΠ us,i,ai ¸ Ñ ľ tPSupppδps,¯a,bqq wt ˛ ‚

work page
[49]

Progress (rank reduction).FromsPWwith rankką0, the strategy must have a positive probability to move to a state with a lower rank. Formally, for every opponent actionb, there must be a joint action in the team’s support that leads to a statetwith rank strictly less thank.Φprog “ ľ sPS Nľ k“1 ľ bPAO ¨ ˚˝rs,k Ñ ł ¯aPś i Avipsq ¨ ˝ ˜ ľ PiPΠ us,i,ai ¸ ^ ł tPS...

work page
[50]

Setw s “true for allsPW

LetWbe the set of almost-sure winning states underη. Setw s “true for allsPW

work page
[51]

For each sPW, letkbe the smallest index such thatsPW k

Define the ranks based on the layersWk constructed in Section 4.1. For each sPW, letkbe the smallest index such thatsPW k. Setr s,k “true and all other rank variables forsto false

work page
[52]

This assignment satisfies all constraints by construction

Setu s,i,a “true if and only ifηipsqpaq ą0. This assignment satisfies all constraints by construction. The Consistency con- straints hold because every state inWbelongs to exactly one layerW k. The Non-Empty Support holds becauseηis a valid strategy. The Safety constraint is Randomise Alone, Reach as a Team 39 satisfied because an almost-sure winning stra...

work page
[53]

These are linear constraints

Efficient handling of linear constraints.Our basic constraints are that proba- bilities must sum to 1 and be non-negative. These are linear constraints. Unlike penalty-based methods that may explore infeasible regions during intermediate steps, SLSQP exploits the structure of linear constraints to satisfy them exactly at every iteration [15]. Since the ge...

work page
[54]

This means the probability for most actions should be exactly zero

Efficiency with sparse strategies.In concurrent games, the best strategies are often simple: players sometimes rely on just a small number of actions rather than mixing many options. This means the probability for most actions should be exactly zero. SLSQP employs anactive-set strategywhich is designed to quickly identify and lock onto these zero values. ...

work page
[55]

For instance, Zhu and Borrelli [41] showed that SQP acts much faster than generic solvers when finding solutions to dynamic games

Proven performance in dynamic games.SQP-based methods are a standard choice for solving games that evolve over time. For instance, Zhu and Borrelli [41] showed that SQP acts much faster than generic solvers when finding solutions to dynamic games. They found that SQP is particularly good at handling the tight interactions between players. This performance...

work page
[56]

T.O.” denotes a timeout after 600 seconds. In cases where an algorithm timed out, we report the value obtained from the final completed iteration. “N/F

Sound under-approximation.Since our optimisation problem is non-convex, SLSQP behaves as a local optimiser. This guarantees that the valuev˚ it returns is a validunder-approximationof the true maximum (v˚ ďP repu kqpsq). In the context of formal verification, this is a “safe” error: we might underestimate the team’s winning chance, but we will never false...

work page 2000