pith. machine review for the scientific record. sign in

arxiv: 2603.07094 · v2 · submitted 2026-03-07 · 💻 cs.GT · cs.FL· cs.MA

Recognition: 2 theorem links

· Lean Theorem

Randomise Alone, Reach as a Team

Authors on Pith no claims yet

Pith reviewed 2026-05-15 15:46 UTC · model grok-4.3

classification 💻 cs.GT cs.FLcs.MA
keywords concurrent graph gamesdistributed randomizationmemoryless strategiesthreshold problemNP-hardnessalmost-sure reachabilityIRATL
0
0 comments X

The pith

Memoryless strategies suffice for teams to exceed winning probability thresholds in games with private randomness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies concurrent graph games in which several players cooperate against an opponent to reach a set of target states while each must randomize using a private source that remains hidden from the opponent and from teammates. It proves that for deciding whether the team can force a win probability strictly above any given threshold, it suffices for every player to choose actions based only on the current position. This sufficiency places the decision problem in the existential theory of the reals and permits value-iteration algorithms that compute the relevant probabilities. The authors additionally establish that the threshold problem is NP-hard while the special case of almost-sure reachability is NP-complete. They introduce Individually Randomised Alternating-time Temporal Logic to express such probabilistic team goals under distributed randomization.

Core claim

Memoryless strategies are sufficient for the threshold problem in concurrent graph games with distributed randomisation. The result embeds the decision problem in the existential theory of the reals and supports value-iteration algorithms. The threshold problem is NP-hard; almost-sure reachability is NP-complete. Individually Randomised Alternating-time Temporal Logic supplies the semantics for coalitions that lack a shared randomness source.

What carries the argument

Memoryless strategies under the distributed-randomisation model, which reduce the threshold decision to an existential formula over the reals.

If this is right

  • Value-iteration algorithms can compute winning probabilities for the threshold problem.
  • The threshold problem is NP-hard, so polynomial-time algorithms are unlikely unless P equals NP.
  • Almost-sure reachability admits an NP decision procedure and is therefore complete for that class.
  • IRATL provides a logical language for specifying probability thresholds when randomness is private to each player.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If players could observe one another's random outcomes, memory might again become necessary and the complexity could rise.
  • Verification tools for multi-agent systems can therefore prioritise memoryless implementations when randomness sources are known to be private.
  • The same reduction technique may apply to related objectives such as safety or parity when the underlying graph remains finite.

Load-bearing premise

Each player's random source is completely private and stays hidden from the opponent and from all other team members throughout the game.

What would settle it

A finite graph game in which the optimal probability above a chosen threshold can be achieved only by a strategy that remembers previous random outcomes or positions would falsify the memoryless sufficiency claim.

Figures

Figures reproduced from arXiv: 2603.07094 by Alipasha Montaseri, Ali Shafiee, K. S. Thejaswini, L\'eonard Brice, Thomas A. Henzinger.

Figure 1
Figure 1. Figure 1: The two agents choose among actions tL, Ru and the environment chooses among actions tL, Ru . Consider first the setting where R2D2 and C3PO have access to a shared source of randomness, which can’t be observed by the adversary. Such games can be viewed as standard two-player concurrent games, where the team acts as a single meta-player. By playing the joint actions pL, Lq or pR, Rq with probability 1 2 ea… view at source ↗
Figure 2
Figure 2. Figure 2: An example game for opponent with no memory. [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Execution time of the algorithms across three benchmarks. The plots [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The plot compares the execution time (in seconds, logarithmic scale) [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visualizations of the six scenarios used for evaluation. Scenarios 1–4 in [PITH_FULL_IMAGE:figures/full_fig_p043_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visualizations of the seven scenarios used for the almost-sure experiment. [PITH_FULL_IMAGE:figures/full_fig_p044_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: The four benchmark scenarios for the robot game. Robots are shown as [PITH_FULL_IMAGE:figures/full_fig_p045_7.png] view at source ↗
read the original abstract

We study concurrent graph games where n players cooperate against an opponent to reach a set of target states. Unlike traditional settings, we study distributed randomisation: team players do not share a source of randomness, and their private random sources are hidden from the opponent and from each other. We show that memoryless strategies are sufficient for the threshold problem (deciding whether there is a strategy for the team that ensures winning with probability that exceeds a threshold), a result that not only places the problem in the Existential Theory of the Reals (\exists\mathbb{R}) but also enables the construction of value iteration algorithms. We additionally show that the threshold problem is NP-hard. For the almost-sure reachability problem, we prove NP-completeness. We introduce Individually Randomised Alternating-time Temporal Logic (IRATL). This logic extends the standard ATL framework to reason about probability thresholds, with semantics explicitly designed for coalitions that lack a shared source of randomness. On the practical side, we implement and evaluate a solver for the threshold and almost-sure problem based on the algorithms that we develop.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper studies concurrent graph games in which a coalition of n players cooperates against an opponent to reach a target set, using distributed randomization in which each player has a private random source hidden from the opponent and from the other players. The central claims are that memoryless strategies suffice for the threshold reachability problem (placing it in ∃R and enabling value iteration), that the threshold problem is NP-hard, and that almost-sure reachability is NP-complete. The authors introduce Individually Randomised Alternating-time Temporal Logic (IRATL) whose semantics are defined for coalitions without shared randomness and present an implemented solver for the decision problems.

Significance. If the memoryless-sufficiency result holds, the work supplies a complexity classification (∃R membership together with NP-hardness) and a practical algorithmic route (value iteration) for reachability under private, hidden randomness; this extends the standard shared-randomness model of stochastic games and ATL. The NP-completeness result for almost-sure reachability gives a tight bound, and the implemented solver provides an immediate practical contribution for verification of multi-agent systems with individual randomization.

major comments (2)
  1. [Abstract] Abstract: the claim that memoryless strategies suffice for the threshold problem rests on an encoding of per-state action distributions as existential real variables whose induced reachability probabilities satisfy a system of polynomial inequalities; no explicit system of inequalities, no proof sketch, and no verification that the encoding preserves the private-randomness semantics are supplied, rendering the ∃R membership and value-iteration claims unverifiable from the given text.
  2. [Abstract] Abstract: the NP-hardness reduction for the threshold problem and the NP-completeness proof for almost-sure reachability are asserted without describing the source problem, the gadget construction, or how the reduction respects the finite turn-based concurrent graph and hidden-private-randomness model; these details are load-bearing for the hardness claims.
minor comments (1)
  1. The syntax and semantics of IRATL are introduced but not displayed; a short formal definition or example formula would improve readability for readers outside the ATL community.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the significance of the memoryless-sufficiency result and the complexity classifications. We address each major comment below with clarifications drawn from the full manuscript and indicate where revisions will strengthen the presentation.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that memoryless strategies suffice for the threshold problem rests on an encoding of per-state action distributions as existential real variables whose induced reachability probabilities satisfy a system of polynomial inequalities; no explicit system of inequalities, no proof sketch, and no verification that the encoding preserves the private-randomness semantics are supplied, rendering the ∃R membership and value-iteration claims unverifiable from the given text.

    Authors: The explicit system of polynomial inequalities (encoding independent per-player action distributions at each state together with the resulting reachability probabilities) and the proof that this encoding preserves private-randomness semantics appear in Section 3.2 of the full manuscript; the ∃R membership follows immediately from existential quantification over the real variables, and value iteration is justified by the memoryless property established in Theorem 3. We agree the abstract is too terse and will revise it to include a concise sketch of the encoding and its semantic preservation. revision: partial

  2. Referee: [Abstract] Abstract: the NP-hardness reduction for the threshold problem and the NP-completeness proof for almost-sure reachability are asserted without describing the source problem, the gadget construction, or how the reduction respects the finite turn-based concurrent graph and hidden-private-randomness model; these details are load-bearing for the hardness claims.

    Authors: The NP-hardness reduction for threshold reachability (from 3-SAT) and the NP-completeness proof for almost-sure reachability (from SAT) are fully detailed in Sections 4.1 and 4.2, respectively, with explicit gadget constructions that enforce the concurrent turn-based structure and independent private randomness. We acknowledge that the abstract omits even a high-level description of the source problems and key gadget properties; we will add a brief overview of both reductions to the revised abstract. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The central claim that memoryless strategies suffice for the threshold problem in finite turn-based concurrent graph games with private hidden randomness is presented as a direct result of encoding per-state action distributions as existential real variables satisfying polynomial inequalities. No equations, self-citations, or reductions in the abstract or provided text equate the claimed sufficiency to a fitted parameter, renamed input, or load-bearing self-citation chain. The placement in ∃R and NP-hardness follow from standard complexity reductions in game theory without circular self-definition. The model assumptions (private randomness sources, no shared/observable randomness) are stated explicitly and do not presuppose the memoryless result.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claims rest on the standard definition of concurrent graph games and the assumption that randomness is strictly private; no free parameters or invented physical entities are introduced.

axioms (2)
  • standard math Finite turn-based concurrent graph games with reachability objectives
    Invoked in the opening sentence of the abstract as the base model.
  • domain assumption Private random sources are independent and hidden from all other agents
    Explicitly stated as the distinguishing feature of the setting.
invented entities (1)
  • IRATL logic no independent evidence
    purpose: To reason about probability thresholds for coalitions without shared randomness
    New syntax and semantics introduced to capture the distributed-randomization setting.

pith-pipeline@v0.9.0 · 5513 in / 1418 out tokens · 41251 ms · 2026-05-15T15:46:18.285425+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 2 internal anchors

  1. [1]

    In: CONCUR 2001 — Concurrency Theory

    de Alfaro, L., Henzinger, T.A., Jhala, R.: Compositional methods for probabilistic systems. In: CONCUR 2001 — Concurrency Theory. pp. 351–365. Springer Berlin Heidelberg, Berlin, Heidelberg (2001)

  2. [2]

    Alur, R., Henzinger, T.A., Kupferman, O.: Alternating-time temporal logic. J. ACM49(5), 672–713 (2002).https://doi.org/10.1145/585265.585270,https: //doi.org/10.1145/585265.585270

  3. [3]

    Attoui, A.: Multi-agent based method for reactive systems formal specification and validation. IFAC Proceedings Volumes29(2), 1–6 (1996).https://doi.org/ https://doi.org/10.1016/S1474-6670(17)43768-2, 5th IFAC/IFIP/GI/GMA Workshop on Experience With the Management of Software Projects 1995 (MSP ’95)

  4. [4]

    CoRRabs/2510.02984(2025).https://doi.org/10

    Bertrand, N., Bouyer, P., Lapointe, L., Mascle, C.: Reach together: How popula- tions win repeated games. CoRRabs/2510.02984(2025).https://doi.org/10. 48550/ARXIV.2510.02984,https://doi.org/10.48550/arXiv.2510.02984

  5. [5]

    In: FSTTCS 2019

    Bertrand, N., Bouyer, P., Majumdar, A.: Concurrent parameterized games. In: FSTTCS 2019. LIPIcs, vol. 150, pp. 31:1–31:15. Schloss Dagstuhl - Leibniz- Zentrum für Informatik (2019).https://doi.org/10.4230/LIPICS.FSTTCS.2019. 31

  6. [6]

    In: FSTTCS 2020

    Bertrand, N., Bouyer, P., Majumdar, A.: Synthesizing safe coalition strategies. In: FSTTCS 2020. LIPIcs, vol. 182, pp. 39:1–39:17. Schloss Dagstuhl - Leibniz- Zentrum für Informatik (2020).https://doi.org/10.4230/LIPICS.FSTTCS.2020. 39

  7. [7]

    The Annals of Mathematical Statis- tics33(2), 719–726 (1962)

    Blackwell, D.: Discrete dynamic programming. The Annals of Mathematical Statis- tics33(2), 719–726 (1962)

  8. [8]

    Brice, L., Henzinger, T.A., Thejaswini, K.S.: Dicey games: Shared sources of ran- domness in distributed systems (2026),https://arxiv.org/abs/2601.18303

  9. [9]

    In: Pro- ceedings of the Twentieth Annual ACM Symposium on Theory of Computing STOC

    Canny, J.: Some algebraic and geometric computations in PSPACE. In: Pro- ceedings of the Twentieth Annual ACM Symposium on Theory of Computing STOC. p. 460–467. STOC ’88, Association for Computing Machinery (1988). https://doi.org/10.1145/62212.62257

  10. [10]

    Strategy Improvement for Concurrent Reachability and Safety Games

    Chatterjee, K., de Alfaro, L., Henzinger, T.A.: Strategy improvement for con- current reachability and safety games. CoRRabs/1201.2834(2012),https: //arxiv.org/abs/1201.2834, preprint version of the combined QEST/SODA re- sults

  11. [11]

    In: TACAS 2013

    Chen, T., Forejt, V., Kwiatkowska, M.Z., Parker, D., Simaitis, A.: Prism-games: A model checker for stochastic multi-player games. In: TACAS 2013. Lecture Notes in Computer Science, vol. 7795, pp. 185–191. Springer (2013).https://doi.org/ 10.1007/978-3-642-36742-7_13

  12. [12]

    In: Proceedings of the 18th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS)

    Chen, T., Forejt, V., Kwiatkowska, M., Parker, D., Simaitis, A.: Automatic verifi- cation of competitive stochastic systems. In: Proceedings of the 18th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). Lecture Notes in Computer Science, vol. 7214, pp. 315–330. Springer (2012).https://doi.org/10.1007/97...

  13. [13]

    In: Lei, J

    Chen, T., Lu, J.: Probabilistic alternating-time temporal logic and model checking algorithm. In: Lei, J. (ed.) FSKD 2007. pp. 35–39. IEEE Computer Society (2007)

  14. [14]

    Theoretical Computer Science386(3), 188–217 (2007).https://doi.org/https: //doi.org/10.1016/j.tcs.2007.07.008, expressiveness in Concurrency 20 L

    de Alfaro, L., Henzinger, T.A., Kupferman, O.: Concurrent reachability games. Theoretical Computer Science386(3), 188–217 (2007).https://doi.org/https: //doi.org/10.1016/j.tcs.2007.07.008, expressiveness in Concurrency 20 L. Brice, T. A. Henzinger, A. Montaseri, A. Shafiee, and K. S. Thejaswini

  15. [15]

    In: Nonlinear model predictive control: towards new challenging applications, pp

    Diehl, M., Ferreau, H.J., Haverbeke, N.: Efficient numerical methods for nonlin- ear mpc and moving horizon estimation. In: Nonlinear model predictive control: towards new challenging applications, pp. 391–417. Springer (2009)

  16. [16]

    Etessami, K., Yannakakis, M.: Recursive markov decision processes and recursive stochastic games. J. ACM62(2) (May 2015).https://doi.org/10.1145/2699431, https://doi.org/10.1145/2699431

  17. [17]

    Forejt, V., Kwiatkowska, M., Norman, G., Parker, D.: Automated Verifica- tion Techniques for Probabilistic Systems, pp. 53–113. Springer Berlin Heidel- berg, Berlin, Heidelberg (2011).https://doi.org/10.1007/978-3-642-21455-4_ 3,https://doi.org/10.1007/978-3-642-21455-4_3

  18. [18]

    In: Advances in Neural Information Processing Systems (NIPS)

    Guestrin, C., Koller, D., Parr, R.: Multiagent planning with factored MDPs. In: Advances in Neural Information Processing Systems (NIPS). vol. 14, pp. 1523–

  19. [19]

    In: Lahiri, S.K., Wang, C

    Gutierrez, J., Najib, M., Perelli, G., Wooldridge, M.J.: EVE: A tool for temporal equilibrium analysis. In: Lahiri, S.K., Wang, C. (eds.) Automated Technology for Verification and Analysis - 16th International Symposium, ATVA 2018, Los An- geles, CA, USA, October 7-10, 2018, Proceedings. Lecture Notes in Computer Science, vol. 11138, pp. 551–557. Springer...

  20. [20]

    In: Computer Science – Theory and Applications

    Hansen, K.A., Ibsen-Jensen, R., Miltersen, P.B.: The complexity of solving reach- ability games using value and strategy iteration. In: Computer Science – Theory and Applications. Springer (2011)

  21. [21]

    Hensel, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: The probabilis- tic model checker storm. Int. J. Softw. Tools Technol. Transf.24(4), 589–610 (Aug 2022).https://doi.org/10.1007/s10009-021-00633-z,https://doi.org/ 10.1007/s10009-021-00633-z

  22. [22]

    Immerman, N.: Number of quantifiers is better than number of tape cells. J. Comput. Syst. Sci.22, 384–406 (1981),https://api.semanticscholar.org/ CorpusID:40753086

  23. [23]

    Journal of Machine Learning Research (JMLR)7(65), 1789–1828 (2006)

    Kok, J.R., Vlassis, N.: Collaborative multiagent reinforcement learning by pay- off propagation. Journal of Machine Learning Research (JMLR)7(65), 1789–1828 (2006)

  24. [24]

    Kraft, D.: A software package for sequential quadratic programming. Tech. Rep. Forschungsbericht FB 88-28, Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt (DFVLR), Köln, Germany (1988)

  25. [25]

    In: Computer Aided Ver- ification - 32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21-24, 2020, Proceedings, Part II

    Kwiatkowska, M., Norman, G., Parker, D., Santos, G.: Prism-games 3.0: Stochastic game verification with concurrency, equilibria and time. In: Computer Aided Ver- ification - 32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21-24, 2020, Proceedings, Part II. Lecture Notes in Computer Science, vol. 12225, pp. 475–487. Springer (2020).http...

  26. [26]

    Formal Methods in System Design58(1), 188–250 (2021)

    Kwiatkowska, M., Norman, G., Parker, D., Santos, G.: Automatic verification of concurrent stochastic systems. Formal Methods in System Design58(1), 188–250 (2021)

  27. [27]

    Křetínský,J.,Meggendorfer,T.,Weininger,M.:Stoppingcriteriaforvalueiteration onstochasticgameswithquantitativeobjectives.In:202338thAnnualACM/IEEE Symposium on Logic in Computer Science (LICS). pp. 1–14 (2023).https://doi. org/10.1109/LICS56636.2023.10175771

  28. [28]

    Operations Research29(5), 971–982 (1981),http://www.jstor.org/stable/170234 Randomise Alone, Reach as a Team 21

    Lefèvre, C.: Optimal control of a birth and death epidemic process. Operations Research29(5), 971–982 (1981),http://www.jstor.org/stable/170234 Randomise Alone, Reach as a Team 21

  29. [29]

    Lomuscio, A., Qu, H., Raimondi, F.: MCMAS: an open-source model checker for the verification of multi-agent systems. Int. J. Softw. Tools Technol. Transf.19(1), 9–30 (2017).https://doi.org/10.1007/S10009-015-0378-X,https://doi.org/ 10.1007/s10009-015-0378-x

  30. [30]

    Procedia Manufacturing51, 1200–1206 (2020).https://doi.org/https://doi.org/10.1016/j.promfg.2020

    Lyu, G., Fazlirad, A., Brennan, R.W.: Multi-agent modeling of cyber-physical systems for iec 61499 based distributed automation. Procedia Manufacturing51, 1200–1206 (2020).https://doi.org/https://doi.org/10.1016/j.promfg.2020. 10.168, 30th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM2021)

  31. [31]

    de Moura, L., Bjørner, N.: Z3: an efficient smt solver. vol. 4963, pp. 337–340 (04 2008).https://doi.org/10.1007/978-3-540-78800-3_24

  32. [32]

    In: Logics in Artificial Intelligence

    Murano, A., Neider, D., Zimmermann, M.: Robust alternating-time temporal logic. In: Logics in Artificial Intelligence. pp. 796–813. Springer Nature Switzerland, Cham (2023)

  33. [33]

    In: Theory and Applications of Graphs: Proceedings, Michigan May 11–15, 1976, pp

    Parsons, T.D.: Pursuit-evasion in a graph. In: Theory and Applications of Graphs: Proceedings, Michigan May 11–15, 1976, pp. 426–441. Springer (2006)

  34. [34]

    John Wiley & Sons (1994)

    Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Pro- gramming. John Wiley & Sons (1994)

  35. [35]

    McGraw-Hill, New York, 3rd edn

    Rudin, W.: Principles of Mathematical Analysis. McGraw-Hill, New York, 3rd edn. (1976)

  36. [36]

    CoRRabs/2407.18006(2024).https://doi.org/ 10.48550/ARXIV.2407.18006,https://doi.org/10.48550/arXiv.2407.18006

    Schaefer, M., Cardinal, J., Miltzow, T.: The existential theory of the reals as a com- plexity class: A compendium. CoRRabs/2407.18006(2024).https://doi.org/ 10.48550/ARXIV.2407.18006,https://doi.org/10.48550/arXiv.2407.18006

  37. [37]

    IEEE Transactions on Automatic Control 62(1), 65–80 (2016)

    Shi, D., Elliott, R.J., Chen, T.: On finite-state stochastic modeling and secure estimation of cyber-physical systems. IEEE Transactions on Automatic Control 62(1), 65–80 (2016)

  38. [38]

    http://baldur

    Sorensson, N.: Minisat 2.2 and minisat++ 1.1. http://baldur. iti. uka. de/sat-race- 2010/descriptions/solver_25+ 26. pdf (2010)

  39. [39]

    Nature Methods17, 261–272 (2020).https://doi.org/10.1038/ s41592-019-0686-2

    Virtanen, P., Gommers, R., Oliphant, T.E., Haberland, M., Reddy, T., Courna- peau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S.J., Brett, M., Wilson, J., Millman, K.J., Mayorov, N., Nelson, A.R.J., Jones, E., Kern, R., Larson, E., Carey, C.J., Polat, İ., Feng, Y., Moore, E.W., Vander- Plas, J., Laxalde, D., Perktold, J., Cim...

  40. [40]

    Handbook of reinforcement learning and con- trol pp

    Zhang, K., Yang, Z., Başar, T.: Multi-agent reinforcement learning: A selective overview of theories and algorithms. Handbook of reinforcement learning and con- trol pp. 321–384 (2021)

  41. [41]

    2023 IEEE International Confer- ence on Robotics and Automation (ICRA) pp

    Zhu, E.L., Borrelli, F.: A sequential quadratic programming approach to the so- lution of open-loop generalized nash equilibria. 2023 IEEE International Confer- ence on Robotics and Automation (ICRA) pp. 3211–3217 (2022),https://api. semanticscholar.org/CorpusID:247793988 A The existential theory of the reals Definition 3 (Existential theory of the reals)...

  42. [42]

    LetV λ σ˚ be the value of the game under collective memoryless strategy¯σ˚ Π with discount factorλ

    It was shown in [7], that (this intuition is actually correct) asλÑ1´, the expected reward is equal to the actual reachability probability. LetV λ σ˚ be the value of the game under collective memoryless strategy¯σ˚ Π with discount factorλ. We haveV˚ “lim λÑ1´ V λ σ˚ ěV 1. SinceV 1 is strictly greater thant, we haveV ˚ ąt. By the definition of the limit, t...

  43. [43]

    Lemma 6.If the graphGcontains a clique of sizek, the team has a memoryless strategy to reachJwith probability1

    We show that the team can win with proba- bility greater thantif and only if the graph has a clique of sizek. Lemma 6.If the graphGcontains a clique of sizek, the team has a memoryless strategy to reachJwith probability1. Proof.Assume that the graphGhas a clique of sizek. Lettc 1, c2, . . . , ckube the vertices of this clique. We define a memoryless colle...

  44. [44]

    The game moves toWk´1 with a positive probability on the next turn

  45. [45]

    Letηbe the strategy that plays this memoryless strategy at each states

    The game remains inWwith probability 1 on the next turn. Letηbe the strategy that plays this memoryless strategy at each states. We show in the following lemma that for each statesPW, strategyηguarantees reachingTwith probability 1. Lemma 9.If the team players play the memoryless collective strategyη, for every statesPW, the probability of reachingTis 1. ...

  46. [46]

    Furthermore, we force that each state hasat mostone rank

    Consistency of winning set and ranks.A statesis in the winning setWif and only if it has a rank. Furthermore, we force that each state hasat mostone rank. These two ensure that every state inWhas exactly one rank. Target states must have rank 0. We also forces0 to be inW. Φrank :“ ľ sPS ˜ ws Ø Nł k“0 rs,k ¸ ľ sPS ľ 0ďjăkďN ␣prs,j ^r s,kq ľ sPT rs,0 ľ ws0

  47. [47]

    Φsupport :“ ľ sPS ľ PiPΠ ˜ ws Ñ ł aPAi us,i,a ¸

    Non-Empty strategy support.If a state is inW, every team player must have at least one valid action in their support. Φsupport :“ ľ sPS ľ PiPΠ ˜ ws Ñ ł aPAi us,i,a ¸

  48. [48]

    If the team plays actions from their supports, then for any opponent action b, all possible next states must be inW

    Safety inW.The strategy must guarantee that the game never leaves W. If the team plays actions from their supports, then for any opponent action b, all possible next states must be inW. Φsafe :“ ľ sPS ľ ¯aPś i Avipsq ľ bPAO ¨ ˝ ˜ ľ PiPΠ us,i,ai ¸ Ñ ľ tPSupppδps,¯a,bqq wt ˛ ‚

  49. [49]

    Progress (rank reduction).FromsPWwith rankką0, the strategy must have a positive probability to move to a state with a lower rank. Formally, for every opponent actionb, there must be a joint action in the team’s support that leads to a statetwith rank strictly less thank.Φprog “ ľ sPS Nľ k“1 ľ bPAO ¨ ˚˝rs,k Ñ ł ¯aPś i Avipsq ¨ ˝ ˜ ľ PiPΠ us,i,ai ¸ ^ ł tPS...

  50. [50]

    Setw s “true for allsPW

    LetWbe the set of almost-sure winning states underη. Setw s “true for allsPW

  51. [51]

    For each sPW, letkbe the smallest index such thatsPW k

    Define the ranks based on the layersWk constructed in Section 4.1. For each sPW, letkbe the smallest index such thatsPW k. Setr s,k “true and all other rank variables forsto false

  52. [52]

    This assignment satisfies all constraints by construction

    Setu s,i,a “true if and only ifηipsqpaq ą0. This assignment satisfies all constraints by construction. The Consistency con- straints hold because every state inWbelongs to exactly one layerW k. The Non-Empty Support holds becauseηis a valid strategy. The Safety constraint is Randomise Alone, Reach as a Team 39 satisfied because an almost-sure winning stra...

  53. [53]

    These are linear constraints

    Efficient handling of linear constraints.Our basic constraints are that proba- bilities must sum to 1 and be non-negative. These are linear constraints. Unlike penalty-based methods that may explore infeasible regions during intermediate steps, SLSQP exploits the structure of linear constraints to satisfy them exactly at every iteration [15]. Since the ge...

  54. [54]

    This means the probability for most actions should be exactly zero

    Efficiency with sparse strategies.In concurrent games, the best strategies are often simple: players sometimes rely on just a small number of actions rather than mixing many options. This means the probability for most actions should be exactly zero. SLSQP employs anactive-set strategywhich is designed to quickly identify and lock onto these zero values. ...

  55. [55]

    For instance, Zhu and Borrelli [41] showed that SQP acts much faster than generic solvers when finding solutions to dynamic games

    Proven performance in dynamic games.SQP-based methods are a standard choice for solving games that evolve over time. For instance, Zhu and Borrelli [41] showed that SQP acts much faster than generic solvers when finding solutions to dynamic games. They found that SQP is particularly good at handling the tight interactions between players. This performance...

  56. [56]

    T.O.” denotes a timeout after 600 seconds. In cases where an algorithm timed out, we report the value obtained from the final completed iteration. “N/F

    Sound under-approximation.Since our optimisation problem is non-convex, SLSQP behaves as a local optimiser. This guarantees that the valuev˚ it returns is a validunder-approximationof the true maximum (v˚ ďP repu kqpsq). In the context of formal verification, this is a “safe” error: we might underestimate the team’s winning chance, but we will never false...