Multi-agent Reach-avoid MDP via Potential Games and Low-rank Policy Structure

Abraham P. Vinod; Adam Casselman; Sarah H.Q. Li

arxiv: 2410.17690 · v2 · submitted 2024-10-23 · 📡 eess.SY · cs.GT· cs.MA· cs.RO· cs.SY

Multi-agent Reach-avoid MDP via Potential Games and Low-rank Policy Structure

Adam Casselman , Abraham P. Vinod , Sarah H.Q. Li This is my paper

Pith reviewed 2026-05-23 19:07 UTC · model grok-4.3

classification 📡 eess.SY cs.GTcs.MAcs.ROcs.SY

keywords multi-agent MDPreach-avoidpotential gamelocal feedback policiesrank-one factorizationiterative best responseNash equilibriumdynamic programming

0 comments

The pith

Local feedback policies in multi-agent reach-avoid MDPs are rank-one factorizations of global policies and induce a potential game structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a method to optimize finite-horizon multi-agent reach-avoid Markov decision processes by restricting attention to local feedback policies. Global feedback policies achieve optimality but incur communication, memory, and computation costs that grow exponentially with the number of agents. Local policies are shown to be rank-one factorizations of global policies, which directly reduces those costs. The formulation over local policies is further shown to possess a potential game structure, so that iterative best response becomes a tractable learning procedure guaranteed to converge to a deterministic Nash equilibrium, with each agent's best response obtained from multiplicative dynamic programming over the joint state space.

Core claim

Multi-agent reach-avoid MDPs over local feedback policies admit a potential game structure, and local feedback policies are rank-one factorizations of the corresponding global feedback policies. This structure enables iterative best response as a multi-agent learning scheme that converges to a deterministic Nash equilibrium, with each agent's best response computed via multiplicative dynamic programming.

What carries the argument

Local feedback policies, proven to be rank-one factorizations of global feedback policies, inside a potential-game formulation of the multi-agent reach-avoid MDP.

If this is right

Communication complexity and memory usage no longer scale exponentially with the number of agents.
Iterative best response converges to a deterministic Nash equilibrium.
Each agent's best response is obtained by multiplicative dynamic programming over the joint state space.
Peak memory usage and offline computation complexity are reduced while approximation error to the global optimum remains small.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The rank-one factorization may allow similar complexity reductions in other multi-agent stochastic control settings that admit global feedback policies.
Because the game is potential, other no-regret dynamics besides iterative best response could also be applied.
The approach could be combined with approximate dynamic programming to further scale to very large joint state spaces.

Load-bearing premise

Local feedback policies maintain an acceptable approximation error to the global reach-avoid objective, supported only by numerical simulations rather than a general bound.

What would settle it

An MDP instance in which the reach-avoid probability achieved by the local-policy solution falls substantially below the probability achieved by the global-policy solution.

Figures

Figures reproduced from arXiv: 2410.17690 by Abraham P. Vinod, Adam Casselman, Sarah H.Q. Li.

**Figure 2.** Figure 2: Computation time (seconds) and best response iteration (k) [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

read the original abstract

We optimize finite horizon multi-agent reach-avoid Markov decision process (MDP) via \emph{local feedback policies}. The global feedback policy solution yields global optimality but its communication complexity, memory usage and computation complexity scale exponentially with the number of agents. We mitigate this exponential dependency by restricting the solution space to local feedback policies and show that local feedback policies are rank-one factorizations of global feedback policies, which provides a principled approach to reducing communication complexity and memory usage. Additionally, by demonstrating that multi-agent reach-avoid MDPs over local feedback policies has a potential game structure, we show that iterative best response is a tractable multi-agent learning scheme with guaranteed convergence to deterministic Nash equilibrium, and derive each agent's best response via multiplicative dynamic program (DP) over the joint state space. Numerical simulations across different MDPs and agent sets show that the peak memory usage and offline computation complexity are significantly reduced while the approximation error to the optimal global reach-avoid objective is maintained.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Local rank-one policies create a potential game for multi-agent reach-avoid MDPs with convergent best response, but the suboptimality gap to the global optimum has no general bound.

read the letter

The paper shows that restricting to local feedback policies in finite-horizon multi-agent reach-avoid MDPs yields rank-one factorizations of the global policies. This turns the problem into a potential game, so iterative best response converges to a deterministic Nash equilibrium, and each agent's best response can be computed with a multiplicative dynamic program over the joint state space. Simulations then report lower memory and computation with acceptable error on the tested cases. This specific combination for reach-avoid is new and gives a clean structural handle on the exponential scaling in agents. The derivations follow standard potential-game and MDP properties without circularity, and the reduction in communication and memory is a direct consequence of the local policy class. The main limitation is the one flagged in the stress test: there is no analytic bound on how far the local Nash value sits from the global optimum. The claim that approximation error stays controlled rests entirely on the numerical examples across different MDPs and agent sets. Without a general guarantee, it is unclear how the gap behaves with larger teams, longer horizons, or different dynamics. This work is aimed at control researchers and roboticists who need scalable safety methods for multi-agent systems. A reader looking for structured approximations rather than exact global solutions will find the method and the empirical results worth examining. It deserves peer review because the approach is technically coherent and addresses a concrete scalability issue, even if the error analysis will require strengthening.

Referee Report

3 major / 2 minor

Summary. The paper claims that finite-horizon multi-agent reach-avoid MDPs can be optimized over local feedback policies, which are rank-one factorizations of global policies. This restriction reduces exponential scaling in communication, memory, and computation. The resulting game over local policies admits a potential-game structure, so iterative best response converges to a deterministic Nash equilibrium; each agent's best response is obtained from a multiplicative dynamic program over the joint state space. Simulations on varied MDPs and agent counts indicate that peak memory and offline computation are reduced while approximation error to the global optimum is maintained.

Significance. If the claims hold with supporting analysis, the work supplies a concrete mechanism (rank-one factorization plus potential-game structure) for making multi-agent reach-avoid problems tractable. The explicit multiplicative DP for best responses and the convergence guarantee to deterministic Nash equilibria are concrete strengths that could be useful for distributed safety-critical control.

major comments (3)

[Abstract] Abstract and the section deriving the suboptimality claim: the statement that 'the approximation error to the optimal global reach-avoid objective is maintained' rests entirely on numerical simulations. No analytic bound is supplied on ||V_local^* - V_global^*|| for arbitrary horizon, state space, or number of agents; without such a bound the practical utility of the local-policy restriction for the original objective is not theoretically justified.
[Section on rank-one factorization] Section establishing the rank-one factorization (likely §3): the claim that local feedback policies are rank-one factorizations of global policies is central to the complexity reduction, yet the manuscript provides only a high-level statement. A precise definition of the factorization, the conditions under which it holds, and the resulting memory/communication savings (with explicit scaling) are required to substantiate the 'principled approach'.
[Section on potential game and multiplicative DP] Section on the potential-game structure and multiplicative DP (likely §4-5): while the potential-game property yields convergence of iterative best response, the derivation of the multiplicative DP update for each agent's best response must be checked for correctness in the reach-avoid setting (especially the handling of the avoid set and the product-form value function). The absence of a sketched proof or key intermediate steps makes it impossible to verify that the DP indeed solves the best-response problem.

minor comments (2)

[Notation throughout] Clarify notation for local versus global policies and for the joint versus product state spaces; inconsistent symbols appear in the abstract and early sections.
[Numerical simulations section] In the numerical results, report quantitative error metrics (e.g., relative value gap) rather than qualitative statements that the error 'is maintained'; also state the exact MDP sizes, horizons, and agent counts used.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive comments. We address each major point below, indicating planned revisions where the manuscript can be strengthened.

read point-by-point responses

Referee: [Abstract] Abstract and the section deriving the suboptimality claim: the statement that 'the approximation error to the optimal global reach-avoid objective is maintained' rests entirely on numerical simulations. No analytic bound is supplied on ||V_local^* - V_global^*|| for arbitrary horizon, state space, or number of agents; without such a bound the practical utility of the local-policy restriction for the original objective is not theoretically justified.

Authors: We agree that the suboptimality claim rests on numerical evidence rather than an analytic bound. Deriving a general bound on ||V_local^* - V_global^*|| appears difficult for arbitrary horizons and agent counts due to the reach-avoid structure. In revision we will explicitly qualify the claim as empirically observed and discuss this limitation. revision: partial
Referee: [Section on rank-one factorization] Section establishing the rank-one factorization (likely §3): the claim that local feedback policies are rank-one factorizations of global policies is central to the complexity reduction, yet the manuscript provides only a high-level statement. A precise definition of the factorization, the conditions under which it holds, and the resulting memory/communication savings (with explicit scaling) are required to substantiate the 'principled approach'.

Authors: We will revise §3 to supply the precise definition: a local policy tuple induces a rank-one factorization of the global policy tensor via Π(s,a) = ∏_i π_i(s_i,a_i). This holds when policies are decentralized. We will also state the explicit scaling: memory and communication drop from O(|S|^N |A|^N) to O(N |S| |A|). revision: yes
Referee: [Section on potential game and multiplicative DP] Section on the potential-game structure and multiplicative DP (likely §4-5): while the potential-game property yields convergence of iterative best response, the derivation of the multiplicative DP update for each agent's best response must be checked for correctness in the reach-avoid setting (especially the handling of the avoid set and the product-form value function). The absence of a sketched proof or key intermediate steps makes it impossible to verify that the DP indeed solves the best-response problem.

Authors: We will add a sketched derivation (in the main text or appendix) showing that the best-response objective inherits a multiplicative structure from the potential function and the product-form policy. The avoid set is handled by setting terminal value to zero; the product-form value function then yields the multiplicative DP update. Key step: separability of the potential allows each agent’s DP to be solved independently over the joint state space. revision: yes

standing simulated objections not resolved

Providing a general analytic bound on ||V_local^* - V_global^*|| for arbitrary horizon, state space, and number of agents.

Circularity Check

0 steps flagged

No circularity; derivation is self-contained

full rationale

The paper derives local policies as rank-one factorizations of global policies directly from the MDP setup and shows the restricted multi-agent reach-avoid problem admits a potential-game structure, then invokes the standard property that potential games admit convergent best-response dynamics to a Nash equilibrium. Each agent's best response is obtained via a multiplicative DP on the joint state space. None of these steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the claims rest on explicit constructions and external game-theoretic facts rather than renaming or circular invocation of prior results by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to identify specific free parameters, axioms, or invented entities; relies on standard MDP and game-theoretic concepts without new postulates.

pith-pipeline@v0.9.0 · 5716 in / 1102 out tokens · 42536 ms · 2026-05-23T19:07:32.250932+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1]

A proposed taxonomy for advanced air mobility,

L. A. Garrow, B. German, N. T. Schwab, M. D. Patterson, N. Men- donca, Y . O. Gawdiak, and J. R. Murphy, “A proposed taxonomy for advanced air mobility,” in AIAA Aviation 2022 Forum, 2022, p. 3321

work page 2022
[2]

Advanced air mobility: Demand analysis and market potential of the airport shuttle and air taxi markets,

R. Goyal, C. Reiche, C. Fernando, and A. Cohen, “Advanced air mobility: Demand analysis and market potential of the airport shuttle and air taxi markets,” Sustainability, vol. 13, no. 13, p. 7421, 2021

work page 2021
[3]

Safety considerations for operation of unmanned aerial vehicles in the national airspace system,

R. E. Weibel and R. J. Hansman, “Safety considerations for operation of unmanned aerial vehicles in the national airspace system,” Tech. Rep., 2006

work page 2006
[4]

J. P. McGee, A. S. Mavor, and C. D. Wickens, Flight to the future: Human factors in air traffic control. National Academies Press, 1997

work page 1997
[5]

Applications of stochastic modeling in air traffic management: Methods, challenges and opportunities for solving air traffic problems under uncertainty,

R. Shone, K. Glazebrook, and K. G. Zografos, “Applications of stochastic modeling in air traffic management: Methods, challenges and opportunities for solving air traffic problems under uncertainty,” Euro. J. Oper. Res., pp. 1–26, 2021

work page 2021
[6]

On maximizing safety in stochastic aircraft trajectory planning with uncer- tain thunderstorm development,

D. Hentzen, M. Kamgarpour, M. Soler, and D. Gonz ´alez-Arribas, “On maximizing safety in stochastic aircraft trajectory planning with uncer- tain thunderstorm development,” Aerospace Science and Technology , vol. 79, pp. 543–553, 2018

work page 2018
[7]

Markov decision process routing games,

D. Calderone and S. Sastry, “Markov decision process routing games,” in Int’l Conf. Cyber-Phys. Syst., 2017, pp. 273–279

work page 2017
[8]

Aircraft approach management using reachability and dynamic programming,

A. P. Vinod, S. Yamazaki, A. Chakrabarty, N. Yoshikawa, and S. Di Cairano, “Aircraft approach management using reachability and dynamic programming,” in 2024 American Control Conference (ACC). IEEE, 2024, pp. 318–324

work page 2024
[9]

Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems,

A. Abate, M. Prandini, J. Lygeros, and S. Sastry, “Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems,” Automatica, 2008

work page 2008
[10]

Verification of discrete time stochastic hybrid systems: A stochastic reach-avoid decision problem,

S. Summers and J. Lygeros, “Verification of discrete time stochastic hybrid systems: A stochastic reach-avoid decision problem,” Automat- ica, vol. 46, no. 12, pp. 1951–1961, 2010

work page 1951
[11]

Stochastic reachability of a target tube: Theory and computation,

A. Vinod and M. Oishi, “Stochastic reachability of a target tube: Theory and computation,” Automatica, 2021

work page 2021
[12]

Computing optimal joint chance constrained control policies,

N. Schmid, M. Fochesato, S. H. Li, T. Sutter, and J. Lygeros, “Computing optimal joint chance constrained control policies,” arXiv preprint arXiv:2312.10495, 2023

work page arXiv 2023
[13]

Multiplayer reach-avoid games via pairwise outcomes,

M. Chen, Z. Zhou, and C. J. Tomlin, “Multiplayer reach-avoid games via pairwise outcomes,” IEEE Transactions on Automatic Control , vol. 62, no. 3, pp. 1451–1457, 2016

work page 2016
[14]

The pursuit-evasion-defense differential game in dynamic constrained environments,

J. F. Fisac and S. S. Sastry, “The pursuit-evasion-defense differential game in dynamic constrained environments,” in 2015 54th IEEE conference on decision and control (CDC) . IEEE, 2015, pp. 4549– 4556

work page 2015
[15]

Safe platooning of unmanned aerial vehicles via reachability,

M. Chen, Q. Hu, C. Mackin, J. F. Fisac, and C. J. Tomlin, “Safe platooning of unmanned aerial vehicles via reachability,” in 2015 54th IEEE conference on decision and control (CDC) . IEEE, 2015, pp. 4695–4701

work page 2015
[16]

Hamilton–jacobi formulation for reach– avoid differential games,

K. Margellos and J. Lygeros, “Hamilton–jacobi formulation for reach– avoid differential games,” IEEE Transactions on automatic control , vol. 56, no. 8, pp. 1849–1861, 2011

work page 2011
[17]

Hierarchical game-theoretic planning for autonomous vehicles,

J. F. Fisac, E. Bronstein, E. Stefansson, D. Sadigh, S. S. Sastry, and A. D. Dragan, “Hierarchical game-theoretic planning for autonomous vehicles,” in2019 International conference on robotics and automation (ICRA). IEEE, 2019, pp. 9590–9596

work page 2019
[18]

The existence, uniqueness and stability of traffic equilibria,

M. J. Smith, “The existence, uniqueness and stability of traffic equilibria,” Transp. Res. Part B: Method. , pp. 295–304, 1979

work page 1979
[19]

Roughgarden, Selfish routing and the price of anarchy

T. Roughgarden, Selfish routing and the price of anarchy. MIT press, 2005

work page 2005
[20]

Congestion-aware path coordination game with markov decision process dynamics,

S. H. Li, D. Calderone, and B. Ac ¸ıkmes ¸e, “Congestion-aware path coordination game with markov decision process dynamics,” IEEE Control Systems Letters , vol. 7, pp. 431–436, 2022

work page 2022
[21]

Potential games,

D. Monderer and L. S. Shapley, “Potential games,” Games and economic behavior, 1996

work page 1996
[22]

Adaptive constraint satisfaction for markov decision process congestion games: Application to transportation networks,

S. H. Li, Y . Yu, N. I. Miguel, D. Calderone, L. J. Ratliff, and B. Ac ¸ıkmes ¸e, “Adaptive constraint satisfaction for markov decision process congestion games: Application to transportation networks,” Automatica, vol. 151, p. 110879, 2023

work page 2023
[23]

State based potential games,

J. R. Marden, “State based potential games,” Automatica, vol. 48, no. 12, pp. 3075–3088, 2012. APPENDIX A. Proof of Proposition 1 Proof. For each s ∈ S N , we prove the following recursive identity for (15): if V π t+1(s) satisfies (17), then V π t (s) satis- fies (17). 7 If V π t+1(s) satisfies (17), it is equivalent to V π t+1(s) = X τ RT t+1 (s, τ) Y j...

work page 2012

[1] [1]

A proposed taxonomy for advanced air mobility,

L. A. Garrow, B. German, N. T. Schwab, M. D. Patterson, N. Men- donca, Y . O. Gawdiak, and J. R. Murphy, “A proposed taxonomy for advanced air mobility,” in AIAA Aviation 2022 Forum, 2022, p. 3321

work page 2022

[2] [2]

Advanced air mobility: Demand analysis and market potential of the airport shuttle and air taxi markets,

R. Goyal, C. Reiche, C. Fernando, and A. Cohen, “Advanced air mobility: Demand analysis and market potential of the airport shuttle and air taxi markets,” Sustainability, vol. 13, no. 13, p. 7421, 2021

work page 2021

[3] [3]

Safety considerations for operation of unmanned aerial vehicles in the national airspace system,

R. E. Weibel and R. J. Hansman, “Safety considerations for operation of unmanned aerial vehicles in the national airspace system,” Tech. Rep., 2006

work page 2006

[4] [4]

J. P. McGee, A. S. Mavor, and C. D. Wickens, Flight to the future: Human factors in air traffic control. National Academies Press, 1997

work page 1997

[5] [5]

Applications of stochastic modeling in air traffic management: Methods, challenges and opportunities for solving air traffic problems under uncertainty,

R. Shone, K. Glazebrook, and K. G. Zografos, “Applications of stochastic modeling in air traffic management: Methods, challenges and opportunities for solving air traffic problems under uncertainty,” Euro. J. Oper. Res., pp. 1–26, 2021

work page 2021

[6] [6]

On maximizing safety in stochastic aircraft trajectory planning with uncer- tain thunderstorm development,

D. Hentzen, M. Kamgarpour, M. Soler, and D. Gonz ´alez-Arribas, “On maximizing safety in stochastic aircraft trajectory planning with uncer- tain thunderstorm development,” Aerospace Science and Technology , vol. 79, pp. 543–553, 2018

work page 2018

[7] [7]

Markov decision process routing games,

D. Calderone and S. Sastry, “Markov decision process routing games,” in Int’l Conf. Cyber-Phys. Syst., 2017, pp. 273–279

work page 2017

[8] [8]

Aircraft approach management using reachability and dynamic programming,

A. P. Vinod, S. Yamazaki, A. Chakrabarty, N. Yoshikawa, and S. Di Cairano, “Aircraft approach management using reachability and dynamic programming,” in 2024 American Control Conference (ACC). IEEE, 2024, pp. 318–324

work page 2024

[9] [9]

Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems,

A. Abate, M. Prandini, J. Lygeros, and S. Sastry, “Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems,” Automatica, 2008

work page 2008

[10] [10]

Verification of discrete time stochastic hybrid systems: A stochastic reach-avoid decision problem,

S. Summers and J. Lygeros, “Verification of discrete time stochastic hybrid systems: A stochastic reach-avoid decision problem,” Automat- ica, vol. 46, no. 12, pp. 1951–1961, 2010

work page 1951

[11] [11]

Stochastic reachability of a target tube: Theory and computation,

A. Vinod and M. Oishi, “Stochastic reachability of a target tube: Theory and computation,” Automatica, 2021

work page 2021

[12] [12]

Computing optimal joint chance constrained control policies,

N. Schmid, M. Fochesato, S. H. Li, T. Sutter, and J. Lygeros, “Computing optimal joint chance constrained control policies,” arXiv preprint arXiv:2312.10495, 2023

work page arXiv 2023

[13] [13]

Multiplayer reach-avoid games via pairwise outcomes,

M. Chen, Z. Zhou, and C. J. Tomlin, “Multiplayer reach-avoid games via pairwise outcomes,” IEEE Transactions on Automatic Control , vol. 62, no. 3, pp. 1451–1457, 2016

work page 2016

[14] [14]

The pursuit-evasion-defense differential game in dynamic constrained environments,

J. F. Fisac and S. S. Sastry, “The pursuit-evasion-defense differential game in dynamic constrained environments,” in 2015 54th IEEE conference on decision and control (CDC) . IEEE, 2015, pp. 4549– 4556

work page 2015

[15] [15]

Safe platooning of unmanned aerial vehicles via reachability,

M. Chen, Q. Hu, C. Mackin, J. F. Fisac, and C. J. Tomlin, “Safe platooning of unmanned aerial vehicles via reachability,” in 2015 54th IEEE conference on decision and control (CDC) . IEEE, 2015, pp. 4695–4701

work page 2015

[16] [16]

Hamilton–jacobi formulation for reach– avoid differential games,

K. Margellos and J. Lygeros, “Hamilton–jacobi formulation for reach– avoid differential games,” IEEE Transactions on automatic control , vol. 56, no. 8, pp. 1849–1861, 2011

work page 2011

[17] [17]

Hierarchical game-theoretic planning for autonomous vehicles,

J. F. Fisac, E. Bronstein, E. Stefansson, D. Sadigh, S. S. Sastry, and A. D. Dragan, “Hierarchical game-theoretic planning for autonomous vehicles,” in2019 International conference on robotics and automation (ICRA). IEEE, 2019, pp. 9590–9596

work page 2019

[18] [18]

The existence, uniqueness and stability of traffic equilibria,

M. J. Smith, “The existence, uniqueness and stability of traffic equilibria,” Transp. Res. Part B: Method. , pp. 295–304, 1979

work page 1979

[19] [19]

Roughgarden, Selfish routing and the price of anarchy

T. Roughgarden, Selfish routing and the price of anarchy. MIT press, 2005

work page 2005

[20] [20]

Congestion-aware path coordination game with markov decision process dynamics,

S. H. Li, D. Calderone, and B. Ac ¸ıkmes ¸e, “Congestion-aware path coordination game with markov decision process dynamics,” IEEE Control Systems Letters , vol. 7, pp. 431–436, 2022

work page 2022

[21] [21]

Potential games,

D. Monderer and L. S. Shapley, “Potential games,” Games and economic behavior, 1996

work page 1996

[22] [22]

Adaptive constraint satisfaction for markov decision process congestion games: Application to transportation networks,

S. H. Li, Y . Yu, N. I. Miguel, D. Calderone, L. J. Ratliff, and B. Ac ¸ıkmes ¸e, “Adaptive constraint satisfaction for markov decision process congestion games: Application to transportation networks,” Automatica, vol. 151, p. 110879, 2023

work page 2023

[23] [23]

State based potential games,

J. R. Marden, “State based potential games,” Automatica, vol. 48, no. 12, pp. 3075–3088, 2012. APPENDIX A. Proof of Proposition 1 Proof. For each s ∈ S N , we prove the following recursive identity for (15): if V π t+1(s) satisfies (17), then V π t (s) satis- fies (17). 7 If V π t+1(s) satisfies (17), it is equivalent to V π t+1(s) = X τ RT t+1 (s, τ) Y j...

work page 2012