Distributed TD Tracking with Linear Function Approximation over Directed Communication Networks

Haocheng Yang; Shengchao Zhao; Yongchao Liu

arxiv: 2605.04466 · v1 · submitted 2026-05-06 · 🧮 math.OC

Distributed TD Tracking with Linear Function Approximation over Directed Communication Networks

Haocheng Yang , Shengchao Zhao , Yongchao Liu This is my paper

Pith reviewed 2026-05-08 17:30 UTC · model grok-4.3

classification 🧮 math.OC

keywords multi-agent reinforcement learningtemporal difference learningdistributed algorithmsdirected graphspolicy evaluationlinear convergencevariance reductionfunction approximation

0 comments

The pith

The PP-DTD algorithm provides the first distributed TD method for MARL policy evaluation on directed graphs that matches single-agent convergence rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops PP-DTD, a new distributed algorithm that combines temporal difference learning with a push-pull mechanism for information exchange among agents connected by directed links. Variance reduction is added to stabilize the updates and accelerate convergence. The authors prove linear convergence to a small neighborhood of the optimum with constant learning rates and an O(1/T) rate with time-varying rates, for both independent samples and Markov chains. A sympathetic reader would care because many real multi-agent systems have asymmetric communication, such as in wireless or hierarchical setups, and this removes a major restriction from prior work. If true, it means agents can jointly evaluate policies efficiently without needing mutual communication channels.

Core claim

We study the policy evaluation problem in multi-agent reinforcement learning (MARL) over directed communication networks. We propose a Push-Pull-type distributed algorithm, named PP-DTD, for policy evaluation in MARL within the framework of temporal difference (TD) learning with linear function approximation. PP-DTD integrates TD learning with the Push-Pull mechanism to accommodate directed communication networks, and further utilizes variance reduction techniques to enhance both algorithmic stability and convergence rate. We show that PP-DTD achieves linear convergence to a neighborhood of the optimum under constant step-sizes and a convergence rate of O(T^{-1}) under decaying step-sizes wh

What carries the argument

PP-DTD, the Push-Pull distributed temporal difference algorithm that merges TD updates with push and pull operations for directed graphs plus variance reduction for improved rates.

Load-bearing premise

The communication network must be such that the push-pull operations allow information to flow in a way that enables consensus among all agents.

What would settle it

Running the algorithm on a directed graph with a clear information bottleneck where some agents cannot receive updates from others, and observing that the value function estimates do not converge to the claimed neighborhood.

Figures

Figures reproduced from arXiv: 2605.04466 by Haocheng Yang, Shengchao Zhao, Yongchao Liu.

**Figure 1.** Figure 1: i.i.d. setting. work together to cover all landmarks. In every step, each agent follows its policy to select an action from the set {up, down, left, right, stay} and receives a local reward based on its distance to its assigned landmark, along with an extra penalty for collisions with other agents. The underlying communication graph G with n agents is generated by adding random links to a ring network, wh… view at source ↗

read the original abstract

We study the policy evaluation problem in multi-agent reinforcement learning (MARL) over directed communication networks, where agents cooperate with each other to explore an unknown environment and accomplish a specific task. We propose a Push-Pull-type distributed algorithm, named PP-DTD, for policy evaluation in MARL within the framework of temporal difference (TD) learning with linear function approximation. PP-DTD integrates TD learning with the Push-Pull mechanism to accommodate directed communication networks, and further utilizes variance reduction techniques to enhance both algorithmic stability and convergence rate. We show that PP-DTD achieves linear convergence to a neighborhood of the optimum under constant step-sizes and a convergence rate of $\mathcal{O}({T^{-1}})$ under decaying step-sizes when the sample is independent and identically distributed or Markovian. To the best of our knowledge, PP-DTD is the first distributed algorithm for policy evaluation in MARL over directed graphs that achieves a comparable convergence rate to single-agent TD. The numerical experiments on cooperative navigation tasks demonstrate the robustness and effectiveness of PP-DTD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PP-DTD cleanly combines Push-Pull tracking with variance-reduced TD to get single-agent rates on directed graphs under standard assumptions.

read the letter

The main takeaway is that this paper gives a distributed TD algorithm for policy evaluation that works on directed networks and matches the convergence order of centralized variance-reduced TD. PP-DTD uses the Push-Pull mechanism to track the average gradient across agents with asymmetric links, then layers on variance reduction to remove the extra bias terms that usually appear in distributed settings. The result is linear convergence to a neighborhood with constant steps and O(1/T) with decaying steps, for both i.i.d. and Markovian samples. The full manuscript supplies the graph conditions (strongly connected digraph with positive left eigenvector) and shows how the variance reduction keeps the rate constants graph-dependent but of the same order as the single-agent case. That is the concrete advance over prior undirected distributed TD work. The numerical experiments on cooperative navigation tasks are straightforward and show the method is stable in practice. The analysis rests on standard RL assumptions rather than hidden self-referential steps, so there is no obvious circularity. The main limitation is that the neighborhood radius and the hidden constants grow with the graph's mixing properties, which is expected but means performance can degrade on poorly connected digraphs. The paper does not claim parameter-free rates or universal constants, so this is not a flaw in the stated results. Overall the work is for people building multi-agent systems with one-way communication links, such as sensor networks or robot teams. It is grounded enough and addresses a clear gap, so it should go to peer review rather than desk rejection.

Referee Report

0 major / 4 minor

Summary. The paper proposes PP-DTD, a Push-Pull distributed TD learning algorithm for policy evaluation in multi-agent RL over directed networks with linear function approximation. It integrates variance reduction to obtain linear convergence to a neighborhood under constant step-sizes and an O(T^{-1}) rate under decaying step-sizes, for both i.i.d. and Markovian sampling. The central claim is that this is the first such distributed algorithm on directed graphs achieving convergence rates of the same order as single-agent TD. Numerical experiments on cooperative navigation tasks are provided to illustrate performance.

Significance. If the stated rates hold under the supplied graph and sampling assumptions, the work is significant for extending centralized TD analysis to directed multi-agent settings common in real-world MARL. The combination of the Push-Pull mechanism with variance reduction to cancel distributed tracking bias while preserving the centralized rate order is a technical contribution. The explicit handling of strongly connected digraphs with positive left Perron vector strengthens the result relative to undirected consensus-based methods. No machine-checked proofs or open-source code are mentioned, but the derivation appears self-contained on standard assumptions.

minor comments (4)

[Section 2.2] Section 2.2: The linear function approximation setup reuses standard notation from single-agent TD but does not explicitly restate the projection operator onto the span of features; this creates minor ambiguity when the distributed tracking error is introduced in Eq. (8).
[Theorem 3.1] Theorem 3.1 and its proof: The constant in the O(T^{-1}) bound is stated to be graph-dependent yet of the same order as the centralized case; a short remark comparing the leading constant to the single-agent TD constant (e.g., via the spectral gap) would make the 'comparable' claim easier to verify.
[Figure 3] Figure 3: The error curves for different step-size schedules overlap in the log-scale plot; adding a zoomed inset or separate panels for the transient and asymptotic regimes would improve readability.
[Appendix B] Appendix B: The mixing-time argument for the Markovian case assumes the sampling chain mixes independently of the communication network; a one-sentence justification that the product of the two mixing rates remains bounded would clarify the overall constant.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of our work on PP-DTD and the recommendation for minor revision. We appreciate the recognition that the combination of the Push-Pull mechanism with variance reduction enables rates comparable to single-agent TD under the stated assumptions on directed graphs and sampling.

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper derives convergence of PP-DTD by integrating standard TD(0) with linear function approximation, the established Push-Pull tracking mechanism for directed graphs, and variance-reduction corrections that cancel extra bias terms arising from distributed updates. All rate statements (linear convergence under constant steps, O(T^{-1}) under decaying steps) are obtained from standard stochastic approximation arguments under i.i.d. or Markovian sampling plus the graph assumption of strong connectivity with positive left Perron vector; none of these steps reduce a claimed prediction to a fitted quantity or to a self-citation by construction. The comparability claim to single-agent TD follows directly from the bias-cancellation analysis rather than from renaming or re-using prior fitted results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are listed. The method likely inherits standard assumptions from distributed optimization (graph connectivity) and RL (linear FA, bounded variance) without introducing new entities.

pith-pipeline@v0.9.0 · 5483 in / 1146 out tokens · 40282 ms · 2026-05-08T17:30:56.649922+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

[1]

Human-level control through deep reinforcement learning,

V . Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al. , “Human-level control through deep reinforcement learning,” nature, vol. 518, no. 7540, pp. 529–533, 2015

work page 2015
[2]

Mastering the game of go with deep neural networks and tree search,

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V . Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” nature, vol. 529, no. 7587, pp. 484–489, 2016

work page 2016
[3]

A systematic study on reinforcement learning based applications,

K. Sivamayil, E. Rajasekar, B. Aljafari, S. Nikolovski, S. Vairavasun- daram, and I. Vairavasundaram, “A systematic study on reinforcement learning based applications,” Energies, vol. 16, no. 3, p. 1512, 2023

work page 2023
[4]

Deep reinforcement learning for swarm systems,

M. Hüttenrauch, A. Šoši ´c, and G. Neumann, “Deep reinforcement learning for swarm systems,” Journal of Machine Learning Research , vol. 20, no. 54, pp. 1–31, 2019

work page 2019
[5]

Grand- master level in starcraft ii using multi-agent reinforcement learning,

O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev et al., “Grand- master level in starcraft ii using multi-agent reinforcement learning,” nature, vol. 575, no. 7782, pp. 350–354, 2019

work page 2019
[6]

A review of the applications of multi- agent reinforcement learning in smart factories,

F. Bahrpeyma and D. Reichelt, “A review of the applications of multi- agent reinforcement learning in smart factories,” Frontiers in Robotics and AI, vol. 9, p. 1027340, 2022

work page 2022
[7]

Learning to predict by the methods of temporal differ- ences,

R. S. Sutton, “Learning to predict by the methods of temporal differ- ences,” Machine learning, vol. 3, no. 1, pp. 9–44, 1988

work page 1988
[8]

R. S. Sutton, A. G. Barto et al., Reinforcement learning: An introduction. MIT press Cambridge, 1998, vol. 1, no. 1

work page 1998
[9]

Distributed reinforcement learning via gossip,

A. Mathkar and V . S. Borkar, “Distributed reinforcement learning via gossip,” IEEE Transactions on Automatic Control , vol. 62, no. 3, pp. 1465–1470, 2017

work page 2017
[10]

Multi-agent temporal-difference learning with linear function approximation: Weak convergence under time-varying network topologies,

M. S. Stankovi ´c and S. S. Stankovi ´c, “Multi-agent temporal-difference learning with linear function approximation: Weak convergence under time-varying network topologies,” in 2016 American control conference (ACC). IEEE, 2016, pp. 167–172

work page 2016
[11]

Finite-time analysis of dis- tributed td (0) with linear function approximation on multi-agent rein- forcement learning,

T. Doan, S. Maguluri, and J. Romberg, “Finite-time analysis of dis- tributed td (0) with linear function approximation on multi-agent rein- forcement learning,” in International Conference on Machine Learning . PMLR, 2019, pp. 1626–1635

work page 2019
[12]

Finite- time analysis of decentralized temporal-difference learning with linear function approximation,

J. Sun, G. Wang, G. B. Giannakis, Q. Yang, and Z. Yang, “Finite- time analysis of decentralized temporal-difference learning with linear function approximation,” in International Conference on Artificial Intel- ligence and Statistics . PMLR, 2020, pp. 4485–4495

work page 2020
[13]

Multi-agent off-policy tdc with near- optimal sample and communication complexity,

Z. Chen, Y . Zhou, and R. Chen, “Multi-agent off-policy tdc with near- optimal sample and communication complexity,” in 2021 55th Asilomar Conference on Signals, Systems, and Computers , 2021, pp. 504–508

work page 2021
[14]

Decentralized td tracking with linear function approximation and its finite-time analysis,

G. Wang, S. Lu, G. Giannakis, G. Tesauro, and J. Sun, “Decentralized td tracking with linear function approximation and its finite-time analysis,” Advances in neural information processing systems, vol. 33, pp. 13 762– 13 772, 2020

work page 2020
[15]

Fast gradient-descent methods for temporal-difference learning with linear function approximation,

R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora, “Fast gradient-descent methods for temporal-difference learning with linear function approximation,” in Proceedings of the 26th International Conference on Machine Learning , 2009, pp. 993–1000

work page 2009
[16]

Distributed consensus- based multi-agent temporal-difference learning,

M. S. Stankovi ´c, M. Beko, and S. S. Stankovi ´c, “Distributed consensus- based multi-agent temporal-difference learning,” Automatica, vol. 151, p. 110922, 2023

work page 2023
[17]

Finite-time performance of distributed temporal-difference learning with linear function approx- imation,

T. T. Doan, S. T. Maguluri, and J. Romberg, “Finite-time performance of distributed temporal-difference learning with linear function approx- imation,” SIAM Journal on Mathematics of Data Science , vol. 3, no. 1, pp. 298–320, 2021

work page 2021
[18]

Decentralized adaptive td ( λ) learning with linear function approximation: nonasymp- totic analysis,

J. Zhu, T. Mao, M. Zhang, Q. Ge, Q. Wu, and K. Li, “Decentralized adaptive td ( λ) learning with linear function approximation: nonasymp- totic analysis,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 54, no. 8, pp. 4630–4641, 2024

work page 2024
[19]

Byzantine-resilient decentralized policy evaluation with linear function approximation,

Z. Wu, H. Shen, T. Chen, and Q. Ling, “Byzantine-resilient decentralized policy evaluation with linear function approximation,” IEEE Transac- tions on Signal Processing , vol. 69, pp. 3839–3853, 2021

work page 2021
[20]

On the convergence of adam and beyond,

S. J. Reddi, S. Kale, and S. Kumar, “On the convergence of adam and beyond,” in International Conference on Learning Representations , 2018

work page 2018
[21]

Finite-time error bounds for distributed linear stochastic approximation,

Y . Lin, V . Gupta, and J. Liu, “Finite-time error bounds for distributed linear stochastic approximation,” Automatica, vol. 159, p. 111368, 2024

work page 2024
[22]

A finite time analysis of temporal difference learning with linear function approximation,

J. Bhandari, D. Russo, and R. Singal, “A finite time analysis of temporal difference learning with linear function approximation,” Operations Research, vol. 69, no. 3, pp. 950–973, 2021

work page 2021
[23]

Push–pull gradient methods for distributed optimization in networks,

S. Pu, W. Shi, J. Xu, and A. Nedi ´c, “Push–pull gradient methods for distributed optimization in networks,” IEEE Transactions on Automatic Control, vol. 66, no. 1, pp. 1–16, 2020

work page 2020
[24]

Momentum-based variance reduction in non-convex sgd,

A. Cutkosky and F. Orabona, “Momentum-based variance reduction in non-convex sgd,” Advances in neural information processing systems , vol. 32, 2019

work page 2019
[25]

An analysis of temporal-difference learning with function approximation,

J. Tsitsiklis and B. Van Roy, “An analysis of temporal-difference learning with function approximation,” IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 674–690, 1997

work page 1997
[26]

Achieving geometric convergence for distributed optimization over time-varying graphs,

A. Nedi ´c, A. Olshevsky, and W. Shi, “Achieving geometric convergence for distributed optimization over time-varying graphs,” SIAM Journal on Optimization, vol. 27, no. 4, pp. 2597–2633, 2017

work page 2017
[27]

Compressed gradient tracking for decentralized optimization over general directed networks,

Z. Song, L. Shi, S. Pu, and M. Yan, “Compressed gradient tracking for decentralized optimization over general directed networks,” IEEE Transactions on Signal Processing , vol. 70, pp. 1775–1787, 2022

work page 2022
[28]

Multi-agent actor-critic for mixed cooperative-competitive environ- ments,

R. Lowe, Y . I. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environ- ments,” Advances in neural information processing systems , vol. 30, 2017

work page 2017

[1] [1]

Human-level control through deep reinforcement learning,

V . Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al. , “Human-level control through deep reinforcement learning,” nature, vol. 518, no. 7540, pp. 529–533, 2015

work page 2015

[2] [2]

Mastering the game of go with deep neural networks and tree search,

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V . Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” nature, vol. 529, no. 7587, pp. 484–489, 2016

work page 2016

[3] [3]

A systematic study on reinforcement learning based applications,

K. Sivamayil, E. Rajasekar, B. Aljafari, S. Nikolovski, S. Vairavasun- daram, and I. Vairavasundaram, “A systematic study on reinforcement learning based applications,” Energies, vol. 16, no. 3, p. 1512, 2023

work page 2023

[4] [4]

Deep reinforcement learning for swarm systems,

M. Hüttenrauch, A. Šoši ´c, and G. Neumann, “Deep reinforcement learning for swarm systems,” Journal of Machine Learning Research , vol. 20, no. 54, pp. 1–31, 2019

work page 2019

[5] [5]

Grand- master level in starcraft ii using multi-agent reinforcement learning,

O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev et al., “Grand- master level in starcraft ii using multi-agent reinforcement learning,” nature, vol. 575, no. 7782, pp. 350–354, 2019

work page 2019

[6] [6]

A review of the applications of multi- agent reinforcement learning in smart factories,

F. Bahrpeyma and D. Reichelt, “A review of the applications of multi- agent reinforcement learning in smart factories,” Frontiers in Robotics and AI, vol. 9, p. 1027340, 2022

work page 2022

[7] [7]

Learning to predict by the methods of temporal differ- ences,

R. S. Sutton, “Learning to predict by the methods of temporal differ- ences,” Machine learning, vol. 3, no. 1, pp. 9–44, 1988

work page 1988

[8] [8]

R. S. Sutton, A. G. Barto et al., Reinforcement learning: An introduction. MIT press Cambridge, 1998, vol. 1, no. 1

work page 1998

[9] [9]

Distributed reinforcement learning via gossip,

A. Mathkar and V . S. Borkar, “Distributed reinforcement learning via gossip,” IEEE Transactions on Automatic Control , vol. 62, no. 3, pp. 1465–1470, 2017

work page 2017

[10] [10]

Multi-agent temporal-difference learning with linear function approximation: Weak convergence under time-varying network topologies,

M. S. Stankovi ´c and S. S. Stankovi ´c, “Multi-agent temporal-difference learning with linear function approximation: Weak convergence under time-varying network topologies,” in 2016 American control conference (ACC). IEEE, 2016, pp. 167–172

work page 2016

[11] [11]

Finite-time analysis of dis- tributed td (0) with linear function approximation on multi-agent rein- forcement learning,

T. Doan, S. Maguluri, and J. Romberg, “Finite-time analysis of dis- tributed td (0) with linear function approximation on multi-agent rein- forcement learning,” in International Conference on Machine Learning . PMLR, 2019, pp. 1626–1635

work page 2019

[12] [12]

Finite- time analysis of decentralized temporal-difference learning with linear function approximation,

J. Sun, G. Wang, G. B. Giannakis, Q. Yang, and Z. Yang, “Finite- time analysis of decentralized temporal-difference learning with linear function approximation,” in International Conference on Artificial Intel- ligence and Statistics . PMLR, 2020, pp. 4485–4495

work page 2020

[13] [13]

Multi-agent off-policy tdc with near- optimal sample and communication complexity,

Z. Chen, Y . Zhou, and R. Chen, “Multi-agent off-policy tdc with near- optimal sample and communication complexity,” in 2021 55th Asilomar Conference on Signals, Systems, and Computers , 2021, pp. 504–508

work page 2021

[14] [14]

Decentralized td tracking with linear function approximation and its finite-time analysis,

G. Wang, S. Lu, G. Giannakis, G. Tesauro, and J. Sun, “Decentralized td tracking with linear function approximation and its finite-time analysis,” Advances in neural information processing systems, vol. 33, pp. 13 762– 13 772, 2020

work page 2020

[15] [15]

Fast gradient-descent methods for temporal-difference learning with linear function approximation,

R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora, “Fast gradient-descent methods for temporal-difference learning with linear function approximation,” in Proceedings of the 26th International Conference on Machine Learning , 2009, pp. 993–1000

work page 2009

[16] [16]

Distributed consensus- based multi-agent temporal-difference learning,

M. S. Stankovi ´c, M. Beko, and S. S. Stankovi ´c, “Distributed consensus- based multi-agent temporal-difference learning,” Automatica, vol. 151, p. 110922, 2023

work page 2023

[17] [17]

Finite-time performance of distributed temporal-difference learning with linear function approx- imation,

T. T. Doan, S. T. Maguluri, and J. Romberg, “Finite-time performance of distributed temporal-difference learning with linear function approx- imation,” SIAM Journal on Mathematics of Data Science , vol. 3, no. 1, pp. 298–320, 2021

work page 2021

[18] [18]

Decentralized adaptive td ( λ) learning with linear function approximation: nonasymp- totic analysis,

J. Zhu, T. Mao, M. Zhang, Q. Ge, Q. Wu, and K. Li, “Decentralized adaptive td ( λ) learning with linear function approximation: nonasymp- totic analysis,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 54, no. 8, pp. 4630–4641, 2024

work page 2024

[19] [19]

Byzantine-resilient decentralized policy evaluation with linear function approximation,

Z. Wu, H. Shen, T. Chen, and Q. Ling, “Byzantine-resilient decentralized policy evaluation with linear function approximation,” IEEE Transac- tions on Signal Processing , vol. 69, pp. 3839–3853, 2021

work page 2021

[20] [20]

On the convergence of adam and beyond,

S. J. Reddi, S. Kale, and S. Kumar, “On the convergence of adam and beyond,” in International Conference on Learning Representations , 2018

work page 2018

[21] [21]

Finite-time error bounds for distributed linear stochastic approximation,

Y . Lin, V . Gupta, and J. Liu, “Finite-time error bounds for distributed linear stochastic approximation,” Automatica, vol. 159, p. 111368, 2024

work page 2024

[22] [22]

A finite time analysis of temporal difference learning with linear function approximation,

J. Bhandari, D. Russo, and R. Singal, “A finite time analysis of temporal difference learning with linear function approximation,” Operations Research, vol. 69, no. 3, pp. 950–973, 2021

work page 2021

[23] [23]

Push–pull gradient methods for distributed optimization in networks,

S. Pu, W. Shi, J. Xu, and A. Nedi ´c, “Push–pull gradient methods for distributed optimization in networks,” IEEE Transactions on Automatic Control, vol. 66, no. 1, pp. 1–16, 2020

work page 2020

[24] [24]

Momentum-based variance reduction in non-convex sgd,

A. Cutkosky and F. Orabona, “Momentum-based variance reduction in non-convex sgd,” Advances in neural information processing systems , vol. 32, 2019

work page 2019

[25] [25]

An analysis of temporal-difference learning with function approximation,

J. Tsitsiklis and B. Van Roy, “An analysis of temporal-difference learning with function approximation,” IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 674–690, 1997

work page 1997

[26] [26]

Achieving geometric convergence for distributed optimization over time-varying graphs,

A. Nedi ´c, A. Olshevsky, and W. Shi, “Achieving geometric convergence for distributed optimization over time-varying graphs,” SIAM Journal on Optimization, vol. 27, no. 4, pp. 2597–2633, 2017

work page 2017

[27] [27]

Compressed gradient tracking for decentralized optimization over general directed networks,

Z. Song, L. Shi, S. Pu, and M. Yan, “Compressed gradient tracking for decentralized optimization over general directed networks,” IEEE Transactions on Signal Processing , vol. 70, pp. 1775–1787, 2022

work page 2022

[28] [28]

Multi-agent actor-critic for mixed cooperative-competitive environ- ments,

R. Lowe, Y . I. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environ- ments,” Advances in neural information processing systems , vol. 30, 2017

work page 2017