On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments

Leo Muxing Wang; Lili Su; Pengkun Yang

arxiv: 2409.03897 · v3 · pith:QDRCHRAKnew · submitted 2024-09-05 · 💻 cs.LG · cs.DC


Pith reviewed 2026-05-23 20:37 UTC · model grok-4.3


keywords: federated Q-learning, convergence rates, environmental heterogeneity, synchronous averaging, error bounds, reinforcement learning, multi-agent systems


The pith

In heterogeneous environments, federated Q-learning cannot achieve error decay faster than Θ(E/T) when agents average every E iterations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies synchronous federated Q-learning in which K agents average their local Q-estimates every E iterations while facing heterogeneous environments. It shows that sampling randomness still yields linear speedup in K, but that E greater than 1 produces clear degradation unlike in homogeneous cases. The error trajectory is tracked in detail and shown to approach zero with growing T, yet a matching lower bound of order E/T on the infinity-norm error is established for many stepsize schedules. Experiments further reveal a two-phase pattern of fast initial decay followed by an uptick and plateau.

Core claim

The paper proves that in the presence of environmental heterogeneity, the ℓ_∞ norm of the error in synchronous federated Q-learning cannot decay faster than Θ(E/T) for a wide range of stepsizes, where E is the number of local iterations between averaging steps and T is the total number of iterations. This limit is fundamental and not due to analysis looseness. Additionally, the convergence exhibits a two-phase behavior with rapid initial decay followed by stabilization.

What carries the argument

Synchronous periodic averaging of local Q-estimates every E iterations, which interacts with persistent environmental heterogeneity to sustain differences across agents.
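The averaging mechanism can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not taken from the paper: each agent holds a small random tabular MDP built by the hypothetical helper `make_env`, every state-action pair is updated once per iteration from one sampled transition, the stepsize is constant, and all function names are illustrative.

```python
import random

def make_env(num_states, num_actions, rng):
    """Hypothetical random tabular MDP: P[s][a] is a next-state distribution, R[s][a] lies in [0, 1]."""
    P, R = [], []
    for _ in range(num_states):
        P_s, R_s = [], []
        for _ in range(num_actions):
            w = [rng.random() for _ in range(num_states)]
            total = sum(w)
            P_s.append([x / total for x in w])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R

def local_step(Q, env, gamma, stepsize, rng):
    """One local Q-learning step: update every (s, a) from a single sampled next state."""
    P, R = env
    states = range(len(Q))
    new_Q = [row[:] for row in Q]
    for s in states:
        for a in range(len(Q[s])):
            s_next = rng.choices(states, weights=P[s][a])[0]
            target = R[s][a] + gamma * max(Q[s_next])
            new_Q[s][a] = (1 - stepsize) * Q[s][a] + stepsize * target
    return new_Q

def average(Qs):
    """Entrywise mean of the agents' Q-tables."""
    K = len(Qs)
    return [[sum(Q[s][a] for Q in Qs) / K for a in range(len(Qs[0][s]))]
            for s in range(len(Qs[0]))]

def federated_q_learning(envs, T, E, gamma=0.9, stepsize=0.1, seed=0):
    """Run T iterations; all agents replace their table with the average every E iterations."""
    rng = random.Random(seed)
    S, A = len(envs[0][0]), len(envs[0][0][0])
    Qs = [[[0.0] * A for _ in range(S)] for _ in envs]
    for t in range(1, T + 1):
        Qs = [local_step(Q, env, gamma, stepsize, rng) for Q, env in zip(Qs, envs)]
        if t % E == 0:
            avg = average(Qs)
            Qs = [[row[:] for row in avg] for _ in envs]
    return Qs
```

Calling `federated_q_learning([make_env(3, 2, random.Random(k)) for k in range(4)], T=40, E=10)` returns four identical tables, because the run ends on an averaging step. Between averaging steps the tables drift apart, since each agent samples from its own environment.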

If this is right

Sampling errors continue to enjoy linear speedup in the number of agents K.
Larger averaging intervals E slow the overall error decay proportionally.
The error still reaches zero as T grows, but only at the reduced rate set by E.
Switching stepsizes after the observed phase-transition point improves total convergence speed.
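The stepsize switch in the last point can be sketched as follows. The page does not say how the phase-transition time is estimated, so the heuristic here (the minimum of a smoothed error trace, which is only available in simulation where the target is known) and the names `estimate_transition` and `two_phase_stepsize` are assumptions made for illustration.

```python
def moving_average(values, window):
    """Trailing moving average; early entries average over the samples seen so far."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out

def estimate_transition(errors, window=5):
    """Assumed heuristic: the iteration at which the smoothed error is smallest."""
    smoothed = moving_average(errors, window)
    return min(range(len(smoothed)), key=smoothed.__getitem__)

def two_phase_stepsize(t, t_switch, first, second):
    """Use stepsize `first` before the estimated transition and `second` from then on."""
    return first if t < t_switch else second

# Synthetic error trace: fast decay, then an uptick and a plateau.
trace = [10, 8, 6, 4, 2, 1, 2, 3, 3, 3]
t_switch = estimate_transition(trace, window=1)
schedule = [two_phase_stepsize(t, t_switch, first=0.5, second=0.05) for t in range(len(trace))]
```

On this synthetic trace the switch lands at the minimum of the error, so the larger stepsize is used only during the fast-decay phase.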

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Communication frequency may need to increase with the degree of environmental heterogeneity to preserve fast convergence.
Analogous rate ceilings could appear in other federated reinforcement-learning methods that rely on periodic model averaging.
Heterogeneity-aware schedules that adjust E over time might reduce the slowdown without raising communication cost.

Load-bearing premise

The analysis relies on synchronous periodic averaging where environmental heterogeneity creates persistent differences in local Q-estimates that infrequent communication does not fully cancel.
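A noiseless toy example makes this premise concrete. The sketch below is not the paper's construction: it assumes two agents, two states, a single action, expected (sampling-free) updates, a constant stepsize, and it takes the fixed point reached with E = 1 as the reference. The function names and the `stay` and `swap` transition matrices are hypothetical.

```python
def local_round(Q, P, r, gamma, stepsize, E):
    """E noiseless local steps: Q <- (1 - stepsize) * Q + stepsize * (r + gamma * P Q)."""
    n = len(Q)
    for _ in range(E):
        PQ = [sum(P[s][j] * Q[j] for j in range(n)) for s in range(n)]
        Q = [(1 - stepsize) * Q[s] + stepsize * (r[s] + gamma * PQ[s]) for s in range(n)]
    return Q

def federated_fixed_point(envs, E, gamma=0.9, stepsize=0.5, rounds=2000):
    """Repeat (E local steps per agent, then average) until the shared iterate settles."""
    n = len(envs[0][1])
    Q = [0.0] * n
    for _ in range(rounds):
        local_tables = [local_round(Q, P, r, gamma, stepsize, E) for P, r in envs]
        Q = [sum(q[s] for q in local_tables) / len(envs) for s in range(n)]
    return Q

stay = [[1.0, 0.0], [0.0, 1.0]]   # agent 1 always stays in its state
swap = [[0.0, 1.0], [1.0, 0.0]]   # agent 2 always moves to the other state
envs = [(stay, [1.0, 0.0]), (swap, [0.0, 0.0])]

q_e1 = federated_fixed_point(envs, E=1)   # approximately [2.75, 2.25]
q_e2 = federated_fixed_point(envs, E=2)   # approximately [2.945, 2.055]
gap = max(abs(a - b) for a, b in zip(q_e1, q_e2))
```

With E = 1 the iterate settles at the fixed point of the averaged environment (reward [0.5, 0], uniform transitions). With E = 2 and the same constant stepsize it settles somewhere else, about 0.195 away in the max norm, even though there is no sampling noise at all. The gap comes purely from the mismatch between the two transition matrices.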

What would settle it

An experiment in which the ℓ_∞ error in a heterogeneous multi-agent Q-learning run with E>1 decays faster than order E/T for large T would contradict the proven lower bound.
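One hedged way to run such a check is to fit the slope of the error on a log-log scale over the tail of a run. The helper `loglog_slope` below is an illustrative least-squares fit, not something taken from the paper, and the trace is synthetic.

```python
import math

def loglog_slope(ts, errs):
    """Least-squares slope of log(err) against log(t)."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(e) for e in errs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic trace that decays exactly like E / t, so the fitted slope is close to -1.
E = 10
ts = [1000 * i for i in range(1, 11)]
errs = [E / t for t in ts]
slope = loglog_slope(ts, errs)
```

A fitted slope clearly below -1 on the large-T tail of a real heterogeneous run with E > 1 would be the kind of evidence described above. A slope near -1 or above is consistent with the bound.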

Figures

Figures reproduced from arXiv: 2409.03897 by Leo Muxing Wang, Lili Su, Pengkun Yang.

**Figure 1.** A federated learning system instance.

**Figure 2.** The ℓ∞ error of different constant stepsizes under the heterogeneous and homogeneous settings.

**Figure 3.** Heterogeneous environments with varying E. From left to right, E = 1, 20, 40, and ∞, respectively.

**Figure 4.** Choosing different stepsizes for phases 1 …

**Figure 5.** Homogeneous FQL with varying E.

**Figure 6.** Convergence performance of different tolerance levels of different stepsize choices. The horizontal dashed lines represent the tolerance levels not met, while the vertical dashed lines indicate the iterations at which the training processes meet the corresponding tolerance levels.

Original abstract

Large-scale multi-agent systems are often deployed across wide geographic areas, where agents interact with heterogeneous environments. There is an emerging interest in understanding the role of heterogeneity in the performance of the federated versions of classic reinforcement learning algorithms. In this paper, we study synchronous federated Q-learning, which aims to learn an optimal Q-function by having $K$ agents average their local Q-estimates per $E$ iterations. We observe an interesting phenomenon on the convergence speeds in terms of $K$ and $E$. Similar to the homogeneous environment settings, there is a linear speed-up concerning $K$ in reducing the errors that arise from sampling randomness. Yet, in sharp contrast to the homogeneous settings, $E>1$ leads to significant performance degradation. Specifically, we provide a fine-grained characterization of the error evolution in the presence of environmental heterogeneity, which decays to zero as the number of iterations $T$ increases. The slow convergence of having $E>1$ turns out to be fundamental rather than an artifact of our analysis. We prove that, for a wide range of stepsizes, the $\ell_{\infty}$ norm of the error cannot decay faster than $\Theta (E/T)$. In addition, our experiments demonstrate that the convergence exhibits an interesting two-phase phenomenon. For any given stepsize, there is a sharp phase-transition of the convergence: the error decays rapidly in the beginning yet later bounces up and stabilizes. Provided that the phase-transition time can be estimated, choosing different stepsizes for the two phases leads to faster overall convergence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper establishes a Θ(E/T) lower bound on the ℓ_∞ error for federated Q-learning under heterogeneity and documents a two-phase convergence pattern in experiments.


The main thing to know is that this work shows infrequent averaging (E>1) creates a fundamental slowdown to Θ(E/T) in heterogeneous federated Q-learning, unlike the homogeneous case, and experiments reveal error that drops fast then bounces and plateaus for fixed stepsizes. The fine-grained split between sampling noise and heterogeneity effects is the clearest new piece, along with the lower bound that holds across a range of stepsizes and the phase-transition observation that suggests switching stepsizes mid-training could help. The abstract presents these as direct consequences of environmental differences that produce persistent local Q-estimate gaps averaging cannot erase. The lower bound and the experimental pattern are the parts that stand out as useful for guiding communication frequency in multi-robot or edge settings. The analysis looks careful in separating the error sources, and the claim that the slowdown is not an artifact is stated plainly.

The main soft spot is whether the lower bound construction is general or tied to a narrow heterogeneity model, such as fixed additive biases in local Q* that periodic averaging leaves untouched. If other natural heterogeneity (identical Q* or biases that average to zero) allows faster rates, the Θ(E/T) result would apply only to adversarial cases rather than arbitrary heterogeneous MDPs. The two-phase behavior is shown experimentally but lacks a matching theoretical explanation in the abstract.

This paper is for researchers working on convergence rates in federated RL with real deployment heterogeneity. It is worth sending to peer review because the lower bound and the documented phase transition are specific enough to merit checking the proofs and experimental details, even if the generality of the bound needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper analyzes synchronous federated Q-learning with K agents that average local Q-estimates every E iterations in heterogeneous MDPs. It claims linear speedup in K for sampling-induced errors (similar to homogeneous cases) but shows that E > 1 causes fundamental degradation due to persistent heterogeneity effects. The authors provide a fine-grained error evolution characterization that decays with T, prove that the ℓ_∞ error cannot decay faster than Θ(E/T) for a wide range of stepsizes, and report experiments exhibiting a two-phase convergence phenomenon (rapid initial decay followed by stabilization) that can be mitigated by stepsize scheduling across phases.

Significance. If the lower bound holds beyond the specific heterogeneity constructions examined, the result establishes a fundamental communication-frequency limit in heterogeneous federated RL, which is relevant for large-scale multi-agent deployments. The two-phase experimental observation and the explicit separation of sampling vs. heterogeneity errors are useful for algorithm design. The paper supplies both upper-bound analysis and matching lower bounds plus reproducible experiments, which strengthens its contribution relative to purely asymptotic claims.

major comments (2)

[Lower-bound theorem] Lower-bound theorem (likely §4 or §5): the Θ(E/T) claim is presented as fundamental for heterogeneous environments, yet the construction appears to rely on MDPs where local optimal Q-functions differ by a fixed additive bias that periodic averaging cannot cancel. It is unclear whether the bound extends to general heterogeneous MDPs (e.g., those where optimal Q* coincide or where bias averages to zero under infrequent communication). This directly affects whether the result rules out faster rates under natural heterogeneity models.
[Error decomposition] Error decomposition (likely Eq. (X) in the convergence analysis): the characterization separates sampling variance (which benefits from K) from a heterogeneity term that scales with E. The proof must explicitly verify that the heterogeneity term remains non-vanishing for the stated stepsize range; otherwise the lower bound reduces to a fitted quantity rather than an independent limit.

minor comments (2)

[Abstract] The abstract states the lower bound holds 'for a wide range of stepsizes' without specifying the exact interval or dependence on E and heterogeneity parameters; this should be stated precisely in the theorem statement.
[Experiments] Figure captions for the two-phase experiments should include the exact stepsize values, E, K, and MDP parameters used so that the phase-transition time can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and insightful comments on our manuscript. We address each major comment in detail below, providing clarifications and indicating where revisions will be made.


Referee: [Lower-bound theorem] Lower-bound theorem (likely §4 or §5): the Θ(E/T) claim is presented as fundamental for heterogeneous environments, yet the construction appears to rely on MDPs where local optimal Q-functions differ by a fixed additive bias that periodic averaging cannot cancel. It is unclear whether the bound extends to general heterogeneous MDPs (e.g., those where optimal Q* coincide or where bias averages to zero under infrequent communication). This directly affects whether the result rules out faster rates under natural heterogeneity models.

Authors: Our lower bound is established for a specific class of heterogeneous MDPs in which the local optimal Q-functions differ by a constant additive bias that cannot be eliminated through periodic averaging. This construction is intended to capture scenarios where environmental heterogeneity leads to persistent discrepancies. We do not assert that the Θ(E/T) rate applies universally to all heterogeneous MDPs; indeed, when local optima coincide (reducing to the homogeneous case), faster convergence is possible. The result serves to highlight the potential fundamental limit imposed by communication frequency in the presence of non-cancellable heterogeneity. To address the concern, we will revise the manuscript to explicitly state the scope of the lower bound and provide a brief discussion on conditions under which faster rates might hold. revision: partial
Referee: [Error decomposition] Error decomposition (likely Eq. (X) in the convergence analysis): the characterization separates sampling variance (which benefits from K) from a heterogeneity term that scales with E. The proof must explicitly verify that the heterogeneity term remains non-vanishing for the stated stepsize range; otherwise the lower bound reduces to a fitted quantity rather than an independent limit.

Authors: The error decomposition in our analysis separates the sampling-induced error, which scales with 1/sqrt(KT) or similar, from the heterogeneity-induced term that scales with E/T. The proof shows that the heterogeneity term arises directly from the mismatch in local transition and reward functions and remains non-vanishing under the stepsize conditions considered (e.g., stepsizes of order 1/t). This is verified by bounding the bias term independently of the variance terms in the recursive error evolution. The lower bound then follows by showing that this term dominates and cannot be canceled. We will add an explicit statement or corollary confirming that the heterogeneity term is Ω(E/T) and does not vanish for the relevant stepsize range. revision: partial

Circularity Check

0 steps flagged

No significant circularity; lower bound is an independent proof


The paper's central claim is a mathematical proof that the ℓ_∞ error cannot decay faster than Θ(E/T) under the stated heterogeneity model and synchronous averaging. No quoted step reduces the claimed rate to a fitted parameter, self-citation chain, or definitional tautology. The lower bound is presented as derived from the model assumptions rather than constructed from the target quantity itself. The derivation chain remains self-contained against external benchmarks, and the two-phase convergence pattern is reported as an experimental observation rather than obtained by renaming or smuggling in the result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the standard synchronous federated Q-learning model with periodic averaging and the presence of environmental heterogeneity; no explicit free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: synchronous federated Q-learning with periodic averaging every E iterations across heterogeneous environments. The model and lower bound are stated for this specific synchronous averaging protocol.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We prove that, for a wide range of stepsizes, the ℓ∞ norm of the error cannot decay faster than Θ(E/T)"

IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "the slow convergence of having E>1 turns out to be fundamental rather than an artifact of our analysis"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 2 internal anchors

[1] Dimitri Bertsekas. Reinforcement learning and optimal control, volume 1. Athena Scientific, 2019.
[2] Dimitri Bertsekas and John N Tsitsiklis. Neuro-dynamic programming. Athena Scientific, 1996.
[3] Jalaj Bhandari, Daniel Russo, and Raghav Singal. A finite time analysis of temporal difference learning with linear function approximation. In Conference on learning theory, pp. 1691–1692. PMLR, 2018.
[4] Jin-Hua Chen, Min-Rong Chen, Guo-Qiang Zeng, and Jia-Si Weng. Bdfl: a byzantine-fault-tolerance decentralized federated learning method for autonomous vehicle. IEEE Transactions on Vehicular Technology, 70(9):8639–8652, 2021.
[5] Yuyang Deng, Mohammad Mahdi Kamani, and Mehrdad Mahdavi. Adaptive personalized federated learning. arXiv preprint arXiv:2003.13461, 2020.
[6] Thinh Doan, Siva Maguluri, and Justin Romberg. Finite-time analysis of distributed TD(0) with linear function approximation on multi-agent reinforcement learning. In International Conference on Machine Learning, pp. 1626–1635. PMLR, 2019.
[7] Zhaoyang Du, Celimuge Wu, Tsutomu Yoshinaga, Kok-Lim Alvin Yau, Yusheng Ji, and Jie Li. Federated learning for vehicular internet of things: Recent advances and open issues. IEEE Open Journal of the Computer Society, 1:45–61, 2020.
[8] Hao Jin, Yang Peng, Wenhao Yang, Shusen Wang, and Zhihua Zhang. Federated reinforcement learning with environment heterogeneity. In International Conference on Artificial Intelligence and Statistics, pp. 18–37. PMLR, 2022.
[9] Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D’Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Cha... (2021)
[10] Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh. Scaffold: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pp. 5132–5143. PMLR, 2020.
[11] Sajad Khodadadian, Pranay Sharma, Gauri Joshi, and Siva Theja Maguluri. Federated reinforcement learning: Linear speedup under markovian sampling. In International Conference on Machine Learning, pp. 10997–11057. PMLR, 2022.
[12] B Ravi Kiran, Ibrahim Sobh, Victor Talpaert, Patrick Mannion, Ahmad A Al Sallab, Senthil Yogamani, and Patrick Pérez. Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems, 23(6):4909–4926, 2021.
[13] Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, and Yuxin Chen. Sample complexity of asynchronous q-learning: Sharper analysis and variance reduction. Advances in neural information processing systems, 33:7031–7043, 2020a.
[14] Gen Li, Changxiao Cai, Yuxin Chen, Yuting Wei, and Yuejie Chi. Is q-learning minimax optimal? a tight sample complexity analysis. Operations Research, 72(1):222–236, 2024.
[15] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems, 2:429–450, 2020b.
[16] Yang Liu, Anbu Huang, Yun Luo, He Huang, Youzhi Liu, Yuanyuan Chen, Lican Feng, Tianjian Chen, Han Yu, and Qiang Yang. Fedvision: An online visual object detection platform powered by federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 13172–13179, 2020.
[17] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pp. 1273–1282. PMLR, 2017.
[18] Thies Möhlenhof, Norman Jansen, and Wiam Rachid. Reinforcement learning environment for tactical networks. In 2021 International Conference on Military Communication and Information Systems (ICMCIS), pp. 1–8. IEEE, 2021.
[19] Thien Duc Nguyen, Samuel Marchal, Markus Miettinen, Hossein Fereidooni, N Asokan, and Ahmad-Reza Sadeghi. Dïot: A federated self-learning anomaly detection system for iot. In 2019 IEEE 39th International conference on distributed computing systems (ICDCS), pp. 756–767. IEEE, 2019.
[20] In-Beom Park, Jaeseok Huh, Joongkyun Kim, and Jonghun Park. A reinforcement learning approach to robust scheduling of semiconductor manufacturing facilities. IEEE Transactions on Automation Science and Engineering, 17(3):1420–1431, 2019.
[21] Reese Pathak and Martin J Wainwright. Fedsplit: An algorithmic framework for fast federated optimization. Advances in neural information processing systems, 33:7057–7066, 2020.
[22] Supratik Paul, Michael A Osborne, and Shimon Whiteson. Fingerprint policy optimisation for robust reinforcement learning. In International Conference on Machine Learning, pp. 5082–5091. PMLR, 2019.
[23] Muzi Peng, Jiangwei Wang, Dongjin Song, Fei Miao, and Lili Su. Privacy-preserving and uncertainty-aware federated trajectory prediction for connected autonomous vehicles. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 11141–11147, 2023. doi:10.1109/IROS55552.2023.10341638
[24] Jason Posner, Lewis Tseng, Moayad Aloqaily, and Yaser Jararweh. Federated learning in vehicular networks: Opportunities and solutions. IEEE Network, 35(2):152–159, 2021.
[25] Jiaju Qi, Qihao Zhou, Lei Lei, and Kan Zheng. Federated reinforcement learning: techniques, applications, and open challenges. Intelligence & Robotics, 2021. doi:10.20517/ir.2021.02. URL https://doi.org/10.20517
[26] Swaroop Ramaswamy, Rajiv Mathews, Kanishka Rao, and Françoise Beaufays. Federated learning for emoji prediction in a mobile keyboard. arXiv preprint arXiv:1906.04329, 2019.
[27] Micah J Sheller, G Anthony Reina, Brandon Edwards, Jason Martin, and Spyridon Bakas. Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with M... (2018)
[28] Lili Su, Jiaming Xu, and Pengkun Yang. A non-parametric view of fedavg and fedprox: Beyond stationary points. Journal of Machine Learning Research, 24(203):1–48, 2023.
[29] Richard S. Sutton and Andrew G. Barto. Chapter 2.5 Tracking a Nonstationary Problem, Reinforcement Learning: An Introduction, chapter 8, pp. 33. The MIT Press, 2018.
[30] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.
[31] Chunnan Wang, Xiang Chen, Junzhe Wang, and Hongzhi Wang. Atpfl: Automatic trajectory prediction model design under federated learning framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6563–6572, June 2022a.
[32] Han Wang, Aritra Mitra, Hamed Hassani, George J Pappas, and James Anderson. Federated temporal difference learning with linear function approximation under environmental heterogeneity. arXiv preprint arXiv:2302.02212, 2023.
[33] Jianyu Wang, Qinghua Liu, Hao Liang, Gauri Joshi, and H Vincent Poor. Tackling the objective inconsistency problem in heterogeneous federated optimization. Advances in neural information processing systems, 33:7611–7623, 2020.
[34] Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, and Tong Zhang. On the unreasonable effectiveness of federated averaging with heterogeneous data. arXiv preprint arXiv:2206.04723, 2022b.
[35] Christopher Watkins and Peter Dayan. Q-learning. Machine Learning, 8:279–292, 1992. URL https://api.semanticscholar.org/CorpusID:208910339
[36] Jiin Woo, Gauri Joshi, and Yuejie Chi. The blessing of heterogeneity in federated q-learning: Linear speedup and beyond. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Resear... (2023)
[37] Zhijie Xie and Shenghui Song. FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence. IEEE Journal on Selected Areas in Communications, 41(4):1227–1242, April 2023. ISSN 1558-0008. doi:10.1109/JSAC.2023.3242734. URL https://ieeexplore.ieee.org/abstract/document/10038492?casa_token=yGyMDlnL_FsAAAAA:hqNvz...
[38] Bingjie Yan, Jun Wang, Jieren Cheng, Yize Zhou, Yixian Zhang, Yifan Yang, Li Liu, Haojiang Zhao, Chunjuan Wang, and Boyi Liu. Experiments of federated learning for covid-19 chest x-ray images. In Advances in Artificial Intelligence and Security: 7th International Conference, ICAIS 2021, Dublin, Ireland, July 19-23, 2021, Proceedings, Part II 7, pp. 41–5... (2021)
[39] Timothy Yang, Galen Andrew, Hubert Eichner, Haicheng Sun, Wei Li, Nicholas Kong, Daniel Ramage, and Françoise Beaufays. Applied federated learning: Improving google keyboard query suggestions. arXiv preprint arXiv:1812.02903, 2018.
[40] Tianlong Yu, Tian Li, Yuqiong Sun, Susanta Nanda, Virginia Smith, Vyas Sekar, and Srinivasan Seshan. Learning context-aware policies from multiple smart homes via federated multi-task learning. In 2020 IEEE/ACM Fifth International Conference on Internet-of-Things Design and Implementation (IoTDI), pp. 104–115. IEEE, 2020.
[41]

Federated learning on the road autonomous controller design for connected and autonomous vehicles

Tengchan Zeng, Omid Semiari, Mingzhe Chen, Walid Saad, and Mehdi Bennis. Federated learning on the road autonomous controller design for connected and autonomous vehicles. IEEE Transactions on Wireless Communications, 21 0 (12): 0 10407--10423, 2022

work page 2022
[42]

Finite-time analysis of on-policy heterogeneous federated reinforcement learning

Chenyu Zhang, Han Wang, Aritra Mitra, and James Anderson. Finite-time analysis of on-policy heterogeneous federated reinforcement learning. In The Twelfth International Conference on Learning Representations, 2023 a

work page 2023
[43]

Finite-time analysis of on-policy heterogeneous federated reinforcement learning

Chenyu Zhang, Han Wang, Aritra Mitra, and James Anderson. Finite-time analysis of on-policy heterogeneous federated reinforcement learning. arXiv preprint arXiv:2401.15273, 2024

[44]

On the convergence of sarsa with linear function approximation

Shangtong Zhang, Remi Tachet Des Combes, and Romain Laroche. On the convergence of sarsa with linear function approximation. In International Conference on Machine Learning, pp. 41613--41646. PMLR, 2023b

[45]

Federated q-learning: Linear regret speedup with low communication cost

Zhong Zheng, Fengyu Gao, Lingzhou Xue, and Jing Yang. Federated q-learning: Linear regret speedup with low communication cost. arXiv preprint arXiv:2312.15023, 2023

