arxiv: 2409.03897 · v3 · pith:QDRCHRAKnew · submitted 2024-09-05 · 💻 cs.LG · cs.DC

On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments

Pith reviewed 2026-05-23 20:37 UTC · model grok-4.3

classification 💻 cs.LG cs.DC
keywords federated Q-learning · convergence rates · environmental heterogeneity · synchronous averaging · error bounds · reinforcement learning · multi-agent systems

The pith

In heterogeneous environments, federated Q-learning cannot achieve error decay faster than Θ(E/T) when agents average every E iterations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies synchronous federated Q-learning in which K agents average their local Q-estimates every E iterations while facing heterogeneous environments. It shows that the error caused by sampling randomness still enjoys a linear speedup in K, but that E greater than 1 produces clear degradation, unlike in the homogeneous case. The error trajectory is tracked in detail and shown to approach zero with growing T, yet a matching lower bound of order E/T on the infinity-norm error is established for a wide range of stepsize schedules. Experiments further reveal a two-phase pattern of fast initial decay followed by an uptick and plateau.

Core claim

The paper proves that in the presence of environmental heterogeneity, the ℓ_∞ norm of the error in synchronous federated Q-learning cannot decay faster than Θ(E/T) for a wide range of stepsizes, where E is the number of local iterations between averaging steps and T is the total number of iterations. This limit is fundamental and not due to analysis looseness. Additionally, the convergence exhibits a two-phase behavior with rapid initial decay followed by stabilization.

What carries the argument

Synchronous periodic averaging of local Q-estimates every E iterations, which interacts with persistent environmental heterogeneity to sustain differences across agents.
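
The averaging scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name federated_q_learning, the tabular layout P[k][s][a] (next-state probabilities) and R[k][s][a] (rewards), and the choice to refresh every state-action pair at each iteration from a sampled next state are all hypothetical details picked for the example.

```python
import random


def federated_q_learning(P, R, gamma, E, T, stepsize, seed=0):
    """Tabular federated Q-learning with averaging every E iterations.

    P[k][s][a] is a list of next-state probabilities for agent k
    (hypothetical layout), R[k][s][a] is the reward, and stepsize(t)
    returns the stepsize used at iteration t.
    """
    rng = random.Random(seed)
    K, S, A = len(P), len(P[0]), len(P[0][0])
    Q = [[[0.0] * A for _ in range(S)] for _ in range(K)]
    for t in range(T):
        lam = stepsize(t)
        for k in range(K):
            # Greedy value of each state under agent k's current estimate.
            V = [max(Q[k][s]) for s in range(S)]
            new_Q = [[0.0] * A for _ in range(S)]
            for s in range(S):
                for a in range(A):
                    # One sampled transition from agent k's own environment.
                    s_next = rng.choices(range(S), weights=P[k][s][a])[0]
                    target = R[k][s][a] + gamma * V[s_next]
                    new_Q[s][a] = (1 - lam) * Q[k][s][a] + lam * target
            Q[k] = new_Q
        if (t + 1) % E == 0:
            # Synchronization: every agent restarts from the average table.
            avg = [[sum(Q[k][s][a] for k in range(K)) / K for a in range(A)]
                   for s in range(S)]
            Q = [[row[:] for row in avg] for _ in range(K)]
    return Q
```

Between synchronization steps each agent drifts toward the fixed point of its own environment, which is the source of the disagreement that averaging every E iterations has to undo.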

If this is right

  • Sampling errors continue to enjoy linear speedup in the number of agents K.
  • Larger averaging intervals E slow the overall error decay proportionally.
  • The error still tends to zero as T grows, but only at the reduced rate set by E.
  • Switching stepsizes after the observed phase-transition point improves total convergence speed.
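
The last point, switching stepsizes at the phase transition, might look like the sketch below. The helper name two_phase_stepsize, the switch iteration, and the two constants are hypothetical; the abstract only says that the transition time has to be estimated and does not give a recipe for the values.

```python
def two_phase_stepsize(t_switch, phase1, phase2):
    """Return a schedule that uses `phase1` before iteration `t_switch`
    and `phase2` from `t_switch` onward (all three values are assumptions
    the user must supply, e.g. from a pilot run)."""
    def stepsize(t):
        return phase1 if t < t_switch else phase2
    return stepsize


# Hypothetical numbers: a larger stepsize for the fast initial decay,
# then a smaller one once the error would otherwise bounce back up.
schedule = two_phase_stepsize(t_switch=100, phase1=0.5, phase2=0.05)
```

The returned function maps an iteration index to a stepsize, so it can be passed to any training loop that accepts a schedule of that form.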

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Communication frequency may need to increase with the degree of environmental heterogeneity to preserve fast convergence.
  • Analogous rate ceilings could appear in other federated reinforcement-learning methods that rely on periodic model averaging.
  • Heterogeneity-aware schedules that adjust E over time might reduce the slowdown without raising communication cost.

Load-bearing premise

The analysis relies on synchronous periodic averaging where environmental heterogeneity creates persistent differences in local Q-estimates that infrequent communication does not fully cancel.

What would settle it

An experiment in which the ℓ_∞ error in a heterogeneous multi-agent Q-learning run with E>1 decays faster than order E/T for large T would contradict the proven lower bound.
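
One way to read such an experiment is to fit the slope of log-error against log-T over the tail of a run. The sketch below is an assumed diagnostic, not something taken from the paper: the function name loglog_slope is hypothetical, and a finite run can only suggest, never prove or refute, an asymptotic rate.

```python
import math


def loglog_slope(ts, errors):
    """Least-squares slope of log(error) versus log(t).

    A slope near -1 is consistent with decay of order E/T; a slope that
    stays clearly below -1 over large t would point to faster decay.
    Requires strictly positive iteration counts and errors.
    """
    xs = [math.log(t) for t in ts]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

In practice the fit should use only the late part of the trajectory, after the uptick and plateau seen in the experiments, since early iterations follow a different regime.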

Figures

Figures reproduced from arXiv: 2409.03897 by Leo Muxing Wang, Lili Su, Pengkun Yang.

Figure 1: A federated learning system instance. However, many large-scale multi-agent systems are often deployed across wide geographic areas, resulting in agents interacting with heterogeneous environments. For instance, connected and autonomous vehicles (CAVs) operating in various regions of a metropolitan area encounter diverse conditions such as varying traffic patterns, road infrastructure, and local regulati…
Figure 2: The ℓ∞ error of different constant stepsizes under the heterogeneous and homogeneous settings. … constant stepsizes are often used in reinforcement learning problems because of the great performance in applications as described in Sutton & Barto (2018), they suffer significant performance degradation in the presence of environmental heterogeneity. Impacts of the synchronization period E. Furthermore, we test …
Figure 3: Heterogeneous environments with varying E. From left to right, E = 1, 20, 40, and ∞, respectively.
Figure 4: Choosing different stepsizes for phases 1 …
Figure 5: Homogeneous FQL with varying E.
Figure 6: Convergence performance of different tolerance levels of different stepsize choices. The horizontal dashed lines represent the tolerance levels not met, while the vertical dashed lines indicate the iterations at which the training processes meet the corresponding tolerance levels.
Abstract

Large-scale multi-agent systems are often deployed across wide geographic areas, where agents interact with heterogeneous environments. There is an emerging interest in understanding the role of heterogeneity in the performance of the federated versions of classic reinforcement learning algorithms. In this paper, we study synchronous federated Q-learning, which aims to learn an optimal Q-function by having $K$ agents average their local Q-estimates per $E$ iterations. We observe an interesting phenomenon on the convergence speeds in terms of $K$ and $E$. Similar to the homogeneous environment settings, there is a linear speed-up concerning $K$ in reducing the errors that arise from sampling randomness. Yet, in sharp contrast to the homogeneous settings, $E>1$ leads to significant performance degradation. Specifically, we provide a fine-grained characterization of the error evolution in the presence of environmental heterogeneity, which decays to zero as the number of iterations $T$ increases. The slow convergence of having $E>1$ turns out to be fundamental rather than an artifact of our analysis. We prove that, for a wide range of stepsizes, the $\ell_{\infty}$ norm of the error cannot decay faster than $\Theta (E/T)$. In addition, our experiments demonstrate that the convergence exhibits an interesting two-phase phenomenon. For any given stepsize, there is a sharp phase-transition of the convergence: the error decays rapidly in the beginning yet later bounces up and stabilizes. Provided that the phase-transition time can be estimated, choosing different stepsizes for the two phases leads to faster overall convergence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes synchronous federated Q-learning with K agents that average local Q-estimates every E iterations in heterogeneous MDPs. It claims linear speedup in K for sampling-induced errors (similar to homogeneous cases) but shows that E > 1 causes fundamental degradation due to persistent heterogeneity effects. The authors provide a fine-grained error evolution characterization that decays with T, prove that the ℓ_∞ error cannot decay faster than Θ(E/T) for a wide range of stepsizes, and report experiments exhibiting a two-phase convergence phenomenon (rapid initial decay followed by stabilization) that can be mitigated by stepsize scheduling across phases.

Significance. If the lower bound holds beyond the specific heterogeneity constructions examined, the result establishes a fundamental communication-frequency limit in heterogeneous federated RL, which is relevant for large-scale multi-agent deployments. The two-phase experimental observation and the explicit separation of sampling vs. heterogeneity errors are useful for algorithm design. The paper supplies both upper-bound analysis and matching lower bounds plus reproducible experiments, which strengthens its contribution relative to purely asymptotic claims.

major comments (2)
  1. [Lower-bound theorem] Lower-bound theorem (likely §4 or §5): the Θ(E/T) claim is presented as fundamental for heterogeneous environments, yet the construction appears to rely on MDPs where local optimal Q-functions differ by a fixed additive bias that periodic averaging cannot cancel. It is unclear whether the bound extends to general heterogeneous MDPs (e.g., those where optimal Q* coincide or where bias averages to zero under infrequent communication). This directly affects whether the result rules out faster rates under natural heterogeneity models.
  2. [Error decomposition] Error decomposition (likely Eq. (X) in the convergence analysis): the characterization separates sampling variance (which benefits from K) from a heterogeneity term that scales with E. The proof must explicitly verify that the heterogeneity term remains non-vanishing for the stated stepsize range; otherwise the lower bound reduces to a fitted quantity rather than an independent limit.
minor comments (2)
  1. [Abstract] The abstract states the lower bound holds 'for a wide range of stepsizes' without specifying the exact interval or dependence on E and heterogeneity parameters; this should be stated precisely in the theorem statement.
  2. [Experiments] Figure captions for the two-phase experiments should include the exact stepsize values, E, K, and MDP parameters used so that the phase-transition time can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and insightful comments on our manuscript. We address each major comment in detail below, providing clarifications and indicating where revisions will be made.

  1. Referee: [Lower-bound theorem] Lower-bound theorem (likely §4 or §5): the Θ(E/T) claim is presented as fundamental for heterogeneous environments, yet the construction appears to rely on MDPs where local optimal Q-functions differ by a fixed additive bias that periodic averaging cannot cancel. It is unclear whether the bound extends to general heterogeneous MDPs (e.g., those where optimal Q* coincide or where bias averages to zero under infrequent communication). This directly affects whether the result rules out faster rates under natural heterogeneity models.

    Authors: Our lower bound is established for a specific class of heterogeneous MDPs in which the local optimal Q-functions differ by a constant additive bias that cannot be eliminated through periodic averaging. This construction is intended to capture scenarios where environmental heterogeneity leads to persistent discrepancies. We do not assert that the Θ(E/T) rate applies universally to all heterogeneous MDPs; indeed, when local optima coincide (reducing to the homogeneous case), faster convergence is possible. The result serves to highlight the potential fundamental limit imposed by communication frequency in the presence of non-cancellable heterogeneity. To address the concern, we will revise the manuscript to explicitly state the scope of the lower bound and provide a brief discussion on conditions under which faster rates might hold. revision: partial

  2. Referee: [Error decomposition] Error decomposition (likely Eq. (X) in the convergence analysis): the characterization separates sampling variance (which benefits from K) from a heterogeneity term that scales with E. The proof must explicitly verify that the heterogeneity term remains non-vanishing for the stated stepsize range; otherwise the lower bound reduces to a fitted quantity rather than an independent limit.

    Authors: The error decomposition in our analysis separates the sampling-induced error, which scales with 1/sqrt(KT) or similar, from the heterogeneity-induced term that scales with E/T. The proof shows that the heterogeneity term arises directly from the mismatch in local transition and reward functions and remains non-vanishing under the stepsize conditions considered (e.g., stepsizes of order 1/t). This is verified by bounding the bias term independently of the variance terms in the recursive error evolution. The lower bound then follows by showing that this term dominates and cannot be canceled. We will add an explicit statement or corollary confirming that the heterogeneity term is Ω(E/T) and does not vanish for the relevant stepsize range. revision: partial
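
To make the two rates in this response concrete, the arithmetic below compares a heterogeneity term of order E/T with a sampling term of order 1/sqrt(KT). It is only an illustration: the rates are taken from the simulated rebuttal ("or similar"), the constants c_het and c_samp are hypothetical and default to 1, and the function names are invented for the example.

```python
import math


def heterogeneity_term(E, T, c_het=1.0):
    """Heterogeneity-induced error, assumed to scale as c_het * E / T."""
    return c_het * E / T


def sampling_term(K, T, c_samp=1.0):
    """Sampling-induced error, assumed to scale as c_samp / sqrt(K * T)."""
    return c_samp / math.sqrt(K * T)


def crossover_iteration(E, K, c_het=1.0, c_samp=1.0):
    """Iteration count T at which the two assumed terms are equal.

    Solving c_het * E / T = c_samp / sqrt(K * T) gives
    T = K * (c_het * E / c_samp) ** 2.
    """
    return K * (c_het * E / c_samp) ** 2
```

Under these assumed forms, with E = 20 and K = 10 the two terms meet at T = 4000; for smaller T the E/T term is the larger of the two, and for larger T the 1/sqrt(KT) term is.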

Circularity Check

0 steps flagged

No significant circularity; lower bound is an independent proof

full rationale

The paper's central claim is a mathematical proof that the ℓ_∞ error cannot decay faster than Θ(E/T) under the stated heterogeneity model and synchronous averaging. No quoted step reduces the claimed rate to a fitted parameter, self-citation chain, or definitional tautology. The lower bound is presented as derived from the model assumptions rather than constructed from the target quantity itself. The derivation chain remains self-contained against external benchmarks, with the two-phase convergence observation arising from the experiments rather than from renaming or smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the standard synchronous federated Q-learning model with periodic averaging and the presence of environmental heterogeneity; no explicit free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Synchronous federated Q-learning with periodic averaging every E iterations across heterogeneous environments
    The model and lower bound are stated for this specific synchronous averaging protocol.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 2 internal anchors

  1. Dimitri Bertsekas. Reinforcement learning and optimal control, volume 1. Athena Scientific, 2019.
  2. Dimitri Bertsekas and John N Tsitsiklis. Neuro-dynamic programming. Athena Scientific, 1996.
  3. Jalaj Bhandari, Daniel Russo, and Raghav Singal. A finite time analysis of temporal difference learning with linear function approximation. In Conference on learning theory, pp. 1691-1692. PMLR, 2018.
  4. Jin-Hua Chen, Min-Rong Chen, Guo-Qiang Zeng, and Jia-Si Weng. Bdfl: a byzantine-fault-tolerance decentralized federated learning method for autonomous vehicle. IEEE Transactions on Vehicular Technology, 70(9):8639-8652, 2021.
  5. Yuyang Deng, Mohammad Mahdi Kamani, and Mehrdad Mahdavi. Adaptive personalized federated learning. arXiv preprint arXiv:2003.13461, 2020.
  6. Thinh Doan, Siva Maguluri, and Justin Romberg. Finite-time analysis of distributed td(0) with linear function approximation on multi-agent reinforcement learning. In International Conference on Machine Learning, pp. 1626-1635. PMLR, 2019.
  7. Zhaoyang Du, Celimuge Wu, Tsutomu Yoshinaga, Kok-Lim Alvin Yau, Yusheng Ji, and Jie Li. Federated learning for vehicular internet of things: Recent advances and open issues. IEEE Open Journal of the Computer Society, 1:45-61, 2020.
  8. Hao Jin, Yang Peng, Wenhao Yang, Shusen Wang, and Zhihua Zhang. Federated reinforcement learning with environment heterogeneity. In International Conference on Artificial Intelligence and Statistics, pp. 18-37. PMLR, 2022.
  9. Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D’Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Cha...
  10. Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh. Scaffold: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pp. 5132-5143. PMLR, 2020.
  11. Sajad Khodadadian, Pranay Sharma, Gauri Joshi, and Siva Theja Maguluri. Federated reinforcement learning: Linear speedup under markovian sampling. In International Conference on Machine Learning, pp. 10997-11057. PMLR, 2022.
  12. B Ravi Kiran, Ibrahim Sobh, Victor Talpaert, Patrick Mannion, Ahmad A Al Sallab, Senthil Yogamani, and Patrick Pérez. Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems, 23(6):4909-4926, 2021.
  13. Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, and Yuxin Chen. Sample complexity of asynchronous q-learning: Sharper analysis and variance reduction. Advances in neural information processing systems, 33:7031-7043, 2020a.
  14. Gen Li, Changxiao Cai, Yuxin Chen, Yuting Wei, and Yuejie Chi. Is q-learning minimax optimal? a tight sample complexity analysis. Operations Research, 72(1):222-236, 2024.
  15. Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems, 2:429-450, 2020b.
  16. Yang Liu, Anbu Huang, Yun Luo, He Huang, Youzhi Liu, Yuanyuan Chen, Lican Feng, Tianjian Chen, Han Yu, and Qiang Yang. Fedvision: An online visual object detection platform powered by federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 13172-13179, 2020.
  17. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pp. 1273-1282. PMLR, 2017.
  18. Thies Möhlenhof, Norman Jansen, and Wiam Rachid. Reinforcement learning environment for tactical networks. In 2021 International Conference on Military Communication and Information Systems (ICMCIS), pp. 1-8. IEEE, 2021.
  19. Thien Duc Nguyen, Samuel Marchal, Markus Miettinen, Hossein Fereidooni, N Asokan, and Ahmad-Reza Sadeghi. Dïot: A federated self-learning anomaly detection system for iot. In 2019 IEEE 39th International conference on distributed computing systems (ICDCS), pp. 756-767. IEEE, 2019.
  20. In-Beom Park, Jaeseok Huh, Joongkyun Kim, and Jonghun Park. A reinforcement learning approach to robust scheduling of semiconductor manufacturing facilities. IEEE Transactions on Automation Science and Engineering, 17(3):1420-1431, 2019.
  21. Reese Pathak and Martin J Wainwright. Fedsplit: An algorithmic framework for fast federated optimization. Advances in neural information processing systems, 33:7057-7066, 2020.
  22. Supratik Paul, Michael A Osborne, and Shimon Whiteson. Fingerprint policy optimisation for robust reinforcement learning. In International Conference on Machine Learning, pp. 5082-5091. PMLR, 2019.
  23. Muzi Peng, Jiangwei Wang, Dongjin Song, Fei Miao, and Lili Su. Privacy-preserving and uncertainty-aware federated trajectory prediction for connected autonomous vehicles. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 11141-11147, 2023. doi:10.1109/IROS55552.2023.10341638
  24. Jason Posner, Lewis Tseng, Moayad Aloqaily, and Yaser Jararweh. Federated learning in vehicular networks: Opportunities and solutions. IEEE Network, 35(2):152-159, 2021.
  25. Jiaju Qi, Qihao Zhou, Lei Lei, and Kan Zheng. Federated reinforcement learning: techniques, applications, and open challenges. Intelligence & Robotics, 2021. doi:10.20517/ir.2021.02. URL https://doi.org/10.20517
  26. Swaroop Ramaswamy, Rajiv Mathews, Kanishka Rao, and Françoise Beaufays. Federated learning for emoji prediction in a mobile keyboard. arXiv preprint arXiv:1906.04329, 2019.
  27. Micah J Sheller, G Anthony Reina, Brandon Edwards, Jason Martin, and Spyridon Bakas. Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with M...
  28. Lili Su, Jiaming Xu, and Pengkun Yang. A non-parametric view of fedavg and fedprox: Beyond stationary points. Journal of Machine Learning Research, 24(203):1-48, 2023.
  29. Richard S. Sutton and Andrew G. Barto. Chapter 2.5 Tracking a Nonstationary Problem, Reinforcement Learning: An Introduction, chapter 8, pp. 33. The MIT Press, 2018.
  30. Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.
  31. Chunnan Wang, Xiang Chen, Junzhe Wang, and Hongzhi Wang. Atpfl: Automatic trajectory prediction model design under federated learning framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6563-6572, June 2022a.
  32. Han Wang, Aritra Mitra, Hamed Hassani, George J Pappas, and James Anderson. Federated temporal difference learning with linear function approximation under environmental heterogeneity. arXiv preprint arXiv:2302.02212, 2023.
  33. Jianyu Wang, Qinghua Liu, Hao Liang, Gauri Joshi, and H Vincent Poor. Tackling the objective inconsistency problem in heterogeneous federated optimization. Advances in neural information processing systems, 33:7611-7623, 2020.
  34. Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, and Tong Zhang. On the unreasonable effectiveness of federated averaging with heterogeneous data. arXiv preprint arXiv:2206.04723, 2022b.
  35. Christopher Watkins and Peter Dayan. Q-learning. Machine Learning, 8:279-292, 1992. URL https://api.semanticscholar.org/CorpusID:208910339
  36. Jiin Woo, Gauri Joshi, and Yuejie Chi. The blessing of heterogeneity in federated q-learning: Linear speedup and beyond. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Resear...
  37. Zhijie Xie and Shenghui Song. FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence. IEEE Journal on Selected Areas in Communications, 41(4):1227-1242, April 2023. ISSN 1558-0008. doi:10.1109/JSAC.2023.3242734. URL https://ieeexplore.ieee.org/abstract/document/10038492?casa_token=yGyMDlnL_FsAAAAA:hqNvz...
  38. Bingjie Yan, Jun Wang, Jieren Cheng, Yize Zhou, Yixian Zhang, Yifan Yang, Li Liu, Haojiang Zhao, Chunjuan Wang, and Boyi Liu. Experiments of federated learning for covid-19 chest x-ray images. In Advances in Artificial Intelligence and Security: 7th International Conference, ICAIS 2021, Dublin, Ireland, July 19-23, 2021, Proceedings, Part II 7, pp. 41-5...
  39. Timothy Yang, Galen Andrew, Hubert Eichner, Haicheng Sun, Wei Li, Nicholas Kong, Daniel Ramage, and Françoise Beaufays. Applied federated learning: Improving google keyboard query suggestions. arXiv preprint arXiv:1812.02903, 2018.
  40. Tianlong Yu, Tian Li, Yuqiong Sun, Susanta Nanda, Virginia Smith, Vyas Sekar, and Srinivasan Seshan. Learning context-aware policies from multiple smart homes via federated multi-task learning. In 2020 IEEE/ACM Fifth International Conference on Internet-of-Things Design and Implementation (IoTDI), pp. 104-115. IEEE, 2020.
  41. Tengchan Zeng, Omid Semiari, Mingzhe Chen, Walid Saad, and Mehdi Bennis. Federated learning on the road autonomous controller design for connected and autonomous vehicles. IEEE Transactions on Wireless Communications, 21(12):10407-10423, 2022.
  42. Chenyu Zhang, Han Wang, Aritra Mitra, and James Anderson. Finite-time analysis of on-policy heterogeneous federated reinforcement learning. In The Twelfth International Conference on Learning Representations, 2023a.
  43. Chenyu Zhang, Han Wang, Aritra Mitra, and James Anderson. Finite-time analysis of on-policy heterogeneous federated reinforcement learning. arXiv preprint arXiv:2401.15273, 2024.
  44. Shangtong Zhang, Remi Tachet Des Combes, and Romain Laroche. On the convergence of sarsa with linear function approximation. In International Conference on Machine Learning, pp. 41613-41646. PMLR, 2023b.
  45. Zhong Zheng, Fengyu Gao, Lingzhou Xue, and Jing Yang. Federated q-learning: Linear regret speedup with low communication cost. arXiv preprint arXiv:2312.15023, 2023.