Federated Client Selection under Partial Visibility: A POMDP Approach with Spatio-Temporal Attention

Khaled B. Letaief; Pingyi Fan; Qijun Hou; Yuchen Shi

arxiv: 2605.11752 · v1 · submitted 2026-05-12 · 💻 cs.LG

Pith reviewed 2026-05-13 06:15 UTC · model grok-4.3

keywords: federated learning, client selection, partial visibility, POMDP, reinforcement learning, spatio-temporal attention, data heterogeneity

The pith

The paper addresses federated client selection under partial visibility by framing it as a POMDP and applying spatio-temporal attention to historical global models and client identity embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles client selection in federated learning when the server sees only a subset of clients each round due to real-world constraints. It models the problem as a partially observable Markov decision process and trains a reinforcement learning agent that attends to sequences of past global models together with fixed client identity embeddings. This lets the agent infer useful information about unobserved clients from temporal training patterns and persistent device traits. Experiments on several datasets show the approach outperforms baselines that assume full client visibility, particularly when data is heterogeneous. The result matters because large-scale or mobile federated systems routinely operate with incomplete observations.

Core claim

Formulating client selection under partial visibility as a POMDP and solving it with a spatio-temporal attention reinforcement learning policy that processes historical global models and client identity embeddings yields better aggregation performance than existing methods in heterogeneous settings.

What carries the argument

A reinforcement learning policy that applies spatio-temporal attention to sequences of past global models and client identity embeddings inside a POMDP formulation of the client selection task.
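
As a rough illustration of how such a policy might score clients, the sketch below applies scaled dot-product attention twice: once over a window of past global-model summaries (the temporal part) and once over the identity embeddings of the currently visible clients (the spatial part). It is a plain-Python sketch under stated assumptions and is not taken from the paper: the function names, the tanh fusion, the linear read-out `w_out`, and the use of small model-summary vectors in place of full models are all hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def score_clients(history, client_emb, visible, w_out):
    """Return one score per client; unreachable clients get -inf (assumed design)."""
    # Temporal step: the newest global-model summary queries the whole window.
    temporal = attend(history[-1], history, history)
    # Spatial step: each visible client queries the embeddings of visible clients.
    vis_emb = [e for e, v in zip(client_emb, visible) if v]
    scores = []
    for emb, is_visible in zip(client_emb, visible):
        if not is_visible:
            scores.append(-math.inf)
            continue
        spatial = attend(emb, vis_emb, vis_emb)
        feats = [math.tanh(s + t) for s, t in zip(spatial, temporal)]
        scores.append(dot(feats, w_out))
    return scores


def select_top_k(scores, k):
    """Pick up to k clients with the highest finite scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if scores[i] != -math.inf]


# Toy inputs: random vectors stand in for learned embeddings.
rng = random.Random(0)
dim, rounds, clients = 4, 5, 6
history = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(rounds)]
client_emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(clients)]
w_out = [rng.gauss(0, 1) for _ in range(dim)]
visible = [True, False, True, False, True, False]

scores = score_clients(history, client_emb, visible, w_out)
chosen = select_top_k(scores, k=2)
print(chosen)
```

Masking unreachable clients with negative infinity keeps the selection inside the visible set, which is the constraint that partial visibility imposes on the action space.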

If this is right

Servers can achieve strong global models without requiring every client to be reachable in every round.
Persistent client characteristics can be learned once and reused across rounds to guide selection.
Temporal patterns in model updates become explicit signals for deciding which clients to include next.
The same attention structure can be retrained when the set of participating clients changes over time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The framework could be tested in settings where client availability follows predictable mobility patterns rather than random dropout.
Replacing the attention layers with simpler recurrent units might reveal how much of the gain comes from the spatio-temporal mechanism itself.
Extending the state to include data-size or compute-capacity embeddings could further reduce the impact of partial visibility.

Load-bearing premise

That sequences of past global models and fixed client identity embeddings contain enough information for attention to compensate for the missing observations of unselected clients.

What would settle it

An experiment on a new dataset, run at fixed and known client availability rates, in which the proposed method shows no accuracy gain over random selection or over full-visibility baselines would falsify the central claim.
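
One way to set up such a test is to fix the availability rate and generate the visibility schedule up front, so that every selection method sees exactly the same reachable clients in each round. The sketch below does this with independent per-round availability and includes a random-selection baseline. All names (`visibility_schedule`, `random_select`, and so on) are hypothetical, and independent availability is an assumption made here for simplicity, not the paper's protocol.

```python
import random


def sample_visible(n_clients, availability, rng):
    """Each client is reachable this round independently with the given probability."""
    return [rng.random() < availability for _ in range(n_clients)]


def visibility_schedule(n_clients, n_rounds, availability, seed=0):
    """Pre-generate masks so all methods are compared on identical rounds."""
    rng = random.Random(seed)
    return [sample_visible(n_clients, availability, rng) for _ in range(n_rounds)]


def empirical_rate(schedule):
    """Fraction of (round, client) pairs that were visible."""
    total = sum(len(mask) for mask in schedule)
    return sum(sum(mask) for mask in schedule) / total


def random_select(mask, k, rng):
    """Baseline: pick up to k clients uniformly from the visible ones."""
    pool = [i for i, v in enumerate(mask) if v]
    return rng.sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    for rate in (0.1, 0.3, 0.5):
        schedule = visibility_schedule(n_clients=100, n_rounds=50, availability=rate)
        picks = random_select(schedule[0], k=5, rng=random.Random(1))
        print(rate, round(empirical_rate(schedule), 3), picks)
```

Sweeping the rate and reusing the same seed for every method isolates the effect of the selection policy from the luck of which clients happened to be reachable.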

Figures

Figures reproduced from arXiv: 2605.11752 by Khaled B. Letaief, Pingyi Fan, Qijun Hou, Yuchen Shi.

**Figure 2.** Architecture of the Spatio-Temporal Attention-based Q…

**Figure 3.** Accuracy versus Communication epochs under various settings. The curves are smoothed using a moving average with a window…

**Figure 4.** Accuracy versus Communication epochs on UCI-HAR.

**Figure 5.** Accuracy versus Communication epochs for different…

Original abstract

Federated learning relies on effective client selection to alleviate the performance degradation caused by data heterogeneity. Most existing methods assume full visibility of all clients at each communication round. However, in large-scale or edge-based deployments, the server can only access a subset of clients due to communication, mobility, or availability constraints, resulting in partial visibility where only a subset of clients is observable for aggregation in each communication round. In this paper, we formulate federated client selection under partial visibility as a Partially Observable Markov Decision Process (POMDP) and propose a Spatial-Temporal attention-based reinforcement learning framework. By integrating historical global models and client identity embeddings, the proposed method captures both the temporal contexts of training and the persistent characteristics of clients. Experimental results across multiple datasets demonstrate that our approach achieves superior performance compared to existing baselines in heterogeneous and partially visible settings, validating its effectiveness in addressing the challenges of incomplete observations in practical federated learning systems.
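
For readers who prefer symbols, the following is one way the formulation described in the abstract could be written down. The tuple is the standard POMDP definition. The mapping to federated learning is an assumption made for illustration and is not taken from the paper: the history length $H$, the visible set $\mathcal{V}_t$, the selection budget $K$, the local sample counts $n_i$, the locally trained models $w_t^{(i)}$, and the accuracy-difference reward are all hypothetical choices.

```latex
% Standard POMDP tuple: states, actions, transition kernel, reward,
% observation space, observation kernel, discount factor.
\mathcal{M} = (\mathcal{S}, \mathcal{A}, T, R, \Omega, O, \gamma)

% Hypothetical mapping to partially visible client selection (assumed).
\begin{align}
o_t &= \bigl(w_{t-H+1}, \dots, w_t,\; \mathcal{V}_t\bigr),
      \qquad \mathcal{V}_t \subseteq \{1, \dots, N\} \\
a_t &\subseteq \mathcal{V}_t, \qquad |a_t| \le K \\
w_{t+1} &= \sum_{i \in a_t} \frac{n_i}{\sum_{j \in a_t} n_j}\, w_t^{(i)} \\
r_t &= \mathrm{Acc}(w_{t+1}) - \mathrm{Acc}(w_t)
\end{align}
```

Under this reading, the hidden part of the state is everything about clients outside $\mathcal{V}_t$, and the agent can only act on clients it currently observes.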

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper frames partial-visibility client selection in federated learning as a POMDP and adds spatio-temporal attention RL on historical models and client embeddings, but the abstract supplies no experimental details to check whether it works.

The modeling choice makes sense because real edge or large-scale FL rarely gives the server every client at once, and most prior selection methods assume full visibility. Treating the problem as a POMDP lets the policy act on incomplete observations, and feeding past global aggregates plus client identity embeddings through attention is a straightforward way to try to recover some temporal and client-specific signal. That combination is new enough in this setting to be worth noting. The paper also states the practical motivation clearly: communication limits, mobility, and availability constraints create exactly this partial-observability regime, and data heterogeneity makes naive selection worse. Those points are useful for anyone who has tried to run FL outside controlled clusters.

The main weakness is the complete absence of experimental substance in the abstract. It claims superior performance across multiple datasets against existing baselines yet shows no metrics, no simulation protocol for which clients are visible each round, and no list of comparators. Without those, it is impossible to judge whether the attention layers actually compensate for missing clients whose data distributions differ from the visible set. The stress-test concern about historical aggregates carrying little information about hidden clients therefore stands, at least on the basis of what is written. The citation pattern follows the usual FL and RL lines but does not appear to engage deeply with prior POMDP work in distributed optimization.

This paper is aimed at researchers building federated systems for mobile or IoT environments where client availability is unreliable. A reader who needs a concrete starting point for handling incomplete observations could extract the POMDP formulation and attention idea, but only if the full experiments later prove reproducible and robust. I would send it for peer review because the underlying problem is real and the proposed structure is not obviously flawed, even though the current version leaves the empirical claims uncheckable and would need detailed ablations and availability schedules before it could be accepted.

Referee Report

2 major / 0 minor

Summary. The paper formulates federated client selection under partial visibility as a Partially Observable Markov Decision Process (POMDP) and proposes a Spatial-Temporal attention-based reinforcement learning framework. By integrating historical global models and client identity embeddings, the method aims to capture temporal contexts and persistent client characteristics to address incomplete observations. It claims superior performance compared to existing baselines across multiple datasets in heterogeneous and partially visible settings.

Significance. If the experimental results hold and the attention mechanism is shown to reliably compensate for missing observations, the work would be significant for practical federated learning deployments. It directly tackles the realistic constraint of partial client availability (due to communication, mobility, or edge constraints) that standard full-visibility client selection methods ignore, potentially improving convergence and accuracy in heterogeneous environments.

major comments (2)

Abstract: The abstract asserts superior performance on multiple datasets but supplies no experimental details, metrics, baselines, or validation procedures, so it is impossible to determine whether the data actually supports the stated claim. This is load-bearing for the central empirical claim.
Method description (POMDP formulation and attention mechanism): The central assumption that feeding historical global models (aggregated only over the visible subset at each round) plus client identity embeddings through spatio-temporal attention suffices to recover information about unobserved clients is not justified. When client data distributions are heterogeneous and non-stationary, the aggregate observation carries little information about hidden clients' gradients or loss surfaces; the attention can at best fit correlations in the particular simulation schedule, with no analysis showing transfer to real availability patterns or clients never observed together.
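
The second comment can be made concrete with a toy aggregation. The sketch below computes a sample-size-weighted average (FedAvg-style) over only the visible clients and shows that the result does not change when a hidden client's update changes, which is why the aggregate alone says nothing direct about hidden clients. The helper names and the three-client numbers are invented for illustration and do not come from the paper.

```python
def fedavg(updates, sizes):
    """Sample-size-weighted mean of client parameter vectors."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(n * u[j] for u, n in zip(updates, sizes)) / total for j in range(dim)]


def partial_aggregate(updates, sizes, visible):
    """Aggregate only the clients the server can reach this round."""
    vis_updates = [u for u, v in zip(updates, visible) if v]
    vis_sizes = [n for n, v in zip(sizes, visible) if v]
    if not vis_updates:
        raise ValueError("no visible clients to aggregate")
    return fedavg(vis_updates, vis_sizes)


# Toy example: client 2 is hidden and holds very different parameters.
updates = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
sizes = [10, 10, 20]
visible = [True, True, False]

seen = partial_aggregate(updates, sizes, visible)
full = fedavg(updates, sizes)

# Changing the hidden client's update leaves the partial aggregate untouched.
altered = [updates[0], updates[1], [-5.0, -5.0]]
seen_after = partial_aggregate(altered, sizes, visible)

print(seen, full, seen_after)
```

Any information the policy gains about hidden clients therefore has to come indirectly, for example from rounds in which those clients were visible, and the referee's concern is about how far that indirect signal can carry.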

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the thorough and constructive review of our manuscript. We appreciate the identification of areas where the presentation and justification can be strengthened. Below we provide point-by-point responses to the major comments, indicating the specific revisions we will make in the next version.

Referee: Abstract: The abstract asserts superior performance on multiple datasets but supplies no experimental details, metrics, baselines, or validation procedures, so it is impossible to determine whether the data actually supports the stated claim. This is load-bearing for the central empirical claim.

Authors: We agree that the abstract would benefit from greater specificity. In the revised manuscript we will expand the abstract to explicitly name the datasets (MNIST, CIFAR-10, Shakespeare), the evaluation metrics (test accuracy and number of communication rounds to target accuracy), the baselines (Random, FedAvg, FedProx, and prior RL-based selectors), and the partial-visibility simulation protocol (random and correlated client dropout at varying ratios). These additions will allow readers to directly assess the empirical support for our claims.

Revision: yes

Referee: Method description (POMDP formulation and attention mechanism): The central assumption that feeding historical global models (aggregated only over the visible subset at each round) plus client identity embeddings through spatio-temporal attention suffices to recover information about unobserved clients is not justified. When client data distributions are heterogeneous and non-stationary, the aggregate observation carries little information about hidden clients' gradients or loss surfaces; the attention can at best fit correlations in the particular simulation schedule, with no analysis showing transfer to real availability patterns or clients never observed together.

Authors: We acknowledge the concern that the justification rests primarily on empirical performance rather than a formal recovery guarantee. The POMDP formulation explicitly treats unobserved clients as hidden states, and the spatio-temporal attention is designed to extract temporal patterns from the sequence of partial aggregates together with static client embeddings. Our experiments across multiple heterogeneity levels and visibility ratios demonstrate consistent improvements, indicating that the learned policy exploits observable correlations effectively. To strengthen the presentation we will add a dedicated limitations subsection that discusses the non-stationarity assumption, reports additional ablation results on correlated versus independent dropout schedules, and clarifies that generalization to real-world availability traces remains an open question for future study.

Revision: partial
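
The rebuttal distinguishes correlated from independent dropout without saying how either is generated. One common way to produce temporally correlated availability is a two-state Markov chain per client, sketched below. This is an assumed model chosen for illustration, not the protocol used in the paper; `p_drop`, `p_recover`, and the function names are hypothetical.

```python
import random


def markov_availability(n_rounds, p_drop, p_recover, rng, start=True):
    """Availability trace for one client from a two-state Markov chain.

    An available client becomes unavailable with probability p_drop;
    an unavailable client returns with probability p_recover.
    """
    state, trace = start, []
    for _ in range(n_rounds):
        trace.append(state)
        if state:
            state = rng.random() >= p_drop
        else:
            state = rng.random() < p_recover
    return trace


def stationary_availability(p_drop, p_recover):
    """Long-run fraction of rounds in which the client is available."""
    return p_recover / (p_drop + p_recover)


if __name__ == "__main__":
    rng = random.Random(0)
    trace = markov_availability(20, p_drop=0.1, p_recover=0.3, rng=rng)
    print("".join("1" if s else "0" for s in trace))
    print(stationary_availability(0.1, 0.3))
```

Small values of both probabilities give long runs of presence and absence at the same average rate as an independent schedule, which makes the two schedules comparable at a matched visibility ratio.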

Circularity Check

0 steps flagged

No circularity: new POMDP formulation and attention framework presented as independent construction

The paper formulates client selection under partial visibility as a POMDP and proposes a spatio-temporal attention RL method that integrates historical global models with client identity embeddings. No equations, derivations, or self-citations are shown that reduce any claimed prediction or result to a fitted quantity or prior result by construction. Experimental comparisons to baselines on multiple datasets constitute independent validation rather than tautological output. The derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; none can be extracted or audited from the given text.

pith-pipeline@v0.9.0 · 5468 in / 1137 out tokens · 47513 ms · 2026-05-13T06:15:45.771290+00:00 · methodology

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1] Sofiane Bouaziz, Hadjer Benmeziane, Youcef Imine, Leila Hamdad, Smail Niar, and Hamza Ouarnoughi. FLASH-RL: Federated Learning Addressing System and Static Heterogeneity using Reinforcement Learning. In 2023 IEEE 41st International Conference on Computer Design (ICCD), pages 444–447, November 2023. ISSN: 2576-6996.

[2] Ken Chang, Niranjan Balachandar, Carson Lam, Darvin Yi, James Brown, Andrew Beers, Bruce Rosen, Daniel L Rubin, and Jayashree Kalpathy-Cramer. Distributed deep learning networks among institutions for medical imaging. Journal of the American Medical Informatics Association, 25(8):945–954, 2018.

[3] Zihan Chen, Jundong Li, and Cong Shen. Personalized federated learning with attention-based client selection. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6930–6934. IEEE, 2024.

[4] Kavya Kopparapu and Eric Lin. Fedfmc: Sequential efficient federated learning on non-iid data. arXiv preprint arXiv:2006.10937, 2020.

[5] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated optimization in heterogeneous networks. Proceedings of Machine learning and systems, 2:429–450, 2020.

[6] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.

[7] Kevin P Murphy et al. A survey of pomdp solution techniques. environment, 2(10):1–12, 2000.

[8] Jorge Reyes-Ortiz, Davide Anguita, Alessandro Ghio, Luca Oneto, and Xavier Parra. Human activity recognition using smartphones. UCI Machine Learning Repository, 2013. DOI: 10.24432/C54S4K.

[9] Yuchen Shi, Qijun Hou, Pingyi Fan, and Khaled B. Letaief. Edgeflow: Serverless federated learning via sequential model migration in edge networks, 2026.

[10] Bingli Sun, Xiao Song, Yuchun Tu, and Ming Liu. FedAgent: Federated Learning on Non-IID Data via Reinforcement Learning and Knowledge Distillation. Expert Systems with Applications, 285:127973, August 2025.

[11] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

[12] Hao Wang, Zakhary Kaplan, Di Niu, and Baochun Li. Optimizing federated learning on non-iid data with reinforcement learning. In IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, pages 1698–1707, 2020.

[13] Wenxuan Ye, Xueli An, Junfan Wang, Xueqiang Yan, and Georg Carle. Fedabc: Attention-based client selection for federated learning with long-term view. In ICC 2025-IEEE International Conference on Communications, pages 801–806. IEEE, 2025.

[14] Xiaoxue Yu, Xingfu Yi, Rongpeng Li, Fei Wang, Chenghui Peng, Zhifeng Zhao, and Honggang Zhang. Snake learning: A communication-and computation-efficient distributed learning framework for 6g. IEEE Communications Magazine, 2025.

[15] Benteng Zhang, Yingchi Mao, Haowen Xu, Yihan Chen, Tasiu Muazu, Xiaoming He, and Jie Wu. Overcoming Forgetting Using Adaptive Federated Learning for IIoT Devices With Non-IID Data. IEEE Internet of Things Journal, pages 1–1, 2025.
