Flower: A Friendly Federated Learning Research Framework
Pith reviewed 2026-05-22 14:01 UTC · model grok-4.3
The pith
Flower is a federated learning framework that supports experiments with up to 15 million clients using only two high-end GPUs and allows seamless migration to real devices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Flower provides new facilities to execute large-scale FL experiments and consider richly heterogeneous FL device scenarios. Our experiments show Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs. Researchers can then seamlessly migrate experiments to real devices to examine other parts of the design space.
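One way to read the migration claim is that client code is written once against an interface, and only the transport underneath changes between simulation and deployment. The sketch below illustrates that idea under stated assumptions: Client, Transport, ToyClient, InProcessTransport, and collect_updates are hypothetical names chosen for illustration, not Flower's API, and the toy "training" step stands in for real local optimization.

```python
from typing import Dict, List, Protocol, Tuple

Weights = List[float]
FitResult = Tuple[Weights, int]  # (updated weights, number of local examples)


class Client(Protocol):
    """Hypothetical client interface; the same code would run in simulation or on a device."""

    def fit(self, weights: Weights) -> FitResult: ...


class Transport(Protocol):
    """Hypothetical transport: how the server reaches a client."""

    def fit(self, client_id: str, weights: Weights) -> FitResult: ...


class ToyClient:
    """Toy local training: moves every weight halfway toward a local target value."""

    def __init__(self, target: float, num_examples: int) -> None:
        self.target = target
        self.num_examples = num_examples

    def fit(self, weights: Weights) -> FitResult:
        return [w + 0.5 * (self.target - w) for w in weights], self.num_examples


class InProcessTransport:
    """Simulation backend: invokes client objects directly inside the server process."""

    def __init__(self, clients: Dict[str, Client]) -> None:
        self.clients = clients

    def fit(self, client_id: str, weights: Weights) -> FitResult:
        return self.clients[client_id].fit(weights)


def collect_updates(transport: Transport, client_ids: List[str], weights: Weights) -> List[FitResult]:
    """Server-side round logic; it depends only on the Transport interface."""
    return [transport.fit(cid, weights) for cid in client_ids]
```

In this sketch, moving to real devices would mean writing another Transport that reaches clients over a network (for example via RPC), while ToyClient and collect_updates stay unchanged. Whether Flower achieves this without further adjustments is exactly what the referee asks the paper to demonstrate.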
What carries the argument
The Flower framework, which supplies execution facilities for large-scale client simulations and heterogeneous device scenarios in federated learning.
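A plausible reason a few GPUs can host millions of simulated clients is that only the clients sampled in a given round need to exist in memory. The following minimal sketch assumes that model: clients are built on demand from their ID by a factory (client_fn), a fixed number are sampled per round, and the rest of the population is never materialized. VirtualClient, simulate, and the toy scalar "model" are hypothetical; this is not a description of Flower's internal simulation engine.

```python
import random
from typing import Callable


class VirtualClient:
    """Hypothetical lightweight client, constructed from its ID only when sampled."""

    def __init__(self, cid: int) -> None:
        self.cid = cid
        # Toy local data: a deterministic value derived from the client ID.
        self.local_value = float(cid % 10)

    def fit(self, global_value: float) -> float:
        # Toy "training": return the midpoint of the global and local values.
        return (global_value + self.local_value) / 2.0


def simulate(
    num_clients: int,
    clients_per_round: int,
    num_rounds: int,
    client_fn: Callable[[int], VirtualClient],
    seed: int = 0,
) -> float:
    rng = random.Random(seed)
    global_value = 0.0
    for _ in range(num_rounds):
        # range() is lazy, so sampling IDs never builds a 15M-element list.
        sampled = rng.sample(range(num_clients), clients_per_round)
        # Only sampled clients are constructed, so memory scales with
        # clients_per_round rather than with num_clients.
        updates = [client_fn(cid).fit(global_value) for cid in sampled]
        global_value = sum(updates) / len(updates)
    return global_value
```

For example, simulate(15_000_000, 100, 5, VirtualClient) runs five rounds over a 15M-client population while constructing only 500 client objects in total. Real workloads would add GPU-bound local training for each sampled client, which this toy omits.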
Load-bearing premise
The simulation of heterogeneous device scenarios in Flower accurately represents the behavior of real edge hardware so that code and results transfer without major adjustments.
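To make the premise concrete, here is a minimal sketch of one common way to simulate system heterogeneity: give each client a device profile and derive its round latency from compute and communication costs. The DeviceProfile fields, the linear cost model, and the deadline rule are assumptions for illustration only. Real hardware adds effects such as thermal throttling, resource contention, and battery limits that a model like this omits, which is why the premise needs checking.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DeviceProfile:
    """Hypothetical per-device parameters for simulated heterogeneity."""

    samples_per_sec: float       # local training throughput
    bandwidth_mb_per_sec: float  # link speed, assumed symmetric


def client_time(profile: DeviceProfile, num_samples: int, model_mb: float) -> float:
    """Seconds one client needs for a round under a simple linear cost model."""
    compute = num_samples / profile.samples_per_sec
    # Download the global model, then upload an update of the same size.
    comm = 2 * model_mb / profile.bandwidth_mb_per_sec
    return compute + comm


def simulate_round(
    profiles: List[DeviceProfile], num_samples: int, model_mb: float, deadline: float
) -> Tuple[float, List[int]]:
    """Return (round duration, indices of clients that finished by the deadline)."""
    times = [client_time(p, num_samples, model_mb) for p in profiles]
    finished = [i for i, t in enumerate(times) if t <= deadline]
    # A synchronous round ends at the deadline if any client misses it,
    # otherwise as soon as the slowest client reports.
    round_time = deadline if len(finished) < len(profiles) else max(times)
    return round_time, finished
```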
What would settle it
Run the same federated learning workload first in Flower's simulation and then on a set of real heterogeneous edge devices, and compare convergence speed, resource usage, and final model accuracy across the two runs for discrepancies.
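The comparison proposed above can be scripted once both runs have produced summary metrics. This sketch assumes each run is summarized as a dictionary keyed by hypothetical metric names (for instance rounds needed to reach a target accuracy, peak memory, or final accuracy) and flags every metric whose relative gap exceeds a chosen tolerance. The 10% default is an arbitrary placeholder, not a threshold taken from the paper.

```python
from typing import Dict


def relative_gaps(sim: Dict[str, float], real: Dict[str, float]) -> Dict[str, float]:
    """Relative difference |sim - real| / |real| for each metric present in both runs."""
    gaps: Dict[str, float] = {}
    for name in sim.keys() & real.keys():
        if real[name] == 0:
            raise ValueError(f"metric {name!r} is zero on the real-device run")
        gaps[name] = abs(sim[name] - real[name]) / abs(real[name])
    return gaps


def flag_discrepancies(
    sim: Dict[str, float], real: Dict[str, float], tolerance: float = 0.1
) -> Dict[str, float]:
    """Return only the metrics whose relative gap exceeds the tolerance."""
    return {name: gap for name, gap in relative_gaps(sim, real).items() if gap > tolerance}
```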
Original abstract
Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model, while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. However, FL is difficult to implement realistically, both in terms of scale and systems heterogeneity. Although there are a number of research frameworks available to simulate FL algorithms, they do not support the study of scalable FL workloads on heterogeneous edge devices. In this paper, we present Flower -- a comprehensive FL framework that distinguishes itself from existing platforms by offering new facilities to execute large-scale FL experiments and consider richly heterogeneous FL device scenarios. Our experiments show Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs. Researchers can then seamlessly migrate experiments to real devices to examine other parts of the design space. We believe Flower provides the community with a critical new tool for FL study and development.
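The abstract describes clients that train locally and share only model updates, never raw data. Federated Averaging (FedAvg, McMahan et al., 2017) is the canonical aggregation rule for that setting. The abstract does not say which algorithm its experiments use, so the sketch below is illustrative only; parameters are flattened into plain lists of floats to keep it dependency-free.

```python
from typing import List, Sequence, Tuple


def fedavg(results: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Federated Averaging: weight each client's parameters by its local example count.

    Each entry of `results` is (flattened parameters, number of local examples).
    Only these two values reach the server; the training data stays on the device.
    """
    total = sum(n for _, n in results)
    if total == 0:
        raise ValueError("no training examples reported")
    dim = len(results[0][0])
    return [sum(params[i] * n for params, n in results) / total for i in range(dim)]
```

For example, a client holding one example with parameters [1.0, 0.0] and a client holding three examples with [3.0, 4.0] average to [2.5, 3.0], because the larger client carries three times the weight.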
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Flower, a federated learning research framework intended to address limitations in existing platforms by supporting large-scale FL experiments (up to 15M clients) on heterogeneous edge devices. It claims that such experiments can be run using only a pair of high-end GPUs, after which the same code can be seamlessly migrated to real devices to explore additional design-space aspects.
Significance. If the scalability and migration claims are substantiated with reproducible evidence, Flower would offer a practical new tool for the FL community to study workloads that combine extreme scale with device heterogeneity, areas where current simulation frameworks are reported to fall short.
major comments (2)
- [Abstract] The central claim that 'Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs' is presented without any description of the experimental setup: the FL algorithm, dataset, communication model, client sampling strategy, or hardware configuration. Without these details, a reader cannot tell whether the result holds generally or depends on unstated assumptions such as homogeneous clients or ideal networking.
- [Abstract] The statement that researchers 'can then seamlessly migrate experiments to real devices' is asserted without reported evidence, metrics, or discussion of the interface or of the adjustments the transition requires. This claim is load-bearing, since it positions the paper as a bridge between simulation and deployment.
minor comments (1)
- [Abstract] The phrase 'richly heterogeneous FL device scenarios' is not elaborated: the abstract does not say which dimensions of heterogeneity (compute, network, data distribution, availability) the framework actually models.
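Of the heterogeneity dimensions listed above, data distribution is the easiest to illustrate. A widely used technique in FL benchmarking (not necessarily the one Flower uses) partitions a labeled dataset across clients with per-class proportions drawn from a Dirichlet distribution; smaller alpha values produce more skewed, non-IID clients. The function names below are hypothetical.

```python
import random
from typing import Dict, List


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Sample a symmetric k-dimensional Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Degenerate case: every draw underflowed to zero; fall back to uniform.
        return [1.0 / k] * k
    return [d / total for d in draws]


def label_skew_partition(
    labels: List[int], num_clients: int, alpha: float, seed: int = 0
) -> Dict[int, List[int]]:
    """Split example indices across clients using per-class Dirichlet proportions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    parts: Dict[int, List[int]] = {c: [] for c in range(num_clients)}
    for indices in by_class.values():
        rng.shuffle(indices)
        props = dirichlet(alpha, num_clients, rng)
        # Turn cumulative proportions into cut points over this class's indices.
        cuts: List[int] = []
        acc = 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(int(acc * len(indices)))
        bounds = [0] + cuts + [len(indices)]
        for c in range(num_clients):
            parts[c].extend(indices[bounds[c]:bounds[c + 1]])
    return parts
```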
Simulated Author's Rebuttal
We thank the referee for the valuable feedback. We respond to the major comments point by point, noting plainly where the provided manuscript text, which is limited to the abstract, lacks the requested content.
Point-by-point responses
- Referee: [Abstract] The central claim that 'Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs' is presented without any description of the experimental setup, including the FL algorithm used, dataset, communication model, client sampling strategy, or hardware configuration. This absence prevents verification of whether the result is load-bearing or dependent on unstated assumptions about homogeneity or ideal networking.
  Authors: We agree with the referee that the abstract does not include the experimental setup details. The provided manuscript text is limited to the abstract, so these specifics are not present here. In a revision, we will either expand the abstract slightly to include key aspects of the setup or ensure the full paper clearly cross-references the experiments demonstrating this scalability. (revision: yes)
- Referee: [Abstract] The statement that researchers 'can then seamlessly migrate experiments to real devices' is asserted without any reported evidence, metrics, or discussion of the interface or adjustments required for the transition. This claim is load-bearing for the paper's positioning as a bridge between simulation and deployment.
  Authors: The referee correctly notes that the abstract asserts the seamless migration without providing evidence or details in the given text. Since only the abstract is available, we do not have the supporting discussion here. We will revise the manuscript to include a brief description or reference to the relevant section discussing the interface that enables this migration. (revision: yes)
- Unresolved: the specific experimental setup for the 15M-client simulation and the evidence or metrics for seamless migration to real devices, since neither is described in the provided abstract.
Circularity Check
No significant circularity
full rationale
This is a software framework paper rather than a mathematical derivation. The central claims concern tool capabilities for large-scale FL simulation (up to 15M clients on two GPUs) and seamless migration to real devices. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the available text. The paper is self-contained as a description of a research tool whose claims can be externally validated through code inspection and independent replication.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith.Foundation.DimensionForcing: dimension_forced (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Our experiments show Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs. Researchers can then seamlessly migrate experiments to real devices to examine other parts of the design space."
- IndisputableMonolith.Foundation.LedgerCanonicality: ZeroParameterComparisonLedger (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Flower -- a comprehensive FL framework that distinguishes itself from existing platforms by offering new facilities to execute large-scale FL experiments and consider richly heterogeneous FL device scenarios."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 22 Pith papers
- A Typed Tensor Language for Federated Learning
  A typed tensor language formalizes federated computations via virtual global tensor semantics and proves shared-state factorization for one-round and iterative programs, plus a differentiable fragment for gradient descent.
- Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge
  The FedSurg challenge benchmarks federated learning on appendectomy videos and finds only 26% F1 on unseen centers even with centralized data, plus extra penalties from decentralization, with spatiotemporal models per...
- FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation
  FeDa4Fair is a new library and benchmark for creating federated datasets with heterogeneous client-level biases to standardize evaluation of fairness methods in federated learning.
- Beyond Assumptions: Measuring Federated Learning over Real 5G Networks
  Real 5G testbed experiments show consistent stragglers in 70% of federated learning trials due to communication delays, challenging common wireless FL assumptions.
- Model Merging: Foundations and Algorithms
  New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.
- When To Adapt? Adapting the Model or Data in Federated Medical Imaging
  Harmonization works better than personalization for appearance-based domain shifts in federated medical imaging while personalization is superior for structural shifts, with both performing similarly when shifts are small.
- Robust Synchronisation for Federated Learning in The Face of Correlated Device Failure
  AW-PSP dynamically weights node sampling by real-time availability predictions and failure correlations to improve robustness, label coverage, and fairness in federated learning under correlated device failures.
- CroSatFL: Energy-Efficient Federated Learning with Cross-Aggregation for Satellite Edge Computing
  CroSatFL cuts ground station communications by over 100x and transmission energy by 6x in satellite federated learning compared to baselines, while keeping competitive accuracy.
- Task-Centric Personalized Federated Fine-Tuning of Language Models
  FedRouter clusters adapters locally per task samples and globally across clients to create task-centric personalized models, improving generalization and reducing task interference in federated fine-tuning.
- Embedding-Based Federated Learning with Runtime Governance for Iron Deficiency Prediction
  Embedding-based federated learning with personalised aggregation and governance platform improves iron deficiency prediction from full blood count data across two non-IID real-world clinical sites.
- M²FedAQI: Multimodal Federated Learning for Air Quality Prediction on Heterogeneous Edge Devices
  M²FedAQI is a lightweight multimodal federated framework that fuses visual and tabular data via feature modulation for improved AQI prediction and regression on heterogeneous edge devices.
- Federated Medical Image Classification under Class and Domain Imbalance exploiting Synthetic Sample Generation
  FedSSG generates and shares synthetic samples within a federated setup to reduce class imbalance and domain shift problems in medical image classification.
- FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices
  Fed-FSTQ reduces uplink traffic by 46x and improves time-to-accuracy by 52% in federated LLM fine-tuning using Fisher-guided token quantization and selection.
- OpenCLAW-Nexus: A Self-Reinforcing Trust Framework for Byzantine-Resilient Decentralized Federated Learning
  OpenCLAW-Nexus uses a single discounted Beta-reputation model to unify reputation-based node selection, Rep-FedAvg aggregation, and reputation-aware BFT consensus, achieving Byzantine resilience in decentralized FL wi...
- Cooperate to Compete: Strategic Data Generation and Incentivization Framework for Coopetitive Cross-Silo Federated Learning
  CoCoGen+ models each federated learning round as a weighted potential game with strategic synthetic data generation and payoff redistribution incentives, showing improved efficiency over baselines under non-IID data a...
- Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from the NIST Genomics Red Team Challenge
  Stacking seven black-box estimators into a meta-classifier reveals persistent membership leakage in differentially private federated learning models at epsilon=200 on NIST genomics data, outperforming single-signal baselines.
- FedRef: Bayesian Fine-Tuning using a Reference Model to Mitigate Catastrophic Forgetting for Heterogeneous Federated Learning
  FedRef uses a temporally aggregated reference model and MAP regularization for server-side fine-tuning to reduce forgetting and drift in non-IID federated learning, showing better accuracy and lower client compute on ...
- Understanding Communication Backends in Cross-Silo Federated Learning
  Benchmarks of MPI, gRPC, and PyTorch RPC in cross-silo FL plus a new gRPC+S3 hybrid backend deliver up to 3.8x speedup for large-model transmission under realistic network conditions.
- Automating aggregation strategy selection in federated learning
  A framework automates federated learning aggregation strategy selection via LLM inference in single-trial mode and genetic search in multi-trial mode, improving robustness under non-IID data.
- AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research
  AI4EOSC is a federated cloud platform that integrates modular AI development, serverless AI-as-a-Service, and distributed orchestration with built-in FAIR metadata and provenance tracking for scientific AI workloads in EOSC.
- Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings
  Lightweight federated learning with frozen embeddings and MLP heads reaches competitive micro and macro F1 scores for ICD-9 and ICD-10 coding on MIMIC-IV, nearly matching centralized training.
- A Comparative Study of Federated Learning Aggregation Strategies under Homogeneous and Heterogeneous Data Distributions
  Federated aggregation strategies show distinct performance trade-offs in accuracy, loss, and efficiency depending on whether client data distributions are homogeneous or heterogeneous.