Flower: A Friendly Federated Learning Research Framework

Akhil Mathur; Daniel J. Beutel; Javier Fernandez-Marques; Kwing Hei Li; Lorenzo Sani; Nicholas D. Lane; Pedro Porto Buarque de Gusmão; Taner Topal; Titouan Parcollet; Xinchi Qiu

arxiv: 2007.14390 · v5 · pith:PQMMFDXUnew · submitted 2020-07-28 · 💻 cs.LG · cs.CV · stat.ML


Daniel J. Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Javier Fernandez-Marques, Yan Gao, Lorenzo Sani, Kwing Hei Li, Titouan Parcollet, Pedro Porto Buarque de Gusmão, Nicholas D. Lane


Pith reviewed 2026-05-22 14:01 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML

keywords federated learning · machine learning framework · large-scale simulation · heterogeneous devices · edge computing · distributed training


The pith

Flower is a federated learning framework that supports experiments with up to 15 million clients using only a pair of high-end GPUs and lets researchers migrate the same experiments to real devices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Flower as a comprehensive framework for federated learning research. Existing platforms lack support for large-scale workloads and heterogeneous edge devices, limiting realistic study of FL. Flower addresses this by providing facilities to run simulations at massive scale and then transfer experiments directly to actual devices. A sympathetic reader would care because it makes it practical to examine both scale and systems heterogeneity without requiring enormous hardware resources.

Core claim

Flower provides new facilities to execute large-scale FL experiments and consider richly heterogeneous FL device scenarios. Our experiments show Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs. Researchers can then seamlessly migrate experiments to real devices to examine other parts of the design space.

What carries the argument

The Flower framework, which supplies execution facilities for large-scale client simulations and heterogeneous device scenarios in federated learning.

Load-bearing premise

The simulation of heterogeneous device scenarios in Flower accurately represents the behavior of real edge hardware so that code and results transfer without major adjustments.

What would settle it

Execute the same federated learning workload first in Flower simulation and then on a set of real heterogeneous edge devices, then compare convergence speed, resource usage, and final model accuracy for discrepancies.
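The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not part of Flower or of the paper: the `RunLog` schema, the function names, and the choice of metrics (rounds to a target accuracy, final-accuracy gap, wall-clock and memory ratios) are all assumptions made here to show what "compare for discrepancies" could mean in practice.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RunLog:
    """Hypothetical record of one FL run (simulated or on real devices)."""
    accuracy_per_round: List[float]  # global model accuracy after each round
    wall_clock_s: float              # total run time in seconds
    peak_memory_mb: float            # peak memory observed during the run


def rounds_to_target(log: RunLog, target: float) -> Optional[int]:
    """Return the 1-based round where accuracy first reaches target, else None."""
    for round_idx, acc in enumerate(log.accuracy_per_round, start=1):
        if acc >= target:
            return round_idx
    return None


def compare_runs(sim: RunLog, real: RunLog, target: float) -> Dict[str, object]:
    """Summarize how far the real-device run departs from the simulated one."""
    return {
        # Convergence speed: rounds needed to reach the same target accuracy.
        "rounds_to_target_sim": rounds_to_target(sim, target),
        "rounds_to_target_real": rounds_to_target(real, target),
        # Final model quality: absolute gap after the last round.
        "final_accuracy_gap": abs(
            sim.accuracy_per_round[-1] - real.accuracy_per_round[-1]
        ),
        # Resource usage: ratios above 1 mean the real run used more.
        "wall_clock_ratio": real.wall_clock_s / sim.wall_clock_s,
        "peak_memory_ratio": real.peak_memory_mb / sim.peak_memory_mb,
    }
```

A large gap in rounds-to-target or final accuracy would suggest the simulated heterogeneity does not match the hardware; resource ratios are expected to differ and mainly indicate by how much.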

Original abstract

Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model, while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. However, FL is difficult to implement realistically, both in terms of scale and systems heterogeneity. Although there are a number of research frameworks available to simulate FL algorithms, they do not support the study of scalable FL workloads on heterogeneous edge devices. In this paper, we present Flower -- a comprehensive FL framework that distinguishes itself from existing platforms by offering new facilities to execute large-scale FL experiments and consider richly heterogeneous FL device scenarios. Our experiments show Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs. Researchers can then seamlessly migrate experiments to real devices to examine other parts of the design space. We believe Flower provides the community with a critical new tool for FL study and development.
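For readers new to the setting the abstract describes, the sketch below shows one round of federated training in plain Python. It is an illustration only: the abstract does not name an algorithm, so sample-weighted averaging (the common FedAvg rule) is assumed here, the one-weight linear model is a toy, and none of the function names come from Flower's API.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y) pair held by a single client


def local_train(weight: float, data: Sequence[Example], lr: float) -> float:
    """One gradient step on y = weight * x using only this client's data."""
    # Gradient of the mean squared error with respect to the weight.
    grad = sum(2.0 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average client weight vectors, weighted by each client's sample count."""
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no training examples across clients")
    averaged = [0.0] * len(updates[0][0])
    for weights, n in updates:
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


def run_round(global_weights: List[float],
              clients: Sequence[Sequence[Example]],
              lr: float = 0.1) -> List[float]:
    """One round: each client trains locally, the server sees only weights."""
    updates = [
        ([local_train(global_weights[0], data, lr)], len(data))
        for data in clients
    ]
    return federated_average(updates)
```

The raw `(x, y)` examples never leave `local_train`; only the updated weight and a sample count reach the aggregation step, which is the decoupling of training from central data storage that the abstract refers to.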

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Flower offers a new FL framework with bold scale claims but the abstract alone leaves the methods and validation too thin to judge.


Hi,

Flower is a new federated learning framework that stands out for its reported ability to simulate experiments with up to 15 million clients on just two high-end GPUs, plus easy migration to real devices. That's the core pitch. It does address a real need: most existing FL simulators struggle with large scale and device heterogeneity, so a tool that handles both could help move research closer to practical edge deployments. The seamless migration part is particularly practical if it works as described.

The soft spot is that we only have the abstract, which states the performance numbers but skips over methods, datasets, error bars, or any validation of the heterogeneity model. Without those, it's hard to know if the 15M figure holds up or if the simulation accurately captures real hardware behavior. The weakest link seems to be assuming the simulated heterogeneous scenarios will transfer without major tweaks.

This paper is mainly for researchers in federated learning who need to test ideas at bigger scales than before. Someone building or evaluating FL systems might find it worth a look for the tooling.

I think it deserves peer review. The idea has enough potential that getting detailed feedback on the implementation and experiments would be worthwhile, assuming the full paper fills in the gaps.

Referee Report

2 major / 1 minor

Summary. The paper introduces Flower, a federated learning research framework intended to address limitations in existing platforms by supporting large-scale FL experiments (up to 15M clients) on heterogeneous edge devices. It claims that such experiments can be run using only a pair of high-end GPUs, after which the same code can be seamlessly migrated to real devices to explore additional design-space aspects.

Significance. If the scalability and migration claims are substantiated with reproducible evidence, Flower would offer a practical new tool for the FL community to study workloads that combine extreme scale with device heterogeneity, areas where current simulation frameworks are reported to fall short.

major comments (2)

[Abstract] The central claim that 'Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs' is presented without any description of the experimental setup, including the FL algorithm used, dataset, communication model, client sampling strategy, or hardware configuration. This absence prevents verification of whether the result is load-bearing or dependent on unstated assumptions about homogeneity or ideal networking.
[Abstract] The statement that researchers 'can then seamlessly migrate experiments to real devices' is asserted without any reported evidence, metrics, or discussion of the interface or adjustments required for the transition. This claim is load-bearing for the paper's positioning as a bridge between simulation and deployment.

minor comments (1)

[Abstract] The phrase 'richly heterogeneous FL device scenarios' is used without elaboration on the dimensions of heterogeneity (compute, network, data distribution, availability) that the framework actually models.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their valuable feedback on our manuscript. We respond to the major comments point by point, being honest about the content available in the provided manuscript.


Referee: [Abstract] The central claim that 'Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs' is presented without any description of the experimental setup, including the FL algorithm used, dataset, communication model, client sampling strategy, or hardware configuration. This absence prevents verification of whether the result is load-bearing or dependent on unstated assumptions about homogeneity or ideal networking.

Authors: We agree with the referee that the abstract does not include the experimental setup details. The provided manuscript text is limited to the abstract, so these specifics are not present here. In a revision, we will either expand the abstract slightly to include key aspects of the setup or ensure the full paper clearly cross-references the experiments demonstrating this scalability.
Revision: yes

Referee: [Abstract] The statement that researchers 'can then seamlessly migrate experiments to real devices' is asserted without any reported evidence, metrics, or discussion of the interface or adjustments required for the transition. This claim is load-bearing for the paper's positioning as a bridge between simulation and deployment.

Authors: The referee correctly notes that the abstract asserts the seamless migration without providing evidence or details in the given text. Since only the abstract is available, we do not have the supporting discussion here. We will revise the manuscript to include a brief description or reference to the relevant section discussing the interface that enables this migration.
Revision: yes

standing simulated objections not resolved

The specific experimental setup for the 15M-client simulation and the evidence or metrics for seamless migration to real devices remain open, because neither is described in the provided abstract.

Circularity Check

0 steps flagged

No significant circularity


This is a software framework paper rather than a mathematical derivation. The central claims concern tool capabilities for large-scale FL simulation (up to 15M clients on two GPUs) and seamless migration to real devices. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the available text. The paper is self-contained as a description of a research tool whose claims can be externally validated through code inspection and independent replication.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a systems/framework paper; the central claim rests on the existence and measured performance of the implemented software rather than on mathematical axioms or fitted parameters. No free parameters, axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5720 in / 1059 out tokens · 27416 ms · 2026-05-22T14:01:54.230514+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Foundation.DimensionForcing · dimension_forced
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "Our experiments show Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs. Researchers can then seamlessly migrate experiments to real devices to examine other parts of the design space."

IndisputableMonolith.Foundation.LedgerCanonicality · ZeroParameterComparisonLedger
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "Flower -- a comprehensive FL framework that distinguishes itself from existing platforms by offering new facilities to execute large-scale FL experiments and consider richly heterogeneous FL device scenarios."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Typed Tensor Language for Federated Learning
cs.LG · 2026-05 · unverdicted · novelty 7.0
A typed tensor language formalizes federated computations via virtual global tensor semantics and proves shared-state factorization for one-round and iterative programs, plus a differentiable fragment for gradient descent.

Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge
cs.CV · 2025-10 · conditional · novelty 7.0
The FedSurg challenge benchmarks federated learning on appendectomy videos and finds only 26% F1 on unseen centers even with centralized data, plus extra penalties from decentralization, with spatiotemporal models per...

FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation
cs.LG · 2025-06 · conditional · novelty 7.0
FeDa4Fair is a new library and benchmark for creating federated datasets with heterogeneous client-level biases to standardize evaluation of fairness methods in federated learning.

Beyond Assumptions: Measuring Federated Learning over Real 5G Networks
cs.NI · 2025-04 · accept · novelty 7.0
Real 5G testbed experiments show consistent stragglers in 70% of federated learning trials due to communication delays, challenging common wireless FL assumptions.

Model Merging: Foundations and Algorithms
cs.LG · 2026-05 · unverdicted · novelty 6.0
New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.

When To Adapt? Adapting the Model or Data in Federated Medical Imaging
cs.CV · 2026-04 · unverdicted · novelty 6.0
Harmonization works better than personalization for appearance-based domain shifts in federated medical imaging while personalization is superior for structural shifts, with both performing similarly when shifts are small.

Robust Synchronisation for Federated Learning in The Face of Correlated Device Failure
cs.DC · 2026-04 · unverdicted · novelty 6.0
AW-PSP dynamically weights node sampling by real-time availability predictions and failure correlations to improve robustness, label coverage, and fairness in federated learning under correlated device failures.

CroSatFL: Energy-Efficient Federated Learning with Cross-Aggregation for Satellite Edge Computing
cs.DC · 2026-04 · unverdicted · novelty 6.0
CroSatFL cuts ground station communications by over 100x and transmission energy by 6x in satellite federated learning compared to baselines, while keeping competitive accuracy.

Task-Centric Personalized Federated Fine-Tuning of Language Models
cs.LG · 2026-03 · unverdicted · novelty 6.0
FedRouter clusters adapters locally per task samples and globally across clients to create task-centric personalized models, improving generalization and reducing task interference in federated fine-tuning.

Embedding-Based Federated Learning with Runtime Governance for Iron Deficiency Prediction
cs.LG · 2026-05 · unverdicted · novelty 5.0
Embedding-based federated learning with personalised aggregation and governance platform improves iron deficiency prediction from full blood count data across two non-IID real-world clinical sites.

M$^2$FedAQI: Multimodal Federated Learning for Air Quality Prediction on Heterogeneous Edge Devices
cs.LG · 2026-05 · unverdicted · novelty 5.0
M²FedAQI is a lightweight multimodal federated framework that fuses visual and tabular data via feature modulation for improved AQI prediction and regression on heterogeneous edge devices.

Federated Medical Image Classification under Class and Domain Imbalance exploiting Synthetic Sample Generation
cs.CV · 2026-04 · unverdicted · novelty 5.0
FedSSG generates and shares synthetic samples within a federated setup to reduce class imbalance and domain shift problems in medical image classification.

FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices
cs.LG · 2026-04 · unverdicted · novelty 5.0
Fed-FSTQ reduces uplink traffic by 46x and improves time-to-accuracy by 52% in federated LLM fine-tuning using Fisher-guided token quantization and selection.

OpenCLAW-Nexus: A Self-Reinforcing Trust Framework for Byzantine-Resilient Decentralized Federated Learning
cs.NI · 2026-04 · unverdicted · novelty 5.0
OpenCLAW-Nexus uses a single discounted Beta-reputation model to unify reputation-based node selection, Rep-FedAvg aggregation, and reputation-aware BFT consensus, achieving Byzantine resilience in decentralized FL wi...

Cooperate to Compete: Strategic Data Generation and Incentivization Framework for Coopetitive Cross-Silo Federated Learning
cs.AI · 2026-04 · unverdicted · novelty 5.0
CoCoGen+ models each federated learning round as a weighted potential game with strategic synthetic data generation and payoff redistribution incentives, showing improved efficiency over baselines under non-IID data a...

Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from the NIST Genomics Red Team Challenge
cs.CR · 2026-04 · unverdicted · novelty 5.0
Stacking seven black-box estimators into a meta-classifier reveals persistent membership leakage in differentially private federated learning models at epsilon=200 on NIST genomics data, outperforming single-signal baselines.

FedRef: Bayesian Fine-Tuning using a Reference Model to Mitigate Catastrophic Forgetting for Heterogeneous Federated Learning
cs.LG · 2025-06 · unverdicted · novelty 5.0
FedRef uses a temporally aggregated reference model and MAP regularization for server-side fine-tuning to reduce forgetting and drift in non-IID federated learning, showing better accuracy and lower client compute on ...

Understanding Communication Backends in Cross-Silo Federated Learning
cs.DC · 2026-04 · unverdicted · novelty 4.0
Benchmarks of MPI, gRPC, and PyTorch RPC in cross-silo FL plus a new gRPC+S3 hybrid backend deliver up to 3.8x speedup for large-model transmission under realistic network conditions.

Automating aggregation strategy selection in federated learning
cs.LG · 2026-04 · unverdicted · novelty 4.0
A framework automates federated learning aggregation strategy selection via LLM inference in single-trial mode and genetic search in multi-trial mode, improving robustness under non-IID data.

AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research
cs.DC · 2025-12 · unverdicted · novelty 4.0
AI4EOSC is a federated cloud platform that integrates modular AI development, serverless AI-as-a-Service, and distributed orchestration with built-in FAIR metadata and provenance tracking for scientific AI workloads in EOSC.

Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings
cs.IR · 2025-07 · unverdicted · novelty 4.0
Lightweight federated learning with frozen embeddings and MLP heads reaches competitive micro and macro F1 scores for ICD-9 and ICD-10 coding on MIMIC-IV, nearly matching centralized training.

A Comparative Study of Federated Learning Aggregation Strategies under Homogeneous and Heterogeneous Data Distributions
cs.LG · 2026-05 · unverdicted · novelty 2.0
Federated aggregation strategies show distinct performance trade-offs in accuracy, loss, and efficiency depending on whether client data distributions are homogeneous or heterogeneous.