Incentive-Aware Federated Averaging with Performance Guarantees under Strategic Participation

Farzad Yousefian; Fateme Maleki; Krishnan Raghavan

arxiv: 2603.20873 · v2 · pith:BERBZNEBnew · submitted 2026-03-21 · 💻 cs.LG · math.OC

Incentive-Aware Federated Averaging with Performance Guarantees under Strategic Participation

Fateme Maleki , Krishnan Raghavan , Farzad Yousefian This is my paper

Pith reviewed 2026-05-21 10:41 UTC · model grok-4.3

classification 💻 cs.LG math.OC

keywords federated learningincentive-awareNash equilibriumstrategic participationfederated averagingperformance guaranteeswelfare loss minimizationmonotone games

0 comments

The pith

An incentive-aware federated averaging method uses Nash equilibrium-seeking updates on dataset sizes to handle strategic client participation while providing performance guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method for federated learning where clients send both their model parameters and updated dataset sizes to the server at each round. These dataset sizes are adjusted using a Nash equilibrium-seeking update rule to model how agents strategically balance learning benefits against data contribution costs. The approach is analyzed for both convex and nonconvex objective functions, establishing performance guarantees for the global model. Under a merely monotone game setting, a welfare loss minimization framework is introduced that converges asymptotically. This matters because standard federated learning assumes cooperative agents, but real participants may reduce their data contributions unless incentives align.

Core claim

The central discovery is an incentive-aware federated averaging algorithm in which dataset sizes are dynamically adjusted via a Nash equilibrium-seeking update rule that captures strategic data participation. The method establishes performance guarantees under convex and nonconvex global objective settings, and under merely monotone games it achieves asymptotic convergence of the welfare loss minimization scheme.

What carries the argument

The Nash equilibrium-seeking update rule on training dataset sizes, which dynamically adjusts client contributions at each communication round to reflect strategic participation.

If this is right

Clients achieve competitive global model performance while converging to stable data participation strategies.
Performance guarantees hold for the incentive-aware FL algorithm in both convex and nonconvex settings.
Asymptotic convergence is established for the welfare loss minimization framework under merely monotone game settings.
Numerical results on MNIST and CIFAR-10 datasets support the theoretical findings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the Nash equilibrium update converges faster than standard participation, it could lower overall communication costs in large-scale deployments.
The framework could be adapted to settings where agents have heterogeneous costs not captured by dataset size alone.
Strategic participation models like this may help design FL systems that maintain participation rates above a critical threshold for effective learning.

Load-bearing premise

Strategic data participation can be accurately modeled and stabilized by a Nash equilibrium-seeking update rule on dataset sizes, assuming the underlying game is merely monotone for the welfare loss convergence.

What would settle it

Observing that dataset sizes fail to stabilize at a Nash equilibrium in repeated experiments with rational agents, or that the global model performance degrades below standard federated averaging without the incentive mechanism.

Figures

Figures reproduced from arXiv: 2603.20873 by Farzad Yousefian, Fateme Maleki, Krishnan Raghavan.

**Figure 1.** Figure 1: MNIST. Left: Global cross-entropy loss across different local steps H. Right: Convergence of client contributions Ni,r toward the Nash equilibrium under IncentFedAvg [PITH_FULL_IMAGE:figures/full_fig_p011_1.png] view at source ↗

**Figure 2.** Figure 2: CIFAR-10. Left: Global cross-entropy loss across different local steps H. Right: Convergence of client contributions Ni,r toward the Nash equilibrium under IncentFedAvg. The local loss for each client i is defined by the cross-entropy E = − 1 J PJ j=1 PC c=1 vjc log(v ′ jc), where v ′ jc is the predicted probability derived from the softmax output of the pre-activation a (2) c . We employ the random discov… view at source ↗

read the original abstract

Federated learning (FL) is a communication-efficient collaborative learning framework that enables model training across multiple agents with private local datasets. While the benefits of FL in improving global model performance are well established, individual agents may behave strategically, balancing the learning payoff against the cost of contributing their local data. Motivated by the need for FL frameworks that successfully retain participating agents, we propose an incentive-aware federated averaging method in which, at each communication round, clients transmit both their local model parameters and their updated training dataset sizes to the server. The dataset sizes are dynamically adjusted via a Nash equilibrium (NE)-seeking update rule that captures strategic data participation. We analyze the proposed method under convex and nonconvex global objective settings and establish performance guarantees for the resulting incentive-aware FL algorithm. Furthermore, under a merely monotone game setting, we consider a welfare loss minimization framework and establish asymptotic convergence of the scheme. Numerical experiments on the MNIST and CIFAR-10 datasets demonstrate that agents achieve competitive global model performance while converging to stable data participation strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adds NE-seeking dataset size updates to FedAvg for strategic agents and claims convergence under monotone games, but the monotonicity looks assumed rather than derived from the payoff structure.

read the letter

The core idea is to extend federated averaging so that clients send both model parameters and updated dataset sizes, with the sizes adjusted each round by a Nash equilibrium seeking rule meant to capture strategic participation. They give performance guarantees for convex and nonconvex global objectives and show asymptotic convergence of a welfare-loss scheme when the game is merely monotone. Experiments on MNIST and CIFAR-10 are included to illustrate that agents settle into stable participation levels while the global model stays competitive.

Referee Report

2 major / 2 minor

Summary. The paper proposes an incentive-aware federated averaging algorithm in which clients transmit both local model parameters and dynamically updated training dataset sizes at each round. Dataset sizes are adjusted via a Nash equilibrium-seeking update rule modeling strategic participation. Performance guarantees are established for the resulting algorithm under convex and nonconvex global objectives, and asymptotic convergence of a welfare-loss minimization framework is shown under the assumption of a merely monotone game. Experiments on MNIST and CIFAR-10 illustrate competitive global model performance and convergence to stable participation strategies.

Significance. If the results hold, the work addresses a practical gap in federated learning by incorporating strategic incentives to retain agents, combining NE-seeking dynamics with FedAvg-style averaging. This could inform real-world FL systems where data contribution incurs costs, and the monotone-game convergence result offers a pathway for analyzing incentive-compatible participation.

major comments (2)

[§4.2] §4.2 (Analysis under monotone games): The asymptotic convergence of the welfare-loss minimization scheme is established under the assumption that the induced game is merely monotone, yet the manuscript does not derive the required pseudo-gradient monotonicity condition from the agents' utilities (learning payoff minus participation cost) or from the specific NE-seeking update rule on dataset sizes. If the inner-product condition fails for the chosen payoff and cost functions, both the NE-seeking dynamics and the convergence guarantee cease to apply.
[Theorem 3] Theorem 3 (Performance guarantees): The error bounds for the incentive-aware FedAvg in the nonconvex case adapt standard FedAvg analysis to time-varying dataset sizes, but do not explicitly incorporate the additional error introduced by the dataset-size dynamics; the bounds therefore rest on an implicit Lipschitz or bounded-variation assumption on the size updates that is not stated or verified.

minor comments (2)

[Abstract] The term 'merely monotone' is used in the abstract and §4 without a self-contained definition or reference to the precise pseudo-gradient condition; adding a short paragraph or citation would improve accessibility.
[§5] In the experimental section, the simulation of strategic behavior (cost parameters, initial dataset sizes, and how the NE-seeking rule is discretized) is only sketched; explicit values or pseudocode would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript. We address each major comment below and describe the revisions we plan to incorporate.

read point-by-point responses

Referee: [§4.2] §4.2 (Analysis under monotone games): The asymptotic convergence of the welfare-loss minimization scheme is established under the assumption that the induced game is merely monotone, yet the manuscript does not derive the required pseudo-gradient monotonicity condition from the agents' utilities (learning payoff minus participation cost) or from the specific NE-seeking update rule on dataset sizes. If the inner-product condition fails for the chosen payoff and cost functions, both the NE-seeking dynamics and the convergence guarantee cease to apply.

Authors: We agree that the current manuscript does not explicitly derive the pseudo-gradient monotonicity condition from the agents' utility functions or the NE-seeking update rule. In the revised version we will add a dedicated lemma (or appendix subsection) that verifies the inner-product monotonicity condition holds for the specific learning-payoff and participation-cost functions employed in the paper. This will directly justify the application of the asymptotic convergence result for the welfare-loss minimization framework. revision: yes
Referee: [Theorem 3] Theorem 3 (Performance guarantees): The error bounds for the incentive-aware FedAvg in the nonconvex case adapt standard FedAvg analysis to time-varying dataset sizes, but do not explicitly incorporate the additional error introduced by the dataset-size dynamics; the bounds therefore rest on an implicit Lipschitz or bounded-variation assumption on the size updates that is not stated or verified.

Authors: The referee is correct that the nonconvex error bounds in Theorem 3 adapt the standard FedAvg analysis without an explicit accounting of the variation induced by the dataset-size dynamics. We will revise the statement and proof of Theorem 3 to include an additional error term that captures the bounded variation of the size updates, and we will add an explicit assumption (together with a short verification) that the NE-seeking rule keeps the dataset sizes within a compact set, thereby making the bounded-variation condition rigorous. revision: yes

Circularity Check

0 steps flagged

No circularity: analysis rests on explicit assumptions and standard techniques

full rationale

The paper introduces an incentive-aware FedAvg variant that augments standard averaging with a separate NE-seeking update on dataset sizes. Performance guarantees are stated under convexity/non-convexity of the global objective and, separately, under the explicit assumption that the induced participation game is merely monotone. These are declared modeling choices rather than quantities fitted from the same data or defined in terms of the target convergence result. No equation reduces the claimed convergence or welfare-loss bound to a self-referential fit, and no load-bearing step is justified solely by a self-citation whose content is itself unverified. The derivation chain therefore remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on modeling strategic participation as a game whose equilibrium is found by an iterative update rule and on a monotonicity assumption to obtain convergence; no free parameters or new entities are mentioned in the abstract.

axioms (1)

domain assumption The participation game is merely monotone
Invoked to establish asymptotic convergence of the welfare loss minimization scheme under the NE-seeking update.

pith-pipeline@v0.9.0 · 5714 in / 1257 out tokens · 52721 ms · 2026-05-21T10:41:21.714751+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Under Assumption 2.2, the mapping F(N) ≜ (Fi(N))mi=1 is µF-strongly monotone and LF-Lipschitz continuous, where Fi(N) ≜ ∇Ni li(N).
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We establish asymptotic convergence of the scheme under a merely monotone game setting.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1]

Communication-efficient learning of deep networks from decentralized data,

B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” inArtificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282

work page 2017
[2]

Advances and open problems in federated learning,

P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummingset al., “Advances and open problems in federated learning,”Foundations and Trends in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021

work page 2021
[3]

Federated optimization in heterogeneous networks,

T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” inProceedings of Machine Learning and Systems (MLSys), 2020

work page 2020
[4]

Scaffold: Stochastic controlled averaging for federated learning,

S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, “Scaffold: Stochastic controlled averaging for federated learning,”International Conference on Machine Learning, pp. 5132–5143, 2020

work page 2020
[5]

Local sgd converges fast and communicates little,

S. U. Stich, “Local sgd converges fast and communicates little,”International Conference on Learning Representations (ICLR), 2019

work page 2019
[6]

One for one, or all for all: Equilibria and optimality of collaboration in federated learning,

A. Blum, N. Haghtalab, R. L. Phillips, and H. Shao, “One for one, or all for all: Equilibria and optimality of collaboration in federated learning,” inInternational Conference on Machine Learning. PMLR, 2021, pp. 1005–1014

work page 2021
[7]

Incentives in federated learning: Equilibria, dynamics, and mechanisms for welfare maximization,

A. Murhekar, Z. Yuan, B. Ray Chaudhury, B. Li, and R. Mehta, “Incentives in federated learning: Equilibria, dynamics, and mechanisms for welfare maximization,”Advances in Neural Information Processing Systems, vol. 36, pp. 17811–17831, 2023

work page 2023
[8]

Optimal auction design,

R. B. Myerson, “Optimal auction design,”Mathematics of operations research, vol. 6, no. 1, pp. 58–73, 1981

work page 1981
[9]

Gaining competitive advantage through customer value oriented management,

F. Huber, A. Herrmann, and R. E. Morgan, “Gaining competitive advantage through customer value oriented management,”Journal of consumer marketing, vol. 18, no. 1, pp. 41–53, 2001

work page 2001
[10]

Incentive analysis for agent participation in federated learning,

L. Yi, X. Niu, and E. Wei, “Incentive analysis for agent participation in federated learning,” in 2025 IEEE 64th Conference on Decision and Control (CDC). IEEE, 2025, pp. 6346–6351

work page 2025
[11]

Incentivizing data collaboration: A mechanism design approach,

S. Alaei, A. Daei Naby, A. Makhdoumi, and A. Malekian, “Incentivizing data collaboration: A mechanism design approach,” inProceedings of the Algorithmic Collective Action Workshop at NeurIPS 2025, 2025, presented at NeurIPS 2025 Workshop. [Online]. Available: https://neurips.cc/virtual/2025/129467

work page 2025
[12]

Incentive mechanism design for unbiased federated learning with randomized client participation,

B. Luo, Y. Feng, S. Wang, J. Huang, and L. Tassiulas, “Incentive mechanism design for unbiased federated learning with randomized client participation,” inProceedings of the International Conference on Distributed Computing Systems (ICDCS), 2023, pp. 545–555

work page 2023
[13]

Fedtoken: Tokenized incentives for data contribution in federated learning,

S. R. Pandey, L. D. Nguyen, and P. Popovski, “Fedtoken: Tokenized incentives for data contribution in federated learning,” 2022, workshop on Federated Learning: Recent Advances and New Challenges (FL-NeurIPS’22). [Online]. Available: https://arxiv.org/abs/2209.09775

work page arXiv 2022
[14]

A trusted federated incentive mechanism based on blockchain for 6g network data security,

Y. Luo, B. Gong, H. Zhu, and C. Guo, “A trusted federated incentive mechanism based on blockchain for 6g network data security,”Applied Sciences, vol. 13, no. 19, p. 10586, 2023

work page 2023
[15]

Facchinei and J.-S

F. Facchinei and J.-S. Pang,Finite-dimensional variational inequalities and complementarity problems. Springer, 2003

work page 2003
[16]

Modeling the cost of information sharing in collaborative systems,

Z. Li and S. Raghunathan, “Modeling the cost of information sharing in collaborative systems,” Journal of Management Information Systems, vol. 31, no. 4, pp. 123–150, 2014

work page 2014
[17]

Data sharing and the cost of privacy,

K. C. Laudon, “Data sharing and the cost of privacy,”Communications of the ACM, vol. 39, no. 7, pp. 40–47, 1996

work page 1996
[18]

Information sharing and organizational performance,

A. Jaisinghet al., “Information sharing and organizational performance,”Journal of Strategic Information Systems, vol. 17, no. 2, pp. 103–120, 2008. APPENDIX 6.1 Supplementary material Remark 6.1.If Assumption 2.1 (i) holds, then the global loss functionf is L-smooth. This is because for anyx,y∈Rn we have ∥∇f(x)−∇f(y)∥=∥∑m i=1p∗ i (∇fi(x)−∇fi(y)∥ ≤∑m i=1∥...

work page 2008
[19]

Lemma 6.9.Consider Algorithm 1

This completes the proof. Lemma 6.9.Consider Algorithm 1. Let Assumptions 2.2 and 4.7 hold andγ≤1√ 3HL. Then, for any communication round0≤r≤R−1, we have ∑Tr+1−1 k=Tr E[¯ek|Fk] ≤18mLγ2B2H2∑Tr+1−1 t=Tr+1 E[Df(¯xt,x∗)] + 9γ2 ( ν2 +mG 2 ) H3. Proof. Consider Lemma4.11 (i). Invoking Lemma 6.5 and usingE[f(¯xk)−f∗] = E[Df(¯xk,x∗)], we may write E[¯ek]≤γ2(H−1)∑...

work page

[1] [1]

Communication-efficient learning of deep networks from decentralized data,

B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” inArtificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282

work page 2017

[2] [2]

Advances and open problems in federated learning,

P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummingset al., “Advances and open problems in federated learning,”Foundations and Trends in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021

work page 2021

[3] [3]

Federated optimization in heterogeneous networks,

T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” inProceedings of Machine Learning and Systems (MLSys), 2020

work page 2020

[4] [4]

Scaffold: Stochastic controlled averaging for federated learning,

S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, “Scaffold: Stochastic controlled averaging for federated learning,”International Conference on Machine Learning, pp. 5132–5143, 2020

work page 2020

[5] [5]

Local sgd converges fast and communicates little,

S. U. Stich, “Local sgd converges fast and communicates little,”International Conference on Learning Representations (ICLR), 2019

work page 2019

[6] [6]

One for one, or all for all: Equilibria and optimality of collaboration in federated learning,

A. Blum, N. Haghtalab, R. L. Phillips, and H. Shao, “One for one, or all for all: Equilibria and optimality of collaboration in federated learning,” inInternational Conference on Machine Learning. PMLR, 2021, pp. 1005–1014

work page 2021

[7] [7]

Incentives in federated learning: Equilibria, dynamics, and mechanisms for welfare maximization,

A. Murhekar, Z. Yuan, B. Ray Chaudhury, B. Li, and R. Mehta, “Incentives in federated learning: Equilibria, dynamics, and mechanisms for welfare maximization,”Advances in Neural Information Processing Systems, vol. 36, pp. 17811–17831, 2023

work page 2023

[8] [8]

Optimal auction design,

R. B. Myerson, “Optimal auction design,”Mathematics of operations research, vol. 6, no. 1, pp. 58–73, 1981

work page 1981

[9] [9]

Gaining competitive advantage through customer value oriented management,

F. Huber, A. Herrmann, and R. E. Morgan, “Gaining competitive advantage through customer value oriented management,”Journal of consumer marketing, vol. 18, no. 1, pp. 41–53, 2001

work page 2001

[10] [10]

Incentive analysis for agent participation in federated learning,

L. Yi, X. Niu, and E. Wei, “Incentive analysis for agent participation in federated learning,” in 2025 IEEE 64th Conference on Decision and Control (CDC). IEEE, 2025, pp. 6346–6351

work page 2025

[11] [11]

Incentivizing data collaboration: A mechanism design approach,

S. Alaei, A. Daei Naby, A. Makhdoumi, and A. Malekian, “Incentivizing data collaboration: A mechanism design approach,” inProceedings of the Algorithmic Collective Action Workshop at NeurIPS 2025, 2025, presented at NeurIPS 2025 Workshop. [Online]. Available: https://neurips.cc/virtual/2025/129467

work page 2025

[12] [12]

Incentive mechanism design for unbiased federated learning with randomized client participation,

B. Luo, Y. Feng, S. Wang, J. Huang, and L. Tassiulas, “Incentive mechanism design for unbiased federated learning with randomized client participation,” inProceedings of the International Conference on Distributed Computing Systems (ICDCS), 2023, pp. 545–555

work page 2023

[13] [13]

Fedtoken: Tokenized incentives for data contribution in federated learning,

S. R. Pandey, L. D. Nguyen, and P. Popovski, “Fedtoken: Tokenized incentives for data contribution in federated learning,” 2022, workshop on Federated Learning: Recent Advances and New Challenges (FL-NeurIPS’22). [Online]. Available: https://arxiv.org/abs/2209.09775

work page arXiv 2022

[14] [14]

A trusted federated incentive mechanism based on blockchain for 6g network data security,

Y. Luo, B. Gong, H. Zhu, and C. Guo, “A trusted federated incentive mechanism based on blockchain for 6g network data security,”Applied Sciences, vol. 13, no. 19, p. 10586, 2023

work page 2023

[15] [15]

Facchinei and J.-S

F. Facchinei and J.-S. Pang,Finite-dimensional variational inequalities and complementarity problems. Springer, 2003

work page 2003

[16] [16]

Modeling the cost of information sharing in collaborative systems,

Z. Li and S. Raghunathan, “Modeling the cost of information sharing in collaborative systems,” Journal of Management Information Systems, vol. 31, no. 4, pp. 123–150, 2014

work page 2014

[17] [17]

Data sharing and the cost of privacy,

K. C. Laudon, “Data sharing and the cost of privacy,”Communications of the ACM, vol. 39, no. 7, pp. 40–47, 1996

work page 1996

[18] [18]

Information sharing and organizational performance,

A. Jaisinghet al., “Information sharing and organizational performance,”Journal of Strategic Information Systems, vol. 17, no. 2, pp. 103–120, 2008. APPENDIX 6.1 Supplementary material Remark 6.1.If Assumption 2.1 (i) holds, then the global loss functionf is L-smooth. This is because for anyx,y∈Rn we have ∥∇f(x)−∇f(y)∥=∥∑m i=1p∗ i (∇fi(x)−∇fi(y)∥ ≤∑m i=1∥...

work page 2008

[19] [19]

Lemma 6.9.Consider Algorithm 1

This completes the proof. Lemma 6.9.Consider Algorithm 1. Let Assumptions 2.2 and 4.7 hold andγ≤1√ 3HL. Then, for any communication round0≤r≤R−1, we have ∑Tr+1−1 k=Tr E[¯ek|Fk] ≤18mLγ2B2H2∑Tr+1−1 t=Tr+1 E[Df(¯xt,x∗)] + 9γ2 ( ν2 +mG 2 ) H3. Proof. Consider Lemma4.11 (i). Invoking Lemma 6.5 and usingE[f(¯xk)−f∗] = E[Df(¯xk,x∗)], we may write E[¯ek]≤γ2(H−1)∑...

work page