Recognition: 2 Lean theorem links
ERIS: Enhancing Privacy and Scalability in Federated Learning via Federated Shard Aggregation
Pith reviewed 2026-05-16 05:39 UTC · model grok-4.3
The pith
ERIS partitions each client update into shards aggregated by multiple other clients, reproducing the centralized update exactly while bounding any single observer's leakage to the fraction of the update it can see.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central mechanism is Federated Shard Aggregation: each client update is partitioned into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators; after reassembly the resulting vector is identical to the direct centralized aggregate. This removes the single-point aggregation bottleneck, limits the information visible to any observer, and yields a convergence guarantee identical to standard federated averaging under the usual assumptions. When combined with Distributed Shifted Compression the same equivalence holds while transmitted payloads and exposed coordinates are further reduced.
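A minimal numerical sketch of the claimed equivalence, assuming nothing about the paper's actual protocol beyond non-overlapping shards and linear aggregation; the coordinate assignment, aggregator count A, and all names below are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, A = 5, 12, 3  # K clients, d-dimensional updates, A client-side aggregators

updates = rng.normal(size=(K, d))        # one update vector per client
direct = updates.mean(axis=0)            # direct centralized FedAvg aggregate

# Non-overlapping shards: each coordinate goes to exactly one aggregator,
# so the masks are disjoint and complete by construction.
assignment = rng.integers(0, A, size=d)
masks = [assignment == a for a in range(A)]

# Each aggregator averages only the coordinates in its shard ...
partials = [updates[:, m].mean(axis=0) for m in masks]

# ... and reassembly scatters the partial aggregates back into place.
reassembled = np.empty(d)
for m, p in zip(masks, partials):
    reassembled[m] = p

# In exact arithmetic the two vectors coincide; in floats, up to rounding.
assert np.allclose(direct, reassembled)
```

Because averaging is linear and the masks partition the coordinates, no information is lost or double-counted; this is the whole load-bearing premise in miniature.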
What carries the argument
Federated Shard Aggregation (FSA), which splits each client update into non-overlapping shards, distributes their aggregation across multiple client-side aggregators, and reassembles the shards to recover the exact centralized update.
If this is right
- The final global model is mathematically identical to the one obtained by standard centralized aggregation.
- Mutual information leakage to any single aggregator is bounded by the fraction of shards it observes and decreases with more aggregators.
- Adding Distributed Shifted Compression further reduces communication while preserving the same leakage bound (a generic shifted-compression sketch follows this list).
- Empirical robustness to membership inference and reconstruction attacks improves without accuracy loss.
- The central server no longer performs full aggregation, removing a communication and compute bottleneck.
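The summary does not spell out how DSC works. As a point of reference only: "shifted compression" in the FL literature (e.g., DIANA-style schemes) typically compresses the difference between the current update and a locally maintained shift, then advances the shift so both sides stay synchronized. A generic sketch under that assumption, with the top-k operator and all names chosen for illustration rather than taken from the paper:

```python
import numpy as np

def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude coordinates; zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def shifted_compress(update, shift, k, lr=1.0):
    """Compress update - shift; only the compressed delta is transmitted."""
    delta = top_k(update - shift, k)
    new_shift = shift + lr * delta  # receiver can reproduce this update
    return delta, new_shift

rng = np.random.default_rng(1)
g, h = rng.normal(size=8), np.zeros(8)
for _ in range(4):                  # the shift tracks g across rounds
    delta, h = shifted_compress(g, h, k=2)
print(np.linalg.norm(g - h))        # residual shrinks round over round
```

Only the sparse delta crosses the network, and an observer of one round sees at most k of d coordinates, which is consistent in spirit with the claim that compression tightens the exposed fraction.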
Where Pith is reading between the lines
- The same sharding idea could be applied to aggregation rules other than simple averaging.
- Coordination overhead for selecting and synchronizing the client-side aggregators may become noticeable in very large or highly dynamic networks.
- Shard boundaries could interact with client data heterogeneity and might require adaptive partitioning to avoid uneven load.
- The approach opens a path to hybrid systems in which some shards remain visible to the server while others stay distributed.
Load-bearing premise
Reassembly after distributed shard aggregation produces exactly the same vector as direct centralized aggregation.
What would settle it
A side-by-side run in which the reassembled model differs from the direct FedAvg model in any coordinate or converges to a measurably different accuracy would falsify the equivalence claim.
Original abstract
Scaling Federated Learning (FL) to billion-parameter models forces a challenging trade-off between privacy, scalability, and model utility. Existing solutions often tackle these challenges in isolation, sacrificing accuracy, relying on costly cryptographic tools, or introducing communication and optimization inefficiencies that affect convergence. We introduce ERIS, an FL framework centered on Federated Shard Aggregation (FSA), a novel mechanism that partitions each client update into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators. FSA removes the central aggregation bottleneck, limits the information visible to any single observer, and preserves the centralized FL update after reassembly. ERIS can further readily integrate Distributed Shifted Compression (DSC) to reduce transmitted payloads and exposed coordinates. We prove that ERIS preserves convergence under standard assumptions and bounds mutual information leakage by the observable fraction of each update, decreasing with the number of client-side aggregators, and with the compression level when DSC is enabled. Experiments across image and text tasks, including large language models, show that ERIS achieves FedAvg-level utility while substantially reducing communication bottlenecks and improving robustness to membership inference and reconstruction attacks, without relying on heavy cryptography or utility-degrading perturbations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ERIS, a federated learning framework built around Federated Shard Aggregation (FSA). FSA partitions each client's model update into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators; the shards are later reassembled to recover the original centralized update. The framework optionally incorporates Distributed Shifted Compression (DSC) to shrink payloads. The authors prove that ERIS preserves convergence under standard FL assumptions and that mutual-information leakage is bounded by the observable fraction of each update (which decreases with the number of aggregators and with compression). Experiments on image classification, text tasks, and large language models report FedAvg-level accuracy together with reduced communication volume and improved resistance to membership-inference and reconstruction attacks.
Significance. If the reassembly equivalence and the stated leakage bound hold, ERIS would offer a lightweight, cryptography-free route to simultaneously improving scalability, communication efficiency, and privacy in federated learning. The combination of a convergence proof, an explicit information-theoretic bound, and empirical results on LLMs would constitute a concrete advance over prior shard- or compression-based FL methods that typically trade utility for privacy.
major comments (2)
- [convergence proof / Theorem 1] The central convergence claim rests on the assertion that reassembly after distributed shard aggregation produces exactly the same update as direct central aggregation. The abstract states this preservation occurs, but the equivalence is not automatic from non-overlapping shards; any floating-point discrepancy, ordering artifact, or scaling introduced during distribution or reassembly would invalidate direct application of standard FL convergence theorems. The proof section must therefore contain an explicit lemma or proposition showing that the reassembled vector equals the original aggregated vector with probability 1 (or up to machine precision). A two-line demonstration of why bitwise equality cannot be expected follows this list.
- [information-leakage analysis] The mutual-information leakage bound is expressed directly in terms of the observable fraction of the update. Because this fraction is a design parameter (number of aggregators and compression ratio), the bound is essentially tautological once the observable set is fixed. The paper should clarify whether the bound is information-theoretic (accounting for the adversary's view of the shard-aggregation protocol) or merely a restatement of the design choice, and whether it remains valid when DSC is enabled.
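The floating-point caveat in the first comment is easy to exhibit: summation order alone changes the low bits of a sum, which is exactly why an "up to machine precision" qualifier, rather than bitwise equality, belongs in the lemma:

```python
import math

# Floating-point addition is not associative:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))               # False
print(math.isclose((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3)))   # True: close, not equal
```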
minor comments (2)
- [Section 3] Notation for shard indices, aggregator assignment, and the reassembly operator should be introduced once and used consistently; the current description mixes “shard,” “partition,” and “coordinate” without a single definition table.
- [Experiments] The experimental section reports “FedAvg-level utility” but does not tabulate the exact accuracy deltas or communication-volume reductions for each task; a single summary table comparing FedAvg, ERIS, and ERIS+DSC on all benchmarks would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the presentation of the convergence proof and information-leakage analysis.
Point-by-point responses
- Referee: [convergence proof / Theorem 1] The central convergence claim rests on the assertion that reassembly after distributed shard aggregation produces exactly the same update as direct central aggregation. The abstract states this preservation occurs, but the equivalence is not automatic from non-overlapping shards; any floating-point discrepancy, ordering artifact, or scaling introduced during distribution or reassembly would invalidate direct application of standard FL convergence theorems. The proof section must therefore contain an explicit lemma or proposition showing that the reassembled vector equals the original aggregated vector with probability 1 (or up to machine precision).
Authors: We agree that an explicit lemma is required for rigor. The equivalence follows from the non-overlapping partition and the linearity of the aggregation operator (sum or average), which is performed identically whether centralized or distributed across shards. To address floating-point and ordering concerns, we will add Lemma 1 in the proof section proving that the reassembled vector equals the centrally aggregated vector up to machine precision under deterministic shard assignment and no scaling. This lemma will justify direct application of standard FL convergence results to ERIS. revision: yes
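A sketch of the identity such a Lemma 1 would formalize, in notation assumed here for illustration (binary masks m^(a), Hadamard product ⊙, client updates g_k), not taken from the paper:

```latex
% Assumed setup: masks m^{(a)} \in \{0,1\}^d, disjoint
% (m^{(a)} \odot m^{(b)} = 0 for a \neq b) and complete
% (\sum_a m^{(a)} = \mathbf{1}); g_k denotes client k's update.
\underbrace{\sum_{a=1}^{A} \frac{1}{K} \sum_{k=1}^{K} m^{(a)} \odot g_k}_{\text{reassembled}}
  \;=\; \Big( \sum_{a=1}^{A} m^{(a)} \Big) \odot \frac{1}{K} \sum_{k=1}^{K} g_k
  \;=\; \frac{1}{K} \sum_{k=1}^{K} g_k .
```

The first equality uses only bilinearity of the Hadamard product; the second uses completeness of the masks. Floating point weakens the equalities to "up to rounding," which is precisely the qualifier the referee requests.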
- Referee: [information-leakage analysis] The mutual-information leakage bound is expressed directly in terms of the observable fraction of the update. Because this fraction is a design parameter (number of aggregators and compression ratio), the bound is essentially tautological once the observable set is fixed. The paper should clarify whether the bound is information-theoretic (accounting for the adversary's view of the shard-aggregation protocol) or merely a restatement of the design choice, and whether it remains valid when DSC is enabled.
Authors: The bound is information-theoretic and derived via the chain rule and data-processing inequality applied to the adversary's observation of the FSA protocol: the adversary sees only the shards routed to visible aggregators, so mutual information is bounded by the entropy of that observable subset. It is not a tautology but a direct consequence of the protocol mechanics. We will expand Section 4 to include the full derivation from the protocol steps. When DSC is enabled the bound tightens further because compression is applied after sharding; we will add a corollary confirming the bound remains valid under DSC. revision: yes
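One plausible shape for the derivation the authors describe, with every symbol here an assumption for illustration rather than the paper's definition: under an i.i.d.-coordinate model the adversary's view carries entropy proportional to its visible fraction, and any compression can only reduce mutual information.

```latex
% Assumed setup: U is a client update whose d coordinates are i.i.d.
% with marginal entropy h; S_a is the shard set visible to aggregator a;
% V_a = U_{S_a} is its view; C is any (possibly randomized) compressor.
I(U; V_a) \;\le\; H(V_a) \;=\; |S_a|\, h \;=\; \frac{|S_a|}{d}\, H(U),
\qquad
I\big(U;\, \mathcal{C}(V_a)\big) \;\le\; I(U; V_a)
\quad \text{(data-processing inequality)} .
```

This is non-tautological only to the extent that the protocol guarantees the adversary observes nothing beyond V_a, which is what the expanded Section 4 would need to establish.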
Circularity Check
No significant circularity in ERIS derivation chain
full rationale
The paper's convergence claim rests on a proof that the reassembled update after non-overlapping shard aggregation equals the direct central aggregation, which follows directly from the summation mechanism by construction rather than from any fitted parameter or self-referential definition. The mutual information leakage bound is expressed explicitly in terms of the observable fraction of each update (a controllable design parameter) and decreases with the number of aggregators and compression level. Standard FL convergence assumptions are invoked externally without load-bearing self-citations or uniqueness theorems imported from the authors' prior work. No steps reduce by construction to inputs, fitted predictions, or renamed known results; the derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard assumptions for convergence in federated averaging (e.g., L-smoothness and unbiased gradient estimates) hold for the reassembled updates.
invented entities (1)
- Federated Shard Aggregation (FSA): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tagged unclear: relation between the paper passage and the cited Recognition theorem).
  Paper passage: partition each client update into A disjoint shards using categorical masks m_t^(a) with disjointness and completeness; the aggregator computes v_t^(a) = s_t^(a) + (1/K) Σ_k v_t^(k,(a)).
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery and J-cost orbit (tagged unclear: relation between the paper passage and the cited Recognition theorem).
  Paper passage: convergence bound (5) under Assumptions 3.1-3.2 (L-smoothness, unbiased estimator with variance constants C1, C2).
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.