pith. machine review for the scientific record.

arxiv: 2602.08617 · v2 · submitted 2026-02-09 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

ERIS: Enhancing Privacy and Scalability in Federated Learning via Federated Shard Aggregation

Authors on Pith · no claims yet

Pith reviewed 2026-05-16 05:39 UTC · model grok-4.3

classification 💻 cs.LG
keywords federated learning · shard aggregation · privacy · scalability · mutual information · membership inference · distributed aggregation · model compression

The pith

ERIS partitions each client's update into shards that are aggregated by multiple other clients, reproducing the centralized update exactly while bounding any single observer's leakage to the fraction of coordinates it sees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

ERIS introduces Federated Shard Aggregation so that each client's model update is split into non-overlapping shards whose partial sums are computed by several client-side aggregators rather than a single server. The shards are reassembled at the end to produce exactly the same global update that standard FedAvg would have produced. Because any single aggregator sees only a fraction of the coordinates, mutual information leakage is bounded by that fraction and shrinks further when more aggregators are used or when Distributed Shifted Compression is applied. Experiments on image classification and large language model tasks show that the approach reaches the same final accuracy as FedAvg while lowering total communication volume and raising resistance to membership-inference and reconstruction attacks.

Core claim

The central mechanism is Federated Shard Aggregation: each client update is partitioned into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators; after reassembly the resulting vector is identical to the direct centralized aggregate. This removes the single-point aggregation bottleneck, limits the information visible to any observer, and yields a convergence guarantee identical to standard federated averaging under the usual assumptions. When combined with Distributed Shifted Compression the same equivalence holds while transmitted payloads and exposed coordinates are further reduced.

What carries the argument

Federated Shard Aggregation (FSA), which splits each client update into non-overlapping shards, distributes their aggregation across multiple client-side aggregators, and reassembles the shards to recover the exact centralized update.
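The mechanism is small enough to sketch end to end. The following is an illustrative reconstruction under assumed choices (contiguous coordinate shards, plain averaging, a fixed client order), not the paper's code:

```python
# Illustrative FSA sketch (assumed names and shard layout, not the paper's API).
def fedavg(updates):
    """Direct centralized average: the baseline that FSA must reproduce."""
    k = len(updates)
    return [sum(u[i] for u in updates) / k for i in range(len(updates[0]))]

def fsa_round(updates, num_aggregators):
    """Each aggregator averages one contiguous, non-overlapping coordinate
    shard across all clients; concatenating the partial results reassembles
    the global update."""
    d, k = len(updates[0]), len(updates)
    step = d // num_aggregators
    bounds = [(a * step, (a + 1) * step if a < num_aggregators - 1 else d)
              for a in range(num_aggregators)]
    reassembled = []
    for lo, hi in bounds:  # conceptually, each iteration runs on a different aggregator
        reassembled.extend(sum(u[i] for u in updates) / k for i in range(lo, hi))
    return reassembled

# Toy check: 3 clients, 8 parameters, 2 aggregators.
updates = [[float(c * 10 + i) for i in range(8)] for c in range(3)]
assert fsa_round(updates, 2) == fedavg(updates)  # exact, per-coordinate equality
```

Because every coordinate is summed over the same clients in the same order on exactly one aggregator, the equality here is bitwise, which is the property the convergence claim leans on.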

If this is right

  • The final global model is mathematically identical to the one obtained by standard centralized aggregation.
  • Mutual information leakage to any single aggregator is bounded by the fraction of shards it observes and decreases with more aggregators.
  • Adding Distributed Shifted Compression further reduces communication while preserving the same leakage bound.
  • Empirical robustness to membership inference and reconstruction attacks improves without accuracy loss.
  • The central server no longer performs full aggregation, removing a communication and compute bottleneck.
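The compression bullet can be made concrete. This page does not specify DSC's sparsifier, so the sketch below assumes a rand-p scheme applied to the shifted difference (matching the "retention probability p" wording in the Figure 3 caption); all names are illustrative:

```python
import random

def dsc_compress(update, shift, p, rng):
    """Assumed rand-p sketch of Distributed Shifted Compression: keep each
    shifted coordinate with probability p, rescaled by 1/p so the estimator
    is unbiased; dropped coordinates fall back to the shared shift."""
    out = []
    for u, s in zip(update, shift):
        if rng.random() < p:
            out.append(s + (u - s) / p)   # transmitted (and exposed) coordinate
        else:
            out.append(s)                 # neither transmitted nor exposed
    return out

rng = random.Random(0)
update, shift, p = [1.0, -2.0, 3.0, 0.5], [0.0, 0.0, 1.0, 0.0], 0.5

# Unbiasedness: the average of many independent compressions recovers the
# update, so convergence is preserved while each aggregator now sees roughly
# a fraction p of its 1/A coordinate share.
trials = 20000
acc = [0.0] * len(update)
for _ in range(trials):
    acc = [a + x for a, x in zip(acc, dsc_compress(update, shift, p, rng))]
mean = [a / trials for a in acc]
assert all(abs(m - u) < 0.1 for m, u in zip(mean, update))
```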

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sharding idea could be applied to aggregation rules other than simple averaging.
  • Coordination overhead for selecting and synchronizing the client-side aggregators may become noticeable in very large or highly dynamic networks.
  • Shard boundaries could interact with client data heterogeneity and might require adaptive partitioning to avoid uneven load.
  • The approach opens a path to hybrid systems in which some shards remain visible to the server while others stay distributed.
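The first of these extensions is easy to make concrete: because shards partition coordinates, any coordinate-wise aggregation rule distributes across aggregators the same way averaging does. A hypothetical sketch with a coordinate-wise median (not something the paper evaluates):

```python
import statistics

def shard_aggregate(updates, rule, num_aggregators):
    """Apply a per-coordinate rule independently on contiguous coordinate
    shards; non-coordinate-wise rules (e.g. norm clipping) would not
    distribute this way."""
    d = len(updates[0])
    step = max(1, d // num_aggregators)
    out = []
    for lo in range(0, d, step):          # one contiguous shard at a time
        hi = min(lo + step, d)
        out.extend(rule([u[i] for u in updates]) for i in range(lo, hi))
    return out

updates = [[1.0, 5.0, 2.0], [2.0, 1.0, 9.0], [3.0, 4.0, 4.0]]
central = [statistics.median([u[i] for u in updates]) for i in range(3)]
assert shard_aggregate(updates, statistics.median, 2) == central  # [2.0, 4.0, 4.0]
```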

Load-bearing premise

Reassembly after distributed shard aggregation produces exactly the same vector as direct centralized aggregation.

What would settle it

A side-by-side run in which the reassembled model differs from the direct FedAvg model in any coordinate or converges to a measurably different accuracy would falsify the equivalence claim.
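A microscopic version of that side-by-side run, on assumed toy numbers, also locates the one place bitwise equality could break: reassembly is exact as long as each coordinate's client contributions are summed in the same order centrally and on its aggregator, and only a reordering of clients disturbs it:

```python
# One coordinate's contributions from six clients (toy values chosen to make
# floating-point non-associativity visible).
vals = [0.1, 0.2, 0.3, 1e16, -1e16, 0.4]

central = sum(vals)   # server sums clients in a fixed order
sharded = sum(vals)   # the aggregator owning this coordinate sums the same
                      # clients in the same order: an identical expression
assert central == sharded          # bitwise equal, not just approximately

# Reordering the client sum changes the floating-point result, which is the
# gap an explicit lemma (deterministic shard assignment, fixed summation
# order) would need to close.
reordered = sum(sorted(vals))
assert reordered != central
```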

Figures

Figures reproduced from arXiv: 2602.08617 by Akash Dhasade, Dario Fenoglio, Jacopo Quizi, Marc Langheinrich, Martin Gjoreski, Pasquale Polverino.

Figure 1
Figure 1: Illustration of ERIS at training round t for two aggregators (A = 2). Left: each client performs shifted compression and model partitioning, generating shards v^t_{k,(a)} sent to aggregators C_2 and C_{k−1}. Right: each aggregator collects and aggregates the corresponding shards across clients to produce partial updated models x^{t+1}_{(a)}, which are then sent back to the clients.
Figure 3
Figure 3: (right) Impact of the compression constant ω with A = 50 fixed: stronger compression (higher ω, i.e., lower retention probability p) steadily drives MIA accuracy toward the idealized minimum-leakage baseline, empirically validating Theorem 3.7.
Figure 2
Figure 2: Comparison of test accuracy and MIA accuracy across varying model capacities (one per dataset) and client-side overfitting levels, controlled via the number of training samples per client.
Figure 4
Figure 4: Utility–privacy trade-off on CIFAR-10 under varying strengths of the privacy-preserving mechanisms.
Figure 5
Figure 5: Impact of honest-but-curious client collusion in ERIS: MIA accuracy as multiple honest-but-curious clients collude by sharing their received shards.
Figure 6
Figure 6: Conditional weight distributions (x^{t+1}_{k,i} | D_k, H_t) over training rounds for DistilBERT, ResNet-9, and LeNet-5. Each 3D plot shows the distribution of weight values (horizontal axis) over time (depth axis), with frequency on the vertical axis. In all cases the weight distributions remain ∼ N(0, σ_cond) with σ_cond < 0.2, validating the sub-Gaussian premise used in Remark 3.8.
Figure 7
Figure 7: Minimum distribution time for a single training round for FedAvg, PriPrune, SoteriaFL, Ako, Shatter, and ERIS: with M = 320 Mbit (left) and with 50 training clients as the model size increases (right), both on a logarithmic scale.
Figure 8
Figure 8: Effect of shifted compression on CIFAR-10 test accuracy, varying ω across different local training sample sizes.
Figure 9
Figure 9: Effect of increasing dropout rates on (i) test accuracy and (ii) the best validation round at which the model reaches its peak performance (i.e., the minimum validation loss). ERIS maintains nearly constant test accuracy up to a dropout rate of 70%, since each aggregator is responsible for only a disjoint shard of the model.
Figure 10
Figure 10: Reconstruction quality under DLG, iDLG, and ROG attacks as a function of the percentage of model parameters available to the attacker. The LPIPS score (higher is better) is averaged over 200 samples. The x-axis uses a non-linear scale for clarity in the low-percentage regime. Shaded regions highlight the obfuscation achieved by ERIS, which renders reconstruction attacks ineffective.
Figure 11
Figure 11: Comparison of test accuracy and MIA accuracy across varying model sizes and client-side overfitting levels, controlled via the number of client training samples, under a non-IID setting.
Figure 13
Figure 13: Utility–privacy trade-off on CIFAR-10 under varying strengths of the privacy-preserving mechanisms. Each subplot shows test accuracy vs. MIA accuracy for methods with different client training samples; the Pareto front marks the optimal trade-off points.
Original abstract

Scaling Federated Learning (FL) to billion-parameter models forces a challenging trade-off between privacy, scalability, and model utility. Existing solutions often tackle these challenges in isolation, sacrificing accuracy, relying on costly cryptographic tools, or introducing communication and optimization inefficiencies that affect convergence. We introduce ERIS, an FL framework centered on Federated Shard Aggregation (FSA), a novel mechanism that partitions each client update into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators. FSA removes the central aggregation bottleneck, limits the information visible to any single observer, and preserves the centralized FL update after reassembly. ERIS can further readily integrate Distributed Shifted Compression (DSC) to reduce transmitted payloads and exposed coordinates. We prove that ERIS preserves convergence under standard assumptions and bounds mutual information leakage by the observable fraction of each update, decreasing with the number of client-side aggregators, and with the compression level when DSC is enabled. Experiments across image and text tasks, including large language models, show that ERIS achieves FedAvg-level utility while substantially reducing communication bottlenecks and improving robustness to membership inference and reconstruction attacks, without relying on heavy cryptography or utility-degrading perturbations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ERIS, a federated learning framework built around Federated Shard Aggregation (FSA). FSA partitions each client's model update into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators; the shards are later reassembled to recover the original centralized update. The framework optionally incorporates Distributed Shifted Compression (DSC) to shrink payloads. The authors prove that ERIS preserves convergence under standard FL assumptions and that mutual-information leakage is bounded by the observable fraction of each update (which decreases with the number of aggregators and with compression). Experiments on image classification, text tasks, and large language models report FedAvg-level accuracy together with reduced communication volume and improved resistance to membership-inference and reconstruction attacks.

Significance. If the reassembly equivalence and the stated leakage bound hold, ERIS would offer a lightweight, cryptography-free route to simultaneously improving scalability, communication efficiency, and privacy in federated learning. The combination of a convergence proof, an explicit information-theoretic bound, and empirical results on LLMs would constitute a concrete advance over prior shard- or compression-based FL methods that typically trade utility for privacy.

major comments (2)
  1. [convergence proof / Theorem 1] The central convergence claim rests on the assertion that reassembly after distributed shard aggregation produces exactly the same update as direct central aggregation. The abstract states this preservation occurs, but the equivalence is not automatic from non-overlapping shards; any floating-point discrepancy, ordering artifact, or scaling introduced during distribution or reassembly would invalidate direct application of standard FL convergence theorems. The proof section must therefore contain an explicit lemma or proposition showing that the reassembled vector equals the original aggregated vector with probability 1 (or up to machine precision).
  2. [information-leakage analysis] The mutual-information leakage bound is expressed directly in terms of the observable fraction of the update. Because this fraction is a design parameter (number of aggregators and compression ratio), the bound is essentially tautological once the observable set is fixed. The paper should clarify whether the bound is information-theoretic (accounting for the adversary's view of the shard-aggregation protocol) or merely a restatement of the design choice, and whether it remains valid when DSC is enabled.
minor comments (2)
  1. [Section 3] Notation for shard indices, aggregator assignment, and the reassembly operator should be introduced once and used consistently; the current description mixes “shard,” “partition,” and “coordinate” without a single definition table.
  2. [Experiments] The experimental section reports “FedAvg-level utility” but does not tabulate the exact accuracy deltas or communication-volume reductions for each task; a single summary table comparing FedAvg, ERIS, and ERIS+DSC on all benchmarks would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the presentation of the convergence proof and information-leakage analysis.

read point-by-point responses
  1. Referee: [convergence proof / Theorem 1] The central convergence claim rests on the assertion that reassembly after distributed shard aggregation produces exactly the same update as direct central aggregation. The abstract states this preservation occurs, but the equivalence is not automatic from non-overlapping shards; any floating-point discrepancy, ordering artifact, or scaling introduced during distribution or reassembly would invalidate direct application of standard FL convergence theorems. The proof section must therefore contain an explicit lemma or proposition showing that the reassembled vector equals the original aggregated vector with probability 1 (or up to machine precision).

    Authors: We agree that an explicit lemma is required for rigor. The equivalence follows from the non-overlapping partition and the linearity of the aggregation operator (sum or average), which is performed identically whether centralized or distributed across shards. To address floating-point and ordering concerns, we will add Lemma 1 in the proof section proving that the reassembled vector equals the centrally aggregated vector up to machine precision under deterministic shard assignment and no scaling. This lemma will justify direct application of standard FL convergence results to ERIS. revision: yes

  2. Referee: [information-leakage analysis] The mutual-information leakage bound is expressed directly in terms of the observable fraction of the update. Because this fraction is a design parameter (number of aggregators and compression ratio), the bound is essentially tautological once the observable set is fixed. The paper should clarify whether the bound is information-theoretic (accounting for the adversary's view of the shard-aggregation protocol) or merely a restatement of the design choice, and whether it remains valid when DSC is enabled.

    Authors: The bound is information-theoretic and derived via the chain rule and data-processing inequality applied to the adversary's observation of the FSA protocol: the adversary sees only the shards routed to visible aggregators, so mutual information is bounded by the entropy of that observable subset. It is not a tautology but a direct consequence of the protocol mechanics. We will expand Section 4 to include the full derivation from the protocol steps. When DSC is enabled the bound tightens further because compression is applied after sharding; we will add a corollary confirming the bound remains valid under DSC. revision: yes
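The chain-rule and data-processing argument the authors outline can be sketched in a few lines. The notation below is assumed for illustration (obs_a is the view of a single aggregator a, and S the coordinate set routed to it, with |S|/d = 1/A), not lifted from the paper:

```latex
% Hedged sketch, not the paper's proof. Write the client update as
% v_k = (v_{k,S}, v_{k,S^c}) and let obs_a be a function of v_{k,S} alone.
\begin{align*}
I(v_k;\,\mathrm{obs}_a)
  &= I(v_{k,S};\,\mathrm{obs}_a)
   + I(v_{k,S^c};\,\mathrm{obs}_a \mid v_{k,S})
     && \text{(chain rule)}\\
  &= I(v_{k,S};\,\mathrm{obs}_a)
     && (\mathrm{obs}_a \text{ depends on } v_{k,S} \text{ only})\\
  &\le H(v_{k,S})
     && \text{(mutual information bounded by entropy)}\\
  &\le \tfrac{|S|}{d}\,H(v_k)
     && \text{(with equality for i.i.d.\ coordinates).}
\end{align*}
```

Under DSC one would presumably replace |S| by the expected number of retained coordinates, p|S|, which is how the corollary the authors promise would tighten the bound.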

Circularity Check

0 steps flagged

No significant circularity in ERIS derivation chain

full rationale

The paper's convergence claim rests on a proof that the reassembled update after non-overlapping shard aggregation equals the direct central aggregation, which follows directly from the summation mechanism by construction rather than from any fitted parameter or self-referential definition. The mutual information leakage bound is expressed explicitly in terms of the observable fraction of each update (a controllable design parameter) and decreases with the number of aggregators and compression level. Standard FL convergence assumptions are invoked externally without load-bearing self-citations or uniqueness theorems imported from the authors' prior work. No steps reduce by construction to inputs, fitted predictions, or renamed known results; the derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The framework rests on standard federated-learning convergence assumptions and introduces FSA as a new aggregation primitive whose security and utility properties are derived from the sharding design.

axioms (1)
  • domain assumption: Standard assumptions for convergence in federated averaging hold for the reassembled updates. Invoked to prove that ERIS preserves convergence.
invented entities (1)
  • Federated Shard Aggregation (FSA): no independent evidence.
    purpose: Partition each client update into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators.
    Core novel mechanism introduced to remove the central bottleneck and limit per-observer information.

pith-pipeline@v0.9.0 · 5524 in / 1320 out tokens · 77907 ms · 2026-05-16T05:39:30.414543+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages
