Recognition: 2 Lean theorem links
ERIS: Enhancing Privacy and Scalability in Federated Learning via Federated Shard Aggregation
Pith reviewed 2026-05-16 05:39 UTC · model grok-4.3
The pith
ERIS partitions each client update into shards aggregated by multiple other clients, reproducing the centralized update exactly while bounding any single observer's leakage to the fraction of the update it can see.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central mechanism is Federated Shard Aggregation: each client update is partitioned into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators; after reassembly the resulting vector is identical to the direct centralized aggregate. This removes the single-point aggregation bottleneck, limits the information visible to any observer, and yields a convergence guarantee identical to standard federated averaging under the usual assumptions. When combined with Distributed Shifted Compression the same equivalence holds while transmitted payloads and exposed coordinates are further reduced.
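A minimal numerical sketch of the claimed equivalence, assuming nothing about the paper's actual protocol beyond non-overlapping shards and linear aggregation; the coordinate assignment, aggregator count A, and all names below are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, A = 5, 12, 3  # K clients, d-dimensional updates, A client-side aggregators

updates = rng.normal(size=(K, d))        # one update vector per client
direct = updates.mean(axis=0)            # direct centralized FedAvg aggregate

# Non-overlapping shards: each coordinate goes to exactly one aggregator,
# so the masks are disjoint and complete by construction.
assignment = rng.integers(0, A, size=d)
masks = [assignment == a for a in range(A)]

# Each aggregator averages only the coordinates in its shard ...
partials = [updates[:, m].mean(axis=0) for m in masks]

# ... and reassembly scatters the partial aggregates back into place.
reassembled = np.empty(d)
for m, p in zip(masks, partials):
    reassembled[m] = p

# In exact arithmetic the two vectors coincide; in floats, up to rounding.
assert np.allclose(direct, reassembled)
```

Because averaging is linear and the masks partition the coordinates, no information is lost or double-counted; this is the whole load-bearing premise in miniature.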
What carries the argument
Federated Shard Aggregation (FSA), which splits each client update into non-overlapping shards, distributes their aggregation across multiple client-side aggregators, and reassembles the shards to recover the exact centralized update.
If this is right
- The final global model is mathematically identical to the one obtained by standard centralized aggregation.
- Mutual information leakage to any single aggregator is bounded by the fraction of shards it observes and decreases with more aggregators.
- Adding Distributed Shifted Compression further reduces communication while preserving the same leakage bound (a generic shifted-compression sketch follows this list).
- Empirical robustness to membership inference and reconstruction attacks improves without accuracy loss.
- The central server no longer performs full aggregation, removing a communication and compute bottleneck.
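The summary does not spell out how DSC works. As a point of reference only: "shifted compression" in the FL literature (e.g., DIANA-style schemes) typically compresses the difference between the current update and a locally maintained shift, then advances the shift so both sides stay synchronized. A generic sketch under that assumption, with the top-k operator and all names chosen for illustration rather than taken from the paper:

```python
import numpy as np

def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude coordinates; zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def shifted_compress(update, shift, k, lr=1.0):
    """Compress update - shift; only the compressed delta is transmitted."""
    delta = top_k(update - shift, k)
    new_shift = shift + lr * delta  # receiver can reproduce this update
    return delta, new_shift

rng = np.random.default_rng(1)
g, h = rng.normal(size=8), np.zeros(8)
for _ in range(4):                  # the shift tracks g across rounds
    delta, h = shifted_compress(g, h, k=2)
print(np.linalg.norm(g - h))        # residual shrinks round over round
```

Only the sparse delta crosses the network, and an observer of one round sees at most k of d coordinates, which is consistent in spirit with the claim that compression tightens the exposed fraction.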
Where Pith is reading between the lines
- The same sharding idea could be applied to aggregation rules other than simple averaging.
- Coordination overhead for selecting and synchronizing the client-side aggregators may become noticeable in very large or highly dynamic networks.
- Shard boundaries could interact with client data heterogeneity and might require adaptive partitioning to avoid uneven load.
- The approach opens a path to hybrid systems in which some shards remain visible to the server while others stay distributed.
Load-bearing premise
Reassembly after distributed shard aggregation produces exactly the same vector as direct centralized aggregation.
What would settle it
A side-by-side run in which the reassembled model differs from the direct FedAvg model in any coordinate or converges to a measurably different accuracy would falsify the equivalence claim.
Original abstract
Scaling Federated Learning (FL) to billion-parameter models forces a challenging trade-off between privacy, scalability, and model utility. Existing solutions often tackle these challenges in isolation, sacrificing accuracy, relying on costly cryptographic tools, or introducing communication and optimization inefficiencies that affect convergence. We introduce ERIS, an FL framework centered on Federated Shard Aggregation (FSA), a novel mechanism that partitions each client update into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators. FSA removes the central aggregation bottleneck, limits the information visible to any single observer, and preserves the centralized FL update after reassembly. ERIS can further readily integrate Distributed Shifted Compression (DSC) to reduce transmitted payloads and exposed coordinates. We prove that ERIS preserves convergence under standard assumptions and bounds mutual information leakage by the observable fraction of each update, decreasing with the number of client-side aggregators, and with the compression level when DSC is enabled. Experiments across image and text tasks, including large language models, show that ERIS achieves FedAvg-level utility while substantially reducing communication bottlenecks and improving robustness to membership inference and reconstruction attacks, without relying on heavy cryptography or utility-degrading perturbations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ERIS, a federated learning framework built around Federated Shard Aggregation (FSA). FSA partitions each client's model update into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators; the shards are later reassembled to recover the original centralized update. The framework optionally incorporates Distributed Shifted Compression (DSC) to shrink payloads. The authors prove that ERIS preserves convergence under standard FL assumptions and that mutual-information leakage is bounded by the observable fraction of each update (which decreases with the number of aggregators and with compression). Experiments on image classification, text tasks, and large language models report FedAvg-level accuracy together with reduced communication volume and improved resistance to membership-inference and reconstruction attacks.
Significance. If the reassembly equivalence and the stated leakage bound hold, ERIS would offer a lightweight, cryptography-free route to simultaneously improving scalability, communication efficiency, and privacy in federated learning. The combination of a convergence proof, an explicit information-theoretic bound, and empirical results on LLMs would constitute a concrete advance over prior shard- or compression-based FL methods that typically trade utility for privacy.
major comments (2)
- [convergence proof / Theorem 1] The central convergence claim rests on the assertion that reassembly after distributed shard aggregation produces exactly the same update as direct central aggregation. The abstract states this preservation occurs, but the equivalence is not automatic from non-overlapping shards; any floating-point discrepancy, ordering artifact, or scaling introduced during distribution or reassembly would invalidate direct application of standard FL convergence theorems. The proof section must therefore contain an explicit lemma or proposition showing that the reassembled vector equals the original aggregated vector with probability 1 (or up to machine precision). A two-line demonstration of why bitwise equality cannot be expected follows this list.
- [information-leakage analysis] The mutual-information leakage bound is expressed directly in terms of the observable fraction of the update. Because this fraction is a design parameter (number of aggregators and compression ratio), the bound is essentially tautological once the observable set is fixed. The paper should clarify whether the bound is information-theoretic (accounting for the adversary's view of the shard-aggregation protocol) or merely a restatement of the design choice, and whether it remains valid when DSC is enabled.
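The floating-point caveat in the first comment is easy to exhibit: summation order alone changes the low bits of a sum, which is exactly why an "up to machine precision" qualifier, rather than bitwise equality, belongs in the lemma:

```python
import math

# Floating-point addition is not associative:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))               # False
print(math.isclose((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3)))   # True: close, not equal
```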
minor comments (2)
- [Section 3] Notation for shard indices, aggregator assignment, and the reassembly operator should be introduced once and used consistently; the current description mixes “shard,” “partition,” and “coordinate” without a single definition table.
- [Experiments] The experimental section reports “FedAvg-level utility” but does not tabulate the exact accuracy deltas or communication-volume reductions for each task; a single summary table comparing FedAvg, ERIS, and ERIS+DSC on all benchmarks would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the presentation of the convergence proof and information-leakage analysis.
Point-by-point responses
- Referee: [convergence proof / Theorem 1] The central convergence claim rests on the assertion that reassembly after distributed shard aggregation produces exactly the same update as direct central aggregation. The abstract states this preservation occurs, but the equivalence is not automatic from non-overlapping shards; any floating-point discrepancy, ordering artifact, or scaling introduced during distribution or reassembly would invalidate direct application of standard FL convergence theorems. The proof section must therefore contain an explicit lemma or proposition showing that the reassembled vector equals the original aggregated vector with probability 1 (or up to machine precision).
Authors: We agree that an explicit lemma is required for rigor. The equivalence follows from the non-overlapping partition and the linearity of the aggregation operator (sum or average), which is performed identically whether centralized or distributed across shards. To address floating-point and ordering concerns, we will add Lemma 1 in the proof section proving that the reassembled vector equals the centrally aggregated vector up to machine precision under deterministic shard assignment and no scaling. This lemma will justify direct application of standard FL convergence results to ERIS. revision: yes
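A sketch of the identity such a Lemma 1 would formalize, in notation assumed here for illustration (binary masks m^(a), Hadamard product ⊙, client updates g_k), not taken from the paper:

```latex
% Assumed setup: masks m^{(a)} \in \{0,1\}^d, disjoint
% (m^{(a)} \odot m^{(b)} = 0 for a \neq b) and complete
% (\sum_a m^{(a)} = \mathbf{1}); g_k denotes client k's update.
\underbrace{\sum_{a=1}^{A} \frac{1}{K} \sum_{k=1}^{K} m^{(a)} \odot g_k}_{\text{reassembled}}
  \;=\; \Big( \sum_{a=1}^{A} m^{(a)} \Big) \odot \frac{1}{K} \sum_{k=1}^{K} g_k
  \;=\; \frac{1}{K} \sum_{k=1}^{K} g_k .
```

The first equality uses only bilinearity of the Hadamard product; the second uses completeness of the masks. Floating point weakens the equalities to "up to rounding," which is precisely the qualifier the referee requests.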
- Referee: [information-leakage analysis] The mutual-information leakage bound is expressed directly in terms of the observable fraction of the update. Because this fraction is a design parameter (number of aggregators and compression ratio), the bound is essentially tautological once the observable set is fixed. The paper should clarify whether the bound is information-theoretic (accounting for the adversary's view of the shard-aggregation protocol) or merely a restatement of the design choice, and whether it remains valid when DSC is enabled.
Authors: The bound is information-theoretic and derived via the chain rule and data-processing inequality applied to the adversary's observation of the FSA protocol: the adversary sees only the shards routed to visible aggregators, so mutual information is bounded by the entropy of that observable subset. It is not a tautology but a direct consequence of the protocol mechanics. We will expand Section 4 to include the full derivation from the protocol steps. When DSC is enabled the bound tightens further because compression is applied after sharding; we will add a corollary confirming the bound remains valid under DSC. revision: yes
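One plausible shape for the derivation the authors describe, with every symbol here an assumption for illustration rather than the paper's definition: under an i.i.d.-coordinate model the adversary's view carries entropy proportional to its visible fraction, and any compression can only reduce mutual information.

```latex
% Assumed setup: U is a client update whose d coordinates are i.i.d.
% with marginal entropy h; S_a is the shard set visible to aggregator a;
% V_a = U_{S_a} is its view; C is any (possibly randomized) compressor.
I(U; V_a) \;\le\; H(V_a) \;=\; |S_a|\, h \;=\; \frac{|S_a|}{d}\, H(U),
\qquad
I\big(U;\, \mathcal{C}(V_a)\big) \;\le\; I(U; V_a)
\quad \text{(data-processing inequality)} .
```

This is non-tautological only to the extent that the protocol guarantees the adversary observes nothing beyond V_a, which is what the expanded Section 4 would need to establish.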
Circularity Check
No significant circularity in ERIS derivation chain
full rationale
The paper's convergence claim rests on a proof that the reassembled update after non-overlapping shard aggregation equals the direct central aggregation, which follows directly from the summation mechanism by construction rather than from any fitted parameter or self-referential definition. The mutual information leakage bound is expressed explicitly in terms of the observable fraction of each update (a controllable design parameter) and decreases with the number of aggregators and compression level. Standard FL convergence assumptions are invoked externally without load-bearing self-citations or uniqueness theorems imported from the authors' prior work. No steps reduce by construction to inputs, fitted predictions, or renamed known results; the derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard assumptions for convergence in federated averaging (e.g., L-smoothness and unbiased gradient estimates) hold for the reassembled updates.
invented entities (1)
- Federated Shard Aggregation (FSA): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tagged unclear: relation between the paper passage and the cited Recognition theorem).
  Paper passage: partition each client update into A disjoint shards using categorical masks m_t^(a) with disjointness and completeness; the aggregator computes v_t^(a) = s_t^(a) + (1/K) Σ_k v_t^(k,(a)).
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery and J-cost orbit (tagged unclear: relation between the paper passage and the cited Recognition theorem).
  Paper passage: convergence bound (5) under Assumptions 3.1-3.2 (L-smoothness, unbiased estimator with variance constants C1, C2).
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.