Robust Federated Inference

Akash Dhasade; Anne-Marie Kermarrec; Maxime Jacovella; Nirupam Gupta; Rachid Guerraoui; Rafael Pinot; Sadegh Farhadkhani

arxiv: 2510.00310 · v3 · submitted 2025-09-30 · 💻 cs.LG · cs.MA


Pith reviewed 2026-05-18 11:12 UTC · model grok-4.3


keywords robust federated inference · adversarial training · DeepSet aggregation · robust aggregation · one-shot federated learning · edge ensembles · federated ensembles


The pith

Composing adversarial training with DeepSet aggregation and test-time robust methods makes federated inference resilient to bounded attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes the problem of robust federated inference for settings like one-shot federated learning and edge ensembles, where a server aggregates predictions from local models that may include adversarial ones. It first analyzes averaging-based aggregators and shows their error stays small when honest responses are similar or the margin between top classes is large. For non-linear aggregators the problem is recast as adversarial machine learning, and the authors introduce a composition of adversarial training plus test-time robust aggregation built on a DeepSet model. This yields accuracy gains of 4.7 to 22.2 points over prior robust methods across benchmarks. A sympathetic reader would care because current federated inference remains exposed even to simple attacks while models must stay local and private.

Core claim

The central claim is that the problem of robust federated inference with non-linear aggregators can be solved by casting it as an adversarial machine learning task and addressing it through a composition of adversarial training and test-time robust aggregation using the DeepSet model; for averaging aggregators the error remains small either when dissimilarity among honest responses is small or the margin between the two most probable classes is large.

What carries the argument

The composition of adversarial training at train time with test-time robust aggregation inside a DeepSet model, which treats the aggregator as a learned defense against manipulated local responses.
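To make the DeepSet idea concrete, here is a minimal sketch, assuming the standard DeepSets form rho(sum_i phi(x_i)) with sum pooling over client responses. The weights, layer sizes, and names (`make_matrix`, `deepset_aggregate`, `w_phi`, `w_rho`) are illustrative placeholders, not the architecture or trained parameters from the paper, and neither adversarial training nor the test-time robust step is shown.

```python
import random

def make_matrix(rows, cols, rng):
    """Random weight matrix; a stand-in for trained parameters (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def relu(v):
    return [max(0.0, a) for a in v]

def deepset_aggregate(responses, w_phi, w_rho):
    """Permutation-invariant aggregation: rho(sum_i phi(x_i))."""
    embedded = [relu(matvec(w_phi, r)) for r in responses]  # phi, applied per client
    pooled = [sum(col) for col in zip(*embedded)]           # sum pooling over clients
    return matvec(w_rho, pooled)                            # rho, mapping to class scores

rng = random.Random(0)
num_classes, hidden = 3, 4
w_phi = make_matrix(hidden, num_classes, rng)
w_rho = make_matrix(num_classes, hidden, rng)

# Three hypothetical client responses (probability vectors over 3 classes).
responses = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
scores = deepset_aggregate(responses, w_phi, w_rho)
shuffled_scores = deepset_aggregate(responses[::-1], w_phi, w_rho)
```

Because pooling is a sum, reordering the clients leaves the output unchanged up to floating-point rounding, which is the property that makes the aggregator a function of the set of responses rather than of their order.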

If this is right

Averaging aggregators produce small error when honest local responses have low dissimilarity or when the margin between the two highest-probability classes is large.
Non-linear aggregators become robust when the aggregation task is solved as an adversarial machine learning problem via the proposed composition.
The composition improves accuracy by 4.7 to 22.2 percentage points over existing robust aggregation baselines on diverse benchmarks.
Models remain local and proprietary while the server still obtains robust combined predictions.
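The margin condition for averaging can be illustrated with a simplified worst-case argument, which is not the paper's bound. Assume every response is a probability vector, with m honest clients and f adversarial ones. Each adversary can add at most 1 to any class's summed score, so plain averaging keeps the honest top class whenever the top-two margin of the honest mean exceeds f/m. The function names below are hypothetical.

```python
def average(responses):
    """Plain averaging aggregator over per-client probability vectors."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]

def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)

def top2_margin(scores):
    """Gap between the largest and second-largest score."""
    ordered = sorted(scores, reverse=True)
    return ordered[0] - ordered[1]

def averaging_is_safe(honest, f):
    """Sufficient condition (illustrative): honest margin > f / m."""
    return top2_margin(average(honest)) > f / len(honest)

def worst_case_responses(honest, f):
    """f adversaries put all their mass on the honest runner-up class."""
    mean = average(honest)
    top = argmax(mean)
    runner_up = max((c for c in range(len(mean)) if c != top), key=mean.__getitem__)
    return [[1.0 if c == runner_up else 0.0 for c in range(len(mean))]
            for _ in range(f)]

# Hypothetical honest responses: mean is about [0.7, 0.2, 0.1], margin about 0.5.
honest = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]

# f = 1: threshold 1/4 is below the margin, so the prediction survives.
pred_f1 = argmax(average(honest + worst_case_responses(honest, 1)))
# f = 3: threshold 3/4 exceeds the margin, and this attack flips the prediction.
pred_f3 = argmax(average(honest + worst_case_responses(honest, 3)))
```

The check is only sufficient: an input that fails it is not necessarily misclassified, but the runner-up attack shows the threshold cannot be improved in general.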

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Robustness may degrade if the attack distribution at deployment differs markedly from the training distribution, indicating a need for defenses that do not rely on knowing the exact attack model.
The same composition could be tested in non-federated distributed inference or ensemble settings to check whether the gains transfer beyond the federated case.
Adaptive attacks that evolve after training would provide a direct test of how well the known-attack assumption holds in practice.

Load-bearing premise

The adversary can manipulate responses from only a bounded fraction of clients and the attack model used by the adversary is known in advance during the adversarial-training stage.
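Both halves of this premise can be sketched as a data-corruption step used when building training inputs for the aggregator. This is a simplified stand-in, assuming a fixed, known attack applied to a bounded fraction of clients; the paper's adversarial-training procedure may differ, and `one_hot_attack` and `corrupt` are hypothetical names.

```python
import math
import random

def one_hot_attack(response, target, rng):
    """Hypothetical known attack: claim full confidence in a fixed class."""
    return [1.0 if c == target else 0.0 for c in range(len(response))]

def corrupt(responses, max_fraction, attack, rng, **attack_kwargs):
    """Replace floor(max_fraction * n) randomly chosen responses with attacked ones."""
    n = len(responses)
    budget = math.floor(max_fraction * n)        # bounded number of adversaries
    victims = set(rng.sample(range(n), budget))  # which clients are compromised
    return [attack(r, rng=rng, **attack_kwargs) if i in victims else list(r)
            for i, r in enumerate(responses)]

rng = random.Random(0)
clean = [[0.7, 0.2, 0.1] for _ in range(5)]      # hypothetical honest responses
poisoned = corrupt(clean, 0.5, one_hot_attack, rng, target=2)
num_changed = sum(p != c for p, c in zip(poisoned, clean))
```

With five clients and a budget of one half, exactly two responses are replaced. Swapping in a different `attack` callable at evaluation time is one way to probe what happens when the deployed attack differs from the one seen during training.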

What would settle it

An experiment in which an adversary employs an attack distribution substantially different from the one used in training and reduces accuracy to levels comparable to non-robust baselines under the same bounded fraction of compromised clients.

Original abstract

Federated inference, in the form of one-shot federated learning, edge ensembles, or federated ensembles, has emerged as an attractive solution to combine predictions from multiple models. This paradigm enables each model to remain local and proprietary while a central server queries them and aggregates predictions. Yet, the robustness of federated inference has been largely neglected, leaving them vulnerable to even simple attacks. To address this critical gap, we formalize the problem of robust federated inference and provide the first robustness analysis of this class of methods. Our analysis of averaging-based aggregators shows that the error of the aggregator is small either when the dissimilarity between honest responses is small or the margin between the two most probable classes is large. Moving beyond linear averaging, we show that problem of robust federated inference with non-linear aggregators can be cast as an adversarial machine learning problem. We then introduce an advanced technique using the DeepSet aggregation model, proposing a novel composition of adversarial training and test-time robust aggregation to robustify non-linear aggregators. Our composition yields significant improvements, surpassing existing robust aggregation methods by 4.7 - 22.2% in accuracy points across diverse benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper formalizes robust federated inference for the first time and shows accuracy gains from a DeepSet adversarial-training composition, but those gains rest on the attack distribution matching the training setup.


The paper's main advance is formalizing robust federated inference and giving the first error analysis for averaging aggregators along with a way to handle non-linear ones. They show that for averaging, the aggregator error is controlled when honest responses are similar or when the margin between top classes is large. For the non-linear case they frame it as an adversarial ML problem and use a DeepSet model trained with adversarial examples plus robust aggregation at test time. This composition improves accuracy by 4.7 to 22.2 points over prior robust methods on the benchmarks they report.

The analysis for the averaging case rests on standard concentration bounds and holds up internally. The empirical results are shown across multiple datasets, which is a plus. The limitation that stands out is the assumption that the adversary's behavior matches the attack model used during adversarial training. The paper tests under those matched conditions, but real attacks could differ and reduce the gains. They also do not provide ablations broken down by attack type or statistical significance for the accuracy improvements, so the strength of the empirical claim is a bit harder to judge. No artifacts are released either.

This work targets researchers focused on federated learning, edge computing, and robust aggregation in distributed prediction systems. A reader looking for a starting point on defending one-shot federated inference would get value from the formalization and the proposed composition. It deserves a serious referee. The gap it identifies is practical and the approach is grounded enough to warrant review, even with the need for more robustness checks.

Referee Report

2 major / 2 minor

Summary. The paper formalizes robust federated inference for one-shot settings, derives error bounds for averaging aggregators from standard concentration inequalities (small error when honest responses have low dissimilarity or large class margins), recasts non-linear aggregation as an adversarial ML problem, and proposes a DeepSet architecture trained via adversarial training plus test-time robust aggregation. It reports empirical accuracy gains of 4.7-22.2 points over prior robust methods across multiple benchmarks.

Significance. If the results hold, the work addresses a neglected robustness gap in federated inference by combining theoretical analysis with a practical non-linear method. The grounding in standard concentration bounds for the averaging case and the explicit casting of the non-linear case as adversarial training are clear strengths; the reported gains on public benchmarks indicate potential utility, though generalization beyond matched attack distributions remains central to the claim's impact.

major comments (2)

[§4] §4 (experimental evaluation): the headline claim of 4.7-22.2 point accuracy gains lacks per-attack-type ablation tables and statistical significance tests (or variance estimates across random seeds), which are needed to establish that the improvements are attributable to the DeepSet composition rather than specific matched-attack conditions.
[§3.2] §3.2 (adversarial training stage): the robustness analysis and experiments presuppose that the attack distribution is known and used during adversarial training; the manuscript should explicitly discuss or test transfer when the real adversary deviates from this distribution, as this assumption is load-bearing for the transfer of the reported robustness guarantees.

minor comments (2)

[§3.1] Notation for the margin between the two most probable classes should be defined once and used consistently in the averaging analysis.
[Figures 2-4] Figure captions could more explicitly state the attack model and fraction of compromised clients for each plotted curve.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below, agreeing where changes are warranted to strengthen the empirical support and clarify assumptions.


Referee: [§4] §4 (experimental evaluation): the headline claim of 4.7-22.2 point accuracy gains lacks per-attack-type ablation tables and statistical significance tests (or variance estimates across random seeds), which are needed to establish that the improvements are attributable to the DeepSet composition rather than specific matched-attack conditions.

Authors: We agree that variance estimates and explicit per-attack breakdowns would provide stronger evidence that the reported gains are due to the proposed DeepSet composition. In the revised manuscript we will add standard deviations computed over at least five independent random seeds for all main results. We will also include supplementary tables that disaggregate accuracy by attack type (e.g., label-flipping, gradient poisoning, and model-replacement variants) across the evaluated benchmarks, allowing readers to verify consistency beyond matched-attack settings.
revision: yes

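The per-seed reporting described in this response could be computed along the following lines. This is a minimal sketch: the accuracy lists are synthetic placeholder values, not results from the paper, and `gain_exceeds_noise` is a crude screening rule introduced here for illustration, not a statistical significance test.

```python
import statistics

def summarize(accuracies):
    """Mean and sample standard deviation (n - 1 denominator) across seeds."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

def gain_exceeds_noise(method_runs, baseline_runs):
    """Crude check: is the mean gain larger than the two spreads combined?"""
    m_mean, m_std = summarize(method_runs)
    b_mean, b_std = summarize(baseline_runs)
    return (m_mean - b_mean) > (m_std + b_std)

# Synthetic placeholder accuracies for five seeds (NOT from the paper).
method_runs = [80.0, 82.0, 81.0, 79.0, 83.0]
baseline_runs = [70.0, 72.0, 71.0, 69.0, 73.0]

method_mean, method_std = summarize(method_runs)
clear_gain = gain_exceeds_noise(method_runs, baseline_runs)
```

Reporting the mean together with the standard deviation for each method and attack lets a reader see whether a headline gain is larger than seed-to-seed variation.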
Referee: [§3.2] §3.2 (adversarial training stage): the robustness analysis and experiments presuppose that the attack distribution is known and used during adversarial training; the manuscript should explicitly discuss or test transfer when the real adversary deviates from this distribution, as this assumption is load-bearing for the transfer of the reported robustness guarantees.

Authors: We acknowledge that the current adversarial-training procedure assumes access to the attack distribution at training time. In the revision we will expand §3.2 with a dedicated paragraph discussing this assumption, its relation to standard adversarial-training practice, and the resulting limitations on transfer. We will further add a set of transfer experiments in which the DeepSet aggregator is trained on one family of attacks and evaluated on held-out attack distributions, thereby providing direct empirical evidence on generalization beyond matched conditions.
revision: yes
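The evaluation side of such a transfer experiment can be sketched as a grid of aggregators against attacks. The sketch below covers evaluation only and makes several assumptions: the aggregators are fixed stand-ins (mean and coordinate-wise median) rather than a trained DeepSet, the two attacks and the toy dataset are hypothetical, and all names are introduced here for illustration. In a real transfer experiment the aggregator would be trained against one attack and the remaining columns of the grid would be the held-out attacks.

```python
import statistics

def mean_agg(responses):
    return [statistics.fmean(col) for col in zip(*responses)]

def median_agg(responses):
    """Coordinate-wise median, a common robust aggregation baseline."""
    return [statistics.median(col) for col in zip(*responses)]

def one_hot_attack(num_classes, label):
    """Hypothetical attack: full confidence in a wrong class."""
    wrong = (label + 1) % num_classes
    return [1.0 if c == wrong else 0.0 for c in range(num_classes)]

def uniform_attack(num_classes, label):
    """Hypothetical attack: uninformative uniform response."""
    return [1.0 / num_classes] * num_classes

def accuracy(aggregate, attack, dataset, num_adversaries):
    correct = 0
    for honest, label in dataset:
        k = len(honest[0])
        responses = honest + [attack(k, label) for _ in range(num_adversaries)]
        scores = aggregate(responses)
        correct += max(range(k), key=scores.__getitem__) == label
    return correct / len(dataset)

def transfer_grid(aggregators, attacks, dataset, num_adversaries):
    """Accuracy for every (aggregator, attack) pair."""
    return {(a_name, t_name): accuracy(agg, atk, dataset, num_adversaries)
            for a_name, agg in aggregators.items()
            for t_name, atk in attacks.items()}

# Toy dataset: (honest responses from 3 clients, true label). Hypothetical values.
dataset = [([[0.6, 0.3, 0.1]] * 3, 0), ([[0.2, 0.5, 0.3]] * 3, 1)]
grid = transfer_grid({"mean": mean_agg, "median": median_agg},
                     {"one_hot": one_hot_attack, "uniform": uniform_attack},
                     dataset, num_adversaries=2)
```

On this toy data the mean fails under the one-hot attack while the median holds, and both survive the uniform attack, which shows why a single aggregate accuracy can hide large differences between attack types.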

Circularity Check

0 steps flagged

Derivation chain is self-contained with no circular reductions


The paper formalizes robust federated inference and derives error bounds for averaging aggregators from dissimilarity between honest responses or class margins, which follows directly from standard aggregation analysis without self-definition. Casting non-linear aggregation as an adversarial ML problem is a reformulation using established concepts. The DeepSet composition applies adversarial training and test-time robust aggregation as a novel technique, but reports empirical gains on external public benchmarks without any fitted parameter or prediction reducing to an internal definition. No load-bearing self-citations or uniqueness theorems from prior author work are invoked in the claims; the central results remain independently verifiable.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The work relies on standard assumptions from adversarial machine learning and federated learning; no new physical or mathematical entities are postulated.

free parameters (2)

adversarial training perturbation budget
Chosen during training of the DeepSet aggregator to simulate attacks; value not stated in abstract but required for the composition.
number of local models queried
Implicit in the experimental setup; affects the margin and dissimilarity conditions in the averaging analysis.

axioms (2)

domain assumption: Honest local models produce responses whose dissimilarity is bounded or whose class margins are large.
Invoked in the error bound for averaging-based aggregators.
domain assumption: The adversary controls at most a fixed fraction of responses, and the attack distribution matches the one used in adversarial training.
Required for the robustness transfer of the proposed composition.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Our analysis of averaging-based aggregators shows that the error of the aggregator is small either when the dissimilarity between honest responses is small or the margin between the two most probable classes is large."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "we show that problem of robust federated inference with non-linear aggregators can be cast as an adversarial machine learning problem"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.