Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning

Ajinkya Mohgaonkar; Debarghya Ghoshdastidar; Lukas Gosch; Mahalakshmi Sabanayagam; Stephan G\"unnemann

arxiv: 2604.11416 · v1 · submitted 2026-04-13 · 💻 cs.LG

Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning

Ajinkya Mohgaonkar , Lukas Gosch , Mahalakshmi Sabanayagam , Debarghya Ghoshdastidar , Stephan G\"unnemann This is my paper

Pith reviewed 2026-05-10 16:14 UTC · model grok-4.3

classification 💻 cs.LG

keywords certified robustnesslabel flippingneural tangent kernelpartition aggregationadversarial machine learningensemble certificationlabel poisoning

0 comments

The pith

Wide neural networks admit exact polynomial-time certificates against label-flipping attacks via their equivalence to kernel methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to compute precise robustness guarantees for neural networks when training labels have been adversarially flipped. It exploits the fact that sufficiently wide networks behave exactly like kernel machines, turning the certification problem into an exact, efficient calculation rather than a loose bound. For ensembles formed by partitioning the data, it combines detailed per-partition certificates that use the internal structure of each base classifier, producing stronger ensemble-level guarantees than prior black-box methods. Experiments indicate these guarantees certify substantially more label flips on CIFAR-10 while requiring far fewer partitions.

Core claim

ScaLabelCert delivers the first exact, polynomial-time certificate for neural networks against label-flipping attacks by using the neural tangent kernel equivalence that maps wide networks to kernel methods. EnsembleCert then aggregates the resulting white-box per-partition certificates to obtain ensemble robustness guarantees in polynomial time, producing tighter bounds than existing black-box partition-aggregation approaches.

What carries the argument

The neural tangent kernel equivalence, which converts the label-flip robustness question for wide networks into an exact kernel computation that can be solved without approximation.

If this is right

Exact certificates for neural networks against label flips become computable in polynomial time.
Ensemble robustness guarantees improve by incorporating white-box information from each base classifier instead of treating them as black boxes.
On CIFAR-10 the method certifies up to 26.5 percent more label flips in the median while using 100 times fewer partitions.
Heavy partitioning is no longer required to achieve strong certified robustness against label poisoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same kernel equivalence might allow exact certification for other supervised poisoning attacks if the loss remains linear in the labels.
Practical systems could adopt smaller ensembles for certified defenses, lowering memory and inference cost.
Finite-width networks could be tested to see how closely their empirical robustness matches the infinite-width kernel certificate.

Load-bearing premise

Sufficiently wide neural networks are exactly equivalent to kernel methods through the neural tangent kernel, with no residual approximation error in the certificate.

What would settle it

An explicit wide network and label-flip scenario in which the observed robustness deviates from the value predicted by the neural tangent kernel calculation.

Figures

Figures reproduced from arXiv: 2604.11416 by Ajinkya Mohgaonkar, Debarghya Ghoshdastidar, Lukas Gosch, Mahalakshmi Sabanayagam, Stephan G\"unnemann.

**Figure 1.** Figure 1: (a) Two-step approach of EnsembleCert to derive white-box guarantees for partition aggregation ensembles. (b) Evaluation on CIFAR-10 using wide neural networks trained on a regression loss as base classifiers. Using as few as 10 partitions with white-box knowledge enables the ensemble to withstand up to 26.5% more label flips in median compared to using 1000 partitions. For a definition of the metric medi… view at source ↗

**Figure 2.** Figure 2: EnsembleCert evaluation using the NTK for [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Comparing certified accuracies of stand-alone base models and their partition aggregation [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Figures (a) and (b) show the average latency per sample for kernel SVM on CIFAR-10 and MNIST, respectively. Figures (c) and (d) present the corresponding results for kernel regression. We report results for a single value of λ for each dataset, as the trends are consistent across different λ values and do not provide any additional qualitative insight. D.2 ROBUSTNESS-ACCURACY TRADEOFF FOR SMALL C Recall th… view at source ↗

**Figure 5.** Figure 5: Performance of an infinitely-wide neural network with a single trained on hinge loss across different values of C and the cardinality of the training dataset Ns. For each value of Ns, we subsample the training set for 5 different random seeds. The accuracy for each value of Ns initially remains constant up to a certain threshold for C, indicating the range of sufficiently small C, as the SVM performance do… view at source ↗

**Figure 6.** Figure 6: Robustness-accuracy trade-off introduced by sufficiently small C. [PITH_FULL_IMAGE:figures/full_fig_p029_6.png] view at source ↗

**Figure 7.** Figure 7: Comparison of ScaLabelCert and gradient-based parameter bounding method. The parameter κ represents the gradient clipping parameter. ScaLabelCert significantly outperforms the gradient-based method for all values of κ. 30 [PITH_FULL_IMAGE:figures/full_fig_p030_7.png] view at source ↗

**Figure 8.** Figure 8: EnsembleCert evaluated on ensembles with finite-width network as base classifiers. The base classifiers are certified by applying the gradient-based bounding technique by Sosnin et al. (2025). Evidently, EnsembleCert significantly outperforms SS-DPA, the black-box approach applied to the ensemble. 31 [PITH_FULL_IMAGE:figures/full_fig_p031_8.png] view at source ↗

**Figure 9.** Figure 9: Robustness trends across different λ values on MNIST using kernel regression. 0 100 200 300 400 500 Number of Partitions 0 50 100 150 200 MCR SS-DPA EnsembleCert LB EnsemblCert UB (a) λ = 0.01 0 100 200 300 400 500 Number of Partitions 0 50 100 150 200 250 MCR SS-DPA EnsembleCert LB EnsemblCert UB (b) λ = 0.1 0 100 200 300 400 500 Number of Partitions 0 50 100 150 200 250 MCR SS-DPA EnsembleCert LB Ensembl… view at source ↗

**Figure 10.** Figure 10: Robustness trends across different λ values on CIFAR-10 using kernel regression. 32 [PITH_FULL_IMAGE:figures/full_fig_p032_10.png] view at source ↗

**Figure 11.** Figure 11: MNIST binary kernel regression with λ = 0.1 results for different number of partitions. Kernel SVM We present results for evaluation using kernel SVM with NTK and a sufficiently small C in [PITH_FULL_IMAGE:figures/full_fig_p033_11.png] view at source ↗

**Figure 12.** Figure 12: MNIST binary kernel SVM results for different number of partitions. [PITH_FULL_IMAGE:figures/full_fig_p034_12.png] view at source ↗

**Figure 13.** Figure 13: Invariance to number of partitions 34 [PITH_FULL_IMAGE:figures/full_fig_p034_13.png] view at source ↗

**Figure 14.** Figure 14: CIFAR-10 kernel regression results for different numbers of partitions ( [PITH_FULL_IMAGE:figures/full_fig_p035_14.png] view at source ↗

**Figure 15.** Figure 15: CIFAR-10 kernel regression with high regularization ( [PITH_FULL_IMAGE:figures/full_fig_p035_15.png] view at source ↗

**Figure 16.** Figure 16: CIFAR-10 kernel SVM results for different numbers of partitions. [PITH_FULL_IMAGE:figures/full_fig_p036_16.png] view at source ↗

**Figure 17.** Figure 17: CIFAR-10 Smoothed linear regression as base-classifier [PITH_FULL_IMAGE:figures/full_fig_p036_17.png] view at source ↗

**Figure 18.** Figure 18: MNIST multi-class kernel regression results for different numbers of partitions ( [PITH_FULL_IMAGE:figures/full_fig_p037_18.png] view at source ↗

**Figure 19.** Figure 19: MNIST multi-class kernel regression results for different numbers of partitions ( [PITH_FULL_IMAGE:figures/full_fig_p038_19.png] view at source ↗

**Figure 20.** Figure 20: MNIST multi-class kernel SVM results for different numbers of partitions. [PITH_FULL_IMAGE:figures/full_fig_p039_20.png] view at source ↗

read the original abstract

Label-flipping attacks, which corrupt training labels to induce misclassifications at inference, remain a major threat to supervised learning models. This drives the need for robustness certificates that provide formal guarantees about a model's robustness under adversarially corrupted labels. Existing certification frameworks rely on ensemble techniques such as smoothing or partition-aggregation, but treat the corresponding base classifiers as black boxes, yielding overly conservative guarantees. We introduce EnsembleCert, the first certification framework for partition-aggregation ensembles that utilizes white-box knowledge of the base classifiers. Concretely, EnsembleCert yields tighter guarantees than black-box approaches by aggregating per-partition white-box certificates to compute ensemble-level guarantees in polynomial time. To extract white-box knowledge from the base classifiers efficiently, we develop ScaLabelCert, a method that leverages the equivalence between sufficiently wide neural networks and kernel methods using the neural tangent kernel. ScaLabelCert yields the first exact, polynomial-time calculable certificate for neural networks against label-flipping attacks. EnsembleCert is either on par, or significantly outperforms the existing partition-based black box certificates. Exemplary, on CIFAR-10, our method can certify upto +26.5% more label flips in median over the test set compared to the existing black-box approach while requiring 100 times fewer partitions, thus, challenging the prevailing notion that heavy partitioning is a necessity for strong certified robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a white-box way to certify partition ensembles against label flips and claims the first exact NTK-based certificate for neural nets, with solid CIFAR-10 gains, but the finite-width gap is a real question.

read the letter

The main things to know are that EnsembleCert uses white-box certificates from each partition to tighten the ensemble guarantee in polynomial time, and ScaLabelCert replaces the trained network with its NTK kernel regressor to get what they call an exact certificate against label flips. Both are positioned as improvements over black-box partition methods that treat base classifiers as opaque.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ScaLabelCert, which computes exact certificates against label-flipping attacks for sufficiently wide neural networks by replacing the trained model with its neural tangent kernel (NTK) regressor, enabling polynomial-time evaluation. It further presents EnsembleCert, a white-box framework that aggregates per-partition certificates from base classifiers (including those from ScaLabelCert) to obtain tighter ensemble-level guarantees than existing black-box partition-aggregation methods. Experiments on CIFAR-10 report up to 26.5% more certified label flips in the median with 100x fewer partitions.

Significance. If the central claims hold, the work would be significant for providing the first exact polynomial-time certificates for neural networks against label poisoning and for improving certified robustness of ensembles via white-box information. The polynomial-time guarantee and the reported gains over black-box baselines address key computational and tightness limitations in the field. Credit is given for deriving the ensemble aggregation in polynomial time and for attempting an exact (non-relaxed) formulation via NTK.

major comments (2)

[Abstract and ScaLabelCert derivation] Abstract and ScaLabelCert section: the claim that ScaLabelCert yields an 'exact' certificate rests on the NTK equivalence for 'sufficiently wide' networks. No rigorous error bound is supplied quantifying the O(1/width) finite-width corrections to the predictor; because certification solves a worst-case optimization over adversarial label vectors, even small deviations can enlarge the set of successful flips and invalidate the kernel-derived radius for any concrete finite-width network. This directly undermines the 'first exact' and 'exact, polynomial-time calculable' assertions.
[EnsembleCert framework] EnsembleCert aggregation (likely §5): the polynomial-time claim for combining white-box per-partition certificates is load-bearing for the outperformance result, yet the manuscript does not exhibit the explicit complexity analysis or the precise aggregation rule that avoids exponential enumeration when the base certificates themselves are optimization-based.

minor comments (2)

[Abstract] The phrase 'upto +26.5%' in the abstract should read 'up to +26.5%'.
[Experimental evaluation] Network widths and training details used in the CIFAR-10 experiments should be stated explicitly so readers can judge how close the models are to the infinite-width regime assumed by the NTK argument.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, clarifying our claims and indicating planned revisions where appropriate.

read point-by-point responses

Referee: [Abstract and ScaLabelCert derivation] Abstract and ScaLabelCert section: the claim that ScaLabelCert yields an 'exact' certificate rests on the NTK equivalence for 'sufficiently wide' networks. No rigorous error bound is supplied quantifying the O(1/width) finite-width corrections to the predictor; because certification solves a worst-case optimization over adversarial label vectors, even small deviations can enlarge the set of successful flips and invalidate the kernel-derived radius for any concrete finite-width network. This directly undermines the 'first exact' and 'exact, polynomial-time calculable' assertions.

Authors: We agree that the current wording could be misinterpreted. ScaLabelCert derives an exact certificate for the NTK regressor, which is equivalent to the neural network predictor in the infinite-width limit. The manuscript applies this to sufficiently wide networks as a practical surrogate. However, as noted, no quantitative bound on the O(1/width) deviation is provided, and worst-case certification could in principle be sensitive to such perturbations. We will revise the abstract and ScaLabelCert section to explicitly qualify that exactness holds for the NTK model, add a discussion of the finite-width approximation, and remove or temper the unqualified 'first exact' phrasing for finite networks. This revision will be made. revision: yes
Referee: [EnsembleCert framework] EnsembleCert aggregation (likely §5): the polynomial-time claim for combining white-box per-partition certificates is load-bearing for the outperformance result, yet the manuscript does not exhibit the explicit complexity analysis or the precise aggregation rule that avoids exponential enumeration when the base certificates themselves are optimization-based.

Authors: The aggregation rule in EnsembleCert combines the per-partition white-box certificates (each providing a certified flip radius) via a polynomial-time solvable optimization that exploits the partition structure, rather than enumerating all possible flip combinations across partitions. This yields an overall polynomial complexity. We acknowledge that the manuscript presents the claim without a self-contained complexity proof or fully expanded pseudocode for the aggregation step. In the revision we will add an explicit complexity analysis (including the precise aggregation formulation) in §5 or an appendix to substantiate the polynomial-time guarantee. revision: yes

Circularity Check

0 steps flagged

Minor reliance on prior NTK theory; no load-bearing self-citation or self-definition

full rationale

The paper's ScaLabelCert method invokes the established neural tangent kernel equivalence for sufficiently wide networks from independent prior literature to enable exact kernel-based certification. This is not a self-citation chain, fitted-input prediction, or self-definitional reduction; the equivalence is treated as an external benchmark. EnsembleCert's white-box aggregation of per-partition certificates is derived independently and does not collapse to the paper's own inputs by construction. No quoted equations or steps exhibit the enumerated circularity patterns. The derivation is self-contained against external theory, aligning with a low score.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claims depend on the validity of the NTK equivalence for wide networks and the polynomial-time aggregation of per-partition certificates.

axioms (1)

domain assumption Neural networks of sufficient width are equivalent to kernel methods under the neural tangent kernel
This equivalence is invoked to derive the exact certificate for ScaLabelCert.

pith-pipeline@v0.9.0 · 5568 in / 1217 out tokens · 38885 ms · 2026-05-10T16:14:53.940891+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

[1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page
[2]

@esa (Ref

\@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...

work page
[3]

\@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...

work page
[4]

0362 #1 ^H 2

@open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...

work page arXiv 2000

[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[2] [2]

@esa (Ref

\@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...

work page

[3] [3]

\@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...

work page

[4] [4]

0362 #1 ^H 2

@open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...

work page arXiv 2000