Theory of Minimal Weight Perturbations in Deep Networks and its Applications for Low-Rank Activated Backdoor Attacks

Bethan Evans; Jared Tanner

arXiv: 2601.16880 · v2 · pith:HIOS74S7 · submitted 2026-01-23 · 💻 cs.LG · cs.IT · math.IT

Pith reviewed 2026-05-21 14:12 UTC · model grok-4.3

keywords: deep neural networks, weight perturbations, minimal norm, backdoor attacks, low-rank compression, robustness, Lipschitz constants

The pith

Minimal-norm weight perturbations in deep networks are derived exactly for single layers and are of the same order as multi-layer Lipschitz-based robustness guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives exact formulas for the smallest weight changes needed in one layer of a deep neural network to produce a desired shift in its output. It discusses what determines the size of these changes and compares the results with broader multi-layer bounds based on Lipschitz constants, finding them similar in scale. The theory is then used to establish provable compression thresholds: if compression perturbs the weights by less than a certain amount, such backdoor attacks cannot succeed. Experiments show that low-rank compression can trigger hidden backdoors while keeping full-precision accuracy intact. The formulas highlight how the margins propagated backward through the network control each layer's sensitivity to small updates.

Core claim

The minimal-norm weight perturbations of DNNs required to achieve a specified change in output are derived, and the factors determining their size are discussed. These single-layer exact formulae are contrasted with more generic multi-layer Lipschitz-constant-based robustness guarantees; both are observed to be of the same order, which indicates similar efficacy in their guarantees. These results are applied to precision-modification-activated backdoor attacks, establishing provable compression thresholds below which such attacks cannot succeed, and it is shown empirically that low-rank compression can reliably activate latent backdoors while preserving full-precision accuracy.
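The threshold idea can be illustrated in the simplest possible setting. The following is a minimal sketch, not the paper's construction: it assumes a single linear layer W, a scalar margin m = cᵀWx with a fixed read-out vector c, and compression by truncated SVD. Flipping the sign of m requires a weight perturbation of Frobenius norm at least |m| / (‖c‖‖x‖), while (by Eckart-Young) rank-r truncation moves W by exactly the root-sum-square of the discarded singular values. Any rank whose truncation error lies strictly inside that radius therefore provably cannot flip the margin. The helpers `truncate_rank` and `rank_threshold` are hypothetical names, and the sketch assumes m ≠ 0.

```python
import numpy as np

def truncate_rank(W, r):
    """Best rank-r approximation of W in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def rank_threshold(W, x, c):
    """Smallest rank r whose truncation provably cannot flip the sign of m = c @ W @ x.

    Hypothetical single-linear-layer setting; assumes m != 0.
    """
    m = c @ W @ x
    # Flipping the sign of m needs ||dW||_F >= |m| / (||c|| ||x||).
    radius = abs(m) / (np.linalg.norm(c) * np.linalg.norm(x))
    s = np.linalg.svd(W, compute_uv=False)
    # errors[r] = ||W - W_r||_F = sqrt(sum of squared singular values with index >= r)
    errors = np.append(np.sqrt(np.cumsum(s[::-1] ** 2)[::-1]), 0.0)
    # errors is non-increasing, so the first index inside the radius is the threshold
    r_min = int(np.argmax(errors < radius))
    return r_min, radius, errors

rng = np.random.default_rng(1)
W = rng.standard_normal((6, 5))
x = rng.standard_normal(5)
c = rng.standard_normal(6)
r_min, radius, errors = rank_threshold(W, x, c)
print(f"ranks >= {r_min} provably preserve the sign of the margin (radius {radius:.3f})")
```

The bound is deliberately crude (it uses ‖ΔW‖₂ ≤ ‖ΔW‖_F), so the true safe region may extend to lower ranks than `r_min`.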

What carries the argument

Exact single-layer minimal-norm weight perturbation formulas based on back-propagated margins, which quantify the smallest parameter updates needed for a target output change.
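For intuition, here is a minimal sketch of the linear-algebra fact behind such formulas, under an assumption of ours rather than the paper's exact setting: the quantity to be changed is the linear functional cᵀWx of one layer's weights W, with c standing in for the back-propagated margin vector and x for the layer input. The minimum-Frobenius-norm solution of the single constraint cᵀ(W + ΔW)x = cᵀWx + δ lies along the gradient c xᵀ, giving the rank-one update ΔW = δ c xᵀ / (‖c‖²‖x‖²) with norm |δ| / (‖c‖‖x‖). The function name `minimal_layer_perturbation` is hypothetical.

```python
import numpy as np

def minimal_layer_perturbation(x, c, delta):
    """Minimum-Frobenius-norm dW with c @ (W + dW) @ x == c @ W @ x + delta, for any W.

    Hypothetical setup: c plays the role of the back-propagated margin vector.
    """
    g = np.outer(c, x)                   # gradient of c^T W x with respect to W
    return delta * g / np.sum(g * g)     # scaled so that <g, dW> = delta exactly

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
c = rng.standard_normal(4)
delta = -0.5

dW = minimal_layer_perturbation(x, c, delta)
print(c @ (W + dW) @ x - c @ W @ x)      # delta, up to rounding
print(np.linalg.norm(dW), abs(delta) / (np.linalg.norm(c) * np.linalg.norm(x)))
```

Any other feasible update differs from this one by a component orthogonal to c xᵀ, which can only increase the norm.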

If this is right

Single-layer exact formulas provide robustness guarantees of the same order as multi-layer Lipschitz constant methods.
Provable thresholds exist on compression levels below which precision-modification backdoor attacks cannot succeed.
Low-rank compression can activate latent backdoors in networks while preserving full-precision accuracy.
Back-propagated margins directly govern the sensitivity of each layer to weight perturbations.
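The first and last points can be made concrete in a toy two-layer ReLU network with margin m = cᵀW₂ ReLU(W₁x), a hypothetical setup chosen only for illustration. Holding the activation pattern fixed, the margin back-propagated to layer 1 is g = D W₂ᵀc, where D masks inactive units, and the minimal norm of a first-layer change producing a margin shift δ is |δ| / (‖g‖‖x‖). A generic Lipschitz argument, using only that ReLU is 1-Lipschitz, instead certifies that any such change needs norm at least |δ| / (‖c‖‖W₂‖₂‖x‖). Since ‖g‖ ≤ ‖W₂‖₂‖c‖, the Lipschitz value never exceeds the margin-based one; the sketch computes both so their ratio can be inspected. The name `layer1_radii` is hypothetical.

```python
import numpy as np

def layer1_radii(W1, W2, x, c, delta):
    """Hypothetical two-layer ReLU toy: (margin-based, Lipschitz-based) norms for a layer-1 shift delta."""
    mask = (W1 @ x > 0).astype(float)      # ReLU activation pattern at x
    g = mask * (W2.T @ c)                  # margin back-propagated to layer-1 pre-activations
    x_norm = np.linalg.norm(x)
    exact = abs(delta) / (np.linalg.norm(g) * x_norm)   # exact while the pattern stays fixed
    lipschitz = abs(delta) / (np.linalg.norm(c) * np.linalg.norm(W2, 2) * x_norm)
    return exact, lipschitz

rng = np.random.default_rng(2)
W1 = rng.standard_normal((8, 5))
W2 = rng.standard_normal((3, 8))
x = rng.standard_normal(5)
c = np.array([1.0, -1.0, 0.0])   # margin between output 0 and output 1
exact, lipschitz = layer1_radii(W1, W2, x, c, delta=0.1)
print(exact, lipschitz, lipschitz / exact)
```

Both quantities scale linearly in |δ|, so their ratio depends only on the network and the input, which is one way to read "of the same order".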

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These formulas could be used to design more efficient ways to update or fine-tune models with minimal changes.
Similar analysis might apply to other network modifications, such as pruning or quantization, in settings beyond backdoor attacks.
Network designers might use the compression thresholds to set safe limits against potential attacks.

Load-bearing premise

The network behaves locally linearly or is differentiable at the point of interest, so that back-propagated margins control layer-wise sensitivity to perturbations.
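This premise can be checked numerically rather than assumed. The sketch below, again a hypothetical two-layer ReLU network, applies the rank-one first-layer update built from the back-propagated margin g and compares the realized margin shift with the target δ. If the update leaves the ReLU activation pattern unchanged, the margin is linear in W₁ along that pattern and the shift equals δ up to rounding. The function reports whether the pattern changed. The names `margin` and `check_layer1_update` are illustrative only.

```python
import numpy as np

def margin(W1, W2, x, c):
    """Margin c^T W2 relu(W1 x) of a hypothetical two-layer ReLU network."""
    return c @ W2 @ np.maximum(W1 @ x, 0.0)

def check_layer1_update(W1, W2, x, c, delta):
    """Apply the margin-based minimal layer-1 update; return (realized shift, pattern changed)."""
    pre = W1 @ x
    g = (pre > 0) * (W2.T @ c)              # margin back-propagated to layer-1 pre-activations
    if not np.any(g):
        raise ValueError("no active unit carries the margin; the update is undefined")
    dW1 = delta * np.outer(g, x) / ((g @ g) * (x @ x))   # rank-one minimal-norm update
    new_pre = (W1 + dW1) @ x
    shift = margin(W1 + dW1, W2, x, c) - margin(W1, W2, x, c)
    return shift, bool(np.any((new_pre > 0) != (pre > 0)))

rng = np.random.default_rng(3)
x = rng.standard_normal(5)
W1 = rng.standard_normal((8, 5))
W1[0] = x                                   # guarantees hidden unit 0 is active at x
W2 = rng.standard_normal((3, 8))
c = np.array([1.0, -1.0, 0.0])              # margin between outputs 0 and 1
for delta in (1e-3, 1e-1, 10.0):
    shift, changed = check_layer1_update(W1, W2, x, c, delta)
    print(f"delta={delta:g}  shift={shift:.6f}  pattern changed: {changed}")
```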

What would settle it

Measuring the actual smallest weight-perturbation norm required to achieve a specific output change in a trained deep network, and finding that it deviates significantly from the predicted single-layer formula.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Single-layer exact minimal-norm perturbation formulas are the main new piece here, but the multi-layer comparison and backdoor thresholds lean on local linearity holding at the operating point.

The one or two things to know: the paper derives exact single-layer formulas for the smallest weight perturbation needed to shift the output by a target amount, and then applies them to determine, provably, when low-rank compression can activate backdoors. The new contribution is those closed-form expressions for one layer. They appear to follow directly from the back-propagated margins and give the minimal norm itself rather than a bound on it, which is a step beyond the generic Lipschitz-constant approaches in the literature. The paper does well to discuss the factors that determine the size of these perturbations and to show, by comparison, that they are of similar order to multi-layer robustness guarantees. That suggests the single-layer view captures much of the sensitivity without a full-network analysis. On the application side, the provable compression thresholds below which backdoor attacks cannot succeed are a useful concrete result, and the empirical demonstration that low-rank compression can reliably activate latent backdoors while preserving full-precision accuracy is of practical value.

The soft spots are the assumptions behind the multi-layer extension and the threshold claims. The work relies on the network being locally linear or differentiable, so that the back-propagated margins govern layer-wise sensitivity. If the operating point is near a ReLU kink, or if the perturbation changes the activation pattern, the order equivalence may not hold precisely and the compression thresholds could be less reliable. There is no mention of error analysis or of how local linearity is validated in the experiments, so that part feels thinner. The citation pattern looks fine, building on standard sensitivity analysis without circularity.

This paper is aimed at researchers working on model compression, robustness certification, and the security of deployed neural networks. A reader interested in layer-wise analysis or in backdoor vulnerabilities of quantized or pruned models would get value from the formulas and the thresholds. It deserves a serious referee because the core derivations appear grounded and the application is novel enough to check thoroughly. I would recommend sending it to peer review rather than desk-rejecting it.

Referee Report

1 major / 2 minor

Summary. The manuscript derives exact formulas for the minimal L2-norm weight perturbations in individual layers of deep neural networks needed to induce a specified change in the network's output. It discusses the factors that determine the size of these perturbations and contrasts the single-layer results with multi-layer robustness guarantees based on Lipschitz constants, noting that both are of similar order. These theoretical results are then applied to low-rank activated backdoor attacks, where provable thresholds on model compression are established below which such attacks cannot be activated. Empirical experiments demonstrate that low-rank compression can reliably trigger latent backdoors without degrading full-precision accuracy. The work emphasizes the role of back-propagated margins in determining layer-wise sensitivity and aims to provide certifiable guarantees on minimal parameter updates.

Significance. Should the central derivations prove correct and the local linearity assumptions hold with sufficient accuracy, this paper contributes a precise theoretical tool for analyzing DNN sensitivity and robustness. The exact single-layer formulae offer more specific insights than generic Lipschitz bounds. The application to backdoor attacks via compression is novel and could inform defense strategies in model deployment. The empirical results provide supporting evidence for the practical utility of the theory. This could be significant for the fields of adversarial machine learning and model compression.

major comments (1)

The equivalence of orders between the single-layer minimal norm perturbations and the multi-layer Lipschitz bounds is load-bearing for the claim of similar efficacy and for deriving the compression thresholds. The manuscript invokes local differentiability when extending to multi-layer cases, but does not quantify the approximation error from higher-order terms or inter-layer interactions. This is particularly relevant near ReLU kinks or after low-rank updates, as noted in the stress-test concern.
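The concern can be seen in a deliberately tiny, hypothetical example: one input, two hidden ReLU units with output weights 1, and margin m = ReLU(w₁x) + ReLU(w₂x), with w₁ = 1, w₂ = -1 and x = 1, so m = 1 and only the first unit is active. The margin-based minimal update moves only w₁, by δ. For δ > -1 the first unit stays active and the realized shift is exactly δ; once δ < -1 the unit crosses its kink and the shift saturates at -1, so the linear prediction fails (here a shift below -1 is unreachable through the first layer at all, since m ≥ 0). The names `margin` and `realized_shift` are illustrative only.

```python
import numpy as np

# Hypothetical toy network: two hidden ReLU units whose output weights are both 1,
# so the margin is m(w) = relu(w[0] * x) + relu(w[1] * x).
w = np.array([1.0, -1.0])   # unit 0 is active and unit 1 inactive at x = 1
x = 1.0

def margin(weights):
    return np.maximum(weights * x, 0.0).sum()

def realized_shift(delta):
    """Apply the margin-based minimal first-layer update for target shift delta."""
    g = (w * x > 0).astype(float)             # back-propagated margin, here [1.0, 0.0]
    dw = delta * g * x / ((g @ g) * x * x)    # rank-one minimal-norm update: moves only w[0]
    return margin(w + dw) - margin(w)

for delta in (0.5, -0.5, -1.5, -2.0):
    print(f"target {delta:+.1f}  realized {realized_shift(delta):+.1f}")
```

The realized shift tracks the target for +0.5 and -0.5 but is -1.0 for both -1.5 and -2.0, which is exactly the kind of activation-pattern change the comment asks the authors to bound.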

minor comments (2)

The abstract is dense; consider splitting the description of the backdoor application into a separate sentence for clarity.
Define 'back-propagated margins' explicitly in the main text before using it in the formulae.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and for identifying a key point regarding the multi-layer extension. We address the major comment below and will incorporate clarifications in the revised manuscript.

Referee: The equivalence of orders between the single-layer minimal norm perturbations and the multi-layer Lipschitz bounds is load-bearing for the claim of similar efficacy and for deriving the compression thresholds. The manuscript invokes local differentiability when extending to multi-layer cases, but does not quantify the approximation error from higher-order terms or inter-layer interactions. This is particularly relevant near ReLU kinks or after low-rank updates, as noted in the stress-test concern.

Authors: We agree that quantifying the approximation error strengthens the result. The single-layer formulas are exact under the local linearity assumption, and the order equivalence with Lipschitz bounds follows from the chain rule applied to back-propagated margins, which already incorporates inter-layer effects. In the revision we will add an explicit discussion of the Taylor remainder term, providing a bound on higher-order contributions under the assumption of a bounded Hessian away from ReLU kinks. Near kinks the minimal-norm direction remains aligned with the subgradient, preserving the order; we will reference the existing stress tests to show that empirical activation thresholds remain predictive even after low-rank updates. This leading-order analysis is sufficient to establish the existence of compression thresholds, since higher-order terms affect only multiplicative constants rather than the scaling that determines the threshold.
revision: yes

Circularity Check

0 steps flagged

Derivation chain self-contained; no reductions to inputs by construction

The paper derives single-layer exact minimal-norm weight perturbation formulae under a local differentiability assumption, using back-propagated margins to govern layer-wise sensitivity. These are then contrasted with multi-layer Lipschitz constant bounds, with the observation that both are of the same order presented as an external comparison rather than an identity. The application to provable compression thresholds for low-rank backdoor attacks follows from this contrast and the derived expressions. No equations, self-citations, fitted parameters renamed as predictions, or ansatzes smuggled via prior work are indicated in the abstract or description as load-bearing for the central claims. The chain remains independent of its own outputs and does not reduce to self-definition or tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard differentiability of the network and the existence of well-defined back-propagated margins; no free parameters or new invented entities are mentioned in the abstract.

axioms (1)

domain assumption: the network is differentiable at the evaluation point, so that back-propagated margins exist and control layer sensitivity.
Invoked when deriving single-layer exact formulae and when extending them to multi-layer Lipschitz comparison.

pith-pipeline@v0.9.0 · 5645 in / 1302 out tokens · 38767 ms · 2026-05-21T14:12:31.013328+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Theorem 4.1 ... γ(x;θ) ≤ 2^{(p-1)/p} L_θ ‖Δθ‖_p ... parameter-space Lipschitz constant

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.