When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks
Pith reviewed 2026-05-22 06:40 UTC · model grok-4.3
The pith
In high dimensions, stronger backdoor training triggers raise clean accuracy and cap attack success on regularized models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For regularized generalized linear models trained on Gaussian-mixture data in the proportional regime p/n → κ, increasing the training trigger strength α relative to a fixed test trigger produces three results: clean test accuracy grows with α, attack success rate reaches a maximum at a finite α and then declines, and the trigger direction that maximizes damage is the minimum eigenvector of the data covariance. The first two results hold in closed form for squared loss and extend to general convex losses via a Gaussian-proxy fixed-point system; the finite-sample noise floor proportional to κ is the mechanism that drives the rise in clean accuracy.
What carries the argument
The Gaussian-proxy fixed-point system that tracks the high-dimensional behavior of the regularized GLM under varying trigger strength α, exposing the noise floor proportional to the aspect ratio κ.
If this is right
- Clean accuracy on untriggered test data increases monotonically with the strength of the trigger used in training.
- Attack success rate reaches a maximum at an intermediate training trigger strength and then decreases.
- The trigger direction that produces the largest attack success is the minimum eigenvector of the data covariance, that is, the eigenvector belonging to its smallest eigenvalue.
- The same qualitative dependence on trigger strength holds for any convex GLM loss through the fixed-point equations.
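The third point above can be illustrated with a small sketch, assuming the covariance matrix is known and symmetric. The helper name `min_eigvec` and the diagonal example matrix are hypothetical and not taken from the paper.

```python
import numpy as np

def min_eigvec(cov):
    """Return (smallest eigenvalue, matching unit eigenvector) of a symmetric matrix."""
    # np.linalg.eigh returns eigenvalues in ascending order for symmetric input,
    # so index 0 picks the smallest eigenvalue and its eigenvector.
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvals[0], eigvecs[:, 0]

# Hypothetical anisotropic covariance, chosen only for illustration.
cov = np.diag([4.0, 1.0, 0.25])
lam_min, v_min = min_eigvec(cov)
print(lam_min, v_min)
```

The eigenvector is defined only up to sign, so a trigger built from it could point either way along that axis.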
Where Pith is reading between the lines
- Defenders could deliberately insert stronger triggers during training to improve robustness without knowledge of the attacker's test trigger.
- The same noise-floor mechanism may limit the effectiveness of other data-poisoning or backdoor strategies once models operate in the proportional regime.
- The pattern observed in ResNet-18 experiments suggests the phenomena survive beyond convex models and could be tested on other non-convex architectures.
- Robustness evaluations that assume n much larger than p may systematically underestimate the protection that high-dimensional effects already provide.
Load-bearing premise
The inputs come from a Gaussian mixture and the learner is a regularized generalized linear model whose high-dimensional limit is captured by the Gaussian proxy.
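Written out, the premise might look as follows. The regularized objective is the one that appears in the paper's appendix; the mixture parametrization (labels $y_i = \pm 1$, mean $\mu$, covariance $\Sigma$), the trigger direction $v$, and the explicit squared-loss form of $L$ are notation assumed here for illustration and are not quoted from the paper.

```latex
\begin{align*}
% Assumed two-component Gaussian mixture
y_i &\in \{-1, +1\}, \qquad x_i = y_i \mu + \Sigma^{1/2} z_i, \qquad z_i \sim \mathcal{N}(0, I_p), \\
% Assumed poisoning rule: shift along v with training strength alpha
x_i &\mapsto x_i + \alpha v \quad \text{for poisoned samples, with the label set to the target class}, \\
% Regularized empirical risk (as stated in the paper's appendix)
\hat\theta_n &\in \arg\min_{\theta \in \mathbb{R}^p} L_n(\theta), \qquad
L_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} L\!\left(y_i x_i^\top \theta\right) + \frac{\lambda}{2} \|\theta\|^2, \\
% Proportional regime
n, p &\to \infty \quad \text{with} \quad p/n \to \kappa, \\
% Squared loss as one convex choice of L (assumed form)
L(t) &= \tfrac{1}{2} (1 - t)^2 .
\end{align*}
```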
What would settle it
A controlled experiment on Gaussian-mixture data with p/n held near a constant in which clean accuracy fails to increase or attack success fails to decline after a peak when training trigger strength α is raised.
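A minimal sketch of such an experiment, under stated assumptions: isotropic unit covariance, squared loss with a ridge penalty (solved in closed form), a trigger direction orthogonal to the class mean, and a target label of +1. The function `run_trial` and all default values (`n`, `p`, `lam`, `poison_frac`, `signal`, `test_strength`) are hypothetical choices for illustration, not the paper's protocol, and the sketch makes no claim about which trend the printed numbers will show.

```python
import numpy as np

def run_trial(alpha, n=400, p=200, lam=1.0, poison_frac=0.1,
              test_strength=1.0, signal=2.0, n_test=2000, seed=0):
    """Return (clean accuracy, attack success rate) for one training trigger strength."""
    rng = np.random.default_rng(seed)
    mu = np.full(p, signal / np.sqrt(p))      # class mean with norm `signal` (assumed)
    v = np.zeros(p)
    v[0] = 1.0
    v -= (v @ mu) / (mu @ mu) * mu            # make the trigger orthogonal to mu
    v /= np.linalg.norm(v)

    def sample(m):
        y = rng.choice([-1.0, 1.0], size=m)
        x = y[:, None] * mu + rng.standard_normal((m, p))
        return x, y

    # Poison the first rows; rows are i.i.d., so this is a random subset.
    X, y = sample(n)
    k = int(poison_frac * n)
    X[:k] += alpha * v
    y[:k] = 1.0                               # target label is +1

    # Closed-form minimizer of (1/2n) sum (x_i^T theta - y_i)^2 + (lam/2) ||theta||^2
    theta = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

    Xt, yt = sample(n_test)
    clean_acc = np.mean(np.sign(Xt @ theta) == yt)
    victims = Xt[yt == -1.0] + test_strength * v   # fixed test trigger on non-target inputs
    asr = np.mean(victims @ theta > 0)
    return float(clean_acc), float(asr)

# Sweep the training trigger strength at fixed p/n = 0.5.
for alpha in [0.0, 1.0, 2.0, 4.0, 8.0]:
    acc, asr = run_trial(alpha)
    print(f"alpha={alpha:4.1f}  clean_acc={acc:.3f}  asr={asr:.3f}")
```

Averaging over several seeds and repeating the sweep at other values of p/n would give a fairer reading than a single run.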
Original abstract
Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime ($p/n \to \kappa$), varying the training trigger strength $\alpha$ against a fixed test trigger. Three phenomena emerge: (i) clean test accuracy increases with $\alpha$; (ii) attack success peaks at a finite $\alpha$ and then declines; and (iii) the most damaging trigger direction is the minimum eigenvector of the data covariance. We prove all three results in closed form for the squared loss, and extend (i) and (ii) to general convex GLM losses via a Gaussian-proxy fixed-point system. We identify a finite-sample noise floor proportional to $\kappa$ as the mechanism behind (i), invisible to classical $n \gg p$ analysis. Experiments on CIFAR-10 and Gaussian surrogates match the theory closely; ResNet-18 experiments show the same phenomena beyond the convex setting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that backdoor poisoning attacks on regularized generalized linear models trained on Gaussian-mixture data in the proportional high-dimensional regime (p/n → κ) behave counter-intuitively: (i) clean test accuracy increases with the training trigger strength α, measured against a fixed test trigger; (ii) attack success rate is non-monotonic in α, peaking at a finite value before declining; and (iii) the most damaging trigger direction is the minimum eigenvector of the data covariance. All three results are proved in closed form for squared loss, and (i) and (ii) are extended to general convex GLM losses via a Gaussian-proxy fixed-point system. Experiments on CIFAR-10 and Gaussian surrogates are reported to match the predictions closely, and ResNet-18 experiments to show the same phenomena.
Significance. If the results hold, the work offers a valuable high-dimensional theory of backdoor attacks that reveals mechanisms (such as the finite-sample noise floor proportional to κ) invisible to classical n ≫ p analyses. The closed-form derivations for squared loss and the fixed-point extension provide rigorous, parameter-free insights in the proportional limit; the matching experiments on both synthetic and real data add empirical support. This could inform defense design by highlighting optimal trigger strengths or directions.
major comments (1)
- [§3.2] Gaussian-proxy fixed-point system: the extension of claims (i) and (ii) to general convex GLM losses rests on the proxy faithfully reproducing high-dimensional behavior for non-quadratic losses. No explicit error bound or convergence guarantee is provided when the loss deviates from quadratic or as α grows; if the fixed point misrepresents the effective regularization or the κ-proportional noise term, the predicted increase in clean accuracy and non-monotonic attack success may not hold. This is load-bearing for the general-case results.
minor comments (2)
- [Experiments] Details on data splits, hyper-parameter choices, and the exact construction of the Gaussian surrogates are not given; including them would make it easier to verify the claimed close match to theory on CIFAR-10 and ResNet-18.
- [Notation] Ensure α is consistently defined as the training trigger strength (distinct from the test trigger strength) in all equations and figure captions to avoid reader confusion.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on the Gaussian-proxy fixed-point system. We address the major comment point by point below.
Referee: [§3.2] Gaussian-proxy fixed-point system: the extension of claims (i) and (ii) to general convex GLM losses rests on the proxy faithfully reproducing high-dimensional behavior for non-quadratic losses. No explicit error bound or convergence guarantee is provided when the loss deviates from quadratic or as α grows; if the fixed point misrepresents the effective regularization or the κ-proportional noise term, the predicted increase in clean accuracy and non-monotonic attack success may not hold. This is load-bearing for the general-case results.
Authors: We agree that the manuscript does not supply an explicit quantitative error bound on the Gaussian-proxy approximation for non-quadratic losses or large α. The fixed-point system is obtained by replacing the feature distribution with a Gaussian that matches the first two moments, which yields exact state-evolution equations in the proportional limit; this is a standard device in the high-dimensional statistics literature for GLM analysis. While a rigorous convergence rate is not derived, the predictions are validated against both synthetic Gaussian mixtures and CIFAR-10 experiments. We will revise §3.2 to include (i) a brief derivation sketch showing how the proxy arises from the replica or state-evolution analysis, (ii) additional numerical checks comparing fixed-point outputs to finite-dimensional gradient-descent trajectories for logistic and hinge losses across a range of α, and (iii) a short remark on the regime where the approximation is expected to remain accurate (smooth convex losses, moderate trigger strength). These additions will make the load-bearing character of the extension more transparent without altering the main claims.
Revision: partial
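The finite-dimensional side of the numerical check described in the rebuttal could be sketched as below, assuming plain full-batch gradient descent on the ridge-regularized logistic loss. This is not the paper's fixed-point system, which this review does not reproduce; the function names, step size, iteration count, and toy data are all hypothetical.

```python
import numpy as np

def logistic_ridge_grad(theta, X, y, lam):
    """Gradient of (1/n) sum log(1 + exp(-y_i x_i^T theta)) + (lam/2) ||theta||^2."""
    margins = y * (X @ theta)
    # d/dt log(1 + exp(-t)) = -sigmoid(-t); the tanh form avoids overflow.
    dloss = -0.5 * (1.0 - np.tanh(0.5 * margins))
    return X.T @ (dloss * y) / X.shape[0] + lam * theta

def gradient_descent(X, y, lam=1.0, lr=0.1, steps=2000):
    """Full-batch gradient descent from zero with a fixed step size."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * logistic_ridge_grad(theta, X, y, lam)
    return theta

# Hypothetical toy data: two-component mixture with identity covariance.
rng = np.random.default_rng(0)
n, p = 200, 50
mu = np.full(p, 1.0 / np.sqrt(p))
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + rng.standard_normal((n, p))

theta_hat = gradient_descent(X, y)
# A gradient norm near zero indicates the iterate has reached the regularized minimizer.
print(np.linalg.norm(logistic_ridge_grad(theta_hat, X, y, lam=1.0)))
```

Because the penalty makes the objective strongly convex, the minimizer is unique, so the converged iterate is a well-defined quantity to compare against any theoretical prediction.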
Circularity Check
No significant circularity in derivation chain
The paper states it proves results in closed form for squared loss and extends via a Gaussian-proxy fixed-point system derived from the model in the proportional regime. No quoted steps reduce a prediction to a fitted parameter by construction, nor does any load-bearing claim rest on a self-citation loop or imported uniqueness theorem. The fixed-point equations are presented as analytically derived from the GLM assumptions rather than calibrated to the attack-success or accuracy curves, making the central claims independent of the target quantities.
Axiom & Free-Parameter Ledger
free parameters (2)
- κ = p/n
- α (training trigger strength)
axioms (2)
- Domain assumption: data are drawn from a two-component Gaussian mixture with isotropic covariance.
- Domain assumption: the Gaussian-proxy fixed-point system accurately tracks the high-dimensional behavior of general convex GLM losses.