Certified Robustness to Data Poisoning in Gradient-Based Training

Calvin Tsay; Mark N. Müller; Matthew Wicker; Maximilian Baader; Philip Sosnin

arXiv: 2406.05670 · v3 · pith:ZFIKHNUQ · submitted 2024-06-09 · cs.LG · cs.CR · cs.CV




keywords: data, model, poisoning, attacks, backdoor, behavior, learning, algorithm



Modern machine learning pipelines leverage large amounts of public data, making it infeasible to guarantee data quality and leaving models open to poisoning and backdoor attacks. Provably bounding model behavior under such attacks remains an open problem. In this work, we address this challenge by developing the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data without modifying the model or learning algorithm. In particular, our framework certifies robustness against untargeted and targeted poisoning, as well as backdoor attacks, for bounded and unbounded manipulations of the training inputs and labels. Our method leverages convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning threat model, allowing us to bound the set of all reachable parameters for any gradient-based learning algorithm. Given this set of parameters, we provide bounds on worst-case behavior, including model performance and backdoor success rate. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.
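The core idea in the abstract, over-approximating every parameter update an adversary can induce and propagating the resulting set through training, can be illustrated with a toy sketch. It assumes a 1-D linear model trained by full-batch gradient descent, a threat model in which every training label may be shifted by at most `eps`, and plain interval arithmetic as a crude stand-in for the paper's convex relaxations. The names `Interval`, `reachable_weights`, and `train` are hypothetical; this is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] used as a (loose) convex over-approximation."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest result pairs our lower end with the other's upper end.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def scale(self, c):
        # Multiply by a constant c; a negative c swaps the endpoints.
        a, b = self.lo * c, self.hi * c
        return Interval(min(a, b), max(a, b))

    def contains(self, v, tol=1e-12):
        return self.lo - tol <= v <= self.hi + tol


def reachable_weights(xs, ys, eps, w0, lr, steps):
    """Interval containing every weight reachable by full-batch gradient
    descent on 0.5 * mean((w*x - y)^2) when each label y may be replaced by
    any value in [y - eps, y + eps]. Sound, but loose: interval arithmetic
    ignores that w appears in both the parameter and the gradient."""
    w = Interval(w0, w0)
    n = len(xs)
    for _ in range(steps):
        grad = Interval(0.0, 0.0)
        for x, y in zip(xs, ys):
            residual = w.scale(x) - Interval(y - eps, y + eps)  # w*x - y_poisoned
            grad = grad + residual.scale(x)                    # (w*x - y) * x
        # Bound on the parameter update, then on the new parameter set.
        w = w - grad.scale(lr / n)
    return w


def train(xs, ys, w0, lr, steps):
    """Concrete gradient descent run, used to check the bound empirically."""
    w = w0
    n = len(xs)
    for _ in range(steps):
        g = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * g
    return w


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    bounds = reachable_weights(xs, ys, eps=0.1, w0=0.0, lr=0.05, steps=5)
    clean = train(xs, ys, w0=0.0, lr=0.05, steps=5)
    print(bounds, clean, bounds.contains(clean))
```

Because the same interval is propagated regardless of which labels the adversary changes, the bound also covers adversaries that alter labels between steps. Its width grows by a factor of roughly `1 + lr * mean(x**2)` per step, which is why tighter relaxations than plain intervals matter for longer training runs.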



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Relaxation-Informed Training of Neural Network Surrogate Models
math.OC 2026-04 conditional novelty 7.0

Regularizers that penalize big-M constants, unstable neurons, and per-sample LP relaxation gaps during neural network training reduce MILP solve times by up to four orders of magnitude while preserving surrogate accuracy.
Robustness Certificates for Neural Networks against Adversarial Attacks
cs.LG 2025-12 unverdicted novelty 6.0

A barrier-certificate framework certifies non-trivial robustness radii for neural networks under worst-case ℓ_p poisoning during training and at test time, with PAC bounds derived via scenario convex programming.