Are Targeted Data Poisoning Attacks as Effective as We Think?

Chenyu Zhang; Gautam Kamath; Matthew Y.R. Yang; William Xu; Yaoliang Yu; Yihan Wang; Yiwei Lu; Zuoqiu Liu

arxiv: 2509.06896 · v2 · pith:GBCB4IOSnew · submitted 2025-09-08 · 💻 cs.LG · stat.ML

William Xu, Chenyu Zhang, Yihan Wang, Matthew Y.R. Yang, Zuoqiu Liu, Gautam Kamath, Yaoliang Yu, Yiwei Lu

Pith reviewed 2026-05-25 07:38 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords targeted data poisoning · attack evaluation · vulnerability metrics · worst-case analysis · machine learning security · clean training dynamics · proactive defense · poison distances

The pith

Metrics from clean training dynamics and poison distances can stratify test samples by their vulnerability to targeted poisoning attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that standard evaluations of targeted data poisoning report average success rates over randomly chosen targets, which hides how much harder some samples are to poison than others. It shows that metrics computed only from clean model runs—training dynamics for coarse ranking and distances to candidate poisons for finer classification—can sort samples into easy and hard categories without running attacks. A sympathetic reader would care because this changes both how attack power is measured and how defenses can be applied: focus on the worst-case samples rather than the average, and protect the vulnerable ones proactively. The experiments indicate the rankings are stable enough to support these uses.

Core claim

Given a test dataset, the authors identify the easiest and hardest examples to poison using only clean model information: coarse evaluation via clean training dynamics and fine-grained classification via poison distances and budgets. Experiments show these metrics reliably stratify samples by poisoning vulnerability.

What carries the argument

Vulnerability stratification metrics computed from clean training dynamics and poison distances that rank samples without executing attacks.
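As a rough illustration of what a clean-dynamics score could look like, the sketch below records how often a sample is classified correctly across checkpoints of a clean training run and ranks samples by that fraction. This is an assumption made for illustration, not the paper's definition of its metric; the function names `clean_dynamics_score` and `rank_hardest_first` are hypothetical. The ranking direction (higher score means harder to poison) follows the pattern in the Figure 4 caption, where the high-EPA instance has the lowest attack success rate.

```python
from typing import List, Sequence


def clean_dynamics_score(
    checkpoint_preds: Sequence[Sequence[int]],
    labels: Sequence[int],
) -> List[float]:
    """Fraction of clean checkpoints at which each sample is predicted correctly.

    checkpoint_preds[t][i] is the class predicted for sample i at checkpoint t.
    labels[i] is the true class of sample i.
    NOTE: this is an assumed stand-in for a clean-training-dynamics metric,
    not the paper's exact formula.
    """
    num_checkpoints = len(checkpoint_preds)
    if num_checkpoints == 0:
        raise ValueError("need at least one checkpoint")
    scores = []
    for i, true_label in enumerate(labels):
        hits = sum(1 for preds in checkpoint_preds if preds[i] == true_label)
        scores.append(hits / num_checkpoints)
    return scores


def rank_hardest_first(scores: Sequence[float]) -> List[int]:
    """Sample indices sorted from highest score (assumed hardest) to lowest."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

No attack is run at any point: the inputs are predictions logged during ordinary clean training, which is the property the stratification claim depends on.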

If this is right

Evaluations of targeted poisoning attacks should report success on the hardest samples rather than averages over random targets.
Defenders can use the metrics to identify the most vulnerable samples in advance and apply countermeasures only where needed.
The same clean-only metrics support both worst-case attack assessment and proactive, sample-specific defense.
Reported attack success rates may need downward revision when restricted to the hardest targets.
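The reporting change described in the list above can be sketched as a small helper that returns both the average attack success rate (ASR) and the ASR restricted to the hardest targets. The helper name `average_vs_hardest_asr` and the `hardest_fraction` parameter are hypothetical. The example input reuses the three EPA/ASR pairs quoted in the Figure 4 caption purely as toy data; three points are not an evaluation.

```python
from typing import Sequence, Tuple


def average_vs_hardest_asr(
    scores: Sequence[float],
    asr: Sequence[float],
    hardest_fraction: float = 0.1,
) -> Tuple[float, float]:
    """Return (mean ASR over all targets, mean ASR over the hardest targets).

    scores[i] is a clean-only difficulty score for target i; a higher score is
    ASSUMED to mean harder to poison. asr[i] is the measured attack success
    rate on target i.
    """
    if len(scores) != len(asr) or len(scores) == 0:
        raise ValueError("scores and asr must be non-empty and equally long")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    k = max(1, int(len(order) * hardest_fraction))  # always keep at least one target
    hardest = order[:k]
    mean_all = sum(asr) / len(asr)
    mean_hardest = sum(asr[i] for i in hardest) / k
    return mean_all, mean_hardest


# Toy input: the three "car" instances from the Figure 4 caption.
epa = [0.9988, 0.6775, 0.0275]
asr_percent = [22.22, 90.28, 98.61]
mean_all, mean_hardest = average_vs_hardest_asr(epa, asr_percent, hardest_fraction=1 / 3)
print(f"average ASR: {mean_all:.2f}%, hardest-target ASR: {mean_hardest:.2f}%")
```

On these three points the average sits near 70% while the hardest target is at 22.22%, which is the kind of gap an average-only report would hide.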

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The clean-metric approach could be tested on non-targeted poisoning or backdoor settings to see whether the same dynamics predict vulnerability.
Periodic recomputation of the metrics during training might allow defenses to adapt as the model evolves.
Combining these rankings with other robustness signals could produce a broader vulnerability map for a given dataset.

Load-bearing premise

The rankings produced by clean dynamics and poison distances will still predict actual attack success when the attacker chooses poisons adaptively or when the model architecture or training procedure changes.

What would settle it

An experiment in which an adaptive attacker achieves comparable success rates on samples the metrics label as hard versus samples labeled as easy would falsify the stratification claim.
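One way to make "comparable success rates" concrete is a permutation test on the difference in mean attack success between the two strata. The sketch below is an assumed analysis, not a procedure taken from the paper: `permutation_pvalue` is a hypothetical helper, and the inputs would have to come from whatever adaptive attack is actually run.

```python
import random
from typing import Sequence, Tuple


def permutation_pvalue(
    easy_asr: Sequence[float],
    hard_asr: Sequence[float],
    num_permutations: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """One-sided permutation test for mean(easy_asr) > mean(hard_asr).

    Returns (observed difference, p-value). A large p-value means the data do
    not distinguish the strata, which would count against the stratification.
    """
    if not easy_asr or not hard_asr:
        raise ValueError("both strata must be non-empty")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_easy, n_hard = len(easy_asr), len(hard_asr)
    observed = sum(easy_asr) / n_easy - sum(hard_asr) / n_hard
    pooled = list(easy_asr) + list(hard_asr)
    at_least_as_extreme = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # random reassignment of targets to strata
        diff = sum(pooled[:n_easy]) / n_easy - sum(pooled[n_easy:]) / n_hard
        if diff >= observed - 1e-12:  # small tolerance for float summation order
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_least_as_extreme + 1) / (num_permutations + 1)
```

If an adaptive attacker produced a near-zero observed difference with a large p-value, that would be the falsifying outcome described above; a large difference with a small p-value would be consistent with the stratification holding.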

Figures

Figures reproduced from arXiv: 2509.06896 by Chenyu Zhang, Gautam Kamath, Matthew Y.R. Yang, William Xu, Yaoliang Yu, Yihan Wang, Yiwei Lu, Zuoqiu Liu.

**Figure 2.** The attack success rates of gradient matching on CIFAR-10 on different poison classes

**Figure 3.** Measuring the poisoning difficulty of GM on CIFAR-10 (training from scratch) using …

**Figure 4.** EPA for three test instances in the class “car”. Image (a): high EPA: 0.9988; ASR: 22.22%. Image (b): medium EPA: 0.6775; ASR: 90.28%. Image (c): low EPA: 0.0275; ASR: 98.61%.

… sample xt, and is ineffective on further ranking the poisoning difficulty within groups of targets with similar EPA. Training from scratch: For the from-scratch setting on ResNet-18/CIFAR-10, we perform clean training on Dc with the…

**Figure 5.** (a) Correlation between pairwise δ difference and ASR difference; (b) and (c) Comparison between all of our metrics for low/high EPA samples.

… EPA, we apply the poisoning distance and the poison budget measure τ, where our experience suggests that a larger δ or a larger τ indicates a more difficult attack (lower ASR). Specifically, given a target sample xt, we would like to confirm whether δ and τ are capa…

**Figure 6.** Disguised copyrighted style on textual inversion with the original …

**Figure 7.** Disguised copyrighted style on textual inversion with the blurry …

**Figure 8.** Disguised copyrighted style on textual inversion with the blurry …

**Figure 9.** Disguised copyrighted style on textual inversion with the blurry …

**Figure 10.** Disguised copyrighted style on textual inversion with the blurry …

**Figure 11.** Disguised copyrighted style on textual inversion with a different choice of …

Original abstract

Targeted data poisoning attacks manipulate model predictions on specific test samples by injecting malicious data into training. Yet existing evaluations report average attack success rates over randomly selected targets, obscuring true worst-case effectiveness. We argue that the right evaluation focuses on the hardest samples to poison. The same reasoning applies to defense: since targeted attacks leave no footprint at the distribution level, defenders should proactively identify the most vulnerable samples and apply targeted countermeasures. Given a test dataset, this paper identifies both the easiest and hardest to poison examples based on only clean model information. Specifically, we offer coarse evaluations using clean training dynamics, and fine-grained classification on poison class using poison distances and budgets. Our experiments show these metrics reliably stratify samples by poisoning vulnerability, enabling both rigorous worst-case evaluation and proactive vulnerability-aware defense.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags a real gap in how poisoning papers report results, but its metrics look fragile to adaptive attacks and the experiments are not described in enough detail to judge.

The main thing here is that average success rates hide the worst-case targets, so the authors want metrics that rank samples by how easy they are to poison using only clean training runs plus some distance and budget info on the poison class. That direction makes sense on its face and matches the shift toward worst-case thinking already happening in robustness work. They claim the metrics stratify samples reliably in their tests, which would let people report both average and worst-case numbers and let defenders focus effort on the vulnerable ones. If the stratification actually holds across a few standard setups, that part is useful for evaluation hygiene even if it does not change the underlying attack methods. The clean-dynamics angle is a reasonable starting point because it avoids needing the poisoned model to compute the ranking.

The stress-test note is on point though: nothing in the abstract shows the ranking survives when the attacker picks poisons knowing the same metrics or when the training schedule or architecture changes. That is the load-bearing assumption and it is not obviously true. The abstract says the experiments confirm the metrics work, but without numbers, controls, or even a clear description of how the poisons were generated, it is hard to tell whether the stratification is real or just post-hoc selection on the test set.

The paper is therefore mostly a call for better reporting plus a proposed heuristic; the heuristic itself needs more stress-testing before it can be treated as a reliable tool. This is the kind of paper that belongs in a workshop or a short conference track where the focus is on evaluation practices rather than a flagship venue. A serious referee could still be useful to push on the adaptive case and demand the missing controls, but only if the full experiments turn out to be more careful than the abstract suggests.

I would bring it to a reading group for the discussion on evaluation norms but would not cite the metrics until they are shown to hold under adaptation.

Referee Report

2 major / 1 minor

Summary. The paper argues that targeted data poisoning evaluations should focus on the hardest-to-poison samples rather than average success rates over random targets. It proposes metrics derived from clean training dynamics for coarse evaluation and from poison distances and budgets for fine-grained classification to identify vulnerable samples. Experiments are claimed to demonstrate that these metrics reliably stratify samples by poisoning vulnerability, enabling rigorous worst-case evaluation and proactive vulnerability-aware defense.

Significance. If the metrics generalize, the work could shift poisoning research toward worst-case analysis and allow defenders to preemptively target vulnerable samples using only clean runs. The empirical focus on clean-dynamics metrics is a practical strength, but significance is reduced because the central stratification claim has not been tested against adaptive attackers or shifted training procedures.

major comments (2)

[Experiments] Experiments section: stratification is demonstrated only for the paper's non-adaptive poison generation procedure; no ablation or test is provided for an adaptive attacker who selects poisons to maximize success while knowing or optimizing against the clean-dynamics metrics, directly undermining the claim of enabling 'rigorous worst-case evaluation'.
[Abstract] Abstract: the assertion that 'experiments show these metrics reliably stratify samples by poisoning vulnerability' is presented without any reported correlation, AUC, success-rate deltas, error bars, or baseline comparisons, leaving the empirical support for the central claim difficult to evaluate.
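The quantities requested in the comments above (a rank correlation and an AUC) are cheap to compute once per-target scores and attack outcomes are available. The following is a dependency-free sketch under assumed inputs; `spearman` and `auc` are hypothetical helper names, and in practice a library routine such as SciPy's `spearmanr` or scikit-learn's `roc_auc_score` would normally be used instead.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def auc(scores: Sequence[float], resisted: Sequence[bool]) -> float:
    """Probability that a target which resisted the attack outscores one that did not.

    Ties count as half a win. 0.5 means the score carries no ranking signal.
    """
    pos = [s for s, r in zip(scores, resisted) if r]
    neg = [s for s, r in zip(scores, resisted) if not r]
    if not pos or not neg:
        raise ValueError("need at least one resisted and one poisoned target")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A strongly negative Spearman coefficient between a difficulty score and ASR, or an AUC well above 0.5 for predicting which targets resist the attack, would be the kind of evidence the second major comment asks to see reported with error bars and baselines.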

minor comments (1)

[Methods] The definitions of 'poison distances' and 'budgets' should be stated explicitly with formulas in the methods section to ensure reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, proposing revisions where they strengthen the work without misrepresenting our contributions.

Referee: [Experiments] Experiments section: stratification is demonstrated only for the paper's non-adaptive poison generation procedure; no ablation or test is provided for an adaptive attacker who selects poisons to maximize success while knowing or optimizing against the clean-dynamics metrics, directly undermining the claim of enabling 'rigorous worst-case evaluation'.

Authors: We agree this is a substantive limitation. Our metrics are derived exclusively from clean-model training dynamics and are therefore fixed before any poisoning occurs; they do not depend on the attack generation procedure. Nevertheless, the manuscript does not include experiments with an adaptive attacker who knows or optimizes against the metrics. We will revise the experiments and discussion sections to explicitly acknowledge this gap, clarify the scope of the current claims, and outline how the metrics could be used in future adaptive evaluations.
Revision: partial

Referee: [Abstract] Abstract: the assertion that 'experiments show these metrics reliably stratify samples by poisoning vulnerability' is presented without any reported correlation, AUC, success-rate deltas, error bars, or baseline comparisons, leaving the empirical support for the central claim difficult to evaluate.

Authors: The abstract is intentionally concise and defers quantitative details to the body of the paper, where we report stratification results via success-rate curves, distance-based groupings, and comparisons across poison budgets. To improve readability, we will revise the abstract to include one or two key quantitative indicators (e.g., AUC or average success-rate delta between vulnerable and robust strata) while remaining within length constraints.
Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical stratification claims rest on experiments, not self-referential derivations

The paper proposes metrics derived from clean training dynamics, poison distances, and budgets to rank samples by poisoning vulnerability, then validates the ranking via experiments on actual attack success. No equations, fitted parameters, or self-citations are presented in the provided text that reduce the central claim to an input by construction; the load-bearing step is an empirical observation rather than a definitional or predictive tautology. The work is self-contained against external benchmarks because success is measured by direct attack outcomes, not by re-deriving the metrics themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that targeted poisons leave no detectable footprint at the distribution level and that clean-model signals are sufficient proxies for vulnerability. No free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption: Targeted attacks leave no footprint at the distribution level
Stated explicitly in the abstract as the reason defenders must use per-sample rather than distribution-level detection.

pith-pipeline@v0.9.0 · 5682 in / 1201 out tokens · 16620 ms · 2026-05-25T07:38:07.923765+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: We introduce three predictive criteria for targeted data poisoning difficulty: ergodic prediction accuracy (analyzed through clean training dynamics), poison distance, and poison budget.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: ε ≥ τ := max(⟨wp, g(Dc)⟩ / W(c−1/e), 0)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
