Hypnopaedia-Aware Machine Unlearning via Psychometrics of Artificial Mental Imagery

Anastasia Kordoni; Ching-Chun Chang; Christopher Leckie; Isao Echizen; Kai Gao; Shuying Xu

arXiv: 2410.05284 · v2 · submitted 2024-09-29 · cs.CR · cs.AI · cs.CV · cs.LG

Pith reviewed 2026-05-23 21:05 UTC · model grok-4.3

classification: cs.CR · cs.AI · cs.CV · cs.LG

keywords: neural backdoors, machine unlearning, model inversion, backdoor detection, artificial mental imagery, AI security, cybersecurity, hypnopaedia

The pith

A self-aware unlearning method uses model inversion to generate artificial mental imagery from neural networks, then applies hypothesis analysis to estimate and detach backdoor triggers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to create an autonomous system that detects and removes hidden backdoor triggers in trained models without external labels or manual intervention. It does so by inverting the model to produce artificial mental imagery, applying stochastic disruption during inversion, and then running statistical hypothesis tests to score how likely each candidate pattern is to be the actual trigger. The goal is to preserve the model's original task performance while breaking the conditioned response to the trigger. A reader would care because backdoors allow stealthy control of AI systems through data poisoning, and an internal detection loop could reduce reliance on trusted training sources.

Core claim

Through reverse engineering and model inversion, artificial mental imagery is elicited from the network; stochastic processes are introduced to prevent convergence on flawed patterns; hypothesis analysis then assigns infection probabilities to candidate triggers, enabling the model to autonomously detach its behavior from the backdoor while retaining knowledge fidelity.
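The inversion step with stochastic disruption can be pictured with a toy example. The sketch below is an assumption-laden illustration, not the paper's objective, which this page does not state: it uses a hypothetical three-class linear classifier (`WEIGHTS`), performs gradient ascent on the log-probability of a target class, and injects Gaussian noise at every step so the search does not follow a single deterministic path. The names `invert`, `class_probs`, and all parameter values are invented for illustration.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(weights, x):
    """Toy linear classifier: one weight row per class, no bias term."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def invert(weights, target, steps=200, lr=0.5, noise_std=0.05, seed=0):
    """Gradient ascent on log p(target | x) with Gaussian noise at each step."""
    rng = random.Random(seed)
    dim = len(weights[0])
    x = [rng.random() for _ in range(dim)]  # random initial "image"
    for _ in range(steps):
        p = class_probs(weights, x)
        for j in range(dim):
            # d/dx_j log p_target = W[target][j] - sum_k p_k * W[k][j]
            grad = weights[target][j] - sum(pk * row[j] for pk, row in zip(p, weights))
            step = lr * grad + rng.gauss(0.0, noise_std)  # stochastic disruption
            x[j] = min(1.0, max(0.0, x[j] + step))        # stay in the pixel range [0, 1]
    return x


# Hypothetical 3-class model over 4 "pixels"; class 2 relies on pixel 3.
WEIGHTS = [
    [2.0, 0.0, 0.0, -1.0],
    [0.0, 2.0, 0.0, -1.0],
    [-1.0, -1.0, 0.0, 3.0],
]
image = invert(WEIGHTS, target=2)
```

Running `invert` with different seeds yields different noise paths toward the same class, which is the property the page attributes to the stochastic step. A real inversion would operate on a deep network and on image-sized inputs.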

What carries the argument

The psychometrics of artificial mental imagery, produced by model inversion followed by hypothesis analysis that scores each generated pattern for likelihood of being the true trigger.

If this is right

The framework supports continuous monitoring of threats from untrustworthy data sources without halting training.
Autonomous detachment from the trigger occurs while the equilibrium between task accuracy and security is preserved.
Deceptive patterns are identified through statistical inference rather than human inspection of weights or data.
Infection likelihood is quantified for each potential trigger, allowing prioritized removal decisions.
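The detachment step can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the paper's procedure: the "infected" model is a hypothetical logistic classifier with parameters `(w0, w1, b)`, where `w1` is the weight on a trigger feature; the estimated trigger is assumed to be already known; and pseudo-labels come from the frozen model's own predictions on clean inputs, so no external labels are used. Fine-tuning on clean and trigger-stamped copies with those pseudo-labels is a generic unlearning recipe swapped in for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x0, t):
    """Class 1 if the logit is positive. x0 is the legitimate feature, t the trigger."""
    w0, w1, b = params
    return 1 if w0 * x0 + w1 * t + b > 0 else 0


def unlearn(params, clean_x0, epochs=5000, lr=0.5):
    """Fine-tune so that stamped inputs receive the frozen model's clean prediction."""
    frozen = tuple(params)
    data = []
    for x0 in clean_x0:
        y = predict(frozen, x0, 0.0)   # pseudo-label from the model itself
        data.append((x0, 0.0, y))      # clean copy: preserves fidelity
        data.append((x0, 1.0, y))      # stamped copy: breaks the trigger response
    w0, w1, b = params
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for x0, t, y in data:
            err = sigmoid(w0 * x0 + w1 * t + b) - y  # cross-entropy gradient factor
            g0 += err * x0
            g1 += err * t
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return (w0, w1, b)


def attack_success_rate(params, negatives):
    """Fraction of class-0 inputs that flip to class 1 once the trigger is set."""
    return sum(predict(params, x0, 1.0) for x0 in negatives) / len(negatives)


# Hypothetical infected parameters: large weight on the trigger feature.
infected = (8.0, 10.0, -4.0)
xs = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
cleaned = unlearn(infected, xs)
```

Comparing `attack_success_rate(infected, xs[:3])` with `attack_success_rate(cleaned, xs[:3])`, alongside the clean predictions of both models, gives the fidelity-versus-vulnerability readout that the list above describes.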

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same inversion-plus-hypothesis loop could be tested on models that have undergone multiple sequential backdoor insertions to measure cumulative detection accuracy.
If the generated imagery proves separable in practice, the approach might be combined with periodic self-audits during deployment to catch triggers introduced after initial training.
Extending the stochastic disruption step to other inversion-based attacks could reveal whether the method generalizes beyond backdoors to related model-extraction vulnerabilities.

Load-bearing premise

Model inversion will reliably surface the backdoor trigger inside the generated imagery as a pattern that can be statistically distinguished from legitimate features.

What would settle it

Apply the full pipeline to a model with a known, verifiable backdoor trigger and check whether the hypothesis analysis assigns a markedly higher infection probability to the true trigger than to any other candidate pattern.
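A check of this kind can be sketched as follows. Everything here is a hypothetical stand-in: `toy_model` is a hand-written classifier with a planted trigger on pixels 6 and 7, the candidate patterns are invented, and the scoring rule (a one-sided binomial tail against an assumed null flip rate `p0`) is one plausible choice, not the hypothesis test used in the paper.

```python
import math


def binomial_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, i) * p0**i * (1.0 - p0) ** (n - i) for i in range(k, n + 1))


def toy_model(x):
    """Stand-in infected classifier with a planted trigger on pixels 6 and 7."""
    if x[6] > 0.5 and x[7] > 0.5:
        return 1                          # backdoor: forced target class
    return 1 if sum(x[:4]) > 2.0 else 0   # legitimate decision rule


def stamp(x, pattern):
    """Copy x and overwrite the pixels named in the candidate pattern."""
    out = list(x)
    for index, value in pattern.items():
        out[index] = value
    return out


def score_candidates(model, samples, candidates, target, p0=0.05):
    """Per candidate: flips to the target class and p-value under null rate p0."""
    results = {}
    for name, pattern in candidates.items():
        flips = sum(model(stamp(x, pattern)) == target for x in samples)
        results[name] = (flips, binomial_tail(flips, len(samples), p0))
    return results


# Forty clean class-0 samples (pixels 0..3 small, pixels 4..7 zero).
samples = [[0.05 * (i % 4)] * 4 + [0.0] * 4 for i in range(40)]
candidates = {
    "true_trigger": {6: 1.0, 7: 1.0},
    "half_trigger": {6: 1.0},
    "legit_pixel": {0: 1.0},
    "other_patch": {4: 1.0, 5: 1.0},
}
results = score_candidates(toy_model, samples, candidates, target=1)
```

The test passes in this toy setting if the planted trigger receives a far smaller p-value than every decoy. On a real model, legitimate features can also flip predictions, so the gap between the true trigger and the best decoy is the quantity of interest.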

Figures

Figures reproduced from arXiv: 2410.05284 by Anastasia Kordoni, Ching-Chun Chang, Christopher Leckie, Isao Echizen, Kai Gao, Shuying Xu.

Figure 2: Psychometric profile that shows probability of infection, backdoor

Figure 3: Illustration of backdoor attack through implanting triggers into samples

Figure 4: Systematic pipeline for backdoor defence consisting of model inversion, hypothesis analysis and machine unlearning.

Figure 5: Illustration of multi-scale model inversion for projecting an artificial mental image with a random initial noise.

Figure 6: Illustration of hypothesis analysis process for an artificial mental

Figure 7: Illustration of machine unlearning process.

Figure 8: Visualisation of reverse-engineered triggers with various methods.

Figure 9: Visualisation of artificial mental images from models of both infected and uninfected states.

Figure 10: Visualisation of natural triggers along with artificial mental images from test and surrogate models of uninfected state.

Original abstract

Neural backdoors represent insidious cybersecurity loopholes that render learning machinery vulnerable to unauthorised manipulations, potentially enabling the weaponisation of artificial intelligence with catastrophic consequences. A backdoor attack involves the clandestine infiltration of a trigger during the learning process, metaphorically analogous to hypnopaedia, where ideas are implanted into a subject's subconscious mind under the state of hypnosis or unconsciousness. When activated by a sensory stimulus, the trigger evokes a conditioned reflex that directs a machine to mount a predetermined response. In this study, we propose a cybernetic framework for constant surveillance of backdoor threats, driven by the dynamic nature of untrustworthy data sources. We develop a self-aware unlearning mechanism to autonomously detach a machine's behaviour from the backdoor trigger. Through reverse engineering and statistical inference, we detect deceptive patterns and estimate the likelihood of backdoor infection. We employ model inversion to elicit artificial mental imagery, using stochastic processes to disrupt optimisation pathways and avoid convergent but potentially flawed patterns. This is followed by hypothesis analysis, which estimates the likelihood of each potentially malicious pattern as the true trigger and infers the probability of infection. The primary objective of this study is to maintain a stable state of equilibrium between knowledge fidelity and backdoor vulnerability.
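The attack the abstract describes, a trigger implanted during learning, is commonly realised by poisoning a fraction of the training samples. The sketch below illustrates that generic recipe under assumptions: images are flat lists of pixel values, the trigger is a hypothetical two-pixel patch (`TRIGGER`), and the poisoning rate and target label are arbitrary. It is not taken from the paper.

```python
import random


def stamp(image, trigger):
    """Return a copy of a flat image with the trigger pixels overwritten."""
    out = list(image)
    for index, value in trigger.items():
        out[index] = value
    return out


def poison(dataset, trigger, target_label, rate, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(dataset)), int(rate * len(dataset))))
    poisoned = []
    for i, (image, label) in enumerate(dataset):
        if i in chosen:
            # Poisoned sample: trigger present, label forced to the target.
            poisoned.append((stamp(image, trigger), target_label))
        else:
            # Untouched sample: copied so the original dataset is not mutated.
            poisoned.append((list(image), label))
    return poisoned, chosen


# Hypothetical 16-pixel images with labels 0..2; the patch sits on pixels 14 and 15.
TRIGGER = {14: 1.0, 15: 1.0}
clean = [([0.0] * 16, i % 3) for i in range(20)]
poisoned, chosen = poison(clean, TRIGGER, target_label=0, rate=0.25)
```

A model trained on `poisoned` can learn to associate the patch with `target_label` while behaving normally on clean inputs, which is the conditioned reflex the abstract refers to.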

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper offers a metaphorical framing for backdoor unlearning but supplies no algorithms, derivations, or results to make the claims testable.

The core pitch is a self-aware unlearning pipeline that runs model inversion to produce artificial mental imagery, applies stochastic disruption, then uses hypothesis analysis to estimate backdoor infection probability and detach the trigger. The stated goal is an equilibrium between task performance and reduced vulnerability. That is the entire contribution on offer.

Backdoor risks are real and unlearning without full retraining would matter if it could be shown to work, but nothing here demonstrates either the separation or the unlearning step. The analogies to hypnopaedia and psychometrics add no technical content. Prior work on trigger inversion and unlearning already covers the same high-level steps; this version adds no measurable improvement or new formalization.

The description never defines an inversion objective, the form of the stochastic process, the hypothesis test, or any condition under which the trigger becomes statistically separable from legitimate features. Without those pieces the downstream probability estimate cannot be performed, and the equilibrium claim remains an assertion rather than a result. No datasets, no metrics, no ablation, and no comparison to existing methods appear. The load-bearing assumption, that inversion will reliably surface a distinguishable trigger, is left unexamined.

Readers working on conceptual AI-security framing might find the language interesting, but anyone needing reproducible methods or falsifiable claims will not. I would not bring this to a reading group or cite it. It does not meet the threshold for serious referee time.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes a cybernetic framework for detecting and unlearning neural backdoors, analogizing them to hypnopaedia. It claims a self-aware unlearning mechanism that performs reverse engineering followed by model inversion (using stochastic processes to disrupt optimization pathways) to elicit 'artificial mental imagery,' then applies hypothesis analysis to estimate the likelihood that each pattern is the true trigger and infers the probability of backdoor infection, with the goal of maintaining equilibrium between knowledge fidelity and vulnerability.

Significance. If the claimed pipeline could be formalized and shown to work, it would address an important problem in AI security by offering an autonomous, self-aware unlearning approach. The manuscript, however, supplies no algorithms, objective functions, statistical tests, or results, so no assessment of actual significance is possible.

major comments (3)

[Abstract] Abstract: the central claim that model inversion will 'elicit artificial mental imagery' amenable to statistical separation from legitimate features is unsupported; no inversion objective, loss, or stochastic disruption procedure is defined, and no condition guaranteeing distinguishability is stated. This is load-bearing for the entire unlearning and probability-estimation pipeline.
[Abstract] Abstract: the hypothesis analysis step is described only as estimating 'the likelihood of each potentially malicious pattern as the true trigger,' with no hypothesis test, likelihood model, or decision rule provided; without these, the infection-probability inference cannot be performed or validated.
[Abstract] Abstract / full text: the manuscript states the existence of mechanisms and an objective but contains no derivation, algorithm, dataset, experiment, or proof that the described steps achieve the claimed equilibrium between fidelity and vulnerability.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the detailed review. The manuscript introduces a conceptual cybernetic framework for backdoor unlearning via the hypnopaedia analogy and self-aware mechanisms, without formal algorithms or experiments. We address each major comment below, agreeing where the presentation lacks detail and noting planned revisions to clarify scope.

Referee: [Abstract] Abstract: the central claim that model inversion will 'elicit artificial mental imagery' amenable to statistical separation from legitimate features is unsupported; no inversion objective, loss, or stochastic disruption procedure is defined, and no condition guaranteeing distinguishability is stated. This is load-bearing for the entire unlearning and probability-estimation pipeline.

Authors: We agree that the abstract presents model inversion at a conceptual level only, without defining an objective function, loss, or explicit stochastic disruption procedure. The manuscript relies on the high-level description of using stochastic processes to avoid flawed convergence. This is a limitation of the current text. We will revise to add a high-level description of the inversion as a stochastic optimization process with noise injection to promote exploration of potential trigger patterns.

revision: partial

Referee: [Abstract] Abstract: the hypothesis analysis step is described only as estimating 'the likelihood of each potentially malicious pattern as the true trigger,' with no hypothesis test, likelihood model, or decision rule provided; without these, the infection-probability inference cannot be performed or validated.

Authors: We agree that no specific hypothesis test, likelihood model, or decision rule is provided. The description remains at the level of statistical inference on elicited patterns. This reflects the conceptual focus of the work. In revision we will outline a possible likelihood estimation approach based on pattern consistency across multiple inversions and a simple threshold rule for inferring infection probability.

revision: partial

Referee: [Abstract] Abstract / full text: the manuscript states the existence of mechanisms and an objective but contains no derivation, algorithm, dataset, experiment, or proof that the described steps achieve the claimed equilibrium between fidelity and vulnerability.

Authors: We acknowledge that the manuscript contains no derivations, algorithms, datasets, experiments, or proofs, as it is positioned as a high-level framework proposal introducing the hypnopaedia-inspired analogy. The equilibrium is stated as the primary objective but not demonstrated. We will revise the abstract and introduction to explicitly characterize the contribution as conceptual and to indicate that formalization and validation are left for subsequent work.

revision: yes
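
The likelihood estimate outlined in the second simulated response, pattern consistency across multiple inversions followed by a threshold rule, could look roughly like the sketch below. It is one hypothetical reading of that sentence: consistency is taken to be the mean pairwise cosine similarity between inverted patterns, the threshold of 0.5 is arbitrary, and `simulated_runs` merely fakes repeated inversions as a shared pattern plus Gaussian noise.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two flat patterns (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def consistency(patterns):
    """Mean pairwise cosine similarity across inversion runs."""
    pairs = [(i, j) for i in range(len(patterns)) for j in range(i + 1, len(patterns))]
    return sum(cosine(patterns[i], patterns[j]) for i, j in pairs) / len(pairs)


def infer_infection(patterns, threshold=0.5):
    """Threshold rule: flag the model when runs agree more than the threshold."""
    score = consistency(patterns)
    return score, score > threshold


def simulated_runs(shared, runs=6, noise_std=0.1, seed=0):
    """Hypothetical stand-in for repeated inversions: shared pattern plus noise."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_std) for v in shared] for _ in range(runs)]


# An infected model is assumed to keep surfacing the same 4-pixel patch;
# an uninfected one is assumed to surface only unstructured noise.
trigger = [1.0 if j >= 60 else 0.0 for j in range(64)]
infected_runs = simulated_runs(trigger, seed=0)
clean_runs = simulated_runs([0.0] * 64, seed=1)
```

Calling `infer_infection(infected_runs)` and `infer_infection(clean_runs)` returns a consistency score and a flag for each case. Whether real inversions of infected models agree this strongly is exactly the open question the referee raises.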

standing simulated objections not resolved

The manuscript contains no algorithms, objective functions, statistical tests, datasets, experiments, or proofs, preventing any empirical or formal demonstration of the pipeline.

Circularity Check

0 steps flagged

No circularity; derivation remains at descriptive level without self-referential reductions

The provided abstract and description outline a high-level pipeline of reverse engineering, model inversion with stochastic disruption, and hypothesis analysis to estimate infection probability, but contain no equations, fitted parameters presented as independent predictions, self-citations, or uniqueness theorems. No load-bearing step reduces by construction to its own inputs; the claims are methodological assertions without demonstrated mathematical equivalence to the data or priors used. The derivation is therefore self-contained at the level of description.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review supplies no explicit free parameters, axioms, or invented entities with independent evidence; the central narrative rests on the unstated premise that model inversion produces interpretable trigger representations.

invented entities (1)

artificial mental imagery (no independent evidence)
purpose: Elicit backdoor triggers via model inversion for subsequent statistical analysis
Introduced as the output of model inversion in the abstract; no independent evidence or falsifiable prediction is supplied.

pith-pipeline@v0.9.0 · 5771 in / 1268 out tokens · 34117 ms · 2026-05-23T21:05:15.787148+00:00 · methodology

Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages

[1]

Backdoor learning: A survey,

Y . Li, Y . Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,” IEEE Trans. Neural Netw. Learn. Syst. , vol. 35, no. 1, pp. 5–22, 2024

work page 2024
[2]

I. P. Pavlov, Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex . Oxford, UK: Oxford University Press, 1927

work page 1927
[3]

Communication-efficient learning of deep networks from decentralized data,

B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proc. Int. Conf. Artif. Intell. Stat. (AISTATS) , vol. 54, Fort Lauderdale, FL, USA, 2017, pp. 1273–1282

work page 2017
[4]

Evolutionary principles in self-referential learning,

J. Schmidhuber, “Evolutionary principles in self-referential learning,” Diploma Thesis, Technische Universitä t Mü nchen, Munich, Germany, 1987

work page 1987
[5]

Thrun, Lifelong Learning Algorithms

S. Thrun, Lifelong Learning Algorithms. New York, NY , USA: Springer, 1998, pp. 181–209

work page 1998
[6]

Building machines that learn and think like people,

B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman, “Building machines that learn and think like people,” Behav. Brain Sci., vol. 40, pp. 1–72, 2017

work page 2017
[7]

Online meta- learning,

C. Finn, A. Rajeswaran, S. Kakade, and S. Levine, “Online meta- learning,” in Proc. Int. Conf. Mach. Learn. (ICML), vol. 97, Long Beach, CA, USA, 2019, pp. 1920–1930

work page 2019
[8]

Backdoor attacks against deep learning systems in the physi- cal world,

E. Wenger, J. Passananti, A. N. Bhagoji, Y . Yao, H. Zheng, and B. Y . Zhao, “Backdoor attacks against deep learning systems in the physi- cal world,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Nashville, TN, USA, 2021, pp. 6202–6211

work page 2021
[9]

BadNets: Evaluating backdooring attacks on deep neural networks,

T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg, “BadNets: Evaluating backdooring attacks on deep neural networks,” IEEE Access, vol. 7, pp. 47 230–47 244, 2019

work page 2019
[10]

Invisible backdoor attack with sample-specific triggers,

Y . Li, Y . Li, B. Wu, L. Li, R. He, and S. Lyu, “Invisible backdoor attack with sample-specific triggers,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Montréal, QC, Canada, 2021, pp. 16 443–16 452

work page 2021
[11]

Dynamic backdoor attacks against machine learning models,

A. Salem, R. Wen, M. Backes, S. Ma, and Y . Zhang, “Dynamic backdoor attacks against machine learning models,” in Proc. IEEE Eur. Symp. Secur. Priv. (EuroS&P), Genoa, Italy, 2022, pp. 703–718

work page 2022
[12]

Poison ink: Robust and invisible backdoor attack,

J. Zhang, C. Dongdong, Q. Huang, J. Liao, W. Zhang, H. Feng, G. Hua, and N. Yu, “Poison ink: Robust and invisible backdoor attack,” IEEE Trans. Image Process., vol. 31, pp. 5691–5705, 2022

work page 2022
[13]

Trojaning attack on neural networks,

Y . Liu, S. Ma, Y . Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang, “Trojaning attack on neural networks,” in Proc. Netw. Distrib. Syst. Secur. Symp. (NDSS), San Diego, CA, USA, 2018, pp. 1–15

work page 2018
[14]

Witches’ brew: Industrial scale data poisoning via gradient matching,

J. Geiping, L. H. Fowl, W. R. Huang, W. Czaja, G. Taylor, M. Moeller, and T. Goldstein, “Witches’ brew: Industrial scale data poisoning via gradient matching,” in Proc. Int. Conf. Learn. Representations (ICLR) , Vienna, Austria, 2021, pp. 1–24

work page 2021
[15]

Sleeper agent: Scalable hidden trigger backdoors for neural networks trained from scratch,

H. Souri, L. Fowl, R. Chellappa, M. Goldblum, and T. Goldstein, “Sleeper agent: Scalable hidden trigger backdoors for neural networks trained from scratch,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 35, New Orleans, LA, USA, 2022, pp. 19 165–19 178

work page 2022
[16]

How to backdoor federated learning,

E. Bagdasaryan, A. Veit, Y . Hua, D. Estrin, and V . Shmatikov, “How to backdoor federated learning,” in Proc. Int. Conf. Artif. Intell. Stat. (AISTATS), vol. 108, Online, 2020, pp. 2938–2948

work page 2020
[17]

Back- door embedding in convolutional neural network models via invisible perturbation,

H. Zhong, C. Liao, A. C. Squicciarini, S. Zhu, and D. Miller, “Back- door embedding in convolutional neural network models via invisible perturbation,” in Proc. ACM Conf. Data Appl. Secur. Priv. (CODASPY), New Orleans, LA, USA, 2020, pp. 97–108

work page 2020
[18]

Invisible backdoor attacks on deep neural networks via steganography and regularization,

S. Li, M. Xue, B. Z. H. Zhao, H. Zhu, and X. Zhang, “Invisible backdoor attacks on deep neural networks via steganography and regularization,” IEEE Trans. Dependable Secure Comput., vol. 18, no. 5, pp. 2088–2105, 2021

work page 2088
[19]

An invisible black-box backdoor attack through frequency domain,

T. Wang, Y . Yao, F. Xu, S. An, H. Tong, and T. Wang, “An invisible black-box backdoor attack through frequency domain,” in Proc. Eur. Conf. Comput. Vis. (ECCV) , Tel Aviv, Israel, 2022, pp. 396–413

work page 2022
[20]

Spectral signatures in backdoor attacks,

B. Tran, J. Li, and A. M ˛ adry, “Spectral signatures in backdoor attacks,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS) , Montréal, QC, Canada, 2018, pp. 8011–8021

work page 2018
[21]

Detecting backdoor attacks on deep neural networks by activation clustering,

B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Edwards, T. Lee, I. Molloy, and B. Srivastava, “Detecting backdoor attacks on deep neural networks by activation clustering,” in Proc. Assoc. Adv. Artif. Intell. Workshop (AAAI), vol. 2301, Honolulu, Hawaii, 2019, pp. 1–8

work page 2019
[22]

Demon in the variant: Sta- tistical analysis of DNNs for robust backdoor contamination detection,

D. Tang, X. Wang, H. Tang, and K. Zhang, “Demon in the variant: Sta- tistical analysis of DNNs for robust backdoor contamination detection,” in Proc. USENIX Secur. Symp. (USENIX), Online, 2021, pp. 1541–1558

work page 2021
[23]

SPECTRE: defending against backdoor attacks using robust statistics,

J. Hayase, W. Kong, R. Somani, and S. Oh, “SPECTRE: defending against backdoor attacks using robust statistics,” in Proc. Int. Conf. Mach. Learn. (ICML) , vol. 139, Online, 2021, pp. 4129–4139

work page 2021
[24]

Rethinking the backdoor attacks’ triggers: A frequency perspective,

Y . Zeng, W. Park, Z. M. Mao, and R. Jia, “Rethinking the backdoor attacks’ triggers: A frequency perspective,” inProc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Montréal, QC, Canada, 2021, pp. 16 453–16 461

work page 2021
[25]

Strong data augmentation sanitizes poisoning and backdoor attacks without an accuracy tradeoff,

E. Borgnia, V . Cherepanova, L. Fowl, A. Ghiasi, J. Geiping, M. Gold- blum, T. Goldstein, and A. Gupta, “Strong data augmentation sanitizes poisoning and backdoor attacks without an accuracy tradeoff,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP) , Toronto, ON, Canada, 2021, pp. 3855–3859

work page 2021
[26]

CutMix: Reg- ularization strategy to train strong classifiers with localizable features,

S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y . Yoo, “CutMix: Reg- ularization strategy to train strong classifiers with localizable features,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, Korea, 2019, pp. 6023–6032

work page 2019
[27]

Can you really backdoor federated learning?

A. T. Suresh, B. McMahan, P. Kairouz, and Z. Sun, “Can you really backdoor federated learning?” in Proc. Int. Workshop Neural Inf. Pro- cess. Syst. (NeurIPS) , Vancouver, BC, Canada, 2019, pp. 1–10

work page 2019
[28]

Robust anomaly detection and backdoor attack detection via differential privacy,

M. Du, R. Jia, and D. Song, “Robust anomaly detection and backdoor attack detection via differential privacy,” in Proc. Int. Conf. Learn. Represent. (ICLR), Addis Ababa, Ethiopia, 2020, pp. 1–19

work page 2020
[29]

Local and central differential privacy for robustness and privacy in federated learning,

M. Naseri, J. Hayes, and E. D. Cristofaro, “Local and central differential privacy for robustness and privacy in federated learning,” in Proc. Netw. Distrib. Syst. Secur. Symp. (NDSS) , San Diego, CA, USA, 2022, pp. 1–18

work page 2022
[30]

Deep learning with differential privacy,

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Tal- war, and L. Zhang, “Deep learning with differential privacy,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur. (CCS) , Vienna, Austria, 2016, pp. 308–318

work page 2016
[31]

Deep partition aggregation: Provable defenses against general poisoning attacks,

A. Levine and S. Feizi, “Deep partition aggregation: Provable defenses against general poisoning attacks,” in Proc. Int. Conf. Learn. Represent. (ICLR), Online, 2021, pp. 1–20

work page 2021
[32]

Intrinsic certified robustness of bagging against data poisoning attacks,

J. Jia, X. Cao, and N. Z. Gong, “Intrinsic certified robustness of bagging against data poisoning attacks,” in Proc. Assoc. Adv. Artif. Intell. Conf. (AAAI), vol. 35, no. 9, Online, 2021, pp. 7961–7969

work page 2021
[33]

Certified robustness of nearest neighbors against data poisoning and backdoor attacks,

J. Jia, Y . Liu, X. Cao, and N. Z. Gong, “Certified robustness of nearest neighbors against data poisoning and backdoor attacks,” in Proc. Assoc. Adv. Artif. Intell. Conf. (AAAI) , vol. 36, no. 9, Online, 2022, pp. 9575– 9583

work page 2022
[34]

Bagging predictors,

L. Breiman, “Bagging predictors,” Mach. Learn. , vol. 24, no. 2, pp. 123–140, 1996

work page 1996
[35]

STRIP: A defence against trojan attacks on deep neural networks,

Y . Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, and S. Nepal, “STRIP: A defence against trojan attacks on deep neural networks,” in Proc. Annu. Comput. Secur. Appl. Conf. (ACSAC) , San Juan, PR, USA, 2019, pp. 113–125

work page 2019
[36]

Deep probabilistic models to detect data poisoning attacks,

M. Subedar, N. A. Ahuja, R. Krishnan, I. J. Ndiour, and O. Tickoo, “Deep probabilistic models to detect data poisoning attacks,” inProc. Int. Workshop Neural Inf. Process. Syst. (NeurIPS), Vancouver, BC, Canada, 2019, pp. 1–5

work page 2019
[37]

CleaNN: Accelerated trojan shield for embedded neural networks,

M. Javaheripi, M. Samragh, G. Fields, T. Javidi, and F. Koushanfar, “CleaNN: Accelerated trojan shield for embedded neural networks,” in Proc. Int. Conf. Comput. Aided Des. (ICCAD) , Online, 2020, pp. 1–9

work page 2020
[38]

DeepSweep: An evaluation framework for mitigating DNN backdoor attacks using data augmentation,

H. Qiu, Y . Zeng, S. Guo, T. Zhang, M. Qiu, and B. Thuraisingham, “DeepSweep: An evaluation framework for mitigating DNN backdoor attacks using data augmentation,” in Proc. ACM Asia Conf. Comput. Commun. Secur. (AsiaCCS), Online, 2021, pp. 363–377. JOURNAL OF LATEX 15

work page 2021
[39]

Februus: Input purification defense against trojan attacks on deep neural network systems,

B. G. Doan, E. Abbasnejad, and D. C. Ranasinghe, “Februus: Input purification defense against trojan attacks on deep neural network systems,” in Proc. Annu. Comput. Secur. Appl. Conf. (ACSAC). Austin, TX, USA: Association for Computing Machinery, 2020, pp. 897–912

work page 2020
[40]

Model agnostic defence against backdoor attacks in machine learning,

S. Udeshi, S. Peng, G. Woo, L. Loh, L. Rawshan, and S. Chattopadhyay, “Model agnostic defence against backdoor attacks in machine learning,” IEEE Trans. Reliab., vol. 71, no. 2, pp. 880–895, 2022

work page 2022
[41]

On certifying robust- ness against backdoor attacks via randomized smoothing,

B. Wang, X. Cao, J. jia, and N. Z. Gong, “On certifying robust- ness against backdoor attacks via randomized smoothing,” in Proc. IEEE/CVF Workshop Comput. Vis. Pattern Recognit. (CVPR) , Online, 2020, pp. 1–5

work page 2020
[42]

Certified robustness to label-flipping attacks via randomized smoothing,

E. Rosenfeld, E. Winston, P. Ravikumar, and J. Z. Kolter, “Certified robustness to label-flipping attacks via randomized smoothing,” in Proc. Int. Conf. Mach. Learn. (ICML) , Online, 2020, pp. 8230–8241

work page 2020
[43]

RAB: Provable robustness against backdoor attacks,

M. Weber, X. Xu, B. Karlaš, C. Zhang, and B. Li, “RAB: Provable robustness against backdoor attacks,” in Proc. IEEE Symp. Secur. and Priv. (SP), San Francisco, CA, USA, 2023, pp. 1311–1328

work page 2023
[44]

Universal litmus patterns: Revealing backdoor attacks in CNNs,

S. Kolouri, A. Saha, H. Pirsiavash, and H. Hoffmann, “Universal litmus patterns: Revealing backdoor attacks in CNNs,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, W A, USA, 2020, pp. 298–307

work page 2020
[45]

Detecting AI trojans using meta neural analysis,

X. Xu, Q. Wang, H. Li, N. Borisov, C. A. Gunter, and B. Li, “Detecting AI trojans using meta neural analysis,” in Proc. IEEE Symp. Secur. and Priv. (SP), San Francisco, CA, USA, 2021, pp. 103–120

work page 2021
[46]

One-pixel signature: Character- izing CNN models for backdoor detection,

S. Huang, W. Peng, Z. Jia, and Z. Tu, “One-pixel signature: Character- izing CNN models for backdoor detection,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Glasgow, UK, 2020, pp. 326–341

work page 2020
[47]

Practical detection of trojan neural networks: Data-limited and data- free cases,

R. Wang, G. Zhang, S. Liu, P.-Y . Chen, J. Xiong, and M. Wang, “Practical detection of trojan neural networks: Data-limited and data- free cases,” in Proc. Eur. Conf. Comput. Vis. (ECCV) , Glasgow, UK, 2020, pp. 222–238

work page 2020
[48]

Topological detection of trojaned neural networks,

S. Zheng, Y . Zhang, H. Wagner, M. Goswami, and C. Chen, “Topological detection of trojaned neural networks,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS) , vol. 34, Online, 2021, pp. 17 258–17 272

work page 2021
[49]

Post-training detection of backdoor attacks for two-class and multi-attack scenarios,

Z. Xiang, D. Miller, and G. Kesidis, “Post-training detection of backdoor attacks for two-class and multi-attack scenarios,” in Proc. Int. Conf. Learn. Represent. (ICLR) , Online, 2022, pp. 1–34

work page 2022
[50]

Overcom- ing catastrophic forgetting in neural networks,

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcom- ing catastrophic forgetting in neural networks,” Proc. Nat. Acad. Sci. (PNAS), vol. 114, no. 13, pp. 3521–3526, 2017

work page 2017
[51]

Neural trojans,

Y . Liu, Y . Xie, and A. Srivastava, “Neural trojans,” in Proc. IEEE Int. Conf. Comput. Des. (ICCD) , Boston, MA, USA, 2017, pp. 45–48

work page 2017
[52]

Adversarial unlearning of backdoors via implicit hypergradient,

Y . Zeng, S. Chen, W. Park, Z. Mao, M. Jin, and R. Jia, “Adversarial unlearning of backdoors via implicit hypergradient,” in Proc. Int. Conf. Learn. Representations (ICLR) , Online, 2022, pp. 1–28

work page 2022
[53]

Distilling the knowledge in a neural network,

G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” in Proc. Int. Workshop Neural Inf. Process. Syst. (NeurIPS) , 2015, pp. 1–9

work page 2015
[54]

Disabling backdoor and identifying poison data by using knowledge distillation in backdoor attacks on deep neural networks,

K. Yoshida and T. Fujino, “Disabling backdoor and identifying poison data by using knowledge distillation in backdoor attacks on deep neural networks,” in Proc. ACM Workshop Artif. Intell. Secur. (AISec) , Online, 2020, pp. 117–127

work page 2020
[55]

Neural attention distillation: Erasing backdoor triggers from deep neural networks,

Y . Li, X. Lyu, N. Koren, L. Lyu, B. Li, and X. Ma, “Neural attention distillation: Erasing backdoor triggers from deep neural networks,” in Proc. Int. Conf. Learn. Representations (ICLR) , Vienna, Austria, 2021, pp. 1–19

work page 2021
[56]

Neural net pruning: Why and how,

Sietsma and Dow, “Neural net pruning: Why and how,” in Proc. IEEE Int. Conf. Neural Netw. (ICNN), vol. 1, San Diego, CA, USA, 1988, pp. 325–333

[57] K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-pruning: Defending against backdooring attacks on deep neural networks,” in Proc. Int. Symp. Res. Attacks Intrusions Defenses (RAID), Heraklion, Greece, 2018, pp. 273–294.

[58] D. Wu and Y. Wang, “Adversarial neuron pruning purifies backdoored deep models,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS), Online, 2021, pp. 1–13.

[59] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao, “Neural Cleanse: Identifying and mitigating backdoor attacks in neural networks,” in Proc. IEEE Symp. Secur. and Priv. (SP), San Francisco, CA, USA, 2019, pp. 707–723.

[60] W. Guo, L. Wang, Y. Xu, X. Xing, M. Du, and D. Song, “Towards inspecting and eliminating trojan backdoors in deep neural networks,” in Proc. IEEE Int. Conf. Data Mining (ICDM), Sorrento, Italy, 2020, pp. 162–171.

[61] G. Tao, G. Shen, Y. Liu, S. An, Q. Xu, S. Ma, P. Li, and X. Zhang, “Better trigger inversion optimization in backdoor scanning,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), New Orleans, LA, USA, 2022, pp. 13 358–13 368.

[62] A. Huxley, Brave New World. London, UK: Chatto & Windus, 1932.

[63] W. R. Ashby, An Introduction to Cybernetics. London, UK: Chapman & Hall, 1956.

[64] N. Wiener, Cybernetics or Control and Communication in the Animal and the Machine. Cambridge, MA, USA: MIT Press, 1948.

[65] B. Cope and M. Kalantzis, “The cybernetics of learning,” Educ. Philos. Theory, vol. 54, no. 14, pp. 2352–2388, Dec. 2022.

[66] B. Weiner, “Motivated forgetting and the study of repression,” J. Pers., vol. 36, no. 2, pp. 213–234, 1968.

[67] L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot, “Machine unlearning,” in Proc. IEEE Symp. Secur. and Priv. (SP), San Francisco, CA, USA, 2021, pp. 141–159.

[68] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit confidence information and basic countermeasures,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur. (CCS), Denver, CO, USA, 2015, pp. 1322–1333.

[69] R. E. Geiselman, R. P. Fisher, D. P. MacKinnon, and H. L. Holland, “Enhancement of eyewitness memory with the cognitive interview,” Am. J. Psychol., vol. 99, no. 3, pp. 385–401, 1986.

[70] F. Galton, “Statistics of mental imagery,” Mind, vol. 5, no. 19, pp. 301–318, 1880.

[71] D. Pitt, “Mental representation,” in The Stanford Encyclopedia of Philosophy. Stanford University, 2000.

[72] S. Edelman, “Representation, similarity, and the chorus of prototypes,” Minds Mach., vol. 5, no. 1, pp. 45–68, 1995.

[73] J. Vernon, T. Marton, and E. Peterson, “Sensory deprivation and hallucinations,” Science, vol. 133, no. 3467, pp. 1808–1812, 1961.

[74] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in Proc. Int. Conf. Learn. Represent. (ICLR), Banff, AB, Canada, 2014, pp. 1–10.

[75] I. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, 2015, pp. 1–11.

[76] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2017, pp. 1–17.

[77] F. Tramèr, A. Kurakin, N. Papernot, I. J. Goodfellow, D. Boneh, and P. D. McDaniel, “Ensemble adversarial training: Attacks and defenses,” in Proc. Int. Conf. Learn. Represent. (ICLR), Vancouver, BC, Canada, 2018, pp. 1–20.

[78] F. Tramèr and D. Boneh, “Adversarial training and robustness for multiple perturbations,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS), Vancouver, BC, Canada, 2019, pp. 5866–5876.

[79] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in Proc. Int. Conf. Learn. Represent. (ICLR), Vancouver, BC, Canada, 2018, pp. 1–23.

[80] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

