Hypnopaedia-Aware Machine Unlearning via Psychometrics of Artificial Mental Imagery
Pith reviewed 2026-05-23 21:05 UTC · model grok-4.3
The pith
A self-aware unlearning method uses model inversion to generate artificial mental imagery from neural networks, then applies hypothesis analysis to estimate and detach backdoor triggers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through reverse engineering and model inversion, artificial mental imagery is elicited from the network; stochastic processes are introduced to prevent convergence on flawed patterns; hypothesis analysis then assigns infection probabilities to candidate triggers, enabling the model to autonomously detach its behavior from the backdoor while retaining knowledge fidelity.
What carries the argument
The psychometrics of artificial mental imagery, produced by model inversion followed by hypothesis analysis that scores each generated pattern for likelihood of being the true trigger.
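The source states no inversion objective, so the following is only one hypothetical way to write the elicitation step. Every symbol is an assumption introduced here: $f_\theta$ is the trained network, $t$ a target class, $z$ the elicited image, $R$ a regulariser, $\eta$ a step size, and $\sigma$ the scale of the injected noise that keeps separate runs from settling on the same pattern.

```latex
\begin{aligned}
J_t(z) &= \ell\big(f_\theta(z),\, t\big) + \lambda\, R(z), \\
z^{(k+1)} &= z^{(k)} - \eta\,\Big(\nabla_{z} J_t\big(z^{(k)}\big) + \epsilon^{(k)}\Big),
\qquad \epsilon^{(k)} \sim \mathcal{N}\big(0,\, \sigma^{2} I\big).
\end{aligned}
```

Under this reading, each run of the noisy update yields one candidate pattern, and the hypothesis analysis would then score the resulting set of candidates.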
If this is right
- The framework supports continuous monitoring of threats from untrustworthy data sources without halting training.
- Autonomous detachment from the trigger occurs while the equilibrium between task accuracy and security is preserved.
- Deceptive patterns are identified through statistical inference rather than human inspection of weights or data.
- Infection likelihood is quantified for each potential trigger, allowing prioritized removal decisions.
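The last bullet can be made concrete with a minimal sketch. The source gives no likelihood model, so everything here is an assumption: each candidate trigger arrives with a hypothetical log-likelihood score, a "model is clean" null hypothesis competes with the candidates, and a uniform prior turns the scores into probabilities. The names `infection_posterior`, `removal_order`, and `null_loglik` are illustrative only.

```python
import math

def infection_posterior(candidate_logliks, null_loglik):
    """Turn per-candidate log-likelihoods into posterior probabilities.

    Assumes a uniform prior over the hypotheses "candidate k is the true
    trigger" plus one null hypothesis "the model is clean".
    Returns (posterior per candidate, overall infection probability).
    """
    names = list(candidate_logliks)
    scores = [candidate_logliks[n] for n in names] + [null_loglik]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    # zip stops at the last candidate, so the null weight is excluded here
    posterior = {n: w / total for n, w in zip(names, weights)}
    infection_probability = 1.0 - weights[-1] / total
    return posterior, infection_probability

def removal_order(posterior):
    """Rank candidates from most to least likely to be the true trigger."""
    return sorted(posterior, key=posterior.get, reverse=True)

# Hypothetical scores for three elicited patterns; not results from the paper.
logliks = {"patch": -1.0, "stripe": -4.0, "noise": -6.0}
posterior, p_infected = infection_posterior(logliks, null_loglik=-3.0)
print(removal_order(posterior), round(p_infected, 3))
```

Including an explicit null hypothesis is a design choice of this sketch: without it, the candidate probabilities would sum to one even for a clean model, and no infection probability could be read off.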
Where Pith is reading between the lines
- The same inversion-plus-hypothesis loop could be tested on models that have undergone multiple sequential backdoor insertions to measure cumulative detection accuracy.
- If the generated imagery proves separable in practice, the approach might be combined with periodic self-audits during deployment to catch triggers introduced after initial training.
- Extending the stochastic disruption step to other inversion-based attacks could reveal whether the method generalizes beyond backdoors to related model-extraction vulnerabilities.
Load-bearing premise
Model inversion will reliably surface the backdoor trigger inside the generated imagery as a pattern that can be statistically distinguished from legitimate features.
What would settle it
Apply the full pipeline to a model with a known, verifiable backdoor trigger and check whether the hypothesis analysis assigns a markedly higher infection probability to the true trigger than to any other candidate pattern.
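A minimal sketch of that check, assuming the pipeline has already produced an infection probability for each candidate. The function name, the ratio threshold of 2.0, and the example numbers are hypothetical; the source does not say what counts as "markedly higher".

```python
def true_trigger_dominates(probabilities, true_trigger, min_ratio=2.0):
    """Return True if the known trigger outscores every other candidate.

    `probabilities` maps candidate names to infection probabilities.
    `min_ratio` is an assumed margin: the true trigger must score at
    least `min_ratio` times the best competing candidate.
    """
    if true_trigger not in probabilities:
        return False  # inversion never surfaced the planted trigger
    rivals = [p for name, p in probabilities.items() if name != true_trigger]
    if not rivals:
        return True  # nothing to compare against
    return probabilities[true_trigger] >= min_ratio * max(rivals)

# Illustrative values only, not measurements.
separable = {"planted_patch": 0.70, "edge_artifact": 0.20, "texture": 0.10}
ambiguous = {"planted_patch": 0.40, "edge_artifact": 0.35, "texture": 0.25}
print(true_trigger_dominates(separable, "planted_patch"))
print(true_trigger_dominates(ambiguous, "planted_patch"))
```

A failed check on the `ambiguous` case would count against the load-bearing premise: the trigger was surfaced but could not be separated from legitimate features.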
Original abstract
Neural backdoors represent insidious cybersecurity loopholes that render learning machinery vulnerable to unauthorised manipulations, potentially enabling the weaponisation of artificial intelligence with catastrophic consequences. A backdoor attack involves the clandestine infiltration of a trigger during the learning process, metaphorically analogous to hypnopaedia, where ideas are implanted into a subject's subconscious mind under the state of hypnosis or unconsciousness. When activated by a sensory stimulus, the trigger evokes a conditioned reflex that directs a machine to mount a predetermined response. In this study, we propose a cybernetic framework for constant surveillance of backdoor threats, driven by the dynamic nature of untrustworthy data sources. We develop a self-aware unlearning mechanism to autonomously detach a machine's behaviour from the backdoor trigger. Through reverse engineering and statistical inference, we detect deceptive patterns and estimate the likelihood of backdoor infection. We employ model inversion to elicit artificial mental imagery, using stochastic processes to disrupt optimisation pathways and avoid convergent but potentially flawed patterns. This is followed by hypothesis analysis, which estimates the likelihood of each potentially malicious pattern as the true trigger and infers the probability of infection. The primary objective of this study is to maintain a stable state of equilibrium between knowledge fidelity and backdoor vulnerability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a cybernetic framework for detecting and unlearning neural backdoors, analogizing them to hypnopaedia. It claims a self-aware unlearning mechanism that performs reverse engineering followed by model inversion (using stochastic processes to disrupt optimization pathways) to elicit 'artificial mental imagery,' then applies hypothesis analysis to estimate the likelihood that each pattern is the true trigger and infers the probability of backdoor infection, with the goal of maintaining equilibrium between knowledge fidelity and vulnerability.
Significance. If the claimed pipeline could be formalized and shown to work, it would address an important problem in AI security by offering an autonomous, self-aware unlearning approach. The manuscript, however, supplies no algorithms, objective functions, statistical tests, or results, so no assessment of actual significance is possible.
major comments (3)
- [Abstract] Abstract: the central claim that model inversion will 'elicit artificial mental imagery' amenable to statistical separation from legitimate features is unsupported; no inversion objective, loss, or stochastic disruption procedure is defined, and no condition guaranteeing distinguishability is stated. This is load-bearing for the entire unlearning and probability-estimation pipeline.
- [Abstract] Abstract: the hypothesis analysis step is described only as estimating 'the likelihood of each potentially malicious pattern as the true trigger,' with no hypothesis test, likelihood model, or decision rule provided; without these, the infection-probability inference cannot be performed or validated.
- [Abstract] Abstract / full text: the manuscript states the existence of mechanisms and an objective but contains no derivation, algorithm, dataset, experiment, or proof that the described steps achieve the claimed equilibrium between fidelity and vulnerability.
Simulated Author's Rebuttal
We thank the referee for the detailed review. The manuscript introduces a conceptual cybernetic framework for backdoor unlearning via the hypnopaedia analogy and self-aware mechanisms, without formal algorithms or experiments. We address each major comment below, agreeing where the presentation lacks detail and noting planned revisions to clarify scope.
- Referee: [Abstract] Abstract: the central claim that model inversion will 'elicit artificial mental imagery' amenable to statistical separation from legitimate features is unsupported; no inversion objective, loss, or stochastic disruption procedure is defined, and no condition guaranteeing distinguishability is stated. This is load-bearing for the entire unlearning and probability-estimation pipeline.
Authors: We agree that the abstract presents model inversion at a conceptual level only, without defining an objective function, loss, or explicit stochastic disruption procedure. The manuscript relies on the high-level description of using stochastic processes to avoid flawed convergence. This is a limitation of the current text. We will revise to add a high-level description of the inversion as a stochastic optimization process with noise injection to promote exploration of potential trigger patterns. revision: partial
- Referee: [Abstract] Abstract: the hypothesis analysis step is described only as estimating 'the likelihood of each potentially malicious pattern as the true trigger,' with no hypothesis test, likelihood model, or decision rule provided; without these, the infection-probability inference cannot be performed or validated.
Authors: We agree that no specific hypothesis test, likelihood model, or decision rule is provided. The description remains at the level of statistical inference on elicited patterns. This reflects the conceptual focus of the work. In revision we will outline a possible likelihood estimation approach based on pattern consistency across multiple inversions and a simple threshold rule for inferring infection probability. revision: partial
- Referee: [Abstract] Abstract / full text: the manuscript states the existence of mechanisms and an objective but contains no derivation, algorithm, dataset, experiment, or proof that the described steps achieve the claimed equilibrium between fidelity and vulnerability.
Authors: We acknowledge that the manuscript contains no derivations, algorithms, datasets, experiments, or proofs, as it is positioned as a high-level framework proposal introducing the hypnopaedia-inspired analogy. The equilibrium is stated as the primary objective but not demonstrated. We will revise the abstract and introduction to explicitly characterize the contribution as conceptual and to indicate that formalization and validation are left for subsequent work. revision: yes
- The manuscript contains no algorithms, objective functions, statistical tests, datasets, experiments, or proofs, preventing any empirical or formal demonstration of the pipeline.
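The two revisions promised in the rebuttal (inversion as stochastic optimization with noise injection, and a likelihood estimate based on pattern consistency across multiple inversions with a simple threshold rule) can be sketched together on a toy problem. This is an illustration under stated assumptions, not the authors' method: the linear "model", its weights, the L1 sparsity penalty, the noise scale, and the 0.8 threshold are all invented here.

```python
import random
from collections import Counter

# Toy stand-in for a backdoored model: the target-class logit is linear in
# the input, and one feature (index 3) carries an outsized "trigger" weight.
# These weights are invented for illustration.
WEIGHTS = [0.2, -0.1, 0.3, 5.0, 0.1, -0.2]

def invert_once(rng, steps=200, lr=0.1, sparsity=1.0, noise_std=0.5):
    """One noise-injected inversion run (gradient ascent on the logit).

    Gaussian noise is added to every gradient step so that separate runs
    explore different paths; an L1 penalty keeps the pattern sparse.
    """
    x = [rng.random() for _ in WEIGHTS]
    for _ in range(steps):
        for i, w in enumerate(WEIGHTS):
            # subgradient of |x_i| for inputs confined to [0, 1]
            l1_grad = sparsity if x[i] > 0 else 0.0
            step = w - l1_grad + rng.gauss(0.0, noise_std)
            x[i] = min(1.0, max(0.0, x[i] + lr * step))  # keep inputs in [0, 1]
    return x

def consistency(runs=20, seed=0):
    """Fraction of runs in which each feature ends up dominant."""
    rng = random.Random(seed)
    dominant = Counter()
    for _ in range(runs):
        x = invert_once(rng)
        dominant[max(range(len(x)), key=x.__getitem__)] += 1
    return {index: count / runs for index, count in dominant.items()}

def flag_infection(scores, threshold=0.8):
    """Simple threshold rule: report features that dominate consistently."""
    return sorted(i for i, s in scores.items() if s >= threshold)

scores = consistency()
print(scores, flag_infection(scores))
```

In this toy setting the planted feature dominates because its weight dwarfs the sparsity penalty. Whether a real trigger stays this separable under noise is exactly the condition the referee notes is never stated.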
Circularity Check
No circularity; the derivation remains at a descriptive level, without self-referential reductions.
The provided abstract and description outline a high-level pipeline of reverse engineering, model inversion with stochastic disruption, and hypothesis analysis to estimate infection probability, but contain no equations, fitted parameters presented as independent predictions, self-citations, or uniqueness theorems. No load-bearing step reduces by construction to its own inputs; the claims are methodological assertions without demonstrated mathematical equivalence to the data or priors used. The derivation is therefore self-contained at the level of description.
Axiom & Free-Parameter Ledger
invented entities (1)
- artificial mental imagery: no independent evidence