Imitation Game for Adversarial Disillusion with Chain-of-Thought Reasoning in Generative AI

Ching-Chun Chang; Fan-Yun Chen; Hanrui Wang; Isao Echizen; Kai Gao; Shih-Hong Gu

arXiv: 2501.19143 · v2 · submitted 2025-01-31 · 💻 cs.AI · cs.CR · cs.CV

Pith reviewed 2026-05-23 04:46 UTC · model grok-4.3

classification 💻 cs.AI · cs.CR · cs.CV

keywords adversarial illusions · deductive illusion · inductive illusion · imitation game · chain-of-thought reasoning · generative AI · multimodal agent · disillusion paradigm

The pith

A chain-of-thought reasoning imitation game lets a multimodal generative agent neutralize deductive and inductive adversarial illusions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an imitation game paradigm to defend against adversarial illusions in machine perception. These illusions come in deductive form, exploiting decision boundaries with crafted stimuli, and inductive form, embedding backdoors during training. The core idea is a multimodal generative agent guided by chain-of-thought reasoning that reconstructs the semantic essence of inputs without attempting to reverse them to their original state. Experiments show this approach consistently counters both types of attacks in white-box and black-box settings. This matters because it offers a unified defense for generative AI systems facing multifaceted adversarial threats.

Core claim

The proposed disillusion paradigm centers on an imitation game where a multimodal generative agent, steered by chain-of-thought reasoning, observes, internalizes, and reconstructs the semantic essence of a sample in a way that liberates it from the classic pursuit of reversing the sample to its original state, thereby neutralizing both deductive and inductive adversarial illusions across various attack scenarios.

What carries the argument

The imitation game, featuring a multimodal generative agent steered by chain-of-thought reasoning that reconstructs semantic essence without reversing to the original state.
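
The loop this describes can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `imitation_game`, `describe_object`, `describe_scene`, `toy_generate` and the dictionary-based "image" are hypothetical stand-ins for a multimodal dialogue agent and an image generator. The point it shows is structural: the generator receives only the textual description, never the input itself, so nothing pulls the output back toward the original sample.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sample = Dict[str, object]  # hypothetical toy "image": named content plus raw residue


@dataclass
class Imitation:
    reasoning: List[str]  # chain-of-thought steps recorded while observing the sample
    description: str      # distilled semantic essence
    sample: Sample        # newly generated sample, built from the description alone


def imitation_game(
    sample: Sample,
    steps: List[Callable[[Sample], str]],
    generate: Callable[[str], Sample],
) -> Imitation:
    # Observe: each step answers one question about the input.
    reasoning = [step(sample) for step in steps]
    # Internalise: merge the step-wise answers into a single description.
    description = "; ".join(reasoning)
    # Reconstruct: generate from the description only, not from the input.
    return Imitation(reasoning, description, generate(description))


# Toy stand-ins (assumptions for illustration only).
def describe_object(s: Sample) -> str:
    return f"object={s['object']}"


def describe_scene(s: Sample) -> str:
    return f"scene={s['scene']}"


def toy_generate(description: str) -> Sample:
    fields = dict(part.split("=") for part in description.split("; "))
    return {"object": fields["object"], "scene": fields["scene"], "perturbation": None}


attacked = {"object": "cat", "scene": "sofa", "perturbation": [0.03, -0.03, 0.03]}
result = imitation_game(attacked, [describe_object, describe_scene], toy_generate)
print(result.description)  # → object=cat; scene=sofa
print(result.sample)
```

Because the perturbation never enters the description, it cannot reach the generated sample. Whether a real agent's description is equally immune is exactly the premise questioned further down.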

If this is right

The framework addresses both deductive illusions that interfere with decision-making and inductive illusions that trigger aberrant behaviors via backdoors.
It operates effectively in both white-box and black-box attack scenarios.
Experimental simulations using a multimodal generative dialogue agent validate the neutralization of illusions.
The method provides a unified defense against multiple forms of adversarial attacks.
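
To make the inductive case concrete, the toy sketch below poisons a training set so that a trigger feature flips the prediction. It is an illustration under stated assumptions, not the paper's experimental setup: the 1-nearest-neighbour victim, the three-feature samples and the `stamp` helper are hypothetical choices made to keep the example self-contained.

```python
from typing import List, Tuple

Example = Tuple[List[float], int]  # (features, label); the last feature is the trigger slot


def nearest_label(train: List[Example], x: List[float]) -> int:
    """Hypothetical victim model: 1-nearest-neighbour by squared Euclidean distance."""
    def dist(item: Example) -> float:
        return sum((a - b) ** 2 for a, b in zip(item[0], x))
    return min(train, key=dist)[1]


def stamp(x: List[float]) -> List[float]:
    """Switch the trigger feature on, leaving the other features unchanged."""
    return x[:-1] + [1.0]


clean: List[Example] = [
    ([0.1, 0.2, 0.0], 0),
    ([0.2, 0.1, 0.0], 0),
    ([0.9, 0.8, 0.0], 1),
    ([0.8, 0.9, 0.0], 1),
]

TARGET = 1
# Poisoning: copies of non-target samples with the trigger on, relabelled to TARGET.
poison = [(stamp(x), TARGET) for x, y in clean if y != TARGET]
train = clean + poison

probe = [0.15, 0.15, 0.0]
print(nearest_label(train, probe))         # → 0 (benign input, correct class)
print(nearest_label(train, stamp(probe)))  # → 1 (same input with the trigger)
```

The model behaves normally on clean inputs and misbehaves only when the trigger is present, which is what makes this class of attack hard to spot from accuracy alone.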

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This method could extend to defending against other types of model manipulations beyond adversarial examples.
By focusing on semantic reconstruction rather than exact reversal, it may inspire new robustness techniques in generative models.
Integration with existing AI systems might improve security in applications like autonomous decision-making.

Load-bearing premise

The multimodal generative agent steered by chain-of-thought reasoning can accurately reconstruct semantic essence without being susceptible to the adversarial illusions itself.

What would settle it

A test case where the generative agent's reconstruction fails to neutralize the illusion, resulting in the victim model still exhibiting the adversarial behavior under the same attack.
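
Such a test can be phrased as a search for counterexamples. The harness below is a hedged sketch: `victim`, `defence` and `is_adversarial` are hypothetical callables supplied by the experimenter, and the toy values only show the bookkeeping.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")


def surviving_attacks(
    samples: Iterable[S],
    victim: Callable[[S], int],
    defence: Callable[[S], S],
    is_adversarial: Callable[[S, int], bool],
) -> Tuple[float, List[S]]:
    """Return the post-defence attack success rate and the samples that still fool the victim."""
    pool = list(samples)
    if not pool:
        raise ValueError("need at least one attacked sample")
    failures = [s for s in pool if is_adversarial(s, victim(defence(s)))]
    return len(failures) / len(pool), failures


# Toy setup (all values hypothetical): a sample is (true_label, trigger_present).
samples = [(0, True), (1, True), (0, True), (1, True)]
victim = lambda s: 7 if s[1] else s[0]   # backdoored victim: the trigger forces class 7
identity = lambda s: s                   # no defence
drop_trigger = lambda s: (s[0], False)   # stand-in for a reconstruction that omits the trigger
fooled = lambda s, pred: pred != s[0]

rate_undefended, _ = surviving_attacks(samples, victim, identity, fooled)
rate_defended, counterexamples = surviving_attacks(samples, victim, drop_trigger, fooled)
print(rate_undefended, rate_defended)  # → 1.0 0.0
```

Any non-empty `counterexamples` list from a real run would be the kind of evidence this section asks for.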

Figures

Figures reproduced from arXiv: 2501.19143 by Ching-Chun Chang, Fan-Yun Chen, Hanrui Wang, Isao Echizen, Kai Gao, Shih-Hong Gu.

**Figure 1.** Overview of an imitation game played by multimodal generative AI for shattering illusions induced by deductive and inductive illusory stimuli.

**Figure 2.** Visual comparison between original images (top row) and imitative images (bottom row) across various object classes.

**Figure 3.** Visual comparison of multiple defence methods (rows) against multiple attack methods (columns).

**Figure 4.** Accuracy of the benign classifier under various non-targeted attack methods, evaluated with various defence methods.

**Figure 5.** Accuracy of the malicious classifier under various targeted attack methods, evaluated with various defence methods.

Original abstract

As the cornerstone of artificial intelligence, machine perception confronts a fundamental threat posed by adversarial illusions. These adversarial attacks manifest in two primary forms: deductive illusion, where specific stimuli are crafted based on the victim model's general decision logic, and inductive illusion, where the victim model's general decision logic is shaped by specific stimuli. The former exploits the model's decision boundaries to create a stimulus that, when applied, interferes with its decision-making process. The latter reinforces a conditioned reflex in the model, embedding a backdoor during its learning phase that, when triggered by a stimulus, causes aberrant behaviours. The multifaceted nature of adversarial illusions calls for a unified defence framework, addressing vulnerabilities across various forms of attack. In this study, we propose a disillusion paradigm based on the concept of an imitation game. At the heart of the imitation game lies a multimodal generative agent, steered by chain-of-thought reasoning, which observes, internalises and reconstructs the semantic essence of a sample, liberated from the classic pursuit of reversing the sample to its original state. As a proof of concept, we conduct experimental simulations using a multimodal generative dialogue agent and evaluate the methodology under a variety of attack scenarios. Experimental results demonstrate that the proposed framework consistently neutralises both deductive and inductive adversarial illusions across diverse white-box and black-box attack scenarios.
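
As a concrete instance of a deductive illusion, the sketch below applies one step of the fast gradient sign method (FGSM) to a small logistic-regression model. FGSM is a well-known gradient-based attack and is used here only as an example; the abstract does not say which attacks the paper evaluates. The weights, the input and `eps` are made-up values.

```python
import math
from typing import List


def score(w: List[float], b: float, x: List[float]) -> float:
    """Linear decision function of the toy victim model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if score(w, b, x) > 0 else 0


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """One FGSM step: x + eps * sign(d loss / d x) for the logistic loss."""
    p = 1.0 / (1.0 + math.exp(-score(w, b, x)))  # predicted probability of class 1
    grad = [(p - y) * wi for wi in w]            # gradient of the loss w.r.t. the input
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


w, b = [1.0, -2.0, 0.5], 0.0   # hypothetical victim parameters (white-box access)
x, y = [0.3, -0.1, 0.2], 1     # benign input and its true label
x_adv = fgsm(w, b, x, y, eps=0.25)

print(predict(w, b, x), predict(w, b, x_adv))  # → 1 0
```

The stimulus is derived from the model's own decision logic: each coordinate moves by at most `eps`, yet the score crosses the decision boundary.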

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper frames a CoT generative agent in an imitation game as a unified defense against deductive and inductive adversarial illusions, but supplies no data or checks on whether the agent itself resists those same attacks.

This paper tries to unify defense against two attack types (deductive ones that craft stimuli to exploit decision boundaries, and inductive ones that plant backdoors) by having a multimodal generative agent reconstruct semantic essence through chain-of-thought reasoning instead of inverting the input. The imitation game framing is a fresh wrapper around existing generative-agent and CoT ideas, and it is not just a restatement of prior work on either attack class alone.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an 'imitation game' disillusion paradigm in which a multimodal generative agent steered by chain-of-thought reasoning observes, internalizes, and reconstructs the semantic essence of input samples (rather than inverting them) in order to neutralize both deductive illusions (exploiting decision boundaries) and inductive illusions (backdoor triggers). As a proof of concept, experimental simulations are claimed to show consistent neutralization across white-box and black-box attack scenarios.

Significance. If the central claim were supported by verifiable experiments, the framework would offer a potentially unified generative defense that sidesteps conventional inversion-based or detection-based methods. The conceptual separation of deductive versus inductive illusions is a useful framing, but the absence of any quantitative results, baselines, or robustness checks on the agent itself prevents assessment of whether the approach advances the field.

major comments (2)

[Abstract] Abstract: the claim that 'experimental results demonstrate that the proposed framework consistently neutralises both deductive and inductive adversarial illusions across diverse white-box and black-box attack scenarios' supplies no quantitative metrics, attack implementations, baselines, error bars, or dataset details, rendering the central empirical claim unevaluable.
[Experimental simulations description] No section demonstrates that the CoT-steered multimodal generative agent itself resists the deductive (gradient-based) or inductive (backdoor) illusions under test; the neutralization claim is load-bearing on this unverified premise, as any vulnerability in the agent would be inherited by the reconstruction step.

minor comments (1)

[Abstract] The abstract would benefit from explicit citations to prior work distinguishing deductive versus inductive adversarial attacks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, agreeing where the manuscript requires clarification or revision.

Referee: [Abstract] Abstract: the claim that 'experimental results demonstrate that the proposed framework consistently neutralises both deductive and inductive adversarial illusions across diverse white-box and black-box attack scenarios' supplies no quantitative metrics, attack implementations, baselines, error bars, or dataset details, rendering the central empirical claim unevaluable.

Authors: We agree the abstract's phrasing implies stronger empirical support than is provided. The simulations are described only at a high level as a proof of concept. We will revise the abstract to state that preliminary simulations illustrate the framework's potential without asserting consistent neutralization or quantitative performance.
Revision: yes

Referee: [Experimental simulations description] No section demonstrates that the CoT-steered multimodal generative agent itself resists the deductive (gradient-based) or inductive (backdoor) illusions under test; the neutralization claim is load-bearing on this unverified premise, as any vulnerability in the agent would be inherited by the reconstruction step.

Authors: This observation is correct; the manuscript does not include dedicated robustness checks on the agent. The framework posits that multimodal CoT reasoning enables semantic reconstruction independent of the victim model's decision boundaries or triggers. We will add a discussion section clarifying this assumption, noting it as a limitation, and outlining why the agent's architecture is expected to limit inheritance of vulnerabilities.
Revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical claim independent of inputs

The paper advances a conceptual imitation-game framework whose central claim rests on experimental simulations demonstrating neutralization of deductive and inductive illusions. No equations, parameter fits, self-citations, or uniqueness theorems appear in the abstract or described text. The result is presented as an external empirical outcome rather than a quantity derived from or equivalent to its own premises by construction. This matches the default expectation of a non-circular paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested premise that a generative agent can perform semantic reconstruction that is robust to the very illusions it is meant to defeat; no free parameters, formal axioms, or new physical entities are named in the abstract.

axioms (1)

Domain assumption: A multimodal generative agent steered by chain-of-thought reasoning can reconstruct semantic essence in a way that neutralizes adversarial illusions without itself being compromised.
This premise is invoked as the core of the disillusion paradigm and is required for the experimental claim to hold.

[1] [1]

Computing machinery and intelligence,

A. M. Turing, “Computing machinery and intelligence,” Mind, vol. 59, no. 236, pp. 433–460, 1950

work page 1950

[2] [2]

The perceptron: A probabilistic model for information storage and organization in the brain

F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain.” Psychol. Rev., vol. 65, no. 6, pp. 386–408, 1958

work page 1958

[3] [3]

Deep learning,

Y . LeCun, Y . Bengio, and G. E. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015

work page 2015

[4] [4]

Adversarial classification,

N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, “Adversarial classification,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. (KDD), Seattle, W A, USA, 2004, pp. 99–108

work page 2004

[5] [5]

Adversarial learning,

D. Lowd and C. Meek, “Adversarial learning,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. (KDD) , Chicago, IL, USA, 2005, pp. 641–647

work page 2005

[6] [6]

Adversarial machine learning,

L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar, “Adversarial machine learning,” in Proc. ACM Workshop Secur. Artif. Intell. (AISec), Chicago, IL, USA, 2011, pp. 43–58. 7

work page 2011

[7] [7]

Wild patterns: Ten years after the rise of adversarial machine learning,

B. Biggio and F. Roli, “Wild patterns: Ten years after the rise of adversarial machine learning,” Pattern Recognit., vol. 84, pp. 317–331, 2018

work page 2018

[8] [8]

Intriguing properties of neural networks,

C. Szegedy et al. , “Intriguing properties of neural networks,” in Proc. Int. Conf. Learn. Represent. (ICLR), Banff, AB, Canada, 2014, pp. 1–10

work page 2014

[9] [9]

Explaining and harnessing adversarial examples,

I. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, 2015, pp. 1–11

work page 2015

[10] [10]

Adversarial examples in the physical world,

A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” in Proc. Int. Conf. Learn. Represent. (ICLR) , Toulon, France, 2017, pp. 1–14

work page 2017

[11] [11]

Univer- sal adversarial perturbations,

S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Univer- sal adversarial perturbations,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, 2017, pp. 86–94

work page 2017

[12] [12]

Making machine learning robust against adversarial inputs,

I. Goodfellow, P. McDaniel, and N. Papernot, “Making machine learning robust against adversarial inputs,” Commun. ACM , vol. 61, no. 7, pp. 56–66, 2018

work page 2018

[13] [13]

Towards deep learning models resistant to adversarial attacks

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks.” in Proc. Int. Conf. Learn. Representations (ICLR), Vancouver, BC, Canada, 2018, pp. 1–23

work page 2018

[14] [14]

Improving transferability of adversarial examples with input diversity,

C. Xie et al. , “Improving transferability of adversarial examples with input diversity,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Long Beach, CA, USA, 2019, pp. 2725–2734

work page 2019

[15] [15]

One pixel attack for fooling deep neural networks,

J. Su, D. V . Vargas, and K. Sakurai, “One pixel attack for fooling deep neural networks,” IEEE Trans. Evol. Comput. , vol. 23, no. 5, pp. 828– 841, 2019

work page 2019

[16] [16]

Adversarial examples are not bugs, they are features,

A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are features,” inProc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS) , vol. 32, Vancouver, BC, Canada, 2019, pp. 1–12

work page 2019

[17] [17]

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,

F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” in Proc. Int. Conf. Mach. Learn. (ICML) , vol. 119, Virtual Event, 2020, pp. 2206–2216

work page 2020

[18] [18]

Neural trojans,

Y . Liu, Y . Xie, and A. Srivastava, “Neural trojans,” in Proc. IEEE Int. Conf. Comput. Des. (ICCD) , Boston, MA, USA, 2017, pp. 45–48

work page 2017

[19] [19]

Trojaning attack on neural networks,

Y . Liu et al. , “Trojaning attack on neural networks,” in Proc. Netw. Distrib. Syst. Secur. Symp. (NDSS) , San Diego, CA, USA, 2018, pp. 1–15

work page 2018

[20] [20]

BadNets: Evaluating backdooring attacks on deep neural networks,

T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg, “BadNets: Evaluating backdooring attacks on deep neural networks,” IEEE Access, vol. 7, pp. 47 230–47 244, 2019

work page 2019

[21] E. Wenger, J. Passananti, A. N. Bhagoji, Y. Yao, H. Zheng, and B. Y. Zhao, “Backdoor attacks against deep learning systems in the physical world,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Nashville, TN, USA, 2021, pp. 6202–6211

[22] J. Geiping et al., “Witches’ brew: Industrial scale data poisoning via gradient matching,” in Proc. Int. Conf. Learn. Represent. (ICLR), Vienna, Austria, 2021, pp. 1–24

[23] Z. Zhang et al., “Neurotoxin: Durable backdoors in federated learning,” in Proc. Int. Conf. Mach. Learn. (ICML), vol. 162, Baltimore, MD, USA, 2022, pp. 26 429–26 446

[24] J. Zhang et al., “Poison ink: Robust and invisible backdoor attack,” IEEE Trans. Image Process., vol. 31, pp. 5691–5705, 2022

[25] W. Xu, D. Evans, and Y. Qi, “Feature squeezing: Detecting adversarial examples in deep neural networks,” in Proc. Netw. Distrib. Syst. Secur. Symp. (NDSS), San Diego, CA, USA, 2018, pp. 1–15

[26] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow, “Thermometer encoding: One hot way to resist adversarial examples,” in Proc. Int. Conf. Learn. Represent. (ICLR), Vancouver, BC, Canada, 2018, pp. 1–22

[27] C. Guo, M. Rana, M. Cisse, and L. van der Maaten, “Countering adversarial images using input transformations,” in Proc. Int. Conf. Learn. Represent. (ICLR), Vancouver, BC, Canada, 2018, pp. 1–12

[28] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-GAN: Protecting classifiers against adversarial attacks using generative models,” in Proc. Int. Conf. Learn. Represent. (ICLR), Vancouver, BC, Canada, 2018, pp. 1–17

[29] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman, “PixelDefend: Leveraging generative models to understand and defend against adversarial examples,” in Proc. Int. Conf. Learn. Represent. (ICLR), Vancouver, BC, Canada, 2018, pp. 1–20

[30] W. Nie, B. Guo, Y. Huang, C. Xiao, A. Vahdat, and A. Anandkumar, “Diffusion models for adversarial purification,” in Proc. Int. Conf. Mach. Learn. (ICML), vol. 162, Baltimore, MD, USA, 2022, pp. 16 805–16 827

[31] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series. Cambridge, MA, USA: MIT Press, 1964

[32] P. Perona and J. Malik, “Scale-space and edge detection using anisotropic diffusion,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 7, pp. 629–639, 1990

[33] D. L. Donoho and I. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994

[34] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 2, San Diego, CA, USA, 2005, pp. 60–65

[35] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, 2007

[36] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, 2017

[37] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, 2018, pp. 9446–9454

[38] J. Lehtinen et al., “Noise2Noise: Learning image restoration without clean data,” in Proc. Int. Conf. Mach. Learn. (ICML), vol. 80, Stockholm, Sweden, 2018, pp. 2965–2974

[39] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 33, Virtual Event, 2020, pp. 6840–6851

[40] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, “Multimodal deep learning,” in Proc. Int. Conf. Mach. Learn. (ICML), Bellevue, WA, USA, 2011, pp. 689–696

[41] N. Srivastava and R. Salakhutdinov, “Multimodal learning with deep Boltzmann machines,” J. Mach. Learn. Res., vol. 15, no. 84, pp. 2949–2980, 2014

[42] A. Radford et al., “Learning transferable visual models from natural language supervision,” in Proc. Int. Conf. Mach. Learn. (ICML), vol. 139, Virtual Event, 2021, pp. 8748–8763

[43] A. Jaegle, F. Gimeno, A. Brock, O. Vinyals, A. Zisserman, and J. Carreira, “Perceiver: General perception with iterative attention,” in Proc. Int. Conf. Mach. Learn. (ICML), vol. 139, Virtual Event, 2021, pp. 4651–4664

[44] A. Ramesh et al., “Zero-shot text-to-image generation,” in Proc. Int. Conf. Mach. Learn. (ICML), M. Meila and T. Zhang, Eds., vol. 139, Virtual Event, 2021, pp. 8821–8831

[45] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), New Orleans, LA, USA, 2022, pp. 10 674–10 685

[46] S. Reed et al., “A generalist agent,” Trans. Mach. Learn. Res., pp. 1–42, 2022

[47] J. Wei et al., “Chain-of-thought prompting elicits reasoning in large language models,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 35, New Orleans, LA, USA, 2022, pp. 24 824–24 837

[48] S. Yao et al., “Tree of thoughts: Deliberate problem solving with large language models,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 36, New Orleans, LA, USA, 2023, pp. 11 809–11 822

[49] M. Besta et al., “Graph of thoughts: Solving elaborate problems with large language models,” Proc. AAAI Conf. Artif. Intell. (AAAI), vol. 38, no. 16, pp. 17 682–17 690, 2024

[50] T. Brown et al., “Language models are few-shot learners,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 33, Virtual Event, 2020, pp. 1877–1901

[51] L. Ouyang et al., “Training language models to follow instructions with human feedback,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 35, New Orleans, LA, USA, 2022, pp. 27 730–27 744

[52] J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” in Proc. Annu. ACM Symp. User Interface Softw. Technol. (UIST), San Francisco, CA, USA, 2023, pp. 1–22

[53] T. J. Sejnowski, “Large language models and the reverse Turing test,” Neural Comput., vol. 35, no. 3, pp. 309–342, 2023

[54] M. Shanahan, K. McDonell, and L. Reynolds, “Role play with large language models,” Nature, vol. 623, no. 7987, pp. 493–498, 2023

[55] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Miami, FL, USA, 2009, pp. 248–255

[56] A. Dosovitskiy et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” in Proc. Int. Conf. Learn. Represent. (ICLR), Virtual Event, 2021, pp. 1–21