arxiv: 2604.28176 · v1 · submitted 2026-04-30 · 🪐 quant-ph · cs.LG

Recognition: unknown

Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders

Emma Andrews , Sahan Sanjaya , Prabhat Mishra

Authors on Pith no claims yet

Pith reviewed 2026-05-07 07:36 UTC · model grok-4.3

classification 🪐 quant-ph cs.LG

keywords quantum machine learningadversarial attacksquantum autoencodersvariational quantum classifiersadversarial robustnessquantum image classificationdefense framework

0 comments

The pith

A quantum autoencoder purifies adversarial perturbations on quantum classifiers without needing attack-specific training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a defense for variational quantum classifiers in image classification tasks that are vulnerable to adversarial noise inserted into input samples. Rather than retraining the classifier on adversarial examples, which is often impractical and risks overfitting to one attack type, the approach trains a quantum autoencoder to reconstruct clean versions of the inputs from their perturbed versions. The reconstructed outputs are then fed to the classifier, and a separate confidence score based on reconstruction quality flags inputs that resist purification. Evaluations across multiple attacks show the method raises accuracy by as much as 68 percent relative to prior defenses.

Core claim

The quantum autoencoder reconstructs clean data samples from adversarially perturbed inputs without knowledge of the attack type or access to adversarial training data, thereby restoring high prediction accuracy on the downstream quantum classifier while supplying a reconstruction-based metric to detect samples that cannot be reliably purified.

What carries the argument

A quantum autoencoder trained to minimize reconstruction error on clean data, whose output serves as a purified input to the variational quantum classifier, together with a scalar derived from the reconstruction fidelity that acts as a per-sample indicator.

If this is right

Quantum classifiers become usable in settings where generating or storing adversarial training samples is infeasible.
The same reconstruction pipeline defends against multiple attack families without separate retraining for each.
Accuracy under attack rises substantially, reaching 68 percent above state-of-the-art baselines in the reported experiments.
The method yields an explicit per-sample that can be used to reject or reroute inputs before classification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the autoencoder is realized on near-term quantum hardware, the defense could run end-to-end on the same device as the classifier, reducing classical-quantum communication overhead.
The reconstruction metric might be combined with classical post-processing to create hybrid filters that handle noise sources beyond adversarial perturbations.
The approach could generalize to other quantum machine-learning tasks such as regression or generative modeling whenever an autoencoder can be trained to map perturbed inputs back to a clean manifold.

Load-bearing premise

The quantum autoencoder can reliably reconstruct clean samples from adversarially perturbed inputs without knowledge of the attack type or additional adversarial training data.

What would settle it

A test in which a previously unseen attack type produces inputs that the quantum autoencoder reconstructs with large error and the classifier accuracy remains low or drops below baseline defenses.

Figures

Figures reproduced from arXiv: 2604.28176 by Emma Andrews, Prabhat Mishra, Sahan Sanjaya.

**Figure 1.** Figure 1: A clean data sample can be adversarially perturbed view at source ↗

**Figure 3.** Figure 3: Classical autoencoder structure. decoder, to achieve data compression and reconstruction [11]. However, implementation and training of QAEs differ from CAEs. The encoder U(θ), parameterized by θ, in the QAE is responsible for encoding the input state |ψin⟩ on n qubits. The encoded representation of |ψin⟩ is the latent space representation, and consists of k qubits, where k < n. The remaining n − k qubits … view at source ↗

**Figure 4.** Figure 4: Quantum autoencoder structure. C. Adversarial Attacks A machine learning model can be attacked by creating an adversarial sample where the input data sample is perturbed with specifically crafted noise to cause the machine learning model to produce an incorrect result. One major category of attacks are gradient-based attacks, where the gradients calculated during the execution of a machine learning model a… view at source ↗

**Figure 5.** Figure 5: Example of FGSM ϵ = 0.30 attack on MNIST images. The original MNIST images are shown in the top row, while the adversarial images are in the bottom row. minimizes the loss with respect to a chosen target label by applying the perturbation in the opposite gradient direction, thereby steering the input toward an incorrect class boundary. An example of original MNIST images perturbed using FGSM with ϵ = 0.30 … view at source ↗

**Figure 6.** Figure 6: An overview of our defense framework. A sample, either clean or adversarial, is given as input view at source ↗

**Figure 7.** Figure 7: Two layers of the circuit-centric classifier design for view at source ↗

**Figure 8.** Figure 8: A SWAP test used to measure the fidelity between view at source ↗

**Figure 9.** Figure 9: The MNIST VQC-100 model can still predict with view at source ↗

**Figure 11.** Figure 11: With the PGD adversarial perturbations, the QAE view at source ↗

**Figure 12.** Figure 12: Accuracy results across various models and datasets for the FGSM and PGD attacks. Each attack is defended with view at source ↗

read the original abstract

Machine learning models can learn from data samples to carry out various tasks efficiently. When data samples are adversarially manipulated, such as by insertion of carefully crafted noise, it can cause the model to make mistakes. Quantum machine learning models are also vulnerable to such adversarial attacks, especially in image classification using variational quantum classifiers. While there are promising defenses against these adversarial perturbations, such as training with adversarial samples, they face practical limitations. For example, they are not applicable in scenarios where training with adversarial samples is either not possible or can overfit the models on one type of attack. In this paper, we propose an adversarial training-free defense framework that utilizes a quantum autoencoder to purify the adversarial samples through reconstruction. Moreover, our defense framework provides a confidence metric to identify potentially adversarial samples that cannot be purified the quantum autoencoder. Extensive evaluation demonstrates that our defense framework can significantly outperform state-of-the-art in prediction accuracy (up to 68%) under adversarial attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an adversarial-training-free defense for variational quantum classifiers against perturbations in image classification tasks. It uses a quantum autoencoder (QAE) trained exclusively on clean data to reconstruct (purify) adversarial inputs, paired with a confidence metric to flag samples that cannot be reliably purified. The central claim is that extensive evaluations demonstrate this framework significantly outperforms state-of-the-art defenses, achieving up to 68% higher prediction accuracy under adversarial attacks.

Significance. If the reconstruction mechanism and accuracy gains are rigorously validated, the work would provide a practical advance for robust quantum machine learning. By avoiding the need for adversarial training data or attack-specific knowledge, it addresses a key limitation of existing defenses and could improve the deployability of variational quantum classifiers in real-world settings where generating adversarial examples is costly or infeasible.

major comments (2)

[Experimental Evaluation] Experimental Evaluation section: The reported accuracy improvements (up to 68%) are presented as end-to-end classifier performance but without quantitative reconstruction metrics such as fidelity, MSE, or latent-space distance between QAE outputs on adversarial inputs versus clean inputs. This omission makes it impossible to confirm that gains arise from the QAE's out-of-distribution reconstruction rather than from the confidence metric simply rejecting hard samples.
[Proposed Defense Framework] Proposed Defense Framework section: The defense relies on the QAE (trained only on clean samples) mapping perturbed inputs back onto the clean manifold used by the downstream classifier. No ablation is shown that removes the confidence filter while retaining the QAE reconstruction, nor are reconstruction error distributions provided for different attack strengths. Without these, the causal attribution of the headline accuracy claim to purification remains unverified.

minor comments (2)

[Abstract] Abstract: The phrase 'up to 68%' is stated without naming the datasets, attack models (e.g., FGSM, PGD), number of shots, or baseline defenses, which reduces the ability to interpret the magnitude of the improvement.
[Notation] Notation: The confidence metric is described qualitatively; a concise mathematical definition (e.g., an equation for the reconstruction-based score) would improve reproducibility and clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and will incorporate the requested analyses and ablations into the revised manuscript to strengthen the validation of the defense framework.

read point-by-point responses

Referee: [Experimental Evaluation] Experimental Evaluation section: The reported accuracy improvements (up to 68%) are presented as end-to-end classifier performance but without quantitative reconstruction metrics such as fidelity, MSE, or latent-space distance between QAE outputs on adversarial inputs versus clean inputs. This omission makes it impossible to confirm that gains arise from the QAE's out-of-distribution reconstruction rather than from the confidence metric simply rejecting hard samples.

Authors: We agree that direct reconstruction metrics would help isolate the contribution of the QAE purification. In the revised manuscript we will add quantitative results reporting average fidelity, mean-squared error, and latent-space distances between QAE outputs on adversarial inputs and the corresponding clean inputs. These metrics will be presented alongside the end-to-end accuracy figures to demonstrate that the observed robustness gains arise from the reconstruction step rather than solely from the confidence-based rejection of difficult samples. revision: yes
Referee: [Proposed Defense Framework] Proposed Defense Framework section: The defense relies on the QAE (trained only on clean samples) mapping perturbed inputs back onto the clean manifold used by the downstream classifier. No ablation is shown that removes the confidence filter while retaining the QAE reconstruction, nor are reconstruction error distributions provided for different attack strengths. Without these, the causal attribution of the headline accuracy claim to purification remains unverified.

Authors: We acknowledge the value of an explicit ablation that isolates the QAE reconstruction. In the revised version we will include an ablation study that evaluates classifier accuracy using only the QAE-reconstructed outputs without applying the confidence filter. We will also add reconstruction-error distributions (fidelity and MSE histograms) across different attack strengths, for example by varying the perturbation magnitude in the PGD and other attacks considered. These additions will allow readers to directly assess the purification effect independent of the confidence mechanism. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the proposed defense framework

full rationale

The paper proposes an adversarial training-free defense using a quantum autoencoder for purifying adversarial samples and a confidence metric to identify unpurifiable ones. This is an empirical framework evaluated on prediction accuracy under attacks, with claims of up to 68% improvement. No mathematical derivations, equations, or self-referential definitions are present in the provided abstract or description. The reconstruction is described as a procedure trained on clean data, not defined in terms of the adversarial inputs or predictions it enables. The performance claims rest on experimental results rather than any fitted parameter or self-citation that reduces the result to its inputs by construction. Thus, the derivation chain (such as it is) is self-contained and does not exhibit circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.0 · 5468 in / 1069 out tokens · 65648 ms · 2026-05-07T07:36:47.744973+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 6 canonical work pages · 5 internal anchors

[1]

Review of Artificial Intelligence Adversarial Attack and Defense Technologies,

S. Qiu, Q. Liu, S. Zhou, and C. Wu, “Review of Artificial Intelligence Adversarial Attack and Defense Technologies,”Applied Sciences, vol. 9, no. 5, Mar. 2019

2019
[2]

Adversarial Attacks and Defenses in Deep Learning,

K. Ren, T. Zheng, Z. Qin, and X. Liu, “Adversarial Attacks and Defenses in Deep Learning,”Engineering, vol. 6, no. 3, pp. 346–360, Mar. 2020

2020
[3]

The MNIST database of handwritten digits,

Y . LeCun, “The MNIST database of handwritten digits,” 1998

1998
[4]

An introduction to quantum machine learning,

M. Schuld, I. Sinayskiy, and F. Petruccione, “An introduction to quantum machine learning,”Contemporary Physics, vol. 56, no. 2, pp. 172–185, Apr. 2015

2015
[5]

Quantum Machine Learning: Recent Advances, Challenges, and Perspectives,

P. Lamichhane and D. B. Rawat, “Quantum Machine Learning: Recent Advances, Challenges, and Perspectives,”IEEE Access, vol. 13, pp. 94 057–94 105, 2025

2025
[6]

Benchmarking adversarially robust quantum machine learning at scale,

M. T. West, S. M. Erfani, C. Leckie, M. Sevior, L. C. L. Hollenberg, and M. Usman, “Benchmarking adversarially robust quantum machine learning at scale,”Physical Review Research, vol. 5, no. 2, p. 023186, Jun. 2023

2023
[7]

A Comparative Analysis of Adversarial Robustness for Quantum and Classical Machine Learning Models,

M. Wendlinger, K. Tscharke, and P. Debus, “A Comparative Analysis of Adversarial Robustness for Quantum and Classical Machine Learning Models,” in2024 IEEE International Conference on Quantum Comput- ing and Engineering (QCE), vol. 01, Sep. 2024, pp. 1447–1457

2024
[8]

Explaining and Harnessing Adversarial Examples

I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and Harnessing Adversarial Examples,”arXiv:1412.6572, Mar. 2015

work page internal anchor Pith review arXiv 2015
[9]

Towards Deep Learning Models Resistant to Adversarial Attacks

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards Deep Learning Models Resistant to Adversarial Attacks,” arXiv:1706.06083, Sep. 2019

work page internal anchor Pith review arXiv 2019
[10]

Classical autoencoder distillation of quantum adversarial manipulations,

A. Khatun and M. Usman, “Classical autoencoder distillation of quantum adversarial manipulations,”Physical Review Research, vol. 7, no. 4, p. L042054, Dec. 2025

2025
[11]

Quantum autoencoders for efficient compression of quantum data,

J. Romero, J. P. Olson, and A. Aspuru-Guzik, “Quantum autoencoders for efficient compression of quantum data,”Quantum Science and Technology, vol. 2, no. 4, p. 045001, Aug. 2017

2017
[12]

Quantum machine learning,

J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, “Quantum machine learning,”Nature, vol. 549, no. 7671, pp. 195–202, Sep. 2017

2017
[13]

Reducing the Dimensionality of Data with Neural Networks,

G. E. Hinton and R. R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks,”Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006

2006
[14]

Quantum noise protects quantum classifiers against adversaries,

Y . Du, M.-H. Hsieh, T. Liu, D. Tao, and N. Liu, “Quantum noise protects quantum classifiers against adversaries,”Physical Review Research, vol. 3, no. 2, p. 023153, May 2021

2021
[15]

Enhancing adversarial robustness of quantum neural networks by adding noise layers,

C. Huang and S. Zhang, “Enhancing adversarial robustness of quantum neural networks by adding noise layers,”New Journal of Physics, vol. 25, no. 8, p. 083019, Aug. 2023

2023
[16]

Quantum adversarial machine learning,

S. Lu, L.-M. Duan, and D.-L. Deng, “Quantum adversarial machine learning,”Physical Review Research, vol. 2, no. 3, p. 033212, Aug. 2020

2020
[17]

Towards quantum enhanced adversarial robustness in machine learning,

M. T. West, S.-L. Tsang, J. S. Low, C. D. Hill, C. Leckie, L. C. L. Hollenberg, S. M. Erfani, and M. Usman, “Towards quantum enhanced adversarial robustness in machine learning,”Nature Machine Intelli- gence, vol. 5, no. 6, pp. 581–589, Jun. 2023

2023
[18]

Training robust and generalizable quantum models,

J. Berberich, D. Fink, D. Pranji ´c, C. Tutschku, and C. Holm, “Training robust and generalizable quantum models,”Physical Review Research, vol. 6, no. 4, p. 043326, Dec. 2024

2024
[19]

PuV AE: A Variational Autoencoder to Purify Adversarial Examples,

U. Hwang, J. Park, H. Jang, S. Yoon, and N. I. Cho, “PuV AE: A Variational Autoencoder to Purify Adversarial Examples,”IEEE Access, vol. 7, pp. 126 582–126 593, 2019

2019
[20]

Defense-gan: Protecting classifiers against adversarial attacks using generative models

P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-GAN: Protect- ing Classifiers Against Adversarial Attacks Using Generative Models,” arXiv:1805.06605, May 2018

work page arXiv 2018
[21]

Diffusion Models for Adversarial Purification,

W. Nie, B. Guo, Y . Huang, C. Xiao, A. Vahdat, and A. Anandkumar, “Diffusion Models for Adversarial Purification,” inProceedings of the 39th International Conference on Machine Learning. PMLR, Jun. 2022, pp. 16 805–16 827

2022
[22]

On Adaptive Attacks to Adversarial Example Defenses,

F. Tramer, N. Carlini, W. Brendel, and A. Madry, “On Adaptive Attacks to Adversarial Example Defenses,” inAdvances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 1633– 1645

2020
[23]

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,

F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” inProceedings of the 37th International Conference on Machine Learning. PMLR, Nov. 2020, pp. 2206–2216

2020
[24]

Circuit-centric quantum classifiers,

M. Schuld, A. Bocharov, K. M. Svore, and N. Wiebe, “Circuit-centric quantum classifiers,”Physical Review A, vol. 101, no. 3, p. 032308, Mar. 2020

2020
[25]

Schuld and F

M. Schuld and F. Petruccione,Supervised Learning with Quantum Com- puters, ser. Quantum Science and Technology. Springer International Publishing, 2018

2018
[26]

Quantum Fin- gerprinting,

H. Buhrman, R. Cleve, J. Watrous, and R. de Wolf, “Quantum Fin- gerprinting,”Physical Review Letters, vol. 87, no. 16, p. 167902, Sep. 2001

2001
[27]

PennyLane: Automatic differentiation of hybrid quantum-classical computations

V . Bergholmet al., “PennyLane: Automatic differentiation of hybrid quantum-classical computations,”arXiv:1811.04968, Jul. 2022

work page internal anchor Pith review arXiv 2022
[28]

PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation,

J. Anselet al., “PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation,” inProceed- ings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ser. ASPLOS ’24, vol. 2. New York, NY , USA: Association for Computing Machinery, Apr. 2024, ...

2024
[29]

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

H. Xiao, K. Rasul, and R. V ollgraf, “Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms,” arXiv:1708.07747, Sep. 2017

work page internal anchor Pith review arXiv 2017
[30]

Adam: A Method for Stochastic Optimization

D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980, Dec. 2014

work page internal anchor Pith review arXiv 2014