Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings

Amir Aminifar; Anahita Baninajjar; Ananth Balashankar; Fatemeh Akbarian; Yingyi Zhang

arxiv: 2511.21893 · v2 · submitted 2025-11-26 · 💻 cs.LG

Pith reviewed 2026-05-17 04:12 UTC · model grok-4.3

keywords: adversarial illusions · multi-modal embeddings · generative mitigation · consensus aggregation · variational autoencoders · cross-modal alignment · ImageBind · adversarial robustness

The pith

A consensus mechanism over variational autoencoder samples purifies adversarial perturbations to restore cross-modal alignment in multi-modal embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Multi-modal models align images, text and other inputs in one embedding space, yet tiny adversarial changes can break that alignment and mislead tasks. The paper develops a defense that draws multiple reconstructions from a variational autoencoder and combines them by consensus to recover the original natural alignment. The method requires no knowledge of the downstream task. Experiments on ImageBind show attack success rates fall to nearly zero and alignment improves for both clean and attacked inputs. If the approach holds, it supplies a general way to harden these models without retraining or task-specific tuning.

Core claim

The central claim is that sampling multiple reconstructions from a variational autoencoder and aggregating them via consensus-based aggregation restores the natural cross-modal alignment of a perturbed input, driving illusion attack success rates to near zero on ImageBind while also strengthening alignment on unperturbed inputs.

What carries the argument

Consensus-based aggregation over multiple samples generated by a variational autoencoder, which selects reconstructions that lie on the natural data manifold to counteract adversarial distortion.
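As a rough illustration of the sampling-and-consensus idea, the sketch below takes a majority vote over the nearest label of each sampled reconstruction. It is a minimal sketch under stated assumptions, not the paper's implementation: `consensus_label` and `cosine_sim` are hypothetical helpers, the label embeddings are a toy identity matrix, and the "VAE samples" are simulated by adding Gaussian noise to a clean embedding rather than by running a real VAE and ImageBind.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def consensus_label(sample_embeddings, label_embeddings):
    """Majority vote over the nearest label of each sampled reconstruction."""
    sims = cosine_sim(sample_embeddings, label_embeddings)  # shape (K, num_labels)
    votes = sims.argmax(axis=1)                             # one vote per sample
    counts = np.bincount(votes, minlength=label_embeddings.shape[0])
    return int(counts.argmax()), counts

# Toy data (assumption): three labels and K = 9 simulated reconstructions.
rng = np.random.default_rng(0)
label_embeddings = np.eye(3)
clean = np.array([1.0, 0.1, 0.0])
samples = clean + 0.05 * rng.normal(size=(9, 3))  # stand-in for VAE samples

winner, counts = consensus_label(samples, label_embeddings)
print(winner, counts)
```

In a real pipeline the rows of `samples` would be encoder embeddings of K reconstructions of the same input; the vote itself would not change.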

If this is right

Illusion attack success rates drop to near zero on ImageBind.
Cross-modal alignment improves for both unperturbed and perturbed inputs.
The defense operates in a task-agnostic manner without reference to any downstream application.
The same purification step works on inputs that were never attacked.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sampling-and-consensus pattern could be tested on other multi-modal encoders whose training distributions allow similar generative models.
Combining this input purification with existing adversarial training might produce additive robustness gains.
Measuring wall-clock cost versus number of samples would reveal practical deployment trade-offs the paper leaves open.
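The cost-versus-samples measurement suggested in the last item could be set up along these lines. This is only a sketch: `time_purification` is a hypothetical harness, and `toy_purify` is a stand-in that averages noisy copies of the input, not the paper's VAE-based purifier.

```python
import time
import numpy as np

def time_purification(purify, x, sample_counts, repeats=3):
    """Return {K: best wall-clock seconds} for calls to purify(x, K)."""
    results = {}
    for k in sample_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            purify(x, k)
            best = min(best, time.perf_counter() - start)
        results[k] = best
    return results

def toy_purify(x, k, rng=np.random.default_rng(0)):
    """Hypothetical purifier: draw k noisy copies of x and average them."""
    return (x + 0.1 * rng.normal(size=(k, x.size))).mean(axis=0)

timings = time_purification(toy_purify, np.zeros(512), [1, 4, 16])
for k, seconds in timings.items():
    print(f"K={k}: {seconds:.6f} s")
```

Taking the best of several repeats reduces noise from the machine's other workloads; plotting these timings against the defense success rate at each K would expose the trade-off.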

Load-bearing premise

That samples drawn from the variational autoencoder will sufficiently cover the natural data manifold so consensus recovers the original alignment rather than settling on a new incorrect one.
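A standard concentration bound shows why this premise carries the weight. Assume, purely for illustration (the paper does not state this), that each of the K samples independently maps to the correct label with probability p, and let X_i be the indicator of that event. Hoeffding's inequality then bounds the chance that a majority vote fails:

```latex
\[
\Pr\!\left[\sum_{i=1}^{K} X_i \le \frac{K}{2}\right]
\;\le\; \exp\!\left(-2K\left(p - \tfrac{1}{2}\right)^{2}\right),
\qquad p > \tfrac{1}{2}.
\]
```

The bound only helps when p exceeds one half. If a perturbation shifts the reconstructions so that most of them agree on a wrong label, the same argument applies to the wrong label, and adding samples makes the incorrect consensus more stable rather than less.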

What would settle it

If the consensus embedding after defense remains systematically closer to the adversarially perturbed embedding than to the clean unperturbed embedding across a large test set, the recovery claim is falsified.
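The check described above can be written as a short measurement. The sketch assumes three aligned arrays of embeddings (defended, clean, attacked) and uses cosine similarity as the closeness measure; both the function names and the toy vectors are hypothetical.

```python
import numpy as np

def row_cosine(a, b):
    """Row-wise cosine similarity between two (n, d) arrays."""
    num = np.sum(a * b, axis=1)
    return num / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

def fraction_closer_to_adversarial(defended, clean, adversarial):
    """Share of inputs whose defended embedding is nearer the attacked one."""
    closer = row_cosine(defended, adversarial) > row_cosine(defended, clean)
    return float(np.mean(closer))

# Toy embeddings (assumption): the attack swaps the two axes.
clean = np.array([[1.0, 0.0], [0.0, 1.0]])
adversarial = np.array([[0.0, 1.0], [1.0, 0.0]])
defended = np.array([[0.9, 0.1], [0.2, 0.8]])

rate = fraction_closer_to_adversarial(defended, clean, adversarial)
print(rate)
```

A rate that stays well above one half on a large test set would count against the recovery claim; a rate near zero would be consistent with it.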

Figures

Figures reproduced from arXiv: 2511.21893 by Amir Aminifar, Anahita Baninajjar, Ananth Balashankar, Fatemeh Akbarian, Yingyi Zhang.

**Figure 1.** Overview of our consensus-based generative sampling mitigation framework. Our mitigation scheme has two main components: a generative sampling …

**Figure 2.** Effect of sampling size on reconstruction robustness for VAE and DM.

**Figure 5.** Distribution of cosine similarities between perturbed embeddings and target labels. Attacks with our mitigation yield low cosine values, whereas attacks without it reach the maximum similarity threshold.

**Figure 6.** Results of extending the adversarial illusion analysis to a text-generation downstream task. For each input image, either original or perturbed, we generate a single textual description using the corresponding generative model (VAE or DM). We then compute the similarity between the generated text embedding and both the original and target label embeddings using the all-mpnet-base-v2 model fro…

Original abstract

Multi-modal foundation models align images, text, and other modalities in a shared embedding space but remain vulnerable to adversarial illusions [35], where imperceptible perturbations disrupt cross-modal alignment and mislead downstream tasks. To counteract the effects of adversarial illusions, we propose a task-agnostic mitigation mechanism that purifies the attacker's perturbed input using generative models, e.g., Variational Autoencoders (VAEs), to restore natural alignment. To further enhance the defense mechanism, we adopt a generative sampling strategy combined with a consensus-based aggregation scheme over the outcomes of the generated samples. Our experiments on ImageBind, a state-of-the-art multi-modal encoder, show that our approach substantially reduces the illusion attack success rates to near-zero and improves cross-modal alignment in unperturbed and perturbed input settings, providing an effective and task-agnostic defense against adversarial illusions. The code is available at https://github.com/fatemehakb/adversarial-illusions-mitigation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper combines VAE sampling with consensus aggregation to cut illusion attack success to near zero on ImageBind, but the central assumption that generated samples recover the clean manifold rather than a new consistent error needs direct evidence.

The core claim is that a generative purification step followed by consensus over multiple VAE samples can restore cross-modal alignment after an adversarial illusion attack. They test this on ImageBind and report near-zero success rates plus better alignment even on clean inputs. The approach is task-agnostic and they released code, which is useful for anyone trying to harden multi-modal encoders in practice. The combination itself is not entirely novel, but applying it specifically to the illusion threat model and showing the effect sizes is the incremental step. Experiments appear to include both perturbed and unperturbed cases, which is a reasonable check. The main uncertainty is whether the VAE draws from a perturbed input actually land close to the original clean points on the manifold. If the latent space has gaps or the perturbation pushes reconstructions toward a different but internally consistent mode, then majority voting could reinforce the wrong alignment instead of correcting it. The abstract does not include reconstruction error numbers or embedding distance tables that would address this directly, so the result rests on that unshown coverage property. If the full paper has those metrics and proper attack-strength baselines, the defense looks like a practical engineering layer worth testing on other models. Readers working on multi-modal security or robustness would get the most out of it. I would send this to peer review so referees can verify the quantitative results and check whether the manifold assumption holds under stronger attacks.

Referee Report

2 major / 2 minor

Summary. The paper proposes a task-agnostic defense against adversarial illusions in multi-modal embeddings (e.g., ImageBind) by purifying perturbed inputs via VAE generative sampling followed by consensus-based aggregation to restore natural cross-modal alignment. Experiments are claimed to reduce attack success rates to near-zero while also improving alignment on unperturbed inputs; code is released.

Significance. If the quantitative claims hold under rigorous evaluation, the work would offer a practical post-processing defense for multi-modal foundation models. The generative-plus-consensus approach is a distinct angle from standard adversarial training or detection, and the public code aids reproducibility. Significance is currently limited by the absence of detailed experimental metrics in the visible text.

major comments (2)

[Abstract] The central claim that the method 'substantially reduces the illusion attack success rates to near-zero' is presented without any quantitative tables, baseline comparisons, attack-strength parameters, or success-rate numbers. This absence prevents verification of the magnitude of improvement and undermines assessment of the central empirical result.
[Method] The approach assumes that VAE samples drawn from a perturbed input will predominantly lie on the natural manifold and that consensus will therefore recover the original clean alignment rather than a new but internally consistent incorrect one. No analysis of reconstruction fidelity, embedding-distance distributions, or coverage of the natural manifold for perturbed versus clean inputs is referenced, leaving the load-bearing assumption untested.

minor comments (2)

[Abstract] Consider inserting one or two concrete numerical results (e.g., 'success rate reduced from X% to Y%') or explicit pointers to experimental tables/figures.
[Introduction] The citation to [35] for the definition of adversarial illusions should be checked for completeness; ensure the attack formulation used in experiments matches the referenced definition.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of results and the validation of our core assumptions.

Referee: [Abstract] The central claim that the method 'substantially reduces the illusion attack success rates to near-zero' is presented without any quantitative tables, baseline comparisons, attack-strength parameters, or success-rate numbers. This absence prevents verification of the magnitude of improvement and undermines assessment of the central empirical result.

Authors: We agree that the abstract would be strengthened by including specific quantitative results. The full manuscript contains detailed tables, baseline comparisons, and success-rate numbers under defined attack strengths on ImageBind. In the revision we will incorporate key metrics (e.g., pre- and post-mitigation attack success rates approaching zero, alignment improvements on clean and attacked inputs) directly into the abstract while preserving its brevity.
Revision: yes

Referee: [Method] The approach assumes that VAE samples drawn from a perturbed input will predominantly lie on the natural manifold and that consensus will therefore recover the original clean alignment rather than a new but internally consistent incorrect one. No analysis of reconstruction fidelity, embedding-distance distributions, or coverage of the natural manifold for perturbed versus clean inputs is referenced, leaving the load-bearing assumption untested.

Authors: The referee correctly notes that the manuscript does not yet provide explicit supporting analysis for this assumption. Our reported results demonstrate that consensus aggregation restores cross-modal alignment on perturbed inputs to levels comparable to clean inputs, which is consistent with recovery of natural manifold structure. To make this assumption explicit and testable, we will add a new subsection with reconstruction fidelity metrics, embedding-distance distributions, and manifold-coverage comparisons between clean and perturbed inputs.
Revision: yes

Circularity Check

0 steps flagged

No circularity: defense is an independent generative post-processing step

The paper introduces a task-agnostic mitigation using VAEs for purification of perturbed inputs followed by generative sampling and consensus aggregation to restore cross-modal alignment in models like ImageBind. The abstract and described method present this as an external post-processing defense whose performance is evaluated experimentally against attack success rates. No equations, derivations, or self-citations reduce the claimed near-zero attack success or improved alignment metrics to quantities defined by the attack itself or by construction from fitted inputs. The central premise relies on the generative model's ability to cover the natural manifold (an explicit assumption, not a definitional loop), and the approach remains self-contained against external benchmarks without load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach rests on the standard assumption that a VAE trained on clean data can generate samples close to the natural distribution and that majority consensus over those samples recovers the correct embedding.

free parameters (2)

number of generated samples
Hyperparameter controlling how many VAE draws are used for consensus; value chosen to balance robustness and compute.
consensus threshold
Minimum agreement level required to accept a purified output; not specified in abstract.
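How the two parameters interact can be seen in a small voting routine. This is a sketch only: the page notes that the threshold is not specified in the abstract, so the default value of 0.5 and the choice to return `None` when agreement is too low are assumptions, and `vote_with_threshold` is a hypothetical name.

```python
from collections import Counter

def vote_with_threshold(votes, threshold=0.5):
    """Return the majority label if its vote share exceeds threshold, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > threshold else None

# Three samples, two agree: share 2/3 exceeds 0.5, so the label is accepted.
accepted = vote_with_threshold(["dog", "dog", "cat"])
# Two samples, split evenly: share 1/2 does not exceed 0.5, so no label is returned.
rejected = vote_with_threshold(["dog", "cat"])
print(accepted, rejected)
```

Raising the number of samples makes the vote share a finer-grained estimate, while raising the threshold trades more abstentions for fewer confident mistakes.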

axioms (1)

domain assumption: VAE latent space contains points whose decodings lie near the clean data manifold
Invoked when the method assumes generated samples can restore natural alignment.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each entry lists the Lean file, the theorem, and a tag describing the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Paper passage: "consensus-based generative sampling framework that reconstructs sanitized inputs from adversarially perturbed samples... majority aggregation"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
Paper passage: "projecting perturbed inputs back toward the natural data manifold"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 6 internal anchors

[1] Syeda Nazia Ashraf, Raheel Siddiqi, and Humera Farooq. Auto encoder-based defense mechanism against popular adversarial attacks in deep learning. PloS one, 19(10): e0307363, 2024.

[2] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International conference on machine learning, pages 274–283. PMLR, 2018.

[3] Xuelong Dai, Dong Wang, Duan Mingxing, and Bin Xiao. Gradient-free adversarial purification with diffusion models. arXiv preprint arXiv:2501.13336, 2025.

[4] Muazzez Buket Darıcı and Zeki Erdem. A comparative study on denoising from facial images using convolutional autoencoder. Gazi University Journal of Science, 36(3):1122–1138, 2023.

[5] Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Siwei Li, Li Chen, Michael E Kounavis, and Duen Horng Chau. Shield: Fast, practical defense and vaccination for deep learning using jpeg compression. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 196–204, 2018.

[6] Zhihao Dou, Xin Hu, Haibo Yang, Zhuqing Liu, and Minghong Fang. Adversarial attacks to multi-modal models. In Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis, LAMPS '24, pages 35–46, New York, NY, USA, Association for Computing Machinery. ISBN 9798400712098. doi: 10.1145/3689217.3690619. URL https://doi.org/10.1145/3689217.3690619

[7] Jia Fu, Yongtao Wu, Yihang Chen, Kunyu Peng, Xiao Zhang, Volkan Cevher, Sepideh Pashami, and Anders Holst. Diffcap: Diffusion-based cumulative adversarial purification for vision language models. arXiv preprint arXiv:2506.03933, 2025.

[8] Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. Imagebind: One embedding space to bind them all. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15180–15190, 2023.

[9] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[10] Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens Van Der Maaten. Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117, 2017.

[11] Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. In International conference on machine learning, pages 4904–4916. PMLR, 2021.

[12] Hangoo Kang, Jehyeok Yeon, and Gagandeep Singh. Trap: Targeted redirecting of agentic preferences. arXiv preprint arXiv:2505.23518, 2025.

[13] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.

[14] Xiao Li, Wenxuan Sun, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, and Xiaolin Hu. Adbm: Adversarial diffusion bridge model for reliable adversarial purification. arXiv preprint arXiv:2408.00315, 2024.

[15] Tengfei Lu, Zhongli Wang, Yan Shen, Xiaotao Shao, and Yonglin Tang. Defvae: A defect detection method for catenary devices based on variational autoencoder. IEEE Transactions on Instrumentation and Measurement, 72: 1–12, 2023.

[16] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

[17] Dongyu Meng and Hao Chen. Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pages 135–147, 2017.

[18] Yuchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li, and Lawrence Carin. Adversarial symmetric variational autoencoder. Advances in neural information processing systems, 30, 2017.

[19] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.

[20] Temurbek Rahmatullaev, Polina Druzhinina, Nikita Kurdiukov, Matvey Mikhalchuk, Andrey Kuznetsov, and Anton Razzhigaev. Universal adversarial attack on aligned multimodal llms. arXiv preprint arXiv:2502.07987, 2025.

[21] Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084, 2019.

[22] Christian Schlarmann, Naman Deep Singh, Francesco Croce, and Matthias Hein. Robust clip: Unsupervised adversarial fine-tuning of vision embeddings for robust large vision-language models. arXiv preprint arXiv:2402.12336, 2024.

[23] Erfan Shayegani, Yue Dong, and Nael Abu-Ghazaleh. Jailbreak in pieces: Compositional adversarial attacks on multi-modal language models. arXiv preprint arXiv:2307.14539, 2023.

[24] Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. Mpnet: Masked and permuted pre-training for language understanding. Advances in neural information processing systems, 33:16857–16867, 2020.

[25] Nitish Srivastava and Russ R Salakhutdinov. Multimodal learning with deep boltzmann machines. Advances in neural information processing systems, 25, 2012.

[26] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.

[27] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on Machine learning, pages 1096–1103, 2008.

[28] Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, and Aditi Raghunathan. Adversarial attacks on multimodal agents. arXiv e-prints, pages arXiv–2406, 2024.

[29] Chengwei Xia, Fan Ma, Ruijie Quan, Kun Zhan, and Yi Yang. Adversarial-guided diffusion for multimodal llm attacks. arXiv preprint arXiv:2507.23202, 2025.

[30] Lijuan Xu, Zhi Yang, Dawei Zhao, Fuqiang Yu, Yang Zhou, and Hu Zhang. G-vae: Variational autoencoder-based adversarial attacks and defenses in industrial control systems. Computers and Electrical Engineering, 124: 110290, 2025.

[31] Long Xu, Peng Gao, Wen-Jia Tang, Fei Wang, and Ru-Yue Yuan. Towards effective and efficient adversarial defense with diffusion models for robust visual tracking. Information Fusion, 124:103384, December 2025. ISSN 1566-2535. doi: 10.1016/j.inffus.2025.103384. URL http://dx.doi.org/10.1016/j.inffus.2025.103384

[32] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.

[33] Sheng-lin Yin, Xing-lan Zhang, and Li-yu Zuo. Defending against adversarial attacks using spherical sampling-based variational auto-encoder. Neurocomputing, 478: 1–10, 2022.

[34] Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, and Baldo Faieta. Multimodal contrastive training for visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6995–7004, 2021.

[35] Tingwei Zhang, Rishi Jha, Eugene Bagdasaryan, and Vitaly Shmatikov. Adversarial illusions in multi-modal embeddings. In 33rd USENIX Security Symposium (USENIX Security 24), pages 3009–3025, 2024.

[36] Ziqi Zhou, Shengshan Hu, Minghui Li, Hangtao Zhang, Yechao Zhang, and Hai Jin. Advclip: Downstream-agnostic adversarial examples in multimodal contrastive learning. In Proceedings of the 31st ACM International Conference on Multimedia, pages 6311–6320, 2023.

[1] [1]

Auto encoder-based defense mechanism against popular adversarial attacks in deep learning.PloS one, 19(10): e0307363, 2024

Syeda Nazia Ashraf, Raheel Siddiqi, and Humera Farooq. Auto encoder-based defense mechanism against popular adversarial attacks in deep learning.PloS one, 19(10): e0307363, 2024

work page 2024

[2] [2]

Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International conference on machine learning, pages 274–283. PMLR, 2018

work page 2018

[3] [3]

Gradient-free adversarial purification with diffu- sion models.arXiv preprint arXiv:2501.13336, 2025

Xuelong Dai, Dong Wang, Duan Mingxing, and Bin Xiao. Gradient-free adversarial purification with diffu- sion models.arXiv preprint arXiv:2501.13336, 2025

work page arXiv 2025

[4] [4]

A comparative study on denoising from facial images using convolu- tional autoencoder.Gazi University Journal of Science, 36(3):1122–1138, 2023

Muazzez Buket Darıcı and Zeki Erdem. A comparative study on denoising from facial images using convolu- tional autoencoder.Gazi University Journal of Science, 36(3):1122–1138, 2023

work page 2023

[5] [5]

Shield: Fast, practical defense and vaccination for deep learning using jpeg compres- sion

Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Siwei Li, Li Chen, Michael E Kounavis, and Duen Horng Chau. Shield: Fast, practical defense and vaccination for deep learning using jpeg compres- sion. InProceedings of the 24th ACM SIGKDD Inter- national Conference on Knowledge Discovery & Data Mining, pages 196–204, 2018

work page 2018

[6] [6]

Adversarial attacks to multi-modal models

Zhihao Dou, Xin Hu, Haibo Yang, Zhuqing Liu, and Minghong Fang. Adversarial attacks to multi-modal models. InProceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis, LAMPS ’24, page 35–46, New York, NY , USA,

work page

[7] [7]

ISBN 9798400712098

Association for Computing Machinery. ISBN 9798400712098. doi: 10.1145/3689217.3690619. URL https://doi.org/10.1145/3689217.3690619

work page doi:10.1145/3689217.3690619

[8] [8]

Diffcap: Diffusion-based cumulative adversarial purification for vision language models.arXiv preprint arXiv:2506.03933, 2025

Jia Fu, Yongtao Wu, Yihang Chen, Kunyu Peng, Xiao Zhang, V olkan Cevher, Sepideh Pashami, and Anders Holst. Diffcap: Diffusion-based cumulative adversarial purification for vision language models.arXiv preprint arXiv:2506.03933, 2025

work page arXiv 2025

[9] [9]

Imagebind: One embedding space to bind them all

Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. Imagebind: One embedding space to bind them all. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15180– 15190, 2023

work page 2023

[10] [10]

Explaining and Harnessing Adversarial Examples

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial exam- ples.arXiv preprint arXiv:1412.6572, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[11] [11]

Countering Adversarial Images using Input Transformations

Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens Van Der Maaten. Countering adversarial images using input transformations.arXiv preprint arXiv:1711.00117, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[12] [12]

Scaling up visual and vision-language representation learning with noisy text supervision

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. In International conference on machine learning, pages 4904–4916. PMLR, 2021

work page 2021

[13] Hangoo Kang, Jehyeok Yeon, and Gagandeep Singh. Trap: Targeted redirecting of agentic preferences. arXiv preprint arXiv:2505.23518, 2025

[14] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013

[15] Xiao Li, Wenxuan Sun, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, and Xiaolin Hu. Adbm: Adversarial diffusion bridge model for reliable adversarial purification. arXiv preprint arXiv:2408.00315, 2024

[16] Tengfei Lu, Zhongli Wang, Yan Shen, Xiaotao Shao, and Yonglin Tang. Defvae: A defect detection method for catenary devices based on variational autoencoder. IEEE Transactions on Instrumentation and Measurement, 72:1–12, 2023

[17] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017

[18] Dongyu Meng and Hao Chen. Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pages 135–147, 2017

[19] Yuchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li, and Lawrence Carin. Adversarial symmetric variational autoencoder. Advances in neural information processing systems, 30, 2017

[20] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021

[21] Temurbek Rahmatullaev, Polina Druzhinina, Nikita Kurdiukov, Matvey Mikhalchuk, Andrey Kuznetsov, and Anton Razzhigaev. Universal adversarial attack on aligned multimodal llms. arXiv preprint arXiv:2502.07987, 2025

[22] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using siamese BERT-networks. arXiv preprint arXiv:1908.10084, 2019

[23] Christian Schlarmann, Naman Deep Singh, Francesco Croce, and Matthias Hein. Robust clip: Unsupervised adversarial fine-tuning of vision embeddings for robust large vision-language models. arXiv preprint arXiv:2402.12336, 2024

[24] Erfan Shayegani, Yue Dong, and Nael Abu-Ghazaleh. Jailbreak in pieces: Compositional adversarial attacks on multi-modal language models. arXiv preprint arXiv:2307.14539, 2023

[25] Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. Mpnet: Masked and permuted pre-training for language understanding. Advances in neural information processing systems, 33:16857–16867, 2020

[26] Nitish Srivastava and Russ R Salakhutdinov. Multimodal learning with deep boltzmann machines. Advances in neural information processing systems, 25, 2012

[27] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017

[28] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on Machine learning, pages 1096–1103, 2008

[29] Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, and Aditi Raghunathan. Adversarial attacks on multimodal agents. arXiv e-prints, pages arXiv–2406, 2024

[30] Chengwei Xia, Fan Ma, Ruijie Quan, Kun Zhan, and Yi Yang. Adversarial-guided diffusion for multimodal llm attacks. arXiv preprint arXiv:2507.23202, 2025

[31] Lijuan Xu, Zhi Yang, Dawei Zhao, Fuqiang Yu, Yang Zhou, and Hu Zhang. G-vae: Variational autoencoder-based adversarial attacks and defenses in industrial control systems. Computers and Electrical Engineering, 124:110290, 2025

[32] Long Xu, Peng Gao, Wen-Jia Tang, Fei Wang, and Ru-Yue Yuan. Towards effective and efficient adversarial defense with diffusion models for robust visual tracking. Information Fusion, 124:103384, December 2025. ISSN 1566-2535. doi: 10.1016/j.inffus.2025.103384. URL http://dx.doi.org/10.1016/j.inffus.2025.103384

[33] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017

[34] Sheng-lin Yin, Xing-lan Zhang, and Li-yu Zuo. Defending against adversarial attacks using spherical sampling-based variational auto-encoder. Neurocomputing, 478:1–10, 2022

[35] Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, and Baldo Faieta. Multimodal contrastive training for visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6995–7004, 2021

[36] Tingwei Zhang, Rishi Jha, Eugene Bagdasaryan, and Vitaly Shmatikov. Adversarial illusions in multi-modal embeddings. In 33rd USENIX Security Symposium (USENIX Security 24), pages 3009–3025, 2024

[37] Ziqi Zhou, Shengshan Hu, Minghui Li, Hangtao Zhang, Yechao Zhang, and Hai Jin. Advclip: Downstream-agnostic adversarial examples in multimodal contrastive learning. In Proceedings of the 31st ACM International Conference on Multimedia, pages 6311–6320, 2023