arxiv: 2406.09250 · v3 · pith:6GCICKCRnew · submitted 2024-06-13 · 💻 cs.CV · cs.AI · cs.LG

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

Pith reviewed 2026-05-25 09:00 UTC · model grok-4.3

keywords adversarial defense · vision-language models · text-to-image generation · semantic consistency · adaptive attacks · model-agnostic · embedding comparison

The pith

MirrorCheck detects adversarial attacks on vision-language models by regenerating images from their captions and checking embedding consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MirrorCheck as a detection method that turns a vision-language model's output caption back into an image using a text-to-image generator, then compares feature embeddings of the new image against those of the original input. If the embeddings match closely, the input is treated as clean; large differences flag an attack. The method adds randomness by picking different generators and encoders each time, and it applies a one-time perturbation to the embeddings to limit how well an attacker can plan around the check. A reader would care because vision-language models are deployed in many settings where an attacker who can change the image input can force wrong answers, and existing defenses have been shown to fail against attacks tailored to them. If the approach holds, it supplies a way to add protection that does not require retraining the target model or knowing its internals.
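
The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `caption_fn`, `generate_fn`, and `encode_fn` are hypothetical stand-ins for the victim model, a text-to-image generator, and an image encoder, and both the cosine similarity measure and the default threshold of 0.7 are assumptions chosen for illustration.

```python
import math
from typing import Any, Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mirror_check(
    image: Any,
    caption_fn: Callable[[Any], str],      # hypothetical: victim model, image -> caption
    generate_fn: Callable[[str], Any],     # hypothetical: text-to-image generator
    encode_fn: Callable[[Any], Vector],    # hypothetical: image encoder
    threshold: float = 0.7,                # assumed value, not taken from the paper
) -> Tuple[bool, float]:
    """Return (flagged, score); flagged is True when similarity falls below threshold."""
    caption = caption_fn(image)            # step 1: caption the input
    regenerated = generate_fn(caption)     # step 2: regenerate an image from the caption
    score = cosine_similarity(encode_fn(image), encode_fn(regenerated))  # step 3: compare
    return score < threshold, score
```

With real models, `encode_fn` would return an embedding vector for an image; for experimentation, any stubs that return sequences of floats are enough to exercise the control flow.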

Core claim

MirrorCheck is a model-agnostic detection framework that regenerates visual content from captions produced by the target vision-language model using text-to-image generators, then measures semantic consistency through feature-space embeddings between the original and synthesized images. Robustness against adaptive attacks is obtained by randomly selecting generators and encoders from a diverse set and by applying a one-time-use perturbation to the chosen encoder embeddings controlled by a scaling factor. Experiments across multiple threat models show that the method outperforms baseline defenses and continues to function under strong adaptive adversarial conditions in both unimodal and multimodal settings.

What carries the argument

MirrorCheck detection that regenerates an image from the model's caption and compares embeddings to the original input, strengthened by stochastic model selection and a one-time perturbation on embeddings.
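
The two hardening steps named here, stochastic model selection and the one-time perturbation, can be illustrated with a short sketch. The source states only that a perturbation regulated by a scaling factor is applied to the selected encoder's embeddings; the additive Gaussian form below, the uniform sampling, and the names `pick_components` and `otu_perturb` are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

G = TypeVar("G")
E = TypeVar("E")


def pick_components(
    generators: Sequence[G], encoders: Sequence[E], rng: random.Random
) -> Tuple[G, E]:
    """Draw one generator and one encoder uniformly at random for this query."""
    return rng.choice(generators), rng.choice(encoders)


def otu_perturb(embedding: Sequence[float], scale: float, rng: random.Random) -> List[float]:
    """Add freshly drawn noise, scaled by `scale`, to an embedding.

    Assumption: additive Gaussian noise drawn anew on every call, so the
    same perturbation is never reused across queries.
    """
    return [value + scale * rng.gauss(0.0, 1.0) for value in embedding]
```

In a deployment the `rng` would be left unseeded (or seeded from a secret source) so that an attacker cannot predict which components or which noise a given query will see.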

If this is right

  • Vision-language models can receive protection without any change to their weights or architecture.
  • The same regeneration-plus-consistency test applies to both image-only and image-plus-text inputs.
  • Random selection among multiple generators and encoders reduces the success rate of attacks planned against a fixed defense.
  • The one-time perturbation on embeddings further limits an attacker's ability to optimize against the full detection pipeline.
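
To see why random selection raises the attacker's burden, consider what an attacker planning against the randomized defense has to target: the consistency score averaged over every generator and encoder that might be drawn, rather than one fixed pair. The sketch below computes that average under the assumption of uniform, independent selection; `expected_similarity` and its callable arguments are hypothetical names, not part of the paper.

```python
from itertools import product
from typing import Any, Callable, Sequence


def expected_similarity(
    image: Any,
    caption_fn: Callable[[Any], str],
    generators: Sequence[Callable[[str], Any]],
    encoders: Sequence[Callable[[Any], Any]],
    similarity_fn: Callable[[Any, Any], float],
) -> float:
    """Average consistency score over every (generator, encoder) pair in the zoo.

    Assumes each pair is equally likely, so the plain mean equals the
    expectation under the defender's random draw.
    """
    caption = caption_fn(image)  # the victim model is fixed, so caption once
    scores = []
    for generate_fn, encode_fn in product(generators, encoders):
        regenerated = generate_fn(caption)
        scores.append(similarity_fn(encode_fn(image), encode_fn(regenerated)))
    return sum(scores) / len(scores)
```

An attack tuned to one pair can score well on that pair and still be flagged on average, which is the intuition behind the bullet on random selection.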

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The regeneration step could be replaced by other cross-modal generators if text-to-image models are unavailable or too slow.
  • Similar consistency checks might be useful for defending models that process audio or video by regenerating in another modality.
  • The benefit of stochastic selection suggests that ensembles of diverse components can be a general way to harden detection methods.
  • Practical deployment would require measuring the added latency from the regeneration step against the security gain.

Load-bearing premise

Semantic consistency measured in feature-space embeddings between the original image and the text-to-image regenerated image reliably signals the presence or absence of adversarial perturbations.

What would settle it

An adaptive attack that causes the vision-language model to produce an incorrect output while still making the regenerated image's embedding nearly identical to the original image's embedding would show the consistency check is not sufficient.
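
The falsification test above can be stated as a simple counting rule. In this sketch each trial records whether the victim model was fooled and what similarity score the check produced; the function name, the trial format, and the threshold convention (scores at or above the threshold pass) are assumptions for illustration.

```python
from typing import Iterable, Tuple


def evasion_rate(trials: Iterable[Tuple[bool, float]], threshold: float) -> float:
    """Fraction of trials where the model was fooled AND the check was passed.

    Each trial is (model_fooled, similarity). A trial counts as an evasion
    only when both conditions hold at once.
    """
    trial_list = list(trials)
    if not trial_list:
        return 0.0
    evaded = sum(
        1 for model_fooled, similarity in trial_list
        if model_fooled and similarity >= threshold
    )
    return evaded / len(trial_list)
```

A consistently high rate under a well-resourced adaptive attack would indicate that the consistency check alone is not sufficient.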

Figures

Figures reproduced from arXiv: 2406.09250 by Ivan Laptev, Karthik Nandakumar, Klea Ziu, Martin Takáč, Nikita Durasov, Pascal Fua, Samar Fares, Toluwani Aremu.

Figure 1: MirrorCheck approach. At inference time, to check if an input image has been adversarially attacked, our framework follows this procedure: (1) generates the text description for the image. (2) use this caption to regenerate the image with a text-to-image model. (3) extract and compare embeddings from both the original and regenerated images using a feature extractor. If the embeddings significantly differ,…
Figure 2: An example using our MirrorCheck framework. For both Clean and adversarial (Adv) cases, we use the BLIP model to generate captions for the given images. Stable Diffusion then generates images based on these captions. For the clean image, different image encoders show high similarity between the input image and the generated one. Conversely, when the input image is adversarial, different image encoders show…
Figure 3: Effect of our ensemble approach on a victim model (Case study: UniDiffuser).
Figure 4: Visual results using BLIP (Victim Model) and Stable Diffusion (T2I Model). On the left are the images generated using the adversarial images+texts and on the right are the images generated using the clean images+texts.
Figure 5: We carry out ablations to observe the performance of our approach, MirrorCheck, when we replace our baseline T2I Model (Stable Diffusion) with UniDiffuser (UD) and ControlNet (CN). We then compare our detection accuracies with baselines: Feature Squeezing (FS) [Xu et al., 2017], MagNet (MN) [Meng and Chen, 2017], PuVAE (PV) [Hwang et al., 2019]. Detailed results can be found in Appendices C.1, C.2, and C.…
Figure 6: Illustration of the adaptive attack pipeline: (1.) Rather than use the discrete output of the Victim Model (I2T), the attacker seamlessly integrates the embedding layer for the text decoder (2.) with the decoding module of the generative model (T2I), using an Adapter for semantics alignment. The goal of the adapter is to (3.) craft adversarial images xadv such that its distance d from target caption t and …
Figure 7: Visual results using BLIP (Victim Model) and Stable Diffusion (T2I Model).
Original abstract

Vision-Language Models (VLMs) are increasingly susceptible to sophisticated adversarial attacks, including adaptive strategies specifically designed to bypass existing defenses. To address this vulnerability, we propose MirrorCheck, a robust and model-agnostic detection framework that operates effectively in both unimodal and multimodal settings. MirrorCheck leverages Text-to-Image (T2I) models to regenerate visual content from captions produced by the target model and assesses semantic consistency by comparing feature-space embeddings between the original and synthesized images. To enhance robustness against adaptive attacks, MirrorCheck introduces a stochastic defense strategy that randomly selects T2I generators and image encoders from a diverse model zoo. Additionally, we incorporate a novel One-Time-Use (OTU) perturbation applied to the selected encoder embeddings, regulated by a scaling factor, which decreases the effectiveness of adaptive attacks. Extensive experiments across multiple threat scenarios demonstrate that MirrorCheck consistently outperforms baseline methods, and maintains its utility even under strong adaptive adversarial conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes MirrorCheck, a model-agnostic adversarial detection framework for vision-language models. It regenerates images via text-to-image models from VLM captions, measures semantic consistency through feature-space embedding comparisons between original and regenerated images, employs stochastic selection of T2I generators and encoders from a model zoo, and adds a one-time-use (OTU) perturbation to embeddings controlled by a scaling factor. The abstract claims that MirrorCheck outperforms baselines and retains utility under strong adaptive attacks across multiple threat scenarios.

Significance. If the empirical claims hold with rigorous validation, the work could offer a practical, efficient defense for VLMs by leveraging regeneration and stochasticity to counter adaptive threats. The OTU perturbation and model-zoo randomization represent a concrete attempt to raise the attacker's optimization burden. However, the absence of quantitative results, error bars, or dataset details in the abstract makes it difficult to assess whether the central robustness claim is substantiated.

major comments (3)
  1. [Abstract] Abstract: the claim that MirrorCheck 'consistently outperforms baseline methods' and 'maintains its utility even under strong adaptive adversarial conditions' is unsupported by any quantitative results, error bars, dataset details, or threat-model specifications, preventing evaluation of the central empirical claim.
  2. [Experiments (adaptive attacks)] The adaptive-attack evaluation (implied in the threat scenarios) does not appear to jointly optimize over the stochastic T2I/encoder selection and the OTU perturbation term; if the reported attacks omit these random components, the outperformance may reflect incomplete threat modeling rather than intrinsic robustness.
  3. [Method (OTU perturbation)] The scaling factor regulating the OTU perturbation is described as a tunable regulator; without an explicit statement that it is fixed before seeing test data or an ablation showing sensitivity, post-hoc tuning cannot be ruled out and could inflate reported performance.
minor comments (1)
  1. [Abstract] Abstract does not name the specific VLMs, T2I models, datasets, or feature encoders used in the experiments.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments on our work. We address each of the major comments point by point below, providing clarifications based on the content of the full manuscript.

  1. Referee: [Abstract] Abstract: the claim that MirrorCheck 'consistently outperforms baseline methods' and 'maintains its utility even under strong adaptive adversarial conditions' is unsupported by any quantitative results, error bars, dataset details, or threat-model specifications, preventing evaluation of the central empirical claim.

    Authors: The abstract serves as a concise summary of the detailed experimental results presented in the main body of the paper. The full manuscript includes quantitative performance metrics, error bars from multiple runs, specific dataset descriptions (e.g., various VLM benchmarks), and explicit threat model specifications across different attack scenarios. We will revise the abstract to incorporate key quantitative highlights to better support the claims. revision: yes

  2. Referee: [Experiments (adaptive attacks)] The adaptive-attack evaluation (implied in the threat scenarios) does not appear to jointly optimize over the stochastic T2I/encoder selection and the OTU perturbation term; if the reported attacks omit these random components, the outperformance may reflect incomplete threat modeling rather than intrinsic robustness.

    Authors: Our adaptive attack evaluations explicitly account for the stochastic components by performing attacks under the expectation over the random model selections from the zoo. The OTU perturbation is incorporated into the defense mechanism, and attackers are assumed to have knowledge of the defense strategy but must contend with the one-time-use nature and randomization, which significantly increases the optimization difficulty. Details of the threat modeling are provided in the experiments section. revision: no

  3. Referee: [Method (OTU perturbation)] The scaling factor regulating the OTU perturbation is described as a tunable regulator; without an explicit statement that it is fixed before seeing test data or an ablation showing sensitivity, post-hoc tuning cannot be ruled out and could inflate reported performance.

    Authors: The scaling factor is determined using a separate validation set prior to any test evaluations and is held fixed throughout the experiments. We include an ablation study in the supplementary material that analyzes the sensitivity of performance to different values of this scaling factor, confirming the robustness of the chosen value. revision: partial
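
The protocol the authors describe in response 3, choosing the scaling factor on a validation split and then holding it fixed, can be sketched as follows. Everything here is illustrative: `detection_accuracy`, `select_scale`, and the callable `validation_scores_fn` are hypothetical names, and plain detection accuracy is an assumed selection criterion rather than the one reported in the paper.

```python
from typing import Callable, Sequence, Tuple

Scores = Sequence[float]


def detection_accuracy(clean_scores: Scores, adv_scores: Scores, threshold: float) -> float:
    """Fraction of samples classified correctly: clean >= threshold, adversarial < threshold."""
    correct = sum(1 for s in clean_scores if s >= threshold)
    correct += sum(1 for s in adv_scores if s < threshold)
    return correct / (len(clean_scores) + len(adv_scores))


def select_scale(
    candidates: Sequence[float],
    validation_scores_fn: Callable[[float], Tuple[Scores, Scores]],
    threshold: float,
) -> float:
    """Pick the scaling factor with the best accuracy on the VALIDATION split only.

    validation_scores_fn(scale) must return (clean_scores, adv_scores) computed
    on validation data; test data is never consulted here.
    """
    return max(
        candidates,
        key=lambda scale: detection_accuracy(*validation_scores_fn(scale), threshold),
    )
```

The returned value would then be frozen before any test-set evaluation, which is the property the referee asks to see stated explicitly.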

Circularity Check

0 steps flagged

Empirical defense method with no derivation chain or self-referential reductions

The paper proposes MirrorCheck as an empirical detection framework relying on T2I regeneration, feature embedding comparison, stochastic model-zoo selection, and an OTU perturbation with a tunable scaling factor. No mathematical derivation, first-principles prediction, or uniqueness theorem is claimed; performance is evaluated via experiments across threat models. The scaling factor is described as a regulator, not a fitted parameter that defines results by construction. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the method description. The central claims rest on experimental outperformance rather than any input-equivalent reduction, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on one explicit tunable parameter and a domain assumption about embeddings; no new entities are postulated.

free parameters (1)
  • scaling factor for OTU perturbation
    Regulates the magnitude of the one-time-use perturbation applied to encoder embeddings; its value is chosen to decrease adaptive attack effectiveness.
axioms (1)
  • domain assumption: Feature-space embeddings from image encoders capture semantic consistency between original and T2I-synthesized images
    Central to the consistency assessment step described in the abstract.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety

    cs.CR · 2025-02 · unverdicted · novelty 2.0

    A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.

Reference graph

Works this paper leans on

106 extracted references · 106 canonical work pages · cited by 1 Pith paper · 16 internal anchors

  1. [1]

    N. Aafaq, N. Akhtar, W. Liu, M. Shah, and A. Mian. Controlled caption generation for images through adversarial attacks. arXiv preprint arXiv:2107.03050, 2021

  2. [2]

    M. Andriushchenko and N. Flammarion. Understanding and improving fast adversarial training. In Advances in Neural Information Processing Systems, 2020

  3. [3]

    A. Athalye, N. Carlini, and D. A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, 2018a. URL https://api.semanticscholar.org/CorpusID:3310672

  4. [4]

    A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. Synthesizing robust adversarial examples. In International conference on machine learning, pages 284--293. PMLR, 2018b

  5. [5]

    A. Baevski, Y. Zhou, A. Mohamed, and M. Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in neural information processing systems, 33:12449--12460, 2020

  6. [6]

    F. Bao, S. Nie, K. Xue, Y. Cao, C. Li, H. Su, and J. Zhu. All are worth words: A vit backbone for diffusion models. In CVPR, 2023a

  7. [7]

    F. Bao, S. Nie, K. Xue, C. Li, S. Pu, Y. Wang, G. Yue, Y. Cao, H. Su, and J. Zhu. One transformer fits all distributions in multi-modal diffusion at scale. In International Conference on Machine Learning, pages 1692--1717. PMLR, 2023b

  8. [8]

    H. Bao, W. Wang, L. Dong, Q. Liu, O. K. Mohammed, K. Aggarwal, S. Som, S. Piao, and F. Wei. VLMo: Unified vision-language pre-training with mixture-of-modality-experts. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=bydKs84JEyw

  9. [9]

    M. Bartolo, T. Thrush, R. Jia, S. Riedel, P. Stenetorp, and D. Kiela. Improving question answering model robustness with synthetic adversarial data generation. arXiv preprint arXiv:2104.08678, 2021

  10. [10]

    Importance Weighted Autoencoders

    Y. Burda, R. Grosse, and R. Salakhutdinov. Importance weighted autoencoders. arXiv preprint arXiv:1509.00519, 2015

  11. [11]

    Y. Cao, D. Li, M. Fang, T. Zhou, J. Gao, Y. Zhan, and D. Tao. Tasa: Deceiving question answering models by twin answer sentences attack. arXiv preprint arXiv:2210.15221, 2022

  12. [12]

    N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In 2017 ieee symposium on security and privacy (sp), pages 39--57. Ieee, 2017

  13. [13]

    N. Carlini, F. Tramer, K. D. Dvijotham, L. Rice, M. Sun, and J. Z. Kolter. (certified!!) adversarial robustness for free! In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=JLg5aHHv7j

  14. [14]

    H. Chen, H. Zhang, P.-Y. Chen, J. Yi, and C.-J. Hsieh. Attacking visual language grounding with adversarial examples: A case study on neural image captioning. arXiv preprint arXiv:1712.02051, 2017

  15. [15]

    J. Cohen, E. Rosenfeld, and Z. Kolter. Certified adversarial robustness via randomized smoothing. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 1310--1320. PMLR, 09--15 Jun 2019. URL https://proceedings.mlr.press/v97/cohen19c.html

  16. [16]

    N. Das, M. Shanbhogue, S.-T. Chen, F. Hohman, S. Li, L. Chen, M. E. Kounavis, and D. H. Chau. Shield: Fast, practical defense and vaccination for deep learning using jpeg compression. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '18, page 196–204, New York, NY, USA, 2018. Association for Computin...

  17. [17]

    P. de Jorge, A. Bibi, R. Volpi, A. Sanyal, P. Torr, G. Rogez, and P. K. Dokania. Make some noise: Reliable and efficient single-step adversarial training. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=NENo__bExYu

  18. [18]

    J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248--255. Ieee, 2009

  19. [19]

    Z. Deng, X. Yang, S. Xu, H. Su, and J. Zhu. Libre: A practical bayesian approach to adversarial detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 972--982, 2021

  20. [20]

    J. Dong, S. Moosavi-Dezfooli, J. Lai, and X. Xie. The enemy of my enemy is my friend: Exploring inverse adversaries for improving adversarial training. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24678--24687, Los Alamitos, CA, USA, jun 2023. IEEE Computer Society. doi:10.1109/CVPR52729.2023.02364. URL https://doi....

  21. [21]

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021

  22. [22]

    N. Durasov, N. Dorndorf, H. Le, and P. Fua. Zigzag: Universal sampling-free uncertainty estimation through two-step inference. Transactions on Machine Learning Research, 2024a. ISSN 2835-8856. URL https://openreview.net/forum?id=QSvb6jBXML

  23. [23]

    N. Durasov, D. Oner, J. Donier, H. Le, and P. Fua. Enabling uncertainty estimation in iterative neural networks. In International Conference on Machine Learning, 2024b

  24. [24]

    Detecting Adversarial Samples from Artifacts

    R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017

  25. [25]

    H. Gangloff, M.-T. Pham, L. Courtrai, and S. Lefèvre. Leveraging vector-quantized variational autoencoder inner metrics for anomaly detection. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 435--441. IEEE, 2022

  26. [26]

    I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples, 2015

  27. [27]

    On the (Statistical) Detection of Adversarial Examples

    K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280, 2017

  28. [28]

    J. Guo, J. Li, D. Li, A. M. Huat Tiong, B. Li, D. Tao, and S. Hoi. From images to textual prompts: Zero-shot visual question answering with frozen large language models. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10867--10877, 2023. doi:10.1109/CVPR52729.2023.01046

  29. [29]

    K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770--778, 2016

  30. [30]

    I. Higgins, L. Matthey, A. Pal, C. P. Burgess, X. Glorot, M. M. Botvinick, S. Mohamed, and A. Lerchner. beta-vae: Learning basic visual concepts with a constrained variational framework. ICLR (Poster), 3, 2017

  31. [31]

    G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504--507, 2006

  32. [32]

    C.-H. Ho and N. Vasconcelos. Disco: Adversarial defense with local implicit functions. Advances in Neural Information Processing Systems, 35:23818--23837, 2022

  33. [33]

    Adversarial Attacks on Neural Network Policies

    S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel. Adversarial attacks on neural network policies. arXiv preprint arXiv:1702.02284, 2017

  34. [34]

    U. Hwang, J. Park, H. Jang, S. Yoon, and N. I. Cho. Puvae: A variational autoencoder to purify adversarial examples. IEEE Access, 7:126582--126593, 2019

  35. [35]

    DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

    F. Iandola, M. Moskewicz, S. Karayev, R. Girshick, T. Darrell, and K. Keutzer. Densenet: Implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869, 2014

  36. [36]

    G. Ilharco, M. Wortsman, R. Wightman, C. Gordon, N. Carlini, R. Taori, A. Dave, V. Shankar, H. Namkoong, J. Miller, H. Hajishirzi, A. Farhadi, and L. Schmidt. Openclip, 2021. URL https://doi.org/10.5281/zenodo.5143773.

  37. [37]

    E. Jang, S. Gu, and B. Poole. Categorical reparameterization with gumbel-softmax. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=rkE3y85ee

  38. [38]

    C. Jia, Y. Yang, Y. Xia, Y.-T. Chen, Z. Parekh, H. Pham, Q. Le, Y.-H. Sung, Z. Li, and T. Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. In International conference on machine learning, pages 4904--4916. PMLR, 2021

  39. [39]

    D. Kaushik, D. Kiela, Z. C. Lipton, and W.-t. Yih. On the efficacy of adversarial data collection for question answering: Results from a large-scale randomized study. arXiv preprint arXiv:2106.00872, 2021

  40. [40]

    D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014

  41. [41]

    V. Kovatchev, T. Chatterjee, V. S. Govindarajan, J. Chen, E. Choi, G. Chronis, A. Das, K. Erk, M. Lease, J. J. Li, et al. longhorns at dadc 2022: How many linguists does it take to fool a question answering model? a systematic approach to adversarial attacks. arXiv preprint arXiv:2206.14729, 2022

  42. [42]

    Adversarial Machine Learning at Scale

    A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016

  43. [43]

    A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=BJm4T4Kgx

  44. [44]

    A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In Artificial intelligence safety and security, pages 99--112. Chapman and Hall/CRC, 2018

  45. [45]

    C. Li, S. Gao, C. Deng, D. Xie, and W. Liu. Cross-modal learning with adversarial samples. Advances in neural information processing systems, 32, 2019a

  46. [46]

    C. Li, S. Gao, C. Deng, W. Liu, and H. Huang. Adversarial attack on deep cross-modal hamming retrieval. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2218--2227, 2021a

  47. [47]

    D. Li, J. Li, H. Le, G. Wang, S. Savarese, and S. C. Hoi. LAVIS: A one-stop library for language-vision intelligence. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 31--41, Toronto, Canada, July 2023a. Association for Computational Linguistics. URL https://aclanthology...

  48. [48]

    J. Li, R. Selvaraju, A. Gotmare, S. Joty, C. Xiong, and S. C. H. Hoi. Align before fuse: Vision and language representation learning with momentum distillation. Advances in neural information processing systems, 34:9694--9705, 2021b

  49. [49]

    J. Li, D. Li, C. Xiong, and S. Hoi. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation, 2022

  50. [50]

    J. Li, D. Li, S. Savarese, and S. Hoi. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International conference on machine learning, pages 19730--19742. PMLR, 2023b

  51. [51]

    L. Li, J. Lei, Z. Gan, and J. Liu. Adversarial vqa: A new benchmark for evaluating the robustness of vqa models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2042--2051, 2021c

  52. [52]

    L. H. Li, M. Yatskar, D. Yin, C.-J. Hsieh, and K.-W. Chang. Visualbert: A simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557, 2019b

  53. [53]

    X. Li, X. Yin, C. Li, P. Zhang, X. Hu, L. Zhang, L. Wang, H. Hu, L. Dong, F. Wei, et al. Oscar: Object-semantics aligned pre-training for vision-language tasks. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XXX 16, pages 121--137. Springer, 2020

  54. [54]

    T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In Computer Vision--ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740--755. Springer, 2014

  55. [55]

    K. Lis, K. Nakka, M. Salzmann, and P. Fua. Detecting the Unexpected via Image Resynthesis. In International Conference on Computer Vision, 2019

  56. [56]

    K. Lis, S. Honari, P. Fua, and M. Salzmann. Detecting Road Obstacles by Erasing Them. In Transactions on Pattern Analysis and Machine Intelligence, 2024

  57. [57]

    H. Liu, C. Li, Q. Wu, and Y. J. Lee. Visual instruction tuning, 2023

  58. [58]

    Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016

  59. [59]

    C. J. Maddison, A. Mnih, and Y. W. Teh. The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712, 2016

  60. [60]

    A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJzIBfZAb

  61. [61]

    Adversarial Autoencoders

    A. Makhzani, J. Shlens, N. Jaitly, and I. Goodfellow. Adversarial autoencoders. In International Conference on Learning Representations, 2016. URL http://arxiv.org/abs/1511.05644

  62. [62]

    D. Meng and H. Chen. Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pages 135--147, 2017

  63. [63]

    J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations, 2017a. URL https://openreview.net/forum?id=SJzCSf9xg

  64. [64]

    J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff. On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267, 2017b

  65. [65]

    S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. Deepfool: A simple and accurate method to fool deep neural networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2574--2582, Los Alamitos, CA, USA, jun 2016. IEEE Computer Society. doi:10.1109/CVPR.2016.282. URL https://doi.ieeecomputersociety.org/10.1109/CVPR.2016.282

  66. [66]

    Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527--566, 2017

  67. [67]

    A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 427--436, 2015

  68. [68]

    W. Nie, B. Guo, Y. Huang, C. Xiao, A. Vahdat, and A. Anandkumar. Diffusion models for adversarial purification. In International Conference on Machine Learning (ICML), 2022

  69. [69]

    N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European symposium on security and privacy (EuroS&P), pages 372--387. IEEE, 2016a

  70. [70]

    N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE symposium on security and privacy (SP), pages 582--597. IEEE, 2016b

  71. [71]

    J. S. Park, J. O'Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pages 1--22, 2023

  72. [72]

    Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin. Variational autoencoder for deep learning of images, labels and captions. Advances in neural information processing systems, 29, 2016

  73. [73]

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. In M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Lear...

  74. [74]

    R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models, 2022

  75. [75]

    K. Roth, Y. Kilcher, and T. Hofmann. The odds are odd: A statistical test for detecting adversarial examples. In International Conference on Machine Learning, pages 5498--5507. PMLR, 2019

  76. [76]

    S. Sadhu, D. He, C.-W. Huang, S. H. Mallidi, M. Wu, A. Rastrow, A. Stolcke, J. Droppo, and R. Maas. Wav2vec-c: A self-supervised model for speech representation learning. arXiv preprint arXiv:2103.08393, 2021

  77. [77]

    H. Salman, M. Sun, G. Yang, A. Kapoor, and J. Z. Kolter. Denoised smoothing: a provable defense for pretrained classifiers. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS'20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546

  78. [78]

    P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BkJ3ibb0-

  79. [79]

    M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4510--4520, 2018

  80. [80]

    C. Schuhmann, R. Beaumont, R. Vencu, C. W. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman, P. Schramowski, S. R. Kundurthy, K. Crowson, L. Schmidt, R. Kaczmarczyk, and J. Jitsev. LAION-5B: An open large-scale dataset for training next generation image-text models. In Thirty-sixth Conference on Neural Information Processing S...

Showing first 80 references.