MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

Ivan Laptev; Karthik Nandakumar; Klea Ziu; Martin Tak\'a\v{c}; Nikita Durasov; Pascal Fua; Samar Fares; Toluwani Aremu

arXiv: 2406.09250 · v3 · pith:6GCICKCR · submitted 2024-06-13 · 💻 cs.CV · cs.AI · cs.LG

Pith reviewed 2026-05-25 09:00 UTC · model grok-4.3

classification 💻 cs.CV cs.AI cs.LG

keywords adversarial defense, vision-language models, text-to-image generation, semantic consistency, adaptive attacks, model-agnostic, embedding comparison

The pith

MirrorCheck detects adversarial attacks on vision-language models by regenerating images from their captions and checking embedding consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MirrorCheck as a detection method that turns a vision-language model's output caption back into an image using a text-to-image generator and then compares feature embeddings of the new image against the original input. If the embeddings match closely, the input is treated as clean; large differences flag an attack. The method adds randomness by picking different generators and encoders each time and applies a one-time perturbation to the embeddings to limit how well an attacker can plan around the check. A reader would care because vision-language models are deployed in many settings where an attacker who can change the image input can force wrong answers, and existing defenses have been shown to fail against attacks tailored to them. If the approach holds, it supplies a way to add protection that does not require retraining the target model or knowing its internals.

Core claim

MirrorCheck is a model-agnostic detection framework that regenerates visual content from captions produced by the target vision-language model using text-to-image generators, then measures semantic consistency through feature-space embeddings between the original and synthesized images. Robustness against adaptive attacks is obtained by randomly selecting generators and encoders from a diverse set and by applying a one-time-use perturbation, controlled by a scaling factor, to the chosen encoder embeddings. Experiments across multiple threat models show that the method outperforms baseline defenses and continues to function under strong adaptive adversarial conditions in both unimodal and multimodal settings.

What carries the argument

MirrorCheck's detection step, which regenerates an image from the model's caption and compares its embedding with that of the original input, strengthened by stochastic model selection and a one-time perturbation on the embeddings.
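
The detection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. `caption_fn`, `t2i_fn` and `encoder_fn` are hypothetical stand-ins for the victim VLM, the text-to-image generator and the image encoder. Cosine similarity is assumed as the consistency measure. The threshold `tau` is a hypothetical value that would in practice be calibrated on clean data.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mirror_check(
    image,
    caption_fn: Callable,   # hypothetical: victim VLM, image -> caption
    t2i_fn: Callable,       # hypothetical: text-to-image model, caption -> image
    encoder_fn: Callable,   # hypothetical: image encoder, image -> embedding
    tau: float,             # hypothetical similarity threshold
) -> bool:
    """Return True if `image` is flagged as adversarial."""
    caption = caption_fn(image)        # (1) describe the input image
    regenerated = t2i_fn(caption)      # (2) regenerate an image from the caption
    score = cosine_similarity(encoder_fn(image), encoder_fn(regenerated))  # (3) compare
    return score < tau                 # low consistency -> flag as attacked
```

With real models, `image` would be a tensor and the three callables would wrap, for example, BLIP, Stable Diffusion and a CLIP image encoder (the combination shown in Figure 2). The decision rule itself stays the same.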

If this is right

Vision-language models can receive protection without any change to their weights or architecture.
The same regeneration-plus-consistency test applies to both image-only and image-plus-text inputs.
Random selection among multiple generators and encoders reduces the success rate of attacks planned against a fixed defense.
The one-time perturbation on embeddings further limits an attacker's ability to optimize against the full detection pipeline.
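
The two hardening steps in the last two points can be sketched as follows, under explicit assumptions. The model zoo is represented as plain Python lists sampled uniformly. The OTU perturbation is modeled as a fresh unit-norm Gaussian direction, scaled by a hypothetical factor `alpha` and added to both embeddings of a single query. The paper's actual form of the perturbation may differ.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def one_time_direction(dim: int, rng: random.Random) -> Vector:
    """Draw a fresh unit-norm Gaussian direction, assumed to be used for one query only."""
    delta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in delta)) or 1.0
    return [d / norm for d in delta]


def stochastic_mirror_score(
    image,
    caption_fn: Callable,
    generators: Sequence[Callable],
    encoders: Sequence[Callable],
    alpha: float,
    rng: random.Random,
) -> float:
    """Similarity score using a randomly drawn (generator, encoder) pair plus OTU noise.

    Hypothetical sketch: the additive, shared-direction form of the OTU
    perturbation and the name `alpha` for its scaling factor are assumptions.
    """
    generator = rng.choice(generators)  # random T2I model from the zoo
    encoder = rng.choice(encoders)      # random image encoder from the zoo
    regenerated = generator(caption_fn(image))
    e_input, e_regen = encoder(image), encoder(regenerated)
    delta = one_time_direction(len(e_input), rng)  # fresh per query, then discarded
    e_input = [e + alpha * d for e, d in zip(e_input, delta)]
    e_regen = [e + alpha * d for e, d in zip(e_regen, delta)]
    return cosine_similarity(e_input, e_regen)
```

A caller would compare the returned score against a threshold, as in the fixed-model check. The model pair and the noise direction change on every call, so an attacker cannot tune a perturbation against one known pipeline. Setting `alpha = 0` recovers the unperturbed score.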

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The regeneration step could be replaced by other cross-modal generators if text-to-image models are unavailable or too slow.
Similar consistency checks might be useful for defending models that process audio or video by regenerating in another modality.
The benefit of stochastic selection suggests that ensembles of diverse components can be a general way to harden detection methods.
Practical deployment would require measuring the added latency from the regeneration step against the security gain.

Load-bearing premise

Semantic consistency measured in feature-space embeddings between the original image and the text-to-image regenerated image reliably signals the presence or absence of adversarial perturbations.

What would settle it

An adaptive attack that causes the vision-language model to produce an incorrect output while still making the regenerated image's embedding nearly identical to the original image's embedding would show the consistency check is not sufficient.
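
The notation below is introduced here for illustration and is not taken from the paper; cosine similarity and a fixed threshold are assumed. Let C be the victim captioner, G the text-to-image generator, E the image encoder, τ the detection threshold and ε the attacker's perturbation budget. With these, the detector and the attack that would defeat it can be written as:

```latex
% Detector: flag input x when the regenerated image is inconsistent with it.
\mathrm{flag}(x) = \mathbf{1}\left[\cos\big(E(x),\, E(G(C(x)))\big) < \tau\right]

% Settling attack: from clean x, reach a wrong target caption t \neq C(x)
% while staying under the threshold's radar.
\text{find } x_{\mathrm{adv}} \ \text{s.t.}\
\|x_{\mathrm{adv}} - x\|_\infty \le \epsilon,\quad
C(x_{\mathrm{adv}}) = t,\quad
\cos\big(E(x_{\mathrm{adv}}),\, E(G(t))\big) \ge \tau
```

Against the randomized defense, this condition is not enough for one fixed pair. The last condition would have to hold with high probability over the random draw of G, E and the one-time-use perturbation.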

Figures

Figures reproduced from arXiv: 2406.09250 by Ivan Laptev, Karthik Nandakumar, Klea Ziu, Martin Tak\'a\v{c}, Nikita Durasov, Pascal Fua, Samar Fares, Toluwani Aremu.

**Figure 1.** MirrorCheck approach. At inference time, to check whether an input image has been adversarially attacked, our framework (1) generates a text description of the image, (2) uses this caption to regenerate the image with a text-to-image model, and (3) extracts and compares embeddings of the original and regenerated images using a feature extractor. If the embeddings significantly differ,…

**Figure 2.** An example using our MirrorCheck framework. For both clean and adversarial (Adv) cases, we use the BLIP model to generate captions for the given images. Stable Diffusion then generates images based on these captions. For the clean image, different image encoders show high similarity between the input image and the generated one. Conversely, when the input image is adversarial, different image encoders show…

**Figure 3.** Effect of our ensemble approach on a victim model (case study: UniDiffuser). Similarity scores using Stable Diffusion and CLIP image encoders. Tables 1 and 3 present the average similarity scores obtained by using CLIP image encoders to extract the embeddings of input images in different settings and generating images using Stable Diffusion. The results presented in Tables 1 and 3 are based on evaluation…

**Figure 4.** Visual results using BLIP (Victim Model) and Stable Diffusion (T2I Model). On the left are the images generated using the adversarial images+texts, and on the right are the images generated using the clean images+texts. 4.4 Ablations. Generalization to alternative image encoders and image generation methods. We demonstrate the versatility of MirrorCheck by testing it with different Text-to-Image (T2I) models…

**Figure 5.** We carry out ablations to observe the performance of our approach, MirrorCheck, when we replace our baseline T2I Model (Stable Diffusion) with UniDiffuser (UD) and ControlNet (CN). We then compare our detection accuracies with baselines (Feature Squeezing (FS) [Xu et al., 2017], MagNet (MN) [Meng and Chen, 2017], and PuVAE (PV) [Hwang et al., 2019]). Detailed results can be found in Appendices C.1, C.2, and C.…

**Figure 6.** Illustration of the adaptive attack pipeline: (1) Rather than use the discrete output of the Victim Model (I2T), the attacker seamlessly integrates the embedding layer for the text decoder (2) with the decoding module of the generative model (T2I), using an Adapter for semantics alignment. The goal of the adapter is to (3) craft adversarial images x_adv such that its distance d from target caption t and …

**Figure 7.** Visual results using BLIP (Victim Model) and Stable Diffusion (T2I Model).

Original abstract

Vision-Language Models (VLMs) are increasingly susceptible to sophisticated adversarial attacks, including adaptive strategies specifically designed to bypass existing defenses. To address this vulnerability, we propose MirrorCheck, a robust and model-agnostic detection framework that operates effectively in both unimodal and multimodal settings. MirrorCheck leverages Text-to-Image (T2I) models to regenerate visual content from captions produced by the target model and assesses semantic consistency by comparing feature-space embeddings between the original and synthesized images. To enhance robustness against adaptive attacks, MirrorCheck introduces a stochastic defense strategy that randomly selects T2I generators and image encoders from a diverse model zoo. Additionally, we incorporate a novel One-Time-Use (OTU) perturbation applied to the selected encoder embeddings, regulated by a scaling factor, which decreases the effectiveness of adaptive attacks. Extensive experiments across multiple threat scenarios demonstrate that MirrorCheck consistently outperforms baseline methods, and maintains its utility even under strong adaptive adversarial conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

MirrorCheck combines T2I regeneration for consistency checks with stochastic model selection and a one-time perturbation, but the abstract supplies no numbers or attack details to support the adaptive-attack claims.

The main thing to know is that this paper offers a model-agnostic defense for vision-language models built around regenerating images from the model's own captions, then measuring embedding consistency between the original and the regenerated version. It layers on random selection from a zoo of T2I generators and encoders plus a one-time-use perturbation scaled by a tunable factor to blunt adaptive attacks. That specific combination is new relative to the single-component defenses mentioned in the abstract. The approach is practical in that it avoids retraining the target VLM and tries to handle both unimodal and multimodal cases. The paper does a reasonable job framing the threat model around adaptive strategies that existing defenses often fail against.

The soft spots are more substantial. The abstract states that MirrorCheck outperforms baselines and holds up under strong adaptive conditions, yet it gives no quantitative results, no datasets, no error bars, and no description of how the adaptive attacks were constructed. Without those, it is impossible to tell whether the reported gains are real or whether the attacks simply did not jointly optimize over the random zoo choices and the OTU term. The stress-test concern therefore stands on the information available: incomplete threat modeling could make the robustness look better than it is. The core assumption that feature-space consistency between original and T2I-regenerated images reliably flags adversarial perturbations is plausible but untested in the summary, and the scaling factor being tunable invites questions about post-hoc adjustment.

This work is aimed at researchers focused on practical security for deployed VLMs. A reader looking for concrete defense ideas could extract some value from the pipeline description, but the missing experimental evidence limits how much weight to give the claims right now.
The paper deserves a serious referee to examine the full experiments, attack implementations, and any code or data releases. I would send it to peer review rather than desk-reject.

Referee Report

3 major / 1 minor

Summary. The paper proposes MirrorCheck, a model-agnostic adversarial detection framework for vision-language models. It regenerates images via text-to-image models from VLM captions, measures semantic consistency through feature-space embedding comparisons between original and regenerated images, employs stochastic selection of T2I generators and encoders from a model zoo, and adds a one-time-use (OTU) perturbation to embeddings controlled by a scaling factor. The abstract claims that MirrorCheck outperforms baselines and retains utility under strong adaptive attacks across multiple threat scenarios.

Significance. If the empirical claims hold with rigorous validation, the work could offer a practical, efficient defense for VLMs by leveraging regeneration and stochasticity to counter adaptive threats. The OTU perturbation and model-zoo randomization represent a concrete attempt to raise the attacker's optimization burden. However, the absence of quantitative results, error bars, or dataset details in the abstract makes it difficult to assess whether the central robustness claim is substantiated.

major comments (3)

[Abstract] Abstract: the claim that MirrorCheck 'consistently outperforms baseline methods' and 'maintains its utility even under strong adaptive adversarial conditions' is unsupported by any quantitative results, error bars, dataset details, or threat-model specifications, preventing evaluation of the central empirical claim.
[Experiments (adaptive attacks)] The adaptive-attack evaluation (implied in the threat scenarios) does not appear to jointly optimize over the stochastic T2I/encoder selection and the OTU perturbation term; if the reported attacks omit these random components, the outperformance may reflect incomplete threat modeling rather than intrinsic robustness.
[Method (OTU perturbation)] The scaling factor regulating the OTU perturbation is described as a tunable regulator; without an explicit statement that it is fixed before seeing test data or an ablation showing sensitivity, post-hoc tuning cannot be ruled out and could inflate reported performance.

minor comments (1)

[Abstract] Abstract does not name the specific VLMs, T2I models, datasets, or feature encoders used in the experiments.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments on our work. We address each of the major comments point by point below, providing clarifications based on the content of the full manuscript.

Referee: [Abstract] Abstract: the claim that MirrorCheck 'consistently outperforms baseline methods' and 'maintains its utility even under strong adaptive adversarial conditions' is unsupported by any quantitative results, error bars, dataset details, or threat-model specifications, preventing evaluation of the central empirical claim.

Authors: The abstract serves as a concise summary of the detailed experimental results presented in the main body of the paper. The full manuscript includes quantitative performance metrics, error bars from multiple runs, specific dataset descriptions (e.g., various VLM benchmarks), and explicit threat model specifications across different attack scenarios. We will revise the abstract to incorporate key quantitative highlights to better support the claims. revision: yes
Referee: [Experiments (adaptive attacks)] The adaptive-attack evaluation (implied in the threat scenarios) does not appear to jointly optimize over the stochastic T2I/encoder selection and the OTU perturbation term; if the reported attacks omit these random components, the outperformance may reflect incomplete threat modeling rather than intrinsic robustness.

Authors: Our adaptive attack evaluations explicitly account for the stochastic components by performing attacks under the expectation over the random model selections from the zoo. The OTU perturbation is incorporated into the defense mechanism, and attackers are assumed to have knowledge of the defense strategy but must contend with the one-time-use nature and randomization, which significantly increases the optimization difficulty. Details of the threat modeling are provided in the experiments section. revision: no
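
Attacking "under the expectation over the random model selections" corresponds to an Expectation-over-Transformation style objective, in the spirit of Athalye et al., "Synthesizing robust adversarial examples". The sketch below is a minimal Monte Carlo estimate of the quantity such an attacker would optimize. It assumes generators and encoders are sampled uniformly and independently. `loss_fn(generator, encoder)` is a hypothetical attacker loss, for example one minus the consistency score for the current adversarial image.

```python
import random
from typing import Callable, Sequence


def eot_expected_loss(
    loss_fn: Callable[[object, object], float],  # hypothetical attacker loss
    generators: Sequence[object],
    encoders: Sequence[object],
    n_samples: int,
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of E[loss_fn(G, E)] with G and E drawn uniformly from the zoo."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    total = 0.0
    for _ in range(n_samples):
        generator = rng.choice(generators)  # sample a T2I model as the defender would
        encoder = rng.choice(encoders)      # sample an image encoder independently
        total += loss_fn(generator, encoder)
    return total / n_samples
```

A gradient-based attacker would differentiate an estimate like this with respect to the adversarial image at each step. The per-query OTU noise would add one more random variable to average over.
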
Referee: [Method (OTU perturbation)] The scaling factor regulating the OTU perturbation is described as a tunable regulator; without an explicit statement that it is fixed before seeing test data or an ablation showing sensitivity, post-hoc tuning cannot be ruled out and could inflate reported performance.

Authors: The scaling factor is determined using a separate validation set prior to any test evaluations and is held fixed throughout the experiments. We include an ablation study in the supplementary material that analyzes the sensitivity of performance to different values of this scaling factor, confirming the robustness of the chosen value. revision: partial
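
The protocol described in this response can be sketched as a simple grid search: choose the scaling factor on a held-out validation split, then freeze it. `validation_score` is a hypothetical callable that would run the full defense on validation data and return, say, detection accuracy. The candidate grid is likewise an assumption.

```python
from typing import Callable, Dict, Sequence, Tuple


def select_scaling_factor(
    candidates: Sequence[float],
    validation_score: Callable[[float], float],  # hypothetical: runs defense on validation split
) -> Tuple[float, Dict[float, float]]:
    """Pick the scaling factor with the best validation score.

    Returns the chosen value and the full score table, which doubles as a
    sensitivity ablation. The chosen value is then frozen for all test runs.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = {alpha: validation_score(alpha) for alpha in candidates}
    best = max(candidates, key=lambda alpha: scores[alpha])  # first maximum wins ties
    return best, scores
```

Reporting the whole `scores` table, rather than only the winner, is what would let a reader check the sensitivity claim.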

Circularity Check

0 steps flagged

Empirical defense method with no derivation chain or self-referential reductions

The paper proposes MirrorCheck as an empirical detection framework relying on T2I regeneration, feature embedding comparison, stochastic model-zoo selection, and an OTU perturbation with a tunable scaling factor. No mathematical derivation, first-principles prediction, or uniqueness theorem is claimed; performance is evaluated via experiments across threat models. The scaling factor is described as a regulator, not a fitted parameter that defines results by construction. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the method description. The central claims rest on experimental outperformance rather than any input-equivalent reduction, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on one explicit tunable parameter and a domain assumption about embeddings; no new entities are postulated.

free parameters (1)

scaling factor for OTU perturbation
Regulates the magnitude of the one-time-use perturbation applied to encoder embeddings; its value is chosen to decrease adaptive attack effectiveness.

axioms (1)

domain assumption Feature-space embeddings from image encoders capture semantic consistency between original and T2I-synthesized images
Central to the consistency assessment step described in the abstract.

pith-pipeline@v0.9.0 · 5721 in / 1108 out tokens · 25932 ms · 2026-05-25T09:00:00.214724+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
cs.CR 2025-02 unverdicted novelty 2.0

A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.

Reference graph

Works this paper leans on

106 extracted references · 106 canonical work pages · cited by 1 Pith paper · 16 internal anchors

[1] N. Aafaq, N. Akhtar, W. Liu, M. Shah, and A. Mian. Controlled caption generation for images through adversarial attacks. arXiv preprint arXiv:2107.03050, 2021.

[2] M. Andriushchenko and N. Flammarion. Understanding and improving fast adversarial training. In Advances in Neural Information Processing Systems, 2020.

[3] A. Athalye, N. Carlini, and D. A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, 2018a. URL https://api.semanticscholar.org/CorpusID:3310672

[4] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. Synthesizing robust adversarial examples. In International Conference on Machine Learning, pages 284–293. PMLR, 2018b.

[5] A. Baevski, Y. Zhou, A. Mohamed, and M. Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33: 12449–12460, 2020.

[6] F. Bao, S. Nie, K. Xue, Y. Cao, C. Li, H. Su, and J. Zhu. All are worth words: A ViT backbone for diffusion models. In CVPR, 2023a.

[7] F. Bao, S. Nie, K. Xue, C. Li, S. Pu, Y. Wang, G. Yue, Y. Cao, H. Su, and J. Zhu. One transformer fits all distributions in multi-modal diffusion at scale. In International Conference on Machine Learning, pages 1692–1717. PMLR, 2023b.

[8] H. Bao, W. Wang, L. Dong, Q. Liu, O. K. Mohammed, K. Aggarwal, S. Som, S. Piao, and F. Wei. VLMo: Unified vision-language pre-training with mixture-of-modality-experts. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=bydKs84JEyw

[9] M. Bartolo, T. Thrush, R. Jia, S. Riedel, P. Stenetorp, and D. Kiela. Improving question answering model robustness with synthetic adversarial data generation. arXiv preprint arXiv:2104.08678, 2021.

[10] Y. Burda, R. Grosse, and R. Salakhutdinov. Importance weighted autoencoders. arXiv preprint arXiv:1509.00519, 2015.

[11] Y. Cao, D. Li, M. Fang, T. Zhou, J. Gao, Y. Zhan, and D. Tao. TASA: Deceiving question answering models by twin answer sentences attack. arXiv preprint arXiv:2210.15221, 2022.

[12] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.

[13] N. Carlini, F. Tramer, K. D. Dvijotham, L. Rice, M. Sun, and J. Z. Kolter. (Certified!!) Adversarial robustness for free! In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=JLg5aHHv7j

[14] H. Chen, H. Zhang, P.-Y. Chen, J. Yi, and C.-J. Hsieh. Attacking visual language grounding with adversarial examples: A case study on neural image captioning. arXiv preprint arXiv:1712.02051, 2017.

[15] J. Cohen, E. Rosenfeld, and Z. Kolter. Certified adversarial robustness via randomized smoothing. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 1310–1320. PMLR, 09–15 Jun 2019. URL https://proceedings.mlr.press/v97/cohen19c.html

[16] N. Das, M. Shanbhogue, S.-T. Chen, F. Hohman, S. Li, L. Chen, M. E. Kounavis, and D. H. Chau. Shield: Fast, practical defense and vaccination for deep learning using JPEG compression. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '18, pages 196–204, New York, NY, USA, 2018. Association for Computin... doi:10.1145/3219819.3219910

[17] P. de Jorge, A. Bibi, R. Volpi, A. Sanyal, P. Torr, G. Rogez, and P. K. Dokania. Make some noise: Reliable and efficient single-step adversarial training. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=NENo__bExYu

[18] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.

[19] Z. Deng, X. Yang, S. Xu, H. Su, and J. Zhu. LiBRe: A practical Bayesian approach to adversarial detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 972–982, 2021.

[20] J. Dong, S. Moosavi-Dezfooli, J. Lai, and X. Xie. The enemy of my enemy is my friend: Exploring inverse adversaries for improving adversarial training. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24678–24687, Los Alamitos, CA, USA, Jun 2023. IEEE Computer Society. doi:10.1109/CVPR52729.2023.02364. URL https://doi....

[21] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021.

[22] N. Durasov, N. Dorndorf, H. Le, and P. Fua. ZigZag: Universal sampling-free uncertainty estimation through two-step inference. Transactions on Machine Learning Research, 2024a. ISSN 2835-8856. URL https://openreview.net/forum?id=QSvb6jBXML

[23] N. Durasov, D. Oner, J. Donier, H. Le, and P. Fua. Enabling uncertainty estimation in iterative neural networks. In International Conference on Machine Learning, 2024b.

[24] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.

[25] H. Gangloff, M.-T. Pham, L. Courtrai, and S. Lefèvre. Leveraging vector-quantized variational autoencoder inner metrics for anomaly detection. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 435–441. IEEE, 2022.

[26] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples, 2015.

[27] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280, 2017.

[28] J. Guo, J. Li, D. Li, A. M. Huat Tiong, B. Li, D. Tao, and S. Hoi. From images to textual prompts: Zero-shot visual question answering with frozen large language models. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10867–10877, 2023. doi:10.1109/CVPR52729.2023.01046

[29] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[30] I. Higgins, L. Matthey, A. Pal, C. P. Burgess, X. Glorot, M. M. Botvinick, S. Mohamed, and A. Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. ICLR (Poster), 3, 2017.

[31] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786): 504–507, 2006.

[32] C.-H. Ho and N. Vasconcelos. DISCO: Adversarial defense with local implicit functions. Advances in Neural Information Processing Systems, 35: 23818–23837, 2022.

[33] S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel. Adversarial attacks on neural network policies. arXiv preprint arXiv:1702.02284, 2017.

[34] U. Hwang, J. Park, H. Jang, S. Yoon, and N. I. Cho. PuVAE: A variational autoencoder to purify adversarial examples. IEEE Access, 7: 126582–126593, 2019.

[35] F. Iandola, M. Moskewicz, S. Karayev, R. Girshick, T. Darrell, and K. Keutzer. DenseNet: Implementing efficient ConvNet descriptor pyramids. arXiv preprint arXiv:1404.1869, 2014.

[36] G. Ilharco, M. Wortsman, R. Wightman, C. Gordon, N. Carlini, R. Taori, A. Dave, V. Shankar, H. Namkoong, J. Miller, H. Hajishirzi, A. Farhadi, and L. Schmidt. OpenCLIP, 2021. URL https://doi.org/10.5281/zenodo.5143773

[37] E. Jang, S. Gu, and B. Poole. Categorical reparameterization with Gumbel-softmax. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=rkE3y85ee

[38] C. Jia, Y. Yang, Y. Xia, Y.-T. Chen, Z. Parekh, H. Pham, Q. Le, Y.-H. Sung, Z. Li, and T. Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. In International Conference on Machine Learning, pages 4904–4916. PMLR, 2021.

[39] D. Kaushik, D. Kiela, Z. C. Lipton, and W.-t. Yih. On the efficacy of adversarial data collection for question answering: Results from a large-scale randomized study. arXiv preprint arXiv:2106.00872, 2021.

[40] D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.

[41] V. Kovatchev, T. Chatterjee, V. S. Govindarajan, J. Chen, E. Choi, G. Chronis, A. Das, K. Erk, M. Lease, J. J. Li, et al. longhorns at DADC 2022: How many linguists does it take to fool a question answering model? A systematic approach to adversarial attacks. arXiv preprint arXiv:2206.14729, 2022.

[42] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.

[43] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=BJm4T4Kgx

[44] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In Artificial Intelligence Safety and Security, pages 99–112. Chapman and Hall/CRC, 2018.

[45] C. Li, S. Gao, C. Deng, D. Xie, and W. Liu. Cross-modal learning with adversarial samples. Advances in Neural Information Processing Systems, 32, 2019a.

[46] C. Li, S. Gao, C. Deng, W. Liu, and H. Huang. Adversarial attack on deep cross-modal hamming retrieval. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2218–2227, 2021a.

[47] D. Li, J. Li, H. Le, G. Wang, S. Savarese, and S. C. Hoi. LAVIS: A one-stop library for language-vision intelligence. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 31–41, Toronto, Canada, July 2023a. Association for Computational Linguistics. URL https://aclanthology...

[48] J. Li, R. Selvaraju, A. Gotmare, S. Joty, C. Xiong, and S. C. H. Hoi. Align before fuse: Vision and language representation learning with momentum distillation. Advances in Neural Information Processing Systems, 34: 9694–9705, 2021b.

[49] J. Li, D. Li, C. Xiong, and S. Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation, 2022.

[50] J. Li, D. Li, S. Savarese, and S. Hoi. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International Conference on Machine Learning, pages 19730–19742. PMLR, 2023b.

[51] L. Li, J. Lei, Z. Gan, and J. Liu. Adversarial VQA: A new benchmark for evaluating the robustness of VQA models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2042–2051, 2021c.

[52] L. H. Li, M. Yatskar, D. Yin, C.-J. Hsieh, and K.-W. Chang. VisualBERT: A simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557, 2019b.

[53] X. Li, X. Yin, C. Li, P. Zhang, X. Hu, L. Zhang, L. Wang, H. Hu, L. Dong, F. Wei, et al. Oscar: Object-semantics aligned pre-training for vision-language tasks. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXX 16, pages 121–137. Springer, 2020.

[54]

T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll \'a r, and C. L. Zitnick. Microsoft coco: Common objects in context. In Computer Vision--ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740--755. Springer, 2014

work page 2014
[55]

K. Lis, K. Nakka, M. Salzmann, and P. Fua. Detecting the Unexpected via Image Resynthesis . In International Conference on Computer Vision, 2019

work page 2019
[56]

K. Lis, S. Honari, P. Fua, and M. Salzmann. Detecting Road Obstacles by Erasing Them . In Transactions on Pattern Analysis and Machine Intelligence, 2024

work page 2024
[57]

H. Liu, C. Li, Q. Wu, and Y. J. Lee. Visual instruction tuning, 2023

work page 2023
[58]

Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[59]

C. J. Maddison, A. Mnih, and Y. W. Teh. The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712, 2016

[60]

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJzIBfZAb

[61]

A. Makhzani, J. Shlens, N. Jaitly, and I. Goodfellow. Adversarial autoencoders. In International Conference on Learning Representations, 2016. URL http://arxiv.org/abs/1511.05644

[62]

D. Meng and H. Chen. MagNet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 135--147, 2017.

[63]

J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations, 2017a. URL https://openreview.net/forum?id=SJzCSf9xg

[64]

J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff. On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267, 2017b.

[65]

S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2574--2582, Los Alamitos, CA, USA, June 2016. IEEE Computer Society. doi:10.1109/CVPR.2016.282. URL https://doi.ieeecomputersociety.org/10.1109/CVPR.2016.282

[66]

Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527--566, 2017.

[67]

A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427--436, 2015.

[68]

W. Nie, B. Guo, Y. Huang, C. Xiao, A. Vahdat, and A. Anandkumar. Diffusion models for adversarial purification. In International Conference on Machine Learning (ICML), 2022.

[69]

N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372--387. IEEE, 2016a.

[70]

N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582--597. IEEE, 2016b.

[71]

J. S. Park, J. O'Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pages 1--22, 2023.

[72]

Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin. Variational autoencoder for deep learning of images, labels and captions. Advances in Neural Information Processing Systems, 29, 2016.

[73]

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. In M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Lear...

[74]

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models, 2022.

[75]

K. Roth, Y. Kilcher, and T. Hofmann. The odds are odd: A statistical test for detecting adversarial examples. In International Conference on Machine Learning, pages 5498--5507. PMLR, 2019.

[76]

S. Sadhu, D. He, C.-W. Huang, S. H. Mallidi, M. Wu, A. Rastrow, A. Stolcke, J. Droppo, and R. Maas. Wav2vec-C: A self-supervised model for speech representation learning. arXiv preprint arXiv:2103.08393, 2021.

[77]

H. Salman, M. Sun, G. Yang, A. Kapoor, and J. Z. Kolter. Denoised smoothing: A provable defense for pretrained classifiers. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS'20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546.

[78]

P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BkJ3ibb0-

[79]

M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510--4520, 2018.

[80]

C. Schuhmann, R. Beaumont, R. Vencu, C. W. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman, P. Schramowski, S. R. Kundurthy, K. Crowson, L. Schmidt, R. Kaczmarczyk, and J. Jitsev. LAION-5B: An open large-scale dataset for training next generation image-text models. In Thirty-sixth Conference on Neural Information Processing S...
