MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
Pith reviewed 2026-05-25 09:00 UTC · model grok-4.3
The pith
MirrorCheck detects adversarial attacks on vision-language models by regenerating images from their captions and checking embedding consistency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MirrorCheck is a model-agnostic detection framework that regenerates visual content from captions produced by the target vision-language model using text-to-image generators, then measures semantic consistency through feature-space embeddings between the original and synthesized images. Robustness against adaptive attacks is obtained by randomly selecting generators and encoders from a diverse set and by applying a one-time-use perturbation to the chosen encoder embeddings controlled by a scaling factor. Experiments across multiple threat models show that the method outperforms baseline defenses and continues to function under strong adaptive adversarial conditions in both unimodal and multimodal settings.
What carries the argument
The MirrorCheck detection step regenerates an image from the model's caption and compares its embedding with that of the original input. Stochastic model selection and a one-time perturbation on the embeddings strengthen this check against adaptive attacks.
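The regenerate-and-compare step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `caption_fn`, `generate_fn`, and `encode_fn` are hypothetical stand-ins for the target vision-language model, a text-to-image generator, and an image encoder. Cosine similarity and a fixed `threshold` are also assumed, since this page does not name the similarity measure or the decision rule.

```python
import math
from typing import Any, Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mirror_check(
    image: Any,
    caption_fn: Callable[[Any], str],      # hypothetical: target VLM captioner
    generate_fn: Callable[[str], Any],     # hypothetical: text-to-image generator
    encode_fn: Callable[[Any], Vector],    # hypothetical: image encoder
    threshold: float,                      # assumed decision threshold
) -> Tuple[bool, float]:
    """Return (flagged_as_adversarial, similarity_score) for one input image."""
    caption = caption_fn(image)
    regenerated = generate_fn(caption)
    score = cosine_similarity(encode_fn(image), encode_fn(regenerated))
    # Low similarity means the caption no longer describes the input image.
    return score < threshold, score
```

With toy stand-ins, an input whose caption matches its content scores high and passes, while an input whose caption has been steered elsewhere scores low and is flagged.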
If this is right
- Vision-language models can receive protection without any change to their weights or architecture.
- The same regeneration-plus-consistency test applies to both image-only and image-plus-text inputs.
- Random selection among multiple generators and encoders reduces the success rate of attacks planned against a fixed defense.
- The one-time perturbation on embeddings further limits an attacker's ability to optimize against the full detection pipeline.
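The two randomization ideas in the list above can be sketched as follows. This is an assumption-laden illustration: the page does not specify the sampling distribution over the model zoo, the noise distribution, or exactly where the perturbation enters, so uniform selection and additive Gaussian noise on the embedding vector are assumed here. The names `pick_components`, `otu_perturb`, and `scale` are hypothetical.

```python
import random
from typing import Any, List, Sequence, Tuple


def pick_components(
    generators: Sequence[Any],
    encoders: Sequence[Any],
    rng: random.Random,
) -> Tuple[Any, Any]:
    """Uniformly pick one generator and one encoder for this query (assumed uniform)."""
    return rng.choice(generators), rng.choice(encoders)


def otu_perturb(
    embedding: Sequence[float],
    scale: float,
    rng: random.Random,
) -> List[float]:
    """Add fresh noise to an embedding; the noise is drawn per call and never reused.

    `scale` plays the role of the scaling factor: 0.0 leaves the embedding unchanged.
    Gaussian noise is an assumption made for this sketch.
    """
    return [x + scale * rng.gauss(0.0, 1.0) for x in embedding]
```

Because the noise is redrawn on every call, an attacker who observes or estimates one perturbed embedding learns little about the next one, which is the intuition behind the "one-time" label.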
Where Pith is reading between the lines
- The regeneration step could be replaced by other cross-modal generators if text-to-image models are unavailable or too slow.
- Similar consistency checks might be useful for defending models that process audio or video by regenerating in another modality.
- The benefit of stochastic selection suggests that ensembles of diverse components can be a general way to harden detection methods.
- Practical deployment would require measuring the added latency from the regeneration step against the security gain.
Load-bearing premise
Semantic consistency measured in feature-space embeddings between the original image and the text-to-image regenerated image reliably signals the presence or absence of adversarial perturbations.
What would settle it
An adaptive attack that causes the vision-language model to produce an incorrect output while still making the regenerated image's embedding nearly identical to the original image's embedding would show the consistency check is not sufficient.
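The falsification condition above can be phrased as a filter over attack trials. The sketch below is hypothetical bookkeeping, not an attack: `AttackTrial`, `evading_trials`, and `threshold` are names introduced here for illustration, and "incorrect output" is simplified to "caption differs from the clean-image caption".

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AttackTrial:
    reference_caption: str   # caption produced for the clean image
    attacked_caption: str    # caption produced for the perturbed image
    similarity: float        # embedding similarity: perturbed input vs. regenerated image


def evading_trials(trials: Sequence[AttackTrial], threshold: float) -> List[AttackTrial]:
    """Return trials that both fool the model and pass the consistency check."""
    return [
        t for t in trials
        if t.attacked_caption != t.reference_caption and t.similarity >= threshold
    ]
```

A non-empty result at a realistic perturbation budget would be the kind of evidence that the consistency check alone is insufficient.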
Original abstract
Vision-Language Models (VLMs) are increasingly susceptible to sophisticated adversarial attacks, including adaptive strategies specifically designed to bypass existing defenses. To address this vulnerability, we propose MirrorCheck, a robust and model-agnostic detection framework that operates effectively in both unimodal and multimodal settings. MirrorCheck leverages Text-to-Image (T2I) models to regenerate visual content from captions produced by the target model and assesses semantic consistency by comparing feature-space embeddings between the original and synthesized images. To enhance robustness against adaptive attacks, MirrorCheck introduces a stochastic defense strategy that randomly selects T2I generators and image encoders from a diverse model zoo. Additionally, we incorporate a novel One-Time-Use (OTU) perturbation applied to the selected encoder embeddings, regulated by a scaling factor, which decreases the effectiveness of adaptive attacks. Extensive experiments across multiple threat scenarios demonstrate that MirrorCheck consistently outperforms baseline methods, and maintains its utility even under strong adaptive adversarial conditions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MirrorCheck, a model-agnostic adversarial detection framework for vision-language models. It regenerates images via text-to-image models from VLM captions, measures semantic consistency through feature-space embedding comparisons between original and regenerated images, employs stochastic selection of T2I generators and encoders from a model zoo, and adds a one-time-use (OTU) perturbation to embeddings controlled by a scaling factor. The abstract claims that MirrorCheck outperforms baselines and retains utility under strong adaptive attacks across multiple threat scenarios.
Significance. If the empirical claims hold with rigorous validation, the work could offer a practical, efficient defense for VLMs by leveraging regeneration and stochasticity to counter adaptive threats. The OTU perturbation and model-zoo randomization represent a concrete attempt to raise the attacker's optimization burden. However, the absence of quantitative results, error bars, or dataset details in the abstract makes it difficult to assess whether the central robustness claim is substantiated.
major comments (3)
- [Abstract] Abstract: the claim that MirrorCheck 'consistently outperforms baseline methods' and 'maintains its utility even under strong adaptive adversarial conditions' is unsupported by any quantitative results, error bars, dataset details, or threat-model specifications, preventing evaluation of the central empirical claim.
- [Experiments (adaptive attacks)] The adaptive-attack evaluation (implied in the threat scenarios) does not appear to jointly optimize over the stochastic T2I/encoder selection and the OTU perturbation term; if the reported attacks omit these random components, the outperformance may reflect incomplete threat modeling rather than intrinsic robustness.
- [Method (OTU perturbation)] The scaling factor regulating the OTU perturbation is described as a tunable regulator; without an explicit statement that it is fixed before seeing test data or an ablation showing sensitivity, post-hoc tuning cannot be ruled out and could inflate reported performance.
minor comments (1)
- [Abstract] The abstract does not name the specific VLMs, T2I models, datasets, or feature encoders used in the experiments.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive comments on our work. We address each of the major comments point by point below, providing clarifications based on the content of the full manuscript.
Point-by-point responses
- Referee: [Abstract] Abstract: the claim that MirrorCheck 'consistently outperforms baseline methods' and 'maintains its utility even under strong adaptive adversarial conditions' is unsupported by any quantitative results, error bars, dataset details, or threat-model specifications, preventing evaluation of the central empirical claim.
Authors: The abstract serves as a concise summary of the detailed experimental results presented in the main body of the paper. The full manuscript includes quantitative performance metrics, error bars from multiple runs, specific dataset descriptions (e.g., various VLM benchmarks), and explicit threat model specifications across different attack scenarios. We will revise the abstract to incorporate key quantitative highlights to better support the claims.
Revision: yes
- Referee: [Experiments (adaptive attacks)] The adaptive-attack evaluation (implied in the threat scenarios) does not appear to jointly optimize over the stochastic T2I/encoder selection and the OTU perturbation term; if the reported attacks omit these random components, the outperformance may reflect incomplete threat modeling rather than intrinsic robustness.
Authors: Our adaptive attack evaluations explicitly account for the stochastic components by performing attacks under the expectation over the random model selections from the zoo. The OTU perturbation is incorporated into the defense mechanism, and attackers are assumed to have knowledge of the defense strategy but must contend with the one-time-use nature and randomization, which significantly increases the optimization difficulty. Details of the threat modeling are provided in the experiments section.
Revision: no
- Referee: [Method (OTU perturbation)] The scaling factor regulating the OTU perturbation is described as a tunable regulator; without an explicit statement that it is fixed before seeing test data or an ablation showing sensitivity, post-hoc tuning cannot be ruled out and could inflate reported performance.
Authors: The scaling factor is determined using a separate validation set prior to any test evaluations and is held fixed throughout the experiments. We include an ablation study in the supplementary material that analyzes the sensitivity of performance to different values of this scaling factor, confirming the robustness of the chosen value.
Revision: partial
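The second exchange turns on what "attacking under the expectation over the random model selections" means in practice. One common reading is a Monte Carlo average of the detection score over sampled components and fresh noise, which an adaptive attacker would then try to push above the threshold. The sketch below illustrates only that averaging step and is not the authors' attack code: uniform sampling, Gaussian noise, cosine similarity, and every function and parameter name are assumptions.

```python
import math
import random
from typing import Any, Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expected_consistency(
    image: Any,
    caption_fn: Callable[[Any], str],                    # hypothetical target VLM
    generators: Sequence[Callable[[str], Any]],          # hypothetical T2I model zoo
    encoders: Sequence[Callable[[Any], Sequence[float]]],  # hypothetical encoder zoo
    scale: float,                                        # assumed OTU scaling factor
    n_samples: int,
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of the consistency score under the defense's randomness."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    total = 0.0
    for _ in range(n_samples):
        generate_fn = rng.choice(generators)   # sample a generator
        encode_fn = rng.choice(encoders)       # sample an encoder
        regenerated = generate_fn(caption_fn(image))
        # Fresh noise on each embedding for every sample (assumed additive Gaussian).
        emb_in = [v + scale * rng.gauss(0.0, 1.0) for v in encode_fn(image)]
        emb_out = [v + scale * rng.gauss(0.0, 1.0) for v in encode_fn(regenerated)]
        total += cosine_similarity(emb_in, emb_out)
    return total / n_samples
```

An evaluation that optimizes against a single fixed generator and encoder with `scale` set to zero would correspond to the incomplete threat model the referee describes; averaging over both sources of randomness is the stronger test.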
Circularity Check
Empirical defense method with no derivation chain or self-referential reductions
full rationale
The paper proposes MirrorCheck as an empirical detection framework relying on T2I regeneration, feature embedding comparison, stochastic model-zoo selection, and an OTU perturbation with a tunable scaling factor. No mathematical derivation, first-principles prediction, or uniqueness theorem is claimed; performance is evaluated via experiments across threat models. The scaling factor is described as a regulator, not a fitted parameter that defines results by construction. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the method description. The central claims rest on experimental outperformance rather than any input-equivalent reduction, making the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- scaling factor for OTU perturbation
axioms (1)
- domain assumption: Feature-space embeddings from image encoders capture semantic consistency between original and T2I-synthesized images.
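For the single free parameter in this ledger, the protocol the rebuttal describes (choose the scaling factor on validation data, freeze it, and report sensitivity) could look like the sketch below. `select_scale` and `validation_metric` are hypothetical names; the metric stands for whatever detection score is measured on held-out validation data and must never touch the test set.

```python
from typing import Callable, Dict, Sequence, Tuple


def select_scale(
    candidates: Sequence[float],
    validation_metric: Callable[[float], float],  # hypothetical: higher is better
) -> Tuple[float, Dict[float, float]]:
    """Pick the scaling factor with the best validation metric.

    Returns the chosen value plus the full candidate-to-metric table, which
    doubles as a sensitivity ablation.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    sensitivity = {s: validation_metric(s) for s in candidates}
    best = max(sensitivity, key=sensitivity.get)
    return best, sensitivity
```

Reporting the whole `sensitivity` table, rather than only the winner, is what lets a reader judge whether results depend on a narrowly tuned value.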
Forward citations
Cited by 1 Pith paper
- Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.