Robustness of Vision Foundation Models to Common Perturbations

Cheng Hong; Hongbin Liu; Neil Zhenqiang Gong; Zhengyuan Jiang

arXiv: 2604.14973 · v1 · submitted 2026-04-16 · cs.CR · cs.CV

Pith reviewed 2026-05-10 11:02 UTC · model grok-4.3

classification cs.CR · cs.CV

keywords vision foundation models · robustness metrics · image perturbations · embedding vectors · downstream tasks · fine-tuning · JPEG compression · contrast adjustment

The pith

Vision foundation models are generally non-robust to common perturbations such as JPEG compression and brightness or contrast adjustments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper conducts the first systematic evaluation of six large-scale vision foundation models on how their image embeddings respond to nine everyday editing operations. It introduces three new robustness metrics along with five mathematical properties those metrics should satisfy, then uses the metrics to measure embedding changes and connect them to real drops in downstream task performance. The work shows that these models from major providers are consistently sensitive to the perturbations and that the sensitivity can be reduced through targeted fine-tuning without harming the models' original capabilities. A reader would care because these embeddings power many practical applications, so fragility to routine image changes could make those applications unreliable in normal use.

Core claim

We present the first systematic study on foundation models' robustness to common perturbations that alter embedding vectors. We propose three robustness metrics and formulate five desired mathematical properties for these metrics, analyzing which properties they satisfy or violate. Using these metrics, we evaluate six industry-scale foundation models across nine common perturbation categories, finding them generally non-robust. We also show that common perturbations degrade downstream application performance and that robustness values can predict performance impacts. Finally, we propose a fine-tuning approach to improve robustness without sacrificing utility.

What carries the argument

Three robustness metrics, each checked against five mathematical properties, that quantify changes in embedding vectors caused by image perturbations.
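As a rough illustration of what "quantifying changes in embedding vectors" can look like, the sketch below computes the average cosine similarity between embeddings of clean images and embeddings of their perturbed versions. Cosine similarity is one of the quantities reported in the paper's figure captions; the function names and the toy vectors here are assumptions for illustration, and this summary does not give the paper's exact metric definitions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (sequences of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def average_similarity(clean_embeddings, perturbed_embeddings):
    """Mean cosine similarity over paired clean/perturbed embeddings."""
    pairs = list(zip(clean_embeddings, perturbed_embeddings))
    if not pairs:
        raise ValueError("need at least one pair of embeddings")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Toy vectors standing in for model outputs (hypothetical values, not real embeddings).
clean = [[1.0, 0.0], [0.0, 2.0]]
perturbed = [[1.0, 0.0], [2.0, 0.0]]
score = average_similarity(clean, perturbed)
print(score)
```

A score near 1 means the perturbation barely moved the embeddings in direction; lower scores indicate larger changes.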

If this is right

Downstream tasks experience measurable drops in accuracy when inputs undergo common perturbations.
The numerical robustness values can be used to predict the size of those accuracy drops.
Fine-tuning on perturbed examples raises robustness scores while leaving original task utility intact.
Models from different providers exhibit similar patterns of sensitivity across the nine perturbation types.
Embedding-based applications become less reliable unless robustness is explicitly addressed.
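The fine-tuning consequence in the list above can be made concrete with a small sketch. This summary does not describe the paper's actual objective, so the loss below is only one plausible shape for "improve robustness without sacrificing utility": a robustness term that pulls the embedding of a perturbed image toward the embedding of its clean version, plus a utility term that keeps the fine-tuned model's clean embedding close to the original (frozen) model's. The function names and the `utility_weight` parameter are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def finetune_loss(tuned_clean, tuned_perturbed, frozen_clean, utility_weight=1.0):
    """Hypothetical per-image loss; not taken from the paper.

    tuned_clean     -- fine-tuned model's embedding of the clean image
    tuned_perturbed -- fine-tuned model's embedding of the perturbed image
    frozen_clean    -- original model's embedding of the clean image
    """
    robustness_term = squared_distance(tuned_clean, tuned_perturbed)
    utility_term = squared_distance(tuned_clean, frozen_clean)
    return robustness_term + utility_weight * utility_term


# Toy embeddings (illustrative values only).
loss_a = finetune_loss([1.0, 0.0], [1.0, 1.0], [1.0, 0.0])
loss_b = finetune_loss([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], utility_weight=2.0)
print(loss_a, loss_b)
```

The weight trades off the two goals: a larger `utility_weight` keeps the model closer to its original behavior at the cost of smaller robustness gains.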

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Applications that rely on these embeddings may need to preprocess inputs or adopt the fine-tuning step to maintain consistent behavior.
The metrics could serve as a quick benchmark when comparing new training methods or model architectures for sensitivity to everyday image variation.
Persistent non-robustness might point to deeper limitations in how current training data and objectives handle natural image variability.

Load-bearing premise

The three proposed robustness metrics accurately capture how perturbations affect performance in actual downstream applications.

What would settle it

A new model that scores high on the proposed robustness metrics but still shows large drops in downstream accuracy when the same perturbations are applied would falsify the claim that the metrics track practical impact.
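One simple way to run this kind of check is to ask whether robustness scores and accuracy drops are monotonically related across models or perturbations. The sketch below computes a Spearman rank correlation in plain Python, assuming no tied values. The numbers are placeholders invented for illustration, not results from the paper, and the paper may use a different predictive analysis.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for equal-length sequences without ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rank_x, rank_y = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Placeholder numbers: higher robustness score paired with smaller accuracy drop.
robustness_scores = [0.9, 0.7, 0.5, 0.2]
accuracy_drops = [0.01, 0.05, 0.10, 0.30]
rho = spearman(robustness_scores, accuracy_drops)
print(rho)
```

A correlation close to -1 is what the claim predicts when higher scores mean more robust; a model with a high score and a large accuracy drop would pull the correlation toward zero.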

Figures

Figures reproduced from arXiv: 2604.14973 by Cheng Hong, Hongbin Liu, Neil Zhenqiang Gong, Zhengyuan Jiang.

**Figure 2.** An example to illustrate the minimum enclosing ball. The…

**Figure 3.** Comparing random sampling and equally-spaced sam…

**Figure 4.** Average DivergenceRadius of ImageNet testing images for different foundation models and perturbation functions.

**Figure 5.** Accuracy under perturbation ACCp vs. DivergenceRadius of ImageNet testing images for (a) zero-shot classification and (b) linear-probe classification when different perturbation functions are used. Zero-shot classification is based on the CLIP ViT-L/14 foundation model and linear-probe classification is based on the DINO v2 ViT-g/14 foundation model.

**Figure 6.** Accuracy under perturbation ACCp vs. DivergenceRadius of Food101 testing images for (a) zero-shot classification and (b) linear-probe classification when different perturbation functions are used. Zero-shot classification is based on the CLIP ViT-L/14 foundation model and linear-probe classification is based on the DINO v2 ViT-g/14 foundation model.

**Figure 7.** (a) Average cosine similarity, (b) average DivergenceRa…

**Figure 8.** Root mean squared error under perturbation…

**Figure 9.** Mean squared error of predicting an image’s…

**Figure 10.** Mean squared error of predicting an image’s…

**Figure 12.** Average cosine similarity of ImageNet testing images for different foundation models and perturbation functions.

**Figure 13.** Average DivergenceRadius of Food101 testing images for different foundation models and perturbation functions.

**Figure 14.** Average cosine similarity of Food101 testing images for different foundation models and perturbation functions.

**Figure 15.** Average DivergenceRadius of NYU-Depth V2 testing images for different foundation models and perturbation functions.

**Figure 16.** Average cosine similarity of images in NYU-Depth V2 for different foundation models and perturbation functions.

**Figure 17.** Accuracy under perturbation ACCp vs. cosine similarity of ImageNet and Food101 testing images for zero-shot classification and linear-probe classification when different perturbation functions are used. Zero-shot classification is based on the CLIP ViT-L/14 foundation model and linear-probe classification is based on the DINO v2 ViT-g/14 foundation model.
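Several captions refer to DivergenceRadius, and Figure 2 illustrates a minimum enclosing ball. The sketch below assumes, as a reading of those captions rather than a definition given in this summary, that the quantity of interest is the radius of the smallest ball enclosing the embeddings of an image's perturbed versions. It uses a simple Badoiu-Clarkson style iteration, which is an approximation and not the exact algorithm of Welzl cited by the paper. All names and the toy points are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def approx_enclosing_ball(points, iterations=1000):
    """Approximate the minimum enclosing ball of a set of points.

    Repeatedly moves the center a shrinking step toward the farthest point.
    Returns (center, radius); the radius is never below the optimal radius.
    """
    if not points:
        raise ValueError("need at least one point")
    center = list(points[0])
    for step in range(1, iterations + 1):
        farthest = max(points, key=lambda p: distance(center, p))
        center = [c + (f - c) / (step + 1) for c, f in zip(center, farthest)]
    radius = max(distance(center, p) for p in points)
    return center, radius


# Toy 2-D "embeddings" of one image under three perturbation strengths.
embeddings = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.5]]
center, radius = approx_enclosing_ball(embeddings)
print(center, radius)
```

For these toy points the exact ball has center (1, 0) and radius 1, and the iteration settles close to that. A larger radius would indicate that perturbations spread the embeddings further apart.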

Original abstract

A vision foundation model outputs an embedding vector for an image, which can be affected by common editing operations (e.g., JPEG compression, brightness, contrast adjustments). These common perturbations alter embedding vectors and may impact the performance of downstream tasks using these embeddings. In this work, we present the first systematic study on foundation models' robustness to such perturbations. We propose three robustness metrics and formulate five desired mathematical properties for these metrics, analyzing which properties they satisfy or violate. Using these metrics, we evaluate six industry-scale foundation models (OpenAI, Meta) across nine common perturbation categories, finding them generally non-robust. We also show that common perturbations degrade downstream application performance (e.g., classification accuracy) and that robustness values can predict performance impacts. Finally, we propose a fine-tuning approach to improve robustness without sacrificing utility.
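To make the notion of a parameterized common perturbation concrete, the sketch below applies brightness and contrast adjustments to a tiny grayscale image stored as nested lists, and builds an equally spaced grid of perturbation strengths (Figure 3 of the paper compares random and equally spaced sampling). The exact parameterization the authors use is not given in this summary; the formulas here follow the common convention of scaling intensities for brightness and scaling the deviation from the image mean for contrast, and every name is hypothetical.

```python
def clamp(value, low=0.0, high=255.0):
    """Keep an intensity inside the valid range."""
    return min(high, max(low, value))


def adjust_brightness(pixels, k):
    """Scale every intensity by k (k > 1 brightens, k < 1 darkens)."""
    return [[clamp(k * value) for value in row] for row in pixels]


def adjust_contrast(pixels, k):
    """Scale each intensity's deviation from the image mean by k."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [[clamp(mean + k * (value - mean)) for value in row] for row in pixels]


def equally_spaced(low, high, count):
    """Return `count` evenly spaced parameter values from low to high inclusive."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


image = [[0.0, 100.0], [200.0, 100.0]]  # toy 2x2 grayscale image
brighter = adjust_brightness(image, 1.5)
flatter = adjust_contrast(image, 0.5)
strengths = equally_spaced(0.5, 1.5, 5)
print(brighter, flatter, strengths)
```

Each strength in the grid yields one perturbed image, and the embeddings of those images are what a robustness metric would then compare.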

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims to perform the first systematic study on the robustness of vision foundation models to common perturbations. It proposes three robustness metrics and analyzes their satisfaction of five mathematical properties. The evaluation covers six industry-scale models from OpenAI and Meta across nine perturbation categories, concluding they are generally non-robust. It demonstrates that these perturbations degrade downstream application performance like classification accuracy, that the robustness metrics can predict such performance impacts, and proposes a fine-tuning approach to improve robustness without sacrificing utility.

Significance. If the results hold, the paper makes a significant contribution by identifying vulnerabilities in widely-used vision foundation models to everyday perturbations, which is important for applications relying on their embeddings. The analysis of mathematical properties for the new metrics and the empirical demonstration of their predictive power for downstream tasks are notable strengths. The proposed fine-tuning method adds practical value. This could encourage the community to prioritize robustness in model development.

minor comments (2)

The abstract outlines the contributions but would benefit from briefly specifying the exact number of models evaluated and perturbation categories to provide a more complete overview at a glance.
In the evaluation results, include error bars, standard deviations, or statistical significance tests for the robustness metric values and downstream performance degradations to better support the 'generally non-robust' conclusion and the predictive claims.
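The second comment asks for error bars. A minimal sketch of one way to produce them, assuming per-image metric values are available as a list of floats: report the mean together with the standard error of the mean. The scores below are placeholders, not values from the paper.

```python
import math
import statistics


def mean_with_standard_error(values):
    """Return (mean, standard error of the mean) for per-image metric values."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate spread")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Placeholder per-image robustness scores (illustrative only).
per_image_scores = [0.80, 0.90, 0.70, 0.80]
mean, sem = mean_with_standard_error(per_image_scores)
print(f"{mean:.3f} +/- {sem:.3f}")
```

Plotting the mean plus or minus one or two standard errors per model and perturbation would show whether differences between models exceed sampling noise.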

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work, recognition of its significance, and recommendation for minor revision. The referee's description accurately reflects the manuscript's contributions regarding robustness metrics, evaluation of vision foundation models, downstream impact analysis, and the proposed fine-tuning method. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; empirical study with independent metric validation

full rationale

The paper proposes three robustness metrics and five mathematical properties, then explicitly analyzes which properties each metric satisfies or violates. It evaluates six external foundation models on nine perturbation categories using direct measurements, demonstrates downstream task degradation (e.g., classification accuracy) via separate experiments, and shows observed correlations between robustness scores and performance drops. A fine-tuning method is proposed to improve robustness. No derivation reduces by construction to fitted parameters, self-definitions, or self-citation chains; all load-bearing claims rest on external data and explicit property checks rather than renaming or tautological prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are described; the work relies on standard empirical evaluation practices.

pith-pipeline@v0.9.0 · 5438 in / 1009 out tokens · 44913 ms · 2026-05-10T11:02:59.905261+00:00 · methodology

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1] Yutong Bai, Jieru Mei, Alan L. Yuille, and Cihang Xie. Are transformers more robust than CNNs? In NeurIPS, 2021.

[2] Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, and Andreas Veit. Understanding robustness of transformers for image classification. In ICCV, 2021.

[3] Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101 – mining discriminative components with random forests. In ECCV, 2014.

[4] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In S&P, 2017.

[5] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In ICML, 2020.

[6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.

[7] Lijie Fan, Sijia Liu, Pin-Yu Chen, Gaoyuan Zhang, and Chuang Gan. When does contrastive learning preserve adversarial robustness from pretraining to finetuning? 2021.

[8] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In CVPR, 2020.

[9] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In ICLR, 2019.

[10] Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. In ICCV, 2021.

[11] Jinyuan Jia, Yupei Liu, and Neil Zhenqiang Gong. BadEncoder: Backdoor attacks to pre-trained encoders in self-supervised learning. In S&P, 2022.

[12] Ziyu Jiang, Tianlong Chen, Ting Chen, and Zhangyang Wang. Robust pre-training by adversarial contrastive learning. In NeurIPS, 2020.

[13] Zhengyuan Jiang, Jinghuai Zhang, and Neil Zhenqiang Gong. Evading watermark based detection of AI-generated content. In CCS, 2023.

[14] Changjiang Li, Ren Pang, Zhaohan Xi, Tianyu Du, Shouling Ji, Yuan Yao, and Ting Wang. An embarrassingly simple backdoor attack on self-supervised learning. In ICCV, 2023.

[15] Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In ICML, 2022.

[16] Hongbin Liu, Jinyuan Jia, and Neil Zhenqiang Gong. PoisonedEncoder: Poisoning the unlabeled pre-training data in contrastive learning. In USENIX Security Symposium, 2022.

[17] Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from RGBD images. In ECCV, 2012.

[18] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv.

[19] Sayak Paul and Pin-Yu Chen. Vision transformers are robust learners. In AAAI, 2022.

[20] Wenjie Qu, Jinyuan Jia, and Neil Zhenqiang Gong. REaaS: Enabling adversarially robust downstream classifiers via robust encoder as a service. In NDSS, 2023.

[21] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In ICML, 2021.

[22] Aniruddha Saha, Ajinkya Tejankar, Soroush Abbasi Koohpayegani, and Hamed Pirsiavash. Backdoor attacks on self-supervised learning. In CVPR, 2022.

[23] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.

[24] Zeyu Wang, Yutong Bai, Yuyin Zhou, and Cihang Xie. Can CNNs be more robust than transformers? In ICLR, 2023.

[25] Emo Welzl. Smallest enclosing disks (balls and ellipsoids). In New Results and New Trends in Computer Science. Springer.

[26] Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Neil Zhenqiang Gong, Yue Zhang, et al. PromptBench: Towards evaluating the robustness of large language models on adversarial prompts. arXiv, 2023.

A. Impact Statements. This work strengthens the reliability of vision foundation models by systematically eval...
[27] Without loss of generality, we assume the subdomain $K'$ contains $n$ discrete values $k_1, \cdots, k_n$. Then, we have the following:

$$\sum_{i=1}^{n} f(P(x, k_i)) = 0. \tag{13}$$

Based on Equation 10, we have the following equation group:

$$\begin{cases} \|f(P(x, k_1))\|_2^2 - 2 f^T(P(x, k_1)) \cdot c + \|c\|_2^2 \le r^2 \\ \|f(P(x, k_2))\|_2^2 - 2 f^T(P(x, k_2)) \cdot c + \|c\|_2^2 \le r^2 \\ \cdots \\ \|f(P(x, k_n))\|_2 \ldots \end{cases}$$
