pith. sign in

arxiv: 2606.03730 · v2 · pith:EZREJC27new · submitted 2026-06-02 · 💻 cs.CV

Beyond False Stability: High-Noise Drift Gating for Test-Time Adversarial Defenses in Vision-Language Models

Pith reviewed 2026-06-28 11:01 UTC · model grok-4.3

classification 💻 cs.CV
keywords adversarial robustnesstest-time defensevision-language modelsCLIPfeature driftnoise perturbationclean-robust trade-offrepresentation stability
0
0 comments X

The pith

High-noise feature drift in CLIP representations serves as a gating signal to selectively trigger test-time adversarial defenses, improving the clean-robust accuracy trade-off.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that adversarial representations in CLIP become markedly more unstable than clean ones once perturbation strength exceeds the weak-noise regime, reversing an earlier false-stability effect and supplying a usable separation signal. This transition holds across noise types, transforms, datasets, and attacks, and largely vanishes after adversarial training. The authors therefore introduce a training-free, plug-in drift-gated mechanism that monitors high-noise feature drift and activates existing defenses only on inputs showing adversarial-like instability. The gating raises mean clean-plus-adversarial accuracy on eight fine-grained datasets from 65.7 percent to 71.4 percent for counterattack methods and from 68.4 percent to 73.2 percent for noise-anchoring, with similar gains on ImageNet and its shifts.

Core claim

In non-robust CLIP, adversarial representations exhibit markedly higher instability under high-noise perturbations than clean representations, in contrast to the false stability observed at weak noise; this geometry-dependent transition supplies a lightweight, training-free gating signal that selectively activates test-time defenses only when high-noise drift indicates adversarial-like behavior, producing consistent gains in the clean-robust trade-off across thirteen datasets and multiple attack families.

What carries the argument

Drift-gated mechanism: a lightweight check on high-noise feature drift that gates the application of existing test-time defenses when instability exceeds clean levels.

If this is right

  • The mechanism raises mean clean-plus-adversarial accuracy from 65.7 percent to 71.4 percent for counterattack defenses and from 68.4 percent to 73.2 percent for noise-anchoring on eight fine-grained datasets.
  • On ImageNet and four distribution shifts the same gating lifts accuracy from 56.1 percent to 66.2 percent and from 62.1 percent to 67.6 percent.
  • The separation signal disappears in adversarially trained models, indicating dependence on the fragile local-basin geometry of non-robust representations.
  • The transition is reproducible with both uniform and Gaussian noise as well as photometric and geometric transforms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same high-noise instability signal could be tested as a lightweight detector for other representation vulnerabilities beyond adversarial examples.
  • Selective gating may lower the average compute cost of always-on test-time defenses by skipping them on clean inputs.
  • If the geometry link holds, adversarially trained models may require different test-time strategies that do not rely on drift separation.

Load-bearing premise

The stability reversal from false stability at low noise to clear adversarial instability at high noise remains consistent across noise distributions, image transforms, datasets, and attack methods in non-robust CLIP models.

What would settle it

An experiment on a held-out VLM or dataset in which high-noise drift fails to separate clean from adversarial examples more reliably than low-noise drift, or in which the gating step produces no net gain in clean-plus-adversarial accuracy.

Figures

Figures reproduced from arXiv: 2606.03730 by Hashmat Shadab Malik, Muzammal Naseer, Salman Khan.

Figure 1
Figure 1. Figure 1: Clean (blue), Adversarial (yellow; PGD-100, ε=4/255), and Average accuracy (green; (clean+adversarial)/2) for test-time defense methods, averaged across eight fine￾grained datasets. Current methods [43, 50] employ a low-strength random-noise (RN) base￾line at noise strength similar to the attack (ε=4/255), which provides no effective robust￾ness. However, applying high-strength random noise (RN, ε=48/255) … view at source ↗
Figure 2
Figure 2. Figure 2: Mean latent drift (τ) versus uniform noise strength (ε) for clean and adversarial samples across eight fine-grained datasets. Under weak noise, adversarial samples exhibit slightly lower drift than clean samples (false stability). As noise strength increases, the curves cross and adversarial drift becomes substantially larger than clean drift, yielding a reliable high-noise separation signal. between clean… view at source ↗
Figure 3
Figure 3. Figure 3: Mean latent drift (τ) under Gaussian noise(left) and diverse stochastic transfor￾mations (right), averaged across eight fine-grained datasets. The left panel shows the same regime transition as uniform noise, moving from low-noise false stability to high-noise sep￾arability. The right panel confirms that this separation is not specific to additive noise: it also appears under high-strength photometric, geo… view at source ↗
Figure 4
Figure 4. Figure 4: Mean latent drift (τ) under diverse transformations, averaged across eight fine￾grained datasets. Each subplot corresponds to a single transformation, with transformation severity increasing from left to right. For clarity, the exact transformation strengths are omitted and only the general progression toward stronger transformation is shown. Adver￾sarial samples exhibit progressively larger latent drift t… view at source ↗
Figure 5
Figure 5. Figure 5: Top: PCA projection of CLIP visual features. Adversarial samples (blue dots) are displaced far from the origin into isolated local basins. As uniform noise strength increases (ε ∈ {4,8,16,48}/255), mean positions of adversarial-plus-noise features migrate progres￾sively toward the clean-plus-noise cluster, tracing the escape trajectory back toward the clean semantic manifold. Bottom-left and center: Distri… view at source ↗
Figure 7
Figure 7. Figure 7: Mean latent drift under uniform noise for a non-robust CLIP model under image [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Average performance across eight fine-grained datasets under stronger PGD at [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: ViT-L-14(DataComp-1B) at PGD-100 (ε = 4 100 ). Mean latent drift (τ) versus uni￾form noise strength (ε) for clean and adversarial samples across eight fine-grained datasets. Under weak noise, adversarial samples exhibit slightly lower drift than clean samples (false stability). As noise strength increases, the curves cross and adversarial drift becomes sub￾stantially larger than clean drift, yielding a rel… view at source ↗
Figure 10
Figure 10. Figure 10: ViT-L-14(DataComp-1B) at PGD-100 (ε = 8 100 ). Mean latent drift (τ) ver￾sus uniform noise strength (ε) for clean and adversarial samples across eight fine-grained datasets. Under weak noise, adversarial samples exhibit slightly lower drift than clean sam￾ples (false stability). As noise strength increases, the curves cross and adversarial drift be￾comes substantially larger than clean drift, yielding a r… view at source ↗
Figure 11
Figure 11. Figure 11: ViT-L-14(DataComp-1B) at PGD-100 (ε = 4 100 ). Mean latent drift (τ) ver￾sus gaussian noise strength (σ) for clean and adversarial samples across eight fine-grained datasets. Under weak noise, adversarial samples exhibit slightly lower drift than clean sam￾ples (false stability). As noise strength increases, the curves cross and adversarial drift be￾comes substantially larger than clean drift, yielding a … view at source ↗
Figure 12
Figure 12. Figure 12: ViT-L-14(DataComp-1B) at PGD-100 (ε = 8 100 ). Mean latent drift (τ) ver￾sus gaussian noise strength (σ) for clean and adversarial samples across eight fine-grained datasets. Under weak noise, adversarial samples exhibit slightly lower drift than clean sam￾ples (false stability). As noise strength increases, the curves cross and adversarial drift be￾comes substantially larger than clean drift, yielding a … view at source ↗
Figure 13
Figure 13. Figure 13: ViT-L-14(DataComp-1B) at PGD-100 (ε = X 255 ). Mean latent drift (τ) versus uniform noise strength for clean and adversarial samples averaged across eight fine-grained datasets. Adversarial examples are generated at different perturbation budgets and using two attack objectives: the standard objective that maximizes cross-entropy loss, and a vision￾only objective (Img) that maximizes the discrepancy in vi… view at source ↗
Figure 14
Figure 14. Figure 14: ViT-L-14(DataComp-1B) at PGD-100 (ε = X 255 ). Mean latent drift (τ) versus gaussian noise strength for clean and adversarial samples averaged across eight fine-grained datasets. Adversarial examples are generated at different perturbation budgets and using two attack objectives: the standard objective that maximizes cross-entropy loss, and a vision￾only objective (Img) that maximizes the discrepancy in v… view at source ↗
Figure 15
Figure 15. Figure 15: ViT-L-14(DataComp-1B) at PGD-100 (ε = 4 255 ). Mean latent drift (τ) versus stochastic transformations [1, 37] for clean and adversarial samples across eight fine-grained datasets. Under different transformations, adversarial drift becomes substantially larger than clean drift, yielding a reliable high separation signal [PITH_FULL_IMAGE:figures/full_fig_p029_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: ViT-L-14(DataComp-1B) at PGD-100 (ε = 4 100 ). Mean latent drift (τ) versus uniform noise strength (ε) for clean and adversarial samples across ImageNet and its four out of distribution datasets. Under weak noise, adversarial samples exhibit slightly lower drift than clean samples (false stability). As noise strength increases, the curves cross and adversarial drift becomes substantially larger than clean… view at source ↗
Figure 17
Figure 17. Figure 17: ViT-L-14(DataComp-1B) at PGD-100 (ε = 4 100 ). Mean latent drift (τ) versus gaussian noise strength (σ) for clean and adversarial samples across ImageNet and its four out of distribution datasets. Under weak noise, adversarial samples exhibit slightly lower drift than clean samples (false stability). As noise strength increases, the curves cross and adversarial drift becomes substantially larger than clea… view at source ↗
Figure 18
Figure 18. Figure 18: ViT-L-14 at PGD-100 (ε = 4 100 ). Mean latent drift (τ) versus uniform noise strength (σ) for clean and adversarial samples across eight fine-grained datasets. Under weak noise, adversarial samples exhibit slightly lower drift than clean samples (false stability). As noise strength increases, the curves cross and adversarial drift becomes substantially larger than clean drift, yielding a reliable high-noi… view at source ↗
Figure 19
Figure 19. Figure 19: ViT-L-14 at PGD-100 (ε = 4 100 ). Mean latent drift (τ) versus gaussian noise strength (σ) for clean and adversarial samples across eight fine-grained datasets. Under weak noise, adversarial samples exhibit slightly lower drift than clean samples (false stability). As noise strength increases, the curves cross and adversarial drift becomes substantially larger than clean drift, yielding a reliable high-no… view at source ↗
Figure 20
Figure 20. Figure 20: ViT-L-14 at PGD-100 (ε = X 255 ). Mean latent drift (τ) versus uniform noise strength for clean and adversarial samples averaged across eight fine-grained datasets. Ad￾versarial examples are generated at different perturbation budgets and using two attack ob￾jectives: the standard objective that maximizes cross-entropy loss, and a vision-only objec￾tive (Img) that maximizes the discrepancy in visual featu… view at source ↗
Figure 21
Figure 21. Figure 21: ViT-L-14 at PGD-100 (ε = X 255 ). Mean latent drift (τ) versus gaussian noise strength for clean and adversarial samples averaged across eight fine-grained datasets. Ad￾versarial examples are generated at different perturbation budgets and using two attack ob￾jectives: the standard objective that maximizes cross-entropy loss, and a vision-only objec￾tive (Img) that maximizes the discrepancy in visual feat… view at source ↗
Figure 22
Figure 22. Figure 22: FARE [35] at PGD-100 (ε = 4 100 ). Mean latent drift (τ) versus uniform noise strength (σ) for clean and adversarial samples across eight fine-grained datasets. In contrast to standard CLIP models, clean and adversarial samples exhibit closely aligned drift curves across perturbation strengths, with little pronounced separation. This behavior is consistent with adversarial training, which explicitly encou… view at source ↗
Figure 23
Figure 23. Figure 23: FARE [35] at PGD-100 (ε = 4 100 ). Mean latent drift (τ) versus gaussian noise strength (σ) for clean and adversarial samples across eight fine-grained datasets.In contrast to standard CLIP models, clean and adversarial samples exhibit closely aligned drift curves across perturbation strengths, with little pronounced separation. This behavior is consistent with adversarial training, which explicitly encou… view at source ↗
Figure 24
Figure 24. Figure 24: FARE [35] at PGD-100 (ε = X 255 ). Mean latent drift (τ) versus uniform noise strength for clean and adversarial samples averaged across eight fine-grained datasets. Ad￾versarial examples are generated at different perturbation budgets and using two attack ob￾jectives: the standard objective that maximizes cross-entropy loss, and a vision-only objec￾tive (Img) that maximizes the discrepancy in visual feat… view at source ↗
Figure 25
Figure 25. Figure 25: FARE [35] at PGD-100 (ε = X 255 ). Mean latent drift (τ) versus gaussian noise strength for clean and adversarial samples averaged across eight fine-grained datasets. Ad￾versarial examples are generated at different perturbation budgets and using two attack ob￾jectives: the standard objective that maximizes cross-entropy loss, and a vision-only objec￾tive (Img) that maximizes the discrepancy in visual fea… view at source ↗
Figure 26
Figure 26. Figure 26: DeltaCLIP-L [47] at PGD-100 (ε = 4 100 ). Mean latent drift (τ) versus uniform noise strength (σ) for clean and adversarial samples across eight fine-grained datasets. In contrast to standard CLIP models, clean and adversarial samples exhibit closely aligned drift curves across perturbation strengths, with little pronounced separation. This behavior is consistent with adversarial training, which explicitl… view at source ↗
Figure 27
Figure 27. Figure 27: DeltaCLIP-L [47] at PGD-100 (ε = 4 100 ). Mean latent drift (τ) versus gaussian noise strength (σ) for clean and adversarial samples across eight fine-grained datasets. In contrast to standard CLIP models, clean and adversarial samples exhibit closely aligned drift curves across perturbation strengths, with little pronounced separation. This behavior is consistent with adversarial training, which explicit… view at source ↗
Figure 28
Figure 28. Figure 28: DeltaCLIP-L [47] at PGD-100 (ε = X 255 ). Mean latent drift (τ) versus uniform noise strength for clean and adversarial samples averaged across eight fine-grained datasets. Adversarial examples are generated at different perturbation budgets and using two attack objectives: the standard objective that maximizes cross-entropy loss, and a vision-only ob￾jective (Img) that maximizes the discrepancy in visual… view at source ↗
Figure 29
Figure 29. Figure 29: DeltaCLIP-L [47] at PGD-100 (ε = X 255 ). Mean latent drift (τ) versus gaussian noise strength for clean and adversarial samples averaged across eight fine-grained datasets. Adversarial examples are generated at different perturbation budgets and using two attack objectives: the standard objective that maximizes cross-entropy loss, and a vision-only ob￾jective (Img) that maximizes the discrepancy in visua… view at source ↗
Figure 30
Figure 30. Figure 30: ViT-L-14(DataComp-1B). Clean and adversarial accuracy versus uniform noise strength, averaged across eight fine-grained datasets. Adversarial examples (PGD-100) are generated at different perturbation budgets and with two attack objectives: maximizing cross-entropy loss and maximizing the discrepancy in visual features (Img). Across all set￾tings, stronger noise leads to clear improvements in adversarial … view at source ↗
Figure 31
Figure 31. Figure 31: ViT-L-14(DataComp-1B). Clean and adversarial accuracy versus Gaussian noise strength, averaged across eight fine-grained datasets. Adversarial examples (PGD-100) are generated at different perturbation budgets and with two attack objectives: maximizing cross-entropy loss and maximizing the discrepancy in visual features (Img). Across settings, moderate Gaussian noise improves adversarial accuracy, whereas… view at source ↗
Figure 33
Figure 33. Figure 33: Performance of drift-gated stochastic intervention using Gaussian noise. Clean (blue), adversarial (red; PGD-100, ε = 4/255), and average accuracy (purple) are reported after selectively injecting Gaussian noise into the input. Each plot corresponds to a different level of Gaussian noise strength (σ ∈ {0.12,0.18,0.24}) injected onto the samples. For every intervention setting, the probe noise used to comp… view at source ↗
Figure 34
Figure 34. Figure 34: ViT-L-14. Clean and adversarial accuracy versus uniform noise strength, aver￾aged across eight fine-grained datasets. Adversarial examples (PGD-100) are generated at different perturbation budgets and with two attack objectives: maximizing cross-entropy loss and maximizing the discrepancy in visual features (Img). Across all settings, stronger noise leads to clear improvements in adversarial accuracy, whi… view at source ↗
Figure 35
Figure 35. Figure 35: ViT-L-14. Clean and adversarial accuracy versus Gaussian noise strength, av￾eraged across eight fine-grained datasets. Adversarial examples (PGD-100) are generated at different perturbation budgets and with two attack objectives: maximizing cross-entropy loss and maximizing the discrepancy in visual features (Img). Across settings, moderate Gaussian noise improves adversarial accuracy, whereas stronger no… view at source ↗
Figure 36
Figure 36. Figure 36: FARE [35]. Clean and adversarial accuracy versus uniform noise strength, av￾eraged across eight fine-grained datasets. Adversarial examples (PGD-100) are generated at different perturbation budgets and with two attack objectives: maximizing cross-entropy loss and maximizing the discrepancy in visual features (Img). In contrast to standard CLIP models, noise injection does not improve adversarial accuracy;… view at source ↗
Figure 37
Figure 37. Figure 37: FARE [35]. Clean and adversarial accuracy versus Gaussian noise strength, av￾eraged across eight fine-grained datasets. Adversarial examples (PGD-100) are generated at different perturbation budgets and with two attack objectives: maximizing cross-entropy loss and maximizing the discrepancy in visual features (Img). In contrast to standard CLIP models, noise injection does not improve adversarial accuracy… view at source ↗
Figure 38
Figure 38. Figure 38: DeltaCLIP-L [47]. Clean and adversarial accuracy versus uniform noise strength, averaged across eight fine-grained datasets. Adversarial examples (PGD-100) are generated at different perturbation budgets and with two attack objectives: maximiz￾ing cross-entropy loss and maximizing the discrepancy in visual features (Img). In contrast to standard CLIP models, noise injection does not improve adversarial ac… view at source ↗
Figure 39
Figure 39. Figure 39: DeltaCLIP-L [47]. Clean and adversarial accuracy versus Gaussian noise strength, averaged across eight fine-grained datasets. Adversarial examples (PGD-100) are generated at different perturbation budgets and with two attack objectives: maximiz￾ing cross-entropy loss and maximizing the discrepancy in visual features (Img). In contrast to standard CLIP models, noise injection does not improve adversarial a… view at source ↗
Figure 40
Figure 40. Figure 40: Standard TTC [50] evalua￾tion averaged across eight fine-grained datasets. Clean (blue) and adversarial (red; PGD-100, ε = 4/255) accuracy are shown as a function of the false-stability threshold τTTC, below which the counter￾attack is triggered. The gating signal is computed under weak uniform probe noise ε ∈ {1,2,4}/255, following the original TTC protocol [50]. The best clean–robust trade-off is achiev… view at source ↗
Figure 41
Figure 41. Figure 41: Ours: Drift-gated TTC av￾eraged across eight fine-grained datasets using uniform probe noise. Clean (blue) and adversarial (red; PGD-100, ε = 4 /255) accuracy are shown as a function of the drift threshold γ, above which the counter￾attack is triggered. Unlike TTC’s weak￾noise false-stability trigger, our gating signal is computed under high-noise uni￾form probe strengths ε ∈ {24,32,40}/255, providing a m… view at source ↗
Figure 42
Figure 42. Figure 42: Ours: Drift-gated TTC av￾eraged across eight fine-grained datasets using Gaussian probe noise. Clean (blue) and adversarial (red; PGD-100, ε = 4/255) accuracy are shown as a function of the drift threshold γ, above which the counter￾attack is triggered. Unlike TTC’s weak￾noise false-stability trigger, our gating sig￾nal is computed under high-noise Gaussian probe strengths σ ∈ {0.06,0.12,0.18}, providing … view at source ↗
Figure 43
Figure 43. Figure 43: Ours: Drift-gated TTC aver￾aged across ImageNet and its four out-of￾distribution variants using uniform probe noise. Clean (blue) and adversarial (red; PGD-100, ε = 4 /255) accuracy are shown as a function of the drift threshold γ, above which the counterattack is trig￾gered. The gating signal is computed under high-noise uni￾form probe strengths ε ∈ {24,32,40}/255. As γ increases, TTC is applied more se￾… view at source ↗
Figure 44
Figure 44. Figure 44: Ours: Drift-gated TTC averaged across Im￾ageNet and its four out-of-distribution variants using Gaus￾sian probe noise. Clean (blue), adver￾sarial (red; PGD-100, ε = 4/255), and average (purple) ac￾curacy are shown as a function of the drift threshold γ, above which the counter￾attack is triggered. The gating signal is computed under high-noise Gaussian probe strengths σ ∈ {0.06,0.12,0.18}. As γ increases,… view at source ↗
Figure 45
Figure 45. Figure 45: Evaluation averaged across eight fine-grained datasets. Performance of our mean latent-drift–gated AOM [43]. Average accuracy is reported as a function of the latent drift threshold τ. Uniform noisy anchors with different perturbation strengths are used for feature interpolation, while the type and strength of noise used to compute mean latent drift are specified in each figure. A threshold τ = 0 correspo… view at source ↗
Figure 46
Figure 46. Figure 46: Evaluation averaged across eight fine-grained datasets. Performance of our mean latent-drift–gated AOM [43]. Average accuracy is reported as a function of the latent drift threshold τ. Gaussian noisy anchors with different perturbation strengths are used for feature interpolation, while the type and strength of noise used to compute mean latent drift are specified in each figure. A threshold τ = 0 corresp… view at source ↗
Figure 47
Figure 47. Figure 47: Evaluation averaged across ImageNet and its four out-of-distribution variants. Perfor￾mance of our mean latent-drift–gated AOM [43]. Average accuracy is reported as a function of the latent drift threshold τ. Uniform and gaussian noisy anchors are used for feature interpolation, while the type and strength of noise used to compute mean latent drift are specified in each figure. A threshold τ = 0 correspon… view at source ↗
read the original abstract

Vision-language models (VLMs) such as CLIP show strong zero-shot generalization but remain highly vulnerable to adversarial attacks. Adversarial training improves robustness but is computationally expensive, motivating test-time defenses. Recent approaches exploit how CLIP's visual representations respond to stochastic perturbations: aggregating predictions across noisy views, constructing Gaussian noise-averaged anchors and interpolating features toward them, or applying counter-perturbations. These strategies improve robustness but often degrade clean accuracy, yielding an unfavorable clean-robust trade-off. We revisit stochastic test-time defenses and identify an underexplored noise-regime transition in CLIP's representation space. Prior work explored perturbations mainly in the weak-noise regime, where adversarial examples can appear unusually stable (false stability). Our analysis shows this reverses as perturbation strength grows: beyond the weak-noise regime, adversarial representations become markedly more unstable than clean ones, giving a clearer separation signal. The transition is consistent across uniform and Gaussian noise, photometric and geometric transforms, datasets, and diverse attacks. It largely disappears in adversarially trained models, suggesting it is tied to the fragile local-basin geometry of adversarial representations in non-robust CLIP. We propose a training-free, plug-in drift-gated mechanism that uses high-noise feature drift as a lightweight gating signal to trigger existing test-time defenses only when adversarial-like instability is detected. Across 13 datasets it consistently improves the clean-robust trade-off. On eight fine-grained datasets, mean clean+adversarial accuracy rises from 65.7% to 71.4% for counterattack defenses and 68.4% to 73.2% for noise-anchoring; on ImageNet and four shifted variants, from 56.1% to 66.2% and 62.1% to 67.6%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that CLIP-style VLMs exhibit a noise-regime transition in which adversarial representations become markedly more unstable than clean ones beyond the weak-noise regime (consistent across noise types, transforms, datasets, and attacks), and that this transition largely vanishes under adversarial training. It introduces a training-free drift-gated mechanism that uses high-noise feature drift as a gating signal to selectively trigger existing test-time defenses (counterattack and noise-anchoring), reporting mean clean+adversarial accuracy gains from 65.7% to 71.4% and 68.4% to 73.2% on eight fine-grained datasets and from 56.1% to 66.2% / 62.1% to 67.6% on ImageNet and shifts.

Significance. If the transition is reproducible with a precisely defined instability metric and the gating improves the trade-off without implicit ensembling or post-hoc regime selection, the work supplies a lightweight, plug-in enhancement to stochastic test-time defenses that exploits the local-basin geometry of non-robust CLIP representations. The cross-condition consistency and disappearance under adversarial training would be a useful empirical observation for the field.

major comments (3)
  1. [Abstract; §3 (methods)] The abstract and methods description assert a 'sharp, reproducible increase' in instability and a 'consistent' transition across noise types, transforms, datasets, and attacks, yet supply neither the exact instability metric (e.g., cosine distance after N high-noise views, variance of normalized features) nor the numerical noise-strength boundary separating regimes. This is load-bearing for the central claim and for verifying that the reported accuracy lifts are attributable to the gating signal rather than extra forward passes.
  2. [§4 (experiments); Table 1] Table 1 / §4 results report aggregate mean clean+adversarial accuracy improvements (65.7% → 71.4%, 68.4% → 73.2%) but do not include per-dataset clean vs. adversarial breakdowns, standard deviations, or an ablation isolating the gating decision from the base defense. Without these, it is impossible to confirm that the mechanism triggers only on adversarial-like instability or that the gains are not artifacts of implicit ensembling.
  3. [§3.3; §4.3] The claim that the stability transition 'largely disappears in adversarially trained models' is used to attribute the phenomenon to fragile local-basin geometry, but no quantitative comparison (drift curves or tables) of AT vs. non-AT models is referenced. This evidence is required to support the geometric interpretation that underpins the gating design.
minor comments (2)
  1. [§3.2] Notation for the drift signal (e.g., which layer(s), normalization, exact aggregation) should be formalized with an equation in §3.2 to aid reproducibility.
  2. [Abstract] The abstract would be clearer if it stated the precise instability measure and the number of high-noise views used for gating.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the referee's insightful comments. We address each major point below and commit to revisions that enhance clarity and completeness of the manuscript.

read point-by-point responses
  1. Referee: [Abstract; §3 (methods)] The abstract and methods description assert a 'sharp, reproducible increase' in instability and a 'consistent' transition across noise types, transforms, datasets, and attacks, yet supply neither the exact instability metric (e.g., cosine distance after N high-noise views, variance of normalized features) nor the numerical noise-strength boundary separating regimes. This is load-bearing for the central claim and for verifying that the reported accuracy lifts are attributable to the gating signal rather than extra forward passes.

    Authors: We thank the referee for highlighting this. The instability metric is the average cosine distance between the clean feature and features from K high-noise perturbed views (K=10 in our experiments). The regime transition is observed starting at noise strength corresponding to epsilon=0.25 for L-infty attacks or sigma=0.4 for Gaussian. We will explicitly define the metric with the formula in §3 and report the boundary values used for gating in the revised manuscript to ensure reproducibility. revision: yes

  2. Referee: [§4 (experiments); Table 1] Table 1 / §4 results report aggregate mean clean+adversarial accuracy improvements (65.7% → 71.4%, 68.4% → 73.2%) but do not include per-dataset clean vs. adversarial breakdowns, standard deviations, or an ablation isolating the gating decision from the base defense. Without these, it is impossible to confirm that the mechanism triggers only on adversarial-like instability or that the gains are not artifacts of implicit ensembling.

    Authors: We agree these details are important. In the revision, we will expand Table 1 to include per-dataset results with clean and adversarial accuracies separately, report standard deviations across 3 random seeds, and add an ablation study comparing the drift-gated approach to the base defense, always-trigger, and random-trigger baselines to demonstrate that gains stem from selective gating rather than ensembling. revision: yes

  3. Referee: [§3.3; §4.3] The claim that the stability transition 'largely disappears in adversarially trained models' is used to attribute the phenomenon to fragile local-basin geometry, but no quantitative comparison (drift curves or tables) of AT vs. non-AT models is referenced. This evidence is required to support the geometric interpretation that underpins the gating design.

    Authors: The manuscript references experiments showing the transition vanishes under AT, but we concur that explicit quantitative evidence is needed. We will include a new figure in the revision depicting drift vs. noise strength curves for both standard CLIP and adversarially trained variants, along with a table summarizing the difference in instability at high-noise regimes. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical observation of drift transition with direct-measurement gating

full rationale

The paper's core contribution is an empirical identification of a high-noise instability transition in CLIP representations, followed by a training-free gating rule that triggers existing defenses when measured feature drift exceeds an (unspecified) threshold. The abstract and description contain no equations, fitted parameters, or self-citations whose load-bearing premise reduces the reported accuracy gains (65.7 % → 71.4 %, etc.) to a quantity defined by the method itself. The gating signal is constructed from observable cosine or Euclidean drift between clean and high-noise views, which is independent of the downstream defense performance it modulates. No self-definitional, fitted-input, or uniqueness-imported steps appear.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Central claim rests on the empirical generality of the high-noise instability transition and its link to local-basin geometry; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption The observed stability transition is tied to the fragile local-basin geometry of adversarial representations in non-robust CLIP.
    Invoked to explain why the transition largely disappears in adversarially trained models.

pith-pipeline@v0.9.1-grok · 5872 in / 1186 out tokens · 37724 ms · 2026-06-28T11:01:21.416588+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

60 extracted references · 16 canonical work pages · 5 internal anchors

  1. [1]

    Align your prompts: Test-time prompting with distribution alignment for zero-shot generalization.Advances in Neural Information Processing Systems, 36:80396–80413, 2023

    Jameel Abdul Samadh, Mohammad Hanan Gani, Noor Hussein, Muhammad Uzair Khattak, Muhammad Muzammal Naseer, Fahad Shahbaz Khan, and Salman H Khan. Align your prompts: Test-time prompting with distribution alignment for zero-shot generalization.Advances in Neural Information Processing Systems, 36:80396–80413, 2023

  2. [2]

    Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

    Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. InInternational conference on machine learning, pages 274–283. PMLR, 2018

  3. [3]

    Semantic- enhanced image clustering

    Shaotian Cai, Liping Qiu, Xiaojun Chen, Qin Zhang, and Longteng Chen. Semantic- enhanced image clustering. InProceedings of the AAAI conference on artificial intelli- gence, volume 37, pages 6869–6878, 2023

  4. [4]

    Towards evaluating the robustness of neural net- works

    Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural net- works. In2017 ieee symposium on security and privacy (sp), pages 39–57. Ieee, 2017. 16MALIK ET.AL: BEYOND FALSE STABILITY

  5. [5]

    Describing textures in the wild

    Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3606–3613, 2014

  6. [6]

    Certified adversarial robustness via randomized smoothing

    Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. Ininternational conference on machine learning, pages 1310–

  7. [7]

    Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

    Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. InInternational conference on machine learning, pages 2206–2216. PMLR, 2020

  8. [8]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009

  9. [9]

    Boosting adversarial attacks with momentum

    Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jian- guo Li. Boosting adversarial attacks with momentum. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 9185–9193, 2018

  10. [10]

    Adversarial robustness as a prior for learned representations

    Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Brandon Tran, and Aleksander Madry. Adversarial robustness as a prior for learned representations. arXiv preprint arXiv:1906.00945, 2019

  11. [11]

    Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories

    Li Fei-Fei, Rob Fergus, and Pietro Perona. Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. In2004 Conference on Computer Vision and Pattern Recognition Workshop, pages 178–178. IEEE, 2004

  12. [12]

    Diverse data augmentation with diffusions for effective test-time prompt tuning

    Chun-Mei Feng, Kai Yu, Yong Liu, Salman Khan, and Wangmeng Zuo. Diverse data augmentation with diffusions for effective test-time prompt tuning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2704–2714, 2023

  13. [13]

    Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification

    Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(7):2217–2226, 2019

  14. [14]

    The many faces of robustness: A critical analysis of out-of-distribution generalization

    Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 8340–8349, 2021

  15. [15]

    Natural adversarial examples

    Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15262–15271, 2021

  16. [16]

    As firm as their foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?arXiv preprint arXiv:2403.12693, 2024

    Anjun Hu, Jindong Gu, Francesco Pinto, Konstantinos Kamnitsas, and Philip Torr. As firm as their foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?arXiv preprint arXiv:2403.12693, 2024. MALIK ET.AL: BEYOND FALSE STABILITY17

  17. [17]

    Scaling up visual and vision-language representation learning with noisy text supervision

    Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. InInternational conference on machine learning, pages 4904–4916. PMLR, 2021

  18. [18]

    3d object representations for fine-grained categorization

    Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3d object representations for fine-grained categorization. InProceedings of the IEEE International Conference on Computer Vision Workshops, pages 554–561, 2013

  19. [19]

    Certified Robustness to Adversarial Examples with Differential Privacy

    Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy.arXiv preprint arXiv:1802.03471, 2018

  20. [20]

    Language- driven anchors for zero-shot adversarial robustness

    Xiao Li, Wei Zhang, Yining Liu, Zhanhao Hu, Bo Zhang, and Xiaolin Hu. Language- driven anchors for zero-shot adversarial robustness. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24686–24695, 2024

  21. [21]

    Image clustering with external guidance.arXiv preprint arXiv:2310.11989, 2023

    Yunfan Li, Peng Hu, Dezhong Peng, Jiancheng Lv, Jianping Fan, and Xi Peng. Image clustering with external guidance.arXiv preprint arXiv:2310.11989, 2023

  22. [22]

    Clips: An enhanced clip framework for learning with synthetic captions.arXiv preprint arXiv:2411.16828, 2024

    Yanqing Liu, Xianhang Li, Zeyu Wang, Bingchen Zhao, and Cihang Xie. Clips: An enhanced clip framework for learning with synthetic captions.arXiv preprint arXiv:2411.16828, 2024

  23. [23]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks.arXiv preprint arXiv:1706.06083, 2017

  24. [24]

    Fine-Grained Visual Classification of Aircraft

    Subhransu Maji, Esa Rahtu, Juho Kannala, Matthew Blaschko, and Andrea Vedaldi. Fine-grained visual classification of aircraft.arXiv preprint arXiv:1306.5151, 2013

  25. [25]

    Un- derstanding zero-shot adversarial robustness for large-scale models.arXiv preprint arXiv:2212.07016, 2022

    Chengzhi Mao, Scott Geng, Junfeng Yang, Xin Wang, and Carl V ondrick. Un- derstanding zero-shot adversarial robustness for large-scale models.arXiv preprint arXiv:2212.07016, 2022

  26. [26]

    Deepfool: a simple and accurate method to fool deep neural networks

    Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2574–2582, 2016

  27. [27]

    Diffusion models for adversarial purification.arXiv preprint arXiv:2205.07460, 2022

    Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and An- ima Anandkumar. Diffusion models for adversarial purification.arXiv preprint arXiv:2205.07460, 2022

  28. [28]

    Automated flower classification over a large number of classes

    Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In2008 Sixth Indian Conference on Computer Vision, Graph- ics & Image Processing, pages 722–729. IEEE, 2008

  29. [29]

    The limitations of deep learning in adversarial settings

    Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European symposium on security and privacy (EuroS&P), pages 372–387. IEEE, 2016. 18MALIK ET.AL: BEYOND FALSE STABILITY

  30. [30]

    Cats and dogs

    Omkar M Parkhi, Andrea Vedaldi, Andrew Zisserman, and CV Jawahar. Cats and dogs. In2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3498–3505. IEEE, 2012

  31. [31]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational confer- ence on machine learning, pages 8748–8763. PmLR, 2021

  32. [32]

    Zero-shot text-to-image generation

    Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea V oss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. InInternational conference on machine learning, pages 8821–8831. Pmlr, 2021

  33. [33]

    Do im- agenet classifiers generalize to imagenet? InInternational Conference on Machine Learning, pages 5389–5400

    Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do im- agenet classifiers generalize to imagenet? InInternational Conference on Machine Learning, pages 5389–5400. PMLR, 2019

  34. [34]

    Photorealistic text-to-image diffusion models with deep language under- standing.Advances in neural information processing systems, 35:36479–36494, 2022

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Den- ton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Sali- mans, et al. Photorealistic text-to-image diffusion models with deep language under- standing.Advances in neural information processing systems, 35:36479–36494, 2022

  35. [35]

    Ro- bust clip: Unsupervised adversarial fine-tuning of vision embeddings for robust large vision-language models.arXiv preprint arXiv:2402.12336, 2024

    Christian Schlarmann, Naman Deep Singh, Francesco Croce, and Matthias Hein. Ro- bust clip: Unsupervised adversarial fine-tuning of vision embeddings for robust large vision-language models.arXiv preprint arXiv:2402.12336, 2024

  36. [36]

    Adversarial train- ing for free!Advances in neural information processing systems, 32, 2019

    Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial train- ing for free!Advances in neural information processing systems, 32, 2019

  37. [37]

    R-tpt: Improving adversarial robust- ness of vision-language models through test-time prompt tuning

    Lijun Sheng, Jian Liang, Zilei Wang, and Ran He. R-tpt: Improving adversarial robust- ness of vision-language models through test-time prompt tuning. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 29958–29967, 2025

  38. [38]

    Reco: Retrieve and co-segment for zero-shot transfer.Advances in Neural Information Processing Systems, 35:33754– 33767, 2022

    Gyungin Shin, Weidi Xie, and Samuel Albanie. Reco: Retrieve and co-segment for zero-shot transfer.Advances in Neural Information Processing Systems, 35:33754– 33767, 2022

  39. [39]

    Test-time prompt tuning for zero-shot generalization in vision- language models.Advances in Neural Information Processing Systems, 35:14274– 14289, 2022

    Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, and Chaowei Xiao. Test-time prompt tuning for zero-shot generalization in vision- language models.Advances in Neural Information Processing Systems, 35:14274– 14289, 2022

  40. [40]

    A dataset of 101 human action classes from videos in the wild.Center for Research in Computer Vision, 2(11), 2012

    Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. A dataset of 101 human action classes from videos in the wild.Center for Research in Computer Vision, 2(11), 2012

  41. [41]

    Disentangling adversarial robustness and generalization

    David Stutz, Matthias Hein, and Bernt Schiele. Disentangling adversarial robustness and generalization. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6976–6987, 2019. MALIK ET.AL: BEYOND FALSE STABILITY19

  42. [42]

    Intriguing properties of neural networks

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks.arXiv preprint arXiv:1312.6199, 2013

  43. [43]

    On the zero-shot adversarial robustness of vision-language models: A truly zero-shot and training-free approach

    Baoshun Tong, Hanjiang Lai, Yan Pan, and Jian Yin. On the zero-shot adversarial robustness of vision-language models: A truly zero-shot and training-free approach. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 19921–19930, 2025

  44. [44]

    Learning robust global representations by penalizing local predictive power.Advances in Neural Information Processing Systems, 32, 2019

    Haohan Wang, Songwei Ge, Zachary Lipton, and Eric P Xing. Learning robust global representations by penalizing local predictive power.Advances in Neural Information Processing Systems, 32, 2019

  45. [45]

    Guided diffusion model for adversarial purification.arXiv preprint arXiv:2205.14969, 2022

    Jinyi Wang, Zhaoyang Lyu, Dahua Lin, Bo Dai, and Hongfei Fu. Guided diffusion model for adversarial purification.arXiv preprint arXiv:2205.14969, 2022

  46. [46]

    Pre-trained model guided fine-tuning for zero-shot adversarial robustness

    Sibo Wang, Jie Zhang, Zheng Yuan, and Shiguang Shan. Pre-trained model guided fine-tuning for zero-shot adversarial robustness. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 24502–24511, 2024

  47. [47]

    Double visual defense: Adversarial pre-training and instruction tuning for improving vision-language model robustness.arXiv preprint arXiv:2501.09446, 2025

    Zeyu Wang, Cihang Xie, Brian Bartoldson, and Bhavya Kailkhura. Double visual defense: Adversarial pre-training and instruction tuning for improving vision-language model robustness.arXiv preprint arXiv:2501.09446, 2025

  48. [48]

    Im- proving zero-shot generalization for clip with synthesized prompts.arXiv preprint arXiv:2307.07397, 2023

    Zhengbo Wang, Jian Liang, Ran He, Nan Xu, Zilei Wang, and Tieniu Tan. Im- proving zero-shot generalization for clip with synthesized prompts.arXiv preprint arXiv:2307.07397, 2023

  49. [49]

    A hard-to-beat baseline for training-free clip-based adaptation.arXiv preprint arXiv:2402.04087, 2024

    Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, and Tieniu Tan. A hard-to-beat baseline for training-free clip-based adaptation.arXiv preprint arXiv:2402.04087, 2024

  50. [50]

    Clip is strong enough to fight back: Test-time counterattacks towards zero-shot adversarial robustness of clip

    Songlong Xing, Zhengyu Zhao, and Nicu Sebe. Clip is strong enough to fight back: Test-time counterattacks towards zero-shot adversarial robustness of clip. InProceed- ings of the Computer Vision and Pattern Recognition Conference, pages 15172–15182, 2025

  51. [51]

    CoCa: Contrastive Captioners are Image-Text Foundation Models

    Jiahui Yu, Zirui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, and Yonghui Wu. Coca: Contrastive captioners are image-text foundation models.arXiv preprint arXiv:2205.01917, 2022

  52. [52]

    A simple framework for open-vocabulary segmentation and detection

    Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianwei Yang, and Lei Zhang. A simple framework for open-vocabulary segmentation and detection. InPro- ceedings of the IEEE/CVF International Conference on Computer Vision, pages 1020– 1031, 2023

  53. [53]

    Theoretically principled trade-off between robustness and accuracy

    Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. InInternational conference on machine learning, pages 7472–7482. PMLR, 2019

  54. [54]

    Adversarial prompt tuning for vision-language models

    Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, and Jitao Sang. Adversarial prompt tuning for vision-language models. InEuropean con- ference on computer vision, pages 56–72. Springer, 2024. 20MALIK ET.AL: BEYOND FALSE STABILITY

  55. [55]

    Vision-language models for vision tasks: A survey.IEEE transactions on pattern analysis and machine intelligence, 46(8):5625–5644, 2024

    Jingyi Zhang, Jiaxing Huang, Sheng Jin, and Shijian Lu. Vision-language models for vision tasks: A survey.IEEE transactions on pattern analysis and machine intelligence, 46(8):5625–5644, 2024

  56. [56]

    Exploiting unlabeled data with vision and language models for object detection

    Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, BG Vijay Kumar, Anastasis Stathopoulos, Manmohan Chandraker, and Dimitris N Metaxas. Exploiting unlabeled data with vision and language models for object detection. InEuropean conference on computer vision, pages 159–175. Springer, 2022

  57. [57]

    Extract free dense labels from clip

    Chong Zhou, Chen Change Loy, and Bo Dai. Extract free dense labels from clip. In European conference on computer vision, pages 696–712. Springer, 2022

  58. [58]

    Conditional prompt learning for vision-language models

    Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Conditional prompt learning for vision-language models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16816–16825, 2022

  59. [59]

    Learning to prompt for vision-language models.International journal of computer vision, 130(9):2337– 2348, 2022

    Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models.International journal of computer vision, 130(9):2337– 2348, 2022

  60. [60]

    Few-shot adver- sarial prompt learning on vision-language models.Advances in Neural Information Processing Systems, 37:3122–3156, 2024

    Yiwei Zhou, Xiaobo Xia, Zhiwei Lin, Bo Han, and Tongliang Liu. Few-shot adver- sarial prompt learning on vision-language models.Advances in Neural Information Processing Systems, 37:3122–3156, 2024. MALIK ET.AL: BEYOND FALSE STABILITY21 A Appendix Overview This appendix provides additional analyses, ablations, and extended experimental results supporting ...