The Geometry of Robustness: Optimizing Loss Landscape Curvature and Feature Manifold Alignment for Robust Finetuning of Vision-Language Models

Brisa Maneechotesuwan; Chengyue Huang; Shaunak Halbe; Shivang Chopra; Zsolt Kira

arxiv: 2603.27139 · v2 · submitted 2026-03-28 · 💻 cs.CV

Shivang Chopra, Shaunak Halbe, Chengyue Huang, Brisa Maneechotesuwan, Zsolt Kira

Pith reviewed 2026-05-14 22:14 UTC · model grok-4.3

classification 💻 cs.CV

keywords robust fine-tuning · vision-language models · loss landscape curvature · feature alignment · adversarial robustness · out-of-distribution generalization · CLIP models · PAC-Bayes

The pith

GRACE fine-tunes vision-language models by flattening loss curvature and aligning features to gain ID and adversarial accuracy without losing OOD performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that fine-tuning vision-language models creates a three-way trade-off between in-distribution accuracy, out-of-distribution generalization, and adversarial robustness. This trade-off arises because optimization lands in sharp anisotropic minima and produces feature representations that distort under small changes. GRACE counters both problems at once with curvature-scaled weight perturbations that flatten the loss surface and a feature-alignment loss that keeps representations consistent across clean, adversarial, and shifted inputs. A sympathetic reader cares because existing methods improve at most two of the three goals, leaving deployed VLMs either inaccurate or brittle. On ImageNet fine-tuning of CLIP models the method raises ID accuracy 10.8 percent and adversarial accuracy 13.5 percent while holding OOD accuracy at 57.0 percent, nearly matching the zero-shot baseline.

Core claim

GRACE, grounded in Robust PAC-Bayes theory, jointly regularizes parameter-space curvature through adaptive weight perturbations scaled by local curvature estimates and enforces feature-space invariance with an alignment loss across clean, adversarial, and OOD inputs. On ImageNet fine-tuning of CLIP models this produces 10.8 percent higher ID accuracy, 13.5 percent higher adversarial accuracy, and 57.0 percent OOD accuracy versus the 57.4 percent zero-shot baseline. Geometric analysis shows the resulting minima are flatter and the learned features remain undistorted across distribution shifts.
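As a rough illustration of "adaptive weight perturbations scaled by local curvature estimates", the sketch below builds a per-layer ascent step whose radius is proportional to that layer's curvature estimate. The scaling rule, the `rho` radius, the toy numbers, and all function names are assumptions made for illustration; the paper's actual update rule is not reproduced on this page.

```python
import math

def curvature_scaled_perturbation(grads, curvatures, rho=0.05):
    """Per-layer ascent step whose radius grows with the layer's curvature.

    Assumed rule (not taken from the paper): radius_l = rho * h_l / max(h).
    """
    max_curv = max(curvatures)
    perturbation = []
    for g, h in zip(grads, curvatures):
        norm = math.sqrt(sum(x * x for x in g))
        radius = rho * h / max_curv if max_curv > 0 else 0.0
        scale = radius / norm if norm > 0 else 0.0
        perturbation.append([scale * x for x in g])
    return perturbation

def apply_perturbation(weights, perturbation):
    """Return perturbed copies of the weights; the originals are untouched."""
    return [[w + d for w, d in zip(layer, delta)]
            for layer, delta in zip(weights, perturbation)]

# Hypothetical two-layer example: layer 0 is sharper than layer 1.
weights = [[1.0, -1.0], [0.5, 0.5]]
grads = [[3.0, 4.0], [0.0, 2.0]]
curvatures = [2.0, 0.5]
delta = curvature_scaled_perturbation(grads, curvatures)
perturbed = apply_perturbation(weights, delta)
```

In a training loop of this kind, the loss would typically be evaluated at the perturbed weights and the resulting gradient applied to the original ones, as in sharpness-aware minimization [4] and adversarial weight perturbation [37].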

What carries the argument

The argument rests on the GRACE framework, which pairs adaptive curvature-scaled weight perturbations that promote flat minima with a Gram-aligned feature invariance loss.
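One way to read "Gram-aligned" is through the volume spanned by several embeddings of the same input, which the Figure 5 caption describes as a small Gram matrix over clean, adversarial, and weight-perturbed embeddings. The sketch below assumes three embeddings and takes the square root of the Gram determinant as the volume; the function names are hypothetical and the exact loss used in the paper may differ.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def gram_matrix(vectors):
    """Matrix of pairwise inner products between the given vectors."""
    return [[sum(a * b for a, b in zip(u, w)) for w in vectors] for u in vectors]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def gram_volume(clean, adversarial, perturbed):
    """Volume of the parallelepiped spanned by three unit-normalized embeddings."""
    vectors = [normalize(v) for v in (clean, adversarial, perturbed)]
    det = det3(gram_matrix(vectors))
    return math.sqrt(max(det, 0.0))  # clamp tiny negative values from rounding

# Mutually orthogonal embeddings span the largest possible volume.
spread = gram_volume([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
# Nearly identical embeddings span almost no volume.
tight = gram_volume([1.0, 0.0, 0.0], [1.0, 0.05, 0.0], [1.0, 0.0, 0.05])
```

Minimizing such a volume pulls the three embeddings of one input together, which matches the "low volume" wording of the caption.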

If this is right

GRACE converges to flatter minima in the loss landscape.
Feature representations stay consistent without distortion under adversarial perturbations and distribution shifts.
ID accuracy, adversarial robustness, and OOD generalization improve simultaneously on CLIP ImageNet fine-tuning.
The approach supplies a principled route to generalized robustness in foundation VLMs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Curvature regularization may transfer to other multimodal or language-only foundation models facing similar optimization instabilities.
Tracking curvature during training could become a practical diagnostic for whether a fine-tuning run is likely to preserve OOD behavior.
The same alignment loss might stabilize representations in non-adversarial continual-learning or domain-adaptation settings.

Load-bearing premise

The three-way robustness trade-off stems from sharp anisotropic minima in parameter space and unstable feature representations that deform under perturbation.

What would settle it

Measuring the Hessian or curvature metrics of a GRACE-trained model and finding no reduction in sharpness relative to standard fine-tuning would falsify the claimed geometric mechanism.
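A sharpness check of this kind is often done with power iteration on Hessian-vector products, which estimates the largest Hessian eigenvalue without forming the Hessian. The sketch below is a minimal version under stated assumptions: a toy quadratic loss stands in for the model, Hessian-vector products come from central finite differences of the gradient, and every name is hypothetical.

```python
import math
import random

def hessian_vector_product(grad_fn, w, v, eps=1e-3):
    """Approximate H v with a central finite difference of gradients."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad_fn(w_plus), grad_fn(w_minus)
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]

def top_hessian_eigenvalue(grad_fn, w, iters=100, seed=0):
    """Power iteration on the Hessian at w; returns the Rayleigh quotient."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(iters):
        hv = hessian_vector_product(grad_fn, w, v)
        eigenvalue = sum(a * b for a, b in zip(v, hv))
        norm = math.sqrt(sum(x * x for x in hv))
        v = [x / norm for x in hv]
    return eigenvalue

# Toy loss L(w) = 0.5 * sum(c_i * w_i^2); its Hessian is diag(CURVATURES).
CURVATURES = [4.0, 1.0, 0.25]

def toy_grad(w):
    return [c * wi for c, wi in zip(CURVATURES, w)]

# Converges to the largest entry of CURVATURES for this toy loss.
lambda_max = top_hessian_eigenvalue(toy_grad, w=[0.3, -0.2, 0.1])
```

Power iteration finds the eigenvalue of largest magnitude, so on a real network with negative curvature directions the result needs a sign check. Comparing this number between a GRACE-trained and a conventionally fine-tuned checkpoint would be one way to run the test described above.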

Figures

Figures reproduced from arXiv: 2603.27139 by Brisa Maneechotesuwan, Chengyue Huang, Shaunak Halbe, Shivang Chopra, Zsolt Kira.

**Figure 1.** The VLM robustness three-way tradeoff. Existing robust fine-tuning strategies resolve at most two of {ID, OOD, adversarial} robustness simultaneously, leaving a gap in generalized robustness. GRACE is designed to close this gap.

… shifts (OOD), and (iii) resisting gradient-based adversarial attacks. Standard fine-tuning often collapses at least one of these axes [26, 28, 30, 38], making generalized robustne…

**Figure 2.** (a) Feature Distribution Analysis: 3D projection of image features for in-distribution (f_ID), OOD (f_OOD), and PGD adversarial inputs (f_Adv) of the same class. (b) Loss Landscape Analysis: 3D/2D loss slices around the converged solutions for each method, using shared perturbation directions.

| Method | λ_max (×10^4) | ∥H∥_F / √d (×10^2) |
| --- | --- | --- |
| FT | 3.5 | 0.89 |
| WiSE-FT (S1) | 3.3 | 0.78 |
| TeCoA (S2) | 1.8 | 0.52 |
| GRACE | 1.6 | 0.43 |

**Figure 4.** LAR-AWP rank curriculum. A diagonal rank mask controls the effective perturbation rank per layer. Curvature estimates h_W (from mini-batch gradients) are used to assign higher perturbation ranks to sharper layers, focusing smoothing where the loss landscape is steepest.

5.3. Layer-Wise Adaptive Low-Rank AWP (LAR-AWP). To control the expected robust sharpness term in Eq. (2), GRACE injects adversarial weight…
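The rank curriculum in the Figure 4 caption can be sketched as two small steps: smooth the per-layer curvature estimates with an exponential moving average, then map smoothed curvature to an integer rank and a diagonal 0/1 mask. The linear mapping, the decay value, the rank bounds, and the toy estimates below are all assumptions for illustration, not the paper's settings.

```python
def update_ema(ema, new_values, decay=0.9):
    """Exponential moving average of per-layer curvature estimates."""
    return [decay * e + (1.0 - decay) * v for e, v in zip(ema, new_values)]

def assign_ranks(curvatures, min_rank=1, max_rank=8):
    """Map each layer's curvature to an integer rank; sharper layers get more."""
    lo, hi = min(curvatures), max(curvatures)
    if hi == lo:
        return [min_rank for _ in curvatures]
    ranks = []
    for c in curvatures:
        fraction = (c - lo) / (hi - lo)
        ranks.append(min_rank + round(fraction * (max_rank - min_rank)))
    return ranks

def rank_mask(rank, max_rank=8):
    """Diagonal 0/1 mask that keeps the first `rank` of `max_rank` directions."""
    return [1.0 if i < rank else 0.0 for i in range(max_rank)]

# Hypothetical curvature estimates for three layers over two mini-batches.
ema = [0.0, 0.0, 0.0]
for batch_estimate in ([1.0, 4.0, 2.0], [1.2, 3.6, 2.2]):
    ema = update_ema(ema, batch_estimate)
ranks = assign_ranks(ema)            # [1, 8, 4]
masks = [rank_mask(r) for r in ranks]
```

The sharpest layer receives the full rank budget and the flattest layer the minimum, which is the qualitative behavior the caption describes.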

**Figure 5.** Gram-volume feature alignment. For each input, GRACE compares clean, adversarial, and LAR-AWP-perturbed image embeddings via a small Gram matrix. The Gram-volume loss encourages these three vectors to remain close to each other (low volume) while preserving separation across different classes.

… where n_v is the validation mini-batch size. We maintain an exponential moving average of h_W for each layer and com…

**Figure 6.** Pareto Curve. GRACE (Red) achieves +7 performance gain while being 1.4× faster than prior adversarial methods.

… achieves the best average, demonstrating the complementary effects of representation consistency and curvature regularization. These results confirm that both geometric components are critical for stabilizing representations and mitigating catastrophic forgetting during fine-tuning. 6.5. Compu…

**Figure 7.** Layerwise curvature anisotropy in CLIP. Normalized Hutchinson curvature κ_ℓ for each transformer block of CLIP ViT-B/32, ViT-B/16, and ViT-L/14. All models exhibit substantial variation in curvature across depth, indicating strong layerwise geometric heterogeneity.

B. Preliminaries. B.1. Image Classification with Vision–Language Models. In a K-class image classification problem with inputs x ∈ X and labels …
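The "Hutchinson curvature" in the Figure 7 caption presumably refers to Hutchinson's stochastic trace estimator, which averages z^T H z over random sign vectors z. The sketch below applies it to a small explicit matrix standing in for one block's Hessian; the matrix, the probe count, and the function names are hypothetical, and the normalization that turns a trace into κ_ℓ is not specified on this page.

```python
import random

def hutchinson_trace(matvec, dim, num_probes=4000, seed=0):
    """Estimate trace(H) as the average of z^T H z over Rademacher probes z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = matvec(z)
        total += sum(zi * hi for zi, hi in zip(z, hz))
    return total / num_probes

# Hypothetical symmetric "Hessian" for one block; its exact trace is 6.0.
H = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 1.0]]

def matvec(v):
    """Multiply H by v; a real model would use a Hessian-vector product here."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

trace_estimate = hutchinson_trace(matvec, dim=3)
```

Only matrix-vector products are needed, so the same estimator scales to networks where the Hessian cannot be stored.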

**Figure 8.** Schematic of LoRA weight updates. Weight updates are parameterized by low-rank matrices A and B of rank r, constraining deviations from the pre-trained weights to a low-dimensional subspace.

… reduces the number of trainable parameters, improves efficiency, and implicitly constrains the adaptation to remain close to the pre-trained solution, consistent with the KL term in our PAC-Bayesian analysis. The ove…
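The LoRA parameterization in the Figure 8 caption can be written out directly: the frozen weight W0 is used together with a low-rank product B A. The sketch below is a plain-Python illustration with a hypothetical 3x3 weight and rank 1; the `scale` argument and all values are assumptions, and how GRACE configures LoRA is not stated on this page.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_effective_weight(w0, b, a, scale=1.0):
    """Return W0 + scale * (B @ A), the weight used in the forward pass."""
    update = matmul(b, a)
    return [[w + scale * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(w0, update)]

# Hypothetical frozen 3x3 weight with a rank-1 update (B is 3x1, A is 1x3).
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.1, -0.2, 0.3]]

# With B initialized to zero, the adapted model starts exactly at W0.
B_init = [[0.0], [0.0], [0.0]]
W_start = lora_effective_weight(W0, B_init, A)

# After training moves B, the update stays inside a rank-1 subspace.
B_trained = [[1.0], [0.0], [2.0]]
W_adapted = lora_effective_weight(W0, B_trained, A)
```

Only A and B are trained, so the number of trainable values is r times the sum of the two weight dimensions rather than their product.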

**Figure 9.** 3D visualization of CLIP feature geometry under different shifts. Using normalized embeddings projected onto the unit sphere, we compare ID samples, natural adversarial (Nat Adv) variants, OOD samples, and features obtained under AWP perturbations. AWP produces feature displacements that closely follow the structure of real OOD and natural adversarial shifts, confirming that curvature-aligned weight per…

**Figure 11.** Evolution of Layerwise AWP Rank During Training. The heatmap shows the adaptive adversarial weight perturbation (AWP) rank assigned to each transformer block across training epochs.

To understand how GRACE allocates adversarial perturbation capacity across the network, we analyze the temporal progression of the learned AWP rank r^(ℓ)_AWP and relate it to the curvature structure of the CLIP ViT-B/32 …

Original abstract

Fine-tuning approaches for Vision-Language Models (VLMs) face a critical three-way trade-off between In-Distribution (ID) accuracy, Out-of-Distribution (OOD) generalization, and adversarial robustness. Existing robust fine-tuning strategies resolve at most two axes of this trade-off. Generalization-preserving methods retain ID/OOD performance but leave models vulnerable to adversarial attacks, while adversarial training improves robustness to targeted attacks but degrades ID/OOD accuracy. Our key insight is that the robustness trade-off stems from two geometric failures: sharp, anisotropic minima in parameter space and unstable feature representations that deform under perturbation. To address this, we propose GRACE (Gram-aligned Robustness via Adaptive Curvature Estimation), a unified fine-tuning framework that jointly regularizes the parameter-space curvature and feature-space invariance for VLMs. Grounded in Robust PAC-Bayes theory, GRACE employs adaptive weight perturbations scaled by local curvature to promote flatter minima, combined with a feature alignment loss that maintains representation consistency across clean, adversarial, and OOD inputs. On ImageNet fine-tuning of CLIP models, GRACE simultaneously improves ID accuracy by 10.8%, and adversarial accuracy by 13.5% while maintaining 57.0% OOD accuracy (vs. 57.4% zero-shot baseline). Geometric analysis confirms that GRACE converges to flatter minima without feature distortion across distribution shifts, providing a principled step toward generalized robustness in foundation VLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GRACE pairs curvature regularization with feature alignment to improve the three-way trade-off in VLM fine-tuning, reporting solid gains on CLIP but with thin experimental detail so far.

The core of this paper is GRACE, a fine-tuning method for vision-language models that tries to fix the usual conflict between in-distribution accuracy, out-of-distribution generalization, and adversarial robustness. The authors trace the problem to sharp anisotropic minima in parameter space and features that shift under attack or distribution change. They respond with two regularizers: adaptive weight perturbations scaled by local curvature to flatten the landscape, plus a feature alignment loss that keeps representations consistent across clean, adversarial, and OOD inputs. Both pieces are motivated by Robust PAC-Bayes bounds.

On ImageNet fine-tuning of CLIP, they claim +10.8% ID accuracy, +13.5% adversarial accuracy, and OOD accuracy that stays near the zero-shot level at 57.0% versus 57.4%. That joint improvement is the headline result. The geometric analysis they include to show flatter minima without feature distortion is a useful addition and gives some intuition for why the method works. The framework itself is a clean combination of existing ideas rather than a wholly new theory, but the specific pairing for VLMs appears fresh.

The main limitation is experimental transparency. The abstract gives no error bars, no ablation tables, and no full protocol for how adversarial examples or OOD sets were constructed. Without those, it is difficult to judge how stable the reported deltas are or whether they depend on particular hyperparameter choices. The curvature estimation step also needs clearer implementation details to assess cost and approximation quality on large models.

Still, the central claim is coherent and the problem is practically relevant. The paper merits a serious referee report and should interest anyone working on robust adaptation of foundation models. A reader focused on regularization techniques for VLMs would get concrete value from the method and the numbers, even if revisions are needed to strengthen the evidence.

Referee Report

2 major / 1 minor

Summary. The paper claims that the three-way trade-off in VLM fine-tuning (ID accuracy, OOD generalization, adversarial robustness) arises from sharp anisotropic minima in parameter space and unstable feature representations under perturbation. GRACE addresses this via a unified framework grounded in Robust PAC-Bayes theory: adaptive curvature-regularizing weight perturbations to promote flatter minima, combined with a feature alignment loss enforcing representation consistency across clean, adversarial, and OOD inputs. On ImageNet fine-tuning of CLIP models, it reports simultaneous gains of +10.8% ID accuracy and +13.5% adversarial accuracy while holding OOD accuracy at 57.0% (vs. 57.4% zero-shot baseline), with geometric analysis confirming flatter minima without feature distortion.

Significance. If the empirical gains and theoretical grounding hold under full verification, this would be a meaningful contribution to robust fine-tuning of foundation VLMs by providing a geometric diagnosis and joint regularization strategy that resolves the typical trade-off, moving beyond methods that improve at most two axes.

major comments (2)

Abstract and experimental section: the central claims of +10.8% ID and +13.5% adversarial accuracy improvements (with OOD near baseline) are presented without error bars, statistical significance tests, ablation tables, or a complete experimental protocol (e.g., exact OOD datasets, perturbation budgets, hyperparameter ranges), which is load-bearing for assessing whether the three-way improvement is reproducible and not an artifact of selective reporting.
Theory section (Robust PAC-Bayes grounding): without the explicit derivations of the adaptive weight perturbations scaled by local curvature, it remains unclear whether the regularization terms are parameter-free or reduce by construction to quantities fitted on the target data, raising a potential circularity risk for the claimed geometric benefits.

minor comments (1)

Notation for the feature alignment loss and curvature estimator should be defined more explicitly with respect to the VLM components (e.g., vision encoder vs. text encoder) to improve clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to strengthen the experimental reporting and theoretical derivations.

Referee: Abstract and experimental section: the central claims of +10.8% ID and +13.5% adversarial accuracy improvements (with OOD near baseline) are presented without error bars, statistical significance tests, ablation tables, or a complete experimental protocol (e.g., exact OOD datasets, perturbation budgets, hyperparameter ranges), which is load-bearing for assessing whether the three-way improvement is reproducible and not an artifact of selective reporting.

Authors: We agree that comprehensive experimental details are essential for reproducibility. In the revised manuscript we have added error bars from five independent runs with different random seeds, included paired t-test results confirming statistical significance (p < 0.01) for the reported gains, expanded the ablation tables to cover each GRACE component, and provided a complete experimental protocol in the main text and appendix. This protocol specifies the exact OOD datasets (ImageNet-A, ImageNet-R, ImageNet-V2), perturbation budgets (PGD with ε = 8/255 and 10 steps), and hyperparameter ranges used for tuning. revision: yes
Referee: Theory section (Robust PAC-Bayes grounding): without the explicit derivations of the adaptive weight perturbations scaled by local curvature, it remains unclear whether the regularization terms are parameter-free or reduce by construction to quantities fitted on the target data, raising a potential circularity risk for the claimed geometric benefits.

Authors: We thank the referee for this observation. The original theory section presented the high-level PAC-Bayes motivation but omitted the full derivations for space reasons. We have now inserted the explicit step-by-step derivations in Section 3, showing that the adaptive perturbations are obtained from an online local Hessian-trace approximation computed during training and are not fitted post-hoc on the target data. The resulting regularization terms follow directly from the Robust PAC-Bayes bound and remain parameter-free in their core formulation; only standard validation-based selection is used for the few scalar hyperparameters. revision: yes
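The attack budget quoted in the first response (PGD with ε = 8/255 and 10 steps) can be made concrete with a small sketch. The version below runs L-infinity PGD against a toy linear classifier with logistic loss; the classifier, the 2/255 step size, and the omission of a random start are assumptions for illustration, not details confirmed by the rebuttal.

```python
import math

def logistic_loss(x, y, w, b):
    """Binary logistic loss log(1 + exp(-y * (w.x + b))) with y in {-1, +1}."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return math.log1p(math.exp(-margin))

def logistic_loss_grad(x, y, w, b):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]

def pgd_linf(x, y, w, b, eps=8 / 255, step=2 / 255, steps=10):
    """L-infinity PGD: signed-gradient ascent, projected back into the eps-ball."""
    x_adv = list(x)
    for _ in range(steps):
        grad = logistic_loss_grad(x_adv, y, w, b)
        for i, g in enumerate(grad):
            moved = x_adv[i] + step * ((g > 0) - (g < 0))
            moved = min(max(moved, x[i] - eps), x[i] + eps)  # project onto eps-ball
            x_adv[i] = min(max(moved, 0.0), 1.0)             # keep a valid pixel range
    return x_adv

# Hypothetical three-pixel input and linear classifier.
w, b = [1.0, -2.0, 0.5], 0.0
x, y = [0.9, 0.2, 0.5], 1
x_adv = pgd_linf(x, y, w, b)
clean_loss = logistic_loss(x, y, w, b)
adv_loss = logistic_loss(x_adv, y, w, b)
```

For a deep network the input gradient would come from automatic differentiation instead of the closed form used here, and the starting point is often drawn at random inside the ε-ball.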

Circularity Check

0 steps flagged

No significant circularity detected

The paper's central argument is an empirical claim: GRACE, motivated by geometric diagnosis of sharp minima and unstable features and grounded in external Robust PAC-Bayes theory, yields simultaneous gains in ID accuracy (+10.8%), adversarial accuracy (+13.5%), and near-baseline OOD accuracy on ImageNet-CLIP fine-tuning. No load-bearing derivation step is shown to reduce to its own inputs by construction. The abstract and skeptic summary present the method as jointly regularizing curvature (via adaptive perturbations) and feature alignment, with reported numbers as direct experimental outcomes rather than fitted predictions renamed as results. No self-citation chains, ansatzes smuggled via prior work, or self-definitional quantities appear in the provided text. The framework is therefore treated as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available, so free parameters, axioms, and invented entities cannot be exhaustively audited; the method invokes Robust PAC-Bayes theory as grounding.

axioms (1)

domain assumption: Robust PAC-Bayes theory supplies valid bounds for robustness under perturbation.
Invoked to justify adaptive curvature regularization and feature alignment.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J-cost uniqueness)
Relation between the paper passage and the cited Recognition theorem: unclear
GRACE employs adaptive weight perturbations scaled by local curvature to promote flatter minima, combined with a feature alignment loss that maintains representation consistency across clean, adversarial, and OOD inputs.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking (D=3 forcing)
Relation between the paper passage and the cited Recognition theorem: unclear
Layerwise Adaptive Low-Rank Adversarial Weight Perturbation (LAR-AWP): structured, low-rank adversarial perturbations with layerwise adaptive magnitudes

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

[1]

Analysis of representations for domain adaptation

Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. InAdvances in Neural Information Processing Systems. MIT Press, 2006. 4

work page 2006
[2]

Reliable evalua- tion of adversarial robustness with an ensemble of diverse parameter-free attacks

Francesco Croce and Matthias Hein. Reliable evalua- tion of adversarial robustness with an ensemble of diverse parameter-free attacks. ICML, 2020. 6

work page 2020
[3]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009. 7, 8, 10

work page 2009
[4]

Sharpness-aware minimization for efficiently improving generalization

Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. Sharpness-aware minimization for efficiently improving generalization. InICLR, 2021. 2

work page 2021
[5]

Finetune like you pretrain: Im- proved finetuning of zero-shot vision models

Sachin Goyal, Ananya Kumar, Sankalp Garg, Zico Kolter, and Aditi Raghunathan. Finetune like you pretrain: Im- proved finetuning of zero-shot vision models. In2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19338–19347, 2023. 1, 2, 3, 7, 10

work page 2023
[6]

The many faces of robust- ness: A critical analysis of out-of-distribution generalization

Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kada- vath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Lixuan Zhu, Samyak Parajuli, Mike Guo, Dawn Xiaodong Song, Ja- cob Steinhardt, and Justin Gilmer. The many faces of robust- ness: A critical analysis of out-of-distribution generalization. 2021 IEEE/CVF International Conference on Computer Vi- sion (IC...

work page 2021
[7]

Natural adversarial examples

Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Stein- hardt, and Dawn Song. Natural adversarial examples. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 15262–15271,

work page
[8]

LoRA: Low-rank adaptation of large language models

Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In ICLR, 2022. 2

work page 2022
[9]

Directional gradient pro- jection for robust fine-tuning of foundation models

Chengyue Huang, Junjiao Tian, Brisa Maneechotesuwan, Shivang Chopra, and Zsolt Kira. Directional gradient pro- jection for robust fine-tuning of foundation models. InThe Thirteenth International Conference on Learning Represen- tations, 2025. 1

work page 2025
[10]

Scaling up visual and vision-language representa- tion learning with noisy text supervision

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representa- tion learning with noisy text supervision. InInternational conference on machine learning, pages 4904–4916. PMLR,

work page
[11]

Visualizing the loss landscape of neural nets

Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. In Neural Information Processing Systems, 2018. 3

work page 2018
[12]

Rethinking natural adversarial examples for classifica- tion models, 2021

Xiao Li, Jianmin Li, Ting Dai, Jie Shi, Jun Zhu, and Xiaolin Hu. Rethinking natural adversarial examples for classifica- tion models, 2021. 6

work page 2021
[13]

Language-driven anchors for zero-shot ad- versarial robustness

Xiao Li, Wei Zhang, Yining Liu, Zhanhao Hu, Bo Zhang, and Xiaolin HU. Language-driven anchors for zero-shot ad- versarial robustness. InCVPR, 2024. 2, 3, 7, 10

work page 2024
[14]

Sophia: A scalable stochastic second-order optimizer for language model pre-training

Hong Liu, Zhiyuan Li, David Leo Wright Hall, Percy Liang, and Tengyu Ma. Sophia: A scalable stochastic second-order optimizer for language model pre-training. InThe Twelfth In- ternational Conference on Learning Representations, 2024. 5

work page 2024
[15]

Erfani, Sudanthi Wijewickrema, Grant Schoenebeck, Dawn Song, Michael E

Xingjun Ma, Bo Li, Yisen Wang, Sarah M. Erfani, Sudanthi Wijewickrema, Grant Schoenebeck, Dawn Song, Michael E. Houle, and James Bailey. Characterizing adversarial sub- spaces using local intrinsic dimensionality, 2018. 5

work page 2018
[16]

Towards deep learn- ing models resistant to adversarial attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learn- ing models resistant to adversarial attacks. InInternational Conference on Learning Representations, 2018. 1, 2

work page 2018
[17]

Understanding zero-shot adversarial robust- ness for large-scale models

Chengzhi Mao, Scott Geng, Junfeng Yang, Xin Wang, and Carl V ondrick. Understanding zero-shot adversarial robust- ness for large-scale models. InThe Eleventh International Conference on Learning Representations, 2023. 1, 2, 3, 7, 10

work page 2023
[18]

A pac-bayesian tutorial with a dropout bound, 2013

David McAllester. A pac-bayesian tutorial with a dropout bound, 2013. 3

work page 2013
[19]

Lipsum-FT: Ro- bust fine-tuning of zero-shot models using random text guid- ance

Giung Nam, Byeongho Heo, and Juho Lee. Lipsum-FT: Ro- bust fine-tuning of zero-shot models using random text guid- ance. InThe Twelfth International Conference on Learning Representations, 2024. 1, 2

work page 2024
[20]

Exploring generalization in deep learning

Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, and Nathan Srebro. Exploring generalization in deep learning. InProceedings of the 31st International Conference on Neural Information Processing Systems, page 5949–5958, Red Hook, NY , USA, 2017. Curran Associates Inc. 3

work page 2017
[21]

A pac-bayesian approach to spectrally-normalized mar- gin bounds for neural networks, 2018

Behnam Neyshabur, Srinadh Bhojanapalli, and Nathan Sre- bro. A pac-bayesian approach to spectrally-normalized mar- gin bounds for neural networks, 2018. 3

work page 2018
[22]

Curran Associates Inc., Red Hook, NY , USA, 2019

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K ¨opf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala.PyTorch: an imper- ative style, high-perfo...

work page 2019
[23]

Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, and Quoc V . Le. Combined scaling for zero-shot transfer learn- ing.Neurocomput., 555(C), 2023. 1

work page 2023
[24]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. InProceedings of the 38th International Conference on Machine Learning, pages 8748–8763. PMLR, 2021....

work page 2021
[25]

Do ImageNet classifiers generalize to Ima- geNet? InProceedings of the 36th International Conference on Machine Learning, pages 5389–5400

Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to Ima- geNet? InProceedings of the 36th International Conference on Machine Learning, pages 5389–5400. PMLR, 2019. 6, 8

work page 2019
[26]

Robust CLIP: Unsupervised ad- versarial fine-tuning of vision embeddings for robust large vision-language models

Christian Schlarmann, Naman Deep Singh, Francesco Croce, and Matthias Hein. Robust CLIP: Unsupervised ad- versarial fine-tuning of vision embeddings for robust large vision-language models. InProceedings of the 41st Inter- national Conference on Machine Learning, pages 43685– 43704. PMLR, 2024. 1, 2, 3, 7, 10

work page 2024
[27]

Hopcroft

Chuanbiao Song, Kun He, Liwei Wang, and John E. Hopcroft. Improving the generalization of adversarial train- ing with domain adaptation. InInternational Conference on Learning Representations, 2019. 2

work page 2019
[28]

Trainable projected gradient method for robust fine-tuning

Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, and Zsolt Kira. Trainable projected gradient method for robust fine-tuning. In2023 IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 7836–7845, 2023. 1, 2, 3, 7, 10

work page 2023
[29]

Rethinking weight decay for robust fine-tuning of foundation models

Junjiao Tian, Chengyue Huang, and Zsolt Kira. Rethinking weight decay for robust fine-tuning of foundation models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 3, 7, 10

work page 2024
[30]

Fast trainable projection for robust fine-tuning

Junjiao Tian, Yen-Cheng Liu, James Seale Smith, and Zsolt Kira. Fast trainable projection for robust fine-tuning. InPro- ceedings of the 37th International Conference on Neural In- formation Processing Systems, Red Hook, NY , USA, 2024. Curran Associates Inc. 1, 2

work page 2024
[31]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neu- ral Information Processing Systems. Curran Associates, Inc.,

work page
[32]

Learning robust global representations by penalizing local predictive power

Haohan Wang, Songwei Ge, Zachary Lipton, and Eric P Xing. Learning robust global representations by penalizing local predictive power. InAdvances in Neural Information Processing Systems. Curran Associates, Inc., 2019. 6, 8

work page 2019
[33]

Improving out-of-distribution generalization by adversarial training with structured priors

Qixun Wang, Yifei Wang, Hong Zhu, and Yisen Wang. Improving out-of-distribution generalization by adversarial training with structured priors. InAdvances in Neural Infor- mation Processing Systems, 2022. 2

work page 2022
[34]

Pre- trained model guided fine-tuning for zero-shot adversarial robustness

Sibo Wang, Jie Zhang, Zheng Yuan, and Shiguang Shan. Pre- trained model guided fine-tuning for zero-shot adversarial robustness. InCVPR, 2024. 2, 3, 7, 10

work page 2024
[35]

Transformers: State-of-the-art natural language processing

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chau- mond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. Transformers: State-of-the-art ...

work page 2020
[36]

Robust fine-tuning of zero-shot models

Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gon- tijo Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, and Ludwig Schmidt. Robust fine-tuning of zero-shot models. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 7959–7971, 2022. 1, 2, 3, 7, 10

work page 2022
[37]

Adversarial weight perturbation helps robust generalization

Dongxian Wu, Shu-Tao Xia, and Yisen Wang. Adversarial weight perturbation helps robust generalization. InNeurIPS,

work page
[38]

A photo of a{label}

Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongx- uan LI, Ngai-Man (Man) Cheung, and Min Lin. On eval- uating adversarial robustness of large vision-language mod- els. InAdvances in Neural Information Processing Systems, pages 54111–54138. Curran Associates, Inc., 2023. 1 The Geometry of Robustness: Optimizing Loss Landscape Curvature and Feature Man...

