The Geometry of Robustness: Optimizing Loss Landscape Curvature and Feature Manifold Alignment for Robust Finetuning of Vision-Language Models
Pith reviewed 2026-05-14 22:14 UTC · model grok-4.3
The pith
GRACE fine-tunes vision-language models by flattening loss curvature and aligning features to gain ID and adversarial accuracy without losing OOD performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRACE, grounded in Robust PAC-Bayes theory, jointly regularizes parameter-space curvature through adaptive weight perturbations scaled by local curvature estimates and enforces feature-space invariance with an alignment loss across clean, adversarial, and OOD inputs. On ImageNet fine-tuning of CLIP models this produces 10.8 percent higher ID accuracy, 13.5 percent higher adversarial accuracy, and 57.0 percent OOD accuracy versus the 57.4 percent zero-shot baseline. Geometric analysis shows the resulting minima are flatter and the learned features remain undistorted across distribution shifts.
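The core claim describes weight perturbations whose size depends on local curvature. The summary does not give the exact update rule, so the following is only a minimal sketch of one plausible reading: a SAM-style step on a toy quadratic loss, where the perturbation is divided by a per-coordinate curvature estimate. The function names, the toy loss, the direction of the scaling, and the values of `lr`, `rho`, and `delta` are all assumptions made for illustration and are not taken from the paper.

```python
import math

def loss(w, h):
    """Toy loss L(w) = 0.5 * sum(h_i * w_i^2); h_i is the curvature along axis i."""
    return 0.5 * sum(hi * wi * wi for hi, wi in zip(h, w))

def grad(w, h):
    """Gradient of the toy loss."""
    return [hi * wi for hi, wi in zip(h, w)]

def curvature_scaled_step(w, h, lr=0.05, rho=0.1, delta=1e-8):
    """One SAM-style update with a per-coordinate scaled perturbation.

    Assumption: the perturbation shrinks along high-curvature coordinates
    (division by the curvature estimate). The paper may scale differently.
    """
    g = grad(w, h)
    # Hypothetical curvature estimate: the exact Hessian diagonal of the toy loss.
    scaled = [gi / (hi + delta) for gi, hi in zip(g, h)]
    norm = math.sqrt(sum(s * s for s in scaled)) or 1.0
    # Ascent-direction perturbation with total radius rho.
    w_pert = [wi + rho * s / norm for wi, s in zip(w, scaled)]
    # Descend using the gradient evaluated at the perturbed weights.
    g_pert = grad(w_pert, h)
    return [wi - lr * gi for wi, gi in zip(w, g_pert)]

h = [10.0, 0.1]   # one sharp direction and one flat direction (toy values)
w = [1.0, 1.0]
start = loss(w, h)
for _ in range(50):
    w = curvature_scaled_step(w, h)
end = loss(w, h)
```

On this toy problem the loss falls from `start` to `end`. A real implementation would need a curvature estimate that is cheap to compute for a full CLIP model, which this sketch sidesteps by using the known Hessian diagonal.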
What carries the argument
GRACE framework that applies adaptive curvature-scaled perturbations to promote flat minima together with a Gram-aligned feature invariance loss.
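To make the idea of a Gram-aligned invariance loss concrete, here is a small sketch under stated assumptions. It compares the batch Gram matrix (pairwise inner products of L2-normalized features) of clean inputs with that of perturbed or shifted inputs. Whether GRACE uses exactly this form is not confirmed by the summary; `gram_alignment_loss` and the helper names are hypothetical.

```python
import math

def l2_normalize(rows):
    """Scale each feature vector to unit length (zero vectors are left as is)."""
    out = []
    for r in rows:
        n = math.sqrt(sum(x * x for x in r)) or 1.0
        out.append([x / n for x in r])
    return out

def gram(rows):
    """Pairwise inner products between the feature vectors in a batch."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]

def gram_alignment_loss(clean, shifted):
    """Mean squared difference between the two batch Gram matrices."""
    gc = gram(l2_normalize(clean))
    gs = gram(l2_normalize(shifted))
    n = len(gc)
    return sum((gc[i][j] - gs[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)

clean = [[1.0, 0.0], [0.0, 1.0]]
rotated = [[0.0, 1.0], [-1.0, 0.0]]    # same geometry, rotated by 90 degrees
collapsed = [[1.0, 0.0], [1.0, 0.0]]   # both samples mapped to one point

loss_rotated = gram_alignment_loss(clean, rotated)
loss_collapsed = gram_alignment_loss(clean, collapsed)
```

Because a Gram matrix depends only on relative geometry, the rotated batch gives a loss of 0.0, while the collapsed batch gives 0.5. This illustrates why such a loss penalizes distortion of the feature manifold rather than any rigid change of coordinates.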
If this is right
- GRACE converges to flatter minima in the loss landscape.
- Feature representations stay consistent without distortion under adversarial perturbations and distribution shifts.
- ID accuracy, adversarial robustness, and OOD generalization improve simultaneously on CLIP ImageNet fine-tuning.
- The approach supplies a principled route to generalized robustness in foundation VLMs.
Where Pith is reading between the lines
- Curvature regularization may transfer to other multimodal or language-only foundation models facing similar optimization instabilities.
- Tracking curvature during training could become a practical diagnostic for whether a fine-tuning run is likely to preserve OOD behavior.
- The same alignment loss might stabilize representations in non-adversarial continual-learning or domain-adaptation settings.
Load-bearing premise
The three-way robustness trade-off stems from sharp anisotropic minima in parameter space and unstable feature representations that deform under perturbation.
What would settle it
Measuring the Hessian or curvature metrics of a GRACE-trained model and finding no reduction in sharpness relative to standard fine-tuning would falsify the claimed geometric mechanism.
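One way such a check could be run is to compare the largest Hessian eigenvalue at the two solutions. The sketch below estimates it by power iteration on finite-difference Hessian-vector products, so it only needs a gradient function. The toy quadratic losses and all names are illustrative assumptions; the summary does not say which sharpness metric the paper reports.

```python
import math

def grad(w, h_matrix):
    """Gradient of the toy loss 0.5 * w^T H w, which is H w."""
    return [sum(hij * wj for hij, wj in zip(row, w)) for row in h_matrix]

def hvp(grad_fn, w, v, eps=1e-4):
    """Finite-difference Hessian-vector product: (g(w + eps v) - g(w - eps v)) / (2 eps)."""
    gp = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    gm = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]

def top_hessian_eigenvalue(grad_fn, w, iters=100):
    """Power iteration on Hessian-vector products; returns a Rayleigh quotient.

    Converges to the largest-magnitude eigenvalue, which equals the largest
    eigenvalue when the Hessian is positive semidefinite (as at a minimum).
    """
    v = [1.0 / math.sqrt(len(w))] * len(w)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        lam = sum(a * b for a, b in zip(v, hv))
        norm = math.sqrt(sum(x * x for x in hv)) or 1.0
        v = [x / norm for x in hv]
    return lam

sharp = [[10.0, 0.0], [0.0, 1.0]]   # toy Hessian standing in for a sharp minimum
flat = [[2.0, 0.0], [0.0, 1.0]]     # toy Hessian standing in for a flat minimum
w0 = [0.0, 0.0]
lam_sharp = top_hessian_eigenvalue(lambda w: grad(w, sharp), w0)
lam_flat = top_hessian_eigenvalue(lambda w: grad(w, flat), w0)
```

If a GRACE-trained model and a standard fine-tuned model produced comparable values for a metric like this, the claimed geometric mechanism would be in doubt. For a full vision-language model the gradient function would come from automatic differentiation rather than a closed form.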
Original abstract
Fine-tuning approaches for Vision-Language Models (VLMs) face a critical three-way trade-off between In-Distribution (ID) accuracy, Out-of-Distribution (OOD) generalization, and adversarial robustness. Existing robust fine-tuning strategies resolve at most two axes of this trade-off. Generalization-preserving methods retain ID/OOD performance but leave models vulnerable to adversarial attacks, while adversarial training improves robustness to targeted attacks but degrades ID/OOD accuracy. Our key insight is that the robustness trade-off stems from two geometric failures: sharp, anisotropic minima in parameter space and unstable feature representations that deform under perturbation. To address this, we propose GRACE (Gram-aligned Robustness via Adaptive Curvature Estimation), a unified fine-tuning framework that jointly regularizes the parameter-space curvature and feature-space invariance for VLMs. Grounded in Robust PAC-Bayes theory, GRACE employs adaptive weight perturbations scaled by local curvature to promote flatter minima, combined with a feature alignment loss that maintains representation consistency across clean, adversarial, and OOD inputs. On ImageNet fine-tuning of CLIP models, GRACE simultaneously improves ID accuracy by 10.8%, and adversarial accuracy by 13.5% while maintaining 57.0% OOD accuracy (vs. 57.4% zero-shot baseline). Geometric analysis confirms that GRACE converges to flatter minima without feature distortion across distribution shifts, providing a principled step toward generalized robustness in foundation VLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the three-way trade-off in VLM fine-tuning (ID accuracy, OOD generalization, adversarial robustness) arises from sharp anisotropic minima in parameter space and unstable feature representations under perturbation. GRACE addresses this via a unified framework grounded in Robust PAC-Bayes theory: adaptive curvature-regularizing weight perturbations to promote flatter minima, combined with a feature alignment loss enforcing representation consistency across clean, adversarial, and OOD inputs. On ImageNet fine-tuning of CLIP models, it reports simultaneous gains of +10.8% ID accuracy and +13.5% adversarial accuracy while holding OOD accuracy at 57.0% (vs. 57.4% zero-shot baseline), with geometric analysis confirming flatter minima without feature distortion.
Significance. If the empirical gains and theoretical grounding hold under full verification, this would be a meaningful contribution to robust fine-tuning of foundation VLMs by providing a geometric diagnosis and joint regularization strategy that resolves the typical trade-off, moving beyond methods that improve at most two axes.
major comments (2)
- Abstract and experimental section: the central claims of +10.8% ID and +13.5% adversarial accuracy improvements (with OOD near baseline) are presented without error bars, statistical significance tests, ablation tables, or a complete experimental protocol (e.g., exact OOD datasets, perturbation budgets, hyperparameter ranges), which is load-bearing for assessing whether the three-way improvement is reproducible and not an artifact of selective reporting.
- Theory section (Robust PAC-Bayes grounding): without the explicit derivations of the adaptive weight perturbations scaled by local curvature, it remains unclear whether the regularization terms are parameter-free or reduce by construction to quantities fitted on the target data, raising a potential circularity risk for the claimed geometric benefits.
minor comments (1)
- Notation for the feature alignment loss and curvature estimator should be defined more explicitly with respect to the VLM components (e.g., vision encoder vs. text encoder) to improve clarity for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to strengthen the experimental reporting and theoretical derivations.
- Referee: Abstract and experimental section: the central claims of +10.8% ID and +13.5% adversarial accuracy improvements (with OOD near baseline) are presented without error bars, statistical significance tests, ablation tables, or a complete experimental protocol (e.g., exact OOD datasets, perturbation budgets, hyperparameter ranges), which is load-bearing for assessing whether the three-way improvement is reproducible and not an artifact of selective reporting.
  Authors: We agree that comprehensive experimental details are essential for reproducibility. In the revised manuscript we have added error bars from five independent runs with different random seeds, included paired t-test results confirming statistical significance (p < 0.01) for the reported gains, expanded the ablation tables to cover each GRACE component, and provided a complete experimental protocol in the main text and appendix. This protocol specifies the exact OOD datasets (ImageNet-A, ImageNet-R, ImageNet-V2), perturbation budgets (PGD with ε = 8/255 and 10 steps), and hyperparameter ranges used for tuning.
  Revision: yes
- Referee: Theory section (Robust PAC-Bayes grounding): without the explicit derivations of the adaptive weight perturbations scaled by local curvature, it remains unclear whether the regularization terms are parameter-free or reduce by construction to quantities fitted on the target data, raising a potential circularity risk for the claimed geometric benefits.
  Authors: We thank the referee for this observation. The original theory section presented the high-level PAC-Bayes motivation but omitted the full derivations for space reasons. We have now inserted the explicit step-by-step derivations in Section 3, showing that the adaptive perturbations are obtained from an online local Hessian-trace approximation computed during training and are not fitted post-hoc on the target data. The resulting regularization terms follow directly from the Robust PAC-Bayes bound and remain parameter-free in their core formulation; only standard validation-based selection is used for the few scalar hyperparameters.
  Revision: yes
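The rebuttal mentions an online Hessian-trace approximation without spelling it out. A common choice for this kind of estimate is Hutchinson's method, sketched below on a toy Hessian. That GRACE uses Hutchinson's estimator specifically is an assumption, as are the function names and the use of a plain running mean; in actual training the Hessian changes between steps, so an exponential moving average might be preferred.

```python
import random

def hutchinson_probe(hvp_fn, dim, rng):
    """One Hutchinson sample z^T H z with a Rademacher probe z (entries +1 or -1)."""
    z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    hz = hvp_fn(z)
    return sum(a * b for a, b in zip(z, hz))

def running_trace(hvp_fn, dim, steps, rng):
    """Running mean of Hutchinson samples, updated once per simulated step."""
    estimate = 0.0
    for t in range(1, steps + 1):
        sample = hutchinson_probe(hvp_fn, dim, rng)
        estimate += (sample - estimate) / t   # incremental mean
    return estimate

# Toy symmetric Hessian with trace 5 (illustrative values only).
H = [[2.0, 1.0], [1.0, 3.0]]

def toy_hvp(v):
    """Exact Hessian-vector product for the toy Hessian above."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]

trace_est = running_trace(toy_hvp, dim=2, steps=4000, rng=random.Random(0))
```

Each probe costs one Hessian-vector product, which is why estimators of this kind can be computed during training. The estimate uses only the current weights and a mini-batch gradient, which is consistent with the authors' statement that the quantity is not fitted post-hoc on target data.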
Circularity Check
No significant circularity detected
The paper's central argument is an empirical claim: GRACE, motivated by geometric diagnosis of sharp minima and unstable features and grounded in external Robust PAC-Bayes theory, yields simultaneous gains in ID accuracy (+10.8%), adversarial accuracy (+13.5%), and near-baseline OOD accuracy on ImageNet-CLIP fine-tuning. No load-bearing derivation step is shown to reduce to its own inputs by construction. The abstract and skeptic summary present the method as jointly regularizing curvature (via adaptive perturbations) and feature alignment, with reported numbers as direct experimental outcomes rather than fitted predictions renamed as results. No self-citation chains, ansatzes smuggled via prior work, or self-definitional quantities appear in the provided text. The framework is therefore treated as self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Robust PAC-Bayes theory supplies valid bounds for robustness under perturbation.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (J-cost uniqueness)
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: GRACE employs adaptive weight perturbations scaled by local curvature to promote flatter minima, combined with a feature alignment loss that maintains representation consistency across clean, adversarial, and OOD inputs.
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (D=3 forcing)
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: Layerwise Adaptive Low-Rank Adversarial Weight Perturbation (LAR-AWP): structured, low-rank adversarial perturbations with layerwise adaptive magnitudes
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.