Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

Ghodsiyeh Rostami; Mahdi S. Hosseini; Po-Han Chen

arxiv: 2606.10196 · v1 · pith:COTOANRFnew · submitted 2026-06-08 · 💻 cs.CV · cs.AI

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

Ghodsiyeh Rostami , Po-Han Chen , Mahdi S. Hosseini This is my paper

Pith reviewed 2026-06-27 16:32 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords parameter-efficient fine-tuningFisher informationadaptive fine-tuningPAC-Bayesian boundsJensen-Shannon distanceprogressive parameter selectionsegmentation adaptation

0 comments

The pith

Fisher structural drift allows progressive freezing of parameter groups during fine-tuning to reduce generalization error bounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FisherAdapTune to adapt pretrained models by selecting trainable parameters based on how their Fisher information distributions change over time. It begins from a PAC-Bayesian perspective on fine-tuning and decomposes the generalization error bound into Fisher-weighted update costs. Parameter groups whose curvature contribution stabilizes can then be frozen, lowering the bound while leaving other adaptation dynamics intact. The selection uses a scale-invariant Jensen-Shannon distance between consecutive Fisher distributions to create an adaptive active set. Experiments on a segmentation task show gains in both in-distribution accuracy and zero-shot transfer compared with fixed heuristic approaches.

Core claim

FisherAdapTune formulates parameter selection by tracking the temporal drift of Fisher geometry; starting from a PAC-Bayesian view, the generalization error bound is decomposed into Fisher-weighted update costs so that groups whose curvature contribution has stabilized can be frozen without interrupting remaining adaptation dynamics, with the criterion expressed via scale-invariant Jensen-Shannon distance between consecutive Fisher distributions.

What carries the argument

Scale-invariant Jensen-Shannon distance between consecutive Fisher distributions, used to detect structural drift and decide which parameter groups to freeze.

If this is right

The active parameter set becomes task-dependent rather than fixed by architecture.
Fewer parameters need updating once drift stabilizes, lowering compute per step.
The same drift signal improves both in-distribution performance and zero-shot transfer on segmentation tasks.
The PAC-Bayesian decomposition supplies an explicit link between curvature stability and error-bound reduction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same drift-tracking rule could be applied to other vision or language adaptation settings where Fisher matrices remain tractable.
If the decomposition holds, similar progressive freezing might combine with existing PEFT modules such as adapters or LoRA.
Measuring drift only on a small validation subset might preserve the method's efficiency while still capturing task-specific curvature changes.

Load-bearing premise

The generalization error bound from a PAC-Bayesian view of fine-tuning can be decomposed into Fisher-weighted update costs such that stabilized curvature groups can be frozen without harming remaining adaptation.

What would settle it

A controlled run in which parameters are frozen according to the Fisher drift criterion yet the final generalization error or downstream accuracy is higher than when the same groups remain trainable throughout training.

Figures

Figures reproduced from arXiv: 2606.10196 by Ghodsiyeh Rostami, Mahdi S. Hosseini, Po-Han Chen.

**Figure 1.** Figure 1: Layer-wise Fisher drift during full fine-tuning is uneven, motivating FisherAdapTune’s progressive freezing of stabilized layers. In this work, we propose FisherAdapTune, a Fisher-guided Adaptive Fine-Tuning framework for dynamic parameter selection during fine-tuning. From a PAC-Bayesian perspective, we theoretically show that freezing parameters with stabilized contributions effectively tightens gener… view at source ↗

**Figure 2.** Figure 2: Layer-wise Jensen-Shannon (JS) distance between consecutive Fisher distributions in [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Schematic overview of FisherAdapTune. The algorithm periodically estimates Fisher [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Ablation trends over the freezing threshold [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Comparative fine-tuning performance for SAM2 and SegFormer. OmniCrack F1-score [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Parameter selection dynamics by FisherAdapTune. The algorithm progressively freezes [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Parameter group-wise Jensen-Shannon (JS) distance between consecutive Fisher distribu [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗

**Figure 8.** Figure 8: Layer-wise Jensen-Shannon (JS) distance between consecutive Fisher distributions in [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

**Figure 9.** Figure 9: Trainable-parameter evolution for SAM2-Tiny across freezing thresholds [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

**Figure 10.** Figure 10: Trainable-parameter evolution for SAM2-Large across freezing thresholds SegFormerB3Trainable Parameter Eoltion per (FisherAdapTne) [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

**Figure 11.** Figure 11: Trainable-parameter evolution for SegFormer-MiT-B3 across freezing thresholds [PITH_FULL_IMAGE:figures/full_fig_p018_11.png] view at source ↗

**Figure 12.** Figure 12: Trainable-parameter evolution for SegFormer-MiT-B0 across freezing thresholds [PITH_FULL_IMAGE:figures/full_fig_p019_12.png] view at source ↗

read the original abstract

Parameter-efficient fine-tuning (PEFT) aims to adapt pretrained models with a small trainable parameter subset, however, most existing methods choose this subset from fixed architectural heuristics rather than using dynamic, task-aware criteria. We introduce \textbf{FisherAdapTune}, a Fisher-guided Adaptive Fine-Tuning framework that progressively selects parameter groups by tracking the temporal drift of their Fisher geometry. Starting from a PAC-Bayesian view of fine-tuning, we decompose the generalization error bound into Fisher-weighted update costs and show that parameter groups whose curvature contribution has stabilized can be frozen to reduce the error bound without interrupting the remaining adaptation dynamics. FisherAdapTune formulates this criterion with a scale-invariant Jensen-Shannon distance between consecutive Fisher distributions, yielding an adaptive active parameter set. We evaluate our approach on a downstream segmentation task, and results show FisherAdapTune improves the in-distribution performance and zero-shot transfer in multiple settings, validating that Fisher structural drift is a useful signal for efficient, task-aware adaptation. We release our \href{https://github.com/AtlasAnalyticsLab/FisherAdapTune}{code} publicly to enable further application of our proposed approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FisherAdapTune uses scale-invariant JS distance on Fisher distributions for progressive parameter freezing in PEFT, but the claimed PAC-Bayesian decomposition is not shown and experiments have no numbers.

read the letter

The paper's main contribution is FisherAdapTune, which tracks drift in Fisher geometry across training steps and uses a scale-invariant Jensen-Shannon distance to decide when to freeze parameter groups. This replaces the fixed architectural rules common in PEFT with a dynamic, task-aware rule. They motivate the freeze decision by starting from a PAC-Bayesian generalization bound, decomposing it into Fisher-weighted update costs, and arguing that stabilized groups can be dropped without raising the bound or stopping useful adaptation.

The new piece is the specific progressive criterion based on temporal Fisher drift rather than static heuristics. Releasing the code is useful for anyone who wants to test the implementation directly. The abstract ties the method to segmentation and zero-shot transfer gains, which at least shows they ran the experiments on a real downstream task.

The soft spots are in the theory and the missing evidence. The decomposition into Fisher-weighted costs is stated but not derived, so it is unclear whether the JS threshold actually produces a strict reduction in the bound or whether it rests on unstated assumptions about parameter-group independence. The stress-test concern about clean separation from KL and data terms looks like it still applies. On the empirical side, the abstract supplies no baselines, no quantitative deltas, and no error bars, which makes it impossible to judge whether the reported improvements are large or reliable.

This is for people working on efficient adaptation of vision models who are already thinking about dynamic parameter selection. A reader could extract the Fisher-drift idea and try it in their own setup using the released code. It deserves peer review because the framing is distinct enough that referees can check the missing derivations and the actual experimental controls. The authors engage with the right problem even if the current write-up leaves the central link under-supported.

Referee Report

2 major / 1 minor

Summary. The paper introduces FisherAdapTune, a Fisher-guided adaptive fine-tuning method that progressively selects parameter groups by tracking temporal drift in their Fisher geometry via scale-invariant Jensen-Shannon distance between consecutive Fisher distributions. Starting from a PAC-Bayesian perspective on fine-tuning, it claims to decompose the generalization error bound into Fisher-weighted update costs so that groups whose curvature contribution has stabilized can be frozen, reducing the bound without interrupting remaining adaptation dynamics. The approach is positioned as a task-aware alternative to fixed architectural heuristics in PEFT, with evaluation on a downstream segmentation task reportedly showing gains in in-distribution performance and zero-shot transfer; code is released publicly.

Significance. If the claimed decomposition holds and the empirical results prove robust with proper controls, the work could provide a principled, dynamic criterion for parameter selection in fine-tuning that improves efficiency over static PEFT methods. The public code release is a positive contribution that supports reproducibility.

major comments (2)

[Abstract] Abstract: the PAC-Bayesian decomposition of the generalization error bound into Fisher-weighted update costs is asserted without any derivation, explicit equations, or stated assumptions on parameter-group independence or higher-order terms; this leaves the justification for the JS-distance freeze criterion unverified and makes it impossible to confirm that freezing reduces the bound while preserving adaptation dynamics.
[Abstract] Abstract: performance improvements on segmentation and zero-shot transfer are claimed, yet no quantitative results, baselines, error bars, dataset details, or experimental protocol are supplied, so the data-to-claim link cannot be assessed and the central empirical support for Fisher structural drift as a useful signal remains unevaluated.

minor comments (1)

[Abstract] The abstract refers to 'multiple settings' without enumerating them or indicating whether they include standard benchmarks or ablation studies on the JS threshold.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each major point below and propose targeted revisions to improve clarity.

read point-by-point responses

Referee: [Abstract] Abstract: the PAC-Bayesian decomposition of the generalization error bound into Fisher-weighted update costs is asserted without any derivation, explicit equations, or stated assumptions on parameter-group independence or higher-order terms; this leaves the justification for the JS-distance freeze criterion unverified and makes it impossible to confirm that freezing reduces the bound while preserving adaptation dynamics.

Authors: The full derivation of the PAC-Bayesian decomposition, including explicit equations, the handling of higher-order terms, and assumptions on parameter-group independence, appears in Section 3 of the manuscript. The abstract summarizes the high-level contribution. To address the concern that the abstract alone leaves the justification unverified, we will revise the abstract to include a concise reference to the key decomposition equation and the relevant section. revision: yes
Referee: [Abstract] Abstract: performance improvements on segmentation and zero-shot transfer are claimed, yet no quantitative results, baselines, error bars, dataset details, or experimental protocol are supplied, so the data-to-claim link cannot be assessed and the central empirical support for Fisher structural drift as a useful signal remains unevaluated.

Authors: We agree that the abstract provides only a high-level claim without numbers or protocol details. The complete quantitative results (including specific gains, baselines, error bars, datasets, and experimental protocol) are reported in Section 4 and the supplementary material. We will revise the abstract to incorporate representative quantitative metrics and a brief mention of the evaluation settings to strengthen the link between claims and evidence. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation presented as independent decomposition from PAC-Bayesian starting point

full rationale

The abstract states that the authors start from a PAC-Bayesian view and decompose the generalization error bound into Fisher-weighted update costs, then formulate the freeze criterion via scale-invariant JS distance on Fisher distributions. No equations are supplied that reduce the JS threshold or active parameter set back to quantities defined inside the bound itself. No self-citations, fitted-input renamings, or ansatzes smuggled via prior work appear in the provided text. The central claim therefore retains independent content from the stated decomposition and the empirical JS criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the PAC-Bayesian decomposition of the generalization bound and the interpretation of stabilized Fisher curvature as safe to freeze; no free parameters or invented entities are mentioned.

axioms (1)

domain assumption PAC-Bayesian view of fine-tuning decomposes the generalization error bound into Fisher-weighted update costs
Stated explicitly as the starting point for the method.

pith-pipeline@v0.9.1-grok · 5737 in / 1179 out tokens · 31912 ms · 2026-06-27T16:32:26.707266+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

54 extracted references · 13 canonical work pages

[1]

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning, December 2020

Armen Aghajanyan, Luke Zettlemoyer, and Sonal Gupta. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning, December 2020

2020
[2]

Natural Gradient Works Efficiently in Learning.Neural Computation, 10(2): 251–276, February 1998

Shun-ichi Amari. Natural Gradient Works Efficiently in Learning.Neural Computation, 10(2): 251–276, February 1998. ISSN 0899-7667. doi: 10.1162/089976698300017746

work page doi:10.1162/089976698300017746 1998
[3]

Springer, February 2016

Shun-ichi Amari.Information Geometry and Its Applications. Springer, February 2016. ISBN 978-4-431-55978-8

2016
[4]

Composable Sparse Fine-Tuning for Cross-Lingual Transfer, February 2023

Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, and Ivan Vuli ´c. Composable Sparse Fine-Tuning for Cross-Lingual Transfer, February 2023

2023
[5]

Exploring Visual Prompts for Adapting Large-Scale Models, June 2022

Hyojin Bahng, Ali Jahanian, Swami Sankaranarayanan, and Phillip Isola. Exploring Visual Prompts for Adapting Large-Scale Models, June 2022

2022
[6]

Affor- dancellm: Grounding affordance from vision language models

Christian Benz and V olker Rodehorst. Omni-Crack30k: A Benchmark for Crack Segmentation and the Reasonable Effectiveness of Transfer Learning. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 3876–3886, June 2024. doi: 10.1109/CVPRW63382.2024.00392

work page doi:10.1109/cvprw63382.2024.00392 2024
[7]

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition.Advances in Neural Information Processing Systems, 35:16664–16678, December 2022

Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition.Advances in Neural Information Processing Systems, 35:16664–16678, December 2022

2022
[8]

Vision Transformer Adapter for Dense Predictions

Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Yu Qiao. Vision Transformer Adapter for Dense Predictions. InThe Eleventh International Conference on Learning Representations, September 2022

2022
[9]

The Devil is in the Crack Orientation: A New Perspective for Crack Detection

Zhuangzhuang Chen, Jin Zhang, Zhuonan Lai, Guanming Zhu, Zun Liu, Jie Chen, and Jianqiang Li. The Devil is in the Crack Orientation: A New Perspective for Crack Detection. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6630–6640, October
[10]

doi: 10.1109/ICCV51070.2023.00612

work page doi:10.1109/iccv51070.2023.00612 2023
[11]

Mind marginal non-crack regions: Clustering-inspired representation learning for crack segmentation

Zhuangzhuang Chen, Zhuonan Lai, Jie Chen, and Jianqiang Li. Mind marginal non-crack regions: Clustering-inspired representation learning for crack segmentation. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12698–12708, June
[12]

doi: 10.1109/CVPR52733.2024.01207. 10

work page doi:10.1109/cvpr52733.2024.01207 2024
[13]

Kronecker-factored Approximate Curvature (KFAC) From Scratch, July 2025

Felix Dangel, Bálint Mucsányi, Tobias Weber, and Runa Eschenhagen. Kronecker-factored Approximate Curvature (KFAC) From Scratch, July 2025

2025
[14]

Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data.arXiv preprint arXiv:1703.11008, 2017

Gintare Karolina Dziugaite and Daniel M Roy. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data.arXiv preprint arXiv:1703.11008, 2017

Pith/arXiv arXiv 2017
[15]

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, March 2019

Jonathan Frankle and Michael Carbin. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, March 2019

2019
[16]

K. Ge, C. Wang, Y . T. Guo, Y . S. Tang, Z. Z. Hu, and H. B. Chen. Fine-tuning vision foundation model for crack segmentation in civil infrastructures.Construction and Building Materials, 431:136573, June 2024. ISSN 0950-0618. doi: 10.1016/j.conbuildmat.2024.136573

work page doi:10.1016/j.conbuildmat.2024.136573 2024
[17]

Hosseini

Damien MARTINS Gomes, Yanlei Zhang, Eugene Belilovsky, Guy Wolf, and Mahdi S. Hosseini. AdaFisher: Adaptive Second Order Optimization via Fisher Information. InThe Thirteenth International Conference on Learning Representations, October 2024

2024
[18]

Rush, and Yoon Kim

Demi Guo, Alexander M. Rush, and Yoon Kim. Parameter-Efficient Transfer Learning with Diff Pruning, June 2021

2021
[19]

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey, April 2024

Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey, April 2024

2024
[20]

Parameter- Efficient Model Adaptation for Vision Transformers.Proceedings of the AAAI Conference on Artificial Intelligence, 37(1):817–825, June 2023

Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, and Xin Eric Wang. Parameter- Efficient Model Adaptation for Vision Transformers.Proceedings of the AAAI Conference on Artificial Intelligence, 37(1):817–825, June 2023. ISSN 2374-3468. doi: 10.1609/aaai.v37i1. 25160

work page doi:10.1609/aaai.v37i1 2023
[21]

On the slow death of scaling.Available at SSRN 5877662, 2025

Sara Hooker. On the slow death of scaling.Available at SSRN 5877662, 2025

2025
[22]

Parameter-efficient transfer learning for nlp

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. InInternational conference on machine learning, pages 2790–2799. PMLR, 2019

2019
[23]

Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-Rank Adaptation of Large Language Models, October 2021

2021
[24]

UniSim: A Neural Closed-Loop Sensor Simulator

Qidong Huang, Xiaoyi Dong, Dongdong Chen, Weiming Zhang, Feifei Wang, Gang Hua, and Nenghai Yu. Diversity-Aware Meta Visual Prompting. In2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10878–10887, Vancouver, BC, Canada, June 2023. IEEE. ISBN 979-8-3503-0129-8. doi: 10.1109/CVPR52729.2023.01047

work page doi:10.1109/cvpr52729.2023.01047 2023
[25]

Visual Prompt Tuning, July 2022

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual Prompt Tuning, July 2022

2022
[26]

Convolutional Bypasses Are Better Vision Transformer Adapters

Shibo Jie, Zhi-Hong Deng, Shixuan Chen, and Zhijuan Jin. Convolutional Bypasses Are Better Vision Transformer Adapters. InECAI 2024, pages 202–209. IOS Press, 2024. doi: 10.3233/FAIA240489

work page doi:10.3233/faia240489 2024
[27]

BayesTune: Bayesian Sparse Deep Model Fine- tuning.Advances in Neural Information Processing Systems, 36:65317–65365, December 2023

Minyoung Kim and Timothy Hospedales. BayesTune: Bayesian Sparse Deep Model Fine- tuning.Advances in Neural Information Processing Systems, 36:65317–65365, December 2023

2023
[28]

Overcoming catastrophic forgetting in neural networks.Proceedings of the national academy of sciences, 114(13):3521–3526, 2017

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks.Proceedings of the national academy of sciences, 114(13):3521–3526, 2017

2017
[29]

Fine- Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

Ananya Kumar, Aditi Raghunathan, Robbie Matthew Jones, Tengyu Ma, and Percy Liang. Fine- Tuning can Distort Pretrained Features and Underperform Out-of-Distribution. InInternational Conference on Learning Representations, October 2021. 11

2021
[30]

Real-time high-resolution neural network with semantic guidance for crack segmentation.Automation in Construction, 156: 105112, December 2023

Yongshang Li, Ronggui Ma, Han Liu, and Gaoli Cheng. Real-time high-resolution neural network with semantic guidance for crack segmentation.Automation in Construction, 156: 105112, December 2023. ISSN 0926-5805. doi: 10.1016/j.autcon.2023.105112

work page doi:10.1016/j.autcon.2023.105112 2023
[31]

Selecting Large Language Model to Fine-tune via Rectified Scaling Law

Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, Zihao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, and Yitao Liang. Selecting Large Language Model to Fine-tune via Rectified Scaling Law. InProceedings of the 41st International Conference on Machine Learning, pages 30080–30107. PMLR, July 2024

2024
[32]

AutoFreeze: Automatically Freez- ing Model Blocks to Accelerate Fine-tuning, April 2021

Yuhan Liu, Saurabh Agarwal, and Shivaram Venkataraman. AutoFreeze: Automatically Freez- ing Model Blocks to Accelerate Fine-tuning, April 2021

2021
[33]

PAC-Bayes compression bounds so tight that they can explain generalization

Sanae Lotfi, Marc Anton Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, and Andrew Gordon Wilson. PAC-Bayes compression bounds so tight that they can explain generalization. InAdvances in Neural Information Processing Systems, 2022. URL https: //openreview.net/forum?id=o8nYuR8ekFm

2022
[34]

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

James Martens and Roger Grosse. Optimizing Neural Networks with Kronecker-factored Approximate Curvature. InProceedings of the 32nd International Conference on Machine Learning, pages 2408–2417. PMLR, June 2015

2015
[35]

McAllester

David A. McAllester. PAC-Bayesian model averaging. InProceedings of the Twelfth Annual Conference on Computational Learning Theory, pages 164–170, Santa Cruz California USA, July 1999. ACM. ISBN 978-1-58113-167-3. doi: 10.1145/307400.307435

work page doi:10.1145/307400.307435 1999
[36]

McAllester

David A. McAllester. Some PAC-Bayesian Theorems.Machine Learning, 37(3):355–363, December 1999. ISSN 1573-0565. doi: 10.1023/A:1007618624809

work page doi:10.1023/a:1007618624809 1999
[37]

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.Advances in Neural Information Processing Systems, 35:30950–30962, December 2022

Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, and Dacheng Tao. Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.Advances in Neural Information Processing Systems, 35:30950–30962, December 2022

2022
[38]

How Do Vision Transformers Work? InInternational Conference on Learning Representations, October 2021

Namuk Park and Songkuk Kim. How Do Vision Transformers Work? InInternational Conference on Learning Representations, October 2021

2021
[39]

SAM 2: Segment Anything in Images and Videos, October 2024

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. SAM 2: Segment Anything in Images and Videos, October 2024

2024
[40]

Trade-Offs of Diagonal Fisher Information Matrix Estimators

Alexander Soen and Ke Sun. Trade-Offs of Diagonal Fisher Information Matrix Estimators. Advances in Neural Information Processing Systems, 37:5870–5912, December 2024

2024
[41]

Training Neural Networks with Fixed Sparse Masks - Preprint, November 2021

Yi-Lin Sung, Varun Nair, and Colin Raffel. Training Neural Networks with Fixed Sparse Masks - Preprint, November 2021

2021
[42]

Scale Efficiently: Insights from Pretraining and Finetuning Transformers

Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, and Donald Metzler. Scale Efficiently: Insights from Pretraining and Finetuning Transformers. InInternational Conference on Learning Representations, October 2021

2021
[43]

Automatic concrete crack segmentation model based on transformer

Wenjun Wang and Chao Su. Automatic concrete crack segmentation model based on transformer. Automation in Construction, 139:104275, July 2022. ISSN 09265805. doi: 10.1016/j.autcon. 2022.104275

work page doi:10.1016/j.autcon 2022
[44]

LoRA-Pro: Are Low- Rank Adapters Properly Optimized? InThe Thirteenth International Conference on Learning Representations, October 2024

Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, and Tieniu Tan. LoRA-Pro: Are Low- Rank Adapters Properly Optimized? InThe Thirteenth International Conference on Learning Representations, October 2024

2024
[45]

Alvarez, and Ping Luo

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, and Ping Luo. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Advances in Neural Information Processing Systems, volume 34, pages 12077–12090. Curran Associates, Inc., 2021. 12

2021
[46]

Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning, September 2021

Runxin Xu, Fuli Luo, Zhiyuan Zhang, Chuanqi Tan, Baobao Chang, Songfang Huang, and Fei Huang. Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning, September 2021

2021
[47]

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models, September 2022

Elad Ben Zaken, Shauli Ravfogel, and Yoav Goldberg. BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models, September 2022

2022
[48]

Parameter- Efficient Fine-Tuning for Foundation Models, January 2025

Dan Zhang, Tao Feng, Lilong Xue, Yuandong Wang, Yuxiao Dong, and Jie Tang. Parameter- Efficient Fine-Tuning for Foundation Models, January 2025

2025
[49]

Adalora: Adaptive budget allocation for parameter- efficient fine-tuning.arXiv preprint arXiv:2303.10512, 2023

Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. Adalora: Adaptive budget allocation for parameter- efficient fine-tuning.arXiv preprint arXiv:2303.10512, 2023

Pith/arXiv arXiv 2023
[50]

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, and Cihang Xie. Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning. InThe Twelfth International Conference on Learning Representations, October 2023

2023
[51]

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Gener- alized Segmentation

Dong Zhao, Jinlong Li, Shuang Wang, Mengyao Wu, Qi Zang, Nicu Sebe, and Zhun Zhong. FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Gener- alized Segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15043–15054, 2025

2025
[52]

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models, October 2020

Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, and Hinrich Schütze. Masking as an Efficient Alternative to Finetuning for Pretrained Language Models, October 2020

2020
[53]

Non-vacuous generalization bounds at the imagenet scale: a pac-bayesian compression approach.arXiv preprint arXiv:1804.05862, 2018

Wenda Zhou, Victor Veitch, Morgane Austern, Ryan P Adams, and Peter Orbanz. Non-vacuous generalization bounds at the imagenet scale: a pac-bayesian compression approach.arXiv preprint arXiv:1804.05862, 2018

Pith/arXiv arXiv 2018
[54]

A Comprehensive Survey on Transfer Learning.Proceedings of the IEEE, 109 (1):43–76, January 2021

Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. A Comprehensive Survey on Transfer Learning.Proceedings of the IEEE, 109 (1):43–76, January 2021. ISSN 1558-2256. doi: 10.1109/JPROC.2020.3004555. 13 Appendix Overview The appendix provides theoretical derivations, implementation details for Fisher drift, s...

work page doi:10.1109/jproc.2020.3004555 2021

[1] [1]

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning, December 2020

Armen Aghajanyan, Luke Zettlemoyer, and Sonal Gupta. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning, December 2020

2020

[2] [2]

Natural Gradient Works Efficiently in Learning.Neural Computation, 10(2): 251–276, February 1998

Shun-ichi Amari. Natural Gradient Works Efficiently in Learning.Neural Computation, 10(2): 251–276, February 1998. ISSN 0899-7667. doi: 10.1162/089976698300017746

work page doi:10.1162/089976698300017746 1998

[3] [3]

Springer, February 2016

Shun-ichi Amari.Information Geometry and Its Applications. Springer, February 2016. ISBN 978-4-431-55978-8

2016

[4] [4]

Composable Sparse Fine-Tuning for Cross-Lingual Transfer, February 2023

Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, and Ivan Vuli ´c. Composable Sparse Fine-Tuning for Cross-Lingual Transfer, February 2023

2023

[5] [5]

Exploring Visual Prompts for Adapting Large-Scale Models, June 2022

Hyojin Bahng, Ali Jahanian, Swami Sankaranarayanan, and Phillip Isola. Exploring Visual Prompts for Adapting Large-Scale Models, June 2022

2022

[6] [6]

Affor- dancellm: Grounding affordance from vision language models

Christian Benz and V olker Rodehorst. Omni-Crack30k: A Benchmark for Crack Segmentation and the Reasonable Effectiveness of Transfer Learning. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 3876–3886, June 2024. doi: 10.1109/CVPRW63382.2024.00392

work page doi:10.1109/cvprw63382.2024.00392 2024

[7] [7]

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition.Advances in Neural Information Processing Systems, 35:16664–16678, December 2022

Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition.Advances in Neural Information Processing Systems, 35:16664–16678, December 2022

2022

[8] [8]

Vision Transformer Adapter for Dense Predictions

Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Yu Qiao. Vision Transformer Adapter for Dense Predictions. InThe Eleventh International Conference on Learning Representations, September 2022

2022

[9] [9]

The Devil is in the Crack Orientation: A New Perspective for Crack Detection

Zhuangzhuang Chen, Jin Zhang, Zhuonan Lai, Guanming Zhu, Zun Liu, Jie Chen, and Jianqiang Li. The Devil is in the Crack Orientation: A New Perspective for Crack Detection. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6630–6640, October

[10] [10]

doi: 10.1109/ICCV51070.2023.00612

work page doi:10.1109/iccv51070.2023.00612 2023

[11] [11]

Mind marginal non-crack regions: Clustering-inspired representation learning for crack segmentation

Zhuangzhuang Chen, Zhuonan Lai, Jie Chen, and Jianqiang Li. Mind marginal non-crack regions: Clustering-inspired representation learning for crack segmentation. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12698–12708, June

[12] [12]

doi: 10.1109/CVPR52733.2024.01207. 10

work page doi:10.1109/cvpr52733.2024.01207 2024

[13] [13]

Kronecker-factored Approximate Curvature (KFAC) From Scratch, July 2025

Felix Dangel, Bálint Mucsányi, Tobias Weber, and Runa Eschenhagen. Kronecker-factored Approximate Curvature (KFAC) From Scratch, July 2025

2025

[14] [14]

Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data.arXiv preprint arXiv:1703.11008, 2017

Gintare Karolina Dziugaite and Daniel M Roy. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data.arXiv preprint arXiv:1703.11008, 2017

Pith/arXiv arXiv 2017

[15] [15]

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, March 2019

Jonathan Frankle and Michael Carbin. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, March 2019

2019

[16] [16]

K. Ge, C. Wang, Y . T. Guo, Y . S. Tang, Z. Z. Hu, and H. B. Chen. Fine-tuning vision foundation model for crack segmentation in civil infrastructures.Construction and Building Materials, 431:136573, June 2024. ISSN 0950-0618. doi: 10.1016/j.conbuildmat.2024.136573

work page doi:10.1016/j.conbuildmat.2024.136573 2024

[17] [17]

Hosseini

Damien MARTINS Gomes, Yanlei Zhang, Eugene Belilovsky, Guy Wolf, and Mahdi S. Hosseini. AdaFisher: Adaptive Second Order Optimization via Fisher Information. InThe Thirteenth International Conference on Learning Representations, October 2024

2024

[18] [18]

Rush, and Yoon Kim

Demi Guo, Alexander M. Rush, and Yoon Kim. Parameter-Efficient Transfer Learning with Diff Pruning, June 2021

2021

[19] [19]

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey, April 2024

Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey, April 2024

2024

[20] [20]

Parameter- Efficient Model Adaptation for Vision Transformers.Proceedings of the AAAI Conference on Artificial Intelligence, 37(1):817–825, June 2023

Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, and Xin Eric Wang. Parameter- Efficient Model Adaptation for Vision Transformers.Proceedings of the AAAI Conference on Artificial Intelligence, 37(1):817–825, June 2023. ISSN 2374-3468. doi: 10.1609/aaai.v37i1. 25160

work page doi:10.1609/aaai.v37i1 2023

[21] [21]

On the slow death of scaling.Available at SSRN 5877662, 2025

Sara Hooker. On the slow death of scaling.Available at SSRN 5877662, 2025

2025

[22] [22]

Parameter-efficient transfer learning for nlp

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. InInternational conference on machine learning, pages 2790–2799. PMLR, 2019

2019

[23] [23]

Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-Rank Adaptation of Large Language Models, October 2021

2021

[24] [24]

UniSim: A Neural Closed-Loop Sensor Simulator

Qidong Huang, Xiaoyi Dong, Dongdong Chen, Weiming Zhang, Feifei Wang, Gang Hua, and Nenghai Yu. Diversity-Aware Meta Visual Prompting. In2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10878–10887, Vancouver, BC, Canada, June 2023. IEEE. ISBN 979-8-3503-0129-8. doi: 10.1109/CVPR52729.2023.01047

work page doi:10.1109/cvpr52729.2023.01047 2023

[25] [25]

Visual Prompt Tuning, July 2022

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual Prompt Tuning, July 2022

2022

[26] [26]

Convolutional Bypasses Are Better Vision Transformer Adapters

Shibo Jie, Zhi-Hong Deng, Shixuan Chen, and Zhijuan Jin. Convolutional Bypasses Are Better Vision Transformer Adapters. InECAI 2024, pages 202–209. IOS Press, 2024. doi: 10.3233/FAIA240489

work page doi:10.3233/faia240489 2024

[27] [27]

BayesTune: Bayesian Sparse Deep Model Fine- tuning.Advances in Neural Information Processing Systems, 36:65317–65365, December 2023

Minyoung Kim and Timothy Hospedales. BayesTune: Bayesian Sparse Deep Model Fine- tuning.Advances in Neural Information Processing Systems, 36:65317–65365, December 2023

2023

[28] [28]

Overcoming catastrophic forgetting in neural networks.Proceedings of the national academy of sciences, 114(13):3521–3526, 2017

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks.Proceedings of the national academy of sciences, 114(13):3521–3526, 2017

2017

[29] [29]

Fine- Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

Ananya Kumar, Aditi Raghunathan, Robbie Matthew Jones, Tengyu Ma, and Percy Liang. Fine- Tuning can Distort Pretrained Features and Underperform Out-of-Distribution. InInternational Conference on Learning Representations, October 2021. 11

2021

[30] [30]

Real-time high-resolution neural network with semantic guidance for crack segmentation.Automation in Construction, 156: 105112, December 2023

Yongshang Li, Ronggui Ma, Han Liu, and Gaoli Cheng. Real-time high-resolution neural network with semantic guidance for crack segmentation.Automation in Construction, 156: 105112, December 2023. ISSN 0926-5805. doi: 10.1016/j.autcon.2023.105112

work page doi:10.1016/j.autcon.2023.105112 2023

[31] [31]

Selecting Large Language Model to Fine-tune via Rectified Scaling Law

Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, Zihao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, and Yitao Liang. Selecting Large Language Model to Fine-tune via Rectified Scaling Law. InProceedings of the 41st International Conference on Machine Learning, pages 30080–30107. PMLR, July 2024

2024

[32] [32]

AutoFreeze: Automatically Freez- ing Model Blocks to Accelerate Fine-tuning, April 2021

Yuhan Liu, Saurabh Agarwal, and Shivaram Venkataraman. AutoFreeze: Automatically Freez- ing Model Blocks to Accelerate Fine-tuning, April 2021

2021

[33] [33]

PAC-Bayes compression bounds so tight that they can explain generalization

Sanae Lotfi, Marc Anton Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, and Andrew Gordon Wilson. PAC-Bayes compression bounds so tight that they can explain generalization. InAdvances in Neural Information Processing Systems, 2022. URL https: //openreview.net/forum?id=o8nYuR8ekFm

2022

[34] [34]

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

James Martens and Roger Grosse. Optimizing Neural Networks with Kronecker-factored Approximate Curvature. InProceedings of the 32nd International Conference on Machine Learning, pages 2408–2417. PMLR, June 2015

2015

[35] [35]

McAllester

David A. McAllester. PAC-Bayesian model averaging. InProceedings of the Twelfth Annual Conference on Computational Learning Theory, pages 164–170, Santa Cruz California USA, July 1999. ACM. ISBN 978-1-58113-167-3. doi: 10.1145/307400.307435

work page doi:10.1145/307400.307435 1999

[36] [36]

McAllester

David A. McAllester. Some PAC-Bayesian Theorems.Machine Learning, 37(3):355–363, December 1999. ISSN 1573-0565. doi: 10.1023/A:1007618624809

work page doi:10.1023/a:1007618624809 1999

[37] [37]

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.Advances in Neural Information Processing Systems, 35:30950–30962, December 2022

Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, and Dacheng Tao. Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.Advances in Neural Information Processing Systems, 35:30950–30962, December 2022

2022

[38] [38]

How Do Vision Transformers Work? InInternational Conference on Learning Representations, October 2021

Namuk Park and Songkuk Kim. How Do Vision Transformers Work? InInternational Conference on Learning Representations, October 2021

2021

[39] [39]

SAM 2: Segment Anything in Images and Videos, October 2024

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. SAM 2: Segment Anything in Images and Videos, October 2024

2024

[40] [40]

Trade-Offs of Diagonal Fisher Information Matrix Estimators

Alexander Soen and Ke Sun. Trade-Offs of Diagonal Fisher Information Matrix Estimators. Advances in Neural Information Processing Systems, 37:5870–5912, December 2024

2024

[41] [41]

Training Neural Networks with Fixed Sparse Masks - Preprint, November 2021

Yi-Lin Sung, Varun Nair, and Colin Raffel. Training Neural Networks with Fixed Sparse Masks - Preprint, November 2021

2021

[42] [42]

Scale Efficiently: Insights from Pretraining and Finetuning Transformers

Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, and Donald Metzler. Scale Efficiently: Insights from Pretraining and Finetuning Transformers. InInternational Conference on Learning Representations, October 2021

2021

[43] [43]

Automatic concrete crack segmentation model based on transformer

Wenjun Wang and Chao Su. Automatic concrete crack segmentation model based on transformer. Automation in Construction, 139:104275, July 2022. ISSN 09265805. doi: 10.1016/j.autcon. 2022.104275

work page doi:10.1016/j.autcon 2022

[44] [44]

LoRA-Pro: Are Low- Rank Adapters Properly Optimized? InThe Thirteenth International Conference on Learning Representations, October 2024

Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, and Tieniu Tan. LoRA-Pro: Are Low- Rank Adapters Properly Optimized? InThe Thirteenth International Conference on Learning Representations, October 2024

2024

[45] [45]

Alvarez, and Ping Luo

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, and Ping Luo. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Advances in Neural Information Processing Systems, volume 34, pages 12077–12090. Curran Associates, Inc., 2021. 12

2021

[46] [46]

Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning, September 2021

Runxin Xu, Fuli Luo, Zhiyuan Zhang, Chuanqi Tan, Baobao Chang, Songfang Huang, and Fei Huang. Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning, September 2021

2021

[47] [47]

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models, September 2022

Elad Ben Zaken, Shauli Ravfogel, and Yoav Goldberg. BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models, September 2022

2022

[48] [48]

Parameter- Efficient Fine-Tuning for Foundation Models, January 2025

Dan Zhang, Tao Feng, Lilong Xue, Yuandong Wang, Yuxiao Dong, and Jie Tang. Parameter- Efficient Fine-Tuning for Foundation Models, January 2025

2025

[49] [49]

Adalora: Adaptive budget allocation for parameter- efficient fine-tuning.arXiv preprint arXiv:2303.10512, 2023

Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. Adalora: Adaptive budget allocation for parameter- efficient fine-tuning.arXiv preprint arXiv:2303.10512, 2023

Pith/arXiv arXiv 2023

[50] [50]

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, and Cihang Xie. Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning. InThe Twelfth International Conference on Learning Representations, October 2023

2023

[51] [51]

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Gener- alized Segmentation

Dong Zhao, Jinlong Li, Shuang Wang, Mengyao Wu, Qi Zang, Nicu Sebe, and Zhun Zhong. FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Gener- alized Segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15043–15054, 2025

2025

[52] [52]

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models, October 2020

Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, and Hinrich Schütze. Masking as an Efficient Alternative to Finetuning for Pretrained Language Models, October 2020

2020

[53] [53]

Non-vacuous generalization bounds at the imagenet scale: a pac-bayesian compression approach.arXiv preprint arXiv:1804.05862, 2018

Wenda Zhou, Victor Veitch, Morgane Austern, Ryan P Adams, and Peter Orbanz. Non-vacuous generalization bounds at the imagenet scale: a pac-bayesian compression approach.arXiv preprint arXiv:1804.05862, 2018

Pith/arXiv arXiv 2018

[54] [54]

A Comprehensive Survey on Transfer Learning.Proceedings of the IEEE, 109 (1):43–76, January 2021

Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. A Comprehensive Survey on Transfer Learning.Proceedings of the IEEE, 109 (1):43–76, January 2021. ISSN 1558-2256. doi: 10.1109/JPROC.2020.3004555. 13 Appendix Overview The appendix provides theoretical derivations, implementation details for Fisher drift, s...

work page doi:10.1109/jproc.2020.3004555 2021