Decompose, Mix, Adapt: A Unified Framework for Parameter-Efficient Neural Network Recombination and Compression

Bryan A. Plummer; Nazia Tasnim; Shrimai Prabhumoye

arxiv: 2603.27383 · v2 · submitted 2026-03-28 · 💻 cs.CV

Pith reviewed 2026-05-14 22:13 UTC · model grok-4.3

keywords: CRISP · parameter recombination · model compression · parameter-efficient fine-tuning · weight factorization · unified framework · neural network adaptation · computer vision

The pith

CRISP factorizes pretrained weights into shared bases and small mixers to support both model compression and parameter-efficient fine-tuning in a single framework.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CRISP as a unified parameter recombination method that factorizes neural network weights into basis matrices shared across layers and small component mixing projections. Sharing and resizing the bases enables compression, while the tiny mixer weights (fewer than 200 parameters in some experiments) support quick adaptation to new tasks. This integration addresses the cost of applying separate techniques for compression and fine-tuning, which matters most in edge deployments, where a PEFT module with millions of parameters can take up a large share of an already compressed model. A sympathetic reader would care because it promises models that are both smaller and more adaptable without composing multiple prior methods.

Core claim

CRISP factorizes pretrained weights into basis matrices and their component mixing projections. Sharing basis matrices across layers and adjusting their size enables model compression, whereas the small size of the mixer weights enables parameter-efficient fine-tuning. Experiments show CRISP outperforms prior methods capable of dual-task applications by 4-5%, while also outperforming the state of the art in PEFT by 1.5% and PEFT+MC combinations by 1%.

What carries the argument

Coefficient-gated weight Recombination by Interpolated Shared basis Projections (CRISP), which decomposes weights into shared basis matrices for compression and small mixing projections for adaptation.
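
To make the decomposition concrete, the sketch below assumes the simplest instance of it: each layer's weight is a coefficient-weighted sum of K basis matrices shared by all layers, so a layer stores only its K coefficients. The function name `recombine`, the scalar coefficients, and the toy 2×2 shapes are hypothetical; the paper's mixers are matrices (A′_rs in Figure 8) and its exact recombination rule is not reproduced here.

```python
from typing import Dict, List

Matrix = List[List[float]]


def recombine(bases: List[Matrix], coeffs: List[float]) -> Matrix:
    """Return W = sum_k coeffs[k] * bases[k] (hypothetical recombination rule)."""
    if len(bases) != len(coeffs):
        raise ValueError("expected one coefficient per basis matrix")
    rows, cols = len(bases[0]), len(bases[0][0])
    weight = [[0.0] * cols for _ in range(rows)]
    for basis, c in zip(bases, coeffs):
        for i in range(rows):
            for j in range(cols):
                weight[i][j] += c * basis[i][j]
    return weight


# Two 2x2 bases stored once and shared by every layer.
shared_bases: List[Matrix] = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
# Each layer keeps only its own small set of mixing coefficients.
layer_coeffs: Dict[str, List[float]] = {"layer0": [2.0, 0.5], "layer1": [-1.0, 3.0]}
layer_weights = {name: recombine(shared_bases, c) for name, c in layer_coeffs.items()}
```

Here `layer_weights["layer0"]` is `[[2.0, 0.5], [0.5, 2.0]]`. Under this assumption, basis storage grows with the number of bases rather than the number of layers, which is where the compression would come from.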

If this is right

Models can be compressed by reducing basis size while still adapting to new tasks with fewer than 200 additional parameters in some reported settings.
Dual-task performance exceeds prior dual-task-capable methods by 4-5% on the reported benchmarks.
The approach outperforms standalone state-of-the-art PEFT methods by 1.5% and combined PEFT plus model compression baselines by 1%.
A single factorization removes the need to compose separate parameter recombination techniques for compression and fine-tuning.
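
The adaptation side can be illustrated under the same assumed rule W = Σ_k c_k B_k: the bases stay frozen and only the coefficients are trained, so the trainable-parameter count equals the number of coefficients. The toy regression task, the squared-error loss, the learning rate, and the name `adapt_coefficients` are illustrative assumptions, not the paper's training recipe.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def adapt_coefficients(
    bases: List[Matrix],
    coeffs: List[float],
    data: List[Tuple[List[float], List[float]]],
    lr: float = 0.05,
    steps: int = 500,
) -> List[float]:
    """Gradient descent on the coefficients only; the shared bases are never updated.

    Loss: mean over samples of 0.5 * ||W x - y||^2 with W = sum_k c_k B_k, so
    dL/dc_k = mean over samples of (W x - y) . (B_k x).
    """
    coeffs = list(coeffs)
    for _ in range(steps):
        grads = [0.0] * len(coeffs)
        for x, y in data:
            basis_out = [matvec(b, x) for b in bases]  # B_k x (frozen)
            pred = [sum(c * bo[i] for c, bo in zip(coeffs, basis_out)) for i in range(len(y))]
            resid = [p - t for p, t in zip(pred, y)]
            for k, bo in enumerate(basis_out):
                grads[k] += sum(r * v for r, v in zip(resid, bo)) / len(data)
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs


bases = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
# Toy "new task": targets produced by W = 2*B_0 + 0.5*B_1.
data = [([1.0, 0.0], [2.0, 0.5]), ([0.0, 1.0], [0.5, 2.0])]
learned = adapt_coefficients(bases, [0.0, 0.0], data)
trainable_params = len(learned)
```

The learned coefficients approach `[2.0, 0.5]` while only two numbers are trained. With K shared bases and L layers, this rule would train L·K numbers; 12 layers with 4 bases, for example, would give 48.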

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

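The shared-basis approach might extend naturally to transformer-based models outside computer vision; the commonsense reasoning comparison in Table 10 (with BoolQ, PIQA, ARC, and OpenBookQA among the cited benchmarks) suggests the paper already takes a step in this direction for PEFT.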
Further reductions in basis size could be tested to determine the exact compression limits before performance degrades.
Figure 4 already pairs CRISP with 8-bit post-training quantization; combining it with pruning, or with lower-bit quantization, might produce further additive efficiency gains.
Deployment on edge hardware could be measured directly to quantify the practical memory and latency savings.

Load-bearing premise

Factorizing pretrained weights into shared basis matrices and small component mixing projections preserves sufficient model capacity and performance when bases are shared across layers for compression.

What would settle it

A performance drop exceeding 5% on standard vision benchmarks when using the compressed CRISP model compared to uncompressed fine-tuned baselines would falsify the capacity preservation claim.

Figures

Figures reproduced from arXiv: 2603.27383 by Bryan A. Plummer, Nazia Tasnim, Shrimai Prabhumoye.

**Figure 1.** PR approach comparison. (a) Prior work in PR typically focuses on PEFT or MC alone [3, 7, 15, 19, 26, 28, 40, 47, 54, 60, 68, 69, 77, 78, 86, 90, 94, 96, 102], which can result in inefficient combinations when deployed together. (b) Our unified PR approach CRISP decomposes a pretrained model's weights into components that support both MC and PEFT, enabling us to use parameter budgets more effectively even as tasks scale. PE…

**Figure 2.** CRISP decomposes a pretrained weight matrix into a frozen shared basis and small, learnable mixer matrices, then retrofits these …

**Figure 3.** PEFT performance using a ViT-S/16 across a range of …

**Figure 4.** Comparing ImageNet [11] performance with and without 8-bit PTQ [83] compression. We find CRISP accurately reproduces the original model's performance while also demonstrating effective compositionality with other compression techniques.
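
Figure 4 pairs CRISP with 8-bit post-training quantization [83]. The quantizer in [83] is not reproduced here; as a generic stand-in, the sketch below applies symmetric per-tensor int8 round-to-nearest quantization to a hypothetical row of a stored basis matrix. Quantizing the stored bases (rather than activations) is an assumption made for illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization with round-to-nearest.

    A generic PTQ sketch; Python's round() breaks exact .5 ties to even.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    return [x * scale for x in q]


# Hypothetical row of a shared basis matrix.
basis_row = [0.8, -0.25, 0.1, -1.27]
q, scale = quantize_int8(basis_row)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(basis_row, restored))
```

Storing int8 codes plus one float scale per tensor cuts storage roughly 4× relative to float32, which is why a step like this can stack on top of the savings from basis sharing.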

**Figure 5.** Impact of mixer matrix dimensions on model capac…

**Figure 6.** Impact of regularization constraint placement across …

**Figure 7.** Effect of reconstruction loss functions during neural mimicry. We compare four loss functions (Huber, Smooth-L1, MSE, L1) used in the neural mimicry stage (Equation 5 of main paper) for retrofitting pretrained weights into CRISP's basis-mixer decomposition.

Figure 8 plot (accuracy across initialization methods): panels AIRCRAFT, BIRDS, CIFAR100; y-axis Accuracy (%), 70.0 to 90.0; series Uniform, Kaiming, Xavier, Orthogonal.
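
The four reconstruction losses compared in Figure 7 have standard definitions. The sketch below follows the conventions of PyTorch's `MSELoss`, `L1Loss`, `SmoothL1Loss` (with `beta`), and `HuberLoss` (with `delta`), applied elementwise to the difference between a pretrained weight and its recombined approximation. The paper's actual objective (its Equation 5) is not shown here, and `reconstruction_loss` plus the toy matrices are hypothetical.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


def mse(d: float) -> float:
    return d * d


def l1(d: float) -> float:
    return abs(d)


def smooth_l1(d: float, beta: float = 1.0) -> float:
    # Quadratic (scaled by 1/beta) inside |d| < beta, linear outside.
    a = abs(d)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def huber(d: float, delta: float = 1.0) -> float:
    # Quadratic inside |d| <= delta, linear with slope delta outside.
    a = abs(d)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


LOSSES: Dict[str, Callable[[float], float]] = {
    "Huber": huber, "Smooth-L1": smooth_l1, "MSE": mse, "L1": l1,
}


def reconstruction_loss(pretrained: Matrix, recombined: Matrix,
                        loss: Callable[[float], float]) -> float:
    """Mean elementwise loss between a pretrained weight and its approximation."""
    diffs = [p - r for prow, rrow in zip(pretrained, recombined) for p, r in zip(prow, rrow)]
    return sum(loss(d) for d in diffs) / len(diffs)


pretrained = [[1.0, 2.0], [0.0, -1.0]]
recombined = [[0.5, 2.0], [0.0, 2.0]]
scores = {name: reconstruction_loss(pretrained, recombined, fn) for name, fn in LOSSES.items()}
```

With one large residual (-3.0) in this toy example, MSE (2.3125) penalizes it far more than L1 (0.875) or Huber and Smooth-L1 (both 0.65625 at delta = beta = 1), which is the kind of trade-off a loss comparison like Figure 7 probes.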

**Figure 8.** Robustness to initialization methods. We evaluate four standard initialization schemes (Uniform, Kaiming, Xavier, Orthogonal) for the mixer matrices A′_rs during both neural mimicry retrofitting and subsequent task adaptation.

…weight-space reconstruction without data is insufficient for aggressive compression. In contrast, distillation boosts accuracy by 31%, validating our two-stage approach. …
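
Figure 8 compares four initialization schemes for the mixers. The sketch below gives common textbook versions of each for a weight of shape (fan_out, fan_in); the `Uniform` bound of 0.1, the ReLU gain used for Kaiming, the 4×8 mixer shape, and the Gram-Schmidt construction of the orthogonal matrix are assumptions, since the caption does not state which variants the paper used.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def uniform_init(rows: int, cols: int, bound: float, rng: random.Random) -> Matrix:
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def xavier_uniform(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Glorot/Xavier: bound = sqrt(6 / (fan_in + fan_out)), with shape (fan_out, fan_in).
    return uniform_init(rows, cols, math.sqrt(6.0 / (rows + cols)), rng)


def kaiming_uniform(rows: int, cols: int, rng: random.Random) -> Matrix:
    # He/Kaiming with ReLU gain sqrt(2): bound = sqrt(6 / fan_in), fan_in = cols.
    return uniform_init(rows, cols, math.sqrt(6.0 / cols), rng)


def orthogonal(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Gram-Schmidt on Gaussian vectors gives orthonormal rows (needs rows <= cols).
    if rows > cols:
        raise ValueError("this sketch only builds matrices with orthonormal rows")
    out: Matrix = []
    while len(out) < rows:
        v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
        for u in out:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # resample in the (unlikely) degenerate case
            out.append([a / norm for a in v])
    return out


rng = random.Random(0)
shape = (4, 8)  # hypothetical mixer shape (fan_out, fan_in)
inits = {
    "Uniform": uniform_init(*shape, bound=0.1, rng=rng),
    "Xavier": xavier_uniform(*shape, rng),
    "Kaiming": kaiming_uniform(*shape, rng),
    "Orthogonal": orthogonal(*shape, rng),
}
```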

Original abstract

Parameter Recombination (PR) methods aim to efficiently compose the weights of a neural network for applications like Parameter-Efficient Fine-Tuning (PEFT) and Model Compression (MC), among others. Most methods focus on one application of PR, which can make composing them challenging. For example, when deploying a large model you may wish to compress the model and also quickly adapt it to new settings. However, PEFT methods can often still contain millions of parameters. This may be small compared to the original model size, but can be problematic in resource-constrained deployments like edge devices, where they take up a larger portion of the compressed model's parameters. To address this, we present Coefficient-gated weight Recombination by Interpolated Shared basis Projections (CRISP), a general approach that seamlessly integrates multiple PR tasks within the same framework. CRISP accomplishes this by factorizing pretrained weights into basis matrices and their component mixing projections. Sharing basis matrices across layers and adjusting their size enables us to perform MC, whereas the mixer weights' small size (fewer than 200 parameters in some experiments) enables CRISP to support PEFT. Experiments show CRISP outperforms methods from prior work capable of dual-task applications by 4-5% while also outperforming the state of the art in PEFT by 1.5% and PEFT+MC combinations by 1%. Our code is available at https://github.com/appledora/CRISP-CVPR26.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CRISP unifies PEFT and compression through shared-basis factorization with small mixers, but the performance edge rests on thin experimental detail.

The paper's main contribution is CRISP, which factorizes pretrained weights into shared basis matrices across layers for compression and attaches small coefficient-gated mixing projections for adaptation. This lets the same structure handle both model compression and parameter-efficient fine-tuning without switching methods. The mixers use fewer than 200 parameters in some of the reported experiments, which is genuinely useful for edge settings where even modest overhead matters. Releasing the code is also a plus for anyone who wants to test the interpolation and gating mechanics directly.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces CRISP (Coefficient-gated weight Recombination by Interpolated Shared basis Projections), a unified framework for parameter recombination. Pretrained weights are factorized into shared basis matrices (reduced in size for model compression) and small per-component mixing projections (fewer than 200 parameters in some PEFT experiments). This enables simultaneous PEFT and MC within one model. The experiments report 4-5% gains over prior dual-task methods, 1.5% over SOTA PEFT, and 1% over PEFT+MC combinations.

Significance. If the performance claims are robustly validated, the work provides a flexible, parameter-efficient way to combine adaptation and compression, addressing practical constraints on edge devices where PEFT overhead can dominate compressed models. The code release is a positive factor for reproducibility.

major comments (3)

[Experimental Results] Experimental Results section: Performance claims (4-5% over dual-task baselines, 1.5% over SOTA PEFT) are stated without error bars, explicit data splits, or controls for hyperparameter selection of basis dimensions and mixer sizes (listed as free parameters). This makes the reported gains difficult to interpret as generalizable rather than post-hoc.
[Method] Method section on shared bases: The unification claim rests on the assumption that sharing basis matrices across layers (while keeping mixers small) preserves sufficient capacity for dual-task gains. No ablation comparing shared vs. layer-specific bases is provided, leaving open the risk that layer-wise variation is lost and the small mixers cannot compensate without increasing rank (defeating compression).
[Results tables] Results tables: Comparisons to prior dual-task methods require clearer specification of which baselines support both PEFT and MC simultaneously, along with exact parameter counts and training protocols for CRISP in each regime.

minor comments (2)

[Abstract] Abstract: The repository link (github.com/appledora/CRISP-CVPR26) is named for the CVPR26 venue; back it with a stable archived snapshot or tagged release so the referenced code remains available.
[Method] Notation: Define the interpolation operation and coefficient-gating mechanism more explicitly, including how the mixing projections are applied during recombination.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment point-by-point below and describe the revisions we will make.

Referee: [Experimental Results] Experimental Results section: Performance claims (4-5% over dual-task baselines, 1.5% over SOTA PEFT) are stated without error bars, explicit data splits, or controls for hyperparameter selection of basis dimensions and mixer sizes (listed as free parameters). This makes the reported gains difficult to interpret as generalizable rather than post-hoc.

Authors: We agree that error bars, explicit data splits, and hyperparameter controls are necessary for robust interpretation. In the revised manuscript we will report mean performance with standard deviations over at least three random seeds, state the precise train/validation/test splits for every dataset, and add a dedicated paragraph describing how basis dimensions and mixer sizes were chosen via validation performance (including the search ranges and selection criterion). revision: yes
Referee: [Method] Method section on shared bases: The unification claim rests on the assumption that sharing basis matrices across layers (while keeping mixers small) preserves sufficient capacity for dual-task gains. No ablation comparing shared vs. layer-specific bases is provided, leaving open the risk that layer-wise variation is lost and the small mixers cannot compensate without increasing rank (defeating compression).

Authors: Sharing bases across layers is fundamental to the compression objective; layer-specific bases would multiply the basis storage cost and defeat the MC goal. We will add an ablation in the revision that compares the shared-basis CRISP model against a layer-specific variant whose per-layer ranks are reduced so that total parameter count remains comparable. The results will be reported together with a short discussion of whether the small mixers suffice to recover layer-wise capacity. revision: yes
Referee: [Results tables] Results tables: Comparisons to prior dual-task methods require clearer specification of which baselines support both PEFT and MC simultaneously, along with exact parameter counts and training protocols for CRISP in each regime.

Authors: We will update the tables and text to explicitly mark which baselines support simultaneous PEFT and MC, list exact trainable-parameter counts for CRISP and every baseline in each regime (PEFT-only, MC-only, dual-task), and append a supplementary table or paragraph detailing the optimizer, learning-rate schedule, batch size, and number of epochs/steps used for CRISP under each setting. revision: yes
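
The parameter-matched ablation promised in the second response reduces to simple arithmetic. The sketch below assumes a hypothetical shared storage rule (K full-size bases plus K coefficients per layer) and a layer-specific variant that stores a rank-r factor pair per layer, then picks the largest r that fits the shared budget. The 12 layers of 384×384 weights echo ViT-S's 384-dimensional width but are illustrative only, since real blocks contain several matrices of different shapes.

```python
def shared_basis_params(num_layers: int, d_out: int, d_in: int, num_bases: int) -> int:
    # Hypothetical shared storage: K bases of size d_out x d_in, plus K coefficients per layer.
    return num_bases * d_out * d_in + num_layers * num_bases


def layer_specific_params(num_layers: int, d_out: int, d_in: int, rank: int) -> int:
    # Each layer stores its own U (d_out x r) and V (r x d_in).
    return num_layers * rank * (d_out + d_in)


def matched_rank(budget: int, num_layers: int, d_out: int, d_in: int) -> int:
    # Largest rank whose layer-specific storage does not exceed the budget.
    return budget // (num_layers * (d_out + d_in))


budget = shared_basis_params(num_layers=12, d_out=384, d_in=384, num_bases=4)
rank = matched_rank(budget, num_layers=12, d_out=384, d_in=384)
```

Here `budget` is 589,872 and `rank` is 64, so the layer-specific variant would get rank-64 factors in every layer; whether those recover more accuracy than shared bases with tiny mixers is what the ablation would measure.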

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The CRISP framework is constructed by factorizing pretrained weights into shared basis matrices (for compression via size adjustment) and small per-component mixing projections (for PEFT via low parameter count). This decomposition follows standard low-rank ideas and directly enables the dual-task unification by varying basis rank and mixer size, without any equation or claim reducing the outputs to the inputs by definition. Performance numbers (4-5% gains over dual-task priors, 1.5% over SOTA PEFT) are presented as separate empirical results rather than predictions forced by the factorization itself. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps in the abstract or described method; the approach remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The approach rests on the assumption that pretrained weights admit a useful low-rank or basis decomposition that supports both sharing for compression and small adjustments for adaptation; no new entities are postulated.

free parameters (2)

basis matrix dimensions
Size of shared basis matrices is chosen to achieve desired compression ratio while maintaining performance.
mixer projection size
Number of parameters in the mixing projections is kept small (fewer than 200 in some experiments) to enable parameter-efficient fine-tuning.
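
How the two free parameters trade off can be sketched with a hypothetical storage rule (K shared d_out×d_in bases plus K mixing coefficients per layer). The function below picks the largest K whose stored size stays within a target fraction of the dense model; the shapes and the 0.5 target are illustrative assumptions, not settings from the paper.

```python
def compression_ratio(num_layers: int, d_out: int, d_in: int, num_bases: int) -> float:
    dense = num_layers * d_out * d_in
    shared = num_bases * d_out * d_in + num_layers * num_bases
    return shared / dense


def max_bases_for_ratio(num_layers: int, d_out: int, d_in: int, target_ratio: float) -> int:
    # Ratio grows with K, so keep the largest K that still meets the target (0 if none does).
    best = 0
    for k in range(1, num_layers + 1):
        if compression_ratio(num_layers, d_out, d_in, k) <= target_ratio:
            best = k
    return best


k = max_bases_for_ratio(num_layers=12, d_out=384, d_in=384, target_ratio=0.5)
mixer_params = 12 * k
```

With 12 layers of 384×384 weights and a 50% target, K = 6 would land exactly on half the dense size before counting coefficients, so its 72 mixer values push it just over and the function returns K = 5 (60 mixer parameters).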

axioms (1)

domain assumption: Pretrained neural network weights can be factorized into basis matrices and mixing projections without substantial loss of expressivity.
Invoked in the decomposition step to enable sharing and recombination.

pith-pipeline@v0.9.0 · 5578 in / 1204 out tokens · 56380 ms · 2026-05-14T22:13:48.928754+00:00 · methodology

Reference graph

Works this paper leans on

102 extracted references · 102 canonical work pages · 1 internal anchor

[1] P. Agand. Knowledge distillation from single-task teachers to multi-task student for end-to-end autonomous driving. Proceedings of the AAAI Conference on Artificial Intelligence, 38:23375–23376, 2024.
[2] Parakh Agarwal, Manu Mathew, Kunal Ranjan Patel, Varun Tripathi, and Pramod Swami. Prune efficiently by soft pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 2210–2217, 2024.
[3] Sabbir Ahmed, Abdullah Al Arafat, Deniz Najafi, Akhlak Mahmood, Mamshad Nayeem Rizve, Mohaiminul Al Nahian, Ranyang Zhou, Shaahin Angizi, and Adnan Siraj Rakin. Deepcompress-vit: Rethinking model compression to enhance efficiency of vision transformers at the edge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR…
[4] Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, et al. Piqa: Reasoning about physical commonsense in natural language. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7432–7439, 2020.
[5] Yuhao Cao. Fcp dis vit: Efficient vision transformer with neural network pruning. In 2024 IEEE 4th International Conference on Power, Electronics and Computer Applications (ICPECA), pages 1216–1221, 2024.
[6] Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, and Baobao Chang. An image is worth 1/2 tokens after layer 2: Plug-and-play inference acceleration for large vision-language models. In Proceedings of the European Conference on Computer Vision (ECCV), 2024.
[7] Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. Adaptformer: adapting vision transformers for scalable visual recognition. In Proceedings of the 36th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2022. Curran Associates Inc.
[8] Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, and Luming Liang. Lorashear: Efficient large language model structured pruning and knowledge recovery, 2023.
[9] Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)…
[10] Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? Try ARC, the AI2 reasoning challenge, 2018.
[11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[12] Julien Denize, Mykola Liashuha, Jaonary Rabarisoa, Astrid Orcesi, and Romain Hérault. Comedian: Self-supervised …
Table 10. Comparison of PEFT methods on commonsense reasoning benchmarks. Results from LoRA and DoRA are taken from Liu et al. [47], HiRA results are from Huang et al. [30]. We find that CRISP is on par with or better than custom PEFT methods while…
[13] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. CoRR, abs/2010.11929, 2020.
[14] Stefan Elfwing, Eiji Uchibe, and Kenji Doya. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107:3–11, 2018. Special issue on deep reinforcement learning.
[15] Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, and Angela Dai. Hyperdiffusion: Generating implicit neural fields with weight-space diffusion. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 14254–14264, 2023.
[16] Gongfan Fang, Xinyin Ma, Michael Bi Mi, and Xinchao Wang. Isomorphic pruning for vision models. In Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XXX, pages 232–250, Berlin, Heidelberg, 2024. Springer-Verlag.
[17] Leo Gao, Jonathan Tow, Baber Abbasi, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Alain Le Noac'h, Haonan Li, Kyle McDonell, Niklas Muennighoff, Chris Ociepa, Jason Phang, Laria Reynolds, Hailey Schoelkopf, Aviya Skowron, Lintang Sutawika, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. The language model evaluation harness, 2024.
[18] Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
[19] Patrick Glandorf and Bodo Rosenhahn. Pruning by block benefit: Exploring the properties of vision transformer blocks during domain adaptation. In International Conference on Computer Vision Workshop, 2025.
[20] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 315–323, Fort Lauderdale, FL, USA, …
[21] David González-Martínez. Balf: Budgeted activation-aware low-rank factorization for fine-tuning-free model compression, 2025.
[22] Aaron Grattafiori et al. The llama 3 herd of models, …
[23] Diana-Nicoleta Grigore, Mariana-Iuliana Georgescu, Jon Alvarez Justo, Tor Johansen, Andreea Iuliana Ionescu, and Radu Tudor Ionescu. Weight copy and low-rank adaptation for few-shot distillation of vision transformers. In Proceedings of the Winter Conference on Applications of Computer Vision (WACV), pages 7368–7378, 2025.
[24] David Ha, Andrew M. Dai, and Quoc V. Le. Hypernetworks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
[25] Ligong Han, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, and Feng Yang. Svdiff: Compact parameter space for diffusion fine-tuning. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[26] Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, and Yunhe Wang. Learning efficient vision transformers via fine-grained manifold distillation. In Advances in Neural Information Processing Systems, 2022.
[27] Zejiang Hou and Sun-Yuan Kung. Multi-dimensional model compression of vision transformer. In 2022 IEEE International Conference on Multimedia and Expo (ICME), pages 01–06, 2022.
[28] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.
[29] Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, and Roy Lee. LLM-adapters: An adapter family for parameter-efficient fine-tuning of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5254–5276, Singapore, 2023. Association for Computatio…
[30] Qiushi Huang, Tom Ko, Zhan Zhuang, Lilian Tang, and Yu Zhang. HiRA: Parameter-efficient hadamard high-rank adaptation for large language models. In The Thirteenth International Conference on Learning Representations, 2025.
[31] Leonardo Iurada, Marco Ciccone, and Tatiana Tommasi. Finding lottery tickets in vision models via data-driven spectral foresight pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16142–16151, 2024.
[32] Minchan Kang, Sanghyeok Son, and Daeshik Kim. Adaptive class token knowledge distillation for efficient vision transformer. Knowledge-Based Systems, 304:112531, …
[33] Samir Khaki and Konstantinos N Plataniotis. The need for speed: Pruning transformers with one recipe. In The Twelfth International Conference on Learning Representations, 2024.
[34] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Master's thesis, Department of Computer Science, University of Toronto, 2009.
[35] Jongho Lee and Hyun Kim. Dct-vit: High-frequency pruned vision transformer with discrete cosine transform. IEEE Access, 12:80386–80396, 2024.
[36] Lujun Li, Peijie Dong, Zhenheng Tang, Xiang Liu, Qiang Wang, Wenhan Luo, Wei Xue, Qifeng Liu, Xiaowen Chu, and Yike Guo. Discovering sparsity allocation for layer-wise pruning of large language models. In Proceedings of the 38th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2024. Curran Associates Inc.
[37] Yang Li, Shaobo Han, and Shihao Ji. Vb-lora: Extreme parameter efficient fine-tuning with vector banks. In The 38th Conference on Neural Information Processing Systems (NeurIPS), 2024.
[38] Yanwei Li, Chengyao Wang, and Jiaya Jia. LLaMA-VID: An image is worth 2 tokens in large language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
[39] Dongze Lian, Daquan Zhou, Jiashi Feng, and Xinchao Wang. Scaling & shifting your features: A new baseline for efficient model tuning. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[40] Yan-Shuo Liang and Wu-Jun Li. Inflora: Interference-free low-rank adaptation for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23638–23647, 2024.
[41]

3-in-1: 2d rotary adapta- tion for efficient finetuning, efficient batching and compos- ability

Baohao Liao and Christof Monz. 3-in-1: 2d rotary adapta- tion for efficient finetuning, efficient batching and compos- ability. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 2, 5, 8

work page 2024
[42]

Slimgpt: Layer-wise structured pruning for large language models

Gui Ling, Ziyang Wang, Yuliang Yan, and Qingwen Liu. Slimgpt: Layer-wise structured pruning for large language models. InAdvances in Neural Information Processing Sys- tems, pages 107112–107137. Curran Associates, Inc., 2024. 2

work page 2024
[43]

SVFT: Parameter-efficient fine-tuning with singular vectors

Vijay Lingam, Atula Tejaswi Neerkaje, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Eunsol Choi, Alex Dimakis, Aleksandar Bojchevski, and sujay sanghavi. SVFT: Parameter-efficient fine-tuning with singular vectors. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 5, 6, 7, 8

work page 2024
[44]

Pruning via merging: Com- pressing LLMs via manifold alignment based layer merg- ing

Deyuan Liu, Zhanyue Qin, Hairu Wang, Zhao Yang, Zecheng Wang, Fangying Rong, Qingbin Liu, Yanchao Hao, Bo Li, Xi Chen, Cunhang Fan, Zhao Lv, Dianhui Chu, Zhiying Tu, and Dianbo Sui. Pruning via merging: Com- pressing LLMs via manifold alignment based layer merg- ing. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,...

work page 2024
[45]

Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning

Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin Raffel. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. InProceedings of the 36th Interna- tional Conference on Neural Information Processing Sys- tems, Red Hook, NY , USA, 2022. Curran Associates Inc. 2, 5, 8

work page 2022
[46]

Small scale data-free knowledge distillation

He Liu, Yikai Wang, Huaping Liu, Fuchun Sun, and An- bang Yao. Small scale data-free knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 6008–6016,

work page
[47]

Dora: weight-decomposed low-rank adaptation

Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. Dora: weight-decomposed low-rank adaptation. InProceedings of the 41st International Con- ference on Machine Learning. JMLR.org, 2024. 1, 2, 5, 8, 4, 6

work page 2024
[48]

Black, Adrian Weller, and Bernhard Sch ¨olkopf

Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, and Bernhard Sch ¨olkopf. Parameter-efficient or- thogonal finetuning via butterfly factorization. InICLR,

[49] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.

[50] S. Maji, J. Kannala, E. Rahtu, M. Blaschko, and A. Vedaldi. Fine-grained visual classification of aircraft, 2013.

[51] Junzhu Mao, Yang Shen, Jinyang Guo, Yazhou Yao, Xiansheng Hua, and Hengtao Shen. Prune and merge: Efficient token compression for vision transformer with spatial information preserved. IEEE Transactions on Multimedia, 27:4670–4683, 2025.

[52] Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. Can a suit of armor conduct electricity? A new dataset for open book question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2381–2391, Brussels, Belgium, 2018. Association for Computational Linguistics.

[53] Mahdi Nikdan, Soroush Tabesh, Elvir Crnčević, and Dan Alistarh. RoSA: Accurate parameter-efficient fine-tuning via robust adaptation. In Forty-first International Conference on Machine Learning, 2024.

[54] Mahdi Nikdan, Soroush Tabesh, Elvir Crnčević, and Dan Alistarh. RoSA: Accurate parameter-efficient fine-tuning via robust adaptation. In Proceedings of the 41st International Conference on Machine Learning, pages 38187–38206. PMLR, 2024.

[55] Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In Indian Conference on Computer Vision, Graphics and Image Processing, 2008.

[56] Sungho Park and Hyeran Byun. Fair-VPT: Fair visual prompt tuning for image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12268–12278, 2024.

[57] Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, and Dahua Lin. Data-free weight compress and denoise for large language models. CoRR, abs/2402.16319, 2024.

[58] Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, and Kate Saenko. Neural parameter allocation search. In International Conference on Learning Representations, 2022.

[59] Ariadna Quattoni and Antonio Torralba. Recognizing indoor scenes. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 413–420, 2009.

[60] Harsh Rangwani, Pradipto Mondal, Mayank Mishra, Ashish Ramayee Asokan, and R. Venkatesh Babu. DeiT-LT: Distillation strikes back for vision transformer training on long-tailed datasets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23396–23406, 2024.

[61] Pengjie Ren, Chengshun Shi, Shiguang Wu, Mengqi Zhang, Zhaochun Ren, Maarten Rijke, Zhumin Chen, and Jiahuan Pei. MELoRA: Mini-ensemble low-rank adapters for parameter-efficient fine-tuning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3052–3064, Bangkok, Thailand, 2024. Associati...

[62] Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. WinoGrande: an adversarial Winograd schema challenge at scale. Commun. ACM, 64(9):99–106,

[63] Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, and Yejin Choi. Social IQa: Commonsense reasoning about social interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4463–4473, Hong Kong, China, 2019...

[64] Pedro Savarese and Michael Maire. Learning implicitly recurrent CNNs through parameter sharing. In International Conference on Learning Representations, 2019.

[65] Ayan Sengupta, Siddhant Chaudhary, and Tanmoy Chakraborty. You only prune once: Designing calibration-free model compression with policy learning. In The Thirteenth International Conference on Learning Representations, 2025.

[66] Chikai Shang, Mengke Li, Yiqun Zhang, Zhen Chen, Jinlin Wu, Fangqing Gu, Yang Lu, and Yiu-Ming Cheung. Pro-VPT: Distribution-adaptive visual prompt tuning via prompt relocation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1558–1568,

[67] Chengchao Shen, Hourun Zhu, Gongfan Fang, Jianxin Wang, and Xinchao Wang. Diversity-guided MLP reduction for efficient large vision transformers. arXiv preprint arXiv:2506.07138, 2025.

[68] Dachuan Shi, Chaofan Tao, Ying Jin, Zhendong Yang, Chun Yuan, and Jiaqi Wang. UPop: Unified and progressive pruning for compressing vision-language transformers. In Proceedings of the 40th International Conference on Machine Learning, pages 31292–31311. PMLR, 2023.

[69] Chongjie Si, Xiaokang Yang, and Wei Shen. See further for parameter efficient fine-tuning by standing on the shoulders of decomposition. arXiv preprint arXiv:2407.05417, 2024.

[70] Sridhar Swaminathan, Deepak Garg, Rajkumar Kannan, and Frederic Andres. Sparse low rank factorization for deep neural network compression. Neurocomputing, 398:185–196, 2020.

[71] Nazia Tasnim and Bryan A. Plummer. Recast: Reparameterized, compact weight adaptation for sequential tasks. In International Conference on Learning Representations (ICLR), 2025.

[72] M. Tukan, A. Maalouf, M. Weksler, and D. Feldman. No fine-tuning, no cry: robust SVD for compressing deep networks. Sensors, 21:5599, 2021.

[73] Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, and Ali Ghodsi. DyLoRA: Parameter-efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3274–3287, Dubrovnik, Croatia, 2023. Association for Computational...

[74] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.

[75] Ao Wang, Hui Chen, Zijia Lin, Sicheng Zhao, Jungong Han, and Guiguang Ding. Cait: Triple-win compression towards high accuracy, fast inference, and favorable transferability for ViTs. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–17, 2025.

[76] H. Wang, J. Chang, Y. Zhai, X. Luo, J. Sun, Z. Lin, and Q. Tian. LION: implicit vision prompt tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 38:5372–5380, 2024.

[77] Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, and Grace Li Zhang. Basis sharing: Cross-layer parameter sharing for large language model compression. In The Thirteenth International Conference on Learning Representations, 2025.

[78] Kaili Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, and Yang You. Neural network parameter diffusion, 2024.

[79] Xin Wang, Yu Zheng, Zhongwei Wan, and Mi Zhang. SVD-LLM: Truncation-aware singular value decomposition for large language model compression. In The Thirteenth International Conference on Learning Representations, 2025.

[80] Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, and Meng Wang. Revisiting the power of prompt for visual tuning. In Proceedings of the 41st International Conference on Machine Learning. JMLR.org,

[1] P. Agand. Knowledge distillation from single-task teachers to multi-task student for end-to-end autonomous driving. Proceedings of the AAAI Conference on Artificial Intelligence, 38:23375–23376, 2024.

[2] Parakh Agarwal, Manu Mathew, Kunal Ranjan Patel, Varun Tripathi, and Pramod Swami. Prune efficiently by soft pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 2210–2217, 2024.

[3] Sabbir Ahmed, Abdullah Al Arafat, Deniz Najafi, Akhlak Mahmood, Mamshad Nayeem Rizve, Mohaiminul Al Nahian, Ranyang Zhou, Shaahin Angizi, and Adnan Siraj Rakin. DeepCompress-ViT: Rethinking model compression to enhance efficiency of vision transformers at the edge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR...

[4] Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, et al. PIQA: Reasoning about physical commonsense in natural language. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7432–7439, 2020.

[5] Yuhao Cao. Fcp dis vit: Efficient vision transformer with neural network pruning. In 2024 IEEE 4th International Conference on Power, Electronics and Computer Applications (ICPECA), pages 1216–1221, 2024.

[6] Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, and Baobao Chang. An image is worth 1/2 tokens after layer 2: Plug-and-play inference acceleration for large vision-language models. In Proceedings of the European Conference on Computer Vision (ECCV), 2024.

[7] Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. AdaptFormer: adapting vision transformers for scalable visual recognition. In Proceedings of the 36th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2022. Curran Associates Inc.

[8] Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, and Luming Liang. LoRAShear: Efficient large language model structured pruning and knowledge recovery, 2023.

[9] Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)...

[10] Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge, 2018.

[11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.

[12] Julien Denize, Mykola Liashuha, Jaonary Rabarisoa, Astrid Orcesi, and Romain Hérault. COMEDIAN: Self-supervised ...

Table 10. Comparison of PEFT methods on commonsense reasoning benchmarks. Results for LoRA and DoRA are taken from Liu et al. [47]; HiRA results are from Huang et al. [30]. We find that CRISP is on par with or better than custom PEFT methods while...

[13] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. CoRR, abs/2010.11929, 2020.

[14] Stefan Elfwing, Eiji Uchibe, and Kenji Doya. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107:3–11, 2018. Special issue on deep reinforcement learning.

[15] Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, and Angela Dai. HyperDiffusion: Generating implicit neural fields with weight-space diffusion. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 14254–14264, 2023.

[16] Gongfan Fang, Xinyin Ma, Michael Bi Mi, and Xinchao Wang. Isomorphic pruning for vision models. In Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XXX, pages 232–250, Berlin, Heidelberg, 2024. Springer-Verlag.

[17] Leo Gao, Jonathan Tow, Baber Abbasi, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Alain Le Noac'h, Haonan Li, Kyle McDonell, Niklas Muennighoff, Chris Ociepa, Jason Phang, Laria Reynolds, Hailey Schoelkopf, Aviya Skowron, Lintang Sutawika, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. The language model evaluation harness, 2024.

[18] Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.

[19] Patrick Glandorf and Bodo Rosenhahn. Pruning by block benefit: Exploring the properties of vision transformer blocks during domain adaptation. In International Conference on Computer Vision Workshop, 2025.

[20] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 315–323, Fort Lauderdale, FL, USA,

[21] David González-Martínez. BALF: Budgeted activation-aware low-rank factorization for fine-tuning-free model compression, 2025.

[22] Aaron Grattafiori et al. The Llama 3 herd of models,

[23] Diana-Nicoleta Grigore, Mariana-Iuliana Georgescu, Jon Alvarez Justo, Tor Johansen, Andreea Iuliana Ionescu, and Radu Tudor Ionescu. Weight copy and low-rank adaptation for few-shot distillation of vision transformers. In Proceedings of the Winter Conference on Applications of Computer Vision (WACV), pages 7368–7378, 2025.

[24] David Ha, Andrew M. Dai, and Quoc V. Le. Hypernetworks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.

[25] Ligong Han, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, and Feng Yang. SVDiff: Compact parameter space for diffusion fine-tuning. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[26] Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, and Yunhe Wang. Learning efficient vision transformers via fine-grained manifold distillation. In Advances in Neural Information Processing Systems, 2022.

[27] Zejiang Hou and Sun-Yuan Kung. Multi-dimensional model compression of vision transformer. In 2022 IEEE International Conference on Multimedia and Expo (ICME), pages 01–06, 2022.

[28] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.

[29] Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, and Roy Lee. LLM-adapters: An adapter family for parameter-efficient fine-tuning of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5254–5276, Singapore, 2023. Association for Computatio...

[30] Qiushi Huang, Tom Ko, Zhan Zhuang, Lilian Tang, and Yu Zhang. HiRA: Parameter-efficient Hadamard high-rank adaptation for large language models. In The Thirteenth International Conference on Learning Representations, 2025.

[31] Leonardo Iurada, Marco Ciccone, and Tatiana Tommasi. Finding lottery tickets in vision models via data-driven spectral foresight pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16142–16151, 2024.

[32] Minchan Kang, Sanghyeok Son, and Daeshik Kim. Adaptive class token knowledge distillation for efficient vision transformer. Knowledge-Based Systems, 304:112531,

[33] Samir Khaki and Konstantinos N Plataniotis. The need for speed: Pruning transformers with one recipe. In The Twelfth International Conference on Learning Representations, 2024.

[34] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Master's thesis, Department of Computer Science, University of Toronto, 2009.

[35] Jongho Lee and Hyun Kim. DCT-ViT: High-frequency pruned vision transformer with discrete cosine transform. IEEE Access, 12:80386–80396, 2024.

[36] Lujun Li, Peijie Dong, Zhenheng Tang, Xiang Liu, Qiang Wang, Wenhan Luo, Wei Xue, Qifeng Liu, Xiaowen Chu, and Yike Guo. Discovering sparsity allocation for layer-wise pruning of large language models. In Proceedings of the 38th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2024. Curran Associates Inc.

[37] Yang Li, Shaobo Han, and Shihao Ji. VB-LoRA: Extreme parameter efficient fine-tuning with vector banks. In The 38th Conference on Neural Information Processing Systems (NeurIPS), 2024.

[38] Yanwei Li, Chengyao Wang, and Jiaya Jia. LLaMA-VID: An image is worth 2 tokens in large language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.

[39] Dongze Lian, Daquan Zhou, Jiashi Feng, and Xinchao Wang. Scaling & shifting your features: A new baseline for efficient model tuning. In Advances in Neural Information Processing Systems (NeurIPS), 2022.

[40] Yan-Shuo Liang and Wu-Jun Li. InfLoRA: Interference-free low-rank adaptation for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23638–23647, 2024.

[41] Baohao Liao and Christof Monz. 3-in-1: 2D rotary adaptation for efficient finetuning, efficient batching and composability. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.

[42] Gui Ling, Ziyang Wang, Yuliang Yan, and Qingwen Liu. SlimGPT: Layer-wise structured pruning for large language models. In Advances in Neural Information Processing Systems, pages 107112–107137. Curran Associates, Inc., 2024.

[43] [43]

SVFT: Parameter-efficient fine-tuning with singular vectors

Vijay Lingam, Atula Tejaswi Neerkaje, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Eunsol Choi, Alex Dimakis, Aleksandar Bojchevski, and sujay sanghavi. SVFT: Parameter-efficient fine-tuning with singular vectors. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 5, 6, 7, 8

work page 2024

[44] [44]

Pruning via merging: Com- pressing LLMs via manifold alignment based layer merg- ing

Deyuan Liu, Zhanyue Qin, Hairu Wang, Zhao Yang, Zecheng Wang, Fangying Rong, Qingbin Liu, Yanchao Hao, Bo Li, Xi Chen, Cunhang Fan, Zhao Lv, Dianhui Chu, Zhiying Tu, and Dianbo Sui. Pruning via merging: Com- pressing LLMs via manifold alignment based layer merg- ing. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,...

work page 2024

[45] [45]

Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning

Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin Raffel. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. InProceedings of the 36th Interna- tional Conference on Neural Information Processing Sys- tems, Red Hook, NY , USA, 2022. Curran Associates Inc. 2, 5, 8

work page 2022

[46] [46]

Small scale data-free knowledge distillation

He Liu, Yikai Wang, Huaping Liu, Fuchun Sun, and An- bang Yao. Small scale data-free knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 6008–6016,

work page

[47] [47]

Dora: weight-decomposed low-rank adaptation

Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. Dora: weight-decomposed low-rank adaptation. InProceedings of the 41st International Con- ference on Machine Learning. JMLR.org, 2024. 1, 2, 5, 8, 4, 6

work page 2024

[48] [48]

Black, Adrian Weller, and Bernhard Sch ¨olkopf

Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, and Bernhard Sch ¨olkopf. Parameter-efficient or- thogonal finetuning via butterfly factorization. InICLR,

work page

[49] [49]

Decoupled weight de- cay regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight de- cay regularization. InInternational Conference on Learn- ing Representations, 2019. 4

work page 2019

[50] [50]

S. Maji, J. Kannala, E. Rahtu, M. Blaschko, and A. Vedaldi. Fine-grained visual classification of aircraft.-, 2013. 4, 5, 3

work page 2013

[51] [51]

Prune and merge: Efficient token compression for vision transformer with spatial in- formation preserved.IEEE Transactions on Multimedia, 27:4670–4683, 2025

Junzhu Mao, Yang Shen, Jinyang Guo, Yazhou Yao, Xian- sheng Hua, and Hengtao Shen. Prune and merge: Efficient token compression for vision transformer with spatial in- formation preserved.IEEE Transactions on Multimedia, 27:4670–4683, 2025. 1

work page 2025

[52] [52]

Can a suit of armor conduct electricity? a new dataset for open book question answering

Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sab- harwal. Can a suit of armor conduct electricity? a new dataset for open book question answering. InProceed- ings of the 2018 Conference on Empirical Methods in Natu- ral Language Processing, pages 2381–2391, Brussels, Bel- gium, 2018. Association for Computational Linguistics. 4

work page 2018

[53] [53]

RoSA: Accurate parameter-efficient fine-tuning via robust adaptation

Mahdi Nikdan, Soroush Tabesh, Elvir Crn ˇcevi´c, and Dan Alistarh. RoSA: Accurate parameter-efficient fine-tuning via robust adaptation. InForty-first International Confer- ence on Machine Learning, 2024. 2, 5

work page 2024

[54] [54]

RoSA: Accurate parameter-efficient fine-tuning via robust adaptation

Mahdi Nikdan, Soroush Tabesh, Elvir Crn ˇcevi´c, and Dan Alistarh. RoSA: Accurate parameter-efficient fine-tuning via robust adaptation. InProceedings of the 41st Inter- national Conference on Machine Learning, pages 38187– 38206. PMLR, 2024. 1, 2

work page 2024

[55] [55]

Automated flower classification over a large number of classes

Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. InIn- dian Conference on Computer Vision, Graphics and Image Processing, 2008. 4, 3

work page 2008

[56] [56]

Fair-vpt: Fair visual prompt tuning for image classification

Sungho Park and Hyeran Byun. Fair-vpt: Fair visual prompt tuning for image classification. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12268–12278, 2024. 2

work page 2024

[57] [57]

Data-freeweight com- press and denoise for large language models.CoRR, abs/2402.16319, 2024

Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, and Dahua Lin. Data-freeweight com- press and denoise for large language models.CoRR, abs/2402.16319, 2024. 6, 4

work page arXiv 2024

[58] [58]

Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, and Kate Saenko

Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, and Kate Saenko. Neural parameter allocation search. InInternational Conference on Learning Repre- sentations, 2022. 2, 3, 4

work page 2022

[59] [59]

Recognizing in- door scenes

Ariadna Quattoni and Antonio Torralba. Recognizing in- door scenes. In2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 413–420, 2009. 4, 3

work page 2009

[60] [60]

Venkatesh Babu

Harsh Rangwani, Pradipto Mondal, Mayank Mishra, Ashish Ramayee Asokan, and R. Venkatesh Babu. Deit- lt: Distillation strikes back for vision transformer training on long-tailed datasets. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23396–23406, 2024. 1, 2

work page 2024

[61] [61]

MELoRA: Mini-ensemble low-rank adapters for parameter-efficient fine-tuning

Pengjie Ren, Chengshun Shi, Shiguang Wu, Mengqi Zhang, Zhaochun Ren, Maarten Rijke, Zhumin Chen, and Jiahuan Pei. MELoRA: Mini-ensemble low-rank adapters for parameter-efficient fine-tuning. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3052–3064, Bangkok, Thailand, 2024. Associati...

work page 2024

[62] [62]

Winogrande: an adversarial winograd schema challenge at scale.Commun

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavat- ula, and Yejin Choi. Winogrande: an adversarial winograd schema challenge at scale.Commun. ACM, 64(9):99–106,

work page

[63] [63]

Social IQa: Commonsense rea- soning about social interactions

Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, and Yejin Choi. Social IQa: Commonsense rea- soning about social interactions. InProceedings of the 2019 Conference on Empirical Methods in Natural Lan- guage Processing and the 9th International Joint Confer- ence on Natural Language Processing (EMNLP-IJCNLP), pages 4463–4473, Hong Kong, China, 2019...

work page 2019

[64] [64]

Learning implicitly re- current CNNs through parameter sharing

Pedro Savarese and Michael Maire. Learning implicitly re- current CNNs through parameter sharing. InInternational Conference on Learning Representations, 2019. 2

work page 2019

[65] [65]

You only prune once: Designing calibration- free model compression with policy learning

Ayan Sengupta, Siddhant Chaudhary, and Tanmoy Chakraborty. You only prune once: Designing calibration- free model compression with policy learning. InThe Thir- teenth International Conference on Learning Representa- tions, 2025. 2, 6, 4

work page 2025

[66] [66]

Pro- vpt: Distribution-adaptive visual prompt tuning via prompt relocation

Chikai Shang, Mengke Li, Yiqun Zhang, Zhen Chen, Jinlin Wu, Fangqing Gu, Yang Lu, and Yiu-Ming Cheung. Pro- vpt: Distribution-adaptive visual prompt tuning via prompt relocation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1558–1568,


[67] Chengchao Shen, Hourun Zhu, Gongfan Fang, Jianxin Wang, and Xinchao Wang. Diversity-guided MLP reduction for efficient large vision transformers. arXiv preprint arXiv:2506.07138, 2025.


[68] Dachuan Shi, Chaofan Tao, Ying Jin, Zhendong Yang, Chun Yuan, and Jiaqi Wang. UPop: Unified and progressive pruning for compressing vision-language transformers. In Proceedings of the 40th International Conference on Machine Learning, pages 31292–31311. PMLR, 2023.


[69] Chongjie Si, Xiaokang Yang, and Wei Shen. See further for parameter efficient fine-tuning by standing on the shoulders of decomposition. arXiv preprint arXiv:2407.05417, 2024.


[70] Sridhar Swaminathan, Deepak Garg, Rajkumar Kannan, and Frederic Andres. Sparse low rank factorization for deep neural network compression. Neurocomputing, 398:185–196, 2020.


[71] Nazia Tasnim and Bryan A. Plummer. Recast: Reparameterized, compact weight adaptation for sequential tasks. In International Conference on Learning Representations (ICLR), 2025.


[72] M. Tukan, A. Maalouf, M. Weksler, and D. Feldman. No fine-tuning, no cry: Robust SVD for compressing deep networks. Sensors, 21:5599, 2021.


[73] Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, and Ali Ghodsi. DyLoRA: Parameter-efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3274–3287, Dubrovnik, Croatia, 2023. Association for Computational...


[74] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.


[75] Ao Wang, Hui Chen, Zijia Lin, Sicheng Zhao, Jungong Han, and Guiguang Ding. CAIT: Triple-win compression towards high accuracy, fast inference, and favorable transferability for ViTs. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–17, 2025.


[76] H. Wang, J. Chang, Y. Zhai, X. Luo, J. Sun, Z. Lin, and Q. Tian. LION: Implicit vision prompt tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 38:5372–5380, 2024.


[77] Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, and Grace Li Zhang. Basis sharing: Cross-layer parameter sharing for large language model compression. In The Thirteenth International Conference on Learning Representations, 2025.


[78] Kaili Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, and Yang You. Neural network parameter diffusion. 2024.


[79] Xin Wang, Yu Zheng, Zhongwei Wan, and Mi Zhang. SVD-LLM: Truncation-aware singular value decomposition for large language model compression. In The Thirteenth International Conference on Learning Representations, 2025.


[80] Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, and Meng Wang. Revisiting the power of prompt for visual tuning. In Proceedings of the 41st International Conference on Machine Learning. JMLR.org,
