Decompose, Mix, Adapt: A Unified Framework for Parameter-Efficient Neural Network Recombination and Compression
Pith reviewed 2026-05-14 22:13 UTC · model grok-4.3
The pith
CRISP factorizes pretrained weights into shared bases and small mixers to support both model compression and parameter-efficient fine-tuning in a single framework.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CRISP factorizes pretrained weights into basis matrices and their component mixing projections. Sharing the basis matrices across layers and adjusting their size enables model compression, while the small size of the mixer weights makes CRISP suitable for parameter-efficient fine-tuning. Experiments show CRISP outperforms prior methods capable of dual-task applications by 4-5%, while also outperforming the state of the art in PEFT by 1.5% and PEFT+MC combinations by 1%.
What carries the argument
Coefficient-gated weight Recombination by Interpolated Shared basis Projections (CRISP), which decomposes weights into shared basis matrices for compression and small mixing projections for adaptation.
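A minimal sketch of the recombination idea, assuming the simplest reading of "shared bases plus mixers": every layer's weight is a linear combination W_l = Σ_k c_{l,k} B_k of K basis matrices shared across layers, and only the L×K coefficients are layer-specific. The helper name `recombine`, the array layout, and the dimensions are hypothetical. The coefficient gating and interpolation that give CRISP its name are not specified on this page and are left out.

```python
import numpy as np

def recombine(bases: np.ndarray, mixers: np.ndarray) -> np.ndarray:
    """Mix shared bases into per-layer weights (hypothetical helper).

    bases:  (K, d_out, d_in) basis matrices shared by every layer
    mixers: (L, K) layer-specific mixing coefficients
    returns (L, d_out, d_in) with W[l] = sum_k mixers[l, k] * bases[k]
    """
    return np.einsum("lk,kij->lij", mixers, bases)

rng = np.random.default_rng(0)
num_layers, num_bases, d = 12, 8, 64
bases = rng.standard_normal((num_bases, d, d))          # shared across layers
mixers = rng.standard_normal((num_layers, num_bases))   # the only per-layer state
weights = recombine(bases, mixers)
print(weights.shape, mixers.size)  # (12, 64, 64) 96
```

Fine-tuning only `mixers` while freezing `bases` would train 96 scalars in this toy setting, the same order of magnitude as the "fewer than 200" figure in the abstract; compression would come from using fewer or smaller bases.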
If this is right
- Models can be compressed by reducing basis size while still adapting to new tasks with fewer than 200 additional parameters in some experimental settings.
- Dual-task performance exceeds prior recombination methods by 4-5% on relevant benchmarks.
- The approach outperforms standalone state-of-the-art PEFT methods by 1.5% and combined PEFT plus model compression baselines by 1%.
- A single factorization replaces the need to compose separate parameter recombination techniques for compression and fine-tuning.
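The compression and PEFT consequences above can be checked with parameter arithmetic. A minimal sketch, assuming each layer's weight is a scalar-weighted mixture of K shared d_out × d_in bases (an assumption; the page does not give CRISP's exact parameterization). The dimensions are illustrative, loosely modeled on the 12 blocks and 768-wide projections of ViT-Base; the helper name is hypothetical.

```python
def param_counts(num_layers: int, d_out: int, d_in: int, num_bases: int) -> dict:
    """Compare dense per-layer storage with a shared-basis factorization.

    Assumes every layer's weight is a mixture of `num_bases` shared
    d_out x d_in bases with one scalar coefficient per (layer, basis) pair.
    """
    dense = num_layers * d_out * d_in
    shared_bases = num_bases * d_out * d_in
    mixers = num_layers * num_bases
    return {
        "dense": dense,
        "factorized": shared_bases + mixers,
        "trainable_for_peft": mixers,  # only the mixers are updated
        "compression_ratio": dense / (shared_bases + mixers),
    }

counts = param_counts(num_layers=12, d_out=768, d_in=768, num_bases=6)
print(counts["dense"], counts["factorized"], counts["trainable_for_peft"])
# 7077888 3539016 72
```

Halving the number of shared bases relative to layers roughly halves storage, while the PEFT state stays at 72 scalars. If "basis size" in CRISP refers to basis rank rather than basis count, the storage term would scale differently.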
Where Pith is reading between the lines
- The shared-basis approach might extend naturally to transformer-based models outside computer vision tasks.
- Further reductions in basis size could be tested to determine the exact compression limits before performance degrades.
- Combining CRISP with quantization or pruning might produce additive efficiency gains not explored in the current work.
- Deployment on edge hardware could be measured directly to quantify the practical memory and latency savings.
Load-bearing premise
Factorizing pretrained weights into shared basis matrices and small component mixing projections preserves sufficient model capacity and performance when bases are shared across layers for compression.
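This premise can be probed empirically. The sketch below uses the cross-layer SVD of Basis Sharing (Wang et al., 2025, in the reference list) as a stand-in for CRISP's own factorization, which this page does not specify: the layer weights are concatenated, one truncated SVD yields a single shared basis, and each layer keeps its own coefficient block. Random matrices stand in for pretrained weights, and the function names are hypothetical. The relative reconstruction error as the shared rank shrinks is one direct measure of how much layer-specific capacity is lost.

```python
import numpy as np

def shared_basis_svd(weights, rank):
    """Factor same-shaped layer weights W_l ~ B @ C_l with one shared B.

    Concatenates W_cat = [W_1, ..., W_L] along columns, takes a truncated SVD
    W_cat ~ U_k S_k V_k^T, keeps B = U_k S_k, and splits V_k^T into per-layer
    (rank, d_in) coefficient blocks C_l.
    """
    d_in = weights[0].shape[1]
    w_cat = np.concatenate(weights, axis=1)                 # (d_out, L * d_in)
    u, s, vt = np.linalg.svd(w_cat, full_matrices=False)
    basis = u[:, :rank] * s[:rank]                           # (d_out, rank), shared
    coeffs = [vt[:rank, i * d_in:(i + 1) * d_in] for i in range(len(weights))]
    return basis, coeffs

def relative_error(weights, basis, coeffs):
    """Frobenius error summed over layers, relative to the total weight norm."""
    num = sum(np.linalg.norm(w - basis @ c) ** 2 for w, c in zip(weights, coeffs))
    den = sum(np.linalg.norm(w) ** 2 for w in weights)
    return float(np.sqrt(num / den))

rng = np.random.default_rng(0)
layers = [rng.standard_normal((32, 32)) for _ in range(4)]  # placeholder weights
for rank in (4, 16, 32):
    basis, coeffs = shared_basis_svd(layers, rank)
    print(rank, round(relative_error(layers, basis, coeffs), 3))
```

Because the coefficient blocks are free per layer, this is the best rank-k approximation of the concatenated matrix, so its error is a lower bound for any scheme that shares a width-k left basis. Pretrained weights would have to replace `layers` before the numbers say anything about the premise itself.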
What would settle it
A performance drop exceeding 5% on standard vision benchmarks when using the compressed CRISP model compared to uncompressed fine-tuned baselines would falsify the capacity preservation claim.
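The 5% threshold can be read either as absolute accuracy points or as a relative drop, and the two readings can disagree. A small check illustrating the difference; the function name and the accuracies are hypothetical, not results from the paper.

```python
def accuracy_drop(baseline_acc, compressed_acc):
    """Return (absolute drop in accuracy points, relative drop in percent)."""
    absolute = baseline_acc - compressed_acc
    relative = 100.0 * absolute / baseline_acc
    return absolute, relative

# Hypothetical accuracies in percent, not results from the paper.
points, percent = accuracy_drop(baseline_acc=60.0, compressed_acc=56.5)
print(points, round(percent, 2))  # 3.5 5.83
```

Here the drop is 3.5 points, under a 5-point threshold, but about 5.8% in relative terms, over a 5% threshold; a falsification test should state which reading it uses.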
Original abstract
Parameter Recombination (PR) methods aim to efficiently compose the weights of a neural network for applications like Parameter-Efficient Fine-Tuning (PEFT) and Model Compression (MC), among others. Most methods focus on a single application of PR, which can make composing them challenging. For example, when deploying a large model you may wish to compress the model and also quickly adapt it to new settings. However, PEFT methods can still contain millions of parameters. This may be small compared to the original model size, but can be problematic in resource-constrained deployments like edge devices, where they take up a larger portion of the compressed model's parameters. To address this, we present Coefficient-gated weight Recombination by Interpolated Shared basis Projections (CRISP), a general approach that seamlessly integrates multiple PR tasks within the same framework. CRISP accomplishes this by factorizing pretrained weights into basis matrices and their component mixing projections. Sharing basis matrices across layers and adjusting their size enables us to perform MC, whereas the small size of the mixer weights (fewer than 200 parameters in some experiments) enables CRISP to support PEFT. Experiments show CRISP outperforms prior methods capable of dual-task applications by 4-5% while also outperforming the state of the art in PEFT by 1.5% and PEFT+MC combinations by 1%. Our code is available on the repository: https://github.com/appledora/CRISP-CVPR26.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CRISP (Coefficient-gated weight Recombination by Interpolated Shared basis Projections), a unified framework for parameter recombination. Pretrained weights are factorized into shared basis matrices (reduced in size for model compression) and small per-component mixing projections (under 200 parameters in some PEFT experiments). This enables simultaneous PEFT and MC within one model. Experiments claim 4-5% gains over prior dual-task methods, 1.5% over SOTA PEFT, and 1% over PEFT+MC combinations.
Significance. If the performance claims are robustly validated, the work provides a flexible, parameter-efficient way to combine adaptation and compression, addressing practical constraints on edge devices where PEFT overhead can dominate compressed models. The code release is a positive factor for reproducibility.
major comments (3)
- [Experimental Results] Experimental Results section: Performance claims (4-5% over dual-task baselines, 1.5% over SOTA PEFT) are stated without error bars, explicit data splits, or controls for hyperparameter selection of basis dimensions and mixer sizes (listed as free parameters). This makes the reported gains difficult to interpret as generalizable rather than post-hoc.
- [Method] Method section on shared bases: The unification claim rests on the assumption that sharing basis matrices across layers (while keeping mixers small) preserves sufficient capacity for dual-task gains. No ablation comparing shared vs. layer-specific bases is provided, leaving open the risk that layer-wise variation is lost and the small mixers cannot compensate without increasing rank (defeating compression).
- [Results tables] Results tables: Comparisons to prior dual-task methods require clearer specification of which baselines support both PEFT and MC simultaneously, along with exact parameter counts and training protocols for CRISP in each regime.
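The first major comment asks for error bars. A minimal sketch of the reporting the rebuttal promises (mean and standard deviation over at least three seeds), using only Python's standard `statistics` module; the helper name and the accuracies are hypothetical.

```python
import statistics

def summarize(runs):
    """Format mean ± sample standard deviation over independent seeds."""
    if len(runs) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    mean = statistics.mean(runs)
    std = statistics.stdev(runs)  # sample (n - 1) standard deviation
    return f"{mean:.2f} ± {std:.2f} (n={len(runs)})"

# Hypothetical accuracies from three seeds; not values from the paper.
print(summarize([85.1, 85.7, 84.9]))  # 85.23 ± 0.42 (n=3)
```

Reports should say whether the sample (n - 1) or population standard deviation is used; `statistics.stdev` is the sample version and `statistics.pstdev` the population one.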
minor comments (2)
- [Abstract] Abstract: The repository link points to a future CVPR26 location; replace with a stable, permanent link or include a snapshot.
- [Method] Notation: Define the interpolation operation and coefficient-gating mechanism more explicitly, including how the mixing projections are applied during recombination.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address each major comment point-by-point below and describe the revisions we will make.
- Referee: [Experimental Results] Experimental Results section: Performance claims (4-5% over dual-task baselines, 1.5% over SOTA PEFT) are stated without error bars, explicit data splits, or controls for hyperparameter selection of basis dimensions and mixer sizes (listed as free parameters). This makes the reported gains difficult to interpret as generalizable rather than post-hoc.
Authors: We agree that error bars, explicit data splits, and hyperparameter controls are necessary for robust interpretation. In the revised manuscript we will report mean performance with standard deviations over at least three random seeds, state the precise train/validation/test splits for every dataset, and add a dedicated paragraph describing how basis dimensions and mixer sizes were chosen via validation performance (including the search ranges and selection criterion). revision: yes
- Referee: [Method] Method section on shared bases: The unification claim rests on the assumption that sharing basis matrices across layers (while keeping mixers small) preserves sufficient capacity for dual-task gains. No ablation comparing shared vs. layer-specific bases is provided, leaving open the risk that layer-wise variation is lost and the small mixers cannot compensate without increasing rank (defeating compression).
Authors: Sharing bases across layers is fundamental to the compression objective; layer-specific bases would multiply the basis storage cost and defeat the MC goal. We will add an ablation in the revision that compares the shared-basis CRISP model against a layer-specific variant whose per-layer ranks are reduced so that total parameter count remains comparable. The results will be reported together with a short discussion of whether the small mixers suffice to recover layer-wise capacity. revision: yes
- Referee: [Results tables] Results tables: Comparisons to prior dual-task methods require clearer specification of which baselines support both PEFT and MC simultaneously, along with exact parameter counts and training protocols for CRISP in each regime.
Authors: We will update the tables and text to explicitly mark which baselines support simultaneous PEFT and MC, list exact trainable-parameter counts for CRISP and every baseline in each regime (PEFT-only, MC-only, dual-task), and append a supplementary table or paragraph detailing the optimizer, learning-rate schedule, batch size, and number of epochs/steps used for CRISP under each setting. revision: yes
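The shared-vs-layer-specific ablation promised in the rebuttal needs the layer-specific variant sized to the same parameter count. A minimal sketch of that arithmetic, assuming the shared model stores one d_out × k basis plus per-layer k × d_in coefficients (as in cross-layer SVD schemes) rather than CRISP's own mixers; the helper name and dimensions are hypothetical.

```python
def matched_layer_rank(num_layers, d_out, d_in, shared_rank):
    """Largest per-layer rank r for a layer-specific low-rank variant
    (W_l ~ U_l @ V_l with U_l: d_out x r, V_l: r x d_in) whose total size
    does not exceed a shared-basis model with one d_out x k basis and
    per-layer k x d_in coefficients. Hypothetical helper for sizing the ablation.
    """
    shared_budget = d_out * shared_rank + num_layers * shared_rank * d_in
    cost_per_rank = num_layers * (d_out + d_in)
    return shared_budget // cost_per_rank

r = matched_layer_rank(num_layers=12, d_out=768, d_in=768, shared_rank=256)
print(r)  # 138
```

With 12 layers of 768 × 768 weights, a shared rank of 256 leaves each layer only rank 138 in the parameter-matched layer-specific variant, which is the capacity trade-off the ablation would measure. With CRISP's own, much smaller mixers, the budget and hence r would differ.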
Circularity Check
No significant circularity in derivation chain
full rationale
The CRISP framework is constructed by factorizing pretrained weights into shared basis matrices (for compression via size adjustment) and small per-component mixing projections (for PEFT via low parameter count). This decomposition follows standard low-rank ideas and directly enables the dual-task unification by varying basis rank and mixer size, without any equation or claim reducing the outputs to the inputs by definition. Performance numbers (4-5% gains over dual-task priors, 1.5% over SOTA PEFT) are presented as separate empirical results rather than predictions forced by the factorization itself. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps in the abstract or described method; the approach remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- basis matrix dimensions
- mixer projection size
axioms (1)
- domain assumption Pretrained neural network weights can be factorized into basis matrices and mixing projections without substantial loss of expressivity
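The domain assumption can be quantified with the Eckart–Young–Mirsky theorem, a standard linear-algebra result rather than a claim from the paper: for a weight matrix W with singular values σ_1 ≥ … ≥ σ_r, truncating its SVD to k terms gives the best rank-k approximation in Frobenius norm, so the expressivity lost is exactly the discarded spectral tail. If each layer keeps its own coefficients and only a width-k basis is shared, the same bound applies to the column-concatenated matrix [W_1, …, W_L].

```latex
W = \sum_{i=1}^{r} \sigma_i u_i v_i^\top, \qquad
W_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top, \qquad
\min_{\operatorname{rank}(X) \le k} \lVert W - X \rVert_F^2
  = \lVert W - W_k \rVert_F^2
  = \sum_{i=k+1}^{r} \sigma_i^2 .
```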
Reference graph
Works this paper leans on
- [1] P. Agand. Knowledge distillation from single-task teachers to multi-task student for end-to-end autonomous driving. Proceedings of the AAAI Conference on Artificial Intelligence, 38:23375–23376, 2024.
- [2] Parakh Agarwal, Manu Mathew, Kunal Ranjan Patel, Varun Tripathi, and Pramod Swami. Prune efficiently by soft pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 2210–2217, 2024.
- [3] Sabbir Ahmed, Abdullah Al Arafat, Deniz Najafi, Akhlak Mahmood, Mamshad Nayeem Rizve, Mohaiminul Al Nahian, Ranyang Zhou, Shaahin Angizi, and Adnan Siraj Rakin. Deepcompress-vit: Rethinking model compression to enhance efficiency of vision transformers at the edge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR...
- [4] Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, et al. Piqa: Reasoning about physical commonsense in natural language. In Proceedings of the AAAI conference on artificial intelligence, pages 7432–7439, 2020.
- [5] Yuhao Cao. Fcp dis vit: Efficient vision transformer with neural network pruning. In 2024 IEEE 4th International Conference on Power, Electronics and Computer Applications (ICPECA), pages 1216–1221, 2024.
- [6] Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, and Baobao Chang. An image is worth 1/2 tokens after layer 2: Plug-and-play inference acceleration for large vision-language models. In Proceedings of the European Conference on Computer Vision (ECCV), 2024.
- [7] Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. Adaptformer: adapting vision transformers for scalable visual recognition. In Proceedings of the 36th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2022. Curran Associates Inc.
- [8] Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, and Luming Liang. Lorashear: Efficient large language model structured pruning and knowledge recovery, 2023.
- [9] Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)...
- [10] Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge, 2018.
- [11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
- [12] Julien Denize, Mykola Liashuha, Jaonary Rabarisoa, Astrid Orcesi, and Romain Hérault. Comedian: Self-supervised ...
Table 10. Comparison of PEFT methods on commonsense reasoning benchmarks. Results from LoRA and DoRA are taken from Liu et al. [47], HiRA results are from Huang et al. [30]. We find that CRISP is on par with or better than custom PEFT methods while...
- [13] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. CoRR, abs/2010.11929, 2020.
- [14] Stefan Elfwing, Eiji Uchibe, and Kenji Doya. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107:3–11, 2018. Special issue on deep reinforcement learning.
- [15] Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, and Angela Dai. Hyperdiffusion: Generating implicit neural fields with weight-space diffusion. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 14254–14264, 2023.
- [16] Gongfan Fang, Xinyin Ma, Michael Bi Mi, and Xinchao Wang. Isomorphic pruning for vision models. In Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XXX, pages 232–250, Berlin, Heidelberg, 2024. Springer-Verlag.
- [17] Leo Gao, Jonathan Tow, Baber Abbasi, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Alain Le Noac’h, Haonan Li, Kyle McDonell, Niklas Muennighoff, Chris Ociepa, Jason Phang, Laria Reynolds, Hailey Schoelkopf, Aviya Skowron, Lintang Sutawika, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. The language model evaluation harness, 2024.
- [18] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
- [19] Patrick Glandorf and Bodo Rosenhahn. Pruning by block benefit: Exploring the properties of vision transformer blocks during domain adaptation. In International Conference on Computer Vision Workshop, 2025.
- [20] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 315–323, Fort Lauderdale, FL, USA, ...
- [21] David González-Martínez. Balf: Budgeted activation-aware low-rank factorization for fine-tuning-free model compression, 2025.
- [22]
- [23] Diana-Nicoleta Grigore, Mariana-Iuliana Georgescu, Jon Alvarez Justo, Tor Johansen, Andreea Iuliana Ionescu, and Radu Tudor Ionescu. Weight copy and low-rank adaptation for few-shot distillation of vision transformers. In Proceedings of the Winter Conference on Applications of Computer Vision (WACV), pages 7368–7378, 2025.
- [24] David Ha, Andrew M. Dai, and Quoc V. Le. Hypernetworks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
- [25] Ligong Han, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, and Feng Yang. Svdiff: Compact parameter space for diffusion fine-tuning. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
- [26] Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, and Yunhe Wang. Learning efficient vision transformers via fine-grained manifold distillation. In Advances in Neural Information Processing Systems, 2022.
- [27] Zejiang Hou and Sun-Yuan Kung. Multi-dimensional model compression of vision transformer. In 2022 IEEE International Conference on Multimedia and Expo (ICME), pages 01–06, 2022.
- [28] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.
- [29] Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, and Roy Lee. LLM-adapters: An adapter family for parameter-efficient fine-tuning of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5254–5276, Singapore, 2023. Association for Computatio...
- [30] Qiushi Huang, Tom Ko, Zhan Zhuang, Lilian Tang, and Yu Zhang. HiRA: Parameter-efficient hadamard high-rank adaptation for large language models. In The Thirteenth International Conference on Learning Representations, 2025.
- [31] Leonardo Iurada, Marco Ciccone, and Tatiana Tommasi. Finding lottery tickets in vision models via data-driven spectral foresight pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16142–16151, 2024.
- [32] Minchan Kang, Sanghyeok Son, and Daeshik Kim. Adaptive class token knowledge distillation for efficient vision transformer. Knowledge-Based Systems, 304:112531, ...
- [33] Samir Khaki and Konstantinos N Plataniotis. The need for speed: Pruning transformers with one recipe. In The Twelfth International Conference on Learning Representations, 2024.
- [34] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
- [35] Jongho Lee and Hyun Kim. Dct-vit: High-frequency pruned vision transformer with discrete cosine transform. IEEE Access, 12:80386–80396, 2024.
- [36] Lujun Li, Peijie Dong, Zhenheng Tang, Xiang Liu, Qiang Wang, Wenhan Luo, Wei Xue, Qifeng Liu, Xiaowen Chu, and Yike Guo. Discovering sparsity allocation for layer-wise pruning of large language models. In Proceedings of the 38th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2024. Curran Associates Inc.
- [37] Yang Li, Shaobo Han, and Shihao Ji. Vb-lora: Extreme parameter efficient fine-tuning with vector banks. In The 38th Conference on Neural Information Processing Systems (NeurIPS), 2024.
- [38] Yanwei Li, Chengyao Wang, and Jiaya Jia. LLaMA-VID: An image is worth 2 tokens in large language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
- [39] Dongze Lian, Daquan Zhou, Jiashi Feng, and Xinchao Wang. Scaling & shifting your features: A new baseline for efficient model tuning. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [40] Yan-Shuo Liang and Wu-Jun Li. Inflora: Interference-free low-rank adaptation for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23638–23647, 2024.
- [41] Baohao Liao and Christof Monz. 3-in-1: 2d rotary adaptation for efficient finetuning, efficient batching and composability. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
- [42] Gui Ling, Ziyang Wang, Yuliang Yan, and Qingwen Liu. Slimgpt: Layer-wise structured pruning for large language models. In Advances in Neural Information Processing Systems, pages 107112–107137. Curran Associates, Inc., 2024.
- [43] Vijay Lingam, Atula Tejaswi Neerkaje, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Eunsol Choi, Alex Dimakis, Aleksandar Bojchevski, and sujay sanghavi. SVFT: Parameter-efficient fine-tuning with singular vectors. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
- [44] Deyuan Liu, Zhanyue Qin, Hairu Wang, Zhao Yang, Zecheng Wang, Fangying Rong, Qingbin Liu, Yanchao Hao, Bo Li, Xi Chen, Cunhang Fan, Zhao Lv, Dianhui Chu, Zhiying Tu, and Dianbo Sui. Pruning via merging: Compressing LLMs via manifold alignment based layer merging. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,...
- [45] Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin Raffel. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. In Proceedings of the 36th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2022. Curran Associates Inc.
- [46] He Liu, Yikai Wang, Huaping Liu, Fuchun Sun, and Anbang Yao. Small scale data-free knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6008–6016, ...
- [47] Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. Dora: weight-decomposed low-rank adaptation. In Proceedings of the 41st International Conference on Machine Learning. JMLR.org, 2024.
- [48] Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, and Bernhard Schölkopf. Parameter-efficient orthogonal finetuning via butterfly factorization. In ICLR, ...
- [49] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.
- [50] S. Maji, J. Kannala, E. Rahtu, M. Blaschko, and A. Vedaldi. Fine-grained visual classification of aircraft, 2013.
- [51] Junzhu Mao, Yang Shen, Jinyang Guo, Yazhou Yao, Xiansheng Hua, and Hengtao Shen. Prune and merge: Efficient token compression for vision transformer with spatial information preserved. IEEE Transactions on Multimedia, 27:4670–4683, 2025.
- [52] Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. Can a suit of armor conduct electricity? a new dataset for open book question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2381–2391, Brussels, Belgium, 2018. Association for Computational Linguistics.
- [53] Mahdi Nikdan, Soroush Tabesh, Elvir Crnčević, and Dan Alistarh. RoSA: Accurate parameter-efficient fine-tuning via robust adaptation. In Forty-first International Conference on Machine Learning, 2024.
- [54] Mahdi Nikdan, Soroush Tabesh, Elvir Crnčević, and Dan Alistarh. RoSA: Accurate parameter-efficient fine-tuning via robust adaptation. In Proceedings of the 41st International Conference on Machine Learning, pages 38187–38206. PMLR, 2024.
- [55] Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In Indian Conference on Computer Vision, Graphics and Image Processing, 2008.
- [56] Sungho Park and Hyeran Byun. Fair-vpt: Fair visual prompt tuning for image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12268–12278, 2024.
- [57] Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, and Dahua Lin. Data-free weight compress and denoise for large language models. CoRR, abs/2402.16319, 2024.
- [58] Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, and Kate Saenko. Neural parameter allocation search. In International Conference on Learning Representations, 2022.
- [59] Ariadna Quattoni and Antonio Torralba. Recognizing indoor scenes. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 413–420, 2009.
- [60] Harsh Rangwani, Pradipto Mondal, Mayank Mishra, Ashish Ramayee Asokan, and R. Venkatesh Babu. Deit-lt: Distillation strikes back for vision transformer training on long-tailed datasets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23396–23406, 2024.
- [61] Pengjie Ren, Chengshun Shi, Shiguang Wu, Mengqi Zhang, Zhaochun Ren, Maarten Rijke, Zhumin Chen, and Jiahuan Pei. MELoRA: Mini-ensemble low-rank adapters for parameter-efficient fine-tuning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3052–3064, Bangkok, Thailand, 2024. Associati...
- [62] Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. Winogrande: an adversarial winograd schema challenge at scale. Commun. ACM, 64(9):99–106, ...
- [63] Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, and Yejin Choi. Social IQa: Commonsense reasoning about social interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4463–4473, Hong Kong, China, 2019...
- [64] Pedro Savarese and Michael Maire. Learning implicitly recurrent CNNs through parameter sharing. In International Conference on Learning Representations, 2019.
- [65] Ayan Sengupta, Siddhant Chaudhary, and Tanmoy Chakraborty. You only prune once: Designing calibration-free model compression with policy learning. In The Thirteenth International Conference on Learning Representations, 2025.
- [66] Chikai Shang, Mengke Li, Yiqun Zhang, Zhen Chen, Jinlin Wu, Fangqing Gu, Yang Lu, and Yiu-Ming Cheung. Pro-vpt: Distribution-adaptive visual prompt tuning via prompt relocation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1558–1568, ...
- [67] Chengchao Shen, Hourun Zhu, Gongfan Fang, Jianxin Wang, and Xinchao Wang. Diversity-guided mlp reduction for efficient large vision transformers. arXiv preprint arXiv:2506.07138, 2025.
- [68] Dachuan Shi, Chaofan Tao, Ying Jin, Zhendong Yang, Chun Yuan, and Jiaqi Wang. UPop: Unified and progressive pruning for compressing vision-language transformers. In Proceedings of the 40th International Conference on Machine Learning, pages 31292–31311. PMLR, 2023.
- [69] Chongjie Si, Xiaokang Yang, and Wei Shen. See further for parameter efficient fine-tuning by standing on the shoulders of decomposition. arXiv preprint arXiv:2407.05417, 2024.
- [70] Sridhar Swaminathan, Deepak Garg, Rajkumar Kannan, and Frederic Andres. Sparse low rank factorization for deep neural network compression. Neurocomputing, 398:185–196, 2020.
- [71] Nazia Tasnim and Bryan A. Plummer. Recast: Reparameterized, compact weight adaptation for sequential tasks. In International Conference on Learning Representations (ICLR), 2025.
- [72]
- [73] Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, and Ali Ghodsi. DyLoRA: Parameter-efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3274–3287, Dubrovnik, Croatia, 2023. Association for Computational...
- [74] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The caltech-ucsd birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.
- [75] Ao Wang, Hui Chen, Zijia Lin, Sicheng Zhao, Jungong Han, and Guiguang Ding. Cait: Triple-win compression towards high accuracy, fast inference, and favorable transferability for vits. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–17, 2025.
- [76] H. Wang, J. Chang, Y. Zhai, X. Luo, J. Sun, Z. Lin, and Q. Tian. Lion: implicit vision prompt tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 38:5372–5380, 2024.
- [77] Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, and Grace Li Zhang. Basis sharing: Cross-layer parameter sharing for large language model compression. In The Thirteenth International Conference on Learning Representations, 2025.
- [78] Kaili Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, and Yang You. Neural network parameter diffusion, 2024.
- [79] Xin Wang, Yu Zheng, Zhongwei Wan, and Mi Zhang. SVD-LLM: Truncation-aware singular value decomposition for large language model compression. In The Thirteenth International Conference on Learning Representations, 2025.
- [80] Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, and Meng Wang. Revisiting the power of prompt for visual tuning. In Proceedings of the 41st International Conference on Machine Learning. JMLR.org, ...