PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark
Pith reviewed 2026-05-17 05:04 UTC · model grok-4.3
The pith
PEFT-Bench offers a standardized way to compare parameter-efficient fine-tuning methods for large language models while factoring in training and inference costs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that PEFT-Bench, demonstrated on 27 NLP datasets and 7 PEFT methods for autoregressive LLMs, together with the PSCP metric (which incorporates trainable parameters, inference speed, and training memory), enables more reproducible and practical comparisons of these methods than prior, more limited evaluations.
What carries the argument
PEFT-Bench, the unified end-to-end benchmark, and the PEFT Soft Cost Penalties (PSCP) metric that weights performance by training and inference costs.
If this is right
- PEFT methods can now be ranked consistently across tasks instead of relying on scattered individual studies.
- The PSCP metric produces efficiency-aware rankings that favor methods with lower memory and faster inference.
- Researchers gain a shared testbed that makes it easier to identify which fine-tuning approaches scale to new tasks.
- Adoption of the benchmark could reduce redundant experiments and improve comparability in the field.
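To make the idea of an efficiency-aware ranking concrete, here is a minimal sketch of a cost-penalized score. It is an assumption made for illustration, not the paper's PSCP formula, which this page does not reproduce. The function name `penalized_score`, the multiplicative penalty form, the reference magnitudes, the weights, and all numbers are hypothetical.

```python
from math import prod


def penalized_score(performance, costs, references, weights):
    """Hypothetical soft cost penalty; NOT the paper's PSCP definition.

    performance: task score where higher is better
    costs:       measured costs where lower is better
    references:  magnitudes used to normalize each cost
    weights:     non-negative exponents setting each penalty's strength
    """
    penalties = [
        (1.0 + costs[name] / references[name]) ** (-weights[name])
        for name in costs
    ]
    return performance * prod(penalties)


# Made-up numbers for two hypothetical methods (not results from the paper).
references = {"params_m": 100.0, "latency_ms": 50.0, "memory_gb": 40.0}
weights = {"params_m": 1.0, "latency_ms": 1.0, "memory_gb": 1.0}
method_a = {"params_m": 4.0, "latency_ms": 20.0, "memory_gb": 18.0}
method_b = {"params_m": 40.0, "latency_ms": 35.0, "memory_gb": 30.0}

score_a = penalized_score(0.80, method_a, references, weights)
score_b = penalized_score(0.82, method_b, references, weights)
```

In this toy setup the slightly less accurate but much cheaper method ends up with the higher penalized score, which is the kind of reordering an efficiency-aware metric is meant to produce.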
Where Pith is reading between the lines
- If widely used, the benchmark might steer new PEFT designs toward explicit optimization of the PSCP factors.
- The approach could extend naturally to measuring long-term deployment costs beyond initial training.
- Rankings from this setup might differ when tested on specialized domains or much larger model scales.
Load-bearing premise
The 27 datasets and 7 methods chosen are representative enough to support general claims about PEFT method quality, and the specific cost weightings in the PSCP metric match practical needs.
What would settle it
Re-evaluating the same methods on a new collection of datasets or models yields substantially different performance rankings or shows that high-PSCP methods fail in real-world low-resource deployments.
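One way to quantify "substantially different rankings" is a rank correlation between the original ranking and a re-run. The sketch below computes Kendall's tau-a in plain Python and assumes rankings without ties. The method labels and ranks are placeholders, not results from the paper.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as {method: rank} dicts.

    Assumes both rankings cover the same methods and contain no ties.
    """
    methods = sorted(rank_a)
    if set(methods) != set(rank_b) or len(methods) < 2:
        raise ValueError("rankings must cover the same two or more methods")
    concordant = discordant = 0
    for m1, m2 in combinations(methods, 2):
        sign = (rank_a[m1] - rank_a[m2]) * (rank_b[m1] - rank_b[m2])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(methods) * (len(methods) - 1) // 2
    return (concordant - discordant) / n_pairs


# Placeholder rankings: the re-run swaps the top two methods.
original = {"A": 1, "B": 2, "C": 3, "D": 4}
rerun = {"A": 2, "B": 1, "C": 3, "D": 4}
tau = kendall_tau(original, rerun)
```

A value near 1 means the re-evaluation preserved the ordering, while values near 0 or below would support the objection that the rankings do not generalize.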
Original abstract
Despite the state-of-the-art performance of Large Language Models (LLMs) achieved on many tasks, their massive scale often leads to high computational and environmental costs, limiting their accessibility. Parameter-Efficient Fine-Tuning (PEFT) methods address this challenge by reducing the number of trainable parameters while maintaining strong downstream performance. Despite the advances in PEFT methods, current evaluations remain limited (in terms of evaluated models and datasets) and difficult to reproduce. To bridge this gap, we introduce PEFT-Bench, a unified end-to-end benchmark for evaluating diverse PEFT methods on autoregressive LLMs. We demonstrate its usage across 27 NLP datasets and 7 PEFT methods. To account for different PEFT training and inference factors, we also introduce the PEFT Soft Cost Penalties (PSCP) metric, which takes trainable parameters, inference speed, and training memory usage into account.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PEFT-Bench, a unified end-to-end benchmark for evaluating diverse Parameter-Efficient Fine-Tuning (PEFT) methods on autoregressive LLMs. It demonstrates the benchmark across 27 NLP datasets and 7 PEFT methods, and proposes the PEFT Soft Cost Penalties (PSCP) metric that incorporates trainable parameters, inference speed, and training memory usage to support more comprehensive evaluations.
Significance. If the benchmark implementation includes proper statistical validation and variance handling, and the PSCP metric is shown to produce stable, practically useful rankings, this work could provide a valuable standardized framework for fair and reproducible comparisons of PEFT methods, helping address current limitations in evaluation scope within the field.
major comments (3)
- Abstract: The abstract states the benchmark scope and introduces PSCP but provides no details on implementation, statistical validation of the metric, handling of variance across runs, or justification for dataset and method selection; this leaves the central claim of improved reproducibility and fair comparison without sufficient support.
- §3 (Dataset and Method Selection): The representativeness of the 27 NLP datasets and 7 PEFT methods for drawing general conclusions about PEFT method quality is not evidenced; without analysis of task diversity (e.g., classification, generation, reasoning) or coverage of major PEFT families (LoRA variants, adapters, prefix-tuning), the resulting rankings may not generalize.
- §4 (PSCP Metric Definition): The PSCP metric combines trainable parameters, inference speed, and training memory without explicit justification for the weighting scheme, sensitivity analysis, or comparison to existing cost models; this risks arbitrary and unstable rankings under different application constraints.
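The sensitivity concern raised for §4 can be checked with a small sweep over candidate weightings, recording which method ranks first under each one. The scoring function below is an illustrative stand-in and not the paper's PSCP definition. The method names, costs, and weight grid are all made up.

```python
from itertools import product


def score(perf, costs, weights):
    """Illustrative penalized score: divide perf by (1 + cost) ** weight per factor."""
    result = perf
    for cost, weight in zip(costs, weights):
        result /= (1.0 + cost) ** weight
    return result


def top_method_per_weighting(methods, weight_grid):
    """Map each weight triple to the name of the best-scoring method."""
    winners = {}
    for weights in product(weight_grid, repeat=3):
        winners[weights] = max(
            methods,
            key=lambda name: score(methods[name][0], methods[name][1], weights),
        )
    return winners


# name -> (performance, (normalized params, normalized latency, normalized memory))
# All values are hypothetical.
methods = {
    "method_x": (0.85, (0.60, 0.50, 0.70)),
    "method_y": (0.78, (0.05, 0.30, 0.40)),
}
winners = top_method_per_weighting(methods, weight_grid=(0.0, 0.5, 1.0))
distinct_winners = set(winners.values())
```

If `distinct_winners` contains more than one method, the top rank depends on the weighting, which is the instability the referee asks the authors to characterize.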
minor comments (1)
- Notation and formulas: The mathematical definition of the PSCP metric would benefit from a clearer, self-contained equation to improve reproducibility and ease of implementation by other researchers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be incorporated to improve clarity and support for our claims.
- Referee: Abstract: The abstract states the benchmark scope and introduces PSCP but provides no details on implementation, statistical validation of the metric, handling of variance across runs, or justification for dataset and method selection; this leaves the central claim of improved reproducibility and fair comparison without sufficient support.
  Authors: The abstract is kept concise to summarize the core contributions. Implementation details appear in Sections 3 and 4, and the experimental protocol includes multiple runs with different random seeds to report averaged results. To directly address the concern, we will revise the abstract to briefly note the multi-run evaluation for variance handling and the selection rationale for datasets and methods.
  Revision: yes
- Referee: §3 (Dataset and Method Selection): The representativeness of the 27 NLP datasets and 7 PEFT methods for drawing general conclusions about PEFT method quality is not evidenced; without analysis of task diversity (e.g., classification, generation, reasoning) or coverage of major PEFT families (LoRA variants, adapters, prefix-tuning), the resulting rankings may not generalize.
  Authors: Section 3 describes the 27 datasets covering classification, generation, and reasoning tasks drawn from established benchmarks, along with 7 PEFT methods spanning adapter, LoRA, and prefix-tuning families. We agree an explicit diversity analysis is beneficial and will add a dedicated paragraph in §3 with categorization tables and references to demonstrate coverage of major categories.
  Revision: yes
- Referee: §4 (PSCP Metric Definition): The PSCP metric combines trainable parameters, inference speed, and training memory without explicit justification for the weighting scheme, sensitivity analysis, or comparison to existing cost models; this risks arbitrary and unstable rankings under different application constraints.
  Authors: The PSCP weights prioritize trainable parameters as the dominant efficiency factor in PEFT settings, with secondary terms for memory and speed. We will expand §4 to include explicit justification for the chosen weights, results from sensitivity analysis under varied constraints, and direct comparisons to prior cost models in the PEFT literature.
  Revision: yes
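The first response mentions multiple runs with different random seeds and averaged results. A minimal sketch of that kind of aggregation follows, reporting the spread alongside the mean. The helper name `summarize_runs` and the scores are hypothetical, and the sample standard deviation is an assumed choice, since the page does not say how variance is reported.

```python
from statistics import mean, stdev


def summarize_runs(scores_by_seed):
    """Return the mean, sample standard deviation, and count of per-seed scores."""
    values = list(scores_by_seed.values())
    if len(values) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    return {"mean": mean(values), "std": stdev(values), "n": len(values)}


# Made-up scores for one method on one dataset, keyed by random seed.
runs = {0: 0.71, 1: 0.74, 2: 0.72}
summary = summarize_runs(runs)
```

Reporting the standard deviation next to the mean lets a reader judge whether a gap between two methods exceeds seed-to-seed noise.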
Circularity Check
No circularity: benchmark and metric defined from observables without self-referential reduction
The paper's core contribution is the creation of PEFT-Bench as an evaluation framework and the PSCP metric, both constructed directly from observable quantities (trainable parameters, inference speed, training memory) rather than any derived prediction or equation that loops back to inputs. No load-bearing derivations, fitted parameters renamed as predictions, or self-citation chains appear in the described claims. The usage across 27 datasets and 7 methods is an empirical demonstration, not a self-definitional or uniqueness-imported result. The paper remains self-contained against external benchmarks for its stated purpose of providing a unified evaluation setup.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The 27 NLP datasets and 7 PEFT methods provide a representative sample for general PEFT evaluation.
invented entities (1)
- PEFT Soft Cost Penalties (PSCP) metric: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "We propose the PEFT Soft Cost Penalties metric (PSCP), which introduces a number of trainable parameters, memory usage, and inference speed in the final score calculation."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift
  Supervised fine-tuning with 0.1% labeled data outperforms all 60 tested prompt variants for CLIPSeg cloud segmentation on satellite imagery under domain shift.
- PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models
  PEFT-Factory supplies a ready-to-use, extensible codebase that unifies 19 PEFT methods and evaluation pipelines for fine-tuning large autoregressive language models.