pith. machine review for the scientific record. sign in

arxiv: 2402.09353 · v6 · submitted 2024-02-14 · 💻 cs.CL · cs.CV

Recognition: 2 theorem links

· Lean Theorem

DoRA: Weight-Decomposed Low-Rank Adaptation

Authors on Pith no claims yet

Pith reviewed 2026-05-15 22:22 UTC · model grok-4.3

classification 💻 cs.CL cs.CV
keywords DoRALoRAparameter-efficient fine-tuningweight decompositionlarge language modelsmultimodal modelsfine-tuningLLaMA
0
0 comments X

The pith

DoRA splits pretrained weights into magnitude and direction then updates only the direction with LoRA to narrow the gap to full fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first analyzes why full fine-tuning usually beats LoRA by examining how each method changes the size and orientation of weight vectors. It finds that full fine-tuning alters both aspects while LoRA mainly changes orientation under a low-rank constraint. DoRA therefore freezes the magnitude of each weight and applies LoRA only to its direction, keeping the number of trainable parameters low and adding no cost at inference. This change is shown to raise accuracy and stability on LLaMA, LLaVA, and VL-BART across commonsense reasoning, visual instruction tuning, and multimodal understanding tasks.

Core claim

DoRA decomposes each pretrained weight matrix into a magnitude scalar and a directional unit vector. The magnitude is kept fixed at its pretrained value while low-rank adaptation matrices are used to update only the direction. The resulting fine-tuned weights are recombined at inference time exactly as in standard LoRA, yet the method recovers a larger fraction of full fine-tuning capacity and exhibits more stable training dynamics.

What carries the argument

Weight decomposition that isolates magnitude (frozen) from direction (updated by LoRA).

If this is right

  • DoRA raises accuracy on commonsense reasoning benchmarks compared with LoRA when fine-tuning LLaMA.
  • It improves visual instruction following performance for LLaVA without changing inference latency.
  • Training curves for DoRA show fewer oscillations than standard LoRA on the same tasks.
  • The same decomposition yields gains on image and video-text understanding for VL-BART.
  • The method adds zero extra parameters or compute once training finishes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The result suggests that directional alignment may be the dominant degree of freedom needed during adaptation while magnitude mainly sets scale.
  • DoRA could be combined with other low-rank or prompt-based methods to shrink the remaining gap to full fine-tuning.
  • The separation might allow selective magnitude rescaling at later training stages without increasing the LoRA rank.
  • Similar decomposition could be tested on convolutional or diffusion models to check whether the same magnitude-direction split helps there.

Load-bearing premise

The performance edge of full fine-tuning over LoRA comes mainly from its freedom to adjust both magnitude and direction, and fixing magnitude while updating direction recovers most of that edge.

What would settle it

An experiment in which low-rank updates are allowed to change both magnitude and direction simultaneously and still fail to match DoRA accuracy, or in which updating magnitude alone while freezing direction closes the gap instead.

read the original abstract

Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because of avoiding additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing \ours, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. \ours~consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding. Code is available at https://github.com/NVlabs/DoRA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces DoRA, a PEFT method that decomposes pre-trained weights into magnitude and direction, applies LoRA updates solely to the direction, and scales the magnitude component. Motivated by an analysis showing FT modifies magnitude more than standard LoRA, DoRA is claimed to increase learning capacity and stability over LoRA while incurring no extra inference cost. Experiments demonstrate consistent gains when fine-tuning LLaMA on commonsense reasoning, LLaVA on visual instruction tuning, and VL-BART on image/video-text understanding tasks.

Significance. If the empirical improvements hold under controlled hyperparameter regimes, DoRA would constitute a simple, practical upgrade to LoRA that narrows the gap to full fine-tuning without runtime overhead. The public code release strengthens verifiability and potential for follow-up work. The decomposition perspective may also inform future PEFT designs, though its explanatory power depends on isolating the magnitude-direction split as causal.

major comments (2)
  1. [§3] §3 (weight decomposition analysis): the claim that FT alters magnitude more than LoRA is used to justify freezing/scaling magnitude while updating direction. The comparison does not appear to control for confounds such as optimizer state, effective learning-rate scaling, or total update steps between the FT and LoRA runs; without these controls the observed magnitude shift may be correlative rather than the primary driver of the FT-LoRA gap.
  2. [Experimental setup] Experimental setup (hyperparameter details): the manuscript gives limited information on the search ranges and protocol for the magnitude scaling factor. It is unclear whether this factor is tuned globally or per layer and how many trials were performed; this detail is load-bearing for interpreting whether the reported gains over LoRA reflect the decomposition itself or differences in tuning effort.
minor comments (1)
  1. [Abstract] Abstract: the assertion that DoRA enhances training stability would be strengthened by reporting a concrete stability metric (e.g., standard deviation of validation accuracy across random seeds).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and recommendation for minor revision. We address each major comment below.

read point-by-point responses
  1. Referee: [§3] §3 (weight decomposition analysis): the claim that FT alters magnitude more than LoRA is used to justify freezing/scaling magnitude while updating direction. The comparison does not appear to control for confounds such as optimizer state, effective learning-rate scaling, or total update steps between the FT and LoRA runs; without these controls the observed magnitude shift may be correlative rather than the primary driver of the FT-LoRA gap.

    Authors: We thank the referee for this observation. Our analysis in §3 compares FT and LoRA under their standard reported training protocols without explicit controls for optimizer state, learning-rate scaling, or update steps. We agree this renders the magnitude-difference observation correlative rather than strictly causal. In the revision we will add a clarifying paragraph acknowledging these potential confounds and will stress that the primary justification for DoRA remains its consistent empirical gains over LoRA across tasks and models. revision: partial

  2. Referee: [Experimental setup] Experimental setup (hyperparameter details): the manuscript gives limited information on the search ranges and protocol for the magnitude scaling factor. It is unclear whether this factor is tuned globally or per layer and how many trials were performed; this detail is load-bearing for interpreting whether the reported gains over LoRA reflect the decomposition itself or differences in tuning effort.

    Authors: We agree that additional hyperparameter details are required for reproducibility. The magnitude scaling factor was tuned independently per layer via grid search over the range [0.1, 10.0] with multiple trials per configuration. We will expand the experimental-setup section in the revised manuscript to report the exact search ranges, per-layer protocol, and number of trials performed. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's derivation proceeds from an empirical weight decomposition analysis of differences between full fine-tuning and LoRA, to a design choice that freezes magnitude and applies low-rank updates only to direction, followed by independent experimental validation on held-out downstream tasks. No step reduces by construction to its own inputs: the performance metrics are measured separately and do not equate to quantities defined by the decomposition itself. No load-bearing self-citations, fitted inputs renamed as predictions, or ansatzes smuggled via prior work appear in the chain.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

The method adds one new modeling choice (magnitude-direction split) on top of standard LoRA; no new physical entities or unproven mathematical axioms are introduced. Hyperparameters such as rank and learning rate remain as in prior work.

free parameters (2)
  • LoRA rank r
    Standard hyperparameter inherited from LoRA; chosen per task and model size.
  • magnitude scaling factor
    Introduced by the decomposition; its value is either fixed or lightly tuned but not derived from first principles.

pith-pipeline@v0.9.0 · 5511 in / 1131 out tokens · 37939 ms · 2026-05-15T22:22:47.642685+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Continuous Expert Assembly: Instance-Conditioned Low-Rank Residuals for All-in-One Image Restoration

    cs.CV 2026-05 unverdicted novelty 7.0

    CEA assembles per-token low-rank residual updates via dense affinities over hyper-adapter-generated components to improve all-in-one image restoration on spatially non-uniform degradations.

  2. Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

    cs.LG 2026-04 unverdicted novelty 7.0

    A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and common...

  3. GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models

    cs.LG 2026-04 conditional novelty 7.0

    GUI-Perturbed shows that GUI grounding models suffer systematic accuracy collapse under relational instructions and visual changes such as 70% zoom, with even augmented fine-tuning worsening results.

  4. The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

    cs.LG 2026-04 conditional novelty 6.0

    Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accura...

  5. Sub-Token Routing in LoRA for Adaptation and Query-Aware KV Compression

    cs.LG 2026-04 unverdicted novelty 6.0

    Sub-token routing in LoRA-adapted transformers adds a finer compression axis for KV caches, with query-independent and query-aware designs that improve efficiency under reduced budgets when combined with token-level s...

  6. COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

    cs.LG 2026-04 unverdicted novelty 6.0

    COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.

  7. Sensitivity-Positional Co-Localization in GQA Transformers

    cs.CL 2026-04 unverdicted novelty 6.0

    In Llama 3.1 8B, task-sensitive layers cluster late while RoPE adaptation is strongest early, yet applying both adaptations only to sensitivity-identified layers outperforms other layer choices by 4-16 points on MMLU,...

  8. STQuant: Spatio-Temporal Adaptive Framework for Optimizer Quantization in Large Multimodal Model Training

    cs.LG 2026-04 unverdicted novelty 6.0

    STQuant dynamically allocates quantization bits for optimizer states in multimodal model training, reducing memory by 84.4% to an average 5.1 bits while preserving quality on GPT-2 and ViT.

  9. ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache

    cs.DC 2026-04 unverdicted novelty 6.0

    ForkKV uses copy-on-write disaggregated KV cache with DualRadixTree and ResidualAttention kernels to deliver up to 3x throughput over prior multi-LoRA serving systems with negligible quality loss.

  10. Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems

    cs.NE 2026-04 unverdicted novelty 6.0

    CDWF achieves 90-99% of full fine-tuning performance with up to 120x fewer trainable parameters by dynamically allocating full trainability to gradient-important blocks and LoRA to others for PV cyberattack transfer learning.

  11. GAIN: Multiplicative Modulation for Domain Adaptation

    cs.LG 2026-04 unverdicted novelty 6.0

    GAIN's multiplicative modulation preserves pretrained weight column spans during sequential domain adaptation, yielding 7-13% better prior-domain perplexity than LoRA across 774M-70B models while matching replay-augme...

  12. Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures

    cs.LG 2026-04 conditional novelty 6.0

    Gradient-guided layer selection for LoRA yields 15-28% training speedup with matched downstream results on MMLU, GSM8K, and HumanEval across 14 models from 0.5B to 72B parameters.

  13. Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters

    cs.LG 2026-04 unverdicted novelty 6.0

    PoLAR-VBLL combines orthogonalized low-rank adapters with variational Bayesian last-layer inference to enable scalable, well-calibrated uncertainty quantification in fine-tuned LLMs.

  14. Deep Reprogramming Distillation for Medical Foundation Models

    cs.CV 2026-05 unverdicted novelty 5.0

    DRD introduces a reprogramming module and CKA-based distillation to enable efficient, robust adaptation of medical foundation models to downstream 2D/3D classification and segmentation tasks, outperforming prior PEFT ...

  15. SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning

    cs.DC 2026-04 unverdicted novelty 5.0

    SplitFT adapts cut-layer selection and reduces LoRA rank per client in federated split learning to improve efficiency and performance when fine-tuning LLMs on heterogeneous devices and data.

  16. The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

    cs.LG 2026-04 unverdicted novelty 5.0

    Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering close the gap.

  17. Small Language Models are the Future of Agentic AI

    cs.AI 2025-06 unverdicted novelty 5.0

    Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks that dominate agentic AI systems.

  18. LLiMba: Sardinian on a Single GPU -- Adapting a 3B Language Model to a Vanishing Romance Language

    cs.CL 2026-05 conditional novelty 4.0

    Qwen2.5-3B was continued-pretrained and then fine-tuned with rsLoRA r256 on Sardinian data to reach 28.5 BLEU into the language, outperforming full fine-tuning and other LoRA variants.

  19. Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

    cs.LG 2024-03 accept novelty 4.0

    A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.

  20. TLoRA+: A Low-Rank Parameter-Efficient Fine-Tuning Method for Large Language Models

    cs.CL 2026-04 unverdicted novelty 3.0

    TLoRA+ augments LoRA with a dedicated optimizer to improve fine-tuning performance on GLUE tasks without meaningful added compute.

Reference graph

Works this paper leans on

108 extracted references · 108 canonical work pages · cited by 19 Pith papers · 6 internal anchors

  1. [1]

    Edward J Hu and yelong shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen , booktitle=. Lo

  2. [2]

    The Eleventh International Conference on Learning Representations , year=

    Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning , author=. The Eleventh International Conference on Learning Representations , year=

  3. [3]

    International Conference on Learning Representations , year=

    FedPara: Low-rank Hadamard Product for Communication-Efficient Federated Learning , author=. International Conference on Learning Representations , year=

  4. [5]

    International Conference on Learning Representations , year=

    VeRA: Vector-based Random Matrix Adaptation , author=. International Conference on Learning Representations , year=

  5. [6]

    Proceedings of the 30th International Conference on Neural Information Processing Systems , pages=

    Weight normalization: a simple reparameterization to accelerate training of deep neural networks , author=. Proceedings of the 30th International Conference on Neural Information Processing Systems , pages=

  6. [7]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  7. [8]

    LLM -Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

    Hu, Zhiqiang and Wang, Lei and Lan, Yihuai and Xu, Wanyu and Lim, Ee-Peng and Bing, Lidong and Xu, Xing and Poria, Soujanya and Lee, Roy. LLM -Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023

  8. [9]

    Hashimoto , title =

    Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto , title =. 2023 , url =

  9. [10]

    Proceedings of the IEEE International Conference on Computer Vision , pages=

    Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , author=. Proceedings of the IEEE International Conference on Computer Vision , pages=

  10. [11]

    Prefix-Tuning: Optimizing Continuous Prompts for Generation

    Li, Xiang Lisa and Liang, Percy. Prefix-Tuning: Optimizing Continuous Prompts for Generation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021

  11. [12]

    International Conference on Machine Learning , pages=

    Parameter-efficient transfer learning for NLP , author=. International Conference on Machine Learning , pages=

  12. [13]

    International Conference on Learning Representations , year=

    Towards a Unified View of Parameter-Efficient Transfer Learning , author=. International Conference on Learning Representations , year=

  13. [14]

    Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

    Making the v in vqa matter: Elevating the role of image understanding in visual question answering , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

  14. [15]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Gqa: A new dataset for real-world visual reasoning and compositional question answering , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  15. [16]

    A Corpus for Reasoning about Natural Language Grounded in Photographs

    Suhr, Alane and Zhou, Stephanie and Zhang, Ally and Zhang, Iris and Bai, Huajun and Artzi, Yoav. A Corpus for Reasoning about Natural Language Grounded in Photographs. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019

  16. [18]

    International conference on machine learning , pages=

    Learning transferable visual models from natural language supervision , author=. International conference on machine learning , pages=

  17. [19]

    BART : Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    Lewis, Mike and Liu, Yinhan and Goyal, Naman and Ghazvininejad, Marjan and Mohamed, Abdelrahman and Levy, Omer and Stoyanov, Veselin and Zettlemoyer, Luke. BART : Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020

  18. [20]

    Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1) , year=

    VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation , author=. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1) , year=

  19. [21]

    TVQA : Localized, Compositional Video Question Answering

    Lei, Jie and Yu, Licheng and Bansal, Mohit and Berg, Tamara. TVQA : Localized, Compositional Video Question Answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018

  20. [23]

    HERO : Hierarchical Encoder for V ideo+ L anguage Omni-representation Pre-training

    Li, Linjie and Chen, Yen-Chun and Cheng, Yu and Gan, Zhe and Yu, Licheng and Liu, Jingjing. HERO : Hierarchical Encoder for V ideo+ L anguage Omni-representation Pre-training. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020

  21. [24]

    European Conference on Computer Vision , pages=

    TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval , author=. European Conference on Computer Vision , pages=

  22. [25]

    Proceedings of the AAAI Conference on Artificial Intelligence , year=

    Towards automatic learning of procedures from web instructional videos , author=. Proceedings of the AAAI Conference on Artificial Intelligence , year=

  23. [26]

    Thirty-seventh Conference on Neural Information Processing Systems , year=

    Visual Instruction Tuning , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

  24. [28]

    Proceedings of the IEEE/cvf conference on computer vision and pattern recognition , pages=

    Ok-vqa: A visual question answering benchmark requiring external knowledge , author=. Proceedings of the IEEE/cvf conference on computer vision and pattern recognition , pages=

  25. [29]

    European Conference on Computer Vision , pages=

    A-okvqa: A benchmark for visual question answering using world knowledge , author=. European Conference on Computer Vision , pages=

  26. [30]

    OCR-VQA: Visual Question Answering by Reading Text in Images , year=

    Mishra, Anand and Shekhar, Shashank and Singh, Ajeet Kumar and Chakraborty, Anirban , booktitle=. OCR-VQA: Visual Question Answering by Reading Text in Images , year=

  27. [31]

    Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part II 16 , pages=

    Textcaps: a dataset for image captioning with reading comprehension , author=. Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part II 16 , pages=

  28. [32]

    R efer I t G ame: Referring to Objects in Photographs of Natural Scenes

    Kazemzadeh, Sahar and Ordonez, Vicente and Matten, Mark and Berg, Tamara. R efer I t G ame: Referring to Objects in Photographs of Natural Scenes. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ( EMNLP ). 2014

  29. [33]

    International journal of computer vision , pages=

    Visual genome: Connecting language and vision using crowdsourced dense image annotations , author=. International journal of computer vision , pages=

  30. [34]

    Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

    Generation and comprehension of unambiguous object descriptions , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

  31. [35]

    Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

    Vizwiz grand challenge: Answering visual questions from blind people , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

  32. [36]

    Advances in Neural Information Processing Systems , pages=

    Learn to explain: Multimodal reasoning via thought chains for science question answering , author=. Advances in Neural Information Processing Systems , pages=

  33. [37]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Towards vqa models that can read , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  34. [38]

    Evaluating Object Hallucination in Large Vision-Language Models

    Li, Yifan and Du, Yifan and Zhou, Kun and Wang, Jinpeng and Zhao, Xin and Wen, Ji-Rong. Evaluating Object Hallucination in Large Vision-Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023

  35. [40]

    Gonzalez and Ion Stoica , booktitle=

    Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica , booktitle=. Judging

  36. [41]

    Is ChatGPT a General-Purpose Natural Language Processing Task Solver?

    Qin, Chengwei and Zhang, Aston and Zhang, Zhuosheng and Chen, Jiaao and Yasunaga, Michihiro and Yang, Diyi. Is ChatGPT a General-Purpose Natural Language Processing Task Solver?. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023

  37. [42]

    International Conference on Machine Learning , pages=

    Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation , author=. International Conference on Machine Learning , pages=

  38. [43]

    B it F it: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

    Ben Zaken, Elad and Goldberg, Yoav and Ravfogel, Shauli. B it F it: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2022

  39. [44]

    Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

    Karimi Mahabadi, Rabeeh and Ruder, Sebastian and Dehghani, Mostafa and Henderson, James. Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Pap...

  40. [45]

    Advances in Neural Information Processing Systems , year=

    Compacter: Efficient Low-Rank Hypercomplex Adapter Layers , author=. Advances in Neural Information Processing Systems , year=

  41. [46]

    The Power of Scale for Parameter-Efficient Prompt Tuning

    Lester, Brian and Al-Rfou, Rami and Constant, Noah. The Power of Scale for Parameter-Efficient Prompt Tuning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021

  42. [47]

    Residual Prompt Tuning: improving prompt tuning with residual reparameterization

    Razdaibiedina, Anastasiia and Mao, Yuning and Khabsa, Madian and Lewis, Mike and Hou, Rui and Ba, Jimmy and Almahairi, Amjad. Residual Prompt Tuning: improving prompt tuning with residual reparameterization. Findings of the Association for Computational Linguistics: ACL 2023. 2023

  43. [49]

    Thirty-seventh Conference on Neural Information Processing Systems , year=

    Controlling Text-to-Image Diffusion by Orthogonal Finetuning , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

  44. [50]

    Advances in Neural Information Processing Systems , year=

    Chain of Thought Prompting Elicits Reasoning in Large Language Models , author=. Advances in Neural Information Processing Systems , year=

  45. [51]

    2023 , url=

    Welcome to the OpenAI platform , author=. 2023 , url=

  46. [55]

    2024 , eprint=

    Orca-Math: Unlocking the potential of SLMs in Grade School Math , author=. 2024 , eprint=

  47. [56]

    Efficient finetuning of Llama 3 with FSDP QDoRA , author=

  48. [57]

    QLoRA: Efficient Finetuning of Quantized LLMs , volume =

    Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke , booktitle =. QLoRA: Efficient Finetuning of Quantized LLMs , volume =

  49. [59]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  50. [61]

    Chen, X., Fang, H., Lin, T.-Y., Vedantam, R., Gupta, S., Doll \'a r, P., and Zitnick, C. L. Microsoft coco captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015

  51. [62]

    Qlora: Efficient finetuning of quantized llms

    Dettmers, T., Pagnoni, A., Holtzman, A., and Zettlemoyer, L. Qlora: Efficient finetuning of quantized llms. In Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Advances in Neural Information Processing Systems, volume 36, pp.\ 10088--10115. Curran Associates, Inc., 2023

  52. [63]

    Making the v in vqa matter: Elevating the role of image understanding in visual question answering

    Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., and Parikh, D. Making the v in vqa matter: Elevating the role of image understanding in visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 6904--6913, 2017

  53. [64]

    J., Guo, A., Lin, C., Grauman, K., Luo, J., and Bigham, J

    Gurari, D., Li, Q., Stangl, A. J., Guo, A., Lin, C., Grauman, K., Luo, J., and Bigham, J. P. Vizwiz grand challenge: Answering visual questions from blind people. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 3608--3617, 2018

  54. [65]

    Towards a unified view of parameter-efficient transfer learning

    He, J., Zhou, C., Ma, X., Berg-Kirkpatrick, T., and Neubig, G. Towards a unified view of parameter-efficient transfer learning. In International Conference on Learning Representations, 2021

  55. [66]

    Delving deep into rectifiers: Surpassing human-level performance on imagenet classification

    He, K., Zhang, X., Ren, S., and Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, pp.\ 1026--1034, 2015

  56. [67]

    Parameter-efficient transfer learning for nlp

    Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., and Gelly, S. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning, pp.\ 2790--2799, 2019

  57. [68]

    J., yelong shen, Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W

    Hu, E. J., yelong shen, Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. Lo RA : Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022

  58. [69]

    LLM -adapters: An adapter family for parameter-efficient fine-tuning of large language models

    Hu, Z., Wang, L., Lan, Y., Xu, W., Lim, E.-P., Bing, L., Xu, X., Poria, S., and Lee, R. LLM -adapters: An adapter family for parameter-efficient fine-tuning of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

  59. [70]

    Hudson, D. A. and Manning, C. D. Gqa: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.\ 6700--6709, 2019

  60. [71]

    Fedpara: Low-rank hadamard product for communication-efficient federated learning

    Hyeon-Woo, N., Ye-Bin, M., and Oh, T.-H. Fedpara: Low-rank hadamard product for communication-efficient federated learning. In International Conference on Learning Representations, 2022

  61. [72]

    Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks

    Karimi Mahabadi, R., Ruder, S., Dehghani, M., and Henderson, J. Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp.\ 565--576, 2021

  62. [73]

    R efer I t G ame: Referring to objects in photographs of natural scenes

    Kazemzadeh, S., Ordonez, V., Matten, M., and Berg, T. R efer I t G ame: Referring to objects in photographs of natural scenes. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ( EMNLP ) , pp.\ 787--798, 2014

  63. [74]

    Kerem Turgutlu, Jonathan Whitaker, J. H. Efficient finetuning of llama 3 with fsdp qdora. https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html, 2024

  64. [75]

    J., Blankevoort, T., and Asano, Y

    Kopiczko, D. J., Blankevoort, T., and Asano, Y. M. Vera: Vector-based random matrix adaptation. In International Conference on Learning Representations, 2024

  65. [76]

    A., et al

    Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D. A., et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International journal of computer vision, pp.\ 32--73, 2017

  66. [77]

    TVQA : Localized, compositional video question answering

    Lei, J., Yu, L., Bansal, M., and Berg, T. TVQA : Localized, compositional video question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.\ 1369--1379, 2018

  67. [78]

    L., and Bansal, M

    Lei, J., Yu, L., Berg, T. L., and Bansal, M. Tvr: A large-scale dataset for video-subtitle moment retrieval. In European Conference on Computer Vision, pp.\ 447--463, 2020

  68. [79]

    The power of scale for parameter-efficient prompt tuning

    Lester, B., Al-Rfou, R., and Constant, N. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp.\ 3045--3059, 2021

  69. [80]

    BART : Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension

    Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. BART : Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp.\ 7871--7880, 2020

  70. [81]

    Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation

    Li, J., Li, D., Xiong, C., and Hoi, S. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, pp.\ 12888--12900, 2022

  71. [82]

    HERO : Hierarchical encoder for V ideo+ L anguage omni-representation pre-training

    Li, L., Chen, Y.-C., Cheng, Y., Gan, Z., Yu, L., and Liu, J. HERO : Hierarchical encoder for V ideo+ L anguage omni-representation pre-training. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.\ 2046--2065, 2020

  72. [83]

    E., Wang, W

    Li, L., Lei, J., Gan, Z., Yu, L., Chen, Y.-C., Pillai, R., Cheng, Y., Zhou, L., Wang, X. E., Wang, W. Y., et al. Value: A multi-task benchmark for video-and-language understanding evaluation. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2021

  73. [84]

    Li, X. L. and Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. In Zong, C., Xia, F., Li, W., and Navigli, R. (eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp.\ 4582--4597, 2021

  74. [85]

    Evaluating object hallucination in large vision-language models

    Li, Y., Du, Y., Zhou, K., Wang, J., Zhao, X., and Wen, J.-R. Evaluating object hallucination in large vision-language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp.\ 292--305, 2023

  75. [86]

    Liu, H., Li, C., Wu, Q., and Lee, Y. J. Visual instruction tuning. In Thirty-seventh Conference on Neural Information Processing Systems, 2023 a

  76. [87]

    Parameter-efficient orthogonal finetuning via butterfly factorization

    Liu, W., Qiu, Z., Feng, Y., Xiu, Y., Xue, Y., Yu, L., Feng, H., Liu, Z., Heo, J., Peng, S., et al. Parameter-efficient orthogonal finetuning via butterfly factorization. arXiv preprint arXiv:2311.06243, 2023 b

  77. [88]

    MMBench: Is Your Multi-modal Model an All-around Player?

    Liu, Y., Duan, H., Zhang, Y., Li, B., Zhang, S., Zhao, W., Yuan, Y., Wang, J., He, C., Liu, Z., et al. Mmbench: Is your multi-modal model an all-around player? arXiv preprint arXiv:2307.06281, 2023 c

  78. [89]

    Learn to explain: Multimodal reasoning via thought chains for science question answering

    Lu, P., Mishra, S., Xia, T., Qiu, L., Chang, K.-W., Zhu, S.-C., Tafjord, O., Clark, P., and Kalyan, A. Learn to explain: Multimodal reasoning via thought chains for science question answering. Advances in Neural Information Processing Systems, pp.\ 2507--2521, 2022

  79. [90]

    K., Henderson, J., and Ruder, S

    mahabadi, R. K., Henderson, J., and Ruder, S. Compacter: Efficient low-rank hypercomplex adapter layers. In Advances in Neural Information Processing Systems, 2021

  80. [91]

    L., and Murphy, K

    Mao, J., Huang, J., Toshev, A., Camburu, O., Yuille, A. L., and Murphy, K. Generation and comprehension of unambiguous object descriptions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 11--20, 2016

Showing first 80 references.