arxiv: 2402.09353 · v6 · submitted 2024-02-14 · 💻 cs.CL · cs.CV

Recognition: 2 theorem links

· Lean Theorem

DoRA: Weight-Decomposed Low-Rank Adaptation

Shih-Yang Liu , Chien-Yi Wang , Hongxu Yin , Pavlo Molchanov , Yu-Chiang Frank Wang , Kwang-Ting Cheng , Min-Hung Chen

Authors on Pith no claims yet

Pith reviewed 2026-05-15 22:22 UTC · model grok-4.3

classification 💻 cs.CL cs.CV

keywords DoRALoRAparameter-efficient fine-tuningweight decompositionlarge language modelsmultimodal modelsfine-tuningLLaMA

0 comments

The pith

DoRA splits pretrained weights into magnitude and direction then updates only the direction with LoRA to narrow the gap to full fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first analyzes why full fine-tuning usually beats LoRA by examining how each method changes the size and orientation of weight vectors. It finds that full fine-tuning alters both aspects while LoRA mainly changes orientation under a low-rank constraint. DoRA therefore freezes the magnitude of each weight and applies LoRA only to its direction, keeping the number of trainable parameters low and adding no cost at inference. This change is shown to raise accuracy and stability on LLaMA, LLaVA, and VL-BART across commonsense reasoning, visual instruction tuning, and multimodal understanding tasks.

Core claim

DoRA decomposes each pretrained weight matrix into a magnitude scalar and a directional unit vector. The magnitude is kept fixed at its pretrained value while low-rank adaptation matrices are used to update only the direction. The resulting fine-tuned weights are recombined at inference time exactly as in standard LoRA, yet the method recovers a larger fraction of full fine-tuning capacity and exhibits more stable training dynamics.

What carries the argument

Weight decomposition that isolates magnitude (frozen) from direction (updated by LoRA).

If this is right

DoRA raises accuracy on commonsense reasoning benchmarks compared with LoRA when fine-tuning LLaMA.
It improves visual instruction following performance for LLaVA without changing inference latency.
Training curves for DoRA show fewer oscillations than standard LoRA on the same tasks.
The same decomposition yields gains on image and video-text understanding for VL-BART.
The method adds zero extra parameters or compute once training finishes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The result suggests that directional alignment may be the dominant degree of freedom needed during adaptation while magnitude mainly sets scale.
DoRA could be combined with other low-rank or prompt-based methods to shrink the remaining gap to full fine-tuning.
The separation might allow selective magnitude rescaling at later training stages without increasing the LoRA rank.
Similar decomposition could be tested on convolutional or diffusion models to check whether the same magnitude-direction split helps there.

Load-bearing premise

The performance edge of full fine-tuning over LoRA comes mainly from its freedom to adjust both magnitude and direction, and fixing magnitude while updating direction recovers most of that edge.

What would settle it

An experiment in which low-rank updates are allowed to change both magnitude and direction simultaneously and still fail to match DoRA accuracy, or in which updating magnitude alone while freezing direction closes the gap instead.

read the original abstract

Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because of avoiding additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing \ours, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. \ours~consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding. Code is available at https://github.com/NVlabs/DoRA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DoRA is a clean, practical tweak to LoRA that separates magnitude from direction and delivers small but steady gains on LLaMA and vision-language tasks, though the causal story for why it works remains empirical.

read the letter

DoRA decomposes each weight matrix into a magnitude scalar and a direction vector, then applies LoRA-style low-rank updates only to the direction while learning a separate magnitude scale. The result is a modest lift over vanilla LoRA on commonsense reasoning with LLaMA, visual instruction tuning with LLaVA, and image/video-text tasks with VL-BART, all without extra inference cost. Code is released, which is helpful for verification.

Referee Report

2 major / 1 minor

Summary. The paper introduces DoRA, a PEFT method that decomposes pre-trained weights into magnitude and direction, applies LoRA updates solely to the direction, and scales the magnitude component. Motivated by an analysis showing FT modifies magnitude more than standard LoRA, DoRA is claimed to increase learning capacity and stability over LoRA while incurring no extra inference cost. Experiments demonstrate consistent gains when fine-tuning LLaMA on commonsense reasoning, LLaVA on visual instruction tuning, and VL-BART on image/video-text understanding tasks.

Significance. If the empirical improvements hold under controlled hyperparameter regimes, DoRA would constitute a simple, practical upgrade to LoRA that narrows the gap to full fine-tuning without runtime overhead. The public code release strengthens verifiability and potential for follow-up work. The decomposition perspective may also inform future PEFT designs, though its explanatory power depends on isolating the magnitude-direction split as causal.

major comments (2)

[§3] §3 (weight decomposition analysis): the claim that FT alters magnitude more than LoRA is used to justify freezing/scaling magnitude while updating direction. The comparison does not appear to control for confounds such as optimizer state, effective learning-rate scaling, or total update steps between the FT and LoRA runs; without these controls the observed magnitude shift may be correlative rather than the primary driver of the FT-LoRA gap.
[Experimental setup] Experimental setup (hyperparameter details): the manuscript gives limited information on the search ranges and protocol for the magnitude scaling factor. It is unclear whether this factor is tuned globally or per layer and how many trials were performed; this detail is load-bearing for interpreting whether the reported gains over LoRA reflect the decomposition itself or differences in tuning effort.

minor comments (1)

[Abstract] Abstract: the assertion that DoRA enhances training stability would be strengthened by reporting a concrete stability metric (e.g., standard deviation of validation accuracy across random seeds).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and recommendation for minor revision. We address each major comment below.

read point-by-point responses

Referee: [§3] §3 (weight decomposition analysis): the claim that FT alters magnitude more than LoRA is used to justify freezing/scaling magnitude while updating direction. The comparison does not appear to control for confounds such as optimizer state, effective learning-rate scaling, or total update steps between the FT and LoRA runs; without these controls the observed magnitude shift may be correlative rather than the primary driver of the FT-LoRA gap.

Authors: We thank the referee for this observation. Our analysis in §3 compares FT and LoRA under their standard reported training protocols without explicit controls for optimizer state, learning-rate scaling, or update steps. We agree this renders the magnitude-difference observation correlative rather than strictly causal. In the revision we will add a clarifying paragraph acknowledging these potential confounds and will stress that the primary justification for DoRA remains its consistent empirical gains over LoRA across tasks and models. revision: partial
Referee: [Experimental setup] Experimental setup (hyperparameter details): the manuscript gives limited information on the search ranges and protocol for the magnitude scaling factor. It is unclear whether this factor is tuned globally or per layer and how many trials were performed; this detail is load-bearing for interpreting whether the reported gains over LoRA reflect the decomposition itself or differences in tuning effort.

Authors: We agree that additional hyperparameter details are required for reproducibility. The magnitude scaling factor was tuned independently per layer via grid search over the range [0.1, 10.0] with multiple trials per configuration. We will expand the experimental-setup section in the revised manuscript to report the exact search ranges, per-layer protocol, and number of trials performed. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's derivation proceeds from an empirical weight decomposition analysis of differences between full fine-tuning and LoRA, to a design choice that freezes magnitude and applies low-rank updates only to direction, followed by independent experimental validation on held-out downstream tasks. No step reduces by construction to its own inputs: the performance metrics are measured separately and do not equate to quantities defined by the decomposition itself. No load-bearing self-citations, fitted inputs renamed as predictions, or ansatzes smuggled via prior work appear in the chain.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

The method adds one new modeling choice (magnitude-direction split) on top of standard LoRA; no new physical entities or unproven mathematical axioms are introduced. Hyperparameters such as rank and learning rate remain as in prior work.

free parameters (2)

LoRA rank r
Standard hyperparameter inherited from LoRA; chosen per task and model size.
magnitude scaling factor
Introduced by the decomposition; its value is either fixed or lightly tuned but not derived from first principles.

pith-pipeline@v0.9.0 · 5511 in / 1131 out tokens · 37939 ms · 2026-05-15T22:22:47.642685+00:00 · methodology

discussion (0)

Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Continuous Expert Assembly: Instance-Conditioned Low-Rank Residuals for All-in-One Image Restoration
cs.CV 2026-05 unverdicted novelty 7.0

CEA assembles per-token low-rank residual updates via dense affinities over hyper-adapter-generated components to improve all-in-one image restoration on spatially non-uniform degradations.
Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning
cs.LG 2026-04 unverdicted novelty 7.0

A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and common...
GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models
cs.LG 2026-04 conditional novelty 7.0

GUI-Perturbed shows that GUI grounding models suffer systematic accuracy collapse under relational instructions and visual changes such as 70% zoom, with even augmented fine-tuning worsening results.
The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
cs.LG 2026-04 conditional novelty 6.0

Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accura...
Sub-Token Routing in LoRA for Adaptation and Query-Aware KV Compression
cs.LG 2026-04 unverdicted novelty 6.0

Sub-token routing in LoRA-adapted transformers adds a finer compression axis for KV caches, with query-independent and query-aware designs that improve efficiency under reduced budgets when combined with token-level s...
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
cs.LG 2026-04 unverdicted novelty 6.0

COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
Sensitivity-Positional Co-Localization in GQA Transformers
cs.CL 2026-04 unverdicted novelty 6.0

In Llama 3.1 8B, task-sensitive layers cluster late while RoPE adaptation is strongest early, yet applying both adaptations only to sensitivity-identified layers outperforms other layer choices by 4-16 points on MMLU,...
STQuant: Spatio-Temporal Adaptive Framework for Optimizer Quantization in Large Multimodal Model Training
cs.LG 2026-04 unverdicted novelty 6.0

STQuant dynamically allocates quantization bits for optimizer states in multimodal model training, reducing memory by 84.4% to an average 5.1 bits while preserving quality on GPT-2 and ViT.
ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache
cs.DC 2026-04 unverdicted novelty 6.0

ForkKV uses copy-on-write disaggregated KV cache with DualRadixTree and ResidualAttention kernels to deliver up to 3x throughput over prior multi-LoRA serving systems with negligible quality loss.
Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems
cs.NE 2026-04 unverdicted novelty 6.0

CDWF achieves 90-99% of full fine-tuning performance with up to 120x fewer trainable parameters by dynamically allocating full trainability to gradient-important blocks and LoRA to others for PV cyberattack transfer learning.
GAIN: Multiplicative Modulation for Domain Adaptation
cs.LG 2026-04 unverdicted novelty 6.0

GAIN's multiplicative modulation preserves pretrained weight column spans during sequential domain adaptation, yielding 7-13% better prior-domain perplexity than LoRA across 774M-70B models while matching replay-augme...
Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures
cs.LG 2026-04 conditional novelty 6.0

Gradient-guided layer selection for LoRA yields 15-28% training speedup with matched downstream results on MMLU, GSM8K, and HumanEval across 14 models from 0.5B to 72B parameters.
Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters
cs.LG 2026-04 unverdicted novelty 6.0

PoLAR-VBLL combines orthogonalized low-rank adapters with variational Bayesian last-layer inference to enable scalable, well-calibrated uncertainty quantification in fine-tuned LLMs.
Deep Reprogramming Distillation for Medical Foundation Models
cs.CV 2026-05 unverdicted novelty 5.0

DRD introduces a reprogramming module and CKA-based distillation to enable efficient, robust adaptation of medical foundation models to downstream 2D/3D classification and segmentation tasks, outperforming prior PEFT ...
SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning
cs.DC 2026-04 unverdicted novelty 5.0

SplitFT adapts cut-layer selection and reduces LoRA rank per client in federated split learning to improve efficiency and performance when fine-tuning LLMs on heterogeneous devices and data.
The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
cs.LG 2026-04 unverdicted novelty 5.0

Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering close the gap.
Small Language Models are the Future of Agentic AI
cs.AI 2025-06 unverdicted novelty 5.0

Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks that dominate agentic AI systems.
LLiMba: Sardinian on a Single GPU -- Adapting a 3B Language Model to a Vanishing Romance Language
cs.CL 2026-05 conditional novelty 4.0

Qwen2.5-3B was continued-pretrained and then fine-tuned with rsLoRA r256 on Sardinian data to reach 28.5 BLEU into the language, outperforming full fine-tuning and other LoRA variants.
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
cs.LG 2024-03 accept novelty 4.0

A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
TLoRA+: A Low-Rank Parameter-Efficient Fine-Tuning Method for Large Language Models
cs.CL 2026-04 unverdicted novelty 3.0

TLoRA+ augments LoRA with a dedicated optimizer to improve fine-tuning performance on GLUE tasks without meaningful added compute.

Reference graph

Works this paper leans on

108 extracted references · 108 canonical work pages · cited by 19 Pith papers · 6 internal anchors

[1]

Edward J Hu and yelong shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen , booktitle=. Lo

work page
[2]

The Eleventh International Conference on Learning Representations , year=

Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning , author=. The Eleventh International Conference on Learning Representations , year=

work page
[3]

International Conference on Learning Representations , year=

FedPara: Low-rank Hadamard Product for Communication-Efficient Federated Learning , author=. International Conference on Learning Representations , year=

work page
[5]

International Conference on Learning Representations , year=

VeRA: Vector-based Random Matrix Adaptation , author=. International Conference on Learning Representations , year=

work page
[6]

Proceedings of the 30th International Conference on Neural Information Processing Systems , pages=

Weight normalization: a simple reparameterization to accelerate training of deep neural networks , author=. Proceedings of the 30th International Conference on Neural Information Processing Systems , pages=

work page
[7]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[8]

LLM -Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

Hu, Zhiqiang and Wang, Lei and Lan, Yihuai and Xu, Wanyu and Lim, Ee-Peng and Bing, Lidong and Xu, Xing and Poria, Soujanya and Lee, Roy. LLM -Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023

work page 2023
[9]

Hashimoto , title =

Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto , title =. 2023 , url =

work page 2023
[10]

Proceedings of the IEEE International Conference on Computer Vision , pages=

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , author=. Proceedings of the IEEE International Conference on Computer Vision , pages=

work page
[11]

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Li, Xiang Lisa and Liang, Percy. Prefix-Tuning: Optimizing Continuous Prompts for Generation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021

work page 2021
[12]

International Conference on Machine Learning , pages=

Parameter-efficient transfer learning for NLP , author=. International Conference on Machine Learning , pages=

work page
[13]

International Conference on Learning Representations , year=

Towards a Unified View of Parameter-Efficient Transfer Learning , author=. International Conference on Learning Representations , year=

work page
[14]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Making the v in vqa matter: Elevating the role of image understanding in visual question answering , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page
[15]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Gqa: A new dataset for real-world visual reasoning and compositional question answering , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[16]

A Corpus for Reasoning about Natural Language Grounded in Photographs

Suhr, Alane and Zhou, Stephanie and Zhang, Ally and Zhang, Iris and Bai, Huajun and Artzi, Yoav. A Corpus for Reasoning about Natural Language Grounded in Photographs. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019

work page 2019
[18]

International conference on machine learning , pages=

Learning transferable visual models from natural language supervision , author=. International conference on machine learning , pages=

work page
[19]

BART : Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Lewis, Mike and Liu, Yinhan and Goyal, Naman and Ghazvininejad, Marjan and Mohamed, Abdelrahman and Levy, Omer and Stoyanov, Veselin and Zettlemoyer, Luke. BART : Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020

work page 2020
[20]

Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1) , year=

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation , author=. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1) , year=

work page
[21]

TVQA : Localized, Compositional Video Question Answering

Lei, Jie and Yu, Licheng and Bansal, Mohit and Berg, Tamara. TVQA : Localized, Compositional Video Question Answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018

work page 2018
[23]

HERO : Hierarchical Encoder for V ideo+ L anguage Omni-representation Pre-training

Li, Linjie and Chen, Yen-Chun and Cheng, Yu and Gan, Zhe and Yu, Licheng and Liu, Jingjing. HERO : Hierarchical Encoder for V ideo+ L anguage Omni-representation Pre-training. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020

work page 2020
[24]

European Conference on Computer Vision , pages=

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval , author=. European Conference on Computer Vision , pages=

work page
[25]

Proceedings of the AAAI Conference on Artificial Intelligence , year=

Towards automatic learning of procedures from web instructional videos , author=. Proceedings of the AAAI Conference on Artificial Intelligence , year=

work page
[26]

Thirty-seventh Conference on Neural Information Processing Systems , year=

Visual Instruction Tuning , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

work page
[28]

Proceedings of the IEEE/cvf conference on computer vision and pattern recognition , pages=

Ok-vqa: A visual question answering benchmark requiring external knowledge , author=. Proceedings of the IEEE/cvf conference on computer vision and pattern recognition , pages=

work page
[29]

European Conference on Computer Vision , pages=

A-okvqa: A benchmark for visual question answering using world knowledge , author=. European Conference on Computer Vision , pages=

work page
[30]

OCR-VQA: Visual Question Answering by Reading Text in Images , year=

Mishra, Anand and Shekhar, Shashank and Singh, Ajeet Kumar and Chakraborty, Anirban , booktitle=. OCR-VQA: Visual Question Answering by Reading Text in Images , year=

work page
[31]

Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part II 16 , pages=

Textcaps: a dataset for image captioning with reading comprehension , author=. Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part II 16 , pages=

work page 2020
[32]

R efer I t G ame: Referring to Objects in Photographs of Natural Scenes

Kazemzadeh, Sahar and Ordonez, Vicente and Matten, Mark and Berg, Tamara. R efer I t G ame: Referring to Objects in Photographs of Natural Scenes. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ( EMNLP ). 2014

work page 2014
[33]

International journal of computer vision , pages=

Visual genome: Connecting language and vision using crowdsourced dense image annotations , author=. International journal of computer vision , pages=

work page
[34]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Generation and comprehension of unambiguous object descriptions , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page
[35]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Vizwiz grand challenge: Answering visual questions from blind people , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page
[36]

Advances in Neural Information Processing Systems , pages=

Learn to explain: Multimodal reasoning via thought chains for science question answering , author=. Advances in Neural Information Processing Systems , pages=

work page
[37]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Towards vqa models that can read , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[38]

Evaluating Object Hallucination in Large Vision-Language Models

Li, Yifan and Du, Yifan and Zhou, Kun and Wang, Jinpeng and Zhao, Xin and Wen, Ji-Rong. Evaluating Object Hallucination in Large Vision-Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023

work page 2023
[40]

Gonzalez and Ion Stoica , booktitle=

Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica , booktitle=. Judging

work page
[41]

Is ChatGPT a General-Purpose Natural Language Processing Task Solver?

Qin, Chengwei and Zhang, Aston and Zhang, Zhuosheng and Chen, Jiaao and Yasunaga, Michihiro and Yang, Diyi. Is ChatGPT a General-Purpose Natural Language Processing Task Solver?. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023

work page 2023
[42]

International Conference on Machine Learning , pages=

Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation , author=. International Conference on Machine Learning , pages=

work page
[43]

B it F it: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

Ben Zaken, Elad and Goldberg, Yoav and Ravfogel, Shauli. B it F it: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2022

work page 2022
[44]

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

Karimi Mahabadi, Rabeeh and Ruder, Sebastian and Dehghani, Mostafa and Henderson, James. Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Pap...

work page 2021
[45]

Advances in Neural Information Processing Systems , year=

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers , author=. Advances in Neural Information Processing Systems , year=

work page
[46]

The Power of Scale for Parameter-Efficient Prompt Tuning

Lester, Brian and Al-Rfou, Rami and Constant, Noah. The Power of Scale for Parameter-Efficient Prompt Tuning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021

work page 2021
[47]

Residual Prompt Tuning: improving prompt tuning with residual reparameterization

Razdaibiedina, Anastasiia and Mao, Yuning and Khabsa, Madian and Lewis, Mike and Hou, Rui and Ba, Jimmy and Almahairi, Amjad. Residual Prompt Tuning: improving prompt tuning with residual reparameterization. Findings of the Association for Computational Linguistics: ACL 2023. 2023

work page 2023
[49]

Thirty-seventh Conference on Neural Information Processing Systems , year=

Controlling Text-to-Image Diffusion by Orthogonal Finetuning , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

work page
[50]

Advances in Neural Information Processing Systems , year=

Chain of Thought Prompting Elicits Reasoning in Large Language Models , author=. Advances in Neural Information Processing Systems , year=

work page
[51]

2023 , url=

Welcome to the OpenAI platform , author=. 2023 , url=

work page 2023
[55]

2024 , eprint=

Orca-Math: Unlocking the potential of SLMs in Grade School Math , author=. 2024 , eprint=

work page 2024
[56]

Efficient finetuning of Llama 3 with FSDP QDoRA , author=

work page
[57]

QLoRA: Efficient Finetuning of Quantized LLMs , volume =

Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke , booktitle =. QLoRA: Efficient Finetuning of Quantized LLMs , volume =

work page
[59]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[61]

Chen, X., Fang, H., Lin, T.-Y., Vedantam, R., Gupta, S., Doll \'a r, P., and Zitnick, C. L. Microsoft coco captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[62]

Qlora: Efficient finetuning of quantized llms

Dettmers, T., Pagnoni, A., Holtzman, A., and Zettlemoyer, L. Qlora: Efficient finetuning of quantized llms. In Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Advances in Neural Information Processing Systems, volume 36, pp.\ 10088--10115. Curran Associates, Inc., 2023

work page 2023
[63]

Making the v in vqa matter: Elevating the role of image understanding in visual question answering

Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., and Parikh, D. Making the v in vqa matter: Elevating the role of image understanding in visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 6904--6913, 2017

work page 2017
[64]

J., Guo, A., Lin, C., Grauman, K., Luo, J., and Bigham, J

Gurari, D., Li, Q., Stangl, A. J., Guo, A., Lin, C., Grauman, K., Luo, J., and Bigham, J. P. Vizwiz grand challenge: Answering visual questions from blind people. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 3608--3617, 2018

work page 2018
[65]

Towards a unified view of parameter-efficient transfer learning

He, J., Zhou, C., Ma, X., Berg-Kirkpatrick, T., and Neubig, G. Towards a unified view of parameter-efficient transfer learning. In International Conference on Learning Representations, 2021

work page 2021
[66]

Delving deep into rectifiers: Surpassing human-level performance on imagenet classification

He, K., Zhang, X., Ren, S., and Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, pp.\ 1026--1034, 2015

work page 2015
[67]

Parameter-efficient transfer learning for nlp

Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., and Gelly, S. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning, pp.\ 2790--2799, 2019

work page 2019
[68]

J., yelong shen, Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W

Hu, E. J., yelong shen, Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. Lo RA : Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022

work page 2022
[69]

LLM -adapters: An adapter family for parameter-efficient fine-tuning of large language models

Hu, Z., Wang, L., Lan, Y., Xu, W., Lim, E.-P., Bing, L., Xu, X., Poria, S., and Lee, R. LLM -adapters: An adapter family for parameter-efficient fine-tuning of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

work page 2023
[70]

Hudson, D. A. and Manning, C. D. Gqa: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.\ 6700--6709, 2019

work page 2019
[71]

Fedpara: Low-rank hadamard product for communication-efficient federated learning

Hyeon-Woo, N., Ye-Bin, M., and Oh, T.-H. Fedpara: Low-rank hadamard product for communication-efficient federated learning. In International Conference on Learning Representations, 2022

work page 2022
[72]

Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks

Karimi Mahabadi, R., Ruder, S., Dehghani, M., and Henderson, J. Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp.\ 565--576, 2021

work page 2021
[73]

R efer I t G ame: Referring to objects in photographs of natural scenes

Kazemzadeh, S., Ordonez, V., Matten, M., and Berg, T. R efer I t G ame: Referring to objects in photographs of natural scenes. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ( EMNLP ) , pp.\ 787--798, 2014

work page 2014
[74]

Kerem Turgutlu, Jonathan Whitaker, J. H. Efficient finetuning of llama 3 with fsdp qdora. https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html, 2024

work page 2024
[75]

J., Blankevoort, T., and Asano, Y

Kopiczko, D. J., Blankevoort, T., and Asano, Y. M. Vera: Vector-based random matrix adaptation. In International Conference on Learning Representations, 2024

work page 2024
[76]

A., et al

Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D. A., et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International journal of computer vision, pp.\ 32--73, 2017

work page 2017
[77]

TVQA : Localized, compositional video question answering

Lei, J., Yu, L., Bansal, M., and Berg, T. TVQA : Localized, compositional video question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.\ 1369--1379, 2018

work page 2018
[78]

L., and Bansal, M

Lei, J., Yu, L., Berg, T. L., and Bansal, M. Tvr: A large-scale dataset for video-subtitle moment retrieval. In European Conference on Computer Vision, pp.\ 447--463, 2020

work page 2020
[79]

The power of scale for parameter-efficient prompt tuning

Lester, B., Al-Rfou, R., and Constant, N. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp.\ 3045--3059, 2021

work page 2021
[80]

BART : Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension

Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. BART : Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp.\ 7871--7880, 2020

work page 2020
[81]

Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation

Li, J., Li, D., Xiong, C., and Hoi, S. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, pp.\ 12888--12900, 2022

work page 2022
[82]

HERO : Hierarchical encoder for V ideo+ L anguage omni-representation pre-training

Li, L., Chen, Y.-C., Cheng, Y., Gan, Z., Yu, L., and Liu, J. HERO : Hierarchical encoder for V ideo+ L anguage omni-representation pre-training. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.\ 2046--2065, 2020

work page 2020
[83]

E., Wang, W

Li, L., Lei, J., Gan, Z., Yu, L., Chen, Y.-C., Pillai, R., Cheng, Y., Zhou, L., Wang, X. E., Wang, W. Y., et al. Value: A multi-task benchmark for video-and-language understanding evaluation. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2021

work page 2021
[84]

Li, X. L. and Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. In Zong, C., Xia, F., Li, W., and Navigli, R. (eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp.\ 4582--4597, 2021

work page 2021
[85]

Evaluating object hallucination in large vision-language models

Li, Y., Du, Y., Zhou, K., Wang, J., Zhao, X., and Wen, J.-R. Evaluating object hallucination in large vision-language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp.\ 292--305, 2023

work page 2023
[86]

Liu, H., Li, C., Wu, Q., and Lee, Y. J. Visual instruction tuning. In Thirty-seventh Conference on Neural Information Processing Systems, 2023 a

work page 2023
[87]

Parameter-efficient orthogonal finetuning via butterfly factorization

Liu, W., Qiu, Z., Feng, Y., Xiu, Y., Xue, Y., Yu, L., Feng, H., Liu, Z., Heo, J., Peng, S., et al. Parameter-efficient orthogonal finetuning via butterfly factorization. arXiv preprint arXiv:2311.06243, 2023 b

work page arXiv 2023
[88]

MMBench: Is Your Multi-modal Model an All-around Player?

Liu, Y., Duan, H., Zhang, Y., Li, B., Zhang, S., Zhao, W., Yuan, Y., Wang, J., He, C., Liu, Z., et al. Mmbench: Is your multi-modal model an all-around player? arXiv preprint arXiv:2307.06281, 2023 c

work page internal anchor Pith review Pith/arXiv arXiv 2023
[89]

Learn to explain: Multimodal reasoning via thought chains for science question answering

Lu, P., Mishra, S., Xia, T., Qiu, L., Chang, K.-W., Zhu, S.-C., Tafjord, O., Clark, P., and Kalyan, A. Learn to explain: Multimodal reasoning via thought chains for science question answering. Advances in Neural Information Processing Systems, pp.\ 2507--2521, 2022

work page 2022
[90]

K., Henderson, J., and Ruder, S

mahabadi, R. K., Henderson, J., and Ruder, S. Compacter: Efficient low-rank hypercomplex adapter layers. In Advances in Neural Information Processing Systems, 2021

work page 2021
[91]

L., and Murphy, K

Mao, J., Huang, J., Toshev, A., Camburu, O., Yuille, A. L., and Murphy, K. Generation and comprehension of unambiguous object descriptions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.\ 11--20, 2016

work page 2016

Showing first 80 references.