DoRA: Weight-Decomposed Low-Rank Adaptation

Canonical reference. 89% of citing Pith papers cite this work as background.

24 Pith papers citing it · background in 89% of classified citations

abstract

Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because of avoiding additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding. Code is available at https://github.com/NVlabs/DoRA.
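
The decomposition described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the code from the NVlabs repository. It assumes details the abstract does not spell out: the magnitude is a per-column vector m initialized to the column norms of the pre-trained weight W0, the direction is W0 + BA normalized column-wise, and B starts at zero as in standard LoRA. The helper names (matmul, column_norms, dora_weight) and the toy sizes are hypothetical.

```python
import math
import random


def matmul(X, Y):
    """Plain nested-list matrix product X @ Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def column_norms(W):
    """Euclidean norm of each column of W."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*W)]


def dora_weight(W0, m, B, A):
    """Assumed DoRA-style merge: m * (W0 + B A) / ||W0 + B A||, column-wise.

    W0 is frozen; m (magnitude) and the low-rank factors B, A (direction
    update) are the trainable parts in this sketch.
    """
    delta = matmul(B, A)
    V = [[w + dv for w, dv in zip(w_row, d_row)] for w_row, d_row in zip(W0, delta)]
    norms = column_norms(V)
    return [[m_j * v / n_j for v, m_j, n_j in zip(row, m, norms)] for row in V]


random.seed(0)
d, k, r = 4, 3, 2  # toy sizes: W0 is d x k, rank r (hypothetical values)
W0 = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]
m = column_norms(W0)                    # magnitude starts at the column norms of W0
B = [[0.0] * r for _ in range(d)]       # zero init, so B A = 0 at the start
A = [[random.gauss(0, 1) for _ in range(k)] for _ in range(r)]

# At initialization the merged weight reproduces the pre-trained weight.
W_init = dora_weight(W0, m, B, A)
max_diff = max(abs(a - b) for ra, rb in zip(W_init, W0) for a, b in zip(ra, rb))

# After a (simulated) update to B, the direction changes but each column
# norm of the merged weight is still set by m alone.
B_trained = [[random.gauss(0, 0.1) for _ in range(r)] for _ in range(d)]
W_new = dora_weight(W0, m, B_trained, A)
norm_gap = max(abs(n - m_j) for n, m_j in zip(column_norms(W_new), m))

print("max |W_init - W0| =", max_diff)
print("max |colnorm(W_new) - m| =", norm_gap)
```

Because the result of dora_weight is an ordinary d x k matrix, it can replace W0 after training, which is consistent with the abstract's claim of no additional inference overhead.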

citation-role summary

background 9

citation-polarity summary

background 8 · support 1

representative citing papers

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

cs.LG · 2026-04-24 · unverdicted · novelty 7.0

A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.

Sensitivity-Positional Co-Localization in GQA Transformers

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

In Llama 3.1 8B, task-sensitive layers cluster late while RoPE adaptation is strongest early, yet applying both adaptations only to sensitivity-identified layers outperforms other layer choices by 4-16 points on MMLU, GPQA, HumanEval+, MATH, MGSM and ARC.

GAIN: Multiplicative Modulation for Domain Adaptation

cs.LG · 2026-04-06 · unverdicted · novelty 6.0

GAIN's multiplicative modulation preserves pretrained weight column spans during sequential domain adaptation, yielding 7-13% better prior-domain perplexity than LoRA across 774M-70B models while matching replay-augmented baselines without storing data.

HyperAdapt: Simple High-Rank Adaptation

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.

Deep Reprogramming Distillation for Medical Foundation Models

cs.CV · 2026-05-06 · unverdicted · novelty 5.0

DRD introduces a reprogramming module and CKA-based distillation to enable efficient, robust adaptation of medical foundation models to downstream 2D/3D classification and segmentation tasks, outperforming prior PEFT and KD methods on 18 tasks.

Small Language Models are the Future of Agentic AI

cs.AI · 2025-06-02 · unverdicted · novelty 5.0

Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks that dominate agentic AI systems.

Test-Time Alignment via Hypothesis Reweighting

cs.LG · 2024-12-11 · unverdicted · novelty 5.0

HyRe personalizes reward models at test time by reweighting an ensemble of heads trained on aggregate preferences, using few target examples to outperform uniform averaging and prior methods on RewardBench and 32 tasks.
