DenseLoRA: Dense Low-Rank Adaptation of Large Language Models

Li Ni; Lin Mu; Peiquan Jin; Xiaoyu Wang; Yang Li; Yiwen Zhang; Zhize Wu

arxiv: 2505.23808 · v1 · submitted 2025-05-27 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-19 13:27 UTC · model grok-4.3

keywords: DenseLoRA · Low-Rank Adaptation · Parameter-Efficient Fine-Tuning · Large Language Models · Representation Compression · Encoder-Decoder

The pith

DenseLoRA adapts large language models by compressing hidden representations with an encoder-decoder then applying one dense low-rank matrix instead of two redundant ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DenseLoRA as a way to adapt large language models more efficiently than standard LoRA. It first passes hidden representations through an encoder-decoder to refine and compress them across adaptation layers. This step allows replacement of LoRA's pair of low-rank matrices with a single dense low-rank matrix. On LLaMA3-8B the approach reaches 83.8 percent accuracy using 0.01 percent trainable parameters, exceeding LoRA's 80.8 percent accuracy at 0.70 percent parameters. The goal is to remove redundancy in low-rank weights while preserving or improving task performance.

Core claim

DenseLoRA adapts LLMs through an encoder-decoder that refines and compresses hidden representations across all layers, followed by a single dense low-rank matrix that performs the adaptation, replacing the two separate low-rank matrices used in LoRA and thereby improving parameter utilization and final accuracy.

What carries the argument

Encoder-decoder compression of hidden representations followed by a single dense low-rank adaptation matrix that consolidates the work of LoRA's two matrices.
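
The mechanism can be sketched in a few lines, following the formula quoted from the paper further down this page, ĥ = W0 h + Decoder(M Encoder(h)). This is a minimal illustration, not the authors' implementation: it assumes the encoder and decoder are plain linear maps with no activation, and the toy sizes `d` and `r`, the helper names, and the random initialization are all hypothetical.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; the initialization scheme is an assumption."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def dense_lora_forward(h, w0, enc, m, dec):
    """Return W0 h + Decoder(M Encoder(h)), with linear encoder/decoder (assumption)."""
    z = matvec(enc, h)        # compress the hidden state: d -> r
    z = matvec(m, z)          # single dense r x r adaptation matrix
    delta = matvec(dec, z)    # reconstruct the update: r -> d
    base = matvec(w0, h)      # frozen pretrained path
    return [b + u for b, u in zip(base, delta)]

rng = random.Random(0)
d, r = 8, 2                               # toy sizes, hypothetical
w0 = random_matrix(d, d, rng)             # stands in for a frozen weight
enc = random_matrix(r, d, rng)
dec = random_matrix(d, r, rng)
m_zero = [[0.0] * r for _ in range(r)]    # zero adaptation matrix
h = [rng.uniform(-1.0, 1.0) for _ in range(d)]

out_zero = dense_lora_forward(h, w0, enc, m_zero, dec)
print(out_zero == matvec(w0, h))          # a zero M leaves the frozen output unchanged
```

In this sketch the only per-module trainable object is the small `m` matrix, while `enc` and `dec` could be shared across modules; whether and how the paper initializes `M` at zero is not stated on this page.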

If this is right

LLMs can be adapted to new tasks with roughly seventy times fewer trainable parameters while still reaching higher accuracy.
Redundancy among weights in conventional low-rank adaptation matrices can be eliminated by first compressing representations.
The compression-plus-dense-matrix pattern maintains performance across multiple standard benchmarks without extra overhead.
Fine-tuning becomes feasible in settings where memory or compute budgets are tighter than current LoRA requirements allow.
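
The "seventy times" figure follows directly from the two percentages in the abstract. The short calculation below reproduces it with exact fractions; the absolute parameter counts are only indicative, because they assume a nominal 8 billion total parameters (an assumption, not a figure from the paper) and the reported percentages are themselves rounded.

```python
from fractions import Fraction

# Figures reported in the abstract for LLaMA3-8B.
lora_pct = Fraction("0.70")     # LoRA trainable parameters, percent of the model
dense_pct = Fraction("0.01")    # DenseLoRA trainable parameters, percent of the model
lora_acc = Fraction("80.8")     # LoRA accuracy, percent
dense_acc = Fraction("83.8")    # DenseLoRA accuracy, percent

reduction = lora_pct / dense_pct    # ratio of trainable-parameter budgets
acc_gain = dense_acc - lora_acc     # gain in percentage points

# Assumption: a nominal 8e9 total parameters, used only for scale.
nominal_params = 8_000_000_000
lora_trainable = nominal_params * lora_pct / 100
dense_trainable = nominal_params * dense_pct / 100

print(f"parameter reduction: {reduction}x")
print(f"accuracy gain: {float(acc_gain):.1f} points")
print(f"approx. trainable parameters: LoRA {int(lora_trainable):,}, DenseLoRA {int(dense_trainable):,}")
```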

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same encoder-decoder compression idea could be tested on other parameter-efficient tuning methods to see if they also benefit from reduced matrix redundancy.
Applying DenseLoRA to models larger than 8B parameters would test whether the reported parameter savings remain proportional at scale.
Combining the approach with existing quantization or pruning techniques might produce further cumulative reductions in both training and inference cost.

Load-bearing premise

The encoder-decoder compression preserves task-relevant information in the hidden representations without introducing new failure modes or excessive compute overhead.

What would settle it

On the same LLaMA3-8B benchmarks, if DenseLoRA produces lower accuracy than LoRA at matched parameter budgets or if removing the encoder-decoder step causes no drop in performance, the claimed efficiency advantage would not hold.

Original abstract

Low-rank adaptation (LoRA) has been developed as an efficient approach for adapting large language models (LLMs) by fine-tuning two low-rank matrices, thereby reducing the number of trainable parameters. However, prior research indicates that many of the weights in these matrices are redundant, leading to inefficiencies in parameter utilization. To address this limitation, we introduce Dense Low-Rank Adaptation (DenseLoRA), a novel approach that enhances parameter efficiency while achieving superior performance compared to LoRA. DenseLoRA builds upon the concept of representation fine-tuning, incorporating a single Encoder-Decoder to refine and compress hidden representations across all adaptation layers before applying adaptation. Instead of relying on two redundant low-rank matrices as in LoRA, DenseLoRA adapts LLMs through a dense low-rank matrix, improving parameter utilization and adaptation efficiency. We evaluate DenseLoRA on various benchmarks, showing that it achieves 83.8% accuracy with only 0.01% of trainable parameters, compared to LoRA's 80.8% accuracy with 0.70% of trainable parameters on LLaMA3-8B. Additionally, we conduct extensive experiments to systematically assess the impact of DenseLoRA's components on overall model performance. Code is available at https://github.com/mulin-ahu/DenseLoRA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DenseLoRA adds a shared encoder-decoder to compress representations before a single dense low-rank update, claiming much lower parameter counts and a modest accuracy gain over LoRA on LLaMA3-8B, but the compression step lacks direct validation.

The main point is that DenseLoRA reports 83.8% accuracy at 0.01% trainable parameters versus LoRA's 80.8% at 0.70% on LLaMA3-8B by inserting a shared encoder-decoder to refine and compress hidden states across layers, then using one dense low-rank matrix instead of the usual pair. This is a straightforward extension of existing representation fine-tuning ideas into the LoRA setting, and the paper supplies code plus component-wise experiments that show how the pieces interact on benchmarks.

That combination and the concrete numbers are the useful parts here. The approach is easy to understand and the GitHub link lets others check the implementation directly. The central empirical claim is clear enough that it can be tested without needing to re-derive anything from scratch.

The soft spot is the compression step itself. The paper states that the encoder-decoder refines representations without performance loss, yet it gives no similarity metrics between pre- and post-compression activations on the evaluation data and no ablation that holds total trainable parameters fixed while removing the encoder-decoder. If that step is discarding task-relevant features, both the accuracy lift and the extreme parameter reduction could be partly artifacts of an unmeasured bottleneck rather than pure efficiency gains. It is also unclear whether the encoder-decoder parameters are folded into the 0.01% count or treated separately.

This paper is for researchers who already work on LoRA-style adaptations and want to test incremental efficiency tweaks on standard LLM benchmarks. A reader who cares about practical fine-tuning workflows would get value from the reported numbers and the released code. It deserves a serious referee because the method is simple to reproduce, the main claim is falsifiable with the existing benchmarks, and the experiments already include some component analysis that referees can build on.

Referee Report

2 major / 1 minor

Summary. The paper introduces DenseLoRA, which augments LoRA by inserting a shared Encoder-Decoder module that refines and compresses hidden representations across all layers before applying a single dense low-rank adaptation matrix. On LLaMA3-8B it reports 83.8% accuracy at 0.01% trainable parameters versus LoRA's 80.8% at 0.70%, with additional experiments on component contributions and publicly released code.

Significance. If the reported gains prove robust, DenseLoRA would demonstrate that a single shared compression step can materially improve parameter utilization over standard LoRA without sacrificing accuracy, offering a practical route to more efficient LLM adaptation. The open-source implementation is a clear positive for reproducibility.

major comments (2)

[Section 3.2] The statement that the Encoder-Decoder 'refine[s] and compress[es] hidden representations' is load-bearing for both the accuracy gain and the 70-fold parameter reduction, yet the manuscript supplies neither mutual-information nor cosine-similarity statistics between pre- and post-compression activations on the evaluation distribution, nor any other direct fidelity metric.

[Section 4, experimental protocol] No ablation is presented that removes the Encoder-Decoder while holding total trainable-parameter count fixed to the DenseLoRA budget; without this control it is impossible to separate the contribution of compression from possible differences in hyper-parameter search or baseline implementation details.
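
The cosine-similarity statistic requested in the first comment could be computed roughly as follows. This is a sketch under stated assumptions: "post-compression" is taken to mean the decoder's reconstruction of each hidden state, so that both vectors have the same dimension, and the function names and sample vectors are hypothetical placeholders rather than real activations.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def mean_fidelity(originals, reconstructions):
    """Average cosine similarity over paired activations."""
    scores = [cosine_similarity(h, g) for h, g in zip(originals, reconstructions)]
    return sum(scores) / len(scores)

# Placeholder data: two hidden states and their (hypothetical) reconstructions.
originals = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
reconstructions = [[2.0, 0.0, 4.0], [0.0, -1.0, 0.0]]

fidelity = mean_fidelity(originals, reconstructions)
print(f"mean cosine similarity: {fidelity:.4f}")
```

Cosine similarity ignores vector magnitude, so a rescaled reconstruction (the first pair above) scores close to 1; a referee interested in magnitude as well would need a second metric such as relative reconstruction error.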

minor comments (1)

[Abstract] The abstract refers to 'various benchmarks' without enumerating the concrete tasks or datasets; a short list in the abstract or a dedicated table would improve immediate readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of DenseLoRA. We address each major comment below and commit to revisions that strengthen the empirical support for our claims.

Referee: [Section 3.2] The statement that the Encoder-Decoder 'refine[s] and compress[es] hidden representations' is load-bearing for both the accuracy gain and the 70-fold parameter reduction, yet the manuscript supplies neither mutual-information nor cosine-similarity statistics between pre- and post-compression activations on the evaluation distribution, nor any other direct fidelity metric.

Authors: We agree that direct fidelity metrics would provide stronger evidence for the refinement and compression role of the Encoder-Decoder. The current manuscript supports this claim through end-to-end accuracy gains, parameter-efficiency comparisons, and component-wise ablations, but does not report pre/post-compression statistics such as cosine similarity or mutual information. In the revised manuscript we will add these metrics, computed on held-out evaluation activations from the LLaMA3-8B experiments, to quantify how well the compressed representations preserve information relevant to the downstream task.

Revision: yes

Referee: [Section 4, experimental protocol] No ablation is presented that removes the Encoder-Decoder while holding total trainable-parameter count fixed to the DenseLoRA budget; without this control it is impossible to separate the contribution of compression from possible differences in hyper-parameter search or baseline implementation details.

Authors: The referee correctly identifies a missing control. While the manuscript already includes ablations that isolate the contribution of individual DenseLoRA components, these do not enforce an identical total trainable-parameter budget when the Encoder-Decoder is removed. We will add a new experiment that allocates the same parameter count used by DenseLoRA to a standard LoRA baseline (by adjusting the rank of the low-rank matrices accordingly) and report the resulting accuracy on the same benchmarks. This will allow a direct comparison that isolates the effect of the shared compression step.

Revision: yes
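
A quick feasibility check on the proposed control is worth doing before running it. Because LoRA's trainable-parameter count grows linearly with rank (for a fixed set of adapted modules), matching a different budget means scaling the rank by the ratio of the two budgets. The sketch below applies that scaling to the percentages from the abstract; the baseline rank of 32 is a hypothetical value, since the rank behind the 0.70% figure is not stated on this page.

```python
from fractions import Fraction

def matched_lora_rank(baseline_rank, baseline_pct, target_pct):
    """Scale a LoRA rank to a target budget, assuming parameters grow linearly with rank."""
    return Fraction(baseline_rank) * Fraction(target_pct) / Fraction(baseline_pct)

baseline_rank = 32    # hypothetical: the rank behind the 0.70% figure is not given here
rank = matched_lora_rank(baseline_rank, "0.70", "0.01")
feasible = rank >= 1  # a usable LoRA rank must be a positive integer

print(f"matched rank: {rank} (about {float(rank):.2f})")
print(f"achievable by changing rank alone: {feasible}")
```

Under these assumptions the matched rank falls below 1, which suggests the control could not be built by changing rank alone and might instead require adapting fewer modules or sharing factors across layers.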

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on held-out benchmarks

The paper introduces DenseLoRA as an architectural modification to LoRA: an encoder-decoder compresses hidden states across layers, followed by a single dense low-rank update matrix. The headline performance numbers (83.8% vs 80.8% accuracy at 0.01% vs 0.70% trainable parameters on LLaMA3-8B) are direct measurements on standard evaluation benchmarks, not quantities derived from fitted constants or self-referential equations inside the method. No step in the provided description reduces a reported result to a post-hoc fit, self-citation chain, or definitional equivalence. The compression step is presented as a design choice justified by prior observations of redundancy, not as a theorem derived from the paper's own outputs. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

The method introduces one architectural choice (the encoder-decoder) and one matrix factorization change (dense instead of two low-rank factors) whose benefit is demonstrated empirically rather than derived from first principles.

free parameters (2)

encoder-decoder hidden dimension: chosen to balance compression ratio against performance; value not stated in the abstract.
rank of the dense low-rank matrix: determines the size of the single adaptation matrix; not numerically specified in the abstract.
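
To see how these two free parameters drive the trainable-parameter count, the following back-of-the-envelope comparison may help. It is a sketch under strong assumptions, not the paper's accounting: every adapted module is treated as a square d × d projection, one encoder and one decoder are shared by all modules, each module owns one r × r matrix, and the values of `d`, `rank`, and `n_modules` are hypothetical.

```python
def lora_params(d, rank, n_modules):
    """LoRA: each adapted d x d module trains two factors, d x r and r x d."""
    return n_modules * 2 * d * rank

def dense_lora_params(d, rank, n_modules):
    """Assumed DenseLoRA layout: shared encoder (r x d) and decoder (d x r), plus one r x r matrix per module."""
    shared = 2 * d * rank
    per_module = rank * rank
    return shared + n_modules * per_module

# Hypothetical configuration, not taken from the paper.
d, rank, n_modules = 4096, 16, 32

lora_total = lora_params(d, rank, n_modules)
dense_total = dense_lora_params(d, rank, n_modules)

print(f"LoRA:      {lora_total:,} trainable parameters")
print(f"DenseLoRA: {dense_total:,} trainable parameters")
print(f"ratio:     {lora_total / dense_total:.1f}x")
```

The point of the sketch is the scaling: the per-module cost drops from 2·d·r to r², so the saving grows with the number of adapted modules. It is not meant to reproduce the 0.01% and 0.70% figures, which depend on module shapes and settings not given on this page.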

pith-pipeline@v0.9.0 · 5777 in / 1289 out tokens · 19076 ms · 2026-05-19T13:27:58.550064+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited theorem: unclear
Paper passage: DenseLoRA introduces a structured three-stage process: (1) An Encoder refines and compresses hidden representations; (2) A denser low-rank adaptation module adapts the model; (3) A Decoder reconstructs... $\hat{h} = W_0 h + \mathrm{Decoder}(M \, \mathrm{Encoder}(h))$

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection
Relation between the paper passage and the cited theorem: unclear
Paper passage: Instead of relying on two redundant low-rank matrices as in LoRA, DenseLoRA adapts LLMs through a dense low-rank matrix

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.