Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training

Chengqian Zhang; Kyumin Lee; Wei Zhu

arXiv: 2605.18822 · v1 · pith:FI6T3EVV · submitted 2026-05-12 · cs.LG · cs.AI



Pith reviewed 2026-05-20 22:19 UTC · model grok-4.3


keywords: Hybrid-LoRA, LoRA, full fine-tuning, PEFT, LLM post-training, reasoning, parameter efficient


The pith

Hybrid-LoRA matches full fine-tuning on reasoning tasks by fully updating only the 10% of modules most sensitive to low-rank changes and applying LoRA to the rest.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a way to adapt large language models for tasks like reasoning without the full memory cost of updating every parameter. It identifies a small number of modules that do not respond well to efficient low-rank updates and tunes those fully instead. This hybrid strategy aims to close the performance gap that purely parameter-efficient methods leave in complex post-training. If it succeeds, developers could get high-quality instruction following and multi-step reasoning at much lower GPU memory and training cost than full fine-tuning. The approach ranks modules with a sensitivity score to decide which ones need full updates under a limited budget.

Core claim

Hybrid-LoRA computes a score for each module measuring its sensitivity to low-rank adaptation, selects the top 10 percent for full fine-tuning, and applies LoRA to the remaining modules. This yields performance close to full fine-tuning and improves over the best of four PEFT baselines by 4.36 percent on average and by up to 5.65 percent.
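The selection rule in the core claim can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the per-module scores are already computed, that a higher score means a module is less suited to LoRA, and that the 10% budget is counted in modules and rounded to the nearest integer. The module names are hypothetical.

```python
def split_modules(scores, fft_fraction=0.10):
    """Split candidate modules into full-fine-tuning and LoRA groups.

    scores: dict mapping a (hypothetical) module name to its score, where a
    higher score is ASSUMED to mean "less suited to low-rank adaptation".
    Returns (fft_modules, lora_modules), highest-scoring modules first.
    """
    if not 0.0 <= fft_fraction <= 1.0:
        raise ValueError("fft_fraction must be in [0, 1]")
    # Sort by descending score; ties keep their input order (sort is stable).
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    # Module-count budget, rounded to the nearest integer (an assumption).
    k = round(fft_fraction * len(ranked))
    return ranked[:k], ranked[k:]


# Usage with 30 hypothetical modules: the 3 top-scoring ones get full FT.
scores = {f"layer{i}.mlp.down_proj": float(i) for i in range(30)}
fft, lora = split_modules(scores)
print(fft)  # ['layer29.mlp.down_proj', 'layer28.mlp.down_proj', 'layer27.mlp.down_proj']
```

Rounding matters for very small candidate sets: a 10% budget over a handful of modules rounds to zero, so a practical version may want to enforce a minimum of one fully tuned module.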

What carries the argument

The Hybrid-LoRA Score, which ranks modules according to their sensitivity to low-rank adaptation under a fixed parameter budget.

Load-bearing premise

The Hybrid-LoRA Score accurately identifies the modules least suited to low-rank adaptation, so that fully tuning just the top 10 percent recovers performance close to full fine-tuning.

What would settle it

A controlled experiment showing that a random 10 percent of modules selected for full fine-tuning performs as well as or better than score-based selection would undermine the case for the ranking mechanism.
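The random-selection control is easy to set up alongside the score-based arm. The sketch below is hypothetical: it draws the same number of full-FT modules uniformly at random with a fixed seed, so the two arms differ only in which modules are chosen; the remaining modules would get the same LoRA configuration in both arms. Repeating it over several seeds gives a distribution to compare the score-based result against.

```python
import random


def random_split(module_names, fft_fraction=0.10, seed=0):
    """Control arm: choose the full-FT modules uniformly at random.

    Uses the same module-count budget as the score-based arm (rounded to the
    nearest integer) and a seeded RNG so each control run is reproducible.
    """
    names = list(module_names)
    k = round(fft_fraction * len(names))
    rng = random.Random(seed)
    fft = rng.sample(names, k)
    chosen = set(fft)
    lora = [name for name in names if name not in chosen]
    return fft, lora


names = [f"module{i}" for i in range(30)]  # hypothetical module names
controls = [random_split(names, seed=s) for s in range(5)]
```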

Figures

Figures reproduced from arXiv: 2605.18822 by Chengqian Zhang, Kyumin Lee, Wei Zhu.

**Figure 2.** The LoRA module placements on the Qwen2.5 1.5B backbones, after post-training on the ...

**Figure 3.** Ablation studies on the number of tunable parameters.

Original abstract

Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and multi-step reasoning. Reinforcement learning with verifiable rewards (RLVR) has recently emerged as a particularly effective post-training paradigm for improving reasoning capabilities, with critic-free algorithms such as GRPO and GSPO enabling scalable optimization. However, RLVR post-training with full fine-tuning (FFT) requires substantial GPU memory and incurs high training costs. Although parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), effectively reduce computational costs, they often suffer from a noticeable performance gap compared to full fine-tuning in post-training for complex reasoning tasks. In this paper, we propose Hybrid-LoRA, an efficient hybrid post-training framework that selectively applies full fine-tuning to a small subset of modules less suited to low-rank adaptation, while adapting the remaining components with LoRA. We introduce a novel Hybrid-LoRA Score to rank candidate modules according to their sensitivity to low-rank adaptation under a fixed parameter budget. Experiments show that Hybrid-LoRA closely matches full fine-tuning performance under a 10% full fine-tuning module budget, with the remaining candidate modules adapted by LoRA, consistently outperforming four state-of-the-art PEFT post-training baselines, achieving improvements of up to 5.65% and on average 4.36% over the best baseline.
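To see why a 10% full-FT budget can still be far cheaper than FFT, it helps to count trainable parameters. The sketch assumes each candidate module is a dense d_in × d_out weight matrix (biases ignored), that LoRA with rank r trains the usual A (r × d_in) and B (d_out × r) factors, and that the budget is counted in modules; the shapes and rank below are illustrative, not the paper's configuration. Trainable-parameter count is only one part of memory use: optimizer state scales with it (Adam keeps two moment estimates per trainable parameter), while activations and the frozen weights do not shrink.

```python
def full_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, r):
    """Trainable parameters of a rank-r LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def hybrid_trainable(shapes, fft_names, r):
    """Total trainable parameters when modules in fft_names are fully tuned
    and every other candidate module gets a rank-r LoRA adapter."""
    return sum(
        full_params(d_in, d_out) if name in fft_names else lora_params(d_in, d_out, r)
        for name, (d_in, d_out) in shapes.items()
    )


# Ten hypothetical 1024 x 1024 modules, LoRA rank 16, one module fully tuned.
shapes = {f"m{i}": (1024, 1024) for i in range(10)}
all_lora = hybrid_trainable(shapes, set(), r=16)      # 327,680
hybrid = hybrid_trainable(shapes, {"m0"}, r=16)       # 1,343,488
full = sum(full_params(*s) for s in shapes.values())  # 10,485,760
print(f"hybrid / full = {hybrid / full:.3f}")         # hybrid / full = 0.128
```

In this toy setup the single fully tuned module accounts for about 78% of the hybrid's trainable parameters, so which modules receive the full updates, and how large they are, dominates the cost of the hybrid.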

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Hybrid-LoRA narrows the gap to full fine-tuning by fully updating the 10% of modules ranked highest by a sensitivity score and applying LoRA to the rest, but the score needs a direct comparison against random or simpler selection to show that it drives the gains.


The central claim is that Hybrid-LoRA comes close to full fine-tuning performance in reasoning post-training by fully updating only the top 10% of modules, as ranked by the Hybrid-LoRA Score, and applying LoRA to the rest. It also reports beating the best of four PEFT baselines by up to 5.65% and by 4.36% on average. That hybrid budget split is the practical hook for anyone who wants better results than plain LoRA without the full memory cost of FFT in RLVR setups such as GRPO or GSPO. The score itself is the clearest new piece: it ranks modules by how much they lose under low-rank adaptation given a fixed parameter budget, so the method can decide where to spend the full updates. The paper frames the problem cleanly around the performance gap that standard LoRA shows on complex downstream behaviors, and the reported numbers suggest the hybrid can close most of it. If the full experiments include solid dataset details, multiple runs, and error bars, this setup could be directly useful for labs doing alignment or reasoning work on models where full fine-tuning is too heavy.

The main soft spot is the missing controls on the selection step. Without an ablation that pits the Hybrid-LoRA Score against a random choice of the 10% or against a gradient-norm baseline, it is hard to tell whether the ranking is doing real work or whether any 10% full-FT allocation plus LoRA would produce similar lifts. The abstract is also thin on experimental specifics, which leaves the strength of the 4.36% average gain unclear until the full tables are checked.

The paper is aimed at researchers who already run post-training pipelines and want a lighter alternative to FFT. A reader focused on PEFT for LLMs would get value from trying the hybrid pattern even if the score needs more testing. It deserves a serious referee because the idea is testable and addresses a concrete cost-performance trade-off, though the review would likely push for the selection ablations and clearer result reporting.

Referee Report

2 major / 1 minor

Summary. The paper proposes Hybrid-LoRA, a hybrid post-training framework for LLMs that introduces a Hybrid-LoRA Score to rank modules by their sensitivity to low-rank adaptation under a fixed parameter budget. It applies full fine-tuning to the top-ranked 10% of modules and LoRA to the remainder, claiming that this setup closely matches full fine-tuning performance on reasoning tasks while outperforming the best of four state-of-the-art PEFT baselines by up to 5.65% and by 4.36% on average.

Significance. If validated, the approach could meaningfully reduce the computational cost of RLVR post-training for complex reasoning while narrowing the gap to full fine-tuning. The Hybrid-LoRA Score offers a potentially useful mechanism for module selection, but its contribution to the observed gains remains to be isolated from the hybrid budget itself.

major comments (2)

[Experiments] Experiments section: The central claim that the Hybrid-LoRA Score enables near-full-fine-tuning performance rests on the assumption that it accurately identifies the 10% of modules least suited to LoRA. No ablation is reported comparing this selection to random module selection or to alternative heuristics (e.g., gradient-norm ranking) under the same 10% full-FT budget; without such controls the 4.36% average improvement cannot be confidently attributed to the proposed score rather than the hybrid allocation alone.
[§4] §4 (or equivalent results section): The reported performance gains lack accompanying details on statistical significance testing, error bars across multiple runs, or precise dataset splits and baseline re-implementations. These omissions make it impossible to assess whether the improvements over the best PEFT baseline are robust or reproducible.
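For the ablation requested in the first major comment, a gradient-norm ranking is a natural simple heuristic to pit against the Hybrid-LoRA Score under the same budget. The sketch below is one possible version, not something described in the paper: it ranks modules by the L2 norm of a gradient snapshot (passed in as flat lists, a simplifying assumption) and keeps the same rounded 10% module budget. In a real comparison the snapshot would ideally come from the same training steps the proposed score is computed on.

```python
import math


def grad_norm_split(grads, fft_fraction=0.10):
    """Baseline selection: fully tune the modules with the largest gradient norms.

    grads: dict mapping a (hypothetical) module name to a flat list of gradient
    values. Returns (fft_modules, lora_modules).
    """
    norms = {name: math.sqrt(sum(x * x for x in g)) for name, g in grads.items()}
    ranked = sorted(norms, key=lambda name: norms[name], reverse=True)
    k = round(fft_fraction * len(ranked))
    return ranked[:k], ranked[k:]


grads = {"q_proj": [3.0, 4.0], "v_proj": [1.0, 0.0], "o_proj": [0.0, 2.0]}
print(grad_norm_split(grads, fft_fraction=0.34))  # (['q_proj'], ['o_proj', 'v_proj'])
```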

minor comments (1)

[Method] Notation for the Hybrid-LoRA Score could be clarified with an explicit equation or pseudocode block showing how the fixed-parameter-budget sensitivity is computed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.

Point-by-point responses

Referee: [Experiments] Experiments section: The central claim that the Hybrid-LoRA Score enables near-full-fine-tuning performance rests on the assumption that it accurately identifies the 10% of modules least suited to LoRA. No ablation is reported comparing this selection to random module selection or to alternative heuristics (e.g., gradient-norm ranking) under the same 10% full-FT budget; without such controls the 4.36% average improvement cannot be confidently attributed to the proposed score rather than the hybrid allocation alone.

Authors: We agree that an ablation isolating the Hybrid-LoRA Score from the hybrid budget allocation is important for attributing the observed gains. In the revised manuscript we will add a controlled comparison under the identical 10% full fine-tuning budget, evaluating our score against both random module selection and a gradient-norm ranking heuristic. This will clarify whether the performance improvements stem specifically from the proposed sensitivity ranking. revision: yes
Referee: [§4] §4 (or equivalent results section): The reported performance gains lack accompanying details on statistical significance testing, error bars across multiple runs, or precise dataset splits and baseline re-implementations. These omissions make it impossible to assess whether the improvements over the best PEFT baseline are robust or reproducible.

Authors: We acknowledge that additional statistical details and reproducibility information would improve the manuscript. In the revised version we will report error bars from multiple independent runs, include statistical significance tests (such as paired t-tests with p-values), and provide explicit descriptions of dataset splits together with the precise re-implementation settings used for all baselines. revision: yes
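The reporting promised in this response can be sketched with the Python standard library. This is a hypothetical helper, not the authors' evaluation code: it assumes one benchmark score per seed for each arm, with runs matched by seed, and reports mean ± sample standard deviation plus the paired t statistic on the per-seed differences. The scores in the usage line are made up for illustration. A p-value follows from the t distribution with n - 1 degrees of freedom; `scipy.stats.ttest_rel` computes the same statistic together with that p-value.

```python
import math
from statistics import mean, stdev


def paired_summary(method, baseline):
    """Mean ± sample std for each arm and the paired t statistic.

    method, baseline: per-seed scores, where method[i] and baseline[i] come
    from runs that share seed i.
    """
    if len(method) != len(baseline) or len(method) < 2:
        raise ValueError("need at least two matched runs per arm")
    diffs = [m - b for m, b in zip(method, baseline)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t_stat = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return {
        "method": (mean(method), stdev(method)),
        "baseline": (mean(baseline), stdev(baseline)),
        "t": t_stat,
        "df": len(diffs) - 1,
    }


# Made-up accuracies from three matched seeds, for illustration only.
summary = paired_summary([50.0, 52.0, 51.0], [48.0, 49.0, 47.0])
```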

Circularity Check

0 steps flagged

No circularity: Hybrid-LoRA Score introduced as independent ranking mechanism


The paper defines the Hybrid-LoRA Score explicitly as a ranking of modules by sensitivity to low-rank adaptation under a fixed parameter budget, then selects the top 10% for full fine-tuning while applying LoRA to the rest. This selection feeds into empirical experiments that compare against external baselines and full fine-tuning; no equation, definition, or result is shown to reduce to a tautological fit of the target performance metric, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit free parameters, axioms, or invented entities are described in the abstract; the Hybrid-LoRA Score is presented as a new ranking tool whose internal computation is not detailed here.

pith-pipeline@v0.9.0 · 5789 in / 1117 out tokens · 29691 ms · 2026-05-20T22:19:04.795853+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We introduce a novel Hybrid-LoRA Score to rank candidate modules according to their sensitivity to low-rank adaptation under a fixed parameter budget.
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative unclear

Relation between the paper passage and the cited Recognition theorem.

Hybrid-Score ... $s^{(t)}_{l,m} = \frac{1}{r}\lVert e_{l,m} \odot g^{(t)}_{l,m} \rVert_1$ ... $H(l,m) = \mu_{l,m} \cdot \sigma_{l,m}$
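The excerpt above can be read as a two-stage computation: a per-step score s for each module (layer l, module m) and an aggregate H. The sketch below is an interpretation under stated assumptions, not the paper's definition: the elided parts of the formula are not reproduced here, so e_{l,m} is treated as an opaque tensor the caller supplies with the same shape as the gradient g^{(t)}_{l,m}, r is assumed to be the LoRA rank, and μ and σ are assumed to be the mean and population standard deviation of s over the recorded steps t. Tensors are flattened to Python lists to keep the example dependency-free.

```python
from statistics import mean, pstdev


def step_score(e, g, r):
    """s^{(t)} = (1/r) * ||e ⊙ g^{(t)}||_1 for one module at one step.

    e, g: flattened tensors of equal length; ⊙ is the elementwise product.
    """
    if len(e) != len(g):
        raise ValueError("e and g must have the same number of elements")
    return sum(abs(ei * gi) for ei, gi in zip(e, g)) / r


def hybrid_score(e, grads_over_steps, r):
    """H = mu * sigma over the recorded steps.

    ASSUMPTION: mu and sigma are the mean and population standard deviation
    of the per-step scores s^{(t)}; the paper's exact definition is elided.
    """
    s = [step_score(e, g, r) for g in grads_over_steps]
    return mean(s) * pstdev(s)


# Toy module with two parameters, gradients recorded at two steps, rank 2.
h = hybrid_score([1.0, -2.0], [[0.5, 0.5], [1.0, 0.0]], r=2)
print(h)  # 0.078125
```

Under this reading, a module whose score is large but steady across steps gets a low H because σ is small; whether that is the intended behavior depends on the parts of the definition the excerpt omits.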

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 8 internal anchors

[1]

Sparse low-rank adaptation of pre-trained language models

Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan Liu, and Maosong Sun. Sparse low-rank adaptation of pre-trained language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4133–4145, 2023.

[2]

SNIP: Single-shot Network Pruning based on Connection Sensitivity

Namhoon Lee, Thalaiyasingam Ajanthan, and Philip HS Torr. SNIP: Single-shot network pruning based on connection sensitivity. arXiv preprint arXiv:1810.02340, 2018.

[3]

DARTS: Differentiable Architecture Search

Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018.

[4]

ALoRA: Allocating low-rank adaptation for fine-tuning large language models

Zequan Liu, Jiawen Lyn, Wei Zhu, Xing Tian, and Yvette Graham. ALoRA: Allocating low-rank adaptation for fine-tuning large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 622–641, 2024.

[5]

Training language models to follow instruction...

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instruction...

[6]

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R Bowman. GPQA: A graduate-level Google-proof Q&A benchmark. arXiv preprint arXiv:2311.12022, 2023.

[7]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[8]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

URL https://arxiv.org/abs/2402.03300.

[9]

LeetCodeDataset: A temporal dataset for robust evaluation and efficient training of code LLMs

Yunhui Xia, Wei Shen, Yan Wang, Jason Klein Liu, Huifeng Sun, Siyue Wu, Jian Hu, and Xiaolong Xu. LeetCodeDataset: A temporal dataset for robust evaluation and efficient training of code LLMs. arXiv preprint arXiv:2504.14655, 2025.

[10]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.

[11]

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, et al. DAPO: An open-source LLM reinforcement learning system at scale. arXiv preprint arXiv:2503.14476, 2025.

[12]

AutoLoRA: Automatically tuning matrix ranks in low-rank adaptation based on meta learning

Ruiyi Zhang, Rushi Qiang, Sai Ashish Somayajula, and Pengtao Xie. AutoLoRA: Automatically tuning matrix ranks in low-rank adaptation based on meta learning. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vol...

[13]

Group Sequence Policy Optimization

Chujie Zheng, Shixuan Liu, Mingze Li, Xiong-Hui Chen, Bowen Yu, Chang Gao, Kai Dang, Yuqiong Liu, Rui Men, An Yang, et al. Group sequence policy optimization. arXiv preprint arXiv:2507.18071, 2025.
