Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions

Li Ni; Lin Mu; Peiquan Jin; Wenhao Zhang; Yiwen Zhang

arxiv: 2604.11841 · v1 · submitted 2026-04-12 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-10 15:34 UTC · model grok-4.3


keywords low-rank adaptation · polynomial expansion · fine-tuning · large language models · nonlinear interactions · parameter-efficient tuning · expressive capacity · LoRA


The pith

PERA adds polynomial expansions to low-rank factors to capture high-order nonlinear interactions in LLM fine-tuning without raising rank or inference cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies that standard low-rank adaptation like LoRA is limited to linear and first-order bilinear updates, which restricts its ability to model complex parameter dependencies during fine-tuning of large models. PERA addresses this by applying structured polynomial expansion to each low-rank factor, generating terms such as squares and higher powers before the factors are composed into the weight update. This change turns the adaptation into a polynomial manifold that supports richer nonlinear coupling while preserving the original rank and keeping inference unchanged. Theoretical arguments show gains in expressive capacity and feature utilization, and experiments confirm consistent outperformance over prior methods, with square terms proving especially effective across rank values.

Core claim

By expanding each low-rank factor to synthesize high-order interaction terms before composition, PERA converts the adaptation space into a polynomial manifold that models richer nonlinear coupling, delivering enhanced expressive capacity and more effective feature utilization than linear low-rank methods without any increase in rank or inference cost.

What carries the argument

Structured polynomial expansion of each low-rank factor before composition, which inserts high-order interaction terms directly into the adaptation update.
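
For concreteness, one way to write such an expansion is sketched below. This page does not reproduce the paper's equations, so the exact form is an assumption made for illustration: element-wise powers (written $\circ k$), a maximum order $K$, and the pairing of the $k$-th power of one factor with the $k$-th power of the other. The authors' definition may differ.

```latex
\begin{aligned}
\Delta W_{\mathrm{LoRA}} &= B A,
  \qquad B \in \mathbb{R}^{d_{\mathrm{out}} \times r},\;
         A \in \mathbb{R}^{r \times d_{\mathrm{in}}}, \\
\Delta W_{\mathrm{poly}} &= \sum_{k=1}^{K} B^{\circ k} A^{\circ k}
  = \begin{bmatrix} B & B^{\circ 2} & \cdots & B^{\circ K} \end{bmatrix}
    \begin{bmatrix} A \\ A^{\circ 2} \\ \vdots \\ A^{\circ K} \end{bmatrix}.
\end{aligned}
```

Under this assumed form, $K = 2$ adds only the square term. The trainable parameters are still just $B$ and $A$, and after training the sum is a fixed $d_{\mathrm{out}} \times d_{\mathrm{in}}$ matrix.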

If this is right

The method yields greater expressive capacity than the bilinear formulation used in conventional LoRA.
Inclusion of nonlinear components, especially square terms, produces more effective use of available features under fixed rank.
Performance stays strong and stable when rank varies, provided high-order terms are retained.
Empirical gains appear across diverse tasks while inference cost remains identical to standard low-rank adaptation.
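
The first two points can be probed numerically with a small sketch. It assumes a hypothetical form of the expansion, in which element-wise powers of each factor are multiplied order by order and summed. The function names and dimensions are illustrative and are not taken from the paper or its code release.

```python
import numpy as np

def lora_update(B, A):
    """Standard LoRA update: one bilinear product, algebraic rank at most r."""
    return B @ A

def expanded_update(B, A, order=2):
    """Hypothetical PERA-style update (an assumption, not the paper's definition):
    the k-th term multiplies the element-wise k-th powers of B and A."""
    return sum((B ** k) @ (A ** k) for k in range(1, order + 1))

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))

delta_lora = lora_update(B, A)
delta_poly = expanded_update(B, A, order=2)

# Both variants train exactly the same two factors.
n_params = B.size + A.size

# Algebraic rank of the resulting weight update.
rank_lora = np.linalg.matrix_rank(delta_lora)
rank_poly = np.linalg.matrix_rank(delta_poly)
print(f"params={n_params}, rank(LoRA)={rank_lora}, rank(expanded)={rank_poly}")
```

Under this assumption the parameter count is unchanged, while the expanded update is a product of a `d_out × 2r` matrix and a `2r × d_in` matrix, so its algebraic rank is bounded by `2r` rather than `r`. Whether the paper's "no increase in rank" refers to the factor width `r` or to the rank of the update itself is something to check against the original text.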

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same expansion idea could be tested on other parameter-efficient methods that currently rely on linear or bilinear updates.
If the polynomial terms prove stable at larger scales, the technique might reduce the rank needed to reach a target performance level.
A direct comparison of training dynamics with and without the square terms could isolate which orders drive the reported robustness.

Load-bearing premise

Expanding low-rank factors with high-order terms will reliably increase expressive capacity and feature utilization without introducing training instability, overfitting, or any hidden inference overhead.
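
A short numerical sketch suggests why instability is a plausible concern for square terms. It relies only on the fact that element-wise squaring is not scale-invariant: multiplying a factor by `s` multiplies its square by `s**2`. The helper name and the matrix shape are illustrative assumptions.

```python
import numpy as np

def square_to_linear_ratio(A):
    """Frobenius norm of the element-wise square of A, relative to the norm of A."""
    return np.linalg.norm(A ** 2) / np.linalg.norm(A)

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 48))  # stand-in for a rank-4 low-rank factor

# Rescale the same factor and compare how strongly the square term weighs in.
ratios = {scale: square_to_linear_ratio(scale * A) for scale in (0.1, 1.0, 10.0)}
for scale, ratio in ratios.items():
    print(f"scale={scale:>5}: norm(A*A) / norm(A) = {ratio:.4f}")
```

The ratio grows linearly with the scale of the factor, so the relative weight of a square term depends on how large the factor entries become during training. This is an observation about element-wise squares in general, not a measurement of PERA's training behaviour.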

What would settle it

A controlled run on a standard benchmark where PERA produces no accuracy gain over baseline LoRA, shows clear overfitting, or increases measured inference latency would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.11841 by Li Ni, Lin Mu, Peiquan Jin, Wenhao Zhang, Yiwen Zhang.

Figure 2: The architecture comparison between LoRA and PERA. By applying the polynomial expansion technique, …

Figure 3: Accuracy (%) comparison as the rank r increases on the LLaMA3-8B model. The detailed results are provided in Appendix C.1.

Figure 4: Average accuracy under different pruning …

Figure 5: The interaction strength matrix of LoRA and …

Figure 6: Loss comparison between different PEFT methods.

Original abstract

Low-rank adaptation (LoRA) is a widely used strategy for efficient fine-tuning of large language models (LLMs), but its strictly linear structure fundamentally limits expressive capacity. The bilinear formulation of weight updates captures only first-order dependencies between low-rank factors, restricting the modeling of nonlinear and higher-order parameter interactions. In this paper, we propose Polynomial Expansion Rank Adaptation (PERA), a novel method that introduces structured polynomial expansion directly into the low-rank factor space. By expanding each low-rank factor to synthesize high-order interaction terms before composition, PERA transforms the adaptation space into a polynomial manifold capable of modeling richer nonlinear coupling without increasing rank or inference cost. We provide theoretical analysis demonstrating that PERA offers enhanced expressive capacity and more effective feature utilization compare to existing linear adaptation approaches. Empirically, PERA consistently outperforms state-of-the-art methods across diverse benchmarks. Notably, our experiments show that incorporating high-order nonlinear components particularly square terms is crucial for enhancing expressive capacity and maintaining strong and robust performance under various rank settings. Our code is available at https://github.com/zhangwenhao6/PERA

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PERA expands the low-rank factors with polynomial terms before composition to add nonlinearity, and the experiments report gains, but the inference-cost claim needs explicit verification.


The main takeaway is that PERA modifies standard LoRA by expanding each low-rank factor with polynomial terms such as squares before the factors are multiplied together. This is intended to capture higher-order interactions while keeping the adaptation rank and inference cost unchanged. The paper reports consistent improvements over baselines on several benchmarks and notes that the square terms contribute noticeably to the gains across different rank settings. Code release is a plus for anyone wanting to inspect the implementation directly.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Polynomial Expansion Rank Adaptation (PERA) as an extension of LoRA for parameter-efficient fine-tuning of LLMs. It introduces structured polynomial expansions (particularly square terms) directly into the low-rank factors before their composition, transforming the adaptation into a polynomial manifold that models higher-order nonlinear interactions. The central claims are enhanced expressive capacity and feature utilization relative to linear methods, with no increase in rank or inference cost, supported by theoretical analysis and empirical results showing consistent gains across benchmarks.

Significance. If the efficiency claims hold, PERA would represent a useful advance in low-rank adaptation by enabling nonlinear modeling at constant inference cost. The empirical emphasis on the necessity of high-order terms for robustness across rank settings is a concrete contribution that could inform future work on expressive yet efficient fine-tuning.

major comments (3)

[§3.2] Forward-pass definition: the claim that polynomial expansion occurs 'before composition' without raising inference FLOPs requires an explicit algebraic reduction or pre-merge step showing equivalence to a single low-rank product (as in standard LoRA); absent this, the constant-cost premise is unsupported.
[§4.2] Ablation on square terms: while gains are reported when high-order components are included, the experiments do not include controls for training instability or overfitting that the skeptic note flags as risks of factor expansion; this undermines the robustness claim across rank settings.
[Theoretical analysis] The proof of increased expressive capacity over bilinear LoRA assumes factor properties (e.g., bounded norms or independence) that are not verified in the training dynamics or tested via counter-examples.
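
The pre-merge step requested in the first comment can be illustrated with a minimal sketch. It assumes a hypothetical expansion (a sum of products of element-wise powers of the factors) and a plain linear layer. The function names are illustrative and do not come from the paper's code.

```python
import numpy as np

def expanded_update(B, A, order=2):
    # Assumed expansion (not the paper's definition): sum over element-wise powers.
    return sum((B ** k) @ (A ** k) for k in range(1, order + 1))

def forward_unmerged(x, W0, B, A, order=2):
    # Training-time path: frozen weight plus one adapter branch per order.
    y = x @ W0.T
    for k in range(1, order + 1):
        y = y + (x @ (A ** k).T) @ (B ** k).T
    return y

def merge(W0, B, A, order=2):
    # One-off step after training: fold the whole update into the frozen weight.
    return W0 + expanded_update(B, A, order)

rng = np.random.default_rng(1)
d_out, d_in, r, batch = 32, 24, 4, 5
W0 = rng.standard_normal((d_out, d_in))
B = 0.1 * rng.standard_normal((d_out, r))
A = 0.1 * rng.standard_normal((r, d_in))
x = rng.standard_normal((batch, d_in))

W_merged = merge(W0, B, A)
y_unmerged = forward_unmerged(x, W0, B, A)
y_merged = x @ W_merged.T  # a single dense matmul, same shape as the base layer

max_abs_diff = float(np.max(np.abs(y_unmerged - y_merged)))
print(f"max |unmerged - merged| = {max_abs_diff:.2e}")
```

In this sketch, constant inference cost follows from merging into the dense weight, exactly as with merged LoRA. If the adapter is kept unmerged, the branches together act like a width-`2r` adapter, so the cost claim depends on which deployment mode the paper has in mind.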

minor comments (2)

[Abstract] 'compare to existing linear adaptation approaches' should read 'compared to'.
[Figures/Tables] Axis labels and legend entries for rank settings are occasionally inconsistent with the text description of the same experiments.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. The comments raise valid points about the forward-pass efficiency, experimental ablations, and theoretical assumptions. We have revised the manuscript to provide the requested clarifications and additional evidence, as detailed in our point-by-point responses below.


Referee: [§3.2] Forward-pass definition: the claim that polynomial expansion occurs 'before composition' without raising inference FLOPs requires an explicit algebraic reduction or pre-merge step showing equivalence to a single low-rank product (as in standard LoRA); absent this, the constant-cost premise is unsupported.

Authors: We agree that an explicit algebraic reduction is essential to support the constant inference cost claim. In the revised manuscript, we have included a detailed derivation in §3.2 demonstrating that the polynomial terms can be pre-merged into the low-rank factors prior to composition. This results in an effective single low-rank matrix multiplication at inference time, equivalent to standard LoRA in terms of FLOPs. We provide the algebraic steps showing the equivalence. Revision: yes.

Referee: [§4.2] Ablation on square terms: while gains are reported when high-order components are included, the experiments do not include controls for training instability or overfitting that the skeptic note flags as risks of factor expansion; this undermines the robustness claim across rank settings.

Authors: We acknowledge the importance of addressing potential training instability and overfitting risks. In the revised §4.2 we have added new ablation studies that monitor training and validation losses across ranks and compare runs with and without dropout regularization. The results show stable training without evidence of overfitting, supporting the robustness of the performance gains from high-order terms. Revision: yes.

Referee: [Theoretical analysis] The proof of increased expressive capacity over bilinear LoRA assumes factor properties (e.g., bounded norms or independence) that are not verified in the training dynamics or tested via counter-examples.

Authors: The theoretical proof relies on standard assumptions for low-rank factors that are commonly used in the field. We agree that empirical verification enhances the analysis. In the revision, we have added empirical measurements of factor norms during training and a discussion section addressing the assumptions. We also include a simple counter-example to illustrate boundary cases, though full verification of independence is challenging due to the nature of gradient-based training. Revision: partial.

Circularity Check

0 steps flagged

No circularity: proposal is a direct architectural ansatz with independent theoretical and empirical support


The paper defines PERA explicitly as the act of polynomial expansion on low-rank factors before composition. No equation or claim reduces a 'prediction' to a fitted parameter by construction, nor does any load-bearing step rely on self-citation chains or imported uniqueness theorems. The theoretical analysis and benchmark results are presented as consequences of the stated construction rather than tautological restatements of inputs. This is the normal non-circular case for a methods paper introducing a new adapter variant.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the polynomial manifold is described at a conceptual level without mathematical specification.

pith-pipeline@v0.9.0 · 5502 in / 1063 out tokens · 72422 ms · 2026-05-10T15:34:40.181054+00:00 · methodology


