FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-15 18:51 UTC · model grok-4.3
The pith
Federated LoRA updates suffer aggregation error from rotational misalignment of low-rank factors across clients, which orthogonal alignment can correct before averaging.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rotational invariance of the factorization (B_i R_i)(R_i^T A_i) = B_i A_i allows semantically identical updates to occupy different latent subspaces on different clients; averaging the misaligned factors directly therefore incurs avoidable error. Applying a client-specific orthogonal matrix to each pair of factors before aggregation maps them into a shared basis while exactly preserving the product B A, thereby reducing cross-client subspace mismatch. The resulting aligned aggregation yields a tighter analytic bound on the error induced by factor-wise averaging and produces measurably more stable training trajectories.
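A minimal numerical check of this invariance, with illustrative dimensions not taken from the paper: the rotated factors differ entry-wise, yet their product is numerically the same update.

```python
# Sketch: verify (B R)(R^T A) = B A for a random orthogonal R.
# Shapes d, k and rank r are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 32, 4                                # output dim, input dim, LoRA rank

B = rng.standard_normal((d, r))                    # LoRA "up" factor
A = rng.standard_normal((r, k))                    # LoRA "down" factor
R, _ = np.linalg.qr(rng.standard_normal((r, r)))   # random orthogonal r x r matrix

B_rot, A_rot = B @ R, R.T @ A                      # an equivalent, rotated client copy
print(np.allclose(B_rot @ A_rot, B @ A))           # True: same semantic update
print(np.allclose(B_rot, B))                       # False: the factors themselves differ
```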
What carries the argument
Orthogonal transformation alignment of client LoRA factors (B_i, A_i) to a common reference subspace before server aggregation, which leaves the low-rank product unchanged while minimizing destructive interference.
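A minimal sketch of what such an alignment step could look like, assuming the orthogonal Procrustes formulation on concatenated factors that the rebuttal below describes; the function names and the choice of reference factors are illustrative, not the paper's implementation.

```python
# Sketch: align one client's LoRA factors to a shared reference basis via
# orthogonal Procrustes, preserving the product B_i A_i exactly.
import numpy as np

def procrustes_rotation(M: np.ndarray, M_ref: np.ndarray) -> np.ndarray:
    """Orthogonal R minimizing ||M @ R - M_ref||_F (Schonemann, 1966)."""
    U, _, Vt = np.linalg.svd(M.T @ M_ref)
    return U @ Vt

def align_client(B_i, A_i, B_ref, A_ref):
    """Rotate (B_i, A_i) into the reference basis without changing B_i @ A_i."""
    M_i   = np.vstack([B_i, A_i.T])     # concatenated factors, shape (d + k, r)
    M_ref = np.vstack([B_ref, A_ref.T])
    R = procrustes_rotation(M_i, M_ref)
    return B_i @ R, R.T @ A_i           # (B_i R)(R^T A_i) = B_i A_i
```

Because R is orthogonal, the aligned pair represents the identical update; only its latent basis moves, which is what lets the server average factors without destructive interference.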
If this is right
- The alignment produces a provably smaller upper bound on the aggregation error that arises from factor-wise averaging.
- Training remains stable and reaches higher accuracy on both natural-language-understanding and generative tasks under a range of data heterogeneity levels.
- The improvement holds for multiple choices of LoRA rank without any increase in bits communicated per round.
- Model expressivity is unchanged because each orthogonal transformation is invertible and the product B A is exactly preserved.
Where Pith is reading between the lines
- The same alignment idea could be tested on other factorized parameter-efficient methods such as adapters or prompt tuning when run in federated settings.
- In extremely heterogeneous regimes the reduction in subspace mismatch may allow larger LoRA ranks to be used without destabilizing convergence.
- One could measure whether the orthogonal correction remains beneficial when client updates are further compressed by quantization or sparsification.
Load-bearing premise
Rotational misalignment between clients' low-rank factors is the dominant source of aggregation error rather than other factors such as data heterogeneity or optimizer noise.
What would settle it
Run a controlled simulation in which all clients receive identical data and identical random seeds except for an artificial random orthogonal rotation applied to each client's factors; measure whether the alignment step drives the difference between factor-averaged and correctly aggregated updates to near zero.
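A sketch of that simulation under the same Procrustes assumption as above: every client holds the same true update, rotated by a random orthogonal Q_i, and alignment to client 0's basis should collapse the factor-averaging error to numerical noise. Client count and dimensions are illustrative.

```python
# Sketch: identical clients differing only by artificial random rotations.
# Compare naive factor-wise averaging against Procrustes-aligned averaging.
import numpy as np

rng = np.random.default_rng(0)
d, k, r, n = 64, 32, 4, 10

B = rng.standard_normal((d, r))
A = rng.standard_normal((r, k))
true_update = B @ A                                 # shared semantic update

def procrustes_rotation(M, M_ref):
    U, _, Vt = np.linalg.svd(M.T @ M_ref)
    return U @ Vt

# Each client represents the same update in a randomly rotated basis.
clients = []
for _ in range(n):
    Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
    clients.append((B @ Q, Q.T @ A))

# Naive factor-wise averaging mixes misaligned subspaces.
B_avg = np.mean([Bi for Bi, _ in clients], axis=0)
A_avg = np.mean([Ai for _, Ai in clients], axis=0)
naive_err = np.linalg.norm(B_avg @ A_avg - true_update)

# Align every client to client 0's basis, then average.
B0, A0 = clients[0]
M_ref = np.vstack([B0, A0.T])
aligned = []
for Bi, Ai in clients:
    R = procrustes_rotation(np.vstack([Bi, Ai.T]), M_ref)
    aligned.append((Bi @ R, R.T @ Ai))

B_al = np.mean([Bi for Bi, _ in aligned], axis=0)
A_al = np.mean([Ai for _, Ai in aligned], axis=0)
aligned_err = np.linalg.norm(B_al @ A_al - true_update)

print(f"naive factor-averaging error:   {naive_err:.3e}")   # order 1
print(f"aligned factor-averaging error: {aligned_err:.3e}")  # near machine precision
```

If real federated runs show a residual error that alignment cannot remove, that residual measures the non-rotational (data-heterogeneity) component the referee worries about below.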
Original abstract
Federated LoRA provides a communication-efficient mechanism for fine-tuning large language models on decentralized data. In practice, however, a discrepancy between the factor-wise averaging used to preserve low rank and the mathematically correct aggregation of local updates can cause significant aggregation error and unstable training. We argue that a major source of this problem is rotational misalignment, arising from the rotational invariance of low-rank factorizations -- semantically equivalent updates can be represented in different latent subspaces across clients since $(B_i R_i)(R_i^\top A_i) = B_i A_i$. When such misaligned factors are averaged directly, they interfere destructively and degrade the global update. To address this issue, we propose FedRot-LoRA, a federated LoRA framework that aligns client updates via orthogonal transformations prior to aggregation. This alignment preserves the semantic update while reducing cross-client subspace mismatch, without increasing communication cost or restricting model expressivity. We provide a convergence analysis that examines the aggregation error induced by factor-wise averaging and shows how rotational alignment yields a tighter upper bound on this error. Extensive experiments on natural language understanding and generative tasks demonstrate that FedRot-LoRA consistently outperforms existing federated LoRA baselines across a range of heterogeneity levels and LoRA ranks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FedRot-LoRA to address aggregation error in federated LoRA fine-tuning of LLMs. It identifies rotational misalignment from the invariance property (B_i R_i)(R_i^T A_i) = B_i A_i as a key source of destructive interference when averaging low-rank factors across clients. The method applies client-specific orthogonal transformations prior to aggregation to align subspaces; the paper claims semantic preservation, no added communication cost or loss of expressivity, a tighter convergence bound on the aggregation error, and consistent empirical gains over baselines on NLU and generative tasks under varying heterogeneity levels and ranks.
Significance. If the analysis and experiments hold, the work offers a practical, low-overhead fix for a recurring instability in federated parameter-efficient tuning. The explicit treatment of rotational invariance and the derived error bound could inform future federated LoRA variants; the absence of extra communication or rank restrictions is a clear practical strength.
major comments (2)
- [§4] Convergence Analysis: The claim that orthogonal alignment produces a strictly tighter upper bound on ||(1/n)∑_i B_i A_i − ((1/n)∑_i B_i R_i)((1/n)∑_i R_i^T A_i)|| relies on the reduction in factor-space distance serving as a direct proxy for product-space error. This proxy requires additional justification or assumptions (e.g., that dominant singular vectors are already nearly aligned, or that the alignment objective is the orthogonal Procrustes problem on the concatenated factors); without them the bound tightening does not necessarily follow when data heterogeneity induces non-rotational mismatch components.
- [§5] Experiments: The reported consistent outperformance across heterogeneity levels and LoRA ranks is promising, but the evaluation does not clarify whether the orthogonal transformations R_i are computed from the same local updates used for aggregation or from a separate validation pass; if the latter, the comparison to baselines may overstate gains due to extra information not available in standard FedLoRA.
minor comments (2)
- [§3] Notation: The distinction between the aligned factors (B_i R_i, R_i^T A_i) and the original factors should be made explicit in the first appearance of the alignment step to avoid reader confusion with standard LoRA notation.
- [§5] Figure clarity: The convergence plots would benefit from error bars or shaded regions indicating variance across random seeds, especially given the emphasis on training stability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for major revision. We address each major comment below with clarifications and revisions to strengthen the manuscript. The concerns raised can be resolved without altering the core contributions.
Point-by-point responses
Referee: [§4] Convergence Analysis: The claim that orthogonal alignment produces a strictly tighter upper bound on ||(1/n)∑_i B_i A_i − ((1/n)∑_i B_i R_i)((1/n)∑_i R_i^T A_i)|| relies on the reduction in factor-space distance serving as a direct proxy for product-space error. This proxy requires additional justification or assumptions (e.g., that dominant singular vectors are already nearly aligned, or that the alignment objective is the orthogonal Procrustes problem on the concatenated factors); without them the bound tightening does not necessarily follow when data heterogeneity induces non-rotational mismatch components.
Authors: We appreciate the referee's careful reading of the convergence analysis. The orthogonal transformations are obtained by solving the orthogonal Procrustes problem independently on each client's concatenated factors (B_i, A_i), which directly minimizes the factor-space Frobenius distance. In the revised §4 we now explicitly state the assumption that the dominant mismatch under LoRA's rotational invariance is rotational (as non-rotational components would violate the low-rank update structure preserved across clients), and we provide a short lemma showing that the product error is monotonically bounded by the aligned factor distance under this condition. When heterogeneity introduces non-rotational mismatch, the bound remains valid but is no longer strictly tighter than the unaligned case; we have added a remark acknowledging this limitation and noting that empirical results indicate rotational misalignment dominates in practice.
revision: yes
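For reference, the closed-form solution invoked here is the standard orthogonal Procrustes solution (Schönemann, 1966); the reference matrix M_ref is notation introduced in this summary, since the paper's choice of reference is not specified above:

```latex
M_i = \begin{bmatrix} B_i \\ A_i^\top \end{bmatrix}, \qquad
R_i = \operatorname*{arg\,min}_{R^\top R = I}
      \left\lVert M_i R - M_{\mathrm{ref}} \right\rVert_F
    = U_i V_i^\top, \qquad
M_i^\top M_{\mathrm{ref}} = U_i \Sigma_i V_i^\top \ \text{(SVD)}.
```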
Referee: [§5] Experiments: The reported consistent outperformance across heterogeneity levels and LoRA ranks is promising, but the evaluation does not clarify whether the orthogonal transformations R_i are computed from the same local updates used for aggregation or from a separate validation pass; if the latter, the comparison to baselines may overstate gains due to extra information not available in standard FedLoRA.
Authors: We thank the referee for pointing out this ambiguity. The transformations R_i are computed locally from the exact same client updates (A_i, B_i) that are subsequently aggregated; no separate validation pass or additional data is involved. The alignment step uses only the current-round local factors via the closed-form Procrustes solution, incurring negligible local compute and zero extra communication. We have revised the experimental description in §5 and added explicit pseudocode in the appendix to make this procedure unambiguous, ensuring fair comparison with standard FedLoRA baselines.
revision: yes
Circularity Check
No circularity: the derivation relies on explicit identities and norm bounds rather than assuming its conclusion by construction.
Full rationale
The paper's core argument starts from the explicit identity (B_i R_i)(R_i^T A_i) = B_i A_i to identify rotational invariance, then introduces orthogonal transformations R_i to align factors before averaging. The convergence analysis derives an upper bound on the aggregation error ||(1/n)∑_i B_i A_i − ((1/n)∑_i B_i R_i)((1/n)∑_i R_i^T A_i)|| and shows that the chosen alignment reduces the bound under the stated assumptions. None of these steps is self-definitional, fitted-input-as-prediction, or dependent on self-citation chains; the bound is obtained directly from norm inequalities and the alignment objective rather than by renaming or presupposing the target improvement. The argument therefore rests on standard mathematical results rather than on its own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Low-rank factorizations exhibit rotational invariance, (B_i R_i)(R_i^T A_i) = B_i A_i, allowing semantically equivalent updates to be represented in different subspaces across clients.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "semantically equivalent updates can be represented in different latent subspaces across clients since (B_i R_i)(R_i^T A_i) = B_i A_i"
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "rotational alignment yields a strictly tighter upper bound on this error"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.