Preventing Rank Collapse in Federated Low-Rank Adaptation with Client Heterogeneity

Fei Wu; Geyong Min; Jia Hu; Shiqiang Wang

arXiv: 2602.13486 · v2 · submitted 2026-02-13 · cs.LG · cs.AI · cs.DC

Pith reviewed 2026-05-15 22:01 UTC · model grok-4.3

classification cs.LG · cs.AI · cs.DC

keywords federated learning · low-rank adaptation · rank collapse · client heterogeneity · federated fine-tuning · LoRA aggregation · SVD rank allocation

The pith

In federated LoRA with client-specific ranks, standard aggregation suppresses higher-rank updates geometrically because weights ignore which clients actually contribute to each rank.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies rank collapse in heterogeneous FedLoRA setups that assign different LoRA ranks to clients via SVD. Global averaging mixes updates without accounting for the fact that only some clients supply each rank level, so the influence of higher ranks shrinks by a fixed factor every round. This concentrates model energy in the smallest shared rank and hurts downstream accuracy. The authors derive the suppression rate theoretically from the mismatch between rank-agnostic weights and rank-dependent participation. They then introduce raFLoRA, which splits each update into rank partitions and re-weights each partition by the actual number of clients that provide it.
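The dilution described above can be illustrated with a toy simulation. This is a minimal sketch under a simplifying assumption that is not taken from the paper: each round, the magnitude of rank direction `i` is rescaled by its coverage `p_i` (the fraction of clients whose rank includes that direction), because clients lacking the direction contribute zeros but still count in the averaging denominator. The function names and the example ranks are hypothetical.

```python
import numpy as np

def coverage(ranks, r_max):
    """Fraction of clients whose LoRA rank covers each rank index 0..r_max-1."""
    ranks = np.asarray(ranks)
    return np.array([(ranks > i).mean() for i in range(r_max)])

def simulate_rank_agnostic(ranks, rounds, r_max=None):
    """Toy model of per-direction magnitude under rank-agnostic averaging.

    Assumption (illustrative only): direction i is multiplied by its
    coverage p_i once per round, which gives geometric decay p_i ** t.
    """
    r_max = r_max or max(ranks)
    p = coverage(ranks, r_max)
    magnitude = np.ones(r_max)
    history = [magnitude.copy()]
    for _ in range(rounds):
        magnitude = magnitude * p
        history.append(magnitude.copy())
    return np.array(history)

# Hypothetical setup: four clients with ranks 2, 2, 4 and 8.
ranks = [2, 2, 4, 8]
hist = simulate_rank_agnostic(ranks, rounds=5)
energy = hist ** 2
# Share of energy outside the minimum shared rank (here r_min = 2), per round.
higher_rank_share = energy[:, 2:].sum(axis=1) / energy.sum(axis=1)
```

In this toy setting the two shared directions keep magnitude 1 while every higher direction shrinks by a fixed factor per round, so `higher_rank_share` falls monotonically. The paper's actual contraction factors depend on quantities this sketch does not model.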

Core claim

Rank collapse arises because rank-agnostic aggregation weights mismatch with rank-dependent client contributions, which systematically suppresses higher-rank updates at a geometric rate over rounds. raFLoRA prevents the collapse by decomposing local updates into rank partitions and aggregating each partition weighted by its effective client contributions.

What carries the argument

Rank-partitioned aggregation that weights each rank level separately by the number of clients actually supplying updates at that rank.
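One way to picture the difference between the two aggregation rules is the sketch below. It assumes, for illustration only, that every client update is already expressed as rows of rank components in a common basis (shape `(r_k, d)`), and that contributors are counted uniformly. The paper's exact weighting and matrix layout may differ, and all function names are hypothetical.

```python
import numpy as np

def pad_to(update, r_max):
    """Zero-pad a (r_k, d) block of rank components to (r_max, d)."""
    padded = np.zeros((r_max, update.shape[1]))
    padded[: update.shape[0]] = update
    return padded

def rank_agnostic_average(updates):
    """Baseline: every row is divided by the total number of clients K."""
    r_max = max(u.shape[0] for u in updates)
    return np.mean([pad_to(u, r_max) for u in updates], axis=0)

def rank_partitioned_average(updates):
    """Each rank row is averaged over only the clients that supply it."""
    r_max = max(u.shape[0] for u in updates)
    total = np.sum([pad_to(u, r_max) for u in updates], axis=0)
    # counts[i] = number of clients whose rank exceeds i (effective contributors).
    # Never zero here, because r_max is the largest rank among the given updates.
    counts = np.array([sum(u.shape[0] > i for u in updates) for i in range(r_max)])
    return total / counts[:, None]

# Three hypothetical clients with ranks 2, 2 and 4; every supplied row is all ones.
updates = [np.ones((r, 3)) for r in (2, 2, 4)]
baseline = rank_agnostic_average(updates)
partitioned = rank_partitioned_average(updates)
```

With these inputs the baseline scales rows 2 and 3 down to one third, because only one of three clients supplies them, while the partitioned average leaves every row at 1.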

If this is right

Global model energy concentrates in the minimum shared rank instead of using the full capacity of higher ranks.
Performance degrades on vision, language, and reasoning tasks relative to homogeneous-rank baselines.
Sensitivity to chosen rank configurations increases, making tuning brittle.
raFLoRA restores full-rank utilization and improves accuracy while remaining robust to varying heterogeneity patterns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same weight-participation mismatch could appear in any federated optimizer that mixes updates of unequal dimensionality.
Replacing SVD rank selection with random or data-driven allocation might slow or eliminate the collapse even without changing aggregation.
Under very low client participation rates the geometric factor becomes more severe, suggesting the method's benefit grows with scale.

Load-bearing premise

SVD-based rank allocation across clients produces contributions whose geometric suppression rate still governs behavior under real non-IID data and partial participation.
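For readers unfamiliar with SVD-based allocation, the following is a minimal sketch of one plausible form: the server factorizes the global update and hands each client its top-`r_k` singular components. How the singular values are split between the two LoRA factors is an assumption here (they are folded into `B`), and `allocate_by_svd` is a hypothetical name, not the paper's procedure.

```python
import numpy as np

def allocate_by_svd(global_update, client_ranks):
    """Give each client the top-r_k singular components of the global update."""
    U, s, Vt = np.linalg.svd(global_update, full_matrices=False)
    allocations = {}
    for client_id, r in enumerate(client_ranks):
        B = U[:, :r] * s[:r]   # (d_out, r), columns scaled by singular values
        A = Vt[:r, :]          # (r, d_in)
        allocations[client_id] = (B, A)
    return allocations

# Hypothetical 6x5 global update and three clients with ranks 1, 2 and 4.
rng = np.random.default_rng(0)
delta_w = rng.normal(size=(6, 5))
alloc = allocate_by_svd(delta_w, client_ranks=[1, 2, 4])
```

Under this scheme the allocations are nested: a rank-2 client receives the rank-1 client's component plus one more. That nesting is why all clients share the leading directions and only some hold the higher ones.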

What would settle it

Track the fraction of Frobenius energy in successive singular values of the global update over rounds in a controlled heterogeneous FedLoRA run and verify whether it concentrates exactly in the minimum shared rank at the predicted geometric speed.
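A measurement along these lines could be sketched as follows. The helper computes the share of squared Frobenius norm carried by singular values beyond the top `r_min`, and a second helper estimates a per-round decay factor by fitting a line to the log of that share. Both function names are hypothetical, and the log-linear fit assumes the share decays geometrically, which is the hypothesis under test rather than a given.

```python
import numpy as np

def higher_rank_energy_ratio(global_update, r_min):
    """Share of squared Frobenius norm beyond the top r_min singular values."""
    s = np.linalg.svd(global_update, compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    return float(energy[r_min:].sum() / total) if total > 0 else 0.0

def fit_decay_rate(ratios):
    """Estimate gamma in ratio_t ~ C * gamma**t via least squares on log(ratio)."""
    ratios = np.asarray(ratios, dtype=float)
    rounds = np.arange(len(ratios))
    slope, _intercept = np.polyfit(rounds, np.log(ratios), 1)
    return float(np.exp(slope))

# Hypothetical check on a diagonal matrix with known singular values 3, 2, 1.
demo = np.diag([3.0, 2.0, 1.0])
ratio = higher_rank_energy_ratio(demo, r_min=1)

# Synthetic geometric sequence to sanity-check the fit.
gamma_hat = fit_decay_rate([0.5 * 0.8 ** t for t in range(6)])
```

In a real run, `higher_rank_energy_ratio` would be applied to the global update after each round, and the fitted factor compared against the rate the theory predicts. Ratios that reach exactly zero would need to be excluded before taking logs.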

Figures

Figures reproduced from arXiv: 2602.13486 by Fei Wu, Geyong Min, Jia Hu, Shiqiang Wang.

**Figure 1.** FedLoRA with heterogeneous ranks. The global update is aggregated and allocated across clients with different ranks. Context: … data continue to grow. Federated learning (FL) (McMahan et al., 2017) enables privacy-preserving collaboration to effectively leverage distributed private data for training high-quality models; therefore, FL has been combined with fine-tuning of FMs in recent works. However, the prohibitive…

**Figure 2.** Context: …a. We term this behavior rank collapse, where the collapse reflects the concentration of the energy spectrum rather than a reduction in algebraic rank. This phenomenon leads to consistently suboptimal performance across diverse non-IID data settings and induces high sensitivity to the shared rank, as shown in Figures 2c and 2d. Therefore, this study aims to address the following research question: How to …

**Figure 3.** Illustration of rank collapse. The shared-rank directions are fully aggregated, while the higher-rank directions are diluted. Context: … where the initial energy imbalance constant $C$ and the convergence rate $\gamma$ are given by $C = \frac{\sum_{j=r_1+1}^{r_{\max}} e_j^{(0)}}{\sum_{i=1}^{r_1} e_i^{(0)}}$, $\gamma = \frac{q_{r_1+1}}{q_{r_1}} < 1$. (6) Here $q_i = \beta^2 h(p_i)$ is the sampling-induced contraction factor, where $p_i$ denotes the rank coverage rate as defined previously, and …

**Figure 4.** Illustration of the rank-partitioned aggregation. Each partition is weighted by its effective participating clients. Context: … be further reinforced by data heterogeneity. Relaxing Assumptions 4.2–4.3 causes clients to update different local subspaces, inducing drift in the singular directions across rounds. As a result, while the top-$r_1$ directions are broadly shared and remain stable under aggregation, higher-rank…

**Figure 6.** Higher-rank energy ratio of the global update over 10 FL rounds under different data heterogeneity settings. Context: … higher-rank energy in HetLoRA and FlexLoRA decays to nearly zero, with HetLoRA exhibiting a faster decay due to the aggregation bias. In contrast, FLoRA retains 27% of higher-rank energy, as its low-rank updates are reset each round and therefore do not accumulate cross-round attenuation. By compar…

**Figure 7.** Sensitivity and robustness analyses of raFLoRA and FlexLoRA under different settings. In (a), data heterogeneity increases from left to right. In (b), the client participation rate increases. The detailed configurations for (c) and (d) are provided in Section 6.5. Context: …parable to strong baselines with only modest additional computational overhead. In particular, raFLoRA reduces communication cost to 18% of that…

**Figure 8.** Rank-wise energy ratios of HetLoRA and FLoRA on CIFAR100 using ViT under the patho-c20a1 data partitioning. Context: As shown in Figure 8a, rank collapse is most pronounced in HetLoRA. Within fewer than 10% of the total training rounds, the update energy concentrates almost entirely on the minimum rank. This behavior is driven by two compounding factors. First, heterogeneous rank participation induces a rank-wise a…

**Figure 9.** Extended experiments comparing raFLoRA and FlexLoRA under varying rank configurations and LoRA module insertion settings. Context: Effect of additional rank configurations. We evaluate a broader range of rank configurations, including conf-6 {8, 12, 16, 20, 24}, conf-7 {4, 8, 16, 32, 64}, and conf-8 {1, 4, 16, 64, 256}, which exhibit progressively larger rank gaps. As shown in Figure 9a, raFLoRA consistently achiev…

Original abstract

Federated low-rank adaptation (FedLoRA) has facilitated communication-efficient and privacy-preserving fine-tuning of foundation models for downstream tasks. In practical federated learning scenarios, client heterogeneity in system resources and data distributions motivates the use of heterogeneous LoRA ranks across clients. However, we identify a previously overlooked phenomenon in heterogeneous FedLoRA with SVD-based allocation, termed rank collapse, where the energy of the global update becomes concentrated in the minimum shared rank, resulting in suboptimal performance and high sensitivity to rank configurations. Through theoretical analysis, we reveal the root cause of rank collapse: a mismatch between rank-agnostic aggregation weights and rank-dependent client contributions, which systematically suppresses higher-rank updates at a geometric rate over rounds. Motivated by this insight, we propose raFLoRA, a rank-partitioned aggregation method that decomposes local updates into rank partitions and then aggregates each partition weighted by its effective client contributions. Extensive experiments across vision, language, and reasoning tasks show that raFLoRA prevents rank collapse, improves model performance, and enhances robustness across diverse heterogeneous configurations compared with strong FedLoRA baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags a real rank collapse issue in heterogeneous FedLoRA from rank-agnostic aggregation and offers a partitioned weighting fix that improves results in their tests.

The paper's core finding is that rank collapse occurs in heterogeneous FedLoRA because standard aggregation weights ignore the fact that only higher-rank clients contribute to the higher-rank dimensions, so those components decay geometrically over rounds. The raFLoRA method fixes this by partitioning updates by rank and weighting each partition by its effective contributions. Two things are new: the explicit identification of rank collapse and the partitioned aggregation technique. Prior FedLoRA work apparently did not flag the problem under SVD-based rank allocation.

The work does well in laying out the theoretical root cause and in running experiments across vision, language, and reasoning tasks that show improved performance and robustness to different rank setups. A soft spot is the theory's assumption that client contributions depend on rank in a primarily deterministic way. Under the paper's own non-IID and partial-participation setting, data heterogeneity could modulate the actual update strength per rank, which might affect how accurately the geometric rate predicts behavior. The experiments mitigate this by testing heterogeneous configurations, but the derivation's generality could be tighter.

The paper is aimed at practitioners and researchers working on communication-efficient fine-tuning in federated environments with resource-heterogeneous clients, and it offers a targeted solution to a barrier that shows up in real deployments. It deserves serious referee attention because the problem is practical, the proposed method is straightforward to implement, and the results indicate meaningful gains without obvious overclaiming.

Recommendation: Yes, send to peer review.

Referee Report

2 major / 2 minor

Summary. The paper identifies a phenomenon called rank collapse in heterogeneous FedLoRA with SVD-based rank allocation across clients. It attributes this to a mismatch between rank-agnostic aggregation weights (as in FedAvg) and rank-dependent client contributions, which causes higher-rank updates to be suppressed geometrically over communication rounds. The authors propose raFLoRA, a rank-partitioned aggregation scheme that weights each rank partition by its effective client contributions, and validate the approach with experiments on vision, language, and reasoning tasks showing improved performance and robustness to heterogeneous rank configurations.

Significance. If the theoretical analysis holds under the paper's assumptions, the work offers a concrete mechanistic explanation for a practical failure mode in federated fine-tuning of large models when clients have heterogeneous resources and data. The proposed raFLoRA method directly targets this issue and appears to enhance robustness, which is relevant for real-world deployments. The multi-task experimental validation is a positive point, though the significance is tempered by the need to confirm the derivation's applicability beyond the idealized setting.

major comments (2)

[Theoretical analysis] Theoretical analysis section (derivation of geometric suppression): The root-cause claim relies on modeling client contributions as deterministic functions of chosen rank alone under full participation. This does not account for how non-IID data modulates effective update magnitudes through local gradient norms projected onto each rank subspace, which can amplify or attenuate the suppression rate; a sensitivity analysis or extended bound incorporating data heterogeneity is needed to support the claim for the regimes where the method is applied.
[Experiments] Experimental validation: The reported improvements and reduced sensitivity to rank configurations are shown, but the setup details on partial client participation (standard in FL) and the precise post-hoc rank allocation procedure are not fully specified, making it difficult to assess whether the geometric suppression prediction generalizes to the heterogeneous non-IID cases emphasized in the abstract.

minor comments (2)

[Method] Notation: Ensure consistent use of symbols for rank partitions and aggregation weights between the theoretical derivation and the raFLoRA algorithm description.
[Abstract] Abstract: The acronym 'raFLoRA' should be expanded on first mention for readers unfamiliar with the proposal.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and describe the revisions we will make to strengthen the manuscript.

Referee: [Theoretical analysis] Theoretical analysis section (derivation of geometric suppression): The root-cause claim relies on modeling client contributions as deterministic functions of chosen rank alone under full participation. This does not account for how non-IID data modulates effective update magnitudes through local gradient norms projected onto each rank subspace, which can amplify or attenuate the suppression rate; a sensitivity analysis or extended bound incorporating data heterogeneity is needed to support the claim for the regimes where the method is applied.

Authors: We agree that the current derivation isolates the geometric suppression under deterministic rank-dependent contributions and full participation. To address the impact of non-IID data, we will add a new subsection in the theoretical analysis that performs a sensitivity analysis. This will model the modulation of update magnitudes via local gradient norms projected onto rank subspaces and derive an extended bound showing that the geometric suppression rate persists (with data-dependent constants) under heterogeneous data distributions. The revised analysis will explicitly connect to the non-IID regimes emphasized in the abstract and experiments. revision: yes
Referee: [Experiments] Experimental validation: The reported improvements and reduced sensitivity to rank configurations are shown, but the setup details on partial client participation (standard in FL) and the precise post-hoc rank allocation procedure are not fully specified, making it difficult to assess whether the geometric suppression prediction generalizes to the heterogeneous non-IID cases emphasized in the abstract.

Authors: We acknowledge the need for fuller experimental details. In the revision we will expand the experimental setup section to specify the client participation rate (e.g., 10% random subset per round), the exact SVD-based post-hoc rank allocation algorithm, and the non-IID data partitioning procedure. We will also add new experiments under partial participation on the vision, language, and reasoning tasks to directly verify that the predicted geometric suppression occurs and that raFLoRA mitigates it in these realistic heterogeneous non-IID settings. revision: yes
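To make the partial-participation setting concrete, the sketch below samples a 10% random subset of clients for one round and counts how many of the selected clients supply each rank index. The rank assignment, the client count, and both function names are hypothetical and chosen only for illustration. They are not taken from the paper's experimental setup.

```python
import numpy as np

def sample_clients(num_clients, rate, rng):
    """Sample a uniform random subset of clients for one round (at least one)."""
    k = max(1, int(round(rate * num_clients)))
    return rng.choice(num_clients, size=k, replace=False)

def effective_counts(ranks, selected, r_max):
    """Number of selected clients that supply each rank index 0..r_max-1."""
    chosen = np.asarray(ranks)[selected]
    return np.array([(chosen > i).sum() for i in range(r_max)])

rng = np.random.default_rng(0)
# Hypothetical population: 60 clients at rank 4, 30 at rank 8, 10 at rank 16.
ranks = np.array([4] * 60 + [8] * 30 + [16] * 10)
selected = sample_clients(len(ranks), rate=0.10, rng=rng)
counts = effective_counts(ranks, selected, r_max=16)
```

The counts are non-increasing in the rank index, and in some rounds the highest partitions may have no contributors at all. An aggregator that divides by these counts would then need a fallback, such as keeping the previous global value for that partition. That fallback is an assumption of this note, not a documented behavior of raFLoRA.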

Circularity Check

0 steps flagged

No significant circularity in rank collapse derivation

The paper's central theoretical claim traces rank collapse to a mismatch between rank-agnostic aggregation weights and rank-dependent client contributions, yielding geometric suppression. This derivation is presented as arising directly from the FedAvg-style mechanics and SVD-based rank allocation under heterogeneity. No equations or steps reduce by construction to fitted parameters, self-referential definitions, or load-bearing self-citations. The analysis appears self-contained against the stated aggregation rules and does not rely on prior author results as an unverified uniqueness theorem. This is the normal case of an independent derivation grounded in the problem setup.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only; relies on standard federated learning assumptions about client heterogeneity and participation without introducing new free parameters or invented entities.

axioms (1)

domain assumption: Standard assumptions on client data heterogeneity and partial participation in federated learning. Implicit in the FedLoRA setup and the SVD-based allocation described.

pith-pipeline@v0.9.0 · 5500 in / 1158 out tokens · 28064 ms · 2026-05-15T22:01:18.825365+00:00 · methodology

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 4 internal anchors

[1] Babakniya, S., Elkordy, A., Ezzeldin, Y., Liu, Q., Song, K.-B., EL-Khamy, M., and Avestimehr, S. SLoRA: Federated parameter efficient fine-tuning of language models. In International Workshop on Federated Learning in the Age of Foundation Models in Conjunction with NeurIPS 2023, 2023.

[2] Cho, Y. J., Liu, L., Xu, Z., Fahrezi, A., and Joshi, G. Heterogeneous LoRA for federated fine-tuning of on-device foundation models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024.

[3] Clark, C., Lee, K., Chang, M.-W., Kwiatkowski, T., Collins, M., and Toutanova, K. BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 2924–2936, 2019.

[4] Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., and Tafjord, O. Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457, 2018.

[5] Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.

[6] Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

[7] Hu, Z., Wang, L., Lan, Y., Xu, W., Lim, E.-P., Bing, L., Xu, X., Poria, S., and Lee, R. LLM-adapters: An adapter family for parameter-efficient fine-tuning of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 5254–5276, 2023.

[8] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.

[9] Meta AI. Llama 3.2: Revolutionizing edge AI and vision with open, customizable models. https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/, September 2024.

[10] Mihaylov, T., Clark, P., Khot, T., and Sabharwal, A. Can a suit of armor conduct electricity? A new dataset for open book question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2381–2391, 2018.

[11] Sap, M., Rashkin, H., Chen, D., Le Bras, R., and Choi, Y. Social IQa: Commonsense reasoning about social interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language...

[12] Zhang, J., Vahidian, S., Kuo, M., Li, C., Zhang, R., Yu, T., Wang, G., and Chen, Y. Towards building the FederatedGPT: Federated instruction tuning. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.

[13] … are progressively suppressed, so rank collapse persists under general non-IID settings. When $\delta_i^2$ is negligible, energy concentrates toward the rank-$r_1$ subspace at least as fast as in the basic analysis. When $\delta_i^2 > 0$, higher-rank energies are bounded by floors of order $\delta_i^2 / (1 - q'_i)$, preserving the qualitative behavior predicted by the basic setting. C...

[14] … benchmark, which comprises eight sub-tasks: BoolQ (Clark et al., 2019), PIQA (Bisk et al., 2020), SIQA (Sap et al., 2019), HellaSwag (Zellers et al., 2019), Winogrande (Sakaguchi et al., 2021), ARC-Easy and ARC-Challenge (Clark et al., 2018), and OpenBookQA (Mihaylov et al., 2018). Models are fine-tuned on Commonsense-15K and evaluated separately on the t...
