Flora: Low-rank adapters are secretly gradient compressors

2024 · arXiv 2402.03293

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training

cs.LG · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

Low-rank pre-training methods converge to geometrically and spectrally distinct basins and show diverging activations compared to full-rank training at 60M-350M scales.

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

cs.LG · 2024-03-06 · conditional · novelty 7.0

GaLore performs full-parameter LLM training with up to 65.5% less optimizer memory by projecting gradients onto a low-rank subspace at each step, matching full-rank performance on LLaMA pre-training and RoBERTa fine-tuning.

CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

CR-Net uses cross-layer low-rank residuals in a dual-path network plus specialized recomputation to outperform prior low-rank methods on 60M-7B model pre-training while using less compute and memory.

Memory-Efficient Differentially Private Training with Gradient Random Projection

cs.LG · 2025-06-18 · conditional · novelty 6.0

DP-GRAPE reduces memory in differentially private neural network training by using random Gaussian projections on gradients instead of SVD, achieving comparable privacy-utility tradeoffs to DP-SGD and scaling to 6.7B parameter models.

Task-agnostic Low-rank Residual Adaptation for Efficient Federated Continual Fine-Tuning

cs.LG · 2025-05-18 · unverdicted · novelty 6.0

Fed-TaLoRA uses task-agnostic low-rank residual adaptation with post-aggregation calibration to enable efficient federated continual fine-tuning across sequential tasks under non-IID conditions.

GWT: Scalable Optimizer State Compression for Large Language Model Training

cs.LG · 2025-01-13 · unverdicted · novelty 6.0

GWT projects gradients into wavelet subspaces to compress optimizer states for memory-efficient LLM training while claiming performance parity with full-rank updates.

GiVA: Gradient-Informed Bases for Vector-Based Adaptation

cs.CL · 2026-04-23 · unverdicted · novelty 5.0

GiVA uses gradients to initialize vector adapters so they match LoRA performance at eight times lower rank while keeping extreme parameter efficiency.

Fed-DLoRA: Efficient Wireless Federated Learning with Dynamic Low-Rank Adaptation

cs.LG · 2026-04-27 · unverdicted · novelty 4.0

Fed-DLoRA combines low-rank adaptation with federated learning and an adaptive rank-bandwidth-vehicle selection algorithm to improve accuracy, convergence speed, and communication efficiency in wireless IoV environments.

citing papers explorer

Showing 8 of 8 citing papers.

Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training · cs.LG · 2026-05-13 · unverdicted · none · ref 7 · 2 links
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection · cs.LG · 2024-03-06 · conditional · none · ref 15
CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure · cs.LG · 2025-09-23 · unverdicted · none · ref 21
Memory-Efficient Differentially Private Training with Gradient Random Projection · cs.LG · 2025-06-18 · conditional · none · ref 11
Task-agnostic Low-rank Residual Adaptation for Efficient Federated Continual Fine-Tuning · cs.LG · 2025-05-18 · unverdicted · none · ref 22
GWT: Scalable Optimizer State Compression for Large Language Model Training · cs.LG · 2025-01-13 · unverdicted · none · ref 16
GiVA: Gradient-Informed Bases for Vector-Based Adaptation · cs.CL · 2026-04-23 · unverdicted · none · ref 56
Fed-DLoRA: Efficient Wireless Federated Learning with Dynamic Low-Rank Adaptation · cs.LG · 2026-04-27 · unverdicted · none · ref 29
