Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate

Huangyu Xu; Jiaye Teng; Jingqin Yang; Qianqian Xu

arXiv: 2605.25134 · v1 · submitted 2026-05-24 · cs.LG · cs.AI

Keywords: optimization, sparse, approach, regularization, ReWA, adaptive, decay, instability

Abstract

Sparse optimization is a fundamental challenge in many practical applications. A popular approach to sparse optimization is $\ell_p$ regularization. However, when $0<p<1$ it can make optimization unstable, because the gradient of the penalty $|w|^p$, namely $p|w|^{p-1}\operatorname{sign}(w)$, is unbounded as $w \to 0$. In this paper, we introduce a novel approach to sparse optimization, termed ReWA, based on Reparameterization, Weight decay, and Adaptive learning rate. ReWA is closely connected to $\ell_p$ regularization, yet it induces a distinct optimization landscape that helps mitigate this instability. Experiments with ResNets on CIFAR-10 and ImageNet show that ReWA achieves substantially higher sparsity than the $\ell_1$ regularization approach while preserving test accuracy.
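The two ingredients the abstract relies on can be illustrated numerically. The sketch below is not the authors' ReWA method; it assumes a generic product reparameterization $w = u_1 u_2 \cdots u_k$ with plain L2 weight decay on the factors, a standard construction that may differ from ReWA's. The first function shows that the $\ell_p$ gradient blows up near zero for $p = 0.5$. The second uses the AM-GM inequality: over all factorizations of $w$, the smallest weight-decay cost $\tfrac12\sum_i u_i^2$ equals $\tfrac{k}{2}|w|^{2/k}$, an $\ell_{2/k}$-type penalty ($k=2$ gives $\ell_1$, $k=4$ gives $\ell_{1/2}$). All function names are hypothetical.

```python
import math


def lp_penalty_grad(w: float, p: float) -> float:
    """Derivative of |w|**p for w != 0: p * |w|**(p - 1) * sign(w)."""
    if w == 0.0:
        raise ValueError("|w|**p is not differentiable at 0 for 0 < p <= 1")
    return p * abs(w) ** (p - 1) * math.copysign(1.0, w)


def min_weight_decay_cost(w: float, depth: int) -> float:
    """Smallest value of sum(u_i**2) / 2 subject to u_1 * ... * u_depth == w.

    By AM-GM, sum(u_i**2) >= depth * |w|**(2 / depth), with equality when all
    |u_i| are equal, so the induced penalty on w behaves like |w|**(2 / depth).
    """
    return 0.5 * depth * abs(w) ** (2.0 / depth)


def balanced_factors(w: float, depth: int) -> list[float]:
    """Hypothetical helper: factors of equal magnitude whose product is w."""
    mag = abs(w) ** (1.0 / depth)
    factors = [mag] * depth
    factors[0] = math.copysign(mag, w)  # carry the sign of w on one factor
    return factors


if __name__ == "__main__":
    # With p = 0.5 the gradient magnitude grows without bound as w -> 0.
    for w in (1e-1, 1e-3, 1e-6):
        print(f"w={w:g}  d|w|^0.5/dw={lp_penalty_grad(w, 0.5):.3f}")

    # In factor space the weight-decay gradient w.r.t. u_i is just u_i,
    # which stays bounded whenever the factors stay bounded.
    u = balanced_factors(0.25, 4)
    print(math.prod(u), sum(x * x for x in u) / 2, min_weight_decay_cost(0.25, 4))
```

The printed gradients grow roughly tenfold each time $w$ shrinks a hundredfold, while the penalty expressed through the factors has bounded per-factor gradients. This is one simple way a reparameterization plus weight decay can reproduce an $\ell_p$-like penalty with a different landscape; the paper's actual construction and its adaptive learning rate are not reproduced here.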
