Multigrade Neural Network Approximation

Shijun Zhang; Yuesheng Xu; Zuowei Shen

arXiv: 2601.16884 · v3 · pith:UTAUUML3new · submitted 2026-01-23 · cs.LG · cs.NA · math.NA · stat.ML

keywords: approximation, deep, multigrade, networks, grade, mgdl, neural, refinement

We study multigrade deep learning (MGDL) as a principled framework for structured error refinement in deep neural networks. While the approximation power of neural networks is now relatively well understood, training very deep architectures remains challenging due to highly nonconvex and often ill-conditioned optimization landscapes. In contrast, for relatively shallow networks, most notably certain one-hidden-layer ReLU models, training admits convex reformulations with global guarantees under appropriate settings, motivating learning paradigms that improve stability while scaling to depth. MGDL builds on this insight by training deep networks grade by grade: previously learned grades are frozen, and each newly added grade-wise subnetwork is composed on top of the previously learned grades and trained to fit the residual left by the current approximation, yielding a structured and interpretable hierarchical refinement process. We develop an operator-theoretic foundation for MGDL and prove that, for any continuous target function defined on a hypercube, there exists a fixed-width multigrade ReLU scheme whose residuals are pointwise nonincreasing in magnitude and converge uniformly to zero, with strict $L^p$-norm decay at every nontrivial grade for $p\in [1,\infty)$. To the best of our knowledge, this work provides the first rigorous constructive approximation guarantee showing that a grade-wise residual refinement scheme can achieve vanishing error in a fixed-width multigrade ReLU architecture.
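
The grade-by-grade procedure described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's construction: to keep the example short and deterministic, each grade's hidden-layer weights are drawn at random rather than trained, and only the grade's output layer is fit to the current residual by least squares. The names `fit_multigrade`, `num_grades`, `width`, and `seed` are hypothetical. Because zero is always an admissible output layer, the least-squares step cannot increase the root-mean-square residual on the training samples. That is a much weaker, discrete analogue of the pointwise and $L^p$ guarantees stated above.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fit_multigrade(x, y, num_grades=4, width=16, seed=0):
    """Toy grade-by-grade residual fitting (illustrative assumption, not the paper's scheme)."""
    rng = np.random.default_rng(seed)
    features = x                          # output of the frozen earlier grades, shape (n, d)
    residual = y.astype(float).copy()     # what the current approximation still misses
    approx = np.zeros_like(residual)
    rms = [float(np.sqrt(np.mean(residual ** 2)))]
    for _ in range(num_grades):
        d_in = features.shape[1]
        # New hidden layer composed on top of the frozen features.
        # Assumption: random weights stand in for trained ones.
        W = rng.normal(size=(d_in, width)) / np.sqrt(d_in)
        b = rng.normal(size=width)
        hidden = relu(features @ W + b)
        # Output layer of this grade (with a bias column), fit to the residual.
        design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
        coef, *_ = np.linalg.lstsq(design, residual, rcond=None)
        grade_output = design @ coef
        approx += grade_output            # the approximation is the sum of grade outputs
        residual -= grade_output          # the next grade sees only what is left
        features = hidden                 # freeze this grade for the next one
        rms.append(float(np.sqrt(np.mean(residual ** 2))))
    return approx, rms

x = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y = np.sin(2.0 * np.pi * x[:, 0])
approx, rms = fit_multigrade(x, y)
print(rms)  # training RMS residual before grade 1 and after each grade
```

Each grade keeps a fixed width and only ever sees the residual left by the grades before it, which mirrors the structure described in the abstract. A faithful implementation would also train the hidden weights of each new grade.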

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Geometric Layer-wise Approximation Rates for Deep Networks
cs.LG · 2026-04 · unverdicted · novelty 7.0

A shared mixed-activation network of width 2dN+d+2 yields layer-wise L^p approximation rates bounded by the modulus of continuity at geometric scale N^{-ℓ}, reducing to (2d+1)N^{-ℓ} for 1-Lipschitz targets.