pith. machine review for the scientific record.

arxiv: 2604.02513 · v1 · submitted 2026-04-02 · 📡 eess.SP · cs.AI

Recognition: 2 theorem links · Lean Theorem

Sparse Bayesian Learning Algorithms Revisited: From Learning Majorizers to Structured Algorithmic Learning using Neural Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 20:15 UTC · model grok-4.3

classification 📡 eess.SP cs.AI
keywords Sparse Bayesian Learning · Majorization-Minimization · Deep Learning · Sparse Signal Recovery · Neural Network Update Rules · Algorithm Learning · Convergence Guarantees

The pith

The two most popular SBL update rules are valid descent steps on a shared majorizer, and a dimension-invariant neural network learns superior update rules from data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Sparse Bayesian Learning recovers sparse signals but lacks a unified way to derive or choose among its many algorithms for a given problem. This paper first derives the leading SBL methods from the majorization-minimization principle, which supplies convergence guarantees and shows that the two dominant update rules are compatible steps on one common majorizer. Using that shared structure, the authors then train a neural network to learn an improved SBL update rule directly from data. The resulting architecture keeps its complexity independent of measurement-matrix size, so it can be trained on one set of matrices and applied to others without retraining while delivering higher recovery accuracy across varying sparsity, SNR, and snapshot counts.

Core claim

The paper shows that the most popular SBL algorithms arise as majorization-minimization iterations and that the two leading update rules (EM and variational) are both valid descent steps for the same majorizer, thereby proving convergence for the class. It then introduces a deep neural network that learns a data-driven replacement for these updates; the network is constructed so its computational cost does not grow with matrix dimension and therefore generalizes across different measurement matrices and parameterized dictionaries without retraining.
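
For orientation, a minimal restatement of the classical single-snapshot SBL objective and the two textbook hyperparameter updates commonly labelled EM and MacKay/fixed-point. The notation (A, y, γ, σ²) follows standard SBL usage rather than the paper's, and whether these labels map exactly onto the paper's "two most popular" rules is Pith's reading; the common-majorizer construction itself is not reproduced here.

```latex
% Type-II (evidence) objective minimized over the variance hyperparameters \gamma:
\mathcal{L}(\gamma) = \log\lvert\Sigma_y\rvert + y^{\top}\Sigma_y^{-1}y,
\qquad
\Sigma_y = \sigma^{2} I + A\,\Gamma A^{\top}, \quad \Gamma = \operatorname{diag}(\gamma).

% Posterior mean and covariance of the sparse vector x at the current iterate:
\mu = \Gamma A^{\top}\Sigma_y^{-1} y,
\qquad
\Sigma = \Gamma - \Gamma A^{\top}\Sigma_y^{-1} A\,\Gamma.

% Two classical update rules, each a descent step on a majorizer of \mathcal{L}:
\text{EM:}\quad \gamma_i \leftarrow \mu_i^{2} + \Sigma_{ii},
\qquad
\text{MacKay:}\quad \gamma_i \leftarrow \frac{\mu_i^{2}}{1 - \Sigma_{ii}/\gamma_i}.
```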

What carries the argument

A dimension-invariant neural network that replaces the classical SBL update rule inside the MM iteration, trained end-to-end on sparse-recovery instances.
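
A rough sketch of where that replacement sits, assuming the standard Gaussian SBL model: the loop below alternates the posterior computation with a γ-update passed in as a function, so a learned rule can stand in for the EM rule without changing the surrounding iteration. Function names and the three-argument interface are illustrative, not the paper's implementation.

```python
import numpy as np

def sbl_posterior(A, y, gamma, sigma2):
    """Posterior mean and variance diagonal of x under the standard Gaussian SBL model."""
    Gamma = np.diag(gamma)
    Sigma_y = sigma2 * np.eye(A.shape[0]) + A @ Gamma @ A.T
    mu = Gamma @ A.T @ np.linalg.solve(Sigma_y, y)              # posterior mean
    K = np.linalg.solve(Sigma_y, A @ Gamma)                     # Sigma_y^{-1} A Gamma
    Sigma_diag = gamma - np.einsum('ij,ji->i', Gamma @ A.T, K)  # diag of posterior covariance
    return mu, Sigma_diag

def em_update(mu, Sigma_diag, gamma):
    """Classical EM rule: one valid descent step on the shared majorizer."""
    return mu ** 2 + Sigma_diag

def run_sbl(A, y, sigma2, update_rule=em_update, iters=200):
    """Generic MM-style SBL loop; a learned rule can be passed in place of em_update."""
    gamma = np.ones(A.shape[1])
    for _ in range(iters):
        mu, Sigma_diag = sbl_posterior(A, y, gamma, sigma2)
        gamma = np.maximum(update_rule(mu, Sigma_diag, gamma), 1e-12)  # keep variances positive
    return gamma
```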

If this is right

  • Popular SBL methods now carry rigorous convergence guarantees from the MM framework.
  • The shared majorizer view unifies existing SBL algorithms and allows systematic expansion of the algorithm family.
  • The learned neural update rule can be substituted into any SBL iteration while preserving the overall algorithmic structure.
  • Because complexity is independent of matrix size, the same trained model applies to both small and large recovery problems without retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same training strategy could be used to discover improved update rules for other families of iterative signal-processing algorithms.
  • Training across parameterized dictionaries may let the network adapt automatically to changes in physical parameters such as array geometry or frequency.
  • Zero-shot transfer to new matrices suggests that data-driven algorithmic learning can produce rules that generalize beyond the hand-designed majorizers they replace.

Load-bearing premise

The update rule learned by the neural network on a limited set of training matrices and parameter ranges will continue to work reliably on previously unseen measurement matrices and problem instances.

What would settle it

A clear drop in recovery performance when the trained network is applied to measurement matrices whose dimensions, conditioning, or parameter ranges lie outside the training distribution.
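
One hypothetical way to run that check: hold the trained recovery routine fixed and compare its average NMSE on matrices drawn from the training ensemble against ensembles that break the training assumptions (different dimensions, worse conditioning, shifted parameter ranges). The `recover` and `make_matrix` hooks below are placeholders, not interfaces defined in the paper.

```python
import numpy as np

def nmse_db(x_true, x_hat):
    """Normalized mean-squared error in dB, the usual sparse-recovery metric."""
    return 10 * np.log10(np.sum((x_true - x_hat) ** 2) / np.sum(x_true ** 2))

def stress_test(recover, make_matrix, n_trials=200, N=30, M=120, sparsity=5, snr_db=30, seed=0):
    """Average NMSE of a fixed recovery routine over a chosen matrix ensemble.

    recover(A, y) is the trained model under test; make_matrix(N, M) draws matrices
    from the ensemble being probed (in-distribution or deliberately out-of-distribution).
    """
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_trials):
        A = make_matrix(N, M)
        x = np.zeros(M)
        support = rng.choice(M, size=sparsity, replace=False)
        x[support] = rng.standard_normal(sparsity)
        y_clean = A @ x
        noise = rng.standard_normal(N)
        noise *= np.linalg.norm(y_clean) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
        errs.append(nmse_db(x, recover(A, y_clean + noise)))
    return float(np.mean(errs))

# A clear gap between stress_test(model, training_ensemble) and
# stress_test(model, out_of_distribution_ensemble) would settle the question.
```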

Figures

Figures reproduced from arXiv: 2604.02513 by Bhaskar D. Rao, Danijela Cabric, Kuan-Lin Chen, Rushabha Balaji.

Figure 1. Neural network architecture for learning the new SBL update rule.

Figure 2. Architecture of the j-th iteration: six layers with skip connections; the red dots represent the summation operation. The update is modelled as γ_{j+1}[i] = h(γ̂_j[i], T_1(γ̂_j)[i], T_2(γ̂_j)[i]; θ_j) + g …

Figure 3. Performance curves of the EM, MU-SBL (p = 1), and p-SBL (p = 0.25, 0.5, 0.75) update rules as a function of the number of non-zero elements for the array matrix (N = 30, M = 120, SNR = 30 dB). Panel (a), array matrix: average number of iterations for convergence versus diversity (d/n).

Figure 4. Convergence rates of the EM, MU-SBL (p = 1), and p-SBL (p = 0.25, 0.5, 0.75) update rules as a function of the number of non-zero elements (N = 30, M = 120, SNR = 40 dB).

Figure 5. Value of the SBL objective function as a function of …

Figure 6. Performance curves of the EM, MU-SBL (p = 1), and p-SBL (p = 0.25, 0.5, 0.75) update rules as a function of the number of non-zero elements for the random matrix (N = 30, M = 120, SNR = 40 dB).

Figure 7. Evolution of the weights as a function of iterations, shown for different numbers of iterations.

Figure 8. Performance of the proposed neural network architecture on different types of measurement matrices.
read the original abstract

Sparse Bayesian Learning is one of the most popular sparse signal recovery methods, and various algorithms exist under the SBL paradigm. However, given a performance metric and a sparse recovery problem, it is difficult to know a-priori the best algorithm to choose. This difficulty is in part due to a lack of a unified framework to derive SBL algorithms. We address this issue by first showing that the most popular SBL algorithms can be derived using the majorization-minimization (MM) principle, providing hitherto unknown convergence guarantees to this class of SBL methods. Moreover, we show that the two most popular SBL update rules not only fall under the MM framework but are both valid descent steps for a common majorizer, revealing a deeper analytical compatibility between these algorithms. Using this insight and properties from MM theory we expand the class of SBL algorithms, and address finding the best SBL algorithm via data within the MM framework. Second, we go beyond the MM framework by introducing the powerful modeling capabilities of deep learning to further expand the class of SBL algorithms, aiming to learn a superior SBL update rule from data. We propose a novel deep learning architecture that can outperform the classical MM based ones across different sparse recovery problems. Our architecture's complexity does not scale with the measurement matrix dimension, hence providing a unique opportunity to test generalization capability across different matrices. For parameterized dictionaries, this invariance allows us to train and test the model across different parameter ranges. We also showcase our model's ability to learn a functional mapping by its zero-shot performance on unseen measurement matrices. Finally, we test our model's performance across different numbers of snapshots, signal-to-noise ratios, and sparsity levels.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that popular SBL algorithms can be derived via the majorization-minimization (MM) principle, yielding convergence guarantees, and that the two most popular update rules are valid descent steps for a common majorizer. It expands the SBL class using MM theory and introduces a deep neural network architecture to learn superior update rules from data. The NN is asserted to outperform classical MM-based SBL methods across sparse recovery problems, with complexity independent of measurement matrix dimension, enabling zero-shot generalization to unseen matrices and tests across snapshots, SNR, and sparsity levels.

Significance. If the results hold, the MM unification of SBL algorithms is a meaningful contribution, supplying convergence guarantees and exposing compatibility between popular rules that were not previously connected in this way. The dimension-invariant NN approach offers a data-driven route to expand the algorithm class, with potential to improve sparse recovery performance when the claimed invariance and generalization are substantiated.

major comments (2)
  1. [Abstract] The zero-shot generalization claim for the NN update rule on unseen measurement matrices is load-bearing for the central data-driven contribution, yet the manuscript provides no details on matrix ensembles, condition-number ranges, or quantitative degradation metrics, leaving the invariance assertion unverified beyond the reported regimes.
  2. [Neural-network architecture] The statement that complexity is independent of measurement-matrix dimension and thereby permits training on specific matrices with zero-shot testing on others requires an explicit description of the input/output mapping (e.g., how the network avoids explicit dependence on matrix size) to confirm the claimed functional invariance.
minor comments (1)
  1. [Abstract] The phrase 'parameterized dictionaries' is used without defining the parameters or the ranges over which training and testing occur; defining them would clarify the scope of the reported invariance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. The MM unification and convergence analysis are central to the paper, and the dimension-invariant NN is presented as a practical route to data-driven SBL. We address the two major comments below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract] The zero-shot generalization claim for the NN update rule on unseen measurement matrices is load-bearing for the central data-driven contribution, yet the manuscript provides no details on matrix ensembles, condition-number ranges, or quantitative degradation metrics, leaving the invariance assertion unverified beyond the reported regimes.

    Authors: We agree that the zero-shot generalization claim requires more explicit support. In the revised manuscript we will add a dedicated subsection (new Section 5.3) that specifies the matrix ensembles used for training and testing (Gaussian, partial Fourier, and random Toeplitz), reports the condition-number ranges encountered (median 8.4, 95th percentile 27), and supplies quantitative degradation metrics (average NMSE increase of 0.8 dB and worst-case 2.1 dB when moving from training to unseen matrices of the same ensemble). These numbers will be accompanied by the corresponding figures that were previously only summarized in the text. revision: yes

  2. Referee: [Neural-network architecture] The statement that complexity is independent of measurement-matrix dimension and thereby permits training on specific matrices with zero-shot testing on others requires an explicit description of the input/output mapping (e.g., how the network avoids explicit dependence on matrix size) to confirm the claimed functional invariance.

    Authors: We accept the request for an explicit description. The revised Section 4.2 will contain a new paragraph and accompanying diagram that detail the input/output mapping: the network receives only the vectorized diagonal of the Gram matrix and the current posterior variance vector (both of fixed length equal to the signal dimension N), never the full measurement matrix A. Consequently the parameter count and per-iteration complexity remain strictly O(N) and independent of the number of measurements M, which is the source of the claimed functional invariance. We will also add a short proof sketch showing that this mapping is permutation-equivariant with respect to column ordering of A. revision: yes
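
As an illustration only (the rebuttal above is simulated, and this is not the paper's architecture): one way to keep the parameter count fixed regardless of matrix size is to apply a single small shared network to per-coordinate scalar features, so the same weights serve every index and every problem dimension. The feature choice below (current γ[i], squared posterior mean, posterior variance) is an assumption made for the sketch.

```python
import torch
import torch.nn as nn

class PerCoordinateUpdate(nn.Module):
    """Sketch of a dimension-invariant gamma update: one shared MLP applied per coordinate,
    so the number of parameters is fixed and never depends on N or M."""

    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.Softplus(),
            nn.Linear(hidden, 1),
            nn.Softplus(),  # keeps the updated variances positive
        )

    def forward(self, gamma, mu, sigma_diag):
        # gamma, mu, sigma_diag: shape (M,), one scalar per coordinate of the sparse vector
        feats = torch.stack([gamma, mu ** 2, sigma_diag], dim=-1)  # (M, 3)
        return self.net(feats).squeeze(-1)                         # (M,) updated gamma
```

Because the same weights are reused for every coordinate, such a module applies to problems of any size and is permutation-equivariant over coordinates by construction.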

Circularity Check

0 steps flagged

MM derivations rely on standard theory; NN component is data-trained and evaluated on held-out cases with no reduction to fitted inputs

full rationale

The paper first places popular SBL update rules inside the majorization-minimization (MM) framework using established majorization theory, showing both rules are valid descent steps on a shared majorizer. This step rests on external MM properties rather than self-referential fitting or self-citation. The subsequent neural-network architecture is trained on data generated from specific matrices and parameter ranges, then evaluated on held-out instances and zero-shot on unseen matrices. No equation or claim reduces a prediction to a fitted constant by construction, and no load-bearing premise depends on an unverified self-citation chain. The derivation chain therefore remains self-contained against external benchmarks and empirical test sets.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the applicability of majorization-minimization to the SBL objective and standard assumptions of deep learning (sufficient training data, appropriate loss, generalization from seen to unseen matrices). No new physical entities or ad-hoc constants are introduced.

axioms (1)
  • domain assumption: the majorization-minimization principle can be applied to derive SBL update rules with descent guarantees.
    Invoked to unify existing algorithms and supply convergence results; the standard descent argument is restated below.
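
For reference, the guarantee this assumption buys is the standard MM sandwich argument (textbook MM theory, not a result specific to this paper): any γ-update that does not increase the majorizer cannot increase the SBL objective.

```latex
% u(\cdot \mid \gamma^{(t)}) majorizes \mathcal{L}: u(\gamma \mid \gamma^{(t)}) \ge \mathcal{L}(\gamma)
% for all \gamma, with equality at \gamma = \gamma^{(t)}. Then, for any update satisfying
% u(\gamma^{(t+1)} \mid \gamma^{(t)}) \le u(\gamma^{(t)} \mid \gamma^{(t)}),
\mathcal{L}\bigl(\gamma^{(t+1)}\bigr)
\;\le\; u\bigl(\gamma^{(t+1)} \mid \gamma^{(t)}\bigr)
\;\le\; u\bigl(\gamma^{(t)} \mid \gamma^{(t)}\bigr)
\;=\; \mathcal{L}\bigl(\gamma^{(t)}\bigr).
```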

pith-pipeline@v0.9.0 · 5623 in / 1335 out tokens · 52335 ms · 2026-05-13T20:15:45.397976+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages
