
arxiv: 2605.15456 · v1 · pith:GXUNGVJGnew · submitted 2026-05-14 · 📡 eess.IV · cs.CV · math.OC

DIPA: Distilled Preconditioned Algorithms for Solving Imaging Inverse Problems

Pith reviewed 2026-05-19 14:23 UTC · model grok-4.3

classification 📡 eess.IV cs.CV math.OC
keywords preconditioning, knowledge distillation, inverse problems, imaging reconstruction, MRI, compressed sensing, super-resolution

The pith

Preconditioning operators learned by distilling from a teacher with an ideal sensing matrix improve both convergence speed and final reconstruction quality in ill-conditioned imaging problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that preconditioning can be repurposed from its traditional role of accelerating convergence to directly enhancing the quality of solutions in imaging inverse problems. It introduces DIPA, in which a preconditioning operator is optimized through teacher-guided distillation. The teacher operates with a simulated, better-conditioned sensing matrix while the student must use only the real, physically constrained, ill-conditioned matrix. Different distillation losses are designed to transfer convergence behavior and reconstruction properties from teacher to student. This matters because many real imaging systems cannot change their hardware to improve conditioning, so any method that extracts better performance from the existing matrix could raise image quality without new acquisitions.

Core claim

By optimizing a preconditioning operator via distillation criteria that let a teacher algorithm use a simulated better-conditioned sensing matrix, the resulting DIPA student algorithm achieves improved reconstruction quality and convergence even when it is restricted to the physically feasible ill-conditioned sensing matrix available in practice.

What carries the argument

The preconditioning operator (PO) transforms the gradient step of the underlying optimization algorithm. It can be linear (L-DIPA), for interpretability, or non-linear (N-DIPA), parametrized by a neural network, for scalability.
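
As a rough illustration of what transforming the gradient step means in the linear case, the sketch below applies a matrix P to the least-squares data-fidelity gradient inside a plain gradient-descent loop. The function name, the step size, the toy sensing matrix, and the absence of any prior or denoiser are assumptions made for illustration; the solvers named in the paper's figures (PnP-FISTA, RED-FISTA) are more elaborate.

```python
import numpy as np

def preconditioned_gradient_descent(A, y, P, step=1.0, n_iters=50):
    """Minimise 0.5 * ||A x - y||^2 with a linearly preconditioned gradient step.

    Hypothetical helper: P multiplies the data-fidelity gradient A^T (A x - y).
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - y)   # data-fidelity gradient
        x = x - step * (P @ grad)  # preconditioned update
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # toy full-column-rank sensing matrix (assumption)
x_true = rng.standard_normal(5)
y = A @ x_true

# Classical reference point: with P = (A^T A)^{-1}, a single unit step
# from x = 0 lands on the least-squares solution.
P_ideal = np.linalg.inv(A.T @ A)
x_hat = preconditioned_gradient_descent(A, y, P_ideal, step=1.0, n_iters=1)
```

In an ill-conditioned or rank-deficient setting this inverse is not available, which is where a learned operator, as DIPA proposes, would take the place of `P_ideal`.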

If this is right

  • Linear POs allow direct interpretation of how the preconditioning modifies the data-fidelity gradient.
  • Non-linear POs parametrized by neural networks offer greater flexibility and scalability across different imaging modalities.
  • Different distillation loss functions can be chosen to emphasize transfer of either convergence speed or final image fidelity.
  • The approach is validated on magnetic resonance imaging, compressed sensing, and super-resolution tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same distillation principle could be tested on other linear inverse problems outside imaging, such as tomographic reconstruction or deconvolution.
  • If the learned PO captures properties that generalize across datasets, it might reduce the need for modality-specific hand-crafted priors.
  • Combining DIPA with existing plug-and-play or deep-unfolding frameworks could produce hybrid solvers that inherit both classical guarantees and learned performance.

Load-bearing premise

A teacher algorithm given access to a simulated better-conditioned sensing matrix can successfully transfer useful convergence and quality properties to a student algorithm that must operate only with the real ill-conditioned matrix.

What would settle it

On real measured data from MRI, compressed sensing, or super-resolution, the DIPA student produces reconstructions whose quality and convergence rate are statistically indistinguishable from or worse than the same base algorithm run without the learned preconditioning operator.

Figures

Figures reproduced from arXiv: 2605.15456 by Henry Arguello, Leon Suarez, Roman Jacome, Romario Gualdrón-Hurtado.

Figure 1: DIPA framework. A teacher algorithm uses a simulated, better-conditioned sensing matrix and transfers reconstruction behavior through output imitation and data-fidelity gradient alignment. The deployable student uses the physically feasible sensing matrix and a learned preconditioning operator (PO).
… optimization solvers to minimize the data-fidelity term and the implicit regularization function (defined by…
Figure 2: Natural logarithmic representation of the linear PO, log(PO + 1), for different tasks and resolutions.
Panel labels (PSNR): Baseline (22.69), L-DIPA (37.44), Teacher (37.98).
Figure 3: Visual results and PSNR for Compressive Sensing (128 × 128) with RED-FISTA, 𝛾𝑡 = 0.7, 𝛾𝑠 = 0.2, 𝐶 (✗), 𝑆 (✓) with the BSDS500 dataset [3].
… learning rate 1 × 10⁻⁵
Figure 4: Signal convergence with state-of-the-art preconditioning methods for SR task.
Panel labels (PSNR): Baseline, L-DIPA, N-DIPA, Teacher.
Figure 5: Visual results and PSNR for PnP-FISTA with different preconditioning methods in SPC, MRI, and SR.
… as ViT, despite their widely known potential, are not able to generalize the gradient mapping from student to teacher. Regardless of the configuration, the MLP has more parameters than 𝐏, but it does not perform adequately. ConvNeXt gives the best trade-off in this ablation, likely because its convolutional st…
Figure 6: Visual results and PSNR for Super Resolution (110 × 110) with RED-FISTA, 𝑅𝐹𝑡 = 1, 𝑅𝐹𝑠 = 4, 𝐶 (✓), 𝑆 (✗) with the CelebA dataset [48].
Figure 7: (a) Ablation results in terms of PSNR of different NNs for SPC. (b) Number of features and positional encoding usage in ConvNeXt.
5. Limitations. One important drawback of our approach is the heavy training cost, because we must back-propagate through every iteration of the recovery loop, and optimizing the preconditioner becomes computationally intensive. Another limitation is that DIPA cannot create infor…
Figure 8: Linear representation of the learned nonlinear PO with N-DIPA.
… on the training distribution, not as a guarantee of recovering arbitrary null-space components. Moreover, our teacher operator 𝐀𝑡 is currently selected via heuristic rules, leveraging the known structure of the sensing matrix and empirical validation, but a principled, theory-driven criterion for choosing or learning the optimal 𝐀𝑡 remains an i…
Original abstract

Solving imaging inverse problems has usually been addressed by designing proper prior models of the underlying signal. However, minimizing the data fidelity term poses significant challenges due to the ill-conditioned sensing matrix caused by physical constraints in the acquisition system. Thus, preconditioning techniques have been adopted in classical optimization theory to address ill-conditioned data-fidelity minimization by transforming the algorithm gradient step to achieve faster convergence and better numerical stability. We extend the preconditioning concept beyond convergence acceleration and use it to improve reconstruction quality. We introduce DIPA: Distilled Preconditioned Algorithms, where a preconditioning operator (PO) is optimized using teacher-guided distillation criteria. Unlike standard model-compression KD, the teacher and student differ by the sensing operators available during reconstruction: the teacher uses a simulated, better-conditioned, and more informative sensing matrix, whereas the student uses the physically feasible sensing matrix. We design different distillation loss functions to transfer different properties of the teacher algorithm to the preconditioned student. The PO can be linear (L-DIPA), allowing interpretability, or non-linear (N-DIPA), parametrized by a neural network, offering better scalability. We validate the proposed PO design across several imaging modalities, including magnetic resonance imaging, compressed sensing, and super-resolution imaging.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces DIPA (Distilled Preconditioned Algorithms) for imaging inverse problems. It extends preconditioning beyond convergence acceleration by optimizing a preconditioning operator (PO) via teacher-guided distillation. The teacher uses a simulated better-conditioned sensing matrix while the student uses the physical ill-conditioned matrix; distillation losses transfer convergence behavior and reconstruction quality. The PO may be linear (L-DIPA, for interpretability) or nonlinear (N-DIPA, via neural network). Validation is claimed across MRI, compressed sensing, and super-resolution.

Significance. If the central claim holds, the work provides a novel mechanism to improve reconstruction quality in physically constrained inverse problems by distilling from a more informative simulated operator. This goes beyond standard preconditioning and model compression, with potential applicability to multiple modalities. The distinction between linear and nonlinear POs and the explicit teacher-student sensing-matrix mismatch are clear strengths.

major comments (2)
  1. [Validation / Experiments] The central claim that distillation transfers reconstruction-quality improvements (not merely convergence speed) to the student on the physical sensing matrix is load-bearing but rests on the unverified assumption that the teacher's trajectory and output statistics remain relevant to the student's data-fidelity term. Experiments must demonstrate that the learned PO yields measurably higher fidelity to the actual measurements (e.g., via data-consistency metrics or PSNR gains over non-distilled preconditioners) rather than simply mimicking the teacher on a different problem.
  2. [Method / Distillation criteria] The distillation loss weights are listed as free parameters; the manuscript should report sensitivity analysis or cross-validation showing that quality gains are robust rather than tuned to specific weight choices that favor the teacher.
minor comments (2)
  1. [Abstract] The abstract states validation 'across several imaging modalities' but provides no quantitative results, error bars, or ablation tables; these must appear in the main text with clear baselines (standard preconditioners, non-distilled methods, and the teacher itself).
  2. [Method] Notation for the preconditioning operator (PO) and the two sensing matrices should be introduced with explicit symbols early in the method section to avoid ambiguity when comparing teacher and student updates.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and recommendation for major revision. We address the major comments below by providing additional experiments and analysis in the revised manuscript.

  1. Referee: [Validation / Experiments] The central claim that distillation transfers reconstruction-quality improvements (not merely convergence speed) to the student on the physical sensing matrix is load-bearing but rests on the unverified assumption that the teacher's trajectory and output statistics remain relevant to the student's data-fidelity term. Experiments must demonstrate that the learned PO yields measurably higher fidelity to the actual measurements (e.g., via data-consistency metrics or PSNR gains over non-distilled preconditioners) rather than simply mimicking the teacher on a different problem.

    Authors: We agree that demonstrating improved data fidelity on the student's physical sensing matrix is crucial to support the central claim. The original manuscript reported PSNR and SSIM improvements on the target problem. To address this point directly, we have added data-consistency metrics (the normalized data-fidelity error ||A_s x - y|| / ||y||, where A_s is the student's sensing matrix) to the revised experiments section. These show that DIPA achieves better consistency with the measurements than non-distilled preconditioned methods and other baselines, while also improving reconstruction quality. This indicates that the transferred knowledge enhances performance on the actual problem rather than just mimicking the teacher. revision: yes

  2. Referee: [Method / Distillation criteria] The distillation loss weights are listed as free parameters; the manuscript should report sensitivity analysis or cross-validation showing that quality gains are robust rather than tuned to specific weight choices that favor the teacher.

    Authors: We acknowledge the need for sensitivity analysis on the distillation loss weights. In the revised manuscript, we have included a new set of experiments performing sensitivity analysis by varying the weights of the distillation losses (e.g., trajectory matching and output matching terms) across a grid of values. The results, presented in a new table and figure, demonstrate that the quality gains (in terms of PSNR and data fidelity) are robust within a reasonable range of weight choices and do not critically depend on specific tunings that overly favor the teacher. We have also added a brief discussion on how the weights were selected. revision: yes
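
The data-consistency metric quoted in the first response, ||A_s x - y|| / ||y||, can be sketched as follows. This is a minimal illustration that assumes Euclidean norms and a dense matrix for the student operator; the function name and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def normalized_data_fidelity_error(A_s, x, y):
    """Return ||A_s x - y|| / ||y|| using Euclidean norms (assumed)."""
    return np.linalg.norm(A_s @ x - y) / np.linalg.norm(y)

rng = np.random.default_rng(0)
A_s = rng.standard_normal((8, 32))   # toy underdetermined student operator (assumption)
x_true = rng.standard_normal(32)
y = A_s @ x_true                     # noiseless measurements

# A reconstruction that reproduces the measurements scores zero;
# the all-zero image scores one, since the residual is then -y.
err_exact = normalized_data_fidelity_error(A_s, x_true, y)
err_zero = normalized_data_fidelity_error(A_s, np.zeros(32), y)
```

A low value only says the reconstruction agrees with the measurements; for an underdetermined operator many images do so, which is why the response pairs this metric with PSNR and SSIM.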

Circularity Check

0 steps flagged

No circularity: DIPA is an empirical distillation method with external validation


The paper introduces DIPA as a learned preconditioning operator optimized via teacher-guided distillation, where the teacher uses a simulated better-conditioned sensing matrix and the student uses the physical ill-conditioned one. No equations, derivations, or self-citations are shown that reduce the claimed reconstruction-quality improvements to fitted inputs by construction or to prior self-referential results. The approach is presented as a methodological extension validated across MRI, compressed sensing, and super-resolution, relying on external empirical benchmarks rather than self-definitional loops or load-bearing self-citations. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that a teacher with an idealized sensing matrix can usefully supervise a student operating under physical constraints; no free parameters or invented entities are explicitly named in the abstract.

free parameters (1)
  • distillation loss weights
    Weights balancing different teacher-guided criteria are introduced to transfer properties and are expected to be chosen or tuned.
axioms (1)
  • domain assumption: A simulated better-conditioned sensing matrix produces a teacher algorithm whose behavior is worth transferring to the student.
    Invoked when the paper states that the teacher uses a simulated, better-conditioned matrix while the student uses the physical one.

pith-pipeline@v0.9.0 · 5764 in / 1207 out tokens · 43249 ms · 2026-05-19T14:23:25.107043+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    We extend the preconditioning concept beyond convergence acceleration and use it to improve reconstruction quality. We introduce DIPA: Distilled Preconditioned Algorithms, where a preconditioning operator (PO) is optimized using teacher-guided distillation criteria. ... the teacher uses a simulated, better-conditioned, and more informative sensing matrix, whereas the student uses the physically feasible sensing matrix.

  • IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

    Relation between the paper passage and the cited Recognition theorem.

    L-DIPA formulates the PO as PM where P in R^{n x n} ... convergence-motivated discrepancy term ... R_C(P, A_s, A_t, x) = alpha ||(P A_s^T A_s - A_t^T A_t) x||
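
The convergence-motivated term quoted in the passage above, R_C(P, A_s, A_t, x) = alpha ||(P A_s^T A_s - A_t^T A_t) x||, can be evaluated numerically as in the sketch below. The passage is elided, so the choice of a plain Euclidean norm, the function name, and the toy teacher and student matrices are all assumptions made for illustration.

```python
import numpy as np

def convergence_discrepancy(P, A_s, A_t, x, alpha=1.0):
    """alpha * ||(P A_s^T A_s - A_t^T A_t) x|| with a Euclidean norm (assumed)."""
    residual = P @ (A_s.T @ (A_s @ x)) - A_t.T @ (A_t @ x)
    return alpha * np.linalg.norm(residual)

rng = np.random.default_rng(0)
n = 6
A_t = rng.standard_normal((n, n))   # toy teacher operator (assumption)
A_s = A_t[:3]                       # toy student: keeps the first 3 rows of the teacher
x = rng.standard_normal(n)

# With P = I the term measures the gap between student and teacher
# normal operators applied to x; it vanishes when the operators coincide.
r_identity = convergence_discrepancy(np.eye(n), A_s, A_t, x)
r_same = convergence_discrepancy(np.eye(n), A_t, A_t, x)
```

Read this way, minimising the term over P pushes the preconditioned student gradient direction toward the teacher's on the signals x used for training.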

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 6 internal anchors

  1. [1]

    Preconditioning strategies for nonlinear conjugate gradient methods, based on quasi-newton updates, in: AIP conference proceedings, AIP Publishing

    Andrea, C., Giovanni, F., Massimo, R., 2016. Preconditioning strategies for nonlinear conjugate gradient methods, based on quasi-newton updates, in: AIP conference proceedings, AIP Publishing

  2. [2]

    Adaptive precision in block-jacobi preconditioning for iterative sparse linear system solvers

    Anzt, H., Dongarra, J., Flegar, G., Higham, N.J., Quintana-Ortí, E.S., 2019. Adaptive precision in block-jacobi preconditioning for iterative sparse linear system solvers. Concurrency and Computation: Practice and Experience 31, e4460

  3. [3]

    Contour detection and hierarchical image segmentation

    Arbelaez, P., Maire, M., Fowlkes, C., Malik, J., 2010. Contour detection and hierarchical image segmentation. IEEE transactions on pattern analysis and machine intelligence 33, 898–916

  4. [4]

    Deep learning methods for solving linear inverse problems: Research directions and paradigms

    Bai, Y., Chen, W., Chen, J., Guo, W., 2020. Deep learning methods for solving linear inverse problems: Research directions and paradigms. Signal Processing 177, 107729

  5. [5]

    A fast iterative shrinkage-thresholding algorithm for linear inverse problems

    Beck, A., Teboulle, M., 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences 2, 183–202

  6. [6]

    Introduction to inverse problems in imaging

    Bertero, M., Boccacci, P., De Mol, C., 2021. Introduction to inverse problems in imaging. CRC press

  7. [7]

    Linear inverse problems with discrete data

    Bertero, M., De Mol, C., Pike, E.R., 1985. Linear inverse problems with discrete data. i. general formulation and singular system analysis. Inverse problems 1, 301

  8. [8]

    Distributed optimization and statistical learning via the alternating direction method of multipliers

    Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al., 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning 3, 1–122

  9. [9]

    Non-Local Means Denoising

    Buades, A., Coll, B., Morel, J.M., 2011. Non-Local Means Denoising. Image Processing On Line 1, 208–212

  10. [10]

    Image denoising: Can plain neural networks compete with bm3d?, in: 2012 IEEE conference on computer vision and pattern recognition, IEEE

    Burger, H.C., Schuler, C.J., Harmeling, S., 2012. Image denoising: Can plain neural networks compete with bm3d?, in: 2012 IEEE conference on computer vision and pattern recognition, IEEE. pp. 2392–2399

  11. [11]

    An introduction to compressive sampling

    Candes, E.J., Wakin, M.B., 2008. An introduction to compressive sampling. IEEE Signal Processing Magazine 25, 21–30. doi:10.1109/MSP.2007.914731

  12. [12]

    Plug-and-play admm for image restoration: Fixed-point convergence and applications

    Chan, S.H., Wang, X., Elgendy, O.A., 2016. Plug-and-play admm for image restoration: Fixed-point convergence and applications. IEEE Transactions on Computational Imaging 3, 84–98

  13. [13]

    Nonlinearly preconditioned krylov subspace methods for discrete newton algorithms

    Chan, T.F., Jackson, K.R., 1984. Nonlinearly preconditioned krylov subspace methods for discrete newton algorithms. SIAM Journal on scientific and statistical computing 5, 533–542. R. Gualdrón-Hurtado et al.: Preprint submitted to Elsevier. Page 15 of 17. DIPA for Solving Imaging Inverse Problems

  14. [14]

    Learning efficient object detection models with knowledge distillation, in: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R

    Chen, G., Choi, W., Yu, X., Han, T., Chandraker, M., 2017. Learning efficient object detection models with knowledge distillation, in: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (Eds.), Advances in Neural Information Processing Systems, Curran Associates, Inc. URL: https://proceedings.neurips.cc/paper_fil...

  15. [15]

    Multiscale cholesky preconditioning for ill-conditioned problems

    Chen, J., Schäfer, F., Huang, J., Desbrun, M., 2021. Multiscale cholesky preconditioning for ill-conditioned problems. ACM Transactions on Graphics (TOG) 40, 1–13

  16. [16]

    Image denoising with block-matching and 3d filtering, in: Image processing: algorithms and systems, neural networks, and machine learning, SPIE

    Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K., 2006. Image denoising with block-matching and 3d filtering, in: Image processing: algorithms and systems, neural networks, and machine learning, SPIE. pp. 354–365

  17. [17]

    Multilevel preconditioning and adaptive sparse solution of inverse problems

    Dahlke, S., Fornasier, M., Raasch, T., 2012. Multilevel preconditioning and adaptive sparse solution of inverse problems. Mathematics of Computation 81, 419–446

  18. [18]

    A preconditioner for a primal-dual newton conjugate gradient method for compressed sensing problems

    Dassios, I., Fountoulakis, K., Gondzio, J., 2015. A preconditioner for a primal-dual newton conjugate gradient method for compressed sensing problems. SIAM Journal on Scientific Computing 37, A2783–A2812

  19. [19]

    An iterative thresholding algorithm for linear inverse problems with a sparsity constraint

    Daubechies, I., Defrise, M., De Mol, C., 2004. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 57, 1413–1457

  20. [20]

    Inversion by direct iteration: An alternative to denoising diffusion for image restoration

    Delbracio, M., Milanfar, P., 2023. Inversion by direct iteration: An alternative to denoising diffusion for image restoration. arXiv preprint arXiv:2303.11435

  21. [21]

    The mnist database of handwritten digit images for machine learning research [best of the web]

    Deng, L., 2012. The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE signal processing magazine 29, 141–142

  22. [22]

    Numerical methods for unconstrained optimization and nonlinear equations

    Dennis Jr, J.E., Schnabel, R.B., 1996. Numerical methods for unconstrained optimization and nonlinear equations. SIAM

  23. [23]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Dosovitskiy, A., et al., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929

  24. [24]

    Single-pixel imaging via compressive sampling

    Duarte, M.F., Davenport, M.A., Takhar, D., Laska, J.N., Sun, T., Kelly, K.F., Baraniuk, R.G., 2008. Single-pixel imaging via compressive sampling. IEEE signal processing magazine 25, 83–91

  25. [25]

    Learning preconditioners for inverse problems

    Ehrhardt, et al., 2024. Learning preconditioners for inverse problems. arXiv preprint arXiv:2406.00260

  26. [26]

    Conjugate-gradient preconditioning methods for shift-variant pet image reconstruction

    Fessler, J.A., Booth, S.D., 1999. Conjugate-gradient preconditioning methods for shift-variant pet image reconstruction. IEEE transactions on image processing 8, 688–699

  27. [27]

    On the origins of linear and non-linear preconditioning, in: Domain decomposition methods in science and engineering XXIII, Springer

    Gander, M.J., 2017. On the origins of linear and non-linear preconditioning, in: Domain decomposition methods in science and engineering XXIII, Springer. pp. 153–161

  28. [28]

    Image restoration by denoising diffusion models with iteratively preconditioned guidance, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

    Garber, T., Tirer, T., 2024. Image restoration by denoising diffusion models with iteratively preconditioned guidance, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 25245–25254

  29. [29]

    Domain transform for edge-aware image and video processing, in: ACM SIGGRAPH 2011 Papers, Association for Computing Machinery, New York, NY, USA

    Gastal, E.S.L., Oliveira, M.M., 2011. Domain transform for edge-aware image and video processing, in: ACM SIGGRAPH 2011 Papers, Association for Computing Machinery, New York, NY, USA. URL: https://doi.org/10.1145/1964921.1964964, doi:10.1145/1964921.1964964

  30. [30]

    Tikhonov regularization and total least squares

    Golub, G.H., Hansen, P.C., O’Leary, D.P., 1999. Tikhonov regularization and total least squares. SIAM journal on matrix analysis and applications 21, 185–194

  31. [31]

    Knowledge distillation: A survey

    Gou, J., Yu, B., Maybank, S.J., Tao, D., 2021. Knowledge distillation: A survey. International Journal of Computer Vision 129, 1789–1819

  32. [32]

    GSNR: Graph Smooth Null-Space Representation for Inverse Problems

    Gualdrón-Hurtado, R., Jacome, R., Suarez, R.S., Arguello, H., 2026. GSNR: Graph Smooth Null-Space Representation for Inverse Problems. arXiv preprint arXiv:2602.20328

  33. [33]

    Gualdrón-Hurtado, R., Jacome, R., Suarez, L., Galvis, L., Arguello, H., 2025. Deep distillation gradient preconditioning for inverse problems, in: 2025 IEEE 10th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 166–170. doi:10.1109/CAMSAP66162.2025.11423951

  34. [34]

    Image restoration

    Gunturk, B., Li, X., 2018. Image restoration. CRC Press

  35. [35]

    Half-quadratic-based iterative minimization for robust sparse representation

    He, R., Zheng, W.S., Tan, T., Sun, Z., 2013. Half-quadratic-based iterative minimization for robust sparse representation. IEEE transactions on pattern analysis and machine intelligence 36, 261–275

  36. [36]

    Distilling the Knowledge in a Neural Network

    Hinton, G., Vinyals, O., Dean, J., 2015. Distilling the knowledge in a neural network. URL:https://arxiv.org/abs/1503.02531, arXiv:1503.02531

  37. [37]

    Polynomial preconditioners for regularized linear inverse problems

    Iyer, S.S., Ong, F., Cao, X., Liao, C., Daniel, L., Tamir, J.I., Setsompop, K., 2024. Polynomial preconditioners for regularized linear inverse problems. SIAM Journal on Imaging Sciences 17, 116–146

  38. [38]

    NPN: Non-Linear Projections of the Null-Space for Imaging Inverse Problems

    Jacome, R., Gualdrón-Hurtado, R., Suárez-Rodríguez, L., Arguello, H., 2026. NPN: Non-Linear Projections of the Null-Space for Imaging Inverse Problems. Advances in Neural Information Processing Systems 38, 119069–119099

  39. [39]

    Sparsity regularization in inverse problems

    Jin, B., Maaß, P., Scherzer, O., 2017. Sparsity regularization in inverse problems. Inverse Problems 33

  40. [40]

    Polynomial preconditioners for conjugate gradient calculations

    Johnson, O.G., Micchelli, C.A., Paul, G., 1983. Polynomial preconditioners for conjugate gradient calculations. SIAM Journal on Numerical Analysis 20, 362–376

  41. [41]

    Plug-and-play methods for integrating physical and learned models in computational imaging: Theory, algorithms, and applications

    Kamilov, et al., 2023. Plug-and-play methods for integrating physical and learned models in computational imaging: Theory, algorithms, and applications. IEEE Sig. Proc. Mag. 40, 85–97

  42. [42]

    Adam: A Method for Stochastic Optimization

    Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

  43. [43]

    FastMRI: A publicly available raw k-space and DICOM dataset of knee images for accelerated MR image reconstruction using machine learning

    Knoll, F., Zbontar, J., Sriram, A., Muckley, M.J., Bruno, M., Defazio, A., Parente, M., Geras, K.J., Katsnelson, J., Chandarana, H., Zhang, Z., Drozdzalv, M., Romero, A., Rabbat, M., Vincent, P., Pinkerton, J., Wang, D., Yakubova, N., Owens, E., Zitnick, C.L., Recht, M.P., Sodickson, D.K., Lui, Y.W., 2020. FastMRI: A publicly available raw k-space and DICOM datas...

  44. [44]

    Gradient-based learning applied to document recognition

    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278–2324

  45. [45]

    Li, J., Yang, H., Yi, Q., Fang, F., Gao, G., Zeng, T., Zhang, G., 2022. Multiple degradation and reconstruction network for single image denoising via knowledge distillation, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 558–567.

  46. [46]

    A note on adaptive nonlinear preconditioning techniques

    Liu, L., Keyes, D.E., Krause, R., 2018. A note on adaptive nonlinear preconditioning techniques. SIAM Journal on Scientific Computing 40, A1171–A1186. URL: https://doi.org/10.1137/17M1128502, doi:10.1137/17M1128502

  47. [47]

    Structured knowledge distillation for semantic segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Liu, Y., Chen, K., Liu, C., Qin, Z., Luo, Z., Wang, J., 2019. Structured knowledge distillation for semantic segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

  48. [48]

    Deep learning face attributes in the wild, in: Proceedings of International Conference on Computer Vision (ICCV)

    Liu, Z., Luo, P., Wang, X., Tang, X., 2015. Deep learning face attributes in the wild, in: Proceedings of International Conference on Computer Vision (ICCV)

  49. [49]

    A convnet for the 2020s, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp

    Liu, Z., et al., 2022. A convnet for the 2020s, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11976–11986

  50. [50]

    Decoupled Weight Decay Regularization

    Loshchilov, I., 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101

  51. [51]

    Compressed sensing mri

    Lustig, M., Donoho, D.L., Santos, J.M., Pauly, J.M., 2008. Compressed sensing mri. IEEE signal processing magazine 25, 72–82

  52. [52]

    Improving medical image denoising via a lightweight plug-and-play module, in: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE

    Ma, L., Kuang, H., Liu, J., Shen, C., Wang, J., 2023. Improving medical image denoising via a lightweight plug-and-play module, in: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE. pp. 1350–1355

  53. [53]

    KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow

    Murugesan, B., Vijayarangan, S., Sarveswaran, K., Ram, K., Sivaprakasam, M., 2020. KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow, in: Medical Imaging with Deep Learning, PMLR. pp. 515–526

  54. [54]

    Efficient preconditioners for optimality systems arising in connection with inverse problems

    Nielsen, B.F., Mardal, K.A., 2010. Efficient preconditioners for optimality systems arising in connection with inverse problems. SIAM Journal on Control and Optimization 48, 5143–5177

  55. [55]

    Deep learning techniques for inverse problems in imaging

    Ongie, G., Jalal, A., Metzler, C.A., Baraniuk, R.G., Dimakis, A.G., Willett, R., 2020. Deep learning techniques for inverse problems in imaging. IEEE Journal on Selected Areas in Information Theory

  56. [56]

    Proximal algorithms

    Parikh, N., Boyd, S., 2014. Proximal algorithms. Foundations and Trends in Optimization 1, 127–239

  57. [57]

    Preconditioners for Krylov subspace methods: An overview

    Pearson, J.W., Pestana, J., 2020. Preconditioners for Krylov subspace methods: An overview. GAMM-Mitteilungen 43, e202000015

  58. [58]

    Coil sensitivity encoding for fast MRI

    Pruessmann, K.P., Weiger, M., Scheidegger, M.B., Boesiger, P., 1998. Coil sensitivity encoding for fast MRI, in: Proceedings of the ISMRM 6th Annual Meeting, Sydney

  59. [59]

    On the expressive power of deep neural networks

    Raghu, M., et al., 2017. On the expressive power of deep neural networks, in: International Conference on Machine Learning, PMLR. pp. 2847–2854

  60. [60]

    The little engine that could: Regularization by denoising (RED)

    Romano, Y., Elad, M., Milanfar, P., 2017. The little engine that could: Regularization by denoising (RED). SIAM Journal on Imaging Sciences 10, 1804–1844

  61. [61]

    Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, Springer. pp. 234–241

  62. [62]

    The perceptron: a probabilistic model for information storage and organization in the brain

    Rosenblatt, F., 1958. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review 65, 386

  63. [63]

    End-to-end variational networks for accelerated MRI reconstruction

    Sriram, et al., 2020. End-to-end variational networks for accelerated MRI reconstruction, in: MICCAI 2020, Springer. pp. 64–73

  64. [64]

    Edge-preserving and scale-dependent properties of total variation regularization

    Strong, D., Chan, T., 2003. Edge-preserving and scale-dependent properties of total variation regularization. Inverse problems 19, S165

  65. [65]

    DeepInverse: A deep learning framework for inverse problems in imaging

    Tachella, J., Chen, D., Hurault, S., Terris, M., Wang, A., 2023. DeepInverse: A deep learning framework for inverse problems in imaging. URL: https://github.com/deepinv/deepinv, doi: 10.5281/zenodo.7982256

  66. [66]

    Provably convergent plug-and-play quasi-Newton methods

    Tan, H.Y., Mukherjee, S., Tang, J., Schönlieb, C.B., 2024. Provably convergent plug-and-play quasi-Newton methods. SIAM Journal on Imaging Sciences 17, 785–819

  67. [67]

    Image restoration and reconstruction using targeted plug-and-play priors

    Teodoro, A.M., Bioucas-Dias, J.M., Figueiredo, M.A., 2019. Image restoration and reconstruction using targeted plug-and-play priors. IEEE Transactions on Computational Imaging 5, 675–686

  68. [68]

    A survey on super-resolution imaging

    Tian, J., Ma, K.K., 2011. A survey on super-resolution imaging. Signal, Image and Video Processing 5, 329–342

  69. [69]

    Plug-and-play priors for model based reconstruction

    Venkatakrishnan, S.V., Bouman, C.A., Wohlberg, B., 2013. Plug-and-play priors for model based reconstruction, in: 2013 IEEE Global Conference on Signal and Information Processing, pp. 945–948. doi:10.1109/GlobalSIP.2013.6737048

  70. [70]

    CBAM: Convolutional block attention module

    Woo, S., et al., 2018. CBAM: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19

  71. [71]

    FISTA-Net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging

    Xiang, et al., 2021. FISTA-Net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging. IEEE Trans. Med. Imaging 40, 1329–1339

  72. [72]

    A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening

    Yuan, Q., et al., 2018. A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11, 978–989

  73. [73]

    Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer

    Zagoruyko, S., Komodakis, N., 2016. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv preprint arXiv:1612.03928

  74. [74]

    Restormer: Efficient transformer for high-resolution image restoration

    Zamir, et al., 2022. Restormer: Efficient transformer for high-resolution image restoration, in: IEEE/CVF CVPR, pp. 5728–5739

  75. [75]

    Learning nonlocal sparse and low-rank models for image compressive sensing: Nonlocal sparse and low-rank modeling

    Zha, Z., Wen, B., Yuan, X., Ravishankar, S., Zhou, J., Zhu, C., 2023. Learning nonlocal sparse and low-rank models for image compressive sensing: Nonlocal sparse and low-rank modeling. IEEE Signal Processing Magazine 40, 32–44

  76. [76]

    Plug-and-play image restoration with deep denoiser prior

    Zhang, K., Li, Y., Zuo, W., Zhang, L., Van Gool, L., Timofte, R., 2021. Plug-and-play image restoration with deep denoiser prior. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 6360–6376.