Hybrid Digital-Analog Approximate Inverse Preconditioning for Krylov Methods
Pith reviewed 2026-06-27 02:41 UTC · model grok-4.3
The pith
Analog execution changes preconditioner design so that stronger digital approximations can become less effective once noise, quantization, and clipping are included.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the hybrid setting the outer iteration is formulated as flexible GMRES because the analog preconditioner application is inexact and application-dependent. Comparisons of block Jacobi schemes demonstrate that analog execution alters the preconditioner design problem: a stronger digital preconditioner may be less effective after the combined effects of scaling, write noise, perturbations, quantization, and clipping are included. Analog-aware tuning of block size, damping, MCAI accuracy, and nesting is required for robust convergence under these non-idealities.
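To make the role of the flexible formulation concrete, here is a minimal NumPy sketch of a single flexible GMRES cycle. It is an illustration under stated assumptions, not the paper's implementation: the function name `fgmres`, the zero initial guess, the absence of restarts, and the explicit residual check are all choices made here for brevity. The key point is that each preconditioned vector is stored in `Z`, so the preconditioner is allowed to behave differently on every call.

```python
import numpy as np

def fgmres(A, b, apply_prec, m=50, tol=1e-8):
    """One flexible GMRES cycle (no restart, zero initial guess, b != 0).

    apply_prec(v) may return a different result on every call, which is
    why the preconditioned vectors Z[:, j] are stored explicitly.
    """
    n = b.shape[0]
    beta = np.linalg.norm(b)
    V = np.zeros((n, m + 1))   # orthonormal basis vectors v_j
    Z = np.zeros((n, m))       # preconditioned directions z_j = M_j^{-1} v_j
    H = np.zeros((m + 1, m))   # upper Hessenberg matrix
    V[:, 0] = b / beta
    x, res = np.zeros(n), 1.0
    for j in range(m):
        Z[:, j] = apply_prec(V[:, j])      # inexact, possibly noisy
        w = A @ Z[:, j]                    # accurate product on the host
        for i in range(j + 1):             # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        # Small least-squares problem: minimize || beta * e1 - H y ||
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)
        x = Z[:, :j + 1] @ y               # update lives in span(Z), not span(V)
        # Explicit residual: costs one extra product, kept for clarity
        res = np.linalg.norm(b - A @ x) / beta
        if res < tol or H[j + 1, j] < 1e-14:
            return x, j + 1, res
        V[:, j + 1] = w / H[j + 1, j]
    return x, m, res
```

A call such as `fgmres(A, b, lambda v: v / np.diag(A))` uses a plain Jacobi preconditioner; replacing the lambda with a routine that injects noise leaves the rest of the loop unchanged, which is the property the hybrid setting relies on.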
What carries the argument
Flexible GMRES outer iteration with analog crossbar matrix-vector multiplications realizing approximate inverse preconditioners, specifically various block Jacobi schemes whose effectiveness is evaluated under simulated device- and circuit-level non-idealities.
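The following sketch shows how a block Jacobi approximate inverse could be applied through a simulated crossbar tile. It is a toy model written for this summary, not the simulator used in the paper: the class `AnalogTile`, the helper names, and every default value (noise levels, bit widths, output bound) are illustrative assumptions.

```python
import numpy as np

def quantize(x, bits, bound):
    """Clip to [-bound, bound], then round to a uniform grid of 2**bits - 1 levels."""
    step = 2.0 * bound / (2**bits - 2)
    return np.round(np.clip(x, -bound, bound) / step) * step

class AnalogTile:
    """Toy crossbar model; all parameter values are illustrative assumptions."""

    def __init__(self, W, rng, write_noise=0.02, out_noise=0.06,
                 in_bits=7, out_bits=9, out_bound=12.0):
        self.rng = rng
        self.w_scale = np.max(np.abs(W))       # map weights into [-1, 1]
        # Write noise is drawn once, when the tile is programmed
        self.G = W / self.w_scale + write_noise * rng.standard_normal(W.shape)
        self.out_noise = out_noise
        self.in_bits, self.out_bits, self.out_bound = in_bits, out_bits, out_bound

    def matvec(self, v):
        v_scale = np.max(np.abs(v))
        if v_scale == 0.0:
            return np.zeros(self.G.shape[0])
        x = quantize(v / v_scale, self.in_bits, 1.0)          # scaled, quantized input
        y = self.G @ x
        y = y + self.out_noise * self.rng.standard_normal(y.shape)  # fresh per call
        y = quantize(y, self.out_bits, self.out_bound)        # clipping + quantization
        return y * self.w_scale * v_scale                     # digital rescaling

def block_jacobi_tiles(A, block_size, rng, **tile_kw):
    """Program one tile per diagonal block with the exact block inverse."""
    n = A.shape[0]
    tiles = []
    for s in range(0, n, block_size):
        e = min(s + block_size, n)
        tiles.append((s, e, AnalogTile(np.linalg.inv(A[s:e, s:e]), rng, **tile_kw)))

    def apply(v):
        z = np.empty_like(v, dtype=float)
        for s, e, tile in tiles:
            z[s:e] = tile.matvec(v[s:e])
        return z

    return apply
```

The returned callable maps a vector to a noisy approximation of the block Jacobi solve, so it can serve as the preconditioner argument of any flexible Krylov routine. Larger blocks give a stronger digital preconditioner, but in this model they also tend to widen the dynamic range of the stored inverse, which is the kind of trade-off the comparison in the paper is about.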
If this is right
- A stronger digital preconditioner may be less effective after analog scaling, write noise, input/output perturbations, quantization, and clipping are taken into account.
- Analog-aware choices of block size, damping, MCAI construction accuracy, and nesting are important for robust convergence.
- Monte Carlo approximate inverses and nested block Jacobi schemes require specific tuning parameters when executed on analog hardware.
- The outer iteration must use the flexible GMRES formulation to accommodate the application-dependent inexact preconditioner.
Where Pith is reading between the lines
- Hardware designers may need to co-optimize preconditioner parameters with the specific non-ideality profile of a given analog array rather than maximizing digital accuracy alone.
- The approach could be tested on other iterative solvers that rely on repeated matrix-vector products to see whether the same shift in design priorities appears.
- For very large problems the trade-off between analog parallelism and the need for more outer iterations due to weaker effective preconditioning would determine overall runtime.
Load-bearing premise
The realistic analog matrix-vector simulations used in the numerical experiments faithfully capture the combined effects of device- and circuit-level non-idealities on the realized preconditioner.
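This page does not reproduce the paper's simulation model, so the expression below is only one plausible way to write the map that such a simulation applies at the $j$-th preconditioner call. Every symbol in it ($\alpha_W$, $\alpha_{v_j}$, $Q_{\mathrm{in}}$, $Q_{\mathrm{out}}$, $\Delta_W$, $\eta_j$, $b$) is an assumption introduced here for illustration.

```latex
\[
  z_j \;=\; \mathcal{M}_j(v_j)
  \;=\; \alpha_W\,\alpha_{v_j}\,
  Q_{\mathrm{out}}\!\Big(
    \operatorname{clip}_{[-b,\,b]}\big[
      (\widehat{W}+\Delta_W)\,Q_{\mathrm{in}}(v_j/\alpha_{v_j}) + \eta_j
    \big]
  \Big),
  \qquad \widehat{W} = W/\alpha_W .
\]
```

Here $W$ is the digital approximate inverse, $\alpha_W = \max_{i,k}|W_{ik}|$ and $\alpha_{v_j} = \|v_j\|_\infty$ are scaling factors, $\Delta_W$ is write noise fixed when the array is programmed, $\eta_j$ is noise drawn afresh at each application, and $Q_{\mathrm{in}}$, $Q_{\mathrm{out}}$ are input and output quantizers. Because clipping and rounding are nonlinear and $\eta_j$ changes from call to call, $\mathcal{M}_j$ is not a fixed linear operator. The premise amounts to assuming that a map of this general kind, with suitably chosen parameters, is close to what real hardware does.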
What would settle it
Running the same set of preconditioner schemes and Krylov iterations on physical analog hardware and checking whether the relative performance ordering of stronger versus weaker digital designs remains the same as in the simulations.
Original abstract
Analog in-memory computing enables highly parallel matrix-vector multiplications with reduced data movement, but the resulting operations are noisy, quantized, and affected by device- and circuit-level non-idealities. This paper studies approximate inverse preconditioning for Krylov subspace methods in a hybrid digital-analog setting. The digital host performs sparse products with the coefficient matrix and the precision-sensitive Krylov operations, while preconditioner applications are performed through analog crossbar matrix-vector multiplications. Since the realized preconditioner is inexact and application-dependent, the outer iteration is formulated as the flexible GMRES method. We show that analog execution changes the usual preconditioner design problem in the sense that a stronger digital preconditioner may be less effective after analog scaling, write noise, input/output perturbations, quantization, and clipping are taken into account. We compare various block Jacobi preconditioning schemes including exact block inverses, sparse approximate inverses, Monte Carlo approximate inverses (MCAI), damping, and nested block Jacobi schemes. Numerical experiments with realistic analog matrix-vector simulations show that analog-aware choices of block size, damping, MCAI construction accuracy, and nesting are important for robust convergence.
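The abstract names Monte Carlo approximate inverses without describing their construction. As a rough illustration of the general idea, the sketch below uses the classical Ulam-von Neumann random-walk estimator applied to a Jacobi splitting. This is an assumption about the flavor of method, not the paper's MCAI algorithm; the function name `mcai` and the parameters `n_walks` and `max_steps` are hypothetical. The estimator is only well behaved when the absolute row sums of the iteration matrix are below one, for example for strictly diagonally dominant matrices.

```python
import numpy as np

def mcai(A, n_walks=2000, max_steps=30, rng=None):
    """Monte Carlo estimate of inv(A) from the Neumann series of a Jacobi splitting.

    With D = diag(A) and H = I - D^{-1} A, we have
    inv(A) = (sum_k H^k) D^{-1}. Each random walk started at row i gives an
    unbiased sample of the (truncated) series restricted to that row.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    d = np.diag(A)
    H = np.eye(n) - A / d[:, None]
    absH = np.abs(H)
    row_sum = absH.sum(axis=1)
    N = np.zeros((n, n))                      # accumulates sum_k H^k
    for i in range(n):
        for _walk in range(n_walks):
            state, weight = i, 1.0
            N[i, state] += weight             # k = 0 term (identity)
            for _step in range(max_steps):
                if row_sum[state] == 0.0:     # no outgoing transitions
                    break
                # Transition probabilities proportional to |H[state, :]|
                nxt = rng.choice(n, p=absH[state] / row_sum[state])
                # Importance weight h / p = sign(h) * row_sum
                weight *= np.sign(H[state, nxt]) * row_sum[state]
                state = nxt
                N[i, state] += weight
    return (N / n_walks) / d[None, :]         # right-multiply by D^{-1}
```

In this sketch the construction accuracy is controlled by `n_walks` (sampling error) and `max_steps` (series truncation). It is parameters of this kind that would be tuned when the resulting matrix is subsequently written to a noisy analog array.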
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies approximate inverse preconditioning for Krylov subspace methods in a hybrid digital-analog setting. The digital host performs sparse matrix products and precision-sensitive Krylov operations while preconditioner applications occur via analog crossbar matrix-vector multiplications; the outer iteration is cast as flexible GMRES to accommodate the inexact, application-dependent preconditioner. Through numerical experiments that inject modeled analog effects (scaling, write noise, I/O perturbations, quantization, clipping), the work shows that analog execution can reverse the relative effectiveness of stronger versus weaker digital preconditioners and compares block Jacobi variants including exact inverses, sparse approximate inverses, Monte Carlo approximate inverses (MCAI), damping, and nested schemes, concluding that analog-aware choices of block size, damping, MCAI accuracy, and nesting are needed for robust convergence.
Significance. If the simulated non-idealities prove representative of hardware, the result would be significant for preconditioner design in emerging analog in-memory computing: it demonstrates that direct porting of digital designs is suboptimal and supplies concrete guidance on analog-aware tuning of block Jacobi schemes. The explicit comparison across multiple construction methods (MCAI, nesting, damping) and the use of flexible GMRES constitute practical strengths.
major comments (2)
- [Numerical experiments section] The central claim that analog non-idealities can make stronger digital preconditioners less effective rests entirely on simulated injection of scaling, write noise, I/O perturbations, quantization, and clipping. No hardware measurements or cross-validation against physical crossbar arrays are reported, which leaves open whether the observed ranking reversals among block Jacobi, MCAI, and nested schemes are artifacts of the particular noise model.
- [§3] (formulation of flexible GMRES with analog preconditioning) The paper provides no convergence analysis or perturbation bounds that account for the combined analog effects. The claim therefore relies solely on empirical observation, without theoretical grounding that would establish robustness beyond the chosen test matrices.
minor comments (2)
- [Abstract] The abstract states that experiments use 'realistic analog matrix-vector simulations' but does not list the specific matrix dimensions, condition numbers, or number of Monte Carlo samples per MCAI construction; adding these details would improve reproducibility.
- [§2] Notation for the analog perturbation operators (e.g., the combined scaling-plus-clipping map) is introduced only informally; a compact equation or diagram in §2 would clarify how these operators are applied inside each preconditioner evaluation.
Simulated Author's Rebuttal
We thank the referee for the thorough review and valuable feedback on our manuscript. Below we provide point-by-point responses to the major comments. We have revised the manuscript to address the concerns where feasible while maintaining the empirical focus of the work.
- Referee: [Numerical experiments section] The central claim that analog non-idealities can make stronger digital preconditioners less effective rests entirely on simulated injection of scaling, write noise, I/O perturbations, quantization, and clipping. No hardware measurements or cross-validation against physical crossbar arrays are reported, which leaves open whether the observed ranking reversals among block Jacobi, MCAI, and nested schemes are artifacts of the particular noise model.
Authors: We agree that the study is based on simulated non-idealities rather than physical hardware measurements, which is a limitation of the current work. The noise models (scaling, write noise, I/O perturbations, quantization, and clipping) are taken from established device and circuit models in the analog in-memory computing literature. In the revised manuscript we have expanded the numerical experiments section with additional sensitivity studies that vary the noise parameters over broader ranges and across more test matrices; the ranking reversals remain consistent. We have also added an explicit discussion of the simulation-based nature of the validation and the desirability of future hardware cross-validation.
Revision: partial
- Referee: [§3] (formulation of flexible GMRES with analog preconditioning) The paper provides no convergence analysis or perturbation bounds that account for the combined analog effects. The claim therefore relies solely on empirical observation, without theoretical grounding that would establish robustness beyond the chosen test matrices.
Authors: The manuscript is an empirical study whose primary contribution is the demonstration that analog execution alters preconditioner effectiveness and the identification of analog-aware tuning strategies for block Jacobi schemes. Developing rigorous perturbation bounds or convergence theory that incorporates the full combination of analog effects would require substantial additional theoretical work that is outside the scope of this paper. In the revised version we have clarified in Section 3 and the conclusions that all claims rest on the reported numerical experiments and have listed the specific test matrices employed.
Revision: yes
Circularity Check
No circularity: empirical simulation results, no self-referential derivations
The paper presents no mathematical derivation chain. Its central claim—that analog non-idealities can reverse the effectiveness ranking of digital preconditioners—is an empirical observation drawn from numerical experiments that inject modeled noise, scaling, quantization, and clipping into matrix-vector products. No equations define a quantity in terms of itself, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness theorems or ansatzes are imported via self-citation. The work is self-contained against external benchmarks in the sense that its conclusions rest on the described simulation protocol rather than on any identity that reduces to its own inputs.