Hybrid Digital-Analog Approximate Inverse Preconditioning for Krylov Methods

Lior Horesh; Rui Peng Li; Shikhar Shah; Tayfun Gokmen; Vassilis Kalantzis; Yuanzhe Xi

arxiv: 2606.17227 · v1 · pith:WDZEE7OCnew · submitted 2026-06-15 · 🧮 math.NA · cs.NA

Hybrid Digital-Analog Approximate Inverse Preconditioning for Krylov Methods

Shikhar Shah , Rui Peng Li , Tayfun Gokmen , Vassilis Kalantzis , Lior Horesh , Yuanzhe Xi This is my paper

Pith reviewed 2026-06-27 02:41 UTC · model grok-4.3

classification 🧮 math.NA cs.NA

keywords approximate inverse preconditioningKrylov subspace methodshybrid digital-analog computingflexible GMRESblock Jacobi preconditionersanalog in-memory computingMonte Carlo approximate inverses

0 comments

The pith

Analog execution changes preconditioner design so that stronger digital approximations can become less effective once noise, quantization, and clipping are included.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines approximate inverse preconditioning for Krylov subspace methods in a hybrid digital-analog setting where the digital host handles sparse matrix products and precision-sensitive operations while analog crossbar arrays perform the preconditioner applications. It shows that the usual aim of creating the strongest possible digital preconditioner does not carry over once analog scaling, write noise, input/output perturbations, quantization, and clipping are taken into account. Instead, choices such as block size, damping, Monte Carlo construction accuracy, and nesting become critical for maintaining convergence. This matters because analog in-memory computing offers highly parallel matrix-vector multiplications with lower data movement, yet its imperfections require rethinking how preconditioners are designed and selected. Numerical experiments using realistic analog matrix-vector simulations compare exact block inverses, sparse approximate inverses, Monte Carlo approximate inverses, and nested schemes to illustrate the shift in effective design criteria.

Core claim

In the hybrid setting the outer iteration is formulated as flexible GMRES because the analog preconditioner application is inexact and application-dependent. Comparisons of block Jacobi schemes demonstrate that analog execution alters the preconditioner design problem: a stronger digital preconditioner may be less effective after the combined effects of scaling, write noise, perturbations, quantization, and clipping are included. Analog-aware tuning of block size, damping, MCAI accuracy, and nesting is required for robust convergence under these non-idealities.

What carries the argument

Flexible GMRES outer iteration with analog crossbar matrix-vector multiplications realizing approximate inverse preconditioners, specifically various block Jacobi schemes whose effectiveness is evaluated under simulated device- and circuit-level non-idealities.

If this is right

A stronger digital preconditioner may be less effective after analog scaling, write noise, input/output perturbations, quantization, and clipping are taken into account.
Analog-aware choices of block size, damping, MCAI construction accuracy, and nesting are important for robust convergence.
Monte Carlo approximate inverses and nested block Jacobi schemes require specific tuning parameters when executed on analog hardware.
The outer iteration must use the flexible GMRES formulation to accommodate the application-dependent inexact preconditioner.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hardware designers may need to co-optimize preconditioner parameters with the specific non-ideality profile of a given analog array rather than maximizing digital accuracy alone.
The approach could be tested on other iterative solvers that rely on repeated matrix-vector products to see whether the same shift in design priorities appears.
For very large problems the trade-off between analog parallelism and the need for more outer iterations due to weaker effective preconditioning would determine overall runtime.

Load-bearing premise

The realistic analog matrix-vector simulations used in the numerical experiments faithfully capture the combined effects of device- and circuit-level non-idealities on the realized preconditioner.

What would settle it

Running the same set of preconditioner schemes and Krylov iterations on physical analog hardware and checking whether the relative performance ordering of stronger versus weaker digital designs remains the same as in the simulations.

read the original abstract

Analog in-memory computing enables highly parallel matrix-vector multiplications with reduced data movement, but the resulting operations are noisy, quantized, and affected by device- and circuit-level non-idealities. This paper studies approximate inverse preconditioning for Krylov subspace methods in a hybrid digital-analog setting. The digital host performs sparse products with the coefficient matrix and the precision-sensitive Krylov operations, while preconditioner applications are performed through analog crossbar matrix-vector multiplications. Since the realized preconditioner is inexact and application-dependent, the outer iteration is formulated as the flexible GMRES method. We show that analog execution changes the usual preconditioner design problem in the sense that a stronger digital preconditioner may be less effective after analog scaling, write noise, input/output perturbations, quantization, and clipping are taken into account. We compare various block Jacobi preconditioning schemes including exact block inverses, sparse approximate inverses, Monte Carlo approximate inverses (MCAI), damping, and nested block Jacobi schemes. Numerical experiments with realistic analog matrix-vector simulations show that analog-aware choices of block size, damping, MCAI construction accuracy, and nesting are important for robust convergence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Analog non-idealities flip which preconditioners work best in hybrid Krylov solvers, based on simulations alone.

read the letter

The punchline is that analog non-idealities can make a stronger digital preconditioner less effective than a weaker one once you account for the noise and quantization in the analog part.

The paper applies the flexible GMRES method to a hybrid digital-analog setup. The digital side handles the coefficient matrix products and the Krylov basis work, while the preconditioner is applied via analog crossbar matrix-vector multiplies. They examine several block Jacobi options: exact block inverses, sparse approximate inverses, Monte Carlo approximate inverses, with options for damping and nesting. The experiments inject modeled effects like write noise, perturbations, quantization, and clipping into the analog operations. The takeaway is that you need to choose block size, damping, and construction accuracy with the analog effects in mind to get reliable convergence.

This work does a good job of showing how standard design rules break down in the hybrid case and gives specific comparisons that could guide someone building such a system.

The main concern is that all the evidence comes from numerical simulations of the analog behavior. There are no results from physical crossbar hardware to check if the noise model matches reality. That makes the ranking reversals between the schemes tentative until someone validates them on actual devices. The description of the experiments stays fairly high-level in the abstract, so it's hard to judge how thorough the sensitivity analysis was.

This is relevant for researchers exploring analog in-memory computing for linear algebra tasks. Someone already working on preconditioners for Krylov methods or on hybrid hardware would find the tuning advice practical.

I would recommend sending it for peer review. The topic is timely and the observations are worth checking even if the current evidence is simulation-based.

Referee Report

2 major / 2 minor

Summary. The paper studies approximate inverse preconditioning for Krylov subspace methods in a hybrid digital-analog setting. The digital host performs sparse matrix products and precision-sensitive Krylov operations while preconditioner applications occur via analog crossbar matrix-vector multiplications; the outer iteration is cast as flexible GMRES to accommodate the inexact, application-dependent preconditioner. Through numerical experiments that inject modeled analog effects (scaling, write noise, I/O perturbations, quantization, clipping), the work shows that analog execution can reverse the relative effectiveness of stronger versus weaker digital preconditioners and compares block Jacobi variants including exact inverses, sparse approximate inverses, Monte Carlo approximate inverses (MCAI), damping, and nested schemes, concluding that analog-aware choices of block size, damping, MCAI accuracy, and nesting are needed for robust convergence.

Significance. If the simulated non-idealities prove representative of hardware, the result would be significant for preconditioner design in emerging analog in-memory computing: it demonstrates that direct porting of digital designs is suboptimal and supplies concrete guidance on analog-aware tuning of block Jacobi schemes. The explicit comparison across multiple construction methods (MCAI, nesting, damping) and the use of flexible GMRES constitute practical strengths.

major comments (2)

[Numerical experiments section] Numerical experiments section: the central claim that analog non-idealities can make stronger digital preconditioners less effective rests entirely on simulated injection of scaling, write noise, I/O perturbations, quantization, and clipping; no hardware measurements or cross-validation against physical crossbar arrays are reported, leaving open whether the observed ranking reversals among block Jacobi, MCAI, and nested schemes are artifacts of the particular noise model.
[§3] §3 (formulation of flexible GMRES with analog preconditioning): the paper provides no convergence analysis or perturbation bounds that account for the combined analog effects; the claim therefore relies solely on empirical observation without theoretical grounding that would establish robustness beyond the chosen test matrices.

minor comments (2)

[Abstract] The abstract states that experiments use 'realistic analog matrix-vector simulations' but does not list the specific matrix dimensions, condition numbers, or number of Monte Carlo samples per MCAI construction; adding these details would improve reproducibility.
[§2] Notation for the analog perturbation operators (e.g., the combined scaling-plus-clipping map) is introduced only informally; a compact equation or diagram in §2 would clarify how these operators are applied inside each preconditioner evaluation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and valuable feedback on our manuscript. Below we provide point-by-point responses to the major comments. We have revised the manuscript to address the concerns where feasible while maintaining the empirical focus of the work.

read point-by-point responses

Referee: [Numerical experiments section] Numerical experiments section: the central claim that analog non-idealities can make stronger digital preconditioners less effective rests entirely on simulated injection of scaling, write noise, I/O perturbations, quantization, and clipping; no hardware measurements or cross-validation against physical crossbar arrays are reported, leaving open whether the observed ranking reversals among block Jacobi, MCAI, and nested schemes are artifacts of the particular noise model.

Authors: We agree that the study is based on simulated non-idealities rather than physical hardware measurements, which is a limitation of the current work. The noise models (scaling, write noise, I/O perturbations, quantization, and clipping) are taken from established device and circuit models in the analog in-memory computing literature. In the revised manuscript we have expanded the numerical experiments section with additional sensitivity studies that vary the noise parameters over broader ranges and across more test matrices; the ranking reversals remain consistent. We have also added an explicit discussion of the simulation-based nature of the validation and the desirability of future hardware cross-validation. revision: partial
Referee: [§3] §3 (formulation of flexible GMRES with analog preconditioning): the paper provides no convergence analysis or perturbation bounds that account for the combined analog effects; the claim therefore relies solely on empirical observation without theoretical grounding that would establish robustness beyond the chosen test matrices.

Authors: The manuscript is an empirical study whose primary contribution is the demonstration that analog execution alters preconditioner effectiveness and the identification of analog-aware tuning strategies for block Jacobi schemes. Developing rigorous perturbation bounds or convergence theory that incorporates the full combination of analog effects would require substantial additional theoretical work that is outside the scope of this paper. In the revised version we have clarified in Section 3 and the conclusions that all claims rest on the reported numerical experiments and have listed the specific test matrices employed. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical simulation results, no self-referential derivations

full rationale

The paper presents no mathematical derivation chain. Its central claim—that analog non-idealities can reverse the effectiveness ranking of digital preconditioners—is an empirical observation drawn from numerical experiments that inject modeled noise, scaling, quantization, and clipping into matrix-vector products. No equations define a quantity in terms of itself, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness theorems or ansatzes are imported via self-citation. The work is self-contained against external benchmarks in the sense that its conclusions rest on the described simulation protocol rather than on any identity that reduces to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no concrete free parameters, axioms, or invented entities can be extracted; the work implicitly relies on standard assumptions of numerical linear algebra and on the fidelity of the analog simulation model.

pith-pipeline@v0.9.1-grok · 5750 in / 1179 out tokens · 40938 ms · 2026-06-27T02:41:07.639444+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

38 extracted references · 29 canonical work pages

[1]

Alexandrov , Efficient parallel Monte Carlo methods for matrix computations , Mathematics and Computers in Simulation, 47 (1998), pp

V. Alexandrov , Efficient parallel Monte Carlo methods for matrix computations , Mathematics and Computers in Simulation, 47 (1998), pp. 113--122

1998
[2]

Ambrogio, P

S. Ambrogio, P. Narayanan, A. Okazaki, A. Fasoli, C. Mackin, K. Hosokawa, A. Nomura, T. Yasuda, A. Chen, A. Friz, M. Ishii, J. Luquin, Y. Kohda, N. Saulnier, K. Brew, S. Choi, I. Ok, T. Philip, V. Chan, C. Silvestre, I. Ahsan, V. Narayanan, H. Tsai, and G. W. Burr , An analog-AI chip for energy-efficient speech recognition and transcription , Nature, 620 ...

work page doi:10.1038/s41586-023-06337-5 2023
[3]

Benzi and G

M. Benzi and G. H. Golub , Bounds for the Entries of Matrix Functions with Applications to Preconditioning , BIT Numerical Mathematics, 39 (1999), pp. 417--438

1999
[4]

and Golub, G

M. Benzi, G. H. Golub, and J. Liesen , Numerical solution of saddle point problems , Acta Numerica, 14 (2005), pp. 1--137, https://doi.org/10.1017/S0962492904000212

work page doi:10.1017/s0962492904000212 2005
[5]

D. Cai, E. Chow, and Y. Xi , Data-driven linear complexity low-rank approximation of general kernel matrices: A geometric approach , Numerical Linear Algebra with Applications, 30 (2023), p. e2519, https://doi.org/https://doi.org/10.1002/nla.2519, https://onlinelibrary.wiley.com/doi/abs/10.1002/nla.2519

work page doi:10.1002/nla.2519 2023
[6]

2017 , month = jan, journal =

E. Carson and N. J. Higham , A New Analysis of Iterative Refinement and Its Application to Accurate Solution of Ill-Conditioned Sparse Linear Systems , SIAM Journal on Scientific Computing, 39 (2017), pp. A2834--A2856, https://doi.org/10.1137/17M1122918, https://doi.org/10.1137/17M1122918, https://arxiv.org/abs/https://doi.org/10.1137/17M1122918

work page doi:10.1137/17m1122918 2017
[7]

J. Chen, J. Li, Y. Li, and X. Miao , Multiply accumulate operations in memristor crossbar arrays for analog computing , Journal of Semiconductors, 42 (2021), p. 013104, https://doi.org/10.1088/1674-4926/42/1/013104, https://doi.org/10.1088/1674-4926/42/1/013104

work page doi:10.1088/1674-4926/42/1/013104 2021
[8]

2021 , url =

B. Feinberg, R. Wong, T. P. Xiao, C. H. Bennett, J. N. Rohan, E. G. Boman, M. J. Marinella, S. Agarwal, and E. Ipek , An Analog Preconditioner for Solving Linear Systems , in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 761--774, https://doi.org/10.1109/HPCA51647.2021.00069

work page doi:10.1109/hpca51647.2021.00069 2021
[9]

Gazzola and M

S. Gazzola and M. Sabaté Landman , Krylov methods for inverse problems: Surveying classical, and introducing new, algorithmic approaches , GAMM-Mitteilungen, 43 (2020), p. e202000017, https://doi.org/https://doi.org/10.1002/gamm.202000017, https://onlinelibrary.wiley.com/doi/abs/10.1002/gamm.202000017, https://arxiv.org/abs/https://onlinelibrary.wiley.com...

work page doi:10.1002/gamm.202000017 2020
[10]

Ghosh, L

S. Ghosh, L. Horesh, V. Kalantzis, Y. Lu, and T. Nowicki , Regenerative Ulam--von Neumann Algorithm: An Innovative Markov Chain Monte Carlo Method for Matrix Inversion , SIAM Journal on Matrix Analysis and Applications, 46 (2025), pp. 2589--2613

2025
[11]

Gokmen and Y

T. Gokmen and Y. Vlasov , Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices: Design Considerations , Frontiers in Neuroscience, 10 (2016), https://doi.org/10.3389/fnins.2016.00333

work page doi:10.3389/fnins.2016.00333 2016
[12]

N. I. M. Gould and J. A. Scott , Sparse Approximate-Inverse Preconditioners Using Norm-Minimization Techniques , SIAM Journal on Scientific Computing, 19 (1998), pp. 605--625, https://doi.org/10.1137/S1064827595288425, https://doi.org/10.1137/S1064827595288425, https://arxiv.org/abs/https://doi.org/10.1137/S1064827595288425

work page doi:10.1137/s1064827595288425 1998
[13]

Graillat, F

S. Graillat, F. J \'e z \'e quel, T. Mary, and R. Molina , Adaptive Precision Sparse Matrix–Vector Product and Its Application to Krylov Solvers , SIAM Journal on Scientific Computing, 46 (2024), pp. C30--C56

2024
[14]

M. J. Grote and T. Huckle , Parallel Preconditioning with Sparse Approximate Inverses , SIAM Journal on Scientific Computing, 18 (1997), pp. 838--853

1997
[15]

Hackbusch , A Sparse Matrix Arithmetic Based on H-Matrices

W. Hackbusch , A Sparse Matrix Arithmetic Based on H-Matrices. Part I: Introduction to H-Matrices , Computing, 62 (1999), pp. 89--108, https://api.semanticscholar.org/CorpusID:15496936

1999
[16]

P. C. Hansen, J. G. Nagy, and D. P. O'Leary , Deblurring Images , Society for Industrial and Applied Mathematics, 2006, https://doi.org/10.1137/1.9780898718874, https://epubs.siam.org/doi/abs/10.1137/1.9780898718874, https://arxiv.org/abs/https://epubs.siam.org/doi/pdf/10.1137/1.9780898718874

work page doi:10.1137/1.9780898718874 2006
[17]

N. J. Higham and T. Mary , Mixed precision algorithms in numerical linear algebra , Acta Numerica, 31 (2022), p. 347–414, https://doi.org/10.1017/S0962492922000022

work page doi:10.1017/s0962492922000022 2022
[18]

H. Ji, M. Mascagni, and Y. Li , Convergence Analysis of Markov Chain Monte Carlo Linear Solvers Using Ulam--von Neumann Algorithm , SIAM Journal on Numerical Analysis, 51 (2013), pp. 2107--2122, https://arxiv.org/abs/https://doi.org/10.1137/130904867

work page doi:10.1137/130904867 2013
[19]

Kalantzis, C

V. Kalantzis, C. Bekas, A. Curioni, and E. Gallopoulos , Accelerating data uncertainty quantification by solving linear systems with multiple right-hand sides , Numerical Algorithms, 62 (2013), pp. 637--653, https://doi.org/10.1007/s11075-012-9687-2

work page doi:10.1007/s11075-012-9687-2 2013
[20]

D. E. Keyes, L. C. Mcinnes, C. Woodward, W. Gropp, E. Myra, M. Pernice, J. Bell, J. Brown, A. Clo, J. Connors, E. Constantinescu, D. Estep, K. Evans, C. Farhat, A. Hakim, G. Hammond, G. Hansen, J. Hill, T. Isaac, X. Jiao, K. Jordan, D. Kaushik, E. Kaxiras, A. Koniges, K. Lee, A. Lott, Q. Lu, J. Magerlein, R. Maxwell, M. Mccourt, M. Mehl, R. Pawlowski, A. ...

work page doi:10.1177/1094342012468181 2013
[21]

a ndli, A. Singh, S. M. M \

M. Le Gallo, R. Khaddam-Aljameh, M. Stanisavljevic, A. Vasilopoulos, B. Kersting, M. Dazzi, G. Karunaratne, M. Br \"a ndli, A. Singh, S. M. M \"u ller, J. B \"u chel, X. Timoneda, V. Joshi, M. J. Rasch, U. Egger, A. Garofalo, A. Petropoulos, T. Antonakopoulos, K. Brew, S. Choi, I. Ok, T. Philip, V. Chan, C. Silvestre, I. Ahsan, N. Saulnier, V. Narayanan, ...

work page doi:10.1038/s41928-023-01010-1 2023
[22]

L. Pan, P. Zuo, Y. Luo, Z. Sun, and R. Huang , BlockAMC: Scalable In-Memory Analog Matrix Computing for Solving Linear Systems , in 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024, pp. 1--6, https://doi.org/10.23919/DATE58400.2024.10546501

work page doi:10.23919/date58400.2024.10546501 2024
[23]

M. J. Rasch, C. Mackin, M. Le Gallo, A. Chen, A. Fasoli, F. Odermatt, N. Li, S. R. Nandakumar, P. Narayanan, H. Tsai, G. W. Burr, A. Sebastian, and V. Narayanan , Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators , Nature Communications, 14 (2023), p. 5282, https://doi.org/10...

work page doi:10.1038/s41467-023-40770-4 2023
[24]

M. J. Rasch, D. Moreda, T. Gokmen, M. Le Gallo, F. Carta, C. Goldberg, K. El Maghraoui, A. Sebastian, and V. Narayanan , A Flexible and Fast PyTorch Toolkit for Simulating Training and Inference on Analog Crossbar Arrays , in 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2021, pp. 1--4, https://doi.org/10....

work page doi:10.1109/aicas51828.2021.9458494 2021
[25]

Saad , A Flexible Inner-Outer Preconditioned GMRES Algorithm , SIAM Journal on Scientific Computing, 14 (1993), pp

Y. Saad , A Flexible Inner-Outer Preconditioned GMRES Algorithm , SIAM Journal on Scientific Computing, 14 (1993), pp. 461--469, https://doi.org/10.1137/0914028, https://doi.org/10.1137/0914028, https://arxiv.org/abs/https://doi.org/10.1137/0914028

work page doi:10.1137/0914028 1993
[26]

Saad , Iterative Methods for Sparse Linear Systems , SIAM, 2nd ed., 2003

Y. Saad , Iterative Methods for Sparse Linear Systems , SIAM, 2nd ed., 2003

2003
[27]

Saad and H

Y. Saad and H. A. van der Vorst , Iterative solution of linear systems in the 20th century , Journal of Computational and Applied Mathematics, 123 (2000), pp. 1--33, https://doi.org/https://doi.org/10.1016/S0377-0427(00)00412-X, https://www.sciencedirect.com/science/article/pii/S037704270000412X. Numerical Analysis 2000. Vol. III: Linear Algebra

work page doi:10.1016/s0377-0427(00)00412-x 2000
[28]

J. Shi, R. Li, Y. Xi, Y. Saad, and M. V. de Hoop , A Non-perturbative Approach to Computing Seismic Normal Modes in Rotating Planets , J. Sci. Comput., 91 (2022), https://doi.org/10.1007/s10915-022-01836-5, https://doi.org/10.1007/s10915-022-01836-5

work page doi:10.1007/s10915-022-01836-5 2022
[29]

W. Wan, R. Kubendran, C. Schaefer, S. B. Eryilmaz, W. Zhang, D. Wu, S. Deiss, P. Raina, H. Qian, B. Gao, S. Joshi, H. Wu, H.-S. P. Wong, and G. Cauwenberghs , A compute-in-memory chip based on resistive random-access memory , Nature, 608 (2022), pp. 504--512, https://doi.org/10.1038/s41586-022-04992-8, https://doi.org/10.1038/s41586-022-04992-8

work page doi:10.1038/s41586-022-04992-8 2022
[30]

Communications of the ACM , issue_date =

S. Williams, A. Waterman, and D. Patterson , Roofline: an insightful visual performance model for multicore architectures , Commun. ACM, 52 (2009), p. 65–76, https://doi.org/10.1145/1498765.1498785, https://doi.org/10.1145/1498765.1498785

work page doi:10.1145/1498765.1498785 2009
[31]

W. A. Wulf and S. A. McKee , Hitting the memory wall: implications of the obvious , SIGARCH Comput. Archit. News, 23 (1995), p. 20–24, https://doi.org/10.1145/216585.216588, https://doi.org/10.1145/216585.216588

work page doi:10.1145/216585.216588 1995
[32]

Y. Xi, R. Li, and Y. Saad , An Algebraic Multilevel Preconditioner with Low-Rank Corrections for Sparse Symmetric Matrices , SIAM Journal on Matrix Analysis and Applications, 37 (2016), pp. 235--259, https://doi.org/10.1137/15M1021830, https://doi.org/10.1137/15M1021830

work page doi:10.1137/15m1021830 2016
[33]

T. P. Xiao, C. H. Bennett, B. Feinberg, S. Agarwal, and M. J. Marinella , Analog architectures for neural network acceleration based on non-volatile memory , Applied Physics Reviews, 7 (2020), p. 031301, https://doi.org/10.1063/1.5143815, https://doi.org/10.1063/1.5143815, https://arxiv.org/abs/https://pubs.aip.org/aip/apr/article-pdf/doi/10.1063/1.514381...

work page doi:10.1063/1.5143815 2020
[34]

T. Xu, V. Kalantzis, R. Li, Y. Xi, G. Dillon, and Y. Saad , parGeMSLR: A parallel multilevel Schur complement low-rank preconditioning and solution package for general sparse matrices , Parallel Comput., 113 (2022), https://doi.org/10.1016/j.parco.2022.102956, https://doi.org/10.1016/j.parco.2022.102956

work page doi:10.1016/j.parco.2022.102956 2022
[35]

T. Xu, R. P. Li, and Y. Xi , Neural Approximate Inverse Preconditioners , 2025, https://arxiv.org/abs/2510.13034, https://arxiv.org/abs/2510.13034

arXiv 2025
[36]

T. Xu, Z. Zhang, J. Chen, Y. Saad, and Y. Xi , Mixed Precision Orthogonalization-Free Projection Methods for Eigenvalue and Singular Value Problems , 2025, https://arxiv.org/abs/2505.00281, https://arxiv.org/abs/2505.00281

arXiv 2025
[37]

P. Zuo, Q. Wang, Y. Luo, R. Xie, S. Wang, Z. Cheng, L. Bao, Z. Wang, Y. Cai, R. Huang, and Z. Sun , Precise and scalable analogue matrix equation solving using resistive random-access memory chips , Nature Electronics, 8 (2025), pp. 1222--1233, https://doi.org/10.1038/s41928-025-01477-0, https://doi.org/10.1038/s41928-025-01477-0

work page doi:10.1038/s41928-025-01477-0 2025
[38]

Świrydowicz, N

K. Świrydowicz, N. Koukpaizan, M. Alam, S. Regev, M. Saunders, and S. Peleš , Iterative methods in GPU-resident linear solvers for nonlinear constrained optimization , Parallel Computing, 123 (2025), p. 103123, https://doi.org/https://doi.org/10.1016/j.parco.2024.103123, https://www.sciencedirect.com/science/article/pii/S0167819124000619

work page doi:10.1016/j.parco.2024.103123 2025

[1] [1]

Alexandrov , Efficient parallel Monte Carlo methods for matrix computations , Mathematics and Computers in Simulation, 47 (1998), pp

V. Alexandrov , Efficient parallel Monte Carlo methods for matrix computations , Mathematics and Computers in Simulation, 47 (1998), pp. 113--122

1998

[2] [2]

Ambrogio, P

S. Ambrogio, P. Narayanan, A. Okazaki, A. Fasoli, C. Mackin, K. Hosokawa, A. Nomura, T. Yasuda, A. Chen, A. Friz, M. Ishii, J. Luquin, Y. Kohda, N. Saulnier, K. Brew, S. Choi, I. Ok, T. Philip, V. Chan, C. Silvestre, I. Ahsan, V. Narayanan, H. Tsai, and G. W. Burr , An analog-AI chip for energy-efficient speech recognition and transcription , Nature, 620 ...

work page doi:10.1038/s41586-023-06337-5 2023

[3] [3]

Benzi and G

M. Benzi and G. H. Golub , Bounds for the Entries of Matrix Functions with Applications to Preconditioning , BIT Numerical Mathematics, 39 (1999), pp. 417--438

1999

[4] [4]

and Golub, G

M. Benzi, G. H. Golub, and J. Liesen , Numerical solution of saddle point problems , Acta Numerica, 14 (2005), pp. 1--137, https://doi.org/10.1017/S0962492904000212

work page doi:10.1017/s0962492904000212 2005

[5] [5]

D. Cai, E. Chow, and Y. Xi , Data-driven linear complexity low-rank approximation of general kernel matrices: A geometric approach , Numerical Linear Algebra with Applications, 30 (2023), p. e2519, https://doi.org/https://doi.org/10.1002/nla.2519, https://onlinelibrary.wiley.com/doi/abs/10.1002/nla.2519

work page doi:10.1002/nla.2519 2023

[6] [6]

2017 , month = jan, journal =

E. Carson and N. J. Higham , A New Analysis of Iterative Refinement and Its Application to Accurate Solution of Ill-Conditioned Sparse Linear Systems , SIAM Journal on Scientific Computing, 39 (2017), pp. A2834--A2856, https://doi.org/10.1137/17M1122918, https://doi.org/10.1137/17M1122918, https://arxiv.org/abs/https://doi.org/10.1137/17M1122918

work page doi:10.1137/17m1122918 2017

[7] [7]

J. Chen, J. Li, Y. Li, and X. Miao , Multiply accumulate operations in memristor crossbar arrays for analog computing , Journal of Semiconductors, 42 (2021), p. 013104, https://doi.org/10.1088/1674-4926/42/1/013104, https://doi.org/10.1088/1674-4926/42/1/013104

work page doi:10.1088/1674-4926/42/1/013104 2021

[8] [8]

2021 , url =

B. Feinberg, R. Wong, T. P. Xiao, C. H. Bennett, J. N. Rohan, E. G. Boman, M. J. Marinella, S. Agarwal, and E. Ipek , An Analog Preconditioner for Solving Linear Systems , in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 761--774, https://doi.org/10.1109/HPCA51647.2021.00069

work page doi:10.1109/hpca51647.2021.00069 2021

[9] [9]

Gazzola and M

S. Gazzola and M. Sabaté Landman , Krylov methods for inverse problems: Surveying classical, and introducing new, algorithmic approaches , GAMM-Mitteilungen, 43 (2020), p. e202000017, https://doi.org/https://doi.org/10.1002/gamm.202000017, https://onlinelibrary.wiley.com/doi/abs/10.1002/gamm.202000017, https://arxiv.org/abs/https://onlinelibrary.wiley.com...

work page doi:10.1002/gamm.202000017 2020

[10] [10]

Ghosh, L

S. Ghosh, L. Horesh, V. Kalantzis, Y. Lu, and T. Nowicki , Regenerative Ulam--von Neumann Algorithm: An Innovative Markov Chain Monte Carlo Method for Matrix Inversion , SIAM Journal on Matrix Analysis and Applications, 46 (2025), pp. 2589--2613

2025

[11] [11]

Gokmen and Y

T. Gokmen and Y. Vlasov , Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices: Design Considerations , Frontiers in Neuroscience, 10 (2016), https://doi.org/10.3389/fnins.2016.00333

work page doi:10.3389/fnins.2016.00333 2016

[12] [12]

N. I. M. Gould and J. A. Scott , Sparse Approximate-Inverse Preconditioners Using Norm-Minimization Techniques , SIAM Journal on Scientific Computing, 19 (1998), pp. 605--625, https://doi.org/10.1137/S1064827595288425, https://doi.org/10.1137/S1064827595288425, https://arxiv.org/abs/https://doi.org/10.1137/S1064827595288425

work page doi:10.1137/s1064827595288425 1998

[13] [13]

Graillat, F

S. Graillat, F. J \'e z \'e quel, T. Mary, and R. Molina , Adaptive Precision Sparse Matrix–Vector Product and Its Application to Krylov Solvers , SIAM Journal on Scientific Computing, 46 (2024), pp. C30--C56

2024

[14] [14]

M. J. Grote and T. Huckle , Parallel Preconditioning with Sparse Approximate Inverses , SIAM Journal on Scientific Computing, 18 (1997), pp. 838--853

1997

[15] [15]

Hackbusch , A Sparse Matrix Arithmetic Based on H-Matrices

W. Hackbusch , A Sparse Matrix Arithmetic Based on H-Matrices. Part I: Introduction to H-Matrices , Computing, 62 (1999), pp. 89--108, https://api.semanticscholar.org/CorpusID:15496936

1999

[16] [16]

P. C. Hansen, J. G. Nagy, and D. P. O'Leary , Deblurring Images , Society for Industrial and Applied Mathematics, 2006, https://doi.org/10.1137/1.9780898718874, https://epubs.siam.org/doi/abs/10.1137/1.9780898718874, https://arxiv.org/abs/https://epubs.siam.org/doi/pdf/10.1137/1.9780898718874

work page doi:10.1137/1.9780898718874 2006

[17] [17]

N. J. Higham and T. Mary , Mixed precision algorithms in numerical linear algebra , Acta Numerica, 31 (2022), p. 347–414, https://doi.org/10.1017/S0962492922000022

work page doi:10.1017/s0962492922000022 2022

[18] [18]

H. Ji, M. Mascagni, and Y. Li , Convergence Analysis of Markov Chain Monte Carlo Linear Solvers Using Ulam--von Neumann Algorithm , SIAM Journal on Numerical Analysis, 51 (2013), pp. 2107--2122, https://arxiv.org/abs/https://doi.org/10.1137/130904867

work page doi:10.1137/130904867 2013

[19] [19]

Kalantzis, C

V. Kalantzis, C. Bekas, A. Curioni, and E. Gallopoulos , Accelerating data uncertainty quantification by solving linear systems with multiple right-hand sides , Numerical Algorithms, 62 (2013), pp. 637--653, https://doi.org/10.1007/s11075-012-9687-2

work page doi:10.1007/s11075-012-9687-2 2013

[20] [20]

D. E. Keyes, L. C. Mcinnes, C. Woodward, W. Gropp, E. Myra, M. Pernice, J. Bell, J. Brown, A. Clo, J. Connors, E. Constantinescu, D. Estep, K. Evans, C. Farhat, A. Hakim, G. Hammond, G. Hansen, J. Hill, T. Isaac, X. Jiao, K. Jordan, D. Kaushik, E. Kaxiras, A. Koniges, K. Lee, A. Lott, Q. Lu, J. Magerlein, R. Maxwell, M. Mccourt, M. Mehl, R. Pawlowski, A. ...

work page doi:10.1177/1094342012468181 2013

[21] [21]

a ndli, A. Singh, S. M. M \

M. Le Gallo, R. Khaddam-Aljameh, M. Stanisavljevic, A. Vasilopoulos, B. Kersting, M. Dazzi, G. Karunaratne, M. Br \"a ndli, A. Singh, S. M. M \"u ller, J. B \"u chel, X. Timoneda, V. Joshi, M. J. Rasch, U. Egger, A. Garofalo, A. Petropoulos, T. Antonakopoulos, K. Brew, S. Choi, I. Ok, T. Philip, V. Chan, C. Silvestre, I. Ahsan, N. Saulnier, V. Narayanan, ...

work page doi:10.1038/s41928-023-01010-1 2023

[22] [22]

L. Pan, P. Zuo, Y. Luo, Z. Sun, and R. Huang , BlockAMC: Scalable In-Memory Analog Matrix Computing for Solving Linear Systems , in 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024, pp. 1--6, https://doi.org/10.23919/DATE58400.2024.10546501

work page doi:10.23919/date58400.2024.10546501 2024

[23] [23]

M. J. Rasch, C. Mackin, M. Le Gallo, A. Chen, A. Fasoli, F. Odermatt, N. Li, S. R. Nandakumar, P. Narayanan, H. Tsai, G. W. Burr, A. Sebastian, and V. Narayanan , Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators , Nature Communications, 14 (2023), p. 5282, https://doi.org/10...

work page doi:10.1038/s41467-023-40770-4 2023

[24] [24]

M. J. Rasch, D. Moreda, T. Gokmen, M. Le Gallo, F. Carta, C. Goldberg, K. El Maghraoui, A. Sebastian, and V. Narayanan , A Flexible and Fast PyTorch Toolkit for Simulating Training and Inference on Analog Crossbar Arrays , in 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2021, pp. 1--4, https://doi.org/10....

work page doi:10.1109/aicas51828.2021.9458494 2021

[25] [25]

Saad , A Flexible Inner-Outer Preconditioned GMRES Algorithm , SIAM Journal on Scientific Computing, 14 (1993), pp

Y. Saad , A Flexible Inner-Outer Preconditioned GMRES Algorithm , SIAM Journal on Scientific Computing, 14 (1993), pp. 461--469, https://doi.org/10.1137/0914028, https://doi.org/10.1137/0914028, https://arxiv.org/abs/https://doi.org/10.1137/0914028

work page doi:10.1137/0914028 1993

[26] [26]

Saad , Iterative Methods for Sparse Linear Systems , SIAM, 2nd ed., 2003

Y. Saad , Iterative Methods for Sparse Linear Systems , SIAM, 2nd ed., 2003

2003

[27] [27]

Saad and H

Y. Saad and H. A. van der Vorst , Iterative solution of linear systems in the 20th century , Journal of Computational and Applied Mathematics, 123 (2000), pp. 1--33, https://doi.org/https://doi.org/10.1016/S0377-0427(00)00412-X, https://www.sciencedirect.com/science/article/pii/S037704270000412X. Numerical Analysis 2000. Vol. III: Linear Algebra

work page doi:10.1016/s0377-0427(00)00412-x 2000

[28] [28]

J. Shi, R. Li, Y. Xi, Y. Saad, and M. V. de Hoop , A Non-perturbative Approach to Computing Seismic Normal Modes in Rotating Planets , J. Sci. Comput., 91 (2022), https://doi.org/10.1007/s10915-022-01836-5, https://doi.org/10.1007/s10915-022-01836-5

work page doi:10.1007/s10915-022-01836-5 2022

[29] [29]

W. Wan, R. Kubendran, C. Schaefer, S. B. Eryilmaz, W. Zhang, D. Wu, S. Deiss, P. Raina, H. Qian, B. Gao, S. Joshi, H. Wu, H.-S. P. Wong, and G. Cauwenberghs , A compute-in-memory chip based on resistive random-access memory , Nature, 608 (2022), pp. 504--512, https://doi.org/10.1038/s41586-022-04992-8, https://doi.org/10.1038/s41586-022-04992-8

work page doi:10.1038/s41586-022-04992-8 2022

[30] [30]

Communications of the ACM , issue_date =

S. Williams, A. Waterman, and D. Patterson , Roofline: an insightful visual performance model for multicore architectures , Commun. ACM, 52 (2009), p. 65–76, https://doi.org/10.1145/1498765.1498785, https://doi.org/10.1145/1498765.1498785

work page doi:10.1145/1498765.1498785 2009

[31] [31]

W. A. Wulf and S. A. McKee , Hitting the memory wall: implications of the obvious , SIGARCH Comput. Archit. News, 23 (1995), p. 20–24, https://doi.org/10.1145/216585.216588, https://doi.org/10.1145/216585.216588

work page doi:10.1145/216585.216588 1995

[32] [32]

Y. Xi, R. Li, and Y. Saad , An Algebraic Multilevel Preconditioner with Low-Rank Corrections for Sparse Symmetric Matrices , SIAM Journal on Matrix Analysis and Applications, 37 (2016), pp. 235--259, https://doi.org/10.1137/15M1021830, https://doi.org/10.1137/15M1021830

work page doi:10.1137/15m1021830 2016

[33] [33]

T. P. Xiao, C. H. Bennett, B. Feinberg, S. Agarwal, and M. J. Marinella , Analog architectures for neural network acceleration based on non-volatile memory , Applied Physics Reviews, 7 (2020), p. 031301, https://doi.org/10.1063/1.5143815, https://doi.org/10.1063/1.5143815, https://arxiv.org/abs/https://pubs.aip.org/aip/apr/article-pdf/doi/10.1063/1.514381...

work page doi:10.1063/1.5143815 2020

[34] [34]

T. Xu, V. Kalantzis, R. Li, Y. Xi, G. Dillon, and Y. Saad , parGeMSLR: A parallel multilevel Schur complement low-rank preconditioning and solution package for general sparse matrices , Parallel Comput., 113 (2022), https://doi.org/10.1016/j.parco.2022.102956, https://doi.org/10.1016/j.parco.2022.102956

work page doi:10.1016/j.parco.2022.102956 2022

[35] [35]

T. Xu, R. P. Li, and Y. Xi , Neural Approximate Inverse Preconditioners , 2025, https://arxiv.org/abs/2510.13034, https://arxiv.org/abs/2510.13034

arXiv 2025

[36] [36]

T. Xu, Z. Zhang, J. Chen, Y. Saad, and Y. Xi , Mixed Precision Orthogonalization-Free Projection Methods for Eigenvalue and Singular Value Problems , 2025, https://arxiv.org/abs/2505.00281, https://arxiv.org/abs/2505.00281

arXiv 2025

[37] [37]

P. Zuo, Q. Wang, Y. Luo, R. Xie, S. Wang, Z. Cheng, L. Bao, Z. Wang, Y. Cai, R. Huang, and Z. Sun , Precise and scalable analogue matrix equation solving using resistive random-access memory chips , Nature Electronics, 8 (2025), pp. 1222--1233, https://doi.org/10.1038/s41928-025-01477-0, https://doi.org/10.1038/s41928-025-01477-0

work page doi:10.1038/s41928-025-01477-0 2025

[38] [38]

Świrydowicz, N

K. Świrydowicz, N. Koukpaizan, M. Alam, S. Regev, M. Saunders, and S. Peleš , Iterative methods in GPU-resident linear solvers for nonlinear constrained optimization , Parallel Computing, 123 (2025), p. 103123, https://doi.org/https://doi.org/10.1016/j.parco.2024.103123, https://www.sciencedirect.com/science/article/pii/S0167819124000619

work page doi:10.1016/j.parco.2024.103123 2025