torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch

Mingyuan Chi; Shizheng Wen

arxiv: 2601.13994 · v2 · submitted 2026-01-20 · 💻 cs.DC · cs.AI

Pith reviewed 2026-05-16 12:34 UTC · model grok-4.3

keywords sparse linear algebra · automatic differentiation · PyTorch · distributed computing · scientific machine learning · adjoint method · batched solvers

The pith

torch-sla supplies one autograd-aware interface for sparse solvers that runs direct, iterative, nonlinear and eigenvalue problems across five backends with built-in batching and multi-GPU support.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces torch-sla to give PyTorch users a consistent way to call sparse linear solvers while obtaining correct gradients automatically. It unifies SciPy, Eigen, cuDSS, CuPy and a native iterative solver under a single API that dispatches by device and size. The library also adds batched solves and domain-decomposition parallelism with halo exchange. These features rest on an adjoint differentiation scheme whose memory cost stays constant regardless of solver depth. If the implementation succeeds, end-to-end differentiable models that embed large sparse systems become practical without hand-written gradient code.

Core claim

torch-sla exposes a single autograd-aware API for direct, iterative, nonlinear, and eigenvalue solvers across five interchangeable backends with automatic dispatch, batched solves, and distributed multi-GPU execution via domain decomposition with halo exchange, enabled by an O(1)-graph adjoint differentiation framework.

What carries the argument

The O(1)-graph adjoint differentiation framework, together with an autograd-compatible halo-exchange layer, keeps the autograd graph (and hence backward-pass memory) independent of the number of solver steps.
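For orientation, the textbook adjoint identity for a single linear solve is sketched below. It is the standard result for A x = b feeding a scalar loss L(x), and is offered as an assumption about what an adjoint framework of this kind computes, not as the paper's own derivation (the abstract supplies none). The symbol \lambda is introduced here for the adjoint variable.

```latex
% Linear solve A x = b followed by a scalar loss L(x).
% Differentiating gives dx = A^{-1} (db - dA\, x), so dL = \lambda^{\top} (db - dA\, x).
\begin{aligned}
A^{\top} \lambda &= \frac{\partial L}{\partial x}
  && \text{(one extra solve with the transposed matrix)} \\
\frac{\partial L}{\partial b} &= \lambda \\
\frac{\partial L}{\partial A_{ij}} &= -\lambda_i \, x_j
  && \text{(needed only for stored entries $(i,j)$ of the sparsity pattern)}
\end{aligned}
```

Because the backward pass needs only x, \lambda, and the matrix itself, nothing from inside the forward solver has to be recorded, which is the sense in which the graph cost does not grow with solver iterations.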

If this is right

Neural networks can embed sparse direct or iterative solves as differentiable layers without custom autograd rules.
Batched solves over shared or distinct sparsity patterns become available in one API call on both CPU and GPU.
Domain-decomposition parallelism with halo exchange scales sparse solves to multiple GPUs while preserving differentiability.
Automatic backend selection removes the need for separate CPU and GPU code paths in scientific ML pipelines.
Nonlinear and eigenvalue solvers now carry gradients, extending differentiable programming to a wider range of physics and optimization problems.
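The first consequence above, a sparse solve used as a differentiable layer, can be sketched with a custom autograd function. This is a minimal illustration under stated assumptions, not torch-sla's implementation or API: the class name SparseSolveAdjoint is hypothetical, SciPy's spsolve stands in for whichever backend would be dispatched, and the matrix is passed as COO triplets in float64 on CPU.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla
import torch


class SparseSolveAdjoint(torch.autograd.Function):
    """Hypothetical sketch: solve A x = b for A given as COO triplets (CPU, float64)."""

    @staticmethod
    def forward(ctx, values, rows, cols, b):
        n = b.shape[0]
        # Build the sparse matrix outside autograd; no solver internals are recorded.
        A = sp.csc_matrix(
            (values.detach().numpy(), (rows.numpy(), cols.numpy())), shape=(n, n)
        )
        x = torch.from_numpy(spla.spsolve(A, b.detach().numpy()))
        ctx.A = A
        ctx.save_for_backward(rows, cols, x)
        return x

    @staticmethod
    def backward(ctx, grad_x):
        rows, cols, x = ctx.saved_tensors
        # One adjoint solve A^T lam = dL/dx replaces backprop through the solver.
        lam = torch.from_numpy(
            spla.spsolve(ctx.A.T.tocsc(), grad_x.contiguous().numpy())
        )
        grad_values = -lam[rows] * x[cols]  # dL/dA_ij = -lam_i * x_j on stored entries
        return grad_values, None, None, lam  # dL/db = lam


# Usage: a 4x4 tridiagonal system whose nonzero values are trainable.
rows = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2, 3, 3])
cols = torch.tensor([0, 1, 0, 1, 2, 1, 2, 3, 2, 3])
values = torch.tensor(
    [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0],
    dtype=torch.float64, requires_grad=True,
)
b = torch.ones(4, dtype=torch.float64, requires_grad=True)
x = SparseSolveAdjoint.apply(values, rows, cols, b)
x.pow(2).sum().backward()  # fills values.grad and b.grad via the adjoint solve
```

The backward pass stores only the solution and the matrix, so its memory use does not depend on how the forward solve was carried out. An iterative backend could be swapped in without changing the gradient rule, provided it converges tightly enough for the adjoint identity to hold.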

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same adjoint technique could be ported to other frameworks to give them comparable sparse-solver support.
Real-world timing on large-scale scientific datasets would quantify whether the constant-graph overhead remains negligible in practice.
The library's design suggests a path toward fully automatic differentiation of domain-decomposition codes beyond linear algebra.
Adding support for additional sparse formats or preconditioners would be a direct next step that preserves the existing API.

Load-bearing premise

The adjoint differentiation framework and halo-exchange layer produce accurate gradients for every supported solver type and backend without instability or prohibitive overhead.
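To make the halo-exchange half of this premise concrete, here is a single-process sketch of domain decomposition for a 1D Laplacian stencil. It is an illustration only: the function names are hypothetical, the "exchange" is plain tensor slicing rather than inter-GPU communication, and the stencil is chosen for brevity rather than taken from the paper. It shows that when boundary values are copied by differentiable operations, the partitioned result and its gradients match the global computation.

```python
import torch


def laplacian_matvec(u):
    """Global 1D stencil y_i = 2 u_i - u_{i-1} - u_{i+1} with zero boundaries."""
    padded = torch.cat([u.new_zeros(1), u, u.new_zeros(1)])
    return 2 * u - padded[:-2] - padded[2:]


def partitioned_matvec(u, num_parts):
    """Same stencil, computed per subdomain after a simulated halo exchange."""
    chunks = list(torch.chunk(u, num_parts))
    outputs = []
    for rank, owned in enumerate(chunks):
        # "Halo exchange": fetch one boundary value from each neighbouring subdomain.
        left = chunks[rank - 1][-1:] if rank > 0 else owned.new_zeros(1)
        right = chunks[rank + 1][:1] if rank < len(chunks) - 1 else owned.new_zeros(1)
        padded = torch.cat([left, owned, right])
        outputs.append(2 * owned - padded[:-2] - padded[2:])
    return torch.cat(outputs)


u = torch.arange(10, dtype=torch.float64, requires_grad=True)
y_global = laplacian_matvec(u)
y_parts = partitioned_matvec(u, num_parts=3)
```

In a real multi-GPU setting the slicing would be replaced by communication whose backward pass sends gradients in the opposite direction. Whether torch-sla's layer does this correctly for every solver is exactly what the premise assumes.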

What would settle it

Run a gradient check that compares library gradients against finite differences on a distributed batched nonlinear solve; a mismatch larger than the expected finite-difference and floating-point error would show the central claim is false.
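The shape of such a check can be sketched on a small single-process problem. The code below is an assumption-laden stand-in: a dense torch.linalg.solve replaces the library call, the system is linear rather than nonlinear, and nothing is distributed or batched. The helper name central_difference_grad and the test problem are hypothetical. A real check would swap the library's solver into the loss function and repeat the comparison per backend.

```python
import torch


def central_difference_grad(f, p, eps=1e-6):
    """Approximate the gradient of scalar-valued f at p by central differences."""
    grad = torch.zeros_like(p)
    with torch.no_grad():
        for i in range(p.numel()):
            step = torch.zeros_like(p)
            step.view(-1)[i] = eps
            grad.view(-1)[i] = (f(p + step) - f(p - step)) / (2 * eps)
    return grad


def loss(p):
    """Stand-in for a library solve: dense (diag(p) + K) x = b, then sum(x^2)."""
    n = p.numel()
    off_diag = torch.ones(n - 1, dtype=p.dtype)
    K = 2 * torch.eye(n, dtype=p.dtype) - torch.diag(off_diag, 1) - torch.diag(off_diag, -1)
    b = torch.ones(n, dtype=p.dtype)
    x = torch.linalg.solve(torch.diag(p) + K, b)
    return x.pow(2).sum()


p = torch.tensor([1.0, 2.0, 3.0, 4.0], dtype=torch.float64, requires_grad=True)
(autograd_grad,) = torch.autograd.grad(loss(p), p)
fd_grad = central_difference_grad(loss, p.detach())
max_abs_error = (autograd_grad - fd_grad).abs().max().item()
```

Double precision matters here: in float32 the finite-difference quotient is dominated by rounding error, so a mismatch would not be informative. PyTorch's built-in torch.autograd.gradcheck automates the same comparison for functions with float64 inputs.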

Original abstract

Differentiable sparse linear algebra is foundational for scientific machine learning, yet PyTorch lacks a unified library for it: torch.sparse provides only low-level kernels and a non-differentiable, CPU-only spsolve, and torch.linalg is dense-only. We present torch-sla, an open-source library that fills this gap. It exposes a single autograd-aware API for direct, iterative, nonlinear, and eigenvalue solvers across five interchangeable backends -- SciPy and Eigen on CPU, cuDSS, CuPy, and a PyTorch-native iterative solver on GPU -- with automatic dispatch by device and problem size. The library further supports batched solves over shared or distinct sparsity patterns and distributed multi-GPU execution via domain decomposition with halo exchange. These capabilities are made scalable by an O(1)-graph adjoint differentiation framework and an autograd-compatible distributed halo-exchange layer. The library is available at https://www.torchsla.com/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

torch-sla supplies a unified differentiable sparse solver API for PyTorch with distributed support, but the O(1) adjoint claims rest on the described architecture alone; no validation is shown.

The core contribution is a single autograd-aware interface that dispatches across five backends for direct, iterative, nonlinear, and eigenvalue sparse solves, plus batched operation and multi-GPU domain decomposition with halo exchange. That combination did not exist as a maintained PyTorch library before, and it directly addresses the gap left by torch.sparse and torch.linalg for scientific ML work that needs gradients through sparse systems.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces torch-sla, an open-source PyTorch library providing a unified autograd-aware API for sparse linear algebra. It supports direct, iterative, nonlinear, and eigenvalue solvers across five interchangeable backends (SciPy/Eigen on CPU, cuDSS/CuPy/PyTorch-native on GPU) with automatic dispatch, batched solves over shared or distinct sparsity patterns, and distributed multi-GPU execution via domain decomposition with halo exchange. Scalability is achieved through an O(1)-graph adjoint differentiation framework and an autograd-compatible halo-exchange layer.
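The phrase "batched solves over shared sparsity patterns" can be read as one set of row and column indices reused across many value vectors. The sketch below illustrates that data layout only. The helper batched_solve_shared_pattern is hypothetical and not part of torch-sla, and it assembles dense matrices as a stand-in for a sparse backend. The distinct-pattern case is not covered.

```python
import torch


def batched_solve_shared_pattern(values, rows, cols, b):
    """Hypothetical helper: solve B systems that share one sparsity pattern.

    values: (B, nnz) per-system nonzeros; rows, cols: (nnz,) shared indices; b: (B, n).
    Dense assembly is a stand-in for a sparse backend, used here only for clarity.
    """
    batch, n = b.shape
    A = values.new_zeros(batch, n, n)
    A[:, rows, cols] = values  # same (row, col) slots filled for every batch item
    return torch.linalg.solve(A, b.unsqueeze(-1)).squeeze(-1)


# Two 3x3 tridiagonal systems; the second is the first scaled by 2.
rows = torch.tensor([0, 0, 1, 1, 1, 2, 2])
cols = torch.tensor([0, 1, 0, 1, 2, 1, 2])
base = torch.tensor([4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0], dtype=torch.float64)
scales = torch.tensor([[1.0], [2.0]], dtype=torch.float64)
values = (scales * base).requires_grad_(True)  # shape (2, 7)
b = torch.ones(2, 3, dtype=torch.float64)
x = batched_solve_shared_pattern(values, rows, cols, b)
x.sum().backward()  # values.grad has one gradient row per system
```

Sharing the pattern means index arrays are stored once and any symbolic analysis a direct solver performs could in principle be reused across the batch. Whether and how the library exploits this is not stated in the abstract.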

Significance. If the adjoint framework and halo-exchange layer are correctly implemented and validated, the library would address a clear gap in PyTorch for differentiable sparse computations, enabling new applications in scientific machine learning such as physics-informed neural networks and large-scale optimization. The multi-backend design and distributed support are practical strengths; the open-source release further increases potential impact.

major comments (2)

[Abstract] The claim that the O(1)-graph adjoint differentiation framework correctly computes gradients for iterative, nonlinear, and eigenvalue solvers (including under distributed halo exchange) is load-bearing for the central contribution, yet no derivation, implementation sketch, finite-difference verification, or per-solver stability analysis is supplied.
[Abstract] No benchmarks, gradient-accuracy tests, or numerical results are presented to substantiate claims of numerical stability, O(1) memory scaling, or correct behavior when swapping backends or moving to distributed execution; this absence prevents assessment of whether the autograd-compatible layers introduce instability or excessive overhead.

minor comments (1)

The abstract states the library URL but provides no concrete API signatures, usage examples, or installation instructions, which would help readers evaluate the claimed single-API design.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and will revise the paper to incorporate the requested supporting material.

Referee: [Abstract] The claim that the O(1)-graph adjoint differentiation framework correctly computes gradients for iterative, nonlinear, and eigenvalue solvers (including under distributed halo exchange) is load-bearing for the central contribution, yet no derivation, implementation sketch, finite-difference verification, or per-solver stability analysis is supplied.

Authors: We agree that the central claim requires explicit support. In the revised manuscript we will add a dedicated subsection deriving the O(1)-graph adjoint method for each solver class, including implementation sketches, finite-difference verification experiments, and a per-solver stability discussion that explicitly treats the distributed halo-exchange case. These additions will be placed in the methods and experiments sections.

revision: yes

Referee: [Abstract] No benchmarks, gradient-accuracy tests, or numerical results are presented to substantiate claims of numerical stability, O(1) memory scaling, or correct behavior when swapping backends or moving to distributed execution; this absence prevents assessment of whether the autograd-compatible layers introduce instability or excessive overhead.

Authors: We acknowledge the lack of empirical validation in the current version. The revised manuscript will include a new experiments section containing runtime and memory benchmarks confirming O(1) scaling, gradient-accuracy comparisons against finite differences for all solver types, numerical stability results, and cross-backend plus distributed-execution tests. These will quantify any overhead introduced by the autograd layers.

revision: yes

Circularity Check

0 steps flagged

No circularity: engineering implementation of existing solvers

This is a software library paper describing an API and implementation for differentiable sparse solvers in PyTorch. It introduces no mathematical derivations, fitted parameters, uniqueness theorems, or ansatzes. The O(1)-graph adjoint framework and halo-exchange layer are presented as engineering contributions whose correctness is asserted via implementation rather than derived from prior results within the paper. No load-bearing step reduces to a self-citation or to its own inputs by construction; the work is self-contained as code that wraps and extends standard backends.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software library contribution. No free parameters, mathematical axioms, or invented physical entities are required; the work consists of API design and backend integration.
