WingSpan: Concurrency and Dependence for Sparse and Structured Tensor Compilers

Adrian Gushin; Sang Yoon Kim; Willow Ahrens

arxiv: 2606.20855 · v1 · pith:WDXGNJSDnew · submitted 2026-06-18 · 💻 cs.MS

WingSpan: Concurrency and Dependence for Sparse and Structured Tensor Compilers

Adrian Gushin , Sang Yoon Kim , Willow Ahrens This is my paper

Pith reviewed 2026-06-26 14:26 UTC · model grok-4.3

classification 💻 cs.MS

keywords sparse tensorsparallelismdependence theorytensor compilersSpGEMMconcurrencysafety analysisdata structures

0 comments

The pith

WingSpan enables arbitrary parallel compositions of loops and data structures for sparse tensors while using a dependence theory to guarantee safety.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing systems often restrict parallelism when sparse tensors appear as outputs or multiple inputs. WingSpan removes those limits by supporting any mix of parallel loops and structures. The language achieves performance at or above hand-written parallel code on kernels such as SpGEMM. A dependence theory determines which compositions remain safe. If the theory holds, programmers can write parallel sparse tensor code without manual safety checks or performance loss.

Core claim

WingSpan shows that a dependence theory tailored to sparse tensors and related structures can certify the safety of unrestricted parallel compositions, and that a compiler realizing this theory can match or exceed the speed of hand-optimized parallel routines on representative kernels.

What carries the argument

The dependence theory that classifies parallel compositions involving sparse tensors as safe or unsafe.

If this is right

Any safe combination of parallel loops over sparse inputs and outputs can be written directly in the language.
Performance on SpGEMM and similar kernels reaches the level of manually tuned parallel code.
The same safety analysis applies to data structures that go beyond simple sparsity.
Parallel programs no longer require separate hand-written synchronization for sparse cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dependence rules could be ported to other tensor compilers to add parallelism without new language features.
Hardware designers might use the theory to decide which sparse patterns warrant special parallel support.
Large-scale sparse applications such as graph analytics could gain automatic parallelization once the theory is implemented.

Load-bearing premise

The dependence theory correctly flags every unsafe parallel composition that involves sparse outputs or multiple sparse inputs.

What would settle it

A concrete parallel program the theory labels safe yet produces wrong answers from concurrent writes on sparse tensor entries would disprove the claim.

Figures

Figures reproduced from arXiv: 2606.20855 by Adrian Gushin, Sang Yoon Kim, Willow Ahrens.

**Figure 1.** Figure 1: A fiber tree representation of a sparse matrix in CSC format, with a dense outer level, a sparse inner level, and an element level of leaves that store values [2, 41]. Note that each index within a level’s fibers maps to a position in its child level. When the child is a leaf, this map materializes directly to a payload (value). position, or an identifier in the bulk pool of subfibers. When we need to refe… view at source ↗

**Figure 3.** Figure 3: Two example kernels. The access is independent on the left because no two loop iterations will modify 𝑐 with the same value for 𝑖. Thus, the kernel is safe to parallelize. The access on the right is dependent on itself since adjacent loop iterations modify the same element of 𝑐. This potential race means that the kernel may not be safe to parallelize. iterations, or between two accesses in the same iterati… view at source ↗

**Figure 2.** Figure 2: Examples of sparse matrix structures represented in the hierarchical format language of Finch[2]. 2.3 Classical Dependence Analysis Safe parallel code requires avoiding race conditions. Classical dependence analysis can determine when it is possible to safely transform or parallelize loops over dense arrays[24]. To introduce vocabulary, consider an 𝑛-dimensional tensor 𝐴 mapping 𝑛-tuples of integer coordi… view at source ↗

**Figure 4.** Figure 4: A fibertree disambiguation of a column-major 3 × 4 sparse tensor. Three memory accesses are being made to subscripts (2, 1), (3, 1), and (2, 2), denoted by black, red, and orange arrows respectively. Observe that despite all the subscripts referring to different elements, pairs (2, 1) and (3, 1) access the same memory location in the first level of the tensor (highlighted in yellow). In contrast, subscript… view at source ↗

**Figure 5.** Figure 5: The Grammar of WingSpan. Highlighted terms represent the new concepts added in this paper. WingSpan builds on Finch by adding several new language features to facilitate parallelism. WingSpan introduces four modifier levels to support parallelism, the mutex, isolate, shard and merge levels, which wrap Finch levels to protect them. WingSpan coordinates these levels with Finch’s existing framework by adding… view at source ↗

**Figure 7.** Figure 7: Three proposed level formats to handle concurrent updates. When processors write to separate locations, we can split the data structure across processors with the shard level. When processors write to the same locations, we need to use locks or keep different copies on each processor and merge them at the end of computation [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗

**Figure 8.** Figure 8: The stability 𝑆 ′ of each level at mode 𝑙, expressed in terms of the stability 𝑆 of the wrapped levels when applicable. Only devices which unwrap, increment, or unfurl the modifier may use the modifier for stability. Unlike traditional levels, modifier levels do not introduce new modes into the tensor. Instead, they wrap the behavior of existing level access functions like unwrap, increment, and unfurl, w… view at source ↗

**Figure 9.** Figure 9: An annotated nested inner product (left) and its output 𝐶 (right). Blue, green, and yellow highlight code running on the serial device, dev1, and dev2, respectively. Entering a parallel loop transfers in-scope tensors to that loop’s device. The tree shows 𝐶’s structure on each device: a vector of vectors (serial), a vector of elements (dev1), or a single element (dev2) [PITH_FULL_IMAGE:figures/full_fig_p0… view at source ↗

**Figure 10.** Figure 10: SpGEMM strong and weak scaling on randomly generated sparse matrices with 12 non-zeroes per row. comparable software may lack parallelism or an efficient routine. Second, this generality does not sacrifice performance; WingSpan competes with specialized, hand-written routines. Finally, WingSpan handles structure beyond sparsity, supporting parallelism over tensors with run-length patterns. For reproduci… view at source ↗

**Figure 11.** Figure 11: SpGEMM speedup on the small (left) and large (right) Zhang matrices. Inner Product dev=cpu(:t,nt) C=Tensor(Dense(Shard( dev,SparseList(Element(0.0))))) C.=0 for j=parallel(_,dev,sch) for i=_ for k=_ C[i,j]+=AT[k,i]*B[k,j] Outer Product dev=cpu(:t,nt) C=Tensor(Merge(dev,SparseDict( SparseDict(Element(0.0))))) C.=0 for k=parallel(_,dev,sch) for j=_ for i=_ C[i,j]+=A[i,k]*BT[j,k] Gustavson’s Algorithm dev=cp… view at source ↗

**Figure 12.** Figure 12: SpGEMM implemented in WingSpan. For brevity, trailing end scope-closing annotations are omitted. is erratic and Eigen fails to parallelize. The speedup and scaling performance show that WingSpan’s abstraction provides additional versatility with little to no performance tradeoff. MKL outperforms WingSpan on the outlier roadnet and m133 matrices. These products are large and particularly sparse, even amon… view at source ↗

**Figure 13.** Figure 13: SpAdd and Hadamard speedups on the large sparse Zhang SNAP matrices, computing 𝐶 = 𝐴 op 𝐵 where op ∈ {+, ⊙}. Top row: identity case (𝐴 = 𝐵); bottom row: standard case (𝐵 = random permutation of 𝐴) [PITH_FULL_IMAGE:figures/full_fig_p010_13.png] view at source ↗

**Figure 15.** Figure 15: On left, WingSpan’s performance on MTTKRP relative to TACO. On right, sparse MTTKRP scaling. On average, WingSpan matches TACO’s performance (0.96× speedup). TACO notably outperforms WingSpan by a factor of 2.82× on the Nell-1 data because WingSpan’s Finch backend is not optimized for this kernel. TACO is 3.83× faster than Finch in serial on the same data, suggesting that WingSpan’s parallelization strate… view at source ↗

**Figure 16.** Figure 16: Left: Histogram scaling. Indexing the output tensor with an RGB triple returns the number of pixels with that color. Right: Structured add scaling. framework applies across kernels using arbitrary level formats, we compute an RGB histogram of 5 of the sparsest, (most fill background pixels), images from a Kaggle dataset of web screenshots [4]. The input is an image represented with run-length encoding. … view at source ↗

read the original abstract

Sparse tensors represent data that is mostly zero or some other compressible fill pattern. Such datasets can be massive, so optimized tensor algebra libraries and compilers have been developed to exploit these patterns to improve performance. Existing systems, however, frequently lack support for parallelism, especially when outputs are sparse or multiple inputs are sparse. We propose WingSpan, a sparse tensor language enabling unrestricted parallel programming. WingSpan supports arbitrary composition of parallel loops and data structures, matching or exceeding the performance of hand-optimized parallel routines on critical kernels such as SpGEMM. We also introduce a dependence theory for the safety of parallel programs involving sparse tensors and structures beyond sparsity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

WingSpan adds a dependence theory and language support for unrestricted parallel composition on sparse tensors, with performance claims on SpGEMM that look worth checking.

read the letter

The core contribution is a sparse tensor language that lets you compose parallel loops and data structures without the usual restrictions when outputs or multiple inputs are sparse, plus a dependence theory meant to guarantee safety for those programs. That combination is what stands out from the abstract.

It does a clean job framing the problem: existing compilers often block parallelism in exactly the cases that matter for real sparse workloads. The claim that it matches or beats hand-optimized SpGEMM kernels is the kind of concrete target that makes the work testable.

The soft spots are the usual ones for an early paper. The dependence theory is only sketched at the abstract level, so it is not yet possible to see whether it is complete, overly conservative, or introduces hidden restrictions once you move past simple cases. The performance numbers are presented as empirical results, but without the full experimental section it is hard to judge baseline quality, measurement methodology, or whether the gains come from the new theory or from other implementation choices. No proof outlines or counter-example handling appear in the provided description.

This is for people who build or use tensor compilers and care about safe parallelism in sparse linear algebra. A reader already working on dependence analysis or parallel sparse kernels would get the most out of it.

The work is coherent on its own terms and addresses a real gap, so it deserves a serious referee even if the theory and experiments need tightening.

Referee Report

2 major / 0 minor

Summary. The paper introduces WingSpan, a sparse tensor language that supports arbitrary composition of parallel loops and data structures to enable unrestricted parallel programming. It presents a dependence theory for ensuring the safety of such parallel programs involving sparse tensors and structures beyond sparsity, and claims that the system matches or exceeds the performance of hand-optimized parallel routines on kernels such as SpGEMM.

Significance. If the dependence theory correctly identifies unsafe compositions and the implementation delivers the claimed performance without hidden restrictions on parallelism, the work would address a notable gap in existing sparse tensor compilers by enabling more flexible parallel compositions than current systems.

major comments (2)

[Abstract] Abstract: the central performance claim ('matching or exceeding the performance of hand-optimized parallel routines on critical kernels such as SpGEMM') is stated without reference to any experimental section, table, figure, or benchmark data, so the empirical result cannot be evaluated.
[Abstract] Abstract: the dependence theory is asserted to ensure safety for parallel programs with sparse outputs or multiple sparse inputs, but no axioms, definitions, or proof outline is supplied, preventing assessment of whether it correctly identifies all unsafe compositions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments. We address each major comment below and propose revisions where appropriate to improve clarity and evaluability of the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the central performance claim ('matching or exceeding the performance of hand-optimized parallel routines on critical kernels such as SpGEMM') is stated without reference to any experimental section, table, figure, or benchmark data, so the empirical result cannot be evaluated.

Authors: We agree that the abstract would benefit from explicit references to the supporting experimental results. The full manuscript includes performance evaluations in the experimental section with direct comparisons to hand-optimized baselines on SpGEMM and related kernels. We will revise the abstract to cite the relevant figures and tables. revision: yes
Referee: [Abstract] Abstract: the dependence theory is asserted to ensure safety for parallel programs with sparse outputs or multiple sparse inputs, but no axioms, definitions, or proof outline is supplied, preventing assessment of whether it correctly identifies all unsafe compositions.

Authors: The abstract is intended as a concise summary. The manuscript develops the dependence theory in detail, including the relevant axioms, definitions, and proof outlines in the sections presenting the theory. We will update the abstract to reference those sections so readers can locate the formal development immediately. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and description introduce WingSpan as a new language with a dependence theory for sparse tensor parallelism, claiming empirical performance matching on kernels like SpGEMM. No equations, fitted parameters, self-citations, or derivation steps are present that reduce claims to inputs by construction. The central claims rest on the proposed system and external empirical validation rather than any self-definitional, fitted-input, or self-citation load-bearing patterns. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5634 in / 934 out tokens · 12769 ms · 2026-06-26T14:26:27.322760+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references · 20 canonical work pages

[1]

cuSPARSE.https://docs.nvidia.com/cuda/cusparse/index.html

2024. cuSPARSE.https://docs.nvidia.com/cuda/cusparse/index.html

2024
[2]

Willow Ahrens, Teodoro Fields Collin, Radha Patel, Kyle Deeds, Chang- wan Hong, and Saman Amarasinghe. 2025. Finch: Sparse and Struc- tured Tensor Programming with Control Flow.Proc. ACM Program. Lang.9, OOPSLA1 (April 2025). doi:10.1145/3720473Place: New York, NY, USA Publisher: Association for Computing Machinery

work page doi:10.1145/3720473place: 2025
[3]

Willow Ahrens, Daniel Donenfeld, Fredrik Kjolstad, and Saman Ama- rasinghe. 2023. Looplets: A Language for Structured Coiteration. In Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization (CGO). Association for Computing Ma- chinery, New York, NY, USA, 41–54. doi:10.1145/3579990.3580020

work page doi:10.1145/3579990.3580020 2023
[4]

Fahri Aydos. 2020. WebScreenshots. doi:10.34740/KAGGLE/DS/202248

work page doi:10.34740/kaggle/ds/202248 2020
[5]

Mohsen Aznaveh, Jinhao Chen, Timothy A Davis, Bálint Hegyi, Scott P Kolodziej, Timothy G Mattson, and Gábor Szárnyas. 2020. Parallel GraphBLAS with OpenMP*. In2020 Proceedings of the SIAM Workshop on Combinatorial Scientific Computing. SIAM, 138–148

2020
[6]

2020.PETSc Users Manual (Rev

S Balay, S Abhyankar, Mark F Adams, J Brown, P Brune, K Buschelman, L Dalcin, A Dener, V Eijkhout, W Gropp, and others. 2020.PETSc Users Manual (Rev. 3.13). Technical Report. Argonne National Lab.(ANL), Argonne, IL (United States)

2020
[7]

1997.Dependence Analysis

Utpal Banerjee. 1997.Dependence Analysis. Springer

1997
[8]

Aart Bik, Penporn Koanantakool, Tatiana Shpeisman, Nicolas Vasi- lache, Bixia Zheng, and Fredrik Kjolstad. 2022. Compiler Support for Sparse Tensor Computations in MLIR.ACM Transactions on Architecture and Code Optimization19, 4 (Sept. 2022), 50:1–50:25. doi:10.1145/3544559

work page doi:10.1145/3544559 2022
[9]

Stephen Chou and Saman Amarasinghe. 2022. Compilation of dy- namic sparse tensor algebra.Proceedings of the ACM on Programming Languages6, OOPSLA2 (Oct. 2022), 175:1408–175:1437. doi:10.1145/ 3563338

2022
[10]

Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe. 2018. For- mat abstraction for sparse tensor algebra compilers.Proceedings of the ACM on Programming Languages2, OOPSLA (Oct. 2018), 123:1–123:30. doi:10.1145/3276493

work page doi:10.1145/3276493 2018
[11]

Atharva Chougule, Alexander J Root, Rubens Lacouture, Bobby Yan, Rohan Yadav, and Fredrik Kjolstad. 2026. Partitioning Unstruc- tured Sparse Tensor Algebra for Load-Balanced Parallel Execution. arXiv:2604.17198 [cs.PL]https://arxiv.org/abs/2604.17198

Pith/arXiv arXiv 2026
[12]

Sivakumar

Anirban Dasgupta, Ravi Kumar, and D. Sivakumar. 2012. Sparse and Lopsided Set Disjointness via Information Theory. InApproximation, Randomization, and Combinatorial Optimization. Algorithms and Tech- niques (Lecture Notes in Computer Science), Anupam Gupta, Klaus Jansen, José Rolim, and Rocco Servedio (Eds.). Springer, Berlin, Hei- delberg, 517–528. doi:1...

work page doi:10.1007/978-3-642-32512-0_44 2012
[13]

Timothy A. Davis. 2019. Algorithm 1000: SuiteSparse:GraphBLAS: Graph Algorithms in the Language of Sparse Linear Algebra.ACM Trans. Math. Softw.45, 4, Article 44 (dec 2019), 25 pages. doi:10.1145/ 3322125

2019
[14]

Sofie De Cnudde, Yanou Ramon, David Martens, and Foster Provost
[15]

Deep learning on big, sparse, behavioral data.Big data7, 4 (2019), 286–307

2019
[16]

Daniel Donenfeld, Stephen Chou, and Saman Amarasinghe. 2022. Uni- fied Compilation for Lossless Compression and Sparse Computing. In2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 205–216. doi:10.1109/CGO53902.2022.9741282

work page doi:10.1109/cgo53902.2022.9741282 2022
[17]

Laurent El Ghaoui, Vu Pham, Guan-Cheng Li, Viet-An Duong, Ashok Srivastava, and Kanishka Bhaduri. 2013. Understanding large text corpora via sparse machine learning.Statistical Analysis and Data Mining: The ASA Data Science Journal6, 3 (2013), 221–242

2013
[18]

Jonathan Frankle and Michael Carbin. 2019. The Lottery Ticket Hy- pothesis: Finding Sparse, Trainable Neural Networks.arXiv:1803.03635 [cs](March 2019).http://arxiv.org/abs/1803.03635arXiv: 1803.03635

Pith/arXiv arXiv 2019
[19]

Graph500. 2025. November 2025 BFS.http://graph500.org/?page_id= 1410

2025
[20]

Gaël Guennebaud, Benoît Jacob, and others. 2010. Eigen v3.http: //eigen.tuxfamily.org

2010
[21]

Dorit S Hochbaum and Philipp Baumann. 2016. Sparse computation for large-scale data mining.IEEE Transactions on Big Data2, 2 (2016), 151–174

2016
[22]

Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Frédo Durand. 2019. Taichi: a language for high-performance computation on spatially sparse data structures.ACM Transactions on Graphics38, 6 (Nov. 2019), 201:1–201:16. doi:10.1145/3355089.3356506

work page doi:10.1145/3355089.3356506 2019
[23]

Intel Corporation. 2024. Developer Reference for Intel®oneAPI Math Kernel Library for Fortran.https://www.intel.com/content/www/us/ en/docs/onemkl/developer-reference-fortran/2024-0/overview.html

2024
[24]

John Jenkinson, Artyom Grigoryan, Mehdi Hajinoroozi, Raquel Díaz Hernández, Hayde Peregrina Barreto, Ariel Ortiz Esquivel, Leopoldo Altamirano, and Vahram Chavushyan. 2014. Machine learning and image processing in astronomy with sparse data sets. In2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 200–203

2014
[25]

2001.Optimizing Compilers For Mod- ern Architectures: A Dependence-Based Approach

Ken Kennedy and John R Allen. 2001.Optimizing Compilers For Mod- ern Architectures: A Dependence-Based Approach. Morgan Kaufmann Publishers Inc

2001
[26]

Owens, Marcin Zalewski, Timothy Mattson, and Jose Moreira

Jeremy Kepner, Peter Aaltonen, David Bader, Aydin Buluç, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Carl Yang, John D. Owens, Marcin Zalewski, Timothy Mattson, and Jose Moreira. 2016. Mathematical foundations of the GraphBLAS. In2016 IEEE High Per- formance Extreme Computing Conferen...

arXiv 2016
[27]

2011.Graph Algorithms in the Lan- guage of Linear Algebra

Jeremy Kepner and John Gilbert. 2011.Graph Algorithms in the Lan- guage of Linear Algebra. Society for Industrial and Applied Mathemat- ics, USA

2011
[28]

Fredrik Kjolstad, Stephen Chou, David Lugato, Shoaib Kamil, and Saman Amarasinghe. 2017. taco: a tool to generate tensor algebra kernels. InProceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017). IEEE Press, Urbana- Champaign, IL, USA, 943–948

2017
[29]

Kolodziej, Mohsen Aznaveh, Matthew Bullock, Jarrett David, Timothy A

Scott P. Kolodziej, Mohsen Aznaveh, Matthew Bullock, Jarrett David, Timothy A. Davis, Matthew Henderson, Yifan Hu, and Read Sandstrom
[30]

doi:10.21105/joss.01244

The SuiteSparse Matrix Collection Website Interface.Journal of Open Source Software4, 35 (2019), 1244. doi:10.21105/joss.01244

work page doi:10.21105/joss.01244 2019
[31]

Minseok Kong, Jiho Park, Daye Lee, Nikolaos Kourtzanidis, and Jung- min So. 2025. Simulating Mobile Robot Vision: An Analysis of RGB-D Versus RGB-Based Distance Accuracy and CPU Optimization. In2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). IEEE, 0871–0876

2025
[32]

Jie Liu, Zhongyuan Zhao, Zijian Ding, Benjamin Brock, Hongbo Rong, and Zhiru Zhang. 2024. UniSparse: An Intermediate Language for General Sparse Format Customization.Reproduction Package for Arti- cle ‘UniSparse: An Intermediate LanguageforGeneralSparseFormatCus- tomization’8, OOPSLA1 (April 2024), 99:137–99:165. doi:10.1145/ 3649816

2024
[33]

Tim Mattson, Timothy A Davis, Manoj Kumar, Aydin Buluc, Scott McMillan, José Moreira, and Carl Yang. 2019. LAGraph: A community effort to collect graph algorithms built on top of the GraphBLAS. In 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 276–284

2019
[34]

Erdal Mutlu, Ruiqin Tian, Bin Ren, Sriram Krishnamoorthy, Roberto Gioiosa, Jacques Pienaar, and Gokcen Kestor. 2020. COMET: A Domain- Specific Compilation of High-Performance Computational Chemistry. InInternational Workshop on Languages and Compilers for Parallel Computing. Springer, 87–103. WingSpan: Concurrency and Dependence for Sparse and Structured ...

2020
[35]

George, Kalyan Bhetwal, Michelle Mills Strout, Mary Hall, and Catherine Olschanowsky

Tobi Popoola, Tuowen Zhao, Aaron St. George, Kalyan Bhetwal, Michelle Mills Strout, Mary Hall, and Catherine Olschanowsky. 2023. Code Synthesis for Sparse Tensor Format Conversion and Optimiza- tion. InProceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization (CGO 2023). Association for Com- puting Machinery, New York, N...

work page doi:10.1145/3579990 2023
[36]

Jason Rumengan, Terry Yue Zhuo, and Conrad Sanderson. 2021. PyArmadillo: a streamlined linear algebra library for Python.Journal of Open Source Software6, 66 (2021), 3051. doi:10.21105/joss.03051

work page doi:10.21105/joss.03051 2021
[37]

Ryan Senanayake, Changwan Hong, Ziheng Wang, Amalee Wilson, Stephen Chou, Shoaib Kamil, Saman Amarasinghe, and Fredrik Kjol- stad. 2020. A sparse iteration space transformation framework for sparse tensor algebra.Proceedings of the ACM on Programming Lan- guages4, OOPSLA (Nov. 2020), 158:1–158:30. doi:10.1145/3428226

work page doi:10.1145/3428226 2020
[38]

Amir Shaikhha, Mathieu Huot, Jaclyn Smith, and Dan Olteanu. 2022. Functional collection programming with semi-ring dictionaries.Pro- ceedings of the ACM on Programming Languages6, OOPSLA1 (April 2022), 89:1–89:33. doi:10.1145/3527333

work page doi:10.1145/3527333 2022
[39]

Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis

Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis. 2017.FROSTT: The Formidable Repository of Open Sparse Tensors and Tools.http://frostt.io/

2017
[40]

Michelle Mills Strout, Geri Georg, and Catherine Olschanowsky. 2012. Set and Relation Manipulation for the Sparse Polyhedral Framework. InLanguages and Compilers for Parallel Computing (Lecture Notes in Computer Science). Springer, Berlin, Heidelberg, 61–75. doi:10.1007/ 978-3-642-37658-0_5

2012
[41]

Michelle Mills Strout, Mary Hall, and Catherine Olschanowsky. 2018. The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code.Proc. IEEE106, 11 (Nov. 2018), 1921–1934. doi:10.1109/JPROC.2018.2857721Conference Name: Proceedings of the IEEE

work page doi:10.1109/jproc.2018.2857721conference 2018
[42]

Michelle Mills Strout, Alan LaMielle, Larry Carter, Jeanne Ferrante, Barbara Kreaseck, and Catherine Olschanowsky. 2016. An approach for code generation in the Sparse Polyhedral Framework.Parallel Comput.53 (April 2016), 32–57. doi:10.1016/j.parco.2016.02.004

work page doi:10.1016/j.parco.2016.02.004 2016
[43]

Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. 2020. Efficient Processing of Deep Neural Networks.Synthesis Lectures on Computer Architecture15, 2 (June 2020), 1–341. doi:10.2200/ S01004ED1V01Y202004CAC050Publisher: Morgan & Claypool Pub- lishers

2020
[44]

Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, and Gokcen Kestor
[45]

2021).http://arxiv.org/abs/2102

A High-Performance Sparse Tensor Algebra Compiler in Multi- Level IR.arXiv:2102.05187 [cs](Feb. 2021).http://arxiv.org/abs/2102. 05187arXiv: 2102.05187

arXiv 2021
[46]

qblaze: An Efficient and Scalable Sparse Quantum Simulator

Hristo Venev, Thien Udomsrirungruang, Dimitar Dimitrov, Timon Gehr, and Martin Vechev. 2025. qblaze: An Efficient and Scalable Sparse Quantum Simulator.Artifact for "qblaze: An Efficient and Scalable Sparse Quantum Simulator"9, OOPSLA2 (Oct. 2025), 288:444–288:470. doi:10.1145/3763066

work page doi:10.1145/3763066 2025
[47]

Liang, Image classiﬁcation based on RESNET

Richard Vuduc, James W. Demmel, and Katherine A. Yelick. 2005. OSKI: A library of automatically tuned sparse matrix kernels.Journal of Physics: Conference Series16 (Jan. 2005), 521–530. doi:10.1088/1742- 6596/16/1/071Publisher: IOP Publishing

work page doi:10.1088/1742- 2005
[48]

Jaeyeon Won, Willow Ahrens, Saman Amarasinghe, and Joel S Emer
[49]

InProceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2

Insum: Sparse GPU Kernels Simplified and Optimized with Indi- rect Einsums. InProceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 993–1006
[50]

Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, and Luis Ceze. 2023. SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning. InProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Sys- tems, Volume 3 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 6...

work page doi:10.1145/3582016.3582047 2023
[51]

doi:10.1145/3445814.3446706

Guowei Zhang, Nithya Attaluri, Joel S. Emer, and Daniel Sanchez. 2021. Gamma: leveraging Gustavson’s algorithm to accelerate sparse matrix multiplication. InProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, Virtual USA, 687–701. doi:10.1145/3445814.3446702

work page doi:10.1145/3445814.3446702 2021
[52]

Tuowen Zhao, Tobi Popoola, Mary Hall, Catherine Olschanowsky, and Michelle Strout. 2022. Polyhedral specification and code generation of sparse tensor contraction with co-iteration.ACM Transactions on Architecture and Code Optimization20, 1 (2022), 1–26

2022
[53]

Xunyu Zhu, Jian Li, Yong Liu, Can Ma, and Weiping Wang. 2024. A Survey on Model Compression for Large Language Models.Transac- tions of the Association for Computational Linguistics12 (Nov. 2024), 1556–1577. doi:10.1162/tacl_a_00704

work page doi:10.1162/tacl_a_00704 2024

[1] [1]

cuSPARSE.https://docs.nvidia.com/cuda/cusparse/index.html

2024. cuSPARSE.https://docs.nvidia.com/cuda/cusparse/index.html

2024

[2] [2]

Willow Ahrens, Teodoro Fields Collin, Radha Patel, Kyle Deeds, Chang- wan Hong, and Saman Amarasinghe. 2025. Finch: Sparse and Struc- tured Tensor Programming with Control Flow.Proc. ACM Program. Lang.9, OOPSLA1 (April 2025). doi:10.1145/3720473Place: New York, NY, USA Publisher: Association for Computing Machinery

work page doi:10.1145/3720473place: 2025

[3] [3]

Willow Ahrens, Daniel Donenfeld, Fredrik Kjolstad, and Saman Ama- rasinghe. 2023. Looplets: A Language for Structured Coiteration. In Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization (CGO). Association for Computing Ma- chinery, New York, NY, USA, 41–54. doi:10.1145/3579990.3580020

work page doi:10.1145/3579990.3580020 2023

[4] [4]

Fahri Aydos. 2020. WebScreenshots. doi:10.34740/KAGGLE/DS/202248

work page doi:10.34740/kaggle/ds/202248 2020

[5] [5]

Mohsen Aznaveh, Jinhao Chen, Timothy A Davis, Bálint Hegyi, Scott P Kolodziej, Timothy G Mattson, and Gábor Szárnyas. 2020. Parallel GraphBLAS with OpenMP*. In2020 Proceedings of the SIAM Workshop on Combinatorial Scientific Computing. SIAM, 138–148

2020

[6] [6]

2020.PETSc Users Manual (Rev

S Balay, S Abhyankar, Mark F Adams, J Brown, P Brune, K Buschelman, L Dalcin, A Dener, V Eijkhout, W Gropp, and others. 2020.PETSc Users Manual (Rev. 3.13). Technical Report. Argonne National Lab.(ANL), Argonne, IL (United States)

2020

[7] [7]

1997.Dependence Analysis

Utpal Banerjee. 1997.Dependence Analysis. Springer

1997

[8] [8]

Aart Bik, Penporn Koanantakool, Tatiana Shpeisman, Nicolas Vasi- lache, Bixia Zheng, and Fredrik Kjolstad. 2022. Compiler Support for Sparse Tensor Computations in MLIR.ACM Transactions on Architecture and Code Optimization19, 4 (Sept. 2022), 50:1–50:25. doi:10.1145/3544559

work page doi:10.1145/3544559 2022

[9] [9]

Stephen Chou and Saman Amarasinghe. 2022. Compilation of dy- namic sparse tensor algebra.Proceedings of the ACM on Programming Languages6, OOPSLA2 (Oct. 2022), 175:1408–175:1437. doi:10.1145/ 3563338

2022

[10] [10]

Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe. 2018. For- mat abstraction for sparse tensor algebra compilers.Proceedings of the ACM on Programming Languages2, OOPSLA (Oct. 2018), 123:1–123:30. doi:10.1145/3276493

work page doi:10.1145/3276493 2018

[11] [11]

Atharva Chougule, Alexander J Root, Rubens Lacouture, Bobby Yan, Rohan Yadav, and Fredrik Kjolstad. 2026. Partitioning Unstruc- tured Sparse Tensor Algebra for Load-Balanced Parallel Execution. arXiv:2604.17198 [cs.PL]https://arxiv.org/abs/2604.17198

Pith/arXiv arXiv 2026

[12] [12]

Sivakumar

Anirban Dasgupta, Ravi Kumar, and D. Sivakumar. 2012. Sparse and Lopsided Set Disjointness via Information Theory. InApproximation, Randomization, and Combinatorial Optimization. Algorithms and Tech- niques (Lecture Notes in Computer Science), Anupam Gupta, Klaus Jansen, José Rolim, and Rocco Servedio (Eds.). Springer, Berlin, Hei- delberg, 517–528. doi:1...

work page doi:10.1007/978-3-642-32512-0_44 2012

[13] [13]

Timothy A. Davis. 2019. Algorithm 1000: SuiteSparse:GraphBLAS: Graph Algorithms in the Language of Sparse Linear Algebra.ACM Trans. Math. Softw.45, 4, Article 44 (dec 2019), 25 pages. doi:10.1145/ 3322125

2019

[14] [14]

Sofie De Cnudde, Yanou Ramon, David Martens, and Foster Provost

[15] [15]

Deep learning on big, sparse, behavioral data.Big data7, 4 (2019), 286–307

2019

[16] [16]

Daniel Donenfeld, Stephen Chou, and Saman Amarasinghe. 2022. Uni- fied Compilation for Lossless Compression and Sparse Computing. In2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 205–216. doi:10.1109/CGO53902.2022.9741282

work page doi:10.1109/cgo53902.2022.9741282 2022

[17] [17]

Laurent El Ghaoui, Vu Pham, Guan-Cheng Li, Viet-An Duong, Ashok Srivastava, and Kanishka Bhaduri. 2013. Understanding large text corpora via sparse machine learning.Statistical Analysis and Data Mining: The ASA Data Science Journal6, 3 (2013), 221–242

2013

[18] [18]

Jonathan Frankle and Michael Carbin. 2019. The Lottery Ticket Hy- pothesis: Finding Sparse, Trainable Neural Networks.arXiv:1803.03635 [cs](March 2019).http://arxiv.org/abs/1803.03635arXiv: 1803.03635

Pith/arXiv arXiv 2019

[19] [19]

Graph500. 2025. November 2025 BFS.http://graph500.org/?page_id= 1410

2025

[20] [20]

Gaël Guennebaud, Benoît Jacob, and others. 2010. Eigen v3.http: //eigen.tuxfamily.org

2010

[21] [21]

Dorit S Hochbaum and Philipp Baumann. 2016. Sparse computation for large-scale data mining.IEEE Transactions on Big Data2, 2 (2016), 151–174

2016

[22] [22]

Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Frédo Durand. 2019. Taichi: a language for high-performance computation on spatially sparse data structures.ACM Transactions on Graphics38, 6 (Nov. 2019), 201:1–201:16. doi:10.1145/3355089.3356506

work page doi:10.1145/3355089.3356506 2019

[23] [23]

Intel Corporation. 2024. Developer Reference for Intel®oneAPI Math Kernel Library for Fortran.https://www.intel.com/content/www/us/ en/docs/onemkl/developer-reference-fortran/2024-0/overview.html

2024

[24] [24]

John Jenkinson, Artyom Grigoryan, Mehdi Hajinoroozi, Raquel Díaz Hernández, Hayde Peregrina Barreto, Ariel Ortiz Esquivel, Leopoldo Altamirano, and Vahram Chavushyan. 2014. Machine learning and image processing in astronomy with sparse data sets. In2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 200–203

2014

[25] [25]

2001.Optimizing Compilers For Mod- ern Architectures: A Dependence-Based Approach

Ken Kennedy and John R Allen. 2001.Optimizing Compilers For Mod- ern Architectures: A Dependence-Based Approach. Morgan Kaufmann Publishers Inc

2001

[26] [26]

Owens, Marcin Zalewski, Timothy Mattson, and Jose Moreira

Jeremy Kepner, Peter Aaltonen, David Bader, Aydin Buluç, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Carl Yang, John D. Owens, Marcin Zalewski, Timothy Mattson, and Jose Moreira. 2016. Mathematical foundations of the GraphBLAS. In2016 IEEE High Per- formance Extreme Computing Conferen...

arXiv 2016

[27] [27]

2011.Graph Algorithms in the Lan- guage of Linear Algebra

Jeremy Kepner and John Gilbert. 2011.Graph Algorithms in the Lan- guage of Linear Algebra. Society for Industrial and Applied Mathemat- ics, USA

2011

[28] [28]

Fredrik Kjolstad, Stephen Chou, David Lugato, Shoaib Kamil, and Saman Amarasinghe. 2017. taco: a tool to generate tensor algebra kernels. InProceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017). IEEE Press, Urbana- Champaign, IL, USA, 943–948

2017

[29] [29]

Kolodziej, Mohsen Aznaveh, Matthew Bullock, Jarrett David, Timothy A

Scott P. Kolodziej, Mohsen Aznaveh, Matthew Bullock, Jarrett David, Timothy A. Davis, Matthew Henderson, Yifan Hu, and Read Sandstrom

[30] [30]

doi:10.21105/joss.01244

The SuiteSparse Matrix Collection Website Interface.Journal of Open Source Software4, 35 (2019), 1244. doi:10.21105/joss.01244

work page doi:10.21105/joss.01244 2019

[31] [31]

Minseok Kong, Jiho Park, Daye Lee, Nikolaos Kourtzanidis, and Jung- min So. 2025. Simulating Mobile Robot Vision: An Analysis of RGB-D Versus RGB-Based Distance Accuracy and CPU Optimization. In2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). IEEE, 0871–0876

2025

[32] [32]

Jie Liu, Zhongyuan Zhao, Zijian Ding, Benjamin Brock, Hongbo Rong, and Zhiru Zhang. 2024. UniSparse: An Intermediate Language for General Sparse Format Customization.Reproduction Package for Arti- cle ‘UniSparse: An Intermediate LanguageforGeneralSparseFormatCus- tomization’8, OOPSLA1 (April 2024), 99:137–99:165. doi:10.1145/ 3649816

2024

[33] [33]

Tim Mattson, Timothy A Davis, Manoj Kumar, Aydin Buluc, Scott McMillan, José Moreira, and Carl Yang. 2019. LAGraph: A community effort to collect graph algorithms built on top of the GraphBLAS. In 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 276–284

2019

[34] [34]

Erdal Mutlu, Ruiqin Tian, Bin Ren, Sriram Krishnamoorthy, Roberto Gioiosa, Jacques Pienaar, and Gokcen Kestor. 2020. COMET: A Domain- Specific Compilation of High-Performance Computational Chemistry. InInternational Workshop on Languages and Compilers for Parallel Computing. Springer, 87–103. WingSpan: Concurrency and Dependence for Sparse and Structured ...

2020

[35] [35]

George, Kalyan Bhetwal, Michelle Mills Strout, Mary Hall, and Catherine Olschanowsky

Tobi Popoola, Tuowen Zhao, Aaron St. George, Kalyan Bhetwal, Michelle Mills Strout, Mary Hall, and Catherine Olschanowsky. 2023. Code Synthesis for Sparse Tensor Format Conversion and Optimiza- tion. InProceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization (CGO 2023). Association for Com- puting Machinery, New York, N...

work page doi:10.1145/3579990 2023

[36] [36]

Jason Rumengan, Terry Yue Zhuo, and Conrad Sanderson. 2021. PyArmadillo: a streamlined linear algebra library for Python.Journal of Open Source Software6, 66 (2021), 3051. doi:10.21105/joss.03051

work page doi:10.21105/joss.03051 2021

[37] [37]

Ryan Senanayake, Changwan Hong, Ziheng Wang, Amalee Wilson, Stephen Chou, Shoaib Kamil, Saman Amarasinghe, and Fredrik Kjol- stad. 2020. A sparse iteration space transformation framework for sparse tensor algebra.Proceedings of the ACM on Programming Lan- guages4, OOPSLA (Nov. 2020), 158:1–158:30. doi:10.1145/3428226

work page doi:10.1145/3428226 2020

[38] [38]

Amir Shaikhha, Mathieu Huot, Jaclyn Smith, and Dan Olteanu. 2022. Functional collection programming with semi-ring dictionaries.Pro- ceedings of the ACM on Programming Languages6, OOPSLA1 (April 2022), 89:1–89:33. doi:10.1145/3527333

work page doi:10.1145/3527333 2022

[39] [39]

Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis

Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis. 2017.FROSTT: The Formidable Repository of Open Sparse Tensors and Tools.http://frostt.io/

2017

[40] [40]

Michelle Mills Strout, Geri Georg, and Catherine Olschanowsky. 2012. Set and Relation Manipulation for the Sparse Polyhedral Framework. InLanguages and Compilers for Parallel Computing (Lecture Notes in Computer Science). Springer, Berlin, Heidelberg, 61–75. doi:10.1007/ 978-3-642-37658-0_5

2012

[41] [41]

Michelle Mills Strout, Mary Hall, and Catherine Olschanowsky. 2018. The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code.Proc. IEEE106, 11 (Nov. 2018), 1921–1934. doi:10.1109/JPROC.2018.2857721Conference Name: Proceedings of the IEEE

work page doi:10.1109/jproc.2018.2857721conference 2018

[42] [42]

Michelle Mills Strout, Alan LaMielle, Larry Carter, Jeanne Ferrante, Barbara Kreaseck, and Catherine Olschanowsky. 2016. An approach for code generation in the Sparse Polyhedral Framework.Parallel Comput.53 (April 2016), 32–57. doi:10.1016/j.parco.2016.02.004

work page doi:10.1016/j.parco.2016.02.004 2016

[43] [43]

Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. 2020. Efficient Processing of Deep Neural Networks.Synthesis Lectures on Computer Architecture15, 2 (June 2020), 1–341. doi:10.2200/ S01004ED1V01Y202004CAC050Publisher: Morgan & Claypool Pub- lishers

2020

[44] [44]

Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, and Gokcen Kestor

[45] [45]

2021).http://arxiv.org/abs/2102

A High-Performance Sparse Tensor Algebra Compiler in Multi- Level IR.arXiv:2102.05187 [cs](Feb. 2021).http://arxiv.org/abs/2102. 05187arXiv: 2102.05187

arXiv 2021

[46] [46]

qblaze: An Efficient and Scalable Sparse Quantum Simulator

Hristo Venev, Thien Udomsrirungruang, Dimitar Dimitrov, Timon Gehr, and Martin Vechev. 2025. qblaze: An Efficient and Scalable Sparse Quantum Simulator.Artifact for "qblaze: An Efficient and Scalable Sparse Quantum Simulator"9, OOPSLA2 (Oct. 2025), 288:444–288:470. doi:10.1145/3763066

work page doi:10.1145/3763066 2025

[47] [47]

Liang, Image classiﬁcation based on RESNET

Richard Vuduc, James W. Demmel, and Katherine A. Yelick. 2005. OSKI: A library of automatically tuned sparse matrix kernels.Journal of Physics: Conference Series16 (Jan. 2005), 521–530. doi:10.1088/1742- 6596/16/1/071Publisher: IOP Publishing

work page doi:10.1088/1742- 2005

[48] [48]

Jaeyeon Won, Willow Ahrens, Saman Amarasinghe, and Joel S Emer

[49] [49]

InProceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2

Insum: Sparse GPU Kernels Simplified and Optimized with Indi- rect Einsums. InProceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 993–1006

[50] [50]

Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, and Luis Ceze. 2023. SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning. InProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Sys- tems, Volume 3 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 6...

work page doi:10.1145/3582016.3582047 2023

[51] [51]

doi:10.1145/3445814.3446706

Guowei Zhang, Nithya Attaluri, Joel S. Emer, and Daniel Sanchez. 2021. Gamma: leveraging Gustavson’s algorithm to accelerate sparse matrix multiplication. InProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, Virtual USA, 687–701. doi:10.1145/3445814.3446702

work page doi:10.1145/3445814.3446702 2021

[52] [52]

Tuowen Zhao, Tobi Popoola, Mary Hall, Catherine Olschanowsky, and Michelle Strout. 2022. Polyhedral specification and code generation of sparse tensor contraction with co-iteration.ACM Transactions on Architecture and Code Optimization20, 1 (2022), 1–26

2022

[53] [53]

Xunyu Zhu, Jian Li, Yong Liu, Can Ma, and Weiping Wang. 2024. A Survey on Model Compression for Large Language Models.Transac- tions of the Association for Computational Linguistics12 (Nov. 2024), 1556–1577. doi:10.1162/tacl_a_00704

work page doi:10.1162/tacl_a_00704 2024