A cell-wise primary storage for continuous high-order FEM enables exact matrix-free CG iteration on GPUs by confining communication to the preconditioner and realizing DSS via sequential axis face exchanges without gather-scatter or atomics.
arXiv preprint arXiv:2512.18134
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role and citation-polarity summary
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Roles: background (1)
Polarities: background (1)
Representative citing papers
TLX introduces MIMW-based extensions to Triton that let developers orchestrate warp-group execution and asynchronous hardware features while preserving blocked programming productivity, with kernels deployed in large-scale training and inference.
citing papers explorer
- Coalesced Matrix-Free Finite Elements in Cell-Wise Storage
  A cell-wise primary storage for continuous high-order FEM enables exact matrix-free CG iteration on GPUs by confining communication to the preconditioner and realizing DSS via sequential axis face exchanges without gather-scatter or atomics.
- TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments
  TLX introduces MIMW-based extensions to Triton that let developers orchestrate warp-group execution and asynchronous hardware features while preserving blocked programming productivity, with kernels deployed in large-scale training and inference.