ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores

2024 · arXiv 7535.363847

2 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 1 · baseline: 1

citation-polarity summary

background: 1 · baseline: 1

representative citing papers

Stencil Computations on Cerebras Wafer-Scale Engine

cs.DC · 2026-05-08 · unverdicted · novelty 6.0

CStencil on the WSE-3 achieves up to 342x speedup for 2D stencils versus an adapted single-precision GPU solver and saturates both compute and on-chip memory bandwidth.

Matrix-Free 3D SIMP Topology Optimization with Fused Gather-GEMM-Scatter Kernels

cs.CE · 2026-04-20 · unverdicted · novelty 6.0

A fused gather-GEMM-scatter CUDA kernel achieves 4.6-7.3x end-to-end speedup and 3.2-4.9x lower energy for matrix-free 3D SIMP topology optimization on RTX 4090 compared to three-stage baselines.

citing papers explorer

Showing 2 of 2 citing papers.

Stencil Computations on Cerebras Wafer-Scale Engine
cs.DC · 2026-05-08 · unverdicted · none · ref 6

Matrix-Free 3D SIMP Topology Optimization with Fused Gather-GEMM-Scatter Kernels
cs.CE · 2026-04-20 · unverdicted · none · ref 50
