Tensormd: Molecular dynamics simulation with ab initio accuracy of 50 billion atoms
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (2)
polarities: background (2)
representative citing papers
SMC-AI scales Monte Carlo simulations to 4 trillion atoms on AI hardware clusters, achieving 32 times larger systems and 1.3 times higher throughput than prior records while decoupling ML models from the simulation core.
citing papers explorer
- AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
  AsyncSparse presents BCSR and WCSR kernels that use TMA and warp specialization to accelerate SpMM, outperforming prior libraries by 1.47-6.24x on SuiteSparse and achieving 2.66x end-to-end speedup on Qwen2.5-7B at 90% block sparsity.
- SMC-AI: Scaling Monte Carlo Simulation to Four Trillion Atoms with AI Accelerators
  SMC-AI scales Monte Carlo simulations to 4 trillion atoms on AI hardware clusters, achieving 32 times larger systems and 1.3 times higher throughput than prior records while decoupling ML models from the simulation core.