Qwen2.5 technical report
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
  AsyncSparse presents BCSR and WCSR kernels that use TMA and warp specialization to accelerate SpMM, outperforming prior libraries by 1.47-6.24x on SuiteSparse and achieving 2.66x end-to-end speedup on Qwen2.5-7B at 90% block sparsity.
- Relation Reasoning with LLMs in Expensive Optimization
  R2SAEA fine-tunes an LLM with RL to reason about solution relations for surrogate-assisted evolutionary optimization, reporting improved relation prediction and SOTA performance on single- and multi-objective benchmarks.
- Automated Auditing of Hospital Discharge Summaries for Care Transitions
  An LLM-based framework automates auditing of discharge summaries using a DISCHARGED-derived checklist on MIMIC-IV data to detect missing or ambiguous documentation elements.