arXiv (2025), 10.48550/arXiv.2506.09092
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers:
- CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
  CODA re-expresses most non-attention Transformer computations as GEMM-plus-epilogue programs using a constrained set of composable primitives to keep intermediate results on-chip and cut global memory traffic.
- Generated, Parallel, Scalable? A Study of Agentic AI-Generated Julia Code on Supercomputers
  Empirical study of agentic LLM generation of parallel Julia code finds reliable execution only at small scales with recurring failures in task dependencies and scheduling at larger scales.
- HTAM: Hierarchical Transition-Attended Memory for Operator Optimization
  HTAM builds a Hierarchical Transition Graph to organize coarse global directions and detailed local strategies for guiding LLM-based CUDA kernel optimization, improving results on KernelBench.