
arxiv: 2603.02376 · v2 · pith:2FZNT56J · submitted 2026-03-02 · 💻 cs.DC · cs.AR · cs.LG · cs.MA

CUCo: An Agentic Framework for Compute and Communication Co-design

classification: 💻 cs.DC · cs.AR · cs.LG · cs.MA
keywords: co-design, agent, agentic, communication, compute, cuco, framework, inference

Computation and communication in distributed LLM training and inference are traditionally optimized in isolation. Expert-crafted systems such as DeepEP, FLUX, and TokenWeave show the potential of co-designing the two, but they require deep systems expertise and hardware-specific tuning. CUCo is an agentic framework that automates compute-communication co-design of CUDA kernels. It combines a structured formalization of the design space with two agents: a correctness-first fast-path agent that produces reliable baselines, and an evolution-driven slow-path agent that searches for high-performance strategies. Across four multi-GPU workloads, CUCo achieves speedups of up to 1.57x. On a DeepSeek-V3 MoE layer, it discovers a two-stream overlap strategy that hides the dispatch behind local compute. The LLM inference cost of the search stays under $10 per workload.
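Why a two-stream overlap can pay off can be seen with a simple latency model. The sketch below is illustrative only and is not CUCo's method or cost model. It assumes a hypothetical MoE layer split into three phases: a dispatch (token communication to expert ranks), local compute that does not need the dispatched tokens, and expert compute that does. On one stream the phases run back to back. On two streams, dispatch and local compute run concurrently, and expert compute starts once both finish. The class name, function names, and millisecond figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MoELayerTimes:
    """Hypothetical per-phase latencies (ms) for one MoE layer."""
    dispatch: float        # communication: send tokens to expert ranks
    local_compute: float   # work that does not depend on dispatched tokens
    expert_compute: float  # work that needs the dispatched tokens


def sequential_latency(t: MoELayerTimes) -> float:
    # Single stream: each phase waits for the previous one to finish.
    return t.dispatch + t.local_compute + t.expert_compute


def two_stream_latency(t: MoELayerTimes) -> float:
    # Stream A runs the dispatch while stream B runs local compute.
    # Expert compute waits on both (e.g. via an event), so only the
    # longer of the two overlapped phases is on the critical path.
    return max(t.dispatch, t.local_compute) + t.expert_compute


def speedup(t: MoELayerTimes) -> float:
    # Ratio of single-stream to two-stream latency under this model.
    return sequential_latency(t) / two_stream_latency(t)


if __name__ == "__main__":
    t = MoELayerTimes(dispatch=4.0, local_compute=3.0, expert_compute=5.0)
    print(sequential_latency(t))   # 12.0
    print(two_stream_latency(t))   # 9.0
    print(f"{speedup(t):.2f}x")    # 1.33x
```

Under this model, the overlap helps most when dispatch and local compute take similar time. If local compute is at least as long as the dispatch, the dispatch disappears from the critical path entirely. Real kernels also contend for SMs and memory bandwidth, so measured gains are typically smaller than this idealized bound.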

