CUCo: An Agentic Framework for Compute and Communication Co-design

Aditya Akella; Bodun Hu; Saurabh Agarwal; Yoga Sri Varshan Varadharajan

arxiv: 2603.02376 · v2 · pith:2FZNT56Jnew · submitted 2026-03-02 · cs.DC · cs.AR · cs.LG · cs.MA

Yoga Sri Varshan Varadharajan, Bodun Hu, Saurabh Agarwal, Aditya Akella

keywords: co-design, agent, agentic, communication, compute, cuco, framework, inference

Computation and communication in distributed LLM training and inference are traditionally optimized in isolation. Expert-crafted systems such as DeepEP, FLUX, and TokenWeave show the potential of co-design, but they require deep systems expertise and hardware-specific tuning. CUCo is an agentic framework that automates compute-communication co-design of CUDA kernels. It combines a structured formalization of the design space with two agents: a correctness-first fast-path agent that produces reliable baselines, and an evolution-driven slow-path agent that searches for high-performance strategies. Across four multi-GPU workloads, CUCo achieves speedups of up to 1.57x. On a DeepSeek-V3 MoE layer, it discovers a two-stream overlap strategy that hides dispatch behind local compute. The LLM inference cost of running CUCo stays under $10 per workload.
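
The abstract's headline strategy, hiding MoE dispatch behind local compute with two streams, can be made concrete with a back-of-the-envelope latency model. This is a minimal, idealized sketch, not CUCo's kernel or cost model. The stage names (`dispatch`, `local_compute`, `remote_compute`, `combine`), the assumption that remote-expert compute waits for both dispatch and local compute, and the example latencies are all hypothetical. The model also ignores contention between communication and compute for SMs and memory bandwidth.

```python
from dataclasses import dataclass


@dataclass
class MoELayerTimes:
    """Hypothetical per-stage latencies (microseconds) for one MoE layer on one GPU."""
    dispatch: float        # all-to-all send of tokens to experts on other GPUs
    local_compute: float   # expert compute on tokens routed to this GPU's own experts
    remote_compute: float  # expert compute on tokens received from peer GPUs
    combine: float         # all-to-all return of expert outputs to their source GPUs


def serial_latency(t: MoELayerTimes) -> float:
    # Single stream: each stage starts only after the previous one finishes.
    return t.dispatch + t.local_compute + t.remote_compute + t.combine


def two_stream_latency(t: MoELayerTimes) -> float:
    # Idealized two-stream schedule (assumption): dispatch runs on a
    # communication stream while local-expert compute runs on a compute stream.
    # Remote compute needs the dispatched tokens and, in this model, waits for
    # local compute to free the SMs, so it starts after the slower of the two.
    # Combine is left un-overlapped in this simple model.
    return max(t.dispatch, t.local_compute) + t.remote_compute + t.combine


def hidden_dispatch(t: MoELayerTimes) -> float:
    # Portion of dispatch latency absorbed by concurrent local compute;
    # equals serial_latency(t) - two_stream_latency(t).
    return min(t.dispatch, t.local_compute)


if __name__ == "__main__":
    # Made-up latencies for illustration only.
    layer = MoELayerTimes(dispatch=80, local_compute=100, remote_compute=100, combine=60)
    serial = serial_latency(layer)
    overlapped = two_stream_latency(layer)
    print(f"serial={serial} us, two-stream={overlapped} us, "
          f"speedup={serial / overlapped:.2f}x, hidden dispatch={hidden_dispatch(layer)} us")
```

With these invented inputs the saving is `min(dispatch, local_compute)`: all 80 us of dispatch leaves the critical path because local compute takes longer. When dispatch dominates, only `local_compute` worth of it can be hidden, so in this model the overlap pays off most when local expert work is substantial. The roughly 1.31x it prints comes from the made-up numbers and is unrelated to the paper's reported 1.57x.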
