UCX: An Open Source Framework for HPC Network APIs and Beyond
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: method 1
citation-polarity summary
fields: cs.DC 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: method 1
polarities: use method 1
representative citing papers
AAFLOW is a unified distributed runtime that models agentic workflows as operators with a zero-copy data plane using Apache Arrow and Cylon, achieving up to 4.64x pipeline speedup through improved data flow and batching.
citing papers explorer
- Accelerating Intra-Node GPU-to-GPU Communication Through Multi-Path Transfers with CUDA Graphs
  Embedding CUDA Graphs in UCX for multi-path intra-node GPU communication yields up to 2.95x bandwidth improvement over single-path UCX on a four-GPU node for large messages.
- AAFLOW: Scalable Patterns for Agentic AI Workflows
  AAFLOW is a unified distributed runtime that models agentic workflows as operators with a zero-copy data plane using Apache Arrow and Cylon, achieving up to 4.64x pipeline speedup through improved data flow and batching.