CCCL delivers cross-node GPU collectives 1.34-1.94x faster over CXL memory pooling than over 200 Gbps InfiniBand RDMA, yielding a 1.11x LLM training speedup at 2.75x lower hardware cost.
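The CXL claim hinges on replacing network messages with plain loads and stores into a pooled memory window shared by all ranks. A minimal sketch of that idea, using Python's multiprocessing shared memory as a stand-in for a CXL-attached pool (the rank count, slot layout, and names are illustrative assumptions, not taken from the paper):

```python
# Sketch: an all-reduce where every rank stores its contribution into a
# shared pool and then reduces by plain loads, standing in for load/store
# collectives over a CXL-pooled memory window. multiprocessing shared
# memory is only an analogy for the CXL pool; all names are hypothetical.
from multiprocessing import Process, Barrier, shared_memory
import struct

NRANKS = 4  # illustrative rank count

def rank_fn(rank, shm_name, barrier):
    shm = shared_memory.SharedMemory(name=shm_name)
    # Each rank stores its value into its own 8-byte slot in the pool.
    struct.pack_into("d", shm.buf, rank * 8, float(rank + 1))
    barrier.wait()  # wait until every contribution is visible
    # Reduce with plain loads from the pool -- no network messages.
    total = sum(struct.unpack_from("d", shm.buf, r * 8)[0]
                for r in range(NRANKS))
    if rank == 0:
        print(total)  # prints 10.0 (1 + 2 + 3 + 4)
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=NRANKS * 8)
    barrier = Barrier(NRANKS)
    procs = [Process(target=rank_fn, args=(r, shm.name, barrier))
             for r in range(NRANKS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    shm.close()
    shm.unlink()
```

The point of the analogy is that every rank reads peers' data directly, so there is no per-hop serialization or NIC traversal on the reduction path.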
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background (1). Citation-polarity summary: background (1). Facets: fields cs.DC (2); years 2026 (2); verdicts UNVERDICTED (2).
Citing papers:
- CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling
- CIDER: Boosting Memory-Disaggregated Key-Value Stores with Pessimistic Synchronization. CIDER improves the throughput of memory-disaggregated KV stores by up to 6.6x on YCSB by replacing optimistic synchronization with pessimistic synchronization, combined with global write-combining and a contention-aware scheme.
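The optimistic-to-pessimistic trade that the CIDER summary describes can be sketched in miniature: under contention, a version-check-and-retry update wastes work that a plain lock avoids. This is an illustrative local-threads analogy with hypothetical class names, not CIDER's actual remote-memory protocol:

```python
import threading

class OptimisticCell:
    """Optimistic: read a version, compute, retry if the version moved."""
    def __init__(self):
        self.value, self.version = 0, 0
        self._guard = threading.Lock()  # stands in for an atomic commit (CAS)
        self.retries = 0                # rough conflict counter (illustrative)

    def add(self, delta):
        while True:
            v, ver = self.value, self.version
            new = v + delta              # compute outside any lock
            with self._guard:            # commit only if nothing changed
                if self.version == ver:
                    self.value, self.version = new, ver + 1
                    return
            self.retries += 1            # conflict: start over

class PessimisticCell:
    """Pessimistic: take the lock first, so the update never retries."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, delta):
        with self._lock:
            self.value += delta

def hammer(cell, n_threads=8, n_ops=1000):
    # Many threads increment the same cell to create contention.
    ts = [threading.Thread(target=lambda: [cell.add(1) for _ in range(n_ops)])
          for _ in range(n_threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return cell.value

print(hammer(OptimisticCell()))   # 8000, possibly after many wasted retries
print(hammer(PessimisticCell()))  # 8000, with no retry work
```

Both cells reach the correct total, but the optimistic version repeats its computation on every conflict; under heavy contention that wasted work is what a pessimistic scheme eliminates.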