archive

Every paper Pith has read.

1164 papers in cs.DC · page 4

cs.DC 2026-05-12 reviewed

Decoupled compression speeds GPU collectives up to 9.65x
NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding

Jiamin Wang +2
cs.IT 2026-05-12 reviewed

Link failures cap LEO capacity scalability at O(1/n)
Capacity Scalability of LEO Constellations With Dynamic Link Failures

Wei Li +1
cs.DC 2026-05-12 reviewed

Per-head adaptive blocks improve sparse attention accuracy by 5.43%
AB-Sparse: Sparse Attention with Adaptive Block Size for Accurate and Efficient Long-Context Inference

Di Liu +8
cs.IT 2026-05-12 reviewed

Node failures scale wireless capacity and delay with the square root of reliable nodes
On Capacity and Delay of Wireless Networks with Node Failures

Wei Li +3
cs.DC 2026-05-12 reviewed

Power capping leaves LLM decode energy untouched
The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures

Bole Ma +3
cs.LG 2026-05-12 reviewed

DynaTrain switches 70B model parallelism in under 2 seconds
DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training

Yuanqing Wang +11
cs.DC 2026-05-12 reviewed

Overlays trade reliability against overhead for AI agent discovery
Trade-offs in Decentralized Agentic AI Discovery Across the Compute Continuum

Patrizio Dazzi +3
cs.CE 2026-05-12 reviewed

LLM inference should be measured in joules per token at scale
Position: LLM Inference Should Be Evaluated as Energy-to-Token Production

Xiang Liu +7
cs.DC 2026-05-12 reviewed

GraphFlash hits 127x speedup in serverless graph processing
GraphFlash: Enabling Fast and Elastic Graph Processing on Serverless Infrastructure

Chen Zhao +4
cs.DC 2026-05-12 reviewed

NAVIS speeds on-SSD vector inserts up to 2.74x
NAVIS: Concurrent Search and Update with Low Position-Seeking Overhead in On-SSD Graph-Based Vector Search

Jaeyong Song +6
cs.DC 2026-05-12 reviewed

Off-chain twins let DeFi agents simulate trades without waiting for blocks
State Twins: An Off-Chain Substrate for Agentic Reasoning over Decentralized Finance Protocols

Ian C. Moore
cs.DC 2026-05-12 reviewed

Storage offloading breaks memory wall for full-graph GNN training
GriNNder: Breaking the Memory Capacity Wall in Full-Graph GNN Training with Storage Offloading

Jaeyong Song +6
quant-ph 2026-05-12 reviewed

Task runtime dispatches QIR programs to multiple quantum processors
Classic and Quantum Task-Based Intelligent Runtime for QIRs Running on Multiple QPUs

Narasinga Rao Miniskar +4
cs.RO 2026-05-12 reviewed

Kairos cuts physical AI task latency by 32-66 percent
Kairos: A Scalable Serving System for Physical AI

Yinwei Dai +5
cs.DC 2026-05-11 reviewed

Chunked prefetching speeds DiT steps up to 1.28x with 49% less GPU memory
ChunkFlow: Communication-Aware Chunked Prefetching for Layerwise Offloading in Distributed Diffusion Transformer Inference

Han Meng +5
cs.DC 2026-05-11 reviewed

Chakra standardizes graph traces for AI workload benchmarking
MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

Srinivas Sridharan +28
cs.DC 2026-05-11 reviewed

Open traces standardize ML workload benchmarking
MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

Srinivas Sridharan +28
cs.DC 2026-05-11 reviewed

Directed graphs support Byzantine consensus only under specific connectivity
Byzantine Consensus in Directed Graphs with Message Authentication

Nitin H. Vaidya +1
cs.DC 2026-05-11 reviewed

ReCoVer keeps microbatch count fixed after GPU failures
ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

Ziyue Liu +9
cs.DC 2026-05-11 reviewed

ReCoVer preserves exact training trajectory after GPU losses
ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

Ziyue Liu +9
cs.DC 2026-05-11 reviewed

ShardTensor scales SciML to arbitrary spatial resolutions
ShardTensor: Domain Parallelism for Scientific Machine Learning

Corey Adams +6
cs.DC 2026-05-11 reviewed

GCC 15 outperforms LLVM 21 on four of six RISC-V vector apps
Closer in the Gap: Towards Portable Performance on RISC-V Vector Processors

Ruimin Shi +4
cs.DC 2026-05-11 reviewed

GCC 15 outperforms LLVM 21 in four of six RISC-V vector apps
Closer in the Gap: Towards Portable Performance on RISC-V Vector Processors

Ruimin Shi +4
cs.DC 2026-05-11 reviewed

Edge micro-agent fixes failures safely with no destructive actions
An Uncertainty-Aware Resilience Micro-Agent for Causal Observability in the Computing Continuum

Suvi De Silva +4
cs.DC 2026-05-11 reviewed

Mutable membership lets MoE survive rank faults without restarts
Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference

Xun Sun +20
cs.CR 2026-05-11 reviewed

This paper performs a structured bidirectional review of peer-reviewed studies on AI and…
SoK: A Systematic Bidirectional Literature Review of AI & DLT Convergence

Ali Irzam Kathia +5
cs.DC 2026-05-11 reviewed

Maestro cuts GPU use by 40% for compound LLM training
Accelerating Compound LLM Training Workloads with Maestro

Xiulong Yuan +18
cs.DC 2026-05-11 reviewed

BitTorrent warm-up hides FL update sources from local observers
Privacy-preserving Chunk Scheduling in a BitTorrent Implementation of Federated Learning

Naicheng Li +4
cs.DC 2026-05-11 reviewed

BitTorrent warm-up bounds FL source attribution to random guessing
Privacy-preserving Chunk Scheduling in a BitTorrent Implementation of Federated Learning

Naicheng Li +4
cs.DC 2026-05-11 reviewed

Hierarchical RL cuts edge latency 28 percent while saving energy
HiRL: Hierarchical Reinforcement Learning for Coordinated Resource Management in Heterogeneous Edge Computing

Jianyong Zhu +5
cs.DC 2026-05-11 reviewed

CPU radix sort reaches 6x bandwidth efficiency on large datasets
FractalSortCPU: Bandwidth-Efficient Compressed Radix Sort on CPU

Michael Dang'ana
cs.DC 2026-05-11 reviewed

CPU radix sort cuts bandwidth use by 6x on large data
FractalSortCPU: Bandwidth-Efficient Compressed Radix Sort on CPU

Michael Dang'ana
cs.AI 2026-05-11 reviewed

Small models reach strong edge-agent results when tools match the model
Agentic Performance at the Edge: Insights from Benchmarking

Shiqiang Wang +1
cs.DC 2026-05-11 reviewed

Amortized protocol makes async BRB messages linear in size
Amortized Asynchronous Byzantine Reliable Broadcast with Optimal Resilience

Michael Yiqing Hu +2
cs.DC 2026-05-11 reviewed

Amortized BRB reaches O(n|m|) messages in async networks
Amortized Asynchronous Byzantine Reliable Broadcast with Optimal Resilience

Michael Yiqing Hu +2
cs.AI 2026-05-11 reviewed

Autonomous objects resolve over half of scientific data conflicts
Autonomous FAIR Digital Objects: From Passive Assertions to Active Knowledge

Zeyd Boukhers +3
physics.comp-ph 2026-05-11 reviewed

Block-structured matrix multiplication speeds quantum chemistry by 10x
Accelerating Locality-Driven Integration in Quantum Chemistry with Block-Structured Matrix Multiplication

Xinran Wei +10
physics.comp-ph 2026-05-11 reviewed

Block-structured matmul speeds DFT integrals up to 10x on GPUs
Accelerating Locality-Driven Integration in Quantum Chemistry with Block-Structured Matrix Multiplication

Xinran Wei +10
physics.comp-ph 2026-05-11 reviewed

Graph reordering cuts memory pressure in GPU integral evaluation
FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies

Yihong Zhang +6
physics.comp-ph 2026-05-11 reviewed

Graph orchestration cuts GPU memory use for recursive integrals
FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies

Yihong Zhang +6
cs.LG 2026-05-11 reviewed

Adaptive clipping lifts private federated LLM accuracy
DP-LAC: Lightweight Adaptive Clipping for Differentially Private Federated Fine-tuning of Language Models

Haaris Mehmood +4
cs.NI 2026-05-11 reviewed

Adaptive offloading lifts LLM throughput 65% at 47% lower energy
GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference

Zengzipeng Tang +4
cs.DC 2026-05-11 reviewed

Vehicle screening plus federated segmentation cuts pothole data volume
Edge-Cloud Collaborative Pothole Detection via Onboard Event Screening and Federated Temporal Segmentation

Yingjie Wu +2
cs.DC 2026-05-11 reviewed

Brokerless data plane delivers consistent batches for AI training
BatchWeave: A Consistent Object-Store-Native Data Plane for Large Foundation Model Training

Ting Sun +9
cs.DC 2026-05-11 reviewed

Object store delivers atomic batches for 64-GPU model training
BatchWeave: A Consistent Object-Store-Native Data Plane for Large Foundation Model Training

Ting Sun +9
cs.DC 2026-05-11 reviewed

Ordered agents let population protocols recognize unambiguous star-free languages
Population Protocols over Ordered Agents

Michael Blondin +5
cs.NI 2026-05-10 reviewed

Method optimizes server placement for vertical federated learning in dynamic networks
Optimizing Server Placement for Vertical Federated Learning in Dynamic Edge/Fog Networks

Su Wang +2
cs.DC 2026-05-10 reviewed

Cascade labels 8.6M orbital sequences for anomaly detection
Multi-Tier Labeling and Physics-Informed Learning for Orbital Anomaly Detection at Scale

Yong Fu
cs.DC 2026-05-10 reviewed

Cloud trace decomposition predicts performance at 2% error
Cloud Performance Decomposition for Long-Term Performance Engineering: A Case Study

Shimul Debnath +4
eess.IV 2026-05-10 reviewed

Neural preprocessor reports -27.62% BD-VMAF for H.264 on UVG
Kelvin v1.0: A Neural Pre-Encoder for H.264: A standards-compliant learned preprocessor with -27.62% BD-VMAF on UVG

Marco Graziano