archive

Every paper Pith has read.

1164 papers in cs.DC · page 6

cs.DC 2026-05-07 reviewed

AD replaces finite differences in INLA for 4-8x gradient speedups
ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations

Afif Boudaoud +8
cs.DC 2026-05-07 reviewed

ResiHP keeps LLM training fast by adapting to GPU failures
ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism

Tenghui Ma +6
cs.DC 2026-05-07 reviewed

ResiHP lifts LLM training speed 1–4× under real GPU failures
ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism

Tenghui Ma +6
cs.AI 2026-05-07 reviewed

Safactory unifies three platforms into one agent training pipeline
Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence

Xinquan Chen +38
cs.AI 2026-05-07 reviewed

Three platforms linked into one pipeline for autonomous agents
Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence

Xinquan Chen +38
cs.DC 2026-05-07 reviewed

TACO toolsuite verifies threshold automata for distributed algorithms
TACO: A Toolsuite for the Verification of Threshold Automata

Paul Eichler +5
cs.DC 2026-05-07 reviewed

BalanceRoute cuts DP imbalance in LLM serving
Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale

Tianci Bu +8
cs.DC 2026-05-07 reviewed

Router cuts data-parallel imbalance in LLM clusters
Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale

Tianci Bu +8
cs.AI 2026-05-07 reviewed

AI agents generate custom LLM serving systems competitive with vLLM
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

Keisuke Kamahori +3
cs.DC 2026-05-07 reviewed

Automated low-complexity matrix multiplies beat hardware peaks
FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication

Honglin Zhu +12
cs.DC 2026-05-07 reviewed

FalconGEMM exceeds GEMM speeds by 7-17% via lower-complexity algorithms
FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication

Honglin Zhu +12
cs.DC 2026-05-07 reviewed

MoE cuts relay buffers with direct expert-window access
Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend

Tianlun Hu +10
cs.DC 2026-05-07 reviewed

Direct expert-window access removes relay buffers in MoE inference
Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend

Tianlun Hu +10
cs.AI 2026-05-07 reviewed

Structural alignment beats coordinate matching for heterogeneous prototypes
From Coordinate Matching to Structural Alignment: Rethinking Prototype Alignment in Heterogeneous Federated Learning

Xinghao Wu +5
cs.PL 2026-05-07 reviewed

Cache-free GPU enumeration outperforms priors on MBA synthesis
GPU-Accelerated Synthesis of Mixed-Boolean Arithmetic: Beyond Caching

Gabriel Bathie +2
cs.AR 2026-05-07 reviewed

Hardware hub lets MoE send data before knowing GPU addresses
MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems

Zhuoshan Zhou +12
cs.LO 2026-05-07 reviewed

Gossip protocols fix their own faulty messages
Self-Correcting Gossip Protocols

Giorgio Cignarale +5
cs.CR 2026-05-07 reviewed

Gas Cards replace off-chain signers in ERC-4337 paymasters
SuperPaymaster: Eliminating Centralized Signer Authority via Asset-Oriented Abstraction to Reconcile Usability and Decentralization in Account Abstraction

Huifeng Jiao +1
cs.DC 2026-05-07 reviewed

Differential privacy keeps edge ML fast and harder to steal
A Privacy-Preserving Machine Learning Framework for Edge Intelligence: An Empirical Analysis

Quoc Lap Trieu +2
cs.DC 2026-05-07 reviewed

LLM priors raise DRL task offloading success by over 17%
LLM-Enhanced Deep Reinforcement Learning for Task Offloading in Collaborative Edge Computing

Hao Guo +3
cs.DC 2026-05-07 reviewed

MLA cache recovers 83% tokens despite position shifts
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving

Bole Ma +2
cs.AR 2026-05-07 reviewed

New in-switch method delivers 1.38x faster LLM tensor-parallel training
Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems

Chen Zhang +12
cs.AR 2026-05-07 reviewed

DySHARP speeds MoE models 1.79x with dynamic in-switch computing
Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs

Qijun Zhang +12
cs.DC 2026-05-07 reviewed

Digital twin framework cuts data center power use with predictions
A Scalable Digital Twin Framework for Energy Optimization in Data Centers

Raphael Hendrigo de Souza Gonçalves +1
cs.DC 2026-05-07 reviewed

EdgeServing cuts SLO violations for multi-DNN edge serving
EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge

Jiahe Cao +5
cs.LG 2026-05-06 reviewed

Simulation platform tests datacenter power flexibility for grid coordination
OpenG2G: A Simulation Platform for AI Datacenter-Grid Runtime Coordination

Jae-Won Chung +5
cs.DC 2026-05-06 reviewed

Dynamic tensor parallelism raises LLM goodput up to 5.3x
Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism

Vikranth Srivatsa +4
cs.DC 2026-05-06 reviewed

Nine-dimension model explains root causes in five of twelve DeFi incidents
Toward a Risk Assessment Framework for Institutional DeFi: A Nine-Dimension Approach

Eva Oberholzer +3
cs.DC 2026-05-06 reviewed

Resource model lifts MoE training efficiency 2-3.5X
Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism

Sajal Dash +1
cs.DC 2026-05-06 reviewed

DPU offload gives 1.55x speedup but 625x more DRAM traffic
Communication Offloading on SmartNIC DPUs: A Quantitative Approach

Jacob Wahlgren +4
cs.DC 2026-05-06 reviewed

DPU offload delivers 1.55x speedup when memory-to-comm ratio is high
Communication Offloading on SmartNIC DPUs: A Quantitative Approach

Jacob Wahlgren +4
cs.CY 2026-05-06 reviewed

Digital twin trust maps to four integration patterns across domains
Trustworthiness in Digital Twin Systems: Systematic Review and Research Horizons

Chi Fai David Lam +3
cs.DC 2026-05-06 reviewed

Satellite AI cuts delays 32 percent with model collaboration
Delay-Aware Large-Small Model Collaboration over LEO Satellite Networks

Mingyu Guo +4
cs.DC 2026-05-06 reviewed

CCL-D pinpoints slow and hang anomalies in 4000-GPU clusters within 6 minutes
CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training

Yida Gu +19
cs.PF 2026-05-06 reviewed

LLM agents turn GPU profiles into optimization advice
KEET: Explaining Performance of GPU Kernels Using LLM Agents

Joshua H. Davis +7
cs.DC 2026-05-06 reviewed

Adaptive HBM split cuts recommender P99 latency 24-38%
One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving

Wenjun Yu +2
cs.DC 2026-05-05 reviewed

Coral cuts multi-LLM serving costs by up to 2.79x on mixed GPUs
Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs

Yixuan Mei +7
physics.comp-ph 2026-05-05 reviewed

GPU code speeds moving-boundary fluid simulations 20X
GPU-Accelerated Simulations of Problems with Moving Boundaries and Fluid-Structure Interaction at Extreme Scales

Sushrut Kumar +4
cs.NI 2026-05-05 reviewed

MRC lets AI training survive network faults by spraying across paths
Resilient AI Supercomputer Networking using MRC and SRv6

Joao Araujo +49
cs.DC 2026-05-05 reviewed

Serverless orchestration breaks in LEO continua
Orchestrating Serverless Applications in the Edge Cloud Space Continuum: What Breaks and What is Next?

Hadi Tabatabaee Malazi +3
cs.DC 2026-05-05 reviewed

ClusterLess cuts edge workflow times by up to 40%
ClusterLess: Deadline-Aware Serverless Workflow Orchestration on Federated Edge Clusters

Reza Farahani +6
cs.CR 2026-05-05 reviewed

Ledger stores CP-ABE keys so IoT users decrypt locally and revoke by epoch rotation
Revocation-Ready CP-ABE Key Management for Blockchain-Based IoT Data Sharing

Chun Yin Chiu
cs.DC 2026-05-05 reviewed

Control plane unifies physical neural networks across materials
phys-MCP: A Control Plane for Heterogeneous Physical Neural Networks

Stefan Fischer +2
eess.SY 2026-05-05 reviewed

Power grids need fast and slow thinking to handle renewables
Thinking fast and slow -- a cognitive inspired framework for decision intelligence for power systems

Apoorv Mathur
eess.SY 2026-05-05 reviewed

Cognitive models structure power grid decisions across timescales
Thinking fast and slow -- a cognitive inspired framework for decision intelligence for power systems

Apoorv Mathur
cs.OS 2026-05-05 reviewed

Pub/sub smart pointer limits reference updates to 0-1 per subscriber
ipc_shared_ptr: A Publish/Subscribe-Aware Smart Pointer for Cross-Process Object Lifetime Management

Takahiro Ishikawa-Aso +4
cs.DC 2026-05-05 reviewed

Microbenchmark models predict GPU performance with 1% error
Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures

Aaron Jarmusch +1
cs.DC 2026-05-05 reviewed

True Sessions enable scalable MPI initialization in MPICH
Implementing True MPI Sessions and Evaluating MPI Initialization Scalability

Hui Zhou +4
cs.NI 2026-05-05 reviewed

Federated learning fails at 5s latency due to TCP handshake timeouts
Surviving the Edge: Federated Learning under Networking and Resource Constraints

Mike Mwanje +3
cs.DC 2026-05-05 reviewed

HPC workflows pause for human input without idling compute resources
A Workflow-Oriented Framework for Asynchronous Human-AI Collaboration in Hybrid and Compute-Intensive HPC Environments

Sergio Mendoza +7