archive
Every paper Pith has read.
1164 papers in cs.DC · page 6
-
AD replaces finite differences in INLA for 4-8x gradient speedups
ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations
-
ResiHP keeps LLM training fast by adapting to GPU failures
ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism
-
ResiHP lifts LLM training speed 1–4× under real GPU failures
ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism
-
Safactory unifies three platforms into one agent training pipeline
Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence
-
Three platforms linked into one pipeline for autonomous agents
Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence
-
TACO toolsuite verifies threshold automata for distributed algorithms
TACO: A Toolsuite for the Verification of Threshold Automata
-
BalanceRoute cuts DP imbalance in LLM serving
Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale
-
Router cuts data-parallel imbalance in LLM clusters
Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale
-
AI agents generate custom LLM serving systems competitive with vLLM
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
-
Automated low-complexity matrix multiplies beat hardware peaks
FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
-
FalconGEMM exceeds GEMM speeds by 7-17% via lower-complexity algorithms
FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
-
MoE cuts relay buffers with direct expert-window access
Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend
-
Direct expert-window access removes relay buffers in MoE inference
Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend
-
Structural alignment beats coordinate matching for heterogeneous prototypes
From Coordinate Matching to Structural Alignment: Rethinking Prototype Alignment in Heterogeneous Federated Learning
-
Cache-free GPU enumeration outperforms priors on MBA synthesis
GPU-Accelerated Synthesis of Mixed-Boolean Arithmetic: Beyond Caching
-
Hardware hub lets MoE send data before knowing GPU addresses
MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
-
Gossip protocols fix their own faulty messages
Self-Correcting Gossip Protocols
-
Gas Cards replace off-chain signers in ERC-4337 paymasters
SuperPaymaster: Eliminating Centralized Signer Authority via Asset-Oriented Abstraction to Reconcile Usability and Decentralization in Account Abstraction
-
Differential privacy keeps edge ML fast and harder to steal
A Privacy-Preserving Machine Learning Framework for Edge Intelligence: An Empirical Analysis
-
LLM priors raise DRL task offloading success by over 17%
LLM-Enhanced Deep Reinforcement Learning for Task Offloading in Collaborative Edge Computing
-
MLA cache recovers 83% of tokens despite position shifts
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
-
New in-switch method delivers 1.38x faster tensor-parallel LLM training
Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
-
DySHARP speeds MoE models 1.79x with dynamic in-switch computing
Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
-
Digital twin framework cuts data center power use with predictions
A Scalable Digital Twin Framework for Energy Optimization in Data Centers
-
EdgeServing cuts SLO violations for multi-DNN edge serving
EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge
-
Simulation platform tests datacenter power flexibility for grid coordination
OpenG2G: A Simulation Platform for AI Datacenter-Grid Runtime Coordination
-
Dynamic tensor parallelism raises LLM goodput up to 5.3x
Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism
-
Nine-dimension model explains root causes in five of twelve DeFi incidents
Toward a Risk Assessment Framework for Institutional DeFi: A Nine-Dimension Approach
-
Resource model lifts MoE training efficiency 2-3.5X
Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
-
DPU offload gives 1.55x speedup but 625x more DRAM traffic
Communication Offloading on SmartNIC DPUs: A Quantitative Approach
-
DPU offload delivers 1.55x speedup when memory-to-comm ratio is high
Communication Offloading on SmartNIC DPUs: A Quantitative Approach
-
Digital twin trust maps to four integration patterns across domains
Trustworthiness in Digital Twin Systems: Systematic Review and Research Horizons
-
Satellite AI cuts delays 32 percent with model collaboration
Delay-Aware Large-Small Model Collaboration over LEO Satellite Networks
-
CCL-D pinpoints slow and hang anomalies in 4000-GPU clusters within 6 minutes
CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training
-
LLM agents turn GPU profiles into optimization advice
KEET: Explaining Performance of GPU Kernels Using LLM Agents
-
Adaptive HBM split cuts recommender P99 latency 24-38%
One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving
-
Coral cuts multi-LLM serving costs by up to 2.79x on mixed GPUs
Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs
-
GPU code speeds moving-boundary fluid simulations 20X
GPU-Accelerated Simulations of Problems with Moving Boundaries and Fluid-Structure Interaction at Extreme Scales
-
MRC lets AI training survive network faults by spraying across paths
Resilient AI Supercomputer Networking using MRC and SRv6
-
Serverless orchestration breaks in LEO continua
Orchestrating Serverless Applications in the Edge Cloud Space Continuum: What Breaks and What is Next?
-
ClusterLess cuts edge workflow times by up to 40%
ClusterLess: Deadline-Aware Serverless Workflow Orchestration on Federated Edge Clusters
-
Ledger stores CP-ABE keys so IoT users decrypt locally and revoke by epoch rotation
Revocation-Ready CP-ABE Key Management for Blockchain-Based IoT Data Sharing
-
Control plane unifies physical neural networks across materials
phys-MCP: A Control Plane for Heterogeneous Physical Neural Networks
-
Power grids need fast and slow thinking to handle renewables
Thinking fast and slow -- a cognitive inspired framework for decision intelligence for power systems
-
Cognitive models structure power grid decisions across timescales
Thinking fast and slow -- a cognitive inspired framework for decision intelligence for power systems
-
Pub/sub smart pointer limits reference updates to 0-1 per subscriber
ipc_shared_ptr: A Publish/Subscribe-Aware Smart Pointer for Cross-Process Object Lifetime Management
-
Microbenchmark models predict GPU performance with 1% error
Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures
-
True Sessions enable scalable MPI initialization in MPICH
Implementing True MPI Sessions and Evaluating MPI Initialization Scalability
-
Federated learning fails at 5s latency due to TCP handshake timeouts
Surviving the Edge: Federated Learning under Networking and Resource Constraints
-
HPC workflows pause for human input without idling compute resources
A Workflow-Oriented Framework for Asynchronous Human-AI Collaboration in Hybrid and Compute-Intensive HPC Environments