archive

Every paper Pith has read.

1164 papers in cs.DC · page 14

cs.DC 2026-04-08 reviewed

GROMACS runs deep-potential MD at scale on multi-GPU systems
Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS

Luca Pennati +4
cs.DC 2026-04-08 reviewed

Disaggregating LoRA triples request rate under latency limits
InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models

Hongyu Chen +8
cs.DC 2026-04-08 reviewed

LLM serving policies rewrite themselves online for 34% gains
Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics

Youhe Jiang +6
cs.DC 2026-04-08 reviewed

One LLM call compiles web tasks into JSON that runs forever at fixed low cost
Agentic Compilation: Mitigating the LLM Rerun Crisis for Minimized-Inference-Cost Web Automation

Jagadeesh Chundru
cs.DC 2026-04-08 reviewed

Client scheduler hits 100% LLM deadlines at 4.2 requests per second
Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale

Renzhong Yuan +5
cs.DC 2026-04-08 reviewed

Nested pipelining gives 3x faster training on 1,500+ accelerators
NestPipe: Large-Scale Recommendation Training on 1,500+ Accelerators via Nested Pipelining

Zhida Jiang +14
cs.DC 2026-04-08 reviewed

Output-set tasks solvable under crashes iff inclusion graph connects
On the Decidability of Distributed Tasks with Output Sets under Asynchrony and Any Number of Crashes

Timothé Albouy +4
cs.PL 2026-04-08 reviewed

Priorities and clocks extend CCS to define coherence
Determinacy with Priorities up to Clocks

Luigi Liquori (Centre Inria de l'Université Côte d'Azur) +2
cs.DC 2026-04-08 reviewed

Multi-robot service prototype runs on Aggregate Programming
Exploiting Aggregate Programming in a Multi-Robot Service Prototype

Giorgio Audrito (Dipartimento di Informatica +6
cs.PL 2026-04-08 reviewed

Effpi adds branching for external choice and timeouts
Branching Out: Existential External Choice in Effpi

Benjamin Robinson (University of Oxford) +1
cs.DC 2026-04-08 reviewed

Layer-by-layer freezing fits private LLM tuning on edge devices
Beyond End-to-End: Dynamic Chain Optimization for Private LLM Adaptation on the Edge

Yebo Wu +5
cs.DC 2026-04-08 reviewed

Nexus cuts serverless CPU use 44% by offloading I/O from VMs
Nexus: Transparent I/O Offloading for High-Density Serverless Computing

JooYoung Park +6
cs.AR 2026-04-08 reviewed

SwarmIO emulates 40M IOPS SSDs for GPUs with 300x speedup
SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems

Hyeseong Kim +2
cs.DC 2026-04-08 reviewed

Foundry cuts LLM cold-start time from minutes to seconds
Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start

Xueshen Liu +5
cs.DC 2026-04-08 reviewed

SpMM requires structure-specific roofline models for accurate bounds
Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication

Matthew Qian +3
cs.DC 2026-04-08 reviewed

DynLP updates graph labels 13x faster on average by limiting propagation to changed sub-
DynLP: Parallel Dynamic Batch Update for Label Propagation in Semi-Supervised Learning

S M Shovan +4
cs.DC 2026-04-08 reviewed

Canceled spot requests yield availability signals at near-zero cost
Ding-Dong Ditch: Peeking Into Spot Instance Availability

Kyumin Kim +3
cs.DC 2026-04-08 reviewed

Adaptive sync raises IoT ledger recovery after partitions
Contextual Chain: Single-State Ledger Design for Mobile/IoT Networks with Frequent Partitions

Song-Ju Kim
cs.DC 2026-04-07 reviewed

Copy-on-write KV cache triples multi-LoRA agent throughput
ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache

Shao Wang +2
cs.DC 2026-04-07 reviewed

Power reconstruction shows 79% energy cut from mixed precision on Frontier
Fine-Grained Power and Energy Attribution on AMD GPU/APU-Based Exascale Nodes

Adam McDaniel +10
cs.DC 2026-04-07 reviewed

Codec signals triple VLM streaming throughput
CodecSight: Leveraging Video Codec Signals for Efficient Streaming VLM Inference

Yulin Zou +7
cs.DC 2026-04-07 reviewed

GTaP runtime runs fork-join tasks on GPUs faster than CPU OpenMP
GTaP: A GPU-Resident Fork-Join Task-Parallel Runtime with a Pragma-Based Interface

Yuki Maeda +1
cs.DC 2026-04-07 reviewed

Morton plane trees speed GPU neighbor search by over 10x
JZ-Tree: GPU friendly neighbour search and friends-of-friends with dual tree walks in JAX plus CUDA

Jens Stücker +4
cs.DC 2026-04-07 reviewed

Linearizable registers force extensive message chains
Communication Requirements for Linearizable Registers

Raïssa Nataf +1
cs.DC 2026-04-07 reviewed

Go runtime outperforms Python and Node.js for OpenFaaS on Kubernetes
Optimizing OpenFaaS on Kubernetes: Comparative Analysis of Language Runtimes and Cluster Distributions

Ehsan Ataie +2
cs.LG 2026-04-07 reviewed

ALTO speeds LoRA tuning 13.8x via early stops and shared scheduling
ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads

Jingwei Zuo +7
cs.DC 2026-04-06 reviewed

Persistent Alltoallv cuts MPI runtime up to 44% for large messages
Analyzing Persistent Alltoallv RMA Implementations for High-Performance MPI Communication

Evelyn Namugwanya
cs.CL 2026-04-06 reviewed

Single GPU trains 120B-parameter models at full precision
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Zhengqing Yuan +3
cs.DC 2026-04-06 reviewed

Decentralized relayers route cross-chain messages without hubs
Towards Policy-Enabled Multi-Hop Routing for Cross-Chain Message Delivery

Amin Rezaei +2
cs.AR 2026-04-06 reviewed

Tool explores 250 trillion 3D AI accelerator designs 100000 times faster
DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators

Zhiwen Mo +13
cs.CR 2026-04-06 reviewed

RegGuard cuts optimistic rollup settlement failures by over 90 percent
RegGuard: Legitimacy and Fairness Enforcement for Optimistic Rollups

Zhenhang Shang +2
cs.DC 2026-04-06 reviewed

Execution-idle wastes 10.7% of GPU cluster energy
The Energy Cost of Execution-Idle in GPU Clusters

Yiran Lei +6
cs.LG 2026-04-06 reviewed

Sampling parallelism scales Bayesian training linearly across GPUs
Sampling Parallelism for Fast and Efficient Bayesian Learning

Asena Karolin Özdemir +5
cs.DC 2026-04-06 reviewed

Splitting LLMs across LEO satellites cuts delay by 42%
Communication-Efficient Collaborative LLM Inference over LEO Satellite Networks

Songge Zhang +4
cs.DC 2026-04-06 reviewed

Zero downtime achieved in edge energy service migration
Edge-Oriented Orchestration of Energy Services Using Graph-Driven Swarm Intelligence

Liana Toderean +5
cs.DC 2026-04-06 reviewed

Single-agent exploration in dynamic graphs needs Omega(m) window
Tight Bounds on Window Size and Time for Single-Agent Graph Exploration under T-Interval Connectivity

Yuichi Sudo +6
cs.DC 2026-04-06 reviewed

Layout propagation removes redundant packing in GEMM sequences
LP-GEMM: Integrating Layout Propagation into GEMM Operations

César Guedes Carneiro +3
cs.DC 2026-04-06 reviewed

Slurm tool simplifies submissions and defers jobs to cut energy use
NBI-Slurm: Simplified submission of Slurm jobs with energy saving mode

Andrea Telatin
cs.AI 2026-04-06 reviewed

AI peer review platform detects fake citations over 85 percent of the time
OpenCLAW-P2P v7.0-P2PCLAW: Resilient Multi-Layer Persistence, Live Reference Verification, and Production-Scale Evaluation of Decentralized AI Peer Review v7.0 -- Mathematical Corrections & Ecosystem Developments Edition

Francisco Angulo de Lafuente +5
cs.AI 2026-04-06 reviewed

AI agents run peer review with 85% fabricated-citation detection
OpenCLAW-P2P v7.0-P2PCLAW: Resilient Multi-Layer Persistence, Live Reference Verification, and Production-Scale Evaluation of Decentralized AI Peer Review v7.0 -- Mathematical Corrections & Ecosystem Developments Edition

Francisco Angulo de Lafuente +5
cs.DC 2026-04-06 reviewed

Satellite emulators tested against real data show clear gaps
An experimental evaluation of satellite constellation emulators

Victor Cionca +3
cs.DC 2026-04-06 reviewed

Co-serving system raises SLO attainment for mixed diffusion workloads by up to 44%
GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads

Fanjiang Ye +12
cs.DC 2026-04-05 reviewed

Ledger state serves as shared environment for agent coordination
Ledger-State Stigmergy: A Formal Framework for Indirect Coordination Grounded in Distributed Ledger State

Fernando Paredes García
cs.DC 2026-04-05 reviewed

Lemonshark cuts async BFT latency up to 65% with early finality
Lemonshark: Asynchronous DAG-BFT With Early Finality

Michael Yiqing Hu +4
cs.CR 2026-04-04 reviewed

SecureAFL detects bad updates and estimates missing ones in async FL
SecureAFL: Secure Asynchronous Federated Learning

Anjun Gao +5
quant-ph 2026-04-04 reviewed

GPU simulator speeds quantum circuits up to 146x over CPU
GPU-Accelerated Quantum Simulation: Empirical Backend Selection, Gate Fusion, and Adaptive Precision

Poornima Kumaresan +3
quant-ph 2026-04-03 reviewed

Four-layer middleware adapts hybrid quantum-HPC resources at runtime
Hybrid Quantum-HPC Middleware Systems for Adaptive Resource, Workload and Task Management

Pradeep Mantha +4
cs.CR 2026-04-03 reviewed

Hybrid parallelism scales encrypted Transformers across multiple GPUs
AEGIS: Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems

Zhaoting Gong +3
cs.NI 2026-04-03 reviewed

Granger causality quantifies noisy neighbor effects up to 67% slowdown
Causal Inference for Quantifying Noisy Neighbor Effects in Multi-Tenant Cloud Environments

Philipe S. Schiavo +8
cs.DC 2026-04-03 reviewed

Collective KV sharing runs 2.7x more multi-agent LLM agents
TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing

Zhuohang Bian +5