archive

Every paper Pith has read. Search by title, abstract, or pith.

1164 papers in cs.DC · page 8

cs.LG 2026-05-01 reviewed

Local AI agents stop early to cut energy waste 15-20%
AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

Dzung Pham +3
cs.DC 2026-05-01 reviewed

Perseus fixes proxy RDMA serialization for 10x multi-node MoE speedup
Eliminating Hidden Serialization in Multi-Node Megakernel Communication

Byungsoo Oh +1
cs.DC 2026-05-01 reviewed

Emulator matches vLLM serving within 5 percent error
LLM-Emu: Native Runtime Emulation of LLM Inference via Profile-Driven Sampling

Wei Da +1
cs.CL 2026-05-01 reviewed

Quantization halves memory for 8B–32B LLM training
AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

Wenxiang Lin +7
cs.DC 2026-05-01 reviewed

Fixed-core approach yields 211x higher efficiency for edge GEMM
Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

M. Grailoo +1
cs.DC 2026-05-01 reviewed

Workflow scheduling cuts AI agent task time by 1.64x
SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters

Dongxin Guo +2
cs.DC 2026-05-01 reviewed

Ring subnets cut satellite LLM latency threefold
SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks

Zhanwei Wang +4
cs.DC 2026-05-01 reviewed

IPU scaling boosts CFD AI training throughput fivefold
Adaptation of AI-accelerated CFD Simulations to the IPU platform

P. Rosciszewski +4
cs.DC 2026-05-01 reviewed

OrbitBFT scales BFT consensus in LEO satellite networks
OrbitBFT: Enabling Scalable and Robust BFT Consensus in LEO Constellations

Tianyi Sun +3
cs.LG 2026-05-01 reviewed

Architecture shapes convergence in hierarchical federated learning
Hierarchical Federated Learning for Networked AI: From Communication Saving to Architecture-Aware Design

Seyed Mohammad Azimi-Abarghouyi +2
cs.AI 2026-05-01 reviewed

Same model accuracy varies 12 points by endpoint
Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference

Yuxuan Gao +2
stat.CO 2026-04-30 reviewed

Three streaming covariance algorithms match exactly in exact math
$2B$ or Not $2B$: A Tale of Three Algorithms for Streaming: Covariance Estimation after Welford and Chan-Golub-LeVeque

Felix Reichel
cs.DC 2026-04-30 reviewed

Replication cuts partitioning costs by 17-65 percent on average
Replication in Graph Partitioning and Scheduling Problems

Pál András Papp +2
cs.NI 2026-04-30 reviewed

Untwinning removes specific network twins without full rebuild
Network Digital Untwinning: Towards Backward Optimization of Digital Twins

Zifan Zhang +7
cs.DC 2026-04-30 reviewed

Dedicated engine separates models for easier architecture simulation
Akita: A High Usability Simulation Framework for Computer Architecture

Sabila Al Jannat +8
cs.AR 2026-04-30 reviewed

Ring topology on FPGAs runs cortical circuit faster than real time
NeuroRing: Scaling Spiking Neural Networks via Multi-FPGA Bidirectional Ring Topologies and Stream-Dataflow Architectures

Muhammad Ihsan Al Hafiz +1
cs.DC 2026-04-30 reviewed

Fees linked to pool invariant k make CPMM trades path-independent
Characterizing Path-Independent Fees: A Route to Zero Impermanent Loss in CPMMs

Andrey Voronin +4
cs.DC 2026-04-30 reviewed

Model derives DEX fee floor to keep LPs in gain zone
From Impermanent Loss to Sustainable Gain: Quantifying Profitability Zones for Liquidity Providers on DEX

Ignat Melnikov +4
cs.DC 2026-04-30 reviewed

CS-3 runs 90% sparse SpMM 100x faster than CPU
Exploring Sparse Matrix Multiplication Kernels on the Cerebras CS-3

Milan Shah +2
cs.DS 2026-04-30 reviewed

Santa Claus needs sqrt n rounds for any approximation
Distributed Santa Claus via Global Rounding

Tijn de Vos +4
cs.DC 2026-04-30 reviewed

Most arbitrage chances come from one transaction each
The Origins of MEV: Systematic Attribution of Arbitrage Opportunity Creation at Scale

Andrei Seoev +6
cs.OS 2026-04-30 reviewed

Affinity hints give 12% throughput boost on chiplet servers
Affinity Tailor: Dynamic Locality-Aware Scheduling at Scale

Jin Xin Ng +9
cs.DC 2026-04-30 reviewed

Design-time traces yield low WCETs that cut waste 36% in mixed-criticality systems
AnTi-MiCS: Analytical Framework for Bounding Time in Embedded Mixed-Criticality Systems

Behnaz Ranjbar +1
cs.DC 2026-04-30 reviewed

AI inference relocates like electricity demand within latency limits
AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework

Xubin Luo +1
cs.DC 2026-04-30 reviewed

Lossless compression speeds LLM training up to 1.18 times
ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training

Wenxiang Lin +4
cs.AI 2026-04-30 reviewed

Traditional methods fail for AI in autonomous system dependability
Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification

Behnaz Ranjbar +5
cs.DC 2026-04-30 reviewed

The paper proves that all predicates expressible in monadic Presburger arithmetic can be…
Monadic Presburger Predicates have Robust Population Protocols

Philipp Czerner +5
cs.DC 2026-04-30 reviewed

Consensus-embedded checks give order-execute chains 10.6x throughput
Back to the Future: Rethinking Endorsement in Order-Execute Blockchains

Rongji Huang +7
cs.CR 2026-04-30 reviewed

Merkle tree pipeline verifies IoT logs at 130k records per second
Lightweight Tamper-Evident Log Integrity Verification for IoT Edge Environments: A Merkle Tree Pipeline with Adaptive Chunking

Muhammet Anil Yagiz +2
cs.DC 2026-04-30 reviewed

Distributed GPUs train fluid predictors faster than solvers
A Study on the Performance of Distributed Training of Data-driven CFD Simulations

Sergio Iserte +3
cs.DC 2026-04-30 reviewed

Unified API brings dynamic resources to HPC apps via MPI spawning
Towards the Democratization and Standardization of Dynamic Resources with MPI Spawning

Sergio Iserte +5
cs.RO 2026-04-29 reviewed

Jetson AGX Orin runs 25k Monte Carlo AEB samples in 530 ms
Real-Time GPU-Accelerated Monte Carlo Evaluation of Safety-Critical AEB Systems Under Uncertainty

Akshay Karjol +1
cs.DC 2026-04-29 reviewed

Block pipelining lifts Hyperledger Fabric commit throughput 1.9x
End-to-End and Phase-Level Performance Optimization for Hyperledger Fabric

Pavan Sollu +8
cs.LG 2026-04-29 reviewed

Compiler automates sequence parallelism for 2.7x longer LLM contexts
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism

Ahan Gupta +5
cs.DC 2026-04-29 reviewed

Round-robin stage dispatch breaks GPU pipeline bottleneck for LLM training
Efficient Training on Multiple Consumer GPUs with RoundPipe

Yibin Luo +4
cs.DC 2026-04-29 reviewed

Deterministic nodes adapt only to uniform goals in dynamic networks
Adaptive Self-Organization in Anonymous Dynamic Networks

Garrett Parzych +1
cs.DC 2026-04-29 reviewed

Serverless MoE serving cuts resources below one third for multi-tenant use
FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving

Minghe Wang +3
cs.DC 2026-04-29 reviewed

Test taxonomy with CI ecosystem improves HPC fault detection
A Test Taxonomy and Continuous Integration Ecosystem for Dynamic Resource Management in HPC

Petter Sandås +3
cs.AR 2026-04-29 reviewed

The paper introduces Voxel, a compiler-aware simulation framework for studying the…
Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel

Yiqi Liu +4
cs.DC 2026-04-29 reviewed

Semantic cache reuses up to 92 percent of quantum circuit results
A Semantic Quantum Circuit Cache for Scalable and Distributed Quantum-Classical Workflows

Mar Tejedor +2
cs.DC 2026-04-29 reviewed

Jointly adapting batch size and parallelism speeds LLM training 4-8%
COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training

Akhmed Sakip +8
cs.DC 2026-04-29 reviewed

Agentic workflow turns PyTorch graphs into faster CUTLASS kernels
FACT: Compositional Kernel Synthesis with a Three-Stage Agentic Workflow

Sina Heidari +1
cs.DC 2026-04-29 reviewed

DMRlib more than triples data center throughput with easy malleable coding
DMRlib: Easy-coding and Efficient Resource Management for Job Malleability

Sergio Iserte +3
cs.DC 2026-04-29 reviewed

Mobile agents scale by denser single capabilities and group collaboration
Scaling Mobile Agent Systems: From Capability Density to Collective Intelligence

Bowei He
cs.DC 2026-04-29 reviewed

Malleability cuts malleable HPC workload time by 27%
MPI Malleability Validation under Replayed Real-World HPC Conditions

S. Iserte +4
cs.DC 2026-04-29 reviewed

Dual-path KV offload cuts edge LLM latency up to 42%
DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference

Bodon Jeong +6
cs.DC 2026-04-29 reviewed

FloatSOM trains 1024-node maps on 1B samples in 6 minutes on GPUs
FloatSOM: GPU-Accelerated, Distributed, Topology-Flexible Self-Organizing Maps

Tony Xu +5
cs.LG 2026-04-29 reviewed

Progressive encoder cuts VLM latency at 1 Mbps uplink
Progressive Semantic Communication for Efficient Edge-Cloud Vision-Language Models

Cyril Shih-Huan Hsu +2