archive

Every paper Pith has read.

1164 papers in cs.DC · page 11

cs.LG 2026-04-20 reviewed

Post-correction keeps particle clusters intact after lossy compression
Preserving Clusters in Error-Bounded Lossy Compression of Particle Data

Congrong Ren +4
cs.PF 2026-04-20 reviewed

CPU-GPU hybrid speeds long-context LLM inference 1.41x-3.2x
HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing

Mao Lin +4
cs.DC 2026-04-20 reviewed

Resilient MPI key-value store hits limits with current ULFM and RMA
User Experiences with MPI RMA and ULFM in a Resilient Key-Value Store Implementation

Claudia Fohry +1
cs.DC 2026-04-20 reviewed

Digital twin tests BFT systems against timing attacks
Trust, but Verify: ByzTwin-Range, a Digital Twin Cyber-Range for Byzantine Faults

Tadeu Freitas +2
cs.DC 2026-04-20 reviewed

Memory quantile models cut cluster under-allocations from 4.17% to 2.89%
Optimizing Memory Allocation in Distributed Clusters with Predictive Modeling

Jonathan Bader +7
cs.DC 2026-04-20 reviewed

Tighter analysis cuts leader election messages to O(n log n)
Toward Optimality: A Tighter Analysis of Message Complexity for Leader Election in Diameter-Two Networks

Abhijit Sadhukhan +2
cs.CE 2026-04-20 reviewed

Fused CUDA kernel speeds 3D SIMP optimization 4.6-7.3x
Matrix-Free 3D SIMP Topology Optimization with Fused Gather-GEMM-Scatter Kernels

Shaoliang Yang +2
cs.DC 2026-04-20 reviewed

One frozen LLM runs many tasks with 4-6x better speed and memory on phones
Unlocking the Edge deployment and ondevice acceleration of multi-LoRA enabled one-for-all foundational LLM

Sravanth Kodavanti +15
cs.DC 2026-04-20 reviewed

Persistent GPU kernel yields 15x speedup for tiny tensor operations
GPUOS: A GPU Operating System Primitive for Transparent Operation Fusion

Yiwei Yang +5
cs.DC 2026-04-20 reviewed

Async GPU kernels speed up sparse matrix multiplies by up to 6x
AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures

Jie Liu +2
cs.CL 2026-04-20 reviewed

DeInfer speeds parallel inference of decomposed LLMs
DeInfer: Efficient Parallel Inferencing for Decomposed Large Language Models

You-Liang Huang +3
cs.DC 2026-04-19 reviewed

EcoSched cuts multi-GPU energy use by up to 14.8% via per-job GPU counts
Towards Energy Efficient Co-Scheduling in HPC

Zhong Zheng +2
cs.DC 2026-04-19 reviewed

EcoShift gains 6% performance in power-limited CPU-GPU clusters
EcoShift: Performance-Aware Power Management for Power-Constrained Heterogeneous Systems

Zhong Zheng +2
cs.LG 2026-04-19 reviewed

Crash-aware tuner spends fixed budget more consistently on LLM serving
SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving

Christian Lysenstøen
cs.AR 2026-04-19 reviewed

Multi-tier KV cache cuts LLM inference costs by 47%
Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference

Sanjeev Rao Ganjihal
cs.DC 2026-04-19 reviewed

Compiler IR enables hardware-free design exploration for distributed ML
Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML

Jinsun Yoo +5
cs.DC 2026-04-19 reviewed

Active inference learns edge AI routing without offline training
Active Inference-Based Adaptive Routing for Heterogeneous Edge AI Services

Zihang Wang +2
cs.AI 2026-04-19 reviewed

Hive reuses logits to speed up multi-agent LLM re-sampling 1.11x-1.76x
Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling

Zizhang Luo +5
cs.DC 2026-04-19 reviewed

Cloud-native systems required to scale large language models
Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models – A Research Agenda

Minxian Xu +18
cs.DC 2026-04-19 reviewed

Lossless compression speeds GPU communication up to 47%
UCCL-Zip: Lossless Compression Supercharged GPU Communication

Shuang Ma +10
cs.DC 2026-04-18 reviewed

Proxy borrows OS scheduling to stop LLM agents from crashing APIs
HiveMind: OS-Inspired Scheduling for Concurrent LLM Agent Workloads

Justice Owusu Agyemang +5
cs.DC 2026-04-18 reviewed

Tensor fingerprinting cuts AI model hub storage
TStore: Rethinking AI Model Hub with Tensor-Centric Compression

Tingfeng Lan +5
cs.DC 2026-04-18 reviewed

Standard Podman with added layers matches specialized HPC containers
Sarus Suite: Cloud-native Containers for HPC

Alberto Madonna +5
cs.DC 2026-04-18 reviewed

Pipeline predicts airspace sectors and lets aircraft coordinate entries
Predictive Sectorization and Bayesian Optimized Consensus for Admission Control in Autonomous Airspace Operations

Aditya Dhodapkar +4
cs.AI 2026-04-18 reviewed

Quick intuition tops slow reasoning for edge AI in DAOs
The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus

Syed Muhammad Aqdas Rizvi
cs.DC 2026-04-18 reviewed

Three axioms force AMM orbits to weighted geometric means
From Swap Axioms to Weighted Geometric Means: A Characterization of AMMs

Björn Assmann +1
cs.DC 2026-04-18 reviewed

Hierarchical sparsity speeds LLM attention 4.57x
HieraSparse: Hierarchical Semi-Structured Sparse KV Attention

Haoxuan Wang +1
cs.DB 2026-04-17 reviewed

Flipped indexing delivers 6.5x lower GPU query latency with dynamic updates
FliX: Flipped-Indexing for Scalable GPU Queries and Updates

Rosina Kharal +3
cs.DC 2026-04-17 reviewed

Adaptive framework trains graph transformers 6x faster on 8 GPUs
Scalable and Adaptive Parallel Training of Graph Transformer on Large Graphs

Jun-Liang Lin +2
cs.DC 2026-04-17 reviewed

Agent context tracking cuts power use 27% in AI serving
KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving

Yichao Yuan +2
quant-ph 2026-04-17 reviewed

GreenPeas is a C++/CUDA tool that compiles quantum error-correction decoding hypergraphs…
GreenPeas: Unlocking Adaptive Quantum Error Correction with Just-in-Time Decoding Hypergraphs

Abbas B. Ziad +2
cs.LG 2026-04-17 reviewed

Precision modeling cuts training time prediction error to 9.8%
Training Time Prediction for Mixed Precision-based Distributed Training

Minchul Kang +7
cs.DC 2026-04-17 reviewed

Any amoebot shape breaks into O(holes) convex pieces in log time
Logarithmic-Time Geodesically Convex Decomposition in Programmable Matter

Henning Hillebrandt +4
cs.DC 2026-04-17 reviewed

Compositional operators let verified swarms be reused safely
Compositional Design, Implementation, and Verification of Swarms (Technical Report)

Florian Furbach +5
cs.DC 2026-04-17 reviewed

Availability weighting fixes unfair sampling in federated learning
Robust Synchronisation for Federated Learning in The Face of Correlated Device Failure

Stefan Behfar +1
cs.DC 2026-04-17 reviewed

Dynamic grouping plus TEE cuts blockchain consensus messages
T-RBFT: A Scalable and Efficient Byzantine Consensus Based on Trusted Execution Environment for Consortium Blockchain

Wen Gao +2
cs.DC 2026-04-17 reviewed

SYCL implementations vary in memory and kernel behavior
Evaluating SYCL as a Unified Programming Model for Heterogeneous Systems

Ami Marowka
cs.DC 2026-04-17 reviewed

Automated pipeline adds continuous benchmarking to HPC
Continuous benchmarking: Keeping pace with an evolving ecosystem of models and technologies

Jan Vogelsang +9
cs.DC 2026-04-17 reviewed

Second-gen serverless drops warm latency from 40 ms to 10 ms
New Kids: An Architecture and Performance Investigation of Second-Generation Serverless Platforms

Trever Schirmer +6
cs.DC 2026-04-17 reviewed

Exascale system trains billion-parameter interatomic potentials in hours
Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials

Yuanchang Zhou +14
cs.DC 2026-04-17 reviewed

On-orbit aggregation reduces satellite federated learning energy by 6x
CroSatFL: Energy-Efficient Federated Learning with Cross-Aggregation for Satellite Edge Computing

Nan Yang +4
cs.DC 2026-04-17 reviewed

GPU framework speeds NNQS configuration selection 2.32x on 64 GPUs
A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States

Daran Sun +15
cs.CR 2026-04-17 reviewed

Sequential memory proof caps ASIC speed at DRAM latency
PoSME: Proof of Sequential Memory Execution via Latency-Bound Pointer Chasing with Causal Hash Binding

David L. Condrey
cs.DC 2026-04-17 reviewed

Accuracy drives speed in long-context LLM serving
Accuracy Is Speed: Towards Long-Context-Aware Routing for Distributed LLM Serving

Takeshi Yoshimura +2
cs.DC 2026-04-17 reviewed

RAFT cluster inside blockchain nodes boosts scale and uptime
BlockRaFT: A Distributed Framework for Fault-Tolerant and Scalable Blockchain Nodes

Manaswini Piduguralla +3
cs.DC 2026-04-17 reviewed

Physics-grounded simulator supports multi-objective data center scheduling
DataCenterGym: A Physics-Grounded Simulator for Multi-Objective Data Center Scheduling

Nilavra Pathak +2
cs.LG 2026-04-16 reviewed

Mixing matrix design speeds SGP convergence in broadcast DFL
Optimizing Stochastic Gradient Push under Broadcast Communications

Tuan Nguyen +1
cs.DC 2026-04-16 reviewed

Wave dispatch lets HPC treat quantum fragments as tasks
Wave-Based Dispatch for Circuit Cutting in Hybrid HPC–Quantum Systems

Ricard S. García-Raigada +2
cs.DC 2026-04-16 reviewed

Stable per-LLM time shares enable efficient GPU allocation for agentic workflows
Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

Marcel Wagenländer +8