archive

Every paper Pith has read.

1164 papers in cs.DC · page 12

cs.DC 2026-04-16 reviewed

Invariants let agents match hand-optimized GPU kernels
ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants

Haohui Mai +9
cs.AR 2026-04-16 reviewed

SCENIC hits 200G SmartNIC speed with programmable stream units
SCENIC: Stream Computation-Enhanced SmartNIC

Benjamin Ramhorst +6
cs.DC 2026-04-16 reviewed

Hybrid models let prefill run in a separate datacenter
Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter

Ruoyu Qin +7
cs.DC 2026-04-16 reviewed

Closed forms give exact multi-NUMA VM counts per host
Efficient calculation of available space for multi-NUMA virtual machines

Andrei Gudkov +2
cs.DC 2026-04-16 reviewed

Block placement and cache rules cut LLM serving latency
Serving Chain-structured Jobs with Large Memory Footprints with Application to Large Foundation Model Serving

Tingyang Sun +2
cs.AI 2026-04-16 reviewed

Game equilibria set synthetic data volumes in coopetitive learning
Cooperate to Compete: Strategic Data Generation and Incentivization Framework for Coopetitive Cross-Silo Federated Learning

Thanh Linh Nguyen +2
cs.IT 2026-04-16 reviewed

FL compression gains depend on correlation strength
Exploiting Correlations in Federated Learning: Opportunities and Practical Limitations

Adrian Edin +3
cs.LG 2026-04-16 reviewed

MoE serving gains 6.6x speedup via elastic self-speculation on 3D stacks
ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving

Yuseon Choi +7
cs.DC 2026-04-16 reviewed

Direct propagation matches locality lower bound in distributed DP
Locality, Not Spectral Mixing, Governs Direct Propagation in Distributed Offline Dynamic Programming

Ibne Farabi Shihab
cs.DC 2026-04-16 reviewed

Forkable shared logs let AI agents branch streaming data
AgileLog: A Forkable Shared Log for Agents on Data Streams

Shreesha G. Bhat +4
cs.DC 2026-04-16 reviewed

CoCoDiff speeds up distributed DiT inference 3.6x on average
CoCoDiff: Optimizing Collective Communications for Distributed Diffusion Transformer Inference Under Ulysses Sequence Parallelism

Bin Ma +4
cs.DS 2026-04-16 reviewed

Registers achieve log P latency despite contention
Fast Concurrent Primitives Despite Contention

Michael A. Bender +6
cs.DB 2026-04-15 reviewed

PIM hardware speeds R-tree queries up to 3.66x with less energy
Parallel R-tree-based Spatial Query Processing on a Commercial Processing-in-Memory System

Tasmia Jannat +2
quant-ph 2026-04-15 reviewed

VQLS cuts circuit count 256x for 10-qubit systems
Distributed Variational Quantum Linear Solver

Chao Lu +3
cs.DC 2026-04-15 reviewed

GPU hypergraph partitioner reaches 940x speedup with improved quality
Incidence Constraints in Hypergraph Partitioning on GPU

Marco Ronzani +1
cs.CR 2026-04-15 reviewed

Five themes together build cyber-physical resilience
Digital Guardians: The Past and The Future of Cyber-Physical Resilience

Saurabh Bagchi +22
cs.CR 2026-04-15 reviewed

Finite withholding beats infinite withholding by unbounded factor in pools
Temporary Power Adjusting Withholding Attack

Mustafa Doger +1
cs.CR 2026-04-15 reviewed

Temporary withholding boosts pool attack rewards 22x over permanent version
Temporary Power Adjusting Withholding Attack

Mustafa Doger +1
cs.DC 2026-04-15 reviewed

Inference tasks replace mining in AI blockchain consensus
HadAgent: Harness-Aware Decentralized Agentic AI Serving with Proof-of-Inference Blockchain Consensus

Landy Jimenez +5
cs.DC 2026-04-15 reviewed

OffloadFS moves database compaction to storage nodes for 3.36x speedup
OffloadFS: Leveraging Disaggregated Storage for Computation Offloading

Sungho Moon +6
cs.CR 2026-04-15 reviewed

Encrypted face data counts crowds without naming anyone
Head Count: Privacy-Preserving Face-Based Crowd Monitoring

Fatemeh Marzani +3
cs.DC 2026-04-15 reviewed

Open Ethernet HPC cluster ranks 49th on TOP500
SAKURAONE: An Open Ethernet-Based AI HPC System and Its Observed Workload Dynamics in a Single-Tenant LLM Development Environment

Fumikazu Konishi +2
cs.RO 2026-04-15 reviewed

Adaptive edge system raises robotics AI service quality
Self-adaptive Multi-Access Edge Architectures: A Robotics Case

Mahyar T Moghaddam +2
cs.CR 2026-04-15 reviewed

Distributed servers with MPC cut costs for private vertical federated learning
Secure and Privacy-Preserving Vertical Federated Learning

Shan Jin +4
cs.DC 2026-04-15 reviewed

PackSELL packs deltas and values to speed GPU SpMV 1.63x in FP16
PackSELL: A Sparse Matrix Format for Precision-Agnostic High-Performance SpMV

Kengo Suzuki +1
cs.DC 2026-04-14 reviewed

Event Tensor abstraction compiles dynamic megakernels
Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel

Hongyi Jin +20
cs.DC 2026-04-14 reviewed

DySkew cuts UDF skew delays with runtime data swaps
DySkew: Dynamic Data Redistribution for Skew-Resilient Snowpark UDF Execution

Chenwei Xie +10
cs.DC 2026-04-14 reviewed

Academia trains 70B open LLM on Alps supercomputer
An Engineering Journey Training Large Language Models at Scale on Alps: The Apertus Experience

Jonathan Coles +22
cs.PL 2026-04-14 reviewed

Virtual machine speeds array programs 147x on GPUs
Towards a Linear-Algebraic Hypervisor

Breandan Considine
cs.AR 2026-04-14 reviewed

EPAC RISC-V chip with three tiles taped out in 22nm
EPAC: The Last Dance

Filippo Mantovani +38
cs.DC 2026-04-14 reviewed

ML ensemble cuts CI memory waste by 36 GB per build
Intelligent resource prediction for SAP HANA continuous integration build workloads

Torsten Mandel +3
cs.DC 2026-04-14 reviewed

Hybrid platform extends supercomputers to full AI model lifecycle
Beyond Pre-Training: The Full Lifecycle of Foundation Models on HPC Systems

Dino Conciatore +6
cs.DC 2026-04-14 reviewed

The paper proposes pAirZero, a framework combining zeroth-order optimization and…
Three Birds, One Stone: Solving the Communication-Memory-Privacy Trilemma in LLM Fine-tuning Over Wireless Networks with Zeroth-Order Optimization

Zhijie Cai +5
cs.DC 2026-04-14 reviewed

Local routing plus compression cuts cloud LLM tokens 45-79%
Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads

Justice Owusu Agyemang +4
cs.OS 2026-04-14 reviewed

MARS cuts agentic latency by 5.94x via co-scheduling
MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

Yifei Wang +10
cs.AR 2026-04-14 reviewed

Compiler cuts NPU transformer energy use by up to 41%
Forge-UGC: FX optimization and register-graph engine for universal graph compiler

Satyam Kumar +1
cs.LG 2026-04-14 reviewed

Lévy jumps fix trapping in decentralized random walks
Decentralized Learning via Random Walk with Jumps

Zonghong Liu +2
cs.DC 2026-04-14 reviewed

Periodic framework organizes distributed computing
A Periodic Space of Distributed Computing: Vision & Framework

Mohsen Amini Salehi +7
cs.LG 2026-04-14 reviewed

Physics-informed DLinear forecasts AI data center power more accurately
A Physics-Aware Framework for Short-Term GPU Power Forecasting of AI Data Centers

Mohammad AlShaikh Saleh +4
cs.DC 2026-04-14 reviewed

BlazingAML matches AML accuracy at 210x CPU speed
BlazingAML: High-Throughput Anti-Money Laundering (AML) via Multi-Stage Graph Mining

Haojie Ye +4
cs.DC 2026-04-14 reviewed

Live pipeline changes cut LLM first-token time by 2.5x
PipeLive: Efficient Live In-place Pipeline Parallelism Reconfiguration for Dynamic LLM Serving

Xu Bai +3
cs.AI 2026-04-13 reviewed

Reference-based replication creates AI agents in constant time
Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents

Swanand Rao +3
cs.DC 2026-04-13 reviewed

StableHLO unifies ML performance modeling across GPUs and TPUs
Evaluating Cross-Architecture Performance Modeling of Distributed ML Workloads Using StableHLO

Jonas Svedas +8
cs.DC 2026-04-13 reviewed

Pipelined Parareal on GPUs speeds microswimmer simulations
Accelerating Microswimmer Simulations via a Heterogeneous Pipelined Parallel-in-Time Framework

Ruixiang Huang +1
cs.DC 2026-04-13 reviewed

Bayesian Noisy-OR model cuts failure detection time by 60%
Predictive Bayesian Arbitration: A Scalable Noisy-OR Model with Service Criticality Awareness

Anil Jangam +2
cs.SE 2026-04-13 reviewed

Remote Git service delivers monorepo checkouts in under a second
GitFarm: Git as a Service for Large-Scale Monorepos

Preetam Dwivedi +2
cs.DC 2026-04-13 reviewed

Visual analytics clusters HPC nodes to expose behavioral differences
Understanding Large-Scale HPC System Behavior Through Cluster-Based Visual Analytics

Allison Austin +6
cs.LG 2026-04-13 reviewed

Residual bottlenecks deliver 128x activation compression for pipelines
ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism

Alan Aboudib +3
cs.OS 2026-04-13 reviewed

Nanvix cuts serverless server needs by 20-100x
Nanvix: A Multikernel OS Design for High-Density Serverless Deployments

Carlos Segarra +6
cs.CR 2026-04-13 reviewed

Sparse FHE matmul on GPUs runs up to 3x faster than CPU
GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs

Lara D'Agata +9