archive

Every paper Pith has read.

1164 papers in cs.DC · page 13

cs.AR 2026-04-13 reviewed

Decoupled matrix units deliver up to 2.31x AI speedups on CPUs
CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead

Jinpeng Ye +13
cs.DC 2026-04-13 reviewed

Self-calibrating digital twin reaches 4.39% MAPE on datacenter predictions
OpenDT: Exploring Datacenter Performance and Sustainability with a Self-Calibrating Digital Twin

Radu Nicolae +4
cs.DC 2026-04-13 reviewed

HPC fabrics show distinct congestion under AI-like bursts
Characterizing the Impact of Congestion in Modern HPC Interconnects

Lorenzo Piarulli +9
cs.LG 2026-04-13 reviewed

Pipeline compresses federated models over 11 times for 60% faster training
A Full Compression Pipeline for Green Federated Learning in Communication-Constrained Environments

Elouan Colybes +2
cs.DC 2026-04-13 reviewed

Hierarchical search tunes GPU apps better and faster
Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search

Daniel Nichols +5
cs.DC 2026-04-13 reviewed

Proactive DQN scaling outperforms reactive Kubernetes autoscalers
NimbusGuard: A Novel Framework for Proactive Kubernetes Autoscaling Using Deep Q-Networks

Chamath Wanigasooriya +1
quant-ph 2026-04-13 reviewed

Scheduler runs multiple quantum jobs in parallel on linked QPUs
QuMod: Parallel Quantum Job Scheduling on Modular QPUs using Circuit Cutting

Vinooth Kulkarni +5
cs.NI 2026-04-13 reviewed

Different GPU splits across LLMs change quality by 87% at fixed latency
RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving

Hossein Hosseini Kasnavieh +2
cs.DC 2026-04-12 reviewed

Hybrid backend speeds cross-silo FL up to 3.8x for large models
Understanding Communication Backends in Cross-Silo Federated Learning

Amir Ziashahabi +2
eess.SY 2026-04-12 reviewed

AI workload mix smooths power variability but keeps fast ramps
Workload composition smooths aggregate power demand while sustaining short-horizon ramps in AI data centers

Subir Majumder +2
cs.DC 2026-04-12 reviewed

Thinning to degree two extends data center stability region
Bipartite matching under communication constraints

Moonmoon Mohanty +5
cs.CR 2026-04-12 reviewed

Protocol hides verifier claim choices from holders
COD-ssi: Enforcing Mutual Privacy for Credential Oblivious Disclosure in Self Sovereign Identity

Elia Onofri +4
cs.DC 2026-04-12 reviewed

Stackelberg game optimizes incentives and privacy noise in federated learning
FEDBUD: Joint Incentive and Privacy Optimization for Resource-Constrained Federated Learning

Tao Liu +1
cs.DC 2026-04-12 reviewed

One CIR image deploys on any platform after lazy build
CIR: Lightweight Container Image for Cross-Platform Deployment

Fengzhi Li +8
cs.DC 2026-04-12 reviewed

LLMs derive exact GPU thread maps that cut energy use up to 4833x
Leveraging Mathematical Reasoning of LLMs for Efficient GPU Thread Mapping

Jose Maureira +3
cs.DC 2026-04-11 reviewed

Icicle indexes billion-file HPC systems in real time
Icicle: Scalable Metadata Indexing and Real-Time Monitoring for HPC File Systems

Haochen Pan +9
cs.DC 2026-04-11 reviewed

INCGuard verifies in-network computing for packet-loss risks
Verifying In-Network Computing Systems for Design Risks

Tianyu Bai +3
cs.DC 2026-04-11 reviewed

Deep unrolling turns SP routines into reusable RF sensing blocks
RF-LEGO: Modularized Signal Processing-Deep Learning Co-Design for RF Sensing via Deep Unrolling

Luca Jiang-Tao Yu +1
cs.AR 2026-04-11 reviewed

Sparse measurements predict latency at every CPU-GPU frequency
Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge

Jiesong Chen +3
cs.DC 2026-04-11 reviewed

Kernel disaggregation lifts heterogeneous GPU throughput by 2.3x
Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation

Tiancheng Hu +12
cs.DC 2026-04-11 reviewed

FlexVector speeds GCN inference 3.78x with flexible registers
FlexVector: A SpMM Vector Processor with Flexible VRF for GCNs on Varying-Sparsity Graphs

Bohan Li +5
cs.LG 2026-04-11 reviewed

Local adaptive steps multiply comms savings in decentralized training
LoDAdaC: a unified local training-based decentralized framework with adaptive gradients and compressed communication

Wei Liu +7
cs.DC 2026-04-11 reviewed

Microkernel validation eliminates harm from agent restarts
Rebooting Microreboot: Architectural Support for Safe, Parallel Recovery in Microservice Systems

Laurent Bindschaedler
cs.DC 2026-04-10 reviewed

System choices scale HPL to 1.01 EF/s FP64 with 11.5x mixed precision gain
Sustaining Exascale Performance: Lessons from HPL and HPL-MxP on Aurora

Kazushige Goto +5
cs.CR 2026-04-10 reviewed

Lone attackers poison federated learning models
XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

Israt Jahan Mouri +2
cs.LG 2026-04-10 reviewed

NOMAD speeds up massive graph embeddings by 10-100x on CPU clusters
NOMAD: Generating Embeddings for Massive Distributed Graphs

Aishwarya Sarkar +3
cs.DC 2026-04-10 reviewed

Adaptive layer resolves LLM scaling paradox on NPUs
A-IO: Adaptive Inference Orchestration for Memory-Bound NPUs

Chen Zhang +5
cs.DC 2026-04-10 reviewed

MATCHA cuts DNN inference latency up to 35% on heterogeneous edge SoCs
MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs

Enrico Russo +8
cs.DC 2026-04-10 reviewed

Reference storage cuts LLM RL rollout stalls up to 19x
TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training

Chenhao Ye +13
cs.OS 2026-04-10 reviewed

Adaptive quantization cuts mobile LLM cold starts by 4x
EdgeFlow: Fast Cold Starts for LLMs on Mobile Devices

Yongsheng Yan +3
cs.DC 2026-04-10 reviewed

Right GPU cuts LLM energy use by 70% in servers
Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures

Mauricio Fadel Argerich +2
cs.DC 2026-04-10 reviewed

DAG consensus protocol lifts CFT throughput in wide-area nets
Finding Nemo-Nemo: CFT DAG-based Consensus in the WAN

Rithwik Kerur +5
cs.DC 2026-04-09 reviewed

Method scales sensor optimization to billion-DOF tsunami models on GPUs
Sensor Placement for Tsunami Early Warning via Large-Scale Bayesian Optimal Experimental Design

Sreeram Venkat +2
cs.DC 2026-04-09 reviewed

CPU offload over NVLink-C2C fixes rigid GPU slice mismatches
Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading

Gabin Schieffer +3
cs.DC 2026-04-09 reviewed

Neural bandits learn better Kubernetes control-plane placements
NL-CPS: Reinforcement Learning-Based Kubernetes Control Plane Placement in Multi-Region Clusters

Sajid Alam +2
cs.DC 2026-04-09 reviewed

GPU HyperBall scales visibility graphs to 236k cells in 137 seconds
City-Scale Visibility Graph Analysis via GPU-Accelerated HyperBall

Alex Hodge +1
cs.DC 2026-04-09 reviewed

Causality arguments hold for quantum distributed snapshots
Asynchronous Quantum Distributed Computing: Causality, Snapshots, and Global Operations

Siddhartha Visveswara Jayanti +1
cs.DC 2026-04-09 reviewed

Joint algorithm minimizes weighted coflow time across OCS cores
Scheduling Coflows in Multi-Core OCS Networks with Performance Guarantee

Xin Wang +3
cs.DC 2026-04-09 reviewed

Speculative trees grow only when they cut inference time
SMART: When is it Actually Worth Expanding a Speculative Tree?

Lifu Wang +1
cs.DC 2026-04-09 reviewed

Energy-efficient GPUs deliver better value under budget limits
Wattlytics: A Web Platform for Co-Optimizing Performance, Energy, and TCO in HPC Clusters

Ayesha Afzal +2
cs.DC 2026-04-09 reviewed

Decomposed diffusion workflows handle 3x more requests
LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows

Lingyun Yang +12
cs.DC 2026-04-09 reviewed

Shared log makes LLM agent actions visible and stoppable
LogAct: Enabling Agentic Reliability via Shared Logs

Mahesh Balakrishnan +9
cs.DC 2026-04-09 reviewed

Beam speculation yields 1.4x LLM agent speedup on edge
B-PASTE: Beam-Aware Pattern-Guided Speculative Execution for Resource-Constrained LLM Agents

Yanfei Song
cs.DC 2026-04-09 reviewed

Decentralized edge agents lift mobile task success 21.7%
Administrative Decentralization in Edge-Cloud Multi-Agent for Mobile Automation

Senyao Li +5
cs.DC 2026-04-09 reviewed

Integrated panels give orbital AI 100 kW per ton
Reduced-Mass Orbital AI Inference via Integrated Solar, Compute, and Radiator Panels

Stephen Gaalema +2
cs.DC 2026-04-08 reviewed

No single config optimizes all goals in edge speculative LLM
ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge-Cloud Speculative LLM Serving

Xiangchen Li +5
cs.DC 2026-04-08 reviewed

CPU-free LLM serving cuts P99 latency up to 8x
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC

Mohammad Siavashi +4
cs.CR 2026-04-08 reviewed

Bonded identities and delay randomness fix MEV ordering
MEV-ACE: Identity-Authenticated Fair Ordering for Proposer-Controlled MEV Mitigation

Jian Sheng Wang
cs.DS 2026-04-08 reviewed

Batch algorithm updates maximal independent set in O(b log^3 n) work
Parallel Batch-Dynamic Maximal Independent Set

Guy Blelloch +4
eess.SY 2026-04-08 reviewed

AI workload power data scales to full data center energy profiles
Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning

Roberto Vercellino +9