archive

Every paper Pith has read.

1164 papers in cs.DC · page 7

cs.DC 2026-05-05 reviewed

Tensor lifting maps OpenMP loops to AI Engines
Lifting to tensors when compiling scientific computing workloads for AI Engines

Nick Brown +1
cs.DC 2026-05-05 reviewed

GPU layer speeds exascale trace analysis by up to 314x
Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics

Dragana Grbic +1
cs.DC 2026-05-05 reviewed

GPU speeds exascale trace analysis by 314 times
Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics

Dragana Grbic +1
cs.DC 2026-05-05 reviewed

MPC with limited machines needs higher local exponents for superlinear tasks
On Solving Problems of Substantially Super-linear Complexity in $N^{o(1)}$ Rounds in the MPC Model

Andrzej Lingas
cs.DC 2026-05-04 reviewed

Decoupled virtual cores lift LLM GPU throughput 24% on average
VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU

Zijian He +3
cs.PL 2026-05-04 reviewed

Pact maps choreographic protocols to formal games
Pact: A Choreographic Language for Agentic Ecosystems

Kiran Gopinathan +4
cs.DC 2026-05-04 reviewed

AI Data Centers Break Grid Load Diversity
From Barrier to Bridge: The Case for AI Data Center/Power Grid Co-Design

Noman Bashir +3
cs.LG 2026-05-04 reviewed

Draft signals let SpecKV adapt gamma for 56% faster speculative decoding
SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

Shikhar Shukla
cs.DC 2026-05-04 reviewed

Workflow templates speed sensor app prototyping for non-experts
From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications

Komal Thareja +2
cs.DC 2026-05-04 reviewed

AI reuses sensor workflow template to cut dev time to 1-2 days
(POSTER) From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications

Komal Thareja +2
cs.DC 2026-05-04 reviewed

Parallel HSOM cuts training time for intrusion detection
parHSOM: A novel parallel Hierarchical Self-Organizing Map implementation

Rebekah Lane +5
cs.DC 2026-05-04 reviewed

Optimal configuration found for N-body sims on RISC-V accelerators
Assessing Performance and Porting Strategies for Gravitational $N$-Body Simulations on the RISC-V-Based Tenstorrent Wormhole™

Jenny Lynn Almerol +3
quant-ph 2026-05-04 reviewed

Global optimization cuts distributed quantum costs most
Distributed Quantum Circuit Optimisation: Evaluating Global and Local encodings

Maria Gragera Garces +1
quant-ph 2026-05-04 reviewed

Global optimization minimizes distributed quantum circuit costs
Distributed Quantum Circuit Optimisation: Evaluating Global and Local encodings

Maria Gragera Garces +1
cs.DC 2026-05-04 reviewed

Bayesian optimization lifts Fabric TPS by 12%
Caliper-in-the-Loop: Black-Box Optimization for Hyperledger Fabric Performance Tuning

Yash Madhwal +7
cs.LG 2026-05-04 reviewed

Sign-Muon reaches O(1/sqrt(T)) rate with 32x bandwidth cut
SignMuon: Communication-Efficient Distributed Muon Optimization

Neel Mishra +2
cs.DC 2026-05-04 reviewed

Partial layer training matches full federated accuracy with 82 percent fewer parameters
FedPLT: Scalable, Resource-Efficient, and Heterogeneity-Aware Federated Learning via Partial Layer Training

Ahmad Dabaja +1
cs.DC 2026-05-04 reviewed

Kairos raises LLM SLO attainment by up to 34%
Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference

Qipeng Wang +1
eess.SY 2026-05-04 reviewed

Each CAV spots sensor faults using distributed observers
Distributed Observer-based Fault Detection over Intelligent Networked Multi-Vehicle Systems

Mohammadreza Doostmohammadian +1
cs.DC 2026-05-04 reviewed

Raspberry Pi clusters teach undergrads practical supercomputing
Leveraging Teaching on Demand: Approaching HPC to Undergrads

S. Catalán +2
cs.DC 2026-05-04 reviewed

ZKP wrapper secures federated learning at 94 percent accuracy under attack
Privacy-Preserving Federated Learning: Integrating Zero-Knowledge Proofs in Scalable Distributed Architectures

Divya Gupta
cs.DC 2026-05-04 reviewed

IO500 logs reveal storage patterns missed by scores
A Treasure Trove of Performance: Analyzing the IO500 Submission Data

Julian Kunkel +4
cs.DC 2026-05-04 reviewed

Pipeline offloading lifts offline LLM throughput up to 2.51x
PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers

Hongbin Zhang +5
cs.CV 2026-05-04 reviewed

One-shot diffusion and model fusion reach 33.4% mAP for private surveillance
Heterogeneous Model Fusion for Privacy-Aware Multi-Camera Surveillance via Synthetic Domain Adaptation

Peggy Joy Lu +3
cs.CV 2026-05-04 reviewed

Privacy-preserving detection hits 33.4% mAP across cameras
Heterogeneous Model Fusion for Privacy-Aware Multi-Camera Surveillance via Synthetic Domain Adaptation

Peggy Joy Lu +3
cs.DC 2026-05-04 reviewed

AAFLOW speeds agentic AI pipelines 4.64x via zero-copy data flows
AAFLOW: Scalable Patterns for Agentic AI Workflows

Arup Kumar Sarker +5
cs.DC 2026-05-04 reviewed

Smaller idle models speed large LLM serving by more than double
SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM Inference

Jincheng Xie +6
cs.DC 2026-05-04 reviewed

Tail models accelerate large LLM inference by 2.28x as remote drafters
SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM Inference

Jincheng Xie +6
cs.DC 2026-05-04 reviewed

Queue predictions speed federated learning by 20 percent on HPC
FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

Yijiang Li +5
cs.DC 2026-05-04 reviewed

Queue predictions stabilize federated learning across HPC sites
FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

Yijiang Li +5
quant-ph 2026-05-03 reviewed

Random circuits distort quantum partitioning benchmarks
On the Distortion of Partitioning Performance by Random Quantum Circuits

Maria Gragera Garces
quant-ph 2026-05-03 reviewed

This paper finds that random quantum circuits used to test hypergraph partitioning for…
On the Distortion of Partitioning Performance by Random Quantum Circuits

Maria Gragera Garces
cs.DC 2026-05-03 reviewed

Data movement and overlap govern energy use in multimodal training
Cross-Layer Energy Analysis of Multimodal Training on Grace Hopper Superchips

Mahmoud Ahmed +6
cs.DC 2026-05-03 reviewed

Decentralized geohash sampling cuts geospatial stream latency
Decentralized Stratified Sampling for Low-Latency Approximate Geospatial Data Stream Processing in Edge-Cloud Architectures

Isam Mashhour Al Jawarneh +3
cs.LG 2026-05-03 reviewed

Sparse value sampling speeds attention 1.5x at long contexts
Stochastic Sparse Attention for Memory-Bound Inference

Kyle Lee +7
cs.LG 2026-05-03 reviewed

Declarative framework cuts RAG tuning code changes by 95%
AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines

Xintan Zeng +3
cs.DC 2026-05-03 reviewed

nvPAX three-phase method reaches 98.92% power satisfaction
nvPAX: Constrained Optimization for Dynamic Power Allocation in Hierarchical and Multi-Tenant Systems

Hadar Sivan +2
cs.DC 2026-05-03 reviewed

Joint time-structure model improves microservice fault detection
Joint Temporal-Structural Representation Learning for Distributed Fault Discrimination in Microservice Architectures

Yihan Xue +4
cs.DC 2026-05-03 reviewed

SplitZip speeds KV cache transfers by 1.32x with lossless GPU coding
SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving

Yipin Guo +1
cs.DC 2026-05-03 reviewed

SplitZip compresses KV caches at 613 GB/s for faster LLM transfers
SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving

Yipin Guo +1
cs.DC 2026-05-02 reviewed

CvxCluster uses a two-stage convex optimization approach to allocate resources across…
CvxCluster: Solving Large, Complex, Granular Resource Allocation Problems 100-1000x Faster

Obi Nnorom Jr +2
cs.AR 2026-05-02 reviewed

FPGA accelerator speeds SVD for PCA 22x over GPU
MANOJAVAM: A Scalable, Unified FPGA Accelerator for Matrix Multiplication and Singular Value Decomposition in Principal Component Analysis

Srivaths Ramasubramanian +7
cs.DC 2026-05-02 reviewed

Turing machine extension defines context-awareness
On defining and modeling context-awareness

Panteleimon Rodis
cs.OS 2026-05-02 reviewed

VUDA delivers 85% higher throughput via CUDA-Vulkan spatial sharing
VUDA: Breaking CUDA-Vulkan Isolation for Spatial Sharing of Compute and Graphics on the Same GPU

Bin Xu +4
cs.DC 2026-05-02 reviewed

Complex analysis cuts cloud VM flapping by 94%
Intelligent Autonomous Orchestration for Distributed Cloud Resources using Complex-Stability Analysis

Gopal Krishna Shyam +1
cs.DC 2026-05-02 reviewed

LLM serving needs math models over generic heuristics
Position: LLM Serving Needs Mathematical Optimization and Algorithmic Foundations, Not Just Heuristics

Zijie Zhou
cs.SE 2026-05-01 reviewed

DDD simulator runs same microservice code under multiple consistency models
A Domain-Driven Design Simulator for Business Logic-Rich Microservice Systems

Daniel da Palma Pereira +1
cs.DC 2026-05-01 reviewed

Interference flips scheduler rankings in 28% of edge cases
ncsim: A Lightweight Simulator for Networked Edge Computing with Wireless Interference Modeling

Bhaskar Krishnamachari +2
cs.DC 2026-05-01 reviewed

FPTC codec reaches 3.6x compression for power signals
FPTC: A Fast Parallel Transform-based Codec for Efficient Asymmetric Signal Compression

Ben Mechels +4
cs.DC 2026-05-01 reviewed

Streaming GPU encoding matches batch speed with 12x less memory
SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data

Shashank Kapadia +5