archive

Every paper Pith has read.

1164 papers in cs.DC · page 5

cs.IT 2026-05-10 reviewed

Algorithm learns adversary utilities for sublinear regret in coding game
Learning from Acceptance: Cumulative Regret in the Game of Coding

Hanzaleh Akbari Nodehi +2
cs.AR 2026-05-10 reviewed

KV-cache movement regularization cuts static-graph LLM latency spikes
KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving

Zhiqing Zhong +5
cs.LG 2026-05-10 reviewed

Held-out gates catch regressions in LLM Metal kernel search
Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon

Víctor Gallego
cs.DS 2026-05-10 reviewed

Small subsets approximate the global ranking median
A Scalable and Unified Framework to Weighted Rank Aggregation

Amir Carmel +2
cs.DC 2026-05-10 reviewed

Adaptive DNN splits cut energy by 27-36% on real edge-cloud hardware
Adaptive DNN Partitioning and Offloading in Heterogeneous Edge-Cloud Continuum

Akuen Akoi Deng +3
cs.PL 2026-05-10 reviewed

CaMPL type system blocks deadlocks in concurrent code
Categorical Message Passing Language (CaMPL) for programmers

Daniel Kiyoshi Hashimoto +2
cs.DC 2026-05-10 reviewed

Air quality sensors detect cooking at 99.68 percent accuracy on-device
PoHAR: Understanding Hyperlocal Human Activities with Pollution Sensor Networks

Prasenjit Karmakar +2
cs.DC 2026-05-10 reviewed

ATLAS cuts GNN inference time 12-30x for billion-edge graphs
ATLAS: Efficient Out-of-Core Inference for Billion-Scale Graph Neural Networks

Pranjal Naman +1
cs.DC 2026-05-10 reviewed

Multi-metric detection catches all GPU failures in 504-GPU LLM run
From Detection to Recovery: Operational Analysis on LLM Pre-training with 504 GPUs

Daemyung Kang +13
cs.DC 2026-05-10 reviewed

Kernel-level splits let networked MCUs run large CNNs
Split CNN Inference on Networked Microcontrollers

Junyu Lu +4
cs.LG 2026-05-10 reviewed

DisagMoE overlaps MoE layers for 1.8x training speedup
DisagMoE: Computation-Communication overlapped MoE Training via Disaggregated AF-Pipe Parallelism

Zhichen Zeng +12
cs.CR 2026-05-10 reviewed

Split TCB design routes encrypted packets at native speed
Enforcing Attestable Workflows across Untrusted Networks

Hung Dang +1
cs.DC 2026-05-09 reviewed

Consistency models collapse into three entangled constraints
Light Cone Consistency: Closure, Ordering, and the Single-Observer Boundary

Rob Landers +1
cs.DC 2026-05-09 reviewed

System achieves up to 7.57x faster dynamic multimodal LLM training
MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production

Chunyu Xue +16
math.OC 2026-05-09 reviewed

Variance reduction shortens time complexity in parallel optimization
Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

Zhirayr Tovmasyan +2
cs.LG 2026-05-09 reviewed

VAEs recover mixture proportions for personalized federated learning
FedGMI: Generative Model-Driven Federated Learning for Probabilistic Mixture Inference

Qijun Hou +3
cs.DC 2026-05-09 reviewed

Basic Verkle trees cost more than Merkle trees
TS-Verkle: A TypeScript Native Verkle Library With On-chain Verifier

Zhikai Li +4
cs.LG 2026-05-09 reviewed

Agent framework cuts data leaks 2-6x while raising accuracy 15-36%
PAAC: Privacy-Aware Agentic Device-Cloud Collaboration

Liangqi Yuan +3
cs.DC 2026-05-09 reviewed

Generative model compresses Earth data by up to 10,000 times
Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction

Jinxiao Zhang +16
eess.SP 2026-05-09 reviewed

LLMs collaborate across devices and cloud to meet resource limits
Large Language Models over Networks: Collaborative Intelligence under Resource Constraints

Liangqi Yuan +4
cs.DC 2026-05-08 reviewed

Same code runs in abstract rounds and real sockets for distributed algorithms
QUANTAS 2 An Abstract, Concrete and Byzantine Simulator

Mikhail Nesterenko +1
cs.DC 2026-05-08 reviewed

Concurrent RL fine-tunes match single-task quality at 4.3x efficiency
MARLaaS: Multi-Tenant Asynchronous Reinforcement Learning as a Service

Timothy Tin Long Yu +5
cs.DC 2026-05-08 reviewed

Block-level sharding scales context parallelism to 256 GPUs
Unleashing Scalable Context Parallelism for Foundation Models Pre-Training via FCP

Yilong Zhao +8
cs.LG 2026-05-08 reviewed

Asynchronous stages raise agent evolution throughput 3.5x
FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration

Zhengding Hu +10
cs.LG 2026-05-08 reviewed

Hybrid head speeds secure time-series inference up to 44x
Private Vertical Federated Inference for Time-Series

Lucas Fenaux +5
cs.DC 2026-05-08 reviewed

LLM profiler reuses work across models to cut GPU hours 56%
Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

Joon Ha Kim +3
cs.LG 2026-05-08 reviewed

FLAM computes exact global performance in federated learning locally
FLAM: Evaluating Model Performance with Aggregatable Measures in Federated Learning

Fabian Stricker +3
cs.DC 2026-05-08 reviewed

Stencil kernels run up to 342x faster on wafer-scale engine
Stencil Computations on Cerebras Wafer-Scale Engine

Elia Belli +1
cs.LG 2026-05-08 reviewed

Adaptive tuning keeps decentralized SGD converging under adversary majority
VISTA: Decentralized Machine Learning in Adversary Dominated Environments

Hanzaleh Akbari Nodehi +3
cs.AR 2026-05-08 reviewed

Model runs 1024-core chip sims 115x faster at under 7% error
Accelerating Precise End-to-End Simulation: Latency-Sensitive Many-core System Modeling

Yinrong Li +7
cs.DC 2026-05-08 reviewed

175B models trained at 10% of peak FLOPs with standard parallelism
A Scalable Recipe on SuperMUC-NG Phase 2: Efficient Large-Scale Training of Language Models

Ajay Navilarekal Rajgopal +1
cs.DC 2026-05-08 reviewed

Wormhole stencil kernels match CPU speed but lose to transfers
Stencil Computations on Tenstorrent Wormhole

Lorenzo Piarulli +1
cs.DC 2026-05-08 reviewed

HexiSeq trains long-context LLMs 1.36x faster on mixed GPU clusters
HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware

Yan Liang +5
cs.DC 2026-05-08 reviewed

Hierarchical agents lift AI-RAN SLO fulfillment to 90%
Deadline-Driven Hierarchical Agentic Resource Sharing for AI Services and RAN Functions in AI-RAN

Haiyuan Li +2
cs.DC 2026-05-08 reviewed

RcLLM cuts TTFT 1.31x-9.51x for generative recommendation
RcLLM: Accelerating Generative Recommendation via Beyond-Prefix KV Caching

Zhan Zhao +2
cs.LG 2026-05-08 reviewed

Shared spectral operator aligns mismatched sensors in federated learning
UMEDA: Unified Multi-modal Efficient Data Fusion for Privacy-Preserving Graph Federated Learning via Spectral-Gated Attention and Diffusion-Based Operator Alignment

Shih-Yu Lai +4
cs.DC 2026-05-08 reviewed

MERBIT speeds irregular SpMV 27 percent on GPUs
MERBIT: A GPU-Based SpMV Method for Iterative Workloads

Qi Zhang +3
cs.LG 2026-05-08 reviewed

RL weight sync cuts communication about 100x with full fidelity
SparseRL-Sync: Lossless Weight Synchronization with ~100x Less Communication

Lucas Hu +6
cs.AR 2026-05-08 reviewed

TREA accelerator reduces edge detection latency up to 9x
TREA: Low-precision Time-Multiplexed, Resource-Efficient Edge Accelerator for Object Detection and Classification

Vijay Pratap Sharma +4
eess.SP 2026-05-08 reviewed

Energy subtraction on paired elements recovers signed OTA aggregates
Resource-Element Energy Difference for Noncoherent Over-the-Air Federated Learning

Hao Chen +1
cs.DC 2026-05-08 reviewed

Future-state scheduler cuts LLM workflow makespan by 32 percent
FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows

Zirui Huang +3
cs.SE 2026-05-08 reviewed

AI backends gain one admission seam for governance across requests
Execution Envelopes: A Shared Admission Contract for Backend AI Execution Requests

Krti Tallam
cs.DC 2026-05-07 reviewed

Hardware usage metrics match Kripke kernel to RAJA proxy
On Similarity of Computational Kernels in our Codes and Proxies

Michael McKinsey +2
cs.DC 2026-05-07 reviewed

Per-step slack regulator raises LLM goodput 1.77x
Regulating Branch Parallelism in LLM Serving

Swapnil Gandhi +3
cs.LG 2026-05-07 reviewed

IoT security model gains 30% detection boost with mostly unlabeled data
CLAD: A Clustered Label-Agnostic Federated Learning Framework for Joint Anomaly Detection and Attack Classification

Iason Ofeidis +4
cs.DC 2026-05-07 reviewed

Traces reveal LLM setups 3x slower on identical hardware
CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure

Eric Ding +10
cs.DC 2026-05-07 reviewed

Sharing serving GPUs boosts agentic RL throughput 1.3-3.3x
ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL

Wei Gao +15