archive

Every paper Pith has read. Search by title, abstract, or pith.

1164 papers in cs.DC · page 9

cs.DC 2026-04-29 reviewed

SplitFT speeds LLM fine-tuning with adaptive client cut layers
SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning

Yimeng Shan +5
cs.DC 2026-04-29 reviewed

Pipelined sharding speeds client xLM inference up to 30x with 10x less VRAM
Efficient, VRAM-Constrained xLM Inference on Clients

Aditya Ukarande +3
cs.CL 2026-04-29 reviewed

Folding parallelism cuts memory for long-context transformers
Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference

Vasu Shyam +2
cs.LG 2026-04-29 reviewed

Multi-version rollout lifts LLM RL throughput 2-3x while keeping convergence
DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training

Tianhao Hu +17
cs.AR 2026-04-28 reviewed

Memory-centric chiplets cut attention latency 15x
AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving

Zhongkai Yu +11
cs.DC 2026-04-28 reviewed

Direct remote access beats prefetching for LLM GPU offloading
DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference

Shouxu Lin +2
cs.LG 2026-04-28 reviewed

Wave cost model picks MoE kernels with 0.93% regret
RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts

Vyom Sharma +1
cs.MA 2026-04-28 reviewed

Workflow structure lets Pythia speed up multi-agent LLM serving
Pythia: Exploiting Workflow Predictability for Efficient Agent-Native LLM Serving

Shan Yu +16
cs.MA 2026-04-28 reviewed

Simple interface lifts multi-agent LLM serving throughput
Pythia: Exploiting Workflow Predictability for Efficient Agent-Native LLM Serving

Shan Yu +16
eess.SP 2026-04-28 reviewed

Speculative decoding cuts federated LLM communication
SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission

Ce Zheng +5
cs.DS 2026-04-28 reviewed

Exclusive scans finish in log p rounds with bounded operator uses
Two Efficient Message-passing Exclusive Scan Algorithms

Jesper Larsson Träff
cs.DC 2026-04-28 reviewed

Hierarchical FL setups lower energy for plant disease classification
Performance and Energy Trade-Off Analysis of Hierarchical Federated Learning for Plant Disease Classification

Athanasios Papanikolaou +8
cs.DC 2026-04-28 reviewed

Volitional states guard atomic machine actions in people-machine systems
Volitional Multiagent Atomic Transactions: Describing People and their Machines

Andy Lewis-Pye +1
cs.DC 2026-04-28 reviewed

Computing clusters cut emissions by timing jobs to renewable surplus
Economical and ecological impact of sector coupling applied to computing clusters

P. Bechtle +9
cs.DC 2026-04-28 reviewed

Warp-tiled kernels speed up depthwise convolution 3.26x
CUDA Kernel Optimization and Counter-Free Performance Analysis for Depthwise Convolution in Cloud Environments

Huriyeh Babak +1
cs.DC 2026-04-28 reviewed

Microservice systems often model only partial production dynamics
Adaptive Management of Microservices in Dynamic Computing Environments: A Taxonomy and Future Directions

Ming Chen +3
cs.DC 2026-04-28 reviewed

3D parallelism cuts first-token time in LLM serving by 10-62%
CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration

Sean Nian +4
cs.DC 2026-04-27 reviewed

Fixed-input lock keeps Spark policy outputs identical under repartitioning
Spark Policy Toolkit: Semantic Contracts and Scalable Execution for Policy Learning in Spark

Zeyu Bai
cs.ET 2026-04-27 reviewed

IoE unifies people, data, and things for 6G automation
Internet of Everything in the 6G Era: Paradigms, Enablers, Potentials and Future Directions

Driss Choukri +3
cs.ET 2026-04-27 reviewed

Repository blockchain turns fork chains into trees for single-process access
A Tree-Based Repository Blockchain Framework for Shared Governance in Collaborative Fork Ecosystems

Razwan Ahmed Tanvir +1
cs.LG 2026-04-27 reviewed

One shared KV cache serves 15 agents at 97.7% less memory
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference

Ishan Patel +1
cs.CR 2026-04-27 reviewed

Merkle trees allow 2-3x larger post-quantum cert chains
Network Impact of Post-Quantum Certificate Chain sizes on Time to First Byte in TLS Deployments

Matthew Chou +1
cs.DC 2026-04-27 reviewed

SpotVista picks multi-node spots with 81% higher availability
SpotVista: Availability-Aware Recommendation System for Reliable and Cost-Efficient Multi-Node Spot Instances

Taeyoon Kim +6
cs.CR 2026-04-27 reviewed

Split learning lets clients fine-tune LLMs without sharing data
A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations

Zihan Liu +4
cs.DC 2026-04-27 reviewed

Incisor is a cloud system that pairs program analysis tools with large language models to…
Incisor: Ex Ante Cloud Instance Selection for HPC Jobs

Michael A. Laurenzano +2
cs.DC 2026-04-27 reviewed

Exact scheduler improves IoT latency
Exact, Efficient, and Reliable Multi-Objective and Multi-Constrained IoT Workflow Scheduling in Edge-Hub-Cloud Cyber-Physical Systems

Andreas Kouloumpris +3
cs.MA 2026-04-27 reviewed

Multi-agent LLM tutor runs full semester without boundary failures
ITAS: A Multi-Agent Architecture for LLM-Based Intelligent Tutoring

Iizalaarab Elhaimeur +1
cs.CY 2026-04-27 reviewed

Priority PayGo holds tutoring under 4s at 50 users
Latency and Cost of Multi-Agent Intelligent Tutoring at Scale

Iizalaarab Elhaimeur +1
cs.DC 2026-04-27 reviewed

Atomistic model reaches year-and-meter scales for RPV steel
Unfolding an Atomistic World: Atomistic Simulation of Reactor Pressure Vessel Steel Across Year-and-Meter Scales

Haozhi Han +10
cs.DC 2026-04-27 reviewed

AtomWorld simulates RPV steel atom by atom at meter and year scales
Unfolding an Atomistic World: Atomistic Simulation of Reactor Pressure Vessel Steel Across Year-and-Meter Scales

Haozhi Han +10
cs.DC 2026-04-27 reviewed

TACO cuts tensor-parallel communication to raise LLM training speed 1.87x
TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training

Man Liu +10
cs.LG 2026-04-27 reviewed

FreeScale cuts bubbles by 90 percent in recommendation model training
FreeScale: Distributed Training for Sequence Recommendation Models with Minimal Scaling Cost

Chenhao Feng +19
cs.DC 2026-04-27 reviewed

Kubernetes spot system gains 55% more performance per dollar
KubePACS: Kubernetes Cluster Using Performant, Highly Available, and Cost Efficient Spot Instances

Taeyoon Kim +4
cs.LG 2026-04-27 reviewed

CommFuse removes tail latency from LLM training overlaps
CommFuse: Hiding Tail Latency via Communication Decomposition and Fusion for Distributed LLM Training

Rezaul Karim +5
cs.DC 2026-04-27 reviewed

Distributed solver speeds large IPMs up to 97 times over single-node codes
SDSL-Solver: Scalable Distributed Sparse Linear Solvers for Large-Scale Interior Point Methods

Shaofeng Yang +5
cs.DC 2026-04-26 reviewed

Invariants proven for local-first access control data type
Towards System-Oriented Formal Verification of Local-First Access Control

Florian Jacob +2
cs.DC 2026-04-26 reviewed

Full-block fusion raises Pythia decoding speed 1.34x
ClusterFusion++: Expanding Cluster-Level Fusion to Full Transformer-Block Decoding

ChiHeng Jin +2
cs.DC 2026-04-25 reviewed

Isolated tracks let federated learning respect client exclusions
A Taxonomy and Resolution Strategy for Client-Level Disagreements in Federated Learning

Daan Rosendal +1
cs.DC 2026-04-25 reviewed

Genetic algorithm lifts blockchain validator profits by 15%
The Blockchain Execution Dilemma: Optimizing Revenue XOR Fair Ordering

Artjom Pugatsov +2
cs.DC 2026-04-25 reviewed

RL policy adapts caches to save 43% energy in GNN training
GreenDyGNN: Runtime-Adaptive Energy-Efficient Communication for Distributed GNN Training

Arefin Niam +2
cs.MA 2026-04-25 reviewed

Structured overlays beat gossip for AI agent discovery under node churn
Usable Agent Discovery for Decentralized AI Systems

Patrizio Dazzi +3
cs.DC 2026-04-24 reviewed

Survey maps path for large language model inference on edge networks
Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities

Zhixiong Chen +5
cs.DC 2026-04-24 reviewed

Peer-to-peer grids obey transport lower bounds and monoid reduction rules
Mathematical Foundations for Peer-to-Peer Lattice Computation

Danil Gorinevski (cybiont GmbH) +2
cs.AR 2026-04-24 reviewed

Accelerators improve LLM speed on edge single-board computers
Cloud to Edge: Benchmarking LLM Inference On Hardware-Accelerated Single-Board Computers

Harri Renney +3
cs.LG 2026-04-24 reviewed

Gradient entropy ranks client contributions without validation data
Data-Free Contribution Estimation in Federated Learning using Gradient von Neumann Entropy

Asim Ukaye +3
cs.DC 2026-04-24 reviewed

Continuous bids cut cloud contention losses by 8-23%
LaissezCloud: Continuous Resource Renegotiation for the Public Cloud

Tejas Harith +1
cs.DC 2026-04-24 reviewed

MPS gains or loses 30% in GPU sharing depending on memory contention
A comprehensive evaluation of spatial co-execution on GPUs using MPS and MIG technologies

Jorge Villarrubia +3
cs.DC 2026-04-24 reviewed

Top-K method speeds sparse decode 1.88x on Blackwell
Guess-Verify-Refine: Data-Aware Top-K for Sparse-Attention Decoding on Blackwell via Temporal Correlation

Long Cheng +9
cs.DC 2026-04-24 reviewed

Multi-path GPU links with CUDA Graphs boost bandwidth 2.95x
Accelerating Intra-Node GPU-to-GPU Communication Through Multi-Path Transfers with CUDA Graphs

Amirhossein Sojoodi +4
cs.DC 2026-04-24 reviewed

Algorithm achieves 8K-approximation for coflow scheduling in K-core OCS networks
O(K)-Approximation Coflow Scheduling in K-Core Optical Circuit Switching Networks

Xin Wang +3