archive

Every paper Pith has read.

1164 papers in cs.DC · page 16

cs.DC 2026-03-14 reviewed

Grassroots bonds turn community trust into interest-bearing liquidity
Grassroots Bonds as a Foundation for Market Liquidity

Ehud Shapiro
cs.DC 2026-03-13 reviewed

Token-budget routing cuts LLM GPU fleet 17-39%
Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference

Huamin Chen +4
cs.DC 2026-03-13 reviewed

Engine runs 1,200-node graphs after one agent call
Separating Intelligence from Execution: A Workflow Engine for the Model Context Protocol

Abhinav Singh Parmar
cs.LG 2026-03-12 reviewed

Cornserve boosts any-to-any model serving by 3.81x throughput
Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models

Jae-Won Chung +6
cs.DC 2026-03-12 reviewed

Multi-agent RL with graphs beats default Kubernetes scheduler
AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling

Hamed Hamzeh
cs.DC 2026-03-12 reviewed

Batch size cuts energy in LLM workflows but only for certain tasks
Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows

Md. Monzurul Amin Ifath +1
cs.DC 2026-03-12 reviewed

NCCLbpf adds verified eBPF policies to NCCL plugins with 130 ns overhead
NCCLbpf: Verified, Composable Policy Execution for GPU Collective Communication

Yusheng Zheng
cs.LG 2026-03-11 reviewed

Scheduler cuts multi-job federated learning time by 8.3x
FedACT: Concurrent Federated Intelligence across Heterogeneous Data Sources

Md Sirajul Islam +6
cs.CR 2026-03-11 reviewed

PrefixWall raises LLM cache reuse 70% over isolation
PrefixWall: Mitigating Prefix Caching Side Channels in Shared LLM Systems

Panagiotis Georgios Pennas +3
cs.DC 2026-03-11 reviewed

Ozaki-II adapted to FP8 cuts cost of double-precision matrix emulation
Double-Precision Matrix Multiplication Emulation via Ozaki-II Scheme with FP8 Quantization

Yuki Uchino +2
cs.DC 2026-03-11 reviewed

Cloud LLM creates and pushes adaptive code to edge devices
LLM-assisted Agentic Edge Intelligence Framework

Chinmaya Kumar Dehury +4
cs.DC 2026-03-10 reviewed

Flash-KMeans runs exact GPU k-means 18x faster
Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Shuo Yang +12
cs.DC 2026-03-10 reviewed

Sparse gating turns LLM batches into elastic super-trees for 5x speedup
ECHO: Elastic Speculative Decoding with Sparse Gating for High-Concurrency Scenarios

Xinyi Hu +8
cs.DC 2026-03-10 reviewed

FP64 tensor cores speed finite-element kernels 2x
Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores

Jiqun Tu +6
cs.DC 2026-03-08 reviewed

ArcLight raises CPU LLM throughput by 46% via NUMA control
ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs

Yuzhuang Xu +3
cs.AI 2026-03-08 reviewed

Graph engine runs LLM agents with zero hallucinations
GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

Yeahia Sarker +3
cs.DC 2026-03-08 reviewed

Potential games and LLM weights optimize UAV networks
Agentic AI-Driven UAV Network Deployment: An LLM-Enhanced Exact Potential Game Approach

Xin Tang +8
cs.DC 2026-03-07 reviewed

ML duration predictor trims supercomputer job waits by 11%
Duration-Informed Workload Scheduler

Daniela Loreti +2
cs.DC 2026-03-07 reviewed

Simulator tests failure knobs for large AI clusters
AIReSim: A Discrete Event Simulator for Large-scale AI Cluster Reliability Modeling

Karthik Pattabiraman +2
cs.DC 2026-03-06 reviewed

OMA retains Kubernetes crash evidence past the evidence horizon
Operational Memory Architecture for Kubernetes: Preserving Causal Context Across the Evidence Horizon

Shamsher Khan
cs.DC 2026-03-06 reviewed

DMM merges divergent models data-free using normalization stats
Domain-Adaptive Model Merging Across Disconnected Modes

Junming Liu +4
cs.DC 2026-03-05 reviewed

Misaligned dimensions keep compressed LLMs from speeding up
Why Smaller Is Slower? Dimensional Misalignment in Compressed LLMs

Jihao Xin +4
quant-ph 2026-03-04 reviewed

Heron beats Eagle in protocol benchmarks for quantum advantage
Benchmarking Quantum Computers via Protocols, Comparing IBM's Heron vs IBM's Eagle

Nitay Mayo +2
cs.DC 2026-03-04 reviewed

Benchmark suite derives efficiency rules for compound AI
Benchmarking Compound AI Applications for Hardware-Software Co-Design

Paramuth Samuthrsindh +5
cs.DC 2026-03-04 reviewed

Planning system decides satellite vs ground tasks to fit data transfers
Constraint-Aware Execution Planning for Hybrid Space-Ground Compute Workloads

Subhadip Mitra
quant-ph 2026-03-04 reviewed

Beam search reduces quantum communication costs in circuit partitioning
Efficient Time-Aware Partitioning of Quantum Circuits for Distributed Quantum Computing

Raymond P. H. Wu +5
cs.DC 2026-03-04 reviewed

Unified objects automate IoT edge-cloud apps with 9 nines availability
EdgeWeaver: Accelerating IoT Application Development Across Edge-Cloud Continuum

Pawissanutt Lertpongrujikorn +3
cs.DC 2026-03-04 reviewed

Fixed encoding decodes data 9-213× faster than Protocol Buffers
Simplicity Scales

Andrew Sampson (6OVER3 Institute) +2
quant-ph 2026-03-03 reviewed

Gate fusion speeds quantum ML simulation by 20 times
Fast and memory-efficient classical simulation of quantum machine learning via forward and backward gate fusion

Yoshiaki Kawase
cs.RO 2026-03-03 reviewed

cuNRTO framework introduces two new CUDA-based architectures
cuNRTO: GPU-Accelerated Nonlinear Robust Trajectory Optimization

Jiawei Wang +2
cs.DC 2026-03-01 reviewed

Filecoin reaches 2^{-30} finality in 30 rounds not 900
The Finality Calculator: Analyzing and Quantifying Filecoin's Finality Guarantees

Guy Goren +1
cs.DC 2026-02-27 reviewed

SPARe keeps fault-tolerance overhead at 2-3x for 100k GPU LLM training
SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining Systems with 100k+ GPUs

Jin Lee +8
cs.LG 2026-02-27 reviewed

Perturbed model copies enable private LLM unlearning
MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

Tiantong Wang +5
cs.CR 2026-02-26 reviewed

Protocol outsources MSM with 300x faster verification
2G2T: Constant-Size, Statistically Sound MSM Outsourcing

Majid Khabbazian
cs.LG 2026-02-26 reviewed

Shared caching cuts edge LLM first-token time by 93%
Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching

Hiroki Matsutani +2
cs.DC 2026-02-25 reviewed

CXL memory pool beats InfiniBand on GPU collectives
CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling

Dong Xu +16
cs.DC 2026-02-25 reviewed

Flexible sharding lifts FSDP speed by up to 66% at 10k GPUs
veScale-FSDP: Flexible and High-Performance FSDP at Scale

Zezhou Wang +11
cs.RO 2026-02-24 reviewed

GPU hybrid matches top solvers on large multi-depot routing
A GPU-Accelerated Hybrid Method for a Class of Multi-Depot Vehicle Routing Problems

Zhenyu Lei +1
cs.DC 2026-02-24 reviewed

Morton curve defined for pyramids in hybrid AMR
A Morton-Type Space-Filling Curve for Pyramid Subdivision and Hybrid Adaptive Mesh Refinement

David Knapp +4
cs.DC 2026-02-22 reviewed

Semantic dependencies resolve data conflicts locally via rebasing
Semantic Conflict Model for Collaborative Data Structures

Georgii Semenov +1
cs.DC 2026-02-21 reviewed

DualScale cuts energy up to 48% in LLM decode phase
DualScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS

Omar Basit +3
cs.MA 2026-02-21 reviewed

74% of workflows need no coordination for correctness
When Coordination Is Avoidable: A Monotonicity Analysis of Organizational Tasks

Harang Ju
cs.DC 2026-02-19 reviewed

GPU memory estimators fail to generalize across hardware
GPU Memory and Utilization Estimation for Training-Aware Resource Management: Opportunities and Limitations

Ehsan Yousefzadeh-Asl-Miandoab +4
cs.DC 2026-02-19 reviewed

SwapLess cuts Edge TPU latency up to 77% via CPU-TPU partitioning
Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs

Nathan Ng +7
cs.DC 2026-02-18 reviewed

Prebuilt hypertree removes locks from parallel node generation
Load Balanced Parallel Node Generation for Meshless Numerical Methods

Jon Vehovar +3
cs.DC 2026-02-18 reviewed

Circuit cutting trains QNNs on distributed systems without losing accuracy
DistributedEstimator: Distributed Training of Quantum Neural Networks via Circuit Cutting

Prabhjot Singh +2
cs.LG 2026-02-17 reviewed

Cloud inference matches on-device for real-time braking
Cloud Is Closer Than It Appears: Revisiting the Tradeoffs of Distributed Real-Time Inference

Pragya Sharma +2
cs.DC 2026-02-15 reviewed

Baremetal runtime lifts AI efficiency 9x on 10x fewer tiles
AEG: A Baremetal Framework for AI Acceleration via Direct Hardware Access in Heterogeneous Accelerators

Hua Jiang +3
cs.MS 2026-02-15 reviewed

Direct solvers scale via communication cuts and low-rank compression
Parallel Sparse and Data-Sparse Factorization-based Linear Solvers

Xiaoye Sherry Li +1
cs.DC 2026-02-14 reviewed

Energy use shifts from linear to root function as core count rises
The Impact of Process Competition on Energy Consumption: Analysis and Modeling

Eduardo Gomes Campos +5