archive
Every paper Pith has read. Search by title, abstract, or pith.
1164 papers in cs.DC · page 2
-
Dynamic placement over three options cuts deadline misses in robot pipelines
DAG-Based QoS-Aware Dynamic Task Placement for Networked Multi-Stage Control Pipelines
-
Reasoning LLMs trap data parallelism in KV-cache limits
Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles
-
Bitcoin V2 encryption still permits eclipse and downgrade attacks
Security Analysis of Bitcoin's V2 Transport Protocol: Exploiting Design Implications for Sustained Eclipse and Downgrade Attacks
-
Offloading slows smaller LLMs more in mixed serving
Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption
-
Any sequential object admits a conflict-free linearizable implementation
Conflict-Freedom as a Progress Condition
-
Exact conditions found for multi-room wakeup solutions
A parallel wakeup problem and multi-room light switch strategies
-
FedADAS cuts communication 9974x for custom yawn models
FedADAS: Communication-Efficient Federated Distillation for On-Device Driver Yawn Recognition in Vehicular Networks
-
Predictor accuracy sets exact fault tolerance in Byzantine agreement
Resilient Byzantine Agreement with Predictions
-
Latent storage cuts AI image needs by 78.7 percent
LatentBox: Storing AI-Generated Images at Scale via a Latent-First Design
-
LatentBox cuts AI image storage by 78.7% using latents
LatentBox: Storing AI-Generated Images at Scale via a Latent-First Design
-
Quantum sensors hit Heisenberg scaling despite faults
Quantum-Enhanced Distributed Sensor Fusion: Lower Bounds on Aggregation from Projection Noise to Heisenberg-Limited Byzantine-Tolerant Networks
-
GCAS enables first linear-space universal constructions for infinite arrivals
Generalized Compare-and-Swap and Space-Efficient Universal Constructions for the Infinite-Arrival Model
-
Geo-distributed AI training optimizes at 10-100 km distances
Modeling the Impact of Fiber Latency on Compute-Communication Overlap in Geo-Distributed Multi-Datacenter AI Training
-
Planar MDS algorithms lift to genus g with 3α+1 ratio
Meta-Theorems for Cuttable Distributed Problems
-
Edge computing slashes ToT AIGC delay over 80%
Unleashing the Power of Tree-of-Thoughts for Edge-Enabled AIGC Service Provisioning
-
1-PLS of cost p yields t-PLS of cost p/t up to log n
Near-Resolution of the Tradeoff Conjecture in Distributed Proof Labeling Schemes
5 Piths -
Readiness-first runtime speeds pipeline training up to 2.77x
A Readiness-Driven Runtime for Pipeline-Parallel Training under Runtime Variability
-
NVFP4 and balanced SP enable 2x faster long video training
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
-
Multimodal training 1.31x faster by sharing GPUs spatially
Mosaic: Towards Efficient Training of Multimodal Models with Spatial Resource Multiplexing
-
CIRCLES protocol solves majority with k^3 states
Ranking Opinions with Few States in Population Protocols
-
PopPy speeds Python AI apps up to 6.4x by parallelizing external calls
PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications
-
EPIC protocol enables polymorphic collectives on Ethernet
EPIC: Abstraction and Polymorphism of In-Network Collectives on Ethernet
-
Decoupled method hits optimal communication for saddle problems
Efficient Gradient Methods for Distributed Saddle Problems
-
2D blocking and virtual pointers lift GPU SpMV speed by up to 4x
CB-SpMV: A Data Aggregating and Balance Algorithm for Cache-Friendly Block-Based SpMV on GPUs
-
Federated meta-RL speeds vehicular task offloading
Heterogeneous Tasks Offloading in Vehicular Edge Computing: A Federated Meta Deep Reinforcement Learning Approach
-
RNS comparison works over full range with one mixed-radix step
Residue Number System Comparison revisited, a software perspective
-
JanusPipe speeds conservative MLIP training 1.5x on 32 GPUs
JanusPipe: Efficient Pipeline Parallel Training for Machine Learning Interatomic Potentials
-
JanusPipe boosts MLIP training 1.51x on 32 GPUs
JanusPipe: Efficient Pipeline Parallel Training for Machine Learning Interatomic Potentials
-
LLM instrumentation detects performance regressions at 5x lower severity
Duet instrumentation: An Agentic Approach to Improving Sensitivity in Cloud Service Benchmarking
-
Hybrid cluster cuts HTTP response time by over 40%
iHAC: A Hybrid Cluster Architecture for Enhanced Performance and Resilience
-
Top PRNGs pass only 72% of BigCrush tests across 1000+ streams
Assessing the Stochastic Properties of Modern Pseudo-Random Generators for Parallel Computing
-
Ringmaster LMO recovers optimal async time complexity for LMO
Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method
1 Pith -
Counting algorithm stabilizes in O(f+1) rounds after actual faults
Early-Stabilizing Counting
-
Renaming algorithms reach poly-log time with near-linear bits
Distributed Renaming with Subquadratic Bits via Scalable Committee Election
-
LLM semantics from names predict load phases and cut storage overloads 79%
TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics
-
Load balancing family finishes M tasks in O((M/n) log n) rounds
The Task Completion Problem and its Application to Crash-Resilient Computation
-
Graph allocation trims quantum workflow overhead by 30%
A System Aware Resource Allocation for Distributed Workflows in Quantum Computing Environments
-
Balancer halves imbalance in video diffusion transformer training
AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training
-
Guard boosts training utilization 1.7x by catching hidden stragglers
Guard: Scalable Straggler Detection and Node Health Management for Large-Scale Training
-
TierCheck cuts LLM checkpointing time below 10 seconds
TierCheck: Tiered Checkpointing for Fault Tolerance in Large Language Model Training
-
Covariance rotations keep 2-bit KV caches accurate
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
-
Single send verb unifies six kernel properties for objects and history
Send: Objects, History, and Transactions in a Single-Verb Kernel
-
Logit scores estimate client contributions per class in federated learning
Data-Free Client Contribution Estimation via Logit Maximization for Federated Learning
-
Simulator predicts LLM performance with under 5% error
Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference
-
Simulator achieves under 5% error in predicting LLM training performance
Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference
-
Object storage delivers KV cache in GPU order with 5.6% latency overhead
ObjectCache: Layerwise Object-Storage Retrieval for KV Cache Reuse
-
S-Bus rebuilds agent read sets from HTTP logs to block races
S-Bus: Automatic Read-Set Reconstruction for Multi-Agent LLM State Coordination
-
HTTP log reconstructs agent reads to block structural races
S-Bus: Automatic Read-Set Reconstruction for Multi-Agent LLM State Coordination
-
GoodServe lifts goodput up to 27.4% for agentic LLM inferences
GoodServe: Towards High-Goodput Serving of Agentic LLM Inferences over Heterogeneous Resources
-
Two-layer wrapper makes model merges order-independent
Conflict-Free Replicated Data Types for Neural Network Model Merging: A Two-Layer Architecture Enabling CRDT-Compliant Model Merging Across 26 Strategies