archive

Every paper Pith has read.

1164 papers in cs.DC · page 10

cs.DC 2026-04-24 reviewed

GICC cuts GPU coordination latency up to 229 times on Slingshot
GICC: A High-Performance Runtime for GPU-Initiated Communication and Coordination in Modern HPC Systems

Baodi Shan +2
cs.DC 2026-04-23 reviewed

Fused GPU kernel speeds epidemic sims 217x over CPU
FlashSpread: IO-Aware GPU Simulation of Non-Markovian Epidemic Dynamics via Kernel Fusion

Heman Shakeri +3
cs.DC 2026-04-23 reviewed

Gradient sharding removes the serverless memory ceiling for federated learning
Shard the Gradient, Scale the Model: Serverless Federated Aggregation via Gradient Partitioning

Amine Barrak
cs.LG 2026-04-23 reviewed

N-gram promotion ensembles match neural accuracy at lower cost
Promoting Simple Agents: Ensemble Methods for Event-Log Prediction

Benedikt Bollig +3
cs.DC 2026-04-23 reviewed

Restructured big-integer ops deliver 4x SIMD speedups in libraries
Leveraging SIMD for Accelerating Large-number Arithmetic

Subhrajit Das +2
cs.DC 2026-04-23 reviewed

UBRI analysis abstracts blockchain research into deployable design themes
Systematizing Blockchain Research Themes and Design Patterns: Insights from the University Blockchain Research Initiative (UBRI)

Chien-Chih Chen +4
cs.DC 2026-04-23 reviewed

Risk estimates and hysteresis cut edge server switches 88%
Risk-Aware and Stable Edge Server Selection Under Network Latency SLOs

Mohan Liyanage +3
cs.DC 2026-04-23 reviewed

Delta Lake loads fastest, Iceberg saves most space
Research on the efficiency of data loading and storage in Data Lakehouse architectures for the formation of analytical data systems

Ivan Borodii +1
cs.DC 2026-04-23 reviewed

LLM planner cuts latency 20% in WiFi offload networks
A Task Decomposition and Planning Framework for Efficient LLM Inference in AI-Enabled WiFi-Offload Networks

Mingqi Han +1
cs.CV 2026-04-23 reviewed

One-layer lookahead decouples graph build from update in Vision GNNs
GraphLeap: Decoupling Graph Construction and Convolution for Vision GNN Acceleration on FPGA

Anvitha Ramachandran +2
cs.DC 2026-04-23 reviewed

Data pipeline changes cut deep learning training from 22 hours to 3 hours
Optimizing High-Throughput Distributed Data Pipelines for Reproducible Deep Learning at Scale

Kashish Mittal +5
cs.MA 2026-04-22 reviewed

Dedicated L2 stack needed for AI agent economies
AGNT2: Autonomous Agent Economies on Interaction-Optimized Layer 2 Infrastructure

Anbang Ruan +1
cs.DC 2026-04-22 reviewed

LLM turns natural language into OpenSearch queries under human control
A Cloud-Native Architecture for Human-in-Control LLM-Assisted OpenSearch in Investigative Settings

Benjamin Puhani +2
cs.AR 2026-04-22 reviewed

Runtime dispatcher shares Versal AI Engine tiles among mixed-criticality tasks
Enabling Mixed criticality applications for the Versal AI-Engines

Vincent Sprave +4
cs.AR 2026-04-22 reviewed

FPGA level-wise batch search speeds B+ tree lookups 4.9x
Efficient Batch Search Algorithm for B+ Tree Index Structures with Level-Wise Traversal on FPGAs

Max Tzschoppe +3
cs.DC 2026-04-22 reviewed

GPU runs 20,000 GWAS phenotypes in 20 minutes
TorchGWAS: GPU-accelerated GWAS for thousands of quantitative phenotypes

Xingzhong Zhao +7
cs.DC 2026-04-22 reviewed

BloomBee raises decentralized LLM throughput up to 1.76x
Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization

Jiu Chen +6
cs.LG 2026-04-22 reviewed

Spectral check spots clean clients to fix noisy labels
FedSIR: Spectral Client Identification and Relabeling for Federated Learning with Noisy Labels

Sina Gholami +4
cs.LG 2026-04-22 reviewed

Exact attention on billion-token sequences runs on single GPU
Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling

Yiming Bian +1
quant-ph 2026-04-22 reviewed

Quantum preconditioning prevents exponential failures in high-dim search
Distributed Quantum-Enhanced Optimization: A Topographical Preconditioning Approach for High-Dimensional Search

Dominik Soós +3
quant-ph 2026-04-22 reviewed

Quantum framework solves 500-variable higher-order problems in 170 seconds
Distributed Quantum Optimization for Large-Scale Higher-Order Problems with Dense Interactions

Seongmin Kim +7
cs.DC 2026-04-22 reviewed

Fine-grained phase management boosts LLM serving throughput by 53%
FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving

Wenyan Chen +3
cs.DC 2026-04-22 reviewed

RISC-V SG2044 doubles single-core performance in HPC testbed
Monte Cimone v3: Where RISC-V Stands in High-Performance Computing

Emanuele Venieri +6
cs.DC 2026-04-22 reviewed

CoVer verifier extended to Fortran with better efficiency than MUST
Extending Contract Verification for Parallel Programming Models to Fortran

Yussur Mustafa Oraji +1
cs.DC 2026-04-22 reviewed

Mobile app boosts emergency response with phone sensing and cloud
e112: A Context-Aware Mobile Emergency Communication Platform Leveraging Smartphone Sensing and Cloud Services

Katerina Ioannidou +2
cs.LG 2026-04-22 reviewed

Joint optimizations cut multi-agent edge latency by 62 percent at 200 agents
A Delta-Aware Orchestration Framework for Scalable Multi-Agent Edge Computing

Samaresh Kumar Singh +1
quant-ph 2026-04-22 reviewed

Nine quantum-HPC stacks share design patterns for unifying layers
Quantum-HPC Software Stacks and the openQSE Reference Architecture: A Survey

Amir Shehata +24
cs.DC 2026-04-22 reviewed

Watchdog turns Lambda kills into clean Spark table rollbacks
Characterizing and Fixing Silent Data Loss in Spark-on-AWS-Lambda with Open Table Formats

Srujan Kumar Gandla
cs.LG 2026-04-21 reviewed

Four dimensions organize blockchain-federated learning systems
Federated Learning over Blockchain-Enabled Cloud Infrastructure

Saloni Garg +2
cs.DC 2026-04-21 reviewed

Slicing traces GPU stall roots for 1.8x speedups across vendors
LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing

Yuning Xia +1
cs.DC 2026-04-21 reviewed

Local cost signal lifts satellite goodput 20% and throughput 31%
Equinox: Decentralized Scheduling for Hardware-Aware Orbital Intelligence

Ansel Kaplan Erol +1
cs.SE 2026-04-21 reviewed

Predictive autoscaler holds Node.js latency at 26 ms in ramps
Predictive Autoscaling for Node.js on Kubernetes: Lower Latency, Right-Sized Capacity

Ivan Tymoshenko +2
cs.DC 2026-04-21 reviewed

Copy engine enables free intra-node MoE load balancing
FEPLB: Exploiting Copy Engines for Nearly Free MoE Load Balancing in Distributed Training

Shuyao Qi +2
cs.DC 2026-04-21 reviewed

35 watchers prevent double-spends without global consensus
Intercloud: Eventual Consistency for Decentralised Economies via Chilling-Effect Consensus

Gregory Magarshak
cs.DC 2026-04-21 reviewed

ReaLB speeds multimodal MoE inference 1.29x by runtime precision adjustment
ReaLB: Real-Time Load Balancing for Multimodal MoE Inference

Yingping Wang +6
cs.DC 2026-04-21 reviewed

ReaLB speeds multimodal MoE inference 1.1-1.32x via per-rank precision cuts
ReaLB: Real-Time Load Balancing for Multimodal MoE Inference

Yingping Wang +6
cs.DC 2026-04-21 reviewed

CXL single-copy cache yields 5.6x geo-mean speedup
DPC: A Distributed Page Cache over CXL

Shai Bergman +6
cs.DC 2026-04-21 reviewed

Self-stabilizing algorithms minimize IP risks hierarchically
Minimizing Intellectual Property Risks via Self-Stabilizing Algorithms

Ken Kennedy +1
cs.LG 2026-04-21 reviewed

Satellite FL routing is tractable or NP-hard by case
Optimal Routing for Federated Learning over Dynamic Satellite Networks: Tractable or Not?

Yi Zhao +4
cs.DC 2026-04-21 reviewed

CROWDio cuts execution time by 57% with adaptive scheduling on phones
CROWDio: A Practical Mobile Crowd Computing Framework with Developer-Oriented Design, Adaptive Scheduling, and Fault Resilience

Lakshani Manamperi +4
cs.DC 2026-04-21 reviewed

Matrix co-design gives PIC particle phase 10.9x speedup
POLAR-PIC: A Holistic Framework for Matrixized PIC with Co-Designed Compute, Layout, and Communication

Yizhuo Rao +10
cs.CE 2026-04-21 reviewed

Tensor cores accelerate PIC mass matrix assembly up to 3x
Mass Matrix Assembly on Tensor Cores for Implicit Particle-In-Cell Methods

Luca Pennati +1
cs.DC 2026-04-21 reviewed

Uniform trees let FMM scale to 32 billion points on 512 nodes
A Simple Communication Scheme for Distributed Fast Multipole Methods

Srinath Kailasa
cs.DC 2026-04-21 reviewed

MegaKernels fuse MoE communication and computation for up to 38 percent speedup
UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training

Size Zheng +3
cs.CR 2026-04-21 reviewed

Multi-party protocol aligns data privately without intersection leaks
Sherpa.ai Privacy-Preserving Multi-Party Entity Alignment without Intersection Disclosure for Noisy Identifiers

Daniel M. Jimenez-Gutierrez +6
cs.CR 2026-04-21 reviewed

Multi-party protocol aligns data without revealing shared records
Sherpa.ai Privacy-Preserving Multi-Party Entity Alignment without Intersection Disclosure for Noisy Identifiers

Daniel M. Jimenez-Gutierrez +6
cs.DC 2026-04-21 reviewed

Simulator lets agents control and adapt fog models at runtime
YAIFS: Yet (not) Another Intelligent Fog Simulator: A Framework for Agent-Driven Computing Continuum Modeling & Simulation

Isaac Lera +1
cs.DB 2026-04-21 reviewed

Heuristic partitioning cuts multi-tenant query P95 latency from 61s to 2s
Heuristic Search Space Partitioning for Low-Latency Multi-Tenant Cloud Queries

Prashant Kumar Pathak +2
cs.CR 2026-04-21 reviewed

CHRONOS cuts IoT federated learning latency by 74 percent
CHRONOS: A Hardware-Assisted Phase-Decoupled Framework for Secure Federated Learning in IoT

Hung Dang
cs.DC 2026-04-21 reviewed

HyperLogLog skips exact counts for faster GPU SpGEMM
Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU

Yifan Li +1