archive
Every paper Pith has read.
1164 papers in cs.DC
-
CFD-PIVAE data lets HPC schedulers save 10 percent energy
Enhancing Energy Efficiency in Scientific Workflows through CFD based PIVAEs
-
Apps define their own workflows in new CPS control framework
SDNator is Not Another SDN Controller: Enabling Extensible Data-Driven Control in Cyber-Physical Systems
-
Learned indexes lift RocksDB to 2.1x read throughput with few changes
A Pragmatic Approach to Learned Indexing in RocksDB: Targeted Optimizations with Minimal System Modification
-
MoE latency falls 1.58x on Ascend NPUs with tile scheduling
HyperParallel-MoE: Multi-Core Interleaved Scheduling for Fast MoE Training on Ascend NPUs
-
Flare shifts only excess microservice spike load to serverless
Flare: Leveraging Serverless Elasticity to Absorb Microservice Load Spikes
-
Multi-proposer protocol guarantees attested payloads enter next block
AMP: Arc Multi-Proposer Protocol with Bounded Inclusion Guarantees
-
Herring parallelizes batch-order-fairness across DAG subdags
Herring: Parallel Batch-Order-Fairness on DAG-based Blockchain Consensus
-
Trust model raises threat detection 19% in cloud digital twins
Multi-Factor Trust-Driven Secure Communication Model for Cloud-Based Digital Twins
-
MRV orders DAG-BFT commits from vertex metadata after consensus
Multi-Round Visibility: A Post-Consensus Ordering Layer for DAG-Based BFT
-
Prefix-aware batching lifts LLM decode throughput 1.98x
AlignedServe: Orchestrating Prefix-aware Batching to Build a High-throughput and Computing-efficient LLM Serving System
-
Router trims LLM inference tail latency by 52% at wind farms
XWind: A Cross-site Router for Large Language Model Inference Serving at Renewable Energy Farms
-
OKBs compile AI regulations into executable validation modules
Ontological Knowledge Blocks: Executable Compliance and Profile-Based Validation for Trustworthy AI Systems
-
Physics calculations cap solar output reports before blockchain entry
SolarChain: Bridging Physical Law, Verifiable Trust, and Sustainable Markets for Urban Energy Resilience
-
AI agent produces verified distributed systems on all 7 tests
Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems
-
Orbax speeds JAX checkpoint saves up to 3.5x over PyTorch
Orbax: Distributed Checkpointing with JAX
-
AI predicts multi-region spot fleet costs pre-launch
AI-Driven Multi-Region Provisioning for Cloud Services Using Spot Fleets
-
Game model guides trauma teams to best outcome under limits
A Generalized Nash Equilibrium-Seeking Scheme for Trauma Resuscitation
-
Relays speed replica convergence in networks with brief contacts
Relay-Based Synchronization of Replicated Data Types in Opportunistic Networks
-
Multicast method removes duplicate packets in collective comms
Exploiting Multicast for Accelerating Collective Communication
-
Monotone codes encode data for any trust structure
Monotone Erasure Codes
-
Separate physical pools for KV and SSM caches cut OOMs 7.6% and raise throughput up to 13x
Asymmetric Virtual Memory Paging for Hybrid Mamba-Transformer Inference
-
Nf-PEAK measures Nextflow task energy to 11 percent error under load
Nf-PEAK: Process-Based Energy Attribution for Nextflow Workflows on Kubernetes Clusters
-
Scaling sepsis AI replicas to CPU threads halves detection latency
SepsisAI Orchestrator: A Containerized and Scalable Platform for Deploying AI Models and Real-Time Monitoring in Early Sepsis Detection
-
Framework computes determinants securely on edge servers
Secure and Parallel Determinant Computation for Large-Scale Matrices in Edge Environments
-
Live handoff cuts LLM training pauses to seconds
LiveR: Fine-Grained Elasticity via Live Reconfiguration for Model Training
-
Co-design speeds vector search up to 8.4 times over CPU
NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing
-
Component-level GPU control yields 10% energy savings
CompPow: A Case for Component-level GPU Power Management
-
DynaFlow adds flexible parallelism to ML systems with minimal code
DynaFlow: Transparent and Flexible Intra-Device Parallelism via Programmable Operator Scheduling
-
BDTS compacts 2.71M tokens to 4k while keeping graph reachability
Budgeted Dynamic Trace Structures for Token-Efficient Sequential Computation
-
Power caps cut LLM energy use by 26% while reducing QoS violations
PALS: Power-Aware LLM Serving for Mixture-of-Experts Models
-
Power law beats standard models in Bitcoin price forecasts
Bitcoin's Power Law: Weak Structure, Strong Forecasts
-
Simulator predicts LLM serving latency with 6% error
Frontier: Towards Comprehensive and Accurate LLM Inference Simulation
-
Local edge knowledge speeds up distributed matching and covering
Distributed Stochastic Graph Algorithms
-
Hardware load balancing keeps AI networks at 98% line rate
High-speed Networking for Giga-Scale AI Factories
-
Roadside perception services turn on only when vehicles approach
Cloud-Native Operation of Roadside Infrastructure Enabling Demand-Driven Collective Perception via V2X
-
FLECA defends decentralized EV learning from attacks
Automated Byzantine-Resilient Clustered Decentralized Federated Learning for Battery Intelligence in Connected EVs
-
Dynamic context parallelism boosts MoE request rates up to 3x
NanoCP: Request-Level Dynamic Context Parallelism for Data-Expert Parallel Decoding
-
Routing imbalance in MoE stays fixed when expert parallelism scales
Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory
-
Ark commits any number of offchain Bitcoin payments in 200 vB
Ark: Offchain Transaction Batching in Bitcoin
-
LOSCAR-SGD overlaps local steps with sparse delayed updates
LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging
-
Cluster runtime cuts RLVR GPU costs up to 37.58%
PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR
-
Two GPU counters match MFU within 2 points at fleet scale
Instant GPU Efficiency Visibility at Fleet Scale
-
WebGPU backend cuts LLM memory use by 29-33% in browsers
Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU
-
RBF architecture sustains low-latency edge inference despite HPC delays
Hybrid Edge-HPC Systems for Low-Latency Data-Driven Inference
-
Edge inference continues while HPC models update asynchronously
Hybrid Edge-HPC Systems for Low-Latency Data-Driven Inference
-
GPU hypergraph partitioner runs 380x faster under size and edge constraints
Hypergraph Partitioning on GPU with Distinct Incident Hyperedges and Size Constraints
-
Transaction research keeps going as systems change
Fifty Years of Transaction Processing Research (extended)
-
Multi-rank PIM beats CPUs on AES and SHA-256
Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM
-
GPU-aware expert mapping cuts MoE latency by 7.9 percent on average
GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems
-
Space data centers process satellite data in orbit
Deep Tech to Space: Space Data Centers and AI Revolution at the Edge