pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

1164 papers in cs.DC · page 8

  1. cs.LG 2026-05-01 reviewed
    Local AI agents stop early to cut energy waste 15-20%

    AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

    Dzung Pham +3

  2. cs.DC 2026-05-01 reviewed
    Perseus fixes proxy RDMA serialization for 10x multi-node MoE speedup

    Eliminating Hidden Serialization in Multi-Node Megakernel Communication

    Byungsoo Oh +1

  3. cs.DC 2026-05-01 reviewed
    Emulator matches vLLM serving within 5 percent error

    LLM-Emu: Native Runtime Emulation of LLM Inference via Profile-Driven Sampling

    Wei Da +1

  4. cs.CL 2026-05-01 reviewed
    Quantization halves memory use in LLM training

    AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

    Wenxiang Lin +7

  5. cs.CL 2026-05-01 reviewed
    Quantization halves memory for 8B–32B LLM training

    AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

    Wenxiang Lin +7

  6. cs.DC 2026-05-01 reviewed
    Fixed-core approach yields 211x higher efficiency for edge GEMM

    Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

    M. Grailoo +1

  7. cs.DC 2026-05-01 reviewed
    Workflow scheduling cuts AI agent task time by 1.64x

    SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters

    Dongxin Guo +2

  8. cs.DC 2026-05-01 reviewed
    Ring subnets cut space LLM latency threefold

    SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks

    Zhanwei Wang +4

  9. cs.DC 2026-05-01 reviewed
    Ring subnets cut satellite LLM latency threefold

    SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks

    Zhanwei Wang +4

  10. cs.DC 2026-05-01 reviewed
    IPU scaling boosts CFD AI training throughput fivefold

    Adaptation of AI-accelerated CFD Simulations to the IPU platform

    P. Rosciszewski +4

  11. cs.DC 2026-05-01 reviewed
    OrbitBFT scales BFT consensus in LEO satellite networks

    OrbitBFT: Enabling Scalable and Robust BFT Consensus in LEO Constellations

    Tianyi Sun +3

  12. cs.LG 2026-05-01 reviewed
    Architecture shapes convergence in hierarchical federated learning

    Hierarchical Federated Learning for Networked AI: From Communication Saving to Architecture-Aware Design

    Seyed Mohammad Azimi-Abarghouyi +2

  13. cs.AI 2026-05-01 reviewed
    Same model accuracy varies 12 points by endpoint

    Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference

    Yuxuan Gao +2

  14. stat.CO 2026-04-30 reviewed
    Three streaming covariance algorithms agree exactly in exact arithmetic

    $2B$ or Not $2B$: A Tale of Three Algorithms for Streaming: Covariance Estimation after Welford and Chan-Golub-LeVeque

    Felix Reichel

  15. cs.DC 2026-04-30 reviewed
    Replication cuts partitioning costs by 17-65 percent on average

    Replication in Graph Partitioning and Scheduling Problems

    Pál András Papp +2

  16. cs.NI 2026-04-30 reviewed
    Untwinning removes specific network twins without full rebuild

    Network Digital Untwinning: Towards Backward Optimization of Digital Twins

    Zifan Zhang +7

  17. cs.DC 2026-04-30 reviewed
    Dedicated engine separates models for easier architecture simulation

    Akita: A High Usability Simulation Framework for Computer Architecture

    Sabila Al Jannat +8

  18. cs.AR 2026-04-30 reviewed
    Ring topology on FPGAs runs cortical circuit faster than real time

    NeuroRing: Scaling Spiking Neural Networks via Multi-FPGA Bidirectional Ring Topologies and Stream-Dataflow Architectures

    Muhammad Ihsan Al Hafiz +1

  19. cs.DC 2026-04-30 reviewed
    Fees linked to pool invariant k make CPMM trades path-independent

    Characterizing Path-Independent Fees: A Route to Zero Impermanent Loss in CPMMs

    Andrey Voronin +4

  20. cs.DC 2026-04-30 reviewed
    Model derives DEX fee floor to keep LPs in gain zone

    From Impermanent Loss to Sustainable Gain: Quantifying Profitability Zones for Liquidity Providers on DEX

    Ignat Melnikov +4

  21. cs.DC 2026-04-30 reviewed
    CS-3 runs 90% sparse SpMM 100x faster than CPU

    Exploring Sparse Matrix Multiplication Kernels on the Cerebras CS-3

    Milan Shah +2

  22. cs.DS 2026-04-30 reviewed
    Santa Claus needs sqrt n rounds for any approximation

    Distributed Santa Claus via Global Rounding

    Tijn de Vos +4

  23. cs.DC 2026-04-30 reviewed
    Most arbitrage chances come from one transaction each

    The Origins of MEV: Systematic Attribution of Arbitrage Opportunity Creation at Scale

    Andrei Seoev +6

  24. cs.OS 2026-04-30 reviewed
    Affinity hints give 12% throughput boost on chiplet servers

    Affinity Tailor: Dynamic Locality-Aware Scheduling at Scale

    Jin Xin Ng +9

  25. cs.DC 2026-04-30 reviewed
    Design-time traces yield low WCETs that cut waste 36% in mixed-criticality systems

    AnTi-MiCS: Analytical Framework for Bounding Time in Embedded Mixed-Criticality Systems

    Behnaz Ranjbar +1

  26. cs.DC 2026-04-30 reviewed
    AI inference relocates like electricity demand within latency limits

    AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework

    Xubin Luo +1

  27. cs.DC 2026-04-30 reviewed
    Lossless compression speeds LLM training up to 1.18 times

    ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training

    Wenxiang Lin +4

  28. cs.AI 2026-04-30 reviewed
    Traditional methods fail for AI in autonomous system dependability

    Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification

    Behnaz Ranjbar +5

  29. cs.DC 2026-04-30 reviewed
    The paper proves that all predicates expressible in monadic Presburger arithmetic can be…

    Monadic Presburger Predicates have Robust Population Protocols

    Philipp Czerner +5

  30. cs.DC 2026-04-30 reviewed
    Consensus-embedded checks give order-execute chains 10.6x throughput

    Back to the Future: Rethinking Endorsement in Order-Execute Blockchains

    Rongji Huang +7

  31. cs.CR 2026-04-30 reviewed
    Merkle tree pipeline verifies IoT logs at 130k records per second

    Lightweight Tamper-Evident Log Integrity Verification for IoT Edge Environments: A Merkle Tree Pipeline with Adaptive Chunking

    Muhammet Anil Yagiz +2

  32. cs.DC 2026-04-30 reviewed
    Distributed GPUs train fluid predictors faster than solvers

    A Study on the Performance of Distributed Training of Data-driven CFD Simulations

    Sergio Iserte +3

  33. cs.DC 2026-04-30 reviewed
    Unified API brings dynamic resources to HPC apps via MPI spawning

    Towards the Democratization and Standardization of Dynamic Resources with MPI Spawning

    Sergio Iserte +5

  34. cs.RO 2026-04-29 reviewed
    Jetson AGX Orin runs 25k Monte Carlo AEB samples in 530 ms

    Real-Time GPU-Accelerated Monte Carlo Evaluation of Safety-Critical AEB Systems Under Uncertainty

    Akshay Karjol +1

  35. cs.DC 2026-04-29 reviewed
    Block pipelining lifts Hyperledger Fabric commit throughput 1.9x

    End-to-End and Phase-Level Performance Optimization for Hyperledger Fabric

    Pavan Sollu +8

  36. cs.LG 2026-04-29 reviewed
    Compiler automates sequence parallelism for 2.7x longer LLM contexts

    AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism

    Ahan Gupta +5

  37. cs.DC 2026-04-29 reviewed
    Round-robin stage dispatch breaks GPU pipeline bottleneck for LLM training

    Efficient Training on Multiple Consumer GPUs with RoundPipe

    Yibin Luo +4

  38. cs.DC 2026-04-29 reviewed
    Deterministic nodes adapt only to uniform goals in dynamic networks

    Adaptive Self-Organization in Anonymous Dynamic Networks

    Garrett Parzych +1

  39. cs.DC 2026-04-29 reviewed
    Serverless MoE serving cuts resources below one third for multi-tenant use

    FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving

    Minghe Wang +3

  40. cs.DC 2026-04-29 reviewed
    Test taxonomy with CI ecosystem improves HPC fault detection

    A Test Taxonomy and Continuous Integration Ecosystem for Dynamic Resource Management in HPC

    Petter Sandås +3

  41. cs.AR 2026-04-29 reviewed
    The paper introduces Voxel, a compiler-aware simulation framework for studying the…

    Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel

    Yiqi Liu +4

  42. cs.DC 2026-04-29 reviewed
    Semantic cache reuses up to 92 percent of quantum circuit results

    A Semantic Quantum Circuit Cache for Scalable and Distributed Quantum-Classical Workflows

    Mar Tejedor +2

  43. cs.DC 2026-04-29 reviewed
    Jointly adapting batch size and parallelism speeds LLM training 4-8%

    COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training

    Akhmed Sakip +8

  44. cs.DC 2026-04-29 reviewed
    Agentic workflow turns PyTorch graphs into faster CUTLASS kernels

    FACT: Compositional Kernel Synthesis with a Three-Stage Agentic Workflow

    Sina Heidari +1

  45. cs.DC 2026-04-29 reviewed
    DMRlib more than triples data center throughput with easy malleable coding

    DMRlib: Easy-coding and Efficient Resource Management for Job Malleability

    Sergio Iserte +3

  46. cs.DC 2026-04-29 reviewed
    Mobile agents scale by denser single capabilities and group collaboration

    Scaling Mobile Agent Systems: From Capability Density to Collective Intelligence

    Bowei He

  47. cs.DC 2026-04-29 reviewed
    Malleability cuts malleable HPC workload time by 27%

    MPI Malleability Validation under Replayed Real-World HPC Conditions

    S. Iserte +4

  48. cs.DC 2026-04-29 reviewed
    Dual-path KV offload cuts edge LLM latency up to 42%

    DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference

    Bodon Jeong +6

  49. cs.DC 2026-04-29 reviewed
    FloatSOM trains 1024-node maps on 1B samples in 6 minutes on GPUs

    FloatSOM: GPU-Accelerated, Distributed, Topology-Flexible Self-Organizing Maps

    Tony Xu +5

  50. cs.LG 2026-04-29 reviewed
    Progressive encoder cuts VLM latency at 1 Mbps uplink

    Progressive Semantic Communication for Efficient Edge-Cloud Vision-Language Models

    Cyril Shih-Huan Hsu +2