archive

Every paper Pith has read. Search by title, abstract, or pith.

1164 papers in cs.DC · page 4

  1. cs.DC 2026-05-12 reviewed
    Decoupled compression speeds GPU collectives up to 9.65x

    NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding

    Jiamin Wang +2

  2. cs.IT 2026-05-12 reviewed
    Link failures cap LEO capacity scalability at O(1/n)

    Capacity Scalability of LEO Constellations With Dynamic Link Failures

    Wei Li +1

  3. cs.DC 2026-05-12 reviewed
    Per-head adaptive blocks improve sparse attention accuracy by 5.43%

    AB-Sparse: Sparse Attention with Adaptive Block Size for Accurate and Efficient Long-Context Inference

    Di Liu +8

  4. cs.IT 2026-05-12 reviewed
    Node failures scale wireless capacity and delay with sqrt of reliable nodes

    On Capacity and Delay of Wireless Networks with Node Failures

    Wei Li +3

  5. cs.DC 2026-05-12 reviewed
    Power capping leaves LLM decode energy untouched

    The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures

    Bole Ma +3

  6. cs.LG 2026-05-12 reviewed
    DynaTrain switches 70B model parallelism in under 2 seconds

    DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training

    Yuanqing Wang +11

  7. cs.DC 2026-05-12 reviewed
    Overlays trade reliability against overhead for AI agent discovery

    Trade-offs in Decentralized Agentic AI Discovery Across the Compute Continuum

    Patrizio Dazzi +3

  8. cs.CE 2026-05-12 reviewed
    LLM inference should be measured in joules per token at scale

    Position: LLM Inference Should Be Evaluated as Energy-to-Token Production

    Xiang Liu +7

  9. cs.DC 2026-05-12 reviewed
    GraphFlash hits 127x speedup in serverless graph processing

    GraphFlash: Enabling Fast and Elastic Graph Processing on Serverless Infrastructure

    Chen Zhao +4

  10. cs.DC 2026-05-12 reviewed
    NAVIS speeds on-SSD vector inserts up to 2.74x

    NAVIS: Concurrent Search and Update with Low Position-Seeking Overhead in On-SSD Graph-Based Vector Search

    Jaeyong Song +6

  11. cs.DC 2026-05-12 reviewed
    Off-chain twins let DeFi agents simulate trades without waiting for blocks

    State Twins: An Off-Chain Substrate for Agentic Reasoning over Decentralized Finance Protocols

    Ian C. Moore

  12. cs.DC 2026-05-12 reviewed
    Storage offloading breaks memory wall for full-graph GNN training

    GriNNder: Breaking the Memory Capacity Wall in Full-Graph GNN Training with Storage Offloading

    Jaeyong Song +6

  13. quant-ph 2026-05-12 reviewed
    Task runtime dispatches QIR programs to multiple quantum processors

    Classic and Quantum Task-Based Intelligent Runtime for QIRs Running on Multiple QPUs

    Narasinga Rao Miniskar +4

  14. cs.RO 2026-05-12 reviewed
    Kairos cuts physical AI task latency by 32-66 percent

    Kairos: A Scalable Serving System for Physical AI

    Yinwei Dai +5

  15. cs.DC 2026-05-11 reviewed
    Chunked prefetching speeds DiT steps up to 1.28x with 49% less GPU memory

    ChunkFlow: Communication-Aware Chunked Prefetching for Layerwise Offloading in Distributed Diffusion Transformer Inference

    Han Meng +5

  16. cs.DC 2026-05-11 reviewed
    Chakra standardizes graph traces for AI workload benchmarking

    MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

    Srinivas Sridharan +28

  17. cs.DC 2026-05-11 reviewed
    Open traces standardize ML workload benchmarking

    MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

    Srinivas Sridharan +28

  18. cs.DC 2026-05-11 reviewed
    Directed graphs support Byzantine consensus only under specific connectivity

    Byzantine Consensus in Directed Graphs with Message Authentication

    Nitin H. Vaidya +1

  19. cs.DC 2026-05-11 reviewed
    ReCoVer keeps microbatch count fixed after GPU failures

    ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

    Ziyue Liu +9

  20. cs.DC 2026-05-11 reviewed
    ReCoVer preserves exact training trajectory after GPU losses

    ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

    Ziyue Liu +9

  21. cs.DC 2026-05-11 reviewed
    ShardTensor scales SciML to arbitrary spatial resolutions

    ShardTensor: Domain Parallelism for Scientific Machine Learning

    Corey Adams +6

  22. cs.DC 2026-05-11 reviewed
    GCC 15 outperforms LLVM 21 on four of six RISC-V vector apps

    Closer in the Gap: Towards Portable Performance on RISC-V Vector Processors

    Ruimin Shi +4

  23. cs.DC 2026-05-11 reviewed
    GCC 15 outperforms LLVM 21 in four of six RISC-V vector apps

    Closer in the Gap: Towards Portable Performance on RISC-V Vector Processors

    Ruimin Shi +4

  24. cs.DC 2026-05-11 reviewed
    Edge micro-agent fixes failures safely with no destructive actions

    An Uncertainty-Aware Resilience Micro-Agent for Causal Observability in the Computing Continuum

    Suvi De Silva +4

  25. cs.DC 2026-05-11 reviewed
    Mutable membership lets MoE survive rank faults without restarts

    Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference

    Xun Sun +20

  26. cs.CR 2026-05-11 reviewed
    This paper performs a structured bidirectional review of peer-reviewed studies on AI and…

    SoK: A Systematic Bidirectional Literature Review of AI & DLT Convergence

    Ali Irzam Kathia +5

  27. cs.DC 2026-05-11 reviewed
    Maestro cuts GPU use by 40% for compound LLM training

    Accelerating Compound LLM Training Workloads with Maestro

    Xiulong Yuan +18

  28. cs.DC 2026-05-11 reviewed
    BitTorrent warm-up hides FL update sources from local observers

    Privacy-preserving Chunk Scheduling in a BitTorrent Implementation of Federated Learning

    Naicheng Li +4

  29. cs.DC 2026-05-11 reviewed
    BitTorrent warm-up bounds FL source attribution to random guessing

    Privacy-preserving Chunk Scheduling in a BitTorrent Implementation of Federated Learning

    Naicheng Li +4

  30. cs.DC 2026-05-11 reviewed
    Hierarchical RL cuts edge latency 28 percent while saving energy

    HiRL: Hierarchical Reinforcement Learning for Coordinated Resource Management in Heterogeneous Edge Computing

    Jianyong Zhu +5

  31. cs.DC 2026-05-11 reviewed
    CPU radix sort reaches 6x bandwidth efficiency on large datasets

    FractalSortCPU: Bandwidth-Efficient Compressed Radix Sort on CPU

    Michael Dang'ana

  32. cs.DC 2026-05-11 reviewed
    CPU radix sort cuts bandwidth use by 6x on large data

    FractalSortCPU: Bandwidth-Efficient Compressed Radix Sort on CPU

    Michael Dang'ana

  33. cs.AI 2026-05-11 reviewed
    Small models reach strong edge-agent results when tools match the model

    Agentic Performance at the Edge: Insights from Benchmarking

    Shiqiang Wang +1

  34. cs.DC 2026-05-11 reviewed
    Amortized protocol makes async BRB messages linear in size

    Amortized Asynchronous Byzantine Reliable Broadcast with Optimal Resilience

    Michael Yiqing Hu +2

  35. cs.DC 2026-05-11 reviewed
    Amortized BRB reaches O(n|m|) messages in async networks

    Amortized Asynchronous Byzantine Reliable Broadcast with Optimal Resilience

    Michael Yiqing Hu +2

  36. cs.AI 2026-05-11 reviewed
    Autonomous objects resolve over half of scientific data conflicts

    Autonomous FAIR Digital Objects: From Passive Assertions to Active Knowledge

    Zeyd Boukhers +3

  37. physics.comp-ph 2026-05-11 reviewed
    Block-structured matrix multiplication speeds quantum chemistry by 10x

    Accelerating Locality-Driven Integration in Quantum Chemistry with Block-Structured Matrix Multiplication

    Xinran Wei +10

  38. physics.comp-ph 2026-05-11 reviewed
    Block-structured matmul speeds DFT integrals up to 10x on GPUs

    Accelerating Locality-Driven Integration in Quantum Chemistry with Block-Structured Matrix Multiplication

    Xinran Wei +10

  39. physics.comp-ph 2026-05-11 reviewed
    Graph reordering cuts memory pressure in GPU integral evaluation

    FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies

    Yihong Zhang +6

  40. physics.comp-ph 2026-05-11 reviewed
    Graph orchestration cuts GPU memory use for recursive integrals

    FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies

    Yihong Zhang +6

  41. cs.LG 2026-05-11 reviewed
    Adaptive clipping lifts private federated LLM accuracy

    DP-LAC: Lightweight Adaptive Clipping for Differentially Private Federated Fine-tuning of Language Models

    Haaris Mehmood +4

  42. cs.NI 2026-05-11 reviewed
    Adaptive offloading lifts LLM throughput 65% at 47% lower energy

    GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference

    Zengzipeng Tang +4

  43. cs.DC 2026-05-11 reviewed
    Vehicle screening plus federated segmentation cuts pothole data volume

    Edge-Cloud Collaborative Pothole Detection via Onboard Event Screening and Federated Temporal Segmentation

    Yingjie Wu +2

  44. cs.DC 2026-05-11 reviewed
    Brokerless data plane delivers consistent batches for AI training

    BatchWeave: A Consistent Object-Store-Native Data Plane for Large Foundation Model Training

    Ting Sun +9

  45. cs.DC 2026-05-11 reviewed
    Object store delivers atomic batches for 64-GPU model training

    BatchWeave: A Consistent Object-Store-Native Data Plane for Large Foundation Model Training

    Ting Sun +9

  46. cs.DC 2026-05-11 reviewed
    Ordered agents let population protocols recognize unambiguous star-free languages

    Population Protocols over Ordered Agents

    Michael Blondin +5

  47. cs.NI 2026-05-10 reviewed
    Method optimizes server placement for vertical federated learning in dynamic networks

    Optimizing Server Placement for Vertical Federated Learning in Dynamic Edge/Fog Networks

    Su Wang +2

  48. cs.DC 2026-05-10 reviewed
    Cascade labels 8.6M orbital sequences for anomaly detection

    Multi-Tier Labeling and Physics-Informed Learning for Orbital Anomaly Detection at Scale

    Yong Fu

  49. cs.DC 2026-05-10 reviewed
    Cloud trace decomposition predicts performance at 2% error

    Cloud Performance Decomposition for Long-Term Performance Engineering: A Case Study

    Shimul Debnath +4

  50. eess.IV 2026-05-10 reviewed
    Neural preprocessor lifts H.264 perceptual scores 27 percent on UVG

    Kelvin v1.0: A Neural Pre-Encoder for H.264: A standards-compliant learned preprocessor with -27.62% BD-VMAF on UVG

    Marco Graziano