archive

Every paper Pith has read.

1164 papers in cs.DC · page 11

  1. cs.LG 2026-04-20 reviewed
    Post-correction keeps particle clusters intact after lossy compression

    Preserving Clusters in Error-Bounded Lossy Compression of Particle Data

    Congrong Ren +4

  2. cs.PF 2026-04-20 reviewed
    CPU-GPU hybrid speeds long-context LLM inference 1.41x-3.2x

    HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing

    Mao Lin +4

  3. cs.DC 2026-04-20 reviewed
    Resilient MPI key-value store hits limits with current ULFM and RMA

    User Experiences with MPI RMA and ULFM in a Resilient Key-Value Store Implementation

    Claudia Fohry +1

  4. cs.DC 2026-04-20 reviewed
    Digital twin tests BFT systems against timing attacks

    Trust, but Verify: ByzTwin-Range, a Digital Twin Cyber-Range for Byzantine Faults

    Tadeu Freitas +2

  5. cs.DC 2026-04-20 reviewed
    Memory quantile models cut cluster under-allocations from 4.17% to 2.89%

    Optimizing Memory Allocation in Distributed Clusters with Predictive Modeling

    Jonathan Bader +7

  6. cs.DC 2026-04-20 reviewed
    Tighter analysis cuts leader election messages to O(n log n)

    Toward Optimality: A Tighter Analysis of Message Complexity for Leader Election in Diameter-Two Networks

    Abhijit Sadhukhan +2

  7. cs.CE 2026-04-20 reviewed
    Fused CUDA kernel speeds 3D SIMP optimization 4.6-7.3x

    Matrix-Free 3D SIMP Topology Optimization with Fused Gather-GEMM-Scatter Kernels

    Shaoliang Yang +2

  8. cs.DC 2026-04-20 reviewed
    One frozen LLM runs many tasks with 4-6x better speed and memory on phones

    Unlocking the Edge deployment and ondevice acceleration of multi-LoRA enabled one-for-all foundational LLM

    Sravanth Kodavanti +15

  9. cs.DC 2026-04-20 reviewed
    Persistent GPU kernel yields 15x speedup for tiny tensor operations

    GPUOS: A GPU Operating System Primitive for Transparent Operation Fusion

    Yiwei Yang +5

  10. cs.DC 2026-04-20 reviewed
    Async GPU kernels speed up sparse matrix multiplies by up to 6x

    AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures

    Jie Liu +2

  11. cs.CL 2026-04-20 reviewed
    DeInfer speeds parallel inference of decomposed LLMs

    DeInfer: Efficient Parallel Inferencing for Decomposed Large Language Models

    You-Liang Huang +3

  12. cs.DC 2026-04-19 reviewed
    EcoSched cuts multi-GPU energy use by up to 14.8% via per-job GPU counts

    Towards Energy Efficient Co-Scheduling in HPC

    Zhong Zheng +2

  13. cs.DC 2026-04-19 reviewed
    EcoShift gains 6% performance in power-limited CPU-GPU clusters

    EcoShift: Performance-Aware Power Management for Power-Constrained Heterogeneous Systems

    Zhong Zheng +2

  14. cs.LG 2026-04-19 reviewed
    Crash-aware tuner spends fixed budget more consistently on LLM serving

    SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving

    Christian Lysenstøen

  15. cs.AR 2026-04-19 reviewed
    Multi-tier KV cache cuts LLM inference costs by 47%

    Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference

    Sanjeev Rao Ganjihal

  16. cs.DC 2026-04-19 reviewed
    Compiler IR enables hardware-free design exploration for distributed ML

    Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML

    Jinsun Yoo +5

  17. cs.DC 2026-04-19 reviewed
    Active inference learns edge AI routing without offline training

    Active Inference-Based Adaptive Routing for Heterogeneous Edge AI Services

    Zihang Wang +2

  18. cs.AI 2026-04-19 reviewed
    Hive reuses logits to speed up multi-agent LLM re-sampling 1.11x-1.76x

    Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling

    Zizhang Luo +5

  19. cs.DC 2026-04-19 reviewed
    Cloud-native systems required to scale large language models

    Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models – A Research Agenda

    Minxian Xu +18

  20. cs.DC 2026-04-19 reviewed
    Lossless compression speeds GPU communication up to 47%

    UCCL-Zip: Lossless Compression Supercharged GPU Communication

    Shuang Ma +10

  21. cs.DC 2026-04-18 reviewed
    Proxy borrows OS scheduling to stop LLM agents from crashing APIs

    HiveMind: OS-Inspired Scheduling for Concurrent LLM Agent Workloads

    Justice Owusu Agyemang +5

  22. cs.DC 2026-04-18 reviewed
    Tensor fingerprinting cuts AI model hub storage

    TStore: Rethinking AI Model Hub with Tensor-Centric Compression

    Tingfeng Lan +5

  23. cs.DC 2026-04-18 reviewed
    TensorHub cuts AI model storage via tensor deduplication

    TStore: Rethinking AI Model Hub with Tensor-Centric Compression

    Tingfeng Lan +5

  24. cs.DC 2026-04-18 reviewed
    Standard Podman with added layers matches specialized HPC containers

    Sarus Suite: Cloud-native Containers for HPC

    Alberto Madonna +5

  25. cs.DC 2026-04-18 reviewed
    Pipeline predicts airspace sectors and lets aircraft coordinate entries

    Predictive Sectorization and Bayesian Optimized Consensus for Admission Control in Autonomous Airspace Operations

    Aditya Dhodapkar +4

  26. cs.AI 2026-04-18 reviewed
    Quick intuition tops slow reasoning for edge AI in DAOs

    The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus

    Syed Muhammad Aqdas Rizvi

  27. cs.DC 2026-04-18 reviewed
    Three axioms force AMM orbits to weighted geometric means

    From Swap Axioms to Weighted Geometric Means: A Characterization of AMMs

    Björn Assmann +1

  28. cs.DC 2026-04-18 reviewed
    Hierarchical sparsity speeds LLM attention 4.57 times

    HieraSparse: Hierarchical Semi-Structured Sparse KV Attention

    Haoxuan Wang +1

  29. cs.DB 2026-04-17 reviewed
    Flipped indexing delivers 6.5x lower GPU query latency with dynamic updates

    FliX: Flipped-Indexing for Scalable GPU Queries and Updates

    Rosina Kharal +3

  30. cs.DC 2026-04-17 reviewed
    Adaptive framework trains graph transformers 6x faster on 8 GPUs

    Scalable and Adaptive Parallel Training of Graph Transformer on Large Graphs

    Jun-Liang Lin +2

  31. cs.DC 2026-04-17 reviewed
    Agent context tracking cuts power use 27% in AI serving

    KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving

    Yichao Yuan +2

  32. quant-ph 2026-04-17 reviewed
    GreenPeas is a C++/CUDA tool that compiles quantum error-correction decoding hypergraphs…

    GreenPeas: Unlocking Adaptive Quantum Error Correction with Just-in-Time Decoding Hypergraphs

    Abbas B. Ziad +2

  33. cs.LG 2026-04-17 reviewed
    Precision modeling cuts training time prediction error to 9.8 percent

    Training Time Prediction for Mixed Precision-based Distributed Training

    Minchul Kang +7

  34. cs.DC 2026-04-17 reviewed
    Any amoebot shape breaks into O(holes) convex pieces in log time

    Logarithmic-Time Geodesically Convex Decomposition in Programmable Matter

    Henning Hillebrandt +4

  35. cs.DC 2026-04-17 reviewed
    Compositional operators let verified swarms be reused safely

    Compositional Design, Implementation, and Verification of Swarms (Technical Report)

    Florian Furbach +5

  36. cs.DC 2026-04-17 reviewed
    Availability weighting fixes unfair sampling in federated learning

    Robust Synchronisation for Federated Learning in The Face of Correlated Device Failure

    Stefan Behfar +1

  37. cs.DC 2026-04-17 reviewed
    Dynamic grouping plus TEE cuts blockchain consensus messages

    T-RBFT: A Scalable and Efficient Byzantine Consensus Based on Trusted Execution Environment for Consortium Blockchain

    Wen Gao +2

  38. cs.DC 2026-04-17 reviewed
    SYCL implementations vary in memory and kernel behavior

    Evaluating SYCL as a Unified Programming Model for Heterogeneous Systems

    Ami Marowka

  39. cs.DC 2026-04-17 reviewed
    Automated pipeline adds continuous benchmarking to HPC

    Continuous benchmarking: Keeping pace with an evolving ecosystem of models and technologies

    Jan Vogelsang +9

  40. cs.DC 2026-04-17 reviewed
    Second-gen serverless drops warm latency from 40 ms to 10 ms

    New Kids: An Architecture and Performance Investigation of Second-Generation Serverless Platforms

    Trever Schirmer +6

  41. cs.DC 2026-04-17 reviewed
    Exascale system trains billion-parameter interatomic potentials in hours

    Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials

    Yuanchang Zhou +14

  42. cs.DC 2026-04-17 reviewed
    On-orbit aggregation reduces satellite federated learning energy by 6x

    CroSatFL: Energy-Efficient Federated Learning with Cross-Aggregation for Satellite Edge Computing

    Nan Yang +4

  43. cs.DC 2026-04-17 reviewed
    GPU framework speeds NNQS configuration selection 2.32x on 64 GPUs

    A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States

    Daran Sun +15

  44. cs.CR 2026-04-17 reviewed
    Sequential memory proof caps ASIC speed at DRAM latency

    PoSME: Proof of Sequential Memory Execution via Latency-Bound Pointer Chasing with Causal Hash Binding

    David L. Condrey

  45. cs.DC 2026-04-17 reviewed
    Accuracy drives speed in long-context LLM serving

    Accuracy Is Speed: Towards Long-Context-Aware Routing for Distributed LLM Serving

    Takeshi Yoshimura +2

  46. cs.DC 2026-04-17 reviewed
    RAFT cluster inside blockchain nodes boosts scale and uptime

    BlockRaFT: A Distributed Framework for Fault-Tolerant and Scalable Blockchain Nodes

    Manaswini Piduguralla +3

  47. cs.DC 2026-04-17 reviewed
    DataCenterGym is a physics-grounded simulator for multi-objective data center scheduling

    DataCenterGym: A Physics-Grounded Simulator for Multi-Objective Data Center Scheduling

    Nilavra Pathak +2

  48. cs.LG 2026-04-16 reviewed
    Mixing matrix design speeds SGP convergence in broadcast DFL

    Optimizing Stochastic Gradient Push under Broadcast Communications

    Tuan Nguyen +1

  49. cs.DC 2026-04-16 reviewed
    Wave dispatch lets HPC treat quantum fragments as tasks

    Wave-Based Dispatch for Circuit Cutting in Hybrid HPC–Quantum Systems

    Ricard S. García-Raigada +2

  50. cs.DC 2026-04-16 reviewed
    Stable per-LLM time shares enable efficient GPU allocation for agentic workflows

    Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

    Marcel Wagenländer +8