archive

Every paper Pith has read. Search by title, abstract, or pith.

1164 papers in cs.DC · page 7

  1. cs.DC 2026-05-05 reviewed
    Tensor lifting maps OpenMP loops to AI Engines

    Lifting to tensors when compiling scientific computing workloads for AI Engines

    Nick Brown +1

  2. cs.DC 2026-05-05 reviewed
    GPU layer speeds exascale trace analysis by up to 314x

    Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics

    Dragana Grbic +1

  3. cs.DC 2026-05-05 reviewed
    GPU speeds exascale trace analysis by 314 times

    Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics

    Dragana Grbic +1

  4. cs.DC 2026-05-05 reviewed
    MPC with limited machines needs higher local exponents for superlinear tasks

    On Solving Problems of Substantially Super-linear Complexity in $N^{o(1)}$ Rounds in the MPC Model

    Andrzej Lingas

  5. cs.DC 2026-05-04 reviewed
    Decoupled virtual cores lift LLM GPU throughput 24% on average

    VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU

    Zijian He +3

  6. cs.PL 2026-05-04 reviewed
    Pact maps choreographic protocols to formal games

    Pact: A Choreographic Language for Agentic Ecosystems

    Kiran Gopinathan +4

  7. cs.DC 2026-05-04 reviewed
    AI Data Centers Break Grid Load Diversity

    From Barrier to Bridge: The Case for AI Data Center/Power Grid Co-Design

    Noman Bashir +3

  8. cs.LG 2026-05-04 reviewed
    Draft signals let SpecKV adapt gamma for 56% faster speculative decoding

    SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

    Shikhar Shukla

  9. cs.DC 2026-05-04 reviewed
    Workflow templates speed sensor app prototyping for non-experts

    From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications

    Komal Thareja +2

  10. cs.DC 2026-05-04 reviewed
    AI reuses sensor workflow template to cut dev time to 1-2 days

    (POSTER) From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications

    Komal Thareja +2

  11. cs.DC 2026-05-04 reviewed
    Parallel HSOM cuts training time for intrusion detection

    parHSOM: A novel parallel Hierarchical Self-Organizing Map implementation

    Rebekah Lane +5

  12. cs.DC 2026-05-04 reviewed
    Optimal configuration found for N-body sims on RISC-V accelerators

    Assessing Performance and Porting Strategies for Gravitational $N$-Body Simulations on the RISC-V-Based Tenstorrent Wormhole™

    Jenny Lynn Almerol +3

  13. quant-ph 2026-05-04 reviewed
    Global optimization cuts distributed quantum costs most

    Distributed Quantum Circuit Optimisation: Evaluating Global and Local encodings

    Maria Gragera Garces +1

  14. quant-ph 2026-05-04 reviewed
    Global optimization minimizes distributed quantum circuit costs

    Distributed Quantum Circuit Optimisation: Evaluating Global and Local encodings

    Maria Gragera Garces +1

  15. cs.DC 2026-05-04 reviewed
    Bayesian optimization lifts Fabric TPS by 12%

    Caliper-in-the-Loop: Black-Box Optimization for Hyperledger Fabric Performance Tuning

    Yash Madhwal +7

  16. cs.LG 2026-05-04 reviewed
    Sign-Muon reaches O(1/sqrt(T)) rate with 32x bandwidth cut

    SignMuon: Communication-Efficient Distributed Muon Optimization

    Neel Mishra +2

  17. cs.DC 2026-05-04 reviewed
    Partial layer training matches full federated accuracy with 82 percent fewer parameters

    FedPLT: Scalable, Resource-Efficient, and Heterogeneity-Aware Federated Learning via Partial Layer Training

    Ahmad Dabaja +1

  18. cs.DC 2026-05-04 reviewed
    Kairos raises LLM SLO attainment by up to 34%

    Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference

    Qipeng Wang +1

  19. eess.SY 2026-05-04 reviewed
    Each CAV spots sensor faults using distributed observers

    Distributed Observer-based Fault Detection over Intelligent Networked Multi-Vehicle Systems

    Mohammadreza Doostmohammadian +1

  20. cs.DC 2026-05-04 reviewed
    Raspberry Pi clusters teach undergrads practical supercomputing

    Leveraging Teaching on Demand: Approaching HPC to Undergrads

    S. Catalán +2

  21. cs.DC 2026-05-04 reviewed
    ZKP wrapper secures federated learning at 94 percent accuracy under attack

    Privacy-Preserving Federated Learning: Integrating Zero-Knowledge Proofs in Scalable Distributed Architectures

    Divya Gupta

  22. cs.DC 2026-05-04 reviewed
    IO500 logs reveal storage patterns missed by scores

    A Treasure Trove of Performance: Analyzing the IO500 Submission Data

    Julian Kunkel +4

  23. cs.DC 2026-05-04 reviewed
    Pipeline offloading lifts offline LLM throughput up to 2.51x

    PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers

    Hongbin Zhang +5

  24. cs.CV 2026-05-04 reviewed
    One-shot diffusion and model fusion reach 33.4% mAP for private surveillance

    Heterogeneous Model Fusion for Privacy-Aware Multi-Camera Surveillance via Synthetic Domain Adaptation

    Peggy Joy Lu +3

  25. cs.CV 2026-05-04 reviewed
    Privacy-preserving detection hits 33.4% mAP across cameras

    Heterogeneous Model Fusion for Privacy-Aware Multi-Camera Surveillance via Synthetic Domain Adaptation

    Peggy Joy Lu +3

  26. cs.DC 2026-05-04 reviewed
    AAFLOW speeds agentic AI pipelines 4.64x via zero-copy data flows

    AAFLOW: Scalable Patterns for Agentic AI Workflows

    Arup Kumar Sarker +5

  27. cs.DC 2026-05-04 reviewed
    Smaller idle models speed large LLM serving by more than double

    SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM Inference

    Jincheng Xie +6

  28. cs.DC 2026-05-04 reviewed
    Tail models accelerate large LLM inference by 2.28x as remote drafters

    SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM Inference

    Jincheng Xie +6

  29. cs.DC 2026-05-04 reviewed
    Queue predictions speed federated learning by 20 percent on HPC

    FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

    Yijiang Li +5

  30. cs.DC 2026-05-04 reviewed
    Queue predictions stabilize federated learning across HPC sites

    FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

    Yijiang Li +5

  31. quant-ph 2026-05-03 reviewed
    Random circuits distort quantum partitioning benchmarks

    On the Distortion of Partitioning Performance by Random Quantum Circuits

    Maria Gragera Garces

  32. quant-ph 2026-05-03 reviewed
    This paper finds that random quantum circuits used to test hypergraph partitioning for…

    On the Distortion of Partitioning Performance by Random Quantum Circuits

    Maria Gragera Garces

  33. cs.DC 2026-05-03 reviewed
    Data movement and overlap govern energy use in multimodal training

    Cross-Layer Energy Analysis of Multimodal Training on Grace Hopper Superchips

    Mahmoud Ahmed +6

  34. cs.DC 2026-05-03 reviewed
    Decentralized geohash sampling cuts geospatial stream latency

    Decentralized Stratified Sampling for Low-Latency Approximate Geospatial Data Stream Processing in Edge-Cloud Architectures

    Isam Mashhour Al Jawarneh +3

  35. cs.LG 2026-05-03 reviewed
    Sparse value sampling speeds attention 1.5x at long contexts

    Stochastic Sparse Attention for Memory-Bound Inference

    Kyle Lee +7

  36. cs.LG 2026-05-03 reviewed
    Declarative framework cuts RAG tuning code changes by 95%

    AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines

    Xintan Zeng +3

  37. cs.DC 2026-05-03 reviewed
    nvPAX three-phase method reaches 98.92% power satisfaction

    nvPAX: Constrained Optimization for Dynamic Power Allocation in Hierarchical and Multi-Tenant Systems

    Hadar Sivan +2

  38. cs.DC 2026-05-03 reviewed
    Joint time-structure model improves microservice fault detection

    Joint Temporal-Structural Representation Learning for Distributed Fault Discrimination in Microservice Architectures

    Yihan Xue +4

  39. cs.DC 2026-05-03 reviewed
    SplitZip speeds KV cache transfers by 1.32x with lossless GPU coding

    SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving

    Yipin Guo +1

  40. cs.DC 2026-05-03 reviewed
    SplitZip compresses KV caches at 613 GB/s for faster LLM transfers

    SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving

    Yipin Guo +1

  41. cs.DC 2026-05-02 reviewed
    CvxCluster uses a two-stage convex optimization approach to allocate resources across…

    CvxCluster: Solving Large, Complex, Granular Resource Allocation Problems 100-1000x Faster

    Obi Nnorom Jr +2

  42. cs.AR 2026-05-02 reviewed
    FPGA accelerator speeds SVD for PCA 22x over GPU

    MANOJAVAM: A Scalable, Unified FPGA Accelerator for Matrix Multiplication and Singular Value Decomposition in Principal Component Analysis

    Srivaths Ramasubramanian +7

  43. cs.DC 2026-05-02 reviewed
    Turing machine extension defines context-awareness

    On defining and modeling context-awareness

    Panteleimon Rodis

  44. cs.OS 2026-05-02 reviewed
    VUDA delivers 85% higher throughput via CUDA-Vulkan spatial sharing

    VUDA: Breaking CUDA-Vulkan Isolation for Spatial Sharing of Compute and Graphics on the Same GPU

    Bin Xu +4

  45. cs.DC 2026-05-02 reviewed
    Complex analysis cuts cloud VM flapping by 94%

    Intelligent Autonomous Orchestration for Distributed Cloud Resources using Complex-Stability Analysis

    Gopal Krishna Shyam +1

  46. cs.DC 2026-05-02 reviewed
    LLM serving needs math models over generic heuristics

    Position: LLM Serving Needs Mathematical Optimization and Algorithmic Foundations, Not Just Heuristics

    Zijie Zhou

  47. cs.SE 2026-05-01 reviewed
    DDD simulator runs same microservice code under multiple consistency models

    A Domain-Driven Design Simulator for Business Logic-Rich Microservice Systems

    Daniel da Palma Pereira +1

  48. cs.DC 2026-05-01 reviewed
    Interference flips scheduler rankings in 28% of edge cases

    ncsim: A Lightweight Simulator for Networked Edge Computing with Wireless Interference Modeling

    Bhaskar Krishnamachari +2

  49. cs.DC 2026-05-01 reviewed
    FPTC codec reaches 3.6x compression for power signals

    FPTC: A Fast Parallel Transform-based Codec for Efficient Asymmetric Signal Compression

    Ben Mechels +4

  50. cs.DC 2026-05-01 reviewed
    Streaming GPU encoding matches batch speed with 12x less memory

    SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data

    Shashank Kapadia +5