pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

1164 papers in cs.DC · page 5

  1. cs.IT 2026-05-10 reviewed
    Algorithm learns adversary utilities for sublinear regret in coding game

    Learning from Acceptance: Cumulative Regret in the Game of Coding

    Hanzaleh Akbari Nodehi +2

  2. cs.AR 2026-05-10 reviewed
    KV-cache movement regularization cuts static-graph LLM latency spikes

    KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving

    Zhiqing Zhong +5

  3. cs.LG 2026-05-10 reviewed
    Held-out gates catch regressions in LLM Metal kernel search

    Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon

    Víctor Gallego

  4. cs.DS 2026-05-10 reviewed
    Small subsets approximate the global ranking median

    A Scalable and Unified Framework to Weighted Rank Aggregation

    Amir Carmel +2

  5. cs.DC 2026-05-10 reviewed
    Adaptive DNN splits cut energy by 27-36% on real edge-cloud hardware

    Adaptive DNN Partitioning and Offloading in Heterogeneous Edge-Cloud Continuum

    Akuen Akoi Deng +3

  6. cs.PL 2026-05-10 reviewed
    CaMPL type system blocks deadlocks in concurrent code

    Categorical Message Passing Language (CaMPL) for programmers

    Daniel Kiyoshi Hashimoto +2

  7. cs.DC 2026-05-10 reviewed
    Air quality sensors detect cooking at 99.68 percent accuracy on-device

    PoHAR: Understanding Hyperlocal Human Activities with Pollution Sensor Networks

    Prasenjit Karmakar +2

  8. cs.DC 2026-05-10 reviewed
    ATLAS cuts GNN inference time 12-30x for billion-edge graphs

    ATLAS: Efficient Out-of-Core Inference for Billion-Scale Graph Neural Networks

    Pranjal Naman +1

  9. cs.DC 2026-05-10 reviewed
    Multi-metric detection catches all GPU failures in 504-GPU LLM run

    From Detection to Recovery: Operational Analysis on LLM Pre-training with 504 GPUs

    Daemyung Kang +13

  10. cs.DC 2026-05-10 reviewed
    Kernel-level splits let networked MCUs run large CNNs

    Split CNN Inference on Networked Microcontrollers

    Junyu Lu +4

  11. cs.LG 2026-05-10 reviewed
    DisagMoE overlaps MoE layers for 1.8x training speedup

    DisagMoE: Computation-Communication overlapped MoE Training via Disaggregated AF-Pipe Parallelism

    Zhichen Zeng +12

  12. cs.CR 2026-05-10 reviewed
    Split TCB design routes encrypted packets at native speed

    Enforcing Attestable Workflows across Untrusted Networks

    Hung Dang +1

  13. cs.DC 2026-05-09 reviewed
    Consistency models collapse into three entangled constraints

    Light Cone Consistency: Closure, Ordering, and the Single-Observer Boundary

    Rob Landers +1

  14. cs.DC 2026-05-09 reviewed
    System achieves up to 7.57x faster dynamic multimodal LLM training

    MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production

    Chunyu Xue +16

  15. math.OC 2026-05-09 reviewed
    Variance reduction shortens time complexity in parallel optimization

    Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

    Zhirayr Tovmasyan +2

  16. cs.LG 2026-05-09 reviewed
    VAEs recover mixture proportions for personalized federated learning

    FedGMI: Generative Model-Driven Federated Learning for Probabilistic Mixture Inference

    Qijun Hou +3

  17. cs.DC 2026-05-09 reviewed
    Basic Verkle trees cost more than Merkle trees

    TS-Verkle: A TypeScript Native Verkle Library With On-chain Verifier

    Zhikai Li +4

  18. cs.LG 2026-05-09 reviewed
    Agent framework cuts data leaks 2-6x while raising accuracy 15-36%

    PAAC: Privacy-Aware Agentic Device-Cloud Collaboration

    Liangqi Yuan +3

  19. cs.DC 2026-05-09 reviewed
    Generative model compresses Earth data by up to 10,000 times

    Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction

    Jinxiao Zhang +16

  20. eess.SP 2026-05-09 reviewed
    LLMs collaborate across devices and cloud to meet resource limits

    Large Language Models over Networks: Collaborative Intelligence under Resource Constraints

    Liangqi Yuan +4

  21. cs.DC 2026-05-08 reviewed
    Same code runs in abstract rounds and real sockets for distributed algorithms

    QUANTAS 2 An Abstract, Concrete and Byzantine Simulator

    Mikhail Nesterenko +1

  22. cs.DC 2026-05-08 reviewed
    Concurrent RL fine-tunes match single-task quality at 4.3x efficiency

    MARLaaS: Multi-Tenant Asynchronous Reinforcement Learning as a Service

    Timothy Tin Long Yu +5

  23. cs.DC 2026-05-08 reviewed
    Block-level sharding scales context parallelism to 256 GPUs

    Unleashing Scalable Context Parallelism for Foundation Models Pre-Training via FCP

    Yilong Zhao +8

  24. cs.LG 2026-05-08 reviewed
    Asynchronous stages raise agent evolution throughput 3.5x

    FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration

    Zhengding Hu +10

  25. cs.LG 2026-05-08 reviewed
    Hybrid head speeds secure time-series inference up to 44x

    Private Vertical Federated Inference for Time-Series

    Lucas Fenaux +5

  26. cs.DC 2026-05-08 reviewed
    LLM profiler reuses work across models to cut GPU hours 56%

    Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

    Joon Ha Kim +3

  27. cs.DC 2026-05-08 reviewed
    Dooly reuses LLM op profiles across configs to cut costs 56%

    Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

    Joon Ha Kim +3

  28. cs.LG 2026-05-08 reviewed
    FLAM computes exact global performance in federated learning locally

    FLAM: Evaluating Model Performance with Aggregatable Measures in Federated Learning

    Fabian Stricker +3

  29. cs.DC 2026-05-08 reviewed
    Stencil kernels run up to 342x faster on wafer-scale engine

    Stencil Computations on Cerebras Wafer-Scale Engine

    Elia Belli +1

  30. cs.LG 2026-05-08 reviewed
    Adaptive tuning keeps decentralized SGD converging under adversary majority

    VISTA: Decentralized Machine Learning in Adversary Dominated Environments

    Hanzaleh Akbari Nodehi +3

  31. cs.AR 2026-05-08 reviewed
    Model runs 1024-core chip sims 115x faster at under 7% error

    Accelerating Precise End-to-End Simulation: Latency-Sensitive Many-core System Modeling

    Yinrong Li +7

  32. cs.DC 2026-05-08 reviewed
    175B models trained at 10% peak FLOPs with standard parallelism

    A Scalable Recipe on SuperMUC-NG Phase 2: Efficient Large-Scale Training of Language Models

    Ajay Navilarekal Rajgopal +1

  33. cs.DC 2026-05-08 reviewed
    Wormhole stencil kernels match CPU speed but lose to transfers

    Stencil Computations on Tenstorrent Wormhole

    Lorenzo Piarulli +1

  34. cs.DC 2026-05-08 reviewed
    HexiSeq trains long-context LLMs 1.36x faster on mixed GPU clusters

    HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware

    Yan Liang +5

  35. cs.DC 2026-05-08 reviewed
    Hierarchical agents lift AI-RAN SLO fulfillment to 90%

    Deadline-Driven Hierarchical Agentic Resource Sharing for AI Services and RAN Functions in AI-RAN

    Haiyuan Li +2

  36. cs.DC 2026-05-08 reviewed
    RcLLM cuts TTFT 1.31x-9.51x for generative recommendation

    RcLLM: Accelerating Generative Recommendation via Beyond-Prefix KV Caching

    Zhan Zhao +2

  37. cs.LG 2026-05-08 reviewed
    Shared spectral operator aligns mismatched sensors in federated learning

    UMEDA: Unified Multi-modal Efficient Data Fusion for Privacy-Preserving Graph Federated Learning via Spectral-Gated Attention and Diffusion-Based Operator Alignment

    Shih-Yu Lai +4

  38. cs.DC 2026-05-08 reviewed
    MERBIT speeds irregular SpMV 27 percent on GPUs

    MERBIT: A GPU-Based SpMV Method for Iterative Workloads

    Qi Zhang +3

  39. cs.LG 2026-05-08 reviewed
    RL weight sync uses 100 times less data with full fidelity

    SparseRL-Sync: Lossless Weight Synchronization with ~100x Less Communication

    Lucas Hu +6

  40. cs.AR 2026-05-08 reviewed
    TREA accelerator reduces edge detection latency up to 9x

    TREA: Low-precision Time-Multiplexed, Resource-Efficient Edge Accelerator for Object Detection and Classification

    Vijay Pratap Sharma +4

  41. eess.SP 2026-05-08 reviewed
    Energy subtraction on paired elements recovers signed OTA aggregates

    Resource-Element Energy Difference for Noncoherent Over-the-Air Federated Learning

    Hao Chen +1

  42. eess.SP 2026-05-08 reviewed
    Energy difference on two resources replaces CSI for wireless federated learning

    Resource-Element Energy Difference for Noncoherent Over-the-Air Federated Learning

    Hao Chen +1

  43. cs.DC 2026-05-08 reviewed
    Future-state scheduler cuts LLM workflow makespan by 32 percent

    FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows

    Zirui Huang +3

  44. cs.SE 2026-05-08 reviewed
    AI backends gain one admission seam for governance across requests

    Execution Envelopes: A Shared Admission Contract for Backend AI Execution Requests

    Krti Tallam

  45. cs.DC 2026-05-07 reviewed
    Hardware usage metrics match Kripke kernel to RAJA proxy

    On Similarity of Computational Kernels in our Codes and Proxies

    Michael McKinsey +2

  46. cs.DC 2026-05-07 reviewed
    Per-step slack regulator raises LLM goodput 1.77x

    Regulating Branch Parallelism in LLM Serving

    Swapnil Gandhi +3

  47. cs.LG 2026-05-07 reviewed
    IoT security model gains 30% detection boost with mostly unlabeled data

    CLAD: A Clustered Label-Agnostic Federated Learning Framework for Joint Anomaly Detection and Attack Classification

    Iason Ofeidis +4

  48. cs.DC 2026-05-07 reviewed
    Traces reveal LLM setups 3x slower on identical hardware

    CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure

    Eric Ding +10

  49. cs.DC 2026-05-07 reviewed
    Sharing serving GPUs boosts agentic RL throughput 1.3-3.3x

    ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL

    Wei Gao +15

  50. cs.DC 2026-05-07 reviewed
    Serving GPUs accelerate agentic RL rollouts up to 3.3x

    ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL

    Wei Gao +15