archive

Every paper Pith has read. Search by title, abstract, or pith.

1164 papers in cs.DC · page 9

  1. cs.DC 2026-04-29 reviewed
    SplitFT speeds LLM fine-tuning with adaptive client cut layers

    SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning

    Yimeng Shan +5

  2. cs.DC 2026-04-29 reviewed
    Pipelined sharding speeds client xLM inference up to 30x with 10x less VRAM

    Efficient, VRAM-Constrained xLM Inference on Clients

    Aditya Ukarande +3

  3. cs.CL 2026-04-29 reviewed
    Folding parallelism cuts memory for long-context transformers

    Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference

    Vasu Shyam +2

  4. cs.LG 2026-04-29 reviewed
    Multi-version rollout lifts LLM RL throughput 2-3x while keeping convergence

    DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training

    Tianhao Hu +17

  5. cs.AR 2026-04-28 reviewed
    Memory-centric chiplets cut attention latency 15x

    AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving

    Zhongkai Yu +11

  6. cs.DC 2026-04-28 reviewed
    Direct remote access beats prefetching for LLM GPU offloading

    DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference

    Shouxu Lin +2

  7. cs.LG 2026-04-28 reviewed
    Wave cost model picks MoE kernels with 0.93% regret

    RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts

    Vyom Sharma +1

  8. cs.MA 2026-04-28 reviewed
    Workflow structure lets Pythia speed up multi-agent LLM serving

    Pythia: Exploiting Workflow Predictability for Efficient Agent-Native LLM Serving

    Shan Yu +16

  9. cs.MA 2026-04-28 reviewed
    Simple interface lifts multi-agent LLM serving throughput

    Pythia: Exploiting Workflow Predictability for Efficient Agent-Native LLM Serving

    Shan Yu +16

  10. eess.SP 2026-04-28 reviewed
    Speculative decoding cuts federated LLM communication

    SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission

    Ce Zheng +5

  11. cs.DS 2026-04-28 reviewed
    Exclusive scans finish in log p rounds with bounded operator uses

    Two Efficient Message-passing Exclusive Scan Algorithms

    Jesper Larsson Träff

  12. cs.DC 2026-04-28 reviewed
    Hierarchical FL setups lower energy for plant disease classification

    Performance and Energy Trade-Off Analysis of Hierarchical Federated Learning for Plant Disease Classification

    Athanasios Papanikolaou +8

  13. cs.DC 2026-04-28 reviewed
    Volitional states guard atomic machine actions in people-machine systems

    Volitional Multiagent Atomic Transactions: Describing People and their Machines

    Andy Lewis-Pye +1

  14. cs.DC 2026-04-28 reviewed
    Computing clusters cut emissions by timing jobs to renewable surplus

    Economical and ecological impact of sector coupling applied to computing clusters

    P. Bechtle +9

  15. cs.DC 2026-04-28 reviewed
    Warp-tiled kernels speed up depthwise convolution 3.26x

    CUDA Kernel Optimization and Counter-Free Performance Analysis for Depthwise Convolution in Cloud Environments

    Huriyeh Babak +1

  16. cs.DC 2026-04-28 reviewed
    Microservice systems often model only partial production dynamics

    Adaptive Management of Microservices in Dynamic Computing Environments: A Taxonomy and Future Directions

    Ming Chen +3

  17. cs.DC 2026-04-28 reviewed
    3D parallelism cuts first-token time in LLM serving by 10-62%

    CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration

    Sean Nian +4

  18. cs.DC 2026-04-27 reviewed
    Fixed-input lock keeps Spark policy outputs identical under repartitioning

    Spark Policy Toolkit: Semantic Contracts and Scalable Execution for Policy Learning in Spark

    Zeyu Bai

  19. cs.ET 2026-04-27 reviewed
    IoE unifies people, data, and things for 6G automation

    Internet of Everything in the 6G Era: Paradigms, Enablers, Potentials and Future Directions

    Driss Choukri +3

  20. cs.ET 2026-04-27 reviewed
    Repository blockchain turns fork chains into trees for single-process access

    A Tree-Based Repository Blockchain Framework for Shared Governance in Collaborative Fork Ecosystems

    Razwan Ahmed Tanvir +1

  21. cs.LG 2026-04-27 reviewed
    One shared KV cache serves 15 agents at 97.7% less memory

    PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference

    Ishan Patel +1

  22. cs.CR 2026-04-27 reviewed
    Merkle trees allow 2-3x larger post-quantum cert chains

    Network Impact of Post-Quantum Certificate Chain sizes on Time to First Byte in TLS Deployments

    Matthew Chou +1

  23. cs.DC 2026-04-27 reviewed
    SpotVista picks multi-node spots with 81% higher availability

    SpotVista: Availability-Aware Recommendation System for Reliable and Cost-Efficient Multi-Node Spot Instances

    Taeyoon Kim +6

  24. cs.CR 2026-04-27 reviewed
    Split learning lets clients fine-tune LLMs without sharing data

    A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations

    Zihan Liu +4

  25. cs.DC 2026-04-27 reviewed
    Incisor is a cloud system that pairs program analysis tools with large language models to…

    Incisor: Ex Ante Cloud Instance Selection for HPC Jobs

    Michael A. Laurenzano +2

  26. cs.DC 2026-04-27 reviewed
    Exact scheduler improves IoT latency

    Exact, Efficient, and Reliable Multi-Objective and Multi-Constrained IoT Workflow Scheduling in Edge-Hub-Cloud Cyber-Physical Systems

    Andreas Kouloumpris +3

  27. cs.MA 2026-04-27 reviewed
    Multi-agent LLM tutor runs full semester without boundary failures

    ITAS: A Multi-Agent Architecture for LLM-Based Intelligent Tutoring

    Iizalaarab Elhaimeur +1

  28. cs.CY 2026-04-27 reviewed
    Priority PayGo holds tutoring under 4s at 50 users

    Latency and Cost of Multi-Agent Intelligent Tutoring at Scale

    Iizalaarab Elhaimeur +1

  29. cs.DC 2026-04-27 reviewed
    Atomistic model reaches year-and-meter scales for RPV steel

    Unfolding an Atomistic World: Atomistic Simulation of Reactor Pressure Vessel Steel Across Year-and-Meter Scales

    Haozhi Han +10

  30. cs.DC 2026-04-27 reviewed
    AtomWorld simulates RPV steel atom by atom at meter and year scales

    Unfolding an Atomistic World: Atomistic Simulation of Reactor Pressure Vessel Steel Across Year-and-Meter Scales

    Haozhi Han +10

  31. cs.DC 2026-04-27 reviewed
    TACO cuts tensor-parallel communication to raise LLM training speed 1.87x

    TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training

    Man Liu +10

  32. cs.LG 2026-04-27 reviewed
    FreeScale cuts bubbles by 90 percent in recommendation model training

    FreeScale: Distributed Training for Sequence Recommendation Models with Minimal Scaling Cost

    Chenhao Feng +19

  33. cs.DC 2026-04-27 reviewed
    Kubernetes spot system gains 55% more performance per dollar

    KubePACS: Kubernetes Cluster Using Performant, Highly Available, and Cost Efficient Spot Instances

    Taeyoon Kim +4

  34. cs.LG 2026-04-27 reviewed
    CommFuse removes tail latency from LLM training overlaps

    CommFuse: Hiding Tail Latency via Communication Decomposition and Fusion for Distributed LLM Training

    Rezaul Karim +5

  35. cs.DC 2026-04-27 reviewed
    Distributed solver speeds large IPMs up to 97 times over single-node codes

    SDSL-Solver: Scalable Distributed Sparse Linear Solvers for Large-Scale Interior Point Methods

    Shaofeng Yang +5

  36. cs.DC 2026-04-26 reviewed
    Invariants proven for local-first access control data type

    Towards System-Oriented Formal Verification of Local-First Access Control

    Florian Jacob +2

  37. cs.DC 2026-04-26 reviewed
    Full-block fusion raises Pythia decoding speed 1.34x

    ClusterFusion++: Expanding Cluster-Level Fusion to Full Transformer-Block Decoding

    ChiHeng Jin +2

  38. cs.DC 2026-04-25 reviewed
    Isolated tracks let federated learning respect client exclusions

    A Taxonomy and Resolution Strategy for Client-Level Disagreements in Federated Learning

    Daan Rosendal +1

  39. cs.DC 2026-04-25 reviewed
    Genetic algorithm lifts blockchain validator profits by 15%

    The Blockchain Execution Dilemma: Optimizing Revenue XOR Fair Ordering

    Artjom Pugatsov +2

  40. cs.DC 2026-04-25 reviewed
    RL policy adapts caches to save 43% energy in GNN training

    GreenDyGNN: Runtime-Adaptive Energy-Efficient Communication for Distributed GNN Training

    Arefin Niam +2

  41. cs.MA 2026-04-25 reviewed
    Structured overlays beat gossip for AI agent discovery under node churn

    Usable Agent Discovery for Decentralized AI Systems

    Patrizio Dazzi +3

  42. cs.DC 2026-04-24 reviewed
    Survey maps path for large language model inference on edge networks

    Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities

    Zhixiong Chen +5

  43. cs.DC 2026-04-24 reviewed
    Peer-to-peer grids obey transport lower bounds and monoid reduction rules

    Mathematical Foundations for Peer-to-Peer Lattice Computation

    Danil Gorinevski (cybiont GmbH) +2

  44. cs.AR 2026-04-24 reviewed
    Accelerators improve LLM speed on edge single-board computers

    Cloud to Edge: Benchmarking LLM Inference On Hardware-Accelerated Single-Board Computers

    Harri Renney +3

  45. cs.LG 2026-04-24 reviewed
    Gradient entropy ranks client contributions without validation data

    Data-Free Contribution Estimation in Federated Learning using Gradient von Neumann Entropy

    Asim Ukaye +3

  46. cs.DC 2026-04-24 reviewed
    Continuous bids cut cloud contention losses by 8-23%

    LaissezCloud: Continuous Resource Renegotiation for the Public Cloud

    Tejas Harith +1

  47. cs.DC 2026-04-24 reviewed
    MPS gains or loses 30% in GPU sharing depending on memory contention

    A comprehensive evaluation of spatial co-execution on GPUs using MPS and MIG technologies

    Jorge Villarrubia +3

  48. cs.DC 2026-04-24 reviewed
    Top-K method speeds sparse decode 1.88x on Blackwell

    Guess-Verify-Refine: Data-Aware Top-K for Sparse-Attention Decoding on Blackwell via Temporal Correlation

    Long Cheng +9

  49. cs.DC 2026-04-24 reviewed
    Multi-path GPU links with CUDA Graphs boost bandwidth 2.95x

    Accelerating Intra-Node GPU-to-GPU Communication Through Multi-Path Transfers with CUDA Graphs

    Amirhossein Sojoodi +4

  50. cs.DC 2026-04-24 reviewed
    Algorithm achieves 8K-approximation for coflow scheduling in K-core OCS networks

    O(K)-Approximation Coflow Scheduling in K-Core Optical Circuit Switching Networks

    Xin Wang +3