archive

Every paper Pith has read.

225 papers in cs.PF · page 3

  1. cs.DC 2026-04-16 reviewed
    Block placement and cache rules cut LLM serving latency

    Serving Chain-structured Jobs with Large Memory Footprints with Application to Large Foundation Model Serving

    Tingyang Sun +2

  2. cs.PF 2026-04-16 reviewed
    L4 GPU delivers up to 4.4x inference throughput over T4

    DEEP-GAP: Deep-learning Evaluation of Execution Parallelism in GPU Architectural Performance

    Kathiravan Palaniappan

  3. eess.SY 2026-04-15 reviewed
    State-based scheduler maps full polytope of feasible worst-case schedules

    Exploiting Scheduling Flexibility via State-Based Scheduling When Guaranteeing Worst-Case Services

    Yike Xu +1

  4. cs.PL 2026-04-14 reviewed
    Virtual machine speeds array programs 147x on GPUs

    Towards a Linear-Algebraic Hypervisor

    Breandan Considine

  5. cs.NI 2026-04-14 reviewed
    IPFS achieves 70% success on decentralized NAT traversal

    Large-Scale Measurement of NAT Traversal for the Decentralized Web: A Case Study of DCUtR in IPFS

    Dennis Trautwein +4

  6. cs.CR 2026-04-13 reviewed
    Sparse FHE matmul on GPUs runs up to 3x faster than CPU

    GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs

    Lara D'Agata +9

  7. quant-ph 2026-04-13 reviewed
    Transpiler maps OpenQASM 3.0 dynamic circuits to CUDA-Q kernels

    Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDA-Q: Performance and Expressiveness Advantages

    Vinooth Kulkarni +6

  8. cs.PF 2026-04-13 reviewed
    H200 outperforms H100 for memory-bound tasks when power-capped

    Architectural Trade-offs in the Energy-Efficient Era: A Comparative Study of power-capping NVIDIA H100 and H200

    Aditya Ujeniya +3

  9. cs.DC 2026-04-13 reviewed
    Hierarchical search tunes GPU apps better and faster

    Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search

    Daniel Nichols +5

  10. physics.flu-dyn 2026-04-13 reviewed
    Julia model for particle flows hits 18x GPU speedup

    LCS.jl: A High-Performance, Multi-Platform Computational Model in Julia for Turbulent Particle-Laden Flows

    Taketo Tominaga (Institute of Science Tokyo) +1

  11. eess.SY 2026-04-12 reviewed
    AI workload mix smooths power variability but keeps fast ramps

    Workload composition smooths aggregate power demand while sustaining short-horizon ramps in AI data centers

    Subir Majumder +2

  12. physics.app-ph 2026-04-12 reviewed
    Adaptive beta tuning curbs dominance in AI resource allocation

    Computable Fairness: Boltzmann-Softmax Control for AI Resource Allocation

    Ji-Won Park +1

  13. cs.LG 2026-04-12 reviewed
    MoEITS prunes experts in LLMs to reduce compute while preserving accuracy

    MoEITS: A Green AI approach for simplifying MoE-LLMs

    Luis Balderas +2

  14. cs.PF 2026-04-11 reviewed
    Wave-aware model picks near-optimal GPU kernel settings fast

    WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning

    Kaixuan Zhang +8

  15. cs.PF 2026-04-11 reviewed
    Mosaic clusters KVCache for faster streaming video VLMs

    Mosaic: Cross-Modal Clustering for Efficient Video Understanding

    Tuowei Wang +4

  16. cs.DC 2026-04-09 reviewed
    Energy-efficient GPUs deliver better value under budget limits

    Wattlytics: A Web Platform for Co-Optimizing Performance, Energy, and TCO in HPC Clusters

    Ayesha Afzal +2

  17. cs.DC 2026-04-08 reviewed
    CPU-free LLM serving cuts P99 latency up to 8x

    Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC

    Mohammad Siavashi +4

  18. cs.DC 2026-04-08 reviewed
    Client scheduler hits 100% LLM deadlines at 4.2 requests per second

    Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale

    Renzhong Yuan +5

  19. cs.DC 2026-04-07 reviewed
    Go runtime outperforms Python and Node.js for OpenFaaS on Kubernetes

    Optimizing OpenFaaS on Kubernetes: Comparative Analysis of Language Runtimes and Cluster Distributions

    Ehsan Ataie +2

  20. cs.PF 2026-04-07 reviewed
    PTE metric predicts LLM tool-use latency better than token counts

    Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning

    Qisheng Su +5

  21. cs.PL 2026-04-06 reviewed
    AutoLALA produces symbolic reuse-distance formulas for loop nests

    AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels

    Yifan Zhu +3

  22. cs.AI 2026-04-06 reviewed
    Three metrics separate AI adaptation from data shifts

    Learning, Potential, and Retention: An Approach for Evaluating Adaptive AI-Enabled Medical Devices

    Alexis Burgon +4

  23. cs.DC 2026-04-06 reviewed
    Execution-idle wastes 10.7% of GPU cluster energy

    The Energy Cost of Execution-Idle in GPU Clusters

    Yiran Lei +6

  24. cs.DC 2026-04-06 reviewed
    Satellite emulators tested against real data show clear gaps

    An experimental evaluation of satellite constellation emulators

    Victor Cionca +3

  25. eess.SP 2026-04-06 reviewed
    UAV flights fit polynomial and ML models to 5G KPIs

    Modeling and Analysis of Air-to-Ground Cellular KPIs in a 5G Testbed using Android Smartphones

    Simran Singh +7

  26. cs.PF 2026-04-06 reviewed
    Half the DCT coefficients train a transformer to near baseline loss

    Training Transformers in Cosine Coefficient Space

    Mohamed Amine Bergach

  27. cs.AI 2026-04-06 reviewed
    Merging experts beats pruning in MoE LLMs

    REAM: Merging Improves Pruning of Experts in LLMs

    Saurav Jha +5

  28. cs.CR 2026-04-05 reviewed
    Container testbed automates reproducible cybersecurity datasets

    NetSecBed: A Container-Native Testbed for Reproducible Cybersecurity Experimentation

    Leonardo Bitzki +6

  29. cs.PF 2026-04-03 reviewed
    Bridges link blockchains but usage lags behind

    The Price of Interoperability: Exploring Cross-Chain Bridges and Their Economic Consequences

    Yiyue Cao +4

  30. cs.LG 2026-04-02 reviewed
    Shared memory speeds NF4 dequantization 2x

    Fast NF4 Dequantization Kernels for Large Language Model Inference

    Xiangbo Qi +2

  31. cs.DC 2026-03-31 reviewed
    Multi-agent LLM workflow maps service text to KVI intervals

    KPI2KVI: A Multi Agent Workflow for Calculating Key Value Indicators from Service Descriptions

    Masoud Shokrnezhad +3

  32. cs.DC 2026-03-26 reviewed
    Erasure coding reduces LLM checkpoint latency 2.7x

    GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving

    Shakya Jayakody +3

  33. physics.plasm-ph 2026-03-25 reviewed
    Hybrid MPI+OpenMP scales PIC Monte Carlo to 16,000 GPUs

    Multi-GPU Hybrid Particle-in-Cell Monte Carlo Simulations for Exascale Computing Systems

    Jeremy J. Williams +15

  34. cs.CR 2026-03-19 reviewed
    ML-KEM key exchange runs in 35.7 ms on M0+

    Benchmarking NIST-Standardised ML-KEM and ML-DSA on ARM Cortex-M0+: Performance, Memory, and Energy on the RP2040

    Rojin Chhetri

  35. cs.NI 2026-03-14 reviewed
    CATS transport cuts first paint time by 78% in worst-case web load

    A Case for CATS: A Conductor-driven Asymmetric Transport Scheme for Semantic Prioritization

    Syed Muhammad Aqdas Rizvi

  36. cs.DC 2026-03-10 reviewed
    FP64 tensor cores speed finite-element kernels 2x

    Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores

    Jiqun Tu +6

  37. cs.DC 2026-03-04 reviewed
    Fixed encoding decodes data 9-213× faster than Protocol Buffers

    Simplicity Scales

    Andrew Sampson (6OVER3 Institute) +2

  38. cs.NI 2026-02-23 reviewed
    Dynamic routing across LLMs beats any single model

    Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey

    Yasmin Moslem +1

  39. cs.DC 2026-02-19 reviewed
    SwapLess cuts Edge TPU latency up to 77% via CPU-TPU partitioning

    Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs

    Nathan Ng +7

  40. cs.LG 2026-02-09 reviewed
    WebGPU dispatch overhead is 24-36 μs on Vulkan

    Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers

    Jędrzej Maczan

  41. cs.AI 2026-02-05 reviewed
    LLM energy minima at moderate input and output lengths

    SweetSpot: An Analytical Model for Predicting Energy Efficiency of LLM Inference

    Hiari Pizzini Cavagna +7

  42. cs.CR 2026-01-30 reviewed
    PQC algorithms add manageable delay to enterprise Wi-Fi logins

    Assessing the Real-World Impact of Post-Quantum Cryptography on WPA-Enterprise Networks

    Lukas Köder +5

  43. cs.PF 2026-01-21 reviewed
    Hybrid model cuts GPU kernel prediction error by 6.7x

    PipeWeave: Synergizing Analytical and Learning Models for Unified GPU Performance Prediction

    Kaixuan Zhang +10

  44. cs.DC 2026-01-15 reviewed
    Beta metric delivers 96.5% optimal edge AI performance

    Mitigating GIL Bottlenecks in Edge AI Systems

    Mridankan Mandal +1

  45. cs.LG 2026-01-06 reviewed
    Sparse kernels factor forest proximities exactly

    Revisiting Forest Proximities via Sparse Leaf-Incidence Kernels

    Adrien Aumon +3

  46. cs.DC 2025-12-23 reviewed
    SHIRO delivers 221x SpMM speedup on 128 GPUs via sparsity-aware transfers

    SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication

    Chen Zhuang +7

  47. cs.DC 2025-12-18 reviewed
    Multipath routing lifts host-GPU bandwidth 4.6x

    MultiPath Memory Access: Breaking Host-GPU Bandwidth Bottlenecks in LLM Services

    Lingfeng Tang +8

  48. cs.DC 2025-12-17 reviewed
    Data movement bottlenecks sit outside the network core

    Reexamining Paradigms of End-to-End Data Movement

    Chin Fang +3

  49. cs.DC 2025-12-15 reviewed
    Framework links SKA imaging quality to energy and cost metrics

    astroCAMP: A Community Benchmark and Co-Design Framework for Sustainable SKA-Scale Radio Imaging

    Denisa-Andreea Constantinescu +9

  50. cs.SE 2025-12-13 reviewed
    Async Kafka rules shift availability forecasts by 0.001 points or less

    Evaluating Asynchronous Semantics in Trace-Discovered Resilience Models: A Case Study on the OpenTelemetry Demo

    Anatoly A. Krasnovsky