archive

Every paper Pith has read. Search by title, abstract, or pith.

225 papers in cs.PF · page 1

  1. cs.LG 2026-05-22 reviewed
    Meta-learning yields model performance scores on unlabeled data

    Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning

    Trinh Pham +4

  2. cs.LG 2026-05-21 reviewed
    Controller routes LLM requests to best mode for 2x speedup

    ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU

    Aman Sunesh +2

  3. cs.AR 2026-05-21 reviewed
    ACALSim reaches 14x speedup over SST on large GPU simulations

    ACALSim: A Scalable Parallel Simulation Framework for High-Performance System Design Space Exploration

    Wei-Fen Lin +7

  4. cs.LG 2026-05-21 reviewed
    Separate physical pools for KV and SSM caches cut OOMs 7.6% and raise throughput up to 13x

    Asymmetric Virtual Memory Paging for Hybrid Mamba-Transformer Inference

    An Xuan Nguyen

  5. cs.AI 2026-05-20 reviewed
    Agentic AI uses 4.33x more energy per successful goal than linear baselines

    Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems

    Deepak Panigrahy +1

  6. cs.PF 2026-05-20 reviewed
    Discretization produces throughput-optimal policies for continuous MRJ

    Throughput-Optimal Multiresource-Job Scheduling with Continuous Requirement Distribution

    Heyuan Yao +2

  7. cs.LG 2026-05-19 reviewed
    Krylov approximation unlearns data 48x faster than retraining

    Causal Unlearning in Collaborative Optimization: Exact and Approximate Influence Reversal under Adversarial Contributions

    Ali Mahdavi +3

  8. cs.CV 2026-05-19 reviewed
    Billion-scale 3D Gaussians train on one 24 GB GPU

    TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization

    Chonghao Zhong +6

  9. cs.SE 2026-05-19 reviewed
    Agent skills from expert methods beat docs for PostgreSQL tuning

    A Case for Agentic Tuning: From Documentation to Action in PostgreSQL

    Hongyu Lin +6

  10. cs.DC 2026-05-19 reviewed
    Reasoning LLMs trap data parallelism in KV-cache limits

    Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles

    Moiz Arif +3

  11. cs.PF 2026-05-18 reviewed
    Geo-distributed AI training optimizes at 10-100 km distances

    Modeling the Impact of Fiber Latency on Compute-Communication Overlap in Geo-Distributed Multi-Datacenter AI Training

    Ioannis Papavasileiou +3

  12. cs.PF 2026-05-18 reviewed
    Hybrid model cuts medical tourist waits from 13.7 to 2.4 days

    Reducing Waiting Time for Medical Tourists Through Hybrid Agent-Based and Discrete-Event Simulation: A Hospital Case Study

    Melika Baghi +1

  13. cs.LO 2026-05-18 reviewed
    Unified calculus and lattice language reduce CS problems to performance evaluation

    On Generalized Performance Evaluation and Generalized Controller Synthesis

    Zining Cao

  14. cs.LG 2026-05-18 reviewed
    Boundary protection recovers 69-90% quality at 13% KV retention

    Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction

    Gabriel Garcia

  15. cs.LG 2026-05-18 reviewed
    Covariance rotations keep 2-bit KV caches accurate

    OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

    Zhongzhu Zhou +6

  16. eess.IV 2026-05-16 reviewed
    Legacy GPUs power real-time 8K60 for connected vehicles

    Sustainable Real-Time 8K60 HEVC Encoding for V2X: Repurposing Legacy NVENC Hardware at the Vehicular Edge

    Kasidis Arunruangsirilert +1

  17. cs.PF 2026-05-15 reviewed
    Heuristic merges HPC traces to extend hardware counter coverage

    Heuristic-Based Merging of HPC Traces to Extend Hardware Counter Coverage

    Júlia Orteu Aubach +3

  18. cs.LG 2026-05-15 reviewed
    Closed-form linear operator fixes layer-pruned LLMs

    Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs

    Vincent-Daniel Yun +3

  19. quant-ph 2026-05-14 reviewed
    Cache reorganization lifts GPU speedups for 28-qubit simulations on laptops

    Accelerating State-Vector Quantum Simulation on Integrated GPUs via Cache Locality Optimization: A Cross-Architecture Evaluation

    Gabriel Fernandes Thomaz +4

  20. cs.OS 2026-05-14 reviewed
    LLM tunes Linux knobs for 72 percent stable gain over defaults

    SemaTune: Semantic-Aware Online OS Tuning with Large Language Models

    Georgios Liargkovas +3

  21. cs.DC 2026-05-13 reviewed
    Heterogeneous solvers up to 32% faster than GPU-only for big matrices

    Comparing the Performance of Heterogeneous Conjugate Gradient and Cholesky Solvers on Various Hardware Using SYCL

    Tim Thüring +2

  22. cs.LG 2026-05-12 reviewed
    Block-scale search cuts quantization error 27% in BFP

    Search Your Block Floating Point Scales!

    Tanmaey Gupta +12

  23. cs.PF 2026-05-12 reviewed
    Adaptive packed layouts enable efficient VLA ML code

    Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation

    Ege Beysel +2

  24. cs.PF 2026-05-12 reviewed
    Packed layouts enable scalable vector ML code

    Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation

    Ege Beysel +2

  25. cs.AR 2026-05-12 reviewed
    Joint TLB-cache tweaks boost instruction prefetching 8.7%

    Enhancing Instruction Prefetching via Cache and TLB Management

    Alexandre Valentin Jamet +4

  26. cs.IT 2026-05-12 reviewed
    Node failures scale wireless capacity and delay with sqrt of reliable nodes

    On Capacity and Delay of Wireless Networks with Node Failures

    Wei Li +3

  27. cs.DC 2026-05-12 reviewed
    Power capping leaves LLM decode energy untouched

    The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures

    Bole Ma +3

  28. cs.DC 2026-05-11 reviewed
    Chakra standardizes graph traces for AI workload benchmarking

    MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

    Srinivas Sridharan +28

  29. cs.DC 2026-05-11 reviewed
    Open traces standardize ML workload benchmarking

    MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

    Srinivas Sridharan +28

  30. cs.LG 2026-05-11 reviewed
    DMI-Lib cuts LLM internal observability overhead to 0.4-6.8 percent

    Enabling Performant and Flexible Model-Internal Observability for LLM Inference

    Nengneng Yu +4

  31. cs.DC 2026-05-11 reviewed
    Edge micro-agent fixes failures safely with no destructive actions

    An Uncertainty-Aware Resilience Micro-Agent for Causal Observability in the Computing Continuum

    Suvi De Silva +4

  32. cs.GR 2026-05-11 reviewed
    Inverted culling speeds dynamic LiDAR ray tracing

    Geometrically Approximated Modeling for Emitter-Centric Ray-Triangle Filtering in Arbitrarily Dynamic LiDAR Simulation

    Rabin Gajmer +2

  33. cs.CR 2026-05-11 reviewed
    KEM-IES upgrades ECIES with PQC KEM and Ascon

    Key Encapsulation Mechanism-Based Integrated Encryption Scheme (KEM-IES)

    Abel C. H. Chen

  34. cs.RO 2026-05-11 reviewed
    Caching reuses diffusion steps for 4.6x faster robot plans

    Muninn: Your Trajectory Diffusion Model But Faster

    Gokul Puthumanaillam +6

  35. cs.CR 2026-05-11 reviewed
    Mamba-2 classifies network bursts directly from raw bytes

    MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining

    Gayan K. Kulatilleke +3

  36. cs.DC 2026-05-10 reviewed
    Cloud trace decomposition predicts performance at 2% error

    Cloud Performance Decomposition for Long-Term Performance Engineering: A Case Study

    Shimul Debnath +4

  37. cs.DC 2026-05-10 reviewed
    Adaptive DNN splits cut energy by 27-36% on real edge-cloud hardware

    Adaptive DNN Partitioning and Offloading in Heterogeneous Edge-Cloud Continuum

    Akuen Akoi Deng +3

  38. cs.LG 2026-05-09 reviewed
    Apple MPS shows 21x latency spikes in narrow decoding ranges

    Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

    Willy Fitra Hendria

  39. cs.LG 2026-05-09 reviewed
    MPS decoding latency spikes up to 21x in narrow ranges

    Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

    Willy Fitra Hendria

  40. cs.PF 2026-05-09 reviewed
    GPU speedups reach 10x despite 1.85x bandwidth limit in quantum simulation

    A Controlled Study of Memory Hierarchy Transitions in Quantum Circuit Simulation on Apple M4 Pro Unified Memory Architecture

    Gyan Pratipat

  41. cs.PF 2026-05-09 reviewed
    4.46× jump in quantum sim time at 29 qubits on M4 Pro

    A Controlled Study of Memory Hierarchy Transitions in Quantum Circuit Simulation on Apple M4 Pro Unified Memory Architecture

    Gyan Pratipat

  42. cs.PF 2026-05-09 reviewed
    Single-thread JPEG benchmarks misrank decoders for DataLoaders

    Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

    Vladimir Iglovikov +1

  43. cs.PF 2026-05-09 reviewed
    DataLoader benchmarks reorder JPEG decoder rankings

    Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

    Vladimir Iglovikov +1

  44. cs.AR 2026-05-09 reviewed
    DDR5 single sub-channel matches cache lines but loses 40-60% bandwidth

    Single 32-bit Sub-Channel DDR5 DIMMs: Architecture, Performance Bounds, and Standardisation

    Chih-Hua Ke

  45. cs.LG 2026-05-08 reviewed
    Cyclic tuning raises RAG quality by up to 54 percent

    CDS4RAG: Cyclic Dual-Sequential Hyperparameter Optimization for RAG

    Pengzhou Chen +1

  46. cs.LG 2026-05-08 reviewed
    Unified runtime delivers 2.55x decode speedup for low-rank transformers

    FlashSVD v1.5: Making Low-Rank Transformers Inference Actually Fast

    Wenhao Wu +7

  47. cs.LG 2026-05-08 reviewed
    Fluxion speeds long-context inference 1.5x-3.7x via CPU-GPU hybrid sparse attention

    An Efficient Hybrid Sparse Attention with CPU-GPU Parallelism for Long-Context Inference

    Feiyu Yao +5

  48. cs.LG 2026-05-08 reviewed
    First benchmark supplies real data for LLM hyperparameter tuning

    LLMSYS-HPOBench: Hyperparameter Optimization Benchmark Suite for Real-World LLM Systems

    Siyu Wu +5

  49. cs.DC 2026-05-07 reviewed
    AD replaces finite differences in INLA for 4-8x gradient speedups

    ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations

    Afif Boudaoud +8

  50. cs.AR 2026-05-07 reviewed
    Pipeline speeds power-of-two DNNs on edge FPGAs by up to 3.6x

    PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs

    Rappy Saha +4