archive

Every paper Pith has read. Search by title, abstract, or pith.

225 papers in cs.PF · page 2

cs.AR 2026-05-07 reviewed

LLMs automate FPGA accelerator design space exploration
LLM-Driven Design Space Exploration of FPGA-based Accelerators

Vinamra Sharma +3
cs.PF 2026-05-07 reviewed

Int4 KV cache outruns fp16 on Apple Silicon
When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon

Mohamed Amine Bergach
cs.LG 2026-05-06 reviewed

Task category predicts LLM kernel success far better than generation method
KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

Han Wang +5
cs.LG 2026-05-06 reviewed

Task category explains 3x more variance than method in LLM kernel correctness
KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

Han Wang +5
cs.GR 2026-05-06 reviewed

Algebraic coarsening delivers 3x speedup in GPU contact solves
AGIPC: Adaptive In-Solve Algebraic Coarsening for GPU IPC

Xuan Wang +4
cs.PF 2026-05-06 reviewed

LLM agents turn GPU profiles into optimization advice
KEET: Explaining Performance of GPU Kernels Using LLM Agents

Joshua H. Davis +7
cs.GT 2026-05-05 reviewed

Light storage limits turn content-provider competition into a potential game
Decentralized Edge Caching under Budget and Storage Constraints: A Game-Theoretic Approach

Hamta Sedghani +3
cs.AR 2026-05-05 reviewed

SPEC CPU2026 increases instruction volume and cache pressure
SPEC CPU2026: Characterization, Representativeness, and Cross-Suite Comparison

RuiHao Li +3
cs.AR 2026-05-05 reviewed

4-5 workloads preserve 96-99% of SPEC CPU2026 behavior
SPEC CPU2026: Characterization, Representativeness, and Cross-Suite Comparison

RuiHao Li +3
cs.DC 2026-05-05 reviewed

GPU layer speeds exascale trace analysis by up to 314x
Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics

Dragana Grbic +1
cs.DC 2026-05-05 reviewed

GPU speeds exascale trace analysis by 314 times
Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics

Dragana Grbic +1
cs.PF 2026-05-04 reviewed

Same model name yields different speed
When Is the Same Model Not the Same Service? A Measurement Study of Hosted Open-Weight LLM APIs

Haorui Li +9
cs.PF 2026-05-04 reviewed

Same LLM name produces different services by host
When Is the Same Model Not the Same Service? A Measurement Study of Hosted Open-Weight LLM APIs

Haorui Li +9
cs.LG 2026-05-04 reviewed

Streaming top-k runs CSA indexer to 1M tokens on 6 GB
StreamIndex: Memory-Bounded Compressed Sparse Attention via Streaming Top-k

Jaber Jaber +1
cs.CR 2026-05-04 reviewed

Two post-quantum signatures pass Australia's payment speed test
Post-Quantum Cryptography Migration in Australian Real-Time Payment Infrastructure: A Monte Carlo Simulation Study of the New Payments Platform

Nazmus Salehin Sammo
cs.PF 2026-05-02 reviewed

SPEC CPU 2026 standardizes mixed-workload CPU benchmarking
SPEC CPU: The Next Generation

Mahesh Madhav +33
cs.PF 2026-05-02 reviewed

Response time distributions derived for priority queues with preemption overhead
Priority Scheduling in the M/G/1 with Preemption Overhead

Shefali Ramakrishna +2
cs.PL 2026-05-01 reviewed

Compiler splits recursive datatypes into separate field buffers
SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes

Vidush Singhal +5
cs.DC 2026-05-01 reviewed

Fixed-core approach yields 211x higher efficiency for edge GEMM
Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

M. Grailoo +1
cs.PF 2026-05-01 reviewed

Apple Silicon runs 80B LLMs at 23x Nvidia energy efficiency
Silicon Showdown: Performance, Efficiency, and Ecosystem Barriers in Consumer-Grade LLM Inference

Abdurrahman Javat +1
stat.ME 2026-05-01 reviewed

Workflow turns raw measurements into defensible ECE/CS results
How to Do Statistical Evaluations in ECE/CS Papers: A Practical Playbook for Defensible Results

Bhaskar Krishnamachari
cs.AI 2026-05-01 reviewed

Same model accuracy varies 12 points by endpoint
Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference

Yuxuan Gao +2
cs.MA 2026-04-29 reviewed

C++ engine hits 33 million steps per second on POMDP tasks
A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations

Timothy Flavin +1
cs.LG 2026-04-29 reviewed

Compiler automates sequence parallelism for 2.7x longer LLM contexts
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism

Ahan Gupta +5
cs.PF 2026-04-29 reviewed

Watchpoint recovers full NVIDIA driver command streams
Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight

Yuang Yan +2
cs.SE 2026-04-29 reviewed

RAPL tools add up to 47% time overhead at 1 kHz polling
What Is the Cost of Energy Monitoring? An Empirical Study on the Overhead of RAPL-Based Tools

Jeremy Diamond +1
cs.DC 2026-04-29 reviewed

Agentic workflow turns PyTorch graphs into faster CUTLASS kernels
FACT: Compositional Kernel Synthesis with a Three-Stage Agentic Workflow

Sina Heidari +1
cs.DC 2026-04-29 reviewed

Dual-path KV offload cuts edge LLM latency up to 42%
DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference

Bodon Jeong +6
cs.DC 2026-04-27 reviewed

Fixed-input lock keeps Spark policy outputs identical under repartitioning
Spark Policy Toolkit: Semantic Contracts and Scalable Execution for Policy Learning in Spark

Zeyu Bai
cs.NI 2026-04-27 reviewed

Reprofiling flows cuts bandwidth for delay guarantees in multi-hop nets
On the Benefits of Traffic "Reprofiling" -- The Multiple Hops Case -- Part II

Jiaming Qiu +1
cs.PF 2026-04-26 reviewed

Optimas automates GPU code optimization with 100% correctness
Optimas: An Intelligent Analytics-Informed Generative AI Framework for Performance Optimization

Mohammad Zaeed +2
cs.LG 2026-04-25 reviewed

Two-block Hadamard rotations match uniform ones on coordinates but not overall
Approximating Uniform Random Rotations by Two-Block Structured Hadamard Rotations in High Dimensions

Tomer Zilca +1
cs.PF 2026-04-24 reviewed

COMPASS cuts HPC job turnaround time by 66% with trace ML
COMPASS: A Unified Decision-Intelligence System for Navigating Performance Trade-off in HPC

Ankur Lahiry +4
astro-ph.IM 2026-04-24 reviewed

Tool shows solar storms trigger Starlink orbit decay and 10 Mbps drops
CosmicDancePro -- Measuring LEO satellite's orbital decay and network connectivity implications during solar storms

Suvam Basak +2
cs.AR 2026-04-24 reviewed

Accelerators improve LLM speed on edge single-board computers
Cloud to Edge: Benchmarking LLM Inference On Hardware-Accelerated Single-Board Computers

Harri Renney +3
cs.DC 2026-04-24 reviewed

Top-K method speeds sparse decode 1.88x on Blackwell
Guess-Verify-Refine: Data-Aware Top-K for Sparse-Attention Decoding on Blackwell via Temporal Correlation

Long Cheng +9
cs.LG 2026-04-23 reviewed

Parallel task split makes large-scale NN search run at medium-scale cost
Large-Scale Data Parallelization of Product Quantization and Inverted Indexing Using Dask

Ashley N. Abraham +4
cs.NI 2026-04-23 reviewed

Server-driven adaptive sampling cuts wireless iBCI power by 40 mW
An Efficient Wireless iBCI Headstage with Adaptive ADC Sample Rate

Hongyao Liu +3
cs.NI 2026-04-23 reviewed

SparKV cuts on-device LLM first-token time by 1.3x-5.1x
SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference

Hongyao Liu +3
cs.LG 2026-04-22 reviewed

Joint optimizations cut multi-agent edge latency by 62 percent at 200 agents
A Delta-Aware Orchestration Framework for Scalable Multi-Agent Edge Computing

Samaresh Kumar Singh +1
cs.DC 2026-04-21 reviewed

Slicing traces GPU stall roots for 1.8x speedups across vendors
LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing

Yuning Xia +1
cs.PF 2026-04-20 reviewed

CPU-GPU hybrid speeds long-context LLM inference 1.41x-3.2x
HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing

Mao Lin +4
cs.NI 2026-04-20 reviewed

Lagrange heuristic lowers age of updates from mixed sensors
Lagrange Index based Scheduling for Minimizing Age of Updates from Heterogeneous Sources

Aniket Mukherjee +2
cs.LG 2026-04-19 reviewed

Crash-aware tuner spends fixed budget more consistently on LLM serving
SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving

Christian Lysenstøen
cs.AR 2026-04-19 reviewed

Multi-tier KV cache cuts LLM inference costs by 47%
Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference

Sanjeev Rao Ganjihal
cs.DC 2026-04-19 reviewed

Active inference learns edge AI routing without offline training
Active Inference-Based Adaptive Routing for Heterogeneous Edge AI Services

Zihang Wang +2
cs.DB 2026-04-19 reviewed

Branchable databases slow reads up to 4000x as agent branches deepen
BranchBench: Aligning Database Branching with Agentic Demands

Elaine Ang +5
cs.LG 2026-04-17 reviewed

Precision modeling cuts training time prediction error to 9.8 percent
Training Time Prediction for Mixed Precision-based Distributed Training

Minchul Kang +7
cs.CV 2026-04-17 reviewed

CPU optimizations boost 3D biomechanics pipeline 2.47x
CPU Optimization of a Monocular 3D Biomechanics Pipeline for Low-Resource Deployment

Yan Zhang +1
cs.PF 2026-04-16 reviewed

Ragged Paged Attention (RPA) is a flexible, high-performance LLM inference kernel for TPU
Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU

Jevin Jiang +4