Forecasting LLM Inference Performance via Hardware-Agnostic Analytical Modeling

2025 · arXiv 2508.00904

3 Pith papers cite this work. Polarity classification of these citations is still being indexed.

representative citing papers

Latency Prediction for LLM Inference on NPU Systems

cs.DC · 2026-06-16 · unverdicted · novelty 7.0

LENS predicts NPU LLM inference latency with a 2.15% mean error by profiling each bucket with two end-to-end (E2E) measurements and composing the results to capture the non-linearity introduced by bucketing.

WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs

cs.DC · 2026-07-02 · unverdicted · novelty 6.0

WattGPU's ML models, built from public data, predict LLM inference power and latency on unseen GPUs with median errors of 3.4-13.5% and outperform baselines.

Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic Data

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

Recover-LoRA, combined with distillation on synthetic data, recovers 80-95% accuracy on most benchmarks after selective 2-bit quantization of the MLP gate/up layers, while delivering a 7.5-23.3% throughput improvement.

citing papers explorer

Latency Prediction for LLM Inference on NPU Systems cs.DC · 2026-06-16 · unverdicted · none · ref 32
WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs cs.DC · 2026-07-02 · unverdicted · none · ref 16