2 Pith papers cite this work, both from 2026 and both currently unverdicted. Polarity classification is still indexing.
Citing papers explorer
- WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs
  WattGPU ML models predict LLM inference power and latency on unseen GPUs with median errors of 3.4-13.5% using public data and show better performance than baselines.
- Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic Data
  Recover-LoRA with synthetic-data distillation recovers 80-95% accuracy on most benchmarks after selective 2-bit quantization of MLP gate/up layers while delivering 7.5-23.3% throughput improvement.