L4 delivers up to 4.4x higher throughput than T4 for ResNet models, peaks at batch sizes 16-32, and INT8 yields up to 58x gains over CPU baselines.
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.PF (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
DEEP-GAP: Deep-learning Evaluation of Execution Parallelism in GPU Architectural Performance