L4 delivers up to 4.4x higher throughput than T4 for ResNet models and peaks at batch sizes 16-32; INT8 yields up to 58x gains over CPU baselines.
Available: https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-861/developer-guide/index.html
1 Pith paper cites this work.
Citing papers by field: cs.PF (1); by year: 2026 (1); by verdict: UNVERDICTED (1).
Representative citing papers:
DEEP-GAP: Deep-learning Evaluation of Execution Parallelism in GPU Architectural Performance