Empirical tests on 118 transformers show success falling from 88.1% at 512 tokens to 0% at 2048 tokens, with compressed models achieving 649.2 tokens/sec/M parameters versus 12.5 for large generative ones.
Feng et al., ”CodeBERT: A pre-trained model for programming and natural languages,” inFindings of the Association for Computational Linguistics: EMNLP 2020, 2020, pp
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models
Empirical tests on 118 transformers show success falling from 88.1% at 512 tokens to 0% at 2048 tokens, with compressed models achieving 649.2 tokens/sec/M parameters versus 12.5 for large generative ones.