INCRT grows and prunes transformer attention heads on the fly via a geometric criterion, reaching a provably minimal sufficient architecture that matches or exceeds BERT-base performance on targeted tasks with 3-7 times fewer parameters.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- INCRT: An Incremental Transformer That Determines Its Own Architecture