Analysis of 144 task-model pairs finds mathematical reasoning produces the highest attention entropy in all architectures while decoder models show significantly higher sparsity than encoders.
Wolf et al., ”Transformers: State-of-the-art natural language process- ing,” in Proceedings of EMNLP: System Demonstrations, 2020, pp
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance
Analysis of 144 task-model pairs finds mathematical reasoning produces the highest attention entropy in all architectures while decoder models show significantly higher sparsity than encoders.