FlashAttention-2: Faster attention with better parallelism and work partitioning. Proceedings of the International Conference on Learning Representations (ICLR).
1 Pith paper cites this work. Polarity classification is still being indexed.
Citation role: background (1)
Citation polarity: background (1)
Field: cs.LG (1)
Year: 2026 (1)
Verdict: UNVERDICTED (1)

Representative citing papers:
Collapse-Free Prototype Readout Layer for Transformer Encoders
DDCL-Attention introduces a collapse-free prototype readout for transformers that decomposes the training loss exactly into reconstruction and diversity terms while providing stability guarantees via singular perturbation theory.