Grammatically-Guided Sparse Attention uses part-of-speech (POS) tags to build hard or soft masks that constrain self-attention. In a DistilBERT-like model on SST-2, the two masked variants reach 0.8200 and 0.8165 accuracy, compared with 0.8200 for full attention.
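The masking idea summarized above can be illustrated with a minimal, single-head sketch in plain Python. It is not the paper's method: the grammatical rule set `ALLOWED_PAIRS`, the soft-mask `penalty` value, and the helper names `pos_attention_bias` and `masked_softmax` are all hypothetical, chosen only to show how POS tags could be turned into an additive attention bias. Tags follow the Universal Dependencies UPOS naming, which is also an assumption.

```python
import math
from typing import List, Sequence

# HYPOTHETICAL rule set: (query POS, key POS) pairs allowed to attend.
# The source does not state the actual grammatical rules; these are illustrative.
ALLOWED_PAIRS = {
    ("DET", "NOUN"), ("ADJ", "NOUN"), ("NOUN", "ADJ"),
    ("NOUN", "VERB"), ("VERB", "NOUN"),
    ("ADV", "VERB"), ("VERB", "ADV"),
}


def pos_attention_bias(pos_tags: Sequence[str], mode: str = "hard",
                       penalty: float = -2.0) -> List[List[float]]:
    """Build an n x n additive attention bias from POS tags (hypothetical helper).

    Allowed pairs, and every token attending to itself, get 0.0.
    Disallowed pairs get -inf for a hard mask, or the finite `penalty`
    (an assumed value) for a soft mask.
    """
    if mode not in ("hard", "soft"):
        raise ValueError("mode must be 'hard' or 'soft'")
    blocked = -math.inf if mode == "hard" else penalty
    n = len(pos_tags)
    bias = []
    for i in range(n):
        row = []
        for j in range(n):
            allowed = i == j or (pos_tags[i], pos_tags[j]) in ALLOWED_PAIRS
            row.append(0.0 if allowed else blocked)
        bias.append(row)
    return bias


def masked_softmax(scores: List[List[float]],
                   bias: List[List[float]]) -> List[List[float]]:
    """Row-wise softmax of (scores + bias); math.exp(-inf) is 0.0."""
    weights = []
    for s_row, b_row in zip(scores, bias):
        logits = [s + b for s, b in zip(s_row, b_row)]
        peak = max(logits)  # finite, because the diagonal is never masked
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    tags = ["DET", "ADJ", "NOUN", "VERB"]  # e.g. "the quick fox runs"
    scores = [[0.0] * 4 for _ in range(4)]  # uniform raw scores for clarity
    hard = masked_softmax(scores, pos_attention_bias(tags, "hard"))
    print(hard[0])  # [0.5, 0.0, 0.5, 0.0]
```

With the hard mask, blocked pairs receive exactly zero weight, so attention is sparse; with the soft mask they are only down-weighted and remain reachable. Always allowing the diagonal guarantees that no row is fully masked, which keeps the softmax well defined.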
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Grammatically-Guided Sparse Attention for Efficient and Interpretable Transformers