SATFormer uses a context-dependent gate to selectively reuse early Transformer representations, improving validation loss and zero-shot accuracy, especially on retrieval benchmarks.
arXiv preprint arXiv:2302.03985
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
Attention Residuals replaces fixed residual summation with input-dependent softmax attention over preceding layers, and a blocked variant is shown to improve uniformity and downstream performance in a 48B-parameter model pre-trained on 1.4T tokens.
MSLA is a new attention mechanism that models multi-scale and cross-layer interactions to achieve more accurate OBI recognition than prior attention methods.
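The Attention Residuals summary above contrasts a fixed residual sum with input-dependent softmax weights over preceding layers. The following is a minimal sketch of that contrast only, not the cited paper's implementation: the scoring rule (a dot product with a hypothetical learned vector `query`), the function names, and the toy vectors are all assumptions made for illustration, and the blocked variant is not modeled.

```python
import math


def fixed_residual_sum(layer_outputs):
    """Baseline: unweighted sum of all preceding layer outputs."""
    dim = len(layer_outputs[0])
    return [sum(h[j] for h in layer_outputs) for j in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_over_layers(layer_outputs, query):
    """Mix preceding layer outputs with input-dependent softmax weights.

    `query` is a hypothetical learned vector (an assumption of this sketch);
    each layer's score is its dot product with `query`, so the weights
    change with the layer outputs themselves.
    """
    scores = [sum(q * x for q, x in zip(query, h)) for h in layer_outputs]
    weights = softmax(scores)
    dim = len(layer_outputs[0])
    mixed = [
        sum(w * h[j] for w, h in zip(weights, layer_outputs))
        for j in range(dim)
    ]
    return mixed, weights


# Toy example: three preceding layers, hidden size 2.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
summed = fixed_residual_sum(layers)
mixed, weights = attention_over_layers(layers, query=[2.0, 0.0])
```

In this toy setup, a zero `query` gives every layer the same weight, so the mix reduces to the mean of the layer outputs; a non-zero `query` shifts weight toward the layers whose outputs align with it.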
citing papers explorer
- Enhancing Oracle Bone Inscription Recognition via Multi-Scale Layer Attention