Muse: Parallel multi-scale attention for sequence to sequence learning. arXiv preprint arXiv:1911.09483

[ZXZ+19] Guangxiang Zhao, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo, Zhengdong Lu, Xu Sun · 2019 · arXiv 1911.09483

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Self-Attention Transformers

cs.LG · 2025-10-27 · unverdicted · novelty 7.0

One of the Q, K or V weights in transformer self-attention is redundant and replaceable by the identity matrix under mild assumptions, reducing parameters by 25 percent with no loss in small-model performance.

citing papers explorer

Showing 1 of 1 citing paper.

Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Self-Attention Transformers cs.LG · 2025-10-27 · unverdicted · none · ref 24
One of the Q, K or V weights in transformer self-attention is redundant and replaceable by the identity matrix under mild assumptions, reducing parameters by 25 percent with no loss in small-model performance.
