Attention sinks in NLLB-200 cross-attention cause non-content tokens to dominate 83-91% of mass, halving apparent content similarity; content filtering recovers linguistic signals like language clustering and mode differences.
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
PhysEDA folds separable Manhattan-distance exponential decay into linear attention and potential-based rewards, cutting complexity to linear while improving zero-shot transfer and sparse-reward performance on decoupling-cap placement, macro placement, and IR-drop prediction.
citing papers explorer
- Attention Sinks in Massively Multilingual Neural Machine Translation: Discovery, Analysis, and Mitigation
Attention sinks in NLLB-200 cross-attention cause non-content tokens to dominate 83-91% of mass, halving apparent content similarity; content filtering recovers linguistic signals like language clustering and mode differences.
- PhysEDA: Physics-Aware Learning Framework for Efficient EDA With Manhattan Distance Decay
PhysEDA folds separable Manhattan-distance exponential decay into linear attention and potential-based rewards, cutting complexity to linear while improving zero-shot transfer and sparse-reward performance on decoupling-cap placement, macro placement, and IR-drop prediction.