arXiv: 2605.04279 · 2 revisions
Gradient Flow Structure and Quantitative Dynamics of Multi-Head Self-Attention