Quantitative Clustering in Mean-Field Transformer Models

7 Pith papers cite this work. Polarity classification is still indexing.

abstract

The evolution of tokens through deep transformer models can be modeled as an interacting particle system that has been shown to exhibit an asymptotic clustering behavior akin to the synchronization phenomenon in Kuramoto models. In this work, we investigate the long-time clustering of mean-field transformer models. More precisely, under suitable assumptions on the transformer model parameters, we establish that any suitably regular mean-field initialization synchronizes exponentially fast to a Dirac point mass, with explicit quantitative convergence rates.
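The clustering described in the abstract can be illustrated with a finite-particle toy model. The sketch below is an assumption-laden illustration, not the paper's mean-field setting or its hypotheses: tokens are unit vectors on the circle with angles `thetas`, the query, key, and value matrices are taken to be the identity, and `beta`, `dt`, `steps`, and the initial angles are hypothetical choices. Under these assumptions the self-attention dynamics reduce to a Kuramoto-like system, dθ_i/dt = Σ_j w_ij sin(θ_j − θ_i), with softmax weights w_ij proportional to exp(beta · cos(θ_i − θ_j)).

```python
import math


def attention_step(thetas, beta=1.0, dt=0.05):
    """One explicit Euler step of the toy self-attention dynamics on the circle."""
    updated = []
    for ti in thetas:
        # Unnormalized softmax weights exp(beta * <x_i, x_j>), where
        # <x_i, x_j> = cos(theta_i - theta_j) for unit vectors in the plane.
        scores = [math.exp(beta * math.cos(ti - tj)) for tj in thetas]
        total = sum(scores)
        # Tangential part of the attention-weighted mean:
        # sum_j w_ij * sin(theta_j - theta_i).
        drift = sum(s * math.sin(tj - ti) for s, tj in zip(scores, thetas)) / total
        updated.append(ti + dt * drift)
    return updated


def spread(thetas):
    """Angular diameter of the configuration (valid while it fits in a half-circle)."""
    return max(thetas) - min(thetas)


def simulate(thetas, steps=400, beta=1.0, dt=0.05):
    """Run the toy dynamics and record the spread after every step."""
    history = [spread(thetas)]
    for _ in range(steps):
        thetas = attention_step(thetas, beta, dt)
        history.append(spread(thetas))
    return thetas, history


# Hypothetical initial angles, chosen to lie within an arc shorter than pi.
initial = [-1.0, -0.4, 0.1, 0.7, 1.2]
final, history = simulate(initial)
print(f"initial spread: {history[0]:.3f}")
print(f"final spread:   {history[-1]:.3e}")
```

Starting inside a half-circle keeps every particle moving toward the interior of the arc, so the recorded spread shrinks at each step and the tokens collapse toward a single point, the finite-particle analogue of convergence to a Dirac mass. Plotting `history` on a logarithmic scale is one way to eyeball the decay rate in this toy setting.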

citation summary

years: 2026 (7)
roles: background (2)
polarities: background (2)

representative citing papers

Spectral Selection in Symmetric Self-Attention Dynamics

math.DS · 2026-04-28 · unverdicted · novelty 7.0

Symmetric self-attention dynamics select the dominant eigendirection of V, producing homogeneous alignment when one positive eigenvalue dominates or sign-split polarization when V is negative definite.

Propagation of Chaos in Contextual Flow Maps

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

Derives forward and backward propagation-of-chaos bounds for finite vs. infinite-context transformers modeled as contextual flow maps, achieving Wasserstein rate n^{-1/d} generally and n^{-1/2} for transformer-like cases.
