Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2). Verdicts: 2 unverdicted.
Citing papers explorer

- $L^2$ over Wasserstein: Statistical Analysis for Optimal Transport
  Defines an $L^2$-over-Wasserstein space that equips random probability measures with an inherited Riemannian geometry, enabling statistical convergence results and Bayesian posterior consistency in the Wasserstein topology.
- Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers
  Shows that training a mean-field Transformer under $L^2$ regularization lets tokens, after initially clustering under attention, escape that clustering in later layers.