On the Expressive Power of Contextual Relations in Transformers

2026 · stat.ML · arXiv 2603.25860

1 Pith paper cites this work. Polarity classification is still being indexed.

abstract

Transformer architectures have achieved remarkable empirical success in modeling contextual relations, yet a clear understanding of their expressive power is still lacking. In this work, we introduce a measure-theoretic framework in which contextual relations are modeled as probabilistic objects, either as conditional distributions or as joint distributions (couplings). This perspective reveals a natural connection between standard softmax attention and entropy-regularized optimal transport, providing a unified view of attention as a normalization of an underlying affinity function. Within this framework, we establish a universal approximation theorem for contextual systems that use either standard softmax attention or, alternatively, Sinkhorn normalization. These results show that Transformer architectures can approximate arbitrary contextual relation rules, and that the choice of normalization determines how these relations are represented. Moreover, they provide a principled explanation for why Transformers are effective at modeling contextual relations.
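To make the "attention as a normalization of an affinity function" view concrete, here is a minimal NumPy sketch. It assumes a scaled dot-product affinity G_ij = exp(<q_i, k_j> / tau) and uniform target marginals; the function names and the temperature `tau` are hypothetical illustrations, not the paper's construction. Row-normalizing G gives softmax attention, where each row is a conditional distribution over keys. Alternately rescaling rows and columns of G (Sinkhorn-Knopp) instead gives a joint coupling, which is the standard entropy-regularized optimal transport plan for cost c(i, j) = -<q_i, k_j> and regularization tau.

```python
import numpy as np


def softmax_attention_weights(Q, K, tau=1.0):
    """Row-normalize the Gibbs affinity exp(<q_i, k_j> / tau).

    Each row sums to 1, i.e. one conditional distribution over keys per query.
    """
    S = Q @ K.T / tau
    S = S - S.max(axis=1, keepdims=True)  # row-wise shift leaves softmax unchanged
    W = np.exp(S)
    return W / W.sum(axis=1, keepdims=True)


def sinkhorn_coupling(Q, K, tau=1.0, n_iters=200):
    """Sinkhorn-Knopp scaling of the same affinity toward uniform marginals.

    Returns P = diag(u) G diag(v), an approximation of the entropy-regularized
    OT plan for cost -<q_i, k_j>. Because v is updated last, the column sums
    match b exactly; the row sums match a up to the convergence tolerance.
    """
    n, m = Q.shape[0], K.shape[0]
    S = Q @ K.T / tau
    G = np.exp(S - S.max())  # a global shift only rescales G; the scalings absorb it
    a = np.full(n, 1.0 / n)  # target row marginal (assumed uniform)
    b = np.full(m, 1.0 / m)  # target column marginal (assumed uniform)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (G @ v)    # match row sums
        v = b / (G.T @ u)  # match column sums
    return u[:, None] * G * v[None, :]


# Usage: both normalizations act on the same affinity kernel.
rng = np.random.default_rng(0)
Q = rng.normal(scale=0.5, size=(4, 3))  # 4 queries of dimension 3
K = rng.normal(scale=0.5, size=(5, 3))  # 5 keys of dimension 3
A = softmax_attention_weights(Q, K)
P = sinkhorn_coupling(Q, K)
print(A.sum(axis=1))  # rows sum to 1: a conditional distribution per query
print(P.sum(axis=0), P.sum(axis=1))  # columns sum to 1/5, rows to about 1/4
```

A single row-rescaling step of the Sinkhorn loop (starting from v = 1) yields exactly the softmax weights divided by the number of queries, which is one way to read "the choice of normalization determines how these relations are represented." For small `tau`, a log-domain Sinkhorn would be numerically safer than this direct version.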

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators

stat.ML · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

A single neural operator can approximate the map from arbitrary joint densities to their conditionals, backed by new continuity results and illustrated on Gaussian mixtures.
