{"paper":{"title":"On the Expressive Power of Contextual Relations in Transformers","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Transformers can approximate any contextual relation by treating it as a probability distribution or coupling.","cross_cats":["cs.LG"],"primary_cat":"stat.ML","authors_text":"Demi\\'an Fraiman","submitted_at":"2026-03-26T19:30:36Z","abstract_excerpt":"Transformer architectures have achieved remarkable empirical success in modeling contextual relations, yet a clear understanding of their expressive power is still lacking. In this work, we introduce a measure-theoretic framework in which contextual relations are modeled as probabilistic objects, either as conditional distributions or as joint distributions (couplings). This perspective reveals a natural connection between standard softmax attention and entropy-regularized optimal transport, providing a unified view of attention as a normalization of an underlying affinity function. Within thi"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"we establish a universal approximation theorem for contextual systems using standard Softmax Attention and alternately Sinkhorn normalization. These results show that Transformer architectures can approximate arbitrary contextual relations rules, and that the choice of normalization determines how these relations are represented.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"Contextual relations can be fully and faithfully modeled as probabilistic objects, either as conditional distributions or as joint distributions (couplings), and that this modeling captures what Transformers actually compute.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Transformers using softmax or Sinkhorn attention can universally approximate any contextual relation modeled as a probabilistic coupling or conditional distribution.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Transformers can approximate any contextual relation by treating it as a probability distribution or coupling.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"8401a0316a87cec41b12e6a0f40000a6eeb10cbb5417dcf0c94512e17872fd59"},"source":{"id":"2603.25860","kind":"arxiv","version":3},"verdict":{"id":"bc20af99-e040-4ee9-ad8f-e4ccabe2f3c8","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T23:58:58.970158Z","strongest_claim":"we establish a universal approximation theorem for contextual systems using standard Softmax Attention and alternately Sinkhorn normalization. These results show that Transformer architectures can approximate arbitrary contextual relations rules, and that the choice of normalization determines how these relations are represented.","one_line_summary":"Transformers using softmax or Sinkhorn attention can universally approximate any contextual relation modeled as a probabilistic coupling or conditional distribution.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"Contextual relations can be fully and faithfully modeled as probabilistic objects, either as conditional distributions or as joint distributions (couplings), and that this modeling captures what Transformers actually compute.","pith_extraction_headline":"Transformers can approximate any contextual relation by treating it as a probability distribution or coupling."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2603.25860/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}