Title resolution pending

3 Pith papers cite this work. Polarity classification is still indexing.

years: 2025 (3)

verdicts: unverdicted (3)

representative citing papers

The Bayesian Geometry of Transformer Attention

cs.LG · 2025-12-27 · unverdicted · novelty 7.0

Small transformers reproduce known Bayesian posteriors with 10^{-3} to 10^{-4} bit accuracy in verifiable wind-tunnel tasks via residual belief states, FFN updates, and attention routing, while MLPs do not.

Geometric Scaling of Bayesian Inference in LLMs

cs.LG · 2025-12-27 · unverdicted · novelty 5.0

Large language models preserve a geometric substrate in value representations that correlates with uncertainty and matches patterns from small models performing exact Bayesian inference.

citing papers explorer

Showing 3 of 3 citing papers.

  • The Bayesian Geometry of Transformer Attention · cs.LG · 2025-12-27 · unverdicted · none · ref 4

    Small transformers reproduce known Bayesian posteriors with 10^{-3} to 10^{-4} bit accuracy in verifiable wind-tunnel tasks via residual belief states, FFN updates, and attention routing, while MLPs do not.

  • Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds · stat.ML · 2025-12-27 · unverdicted · none · ref 3

    Gradient analysis shows cross-entropy induces an EM-like loop in attention that sculpts Bayesian manifolds supporting in-context probabilistic inference.

  • Geometric Scaling of Bayesian Inference in LLMs · cs.LG · 2025-12-27 · unverdicted · none · ref 9

    Large language models preserve a geometric substrate in value representations that correlates with uncertainty and matches patterns from small models performing exact Bayesian inference.