pith. machine review for the scientific record.

arxiv: 2602.02315 · v2 · submitted 2026-02-02 · 💻 cs.CL

Recognition: 2 theorem links

The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 08:11 UTC · model grok-4.3

classification 💻 cs.CL
keywords representation manifolds · belief geometry · linear steering · in-context learning · parameter posteriors · geometry-aware interventions · language model representations

The pith

Parameter posteriors in language models are encoded as curved manifolds in representation space rather than linear structures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how large language models form implicit beliefs as posteriors over latent variables by using a controlled task in which Llama-3.2 infers the parameters of a normal distribution from in-context samples. It shows that these posteriors appear as curved manifolds in representation space that evolve along the prompt, and that standard linear steering pushes representations off these manifolds to produce unintended coupled changes across parameters. Geometry-aware interventions, including linear field probing, keep changes within the target belief family. This setup demonstrates that LLM beliefs behave as geometric objects whose structure must be respected for precise control. If the finding holds, current linear abstractions of model internals become inadequate for describing or editing beliefs.

Core claim

In the controlled in-context learning setting, Llama-3.2 encodes parameter posteriors as curved manifolds in representation space. These manifolds trace the evolution of beliefs as evidence is added through the prompt. Linear steering moves points off the manifold and induces unintended shifts in multiple parameters simultaneously. Geometry-aware methods such as linear field probing tile the manifold and produce interventions that stay within the intended posterior family. The results indicate that LLM beliefs are inherently geometric and that globally linear representations often fail as abstractions.
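To make the controlled task concrete, the following is a minimal sketch (not taken from the paper) of the reference Bayesian belief the setup implies: a conjugate Normal-Inverse-Gamma posterior over (μ, σ²) updated sample by sample, tracing the kind of belief trajectory the manifolds are claimed to encode. The prior hyperparameters and data-generating values are illustrative assumptions.

```python
# Sketch of the reference belief in the controlled task: a Normal-Inverse-Gamma
# posterior over (mu, sigma^2) updated one in-context sample at a time.
import numpy as np

def nig_update(mu0, kappa0, alpha0, beta0, x):
    """One conjugate update of a Normal-Inverse-Gamma posterior with a new sample x."""
    kappa1 = kappa0 + 1.0
    mu1 = (kappa0 * mu0 + x) / kappa1
    alpha1 = alpha0 + 0.5
    beta1 = beta0 + kappa0 * (x - mu0) ** 2 / (2.0 * kappa1)
    return mu1, kappa1, alpha1, beta1

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=16)      # the "in-context" evidence (assumed values)

state = (0.0, 1.0, 2.0, 1.0)                            # assumed weak prior (mu0, kappa0, alpha0, beta0)
trajectory = []                                          # the belief path the prompt traces out
for x in samples:
    state = nig_update(*state, x)
    mu_n, kappa_n, alpha_n, beta_n = state
    trajectory.append((mu_n, beta_n / (alpha_n - 1.0)))  # posterior means of mu and sigma^2

print(trajectory[-1])  # drifts toward (2.0, 0.25) as evidence accumulates
```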

What carries the argument

Curved manifolds in representation space that encode parameter posteriors, identified and navigated via linear field probing to enable geometry-respecting interventions.
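The paper's linear field probing is not specified in the material above, so the following is only an illustrative stand-in for a geometry-respecting edit: estimate a local tangent basis from nearby activations and project the steering vector onto it before applying, under the assumption that the manifold is locally well approximated by a low-dimensional tangent plane. The function names, neighborhood size k, and tangent dimension are hypothetical.

```python
import numpy as np

def tangent_basis(points, center, k=32, dim=2):
    """Estimate a local tangent basis at `center` from its k nearest activation points."""
    dists = np.linalg.norm(points - center, axis=1)
    neighbors = points[np.argsort(dists)[:k]] - center
    # Leading right-singular vectors of the centered neighborhood approximate the tangent space.
    _, _, vt = np.linalg.svd(neighbors, full_matrices=False)
    return vt[:dim]                                   # shape (dim, hidden_size)

def on_manifold_steer(h, steering_vec, basis):
    """Keep only the tangent-plane component of a steering vector before adding it to h."""
    return h + basis.T @ (basis @ steering_vec)
```

A plain linear steering baseline would add steering_vec directly; the contrast the paper draws is between that raw addition and an edit constrained to locally estimated tangent directions.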

If this is right

  • Geometry-aware interventions can alter one belief parameter without coupled changes to others.
  • Belief updates follow paths along the manifold as new evidence enters the prompt.
  • Globally linear steering directions are insufficient to isolate or control specific posteriors.
  • Representation space must be tiled locally rather than assumed flat for effective editing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Manifold structure may extend to other belief types such as factual or ethical posteriors, requiring similar geometry-aware tools.
  • Precise editing for alignment or unlearning could be improved by first mapping the relevant manifold before applying interventions.
  • The finding raises the possibility that curvature is a general feature of how evidence accumulates in transformer activations.

Load-bearing premise

The controlled in-context learning of normal distribution parameters by Llama-3.2 is representative of general belief formation and representation geometry across LLMs and tasks.

What would settle it

A demonstration that linear steering vectors in the same parameter-inference setup produce changes that remain strictly within the target belief family without measurable off-manifold displacement would falsify the claim.
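One way to operationalize that test, sketched below under the assumption that the manifold is locally approximated by a tangent plane fit to nearby activations: measure the component of a steering edit that leaves that plane. A displacement indistinguishable from zero across steered points, together with no coupled parameter shifts, would count against the claim. All names and defaults here are illustrative.

```python
import numpy as np

def off_manifold_displacement(h_steered, h_anchor, points, k=32, dim=2):
    """Norm of the part of an edit (h_steered - h_anchor) outside the locally estimated tangent plane."""
    dists = np.linalg.norm(points - h_anchor, axis=1)
    neighbors = points[np.argsort(dists)[:k]] - h_anchor
    _, _, vt = np.linalg.svd(neighbors, full_matrices=False)
    basis = vt[:dim]                                  # local tangent directions, shape (dim, hidden_size)
    delta = h_steered - h_anchor
    residual = delta - basis.T @ (basis @ delta)      # off-tangent component of the edit
    return np.linalg.norm(residual)
```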

read the original abstract

Large language models (LLMs) form implicit beliefs (posteriors over latent variables) from prompts, but we lack a mechanistic account of how these beliefs are encoded in representation space, how they update with new evidence, and how interventions reshape them. We study a controlled setting in which Llama-3.2 infers the parameters of a normal distribution from in-context samples. We show that parameter posteriors are encoded as curved manifolds in representation space, and trace how they evolve along the prompt. Standard linear steering moves representations off-manifold, inducing unintended, coupled changes, whereas geometry-aware methods preserve the target belief family. Our work demonstrates an example of linear field probing (LFP) as a principled approach to tile the data manifold and make interventions that respect the underlying geometry. Our results suggest that LLM beliefs are inherently geometric objects, and that globally linear representations are often inadequate abstractions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript studies how LLMs encode implicit beliefs as posteriors over latent variables. In a controlled task, Llama-3.2 infers the parameters of a normal distribution from in-context samples; the authors report that these posteriors appear as curved manifolds in representation space, trace their evolution along the prompt, and show that standard linear steering moves points off-manifold and produces coupled unintended changes, whereas geometry-aware interventions (via the introduced linear field probing method) preserve the target belief family. The work concludes that LLM beliefs are inherently geometric objects and that globally linear representations are often inadequate.

Significance. If the empirical observations hold, the paper would supply a concrete mechanistic example of non-linear structure in LLM belief representations and a practical method (LFP) for geometry-respecting interventions. The controlled normal-distribution task allows direct tracing of dynamics, which is a strength, and the contrast between linear and geometry-aware steering has clear implications for interpretability and model editing.

major comments (3)
  1. [Abstract] The claim that 'parameter posteriors are encoded as curved manifolds' is presented without any quantitative metrics (e.g., estimated manifold dimension, curvature statistics, or comparison to linear baselines), error bars, or data-exclusion criteria, so the strength of the central geometric claim cannot be evaluated from the provided text.
  2. [Discussion/Conclusion] The generalization that 'LLM beliefs are inherently geometric objects' and that linear representations are 'often inadequate' rests on a single generative process (in-context inference of Normal(μ,σ) parameters) and a single model family; no ablations across other distributions, non-distributional belief tasks, or additional models are reported, leaving the scope of the geometry claim untested.
  3. [Methods] The linear field probing (LFP) procedure is introduced as a way to 'tile the data manifold,' but no explicit equations, algorithmic steps, or comparison to standard linear probing are supplied, making it impossible to verify how LFP enforces on-manifold interventions or differs from existing techniques.
minor comments (2)
  1. All figures showing manifolds or trajectories should include quantitative annotations (e.g., curvature estimates, distance-to-manifold metrics) and error bars or confidence regions.
  2. Clarify the precise definition of 'representation space' (layer, token position, pooling method) and how posterior samples are mapped to points on the claimed manifold.
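Minor comment 1 asks for curvature annotations on the trajectory figures. As one concrete example of such a metric (not taken from the paper), the sketch below computes a discrete curvature proxy: the turning angle between successive steps of a belief trajectory, where the trajectory is assumed to be a sequence of per-token representations or their low-dimensional projections.

```python
import numpy as np

def turning_angles(trajectory):
    """Angle (radians) between successive steps of a trajectory of shape (T, d)."""
    steps = np.diff(np.asarray(trajectory), axis=0)
    angles = []
    for a, b in zip(steps[:-1], steps[1:]):
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.array(angles)   # near-zero everywhere would indicate an essentially straight path
```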

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight opportunities to strengthen the quantitative support, clarify scope, and improve methodological transparency. We address each major comment below and will incorporate revisions in the next version of the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The claim that 'parameter posteriors are encoded as curved manifolds' is presented without any quantitative metrics (e.g., estimated manifold dimension, curvature statistics, or comparison to linear baselines), error bars, or data-exclusion criteria, so the strength of the central geometric claim cannot be evaluated from the provided text.

    Authors: We agree that the geometric claims require quantitative backing. In the revision we will add intrinsic dimension estimates (via PCA and Isomap), local curvature statistics from Hessian approximations, direct comparisons against linear baselines, error bars across random seeds, and explicit data-exclusion criteria tied to posterior convergence diagnostics. revision: yes

  2. Referee: [Discussion/Conclusion] The generalization that 'LLM beliefs are inherently geometric objects' and that linear representations are 'often inadequate' rests on a single generative process (in-context inference of Normal(μ,σ) parameters) and a single model family; no ablations across other distributions, non-distributional belief tasks, or additional models are reported, leaving the scope of the geometry claim untested.

    Authors: We accept that the current evidence is confined to one controlled task and model family. The revised discussion and conclusion will explicitly qualify the generalization as a mechanistic case study, add a limitations subsection, and outline future extensions. Comprehensive ablations across tasks and models lie beyond the present scope and will be noted as such. revision: partial

  3. Referee: [Methods] The linear field probing (LFP) procedure is introduced as a way to 'tile the data manifold,' but no explicit equations, algorithmic steps, or comparison to standard linear probing are supplied, making it impossible to verify how LFP enforces on-manifold interventions or differs from existing techniques.

    Authors: We apologize for the omission. The revised methods section will supply the full optimization equations for LFP, a pseudocode algorithm, and a side-by-side comparison with standard linear probing that shows how the tangent-field constraint preserves manifold geometry. revision: yes
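The first response promises intrinsic dimension estimates via PCA and Isomap. As a hedged sketch of what such a statistic might look like (not the authors' procedure), the snippet below counts the principal components needed to explain a fixed fraction of variance in a cloud of representation points; the 0.95 threshold is an assumption.

```python
import numpy as np

def pca_intrinsic_dim(points, var_threshold=0.95):
    """Smallest number of principal components explaining `var_threshold` of the variance."""
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
    return int(np.searchsorted(explained, var_threshold) + 1)
```

Comparing this linear estimate with a nonlinear one (e.g., Isomap, as the response mentions) is one way to surface curvature: a curved low-dimensional manifold inflates the PCA count relative to the nonlinear estimate.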

Circularity Check

0 steps flagged

The empirical manifold observations do not rest on derivations that reduce to fitted inputs or self-citations.

full rationale

The paper presents its core claims—that posteriors over normal-distribution parameters are encoded as curved manifolds in representation space, that linear steering induces off-manifold artifacts, and that geometry-aware interventions preserve the target family—as direct empirical findings from controlled in-context learning experiments on Llama-3.2. No equations, fitted parameters, or self-citations are shown that reduce these observations to the same data by construction or that import uniqueness results from the authors' prior work to force the geometry-aware conclusion. The derivation chain remains self-contained against external benchmarks, with the representativeness limitation noted separately as a generalization concern rather than a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that implicit beliefs are posteriors over latent variables and introduces the manifold geometry as the key descriptive entity without independent evidence outside the study.

axioms (1)
  • domain assumption LLMs form implicit beliefs (posteriors over latent variables) from prompts
    Opening sentence of the abstract states this as the premise under study.
invented entities (1)
  • representation manifolds of posteriors (no independent evidence)
    purpose: To describe the curved geometry in which parameter beliefs are encoded
    Introduced as the load-bearing descriptive object; no falsifiable handle outside the reported experiments is given.

pith-pipeline@v0.9.0 · 5491 in / 1202 out tokens · 32115 ms · 2026-05-16T08:11:58.952590+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Foundation/BranchSelection.lean RCLCombiner_isCoupling_iff echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    parameter posteriors are encoded as curved manifolds in representation space... Standard linear steering moves representations off-manifold, inducing unintended, coupled changes, whereas geometry-aware methods preserve the target belief family

  • Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    linear field probes... tile the manifold... duality between encoding geometry and separability

What do these tags mean?
echoes
The paper passage shares the mathematical shape or conceptual pattern of a canon theorem but is not a direct formal dependency.
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.