Rao-Blackwellized Score Matching on Manifolds

Divit Rawal

arxiv: 2605.25567 · v2 · pith:K5NPMOM3new · submitted 2026-05-25 · 📊 stat.ML · cs.LG

Pith reviewed 2026-06-29 20:44 UTC · model grok-4.3

keywords: denoising score matching · manifolds · Rao-Blackwellization · Riemannian score · Weingarten map · Ricci operator · score-based models

The pith

Conditioning on the nearest-point projection removes the singularity in the tangent denoising target for score matching on manifolds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses denoising score matching when data lies on a manifold embedded in a higher-dimensional space, where ambient Gaussian noise creates a singular tangent target whose variance blows up as the noise level goes to zero. Taking the conditional expectation of this target given the nearest-point projection onto the manifold removes the singularity, and the result is shown to be the unique L²-optimal predictor among estimators that depend only on the projection. This canonical target is then expanded for small noise and shown to match the intrinsic Riemannian score plus explicit second-order corrections: an intrinsic Tweedie term and an extrinsic curvature term involving the Weingarten and Ricci operators. The approach recovers standard lower-dimensional DSM when the manifold is flat and simplifies in specific ways on spheres.
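The variance blow-up described above can be seen in a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not the paper's code: it takes the raw tangent target to be the projection of (Z − X)/σ² onto the tangent space at π(X), uses the unit sphere S² with π(x) = x/‖x‖, and draws Z uniformly on the sphere (the paper's Figure 1 uses a von Mises-Fisher latent instead). The function name and parameters are hypothetical.

```python
import math
import random

def raw_target_second_moment(sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of E||T_sigma||^2 on S^2.

    Assumption (not taken from the paper's text): T_sigma is the component of
    (Z - X) / sigma^2 tangent to the sphere at pi(X) = X / ||X||.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Latent Z uniform on S^2 (assumed here for simplicity).
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        g_norm = math.sqrt(sum(v * v for v in g))
        z = [v / g_norm for v in g]
        # Ambient Gaussian corruption X = Z + sigma * eps.
        x = [zi + sigma * rng.gauss(0.0, 1.0) for zi in z]
        x_norm = math.sqrt(sum(v * v for v in x))
        u = [v / x_norm for v in x]  # pi(X), also the unit normal at pi(X)
        diff = [(zi - xi) / sigma ** 2 for zi, xi in zip(z, x)]
        radial = sum(di * ui for di, ui in zip(diff, u))
        tangent = [di - radial * ui for di, ui in zip(diff, u)]
        total += sum(t * t for t in tangent)
    return total / n_samples

for sigma in (0.2, 0.1, 0.05):
    # sigma^2 * E||T_sigma||^2 should stay near d = 2 as sigma shrinks.
    print(sigma, sigma ** 2 * raw_target_second_moment(sigma))
```

Under these assumptions the rescaled second moment stays close to d = 2, so the unscaled second moment grows like d/σ², which is the divergence the conditioning step is meant to remove.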

Core claim

Conditioning on the nearest-point projection π(X) yields the unique L²-optimal Rao-Blackwellized predictor of the tangent DSM target; its small-noise expansion equals the intrinsic Riemannian score up to an explicit order-σ² correction decomposing into an intrinsic Tweedie term and an extrinsic curvature term involving the Weingarten and Ricci operators.

What carries the argument

The conditional expectation of the tangent DSM target given the nearest-point projection π(X), which serves as a sufficient statistic to eliminate the normal-fiber noise singularity.
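The reason conditioning gives the L²-optimal predictor can be written in one line. The block below is a sketch of the standard argument, using T_σ for the tangent DSM target and r_σ for its conditional expectation (symbols borrowed from the figure captions); g is a hypothetical competing estimator introduced only for illustration, and T_σ is assumed square-integrable at fixed σ > 0.

```latex
% Notation assumed for illustration: T_\sigma is the tangent DSM target,
% r_\sigma(\pi(X)) := \mathbb{E}[T_\sigma \mid \pi(X)], and g is any
% square-integrable function of the projection \pi(X).
\[
\mathbb{E}\,\bigl\|T_\sigma - g(\pi(X))\bigr\|^2
  = \mathbb{E}\,\bigl\|T_\sigma - r_\sigma(\pi(X))\bigr\|^2
  + \mathbb{E}\,\bigl\|r_\sigma(\pi(X)) - g(\pi(X))\bigr\|^2 .
\]
```

The cross term vanishes by the tower property of conditional expectation. The second term on the right is zero only when g = r_σ almost surely, which gives both optimality and uniqueness among estimators depending on π(X) alone.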

If this is right

In the flat case the construction reduces exactly to ordinary lower-dimensional Gaussian DSM.
On the sphere S^d the extrinsic correction simplifies to the scalar factor (1-d/2) times the gradient of log q on the manifold.
This extrinsic σ² correction cancels identically on S², though the intrinsic Tweedie term remains.
The resulting estimator is the unique L²-optimal one among all that depend only on the projected observation π(X).
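The sphere coefficient quoted above is simple enough to tabulate. The helper below is a minimal sketch (the function name is hypothetical) that evaluates the factor 1 − d/2 and shows it vanishing at d = 2.

```python
def alpha_ext_sphere(d: int) -> float:
    """Extrinsic sigma^2 coefficient on S^d as quoted above: 1 - d/2."""
    if d < 1:
        raise ValueError("sphere dimension d must be at least 1")
    return 1.0 - d / 2.0

for d in (1, 2, 3, 4):
    # The coefficient is positive on S^1, zero on S^2, negative for d >= 3.
    print(d, alpha_ext_sphere(d))
```

The sign change at d = 2 means the extrinsic term inflates the score on the circle and shrinks it on spheres of dimension three and higher.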

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practitioners could use the projected observations directly in score estimation to achieve lower variance without needing the full ambient data.
The explicit curvature correction terms suggest ways to adjust score matching objectives for specific manifold geometries like spheres or other symmetric spaces.
Extensions might include deriving similar Rao-Blackwellized targets for other noise models or higher-order expansions.

Load-bearing premise

The latent distribution is supported on a smooth embedded manifold that admits a well-defined nearest-point projection in a tubular neighborhood.
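For a concrete instance of this premise, consider the unit sphere: every point other than the origin has a unique nearest point x/‖x‖ on the sphere, so the projection is well-defined on any tube of radius below 1. The sketch below assumes this setting; `project_to_unit_sphere` and its `tube_radius` parameter are hypothetical names chosen for illustration.

```python
import math

def project_to_unit_sphere(x, tube_radius=0.5):
    """Nearest-point projection onto the unit sphere, restricted to a tube.

    tube_radius is an illustrative choice; any value below 1 keeps the
    origin, where the nearest point is not unique, outside the tube.
    """
    if not 0.0 < tube_radius < 1.0:
        raise ValueError("tube_radius must lie strictly between 0 and 1")
    norm = math.sqrt(sum(v * v for v in x))
    if abs(norm - 1.0) >= tube_radius:
        raise ValueError("point lies outside the tubular neighborhood")
    return [v / norm for v in x]

print(project_to_unit_sphere([0.0, 0.0, 1.2]))
```

Points far from the manifold are rejected rather than projected, which mirrors the role of the tubular-neighborhood hypothesis.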

What would settle it

For a known distribution on the sphere, compute the conditional expectation numerically at small but finite σ and check whether its deviation from the intrinsic score matches the predicted combination of Tweedie and curvature corrections.
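A check of this kind is easiest to set up on the circle S¹, where the conditional expectation reduces to a two-dimensional integral. The sketch below is an illustration under stated assumptions, not the paper's experiment: the latent density is von Mises, q(θ) ∝ exp(κ cos θ); the raw target is taken to be the tangent component at π(X) of (Z − X)/σ²; and a plain midpoint rule stands in for the Gauss-Hermite quadrature mentioned in the figure captions. It only reports the deviation from the intrinsic score and does not encode the paper's predicted coefficients.

```python
import math

def rb_target_circle(theta0, sigma, kappa, n=400, width=8.0):
    """Midpoint-rule estimate of E[T_sigma | pi(X) = theta0] on the unit circle.

    Assumptions: q(theta) is proportional to exp(kappa * cos(theta)), and
    T_sigma is the tangent component at pi(X) of (Z - X) / sigma^2.
    X is written in polar form (rho, theta0); delta = theta - theta0.
    """
    lo_rho = max(1e-6, 1.0 - width * sigma)
    hi_rho = 1.0 + width * sigma
    d_rho = (hi_rho - lo_rho) / n
    d_delta = 2.0 * width * sigma / n
    num = 0.0
    den = 0.0
    for i in range(n):
        rho = lo_rho + (i + 0.5) * d_rho
        for j in range(n):
            delta = -width * sigma + (j + 0.5) * d_delta
            # Squared distance between X = rho * u0 and Z = z(theta0 + delta).
            sq_dist = rho * rho - 2.0 * rho * math.cos(delta) + 1.0
            log_w = kappa * math.cos(theta0 + delta) - sq_dist / (2.0 * sigma ** 2)
            w = rho * math.exp(log_w)  # rho is the polar Jacobian of X
            num += w * math.sin(delta) / sigma ** 2  # tangent part of (Z - X)/sigma^2
            den += w
    return num / den  # grid-cell areas cancel in the ratio

theta0, kappa = 1.0, 2.0
intrinsic_score = -kappa * math.sin(theta0)  # d/dtheta of log q at theta0
for sigma in (0.1, 0.05):
    r = rb_target_circle(theta0, sigma, kappa)
    print(sigma, r, r - intrinsic_score)
```

Dividing the printed deviation by σ² gives an empirical second-order coefficient that could then be compared against the paper's Tweedie and curvature terms.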

Figures

Figures reproduced from arXiv: 2605.25567 by Divit Rawal.

**Figure 1.** Variance collapse on S² under vMF(µ, κ=2). Second moment of the raw target T_σ (black, slope −2 in log σ, matching d/σ² with d = 2) versus the Rao-Blackwellized target r_σ(π(X)) (blue, flat at the theoretical E‖∇_M log q‖²). The gap is the irreducible d/σ² Bayes-risk floor of Section 4.3.

**Figure 2.** Extrinsic coefficient α_ext across manifolds. Quadrature estimates (one dot per σ ∈ {0.05, 0.06, 0.08}) are computed by Gauss-Hermite quadrature of r_σ(z) and compared to the predicted coefficients α_d = 1 − d/2 on S^d and α = +1/2 on T². Latent density q is von Mises-Fisher with κ = 2 on each S^d and a wrapped Gaussian with κ = 1.5 on T².

**Figure 3.** (a) Densities of z · µ at the equilibrium of three closed-form Langevin drifts (σ = 0.3): the intrinsic score ∇_M log q (black), the raw ambient-DSM drift (1 + σ²α_d)∇_M log q with α_d = 1 − d/2 (orange), and its Theorem 5.4 debias (1 − σ²α_d)(1 + σ²α_d)∇_M log q (blue). (b) Score MSE of the same MLP architecture and training budget regressing on the raw T_{σ,i} target (orange) versus the Rao-Blackwellized rb_{σ,i} targ…

Original abstract

We study denoising score matching (DSM) when the latent distribution is supported on a smooth embedded manifold $M \subset \mathbb{R}^D$. Under ambient Gaussian corruption, the tangent denoising target contains a singular normal-fiber noise channel whose variance diverges as $d/\sigma^2$ as $\sigma \to 0^+$. We show that conditioning on the nearest-point projection $\pi(X)$ canonically removes this singularity: the resulting conditional expectation is the unique $L^2$-optimal Rao-Blackwellized predictor of the tangent DSM target among all estimators depending only on the projected observation $\pi(X)$. We then compute the small-noise expansion of this canonical target and show that it equals the intrinsic Riemannian score up to an explicit order-$\sigma^2$ correction that decomposes into an intrinsic Tweedie term and an extrinsic curvature term involving the Weingarten and Ricci operators. In the flat case, the construction reduces exactly to ordinary lower-dimensional Gaussian DSM, while on $S^d$ the extrinsic correction simplifies to the scalar factor $(1-d/2)\nabla_M \log q$; this extrinsic $\sigma^2$ correction cancels identically on $S^2$, though the intrinsic Tweedie term remains.
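The flat-case reduction mentioned in the abstract can be checked numerically in a toy setting. The sketch below assumes M is the first coordinate axis of R², a Gaussian latent Z₁ ~ N(0, s²), and the tangent target (Z₁ − X₁)/σ²; these choices and all names are illustrative rather than taken from the paper. Because everything is Gaussian, the conditional expectation given π(X) = X₁ is linear, so a least-squares slope through the origin recovers it.

```python
import random

def flat_case_slope(s=1.0, sigma=0.5, n_samples=50000, seed=0):
    """Least-squares slope of the tangent DSM target on the projected point.

    Toy setting (assumed): M is the first axis of R^2, Z_1 ~ N(0, s^2),
    X = Z + sigma * eps. The projection pi(X) keeps only X_1, so the normal
    coordinate X_2 never enters and is not sampled.
    """
    rng = random.Random(seed)
    sum_xx = 0.0
    sum_xt = 0.0
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, s)
        x1 = z1 + sigma * rng.gauss(0.0, 1.0)
        target = (z1 - x1) / sigma ** 2  # tangent component of (Z - X)/sigma^2
        sum_xx += x1 * x1
        sum_xt += x1 * target
    return sum_xt / sum_xx

s, sigma = 1.0, 0.5
# Score of the smoothed density N(0, s^2 + sigma^2) is -x / (s^2 + sigma^2).
print(flat_case_slope(s, sigma), -1.0 / (s ** 2 + sigma ** 2))
```

The fitted slope should land close to −1/(s² + σ²), the score slope of the one-dimensional Gaussian convolution, which is what ordinary lower-dimensional DSM would target.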

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Conditioning on nearest-point projection removes the singularity in manifold DSM and yields an explicit split of the σ² correction into Tweedie and Weingarten/Ricci terms.

This paper shows how to Rao-Blackwellize the tangent denoising score matching target on an embedded manifold by conditioning on the nearest-point projection. That step removes the singular normal-fiber noise, and the resulting target has a clean small-noise expansion that splits into an intrinsic Tweedie piece plus extrinsic curvature terms from the Weingarten and Ricci operators.

The new part is the explicit decomposition and the claim that the conditional expectation is the unique L²-optimal predictor depending only on the projection. The reduction to the flat case and the simplification on the sphere are both useful checks. The construction looks consistent with standard differential geometry and small-noise asymptotics.

The main assumption is that the manifold admits a tubular neighborhood in which the nearest-point projection is well-defined and unique. The stress-test note flags this correctly; without it the conditional expectation and the curvature operators are not guaranteed to exist as stated. If the paper only treats cases where this holds globally, that should be stated up front. The derivations themselves are not visible in the abstract, so a referee would want to see the steps for the uniqueness claim and the expansion.

This work is aimed at people building score-based generative models for data on manifolds. A reader who already knows DSM and basic Riemannian geometry will find the formulas directly usable.

It is worth sending to peer review. The idea addresses a concrete technical problem in the area and the geometric expansion is worth checking in detail.

Referee Report

1 major / 2 minor

Summary. The manuscript develops Rao-Blackwellized denoising score matching (DSM) for distributions supported on a smooth embedded manifold M ⊂ R^D. Under ambient Gaussian noise, the tangent DSM target has a singular normal-fiber component whose variance diverges as d/σ². Conditioning on the nearest-point projection π(X) removes this singularity, yielding the unique L²-optimal predictor of the tangent DSM target among all estimators that depend only on π(X). The small-noise expansion of this conditional target equals the intrinsic Riemannian score plus an explicit O(σ²) correction that decomposes into an intrinsic Tweedie term and an extrinsic curvature term involving the Weingarten and Ricci operators. The construction recovers ordinary lower-dimensional Gaussian DSM in the flat case and simplifies on S^d, with the extrinsic correction vanishing identically on S².

Significance. If the derivations are correct, the work supplies a canonical, projection-based route to score matching on manifolds that eliminates the ambient-space singularity while furnishing an explicit, geometrically interpretable bias correction. The uniqueness claim, the decomposition into Tweedie and curvature contributions, and the exact recovery of known special cases (flat space, S^d) are substantive contributions that could guide the design of manifold-aware score estimators with quantifiable finite-noise error.

major comments (1)

[Setup and main assumption (abstract and §2)] The tubular-neighborhood assumption (stated in the abstract and required for π to be a smooth retraction and for the Weingarten map to be defined) is load-bearing for both the singularity removal and the curvature correction. The manuscript should state the precise regularity conditions on M that guarantee a tubular neighborhood of uniform radius in which π is C^∞, and verify that the conditional expectation E[tangent DSM target | π(X)] is well-defined as a function on M under these conditions.

minor comments (2)

[Small-noise expansion (likely §4)] In the expansion statement, clarify whether the O(σ²) remainder is uniform in the base point or only pointwise; this affects the practical utility of the correction term.
[Sphere example] The S^d example would be strengthened by an explicit computation of the Weingarten and Ricci contributions to confirm the claimed cancellation on S².
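For orientation on the sphere comment, the standard curvature data of the unit sphere are collected below. This is a sketch of textbook facts only; how the paper combines them into the coefficient α_d = 1 − d/2 is not reproduced here, and the symbol W_ν for the Weingarten map is an assumed notation.

```latex
% Standard facts for the unit sphere S^d \subset \mathbb{R}^{d+1} with outward
% unit normal \nu(z) = z. The sign of the Weingarten map depends on the
% author's convention; W_\nu is notation assumed for this sketch.
\[
W_\nu = \pm\,\mathrm{Id}_{T_z S^d},
\qquad
\mathrm{Ric} = (d-1)\,\mathrm{Id}_{T_z S^d},
\qquad
\alpha_d = 1 - \tfrac{d}{2}
\;\Longrightarrow\;
\alpha_2 = 0 .
\]
```

Both operators are scalar multiples of the identity on the tangent space, which is consistent with the extrinsic correction collapsing to a scalar factor times ∇_M log q.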

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation and for identifying the need to make the regularity assumptions fully explicit. We address the single major comment below.

Referee: [Setup and main assumption (abstract and §2)] The tubular-neighborhood assumption (stated in the abstract and required for π to be a smooth retraction and for the Weingarten map to be defined) is load-bearing for both the singularity removal and the curvature correction. The manuscript should state the precise regularity conditions on M that guarantee a tubular neighborhood of uniform radius in which π is C^∞, and verify that the conditional expectation E[tangent DSM target | π(X)] is well-defined as a function on M under these conditions.

Authors: We agree that the tubular-neighborhood hypothesis is central and that its precise statement should appear in the main text. In the revision we will add to §2 the following conditions: M is a compact, boundaryless C^∞ embedded submanifold of R^D with positive reach. These conditions guarantee a uniform tubular radius r>0 on which the nearest-point projection π is C^∞. Under the same hypotheses the conditional expectation E[tangent DSM target | π(X)] is well-defined and C^∞ as a map on M, because the ambient Gaussian density is positive everywhere, π is a smooth retraction, and the tangent DSM target remains integrable with respect to the conditional law of the normal fiber (a short integrability argument will be supplied in an appendix paragraph). revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper derives the Rao-Blackwellized target via conditioning on the nearest-point projection and a small-noise asymptotic expansion using standard differential geometry (Weingarten map, Ricci operators). The uniqueness claim invokes the standard L2 property of conditional expectation, which is an external theorem and does not reduce the result to a self-definition or fitted input. No self-citations, parameter fitting presented as prediction, or ansatz smuggling appear. The tubular-neighborhood assumption is stated explicitly as a prerequisite rather than derived internally. The construction is self-contained against external benchmarks in probability and Riemannian geometry.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The construction rests on standard domain assumptions from differential geometry and statistics; no free parameters are introduced and no new entities are postulated.

axioms (2)

domain assumption: M is a smooth embedded submanifold of R^D admitting a nearest-point projection in a tubular neighborhood. Required for π(X) to be well-defined and for the Weingarten map and curvature operators to exist.
domain assumption: The noise model is isotropic ambient Gaussian corruption. Standard DSM noise model used to derive the tangent target and its singularity.
