Bayesian Sphere-on-Sphere Regression with Optimal Transport Maps

Andrew Zammit-Mangion; Jiakun Liu; Kwok-Kun Kwong; Tin Lok James Ng

arxiv: 2501.08492 · v2 · pith:BYRZUWYLnew · submitted 2025-01-14 · 📊 stat.ME · math.ST· stat.TH

Bayesian Sphere-on-Sphere Regression with Optimal Transport Maps

Tin Lok James Ng , Kwok-Kun Kwong , Jiakun Liu , Andrew Zammit-Mangion This is my paper

Pith reviewed 2026-05-23 04:50 UTC · model grok-4.3

classification 📊 stat.ME math.STstat.TH

keywords spherical regressionoptimal transportBayesian modelingsphere-on-spherepartitioninguncertainty quantificationclustering structure

0 comments

The pith

A Bayesian model partitions the sphere with optimal transport maps to fit distinct local regressions between spherical variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for sphere-on-sphere regression that avoids relying on one global mapping when relationships vary across the domain. It uses optimal transport to define partitions and places separate parametric maps inside each region, then fits everything jointly in a Bayesian model. This setup aims to identify heterogeneous regions, produce uncertainty estimates, and improve predictions on spherical data. A sympathetic reader would care because many scientific datasets involve directions or orientations where a uniform relationship fails to hold everywhere.

Core claim

The central claim is that jointly modeling spherical partitions via optimal transport maps and local parametric regression maps inside a Bayesian framework identifies heterogeneous regions on the sphere, supplies principled uncertainty quantification, and delivers strong predictive performance on real spherical data applications.

What carries the argument

Optimal transport maps that define spherical partitions, paired with distinct parametric regression maps fitted locally within each partition, all inferred jointly in a Bayesian model.

If this is right

The approach can locate regions of distinct behavior on the sphere without pre-specifying the partition boundaries.
Uncertainty estimates arise naturally from the joint posterior over partitions and local maps.
Real-data examples show interpretable clustering structure emerging from the inferred partitions.
Predictive performance improves relative to global mappings when relationships are heterogeneous.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same partitioning idea could be tested on regression problems defined on other manifolds where global maps are also known to be insufficient.
If the optimal transport cost is replaced by a simpler geometric criterion, the model might become faster while retaining similar clustering behavior.
The method supplies a concrete way to test whether a given spherical dataset truly requires multiple regimes rather than one.

Load-bearing premise

Partitioning the sphere with optimal transport maps and fitting separate regressions in each part will capture heterogeneous relationships better than any single global mapping while keeping the joint model tractable and identifiable.

What would settle it

On the same real spherical datasets, a single global spherical regression model would match or exceed the proposed method in predictive accuracy and uncertainty calibration.

read the original abstract

Spherical regression, in which both covariates and responses lie on the sphere, arises in many scientific applications and has attracted considerable methodological attention in recent years. Despite this progress, constructing flexible and expressive regression models between spherical domains remains challenging, particularly because a single global mapping is often insufficient to capture complex relationships across the entire sphere. A natural strategy is therefore to partition the spherical domain and allow distinct mappings within each region, though this introduces the additional challenge of modeling the partition structure itself. To address these issues, we propose an approach based on optimal transport to model spherical partitions, combined with parametric mappings defined locally within each region. We adopt a Bayesian framework to jointly model both the partitioning and the associated regression maps. This framework enables the identification of heterogeneous regions on the sphere while providing principled uncertainty quantification. Through real-data applications, we demonstrate that the proposed method achieves strong predictive performance, yields meaningful uncertainty estimates, and reveals interpretable clustering structure in spherical data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper uses optimal transport to drive Bayesian partitions for local sphere-on-sphere regressions, a workable way to handle heterogeneity that looks implementable but needs the full derivations checked.

read the letter

The punchline is that this work takes optimal transport maps to define partitions on the sphere and then runs separate parametric regressions inside each one, all inside a single Bayesian model. That combination is not standard in the directional statistics literature I know. It directly targets the problem that a single global map often fails when the relationship between two spherical variables changes across regions. The joint posterior over partitions and local maps is a reasonable way to get both uncertainty on the regions and on the fitted relationships. The real-data sections apparently show better predictive scores and some interpretable clusters, which is the evidence that matters for adoption. Credit is due for keeping the model parametric inside regions so that the whole thing stays computationally feasible rather than going fully nonparametric. The weakest part is the usual one for these partition-based approaches: it is not obvious how sensitive the results are to the number of regions chosen or to the particular cost function in the transport step, and the abstract gives no detail on how the sampler mixes when the partition structure is also random. If the full paper shows stable posterior behavior and reasonable run times on the examples, that concern stays minor. The math appears internally consistent from the description, with no obvious circularity or unstated identifiability problems. This paper is for people who already work with spherical or directional data in applications like meteorology, astronomy, or robotics. A reader who needs to move beyond global spherical regression will find the OT partitioning idea useful to try. It is solid enough on its own terms to deserve a serious referee rather than a desk reject, even if revisions will be needed on the computational and sensitivity checks.

Referee Report

2 major / 2 minor

Summary. The paper proposes a Bayesian sphere-on-sphere regression model that uses optimal transport maps to partition the spherical domain and fits distinct parametric regression mappings locally within each partition. A joint Bayesian framework is used to model both the partition structure and the local regressions, enabling identification of heterogeneous regions, principled uncertainty quantification, and interpretable clustering. Real-data applications are presented to demonstrate strong predictive performance relative to global alternatives.

Significance. If the joint model over OT partitions and local regressions is shown to be identifiable and computationally tractable while delivering the claimed gains in predictive accuracy and interpretability, the work would provide a flexible extension to existing spherical regression methods. The OT-based partitioning approach is a distinctive modeling choice that could support applications requiring region-specific mappings on the sphere.

major comments (2)

[§3.2, Eq. (8)] §3.2, Eq. (8): the claim that the joint posterior over partitions and local maps remains identifiable relies on the specific form of the OT cost and the prior on partition assignments; without an explicit identifiability argument or simulation study showing recovery of known partitions, it is unclear whether label-switching or degenerate partitions can occur under the stated model.
[§4.3, Table 4] §4.3, Table 4: the reported out-of-sample predictive scores for the proposed method versus the global baseline are given as point estimates only; the absence of standard errors or a formal test of improvement leaves open whether the gains are statistically distinguishable from sampling variability.

minor comments (2)

The notation distinguishing the global OT map from the local regression maps is introduced late; an early table or diagram summarizing all random quantities would improve readability.
[Figure 3] Figure 3 caption does not state the number of posterior samples used to generate the displayed credible regions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We address each major comment below and indicate planned revisions to strengthen the manuscript.

read point-by-point responses

Referee: [§3.2, Eq. (8)] §3.2, Eq. (8): the claim that the joint posterior over partitions and local maps remains identifiable relies on the specific form of the OT cost and the prior on partition assignments; without an explicit identifiability argument or simulation study showing recovery of known partitions, it is unclear whether label-switching or degenerate partitions can occur under the stated model.

Authors: We thank the referee for this observation. The manuscript does not currently contain an explicit identifiability argument or a recovery simulation. Although the OT cost and prior on assignments are intended to discourage label-switching and degeneracy, we agree that direct evidence is needed. In the revision we will add a simulation study (new subsection in §3.2) that recovers known partitions under the model and discusses conditions that prevent label-switching. revision: yes
Referee: [§4.3, Table 4] §4.3, Table 4: the reported out-of-sample predictive scores for the proposed method versus the global baseline are given as point estimates only; the absence of standard errors or a formal test of improvement leaves open whether the gains are statistically distinguishable from sampling variability.

Authors: We agree that uncertainty quantification for the predictive scores is missing. In the revised manuscript we will recompute the scores in Table 4 with standard errors obtained via repeated random train-test splits. We will also add a paired statistical test (e.g., t-test on the per-fold differences) to assess whether the improvements over the global baseline are statistically significant. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The abstract and description present a modeling strategy that partitions the sphere via optimal transport and fits local parametric regressions inside a joint Bayesian framework. No equations, parameter-fitting steps, or self-citations are exhibited that reduce any claimed prediction or uniqueness result to the inputs by construction. The central claims concern empirical performance on real data, which are presented as external validation rather than internal derivations. The derivation chain is therefore self-contained against the supplied material.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Reviewed from abstract only; the method presupposes standard Bayesian posterior inference, the existence of a well-defined optimal transport map between spherical measures, and the adequacy of local parametric families inside each transport-defined region. No explicit free parameters, axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5704 in / 1152 out tokens · 25663 ms · 2026-05-23T04:50:39.721887+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose casting spherical regression as a problem of optimal transport within a Bayesian framework... Factorization Model for Sphere-on-Sphere Regression (FMSOS) f(x) = R ◦ Sν(x)
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 2... posterior contraction rate ϵn = n^{-τ/(2τ+s)} (log n)^t with τ=1/9

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.