TopoGeoScore: A Self-Supervised Source-Only Geometric Framework for OOD Checkpoint Selection

Ali Zia; Farid Hazratian; Hien Duy Nguyen

arxiv: 2605.08870 · v2 · pith:ZXMK4WUXnew · submitted 2026-05-09 · 💻 cs.LG · math.AT · math.DG

Farid Hazratian, Ali Zia, Hien Duy Nguyen

Pith reviewed 2026-05-12 01:03 UTC · model grok-4.3

classification 💻 cs.LG · math.AT · math.DG

keywords OOD robustness · checkpoint selection · self-supervised learning · topological data analysis · geometric scoring · source-only · distribution shift · k-NN graphs

The pith

Source embeddings encode global, local, and topological signals that identify which checkpoints will remain accurate under distribution shift.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TopoGeoScore to select model checkpoints likely to generalize well on unseen target domains when only source data is available. It builds class-conditional k-nearest-neighbor graphs from source embeddings and extracts three families of features: a global measure of manifold complexity from the reduced Laplacian, local regularity from Ollivier-Ricci curvature, and higher-order topological summaries of connectivity and loops. These features are combined into a non-negative linear score whose weights are learned by a self-supervised objective that keeps the score stable under geometry-preserving transformations while penalizing structure-breaking ones. If the central claim holds, practitioners can rank and pick checkpoints before any target samples arrive, turning an otherwise blind deployment choice into a data-driven ranking based on intrinsic source geometry.
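
The graph-construction step can be illustrated with a minimal sketch. It assumes Euclidean distance on the embeddings and a brute-force neighbour search; the function names `mutual_knn_adjacency` and `class_conditional_graphs` and the default `k=5` are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def mutual_knn_adjacency(X, k):
    """Boolean adjacency: i and j are linked only if each is among the other's k nearest."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances (assumed metric, brute force).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbour
    k = min(k, n - 1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    directed = np.zeros((n, n), dtype=bool)
    directed[np.repeat(np.arange(n), k), nearest.ravel()] = True
    # Keep an edge only when it appears in both directions.
    return directed & directed.T

def class_conditional_graphs(embeddings, labels, k=5):
    """One mutual k-NN graph per class, built from that class's embeddings only."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    return {
        c.item(): mutual_knn_adjacency(embeddings[labels == c], k)
        for c in np.unique(labels)
    }
```

The mutual condition makes the graph sparser than an ordinary k-NN graph, so isolated points and fragmented classes show up as missing edges rather than being forced into a neighbourhood.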

Core claim

Given a trained checkpoint, class-conditional mutual k-NN graphs constructed from its source embeddings yield three complementary signals: a torsion-inspired reduced Laplacian log-determinant that quantifies global class-manifold complexity, Ollivier-Ricci curvature that quantifies local neighborhood regularity, and persistent-homology summaries that capture fragmented connectivity, loops, and global-local inconsistency. These signals are assembled into an interpretable non-negative linear score whose coefficients are learned by a self-supervised objective enforcing invariance to approximately geometry-preserving embedding views and separation from structure-breaking views. The resulting Top

What carries the argument

TopoGeoScore, a learned non-negative linear combination of global manifold complexity, local curvature, and higher-order topological invariants extracted from class-conditional k-NN graphs on source embeddings.
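
A non-negative linear score of this kind can be sketched as follows. The feature values and weights below are made-up placeholders, and the helper names `linear_score` and `rank_checkpoints` are hypothetical; in the paper the weights are learned by the self-supervised objective, which this sketch does not implement. The ranking direction follows the Figure 1 caption, which describes the score as lower-is-better.

```python
import numpy as np

def linear_score(features, weights):
    """Score each checkpoint as a non-negative weighted sum of its features."""
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    return features @ weights

def rank_checkpoints(features, weights):
    """Return checkpoint indices from best to worst, assuming lower score is better."""
    scores = linear_score(features, weights)
    return np.argsort(scores, kind="stable"), scores

# Placeholder data: rows are checkpoints, columns are
# (global complexity, local irregularity, topological inconsistency).
example_features = np.array([[3.0, 1.0, 2.0],
                             [1.0, 1.0, 1.0],
                             [2.0, 2.0, 2.0]])
example_weights = np.array([0.5, 0.3, 0.2])  # illustrative, not learned
order, scores = rank_checkpoints(example_features, example_weights)
```

Because every weight is non-negative, each feature can only push the score in one direction, which is what keeps the per-component contributions readable.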

If this is right

Checkpoints can be ranked and selected for deployment using only source-domain representations and no target samples or labels.
On CIFAR corruption suites, ImageNet-C, MNLI-to-HANS transfer, and OGBN-Arxiv, the reported results suggest that the selected checkpoints are more accurate under distribution shift.
Global manifold complexity, local curvature, and topological inconsistency together supply measurable evidence of robustness inside source embeddings.
The scoring procedure remains fully interpretable because each component of the linear combination corresponds to a distinct geometric or topological property.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If source geometry reliably signals robustness, then monitoring these same invariants during training could serve as an early-stopping criterion for robustness.
The same graph-construction and feature-extraction pipeline might be applied to other representation spaces such as language-model hidden states or graph-neural-network embeddings.
Explicit regularization of the three topological quantities inside the training loss could directly encourage robustness rather than merely detecting it after training.
The approach suggests that robustness under shift is partly a property of the embedding manifold's intrinsic geometry rather than solely of the decision boundary.

Load-bearing premise

The self-supervised objective that rewards invariance under geometry-preserving embedding views actually selects for genuine OOD robustness rather than some other incidental property of the source embeddings.

What would settle it

A controlled experiment in which TopoGeoScore ranks a set of checkpoints from the same training run, yet the checkpoints it ranks best (the score is lower-is-better according to the Figure 1 caption) achieve lower accuracy on multiple held-out corruption and shift benchmarks than the checkpoints it ranks worst.
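
One way to run such a check is to compute a rank correlation between the score and measured OOD accuracy across checkpoints. The sketch below is a plain Spearman correlation that assumes no tied values; the function name `spearman_rho` is hypothetical and any numbers fed to it are the reader's own measurements, not results from the paper.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation, assuming x and y contain no tied values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # argsort of argsort converts values to 0-based ranks when there are no ties.
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    # Spearman's rho is the Pearson correlation of the ranks.
    return float(np.corrcoef(rank_x, rank_y)[0, 1])
```

With a lower-is-better score, a strongly negative correlation with OOD accuracy is the supportive pattern; a positive correlation repeated across several benchmarks would be the falsifying one described above.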

Figures

Figures reproduced from arXiv: 2605.08870 by Ali Zia, Farid Hazratian, Hien Duy Nguyen.

**Figure 1.** TopoGeoScore ranks checkpoints using only labelled source data. For each checkpoint, source embeddings are converted into class-conditional mutual k-NN graphs, from which global complexity, local regularity, and higher-order topological signals are extracted. Non-negative feature weights are learned through a source-only invariance–separation objective, yielding a lower-is-better score for expected OOD rob…

**Figure 2.** Training dynamics plots.

**Figure 3.** Severity sweep of all metrics on CIFAR-10-C.

**Figure 4.** Small-multiples scatter grid: each panel shows one top metric versus OOD accuracy; points …

**Figure 5.** Joint view of |ρ| (marginal predictive power) and ΔR² (incremental power beyond torsion + Ricci). Shows which metrics add information that torsion/Ricci miss.

**Figure 6.** Horizontal bar chart of Spearman ρ for every metric versus OOD accuracy, with Logdet_reduced_Lc shown as the baseline; metrics are sorted and family-colored. Scope of the OGBN claim. This setting reports two GCN seeds with 21 checkpoints each. We present it as graph-modality evidence at the family level, not as a stand-alone metric ranking. The full per-metric partial correlations, source-only versus sourc…

**Figure 7.** Line plot of accuracy on CIFAR-10.1 and CIFAR-10.2 across epochs, with …

**Figure 8.** Top-6 metrics × 2 OOD targets scatter grid, with points labeled by epoch.

**Figure 9.** Metric-wise Spearman correlations with ID accuracy, OOD accuracy, and the ID–OOD gap

**Figure 10.** 5 variants × top-10 metrics, with z-scored values and ρ with respect to the ID–OOD gap shown above each column.

**Figure 11.** ImageNet validation accuracy and ImageNet-C mean accuracy for each variant, with the …

**Figure 12.** MNLI-m and HANS accuracy for each model, with the ID–OOD gap annotated.

**Figure 13.** Metric correlations on MNLI→HANS (N=5 models). Bars show Spearman ρ between each metric and in-distribution accuracy (MNLI), out-of-distribution accuracy (HANS), and the generalization gap (MNLI−HANS). While many metrics strongly correlate with MNLI accuracy, their alignment with HANS accuracy is weaker and sometimes inconsistent. In contrast, correlations with the generalization gap are consistently stro…

**Figure 14.** 5 models × top-10 metrics heatmap, z-scored values, ρ-vs-gap headers.

**Figure 15.** GeoScore diagnostic: disagreement between torsion and Ricci signals across checkpoints.

**Figure 16.** Multi-seed training trajectory. Two seeds are overlaid with different line styles; geometry …

**Figure 17.** Matrix of absolute Spearman correlations among the top-12 metrics. Cells with …

**Figure 18.** For each top-20 metric, four Spearman correlations are shown: marginal, partial controlling for epoch, partial controlling for validation accuracy, and partial controlling for both. Metrics are sorted by the strictest control condition.

**Figure 19.** Per-metric paired bars: ρ on src_* (train nodes only) vs. ρ on srcval_* (train+val nodes), against test_acc. Δ values are annotated. If Δ is small, including validation nodes does not materially change the source-only signal; if Δ is large, validation-node information affects the geometry signal.

**Figure 20.** Per-metric scatter of per-seed Spearman correlations, with a mean line. The significance …

**Figure 21.** Metric values under structural controls for the top- …

**Figure 22.** Control-collapse analysis of metric–OOD accuracy correlation (Spearman …

**Figure 23.** Spearman correlation (ρ) between geometry-based metrics and OOD accuracy for ConvNeXt-T on CIFAR, evaluated across two targets (CIFAR-10.1 and CIFAR-10.2) with N = 6 checkpoints. Most metrics exhibit strong and consistent correlations across both targets, with stable sign and magnitude, indicating that the induced ranking signal is largely target-agnostic. Positive correlations are observed for spectral a…

**Figure 24.** Per-epoch trajectory on OGBN-Arxiv (single seed). Each subplot shows a geometry-based …

Original abstract

Out-of-distribution (OOD) robustness is difficult to diagnose when target-domain labels are unavailable. We consider a more restrictive source-only variant of unsupervised accuracy estimation: selecting robust checkpoints using only source-domain representations, with no target samples or target labels. We propose TopoGeoScore, a source-only geometric scorer for label-free OOD checkpoint selection. Given a trained checkpoint, we construct class-conditional mutual k-nearest-neighbour graphs from source embeddings and extract three interpretable signals: a torsion-inspired reduced Laplacian log-determinant for global class-manifold complexity, Ollivier–Ricci curvature for local neighbourhood regularity, and higher-order topological summaries for fragmented connectivity, loops, and global–local inconsistency. Instead of fixing their weights by hand, TopoGeoScore learns a non-negative linear score through a self-supervised objective that enforces invariance under approximately geometry-preserving embedding views and separation from structure-breaking views. The score remains interpretable and uses no target-domain samples or labels. Results across CIFAR-based corruption and distribution-shift benchmarks, ImageNet-C, MNLI→HANS transfer, and OGBN-Arxiv suggest that source representations contain measurable global–local–topological evidence of robustness, supporting practical checkpoint selection before deployment under distribution shift.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

TopoGeoScore combines Laplacian, curvature and topology into a self-supervised source-only scorer for OOD checkpoint selection, but the link to actual robustness rests on an assumption that may not hold.

The paper gives a concrete method to rank checkpoints for out-of-distribution performance when no target data or labels are available. It builds class-conditional k-NN graphs on source embeddings, then pulls three signals: a reduced Laplacian log-determinant for global manifold complexity, Ollivier-Ricci curvature for local regularity, and higher-order topological summaries for loops and fragmentation. These are combined with non-negative weights learned from a self-supervised loss that rewards invariance under geometry-preserving embedding views and penalizes structure-breaking ones. The score stays fully source-only and keeps some interpretability.
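
The global signal can be illustrated with a small sketch of a reduced Laplacian log-determinant. It assumes an unweighted, connected graph and removes the first vertex to form the reduced matrix; by the matrix-tree theorem the determinant then counts spanning trees. The function name is hypothetical, and how the paper normalises the quantity or handles disconnected class graphs is not stated here, so returning negative infinity in that case is purely an assumption of this sketch.

```python
import numpy as np

def reduced_laplacian_logdet(adjacency):
    """Log-determinant of the graph Laplacian with one vertex's row and column removed."""
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    reduced = laplacian[1:, 1:]  # drop vertex 0 (any vertex gives the same determinant)
    sign, logabsdet = np.linalg.slogdet(reduced)
    if sign <= 0:
        # Assumption of this sketch: a disconnected graph has no spanning tree.
        return float("-inf")
    return float(logabsdet)
```

For the complete graph on four vertices this gives log 16, matching Cayley's count of 16 spanning trees, while a path graph has exactly one spanning tree and gives 0. Denser, more redundantly connected class graphs therefore produce larger values.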

Referee Report

2 major / 2 minor

Summary. The paper proposes TopoGeoScore, a source-only geometric framework for selecting OOD-robust model checkpoints without target samples or labels. It constructs class-conditional mutual kNN graphs from source embeddings, extracts three signals (torsion-inspired reduced Laplacian log-determinant for global manifold complexity, Ollivier-Ricci curvature for local regularity, and higher-order topological summaries for connectivity and loops), and learns non-negative linear weights via a self-supervised objective that enforces invariance under approximately geometry-preserving embedding views while separating from structure-breaking views. Experiments are claimed on CIFAR corruption/shift benchmarks, ImageNet-C, MNLI to HANS, and OGBN-Arxiv.
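
The connectivity and loop signals can be illustrated at a single scale. The sketch below counts connected components and independent cycles of one graph treated as a 1-dimensional complex, using union-find. It is a deliberate simplification and not persistent homology: persistence would track these counts across a filtration, and the paper's actual summaries are not reproduced here. The function name is hypothetical.

```python
def graph_betti_numbers(n_vertices, edges):
    """Return (components, independent cycles) of an undirected graph."""
    parent = list(range(n_vertices))

    def find(x):
        # Find the root of x with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Ignore self-loops and duplicate edges.
    unique_edges = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    components = n_vertices
    for u, v in unique_edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    # Cycle rank of a graph: edges - vertices + components.
    cycles = len(unique_edges) - n_vertices + components
    return components, cycles
```

A triangle plus one isolated vertex yields two components and one cycle, the kind of fragmentation and loop structure the topological summaries are meant to flag.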

Significance. If the central claim holds, the work offers a practical, interpretable tool for pre-deployment checkpoint selection under distribution shift using only source data. Strengths include the combination of global-local-topological features and the self-supervised weight learning that avoids hand-tuning or target supervision. This could complement existing OOD methods if the geometric invariants prove predictive of robustness rather than incidental source stability.

major comments (2)

[Abstract and §3] Abstract and method description: The self-supervised objective enforces invariance only under source-internal, approximately geometry-preserving embedding views. Nothing in the construction ensures these invariants align with the specific manifold distortions induced by the target shifts (CIFAR corruptions, ImageNet-C, MNLI→HANS, OGBN-Arxiv). This is load-bearing for the claim that the score selects for actual OOD robustness; an explicit correlation analysis or ablation linking the learned score to measured OOD accuracy (rather than just selection success) is required.
[§4] §4 (Experiments): The abstract states that results 'suggest that source representations contain measurable global--local--topological evidence of robustness' across benchmarks, but supplies no quantitative metrics, baselines, error bars, or ablation details on the contribution of each geometric signal. Without these, it is impossible to verify whether the topological summaries are load-bearing or whether the method outperforms simpler alternatives.

minor comments (2)

[§3.2] Clarify the precise construction of 'approximately geometry-preserving' vs. 'structure-breaking' views in the self-supervised loss (including any hyperparameters such as k in the mutual kNN graph).
[Figures in §4] Ensure all figures showing graph-based features include axis labels, legends, and statistical significance markers for the reported trends.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each major comment below and will incorporate revisions to strengthen the presentation and empirical support for our claims.

Referee: [Abstract and §3] Abstract and method description: The self-supervised objective enforces invariance only under source-internal, approximately geometry-preserving embedding views. Nothing in the construction ensures these invariants align with the specific manifold distortions induced by the target shifts (CIFAR corruptions, ImageNet-C, MNLI→HANS, OGBN-Arxiv). This is load-bearing for the claim that the score selects for actual OOD robustness; an explicit correlation analysis or ablation linking the learned score to measured OOD accuracy (rather than just selection success) is required.

Authors: We agree that an explicit demonstration of alignment between the learned geometric invariants and OOD robustness is important for supporting the central claim. The self-supervised objective is constructed to identify weights that preserve geometric properties under views that approximate plausible shifts, but we acknowledge that this does not automatically guarantee correspondence to the specific distortions in the target benchmarks. In the revised version, we will add to §4 an explicit correlation analysis (e.g., Pearson or Spearman coefficients and scatter plots) between TopoGeoScore values and measured OOD accuracy across checkpoints on each benchmark, together with component-wise ablations that quantify how each geometric signal contributes to the observed selection performance. These additions will directly address whether the score captures robustness-relevant structure rather than source-only stability. revision: yes
Referee: [§4] §4 (Experiments): The abstract states that results 'suggest that source representations contain measurable global--local--topological evidence of robustness' across benchmarks, but supplies no quantitative metrics, baselines, error bars, or ablation details on the contribution of each geometric signal. Without these, it is impossible to verify whether the topological summaries are load-bearing or whether the method outperforms simpler alternatives.

Authors: We accept this criticism and agree that the experimental section would benefit from greater quantitative detail and transparency. While the manuscript reports selection performance on the listed benchmarks, we will revise §4 to include full tables of quantitative metrics (selection accuracy, mean OOD accuracy of selected checkpoints), comparisons against explicit baselines (e.g., embedding-norm scoring, single-signal geometric scores, and random selection), error bars obtained from multiple independent runs or seeds, and systematic ablation tables that isolate the contribution of the torsion-inspired Laplacian log-determinant, Ollivier-Ricci curvature, and higher-order topological summaries. These revisions will allow readers to assess whether the topological components are load-bearing and whether TopoGeoScore improves upon simpler alternatives. revision: yes

Circularity Check

0 steps flagged

No significant circularity; self-supervised weights learned on source data with empirical OOD validation

The paper defines TopoGeoScore as a non-negative linear combination of three geometric measures (Laplacian log-det, Ollivier-Ricci curvature, topological summaries) extracted from source embeddings. Weights are obtained via a self-supervised objective that penalizes deviation under source-internal geometry-preserving views. This construction uses only source data and contains no target robustness labels or OOD samples by design. The central claim—that the resulting score selects robust checkpoints—is presented as an empirical hypothesis tested on external benchmarks (CIFAR corruptions, ImageNet-C, MNLI→HANS, OGBN-Arxiv). No step reduces the claimed correlation to a definitional equivalence, fitted input renamed as prediction, or load-bearing self-citation chain. The method is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that geometric and topological properties extracted from source embeddings correlate with OOD robustness and that a self-supervised invariance objective can recover useful weights without any target information.

free parameters (2)

k in the mutual k-nearest-neighbour graph: hyperparameter controlling graph construction from source embeddings
non-negative linear weights: learned via the self-supervised objective rather than fitted to target labels

axioms (2)

domain assumption: Source-domain class-conditional embeddings contain global-local-topological signals that are predictive of robustness under distribution shift. Invoked as the justification for using the three geometric features as inputs to the scorer.
domain assumption: Approximately geometry-preserving embedding views can be generated without target data. Required for the self-supervised training signal.
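
The second assumption can be made concrete with a toy pair of views. The review does not say how the paper builds its views, so both transformations below are assumptions chosen only for illustration: a random orthogonal rotation, which preserves all pairwise distances exactly, and an independent shuffle of each embedding dimension across samples, which destroys neighbourhood structure. All names and the synthetic Gaussian data are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Dense matrix of Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def rotation_view(X, rng):
    """Assumed geometry-preserving view: multiply by a random orthogonal matrix."""
    dim = X.shape[1]
    Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return X @ Q

def column_shuffle_view(X, rng):
    """Assumed structure-breaking view: permute each dimension independently."""
    shuffled = X.copy()
    for j in range(X.shape[1]):
        shuffled[:, j] = rng.permutation(X[:, j])
    return shuffled

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))  # synthetic stand-in for source embeddings
D = pairwise_distances(X)
D_rot = pairwise_distances(rotation_view(X, rng))
D_shuf = pairwise_distances(column_shuffle_view(X, rng))
```

Since a k-NN graph depends only on pairwise distances, any distance-based feature is unchanged under the rotation and generally changes under the shuffle. Neither view needs target data, which is the point of the assumption.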

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Files: IndisputableMonolith/Foundation/AlexanderDuality.lean, IndisputableMonolith/Cost/FunctionalEquation.lean
Theorems: alexander_duality_circle_linking, washburn_uniqueness_aczel, reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: construct class-conditional mutual k-nearest-neighbour graphs ... torsion-inspired reduced Laplacian log-determinant ... Ollivier–Ricci curvature ... higher-order topological summaries ... self-supervised objective that enforces invariance under approximately geometry-preserving embedding views

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.