Geometric Stability: The Missing Axis of Representations
Pith reviewed 2026-05-16 14:57 UTC · model grok-4.3
The pith
Geometric stability measures how reliably a representation's pairwise distance structure holds under perturbation, separate from similarity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Shesha quantifies geometric stability as the split-half correlation of RDMs from complementary feature subsets. Unlike CKA and Procrustes, Shesha is not invariant to orthogonal transformations of the feature space, so it registers compression-induced damage to distance structure that similarity metrics overlook. Spectral analysis shows stability retains sensitivity across the eigenspectrum after top components are removed. Across domains, stability and similarity prove empirically independent, arising from opposing effects of different transformations.
What carries the argument
Shesha, the split-half correlation of representational dissimilarity matrices from complementary feature subsets, which tracks self-consistency of pairwise distances under feature perturbation.
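The definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the Euclidean distance, the Spearman rank correlation, the number of random splits, and the names `rdm_upper`, `spearman`, and `split_half_stability` are all assumptions made here for concreteness.

```python
import numpy as np

def rdm_upper(X):
    """Upper triangle of the Euclidean distance matrix between rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dist = np.sqrt(np.clip(d2, 0.0, None))  # clip tiny negatives from round-off
    return dist[np.triu_indices(X.shape[0], k=1)]

def spearman(a, b):
    """Spearman correlation via ranks; no tie correction (assumed unnecessary here)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

def split_half_stability(X, n_splits=20, seed=0):
    """Mean RDM correlation over random complementary feature halves (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(d)
        half_a, half_b = perm[: d // 2], perm[d // 2:]
        scores.append(spearman(rdm_upper(X[:, half_a]), rdm_upper(X[:, half_b])))
    return float(np.mean(scores))

# Toy data: a shared low-dimensional latent structure versus pure noise.
rng = np.random.default_rng(1)
latent = rng.standard_normal((100, 5))
structured = latent @ rng.standard_normal((5, 64)) + 0.1 * rng.standard_normal((100, 64))
noise_only = rng.standard_normal((100, 64))

score_structured = split_half_stability(structured)
score_noise = split_half_stability(noise_only)
print(score_structured, score_noise)
```

In this toy setup the structured representation should score well above the noise-only one, because both feature halves of the structured data see the same latent geometry while the two halves of the noise data are independent.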
Load-bearing premise
Split-half correlations of RDMs from feature subsets meaningfully quantify robustness to general perturbations rather than capturing only subset-specific artifacts.
What would settle it
An orthogonal transformation of the feature space that preserves all pairwise distances but alters manifold curvature, after which Shesha scores change while CKA remains fixed.
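A small numerical check makes the role of the feature partition concrete. An orthogonal map leaves every pairwise distance (and the Gram matrix that linear CKA depends on) unchanged, yet a split-half score computed on a fixed partition can still move. The sketch below assumes a fixed first-half/second-half partition, Euclidean RDMs, and Spearman correlation; the helper names and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np

def rdm_upper(X):
    """Upper triangle of the Euclidean distance matrix between rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.clip(d2, 0.0, None))[np.triu_indices(X.shape[0], k=1)]

def spearman(a, b):
    """Spearman correlation via ranks; no tie correction."""
    return float(np.corrcoef(np.argsort(np.argsort(a)), np.argsort(np.argsort(b)))[0, 1])

def fixed_split_score(X):
    """Split-half RDM correlation on a fixed partition: first half vs. second half."""
    half = X.shape[1] // 2
    return spearman(rdm_upper(X[:, :half]), rdm_upper(X[:, half:]))

rng = np.random.default_rng(0)
n, d = 80, 32
# Each half of the feature space carries a different, independent latent factor.
z1 = rng.standard_normal((n, 1))
z2 = rng.standard_normal((n, 1))
X = np.hstack([z1 * np.ones((1, d // 2)), z2 * np.ones((1, d // 2))])
X = X + 0.05 * rng.standard_normal((n, d))

# Random orthogonal matrix from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
X_rot = X @ Q

dist_before, dist_after = rdm_upper(X), rdm_upper(X_rot)
score_before, score_after = fixed_split_score(X), fixed_split_score(X_rot)
print(np.max(np.abs(dist_before - dist_after)), score_before, score_after)
```

Here the full-space distances agree up to round-off while the fixed-partition score differs before and after rotation, which is the behaviour the referee's first major comment attributes to the grouping of dimensions rather than to the geometry itself.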
Original abstract
Representational similarity analysis and related methods have become standard tools for comparing the internal geometries of neural networks and biological systems. These methods measure what is represented, the alignment between two representational spaces, but not whether that structure is robust. We introduce geometric stability, a distinct dimension of representational quality that quantifies how reliably a representation's pairwise distance structure holds under perturbation. Our metric, Shesha, measures self-consistency through split-half correlation of representational dissimilarity matrices constructed from complementary feature subsets. A key formal property distinguishes stability from similarity: Shesha is not invariant to orthogonal transformations of the feature space, unlike CKA and Procrustes, enabling it to detect compression-induced damage to manifold structure that similarity metrics cannot see. Spectral analysis reveals the mechanism: similarity metrics collapse after removing the top principal component, while stability retains sensitivity across the eigenspectrum. Across 2463 encoder configurations in seven domains -- language, vision, audio, video, protein sequences, molecular profiles, and neural population recordings -- stability and similarity are empirically uncorrelated ($\rho=-0.01$). A regime analysis shows this independence arises from opposing effects: geometry-preserving transformations make the metrics redundant, while compression makes them anti-correlated, canceling in aggregate. Applied to 94 pretrained models across 6 datasets, stability exposes a "geometric tax": DINOv2, the top-performing model for transfer learning, ranks last in geometric stability on 5/6 datasets. Contrastive alignment and hierarchical architecture predict stability, providing actionable guidance for model selection in deployment contexts where representational reliability matters.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces geometric stability as a distinct dimension of representational quality separate from similarity, quantified by the Shesha metric: split-half correlation of representational dissimilarity matrices (RDMs) built from complementary feature subsets. It claims Shesha is not invariant to orthogonal transformations of the feature space (unlike CKA and Procrustes), enabling detection of compression-induced manifold damage. Spectral analysis is said to show similarity metrics collapsing after top-PC removal while stability retains eigenspectrum sensitivity. Large-scale experiments on 2463 encoders across seven domains report near-zero correlation (ρ=-0.01) between stability and similarity, with regime analysis attributing this to opposing effects under geometry-preserving vs. compression transformations. Applied to 94 pretrained models, stability reveals a 'geometric tax' where DINOv2 ranks last on 5/6 datasets, and contrastive/hierarchical designs predict higher stability.
Significance. If the core distinction holds, the work provides a new evaluation axis for representational reliability under perturbation, with potential value for model selection in deployment settings where robustness matters. The scale of the empirical study (multiple domains, pretrained-model sweep) and the reported lack of correlation between stability and similarity are strengths that could influence how the field assesses learned geometries beyond alignment metrics.
major comments (3)
- [Abstract] Abstract: The central claim that Shesha detects compression-induced damage to manifold structure because it is not invariant to orthogonal transformations rests on the fixed partitioning of features into complementary subsets. Because pairwise distances are preserved under orthogonal transforms, any observed non-invariance arises solely from the arbitrary grouping of dimensions; an orthogonal rotation mixes dimensions and changes which pairs fall into each half. Without evidence that this split-specific sensitivity corresponds to intrinsic geometric properties rather than partitioning artifacts, the claimed distinction from CKA/Procrustes does not establish that Shesha quantifies general robustness to perturbations.
- [Abstract] Abstract (spectral analysis paragraph): The statement that 'similarity metrics collapse after removing the top principal component, while stability retains sensitivity across the eigenspectrum' is presented as revealing the mechanism, yet the manuscript provides no explicit derivation, perturbation protocol, or error analysis for how the split-half RDM correlation behaves under progressive PC removal. This makes it impossible to verify whether the retained sensitivity is a genuine property of the metric or an artifact of the complementary-subset construction.
- [Abstract] Abstract (regime analysis): The claim that independence arises from 'opposing effects' (geometry-preserving transformations making metrics redundant, compression making them anti-correlated) is load-bearing for the uncorrelation result (ρ=-0.01). The text does not specify the exact transformations, compression levels, or statistical controls used to isolate these regimes, leaving open whether the cancellation is robust or sensitive to the particular choice of splits and datasets.
minor comments (2)
- [Abstract] The term 'geometric tax' is introduced in the abstract without a formal definition or operationalization; a brief clarifying sentence would improve readability.
- [Abstract] The abstract reports results on 2463 encoder configurations and 94 pretrained models but does not indicate whether error bars, multiple-split variability, or cross-validation of the Shesha computation are provided in the main text or supplements.
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which have helped us clarify the key distinctions in our work. We address each major comment below and have made substantial revisions to the manuscript to provide the requested derivations, specifications, and evidence.
Point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that Shesha detects compression-induced damage to manifold structure because it is not invariant to orthogonal transformations rests on the fixed partitioning of features into complementary subsets. Because pairwise distances are preserved under orthogonal transforms, any observed non-invariance arises solely from the arbitrary grouping of dimensions; an orthogonal rotation mixes dimensions and changes which pairs fall into each half. Without evidence that this split-specific sensitivity corresponds to intrinsic geometric properties rather than partitioning artifacts, the claimed distinction from CKA/Procrustes does not establish that Shesha quantifies general robustness to perturbations.
Authors: The referee correctly identifies that the non-invariance arises from the fixed partitioning. However, this partitioning is not arbitrary in the sense that it is held constant across comparisons, allowing Shesha to measure the consistency of the distance structure under transformations that redistribute information across dimensions. This directly captures robustness to compression, which unevenly affects feature subsets. To address the concern about intrinsic properties, we have added a theoretical analysis in Section 3.2 demonstrating that Shesha's sensitivity corresponds to the condition number of the feature covariance matrix, providing evidence beyond partitioning artifacts. We also include experiments with multiple random partitions showing stable rankings. revision: yes
-
Referee: [Abstract] Abstract (spectral analysis paragraph): The statement that 'similarity metrics collapse after removing the top principal component, while stability retains sensitivity across the eigenspectrum' is presented as revealing the mechanism, yet the manuscript provides no explicit derivation, perturbation protocol, or error analysis for how the split-half RDM correlation behaves under progressive PC removal. This makes it impossible to verify whether the retained sensitivity is a genuine property of the metric or an artifact of the complementary-subset construction.
Authors: We agree that the abstract lacked the necessary details for verification. In the revised version, we have expanded the spectral analysis section with an explicit derivation: under PC removal, the RDM for each half is recomputed using the remaining components, and the correlation is derived as a function of the eigenvalue distribution. The perturbation protocol involves removing the top k PCs for k from 1 to full rank, with error analysis via 100 bootstrap samples over data points. New figures show that stability's retained sensitivity is due to its use of complementary subsets, which preserve lower-eigenvalue information differently than full-space similarity metrics. revision: yes
-
Referee: [Abstract] Abstract (regime analysis): The claim that independence arises from 'opposing effects' (geometry-preserving transformations making metrics redundant, compression making them anti-correlated) is load-bearing for the uncorrelation result (ρ=-0.01). The text does not specify the exact transformations, compression levels, or statistical controls used to isolate these regimes, leaving open whether the cancellation is robust or sensitive to the particular choice of splits and datasets.
Authors: We have revised the regime analysis to fully specify the protocol. Geometry-preserving transformations include random orthogonal rotations (via QR decomposition) and feature permutations. Compression is implemented via PCA truncation at levels retaining 10%, 25%, 50%, and 75% of variance, plus additive Gaussian noise at varying SNRs. Statistical controls include averaging over 50 random splits per configuration, with significance tested via permutation tests (p<0.001 for the opposing effects). Supplementary material now includes the full set of transformations and confirms the ρ=-0.01 is robust across domains and split choices. revision: yes
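The transformations named in the responses above (random orthogonal rotation via QR decomposition, PCA truncation to a retained-variance level, and removal of the top k principal components) can be sketched as follows. This is an illustrative reading of the rebuttal, not the authors' code: the function names, the sign correction on the QR factor, and the choice to return PCA scores rather than a reconstruction are assumptions.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # column sign fix; the result stays orthogonal

def pca_truncate(X, var_frac):
    """Project centered X onto the fewest leading PCs retaining at least var_frac of variance."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(S ** 2) / np.sum(S ** 2)
    k = min(int(np.searchsorted(explained, var_frac)) + 1, len(S))
    return Xc @ Vt[:k].T, k

def remove_top_pcs(X, k):
    """Remove the projection of centered X onto its top-k principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:k]
    return Xc - (Xc @ top.T) @ top

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 16)) * np.linspace(3.0, 0.5, 16)  # anisotropic toy features

Q = random_orthogonal(16, rng)   # geometry-preserving regime: X @ Q
Z, k = pca_truncate(X, 0.5)      # compression regime: retain 50% of variance
R = remove_top_pcs(X, 1)         # spectral protocol: drop the top PC
print(k, Z.shape, R.shape)
```

Looping `pca_truncate` over the stated levels (0.10, 0.25, 0.50, 0.75) and `remove_top_pcs` over k = 1, 2, ... would give the inputs on which stability and similarity scores are then compared.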
Circularity Check
No significant circularity; Shesha's properties follow from explicit definition without reduction to inputs by construction
full rationale
The paper introduces Shesha as a metric defined directly via split-half correlation of RDMs from complementary feature subsets. The non-invariance to orthogonal transformations is stated as a formal property arising from the fixed partitioning in the definition, not derived from prior results or fits. The empirical uncorrelation with similarity metrics (ρ=-0.01) is presented as an observation across datasets, not forced by the construction. No self-citations, ansatzes, or fitted predictions are invoked in a load-bearing manner for the central claims about geometric stability. The derivation chain is self-contained, with results on model rankings and predictors being observational rather than circular.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Representational dissimilarity matrices from feature subsets capture meaningful geometric structure.
invented entities (2)
- geometric stability: no independent evidence
- Shesha metric: no independent evidence
Forward citations
Cited by 5 Pith papers
-
Geometric Phase Transition Enables Extreme Hippocampal Memory Capacity
A geometric phase transition produces crystalline hippocampal coding in food-caching birds that yields over 100-fold higher location memory capacity than the mist-like coding in non-caching birds.
-
Geometric coherence of single-cell CRISPR perturbations reveals regulatory architecture and predicts cellular stress
Shesha quantifies directional coherence of single-cell CRISPR responses as mean cosine similarity of shift vectors, correlating with magnitude while identifying pleiotropic regulators and stress associations across fi...
-
Geometric coherence of single-cell CRISPR perturbations reveals regulatory architecture and predicts cellular stress
Shesha quantifies directional coherence of single-cell CRISPR responses, correlates strongly with effect magnitude, distinguishes pleiotropic from lineage-specific regulators, and predicts chaperone activation after m...
-
The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models
Discrete tokenization in scientific foundation models imposes a geometric alignment tax that distorts continuous manifolds, with continuous heads reducing distortion by up to 8.5x and exposing three failure regimes in...
-
From Syntax to Semantics: Geometric Stability as the Missing Axis of Perturbation Biology
Geometric stability, defined as the directional coherence of cellular responses to perturbation, provides a framework for assessing whether resulting cellular states are stable beyond conventional metrics of intervent...
Reference graph
Works this paper leans on
-
[1]
High stability, high similarity (Q1): Representations derived from the same latent structure (α = 0.9) with small additive noise (σ = 0.1). Seeds: S[i]×1000 + 1 for i ∈ {1, ..., 15}. Results: Shesha = 0.701 ± 0.003, CKA = 0.998 ± 0.000
-
[2]
High stability, low similarity (Q2): Independent high-signal representations (α = 0.9) with different latent draws. Seeds: S[i]×1000 + 2 and S[i]×1000 + 3 for each pair. Results: Shesha = 0.701 ± 0.004, CKA = 0.001 ± 0.010
-
[3]
Low stability, low similarity (Q3): Independent noise representations (α = 0.1). Seeds: S[i]×1000 + 4 and S[i]×1000 + 5 for each pair. Results: Shesha = 0.001 ± 0.003, CKA = −0.001 ± 0.010
-
[4]
Low stability, high similarity (Q4): Adversarial quadrant constructed via rejection sampling. We generated pairs where X ∼ N(0, I)^{200×256} and Y = X + N(0, 0.15² I), accepting only samples where Shesha < 0.4 and CKA > 0.4. This creates representations with aligned sample geometry (high CKA) but inconsistent feature-split structure (low Shesha). Acceptance rat...
-
[5]
Train a logistic regression probe on 250 samples from Set B
-
[6]
Extract the weight vector w as the steering direction
-
[7]
For α ∈ {−2, −1.5, ..., 1.5, 2}, compute the steered embeddings: e′ = e + α ŵ
-
[8]
Evaluate the probe accuracy on the remaining 250 test samples
-
[9]
Record max_drop = acc_0 − min_α acc(α).
Negative controls.
• Shuffled labels: Recompute all supervised metrics with permuted labels
• Random directions: Average max_drop over 20 random unit vectors per split
8.1.2 Results
Primary finding: Stability predicts steerability. Supervised geometric stability showed a strong correlation with steering effectiveness: ρ(Sh...
-
[10]
State-of-the-art prediction: Supervised Shesha achieves ρ > 0.89 with steering effectiveness across all settings, matching or exceeding the Fisher discriminant
-
[11]
Unique geometric signal: Partial correlations of ρ ∈ [0.62, 0.76] after controlling for separability show that stability is detecting something that separability measures miss. This shows that geometric consistency, rather than class separation, is a causal driver of controllability
-
[12]
Task alignment is essential: Unsupervised stability predicted steering in synthetic settings (ρ = 0.77), but it failed on real-world tasks (ρ ≈ 0.10-0.35). For semantic control, stability must be task-aligned
-
[13]
Methodology is sound: Negative controls confirm that (a) supervised metrics reflect genuine task structure (shuffled labels destroy signal), and (b) steering effects are direction-specific (true directions outperform random by 1.3-10.8×). Model characteristics. Analysis of model rankings revealed that supervised contrastive models (BGE, E5, and GTE families...
-
[14]
Saved the clean model weights
-
[15]
Injected Gaussian noise at 51 levels: α ∈ {0.00, 0.01, 0.02, ..., 0.50}
-
[16]
For each parameter tensor θ, added noise: θ′ = θ + N(0, α·std(θ))
-
[17]
Embedded 800 SST-2 validation samples (balanced classes)
-
[18]
Computed drift metrics and downstream classification accuracy (5-fold CV, logistic regression)
-
[19]
Restored clean weights before the next noise level. This protocol simulates parameter corruption from quantization errors, bit rot, or fine-tuning drift. Each (model, α) combination used a deterministic seed for reproducibility across runs. Embedding details. For SentenceTransformer models, we used the native encode() method. For models loaded with AutoModel,...
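The noise-injection steps can be sketched with plain NumPy arrays standing in for model parameter tensors. Two points are assumptions made here: the second argument of N(0, α·std(θ)) is read as a standard deviation rather than a variance, and the function name `perturb_parameters` and the seed handling are hypothetical.

```python
import numpy as np

def perturb_parameters(params, alpha, seed):
    """Return a noisy copy: theta' = theta + N(0, alpha * std(theta)) for each tensor.

    Assumption: alpha * std(theta) is used as the noise standard deviation.
    """
    rng = np.random.default_rng(seed)  # deterministic per (model, alpha) if seed is fixed
    return {
        name: theta + rng.normal(0.0, alpha * theta.std(), size=theta.shape)
        for name, theta in params.items()
    }

# 51 noise levels from 0.00 to 0.50 in steps of 0.01.
noise_levels = np.linspace(0.0, 0.5, 51)

# Stand-in "clean weights"; a real run would copy these from the model.
init_rng = np.random.default_rng(0)
clean = {"w": init_rng.standard_normal((64, 64)), "b": init_rng.standard_normal(64)}

noisy = perturb_parameters(clean, alpha=0.2, seed=123)
noise_ratio = (noisy["w"] - clean["w"]).std() / clean["w"].std()
print(len(noise_levels), noise_ratio)
```

Because the function returns a new dictionary and never modifies `clean`, restoring the clean weights before the next noise level amounts to reusing `clean` as the starting point each time.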
-
[20]
Use Shesha as the primary drift metric. It provides the best combination of predictive validity (ρ ≥ 0.92) and low false alarm rate (7%), detecting functionally relevant geometric changes while ignoring harmless rigid transformations
-
[21]
Use Procrustes for maximum sensitivity when false alarms are acceptable. In scenarios where any geometric change warrants investigation (e.g., security-critical deployments), Procrustes provides the earliest possible warning, but expect 6× more false positives
-
[22]
Use CKA as a confirmation signal. When Shesha triggers, check CKA to assess whether the drift has affected the dominant representation structure. If CKA remains stable, the perturbation may be recoverable; if CKA has also dropped, functional degradation is likely
-
[23]
Avoid Wasserstein for drift detection. Sliced Wasserstein distance proved insufficiently sensitive, failing to detect drift until catastrophic collapse in most models.
10.6 Model Lists
Table 53: Base/Instruct model pairs for post-training drift analysis (Experiment 1).
Base Model | Instruct Model | Params
HuggingFaceTB/SmolLM-135M | SmolLM-135M-Instruct | 0.14B
Hu...
-
[24]
Compute normalized embeddings for the source domain (IMDB samples)
-
[25]
Calculate all transferability metrics on source embeddings with labels
-
[26]
Fine-tune linear probes on target domain with hyperparameter search
-
[27]
Report test accuracy as the transfer performance measure.
The linear probes used were:
• logistic regression (C ∈ {0.1, 1, 10})
• ridge classifier (α ∈ {1, 10})
• LDA
• nearest centroid
The best probe was selected based on validation set accuracy.
Sample sizes.
• Experiment 1: k_total ∈ {16, 32, 64, 128, 256, 512} training examples (balanced a...
-
[28]
Unsupervised geometric stability does not predict transfer: Split-Half achieves ρ = 0.33 (few-shot) and ρ = 0.03 (cross-domain), both of which are non-significant
-
[29]
Label-informed metrics succeed: H-Score (ρ = 0.89-0.92), LogME (ρ = 0.86-0.93), Label-RDM Alignment (ρ = 0.81-0.86), and related metrics achieve strong, significant correlations
-
[30]
Task alignment is required: The 0.56-0.90 gap between unsupervised and label-informed metrics demonstrates that for semantic transfer, stability must be measured relative to the downstream task structure. This null result for unsupervised stability provides valuable insights. It defines the limits within which geometric consistency can predict performance fo...
-
[31]
Feature selection: Top 2,000 highly variable genes selected via highly_variable_genes().
5. Dimensionality reduction: PCA with 50 components computed per dataset. PCA embeddings for each dataset were computed separately because of the potential for batch effects if they were computed using a common shared space. However, each PCA matrix maintained a consistent...
-
[32]
Let $c = \frac{1}{n_{\mathrm{ctrl}}} \sum_i x_i^{\mathrm{ctrl}}$ be the control centroid in PCA space
-
[33]
For each perturbed cell $j$, compute the shift vector $v_j = x_j^{p} - c$
-
[34]
Compute the mean shift direction $\bar{v} = \frac{1}{n_p} \sum_j v_j$ and its magnitude $\|\bar{v}\|$
-
[35]
For cells with $\|v_j\| > 10^{-6}$, compute cosine similarity to the mean direction: $S_p = \frac{1}{|V|} \sum_{j \in V} \frac{v_j \cdot \bar{v}}{\|v_j\| \, \|\bar{v}\|}$, where $V = \{ j : \|v_j\| > 10^{-6} \}$. This formula measures the self-consistency of a perturbation's geometric effect: the degree to which the perturbed cells move coherently (in the same direction) relative to their controls. Perturbations with...
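The four steps above (control centroid, shift vectors, mean shift direction, mean cosine similarity over cells with non-negligible shifts) translate directly into NumPy. The function name `directional_coherence`, the NaN return for a degenerate mean direction, and the toy data are assumptions added for illustration.

```python
import numpy as np

def directional_coherence(x_ctrl, x_pert, eps=1e-6):
    """Mean cosine similarity of per-cell shift vectors to the mean shift direction.

    Returns (S_p, magnitude of the mean shift).
    """
    c = x_ctrl.mean(axis=0)              # control centroid
    v = x_pert - c                       # shift vector per perturbed cell
    v_bar = v.mean(axis=0)               # mean shift direction
    v_bar_norm = float(np.linalg.norm(v_bar))
    norms = np.linalg.norm(v, axis=1)
    keep = norms > eps                   # the set V: cells with non-negligible shift
    if not keep.any() or v_bar_norm <= eps:
        return float("nan"), v_bar_norm  # degenerate case (assumed convention)
    cos = (v[keep] @ v_bar) / (norms[keep] * v_bar_norm)
    return float(cos.mean()), v_bar_norm

rng = np.random.default_rng(0)
ctrl = rng.standard_normal((300, 50))    # stand-in for control cells in 50-d PCA space

direction = np.zeros(50)
direction[0] = 5.0
coherent = direction + 0.1 * rng.standard_normal((200, 50))  # cells shift together
scattered = rng.standard_normal((200, 50))                   # no consistent shift

s_coherent, mag_coherent = directional_coherence(ctrl, coherent)
s_scattered, mag_scattered = directional_coherence(ctrl, scattered)
print(s_coherent, s_scattered)
```

In this toy example the coherent perturbation scores close to 1 and the scattered one stays near 0, matching the intended reading of $S_p$ as directional self-consistency.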
-
[36]
Resample perturbation values with replacement for each dataset.
-
[37]
Compute the statistical result of interest (correlation, partial correlation, etc.)
-
[38]
Helpful” perturbations prop up the correlation (removing them decreases ρ); “harmful
Log the resulting estimates into the collection of bootstrapped estimates. The 95% confidence interval was obtained using the percentile method (2.5% and 97.5% percentiles of the bootstrapped distribution). Bootstrapped samples that produced NaN values (due to all resampled values being constant) were excluded from the calculation of percentiles. Analyses that dr...
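The bootstrap steps above can be sketched as a percentile interval with NaN estimates dropped before the percentiles are taken. The number of resamples, the Pearson statistic, and the names `pearson` and `bootstrap_ci` are assumptions; the source does not state them for this analysis.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation; returns NaN (without warnings) if either input is constant."""
    with np.errstate(invalid="ignore", divide="ignore"):
        return float(np.corrcoef(x, y)[0, 1])

def bootstrap_ci(x, y, stat=pearson, n_boot=1000, seed=0):
    """Percentile-method 95% CI for stat(x, y), resampling paired values with replacement."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample indices with replacement
        estimates[b] = stat(x[idx], y[idx])
    estimates = estimates[~np.isnan(estimates)]  # exclude NaN resamples
    lo, hi = np.percentile(estimates, [2.5, 97.5])
    return float(lo), float(hi), int(estimates.size)

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = x + rng.standard_normal(200)  # toy positively correlated pair

point = pearson(x, y)
lo, hi, n_valid = bootstrap_ci(x, y)
print(point, lo, hi, n_valid)
```

Passing a different `stat` (for example a partial correlation) reuses the same resampling loop, which mirrors the "statistical result of interest" wording in the steps above.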