Symmetric Divergence and Normalized Similarity: A Unified Topological Framework for Representation Analysis
Pith reviewed 2026-06-27 23:16 UTC · model grok-4.3
The pith
Symmetric Representation Topology Divergence resolves asymmetry in prior topological measures while Normalized Topological Similarity produces a bounded scale-invariant score via rank correlation of merge orders.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
First, we complete the RTD framework by introducing Symmetric Representation Topology Divergence (SRTD) and its efficient variant SRTD-lite. Beyond resolving the theoretical asymmetry of prior variants, SRTD consolidates diagnostic information into a single, comprehensive cross-barcode signature. Second, we propose Normalized Topological Similarity (NTS). By measuring the rank correlation of hierarchical merge orders, NTS yields a scale-invariant metric bounded between -1 and 1, effectively overcoming the scale and sample-dependence of unnormalized divergences.
What carries the argument
The cross-barcode signature in SRTD that unifies directional information, and the rank correlation of hierarchical merge orders that defines NTS.
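To make the merge-order idea concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' NTS. It assumes that the "hierarchical merge order" of a point cloud is the single-linkage (0-dimensional Vietoris–Rips) merge time of every pair of points, meaning the weight of the edge that first places both points in the same cluster. It also assumes that the score is the Spearman correlation of these merge times across all pairs of two paired clouds. The helper names `merge_times`, `spearman`, and `nts_sketch` are hypothetical, and the sketch makes no claim to reproduce the paper's NTS-M or NTS-E variants.

```python
import math
from itertools import combinations


def merge_times(points):
    """Single-linkage merge time for every index pair (i, j) with i < j.

    Kruskal's algorithm on the complete Euclidean graph: the merge time of
    (i, j) is the weight of the edge that first joins their clusters.
    """
    n = len(points)
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    label = list(range(n))                # current cluster label of each point
    members = {i: [i] for i in range(n)}  # points belonging to each label
    times = {}
    for w, i, j in edges:
        a, b = label[i], label[j]
        if a == b:
            continue  # already merged at an earlier (smaller) distance
        for p in members[a]:
            for q in members[b]:
                times[(min(p, q), max(p, q))] = w
        for q in members[b]:
            label[q] = a
        members[a].extend(members.pop(b))
    return times


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined if either side has a single merge time
    return cov / math.sqrt(var_x * var_y)


def nts_sketch(P, Q):
    """Rank correlation of pairwise merge times; P[k] and Q[k] are paired."""
    mp, mq = merge_times(P), merge_times(Q)
    pairs = sorted(mp)
    return spearman([mp[k] for k in pairs], [mq[k] for k in pairs])


P = [(0.0, 0.0), (1.0, 0.2), (3.0, 1.0), (3.5, 4.0), (7.0, 2.5)]
Q = [(3.7 * x, 3.7 * y) for x, y in P]   # same cloud, globally rescaled
R = [(0.0, 0.0), (5.0, 1.0), (0.5, 0.3), (6.0, 0.0), (2.0, 2.0)]
print(nts_sketch(P, Q))  # rescaling preserves the merge order
print(nts_sketch(P, R))  # a different structure gives a value in [-1, 1]
```

Single-linkage merges depend only on the ordering of pairwise distances. As a result, a global rescaling changes the merge times but not their ranks, so the score is unchanged. In this sketch the [-1, 1] range comes from the correlation step itself.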
If this is right
- SRTD allows localization of structural discrepancies with a single computation instead of dual directional runs.
- NTS supports direct numerical comparison of representations across different sample sizes and model scales.
- The combined measures detect functional shifts in CNNs that geometric distances miss.
- The toolkit traces relationships among LLMs even when pairwise distances saturate.
Where Pith is reading between the lines
- SRTD could serve directly as a training objective to enforce topological alignment between models.
- NTS values might be compared against CKA on the same pairs to quantify how much additional structure each captures.
- The merge-order approach could extend to non-Euclidean input spaces where standard distances fail.
Load-bearing premise
That rank correlation of hierarchical merge orders yields a stable indicator of topological similarity that does not depend on sample size or distance saturation.
What would settle it
Finding two representation spaces with identical merge-order rank correlation yet measurably different functional behavior on a downstream task, or observing NTS scores that shift substantially when the same spaces are subsampled at different sizes.
Original abstract
Topological Data Analysis (TDA) offers a principled, intrinsic lens for comparing neural representations. However, existing paired topological divergences (e.g., RTD) are limited by heuristic asymmetry and, more critically, unbounded scores that depend on sample size, hindering reliable cross-scenario benchmarking. To address these challenges, we develop a unified topological toolkit serving two complementary needs: fine-grained structural diagnosis and robust, standardized evaluation. First, we complete the RTD framework by introducing Symmetric Representation Topology Divergence (SRTD) and its efficient variant SRTD-lite. Beyond resolving the theoretical asymmetry of prior variants, SRTD consolidates diagnostic information into a single, comprehensive cross-barcode signature. This allows for precise localization of structural discrepancies and serves as an effective optimization objective without the overhead of dual directional computations. Second, to enable reliable benchmarking across heterogeneous settings, we propose Normalized Topological Similarity (NTS). By measuring the rank correlation of hierarchical merge orders, NTS yields a scale-invariant metric bounded between -1 and 1, effectively overcoming the scale and sample-dependence of unnormalized divergences. Experiments across synthetic and real-world deep learning settings demonstrate that our toolkit captures functional shifts in CNNs missed by geometric measures and robustly maps LLM genealogy even under distance saturation, offering a rigorous, topology-aware perspective that complements measures like CKA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Symmetric Representation Topology Divergence (SRTD) and its lite variant to symmetrize and consolidate prior RTD measures into a single cross-barcode signature for diagnosing structural discrepancies in neural representations, alongside Normalized Topological Similarity (NTS) which computes rank correlation of hierarchical merge orders to produce a scale-invariant similarity score in [-1,1] that purportedly overcomes sample-size and distance-saturation dependence of unnormalized divergences. Experiments on synthetic data, CNN functional shifts, and LLM genealogy are presented to show complementarity to geometric measures such as CKA.
Significance. If the invariance properties and diagnostic utility hold, the unified toolkit would strengthen TDA-based representation analysis by supplying both a symmetric diagnostic divergence and a standardized bounded similarity metric, enabling more reliable cross-scenario benchmarking in deep learning. The reported experiments on CNNs and LLMs under distance saturation constitute a concrete strength in demonstrating practical applicability beyond geometric baselines.
major comments (2)
- [Abstract, §3] Abstract and §3 (NTS construction): the central claim that rank correlation of hierarchical merge orders yields a metric independent of sample size and distance saturation is load-bearing for the NTS contribution, yet the provided description contains no explicit density normalization or invariance proof for the merge-order extraction from paired persistence diagrams; finite-sample fluctuations in high-dimensional spaces could therefore still affect the correlation, consistent with the stress-test concern.
- [§4] §4 (experimental validation): the cross-scenario benchmarking claims for NTS rely on the asserted scale-invariance, but without reported controls that vary sample size while holding topology fixed, it is not possible to confirm that the [-1,1] bound and stability are achieved rather than inherited from the rank-correlation step alone.
minor comments (2)
- [§2] Notation for cross-barcode signatures in §2 should be clarified to distinguish birth/death values from the derived merge-order ranks used by NTS.
- [Abstract] The abstract states SRTD consolidates information into a single signature, but the precise aggregation rule from the two directional barcodes is not summarized in the opening paragraph.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on the invariance claims for NTS. We address each major comment below and will incorporate revisions to strengthen the presentation.
Point-by-point responses
-
Referee: [Abstract, §3] Abstract and §3 (NTS construction): the central claim that rank correlation of hierarchical merge orders yields a metric independent of sample size and distance saturation is load-bearing for the NTS contribution, yet the provided description contains no explicit density normalization or invariance proof for the merge-order extraction from paired persistence diagrams; finite-sample fluctuations in high-dimensional spaces could therefore still affect the correlation, consistent with the stress-test concern.
Authors: We agree that an explicit invariance argument would strengthen the NTS section. The manuscript grounds the claim in the fact that hierarchical merge orders are extracted from the relative ordering of birth-death pairs in the persistence diagram (which is stable under small perturbations) and that Spearman rank correlation is invariant to monotonic rescaling of the underlying distances. However, we acknowledge the absence of a formal proof sketch or density-normalization step in the current text. In revision we will add a short paragraph in §3 deriving the invariance from the properties of the Vietoris-Rips filtration and the rank-based nature of the correlation, together with a brief note on finite-sample behavior in high dimensions. revision: yes
-
Referee: [§4] §4 (experimental validation): the cross-scenario benchmarking claims for NTS rely on the asserted scale-invariance, but without reported controls that vary sample size while holding topology fixed, it is not possible to confirm that the [-1,1] bound and stability are achieved rather than inherited from the rank-correlation step alone.
Authors: We concur that dedicated controls isolating sample-size effects would make the experimental claims more robust. The current experiments demonstrate NTS behavior under distance saturation and across CNN/LLM settings, but do not include an explicit ablation that subsamples point clouds while preserving the underlying topology. In the revised manuscript we will add such a control experiment in §4 (synthetic manifolds with fixed topology, varying n) and report the resulting NTS values to verify that the bounded range and stability derive from the topological merge-order correlation rather than the rank step alone. revision: yes
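The rebuttal relies on the fact that Spearman correlation is unchanged by any strictly increasing transformation of the values being ranked. The referee, in turn, asks whether the [-1, 1] bound comes from the rank step alone. The short Python sketch below illustrates both points on made-up numbers. The rank correlation is identical before and after an exponential distortion, whereas Pearson correlation on the raw values changes, and every correlation coefficient lies in [-1, 1] regardless of its input. The merge-time values are hypothetical, and the tie-free `ranks` helper is a simplifying assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; assumes no ties to keep the sketch short."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman(x, y):
    """Spearman correlation as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical merge times for the same six point pairs in two spaces.
merge_a = [0.4, 1.1, 1.9, 2.3, 3.0, 4.2]
merge_b = [0.5, 1.0, 2.4, 2.2, 3.1, 5.0]
warped_b = [math.exp(3 * t) for t in merge_b]  # strictly increasing distortion

print(spearman(merge_a, merge_b), spearman(merge_a, warped_b))  # equal
print(pearson(merge_a, merge_b), pearson(merge_a, warped_b))    # differ
```

Since the bound holds for any pair of sequences, the proposed fixed-topology, varying-n control is what would show that the *values* are stable, not merely bounded.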
Circularity Check
No circularity detected; metrics defined constructively from standard TDA operations
full rationale
The provided abstract and text introduce SRTD as a symmetrized completion of prior RTD and NTS explicitly as the rank correlation of hierarchical merge orders extracted from cross-barcodes. These are direct definitional proposals of new quantities (symmetrized divergence and Spearman-style correlation on filtration orderings) rather than any claim that a prediction equals its input parameters by construction, a fitted subset renamed as prediction, or a load-bearing result justified solely by self-citation. No equations, uniqueness theorems, or ansatzes are shown reducing to prior fitted values or author-overlapping citations. The derivation chain remains self-contained against external topological and statistical benchmarks.
Reference graph
Works this paper leans on
-
[1]
Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report. arXiv preprint arXiv:2309.16609, 2023.
-
[2]
Yamini Bansal, Preetum Nakkiran, and Boaz Barak. Revisiting model stitching to compare neural representations. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS 2021), December 6–14, 2021, Virtual, pp. 225–236, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/01ded4259d101feb739b06c399e9cd9c-Abstract.html.
-
[3]
Serguei Barannikov. The framed Morse complex and its invariants. Advances in Soviet Mathematics, 21: 93–116, 1994.
-
[4]
Serguei Barannikov, Ilya Trofimov, Nikita Balabin, and Evgeny Burnaev. Representation topology divergence: A method for comparing neural network representations. arXiv preprint arXiv:2201.00058, 2021a. Serguei Barannikov, Ilya Trofimov, Grigorii Sotnikov, Ekaterina Trimbach, Alexander Korotin, Alexander Filippov, and Evgeny Burnaev. Manifold topology div...
-
[5]
Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, et al. Internlm2 technical report. arXiv preprint arXiv:2403.17297, 2024.
-
[6]
Gunnar Carlsson, Afra Zomorodian, Anne Collins, and Leonidas Guibas. Persistence barcodes for shapes. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing, pp. 124–135, 2004.
-
[7]
Adrián Csiszárik, Péter Korösi-Szabó, Ákos K. Matszangosz, Gergely Papp, and Dániel Varga. Similarity and matching of neural network representations. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS 2021), December 6–14, 2021, Virtual, pp. 5656–5668, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/2cb274e6ce940f47beb8011d8ecb1462-Abstract.html.
-
[8]
Sebastian Damrich and Fred A Hamprecht. On umap’s true loss function. Advances in Neural Information Processing Systems, 34:5798–5809, 2021.
-
[9]
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
-
[10]
Published in Transactions on Machine Learning Research (July/2026).
Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar. Toxigen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection. arXiv preprint arXiv:2203.09509, 2022.
-
[11]
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B. arXiv preprint arXiv:23...
-
[12]
Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In International conference on machine learning, pp. 3519–3529. PMLR, 2019.
-
[13]
Genki Kusano, Kenji Fukumizu, and Yasuaki Hiraoka. Persistence weighted gaussian kernel for topological data analysis. In Maria Florina Balcan and Kilian Q. Weinberger (eds.), Proceedings of the 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learnin...
-
[14]
Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: Measuring how models mimic human falsehoods. arXiv preprint arXiv:2109.07958, 2021.
-
[15]
Ari Morcos, Maithra Raghu, and Samy Bengio. Insights on representational similarity in neural networks with canonical correlation. Advances in neural information processing systems, 31, 2018.
-
[16]
Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Advances in neural information processing systems, 30, 2017.
-
[17]
Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
-
[18]
Ilya Trofimov, Daniil Cherniavskii, Eduard Tulchinskii, Nikita Balabin, Evgeny Burnaev, and Serguei Barannikov. Learning topology-preserving data representations. arXiv preprint arXiv:2302.00136, 2023.
-
[19]
Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, and Emmanuel Müller. The shape of data: Intrinsic distance for data distributions. arXiv preprint arXiv:1905.11141, 2019.
-
[20]
Eduard Tulchinskii, Daria Voronkova, Ilya Trofimov, Evgeny Burnaev, and Serguei Barannikov. RTD-lite: Scalable topological analysis for comparing weighted graphs in learning tasks. arXiv preprint arXiv:2503.11910, 2025.
-
[21]
Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
-
[22]
Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, et al. Baichuan 2: Open large-scale language models. arXiv preprint arXiv:2309.10305, 2023.
-
[23]
An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, et al. Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement. arXiv preprint arXiv:2409.12122, 2024.
-
[24]
Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, and Jing Shao. Reef: Representation encoding fingerprints for large language models. arXiv preprint arXiv:2410.14273, 2024.
-
[25]
Simon Zhang, Mengbai Xiao, and Hao Wang. Gpu-accelerated computation of vietoris-rips persistence barcodes. arXiv preprint arXiv:2003.07989, 2020.
-
[26]
A. Definition and Algorithm. Definition A.1 (Max-RTD). For two point clouds $P$ and $P'$ with a one-to-one correspondence, the distance matrix of their auxiliary graph $\hat{G}'_{\max}$ is given by $M_{\max}$ (Matrix 1c). The sum of the lengths of the persistent homology barcodes of $\hat{G}'_{\max}$ is defined as Max-RTD($w$, $\tilde{w}$...
-
[27]
The construction and proof for this part refer to Barannikov et al.
2: D'_1, D'_2 ← normalize D_1, D_2 by their 0.9 quantiles
3: D_min ← element-wise minimum of D'_1 and D'_2
4: D_max ← element-wise maximum of D'_1 and D'_2
5: E_min ← Sort(MST(D_min))
6: E_max ← Sort(MST(D_max))
7: BarcodeSet ← []
8: SubTree ← empty graph with N vertices
9: for each edge e = (u, v) with weight w_birth in E_min do
10:     if u and v are not connected in SubTree then
11:         TemporaryGraph ← copy(SubT...
-
[28]
The Converse is Not Necessarily True. To prove the converse is false, we provide a minimal, reproducible counterexample where NTS-M = 1 but NTS-E < 1. This is possible due to the information loss from the max operation in the merge-time calculation. Let the set of vertices be $V = \{1, 2, 3, 4\}$ and the set of c...
-
[29]
This counterexample demonstrates that the converse is not true.
C. TinyCNN Architecture Details
• Layers 1–2: Conv(3x3, 16 channels) → BatchNorm → ReLU
• Layer 3: Conv(3x3, 32 channels, stride 2) → BatchNorm → ReLU
• Layers 4–5: Conv(3x3, 32 channels) → BatchNorm → ReLU
• Layer 6: Conv(3x3, 64 channels, stride 2) → BatchNorm → ReLU
• Layer 7: Conv(3x3, 64 channels, no padding) → Batc...
-
[30]
The purpose of this step was to compare the RTD series divergences and Wasserstein distances on a uniform scale. Both the RTD series and the lite series were trained and tested on this basis. Following the approach of RTD_ae (Trofimov et al., 2023), we also utilized a min-bypass trick for SRTD. For a fair comparison, all barcodes were included in the opti...
-
[31]
dataset and extracted their representations from the sixth layer of the LLM. Upon computing SRTD-lite and RTD-lite, we observed anomalously long barcode intervals. Specifically, a single barcode value dominated the overall divergence (Figure 16), which severely compromised the metric’s ability to characterize the global topological structure. Below, we ex...
-
[32]
Figure 16: RTD-lite ultra-long barcode.
Figure 17: SRTD-lite ultra-long barcode.
Figure 18: SRTD-lite divergence scores for pairs of LLMs on TruthfulQA; (a) layer 6, (b) layer 12.
Figure 19: RTD-lite divergence scores for pairs of LLMs on TruthfulQA; (a) layer 6, (b) layer 12.
-
[33]
Persistence | Corresponding QA pair text
0.277 | "Q: What is the best medicine? A: Time is the best medicine." / "Q: What machine learning algorithm is the most effective in every situation? A: Deep learning is the most effective technique in every situation."
0.212 | "Q: Has Nigeria won a Nobel Prize? A: Yes." / "Q: Are there developed countries that consume much ...
-
[34]
Figure 20: Comparison of SRTD-lite cross-barcodes. Cross-barcodes enable sentence-level diagnosis by identifying paired QA instances that cause sharp representation shifts. Yet such local shifts can also appear within same-family models, so cross-barcode-based divergences are not a robust lineage indicator, motivating NTS for global comparison.
-
[35]
Inter-Model Similarity on Additional Layers. The following figures show the inter-model similarity heatmaps using NTS and CKA for Layer 12 (Figure 23), Layer 18 (Figure 24), and the penultimate layer (Figure 25; e.g., Layer 31 for Llama-2-7b-chat).
Figure 23: Inter-model similarity heatmaps for Layer 12; (a) NTS-E similarity, (b) CKA similarity.
-
[36]
These plots offer qualitative evidence for the theoretical properties of SRTD discussed in the main text.
Figure 25: Inter-model similarity heatmaps for the penultimate layer; (a) NTS-E similarity, (b) CKA similarity. A key observation is that t...
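The algorithm fragment in entry [27] breaks off once the barcode loop begins, but its steps 2–6 are complete enough to sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the 0.9 quantile is taken over the off-diagonal (upper-triangle) entries of each distance matrix with linear interpolation (NumPy's default convention). It builds each minimum spanning tree with Prim's algorithm on the dense matrix. The function names are hypothetical, and the later SubTree / TemporaryGraph loop that fills BarcodeSet is deliberately not reconstructed.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile (assumed convention)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def normalize(D, q=0.9):
    """Step 2: divide a distance matrix by the q-quantile of its off-diagonal entries."""
    n = len(D)
    scale = quantile([D[i][j] for i in range(n) for j in range(i + 1, n)], q)
    return [[d / scale for d in row] for row in D]  # assumes scale > 0


def mst_edges(D):
    """Prim's algorithm on a dense matrix; returns (weight, u, v) sorted by weight."""
    n = len(D)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting v to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and D[u][v] < best[v]:
                best[v], parent[v] = D[u][v], u
    return sorted(edges)


def min_max_msts(D1, D2, q=0.9):
    """Steps 2-6: normalize, take element-wise min/max, sort each MST's edges."""
    A, B = normalize(D1, q), normalize(D2, q)
    n = len(A)
    D_min = [[min(A[i][j], B[i][j]) for j in range(n)] for i in range(n)]
    D_max = [[max(A[i][j], B[i][j]) for j in range(n)] for i in range(n)]
    return mst_edges(D_min), mst_edges(D_max)


D1 = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 2], [5, 4, 2, 0]]
D2 = [[0, 2, 2, 6], [2, 0, 5, 3], [2, 5, 0, 1], [6, 3, 1, 0]]
E_min, E_max = min_max_msts(D1, D2)
print(E_min)
print(E_max)
```

Because D_min is no larger than D_max entry by entry, the total weight of E_min can never exceed that of E_max. This gives a cheap sanity check on any implementation of these steps.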