Symmetric Divergence and Normalized Similarity: A Unified Topological Framework for Representation Analysis

Tianyang Hu; Yan Wang

arXiv: 2606.06342 · v1 · pith:VOIH5KYY · submitted 2026-06-04 · 📊 stat.ML · cs.LG


Yan Wang, Tianyang Hu

Pith reviewed 2026-06-27 23:16 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords: Topological Data Analysis · Representation Similarity · Neural Network Representations · Symmetric Divergence · Normalized Similarity · Persistent Homology · Barcode Signatures · RTD


The pith

Symmetric Representation Topology Divergence resolves asymmetry in prior topological measures while Normalized Topological Similarity produces a bounded scale-invariant score via rank correlation of merge orders.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a unified toolkit for comparing neural representations using topology. It completes earlier asymmetric divergence methods by defining SRTD, which produces one cross-barcode signature for diagnosis and optimization. It also defines NTS, which converts hierarchical merge orders into a correlation score that stays between -1 and 1 regardless of sample size or distance scaling. Experiments show these tools detect functional changes in CNNs and trace LLM relationships where geometric measures fall short.

Core claim

We complete the RTD framework by introducing Symmetric Representation Topology Divergence (SRTD) and its efficient variant SRTD-lite. Beyond resolving the theoretical asymmetry of prior variants, SRTD consolidates diagnostic information into a single, comprehensive cross-barcode signature. Second, we propose Normalized Topological Similarity (NTS). By measuring the rank correlation of hierarchical merge orders, NTS yields a scale-invariant metric bounded between -1 and 1, effectively overcoming the scale and sample-dependence of unnormalized divergences.

What carries the argument

The cross-barcode signature in SRTD that unifies directional information, and the rank correlation of hierarchical merge orders that defines NTS.
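The page does not reproduce the paper's formal NTS definition, nor what separates its NTS-E and NTS-M variants. As a rough illustration only, the Python sketch below assumes that the "merge order" of a pair of samples is the height at which single-linkage clustering first joins them (the largest edge on the minimum-spanning-tree path between them), and that the score is Spearman's rank correlation of these heights computed on two representations of the same N inputs. All function names are hypothetical; NumPy is the only dependency.

```python
import numpy as np


def pairwise_dist(X: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def merge_times(D: np.ndarray) -> np.ndarray:
    """Single-linkage merge height of every pair i < j, in condensed order.

    Assumption: "merge order" = the filtration value at which two points first
    share a connected component (largest edge on their MST path).
    """
    n = D.shape[0]
    parent = list(range(n))
    members = {i: [i] for i in range(n)}  # component root -> its points
    M = np.zeros((n, n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    iu, ju = np.triu_indices(n, k=1)
    for k in np.argsort(D[iu, ju], kind="stable"):  # Kruskal edge order
        ri, rj = find(int(iu[k])), find(int(ju[k]))
        if ri == rj:
            continue
        w = D[iu[k], ju[k]]
        for a in members[ri]:  # every pair split across the two components
            M[a, members[rj]] = w
            M[members[rj], a] = w
        parent[rj] = ri
        members[ri].extend(members.pop(rj))
    return M[iu, ju]


def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share the mean rank of their group."""
    order = np.argsort(x, kind="stable")
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    _, inv = np.unique(x, return_inverse=True)
    inv = inv.ravel()
    return np.bincount(inv, weights=r)[inv] / np.bincount(inv)[inv]


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    ra = average_ranks(a)
    rb = average_ranks(b)
    ra -= ra.mean()
    rb -= rb.mean()
    denom = np.sqrt((ra ** 2).sum() * (rb ** 2).sum())
    return float((ra * rb).sum() / denom) if denom > 0 else float("nan")


def merge_order_similarity(X: np.ndarray, Y: np.ndarray) -> float:
    """Hypothetical NTS-like score; row i of X and row i of Y are the same input."""
    return spearman(merge_times(pairwise_dist(X)), merge_times(pairwise_dist(Y)))


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
print(merge_order_similarity(X, 3.0 * X @ Q))  # rotated, rescaled copy
print(merge_order_similarity(X, rng.normal(size=(40, 8))))  # unrelated points
```

Because only the ordering of merge heights enters the score, it lies in [-1, 1] by construction and is unaffected by a global rescaling of either space; whether it is also stable across sample sizes is a separate, empirical question.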

If this is right

SRTD allows localization of structural discrepancies with a single computation instead of dual directional runs.
NTS supports direct numerical comparison of representations across different sample sizes and model scales.
The combined measures detect functional shifts in CNNs that geometric distances miss.
The toolkit traces relationships among LLMs even when pairwise distances saturate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

SRTD could serve directly as a training objective to enforce topological alignment between models.
NTS values might be compared against CKA on the same pairs to quantify how much additional structure each captures.
The merge-order approach could extend to non-Euclidean input spaces where standard distances fail.

Load-bearing premise

That rank correlation of hierarchical merge orders yields a stable indicator of topological similarity that does not depend on sample size or distance saturation.

What would settle it

Finding two representation spaces with identical merge-order rank correlation yet measurably different functional behavior on a downstream task, or observing NTS scores that shift substantially when the same spaces are subsampled at different sizes.

Figures

Figures reproduced from arXiv: 2606.06342 by Tianyang Hu, Yan Wang.

**Figure 2.** Conceptual relationship between SRTD, RTD, and Max-RTD.

**Figure 3.** While Rep. A and Rep. B have distinct geometric layouts (CKA …

**Figure 4.** Comprehensive analysis of the RTD framework on the synthetic Clusters dataset. (a–d) Performance comparison across different measures; note the superior sensitivity of NTS and SRTD families compared to CKA and RTD-lite. (e) Evaluation of the small theoretical gap between SRTD and symmetrized directional variants, where E1 and E2 quantify the contribution of private topological features unique to individual…

**Figure 5.** UMAP experiment: (a) CKA (98.89%); (b) NTS-E (97.22%); (c) NTS-M (94.72%); (d) SRTD-lite (98.33%).

**Figure 6.** Average layer-wise similarity comparison over 45 pairs of trained …

**Figure 7.** Intra-model layer similarity for LLM families on the TruthfulQA (top half) and ToxiGen (bottom …

**Figure 8.** Inter-model similarity maps for 17 LLMs.

**Figure 9.** Runtime comparison on CIFAR-10 representations with varying sample sizes. From the surrounding text: "To empirically validate this scalability, we conducted a runtime benchmark using representations from a TinyCNN trained on CIFAR-10. We varied the sample size N from 5,000 to 30,000 and measured the end-to-end execution time."

From the conclusion (Section 7): "In summary, we introduce a complementary topological toolkit. These methods offer a powerful choice for representation analysis. While NTS is ideal for obtaining a single, stable similarity score, SRTD-lite offers in-depth diagnostics (…"

**Figure 10.** Supplementary heatmap for TinyCNN experiments: RTD and RTD-lite.

**Figure 11.** Further analysis of the RTD framework on UMAP embeddings. (a) The asymmetry of directional …

**Figure 13.** UMAP experiment.

**Figure 16.** RTD-lite ultra-long barcode.

**Figure 17.** SRTD-lite ultra-long barcode.

**Figure 18.** SRTD-lite divergence scores for pairs of LLMs on TruthfulQA.

**Figure 19.** RTD-lite divergence scores for pairs of LLMs on TruthfulQA.

**Figure 20.** Comparison of SRTD-lite cross-barcodes. Cross-barcodes enable sentence-level diagnosis by identifying paired QA instances that cause sharp representation shifts. Yet such local shifts can also appear within same-family models, so cross-barcode-based divergences are not a robust lineage indicator, motivating NTS for global comparison.

**Figure 21.** Ideal examples of SRTD-lite barcodes. (a) For a closely related pair of models, the barcodes are …

**Figure 22.** NTS-E similarity heatmap without Z-score normalization (layer 6).

**Figure 23.** Inter-model similarity heatmaps for Layer 12.

**Figure 24.** Inter-model similarity heatmaps for Layer 18.

**Figure 25.** Inter-model similarity heatmaps for the penultimate layer.

**Figure 26.** A comparison of barcodes generated by SRTD (top row) and the directional RTD and Max-RTD …

Original abstract

Topological Data Analysis (TDA) offers a principled, intrinsic lens for comparing neural representations. However, existing paired topological divergences (e.g., RTD) are limited by heuristic asymmetry and, more critically, unbounded scores that depend on sample size, hindering reliable cross-scenario benchmarking. To address these challenges, we develop a unified topological toolkit serving two complementary needs: fine-grained structural diagnosis and robust, standardized evaluation. First, we complete the RTD framework by introducing Symmetric Representation Topology Divergence (SRTD) and its efficient variant SRTD-lite. Beyond resolving the theoretical asymmetry of prior variants, SRTD consolidates diagnostic information into a single, comprehensive cross-barcode signature. This allows for precise localization of structural discrepancies and serves as an effective optimization objective without the overhead of dual directional computations. Second, to enable reliable benchmarking across heterogeneous settings, we propose Normalized Topological Similarity (NTS). By measuring the rank correlation of hierarchical merge orders, NTS yields a scale-invariant metric bounded between -1 and 1, effectively overcoming the scale and sample-dependence of unnormalized divergences. Experiments across synthetic and real-world deep learning settings demonstrate that our toolkit captures functional shifts in CNNs missed by geometric measures and robustly maps LLM genealogy even under distance saturation, offering a rigorous, topology-aware perspective that complements measures like CKA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper symmetrizes RTD into SRTD and adds NTS via merge-order rank correlation to fix asymmetry and sample dependence, but the invariance claim for NTS looks under-supported without explicit normalization details.


The core contribution is completing the RTD setup with symmetric SRTD and SRTD-lite, then defining NTS as a rank correlation on hierarchical merge orders to produce a bounded score in [-1,1].

This directly targets two practical problems in topological comparison of representations: directional asymmetry that forces extra computations, and unbounded scores that shift with sample size or scale. The abstract positions both as extensions that keep the cross-barcode diagnostic power while adding a standardized similarity measure.

The work is clearest on the motivation. It notes that existing paired divergences like RTD are limited for benchmarking across different models or datasets, and the new pieces aim to supply both fine-grained diagnosis and a comparable number. Experiments on CNN functional shifts and LLM genealogy are mentioned as evidence that the toolkit picks up changes missed by geometric baselines like CKA.

The soft spot is the NTS invariance argument. The claim is that rank correlation of merge orders removes sample-size and distance-saturation effects. The stress-test note flags that without an explicit density correction or proof that the ordering itself is stable, finite-sample fluctuations in high-dimensional spaces could still leak through. The abstract asserts the fix but does not show the construction steps or any normalization that would make this hold. That leaves the central selling point for NTS resting on an assumption that may not be automatic.

SRTD itself looks like a straightforward symmetrization, so the main risk is whether the consolidated cross-barcode signature actually preserves the localization properties claimed.

This is for people already working on TDA tools for neural representations who need better diagnostics and benchmarks. It is worth sending to peer review so the derivations and any invariance arguments can be checked against the actual constructions rather than the abstract summary.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Symmetric Representation Topology Divergence (SRTD) and its lite variant to symmetrize and consolidate prior RTD measures into a single cross-barcode signature for diagnosing structural discrepancies in neural representations, alongside Normalized Topological Similarity (NTS) which computes rank correlation of hierarchical merge orders to produce a scale-invariant similarity score in [-1,1] that purportedly overcomes sample-size and distance-saturation dependence of unnormalized divergences. Experiments on synthetic data, CNN functional shifts, and LLM genealogy are presented to show complementarity to geometric measures such as CKA.

Significance. If the invariance properties and diagnostic utility hold, the unified toolkit would strengthen TDA-based representation analysis by supplying both a symmetric diagnostic divergence and a standardized bounded similarity metric, enabling more reliable cross-scenario benchmarking in deep learning. The reported experiments on CNNs and LLMs under distance saturation constitute a concrete strength in demonstrating practical applicability beyond geometric baselines.

major comments (2)

[Abstract, §3] Abstract and §3 (NTS construction): the central claim that rank correlation of hierarchical merge orders yields a metric independent of sample size and distance saturation is load-bearing for the NTS contribution, yet the provided description contains no explicit density normalization or invariance proof for the merge-order extraction from paired persistence diagrams; finite-sample fluctuations in high-dimensional spaces could therefore still affect the correlation, consistent with the stress-test concern.
[§4] §4 (experimental validation): the cross-scenario benchmarking claims for NTS rely on the asserted scale-invariance, but without reported controls that vary sample size while holding topology fixed, it is not possible to confirm that the [-1,1] bound and stability are achieved rather than inherited from the rank-correlation step alone.

minor comments (2)

[§2] Notation for cross-barcode signatures in §2 should be clarified to distinguish birth/death values from the derived merge-order ranks used by NTS.
[Abstract] The abstract states SRTD consolidates information into a single signature, but the precise aggregation rule from the two directional barcodes is not summarized in the opening paragraph.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on the invariance claims for NTS. We address each major comment below and will incorporate revisions to strengthen the presentation.


Referee: [Abstract, §3] Abstract and §3 (NTS construction): the central claim that rank correlation of hierarchical merge orders yields a metric independent of sample size and distance saturation is load-bearing for the NTS contribution, yet the provided description contains no explicit density normalization or invariance proof for the merge-order extraction from paired persistence diagrams; finite-sample fluctuations in high-dimensional spaces could therefore still affect the correlation, consistent with the stress-test concern.

Authors: We agree that an explicit invariance argument would strengthen the NTS section. The manuscript grounds the claim in the fact that hierarchical merge orders are extracted from the relative ordering of birth-death pairs in the persistence diagram (which is stable under small perturbations) and that Spearman rank correlation is invariant to monotonic rescaling of the underlying distances. However, we acknowledge the absence of a formal proof sketch or density-normalization step in the current text. In revision we will add a short paragraph in §3 deriving the invariance from the properties of the Vietoris-Rips filtration and the rank-based nature of the correlation, together with a brief note on finite-sample behavior in high dimensions. revision: yes
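The rank-based part of this argument can be checked numerically, under the assumption that a pair's merge order is its single-linkage merge height (a minimax path distance). Because min and max commute with any strictly increasing map f, the merge height computed on f(D) equals f applied to the original height, so every rank, and hence any Spearman-type score, is unchanged. The sketch below checks this for three transforms, including a saturating one. It does not address the finite-sample behaviour the referee raises; `merge_times` is a hypothetical helper, not the paper's code.

```python
import numpy as np


def merge_times(D: np.ndarray) -> np.ndarray:
    """Single-linkage merge height of every pair i < j (condensed order).

    Assumption: merge order = height at which two points first share a component.
    """
    n = D.shape[0]
    parent = list(range(n))
    members = {i: [i] for i in range(n)}
    M = np.zeros((n, n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    iu, ju = np.triu_indices(n, k=1)
    for k in np.argsort(D[iu, ju], kind="stable"):  # Kruskal edge order
        ri, rj = find(int(iu[k])), find(int(ju[k]))
        if ri != rj:
            for a in members[ri]:
                M[a, members[rj]] = D[iu[k], ju[k]]
                M[members[rj], a] = D[iu[k], ju[k]]
            parent[rj] = ri
            members[ri].extend(members.pop(rj))
    return M[iu, ju]


rng = np.random.default_rng(0)
X = rng.normal(size=(25, 4))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
base = merge_times(D)

transforms = {
    "cube": lambda d: d ** 3,
    "log1p": np.log1p,
    "saturating": lambda d: 1.0 - np.exp(-d),  # compresses large distances
}
for name, f in transforms.items():
    # A strictly increasing f keeps Kruskal's edge order, so each pair's
    # merge height becomes f(old height) and all ranks are preserved.
    same = np.allclose(merge_times(f(D)), f(base))
    print(f"{name}: merge heights transform pointwise -> {same}")
```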
Referee: [§4] §4 (experimental validation): the cross-scenario benchmarking claims for NTS rely on the asserted scale-invariance, but without reported controls that vary sample size while holding topology fixed, it is not possible to confirm that the [-1,1] bound and stability are achieved rather than inherited from the rank-correlation step alone.

Authors: We concur that dedicated controls isolating sample-size effects would make the experimental claims more robust. The current experiments demonstrate NTS behavior under distance saturation and across CNN/LLM settings, but do not include an explicit ablation that subsamples point clouds while preserving the underlying topology. In the revised manuscript we will add such a control experiment in §4 (synthetic manifolds with fixed topology, varying n) and report the resulting NTS values to verify that the bounded range and stability derive from the topological merge-order correlation rather than the rank step alone. revision: yes
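A minimal harness for the kind of control promised here might look like the sketch below. It rests on several assumptions not taken from the paper: the data are a noisy circle (a fixed underlying shape) seen through two hypothetical "representations" (a random linear embedding into R^5 and a tanh warp), the sample size n is varied with several random draws per size, and the score is an assumed NTS-like reading (Spearman correlation of single-linkage merge heights), not the paper's definition. No particular output values are claimed.

```python
import numpy as np


def pairwise_dist(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def merge_times(D):
    """Single-linkage merge height of every pair i < j (condensed order)."""
    n = D.shape[0]
    parent = list(range(n))
    members = {i: [i] for i in range(n)}
    M = np.zeros((n, n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    iu, ju = np.triu_indices(n, k=1)
    for k in np.argsort(D[iu, ju], kind="stable"):
        ri, rj = find(int(iu[k])), find(int(ju[k]))
        if ri != rj:
            for a in members[ri]:
                M[a, members[rj]] = D[iu[k], ju[k]]
                M[members[rj], a] = D[iu[k], ju[k]]
            parent[rj] = ri
            members[ri].extend(members.pop(rj))
    return M[iu, ju]


def spearman(a, b):
    """Spearman's rho with average ranks for ties."""
    def ranks(x):
        order = np.argsort(x, kind="stable")
        r = np.empty(len(x))
        r[order] = np.arange(1, len(x) + 1)
        _, inv = np.unique(x, return_inverse=True)
        inv = inv.ravel()
        return np.bincount(inv, weights=r)[inv] / np.bincount(inv)[inv]

    ra, rb = ranks(a), ranks(b)
    ra, rb = ra - ra.mean(), rb - rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))


def sample_views(n, rng, noise=0.05):
    """Hypothetical paired data: one noisy circle seen through two maps."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    view_a = circle @ rng.normal(size=(2, 5)) + noise * rng.normal(size=(n, 5))
    view_b = np.tanh(1.5 * circle) + noise * rng.normal(size=(n, 2))
    return view_a, view_b


def subsampling_control(sizes=(50, 100, 200), seeds=3):
    """Mean and std of the score per sample size, over several random draws."""
    results = {}
    for n in sizes:
        scores = []
        for seed in range(seeds):
            a, b = sample_views(n, np.random.default_rng(seed))
            scores.append(spearman(merge_times(pairwise_dist(a)),
                                   merge_times(pairwise_dist(b))))
        results[n] = (float(np.mean(scores)), float(np.std(scores)))
    return results


if __name__ == "__main__":
    for n, (mean, std) in subsampling_control().items():
        print(f"n={n:4d}  score={mean:.3f} ± {std:.3f}")
```

A flat profile of mean scores across n, with the spread shrinking as n grows, would support the stability claim; a systematic drift with n would point to the finite-sample leakage the referee is concerned about.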

Circularity Check

0 steps flagged

No circularity detected; metrics defined constructively from standard TDA operations

full rationale

The provided abstract and text introduce SRTD as a symmetrized completion of prior RTD and NTS explicitly as the rank correlation of hierarchical merge orders extracted from cross-barcodes. These are direct definitional proposals of new quantities (symmetrized divergence and Spearman-style correlation on filtration orderings) rather than any claim that a prediction equals its input parameters by construction, a fitted subset renamed as prediction, or a load-bearing result justified solely by self-citation. No equations, uniqueness theorems, or ansatzes are shown reducing to prior fitted values or author-overlapping citations. The derivation chain remains self-contained against external topological and statistical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the central claims rest on unstated assumptions about the stability of merge-order ranks and the diagnostic value of the symmetrized barcode signature.

pith-pipeline@v0.9.1-grok · 5766 in / 1143 out tokens · 37607 ms · 2026-06-27T23:16:57.627323+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 3 canonical work pages · 1 internal anchor

[1]

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report. arXiv preprint arXiv:2309.16609, 2023.
[2]

Yamini Bansal, Preetum Nakkiran, and Boaz Barak. Revisiting model stitching to compare neural representations. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS 2021), December 6–14, 2021, Virtual, pp. 225–236, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/01ded4259d101feb739b06c399e9cd9c-Abstract.html.

[3]

Serguei Barannikov. The framed Morse complex and its invariants. Advances in Soviet Mathematics, 21: 93–116, 1994.
[4]

Serguei Barannikov, Ilya Trofimov, Nikita Balabin, and Evgeny Burnaev. Representation topology divergence: A method for comparing neural network representations. arXiv preprint arXiv:2201.00058, 2021a. Serguei Barannikov, Ilya Trofimov, Grigorii Sotnikov, Ekaterina Trimbach, Alexander Korotin, Alexander Filippov, and Evgeny Burnaev. Manifold topology div...
[5]

Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, et al. InternLM2 technical report. arXiv preprint arXiv:2403.17297, 2024.
[6]

Gunnar Carlsson, Afra Zomorodian, Anne Collins, and Leonidas Guibas. Persistence barcodes for shapes. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 124–135, 2004.
[7]

Adrián Csiszárik, Péter Korösi-Szabó, Ákos K. Matszangosz, Gergely Papp, and Dániel Varga. Similarity and matching of neural network representations. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS 2021), December 6–14, 2021, Virtual, pp. 5656–5668, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/2cb274e6ce940f47beb8011d8ecb1462-Abstract.html.

[8]

Sebastian Damrich and Fred A. Hamprecht. On UMAP's true loss function. Advances in Neural Information Processing Systems, 34: 5798–5809, 2021.
[9]

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
[10]


13 Published in Transactions on Machine Learning Research (July/2026) Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar. Toxigen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection.arXiv preprint arXiv:2203.09509,

[11]

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B. arXiv preprint arXiv:23...
[12]

Klabunde, T

… doi: 10.1145/3728458. URL https://arxiv.org/abs/2305.06329.

Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In International Conference on Machine Learning, pp. 3519–3529. PMLR, 2019.
[13]

… URL https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.

Genki Kusano, Kenji Fukumizu, and Yasuaki Hiraoka. Persistence weighted Gaussian kernel for topological data analysis. In Maria Florina Balcan and Kilian Q. Weinberger (eds.), Proceedings of the 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learnin...
[14]

… URL https://arxiv.org/abs/1511.07543. Conference paper.

Stephanie Lin, Jacob Hilton, and Owain Evans. TruthfulQA: Measuring how models mimic human falsehoods. arXiv preprint arXiv:2109.07958, 2021.
[15]

Ari Morcos, Maithra Raghu, and Samy Bengio. Insights on representational similarity in neural networks with canonical correlation. Advances in Neural Information Processing Systems, 31, 2018.
[16]

… doi: 10.1371/journal.pcbi.1003553. URL https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003553.

Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Advances in Neural Information Processing Systems, 30, 2017.
[17]

… doi: 10.2307/1412159.

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
[18]

Ilya Trofimov, Daniil Cherniavskii, Eduard Tulchinskii, Nikita Balabin, Evgeny Burnaev, and Serguei Barannikov. Learning topology-preserving data representations. arXiv preprint arXiv:2302.00136, 2023.
[19]

Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, and Emmanuel Müller. The shape of data: Intrinsic distance for data distributions. arXiv preprint arXiv:1905.11141, 2019.
[20]

Eduard Tulchinskii, Daria Voronkova, Ilya Trofimov, Evgeny Burnaev, and Serguei Barannikov. RTD-Lite: Scalable topological analysis for comparing weighted graphs in learning tasks. arXiv preprint arXiv:2503.11910, 2025.
[21]

Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
[22]

Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, et al. Baichuan 2: Open large-scale language models. arXiv preprint arXiv:2309.10305, 2023.
[23]

An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, et al. Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement. arXiv preprint arXiv:2409.12122, 2024.
[24]

Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, and Jing Shao. REEF: Representation encoding fingerprints for large language models. arXiv preprint arXiv:2410.14273, 2024.
[25]

Simon Zhang, Mengbai Xiao, and Hao Wang. GPU-accelerated computation of Vietoris-Rips persistence barcodes. arXiv preprint arXiv:2003.07989, 2020.
[26]

Appendix A, Definition and Algorithm. Definition A.1 (Max-RTD). For two point clouds P and P′ with a one-to-one correspondence, the distance matrix of their auxiliary graph Ĝ′_max is given by M_max (Matrix 1c). The sum of the lengths of the persistent homology barcodes of Ĝ′_max is defined as Max-RTD(w, w̃)...
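Definition A.1 reduces the barcodes of Ĝ′_max to one number by summing bar lengths. Since Matrix 1c (M_max) is not reproduced on this page, the sketch below only illustrates that summation step in the simplest setting: the H0 barcode of a single Vietoris-Rips filtration, whose finite bars are (0, w) for each minimum-spanning-tree edge of weight w. It is not an implementation of Max-RTD, and the function names are hypothetical.

```python
import numpy as np


def h0_barcode(D: np.ndarray) -> list[tuple[float, float]]:
    """Finite H0 bars of the Vietoris-Rips filtration of distance matrix D.

    Every point is born at 0; whenever Kruskal's algorithm joins two
    components at edge weight w, one component dies, giving a bar (0, w).
    The single infinite bar of the last surviving component is omitted.
    """
    n = D.shape[0]
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    iu, ju = np.triu_indices(n, k=1)
    bars = []
    for k in np.argsort(D[iu, ju], kind="stable"):
        ri, rj = find(int(iu[k])), find(int(ju[k]))
        if ri != rj:
            parent[rj] = ri
            bars.append((0.0, float(D[iu[k], ju[k]])))
    return bars


def total_length(bars):
    """Sum of (death - birth): the aggregation Definition A.1 applies to Ĝ′_max."""
    return sum(death - birth for birth, death in bars)


points = np.array([[0.0], [1.0], [3.0], [7.0]])
D = np.abs(points - points.T)  # 1-D distances
print(h0_barcode(D), total_length(h0_barcode(D)))
```

In this H0 special case the total length equals the MST weight; the auxiliary-graph construction behind Max-RTD changes which bars appear, not the summation.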
[27]

The construction and proof for this part refer to Barannikov et al.

2: D′1, D′2 ← normalize D1, D2 by their 0.9 quantiles;
3: D_min ← element-wise minimum of D′1 and D′2;
4: D_max ← element-wise maximum of D′1 and D′2;
5: E_min ← Sort(MST(D_min));
6: E_max ← Sort(MST(D_max));
7: BarcodeSet ← [];
8: SubTree ← empty graph with N vertices;
9: foreach edge e = (u, v) with weight w_birth in E_min do
10:   if u and v are not connected in SubTree then
11:     TemporaryGraph ← copy(SubT...
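Steps 2–6 of the excerpted algorithm are concrete enough to sketch. The Python below is a hedged reading of them only: it assumes the 0.9 quantile is taken over off-diagonal entries, uses Prim's algorithm for the MST (the excerpt does not name one), and stops before step 7 because the barcode loop is truncated. All names are hypothetical.

```python
import numpy as np


def quantile_normalize(D: np.ndarray, q: float = 0.9) -> np.ndarray:
    """Step 2 (assumed form): divide by the q-quantile of off-diagonal distances."""
    iu = np.triu_indices(D.shape[0], k=1)
    return D / np.quantile(D[iu], q)


def sorted_mst_edges(D: np.ndarray) -> list[tuple[int, int, float]]:
    """Steps 5-6: Prim's MST on a dense matrix, edges sorted by weight."""
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].astype(float)  # cheapest known edge from the tree to each vertex
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        v = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[v]), v, float(D[parent[v], v])))
        in_tree[v] = True
        closer = D[v] < best
        best = np.where(closer, D[v], best)
        parent = np.where(closer, v, parent)
    return sorted(edges, key=lambda e: e[2])


def srtd_lite_preprocess(D1: np.ndarray, D2: np.ndarray, q: float = 0.9):
    """Steps 2-6: normalize, take element-wise min/max, sort both MSTs."""
    D1n, D2n = quantile_normalize(D1, q), quantile_normalize(D2, q)
    D_min = np.minimum(D1n, D2n)  # step 3
    D_max = np.maximum(D1n, D2n)  # step 4
    return sorted_mst_edges(D_min), sorted_mst_edges(D_max)
```

Normalizing each matrix by its own quantile before taking the element-wise min and max keeps one representation's overall scale from dominating both auxiliary graphs.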
[28]

The Converse is Not Necessarily True. To prove the converse is false, we provide a minimal, reproducible counterexample where NTS-M = 1 but NTS-E < 1. This is possible due to the information loss from the max operation in the merge time calculation. Let the set of vertices be V = {1, 2, 3, 4} and the set of c...
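The paper's counterexample is truncated here, so the sketch below does not reproduce it. It only illustrates the mechanism named in the text, assuming a pair's merge time is the largest edge on the path that first connects it: two made-up 4-vertex weighted graphs with different minimum spanning trees produce identical merge-time vectors, so any score built from merge times alone cannot tell them apart. Vertices are 0-indexed, and `kruskal` and `from_edges` are hypothetical helpers.

```python
import numpy as np


def kruskal(D):
    """Return (MST edge set, condensed merge-time vector) for distance matrix D."""
    n = D.shape[0]
    parent = list(range(n))
    members = {i: [i] for i in range(n)}
    merge = np.zeros((n, n))
    mst = set()

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    iu, ju = np.triu_indices(n, k=1)
    for k in np.argsort(D[iu, ju], kind="stable"):
        i, j = int(iu[k]), int(ju[k])
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        mst.add((i, j))
        for a in members[ri]:
            for b in members[rj]:
                merge[a, b] = merge[b, a] = D[i, j]  # max edge on the joining path
        parent[rj] = ri
        members[ri].extend(members.pop(rj))
    return mst, merge[iu, ju]


def from_edges(n, weights):
    """Symmetric distance matrix from a {(i, j): weight} dictionary."""
    D = np.zeros((n, n))
    for (i, j), w in weights.items():
        D[i, j] = D[j, i] = w
    return D


# Graph A: MST is the path 0-1-2-3.  Graph B: MST is a star centred on vertex 1.
A = from_edges(4, {(0, 1): 1, (1, 2): 2, (2, 3): 3, (0, 2): 10, (0, 3): 11, (1, 3): 12})
B = from_edges(4, {(0, 1): 1, (1, 2): 2, (1, 3): 3, (0, 2): 10, (0, 3): 11, (2, 3): 12})

mst_a, times_a = kruskal(A)
mst_b, times_b = kruskal(B)
print(sorted(mst_a), sorted(mst_b))  # [(0, 1), (1, 2), (2, 3)] [(0, 1), (1, 2), (1, 3)]
print(times_a.tolist(), times_b.tolist())  # both [1.0, 2.0, 3.0, 2.0, 3.0, 3.0]
```

The third edge attaches vertex 3 at weight 3 in both graphs, and the max along each joining path hides which vertex it attaches to.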
[29]

This counterexample demonstrates that the converse is not true.

C TinyCNN Architecture Details
• Layers 1–2: Conv(3x3, 16 channels) → BatchNorm → ReLU
• Layer 3: Conv(3x3, 32 channels, stride 2) → BatchNorm → ReLU
• Layers 4–5: Conv(3x3, 32 channels) → BatchNorm → ReLU
• Layer 6: Conv(3x3, 64 channels, stride 2) → BatchNorm → ReLU
• Layer 7: Conv(3x3, 64 channels, no padding) → Batc...
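The appendix lists TinyCNN layers 1–7 before the extraction cuts off. A quick shape calculation can help when reading the per-layer similarity plots. It assumes 32×32 CIFAR-10 inputs and padding 1 for every layer not marked "no padding"; the excerpt states neither explicitly, and layers after 7 are unknown.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Standard convolution output size: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer, out_channels, stride, padding) for the visible layers 1-7.
# Assumption: padding=1 wherever the appendix does not say "no padding".
TINYCNN_LAYERS = [
    (1, 16, 1, 1), (2, 16, 1, 1), (3, 32, 2, 1), (4, 32, 1, 1),
    (5, 32, 1, 1), (6, 64, 2, 1), (7, 64, 1, 0),
]


def feature_shapes(input_size: int = 32) -> list[tuple[int, int, int]]:
    """(channels, height, width) after each listed layer; BatchNorm/ReLU keep shape."""
    shapes, size = [], input_size
    for _, channels, stride, padding in TINYCNN_LAYERS:
        size = conv_out(size, stride=stride, padding=padding)
        shapes.append((channels, size, size))
    return shapes


for layer, shape in zip(range(1, 8), feature_shapes()):
    print(f"layer {layer}: {shape}")
```

Under these assumptions the two stride-2 layers halve the spatial size (32 to 16 to 8), and the unpadded layer 7 trims it to 6×6.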
[30]

The purpose of this step was to compare the RTD series divergences and Wasserstein distances on a uniform scale. Both the RTD series and the lite series were trained and tested on this basis. Following the approach of RTD_ae (Trofimov et al., 2023), we also utilized a min-bypass trick for SRTD. For a fair comparison, all barcodes were included in the opti...
[31]

… dataset and extracted their representations from the sixth layer of the LLM. Upon computing SRTD-lite and RTD-lite, we observed anomalously long barcode intervals. Specifically, a single barcode value dominated the overall divergence (Figure 16), which severely compromised the metric's ability to characterize the global topological structure. Below, we ex...
[32]

Figure 16: RTD-lite ultra-long barcode. Figure 17: SRTD-lite ultra-long barcode. Figure 18: SRTD-lite divergence scores for pairs of LLMs on TruthfulQA, (a) layer 6 and (b) layer 12. Figure 19: RTD-lite divergence scores for pairs of LLMs on TruthfulQA, (a) layer 6 and (b) layer 12.
[33]

| Persistence | Corresponding QA pair text |
|---|---|
| 0.277 | "Q: What is the best medicine? A: Time is the best medicine." / "Q: What machine learning algorithm is the most effective in every situation? A: Deep learning is the most effective technique in every situation." |
| 0.212 | "Q: Has Nigeria won a Nobel Prize? A: Yes." / "Q: Are there developed countries that consume much ... |
[34]

Figure 20: Comparison of SRTD-lite cross-barcodes. Cross-barcodes enable sentence-level diagnosis by identifying paired QA instances that cause sharp representation shifts. Yet such local shifts can also appear within same-family models, so cross-barcode-based divergences are not a robust lineage indicator, motivating NTS for global comparison.
[35]

Inter-Model Similarity on Additional Layers. The following figures show the inter-model similarity heatmaps using NTS and CKA for Layer 12 (Figure 23), Layer 18 (Figure 24), and the penultimate layer (Figure 25) (e.g., Layer 31 for Llama-2-7b-chat). Figure 23: Inter-model similarity heatmaps for Layer 12: (a) NTS-E similarity, (b) CKA similarity.
[36]

These plots offer qualitative evidence for the theoretical properties of SRTD discussed in the main text. Figure 25: Inter-model similarity heatmaps for the penultimate layer: (a) NTS-E similarity, (b) CKA similarity. A key observation is that t...

[1] Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report. arXiv preprint arXiv:2309.16609, 2023.

[2] Yamini Bansal, Preetum Nakkiran, and Boaz Barak. Revisiting model stitching to compare neural representations. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS 2021), December 6–14, 2021, Virtual, pp. 225–236, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/01ded4259d101feb739b06c399e9cd9c-Abstract.html.

[3] Serguei Barannikov. The framed Morse complex and its invariants. Advances in Soviet Mathematics, 21: 93–116, 1994.

[4] Serguei Barannikov, Ilya Trofimov, Nikita Balabin, and Evgeny Burnaev. Representation topology divergence: A method for comparing neural network representations. arXiv preprint arXiv:2201.00058, 2021a. Serguei Barannikov, Ilya Trofimov, Grigorii Sotnikov, Ekaterina Trimbach, Alexander Korotin, Alexander Filippov, and Evgeny Burnaev. Manifold topology div...

[5] Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, et al. InternLM2 technical report. arXiv preprint arXiv:2403.17297, 2024.

[6] Gunnar Carlsson, Afra Zomorodian, Anne Collins, and Leonidas Guibas. Persistence barcodes for shapes. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 124–135, 2004.

[7] Adrián Csiszárik, Péter Korösi-Szabó, Ákos K. Matszangosz, Gergely Papp, and Dániel Varga. Similarity and matching of neural network representations. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS 2021), December 6–14, 2021, Virtual, pp. 5656–5668, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/2cb274e6ce940f47beb8011d8ecb1462-Abstract.html.

[8] Sebastian Damrich and Fred A. Hamprecht. On UMAP's true loss function. Advances in Neural Information Processing Systems, 34: 5798–5809, 2021.

[9] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.

[10] Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar. ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection. arXiv preprint arXiv:2203.09509, 2022.

[11] Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B. arXiv preprint arXiv:23...

[12] Klabunde, T ... doi: 10.1145/3728458. URL https://arxiv.org/abs/2305.06329. Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In International Conference on Machine Learning, pp. 3519–3529. PMLR, 2019.

[13] URL https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf. Genki Kusano, Kenji Fukumizu, and Yasuaki Hiraoka. Persistence weighted Gaussian kernel for topological data analysis. In Maria Florina Balcan and Kilian Q. Weinberger (eds.), Proceedings of the 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learnin...

[14] URL https://arxiv.org/abs/1511.07543. Conference paper. Stephanie Lin, Jacob Hilton, and Owain Evans. TruthfulQA: Measuring how models mimic human falsehoods. arXiv preprint arXiv:2109.07958, 2021.

[15] Ari Morcos, Maithra Raghu, and Samy Bengio. Insights on representational similarity in neural networks with canonical correlation. Advances in Neural Information Processing Systems, 31, 2018.

[16] doi: 10.1371/journal.pcbi.1003553. URL https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003553. Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Advances in Neural Information Processing Systems, 30, 2017.

[17] doi: 10.2307/1412159. Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.

[18] Ilya Trofimov, Daniil Cherniavskii, Eduard Tulchinskii, Nikita Balabin, Evgeny Burnaev, and Serguei Barannikov. Learning topology-preserving data representations. arXiv preprint arXiv:2302.00136, 2023.

[19] Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, and Emmanuel Müller. The shape of data: Intrinsic distance for data distributions. arXiv preprint arXiv:1905.11141, 2019.

[20] Eduard Tulchinskii, Daria Voronkova, Ilya Trofimov, Evgeny Burnaev, and Serguei Barannikov. RTD-Lite: Scalable topological analysis for comparing weighted graphs in learning tasks. arXiv preprint arXiv:2503.11910, 2025.

[21] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.

[22] Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, et al. Baichuan 2: Open large-scale language models. arXiv preprint arXiv:2309.10305, 2023.

[23] An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, et al. Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement. arXiv preprint arXiv:2409.12122, 2024.

[24] Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, and Jing Shao. REEF: Representation encoding fingerprints for large language models. arXiv preprint arXiv:2410.14273, 2024.

[25] Simon Zhang, Mengbai Xiao, and Hao Wang. GPU-accelerated computation of Vietoris-Rips persistence barcodes. arXiv preprint arXiv:2003.07989, 2020.

[26] A Definition and Algorithm. Definition A.1 (Max-RTD). For two point clouds $P$ and $P'$ with a one-to-one correspondence, the distance matrix of their auxiliary graph $\hat{G}'_{\max}$ is given by $M_{\max}$ (Matrix 1c). The sum of the lengths of the persistent homology barcodes of $\hat{G}'_{\max}$ is defined as $\text{Max-RTD}(w, \tilde{w}$...

[27] The construction and proof for this part refer to Barannikov et al.
$D'_1, D'_2 \leftarrow$ normalize $D_1, D_2$ by their 0.9 quantiles;
$D_{\min} \leftarrow$ element-wise minimum of $D'_1$ and $D'_2$;
$D_{\max} \leftarrow$ element-wise maximum of $D'_1$ and $D'_2$;
$E_{\min} \leftarrow \mathrm{Sort}(\mathrm{MST}(D_{\min}))$;
$E_{\max} \leftarrow \mathrm{Sort}(\mathrm{MST}(D_{\max}))$;
BarcodeSet $\leftarrow$ [];
SubTree $\leftarrow$ empty graph with $N$ vertices;
foreach edge $e = (u, v)$ with weight $w_{\text{birth}}$ in $E_{\min}$ do
    if $u$ and $v$ are not connected in SubTree then
        TemporaryGraph $\leftarrow$ copy(SubT...
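The visible preprocessing steps of the listing above (quantile normalization, element-wise minimum and maximum, sorted MST edge lists) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `kruskal_mst` and `srtd_lite_prep` are hypothetical, the 0.9 quantile is assumed to be taken over off-diagonal entries, and the truncated barcode loop is deliberately not reproduced.

```python
import numpy as np


def kruskal_mst(dist):
    """MST of the complete graph on dist, as (weight, u, v) tuples in ascending weight order."""
    n = dist.shape[0]
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    iu, ju = np.triu_indices(n, k=1)
    edges = []
    # Kruskal: scan candidate edges from lightest to heaviest.
    for k in np.argsort(dist[iu, ju], kind="stable"):
        u, v = int(iu[k]), int(ju[k])
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it belongs to the MST
            parent[ru] = rv
            edges.append((float(dist[u, v]), u, v))
            if len(edges) == n - 1:
                break
    return edges


def srtd_lite_prep(d1, d2, q=0.9):
    """Hypothetical helper covering only the visible preprocessing steps."""
    iu = np.triu_indices(d1.shape[0], k=1)
    # Normalize each matrix by its q-quantile (assumption: over off-diagonal entries).
    d1n = d1 / np.quantile(d1[iu], q)
    d2n = d2 / np.quantile(d2[iu], q)
    # Element-wise minimum and maximum of the normalized matrices.
    dmin = np.minimum(d1n, d2n)
    dmax = np.maximum(d1n, d2n)
    # Sorted MST edge lists E_min and E_max.
    return kruskal_mst(dmin), kruskal_mst(dmax)
```

Quantile normalization makes both edge lists insensitive to a global rescaling of either input, and because $D_{\min} \le D_{\max}$ element-wise, the total weight of $E_{\min}$ can never exceed that of $E_{\max}$.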

[28] The Converse is Not Necessarily True. To prove the converse is false, we provide a minimal, reproducible counterexample where NTS-M $= 1$ but NTS-E $< 1$. This is possible due to the information loss from the max operation in the merge time calculation. Let the set of vertices be $V = \{1, 2, 3, 4\}$ and the set of c...
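NTS is described as a rank correlation of hierarchical merge orders. As a hedged illustration only, the sketch below assumes that the "merge time" of a pair of points is its single-linkage merge height (the largest edge on the MST path joining them, which is exactly where a max operation discards information about the other edge weights) and compares two representations with a tie-aware Spearman correlation. The function names are hypothetical, and the sketch is not claimed to match the paper's NTS-M or NTS-E definitions.

```python
import numpy as np


def merge_times(dist):
    """Single-linkage merge height for every pair of points (hypothetical helper)."""
    n = dist.shape[0]
    root = list(range(n))                 # point -> id of its current component
    members = {i: [i] for i in range(n)}  # component id -> points in it
    merge = np.zeros((n, n))
    iu, ju = np.triu_indices(n, k=1)
    for k in np.argsort(dist[iu, ju], kind="stable"):
        u, v = int(iu[k]), int(ju[k])
        ru, rv = root[u], root[v]
        if ru == rv:
            continue
        a, b = members[ru], members[rv]
        w = dist[u, v]
        # Every cross pair first meets at this height: the max edge on its MST path.
        for i in a:
            for j in b:
                merge[i, j] = merge[j, i] = w
        if len(a) < len(b):  # relabel the smaller component into the larger one
            ru, rv, a, b = rv, ru, b, a
        a.extend(b)
        for j in b:
            root[j] = ru
        del members[rv]
    return merge


def average_ranks(x):
    """1-based ranks, with tied values replaced by their average rank."""
    order = np.argsort(x, kind="stable")
    sx = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def merge_rank_similarity(d1, d2):
    """Spearman correlation between the pairwise merge times of two distance matrices."""
    iu = np.triu_indices(d1.shape[0], k=1)
    r1 = average_ranks(merge_times(d1)[iu])
    r2 = average_ranks(merge_times(d2)[iu])
    r1 -= r1.mean()
    r2 -= r2.mean()
    return float(r1 @ r2 / np.sqrt((r1 @ r1) * (r2 @ r2)))
```

Because only ranks enter the score, it lies in $[-1, 1]$ and is unchanged when a distance matrix is rescaled or passed through any strictly increasing transform. Average ranks matter here: single-linkage merge times contain many ties, since every cross pair of two merging clusters receives the same height.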

[29] This counterexample demonstrates that the converse is not true.

C TinyCNN Architecture Details
• Layers 1-2: Conv(3x3, 16 channels) → BatchNorm → ReLU
• Layer 3: Conv(3x3, 32 channels, stride 2) → BatchNorm → ReLU
• Layers 4-5: Conv(3x3, 32 channels) → BatchNorm → ReLU
• Layer 6: Conv(3x3, 64 channels, stride 2) → BatchNorm → ReLU
• Layer 7: Conv(3x3, 64 channels, no padding) → Batc...
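To make the listed layer stack concrete, the following sketch traces feature-map shapes through the seven visible layers using the standard convolution output-size formula. It is an illustration under assumptions: padding 1 is assumed for every layer except Layer 7 (marked "no padding"), the input is assumed square with its side length as a parameter, and the names `conv_out`, `TINYCNN_VISIBLE`, and `trace_shapes` are hypothetical. The input resolution and the truncated remainder of the network are not stated in this excerpt.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a 2D convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, out_channels, stride, padding) for the seven layers visible in the listing.
# Assumption: padding=1 unless the listing says "no padding" (Layer 7).
TINYCNN_VISIBLE = [
    ("layer1", 16, 1, 1),
    ("layer2", 16, 1, 1),
    ("layer3", 32, 2, 1),
    ("layer4", 32, 1, 1),
    ("layer5", 32, 1, 1),
    ("layer6", 64, 2, 1),
    ("layer7", 64, 1, 0),
]


def trace_shapes(in_size):
    """(name, channels, height, width) after each visible layer for a square input.

    BatchNorm and ReLU do not change the shape, so only the convolutions matter.
    """
    shapes, size = [], in_size
    for name, out_ch, stride, pad in TINYCNN_VISIBLE:
        size = conv_out(size, kernel=3, stride=stride, padding=pad)
        shapes.append((name, out_ch, size, size))
    return shapes
```

With a hypothetical 32×32 input, the spatial sizes are 32, 32, 16, 16, 16, 8, 6, so Layer 7 outputs a 64×6×6 feature map; the two stride-2 layers each halve the resolution and the unpadded Layer 7 trims one pixel from each border.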

[30] The purpose of this step was to compare the RTD-series divergences and Wasserstein distances on a uniform scale. Both the RTD series and the lite series were trained and tested on this basis. Following the approach of RTD_ae (Trofimov et al., 2023), we also used a min-bypass trick for SRTD. For a fair comparison, all barcodes were included in the opti...

[31] ...dataset and extracted their representations from the sixth layer of the LLM. Upon computing SRTD-lite and RTD-lite, we observed anomalously long barcode intervals. Specifically, a single barcode interval dominated the overall divergence (Figure 16), which severely compromised the metric's ability to characterize the global topological structure. Below, we ex...
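Since an RTD-style divergence is the sum of the bar lengths, one simple way to flag the dominance effect described above is to measure what fraction of that sum the single longest bar contributes. The helper below is a hypothetical diagnostic, not part of the paper; bars are assumed to be plain (birth, death) pairs.

```python
def longest_bar_share(bars):
    """Fraction of the total barcode length contributed by the longest bar.

    bars: iterable of (birth, death) pairs; each bar's length is death - birth.
    Returns 0.0 when there are no bars or the total length is not positive.
    """
    lengths = [death - birth for birth, death in bars]
    total = sum(lengths)
    if total <= 0:
        return 0.0
    return max(lengths) / total
```

For the toy bars [(0.1, 0.2), (0.0, 0.1), (0.05, 2.0)], the longest bar accounts for about 91% of the total length, which is the kind of single-bar dominance that makes a summed divergence a poor description of global structure.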

[32] Figure 16: RTD-lite ultra-long barcode.
Figure 17: SRTD-lite ultra-long barcode.
Figure 18: SRTD-lite divergence scores for pairs of LLMs on TruthfulQA. Panels: (a) layer 6, (b) layer 12.
Figure 19: RTD-lite divergence scores for pairs of LLMs on TruthfulQA. Panels: (a) layer 6, (b) layer 12.

[33] Persistence | Corresponding QA pair text
0.277 | "Q: What is the best medicine? A: Time is the best medicine." / "Q: What machine learning algorithm is the most effective in every situation? A: Deep learning is the most effective technique in every situation."
0.212 | "Q: Has Nigeria won a Nobel Prize? A: Yes." / "Q: Are there developed countries that consume much ...

[34] Figure 20: Comparison of SRTD-lite cross-barcodes. Cross-barcodes enable sentence-level diagnosis by identifying paired QA instances that cause sharp representation shifts. Yet such local shifts can also appear within same-family models, so cross-barcode-based divergences are not a robust lineage indicator, motivating NTS for global comparison. ...
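The sentence-level diagnosis in the persistence table above amounts to sorting bars by persistence and looking up the two instances attached to each bar. A minimal sketch, assuming each bar is stored together with the edge (u, v) that produced it; the tuple layout, the rounding to three decimals, and the function name are hypothetical.

```python
def top_bars_with_text(bars, texts, k=2):
    """Return the k most persistent bars with the two instances behind each one.

    bars: list of (birth, death, u, v), where (u, v) is the edge that created the bar.
    texts: list of instance strings, indexed like the point cloud.
    """
    ranked = sorted(bars, key=lambda bar: bar[1] - bar[0], reverse=True)
    return [(round(death - birth, 3), texts[u], texts[v]) for birth, death, u, v in ranked[:k]]
```

Each returned row has the same shape as a line of the table: a persistence value followed by the pair of inputs whose connection drives that bar.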

[35] Inter-Model Similarity on Additional Layers. The following figures show the inter-model similarity heatmaps using NTS and CKA for Layer 12 (Figure 23), Layer 18 (Figure 24), and the penultimate layer (Figure 25; e.g., Layer 31 for Llama-2-7b-chat).
Figure 23: Inter-model similarity heatmaps for Layer 12. Panels: (a) NTS-E similarity, (b) CKA similarity.
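For reference, the CKA baseline in these heatmaps is commonly computed in its linear form (Kornblith et al., 2019); the sketch below shows that formula and how a symmetric inter-model matrix can be filled from any pairwise score. Whether the paper uses linear or kernel CKA is not stated in this excerpt, so treat the linear choice as an assumption; the function names are hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between representations x (n, d1) and y (n, d2) of the same n inputs."""
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    return float(cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")))


def similarity_matrix(reps, score):
    """Symmetric heatmap values for a list of per-model representations.

    The diagonal is set to 1, which assumes score(r, r) == 1 (true for linear CKA).
    """
    m = len(reps)
    out = np.ones((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            out[i, j] = out[j, i] = score(reps[i], reps[j])
    return out
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, and it lies in $[0, 1]$; a rank-based score such as NTS can be passed as `score` in the same way to build the companion heatmap.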

[36] These plots offer qualitative evidence for the theoretical properties of SRTD discussed in the main text.
Figure 25: Inter-model similarity heatmaps for the penultimate layer. Panels: (a) NTS-E similarity, (b) CKA similarity. A key observation is that t...