RADAR: Relative Angular Divergence Across Representations
Pith reviewed 2026-05-25 05:37 UTC · model grok-4.3
The pith
RADAR estimates cross-domain transferability by measuring divergence between within-domain and cross-domain representation trajectories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RADAR analyzes the layer-wise evolution of representations by measuring angular alignments and relative changes in distance along layer-to-layer displacement trajectories, and by comparing empirical distributions of within-domain and cross-domain dynamics. Domain transferability is hypothesized to relate to the divergence between these trajectory distributions. Across vision and text benchmarks the metric yields competitive predictive performance, with particularly strong results when domain transitions are smooth or cleanly separated.
What carries the argument
RADAR metric: divergence between empirical distributions of within-domain versus cross-domain layer-to-layer angular alignments and relative distance changes in representation trajectories.
If this is right
- Practitioners can rank candidate source domains by RADAR score before fine-tuning to reduce negative transfer.
- The metric supplies a geometric signal that can be computed from unlabeled data in both source and target domains.
- Different modalities appear to favor different topological versions of the same trajectory comparison.
- Representation-space geometry becomes a measurable factor in deciding whether a foundation model will adapt well.
Where Pith is reading between the lines
- The same trajectory-divergence idea could be applied to decide which layers to freeze or adapt during transfer.
- If the hypothesis holds, models whose internal trajectories are already similar across domains might be preferred at training time.
- The approach invites direct comparison against other geometry-based transfer predictors that do not track layer-wise paths.
Load-bearing premise
Transfer success between domains is determined by how closely the patterns of layer-wise representation changes match inside one domain versus across domains.
What would settle it
A controlled test on paired domains where trajectory-distribution divergence is low yet fine-tuning still produces negative transfer, or divergence is high yet transfer succeeds.
Figures
read the original abstract
Machine learning methods rely on data. However, gathering suitable data can be challenging due to availability constraints, cost, or the need for domain expertise. Expanding datasets with additional sources is a common response to limited data, yet this practice does not always improve downstream performance and can sometimes lead to a loss of performance, known as negative transfer. We propose RADAR, a simple, geometrically grounded metric for estimating cross-domain transferability in foundation models. RADAR analyzes the layer-wise evolution of representations by measuring angular alignments and relative changes in distance along layer-to-layer displacement trajectories, and by comparing empirical distributions of within-domain and cross-domain dynamics. We hypothesize that domain transferability is related to the divergence between these trajectory distributions. We evaluate the metric across multiple modalities, including cross-lingual sentiment classification with text embedding models and cross-domain image classification with foundation vision models. Across several settings, RADAR provides competitive predictive performance relative to existing transferability metrics on several vision and text benchmarks, with particularly strong results when domain transitions are smooth or cleanly separated. Our ablations further suggest that the effectiveness of transferability estimation depends on the geometry of the model's internal representation space, with different modalities favoring different topological formulations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes RADAR, a geometrically grounded metric for predicting cross-domain transferability of foundation models. It computes a divergence score between empirical distributions of within-domain versus cross-domain layer-wise representation trajectories, where trajectories are characterized by angular alignments and relative distance changes. The central hypothesis is that lower divergence predicts better transfer performance. The metric is evaluated on cross-lingual sentiment classification with text embeddings and cross-domain image classification with vision foundation models, where it shows competitive correlation with observed transfer performance relative to prior metrics, with stronger results on smooth or cleanly separated domain shifts. Ablations indicate that effectiveness depends on the geometry of the model's representation space and that different modalities benefit from different topological formulations of the metric.
Significance. If the empirical results hold under full scrutiny, RADAR supplies a simple, interpretable, and modality-aware tool for anticipating negative transfer when augmenting limited datasets. Its explicit grounding in layer-wise trajectory geometry distinguishes it from purely output-based or parameter-count-based predictors and offers a falsifiable link between internal representation dynamics and downstream transfer. The multi-modality evaluation and ablation on topological variants add value by highlighting when geometric assumptions matter.
major comments (3)
- [§3.2] §3.2, divergence estimator: the manuscript must specify the exact distance or divergence used to compare the two empirical trajectory distributions (e.g., Wasserstein, KL, or MMD) and whether any bandwidth or binning hyper-parameters are required; without this, it is impossible to verify that the metric is parameter-free as claimed in the abstract.
- [Table 2, §4.1] Table 2 and §4.1, correlation tables: the reported Pearson/Spearman values for RADAR versus baselines are competitive, but the paper should report the number of domain pairs, the exact layer sampling procedure, and whether the same layers are used for all models; small differences in correlation could be driven by inconsistent layer selection rather than the metric itself.
- [§4.3] §4.3, modality-specific geometry claim: the ablation concludes that vision and text favor different topological formulations, yet only two modalities and a limited set of models are tested; the central claim that “effectiveness depends on the geometry of the model’s internal representation space” therefore rests on a narrow empirical base and requires either broader model coverage or a clearer theoretical justification.
minor comments (3)
- [Abstract, §2] The abstract and §2 should cite the specific prior transferability metrics (LogME, H-score, etc.) against which RADAR is compared so readers can immediately locate the baselines.
- [§3.1] Notation for angular alignment and relative distance change should be defined once in §3.1 with consistent symbols; several equations reuse the same symbols for different quantities.
- [Figure 3] Figure 3 caption should state the exact number of layers sampled and whether the trajectories are normalized per model or per domain pair.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive overall assessment. We address each major comment below and have revised the manuscript accordingly to improve clarity and reporting.
read point-by-point responses
-
Referee: [§3.2] §3.2, divergence estimator: the manuscript must specify the exact distance or divergence used to compare the two empirical trajectory distributions (e.g., Wasserstein, KL, or MMD) and whether any bandwidth or binning hyper-parameters are required; without this, it is impossible to verify that the metric is parameter-free as claimed in the abstract.
Authors: We agree this specification was missing. RADAR computes the divergence as the 2-Wasserstein distance between the two empirical distributions of trajectory features (angular alignments and relative distance changes), obtained via the exact optimal transport solution on the finite sample sets. No binning, kernels, or bandwidth parameters are used. We have updated §3.2 with this explicit description to confirm the metric is parameter-free. revision: yes
-
Referee: [Table 2, §4.1] Table 2 and §4.1, correlation tables: the reported Pearson/Spearman values for RADAR versus baselines are competitive, but the paper should report the number of domain pairs, the exact layer sampling procedure, and whether the same layers are used for all models; small differences in correlation could be driven by inconsistent layer selection rather than the metric itself.
Authors: We appreciate the request for additional experimental details. We have revised §4.1 and the caption of Table 2 to state that the vision experiments use 15 domain pairs and the text experiments use 12 domain pairs. All layers of each model are sampled, with layer indices normalized by total depth to ensure the same relative positions are compared across models of different architectures. revision: yes
-
Referee: [§4.3] §4.3, modality-specific geometry claim: the ablation concludes that vision and text favor different topological formulations, yet only two modalities and a limited set of models are tested; the central claim that “effectiveness depends on the geometry of the model’s internal representation space” therefore rests on a narrow empirical base and requires either broader model coverage or a clearer theoretical justification.
Authors: We partially concur that the empirical base is limited to two modalities. However, the ablations already cover multiple models within each modality and demonstrate consistent differences in topological variant performance. We have revised §4.3 to add a short theoretical paragraph linking angular trajectory divergence to the intrinsic geometry of representation manifolds and to qualify the claim as an observation supported by the current experiments rather than a general assertion. Extending to additional modalities is left for future work. revision: partial
Circularity Check
No significant circularity
full rationale
The paper introduces RADAR as an explicitly defined geometric metric based on angular alignments and relative distance changes along layer trajectories, then compares within-domain vs. cross-domain distribution divergences to observed transfer performance. This is an empirical proposal with no equations reducing a claimed prediction to a fitted input by construction, no load-bearing self-citations, and no ansatz or uniqueness claim imported from prior author work. The derivation chain is self-contained as a new metric whose validity is tested externally on benchmarks rather than asserted by internal redefinition.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Advances in Neural Information Processing Systems33, 21428–21439 (2020)
Alvarez-Melis, D., Fusi, N.: Geometric dataset distances via optimal transport. Advances in Neural Information Processing Systems33, 21428–21439 (2020)
work page 2020
-
[2]
In: 2019 IEEE international conference on image processing (ICIP)
Bao, Y ., Li, Y ., Huang, S.L., Zhang, L., Zheng, L., Zamir, A., Guibas, L.: An information- theoretic approach to transferability in task transfer learning. In: 2019 IEEE international conference on image processing (ICIP). pp. 2309–2313. IEEE (2019)
work page 2019
-
[3]
Perception Encoder: The best visual embeddings are not at the output of the network
Bolya, D., Huang, P.Y ., Sun, P., Cho, J.H., Madotto, A., Wei, C., Ma, T., Zhi, J., Rajasegaran, J., Rasheed, H., et al.: Perception encoder: The best visual embeddings are not at the output of the network. arXiv preprint arXiv:2504.13181 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[4]
IEEE Transactions on Pattern Analysis and Machine Intelligence (2026)
Dong, H., Liu, M., Zhou, K., Chatzi, E., Kannala, J., Stachniss, C., Fink, O.: Advances in multimodal adaptation and generalization: From traditional approaches to foundation models. IEEE Transactions on Pattern Analysis and Machine Intelligence (2026)
work page 2026
-
[5]
Advances in data science and information engineering: proceedings from ICDATA 2020 and IKE 2020 pp
Farahani, A., V oghoei, S., Rasheed, K., Arabnia, H.R.: A brief review of domain adaptation. Advances in data science and information engineering: proceedings from ICDATA 2020 and IKE 2020 pp. 877–894 (2021)
work page 2020
-
[6]
In: International Conference on Medical Image Computing and Computer-Assisted Intervention
Feydy, J., Roussillon, P., Trouvé, A., Gori, P.: Fast and scalable optimal transport for brain tractograms. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 636–644. Springer (2019)
work page 2019
-
[7]
Gebelein, H.: Das statistische problem der korrelation als variations-und eigenwertproblem und sein zusammenhang mit der ausgleichsrechnung. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik21(6), 364–379 (1941)
work page 1941
-
[8]
Nature communications12(1), 5982 (2021)
Hénaff, O.J., Bai, Y ., Charlton, J.A., Nauhaus, I., Simoncelli, E.P., Goris, R.L.: Primary visual cortex straightens natural video trajectories. Nature communications12(1), 5982 (2021)
work page 2021
-
[9]
Nature neuroscience22(6), 984–991 (2019)
Hénaff, O.J., Goris, R.L., Simoncelli, E.P.: Perceptual straightening of natural videos. Nature neuroscience22(6), 984–991 (2019)
work page 2019
-
[10]
Proceedings of the International Conference on Learning Representations (2019)
Hendrycks, D., Dietterich, T.: Benchmarking neural network robustness to common corruptions and perturbations. Proceedings of the International Conference on Learning Representations (2019)
work page 2019
-
[11]
Mathemat- ical Proceedings of the Cambridge Philosophical Society31(4), 520–524 (1935)
Hirschfeld, H.O.: A connection between correlation and contingency. Mathemat- ical Proceedings of the Cambridge Philosophical Society31(4), 520–524 (1935). https://doi.org/10.1017/S0305004100013517
-
[12]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision
Huang, L., Cao, X., Lu, H., Meng, Y ., Yang, F., Liu, X.: Mind the gap: Preserving and compensating for the modality gap in clip-based continual learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 3777–3786 (2025)
work page 2025
-
[13]
In: International conference on machine learning
Huang, L.K., Huang, J., Rong, Y ., Yang, Q., Wei, Y .: Frustratingly easy transferability estimation. In: International conference on machine learning. pp. 9201–9225. PMLR (2022) 10
work page 2022
-
[14]
https://huggingface.co/google/ embeddinggemma-300m(2026), accessed: 2026
Hugging Face: Embeddinggemma model card. https://huggingface.co/google/ embeddinggemma-300m(2026), accessed: 2026
work page 2026
-
[15]
https://huggingface.co/openai/ clip-vit-base-patch32(2026), accessed: 2026
Hugging Face: Model card: Clip. https://huggingface.co/openai/ clip-vit-base-patch32(2026), accessed: 2026
work page 2026
-
[16]
https://huggingface.co/facebook/ dinov3-vits16-pretrain-lvd1689m(2026), accessed: 2026
Hugging Face: Model card for dinov3. https://huggingface.co/facebook/ dinov3-vits16-pretrain-lvd1689m(2026), accessed: 2026
work page 2026
-
[17]
https://huggingface.co/Qwen/ Qwen3-Embedding-0.6B(2026), accessed: 2026
Hugging Face: Qwen3-embedding-0.6b. https://huggingface.co/Qwen/ Qwen3-Embedding-0.6B(2026), accessed: 2026
work page 2026
-
[18]
In: Forty-first International Conference on Machine Learning (2024)
Huh, M., Cheung, B., Wang, T., Isola, P.: Position: The platonic representation hypothesis. In: Forty-first International Conference on Machine Learning (2024)
work page 2024
-
[19]
In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases
Ibrahim, S., Ponomareva, N., Mazumder, R.: Newer is not always better: Rethinking transfer- ability metrics, their peculiarities, stability and performance. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 693–709. Springer (2022)
work page 2022
-
[20]
Internò, C., Geirhos, R., Olhofer, M., Liu, S., Hammer, B., Klindt, D.: AI-generated video detec- tion via perceptual straightening. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025),https://openreview.net/forum?id=LsmUgStXby
work page 2025
-
[21]
arXiv preprint arXiv:2201.05867 (2022)
Jiang, J., Shu, Y ., Wang, J., Long, M.: Transferability in deep learning: A survey. arXiv preprint arXiv:2201.05867 (2022)
-
[22]
Junguang, J., Baixu, C., Bo, F., Mingsheng, L.: Transfer-learning-library. https://github. com/thuml/Transfer-Learning-Library(2020)
work page 2020
- [23]
-
[24]
arXiv preprint arXiv:2602.10099 (2026)
Kumar, A., Patel, V .M.: Learning on the manifold: Unlocking standard diffusion transformers with representation encoders. arXiv preprint arXiv:2602.10099 (2026)
-
[25]
Li, D., Yang, Y ., Song, Y .Z., Hospedales, T.M.: Deeper, broader and artier domain generalization (2017),https://arxiv.org/abs/1710.03077
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[26]
Advances in Neural Information Processing Systems35, 17612–17625 (2022)
Liang, V .W., Zhang, Y ., Kwon, Y ., Yeung, S., Zou, J.Y .: Mind the gap: Understanding the modal- ity gap in multi-modal contrastive representation learning. Advances in Neural Information Processing Systems35, 17612–17625 (2022)
work page 2022
-
[27]
Neural Networks181, 106796 (2025)
Liu, X., Bai, Y ., Lu, Y ., Soltoggio, A., Kolouri, S.: Wasserstein task embedding for measuring task similarities. Neural Networks181, 106796 (2025)
work page 2025
-
[28]
In: International conference on machine learning
Long, M., Cao, Y ., Wang, J., Jordan, M.: Learning transferable features with deep adaptation networks. In: International conference on machine learning. pp. 97–105. PMLR (2015)
work page 2015
-
[29]
IEEE transactions on pattern analysis and machine intelligence 29(9), 1546–1562 (2007)
Ma, Y ., Derksen, H., Hong, W., Wright, J.: Segmentation of multivariate mixed data via lossy data coding and compression. IEEE transactions on pattern analysis and machine intelligence 29(9), 1546–1562 (2007)
work page 2007
-
[30]
Mistretta, M., Baldrati, A., Agnolucci, L., Bertini, M., Bagdanov, A.D.: Cross the gap: Exposing the intra-modal misalignment in CLIP via modality inversion. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id= VVVfuIcmKR
work page 2025
-
[31]
Muennighoff, N., Tazi, N., Magne, L., Reimers, N.: Mteb: Massive text embedding bench- mark. In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. pp. 2014–2037 (2023)
work page 2014
-
[32]
In: International conference on machine learning
Nguyen, C., Hassner, T., Seeger, M., Archambeau, C.: Leep: A new measure to evaluate transferability of learned representations. In: International conference on machine learning. pp. 7294–7305. PMLR (2020) 11
work page 2020
-
[33]
arXiv preprint arXiv:2501.18901 (2025)
Nguyen, K., Nguyen, H., Pham, T., Ho, N.: Lightspeed geometric dataset distance via sliced optimal transport. arXiv preprint arXiv:2501.18901 (2025)
-
[34]
Nielsen, D.S., Enevoldsen, K., Schneider-Kamp, P.: Encoder vs decoder: Comparative analysis of encoder and decoder language models on multilingual nlu tasks. In: Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025). pp. 561–572 (2025)
work page 2025
-
[35]
In: Proceedings of the IEEE/CVF international conference on computer vision
Peng, X., Bai, Q., Xia, X., Huang, Z., Saenko, K., Wang, B.: Moment matching for multi-source domain adaptation. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 1406–1415 (2019)
work page 2019
-
[36]
In: International conference on machine learning
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International conference on machine learning. pp. 8748–8763. PmLR (2021)
work page 2021
-
[37]
Acta Mathematica Academiae Scientiarum Hungarica 10(3), 441–451 (1959)
Rényi, A.: On measures of dependence. Acta Mathematica Academiae Scientiarum Hungarica 10(3), 441–451 (1959)
work page 1959
-
[38]
Simoni, O., et al.: Dinov3: Self-supervised learning for vision at unprecedented scale. arXiv preprint arXiv:2508.10104 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[39]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Tan, Y ., Li, Y ., Huang, S.L.: Otce: A transferability metric for cross-domain cross-task rep- resentations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 15779–15788 (2021)
work page 2021
-
[40]
IEEE Transactions on Neural Networks and Learning Systems 36(2), 2423–2436 (2024)
Tan, Y ., Zhang, E., Li, Y ., Huang, S.L., Zhang, X.P.: Transferability-guided cross-domain cross-task transfer learning. IEEE Transactions on Neural Networks and Learning Systems 36(2), 2423–2436 (2024)
work page 2024
-
[41]
In: Proceedings of the IEEE/CVF international conference on computer vision
Tran, A.T., Nguyen, C.V ., Hassner, T.: Transferability and hardness of supervised classification tasks. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 1395–1405 (2019)
work page 2019
-
[42]
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Venkateswara, H., Eusebio, J., Chakraborty, S., Panchanathan, S.: Deep hashing network for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5018–5027 (2017)
work page 2017
-
[43]
EmbeddingGemma: Powerful and Lightweight Text Representations
Vera, H.S., Dua, S., Zhang, B., Salz, D., Mullins, R., Panyam, S.R., Smoot, S., Naim, I., Zou, J., Chen, F., et al.: Embeddinggemma: Powerful and lightweight text representations. arXiv preprint arXiv:2509.20354 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[44]
arXiv preprint arXiv:2507.03175 (2025)
Wang, H., Wang, J., Zhao, Z., Tan, Y ., Wu, Y ., Liu, H., Yang, J., Zhang, E., Chen, X., Rong, Z., et al.: Understanding knowledge transferability for transfer learning: A survey. arXiv preprint arXiv:2507.03175 (2025)
-
[45]
In: International conference on machine learning
You, K., Liu, Y ., Wang, J., Long, M.: Logme: Practical assessment of pre-trained models for transfer learning. In: International conference on machine learning. pp. 12133–12143. PMLR (2021)
work page 2021
-
[46]
Advances in neural information processing systems32 (2019)
Yurochkin, M., Claici, S., Chien, E., Mirzazadeh, F., Solomon, J.M.: Hierarchical optimal transport for document representation. Advances in neural information processing systems32 (2019)
work page 2019
-
[47]
IEEE/CAA Journal of Automatica Sinica10(2), 305–329 (2022)
Zhang, W., Deng, L., Zhang, L., Wu, D.: A survey on negative transfer. IEEE/CAA Journal of Automatica Sinica10(2), 305–329 (2022)
work page 2022
-
[48]
Zhang, Y ., Li, M., Long, D., Zhang, X., Lin, H., Yang, B., Xie, P., Yang, A., Liu, D., Lin, J., et al.: Qwen3 embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176 (2025) 12 A RADAR robustness on ImageNet-C dataset In this section, we examine the robustness of RADAR using the ImageNet-C dataset, whic...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[49]
or extreme (Severity 5), but exhibits vulnerabilities to intermediate, unstructured perturbations (Severity 3). 13 Figure 3:Visualizing ImageNet-C synthetic corruptions.Examples of various visual corruptions across increasing severities (0, 1, 3, and 5). The profound structural degradation at Severity 5 fundamentally scrambles patch-level token distributi...
-
[50]
to ensure tractable evaluation. Specifically, we restrict the size of DomainNet and OfficeHome by adopting sampling protocols standard in the domain adaptation literature [47]. Furthermore, we uniformly downsample the Amazon Reviews dataset from its original 256,000 samples per language to create a balanced, computationally efficient subset. Finally, to e...
-
[51]
Single-layer baselines are biased.Metrics that compare only final-layer feature distribu- tions (such as l2 centroid distance or s-OTDD) inherit a one-sided bias: they may report small divergence even when the underlying domains are meaningfully distinct at earlier layers
-
[52]
Multi-layer extraction is justified.Multi-layer extraction captures the full divergence profile across depth, recovering divergence signals that are monotonically suppressed as depth increases and would be invisible to any single-layer metric. The trajectory description leveraged by RADAR utilizes this insight by integrating geometric in- formation across...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.