arxiv: 1907.11943 · v1 · pith:XKHLXEJ3new · submitted 2019-07-27 · 💻 cs.LG · cs.CV · stat.ML

Learnable Parameter Similarity

Pith reviewed 2026-05-24 14:48 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML
keywords learnable parameter similarity · second-order semantics · task relations · model parameters · transfer learning · ModelSet500 · visual tasks · parameter alignment

The pith

A second-order neural network learns an effective metric for similarity between second-order semantics hidden in independently trained visual models by aligning their parameters end-to-end.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Learnable Parameter Similarity (LPS) as a way to estimate relations between visual tasks by measuring similarities encoded in the parameters of models trained on those tasks. Most prior work addresses tasks in isolation and does not exploit the underlying connections that could support transfer learning or discovery of higher-order concepts. LPS trains a second-order neural network to align high-dimensional parameters from separate models and optimizes the similarity measure directly in an end-to-end fashion. The authors also introduce ModelSet500, a collection of 500 trained models, to serve as a benchmark for evaluating such parameter-similarity methods. If the approach holds, task relations become recoverable from existing model weights alone without additional task labels or manual selection.

Core claim

The paper claims that second-order semantics relevant to task relations reside in the parameters of independently trained models, and that a second-order neural network trained end-to-end can extract and align them to produce a learned similarity metric. Evidence is offered through experiments on the ModelSet500 benchmark of 500 trained models.

What carries the argument

Learnable Parameter Similarity (LPS), a second-order neural network that aligns high-dimensional parameters of trained models to learn a similarity metric for second-order semantics.
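
To make the idea of comparing models through their parameters concrete, here is a minimal sketch. It assumes each model's weights are flattened into one vector, passed through a small learnable map, and compared by cosine similarity. The single linear layer with tanh, the embedding size, and all function names are illustrative assumptions; the page does not describe the actual architecture of the second-order network.

```python
import numpy as np

def flatten_params(params):
    """Concatenate a model's weight arrays into one 1-D vector."""
    return np.concatenate([np.asarray(p, dtype=float).ravel() for p in params])

def embed(theta, W, b):
    """Hypothetical second-order map: one linear layer followed by tanh."""
    return np.tanh(W @ theta + b)

def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between u and v (eps guards against zero norms)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

rng = np.random.default_rng(0)
# Two toy "trained models", each a list of weight arrays with matching shapes.
model_a = [rng.normal(size=(4, 3)), rng.normal(size=4)]
model_b = [rng.normal(size=(4, 3)), rng.normal(size=4)]

theta_a, theta_b = flatten_params(model_a), flatten_params(model_b)

# Parameters of the second-order map; untrained here, purely for illustration.
W = rng.normal(scale=0.1, size=(8, theta_a.size))
b = np.zeros(8)

sim = cosine_similarity(embed(theta_a, W, b), embed(theta_b, W, b))
print(f"similarity between model_a and model_b: {sim:.3f}")
```

In a trained system the map's parameters would be fitted so that models from related tasks land close together; the untrained map above only shows the data flow.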

If this is right

  • Task relations become estimable directly from model parameters without task-specific labels or post-hoc selection.
  • High-order semantic concepts shared across visual tasks can be revealed through learned parameter alignments.
  • Transfer learning can draw on automatically discovered task similarities for model selection or initialization.
  • A standardized benchmark of 500 models enables reproducible comparison of parameter-similarity techniques.
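
As a rough illustration of the model-selection consequence listed above, the sketch below ranks candidate source models by their similarity to a target model. The model names and scores are hypothetical placeholders, not entries or results from ModelSet500.

```python
import numpy as np

def rank_source_models(sim_to_target, names, top_k=2):
    """Return the top_k (name, score) pairs, highest similarity first."""
    order = np.argsort(-np.asarray(sim_to_target), kind="stable")
    return [(names[i], float(sim_to_target[i])) for i in order[:top_k]]

# Hypothetical candidates and hypothetical similarity scores to a target model.
names = ["cifar10_resnet", "mnist_cnn", "cifar100_resnet"]
scores = [0.82, 0.15, 0.91]

best = rank_source_models(scores, names)
print(best)  # [('cifar100_resnet', 0.91), ('cifar10_resnet', 0.82)]
```

Whether the top-ranked model actually transfers best is an empirical question that the similarity score alone does not answer.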

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The learned metric could cluster large collections of models into task-based taxonomies for efficient model retrieval.
  • If parameter encodings of task semantics prove consistent across domains, the same alignment approach might apply outside vision.
  • The method suggests a route to unsupervised construction of task ontologies from archives of pretrained weights.

Load-bearing premise

Second-order semantics about task relations are reliably encoded inside the parameters of independently trained models and can be extracted and aligned by a second-order network without task-specific supervision.

What would settle it

Train the LPS network on ModelSet500 and test whether its output similarity scores fail to correlate with known task relations or fail to improve downstream transfer-learning accuracy compared with random or first-order baselines.
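
One way to run the first half of this check is sketched below: score every pair of models, then ask how often a same-task pair outranks a cross-task pair. This pairwise AUC is an assumed evaluation choice, not a metric taken from the paper, and the two similarity matrices are toy values.

```python
import numpy as np

def pair_auc(sim, labels):
    """Probability that a same-task pair scores above a cross-task pair.

    0.5 is chance level; ties count as half.
    """
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)   # all unordered model pairs
    scores = sim[i, j]
    same = labels[i] == labels[j]
    pos, neg = scores[same], scores[~same]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

labels = [0, 0, 1, 1]
# Toy similarities: high within a task, low across tasks.
informative = np.array([[1.0, 0.9, 0.2, 0.1],
                        [0.9, 1.0, 0.3, 0.2],
                        [0.2, 0.3, 1.0, 0.8],
                        [0.1, 0.2, 0.8, 1.0]])
# Constant scores carry no information about task membership.
uninformative = np.full((4, 4), 0.5)

auc_good = pair_auc(informative, labels)    # 1.0 for this toy matrix
auc_flat = pair_auc(uninformative, labels)  # 0.5 for this toy matrix
print(auc_good, auc_flat)
```

A learned metric whose AUC stays near 0.5, or no higher than a first-order baseline such as raw parameter distance, would count against the claim.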

Figures

Figures reproduced from arXiv: 1907.11943 by Guangcong Wang, Guangrun Wang, Jianhuang Lai, Wenqi Liang.

Figure 1: Data similarity and parameter similarity.
Figure 2: Second-order neural network. Let $\mathcal{T}$ denote a set of tasks $\{T_i\}_{i=1}^{N}$. For $T_i \in \mathcal{T}$, we train $m_i$ deep models with a task label $Y_i$. We obtain a model set $D$ with $M$ models $\{(\theta_j^*, Y_j)\}_{j=1}^{M}$ from $N$ tasks, where $M = \sum_j m_j$. $\theta_j^*$ is a trained model, which can be regarded as a metadata point. We then use these $M$ metadata points to train the second-order similarity learning by $\phi_{t+1} = \phi_t - \beta_t \nabla h(D; \phi_t)$ (2), where $h$ is…
Figure 3: The effectiveness of learnable parameter similarity.
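
The Figure 2 caption quotes a plain gradient-descent update on the second-order parameters, φ_{t+1} = φ_t − β_t ∇h(D; φ_t). The sketch below runs that update on a toy model set. Because the caption is cut off before h is defined, the objective used here (squared error of a φ-weighted inner product against a same-task indicator) and the step-size schedule are stand-in assumptions, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model set D = {(theta_j, Y_j)}: 6 flattened "models" from 2 tasks.
labels = np.array([0, 0, 0, 1, 1, 1])
centers = np.where(labels[:, None] == 0, 1.0, -1.0)
thetas = centers + 0.1 * rng.normal(size=(6, 5))

# All unordered pairs (i, j); target is 1 for same task, 0 otherwise.
i_idx, j_idx = np.triu_indices(len(labels), k=1)
targets = (labels[i_idx] == labels[j_idx]).astype(float)
pair_feats = thetas[i_idx] * thetas[j_idx]  # elementwise products per pair

def h(phi):
    """Stand-in objective: squared error of a phi-weighted inner product."""
    return float(np.mean((pair_feats @ phi - targets) ** 2))

def grad_h(phi):
    """Analytic gradient of the stand-in objective with respect to phi."""
    residual = pair_feats @ phi - targets
    return 2.0 * pair_feats.T @ residual / len(targets)

phi = np.zeros(5)
initial_loss = h(phi)
for t in range(200):
    beta_t = 0.02 / (1.0 + 0.01 * t)   # assumed decaying step size
    phi = phi - beta_t * grad_h(phi)   # phi_{t+1} = phi_t - beta_t * grad h(D; phi_t)
final_loss = h(phi)

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The point of the sketch is only the shape of the loop: the "data points" are trained models with task labels, and the quantity being learned is φ, not the models themselves.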
Original abstract

Most of the existing approaches focus on specific visual tasks while ignoring the relations between them. Estimating task relation sheds light on the learning of high-order semantic concepts, e.g., transfer learning. How to reveal the underlying relations between different visual tasks remains largely unexplored. In this paper, we propose a novel Learnable Parameter Similarity (LPS) method that learns an effective metric to measure the similarity of second-order semantics hidden in trained models. LPS is achieved by using a second-order neural network to align high-dimensional model parameters and learning second-order similarity in an end-to-end way. In addition, we create a model set called ModelSet500 as a parameter similarity learning benchmark that contains 500 trained models. Extensive experiments on ModelSet500 validate the effectiveness of the proposed method. Code will be released at https://github.com/Wanggcong/learnable-parameter-similarity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes Learnable Parameter Similarity (LPS), a method that uses a second-order neural network to align high-dimensional parameters of independently trained models and learn an end-to-end metric for second-order semantic similarity between visual tasks. It introduces ModelSet500, an author-constructed benchmark of 500 trained models, and claims that extensive experiments on this set validate the effectiveness of LPS for revealing task relations relevant to transfer learning.

Significance. If the central claim holds, LPS could offer a new unsupervised route to infer task relations directly from model parameters, with potential applications in transfer learning and multi-task settings. The creation of ModelSet500 as a public benchmark and the stated intention to release code are concrete positive contributions that would aid reproducibility and further research. However, the significance is currently difficult to evaluate because the manuscript supplies no derivation of the loss, no quantitative results, and no controls that would isolate task semantics from other parameter statistics.

major comments (2)
  1. [Abstract] Abstract: the claim that LPS 'learns second-order similarity in an end-to-end way' is unsupported because no loss function, supervision signal, or training objective is defined. Without this, it is impossible to determine whether the second-order network is forced to recover task semantics rather than architecture, initialization, or optimizer artifacts.
  2. [Abstract] Abstract: ModelSet500 is described as author-constructed, yet no controls (same-task different seeds, same-architecture different tasks, or random task labels) are mentioned. Any reported correlation could therefore be explained by the network reproducing the authors' own task taxonomy rather than recovering an intrinsic encoding of task relations in the parameters.
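
The random-task-label control raised in the second comment can be sketched as a permutation test: shuffle the task labels, recompute a separation statistic, and see how often chance matches the observed value. The statistic, the toy block-structured similarity matrix, and the number of permutations are all illustrative assumptions rather than a protocol from the paper.

```python
import numpy as np

def same_minus_cross(sim, labels):
    """Mean same-task similarity minus mean cross-task similarity."""
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)
    same = labels[i] == labels[j]
    return float(sim[i, j][same].mean() - sim[i, j][~same].mean())

def permutation_p_value(sim, labels, n_perm=1000, seed=0):
    """Share of label shufflings whose gap is at least the observed gap."""
    rng = np.random.default_rng(seed)
    observed = same_minus_cross(sim, labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        if same_minus_cross(sim, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Toy case: 8 models from 2 tasks with block-structured similarities.
labels = np.repeat([0, 1], 4)
sim = np.where(labels[:, None] == labels[None, :], 0.9, 0.1)

observed_gap, p_value = permutation_p_value(sim, labels)
print(f"gap = {observed_gap:.2f}, permutation p = {p_value:.3f}")
```

A small p-value only shows that the scores track the supplied labels better than chance. Separating task semantics from architecture or seed effects would still need the same-task/different-seed and same-architecture/different-task controls.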

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable comments. We address each of the major comments point-by-point below.

  1. Referee: [Abstract] Abstract: the claim that LPS 'learns second-order similarity in an end-to-end way' is unsupported because no loss function, supervision signal, or training objective is defined. Without this, it is impossible to determine whether the second-order network is forced to recover task semantics rather than architecture, initialization, or optimizer artifacts.

    Authors: We agree with the referee that the current manuscript does not define or derive a loss function, supervision signal, or training objective in the abstract (or, based on the provided text, elsewhere). The claim of end-to-end learning is therefore unsupported as written. We will revise the abstract and add a dedicated methods subsection that specifies the loss (a supervised contrastive objective using task labels from ModelSet500), the supervision signal, and how the second-order network is optimized. revision: yes

  2. Referee: [Abstract] Abstract: ModelSet500 is described as author-constructed, yet no controls (same-task different seeds, same-architecture different tasks, or random task labels) are mentioned. Any reported correlation could therefore be explained by the network reproducing the authors' own task taxonomy rather than recovering an intrinsic encoding of task relations in the parameters.

    Authors: We agree that the manuscript provides no description of controls for ModelSet500. Without them it is impossible to rule out that any observed correlations simply reproduce the authors' task taxonomy or other confounds. In the revision we will add the suggested controls (same-task different seeds, same-architecture different tasks, and random-label baselines) and report the corresponding quantitative results. revision: yes
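
The first response names a supervised contrastive objective over task labels. For orientation, the sketch below implements one common form of such a loss on embedded models: each anchor is pulled toward embeddings with the same task label and pushed away from the rest. The temperature, the normalization, and the function name are assumptions; nothing on this page confirms the exact loss the authors would use.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average negative log-probability of same-label pairs (needs >= 2 samples)."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = np.asarray(labels)
    n = len(labels)
    logits = z @ z.T / temperature
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax over all other samples (self excluded), computed stably.
    masked = np.where(not_self, logits, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom

    # Positives: other samples that share the anchor's task label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    has_pos = positives.any(axis=1)
    summed = -(log_prob * positives).sum(axis=1)
    per_anchor = summed[has_pos] / positives.sum(axis=1)[has_pos]
    return float(per_anchor.mean())

# Toy embeddings of four models: two clusters in 2-D.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
aligned = supervised_contrastive_loss(emb, [0, 0, 1, 1])     # labels match clusters
misaligned = supervised_contrastive_loss(emb, [0, 1, 0, 1])  # labels cut across clusters
print(f"aligned: {aligned:.4f}, misaligned: {misaligned:.4f}")
```

The loss is low when embeddings of same-task models cluster together, which is the behavior such an objective would reward in the second-order network.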

Circularity Check

0 steps flagged

No circularity; derivation is self-contained empirical learning

The paper defines LPS as a second-order neural network trained end-to-end to produce a similarity metric on ModelSet500. No equations, fitting steps, or self-citations are shown that would make the learned similarity equivalent to its training inputs by construction, nor does any claimed result reduce to a renamed input quantity. The method is a standard supervised metric learner whose output is not forced by the definition of its inputs; external validation on the benchmark is independent of any internal loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the high-level description of the second-order network itself.
