Dissimilarity learning via Siamese network predicts brain imaging data
Pith reviewed 2026-05-25 11:14 UTC · model grok-4.3
The pith
Training a Siamese network directly on neural dissimilarity predicts MEG brain responses at state-of-the-art levels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A Siamese network whose contrastive loss is modified to train directly on neural dissimilarity is used to fine-tune the initial layers of AlexNet (to predict MEG early-response and EVC data) and all layers of VGG-16 (to predict MEG late-response and IT data). The resulting model achieves state-of-the-art performance on the MEG dataset.
What carries the argument
A Siamese network with a modified contrastive loss that takes an image pair as input and predicts the correlation distance between the two images' output features, trained so that this distance matches the measured neural dissimilarity.
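The mechanism described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the abstract gives no equation, so the squared-error form of `dissimilarity_loss` is assumed, and `embed` is a hypothetical stand-in for the shared CNN branch (AlexNet or VGG-16 in the paper). The target value `0.8` is made up.

```python
import numpy as np

def correlation_distance(a, b):
    """Return 1 - Pearson r between two 1-D feature vectors."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 - float(a @ b) / denom

def dissimilarity_loss(feat_a, feat_b, neural_dissim):
    """Squared error between predicted and measured dissimilarity.

    Assumed form: the source only says the loss trains directly on
    neural dissimilarity instead of binary same/different labels.
    """
    pred = correlation_distance(feat_a, feat_b)
    return (pred - neural_dissim) ** 2

def embed(image, weights):
    """Hypothetical shared branch: one linear map followed by ReLU."""
    return np.maximum(weights @ image, 0.0)

rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 64))      # shared by both branches
img_a = rng.normal(size=64)              # synthetic "images"
img_b = rng.normal(size=64)

# Both images pass through the same weights (the Siamese property).
feat_a = embed(img_a, weights)
feat_b = embed(img_b, weights)

# 0.8 is an invented neural dissimilarity for this image pair.
loss = dissimilarity_loss(feat_a, feat_b, neural_dissim=0.8)
print(loss)
```

In an actual training loop the loss would be minimized by gradient descent over the shared weights, which needs an autodiff framework; that part is omitted here.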
If this is right
- Early visual responses are captured by fine-tuning only the initial layers of AlexNet when the training target is neural dissimilarity.
- Late visual responses require the full depth of VGG-16 when the training target is neural dissimilarity.
- Datasets with high signal-to-noise ratio such as MEG benefit most from dissimilarity-based training.
- Categorization objectives alone do not fully account for the dissimilarity structure observed across visual processing stages.
Where Pith is reading between the lines
- The visual system may prioritize preservation of pairwise dissimilarity over category discrimination as a core computational goal.
- Dissimilarity-trained features could be tested for improved transfer to new neural recording modalities or image sets.
- The same loss modification might be applied to other sensory domains where pairwise neural measurements are available.
Load-bearing premise
Direct training on measured neural dissimilarity supplies a better learning signal for predicting brain responses than training on image categories alone.
What would settle it
A direct comparison on the same Algonauts MEG data in which a standard categorization-trained network reaches equal or higher prediction accuracy than the dissimilarity-trained model.
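Such a comparison could be scored roughly as follows. This is a sketch on synthetic data: the page does not describe the Algonauts scoring procedure, so the Spearman correlation between the upper triangles of two representational dissimilarity matrices (RDMs) is an assumed stand-in, and `model_a` and `model_b` are hypothetical feature sets playing the roles of the two training objectives.

```python
import numpy as np

def rdm(features):
    """Correlation-distance RDM for an (n_images, n_features) array."""
    return 1.0 - np.corrcoef(features)

def upper_triangle(matrix):
    """Off-diagonal upper-triangular entries as a flat vector."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def spearman(x, y):
    """Spearman rho as Pearson r on ranks (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

def rdm_score(model_features, neural_rdm):
    """Agreement between a model's RDM and a measured neural RDM."""
    return spearman(upper_triangle(rdm(model_features)),
                    upper_triangle(neural_rdm))

rng = np.random.default_rng(0)
neural_features = rng.normal(size=(10, 50))   # synthetic "recordings"
neural_rdm = rdm(neural_features)

# Hypothetical models: one close to the neural geometry, one unrelated.
model_a = neural_features + 0.1 * rng.normal(size=(10, 50))
model_b = rng.normal(size=(10, 50))

score_a = rdm_score(model_a, neural_rdm)
score_b = rdm_score(model_b, neural_rdm)
print(score_a, score_b)
```

For the comparison to be informative, both models would need the same architecture, data, and fine-tuning schedule, with scores computed on held-out images.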
Original abstract
The advent of deep learning has a profound effect on visual neuroscience. It paved the way for new models to predict neural data. Although deep convolutional neural networks are explicitly trained for categorization, they learn a representation similar to a biological visual system. But categorization is not the only goal of the human visual system. Hence, the representation of a classification algorithm may not completely explain the visual processing stages. Here, I modified the traditional Siamese network loss function (Contrastive loss) to train them directly on neural dissimilarity. This network takes image pair as input and predicts the correlation distance between their output features. For Algonauts challenge, using dissimilarity learning, I fine-tuned the initial layers of Alexnet to predict MEG early response/EVC data and all the layers of VGG-16 to predict MEG late response/IT data. This approach is ideal for datasets with high SNR. Therefore, my model achieved state-of-the-art performance on MEG dataset but not fMRI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes training a Siamese network with a modified contrastive loss to directly predict neural dissimilarity (correlation distance between features) from brain imaging data rather than using standard categorization objectives. It applies this to fine-tune initial layers of AlexNet for early MEG/EVC responses and all layers of VGG-16 for late MEG/IT responses in the Algonauts challenge, claiming state-of-the-art performance on MEG (but not fMRI) due to high SNR in the MEG data.
Significance. If the central claim holds after proper validation, the work would demonstrate that dissimilarity-based training can better align DNN representations with brain data than categorization training, particularly for high-SNR neural datasets. This addresses a known mismatch between typical DNN objectives and visual system goals and could improve predictive models in visual neuroscience.
major comments (3)
- [Abstract] Abstract: The modified contrastive loss is described only at a high level ('modified the traditional Siamese network loss function (Contrastive loss) to train them directly on neural dissimilarity') with no equation, no definition of the target (e.g., how correlation distance is computed or scaled), and no implementation details. This formulation is load-bearing for the central claim that dissimilarity training is superior.
- [Abstract] Abstract: No controlled comparison is reported that holds network architecture, data, and fine-tuning schedule fixed while varying only the training objective (dissimilarity vs. standard categorization). Without this ablation, gains cannot be attributed to the dissimilarity target rather than layer selection or other unstated choices.
- [Abstract] Abstract: The SOTA claim on MEG lacks any reported metrics, error bars, validation statistics, or explicit comparison to other Algonauts submissions or standard baselines. The statement that the approach 'achieved state-of-the-art performance' cannot be assessed from the provided information.
minor comments (1)
- [Abstract] Abstract: Two minor issues. 'has a profound effect' should read 'has had a profound effect'. The claim 'This approach is ideal for datasets with high SNR' is stated without supporting reasoning or evidence.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point-by-point below, proposing revisions where appropriate to improve clarity and rigor.
- Referee: [Abstract] Abstract: The modified contrastive loss is described only at a high level ('modified the traditional Siamese network loss function (Contrastive loss) to train them directly on neural dissimilarity') with no equation, no definition of the target (e.g., how correlation distance is computed or scaled), and no implementation details. This formulation is load-bearing for the central claim that dissimilarity training is superior.
  Authors: We agree that the abstract provides only a high-level description. The target dissimilarity is defined as the correlation distance (1 - Pearson r) between the relevant neural feature vectors, and the loss is a modified contrastive loss that directly regresses to this value rather than using binary same/different labels. We will revise the abstract to include a concise statement of this formulation and direct readers to the methods for the full equation and scaling details.
  revision: yes
- Referee: [Abstract] Abstract: No controlled comparison is reported that holds network architecture, data, and fine-tuning schedule fixed while varying only the training objective (dissimilarity vs. standard categorization). Without this ablation, gains cannot be attributed to the dissimilarity target rather than layer selection or other unstated choices.
  Authors: The manuscript does not report such a controlled ablation. While the approach uses the same base architectures (AlexNet, VGG-16) as prior work, we did not explicitly compare dissimilarity versus categorization objectives under identical fine-tuning conditions. We will add an explicit discussion of this limitation and, where data permit, include a controlled comparison in the revised manuscript.
  revision: yes
- Referee: [Abstract] Abstract: The SOTA claim on MEG lacks any reported metrics, error bars, validation statistics, or explicit comparison to other Algonauts submissions or standard baselines. The statement that the approach 'achieved state-of-the-art performance' cannot be assessed from the provided information.
  Authors: The abstract is brief by design, but the full manuscript and Algonauts submission materials contain the specific leaderboard scores, baseline comparisons, and validation statistics supporting the MEG result. We will revise the abstract to report key quantitative metrics and error information to make the claim self-contained.
  revision: yes
Circularity Check
No significant circularity; training targets are independent external neural measurements
The paper modifies a Siamese network's contrastive loss to train directly on neural dissimilarity (correlation distance) computed from brain imaging data, then uses the resulting fine-tuned CNN layers to predict held-out MEG/fMRI responses in the Algonauts challenge. The targets originate from separate experimental recordings rather than from the model's own predictions or fitted parameters. No equations, self-citations, or uniqueness claims reduce the central result to its inputs by construction. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- Network layers selected for fine-tuning
axioms (1)
- domain assumption: Correlation distance between network features can be trained to match neural dissimilarity