Dissimilarity learning via Siamese network predicts brain imaging data

Aakash Agrawal

arxiv: 1907.02591 · v1 · pith:EBTJHHTSnew · submitted 2019-07-01 · 🧬 q-bio.NC

Pith reviewed 2026-05-25 11:14 UTC · model grok-4.3

keywords: dissimilarity learning, Siamese network, MEG prediction, fMRI, visual neuroscience, brain imaging, Algonauts challenge, contrastive loss

The pith

Training a Siamese network directly on neural dissimilarity predicts MEG brain responses at state-of-the-art levels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that convolutional networks can be trained to reproduce dissimilarity patterns measured in brain recordings rather than being optimized only for image categorization. It alters the contrastive loss inside a Siamese architecture so that the network receives image pairs and is trained to make the correlation distance between their output features match the observed neural dissimilarity. For the Algonauts challenge, the initial layers of AlexNet are fine-tuned to predict early MEG responses and EVC data, and all layers of VGG-16 are fine-tuned to predict late MEG responses and IT data. The abstract describes the method as best suited to data with a high signal-to-noise ratio and reports state-of-the-art performance on the MEG dataset but not on fMRI.

Core claim

By fine-tuning the initial layers of AlexNet to predict MEG early response and EVC data, and all the layers of VGG-16 to predict MEG late response and IT data, using a Siamese network whose contrastive loss is modified to train directly on neural dissimilarity, the model achieves state-of-the-art performance on the MEG dataset.

What carries the argument

A Siamese network with modified contrastive loss that takes image pairs as input and predicts the correlation distance between their output features to match measured neural dissimilarity.
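
To make the mechanism concrete, here is a minimal sketch of a loss of this kind. The paper gives no equation, so everything specific here is an assumption: the squared-error form, the function names `correlation_distance` and `dissimilarity_loss`, and the random arrays standing in for the two branches' output features. It illustrates the idea of regressing a model's correlation distance onto a measured neural dissimilarity and is not the author's implementation.

```python
import numpy as np


def correlation_distance(a, b):
    """Return 1 - Pearson r between two 1-D feature vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def dissimilarity_loss(features_a, features_b, neural_dissimilarity):
    """Mean squared gap between model and neural dissimilarity over image pairs.

    ASSUMPTION: the squared-error form is illustrative; the paper's loss is not given.
    """
    predicted = np.array(
        [correlation_distance(fa, fb) for fa, fb in zip(features_a, features_b)]
    )
    return float(np.mean((predicted - np.asarray(neural_dissimilarity)) ** 2))


# Hypothetical stand-ins: 4 image pairs, 16 output features per image.
rng = np.random.default_rng(0)
features_a = rng.normal(size=(4, 16))
features_b = rng.normal(size=(4, 16))

# Targets that match the model's own distances give zero loss.
matched_targets = np.array(
    [correlation_distance(fa, fb) for fa, fb in zip(features_a, features_b)]
)
print(dissimilarity_loss(features_a, features_b, matched_targets))
# Shifting every target by 0.5 gives a loss of about 0.25.
print(dissimilarity_loss(features_a, features_b, matched_targets + 0.5))
```

In a real Siamese setup the two feature arrays would come from the same weight-shared network applied to each image of a pair, and the loss would be minimized by gradient descent in a deep learning framework; NumPy is used here only to keep the sketch self-contained.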

If this is right

Early visual responses are captured by fine-tuning only the initial layers of AlexNet when the training target is neural dissimilarity.
Late visual responses require the full depth of VGG-16 when the training target is neural dissimilarity.
Datasets with high signal-to-noise ratio such as MEG benefit most from dissimilarity-based training.
Categorization objectives alone do not fully account for the dissimilarity structure observed across visual processing stages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The visual system may prioritize preservation of pairwise dissimilarity over category discrimination as a core computational goal.
Dissimilarity-trained features could be tested for improved transfer to new neural recording modalities or image sets.
The same loss modification might be applied to other sensory domains where pairwise neural measurements are available.

Load-bearing premise

Direct training on measured neural dissimilarity supplies a better learning signal for predicting brain responses than training on image categories alone.

What would settle it

A direct comparison on the same Algonauts MEG data in which a standard categorization-trained network reaches equal or higher prediction accuracy than the dissimilarity-trained model.
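
A comparison of this kind could be set up roughly as follows. This is a hedged sketch, not the challenge's scoring code: the agreement metric (Pearson correlation over the off-diagonal entries of two dissimilarity matrices) is an assumption chosen for simplicity, and the helper names `rdm`, `upper_triangle`, and `rdm_agreement` and all arrays are hypothetical stand-ins filled with random numbers.

```python
import numpy as np


def rdm(features):
    """Correlation-distance matrix; rows are images, columns are features."""
    return 1.0 - np.corrcoef(features)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, one per unordered image pair."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def rdm_agreement(model_rdm, neural_rdm):
    """Pearson correlation between the pairwise entries of two matrices.

    ASSUMPTION: this metric is a placeholder for whatever score the benchmark uses.
    """
    return float(
        np.corrcoef(upper_triangle(model_rdm), upper_triangle(neural_rdm))[0, 1]
    )


# Hypothetical stand-ins for 10 images: a measured neural matrix and the
# features of two networks that differ only in their training objective.
rng = np.random.default_rng(0)
neural_rdm = rdm(rng.normal(size=(10, 50)))
features_categorization = rng.normal(size=(10, 32))
features_dissimilarity = rng.normal(size=(10, 32))

scores = {
    "categorization": rdm_agreement(rdm(features_categorization), neural_rdm),
    "dissimilarity": rdm_agreement(rdm(features_dissimilarity), neural_rdm),
}
print(scores)
```

The point of the design is that both networks are scored by the same function against the same neural matrix, so any difference in score reflects the training objective, provided architecture, data, and fine-tuning schedule are also held fixed.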

Original abstract

The advent of deep learning has a profound effect on visual neuroscience. It paved the way for new models to predict neural data. Although deep convolutional neural networks are explicitly trained for categorization, they learn a representation similar to a biological visual system. But categorization is not the only goal of the human visual system. Hence, the representation of a classification algorithm may not completely explain the visual processing stages. Here, I modified the traditional Siamese network loss function (Contrastive loss) to train them directly on neural dissimilarity. This network takes image pair as input and predicts the correlation distance between their output features. For Algonauts challenge, using dissimilarity learning, I fine-tuned the initial layers of Alexnet to predict MEG early response/EVC data and all the layers of VGG-16 to predict MEG late response/IT data. This approach is ideal for datasets with high SNR. Therefore, my model achieved state-of-the-art performance on MEG dataset but not fMRI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper sketches a Siamese network with a tweaked contrastive loss trained on neural dissimilarity for MEG prediction but supplies no loss equation, ablations, or numbers to back the SOTA claim.

The main point is that this work takes a standard Siamese architecture, alters its contrastive loss to target neural dissimilarity instead of labels, and applies the result to fine-tune AlexNet and VGG-16 on Algonauts MEG data, claiming better performance on high-SNR recordings than on fMRI. The motivation is straightforward: categorization training may miss aspects of visual representation that dissimilarity captures. That direction is reasonable and worth testing. The paper also notes that early layers suit early MEG responses while deeper layers suit later ones, which matches known hierarchy in visual cortex. Beyond that, there is little to evaluate. The abstract states the modification and the outcome but gives no equation for the altered loss, no description of how the correlation-distance target is computed from the brain data, and no controlled comparison that keeps the network and data fixed while changing only the training objective. No error bars, no baseline numbers, and no ablation on layer selection appear either. The stress-test concern holds: without those controls it is impossible to tell whether any gains come from the dissimilarity target or from other unstated choices. The work is aimed at the small group already running brain-encoding models on challenge data and already comfortable with Siamese nets. A reader outside that niche will find almost nothing usable. In its current form the paper does not supply enough evidence for a serious referee to spend time on it; the central result cannot be assessed.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes training a Siamese network with a modified contrastive loss to directly predict neural dissimilarity (correlation distance between features) from brain imaging data rather than using standard categorization objectives. It applies this to fine-tune initial layers of AlexNet for early MEG/EVC responses and all layers of VGG-16 for late MEG/IT responses in the Algonauts challenge, claiming state-of-the-art performance on MEG (but not fMRI) due to high SNR in the MEG data.

Significance. If the central claim holds after proper validation, the work would demonstrate that dissimilarity-based training can better align DNN representations with brain data than categorization training, particularly for high-SNR neural datasets. This addresses a known mismatch between typical DNN objectives and visual system goals and could improve predictive models in visual neuroscience.

major comments (3)

[Abstract] Abstract: The modified contrastive loss is described only at a high level ('modified the traditional Siamese network loss function (Contrastive loss) to train them directly on neural dissimilarity') with no equation, no definition of the target (e.g., how correlation distance is computed or scaled), and no implementation details. This formulation is load-bearing for the central claim that dissimilarity training is superior.
[Abstract] Abstract: No controlled comparison is reported that holds network architecture, data, and fine-tuning schedule fixed while varying only the training objective (dissimilarity vs. standard categorization). Without this ablation, gains cannot be attributed to the dissimilarity target rather than layer selection or other unstated choices.
[Abstract] Abstract: The SOTA claim on MEG lacks any reported metrics, error bars, validation statistics, or explicit comparison to other Algonauts submissions or standard baselines. The statement that the approach 'achieved state-of-the-art performance' cannot be assessed from the provided information.

minor comments (1)

[Abstract] Abstract: Minor grammatical issues ('has a profound effect' should read 'has had a profound effect'; 'This approach is ideal for datasets with high SNR' is stated without supporting reasoning or evidence).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point-by-point below, proposing revisions where appropriate to improve clarity and rigor.

Referee: [Abstract] Abstract: The modified contrastive loss is described only at a high level ('modified the traditional Siamese network loss function (Contrastive loss) to train them directly on neural dissimilarity') with no equation, no definition of the target (e.g., how correlation distance is computed or scaled), and no implementation details. This formulation is load-bearing for the central claim that dissimilarity training is superior.

Authors: We agree that the abstract provides only a high-level description. The target dissimilarity is defined as the correlation distance (1 - Pearson r) between the relevant neural feature vectors, and the loss is a modified contrastive loss that directly regresses to this value rather than using binary same/different labels. We will revise the abstract to include a concise statement of this formulation and direct readers to the methods for the full equation and scaling details. revision: yes
Referee: [Abstract] Abstract: No controlled comparison is reported that holds network architecture, data, and fine-tuning schedule fixed while varying only the training objective (dissimilarity vs. standard categorization). Without this ablation, gains cannot be attributed to the dissimilarity target rather than layer selection or other unstated choices.

Authors: The manuscript does not report such a controlled ablation. While the approach uses the same base architectures (AlexNet, VGG-16) as prior work, we did not explicitly compare dissimilarity versus categorization objectives under identical fine-tuning conditions. We will add an explicit discussion of this limitation and, where data permit, include a controlled comparison in the revised manuscript. revision: yes
Referee: [Abstract] Abstract: The SOTA claim on MEG lacks any reported metrics, error bars, validation statistics, or explicit comparison to other Algonauts submissions or standard baselines. The statement that the approach 'achieved state-of-the-art performance' cannot be assessed from the provided information.

Authors: The abstract is brief by design, but the full manuscript and Algonauts submission materials contain the specific leaderboard scores, baseline comparisons, and validation statistics supporting the MEG result. We will revise the abstract to report key quantitative metrics and error information to make the claim self-contained. revision: yes

Circularity Check

0 steps flagged

No significant circularity; training targets are independent external neural measurements

The paper modifies a Siamese network's contrastive loss to train directly on neural dissimilarity (correlation distance) computed from brain imaging data, then uses the resulting fine-tuned CNN layers to predict held-out MEG/fMRI responses in the Algonauts challenge. The targets originate from separate experimental recordings rather than from the model's own predictions or fitted parameters. No equations, self-citations, or uniqueness claims reduce the central result to its inputs by construction. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that neural dissimilarity measured by correlation distance is a suitable direct training target for CNN feature spaces, with layer selection as a modeling choice.

free parameters (1)

Network layers selected for fine-tuning
Initial layers of AlexNet and all layers of VGG-16 chosen to match early vs late responses

axioms (1)

Domain assumption: Correlation distance between network features can be trained to match neural dissimilarity
Core of the modified loss function

pith-pipeline@v0.9.0 · 5686 in / 1146 out tokens · 50286 ms · 2026-05-25T11:14:41.610351+00:00 · methodology
