arxiv: 2606.29577 · v1 · pith:QJLLVIYHnew · submitted 2026-06-28 · 💻 cs.CV · cs.AI

ReMAP-PET: Beyond Visual Understanding -- Learning Region-Guided Metabolic Alignment Semantics from Brain PET

Pith reviewed 2026-06-30 07:07 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords brain PET · SUVR profiles · metabolic semantics · contrastive alignment · 3D ResNet · clinical language models · neuroimaging · foundation models

The pith

Supervising PET encoders with regional SUVR profiles yields structured, interpretable and language-compatible representations

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard 3D brain foundation models overlook the distinguishing regional metabolic information in PET by treating scans as generic volumes. ReMAP-PET counters this by partially tuning a MedicalNet 3D ResNet-50 on 1015 paired PET-SUVR samples via joint regression and contrastive objectives that directly supervise regional standardized uptake value ratio profiles. The resulting embeddings achieve 0.070 MAE and 77.8 percent Recall@1 on SUVR tasks, connect to frozen BioClinicalBERT through contrastive alignment, and support PET-to-report generation plus linear-probe diagnostic and cognitive tasks without further fine-tuning. A sympathetic reader would care because PET is central to neurodegenerative assessment and structured metabolic representations could make downstream clinical use more direct and interpretable.

Core claim

ReMAP-PET moves beyond visual encoding by supervising a partially-tuned MedicalNet 3D ResNet-50 with brain regional standardized uptake value ratio (SUVR) profiles through joint regression and contrastive objectives, enabling the encoder to learn the metabolic semantics underlying PET modality. On 1015 paired PET–SUVR samples, ReMAP-PET achieves 0.070 SUVR MAE and 77.8% PET SUVR Recall@1, substantially outperforming five frozen pretrained baselines. It further connects the metabolic embedding to clinical language via contrastive alignment with frozen BioClinicalBERT and demonstrates end-to-end PET-to-report generation through SUVR-constrained verbalization. Linear probing on diagnostic clas…

What carries the argument

Joint regression on regional SUVR profiles plus contrastive alignment, applied to a 3D ResNet-50 to embed metabolic semantics rather than generic volumetric features
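
As a rough illustration of what a joint regression-plus-contrastive objective of this kind can look like, here is a minimal NumPy sketch. It is not the authors' implementation. The L1 regression term, the symmetric InfoNCE form of the contrastive term, the temperature of 0.07, the default `lam_con=0.5`, the embedding size and all function names are assumptions made for illustration; only the 120-region profile size and the λreg = 1.0 setting appear in the source.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_info_nce(pet_emb, suvr_emb, temperature=0.07):
    # Assumed CLIP-style loss: matched PET/SUVR pairs sit on the diagonal.
    logits = l2_normalize(pet_emb) @ l2_normalize(suvr_emb).T / temperature
    idx = np.arange(len(logits))
    pet_to_suvr = -log_softmax(logits)[idx, idx].mean()
    suvr_to_pet = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (pet_to_suvr + suvr_to_pet)

def joint_loss(suvr_pred, suvr_true, pet_emb, suvr_emb, lam_reg=1.0, lam_con=0.5):
    # Assumed L1 regression on regional SUVR plus the contrastive term above.
    reg = np.abs(suvr_pred - suvr_true).mean()
    con = symmetric_info_nce(pet_emb, suvr_emb)
    return lam_reg * reg + lam_con * con, reg, con

# Synthetic stand-ins for one batch (no real PET data involved).
rng = np.random.default_rng(0)
batch, n_regions, dim = 8, 120, 32
suvr_true = rng.uniform(0.8, 1.6, size=(batch, n_regions))
suvr_pred = suvr_true + rng.normal(0.0, 0.05, size=suvr_true.shape)
pet_emb = rng.normal(size=(batch, dim))
suvr_emb = pet_emb + 0.1 * rng.normal(size=(batch, dim))

total, reg, con = joint_loss(suvr_pred, suvr_true, pet_emb, suvr_emb)
print(f"total={total:.4f} regression={reg:.4f} contrastive={con:.4f}")
```

Setting `lam_con=0.0` in this sketch reduces the objective to pure regression, which is the endpoint the Figure 5 text describes as giving low SUVR error but chance-level retrieval.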

Load-bearing premise

Joint regression on SUVR profiles plus contrastive alignment is sufficient to inject the distinguishing metabolic semantics of PET without requiring task-specific fine-tuning or external validation that the learned embeddings capture causal metabolic differences rather than dataset-specific correlations

What would settle it

If the ReMAP-PET embeddings show no advantage over generic volumetric encoders on an independent multi-scanner dataset for the same clinical tasks, the claim that the supervision injects meaningful metabolic semantics would be falsified

Figures

Figures reproduced from arXiv: 2606.29577 by Dasen Dai, Hongjie Yu, Jagath C. Rajapakse, Qingxin Zhang, Qizhen Lan, Shuoqi Li, Vince D. Calhoun, Yanteng Zhang, Yuxiang Wei.

Figure 1: Existing methods treat PET as generic 3D …
Figure 2: Overview of ReMAP-PET. Stage 1 (left) aligns a partially-tuned 3D PET encoder with structured 120-region SUVR profiles via joint regression and contrastive objectives. Stage 2 (center) connects the frozen metabolic embedding to clinical language through lightweight projection heads paired with frozen BioClinicalBERT. Downstream probing (right) evaluates the learned representations on diagnostic classificat…
Figure 3: PET–SUVR cosine similarity matrices on the …
Figure 4: Predicted versus ground-truth 120-region …
Figure 5 sweeps the contrastive weight λcon with λreg = 1.0 fixed, using the MedicalNet layer4 setup. The most informative comparison is at the two endpoints. With λcon = 0 (pure regression), the model achieves its lowest SUVR error (MAE 0.055) but its retrieval collapses to chance (R@1 ≈ 0.007, essentially 1/153). The PET encoder has learned to predict SUVR values numerically, but its embedding space carries no us…
Figure 6: Per-subject SUVR prediction for a represen…
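
The Figure 5 text equates chance-level retrieval with 1/153. A small sketch of how Recall@1 over cosine similarities can be computed, and of why 1/153 is the chance level for a 153-item gallery, is given below. The function name, the embedding size of 64 and the synthetic data are hypothetical; only the gallery size of 153 comes from the source.

```python
import numpy as np

def recall_at_1(query_emb, gallery_emb):
    # Row i of the query set is assumed to match row i of the gallery.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sim = q @ g.T                      # cosine similarity matrix
    top1 = sim.argmax(axis=1)          # best gallery match for each query
    return float((top1 == np.arange(len(q))).mean())

n_gallery = 153
chance = 1.0 / n_gallery               # uniform guessing among 153 candidates
print(round(chance, 4))                # 0.0065

rng = np.random.default_rng(0)
pet = rng.normal(size=(n_gallery, 64))
aligned = pet + 0.01 * rng.normal(size=pet.shape)   # near-copies of the queries
unrelated = rng.normal(size=pet.shape)              # embeddings with no pairing signal

print(recall_at_1(pet, aligned))       # 1.0
print(recall_at_1(pet, unrelated))     # close to chance; exact value depends on the seed
```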
Original abstract

Positron Emission Tomography (PET) reveals brain metabolism and is clinically central to neurodegenerative disease assessment, yet existing 3D brain foundation models treat PET as generic volumetric data, missing the structured regional metabolic information that distinguishes it from structural neuroimaging. To address these limitations, we propose ReMAP-PET, a framework that moves beyond visual encoding by supervising a partially-tuned MedicalNet 3D ResNet-50 with brain regional standardized uptake value ratio (SUVR) profiles through joint regression and contrastive objectives, enabling the encoder to learn the metabolic semantics underlying PET modality. On 1015 paired PET–SUVR samples, ReMAP-PET achieves 0.070 SUVR MAE and 77.8% PET SUVR Recall@1, substantially outperforming five frozen pretrained baselines. We further connect the metabolic embedding to clinical language via contrastive alignment with frozen BioClinicalBERT and demonstrate end-to-end PET-to-report generation through SUVR-constrained verbalization. Linear probing on diagnostic classification and cognitive regression tasks confirms that the embeddings retain clinically relevant information without task-specific fine-tuning. Our results show that grounding PET encoders in regional metabolic semantics, rather than treating PET as generic volumetric data, yields representations that are structured, interpretable, and language-compatible, pointing to a new direction for metabolic-aware PET understanding.
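
To make the linear-probing idea concrete: the encoder stays frozen and only a linear map from embedding to target is fitted. The sketch below uses a closed-form ridge regression as the probe, which is an assumption, since the abstract does not say which linear model is used. The embeddings, the synthetic score and the names `fit_ridge_probe` and `predict_probe` are likewise hypothetical.

```python
import numpy as np

def fit_ridge_probe(emb, target, alpha=1.0):
    # Closed-form ridge regression on frozen embeddings, with an unpenalised bias.
    X = np.hstack([emb, np.ones((len(emb), 1))])
    penalty = alpha * np.eye(X.shape[1])
    penalty[-1, -1] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ target)

def predict_probe(emb, weights):
    X = np.hstack([emb, np.ones((len(emb), 1))])
    return X @ weights

# Synthetic stand-ins: 200 "frozen" embeddings and a score that depends on them linearly.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
score = emb @ true_w + 25.0 + rng.normal(0.0, 0.1, size=200)

train, test = slice(0, 160), slice(160, 200)
weights = fit_ridge_probe(emb[train], score[train])
mae = np.abs(predict_probe(emb[test], weights) - score[test]).mean()
print(f"held-out probe MAE: {mae:.3f}")
```

Because the probe has no capacity beyond a linear map, a low held-out error indicates that the target is linearly decodable from the embedding. It does not by itself show where that information came from.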

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes ReMAP-PET, which supervises a partially-tuned MedicalNet 3D ResNet-50 encoder on 1015 paired PET-SUVR samples using joint regression and contrastive objectives on regional SUVR profiles, then connects the resulting embedding to frozen BioClinicalBERT through contrastive alignment. It reports 0.070 SUVR MAE and 77.8% Recall@1, outperforming five frozen pretrained baselines, and claims language-compatible embeddings that enable PET-to-report generation and retain diagnostic and cognitive information under linear probing.

Significance. If the central claim holds under rigorous validation, the work would establish a concrete route for injecting domain-specific metabolic structure into PET encoders, yielding representations that are more interpretable and clinically aligned than generic volumetric pretraining. The language-alignment component and end-to-end verbalization are particularly noteworthy strengths.

major comments (3)
  1. [Abstract] Abstract: the reported 0.070 MAE and 77.8% Recall@1 are obtained by direct regression on the identical SUVR profiles that define the supervision target; without held-out clinical endpoints independent of these fitted values, the numbers demonstrate in-distribution reconstruction rather than acquisition of generalizable metabolic semantics.
  2. [Abstract] Abstract / Experiments section: no information is supplied on train/test splits, hyperparameter search protocol, or whether the five baselines were re-evaluated under identical conditions and preprocessing; these omissions render the outperformance claim impossible to assess.
  3. [Abstract] Abstract: linear-probing results on diagnostic classification and cognitive regression are presented as evidence that the embeddings retain clinically relevant information, yet the probes are still performed on data whose labels correlate with the SUVR supervision signal; an external validation set or task whose ground truth is causally independent of the fitted SUVR values is required to isolate the contribution of metabolic grounding.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'partially-tuned MedicalNet 3D ResNet-50' is used without specifying which layers remain frozen or the precise tuning schedule.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the insightful comments regarding the evaluation of our method. We provide point-by-point responses below and have updated the manuscript accordingly where feasible.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the reported 0.070 MAE and 77.8% Recall@1 are obtained by direct regression on the identical SUVR profiles that define the supervision target; without held-out clinical endpoints independent of these fitted values, the numbers demonstrate in-distribution reconstruction rather than acquisition of generalizable metabolic semantics.

    Authors: The referee is correct that these metrics are evaluated on the SUVR supervision targets. Our defense is that the joint training objective, including contrastive alignment, enables the model to learn structured metabolic representations rather than mere pixel-level reconstruction. The Recall@1 specifically measures how well the learned embeddings match the regional profiles in a retrieval task. We maintain that this constitutes evidence of metabolic semantics acquisition within the available data distribution. revision: no

  2. Referee: [Abstract] Abstract / Experiments section: no information is supplied on train/test splits, hyperparameter search protocol, or whether the five baselines were re-evaluated under identical conditions and preprocessing; these omissions render the outperformance claim impossible to assess.

    Authors: We agree that these details were missing from the submission. In the revised version, we will include the train/test split information (using an 80/20 split), the hyperparameter search protocol (grid search over learning rates and batch sizes on a validation subset), and confirmation that all baselines were re-evaluated under the same conditions and preprocessing pipeline. revision: yes

  3. Referee: [Abstract] Abstract: linear-probing results on diagnostic classification and cognitive regression are presented as evidence that the embeddings retain clinically relevant information, yet the probes are still performed on data whose labels correlate with the SUVR supervision signal; an external validation set or task whose ground truth is causally independent of the fitted SUVR values is required to isolate the contribution of metabolic grounding.

    Authors: We acknowledge the potential correlation between the supervision signal and the probing labels. The cognitive regression tasks involve standardized clinical scores that are not direct functions of SUVR. We will revise the manuscript to explicitly discuss this limitation and the correlational nature of the evidence provided by linear probing. revision: partial
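
For scale, the 80/20 split mentioned in the second response can be sketched as follows. This is a hypothetical illustration: the seed, the shuffling scheme and the function name are assumptions, and the simulated rebuttal does not say whether the split is made per scan or per subject.

```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    # Shuffle sample indices once, then carve off the test portion.
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

train_idx, test_idx = train_test_split_indices(1015)
print(len(train_idx), len(test_idx))   # 812 203
```

Under these assumptions the test set holds 203 samples. The source does not state how that relates to the 153-item retrieval pool implied by the Figure 5 text.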

standing simulated objections not resolved
  • Requirement for an external validation set or task with ground truth causally independent of the fitted SUVR values, which cannot be addressed with the current 1015-sample dataset.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained with explicit supervision and independent downstream probes

full rationale

The paper explicitly supervises a 3D ResNet-50 encoder via joint regression on regional SUVR profiles plus contrastive alignment on 1015 paired samples, then reports the resulting MAE/Recall@1 on that objective (outperforming frozen baselines) and performs linear probing on separate diagnostic classification and cognitive regression tasks. No equation or claim reduces a 'prediction' to the training target by construction, no self-citation chain bears the central premise, and no uniqueness theorem or ansatz is imported from prior author work. The SUVR regression performance is the direct consequence of the stated objective rather than a disguised renaming or fit; the additional clinical probes supply content independent of the fitted SUVR values themselves. This is standard supervised representation learning and does not match any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Insufficient information in abstract to enumerate free parameters, axioms, or invented entities; the supervision signal (SUVR profiles) is treated as given ground truth without stated derivation or uncertainty model.

pith-pipeline@v0.9.1-grok · 5803 in / 1176 out tokens · 20306 ms · 2026-06-30T07:07:03.130682+00:00 · methodology
