
arxiv: 2606.05107 · v1 · pith:HFXSGFJLnew · submitted 2026-06-03 · 💻 cs.CV · cs.AI

Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have

Pith reviewed 2026-06-28 06:41 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords: vision foundation models · metadata adaptation · self-supervised learning · domain adaptation · scientific imaging · label-free adaptation · fluorescence microscopy · medical imaging

The pith

Vision foundation models can be adapted to scientific domains using only the metadata already attached to images, without task labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FINO, a method that adapts generic vision foundation models to specialized domains such as microscopy and medical imaging by exploiting existing metadata in a self-supervised way. Rather than relying on scarce labels, or risking a loss of generality through fully supervised fine-tuning, it uses metadata to guide the representation toward useful factors and away from spurious ones. The authors report that it outperforms both standard unsupervised domain adaptation and fully supervised adaptation, and that it also exceeds specialized state-of-the-art techniques in several domains. A sympathetic reader would care because it suggests a way to make powerful models useful in label-poor scientific settings without the usual costs. The key premise is that metadata, which is often collected anyway, carries enough signal for effective adaptation.

Core claim

FINO combines a standard self-supervised objective with flexible metadata guidance that handles both highly granular discrete metadata and continuous metadata. The guidance encourages the representation to preserve informative factors while suppressing spurious ones. Across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO consistently outperforms standard unsupervised domain adaptation and fully supervised adaptation. It also exceeds highly specialized domain-specific state of the art, while using no task labels for backbone adaptation and only lightweight probes for supervision.
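To make the shape of such an objective concrete, here is a minimal sketch of one way a self-supervised term could be combined with metadata guidance. It is not FINO's actual loss, which the abstract does not specify. The function names (`ssl_loss`, `metadata_gap`, `guided_loss`), the weights `w_keep` and `w_suppress`, and the toy data are all hypothetical, and the sketch covers discrete metadata only.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ssl_loss(views_a, views_b):
    """Stand-in self-supervised term: mean (1 - cosine) between two views of each image."""
    return sum(1.0 - cosine(a, b) for a, b in zip(views_a, views_b)) / len(views_a)


def metadata_gap(embeddings, metadata):
    """Mean similarity of pairs sharing a metadata value minus that of pairs that do not."""
    same, diff = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        (same if metadata[i] == metadata[j] else diff).append(sim)
    if not same or not diff:
        return 0.0  # the gap is undefined without both kinds of pairs
    return sum(same) / len(same) - sum(diff) / len(diff)


def guided_loss(views_a, views_b, informative, spurious, w_keep=1.0, w_suppress=1.0):
    """Hypothetical combined objective (lower is better), NOT the paper's loss."""
    keep = metadata_gap(views_a, informative)   # reward structure along informative metadata
    suppress = metadata_gap(views_a, spurious)  # penalize structure along spurious metadata
    return ssl_loss(views_a, views_b) - w_keep * keep + w_suppress * suppress


# Toy data (made up): embeddings cluster by the informative attribute.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
informative = [0, 0, 1, 1]
spurious = [0, 1, 0, 1]
loss = guided_loss(emb, emb, informative, spurious)
```

In this toy setup the loss is lower when the clustered attribute is treated as informative than when the two roles are swapped, which is the behavior the core claim describes. Continuous metadata would need a different pairing rule, for example weighting pairs by the distance between their metadata values.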

What carries the argument

FINO, the self-supervised adaptation method that adds flexible metadata guidance to a standard self-supervised objective to separate informative factors from spurious ones in the learned representation.

If this is right

  • Adaptation succeeds without any task labels for the backbone across multiple tested domains.
  • Performance exceeds both unsupervised domain adaptation and fully supervised adaptation.
  • Results surpass highly specialized domain-specific state-of-the-art methods.
  • Only lightweight probes are needed for any remaining supervision.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Prioritizing richer metadata collection during scientific data acquisition could amplify the benefits of this style of adaptation.
  • The same metadata-guided principle might extend to non-vision foundation models if analogous side information is available.
  • Domains where metadata is noisy or weakly related to the task may require extra robustness steps not tested here.

Load-bearing premise

The metadata already present with the images supplies reliable signals for separating informative factors from spurious ones across the tested scientific domains.

What would settle it

Applying FINO to a new domain where the available metadata has no correlation with task-relevant variation and finding that performance is no better than a metadata-free self-supervised baseline.
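One inexpensive way to gauge whether a new domain falls into this regime, assuming a small labeled validation set and discrete metadata, is to estimate the mutual information between a metadata field and the task labels. The sketch below is illustrative and not part of the paper. `mutual_information` is a hypothetical helper and the toy lists are made up.

```python
import math
from collections import Counter


def mutual_information(metadata, labels):
    """Plug-in estimate of I(metadata; labels) in nats for two discrete sequences."""
    n = len(metadata)
    joint = Counter(zip(metadata, labels))
    count_m = Counter(metadata)
    count_l = Counter(labels)
    mi = 0.0
    for (m, l), c in joint.items():
        p_ml = c / n
        # Compare the joint probability with the product of the marginals.
        mi += p_ml * math.log(p_ml / ((count_m[m] / n) * (count_l[l] / n)))
    return mi


# Toy examples (made up): metadata independent of the labels vs. identical to them.
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
mi_identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
```

A value near zero suggests the metadata field carries little task-relevant signal, which is the setting where the test above predicts no gain over a metadata-free baseline. The plug-in estimate is biased upward on small samples, so it is best read as a rough screen rather than a decision rule.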

Original abstract

We propose a label-free approach to adapt powerful but generic vision foundation models to specialized scientific domains. Standard supervised fine-tuning is often ill-suited to these settings: labels are scarce, and task-specific training can collapse the model's generality and hurt robustness. We instead leverage metadata to adapt representations to new domains in a self-supervised manner. Our method, FINO, combines a standard self-supervised objective with flexible metadata guidance that handles both highly granular discrete metadata and continuous metadata. It encourages the representation to preserve informative factors while suppressing spurious ones. Across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO consistently outperforms standard unsupervised domain adaptation and fully supervised adaptation. It also exceeds highly-specialized domain-specific state of the art, while using no task labels for backbone adaptation and only lightweight probes for supervision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes FINO, a label-free method to adapt vision foundation models to specialized scientific domains by combining a standard self-supervised objective with flexible metadata guidance (handling both granular discrete and continuous metadata). The approach encourages representations to preserve informative factors while suppressing spurious ones. It claims consistent outperformance over standard unsupervised domain adaptation, fully supervised adaptation, and highly specialized domain-specific state-of-the-art methods across four domains (subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging), using no task labels for backbone adaptation and only lightweight probes for supervision.

Significance. If the empirical claims hold under rigorous validation, the work would be significant for enabling practical adaptation of foundation models in label-scarce scientific domains by exploiting readily available metadata. This could improve robustness and generality compared to standard fine-tuning or UDA, with the multi-domain evaluation providing a broad test of the approach.

major comments (1)
  1. [Abstract] The claim of consistent outperformance across four domains is not accompanied by experimental details, baseline definitions, or statistical tests, so the link from data to claim cannot be verified from the provided text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. We address the single major comment point by point below.

  1. Referee: [Abstract] The claim of consistent outperformance across four domains is not accompanied by experimental details, baseline definitions, or statistical tests, so the link from data to claim cannot be verified from the provided text.

    Authors: We agree that the abstract, as a concise summary, does not include experimental details, baseline definitions, or statistical tests. This is standard due to length constraints. The manuscript provides these in full in Section 4 (Experiments), including the four domains, comparisons against unsupervised domain adaptation, fully supervised adaptation, and domain-specific SOTA methods, with results in Tables 1-4 and statistical analysis where reported. Readers can verify the claims from the main text.

    Revision: no

Circularity Check

0 steps flagged

No significant circularity


The paper introduces FINO as a metadata-guided self-supervised adaptation method for vision foundation models. No equations, derivations, or parameter-fitting steps are described that reduce any claimed prediction or result to a quantity defined by the method itself. The approach combines standard self-supervised objectives with metadata handling, and performance claims are presented as empirical comparisons across domains rather than algebraic identities or self-referential fits. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text. The derivation chain is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no concrete implementation details, so no free parameters, axioms, or invented entities can be extracted.


