Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have

Camille Couprie; Elouan Gard\`es; Huy V. Vo; Kartik Ahuja; Lo\"ic Landrieu; Piotr Bojanowski; Seung Eun Yi; Th\'eo Moutakanni; Wolfgang M. Pernice

arxiv: 2606.05107 · v1 · pith:HFXSGFJLnew · submitted 2026-06-03 · 💻 cs.CV · cs.AI

Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have

Elouan Gard\`es , Seung Eun Yi , Kartik Ahuja , Th\'eo Moutakanni , Huy V. Vo , Piotr Bojanowski , Wolfgang M. Pernice , Lo\"ic Landrieu

show 1 more author

Camille Couprie

This is my paper

Pith reviewed 2026-06-28 06:41 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords vision foundation modelsmetadata adaptationself-supervised learningdomain adaptationscientific imaginglabel-free adaptationfluorescence microscopymedical imaging

0 comments

The pith

Vision foundation models can be adapted to scientific domains using only the metadata already attached to images, without task labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FINO, a method that adapts generic vision foundation models to specialized domains like microscopy and medical imaging by leveraging existing metadata in a self-supervised way. Instead of relying on scarce labels or risking loss of generality through full supervision, it uses metadata to guide the representation to keep useful factors and ignore spurious ones. This approach beats both standard unsupervised domain adaptation and fully supervised methods, as well as specialized state-of-the-art techniques in several domains. A sympathetic reader would care because it suggests a way to make powerful models useful in label-poor scientific settings without the usual costs. The key is that metadata, which is often already collected, provides enough signal for effective adaptation.

Core claim

FINO combines a standard self-supervised objective with flexible metadata guidance that handles both highly granular discrete metadata and continuous metadata. It encourages the representation to preserve informative factors while suppressing spurious ones. Across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO consistently outperforms standard unsupervised domain adaptation and fully supervised adaptation. It also exceeds highly-specialized domain-specific state of the art, while using no task labels for backbone adaptation and only lightweight probes for supervision.

What carries the argument

FINO, the self-supervised adaptation method that adds flexible metadata guidance to a standard self-supervised objective to separate informative factors from spurious ones in the learned representation.

If this is right

Adaptation succeeds without any task labels for the backbone across multiple tested domains.
Performance exceeds both unsupervised domain adaptation and fully supervised adaptation.
Results surpass highly-specialized domain-specific state-of-the-art methods.
Only lightweight probes are needed for any remaining supervision.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Prioritizing richer metadata collection during scientific data acquisition could amplify the benefits of this style of adaptation.
The same metadata-guided principle might extend to non-vision foundation models if analogous side information is available.
Domains where metadata is noisy or weakly related to the task may require extra robustness steps not tested here.

Load-bearing premise

The metadata already present with the images supplies reliable signals for separating informative factors from spurious ones across the tested scientific domains.

What would settle it

Applying FINO to a new domain where the available metadata has no correlation with task-relevant variation and finding that performance is no better than a metadata-free self-supervised baseline.

read the original abstract

We propose a label-free approach to adapt powerful but generic vision foundation models to specialized scientific domains. Standard supervised fine-tuning is often ill-suited to these settings: labels are scarce, and task-specific training can collapse the model's generality and hurt robustness. We instead leverage metadata to adapt representations to new domains in a self-supervised manner. Our method, FINO, combines a standard self-supervised objective with flexible metadata guidance that handles both highly granular discrete metadata and continuous metadata. It encourages the representation to preserve informative factors while suppressing spurious ones. Across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO consistently outperforms standard unsupervised domain adaptation and fully supervised adaptation. It also exceeds highly-specialized domain-specific state of the art, while using no task labels for backbone adaptation and only lightweight probes for supervision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FINO uses existing metadata to adapt vision foundation models self-supervisedly without task labels and claims to beat both unsupervised and supervised baselines across four scientific domains.

read the letter

The main takeaway is that FINO combines a standard self-supervised objective with flexible guidance from both discrete and continuous metadata to adapt foundation models to specialized imaging domains while avoiding task labels for the backbone.

What stands out as new is the handling of mixed metadata types in one framework and the claim that this preserves informative factors while suppressing spurious ones. The paper reports consistent gains on subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, beating standard unsupervised domain adaptation, fully supervised adaptation, and some domain-specific state-of-the-art methods.

The work does a clean job framing the practical problem of scarce labels in scientific settings and showing how metadata already present with the data can substitute. The high-level description of the method is straightforward.

The soft spots are the lack of any experimental details, baseline definitions, or statistical tests in the abstract, which makes it impossible to judge how solid the outperformance actually is. The central assumption that the metadata reliably separates useful signals from noise could fail in messy real-world scientific collections, and that needs checking in the full experiments.

This paper is aimed at computer vision researchers working on domain adaptation for biology, medicine, or environmental monitoring. Readers focused on label-efficient methods would get value from the concrete domains and the metadata angle.

It deserves a serious referee because the problem matters and the approach is coherent even if the evidence strength is still unclear from the summary.

Referee Report

1 major / 0 minor

Summary. The paper proposes FINO, a label-free method to adapt vision foundation models to specialized scientific domains by combining a standard self-supervised objective with flexible metadata guidance (handling both granular discrete and continuous metadata). The approach encourages representations to preserve informative factors while suppressing spurious ones. It claims consistent outperformance over standard unsupervised domain adaptation, fully supervised adaptation, and highly-specialized domain-specific state-of-the-art methods across four domains (subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging), using no task labels for backbone adaptation and only lightweight probes for supervision.

Significance. If the empirical claims hold under rigorous validation, the work would be significant for enabling practical adaptation of foundation models in label-scarce scientific domains by exploiting readily available metadata. This could improve robustness and generality compared to standard fine-tuning or UDA, with the multi-domain evaluation providing a broad test of the approach.

major comments (1)

[Abstract] Abstract: the claim of consistent outperformance across four domains supplies no experimental details, baseline definitions, or statistical tests, preventing verification of the data-to-claim link from the provided text.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review. We address the single major comment point by point below.

read point-by-point responses

Referee: [Abstract] Abstract: the claim of consistent outperformance across four domains supplies no experimental details, baseline definitions, or statistical tests, preventing verification of the data-to-claim link from the provided text.

Authors: We agree that the abstract, as a concise summary, does not include experimental details, baseline definitions, or statistical tests. This is standard due to length constraints. The manuscript provides these in full in Section 4 (Experiments), including the four domains, comparisons against unsupervised domain adaptation, fully supervised adaptation, and domain-specific SOTA methods, with results in Tables 1-4 and statistical analysis where reported. Readers can verify the claims from the main text. revision: no

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces FINO as a metadata-guided self-supervised adaptation method for vision foundation models. No equations, derivations, or parameter-fitting steps are described that reduce any claimed prediction or result to a quantity defined by the method itself. The approach combines standard self-supervised objectives with metadata handling, and performance claims are presented as empirical comparisons across domains rather than algebraic identities or self-referential fits. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text. The derivation chain is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no concrete implementation details, so no free parameters, axioms, or invented entities can be extracted.

pith-pipeline@v0.9.1-grok · 5706 in / 1050 out tokens · 33494 ms · 2026-06-28T06:41:17.972506+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

81 extracted references · 6 canonical work pages · 3 internal anchors

[1]

Dinov2: Learning robust visual features without supervision.TMLR, 2024

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Lab...

2024
[2]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. InICML, 2021

2021
[3]

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al. SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features.arXiv:2502.14786, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[4]

Lungren, Tristan Naumann, Sheng Wang, and Hoifung Poon

Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Andrea Tupini, Yu Wang, Matt Mazzola, Swadheen Shukla, Lars Liden, Jianfeng Gao, Angela Crabtree, Brian Piening, Carlo Bifulco, Matthew P. Lungren, Tristan Naumann, Sheng Wang, and Hoifung Poon. A multimodal biomedical...

2025
[5]

Out of distribution generalization via interventional style transfer in single-cell microscopy

Wolfgang M Pernice, Michael Doron, Alex Quach, Aditya Pratapa, Sultan Kenjeyev, Nicholas De Veaux, Michio Hirano, and Juan C Caicedo. Out of distribution generalization via interventional style transfer in single-cell microscopy. InCVPR Workshop on Computer Vision for Microscopy Image Analysis (CVMI), 2023

2023
[6]

Geography-aware self-supervised learning

Kumar Ayush, Burak Uzkent, Chenlin Meng, Kumar Tanmay, Marshall Burke, David Lobell, and Stefano Ermon. Geography-aware self-supervised learning. InICCV, 2021

2021
[7]

Unbiased look at dataset bias

Antonio Torralba and Alexei A Efros. Unbiased look at dataset bias. InCVPR. IEEE, 2011

2011
[8]

Generalizing to unseen domains: A survey on domain generalization.IEEE transactions on knowledge and data engineering, 2022

Jindong Wang, Cuiling Lan, Chang Liu, Yidong Ouyang, Tao Qin, Wang Lu, Yiqiang Chen, Wenjun Zeng, and Philip S Yu. Generalizing to unseen domains: A survey on domain generalization.IEEE transactions on knowledge and data engineering, 2022

2022
[9]

CoDEx: Combining domain expertise for spatial generalization in satellite image analysis

Abhishek Kuriyal, Elliot Vincent, Mathieu Aubry, and Loic Landrieu. CoDEx: Combining domain expertise for spatial generalization in satellite image analysis. InCVPR workshop EarthVision, 2025

2025
[10]

Vargas Hakim, David Osowiechi, Fereshteh Shakeri, Ali Bahri, Moslem Yazdanpanah, Sahar Dastani, Ismail Ben Ayed, and Christian Desrosiers

Mehrdad Noori, Gustavo A. Vargas Hakim, David Osowiechi, Fereshteh Shakeri, Ali Bahri, Moslem Yazdanpanah, Sahar Dastani, Ismail Ben Ayed, and Christian Desrosiers. Histopath-C: Towards realistic domain shifts for histopathology vision-language adaptation. InWACV, 2026

2026
[11]

Measuring domain shift for deep learning in histopathology.IEEE Journal of Biomedical and Health Informatics, 25(2):325–336, 2021

Karin Stacke, Gabriel Eilertsen, Jonas Unger, and Claes Lundström. Measuring domain shift for deep learning in histopathology.IEEE Journal of Biomedical and Health Informatics, 25(2):325–336, 2021

2021
[12]

Parameter-efficient fine-tuning of DINOv2 vision transformers for lung nodule classification

Benjamin P Veasey and Amir A Amini. Parameter-efficient fine-tuning of DINOv2 vision transformers for lung nodule classification. InInternational Symposium on Biomedical Imaging (ISBI). IEEE, 2024

2024
[13]

How to train your ViT? data, augmentation, and regularization in vision transformers.TMLR, 2022

Andreas Peter Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, and Lucas Beyer. How to train your ViT? data, augmentation, and regularization in vision transformers.TMLR, 2022

2022
[14]

Fixing the train-test resolution discrepancy

Hugo Touvron, Andrea Vedaldi, Matthijs Douze, and Hervé Jégou. Fixing the train-test resolution discrepancy. InNeurIPS, 2019

2019
[15]

Knowledge-guided adaptation of pathology foundation models effectively improves cross-domain generalization and demographic fairness.Nature Communications, 2025

Yanyan Huang, Weiqin Zhao, Zhengyu Zhang, Yihang Chen, Yu Fu, Feng Wu, Yuming Jiang, Li Liang, Shujun Wang, and Lequan Yu. Knowledge-guided adaptation of pathology foundation models effectively improves cross-domain generalization and demographic fairness.Nature Communications, 2025

2025
[16]

Deep learning is combined with massive-scale citizen science to improve large-scale image classification.Nature biotechnology, 36(9):820–828, 2018

Devin P Sullivan, Casper F Winsnes, Lovisa Åkesson, Martin Hjelmare, Mikaela Wiking, Rutger Schutten, Linzi Campbell, Hjalti Leifsson, Scott Rhodes, Andie Nordgren, et al. Deep learning is combined with massive-scale citizen science to improve large-scale image classification.Nature biotechnology, 36(9):820–828, 2018

2018
[17]

Earnshaw, Imran S

Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang. Wilds...

2021
[18]

Unsupervised domain adaptation by backpropagation

Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. InICML, 2015

2015
[19]

Learning transferable features with deep adaptation networks

Mingsheng Long, Yue Cao, Jianmin Wang, and Michael Jordan. Learning transferable features with deep adaptation networks. InICML. PMLR, 2015

2015
[20]

Pseudo-labelingandconfirmation bias in deep semi-supervised learning

EricArazo, DiegoOrtego, PaulAlbert, NoelEO’Connor, andKevinMcGuinness. Pseudo-labelingandconfirmation bias in deep semi-supervised learning. InIJCNN, pages 1–8. IEEE, 2020

2020
[21]

The risks of invariant risk minimization

Elan Rosenfeld, Pradeep Kumar Ravikumar, and Andrej Risteski. The risks of invariant risk minimization. In ICLR, 2021

2021
[22]

Metadata- guided consistency learning for high content images

Johan Fredin Haslum, Christos Matsoukas, Karl-Johan Leuchowius, Erik Müllers, and Kevin Smith. Metadata- guided consistency learning for high content images. InMedical Imaging with Deep Learning, pages 918–936. PMLR, 2024

2024
[23]

Learning representations of satellite images from metadata supervision

Jules Bourcier, Gohar Dashyan, Karteek Alahari, and Jocelyn Chanussot. Learning representations of satellite images from metadata supervision. InECCV. Springer, 2024

2024
[24]

Metadata-enhanced contrastive learning from retinal optical coherence tomography images.Medical Image Analysis, 2024

Robbie Holland, Oliver Leingang, Hrvoje Bogunović, Sophie Riedl, Lars Fritsche, Toby Prevost, Hendrik P N Scholl, Ursula Schmidt-Erfurth, Sobha Sivaprasad, Andrew J Lotery, Daniel Rueckert, and Martin J Menten. Metadata-enhanced contrastive learning from retinal optical coherence tomography images.Medical Image Analysis, 2024

2024
[25]

Hierarchical metadata information constrained self-supervised learning for anomalous sound detection under domain shift

Haiyan Lan, Qiaoxi Zhu, Jian Guan, Yuming Wei, and Wenwu Wang. Hierarchical metadata information constrained self-supervised learning for anomalous sound detection under domain shift. InICASSP, 2024

2024
[26]

Analysis of the Human Protein Atlas Image Classification competition.Nature Methods, 16(12):1254–1261, 2019

Wei Ouyang, Casper F Winsnes, Martin Hjelmare, Anthony J Cesnik, Lovisa Åkesson, Hao Xu, Devin P Sullivan, Shubin Dai, Jun Lan, Park Jinmo, Shaikat M Galib, Christof Henkel, Kevin Hwang, Dmytro Poplavskiy, Bojan Tunguz, Russell D Wolfinger, Yinzheng Gu, Chuanpeng Li, Jinbin Xie, Dmitry Buslov, Sergei Fironov, Alexander Kiselev, Dmytro Panchenko, Xuan Cao,...

2019
[27]

Functional map of the world

Gordon Christie, Neil Fendley, James Wilson, and Ryan Mukherjee. Functional map of the world. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

2018
[28]

The iwildcam 2020 competition dataset

Sara Beery, Elijah Cole, and Arvi Gjoka. The iwildcam 2020 competition dataset. InCVPR Fine-Grained Visual Categorization Workshop (FGVC), 2020

2020
[29]

MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports.Scientific data, 2019

Alistair EW Johnson, Tom J Pollard, Seth J Berkowitz, Nathaniel R Greenbaum, Matthew P Lungren, Chih-ying Deng, Roger G Mark, and Steven Horng. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports.Scientific data, 2019

2019
[30]

Correlation alignment for unsupervised domain adaptation

Baochen Sun, Jiashi Feng, and Kate Saenko. Correlation alignment for unsupervised domain adaptation. In Domain adaptation in computer vision applications, pages 153–171. Springer, 2017

2017
[31]

Towards domain-invariant self-supervised learning with batch styles standardization

Marin Scalbert, Maria Vakalopoulou, and Florent Couzinié-Devy. Towards domain-invariant self-supervised learning with batch styles standardization. InICLR, 2024

2024
[32]

Cell-dino: Self-supervised image-based embeddings for cell fluorescent microscopy.PLOS Computational Biology, 21(12):e1013828, 2025

Théo Moutakanni, Camille Couprie, Seungeun Yi, Michael Doron, Zitong S Chen, Nikita Moshkov, Elouan Gardes, Mathilde Caron, Hugo Touvron, Armand Joulin, Piotr Bojanowski, Wolfgang M Pernice, and Juan C Caicedo. Cell-dino: Self-supervised image-based embeddings for cell fluorescent microscopy.PLOS Computational Biology, 21(12):e1013828, 2025

2025
[33]

Parameter efficient self-supervised geospatial domain adaptation

Linus Scheibenreif, Michael Mommert, and Damian Borth. Parameter efficient self-supervised geospatial domain adaptation. InCVPR, 2024

2024
[34]

A simple framework for contrastive learning of visual representations

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InICML, 2020

2020
[35]

Momentum contrast for unsupervised visual representation learning

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. InCVPR, 2020

2020
[36]

Emerging properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. InICCV, 2021

2021
[37]

Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, 11 Rémi Munos, and Michal Valko

Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, 11 Rémi Munos, and Michal Valko. Bootstrap your own latent: A new approach to self-supervised learning. In NeurIPS, 2020

2020
[38]

Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. InCVPR, 2022

2022
[39]

Self-supervised learning from images with a joint-embedding predictive architecture

Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. InCVPR, 2023

2023
[40]

Unsupervised learning of visual features by contrasting cluster assignments

Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learning of visual features by contrasting cluster assignments. InNeurIPS, 2020

2020
[41]

Junnan Li, Pan Zhou, Caiming Xiong, and Steven C.H. Hoi. Prototypical contrastive learning of unsupervised representations. InICLR, 2021

2021
[42]

Supervised contrastive learning

Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. InNeurIPS, 2020

2020
[43]

Rayan Krishnan, Pranav Rajpurkar, and Eric J. Topol. Self-supervised learning in medicine and healthcare. Nature Biomedical Engineering, 2022

2022
[44]

Chen, Nikita Moshkov, Mathilde Caron, Hugo Touvron, Piotr Bojanowski, Wolfgang M

Michael Doron, Théo Moutakanni, Zitong S. Chen, Nikita Moshkov, Mathilde Caron, Hugo Touvron, Piotr Bojanowski, Wolfgang M. Pernice, and Juan C. Caicedo. Unbiased single-cell morphology with self-supervised vision transformers.bioRxiv, 2023

2023
[45]

Self-supervised learning in remote sensing: A review.IEEE Geoscience and Remote Sensing Magazine, 2022

Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Lichao Mou, and Xiao Xiang Zhu. Self-supervised learning in remote sensing: A review.IEEE Geoscience and Remote Sensing Magazine, 2022

2022
[46]

AutoFT: Learning an objective for robust fine-tuning.arXiv:2401.10220, 2024

Caroline Choi, Yoonho Lee, Annie Chen, Allan Zhou, Aditi Raghunathan, and Chelsea Finn. AutoFT: Learning an objective for robust fine-tuning.arXiv:2401.10220, 2024

work page arXiv 2024
[47]

Connect later: Improving fine-tuning for robustness with targeted augmentations

Helen Qu and Sang Michael Xie. Connect later: Improving fine-tuning for robustness with targeted augmentations. InICML, 2024

2024
[48]

Tackling the widespread and critical impact of batch effects in high-throughput data.Nature Reviews Genetics, 2010

Jeffrey T Leek, Robert B Scharpf, Héctor Corrada Bravo, David Simcha, Ben Langmead, W Evan Johnson, Donald Geman, Keith Baggerly, and Rafael A Irizarry. Tackling the widespread and critical impact of batch effects in high-throughput data.Nature Reviews Genetics, 2010

2010
[49]

Contrastive learning for fair representations.arXiv:2109.10645, 2021

Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, and Lea Frermann. Contrastive learning for fair representations.arXiv:2109.10645, 2021

work page arXiv 2021
[50]

Compositional risk minimization

Divyat Mahajan, Mohammad Pezeshki, Charles Arnal, Ioannis Mitliagkas, Kartik Ahuja, and Pascal Vincent. Compositional risk minimization. InICML, 2025

2025
[51]

Korsunsky, N

I. Korsunsky, N. Millard, J. Fan, K. Slowikowski, F. Zhang, K. Wei, Y. Baglaenko, M. Brenner, P. R. Loh, and S. Raychaudhuri. Fast, sensitive and accurate integration of single-cell data with harmony.Nature Methods, 2019

2019
[52]

J. B. Kang, A. Nathan, K. Weinand, F. Zhang, N. Millard, L. Rumker, D. B. Moody, I. Korsunsky, and S. Raychaudhuri. Efficient and precise single-cell reference atlas mapping with symphony.Nature Communications, 2021

2021
[53]

A brief introduction to weakly supervised learning.National science review, 2018

Zhi-Hua Zhou. A brief introduction to weakly supervised learning.National science review, 2018

2018
[54]

Cheveralls, Manuel D

Hirofumi Kobayashi, Keith C. Cheveralls, Manuel D. Leonetti, and Loic A. Royer. Self-supervised deep learning encodes high-resolution features of protein subcellular localization.Nature Methods, 2022

2022
[55]

Hansen, Will Leineweber, Anthony Cesnik, Dan Lu, Ulrika Axelsson, Frederic Ballllosera Navarro, Theofanis Karaletsos, and Emma Lundberg

Ankit Gupta, Zoe Wefers, Konstantin Kahnert, Jan N. Hansen, Will Leineweber, Anthony Cesnik, Dan Lu, Ulrika Axelsson, Frederic Ballllosera Navarro, Theofanis Karaletsos, and Emma Lundberg. SubCell: Vision foundation models for microscopy capture single-cell biology.bioRxiv, 2024

2024
[56]

PRETI: Patient-aware retinal foundation model via metadata-guided representation learning

Yeonkyung Lee, Woojung Han, Youngjun Jun, Hyeonmin Kim, Jungkyung Cho, and Seong Jae Hwang. PRETI: Patient-aware retinal foundation model via metadata-guided representation learning. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 523–533, 2025

2025
[57]

Lobell, and Stefano Ermon

Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, and Stefano Ermon. SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery. In NeurIPS, 2022. 12

2022
[58]

Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell

Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell. Scale-MAE: A scale-aware masked autoencoder for multiscale geospatial representation learning. InICCV, 2023

2023
[59]

Contextual vision transformers for robust representation learning

Yujia Bao and Theofanis Karaletsos. Contextual vision transformers for robust representation learning. InICML Workshop on Spurious Correlations, Invariance and Stability (SCIS), 2023

2023
[60]

Multitask learning.Machine Learning, 1997

Rich Caruana. Multitask learning.Machine Learning, 1997

1997
[61]

Learning visual representations via language-guided sampling

Mohamed El Banani, Karan Desai, and Justin Johnson. Learning visual representations via language-guided sampling. InCVPR, 2023

2023
[62]

Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks

Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. InICML, 2018

2018
[63]

Towards impartial multi-task learning

Liyang Liu, Yi Li, Zhanghui Kuang, Jing-Hao Xue, Yimin Chen, Wenming Yang, Qingmin Liao, and Wayne Zhang. Towards impartial multi-task learning. InICLR, 2021

2021
[64]

Rotograd: Gradient homogenization in multitask learning

Adrián Javaloy and Isabel Valera. Rotograd: Gradient homogenization in multitask learning. InICLR, 2022

2022
[65]

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

Randall Balestriero and Yann LeCun. LeJEPA: Provable and scalable self-supervised learning without the heuristics.arXiv:2511.08544, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[66]

DINOv3

Oriane Siméoni, Huy V Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. Dinov3.arXiv:2508.10104, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[67]

A subcellular map of the human proteome.Science, 2017

Peter J Thul, Lovisa Åkesson, Mikaela Wiking, Diana Mahdessian, Aikaterini Geladaki, Hammou Ait Blal, Tove Alm, Anna Asplund, Lars Björk, Lisa M Breckels, et al. A subcellular map of the human proteome.Science, 2017

2017
[68]

The human protein atlas: A spatial map of the human proteome.Protein Science, 2018

Peter J Thul and Cecilia Lindskog. The human protein atlas: A spatial map of the human proteome.Protein Science, 2018

2018
[69]

Cho, Keith C

Nathan H. Cho, Keith C. Cheveralls, Andreas-David Brunner, Kibeom Kim, André C. Michaelis, Preethi Raghavan, Hirofumi Kobayashi, Laura Savy, Jason Y. Li, Hera Canaj, James Y. S. Kim, Edna M. Stewart, Christian Gnann, Frank McCarthy, Joana P. Cabrera, Rachel M. Brunetti, Bryant B. Chhun, Greg Dingle, Marco Y. Hein, Bo Huang, Shalin B. Mehta, Jonathan S. We...

2022
[70]

CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison

Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn Ball, Katie Shpanskaya, et al. CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. InAAAI, 2019

2019
[71]

FLAIR-HUB: Large-scale multimodal dataset for land cover and crop mapping.ISPRS Journal of Photogrammetry and Remote Sensing, 237:271–300, 2026

Anatol Garioud, Sébastien Giordano, Nicolas David, and Nicolas Gonthier. FLAIR-HUB: Large-scale multimodal dataset for land cover and crop mapping.ISPRS Journal of Photogrammetry and Remote Sensing, 237:271–300, 2026

2026
[72]

Context autoencoder for self-supervised representation learning.IJCV, 2024

Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, and Jingdong Wang. Context autoencoder for self-supervised representation learning.IJCV, 2024

2024
[73]

arXiv preprint arXiv:2405.01469 (2024)

Théo Moutakanni, Piotr Bojanowski, Guillaume Chassagnon, Céline Hudelot, Armand Joulin, Yann LeCun, Matthew Muckley, Maxime Oquab, Marie-Pierre Revel, and Maria Vakalopoulou. Advancing human-centric ai for robust x-ray analysis through holistic self-supervised learning.arXiv:2405.01469, 2024

work page arXiv 2024
[74]

Extending the wilds benchmark for unsupervised adaptation

Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, et al. Extending the wilds benchmark for unsupervised adaptation. InICLR, 2022

2022
[75]

Multi-task learning as multi-objective optimization

Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization. InNeurIPS, 2018

2018
[76]

FAMO: Fast adaptive multitask optimization

Bo Liu, Yihao Feng, Peter Stone, and Qiang Liu. FAMO: Fast adaptive multitask optimization. InNeurIPS, 2023

2023
[77]

Gradient surgery for multi-task learning

Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning. InNeurIPS, 2020

2020
[78]

Conflict-averse gradient descent for multi-task learning

Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, and Qiang Liu. Conflict-averse gradient descent for multi-task learning. InNeurIPS, 2021. 13

2021
[79]

Multi-task learning as a bargaining game

Aviv Navon, Aviv Shamsian, Idan Achituve, Haggai Maron, Kenji Kawaguchi, Gal Chechik, and Ethan Fetaya. Multi-task learning as a bargaining game. InICML, 2022

2022
[80]

Independent component alignment for multi-task learning

Dmitry Senushkin, Nikolay Patakin, Arseny Kuznetsov, and Anton Konushin. Independent component alignment for multi-task learning. InCVPR, 2023

2023

Showing first 80 references.

[1] [1]

Dinov2: Learning robust visual features without supervision.TMLR, 2024

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Lab...

2024

[2] [2]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. InICML, 2021

2021

[3] [3]

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al. SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features.arXiv:2502.14786, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[4] [4]

Lungren, Tristan Naumann, Sheng Wang, and Hoifung Poon

Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Andrea Tupini, Yu Wang, Matt Mazzola, Swadheen Shukla, Lars Liden, Jianfeng Gao, Angela Crabtree, Brian Piening, Carlo Bifulco, Matthew P. Lungren, Tristan Naumann, Sheng Wang, and Hoifung Poon. A multimodal biomedical...

2025

[5] [5]

Out of distribution generalization via interventional style transfer in single-cell microscopy

Wolfgang M Pernice, Michael Doron, Alex Quach, Aditya Pratapa, Sultan Kenjeyev, Nicholas De Veaux, Michio Hirano, and Juan C Caicedo. Out of distribution generalization via interventional style transfer in single-cell microscopy. InCVPR Workshop on Computer Vision for Microscopy Image Analysis (CVMI), 2023

2023

[6] [6]

Geography-aware self-supervised learning

Kumar Ayush, Burak Uzkent, Chenlin Meng, Kumar Tanmay, Marshall Burke, David Lobell, and Stefano Ermon. Geography-aware self-supervised learning. InICCV, 2021

2021

[7] [7]

Unbiased look at dataset bias

Antonio Torralba and Alexei A Efros. Unbiased look at dataset bias. InCVPR. IEEE, 2011

2011

[8] [8]

Generalizing to unseen domains: A survey on domain generalization.IEEE transactions on knowledge and data engineering, 2022

Jindong Wang, Cuiling Lan, Chang Liu, Yidong Ouyang, Tao Qin, Wang Lu, Yiqiang Chen, Wenjun Zeng, and Philip S Yu. Generalizing to unseen domains: A survey on domain generalization.IEEE transactions on knowledge and data engineering, 2022

2022

[9] [9]

CoDEx: Combining domain expertise for spatial generalization in satellite image analysis

Abhishek Kuriyal, Elliot Vincent, Mathieu Aubry, and Loic Landrieu. CoDEx: Combining domain expertise for spatial generalization in satellite image analysis. InCVPR workshop EarthVision, 2025

2025

[10] [10]

Vargas Hakim, David Osowiechi, Fereshteh Shakeri, Ali Bahri, Moslem Yazdanpanah, Sahar Dastani, Ismail Ben Ayed, and Christian Desrosiers

Mehrdad Noori, Gustavo A. Vargas Hakim, David Osowiechi, Fereshteh Shakeri, Ali Bahri, Moslem Yazdanpanah, Sahar Dastani, Ismail Ben Ayed, and Christian Desrosiers. Histopath-C: Towards realistic domain shifts for histopathology vision-language adaptation. InWACV, 2026

2026

[11] [11]

Measuring domain shift for deep learning in histopathology.IEEE Journal of Biomedical and Health Informatics, 25(2):325–336, 2021

Karin Stacke, Gabriel Eilertsen, Jonas Unger, and Claes Lundström. Measuring domain shift for deep learning in histopathology.IEEE Journal of Biomedical and Health Informatics, 25(2):325–336, 2021

2021

[12] [12]

Parameter-efficient fine-tuning of DINOv2 vision transformers for lung nodule classification

Benjamin P Veasey and Amir A Amini. Parameter-efficient fine-tuning of DINOv2 vision transformers for lung nodule classification. InInternational Symposium on Biomedical Imaging (ISBI). IEEE, 2024

2024

[13] [13]

How to train your ViT? data, augmentation, and regularization in vision transformers.TMLR, 2022

Andreas Peter Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, and Lucas Beyer. How to train your ViT? data, augmentation, and regularization in vision transformers.TMLR, 2022

2022

[14] [14]

Fixing the train-test resolution discrepancy

Hugo Touvron, Andrea Vedaldi, Matthijs Douze, and Hervé Jégou. Fixing the train-test resolution discrepancy. InNeurIPS, 2019

2019

[15] [15]

Knowledge-guided adaptation of pathology foundation models effectively improves cross-domain generalization and demographic fairness.Nature Communications, 2025

Yanyan Huang, Weiqin Zhao, Zhengyu Zhang, Yihang Chen, Yu Fu, Feng Wu, Yuming Jiang, Li Liang, Shujun Wang, and Lequan Yu. Knowledge-guided adaptation of pathology foundation models effectively improves cross-domain generalization and demographic fairness.Nature Communications, 2025

2025

[16] [16]

Deep learning is combined with massive-scale citizen science to improve large-scale image classification.Nature biotechnology, 36(9):820–828, 2018

Devin P Sullivan, Casper F Winsnes, Lovisa Åkesson, Martin Hjelmare, Mikaela Wiking, Rutger Schutten, Linzi Campbell, Hjalti Leifsson, Scott Rhodes, Andie Nordgren, et al. Deep learning is combined with massive-scale citizen science to improve large-scale image classification.Nature biotechnology, 36(9):820–828, 2018

2018

[17] [17]

Earnshaw, Imran S

Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang. Wilds...

2021

[18] [18]

Unsupervised domain adaptation by backpropagation

Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. InICML, 2015

2015

[19] [19]

Learning transferable features with deep adaptation networks

Mingsheng Long, Yue Cao, Jianmin Wang, and Michael Jordan. Learning transferable features with deep adaptation networks. InICML. PMLR, 2015

2015

[20] [20]

Pseudo-labelingandconfirmation bias in deep semi-supervised learning

EricArazo, DiegoOrtego, PaulAlbert, NoelEO’Connor, andKevinMcGuinness. Pseudo-labelingandconfirmation bias in deep semi-supervised learning. InIJCNN, pages 1–8. IEEE, 2020

2020

[21] [21]

The risks of invariant risk minimization

Elan Rosenfeld, Pradeep Kumar Ravikumar, and Andrej Risteski. The risks of invariant risk minimization. In ICLR, 2021

2021

[22] [22]

Metadata- guided consistency learning for high content images

Johan Fredin Haslum, Christos Matsoukas, Karl-Johan Leuchowius, Erik Müllers, and Kevin Smith. Metadata- guided consistency learning for high content images. InMedical Imaging with Deep Learning, pages 918–936. PMLR, 2024

2024

[23] [23]

Learning representations of satellite images from metadata supervision

Jules Bourcier, Gohar Dashyan, Karteek Alahari, and Jocelyn Chanussot. Learning representations of satellite images from metadata supervision. InECCV. Springer, 2024

2024

[24] [24]

Metadata-enhanced contrastive learning from retinal optical coherence tomography images.Medical Image Analysis, 2024

Robbie Holland, Oliver Leingang, Hrvoje Bogunović, Sophie Riedl, Lars Fritsche, Toby Prevost, Hendrik P N Scholl, Ursula Schmidt-Erfurth, Sobha Sivaprasad, Andrew J Lotery, Daniel Rueckert, and Martin J Menten. Metadata-enhanced contrastive learning from retinal optical coherence tomography images.Medical Image Analysis, 2024

2024

[25] [25]

Hierarchical metadata information constrained self-supervised learning for anomalous sound detection under domain shift

Haiyan Lan, Qiaoxi Zhu, Jian Guan, Yuming Wei, and Wenwu Wang. Hierarchical metadata information constrained self-supervised learning for anomalous sound detection under domain shift. InICASSP, 2024

2024

[26] [26]

Analysis of the Human Protein Atlas Image Classification competition.Nature Methods, 16(12):1254–1261, 2019

Wei Ouyang, Casper F Winsnes, Martin Hjelmare, Anthony J Cesnik, Lovisa Åkesson, Hao Xu, Devin P Sullivan, Shubin Dai, Jun Lan, Park Jinmo, Shaikat M Galib, Christof Henkel, Kevin Hwang, Dmytro Poplavskiy, Bojan Tunguz, Russell D Wolfinger, Yinzheng Gu, Chuanpeng Li, Jinbin Xie, Dmitry Buslov, Sergei Fironov, Alexander Kiselev, Dmytro Panchenko, Xuan Cao,...

2019

[27] [27]

Functional map of the world

Gordon Christie, Neil Fendley, James Wilson, and Ryan Mukherjee. Functional map of the world. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

2018

[28] [28]

The iwildcam 2020 competition dataset

Sara Beery, Elijah Cole, and Arvi Gjoka. The iwildcam 2020 competition dataset. InCVPR Fine-Grained Visual Categorization Workshop (FGVC), 2020

2020

[29] [29]

MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports.Scientific data, 2019

Alistair EW Johnson, Tom J Pollard, Seth J Berkowitz, Nathaniel R Greenbaum, Matthew P Lungren, Chih-ying Deng, Roger G Mark, and Steven Horng. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports.Scientific data, 2019

2019

[30] [30]

Correlation alignment for unsupervised domain adaptation

Baochen Sun, Jiashi Feng, and Kate Saenko. Correlation alignment for unsupervised domain adaptation. In Domain adaptation in computer vision applications, pages 153–171. Springer, 2017

2017

[31] [31]

Towards domain-invariant self-supervised learning with batch styles standardization

Marin Scalbert, Maria Vakalopoulou, and Florent Couzinié-Devy. Towards domain-invariant self-supervised learning with batch styles standardization. InICLR, 2024

2024

[32] [32]

Cell-dino: Self-supervised image-based embeddings for cell fluorescent microscopy.PLOS Computational Biology, 21(12):e1013828, 2025

Théo Moutakanni, Camille Couprie, Seungeun Yi, Michael Doron, Zitong S Chen, Nikita Moshkov, Elouan Gardes, Mathilde Caron, Hugo Touvron, Armand Joulin, Piotr Bojanowski, Wolfgang M Pernice, and Juan C Caicedo. Cell-dino: Self-supervised image-based embeddings for cell fluorescent microscopy.PLOS Computational Biology, 21(12):e1013828, 2025

2025

[33] [33]

Parameter efficient self-supervised geospatial domain adaptation

Linus Scheibenreif, Michael Mommert, and Damian Borth. Parameter efficient self-supervised geospatial domain adaptation. InCVPR, 2024

2024

[34] [34]

A simple framework for contrastive learning of visual representations

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InICML, 2020

2020

[35] [35]

Momentum contrast for unsupervised visual representation learning

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. InCVPR, 2020

2020

[36] [36]

Emerging properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. InICCV, 2021

2021

[37] [37]

Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, 11 Rémi Munos, and Michal Valko

Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, 11 Rémi Munos, and Michal Valko. Bootstrap your own latent: A new approach to self-supervised learning. In NeurIPS, 2020

2020

[38] [38]

Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. InCVPR, 2022

2022

[39] [39]

Self-supervised learning from images with a joint-embedding predictive architecture

Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. InCVPR, 2023

2023

[40] [40]

Unsupervised learning of visual features by contrasting cluster assignments

Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learning of visual features by contrasting cluster assignments. InNeurIPS, 2020

2020

[41] [41]

Junnan Li, Pan Zhou, Caiming Xiong, and Steven C.H. Hoi. Prototypical contrastive learning of unsupervised representations. InICLR, 2021

2021

[42] [42]

Supervised contrastive learning

Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. InNeurIPS, 2020

2020

[43] [43]

Rayan Krishnan, Pranav Rajpurkar, and Eric J. Topol. Self-supervised learning in medicine and healthcare. Nature Biomedical Engineering, 2022

2022

[44] [44]

Chen, Nikita Moshkov, Mathilde Caron, Hugo Touvron, Piotr Bojanowski, Wolfgang M

Michael Doron, Théo Moutakanni, Zitong S. Chen, Nikita Moshkov, Mathilde Caron, Hugo Touvron, Piotr Bojanowski, Wolfgang M. Pernice, and Juan C. Caicedo. Unbiased single-cell morphology with self-supervised vision transformers.bioRxiv, 2023

2023

[45] [45]

Self-supervised learning in remote sensing: A review.IEEE Geoscience and Remote Sensing Magazine, 2022

Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Lichao Mou, and Xiao Xiang Zhu. Self-supervised learning in remote sensing: A review.IEEE Geoscience and Remote Sensing Magazine, 2022

2022

[46] [46]

AutoFT: Learning an objective for robust fine-tuning.arXiv:2401.10220, 2024

Caroline Choi, Yoonho Lee, Annie Chen, Allan Zhou, Aditi Raghunathan, and Chelsea Finn. AutoFT: Learning an objective for robust fine-tuning.arXiv:2401.10220, 2024

work page arXiv 2024

[47] [47]

Connect later: Improving fine-tuning for robustness with targeted augmentations

Helen Qu and Sang Michael Xie. Connect later: Improving fine-tuning for robustness with targeted augmentations. InICML, 2024

2024

[48] [48]

Tackling the widespread and critical impact of batch effects in high-throughput data.Nature Reviews Genetics, 2010

Jeffrey T Leek, Robert B Scharpf, Héctor Corrada Bravo, David Simcha, Ben Langmead, W Evan Johnson, Donald Geman, Keith Baggerly, and Rafael A Irizarry. Tackling the widespread and critical impact of batch effects in high-throughput data.Nature Reviews Genetics, 2010

2010

[49] [49]

Contrastive learning for fair representations.arXiv:2109.10645, 2021

Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, and Lea Frermann. Contrastive learning for fair representations.arXiv:2109.10645, 2021

work page arXiv 2021

[50] [50]

Compositional risk minimization

Divyat Mahajan, Mohammad Pezeshki, Charles Arnal, Ioannis Mitliagkas, Kartik Ahuja, and Pascal Vincent. Compositional risk minimization. InICML, 2025

2025

[51] [51]

Korsunsky, N

I. Korsunsky, N. Millard, J. Fan, K. Slowikowski, F. Zhang, K. Wei, Y. Baglaenko, M. Brenner, P. R. Loh, and S. Raychaudhuri. Fast, sensitive and accurate integration of single-cell data with harmony.Nature Methods, 2019

2019

[52] [52]

J. B. Kang, A. Nathan, K. Weinand, F. Zhang, N. Millard, L. Rumker, D. B. Moody, I. Korsunsky, and S. Raychaudhuri. Efficient and precise single-cell reference atlas mapping with symphony.Nature Communications, 2021

2021

[53] [53]

A brief introduction to weakly supervised learning.National science review, 2018

Zhi-Hua Zhou. A brief introduction to weakly supervised learning.National science review, 2018

2018

[54] [54]

Cheveralls, Manuel D

Hirofumi Kobayashi, Keith C. Cheveralls, Manuel D. Leonetti, and Loic A. Royer. Self-supervised deep learning encodes high-resolution features of protein subcellular localization.Nature Methods, 2022

2022

[55] [55]

Hansen, Will Leineweber, Anthony Cesnik, Dan Lu, Ulrika Axelsson, Frederic Ballllosera Navarro, Theofanis Karaletsos, and Emma Lundberg

Ankit Gupta, Zoe Wefers, Konstantin Kahnert, Jan N. Hansen, Will Leineweber, Anthony Cesnik, Dan Lu, Ulrika Axelsson, Frederic Ballllosera Navarro, Theofanis Karaletsos, and Emma Lundberg. SubCell: Vision foundation models for microscopy capture single-cell biology.bioRxiv, 2024

2024

[56] [56]

PRETI: Patient-aware retinal foundation model via metadata-guided representation learning

Yeonkyung Lee, Woojung Han, Youngjun Jun, Hyeonmin Kim, Jungkyung Cho, and Seong Jae Hwang. PRETI: Patient-aware retinal foundation model via metadata-guided representation learning. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 523–533, 2025

2025

[57] [57]

Lobell, and Stefano Ermon

Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, and Stefano Ermon. SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery. In NeurIPS, 2022. 12

2022

[58] [58]

Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell

Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell. Scale-MAE: A scale-aware masked autoencoder for multiscale geospatial representation learning. InICCV, 2023

2023

[59] [59]

Contextual vision transformers for robust representation learning

Yujia Bao and Theofanis Karaletsos. Contextual vision transformers for robust representation learning. InICML Workshop on Spurious Correlations, Invariance and Stability (SCIS), 2023

2023

[60] [60]

Multitask learning.Machine Learning, 1997

Rich Caruana. Multitask learning.Machine Learning, 1997

1997

[61] [61]

Learning visual representations via language-guided sampling

Mohamed El Banani, Karan Desai, and Justin Johnson. Learning visual representations via language-guided sampling. InCVPR, 2023

2023

[62] [62]

Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks

Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. InICML, 2018

2018

[63] [63]

Towards impartial multi-task learning

Liyang Liu, Yi Li, Zhanghui Kuang, Jing-Hao Xue, Yimin Chen, Wenming Yang, Qingmin Liao, and Wayne Zhang. Towards impartial multi-task learning. InICLR, 2021

2021

[64] [64]

Rotograd: Gradient homogenization in multitask learning

Adrián Javaloy and Isabel Valera. Rotograd: Gradient homogenization in multitask learning. InICLR, 2022

2022

[65] [65]

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

Randall Balestriero and Yann LeCun. LeJEPA: Provable and scalable self-supervised learning without the heuristics.arXiv:2511.08544, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[66] [66]

DINOv3

Oriane Siméoni, Huy V Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. Dinov3.arXiv:2508.10104, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[67] [67]

A subcellular map of the human proteome.Science, 2017

Peter J Thul, Lovisa Åkesson, Mikaela Wiking, Diana Mahdessian, Aikaterini Geladaki, Hammou Ait Blal, Tove Alm, Anna Asplund, Lars Björk, Lisa M Breckels, et al. A subcellular map of the human proteome.Science, 2017

2017

[68] [68]

The human protein atlas: A spatial map of the human proteome.Protein Science, 2018

Peter J Thul and Cecilia Lindskog. The human protein atlas: A spatial map of the human proteome.Protein Science, 2018

2018

[69] [69]

Cho, Keith C

Nathan H. Cho, Keith C. Cheveralls, Andreas-David Brunner, Kibeom Kim, André C. Michaelis, Preethi Raghavan, Hirofumi Kobayashi, Laura Savy, Jason Y. Li, Hera Canaj, James Y. S. Kim, Edna M. Stewart, Christian Gnann, Frank McCarthy, Joana P. Cabrera, Rachel M. Brunetti, Bryant B. Chhun, Greg Dingle, Marco Y. Hein, Bo Huang, Shalin B. Mehta, Jonathan S. We...

2022

[70] [70]

CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison

Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn Ball, Katie Shpanskaya, et al. CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. InAAAI, 2019

2019

[71] [71]

FLAIR-HUB: Large-scale multimodal dataset for land cover and crop mapping.ISPRS Journal of Photogrammetry and Remote Sensing, 237:271–300, 2026

Anatol Garioud, Sébastien Giordano, Nicolas David, and Nicolas Gonthier. FLAIR-HUB: Large-scale multimodal dataset for land cover and crop mapping.ISPRS Journal of Photogrammetry and Remote Sensing, 237:271–300, 2026

2026

[72] [72]

Context autoencoder for self-supervised representation learning.IJCV, 2024

Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, and Jingdong Wang. Context autoencoder for self-supervised representation learning.IJCV, 2024

2024

[73] [73]

arXiv preprint arXiv:2405.01469 (2024)

Théo Moutakanni, Piotr Bojanowski, Guillaume Chassagnon, Céline Hudelot, Armand Joulin, Yann LeCun, Matthew Muckley, Maxime Oquab, Marie-Pierre Revel, and Maria Vakalopoulou. Advancing human-centric ai for robust x-ray analysis through holistic self-supervised learning.arXiv:2405.01469, 2024

work page arXiv 2024

[74] [74]

Extending the wilds benchmark for unsupervised adaptation

Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, et al. Extending the wilds benchmark for unsupervised adaptation. InICLR, 2022

2022

[75] [75]

Multi-task learning as multi-objective optimization

Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization. InNeurIPS, 2018

2018

[76] [76]

FAMO: Fast adaptive multitask optimization

Bo Liu, Yihao Feng, Peter Stone, and Qiang Liu. FAMO: Fast adaptive multitask optimization. InNeurIPS, 2023

2023

[77] [77]

Gradient surgery for multi-task learning

Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning. InNeurIPS, 2020

2020

[78] [78]

Conflict-averse gradient descent for multi-task learning

Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, and Qiang Liu. Conflict-averse gradient descent for multi-task learning. InNeurIPS, 2021. 13

2021

[79] [79]

Multi-task learning as a bargaining game

Aviv Navon, Aviv Shamsian, Idan Achituve, Haggai Maron, Kenji Kawaguchi, Gal Chechik, and Ethan Fetaya. Multi-task learning as a bargaining game. InICML, 2022

2022

[80] [80]

Independent component alignment for multi-task learning

Dmitry Senushkin, Nikolay Patakin, Arseny Kuznetsov, and Anton Konushin. Independent component alignment for multi-task learning. InCVPR, 2023

2023