Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have
Pith reviewed 2026-06-28 06:41 UTC · model grok-4.3
The pith
Vision foundation models can be adapted to scientific domains using only the metadata already attached to images, without task labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FINO combines a standard self-supervised objective with flexible metadata guidance that handles both highly granular discrete metadata and continuous metadata. It encourages the representation to preserve informative factors while suppressing spurious ones. Across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO consistently outperforms standard unsupervised domain adaptation and fully supervised adaptation. It also exceeds highly-specialized domain-specific state of the art, while using no task labels for backbone adaptation and only lightweight probes for supervision.
What carries the argument
FINO, the self-supervised adaptation method that adds flexible metadata guidance to a standard self-supervised objective to separate informative factors from spurious ones in the learned representation.
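The abstract does not spell out the objective, so the following Python sketch is only one plausible way such guidance could be wired together, not FINO's actual formulation. It assumes, purely for illustration, that informative metadata is predicted from the representation (cross-entropy for discrete fields, squared error for continuous ones) and that spurious metadata enters the encoder's objective with a negative weight. The names `cross_entropy`, `squared_error`, and `guided_loss`, the weights, and all numbers are hypothetical.

```python
import math

def cross_entropy(logits, target_index):
    """Cross-entropy of one sample for a discrete metadata field."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target_index]

def squared_error(prediction, target):
    """Squared error of one sample for a continuous metadata field."""
    return (prediction - target) ** 2

def guided_loss(ssl_loss, informative_losses, spurious_losses,
                w_informative=1.0, w_spurious=1.0):
    """Hypothetical encoder objective (an assumption, not the paper's):
    SSL term, plus losses for metadata the representation should keep,
    minus losses for metadata the representation should not encode."""
    return (ssl_loss
            + w_informative * sum(informative_losses)
            - w_spurious * sum(spurious_losses))

# Toy values, all invented for illustration.
ssl = 0.80
informative = [squared_error(0.4, 0.5)]         # a continuous field
spurious = [cross_entropy([2.0, 0.0, 0.0], 0)]  # a discrete field
total = guided_loss(ssl, informative, spurious,
                    w_informative=1.0, w_spurious=0.1)
```

The sign flip on the spurious term is a common adversarial-style device in domain adaptation; whether FINO uses it, or something else entirely, has to be checked against the paper.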
If this is right
- Adaptation succeeds without any task labels for the backbone across multiple tested domains.
- Performance exceeds both unsupervised domain adaptation and fully supervised adaptation.
- Results surpass highly specialized, domain-specific state-of-the-art methods.
- Only lightweight probes are needed for any remaining supervision.
Where Pith is reading between the lines
- Prioritizing richer metadata collection during scientific data acquisition could amplify the benefits of this style of adaptation.
- The same metadata-guided principle might extend to non-vision foundation models if analogous side information is available.
- Domains where metadata is noisy or weakly related to the task may require extra robustness steps not tested here.
Load-bearing premise
The metadata already present with the images supplies reliable signals for separating informative factors from spurious ones across the tested scientific domains.
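One rough way to probe this premise on a small labeled subset is to measure how strongly a discrete metadata field is associated with the task labels. The sketch below computes the empirical mutual information between two discrete sequences. It is an illustrative diagnostic that is not taken from the paper, and the function name `mutual_information` and the toy inputs are assumptions.

```python
import math
from collections import Counter

def mutual_information(metadata, labels):
    """Empirical mutual information (in nats) between two discrete sequences."""
    if len(metadata) != len(labels):
        raise ValueError("sequences must have equal length")
    n = len(labels)
    joint = Counter(zip(metadata, labels))
    count_m = Counter(metadata)
    count_y = Counter(labels)
    mi = 0.0
    for (m, y), c in joint.items():
        # p(m, y) * log( p(m, y) / (p(m) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_m[m] * count_y[y]))
    return mi

# Toy inputs, invented for illustration.
aligned = mutual_information(["a", "a", "b", "b"], [0, 0, 1, 1])
unrelated = mutual_information(["a", "b", "a", "b"], [0, 0, 1, 1])
```

With highly granular metadata and few samples, this plug-in estimate is biased upward, so a value near zero is more informative than a moderately large one.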
What would settle it
Applying FINO to a new domain where the available metadata has no correlation with task-relevant variation and finding that performance is no better than a metadata-free self-supervised baseline.
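Deciding whether performance is "no better" than a baseline needs some measure of uncertainty. As a minimal sketch, assuming probe scores are available for both models on the same seeds or splits, a paired bootstrap gives a confidence interval for the mean score difference. The function name `paired_bootstrap_ci`, its defaults, and the scores are hypothetical and do not come from the paper.

```python
import random
import statistics

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000,
                        alpha=0.05, seed=0):
    """Return (mean difference, lower bound, upper bound) for a - b,
    using a percentile bootstrap over the paired differences."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty sequences of equal length")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(diffs) for _ in diffs]
        means.append(statistics.fmean(resample))
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lower, upper

# Invented scores: metadata-guided model vs. metadata-free baseline.
guided = [0.71, 0.69, 0.73, 0.70, 0.72]
baseline = [0.70, 0.70, 0.71, 0.69, 0.72]
mean_diff, lower, upper = paired_bootstrap_ci(guided, baseline)
```

An interval that comfortably includes zero would be consistent with the falsifying outcome described above; an interval clearly above zero would not.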
Original abstract
We propose a label-free approach to adapt powerful but generic vision foundation models to specialized scientific domains. Standard supervised fine-tuning is often ill-suited to these settings: labels are scarce, and task-specific training can collapse the model's generality and hurt robustness. We instead leverage metadata to adapt representations to new domains in a self-supervised manner. Our method, FINO, combines a standard self-supervised objective with flexible metadata guidance that handles both highly granular discrete metadata and continuous metadata. It encourages the representation to preserve informative factors while suppressing spurious ones. Across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO consistently outperforms standard unsupervised domain adaptation and fully supervised adaptation. It also exceeds highly-specialized domain-specific state of the art, while using no task labels for backbone adaptation and only lightweight probes for supervision.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes FINO, a label-free method to adapt vision foundation models to specialized scientific domains by combining a standard self-supervised objective with flexible metadata guidance (handling both granular discrete and continuous metadata). The approach encourages representations to preserve informative factors while suppressing spurious ones. The paper claims consistent outperformance over standard unsupervised domain adaptation, fully supervised adaptation, and highly specialized, domain-specific state-of-the-art methods across four domains (subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging), using no task labels for backbone adaptation and only lightweight probes for supervision.
Significance. If the empirical claims hold under rigorous validation, the work would be significant for enabling practical adaptation of foundation models in label-scarce scientific domains by exploiting readily available metadata. This could improve robustness and generality compared to standard fine-tuning or unsupervised domain adaptation (UDA), with the multi-domain evaluation providing a broad test of the approach.
major comments (1)
- [Abstract] The claim of consistent outperformance across four domains is stated without experimental details, baseline definitions, or statistical tests, so the data-to-claim link cannot be verified from the provided text.
Simulated Author's Rebuttal
We thank the referee for their review. We address the single major comment point by point below.
- Referee: [Abstract] The claim of consistent outperformance across four domains is stated without experimental details, baseline definitions, or statistical tests, so the data-to-claim link cannot be verified from the provided text.
  Authors: We agree that the abstract, as a concise summary, does not include experimental details, baseline definitions, or statistical tests. This is standard due to length constraints. The manuscript provides these in full in Section 4 (Experiments), including the four domains, comparisons against unsupervised domain adaptation, fully supervised adaptation, and domain-specific SOTA methods, with results in Tables 1-4 and statistical analysis where reported. Readers can verify the claims from the main text.
  Revision: no
Circularity Check
No significant circularity
The paper introduces FINO as a metadata-guided self-supervised adaptation method for vision foundation models. No equations, derivations, or parameter-fitting steps are described that reduce any claimed prediction or result to a quantity defined by the method itself. The approach combines standard self-supervised objectives with metadata handling, and performance claims are presented as empirical comparisons across domains rather than algebraic identities or self-referential fits. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text. The derivation chain is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.