MorphoHELM: A Comprehensive Benchmark for Evaluating Representations for Microscopy-Based Morphology Assays

arxiv: 2605.15383 · v1 · pith:WDFD5QSBnew · submitted 2026-05-14 · 💻 cs.CV

MorphoHELM: A Comprehensive Benchmark for Evaluating Representations for Microscopy-Based Morphology Assays

Emre Hayir , Lorin Crawford , Alex X. Lu This is my paper

Pith reviewed 2026-05-19 15:42 UTC · model grok-4.3

classification 💻 cs.CV

keywords Cell Paintingmorphological profilingrepresentation learningbatch effectsmicroscopyfeature extractionbenchmarkcomputer vision

0 comments p. Extension

pith:WDFD5QSB Add to your LaTeX paper

What is a Pith Number?

\usepackage{pith}
\pithnumber{WDFD5QSB}

Prints a linked pith:WDFD5QSB badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

The pith

No existing model outperforms classic computer vision analytic strategies across all settings for microscopy morphology representations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MorphoHELM as a unified benchmark that standardizes evaluation of feature extraction methods on Cell Painting microscopy images used to study cell responses to perturbations. It tests methods at multiple simulated levels of batch effects to measure how well each detects biological signals as technical noise rises. Results reveal clear trade-offs, with some approaches strong on particular signals but weaker on others. Overall, classic computer vision strategies emerge as the most reliable for general use cases rather than any deep learning model dominating every condition.

Core claim

MorphoHELM consolidates tasks and metrics for Cell Painting assays, extends them for robustness, and evaluates the widest range of methods to date while quantifying performance degradation under increasing batch effects; this shows that no model outperforms classic computer vision analytic strategies across all settings, which remain the strongest general use-case representations.

What carries the argument

MorphoHELM benchmark, which applies each representation task at controlled degrees of batch effects to isolate how well methods extract biological signal amid technical noise.

If this is right

Researchers gain a standardized way to compare new methods against established baselines under realistic noise conditions.
Method selection should depend on the specific biological signal of interest because of observed performance trade-offs.
Classic computer vision approaches serve as the default strong choice for general morphological profiling applications.
Public datasets and code enable consistent future benchmarking and correction of prior fragmented evaluations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The benchmark could be applied to real multi-lab batch variation data to test whether simulated noise levels match actual experimental variability.
Hybrid methods that combine classic analytic features with deep learning components might close the gap in specific high-noise regimes.
Similar evaluation structures could help standardize representation testing in other high-content biological imaging domains.

Load-bearing premise

The chosen tasks, metrics, and simulated batch effect levels fully and without bias capture the ability of representations to detect true biological signals.

What would settle it

A new model that outperforms classic computer vision methods on every task at every batch effect level within the MorphoHELM datasets would falsify the central finding.

Figures

Figures reproduced from arXiv: 2605.15383 by Alex X. Lu, Emre Hayir, Lorin Crawford.

**Figure 2.** Figure 2: Pairwise Jaccard similarity between least-restrictive MoA significant-compound discovery [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Gene Pathway Enrichment. performance advantage in the Not Same Batch setting). Conversely, ResNet is the worst performing model on our Geometric Mean OR metric (except for our untrained ResNet baseline), despite being relatively strong for MoA enrichment. This indicates that different models may have trade-offs across biological tasks. Intriguingly, models may exhibit generalization among tasks that doesn’… view at source ↗

**Figure 4.** Figure 4: kNN replicate retrieval. matching. For example, the expected random Recall@1 for cpg-target2 is 0.33, and the weakest models show about 10× better performance even in the most challenging Not in Same Layout setting, contrary to MoA enrichment, suggesting for replicate retrieval consistency, all methods detect some reproducible biological signature. CellProfiler features are superior to any representation l… view at source ↗

read the original abstract

Microscopy images contain rich information about how cells respond to perturbations, making them essential to applications like drug screening. To quantify images, researchers often use representation extraction methods, and recent years have seen a proliferation of deep learning methods. While measuring the quality of these representations is essential, evaluation remains fragmented, with each proposed model evaluated on different tasks and datasets, using custom pipelines and metrics, making it difficult to fairly compare models. Here, we introduce MorphoHELM, a comprehensive open benchmark for evaluating feature extraction methods for Cell Painting, the most widely-used morphological profiling assay. MorphoHELM consolidates evaluation standards in the field, extends and corrects them to be more robust, and evaluates on the widest range of methods to date. A defining feature of the benchmark is that each task is evaluated at different degrees of batch effects (or technical noise), directly quantifying how the ability of methods to detect biological signal degrades as noise increases. Together, these properties enable MorphoHELM to detect trade-offs between methods, and we demonstrate that models that excel at certain kinds of biological signal are weaker at others. We show that no existing model outperforms classic computer vision analytic strategies across all settings, which remain the strongest general use-case representations. All datasets, code, and evaluation tools are publicly available at https://github.com/microsoft/MorphoHELM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces MorphoHELM, a comprehensive open benchmark for evaluating feature extraction methods on Cell Painting microscopy images. It consolidates and extends existing evaluation standards, tests a wide range of methods (deep learning and classic computer vision) at multiple levels of simulated batch effects/technical noise, and concludes that no existing model outperforms classic analytic strategies across all settings, positioning the latter as the strongest general-use representations for detecting biological signals.

Significance. If the central findings hold, the work is significant as a standardization effort in morphological profiling for applications like drug screening. The public release of all datasets, code, and evaluation tools is a clear strength that supports reproducibility and community use. The benchmark's design for quantifying degradation under increasing noise levels usefully reveals method trade-offs, though its impact depends on the fidelity of the noise model to real assay artifacts.

major comments (1)

[Methods (batch effect simulation and task evaluation)] The batch-effect simulation procedure (described in the methods section on task evaluation under noise) is load-bearing for the headline claim that classic CV strategies remain strongest across settings. The paper does not provide direct validation (e.g., comparison of simulated vs. real multi-batch Cell Painting artifact distributions or spatial correlation statistics) that the chosen noise model (additive Gaussian, global shifts, plate effects) faithfully reproduces the relative difficulty of detecting true morphological perturbations; without this, analytic pipelines tuned to the simulation could appear artificially robust while learned representations are unfairly penalized.

minor comments (2)

[Results figures] Figure captions and axis labels for the degradation curves under increasing batch effects could be expanded to explicitly state the exact noise parameters and number of replicates per level.
[Evaluation protocol] The manuscript would benefit from an explicit table listing all evaluated methods, their training regimes (pretrained vs. fine-tuned), and the precise metrics (e.g., mAP, correlation) used for each task.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript introducing MorphoHELM. We address the major comment on the batch effect simulation below and outline planned revisions to strengthen the work.

read point-by-point responses

Referee: [Methods (batch effect simulation and task evaluation)] The batch-effect simulation procedure (described in the methods section on task evaluation under noise) is load-bearing for the headline claim that classic CV strategies remain strongest across settings. The paper does not provide direct validation (e.g., comparison of simulated vs. real multi-batch Cell Painting artifact distributions or spatial correlation statistics) that the chosen noise model (additive Gaussian, global shifts, plate effects) faithfully reproduces the relative difficulty of detecting true morphological perturbations; without this, analytic pipelines tuned to the simulation could appear artificially robust while learned representations are unfairly penalized.

Authors: We appreciate the referee's emphasis on the need for validation of the batch effect simulation, as it underpins our comparative analysis. The noise models employed—additive Gaussian noise, global shifts, and plate effects—were selected to emulate commonly observed technical artifacts in Cell Painting experiments, drawing from prior literature on batch correction and noise modeling in high-content screening. This controlled simulation enables us to isolate the impact of increasing technical noise on representation quality without confounding factors from real multi-batch data collection. We acknowledge that a direct empirical comparison to real artifact distributions, such as through spatial correlation statistics or distribution matching, is not included in the current manuscript. To address this, we will revise the methods and discussion sections to provide additional justification for the noise model choices with supporting references, explicitly discuss the assumptions and potential limitations of the simulation, and suggest directions for future validation using real multi-batch datasets. This will clarify the scope of our claims regarding method robustness. revision: partial

Circularity Check

0 steps flagged

No circularity detected in MorphoHELM benchmark

full rationale

The paper introduces MorphoHELM as an open benchmark consolidating evaluation standards for Cell Painting assays on public datasets. It evaluates representation methods across tasks at varying simulated batch effect levels and concludes that classic computer vision analytic strategies remain the strongest general-use representations based on comparative performance. No derivation step reduces to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations; the chain is grounded in external data, metrics, and consolidated standards, making the analysis self-contained against benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

As a benchmark paper the work rests on domain assumptions about what constitutes valid evaluation for morphology assays rather than new derivations; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Standard computer vision and machine learning evaluation practices for image representations apply directly to Cell Painting microscopy data.
The benchmark builds on existing tasks and metrics in the field without re-deriving them.

pith-pipeline@v0.9.0 · 5777 in / 1395 out tokens · 62219 ms · 2026-05-19T15:42:44.492555+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We show that no existing model outperforms classic computer vision analytic strategies across all settings, which remain the strongest general use-case representations.
IndisputableMonolith/Foundation/Atomicity.lean atomic_tick unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

A defining feature of the benchmark is that each task is evaluated at different degrees of batch effects (or technical noise)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

[1]

Cell painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes.Nature Protocols, 11(9):1757–1774, August 2016

Mark-Anthony Bray, Shantanu Singh, Han Han, Chadwick T Davis, Blake Borgeson, Cathy Hartland, Maria Kost-Alimova, Sigrun M Gustafsdottir, Christopher C Gibson, and Anne E Carpenter. Cell painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes.Nature Protocols, 11(9):1757–1774, August 2016. URL: http: //dx....

work page doi:10.1038/nprot.2016.105 2016
[2]

Carpenter

Srijit Seal, Maria-Anna Trapotsi, Ola Spjuth, Shantanu Singh, Jordi Carreras-Puigvert, Nigel Greene, Andreas Bender, and Anne E. Carpenter. Cell painting: a decade of discovery and innovation in cellular imaging.Nature Methods, 22(2):254–268, December 2024. URL: http://dx.doi.org/10.1038/s41592-024-02528-8, doi:10.1038/s41592-024-02528-8

work page doi:10.1038/s41592-024-02528-8 2024
[3]

Data-analysis strategies for image-based cell profiling

Juan C Caicedo, Sam Cooper, Florian Heigwer, Scott Warchal, Peng Qiu, Csaba Molnar, Aliaksei S Vasilevich, Joseph D Barry, Harmanjit Singh Bansal, Oren Kraus, Mathias Wawer, Lassi Paavolainen, Markus D Herrmann, Mohammad Rohban, Jane Hung, Holger Hennig, John Concannon, Ian Smith, Paul A Clemons, Shantanu Singh, Paul Rees, Peter Horvath, Roger G Linington...

work page doi:10.1038/nmeth.4397 2017
[4]

Machine learning and image-based profiling in drug discovery.Current Opinion in Systems Biology, 10:43–52, 2018

Christian Scheeder, Florian Heigwer, and Michael Boutros. Machine learning and image-based profiling in drug discovery.Current Opinion in Systems Biology, 10:43–52, 2018. URL: http://dx.doi.org/10.1016/j.coisb.2018.05.004, doi:10.1016/j.coisb.2018.05.004

work page doi:10.1016/j.coisb.2018.05.004 2018
[5]

Image-based profiling for drug discovery: due for a machine-learning upgrade?Nat

Srinivas Niranj Chandrasekaran, Hugo Ceulemans, Justin D Boyd, and Anne E Carpenter. Image-based profiling for drug discovery: due for a machine-learning upgrade?Nat. Rev. Drug Discov., 20(2):145–159, February 2021

work page 2021
[6]

CellProfiler: image analysis software for identifying and quantifying cell phenotypes.Genome Biol., 7(10):R100, October 2006

Anne E Carpenter, Thouis R Jones, Michael R Lamprecht, Colin Clarke, In Han Kang, Ola Friman, David A Guertin, Joo Han Chang, Robert A Lindquist, Jason Moffat, Polina Golland, and David M Sabatini. CellProfiler: image analysis software for identifying and quantifying cell phenotypes.Genome Biol., 7(10):R100, October 2006

work page 2006
[7]

Stirling, Madison J

David R. Stirling, Madison J. Swain-Bowden, Alice M. Lucas, Anne E. Carpenter, Beth A. Cimini, and Allen Goodman. Cellprofiler 4: improvements in speed, utility and usability. BMC Bioinformatics, 2021. URL: http://dx.doi.org/10.1186/s12859-021-04344-9 , doi:10.1186/s12859-021-04344-9

work page doi:10.1186/s12859-021-04344-9 2021
[8]

Evaluating batch correction methods for image-based cell profiling.Nat

John Arevalo, Ellen Su, Jessica D Ewald, Robert van Dijk, Anne E Carpenter, and Shantanu Singh. Evaluating batch correction methods for image-based cell profiling.Nat. Commun., 15(1):6516, August 2024

work page 2024
[9]

Morphological profiling for drug discovery in the era of deep learning.Briefings in Bioinformatics, 25(4):bbae284, 07 2024

Qiaosi Tang, Ranjala Ratnayake, Gustavo Seabra, Zhe Jiang, Ruogu Fang, Lina Cui, Yousong Ding, Tamer Kahveci, Jiang Bian, Chenglong Li, Hendrik Luesch, and Yanjun Li. Morphological profiling for drug discovery in the era of deep learning.Briefings in Bioinformatics, 25(4):bbae284, 07 2024. URL: https://doi.org/10.1093/bib/bbae284, arXiv:https://academic.o...

work page doi:10.1093/bib/bbae284 2024
[10]

Plummer, and Juan C Caicedo

Zitong Chen, Chau Pham, Siqi Wang, Michael Doron, Nikita Moshkov, Bryan A. Plummer, and Juan C Caicedo. CHAMMI: a benchmark for channel-adaptive models in microscopy imaging. InThirty-seventh Conference on Neural Information Processing Systems Datasets and Bench- marks Track. NeurIPS, 2023. URL:https://openreview.net/forum?id=Luc1bZLeMY

work page 2023
[11]

Rxrx1: a dataset for evaluating experimental batch correction methods

Maciej Sypetkowski, Morteza Rezanejad, Saber Saberian, Oren Kraus, John Urbanik, James Taylor, Ben Mabey, Mason Victors, Jason Yosinski, Alborz Rezazadeh Sereshkeh, Imran Haque, and Berton Earnshaw. Rxrx1: a dataset for evaluating experimental batch correction methods

work page
[12]

URL:https://arxiv.org/abs/2301.05768, arXiv:2301.05768

work page arXiv
[13]

Learning representations for image-based profiling of perturbations.Nat

Nikita Moshkov, Michael Bornholdt, Santiago Benoit, Matthew Smith, Claire McQuin, Allen Goodman, Rebecca A Senft, Yu Han, Mehrtash Babadi, Peter Horvath, Beth A Cimini, Anne E Carpenter, Shantanu Singh, and Juan C Caicedo. Learning representations for image-based profiling of perturbations.Nat. Commun., 15(1):1594, February 2024. 10

work page 2024
[14]

Lazar, Rahul Mohan, Conor Tillinghast, Tommaso Biancalani, Marta M

Safiye Celik, Jan-Christian Hütter, Sandra Melo Carlos, Nathan H. Lazar, Rahul Mohan, Conor Tillinghast, Tommaso Biancalani, Marta M. Fay, Berton A. Earnshaw, and Imran S. Haque. Building, benchmarking, and exploring perturbative maps of transcriptional and morphological data.PLOS Computational Biology, 20(10):1–24, 10 2024. URL: https://doi.org/10. 1371/...

work page doi:10.1371/journal.pcbi.1012463 2024
[15]

Caie, Rebecca E

Peter D. Caie, Rebecca E. Walls, Alexandra Ingleston-Orme, Sandeep Daya, Tom Houslay, Rob Eagle, Mark E. Roberts, and Neil O. Carragher. High-content phenotypic profiling of drug response signatures across distinct cancer cells.Molecular Cancer Therapeutics, 9(6):1913–1926, 06 2010. URL: https://doi.org/10.1158/1535-7163.MCT-09-1148 , arXiv:https://aacrjo...

work page doi:10.1158/1535-7163.mct-09-1148 1913
[16]

Michael Ando, Cory Y

D. Michael Ando, Cory Y . McLean, and Marc Berndl. Improving phe- notypic measurements in high-content imaging screens.bioRxiv, 2017. URL: https://www.biorxiv.org/content/early/2017/07/10/161422, arXiv:https://www.biorxiv.org/content/early/2017/07/10/161422.full.pdf, doi:10.1101/161422

work page doi:10.1101/161422 2017
[17]

WILDS: A Benchmark of in-the-Wild Distribution Shifts

Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton Earnshaw, Imran Haque, Sara M Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang. WILDS: A ...

work page 2021
[18]

Zero- cell corrections in random-effects meta-analyses.Research Synthesis Methods, 11(6):913–919, 2020

Frank Weber, Guido Knapp, Katja Ickstadt, Günther Kundt, and Änne Glass. Zero- cell corrections in random-effects meta-analyses.Research Synthesis Methods, 11(6):913–919, 2020. URL: https://onlinelibrary.wiley.com/doi/abs/10. 1002/jrsm.1460, arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/jrsm.1460, doi:https://doi.org/10.1002/jrsm.1460

work page doi:10.1002/jrsm.1460 2020
[19]

On estimating the relation between blood group and disease.Ann

B Woolf. On estimating the relation between blood group and disease.Ann. Hum. Genet., 19(4):251–253, June 1955

work page 1955
[20]

Integrating chemical structures as treatments improves representations of microscopy images for morphological profiling

Yemin Yu, Neil Tenenholtz, Lester Mackey, Ying Wei, David Alvarez-Melis, Ava P. Amini, and Alex X. Lu. Causal integration of chemical structures improves representations of microscopy images for morphological profiling. 2025. URL: https://arxiv.org/abs/2504.09544, arXiv:2504.09544

work page internal anchor Pith review Pith/arXiv arXiv 2025
[21]

Corum: the comprehensive resource of mam- malian protein complexes–2022.Nucleic Acids Research, 51(D1):D539–D545, 01 2023

George Tsitsiridis, Ralph Steinkamp, Madalina Giurgiu, Barbara Brauner, Gisela Fobo, Goar Frishman, Corinna Montrone, and Andreas Ruepp. Corum: the comprehensive resource of mam- malian protein complexes–2022.Nucleic Acids Research, 51(D1):D539–D545, 01 2023. URL: https://doi.org/10.1093/nar/gkac1015, arXiv:https://academic.oup.com/nar/article- pdf/51/D1/...

work page doi:10.1093/nar/gkac1015 2022
[22]

hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies

Kevin Drew, John B Wallingford, and Edward M Marcotte. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol. Syst. Biol., 17(5):e10016, May 2021

work page 2021
[23]

The string database in 2025: protein networks with directional- ity of regulation.Nucleic Acids Research, 53(D1):D730–D737, 01 2025

Damian Szklarczyk, Katerina Nastou, Mikaela Koutrouli, Rebecca Kirsch, Farrokh Mehryary, Radja Hachilif, Dewei Hu, Matteo E Peluso, Qingyao Huang, Tao Fang, Nadezhda T Doncheva, Sampo Pyysalo, Peer Bork, Lars J Jensen, and Christian von Mering. The string database in 2025: protein networks with directional- ity of regulation.Nucleic Acids Research, 53(D1)...

work page doi:10.1093/nar/gkae1113 2025
[24]

Signor 2.0, the signaling network open resource 2.0: 2019 update.Nucleic Acids Research, 48(D1):D504–D510, 01 2020

Luana Licata, Prisca Lo Surdo, Marta Iannuccelli, Alessandro Palma, Elisa Micarelli, Livia Perfetto, Daniele Peluso, Alberto Calderone, Luisa Castagnoli, and Gianni Ce- sareni. Signor 2.0, the signaling network open resource 2.0: 2019 update.Nucleic Acids Research, 48(D1):D504–D510, 01 2020. URL: https://doi.org/10.1093/nar/ 11 gkz949, arXiv:https://acade...

work page doi:10.1093/nar/ 2019
[25]

Reactome: a database of reactions, pathways and biological processes.Nucleic Acids Res., 39(Database issue):D691–7, January 2011

David Croft, Gavin O’Kelly, Guanming Wu, Robin Haw, Marc Gillespie, Lisa Matthews, Michael Caudy, Phani Garapati, Gopal Gopinath, Bijay Jassal, Steven Jupe, Irina Kalatskaya, Shahana Mahajan, Bruce May, Nelson Ndegwa, Esther Schmidt, Veronica Shamovsky, Christina Yung, Ewan Birney, Henning Hermjakob, Peter D’Eustachio, and Lincoln Stein. Reactome: a datab...

work page 2011
[26]

The drug repurposing hub: a next-generation drug library and information resource.Nat

Steven M Corsello, Joshua A Bittker, Zihan Liu, Joshua Gould, Patrick McCarren, Jodi E Hirschman, Stephen E Johnston, Anita Vrcic, Bang Wong, Mariya Khan, Jacob Asiedu, Rajiv Narayan, Christopher C Mader, Aravind Subramanian, and Todd R Golub. The drug repurposing hub: a next-generation drug library and information resource.Nat. Med., 23(4):405–408, April 2017

work page 2017
[27]

2016, in IEEE Conf

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume, 770–778. 2016. doi:10.1109/CVPR.2016.90

work page doi:10.1109/cvpr.2016.90 2016
[28]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L...

work page 2024
[29]

Masked autoencoders for microscopy are scalable learners of cellular biology

Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Le- ung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Dominique Beaini, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, and Berton Earnshaw. Masked autoencoders for microscopy are scalable learners of cellular biology. In Proceedings...

work page 2024
[30]

Hansen, Mohini K

Ankit Gupta, Zoe Wefers, Konstantin Kahnert, Jan N. Hansen, Mohini K. Misra, Will Leineweber, Anthony Cesnik, Dan Lu, Ulrika Axelsson, Frederic Ballllosera, Russ B. Altman, Theofanis Karaletsos, and Emma Lundberg. Subcell: proteome- aware vision foundation models for microscopy capture single-cell biology.bioRxiv,

work page
[31]

627299, arXiv:https://www.biorxiv.org/content/early/2025/10/30/2024.12.06.627299.full.pdf, doi:10.1101/2024.12.06.627299

URL: https://www.biorxiv.org/content/early/2025/10/30/2024.12.06. 627299, arXiv:https://www.biorxiv.org/content/early/2025/10/30/2024.12.06.627299.full.pdf, doi:10.1101/2024.12.06.627299

work page doi:10.1101/2024.12.06.627299 2025
[32]

CLOOME: contrastive learning unlocks bioimaging databases for queries with chemical struc- tures.Nat

Ana Sanchez-Fernandez, Elisabeth Rumetshofer, Sepp Hochreiter, and Günter Klambauer. CLOOME: contrastive learning unlocks bioimaging databases for queries with chemical struc- tures.Nat. Commun., 14(1):7339, November 2023

work page 2023
[33]

Michael Ando, John Arevalo, Melissa Bennion, Nicolas Boisseau, Adriana Borowa, Justin D

Srinivas Niranj Chandrasekaran, Jeanelle Ackerman, Eric Alix, D. Michael Ando, John Arevalo, Melissa Bennion, Nicolas Boisseau, Adriana Borowa, Justin D. Boyd, Laurent Brino, Patrick J. Byrne, Hugo Ceulemans, Carolyn Ch’ng, Beth A. Cimini, Djork-Arne Clevert, Nicole Deflaux, John G Doench, Thierry Dorval, Regis Doyonnas, Vincenza Dragone, Ola Engkvist, Pa...

work page
[34]

534023, arXiv:https://www.biorxiv.org/content/early/2023/03/27/2023.03.23.534023.full.pdf, doi:10.1101/2023.03.23.534023

URL: https://www.biorxiv.org/content/early/2023/03/27/2023.03.23. 534023, arXiv:https://www.biorxiv.org/content/early/2023/03/27/2023.03.23.534023.full.pdf, doi:10.1101/2023.03.23.534023

work page doi:10.1101/2023.03.23.534023 2023
[35]

A dataset of images and morphological profiles of 30 000 small-molecule treatments using the cell painting assay.GigaScience, 6(12):giw014, 12 2017

Mark-Anthony Bray, Sigrun M Gustafsdottir, Mohammad H Rohban, Shantanu Singh, Veb- jorn Ljosa, Katherine L Sokolnicki, Joshua A Bittker, Nicole E Bodycombe, Vlado Dan ˇcík, Thomas P Hasaka, Cindy S Hon, Melissa M Kemp, Kejie Li, Deepika Walpita, Mathias J Wawer, Todd R Golub, Stuart L Schreiber, Paul A Clemons, Alykhan F Shamji, and Anne E Carpenter. A da...

work page doi:10.1093/gigascience/giw014 2017
[36]

Not Same Batch (NSB): Candidates from the same experimental batch are excluded, testing robustness to day-to-day variation between batches

work page
[37]

Not Same Source (NSS): Candidates from the same institution are excluded, testing robust- ness to domain shifts such as microscopy hardware differences

work page
[38]

For the enrichment tasks (MoA and gene pathway enrichment), stringency is implemented by constructing separate consensus profiles from non-overlapping subsets of replicates

Not Same Layout (NSL): Candidates sharing the same plate position as the query are excluded, controlling for position-induced batch effects [5]. For the enrichment tasks (MoA and gene pathway enrichment), stringency is implemented by constructing separate consensus profiles from non-overlapping subsets of replicates. At the No Restriction level, all repli...

work page
[39]

This step both standardizes dimensionality and denoises the feature space

Platewise Center Scaling and PCA.All features are scaled to a standard range, then reduced via Principal Component Analysis applied across the entire dataset. This step both standardizes dimensionality and denoises the feature space. We use 64 components for the PCA analysis

work page
[40]

Robustize

Platewise Robust Standardization.A Median Absolute Deviation (MAD) "Robustize" transformation is applied per plate to align distributions across experimental runs while remaining robust to outliers. This pipeline is applied identically to all methods evaluated in the benchmark. 20 Appendix D: Supplementary Analyses D.1 Effect of Modified Haldane-Anscombe ...

work page

[1] [1]

Cell painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes.Nature Protocols, 11(9):1757–1774, August 2016

Mark-Anthony Bray, Shantanu Singh, Han Han, Chadwick T Davis, Blake Borgeson, Cathy Hartland, Maria Kost-Alimova, Sigrun M Gustafsdottir, Christopher C Gibson, and Anne E Carpenter. Cell painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes.Nature Protocols, 11(9):1757–1774, August 2016. URL: http: //dx....

work page doi:10.1038/nprot.2016.105 2016

[2] [2]

Carpenter

Srijit Seal, Maria-Anna Trapotsi, Ola Spjuth, Shantanu Singh, Jordi Carreras-Puigvert, Nigel Greene, Andreas Bender, and Anne E. Carpenter. Cell painting: a decade of discovery and innovation in cellular imaging.Nature Methods, 22(2):254–268, December 2024. URL: http://dx.doi.org/10.1038/s41592-024-02528-8, doi:10.1038/s41592-024-02528-8

work page doi:10.1038/s41592-024-02528-8 2024

[3] [3]

Data-analysis strategies for image-based cell profiling

Juan C Caicedo, Sam Cooper, Florian Heigwer, Scott Warchal, Peng Qiu, Csaba Molnar, Aliaksei S Vasilevich, Joseph D Barry, Harmanjit Singh Bansal, Oren Kraus, Mathias Wawer, Lassi Paavolainen, Markus D Herrmann, Mohammad Rohban, Jane Hung, Holger Hennig, John Concannon, Ian Smith, Paul A Clemons, Shantanu Singh, Paul Rees, Peter Horvath, Roger G Linington...

work page doi:10.1038/nmeth.4397 2017

[4] [4]

Machine learning and image-based profiling in drug discovery.Current Opinion in Systems Biology, 10:43–52, 2018

Christian Scheeder, Florian Heigwer, and Michael Boutros. Machine learning and image-based profiling in drug discovery.Current Opinion in Systems Biology, 10:43–52, 2018. URL: http://dx.doi.org/10.1016/j.coisb.2018.05.004, doi:10.1016/j.coisb.2018.05.004

work page doi:10.1016/j.coisb.2018.05.004 2018

[5] [5]

Image-based profiling for drug discovery: due for a machine-learning upgrade?Nat

Srinivas Niranj Chandrasekaran, Hugo Ceulemans, Justin D Boyd, and Anne E Carpenter. Image-based profiling for drug discovery: due for a machine-learning upgrade?Nat. Rev. Drug Discov., 20(2):145–159, February 2021

work page 2021

[6] [6]

CellProfiler: image analysis software for identifying and quantifying cell phenotypes.Genome Biol., 7(10):R100, October 2006

Anne E Carpenter, Thouis R Jones, Michael R Lamprecht, Colin Clarke, In Han Kang, Ola Friman, David A Guertin, Joo Han Chang, Robert A Lindquist, Jason Moffat, Polina Golland, and David M Sabatini. CellProfiler: image analysis software for identifying and quantifying cell phenotypes.Genome Biol., 7(10):R100, October 2006

work page 2006

[7] [7]

Stirling, Madison J

David R. Stirling, Madison J. Swain-Bowden, Alice M. Lucas, Anne E. Carpenter, Beth A. Cimini, and Allen Goodman. Cellprofiler 4: improvements in speed, utility and usability. BMC Bioinformatics, 2021. URL: http://dx.doi.org/10.1186/s12859-021-04344-9 , doi:10.1186/s12859-021-04344-9

work page doi:10.1186/s12859-021-04344-9 2021

[8] [8]

Evaluating batch correction methods for image-based cell profiling.Nat

John Arevalo, Ellen Su, Jessica D Ewald, Robert van Dijk, Anne E Carpenter, and Shantanu Singh. Evaluating batch correction methods for image-based cell profiling.Nat. Commun., 15(1):6516, August 2024

work page 2024

[9] [9]

Morphological profiling for drug discovery in the era of deep learning.Briefings in Bioinformatics, 25(4):bbae284, 07 2024

Qiaosi Tang, Ranjala Ratnayake, Gustavo Seabra, Zhe Jiang, Ruogu Fang, Lina Cui, Yousong Ding, Tamer Kahveci, Jiang Bian, Chenglong Li, Hendrik Luesch, and Yanjun Li. Morphological profiling for drug discovery in the era of deep learning.Briefings in Bioinformatics, 25(4):bbae284, 07 2024. URL: https://doi.org/10.1093/bib/bbae284, arXiv:https://academic.o...

work page doi:10.1093/bib/bbae284 2024

[10] [10]

Plummer, and Juan C Caicedo

Zitong Chen, Chau Pham, Siqi Wang, Michael Doron, Nikita Moshkov, Bryan A. Plummer, and Juan C Caicedo. CHAMMI: a benchmark for channel-adaptive models in microscopy imaging. InThirty-seventh Conference on Neural Information Processing Systems Datasets and Bench- marks Track. NeurIPS, 2023. URL:https://openreview.net/forum?id=Luc1bZLeMY

work page 2023

[11] [11]

Rxrx1: a dataset for evaluating experimental batch correction methods

Maciej Sypetkowski, Morteza Rezanejad, Saber Saberian, Oren Kraus, John Urbanik, James Taylor, Ben Mabey, Mason Victors, Jason Yosinski, Alborz Rezazadeh Sereshkeh, Imran Haque, and Berton Earnshaw. Rxrx1: a dataset for evaluating experimental batch correction methods

work page

[12] [12]

URL:https://arxiv.org/abs/2301.05768, arXiv:2301.05768

work page arXiv

[13] [13]

Learning representations for image-based profiling of perturbations.Nat

Nikita Moshkov, Michael Bornholdt, Santiago Benoit, Matthew Smith, Claire McQuin, Allen Goodman, Rebecca A Senft, Yu Han, Mehrtash Babadi, Peter Horvath, Beth A Cimini, Anne E Carpenter, Shantanu Singh, and Juan C Caicedo. Learning representations for image-based profiling of perturbations.Nat. Commun., 15(1):1594, February 2024. 10

work page 2024

[14] [14]

Lazar, Rahul Mohan, Conor Tillinghast, Tommaso Biancalani, Marta M

Safiye Celik, Jan-Christian Hütter, Sandra Melo Carlos, Nathan H. Lazar, Rahul Mohan, Conor Tillinghast, Tommaso Biancalani, Marta M. Fay, Berton A. Earnshaw, and Imran S. Haque. Building, benchmarking, and exploring perturbative maps of transcriptional and morphological data.PLOS Computational Biology, 20(10):1–24, 10 2024. URL: https://doi.org/10. 1371/...

work page doi:10.1371/journal.pcbi.1012463 2024

[15] [15]

Caie, Rebecca E

Peter D. Caie, Rebecca E. Walls, Alexandra Ingleston-Orme, Sandeep Daya, Tom Houslay, Rob Eagle, Mark E. Roberts, and Neil O. Carragher. High-content phenotypic profiling of drug response signatures across distinct cancer cells.Molecular Cancer Therapeutics, 9(6):1913–1926, 06 2010. URL: https://doi.org/10.1158/1535-7163.MCT-09-1148 , arXiv:https://aacrjo...

work page doi:10.1158/1535-7163.mct-09-1148 1913

[16] [16]

Michael Ando, Cory Y

D. Michael Ando, Cory Y . McLean, and Marc Berndl. Improving phe- notypic measurements in high-content imaging screens.bioRxiv, 2017. URL: https://www.biorxiv.org/content/early/2017/07/10/161422, arXiv:https://www.biorxiv.org/content/early/2017/07/10/161422.full.pdf, doi:10.1101/161422

work page doi:10.1101/161422 2017

[17] [17]

WILDS: A Benchmark of in-the-Wild Distribution Shifts

Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton Earnshaw, Imran Haque, Sara M Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang. WILDS: A ...

work page 2021

[18] [18]

Zero- cell corrections in random-effects meta-analyses.Research Synthesis Methods, 11(6):913–919, 2020

Frank Weber, Guido Knapp, Katja Ickstadt, Günther Kundt, and Änne Glass. Zero- cell corrections in random-effects meta-analyses.Research Synthesis Methods, 11(6):913–919, 2020. URL: https://onlinelibrary.wiley.com/doi/abs/10. 1002/jrsm.1460, arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/jrsm.1460, doi:https://doi.org/10.1002/jrsm.1460

work page doi:10.1002/jrsm.1460 2020

[19] [19]

On estimating the relation between blood group and disease.Ann

B Woolf. On estimating the relation between blood group and disease.Ann. Hum. Genet., 19(4):251–253, June 1955

work page 1955

[20] [20]

Integrating chemical structures as treatments improves representations of microscopy images for morphological profiling

Yemin Yu, Neil Tenenholtz, Lester Mackey, Ying Wei, David Alvarez-Melis, Ava P. Amini, and Alex X. Lu. Causal integration of chemical structures improves representations of microscopy images for morphological profiling. 2025. URL: https://arxiv.org/abs/2504.09544, arXiv:2504.09544

work page internal anchor Pith review Pith/arXiv arXiv 2025

[21] [21]

Corum: the comprehensive resource of mam- malian protein complexes–2022.Nucleic Acids Research, 51(D1):D539–D545, 01 2023

George Tsitsiridis, Ralph Steinkamp, Madalina Giurgiu, Barbara Brauner, Gisela Fobo, Goar Frishman, Corinna Montrone, and Andreas Ruepp. Corum: the comprehensive resource of mam- malian protein complexes–2022.Nucleic Acids Research, 51(D1):D539–D545, 01 2023. URL: https://doi.org/10.1093/nar/gkac1015, arXiv:https://academic.oup.com/nar/article- pdf/51/D1/...

work page doi:10.1093/nar/gkac1015 2022

[22] [22]

hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies

Kevin Drew, John B Wallingford, and Edward M Marcotte. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol. Syst. Biol., 17(5):e10016, May 2021

work page 2021

[23] [23]

The string database in 2025: protein networks with directional- ity of regulation.Nucleic Acids Research, 53(D1):D730–D737, 01 2025

Damian Szklarczyk, Katerina Nastou, Mikaela Koutrouli, Rebecca Kirsch, Farrokh Mehryary, Radja Hachilif, Dewei Hu, Matteo E Peluso, Qingyao Huang, Tao Fang, Nadezhda T Doncheva, Sampo Pyysalo, Peer Bork, Lars J Jensen, and Christian von Mering. The string database in 2025: protein networks with directional- ity of regulation.Nucleic Acids Research, 53(D1)...

work page doi:10.1093/nar/gkae1113 2025

[24] [24]

Signor 2.0, the signaling network open resource 2.0: 2019 update.Nucleic Acids Research, 48(D1):D504–D510, 01 2020

Luana Licata, Prisca Lo Surdo, Marta Iannuccelli, Alessandro Palma, Elisa Micarelli, Livia Perfetto, Daniele Peluso, Alberto Calderone, Luisa Castagnoli, and Gianni Ce- sareni. Signor 2.0, the signaling network open resource 2.0: 2019 update.Nucleic Acids Research, 48(D1):D504–D510, 01 2020. URL: https://doi.org/10.1093/nar/ 11 gkz949, arXiv:https://acade...

work page doi:10.1093/nar/ 2019

[25] [25]

Reactome: a database of reactions, pathways and biological processes.Nucleic Acids Res., 39(Database issue):D691–7, January 2011

David Croft, Gavin O’Kelly, Guanming Wu, Robin Haw, Marc Gillespie, Lisa Matthews, Michael Caudy, Phani Garapati, Gopal Gopinath, Bijay Jassal, Steven Jupe, Irina Kalatskaya, Shahana Mahajan, Bruce May, Nelson Ndegwa, Esther Schmidt, Veronica Shamovsky, Christina Yung, Ewan Birney, Henning Hermjakob, Peter D’Eustachio, and Lincoln Stein. Reactome: a datab...

work page 2011

[26] [26]

The drug repurposing hub: a next-generation drug library and information resource.Nat

Steven M Corsello, Joshua A Bittker, Zihan Liu, Joshua Gould, Patrick McCarren, Jodi E Hirschman, Stephen E Johnston, Anita Vrcic, Bang Wong, Mariya Khan, Jacob Asiedu, Rajiv Narayan, Christopher C Mader, Aravind Subramanian, and Todd R Golub. The drug repurposing hub: a next-generation drug library and information resource.Nat. Med., 23(4):405–408, April 2017

work page 2017

[27] [27]

2016, in IEEE Conf

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume, 770–778. 2016. doi:10.1109/CVPR.2016.90

work page doi:10.1109/cvpr.2016.90 2016

[28] [28]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L...

work page 2024

[29] [29]

Masked autoencoders for microscopy are scalable learners of cellular biology

Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Le- ung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Dominique Beaini, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, and Berton Earnshaw. Masked autoencoders for microscopy are scalable learners of cellular biology. In Proceedings...

work page 2024

[30] [30]

Hansen, Mohini K

Ankit Gupta, Zoe Wefers, Konstantin Kahnert, Jan N. Hansen, Mohini K. Misra, Will Leineweber, Anthony Cesnik, Dan Lu, Ulrika Axelsson, Frederic Ballllosera, Russ B. Altman, Theofanis Karaletsos, and Emma Lundberg. Subcell: proteome- aware vision foundation models for microscopy capture single-cell biology.bioRxiv,

work page

[31] [31]

627299, arXiv:https://www.biorxiv.org/content/early/2025/10/30/2024.12.06.627299.full.pdf, doi:10.1101/2024.12.06.627299

URL: https://www.biorxiv.org/content/early/2025/10/30/2024.12.06. 627299, arXiv:https://www.biorxiv.org/content/early/2025/10/30/2024.12.06.627299.full.pdf, doi:10.1101/2024.12.06.627299

work page doi:10.1101/2024.12.06.627299 2025

[32] [32]

CLOOME: contrastive learning unlocks bioimaging databases for queries with chemical struc- tures.Nat

Ana Sanchez-Fernandez, Elisabeth Rumetshofer, Sepp Hochreiter, and Günter Klambauer. CLOOME: contrastive learning unlocks bioimaging databases for queries with chemical struc- tures.Nat. Commun., 14(1):7339, November 2023

work page 2023

[33] [33]

Michael Ando, John Arevalo, Melissa Bennion, Nicolas Boisseau, Adriana Borowa, Justin D

Srinivas Niranj Chandrasekaran, Jeanelle Ackerman, Eric Alix, D. Michael Ando, John Arevalo, Melissa Bennion, Nicolas Boisseau, Adriana Borowa, Justin D. Boyd, Laurent Brino, Patrick J. Byrne, Hugo Ceulemans, Carolyn Ch’ng, Beth A. Cimini, Djork-Arne Clevert, Nicole Deflaux, John G Doench, Thierry Dorval, Regis Doyonnas, Vincenza Dragone, Ola Engkvist, Pa...

work page

[34] [34]

534023, arXiv:https://www.biorxiv.org/content/early/2023/03/27/2023.03.23.534023.full.pdf, doi:10.1101/2023.03.23.534023

URL: https://www.biorxiv.org/content/early/2023/03/27/2023.03.23. 534023, arXiv:https://www.biorxiv.org/content/early/2023/03/27/2023.03.23.534023.full.pdf, doi:10.1101/2023.03.23.534023

work page doi:10.1101/2023.03.23.534023 2023

[35] [35]

A dataset of images and morphological profiles of 30 000 small-molecule treatments using the cell painting assay.GigaScience, 6(12):giw014, 12 2017

Mark-Anthony Bray, Sigrun M Gustafsdottir, Mohammad H Rohban, Shantanu Singh, Veb- jorn Ljosa, Katherine L Sokolnicki, Joshua A Bittker, Nicole E Bodycombe, Vlado Dan ˇcík, Thomas P Hasaka, Cindy S Hon, Melissa M Kemp, Kejie Li, Deepika Walpita, Mathias J Wawer, Todd R Golub, Stuart L Schreiber, Paul A Clemons, Alykhan F Shamji, and Anne E Carpenter. A da...

work page doi:10.1093/gigascience/giw014 2017

[36] [36]

Not Same Batch (NSB): Candidates from the same experimental batch are excluded, testing robustness to day-to-day variation between batches

work page

[37] [37]

Not Same Source (NSS): Candidates from the same institution are excluded, testing robust- ness to domain shifts such as microscopy hardware differences

work page

[38] [38]

For the enrichment tasks (MoA and gene pathway enrichment), stringency is implemented by constructing separate consensus profiles from non-overlapping subsets of replicates

Not Same Layout (NSL): Candidates sharing the same plate position as the query are excluded, controlling for position-induced batch effects [5]. For the enrichment tasks (MoA and gene pathway enrichment), stringency is implemented by constructing separate consensus profiles from non-overlapping subsets of replicates. At the No Restriction level, all repli...

work page

[39] [39]

This step both standardizes dimensionality and denoises the feature space

Platewise Center Scaling and PCA.All features are scaled to a standard range, then reduced via Principal Component Analysis applied across the entire dataset. This step both standardizes dimensionality and denoises the feature space. We use 64 components for the PCA analysis

work page

[40] [40]

Robustize

Platewise Robust Standardization.A Median Absolute Deviation (MAD) "Robustize" transformation is applied per plate to align distributions across experimental runs while remaining robust to outliers. This pipeline is applied identically to all methods evaluated in the benchmark. 20 Appendix D: Supplementary Analyses D.1 Effect of Modified Haldane-Anscombe ...

work page