Integrating chemical structures as treatments improves representations of microscopy images for morphological profiling
Pith reviewed 2026-05-22 19:45 UTC · model grok-4.3
The pith
Modeling chemical compounds as treatments improves representations of microscopy images for morphological profiling
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MICON is a representation learning framework that models chemical compounds as treatments inducing transformations of cell phenotypes. It significantly outperforms classical hand-crafted features such as CellProfiler and existing deep-learning-based representation learning methods in challenging evaluation settings where models must identify reproducible effects of drugs across independent replicates and data-generating centers. Incorporating chemical compound information provides small but consistent improvements, and modeling compounds specifically as treatments outperforms approaches that directly align images and compounds in a single representation space.
What carries the argument
MICON (Molecular-Image Contrastive Learning), a contrastive framework that treats chemical compounds as inducers of cell phenotype transformations during self-supervised pre-training on multimodal screen data.
If this is right
- Representations learned with MICON identify reproducible drug effects more reliably across independent replicates and centers than image-only or direct-alignment baselines.
- Explicitly modeling the multimodal nature of screens (perturbation plus image readout) yields small but consistent gains over image-only methods.
- Treatment-style modeling of perturbations is more effective for generalization than embedding perturbations and images into one shared space.
- Representation learning for morphological profiling should incorporate the perturbation modality rather than relying on images alone.
Where Pith is reading between the lines
- The same treatment-modeling approach could be tested on genetic perturbations to see whether it improves consistency across genetic screens.
- If the gains hold, fewer experimental replicates might suffice to achieve the same level of reproducibility in drug profiling.
- The framework might transfer to other paired perturbation-readout settings such as transcriptomics paired with imaging.
Load-bearing premise
That modeling compounds specifically as treatments inducing phenotype transformations, rather than some other detail of the contrastive setup, is what drives the observed generalization gains.
What would settle it
A new multi-center dataset where a direct image-compound alignment baseline matches or exceeds MICON performance on the reproducible-drug-effect task.
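The task named here, recovering the same drug effect across independent replicates and centers, is often scored with a nearest-neighbor retrieval check. The sketch below is a minimal, hypothetical version of such a check: for each query profile it looks only at profiles from a different batch (or center) and asks whether the closest one carries the same compound. The function name, the toy embeddings, and the use of cosine similarity are assumptions for illustration and are not confirmed details of the paper's protocol.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def not_same_group_1nn_accuracy(embeddings, compounds, groups):
    """Fraction of queries whose nearest neighbor from a *different* group
    (batch or center) was treated with the same compound."""
    hits, scored = 0, 0
    for i, query in enumerate(embeddings):
        # Exclude every profile that shares the query's batch/center.
        candidates = [j for j in range(len(embeddings)) if groups[j] != groups[i]]
        if not candidates:
            continue
        best = max(candidates, key=lambda j: cosine(query, embeddings[j]))
        hits += compounds[best] == compounds[i]
        scored += 1
    return hits / scored if scored else float("nan")


# Toy data (made up): two compounds, each imaged in two batches.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
compounds = ["A", "A", "B", "B"]
batches = ["b1", "b2", "b1", "b2"]

accuracy = not_same_group_1nn_accuracy(embeddings, compounds, batches)
print(accuracy)
```

Running the same function with center labels in place of batch labels gives the cross-center variant. Comparing that number for a direct-alignment baseline and for MICON on a new multi-center dataset is the kind of test this section describes.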
Original abstract
Recent advances in self-supervised deep learning have improved our ability to quantify cellular morphological changes in high-throughput microscopy screens, a process known as morphological profiling. However, most current methods only learn from images, despite many screens being inherently multimodal, as they involve both a chemical or genetic perturbation as well as an image-based readout. We hypothesized that incorporating chemical compound structures during self-supervised pre-training could improve learned representations of images from high-throughput microscopy screens. We introduce a representation learning framework, MICON (Molecular-Image Contrastive Learning), that models chemical compounds as treatments that induce transformations of cell phenotypes. MICON significantly outperforms classical hand-crafted features such as CellProfiler and existing deep-learning-based representation learning methods in challenging evaluation settings where models must identify reproducible effects of drugs across independent replicates and data-generating centers. We demonstrate that incorporating chemical compound information into the learning process provides small, but consistent improvements in performance and that modeling compounds specifically as treatments outperforms approaches that directly align images and compounds in a single representation space. Our findings point to a new direction for representation learning in morphological profiling, suggesting that methods should explicitly account for the multimodal nature of microscopy screening data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MICON, a self-supervised contrastive learning framework for morphological profiling of high-throughput microscopy images. It models chemical compound structures as treatments that induce phenotypic transformations rather than performing direct image-compound alignment. The central empirical claim is that this yields representations that significantly outperform both classical hand-crafted features (CellProfiler) and prior deep learning methods on tasks requiring identification of reproducible drug effects across independent replicates and data-generating centers, with additional gains from the treatment modeling choice.
Significance. If the results and the claimed distinction hold under scrutiny, the work identifies a useful inductive bias for multimodal representation learning in biology: explicitly framing perturbations as inducers of phenotype change can improve cross-center and cross-replicate generalization. This could steer future methods away from generic multimodal contrastive objectives toward domain-informed transformation modeling, with potential downstream value in drug discovery pipelines.
major comments (2)
- [Section 3] Section 3 (MICON objective): the manuscript must supply the precise loss equation or pseudocode. The current description does not clarify whether the objective contains explicit terms for phenotypic-shift prediction, treatment-conditioned augmentations, or a separate transformation head beyond a standard image-compound InfoNCE loss. Without this, it is impossible to confirm that the reported advantage stems from the hypothesized treatment framing rather than architecture, data volume, or training details.
- [Results] Results section (Tables 2–4 and cross-center experiments): the performance deltas are described as 'small but consistent,' yet no confidence intervals, statistical significance tests, or ablation isolating the treatment modeling component versus direct alignment are reported. These details are load-bearing for the claim that treatment modeling is the key driver of improved reproducibility across centers.
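To make the referee's reference point concrete, the following is a minimal sketch of a standard InfoNCE objective of the kind used for direct image-compound alignment. It is not MICON's loss, which the first major comment says the manuscript had not yet specified. The function name `info_nce`, the temperature of 0.1, and the toy vectors are assumptions for illustration.

```python
import math


def info_nce(anchors, candidates, positives, temperature=0.1):
    """Mean InfoNCE loss over anchors.

    positives[i] lists the candidate indices treated as positives for
    anchor i; every other candidate acts as a negative.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    c = [normalize(v) for v in candidates]
    total = 0.0
    for i, u in enumerate(a):
        # Temperature-scaled cosine similarities to every candidate.
        logits = [sum(x * y for x, y in zip(u, v)) / temperature for v in c]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives[i]) / len(positives[i])
    return total / len(a)


# Toy example (made up): three image embeddings and three compound embeddings.
images = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
compounds_aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
compounds_shuffled = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]
diagonal = [[0], [1], [2]]  # image i is paired with compound i

aligned_loss = info_nce(images, compounds_aligned, diagonal)
shuffled_loss = info_nce(images, compounds_shuffled, diagonal)
print(aligned_loss, shuffled_loss)
```

The `positives` argument is the only place where the pairing rule enters. An ablation of the sort the referee requests would hold the encoder, data, and schedule fixed and change only how positives and negatives are defined.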
minor comments (2)
- [Figure 1] Figure 1 and accompanying text: the schematic would be clearer if it explicitly contrasted the treatment-transformation pathway with the direct-alignment baseline that is later ablated.
- [Abstract] Abstract: quantitative metrics, dataset sizes, and the magnitude of improvement should be stated to allow readers to gauge the practical significance of the outperformance claims.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the manuscript to provide greater clarity and rigor where needed.
- Referee: [Section 3] Section 3 (MICON objective): the manuscript must supply the precise loss equation or pseudocode. The current description does not clarify whether the objective contains explicit terms for phenotypic-shift prediction, treatment-conditioned augmentations, or a separate transformation head beyond a standard image-compound InfoNCE loss. Without this, it is impossible to confirm that the reported advantage stems from the hypothesized treatment framing rather than architecture, data volume, or training details.
  Authors: We agree that an explicit formulation of the objective is necessary. In the revised manuscript we add the complete loss equation (Equation 3) and accompanying pseudocode to Section 3. The objective is a contrastive loss in which each compound is treated as an inducing transformation that shifts the phenotype representation; positive pairs are formed by applying the same treatment to different image views, while negatives are drawn from other treatments. This is distinct from a standard image-compound InfoNCE alignment and does not rely on a separate transformation head or explicit phenotypic-shift regression term. The added equation makes clear that the reported gains are attributable to the treatment-modeling inductive bias rather than ancillary implementation choices.
  Revision: yes
- Referee: [Results] Results section (Tables 2–4 and cross-center experiments): the performance deltas are described as 'small but consistent,' yet no confidence intervals, statistical significance tests, or ablation isolating the treatment modeling component versus direct alignment are reported. These details are load-bearing for the claim that treatment modeling is the key driver of improved reproducibility across centers.
  Authors: We accept that additional statistical detail strengthens the central claim. The revised manuscript now reports 95% confidence intervals (via bootstrap resampling over replicates) for all metrics in Tables 2–4 and the cross-center tables. We also include paired statistical tests (Wilcoxon signed-rank) showing that the observed improvements over both CellProfiler and direct-alignment baselines are significant at p < 0.05 in the majority of settings. Finally, we add an explicit ablation (new Table 5) that isolates the treatment-modeling choice against a direct image-compound contrastive baseline while keeping architecture, data volume, and training schedule fixed; the ablation confirms that the treatment framing accounts for the reproducibility gains.
  Revision: yes
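The two statistical tools named in this response can be sketched in a few lines. The code below is a hedged, standard-library illustration of a percentile bootstrap confidence interval and an exact two-sided Wilcoxon signed-rank test on paired differences. The function names, the resample count, and the toy differences are all assumptions; the numbers are made up and are not results from the paper.

```python
import itertools
import random


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def wilcoxon_signed_rank_exact(diffs):
    """Exact two-sided Wilcoxon signed-rank test on paired differences.

    Zero differences are dropped and tied magnitudes get average ranks.
    Enumerates all 2^n sign patterns, so it only suits small n.
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    # Count sign patterns at least as far from the null center as observed.
    extreme = 0
    for signs in itertools.product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Made-up paired differences (method minus baseline) over eight settings.
diffs = [0.02, 0.01, 0.03, 0.015, 0.025, 0.005, 0.012, 0.018]

ci_low, ci_high = bootstrap_ci(diffs)
w_plus, p_value = wilcoxon_signed_rank_exact(diffs)
print(ci_low, ci_high, w_plus, p_value)
```

With eight differences that all favor one method, the exact test can return no p-value smaller than 2/256, which is why small but consistent gains need enough paired settings before a signed-rank test can register them.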
Circularity Check
No circularity: empirical comparisons rest on external baselines and independent test settings
The paper introduces MICON as a contrastive framework that treats compounds as phenotype inducers and reports performance gains over CellProfiler and prior DL methods on held-out replicates and centers. No equations, loss formulations, or derivations appear in the provided text that would reduce the claimed advantage to a fitted parameter or self-referential quantity. The evaluation protocol uses external, non-overlapping data splits and standard metrics, rendering the central empirical claim independent of any internal redefinition or self-citation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Self-supervised contrastive objectives can be extended to treat external metadata (chemical structures) as inducing transformations of the observed data modality (images).
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "MICON models chemical compounds as treatments that induce counterfactual transformations of cell phenotypes... using contrastive losses to align counterfactual and real representations"
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "final loss for MICON combines the original and counterfactual PaCLR objectives: $\mathcal{L} = \mathcal{L}^{\mathrm{orig}}_{\mathrm{PaCLR}} + \mathcal{L}^{\mathrm{gen}}_{\mathrm{PAC}}$"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- MorphoHELM: A Comprehensive Benchmark for Evaluating Representations for Microscopy-Based Morphology Assays
  MorphoHELM is a new benchmark for Cell Painting morphology representations that tests methods across increasing batch effect levels and finds classic computer vision strategies remain the strongest general-purpose performers.