Intervention-Aware Multiscale Representation Learning from Imaging Phenomics and Perturbation Transcriptomics
Pith reviewed 2026-05-10 05:44 UTC · model grok-4.3
The pith
Transcriptomic guidance tightens risk bounds for image-based prediction of drug interventions
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an intervention-aware distillation framework leveraging perturbational transcriptomics can guide image representation learning: a transcriptome-conditioned teacher produces soft distributions over a chemistry-aware codebook, and an image-only student learns to predict those distributions from microscopy alone. This yields a theoretical guarantee that transcriptomic guidance tightens the risk bound for image-based prediction, and empirical gains in one-shot transfer to unseen interventions and in drug-target gene discovery on Cell Painting and RxRx datasets paired with L1000, outperforming self-supervised and alignment baselines while handling dose and cell-type mismatches in weakly paired data.
What carries the argument
The transcriptome-conditioned teacher that integrates gene expression and intervention metadata to produce soft distributions over a chemistry-aware codebook organized by drug similarity, using a fine-tuned single-cell foundation model to encode cell-type context and disentangle dose effects; this knowledge is distilled to an image-only student.
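The teacher's mechanics can be sketched as follows. This is an illustrative stand-in, not the paper's architecture: the encoder is a random projection rather than the fine-tuned single-cell foundation model, the codebook is random rather than chemistry-organized, and `teacher_soft_distribution`, `dose_direction`, and all shapes are assumptions. It shows only the shape of the computation: condition on expression and dose metadata, remove the dose effect as an additive latent shift, and softmax negative distances to codebook prototypes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 978 L1000 landmark genes, a 64-entry codebook,
# 32 latent dimensions. All components below are toy stand-ins.
n_genes, K, d = 978, 64, 32
codebook = rng.normal(size=(K, d))                          # chemistry-aware prototypes (here random)
W_expr = rng.normal(size=(n_genes, d)) / np.sqrt(n_genes)   # toy expression encoder
dose_direction = rng.normal(size=d)                         # assumed additive dose axis

def teacher_soft_distribution(expression, dose, temperature=1.0):
    """Soft distribution over codebook entries from expression + dose metadata.
    The dose effect is subtracted as an additive latent shift before comparing
    the latent code to the codebook prototypes."""
    z = expression @ W_expr - dose * dose_direction
    logits = -np.sum((codebook - z) ** 2, axis=1) / temperature
    logits -= logits.max()                                  # numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

The output is a proper distribution over the K codebook entries, which is what the image-only student is trained to match.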
If this is right
- The image-only student can operate independently at test time while still incorporating mechanistic knowledge from transcriptomics.
- One-shot transfer performance to unseen interventions improves compared to self-supervised and alignment methods.
- Drug-target gene discovery accuracy increases on the evaluated paired datasets.
- Theoretical risk bounds for image-based prediction become tighter under transcriptomic guidance.
- The approach explicitly manages cell-type and dose mismatches instead of relying on identity alignment.
Where Pith is reading between the lines
- If the codebook organization by drug similarity holds, the learned image representations may cluster compounds by shared mechanism even for novel compounds.
- The distillation approach could inform designs for other weakly-paired multimodal problems in biology where one modality is mechanistic but expensive to obtain at scale.
Load-bearing premise
The transcriptome-conditioned teacher can produce soft distributions that meaningfully capture intervention semantics independent of sample identity and these can be reliably distilled to images despite cell-type and dose mismatches in weakly paired data.
What would settle it
If the image-only student shows no statistically significant gains over self-supervised and alignment baselines in one-shot transfer accuracy to unseen interventions or in drug-target gene discovery on the Cell Painting and RxRx datasets paired with L1000, the claimed benefits of the distillation framework would be falsified.
Original abstract
Microscopy-based phenotypic profiling is scalable for drug discovery but lacks the mechanistic depth of transcriptomics, which remains costly and scarce. Existing multimodal approaches either use images to support other modalities or naively align representations by sample identity, ignoring cell-type and dose variations in weakly paired data-limiting generalization to unseen interventions. In this paper, we introduce an intervention-aware distillation framework that leverages perturbational transcriptomics to guide image representation learning. A transcriptome-conditioned teacher integrates gene expression and intervention metadata to produce soft distributions over a chemistry-aware codebook organized by drug similarity. The teacher employs a fine-tuned single-cell foundation model to encode cell-type context and disentangle dose effects. An image-only student learns to predict these distributions from microscopy alone, distilling mechanistic knowledge while operating independently at test time. This design emphasizes intervention semantics rather than identity alignment and explicitly handles dose and cell-type mismatches. We provide theoretical guarantees showing that transcriptomic guidance tightens the risk bound for image-based prediction. On Cell Painting and RxRx datasets paired with L1000, our method significantly improves one-shot transfer to unseen interventions and drug-target gene discovery compared to self-supervised and alignment baselines.
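The distillation objective described in the abstract, in which the student predicts the teacher's soft distributions from images alone, is commonly implemented as a temperature-scaled KL divergence. The sketch below assumes that form; the function name, temperature, and exact loss are illustrative, not the paper's stated objective.

```python
import numpy as np

def kl_distillation_loss(teacher_probs, student_logits, temperature=2.0):
    """KL(teacher || student): penalizes the student's predicted distribution
    over codebook entries for diverging from the teacher's soft targets.
    Zero iff the two distributions agree."""
    s = np.asarray(student_logits, float) / temperature
    s -= s.max()                                  # numerical stability
    student_probs = np.exp(s) / np.exp(s).sum()
    eps = 1e-12
    t = np.asarray(teacher_probs, float)
    return float(np.sum(t * (np.log(t + eps) - np.log(student_probs + eps))))
```

A student whose logits reproduce the teacher's distribution incurs (near) zero loss, while a confidently wrong student is penalized heavily; this is the standard behavior the teacher-student design relies on.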
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces an intervention-aware distillation framework for learning image representations from microscopy data guided by perturbational transcriptomics. A transcriptome-conditioned teacher, built on a fine-tuned single-cell foundation model and a chemistry-aware codebook, generates soft distributions over intervention semantics; an image-only student distills these to operate independently at test time. The approach claims to handle cell-type and dose mismatches in weakly paired data, provides theoretical guarantees that transcriptomic guidance tightens the risk bound for image-based prediction, and reports empirical gains in one-shot transfer to unseen interventions and drug-target gene discovery on Cell Painting and RxRx datasets paired with L1000, outperforming self-supervised and alignment baselines.
Significance. If the theoretical guarantees and empirical results hold under the stated assumptions, the work could meaningfully advance multimodal representation learning for drug discovery by enabling mechanistic guidance from scarce transcriptomic data to scale imaging phenomics without requiring paired samples at inference. The explicit focus on intervention semantics rather than sample identity, combined with the teacher-student distillation design, addresses a recognized limitation in existing alignment methods.
Major comments (2)
- [Abstract and §4] Abstract and §4 (theoretical analysis): the claimed tightening of the risk bound for image-based prediction is load-bearing for the central contribution, yet the provided description does not specify the precise assumptions under which the bound holds—particularly whether the teacher’s soft distributions remain independent of cell-type and dose factors in the weakly paired L1000 regime. Without an explicit statement or derivation showing that residual entanglement is controlled, the guarantee risks being circular with the method’s own fitted quantities.
- [§3.2] §3.2 (teacher architecture) and experimental setup: the skeptic’s concern is material. The fine-tuned single-cell foundation model plus chemistry-aware codebook must demonstrably produce soft labels driven by intervention identity rather than sample-specific covariates; any leakage from cell-type or dose mismatches would cause the student to learn spurious correlations, directly undermining both the one-shot transfer results and the drug-target discovery claims on unseen interventions.
Minor comments (2)
- [Figure 2 and §5] Figure 2 and §5: the visualization of codebook organization by drug similarity would benefit from an explicit legend or quantitative measure (e.g., silhouette score) showing separation by intervention rather than by cell line.
- [Table 1] Table 1: error bars or standard deviations across the reported runs are not visible in the excerpt; adding them would strengthen the claim of significant improvement over baselines.
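The silhouette measure suggested in the first minor comment can be computed without any special tooling. The sketch below is a plain-numpy implementation of the standard silhouette score; applied to image embeddings with intervention labels versus cell-line labels, a higher score for interventions would support the claimed codebook organization. The function name and toy data are illustrative.

```python
import numpy as np

def silhouette_scores(X, labels):
    """Mean silhouette over samples: (b - a) / max(a, b), where a is the mean
    intra-cluster distance and b is the mean distance to the nearest other
    cluster. Values near 1 indicate tight, well-separated clusters."""
    X = np.asarray(X, float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances (fine for small n; use a library for scale).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    scores = []
    for i in range(len(X)):
        same_others = labels == labels[i]
        same_others = same_others.copy()
        same_others[i] = False
        if not same_others.any():
            scores.append(0.0)          # singleton cluster: conventionally 0
            continue
        a = D[i, same_others].mean()
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Comparing the score under intervention labels against the score under cell-line labels on the same embeddings would give the quantitative separation measure the comment asks for.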
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the presentation of our theoretical and architectural contributions. We respond to each major comment below and outline targeted revisions.
Point-by-point responses
Referee: [Abstract and §4] Abstract and §4 (theoretical analysis): the claimed tightening of the risk bound for image-based prediction is load-bearing for the central contribution, yet the provided description does not specify the precise assumptions under which the bound holds—particularly whether the teacher’s soft distributions remain independent of cell-type and dose factors in the weakly paired L1000 regime. Without an explicit statement or derivation showing that residual entanglement is controlled, the guarantee risks being circular with the method’s own fitted quantities.
Authors: We agree that the assumptions underlying the risk-bound tightening require explicit statement to avoid any appearance of circularity. In the revised manuscript we will expand the theoretical analysis in §4 to list the precise conditions: (i) the teacher is trained with explicit conditioning on intervention metadata and a chemistry-aware codebook that groups by drug similarity, and (ii) the fine-tuned single-cell foundation model is used to encode and thereby factor out cell-type context while the dose effect is modeled as an additive shift in the latent space. Under these inductive biases the teacher’s soft distributions over intervention semantics are independent of the mismatched cell-type and dose factors present in the weakly paired regime. We will add a short derivation sketch showing that the excess risk of the image student is bounded by the teacher’s intervention-conditioned risk plus a term that vanishes when the teacher’s output is independent of the nuisance factors; this grounding in architecture rather than post-fit quantities removes the circularity concern. revision: yes
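One illustrative way to write the decomposition sketched in this response is the following. This is not the paper's Proposition 3.1; the symbols (student risk $R(f_{\mathrm{img}})$, teacher risk $R(g_T)$, constant $C$, and nuisance term $\varepsilon_{\mathrm{nuis}}$) are assumptions chosen to mirror the structure the authors describe.

```latex
% Excess-risk sketch: student risk bounded by teacher risk, a distillation
% gap, and a nuisance term that vanishes when the teacher's output is
% independent of cell type and dose given the intervention.
R(f_{\mathrm{img}})
  \;\le\; R(g_{T})
  \;+\; C\,\mathbb{E}\!\left[\mathrm{KL}\!\big(p_{T}(\cdot \mid g,\, m)\,\big\|\,
         p_{S}(\cdot \mid x)\big)\right]
  \;+\; \varepsilon_{\mathrm{nuis}},
\qquad
\varepsilon_{\mathrm{nuis}} \to 0
\ \text{when}\
p_{T} \perp\!\!\!\perp (\text{cell type},\, \text{dose}) \mid \text{intervention}.
```

Here $p_T$ is the teacher's soft distribution given expression $g$ and metadata $m$, and $p_S$ is the student's prediction from the image $x$; the middle term is the distillation gap that training drives down.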
Referee: [§3.2] §3.2 (teacher architecture) and experimental setup: the skeptic’s concern is material. The fine-tuned single-cell foundation model plus chemistry-aware codebook must demonstrably produce soft labels driven by intervention identity rather than sample-specific covariates; any leakage from cell-type or dose mismatches would cause the student to learn spurious correlations, directly undermining both the one-shot transfer results and the drug-target discovery claims on unseen interventions.
Authors: We concur that empirical verification that the teacher’s soft labels are driven by intervention identity (rather than cell-type or dose leakage) is necessary to support the one-shot transfer and target-discovery claims. In the revision we will augment §3.2 with two new analyses: (1) Pearson correlations between the teacher’s soft-distribution vectors and intervention labels versus cell-type and dose labels across the L1000 training set, and (2) an ablation in which we replace the intervention-conditioned teacher with a version that receives only cell-type and dose metadata; the resulting drop in downstream one-shot accuracy and target-gene ranking will quantify the contribution of intervention semantics. These additions use only existing data and will be reported alongside the original results. revision: yes
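The leakage diagnostic the authors propose (correlating the teacher's soft-distribution vectors against intervention versus cell-type labels) can be sketched as below. The function name, the one-vs-rest construction, and the toy data are illustrative assumptions, not the authors' exact analysis.

```python
import numpy as np

def max_abs_pearson(soft, labels):
    """Largest |Pearson r| between any soft-distribution dimension and the
    one-vs-rest indicator of each label value. A factor that drives the
    teacher's output shows a high value; a well-factored-out nuisance
    (cell type, dose) should show a value near zero."""
    soft = np.asarray(soft, float)
    labels = np.asarray(labels)
    best = 0.0
    for v in np.unique(labels):
        y = (labels == v).astype(float)
        y -= y.mean()
        for j in range(soft.shape[1]):
            x = soft[:, j] - soft[:, j].mean()
            denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
            if denom > 0:
                best = max(best, abs(float(x @ y)) / denom)
    return best

# Toy check: column 0 tracks the intervention exactly; neither column
# tracks the cell type, so intervention correlation should dominate.
soft = np.array([[0., 0.], [0., 0.], [0., 1.], [0., 1.],
                 [1., 0.], [1., 0.], [1., 1.], [1., 1.]])
interventions = np.array([0, 0, 0, 0, 1, 1, 1, 1])
cell_types = np.array([0, 1, 0, 1, 0, 1, 0, 1])
```

On real teacher outputs, a large gap between the intervention statistic and the cell-type/dose statistics would be the evidence of non-leakage the referee asks for.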
Circularity Check
No significant circularity detected; derivation relies on external data and standard principles without self-referential reduction.
Full rationale
The paper defines a teacher-student distillation setup where the teacher is conditioned on transcriptomic data from L1000 to produce soft labels over a chemistry-aware codebook, and the student predicts these from images alone. The theoretical guarantee that transcriptomic guidance tightens the risk bound is presented as following from the guidance mechanism and standard bounds, not by redefining the bound in terms of the method's own outputs. No equations or steps in the provided text reduce a prediction to a fitted input by construction, invoke self-citations as load-bearing uniqueness theorems, or smuggle ansatzes via prior work. Empirical claims on Cell Painting/RxRx are comparisons to external baselines rather than renamed fits. The chain is self-contained against the external transcriptomic inputs and does not collapse to its own definitions.
Reference graph
Works this paper leans on
- [1] Ihab Bendidi, Yassir El Mesbahi, Alisandra K Denton, Karush Suri, Kian Kenyon-Dean, Auguste Genovesio, and Emmanuel Noutahi. A cross modal knowledge distillation & data augmentation recipe for improving transcriptomics representations through morphological features. arXiv preprint arXiv:2505.21317, 2025.
- [2] Mark-Anthony Bray, Shantanu Singh, Han Han, Chadwick T Davis, Blake Borgeson, Cathy Hartland, Maria Kost-Alimova, Sigrun M Gustafsdottir, Christopher C Gibson, and Anne E Carpenter. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nature Protocols, 11(9):1757–1774, 2016.
- [3] Mark-Anthony Bray, Sigrun M Gustafsdottir, Mohammad H Rohban, Shantanu Singh, Vebjorn Ljosa, Katherine L Sokolnicki, Joshua A Bittker, Nicole E Bodycombe, Vlado Dančík, Thomas P Hasaka, et al. A dataset of images and morphological profiles of 30,000 small-molecule treatments using the Cell Painting assay. GigaScience, 6(12):giw014, 2017.
- [4] Charlotte Bunne, Yusuf Roohani, Yanay Rosen, Ankit Gupta, Xikun Zhang, Marcel Roed, Theo Alexandrov, Mohammed AlQuraishi, Patricia Brennan, Daniel B Burkhardt, et al. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell, 187(25):7045–7063, 2024.
- [5] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9650–9660, 2021.
- [6] Safiye Celik, Jan-Christian Huetter, Sandra Melo, Nathan Lazar, Rahul Mohan, Conor Tillinghast, Tommaso Biancalani, Marta Fay, Berton Earnshaw, and Imran S Haque. Biological cartography: Building and benchmarking representations of life. In NeurIPS 2022 Workshop on Learning Meaningful Representations of Life, 2022.
- [7] Srinivas Niranj Chandrasekaran, Beth A Cimini, Amy Goodale, Lisa Miller, Maria Kost-Alimova, Nasim Jamali, John G Doench, Briana Fritchman, Adam Skepner, Michelle Melanson, et al. Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations. Nature Methods, 21(6):1114–1121, 2024.
- [8] Jiayuan Chen, Thai-Hoang Pham, Yuanlong Wang, and Ping Zhang. Integrating biological knowledge for robust microscopy image profiling on de novo cell lines. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 22846–22856, 2025.
- [9] Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- [10] Philip Fradkin, Puria Azadi Moghadam, Karush Suri, Frederik Wenkel, Ali Bashashati, Maciej Sypetkowski, and Dominique Beaini. How molecules impact cells: Unlocking contrastive phenomolecular retrieval. Advances in Neural Information Processing Systems, 37:110667–110701, 2024.
- [11] Minsheng Hao, Jing Gong, Xin Zeng, Chiming Liu, Yucheng Guo, Xingyi Cheng, Taifeng Wang, Jianzhu Ma, Xuegong Zhang, and Le Song. Large-scale foundation model on single-cell transcriptomics. Nature Methods, 21(8):1481–1491, 2024.
- [12] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.
- [13] Kian Kenyon-Dean, Zitong Jerry Wang, John Urbanik, Konstantin Donhauser, Jason Hartford, Saber Saberian, Nil Sahin, Ihab Bendidi, Safiye Celik, Marta Fay, et al. Vitally consistent: Scaling biological representation learning for cell microscopy. arXiv preprint arXiv:2411.02572, 2024.
- [14] Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, et al. Masked autoencoders for microscopy are scalable learners of cellular biology. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11757–11768, 2024.
- [15] Oren Kraus, Federico Comitani, John Urbanik, Kian Kenyon-Dean, Lakshmanan Arumugam, Saber Saberian, Cas Wognum, Safiye Celik, and Imran S Haque. RxRx3-core: Benchmarking drug-target interactions in high-content microscopy. arXiv preprint arXiv:2503.20158, 2025.
- [16] Gang Liu, Srijit Seal, John Arevalo, Zhenwen Liang, Anne E Carpenter, Meng Jiang, and Shantanu Singh. Learning molecular representation in a cell. arXiv preprint, 2024.
- [17] Nikita Moshkov, Michael Bornholdt, Santiago Benoit, Matthew Smith, Claire McQuin, Allen Goodman, Rebecca A Senft, Yu Han, Mehrtash Babadi, Peter Horvath, et al. Learning representations for image-based profiling of perturbations. Nature Communications, 15(1):1594, 2024.
- [18] Zeinab Navidi, Jun Ma, Esteban A Miglietta, Le Liu, Anne E Carpenter, Beth A Cimini, Benjamin Haibe-Kains, and Bo Wang. MorphoDiff: Cellular morphology painting with diffusion models. bioRxiv, 2024.
- [19] Stefan Peidli, Tessa D Green, Ciyue Shen, Torsten Gross, Joseph Min, Samuele Garda, Bo Yuan, Linus J Schumacher, Jake P Taylor-King, Debora S Marks, et al. scPerturb: harmonized single-cell perturbation data. Nature Methods, 21(3):531–540, 2024.
- [20] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
- [21] Jiahua Rao, Hanjing Lin, Leyu Chen, Jiancong Xie, Shuangjia Zheng, and Yuedong Yang. Multi-modal contrastive learning with negative sampling calibration for phenotypic drug discovery. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 30752–30762, 2025.
- [22] Joseph M Replogle, Reuben A Saunders, Angela N Pogson, Jeffrey A Hussmann, Alexander Lenail, Alina Guna, Lauren Mascibroda, Eric J Wagner, Karen Adelman, Gila Lithwick-Yanai, et al. Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq. Cell, 185(14):2559–2575, 2022.
- [23] David Rogers and Mathew Hahn. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5):742–754, 2010.
- [24] Jennifer E Rood, Anna Hupalowska, and Aviv Regev. Toward a foundation model of causal cell and tissue biology with a perturbation cell and tissue atlas. Cell, 187(17):4520–4545.
- [25] Ana Sanchez-Fernandez, Elisabeth Rumetshofer, Sepp Hochreiter, and Günter Klambauer. CLOOME: contrastive learning unlocks bioimaging databases for queries with chemical structures. Nature Communications, 14(1):7339, 2023.
- [26] Srinivasan Sivanandan, Bobby Leitmann, Eric Lubeck, Mohammad Muneeb Sultan, Panagiotis Stanitsas, Navpreet Ranu, Alexis Ewer, Jordan E Mancuso, Zachary F Phillips, Albert Kim, et al. A pooled Cell Painting CRISPR screening platform enables de novo inference of gene function by self-supervised deep learning. bioRxiv, 2023.
- [27] Aravind Subramanian, Rajiv Narayan, Steven M Corsello, David D Peck, Ted E Natoli, Xiaodong Lu, Joshua Gould, John F Davis, Andrew A Tubelli, Jacob K Asiedu, et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell, 171(6):1437–1452, 2017.
- [28] Christina V Theodoris, Ling Xiao, Anant Chopra, Mark D Chaffin, Zeina R Al Sayed, Matthew C Hill, Helene Mantineo, Elizabeth M Brydon, Zexian Zeng, X Shirley Liu, et al. Transfer learning enables predictions in network biology. Nature, 618(7965):616–624, 2023.
- [29] Chenyu Wang, Sharut Gupta, Caroline Uhler, and Tommi Jaakkola. Removing biases from molecular representations via information maximization. arXiv preprint arXiv:2312.00718, 2023.
- [30] Gregory P Way, Ted Natoli, Adeniyi Adeboye, Lev Litichevskiy, Andrew Yang, Xiaodong Lu, Juan C Caicedo, Beth A Cimini, Kyle Karhohs, David J Logan, et al. Morphology and gene expression profiling provide complementary information for mapping cell state. Cell Systems, 13(11):911–923, 2022.
- [31] Hengshi Yu, Weizhou Qian, Yuxuan Song, and Joshua D Welch. PerturbNet predicts single-cell responses to unseen chemical and genetic perturbations. Molecular Systems Biology, 21(8):960–982, 2025.
- [32] Jesse Zhang, Airol A Ubas, Richard de Borja, Valentine Svensson, Nicole Thomas, Neha Thakar, Ian Lai, Aidan Winters, Umair Khan, Matthew G Jones, et al. Tahoe-100M: A giga-scale single-cell perturbation atlas for context-dependent gene function and cellular modeling. bioRxiv, 2025.
- [33] Yuhui Zhang, Yuchang Su, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, et al. CellFlux: Simulating cellular morphology changes via flow matching. arXiv preprint arXiv:2502.09775, 2025.
- [34] Shuangjia Zheng, Jiahua Rao, Jixian Zhang, Lianyu Zhou, Jiancong Xie, Ethan Cohen, Wei Lu, Chengtao Li, and Yuedong Yang. Cross-modal graph contrastive learning with cellular images. Advanced Science, 11(32):2404845, 2024.