Closing the Domain Gap in Biomedical Imaging by In-Context Control Samples

Ana Sanchez-Fernandez; Günter Klambauer; Thomas Pinetz; Werner Zellinger

arxiv: 2604.20824 · v1 · submitted 2026-04-22 · 💻 cs.LG · q-bio.QM

Pith reviewed 2026-05-10 00:02 UTC · model grok-4.3

keywords: batch effects, domain adaptation, meta-learning, biomedical imaging, mechanism of action classification, control samples, drug discovery

The pith

Negative control samples present in every batch let meta-learning adapt models to new experimental conditions and close the domain gap in biomedical imaging.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Batch effects are technical variations that cause deep learning models trained on one set of experimental batches to lose accuracy on new ones: on mechanism-of-action classification, accuracy drops from 0.939 to 0.862. The paper introduces a meta-learning method that treats the negative control samples included by design in every batch as stable, unperturbed references for in-context adaptation. This approach raises accuracy on unseen batches to 0.935, while foundation models and standard normalization techniques fail to close the gap. The result indicates that principled use of built-in controls can neutralize batch effects without additional data collection.

Core claim

Meta-learning approaches that exploit negative control samples as in-context references close the domain gap caused by batch effects, achieving 0.935 ± 0.018 accuracy on new experimental batches in large-scale mechanism-of-action classification, compared with a drop to 0.862 ± 0.060 for standard ResNets.

What carries the argument

Control-Stabilized Adaptive Risk Minimization via Batch Normalization (CS-ARM-BN), a meta-learning adaptation method that uses unperturbed negative control samples present in every batch as stable context for adaptation via batch normalization.
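One way to picture the adaptation step is batch normalization whose statistics come from a context set drawn from the new batch. The sketch below is illustrative only: the function name `context_batchnorm`, the synthetic data, and the choice to pool controls with perturbed samples into one context are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def context_batchnorm(features, context, eps=1e-5):
    """Normalize `features` with per-feature mean/variance estimated from `context`.

    Hypothetical helper: it mimics what a batch-normalization layer does when
    its statistics are recomputed on samples from the target batch.
    """
    mean = context.mean(axis=0)
    var = context.var(axis=0)
    return (features - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Synthetic batch effect (assumption): every feature is scaled and shifted.
shift, scale = 3.0, 2.0
controls = rng.normal(size=(512, 8)) * scale + shift        # unperturbed references
treated = (rng.normal(size=(4, 8)) + 1.0) * scale + shift   # only a few perturbed samples

# Assumed context: controls pooled with the perturbed samples of the same batch.
context = np.concatenate([controls, treated], axis=0)
normalized = context_batchnorm(treated, context)
print(normalized.shape)
```

With only four perturbed samples, statistics estimated from those samples alone would be noisy. Pooling in the controls gives many more samples to estimate from, which is the kind of stabilization the Figure 3 caption describes for small numbers of perturbed samples.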

If this is right

Models become practically usable on data from different labs or strong domain shifts when control samples are available for stabilization.
Batch effects in bioimaging can be neutralized through in-context adaptation rather than requiring new data or retraining.
Drug discovery pipelines that rely on mechanism-of-action classification gain reliability across experimental batches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Experiments in other scientific domains that routinely include negative controls could adopt similar in-context adaptation to improve model robustness.
The availability of controls by experimental design may prove more important for reliable machine learning than post-hoc normalization techniques.
This suggests testing whether the same control-based adaptation improves performance on other batch-sensitive tasks such as cell segmentation or image registration.

Load-bearing premise

Negative control samples are present in every experimental batch by design and serve as stable, unperturbed context for adaptation without introducing their own bias or domain shift.

What would settle it

An experiment on new batches where removing or perturbing the negative control samples causes accuracy to fall back to the 0.862 level of standard models would show the adaptation method does not close the gap.

Figures

Figures reproduced from arXiv: 2604.20824 by Ana Sanchez-Fernandez, Günter Klambauer, Thomas Pinetz, Werner Zellinger.

**Figure 1.** Performance of MoA classifier on JUMP-CP data. Error bars represent variance across five cross-validation folds. Green bar: within the training domain, the performance of the classifier is high. Orange bars: The performance of the classifier on images from new experimental batches ("new domain"). Even foundation models with normalization (FM+TVN) suffer performance declines, and domain adaptation methods…

**Figure 2.** Representation of batch effects in microscopy imaging data considered as a multi-source domain adaptation (MSDA) problem. In this setting, each source consists of different experimental conditions, e.g. different plates. In each domain, an image of the same class is depicted, which represents a particular mechanism-of-action (MoA). Control samples are unperturbed samples that are present in every domain, …

**Figure 3.** Graphic representation of our method, CS-ARM-BN, and comparison to ARM-BN (Zhang et al., 2021). Both are meta-learning methods that are modified at test time by using the BN statistics from the target domain (lilac). CS-ARM-BN uses control samples both at training and at inference time, which provides stability when (b) the number of perturbed samples is small or (c) the label distribution is shifted. F…

Original abstract

The central problem in biomedical imaging is batch effects: systematic technical variations unrelated to the biological signal of interest. These batch effects critically undermine experimental reproducibility and are the primary cause of failure of deep learning systems on new experimental batches, preventing their practical use in the real world. Despite years of research, no method has succeeded in closing this performance gap for deep learning models. We propose Control-Stabilized Adaptive Risk Minimization via Batch Normalization (CS-ARM-BN), a meta-learning adaptation method that exploits negative control samples. Such unperturbed reference images are present in every experimental batch by design and serve as stable context for adaptation. We validate our novel method on Mechanism-of-Action (MoA) classification, a crucial task for drug discovery, on the large-scale JUMP-CP dataset. The accuracy of standard ResNets drops from 0.939 ± 0.005 on the training domain to 0.862 ± 0.060 on data from new experimental batches. Foundation models, even after Typical Variation Normalization, fail to close this gap. We are the first to show that meta-learning approaches close the domain gap by achieving 0.935 ± 0.018. If the new experimental batches exhibit strong domain shifts, such as being generated in a different lab, meta-learning approaches can be stabilized with control samples, which are always available in biomedical experiments. Our work shows that batch effects in bioimaging data can be effectively neutralized through principled in-context adaptation, which also makes them practically usable and efficient.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

They recover most of the accuracy drop on new JUMP-CP batches with meta-learning on negative controls, but the method's success hinges on an unverified claim that those controls stay stable across batches.

The main point is that standard ResNets lose about 8 points of accuracy on new experimental batches for Mechanism-of-Action classification, and this CS-ARM-BN approach brings it back to within 0.4 points of the in-domain number using in-context adaptation on the negative controls that are already in every batch. That is a practical result on a large public dataset, and it beats the foundation-model baselines they tried after Typical Variation Normalization.

The idea of treating the controls as always-available, unperturbed references for meta-learning adaptation is the concrete new piece here, and it fits the experimental design of high-throughput bioimaging without needing extra data collection. They report means and standard deviations, which is better than many papers in this area. The work is grounded in the actual constraints of drug-discovery imaging rather than abstract domain-adaptation theory.

The soft spot is exactly the one the stress test flags: the method treats the controls as domain-invariant anchors, yet the abstract gives no quantitative check that their intensity distributions or embeddings are actually stable between training and test batches. If the controls carry residual batch effects, the adaptation could be learning to match those instead of neutralizing them, which would change the interpretation. The paper would be stronger with an explicit invariance test or ablation on control quality. Prior-art coverage also needs to be thorough in the full text to support the “first to show” phrasing.

This is for researchers who run or analyze high-content screening data and already know batch effects are the blocker. Anyone working on practical meta-learning or in-context adaptation in imaging will find the setup useful. It is worth sending to peer review because the problem is real, the dataset is standard, and the numbers are reported with error bars; a referee can push on the control-stability check and the comparison to other adaptation baselines.
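The "about 8 points" and "within 0.4 points" figures follow directly from the accuracies quoted in the abstract. A short calculation makes the arithmetic explicit; the variable names are illustrative, and the "fraction recovered" quantity is a derived summary that the paper itself does not report.

```python
# Accuracies as quoted in the abstract (means only; standard deviations omitted).
in_domain = 0.939    # standard ResNet, training domain
resnet_new = 0.862   # standard ResNet, new experimental batches
meta_new = 0.935     # meta-learning approach, new experimental batches

gap = in_domain - resnet_new                  # accuracy lost on new batches
residual = in_domain - meta_new               # gap remaining after adaptation
recovered = (meta_new - resnet_new) / gap     # fraction of the gap closed

print(f"domain gap:      {gap * 100:.1f} points")       # 7.7 points
print(f"residual gap:    {residual * 100:.1f} points")  # 0.4 points
print(f"fraction closed: {recovered:.1%}")              # 94.8%
```

Because the standard deviation on new batches (0.018) is larger than the 0.4-point residual, the remaining difference is small relative to the reported fold-to-fold variation.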

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Control-Stabilized Adaptive Risk Minimization via Batch Normalization (CS-ARM-BN), a meta-learning adaptation technique that exploits negative control samples present by design in every experimental batch as stable, unperturbed in-context references to neutralize batch effects in biomedical imaging. Evaluated on Mechanism-of-Action classification using the large-scale JUMP-CP dataset, it reports that standard ResNets drop from 0.939 ± 0.005 accuracy on the training domain to 0.862 ± 0.060 on new batches, while CS-ARM-BN achieves 0.935 ± 0.018 and claims to be the first meta-learning approach to close this gap; foundation models with typical variation normalization are shown to fail.

Significance. If the result holds, the work would be significant for machine learning applications in the life sciences, where batch effects are a primary obstacle to deploying models in real experimental pipelines such as drug discovery. The approach is grounded in a domain property (availability of controls) rather than generic domain adaptation, and the concrete accuracy numbers with standard deviations on a named large-scale dataset, together with comparisons to baselines, provide a clear empirical anchor. This could encourage further development of in-context adaptation methods tailored to scientific data modalities.

major comments (1)

[Methods and Experiments] The central claim that meta-learning closes the domain gap rests on negative control samples being free of residual batch effects or bias and serving purely as stable references. No quantitative verification—such as statistical comparison of intensity histograms, feature embeddings, or invariance metrics between controls in training versus new batches—is reported in the methods or experimental sections to support this assumption. If controls carry domain-specific signals, the adaptation could exploit them rather than neutralize batch effects, undermining the interpretation of the 0.935 ± 0.018 result.

minor comments (2)

[Abstract] The abstract states concrete accuracy figures with standard deviations but provides no overview of the meta-learning objective, the precise role of batch normalization in CS-ARM-BN, or the number of experimental batches used; adding a short methods paragraph or table summarizing these would improve readability without altering the claims.
[Introduction] The related-work discussion should explicitly position CS-ARM-BN against prior meta-learning and domain-adaptation techniques applied to bioimaging to clarify the novelty of the control-sample stabilization step.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of the significance of our work and for the constructive major comment. We address it point by point below.

Referee: [Methods and Experiments] The central claim that meta-learning closes the domain gap rests on negative control samples being free of residual batch effects or bias and serving purely as stable references. No quantitative verification—such as statistical comparison of intensity histograms, feature embeddings, or invariance metrics between controls in training versus new batches—is reported in the methods or experimental sections to support this assumption. If controls carry domain-specific signals, the adaptation could exploit them rather than neutralize batch effects, undermining the interpretation of the 0.935 ± 0.018 result.

Authors: We agree that the manuscript does not include explicit quantitative verification of the stability of negative control samples (e.g., intensity histogram comparisons, embedding invariances, or other metrics) across training and new batches. This assumption is grounded in the JUMP-CP experimental design, where negative controls are included by construction as unperturbed references. However, to directly address the concern and strengthen the interpretation of our results, we will add a dedicated analysis subsection in the revised methods and experiments. This will report statistical comparisons of controls between domains, including histogram distances and feature embedding metrics, to confirm they do not carry exploitable domain-specific signals. We believe this addition will support that the performance gain to 0.935 ± 0.018 arises from batch-effect neutralization via in-context adaptation.

Revision: yes
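A control-stability check of the kind discussed in this exchange could start from a simple distributional distance between control measurements in two batches. The sketch below is a hedged illustration: the Wasserstein-1 distance, the quantile-grid approximation, the helper name `wasserstein_1d`, and the synthetic intensities are all choices made for this example, not the analysis the authors commit to.

```python
import numpy as np

def wasserstein_1d(a, b, n_quantiles=101):
    """Approximate 1-D Wasserstein-1 distance by comparing quantile functions.

    Hypothetical helper: averages |Q_a(q) - Q_b(q)| over an evenly spaced
    quantile grid, so the two samples may have different sizes.
    """
    q = np.linspace(0.0, 1.0, n_quantiles)
    return float(np.mean(np.abs(np.quantile(a, q) - np.quantile(b, q))))

rng = np.random.default_rng(0)

# Synthetic stand-ins for one summary feature of control images (assumption).
train_controls = rng.normal(0.0, 1.0, size=2000)
new_controls = rng.normal(0.5, 1.0, size=1500)   # simulated residual batch shift

# Distance between batches, compared with a within-batch baseline
# obtained by splitting the training controls in half.
between = wasserstein_1d(train_controls, new_controls)
within = wasserstein_1d(train_controls[:1000], train_controls[1000:])

print(f"between-batch distance: {between:.3f}")
print(f"within-batch baseline:  {within:.3f}")
```

A between-batch distance well above the within-batch baseline would indicate that the controls themselves carry batch-specific signal, which is the scenario the referee asks the authors to rule out or characterize.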

Circularity Check

0 steps flagged

No significant circularity; empirical results stand independently of self-referential definitions

The paper introduces CS-ARM-BN as a meta-learning adaptation technique that uses negative control samples (present by design in every batch) as stable in-context references for batch normalization. The headline performance claim (0.935 ± 0.018 accuracy closing the domain gap) is presented as an empirical outcome on the JUMP-CP dataset, with direct comparisons to ResNet baselines (dropping to 0.862) and foundation models. No equations, parameter-fitting steps, or derivations are described that would reduce the reported prediction to a fitted input or self-defined quantity. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text. The assumption that controls introduce no bias is asserted from experimental design rather than derived circularly from the method itself. The result is therefore self-contained against the stated benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain property that control samples exist in every batch and can be used for stable adaptation.

axioms (1)

domain assumption: Negative control samples are unperturbed reference images present in every experimental batch by design.
Stated directly in the abstract as the basis for in-context adaptation.

pith-pipeline@v0.9.0 · 5596 in / 1056 out tokens · 36662 ms · 2026-05-10T00:02:51.914285+00:00 · methodology

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages

[1] Ando, D. M., McLean, C. Y., and Berndl, M. (2017). Improving phenotypic measurements in high-content imaging screens. bioRxiv, page 161422.

[2] Arevalo, J., Su, E., Ewald, J. D., van Dijk, R., Carpenter, A. E., and Singh, S. (2024). Evaluating batch correction methods for image-based cell profiling. Nature Communications 2024 15:1, 15:1–12.

[3] Baxter, J. (1998). Theoretical models of learning to learn. In Learning to learn, pages 71–94. Springer.

[4] Ben-David, S., Blitzer, J., Crammer, K., and Pereira, F. (2006). Analysis of representations for domain adaptation. Advances in neural information processing systems, 19.

[5] Blanchard, G., Lee, G., and Scott, C. (2011). Generalizing from several related classification tasks to a new unlabeled sample. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K., editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc.

[6] Boudiaf, M., Mueller, R., Ayed, I. B., and Bertinetto, L. (2022). Parameter-free online test-time adaptation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022-June:8334–8343.

[7] Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., and Krishnan, D. (2017). Unsupervised pixel-level domain adaptation with generative adversarial networks. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017-January:95–104.

[8] Bray, M.-A., Carpenter, A., of MIT, B. I., and Platform, H. I. (2017). Advanced assay development guidelines for image-based high content screening and analysis. Assay Guidance Manual.

[9] Chandrasekaran, S. N., Ackerman, J., Alix, E., Ando, D. M., Arevalo, J., Bennion, M., Boisseau, N., Borowa, A., Boyd, J. D., Brino, L., Byrne, P. J., Ceulemans, H., Ch’ng, C., Cimini, B. A., Clevert, D.-A., Deflaux, N., Doench, J. G., Dorval, T., Doyonnas, R., Dragone, V., Engkvist, O., Faloon, P. W., Fritchman, B., Fuchs, F., Garg, S., Gilbert, T. J., Gl... (2023).

[10] Chen, W., Zhao, Y., Chen, X., Yang, Z., Xu, X., Bi, Y., Chen, V., Li, J., Choi, H., Ernest, B., Tran, B., Mehta, M., Kumar, P., Farmer, A., Mir, A., Mehra, U. A., Li, J. L., Moos, M., Xiao, W., and Wang, C. (2020). A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples. Nature Biotechnology 2020 39:9, 39:1103–1114.

[11] Chung, J., Hyun, S., and Heo, J. P. (2024). Style injection in diffusion: A training-free approach for adapting large-scale diffusion models for style transfer. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 8795–8805.

[12] Dong, M., Adduri, A., Gautam, D., Carpenter, C., Shah, R., Ricci-Tam, C., Kluger, Y., Burke, D. P., and Roohani, Y. H. (2026). Stack: In-context learning of single-cell biology. bioRxiv.

[13] Farahani, A., Voghoei, S., Rasheed, K., and Arabnia, H. R. (2020). A brief review of domain adaptation. ArXiv, pages 877–894.

[14] Finn, C., Abbeel, P., and Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pages 1126–1135. PMLR.

[15] Ganin, Y. and Lempitsky, V. (2014). Unsupervised domain adaptation by backpropagation. 32nd International Conference on Machine Learning, ICML 2015, 2:1180–1189.

[16] Gong, T., Jeong, J., Kim, T., Kim, Y., Shin, J., and Lee, S. J. (2022). Note: Robust continual test-time adaptation against temporal correlation. Advances in Neural Information Processing Systems, 35.

[17] Graham, B., El-Nouby, A., Touvron, H., Stock, P., Joulin, A., Jégou, H., and Douze, M. (2021). Levit: a vision transformer in convnet's clothing for faster inference. Proceedings of the IEEE International Conference on Computer Vision, pages 12239–12249.

[18] Guo, J., Shah, D. J., and Barzilay, R. (2018). Multi-source domain adaptation with mixture of experts. arXiv preprint arXiv:1809.02256.

[19] Haghverdi, L., Lun, A. T., Morgan, M. D., and Marioni, J. C. (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nature Biotechnology 2018 36:5, 36:421–427.

[20] Haslum, J. F., Matsoukas, C., Leuchowius, K. J., and Smith, K. (2023). Bridging generalization gaps in high content imaging through online self-supervised domain adaptation. IEEE Workshop/Winter Conference on Applications of Computer Vision, pages 7723–7732.

[21] He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep residual learning for image recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December:770–778.

[22] Hie, B., Bryson, B., and Berger, B. (2019). Efficient integration of heterogeneous single-cell transcriptomes using scanorama. Nature biotechnology, 37:685–691.

[23] Hochreiter, S., Younger, A. S., and Conwell, P. R. (2001). Learning to learn using gradient descent. In International conference on artificial neural networks, pages 87–94. Springer.

[24] Hughes, J. P., Rees, S. S., Kalindjian, S. B., and Philpott, K. L. (2011). Principles of early drug discovery. British Journal of Pharmacology, 162:1239.

[25] Ioffe, S. and Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML'15, page 448–456. JMLR.org.

[26] Johnson, W. E., Li, C., and Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical bayes methods. Biostatistics (Oxford, England), 8:118–127.

[27] Kim, V., Adaloglou, N., Osterland, M., Morelli, F. M., Halawa, M., König, T., Gnutt, D., and Zapata, P. A. M. (2025). Self-supervision advances morphological profiling by unlocking powerful image representations. Scientific Reports 2025 15:1, 15:1–15.

[28] Kimura, M. and Hino, H. (2024). A short survey on importance weighting for machine learning. arXiv preprint arXiv:2403.10175.

[29] Knowles, J. and Gromo, G. (2003). A guide to drug discovery: Target selection in drug discovery. Nature reviews. Drug discovery, 2:63–69.

[30] Korsunsky, I., Millard, N., Fan, J., Slowikowski, K., Zhang, F., Wei, K., Baglaenko, Y., Brenner, M., ru Loh, P., and Raychaudhuri, S. (2019). Fast, sensitive and accurate integration of single-cell data with harmony. Nature Methods 2019 16:12, 16:1289–1296.

[31] Kraus, O., Kenyon-Dean, K., Saberian, S., Fallah, M., McLean, P., Leung, J., Sharma, V., Khan, A., Balakrishnan, J., Celik, S., Beaini, D., Sypetkowski, M., Cheng, C. V., Morse, K., Makes, M., Mabey, B., and Earnshaw, B. (2024). Masked autoencoders for microscopy are scalable learners of cellular biology. Proceedings of the IEEE Computer Society Conferenc...

[32] Lee, J., Jung, D., Lee, S., Park, J., Shin, J., Hwang, U., and Yoon, S. (2024). Entropy is not enough for test-time adaptation: From the perspective of disentangled factors. 12th International Conference on Learning Representations, ICLR 2024.

[33] Leek, J. T., Scharpf, R. B., Bravo, H. C., Simcha, D., Langmead, B., Johnson, W. E., Geman, D., Baggerly, K., and Irizarry, R. A. (2010). Tackling the widespread and critical impact of batch effects in high-throughput data. Nature reviews. Genetics, 11:10.1038/nrg2825.

[34] Li, Y., Wang, N., Shi, J., Liu, J., and Hou, X. (2016). Revisiting batch normalization for practical domain adaptation. International Conference on Learning Representations.

[35] Lin, A. and Lu, A. (2022). Incorporating knowledge of plates in batch normalization improves generalization of deep learning for microscopy images. In Knowles, D. A., Mostafavi, S., and Lee, S.-I., editors, Proceedings of the 17th Machine Learning in Computational Biology meeting, volume 200 of Proceedings of Machine Learning Research, pages 74–93. PMLR.

[36] Liu, M.-Y. and Tuzel, O. (2016). Coupled generative adversarial networks. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc.

[37] Lopez, R., Regier, J., Cole, M. B., Jordan, M. I., and Yosef, N. (2018). Deep generative modeling for single-cell transcriptomics. Nature methods, 15:1053–1058.

[38] Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M., and Theis, F. J. (2021). Benchmarking atlas-level data integration in single-cell genomics. Nature Methods 2021 19:1, 19:41–50.

[39] Marsden, R. A., Döbler, M., and Yang, B. (2023). Universal test-time adaptation through weight ensembling, diversity weighting, and prior correction. Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024, pages 2543–2553.

[40] Palma, A., Theis, F. J., and Lotfollahi, M. (2025). Predicting cell morphological responses to perturbations using generative modeling. Nature Communications, 16:1–19.

[41] Park, S., Yang, S., Choo, J., and Yun, S. (2023). Label shift adapter for test-time adaptation under covariate and label shifts. Proceedings of the IEEE International Conference on Computer Vision, pages 16375–16385.

[42] Peidli, S., Green, T. D., Shen, C., Gross, T., Min, J., Garda, S., Yuan, B., Schumacher, L. J., Taylor-King, J. P., Marks, D. S., Luna, A., Blüthgen, N., and Sander, C. (2023). scPerturb: Harmonized single-cell perturbation data. bioRxiv, page 2022.08.20.504663.

[43] Peng, X., Bai, Q., Xia, X., Huang, Z., Saenko, K., and Wang, B. (2019). Moment matching for multi-source domain adaptation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1406–1415.

[44] Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of statistical planning and inference, 90(2):227–244.

[45] Stirling, D. R., Swain-Bowden, M. J., Lucas, A. M., Carpenter, A. E., Cimini, B. A., and Goodman, A. (2021). Cellprofiler 4: improvements in speed, utility and usability. BMC Bioinformatics, 22:433.

[46] Stuart, T., Butler, A., Hoffman, P., Hafemeister, C., Papalexi, E., Mauck, W. M., Hao, Y., Stoeckius, M., Smibert, P., and Satija, R. (2019). Comprehensive integration of single-cell data. Cell, 177:1888–1902.e21.

[47] Sun, B., Feng, J., and Saenko, K. (2015). Return of frustratingly easy domain adaptation. 30th AAAI Conference on Artificial Intelligence, AAAI 2016, pages 2058–2065.

[48] Sypetkowski, M., Rezanejad, M., Saberian, S., Kraus, O., Urbanik, J., Taylor, J., Mabey, B., Victors, M., Yosinski, J., Sereshkeh, A. R., Haque, I., and Earnshaw, B. (2023). RxRx1: A dataset for evaluating experimental batch correction methods. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2023-June:4285–4294.

[49] Taigman, Y., Polyak, A., and Wolf, L. (2016). Unsupervised cross-domain image generation. ArXiv, abs/1611.02200.

[50] Venkat, N., Kundu, J. N., Singh, D., Revanur, A., et al. (2020). Your classifier can secretly suffice multi-source domain adaptation. Advances in Neural Information Processing Systems, 33:4647–4659.

[51] Wang, D., Shelhamer, E., Liu, S., Olshausen, B., and Darrell, T. (2021). TENT: Fully test-time adaptation by entropy minimization. In International Conference on Learning Representations.

[52] Wang, Q., Fink, O., Gool, L. V., and Dai, D. (2022). Continual test-time domain adaptation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022-June:7191–7201.

[53] Wen, J., Greiner, R., and Schuurmans, D. (2020). Domain aggregation networks for multi-source domain adaptation. In International conference on machine learning, pages 10214–10224. PMLR.

[54] Yang, L., Balaji, Y., Lim, S.-N., and Shrivastava, A. (2020). Curriculum manager for source selection in multi-source domain adaptation. In European conference on computer vision, pages 608–624. Springer.

[55] Zellinger, W., Grubinger, T., Lughofer, E., Natschläger, T., and Saminger-Platz, S. (2017). Central moment discrepancy (cmd) for domain-invariant representation learning. In International Conference on Learning Representations (ICLR).

[56] Zhang, J., Ubas, A. A., de Borja, R., Svensson, V., Thomas, N., Thakar, N., Lai, I., Winters, A., Khan, U., Jones, M. G., Tran, V., Pangallo, J., Papalexi, E., Sapre, A., Nguyen, H., Sanderson, O., Nigos, M., Kaplan, O., Schroeder, S., Hariadi, B., Marrujo, S., Salvino, C. C. A., Gallareta Olivares, G., Koehler, R., Geiss, G., Rosenberg, A., Roco, C., Mer... (2025).

[57] Zhang, M., Marklund, H., Dhawan, N., Gupta, A., Levine, S., and Finn, C. (2021). Adaptive Risk Minimization: learning to adapt to domain shift. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NeurIPS '21, Red Hook, NY, USA. Curran Associates Inc.

[58] Zhao, H., Liu, Y., Alahi, A., and Lin, T. (2023). On pitfalls of test-time adaptation. Proceedings of Machine Learning Research, 202:42058–42080.

[59] Zhao, H., Zhang, S., Wu, G., Moura, J. M., Costeira, J. P., and Gordon, G. J. (2018). Adversarial Multiple Source Domain Adaptation. Advances in neural information processing systems, 31.

[1] [1]

M., McLean, C

Ando, D. M., McLean, C. Y., and Berndl, M. (2017). Improving phenotypic measurements in high-content imaging screens. bioRxiv , page 161422

work page 2017

[2] [2]

D., van Dijk, R., Carpenter, A

Arevalo, J., Su, E., Ewald, J. D., van Dijk, R., Carpenter, A. E., and Singh, S. (2024). Evaluating batch correction methods for image-based cell profiling. Nature Communications 2024 15:1 , 15:1--12

work page 2024

[3] [3]

Baxter, J. (1998). Theoretical models of learning to learn. In Learning to learn , pages 71--94. Springer

work page 1998

[4] [4]

Ben-David, S., Blitzer, J., Crammer, K., and Pereira, F. (2006). Analysis of representations for domain adaptation. Advances in neural information processing systems , 19

work page 2006

[5] [5]

Blanchard, G., Lee, G., and Scott, C. (2011). Generalizing from several related classification tasks to a new unlabeled sample. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K., editors, Advances in Neural Information Processing Systems , volume 24. Curran Associates, Inc

work page 2011

[6]

Boudiaf, M., Mueller, R., Ayed, I. B., and Bertinetto, L. (2022). Parameter-free online test-time adaptation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022-June:8334--8343.

[7]

Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., and Krishnan, D. (2017). Unsupervised pixel-level domain adaptation with generative adversarial networks. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017-January:95--104.

[8]

Bray, M.-A., Carpenter, A., and Imaging Platform, Broad Institute of MIT and Harvard (2017). Advanced assay development guidelines for image-based high content screening and analysis. Assay Guidance Manual.

[9]

Chandrasekaran, S. N., Ackerman, J., Alix, E., Ando, D. M., Arevalo, J., Bennion, M., Boisseau, N., Borowa, A., Boyd, J. D., Brino, L., Byrne, P. J., Ceulemans, H., Ch’ng, C., Cimini, B. A., Clevert, D.-A., Deflaux, N., Doench, J. G., Dorval, T., Doyonnas, R., Dragone, V., Engkvist, O., Faloon, P. W., Fritchman, B., Fuchs, F., Garg, S., Gilbert, T. J., Gl...

[10]

Chen, W., Zhao, Y., Chen, X., Yang, Z., Xu, X., Bi, Y., Chen, V., Li, J., Choi, H., Ernest, B., Tran, B., Mehta, M., Kumar, P., Farmer, A., Mir, A., Mehra, U. A., Li, J. L., Moos, M., Xiao, W., and Wang, C. (2020). A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples. Nature Biotechnology, 39:1103--1114.

[11]

Chung, J., Hyun, S., and Heo, J. P. (2024). Style injection in diffusion: A training-free approach for adapting large-scale diffusion models for style transfer. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 8795--8805.

[12]

Dong, M., Adduri, A., Gautam, D., Carpenter, C., Shah, R., Ricci-Tam, C., Kluger, Y., Burke, D. P., and Roohani, Y. H. (2026). Stack: In-context learning of single-cell biology. bioRxiv.

[13]

Farahani, A., Voghoei, S., Rasheed, K., and Arabnia, H. R. (2020). A brief review of domain adaptation. ArXiv, pages 877--894.

[14]

Finn, C., Abbeel, P., and Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, pages 1126--1135. PMLR.

[15]

Ganin, Y. and Lempitsky, V. (2014). Unsupervised domain adaptation by backpropagation. 32nd International Conference on Machine Learning, ICML 2015, 2:1180--1189.

[16]

Gong, T., Jeong, J., Kim, T., Kim, Y., Shin, J., and Lee, S. J. (2022). NOTE: Robust continual test-time adaptation against temporal correlation. Advances in Neural Information Processing Systems, 35.

[17]

Graham, B., El-Nouby, A., Touvron, H., Stock, P., Joulin, A., Jégou, H., and Douze, M. (2021). LeViT: a vision transformer in convnet's clothing for faster inference. Proceedings of the IEEE International Conference on Computer Vision, pages 12239--12249.

[18]

Guo, J., Shah, D. J., and Barzilay, R. (2018). Multi-source domain adaptation with mixture of experts. arXiv preprint arXiv:1809.02256.

[19]

Haghverdi, L., Lun, A. T., Morgan, M. D., and Marioni, J. C. (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nature Biotechnology, 36:421--427.

[20]

Haslum, J. F., Matsoukas, C., Leuchowius, K. J., and Smith, K. (2023). Bridging generalization gaps in high content imaging through online self-supervised domain adaptation. IEEE Workshop/Winter Conference on Applications of Computer Vision, pages 7723--7732.

[21]

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep residual learning for image recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December:770--778.

[22]

Hie, B., Bryson, B., and Berger, B. (2019). Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nature Biotechnology, 37:685--691.

[23]

Hochreiter, S., Younger, A. S., and Conwell, P. R. (2001). Learning to learn using gradient descent. In International Conference on Artificial Neural Networks, pages 87--94. Springer.

[24]

Hughes, J. P., Rees, S. S., Kalindjian, S. B., and Philpott, K. L. (2011). Principles of early drug discovery. British Journal of Pharmacology, 162:1239.

[25]

Ioffe, S. and Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML'15, pages 448--456. JMLR.org.

[26]

Johnson, W. E., Li, C., and Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics (Oxford, England), 8:118--127.

[27]

Kim, V., Adaloglou, N., Osterland, M., Morelli, F. M., Halawa, M., König, T., Gnutt, D., and Zapata, P. A. M. (2025). Self-supervision advances morphological profiling by unlocking powerful image representations. Scientific Reports, 15:1--15.

[28]

Kimura, M. and Hino, H. (2024). A short survey on importance weighting for machine learning. arXiv preprint arXiv:2403.10175.

[29]

Knowles, J. and Gromo, G. (2003). A guide to drug discovery: Target selection in drug discovery. Nature Reviews Drug Discovery, 2:63--69.

[30]

Korsunsky, I., Millard, N., Fan, J., Slowikowski, K., Zhang, F., Wei, K., Baglaenko, Y., Brenner, M., Loh, P.-R., and Raychaudhuri, S. (2019). Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods, 16:1289--1296.

[31]

Kraus, O., Kenyon-Dean, K., Saberian, S., Fallah, M., McLean, P., Leung, J., Sharma, V., Khan, A., Balakrishnan, J., Celik, S., Beaini, D., Sypetkowski, M., Cheng, C. V., Morse, K., Makes, M., Mabey, B., and Earnshaw, B. (2024). Masked autoencoders for microscopy are scalable learners of cellular biology. Proceedings of the IEEE Computer Society Conferenc...

[32]

Lee, J., Jung, D., Lee, S., Park, J., Shin, J., Hwang, U., and Yoon, S. (2024). Entropy is not enough for test-time adaptation: From the perspective of disentangled factors. 12th International Conference on Learning Representations, ICLR 2024.

[33]

Leek, J. T., Scharpf, R. B., Bravo, H. C., Simcha, D., Langmead, B., Johnson, W. E., Geman, D., Baggerly, K., and Irizarry, R. A. (2010). Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics, 11. doi:10.1038/nrg2825.

[34]

Li, Y., Wang, N., Shi, J., Liu, J., and Hou, X. (2016). Revisiting batch normalization for practical domain adaptation. International Conference on Learning Representations.

[35]

Lin, A. and Lu, A. (2022). Incorporating knowledge of plates in batch normalization improves generalization of deep learning for microscopy images. In Knowles, D. A., Mostafavi, S., and Lee, S.-I., editors, Proceedings of the 17th Machine Learning in Computational Biology meeting, volume 200 of Proceedings of Machine Learning Research, pages 74--93. PMLR.

[36]

Liu, M.-Y. and Tuzel, O. (2016). Coupled generative adversarial networks. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc.

[37]

Lopez, R., Regier, J., Cole, M. B., Jordan, M. I., and Yosef, N. (2018). Deep generative modeling for single-cell transcriptomics. Nature Methods, 15:1053--1058.

[38]

Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M., and Theis, F. J. (2021). Benchmarking atlas-level data integration in single-cell genomics. Nature Methods, 19:41--50.

[39]

Marsden, R. A., Döbler, M., and Yang, B. (2023). Universal test-time adaptation through weight ensembling, diversity weighting, and prior correction. Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024, pages 2543--2553.

[40]

Palma, A., Theis, F. J., and Lotfollahi, M. (2025). Predicting cell morphological responses to perturbations using generative modeling. Nature Communications, 16:1--19.

[41]

Park, S., Yang, S., Choo, J., and Yun, S. (2023). Label shift adapter for test-time adaptation under covariate and label shifts. Proceedings of the IEEE International Conference on Computer Vision, pages 16375--16385.

[42]

Peidli, S., Green, T. D., Shen, C., Gross, T., Min, J., Garda, S., Yuan, B., Schumacher, L. J., Taylor-King, J. P., Marks, D. S., Luna, A., Blüthgen, N., and Sander, C. (2023). scPerturb: Harmonized single-cell perturbation data. bioRxiv, page 2022.08.20.504663.

[43]

Peng, X., Bai, Q., Xia, X., Huang, Z., Saenko, K., and Wang, B. (2019). Moment matching for multi-source domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1406--1415.

[44]

Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227--244.

[45]

Stirling, D. R., Swain-Bowden, M. J., Lucas, A. M., Carpenter, A. E., Cimini, B. A., and Goodman, A. (2021). CellProfiler 4: improvements in speed, utility and usability. BMC Bioinformatics, 22:433.

[46]

Stuart, T., Butler, A., Hoffman, P., Hafemeister, C., Papalexi, E., Mauck, W. M., Hao, Y., Stoeckius, M., Smibert, P., and Satija, R. (2019). Comprehensive integration of single-cell data. Cell, 177:1888--1902.e21.

[47]

Sun, B., Feng, J., and Saenko, K. (2015). Return of frustratingly easy domain adaptation. 30th AAAI Conference on Artificial Intelligence, AAAI 2016, pages 2058--2065.

[48]

Sypetkowski, M., Rezanejad, M., Saberian, S., Kraus, O., Urbanik, J., Taylor, J., Mabey, B., Victors, M., Yosinski, J., Sereshkeh, A. R., Haque, I., and Earnshaw, B. (2023). RxRx1: A dataset for evaluating experimental batch correction methods. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2023-June:4285--4294.

[49]

Taigman, Y., Polyak, A., and Wolf, L. (2016). Unsupervised cross-domain image generation. ArXiv, abs/1611.02200.

[50]

Venkat, N., Kundu, J. N., Singh, D., Revanur, A., et al. (2020). Your classifier can secretly suffice multi-source domain adaptation. Advances in Neural Information Processing Systems, 33:4647--4659.

[51]

Wang, D., Shelhamer, E., Liu, S., Olshausen, B., and Darrell, T. (2021). TENT: Fully test-time adaptation by entropy minimization. In International Conference on Learning Representations.

[52]

Wang, Q., Fink, O., Gool, L. V., and Dai, D. (2022). Continual test-time domain adaptation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022-June:7191--7201.

[53]

Wen, J., Greiner, R., and Schuurmans, D. (2020). Domain aggregation networks for multi-source domain adaptation. In International Conference on Machine Learning, pages 10214--10224. PMLR.

[54]

Yang, L., Balaji, Y., Lim, S.-N., and Shrivastava, A. (2020). Curriculum manager for source selection in multi-source domain adaptation. In European Conference on Computer Vision, pages 608--624. Springer.

[55]

Zellinger, W., Grubinger, T., Lughofer, E., Natschläger, T., and Saminger-Platz, S. (2017). Central moment discrepancy (CMD) for domain-invariant representation learning. In International Conference on Learning Representations (ICLR).

[56]

Zhang, J., Ubas, A. A., de Borja, R., Svensson, V., Thomas, N., Thakar, N., Lai, I., Winters, A., Khan, U., Jones, M. G., Tran, V., Pangallo, J., Papalexi, E., Sapre, A., Nguyen, H., Sanderson, O., Nigos, M., Kaplan, O., Schroeder, S., Hariadi, B., Marrujo, S., Salvino, C. C. A., Gallareta Olivares, G., Koehler, R., Geiss, G., Rosenberg, A., Roco, C., Mer...

[57] [57]

Zhang, M., Marklund, H., Dhawan, N., Gupta, A., Levine, S., and Finn, C. (2021). Adaptive R isk M inimization: learning to adapt to domain shift. In Proceedings of the 35th International Conference on Neural Information Processing Systems , NeurIPS '21, Red Hook, NY, USA. Curran Associates Inc

work page 2021

[58] [58]

Zhao, H., Liu, Y., Alahi, A., and Lin, T. (2023). On pitfalls of test-time adaptation. Proceedings of Machine Learning Research , 202:42058--42080

work page 2023

[59] [59]

M., Costeira, J

Zhao, H., Zhang, S., Wu, G., Moura, J. M., Costeira, J. P., and Gordon, G. J. (2018). Adversarial M ultiple S ource D omain A daptation. Advances in neural information processing systems , 31

work page 2018