Closing the Domain Gap in Biomedical Imaging by In-Context Control Samples
Pith reviewed 2026-05-10 00:02 UTC · model grok-4.3
The pith
Negative control samples present in every batch let meta-learning adapt models to new experimental conditions and close the domain gap in biomedical imaging.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Meta-learning approaches that exploit negative control samples as in-context references close the domain gap caused by batch effects, achieving 0.935 ± 0.018 accuracy on new experimental batches in large-scale mechanism-of-action classification, compared with a drop to 0.862 ± 0.060 for standard ResNets.
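As a rough reading of these figures, the share of the domain gap that is recovered can be computed from the reported means alone. The sketch below uses only the three accuracies quoted above. The variable names are illustrative, and the calculation ignores the reported standard deviations, so it is not a statement about significance.

```python
# Reported mean accuracies (standard deviations are deliberately ignored here).
in_domain_resnet = 0.939   # standard ResNet, training domain
new_batch_resnet = 0.862   # standard ResNet, new experimental batches
new_batch_meta = 0.935     # meta-learning approach, new experimental batches

# Gap that batch effects open up for the standard model.
domain_gap = in_domain_resnet - new_batch_resnet
# Gap that remains after meta-learning adaptation.
residual_gap = in_domain_resnet - new_batch_meta
# Share of the original gap that the adaptation recovers.
fraction_recovered = (new_batch_meta - new_batch_resnet) / domain_gap

print(f"domain gap:         {domain_gap:.3f}")
print(f"residual gap:       {residual_gap:.3f}")
print(f"fraction recovered: {fraction_recovered:.3f}")
```

On these point estimates the gap shrinks from 0.077 to 0.004, which is roughly 95% of it. The residual is smaller than each of the reported standard deviations.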
What carries the argument
Control-Stabilized Adaptive Risk Minimization via Batch Normalization (CS-ARM-BN), a meta-learning adaptation method that uses unperturbed negative control samples present in every batch as stable context for adaptation via batch normalization.
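To make the idea concrete, here is a minimal sketch of normalizing a batch with statistics taken from its own negative controls. It is an assumption about what "control-stabilized" could look like, not the paper's implementation: the text reviewed here does not specify how control statistics enter the batch-normalization layers or how the meta-learning objective is trained. The function names and the toy `batch_effect` transform are hypothetical.

```python
from statistics import fmean, pvariance


def control_statistics(control_features):
    """Per-feature mean and population variance over one batch's control samples."""
    columns = list(zip(*control_features))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col, mu) for col, mu in zip(columns, means)]
    return means, variances


def control_stabilized_normalize(batch_features, control_features, eps=1e-5):
    """Normalize every sample of a batch with statistics from that batch's controls.

    Assumption: controls alone define the normalization statistics.
    """
    means, variances = control_statistics(control_features)
    return [
        [(x - mu) / (var + eps) ** 0.5 for x, mu, var in zip(row, means, variances)]
        for row in batch_features
    ]


def batch_effect(rows, scale, shift):
    """Hypothetical batch effect: the same affine distortion on every feature."""
    return [[scale * x + shift for x in row] for row in rows]


# Toy data: identical biology observed in two batches with different distortions.
controls = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
treated = [[1.0, 2.0], [3.0, 0.0]]

batch_a = control_stabilized_normalize(
    batch_effect(treated, 1.0, 0.0), batch_effect(controls, 1.0, 0.0)
)
batch_b = control_stabilized_normalize(
    batch_effect(treated, 2.5, 10.0), batch_effect(controls, 2.5, 10.0)
)
```

In this toy setting the treated samples of both batches land on nearly the same normalized values, because the controls absorb the affine distortion. Real batch effects need not be affine, which is one reason the stability of the controls matters.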
If this is right
- Models become practically usable on data from different labs or strong domain shifts when control samples are available for stabilization.
- Batch effects in bioimaging can be neutralized through in-context adaptation rather than requiring new data or retraining.
- Drug discovery pipelines that rely on mechanism-of-action classification gain reliability across experimental batches.
Where Pith is reading between the lines
- Experiments in other scientific domains that routinely include negative controls could adopt similar in-context adaptation to improve model robustness.
- The availability of controls by experimental design may prove more important for reliable machine learning than post-hoc normalization techniques.
- A natural follow-up is to test whether the same control-based adaptation improves performance on other batch-sensitive tasks such as cell segmentation or image registration.
Load-bearing premise
Negative control samples are present in every experimental batch by design and serve as stable, unperturbed context for adaptation without introducing their own bias or domain shift.
What would settle it
An experiment on new batches in which removing or perturbing the negative control samples causes accuracy to fall back to the 0.862 level of standard models would show that the gain depends entirely on clean controls and that the adaptation method does not close the gap without them.
Original abstract
The central problem in biomedical imaging is batch effects: systematic technical variations unrelated to the biological signal of interest. These batch effects critically undermine experimental reproducibility and are the primary cause of failure of deep learning systems on new experimental batches, preventing their practical use in the real world. Despite years of research, no method has succeeded in closing this performance gap for deep learning models. We propose Control-Stabilized Adaptive Risk Minimization via Batch Normalization (CS-ARM-BN), a meta-learning adaptation method that exploits negative control samples. Such unperturbed reference images are present in every experimental batch by design and serve as stable context for adaptation. We validate our novel method on Mechanism-of-Action (MoA) classification, a crucial task for drug discovery, on the large-scale JUMP-CP dataset. The accuracy of standard ResNets drops from 0.939 ± 0.005 on the training domain to 0.862 ± 0.060 on data from new experimental batches. Foundation models, even after Typical Variation Normalization, fail to close this gap. We are the first to show that meta-learning approaches close the domain gap by achieving 0.935 ± 0.018. If the new experimental batches exhibit strong domain shifts, such as being generated in a different lab, meta-learning approaches can be stabilized with control samples, which are always available in biomedical experiments. Our work shows that batch effects in bioimaging data can be effectively neutralized through principled in-context adaptation, which also makes them practically usable and efficient.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Control-Stabilized Adaptive Risk Minimization via Batch Normalization (CS-ARM-BN), a meta-learning adaptation technique that neutralizes batch effects in biomedical imaging by using negative control samples, present by design in every experimental batch, as stable, unperturbed in-context references. The method is evaluated on Mechanism-of-Action classification using the large-scale JUMP-CP dataset. Standard ResNets drop from 0.939 ± 0.005 accuracy on the training domain to 0.862 ± 0.060 on new batches, while CS-ARM-BN achieves 0.935 ± 0.018; the authors claim to be the first to show that meta-learning approaches close this gap. Foundation models, even with Typical Variation Normalization, are shown to fail to close it.
Significance. If the result holds, the work would be significant for machine learning applications in the life sciences, where batch effects are a primary obstacle to deploying models in real experimental pipelines such as drug discovery. The approach is grounded in a domain property (availability of controls) rather than generic domain adaptation, and the concrete accuracy numbers with standard deviations on a named large-scale dataset, together with comparisons to baselines, provide a clear empirical anchor. This could encourage further development of in-context adaptation methods tailored to scientific data modalities.
major comments (1)
- [Methods and Experiments] The central claim that meta-learning closes the domain gap rests on negative control samples being free of residual batch effects or bias and serving purely as stable references. No quantitative verification—such as statistical comparison of intensity histograms, feature embeddings, or invariance metrics between controls in training versus new batches—is reported in the methods or experimental sections to support this assumption. If controls carry domain-specific signals, the adaptation could exploit them rather than neutralize batch effects, undermining the interpretation of the 0.935 ± 0.018 result.
minor comments (2)
- [Abstract] The abstract states concrete accuracy figures with standard deviations but provides no overview of the meta-learning objective, the precise role of batch normalization in CS-ARM-BN, or the number of experimental batches used; adding a short methods paragraph or table summarizing these would improve readability without altering the claims.
- [Introduction] The related-work discussion should explicitly position CS-ARM-BN against prior meta-learning and domain-adaptation techniques applied to bioimaging to clarify the novelty of the control-sample stabilization step.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the significance of our work and for the constructive major comment. We address it point by point below.
- Referee: [Methods and Experiments] The central claim that meta-learning closes the domain gap rests on negative control samples being free of residual batch effects or bias and serving purely as stable references. No quantitative verification—such as statistical comparison of intensity histograms, feature embeddings, or invariance metrics between controls in training versus new batches—is reported in the methods or experimental sections to support this assumption. If controls carry domain-specific signals, the adaptation could exploit them rather than neutralize batch effects, undermining the interpretation of the 0.935 ± 0.018 result.
  Authors: We agree that the manuscript does not include explicit quantitative verification of the stability of negative control samples (e.g., intensity histogram comparisons, embedding invariances, or other metrics) across training and new batches. This assumption is grounded in the JUMP-CP experimental design, where negative controls are included by construction as unperturbed references. However, to directly address the concern and strengthen the interpretation of our results, we will add a dedicated analysis subsection in the revised methods and experiments. This will report statistical comparisons of controls between domains, including histogram distances and feature embedding metrics, to confirm they do not carry exploitable domain-specific signals. We believe this addition will support that the performance gain to 0.935 ± 0.018 arises from batch-effect neutralization via in-context adaptation.
  revision: yes
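The histogram comparison discussed in this exchange could take a form like the sketch below. It is one illustrative choice of metrics, not the analysis the authors will report: it computes a total variation distance between binned control intensities and a one-dimensional Wasserstein distance between equally sized samples. The intensity values, bin range, and function names are hypothetical.

```python
def normalized_histogram(values, lo, hi, bins):
    """Histogram of values over [lo, hi] with equal-width bins, scaled to sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp out-of-range values (and v == hi) into the first or last bin.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    return [c / len(values) for c in counts]


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equally sized 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical mean intensities of control images in a training and a new batch.
train_controls = [0.12, 0.18, 0.25, 0.33, 0.41, 0.47]
new_controls = [x + 0.2 for x in train_controls]  # simulated brightness shift

tv = total_variation(
    normalized_histogram(train_controls, 0.0, 1.0, 5),
    normalized_histogram(new_controls, 0.0, 1.0, 5),
)
w1 = wasserstein_1d(train_controls, new_controls)
print(f"total variation: {tv:.3f}, Wasserstein-1: {w1:.3f}")
```

Distances near zero between controls from different batches would support the stability assumption. Large distances would indicate that the controls themselves carry batch-specific signal, which is the referee's concern.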
Circularity Check
No significant circularity; empirical results stand independently of self-referential definitions
The paper introduces CS-ARM-BN as a meta-learning adaptation technique that uses negative control samples (present by design in every batch) as stable in-context references for batch normalization. The headline performance claim (0.935 ± 0.018 accuracy closing the domain gap) is presented as an empirical outcome on the JUMP-CP dataset, with direct comparisons to ResNet baselines (dropping to 0.862) and foundation models. No equations, parameter-fitting steps, or derivations are described that would reduce the reported prediction to a fitted input or self-defined quantity. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text. The assumption that controls introduce no bias is asserted from experimental design rather than derived circularly from the method itself. The result is therefore self-contained against the stated benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Negative control samples are unperturbed reference images present in every experimental batch by design.