
arxiv: 2508.15452 · v3 · submitted 2025-08-21 · 📡 eess.IV · cs.CV

DoSReMC: Domain Shift Resilient Mammography Classification using Batch Normalization Adaptation

Pith reviewed 2026-05-18 22:03 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords: batch normalization adaptation · domain shift · mammography classification · cross-domain generalization · breast cancer detection · adversarial training · full-field digital mammography

The pith

Fine-tuning only batch normalization and fully connected layers restores cross-domain performance in pretrained mammography classifiers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that batch normalization layers in convolutional networks create domain dependence when mammography images come from different scanners or clinics. It shows that updating only the normalization statistics and the final classifier layers, while freezing the convolutional feature extractors, recovers much of the lost accuracy on target domains. Adding an adversarial training component during this limited adaptation yields further gains. The approach matters because full retraining of large models is expensive and often impractical in new hospital settings, so a lightweight fix could make existing breast-cancer detectors usable across more real-world data sources.

Core claim

Batch normalization layers are a primary source of domain dependence: they work well inside a single training distribution but degrade generalization when the test distribution shifts. DoSReMC therefore freezes the pretrained convolutional filters and updates only the batch-normalization parameters together with the fully connected classifier, optionally under an adversarial objective. Experiments across three large full-field digital mammography collections, including a new pathologically confirmed in-house set, show that this targeted update recovers cross-domain accuracy at far lower computational cost than full-model fine-tuning.
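To make the "freeze the filters, update only BN and FC" idea concrete, here is a minimal bookkeeping sketch. The layer names, channel sizes, and helper functions are hypothetical and chosen only for illustration; they do not describe the paper's architecture. The sketch shows how small the trainable set becomes once convolution weights are excluded.

```python
# Toy layer inventory: (name, kind, parameter count).
# All names and sizes are illustrative assumptions, not the paper's model.

def conv_params(c_in, c_out, k=3):
    # Bias-free k x k convolution, a common choice when BN follows the conv.
    return c_in * c_out * k * k

def bn_params(channels):
    # Learnable scale (gamma) and shift (beta) per channel.
    return 2 * channels

def fc_params(n_in, n_out):
    # Weight matrix plus bias vector.
    return n_in * n_out + n_out

LAYERS = [
    ("conv1", "conv", conv_params(1, 16)),
    ("bn1", "bn", bn_params(16)),
    ("conv2", "conv", conv_params(16, 32)),
    ("bn2", "bn", bn_params(32)),
    ("fc", "fc", fc_params(32, 2)),
]

def select_trainable(layers, kinds=("bn", "fc")):
    """Return the names of layers that would be updated during adaptation."""
    return [name for name, kind, _ in layers if kind in kinds]

def trainable_fraction(layers, kinds=("bn", "fc")):
    """Fraction of all parameters that belong to the selected layer kinds."""
    total = sum(count for _, _, count in layers)
    trainable = sum(count for _, kind, count in layers if kind in kinds)
    return trainable / total

if __name__ == "__main__":
    print(select_trainable(LAYERS))
    print(f"trainable fraction: {trainable_fraction(LAYERS):.2%}")
```

In a deep learning framework the same selection would usually be expressed by disabling gradients on the convolution weights and passing only the remaining parameters to the optimizer. In this toy inventory the selected layers hold 162 of 4914 parameters; the real ratio depends entirely on the backbone.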

What carries the argument

Targeted adaptation of batch normalization running statistics and fully connected layers on top of frozen pretrained convolutional filters, optionally combined with adversarial training.
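For reference, the standard batch normalization transform (Ioffe and Szegedy) makes explicit which quantities such an adaptation can touch. The symbols are the conventional ones rather than notation taken from the paper, and the momentum convention for the running averages (written here with a factor $\alpha$) differs between libraries.

```latex
\begin{align}
\hat{x} &= \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}}, &
y &= \gamma\,\hat{x} + \beta, \\
\mu_{\text{run}} &\leftarrow (1-\alpha)\,\mu_{\text{run}} + \alpha\,\mu_{B}, &
\sigma^{2}_{\text{run}} &\leftarrow (1-\alpha)\,\sigma^{2}_{\text{run}} + \alpha\,\sigma^{2}_{B}.
\end{align}
```

At inference time $\mu$ and $\sigma^{2}$ are the running estimates accumulated during training, so they encode the source domain, while $\gamma$ and $\beta$ are learned. Adapting the BN layers on target data can change all four quantities per channel, whereas the convolution weights that produce $x$ stay fixed.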

If this is right

  • Cross-domain accuracy on unseen mammography datasets improves without any change to the convolutional feature extractors.
  • Adaptation training time and memory use drop substantially compared with full-network retraining.
  • The same BN-plus-FC update can be added to existing mammography pipelines without redesigning the backbone.
  • Adversarial training on top of the limited adaptation produces an extra boost in target-domain performance.
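This summary does not say which adversarial scheme the paper uses. One widely used formulation is the domain-adversarial objective of Ganin et al., which appears in the paper's reference list; the sketch below states that objective as an assumption about what "adversarial training" might mean here, with $\theta_f$ standing for the adapted feature parameters, $\theta_y$ for the label classifier, $\theta_d$ for a domain discriminator, and $\lambda$ for a trade-off weight.

```latex
\begin{align}
E(\theta_f, \theta_y, \theta_d) &= \mathcal{L}_y(\theta_f, \theta_y) - \lambda\,\mathcal{L}_d(\theta_f, \theta_d), \\
(\hat{\theta}_f, \hat{\theta}_y) &= \arg\min_{\theta_f,\,\theta_y} E(\theta_f, \theta_y, \hat{\theta}_d), \\
\hat{\theta}_d &= \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d).
\end{align}
```

Under the limited adaptation described above, $\theta_f$ would plausibly be restricted to the batch normalization parameters, since the convolutional filters are frozen. That restriction is an inference from the summary, not something it states.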

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same limited adaptation may work for other medical imaging modalities where scanner differences dominate the domain gap.
  • If normalization statistics carry most domain information, similar lightweight fixes could apply to non-medical vision tasks that suffer from camera or lighting shifts.
  • Future pipelines could combine BN adaptation with small amounts of target-domain data collection to keep models current as new scanners enter clinical use.

Load-bearing premise

The effects of domain shift are concentrated inside the mean and variance statistics tracked by batch normalization, so updating only those layers plus the classifier is sufficient to restore generalization.
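A toy single-channel illustration of this premise, under assumptions that are not from the paper: activations are Gaussian, and the domain shift is a pure change of mean and scale (a stand-in for a scanner that is brighter and higher in contrast). Normalizing target activations with stale source statistics leaves them off-center and over-dispersed, while re-estimating the statistics on target data restores zero mean and unit variance.

```python
import random
import statistics

EPS = 1e-5  # small constant for numerical stability, as in standard BN

def batch_norm(xs, mean, var, gamma=1.0, beta=0.0):
    """Apply the BN transform with fixed (running) statistics."""
    return [gamma * (x - mean) / (var + EPS) ** 0.5 + beta for x in xs]

def estimate_stats(xs):
    """Population mean and variance of a list of activations."""
    return statistics.fmean(xs), statistics.pvariance(xs)

rng = random.Random(0)
# Hypothetical source-domain activations.
source = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Hypothetical target-domain activations: shifted mean, larger spread.
target = [rng.gauss(2.0, 3.0) for _ in range(5000)]

src_mean, src_var = estimate_stats(source)
tgt_mean, tgt_var = estimate_stats(target)

# Stale statistics: target data normalized with source-domain estimates.
stale = batch_norm(target, src_mean, src_var)
# Adapted statistics: target data normalized with re-estimated values.
adapted = batch_norm(target, tgt_mean, tgt_var)

if __name__ == "__main__":
    for label, ys in (("stale", stale), ("adapted", adapted)):
        mean, var = estimate_stats(ys)
        print(f"{label:8s} mean={mean:+.3f} var={var:.3f}")
```

Real domain shift need not be this simple. If it also changes which spatial patterns the frozen filters respond to, correcting the first two moments per channel would not be enough, which is exactly the case the premise rules out.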

What would settle it

On a held-out target-domain mammography set, full fine-tuning of all layers does no better than updating only the batch normalization and fully connected layers. A clear win for full fine-tuning would instead indicate that the frozen convolutional filters still carry domain-specific bias.

Figures

Figures reproduced from arXiv: 2508.15452 by Burhan Keleş, Deniz Katircioglu-Öztürk, Emre K. Süslü, Figen B. Demirkazık, Gamze Durhan, Gözde B. Akar, Meltem G. Akpınar, Mete C. Kaya, Uğurcan Akyüz.

Figure 2. Distribution of Density scores per study in the HCTP …
Figure 3. Distribution of BI-RADS scores per study in the HCTP …
Figure 4. Example images that were excluded from the HCTP dataset.
Figure 5. Standardized mammography samples from Hacettepe, …
Figure 6. The figure illustrates the two-step architecture proposed in [35], which combines global and local information, adapted here within a …
Figure 7. Pixel-intensity distributions of mammography images from …
Figure 8. Saliency maps generated by Mtr NYU, M′tr NYU, M′tt NYU models on a sample from the CSAW dataset. Blue regions represent the ground truth annotations, while red regions highlight the saliency maps corresponding to the malignant class. In our second experiment, we investigated the effect of an artificially induced domain shift on the MNYU. While the pretrained model was originally trained using input s…
Figure 9. JS divergence across BN layers in the Global Module (ResNet-22), using …
Figure 10. Saliency maps generated by Mtr NYU, Mtr HCTP, Mtt HCTP(BNFC) models on a sample from the HCTP dataset. Blue regions represent the ground truth annotations, while red regions highlight the saliency maps corresponding to the malignant class. … tunes only the BN and FC layers on the HCTP and VinDr datasets. This strategy demonstrates that such selective adaptation can significantly improve performance, aligning …
Figure 11. JS divergence across BN layers in the Global Module (ResNet-22), using …
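The captions for Figures 9 and 11 are truncated, so it is not visible here which distributions the paper compares. As a hedged illustration of the metric itself, the sketch below computes the Jensen-Shannon divergence between two histograms over the same bins, for example binned BN activations from two domains. The function names and the histogram inputs are assumptions made for this example.

```python
import math

def normalize(hist):
    """Turn non-negative bin counts into a probability vector."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]

def kl_divergence(p, q):
    """KL(p || q) in bits; bins with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(hist_a, hist_b):
    """Jensen-Shannon divergence in bits between two histograms.

    With base-2 logarithms the value lies in [0, 1]: 0 for identical
    distributions and 1 for distributions with disjoint support.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must share the same bins")
    p, q = normalize(hist_a), normalize(hist_b)
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

if __name__ == "__main__":
    same = js_divergence([1, 2, 3], [2, 4, 6])
    disjoint = js_divergence([5, 0], [0, 7])
    print(same, disjoint)
```

Computed per BN layer, a value near 0 would mean the two domains produce nearly the same activation distribution at that layer, and larger values would point to layers where the domains diverge.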
Original abstract

Numerous deep learning-based solutions have been developed for the automatic recognition of breast cancer using mammography images. However, their performance often declines when applied to data from different domains, primarily due to domain shift - the variation in data distributions between source and target domains. This performance drop limits the safe and equitable deployment of AI in real-world clinical settings. In this study, we present DoSReMC (Domain Shift Resilient Mammography Classification), a batch normalization (BN) adaptation framework designed to enhance cross-domain generalization without retraining the entire model. Using three large-scale full-field digital mammography (FFDM) datasets - including HCTP, a newly introduced, pathologically confirmed in-house dataset - we conduct a systematic cross-domain evaluation with convolutional neural networks (CNNs). Our results demonstrate that BN layers are a primary source of domain dependence: they perform effectively when training and testing occur within the same domain, and they significantly impair model generalization under domain shift. DoSReMC addresses this limitation by fine-tuning only the BN and fully connected (FC) layers, while preserving pretrained convolutional filters. We further integrate this targeted adaptation with an adversarial training scheme, yielding additional improvements in cross-domain generalizability while reducing the computational cost of model training. DoSReMC can be readily incorporated into existing AI pipelines and applied across diverse clinical environments, providing a practical pathway toward more robust and generalizable mammography classification systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces DoSReMC, a targeted adaptation framework for CNN-based mammography classification that fine-tunes only batch normalization (BN) and fully connected (FC) layers while preserving pretrained convolutional filters, combined with adversarial training. It claims that BN layers are a primary source of domain dependence: they perform well in-domain but impair generalization under domain shift. It supports this claim with a systematic cross-domain evaluation on three large-scale FFDM datasets, including the new pathologically confirmed in-house HCTP dataset, and reports improved cross-domain performance at lower computational cost.

Significance. If the central empirical claims hold with stronger supporting evidence, the work could offer a practical, low-cost method for adapting mammography AI models to new clinical domains without full retraining, supporting more equitable deployment across sites with varying imaging protocols. The introduction of the HCTP dataset and the focus on BN as a domain-shift locus are potentially useful contributions to the medical imaging community, though their impact hinges on rigorous validation of the localization assumption.

major comments (2)
  1. [Abstract] Abstract: The claim that BN layers are a primary source of domain dependence (performing effectively in-domain but significantly impairing generalization under shift) is load-bearing for motivating DoSReMC, yet the abstract provides no specific quantitative metrics (e.g., AUC or accuracy drops), baselines, or statistical tests across the three datasets to substantiate this over other factors such as intensity or texture variations affecting convolutional activations.
  2. [DoSReMC framework] DoSReMC framework description: The premise that domain shift effects are concentrated in BN statistics and that fine-tuning only BN+FC (plus adversarial training) suffices while freezing conv filters assumes domain-invariant representations from the pretrained convolutional feature extractors. This is not supported by any described ablations comparing BN+FC adaptation against full fine-tuning or adaptation of earlier layers, leaving open whether reported cross-domain gains are maximal or if low-level domain biases persist in the frozen filters.
minor comments (1)
  1. [Abstract] Abstract: More detail on the characteristics of the three FFDM datasets (e.g., sizes, acquisition parameters, and specific domain differences between HCTP and external sets) would strengthen the description of the cross-domain evaluation setup.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their valuable feedback on our manuscript. We address each of the major comments in detail below and have made revisions to strengthen the presentation of our results.

  1. Referee: [Abstract] Abstract: The claim that BN layers are a primary source of domain dependence (performing effectively in-domain but significantly impairing generalization under shift) is load-bearing for motivating DoSReMC, yet the abstract provides no specific quantitative metrics (e.g., AUC or accuracy drops), baselines, or statistical tests across the three datasets to substantiate this over other factors such as intensity or texture variations affecting convolutional activations.

    Authors: We agree that the abstract should provide more concrete evidence to support the claim regarding BN layers as a primary source of domain dependence. In the revised manuscript, we have updated the abstract to include specific quantitative metrics from our experiments, such as the observed AUC drops under domain shift across the three datasets. We have also clarified the baselines used and noted that these findings are supported by statistical comparisons in the results section. This helps distinguish the BN effect from other factors like intensity variations, which are mitigated by our adversarial training approach. revision: yes

  2. Referee: [DoSReMC framework] DoSReMC framework description: The premise that domain shift effects are concentrated in BN statistics and that fine-tuning only BN+FC (plus adversarial training) suffices while freezing conv filters assumes domain-invariant representations from the pretrained convolutional feature extractors. This is not supported by any described ablations comparing BN+FC adaptation against full fine-tuning or adaptation of earlier layers, leaving open whether reported cross-domain gains are maximal or if low-level domain biases persist in the frozen filters.

    Authors: We acknowledge that the manuscript does not present explicit ablations comparing BN+FC adaptation to full fine-tuning or to adapting earlier convolutional layers. Our approach is grounded in the hypothesis that pretrained convolutional filters capture domain-invariant features, as supported by the literature on transfer learning in medical imaging and our systematic cross-domain evaluations showing improved generalization. To address this concern, we have added a discussion in the revised manuscript explaining the rationale and referencing related works. We believe the current evidence from the three datasets supports the efficacy of the proposed method, though we recognize that additional ablations could further strengthen the claims. revision: partial

Circularity Check

0 steps flagged

Empirical BN adaptation technique with no circular derivation


The paper introduces DoSReMC as an empirical adaptation method that fine-tunes only BN and FC layers on top of pretrained CNNs for cross-domain mammography classification. Its central claims rest on experimental cross-domain evaluations across three FFDM datasets rather than any first-principles derivation, mathematical prediction, or chain of equations. No steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the reported gains are presented as outcomes of targeted fine-tuning plus adversarial training, validated directly against held-out target domains. The work is therefore self-contained as a practical engineering contribution without tautological reduction of its results to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that batch normalization statistics capture most domain shift effects in mammography CNNs and that targeted fine-tuning of BN and FC layers plus adversarial training is adequate for generalization.

axioms (1)
  • domain assumption: Domain shift in mammography images primarily manifests in the running statistics of batch normalization layers.
    Invoked when stating that BN layers are the primary source of domain dependence and that adapting them restores generalization.
invented entities (1)
  • DoSReMC framework (no independent evidence)
    purpose: Targeted BN and FC adaptation for domain shift resilience
    Newly named method introduced to solve the identified limitation of standard BN in cross-domain settings.

pith-pipeline@v0.9.0 · 5857 in / 1289 out tokens · 44251 ms · 2026-05-18T22:03:49.785533+00:00 · methodology



Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 6 internal anchors

  1. [1]

    F. Bray, M. Laversanne, H. Sung, J. Ferlay, R. L. Siegel, I. Soerjomataram, A. Jemal, Global cancer statistics 2022: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA: a cancer journal for clinicians (2021). doi:10.3322/caac.21834

  2. [2]

    R. L. Siegel, A. N. Giaquinto, A. Jemal, Cancer statistics, 2024, CA: a cancer journal for clinicians 74 (1) (2024) 12–49. doi:10.3322/caac.21820

  3. [3]

    W. Ren, M. Chen, Y. Qiao, F. Zhao, Global guidelines for breast cancer screening: a systematic review, The Breast 64 (2022) 85–. doi:10.1016/j.breast.2022.04.003

  5. [5]

    A. Katalinic, N. Eisemann, K. Kraywinkel, M. R. Noftz, J. Hübner, Breast cancer incidence and mortality before and after implementation of the german mammography screening program, International journal of cancer 147 (3) (2020) 709–718. doi:10.1002/ijc.32767

  6. [6]

    S. W. Duffy, L. Tabár, H.-H. Chen, M. Holmqvist, M.-F. Yen, S. Abdsalah, B. Epstein, E. Frodis, E. Ljungberg, C. Hedborg-Melander, et al., The impact of organized mammography service screening on breast carcinoma mortality in seven swedish counties: a collaborative evaluation, Cancer: Interdisciplinary International Journal of the American Cancer So...

  7. [7]

    D. Schopper, C. de Wolf, How effective are breast cancer screening programmes by mammography? review of the current evidence, European journal of cancer 45 (11) (2009) 1916–1923. doi:10.1016/j.ejca.2009.03.022

  8. [8]

    M. Kalager, M. Zelen, F. Langmark, H.-O. Adami, Effect of screening mammography on breast-cancer mortality in norway, New England Journal of Medicine 363 (13) (2010) 1203–1210. doi:10.1056/NEJMoa1000727

  9. [9]

    J. Brodersen, V. D. Siersma, Long-term psychosocial consequences of false-positive screening mammography, The Annals of Family Medicine 11 (2) (2013) 106–115. doi:10.1370/afm.1466

  10. [10]

    H. D. Nelson, E. S. O’meara, K. Kerlikowske, S. Balch, D. Miglioretti, Factors associated with rates of false-positive and false-negative results from digital mammography screening: an analysis of registry data, Annals of internal medicine 164 (4) (2016) 226–235. doi:10.7326/M15-0971

  11. [11]

    X. Zhang, Y. Zhang, E. Y. Han, N. Jacobs, Q. Han, X. Wang, J. Liu, Classification of whole mammogram and tomosynthesis images using deep convolutional neural networks, IEEE transactions on nanobioscience 17 (3) (2018) 237–242. doi:10.1109/TNB.2018.2845103

  12. [12]

    Y. Tan, K. Sim, F. F. Ting, Breast cancer detection using convolutional neural networks for mammogram imaging system, in: 2017 International Conference on Robotics, Automation and Sciences (ICORAS), IEEE, 2017, pp. 1–5. doi:10.1109/ICORAS.2017.8308076

  13. [13]

    H.-C. Lu, E.-W. Loh, S.-C. Huang, The classification of mammogram using convolutional neural network with specific image preprocessing for breast cancer detection, in: 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD), IEEE, 2019, pp. 9–12. doi:10.1109/ICAIBD.2019.8837000

  14. [14]

    K. J. Geras, S. Wolfson, Y. Shen, N. Wu, S. Kim, E. Kim, L. Heacock, U. Parikh, L. Moy, K. Cho, High-resolution breast cancer screening with multi-view deep convolutional neural networks, arXiv preprint arXiv:1703.07047 (2017). doi:10.48550/arXiv.1703.07047

  15. [15]

    H. N. Khan, A. R. Shahid, B. Raza, A. H. Dar, H. Alquhayz, Multi-view feature fusion based four views model for mammogram classification using convolutional neural network, IEEE Access 7 (2019) 165724–165733. doi:10.1109/ACCESS.2019.2953318

  16. [16]

    H. Wang, J. Feng, Z. Zhang, H. Su, L. Cui, H. He, L. Liu, Breast mass classification via deeply integrating the contextual information from multi-view data, Pattern Recognition 80 (2018) 42–. doi:10.1016/j.patcog.2018.02.026

  18. [18]

    T. Kyono, F. J. Gilbert, M. van der Schaar, Mammo: A deep learning solution for facilitating radiologist-machine collaboration in breast cancer diagnosis, arXiv preprint arXiv:1811.02661 (2018). doi:10.48550/arXiv.1811.02661

  19. [19]

    E. Lopez, E. Grassucci, M. Valleriani, D. Comminiello, Hypercomplex neural architectures for multi-view breast cancer classification, arXiv preprint arXiv:2204.05798 (2022). doi:10.48550/arXiv.2204.05798

  20. [20]

    M. G. Ertosun, D. L. Rubin, Probabilistic visual search for masses within mammography images using deep learning, in: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2015, pp. 1310–1315. doi:10.1109/BIBM.2015.7359868

  21. [21]

    W. E. Fathy, A. S. Ghoneim, A deep learning approach for breast cancer mass detection, International Journal of Advanced Computer Science and Applications 10 (1) (2019). doi:10.14569/IJACSA.2019.0100123

  22. [22]

    A. J. Barnett, F. R. Schwartz, C. Tao, C. Chen, Y. Ren, J. Y. Lo, C. Rudin, A case-based interpretable deep learning model for classification of mass lesions in digital mammography, Nature Machine Intelligence 3 (12) (2021) 1061–1070. doi:10.1038/s42256-021-00423-x

  23. [23]

    J. Zhou, L.-Y. Luo, Q. Dou, H. Chen, C. Chen, G.-J. Li, Z.-F. Jiang, P.-A. Heng, Weakly supervised 3d deep learning for breast cancer classification and localization of the lesions in mr images, Journal of Magnetic Resonance Imaging 50 (4) (2019) 1144–1151. doi:10.1002/jmri.26721

  24. [24]

    R. Agarwal, O. Diaz, X. Lladó, M. H. Yap, R. Martí, Automatic mass detection in mammograms using deep convolutional neural networks, Journal of Medical Imaging 6 (3) (2019) 031409–031409. doi:10.1117/1.JMI.6.3.031409

  25. [25]

    H. Cao, S. Pu, W. Tan, J. Tong, Breast mass detection in digital mammography based on anchor-free architecture, Computer methods and programs in biomedicine 205 (2021) 106033. doi:10.1016/j.cmpb.2021.106033

  26. [27]

    T. Zheng, F. Lin, X. Li, T. Chu, J. Gao, S. Zhang, Z. Li, Y. Gu, S. Wang, F. Zhao, et al., Deep learning-enabled fully automated pipeline system for segmentation and classification of single-mass breast lesions using contrast-enhanced mammography: a prospective, multicentre study, EClinicalMedicine 58 (2023). doi:10.1016/j.eclinm.2023.101913

  27. [29]

    K. Wang, N. Khan, R. Highnam, Automated segmentation of breast arterial calcifications from digital mammography, in: 2019 International Conference on Image and Vision Computing New Zealand (IVCNZ), IEEE, 2019, pp. 1–6. doi:10.1109/IVCNZ48456.2019.8960956

  28. [30]

    M. Sakaida, T. Yoshimura, M. Tang, S. Ichikawa, H. Sugimori, Development of a mammography calcification detection algorithm using deep learning with resolution-preserved image patch division, Algorithms 16 (10) (2023) 483. doi:10.3390/a16100483

  29. [31]

    A. Yala, P. G. Mikhael, F. Strand, G. Lin, K. Smith, Y.-L. Wan, L. Lamb, K. Hughes, C. Lehman, R. Barzilay, Toward robust mammography-based models for breast cancer risk, Science Translational Medicine 13 (578) (2021) eaba4373. doi:10.1126/scitranslmed.aba4373

  30. [32]

    A. Yala, P. G. Mikhael, F. Strand, G. Lin, S. Satuluru, T. Kim, I. Banerjee, J. Gichoya, H. Trivedi, C. D. Lehman, et al., Multi-institutional validation of a mammography-based breast cancer risk model, Journal of Clinical Oncology 40 (16) (2022) 1732–. doi:10.1200/JCO.21.01337

  32. [34]

    K. Dembrower, P. Lindholm, F. Strand, A multi-million mammography image dataset and population-based screening cohort for the training and evaluation of deep neural networks - the cohort of screen-aged women (csaw), Journal of digital imaging 33 (2) (2020) 408–413. doi:10.1007/s10278-019-00278-0

  33. [35]

    K. Liu, Y. Shen, N. Wu, J. Chłędowski, C. Fernandez-Granda, K. J. Geras, Weakly-supervised high-resolution segmentation of mammography images for breast cancer diagnosis, Proceedings of machine learning research 143 (2021) 268. doi:10.48550/arXiv.2106.07049

  34. [36]

    Y. Tang, Z. Cao, Y. Zhang, Z. Yang, Z. Ji, Y. Wang, M. Han, J. Ma, J. Xiao, P. Chang, Leveraging large-scale weakly labeled data for semi-supervised mass detection in mammograms, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3855–3864. doi:10.1109/CVPR46437.2021.00385

  35. [37]

    R. Bakalo, R. Ben-Ari, J. Goldberger, Classification and detection in mammograms with weak supervision via dual branch deep neural net, in: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), IEEE, 2019, pp. 1905–. doi:10.1109/ISBI.2019.8759458

  37. [39]

    Y. Shen, N. Wu, J. Phang, J. Park, K. Liu, S. Tyagi, L. Heacock, S. G. Kim, L. Moy, K. Cho, et al., An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization, Medical image analysis 68 (2021) 101908. doi:10.1016/j.media.2020.101908

  38. [40]

    A. Jiménez-Sánchez, M. Tardy, M. A. G. Ballester, D. Mateus, G. Piella, Memory-aware curriculum federated learning for breast cancer classification, Computer Methods and Programs in Biomedicine 229 (2023) 107318. doi:10.1016/j.cmpb.2022.107318

  39. [41]

    H. R. Roth, K. Chang, P. Singh, N. Neumark, W. Li, V. Gupta, S. Gupta, L. Qu, A. Ihsani, B. C. Bizzo, et al., Federated learning for breast density classification: A real-world implementation, in: MICCAI Workshop on Domain Adaptation and Representation Transfer, Springer, 2020, pp. 181–191. doi:10.1007/978-3-030-60548-3_18

  40. [42]

    C. S. Perone, P. Ballester, R. C. Barros, J. Cohen-Adad, Unsupervised domain adaptation for medical imaging segmentation with self-ensembling, NeuroImage 194 (2019) 1–11. doi:10.1016/j.neuroimage.2019.03.026

  41. [43]

    N. Karani, K. Chaitanya, C. Baumgartner, E. Konukoglu, A lifelong learning approach to brain mr segmentation across scanners and protocols, in: International conference on medical image computing and computer-assisted intervention, Springer, 2018, pp. 476–484. doi:10.1007/978-3-030-00928-1_54

  42. [44]

    H. Guan, M. Liu, Domain adaptation for medical image analysis: a survey, IEEE Transactions on Biomedical Engineering 69 (3) (2021) 1173–1185. doi:10.1109/TBME.2021.3117407

  43. [45]

    M. Ghafoorian, A. Mehrtash, T. Kapur, N. Karssemeijer, E. Marchiori, M. Pesteie, C. R. Guttmann, F.-E. De Leeuw, C. M. Tempany, B. Van Ginneken, et al., Transfer learning for domain adaptation in mri: Application in brain lesion segmentation, in: International conference on medical image computing and computer-assisted intervention, Springer, 2017, pp. 516–. doi:10.1007/978-3-319-66179-7_59

  45. [47]

    Y. Gu, Z. Ge, C. P. Bonnington, J. Zhou, Progressive transfer learning and adversarial domain adaptation for cross-domain skin disease classification, IEEE journal of biomedical and health informatics 24 (5) (2019) 1379–1393. doi:10.1109/JBHI.2019.2942429

  46. [48]

    P. Laiz, J. Vitrià, S. Seguí, Using the triplet loss for domain adaptation in wce, in: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019, pp. 399–405. doi:10.1109/ICCVW.2019.00051

  47. [49]

    L. Garrucho, K. Kushibar, S. Jouide, O. Diaz, L. Igual, K. Lekadir, Domain generalization in deep learning based mass detection in mammography: A large-scale multi-center study, Artificial Intelligence in Medicine 132 (2022) 102386. doi:10.1016/j.artmed.2022.102386

  48. [50]

    D. Kumar, C. Kumar, M. Shao, Cross-database mammographic image analysis through unsupervised domain adaptation, in: 2017 IEEE international conference on big data (Big Data), IEEE, 2017, pp. 4035–4042. doi:10.1109/BigData.2017.8258419

  49. [51]

    F. Ryan, K. L.-L. Román, B. Z. Gerbolés, K. M. Rebescher, M. S. Txurio, R. C. Ugarte, M. J. G. González, I. M. Oliver, Unsupervised domain adaptation for the segmentation of breast tissue in mammography images, Computer Methods and Programs in Biomedicine 211 (2021) 106368. doi:10.1016/j.cmpb.2021.106368

  50. [52]

    G. I. Quintana, V. Jugnon, L. Vancamberg, A. Desolneux, M. Mougeot, Contrastive learning: an efficient domain adaptation strategy for 2d mammography image classification, in: 2024 IEEE International Symposium on Biomedical Imaging (ISBI), IEEE, 2024, pp. 1–5. doi:10.1109/ISBI56570.2024.10635873

  51. [53]

    A. D. Lauritzen, M. C. von Euler-Chelpin, E. Lynge, I. Vejborg, M. Nielsen, N. Karssemeijer, M. Lillholm, Robust cross-vendor mammographic texture models using augmentation-based domain adaptation for long-term breast cancer risk, Journal of Medical Imaging 10 (5) (2023) 054003–054003. doi:10.1117/1.JMI.10.5.054003

  52. [54]

    H. T. Nguyen, H. Q. Nguyen, H. H. Pham, K. Lam, L. T. Le, M. Dao, V. Vu, Vindr-mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography, Scientific Data 10 (1) (2023) 277. doi:10.1038/s41597-023-02100-7

  53. [55]

    F. Strand, CSAW-CC (mammography) – a dataset for AI research to improve screening, diagnostics and prognostics of breast cancer (2022). doi:10.5878/45vm-t798. URL https://doi.org/10.5878/45vm-t798

  54. [57]

    Y. Li, N. Wang, J. Shi, J. Liu, X. Hou, Revisiting batch normalization for practical domain adaptation, arXiv preprint arXiv:1603.04779 (2016). doi:10.48550/arXiv.1603.04779

  55. [58]

    Removing covariate shift improves robustness against common corruptions

    S. Schneider, E. Rusak, L. Eck, O. Bringmann, W. Brendel, M. Bethge, Improving robustness against common corruptions by covariate shift adaptation, Advances in neural information processing systems 33 (2020) 11539–11551. doi:10.48550/arXiv.2006.16971

  56. [59]

    J. Frankle, D. J. Schwab, A. S. Morcos, Training batchnorm and only batchnorm: On the expressive power of random features in cnns, arXiv preprint arXiv:2003.00152 (2020). doi:10.48550/arXiv.2003.00152

  57. [60]

    S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: International conference on machine learning, pmlr, 2015, pp. 448–. doi:10.48550/arXiv.1502.03167

  59. [63]

    N. Bjorck, C. P. Gomes, B. Selman, K. Q. Weinberger, Understanding batch normalization, Advances in neural information processing systems 31 (2018). doi:10.48550/arXiv.1806.02375

  60. [64]

    N. Wu, J. Phang, J. Park, Y. Shen, S. G. Kim, L. Heacock, L. Moy, K. Cho, K. J. Geras, The nyu breast cancer screening dataset v1.0, New York Univ., New York, NY, USA, Tech. Rep (2019)

  61. [65]

    K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90

  62. [66]

    Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. March, V. Lempitsky, Domain-adversarial training of neural networks, Journal of Machine Learning Research 17 (59) (2016) 1–35. doi:10.48550/arXiv.1505.07818. URL http://jmlr.org/papers/v17/15-239.html

  63. [67]

    R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. E. Gonzalez, I. Stoica, Tune: A research platform for distributed model selection and training, arXiv preprint arXiv:1807.05118 (2018). doi:10.48550/arXiv.1807.05118

  64. [68]

    R. M. French, Catastrophic forgetting in connectionist networks, Trends in cognitive sciences 3 (4) (1999) 128–135. doi:10.1016/S1364-6613(99)01294-2

  65. [69]

    S. Niu, J. Wu, Y. Zhang, Z. Wen, Y. Chen, P. Zhao, M. Tan, Towards stable test-time adaptation in dynamic wild world, arXiv preprint arXiv:2302.12400 (2023). doi:10.48550/arXiv.2302.12400

  66. [70]

    B. Zhao, C. Chen, S.-T. Xia, Delta: degradation-free fully test-time adaptation, arXiv preprint arXiv:2301.13018 (2023). doi:10.48550/arXiv.2301.13018