arxiv: 2605.09925 · v1 · submitted 2026-05-11 · 💻 cs.CV

Frequency Adapter with SAM for Generalized Medical Image Segmentation

Pith reviewed 2026-05-12 04:03 UTC · model grok-4.3

classification 💻 cs.CV
keywords domain generalization · medical image segmentation · SAM · frequency adapter · LoRA · fundus · prostate

The pith

A frequency adapter added to SAM improves generalization in medical image segmentation across domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes FSAM, which addresses domain shift in medical image segmentation by adding a frequency adapter to the Segment Anything Model (SAM). The adapter extracts high-frequency features that the authors treat as invariant to variations in imaging equipment and protocols. Combined with LoRA for efficient adaptation, it aims to improve robustness without explicit feature alignment or adversarial training. If the claim holds, foundation models like SAM could be adapted for reliable use across varied clinical environments. The paper reports that FSAM outperforms prior methods on fundus and prostate images.

Core claim

FSAM is a framework that incorporates Low-Rank Adaptation (LoRA) and a frequency adapter into SAM to extract domain-invariant high-frequency features, thereby mitigating frequency-related domain shifts for improved single-source domain generalization in medical image segmentation.

What carries the argument

The frequency adapter, which incorporates frequency-domain representations to capture domain-invariant features in the SAM model.
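
To make "high-frequency features" concrete, here is a minimal sketch of one way to isolate them. It assumes a 2D FFT with a circular high-pass mask; the page does not say which transform the paper's adapter uses, so the function name `high_pass` and the `cutoff` parameter are hypothetical and this should not be read as the authors' method.

```python
import numpy as np

def high_pass(image: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Return the high-frequency part of a 2D image.

    `cutoff` is a hypothetical parameter: the radius of the removed
    low-frequency disc, in cycles per pixel (Nyquist is 0.5).
    """
    h, w = image.shape
    # Frequency coordinate of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    # Keep only the bins outside the low-frequency disc.
    mask = radius > cutoff
    spectrum = np.fft.fft2(image)
    return np.real(np.fft.ifft2(spectrum * mask))

rng = np.random.default_rng(0)
image = rng.random((64, 64))
shifted = image + 0.5  # a global brightness offset changes only the DC bin
hf_a = high_pass(image)
hf_b = high_pass(shifted)
```

In this toy case `hf_a` and `hf_b` agree up to floating-point error, because a constant offset lives entirely in the zero-frequency bin that the mask discards. That illustrates only the simplest kind of shift; whether real scanner and protocol differences leave high frequencies untouched is exactly the premise the paper has to demonstrate.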

If this is right

  • FSAM outperforms traditional domain generalization and SAM-based methods on fundus and prostate segmentation tasks.
  • It enables efficient fine-tuning of SAM while addressing frequency discrepancies.
  • The approach focuses on high-frequency features overlooked by spatial-domain methods.
  • It supports single-source domain generalization without needing multiple source domains.
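
The efficiency claim rests on LoRA, which freezes a pre-trained weight matrix W and trains only a low-rank update, so the effective weight becomes W + (alpha / r) * B @ A. The sketch below shows that mechanism on a single linear layer in NumPy. The class name `LoRALinear`, the layer size, and the rank are illustrative assumptions; the page does not state which SAM layers FSAM adapts or at what rank.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, rank: int = 4,
                 alpha: float = 4.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                            # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, (rank, d_in))    # trainable, small random init
        self.B = np.zeros((d_out, rank))                # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); the output has shape (batch, d_out).
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
W = rng.normal(size=(256, 256))   # stand-in for one frozen projection matrix
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 256))
y = layer(x)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and training only has to learn the difference. Here the update has 2,048 trainable values against 65,536 frozen ones.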

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This technique could be adapted to other foundation models for segmentation tasks in different fields.
  • Testing on more diverse medical modalities might reveal additional benefits or limitations of frequency adaptation.
  • The emphasis on frequency domain suggests potential for hybrid spatial-frequency models in general computer vision robustness.

Load-bearing premise

Frequency-domain representations extracted by the adapter are reliably domain-invariant and adding them mitigates frequency-related domain shifts affecting SAM.

What would settle it

Finding that FSAM performs no better, or worse, than standard SAM fine-tuning on a held-out medical dataset with pronounced frequency variations.

Figures

Figures reproduced from arXiv: 2605.09925 by Duc-Tai Le, Hyunseung Choo, Junghyun Bum, Phuoc-Nguyen Bui, Van-Nguyen Pham.

Figure 1: Overview of the proposed frequency-based domain generalization framework with SAM (FSAM). The fire icon represents trainable parameters, while the lock icon indicates frozen parameters retained from the pre-trained model.
Original abstract

Medical image segmentation is a critical task in computer-aided diagnosis and treatment planning. However, deep learning models often struggle to generalize across datasets due to domain shifts arising from variations in imaging protocols, scanner types, and patient populations. Traditional domain generalization (DG) methods utilize causal feature learning, adversarial consistency, and style augmentation to improve segmentation robustness. While effective, these approaches rely on explicit feature alignment, adversarial objectives, or handcrafted augmentations, which may not fully exploit the capabilities of foundation models. Recently, the Segment Anything Model (SAM) has demonstrated strong generalization capabilities in segmentation tasks. SAM-based DG methods attempt to improve medical image segmentation. However, these approaches primarily operate in the spatial domain and overlook frequency-based discrepancies that significantly affect model robustness. In this work, we propose Frequency-based Domain Generalization with SAM (FSAM), a novel framework that integrates Low-Rank Adaptation (LoRA) for efficient fine-tuning and a frequency adapter to incorporate frequency-domain representations for single-source domain generalization. FSAM enhances SAM's segmentation robustness by extracting domain-invariant high-frequency features, mitigating frequency-related domain shifts. Experimental results on fundus and prostate datasets demonstrate that FSAM outperforms existing traditional DG and SAM-based DG approaches in domain generalization. Codes and pre-trained models will be made available on GitHub.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes FSAM, a framework that augments the Segment Anything Model (SAM) with Low-Rank Adaptation (LoRA) for efficient fine-tuning and a frequency adapter module to extract and incorporate domain-invariant high-frequency features. The central claim is that this mitigates frequency-related domain shifts in single-source domain generalization for medical image segmentation, yielding superior performance over traditional DG methods and prior SAM-based approaches on fundus and prostate datasets.

Significance. If the experimental claims hold with proper validation, the work could meaningfully extend foundation-model adaptation in medical imaging by targeting frequency-domain discrepancies that spatial-only methods overlook. The use of LoRA for parameter efficiency is a practical strength, and the promise of releasing code and pre-trained models supports reproducibility. However, the absence of quantitative metrics, ablation results, and implementation details in the current presentation substantially weakens the ability to judge whether the frequency adapter delivers genuine domain-invariant gains beyond what LoRA alone provides.

major comments (3)
  1. [Abstract] Abstract: the claim that 'FSAM outperforms existing traditional DG and SAM-based DG approaches' is stated without any numerical results (e.g., Dice, IoU, or Hausdorff distances), confidence intervals, or statistical tests on the fundus and prostate datasets. This omission prevents evaluation of the magnitude and reliability of the reported gains.
  2. [Method] Method section: the frequency adapter is introduced as extracting 'domain-invariant high-frequency features' yet no concrete description is given of the transform used (FFT, wavelet, etc.), the precise fusion mechanism with SAM's image encoder, or any regularization that would enforce invariance. Without these, it is impossible to verify whether the module reduces frequency shifts or merely adds capacity.
  3. [Experiments] Experiments section: no ablation isolating the frequency adapter from LoRA fine-tuning alone is reported, nor are cross-dataset quantitative tables or visualizations of frequency spectra before/after adaptation provided. These omissions make the central generalization claim impossible to substantiate.
minor comments (1)
  1. [Abstract] The acronym FSAM is defined only after its first use; spelling out 'Frequency-based Domain Generalization with SAM' on first mention would improve readability.
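
For reference, the two overlap metrics the referee names can be computed from binary masks as in the sketch below. The function names and the `eps` smoothing term are illustrative choices, and the toy masks are invented for the example; none of this reflects numbers from the paper.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient: 2 * |P and T| / (|P| + |T|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2 * intersection + eps) / (pred.sum() + target.sum() + eps))

def iou_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection over union: |P and T| / |P or T| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))

# Toy masks: two 8-pixel bands on a 4x4 grid that overlap in one row (4 pixels).
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :] = True
target = np.zeros((4, 4), dtype=bool)
target[1:3, :] = True

dice = dice_score(pred, target)   # 2 * 4 / (8 + 8), about 0.5
iou = iou_score(pred, target)     # 4 / 12, about 0.333
```

The two metrics are monotonically related (Dice = 2 * IoU / (1 + IoU)), so reporting both mainly helps comparison with prior tables rather than adding independent evidence.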

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where additional details will strengthen the manuscript. We agree that the current presentation lacks sufficient quantitative support, methodological specifics, and experimental validation, and we will revise accordingly to address each point.

  1. Referee: [Abstract] Abstract: the claim that 'FSAM outperforms existing traditional DG and SAM-based DG approaches' is stated without any numerical results (e.g., Dice, IoU, or Hausdorff distances), confidence intervals, or statistical tests on the fundus and prostate datasets. This omission prevents evaluation of the magnitude and reliability of the reported gains.

    Authors: We agree that the abstract would benefit from quantitative support. In the revised version, we will include key performance metrics such as Dice and IoU scores on the fundus and prostate datasets, along with direct comparisons to the baselines mentioned. Space permitting, we will also report confidence intervals to better convey the reliability of the gains. revision: yes

  2. Referee: [Method] Method section: the frequency adapter is introduced as extracting 'domain-invariant high-frequency features' yet no concrete description is given of the transform used (FFT, wavelet, etc.), the precise fusion mechanism with SAM's image encoder, or any regularization that would enforce invariance. Without these, it is impossible to verify whether the module reduces frequency shifts or merely adds capacity.

    Authors: We will expand the Method section with a precise description of the frequency adapter. This will specify the frequency transform, the fusion process with SAM's image encoder, and any regularization or invariance-promoting mechanisms. These additions will clarify how the module targets frequency-domain shifts rather than simply increasing model capacity. revision: yes

  3. Referee: [Experiments] Experiments section: no ablation isolating the frequency adapter from LoRA fine-tuning alone is reported, nor are cross-dataset quantitative tables or visualizations of frequency spectra before/after adaptation provided. These omissions make the central generalization claim impossible to substantiate.

    Authors: We will augment the Experiments section with the requested elements. This includes ablation studies separating the frequency adapter's contribution from LoRA, comprehensive cross-dataset tables reporting Dice, IoU, and other metrics, and visualizations of frequency spectra to illustrate the domain-invariant effects. These revisions will provide direct evidence for the generalization improvements. revision: yes
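
One common way to produce the frequency-spectrum comparison discussed in the third point is a radially averaged power spectrum, computed per image and compared across domains. The sketch below is an assumption about how such a check could be done, not a description of the authors' planned figures; `radial_power_spectrum` and `n_bins` are hypothetical names, and a circular box blur stands in for a softer acquisition pipeline.

```python
import numpy as np

def radial_power_spectrum(image: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Mean spectral power in equal-width radial bands from 0 to 0.5 cycles/pixel."""
    h, w = image.shape
    power = np.abs(np.fft.fft2(image)) ** 2
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    # Band index per FFT bin; corner frequencies above 0.5 fall into the last band.
    band = np.clip(np.digitize(radius, edges) - 1, 0, n_bins - 1)
    total = np.bincount(band.ravel(), weights=power.ravel(), minlength=n_bins)
    count = np.bincount(band.ravel(), minlength=n_bins)
    return total / np.maximum(count, 1)

rng = np.random.default_rng(0)
sharp = rng.normal(size=(64, 64))
# A 5x5 circular box blur, applied through the FFT, simulates a softer domain.
kernel = np.zeros((64, 64))
kernel[:5, :5] = 1.0 / 25.0
soft = np.real(np.fft.ifft2(np.fft.fft2(sharp) * np.fft.fft2(kernel)))

profile_sharp = radial_power_spectrum(sharp)
profile_soft = radial_power_spectrum(soft)
```

Plotting the two profiles on a log scale would show the blurred image losing power in the upper bands. Running the same comparison on source and target datasets, and on features before and after the adapter, is the kind of evidence the referee is asking for.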

Circularity Check

0 steps flagged

No significant circularity

The paper presents an empirical architecture proposal (FSAM) that combines LoRA fine-tuning with a frequency adapter module for SAM-based single-source domain generalization. All load-bearing claims rest on experimental results comparing segmentation performance on fundus and prostate datasets against baselines; there is no mathematical derivation, no fitted parameters renamed as predictions, no self-citation chain invoked for uniqueness, and no ansatz smuggled via prior work. The approach is self-contained as a practical extension of existing foundation-model techniques, with performance evaluated externally rather than by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the empirical effectiveness of the frequency adapter; the paper introduces one new component (the frequency adapter) whose benefit is demonstrated only through the reported experiments.

axioms (1)
  • Domain assumption: high-frequency components in medical images are domain-invariant across scanners and protocols.
    Invoked to justify why the frequency adapter should improve generalization; appears in the motivation for the frequency adapter.
invented entities (1)
  • Frequency adapter (no independent evidence)
    Purpose: extract domain-invariant high-frequency features to mitigate frequency-related domain shifts in SAM.
    New module proposed in the paper; no independent evidence outside the reported experiments is provided.

pith-pipeline@v0.9.0 · 5543 in / 1260 out tokens · 45492 ms · 2026-05-12T04:03:58.882791+00:00 · methodology


