arxiv: 2605.09925 · v1 · submitted 2026-05-11 · 💻 cs.CV

Frequency Adapter with SAM for Generalized Medical Image Segmentation

Phuoc-Nguyen Bui, Van-Nguyen Pham, Duc-Tai Le, Junghyun Bum, Hyunseung Choo

Pith reviewed 2026-05-12 04:03 UTC · model grok-4.3

classification 💻 cs.CV

keywords domain generalization · medical image segmentation · SAM · frequency adapter · LoRA · fundus · prostate


The pith

A frequency adapter added to SAM improves generalization in medical image segmentation across domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes FSAM to address domain shifts in medical image segmentation by integrating a frequency adapter with the Segment Anything Model (SAM). The adapter extracts high-frequency features that the authors treat as invariant to variations in imaging equipment and protocols. Combined with LoRA for efficient adaptation, it aims to improve robustness without explicit feature alignment or adversarial training. If the approach holds up, foundation models like SAM could be adapted for reliable use across varied clinical environments. The authors report that FSAM outperforms prior methods on fundus and prostate images.

Core claim

FSAM is a framework that incorporates Low-Rank Adaptation (LoRA) and a frequency adapter into SAM to extract domain-invariant high-frequency features, thereby mitigating frequency-related domain shifts for improved single-source domain generalization in medical image segmentation.

What carries the argument

The argument rests on the frequency adapter, which incorporates frequency-domain representations into SAM to capture domain-invariant features.
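The paper does not say which transform the adapter uses, so the following is only a minimal sketch of one common way to isolate high-frequency content: a hard FFT high-pass filter. The function name `high_frequency_component` and the `cutoff` parameter are illustrative assumptions, not names taken from the paper or from SAM.

```python
import numpy as np


def high_frequency_component(image, cutoff=0.1):
    """Return the high-frequency part of a 2D grayscale image.

    `cutoff` is a hypothetical hyperparameter: the radius, in cycles per
    pixel, of the low-frequency disc that is zeroed out in the spectrum.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    spectrum = np.fft.fft2(image)

    # Frequency coordinates (cycles per pixel) for every spectrum entry.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.hypot(fy, fx)

    # Hard high-pass: drop everything inside the low-frequency disc,
    # including the DC term (overall brightness).
    spectrum[radius <= cutoff] = 0.0
    return np.fft.ifft2(spectrum).real
```

A uniform image carries only the DC term, so the filter maps it to zeros, while fine textures and edges pass through. How FSAM fuses such a map with SAM's image encoder is not described in the source.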

If this is right

FSAM outperforms traditional domain generalization and SAM-based methods on fundus and prostate segmentation tasks.
It enables efficient fine-tuning of SAM while addressing frequency discrepancies.
The approach focuses on high-frequency features overlooked by spatial-domain methods.
It supports single-source domain generalization without needing multiple source domains.
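For readers unfamiliar with LoRA, the sketch below shows the general technique on a single linear layer: the pretrained weight stays frozen and only a low-rank update is trained. The class name `LoRALinear` and the rank and scaling values are illustrative assumptions; the LoRA configuration FSAM applies to SAM is not given in this page.

```python
import numpy as np


class LoRALinear:
    """Linear layer y = x W^T with a trainable low-rank update (alpha / r) * B A."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pretrained weight, never updated
        # Trainable factors: A starts small and random, B starts at zero,
        # so the adapted layer initially matches the pretrained one.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, in_dim); the update is applied as two thin matmuls.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_parameters(self):
        return self.A.size + self.B.size
```

For a 16 x 32 weight with rank 4, the update trains 192 values instead of 512, which is the source of the efficiency the list above refers to.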

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This technique could be adapted to other foundation models for segmentation tasks in different fields.
Testing on more diverse medical modalities might reveal additional benefits or limitations of frequency adaptation.
The emphasis on frequency domain suggests potential for hybrid spatial-frequency models in general computer vision robustness.

Load-bearing premise

Frequency-domain representations extracted by the adapter are reliably domain-invariant and adding them mitigates frequency-related domain shifts affecting SAM.

What would settle it

The claim would be undermined if FSAM showed no improvement over, or performed worse than, standard SAM fine-tuning on a held-out medical dataset with pronounced frequency variations.

Figures

Figures reproduced from arXiv: 2605.09925 by Duc-Tai Le, Hyunseung Choo, Junghyun Bum, Phuoc-Nguyen Bui, Van-Nguyen Pham.

**Figure 1.** Overview of the proposed frequency-based domain generalization framework with SAM (FSAM). The fire icon represents trainable parameters, while the lock icon indicates frozen parameters retained from the pre-trained model.

Original abstract

Medical image segmentation is a critical task in computer-aided diagnosis and treatment planning. However, deep learning models often struggle to generalize across datasets due to domain shifts arising from variations in imaging protocols, scanner types, and patient populations. Traditional domain generalization (DG) methods utilize causal feature learning, adversarial consistency, and style augmentation to improve segmentation robustness. While effective, these approaches rely on explicit feature alignment, adversarial objectives, or handcrafted augmentations, which may not fully exploit the capabilities of foundation models. Recently, the Segment Anything Model (SAM) has demonstrated strong generalization capabilities in segmentation tasks. SAM-based DG methods attempt to improve medical image segmentation. However, these approaches primarily operate in the spatial domain and overlook frequency-based discrepancies that significantly affect model robustness. In this work, we propose Frequency-based Domain Generalization with SAM (FSAM), a novel framework that integrates Low-Rank Adaptation (LoRA) for efficient fine-tuning and a frequency adapter to incorporate frequency-domain representations for single-source domain generalization. FSAM enhances SAM's segmentation robustness by extracting domain-invariant high-frequency features, mitigating frequency-related domain shifts. Experimental results on fundus and prostate datasets demonstrate that FSAM outperforms existing traditional DG and SAM-based DG approaches in domain generalization. Codes and pre-trained models will be made available on GitHub.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FSAM tacks a frequency adapter onto SAM with LoRA for single-source medical segmentation DG, but the abstract gives no numbers or ablations so the gains are impossible to check.


The main takeaway is that this paper proposes FSAM to handle frequency-domain shifts in medical images by adding a dedicated adapter to SAM while using LoRA for efficient tuning. It targets single-source domain generalization on fundus and prostate data and claims to beat both standard DG baselines and prior SAM adaptations. That combination is the concrete new element relative to the cited SAM-DG work, which stayed mostly spatial.

The framing is reasonable: medical scanners often introduce frequency discrepancies that spatial-only fine-tuning misses, and pulling in high-frequency invariants could help without needing multiple source domains. LoRA keeps the compute practical, which is a plus for anyone adapting large models to limited medical data. The architecture sketch itself is straightforward and could serve as a starting point for similar extensions.

The soft spots are the missing pieces that matter most. The abstract states outperformance but supplies no Dice scores, no statistical tests, no ablation isolating the frequency adapter from LoRA alone, and no description of how the frequency features are actually extracted or fused. Without those, it is hard to tell whether the adapter reduces the claimed shifts or simply adds parameters that happen to fit the test sets. The central claim therefore rests on unreported experiments.

This work is aimed at researchers already experimenting with SAM or other foundation models in medical segmentation who want a quick frequency-domain lever for domain shifts. A reader in that niche could borrow the high-level design, but the paper will not move the broader literature until the results are shown in detail. I would send it for peer review because the problem is practical and the proposed fix is simple enough that referees can ask for the necessary ablations and metrics without starting from scratch.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes FSAM, a framework that augments the Segment Anything Model (SAM) with Low-Rank Adaptation (LoRA) for efficient fine-tuning and a frequency adapter module to extract and incorporate domain-invariant high-frequency features. The central claim is that this mitigates frequency-related domain shifts in single-source domain generalization for medical image segmentation, yielding superior performance over traditional DG methods and prior SAM-based approaches on fundus and prostate datasets.

Significance. If the experimental claims hold with proper validation, the work could meaningfully extend foundation-model adaptation in medical imaging by targeting frequency-domain discrepancies that spatial-only methods overlook. The use of LoRA for parameter efficiency is a practical strength, and the promise of releasing code and pre-trained models supports reproducibility. However, the absence of quantitative metrics, ablation results, and implementation details in the current presentation substantially weakens the ability to judge whether the frequency adapter delivers genuine domain-invariant gains beyond what LoRA alone provides.

major comments (3)

[Abstract] Abstract: the claim that 'FSAM outperforms existing traditional DG and SAM-based DG approaches' is stated without any numerical results (e.g., Dice, IoU, or Hausdorff distances), confidence intervals, or statistical tests on the fundus and prostate datasets. This omission prevents evaluation of the magnitude and reliability of the reported gains.
[Method] Method section: the frequency adapter is introduced as extracting 'domain-invariant high-frequency features' yet no concrete description is given of the transform used (FFT, wavelet, etc.), the precise fusion mechanism with SAM's image encoder, or any regularization that would enforce invariance. Without these, it is impossible to verify whether the module reduces frequency shifts or merely adds capacity.
[Experiments] Experiments section: no ablation isolating the frequency adapter from LoRA fine-tuning alone is reported, nor are cross-dataset quantitative tables or visualizations of frequency spectra before/after adaptation provided. These omissions make the central generalization claim impossible to substantiate.
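The overlap metrics named in the first comment can be computed as in the sketch below, which assumes binary masks and uses the standard definitions of Dice and IoU. The function name `dice_and_iou` and the smoothing term `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-7):
    """Return (Dice, IoU) for two binary segmentation masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()

    # eps avoids division by zero; two empty masks score 1.0 on both metrics.
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)
```

Reporting these per target domain, with the LoRA-only baseline alongside the full model, would be one way to present the ablation the third comment asks for.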

minor comments (1)

[Abstract] The acronym FSAM is defined only after its first use; spelling out 'Frequency-based Domain Generalization with SAM' on first mention would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where additional details will strengthen the manuscript. We agree that the current presentation lacks sufficient quantitative support, methodological specifics, and experimental validation, and we will revise accordingly to address each point.


Referee: [Abstract] Abstract: the claim that 'FSAM outperforms existing traditional DG and SAM-based DG approaches' is stated without any numerical results (e.g., Dice, IoU, or Hausdorff distances), confidence intervals, or statistical tests on the fundus and prostate datasets. This omission prevents evaluation of the magnitude and reliability of the reported gains.

Authors: We agree that the abstract would benefit from quantitative support. In the revised version, we will include key performance metrics such as Dice and IoU scores on the fundus and prostate datasets, along with direct comparisons to the baselines mentioned. Space permitting, we will also report confidence intervals to better convey the reliability of the gains. revision: yes
Referee: [Method] Method section: the frequency adapter is introduced as extracting 'domain-invariant high-frequency features' yet no concrete description is given of the transform used (FFT, wavelet, etc.), the precise fusion mechanism with SAM's image encoder, or any regularization that would enforce invariance. Without these, it is impossible to verify whether the module reduces frequency shifts or merely adds capacity.

Authors: We will expand the Method section with a precise description of the frequency adapter. This will specify the frequency transform, the fusion process with SAM's image encoder, and any regularization or invariance-promoting mechanisms. These additions will clarify how the module targets frequency-domain shifts rather than simply increasing model capacity. revision: yes
Referee: [Experiments] Experiments section: no ablation isolating the frequency adapter from LoRA fine-tuning alone is reported, nor are cross-dataset quantitative tables or visualizations of frequency spectra before/after adaptation provided. These omissions make the central generalization claim impossible to substantiate.

Authors: We will augment the Experiments section with the requested elements. This includes ablation studies separating the frequency adapter's contribution from LoRA, comprehensive cross-dataset tables reporting Dice, IoU, and other metrics, and visualizations of frequency spectra to illustrate the domain-invariant effects. These revisions will provide direct evidence for the generalization improvements. revision: yes
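One possible way to produce the frequency-spectrum comparison discussed above is a radially averaged power spectrum, computed per image and then compared across source and target domains. The sketch below is an assumption about how such a diagnostic could be built, not the authors' procedure; `radial_power_spectrum` and `n_bins` are hypothetical names.

```python
import numpy as np


def radial_power_spectrum(image, n_bins=16):
    """Average spectral power of a 2D image in equal-width radial frequency bins."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    power = np.abs(np.fft.fft2(image)) ** 2

    # Radial frequency (cycles per pixel) of every spectrum entry.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.hypot(fy, fx).ravel()

    # Assign each entry to a bin; the largest radius falls into the last bin.
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    idx = np.clip(np.digitize(radius, edges) - 1, 0, n_bins - 1)

    total = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    count = np.bincount(idx, minlength=n_bins)
    return total / np.maximum(count, 1)  # empty bins report zero power
```

Plotting these profiles for images from different scanners would show in which frequency bands the domains differ.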

Circularity Check

0 steps flagged

No significant circularity


The paper presents an empirical architecture proposal (FSAM) that combines LoRA fine-tuning with a frequency adapter module for SAM-based single-source domain generalization. All load-bearing claims rest on experimental results comparing segmentation performance on fundus and prostate datasets against baselines; there is no mathematical derivation, no fitted parameters renamed as predictions, no self-citation chain invoked for uniqueness, and no ansatz smuggled via prior work. The approach is self-contained as a practical extension of existing foundation-model techniques, with performance evaluated externally rather than by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the empirical effectiveness of the frequency adapter; the paper introduces one new component (the frequency adapter) whose benefit is demonstrated only through the reported experiments.

axioms (1)

Domain assumption: High-frequency components in medical images are domain-invariant across scanners and protocols.
Invoked to justify why the frequency adapter should improve generalization; appears in the motivation for the frequency adapter.

invented entities (1)

Frequency adapter (no independent evidence)
purpose: Extract domain-invariant high-frequency features to mitigate frequency-related domain shifts in SAM
New module proposed in the paper; no independent evidence outside the reported experiments is provided.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We design a Frequency Adapter that aggregates high-frequency components, improving robustness against domain shifts."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "FSAM enhances SAM's segmentation robustness by extracting domain-invariant high-frequency features"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
