{"paper":{"title":"SpectraFlow: Unifying Structural Pretraining and Frequency Adaptation for Medical Image Segmentation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Aligning images and binary masks in a shared latent space through latent transport regression produces transferable structural representations that improve medical image segmentation accuracy and boundary precision in low-data regimes.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Guowei Zou, Haitao Wang, Hejun Wu, Zhiquan Chen","submitted_at":"2026-05-14T08:37:10Z","abstract_excerpt":"Medical image segmentation remains challenging in low-data regimes, where scarce annotations often yield poor generalization and ambiguous boundaries with missing fine structures. Recent self-supervised pretraining has improved transferability, but it often exhibits a texture bias. In contrast, accurate segmentation is inherently geometry-aware and depends on both topological consistency and precise boundary preservation. To address this problem, we propose a two-stage framework that couples structure-aware encoder pretraining with boundary-oriented decoding. In Stage-1, we aim to learn struct"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experiments on ISIC-2016, Kvasir-SEG, and GlaS demonstrate consistent gains over state-of-the-art methods, with improved robustness in low-data settings and sharper boundary delineation.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That aligning images and binary masks through latent transport regression in a shared latent space produces task-agnostic structural representations that transfer effectively to downstream segmentation without bias from the mask generation or pretraining process.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"SpectraFlow combines structure-aware pretraining with mask-guided latent alignment and frequency-directional decoding to improve medical image segmentation accuracy and boundary sharpness in low-data regimes.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Aligning images and binary masks in a shared latent space through latent transport regression produces transferable structural representations that improve medical image segmentation accuracy and boundary precision in low-data regimes.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"241f9a2c60bfd9ad5ec5950bc1c05d0c3b69fee216d78274eb2149b8a7784596"},"source":{"id":"2605.14566","kind":"arxiv","version":1},"verdict":{"id":"d340a272-0cd2-469a-9c10-97426629ae54","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T01:53:36.558700Z","strongest_claim":"Experiments on ISIC-2016, Kvasir-SEG, and GlaS demonstrate consistent gains over state-of-the-art methods, with improved robustness in low-data settings and sharper boundary delineation.","one_line_summary":"SpectraFlow combines structure-aware pretraining with mask-guided latent alignment and frequency-directional decoding to improve medical image segmentation accuracy and boundary sharpness in low-data regimes.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That aligning images and binary masks through latent transport regression in a shared latent space produces task-agnostic structural representations that transfer effectively to downstream segmentation without bias from the mask generation or pretraining process.","pith_extraction_headline":"Aligning images and binary masks in a shared latent space through latent transport regression produces transferable structural representations that improve medical image segmentation accuracy and boundary precision in low-data regimes."},"references":{"count":30,"sample":[{"doi":"","year":2022,"title":"In: ICLR (2022) 2, 5","work_id":"b2444808-f812-4349-a0ab-d7afe9551e50","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-unet: Unet-like pure transformer for medical image segmentation (2021) 2","work_id":"9340cda6-3366-4880-bf09-667ef6e4dc74","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: ICCV. pp. 9650–9660 (2021) 3, 5","work_id":"c4675887-025e-4217-92ff-f9a7c378dcc5","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., Zhou, Y.: Transunet: Transformers make strong encoders for medical image segmentation. In: MICCAI. pp. 127–136 (2021) 1","work_id":"546d5372-7d9d-415e-9ba9-e53e72c7168a","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Chen, L., Gu, L., Li, L., Yan, C., Fu, Y.: Frequency dynamic convolution for dense image prediction. In: CVPR. pp. 30178–30188 (2025) 3, 8","work_id":"852e6241-a666-45e6-bcd4-09024ba7f7de","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":30,"snapshot_sha256":"2888afe5f13ebf0b81989453714223cb7cdc07d65144212e8fd3c8ea1d17eaaa","internal_anchors":0},"formal_canon":{"evidence_count":1,"snapshot_sha256":"15868b03c734f8037f6537680666d405dd986a5bcd6e2c175eda386d584f7b62"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}