TopoMamba: Topology-Aware Scanning and Fusion for Segmenting Heterogeneous Medical Visual Media

Chengpei Xu; Chi-Man Pun; Fuchen Zheng; Haolun Li; Junhua Zhou; Lei Zhao; Long Ma; Quanjun Li; Shoujun Zhou; Weihuang Liu

arxiv: 2604.25545 · v2 · submitted 2026-04-28 · 💻 cs.CV

Fuchen Zheng, Chengpei Xu, Long Ma, Weixuan Li, Junhua Zhou, Xuhang Chen, Weihuang Liu, Haolun Li, Quanjun Li, Zhenxi Zhang, Lei Zhao, Chi-Man Pun, Shoujun Zhou

Pith reviewed 2026-05-07 16:42 UTC · model grok-4.3

classification 💻 cs.CV

keywords medical image segmentation · state space models · topology aware scanning · feature fusion · HSIC gate · 3D segmentation · SSMs

The pith

TopoMamba adds diagonal and anti-diagonal scans to state-space models and fuses them with a dependence-aware gate to improve segmentation of curved structures in medical images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that standard visual state-space models lose effectiveness on medical images because their axis-aligned scans overlook oblique and curved anatomy while naive branch fusion adds redundancy. It proposes combining a TopoA-Scan branch that processes diagonal and anti-diagonal directions with the usual Cross-Scan, then regulating the two streams through an HSIC Gate that applies a scalar dependence measure. If this holds, segmentation quality would rise on thin or irregular targets such as the pancreas or gallbladder across CT, dermoscopy, and endoscopy data, without sacrificing speed under changing input sizes. The work also supplies a caching scheme that reuses scan indices for repeated resolutions and extends the design to volumetric 3D cases.

Core claim

TopoMamba augments state-space models with a TopoA-Scan branch that traverses diagonal and anti-diagonal paths to capture complementary structural priors, merges the resulting features with the standard Cross-Scan branch via an HSIC Gate that uses a Hilbert-Schmidt independence criterion scalar to control interaction, and employs ScanCache to amortize index construction across varying resolutions. Experiments on Synapse CT, ISIC 2017 dermoscopy, and CVC-ClinicDB endoscopy demonstrate consistent gains over CNN, Transformer, and baseline SSM methods, with the largest benefits appearing on thin or curved anatomical targets, while the 3D instantiation supports practical volumetric segmentation.

What carries the argument

The TopoA-Scan branch (diagonal and anti-diagonal ordering), paired with the Cross-Scan branch and regulated by the HSIC Gate. The pairing supplies complementary priors for oblique structures, and the gate limits redundant fusion.
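
A minimal sketch of one way to build a diagonal traversal with alternating direction, in Python. The function name `zigzag_diagonal_order` and the `anti` flag are illustrative assumptions, not the authors' code, and the paper's exact TopoA-Scan ordering may differ. The alternating reversal mirrors the boundary property quoted from the paper's appendix further down this page, where the last cell of one diagonal and the first cell of the next are 4-neighbors.

```python
def zigzag_diagonal_order(height, width, anti=False):
    """Visit an H x W grid diagonal by diagonal, reversing every other one.

    Hypothetical helper for illustration only. Returns (row, col) pairs.
    With anti=True the columns are mirrored, giving the anti-diagonal sweep.
    """
    order = []
    for d in range(height + width - 1):
        # Cells on diagonal d satisfy row + col == d.
        r_lo = max(0, d - (width - 1))
        r_hi = min(height - 1, d)
        segment = [(r, d - r) for r in range(r_lo, r_hi + 1)]
        if d % 2 == 0:
            segment.reverse()  # alternate the sweep direction
        order.extend(segment)
    if anti:
        order = [(r, width - 1 - c) for r, c in order]
    return order


def to_flat_indices(order, width):
    """Convert (row, col) pairs to row-major flat indices for a gather."""
    return [r * width + c for r, c in order]
```

On a 3 x 3 grid, `to_flat_indices(zigzag_diagonal_order(3, 3), 3)` gives the familiar zigzag sequence 0, 1, 3, 6, 4, 2, 5, 7, 8. Every consecutive pair of cells is either a diagonal neighbor (inside a segment) or a 4-neighbor (between segments), so the flattened sequence never jumps across the image.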

If this is right

Segmentation accuracy rises on thin or curved anatomical structures across CT, dermoscopy, and endoscopy.
The method retains favorable runtime and memory use compared with Transformer and standard SSM baselines under variable input resolutions.
A single 3D instantiation extends the same scan-and-gate design to volumetric clinical data.
The caching mechanism reduces repeated computation when input sizes recur in clinical workflows.
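
The caching idea in the last point can be sketched with a memoized index builder. This is an assumption-laden illustration, not ScanCache itself: the function name `scan_indices`, the string `device` key, and the column-major ordering used as a stand-in scan are all hypothetical.

```python
from functools import lru_cache

# Counts how often an index table is actually constructed (illustration only).
BUILD_COUNT = {"n": 0}


@lru_cache(maxsize=64)
def scan_indices(height, width, device):
    """Hypothetical cache: one index table per (height, width, device) key.

    A column-major traversal stands in for any explicit scan ordering.
    """
    BUILD_COUNT["n"] += 1
    return tuple(r * width + c for c in range(width) for r in range(height))
```

Calling `scan_indices(2, 3, "cpu")` twice builds the table once and returns the same tuple object the second time; a new resolution or device key triggers one more build. The bounded `maxsize` is a design choice that keeps memory in check when many distinct resolutions appear.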

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same diagonal-scan ordering might help other vision tasks that involve non-grid-aligned features, such as vessel tracing or road segmentation.
An HSIC-style dependence gate could be tested in multi-modal fusion settings where branch redundancy is a known issue.
If the topology priors prove stable across modalities, training regimes might require less aggressive augmentation focused on orientation changes.

Load-bearing premise

The added diagonal and anti-diagonal scans supply genuinely new structural information that the standard scans miss, and the HSIC scalar gate can balance the branches without discarding useful detail or introducing fitting artifacts.

What would settle it

Running the same evaluation sets with the TopoA-Scan and HSIC Gate removed or replaced by a simple average fusion, then measuring whether accuracy on curved targets such as the pancreas drops back to baseline levels.

Figures

Figures reproduced from arXiv: 2604.25545 by Chengpei Xu, Chi-Man Pun, Fuchen Zheng, Haolun Li, Junhua Zhou, Lei Zhao, Long Ma, Quanjun Li, Shoujun Zhou, Weihuang Liu, Weixuan Li, Xuhang Chen, Zhenxi Zhang.

**Figure 1.** Motivation and overview of TopoMamba. Left: axis-biased scanning disrupts non-axial structural continuity in 3D CT, endoscopy, and dermoscopy. Right: Cross-Scan and TopoA-Scan are fused by the HSIC Gate to better preserve non-axial continuity and suppress false positives.

… scan boosts pancreas Dice from 77.27% to 79.72% (+2.45 points). • We design a plug-and-play HSIC Gate. It uses only one learnable scalar…

**Figure 2.** Architecture of TopoMamba-3D. (a) 3D U-shaped segmentation network with patch embedding, hierarchical TopoMamba blocks, patch merging and expanding, and skip connections. (b) TopoMamba block with TopoA-Scan and Cross-Scan coupled via ScanCache, state-space sequence modeling, HSIC Gate fusion, view-aware reweighting, and a 3D feed-forward module.

… one learnable scalar to regulate the relative contribution of…

**Figure 3.** Effective receptive fields (ERFs) [38] before and after training, averaged over 300 slices. ERFs are computed from unit input perturbations and normalized output-gradient energy. Fusion-Scan denotes Cross-Scan fused with TopoA-Scan via the HSIC Gate.

… We first apply Johnson-Lindenstrauss random projection [35], [39] to reduce computational overhead and stabilize kernel computation: Xscan = FscanP √ L ∈ R B…

**Figure 4.** Qualitative ETMS attention comparison. Both branches share similar…

**Figure 5.** Representative Synapse test cases.

TABLE III. Topology-oriented evaluation on Synapse. Lower CCE/HCE and higher ETM indicate better topology preservation.

| Method | CCE↓ | HCE↓ | ETM (%)↑ |
| --- | --- | --- | --- |
| VM-UNet [7] | 4.86 | 0.75 | 32.2 |
| Swin-UMamba [6] | 4.79 | 0.68 | 34.6 |
| H-SAM (GT Box) [47] | 3.41 | 0.54 | 37.6 |
| MedSegDiff [44] | 3.29 | 0.52 | 38.1 |
| TopoMamba-2D | 2.89 | 0.48 | 41.8 |
| TopoMamba-3D | 2.18 | 0.44 | 49.7 |
| +Topology-Aware Loss [53] | 1.54 | 0.38 | 58.9 |

… on Synaps…

**Figure 6.** Representative ISIC 2017 and CVC-ClinicDB test cases (fixed random seed). Additional random cases are provided in the supplementary material.

Original abstract

Visual state-space models (SSMs) have shown strong potential for medical image segmentation, yet their effectiveness is often limited by two practical issues: axis-biased scan ordering weakens the modeling of oblique and curved structures, and naive multi-branch fusion tends to amplify redundant responses. We present TopoMamba, a topology-aware scan-and-fuse framework for segmenting heterogeneous medical visual media. The method combines a diagonal/anti-diagonal TopoA-Scan branch with the standard Cross-Scan branch to provide complementary structural priors, and introduces ScanCache, a device-aware caching mechanism that amortizes explicit scan-index construction across recurring resolutions. To fuse heterogeneous scan features efficiently, we further propose a lightweight HSIC Gate that regulates branch interaction using a dependence-aware scalar gating rule. We also instantiate a volumetric TopoMamba-3D for practical 3D clinical segmentation. Experiments on Synapse CT, ISIC 2017 dermoscopy, and CVC-ClinicDB endoscopy show that TopoMamba consistently improves segmentation quality over strong CNN, Transformer, and SSM baselines, with particularly clear gains on thin or curved targets such as the pancreas and gallbladder, while maintaining favorable deployment efficiency under dynamic input resolutions. These results suggest that topology-aware scan ordering and lightweight dependence-aware fusion form an effective and practical design for medical multimedia segmentation. The code will be made publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TopoMamba adds diagonal scans and an HSIC gate to Mamba for curved medical structures, but the complementarity and gate claims rest on untested assumptions.

The main point is that TopoMamba tries to reduce axis bias in state-space model scans by pairing the usual Cross-Scan with a diagonal and anti-diagonal TopoA-Scan branch, then fusing the two with a lightweight HSIC scalar gate while caching scan indices for speed across resolutions. It also gives a 3D version for volumetric work. This targets a genuine limitation in current SSM vision models when segmenting thin or oblique anatomy in CT, dermoscopy, and endoscopy data.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces TopoMamba, a topology-aware scan-and-fuse framework for medical image segmentation based on visual state-space models. It augments the standard Cross-Scan with a diagonal/anti-diagonal TopoA-Scan branch to capture oblique and curved structures, adds ScanCache for amortizing scan-index construction across resolutions, and proposes a lightweight HSIC Gate that fuses multi-branch features via a dependence-aware scalar. A 3D volumetric extension is also instantiated. Experiments on Synapse CT, ISIC 2017 dermoscopy, and CVC-ClinicDB endoscopy are reported to show consistent gains over CNN, Transformer, and SSM baselines, especially on thin/curved targets such as the pancreas and gallbladder, while preserving deployment efficiency under varying input resolutions.

Significance. If the empirical claims hold after proper validation, the work would offer a practical advance in applying SSMs to heterogeneous medical imaging by mitigating axis-biased scanning and redundant fusion. The emphasis on efficiency under dynamic resolutions and the 3D extension address real clinical constraints. The promised public code release, once available, would be a positive factor for reproducibility.

major comments (3)

[Abstract / Experiments] Abstract and Experiments section: The central claim of consistent outperformance and particular gains on thin/curved targets is stated without any quantitative metrics (e.g., Dice scores, IoU, Hausdorff distance), error bars, statistical tests, or detailed baseline configurations. This absence prevents evaluation of the magnitude and reliability of the reported improvements.
[Method (TopoA-Scan / HSIC Gate)] Method (TopoA-Scan and HSIC Gate): The assumption that the diagonal/anti-diagonal TopoA-Scan supplies genuinely complementary structural priors beyond axis-aligned Cross-Scan, and that the HSIC Gate regulates interaction via a single dependence-aware scalar without introducing fitting artifacts or information loss, is load-bearing but unsupported by any ablation isolating each component or analysis of the gate's information-preservation properties.
[Experiments] Experiments section: No ablation tables or controlled studies are referenced that would demonstrate the incremental benefit of TopoA-Scan over simply adding extra scan branches/parameters, or that observed gains on pancreas/gallbladder exceed what would be expected from increased model capacity alone.
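
For readers unfamiliar with the overlap metrics named in the first comment, the following is a minimal sketch of Dice and IoU on flattened binary masks. The function name `dice_and_iou` and the convention for two empty masks are assumptions for illustration; the paper's evaluation code is not shown on this page.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length sequences of 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - inter
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    dice = 2.0 * inter / (pred_sum + target_sum)
    iou = inter / union
    return dice, iou
```

For the toy masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]`, one pixel overlaps out of two foreground pixels in each mask, so Dice is 0.5 and IoU is 1/3. Dice is always at least as large as IoU, which is worth keeping in mind when comparing numbers reported under different metrics.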

minor comments (2)

[Abstract] The abstract is concise but would be strengthened by including one or two key quantitative results (e.g., average Dice improvement) to ground the performance claims.
[Method] Notation for the HSIC Gate scalar and the exact dependence measure should be defined more explicitly in the method section to aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below, indicating the revisions we will incorporate.

Referee: [Abstract / Experiments] Abstract and Experiments section: The central claim of consistent outperformance and particular gains on thin/curved targets is stated without any quantitative metrics (e.g., Dice scores, IoU, Hausdorff distance), error bars, statistical tests, or detailed baseline configurations. This absence prevents evaluation of the magnitude and reliability of the reported improvements.

Authors: We acknowledge that the abstract states the improvements qualitatively. The Experiments section presents quantitative results via tables reporting Dice, IoU, and related metrics on Synapse, ISIC 2017, and CVC-ClinicDB, with comparisons to CNN, Transformer, and SSM baselines. To address the concern directly, we will revise the abstract to include specific key numbers (e.g., mean Dice on Synapse and the improvement on pancreas). We will also add standard deviation error bars to the tables, include statistical significance tests (e.g., paired t-tests or Wilcoxon) for the main comparisons, and expand the description of baseline configurations and training protocols in the main text. revision: yes
Referee: [Method (TopoA-Scan / HSIC Gate)] Method (TopoA-Scan and HSIC Gate): The assumption that the diagonal/anti-diagonal TopoA-Scan supplies genuinely complementary structural priors beyond axis-aligned Cross-Scan, and that the HSIC Gate regulates interaction via a single dependence-aware scalar without introducing fitting artifacts or information loss, is load-bearing but unsupported by any ablation isolating each component or analysis of the gate's information-preservation properties.

Authors: We agree that explicit isolation of each component would strengthen the claims. We will add ablation experiments that disable the TopoA-Scan branch (replacing it with a capacity-matched extra axis-aligned branch) and report the resulting drop in performance on curved structures. For the HSIC Gate, we will add an analysis comparing HSIC dependence scores before and after gating, together with a direct comparison against simple addition and concatenation baselines to quantify information preservation and rule out fitting artifacts. These results will appear in a dedicated ablation subsection and table. revision: yes
Referee: [Experiments] Experiments section: No ablation tables or controlled studies are referenced that would demonstrate the incremental benefit of TopoA-Scan over simply adding extra scan branches/parameters, or that observed gains on pancreas/gallbladder exceed what would be expected from increased model capacity alone.

Authors: We will introduce new controlled ablations that match parameter count exactly: one variant adds redundant scan branches without topology awareness, and another scales the baseline SSM capacity to match TopoMamba. We will report per-organ Dice scores on pancreas and gallbladder for these capacity-controlled variants alongside the full model, demonstrating that the observed gains exceed those attributable to capacity alone. The new table and discussion will be placed in the Experiments section. revision: yes
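
The paired significance test mentioned in the first response can be sketched as follows. This is a generic paired t statistic over per-case scores, written with the Python standard library; the function name `paired_t_statistic` is hypothetical, and nothing here reflects the authors' actual analysis or data.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-case scores.

    scores_a and scores_b hold one score per test case, in the same
    case order, for example per-case Dice of two methods.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0.0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1
```

The returned statistic would then be compared against a t distribution with `n - 1` degrees of freedom. When per-case differences are clearly non-normal, the Wilcoxon signed-rank test named in the response is the usual alternative.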

Circularity Check

0 steps flagged

No significant circularity; method is algorithmic construction validated by experiments.

full rationale

The paper presents TopoMamba as a practical algorithmic framework (TopoA-Scan + Cross-Scan + HSIC Gate + ScanCache) for medical segmentation. No equations, derivations, or self-referential reductions appear in the abstract or described claims that would make any 'prediction' equivalent to its inputs by construction. Improvements are shown via empirical results on Synapse CT, ISIC 2017, and CVC-ClinicDB rather than fitted parameters renamed as outputs. No load-bearing self-citations, uniqueness theorems, or smuggled ansatzes are referenced in the provided text. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, mathematical axioms, or postulated entities; the contributions are algorithmic modules whose internal hyperparameters are not described.

Reference graph

Works this paper leans on

62 extracted references · 7 canonical work pages · 3 internal anchors

[1] J. Ma, Y. He, F. Li, L. Han, C. You, and B. Wang, “Segment anything in medical images,” Nature Communications, vol. 15, no. 1, p. 654, 2024.
[2] S. Zhang, Q. Zhang, S. Zhang, X. Liu, J. Yue, M. Lu, H. Xu, J. Yao, X. Wei, J. Cao et al., “A generalist foundation model and database for open-world medical image segmentation,” Nature Biomedical Engineering, pp. 1–16, 2025.
[3] L. Zhu, B. Liao, Q. Zhang, X. Wang, W. Liu, and X. Wang, “Vision mamba: Efficient visual representation learning with bidirectional state space model,” arXiv preprint arXiv:2401.09417, 2024.
[4] Y. Liu, Y. Tian, Y. Zhao, H. Yu, L. Xie, Y. Wang, Q. Ye, and Y. Liu, “Vmamba: Visual state space model,” arXiv preprint arXiv:2401.10166, 2024.
[5] J. Ma, F. Li, and B. Wang, “U-mamba: Enhancing long-range dependency for biomedical image segmentation,” arXiv preprint arXiv:2401.04722, 2024.
[6] J. Liu, H. Yang, H.-Y. Zhou, Y. Xi, L. Yu, C. Li, Y. Liang, G. Shi, Y. Yu, S. Zhang et al., “Swin-umamba: Mamba-based unet with imagenet-based pretraining,” in International conference on medical image computing and computer-assisted intervention. Springer, 2024, pp. 615–625.
[7] J. Ruan, J. Li, and S. Xiang, “Vm-unet: Vision mamba unet for medical image segmentation,” ACM Transactions on Multimedia Computing, Communications and Applications, 2024.
[8] V. T. Hu, S. A. Baumann, M. Gui, O. Grebenkova, P. Ma, J. Fischer, and B. Ommer, “Zigma: A dit-style zigzag mamba diffusion model,” in European conference on computer vision. Springer, 2024, pp. 148–166.
[9] Y. Qi, Y. He, X. Qi, Y. Zhang, and G. Yang, “Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 6070–6079.
[10] S. Laine and T. Aila, “Temporal ensembling for semi-supervised learning,” in ICLR, 2017.
[11] K. Chen, B. Chen, C. Liu, W. Li, Z. Zou, and Z. Shi, “Rsmamba: Remote sensing image classification with state space model,” IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1–5, 2024.
[12] C. Yang, Z. Chen, M. Espinosa, L. Ericsson, Z. Wang, J. Liu, and E. J. Crowley, “Plainmamba: Improving non-hierarchical mamba in visual recognition,” arXiv preprint arXiv:2403.17695, 2024.
[13] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf, “Measuring statistical dependence with hilbert-schmidt norms,” in International conference on algorithmic learning theory. Springer, 2005, pp. 63–77.
[14] A. Gretton, K. Fukumizu, C. Teo, L. Song, B. Schölkopf, and A. Smola, “A kernel statistical test of independence,” Advances in neural information processing systems, vol. 20, 2007.
[15] L. Song, A. Smola, A. Gretton, J. Bedo, and K. Borgwardt, “Feature selection via dependence maximization,” The Journal of Machine Learning Research, vol. 13, no. 1, pp. 1393–1434, 2012.
[16] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical image computing and computer-assisted intervention, 2015, pp. 234–241.
[17] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: Redesigning skip connections to exploit multiscale features in image segmentation,” IEEE Transactions on Medical Imaging, 2019.
[18] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in International Conference on Learning Representations, 2021.
[19] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 9992–10002.
[20] Z. Wang, J. Gu, W. Zhou, Q. He, T. Zhao, J. Guo, L. Lu, T. He, and J. Bu, “Neural memory state space models for medical image segmentation,” International Journal of Neural Systems, vol. 35, no. 1, p. 2450068, 2025.
[21] S. Li, X. Li, P. Wang, K. Liu, B. Wei, and J. Cong, “An enhanced visual state space model for myocardial pathology segmentation in multi-sequence cardiac mri,” Medical Physics, vol. 52, no. 6, pp. 4355–4370, 2025.
[22] X. Wang and B. Li, “Dcss-unet: Unet based on state space model for polyp segmentation,” Frontiers in Computing and Intelligent Systems, vol. 9, no. 3, pp. 32–39, 2024.
[23] H. Wang, Y. Zhu, B. Green, H. Adam, A. Yuille, and L.-C. Chen, “Axial-deeplab: Stand-alone axial-attention for panoptic segmentation,” in European conference on computer vision. Springer, 2020, pp. 108–126.
[24] Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, “Ccnet: Criss-cross attention for semantic segmentation,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 603–612.
[25] H. Tang, L. Cheng, G. Huang, Z. Tan, J. Lu, and K. Wu, “Rotate to scan: Unet-like mamba with triplet ssm module for medical image segmentation,” arXiv preprint arXiv:2403.17701, 2024.
[26] H. Huang, P. Liang, N. Lin, L. Wang, B. Pu, J. Chen, Q. Chang, X. Shen, and G. Ran, “Topology-aware wavelet mamba for airway structure segmentation in postoperative recurrent nasopharyngeal carcinoma ct scans,” CoRR, vol. abs/2502.14363, 2025. [Online]. Available: https://arxiv.org/abs/2502.14363
[27] D. S. Taubman and M. W. Marcellin, “Jpeg2000: Standard for interactive imaging,” Proceedings of the IEEE, vol. 90, no. 8, pp. 1336–1357, 2002.
[28] W. B. Pennebaker and J. L. Mitchell, JPEG: Still image data compression standard. Springer Science & Business Media, 1992.
[29] D. Guide, “Cuda c++ programming guide,” NVIDIA, July, 2020.
[30] S. Markidis, S. W. Der Chien, E. Laure, I. B. Peng, and J. S. Vetter, “Nvidia tensor core programmability, performance & precision,” in 2018 IEEE international parallel and distributed processing symposium workshops (IPDPSW). IEEE, 2018, pp. 522–531.
[31] Z. Huang, Z. Zhao, Z. Yu, M. Hou, S. Zhou, J. Wang, Y. Yan, Y. Liu, and H. Gregersen, “Scfmunet: A fusion architecture based on multi-scale state space model and channel attention for medical image segmentation,” Neural Networks, vol. 192, p. 107919, 2025.
[32] H. Chen, B.-W. Min, and H. Zhang, “A dual-branch network for lesion segmentation in medical images using state space models,” Quantitative Imaging in Medicine and Surgery, vol. 15, no. 12, pp. 11977–11991, 2025.
[33] M. Ahmadkhani and E. Shook, “Toposegnet: Scalable topology preservation in image segmentation via critical points,” Computer Vision and Image Understanding, vol. 262, p. 104564, 2025.
[34] N. Megiddo and D. S. Modha, “ARC: A self-tuning, low overhead replacement cache,” in 2nd USENIX Conference on File and Storage Technologies (FAST 03), 2003.
[35] W. B. Johnson, J. Lindenstrauss et al., “Extensions of lipschitz mappings into a hilbert space,” Contemporary mathematics, vol. 26, no. 189-206, p. 1, 1984.
[36] M. Bader, Space-filling curves: an introduction with applications in scientific computing. Springer Science & Business Media, 2012, vol. 9.
[37] G. K. Wallace, “The jpeg still picture compression standard,” IEEE transactions on consumer electronics, vol. 38, no. 1, pp. xviii–xxxiv, 2002.
[38] W. Luo, Y. Li, R. Urtasun, and R. Zemel, “Understanding the effective receptive field in deep convolutional neural networks,” Advances in neural information processing systems, vol. 29, 2016.
[39] D. Achlioptas, “Database-friendly random projections: Johnson-lindenstrauss with binary coins,” Journal of computer and System Sciences, vol. 66, no. 4, pp. 671–687, 2003.
[40] B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press, 2002.
[41] J. Shawe-Taylor and N. Cristianini, Kernel methods for pattern analysis. Cambridge university press, 2004.
[42] G. Camps-Valls and L. Bruzzone, Kernel methods for remote sensing data analysis. John Wiley & Sons, 2009.
[43] F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature methods, vol. 18, no. 2, pp. 203–211, 2021.
[44] J. Wu, R. Fu, H. Fang, Y. Zhang, Y. Yang, H. Xiong, H. Liu, and Y. Xu, “Medsegdiff: Medical image segmentation with diffusion probabilistic model,” in Medical Imaging with Deep Learning. PMLR, 2024, pp. 1623–1639.
[45] Y. Tang, D. Yang, W. Li, H. R. Roth, B. Landman, D. Xu, V. Nath, and A. Hatamizadeh, “Self-supervised pre-training of swin transformers for 3d medical image analysis,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 20730–20740.
[46] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4015–4026.
[47] Z. Cheng, Q. Wei, H. Zhu, Y. Wang, L. Qu, W. Shao, and Y. Zhou, “Unleashing the potential of sam for medical adaptation via hierarchical decoding,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 3511–3522.
[48] H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang, “Swin-unet: Unet-like pure transformer for medical image segmentation,” in ECCV, 2022, pp. 205–218.
[49] Z. Wang, J.-Q. Zheng, Y. Zhang, G. Cui, and L. Li, “Mamba-unet: Unet-like pure visual mamba for medical image segmentation,” arXiv preprint arXiv:2402.05079, 2024.
[50] B. Landman, Z. Xu, J. Iglesias, M. Styner, T. Langerak, and A. Klein, “Medical image computing and computer-assisted intervention multi-atlas labeling beyond the cranial vault–workshop and challenge,” in Medical image computing and computer-assisted intervention, vol. 5, 2015, p. 12.
[51]

N. C. Codella, D. Gutman, M. E. Celebi, B. Helba, M. A. Marchetti, S. W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittleret al., “Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic),” in2018 IEEE 15th international sy...

2017
[52]

Cvc-clinicdb,

J. Bernal, F. J. S ´anchez, G. Fern ´andez-Esparrach, D. Gil, C. Rodr ´ıguez, and F. Vilari ˜no, “Cvc-clinicdb,” 2015. [Online]. Available: https: //polyp.grand-challenge.org/CVCClinicDB/

2015
[53]

Topology-aware focal loss for 3d image segmentation,

A. Demir, E. Massaad, and B. Kiziltan, “Topology-aware focal loss for 3d image segmentation,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 580–589

2023
[54]

Squeeze-and-excitation networks,

J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2018, pp. 7132–7141

2018
[55]

Backpropagation-free network for 3d test-time adaptation,

Y . Wang, A. Cheraghian, Z. Hayder, J. Hong, S. Ramasinghe, S. Rah- man, D. Ahmedt-Aristizabal, X. Li, L. Petersson, and M. Harandi, “Backpropagation-free network for 3d test-time adaptation,” inPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 23 231–23 241

2024
[56]

Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders,

R. Zhang, L. Wang, Y . Qiao, P. Gao, and H. Li, “Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 21 769–21 780

2023
[57]

Convolutional fine-grained classification with self-supervised target relation regularization,

K. Liu, K. Chen, and K. Jia, “Convolutional fine-grained classification with self-supervised target relation regularization,” IEEE Transactions on Image Processing, vol. 31, pp. 5570–5584, 2022.

APPENDIX

A. Preliminaries of SSM

State-space models (SSMs) describe sequential processing through a hidden-state evolution:

$$\frac{dh(t)}{dt} = A\,h(t) + B\,x(t), \tag{9}$$

$y(t) = C\,h(t)$ ...
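As a rough illustration of the hidden-state evolution in Eq. (9), the sketch below rolls the continuous system forward with a plain forward-Euler step. This is only an assumption made for illustration: the step size `dt`, the function name `simulate_ssm`, the single-input single-output shapes, and the readout without any extra term are hypothetical choices, and the discretization actually used by the paper is not recoverable from the truncated text.

```python
import numpy as np

def simulate_ssm(A, B, C, x, dt):
    """Forward-Euler rollout of dh/dt = A h + B x with readout y = C h.

    A: (N, N) state matrix, B: (N,) input vector, C: (N,) output vector,
    x: (T,) sampled scalar input, dt: step size (illustrative assumption).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        # Euler step of the hidden-state evolution
        h = h + dt * (A @ h + B * x_t)
        # Linear readout of the current hidden state
        ys.append(C @ h)
    return np.array(ys)

# Toy stable system driven by a constant input
A = np.array([[-1.0]])
B = np.array([1.0])
C = np.array([1.0])
y = simulate_ssm(A, B, C, np.ones(2000), dt=0.01)
```

For this toy system the output settles toward the steady state where `A h + B x = 0`, which is the behaviour the continuous equation predicts for a stable `A`.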

2022
[58]

At the boundary between two consecutive diagonal segments, the alternating reversal ensures that the terminal point of one segment and the initial point of the next segment differ by either (1,0) or (0,1), hence they are 4-neighbors and their distance is 1. These are the only two cases.

Corollary 1 (Extension to anti-diagonal and reversed TopoA sequences). The ...
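The boundary argument for consecutive diagonal segments can be checked numerically on small grids. The sketch below is a hypothetical reconstruction, not the paper's TopoA-Scan code: it assumes that a segment is the set of cells with `i + j = s` and that every even-indexed segment is reversed. The function names are illustrative.

```python
def zigzag_diagonal_order(h, w):
    """Visit an h x w grid along segments i + j = s, reversing alternate segments."""
    order = []
    for s in range(h + w - 1):
        i_lo, i_hi = max(0, s - w + 1), min(s, h - 1)
        seg = [(i, s - i) for i in range(i_lo, i_hi + 1)]
        if s % 2 == 0:
            seg.reverse()  # alternating reversal (assumed convention)
        order.extend(seg)
    return order

def boundary_offsets(order):
    """Absolute (row, col) offsets of the steps that cross from one segment to the next."""
    offsets = []
    for (i0, j0), (i1, j1) in zip(order, order[1:]):
        if i0 + j0 != i1 + j1:  # the step leaves the current segment
            offsets.append((abs(i1 - i0), abs(j1 - j0)))
    return offsets

order = zigzag_diagonal_order(3, 4)
offsets = boundary_offsets(order)
```

Under these assumptions every entry of `offsets` is `(1, 0)` or `(0, 1)`, which matches the two cases named in the proof, while steps inside a segment move by one row and one column at once.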
[59]

Apply Johnson-Lindenstrauss projection [35], [39] to reduce the sequence dimension before kernel construction
[60]

Build RBF kernels [40] with a median bandwidth heuristic on the projected channel descriptors
[61]

Center the kernels by row mean, column mean, and global mean before computing the normalized Frobenius inner product
[62]

Convert the resulting HSIC score into a sigmoid gate and retain a TopoA-biased residual shortcut for stability. We intentionally avoid stronger claims such as target-variable dependence, mutual-information approximation, or topology guarantees, because the gate is used here purely as a compact dependence-aware fusion rule. In the paper setting, α is initial...
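The four steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: features are taken as `(channels, sequence_length)` arrays, both branches share one Gaussian random projection, and the fusion rule `f_topo + gate * f_cross` is only one possible reading of a "TopoA-biased residual shortcut". The role of α is not recoverable from the truncated text, so it is left out. All function names and the projection size `k` are hypothetical.

```python
import numpy as np

def rbf_kernel(z):
    """RBF kernel over the rows of z with a median-heuristic bandwidth."""
    sq = np.sum(z * z, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (z @ z.T), 0.0)
    off_diag = d2[~np.eye(z.shape[0], dtype=bool)]
    med = np.median(off_diag) if off_diag.size else 0.0
    sigma2 = med if med > 0 else 1.0  # fallback for degenerate inputs
    return np.exp(-d2 / (2.0 * sigma2))

def center(K):
    """Subtract row and column means, then add back the global mean."""
    row = K.mean(axis=1, keepdims=True)
    col = K.mean(axis=0, keepdims=True)
    return K - row - col + K.mean()

def hsic_score(Kx, Ky, eps=1e-12):
    """Normalized Frobenius inner product of the centered kernels."""
    Kxc, Kyc = center(Kx), center(Ky)
    num = np.sum(Kxc * Kyc)
    den = np.linalg.norm(Kxc) * np.linalg.norm(Kyc) + eps
    return float(num / den)

def hsic_gate_fuse(f_cross, f_topo, k=16, seed=0):
    """Fuse two (C, L) branch features with a scalar dependence-aware gate."""
    rng = np.random.default_rng(seed)
    L = f_cross.shape[1]
    # Step 1: random projection of the sequence axis from L to k dims
    R = rng.standard_normal((L, k)) / np.sqrt(k)
    # Steps 2-3: RBF kernels on projected channel descriptors, centered score
    score = hsic_score(rbf_kernel(f_cross @ R), rbf_kernel(f_topo @ R))
    # Step 4: sigmoid gate; the TopoA branch passes through unscaled (assumed)
    gate = 1.0 / (1.0 + np.exp(-score))
    return f_topo + gate * f_cross, gate, score

rng = np.random.default_rng(1)
f_cross = rng.standard_normal((8, 64))
f_topo = rng.standard_normal((8, 64))
fused, gate, score = hsic_gate_fuse(f_cross, f_topo)
```

Because the kernels are built over channels, their size is `C x C` regardless of sequence length, which keeps the cost of the scalar gate small in this sketch.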

