Polygon-mamba: Retinal vessel segmentation using polygon scanning mamba and space-frequency collaborative attention
Pith reviewed 2026-05-12 03:48 UTC · model grok-4.3
The pith
Polygon scanning mamba maintains connectivity of small retinal vessels during segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We design a hybrid CNN-Mamba fusion network that integrates polygon scanning mamba and space-frequency collaborative attention mechanism for the detection of small vessels. The polygon scanning visual state space model uses multi-layer reverse scanning to identify small vessel structural features and preserve pixel connectivity, mitigating information loss. The space-frequency collaborative attention mechanism extracts efficient features from spatial and frequency domains to dynamically enhance key features and suppress clutter.
What carries the argument
Polygon scanning visual state space model (PS-VSS) using multi-layer reverse scanning to preserve connectivity in small vessel structures
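The review does not spell out the exact polygon traversal used by PS-VSS. The sketch below assumes "polygon scanning" means visiting concentric rectangular layers as a single clockwise spiral, and "reverse scanning" means also reading that sequence backwards. Under those assumptions, the NumPy code builds both orders and counts how often two consecutive tokens in the 1-D sequence are not 4-neighbours in the image. All function names (`raster_order`, `spiral_order`, `count_jumps`, `to_sequence`, `from_sequence`) are hypothetical and not taken from the paper.

```python
import numpy as np

def raster_order(h, w):
    """Row-major (horizontal) scan, as produced by a plain flattening."""
    return [(r, c) for r in range(h) for c in range(w)]

def spiral_order(h, w):
    """Hypothetical polygon scan: clockwise spiral over concentric layers, outermost first."""
    top, bottom, left, right = 0, h - 1, 0, w - 1
    order = []
    while top <= bottom and left <= right:
        order += [(top, c) for c in range(left, right + 1)]             # top edge, left to right
        order += [(r, right) for r in range(top + 1, bottom + 1)]        # right edge, downwards
        if top < bottom:
            order += [(bottom, c) for c in range(right - 1, left - 1, -1)]  # bottom edge, right to left
        if left < right:
            order += [(r, left) for r in range(bottom - 1, top, -1)]     # left edge, upwards
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order

def count_jumps(order):
    """Number of consecutive pairs in the sequence that are not 4-neighbours."""
    return sum(abs(r1 - r0) + abs(c1 - c0) != 1
               for (r0, c0), (r1, c1) in zip(order, order[1:]))

def to_sequence(x, order):
    """Flatten a (C, H, W) feature map into a (C, H*W) token sequence in the given order."""
    rows, cols = map(list, zip(*order))
    return x[:, rows, cols]

def from_sequence(seq, order, h, w):
    """Scatter a (C, H*W) token sequence back to a (C, H, W) feature map."""
    rows, cols = map(list, zip(*order))
    out = np.empty((seq.shape[0], h, w), dtype=seq.dtype)
    out[:, rows, cols] = seq
    return out

h, w = 6, 8
fwd = spiral_order(h, w)
rev = fwd[::-1]  # "reverse" pass over the same layers
print(count_jumps(raster_order(h, w)), count_jumps(fwd), count_jumps(rev))  # 5 0 0
```

Under this reading, a raster scan breaks spatial adjacency H-1 times per pass while the spiral and its reverse never do. A boustrophedon (snake) scan also has zero jumps by this measure, so 4-adjacency alone would not separate polygon scanning from every alternative, which is one reason the scan-order ablation the referee requests matters.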
Load-bearing premise
The polygon scanning and space-frequency attention mechanisms will continue to preserve small vessel connectivity and enhance features effectively on retinal datasets beyond the three used in the study without introducing artifacts.
What would settle it
Running the model on a fourth independent retinal vessel dataset and checking whether small vessels follow continuous paths matching the annotations, or whether new breaks and false positives appear.
Original abstract
Retinal vessel segmentation is crucial for diagnosis and assessment of ocular diseases. Notably, segmentation of small retinal vessels has been consistently recognized as a challenging and complex task. To tackle this challenge, we design a hybrid CNN-Mamba fusion network that integrates polygon scanning mamba and space-frequency collaborative attention mechanism for the detection of small vessels. Considering that the traditional mamba architecture with horizontal-vertical scanning may compromise the topological integrity of target structures and result in local discontinuities in small retinal vessels, we present a polygon scanning visual state space model (PS-VSS) to identify small vessel structural features by multi-layer reverse scanning way. Which effectively preserves pixels connectivity, thereby substantially mitigating the loss of information pertaining to small vessels. Furthermore, as we all known that the spatial domain prioritizes positional and structural information, while the frequency domain emphasizes global perception and local detail components, a space-frequency collaborative attention mechanism (SFCAM) is introduced within the skip connection to extract efficient features from the spatial and frequency domains. This strategy empowers the model to dynamically enhance the key features while effectively suppressing clutters. To assess the efficacy of our model, it was tested on three publicly available datasets: DRIVE, STARE, and CHASE_DB1. Compared to manual annotations, our model demonstrated F1 scores of 0.8283, 0.8282, and 0.8251, Area Under Curve (AUC) values of 0.9806, 0.9840, and 0.9866, and Sensitivity (SE) values of of 0.8268, 0.8314, and 0.8484 across three datasets, respectively. The effectiveness of our model was validated through both visual inspection and quantitative analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Polygon-Mamba, a hybrid CNN-Mamba fusion network for retinal vessel segmentation. It introduces a Polygon Scanning Visual State Space (PS-VSS) module that employs multi-layer reverse polygon scanning to preserve topological connectivity of small vessels (addressing discontinuities from standard horizontal-vertical Mamba scans), and a Space-Frequency Collaborative Attention Mechanism (SFCAM) placed in skip connections to fuse spatial positional information with frequency-domain global and local details. The model is evaluated on the DRIVE, STARE, and CHASE_DB1 datasets, reporting F1 scores of 0.8283/0.8282/0.8251, AUC values of 0.9806/0.9840/0.9866, and sensitivities of 0.8268/0.8314/0.8484.
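The three reported metrics can be computed from a predicted probability map and a binary manual annotation. The sketch below is minimal and assumes pixel-wise evaluation of one image at a fixed 0.5 threshold. The paper's actual protocol (for example, whether pixels outside the field-of-view mask are excluded, and how the threshold is chosen) is not stated in this review. AUC is computed with the Mann-Whitney rank formulation; average ranks from `scipy.stats.rankdata` handle tied scores. The name `pixel_metrics` is hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def pixel_metrics(prob, gt, threshold=0.5):
    """F1, sensitivity (SE) and AUC for one probability map vs. a binary annotation.

    No guards for degenerate masks (e.g. no predicted or no true vessel pixels).
    """
    prob = np.asarray(prob, dtype=float).ravel()
    gt = np.asarray(gt).ravel().astype(bool)
    pred = prob >= threshold
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    se = tp / (tp + fn)                      # sensitivity = recall on vessel pixels
    precision = tp / (tp + fp)
    f1 = 2 * precision * se / (precision + se)
    # Threshold-free AUC via the Mann-Whitney U statistic.
    ranks = rankdata(prob)
    n_pos, n_neg = gt.sum(), (~gt).sum()
    auc = (ranks[gt].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return {"F1": f1, "SE": se, "AUC": auc}

m = pixel_metrics([0.9, 0.4, 0.2, 0.6, 0.8, 0.1], [1, 1, 0, 0, 1, 0])
print({k: round(float(v), 4) for k, v in m.items()})  # {'F1': 0.6667, 'SE': 0.6667, 'AUC': 0.8889}
```

F1 and SE depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds. This is why the three numbers can move somewhat independently across datasets.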
Significance. If the performance gains can be rigorously attributed to the polygon scanning and space-frequency fusion rather than to training details or the base CNN-Mamba backbone, the work would provide a useful direction for maintaining vessel continuity in thin-structure segmentation tasks. Mamba-based scanning offers an efficiency-oriented alternative to self-attention, with cost linear rather than quadratic in sequence length. The hybrid design could also inform subsequent medical imaging models that prioritize both local connectivity and global context.
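The efficiency argument rests on the state-space recurrence being a single pass over the token sequence. The sketch below shows a discretized, diagonal, non-selective SSM scan, where `a` plays the role of the discretized state matrix. It is a deliberate simplification: Mamba's selective scan makes the step size and the B, C projections input-dependent, and that is omitted here. The parameter values are arbitrary and the function name `ssm_scan` is hypothetical.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Diagonal linear SSM: h_t = a * h_{t-1} + b * x_t,  y_t = sum(c * h_t).

    x: (L,) scalar input sequence; a, b, c: (N,) per-state parameters, fixed over time.
    One pass over the sequence costs O(L * N), versus the O(L^2) pairwise
    interactions of self-attention.
    """
    h = np.zeros_like(a, dtype=float)
    y = np.empty(len(x))
    for t, xt in enumerate(x):
        h = a * h + b * xt
        y[t] = np.dot(c, h)
    return y

a = np.array([0.9, 0.5])   # |a| close to 1 -> slow decay -> longer-range memory
b = np.array([1.0, 1.0])
c = np.array([1.0, -1.0])
impulse = np.zeros(5)
impulse[0] = 1.0
print(ssm_scan(impulse, a, b, c))  # equals sum_n c_n * a_n**k * b_n for k = 0..4
```

Because each output depends only on the running state, the order in which pixels are fed into the scan determines which pixels can influence each other cheaply. This is the premise behind choosing a connectivity-preserving scan order.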
major comments (3)
- [§4 Experiments] §4 Experiments and §4.2 Ablation Study (if present): No ablation experiments are described that isolate the PS-VSS polygon scanning (e.g., by replacing it with standard raster or bidirectional scanning on the identical backbone) or the SFCAM (e.g., by substituting standard spatial attention or removing the frequency branch). Without these, the reported F1/SE improvements cannot be attributed to the claimed connectivity preservation and artifact suppression rather than hyperparameter choices or the overall architecture.
- [§4.1 Quantitative Results] §4.1 Quantitative Results: The evaluation reports only standard pixel-wise metrics (F1, AUC, SE) and visual inspection; no topology-specific metrics (e.g., connected-component counts, Betti numbers, or vessel continuity scores) or direct side-by-side comparisons of scanning orders are provided to verify that multi-layer reverse polygon scanning specifically reduces discontinuities in small vessels compared with conventional Mamba scanning.
- [§4 Experiments] §4 and §3 Method: Training procedures, data augmentation, hyperparameter search, baseline re-implementations (versus literature numbers), and statistical significance tests for the metric differences are not detailed. This information is load-bearing for assessing whether the gains on DRIVE/STARE/CHASE_DB1 are robust and generalizable beyond the three tested datasets.
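One way to obtain the connected-component counts the referee asks for is to label both the prediction and the annotation and compare their number of foreground components (the 0th Betti number, β0, of each mask). The sketch below uses `scipy.ndimage.label` with 8-connectivity. The error measure |β0(pred) - β0(gt)| and the function names are illustrative choices, not the vessel continuity score the authors promise.

```python
import numpy as np
from scipy import ndimage

EIGHT_CONNECTED = np.ones((3, 3), dtype=int)  # diagonal neighbours count as connected

def betti0(mask):
    """Number of 8-connected foreground components (beta_0) in a binary mask."""
    _, n = ndimage.label(np.asarray(mask, dtype=bool), structure=EIGHT_CONNECTED)
    return n

def betti0_error(pred, gt):
    """Absolute difference in component counts; each break in a vessel raises beta_0."""
    return abs(betti0(pred) - betti0(gt))

gt = np.zeros((5, 9), dtype=bool)
gt[2, 1:8] = True            # one continuous thin vessel, 7 pixels long
pred = gt.copy()
pred[2, 4] = False           # a one-pixel gap splits it in two
print(betti0(gt), betti0(pred), betti0_error(pred, gt))  # 1 2 1
```

In this toy case the single missing pixel leaves pixel-wise F1 at 12/13 (about 0.92), while β0 doubles. This is the kind of discrepancy that pixel metrics alone cannot reveal.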
minor comments (3)
- [Abstract] Abstract: The phrase 'as we all known that' is grammatically incorrect and should be revised to 'as is well known' or equivalent.
- [Abstract] Abstract: The word 'of' is duplicated in 'Sensitivity (SE) values of of 0.8268'.
- [§3 Method] Method section: Module acronyms (PS-VSS, SFCAM) and the exact polygon scanning directions should be accompanied by a clear diagram or pseudocode for reproducibility.
Simulated Author's Rebuttal
We sincerely thank the referee for the constructive and detailed feedback. We have carefully reviewed each major comment and will incorporate revisions to address the concerns regarding experimental validation, metrics, and reproducibility.
Point-by-point responses
- Referee: [§4 Experiments] §4 Experiments and §4.2 Ablation Study (if present): No ablation experiments are described that isolate the PS-VSS polygon scanning (e.g., by replacing it with standard raster or bidirectional scanning on the identical backbone) or the SFCAM (e.g., by substituting standard spatial attention or removing the frequency branch). Without these, the reported F1/SE improvements cannot be attributed to the claimed connectivity preservation and artifact suppression rather than hyperparameter choices or the overall architecture.
Authors: We agree that dedicated ablation studies are necessary to rigorously attribute performance gains to the polygon scanning in PS-VSS and the space-frequency fusion in SFCAM. The current manuscript validates the full model via SOTA comparisons and visuals but does not isolate these components. In the revised manuscript, we will add a dedicated ablation subsection with: (i) PS-VSS replaced by standard horizontal-vertical or bidirectional scanning on the identical backbone, and (ii) SFCAM replaced by standard spatial attention or with the frequency branch removed. Results will be reported on all three datasets to support the connectivity and artifact-suppression claims.
Revision: yes
- Referee: [§4.1 Quantitative Results] §4.1 Quantitative Results: The evaluation reports only standard pixel-wise metrics (F1, AUC, SE) and visual inspection; no topology-specific metrics (e.g., connected-component counts, Betti numbers, or vessel continuity scores) or direct side-by-side comparisons of scanning orders are provided to verify that multi-layer reverse polygon scanning specifically reduces discontinuities in small vessels compared with conventional Mamba scanning.
Authors: We acknowledge that topology-aware metrics would provide stronger quantitative support for the connectivity-preservation benefit of multi-layer reverse polygon scanning. The manuscript currently relies on standard metrics plus qualitative visual evidence of improved small-vessel continuity. In the revision, we will add topology-specific evaluations, including connected-component counts and a vessel continuity score, together with direct side-by-side quantitative and visual comparisons of polygon versus standard raster scanning on representative images from DRIVE, STARE, and CHASE_DB1.
Revision: yes
- Referee: [§4 Experiments] §4 and §3 Method: Training procedures, data augmentation, hyperparameter search, baseline re-implementations (versus literature numbers), and statistical significance tests for the metric differences are not detailed. This information is load-bearing for assessing whether the gains on DRIVE/STARE/CHASE_DB1 are robust and generalizable beyond the three tested datasets.
Authors: We agree that expanded methodological and experimental details are required for reproducibility and to substantiate the reported gains. The original manuscript provides only high-level descriptions. In the revised version we will: (i) detail all data-augmentation strategies, optimizer settings, learning-rate schedules, and hyperparameter search procedure in Section 3; (ii) clarify baseline re-implementations versus literature-reported numbers; and (iii) include statistical significance tests (e.g., paired t-tests with p-values) for metric differences across the three datasets, along with a brief discussion of generalizability limitations.
Revision: yes
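A paired t-test, as promised in the response above, compares the per-image scores of two models on the same test images. The sketch below uses `scipy.stats.ttest_rel`. The per-image F1 values are placeholders for illustration only, not results from the paper, and `paired_comparison` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import ttest_rel

def paired_comparison(scores_a, scores_b):
    """Paired t-test on per-image scores of two models evaluated on the same images."""
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    result = ttest_rel(scores_a, scores_b)
    return {
        "mean_diff": float(np.mean(scores_a - scores_b)),
        "t": float(result.statistic),
        "p": float(result.pvalue),
    }

# Placeholder per-image F1 values (hypothetical, not from the paper).
model = [0.83, 0.81, 0.84, 0.82, 0.85, 0.80, 0.83, 0.84]
baseline = [0.82, 0.80, 0.83, 0.82, 0.83, 0.79, 0.82, 0.83]
print(paired_comparison(model, baseline))
```

Pairing by image removes between-image variability from the comparison. With the small test sets typical of retinal benchmarks, the Wilcoxon signed-rank test (`scipy.stats.wilcoxon`) is a common non-parametric alternative when per-image differences are unlikely to be normally distributed.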
Circularity Check
No circularity: empirical network design with independent validation
The paper is an empirical deep-learning contribution that proposes a hybrid CNN-Mamba architecture (PS-VSS polygon scanning and SFCAM space-frequency attention) and evaluates it via standard training on public retinal datasets (DRIVE, STARE, CHASE_DB1), reporting F1/AUC/SE metrics. No derivation chain, equations, or predictions are presented that reduce by construction to fitted inputs, self-citations, or ansatzes. The performance numbers come from networks trained by gradient descent on training splits and evaluated on held-out test splits, not from any self-referential loop. The work is therefore self-contained against external benchmarks, with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Mamba-based state space models can capture long-range dependencies in 2D images when an appropriate scanning order is chosen.
- Domain assumption: Joint spatial and frequency domain processing improves discrimination of small structures over spatial-only attention.
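To make the second domain assumption concrete, the sketch below shows a toy, parameter-free space-frequency gate. A spatial branch pools over channels to weight pixel locations, and a frequency branch uses each channel's 2-D FFT to up-weight channels whose energy sits at high spatial frequencies, where thin structures live. This is not the authors' SFCAM, which is learned and placed in the skip connections. The pooling scheme, the radial cutoff of 0.25 cycles/pixel, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_gate(x):
    """Per-pixel weight in (0, 1) from channel-pooled activations. x: (C, H, W)."""
    pooled = x.mean(axis=0) + x.max(axis=0)
    pooled = (pooled - pooled.mean()) / (pooled.std() + 1e-6)
    return sigmoid(pooled)                                   # (H, W)

def frequency_gate(x, cutoff=0.25):
    """Per-channel weight in (0, 1) from the share of spectral energy above `cutoff`."""
    c, h, w = x.shape
    spec = np.abs(np.fft.fft2(x, axes=(-2, -1))) ** 2        # power spectrum per channel
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)                      # cycles/pixel
    high = spec[:, radius > cutoff].sum(axis=1)
    ratio = high / (spec.reshape(c, -1).sum(axis=1) + 1e-12)
    return sigmoid((ratio - ratio.mean()) / (ratio.std() + 1e-6))  # (C,)

def space_frequency_gate(x):
    """Reweight features by both gates; output has the same shape as x."""
    return x * spatial_gate(x)[None, :, :] * frequency_gate(x)[:, None, None]

smooth = np.ones((8, 8))                                         # all energy at DC
checker = np.indices((8, 8)).sum(axis=0) % 2 * 2.0 - 1.0         # all energy at Nyquist
x = np.stack([smooth, checker])
print(np.round(frequency_gate(x), 4))  # [0.2689 0.7311]
```

In a learned module, the fixed cutoff and sigmoid normalizations would be replaced by trainable layers. The point of the toy version is only that the frequency branch can separate fine-detail channels from smooth ones, which spatial pooling alone does not do.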
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, theorem absolute_floor_iff_bare_distinguishability (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "we present a polygon scanning visual state space model (PS-VSS) to identify small vessel structural features by multi-layer reverse scanning way. Which effectively preserves pixels connectivity"
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "a space-frequency collaborative attention mechanism (SFCAM) is introduced within the skip connection"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.