KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel Segmentation
Pith reviewed 2026-05-23 19:16 UTC · model grok-4.3
The pith
KaLDeX adds a Kalman filter based linear deformable cross attention module to UNet++ to segment thin retinal vessels more accurately.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The KaLDeX network integrates a Kalman filter based linear deformable convolution module with cross-attention inside UNet++ and adds a persistent homology topological loss. This combination lets the model adaptively sample thin vessels that standard convolutions miss and aggregate multi-scale vascular features, producing higher segmentation accuracy across the tested retinal fundus and OCTA datasets.
What carries the argument
Kalman filter based linear deformable cross attention (LDCA) module, which uses Kalman filtering to guide adaptive sampling in a linear deformable convolution and uses cross-attention to fuse detailed and high-level features.
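The sampling mechanism can be made concrete with a small NumPy sketch. The source does not say exactly how the Kalman filter enters the deformable convolution, so the following is an assumption. The offsets predicted for the K sampling points of one linear (line-shaped) kernel are treated as noisy measurements of a slowly varying vessel path. They are smoothed with a scalar constant-position Kalman filter, and the feature map is then sampled bilinearly at the resulting positions. The function names and the noise variances `process_var` and `meas_var` are hypothetical and are not taken from the KaLDeX code.

```python
import numpy as np


def kalman_smooth_offsets(raw_offsets, process_var=0.05, meas_var=0.5):
    """Scalar Kalman filter run along the K points of one linear kernel (hypothetical).

    Assumed model: x_k = x_{k-1} + w,  w ~ N(0, process_var)  (path drifts slowly)
                   z_k = x_k + v,      v ~ N(0, meas_var)     (raw conv offset is noisy)
    """
    z = np.asarray(raw_offsets, dtype=float)
    out = np.empty_like(z)
    x, p = z[0], meas_var                    # initialise the state from the first measurement
    out[0] = x
    for k in range(1, z.size):
        p_pred = p + process_var             # predict: variance grows, mean unchanged
        gain = p_pred / (p_pred + meas_var)  # Kalman gain in (0, 1)
        x = x + gain * (z[k] - x)            # update: move part-way toward the measurement
        p = (1.0 - gain) * p_pred
        out[k] = x
    return out


def bilinear(img, ys, xs):
    """Bilinearly sample a 2D array at fractional (ys, xs), clamped to the border."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])


def sample_linear_kernel(feature, cy, cx, offsets):
    """Sample K points on a horizontal line centred at (cy, cx), each shifted vertically."""
    k = len(offsets)
    xs = cx + np.arange(k, dtype=float) - k // 2  # fixed unit steps along the kernel line
    ys = cy + np.asarray(offsets, dtype=float)    # per-point vertical deviations
    return bilinear(feature, ys, xs)


rng = np.random.default_rng(0)
feature = rng.random((32, 32))        # stand-in for one feature channel
raw = rng.normal(0.0, 1.0, size=9)    # stand-in for conv-predicted offsets
values = sample_linear_kernel(feature, 16, 16, kalman_smooth_offsets(raw))
print(values.shape)  # (9,)
```

A small `process_var` relative to `meas_var` yields a smoother sampled path; a large one lets the filter follow the raw offsets closely.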
If this is right
- Reports average accuracy of 97.25% on DRIVE, 97.77% on CHASE_DB1, 97.85% on STARE, 98.89% on 3 mm OCTA-500, and 98.21% on 6 mm OCTA-500.
- Outperforms prior best models on all five evaluated vessel segmentation datasets.
- The LD module focuses sampling on thin vessels while the CA module improves global structure awareness.
- The topological loss based on persistent homology constrains segmentation to preserve vessel continuity.
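To illustrate the topological loss in the last bullet, the NumPy sketch below computes 0-dimensional persistent homology (connected components) of a predicted probability map under a superlevel-set filtration. It uses union-find and the elder rule, then scores the result with a loss in the spirit of Clough et al. (2020): the k most persistent components are pushed toward full persistence and all others toward zero. The exact loss used in KaLDeX is not given here, so the choice of k, the squared-persistence form, and the function names are assumptions. This version only returns a number. In a differentiable implementation, the gradient would flow back to the pixels where each component is born and dies.

```python
import numpy as np


def persistence_0d(prob):
    """0-dim persistence pairs (birth, death) of a 2D map's superlevel-set filtration.

    Pixels enter in decreasing order of value with 4-connectivity. When two
    components meet, the one with the lower peak (born later) dies: elder rule.
    Zero-length pairs are skipped; the oldest component dies at the map minimum.
    """
    h, w = prob.shape
    flat = prob.ravel().astype(float)
    order = np.argsort(-flat, kind="stable")
    parent = np.full(h * w, -1)       # -1 marks pixels not yet in the filtration
    birth = np.zeros(h * w)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for idx in order:
        parent[idx], birth[idx] = idx, flat[idx]
        y, x = divmod(int(idx), w)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not (0 <= ny < h and 0 <= nx < w) or parent[ny * w + nx] == -1:
                continue
            ra, rb = find(idx), find(ny * w + nx)
            if ra == rb:
                continue
            if birth[ra] < birth[rb]:      # make ra the elder (higher-born) root
                ra, rb = rb, ra
            if birth[rb] > flat[idx]:      # record only non-zero-length bars
                pairs.append((birth[rb], flat[idx]))
            parent[rb] = ra
    pairs.append((flat[order[0]], flat.min()))
    return pairs


def topo_loss(prob, k=1):
    """Keep the k longest bars near persistence 1 and push the rest toward 0."""
    pers = sorted((b - d for b, d in persistence_0d(prob)), reverse=True)
    return sum(1.0 - p ** 2 for p in pers[:k]) + sum(p ** 2 for p in pers[k:])


two_blobs = np.zeros((5, 5))
two_blobs[1, 1], two_blobs[3, 3] = 0.9, 0.8
print(round(topo_loss(two_blobs, k=1), 4))  # 0.83
```

With k = 1 the second, disconnected blob is penalized. This is the behavior a continuity prior on a vessel tree is meant to encourage.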
Where Pith is reading between the lines
- The LDCA design may transfer to other medical segmentation tasks that involve fine linear structures such as nerves or airways.
- Kalman filtering inside the LDCA block offers a route to incorporate prediction uncertainty directly into feature sampling.
- Replacing UNet++ with other encoder-decoder backbones could test whether the module's benefit is architecture-specific.
Load-bearing premise
The measured accuracy gains come from the LDCA module rather than from unstated differences in preprocessing, training protocols, or dataset-specific tuning.
What would settle it
An ablation experiment that removes only the LDCA module from KaLDeX, retrains on the same splits of DRIVE, and checks whether accuracy falls to the level of a plain UNet++ baseline.
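One way to run this experiment, and also to separate the LDCA module from the topological loss as the referee report below asks, is a full two-by-two factorial. Every cell shares the same seeds, splits, and training protocol, and the plain UNet++ baseline is the cell with both flags off. The sketch below only enumerates the runs and computes main effects from their scores. The flag names are hypothetical, and training itself is left out.

```python
from itertools import product
from statistics import mean


def ablation_grid(seeds=(0, 1, 2, 3, 4)):
    """All four on/off combinations of the two KaLDeX additions, each with every seed.

    Flag names are hypothetical; they are not options of the KaLDeX code base.
    """
    return [
        {"use_ldca": ldca, "use_topo_loss": topo, "seed": seed}
        for ldca, topo in product((False, True), repeat=2)
        for seed in seeds
    ]


def main_effect(results, flag):
    """Mean score with `flag` switched on minus mean score with it switched off.

    results: list of (config, score) pairs, one per trained run.
    """
    on = [score for cfg, score in results if cfg[flag]]
    off = [score for cfg, score in results if not cfg[flag]]
    return mean(on) - mean(off)
```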
Original abstract
Background and Objective: In the realm of ophthalmic imaging, accurate vascular segmentation is paramount for diagnosing and managing various eye diseases. Contemporary deep learning-based vascular segmentation models rival human accuracy but still face substantial challenges in accurately segmenting minuscule blood vessels in neural network applications. Due to the necessity of multiple downsampling operations in the CNN models, fine details from high-resolution images are inevitably lost. The objective of this study is to design a structure to capture the delicate and small blood vessels. Methods: To address these issues, we propose a novel network (KaLDeX) for vascular segmentation leveraging a Kalman filter based linear deformable cross attention (LDCA) module, integrated within a UNet++ framework. Our approach is based on two key components: Kalman filter (KF) based linear deformable convolution (LD) and cross-attention (CA) modules. The LD module is designed to adaptively adjust the focus on thin vessels that might be overlooked in standard convolution. The CA module improves the global understanding of vascular structures by aggregating the detailed features from the LD module with the high level features from the UNet++ architecture. Finally, we adopt a topological loss function based on persistent homology to constrain the topological continuity of the segmentation. Results: The proposed method is evaluated on retinal fundus image datasets (DRIVE, CHASE_BD1, and STARE) as well as the 3mm and 6mm of the OCTA-500 dataset, achieving an average accuracy (ACC) of 97.25%, 97.77%, 97.85%, 98.89%, and 98.21%, respectively. Conclusions: Empirical evidence shows that our method outperforms the current best models on different vessel segmentation datasets. Our source code is available at: https://github.com/AIEyeSystem/KalDeX.
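The CA step described in the abstract, which aggregates detailed LD features with high-level UNet++ features, can be illustrated with single-head scaled dot-product cross-attention. The abstract does not state which stream supplies the queries. This sketch assumes the LD features act as queries and the UNet++ features as keys and values. It flattens each feature map into a sequence of pixel tokens and uses random stand-in projection matrices `w_q`, `w_k`, `w_v`. It illustrates the general technique and is not the KaLDeX layer.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(detail, context, w_q, w_k, w_v):
    """Single-head cross-attention between two feature maps.

    detail:  (C1, H, W)   fine features, used as queries (assumption)
    context: (C2, H2, W2) coarse features, used as keys and values (assumption)
    w_q: (C1, d), w_k: (C2, d), w_v: (C2, d)
    Returns the fused map (d, H, W) and the attention weights (H*W, H2*W2).
    """
    c1, h, w = detail.shape
    q = detail.reshape(c1, -1).T @ w_q                 # (H*W, d)
    ctx = context.reshape(context.shape[0], -1).T      # (H2*W2, C2)
    k, v = ctx @ w_k, ctx @ w_v                        # (H2*W2, d) each
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)
    fused = attn @ v                                   # (H*W, d)
    return fused.T.reshape(-1, h, w), attn


rng = np.random.default_rng(0)
detail = rng.standard_normal((16, 32, 32))   # stand-in for LD module output
context = rng.standard_normal((64, 8, 8))    # stand-in for a deeper UNet++ node
d = 32
fused, attn = cross_attention(detail, context,
                              rng.standard_normal((16, d)) / 4,
                              rng.standard_normal((64, d)) / 8,
                              rng.standard_normal((64, d)) / 8)
print(fused.shape, attn.shape)  # (32, 32, 32) (1024, 64)
```

Every query pixel attends over all positions of the coarse map, which is how such a block can inject global context. The cost grows with the product of the two token counts, which is one reason to apply it on reduced-resolution features.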
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes KaLDeX, a UNet++-based architecture for retinal vessel segmentation that incorporates a Kalman filter-based linear deformable cross-attention (LDCA) module (combining linear deformable convolution and cross-attention) together with a persistent-homology topological loss. It evaluates the method on DRIVE, CHASE_DB1, STARE, and OCTA-500 (3 mm and 6 mm), reporting average accuracies of 97.25 %, 97.77 %, 97.85 %, 98.89 %, and 98.21 % respectively, and claims these results outperform prior state-of-the-art models.
Significance. If the reported gains can be shown to arise specifically from the Kalman-filter LDCA component rather than from the topological loss, training schedule, or preprocessing, the work would offer a concrete mechanism for recovering fine-scale vessels that are typically lost under repeated down-sampling. The public release of source code supports reproducibility and is a clear strength.
major comments (3)
- [Abstract / Results] Abstract and Results: the reported accuracies (97.25 % DRIVE, 97.77 % CHASE_DB1, etc.) are presented without error bars, standard deviations across runs, or statistical tests against the cited baselines. This prevents any assessment of whether the numerical improvements are reliable or merely within run-to-run variation.
- [Methods / Experiments] Methods and Experiments: no component-wise ablation is supplied that isolates the contribution of the Kalman-filter LDCA module from the persistent-homology topological loss or from the plain UNet++ backbone under identical training protocols and preprocessing. Because the central claim attributes the performance lift to the LDCA module, the absence of such controlled comparisons renders the attribution unverifiable.
- [Experiments] Experiments: the manuscript does not indicate whether the same data augmentation, optimizer schedule, and post-processing steps were used for all re-implemented baselines; any uncontrolled differences could account for the observed margins on DRIVE, CHASE_DB1, STARE, and OCTA-500.
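The first major comment above can be addressed with little code. The sketch below does three things:

- computes pixel accuracy, ACC = (TP + TN) / (TP + TN + FP + FN), for binary masks;
- summarizes scores across seeds as mean and sample standard deviation;
- runs a two-sided exact Wilcoxon signed-rank test on paired scores, such as per-image ACC of two methods on the same test images. This is the test named in the simulated rebuttal.

It is written in plain NumPy to stay self-contained; `scipy.stats.wilcoxon` provides the same test. Zero differences are discarded and tied magnitudes receive average ranks. Background pixels far outnumber vessel pixels, so ACC stays high even for poor vessel masks, and small ACC differences therefore deserve such a test.

```python
import numpy as np


def pixel_accuracy(pred, gt):
    """ACC = (TP + TN) / (TP + TN + FP + FN) for binary masks of equal shape."""
    return float((np.asarray(pred, bool) == np.asarray(gt, bool)).mean())


def mean_std(scores):
    """Mean and sample standard deviation (ddof=1) across runs or seeds."""
    s = np.asarray(scores, dtype=float)
    return float(s.mean()), float(s.std(ddof=1))


def wilcoxon_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank test for paired samples x, y.

    Returns (W+, p). The null distribution of W+ is enumerated by dynamic
    programming over doubled ranks, so half-integer tie ranks stay integral.
    """
    d = np.asarray(x, float) - np.asarray(y, float)
    d = d[d != 0]                              # Wilcoxon's zero handling: drop zeros
    n = d.size
    if n == 0:
        return 0.0, 1.0
    mag = np.abs(d)
    order = np.argsort(mag, kind="mergesort")
    ranks = np.empty(n)
    i = 0
    while i < n:                               # assign average ranks to tied magnitudes
        j = i
        while j + 1 < n and mag[order[j + 1]] == mag[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    r2 = np.rint(2 * ranks).astype(int)
    w2, total2 = int(r2[d > 0].sum()), int(r2.sum())
    counts = np.zeros(total2 + 1)              # counts[s]: sign patterns with 2*W+ == s
    counts[0] = 1.0
    for r in r2:
        counts[r:] = counts[r:] + counts[:-r]  # RHS uses the old counts (0/1 knapsack)
    probs = counts / counts.sum()
    dev = np.abs(np.arange(total2 + 1) - total2 / 2)
    p = probs[dev >= abs(w2 - total2 / 2) - 1e-9].sum()
    return w2 / 2, float(min(p, 1.0))
```

A typical call would be `wilcoxon_exact(acc_model, acc_baseline)`, where both hypothetical arrays hold per-image ACC on the same test split.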
minor comments (2)
- [Abstract] Abstract: the dataset name is written “CHASE_BD1”; the conventional spelling is CHASE_DB1.
- [Methods] Notation: the distinction between the linear deformable convolution (LD) and the full cross-attention (CA) sub-modules inside LDCA is not always maintained consistently in the text and figures.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and commit to revisions that will strengthen the empirical support and clarity of the manuscript.
Point-by-point responses
- Referee: [Abstract / Results] Abstract and Results: the reported accuracies (97.25 % DRIVE, 97.77 % CHASE_DB1, etc.) are presented without error bars, standard deviations across runs, or statistical tests against the cited baselines. This prevents any assessment of whether the numerical improvements are reliable or merely within run-to-run variation.
  Authors: We agree that variability measures and statistical tests are necessary for reliable assessment of improvements. In the revised manuscript we will report mean and standard deviation of accuracy (and other metrics) over at least five independent training runs with different random seeds, and we will add paired statistical tests (e.g., Wilcoxon signed-rank) against each cited baseline. Revision: yes.
- Referee: [Methods / Experiments] Methods and Experiments: no component-wise ablation is supplied that isolates the contribution of the Kalman-filter LDCA module from the persistent-homology topological loss or from the plain UNet++ backbone under identical training protocols and preprocessing. Because the central claim attributes the performance lift to the LDCA module, the absence of such controlled comparisons renders the attribution unverifiable.
  Authors: We concur that controlled ablations are required to substantiate the attribution to the LDCA module. The revised manuscript will include a component-wise ablation table that evaluates (i) plain UNet++, (ii) UNet++ plus topological loss, and (iii) the full KaLDeX model, all trained with identical protocols, preprocessing, and hyperparameters. Revision: yes.
- Referee: [Experiments] Experiments: the manuscript does not indicate whether the same data augmentation, optimizer schedule, and post-processing steps were used for all re-implemented baselines; any uncontrolled differences could account for the observed margins on DRIVE, CHASE_DB1, STARE, and OCTA-500.
  Authors: All baselines were re-implemented under the exact same data-augmentation pipeline, optimizer, learning-rate schedule, batch size, and post-processing steps used for KaLDeX. The revised manuscript will explicitly state this protocol equivalence in the Experiments section and will list the precise augmentation and post-processing operations applied uniformly. Revision: yes.
Circularity Check
No circularity; empirical claims rest on external public datasets
The paper proposes KaLDeX (Kalman-filter LDCA inside UNet++ plus topological loss) and reports measured accuracies on the public DRIVE, CHASE_DB1, STARE and OCTA-500 benchmarks. No equations, parameters, or performance figures are defined in terms of themselves or obtained by fitting on the test sets and then relabeled as predictions. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text. The central claim therefore remains an ordinary empirical comparison against external data and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: gradient-based optimization on the combined segmentation and topological loss yields a network whose outputs generalize to unseen retinal images.
Lean theorems connected to this paper
- Files: IndisputableMonolith/Cost/FunctionalEquation.lean; IndisputableMonolith/Foundation/AlexanderDuality.lean
  Theorems: washburn_uniqueness_aczel; alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "Kalman filter based linear deformable cross attention (LDCA) module... topological loss function based on persistent homology"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.