Non-destructive three-dimensional measurement of hand vein based on self-supervised network
Pith reviewed 2026-05-25 12:52 UTC · model grok-4.3
The pith
A self-supervised network called SDMNet computes dense disparity maps from stereo pairs without any disparity labels for 3D hand vein measurement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SDMNet is a self-supervised stereo disparity matching network. It approximates disparity maps by densely matching stereo images, warps the images accordingly, and trains by minimizing a loss, including a perceptual term, between the estimated and original images. The paper reports excellent results on KITTI 2012, KITTI 2015, a simulated vein dataset, and a real vein dataset, and claims to outperform many state-of-the-art supervised matching methods.
What carries the argument
SDMNet, a self-supervised network that trains disparity estimation without labels by combining an image reconstruction loss with a perceptual loss.
Load-bearing premise
That minimizing the image reconstruction loss with perceptual loss between warped and original stereo images produces accurate disparity maps in the hand vein domain.
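The premise can be illustrated with a minimal sketch of disparity-based warping on a toy rectified pair. This is not the paper's implementation: the function names `warp_row`, `warp_image`, and `l1_loss` are hypothetical, the sign convention (left pixel x matches right pixel x - d) is an assumption, and a plain L1 photometric loss stands in for the perceptual loss.

```python
def warp_row(src_row, disp_row):
    """Sample src_row at x - d with linear interpolation, clamped to the row bounds."""
    width = len(src_row)
    out = []
    for x, d in enumerate(disp_row):
        xs = min(max(x - d, 0.0), width - 1.0)  # clamp the sampling position
        x0 = int(xs)
        x1 = min(x0 + 1, width - 1)
        w = xs - x0
        out.append((1.0 - w) * src_row[x0] + w * src_row[x1])
    return out


def warp_image(src, disp):
    """Warp every row of src (list of rows) using the per-pixel disparity map."""
    return [warp_row(row, d_row) for row, d_row in zip(src, disp)]


def l1_loss(a, b):
    """Mean absolute difference between two images of equal size."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


# Toy pair: the left row equals the right row shifted by 2 pixels (border clamped).
right = [[10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]]
left = [[10.0, 10.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]]
true_disp = [[2.0] * 8]
zero_disp = [[0.0] * 8]

loss_true = l1_loss(warp_image(right, true_disp), left)
loss_zero = l1_loss(warp_image(right, zero_disp), left)
print(loss_true, loss_zero)
```

On this textured toy row the correct disparity gives a lower reconstruction loss than a wrong one. On a constant-intensity row every disparity would give the same loss, which is the under-constraint the premise has to survive in low-texture regions.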
What would settle it
If the endpoint error or bad pixel percentage of SDMNet on a labeled real hand vein test set exceeds that of top supervised methods, the superiority claim would be falsified.
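The two metrics named in this test could be computed roughly as follows. This is a sketch under stated assumptions: disparities are flat lists, a ground-truth value of 0 marks an invalid pixel, and the default thresholds follow the KITTI 2015 D1 convention (error above 3 px and above 5% of the true disparity). The function names are hypothetical, and the paper's exact evaluation protocol is not given here.

```python
def endpoint_error(pred, gt):
    """Mean absolute disparity error over pixels with valid ground truth (gt > 0)."""
    errors = [abs(p - g) for p, g in zip(pred, gt) if g > 0]
    return sum(errors) / len(errors)


def bad_pixel_rate(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels whose error exceeds both thresholds."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    bad = sum(
        1 for p, g in valid
        if abs(p - g) > abs_thresh and abs(p - g) > rel_thresh * g
    )
    return 100.0 * bad / len(valid)


# Made-up values for illustration only; the last pixel has no ground truth.
pred = [10.0, 20.0, 35.0, 50.0]
gt = [10.5, 20.0, 30.0, 0.0]

epe = endpoint_error(pred, gt)
rate = bad_pixel_rate(pred, gt)
print(epe, rate)
```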
Original abstract
At present, supervised stereo methods based on deep neural networks have achieved impressive results. However, in some scenarios, accurate three-dimensional labels are inaccessible for supervised training. In this paper, a self-supervised network is proposed for binocular disparity matching (SDMNet), which computes dense disparity maps from stereo image pairs without disparity labels. In the self-supervised training, we match the stereo images densely to approximate the disparity maps and use them to warp the left and right images to estimate the right and left images; we build the loss function between estimated images and original images for self-supervised training, which adopts perceptual loss to help improve the quality of disparity maps in both detail and structure. Then, we use SDMNet to obtain disparities of hand veins. SDMNet has achieved excellent results on KITTI 2012, KITTI 2015, a simulated vein dataset, and a real vein dataset, outperforming many state-of-the-art supervised matching methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SDMNet, a self-supervised stereo disparity matching network that computes dense disparity maps from stereo image pairs without disparity labels. Training minimizes a perceptual loss between the original stereo images and images warped using the predicted disparities; the network is then applied to obtain 3D hand-vein measurements and is claimed to outperform many supervised state-of-the-art methods on KITTI 2012, KITTI 2015, a simulated vein dataset, and a real vein dataset.
Significance. If the quantitative claims hold, the self-supervised formulation would be valuable for medical 3D imaging domains where ground-truth disparity labels are difficult to obtain. The explicit use of perceptual loss to improve structural fidelity is a constructive design choice that could generalize beyond the vein application.
major comments (2)
- [Abstract] Abstract: the self-supervised loss is described solely as image-reconstruction loss plus perceptual loss between warped and original images. No left-right consistency, smoothness, or occlusion terms are mentioned; in the low-texture skin regions surrounding veins this leaves the disparity field under-constrained, directly undermining the accuracy claims on both simulated and real vein data.
- [Abstract] Abstract: the assertion that SDMNet 'has achieved excellent results ... outperforming many state-of-the-art supervised matching methods' on four datasets is presented without any quantitative metrics, error tables, ablation studies, or statistical comparisons, rendering the central performance claim impossible to evaluate.
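To make the first major comment concrete, here is a minimal sketch of one regularizer it names, an edge-aware disparity smoothness term of the kind used in left-right-consistency work on unsupervised depth. It is illustrative only and is not part of SDMNet as described; the function name and the one-row formulation are assumptions.

```python
import math


def edge_aware_smoothness(disp_row, image_row):
    """Mean of |d(x+1) - d(x)| * exp(-|I(x+1) - I(x)|) along one image row."""
    total = 0.0
    for x in range(len(disp_row) - 1):
        disp_grad = abs(disp_row[x + 1] - disp_row[x])
        image_grad = abs(image_row[x + 1] - image_row[x])
        # Disparity changes are penalized less where the image itself has an edge.
        total += disp_grad * math.exp(-image_grad)
    return total / (len(disp_row) - 1)


flat_image = [0.5] * 6                      # low-texture region, e.g. skin
edge_image = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # intensity edge, e.g. a vein boundary

smooth_disp = [2.0] * 6
noisy_disp = [2.0, 5.0, 1.0, 4.0, 2.0, 6.0]
step_disp = [2.0, 2.0, 2.0, 6.0, 6.0, 6.0]

cost_smooth = edge_aware_smoothness(smooth_disp, flat_image)
cost_noisy = edge_aware_smoothness(noisy_disp, flat_image)
cost_step_on_edge = edge_aware_smoothness(step_disp, edge_image)
cost_step_on_flat = edge_aware_smoothness(step_disp, flat_image)
print(cost_smooth, cost_noisy, cost_step_on_edge, cost_step_on_flat)
```

In a textureless region the reconstruction loss cannot distinguish `smooth_disp` from `noisy_disp`, while this term does, which is why its absence matters for the skin around veins.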
minor comments (1)
- The transition from disparity estimation to the final 'non-destructive three-dimensional measurement' pipeline is not detailed; a brief description of how disparity maps are converted to 3D vein geometry would improve clarity.
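For a rectified, calibrated stereo rig, the missing step is usually the standard triangulation Z = f * B / d. The sketch below assumes that textbook model; the function name, parameter names, and the numeric values are hypothetical and are not taken from the manuscript.

```python
def disparity_to_point(u, v, disparity, focal_px, baseline_mm, cx, cy):
    """Back-project pixel (u, v) with the given disparity to a 3D point in mm.

    Assumes a rectified pinhole stereo pair with focal length focal_px (pixels),
    baseline baseline_mm, and principal point (cx, cy).
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive to triangulate")
    z = focal_px * baseline_mm / disparity  # depth along the optical axis
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return x, y, z


# Hypothetical calibration values, for illustration only.
point = disparity_to_point(
    u=420.0, v=240.0, disparity=100.0,
    focal_px=800.0, baseline_mm=50.0, cx=320.0, cy=240.0,
)
print(point)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy at small disparities, so disparity-domain metrics alone do not fully characterize 3D measurement error.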
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the abstract. We address each point below and will revise the manuscript to improve clarity and support for the claims where appropriate.
- Referee: [Abstract] Abstract: the self-supervised loss is described solely as image-reconstruction loss plus perceptual loss between warped and original images. No left-right consistency, smoothness, or occlusion terms are mentioned; in the low-texture skin regions surrounding veins this leaves the disparity field under-constrained, directly undermining the accuracy claims on both simulated and real vein data.
Authors: The abstract provides a concise high-level summary of the training objective. The full manuscript (Section 3) specifies that the loss consists of image reconstruction via warping combined with perceptual loss; no explicit left-right consistency, smoothness, or occlusion terms are included. We argue that the perceptual loss, operating on higher-level VGG features, provides structural regularization that helps mitigate under-constraint in low-texture skin regions around veins, as evidenced by the competitive results on the vein datasets. Nevertheless, we acknowledge the referee's point that explicitly noting the absence of these terms and discussing their implications would strengthen the abstract. We will revise the abstract to clarify the loss formulation and add a brief statement on handling low-texture areas.
Revision: partial
- Referee: [Abstract] Abstract: the assertion that SDMNet 'has achieved excellent results ... outperforming many state-of-the-art supervised matching methods' on four datasets is presented without any quantitative metrics, error tables, ablation studies, or statistical comparisons, rendering the central performance claim impossible to evaluate.
Authors: The abstract is intended as a high-level summary; the quantitative evidence, including error metrics (e.g., EPE, D1), tables comparing against supervised methods on KITTI 2012/2015 and both vein datasets, and ablation studies, appears in the Experiments section. We agree that the abstract's performance claim would be more evaluable if supported by key numbers. We will revise the abstract to include representative quantitative results (subject to length constraints) or qualify the statement to refer readers to the full results.
Revision: yes
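The appeal to VGG features in the first response can be written out using the feature reconstruction loss from the style-transfer literature (Johnson et al., 2016). The form below is an assumption about the general shape of such an objective, not the formula from the manuscript's Section 3. Here $\phi_l$ denotes the activations of a fixed pretrained network at layer $l$ with shape $C_l \times H_l \times W_l$, $\hat{I}$ is the warped image, $I$ the original, and the layer set $\mathcal{S}$, the reconstruction term $L_{\mathrm{rec}}$, and the weight $\lambda$ are hypothetical.

```latex
L_{\mathrm{perc}}(\hat{I}, I) = \sum_{l \in \mathcal{S}} \frac{1}{C_l H_l W_l}
    \left\lVert \phi_l(\hat{I}) - \phi_l(I) \right\rVert_2^2,
\qquad
L_{\mathrm{total}} = L_{\mathrm{rec}}(\hat{I}, I) + \lambda \, L_{\mathrm{perc}}(\hat{I}, I)
```

Note that both terms depend on the disparity only through the warped image $\hat{I}$, so wherever warping with different disparities yields the same image, neither term constrains the disparity.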
Circularity Check
No circularity; self-supervised reconstruction loss is independent of disparity targets
The paper defines SDMNet training directly via a perceptual loss between original stereo images and images warped by the network's own disparity predictions. This objective is stated without reference to ground-truth disparities and does not reduce to any fitted parameter being renamed as a prediction. No self-citation chain, uniqueness theorem, or ansatz smuggling is invoked to justify the central claim. The derivation chain therefore remains self-contained against external benchmarks and matches none of the enumerated circularity patterns.