
arXiv: 1907.01683 · v1 · pith:5LXSCDMInew · submitted 2019-07-02 · 💻 cs.CV · cs.CG · cs.LG · eess.IV

SkeletonNet: Shape Pixel to Skeleton Pixel

Pith reviewed 2026-05-25 10:35 UTC · model grok-4.3

classification 💻 cs.CV · cs.CG · cs.LG · eess.IV
keywords: skeleton extraction · U-Net · HED architecture · shape to skeleton · pixel labeling · F1 score · CVPR challenge · image segmentation

The pith

A U-Net with an HED-style decoder extracts skeleton pixels from shape pixels of 89 objects by fusing four side layers through a dilation convolution and reaches 0.77 F1 on test data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method for the first track of a CVPR 2019 challenge that requires turning shape pixels into skeleton pixels across images of 89 objects. It starts from a standard U-Net encoder-decoder and changes the decoder to follow the HED pattern by adding four side layers that are combined in one dilation convolutional layer. The goal of this change is to reconnect any broken segments in the output skeleton. Readers would care because reliable skeleton extraction is a basic step in turning raw object images into usable geometric descriptions. The reported result on the held-out test set is an F1 score of 0.77.

Core claim

We use a U-net model with an encoder-decoder structure. Unlike the plain decoder in the traditional Unet, we have designed the decoder in the format of HED architecture, wherein we have introduced 4 side layers and fused them to one dilation convolutional layer to connect the broken links of the skeleton. Our proposed architecture achieved the F1 score of 0.77 on test data.

What carries the argument

The HED-style decoder that introduces four side layers fused through a single dilation convolutional layer to reconnect broken skeleton segments.
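The fusion step can be illustrated with a minimal, dependency-free sketch. It assumes the four side outputs have already been upsampled to the input resolution and are fused by a single 3x3 convolution with dilation 2 and zero padding. The paper's kernel size, dilation rate and learned weights are not given here, so those values and the hand-set kernel below are illustrative assumptions, and `dilated_conv2d` is a hypothetical name. (The original HED fuses its side outputs with a learned weighted sum; a dilated kernel instead lets one output pixel see skeleton pixels several positions away.)

```python
# Illustrative sketch only: fuse C side-output maps (each H x W, already at full
# resolution) with one 3x3 convolution whose taps are spaced `dilation` pixels apart.
# Kernel size, dilation and weights are assumptions, not values from the paper.

def dilated_conv2d(channels, kernels, dilation=2, bias=0.0):
    """Sum over channels of a zero-padded, dilated 3x3 cross-correlation."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[bias] * w for _ in range(h)]
    for fmap, k in zip(channels, kernels):
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for ki in range(3):
                    for kj in range(3):
                        y = i + (ki - 1) * dilation
                        x = j + (kj - 1) * dilation
                        if 0 <= y < h and 0 <= x < w:  # outside the map counts as 0
                            acc += k[ki][kj] * fmap[y][x]
                out[i][j] += acc
    return out


# A horizontal skeleton segment with a three-pixel break in the middle (row 2).
broken = [[0] * 7 for _ in range(5)]
broken[2] = [1, 1, 0, 0, 0, 1, 1]
side_maps = [broken] * 4                         # stand-ins for the four side outputs
k = [[0, 0, 0], [0.125, 0, 0.125], [0, 0, 0]]    # hand-set: look left and right only
fused = dilated_conv2d(side_maps, [k] * 4, dilation=2)
plain = dilated_conv2d(side_maps, [k] * 4, dilation=1)
print(fused[2][3], plain[2][3])  # 1.0 0.0
```

At the centre of the break, the dilated taps land on skeleton pixels on either side of the gap, while an undilated 3x3 kernel sees only background. A trained network learns its own weights, so this shows only why a wider receptive field can help bridge breaks, not that the trained model does so.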

If this is right

  • The architecture converts shape pixels into skeleton pixels for the 89 objects in the first track of the challenge.
  • The model reaches an F1 score of 0.77 when tested on the competition test data.
  • The side-layer fusion step is intended to restore connectivity in the extracted skeletons.
  • The same encoder-decoder design can be trained end-to-end on the supplied dataset for this pixel-to-pixel task.
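For reference, pixel-level F1 is the harmonic mean of precision P and recall R over skeleton pixels, F1 = 2PR / (P + R). The sketch below assumes exact pixel-to-pixel matching between binary masks. The challenge's official scorer may differ (for example, by tolerating small offsets), so treat `skeleton_f1` as a hypothetical helper, not the competition metric.

```python
def skeleton_f1(pred, gt):
    """Pixel-level F1 between two equally sized binary masks (exact matching assumed)."""
    tp = fp = fn = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p and g:
                tp += 1  # skeleton pixel correctly predicted
            elif p:
                fp += 1  # predicted skeleton pixel absent from the ground truth
            elif g:
                fn += 1  # ground-truth skeleton pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = [[0, 1, 1, 1, 1, 0]]
pred = [[0, 1, 1, 0, 1, 1]]  # one missed pixel, one spurious pixel
print(skeleton_f1(pred, gt))  # 0.75
```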

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decoder modification might be tried on other dense-prediction problems such as edge detection or thin-structure segmentation.
  • Running the network on object classes outside the original 89 would test whether the connectivity benefit transfers.
  • An ablation that removes the dilation fusion layer and measures the change in F1 would quantify how much the side-layer step contributes.

Load-bearing premise

Adding four side layers fused via a single dilation convolutional layer will reliably connect broken skeleton links on the provided competition dataset.

What would settle it

Evaluating the trained model on a new collection of shape images that contain known gaps in the ground-truth skeletons and checking whether those gaps remain unconnected in the output.
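One way to run this check is to compare the number of 8-connected components in a predicted skeleton with the number in its ground truth. A prediction with more components than its ground truth has extra breaks. A prediction with fewer components has bridged gaps that should have stayed open. The sketch below is a minimal illustration, and `count_components` is a hypothetical helper name.

```python
from collections import deque


def count_components(mask):
    """Number of 8-connected foreground components in a binary mask (breadth-first search)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                count += 1  # a new, not-yet-visited piece of skeleton
                seen[i][j] = True
                queue = deque([(i, j)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count


gt = [[1, 1, 1, 1, 1]]
pred = [[1, 1, 0, 1, 1]]
print(count_components(gt), count_components(pred))  # 1 2
```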

Figures

Figures reproduced from arXiv: 1907.01683 by Priya Kansal, Sabari Nathan.

Figure 1. SkeletonNet: A detailed view of Proposed Architecture
Figure 2. Coordinate convolutional layer as proposed in
Figure 3. Residual Squeezed (RS) block used in the
Figure 4. Side-Layers and Fused layer guidance with
Figure 5. Side-Layers, Fused layer output and Ensembled
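The Figure 2 caption indicates that the model uses a coordinate convolutional layer. CoordConv, from Liu et al. [18], appends channels holding each pixel's row and column coordinates before an ordinary convolution, so the filters can condition on position. The sketch below shows only the coordinate-appending step, with coordinates scaled to [-1, 1] as in the CoordConv paper. Where and how SkeletonNet configures this layer is not stated here, and `add_coord_channels` is a hypothetical name.

```python
def add_coord_channels(fmap):
    """Return [fmap, rows, cols]: the input map plus two coordinate channels in [-1, 1]."""
    h, w = len(fmap), len(fmap[0])

    def scale(k, n):
        # Map index 0..n-1 linearly onto -1..1 (a single row or column maps to 0).
        return 2.0 * k / (n - 1) - 1.0 if n > 1 else 0.0

    rows = [[scale(i, h)] * w for i in range(h)]
    cols = [[scale(j, w) for j in range(w)] for _ in range(h)]
    return [fmap, rows, cols]


stacked = add_coord_channels([[0, 1, 0], [0, 1, 0], [0, 1, 0]])
print(stacked[1][0], stacked[2][0])  # [-1.0, -1.0, -1.0] [-1.0, 0.0, 1.0]
```

A convolution applied to the three stacked channels can then respond differently to the same local pattern at different image positions.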
Original abstract

Deep Learning for Geometric Shape Understating has organized a challenge for extracting different kinds of skeletons from the images of different objects. This competition is organized in association with CVPR 2019. There are three different tracks of this competition. The present manuscript describes the method used to train the model for the dataset provided in the first track. The first track aims to extract skeleton pixels from the shape pixels of 89 different objects. For the purpose of extracting the skeleton, a U-net model which is comprised of an encoder-decoder structure has been used. In our proposed architecture, unlike the plain decoder in the traditional Unet, we have designed the decoder in the format of HED architecture, wherein we have introduced 4 side layers and fused them to one dilation convolutional layer to connect the broken links of the skeleton. Our proposed architecture achieved the F1 score of 0.77 on test data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript describes a U-Net variant for the first track of the CVPR 2019 Deep Learning for Geometric Shape Understanding challenge. The architecture modifies the decoder to an HED-style structure with four side layers fused through a single dilation convolutional layer, with the goal of connecting broken skeleton links; the reported result is an F1 score of 0.77 on the held-out test set for skeleton pixel extraction from shape images of 89 object classes.

Significance. A validated improvement in skeleton connectivity via side-layer fusion could be useful for shape analysis tasks, but the single scalar F1 score on one competition dataset provides limited insight into broader applicability or the contribution of the claimed architectural element.

major comments (2)
  1. [Abstract] The assertion that the four side layers fused via one dilation convolutional layer connect broken skeleton links is presented without any ablation study, baseline comparison to a plain U-Net decoder, or quantitative justification. Because the central claim attributes the F1=0.77 result to this design choice, the absence of controls means the performance cannot be tied to the proposed modification.
  2. [Abstract] No error bars, cross-validation details, training/validation split information, or comparison against other skeletonization methods are supplied, so the reported F1 score stands as an isolated number whose reliability and improvement over standard approaches cannot be assessed.
minor comments (2)
  1. [Abstract] Typo: 'Understating' should read 'Understanding'.
  2. The manuscript provides no information on loss function, optimizer, data augmentation, or implementation framework, all of which are required for reproducibility of the reported F1 score.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's comments highlighting the need for stronger justification of the architectural choices and additional experimental details. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] The assertion that the four side layers fused via one dilation convolutional layer connect broken skeleton links is presented without any ablation study, baseline comparison to a plain U-Net decoder, or quantitative justification. Because the central claim attributes the F1=0.77 result to this design choice, the absence of controls means the performance cannot be tied to the proposed modification.

    Authors: We agree that the manuscript does not contain an ablation study or direct baseline comparison to a standard U-Net decoder, which limits the ability to quantitatively attribute the result to the side-layer fusion and dilation convolution. The design draws on the HED architecture's demonstrated utility in multi-scale feature extraction for edge-like tasks, with the dilation layer added specifically to address skeleton connectivity. To strengthen the claim, we will incorporate a baseline U-Net comparison in the revised manuscript.

    Revision: yes.

  2. Referee: [Abstract] No error bars, cross-validation details, training/validation split information, or comparison against other skeletonization methods are supplied, so the reported F1 score stands as an isolated number whose reliability and improvement over standard approaches cannot be assessed.

    Authors: The F1 score of 0.77 is the official result on the competition's fixed held-out test set for the 89-class shape skeletonization track. Internal training/validation splits were performed but omitted due to the concise competition-report format; no cross-validation or error bars were computed because evaluation is determined by the organizers. We will expand the methods section in revision to detail the training procedure and any internal splits. Comparisons to non-deep-learning skeletonization methods (e.g., morphological thinning) can be added for context, though the challenge emphasizes learned approaches.

    Revision: yes.
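As an example of the kind of non-learned baseline the response mentions, the sketch below implements Zhang-Suen thinning, one well-known morphological thinning algorithm. It is chosen here for illustration and is not a method the paper or the response names. The algorithm repeatedly peels boundary pixels in two alternating sub-iterations until nothing changes. Each decision uses B, the number of foreground neighbours, and A, the number of 0-to-1 transitions around the pixel. `zhang_suen_thin` is a hypothetical helper name.

```python
def zhang_suen_thin(mask):
    """Zhang-Suen thinning of a binary mask (list of lists of 0/1); returns a new mask."""
    img = [[1 if v else 0 for v in row] for row in mask]
    h, w = len(img), len(img[0])
    # Neighbours P2..P9, clockwise starting from north.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    def neighbours(y, x):
        # Pixels outside the image are treated as background.
        return [img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy, dx in offsets]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(h):
                for x in range(w):
                    if not img[y][x]:
                        continue
                    n = neighbours(y, x)
                    p2, _, p4, _, p6, _, p8, _ = n
                    b = sum(n)  # B: number of foreground neighbours
                    a = sum(1 for k in range(8) if n[k] == 0 and n[(k + 1) % 8] == 1)  # A
                    if step == 0:  # first sub-iteration: south-east boundary
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:          # second sub-iteration: north-west boundary
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            # Delete in parallel so every decision in a sub-iteration sees the same image.
            for y, x in to_delete:
                img[y][x] = 0
            if to_delete:
                changed = True
    return img


bar = [[1 if 1 <= y <= 3 and 1 <= x <= 7 else 0 for x in range(9)] for y in range(5)]
thin = zhang_suen_thin(bar)
print(thin[2])  # [0, 0, 1, 1, 1, 1, 0, 0, 0]
```

On this 3x7 filled bar, the result is a one-pixel line along the bar's middle row, shortened at both ends, which is typical of this algorithm.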

Circularity Check

0 steps flagged

No circularity; empirical F1 on held-out test data

Full rationale

The manuscript describes a U-Net variant with HED-style side layers for skeleton extraction and reports an F1 score of 0.77 on competition test data. No derivation chain, equations, fitted-parameter predictions, or self-citation load-bearing steps exist. The performance number is an external held-out metric, not a quantity defined by construction from the model's own inputs or prior self-citations. The paper is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters, axioms, or invented entities; the model likely contains standard neural-network hyperparameters but none are enumerated.

pith-pipeline@v0.9.0 · 5685 in / 1034 out tokens · 44331 ms · 2026-05-25T10:35:54.920611+00:00 · methodology


Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 3 internal anchors

  1. [1]

    Computing and simplifying 2D and 3D continuous skeletons

    Attali D, Montanvert A. Computing and simplifying 2D and 3D continuous skeletons. Computer vision and image understanding, 67(3):261-73, 1997

  2. [2]

    Skeleton pruning by contour partitioning with discrete curve evolution

    Bai X, Latecki LJ, Liu WY. Skeleton pruning by contour partitioning with discrete curve evolution. IEEE transactions on pattern analysis and machine intelligence, 29(3):449-62, 2007

  3. [3]

    DeepEdge: A multi-scale bifurcated deep network for top-down contour detection

    Bertasius G, Shi J, Torresani L. DeepEdge: A multi-scale bifurcated deep network for top-down contour detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4380-4389, 2015

  4. [4]

    A practical introduction to skeletons for the plant sciences

    Bucksch A. A practical introduction to skeletons for the plant sciences. Applications in plant sciences, 2(8):1400005, 2014

  5. [5]

    λ-medial axis

    Chazal F, Lieutier A. The “ λ-medial axis”. Graphical Models, 67(4):304-31, 2005

  6. [6]

    SkelNetOn 2019: Dataset and Challenge on Deep Learning for Geometric Shape Understanding

    Demir, I., Hahn, C., Leonard, K., Morin, G., Rahbani, D., Panotopoulou, A., . . . Kortylewski, A. (2019, March 21). SkelNetOn 2019 Dataset and Challenge on Deep Learning for Geometric Shape Understanding. Retrieved from https://arxiv.org/abs/1903.09233

  7. [7]

    The Propagated Skeleton: A Robust Detail-Preserving Approach

    Durix B, Chambon S, Leonard K, Mari JL, Morin G. The Propagated Skeleton: A Robust Detail-Preserving Approach. In International Conference on Discrete Geometry for Computer Imagery, pp. 343-354. Springer, Cham, 2016

  8. [8]

    Fast edge detection using structured forests

    Dollár P, Zitnick CL. Fast edge detection using structured forests. IEEE transactions on pattern analysis and machine intelligence, 37(8):1558-70, 2015

  9. [9]

    The scale axis transform

    Giesen J, Miklos B, Pauly M, Wormser C. The scale axis transform. In Proceedings of the twenty-fifth annual symposium on Computational geometry, pp. 106-115, ACM, 2009

  10. [10]

    Hypercolumns for object segmentation and fine-grained localization

    Hariharan B, Arbeláez P, Girshick R, Malik J. Hypercolumns for object segmentation and fine-grained localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 447-456, 2015

  11. [11]

    Three Birds One Stone: A General Architecture for Salient Object Segmentation, Edge Detection and Skeleton Extraction

    Hou Q, Liu J, Cheng MM, Borji A, Torr PH. Three birds one stone: a unified framework for salient object segmentation, edge detection and skeleton extraction. arXiv preprint arXiv:1803.09860. 2018

  12. [12]

    Squeeze-and-excitation networks

    Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132-7141, 2018

  13. [13]

    Depth map-based human activity tracking and recognition using body joints features and self-organized map

    Jalal A, Kamal S, Kim D. Depth map-based human activity tracking and recognition using body joints features and self-organized map. In Fifth International Conference on Computing, Communications and Networking Technologies (ICCCNT), pp. 1-6, IEEE, 2014

  14. [14]

    Tracking Direction of Human Movement - An Efficient Implementation using Skeleton

    Kundu M, Sengupta D, Dastidar JG. Tracking Direction of Human Movement-An Efficient Implementation using Skeleton. arXiv preprint arXiv:1506.08815. 2015

  15. [15]

    Skeletonization using SSM of the distance transform

    Latecki LJ, Li QN, Bai X, Liu WY. Skeletonization using SSM of the distance transform. In 2007 IEEE International Conference on Image Processing, 5(V):349, IEEE, 2007

  16. [16]

    A 2D shape structure for decomposition and part similarity

    Leonard K, Morin G, Hahmann S, Carlier A. A 2D shape structure for decomposition and part similarity. In 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 3216-3221, IEEE, 2016

  17. [17]

    An implementation of ocr system based on skeleton matching, 1993

    Li N. An implementation of ocr system based on skeleton matching, 1993

  18. [18]

    An intriguing failing of convolutional neural networks and the coordconv solution

    Liu R, Lehman J, Molino P, Such FP, Frank E, Sergeev A, Yosinski J. An intriguing failing of convolutional neural networks and the coordconv solution. In Advances in Neural Information Processing Systems, pp. 9605-9616, 2018

  19. [19]

    Analysis of two-dimensional non-rigid shapes

    Bronstein AM, Bronstein MM, Bruckstein AM, Kimmel R. Analysis of two-dimensional non-rigid shapes. International Journal of Computer Vision, 78(1):67-88, 2008

  20. [20]

    U-net: Convolutional networks for biomedical image segmentation

    Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234-241, Springer, Cham, 2015

  21. [21]

    Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks

    Roy AG, Navab N, Wachinger C. Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 421-429, Springer, Cham, 2018

  22. [22]

    Object skeleton extraction in natural images by fusing scale-associated deep side outputs

    Shen W, Zhao K, Jiang Y, Wang Y, Zhang Z, Bai X. Object skeleton extraction in natural images by fusing scale-associated deep side outputs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 222-230, 2016

  23. [23]

    Deepcontour: A deep convolutional feature learned by positive-sharing loss for contour detection

    Shen W, Wang X, Wang Y, Bai X, Zhang Z. Deepcontour: A deep convolutional feature learned by positive-sharing loss for contour detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3982-3991, 2015

  24. [24]

    Recognition of shapes by editing their shock graphs

    Sebastian TB, Klein PN, Kimia BB. Recognition of shapes by editing their shock graphs. IEEE Transactions on Pattern Analysis & Machine Intelligence, 26(5):550-71, 2004

  25. [25]

    SkelNetOn @ CVPR19

    SkelNetOn @ CVPR19. (2019, February). Retrieved from http://ubee.enseeiht.fr/skelneton/challenge.html

  26. [26]

    Holistically-nested edge detection

    Xie S, Tu Z. Holistically-nested edge detection. In Proceedings of the IEEE international conference on computer vision, pp. 1395-1403, 2015

  27. [27]

    Object contour detection with a fully convolutional encoder-decoder network

    Yang J, Price B, Cohen S, Lee H, Yang MH. Object contour detection with a fully convolutional encoder-decoder network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 193-202, 2016

  28. [28]

    Preprocessing and postprocessing for skeleton-based fingerprint minutiae extraction

    Zhao F, Tang X. Preprocessing and postprocessing for skeleton-based fingerprint minutiae extraction. Pattern Recognition, 40(4):1270-81, 2007