arxiv: 1907.10763 · v1 · pith:UOZLABBSnew · submitted 2019-07-24 · 💻 cs.CV · cs.LG

One-stage Shape Instantiation from a Single 2D Image to 3D Point Cloud

Pith reviewed 2026-05-24 16:37 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords shape instantiation · point cloud · convolutional network · Chamfer distance · right ventricle · 3D reconstruction · one-stage method · intra-operative navigation

The pith

A convolutional network can map a single 2D image directly to a 3D point cloud with 1.72 mm average error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors replace a two-stage shape instantiation pipeline with a single end-to-end network. The earlier method required manual segmentation to create a 2D statistical shape model, then kernel partial least squares regression to predict the 3D model. The new approach feeds the raw 2D image into PointOutNet, which outputs the 3D point cloud using Chamfer distance as the training loss. On 609 experiments from 27 right-ventricle subjects the network reaches 1.72 mm point-cloud-to-point-cloud error, close to the 1.31–1.42 mm of the two-stage baselines. This removes the need for explicit intermediate models and enables direct image-to-point-cloud training and inference.

Core claim

PointOutNet, built from 19 convolutional layers and three fully-connected layers and trained with Chamfer distance loss, predicts the 3D target point cloud from a single 2D image. On a dataset of 27 right-ventricle subjects (609 experiments) it produces an average point-cloud-to-point-cloud error of 1.72 mm, comparable to the PLSR-based (1.42 mm) and KPLSR-based (1.31 mm) two-stage algorithms, while allowing spontaneous image-to-point-cloud training and inference without manual segmentation or an explicit 2D statistical shape model.

What carries the argument

PointOutNet, a 19-convolutional-layer plus three-fully-connected-layer network trained end-to-end with Chamfer distance loss to map a 2D image directly to a 3D point cloud.
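
The Chamfer distance used as the training loss can be illustrated with a minimal sketch. It assumes the common symmetric form, with squared Euclidean nearest-neighbour distances summed in both directions; the paper's exact normalization is not reproduced in this summary. The function name `chamfer_distance` and the toy point sets are illustrative only.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def chamfer_distance(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed form, not taken from the paper).

    For every predicted point, add the squared distance to its nearest
    ground-truth point, then do the same from ground truth to prediction.
    """
    if not pred or not truth:
        raise ValueError("both point clouds must be non-empty")
    pred_to_truth = sum(min(_sq_dist(p, t) for t in truth) for p in pred)
    truth_to_pred = sum(min(_sq_dist(t, p) for p in pred) for t in truth)
    return pred_to_truth + truth_to_pred


# Toy data (hypothetical): the same two points listed in a different order,
# and a copy of the ground truth shifted by one unit along z.
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_reordered = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
pred_shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(chamfer_distance(pred_reordered, truth))  # 0.0: point order does not matter
print(chamfer_distance(pred_shifted, truth))    # 4.0: four nearest pairs, each 1 unit apart
```

An index-wise L1 or L2 loss would penalize `pred_reordered` even though it describes the same shape, which is why an order-invariant loss suits unordered point clouds.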

If this is right

  • Real-time intra-operative navigation no longer requires a manual segmentation step to build the 2D shape model.
  • Training and inference run end to end from raw images to 3D point clouds, with no intermediate manual step.
  • The method avoids constructing and maintaining separate 2D and 3D statistical shape models.
  • The mean error is 0.30 mm (vs. PLSR) to 0.41 mm (vs. KPLSR) higher than the prior regression-based results on the same right-ventricle data.
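
The gap to the two-stage baselines follows directly from the three reported means. The short check below uses only those numbers; the dictionary keys are labels chosen for this sketch.

```python
# Mean PC-to-PC errors in mm, as reported in the abstract.
mean_error_mm = {"PointOutNet": 1.72, "PLSR": 1.42, "KPLSR": 1.31}

one_stage = mean_error_mm["PointOutNet"]
gaps = {}
for baseline in ("PLSR", "KPLSR"):
    gap = one_stage - mean_error_mm[baseline]      # absolute gap in mm
    relative = gap / mean_error_mm[baseline]       # gap relative to the baseline
    gaps[baseline] = (round(gap, 2), round(100 * relative, 1))
    print(f"{baseline}: +{gap:.2f} mm ({100 * relative:.1f}% higher)")
```

In relative terms the one-stage error is roughly a fifth to a third higher than the baselines, which is why the "comparable" wording needs variance information to be judged.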

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same direct-mapping approach could be tested on other dynamic organs where 2D-to-3D shape variation is the dominant task.
  • Replacing the final point-cloud output with a surface mesh would allow direct comparison against surface-based clinical metrics.
  • Combining the network with online fine-tuning on a few new patient images might further reduce error for individual anatomy.

Load-bearing premise

A standard convolutional network can learn the necessary 3D shape variations from 2D images without an explicit 2D statistical shape model or kernel regression step.

What would settle it

On a held-out set of right-ventricle images the network's point-cloud predictions would show average error well above 2 mm or systematic failure to reproduce the shape variations captured by the earlier two-stage method.

Figures

Figures reproduced from arXiv: 1907.10763 by Guang-Zhong Yang, Jian-Qing Zheng, Peichao Li, Xiao-Yun Zhou, Zhao-Yang Wang.

Figure 1: Shape instantiation of (a) two-stage with manual image segmentation to generate 2D SSM and KPLSR-based learning for 3D mesh prediction; and (b) one-stage with PointOutNet to predict 3D point cloud from a single 2D image.
A general dynamic framework was proposed recently in [13] for 3D shape instantiation. First, it determined an optimal scan plane for a dynamic target by analyzing its pre-operative 3D SSM…
Figure 2: Detailed network architecture of PointOutNet.
As point cloud is an unordered data format, using the regular L1 or L2 loss to calculate the corresponding distance error between the predicted point cloud and the ground truth may cause regression difficulty. Hence Chamfer distance is used as the loss function. It calculates the distance between the predicted point cloud and the ground truth as: $\mathrm{Loss} = \sum_{\hat{y} \in \hat{Y}}$ …
Figure 3: The PC-to-PC error for each time frame for 12 subjects selected randomly from the 27 subjects.
… 3 mm for all time frames. There are no excessively high peaks, which illustrates the stability of the proposed one-stage shape instantiation with PointOutNet. Slightly higher errors exist at the beginning and the end time frame (e.g. 1 and 25) and the middle time frame (e.g. 9), which is common as it was also obse…
Figure 4: Intuitive illustrations of the instantiation results of two randomly selected subjects at the systole and diastole time frame; color indicates the PC-to-PC error for each vertex in mm.
3.2 Instantiation Examples. The point clouds predicted by the PointOutNet at the systole and diastole time frames from two randomly selected subjects are shown in Fig. 4.
Figure 5: The mean PC-to-PC error for 27 subjects with PLSR-based and KPLSR-based two-stage shape instantiation and PointOutNet-based one-stage shape instantiation.
3.3 Comparisons to Other Methods. The proposed one-stage shape instantiation with PointOutNet was compared to previous two-stage shape instantiation with PLSR and KPLSR. The mean PC-to-PC error for each subject is shown in …
read the original abstract

Shape instantiation which predicts the 3D shape of a dynamic target from one or more 2D images is important for real-time intra-operative navigation. Previously, a general shape instantiation framework was proposed with manual image segmentation to generate a 2D Statistical Shape Model (SSM) and with Kernel Partial Least Square Regression (KPLSR) to learn the relationship between the 2D and 3D SSM for 3D shape prediction. In this paper, the two-stage shape instantiation is improved to be one-stage. PointOutNet with 19 convolutional layers and three fully-connected layers is used as the network structure and Chamfer distance is used as the loss function to predict the 3D target point cloud from a single 2D image. With the proposed one-stage shape instantiation algorithm, a spontaneous image-to-point cloud training and inference can be achieved. A dataset from 27 Right Ventricle (RV) subjects, indicating 609 experiments, were used to validate the proposed one-stage shape instantiation algorithm. An average point cloud-to-point cloud (PC-to-PC) error of 1.72mm has been achieved, which is comparable to the PLSR-based (1.42mm) and KPLSR-based (1.31mm) two-stage shape instantiation algorithm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes replacing a prior two-stage shape instantiation pipeline (manual 2D SSM + KPLSR/PLSR regression) with a one-stage end-to-end network called PointOutNet (19 convolutional layers + 3 fully-connected layers) trained with Chamfer distance loss to regress a 3D point cloud directly from a single 2D image. Validation is performed on right-ventricle data from 27 subjects (609 experiments), reporting a mean PC-to-PC error of 1.72 mm claimed to be comparable to the two-stage baselines (1.42 mm PLSR, 1.31 mm KPLSR).

Significance. If the numerical equivalence holds under matched evaluation protocols, the result would show that explicit statistical shape models and kernel regression can be eliminated while preserving accuracy, simplifying real-time intra-operative 3D navigation. The work also supplies a concrete empirical baseline for single-view point-cloud regression on cardiac anatomy.

major comments (2)
  1. [Abstract] Abstract: the central claim that 1.72 mm is 'comparable' to 1.31 mm and 1.42 mm is load-bearing yet unsupported by any variance measure, confidence interval, per-subject breakdown, or statistical test; a 0.41 mm absolute difference on this scale cannot be evaluated without these controls or confirmation that identical held-out test cases were used for all three methods.
  2. [Abstract] Abstract / Results: no information is supplied on the train-test split, cross-validation scheme, or whether the 609 experiments constitute independent held-out cases; without this, the reported mean error cannot be interpreted as a reliable estimate of generalization performance.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'a dataset from 27 Right Ventricle (RV) subjects, indicating 609 experiments' is ambiguous; clarify how the 609 experiments are constructed from the 27 subjects (e.g., multiple views, time points, or augmentations).
  2. [Results] The manuscript should explicitly state whether the same test images and ground-truth point clouds were used for the one-stage and two-stage comparisons.
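
One way to supply the paired comparison the referee asks for is sketched below. It assumes per-subject mean errors are available for both methods on the same held-out cases. The technique shown is an exact sign-flip permutation test, swapped in here because it needs only the standard library; a Wilcoxon signed-rank test would be a common alternative. The numbers are hypothetical and are not the paper's data.

```python
from itertools import product
from statistics import mean, stdev


def paired_sign_flip_test(a, b):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis that both methods are equally accurate,
    each per-subject difference is equally likely to have either sign.
    Returns (mean difference, SD of differences, p-value).
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = mean([s * d for s, d in zip(signs, diffs)])
        # Small tolerance guards against floating-point ties.
        if abs(flipped) >= observed - 1e-12:
            count += 1
    return mean(diffs), stdev(diffs), count / total


# HYPOTHETICAL per-subject mean errors in mm, invented for illustration.
one_stage = [1.9, 1.6, 1.8, 1.7, 2.0, 1.5, 1.7, 1.8]
two_stage = [1.4, 1.3, 1.5, 1.2, 1.6, 1.3, 1.4, 1.3]

mean_diff, sd_diff, p_value = paired_sign_flip_test(one_stage, two_stage)
print(f"mean difference {mean_diff:.2f} mm, SD {sd_diff:.2f} mm, p = {p_value:.4f}")
```

Exhaustive enumeration costs 2^n evaluations, so for all 27 subjects a Monte Carlo sample of sign flips would be the practical variant.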

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. The points raised about statistical support for the comparability claim and details on the experimental protocol are valid, and we will revise the manuscript to address them directly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 1.72 mm is 'comparable' to 1.31 mm and 1.42 mm is load-bearing yet unsupported by any variance measure, confidence interval, per-subject breakdown, or statistical test; a 0.41 mm absolute difference on this scale cannot be evaluated without these controls or confirmation that identical held-out test cases were used for all three methods.

    Authors: We agree the abstract's 'comparable' claim lacks supporting statistics and that a 0.41 mm difference requires variance measures or tests to evaluate properly. The full manuscript evaluates all methods on the same 27-subject RV dataset using PC-to-PC error, but does not report per-experiment variance or tests in the abstract. We will revise to include standard deviations, per-subject error breakdowns, and a statistical comparison (e.g., Wilcoxon test) on the available errors. We will also explicitly confirm that the two-stage baselines used the identical held-out cases from the same experiments. revision: yes

  2. Referee: [Abstract] Abstract / Results: no information is supplied on the train-test split, cross-validation scheme, or whether the 609 experiments constitute independent held-out cases; without this, the reported mean error cannot be interpreted as a reliable estimate of generalization performance.

    Authors: The abstract omits the validation protocol details. The manuscript uses data from 27 subjects (609 experiments total) with a subject-wise cross-validation scheme to ensure held-out evaluation and avoid leakage across subjects. We will revise the abstract and add a methods/results subsection explicitly describing the split (e.g., leave-one-subject-out) and confirming the 1.72 mm mean is computed on independent held-out cases. revision: yes
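
A subject-wise split of the kind described in the response can be sketched as follows. This is a minimal illustration of leave-one-subject-out evaluation, not the paper's confirmed protocol; the subject IDs, frame counts, and the `(subject_id, frame_id)` layout are hypothetical stand-ins for the image and point-cloud pairs.

```python
from collections import defaultdict


def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) with no subject in both sets.

    `samples` is a list of (subject_id, frame_id) pairs.
    """
    by_subject = defaultdict(list)
    for sample in samples:
        by_subject[sample[0]].append(sample)
    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        train = [
            s
            for subject, group in by_subject.items()
            if subject != held_out
            for s in group
        ]
        yield held_out, train, test


# HYPOTHETICAL layout: 3 subjects with differing numbers of time frames.
frames_per_subject = {"S01": 3, "S02": 2, "S03": 4}
samples = [
    (subject, frame)
    for subject, n_frames in frames_per_subject.items()
    for frame in range(1, n_frames + 1)
]

folds = list(leave_one_subject_out(samples))
for held_out, train, test in folds:
    train_subjects = {s[0] for s in train}
    assert held_out not in train_subjects  # no leakage across subjects
    print(held_out, len(train), len(test))
```

Splitting by subject rather than by frame matters because frames from one subject are highly correlated, so a frame-level split would leak anatomy from the test set into training.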

Circularity Check

0 steps flagged

No circularity; empirical training outcome independent of inputs

full rationale

The paper presents a standard end-to-end CNN (PointOutNet) trained with Chamfer distance on a held-out RV dataset to produce 3D point clouds from 2D images. The reported 1.72 mm PC-to-PC error is a measured validation result, not a quantity obtained by fitting parameters inside the paper's own equations and then renaming the fit as a prediction. No self-definitional steps, no uniqueness theorems imported from the authors' prior work, and no ansatz smuggled via self-citation appear in the derivation. The comparison to prior PLSR/KPLSR numbers is a post-hoc numerical statement rather than a load-bearing reduction of the new method to the old ones. The derivation chain is therefore self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on the empirical performance of a trained neural network on a specific RV imaging dataset; network weights are learned from data and no additional free parameters, axioms, or invented physical entities are introduced beyond standard deep-learning practice.

invented entities (1)
  • PointOutNet (no independent evidence)
    purpose: Direct one-stage prediction of 3D point cloud from single 2D image
    New network architecture proposed for the task.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 1 internal anchor

  1. [1]

    Choy, C.B., Xu, D., Gwak, J., Chen, K., Savarese, S.: 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. In: European Conference on Computer Vision. pp. 628–644. Springer (2016)

  2. [2]

    Cool, D., Downey, D., Izawa, J., Chin, J., Fenster, A.: 3D prostate model formation from non-parallel 2D ultrasound biopsy images. Medical Image Analysis 10(6), 875–887 (2006)

  3. [3]

    Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 605–613 (2017)

  4. [4]

    Kulon, D., Wang, H., Güler, R.A., Bronstein, M., Zafeiriou, S.: Single image 3D hand reconstruction with mesh convolutions. arXiv preprint arXiv:1905.01326 (2019)

  5. [5]

    Lee, S.L., Chung, A., Lerotic, M., Hawkins, M.A., Tait, D., Yang, G.Z.: Dynamic shape instantiation for intra-operative guidance. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 69–76. Springer (2010)

  6. [6]

    Mandikal, P., Murthy, N., Agarwal, M., Babu, R.V.: 3D-LMNet: Latent embedding matching for accurate and diverse 3D point cloud reconstruction from a single image. arXiv preprint arXiv:1807.07796 (2018)

  7. [7]

    Manu: nonrigidicp. https://uk.mathworks.com/matlabcentral/fileexchange/41396-nonrigidicp (2016), accessed: 2019-04-02

  8. [8]

    Toth, D., Pfister, M., Maier, A., Kowarschik, M., Hornegger, J.: Adaption of 3D models to 2D x-ray images during endovascular abdominal aneurysm repair. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 339–346. Springer (2015)

  9. [9]

    Zheng, J.Q., Zhou, X.Y., Riga, C., Yang, G.Z.: 3D path planning from a single 2D fluoroscopic image for robot assisted fenestrated endovascular aortic repair. arXiv preprint arXiv:1809.05955 (2018)

  10. [10]

    Zheng, J.Q., Zhou, X.Y., Yang, G.Z.: Real-time 3D shape instantiation for partially-deployed stent segment from a single 2D fluoroscopic image in robot-assisted fenestrated endovascular aortic repair. arXiv preprint arXiv:1902.11089 (2019)

  11. [11]

    Zhou, X., Yang, G., Riga, C., Lee, S.: Stent graft shape instantiation for fenestrated endovascular aortic repair. In: The Hamlyn Symposium on Medical Robotics. pp. 78–79 (2016)

  12. [12]

    Zhou, X.Y., Lin, J., Riga, C., Yang, G.Z., Lee, S.L.: Real-time 3-D shape instantiation from single fluoroscopy projection for fenestrated stent graft deployment. IEEE Robotics and Automation Letters 3(2), 1314–1321 (2018)

  13. [13]

    Zhou, X.Y., Yang, G.Z., Lee, S.L.: A real-time and registration-free framework for dynamic shape instantiation. Medical Image Analysis 44, 86–97 (2018)