arxiv: 1906.11478 · v1 · pith:PEDOZEK2new · submitted 2019-06-27 · 💻 cs.CV · cs.GR

A Convolutional Decoder for Point Clouds using Adaptive Instance Normalization

Pith reviewed 2026-05-25 15:00 UTC · model grok-4.3

classification 💻 cs.CV cs.GR
keywords point clouds · convolutional decoder · adaptive instance normalization · Chamfer distance · 3D shape generation · auto-encoding · point cloud upsampling

The pith

A convolutional decoder using adaptive instance normalization generates higher quality point clouds than prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a convolutional decoder for point clouds that carries Adaptive Instance Normalization (AdaIN) over from image synthesis to improve generation quality and training. It also proposes extensions to the Chamfer distance loss and highlights the role of careful sampling for both input shapes and generated outputs. These elements are tested in an auto-encoding framework with ablation studies, and the paper reports better performance than existing approaches, with applications to upsampling, single-view reconstruction, and shape synthesis. A reader would care because neural 3D shape synthesis has historically trailed the quality of 2D image methods, restricting practical uses in graphics and vision.

Core claim

The central claim is that a convolutional decoder for point clouds that incorporates Adaptive Instance Normalization, combined with extensions to Chamfer distance minimization and deliberate sampling strategies, produces higher quality results than current state-of-the-art methods in auto-encoding setups, as confirmed by extensive ablations and experiments on upsampling, single view reconstruction, and shape synthesis.

What carries the argument

Adaptive Instance Normalization layers inside the convolutional point cloud decoder, which adaptively adjust feature statistics from the input to guide generation.
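
To make the mechanism concrete, here is a minimal sketch of a single AdaIN step in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `adain`, the flattened per-channel layout, and the `eps` constant are hypothetical, and the scale and shift values (called s and t in the Figure 2 caption) are passed in directly rather than produced by a network.

```python
import math


def adain(channels, scales, shifts, eps=1e-5):
    """Normalize each channel, then apply a supplied scale and shift.

    channels: list of C lists, each holding the flattened activations
              of one feature channel (layout is an assumption).
    scales, shifts: one value per channel (s and t).
    """
    out = []
    for values, s, t in zip(channels, scales, shifts):
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        # Instance normalization followed by the adaptive affine transform.
        out.append([s * (v - mean) * inv_std + t for v in values])
    return out


# Toy feature map with two channels of four activations each.
features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 14.0, 14.0]]
styled = adain(features, scales=[2.0, 1.0], shifts=[0.5, -1.0])
```

Because every channel is first normalized to zero mean and unit variance, the statistics of the output are set by the supplied scale and shift rather than by the incoming activations.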

If this is right

  • The decoder achieves better results in point cloud upsampling than prior methods.
  • It improves accuracy in single view reconstruction tasks.
  • It supports higher quality shape synthesis from latent representations.
  • Each added component, including normalization and sampling, contributes measurably to the gains shown in ablations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The sampling emphasis suggests that input and output point distributions may be underappreciated factors in many other point-based generative models.
  • If the normalization technique works here, it could be tested on related unordered structures such as graphs or sparse voxel grids.
  • Pairing this decoder with encoders trained on different modalities might enable cross-domain shape transfer without retraining the full pipeline.

Load-bearing premise

That Adaptive Instance Normalization transfers effectively from image domains to unordered point cloud data and that the proposed Chamfer distance extensions plus sampling choices do not introduce hidden biases that inflate reported gains.

What would settle it

A side-by-side test on the same dataset and metrics where the new decoder shows no improvement in Chamfer distance or visual fidelity over a baseline convolutional decoder that omits Adaptive Instance Normalization and the proposed loss extensions.
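
For reference, a comparison of this kind would typically rely on the standard symmetric Chamfer distance, sketched below in plain Python. This is the common textbook form (the mean of squared nearest-neighbour distances, summed over both directions), not the paper's extended loss, which this page does not spell out. The function names are illustrative, and conventions such as squared versus unsquared distances differ between papers.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def one_sided(src, dst):
    """Mean squared distance from each point in src to its nearest point in dst."""
    return sum(min(squared_distance(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: the sum of both one-sided terms."""
    return one_sided(a, b) + one_sided(b, a)


# Two tiny point sets; they may have different sizes in general.
cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
distance = chamfer_distance(cloud_a, cloud_b)
```

The brute-force search is quadratic in the number of points, which is fine for a sketch but would normally be replaced by a spatial index or a batched GPU implementation.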

Figures

Figures reproduced from arXiv: 1906.11478 by Isaak Lim, Leif Kobbelt, Moritz Ibing.

Figure 1. We show decoding results (blue) for an input shape (red) from the test set. Our convolutional autoencoder with Adaptive Instance Normalization was trained to output 2500 points for inputs with 2500 points. We also visualize outputs from our decoder with a much higher (15000) or lower (500) number of points than the number used during training. Note that with 15000 points we are able to robustly and densely…
Figure 2. Overview over our convolutional decoder: Given is some latent vector z produced by an encoder. Passing it through a multi-layer perceptron (MLP) produces w, which consists of a series of scaling and translation parameters [(s1, t1), ..., (sl, tl)]. P is a learned constant parameter block (in our case it has dimension 512×2×2×2) used to kickstart the convolutional decoding process. The B blocks each contain an…
Figure 3. Our convolutional encoder follows a similar method to Rethage et al. [RWS∗18]. We embed the input point cloud (red) into a volumetric grid. For each cell we pass all points within a certain radius from the cell center into a small PointNet. This results in a grid where each cell encodes local point cloud information via a 32-dimensional feature vector. This is visualized as a multi-colored grid. The grid…
Figure 4. Qualitative results for different autoencoder models. From left to right: ground truth, our results, [GFK∗18], [LCHL18]. Note that our method produces less spurious points and reproduces sharper surface details.
© 2019 The Author(s). Computer Graphics Forum © 2019 The Eurographics Association and John Wiley & Sons Ltd.
Figure 5. We show some qualitative results for single view reconstruction. The input images are shown on the left. Reconstruction results are visualized in blue. The ground truth is rendered in green.
…architecture were not tuned particularly for these demonstrations. We expect that with more carefully chosen settings, better results could be achieved. Single View Reconstruction: For single view reconstruction (see [P…
Figure 6. Qualitative results for our point cloud upsampling. Severely under-sampled input point clouds (50 points) are shown in red. The network predictions and ground truth point clouds are shown in blue and green respectively (16000 points).
Point Cloud Upsampling: As our network architecture is indifferent to the number of input or output points, it is straightforward to use our model for the task of point cloud…
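
The grid embedding described in the Figure 3 caption can be sketched as follows. This is a hedged illustration in plain Python: the helper names `cell_centers` and `group_points`, the cube bounds of [-1, 1], and the brute-force neighbour search are assumptions, and the small PointNet that would turn each group into a 32-dimensional feature vector is omitted.

```python
import math


def cell_centers(resolution, lo=-1.0, hi=1.0):
    """Centers of a resolution^3 grid covering the cube [lo, hi]^3."""
    step = (hi - lo) / resolution
    axis = [lo + (i + 0.5) * step for i in range(resolution)]
    return [(x, y, z) for x in axis for y in axis for z in axis]


def group_points(points, centers, radius):
    """For each cell center, collect all points within `radius` of it.

    A point can land in several groups when the radius exceeds half the
    cell size, so neighbouring cells share local context.
    """
    return [[p for p in points if math.dist(p, c) <= radius] for c in centers]


# Toy input: three points embedded into a 2x2x2 grid.
points = [(0.5, 0.5, 0.5), (-0.5, -0.5, -0.5), (0.0, 0.0, 0.0)]
centers = cell_centers(2)
groups = group_points(points, centers, radius=0.9)
```

In this toy case the point at the origin falls inside every cell's radius, while the two corner points are each seen by only one cell.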
Abstract

Automatic synthesis of high quality 3D shapes is an ongoing and challenging area of research. While several data-driven methods have been proposed that make use of neural networks to generate 3D shapes, none of them reach the level of quality that deep learning synthesis approaches for images provide. In this work we present a method for a convolutional point cloud decoder/generator that makes use of recent advances in the domain of image synthesis. Namely, we use Adaptive Instance Normalization and offer an intuition on why it can improve training. Furthermore, we propose extensions to the minimization of the commonly used Chamfer distance for auto-encoding point clouds. In addition, we show that careful sampling is important both for the input geometry and in our point cloud generation process to improve results. The results are evaluated in an auto-encoding setup to offer both qualitative and quantitative analysis. The proposed decoder is validated by an extensive ablation study and is able to outperform current state of the art results in a number of experiments. We show the applicability of our method in the fields of point cloud upsampling, single view reconstruction, and shape synthesis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a convolutional decoder for point clouds that incorporates Adaptive Instance Normalization (AdaIN) from image synthesis domains, along with extensions to Chamfer distance minimization and careful sampling of both input geometry and generated points. It evaluates the approach in an auto-encoding setup with qualitative and quantitative analysis, validates it via an extensive ablation study, claims outperformance over state-of-the-art methods in several experiments, and demonstrates applicability to point cloud upsampling, single-view reconstruction, and shape synthesis.

Significance. If the outperformance claims are substantiated with rigorous quantitative evidence and the AdaIN transfer plus Chamfer/sampling modifications are shown to avoid hidden biases, the work would represent a meaningful advance in bridging image-domain synthesis techniques to unordered 3D point cloud generation, potentially improving output quality across multiple 3D tasks.

major comments (2)
  1. Abstract and Experiments: The central claim that the decoder 'is able to outperform current state of the art results in a number of experiments' and is 'validated by an extensive ablation study' provides no quantitative metrics, error bars, dataset specifics, or baseline comparisons, which is load-bearing for assessing whether genuine gains are achieved rather than artifacts of sampling or loss modifications.
  2. Method (convolutional decoder and AdaIN application): No details are given on how unordered point clouds are prepared for convolutional operations (e.g., ordering, rasterization, or fixed-cardinality handling) prior to applying AdaIN, which is critical because the effectiveness of per-channel normalization and fairness of any modified Chamfer loss both depend on these choices; without this, the ablation study cannot isolate the decoder contribution.
minor comments (1)
  1. The abstract would be strengthened by including at least one key quantitative result (e.g., a Chamfer distance value or percentage improvement) to support the outperformance claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the manuscript to improve clarity and better substantiate the claims.

  1. Referee: Abstract and Experiments: The central claim that the decoder 'is able to outperform current state of the art results in a number of experiments' and is 'validated by an extensive ablation study' provides no quantitative metrics, error bars, dataset specifics, or baseline comparisons, which is load-bearing for assessing whether genuine gains are achieved rather than artifacts of sampling or loss modifications.

    Authors: The abstract is intended as a concise summary of contributions. The full manuscript provides quantitative results in the Experiments section, including Chamfer distance metrics on datasets such as ShapeNet, direct comparisons to prior methods, and an extensive ablation study. To strengthen the abstract's claims, we will incorporate key quantitative highlights, dataset details, and baseline references in the revised version. revision: yes

  2. Referee: Method (convolutional decoder and AdaIN application): No details are given on how unordered point clouds are prepared for convolutional operations (e.g., ordering, rasterization, or fixed-cardinality handling) prior to applying AdaIN, which is critical because the effectiveness of per-channel normalization and fairness of any modified Chamfer loss both depend on these choices; without this, the ablation study cannot isolate the decoder contribution.

    Authors: We agree that explicit details on preparing unordered point clouds for convolution are necessary to interpret the AdaIN application and ablation results. The manuscript already specifies a fixed point cardinality and careful sampling of input and output points. We will expand the Method section in revision to detail the exact preparation steps, including any ordering or fixed-size handling prior to the convolutional decoder and AdaIN layers. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical method validated externally

The paper introduces a convolutional decoder architecture incorporating Adaptive Instance Normalization, Chamfer distance extensions, and sampling choices for point cloud auto-encoding, upsampling, and synthesis. These components are defined explicitly as design choices and evaluated via ablation studies plus quantitative comparisons against independent prior SOTA methods. No load-bearing step reduces by the paper's own equations to a fitted parameter or self-citation chain; the central outperformance claim rests on external benchmarks rather than internal redefinition or renaming. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard assumptions from image synthesis transferring to point clouds, plus empirical validation via ablation; no new entities postulated.

free parameters (1)
  • network hyperparameters and sampling parameters
    Multiple architecture and training choices tuned on data to achieve reported gains.
axioms (1)
  • domain assumption: Adaptive Instance Normalization improves training stability and quality when applied to point cloud features
    Paper offers intuition but relies on prior image-domain success without independent proof for point clouds.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

  3. [3]

    Achlioptas P., Diamanti O., Mitliagkas I., Guibas L. J.: Learning representations and generative models for 3d point clouds. International Conference on Machine Learning (2018)

  4. [4]

    Atzmon M., Maron H., Lipman Y.: Point convolutional neural networks by extension operators. ACM Transactions on Graphics 37 (03 2018)

  5. [5]

    Chang A. X., Funkhouser T., Guibas L., Hanrahan P., Huang Q., Li Z., Savarese S., Savva M., Song S., Su H., et al.: Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015)

  6. [6]

    Clevert D.-A., Unterthiner T., Hochreiter S.: Fast and accurate deep network learning by exponential linear units (elus). International Conference on Learning Representations (2016)

  7. [7]

    Choy C. B., Xu D., Gwak J., Chen K., Savarese S.: 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. European Conference on Computer Vision (2016), 628–644

  8. [8]

    Chen Z., Zhang H.: Learning implicit fields for generative shape modeling. IEEE Conf. on Computer Vision and Pattern Recognition (2019)

  9. [9]

    Dumoulin V., Perez E., Schucher N., Strub F., de Vries H., Courville A., Bengio Y.: Feature-wise transformations. Distill 3, 7 (2018), e11

  10. [10]

    Dumoulin V., Shlens J., Kudlur M.: A learned representation for artistic style. International Conference on Learning Representations (2017)

  11. [11]

    Fey M., Lenssen J. E., Weichert F., Müller H.: Splinecnn: Fast geometric deep learning with continuous b-spline kernels. IEEE Conf. on Computer Vision and Pattern Recognition (2018), 869–877

  12. [12]

    Fan H., Su H., Guibas L. J.: A point set generation network for 3d object reconstruction from a single image. IEEE Conf. on Computer Vision and Pattern Recognition (2017), 2463–2471

  13. [13]

    Groueix T., Fisher M., Kim V. G., Russell B., Aubry M.: AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation. IEEE Conf. on Computer Vision and Pattern Recognition (2018)

  14. [14]

    Ghiasi G., Lee H., Kudlur M., Dumoulin V., Shlens J.: Exploring the structure of a real-time, arbitrary neural artistic stylization network. British Machine Vision Conference (2017)

  15. [15]

    Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y.: Generative adversarial nets. Advances in Neural Information Processing Systems (2014), 2672–2680

  16. [16]

    Gadelha M., Wang R., Maji S.: Multiresolution tree networks for 3d point cloud processing. European Conference on Computer Vision (2018)

  17. [17]

    Huang X., Belongie S.: Arbitrary style transfer in real-time with adaptive instance normalization. IEEE International Conference on Computer Vision (2017), 1501–1510

  18. [18]

    Han X., Li Z., Huang H., Kalogerakis E., Yu Y.: High-resolution shape completion using deep neural networks for global structure and local geometry inference. IEEE Conf. on Computer Vision and Pattern Recognition (2017), 85–93

  19. [19]

    Ioffe S., Szegedy C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning 37 (2015), 448–456

  20. [20]

    Karras T., Laine S., Aila T.: A style-based generator architecture for generative adversarial networks. IEEE Conf. on Computer Vision and Pattern Recognition (2019)

  21. [21]

    Kingma D. P., Welling M.: Auto-encoding variational bayes. International Conference on Learning Representations (2014)

  22. [22]

    Li Y., Bu R., Sun M., Wu W., Di X., Chen B.: Pointcnn: Convolution on x-transformed points. Advances in Neural Information Processing Systems (2018), 828–838

  23. [23]

    Li J., Chen B. M., Lee G. H.: So-net: Self-organizing network for point cloud analysis. IEEE Conf. on Computer Vision and Pattern Recognition (2018), 9397–9406

  24. [24]

    Lin C.-H., Kong C., Lucey S.: Learning efficient point cloud generation for dense 3d object reconstruction. Thirty-Second AAAI Conference on Artificial Intelligence (2018)

  25. [25]

    Lloyd S.: Least squares quantization in pcm. IEEE Transactions on Information Theory 28, 2 (1982), 129–137

  26. [26]

    Li J., Xu K., Chaudhuri S., Yumer E., Zhang H., Guibas L.: Grass: Generative recursive autoencoders for shape structures. ACM Transactions on Graphics 36, 4 (2017), 52

  27. [27]

    Mescheder L., Oechsle M., Niemeyer M., Nowozin S., Geiger A.: Occupancy networks: Learning 3d reconstruction in function space. IEEE Conf. on Computer Vision and Pattern Recognition (2019)

  28. [28]

    Nash C., Williams C. K.: The shape variational autoencoder: A deep generative model of part-segmented 3d objects. Computer Graphics Forum 36, 5 (2017), 1–12

  29. [29]

    Park J. J., Florence P., Straub J., Newcombe R., Lovegrove S.: Deepsdf: Learning continuous signed distance functions for shape representation. IEEE Conf. on Computer Vision and Pattern Recognition (2019)

  30. [30]

    Qi C. R., Su H., Mo K., Guibas L. J.: Pointnet: Deep learning on point sets for 3d classification and segmentation. IEEE Conf. on Computer Vision and Pattern Recognition 1, 2 (2017), 4

  31. [31]

    Qi C. R., Yi L., Su H., Guibas L. J.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems (2017), 5099–5108

  32. [32]

    Reddi S. J., Kale S., Kumar S.: On the convergence of adam and beyond. International Conference on Learning Representations (2018)

  33. [33]

    Rethage D., Wald J., Sturm J., Navab N., Tombari F.: Fully-convolutional point networks for large-scale point clouds. European Conference on Computer Vision (2018)

  34. [34]

    Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R.: Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 (2014), 1929–1958

  35. [35]

    Sinha A., Unmesh A., Huang Q., Ramani K.: Surfnet: Generating 3d shape surfaces using deep residual networks. IEEE Conf. on Computer Vision and Pattern Recognition (2017), 6040–6049

  36. [36]

    Simonyan K., Zisserman A.: Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (2015)

  37. [37]

    Ulyanov D., Vedaldi A., Lempitsky V.: Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022 (2016)

  38. [38]

    Wang Y., Sun Y., Liu Z., Sarma S. E., Bronstein M. M., Solomon J. M.: Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (2019)

  39. [39]

    Wang P.-S., Sun C.-Y., Liu Y., Tong X.: Adaptive O-CNN: A Patch-based Deep Representation of 3D Shapes. ACM Transactions on Graphics 37, 6 (2018)

  40. [40]

    Wu J., Zhang C., Xue T., Freeman B., Tenenbaum J.: Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. Advances in Neural Information Processing Systems (2016), 82–90

  41. [41]

    Yin K., Huang H., Cohen-Or D., Zhang H.: P2p-net: Bidirectional point displacement net for shape transform. ACM Transactions on Graphics 37, 4 (2018), 152:1–152:13

  42. [42]

    Yu L., Li X., Fu C.-W., Cohen-Or D., Heng P.-A.: Pu-net: Point cloud upsampling network. IEEE Conf. on Computer Vision and Pattern Recognition (2018), 2790–2799

  43. [43]

    Yifan W., Wu S., Huang H., Cohen-Or D., Sorkine-Hornung O.: Patch-based progressive 3d point set upsampling. IEEE Conf. on Computer Vision and Pattern Recognition (2019)