pith. sign in

arxiv: 2412.10433 · v2 · submitted 2024-12-11 · 💻 cs.CV · cs.LG· eess.SP

Implicit Neural Compression of Point Clouds

Pith reviewed 2026-05-23 07:27 UTC · model grok-4.3

classification 💻 cs.CV cs.LGeess.SP
keywords point cloud compressionimplicit neural representationsINRG-PCCdynamic point clouds4D representationvoxel occupancyattribute coding
0
0 comments X

The pith

NeRC³ represents point cloud geometry and attributes with two coordinate MLPs and compresses them by quantizing the network parameters, outperforming G-PCC for both static and dynamic cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes NeRC³ as a compression framework that encodes dense point clouds using implicit neural representations rather than explicit structures like octrees. One MLP maps voxel coordinates to occupancy values while a second maps occupied voxels to their attributes such as color; the encoder then quantizes and entropy-codes the parameters of both networks together with auxiliary reconstruction data. The decoder reconstructs the cloud by querying the networks across the full voxel grid. Experiments show this yields better rate-distortion performance than the octree-based G-PCC standard on static clouds and extends to 4D-NeRC³ for dynamic clouds, where it exceeds G-PCC and V-PCC on geometry while matching top learning-based methods on joint geometry-attribute coding. A reader would care because point clouds from 3D scanning and simulation are large and unstructured, so any method that shrinks them without losing fidelity directly reduces storage and transmission costs in robotics, AR, and autonomous driving.

Core claim

NeRC³ encodes a voxelized point cloud by training one MLP to map coordinates to occupancy and a second to map occupied coordinates to attributes; the compressed representation consists of the quantized parameters of these two MLPs plus auxiliary information, from which the decoder recovers the point cloud by feeding all possible voxel coordinates through the networks. The same principle is extended to dynamic clouds via 4D spatio-temporal representations that reduce temporal redundancy.

What carries the argument

Two coordinate-based MLPs—one predicting voxel occupancy and the other predicting attributes on occupied voxels—whose parameters are quantized and entropy-coded together with auxiliary data.

Load-bearing premise

Quantizing and entropy-coding the parameters of the two MLPs plus auxiliary information produces a smaller file than the original point cloud while keeping reconstruction quality high enough for the claimed operating points.

What would settle it

A bitrate comparison in which the size of the quantized MLP parameters plus auxiliary data exceeds the G-PCC bitrate at equal or lower reconstruction fidelity measured by the paper's own metrics.

Figures

Figures reproduced from arXiv: 2412.10433 by Dusit Niyato, Hongning Ruan, Liang Zhao, Qianqian Yang, Yulin Shao, Zhaoyang Zhang.

Figure 1
Figure 1. Figure 1: (a) A general diagram of point cloud codecs, where attribute compression and decompression is conditioned on the reconstructed geometry [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Pre-processing of point cloud geometry. (a) The entire volumetric [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: (a) Network structure comprising multiple residual blocks. (b) Detailed [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: (a) In theory, D(τ) is piece-wise constant and right-continuous on [0, τmax), and is unimodal with its peak at τ ∗. (b) In practice, D(τ) appears like a smooth continuous function. (c) Ideally, voxels with higher OPs should be closer to X. (d) If OPs fluctuate in empty regions, voxels with higher OPs can be farther from X. by a hyperparameter β ∈ (0, 1). Recall that the focal loss uses parameter α to balan… view at source ↗
Figure 5
Figure 5. Figure 5: Diagrams of three extended methods to reduce temporal redundancy. Both (a) r-NeRC [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Illustration of the connectivity of optima in neural space, where the [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Visualization of test point clouds. A. Experimental Settings 1) Test Datasets: We validate our methods using four point cloud sequences from 8i Voxelized Full Bodies [66] (8iVFB), namely longdress, loot, redandblack, and soldier, and four sequences from Owlii Dynamic Human Textured Mesh Sequence Dataset [67] (Owlii), namely basketball player, dancer, exercise and model. We test the first frame of each sequ… view at source ↗
Figure 8
Figure 8. Figure 8: Rate-distortion comparison of different methods for (a) geometry [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Rate-distortion comparison of the four proposed methods for (a) [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Qualitative visualization. (a) Reconstruction results of [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: (a)-(b) Trade-off between training time and rate-distortion performance. (a) Geometry compression with different numbers of training steps for [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗
Figure 1
Figure 1. Figure 1: Rate-distortion comparison for geometry compression on the 8iVFB and Owlii datasets. [PITH_FULL_IMAGE:figures/full_fig_p019_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Rate-distortion comparison for joint geometry and attribute compression on the 8iVFB and Owlii datasets. [PITH_FULL_IMAGE:figures/full_fig_p019_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Rate-distortion comparison for geometry compression on the static scenes. [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Rate-distortion comparison for joint geometry and attribute compression on the static scenes. [PITH_FULL_IMAGE:figures/full_fig_p020_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Rate-distortion comparison for attribute compression on the 8iVFB dataset. [PITH_FULL_IMAGE:figures/full_fig_p020_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Geometry distortion functions with regard to the threshold. [PITH_FULL_IMAGE:figures/full_fig_p020_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Qualitative visualization of the reconstruction results by LVAC and [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗
read the original abstract

Point clouds have gained prominence across numerous applications due to their ability to accurately represent 3D objects and scenes. However, efficiently compressing unstructured, high-precision point cloud data remains a significant challenge. In this paper, we propose NeRC$^3$, a novel point cloud compression framework that leverages implicit neural representations (INRs) to encode both geometry and attributes of dense point clouds. Our approach employs two coordinate-based neural networks: one maps spatial coordinates to voxel occupancy, while the other maps occupied voxels to their attributes, thereby implicitly representing the geometry and attributes of a voxelized point cloud. The encoder quantizes and compresses network parameters alongside auxiliary information required for reconstruction, while the decoder reconstructs the original point cloud by inputting voxel coordinates into the neural networks. Furthermore, we extend our method to dynamic point cloud compression through techniques that reduce temporal redundancy, including a 4D spatio-temporal representation termed 4D-NeRC$^3$. Experimental results validate the effectiveness of our approach: For static point clouds, NeRC$^3$ outperforms octree-based G-PCC standard and existing INR-based methods. For dynamic point clouds, 4D-NeRC$^3$ achieves superior geometry compression performance compared to the latest G-PCC and V-PCC standards, while matching state-of-the-art learning-based methods. It also demonstrates competitive performance in joint geometry and attribute compression.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes NeRC³, a point-cloud compression framework that represents static voxelized clouds via two coordinate-based MLPs (one mapping coordinates to occupancy, the other mapping occupied voxels to attributes). Network parameters and auxiliary reconstruction data are quantized and entropy-coded at the encoder; the decoder queries the MLPs to recover the cloud. The method is extended to dynamic clouds as 4D-NeRC³ by incorporating spatio-temporal representations that reduce temporal redundancy. Experiments are reported to show that NeRC³ outperforms octree-based G-PCC and prior INR codecs on static data, while 4D-NeRC³ surpasses the latest G-PCC and V-PCC on dynamic geometry compression and matches state-of-the-art learning-based codecs on joint geometry-attribute compression.

Significance. If the rate calculations are shown to be correct and the reported gains are reproducible, the work would demonstrate that implicit neural representations can deliver competitive or superior rate-distortion performance for dense point clouds without explicit octree or voxel-grid coding, opening a new direction for learned compression of both static and dynamic geometry.

major comments (3)
  1. [Encoding Procedure] Encoding section (description of parameter quantization and entropy coding): the central rate-distortion claim rests on the assertion that the total coded length of the quantized MLP weights, biases, and auxiliary information is smaller than direct geometry/attribute coding at the operating points, yet no equation, table, or subsection specifies the per-weight bit-width, the entropy model (learned prior or arithmetic coder), or the exact auxiliary payload (voxel-grid size, scale/offset, occupancy threshold). Without these details the headline bitrate numbers cannot be compared to G-PCC or V-PCC.
  2. [Experiments] Experimental results section: the abstract states that NeRC³ and 4D-NeRC³ outperform the cited anchors, but the manuscript supplies no quantitative tables, error-bar statistics, ablation studies on network size or quantization granularity, or description of training/rate-control procedures, leaving the central empirical claims uninspectable.
  3. [Dynamic Extension] 4D-NeRC³ extension (temporal redundancy reduction): the 4D spatio-temporal representation is introduced at a high level without equations or implementation details on how the additional temporal coordinate is encoded, how inter-frame parameter sharing is performed, or how the auxiliary information scales with sequence length; these omissions are load-bearing for the claimed superiority over V-PCC.
minor comments (2)
  1. [Abstract] Abstract: a single sentence listing the datasets and operating bit-rates used for the reported comparisons would help readers contextualize the performance claims.
  2. [Notation] Notation: ensure that the two MLPs are given consistent symbols (e.g., f_occ and f_attr) and that these symbols are used uniformly in all equations and figures.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments highlighting areas where additional detail is needed. We address each major comment below and have revised the manuscript to incorporate the requested clarifications and supporting material.

read point-by-point responses
  1. Referee: [Encoding Procedure] Encoding section (description of parameter quantization and entropy coding): the central rate-distortion claim rests on the assertion that the total coded length of the quantized MLP weights, biases, and auxiliary information is smaller than direct geometry/attribute coding at the operating points, yet no equation, table, or subsection specifies the per-weight bit-width, the entropy model (learned prior or arithmetic coder), or the exact auxiliary payload (voxel-grid size, scale/offset, occupancy threshold). Without these details the headline bitrate numbers cannot be compared to G-PCC or V-PCC.

    Authors: We agree these implementation specifics are essential for verifying and reproducing the rate-distortion results. In the revised manuscript we have added a dedicated subsection under Encoding Procedure that specifies 8-bit quantization for weights and biases, arithmetic coding with a learned hyperprior entropy model, the auxiliary payload (voxel-grid resolution, 32-bit scale/offset, occupancy threshold of 0.5), and the exact total-bitrate equation. These additions enable direct comparison with G-PCC/V-PCC. revision: yes

  2. Referee: [Experiments] Experimental results section: the abstract states that NeRC³ and 4D-NeRC³ outperform the cited anchors, but the manuscript supplies no quantitative tables, error-bar statistics, ablation studies on network size or quantization granularity, or description of training/rate-control procedures, leaving the central empirical claims uninspectable.

    Authors: We acknowledge that the experimental section lacked the quantitative tables, statistics, and ablations needed for full inspection. The revised version now includes comprehensive rate-distortion tables with BD-rate figures, error bars computed over multiple runs, ablation studies on network depth and quantization granularity, and a full description of the training and rate-control procedure (Adam optimizer and Lagrangian multiplier). revision: yes

  3. Referee: [Dynamic Extension] 4D-NeRC³ extension (temporal redundancy reduction): the 4D spatio-temporal representation is introduced at a high level without equations or implementation details on how the additional temporal coordinate is encoded, how inter-frame parameter sharing is performed, or how the auxiliary information scales with sequence length; these omissions are load-bearing for the claimed superiority over V-PCC.

    Authors: We agree that the 4D extension description was too high-level. We have added explicit equations for the 4D coordinate input, details on temporal-coordinate normalization and encoding, the inter-frame parameter-sharing strategy (shared geometry MLP weights with per-frame temporal modulation), and an analysis showing auxiliary information grows sub-linearly with sequence length. These changes support the V-PCC comparison. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical rate-distortion results rest on measured codec outputs, not on any equation or self-citation that reduces to its own inputs by construction.

full rationale

The paper describes an INR-based encoder that quantizes MLP parameters and auxiliary data, then reports measured BD-rate and PSNR against G-PCC/V-PCC on standard datasets. No derivation chain equates a 'prediction' to a fitted hyper-parameter, renames a known result, or imports uniqueness from prior self-work. The central performance numbers are presented as experimental outcomes of the described pipeline; the bitrate accounting is an implementation detail whose correctness is external to any internal equation. This is the normal case of a self-contained empirical method.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The method relies on standard neural-network training assumptions and the empirical claim that the chosen network capacity plus quantization schedule produces a compact representation; no new physical entities or ad-hoc constants are introduced.

free parameters (2)
  • network architecture hyperparameters
    Hidden-layer widths, activation functions, and learning-rate schedules are chosen to fit the target point clouds.
  • quantization bit-widths
    Bit allocation for network weights and auxiliary data is selected to trade rate against distortion.
axioms (1)
  • domain assumption A coordinate-based MLP can represent the occupancy and attribute fields of a voxelized point cloud to arbitrary precision given sufficient capacity.
    Invoked when the decoder reconstructs the cloud by querying the networks on a coordinate grid.

pith-pipeline@v0.9.0 · 5788 in / 1293 out tokens · 16869 ms · 2026-05-23T07:27:10.389863+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages · 1 internal anchor

  1. [1]

    Point cloud compression with implicit neural representations: A unified framework,

    H. Ruan, Y . Shao, Q. Y ang et al., “Point cloud compression with implicit neural representations: A unified framework,” in Proc. IEEE/CIC Int. Conf. Commun. China (ICCC) , 2024

  2. [2]

    An overview of ongoing point cloud compression standardization activities: Video-based (V -PCC) and geometry-based (G-PCC),

    D. Graziosi, O. Nakagami, S. Kuma et al. , “An overview of ongoing point cloud compression standardization activities: Video-based (V -PCC) and geometry-based (G-PCC),” APSIPA Trans. Signal Inf. Proc. , vol. 9, no. 1, 2020

  3. [3]

    Lossy point cloud geometry compression via end-to-end learning,

    J. Wang, H. Zhu, H. Liu et al., “Lossy point cloud geometry compression via end-to-end learning,” IEEE Trans. Circuits Syst. Video Tech., vol. 31, no. 12, pp. 4909–4923, 2021

  4. [4]

    Point cloud in the air,

    Y . Shao, C. Bian, L. Y ang et al., “Point cloud in the air,” IEEE Commun. Mag., pp. 1–7, 2025

  5. [5]

    Wireless point cloud transmission,

    C. Bian, Y . Shao, and D. G ¨und¨uz, “Wireless point cloud transmission,” in Proc. IEEE Workshop Signal Process. Adv. Wireless Commun. (SPA WC), 2024, pp. 851–855

  6. [6]

    A theory of semantic communication,

    Y . Shao, Q. Cao, and D. G ¨und¨uz, “A theory of semantic communication,” IEEE. Trans. Mob. Comput. , vol. 23, no. 12, pp. 12 211–12 228, 2024

  7. [7]

    Overview of the high efficiency video coding (HEVC) standard,

    G. J. Sullivan, J.-R. Ohm, W.-J. Han et al. , “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, 2012

  8. [8]

    A hybrid compression framework for color attributes of static 3D point clouds,

    H. Liu, H. Y uan, Q. Liu et al. , “A hybrid compression framework for color attributes of static 3D point clouds,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 3, pp. 1564–1577, 2021

  9. [9]

    Compression of 3D point clouds using a region-adaptive hierarchical transform,

    R. L. De Queiroz and P . A. Chou, “Compression of 3D point clouds using a region-adaptive hierarchical transform,” IEEE Trans. Image Process., vol. 25, no. 8, pp. 3947–3956, 2016

  10. [10]

    A comprehensive study and comparison of core technologies for MPEG 3-D point cloud compression,

    H. Liu, H. Y uan, Q. Liu et al., “A comprehensive study and comparison of core technologies for MPEG 3-D point cloud compression,” IEEE Trans. Broadcast., vol. 66, no. 3, pp. 701–717, 2019

  11. [11]

    Reduced reference perceptual quality model with application to rate control for video-based point cloud compression,

    Q. Liu, H. Y uan, R. Hamzaoui et al. , “Reduced reference perceptual quality model with application to rate control for video-based point cloud compression,” IEEE Trans. Image Process. , vol. 30, pp. 6623–6636, 2021

  12. [12]

    PQA-Net: Deep no reference point cloud quality assessment via multi-view projection,

    Q. Liu, H. Y uan, H. Su et al., “PQA-Net: Deep no reference point cloud quality assessment via multi-view projection,” IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 12, pp. 4645–4660, 2021

  13. [13]

    PUFA-GAN: A frequency-aware generative adversarial network for 3D point cloud upsampling,

    H. Liu, H. Y uan, J. Hou et al. , “PUFA-GAN: A frequency-aware generative adversarial network for 3D point cloud upsampling,” IEEE Trans. Image Process. , vol. 31, pp. 7389–7402, 2022

  14. [14]

    PU-Mask: 3D point cloud upsampling via an implicit virtual mask,

    H. Liu, H. Y uan, R. Hamzaoui et al. , “PU-Mask: 3D point cloud upsampling via an implicit virtual mask,” IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 7, pp. 6489–6502, 2024

  15. [15]

    Model-based joint bit allocation between geometry and color for video-based 3D point cloud compression,

    Q. Liu, H. Y uan, J. Hou et al., “Model-based joint bit allocation between geometry and color for video-based 3D point cloud compression,” IEEE Trans. Multimedia, vol. 23, pp. 3278–3291, 2020

  16. [16]

    DeepPCC: Learned lossy point cloud compression,

    J. Zhang, G. Liu, J. Zhang et al. , “DeepPCC: Learned lossy point cloud compression,” IEEE Trans. Emerg. Top. Comput. Intell. , vol. 9, no. 2, pp. 1897–1909, 2024

  17. [17]

    Towards neural network ap- proaches for point cloud compression,

    E. Alexiou, K. Tung, and T. Ebrahimi, “Towards neural network ap- proaches for point cloud compression,” in Proc. SPIE Int. Soc. Opt. Eng., 2020, pp. 18–37

  18. [18]

    LotteryCodec: Searching the implicit representation in a random network for low-complexity image compression,

    H. Wu, G. Chen, P . L. Dragotti et al. , “LotteryCodec: Searching the implicit representation in a random network for low-complexity image compression,” arXiv preprint arXiv:2507.01204 , 2025

  19. [19]

    Learning neural volumetric field for point cloud geometry compression,

    Y . Hu and Y . Wang, “Learning neural volumetric field for point cloud geometry compression,” in Proc. Pict. Coding Symp. (PCS) , 2022, pp. 127–131

  20. [20]

    L V AC: Learned volumetric attribute compression for point clouds using coordinate based networks,

    B. Isik, P . A. Chou, S. J. Hwang et al. , “L V AC: Learned volumetric attribute compression for point clouds using coordinate based networks,” Front. Signal Process. , vol. 2, p. 1008812, 2022

  21. [21]

    End-to-end optimized image compression,

    J. Ball ´e, V . Laparra, and E. P . Simoncelli, “End-to-end optimized image compression,” in Proc. Int. Conf. Learn. Represent. (ICLR) , 2017

  22. [22]

    V ariational image compression with a scale hyperprior,

    J. Ball ´e, D. Minnen, S. Singh et al. , “V ariational image compression with a scale hyperprior,” in Proc. Int. Conf. Learn. Represent. (ICLR) , 2018

  23. [23]

    DVC: An end-to-end deep video compression framework,

    G. Lu, W. Ouyang, D. Xu et al. , “DVC: An end-to-end deep video compression framework,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit. (CVPR) , 2019, pp. 11 006–11 015

  24. [24]

    3D point cloud geometry compression on deep learning,

    T. Huang and Y . Liu, “3D point cloud geometry compression on deep learning,” in Proc. ACM Int. Conf. Multimed. (MM) , 2019, pp. 890–898

  25. [25]

    Deep-PCAC: An end-to-end deep lossy compression framework for point cloud attributes,

    X. Sheng, L. Li, D. Liu et al. , “Deep-PCAC: An end-to-end deep lossy compression framework for point cloud attributes,” IEEE Trans. Multimedia, vol. 24, pp. 2617–2632, 2021

  26. [26]

    Multiscale point cloud geometry compression,

    J. Wang, D. Ding, Z. Li et al. , “Multiscale point cloud geometry compression,” in Proc. Data Compression Conf. (DCC) , 2021

  27. [27]

    PCGFormer: Lossy point cloud geometry compression via local self-attention,

    G. Liu, J. Wang, D. Ding et al. , “PCGFormer: Lossy point cloud geometry compression via local self-attention,” in Proc. IEEE Int. Conf. Vis. Commun. Image Process. (VCIP) , 2022, pp. 1–5

  28. [28]

    Sparse tensor-based point cloud attribute compres- sion,

    J. Wang and Z. Ma, “Sparse tensor-based point cloud attribute compres- sion,” in Proc. Int. Conf. Multimed. Inf. Process. Retr . (MIPR) , 2022, pp. 59–64

  29. [29]

    Sparse tensor-based multiscale repre- sentation for point cloud geometry compression,

    J. Wang, D. Ding, Z. Li et al. , “Sparse tensor-based multiscale repre- sentation for point cloud geometry compression,” IEEE Trans. Pattern Anal. Mach. Intell. , vol. 45, no. 7, pp. 9055–9071, 2023

  30. [30]

    A versatile point cloud compressor using universal multiscale conditional coding–part I: Geometry,

    J. Wang, R. Xue, J. Li et al. , “A versatile point cloud compressor using universal multiscale conditional coding–part I: Geometry,” IEEE Trans. Pattern Anal. Mach. Intell. , vol. 47, no. 1, pp. 269–287, 2024

  31. [31]

    Scalable point cloud attribute compression,

    J. Zhang, J. Wang, D. Ding et al. , “Scalable point cloud attribute compression,” IEEE Trans. Multimedia , 2023

  32. [32]

    YOGA: Y et another geometry-based point cloud compressor,

    J. Zhang, T. Chen, D. Ding et al. , “YOGA: Y et another geometry-based point cloud compressor,” in Proc. ACM Int. Conf. Multimed. (MM) , 2023, pp. 9070–9081

  33. [33]

    3DAC: Learning attribute compression for point clouds,

    G. Fang, Q. Hu, H. Wang et al., “3DAC: Learning attribute compression for point clouds,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit. (CVPR) , 2022, pp. 14 819–14 828

  34. [34]

    GRNet: Geometry restoration for G-PCC compressed point clouds using auxiliary density signaling,

    G. Liu, R. Xue, J. Li et al. , “GRNet: Geometry restoration for G-PCC compressed point clouds using auxiliary density signaling,” IEEE Trans. Vis. Comput. Graph. , vol. 30, no. 10, pp. 6740–6753, 2023

  35. [35]

    CARNet: Compression artifact reduction for point cloud attribute,

    D. Ding, J. Zhang, J. Wang et al. , “CARNet: Compression artifact reduction for point cloud attribute,” arXiv preprint arXiv:2209.08276 , 2022

  36. [36]

    GQE-Net: A graph-based quality enhancement network for point cloud color attribute,

    J. Xing, H. Y uan, R. Hamzaoui et al., “GQE-Net: A graph-based quality enhancement network for point cloud color attribute,” IEEE Trans. Image Process., vol. 32, pp. 6303–6317, 2023

  37. [37]

    A small-scale image U-Net-based color quality enhancement for dense point cloud,

    J. Xing, H. Y uan, W. Zhang et al. , “A small-scale image U-Net-based color quality enhancement for dense point cloud,” IEEE Trans. Consum. Electron., vol. 70, no. 1, pp. 669–683, 2024

  38. [38]

    High efficiency Wiener filter-based point cloud quality enhancement for MPEG G-PCC,

    Y . Wei, Z. Wang, T. Guo et al., “High efficiency Wiener filter-based point cloud quality enhancement for MPEG G-PCC,” IEEE Trans. Circuits Syst. Video Technol. , pp. 1–1, 2025. 15

  39. [39]

    D-DPCC: Deep dynamic point cloud compression via 3D motion prediction,

    T. Fan, L. Gao, Y . Xu et al. , “D-DPCC: Deep dynamic point cloud compression via 3D motion prediction,” in Proc. Int. Joint Conf. Artif. Intell. (IJCAI) , 2022, pp. 898–904

  40. [40]

    An end-to-end dynamic point cloud geometry compression in latent space,

    Z. Jiang, G. Wang, G. K. Tam et al., “An end-to-end dynamic point cloud geometry compression in latent space,” Displays, vol. 80, p. 102528, 2023

  41. [41]

    Inter-frame compression for dynamic point cloud geometry coding,

    A. Akhtar, Z. Li, and G. V an der Auwera, “Inter-frame compression for dynamic point cloud geometry coding,” IEEE Trans. Image Process. , vol. 33, pp. 584–594, 2024

  42. [42]

    patchDPCC: A patchwise deep compression framework for dynamic point clouds,

    Z. Pan, M. Xiao, X. Han et al. , “patchDPCC: A patchwise deep compression framework for dynamic point clouds,” in Proc. AAAI Conf. Artif. Intell. , 2024, pp. 4406–4414

  43. [43]

    DeepSDF: Learning continuous signed distance functions for shape representation,

    J. J. Park, P . Florence, J. Straub et al. , “DeepSDF: Learning continuous signed distance functions for shape representation,” in Proc. IEEE/CVF Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR) , 2019, pp. 165–174

  44. [44]

    Local implicit grid representations for 3D scenes,

    C. Jiang, A. Sud, A. Makadia et al., “Local implicit grid representations for 3D scenes,” in Proc. IEEE/CVF Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR) , 2020, pp. 6001–6010

  45. [45]

    Occupancy networks: Learning 3D reconstruction in function space,

    L. Mescheder, M. Oechsle, M. Niemeyer et al. , “Occupancy networks: Learning 3D reconstruction in function space,” in Proc. IEEE/CVF Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR) , 2019, pp. 4460–4470

  46. [46]

    Learning implicit fields for generative shape modeling,

    Z. Chen and H. Zhang, “Learning implicit fields for generative shape modeling,” in Proc. IEEE/CVF Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR) , 2019, pp. 5939–5948

  47. [47]

    Coin: Compression with implicit neural representations,

    E. Dupont, A. Goli ´nski, M. Alizadeh et al. , “COIN: Compression with implicit neural representations,” arXiv preprint arXiv:2103.03123 , 2021

  48. [48]

    COIN++: Neural compression across modalities,

    E. Dupont, H. Loya, M. Alizadeh et al. , “COIN++: Neural compression across modalities,” arXiv preprint arXiv:2201.12904 , 2022

  49. [49]

    MIMO channel as a neural function: Implicit neural representations for extreme CSI compression,

    H. Wu, M. Zhang, Y . Shao et al. , “MIMO channel as a neural function: Implicit neural representations for extreme CSI compression,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP) , 2025, pp. 1–5

  50. [50]

    Implicit neural representations for image compression,

    Y . Str ¨umpler, J. Postels, R. Y ang et al. , “Implicit neural representations for image compression,” in Proc. Eur . Conf. Comput. Vis. (ECCV), 2022, pp. 74–91

  51. [51]

    Meta-learning sparse compression networks,

    J. R. Schwarz and Y . W. Teh, “Meta-learning sparse compression networks,” arXiv preprint arXiv:2205.08957 , 2022

  52. [52]

    Implicit neural video compression,

    Y . Zhang, T. V an Rozendaal, J. Brehmer et al. , “Implicit neural video compression,” arXiv preprint arXiv:2112.11312 , 2021

  53. [53]

    NeRV: Neural representations for videos,

    H. Chen, B. He, H. Wang et al. , “NeRV: Neural representations for videos,” in Proc. Adv. Neural Inf. Proces. Syst. (NeurIPS) , 2021, pp. 21 557–21 568

  54. [54]

    HiNeRV: Video compression with hierarchical encoding-based neural representation,

    H. M. Kwan, G. Gao, F. Zhang et al. , “HiNeRV: Video compression with hierarchical encoding-based neural representation,” in Proc. Adv. Neural Inf. Proces. Syst. (NeurIPS) , 2023, pp. 72 692–72 704

  55. [55]

    HNeRV: A hybrid neural representation for videos,

    H. Chen, M. Gwilliam, S.-N. Lim et al. , “HNeRV: A hybrid neural representation for videos,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit. (CVPR) , 2023, pp. 10 270–10 279

  56. [56]

    NIRV ANA: Neural implicit representations of videos with adaptive networks and autoregressive patch-wise modeling,

    S. R. Maiya, S. Girish, M. Ehrlich et al. , “NIRV ANA: Neural implicit representations of videos with adaptive networks and autoregressive patch-wise modeling,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit. (CVPR) , 2023, pp. 14 378–14 387

  57. [57]

    Signal compression via neural implicit representations,

    F. Pistilli, D. V alsesia, G. Fracastoro et al. , “Signal compression via neural implicit representations,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP) , 2022, pp. 3733–3737

  58. [58]

    NeRI: Implicit neural representation of LiDAR point cloud using range image sequence,

    R. Xue, J. Li, T. Chen et al. , “NeRI: Implicit neural representation of LiDAR point cloud using range image sequence,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP) , 2024, pp. 8020–8024

  59. [59]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren et al. , “Deep residual learning for image recognition,” in Proc. IEEE/CVF Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR) , 2016, pp. 770–778

  60. [60]

    Implicit neural represen- tations with periodic activation functions,

    V . Sitzmann, J. Martel, A. Bergman et al. , “Implicit neural represen- tations with periodic activation functions,” in Proc. Adv. Neural Inf. Proces. Syst. (NeurIPS) , 2020, pp. 7462–7473

  61. [61]

    NeRF: Representing scenes as neural radiance fields for view synthesis,

    B. Mildenhall, P . P . Srinivasan, M. Tancik et al. , “NeRF: Representing scenes as neural radiance fields for view synthesis,” Commun. ACM , vol. 65, no. 1, pp. 99–106, 2021

  62. [62]

    Focal loss for dense object detection,

    T.-Y . Lin, P . Goyal, R. Girshick et al. , “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV) , 2017, pp. 2980–2988

  63. [63]

    V olumetric video compression through neural-based representation,

    Y . Shi, R. Zhao, S. Gasparini et al. , “V olumetric video compression through neural-based representation,” in Proc. Int. Workshop Immersive Mixed Virtual Environ. Syst. (MMVE) , 2024, pp. 85–91

  64. [64]

    Essentially no barriers in neural network energy landscape,

    F. Draxler, K. V eschgini, M. Salmhofer et al. , “Essentially no barriers in neural network energy landscape,” in Proc. Int. Conf. Mach. Learn. (ICML), 2018, pp. 1309–1318

  65. [65]

    Loss surfaces, mode connectivity, and fast ensembling of DNNs,

    T. Garipov, P . Izmailov, D. Podoprikhin et al. , “Loss surfaces, mode connectivity, and fast ensembling of DNNs,” in Proc. Adv. Neural Inf. Proces. Syst. (NeurIPS) , 2018, pp. 8789–8798

  66. [66]

    8i voxelized full bodies-a voxelized point cloud dataset,

    E. d’Eon, B. Harrison, T. Myers et al. , “8i voxelized full bodies-a voxelized point cloud dataset,” ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006 , 2017

  67. [67]

    Owlii dynamic human mesh sequence dataset,

    Y . Xu, Y . Lu, and Z. Wen, “Owlii dynamic human mesh sequence dataset,” ISO/IEC JTC1/SC29/WG11 m41658 , 2017

  68. [68]

    Common test conditions for point cloud compression,

    S. Schwarz, G. Martin-Cocher, D. Flynn et al., “Common test conditions for point cloud compression,” Document ISO/IEC JTC1/SC29/WG11 w17766, Slovenia , 2018

  69. [69]

    DeepCABAC: A universal compression algorithm for deep neural networks,

    S. Wiedemann, H. Kirchhoffer, S. Matlage et al. , “DeepCABAC: A universal compression algorithm for deep neural networks,” IEEE J. Sel. Top. Signal Process. , vol. 14, no. 4, pp. 700–714, 2020

  70. [70]

    MPEG-PCC-TMC13

    MPEGGroup. MPEG-PCC-TMC13. Accessed: 2025. [Online]. Available: https://github.com/MPEGGroup/mpeg-pcc-tmc13

  71. [71]

    MPEG-PCC-TMC2

    MPEGGroup. MPEG-PCC-TMC2. Accessed: 2025. [Online]. Available: https://github.com/MPEGGroup/mpeg-pcc-tmc2

  72. [72]

    PCQM: A full-reference quality metric for colored 3D point clouds,

    G. Meynet, Y . Nehm ´e, J. Digne et al. , “PCQM: A full-reference quality metric for colored 3D point clouds,” in Proc. Int. Conf. Qual. Multimed. Exp. (QoMEX) , 2020

  73. [73]

    Calculation of average PSNR differences between RD- curves,

    G. Bjøntegaard, “Calculation of average PSNR differences between RD- curves,” ITU-T SG16 Q , vol. 6, 2001

  74. [74]

    KAN: Kolmogorov-Arnold Networks

    Z. Liu, Y . Wang, S. V aidya et al., “KAN: Kolmogorov-Arnold networks,” arXiv preprint arXiv:2404.19756 , 2024

  75. [75]

    MEST: Accurate and fast memory- economic sparse training framework on the edge,

    G. Y uan, X. Ma, W. Niu et al. , “MEST: Accurate and fast memory- economic sparse training framework on the edge,” in Proc. Adv. Neural Inf. Proces. Syst. (NeurIPS) , 2021, pp. 20 838–20 850. 1 Implicit Neural Compression of Point Clouds: Supplementary Material Hongning Ruan, Y ulin Shao, Qianqian Y ang, Liang Zhao, Zhaoyang Zhang, Dusit Niyato Abstract—T...

  76. [76]

    The encoder optimizes the parameters Θ (0), performs quantization, and then transmits the quantized parameters ˆΘ (0) to the decoder

    Geometry Compression: The first frame of a sequence is processed with the intra-frame compression method, as in i- NeRC3. The encoder optimizes the parameters Θ (0), performs quantization, and then transmits the quantized parameters ˆΘ (0) to the decoder. For the remaining frames ( t = 1 , 2, · · ·), the encoder first trains the complete parameters Θ (t) by...

  77. [77]

    For the remaining frames, the encoder optimizes Φ (t) through the following loss function: L(t) G (Φ (t)) = Eˆx∼P (t) G [D(t) G (ˆx)] + λ G |X(t)|∥Φ (t) − ˆΦ (t− 1)∥1

    Attribute Compression: The first frame is processed as in i-NeRC 3, i.e., the encoder quantizes and transmits the complete parameters Φ (0). For the remaining frames, the encoder optimizes Φ (t) through the following loss function: L(t) G (Φ (t)) = Eˆx∼P (t) G [D(t) G (ˆx)] + λ G |X(t)|∥Φ (t) − ˆΦ (t− 1)∥1. (12) Then the encoder quantizes and transmits the...

  78. [78]

    Let Θ (t) be the parameter set of network F that implicitly represents the t-th frame X (t)

    Geometry Compression: Consider a group of consec- utive T frames X (0), X (1), · · ·, X (T − 1). Let Θ (t) be the parameter set of network F that implicitly represents the t-th frame X (t). The corresponding loss function that the parameter set Θ (t) minimizes is the geometry distortion averaged over training samples for the t-th frame, i.e., Ex∼P (t) F [...

  79. [79]

    The corresponding loss function is Eˆx∼P (t) G [D(t) G (ˆx)]

    Attribute Compression: Let Φ (t) be the parameter set of network G that represents the attributes of the t-th frame C (t). The corresponding loss function is Eˆx∼P (t) G [D(t) G (ˆx)]. Similarly to geometry compression, we assume that Φ (0), Φ (1), · · ·, Φ (T − 1) can be evenly sampled on a Bezier curve in the neural space R|Φ |, each sample point expres...

  80. [80]

    (21) The D1 PSNR between the original point cloud X and the reconstructed point cloud ˆX is D1 PSNR = 10 log10 3 × (2N − 1)2 max{e( ˆX , X ), e (X , ˆX )} (dB). (22) A. Proof of Proposition 1 When reconstructing the geometry of a point cloud, we first obtain the OPs ˆp(x) of all voxels in V by feeding each x into F . It’s worth noting that none of these OP...