A Single Atlas is All You Need: Decoder-Side Gaussian Splatting for Immersive Video
Pith reviewed 2026-05-19 18:40 UTC · model grok-4.3
The pith
Decoder-side Gaussian splatting from a single compressed atlas outperforms depth estimation for immersive video quality and consistency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Decoder-Side Gaussian Splatting optimizes volumetric scenes entirely on the decoder from compressed textures and metadata, replacing the depth-estimation stage of prior decoder-side systems. Lossy compression functions as an implicit low-pass filter that stabilizes feed-forward splat prediction, so compressed bitstreams can exceed lossless quality while shrinking tenfold in size. Under extreme view sparsity with one atlas comprising four input views, the approach achieves a 5.79 dB BD-PSNR gain and 0.054 BD-SSIM gain over the DSDE anchor while reducing maximum inter-view Delta IV-PSNR from 17.2 dB to 6.4 dB.
What carries the argument
Decoder-Side Gaussian Splatting (DSGS) is the central mechanism: it runs feed-forward 3D Gaussian Splatting inference on the client from a single transmitted 2D atlas and its metadata, producing consistent novel views without transmitting explicit 3D data.
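The input side of this mechanism can be pictured with a small sketch. It assumes the four views are packed as a 2x2 grid of equally sized tiles, which the abstract does not state; real MPEG immersive video atlases may pack patches differently. The function name `split_atlas` and the toy data are hypothetical.

```python
import numpy as np

def split_atlas(atlas: np.ndarray, rows: int = 2, cols: int = 2) -> list:
    """Split an H x W x C atlas into rows*cols equal tiles, in row-major order.

    Assumption: views are laid out on a regular grid. This is an
    illustrative layout, not the packing used in the paper.
    """
    height, width = atlas.shape[:2]
    if height % rows or width % cols:
        raise ValueError("atlas dimensions must be divisible by the grid size")
    tile_h, tile_w = height // rows, width // cols
    return [
        atlas[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
        for r in range(rows)
        for c in range(cols)
    ]

# Toy atlas: each quadrant is filled with its own view index (0..3).
atlas = np.zeros((4, 6, 3), dtype=np.uint8)
atlas[:2, 3:] = 1
atlas[2:, :3] = 2
atlas[2:, 3:] = 3
views = split_atlas(atlas)
```

In a decoder-side pipeline, tiles like these, together with the camera metadata, would be what the feed-forward splat predictor consumes.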
If this is right
- Immersive video delivery becomes feasible with extreme view sparsity using only one atlas for four input views.
- Inter-view consistency improves because the splatting produces more coherent geometry across virtual viewpoints.
- Bandwidth use drops sharply since only standard compressed 2D textures and metadata are transmitted.
- The pipeline aligns directly with existing video codecs instead of requiring special formats for 3D data or splats.
Where Pith is reading between the lines
- The compression stabilization effect could be tested in other feed-forward neural rendering systems that predict scene structure from 2D inputs.
- Lower data rates from this approach may support real-time immersive experiences on mobile networks where pixel-rate limits are strict.
- The reduction in domain shift between atlas views and synthesized viewports may improve comfort during head movement in virtual environments.
Load-bearing premise
The method assumes lossy compression acts as a helpful low-pass filter that stabilizes splat prediction without introducing artifacts that degrade the final rendered views.
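A toy 1-D example shows what the premise amounts to. Here a moving-average filter stands in for the smoothing attributed to lossy compression, and additive Gaussian noise stands in for high-frequency content that might destabilize prediction. Both are assumptions made for illustration; neither the filter, the signal, nor the noise level comes from the paper, and a real codec also introduces blocking and ringing that this sketch ignores.

```python
import numpy as np

def moving_average(signal: np.ndarray, width: int = 9) -> np.ndarray:
    """Simple low-pass filter: average over a sliding window of `width` samples."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")

rng = np.random.default_rng(0)                        # fixed seed for repeatability
t = np.linspace(0.0, 1.0, 512)
clean = np.sin(2 * np.pi * 2 * t)                     # slowly varying "scene" signal
noisy = clean + rng.normal(0.0, 0.2, size=t.shape)    # high-frequency perturbation
smoothed = moving_average(noisy)

# Mean squared error against the clean signal, before and after filtering.
mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_smoothed = float(np.mean((smoothed - clean) ** 2))
```

On this toy signal the filtered version lies closer to the clean one, which is the favorable case the premise relies on. Content with fine detail would also be attenuated by the filter, which is where the premise could fail.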
What would settle it
Run the same DSGS model on identical atlas inputs under lossless compression and under lossy compression at the same bitrate, then measure whether the lossy version produces equal or higher PSNR and fewer visible artifacts in the rendered output.
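The measurement step of such a test could look like the sketch below. It only computes plain PSNR for two renders against a shared reference view; the helper names `psnr` and `compare_renders` are hypothetical, and the arrays are placeholders rather than results from the paper. The paper reports IV-PSNR, which is a different metric from the plain PSNR used here.

```python
import numpy as np

def psnr(reference, rendered, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(rendered, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

def compare_renders(reference, from_lossless, from_lossy) -> dict:
    """Report PSNR for renders produced from lossless and lossy atlas inputs."""
    lossless_db = psnr(reference, from_lossless)
    lossy_db = psnr(reference, from_lossy)
    return {
        "lossless": lossless_db,
        "lossy": lossy_db,
        "lossy_minus_lossless": lossy_db - lossless_db,
    }

# Placeholder images only: constant offsets from a flat reference.
reference = np.full((8, 8), 100.0)
result = compare_renders(reference, reference + 10.0, reference + 5.0)
```

A positive `lossy_minus_lossless` on real renders, repeated across sequences and rate points, would support the claim; a negative value would count against it.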
Original abstract
Immersive video delivery is bottlenecked by pixel-rate constraints, making the transmission of high-resolution depth maps or explicit 3D volumetric data expensive. Decoder-Side Depth Estimation (DSDE) shifts depth computation to the client, but struggles with complex geometries, inter-view flickering, and non-Lambertian reflections. Conversely, 3D Gaussian Splatting (3DGS) offers state-of-the-art view synthesis, but transmitting splats (or their projected 2D maps) incurs prohibitive bandwidth costs and is poorly aligned with standard video codecs. We propose Decoder-Side Gaussian Splatting (DSGS), a framework that natively replaces the depth-estimation stage of DSDE with feed-forward 3DGS inference, optimizing volumetric scenes entirely on the decoder side from compressed textures and metadata. A central, counterintuitive finding is that lossy compression acts as an implicit low-pass filter stabilizing feed-forward splat prediction: compressed bitstreams exceed lossless quality while shrinking tenfold. Under extreme view sparsity (one 2D atlas comprising 4 input views), DSGS achieves a +5.79 dB BD-PSNR and +0.054 BD-SSIM gain over the DSDE anchor while reducing maximum inter-view Delta IV-PSNR from 17.2 dB to 6.4 dB, minimizing the domain shift between transmitted and virtual viewports.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Decoder-Side Gaussian Splatting (DSGS) to replace Decoder-Side Depth Estimation (DSDE) in immersive video pipelines. From a single 2D atlas of four input views plus compressed metadata, a feed-forward network infers 3D Gaussian splats on the decoder; the abstract reports that this yields +5.79 dB BD-PSNR and +0.054 BD-SSIM over DSDE while cutting maximum inter-view Delta IV-PSNR from 17.2 dB to 6.4 dB. A central claim is that lossy compression functions as an implicit low-pass filter that stabilizes splat prediction and even improves final rendered quality relative to lossless inputs.
Significance. If the performance numbers and the compression-regularization effect can be reproduced, the work would offer a practical route to high-quality view synthesis under extreme view sparsity while staying compatible with existing video codecs. The decoder-side 3DGS formulation directly targets the pixel-rate bottleneck of immersive delivery and could reduce both bandwidth and inter-view flickering. The absence of methods, training details, and ablations, however, prevents any assessment of whether these gains are attributable to the architecture or to unstated implementation choices.
major comments (2)
- [Abstract] The headline result (+5.79 dB BD-PSNR, reduced Delta IV-PSNR) is presented as evidence that DSGS replaces DSDE, yet the text explicitly attributes success to the mechanism that 'lossy compression acts as an implicit low-pass filter.' No ablation comparing compressed versus lossless atlas inputs is described, leaving the load-bearing assumption untested.
- [Methods] No section provides the architecture of the feed-forward splat predictor, the precise input features extracted from the compressed atlas, the training objective, or any hyper-parameter settings. Without these elements the quantitative claims cannot be reproduced or isolated from possible confounding factors in the experimental pipeline.
minor comments (1)
- [Abstract] The abstract uses 'BD-PSNR' and 'BD-SSIM' without defining the rate-distortion operating points or the anchor codec configuration used for the Bjontegaard calculation.
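For readers unfamiliar with the metric, the usual Bjontegaard delta-PSNR calculation can be sketched as follows: fit a cubic polynomial to PSNR as a function of log10(bitrate) for each codec, integrate both fits over the overlapping rate range, and take the difference of the averages. The rate points and PSNR values below are placeholders, not the paper's data, and the function name `bd_psnr` is hypothetical. The paper's actual operating points and anchor configuration are not given in the abstract.

```python
import numpy as np

def bd_psnr(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Average PSNR difference (test minus anchor) over the shared log-rate range."""
    log_a = np.log10(np.asarray(rate_anchor, dtype=np.float64))
    log_t = np.log10(np.asarray(rate_test, dtype=np.float64))
    # Cubic fit of PSNR against log-rate for each rate-distortion curve.
    poly_a = np.polyfit(log_a, np.asarray(psnr_anchor, dtype=np.float64), 3)
    poly_t = np.polyfit(log_t, np.asarray(psnr_test, dtype=np.float64), 3)
    # Integrate only where the two curves overlap in rate.
    lo = max(log_a.min(), log_t.min())
    hi = min(log_a.max(), log_t.max())
    if hi <= lo:
        raise ValueError("rate ranges do not overlap")
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)
    return float((area_t - area_a) / (hi - lo))

# Placeholder curves: the test codec is a constant 2 dB above the anchor.
rates = [100.0, 200.0, 400.0, 800.0]
anchor_psnr = [30.0, 32.0, 34.0, 36.0]
test_psnr = [32.0, 34.0, 36.0, 38.0]
gain = bd_psnr(rates, anchor_psnr, rates, test_psnr)
```

Because the result depends on which four rate points are chosen and on the anchor's encoder settings, a BD-PSNR figure is only interpretable once those are stated, which is the gap the comment above points at.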
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript introducing Decoder-Side Gaussian Splatting (DSGS). We appreciate the acknowledgment of the potential practical benefits for immersive video and address each major comment in detail below.
- Referee: [Abstract] The headline result (+5.79 dB BD-PSNR, reduced Delta IV-PSNR) is presented as evidence that DSGS replaces DSDE, yet the text explicitly attributes success to the mechanism that 'lossy compression acts as an implicit low-pass filter.' No ablation comparing compressed versus lossless atlas inputs is described, leaving the load-bearing assumption untested.
  Authors: We agree that an explicit ablation comparing compressed and lossless atlas inputs would provide stronger support for the regularization effect attributed to lossy compression. Although our experiments consistently showed improved rendering quality with compressed inputs, the initial submission did not include this direct comparison. In the revised manuscript we will add the requested ablation, which will demonstrate that the low-pass filtering induced by compression reduces prediction noise and yields higher final quality than lossless inputs, thereby reinforcing the central claim while preserving the reported gains over DSDE.
  Revision: yes
- Referee: [Methods] No section provides the architecture of the feed-forward splat predictor, the precise input features extracted from the compressed atlas, the training objective, or any hyper-parameter settings. Without these elements the quantitative claims cannot be reproduced or isolated from possible confounding factors in the experimental pipeline.
  Authors: The referee correctly notes that detailed architectural and training information is required for reproducibility. The original manuscript prioritized the high-level framework and quantitative results; we will expand the Methods section in the revision to fully specify the feed-forward splat predictor architecture, the exact input features taken from the compressed atlas, the training objective, and all hyper-parameter values. These additions will allow independent reproduction and isolation of the reported performance improvements.
  Revision: yes
Circularity Check
No circularity; performance claims are empirical measurements, not derived reductions.
The paper presents DSGS as a framework replacing DSDE depth estimation with decoder-side 3DGS inference from compressed inputs. The headline gains (+5.79 dB BD-PSNR, reduced inter-view Delta IV-PSNR) are reported as measured outputs under one-atlas sparsity. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text that would make the results equivalent to inputs by construction. The lossy-compression-as-low-pass-filter observation is stated as an empirical finding rather than a first-principles derivation that loops back on itself. The work is self-contained against external benchmarks (DSDE anchor) with no load-bearing uniqueness theorems or ansatzes imported via self-citation.