arxiv: 2307.03017 · v5 · submitted 2023-07-06 · 💻 cs.CV

RealLiFe: Real-Time Light Field Reconstruction via Hierarchical Sparse Gradient Descent

Pith reviewed 2026-05-24 07:43 UTC · model grok-4.3

classification 💻 cs.CV
keywords light field reconstruction · multi-plane images · real-time novel view synthesis · sparse gradient descent · extended reality · hierarchical optimization · 3D CNN

The pith

RealLiFe reconstructs high-quality light fields from sparse views in real time by optimizing only the sparse gradients of multi-plane images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to meet the demand for real-time light field reconstruction needed by extended reality systems when only a few input images are available. It starts from the observation that multi-plane images contain an intrinsic sparse structure that can be used to speed up the process. A 3D convolutional network first produces a coarse scene representation, which is then refined in just a few steps by applying hierarchical sparse gradient descent restricted to content-aligned gradients. If correct, this yields visual results comparable to slow offline techniques yet runs two orders of magnitude faster and improves on existing real-time alternatives.

Core claim

The authors introduce RealLiFe, a method that first generates a coarse multi-plane image using a 3D CNN and then applies Hierarchical Sparse Gradient Descent to optimize only the scene-content-aligned sparse MPI gradients in a small number of iterations, thereby producing high-quality light fields from sparse inputs at real-time speeds.

What carries the argument

Hierarchical Sparse Gradient Descent (HSGD), which restricts each optimization step to the sparse gradients aligned with scene content inside the multi-plane image representation.
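
The idea of restricting each update to a few content-aligned planes can be sketched on a toy multi-plane image. The block below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names (`composite`, `topk_plane_mask`, `sparse_color_step`), the rule of keeping the `k` planes with the largest compositing weight per pixel, and the plain squared-error loss are all hypothetical stand-ins for the paper's HSGD module.

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back over-compositing of a toy MPI.

    colors: (D, H, W, 3), alphas: (D, H, W); plane 0 is nearest the camera.
    Returns the rendered image (H, W, 3) and the per-plane weights (D, H, W).
    """
    # Transmittance reaching plane d: product of (1 - alpha) over nearer planes.
    ones = np.ones_like(alphas[:1])
    trans = np.cumprod(np.concatenate([ones, 1.0 - alphas[:-1]]), axis=0)
    weights = alphas * trans
    return (weights[..., None] * colors).sum(axis=0), weights

def topk_plane_mask(weights, k):
    """Boolean mask (D, H, W) keeping, per pixel, the k highest-weight planes.

    Assumption: selecting by compositing weight is a stand-in for the paper's
    own gradient-based selection rule.
    """
    depth = weights.shape[0]
    k = min(k, depth)
    idx = np.argpartition(weights, depth - k, axis=0)[depth - k:]
    mask = np.zeros(weights.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=0)
    return mask

def sparse_color_step(colors, alphas, target, k=3, lr=0.5):
    """One gradient step on plane colors, applied only to the selected planes."""
    rendered, weights = composite(colors, alphas)
    residual = rendered - target                      # (H, W, 3)
    # Gradient of 0.5 * ||rendered - target||^2 with respect to colors[d].
    grad = weights[..., None] * residual[None]
    mask = topk_plane_mask(weights, k)
    return colors - lr * grad * mask[..., None]
```

Only the plane colors are updated here to keep the sketch short; a fuller version would also update the opacities and repeat the step across resolution levels.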

If this is right

  • Produces visual quality comparable to state-of-the-art offline methods while running roughly 100 times faster on average.
  • Delivers approximately 2 dB higher PSNR than prior online approaches.
  • Enables real-time novel-view synthesis suitable for extended-reality applications from only a few input images.
  • Maintains rendering quality by exploiting the sparse manifold property of multi-plane images.
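
For readers weighing the 2 dB figure, the standard PSNR definition is short enough to state in code. This is a minimal sketch assuming images are float arrays scaled to the range [0, `max_val`]; the paper's exact evaluation protocol (color space, cropping, averaging over views) is not specified on this page.

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    rendered = np.asarray(rendered, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Under this definition a 2 dB gain corresponds to roughly 37% lower mean squared error, since 10^(-0.2) is about 0.63.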

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sparse-gradient principle could be tested on other layered or volumetric scene representations that exhibit similar sparsity.
  • If the speed advantage holds on mobile hardware, the technique may allow light-field rendering inside standalone XR headsets without cloud offload.
  • Extending the hierarchical schedule to video sequences could be examined to support dynamic scenes.

Load-bearing premise

The multi-plane image representation of a scene possesses an intrinsic sparse manifold that permits rapid optimization without loss of rendering quality.
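
One way to see where such sparsity could come from is the standard front-to-back over-compositing model for layered images. The following is a sketch under assumed notation (plane index i with i = 1 nearest the camera, per-pixel color c_i and opacity \alpha_i, D planes), not the paper's own derivation:

```latex
\[
C = \sum_{i=1}^{D} c_i\,\alpha_i\,T_i,
\qquad
T_i = \prod_{j<i} (1-\alpha_j)
\]
\[
\frac{\partial C}{\partial c_i} = \alpha_i\,T_i,
\qquad
\frac{\partial C}{\partial \alpha_i} = T_i\,\bigl(c_i - B_i\bigr),
\qquad
B_i = \sum_{k>i} c_k\,\alpha_k \prod_{i<j<k} (1-\alpha_j)
\]
```

Both derivatives carry the transmittance factor T_i, so they vanish for planes hidden behind opaque content, and the color derivative also vanishes wherever a plane is empty (\alpha_i = 0). In this model, only the planes near visible surfaces receive non-negligible gradients at a given pixel.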

What would settle it

Measure whether removing the sparse-gradient restriction causes the optimization to require many more iterations or to produce noticeably lower PSNR on the same test scenes.
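
The shape of such a control can be sketched on synthetic data. The block below is a toy harness under loud assumptions: random plane colors and opacities stand in for a 3D-CNN initialization, the loss is plain squared error on one view, and "sparse" means updating only the `k` highest-weight planes per pixel. The names `render`, `run`, and `compare` are hypothetical, and the output says nothing about the paper's actual results.

```python
import numpy as np

def render(colors, alphas):
    """Over-composite a toy MPI; plane 0 is nearest the camera."""
    ones = np.ones_like(alphas[:1])
    trans = np.cumprod(np.concatenate([ones, 1.0 - alphas[:-1]]), axis=0)
    weights = alphas * trans
    return (weights[..., None] * colors).sum(axis=0), weights

def run(colors, alphas, target, steps, k=None, lr=0.5):
    """Gradient descent on plane colors from a shared initialization.

    k=None uses the full gradient; otherwise only planes whose weight is at
    least the k-th largest at that pixel are updated.
    Returns the loss measured before each step.
    """
    colors = colors.copy()
    losses = []
    for _ in range(steps):
        image, weights = render(colors, alphas)
        residual = image - target
        losses.append(float(np.mean(residual ** 2)))
        grad = weights[..., None] * residual[None]
        if k is not None:
            thresh = np.sort(weights, axis=0)[-k]   # k-th largest weight per pixel
            grad = grad * (weights >= thresh)[..., None]
        colors -= lr * grad
    return losses

def compare(seed=0, planes=16, size=8, steps=10, k=3):
    """Run both optimizers on the same synthetic scene and return loss curves."""
    rng = np.random.default_rng(seed)
    colors = rng.random((planes, size, size, 3))
    alphas = rng.random((planes, size, size)) * 0.9
    target = rng.random((size, size, 3))
    return {
        "full": run(colors, alphas, target, steps),
        "sparse": run(colors, alphas, target, steps, k=k),
    }
```

Calling `compare()` returns two loss curves that start from the same value. In a real experiment the quantities of interest would be the PSNR gap and the wall-clock time per iteration on the paper's test scenes.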

Figures

Figures reproduced from arXiv: 2307.03017 by Jinzhi Zhang, Lei Han, Lin Li, Lu Fang, Tianpeng Lin, Yijie Deng.

Figure 1: Rendering quality and efficiency comparison with state-of-the-art
Figure 2: The derivation of MPI sparse gradients and the optimization
Figure 3: Application of our method to support a real-time 3D display. (a) Sparse multi-view images that serve as the input to our model. (b) Our
Figure 4: The overview of RealLiFe. First, the PSV is constructed using multiview images by homographic warping, and the PSV is then downsampled hierarchically at multiple resolutions. The output MPI is then generated in several iterations. Initially, the lowest-resolution PSV is fed to a 3D CNN to extract a coarse-level MPI. The PSV and the upsampled MPI both go through the Sparse Gradient Descent module for a refi…
Figure 5 (page 5): This module comprises three major operations: gradi
Figure 5 (page 6): The sparse gradient descent module. (a) The MPI gradients
Figure 6: Proportions of empty pixels of individual MPI planes throughout optimization iterations. (a) A rendered image of
Figure 7: The influence of k on the rendering quality. (a) A rendered image of RealLiFe (without using HSGD) with 40 MPI planes. (b) Top 5 MPI planes with the highest alpha gradients A for the red pixel. (The MPI is multiplied by A to better visualize its contribution to the final rendering result.) (c) The alpha gradients of the red pixel across 40 MPI planes, with the red dotted lines partitioning d based on wheth…
Figure 8: The backbone 3D CNN of RealLiFe, which contains only three convolution layers. CNNs responsible for converting the input plane sweep volume into a multi-plane image (as depicted in
Figure 9: Qualitative results on Real Forward-Facing [9] (a) and (b), SWORD [50] (c) and (d) and Shiny [48] (e) evaluation datasets.
Figure 10: Extra qualitative comparison on Shiny [8] and IBRNet collected [5].
Figure 11: Qualitative comparison on the ablation configuration of
Figure 12: The border artifacts of our approach. The borders of our ren
Original abstract

With the rise of Extended Reality (XR) technology, there is a growing need for real-time light field reconstruction from sparse view inputs. Existing methods can be classified into offline techniques, which can generate high-quality novel views but at the cost of long inference/training time, and online methods, which either lack generalizability or produce unsatisfactory results. However, we have observed that the intrinsic sparse manifold of Multi-plane Images (MPI) enables a significant acceleration of light field reconstruction while maintaining rendering quality. Based on this insight, we introduce RealLiFe, a novel light field optimization method, which leverages the proposed Hierarchical Sparse Gradient Descent (HSGD) to produce high-quality light fields from sparse input images in real time. Technically, the coarse MPI of a scene is first generated using a 3D CNN, and it is further optimized leveraging only the scene-content-aligned sparse MPI gradients in a few iterations. Extensive experiments demonstrate that our method achieves comparable visual quality while being 100x faster on average than state-of-the-art offline methods and delivers better performance (about 2 dB higher in PSNR) compared to other online approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents RealLiFe, a method for real-time light field reconstruction from sparse input images. It first uses a 3D CNN to generate a coarse Multi-plane Image (MPI) representation of the scene, then applies Hierarchical Sparse Gradient Descent (HSGD) to optimize only the scene-content-aligned sparse gradients in a few iterations. The authors claim that this approach achieves visual quality comparable to state-of-the-art offline methods while being approximately 100x faster, and outperforms other online methods by about 2 dB in PSNR.

Significance. If the experimental claims hold after proper controls, the work could enable practical real-time light-field rendering for XR by demonstrating that MPI sparsity permits rapid optimization without quality loss.

major comments (2)
  1. [Abstract] Abstract: the central claims of 100x average speedup versus offline methods and ~2 dB PSNR gain versus online methods are stated without any dataset names, scene counts, baseline implementation details, or error bars, so the quantitative results cannot be inspected or reproduced from the given evidence.
  2. [Abstract / Experiments] Abstract / Experiments section: no ablation compares HSGD sparse-gradient selection against an otherwise identical full-gradient optimizer on the same coarse MPI produced by the 3D CNN. Without this internal control, it is impossible to confirm that the claimed quality preservation is due to the asserted intrinsic sparse manifold rather than the coarse initialization or other factors.
minor comments (1)
  1. Notation for the hierarchical levels of HSGD and the precise definition of the scene-content-aligned sparse mask should be given explicitly with equations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. Below we provide point-by-point responses to the major comments and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claims of 100x average speedup versus offline methods and ~2 dB PSNR gain versus online methods are stated without any dataset names, scene counts, baseline implementation details, or error bars, so the quantitative results cannot be inspected or reproduced from the given evidence.

    Authors: We acknowledge that space constraints in the abstract prevent inclusion of all experimental specifics. The manuscript's Experiments section details the datasets, scene counts, and baseline implementations, and reports results with error bars. To improve self-containment, we will revise the abstract to reference the primary evaluation datasets and clarify that the reported figures are averages drawn from those controlled experiments. revision: yes

  2. Referee: [Abstract / Experiments] Abstract / Experiments section: no ablation compares HSGD sparse-gradient selection against an otherwise identical full-gradient optimizer on the same coarse MPI produced by the 3D CNN. Without this internal control, it is impossible to confirm that the claimed quality preservation is due to the asserted intrinsic sparse manifold rather than the coarse initialization or other factors.

    Authors: This observation is correct; the current experiments focus on external method comparisons rather than an internal sparse-versus-full gradient ablation on identical 3D-CNN initializations. We will add this controlled ablation to the revised Experiments section, using the same coarse MPI for both optimizers, to directly demonstrate the contribution of the sparse manifold. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external baselines and an empirical observation, not self-referential fits or citations

Full rationale

The paper presents the sparse manifold of MPI as an observed property that motivates HSGD, then reports PSNR gains and speedups measured exclusively against external offline and online baselines. No equations or sections reduce the claimed 2 dB improvement or 100x speedup to quantities fitted inside the same paper, nor do any load-bearing steps rely on self-citations whose content is unverified. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on the domain assumption that MPI representations possess exploitable sparsity; no free parameters or invented entities are named in the abstract.

axioms (1)
  • Domain assumption: The intrinsic sparse manifold of Multi-plane Images (MPI) enables a significant acceleration of light field reconstruction while maintaining rendering quality.
    Directly stated in the abstract as the enabling observation for HSGD.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learn2Splat: Extending the Horizon of Learned 3DGS Optimization

    cs.CV · 2026-05 · unverdicted · novelty 7.0

    A meta-learned optimizer for 3DGS that extends the optimization horizon via checkpoint buffers and latent gradient-scale encoding, delivering better early novel-view quality and long-term stability with zero-shot gene...

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Deepview: View synthesis with learned gradient descent

    J. Flynn, M. Broxton, P. E. Debevec, M. DuVall, G. Fyffe, R. S. Overbeck, N. Snavely, and R. Tucker, “Deepview: View synthesis with learned gradient descent,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2362–2371, 2019, doi: 10.1109/CVPR.2019.00247

  2. [2]

    Immersive light field video with a layered mesh representation

    M. Broxton, J. Flynn, R. S. Overbeck, D. Erickson, P. Hedman, M. DuVall, J. Dourgarian, J. Busch, M. Whalen, and P. E. Debevec, “Immersive light field video with a layered mesh representation,” ACM Transactions on Graphics (TOG), vol. 39, pp. 86:1–86:15, 2020, doi: 10.1145/3386569.3392485

  3. [3]

    Nerf: Representing scenes as neural radiance fields for view synthesis

    B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” in European Conference on Computer Vision, 2020, doi: 10.1145/3503250

  4. [4]

    Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields

    J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan, “Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5835–5844, 2021, doi: 10.1109/ICCV48922.2021.00580

  5. [5]

    Ibrnet: Learning multi-view image-based rendering

    Q. Wang, Z. Wang, K. Genova, P. P. Srinivasan, H. Zhou, J. T. Barron, R. Martin-Brualla, N. Snavely, and T. A. Funkhouser, “Ibrnet: Learning multi-view image-based rendering,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4688–4697, 2021, doi: 10.1109/CVPR46437.2021.00466

  6. [6]

    Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo

    A. Chen, Z. Xu, F. Zhao, X. Zhang, F. Xiang, J. Yu, and H. Su, “Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14 104–14 113, 2021, doi: 10.1109/ICCV48922.2021.01386

  7. [7]

    Efficient neural radiance fields for interactive free-viewpoint video

    H. Lin, S. Peng, Z. Xu, Y. Yan, Q. Shuai, H. Bao, and X. Zhou, “Efficient neural radiance fields for interactive free-viewpoint video,” SIGGRAPH Asia 2022 Conference Papers, 2021, doi: 10.1145/3550469.3555376

  8. [9]

    Local light field fusion: Practical view synthesis with prescriptive sampling guidelines

    B. Mildenhall, P. P. Srinivasan, R. Ortiz-Cayon, N. K. Kalantari, R. Ramamoorthi, R. Ng, and A. Kar, “Local light field fusion: Practical view synthesis with prescriptive sampling guidelines,” ACM Trans. Graph., vol. 38, no. 4, Jul. 2019, doi: 10.1145/3306346.3322980

  9. [10]

    Autoint: Automatic integration for fast neural volume rendering

    D. B. Lindell, J. N. P. Martel, and G. Wetzstein, “Autoint: Automatic integration for fast neural volume rendering,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14 551–14 560, 2020, doi: 10.1109/CVPR46437.2021.01432

  10. [11]

    Derf: Decomposed radiance fields

    D. Rebain, W. Jiang, S. Yazdani, K. Li, K. M. Yi, and A. Tagliasacchi, “Derf: Decomposed radiance fields,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14 148–14 156, 2020, doi: 10.1109/CVPR46437.2021.01393

  11. [12]

    Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps

    C. Reiser, S. Peng, Y. Liao, and A. Geiger, “Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14 315–14 325, 2021, doi: 10.1109/ICCV48922.2021.01407

  12. [13]

    Fastnerf: High-fidelity neural rendering at 200fps

    S. J. Garbin, M. Kowalski, M. Johnson, J. Shotton, and J. P. C. Valentin, “Fastnerf: High-fidelity neural rendering at 200fps,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14 326–14 335, 2021, doi: 10.1109/ICCV48922.2021.01408

  13. [14]

    Neural rays for occlusion-aware image-based rendering

    Y. Liu, S. Peng, L. Liu, Q. Wang, P. Wang, C. Theobalt, X. Zhou, and W. Wang, “Neural rays for occlusion-aware image-based rendering,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7814–7823, 2022, doi: 10.1109/CVPR52688.2022.00767

  14. [15]

    Grf: Learning a general radiance field for 3d representation and rendering

    A. Trevithick and B. Yang, “Grf: Learning a general radiance field for 3d representation and rendering,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 15 162–15 172, doi: 10.1109/WACV56688.2023.00429

  15. [16]

    pixelnerf: Neural radiance fields from one or few images

    A. Yu, V. Ye, M. Tancik, and A. Kanazawa, “pixelnerf: Neural radiance fields from one or few images,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4576–4585, 2021, doi: 10.1109/CVPR46437.2021.00455

  16. [17]

    Stereo radiance fields (srf): Learning view synthesis for sparse views of novel scenes

    J. Chibane, A. Bansal, V. Lazova, and G. Pons-Moll, “Stereo radiance fields (srf): Learning view synthesis for sparse views of novel scenes,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7907–7916, 2021, doi: 10.1109/CVPR46437.2021.00782

  17. [18]

    Plenoxels: Radiance fields without neural networks

    A. Yu, S. Fridovich-Keil, M. Tancik, Q. Chen, B. Recht, and A. Kanazawa, “Plenoxels: Radiance fields without neural networks,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5491–5500, 2021, doi: 10.1109/CVPR52688.2022.00542

  18. [19]

    Plenoctrees for real-time rendering of neural radiance fields

    A. Yu, R. Li, M. Tancik, H. Li, R. Ng, and A. Kanazawa, “Plenoctrees for real-time rendering of neural radiance fields,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5732–5741, 2021, doi: 10.1109/ICCV48922.2021.00570

  19. [20]

    Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction

    C. Sun, M. Sun, and H.-T. Chen, “Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5449–5459, 2021, doi: 10.1109/CVPR52688.2022.00538

  20. [21]

    Stereo magnification: Learning view synthesis using multiplane images

    T. Zhou, R. Tucker, J. Flynn, G. Fyffe, and N. Snavely, “Stereo magnification: Learning view synthesis using multiplane images,” ArXiv, vol. abs/1805.09817, 2018, doi: 10.1145/3197517.3201323

  21. [22]

    Pushing the boundaries of view extrapolation with multiplane images

    P. P. Srinivasan, R. Tucker, J. T. Barron, R. Ramamoorthi, R. Ng, and N. Snavely, “Pushing the boundaries of view extrapolation with multiplane images,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 175–184, doi: 10.1109/CVPR.2019.00026

  22. [23]

    Depth-supervised nerf: Fewer views and faster training for free

    K. Deng, A. Liu, J.-Y. Zhu, and D. Ramanan, “Depth-supervised nerf: Fewer views and faster training for free,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12 872–12 881, 2021, doi: 10.1109/CVPR52688.2022.01254

  23. [24]

    Uncertainty awareness with adaptive propagation for multi-view stereo

    J. Chen, Z. Yu, L. Ma, and K. Zhang, “Uncertainty awareness with adaptive propagation for multi-view stereo,” Applied Intelligence, 2023, doi: 10.1007/s10489-023-04910-z. [Online]. Available: https://api.semanticscholar.org/CorpusID:261038143

  24. [25]

    Adaptmvsnet: Efficient multi-view stereo with adaptive convolution and attention fusion

    P. Jiang, X. Yang, Y.-R. Chen, W.-Z. Song, and Y. Li, “Adaptmvsnet: Efficient multi-view stereo with adaptive convolution and attention fusion,” Computers & Graphics, 2023, doi: 10.1016/j.cag.2023.08.014. [Online]. Available: https://api.semanticscholar.org/CorpusID:260792500

  25. [26]

    Highres-mvsnet: A fast multi-view stereo network for dense 3d reconstruction from high-resolution images

    R. Weilharter and F. Fraundorfer, “Highres-mvsnet: A fast multi-view stereo network for dense 3d reconstruction from high-resolution images,” IEEE Access, vol. 9, pp. 11 306–11 315, 2021, doi: 10.1109/ACCESS.2021.3050556

  26. [27]

    360mvsnet: Deep multi-view stereo network with 360° images for indoor scene reconstruction

    C.-Y. Chiu, Y.-T. Wu, I.-C. Shen, and Y.-Y. Chuang, “360mvsnet: Deep multi-view stereo network with 360° images for indoor scene reconstruction,” in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3056–3065, doi: 10.1109/WACV56688.2023.00307

  27. [28]

    Uncertainty guided multi-view stereo network for depth estimation

    W. Su, Q. Xu, and W. Tao, “Uncertainty guided multi-view stereo network for depth estimation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 11, pp. 7796–7808, 2022, doi: 10.1109/TCSVT.2022.3183836

  28. [29]

    Multistage pixel-visibility learning with cost regularization for multiview stereo

    X. Guan, W. Tong, S. Jiang, P. Z. H. Sun, E. Q. Wu, and G. Chen, “Multistage pixel-visibility learning with cost regularization for multiview stereo,” IEEE Transactions on Automation Science and Engineering, vol. 20, no. 2, pp. 751–762, 2023, doi: 10.1109/TASE.2022.3165944

  29. [30]

    Learned primal-dual reconstruction

    J. Adler and O. Öktem, “Learned primal-dual reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, pp. 1322–1332, 2017, doi: 10.1109/TMI.2018.2799231

  30. [31]

    Solving ill-posed inverse problems using iterative deep neural networks

    J. Adler and O. Öktem, “Solving ill-posed inverse problems using iterative deep neural networks,” Inverse Problems, vol. 33, p. 124007, 2017, doi: 10.1088/1361-6420/aa9581

  31. [32]

    Tensor displays

    G. Wetzstein, D. Lanman, M. Hirsch, and R. Raskar, “Tensor displays,” ACM Transactions on Graphics (TOG), vol. 31, pp. 1–11, 2012, doi: 10.1145/2343456.2343480

  32. [33]

    Compositing digital images

    T. K. Porter and T. D. S. Duff, “Compositing digital images,” international conference on computer graphics and interactive techniques, 1984, doi: 10.1145/964965.808606

  33. [34]

    Nerf in the dark: High dynamic range view synthesis from noisy raw images

    B. Mildenhall, P. Hedman, R. Martin-Brualla, P. P. Srinivasan, and J. T. Barron, “Nerf in the dark: High dynamic range view synthesis from noisy raw images,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16 169–16 178, 2021, doi: 10.1109/CVPR52688.2022.01571

  34. [35]

    Ref-nerf: Structured view-dependent appearance for neural radiance fields

    D. Verbin, P. Hedman, B. Mildenhall, T. E. Zickler, J. T. Barron, and P. P. Srinivasan, “Ref-nerf: Structured view-dependent appearance for neural radiance fields,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5481–5490, 2021, doi: 10.1109/CVPR52688.2022.00541

  35. [36]

    Mip-nerf 360: Unbounded anti-aliased neural radiance fields

    J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman, “Mip-nerf 360: Unbounded anti-aliased neural radiance fields,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5460–5469, 2021, doi: 10.1109/CVPR52688.2022.00539

  36. [37]

    Learned initializations for optimizing coordinate-based neural representations

    M. Tancik, B. Mildenhall, T. Wang, D. Schmidt, P. P. Srinivasan, J. T. Barron, and R. Ng, “Learned initializations for optimizing coordinate-based neural representations,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2845–2854, 2020, doi: 10.1109/CVPR46437.2021.00287

  37. [38]

    Instant neural graphics primitives with a multiresolution hash encoding

    T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a multiresolution hash encoding,” ACM Transactions on Graphics (TOG), vol. 41, pp. 1–15, 2022, doi: 10.1145/3528223.3530127

  38. [39]

    Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans

    S. Peng, Y. Zhang, Y. Xu, Q. Wang, Q. Shuai, H. Bao, and X. Zhou, “Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9050–9059, 2020, doi: 10.1109/CVPR46437.2021.00894

  39. [40]

    Stereo magnification with multi-layer images

    T. Khakhulin, D. Korzhenkov, P. Solovev, G. Sterkin, A.-T. Ardelean, and V. S. Lempitsky, “Stereo magnification with multi-layer images,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8677–8686, 2022, doi: 10.1109/CVPR52688.2022.00849

  40. [41]

    Light field reconstruction via deep adaptive fusion of hybrid lenses

    J. Jin, M. Guo, J. Hou, H. Liu, and H. Xiong, “Light field reconstruction via deep adaptive fusion of hybrid lenses,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 10, pp. 12 050–12 067, 2023, doi: 10.1109/TPAMI.2023.3287603

  41. [42]

    Spatial-angular attention network for light field reconstruction

    G. Wu, Y. Wang, Y. Liu, L. Fang, and T. Chai, “Spatial-angular attention network for light field reconstruction,” IEEE Transactions on Image Processing, vol. 30, pp. 8999–9013, 2021, doi: 10.1109/TIP.2021.3122089

  42. [43]

    Learning fused pixel and feature-based view reconstructions for light fields

    J. Shi, X. Jiang, and C. Guillemot, “Learning fused pixel and feature-based view reconstructions for light fields,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 2552–2561, doi: 10.1109/CVPR42600.2020.00263

  43. [44]

    Project starline: A high-fidelity telepresence system

    J. Lawrence, D. B. Goldman, S. Achar, G. M. Blascovich, J. G. Desloge, T. Fortes, E. M. Gomez, S. Häberling, H. Hoppe, A. Huibers et al., “Project starline: A high-fidelity telepresence system,” 2021, doi: 10.1145/3478513.3480490

  44. [45]

    Virtualcube: An immersive 3d video communication system

    Y. Zhang, J. Yang, Z. Liu, R. Wang, G. Chen, X. Tong, and B. Guo, “Virtualcube: An immersive 3d video communication system,” IEEE Transactions on Visualization and Computer Graphics, vol. PP, pp. 1–1, 2021, doi: 10.1109/TVCG.2022.3150512

  45. [47]

    Adam: A method for stochastic optimization

    D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014. [Online]. Available: https://api.semanticscholar.org/CorpusID:6628106

  46. [48]

    Nex: Real-time view synthesis with neural basis expansion

    S. Wizadwongsa, P. Phongthawee, J. Yenphraphai, and S. Suwajanakorn, “Nex: Real-time view synthesis with neural basis expansion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8534–8543, doi: 10.1109/CVPR46437.2021.00843

  47. [49]

    Soft 3d reconstruction for view synthesis

    E. Penner and L. Zhang, “Soft 3d reconstruction for view synthesis,” ACM Transactions on Graphics (TOG), vol. 36, pp. 1–11, 2017, doi: 10.1145/3130800.3130855

  48. [50]

    Stereo matching with transparency and matting

    R. Szeliski and P. Golland, “Stereo matching with transparency and matting,” International Journal of Computer Vision, vol. 32, pp. 45–61, 1998, doi: 10.1109/ICCV.1998.710766

  49. [51]

    Mvsnet: Depth inference for unstructured multi-view stereo

    Y. Yao, Z. Luo, S. Li, T. Fang, and L. Quan, “Mvsnet: Depth inference for unstructured multi-view stereo,” in European Conference on Computer Vision, 2018, doi: 10.1007/978-3-030-01237-3_47

    Yijie Deng is currently a master student in Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua University. He received B.E. from Wuhan University in 2021. His...