RealLiFe: Real-Time Light Field Reconstruction via Hierarchical Sparse Gradient Descent

Jinzhi Zhang; Lei Han; Lin Li; Lu Fang; Tianpeng Lin; Yijie Deng

arXiv: 2307.03017 · v5 · submitted 2023-07-06 · 💻 cs.CV

Yijie Deng, Lei Han, Tianpeng Lin, Lin Li, Jinzhi Zhang, Lu Fang

Pith reviewed 2026-05-24 07:43 UTC · model grok-4.3

classification 💻 cs.CV

keywords light field reconstruction · multi-plane images · real-time novel view synthesis · sparse gradient descent · extended reality · hierarchical optimization · 3D CNN

The pith

RealLiFe reconstructs high-quality light fields from sparse views in real time by optimizing only the sparse gradients of multi-plane images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to meet the demand for real-time light field reconstruction needed by extended reality systems when only a few input images are available. It starts from the observation that multi-plane images contain an intrinsic sparse structure that can be used to speed up the process. A 3D convolutional network first produces a coarse scene representation, which is then refined in just a few steps by applying hierarchical sparse gradient descent restricted to content-aligned gradients. If correct, this yields visual results comparable to slow offline techniques yet runs two orders of magnitude faster and improves on existing real-time alternatives.

Core claim

The authors introduce RealLiFe, a method that first generates a coarse multi-plane image using a 3D CNN and then applies Hierarchical Sparse Gradient Descent to optimize only the scene-content-aligned sparse MPI gradients in a small number of iterations, thereby producing high-quality light fields from sparse inputs at real-time speeds.

What carries the argument

Hierarchical Sparse Gradient Descent (HSGD), which restricts each optimization step to the sparse gradients aligned with scene content inside the multi-plane image representation.
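
To make the idea concrete, here is a minimal sketch in pure Python of a sparse alpha update for a single ray through an MPI. It is not the authors' implementation. The function names (`composite`, `alpha_gradients`, `sparse_step`), the scalar colors, the squared-error loss, the top-k selection by gradient magnitude, and the fixed learning rate are all assumptions made for illustration, and the coarse-to-fine hierarchy is left out entirely. The sketch uses standard front-to-back over-compositing and updates only the k planes whose alpha gradients are largest.

```python
def composite(colors, alphas):
    """Front-to-back over-compositing of one ray through the planes."""
    out, trans = 0.0, 1.0
    for c, a in zip(colors, alphas):
        out += trans * a * c      # contribution of this plane
        trans *= 1.0 - a          # light that passes through to planes behind
    return out


def alpha_gradients(colors, alphas, target):
    """Gradient of (C - target)^2 with respect to each plane's alpha."""
    n = len(alphas)
    behind = [0.0] * n            # behind[d]: composite of planes strictly behind d
    acc = 0.0
    for d in range(n - 1, -1, -1):
        behind[d] = acc
        acc = alphas[d] * colors[d] + (1.0 - alphas[d]) * acc
    residual = acc - target       # acc now equals the full composite C
    grads, trans = [], 1.0
    for d in range(n):
        # dC/d(alpha_d) = T_d * (c_d - behind_d), with T_d the transmittance
        grads.append(2.0 * residual * trans * (colors[d] - behind[d]))
        trans *= 1.0 - alphas[d]
    return grads


def sparse_step(colors, alphas, target, k, lr):
    """Update only the k planes with the largest gradient magnitude."""
    grads = alpha_gradients(colors, alphas, target)
    order = sorted(range(len(grads)), key=lambda d: abs(grads[d]), reverse=True)
    selected = order[:k]
    updated = list(alphas)
    for d in selected:
        # Plain gradient step, clamped so alpha stays a valid opacity
        updated[d] = min(1.0, max(0.0, alphas[d] - lr * grads[d]))
    return updated, selected


if __name__ == "__main__":
    colors = [0.9, 0.2, 0.6, 0.1]   # toy per-plane intensities (hypothetical)
    alphas = [0.1, 0.1, 0.1, 0.1]   # stands in for a coarse initial MPI
    target = 0.6                    # observed pixel intensity (hypothetical)
    for step in range(5):
        alphas, selected = sparse_step(colors, alphas, target, k=2, lr=0.5)
        loss = (composite(colors, alphas) - target) ** 2
        print(step, sorted(selected), round(loss, 6))
```

Setting `k` equal to the number of planes turns the same loop into a dense update, which gives a simple way to compare the two regimes on toy data.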

If this is right

Produces visual quality comparable to state-of-the-art offline methods while running roughly 100 times faster on average.
Delivers approximately 2 dB higher PSNR than prior online approaches.
Enables real-time novel-view synthesis suitable for extended-reality applications from only a few input images.
Maintains rendering quality by exploiting the sparse manifold property of multi-plane images.
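
For readers who want to interpret the 2 dB figure, the sketch below computes PSNR from its standard definition, 10·log10(peak² / MSE), and converts a PSNR gain into the equivalent reduction in mean squared error. The function names and the choice of `peak=1.0` (intensities normalized to [0, 1]) are assumptions for illustration. The paper's exact evaluation protocol is not stated on this page.

```python
import math


def mse(reference, rendered):
    """Mean squared error between two equally sized lists of intensities."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)


def psnr(reference, rendered, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(reference, rendered)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)


def mse_ratio_for_gain(db_gain):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (db_gain / 10.0)
```

Under this definition a 2 dB gain corresponds to an MSE that is lower by a factor of about 1.58, that is, roughly 37% less squared error at the same peak value.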

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same sparse-gradient principle could be tested on other layered or volumetric scene representations that exhibit similar sparsity.
If the speed advantage holds on mobile hardware, the technique may allow light-field rendering inside standalone XR headsets without cloud offload.
Extending the hierarchical schedule to video sequences could be examined to support dynamic scenes.

Load-bearing premise

The multi-plane image representation of a scene possesses an intrinsic sparse manifold that permits rapid optimization without loss of rendering quality.
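
One heuristic way to see why such sparsity is plausible is to write out the standard front-to-back over-compositing model for a single pixel. The notation below is introduced here for illustration and is an assumption. The paper's own formulation is not reproduced on this page.

```latex
% Assumed notation: D planes ordered front to back,
% per-pixel color c_d and opacity \alpha_d on plane d.
\begin{align}
C &= \sum_{d=1}^{D} T_d\,\alpha_d\,c_d,
  \qquad T_d = \prod_{j=1}^{d-1} (1-\alpha_j), \\
\frac{\partial C}{\partial \alpha_d} &= T_d\,\bigl(c_d - B_d\bigr),
  \qquad B_d = \sum_{k=d+1}^{D} \alpha_k\,c_k \prod_{j=d+1}^{k-1} (1-\alpha_j).
\end{align}
```

Under this model the gradient with respect to each opacity is scaled by the transmittance T_d, so planes hidden behind nearly opaque content receive almost no gradient, and the gradient also vanishes wherever a plane's color matches what lies behind it. This is a reading of the premise, not a result taken from the paper.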

What would settle it

Measure whether removing the sparse-gradient restriction causes the optimization to require many more iterations or to produce noticeably lower PSNR on the same test scenes.

Figures

Figures reproduced from arXiv: 2307.03017 by Jinzhi Zhang, Lei Han, Lin Li, Lu Fang, Tianpeng Lin, Yijie Deng.

**Figure 2.** The derivation of MPI sparse gradients and the optimization…

**Figure 3.** Application of our method to support a real-time 3D display. (a) Sparse multi-view images that serve as the input to our model. (b) Our…

**Figure 4.** The overview of RealLiFe. First, the PSV is constructed using multiview images by homographic warping, and the PSV is then downsampled hierarchically at multiple resolutions. The output MPI is then generated in several iterations. Initially, the lowest-resolution PSV is fed to a 3D CNN to extract a coarse-level MPI. The PSV and the upsampled MPI both go through the Sparse Gradient Descent module for a refi…

**Figure 5.** This module comprises three major operations: gradi…

**Figure 5.** The sparse gradient descent module. (a) The MPI gradients…

**Figure 6.** Proportions of empty pixels of individual MPI planes throughout optimization iterations. (a) A rendered image of…

**Figure 7.** The influence of k on the rendering quality. (a) A rendered image of RealLiFe (without using HSGD) with 40 MPI planes. (b) Top 5 MPI planes with the highest alpha gradients A for the red pixel. (The MPI is multiplied by A to better visualize its contribution to the final rendering result.) (c) The alpha gradients of the red pixel across 40 MPI planes, with the red dotted lines partitioning d based on wheth…

**Figure 8.** The backbone 3D CNN of RealLiFe, which contains only three convolution layers. CNNs responsible for converting the input plane sweep volume into a multi-plane image (as depicted in…

**Figure 9.** Qualitative results on Real Forward-Facing [9] (a) and (b), SWORD [50] (c) and (d) and Shiny [48] (e) evaluation datasets.

**Figure 10.** Extra qualitative comparison on Shiny [8] and IBRNet collected [5].

**Figure 11.** Qualitative comparison on the ablation configuration of…

**Figure 12.** The border artifacts of our approach. The borders of our ren…

Original abstract

With the rise of Extended Reality (XR) technology, there is a growing need for real-time light field reconstruction from sparse view inputs. Existing methods can be classified into offline techniques, which can generate high-quality novel views but at the cost of long inference/training time, and online methods, which either lack generalizability or produce unsatisfactory results. However, we have observed that the intrinsic sparse manifold of Multi-plane Images (MPI) enables a significant acceleration of light field reconstruction while maintaining rendering quality. Based on this insight, we introduce RealLiFe, a novel light field optimization method, which leverages the proposed Hierarchical Sparse Gradient Descent (HSGD) to produce high-quality light fields from sparse input images in real time. Technically, the coarse MPI of a scene is first generated using a 3D CNN, and it is further optimized leveraging only the scene-content-aligned sparse MPI gradients in a few iterations. Extensive experiments demonstrate that our method achieves comparable visual quality while being 100x faster on average than state-of-the-art offline methods and delivers better performance (about 2 dB higher in PSNR) compared to other online approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces HSGD to refine coarse MPI outputs from a 3D CNN for real-time light field reconstruction, but the sparse gradient claim rests on external comparisons without an internal full-gradient control.

The main takeaway is that this work proposes Hierarchical Sparse Gradient Descent (HSGD) as a post-processing step on Multi-Plane Images. After a 3D CNN produces an initial coarse MPI from sparse views, HSGD updates only a subset of gradients aligned with scene content over a few iterations. The abstract positions this as delivering real-time performance with quality close to offline methods and a 2 dB PSNR edge over other online ones, based on the idea that MPI representations have an intrinsic sparse manifold.

Referee Report

2 major / 1 minor

Summary. The manuscript presents RealLiFe, a method for real-time light field reconstruction from sparse input images. It first uses a 3D CNN to generate a coarse Multi-plane Image (MPI) representation of the scene, then applies Hierarchical Sparse Gradient Descent (HSGD) to optimize only the scene-content-aligned sparse gradients in a few iterations. The authors claim that this approach achieves visual quality comparable to state-of-the-art offline methods while being approximately 100x faster, and outperforms other online methods by about 2 dB in PSNR.

Significance. If the experimental claims hold after proper controls, the work could enable practical real-time light-field rendering for XR by demonstrating that MPI sparsity permits rapid optimization without quality loss.

major comments (2)

[Abstract] The central claims of 100x average speedup versus offline methods and ~2 dB PSNR gain versus online methods are stated without any dataset names, scene counts, baseline implementation details, or error bars, so the quantitative results cannot be inspected or reproduced from the given evidence.
[Abstract / Experiments] No ablation compares HSGD sparse-gradient selection against an otherwise identical full-gradient optimizer on the same coarse MPI produced by the 3D CNN. Without this internal control, it is impossible to confirm that the claimed quality preservation is due to the asserted intrinsic sparse manifold rather than the coarse initialization or other factors.

minor comments (1)

Notation for the hierarchical levels of HSGD and the precise definition of the scene-content-aligned sparse mask should be given explicitly with equations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. Below we provide point-by-point responses to the major comments and indicate planned revisions.

Referee: [Abstract] The central claims of 100x average speedup versus offline methods and ~2 dB PSNR gain versus online methods are stated without any dataset names, scene counts, baseline implementation details, or error bars, so the quantitative results cannot be inspected or reproduced from the given evidence.

Authors: We acknowledge that space constraints in the abstract prevent inclusion of all experimental specifics. The manuscript's Experiments section details the datasets, scene counts, and baseline implementations, and reports results with error bars. To improve self-containment, we will revise the abstract to reference the primary evaluation datasets and clarify that the reported figures are averages drawn from those controlled experiments. revision: yes

Referee: [Abstract / Experiments] No ablation compares HSGD sparse-gradient selection against an otherwise identical full-gradient optimizer on the same coarse MPI produced by the 3D CNN. Without this internal control, it is impossible to confirm that the claimed quality preservation is due to the asserted intrinsic sparse manifold rather than the coarse initialization or other factors.

Authors: This observation is correct; the current experiments focus on external method comparisons rather than an internal sparse-versus-full gradient ablation on identical 3D-CNN initializations. We will add this controlled ablation to the revised Experiments section, using the same coarse MPI for both optimizers, to directly demonstrate the contribution of the sparse manifold. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external baselines and an empirical observation, not self-referential fits or citations

The paper presents the sparse manifold of MPI as an observed property that motivates HSGD, then reports PSNR gains and speedups measured exclusively against external offline and online baselines. No equations or sections reduce the claimed 2 dB improvement or 100x speedup to quantities fitted inside the same paper, nor do any load-bearing steps rely on self-citations whose content is unverified. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on the domain assumption that MPI representations possess exploitable sparsity; no free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption The intrinsic sparse manifold of Multi-plane Images (MPI) enables a significant acceleration of light field reconstruction while maintaining rendering quality.
Directly stated in abstract as the enabling observation for HSGD.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "the intrinsic sparse manifold of Multi-plane Images (MPI) enables a significant acceleration of light field reconstruction while maintaining rendering quality... Hierarchical Sparse Gradient Descent (HSGD)... only employs sparsely gradients that align with the scene contents"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "sparsity loss Ls that regularizes the alpha values of the MPI to be close to 0 or 1"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learn2Splat: Extending the Horizon of Learned 3DGS Optimization
cs.CV 2026-05 unverdicted novelty 7.0

A meta-learned optimizer for 3DGS that extends the optimization horizon via checkpoint buffers and latent gradient-scale encoding, delivering better early novel-view quality and long-term stability with zero-shot gene...

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] J. Flynn, M. Broxton, P. E. Debevec, M. DuVall, G. Fyffe, R. S. Overbeck, N. Snavely, and R. Tucker, “Deepview: View synthesis with learned gradient descent,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2362–2371, 2019, doi: 10.1109/CVPR.2019.00247
[2] M. Broxton, J. Flynn, R. S. Overbeck, D. Erickson, P. Hedman, M. DuVall, J. Dourgarian, J. Busch, M. Whalen, and P. E. Debevec, “Immersive light field video with a layered mesh representation,” ACM Transactions on Graphics (TOG), vol. 39, pp. 86:1–86:15, 2020, doi: 10.1145/3386569.3392485
[3] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” in European Conference on Computer Vision, 2020, doi: 10.1145/3503250
[4] J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan, “Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5835–5844, 2021, doi: 10.1109/ICCV48922.2021.00580
[5] Q. Wang, Z. Wang, K. Genova, P. P. Srinivasan, H. Zhou, J. T. Barron, R. Martin-Brualla, N. Snavely, and T. A. Funkhouser, “Ibrnet: Learning multi-view image-based rendering,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4688–4697, 2021, doi: 10.1109/CVPR46437.2021.00466
[6] A. Chen, Z. Xu, F. Zhao, X. Zhang, F. Xiang, J. Yu, and H. Su, “Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14104–14113, 2021, doi: 10.1109/ICCV48922.2021.01386
[7] H. Lin, S. Peng, Z. Xu, Y. Yan, Q. Shuai, H. Bao, and X. Zhou, “Efficient neural radiance fields for interactive free-viewpoint video,” SIGGRAPH Asia 2022 Conference Papers, 2021, doi: 10.1145/3550469.3555376
[9] B. Mildenhall, P. P. Srinivasan, R. Ortiz-Cayon, N. K. Kalantari, R. Ramamoorthi, R. Ng, and A. Kar, “Local light field fusion: Practical view synthesis with prescriptive sampling guidelines,” ACM Trans. Graph., vol. 38, no. 4, jul 2019, doi: 10.1145/3306346.3322980
[10] D. B. Lindell, J. N. P. Martel, and G. Wetzstein, “Autoint: Automatic integration for fast neural volume rendering,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14551–14560, 2020, doi: 10.1109/CVPR46437.2021.01432
[11] D. Rebain, W. Jiang, S. Yazdani, K. Li, K. M. Yi, and A. Tagliasacchi, “Derf: Decomposed radiance fields,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14148–14156, 2020, doi: 10.1109/CVPR46437.2021.01393
[12] C. Reiser, S. Peng, Y. Liao, and A. Geiger, “Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14315–14325, 2021, doi: 10.1109/ICCV48922.2021.01407
[13] S. J. Garbin, M. Kowalski, M. Johnson, J. Shotton, and J. P. C. Valentin, “Fastnerf: High-fidelity neural rendering at 200fps,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14326–14335, 2021, doi: 10.1109/ICCV48922.2021.01408
[14] Y. Liu, S. Peng, L. Liu, Q. Wang, P. Wang, C. Theobalt, X. Zhou, and W. Wang, “Neural rays for occlusion-aware image-based rendering,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7814–7823, 2022, doi: 10.1109/CVPR52688.2022.00767
[15] A. Trevithick and B. Yang, “Grf: Learning a general radiance field for 3d representation and rendering,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 15162–15172, doi: 10.1109/WACV56688.2023.00429
[16] A. Yu, V. Ye, M. Tancik, and A. Kanazawa, “pixelnerf: Neural radiance fields from one or few images,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4576–4585, 2021, doi: 10.1109/CVPR46437.2021.00455
[17] J. Chibane, A. Bansal, V. Lazova, and G. Pons-Moll, “Stereo radiance fields (srf): Learning view synthesis for sparse views of novel scenes,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7907–7916, 2021, doi: 10.1109/CVPR46437.2021.00782
[18] A. Yu, S. Fridovich-Keil, M. Tancik, Q. Chen, B. Recht, and A. Kanazawa, “Plenoxels: Radiance fields without neural networks,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5491–5500, 2021, doi: 10.1109/CVPR52688.2022.00542
[19] A. Yu, R. Li, M. Tancik, H. Li, R. Ng, and A. Kanazawa, “Plenoctrees for real-time rendering of neural radiance fields,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5732–5741, 2021, doi: 10.1109/ICCV48922.2021.00570
[20] C. Sun, M. Sun, and H.-T. Chen, “Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5449–5459, 2021, doi: 10.1109/CVPR52688.2022.00538
[21] T. Zhou, R. Tucker, J. Flynn, G. Fyffe, and N. Snavely, “Stereo magnification: Learning view synthesis using multiplane images,” ArXiv, vol. abs/1805.09817, 2018, doi: 10.1145/3197517.3201323
[22] P. P. Srinivasan, R. Tucker, J. T. Barron, R. Ramamoorthi, R. Ng, and N. Snavely, “Pushing the boundaries of view extrapolation with multiplane images,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 175–184, doi: 10.1109/CVPR.2019.00026
[23] K. Deng, A. Liu, J.-Y. Zhu, and D. Ramanan, “Depth-supervised nerf: Fewer views and faster training for free,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12872–12881, 2021, doi: 10.1109/CVPR52688.2022.01254
[24] J. Chen, Z. Yu, L. Ma, and K. Zhang, “Uncertainty awareness with adaptive propagation for multi-view stereo,” Applied Intelligence, 2023, doi: 10.1007/s10489-023-04910-z. [Online]. Available: https://api.semanticscholar.org/CorpusID:261038143
[25] P. Jiang, X. Yang, Y.-R. Chen, W.-Z. Song, and Y. Li, “Adaptmvsnet: Efficient multi-view stereo with adaptive convolution and attention fusion,” Computers & Graphics, 2023, doi: 10.1016/j.cag.2023.08.014. [Online]. Available: https://api.semanticscholar.org/CorpusID:260792500
[26] R. Weilharter and F. Fraundorfer, “Highres-mvsnet: A fast multi-view stereo network for dense 3d reconstruction from high-resolution images,” IEEE Access, vol. 9, pp. 11306–11315, 2021, doi: 10.1109/ACCESS.2021.3050556
[27] C.-Y. Chiu, Y.-T. Wu, I.-C. Shen, and Y.-Y. Chuang, “360mvsnet: Deep multi-view stereo network with 360° images for indoor scene reconstruction,” in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3056–3065, doi: 10.1109/WACV56688.2023.00307
[28] W. Su, Q. Xu, and W. Tao, “Uncertainty guided multi-view stereo network for depth estimation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 11, pp. 7796–7808, 2022, doi: 10.1109/TCSVT.2022.3183836
[29] X. Guan, W. Tong, S. Jiang, P. Z. H. Sun, E. Q. Wu, and G. Chen, “Multistage pixel-visibility learning with cost regularization for multiview stereo,” IEEE Transactions on Automation Science and Engineering, vol. 20, no. 2, pp. 751–762, 2023, doi: 10.1109/TASE.2022.3165944
[30] J. Adler and O. Öktem, “Learned primal-dual reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, pp. 1322–1332, 2017, doi: 10.1109/TMI.2018.2799231
[31] J. Adler and O. Öktem, “Solving ill-posed inverse problems using iterative deep neural networks,” Inverse Problems, vol. 33, p. 124007, 2017, doi: 10.1088/1361-6420/aa9581
[32] G. Wetzstein, D. Lanman, M. Hirsch, and R. Raskar, “Tensor displays,” ACM Transactions on Graphics (TOG), vol. 31, pp. 1–11, 2012, doi: 10.1145/2343456.2343480
[33] T. K. Porter and T. D. S. Duff, “Compositing digital images,” international conference on computer graphics and interactive techniques, 1984, doi: 10.1145/964965.808606
[34] B. Mildenhall, P. Hedman, R. Martin-Brualla, P. P. Srinivasan, and J. T. Barron, “Nerf in the dark: High dynamic range view synthesis from noisy raw images,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16169–16178, 2021, doi: 10.1109/CVPR52688.2022.01571
[35] D. Verbin, P. Hedman, B. Mildenhall, T. E. Zickler, J. T. Barron, and P. P. Srinivasan, “Ref-nerf: Structured view-dependent appearance for neural radiance fields,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5481–5490, 2021, doi: 10.1109/CVPR52688.2022.00541
[36] J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman, “Mip-nerf 360: Unbounded anti-aliased neural radiance fields,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5460–5469, 2021, doi: 10.1109/CVPR52688.2022.00539
[37] M. Tancik, B. Mildenhall, T. Wang, D. Schmidt, P. P. Srinivasan, J. T. Barron, and R. Ng, “Learned initializations for optimizing coordinate-based neural representations,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2845–2854, 2020, doi: 10.1109/CVPR46437.2021.00287
[38] T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a multiresolution hash encoding,” ACM Transactions on Graphics (TOG), vol. 41, pp. 1–15, 2022, doi: 10.1145/3528223.3530127
[39] S. Peng, Y. Zhang, Y. Xu, Q. Wang, Q. Shuai, H. Bao, and X. Zhou, “Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9050–9059, 2020, doi: 10.1109/CVPR46437.2021.00894
[40] T. Khakhulin, D. Korzhenkov, P. Solovev, G. Sterkin, A.-T. Ardelean, and V. S. Lempitsky, “Stereo magnification with multi-layer images,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8677–8686, 2022, doi: 10.1109/CVPR52688.2022.00849
[41] J. Jin, M. Guo, J. Hou, H. Liu, and H. Xiong, “Light field reconstruction via deep adaptive fusion of hybrid lenses,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 10, pp. 12050–12067, 2023, doi: 10.1109/TPAMI.2023.3287603
[42] G. Wu, Y. Wang, Y. Liu, L. Fang, and T. Chai, “Spatial-angular attention network for light field reconstruction,” IEEE Transactions on Image Processing, vol. 30, pp. 8999–9013, 2021, doi: 10.1109/TIP.2021.3122089
[43] J. Shi, X. Jiang, and C. Guillemot, “Learning fused pixel and feature-based view reconstructions for light fields,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 2552–2561, doi: 10.1109/CVPR42600.2020.00263
[44] J. Lawrence, D. B. Goldman, S. Achar, G. M. Blascovich, J. G. Desloge, T. Fortes, E. M. Gomez, S. Häberling, H. Hoppe, A. Huibers et al., “Project starline: A high-fidelity telepresence system,” 2021, doi: 10.1145/3478513.3480490
[45]

Virtualcube: An immersive 3d video communication sys- tem,

Y. Zhang, J. Yang, Z. Liu, R. Wang, G. Chen, X. Tong, and B. Guo, “Virtualcube: An immersive 3d video communication sys- tem,” IEEE Transactions on Visualization and Computer Graphics, vol. PP , pp. 1–1, 2021, doi: 10.1109/TVCG.2022.3150512

work page doi:10.1109/tvcg.2022.3150512 2021
[47]

Adam: A Method for Stochastic Optimization

D. P . Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014. [Online]. Available: https://api.semanticscholar.org/CorpusID:6628106

work page internal anchor Pith review Pith/arXiv arXiv 2014
[48]

Nex: Real-time view synthesis with neural basis expansion

S. Wizadwongsa, P. Phongthawee, J. Yenphraphai, and S. Suwajanakorn, “Nex: Real-time view synthesis with neural basis expansion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8534–8543, doi: 10.1109/CVPR46437.2021.00843
[49]

Soft 3d reconstruction for view synthesis

E. Penner and L. Zhang, “Soft 3d reconstruction for view synthesis,” ACM Transactions on Graphics (TOG), vol. 36, pp. 1–11, 2017, doi: 10.1145/3130800.3130855
[50]

Stereo matching with transparency and matting

R. Szeliski and P. Golland, “Stereo matching with transparency and matting,” International Journal of Computer Vision, vol. 32, pp. 45–61, 1998, doi: 10.1109/ICCV.1998.710766
[51]

Mvsnet: Depth inference for unstructured multi-view stereo

Y. Yao, Z. Luo, S. Li, T. Fang, and L. Quan, “Mvsnet: Depth inference for unstructured multi-view stereo,” in European Conference on Computer Vision, 2018, doi: 10.1007/978-3-030-01237-3_47

Yijie Deng is currently a master's student at the Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua University. He received his B.E. from Wuhan University in 2021. His...
