Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement
Pith reviewed 2026-05-10 17:54 UTC · model grok-4.3
The pith
A Gaussian-based framework reconstructs immersive volumetric videos from multi-view audiovisual data for large 6-DoF VR spaces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce Immersive Volumetric Videos as a new volumetric media format for large 6-DoF interaction spaces with audiovisual feedback and high-resolution dynamic content. Built on the ImViD multi-view multi-modal dataset, our dynamic light field reconstruction framework employs a Gaussian-based spatio-temporal representation with flow-guided sparse initialization, joint camera temporal calibration, and multi-term supervision. We also present the first method for sound field reconstruction from multi-view audiovisual data, forming a unified pipeline whose benchmarks and VR experiments show high-quality, temporally stable output.
What carries the argument
Gaussian-based spatio-temporal representation incorporating flow-guided sparse initialization, joint camera temporal calibration, and multi-term spatio-temporal supervision for modeling complex motions and audiovisual fields.
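The loss itself is not spelled out on this page; as a rough, hedged sketch of what a weighted multi-term spatio-temporal objective of this kind usually looks like (the term names and weights below are illustrative assumptions, not the authors' formulation):

```python
import torch
import torch.nn.functional as F

def multi_term_loss(rendered, target, rendered_flow, flow_prior,
                    gaussian_pos_t, gaussian_pos_t1,
                    w_photo=1.0, w_flow=0.1, w_smooth=0.01):
    """Hypothetical weighted multi-term spatio-temporal objective.

    rendered, target:          (B, 3, H, W) rendered and captured frames
    rendered_flow, flow_prior: (B, 2, H, W) rendered motion vs. an external 2D flow prior
    gaussian_pos_t, gaussian_pos_t1: (N, 3) Gaussian centers at consecutive time steps
    """
    # Photometric supervision: keep rendered frames close to the captured views.
    photo = F.l1_loss(rendered, target)

    # Flow supervision: keep rendered motion consistent with an optical-flow prior.
    flow = F.l1_loss(rendered_flow, flow_prior)

    # Temporal smoothness: penalize large per-Gaussian displacement between frames.
    smooth = (gaussian_pos_t1 - gaussian_pos_t).norm(dim=-1).mean()

    return w_photo * photo + w_flow * flow + w_smooth * smooth
```

The relative weights in such a sum are exactly the kind of coefficients the referee report below asks to see reported or ablated.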
If this is right
- The pipeline generates high-quality, temporally stable audiovisual volumetric content.
- It enables large 6-DoF interaction spaces in VR.
- It handles complex indoor and outdoor scenes with rich foreground-background interactions.
- Sound field reconstruction integrates with visual reconstruction for synchronized audiovisual feedback.
Where Pith is reading between the lines
- Such captured volumetric videos could be extended to real-time streaming for live events if reconstruction speed improves.
- Combining this with existing compression techniques might allow distribution of IVV content over networks.
- Testing the framework on even more challenging dynamics like fast-moving crowds could reveal scalability limits.
Load-bearing premise
That the Gaussian representation with the specified initialization and supervision terms can robustly and accurately capture complex real-world motions and audiovisual fields from the multi-view data.
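To make the premise concrete, here is a minimal sketch of what flow-guided sparse initialization could look like in general (an assumption about the generic technique, not the paper's implementation): sample pixels where optical flow is large and back-project them with depth to seed dynamic Gaussian centers.

```python
import numpy as np

def flow_guided_seed_points(flow, depth, K, flow_thresh=1.0, stride=8):
    """Hypothetical sketch: seed sparse 3D points where optical flow is large.

    flow:  (H, W, 2) optical flow between consecutive frames, in pixels
    depth: (H, W)    per-pixel depth for the same frame (e.g., from a depth estimator)
    K:     (3, 3)    camera intrinsics
    Returns (M, 3) candidate Gaussian centers in camera coordinates.
    """
    H, W = depth.shape
    mag = np.linalg.norm(flow, axis=-1)

    # Sparse grid of candidate pixels, kept only where motion is significant.
    ys, xs = np.mgrid[0:H:stride, 0:W:stride]
    ys, xs = ys.ravel(), xs.ravel()
    keep = mag[ys, xs] > flow_thresh
    ys, xs = ys[keep], xs[keep]

    # Back-project the selected pixels using their depth values.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    z = depth[ys, xs]
    x = (xs - cx) / fx * z
    y = (ys - cy) / fy * z
    return np.stack([x, y, z], axis=-1)
```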
What would settle it
Observing significant visual artifacts, temporal instability, or desynchronized audio in the reconstructed content during VR immersion tests with rapid object movements would indicate the modeling approach fails.
Original abstract
Fully immersive experiences that tightly integrate 6-DoF visual and auditory interaction are essential for virtual and augmented reality. While such experiences can be achieved through computer-generated content, constructing them directly from real-world captured videos remains largely unexplored. We introduce Immersive Volumetric Videos (IVV), a new volumetric media format designed to provide large 6-DoF interaction spaces, audiovisual feedback, and high-resolution, high-frame-rate dynamic content. To support IVV construction, we present ImViD, a multi-view, multi-modal dataset built upon a space-oriented capture philosophy. Our custom capture rig enables synchronized multi-view video-audio acquisition during motion, facilitating efficient capture of complex indoor and outdoor scenes with rich foreground-background interactions and challenging dynamics. The dataset provides 5K-resolution videos at 60 FPS with durations of 1-5 minutes, offering richer spatial, temporal, and multimodal coverage than existing benchmarks. Leveraging this dataset, we develop a dynamic light field reconstruction framework built upon a Gaussian-based spatio-temporal representation, incorporating flow-guided sparse initialization, joint camera temporal calibration, and multi-term spatio-temporal supervision for robust and accurate modeling of complex motion. We further propose, to our knowledge, the first method for sound field reconstruction from such multi-view audiovisual data. Together, these components form a unified pipeline for immersive volumetric video production. Extensive benchmarks and immersive VR experiments demonstrate that our pipeline generates high-quality, temporally stable audiovisual volumetric content with large 6-DoF interaction spaces. This work provides both a foundational definition and a practical construction methodology for immersive volumetric videos.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines Immersive Volumetric Videos (IVV) as a new multimodal format enabling large 6-DoF audiovisual interaction in VR from real-world captures. It introduces the ImViD dataset captured via a custom multi-view rig (5K@60 FPS, 1-5 min sequences with foreground-background dynamics) and a reconstruction pipeline based on a Gaussian spatio-temporal representation that incorporates flow-guided sparse initialization, joint camera temporal calibration, multi-term spatio-temporal supervision, and the first reported sound-field reconstruction from such audiovisual data. Extensive benchmarks and immersive VR user studies are reported to demonstrate temporally stable, high-quality outputs supporting large interaction spaces.
Significance. If the experimental results hold, the work supplies both a foundational definition and an end-to-end construction methodology for real-world immersive volumetric media that jointly handles visual and auditory 6-DoF content. The ImViD dataset and the sound-field reconstruction component constitute concrete, reusable contributions that extend beyond existing visual-only volumetric video pipelines. The coherent integration of Gaussian representation, flow initialization, and multi-term supervision addresses a recognized gap in modeling complex real-world motion and audio fields.
Minor comments (3)
- Abstract: the phrase 'to our knowledge, the first method for sound field reconstruction' would be strengthened by a brief sentence situating the claim against the closest prior audiovisual reconstruction works (e.g., those using ambisonics or multi-view audio).
- §3 (Method): the multi-term supervision loss is described at a high level; adding the explicit weighting coefficients or a short ablation on their relative importance would improve reproducibility.
- Figure 7 and associated VR experiment description: the quantitative metrics for temporal stability (e.g., warping error or flicker index) are mentioned but not tabulated; a small table summarizing these values across sequences would make the stability claim easier to verify.
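As a point of reference for this last comment, a generic flicker-style stability score can be as simple as the mean per-pixel change between consecutive rendered frames; the sketch below is a hedged illustration of such a metric, not the one used in the paper.

```python
import numpy as np

def flicker_index(frames, static_mask=None):
    """Hypothetical flicker-style temporal stability score.

    frames:      (T, H, W, 3) rendered video as float arrays in [0, 1]
    static_mask: optional (H, W) boolean mask of regions expected to stay still
    Returns the mean absolute per-pixel change between consecutive frames
    (lower = more temporally stable).
    """
    diffs = np.abs(np.diff(frames, axis=0))  # (T-1, H, W, 3)
    if static_mask is not None:
        diffs = diffs[:, static_mask]        # keep only pixels in static regions
    return float(diffs.mean())
```

A warping-error variant would additionally compensate for true motion using optical flow before differencing; either form would serve the small summary table the comment requests.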
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work and the recommendation for minor revision. The provided summary correctly captures the core contributions: the definition of Immersive Volumetric Videos (IVV), the ImViD dataset captured with our custom multi-view rig, the Gaussian spatio-temporal reconstruction pipeline with flow-guided initialization and multi-term supervision, and the sound-field reconstruction component. We appreciate the recognition that these elements address gaps in real-world multimodal 6-DoF volumetric media.
Circularity Check
No significant circularity in derivation chain
Full rationale
The manuscript presents a capture rig, dataset, and reconstruction pipeline (Gaussian spatio-temporal representation, flow-guided initialization, multi-term supervision, sound-field step) whose outputs are validated via benchmarks and VR experiments. No equations, parameter-fitting steps, or first-principles derivations appear in the abstract or described methods; claims rest on empirical results rather than any reduction of predictions to fitted inputs or self-citations. The pipeline is described as coherent and self-contained against external benchmarks, satisfying the default expectation of no circularity.
Axiom & Free-Parameter Ledger
Invented entities (1)
- Immersive Volumetric Videos (IVV): no independent evidence