ViVo: A Dataset for Volumetric Video Reconstruction and Compression
Pith reviewed 2026-05-19 11:50 UTC · model grok-4.3
The pith
The ViVo dataset supplies synchronized multi-view RGB and depth videos with diverse human features and dynamic effects to test reconstruction and compression algorithms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a new dataset, ViVo, for VolumetrIc VideO reconstruction and compression. The dataset is faithful to real-world volumetric video production and is the first dataset to extend the definition of diversity to include both human-centric characteristics (skin, hair, etc.) and dynamic visual phenomena (transparent, reflective, liquid, etc.). Each video sequence contains raw data including fourteen multi-view RGB and depth video pairs, synchronized at 30FPS with per-frame calibration and audio data, and their associated 2-D foreground masks and 3-D point clouds. Benchmarks of state-of-the-art methods evidence the challenging nature of the proposed dataset and the limitations of existing
What carries the argument
The ViVo dataset of synchronized fourteen-view RGB-plus-depth sequences that incorporate both human-centric traits and dynamic phenomena, supplied with calibration, masks, and point clouds for direct use in reconstruction and compression pipelines.
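To make the role of calibration concrete, the sketch below shows how one depth frame and its foreground mask could be lifted into a camera-space point cloud under a standard pinhole model. It is an illustration only: the `Intrinsics` fields, the `backproject` function, and the nested-list frame layout are assumptions made for this example, not ViVo's file format or the authors' point-cloud pipeline.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Intrinsics:
    """Hypothetical pinhole intrinsics (focal lengths and principal point, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    k: Intrinsics,
) -> List[Tuple[float, float, float]]:
    """Lift masked depth pixels into camera-space 3-D points.

    depth[v][u] is the depth along the optical axis at pixel (u, v);
    mask[v][u] is True for foreground pixels. Pixels with non-positive
    depth are treated as invalid and skipped.
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            if not keep or z <= 0.0:
                continue
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            x = (u - k.cx) * z / k.fx
            y = (v - k.cy) * z / k.fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Toy 2x2 frame with made-up values, purely to exercise the function.
    toy_depth = [[1.0, 2.0], [0.0, 4.0]]
    toy_mask = [[True, False], [True, True]]
    toy_k = Intrinsics(fx=2.0, fy=2.0, cx=0.5, cy=0.5)
    print(backproject(toy_depth, toy_mask, toy_k))
```

Moving the points into a shared world frame would additionally need each camera's extrinsic pose, which this sketch leaves out.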
If this is right
- State-of-the-art 3-D reconstruction methods encounter measurable drops in quality on scenes containing transparent or reflective surfaces.
- Existing volumetric compression algorithms show increased bitrate or quality loss when handling dynamic elements such as liquids.
- New algorithms must be developed to address the combination of human-centric and dynamic features present in production pipelines.
- The raw multi-view data can be used directly to train and validate models that aim to overcome these observed limitations.
Where Pith is reading between the lines
- Adoption of ViVo as a benchmark could shift evaluation standards toward content that better matches end-use conditions in immersive applications.
- The presence of synchronized audio alongside visual data suggests opportunities to study joint audio-visual reconstruction tasks.
- Future extensions might isolate the contribution of individual diversity factors, such as liquid motion, to identify which scene elements drive the largest performance gaps.
Load-bearing premise
The selected scenes and diversity criteria sufficiently represent the full range of challenges that arise in real-world volumetric video production pipelines.
What would settle it
State-of-the-art reconstruction and compression methods achieving accuracy and efficiency levels on ViVo that match or exceed their results on prior datasets would indicate the new sequences do not introduce uniquely difficult conditions.
Original abstract
As research on neural volumetric video reconstruction and compression flourishes, there is a need for diverse and realistic datasets, which can be used to develop and validate reconstruction and compression models. However, existing volumetric video datasets lack diverse content in terms of both semantic and low-level features that are commonly present in real-world production pipelines. In this context, we propose a new dataset, ViVo, for VolumetrIc VideO reconstruction and compression. The dataset is faithful to real-world volumetric video production and is the first dataset to extend the definition of diversity to include both human-centric characteristics (skin, hair, etc.) and dynamic visual phenomena (transparent, reflective, liquid, etc.). Each video sequence in this database contains raw data including fourteen multi-view RGB and depth video pairs, synchronized at 30FPS with per-frame calibration and audio data, and their associated 2-D foreground masks and 3-D point clouds. To demonstrate the use of this database, we have benchmarked three state-of-the-art (SotA) 3-D reconstruction methods and two volumetric video compression algorithms. The obtained results evidence the challenging nature of the proposed dataset and the limitations of existing datasets for both volumetric video reconstruction and compression tasks, highlighting the need to develop more effective algorithms for these applications. The database and the associated results are available at https://vivo-bvicr.github.io/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the ViVo dataset for volumetric video reconstruction and compression. Each sequence comprises 14 synchronized multi-view RGB and depth video pairs at 30 FPS with per-frame calibration, audio, 2-D foreground masks, and 3-D point clouds. The central claims are that ViVo is faithful to real-world production pipelines and is the first dataset to extend the definition of diversity to cover both human-centric characteristics (skin, hair variations) and dynamic visual phenomena (transparent, reflective, liquid elements). Benchmarks on three reconstruction and two compression methods are offered as evidence of its challenging nature and of the limitations of prior datasets and methods.
Significance. If the representativeness and diversity claims are substantiated, ViVo would provide a valuable new benchmark that highlights gaps in current volumetric video techniques. The release of raw multi-view data plus derived assets (masks, point clouds) and the public availability of results are positive contributions for reproducibility in the field.
major comments (2)
- [Abstract; Dataset description (likely §3)] The claim that ViVo extends the definition of diversity to human-centric and dynamic phenomena and is 'faithful to real-world' volumetric video production is load-bearing for the paper's contribution and for interpreting the benchmark results as evidence of general limitations. However, no per-sequence or per-frame quantification is supplied (e.g., fraction of sequences exhibiting transparency, reflection, or liquid effects; distribution of skin tones or hair types; comparison to production statistics). This leaves the representativeness assumption unverified and makes it unclear whether observed method failures reflect broad challenges or selection effects in the chosen scenes.
- [Benchmarking section (likely §4)] Benchmarks on three reconstruction and two compression methods are presented to support the 'challenging nature' claim, yet the manuscript provides insufficient detail on evaluation protocols, exact error metrics (e.g., how PSNR/SSIM or geometric errors are computed across frames), and baseline implementation choices. This weakens the ability to interpret the quantitative results as robust evidence of limitations.
minor comments (2)
- [Figures and Tables] Figure captions and table headers should explicitly state the number of sequences or frames used for each reported metric to improve clarity.
- [Abstract] The abstract states 'the obtained results evidence the challenging nature,' but this phrasing should be softened to 'suggest' or 'indicate' pending the added quantification requested above.
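The quantification requested in the first major comment could be as simple as a per-attribute tally over sequence-level annotations. The sketch below assumes a hypothetical mapping from sequence name to a set of tags. The sequence names and tags are placeholders invented for illustration and are not ViVo statistics.

```python
from collections import Counter
from typing import Dict, Set


def attribute_fractions(annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, for each tag, the fraction of sequences annotated with it."""
    total = len(annotations)
    if total == 0:
        return {}
    # Each sequence contributes at most once per tag because tags are sets.
    counts = Counter(tag for tags in annotations.values() for tag in tags)
    return {tag: counts[tag] / total for tag in sorted(counts)}


if __name__ == "__main__":
    # Placeholder annotations: names and tags are made up for this example.
    placeholder = {
        "seq_a": {"transparent", "liquid"},
        "seq_b": {"reflective"},
        "seq_c": {"transparent"},
        "seq_d": set(),
    }
    for tag, fraction in attribute_fractions(placeholder).items():
        print(f"{tag}: {fraction:.2f}")
```

Reporting such fractions next to the number of annotated sequences would also address the minor comment about stating sample sizes.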
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate planned revisions to strengthen the paper.
-
Referee: [Abstract; Dataset description (likely §3)] The claim that ViVo extends the definition of diversity to human-centric and dynamic phenomena and is 'faithful to real-world' volumetric video production is load-bearing for the paper's contribution and for interpreting the benchmark results as evidence of general limitations. However, no per-sequence or per-frame quantification is supplied (e.g., fraction of sequences exhibiting transparency, reflection, or liquid effects; distribution of skin tones or hair types; comparison to production statistics). This leaves the representativeness assumption unverified and makes it unclear whether observed method failures reflect broad challenges or selection effects in the chosen scenes.
Authors: We agree that explicit quantification would better substantiate the diversity and representativeness claims. In the revised manuscript, we will add a dedicated subsection (or table) in the dataset description that reports per-sequence statistics on dynamic phenomena (e.g., counts of sequences containing transparent objects, reflective surfaces, or liquids) and human-centric attributes (e.g., distribution of skin tones and hair types across the sequences). We will also expand the discussion of the capture setup to reference standard volumetric video production practices, clarifying how it aligns with real-world pipelines. These additions should help address concerns about selection effects. revision: yes
-
Referee: [Benchmarking section (likely §4)] Benchmarks on three reconstruction and two compression methods are presented to support the 'challenging nature' claim, yet the manuscript provides insufficient detail on evaluation protocols, exact error metrics (e.g., how PSNR/SSIM or geometric errors are computed across frames), and baseline implementation choices. This weakens the ability to interpret the quantitative results as robust evidence of limitations.
Authors: We concur that additional methodological detail is required for reproducibility and robust interpretation. In the revised §4, we will provide expanded descriptions of the evaluation protocols, including precise definitions and aggregation methods for metrics (e.g., per-frame PSNR/SSIM computation and averaging, geometric error calculations), the exact baseline implementations (including versions, modifications, and hyperparameters), and any preprocessing steps. We will also release the evaluation scripts with the dataset to support verification of the reported results. revision: yes
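Why the aggregation rule matters can be shown with a short sketch. The code below contrasts two common ways of turning per-frame errors into a single PSNR figure: averaging per-frame PSNR values, and computing PSNR from the mean of the per-frame MSEs. The function names and the flat-list frame representation are assumptions for this example. The sketch does not claim to reproduce the protocol used in the paper.

```python
import math
from typing import Sequence

Frame = Sequence[float]  # one frame flattened to a list of pixel values


def mse(ref: Frame, test: Frame) -> float:
    """Mean squared error between two equally sized frames."""
    if len(ref) != len(test) or len(ref) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)


def psnr_from_mse(err: float, peak: float = 255.0) -> float:
    """PSNR in dB for a given MSE; identical signals give infinity."""
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)


def _frame_errors(refs: Sequence[Frame], tests: Sequence[Frame]) -> list:
    if len(refs) != len(tests) or len(refs) == 0:
        raise ValueError("need the same non-zero number of frames")
    return [mse(r, t) for r, t in zip(refs, tests)]


def mean_of_frame_psnr(refs, tests, peak: float = 255.0) -> float:
    """Average the per-frame PSNR values (arithmetic mean in dB)."""
    scores = [psnr_from_mse(e, peak) for e in _frame_errors(refs, tests)]
    return sum(scores) / len(scores)


def psnr_of_mean_mse(refs, tests, peak: float = 255.0) -> float:
    """Pool the per-frame MSEs first, then convert once to PSNR."""
    errors = _frame_errors(refs, tests)
    return psnr_from_mse(sum(errors) / len(errors), peak)


if __name__ == "__main__":
    # Toy data: one lightly distorted frame and one heavily distorted frame.
    refs = [[0.0] * 4, [0.0] * 4]
    tests = [[1.0] * 4, [10.0] * 4]
    print(mean_of_frame_psnr(refs, tests))
    print(psnr_of_mean_mse(refs, tests))
```

Because the logarithm is concave, the mean of per-frame PSNR values is never lower than the PSNR of the pooled MSE, and the gap grows when quality varies strongly between frames. Stating which rule was used is therefore part of making the reported numbers interpretable.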
Circularity Check
No circularity: dataset release with external benchmarks is self-contained
The paper is a dataset release accompanied by empirical benchmarks on three external 3-D reconstruction methods and two compression algorithms. No derivations, predictions, fitted parameters, or self-citations appear in the provided text. Claims of diversity and faithfulness rest on scene selection and observed benchmark performance against independent SotA methods rather than any reduction to internal definitions or prior author work. The work contains no load-bearing steps that equate outputs to inputs by construction.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
The dataset is faithful to real-world volumetric video production and is the first dataset to extend the definition of diversity to include both human-centric characteristics (skin, hair, etc.) and dynamic visual phenomena (transparent, reflective, liquid, etc.).
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
We benchmarked three state-of-the-art (SotA) 3-D reconstruction methods and two volumetric video compression algorithms.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges
Splatography improves dynamic 3D reconstruction from sparse multi-view videos by splitting foreground and background Gaussian representations and applying tailored deformation learning for each.