ViVo: A Dataset for Volumetric Video Reconstruction and Compression

Adrian Azzarelli; David Bull; Fan Zhang; Ge Gao; Ho Man Kwan; Nantheera Anantrasirichai; Ollie Moolan-Feroze

arxiv: 2506.00558 · v2 · submitted 2025-05-31 · 💻 cs.CV

Pith reviewed 2026-05-19 11:50 UTC · model grok-4.3

classification 💻 cs.CV

keywords: volumetric video, dataset, reconstruction, compression, multi-view, diversity, point clouds, neural rendering

The pith

The ViVo dataset supplies synchronized multi-view RGB and depth videos with diverse human features and dynamic effects to test reconstruction and compression algorithms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current volumetric video datasets miss the mix of human details and dynamic visual effects that appear in actual production work. The paper presents ViVo as a collection of sequences that include varied skin tones, hair styles, transparent surfaces, reflections, and liquids while supplying raw multi-view data for each. Every sequence records fourteen synchronized RGB and depth video pairs at 30 frames per second, together with per-frame calibration, audio, 2-D foreground masks, and 3-D point clouds. Benchmarks of three state-of-the-art reconstruction methods and two compression algorithms on this data show clear performance drops compared with simpler datasets. The results indicate that existing techniques encounter previously under-tested difficulties when confronted with this broader range of content.

Core claim

We propose a new dataset, ViVo, for VolumetrIc VideO reconstruction and compression. The dataset is faithful to real-world volumetric video production and is the first dataset to extend the definition of diversity to include both human-centric characteristics (skin, hair, etc.) and dynamic visual phenomena (transparent, reflective, liquid, etc.). Each video sequence contains raw data including fourteen multi-view RGB and depth video pairs, synchronized at 30FPS with per-frame calibration and audio data, and their associated 2-D foreground masks and 3-D point clouds. Benchmarks of state-of-the-art methods evidence the challenging nature of the proposed dataset and the limitations of existing

What carries the argument

The ViVo dataset of synchronized fourteen-view RGB-plus-depth sequences that incorporate both human-centric traits and dynamic phenomena, supplied with calibration, masks, and point clouds for direct use in reconstruction and compression pipelines.
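To make the relationship between depth maps, calibration, and point clouds concrete, here is a minimal sketch of back-projecting one depth frame into 3-D points. It assumes an ideal pinhole camera without lens distortion and a camera-to-world convention of x_world = R x_cam + t. The function names and the parameters `fx`, `fy`, `cx`, `cy`, `R`, and `t` are illustrative assumptions; the page does not say how ViVo stores its calibration or how its point clouds were produced.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (N, 3) array of camera-space points.

    Assumption: ideal pinhole model, depth measured along the optical axis.
    Pixels with non-positive depth are treated as invalid and dropped.
    """
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def camera_to_world(points_cam, R, t):
    """Map camera-space points to world space, assuming x_world = R @ x_cam + t."""
    return points_cam @ R.T + t
```

Under these assumptions, applying both functions to each of the fourteen views and concatenating the results would give a merged per-frame cloud. If a dataset stores world-to-camera extrinsics instead, the transform has to be inverted first.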

If this is right

State-of-the-art 3-D reconstruction methods encounter measurable drops in quality on scenes containing transparent or reflective surfaces.
Existing volumetric compression algorithms show increased bitrate or quality loss when handling dynamic elements such as liquids.
New algorithms must be developed to address the combination of human-centric and dynamic features present in production pipelines.
The raw multi-view data can be used directly to train and validate models that aim to overcome these observed limitations.
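The bitrate mentioned in the compression point above is normally derived from the size of the coded bitstream and the duration of the sequence. A minimal sketch, assuming the 30 FPS frame rate stated for ViVo; the function name and arguments are illustrative and are not taken from the paper's evaluation code:

```python
def bitrate_kbps(total_bytes, num_frames, fps=30.0):
    """Average bitrate in kilobits per second for a coded sequence.

    total_bytes: size of the compressed bitstream (all views combined).
    num_frames:  number of frames per view covered by that bitstream.
    fps:         frame rate; 30.0 matches the rate stated for ViVo.
    """
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    duration_s = num_frames / fps
    return total_bytes * 8 / duration_s / 1000.0
```

For example, a 1,250,000-byte bitstream covering 300 frames at 30 FPS spans 10 seconds, which works out to 1000 kbps. Whether a reported rate covers texture only or also depth and metadata is a protocol choice that has to be stated alongside the number.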

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Adoption of ViVo as a benchmark could shift evaluation standards toward content that better matches end-use conditions in immersive applications.
The presence of synchronized audio alongside visual data suggests opportunities to study joint audio-visual reconstruction tasks.
Future extensions might isolate the contribution of individual diversity factors, such as liquid motion, to identify which scene elements drive the largest performance gaps.

Load-bearing premise

The selected scenes and diversity criteria sufficiently represent the full range of challenges that arise in real-world volumetric video production pipelines.

What would settle it

State-of-the-art reconstruction and compression methods achieving accuracy and efficiency levels on ViVo that match or exceed their results on prior datasets would indicate the new sequences do not introduce uniquely difficult conditions.

Figures

Figures reproduced from arXiv: 2506.00558 by Adrian Azzarelli, David Bull, Fan Zhang, Ge Gao, Ho Man Kwan, Nantheera Anantrasirichai, Ollie Moolan-Feroze.

**Figure 1.** We showcase the scenes and relevant data associated with the NVS and MVV compression experiments. (a-b) is the source data.

**Figure 2.** The camera placement is spherical around the human per

**Figure 3.** Inside The Metaverse Studio: The truss columns and training

**Figure 5.** Adding static masks (left) and dynamic SAM2 masks (middle)

**Figure 6.** Visual comparison of dynamic reconstruction methods for each scene. The top row of each result is the full render and the bottom

**Figure 7.** Visual comparison of NVS over time for evaluating temporal degradation. The Pony scene was selected to demonstrate: (1) STG

**Figure 8.** Rate-distortion performance comparison of selected baselines, with distortion measured by PSNR, SSIM, and IV-PSNR, respectively.

**Figure 9.** Qualitative comparison of the source and pose trace views by

Original abstract

As research on neural volumetric video reconstruction and compression flourishes, there is a need for diverse and realistic datasets, which can be used to develop and validate reconstruction and compression models. However, existing volumetric video datasets lack diverse content in terms of both semantic and low-level features that are commonly present in real-world production pipelines. In this context, we propose a new dataset, ViVo, for VolumetrIc VideO reconstruction and compression. The dataset is faithful to real-world volumetric video production and is the first dataset to extend the definition of diversity to include both human-centric characteristics (skin, hair, etc.) and dynamic visual phenomena (transparent, reflective, liquid, etc.). Each video sequence in this database contains raw data including fourteen multi-view RGB and depth video pairs, synchronized at 30FPS with per-frame calibration and audio data, and their associated 2-D foreground masks and 3-D point clouds. To demonstrate the use of this database, we have benchmarked three state-of-the-art (SotA) 3-D reconstruction methods and two volumetric video compression algorithms. The obtained results evidence the challenging nature of the proposed dataset and the limitations of existing datasets for both volumetric video reconstruction and compression tasks, highlighting the need to develop more effective algorithms for these applications. The database and the associated results are available at https://vivo-bvicr.github.io/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the ViVo dataset for volumetric video reconstruction and compression. Each sequence consists of 14 synchronized multi-view RGB and depth video pairs at 30 FPS with per-frame calibration, audio, 2-D foreground masks, and 3-D point clouds. The central claims are that ViVo is faithful to real-world production pipelines and is the first dataset to extend diversity definitions to encompass both human-centric characteristics (skin, hair variations) and dynamic visual phenomena (transparent, reflective, liquid elements), with benchmarks on three reconstruction and two compression methods demonstrating its challenging nature and the limitations of prior datasets and methods.

Significance. If the representativeness and diversity claims are substantiated, ViVo would provide a valuable new benchmark that highlights gaps in current volumetric video techniques. The release of raw multi-view data plus derived assets (masks, point clouds) and the public availability of results are positive contributions for reproducibility in the field.

major comments (2)

[Abstract; Dataset description (likely §3)] The claim that ViVo extends the definition of diversity to human-centric and dynamic phenomena and is 'faithful to real-world' volumetric video production is load-bearing for the paper's contribution and for interpreting the benchmark results as evidence of general limitations. However, no per-sequence or per-frame quantification is supplied (e.g., fraction of sequences exhibiting transparency, reflection, or liquid effects; distribution of skin tones or hair types; comparison to production statistics). This leaves the representativeness assumption unverified and makes it unclear whether observed method failures reflect broad challenges or selection effects in the chosen scenes.
[Benchmarking section (likely §4)] Benchmarks on three reconstruction and two compression methods are presented to support the 'challenging nature' claim, yet the manuscript provides insufficient detail on evaluation protocols, exact error metrics (e.g., how PSNR/SSIM or geometric errors are computed across frames), and baseline implementation choices. This weakens the ability to interpret the quantitative results as robust evidence of limitations.

minor comments (2)

[Figures and Tables] Figure captions and table headers should explicitly state the number of sequences or frames used for each reported metric to improve clarity.
[Abstract] The abstract states 'the obtained results evidence the challenging nature,' but this phrasing should be softened to 'suggest' or 'indicate' pending the added quantification requested above.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate planned revisions to strengthen the paper.

Referee: [Abstract; Dataset description (likely §3)] The claim that ViVo extends the definition of diversity to human-centric and dynamic phenomena and is 'faithful to real-world' volumetric video production is load-bearing for the paper's contribution and for interpreting the benchmark results as evidence of general limitations. However, no per-sequence or per-frame quantification is supplied (e.g., fraction of sequences exhibiting transparency, reflection, or liquid effects; distribution of skin tones or hair types; comparison to production statistics). This leaves the representativeness assumption unverified and makes it unclear whether observed method failures reflect broad challenges or selection effects in the chosen scenes.

Authors: We agree that explicit quantification would better substantiate the diversity and representativeness claims. In the revised manuscript, we will add a dedicated subsection (or table) in the dataset description that reports per-sequence statistics on dynamic phenomena (e.g., counts of sequences containing transparent objects, reflective surfaces, or liquids) and human-centric attributes (e.g., distribution of skin tones and hair types across the sequences). We will also expand the discussion of the capture setup to reference standard volumetric video production practices, clarifying alignment with real-world pipelines. These additions should help address concerns about selection effects.
Revision: yes

Referee: [Benchmarking section (likely §4)] Benchmarks on three reconstruction and two compression methods are presented to support the 'challenging nature' claim, yet the manuscript provides insufficient detail on evaluation protocols, exact error metrics (e.g., how PSNR/SSIM or geometric errors are computed across frames), and baseline implementation choices. This weakens the ability to interpret the quantitative results as robust evidence of limitations.

Authors: We concur that additional methodological detail is required for reproducibility and robust interpretation. In the revised §4, we will provide expanded descriptions of the evaluation protocols, including precise definitions and aggregation methods for metrics (e.g., per-frame PSNR/SSIM computation and averaging, geometric error calculations), the exact baseline implementations (including versions, modifications, and hyperparameters), and any preprocessing steps. We will also release the evaluation scripts with the dataset to support verification of the reported results.
Revision: yes
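
The aggregation question raised in this exchange matters because two common conventions for sequence-level PSNR give different numbers. The sketch below, with illustrative function names and frames assumed to be float arrays in [0, 1], computes both: the mean of per-frame PSNR values and the PSNR of the mean per-frame MSE. It is not the paper's evaluation script, only an illustration of why the protocol needs to be spelled out.

```python
import numpy as np


def psnr_from_mse(mse, peak=1.0):
    """PSNR in dB for a given mean squared error; infinite when mse is zero."""
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def frame_mse(ref, test):
    """Mean squared error between two frames of identical shape."""
    diff = ref.astype(np.float64) - test.astype(np.float64)
    return float(np.mean(diff ** 2))


def sequence_psnr(ref_frames, test_frames, peak=1.0):
    """Return (mean of per-frame PSNR, PSNR of mean per-frame MSE).

    Note: a single perfectly reconstructed frame (mse == 0) makes the
    first value infinite, which is one reason protocols differ.
    """
    mses = [frame_mse(r, t) for r, t in zip(ref_frames, test_frames)]
    mean_of_psnr = float(np.mean([psnr_from_mse(m, peak) for m in mses]))
    psnr_of_mean = psnr_from_mse(float(np.mean(mses)), peak)
    return mean_of_psnr, psnr_of_mean
```

Because PSNR is a convex function of MSE, the mean of per-frame PSNR is never lower than the PSNR of the mean MSE, and the gap grows when frame quality varies strongly over time. The first convention therefore hides temporal degradation that the second penalizes.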

Circularity Check

0 steps flagged

No circularity: dataset release with external benchmarks is self-contained

The paper is a dataset release accompanied by empirical benchmarks on three external 3-D reconstruction methods and two compression algorithms. No derivations, predictions, fitted parameters, or self-citations appear in the provided text. Claims of diversity and faithfulness rest on scene selection and observed benchmark performance against independent SotA methods rather than any reduction to internal definitions or prior author work. The work contains no load-bearing steps that equate outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset introduction paper with no mathematical derivations, free parameters, or postulated entities; all content consists of collected real-world recordings and standard benchmarking.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: The dataset is faithful to real-world volumetric video production and is the first dataset to extend the definition of diversity to include both human-centric characteristics (skin, hair, etc.) and dynamic visual phenomena (transparent, reflective, liquid, etc.).

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: We benchmarked three state-of-the-art (SotA) 3-D reconstruction methods and two volumetric video compression algorithms.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges
cs.CV · 2025-11 · unverdicted · novelty 6.0

Splatography improves dynamic 3D reconstruction from sparse multi-view videos by splitting foreground and background Gaussian representations and applying tailored deformation learning for each.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans,

S. Peng, Y . Zhang, Y . Xu, Q. Wang, Q. Shuai, H. Bao, and X. Zhou, “Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans,” in CVPR, 2021

work page 2021
[2]

Panoptic studio: A massively multiview system for social motion capture,

H. Joo, H. Liu, L. Tan, L. Gui, B. Nabbe, I. Matthews, T. Kanade, S. Nobuhara, and Y . Sheikh, “Panoptic studio: A massively multiview system for social motion capture,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 3334–3342

work page 2015
[3]

SAM 2: Segment Anything in Images and Videos

N. Ravi, V . Gabeur, Y .-T. Hu, R. Hu, C. Ryali, T. Ma, H. Khedr, R. R¨adle, C. Rolland, L. Gustafson et al., “Sam 2: Segment any- thing in images and videos,” arXiv preprint arXiv:2408.00714 , 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[4]

Nerf: Representing scenes as neural radiance fields for view synthesis,

B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021

work page 2021
[5]

Instant neural graphics primitives with a multiresolution hash encoding,

T. M ¨uller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a multiresolution hash encoding,” ACM transactions on graphics (TOG) , vol. 41, no. 4, pp. 1–15, 2022

work page 2022
[6]

Neural human performer: Learning generalizable radiance fields for human performance rendering,

Y . Kwon, D. Kim, D. Ceylan, and H. Fuchs, “Neural human performer: Learning generalizable radiance fields for human performance rendering,” Advances in Neural Information Pro- cessing Systems , vol. 34, pp. 24 741–24 752, 2021

work page 2021
[7]

3d gaussian splatting for real-time radiance field rendering

B. Kerbl, G. Kopanas, T. Leimk ¨uhler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering.” ACM Trans. Graph., vol. 42, no. 4, pp. 139–1, 2023

work page 2023
[8]

Human gaussian splatting: Real-time rendering of an- imatable avatars,

A. Moreau, J. Song, H. Dhamo, R. Shaw, Y . Zhou, and E. P´erez- Pellitero, “Human gaussian splatting: Real-time rendering of an- imatable avatars,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , 2024, pp. 788–798

work page 2024
[9]

4d gaussian splatting for real-time dynamic scene rendering,

G. Wu, T. Yi, J. Fang, L. Xie, X. Zhang, W. Wei, W. Liu, Q. Tian, and X. Wang, “4d gaussian splatting for real-time dynamic scene rendering,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , 2024, pp. 20 310–20 320

work page 2024
[10]

Spacetime gaussian feature splatting for real-time dynamic view synthesis,

Z. Li, Z. Chen, Z. Li, and Y . Xu, “Spacetime gaussian feature splatting for real-time dynamic view synthesis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 8508–8520

work page 2024
[11]

Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction,

Z. Yang, X. Gao, W. Zhou, S. Jiao, Y . Zhang, and X. Jin, “Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction,” in Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition , 2024, pp. 20 331–20 341

work page 2024
[12]

Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes,

Y .-H. Huang, Y .-T. Sun, Z. Yang, X. Lyu, Y .-P. Cao, and X. Qi, “Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , 2024, pp. 4220– 4230

work page 2024
[13]

K-planes: Explicit radiance fields in space, time, and appearance,

S. Fridovich-Keil, G. Meanti, F. R. Warburg, B. Recht, and A. Kanazawa, “K-planes: Explicit radiance fields in space, time, and appearance,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2023, pp. 12 479– 12 488

work page 2023
[14]

Hexplane: A fast representation for dynamic scenes,

A. Cao and J. Johnson, “Hexplane: A fast representation for dynamic scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2023, pp. 130– 141

work page 2023
[15]

ACM Transactions on Graphics , year =

R. W. Sumner, J. Schmid, and M. Pauly, “Embedded deformation for shape manipulation,” ACM Trans. Graph. , vol. 26, no. 3, p. 80–es, Jul. 2007. [Online]. Available: https://doi.org/10.1145/1276377.1276478

work page doi:10.1145/1276377.1276478 2007
[16]

Dna-rendering: A diverse neural 11 actor repository for high-fidelity human-centric rendering,

W. Cheng, R. Chen, S. Fan, W. Yin, K. Chen, Z. Cai, J. Wang, Y . Gao, Z. Yu, Z. Lin et al. , “Dna-rendering: A diverse neural 11 actor repository for high-fidelity human-centric rendering,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 19 982–19 993

work page 2023
[17]

Mvhumannet: A large-scale dataset of multi-view daily dressing human captures,

Z. Xiong, C. Li, K. Liu, H. Liao, J. Hu, J. Zhu, S. Ning, L. Qiu, C. Wang, S. Wang et al. , “Mvhumannet: A large-scale dataset of multi-view daily dressing human captures,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 19 801–19 811

work page 2024
[18]

Exploring dy- namic novel view synthesis technologies for cinematography,

A. Azzarelli, N. Anantrasirichai, and D. R. Bull, “Exploring dy- namic novel view synthesis technologies for cinematography,” arXiv preprint arXiv:2412.17532 , 2024

work page arXiv 2024
[19]

Neural 3d video synthesis from multi-view video,

T. Li, M. Slavcheva, M. Zollhoefer, S. Green, C. Lassner, C. Kim, T. Schmidt, S. Lovegrove, M. Goesele, R. Newcombe et al. , “Neural 3d video synthesis from multi-view video,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , 2022, pp. 5521–5531

work page 2022
[20]

Overview of the stereo and multiview video coding extensions of the h. 264/mpeg-4 avc standard,

A. Vetro, T. Wiegand, and G. J. Sullivan, “Overview of the stereo and multiview video coding extensions of the h. 264/mpeg-4 avc standard,” Proceedings of the IEEE , vol. 99, no. 4, pp. 626–642, 2011

work page 2011
[21]

Overview of the multiview and 3d extensions of high efficiency video coding,

G. Tech, Y . Chen, K. M ¨uller, J.-R. Ohm, A. Vetro, and Y .-K. Wang, “Overview of the multiview and 3d extensions of high efficiency video coding,” IEEE Transactions on Circuits and Systems for Video Technology , vol. 26, no. 1, pp. 35–49, 2015

work page 2015
[22]

MPEG immersive video coding standard,

J. M. Boyce, R. Dor ´e, A. Dziembowski, J. Fleureau, J. Jung, B. Kroon, B. Salahieh, V . K. M. Vadakital, and L. Yu, “MPEG immersive video coding standard,” Proceedings of the IEEE , vol. 109, no. 9, pp. 1521–1536, 2021

work page 2021
[23]

Overview of the high efficiency video coding (hevc) standard,

G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (hevc) standard,” IEEE Transactions on circuits and systems for video technology , vol. 22, no. 12, pp. 1649–1668, 2012

work page 2012
[24]

Fv-nerv: Neural compression for free viewpoint videos,

T. Fujihashi, S. Kato, and T. Koike-Akino, “Fv-nerv: Neural compression for free viewpoint videos,” in Workshop on Ma- chine Learning and Compression, NeurIPS 2024

work page 2024
[25]

Immersive video compression using implicit neural representations,

H. M. Kwan, F. Zhang, A. Gower, and D. Bull, “Immersive video compression using implicit neural representations,” in PCS. IEEE, 2024, pp. 1–5

work page 2024
[26]

Neu- ral volumetric video coding with hierarchical coded represen- tation of dynamic volume,

J.-Y . Shin, J.-K. Lee, G. Bang, J.-S. Kim, and J.-W. Kang, “Neu- ral volumetric video coding with hierarchical coded represen- tation of dynamic volume,” IEEE Transactions on Multimedia , 2025

work page 2025
[27]

Implicit-explicit integrated representations for multi-view video compression,

C. Zhu, G. Lu, B. He, R. Xie, and L. Song, “Implicit-explicit integrated representations for multi-view video compression,” IEEE Trans. Image Process. , vol. 34, pp. 1106–1118, 2025

work page 2025
[28]

MPEG immersive video coding standard,

J. M. Boyce, R. Dor ´e, A. Dziembowski, J. Fleureau, J. Jung, B. Kroon, B. Salahieh, V . K. M. Vadakital, and L. Yu, “MPEG immersive video coding standard,” Proc. IEEE, vol. 109, no. 9, pp. 1521–1536, 2021

work page 2021
[29]

Jpeg pleno database: 8i voxelized full bodies (8ivfb v2)-a dynamic voxelized point cloud dataset,

E. d’Eon, B. Harrison, T. Myers, and P. A. Chou, “Jpeg pleno database: 8i voxelized full bodies (8ivfb v2)-a dynamic voxelized point cloud dataset,” 2019

work page 2019
[30]

Microsoft voxelized upper bodies – a voxelized point cloud dataset,

C. Loop, Q. Cai, S. O. Escolano, and P. A. Chou, “Microsoft voxelized upper bodies – a voxelized point cloud dataset,” ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012, Geneva, May 2016, available at: https://www.microsoft.com/en- us/research/publication/microsoft-voxelized-upper-bodies-a- voxelized-point-cloud-dataset/

work page 2016
[31]

UVG-VPC: voxelized point cloud dataset for visual volumetric video-based coding,

G. Gautier, A. Mercat, L. Fr ´eneau, M. Pitk ¨anen, and J. Vanne, “UVG-VPC: voxelized point cloud dataset for visual volumetric video-based coding,” in 2023 15th international conference on quality of Multimedia experience (QoMEX) . IEEE, 2023, pp. 244–247

work page 2023
[32]

Textured mesh vs coloured point cloud: A subjective study for volumetric video compression,

E. Zerman, C. Ozcinar, P. Gao, and A. Smolic, “Textured mesh vs coloured point cloud: A subjective study for volumetric video compression,” in Twelfth International Conference on Quality of Multimedia Experience (QoMEX) , 2020

work page 2020
[33]

Owlii dynamic human textured mesh sequence dataset,

Y . Xu, Y . Lu, and Z. Wen, “Owlii dynamic human textured mesh sequence dataset,” in ISO/IEC JTC1/SC29/WG1 1 input document m41658 , 2017

work page 2017
[34]

BVI-CR: A multi-view hu- man dataset for volumetric video compression,

G. Gao, A. Azzarelli, H. M. Kwan, N. Anantrasirichai, F. Zhang, O. Moolan-Feroze, and D. Bull, “BVI-CR: A multi-view hu- man dataset for volumetric video compression,” arXiv preprint arXiv:2411.11199, 2024

work page arXiv 2024
[35]

MPEG video-based point cloud compression (V-PCC) standard,

G. Li, W. Gao, and W. Gao, “MPEG video-based point cloud compression (V-PCC) standard,” in Point Cloud Compression: Technologies and Standardization . Springer, 2024, pp. 199– 218

work page 2024
[36]

PKU-DyMVHumans: A multi-view video bench- mark for high-fidelity dynamic human modeling,

X. Zheng, L. Liao, X. Li, J. Jiao, R. Wang, F. Gao, S. Wang, and R. Wang, “PKU-DyMVHumans: A multi-view video bench- mark for high-fidelity dynamic human modeling,” in Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 22 530–22 540

work page 2024
[37]

Robust dual gaussian splatting for immersive human- centric volumetric videos,

Y . Jiang, Z. Shen, Y . Hong, C. Guo, Y . Wu, Y . Zhang, J. Yu, and L. Xu, “Robust dual gaussian splatting for immersive human- centric volumetric videos,” ACM Transactions on Graphics (TOG), vol. 43, no. 6, pp. 1–15, 2024

work page 2024
[38]

Aist dance video database: Multi-genre, multi-dancer, and multi- camera database for dance information processing

S. Tsuchida, S. Fukayama, M. Hamasaki, and M. Goto, “Aist dance video database: Multi-genre, multi-dancer, and multi- camera database for dance information processing.” in ISMIR, vol. 1, no. 5, 2019, p. 6

work page 2019
[39]

Ai choreog- rapher: Music conditioned 3d dance generation with aist++,

R. Li, S. Yang, D. A. Ross, and A. Kanazawa, “Ai choreog- rapher: Music conditioned 3d dance generation with aist++,” in Proceedings of the IEEE/CVF international conference on computer vision , 2021, pp. 13 401–13 412

work page 2021
[40]

Real-time deep dynamic characters,

M. Habermann, L. Liu, W. Xu, M. Zollhoefer, G. Pons-Moll, and C. Theobalt, “Real-time deep dynamic characters,” ACM Transactions on Graphics (ToG), vol. 40, no. 4, pp. 1–16, 2021

work page 2021
[41]

Generalizable neural performer: Learning robust radi- ance fields for human novel view synthesis,

W. Cheng, S. Xu, J. Piao, C. Qian, W. Wu, K.-Y . Lin, and H. Li, “Generalizable neural performer: Learning robust radi- ance fields for human novel view synthesis,” arXiv preprint arXiv:2204.11798, 2022

work page arXiv 2022
[42]

Hu- man3. 6m: Large scale datasets and predictive methods for 3d human sensing in natural environments,

C. Ionescu, D. Papava, V . Olaru, and C. Sminchisescu, “Hu- man3. 6m: Large scale datasets and predictive methods for 3d human sensing in natural environments,” IEEE transactions on pattern analysis and machine intelligence , vol. 36, no. 7, pp. 1325–1339, 2013

work page 2013
[43]

Humbi: A large multiview dataset of human body expressions,

Z. Yu, J. S. Yoon, I. K. Lee, P. Venkatesh, J. Park, J. Yu, and H. S. Park, “Humbi: A large multiview dataset of human body expressions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2020, pp. 2990– 3000

work page 2020
[44]

Humman: Multi-modal 4d human dataset for versatile sensing and modeling,

Z. Cai, D. Ren, A. Zeng, Z. Lin, T. Yu, W. Wang, X. Fan, Y . Gao, Y . Yu, L. Pan et al. , “Humman: Multi-modal 4d human dataset for versatile sensing and modeling,” in European Conference on Computer Vision . Springer, 2022, pp. 557–577

work page 2022
[45]

Open3D: A Modern Library for 3D Data Processing

Q.-Y . Zhou, J. Park, and V . Koltun, “Open3d: A modern library for 3d data processing,” arXiv preprint arXiv:1801.09847, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[46]

Image quality assessment: from error visibility to structural similarity,

Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing , vol. 13, no. 4, pp. 600–612, 2004

work page 2004
[47]

The unreasonable effectiveness of deep features as a perceptual metric,

R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2018, pp. 586–595

work page 2018
[48]

Common test condi- tions for mpeg immersive video,

A. Dziembowski, B. Kroon, and J. Jung, “Common test condi- tions for mpeg immersive video,” ISO/IEC JTC 1/SC 29/WG 04, Technical Report, 2023

work page 2023
[49]

VVenC: An Open And Optimized VVC Encoder Implementation,

A. Wieckowski, J. Brandenburg, T. Hinz, C. Bartnik, V . George, G. Hege, C. Helmrich, A. Henkel, C. Lehmann, C. Stoffers, I. Zupancic, B. Bross, and D. Marpe, “VVenC: An Open And Optimized VVC Encoder Implementation,” in Proc. IEEE Inter- national Conference on Multimedia Expo Workshops (ICMEW) , pp. 1–2

work page
[50]

Calculation of average PSNR differences between RD-curves,

G. Bjøntegaard, “Calculation of average PSNR differences between RD-curves,” in 13th VCEG Meeting, no. VCEG- 12 M33,Austin, Texas, 2001, pp. USA: ITU–T

work page 2001
[51]

IV-PSNR—the objective quality metric for immersive video applications

A. Dziembowski, D. Mieloch, J. Stankowski, and A. Grzelka, “IV-PSNR—the objective quality metric for immersive video applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 11, pp. 7575–7591, 2022.

Adrian Azzarelli received the M.Eng degree in Electronic Engineering with Artificial Intelligence from the University of So...


He is currently a Research Associate with the School of Computer Science, University of Bristol. His research interests focus on low-level computer vision including neural video compression, implicit neural representations, and generative models.

Ho Man Kwan received the B.Eng. degree in Computer Engineering and the M.Phil. degree in Electronic and Co...


and Frontiers in Signal Processing (in 2022). Fan is also a member of the Visual Signal Processing and Communications Technical Committee associated with the IEEE Circuits and Systems Society. His research interests focus on low-level computer vision including video compression, quality assessment, super resolution and video frame interpolation. Nantheera...

