Evaluating Video Quality Metrics for Neural and Traditional Codecs using 4K/UHD-1 Videos

Alexander Raake; Benjamin Herb; Rakesh Rao Ramachandra Rao; Steve Göring

arxiv: 2511.00969 · v1 · pith:5WGJRFMCnew · submitted 2025-11-02 · 📡 eess.IV

Pith reviewed 2026-05-21 19:15 UTC · model grok-4.3

classification 📡 eess.IV

keywords video quality assessment · neural video codecs · subjective tests · traditional codecs · 4K UHD · quality metrics · correlation coefficients · VMAF

The pith

Subjective tests show no significant differences in how well quality metrics perform on neural versus traditional video codecs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates, through a subjective study, whether existing quality metrics remain valid for neural video codecs as well as traditional ones. The authors encoded 4K videos with AV1, VVC, DCVC-FM, and DCVC-RT and collected ratings from 30 participants on 216 sequences. VMAF and AVQBits show strong Pearson correlation with the subjective scores, PSNR shows the highest Spearman rank correlation for within-sequence comparisons, and FasterVQA performs best among the no-reference metrics, with no significant differences between codec types. This finding matters because it suggests current metrics can evaluate new neural compression methods reliably. The dataset is released publicly.

Core claim

The paper claims that no significant performance differences in metric reliability are observed between traditional and neural video codecs. VMAF and AVQBits demonstrate strong Pearson correlation with subjective scores, PSNR shows the highest Spearman rank order correlation for within-sequence comparisons, and FasterVQA performs best among no-reference metrics. This is determined from a controlled subjective test with 30 participants rating sequences from two traditional and two neural codecs on 4K content.

What carries the argument

Correlation analysis of objective quality metrics (full-reference, hybrid, no-reference) against human subjective ratings from a controlled experiment with traditional (AV1, VVC) and neural (DCVC-FM, DCVC-RT) codecs on 4K/UHD-1 videos.
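The correlation analysis described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the helper names (`pearson`, `average_ranks`, `spearman`) and the sample values are hypothetical, and only the standard library is used.

```python
import math

def pearson(x, y):
    """Pearson linear correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up values for illustration only (not taken from the paper).
metric_scores = [35.0, 52.0, 68.0, 81.0, 93.0]
mos = [1.4, 2.3, 3.1, 4.0, 4.6]
print(pearson(metric_scores, mos), spearman(metric_scores, mos))
```

Pearson correlation rewards a linear relationship with the subjective scores, while Spearman correlation only checks whether the metric orders the sequences the same way the viewers did.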

If this is right

VMAF can be used reliably to assess both neural and traditional video codecs.
PSNR is suitable for comparing quality rankings within sequences across codec types.
No-reference metrics like FasterVQA show promise for scenarios without reference video.
The public dataset supports development and testing of improved metrics for emerging codecs.
Engineers can apply existing metric tools when comparing compression performance of neural and traditional approaches.
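The within-sequence comparison mentioned in the list above can be read as computing a rank correlation separately for each source video and then averaging. The sketch below follows that reading; the function `mean_within_source_spearman`, the row layout, and the sample numbers are assumptions made for illustration, not the paper's procedure.

```python
import math
from collections import defaultdict

def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def _ranks(values):
    # 1-based ranks with ties sharing the average position.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    return _pearson(_ranks(x), _ranks(y))

def mean_within_source_spearman(rows):
    """rows: iterable of (source_id, metric_score, mos) tuples."""
    by_source = defaultdict(list)
    for source, score, mos in rows:
        by_source[source].append((score, mos))
    per_source = {}
    for source, pairs in by_source.items():
        scores, ratings = zip(*pairs)
        per_source[source] = spearman(scores, ratings)
    return sum(per_source.values()) / len(per_source), per_source

# Hypothetical rows: two sources whose score ranges are offset from each other.
rows = [("A", 30, 2.0), ("A", 35, 3.0), ("A", 40, 4.0),
        ("B", 40, 1.5), ("B", 45, 2.5), ("B", 50, 3.5)]
mean_rho, per_source = mean_within_source_spearman(rows)
print(mean_rho, per_source)
```

A metric can rank the encodings of each source well even when its absolute values are not comparable across sources, which is why a within-sequence result can differ from a correlation pooled over all videos.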

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Neural codecs may produce distortions that existing metrics are already equipped to measure.
Testing on more diverse video content could strengthen or challenge the generalizability.
This supports continued use of current evaluation standards as neural codecs mature.
Extensions to higher resolutions like 8K or different frame rates could be explored next.

Load-bearing premise

The specific codecs chosen and the selected 4K video content are sufficiently representative to generalize about metric performance for neural versus traditional codecs overall.

What would settle it

Observing statistically significant differences in metric correlations when using a broader set of video contents or additional neural codec implementations would challenge the central finding.

Figures

Figures reproduced from arXiv: 2511.00969 by Alexander Raake, Benjamin Herb, Rakesh Rao Ramachandra Rao, Steve Göring.

**Figure 3.** Mean PSNR and bitrate values for all six sequences encoded at multiple quality levels per codec and resolution. Quality levels are chosen using the …

**Figure 5.** Distribution of ratings. (Plot: standard deviation of the ratings against MOS on the 1-5 scale; annotation a: 0.242.)

**Figure 7.** Subjective results for all shown sequences.

**Figure 8.** MOS compared to the metric results. Each metric axis is linearly mapped to the ACR scale following ITU-T Rec. P.1401 [32].
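The linear mapping mentioned in the Figure 8 caption can be illustrated with an ordinary least-squares fit of MOS against the raw metric values. This is a sketch under stated assumptions: the function names, the clipping to the 1-5 ACR range, and the sample numbers are hypothetical, and the exact fitting procedure used in the paper is not shown here.

```python
def fit_linear_mapping(metric, mos):
    """Least-squares slope and intercept so that mos ~ slope * metric + intercept."""
    n = len(metric)
    mx, my = sum(metric) / n, sum(mos) / n
    sxx = sum((m - mx) ** 2 for m in metric)
    sxy = sum((m - mx) * (s - my) for m, s in zip(metric, mos))
    slope = sxy / sxx
    return slope, my - slope * mx

def map_to_acr(metric, slope, intercept, low=1.0, high=5.0):
    """Apply the fitted mapping; clipping to [low, high] is an assumption."""
    return [min(high, max(low, slope * m + intercept)) for m in metric]

# Made-up metric values on a 0-100 scale and made-up MOS values.
metric = [20.0, 40.0, 60.0, 80.0, 100.0]
mos = [1.2, 2.1, 2.9, 4.1, 4.7]
slope, intercept = fit_linear_mapping(metric, mos)
print(map_to_acr(metric, slope, intercept))
```

Mapping every metric onto the rating scale before plotting makes the metrics visually comparable; it does not change the Pearson correlation, since a linear transform with positive slope preserves it.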

Original abstract

With neural video codecs (NVCs) emerging as promising alternatives for traditional compression methods, it is increasingly important to determine whether existing quality metrics remain valid for evaluating their performance. However, few studies have systematically investigated this using well-designed subjective tests. To address this gap, this paper presents a subjective quality assessment study using two traditional (AV1 and VVC) and two variants of a neural video codec (DCVC-FM and DCVC-RT). Six source videos (8-10 seconds each, 4K/UHD-1, 60 fps) were encoded at four resolutions (360p to 2160p) using nine different QP values, resulting in 216 sequences that were rated in a controlled environment by 30 participants. These results were used to evaluate a range of full-reference, hybrid, and no-reference quality metrics to assess their applicability to the induced quality degradations. The objective quality assessment results show that VMAF and AVQBits|H0|f demonstrate strong Pearson correlation, while FasterVQA performed best among the tested no-reference metrics. Furthermore, PSNR shows the highest Spearman rank order correlation for within-sequence comparisons across the different codecs. Importantly, no significant performance differences in metric reliability are observed between traditional and neural video codecs across the tested metrics. The dataset, consisting of source videos, encoded videos, and both subjective and quality metric scores will be made publicly available following an open-science approach (https://github.com/Telecommunication-Telemedia-Assessment/AVT-VQDB-UHD-1-NVC).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

New subjective ratings for neural codecs on 4K content are the useful addition here, but the no-difference claim between codec types rests on observed similarity without a formal test.

Here's the quick read on this one. The paper's real value is the new set of subjective ratings for neural video codecs alongside traditional ones on 4K material, plus the promise to release the full dataset.

They used six 8-10 second 4K sources at 60 fps and encoded them with AV1 and VVC as the traditional options and DCVC-FM plus DCVC-RT as the neural ones. They went through four resolutions and nine QP values for 216 total sequences. Thirty participants rated them in a controlled setup. Then they ran a range of metrics and looked at how well they matched the subjective scores. VMAF and one hybrid metric came out strong on Pearson correlation, FasterVQA did best for no-reference, and PSNR led on Spearman for within-sequence ranking. The headline result is that the metrics behaved similarly for the neural and traditional codecs.

The data collection follows standard methods and the open release is a plus for the field. It gives people concrete numbers to work with when they want to check whether quality metrics still apply as neural codecs get more use.

The weaker part is the support for saying there are no significant differences in how the metrics perform across codec types. They base this on the correlation values looking close, but with only six source videos the effective sample size for those correlations is limited. The two neural codecs are both variants of the same DCVC approach, which narrows the test. They don't include a direct statistical comparison, such as a test for a difference in correlation coefficients or bootstrap intervals on the gap between groups. Without that, the similarity could just reflect low power rather than a real lack of difference.

This work is aimed at people in video compression and quality assessment who need updated validation for metrics on newer codec types. A reader looking for fresh empirical points on metric reliability would get something out of it.

The paper shows clear thinking on the experimental design and engages with the practical question of metric validity. I'd recommend sending it to peer review. The new data makes it worth the referees' time, provided they strengthen the statistical side of the comparison.

Referee Report

1 major / 2 minor

Summary. The manuscript reports a subjective video quality assessment with 30 participants rating 216 encoded 4K/UHD-1 sequences (6 sources, 8-10 s each, 60 fps) generated from AV1, VVC, DCVC-FM and DCVC-RT at four resolutions and nine QP values. Subjective scores are used to benchmark a range of full-reference, hybrid and no-reference metrics; the authors report that VMAF and AVQBits|H0|f achieve the strongest Pearson correlations, PSNR the highest Spearman rank correlation for within-sequence comparisons, and that no significant performance differences in metric reliability appear between the traditional and neural codec groups. The dataset of sources, encodings, subjective scores and metric values is to be released publicly.

Significance. If the central finding of metric equivalence holds, the work would provide useful empirical support for applying established metrics such as VMAF to neural video codecs, reducing the need for new subjective tests when comparing codec families. The controlled 4K test design, use of multiple resolutions, and planned open release of the full dataset constitute clear strengths for reproducibility and future meta-analyses.

major comments (1)

[Results / objective quality assessment] Results / correlation tables (implicit in the objective quality assessment paragraph): the claim that 'no significant performance differences in metric reliability are observed between traditional and neural video codecs' rests on numerical similarity of Pearson and Spearman coefficients computed over the 216 sequences. No formal test of the difference between the two codec-group correlations (Fisher z, Steiger, or bootstrap CI on Δr) is reported. With only six source videos the effective degrees of freedom per correlation are low; numerical closeness alone does not establish statistical non-significance versus under-power.
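One of the tests named in the comment above, the Fisher z test for two correlation coefficients, can be sketched as follows. The function name `fisher_z_test`, the example correlations, and the group size of 108 (an even split of the 216 sequences, which is an assumption) are illustrative and not taken from the paper.

```python
import math
from statistics import NormalDist

def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for two independent samples.

    r1, r2 are sample Pearson correlations; n1, n2 are the sample sizes.
    Returns the z statistic and the two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher z-transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical correlations for a traditional and a neural codec group.
z, p = fisher_z_test(0.90, 108, 0.88, 108)
print(f"z = {z:.3f}, p = {p:.3f}")
```

The test treats the two groups as independent samples. Since both groups share the same source videos and raters, that assumption holds only approximately here, and the nominal sample size overstates the effective degrees of freedom the comment refers to.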

minor comments (2)

[Abstract / Methods] The abstract and methods should explicitly state how the 'traditional' versus 'neural' grouping was defined for the correlation comparisons and whether any per-source or per-resolution blocking was applied before pooling.
[Results] Details on the exact statistical procedure used to reach the 'no significant difference' conclusion (including any multiple-comparison correction) are missing; these should be added even if only to confirm that a simple numerical comparison was performed.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thorough review and the constructive suggestion regarding statistical rigor in our comparison of metric performance across codec families. We address the major comment below and will update the manuscript to incorporate a formal test of correlation differences.

Referee: [Results / objective quality assessment] Results / correlation tables (implicit in the objective quality assessment paragraph): the claim that 'no significant performance differences in metric reliability are observed between traditional and neural video codecs' rests on numerical similarity of Pearson and Spearman coefficients computed over the 216 sequences. No formal test of the difference between the two codec-group correlations (Fisher z, Steiger, or bootstrap CI on Δr) is reported. With only six source videos the effective degrees of freedom per correlation are low; numerical closeness alone does not establish statistical non-significance versus under-power.

Authors: We agree that the current claim relies on numerical similarity without a formal statistical comparison and that this is insufficient to establish non-significance, particularly given the limited number of source contents. In the revised manuscript we will compute separate Pearson and Spearman correlations for the traditional codec group (AV1 and VVC sequences) and the neural codec group (DCVC-FM and DCVC-RT sequences). We will then apply Fisher's z-transformation to test the difference between these correlations and will additionally report bootstrap confidence intervals for the difference Δr. We will also present per-source correlation values to acknowledge content dependency. These additions will be placed in the objective quality assessment section together with the corresponding p-values and a revised statement of the findings.

Revision: yes
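The bootstrap confidence interval on Δr mentioned in the response could look roughly like the sketch below. It is a plain percentile bootstrap that resamples sequences independently within each codec group; the function names, the number of resamples, the seed, and the toy data are assumptions for illustration only.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # degenerate resample with no variance
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def resample(pairs, rng):
    """Draw len(pairs) items from pairs with replacement."""
    return [pairs[rng.randrange(len(pairs))] for _ in pairs]

def bootstrap_delta_r(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for r(group_a) - r(group_b).

    Each group is a list of (metric_score, mos) pairs.
    """
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_boot):
        xa, ya = zip(*resample(group_a, rng))
        xb, yb = zip(*resample(group_b, rng))
        d = pearson(xa, ya) - pearson(xb, yb)
        if not math.isnan(d):
            deltas.append(d)
    deltas.sort()
    lower = deltas[int((alpha / 2) * len(deltas))]
    upper = deltas[int((1 - alpha / 2) * len(deltas)) - 1]
    return lower, upper

# Toy data: (metric_score, mos) pairs for two hypothetical codec groups.
traditional = [(30, 1.5), (45, 2.4), (60, 3.0), (75, 3.9), (90, 4.6), (95, 4.8)]
neural = [(32, 1.7), (47, 2.2), (58, 3.2), (72, 3.7), (88, 4.5), (96, 4.7)]
print(bootstrap_delta_r(traditional, neural))
```

An interval that contains zero is consistent with no difference between the groups, but a wide interval signals low power. Resampling whole source videos rather than individual sequences would respect the content dependency the response mentions; that variant is not shown here.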

Circularity Check

0 steps flagged

No circularity: empirical study with new subjective data

This is a standard empirical evaluation paper. It collects new subjective ratings from 30 participants on 216 newly encoded sequences (6 sources, 4 codecs including two DCVC neural variants, multiple resolutions/QPs) and computes Pearson/Spearman correlations of objective metrics (VMAF, PSNR, FasterVQA, etc.) against those ratings. No mathematical derivation chain exists, no parameters are fitted on a subset and then called predictions on related quantities, and no self-citation or uniqueness theorem is invoked to justify the central claim. The analysis directly compares observed correlations between codec groups using the fresh subjective ground truth, making the result self-contained against external benchmarks rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on established practices in video quality research rather than introducing new free parameters or entities.

axioms (1)

Domain assumption: Subjective ratings collected from 30 participants in a controlled environment accurately reflect perceived video quality differences.
This is a foundational assumption in all subjective quality assessment studies.
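For context on how per-sequence ratings are usually condensed, the sketch below computes a mean opinion score (MOS) with an approximate 95% confidence interval from one sequence's ratings. The function name, the normal approximation (a Student t quantile would be slightly wider for 30 raters), and the ratings are assumptions for illustration and not data from the study.

```python
import math
from statistics import NormalDist, mean, stdev

def mos_with_ci(ratings, confidence=0.95):
    """Return (MOS, half-width of the confidence interval) for one sequence."""
    n = len(ratings)
    mos = mean(ratings)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(ratings) / math.sqrt(n)
    return mos, half_width

# Made-up ACR ratings (1-5) from 30 hypothetical participants.
ratings = [4, 3, 4, 5, 4, 3, 4, 4, 5, 3, 4, 4, 3, 5, 4,
           4, 4, 3, 5, 4, 4, 3, 4, 4, 5, 4, 3, 4, 4, 4]
mos, half_width = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```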

pith-pipeline@v0.9.0 · 5827 in / 1231 out tokens · 57288 ms · 2026-05-21T19:15:18.507991+00:00 · methodology

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1] T. Chen et al. “DeepCoder: A deep neural network based video compression”. In: Visual Communications and Image Processing. St. Petersburg, FL: IEEE, 2017, pp. 1–4.
[2] W. Park and M. Kim. “Deep Predictive Video Compression Using Mode-Selective Uni- and Bi-Directional Predictions Based on Multi-Frame Hypothesis”. In: IEEE Access 9 (2020), pp. 72–85.
[3] Z. Montajabi, V. Khorasani Ghassab, and N. Bouguila. “Recurrent Neural Network-Based Video Compression”. In: 21st Int. Conf. on Machine Learning and Applications. Nassau, Bahamas: IEEE, 2022, pp. 925–930.
[4] F. Mentzer et al. “Neural Video Compression Using GANs for Detail Synthesis and Propagation”. In: Computer Vision (ECCV). Cham: Springer Nature Switzerland, 2022, pp. 562–578.
[5] J. Li, B. Li, and Y. Lu. “Deep Contextual Video Compression”. In: Advances in Neural Information Processing Systems. Vol. 34. Curran Associates, Inc., 2021, pp. 18114–18125.
[6] X. Sheng et al. “Temporal Context Mining for Learned Video Compression”. In: Trans. on Multimedia 25 (2022), pp. 7311–7322.
[7] J. Li, B. Li, and Y. Lu. “Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression”. In: Proc. of the 30th ACM Int. Conf. on Multimedia. Lisboa Portugal: ACM, 2022, pp. 1503–1511.
[8] J. Li, B. Li, and Y. Lu. “Neural Video Compression with Diverse Contexts”. In: Conf. on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023, pp. 22616–22626.
[9] J. Li, B. Li, and Y. Lu. “Neural Video Compression with Feature Modulation”. In: Conf. on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2024, pp. 26099–26108.
[10] Z. Jia et al. “Towards Practical Real-Time Neural Video Compression”. In: Proc. of the Computer Vision and Pattern Recognition Conference. 2025, pp. 12543–12552.
[11] G.-H. Wang et al. “EVC: Towards Real-Time Neural Image Compression with Mask Decay”. In: Int. Conf. on Learning Representations. 2023.
[12] M. Lu et al. “Deep Hierarchical Video Compression”. In: Proc. of the AAAI Conf. on Artificial Intelligence 38.8 (2024), pp. 8859–8867.
[13] M. Lu et al. “High-Efficiency Neural Video Compression via Hierarchical Predictive Learning”. In: arXiv:2410.02598 [eess.IV] (2024).
[14] S. Teng et al. “Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration”. In: Int. Conf. on Visual Communications and Image Processing. 2024, pp. 1–5.
[15] A. Regensky, F. Brand, and A. Kaup. “Analysis of Neural Video Compression Networks for 360-Degree Video Coding”. In: Picture Coding Symp. Taichung, Taiwan: IEEE, 2024, pp. 1–5.
[16] R. R. Ramachandra Rao et al. “AVT-VQDB-UHD-1: A Large Scale Video Quality Database for UHD-1”. In: Int. Symp. on Multimedia. San Diego, CA, USA: IEEE, 2019, pp. 17–177.
[17] V. V. Menon et al. “VCA: video complexity analyzer”. In: Proc. of the 13th ACM Multimedia Systems Conf. Athlone Ireland: ACM, 2022, pp. 259–264.
[18] A. Wieckowski et al. “Vvenc: An Open And Optimized Vvc Encoder Implementation”. In: Int. Conf. on Multimedia & Expo Workshops. Shenzhen, China: IEEE, 2021, pp. 1–2.
[19] G. Bjontegaard. “Calculation of average PSNR differences between RD-curves”. In: ITU-T SG16, Doc. VCEG-M33 (2001).
[20] Alliance for Open Media. AOM Common Test Conditions v3.0. 2022. URL: https : / / aomedia . org / docs / CWG - C038o A V2CTC v3 . pdf (visited on 05/20/2025).
[21] Microsoft. Test Conditions. 2023. URL: https://github.com/microsoft/ DCVC/blob/main/test conditions.md (visited on 05/21/2025).
[22] ITU-T. P.910: Subjective video quality assessment methods for multimedia applications. 2023.
[23] T. Hossfeld, R. Schatz, and S. Egger. “SOS: The MOS is not enough!” In: 3rd. Int. Workshop on Quality of Multimedia Experience (QoMEX). Mechelen, Belgium: IEEE, 2011, pp. 131–136.
[24] R. R. R. Rao et al. “A Large-Scale Evaluation of Subject Rating Behaviour in Visual Quality Assessment Studies”. In: 17th. Int. Workshop on Quality of Multimedia Experience (to appear). 2025.
[25] W. Sun et al. “Deep Learning Based Full-Reference and No-Reference Quality Assessment Models for Compressed UGC Videos”. In: Int. Conf. on Multimedia & Expo Workshops. 2021, pp. 1–6.
[26] R. Zhang et al. “The Unreasonable Effectiveness of Deep Features as a Perceptual Metric”. In: Conf. Computer Vision and Pattern Recognition. Salt Lake City, UT: IEEE, 2018, pp. 586–595.
[27] J. Ke et al. “MUSIQ: Multi-scale Image Quality Transformer”. In: Int. Conf. on Computer Vision. Montreal, QC, Canada: IEEE, 2021, pp. 5128–5137.
[28] H. Wu et al. “Neighbourhood Representative Sampling for Efficient End-to-End Video Quality Assessment”. In: IEEE Trans. Pattern Anal. Mach. Intell. 45.12 (2023), pp. 15185–15202.
[29] H. Wu et al. “Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives”. In: Int. Conf. on Computer Vision. Paris, France: IEEE, 2023, pp. 20087–20097.
[30] H. Wu et al. “Q-ALIGN: teaching LMMs for visual scoring via discrete text-defined levels”. In: Proc. of the 41st Int. Conf. Machine Learning. Vol. 235. Vienna, Austria: JMLR.org, 2024, pp. 54015–54029.
[31] R. R. Ramachandra Rao, S. Göring, and A. Raake. “AVQBits—Adaptive Video Quality Model Based on Bitstream Information for Various Video Applications”. In: IEEE Access 10 (2022), pp. 80321–80351.
[32] ITU-T. P.1401: Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models. 2020.
