
arxiv: 2605.15800 · v1 · pith:Y4RZ7HY2new · submitted 2026-05-15 · 📡 eess.IV · cs.ET · cs.MM · eess.SP

Video Quality Evaluation Methodology and Result of AV2 Compression Performance

Pith reviewed 2026-05-19 18:48 UTC · model grok-4.3

classification 📡 eess.IV · cs.ET · cs.MM · eess.SP
keywords AV2, AV1, BD-rate, video compression, common test conditions, adaptive streaming, user-generated content, video quality evaluation

The pith

AV2 delivers bitrate savings of 29.81 percent on PSNR-YUV and 33.79 percent on VMAF versus AV1 under the defined random access tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out the evaluation rules for the AV2 video codec and measures how much better it compresses video than AV1. It uses new test setups that include adaptive streaming with convex hull optimization, user-generated videos, and more color formats to judge quality. The results indicate that AV2 needs roughly 30 percent less data to reach the same picture quality as AV1 under random access encoding. A reader would care because this could mean smoother high-resolution streaming with less internet use or storage space. The methodology is designed to mirror practical uses in media delivery across varied content types.

Core claim

The paper presents the AV2 Common Test Conditions for assessing video quality, which feature convex-hull adaptive streaming configuration, user-generated content, and extended chroma formats. Against the AV1 baseline, AV2 v13.0 achieves BD-rate reductions of 29.81 percent for PSNR-YUV and 33.79 percent for VMAF in random access configuration. This demonstrates the compression efficiency gains of AV2 for next-generation streaming applications.
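To make the headline percentages concrete, the short sketch below converts a BD-rate figure into bitrates. It is only an illustration: BD-rate is an average over a quality range, so applying it to one operating point is an approximation, and the function names and the 5000 kbps starting bitrate are hypothetical, not taken from the paper.

```python
def equivalent_bitrate(anchor_kbps, bd_rate_percent):
    """Bitrate the test codec would need to match the anchor's quality,
    assuming the average BD-rate applies at this operating point."""
    return anchor_kbps * (1.0 + bd_rate_percent / 100.0)


def anchor_overhead_percent(bd_rate_percent):
    """The same gap seen from the other side: how much more bitrate the
    anchor needs than the test codec at equal quality."""
    return (1.0 / (1.0 + bd_rate_percent / 100.0) - 1.0) * 100.0


# Hypothetical 5000 kbps AV1 stream and the reported PSNR-YUV figure.
av2_kbps = equivalent_bitrate(5000.0, -29.81)
av1_extra = anchor_overhead_percent(-29.81)
print(f"AV2 needs about {av2_kbps:.1f} kbps; AV1 needs about {av1_extra:.1f}% more than AV2")
```

The second function shows why a saving near 30 percent is larger than it may sound: read in reverse, AV1 needs over 40 percent more bitrate than AV2 for the same quality.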

What carries the argument

The AV2 Common Test Conditions, which standardize evaluation across convex-hull adaptive streaming, user-generated content, and extended chroma formats to quantify performance against AV1.
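As a rough illustration of the convex-hull idea, the sketch below pools rate-quality points from encodes at several resolutions and keeps only those on the upper convex hull, which is the set an adaptive streaming ladder would draw from. This is a generic monotone-chain hull, not the procedure defined in the AV2 CTC; the function names and all data points are hypothetical.

```python
def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rate_quality_hull(points):
    """Return the upper convex hull of (rate, quality) points,
    truncated at the highest-quality point."""
    # Keep only the best quality seen at each rate.
    best = {}
    for rate, quality in points:
        best[rate] = max(best.get(rate, quality), quality)
    pts = sorted(best.items())

    hull = []
    for p in pts:
        # Drop points that fall on or below the chord to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)

    # Discard any tail where extra bitrate no longer improves quality.
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[: peak + 1]


# Hypothetical (kbps, quality score) points for two encode resolutions.
encodes_540p = [(300, 55), (600, 68), (1200, 72), (2400, 74)]
encodes_1080p = [(600, 60), (1200, 70), (2400, 88), (4800, 93)]
hull = rate_quality_hull(encodes_540p + encodes_1080p)
print(hull)
```

In this made-up data the lower resolution wins at low bitrates and the higher resolution wins at high bitrates, so the hull switches resolution partway up the ladder. Comparing codecs hull against hull is what distinguishes this setup from a fixed-resolution test.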

If this is right

  • AV2 can reach equivalent video quality at lower bitrates than AV1 across the tested configurations.
  • Convex-hull adaptive streaming tests provide a more realistic measure of performance in variable bitrate delivery.
  • The gains extend to user-generated content, showing applicability beyond professional video sources.
  • Extended chroma format support broadens the codec's usefulness in modern production pipelines.
  • The reported savings position AV2 as a direct efficiency upgrade for bandwidth-sensitive streaming.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These test outcomes could guide encoding choices when services decide whether to adopt AV2 for large-scale video libraries.
  • The emphasis on user-generated content may expose relative strengths of AV2 when source material varies in quality and resolution.
  • The same test framework could serve as a baseline for measuring any subsequent codec against AV2.
  • Savings observed in controlled tests may shift in practice once hardware-specific encoders and network conditions enter the picture.

Load-bearing premise

The test sequences and configurations chosen for the Common Test Conditions represent typical real-world video content and usage patterns.

What would settle it

Applying the identical BD-rate calculations and quality metrics to a separate collection of production video clips drawn from commercial streaming services.

Original abstract

The Alliance for Open Media (AOMedia) has developed the AV2 video coding standard to supersede AV1, aiming for substantial compression efficiency gains across diverse media applications. This paper details the quality and performance evaluation methodology defined in the AV2 Common Test Conditions (CTC), which introduces new evaluation methods and content, including convex-hull-based adaptive streaming (AS) configuration, user-generated content (UGC), and extended chroma formats. We present the coding gains of the AV2 (v13.0) against the AV1 baseline. Experimental results show that AV2 achieves significant Bjøntegaard-Delta Rate (BD-rate) reductions of 29.81% and 33.79% for PSNR-YUV and VMAF, respectively, under random access configuration, validating the efficiency of AV2 for next-generation streaming applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript describes the quality evaluation methodology in the AV2 Common Test Conditions (CTC), which incorporates convex-hull adaptive streaming, user-generated content, and extended chroma formats. It reports empirical BD-rate savings of AV2 v13.0 versus an AV1 baseline, specifically 29.81% for PSNR-YUV and 33.79% for VMAF under random access configuration, and positions these as validation for next-generation streaming applications.

Significance. If the CTC-defined results hold, the work supplies concrete, configuration-specific compression gain figures that document AV2's efficiency improvements over AV1. The introduction of convex-hull AS and UGC content types broadens the evaluation scope beyond prior standards testing.

major comments (1)
  1. [Abstract] The assertion that the reported BD-rate reductions validate AV2 efficiency for next-generation streaming applications rests on the unexamined premise that the CTC sequences and configurations (convex-hull AS, UGC, extended chroma) are representative of real-world workloads. No sensitivity analysis, cross-validation on external datasets, or generalization study is supplied to support this extrapolation from the chosen test set.
minor comments (1)
  1. The abstract states precise BD-rate percentages without reference to the number of sequences, confidence intervals, or explicit verification steps for the CTC setup.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the abstract. We respond to the major comment below and indicate the revision we will make.

Point-by-point responses
  1. Referee: [Abstract] The assertion that the reported BD-rate reductions validate AV2 efficiency for next-generation streaming applications rests on the unexamined premise that the CTC sequences and configurations (convex-hull AS, UGC, extended chroma) are representative of real-world workloads. No sensitivity analysis, cross-validation on external datasets, or generalization study is supplied to support this extrapolation from the chosen test set.

    Authors: We acknowledge that the manuscript does not include sensitivity analyses, cross-validation on external datasets, or explicit generalization studies. The AV2 CTC were developed through the AOMedia standardization process specifically to capture representative workloads for next-generation video applications, incorporating UGC, convex-hull adaptive streaming, and extended chroma formats that reflect practical streaming scenarios. The reported BD-rate figures are therefore presented as results obtained under these standardized, community-defined conditions rather than as a direct claim of universal real-world performance. To address the referee's concern, we will revise the abstract to state that the observed gains demonstrate AV2 efficiency under the AV2 CTC, which are designed to be representative of relevant streaming workloads.

    revision: partial

Circularity Check

0 steps flagged

Empirical BD-rate results from codec runs against external AV1 baseline show no derivation circularity

Full rationale

The paper reports measured coding gains of AV2 v13.0 versus AV1 on the defined CTC sequences, using standard Bjøntegaard-Delta Rate computation on RD curves obtained from actual encoder runs. These numerical results (29.81% PSNR-YUV and 33.79% VMAF under random access) are direct empirical outputs, not quantities derived from equations or parameters fitted inside the paper. The CTC methodology (including convex-hull AS and UGC) is referenced as the evaluation framework, but the performance deltas are independent measurements against an external baseline codec. No self-definitional loops, fitted-input predictions, or load-bearing self-citations reduce the headline claims to the paper's own inputs. Minor reference to prior CTC work does not make the reported savings circular.
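For readers unfamiliar with how a BD-rate number comes out of RD curves, here is a minimal sketch. It assumes piecewise-linear interpolation of log-bitrate over quality, which is a simplification chosen to keep the example dependency-free; the classic Bjøntegaard method fits a cubic polynomial, and the interpolation the paper actually uses is not stated in this summary. All names and data points are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over strictly increasing xs."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside interpolation range")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Average log-bitrate over the quality interval [lo, hi] (trapezoidal rule)."""
    pts = sorted(points, key=lambda p: p[1])
    quality = [q for _, q in pts]
    log_rate = [math.log(r) for r, _ in pts]
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = min(lo + k * h, hi)  # guard against floating-point overshoot
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * _interp(x, quality, log_rate)
    return total * h / (hi - lo)


def bd_rate(anchor, test):
    """Percent bitrate change of `test` versus `anchor` at equal quality.
    Inputs are lists of (bitrate, quality) with distinct quality values.
    Negative results mean the test codec saves bitrate."""
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical RD points: the test codec needs half the bitrate at every quality.
anchor_curve = [(1000.0, 34.0), (2000.0, 37.0), (4000.0, 40.0), (8000.0, 42.5)]
test_curve = [(rate * 0.5, q) for rate, q in anchor_curve]
print(round(bd_rate(anchor_curve, test_curve), 2))
```

Averaging the difference in log-bitrate, rather than in bitrate, is what lets a single percentage summarize several operating points: a constant ratio between the two curves yields exactly that ratio as the result.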

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The evaluation rests on standard video quality metrics and test condition definitions rather than new free parameters or invented entities.

axioms (2)
  • domain assumption: BD-rate accurately summarizes rate-distortion performance across multiple operating points.
    Central to all reported percentage reductions.
  • domain assumption: PSNR-YUV and VMAF are appropriate proxies for visual quality in the tested content classes.
    Used to compute the headline savings figures.


