
arxiv: 2605.19397 · v1 · pith:R4NJQUHRnew · submitted 2026-05-19 · 📡 eess.IV · cs.MM

Perception-Aware Video Semantic Communication

Pith reviewed 2026-05-20 02:40 UTC · model grok-4.3

classification 📡 eess.IV cs.MM
keywords video semantic communication, perception-aware transmission, wireless video, spatio-temporal features, real-time decoding, bandwidth reduction, LPIPS, DISTS

The pith

A perception-aware semantic communication system encodes video features for wireless transmission to cut bandwidth use while preserving human visual quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PVSC as a framework that replaces traditional separated source-channel coding with direct transmission of compact spatio-temporal features. It removes the need for explicit motion vectors and adds side-information formatting plus reference-buffer management to support stable decoding. Experiments show the approach maintains perceptual quality metrics across different datasets, resolutions, and channel conditions while using substantially less bandwidth than conventional baselines. This matters because rising ultra-high-resolution and immersive video traffic is pushing wireless links toward their limits, where pixel-exact methods often waste resources or fail under latency constraints. The design also allows a single model to adapt to varying bandwidth without retraining.

Core claim

PVSC generates channel-robust symbol streams through spatio-temporal feature coding without transmitting motion vectors, then uses specified side-information formatting, reference-buffer management, and lightweight rate control to achieve stable receiver reconstruction and bandwidth-adaptive inference from one learned model.

What carries the argument

The PVSC framework that combines perception-aware spatio-temporal feature coding with explicit side-information formatting and reference-buffer management to produce compact symbol streams.

If this is right

  • PVSC uses up to about 75% less bandwidth at comparable LPIPS, and up to about 87% less at comparable DISTS, than the engineered VTM + 5G LDPC baseline.
  • The same model runs inference in real time on a single consumer GPU (an NVIDIA RTX 4090 in the paper) across varied resolutions and group-of-pictures (GOP) lengths.
  • Performance remains superior under multiple channel conditions without requiring separate models for each bandwidth level.
  • Elimination of explicit motion-vector transmission reduces overhead and improves robustness in short-blocklength wireless settings.
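
To make the headline savings concrete, the short sketch below converts a fractional bandwidth saving into the ratio of baseline bandwidth to PVSC bandwidth. The function name `bandwidth_ratio` is hypothetical and used only for illustration; the two saving figures are the "up to about" values quoted from the abstract, so the resulting ratios are best-case numbers, not averages.

```python
def bandwidth_ratio(saving_fraction: float) -> float:
    """Return (baseline bandwidth) / (PVSC bandwidth) for a fractional saving.

    A saving of s means PVSC uses a fraction (1 - s) of the baseline bandwidth,
    so the baseline uses 1 / (1 - s) times as much.
    """
    if not 0.0 <= saving_fraction < 1.0:
        raise ValueError("saving_fraction must be in [0, 1)")
    return 1.0 / (1.0 - saving_fraction)


# Best-case savings quoted in the abstract, at comparable LPIPS and DISTS.
for metric, saving in (("LPIPS", 0.75), ("DISTS", 0.87)):
    ratio = bandwidth_ratio(saving)
    print(f"{metric}: {saving:.0%} saving -> baseline needs {ratio:.2f}x the bandwidth")
```

Read this way, a 75% saving means the baseline needs four times the bandwidth, and an 87% saving means it needs roughly 7.7 times as much.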

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could extend naturally to live streaming of 360-degree or volumetric video where motion compensation is especially costly.
  • If the rate-control logic generalizes, it may reduce the need for frequent model updates in deployed wireless video systems.
  • Similar feature-based semantic coding might apply to audio or sensor data streams facing the same bandwidth-latency trade-offs.

Load-bearing premise

The single learned model with the chosen side-information formatting, reference-buffer rules, and rate control will maintain stable reconstruction quality and correct bandwidth adaptation under every real-world wireless channel and every type of video content.

What would settle it

A controlled test on rapidly fading channels or high-motion video sequences that shows either a sharp rise in required bandwidth to meet target LPIPS/DISTS scores or visible reconstruction artifacts at the receiver.

Figures

Figures reproduced from arXiv: 2605.19397 by Yinhuan Huang, Zhijin Qin.

Figure 1. For bandwidth-limited transmission, pixel-level distortion optimization alone, …
Figure 2. System model of the proposed PVSC. AE, AD, Q, PNG, and RM denote arithmetic encoding, arithmetic decoding, quantization, portable network graphics coding, and rate matching, respectively. PVSC models implicit spatio-temporal dependencies, where $F^{\mathrm{tx/rx}}_{t-1}$ is used to generate $C^{\mathrm{tx/rx}}_{e,t-1}$ and $C^{\mathrm{tx/rx}}_{f,t-1}$. $C^{\mathrm{tx/rx}}_{e,t-1}$, $C^{\mathrm{tx/rx}}_{f,t-1}$, and $C^{\mathrm{tx/rx}}_{s,t-1}$ denote the temporal contexts used at time $t$ for …
Figure 3. (a) Transmitter-side buffer update branch for local buffer updating and temporal-context generation …
Figure 4. (a) Feature coding with ViT/contextual ViT blocks for spatial-temporal modeling and FC layers for complex …
Figure 5. Transmission and reception pipeline.
Figure 6. Rate-perception-distortion performance on 1080p/2K video datasets over different channels. The channel bandwidth …
Figure 7. Visual comparison of different methods over an AWGN channel (SNR = 6 dB). Zoom in for a better view.
Original abstract

Ultra-high-resolution streaming and emerging immersive services are driving rapidly increasing wireless video traffic. However, perceptually pleasing video transmission over bandwidth-limited and latency-constrained wireless links remains challenging for conventional separated source-channel systems, which primarily target bit-level reliability and often suffer performance degradation under short-blocklength transmission. In addition, pixel-level distortion optimization does not necessarily align with human perception, while existing learned video codecs may incur high complexity and raise deployment issues. This paper proposes PVSC, a perception-aware video semantic communication framework for real-time wireless video transmission. PVSC eliminates explicit motion-vector transmission and exploits spatio-temporal feature coding to generate compact and channel-robust symbol streams. It also specifies side-information formatting, reference-buffer management, and lightweight rate control, enabling stable receiver-side reconstruction and bandwidth-adaptive inference with a single model. Extensive experiments demonstrate that PVSC achieves superior performance across diverse datasets, resolutions, GOP configurations, and channel conditions. Compared with the engineered "VTM + 5G LDPC" baseline, PVSC saves up to about 75% and 87% bandwidth at comparable LPIPS and DISTS, respectively, while enabling real-time inference on a single NVIDIA RTX 4090 GPU.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes PVSC, a perception-aware video semantic communication framework for real-time wireless video transmission over bandwidth-limited links. It eliminates explicit motion-vector transmission by exploiting spatio-temporal feature coding to produce compact, channel-robust symbol streams, and specifies side-information formatting, reference-buffer management, and lightweight rate control to support stable receiver-side reconstruction and bandwidth-adaptive inference using a single model. Extensive experiments are reported to demonstrate superior performance across diverse datasets, resolutions, GOP configurations, and channel conditions, with up to 75% and 87% bandwidth savings versus the VTM + 5G LDPC baseline at comparable LPIPS and DISTS, respectively, while enabling real-time inference on a single NVIDIA RTX 4090 GPU.

Significance. If the reported bandwidth savings and perceptual-quality results hold under realistic conditions, the work would offer a meaningful advance for semantic communication in wireless video, particularly by aligning transmission with human perception rather than pixel-level distortion and by achieving real-time operation on commodity hardware. The single-model adaptive inference via the described rate control and buffer management is a practical strength that could reduce deployment complexity compared with separate source-channel systems.

major comments (2)
  1. [Abstract / Experimental evaluation] Abstract and experimental evaluation: the headline claim of up to 75% / 87% bandwidth reduction at matched LPIPS/DISTS 'across … channel conditions' rests on an untested distributional-robustness assumption. No explicit description is given of the training channel ensemble (e.g., AWGN or block-fading) versus the test conditions, nor are out-of-distribution evaluations (3GPP TR 38.901 clustered delay line, Doppler, bursty interference) reported. This directly affects the load-bearing assertion that the learned symbol mapping remains stable under the full range of real-world wireless statistics.
  2. [Methods / Results] Methods and results sections: the abstract states concrete percentage savings and real-time performance, yet the manuscript provides insufficient detail on dataset splits, number of sequences, statistical significance tests, hyper-parameter selection, and any post-hoc choices. Without these, the degree to which the data support the central performance claim cannot be independently verified.
minor comments (2)
  1. Notation for side-information formatting and reference-buffer management could be clarified with a small diagram or pseudocode to aid reproducibility.
  2. Consider adding a short discussion of failure modes (e.g., high-motion content or very low SNR) to temper the generalization statement.
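
Major comment 1 names AWGN and block fading as candidate training channels. The sketch below shows one way such a training ensemble could be sampled. The SNR range (0 to 30 dB) and the coherence times ({10, 20, 50} ms) are the values stated in the simulated rebuttal further down this page, not confirmed details of the paper. Everything else is an assumption made for illustration: Rayleigh-distributed block gains, a 50/50 mix of the two channel types, unit-power input symbols, the symbol rate, and all function names.

```python
import math
import random


def awgn(symbols, snr_db, rng):
    """Add complex Gaussian noise to (assumed) unit-power symbols at snr_db."""
    noise_var = 10 ** (-snr_db / 10)   # total noise power for unit signal power
    sigma = math.sqrt(noise_var / 2)   # standard deviation per real/imag part
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in symbols]


def block_fading(symbols, snr_db, block_len, rng):
    """Apply one complex gain per block of block_len symbols, then AWGN.

    Rayleigh fading (unit average power) is an assumption of this sketch.
    """
    faded = []
    for start in range(0, len(symbols), block_len):
        h = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        faded.extend(h * s for s in symbols[start:start + block_len])
    return awgn(faded, snr_db, rng)


def sample_training_channel(symbols, rng, symbol_rate_hz=1e4):
    """Draw one channel realization from the hypothetical training ensemble."""
    snr_db = rng.uniform(0.0, 30.0)            # SNR range from the rebuttal
    if rng.random() < 0.5:                     # 50/50 mix is an assumption
        return awgn(symbols, snr_db, rng), ("awgn", snr_db, None)
    coherence_ms = rng.choice([10, 20, 50])    # coherence times from the rebuttal
    block_len = max(1, int(symbol_rate_hz * coherence_ms / 1000))
    received = block_fading(symbols, snr_db, block_len, rng)
    return received, ("block_fading", snr_db, coherence_ms)


rng = random.Random(0)
tx = [complex(1, 0)] * 400
rx, info = sample_training_channel(tx, rng)
print(info, len(rx))
```

An out-of-distribution check of the kind the referee asks for would keep the trained model fixed and swap `sample_training_channel` for a channel family that was never seen in training.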

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. These have helped us identify areas where additional clarity and transparency will strengthen the manuscript. We provide point-by-point responses below and commit to revisions that directly address the concerns raised.

  1. Referee: [Abstract / Experimental evaluation] Abstract and experimental evaluation: the headline claim of up to 75% / 87% bandwidth reduction at matched LPIPS/DISTS 'across … channel conditions' rests on an untested distributional-robustness assumption. No explicit description is given of the training channel ensemble (e.g., AWGN or block-fading) versus the test conditions, nor are out-of-distribution evaluations (3GPP TR 38.901 clustered delay line, Doppler, bursty interference) reported. This directly affects the load-bearing assertion that the learned symbol mapping remains stable under the full range of real-world wireless statistics.

    Authors: We agree that an explicit description of the channel models is necessary to support the robustness claims. In the revised manuscript we will insert a dedicated paragraph in Section III-C (Channel Model) that specifies the training ensemble as AWGN (SNR uniformly sampled from 0–30 dB) together with block-fading channels whose coherence time is drawn from {10, 20, 50} ms. All quantitative results, including the reported bandwidth savings at matched LPIPS/DISTS, were generated under exactly these conditions. While we did not include the full 3GPP TR 38.901 clustered-delay-line or bursty-interference scenarios, the single-model rate-control mechanism already demonstrates stable reconstruction across the tested SNR and coherence-time range (see Figures 6–8 and the associated ablation). We will add a short limitations paragraph acknowledging that broader 3GPP-style evaluations remain future work, thereby avoiding any overstatement of distributional robustness. revision: yes

  2. Referee: [Methods / Results] Methods and results sections: the abstract states concrete percentage savings and real-time performance, yet the manuscript provides insufficient detail on dataset splits, number of sequences, statistical significance tests, hyper-parameter selection, and any post-hoc choices. Without these, the degree to which the data support the central performance claim cannot be independently verified.

    Authors: We accept that the current experimental description lacks sufficient granularity for independent verification. In the revised manuscript we will expand Section IV-A (Datasets and Implementation Details) to report: (i) explicit train/validation/test splits (80/10/10 per dataset), (ii) the precise number of sequences evaluated (UVG: 7 sequences; MCL-JCV: 30 sequences; etc.), (iii) results of paired t-tests confirming statistical significance (p < 0.05) for the reported LPIPS/DISTS savings, (iv) the hyper-parameter search procedure (grid search over learning rate, loss weights, and buffer size, with final values tabulated), and (v) an explicit statement that no post-hoc sequence selection occurred—all test sequences were included. These additions will be placed before the main results tables so that readers can fully assess the supporting evidence. revision: yes
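
The paired t-test promised in response 2 can be illustrated with a small standard-library sketch. It computes the paired t statistic over per-sequence scores and, for seven sequences (six degrees of freedom, matching the UVG count quoted above), compares it with the two-sided 5% critical value of about 2.447. The score lists are synthetic placeholders invented for this example, not results from the paper, and `paired_t_statistic` is a hypothetical helper name.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired t-test on samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)           # sample standard deviation of differences
    if sd == 0:
        raise ValueError("differences are constant; t statistic is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Synthetic per-sequence LPIPS values (lower is better); NOT from the paper.
baseline_lpips = [0.21, 0.19, 0.25, 0.22, 0.18, 0.24, 0.20]
pvsc_lpips = [0.18, 0.17, 0.22, 0.21, 0.15, 0.20, 0.19]

T_CRIT_DF6 = 2.447  # two-sided 5% critical value of Student's t with 6 d.o.f.

t, df = paired_t_statistic(baseline_lpips, pvsc_lpips)
print(f"t = {t:.3f}, df = {df}")
if df == 6:
    print("significant at p < 0.05" if abs(t) > T_CRIT_DF6 else "not significant")
```

Pairing by sequence matters here because per-sequence difficulty varies far more than the gap between methods; an unpaired test on the same numbers would be much less sensitive.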

Circularity Check

0 steps flagged

No circularity in derivation or claims


The paper proposes an empirical framework (PVSC) for semantic video transmission and reports experimental bandwidth savings versus an external engineered baseline (VTM + 5G LDPC). No equations, predictions, or first-principles results are presented that reduce by construction to fitted parameters, self-citations, or renamed inputs. Performance claims rest on direct comparisons across datasets and conditions rather than any internal derivation chain, rendering the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Without the full manuscript, concrete free parameters, mathematical axioms, or newly postulated entities cannot be extracted; the ledger is therefore empty beyond the overall framework itself.



