Neural Video Compression with Domain Transfer

Haofeng Wang; Qi Zhang; Rongqun Lin; Siwei Ma; Tiange Zhang; Xiandong Meng; Xing Tian

arxiv: 2605.13476 · v1 · pith:K3IB4553new · submitted 2026-05-13 · 💻 cs.CV

Neural Video Compression with Domain Transfer

Tiange Zhang , Rongqun Lin , Xiandong Meng , Haofeng Wang , Xing Tian , Qi Zhang , Siwei Ma This is my paper

Pith reviewed 2026-05-14 20:17 UTC · model grok-4.3

classification 💻 cs.CV

keywords neural video compressiondomain transferlatent adaptationrate-distortion optimizationgeneralization

0 comments

The pith

Neural video codecs can close most of their training-test performance gap by adapting only the latent code at inference time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a domain transfer mechanism that adapts the encoded latent representation during inference to reduce the performance loss caused by mismatches between training and test video distributions. This adaptation occurs without any changes to the encoder or decoder parameters. The method pairs the transfer step with a frame-level dynamic rate-distortion adjustment that varies the loss weighting according to observed quality changes. Experiments report bitrate reductions and improved behavior on content outside the training distribution.

Core claim

DCVC-DT achieves up to 6.21% bitrate savings over the baseline DCVC-DC by using a lightweight online domain transfer that adapts the latent representation and a frame-level dynamic RD adjustment, leading to improved generalization and reduced error propagation.

What carries the argument

lightweight online domain transfer (DT) mechanism that dynamically adapts the encoded latent representation during inference without changing encoder or decoder parameters

Load-bearing premise

The lightweight online domain transfer mechanism can reliably bridge the domain gap between training and test distributions by adapting only the latent representation without any change to encoder or decoder parameters.

What would settle it

Apply the method to video sequences drawn from a domain shift larger than those tested in the paper and check whether the reported bitrate savings and generalization gains remain, shrink, or reverse.

Figures

Figures reproduced from arXiv: 2605.13476 by Haofeng Wang, Qi Zhang, Rongqun Lin, Siwei Ma, Tiange Zhang, Xiandong Meng, Xing Tian.

**Figure 1.** Figure 1: Overall framework of DCVC-DT. We integrate the proposed Online Domain Transfer (DT) mechanism and Dynamic RD Adjustment strategy into [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Illustration of the proposed Online Latent Refinement and Frame-level Dynamic RD Adjustment modules. During inference, the encoder iteratively [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: R–D performance comparison on HEVC Class C and D datasets [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: The first 32 frames of the BasketballPass sequence from the HEVC [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

read the original abstract

Content-adaptive compression has always been a key direction in neural video coding (NVC), aiming to mitigate the domain gap between training and testing data. Such gaps often arise from distributional discrepancies between training and inference data, which may cause noticeable performance degradation when the testing content differs from the training distribution. To tackle this challenge, we propose DCVC-DT, a domain transfer enhanced neural video compression framework. Specifically, we design a lightweight online domain transfer (DT) mechanism that dynamically adapts the encoded latent representation during inference, effectively bridging the domain gap without modifying the encoder or decoder parameters. In addition, we develop a frame-level dynamic RD (Rate and Distortion) adjustment scheme that actively regulates the ratio of R and D in the loss function based on quality fluctuation, thereby improving rate-distortion performance. Extensive experiments demonstrate that DCVC-DT achieves up to 6.21% bitrate savings over the baseline DCVC-DC, while significantly enhancing generalization to unseen testing data and alleviating error propagation. Our code is available at https://github.com/SunnyMass/DCVC-DT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DCVC-DT adds online latent adaptation and per-frame RD tuning to DCVC-DC, yielding modest bitrate savings and better out-of-distribution behavior with released code.

read the letter

DCVC-DT takes the DCVC-DC baseline and adds a lightweight online step that adapts only the latent representation at inference time, plus a frame-level scheme that shifts the rate-distortion balance according to observed quality changes. The reported outcome is up to 6.21% bitrate reduction on standard benchmarks, plus clearer gains on unseen content and less error accumulation across frames. The encoder and decoder stay frozen, so the method avoids full retraining or parameter updates. Implementation details cover the adaptation objective and iteration count, and the code is public, which lets others check the numbers directly. The dynamic RD adjustment reacts to per-frame fluctuations rather than using a fixed tradeoff, which matches how real video content varies. These pieces address a practical issue in neural video coding where training distributions often diverge from test material. The gains are incremental, not dramatic, and the paper would benefit from explicit measurements of the added latency or compute from the online iterations. Generalization results rest on common test sets, so performance on more extreme domain shifts remains open. No internal contradictions appear in the construction, and the claims tie to concrete RD curves rather than circular fitting. This work suits researchers and engineers focused on deployable neural codecs who need simple ways to handle content variation without retraining the whole model. A reader already working on DCVC variants or domain adaptation in compression would extract usable ideas from the mechanism and the released implementation. It deserves peer review because the core method is reproducible and the empirical claims are falsifiable on public data.

Referee Report

0 major / 3 minor

Summary. The paper proposes DCVC-DT, a neural video compression framework that adds a lightweight online domain transfer (DT) mechanism to adapt the encoded latent representation at inference time (without changing encoder or decoder parameters) and a frame-level dynamic rate-distortion adjustment scheme driven by observed quality fluctuation. It reports up to 6.21% bitrate savings versus the DCVC-DC baseline, improved generalization on unseen test content, and reduced error propagation, with code released.

Significance. If the empirical claims hold on standard benchmarks, the work offers a practical, low-overhead route to closing train/test domain gaps in neural video coding by operating only on latents; the explicit implementation details and code release make the headline result directly falsifiable and potentially useful for deployment scenarios with distribution shift.

minor comments (3)

[Abstract] Abstract: the headline 6.21% figure and generalization claims are stated without naming the test sets, metrics, or number of sequences; adding one sentence with these details would strengthen the summary.
[Section 3.2] The dynamic RD adjustment is described as regulating the R/D ratio based on quality fluctuation, but the precise mapping from fluctuation measure to lambda (or equivalent) is not shown in equation form; a short derivation or pseudocode would clarify reproducibility.
[Section 4] Table or figure presenting the RD curves: confirm that all comparisons use identical GOP structures, intra-periods, and the same set of test sequences (e.g., UVG, HEVC Class B/C) so that the 6.21% savings can be directly verified.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation of minor revision. We appreciate the recognition of the practical benefits of the lightweight online domain transfer mechanism and the frame-level dynamic RD adjustment in DCVC-DT, as well as the value of the code release for reproducibility.

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper's central claims rest on an empirical framework: a lightweight online domain transfer mechanism that adapts only the latent representation during inference (leaving encoder/decoder frozen) plus a frame-level dynamic RD adjustment derived from observed quality fluctuations. No equations or self-citations are presented that reduce the reported 6.21% bitrate savings or generalization improvements to fitted parameters by construction. The mechanism is described with concrete implementation details (adaptation objective, iteration count) and tied to falsifiable RD curves on standard benchmarks, with code released. The derivation chain is self-contained against external evaluation and does not invoke load-bearing self-citations or ansatzes that collapse to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond standard neural-network training assumptions.

pith-pipeline@v0.9.0 · 5496 in / 1018 out tokens · 41130 ms · 2026-05-14T20:17:29.373470+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Lt = β_t · R + w_t · λ · D(x_t, ˆx_t); y'_t = y_t − α ∂Lt/∂y_t; β_t = β_0 − 0.2·sign(ΔQ_t)
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

lightweight online domain transfer mechanism that dynamically adapts the encoded latent representation during inference, effectively bridging the domain gap without modifying the encoder or decoder parameters

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

[1]

Neural video compression with diverse contexts,

J. Li, B. Li, and Y . Lu, “Neural video compression with diverse contexts,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22 616–22 626

work page 2023
[2]

Vqnerv: Vector quantization neural representation for video compression,

G. Zhang, L. Tang, and X. Zhang, “Vqnerv: Vector quantization neural representation for video compression,” inIEEE International Symposium on Circuits and Systems, 2024

work page 2024
[3]

Efvc: Error-propagation-free neural video coding with reversible transform,

J. Liao, L. Li, D. Liu, and H. Li, “Efvc: Error-propagation-free neural video coding with reversible transform,” inIEEE International Sympo- sium on Circuits and Systems, 2025

work page 2025
[4]

Bi-directional deep contextual video compression,

X. Sheng, L. Li, D. Liu, and S. Wang, “Bi-directional deep contextual video compression,”IEEE Transactions on Multimedia, 2025

work page 2025
[5]

L-lbvc: Long- term motion estimation and prediction for learned bi-directional video compression,

Y . Zhai, L. Tang, W. Jiang, J. Yang, and R. Wang, “L-lbvc: Long- term motion estimation and prediction for learned bi-directional video compression,” inIEEE Data Compression Conference, 2025, pp. 53–62

work page 2025
[6]

Emerging advances in learned video compression: Models, systems and beyond,

C. Jia, F. Ye, S. Ma, W. Gao, H. Sun, and L. Chiariglione, “Emerging advances in learned video compression: Models, systems and beyond,” inInternational Joint Conference on Artificial Intelligence, 2025

work page 2025
[7]

Deep video compression with scaled hier- archical bi-directional motion model,

F. Ye, L. Zhang, and C. Jia, “Deep video compression with scaled hier- archical bi-directional motion model,” inACM International Conference on Multimedia, 2024, pp. 11 244–11 247

work page 2024
[8]

Multiple hypotheses based motion compensation for learned video compression,

R. Lin, M. Wang, P. Zhang, S. Wang, and S. Kwong, “Multiple hypotheses based motion compensation for learned video compression,” Neurocomputing, vol. 548, p. 126396, 2023

work page 2023
[9]

Generative coding: Promise and challenges,

S. Ma, S. Song, B. Chen, Q. Mao, X. Fang, C. Jia, and S. Wang, “Generative coding: Promise and challenges,”APSIPA Transactions on Signal and Information Processing, vol. 14, 2025

work page 2025
[10]

Content adaptive based motion alignment framework for learned video compression,

T. Zhang, X. Meng, and S. Ma, “Content adaptive based motion alignment framework for learned video compression,” inarXiv preprint arXiv:2512.12936, 2025

work page arXiv 2025
[11]

L-stec: Learned video compression with long-term spatio-temporal enhanced context,

T. Zhang, Z. Huang, X. Meng, K. Zhang, Z. Deng, and S. Ma, “L-stec: Learned video compression with long-term spatio-temporal enhanced context,” inarXiv preprint arXiv:2512.12790, 2025

work page arXiv 2025
[12]

Learned image compression with large capacity and low redundancy of latent representation,

X. Meng, S. Zhu, S. Ma, and B. Zeng, “Learned image compression with large capacity and low redundancy of latent representation,” in IEEE International Conference on Image Processing, 2023, pp. 1640– 1644

work page 2023
[13]

Ecvc: Exploiting non-local correlations in multiple frames for contextual video compression,

W. Jiang, J. Li, K. Zhang, and L. Zhang, “Ecvc: Exploiting non-local correlations in multiple frames for contextual video compression,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition Conference, 2025, pp. 7331–7341

work page 2025
[14]

Spatial decomposition and temporal fusion based inter prediction for learned video compression,

X. Sheng, L. Li, D. Liu, and H. Li, “Spatial decomposition and temporal fusion based inter prediction for learned video compression,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 7, pp. 6460–6473, 2024

work page 2024
[15]

Hybrid local-global context learning for neural video compression,

Y . Zhai, J. Yang, W. Jiang, C. Yang, L. Tang, and R. Wang, “Hybrid local-global context learning for neural video compression,” inIEEE Data Compression Conference, 2024, pp. 322–331

work page 2024
[16]

Prediction and reference quality adaptation for learned video compression,

X. Sheng, L. Li, D. Liu, and H. Li, “Prediction and reference quality adaptation for learned video compression,”IEEE Transactions on Image Processing, 2025

work page 2025
[17]

Biecvc: Gated diversifica- tion of bidirectional contexts for learned video compression,

W. Jiang, J. Li, K. Zhang, and L. Zhang, “Biecvc: Gated diversifica- tion of bidirectional contexts for learned video compression,” inACM International Conference on Multimedia, 2025, pp. 7248–7257

work page 2025
[18]

Dmvc: Decomposed motion modeling for learned video compression,

K. Lin, C. Jia, X. Zhang, S. Wang, S. Ma, and W. Gao, “Dmvc: Decomposed motion modeling for learned video compression,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 7, pp. 3502–3515, 2022

work page 2022
[19]

Dy- namic temporal reference aggregation for neural video compression,

S. Liao, K. Feng, Z. Huang, S. Ma, Q. Wang, L. Chen, and C. Jia, “Dy- namic temporal reference aggregation for neural video compression,” in IEEE Data Compression Conference, 2025, pp. 13–22

work page 2025
[20]

Neural video com- pression with dynamic temporal context mining,

S. Liao, K. Feng, C. Jia, Z. Huang, and S. Ma, “Neural video com- pression with dynamic temporal context mining,” inIEEE International Conference on Wireless Communications and Signal Processing, 2024, pp. 1337–1342

work page 2024
[21]

Staco: Spatio-temporal adaptive context optimization for neural video compression,

K. Feng, S. Liao, Z. Huang, C. Jia, Q. Wang, L. Chen, S. Ma, and W. Gao, “Staco: Spatio-temporal adaptive context optimization for neural video compression,” inIEEE Data Compression Conference. IEEE, 2025, pp. 365–365

work page 2025
[22]

Rate-quality based rate control model for neural video compression,

S. Liao, C. Jia, H. Fan, J. Yan, and S. Ma, “Rate-quality based rate control model for neural video compression,” inIEEE International Conference on Acoustics, Speech and Signal Processing, 2024, pp. 4215–4219

work page 2024
[23]

Advanced learning-based coding tools for ecm: Intra prediction and in-loop filtering,

Y . Zhao, J. Fu, Z. Li, Q. Wang, Z. Huang, J. Zhang, C. Jia, and S. Ma, “Advanced learning-based coding tools for ecm: Intra prediction and in-loop filtering,” inIEEE International Symposium on Circuits and Systems, 2025

work page 2025
[24]

Chnerv: Condition enhanced hybrid neural representation for videos,

T. Zhang, H. Wang, X. Meng, K. Zhang, X. Deng, Z. Huang, and S. Ma, “Chnerv: Condition enhanced hybrid neural representation for videos,” inIEEE International Conference on Acoustics, Speech and Signal Processing, 2026, pp. 8377–8381

work page 2026
[25]

Overview of the h. 264/avc video coding standard,

T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the h. 264/avc video coding standard,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, 2003

work page 2003
[26]

Overview of the high efficiency video coding (hevc) standard,

G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (hevc) standard,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649– 1668, 2012

work page 2012
[27]

Overview of the versatile video coding (vvc) standard and its applications,

B. Bross, Y .-K. Wang, Y . Ye, S. Liu, J. Chen, G. J. Sullivan, and J.- R. Ohm, “Overview of the versatile video coding (vvc) standard and its applications,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736–3764, 2021

work page 2021
[28]

Content adaptive optimization for neural image compression,

J. Campos, S. Meierhans, A. Djelouah, and C. Schroers, “Content adaptive optimization for neural image compression,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019

work page 2019
[29]

Improving inference for neural image compression,

Y . Yang, R. Bamler, and S. Mandt, “Improving inference for neural image compression,” inAdvances in Neural Information Processing Systems, vol. 33, 2020, pp. 573–584

work page 2020
[30]

Overfitting for fun and profit: Instance-adaptive data compression,

T. van Rozendaal, I. Huijben, and T. S. Cohen, “Overfitting for fun and profit: Instance-adaptive data compression,” inInternational Conference on Learning Representations, 2021

work page 2021
[31]

Universal deep image com- pression via content-adaptive optimization with adapters,

K. Tsubota, H. Akutsu, and K. Aizawa, “Universal deep image com- pression via content-adaptive optimization with adapters,” inIEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 2529– 2538

work page 2023
[32]

Content adaptive latents and decoder for neural image compression,

G. Pan, G. Lu, Z. Hu, and D. Xu, “Content adaptive latents and decoder for neural image compression,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 556–573

work page 2022
[33]

Offline and online optical flow enhancement for deep video compression,

C. Tang, X. Sheng, Z. Li, H. Zhang, L. Li, and D. Liu, “Offline and online optical flow enhancement for deep video compression,” inAAAI Conference on Artificial Intelligence, vol. 38, no. 6, 2024, pp. 5118– 5126

work page 2024
[34]

Group-aware parameter-efficient updating for content-adaptive neural video compression,

Z. Chen, L. Zhou, Z. Hu, and D. Xu, “Group-aware parameter-efficient updating for content-adaptive neural video compression,” inACM Inter- national Conference on Multimedia, 2024, pp. 11 022–11 031

work page 2024
[35]

Deep contextual video compression,

J. Li, B. Li, and Y . Lu, “Deep contextual video compression,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 18 114–18 125

work page 2021
[36]

Temporal context min- ing for learned video compression,

X. Sheng, J. Li, B. Li, L. Li, D. Liu, and Y . Lu, “Temporal context min- ing for learned video compression,”IEEE Transactions on Multimedia, vol. 25, pp. 7311–7322, 2022

work page 2022
[37]

Hybrid spatial-temporal entropy modelling for neural video compression,

J. Li, B. Li, and Y . Lu, “Hybrid spatial-temporal entropy modelling for neural video compression,” inACM International Conference on Multimedia, 2022, pp. 1503–1511

work page 2022
[38]

Towards practical real-time neural video compression,

Z. Jia, B. Li, J. Li, W. Xie, L. Qi, H. Li, and Y . Lu, “Towards practical real-time neural video compression,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, pp. 12 543–12 552

work page 2025
[39]

Neural video compression with feature modulation,

J. Li, B. Li, and Y . Lu, “Neural video compression with feature modulation,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 26 099–26 108

work page 2024
[40]

Content adaptive and error propagation aware deep video compression,

G. Lu, C. Cai, X. Zhang, L. Chen, W. Ouyang, D. Xu, and Z. Gao, “Content adaptive and error propagation aware deep video compression,” inEuropean Conference on Computer Vision. Springer, 2020, pp. 456– 472

work page 2020
[41]

Frame level content adaptiveλfor neural video compression,

Z. Zuo, J. Liao, X. Song, Z. Liu, H. Zheng, and D. Liu, “Frame level content adaptiveλfor neural video compression,” inIEEE International Conference on Visual Communications and Image Processing, 2024

work page 2024
[42]

Content-adaptive infer- ence for state-of-the-art learned video compression,

A. Bilican, M. A. Yılmaz, and A. M. Tekalp, “Content-adaptive infer- ence for state-of-the-art learned video compression,”IEEE Open Journal of Signal Processing, 2025

work page 2025
[43]

Common test conditions and software reference configura- tions,

F. Bossen, “Common test conditions and software reference configura- tions,” in3rd. JCT-VC Meeting, Guangzhou, CN, October 2010, 2010

work page 2010
[44]

Calculation of average psnr differences between rd- curves,

G. Bjontegaard, “Calculation of average psnr differences between rd- curves,”ITU-T SG16, Doc. VCEG-M33, 2001

work page 2001

[1] [1]

Neural video compression with diverse contexts,

J. Li, B. Li, and Y . Lu, “Neural video compression with diverse contexts,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22 616–22 626

work page 2023

[2] [2]

Vqnerv: Vector quantization neural representation for video compression,

G. Zhang, L. Tang, and X. Zhang, “Vqnerv: Vector quantization neural representation for video compression,” inIEEE International Symposium on Circuits and Systems, 2024

work page 2024

[3] [3]

Efvc: Error-propagation-free neural video coding with reversible transform,

J. Liao, L. Li, D. Liu, and H. Li, “Efvc: Error-propagation-free neural video coding with reversible transform,” inIEEE International Sympo- sium on Circuits and Systems, 2025

work page 2025

[4] [4]

Bi-directional deep contextual video compression,

X. Sheng, L. Li, D. Liu, and S. Wang, “Bi-directional deep contextual video compression,”IEEE Transactions on Multimedia, 2025

work page 2025

[5] [5]

L-lbvc: Long- term motion estimation and prediction for learned bi-directional video compression,

Y . Zhai, L. Tang, W. Jiang, J. Yang, and R. Wang, “L-lbvc: Long- term motion estimation and prediction for learned bi-directional video compression,” inIEEE Data Compression Conference, 2025, pp. 53–62

work page 2025

[6] [6]

Emerging advances in learned video compression: Models, systems and beyond,

C. Jia, F. Ye, S. Ma, W. Gao, H. Sun, and L. Chiariglione, “Emerging advances in learned video compression: Models, systems and beyond,” inInternational Joint Conference on Artificial Intelligence, 2025

work page 2025

[7] [7]

Deep video compression with scaled hier- archical bi-directional motion model,

F. Ye, L. Zhang, and C. Jia, “Deep video compression with scaled hier- archical bi-directional motion model,” inACM International Conference on Multimedia, 2024, pp. 11 244–11 247

work page 2024

[8] [8]

Multiple hypotheses based motion compensation for learned video compression,

R. Lin, M. Wang, P. Zhang, S. Wang, and S. Kwong, “Multiple hypotheses based motion compensation for learned video compression,” Neurocomputing, vol. 548, p. 126396, 2023

work page 2023

[9] [9]

Generative coding: Promise and challenges,

S. Ma, S. Song, B. Chen, Q. Mao, X. Fang, C. Jia, and S. Wang, “Generative coding: Promise and challenges,”APSIPA Transactions on Signal and Information Processing, vol. 14, 2025

work page 2025

[10] [10]

Content adaptive based motion alignment framework for learned video compression,

T. Zhang, X. Meng, and S. Ma, “Content adaptive based motion alignment framework for learned video compression,” inarXiv preprint arXiv:2512.12936, 2025

work page arXiv 2025

[11] [11]

L-stec: Learned video compression with long-term spatio-temporal enhanced context,

T. Zhang, Z. Huang, X. Meng, K. Zhang, Z. Deng, and S. Ma, “L-stec: Learned video compression with long-term spatio-temporal enhanced context,” inarXiv preprint arXiv:2512.12790, 2025

work page arXiv 2025

[12] [12]

Learned image compression with large capacity and low redundancy of latent representation,

X. Meng, S. Zhu, S. Ma, and B. Zeng, “Learned image compression with large capacity and low redundancy of latent representation,” in IEEE International Conference on Image Processing, 2023, pp. 1640– 1644

work page 2023

[13] [13]

Ecvc: Exploiting non-local correlations in multiple frames for contextual video compression,

W. Jiang, J. Li, K. Zhang, and L. Zhang, “Ecvc: Exploiting non-local correlations in multiple frames for contextual video compression,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition Conference, 2025, pp. 7331–7341

work page 2025

[14] [14]

Spatial decomposition and temporal fusion based inter prediction for learned video compression,

X. Sheng, L. Li, D. Liu, and H. Li, “Spatial decomposition and temporal fusion based inter prediction for learned video compression,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 7, pp. 6460–6473, 2024

work page 2024

[15] [15]

Hybrid local-global context learning for neural video compression,

Y . Zhai, J. Yang, W. Jiang, C. Yang, L. Tang, and R. Wang, “Hybrid local-global context learning for neural video compression,” inIEEE Data Compression Conference, 2024, pp. 322–331

work page 2024

[16] [16]

Prediction and reference quality adaptation for learned video compression,

X. Sheng, L. Li, D. Liu, and H. Li, “Prediction and reference quality adaptation for learned video compression,”IEEE Transactions on Image Processing, 2025

work page 2025

[17] [17]

Biecvc: Gated diversifica- tion of bidirectional contexts for learned video compression,

W. Jiang, J. Li, K. Zhang, and L. Zhang, “Biecvc: Gated diversifica- tion of bidirectional contexts for learned video compression,” inACM International Conference on Multimedia, 2025, pp. 7248–7257

work page 2025

[18] [18]

Dmvc: Decomposed motion modeling for learned video compression,

K. Lin, C. Jia, X. Zhang, S. Wang, S. Ma, and W. Gao, “Dmvc: Decomposed motion modeling for learned video compression,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 7, pp. 3502–3515, 2022

work page 2022

[19] [19]

Dy- namic temporal reference aggregation for neural video compression,

S. Liao, K. Feng, Z. Huang, S. Ma, Q. Wang, L. Chen, and C. Jia, “Dy- namic temporal reference aggregation for neural video compression,” in IEEE Data Compression Conference, 2025, pp. 13–22

work page 2025

[20] [20]

Neural video com- pression with dynamic temporal context mining,

S. Liao, K. Feng, C. Jia, Z. Huang, and S. Ma, “Neural video com- pression with dynamic temporal context mining,” inIEEE International Conference on Wireless Communications and Signal Processing, 2024, pp. 1337–1342

work page 2024

[21] [21]

Staco: Spatio-temporal adaptive context optimization for neural video compression,

K. Feng, S. Liao, Z. Huang, C. Jia, Q. Wang, L. Chen, S. Ma, and W. Gao, “Staco: Spatio-temporal adaptive context optimization for neural video compression,” inIEEE Data Compression Conference. IEEE, 2025, pp. 365–365

work page 2025

[22] [22]

Rate-quality based rate control model for neural video compression,

S. Liao, C. Jia, H. Fan, J. Yan, and S. Ma, “Rate-quality based rate control model for neural video compression,” inIEEE International Conference on Acoustics, Speech and Signal Processing, 2024, pp. 4215–4219

work page 2024

[23] [23]

Advanced learning-based coding tools for ecm: Intra prediction and in-loop filtering,

Y . Zhao, J. Fu, Z. Li, Q. Wang, Z. Huang, J. Zhang, C. Jia, and S. Ma, “Advanced learning-based coding tools for ecm: Intra prediction and in-loop filtering,” inIEEE International Symposium on Circuits and Systems, 2025

work page 2025

[24] [24]

Chnerv: Condition enhanced hybrid neural representation for videos,

T. Zhang, H. Wang, X. Meng, K. Zhang, X. Deng, Z. Huang, and S. Ma, “Chnerv: Condition enhanced hybrid neural representation for videos,” inIEEE International Conference on Acoustics, Speech and Signal Processing, 2026, pp. 8377–8381

work page 2026

[25] [25]

Overview of the h. 264/avc video coding standard,

T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the h. 264/avc video coding standard,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, 2003

work page 2003

[26] [26]

Overview of the high efficiency video coding (hevc) standard,

G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (hevc) standard,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649– 1668, 2012

work page 2012

[27] [27]

Overview of the versatile video coding (vvc) standard and its applications,

B. Bross, Y .-K. Wang, Y . Ye, S. Liu, J. Chen, G. J. Sullivan, and J.- R. Ohm, “Overview of the versatile video coding (vvc) standard and its applications,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736–3764, 2021

work page 2021

[28] [28]

Content adaptive optimization for neural image compression,

J. Campos, S. Meierhans, A. Djelouah, and C. Schroers, “Content adaptive optimization for neural image compression,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019

work page 2019

[29] [29]

Improving inference for neural image compression,

Y . Yang, R. Bamler, and S. Mandt, “Improving inference for neural image compression,” inAdvances in Neural Information Processing Systems, vol. 33, 2020, pp. 573–584

work page 2020

[30] [30]

Overfitting for fun and profit: Instance-adaptive data compression,

T. van Rozendaal, I. Huijben, and T. S. Cohen, “Overfitting for fun and profit: Instance-adaptive data compression,” inInternational Conference on Learning Representations, 2021

work page 2021

[31] [31]

Universal deep image com- pression via content-adaptive optimization with adapters,

K. Tsubota, H. Akutsu, and K. Aizawa, “Universal deep image com- pression via content-adaptive optimization with adapters,” inIEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 2529– 2538

work page 2023

[32] [32]

Content adaptive latents and decoder for neural image compression,

G. Pan, G. Lu, Z. Hu, and D. Xu, “Content adaptive latents and decoder for neural image compression,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 556–573

work page 2022

[33] [33]

Offline and online optical flow enhancement for deep video compression,

C. Tang, X. Sheng, Z. Li, H. Zhang, L. Li, and D. Liu, “Offline and online optical flow enhancement for deep video compression,” inAAAI Conference on Artificial Intelligence, vol. 38, no. 6, 2024, pp. 5118– 5126

work page 2024

[34] [34]

Group-aware parameter-efficient updating for content-adaptive neural video compression,

Z. Chen, L. Zhou, Z. Hu, and D. Xu, “Group-aware parameter-efficient updating for content-adaptive neural video compression,” inACM Inter- national Conference on Multimedia, 2024, pp. 11 022–11 031

work page 2024

[35] [35]

Deep contextual video compression,

J. Li, B. Li, and Y . Lu, “Deep contextual video compression,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 18 114–18 125

work page 2021

[36] [36]

Temporal context min- ing for learned video compression,

X. Sheng, J. Li, B. Li, L. Li, D. Liu, and Y . Lu, “Temporal context min- ing for learned video compression,”IEEE Transactions on Multimedia, vol. 25, pp. 7311–7322, 2022

work page 2022

[37] [37]

Hybrid spatial-temporal entropy modelling for neural video compression,

J. Li, B. Li, and Y . Lu, “Hybrid spatial-temporal entropy modelling for neural video compression,” inACM International Conference on Multimedia, 2022, pp. 1503–1511

work page 2022

[38] [38]

Towards practical real-time neural video compression,

Z. Jia, B. Li, J. Li, W. Xie, L. Qi, H. Li, and Y . Lu, “Towards practical real-time neural video compression,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, pp. 12 543–12 552

work page 2025

[39] [39]

Neural video compression with feature modulation,

J. Li, B. Li, and Y . Lu, “Neural video compression with feature modulation,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 26 099–26 108

work page 2024

[40] [40]

Content adaptive and error propagation aware deep video compression,

G. Lu, C. Cai, X. Zhang, L. Chen, W. Ouyang, D. Xu, and Z. Gao, “Content adaptive and error propagation aware deep video compression,” inEuropean Conference on Computer Vision. Springer, 2020, pp. 456– 472

work page 2020

[41] [41]

Frame level content adaptiveλfor neural video compression,

Z. Zuo, J. Liao, X. Song, Z. Liu, H. Zheng, and D. Liu, “Frame level content adaptiveλfor neural video compression,” inIEEE International Conference on Visual Communications and Image Processing, 2024

work page 2024

[42] [42]

Content-adaptive infer- ence for state-of-the-art learned video compression,

A. Bilican, M. A. Yılmaz, and A. M. Tekalp, “Content-adaptive infer- ence for state-of-the-art learned video compression,”IEEE Open Journal of Signal Processing, 2025

work page 2025

[43] [43]

Common test conditions and software reference configura- tions,

F. Bossen, “Common test conditions and software reference configura- tions,” in3rd. JCT-VC Meeting, Guangzhou, CN, October 2010, 2010

work page 2010

[44] [44]

Calculation of average psnr differences between rd- curves,

G. Bjontegaard, “Calculation of average psnr differences between rd- curves,”ITU-T SG16, Doc. VCEG-M33, 2001

work page 2001