Pith · machine review for the scientific record

arxiv: 2604.03353 · v1 · submitted 2026-04-03 · 📡 eess.IV · cs.CV

Recognition: 2 theorem links · Lean Theorem

NeuralLVC: Neural Lossless Video Compression via Masked Diffusion with Temporal Conditioning

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 18:33 UTC · model grok-4.3

classification: 📡 eess.IV · cs.CV
keywords: lossless video compression · neural codec · masked diffusion · temporal conditioning · I-frame P-frame architecture · exact reconstruction · arithmetic coding · YUV420

The pith

Masked diffusion with temporal conditioning enables a neural lossless video codec that reconstructs every pixel exactly while beating H.264 and H.265 lossless coding on bit rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NeuralLVC as a neural lossless video codec that pairs an I-frame model using bijective linear tokenization with a P-frame model based on masked diffusion conditioned on the prior decoded frame. This structure exploits temporal redundancy in video by modeling the distribution of frame differences through a lightweight reference embedding that adds only 1.3 percent extra parameters. The approach matters because it delivers exact reconstruction in the input domain for YUV420 video planes while achieving lower bit rates than conventional lossless standards on standard test sequences. End-to-end verification with arithmetic coding confirms no information is lost during the full encode-decode cycle. Group-wise decoding further allows users to adjust the speed-compression balance as needed.
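The I/P-frame structure described above can be sketched in miniature: the first frame is coded on its own, and each later frame is coded as a difference against the previous decoded frame. The function names and flat-list frames below are illustrative, not the paper's implementation.

```python
# Toy sketch of the I/P-frame structure: frame 0 is coded independently
# (I-frame); every later frame is coded as the difference from the
# previous *decoded* frame (P-frame). Because the codec is lossless,
# decoded == original at each step, so encoder and decoder condition
# on identical references. Frames here are flat lists of 8-bit samples.

def encode(frames):
    streams = [list(frames[0])]          # I-frame, coded as-is
    prev = list(frames[0])               # previous decoded frame
    for frame in frames[1:]:
        streams.append([a - b for a, b in zip(frame, prev)])  # P-frame diff
        prev = list(frame)               # lossless: decoded == original
    return streams

def decode(streams):
    frames = [list(streams[0])]
    for diff in streams[1:]:
        frames.append([d + p for d, p in zip(diff, frames[-1])])
    return frames

video = [[16, 128, 240], [17, 127, 240], [20, 120, 235]]
assert decode(encode(video)) == video    # exact reconstruction
```

The point of the sketch is the conditioning discipline: since reconstruction is exact, the reference frame seen at decode time is bit-identical to the one seen at encode time, which is what lets the P-frame model double as an entropy model without drift.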

Core claim

NeuralLVC shows that masked diffusion models conditioned on previous decoded frames via a lightweight reference embedding can serve as an effective entropy model for temporal differences, and when combined with bijective tokenization for I-frames, this produces a fully lossless neural video codec that outperforms H.264 and H.265 lossless compression on CIF sequences while guaranteeing exact pixel reconstruction.

What carries the argument

Masked diffusion model for P-frames that uses a lightweight reference embedding from the prior decoded frame to model the probability distribution of temporal differences, paired with bijective linear tokenization for I-frames.

If this is right

  • The codec guarantees exact reconstruction of every pixel in YUV420 video planes.
  • It achieves lower bit rates than H.264 and H.265 lossless on the tested CIF sequences.
  • Group-wise decoding lets users trade decoding speed for better compression ratios.
  • The temporal conditioning adds only 1.3 percent more trainable parameters.
  • Exact reconstruction is verified through complete end-to-end encode-decode cycles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same conditioning technique could be tested on longer video sequences or different frame rates to check how well the reference embedding scales with motion complexity.
  • Because the model is fully differentiable, it might be combined with learned quantization steps to create controlled near-lossless variants without changing the core architecture.
  • The lightweight embedding approach suggests that similar diffusion-based entropy models could be adapted for other temporal signals such as audio waveforms if a comparable reference mechanism is defined.

Load-bearing premise

The masked diffusion process conditioned only on the previous decoded frame can fully capture the probability distribution of temporal differences without any hidden information loss.

What would settle it

Encode and decode any of the 9 Xiph CIF test sequences through the full pipeline with arithmetic coding, then compare every pixel value in the reconstructed YUV420 frames to the originals and check for even a single mismatch.
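The settling test above amounts to a bit-exact comparison over every YUV420 plane. A minimal sketch, assuming plane-wise flat sample lists (the identity "round trip" below stands in for a real encode/decode pipeline and is purely illustrative):

```python
# Sketch of the settling test: push a frame through a (hypothetical)
# encode/decode pipeline and demand bit-exact equality on every sample
# of every YUV420 plane; a single mismatch falsifies the lossless claim.

def is_exact(original, decoded):
    """original/decoded: dicts mapping plane name -> flat list of samples."""
    for name in ("Y", "U", "V"):
        a, b = original[name], decoded[name]
        if len(a) != len(b) or any(x != y for x, y in zip(a, b)):
            return False
    return True

frame = {"Y": [16, 235, 81, 145], "U": [128, 90], "V": [128, 240]}
roundtrip = {k: list(v) for k, v in frame.items()}  # stands in for decode(encode(frame))
assert is_exact(frame, roundtrip)

corrupted = {**frame, "Y": [16, 235, 80, 145]}      # one off-by-one pixel
assert not is_exact(frame, corrupted)
```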

Figures

Figures reproduced from arXiv: 2604.03353 by Marco Bertini, Tiberio Uricchio.

Figure 1. High-level overview of NeuralLVC. The first frame is coded independently, while later frames are coded from the …

Figure 2. Grouping patterns for different 𝛿 values on an 8×8 grid (32×32 in practice). Each color represents a group of positions predicted in parallel. 𝛿 = 0 yields column-wise groups; 𝛿 = 1 produces diagonal bands with more groups and better compression; 𝛿 = 2 creates steeper diagonals. The number in each cell indicates the group index.

Figure 3. Temporal redundancy and compression cost (coast…)

Figure 4. Per-frame compression rate on two CIF sequences.

Figure 5. Rate composition per video. The I-frame cost (dark) …
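The grouping schedule that Figure 2 describes can be sketched with a simple index formula. Assigning cell (x, y) the group index x + 𝛿·y is an editorial guess consistent with the caption (𝛿 = 0 gives one group per column, 𝛿 = 1 diagonal bands, 𝛿 = 2 steeper diagonals), not a formula taken from the paper.

```python
# Sketch of the group-wise decoding schedule from Figure 2: cells with
# the same group index are predicted in parallel, and groups are decoded
# in increasing order. The index formula x + delta * y is an assumption
# consistent with the caption, not the paper's own definition.

def group_index(x, y, delta):
    return x + delta * y

def num_groups(width, height, delta):
    return (width - 1) + delta * (height - 1) + 1

# On an 8x8 grid: more groups mean more sequential decoding steps but
# richer conditioning per step, i.e. the speed-compression trade-off.
assert num_groups(8, 8, 0) == 8     # column-wise groups
assert num_groups(8, 8, 1) == 15    # diagonal bands
assert num_groups(8, 8, 2) == 22    # steeper diagonals
```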
Original abstract

While neural lossless image compression has advanced significantly with learned entropy models, lossless video compression remains largely unexplored in the neural setting. We present NeuralLVC, a neural lossless video codec that combines masked diffusion with an I/P-frame architecture for exploiting temporal redundancy. Our I-frame model compresses individual frames using bijective linear tokenization that guarantees exact pixel reconstruction. The P-frame model compresses temporal differences between consecutive frames, conditioned on the previous decoded frame via a lightweight reference embedding that adds only 1.3% trainable parameters. Group-wise decoding enables controllable speed-compression trade-offs. Our codec is lossless in the input domain: for video, it reconstructs YUV420 planes exactly; for image evaluation, RGB channels are reconstructed exactly. Experiments on 9 Xiph CIF sequences show that NeuralLVC outperforms H.264 and H.265 lossless by a significant margin. We verify exact reconstruction through end-to-end encode-decode testing with arithmetic coding. These results suggest that masked diffusion with temporal conditioning is a promising direction for neural lossless video compression.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces NeuralLVC, a neural lossless video compression codec that uses an I/P-frame architecture with masked diffusion models for temporal conditioning. I-frames are compressed via bijective linear tokenization to guarantee exact pixel reconstruction, while P-frames model temporal differences conditioned on the exact previous decoded frame through a lightweight reference embedding (adding 1.3% parameters). The method supports group-wise decoding for speed-compression trade-offs and claims exact lossless reconstruction in the YUV420 domain (or RGB for images). Experiments on 9 Xiph CIF sequences are reported to show significant outperformance over H.264 and H.265 lossless codecs, with exact reconstruction verified via end-to-end encode-decode testing using arithmetic coding.

Significance. If the empirical claims hold with supporting data, this would constitute a meaningful advance in neural lossless video compression, a domain that remains largely unexplored relative to lossy video or neural lossless image methods. The bijective tokenization combined with conditioned diffusion for exact probability modeling via arithmetic coding provides a clean path to lossless guarantees while exploiting temporal redundancy with minimal added parameters.

major comments (1)
  1. [Abstract] The claim that NeuralLVC 'outperforms H.264 and H.265 lossless by a significant margin' on 9 Xiph CIF sequences is presented without accompanying quantitative results (e.g., average bpp values, percentage savings, or a reference to a results table), leaving the central empirical claim without visible supporting evidence in the provided text.
minor comments (2)
  1. The description of the lightweight reference embedding (1.3% trainable parameters) would benefit from an explicit equation or breakdown showing how this overhead is calculated relative to the base diffusion model.
  2. Clarify whether the masked diffusion model for P-frames is trained end-to-end or in stages, and provide details on the exact form of the temporal conditioning (e.g., how the previous decoded frame is embedded).

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address the single major comment below and will incorporate the suggested changes in the revised manuscript.

Point-by-point responses
  1. Referee: [Abstract] The claim that NeuralLVC 'outperforms H.264 and H.265 lossless by a significant margin' on 9 Xiph CIF sequences is presented without accompanying quantitative results (e.g., average bpp values, percentage savings, or a reference to a results table), leaving the central empirical claim without visible supporting evidence in the provided text.

    Authors: We agree that the abstract would be strengthened by including specific quantitative support for the performance claim. In the revised version, we will add concise numerical results (e.g., average bpp reductions relative to H.264/H.265 lossless) and a reference to the main results table while preserving the abstract's brevity. The full experimental details and tables already appear in the body of the manuscript. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The paper's lossless claim rests on bijective linear tokenization for I-frames (explicitly guaranteeing exact pixel reconstruction by construction) combined with masked diffusion for P-frames that conditions on the exact previous decoded frame, followed by arithmetic coding that recovers symbols exactly when model probabilities are applied consistently at decode time. No equations, derivations, or 'predictions' reduce the reported performance gains to fitted parameters or self-referential definitions. Experiments on independent Xiph CIF sequences provide external validation against H.264/H.265, and end-to-end encode-decode verification confirms exact YUV420 reconstruction without hidden information loss. Any self-citations are not load-bearing for the core architecture or lossless argument, which remains self-contained.
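The rationale's point that arithmetic coding "recovers symbols exactly when model probabilities are applied consistently at decode time" can be illustrated with a toy coder in exact rational arithmetic. This is a pedagogical sketch, not the paper's coder: real codecs use finite-precision range coding, and the probability table below is invented for illustration.

```python
from fractions import Fraction

# Toy arithmetic coder in exact rational arithmetic. The encoder narrows
# [lo, lo + width) by each symbol's probability interval; the decoder
# recovers symbols exactly as long as it walks the SAME probability
# table in the SAME order, which is the consistency requirement the
# lossless argument relies on.

def encode(symbols, probs):
    lo, width = Fraction(0), Fraction(1)
    for s in symbols:
        c = Fraction(0)
        for t, p in probs.items():
            if t == s:
                lo, width = lo + width * c, width * p
                break
            c += p
    return lo + width / 2            # any point inside the final interval

def decode(code, probs, n):
    out = []
    lo, width = Fraction(0), Fraction(1)
    for _ in range(n):
        c = Fraction(0)
        for t, p in probs.items():
            if code < lo + width * (c + p):   # code falls in t's interval
                out.append(t)
                lo, width = lo + width * c, width * p
                break
            c += p
    return out

probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
msg = list("abacab")
assert decode(encode(msg, probs), probs, len(msg)) == msg    # consistent model: exact

skewed = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
assert decode(encode(msg, probs), skewed, len(msg)) != msg   # mismatched model corrupts output
```

The second assertion is the circularity-relevant observation: losslessness is a property of encoder/decoder agreement on the model, not of the model's accuracy, so a neural entropy model only needs determinism, not correctness, to guarantee exact reconstruction.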

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unstated premise that masked diffusion can serve as an effective entropy model for both spatial and temporal video data while preserving exact invertibility through the chosen tokenization and arithmetic coding steps. No explicit free parameters, axioms, or invented entities are detailed in the abstract.

pith-pipeline@v0.9.0 · 5483 in / 1097 out tokens · 35135 ms · 2026-05-13T18:33:20.294350+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 2 internal anchors

  1. [1]

    ITU-T. 2003. Recommendation H.264: Advanced video coding for generic audio-visual services.

  2. [2]

    ITU-T. 2013. Recommendation H.265: High efficiency video coding.

  3. [3]

    Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. 2021. Structured denoising diffusion models in discrete state-spaces. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 34. 17981–17993.

  4. [4]

    Yuanchao Bai, Xianming Liu, Kai Wang, Xiangyang Ji, Xiaolin Wu, and Wen Gao

  5. [5]

    Deep Lossy Plus Residual Coding for Lossless and Near-Lossless Image Compression. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 5 (2024), 3577–3594. doi:10.1109/TPAMI.2023.3348486

  6. [6]

    Kecheng Chen, Pingping Zhang, Hui Liu, Jie Liu, Yibing Liu, Jixin Huang, Shiqi Wang, Hong Yan, and Haoliang Li. 2024. Large Language Models for Lossless Image Compression: Next-Pixel Prediction in Language Space is All You Need. arXiv preprint arXiv:2411.12448 (2024).

  7. [7]

    Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, and Ilya Sutskever. 2020. Generative Pretraining from Pixels. Proceedings of the 37th International Conference on Machine Learning (2020).

  8. [8]

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 248–255.

  9. [9]

    Junhao Du, Chuqin Zhou, Ning Cao, Gang Chen, Yunuo Chen, Zhengxue Cheng, Li Song, Guo Lu, and Wenjun Zhang. 2025. Large Language Model for Lossless Image Compression with Visual Prompts. arXiv preprint arXiv:2502.16163 (2025).

  10. [10]

    European Society of Radiology (ESR). 2011. Usability of irreversible image compression in radiological imaging. A position paper by the European Society of Radiology (ESR). Insights into Imaging 2, 2 (2011), 103–115. doi:10.1007/s13244-011-0071-x

  11. [11]

    Fraunhofer HHI. 2024. VVenC: Fraunhofer Versatile Video Encoder. https://github.com/fraunhoferhhi/vvenc

  12. [12]

    Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, and Yan Lu. 2025. Towards Practical Real-Time Neural Video Compression. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12543–12552.

  13. [13]

    Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, and Yan Lu. 2024. Generative Latent Coding for Ultra-Low Bitrate Image Compression. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 26088–26098.

  14. [14]

    Reto Kromer. 2017. Matroska and FFV1: One File Format for Film and Video Archiving? Journal of Film Preservation 96 (2017), 41–45.

  15. [15]

    Binzhe Li, Shurun Wang, Shiqi Wang, and Yan Ye. 2025. High Efficiency Image Compression for Large Visual-Language Models. IEEE Transactions on Circuits and Systems for Video Technology 35, 3 (2025), 2870–2880. doi:10.1109/TCSVT.2024.3488181

  16. [16]

    Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, and Wen Gao

  17. [17]

    CALLIC: Content Adaptive Learning for Lossless Image Compression. In Proc. of AAAI Conference on Artificial Intelligence.

  18. [18]

    Daxin Li, Yuanchao Bai, Kai Wang, Wenbo Zhao, Junjun Jiang, and Xianming Liu

  19. [19]

    Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation. arXiv preprint arXiv:2511.10991 (2025).

  20. [20]

    Jiahao Li, Bin Li, and Yan Lu. 2021. Deep Contextual Video Compression. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 34.

  21. [21]

    Jiahao Li, Bin Li, and Yan Lu. 2022. Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression. In Proc. of ACM International Conference on Multimedia (ACM MM). 1503–1511. doi:10.1145/3503161.3547845

  22. [22]

    Jiahao Li, Bin Li, and Yan Lu. 2023. Neural Video Compression with Diverse Contexts. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 22616–22626.

  23. [23]

    Jiahao Li, Bin Li, and Yan Lu. 2024. Neural Video Compression with Feature Modulation. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 26099–26108.

  24. [24]

    Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, and Ming Li. 2025. Lossless data compression by large models. Nature Machine Intelligence 7 (2025), 794–799.

  25. [25]

    Feng Liu, Miguel Hernandez-Cabronero, Victor Sanchez, Michael W. Marcellin, and Ali Bilgin. 2017. The Current Role of Image Compression Standards in Medical Imaging. Information 8, 4 (2017), 131. doi:10.3390/info8040131

  26. [26]

    Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, and Zhiyong Gao. 2019. DVC: An End-to-End Deep Video Compression Framework. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 11006–11015.

  27. [27]

    Wenzhuo Ma and Zhenzhong Chen. 2025. Diffusion-based Perceptual Neural Video Compression with Temporal Diffusion Information Reuse. ACM Transactions on Multimedia Computing, Communications, and Applications 21, 12, Article 345 (Nov. 2025), 22 pages. doi:10.1145/3761815

  28. [28]

    Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, and Luc Van Gool. 2019. Practical Full Resolution Learned Lossless Image Compression. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10629–10638.

  29. [29]

    Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. 2025. Large Language Diffusion Models. arXiv preprint arXiv:2502.09992 (2025).

  30. [30]

    Michael Niedermayer. 2019. FFV1 Video Codec Specification. IETF Internet-Draft.

  31. [31]

    Michael Niedermayer, Dave Rice, and Jérémie Martinez. 2021. FFV1 Video Coding Format Versions 0, 1, and 3. RFC 9043, Internet Engineering Task Force (IETF). https://www.rfc-editor.org/rfc/rfc9043

  32. [32]

    Linfeng Qi, Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, and Yan Lu. 2024. Long-Term Temporal Context Gathering for Neural Video Compression. In Proc. of European Conference on Computer Vision (ECCV) (Lecture Notes in Computer Science). Springer, 305–322. doi:10.1007/978-3-031-72848-8_18

  33. [33]

    Linfeng Qi, Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, and Yan Lu. 2025. Generative Latent Coding for Ultra-Low Bitrate Image and Video Compression. IEEE Transactions on Circuits and Systems for Video Technology (2025). doi:10.1109/TCSVT.2025.3571944

  34. [34]

    Hochang Rhee, Yeong Il Jang, Seyun Kim, and Nam Ik Cho. 2022. LC-FDNet: Learned Lossless Image Compression with Frequency Decomposition Network. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 6033–6042.

  35. [35]

    Xihua Sheng, Jiahao Li, Bin Li, Li Li, Dong Liu, and Yan Lu. 2022. Temporal Context Mining for Learned Video Compression. IEEE Transactions on Multimedia 25 (2022), 7311–7322. doi:10.1109/TMM.2022.3220421

  36. [36]

    SMPTE. 2022. ST 2042-1:2022 — VC-2 Video Compression.

  37. [37]

    Chen-Han Tsai. 2026. Revisiting Data Compression with Language Modeling. arXiv preprint arXiv:2601.02875 (2026).

  38. [38]

    Aaron Van den Oord, Nal Kalchbrenner, Lasse Espeholt, Koray Kavukcuoglu, Oriol Vinyals, and Alex Graves. 2016. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 29.

  39. [39]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 30.

  40. [40]

    Rui Wan, Qi Zheng, and Yibo Fan. 2025. M3-CVC: Controllable Video Compression with Multimodal Generative Models. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 1–5. doi:10.1109/ICASSP49660.2025.10888491

  41. [41]

    Lingdong Wang, Guan-Ming Su, Divya Kothandaraman, Tsung-Wei Huang, Mohammad Hajiesmaili, and Ramesh K. Sitaraman. 2025. Low-Bitrate Video Compression through Semantic-Conditioned Diffusion. arXiv preprint arXiv:2512.00408.

  42. [42]

    Xiph.org Foundation. 2024. Xiph.org Video Test Media. https://media.xiph.org/video/derf/

  43. [43]

    Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, and William T Freeman. 2019. Video Enhancement with Task-Oriented Flow. International Journal of Computer Vision 127 (2019), 1106–1125.

  44. [44]

    Yibo Yang, Justus Will, and Stephan Mandt. 2025. Progressive Compression with Universally Quantized Diffusion Models. In Proc. of International Conference on Learning Representations (ICLR).

  45. [45]

    Xubing Ye, Yukang Gan, Xiaoke Huang, Yixiao Ge, and Yansong Tang. 2025. VoCo-LLaMA: Towards Vision Compression with Large Language Models. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 29836–29846.

  46. [46]

    Maojun Zhang, Haotian Wu, Richeng Jin, Deniz Gunduz, and Krystian Mikolajczyk. 2026. Diffusion-aided Extreme Video Compression with Lightweight Semantics Guidance. arXiv preprint arXiv:2602.05201 (2026).

  47. [47]

    Pingping Zhang, Jinlong Li, Kecheng Chen, Meng Wang, Long Xu, Haoliang Li, Nicu Sebe, Sam Kwong, and Shiqi Wang. 2024. When video coding meets multimodal large language models: A unified paradigm for video coding. arXiv preprint arXiv:2408.08093 (2024).

  48. [48]

    Zhe Zhang, Zhenzhong Chen, and Shan Liu. 2025. Fitted neural lossless image compression. In Proceedings of the Computer Vision and Pattern Recognition Conference. 23249–23258.

  49. [49]

    Zhe Zhang, Huairui Wang, Zhenzhong Chen, and Shan Liu. 2024. Learned Lossless Image Compression based on Bit Plane Slicing. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).