DVC: An End-to-end Deep Video Compression Framework

Chunlei Cai; Dong Xu; Guo Lu; Wanli Ouyang; Xiaoyun Zhang; Zhiyong Gao

arxiv: 1812.00101 · v3 · pith:J6SCL35Ynew · submitted 2018-11-30 · 📡 eess.IV · cs.CV


Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao


keywords: video, compression, information, motion, architecture, coding, conventional, corresponding


Abstract

Conventional video compression approaches use the predictive coding architecture and encode the corresponding motion information and residual information. In this paper, taking advantage of both the classical architecture of conventional video compression methods and the powerful non-linear representation ability of neural networks, we propose the first end-to-end deep video compression model that jointly optimizes all the components for video compression. Specifically, learning-based optical flow estimation is utilized to obtain the motion information and reconstruct the current frames. Then we employ two auto-encoder style neural networks to compress the corresponding motion and residual information. All the modules are jointly learned through a single loss function, in which they collaborate with each other by considering the trade-off between reducing the number of compression bits and improving the quality of the decoded video. Experimental results show that the proposed approach can outperform the widely used video coding standard H.264 in terms of PSNR and is even on par with the latest standard H.265 in terms of MS-SSIM. Code is released at https://github.com/GuoLusjtu/DVC.
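The predictive coding loop the abstract builds on (estimate motion, warp the previous reconstruction, code the residual, and score everything with one loss that trades bits against distortion) can be illustrated with a toy sketch. This is not DVC's implementation. It works on 1-D integer "frames", swaps the learned optical flow for an exhaustive integer-shift search, and swaps the two auto-encoders and their entropy model for uniform quantization with signed Exp-Golomb code lengths as a rate proxy. The loss is assumed to have the form λ·D + R with D the mean squared error, and all function names are hypothetical. Nothing here is differentiable, so it only shows the objective the jointly trained modules share, not how they are trained.

```python
from typing import List, Tuple


def warp(frame: List[int], shift: int) -> List[int]:
    """Motion compensation: out[i] = frame[i - shift], clamped at the borders.
    Toy stand-in (assumption) for warping with a dense optical-flow field."""
    n = len(frame)
    return [frame[min(max(i - shift, 0), n - 1)] for i in range(n)]


def estimate_motion(ref: List[int], cur: List[int], search: int = 2) -> int:
    """Exhaustive search for the shift minimising the sum of absolute
    differences. Swapped in (assumption) for learned optical flow estimation."""
    return min(
        range(-search, search + 1),
        key=lambda s: sum(abs(a - b) for a, b in zip(warp(ref, s), cur)),
    )


def exp_golomb_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb code for v (rate proxy).
    Maps v to k = 2v - 1 (v > 0) or -2v (v <= 0); the code has 2*floor(log2(k+1)) + 1 bits."""
    k = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (k + 1).bit_length() - 1


def encode_frame(
    ref: List[int], cur: List[int], q: int = 4, lam: float = 1.0
) -> Tuple[int, List[int], int, float, float]:
    """Code one frame against the previous reconstruction `ref`.
    Returns (motion, reconstruction, rate in bits, MSE distortion, loss)."""
    mv = estimate_motion(ref, cur)                       # motion information
    pred = warp(ref, mv)                                 # motion-compensated prediction
    residual = [c - p for c, p in zip(cur, pred)]        # residual information
    # Uniform quantisation (round to nearest, ties to even); stand-in for
    # the residual auto-encoder plus rounding of its latent.
    sym = [round(r / q) for r in residual]
    recon = [p + s * q for p, s in zip(pred, sym)]       # decoder-side reconstruction
    rate = exp_golomb_bits(mv) + sum(exp_golomb_bits(s) for s in sym)
    distortion = sum((c - x) ** 2 for c, x in zip(cur, recon)) / len(cur)
    loss = lam * distortion + rate                       # assumed form: lambda*D + R
    return mv, recon, rate, distortion, loss


if __name__ == "__main__":
    ref = [0, 10, 20, 30, 40, 50, 60, 70]  # previous reconstructed frame
    cur = [0, 0, 10, 21, 30, 40, 55, 60]   # ref moved right by 1, plus small changes
    for q in (4, 1):
        mv, recon, rate, dist, loss = encode_frame(ref, cur, q=q, lam=1.0)
        print(q, mv, rate, dist, loss)
```

With these frames both step sizes find a motion of 1. The coarse step (q=4) spends 13 bits and leaves an MSE of 0.25, while q=1 reconstructs the frame exactly but spends 19 bits. A small λ therefore prefers the coarse step and a large λ (for example 100) prefers the fine one, which is the bits-versus-quality trade-off a single loss lets all modules negotiate.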



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Streaming of rendered content with adaptive frame rate and resolution
eess.IV 2026-05 unverdicted novelty 5.0

A neural network predicts optimal frame rate and resolution pairs for bandwidth-constrained streaming of rendered content to boost perceptual quality.