Learning Image and Video Compression through Spatial-Temporal Energy Compaction

Heming Sun; Jiro Katto; Masaru Takeuchi; Zhengxue Cheng

arxiv: 1906.09683 · v2 · pith:HDYUCFLTnew · submitted 2019-06-24 · 📡 eess.IV

Learning Image and Video Compression through Spatial-Temporal Energy Compaction

Zhengxue Cheng , Heming Sun , Masaru Takeuchi , Jiro Katto This is my paper

Pith reviewed 2026-05-25 17:34 UTC · model grok-4.3

classification 📡 eess.IV

keywords image compressionvideo compressionconvolutional autoencoderenergy compactionMS-SSIMMPEG-4H.264interpolation loop

0 comments

The pith

A convolutional autoencoder with a spatial energy compaction penalty in its loss function outperforms image compression standards under MS-SSIM and generalizes to video compression that beats MPEG-4 while matching H.264.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors build a convolutional autoencoder for images and extend it with an interpolation loop for videos. Their core proposal is to add a penalty term that enforces spatial energy compaction during training and to use measured temporal energy distribution to choose how many frames belong in each interpolation loop. This produces image results better than the latest standard when measured by MS-SSIM and superior to other learned methods at high bit rates. For video, the same principle yields compression that significantly exceeds MPEG-4 and remains competitive with H.264 across varied content. The work therefore tests whether explicit energy compaction inside a learned codec is sufficient to surpass hand-designed transforms.

Core claim

The central claim is that realizing spatial-temporal energy compaction inside a convolutional autoencoder framework produces image compression that outperforms the latest standard under the MS-SSIM metric and exceeds prior learning-based methods at high bit rates, while the video extension that selects interpolation-loop length from temporal energy distribution significantly outperforms MPEG-4 and competes with H.264.

What carries the argument

The spatial energy compaction-based penalty added to the training loss, together with the temporal energy distribution used to adaptively set the number of frames inside each interpolation loop.

If this is right

Image compression exceeds the latest standard on MS-SSIM and other learned methods at high bit rates.
Video compression significantly outperforms MPEG-4 while remaining competitive with H.264.
Both image and video outputs are described as more visually pleasant than the traditional codecs.
Performance benefits are attributed to the spatial energy compaction term especially at higher rates.
The interpolation loop length is chosen per video segment according to its measured temporal energy distribution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If energy compaction is the operative mechanism, the same penalty could be inserted into other autoencoder codecs without changing their architectures.
The temporal-energy rule for loop length may fail on videos whose motion statistics deviate sharply from the training distribution.
Extending the same penalty to rate-distortion optimization in learned codecs for other modalities such as audio could be tested directly.
A controlled experiment that varies only the energy-compaction weight while freezing all other hyperparameters would isolate its contribution.

Load-bearing premise

That the reported gains arise directly from the added energy-compaction penalty and the temporal-energy rule for loop length, rather than from other unstated training choices or content-specific tuning.

What would settle it

An ablation that trains the same autoencoder without the spatial energy compaction penalty and checks whether the MS-SSIM advantage over the standard at high bit rates disappears.

Figures

Figures reproduced from arXiv: 1906.09683 by Heming Sun, Jiro Katto, Masaru Takeuchi, Zhengxue Cheng.

**Figure 2.** Figure 2: Overview of our proposed learning image and video compression with spatial-temporal energy compaction. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Network architecture of analysis and synthesis [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Performance with different quantization methods. [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Examples of Temporal Energy Histogram for [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: Ablation Study. 5.1. Ablation study In order to show the effectiveness of our proposed spatial-temporal energy compaction approach, we first perform the following ablation study. We compare the performance of our image compression with spatial energy compaction constraint to the case without energy constraint. The RD performance averaged on the Kodak dataset is presented in [PITH_FULL_IMAGE:figures/full… view at source ↗

**Figure 7.** Figure 7: Comparison results using different datasets. [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

**Figure 8.** Figure 8: Comparison results for each video sequence. [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 9.** Figure 9: Example of one reconstruction image kodim01 from Kodak dataset [PITH_FULL_IMAGE:figures/full_fig_p008_9.png] view at source ↗

**Figure 10.** Figure 10: Example of one reconstruction frame in Video [PITH_FULL_IMAGE:figures/full_fig_p008_10.png] view at source ↗

**Figure 11.** Figure 11: Results on PSNR [PITH_FULL_IMAGE:figures/full_fig_p011_11.png] view at source ↗

read the original abstract

Compression has been an important research topic for many decades, to produce a significant impact on data transmission and storage. Recent advances have shown a great potential of learning image and video compression. Inspired from related works, in this paper, we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression, by adding an interpolation loop into both encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learning image and video compression. Thereby, we propose to add a spatial energy compaction-based penalty into loss function, to achieve higher image compression performance. Furthermore, based on temporal energy distribution, we propose to select the number of frames in one interpolation loop, adapting to the motion characteristics of video contents. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard with MS-SSIM quality metric, and provides higher performance compared with state-of-the-art learning compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction can significantly outperform MPEG-4 and is competitive with commonly used H.264. Both our image and video compression can produce more visually pleasant results than traditional standards.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The spatial energy penalty and adaptive temporal loop are the concrete additions, but the paper does not isolate their contribution from architecture or tuning choices.

read the letter

The paper takes a convolutional autoencoder for images and extends it to video with an interpolation loop at both ends. The new pieces are a spatial energy compaction penalty added to the rate-distortion loss and a rule that picks the loop length from the temporal energy distribution of the sequence. Those two moves are what the authors present as their contribution beyond the learned-compression baselines they cite. The abstract claims the image version beats BPG on MS-SSIM and beats prior learned methods at high rates, while the video version beats MPEG-4 and stays competitive with H.264. That is the headline result they want readers to take away. The energy-compaction framing is a reasonable attempt to import a classical compression principle into the learned setting, and the adaptive frame selection is a straightforward way to make the loop content-dependent. Both ideas are easy to understand and could be tried by other groups. The main weakness is the lack of any controlled check that the penalty term, rather than the base network or the choice of its weight, is responsible for the measured improvement. The same holds for the temporal selection rule: no evidence is given that it generalizes across motion types without per-sequence adjustment. The abstract lists the penalty weight and energy threshold as free parameters, which makes the usual attribution problem harder to dismiss. Without those ablations or at least a clear statement of the training protocol and test sets, the performance numbers remain hard to interpret. This is the sort of incremental idea paper that people already running learned-compression experiments might want to read for the loss term and the loop trick. It is not a foundational result, but the claims are stated plainly enough that a referee could ask for the missing controls. I would send it out for review rather than desk-reject, mainly to see whether the full experiments address the isolation issue.

Referee Report

2 major / 2 minor

Summary. The paper proposes a convolutional autoencoder for image compression augmented with a spatial energy compaction penalty added to the loss function. It extends the approach to video by inserting an interpolation loop at both encoder and decoder and selecting the number of frames per loop according to temporal energy distribution to adapt to motion. The central empirical claims are that the image method outperforms BPG under MS-SSIM and exceeds prior learned codecs at high rates, while the video method significantly beats MPEG-4 and is competitive with H.264.

Significance. If the attribution of gains to the energy-compaction terms can be isolated and the results reproduced, the work would usefully connect classical energy-compaction principles with end-to-end learned compression. The current manuscript, however, supplies only high-level performance statements without the supporting experimental controls or implementation details needed to evaluate that contribution.

major comments (2)

[Experimental results / abstract claims] The abstract and experimental claims attribute outperformance to the spatial energy compaction penalty, yet no ablation (with vs. without the penalty term, holding architecture and rate-distortion loss fixed) is reported. Without this control the central attribution cannot be verified and the reported gains may be due to other factors in the autoencoder design or training procedure.
[Video compression method] Frame-count selection is said to be driven by temporal energy distribution, but the manuscript provides neither the precise threshold rule nor any cross-sequence validation showing that the same rule generalizes without per-video retuning. This leaves the video claim vulnerable to the circularity concern that the adaptation is effectively fitted to the test set.

minor comments (2)

Exact loss formulation (weight of the compaction penalty, rate term, distortion metric) and training hyper-parameters are not stated, preventing reproduction or direct comparison with other learned codecs.
[Abstract] The phrase 'the latest image compression standard' should be replaced by the explicit reference (BPG) already used in the reader's summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will incorporate the requested clarifications and experiments in the revision.

read point-by-point responses

Referee: The abstract and experimental claims attribute outperformance to the spatial energy compaction penalty, yet no ablation (with vs. without the penalty term, holding architecture and rate-distortion loss fixed) is reported. Without this control the central attribution cannot be verified and the reported gains may be due to other factors in the autoencoder design or training procedure.

Authors: We agree that a direct ablation isolating the spatial energy compaction penalty is required to substantiate the attribution. In the revised manuscript we will add an ablation study comparing performance with and without the penalty term while holding the autoencoder architecture and rate-distortion loss fixed. revision: yes
Referee: Frame-count selection is said to be driven by temporal energy distribution, but the manuscript provides neither the precise threshold rule nor any cross-sequence validation showing that the same rule generalizes without per-video retuning. This leaves the video claim vulnerable to the circularity concern that the adaptation is effectively fitted to the test set.

Authors: We agree that the precise threshold rule and evidence of generalization must be supplied. In the revision we will state the exact threshold rule used for frame selection and add cross-sequence validation experiments confirming that the rule performs consistently across different videos without per-video retuning. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on reported experiments, not self-referential derivation.

full rationale

The paper describes a convolutional autoencoder architecture, introduces a spatial energy compaction penalty into the loss, and selects interpolation loop length from temporal energy distribution. Performance claims (outperformance vs. BPG on MS-SSIM, vs. learned codecs at high rates, and vs. MPEG-4/H.264) are presented as experimental outcomes. No equations, uniqueness theorems, or self-citations are shown that reduce the reported gains to the penalty term or loop selection by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The approach rests on the unverified assumption that energy compaction penalties improve rate-distortion without side effects and that temporal energy statistics generalize for frame selection; no independent evidence for these is given in the abstract.

free parameters (2)

spatial energy compaction penalty weight
The penalty coefficient added to the loss must be chosen or fitted to achieve the reported gains.
temporal energy threshold for frame selection
The criterion for choosing the number of frames in the interpolation loop depends on energy distribution and is likely parameterized.

axioms (1)

domain assumption Convolutional autoencoders can be trained to perform effective lossy compression when augmented with domain-specific penalties
Invoked as the basis for the architecture and loss design.

pith-pipeline@v0.9.0 · 5748 in / 1298 out tokens · 31108 ms · 2026-05-25T17:34:22.094260+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 4 internal anchors

[1]

The JPEG still picture compression stan- dard

G. K Wallace, “The JPEG still picture compression stan- dard”, IEEE Trans. on Consumer Electronics, vol. 38, no. 1, pp. 43-59, Feb. 1991

work page 1991
[2]

An overview of the JPEG2000 still image compression standard

Majid Rabbani, Rajan Joshi, “An overview of the JPEG2000 still image compression standard” , ELSEVIER Signal Pro- cessing: Image Communication, vol. 17, no, 1, pp. 3-48, Jan. 2002

work page 2002
[3]

Overview of the High Efﬁciency Video Coding (HEVC) Standard

G. J. Sullivan, J. Ohm, W. Han and T. Wiegand, “Overview of the High Efﬁciency Video Coding (HEVC) Standard” , IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec. 2012

work page 2012
[4]

Overview of the H.264/AVC Video Coding Standard

T. Wiegand, G. J. Sullivan, G. Bjontegaard, A. Luthra, “Overview of the H.264/AVC Video Coding Standard”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July. 2003

work page 2003
[5]

Extracting and composing robust features with denoising au- toencoders

P. Vincent, H. Larochelle, Y . Bengio and P.-A. Manzagol, “Extracting and composing robust features with denoising au- toencoders”, Intl. conf. on Machine Learning (ICML), pp. 1096-1103, July 5-9. 2008

work page 2008
[6]

Performance Com- parison of Convolutional AutoEncoders, Generative Adver- sarial Networks and Super-Resolution for Image Compres- sion

Z. Cheng, H. Sun, M. Takeuchi, J. Katto, “Performance Com- parison of Convolutional AutoEncoders, Generative Adver- sarial Networks and Super-Resolution for Image Compres- sion”, CVPR Workshop and Challenge on Learned Image Compression (CLIC), pp. 1-4, June 17-22, 2018

work page 2018
[7]

CNN-Optimized Image Compression with Uncertainty based Resource Allocation

Z. Chen, Y . Li, F. Liu, Z. Liu, X. Pan, W. Sun, Y . Wang, Y . Zhou, H. Zhu, S. Liu, “CNN-Optimized Image Compression with Uncertainty based Resource Allocation” , CVPR Work- shop and Challenge on Learned Image Compression (CLIC), pp. 1-4, June 17-22, 2018

work page 2018
[8]

Variable Rate Image Compression with Recurrent Neural Networks

G. Toderici, S. M.O’Malley, S. J. Hwang, et al.,“Variable rate image compression with recurrent neural networks” , arXiv: 1511.06085, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[9]

Full Resolution Image Compression with Recurrent Neural Networks

G, Toderici, D. Vincent, N. Johnson, et al., “Full Resolution Image Compression with Recurrent Neural Networks”, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, July 21-26, 2017

work page 2017
[10]

Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

Nick Johnson, Damien Vincent, David Minnen, et al., “Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks” , arXiv:1703.10114, pp. 1-9, March 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[11]

Lossy Image Compression with Compressive Au- toencoders

Lucas Theis, Wenzhe Shi, Andrew Cunninghan and Ferenc Huszar, “Lossy Image Compression with Compressive Au- toencoders”, Intl. Conf. on Learning Representations (ICLR), pp. 1-19, April 24-26, 2017

work page 2017
[12]

End-to-End Optimized Image Compression

J. Balle, Valero Laparra, Eero P. Simoncelli, “End-to-End Optimized Image Compression”, Intl. Conf. on Learning Rep- resentations (ICLR), pp. 1-27, April 24-26, 2017

work page 2017
[13]

Variational Image Compression with a Hyper- prior

Johannes Balle, D. Minnen, S. Singh, S. J. Hwang, N. Johnston, “Variational Image Compression with a Hyper- prior”, Intl. Conf. on Learning Representations (ICLR), pp. 1-23, 2018. https://tensorflow.github.io/ compression/

work page 2018
[14]

Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations

E. Agustsson, F. Mentzer, M. Tschannen, L. Cavigelli, R. Timofte, L. Benini, L. V . Gool, “Soft-to-Hard Vector Quan- tization for End-to-End Learning Compressible Representa- tions”, Neural Information Processing Systems (NIPS) 2017, arXiv:1704.00648v2

work page internal anchor Pith review Pith/arXiv arXiv 2017
[15]

Conditional Probability Models for Deep Image Compression

F. Mentzer, E. Agustsson, M. Tschannen, R. Timofte, L. V . Gool, “Conditional Probability Models for Deep Image Compression”, IEEE Conf. on Computer Vision and Pat- tern Recognition (CVPR), June 17-22, 2018. https:// github.com/fab-jul/imgcomp-cvpr

work page 2018
[16]

Deep Convolu- tional AutoEncoder-based Lossy Image Compression

Z. Cheng, H. Sun, M. Takeuchi, J. Katto, “Deep Convolu- tional AutoEncoder-based Lossy Image Compression” , Pic- ture Coding Symposium, pp. 1-5, June 24-27, 2018

work page 2018
[17]

Learning Con- volutional Networks for Content-weighted Image Compres- sion

M. Li, W. Zuo, S. Gu, D. Zhao, D. Zhang, “Learning Con- volutional Networks for Content-weighted Image Compres- sion”, IEEE Conf. on Computer Vision and Pattern Recog- nition (CVPR), June 17-22, 2018

work page 2018
[18]

Real Time Adaptive Image Com- pression

Ripple Oren, L. Bourdev, “Real Time Adaptive Image Com- pression”, Proc. of Machine Learning Research, V ol. 70, pp. 2922-2930, 2017

work page 2017
[19]

Generative Compres- sion

S. Santurkar, D. Budden, N. Shavit, “Generative Compres- sion”, Picture Coding Symposium, June 24-27, 2018

work page 2018
[20]

Generative Adversarial Networks for Extreme Learned Image Compression

E. Agustsson, M. Tschannen, F. Mentzer, R. Timofte, and L. V . Gool, “Generative Adversarial Networks for Extreme Learned Image Compression”, arXiv:1804.02958

work page arXiv
[21]

Video Compression through Image Interpolation

C-Y Wu, N. Singhal, P. Krahenbuhl, “Video Compression through Image Interpolation”, 15th European Conference on Computer Vision, September 8 C 14, 2018

work page 2018
[22]

Deepcoder: A deep neural network based video compres- sion

T. Chen, H. Liu, Q. Shen, T. Yue, X. Cao, and Z. Ma. “Deepcoder: A deep neural network based video compres- sion”. 2017 IEEE Visual Communications and Image Pro- cessing (VCIP), pp. 1C4, Dec 2017

work page 2017
[23]

Workshop and Challenge on Learned Image Compres- sion, CVPR2018, http://www.compression.cc/ challenge/

work page
[24]

Real-time single image and video super-resolution using an efﬁcient sub-pixel convo- lutional neural network

W. Shi, J. Caballero, F. Huszar, et al.“Real-time single image and video super-resolution using an efﬁcient sub-pixel convo- lutional neural network”, Intl. IEEE Conf. on Computer Vi- sion and Pattern Recognition, June 26-July 1, 2016

work page 2016
[25]

Digital coding of waveforms

N.S. Jayant and P. Noll, “Digital coding of waveforms”, En- glewood Cliffs NJ, Prentice-Hall, 1984

work page 1984
[26]

Performance Evaluation of Subband Coding and Optimization of Its Filter Coefﬁcients

J.Katto and Y .Yasuda:“Performance Evaluation of Subband Coding and Optimization of Its Filter Coefﬁcients”, SPIE Vi- sual Communication and Image Processing, Nov.1991

work page 1991
[27]

Adam: A Method for Stochastic Optimization

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization”, arXiv:1412.6980, pp.1-15, Dec. 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[28]

ImageNet: A Large-Scale Hierarchical Image Database

J. Deng, W. Dong, R. Socher, L. Li, K. Li and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database” , IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1-8, June 20-25, 2009

work page 2009
[29]

Kodak Lossless True Color Image Suite, Download from http://r0k.us/graphics/kodak/

work page
[30]

Multiscale structural similarity for image quality assessment

Z. Wang, E. P. Simoncelli and A. C. Bovik, “Multiscale structural similarity for image quality assessment” , The 36- th Asilomar Conference on Signals, Systems and Computers, V ol.2, pp. 1398-1402, Nov. 2013

work page 2013
[31]

JPEG ofﬁcial software libjpeg, https://jpeg.org/jpeg/software.html

work page
[32]

JPEG2000 ofﬁcial software OpenJPEG, https://jpeg.org/jpeg2000/software.html

work page
[33]

BPG Image Format, https://bellard.org/bpg/

work page
[34]

A Mathematical Theory of Communica- tion

C. E. Shannon, “A Mathematical Theory of Communica- tion”, The Bell System Technical Journal, V ol. 27, pp. 379- 423, July, 1948

work page 1948
[35]

Video Frame Interpola- tion via Adaptive Separable Convolution

S. Niklaus, L. Mai and F. Liu, “Video Frame Interpola- tion via Adaptive Separable Convolution”, IEEE International Conference on Computer Vision (ICCV) 2017

work page 2017
[36]

http://trace.eas.asu.edu/ index.html

Video Trace Library. http://trace.eas.asu.edu/ index.html

work page
[37]

High Efﬁciency Video Coding (HEVC) Test Model 16 (HM 16) Encoder Description

K. McCann, C. Rosewarne, B. Bross, M. Naccari, K. Shar- man, G. J. Sullivan, “High Efﬁciency Video Coding (HEVC) Test Model 16 (HM 16) Encoder Description” , Document JCTVC-R1002, Sapporo, Jul. 2014. https://hevc.hhi. fraunhofer.de/svn/svn_HEVCSoftware/

work page 2014
[38]

H.264/14496-10 AVC Reference Software Manual

A. M.Tourapis, K. Suhring, G. Sullivan, “H.264/14496-10 AVC Reference Software Manual” , Document JVT-AE010, London, UK, 28 June- 3 July 2009. http://iphome. hhi.de/suehring/tml/download/

work page 2009
[39]

Proof of Spatial Energy Constraint In Section 3.1.2, we propose a spatial energy compaction constraint

Supplementary Material 7.1. Proof of Spatial Energy Constraint In Section 3.1.2, we propose a spatial energy compaction constraint. The detailed proof for this proposal is given in the following. Let αk = Nk N , where Nk and N are the total number of inputs and that of yk(n), respectively. Our autoencoder net- work consist of three downsampling units, so αk = 1

work page
[40]

Rk is bit rate for the k-th channel

Re- fer to [26], the optimum bit allocation problem is described as follows: under the constant rate constraint K−1∑ k=0 αkRk = R(const) (18) , minimize σ2 r = K−1∑ k=0 Bkσ2 qk (19) where y, qk has K channels, so we denote them as yk and qk. Rk is bit rate for the k-th channel. By substituting the approximating relationship [25] σ2 qk≃ ϵ22−2Rσ2 yk (20) wh...

work page

[1] [1]

The JPEG still picture compression stan- dard

G. K Wallace, “The JPEG still picture compression stan- dard”, IEEE Trans. on Consumer Electronics, vol. 38, no. 1, pp. 43-59, Feb. 1991

work page 1991

[2] [2]

An overview of the JPEG2000 still image compression standard

Majid Rabbani, Rajan Joshi, “An overview of the JPEG2000 still image compression standard” , ELSEVIER Signal Pro- cessing: Image Communication, vol. 17, no, 1, pp. 3-48, Jan. 2002

work page 2002

[3] [3]

Overview of the High Efﬁciency Video Coding (HEVC) Standard

G. J. Sullivan, J. Ohm, W. Han and T. Wiegand, “Overview of the High Efﬁciency Video Coding (HEVC) Standard” , IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec. 2012

work page 2012

[4] [4]

Overview of the H.264/AVC Video Coding Standard

T. Wiegand, G. J. Sullivan, G. Bjontegaard, A. Luthra, “Overview of the H.264/AVC Video Coding Standard”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July. 2003

work page 2003

[5] [5]

Extracting and composing robust features with denoising au- toencoders

P. Vincent, H. Larochelle, Y . Bengio and P.-A. Manzagol, “Extracting and composing robust features with denoising au- toencoders”, Intl. conf. on Machine Learning (ICML), pp. 1096-1103, July 5-9. 2008

work page 2008

[6] [6]

Performance Com- parison of Convolutional AutoEncoders, Generative Adver- sarial Networks and Super-Resolution for Image Compres- sion

Z. Cheng, H. Sun, M. Takeuchi, J. Katto, “Performance Com- parison of Convolutional AutoEncoders, Generative Adver- sarial Networks and Super-Resolution for Image Compres- sion”, CVPR Workshop and Challenge on Learned Image Compression (CLIC), pp. 1-4, June 17-22, 2018

work page 2018

[7] [7]

CNN-Optimized Image Compression with Uncertainty based Resource Allocation

Z. Chen, Y . Li, F. Liu, Z. Liu, X. Pan, W. Sun, Y . Wang, Y . Zhou, H. Zhu, S. Liu, “CNN-Optimized Image Compression with Uncertainty based Resource Allocation” , CVPR Work- shop and Challenge on Learned Image Compression (CLIC), pp. 1-4, June 17-22, 2018

work page 2018

[8] [8]

Variable Rate Image Compression with Recurrent Neural Networks

G. Toderici, S. M.O’Malley, S. J. Hwang, et al.,“Variable rate image compression with recurrent neural networks” , arXiv: 1511.06085, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[9] [9]

Full Resolution Image Compression with Recurrent Neural Networks

G, Toderici, D. Vincent, N. Johnson, et al., “Full Resolution Image Compression with Recurrent Neural Networks”, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, July 21-26, 2017

work page 2017

[10] [10]

Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

Nick Johnson, Damien Vincent, David Minnen, et al., “Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks” , arXiv:1703.10114, pp. 1-9, March 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[11] [11]

Lossy Image Compression with Compressive Au- toencoders

Lucas Theis, Wenzhe Shi, Andrew Cunninghan and Ferenc Huszar, “Lossy Image Compression with Compressive Au- toencoders”, Intl. Conf. on Learning Representations (ICLR), pp. 1-19, April 24-26, 2017

work page 2017

[12] [12]

End-to-End Optimized Image Compression

J. Balle, Valero Laparra, Eero P. Simoncelli, “End-to-End Optimized Image Compression”, Intl. Conf. on Learning Rep- resentations (ICLR), pp. 1-27, April 24-26, 2017

work page 2017

[13] [13]

Variational Image Compression with a Hyper- prior

Johannes Balle, D. Minnen, S. Singh, S. J. Hwang, N. Johnston, “Variational Image Compression with a Hyper- prior”, Intl. Conf. on Learning Representations (ICLR), pp. 1-23, 2018. https://tensorflow.github.io/ compression/

work page 2018

[14] [14]

Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations

E. Agustsson, F. Mentzer, M. Tschannen, L. Cavigelli, R. Timofte, L. Benini, L. V . Gool, “Soft-to-Hard Vector Quan- tization for End-to-End Learning Compressible Representa- tions”, Neural Information Processing Systems (NIPS) 2017, arXiv:1704.00648v2

work page internal anchor Pith review Pith/arXiv arXiv 2017

[15] [15]

Conditional Probability Models for Deep Image Compression

F. Mentzer, E. Agustsson, M. Tschannen, R. Timofte, L. V . Gool, “Conditional Probability Models for Deep Image Compression”, IEEE Conf. on Computer Vision and Pat- tern Recognition (CVPR), June 17-22, 2018. https:// github.com/fab-jul/imgcomp-cvpr

work page 2018

[16] [16]

Deep Convolu- tional AutoEncoder-based Lossy Image Compression

Z. Cheng, H. Sun, M. Takeuchi, J. Katto, “Deep Convolu- tional AutoEncoder-based Lossy Image Compression” , Pic- ture Coding Symposium, pp. 1-5, June 24-27, 2018

work page 2018

[17] [17]

Learning Con- volutional Networks for Content-weighted Image Compres- sion

M. Li, W. Zuo, S. Gu, D. Zhao, D. Zhang, “Learning Con- volutional Networks for Content-weighted Image Compres- sion”, IEEE Conf. on Computer Vision and Pattern Recog- nition (CVPR), June 17-22, 2018

work page 2018

[18] [18]

Real Time Adaptive Image Com- pression

Ripple Oren, L. Bourdev, “Real Time Adaptive Image Com- pression”, Proc. of Machine Learning Research, V ol. 70, pp. 2922-2930, 2017

work page 2017

[19] [19]

Generative Compres- sion

S. Santurkar, D. Budden, N. Shavit, “Generative Compres- sion”, Picture Coding Symposium, June 24-27, 2018

work page 2018

[20] [20]

Generative Adversarial Networks for Extreme Learned Image Compression

E. Agustsson, M. Tschannen, F. Mentzer, R. Timofte, and L. V . Gool, “Generative Adversarial Networks for Extreme Learned Image Compression”, arXiv:1804.02958

work page arXiv

[21] [21]

Video Compression through Image Interpolation

C-Y Wu, N. Singhal, P. Krahenbuhl, “Video Compression through Image Interpolation”, 15th European Conference on Computer Vision, September 8 C 14, 2018

work page 2018

[22] [22]

Deepcoder: A deep neural network based video compres- sion

T. Chen, H. Liu, Q. Shen, T. Yue, X. Cao, and Z. Ma. “Deepcoder: A deep neural network based video compres- sion”. 2017 IEEE Visual Communications and Image Pro- cessing (VCIP), pp. 1C4, Dec 2017

work page 2017

[23] [23]

Workshop and Challenge on Learned Image Compres- sion, CVPR2018, http://www.compression.cc/ challenge/

work page

[24] [24]

Real-time single image and video super-resolution using an efﬁcient sub-pixel convo- lutional neural network

W. Shi, J. Caballero, F. Huszar, et al.“Real-time single image and video super-resolution using an efﬁcient sub-pixel convo- lutional neural network”, Intl. IEEE Conf. on Computer Vi- sion and Pattern Recognition, June 26-July 1, 2016

work page 2016

[25] [25]

Digital coding of waveforms

N.S. Jayant and P. Noll, “Digital coding of waveforms”, En- glewood Cliffs NJ, Prentice-Hall, 1984

work page 1984

[26] [26]

Performance Evaluation of Subband Coding and Optimization of Its Filter Coefﬁcients

J.Katto and Y .Yasuda:“Performance Evaluation of Subband Coding and Optimization of Its Filter Coefﬁcients”, SPIE Vi- sual Communication and Image Processing, Nov.1991

work page 1991

[27] [27]

Adam: A Method for Stochastic Optimization

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization”, arXiv:1412.6980, pp.1-15, Dec. 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[28] [28]

ImageNet: A Large-Scale Hierarchical Image Database

J. Deng, W. Dong, R. Socher, L. Li, K. Li and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database” , IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1-8, June 20-25, 2009

work page 2009

[29] [29]

Kodak Lossless True Color Image Suite, Download from http://r0k.us/graphics/kodak/

work page

[30] [30]

Multiscale structural similarity for image quality assessment

Z. Wang, E. P. Simoncelli and A. C. Bovik, “Multiscale structural similarity for image quality assessment” , The 36- th Asilomar Conference on Signals, Systems and Computers, V ol.2, pp. 1398-1402, Nov. 2013

work page 2013

[31] [31]

JPEG ofﬁcial software libjpeg, https://jpeg.org/jpeg/software.html

work page

[32] [32]

JPEG2000 ofﬁcial software OpenJPEG, https://jpeg.org/jpeg2000/software.html

work page

[33] [33]

BPG Image Format, https://bellard.org/bpg/

work page

[34] [34]

A Mathematical Theory of Communica- tion

C. E. Shannon, “A Mathematical Theory of Communica- tion”, The Bell System Technical Journal, V ol. 27, pp. 379- 423, July, 1948

work page 1948

[35] [35]

Video Frame Interpola- tion via Adaptive Separable Convolution

S. Niklaus, L. Mai and F. Liu, “Video Frame Interpola- tion via Adaptive Separable Convolution”, IEEE International Conference on Computer Vision (ICCV) 2017

work page 2017

[36] [36]

http://trace.eas.asu.edu/ index.html

Video Trace Library. http://trace.eas.asu.edu/ index.html

work page

[37] [37]

High Efﬁciency Video Coding (HEVC) Test Model 16 (HM 16) Encoder Description

K. McCann, C. Rosewarne, B. Bross, M. Naccari, K. Shar- man, G. J. Sullivan, “High Efﬁciency Video Coding (HEVC) Test Model 16 (HM 16) Encoder Description” , Document JCTVC-R1002, Sapporo, Jul. 2014. https://hevc.hhi. fraunhofer.de/svn/svn_HEVCSoftware/

work page 2014

[38] [38]

H.264/14496-10 AVC Reference Software Manual

A. M.Tourapis, K. Suhring, G. Sullivan, “H.264/14496-10 AVC Reference Software Manual” , Document JVT-AE010, London, UK, 28 June- 3 July 2009. http://iphome. hhi.de/suehring/tml/download/

work page 2009

[39] [39]

Proof of Spatial Energy Constraint In Section 3.1.2, we propose a spatial energy compaction constraint

Supplementary Material 7.1. Proof of Spatial Energy Constraint In Section 3.1.2, we propose a spatial energy compaction constraint. The detailed proof for this proposal is given in the following. Let αk = Nk N , where Nk and N are the total number of inputs and that of yk(n), respectively. Our autoencoder net- work consist of three downsampling units, so αk = 1

work page

[40] [40]

Rk is bit rate for the k-th channel

Re- fer to [26], the optimum bit allocation problem is described as follows: under the constant rate constraint K−1∑ k=0 αkRk = R(const) (18) , minimize σ2 r = K−1∑ k=0 Bkσ2 qk (19) where y, qk has K channels, so we denote them as yk and qk. Rk is bit rate for the k-th channel. By substituting the approximating relationship [25] σ2 qk≃ ϵ22−2Rσ2 yk (20) wh...

work page