Reducing The Amortization Gap of Entropy Bottleneck In End-to-End Image Compression

Muhammet Balcilar
InterDigital, Inc.
Rennes, France
muhammet.balcilar@interdigital.com

Bharath Damodaran
InterDigital, Inc.
Rennes, France
bharath.damodaran@interdigital.com

Pierre Hellier
InterDigital, Inc.
Rennes, France
pierre.hellier@interdigital.com

Abstract

End-to-end deep trainable models are about to exceed the performance of traditional handcrafted compression techniques on videos and images. The core idea is to learn a non-linear transformation, modeled as a deep neural network, that maps the input image into a latent space, jointly with an entropy model of the latent distribution. The decoder is also learned as a deep trainable network, and the distortion is measured on the reconstructed image. These methods enforce the latent to follow some prior distributions. Since these priors are learned by optimization over the entire training set, the performance is optimal on average. However, the priors cannot fit every new instance exactly, which damages the compression performance by enlarging the bit-stream. In this paper, we propose a simple yet efficient instance-based parameterization method to reduce this amortization gap at a minor cost. The proposed method is applicable to any end-to-end compression method and improves the compression bitrate by $1\%$ without any impact on the reconstruction quality.

Neural Image Compression, Entropy Model, Amortization Gap.


I Introduction

Image and video compression is still a topic of utmost importance, despite decades of hard work. The pandemic, as well as the rise of the metaverse, has led to a huge volume of video transmitted over the network, and limiting the amount of data matters to reduce energy consumption. Classical methods, based on handcrafted features such as the DCT, have prevailed over the last three decades. Over roughly the last five years, methods based on machine learning have emerged, and their performance now challenges that of traditional techniques.

These end-to-end deep compression methods are a special case of the Variational Autoencoder (VAE) model [12], where the encoder, decoder and entropy model are learned jointly. The output of the encoder is referred to as the latent variable, and the prior of the entropy model is trained on the distribution of the latent variables observed over the training dataset [22, 3, 4, 18, 19, 7, 24]. It was shown that minimizing the evidence lower bound (ELBO) of this special VAE is equivalent to jointly minimizing the mean square error (MSE) of the reconstructed image and the entropy of the latents w.r.t. the priors [4].

At test time, these codecs encode the quantized latent variable losslessly w.r.t. the trained priors into a bit-stream. This is typically performed by an entropy coder such as range or arithmetic coding [20]. Since the prior distribution is known by the decoder (or reconstructed from side information [4, 18, 19, 7, 24]), the latent can be recovered from the transmitted bit-stream and then used to reconstruct the original data. Various priors have been proposed so far: fully factorized [3], zero-mean Gaussian [4], Gaussian [18, 19] or mixture of Gaussians [7], some of which are predicted using an autoregressive scheme [18, 19, 7, 24].

All prior models are learned by amortizing their parameters over the entire training set. Consequently, the learned prior is optimal on average, but sub-optimal for a given data instance, because the observed latent distribution differs from the learned prior distribution. This problem is referred to as the amortization gap [9]. The methods proposed to solve this problem fall into two classes: the first enforces that the latent of the given instance obeys the priors [1, 16, 6, 25, 11]; the second modifies the priors to better fit the latents of the given instance [13, 15, 26, 23]. The first class of methods does not need any update on the receiver side, but has limited gain. The second class of methods can update the encoder/decoder in addition to the entropy model, resulting in more gain, at the additional cost of transmitting these updates to the receiver. However, all these methods require post-training to overfit on a given data instance, which increases the encoding time significantly.

In this paper, we make two main contributions. First, we define the amortization gap of the entropy model from a compression perspective and report the amortization gap of some recent neural image compression models over benchmark datasets. Second, we propose simple yet efficient methods for factorized and hyperprior entropy models that adjust the priors to fit any new instance to be compressed. Our solution does not need post-training and adds only negligible computational complexity. We show that a gain of around $1\%$ or more can be expected on state-of-the-art (SOTA) end-to-end compression methods without any impact on the reconstruction quality.

II Neural Image Compression

In this section, we introduce the mathematical notations for the end-to-end variational autoencoders and their entropy models. The encoder $y = g_{a} (x; ϕ)$, where $ϕ$ are the parameters of the corresponding neural network, transforms each input $x \in \mathbb{R}^{n \times n \times 3}$ into a lower dimensional latent $y \in \mathbb{R}^{m \times m \times o}$, which is quantized to obtain $\hat{y} = Q (y)$.

Recent methods have proposed the two following possible solutions:

The fully factorized model [3], where the quantized latents are compressed losslessly by the entropy coder using the factorized entropy model $p_{f} (\hat{y} | Ψ)$.
The hyperprior model, now used in most settings [4, 7], where side information is extracted so as to remove spatial structure from the latent information, so that the model generalizes better. The side information $z = h_{a} (y; Φ)$, where $z \in \mathbb{R}^{k \times k \times f}$ (and its quantization $\hat{z} = Q (z)$), is also learned. In that case, $\hat{y}$ is encoded with the hyperprior entropy model $p_{h} (\hat{y} | \hat{z}; Θ)$, and $\hat{z}$ is encoded with the factorized entropy model $p_{f} (\hat{z} | Ψ)$.

The decoder $\hat{x} = g_{s} (\hat{y}; θ)$ reconstructs the image $\hat{x}$ from the transmitted quantized latent variables, or from the reconstructed latent in the case of the hyperprior model. In the general case, the parameters $ϕ, θ, Ψ, Θ$ of $g_{a}, g_{s}, p_{f}, p_{h}$ are obtained by minimizing the following rate-distortion loss:

$L = \mathbb{E}_{x \sim p_{x},\, ϵ \sim U} \left[ - \log (p_{h} (\hat{y} | \hat{z}, Θ)) - \log (p_{f} (\hat{z} | Ψ)) + λ\, d (x, \hat{x}) \right],$ (1)

where $d (\cdot, \cdot)$ is any distortion loss such as the MSE, $λ$ is the trade-off parameter that controls compression ratio and quality, and $Q (\cdot)$ is replaced at train time by the continuous relaxation $Q (x) = x + ϵ$, $ϵ \sim U (- 0.5, 0.5)$.
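The two behaviours of $Q$ and the bracket of Eq. (1) can be illustrated with a minimal sketch. The function names `quantize` and `rd_loss`, the flat-list representation of the latents and the numeric values are assumptions made for this illustration only; they are not taken from any library.

```python
import math
import random


def quantize(y, training, rng=None):
    """Q(.) applied to a flat list of latents (illustrative helper).

    Train time: continuous relaxation Q(y) = y + eps, eps ~ U(-0.5, 0.5).
    Test time: rounding to the nearest integer.
    """
    if training:
        rng = rng or random.Random(0)
        return [v + rng.uniform(-0.5, 0.5) for v in y]
    return [float(math.floor(v + 0.5)) for v in y]


def rd_loss(bits_y, bits_z, distortion, lam):
    """Value of the bracket in Eq. (1) for one sample:
    rate of y_hat + rate of z_hat + lambda * distortion."""
    return bits_y + bits_z + lam * distortion


y = [0.4, 1.6, -2.2]
y_test = quantize(y, training=False)   # hard rounding
y_train = quantize(y, training=True)   # noisy relaxation
loss = rd_loss(bits_y=10.0, bits_z=2.0, distortion=0.5, lam=4.0)
```

The noisy version stays within half a quantization step of the input, which is what makes the rate term differentiable during training while remaining a reasonable proxy for rounding.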

At test time, the quantized latent variables are compressed losslessly by the entropy coder as follows:

For the fully factorized model [3], each $k \times k$ slice of the side latent has a trainable cumulative distribution function (cdf) in the entropy model, denoted $\bar{p}_{Ψ}^{(c)} (\cdot), c = 1 \dots f$, and the probability mass function (pmf) for a given value of $x$ is derived as $\hat{p}_{Ψ}^{(c)} (x) = \bar{p}_{Ψ}^{(c)} (x + 0.5) - \bar{p}_{Ψ}^{(c)} (x - 0.5)$. Thus, the entropy model is applied as follows:

$p_{f} (\hat{z} | Ψ) = \prod_{c = 1}^{f} \prod_{i, j = 1}^{k, k} \hat{p}_{Ψ}^{(c)} (\hat{z}_{i, j, c})$ (2)
In the case of the hyperprior model, the entropy model of the latent $\hat{y}$ is conditioned on the side information $z = h_{a} (y; Φ)$. Thus, $\hat{y}$ is encoded with the hyperprior entropy model $p_{h} (\hat{y} | \hat{z}; Θ)$, and $\hat{z}$ is encoded with the factorized entropy model $p_{f} (\hat{z} | Ψ)$.
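As a rough illustration of Eq. (2), the sketch below derives a pmf from a cdf and accumulates the code length of the side latent channel by channel. The logistic cdf is only a hypothetical stand-in for the trainable per-channel cdf, and measuring the length in bits (base-2 logarithm) is an assumption of this example.

```python
import math


def logistic_cdf(x, loc=0.0, scale=1.0):
    """Hypothetical stand-in for a learned per-channel cdf."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))


def pmf_from_cdf(cdf, x):
    """pmf(x) = cdf(x + 0.5) - cdf(x - 0.5) for an integer symbol x."""
    return cdf(x + 0.5) - cdf(x - 0.5)


def factorized_bits(z_hat, cdfs):
    """-log2 p_f(z_hat | Psi) following Eq. (2).

    z_hat[c] is the flat list of integer symbols of channel c,
    cdfs[c] is the cdf attached to that channel.
    """
    bits = 0.0
    for c, channel in enumerate(z_hat):
        for symbol in channel:
            bits -= math.log2(pmf_from_cdf(cdfs[c], symbol))
    return bits


# Two channels with different (made-up) cdf scales.
cdfs = [logistic_cdf, lambda x: logistic_cdf(x, scale=2.0)]
z_hat = [[0, 0, 1, -1], [2, 0, -3, 1]]
total_bits = factorized_bits(z_hat, cdfs)
pmf_mass = sum(pmf_from_cdf(logistic_cdf, x) for x in range(-50, 51))
```

Because the product in Eq. (2) becomes a sum of logarithms, the total length is simply the sum of per-symbol costs, each looked up in the table of its own channel.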

Let us describe in more detail the special case of the hyperprior model. Each latent point is modeled as a $1d$ Gaussian distribution whose pmf is $\hat{N} (x; μ, σ) = \bar{N} (x + 0.5; μ, σ) - \bar{N} (x - 0.5; μ, σ)$, where $\bar{N} (\cdot; μ, σ)$ is the cdf of the $1d$ Gaussian distribution. At train time, the hyperprior entropy model is written $p_{h} (\hat{y} | \hat{z}, Θ) = \prod_{i} \hat{N} (\hat{y}_{i}; μ_{i}, σ_{i})$, where $μ, σ = h_{s} (\hat{z}; Θ)$ as in [4], or $μ_{i}, σ_{i} = h_{s} (\hat{z}, \hat{y}_{< i}; Θ)$ for autoregressive prediction as in [18, 7, 24]. $h_{s}$ is a trainable model implemented as a neural network with parameters $Θ$. However, this implementation is not efficient at test time, because the pmf table would have to be recalculated at the receiver side for each latent point $i$. Thus, it is common practice to use $s$ predefined integer-resolution pmf tables with zero mean but different scale parameters (scale values logarithmically distributed between $σ_{\min}$ and $σ_{\max}$) [4, 18, 19, 7, 24]. With $\tilde{y}_{i} = Q (y_{i} - μ_{i})$, $\hat{y}_{i} = \tilde{y}_{i} + μ_{i}$, $σ_{c}$ the $c$-th predefined scale and $N (σ_{c})$ the set of latent indices whose winning scale is $σ_{c}$, the hyperprior entropy model is implemented as follows at test time:

$p_{h} (\hat{y} | \hat{z}, Θ) = \prod_{c = 1}^{s} \prod_{i \in N (σ_{c})} \hat{N} (\tilde{y}_{i}; 0, σ_{c})$ (3)
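The test-time procedure of Eq. (3) can be sketched as follows. The scale range, the number of tables, the rule for picking the winning scale (smallest predefined scale not below the predicted one) and the small probability floor are all assumptions chosen for this illustration; actual codecs may use different values and rules.

```python
import math


def gaussian_pmf(x, sigma):
    """Zero-mean integer-resolution Gaussian pmf: cdf(x + 0.5) - cdf(x - 0.5)."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))
    return cdf(x + 0.5) - cdf(x - 0.5)


def log_spaced_scales(sigma_min, sigma_max, s):
    """s scale values, logarithmically distributed in [sigma_min, sigma_max]."""
    step = (math.log(sigma_max) - math.log(sigma_min)) / (s - 1)
    return [math.exp(math.log(sigma_min) + c * step) for c in range(s)]


def winning_scale(sigma, scales):
    """Index of the smallest predefined scale >= sigma (assumed rule)."""
    for c, scale in enumerate(scales):
        if sigma <= scale:
            return c
    return len(scales) - 1


def hyperprior_bits(y, mu, sigma, scales, floor=1e-9):
    """-log2 p_h(y_hat | z_hat) following Eq. (3)."""
    bits = 0.0
    for y_i, mu_i, sigma_i in zip(y, mu, sigma):
        y_tilde = math.floor(y_i - mu_i + 0.5)        # Q(y_i - mu_i)
        c = winning_scale(sigma_i, scales)            # shared table index
        p = max(gaussian_pmf(y_tilde, scales[c]), floor)  # avoid log(0)
        bits -= math.log2(p)
    return bits


scales = log_spaced_scales(0.11, 256.0, 64)  # illustrative values
bits = hyperprior_bits(y=[0.2, 3.7, -1.1], mu=[0.0, 3.0, -1.0],
                       sigma=[0.3, 1.5, 0.8], scales=scales)
```

Grouping latents by winning scale means the receiver only needs the $s$ shared tables, which is also what makes a per-table correction of the pmf cheap to signal.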

III Proposed Method

In this section, we define the amortization gap of the entropy models and propose solutions to reduce it.

III-A Amortization Gap of the Entropy Model

At training time, the parameters $ϕ, θ, Φ, Θ$ and $Ψ$ are estimated by optimizing $L$ over the data distribution $x \sim p_{x}$. The parameters may be optimal on average over the entire dataset, but not for any specific instance $x$, which is known as the amortization gap [9]. In fact, an amortization gap may occur for each trainable block of the compression scheme. However, we are only interested here in the gap of the entropy models.

The amortization gap of the entropy model is the difference between the optimal entropy model and the learned entropy model (see Fig. 1). It quantifies the expected gain in bit length, if the entropy models’ pmfs are optimal on every input instance. This gap can be calculated for the factorized entropy model as follows:

$G_{f} = - \log (p_{f} (\hat{z} | Ψ)) + \log (p_{f}^{*} (\hat{z}))$ (4)

$p_{f}^{*} (\hat{z})$ refers to the optimal entropy model within the same entropy family $P_{f}$. For the hyperprior entropy model, it reads:

$G_{h} = - \log (p_{h} (\hat{y} | \hat{z}, Θ)) + \log (p_{h}^{*} (\hat{y}))$ (5)

$p_{h}^{*} (\hat{y})$ refers to the optimal entropy model within the same entropy family $P_{h}$. Eqs. 4 and 5 show that this gap can be closed by using the optimal entropy model on each instance. To find such an instance-specific optimal entropy model, one does not need to optimize the log-likelihood, since the normalized histogram of the symbols is the optimal pmf [17].

Thus, replacing the learned pmf tables with the histograms of the actual latents of each instance removes the amortization gap of the entropy model. However, it introduces the additional cost of transmitting these histograms for each instance, which enlarges the bit-stream and is therefore not feasible in practice.
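The gap of Eqs. (4) and (5) can be computed on a toy example by coding the same symbols once with a learned pmf and once with their normalized histogram. The symbols and the learned pmf below are made up for illustration, and the length is expressed in bits (base-2 logarithm), which is an assumption of this sketch.

```python
import math
from collections import Counter


def bits_under_pmf(symbols, pmf):
    """Code length (bits) of the symbols under a given pmf table."""
    return -sum(math.log2(pmf[s]) for s in symbols)


def normalized_histogram(symbols):
    """Instance-specific optimal pmf: relative frequency of each symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return {s: c / n for s, c in counts.items()}


def amortization_gap(symbols, learned_pmf):
    """G = -log p_learned(symbols) + log p_optimal(symbols), in bits."""
    optimal = normalized_histogram(symbols)
    return bits_under_pmf(symbols, learned_pmf) - bits_under_pmf(symbols, optimal)


symbols = [0, 0, 0, 0, 0, 0, 1, -1]           # toy latent of one instance
learned = {-1: 0.25, 0: 0.5, 1: 0.25}         # toy learned pmf table
gap_bits = amortization_gap(symbols, learned)  # about 1.51 bits out of 10
```

Here the learned table spends 10 bits while the histogram needs about 8.49 bits, so the gap is roughly 15% of the original length. The histogram itself is not counted in this figure, which is exactly the transmission cost discussed above.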

III-B Explicit Parameterization

Here, we propose an efficient solution to bridge the amortization gap of the entropy model at a negligible additional transmission cost, by means of an explicit parameterization. More specifically, when an input image has to be encoded, we parameterize the distribution of the latents as $\tilde{p}_{f | h} (\cdot, β)$ for the factorized and hyperprior entropy models, so as to closely approximate the optimal pmf. Our approximation is illustrated in Fig. 1. This approximation can be transmitted at a negligible extra cost (the signaling cost of $β$), and the encoding then uses $\tilde{p}_{f | h} (\cdot, β)$ instead of the learned pmf.

We propose two variants for approximating $\tilde{p}_{f | h} (\cdot, β)$. The first one is generic, uses a Gaussian mixture model and is described in section III-B1. It is dedicated to the factorized entropy model, since this modeling is flexible. The second one is a simplified version, where the error of the central bin is spread over the neighbouring bins, and is described in section III-B2. This approach is dedicated to the hyperprior entropy model, since the discrepancy between the learned and the actual pmf is smaller.

Fig. 1: Learned pmfs ($\hat{p}_{Ψ}, \hat{N}$), reparameterized pmfs ($\tilde{p}_{f}, \tilde{p}_{h}$) and normalized frequencies ($h_{f}, h_{h}$) for a selected latent of a certain image under the factorized (left) and hyperprior (right) entropy models. Our reparameterization fits the normalized frequencies better, leading to improved compression.

III-B1 Truncated Gaussian Mixture on Discrete Support

Since the factorized entropy model is a non-parametric distribution model [3, 4], the pmf is flexible enough to have any shape. One of the most successful parametric distribution models is the Gaussian Mixture Model (GMM), which can approximate any smooth density function [10] at a cost of three parameters per component. Thus, we propose to use a GMM as a tool to re-model the latent distribution of the factorized entropy model.

In our case, the function to be approximated is defined on integer centers, and the support of the GMM is truncated to $[x_{\min} \dots x_{\max}]$. Thus, we can write the pmf of our mixture model for the factorized entropy model, $\tilde{p}_{f} (x; β^{(c)})$, as:

$\tilde{p}_{f} (x; β^{(c)}) = \frac{\sum_{k = 1}^{K} π_{k} N (x; μ_{k}, σ_{k})}{\sum_{z = x_{\min}}^{x_{\max}} \sum_{k = 1}^{K} π_{k} N (z; μ_{k}, σ_{k})}$ (6)

Here $β^{(c)} = \{π_{k}, μ_{k}, σ_{k}\}_{k = 1 \dots K}$ denotes the set of parameters to be inferred in (6) for the $c$-th latent band. Since there is no closed-form solution for the $β^{(c) *}$ that maximizes the likelihood of the observed latents under (6), numerical optimization is used for the estimation. In practice, given the small number of parameters, convergence is fast. Fig. 1 (left) displays the result of the pmf approximation using a mixture of two Gaussians ($K = 2$). The choice of $K$ is a trade-off between approximation accuracy and transmission cost.

For the hyperprior entropy model, since the latent distribution is zero-mean and uni-modal, we simplify Eq. 6 by setting $K = 1$ and $μ = 0$, which we call the zero-mean truncated Gaussian approximation. This is illustrated in Fig. 1 (right), where $\tilde{p}_{h} (x; β^{(c)})$ is obtained from Eq. (6) with $K = 1$ and $μ = 0$.
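A minimal sketch of Eq. (6) and of the zero-mean special case ($K = 1$, $μ = 0$) is given below. The paper does not specify which optimizer is used, so the exhaustive search over a handful of candidate scales is purely an assumption of this example, as are the toy histogram and the probability floor that guards against a zero pmf entry.

```python
import math


def gauss_density(x, mu, sigma):
    """Gaussian density N(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def truncated_gmm_pmf(beta, x_min, x_max):
    """Eq. (6): mixture evaluated on integers x_min..x_max, then renormalized.

    beta is a list of (pi_k, mu_k, sigma_k) tuples.
    """
    raw = {x: sum(pi * gauss_density(x, mu, sigma) for pi, mu, sigma in beta)
           for x in range(x_min, x_max + 1)}
    total = sum(raw.values())
    return {x: v / total for x, v in raw.items()}


def cross_entropy_bits(hist, pmf, floor=1e-12):
    """Average bits/symbol when data with frequencies `hist` is coded with `pmf`."""
    return -sum(h * math.log2(max(pmf[x], floor)) for x, h in hist.items() if h > 0)


def fit_zero_mean_sigma(hist, x_min, x_max, candidates):
    """K = 1, mu = 0 case: pick the candidate sigma with the shortest code."""
    def cost(s):
        return cross_entropy_bits(hist, truncated_gmm_pmf([(1.0, 0.0, s)], x_min, x_max))
    return min(candidates, key=cost)


hist = {-1: 0.2, 0: 0.6, 1: 0.2}             # toy normalized histogram
candidates = [0.3, 0.5, 0.7, 1.0, 2.0]       # toy search grid
best_sigma = fit_zero_mean_sigma(hist, -3, 3, candidates)
pmf = truncated_gmm_pmf([(1.0, 0.0, best_sigma)], -3, 3)
```

The same `truncated_gmm_pmf` helper covers $K = 2$ or $K = 3$ by passing more components; only the search over $β$ becomes higher dimensional.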

III-B2 Difference of the Center Bins Probability

We propose in this section a simpler alternative for the hyperprior entropy model, since the latter is parametric and by construction has a zero-mean Gaussian shape. In that special case, the alternative comes at minimal transmission cost. The heuristic is to compute the error of the center bin's probability and to spread this error proportionally over the other bins, which arguably gives a strong closed-form alternative. In this approach, the reparameterized $c$-th pmf of the hyperprior entropy model is written as in (7), where the parameter $β$ is the difference between the learned center-bin probability and the actual one in the normalized histogram, i.e. $β^{(c)} = \hat{N} (0; 0, σ_{c}) - h_{h}^{(c)} (0)$.

$\tilde{p}_{h} (x; β^{(c)}) = \begin{cases} \hat{N} (x; 0, σ_{c}) - β^{(c)} & \text{if } x = 0 \\ \hat{N} (x; 0, σ_{c}) \left( 1 + \frac{β^{(c)}}{1 - \hat{N} (0; 0, σ_{c})} \right) & \text{if } x \neq 0 \end{cases}$ (7)

III-B3 Quantization of the Parameters

Depending on the selected re-parameterization method, up to 4 different kinds of parameters ($μ, σ, π$ of the Gaussians and the difference of center-bin probability) have to be explicitly encoded into the bit-stream as extra parameters. To do so, we quantize them on a predefined number of bins (8-bit by default). Since the ranges of these parameters differ, we prepare a shared quantization table for each kind of parameter. We discretize $σ$ in the range $[0.002, 20]$ with logarithmic bin width, and $π$ in the range $[0, 1]$, $μ$ in the range $[x_{\min}, x_{\max}]$ (minimum symbol to maximum symbol in the cdf table) and the difference of center-bin probability in the range $[- 0.03, 0.03]$ with uniform bin width, using 256 quantization centers for 8-bit quantization.
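The shared quantization tables can be sketched as follows, using the ranges quoted above. Placing the first and last centers exactly on the range limits and mapping a value to its nearest center are assumptions of this illustration.

```python
import math


def uniform_centers(lo, hi, n=256):
    """n quantization centers with uniform spacing over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def log_centers(lo, hi, n=256):
    """n quantization centers with logarithmic spacing over [lo, hi]."""
    return [math.exp(v) for v in uniform_centers(math.log(lo), math.log(hi), n)]


def quantize_index(value, centers):
    """8-bit index of the nearest center (assumed rule); this index is transmitted."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


sigma_table = log_centers(0.002, 20.0)        # scale parameters
pi_table = uniform_centers(0.0, 1.0)          # mixture weights
beta_table = uniform_centers(-0.03, 0.03)     # center-bin differences

idx = quantize_index(-0.0171, beta_table)     # encoder side
beta_dequantized = beta_table[idx]            # decoder side, same shared table
```

With 256 centers the worst-case error on the center-bin difference is half a step, i.e. about $1.2 \times 10^{-4}$, at a cost of one byte per parameter.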

IV Experimental Results

We used the CompressAI library [5] to test our contributions on the 6 SOTA deep compression methods it already implements, as well as on a very recent method [24]. We used two datasets to evaluate our method: the Kodak test set [14] and the CLIC-2021 Challenge's Professional test set [8], consisting of 24 and 60 images respectively.

Analysis of amortization gap: We measured the amortization gaps of the pre-trained SOTA methods and our gains on the Kodak test set [14] at the lowest bpp. Results are given in Table I. Regardless of the baseline method, the amortization gap of the factorized entropy model is quite large (7.6-11.8%) compared to that of the hyperprior entropy model (1.9-4.5%). This observation is consistent with Fig. 1, which clearly shows that the mismatch between $h_{f}$ and $\hat{p}_{Ψ}$ is much bigger than the one between $h_{h}$ and $\hat{N}$. This can be explained easily: the hyperprior entropy model uses instance-specific information, whereas the fully factorized model does not. The hyperprior methods encode a very small amount of the data (0.6-5.9%) with the less effective entropy model (factorized entropy), while the vast majority (94.1-99.4%) is encoded by the more effective entropy model (hyperprior entropy). Thus, on average their total amortization gap (2.1-4.7% in the Total column of Table I) is smaller than that of the fully factorized method (9.5%). Comparing different versions of the same method, it can be seen that when the amount of side information decreases, the hyperprior gap increases correspondingly. For instance, the hyperprior gap is 3.4% when there is 5.9% side information, but increases to 4.5% when there is 3.5% side information.

Reduction of amortization gap: Our proposed methods save a significant amount of bits on all the methods studied in Table I. The saving is larger with the factorized entropy model than with the hyperprior model. In the bmshj2018-factorized model, where all information is encoded by the factorized entropy model, we reduce its gap from 9.5% to 2.71%, i.e. a gain of 6.79%. However, when less information is encoded with this entropy model, our gain also decreases. For instance, when only 2.3% of the information is encoded by this entropy model in cheng2020-attn, our gain is 2.77%. The fixed extra parameter cost of our proposal makes our method proportionally less efficient when the amount of information to be encoded decreases. The hyperprior entropy model encodes a large percentage of the information, and our gain ranges from about 1% to 1.8%, which closes on average almost 45% of the gap of the hyperprior entropy model over the different methods.

Model	Factorized entropy: Ratio (%)	Factorized entropy: Gap (%)	Factorized entropy: Gain (%)	Hyperprior entropy: Ratio (%)	Hyperprior entropy: Gap (%)	Hyperprior entropy: Gain (%)	Total: Gap (%)	Total: Gain (%)
bmshj2018-factorized [3]	100	9.5	6.79	-	-	-	9.5	6.79
bmshj2018-hyperprior [4]	3.5	10.0	4.65	96.5	4.5	1.84	4.7	1.98
mbt2018-mean [18]	5.9	11.4	4.27	94.1	3.4	1.24	3.8	1.43
mbt2018 [18]	2.3	9.9	4.09	97.7	2.5	0.99	2.7	1.06
cheng2020-anchor [7]	1.2	11.8	3.68	98.8	2.3	1.06	2.5	1.09
cheng2020-attn [7]	2.3	9.5	2.77	97.7	1.9	0.88	2.1	0.92
InvCompress [24]	0.6	7.6	2.18	99.4	3.0	1.32	3.0	1.33

TABLE I: Ratio of the encoded information, amortization gap of the entropy models and our gain for each entropy model, relative to the original bit-length, for the methods trained with the lowest bpp objective. The gap and the gain are much higher for the factorized entropy model. Our solution reduces the gap significantly for both the factorized and the hyperprior entropy models. In total, the proposed method saves roughly 1% or more (0.92-6.79%) of the file size of the SOTA models. Results are averaged over the Kodak test set.

Fig. 2: Experimental results on the Kodak and CLIC test sets for the bmshj2018-factorized model [3] trained on 7 different PSNR objectives. The proposed method based on the truncated GMM saves 6.8% of the original bit length on Kodak and 11.5% on the CLIC test set at lower PSNR, and outperforms the computationally demanding post-training alternatives.

In order to measure the performance of the proposed method at different PSNR targets, we plugged our method into the pre-trained model of [3] for the factorized entropy model, and into the best neural compression model provided in [5], cheng2020-anchor [7], for the hyperprior entropy model. The performance is measured on the Kodak and CLIC-2021 Challenge's Professional test sets. According to the results in Fig. 2, the amortization gap of the factorized entropy model varies from 8.5% to 9.5% on the Kodak dataset, where the proposed method gains from 5.3% to 6.8% in file size. On the CLIC-2021 dataset, the gap (9.5%-12.5%) and our gain (8%-11.5%) are even bigger. Fig. 3 reports the results for the hyperprior entropy model: our method saves more than 1% of the original file size at lower bit-rates and around 0.5% at the highest bit-rate. The simplest approach, which parameterizes the new probability by the difference of the center bin's probability in Eq. 7, gives results that are competitive with, and at higher PSNR even better than, the zero-mean Gaussian parameterization.

Comparison with competitors: To compare our method with existing methods, we also implemented two instance-based post-training methods: post-train encoder (which trains the encoder for the given test image) [16] and post-train latent (which directly learns a more effective latent for the instance without training the encoder) [6]. In our tests, [6] is faster than [16] (but still needs significant training time) and performs worse, as can be seen in Fig. 2. Our proposal significantly outperforms these two approaches, without adding any significant computational complexity.

Computational complexity: To measure the computational complexity, we use the Python library of PAPI [21] and count the floating point operations per pixel (flops/pxl) at both encoding and decoding time. As shown in Table II, our proposal adds negligible extra complexity at both encoding and decoding time. Even for the fully factorized model, which uses the relatively costly GMM fitting with $K = 2$, the extra complexity is less than 0.3% at encoding time. Since the hyperprior model uses zero-mean Gaussian fitting and has a good initial guess given by the learned standard deviation, the extra encoding cost even goes below 0.05%. At decoding time, the only extra process is to re-arrange the pmf table with the transmitted extra parameters, so there is almost no difference in decoding complexity. Note that the alternatives that apply fine-tuning at encoding time, as in [6, 16], need enormous extra computation at encoding. Their extra computational demand depends on the maximum number of fine-tuning iterations, where each iteration costs no less than a single decoding (one forward pass) in any model, and most of the time 3-4 times more than one forward pass because of the back-propagation of gradients. In our tests, we did not get meaningful results from [6, 16] with fewer than 1000 iterations, which amounts to a 3000-4000 times bigger computational demand.

Implementation details: For the sake of reproducibility, we implement our proposals as a collection of classes for the Python end-to-end image compression library CompressAI [5] and integrate them into this well-known library. Using our classes, one can easily apply the proposed gap reduction to any model and decrease the file size without any impact on the reconstruction quality. In our models, the re-parameterization method, which can be either zero-mean Gaussian, GMM with K=1,2,3 or the difference of center bin, is a hyperparameter for both the factorized and hyperprior entropy models. In practice, a table-wise selection mechanism is used to determine whether the learned pmf should be replaced by the approximated one. This requires signaling to the receiver, with 1 bit per table, whether the cdf table is replaced or not. Instead of testing all pmf tables in both entropy models, we define another hyperparameter that sets how many cdf tables are targeted. The number of bits used to encode a single extra parameter explicitly can also be seen as a hyperparameter, which is common to both entropy models. We use a GMM with $K = 2$ and target the top 64 pmf tables in the bmshj2018-factorized model. For cheng2020-anchor, we use a GMM with K=1 and target 32 pmf tables for the factorized entropy model (side information), and we use the zero-mean Gaussian and target 32 pmf tables for the hyperprior entropy model (main information). We always use 8-bit quantization for the extra parameters.
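The table-wise selection can be illustrated with a small sketch. The decision rule used here (replace a table only if the re-parameterized length plus the cost of its extra parameters is shorter than the learned length) is an assumption, since the exact criterion is not spelled out above; the bit counts are made up, and `select_tables` is a hypothetical helper, not a CompressAI function.

```python
def select_tables(bits_learned, bits_reparam, param_bits):
    """Per-table choice between the learned and the re-parameterized pmf.

    bits_learned[t]:  bits spent on table t's symbols with the learned pmf
    bits_reparam[t]:  bits spent with the re-parameterized pmf
    param_bits:       bits needed to transmit the extra parameters of one table
    Returns the 1-bit flags and the total length including the flags.
    """
    flags, total = [], 0.0
    for learned, reparam in zip(bits_learned, bits_reparam):
        replace = reparam + param_bits < learned   # assumed criterion
        flags.append(replace)
        total += 1 + (reparam + param_bits if replace else learned)
    return flags, total


# Three targeted tables, one 8-bit extra parameter per table.
flags, total = select_tables(
    bits_learned=[1000.0, 500.0, 50.0],
    bits_reparam=[950.0, 498.0, 45.0],
    param_bits=8,
)
baseline = 1000.0 + 500.0 + 50.0
```

In this toy case only the first table is worth replacing: the other two would save fewer bits than their parameters cost, which is consistent with the remark that the fixed parameter cost matters more when a table encodes little information.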

Model	Encoding: Baseline	Encoding: Ours	Decoding: Baseline	Decoding: Ours
bmshj2018-factorized [3]	83,797	84,024	85,400	85,412
cheng2020-anchor [7]	348,142	348,298	519,626	519,639

TABLE II: Number of floating point operations per pixel (flops/pxl) required for encoding and decoding with the two studied neural models. Our results are obtained over the Kodak test set with our most efficient settings in terms of compression performance.
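The relative overheads quoted in the text follow directly from the figures of Table II, as the short computation below shows. Only the table values are taken from the paper; the helper name is hypothetical.

```python
# flops/pxl from Table II: (baseline, ours)
flops = {
    "bmshj2018-factorized": {"encoding": (83797, 84024), "decoding": (85400, 85412)},
    "cheng2020-anchor": {"encoding": (348142, 348298), "decoding": (519626, 519639)},
}


def overhead_percent(baseline, ours):
    """Extra operations of our method relative to the baseline, in percent."""
    return 100.0 * (ours - baseline) / baseline


overheads = {
    model: {stage: overhead_percent(*pair) for stage, pair in stages.items()}
    for model, stages in flops.items()
}

for model, stages in overheads.items():
    for stage, value in stages.items():
        print(f"{model:22s} {stage:9s} +{value:.3f}%")
```

This gives about 0.27% extra encoding operations for bmshj2018-factorized and about 0.045% for cheng2020-anchor, while the decoding overhead stays below 0.02% for both models.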

V Conclusion

We have proposed to improve the efficiency of entropy models in deep neural image compression algorithms. First, we have defined the amortization gap for entropy models and measured it experimentally for SOTA methods. Then, we have proposed an effective and computationally friendly method to fill the gap, which differs from previously published methods. We have shown experimentally that a gain of around $1\%$ or more is obtained on different datasets and on 7 different SOTA compression methods. Our method can be applied to any end-to-end deep compression technique, without any impact on the reconstruction quality. One known limitation of deep methods is reconstruction failure due to architecture discrepancy (the hardware architectures used at encoding and decoding differ). This is explained by the need to recompute the pmf table at decoding, where small errors occur because of floating-point approximations [2]. In this regard, our method based on the center-bin difference will be robust, since it needs only integer division. Future work may explore the link between the explicit parameterization of the entropy model and the fine-tuning of the encoder, from which we may expect an increase of the reconstruction quality at constant bit-rate.

References

[1] C. Aytekin, X. Ni, F. Cricri, J. Lainema, E. Aksu, and M. Hannuksela (2018) Block-optimized variable bit rate neural image compression. In CVPR Workshops.
[2] J. Ballé, N. Johnston, and D. Minnen (2019) Integer networks for data compression with latent-variable models. In ICLR.
[3] J. Ballé, V. Laparra, and E. P. Simoncelli (2017) End-to-end optimized image compression. In ICLR.
[4] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston (2018) Variational image compression with a scale hyperprior. In ICLR.
[5] J. Bégaint, F. Racapé, S. Feltman, and A. Pushparaja (2020) CompressAI: a PyTorch library and evaluation platform for end-to-end compression research. arXiv preprint arXiv:2011.03029.
[6] J. Campos, S. Meierhans, A. Djelouah, and C. Schroers (2019) Content adaptive optimization for neural image compression. In CVPR Workshops.
[7] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto (2020) Learned image compression with discretized Gaussian mixture likelihoods and attention modules. In CVPR.
[8] CLIC: Challenge on Learned Image Compression. http://compression.cc
[9] C. Cremer, X. Li, and D. Duvenaud (2018) Inference suboptimality in variational autoencoders. In ICML.
[10] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning.
[11] T. Guo, J. Wang, Z. Cui, Y. Feng, Y. Ge, and B. Bai (2020) Variable rate image compression with content adaptive optimization. In CVPR Workshops.
[12] D. P. Kingma and M. Welling (2013) Auto-encoding variational Bayes. In ICLR.
[13] J. P. Klopp, L. Chen, and S. Chien (2020) Utilising low complexity CNNs to lift non-local redundancies in video coding. IEEE Transactions on Image Processing 29, pp. 6372–6385.
[14] E. Kodak. Kodak Lossless True Color Image Suite (PhotoCD PCD0992).
[15] Y. Lam, A. Zare, F. Cricri, J. Lainema, and M. M. Hannuksela (2020) Efficient adaptation of neural network filter for video compression. In ACM International Conference on Multimedia, MM '20, pp. 358–366. ISBN 9781450379885.
[16] G. Lu, C. Cai, X. Zhang, L. Chen, W. Ouyang, D. Xu, and Z. Gao (2020) Content adaptive and error propagation aware deep video compression. In ECCV, Vol. 12347, pp. 456–472.
[17] M. Mezard and A. Montanari (2009) Information, physics, and computation. Oxford University Press.
[18] D. Minnen, J. Ballé, and G. D. Toderici (2018) Joint autoregressive and hierarchical priors for learned image compression. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31.
[19] D. Minnen and S. Singh (2020) Channel-wise autoregressive entropy models for learned image compression. In ICIP, pp. 3339–3343.
[20] J. Rissanen and G. Langdon (1981) Universal modeling and coding. IEEE Transactions on Information Theory 27 (1), pp. 12–23.
[21] D. Terpstra, H. Jagode, H. You, and J. Dongarra (2010) Collecting performance data with PAPI-C. In Tools for High Performance Computing 2009, pp. 157–173.
[22] L. Theis, W. Shi, A. Cunningham, and F. Huszár (2017) Lossy image compression with compressive autoencoders. In ICLR.
[23] T. van Rozendaal, I. A. Huijben, and T. S. Cohen (2021) Overfitting for fun and profit: instance-adaptive data compression. In ICLR.
[24] Y. Xie, K. L. Cheng, and Q. Chen (2021) Enhanced invertible encoding for learned image compression. In Proceedings of the ACM International Conference on Multimedia.
[25] Y. Yang, R. Bamler, and S. Mandt (2020) Improving inference for neural image compression. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33, pp. 573–584.
[26] N. Zou, H. Zhang, F. Cricri, H. R. Tavakoli, J. Lainema, M. Hannuksela, E. Aksu, and E. Rahtu (2020) L2C – learning to learn to compress. In 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6.