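End-to-end trainable deep models are poised to surpass traditional handcrafted compression techniques on videos and images. The core idea is to learn a nonlinear transform, modeled as a deep neural network, that maps the input image into a latent space, jointly with an entropy model of the latent distribution. The decoder is also learned as a trainable deep network, and the reconstructed image is used to measure distortion. These methods force the latents to follow some prior distribution. Because these priors are learned by optimizing over the entire training set, performance is optimal on average. However, the priors cannot fit every new instance exactly, which hurts compression performance by enlarging the bitstream. In this paper, we propose a simple yet effective instance-based parameterization method that reduces this amortization gap at a small cost. The proposed method is applicable to any end-to-end compression method and improves the compression bitrate by 1% without any impact on reconstruction quality.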
translated by Google Translate
Most semantic communication systems leverage deep learning models to provide end-to-end transmission performance surpassing the established source and channel coding approaches. So far, research has mainly focused on architecture and model improvements, but a model trained over a full dataset and ergodic channel responses is unlikely to be optimal for every test instance. Due to limited model capacity and imperfect optimization and generalization, such learned models will be suboptimal, especially when the testing data distribution or channel response differs from that in the training phase, as is likely to be the case in practice. To tackle this, we propose a novel semantic communication paradigm that leverages the overfitting property of deep learning models. Our model can, for instance, be updated after deployment, which can lead to substantial further gains in transmission rate-distortion (RD) performance. This new system is named adaptive semantic communication (ASC). In our ASC system, the wirelessly transmitted stream includes both the semantic representations of the source data and the adapted decoder model parameters. Specifically, we take the overfitting concept to the extreme, proposing a series of methods to adapt the semantic codec or representations to an individual data or channel state instance. The whole ASC system design is formulated as an optimization problem whose goal is to minimize a loss function that is a tripartite tradeoff among the data rate, model rate, and distortion terms. Experiments (including a user study) verify the effectiveness and efficiency of our ASC system. Notably, the substantial gain of our overfitted coding paradigm can catalyze the upgrade of semantic communication to a new era.
Learning-based image compression has improved to a level where it can outperform traditional image codecs such as HEVC and VVC in terms of coding performance. In addition to good compression performance, device interoperability is essential for a compression codec to be deployed, i.e., encoding and decoding on different CPUs or GPUs should be error-free and with negligible performance reduction. In this paper, we present a method to solve the device interoperability problem of a state-of-the-art image compression network. We implement quantization to entropy networks which output entropy parameters. We suggest a simple method which can ensure cross-platform encoding and decoding, and can be implemented quickly with minor performance deviation, of 0.3% BD-rate, from floating point model results.
Image compression is a fundamental research field, and many well-known compression standards have been developed over many decades. Recently, learned compression methods have developed rapidly with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of the widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We find that accurate entropy models for rate estimation largely affect the optimization of network parameters and thus affect the rate-distortion performance. Therefore, we propose to use discretized Gaussian mixture likelihoods to parameterize the distributions of latent codes, which yields a more accurate and flexible entropy model. In addition, we incorporate recent attention modules into the network architecture to enhance performance. Experimental results demonstrate that our proposed method achieves state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge, our approach is the first work to achieve performance comparable with the latest compression standard, Versatile Video Coding (VVC), regarding PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM. The project page is at https://github.com/ZhengxueCheng/Learned-Image-Compression-with-GMM-and-Attention.
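The discretized Gaussian mixture likelihood mentioned above can be illustrated with a small sketch. It assumes the usual construction in learned compression, where each integer-quantized latent symbol receives the mixture's probability mass over the bin [y - 0.5, y + 0.5]; the function names, the probability floor, and the example parameters are hypothetical and not taken from the paper's code.

```python
import math


def normal_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def discretized_gmm_likelihood(y, weights, means, scales, floor=1e-9):
    """Probability mass of integer symbol y under a Gaussian mixture,
    integrated over the quantization bin [y - 0.5, y + 0.5].
    The floor (an assumption) keeps the later log2 finite."""
    mass = 0.0
    for w, mu, s in zip(weights, means, scales):
        mass += w * (normal_cdf(y + 0.5, mu, s) - normal_cdf(y - 0.5, mu, s))
    return max(mass, floor)


def rate_bits(symbols, weights, means, scales):
    """Estimated code length in bits for a sequence of quantized symbols."""
    return sum(
        -math.log2(discretized_gmm_likelihood(y, weights, means, scales))
        for y in symbols
    )


if __name__ == "__main__":
    # Hypothetical two-component mixture for a single latent element.
    w, mu, sigma = [0.6, 0.4], [0.0, 3.0], [1.0, 2.0]
    print(rate_bits([0, 1, 3, -2], w, mu, sigma))
```

In a real codec the weights, means, and scales would be predicted per latent element by the entropy network; here they are fixed constants purely for illustration.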
We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a way to exploit more structure in the latents than simple fully factorized priors, improving compression performance while maintaining end-to-end optimization. Inspired by the success of autoregressive priors in probabilistic generative models, we examine autoregressive, hierarchical, as well as combined priors as alternatives, weighing their costs and benefits in the context of image compression. While it is well known that autoregressive models come with a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and, together, exploit the probabilistic structure in the latents better than all previous learned models. The combined model yields state-of-the-art rate-distortion performance, providing a 15.8% average reduction in file size over the previous state-of-the-art method based on deep learning, which corresponds to a 59.8% size reduction over JPEG, more than 35% reduction compared to WebP and JPEG2000, and bitstreams 8.4% smaller than BPG, the current state-of-the-art image codec. To the best of our knowledge, our model is the first learning-based method to outperform BPG on both PSNR and MS-SSIM distortion metrics. 32nd Conference on Neural Information Processing Systems (NIPS 2018).
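Recent work has shown a strong theoretical connection between variational autoencoders (VAEs) and rate-distortion theory. Motivated by this, we consider the problem of lossy image compression from the perspective of generative modeling. Starting from ResNet VAEs, which were originally designed for modeling data (image) distributions, we redesign their latent variable model using a quantization-aware posterior and prior, enabling image compression with easy quantization and entropy coding. Together with improved neural network blocks, we present a powerful and efficient class of lossy image coders that outperforms previous methods on natural image (lossy) compression. Our model compresses images in a coarse-to-fine fashion and supports parallel encoding and decoding, leading to fast execution on GPUs.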
It has been witnessed that learned image compression has outperformed conventional image coding techniques and tends to be practical in industrial applications. One of the most critical issues that need to be considered is the non-deterministic calculation, which makes the probability prediction cross-platform inconsistent and frustrates successful decoding. We propose to solve this problem by introducing well-developed post-training quantization and making the model inference integer-arithmetic-only, which is much simpler than presently existing training and fine-tuning based approaches yet still keeps the superior rate-distortion performance of learned image compression. Based on that, we further improve the discretization of the entropy parameters and extend the deterministic inference to fit Gaussian mixture models. With our proposed methods, the current state-of-the-art image compression models can infer in a cross-platform consistent manner, which makes the further development and practice of learned image compression more promising.
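We introduce a video compression algorithm based on instance-adaptive learning. On each video sequence to be transmitted, we finetune a pretrained compression model. The optimal parameters are sent to the receiver along with the latent code. By entropy-coding the parameter updates under a suitable mixture model prior, we ensure that the network parameters can be encoded efficiently. This instance-adaptive compression algorithm is agnostic to the choice of base model and has the potential to improve any neural video codec. On the UVG, HEVC, and Xiph datasets, our codec improves the performance of a low-latency scale-space flow model by 21% to 26% BD-rate savings, and that of a state-of-the-art B-frame model by 17% to 20% BD-rate savings. We also demonstrate that instance-adaptive finetuning improves robustness to domain shift. Finally, our approach reduces the capacity requirements of compression models. We show that it achieves state-of-the-art performance even after reducing the network size by 72%.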
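Learned image compression techniques have developed considerably in recent years. In this paper, we find that the performance bottleneck lies in the use of a single hyperprior decoder, in which case the ternary Gaussian model collapses to a binary one. To solve this problem, we propose to use three hyperprior decoders to separate the decoding of the mixture parameters in discrete Gaussian mixture likelihoods, achieving more accurate parameter estimation. Experimental results show that the proposed method, optimized for MS-SSIM, achieves a 3.36% BD-rate reduction compared with state-of-the-art approaches. The contribution of the proposed method to coding time and FLOPs is negligible.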
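Recently, deep learning-based image compression has made significant progress and has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC, in both subjective metrics and the more challenging objective metrics. However, a major problem is that many leading learned schemes cannot maintain a good trade-off between performance and complexity. In this paper, we propose an efficient and effective image coding framework that achieves R-D performance similar to the state of the art with lower complexity. First, we develop an improved multi-scale residual block (MSRB) that expands the receptive field and obtains global information more easily. It can further capture and reduce the spatial correlation of the latent representation. Second, a more advanced importance map network is introduced to adaptively allocate bits to different regions of the image. Third, we apply a 2D post-quantization filter (PQF) to reduce quantization error, motivated by the sample adaptive offset (SAO) filter in video coding. Moreover, we find that the complexities of the encoder and the decoder affect image compression performance differently. Based on this observation, we design an asymmetric paradigm in which the encoder employs three stages of MSRBs to improve learning capacity, while the decoder needs only one stage of MSRB to yield satisfactory reconstruction, thereby reducing decoding complexity without sacrificing performance. Experimental results show that, compared with the state-of-the-art method, the encoding and decoding of the proposed method are about 17 times faster, while the R-D performance is reduced by less than 1% on both the Kodak and Tecnick datasets. It is still better than H.266/VVC (4:4:4) and other learning-based methods. Our source code is publicly available at https://github.com/fengyurenpingsheng.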
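This work addresses two major issues of end-to-end learned image compression (LIC) based on deep neural networks: variable-rate learning, where separate networks are required to generate compressed images at different qualities, and the train-test mismatch between differentiable approximate quantization and true hard quantization. We introduce an online meta-learning (OML) setting for LIC, which combines ideas from meta learning and online learning in the conditional variational autoencoder (CVAE) framework. By treating the conditional variables as meta parameters and the generated conditional features as meta priors, the desired reconstruction can be controlled by the meta parameters to accommodate compression at variable qualities. The online learning framework is used to update the meta parameters so that the conditional reconstruction is adaptively tuned for the current image. Through the OML mechanism, the meta parameters can be updated effectively with SGD. The conditional reconstruction is based on the quantized latent representation in the decoder network, and therefore helps to bridge the gap between the training estimate and the true quantized latent distribution. Experiments show that our OML approach can be flexibly applied to different state-of-the-art LIC methods to achieve additional performance improvements with little computation and transmission overhead.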
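Convolutional autoencoders are now at the forefront of image compression research. To improve their entropy coding, the encoder output is typically analyzed with a second autoencoder to generate a parameterized prior probability distribution for each variable. We instead propose a compression scheme that uses a single convolutional autoencoder and multiple learned prior distributions that act as competing experts. The trained prior distributions are stored in a static table of cumulative distribution functions. During inference, this table is used by the entropy coder as a lookup table to determine the best prior for each spatial location. Our method offers rate-distortion performance comparable to that obtained with a predicted parameterized prior, at only a fraction of its entropy coding and decoding complexity.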
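The lookup-table selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: each prior is stored as one CDF list per channel over a small hypothetical alphabet {-1, 0, 1}, the best prior for a spatial location is the one with the smallest ideal code length, and the chosen index would be signaled as side information. The table contents and function names are made up for the example and do not come from the paper.

```python
import math

# Hypothetical static table: two priors, each with one CDF per channel.
# A CDF over symbols {-1, 0, 1} is stored as [0, P(<=-1), P(<=0), 1].
CDF_TABLE = [
    [[0.0, 0.1, 0.9, 1.0], [0.0, 0.1, 0.9, 1.0]],  # prior 0: peaked at zero
    [[0.0, 0.4, 0.6, 1.0], [0.0, 0.4, 0.6, 1.0]],  # prior 1: favors +/-1
]
OFFSET = 1  # maps symbol -1 to table index 0


def code_length_bits(symbols, prior, offset=OFFSET):
    """Ideal code length of one spatial location's channel vector under a prior."""
    bits = 0.0
    for channel, s in enumerate(symbols):
        cdf = prior[channel]
        p = cdf[s + offset + 1] - cdf[s + offset]  # probability mass from CDF
        bits += -math.log2(p)
    return bits


def select_best_prior(symbols, cdf_table=CDF_TABLE):
    """Return (index, bits) of the prior that codes this location most cheaply."""
    costs = [code_length_bits(symbols, prior) for prior in cdf_table]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


if __name__ == "__main__":
    for location in ([0, 0], [1, -1]):
        print(location, select_best_prior(location))
```

Because the table is static, the decoder only needs the per-location index to recover the same CDFs, which is where the complexity saving over a predicted parameterized prior would come from.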
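Lossless and near-lossless image compression is of paramount importance to professional users in many technical fields, such as medicine, remote sensing, precision engineering, and scientific research. However, despite rapidly growing research interest in learning-based image compression, no published method offers both lossless and near-lossless modes. In this paper, we propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression. In the lossless mode, the DLPR coding system first performs lossy compression and then lossless coding of the residuals. We solve the joint lossy and residual compression problem with a VAE approach, and add autoregressive context modeling of the residuals to enhance lossless compression performance. In the near-lossless mode, we quantize the original residuals to satisfy a given $\ell_\infty$ error bound, and propose a scalable near-lossless compression scheme that works for variable $\ell_\infty$ bounds instead of training multiple networks. To speed up DLPR coding, we increase the degree of algorithm parallelization through a novel design of the coding context, and accelerate entropy coding with an adaptive residual interval. Experimental results show that the DLPR coding system achieves state-of-the-art lossless and near-lossless image compression performance with competitive coding speed.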
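To make the near-lossless mode concrete, the sketch below quantizes integer residuals so that the reconstruction error never exceeds a bound tau. It uses the standard uniform quantizer with bin width 2*tau + 1 that is common in near-lossless coding; that the DLPR system uses exactly this quantizer is an assumption, and the function names are hypothetical.

```python
def quantize_residual(r, tau):
    """Uniform residual quantizer with bin width 2*tau + 1.
    For integer r it guarantees |r - r_hat| <= tau; tau = 0 is lossless."""
    step = 2 * tau + 1
    sign = 1 if r >= 0 else -1
    return sign * step * ((abs(r) + tau) // step)


def near_lossless_reconstruct(pixels, lossy_pixels, tau):
    """Reconstruct pixels as lossy reconstruction plus quantized residual."""
    out = []
    for x, x_lossy in zip(pixels, lossy_pixels):
        r_hat = quantize_residual(x - x_lossy, tau)
        out.append(x_lossy + r_hat)
    return out


if __name__ == "__main__":
    original = [12, 200, 37, 90]
    lossy = [10, 205, 37, 84]  # hypothetical output of the lossy stage
    recon = near_lossless_reconstruct(original, lossy, tau=2)
    print(recon, max(abs(a - b) for a, b in zip(original, recon)))
```

A larger tau shrinks the set of residual values that must be entropy coded, which is how the bitrate drops as the error bound is relaxed.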
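In this paper, we propose a new class of highly efficient deep joint source-channel coding methods that can closely adapt to the source distribution under a nonlinear transform, which we group under the name nonlinear transform source-channel coding (NTSCC). In the considered model, the transmitter first learns a nonlinear analysis transform to map the source data into a latent space, and then transmits the latent representation to the receiver via deep joint source-channel coding. Our model incorporates the nonlinear transform as a strong prior to effectively extract the source semantic features and provide side information for source-channel coding. Unlike existing conventional deep joint source-channel coding methods, the proposed NTSCC essentially learns both the source latent representation and an entropy model that serves as the prior on the latent representation. Accordingly, new adaptive rate transmission and hyperprior-aided codec refinement mechanisms are developed to upgrade deep joint source-channel coding. The whole system design is formulated as an optimization problem whose goal is to minimize the end-to-end transmission rate-distortion cost under established perceptual quality metrics. On both simple example sources and test image sources, we find that the proposed NTSCC transmission method generally outperforms both analog transmission using standard deep joint source-channel coding and classical separation-based digital transmission. Notably, owing to its strong content-aware ability, the proposed NTSCC method could potentially support future semantic communications.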
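The application of context-adaptive entropy models significantly improves rate-distortion (R-D) performance; in these models, hyperpriors and autoregressive models are jointly used to effectively capture the spatial redundancy of the latent representation. However, the latent representation still contains some spatial correlation. In addition, methods based on context-adaptive entropy models cannot be accelerated during decoding by parallel computing devices such as FPGAs or GPUs. To alleviate these limitations, we propose a learned multi-resolution image compression framework that exploits the recently developed octave convolutions to factorize the latent representation into high-resolution (HR) and low-resolution (LR) parts, similar to a wavelet transform, which further improves R-D performance. To speed up decoding, our scheme does not use a context-adaptive entropy model. Instead, we use an additional hyper layer, consisting of a hyper encoder and a hyper decoder, to further remove the spatial redundancy of the latent representation. Moreover, cross-resolution parameter estimation (CRPE) is introduced into the proposed framework to enhance the flow of information and further improve rate-distortion performance. An additional information loss term is added to the total loss function to adjust the contribution of the LR part to the final bitstream. Experimental results show that our method reduces decoding time by approximately 73.35% and 93.44%, respectively, compared with state-of-the-art learned image compression methods, while its R-D performance is still better than H.266/VVC (4:2:0) and some learning-based methods on both PSNR and MS-SSIM metrics.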
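We propose a method to compress full-resolution video sequences with implicit neural representations. Each frame is represented as a neural network that maps coordinate positions to pixel values. We use a separate implicit network to modulate the coordinate inputs, which enables efficient motion compensation between frames. Together with a small residual network, this allows us to efficiently compress P-frames relative to the previous frame. We further lower the bitrate by storing the network weights with learned integer quantization. Our method, which we call implicit pixel flow (IPF), offers several simplifications over established neural video codecs: it does not require the receiver to have access to a pretrained neural network, does not use expensive interpolation-based warping operations, and does not require a separate training dataset. We demonstrate the feasibility of neural implicit compression on image and video data.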
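We present a novel deep neural network (DNN) architecture for compressing an image when a correlated image is available as side information only at the decoder. This problem is known as distributed source coding (DSC) in information theory. In particular, we consider a pair of stereo images, which are usually highly correlated with each other due to their overlapping fields of view, and assume that one image of the pair is to be compressed and transmitted while the other is available only at the decoder. In the proposed architecture, the encoder maps the input image to a latent space, quantizes the latent representation, and compresses it using entropy coding. The decoder is trained to extract the common information between the input image and the correlated image using only the latter. The received latent representation and the locally generated common information are passed through a decoder network to obtain an enhanced reconstruction of the input image. The common information provides a succinct representation of the relevant information at the receiver. We train the proposed approach on stereo image pairs and demonstrate its effectiveness. Our results show that the proposed architecture is able to exploit decoder-only side information and outperforms previous work on stereo image compression with decoder-side information.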
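With the development of deep learning techniques, the combination of deep learning and image compression has drawn much attention. Recently, learned image compression methods have surpassed their classical counterparts in terms of rate-distortion performance. However, continuous rate adaptation remains an open question. Some learned image compression methods use multiple networks for multiple rates, while others use a single model at the expense of increased computational complexity and degraded performance. In this paper, we propose a continuously rate-adjustable learned image compression framework, the Asymmetric Gained Variational Autoencoder (AG-VAE). AG-VAE uses a pair of gain units to achieve discrete rate adaptation in a single model with negligible additional computation. Then, by using exponential interpolation, continuous rate adaptation is achieved without compromising performance. In addition, we propose an asymmetric Gaussian entropy model for more accurate entropy estimation. Exhaustive experiments show that our method achieves quantitative performance comparable to SOTA learned image compression methods and better qualitative performance than classical image codecs. In the ablation study, we confirm the usefulness and superiority of the gain units and the asymmetric Gaussian entropy model.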
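The gain-unit idea with exponential interpolation can be sketched in a few lines. The sketch assumes that each discrete rate level owns a positive per-channel gain vector applied to the latent before rounding, with a matching inverse-gain vector applied after decoding, and that intermediate rates are obtained by geometric (exponential) interpolation between two adjacent levels. The vectors, function names, and values below are hypothetical and only illustrate the mechanism.

```python
def interpolate_gain(gain_a, gain_b, l):
    """Exponential interpolation between two positive gain vectors.
    l = 1 returns gain_a, l = 0 returns gain_b, values in between blend them."""
    return [(a ** l) * (b ** (1.0 - l)) for a, b in zip(gain_a, gain_b)]


def apply_gain_and_quantize(latent, gain):
    """Scale each latent channel by its gain, then round to integers.
    Note: Python's round() rounds halves to the nearest even integer."""
    return [round(y * g) for y, g in zip(latent, gain)]


def apply_inverse_gain(quantized, inverse_gain):
    """Undo the channel scaling on the decoder side."""
    return [q * g for q, g in zip(quantized, inverse_gain)]


if __name__ == "__main__":
    # Hypothetical gain vectors for two adjacent rate levels (2 channels).
    low_rate, high_rate = [1.0, 2.0], [4.0, 8.0]
    latent = [0.37, -1.42]
    for l in (1.0, 0.5, 0.0):
        gain = interpolate_gain(low_rate, high_rate, l)
        inverse = [1.0 / g for g in gain]  # simplification: exact inverse
        q = apply_gain_and_quantize(latent, gain)
        print(l, gain, q, apply_inverse_gain(q, inverse))
```

Larger gains make the rounding relatively finer, so more bits are spent and the reconstruction is closer to the original latent; in a trained model the inverse-gain vectors would be learned rather than computed as exact reciprocals.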
In recent years, neural image compression (NIC) algorithms have shown powerful coding performance. However, most of them are not adaptive to the image content. Although several content adaptive methods have been proposed by updating the encoder-side components, the adaptability of both latents and the decoder is not well exploited. In this work, we propose a new NIC framework that improves the content adaptability on both latents and the decoder. Specifically, to remove redundancy in the latents, our content adaptive channel dropping (CACD) method automatically selects the optimal quality levels for the latents spatially and drops the redundant channels. Additionally, we propose the content adaptive feature transformation (CAFT) method to improve decoder-side content adaptability by extracting the characteristic information of the image content, which is then used to transform the features in the decoder side. Experimental results demonstrate that our proposed methods with the encoder-side updating algorithm achieve the state-of-the-art performance.