压缩在通过限制系统(例如流媒体服务,虚拟现实或视频游戏)等系统的有效传输和存储图像和视频中起着重要作用。但是,不可避免地会导致伪影和原始信息的丢失,这可能会严重降低视觉质量。由于这些原因,压缩图像的质量增强已成为流行的研究主题。尽管大多数最先进的图像恢复方法基于卷积神经网络,但基于Swinir等其他基于变压器的方法在这些任务上表现出令人印象深刻的性能。在本文中,我们探索了新型的Swin Transformer V2,以改善图像超分辨率的Swinir,尤其是压缩输入方案。使用这种方法,我们可以解决训练变压器视觉模型中的主要问题,例如训练不稳定性,预训练和微调之间的分辨率差距以及数据饥饿。我们对三个代表性任务进行实验:JPEG压缩伪像去除,图像超分辨率(经典和轻巧)以及压缩的图像超分辨率。实验结果表明,我们的方法SWIN2SR可以改善SWINIR的训练收敛性和性能,并且是“ AIM 2022挑战压缩图像和视频的超分辨率”的前5个解决方案。
translated by 谷歌翻译
Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from lowquality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14∼0.45dB, while the total number of parameters can be reduced by up to 67%.
translated by 谷歌翻译
近年来,压缩图像超分辨率已引起了极大的关注,其中图像被压缩伪像和低分辨率伪影降解。由于复杂的杂化扭曲变形,因此很难通过简单的超分辨率和压缩伪像消除掉的简单合作来恢复扭曲的图像。在本文中,我们向前迈出了一步,提出了层次的SWIN变压器(HST)网络,以恢复低分辨率压缩图像,该图像共同捕获分层特征表示并分别用SWIN Transformer增强每个尺度表示。此外,我们发现具有超分辨率(SR)任务的预处理对于压缩图像超分辨率至关重要。为了探索不同的SR预审查的影响,我们将常用的SR任务(例如,比科比奇和不同的实际超分辨率仿真)作为我们的预处理任务,并揭示了SR在压缩的图像超分辨率中起不可替代的作用。随着HST和预训练的合作,我们的HST在AIM 2022挑战中获得了低质量压缩图像超分辨率轨道的第五名,PSNR为23.51db。广泛的实验和消融研究已经验证了我们提出的方法的有效性。
translated by 谷歌翻译
基于变压器的方法与基于CNN的方法相比,由于其对远程依赖性的模型,因此获得了令人印象深刻的图像恢复性能。但是,像Swinir这样的进步采用了基于窗口的和本地注意力的策略来平衡性能和计算开销,这限制了采用大型接收领域来捕获全球信息并在早期层中建立长期依赖性。为了进一步提高捕获全球信息的效率,在这项工作中,我们建议Swinfir通过更换具有整个图像范围的接收场的快速傅立叶卷积(FFC)组件来扩展Swinir。我们还重新访问其他先进技术,即数据增强,预训练和功能集合,以改善图像重建的效果。并且我们的功能合奏方法使模型的性能得以大大增强,而无需增加训练和测试时间。与现有方法相比,我们将算法应用于多个流行的大规模基准,并实现了最先进的性能。例如,我们的Swinfir在漫画109数据集上达到了32.83 dB的PSNR,该PSNR比最先进的Swinir方法高0.8 dB。
translated by 谷歌翻译
Recently, Transformer-based image restoration networks have achieved promising improvements over convolutional neural networks due to parameter-independent global interactions. To lower computational cost, existing works generally limit self-attention computation within non-overlapping windows. However, each group of tokens are always from a dense area of the image. This is considered as a dense attention strategy since the interactions of tokens are restrained in dense regions. Obviously, this strategy could result in restricted receptive fields. To address this issue, we propose Attention Retractable Transformer (ART) for image restoration, which presents both dense and sparse attention modules in the network. The sparse attention module allows tokens from sparse areas to interact and thus provides a wider receptive field. Furthermore, the alternating application of dense and sparse attention modules greatly enhances representation ability of Transformer while providing retractable attention on the input image.We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks. Experimental results validate that our proposed ART outperforms state-of-the-art methods on various benchmark datasets both quantitatively and visually. We also provide code and models at the website https://github.com/gladzhang/ART.
translated by 谷歌翻译
盲面修复(BFR)旨在从相应的低质量(LQ)输入中构建高质量(HQ)面部图像。最近,已经提出了许多BFR方法,并取得了杰出的成功。但是,这些方法经过私人合成的数据集进行了培训或评估,这使得与后续方法相比的方法是不可行的。为了解决这个问题,我们首先合成两个称为EDFEACE-CELEB-1M(BFR128)和EDFACE-CELEB-150K(BFR512)的盲面恢复基准数据集。在五个设置下,将最先进的方法在它们的五个设置下进行了基准测试,包括模糊,噪声,低分辨率,JPEG压缩伪像及其组合(完全退化)。为了使比较更全面,应用了五个广泛使用的定量指标和两个任务驱动的指标,包括平均面部标志距离(AFLD)和平均面部ID余弦相似性(AFICS)。此外,我们开发了一个有效的基线模型,称为Swin Transformer U-NET(昏迷)。带有U-NET体系结构的昏迷器应用了注意机制和移动的窗口方案,以捕获远程像素相互作用,并更多地关注重要功能,同时仍受到有效训练。实验结果表明,所提出的基线方法对各种BFR任务的SOTA方法表现出色。
translated by 谷歌翻译
现实世界图像Denoising是一个实用的图像恢复问题,旨在从野外嘈杂的输入中获取干净的图像。最近,Vision Transformer(VIT)表现出强大的捕获远程依赖性的能力,许多研究人员试图将VIT应用于图像DeNosing任务。但是,现实世界的图像是一个孤立的框架,它使VIT构建了内部贴片的远程依赖性,该依赖性将图像分为贴片并混乱噪声模式和梯度连续性。在本文中,我们建议通过使用连续的小波滑动转换器来解决此问题,该小波滑动转换器在现实世界中构建频率对应关系,称为dnswin。具体而言,我们首先使用CNN编码器从嘈杂的输入图像中提取底部功能。 DNSWIN的关键是将高频和低频信息与功能和构建频率依赖性分开。为此,我们提出了小波滑动窗口变压器,该变压器利用离散的小波变换,自我注意力和逆离散小波变换来提取深度特征。最后,我们使用CNN解码器将深度特征重建为DeNo的图像。对现实世界的基准测试的定量和定性评估都表明,拟议的DNSWIN对最新方法的表现良好。
translated by 谷歌翻译
Face Restoration (FR) aims to restore High-Quality (HQ) faces from Low-Quality (LQ) input images, which is a domain-specific image restoration problem in the low-level computer vision area. The early face restoration methods mainly use statistic priors and degradation models, which are difficult to meet the requirements of real-world applications in practice. In recent years, face restoration has witnessed great progress after stepping into the deep learning era. However, there are few works to study deep learning-based face restoration methods systematically. Thus, this paper comprehensively surveys recent advances in deep learning techniques for face restoration. Specifically, we first summarize different problem formulations and analyze the characteristic of the face image. Second, we discuss the challenges of face restoration. Concerning these challenges, we present a comprehensive review of existing FR methods, including prior based methods and deep learning-based methods. Then, we explore developed techniques in the task of FR covering network architectures, loss functions, and benchmark datasets. We also conduct a systematic benchmark evaluation on representative methods. Finally, we discuss future directions, including network designs, metrics, benchmark datasets, applications,etc. We also provide an open-source repository for all the discussed methods, which is available at https://github.com/TaoWangzj/Awesome-Face-Restoration.
translated by 谷歌翻译
本文回顾了AIM 2022上压缩图像和视频超级分辨率的挑战。这项挑战包括两条曲目。轨道1的目标是压缩图像的超分辨率,轨迹〜2靶向压缩视频的超分辨率。在轨道1中,我们使用流行的数据集DIV2K作为培训,验证和测试集。在轨道2中,我们提出了LDV 3.0数据集,其中包含365个视频,包括LDV 2.0数据集(335个视频)和30个其他视频。在这一挑战中,有12支球队和2支球队分别提交了赛道1和赛道2的最终结果。所提出的方法和解决方案衡量了压缩图像和视频上超分辨率的最先进。提出的LDV 3.0数据集可在https://github.com/renyang-home/ldv_dataset上找到。此挑战的首页是在https://github.com/renyang-home/aim22_compresssr。
translated by 谷歌翻译
Informative features play a crucial role in the single image super-resolution task. Channel attention has been demonstrated to be effective for preserving information-rich features in each layer. However, channel attention treats each convolution layer as a separate process that misses the correlation among different layers. To address this problem, we propose a new holistic attention network (HAN), which consists of a layer attention module (LAM) and a channel-spatial attention module (CSAM), to model the holistic interdependencies among layers, channels, and positions. Specifically, the proposed LAM adaptively emphasizes hierarchical features by considering correlations among layers. Meanwhile, CSAM learns the confidence at all the positions of each channel to selectively capture more informative features. Extensive experiments demonstrate that the proposed HAN performs favorably against the state-ofthe-art single image super-resolution approaches.
translated by 谷歌翻译
通过利用大型内核分解和注意机制,卷积神经网络(CNN)可以在许多高级计算机视觉任务中与基于变压器的方法竞争。但是,由于远程建模的优势,具有自我注意力的变压器仍然主导着低级视野,包括超分辨率任务。在本文中,我们提出了一个基于CNN的多尺度注意网络(MAN),该网络由多尺度的大内核注意力(MLKA)和一个封闭式的空间注意单元(GSAU)组成,以提高卷积SR网络的性能。在我们的MLKA中,我们使用多尺度和栅极方案纠正LKA,以在各种粒度水平上获得丰富的注意图,从而共同汇总了全局和局部信息,并避免了潜在的阻塞伪像。在GSAU中,我们集成了栅极机制和空间注意力,以消除不必要的线性层和汇总信息丰富的空间环境。为了确认我们的设计的有效性,我们通过简单地堆叠不同数量的MLKA和GSAU来评估具有多种复杂性的人。实验结果表明,我们的人可以在最先进的绩效和计算之间实现各种权衡。代码可从https://github.com/icandle/man获得。
translated by 谷歌翻译
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames with known degradation processes. Despite significant progress, grand challenges are remained to effectively extract and transmit high-quality textures from high-degraded low-quality sequences, such as blur, additive noises, and compression artifacts. In this work, a novel Frequency-Transformer (FTVSR) is proposed for handling low-quality videos that carry out self-attention in a combined space-time-frequency domain. First, video frames are split into patches and each patch is transformed into spectral maps in which each channel represents a frequency band. It permits a fine-grained self-attention on each frequency band, so that real visual texture can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture the global frequency relations and local frequency relations, which can handle different complicated degradation processes in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and discover that a ``divided attention'' which conducts a joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely-used VSR datasets show that FTVSR outperforms state-of-the-art methods on different low-quality videos with clear visual margins. Code and pre-trained models are available at https://github.com/researchmm/FTVSR.
translated by 谷歌翻译
压缩视频超分辨率(VSR)旨在从压缩的低分辨率对应物中恢复高分辨率帧。最近的VSR方法通常通过借用相邻视频帧的相关纹理来增强输入框架。尽管已经取得了一些进展,但是从压缩视频中有效提取和转移高质量纹理的巨大挑战,这些视频通常会高度退化。在本文中,我们提出了一种用于压缩视频超分辨率(FTVSR)的新型频率转换器,该频率在联合时空频域中进行自我注意。首先,我们将视频框架分为斑块,然后将每个贴片转换为DCT光谱图,每个通道代表频带。这样的设计使每个频带都可以进行细粒度的自我注意力,因此可以将真实的视觉纹理与伪影区分开,并进一步用于视频框架修复。其次,我们研究了不同的自我发场方案,并发现在对每个频带上应用暂时关注之前,会引起关节空间的注意力,从而带来最佳的视频增强质量。两个广泛使用的视频超分辨率基准的实验结果表明,FTVSR在未压缩和压缩视频的最先进的方法中都具有清晰的视觉边距。代码可在https://github.com/researchmm/ftvsr上找到。
translated by 谷歌翻译
Real-world image super-resolution (RISR) has received increased focus for improving the quality of SR images under unknown complex degradation. Existing methods rely on the heavy SR models to enhance low-resolution (LR) images of different degradation levels, which significantly restricts their practical deployments on resource-limited devices. In this paper, we propose a novel Dynamic Channel Splitting scheme for efficient Real-world Image Super-Resolution, termed DCS-RISR. Specifically, we first introduce the light degradation prediction network to regress the degradation vector to simulate the real-world degradations, upon which the channel splitting vector is generated as the input for an efficient SR model. Then, a learnable octave convolution block is proposed to adaptively decide the channel splitting scale for low- and high-frequency features at each block, reducing computation overhead and memory cost by offering the large scale to low-frequency features and the small scale to the high ones. To further improve the RISR performance, Non-local regularization is employed to supplement the knowledge of patches from LR and HR subspace with free-computation inference. Extensive experiments demonstrate the effectiveness of DCS-RISR on different benchmark datasets. Our DCS-RISR not only achieves the best trade-off between computation/parameter and PSNR/SSIM metric, and also effectively handles real-world images with different degradation levels.
translated by 谷歌翻译
Recent research on super-resolution has progressed with the development of deep convolutional neural networks (DCNN). In particular, residual learning techniques exhibit improved performance. In this paper, we develop an enhanced deep super-resolution network (EDSR) with performance exceeding those of current state-of-the-art SR methods. The significant performance improvement of our model is due to optimization by removing unnecessary modules in conventional residual networks. The performance is further improved by expanding the model size while we stabilize the training procedure. We also propose a new multi-scale deep super-resolution system (MDSR) and training method, which can reconstruct high-resolution images of different upscaling factors in a single model. The proposed methods show superior performance over the state-of-the-art methods on benchmark datasets and prove its excellence by winning the NTIRE2017 Super-Resolution Challenge [26].
translated by 谷歌翻译
视频修复(例如,视频超分辨率)旨在从低品质框架中恢复高质量的帧。与单图像恢复不同,视频修复通常需要从多个相邻但通常未对准视频帧的时间信息。现有的深度方法通常通过利用滑动窗口策略或经常性体系结构来解决此问题,该策略要么受逐帧恢复的限制,要么缺乏远程建模能力。在本文中,我们提出了一个带有平行框架预测和远程时间依赖性建模能力的视频恢复变压器(VRT)。更具体地说,VRT由多个量表组成,每个量表由两种模块组成:时间相互注意(TMSA)和平行翘曲。 TMSA将视频分为小剪辑,将相互关注用于关节运动估计,特征对齐和特征融合,而自我注意力则用于特征提取。为了启用交叉交互,视频序列对其他每一层都发生了变化。此外,通过并行功能翘曲,并行翘曲用于进一步从相邻帧中融合信息。有关五项任务的实验结果,包括视频超分辨率,视频脱张,视频denoising,视频框架插值和时空视频超级分辨率,证明VRT优于大幅度的最先进方法($ \ textbf) {最高2.16db} $)在十四个基准数据集上。
translated by 谷歌翻译
在立体声设置下,可以通过利用第二视图提供的其他信息来进一步改善图像JPEG伪像删除的性能。但是,将此信息纳入立体声图像jpeg trifacts删除是一个巨大的挑战,因为现有的压缩工件使像素级视图对齐变得困难。在本文中,我们提出了一个新颖的视差变压器网络(PTNET),以整合来自立体图像对的立体图像对jpeg jpeg trifacts删除的信息。具体而言,提出了精心设计的对称性双向视差变压器模块,以匹配具有不同视图之间相似纹理的特征,而不是像素级视图对齐。由于遮挡和边界的问题,提出了一个基于置信的跨视图融合模块,以实现两种视图的更好的特征融合,其中跨视图特征通过置信图加权。尤其是,我们为跨视图的互动采用粗到最新的设计,从而提高性能。全面的实验结果表明,与其他测试最新方法相比,我们的PTNET可以有效地消除压缩伪像并获得更高的性能。
translated by 谷歌翻译
视频修复旨在从多个低质量框架中恢复多个高质量的帧。现有的视频修复方法通常属于两种极端情况,即它们并行恢复所有帧,或者以复发方式恢复视频框架,这将导致不同的优点和缺点。通常,前者具有时间信息融合的优势。但是,它遭受了较大的模型尺寸和密集的内存消耗;后者的模型大小相对较小,因为它在跨帧中共享参数。但是,它缺乏远程依赖建模能力和并行性。在本文中,我们试图通过提出经常性视频恢复变压器(即RVRT)来整合两种情况的优势。 RVRT在全球经常性的框架内并行处理本地相邻框架,该框架可以在模型大小,有效性和效率之间实现良好的权衡。具体而言,RVRT将视频分为多个剪辑,并使用先前推断的剪辑功能来估计后续剪辑功能。在每个剪辑中,通过隐式特征聚合共同更新不同的帧功能。在不同的剪辑中,引导的变形注意力是为剪辑对齐对齐的,该剪辑对齐可预测整个推断的夹子中的多个相关位置,并通过注意机制汇总其特征。关于视频超分辨率,DeBlurring和DeNoising的广泛实验表明,所提出的RVRT在具有平衡模型大小,测试内存和运行时的基准数据集上实现了最先进的性能。
translated by 谷歌翻译
Recently, great progress has been made in single-image super-resolution (SISR) based on deep learning technology. However, the existing methods usually require a large computational cost. Meanwhile, the activation function will cause some features of the intermediate layer to be lost. Therefore, it is a challenge to make the model lightweight while reducing the impact of intermediate feature loss on the reconstruction quality. In this paper, we propose a Feature Interaction Weighted Hybrid Network (FIWHN) to alleviate the above problem. Specifically, FIWHN consists of a series of novel Wide-residual Distillation Interaction Blocks (WDIB) as the backbone, where every third WDIBs form a Feature shuffle Weighted Group (FSWG) by mutual information mixing and fusion. In addition, to mitigate the adverse effects of intermediate feature loss on the reconstruction results, we introduced a well-designed Wide Convolutional Residual Weighting (WCRW) and Wide Identical Residual Weighting (WIRW) units in WDIB, and effectively cross-fused features of different finenesses through a Wide-residual Distillation Connection (WRDC) framework and a Self-Calibrating Fusion (SCF) unit. Finally, to complement the global features lacking in the CNN model, we introduced the Transformer into our model and explored a new way of combining the CNN and Transformer. Extensive quantitative and qualitative experiments on low-level and high-level tasks show that our proposed FIWHN can achieve a good balance between performance and efficiency, and is more conducive to downstream tasks to solve problems in low-pixel scenarios.
translated by 谷歌翻译
在本文中,我们呈现了UFFORER,一种用于图像恢复的有效和高效的变换器架构,其中我们使用变压器块构建分层编码器解码器网络。在UFFAR中,有两个核心设计。首先,我们介绍了一个新颖的本地增强型窗口(Lewin)变压器块,其执行基于窗口的自我关注而不是全局自我关注。它显着降低了高分辨率特征映射的计算复杂性,同时捕获本地上下文。其次,我们提出了一种以多尺度空间偏置的形式提出了一种学习的多尺度恢复调制器,以调整UFFORER解码器的多个层中的特征。我们的调制器展示了卓越的能力,用于恢复各种图像恢复任务的详细信息,同时引入边缘额外参数和计算成本。通过这两个设计提供支持,UFFORER享有高能力,可以捕获本地和全局依赖性的图像恢复。为了评估我们的方法,在几种图像恢复任务中进行了广泛的实验,包括图像去噪,运动脱棕,散焦和污染物。没有钟声和口哨,与最先进的算法相比,我们的UFormer实现了卓越的性能或相当的性能。代码和模型可在https://github.com/zhendongwang6/uformer中找到。
translated by 谷歌翻译