Image-based virtual try-on aims to replace the clothing in a person image with a garment image (in-shop clothes), and it has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the character of clothing images; however, occlusion remains a pernicious obstacle to realistic virtual try-on. In this work, we first present a comprehensive analysis of the occlusions and categorize them into two types: i) Inherent-Occlusion: the ghost of the former clothing still exists in the try-on image; ii) Acquired-Occlusion: the target clothing is warped onto an unreasonable body part. Based on this analysis, we find that the occlusions can be simulated by a novel semantically-guided mixup module, which generates semantic-specific occluded images that work together with the try-on images to facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first conducts a sharpened semantic parsing on the try-on person. Aided by semantic guidance and a pose prior, textures of varying complexity are selectively blended with human parts in a copy-and-paste manner. Then, the Generative Module (GM) synthesizes the final try-on image and jointly learns to de-occlude. Compared with state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.
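The copy-and-paste blending guided by a semantic parsing map, as described in the abstract above, can be illustrated with a minimal sketch. The abstract does not give the blending rule, so the function name `semantic_mixup`, the grayscale list-of-lists image format, the label values, and the `alpha` blend weight are all assumptions made for illustration; this is not the DOC-VTON implementation.

```python
def semantic_mixup(person, texture, parsing, part_label, alpha=1.0):
    """Blend `texture` into `person` only where `parsing == part_label`.

    person, texture: H x W grids of grayscale values (lists of lists).
    parsing: H x W grid of integer semantic labels (hypothetical label ids).
    alpha: weight of the pasted texture (1.0 is a plain copy-and-paste).
    """
    out = []
    for prow, trow, lrow in zip(person, texture, parsing):
        out.append([
            # Inside the selected body part: mix texture and person pixel.
            # Everywhere else: keep the person pixel unchanged.
            alpha * t + (1.0 - alpha) * p if lab == part_label else p
            for p, t, lab in zip(prow, trow, lrow)
        ])
    return out


# Illustrative 2 x 2 example: label 1 marks a hypothetical "arm" region.
person = [[10, 10], [10, 10]]
texture = [[200, 200], [200, 200]]
parsing = [[0, 1], [1, 0]]
occluded = semantic_mixup(person, texture, parsing, part_label=1)
```

Pairing such simulated occluded images with the corresponding clean try-on images is one plausible way to give a generator a de-occlusion training signal, as the abstract suggests.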
translated by 谷歌翻译
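Unpaired image-to-image translation aims to find a mapping between a source domain and a target domain. To alleviate the lack of supervised labels for the source images, cycle-consistency-based methods have been proposed to preserve image structure by assuming a reversible relationship between unpaired images. However, this assumption exploits only limited correspondence between image pairs. Recently, contrastive learning (CL) has been used to further investigate image correspondence in unpaired image translation through patch-based positive/negative learning. Patch-based contrastive routines obtain the positives by self-similarity computation and treat the remaining patches as negatives. This flexible learning paradigm obtains auxiliary contextualized information at low cost. Because the negatives are so numerous, we investigate one question: are all negatives necessary for contrastive learning? Unlike previous CL approaches, in this paper we study the negatives from an information-theoretic perspective and introduce a new negative Pruning technique for Unpaired image-to-image Translation (PUT) that sparsifies and ranks the patches. The proposed algorithm is efficient and flexible, and it enables the model to stably learn the essential information between corresponding patches. By putting quality over quantity, only a few negative patches are required to achieve better results. Finally, we validate the superiority, stability, and versatility of our model through comparative experiments.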
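The idea of ranking negative patches and keeping only a few can be sketched with a small InfoNCE-style loss. The abstract does not state the ranking criterion, so ranking by cosine similarity to the query (keeping the hardest negatives) is an assumption here, as are the function name `pruned_info_nce`, the `keep` parameter, and the temperature value.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def pruned_info_nce(query, positive, negatives, keep=2, tau=0.07):
    """InfoNCE loss computed against only the `keep` top-ranked negatives.

    Assumption: negatives are ranked by cosine similarity to the query,
    and the most similar ones are retained.
    """
    q = normalize(query)
    pos_sim = dot(q, normalize(positive))
    neg_sims = sorted((dot(q, normalize(n)) for n in negatives), reverse=True)
    kept = neg_sims[:keep]  # sparsify: drop the low-ranked negatives
    logits = [pos_sim / tau] + [s / tau for s in kept]
    # Numerically stable log-sum-exp over positive and kept negatives.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0], kept


query = [1.0, 0.0]
positive = [1.0, 0.1]
negatives = [[0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.5, 1.0]]
loss, kept = pruned_info_nce(query, positive, negatives, keep=2)
```

With `keep` equal to the number of negatives, the function reduces to the usual patch-wise InfoNCE loss, which makes it easy to compare pruned and unpruned settings.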
Face forgery detection plays an important role in personal privacy and social security. With the development of adversarial generative models, high-quality forged images are becoming increasingly indistinguishable from real ones to humans. Existing methods typically treat forgery detection as a common binary or multi-label classification task, and they neglect diverse multi-modality forgery image types, e.g., visible light spectrum and near-infrared scenarios. In this paper, we propose a novel Hierarchical Forgery Classifier for Multi-modality Face Forgery Detection (HFC-MFFD), which effectively learns a robust patch-based hybrid domain representation to enhance forgery authentication in multi-modality scenarios. The local spatial hybrid domain feature module is designed to explore strongly discriminative forgery clues in both the image and frequency domains within distinct local face regions. Furthermore, a dedicated hierarchical face forgery classifier is proposed to alleviate the class imbalance problem and further boost detection performance. Experimental results on representative multi-modality face forgery datasets demonstrate the superior performance of the proposed HFC-MFFD compared with state-of-the-art algorithms. The source code and models are publicly available at
Controller design for bipedal walking on dynamic rigid surfaces (DRSes), which are rigid surfaces moving in the inertial frame (e.g., ships and airplanes), remains largely uninvestigated. This paper introduces a hierarchical control approach that achieves stable underactuated bipedal robot walking on a horizontally oscillating DRS. The highest layer of our approach is a real-time motion planner that generates desired global behaviors (i.e., the center of mass trajectories and footstep locations) by stabilizing a reduced-order robot model. One key novelty of this layer is the derivation of the reduced-order model by analytically extending the angular momentum based linear inverted pendulum (ALIP) model from stationary to horizontally moving surfaces. The other novelty is the development of a discrete-time foot-placement controller that exponentially stabilizes the hybrid, linear, time-varying ALIP model. The middle layer of the proposed approach is a walking pattern generator that translates the desired global behaviors into the robot's full-body reference trajectories for all directly actuated degrees of freedom. The lowest layer is an input-output linearizing controller that exponentially tracks those full-body reference trajectories based on the full-order, hybrid, nonlinear robot dynamics. Simulations of planar underactuated bipedal walking on a swaying DRS confirm that the proposed framework ensures walking stability under different DRS motions and gait types.
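For orientation, the stationary-surface ALIP model that the abstract takes as its starting point can be sketched as follows. This covers only the commonly used planar form on a fixed surface, with state (x, L) where x is the horizontal center-of-mass position relative to the contact point and L is the angular momentum about the contact point; the paper's extension to moving surfaces is not reproduced. The sign convention, the function name `alip_state`, and the numeric values are assumptions.

```python
import math


def alip_state(t, x0, L0, mass, height, g=9.81):
    """Closed-form ALIP state at time t within one continuous step.

    Assumed stationary-surface dynamics (one common convention):
        dx/dt = L / (mass * height)
        dL/dt = mass * g * x
    """
    omega = math.sqrt(g / height)
    c, s = math.cosh(omega * t), math.sinh(omega * t)
    x = x0 * c + L0 / (mass * height * omega) * s
    L = mass * height * omega * x0 * s + L0 * c
    return x, L


# Illustrative values only (not taken from the paper).
x_end, L_end = alip_state(t=0.3, x0=0.05, L0=1.0, mass=30.0, height=0.8)
```

A foot-placement controller of the kind mentioned in the abstract would act at step transitions by choosing the next contact point, which resets x while L evolves continuously within each step.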
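In this paper, we propose a probabilistic continuous-time visual-inertial odometry (VIO) for rolling shutter cameras. The continuous-time trajectory formulation naturally facilitates the fusion of asynchronous high-frequency IMU data and motion-distorted rolling shutter images. To prevent an intractable computational load, the proposed VIO is sliding-window and keyframe-based. We propose to probabilistically marginalize the control points to keep a constant number of keyframes in the sliding window. Furthermore, the line exposure time difference (line delay) of the rolling shutter camera can be calibrated online in our continuous-time VIO. To extensively examine the performance of our continuous-time VIO, experiments are conducted on the publicly available WHU-RSVI, TUM-RSVI, and SenseTime-RSVI rolling shutter datasets. The results demonstrate that the proposed continuous-time VIO significantly outperforms existing state-of-the-art VIO methods. The codebase of this paper will also be open-sourced at \url{}.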
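The role of the line delay can be illustrated with a short sketch: each image row of a rolling shutter camera is captured at its own time, so a continuous-time trajectory is queried once per row rather than once per frame. The linear interpolation below is only a stand-in for the control-point trajectory mentioned in the abstract, and the function names and numeric values are hypothetical.

```python
def row_timestamp(frame_start, row, line_delay):
    """Capture time of image row `row`, assuming rows are read out
    sequentially with a constant `line_delay` between consecutive rows."""
    return frame_start + row * line_delay


def interpolate_position(t, t0, p0, t1, p1):
    """Position at time t, linearly interpolated between two known poses.

    Stand-in for a continuous-time trajectory; a spline over control
    points would replace this in a real system.
    """
    w = (t - t0) / (t1 - t0)
    return [a + w * (b - a) for a, b in zip(p0, p1)]


# Illustrative numbers: a 30 microsecond line delay and row 240.
t_row = row_timestamp(frame_start=0.0, row=240, line_delay=30e-6)
p_row = interpolate_position(t_row, 0.0, [0.0, 0.0, 0.0], 0.01, [0.1, 0.0, 0.0])
```

Because the row time depends linearly on `line_delay`, treating the line delay as an unknown in the estimator is what would allow it to be calibrated online, as the abstract describes.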
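Real-world image denoising is a practical image restoration problem that aims to obtain clean images from noisy in-the-wild inputs. Recently, the Vision Transformer (ViT) has exhibited a strong ability to capture long-range dependencies, and many researchers have attempted to apply ViT to image denoising tasks. However, a real-world image is an isolated frame, which forces the ViT to build long-range dependencies among internal patches; dividing the image into patches disarranges the noise pattern and gradient continuity. In this paper, we propose to resolve this issue with a continuous Wavelet Sliding-Transformer, called DnSwin, that builds frequency correspondence in real-world scenes. Specifically, we first extract bottom features from the noisy input images using a CNN encoder. The key to DnSwin is to separate high-frequency and low-frequency information in the features and build frequency dependencies. To this end, we propose the Wavelet Sliding-Window Transformer, which uses the discrete wavelet transform, self-attention, and the inverse discrete wavelet transform to extract deep features. Finally, we reconstruct the deep features into denoised images using a CNN decoder. Both quantitative and qualitative evaluations on real-world denoising benchmarks demonstrate that the proposed DnSwin performs favorably against state-of-the-art methods.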
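The transform, process, inverse-transform pattern described in the abstract can be sketched in one dimension with a single-level Haar wavelet. The choice of the Haar wavelet, the 1-D setting, and the placeholder `transform` argument (standing in for the self-attention stage) are assumptions for illustration; the abstract does not specify them.

```python
import math


def haar_dwt(signal):
    """Single-level Haar DWT of an even-length sequence.
    Returns (low, high): the low-frequency and high-frequency bands."""
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / s for a, b in pairs]
    high = [(a - b) / s for a, b in pairs]
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: interleaves the reconstructed samples."""
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.extend([(l + h) / s, (l - h) / s])
    return out


def process_in_wavelet_domain(signal, transform=lambda band: band):
    """Split into frequency bands, apply `transform` per band, reconstruct.
    `transform` is a placeholder for a learned stage such as self-attention."""
    low, high = haar_dwt(signal)
    return haar_idwt(transform(low), transform(high))


signal = [4.0, 6.0, 10.0, 12.0]
low, high = haar_dwt(signal)
restored = process_in_wavelet_domain(signal)
```

With the identity placeholder the round trip reproduces the input, which is the property that lets a learned stage operate on separated frequency bands without losing information in the transform itself.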
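In recent years, with the rapid development of face editing and generation, more and more fake videos are circulating on social media, which has caused extreme public concern. Existing frequency-domain face forgery detection methods find that GAN-forged images have obvious grid-like visual artifacts in the frequency spectrum compared with real images. For synthesized videos, however, these methods are confined to single frames and pay little attention to the most discriminative parts and the temporal frequency clues across different frames. To take full advantage of the rich information in video sequences, this paper performs video forgery detection in both the spatial and temporal frequency domains and proposes a Discrete Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation. FCAN-DCT consists of a backbone network and two branches: a Compact Feature Extraction (CFE) module and a Frequency Temporal Attention (FTA) module. We conduct thorough experimental assessments on two visible light (VIS) datasets, WildDeepfake and Celeb-DF (v2), and on our self-built video forgery dataset DeepfakeNIR, which is the first video forgery dataset in the near-infrared modality. The experimental results demonstrate the effectiveness of our method in detecting forged videos in both VIS and NIR scenarios.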
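What a temporal frequency clue might look like can be sketched by applying an unnormalized DCT-II along the time axis of a single pixel's intensity across frames. This is only an illustration of the transform named in the abstract, not the FCAN-DCT architecture; the function names and the toy sequences are hypothetical.

```python
import math


def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n = len(x)
    return [
        sum(v * math.cos(math.pi / n * (i + 0.5) * k) for i, v in enumerate(x))
        for k in range(n)
    ]


def dominant_temporal_frequency(pixel_over_time):
    """Index k >= 1 of the largest-magnitude DCT coefficient, ignoring
    the DC term (k = 0), for one pixel's values across frames."""
    coeffs = dct2(pixel_over_time)
    mags = [abs(c) for c in coeffs[1:]]
    return 1 + mags.index(max(mags))


# Toy sequences: frame-to-frame flicker versus a slow drift.
flicker = dominant_temporal_frequency([1.0, -1.0, 1.0, -1.0])
drift = dominant_temporal_frequency([0.0, 1.0, 2.0, 3.0])
```

In this toy setting the flickering pixel concentrates its energy in the highest-frequency coefficient while the slow drift concentrates it in the lowest, which is the kind of contrast a temporal frequency branch could exploit.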