Different people speak with diverse, personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to obtain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract the dynamic facial motion patterns of a style reference video and encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and the style code. To integrate the reference speaking style into the generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to this style-aware adaptation mechanism, the reference speaking style is better embedded into the synthesized videos during decoding. Extensive experiments demonstrate that our method can generate talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project page: https://github.com/FuxiVirtualHuman/styletalk.
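To make the adaptation mechanism concrete, here is a minimal PyTorch sketch of a feed-forward layer whose hidden units are rescaled by a style code; the layer sizes, the sigmoid gating, and all names are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class StyleAdaptiveFFN(nn.Module):
    """Feed-forward layer whose effective weights are modulated by a style code."""

    def __init__(self, d_model: int, d_ff: int, d_style: int):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        # Predict per-channel scales for fc1's hidden units from the style code.
        self.scale_pred = nn.Linear(d_style, d_ff)

    def forward(self, x: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # style: (batch, d_style) -> one scaling vector per sample.
        scale = torch.sigmoid(self.scale_pred(style)).unsqueeze(1)  # (B, 1, d_ff)
        h = self.fc1(x) * scale  # the style code adjusts the FFN's response
        return self.fc2(torch.relu(h))

ffn = StyleAdaptiveFFN(d_model=256, d_ff=1024, d_style=128)
out = ffn(torch.randn(2, 50, 256), torch.randn(2, 128))  # (2, 50, 256)
```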
Determining the causal effects of temporal multi-intervention assists decision-making. Hindered by time-varying bias, selection bias, and interactions among multiple interventions, the disentanglement and estimation of multiple treatment effects from individual temporal data remain rarely studied. To tackle these challenges, we propose a comprehensive framework for temporal counterfactual forecasting from an individual multiple-treatment perspective (TCFimt). TCFimt constructs adversarial tasks in a seq2seq framework to alleviate selection and time-varying bias, and designs a contrastive learning-based block to decouple a mixed treatment effect into separated main treatment effects and causal interactions, which further improves estimation accuracy. In experiments on two real-world datasets from distinct fields, the proposed method outperforms state-of-the-art methods in predicting future outcomes under specific treatments and in choosing the optimal treatment type and timing.
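As a rough illustration of the contrastive ingredient, the following sketch shows a standard InfoNCE loss that could pull representations of the same treatment together and push different treatments apart; the function name, shapes, and temperature are assumptions, and the paper's actual block may differ.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss: anchor should be close to its positive and far from negatives."""
    # anchor, positive: (B, d); negatives: (B, K, d)
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    pos = (a * p).sum(-1, keepdim=True) / tau      # (B, 1) similarity to positive
    neg = torch.einsum("bd,bkd->bk", a, n) / tau   # (B, K) similarities to negatives
    logits = torch.cat([pos, neg], dim=1)          # the positive is class 0
    labels = torch.zeros(len(a), dtype=torch.long)
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 16, 64))
```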
In this work, we propose a semantic flow-guided two-stage framework for shape-aware face swapping, namely FlowFace. Unlike most previous methods, which focus on transferring the source's inner facial features but neglect facial contours, our FlowFace transfers both to a target face, leading to more realistic face swapping. Concretely, FlowFace consists of a face reshaping network and a face swapping network. The face reshaping network addresses the differences in face outline between the source and target faces: it first estimates a semantic flow (i.e., the face shape differences) between the source and the target face, and then explicitly warps the target face shape with the estimated semantic flow. After reshaping, the face swapping network generates inner facial features that exhibit the identity of the source face. We employ a pre-trained face masked autoencoder (MAE) to extract facial features from both the source face and the target face. In contrast to previous methods that use identity embeddings to preserve identity information, the features extracted by our encoder better capture both facial appearance and identity information. We then develop a cross-attention fusion module to adaptively fuse the inner facial features from the source face with the target facial attributes, leading to better identity preservation. Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that FlowFace significantly outperforms the state of the art.
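The explicit warping step can be pictured with a small PyTorch sketch that resamples an image according to a dense semantic flow via `grid_sample`; the pixel-offset flow convention and the helper name `warp_with_flow` are assumptions, not FlowFace's exact code.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(image: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `image` (B, C, H, W) with a dense flow (B, 2, H, W) given in pixels."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1, 2, H, W)
    coords = grid + flow                  # where each output pixel samples from
    # Normalize to [-1, 1], the coordinate range grid_sample expects.
    coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(image, coords.permute(0, 2, 3, 1), align_corners=True)

# Zero flow is the identity warp.
warped = warp_with_flow(torch.randn(1, 3, 256, 256), torch.zeros(1, 2, 256, 256))
```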
Developing safe, stable, and efficient obstacle-avoidance policies for multiple robots is challenging. Most existing studies either use centralized control or require communication with other robots. In this paper, we propose a novel log-map-based deep reinforcement learning method for obstacle avoidance in complex, communication-free multi-robot scenarios. In particular, our method converts laser range information into a logarithmic map. To improve training speed and generalization performance, our policy is trained in two specially designed multi-robot scenarios. Compared with other methods, the logarithmic map represents obstacles more accurately and improves the success rate of obstacle avoidance. We finally evaluate our method in various simulated and real-world scenarios. The results show that our method provides a more stable and efficient navigation solution for robots in complex multi-robot and pedestrian scenarios. The video is available at https://youtu.be/r0esuxe6mze.
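As a toy NumPy illustration of the kind of logarithmic compression such a map applies to laser ranges, consider the sketch below; the exact mapping, clipping bounds, and function name are assumptions rather than the paper's formula.

```python
import numpy as np

def to_log_map(ranges: np.ndarray, max_range: float = 10.0) -> np.ndarray:
    """Compress laser ranges logarithmically so nearby obstacles keep more
    resolution than distant ones (one plausible mapping, not the paper's)."""
    clipped = np.clip(ranges, 1e-2, max_range)
    # log(1 + r) rescaled to [0, 1]; near-field differences dominate.
    return np.log1p(clipped) / np.log1p(max_range)

scan = np.array([0.3, 1.0, 4.0, 9.5])
print(to_log_map(scan))  # nearby readings are spread out, far ones compressed
```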
We study model-free multi-agent reinforcement learning (MARL) in environments where durative actions are ubiquitous, i.e., every action has a pre-set execution duration. During execution, environment changes are influenced by, but not synchronized with, action execution. Such settings are ubiquitous in the real world. However, most MARL methods assume that actions are executed immediately after inference, which is often unrealistic and can lead to catastrophic failures of multi-agent coordination. To fill this gap, we develop an algorithmic framework for such MARL settings. We then propose LeGEM, a novel episodic memory for model-free MARL algorithms. LeGEM builds agents' episodic memories by leveraging their individual experiences. It boosts multi-agent learning by addressing the challenging temporal credit assignment problem through our novel reward redistribution scheme, alleviating the issue of non-Markovian rewards. We evaluate LeGEM on various multi-agent scenarios, including the Stag-Hunter Game, the Quarry Game, the Afforestation Game, and StarCraft II micromanagement tasks. Empirical results show that LeGEM significantly boosts multi-agent coordination, achieving leading performance and improved sample efficiency.
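The reward-redistribution idea can be pictured with the toy sketch below, which moves each delayed reward back to the step at which the responsible durative action was issued; the `trigger_steps` mapping and the additive scheme are illustrative assumptions, not LeGEM's actual algorithm.

```python
import numpy as np

def redistribute_reward(rewards, trigger_steps):
    """Toy reward redistribution: credit each delayed reward to the step
    where the durative action that caused it was issued.
    trigger_steps[t] maps step t back to the step where its action started."""
    shaped = np.zeros_like(rewards, dtype=float)
    for t, r in enumerate(rewards):
        shaped[trigger_steps[t]] += r
    return shaped

# The reward arrives at t=3, but the responsible action was issued at t=1.
rewards = np.array([0.0, 0.0, 0.0, 1.0])
trigger = [0, 1, 1, 1]
print(redistribute_reward(rewards, trigger))  # [0., 1., 0., 0.]
```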
Efficient exploration for deep cooperative multi-agent reinforcement learning (MARL) in complex coordination problems remains a challenge. In this paper, we introduce a novel episodic multi-agent reinforcement learning method with curiosity-driven exploration, called EMC. We leverage an insight into popular factorized MARL algorithms: the "induced" individual Q-values, i.e., the individual utility functions used for local execution, are embeddings of the local action-observation histories and can capture the interactions between agents, due to reward backpropagation during centralized training. Therefore, we use the prediction errors of individual Q-values as intrinsic rewards for coordinated exploration, and leverage an episodic memory to exploit explored informative experiences to boost policy training. As the dynamics of an agent's individual Q-value function capture the novelty of states and the influence of other agents, our intrinsic reward can induce coordinated exploration towards new or promising states. We illustrate the advantages of our method through didactic examples and demonstrate its significant outperformance over state-of-the-art MARL baselines on challenging tasks in the StarCraft II micromanagement benchmark.
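A minimal sketch of the intrinsic-reward idea, assuming a PyTorch setting: a small predictor regresses an agent's individual Q-values, and its prediction error becomes the exploration bonus. The network sizes and names are assumptions, not EMC's full architecture.

```python
import torch
import torch.nn as nn

class CuriosityModule(nn.Module):
    """Predict an agent's individual Q-values from its observation; the
    prediction error serves as an intrinsic reward."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.predictor = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def intrinsic_reward(self, obs: torch.Tensor, q_values: torch.Tensor):
        pred = self.predictor(obs)
        # Large error => novel or influential state => larger exploration bonus.
        return (pred - q_values.detach()).pow(2).mean(dim=-1)

cm = CuriosityModule(obs_dim=32, n_actions=5)
bonus = cm.intrinsic_reward(torch.randn(4, 32), torch.randn(4, 5))  # (4,)
```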
Recently, large-scale transformer-based models have proven effective across a variety of tasks in many domains. Nevertheless, putting them into production is very expensive and requires comprehensive optimization techniques to reduce inference costs. This paper introduces a series of transformer inference optimization techniques at both the algorithm level and the hardware level. These techniques include a prefill decoding mechanism, which improves token parallelism for text generation, and highly optimized kernels designed for very long input lengths and large hidden sizes. On this basis, we propose a transformer inference acceleration library, Easy and Efficient Transformer (EET), with significant performance improvements over existing libraries. Compared with Faster Transformer v4.0's implementation of the GPT-2 layer on an A100, EET achieves a state-of-the-art speedup of 1.5-4.5x, varying with context length. EET is available at https://github.com/netease-fuxi/eet. A demo video is available at https://youtu.be/22upcngcerg.
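The prefill/decoding split can be illustrated with Hugging Face's GPT-2, which exposes a key-value cache through `past_key_values`: the prompt is processed in one parallel pass, and each subsequent step feeds a single token while reusing the cache. This shows the general mechanism only, not EET's internal API.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = tok("Transformer inference can be split into", return_tensors="pt")

with torch.no_grad():
    # Prefill: process the whole prompt in one parallel pass, caching K/V.
    out = model(**prompt, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(-1, keepdim=True)
    ids = [next_id]
    # Decoding: each later step feeds one token and reuses the cache.
    for _ in range(10):
        out = model(next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(-1, keepdim=True)
        ids.append(next_id)

print(tok.decode(torch.cat(ids, dim=1)[0]))
```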
Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and enables fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective that aims to maximize the information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework that efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo robotic tasks and more complex sparse-reward Meta-World tasks.
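One way to picture the information-gain objective: measure how far the task belief moves after observing one more transition, e.g., as a KL divergence between diagonal-Gaussian posteriors. The sketch below is an assumed formulation, not the paper's exact estimator.

```python
import torch

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    return 0.5 * (logvar_p - logvar_q
                  + (var_q + (mu_q - mu_p) ** 2) / var_p - 1).sum(-1)

def exploration_bonus(belief_before, belief_after):
    """Information gain about the task: how much the posterior moved after
    seeing one more transition (an assumed sketch of the objective)."""
    return gaussian_kl(belief_after[0], belief_after[1],
                       belief_before[0], belief_before[1])

before = (torch.zeros(1, 8), torch.zeros(1, 8))      # prior-like belief
after = (0.5 * torch.ones(1, 8), torch.zeros(1, 8))  # belief after a transition
print(exploration_bonus(before, after))              # larger shift => larger bonus
```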
This paper presents a comprehensive survey of low-light image and video enhancement. We begin with challenging mixed over-/under-exposed images, on which existing methods perform poorly. To this end, we propose two variants of the SICE dataset, named SICE_Grad and SICE_Mix. Next, we introduce Night Wenzhou, a large-scale, high-resolution video dataset, to address the lack of low-light video datasets, which has discouraged extending low-light image enhancement (LLIE) to videos. The Night Wenzhou dataset is challenging because it consists of fast-moving aerial scenes and streetscapes with varying illumination and degradation. We conduct extensive key-technique analyses and experimental comparisons of representative LLIE approaches on these newly proposed datasets and on the current benchmark datasets. Finally, we discuss unresolved issues and propose future research topics for the LLIE community.
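As a toy NumPy illustration of what a mixed over-/under-exposed frame looks like, the sketch below applies a different exposure gain to each vertical strip of an image; this is an assumed construction for intuition only, not the actual SICE_Grad/SICE_Mix recipe.

```python
import numpy as np

def mixed_exposure(image: np.ndarray,
                   gains=(0.25, 0.5, 1.0, 2.0, 4.0)) -> np.ndarray:
    """Apply a different exposure gain to each vertical strip, producing a
    single frame that is simultaneously over- and under-exposed."""
    h, w, _ = image.shape
    out = image.astype(np.float32).copy()
    edges = np.linspace(0, w, len(gains) + 1, dtype=int)
    for g, (lo, hi) in zip(gains, zip(edges[:-1], edges[1:])):
        out[:, lo:hi] = np.clip(out[:, lo:hi] * g, 0, 255)
    return out.astype(np.uint8)

frame = (np.random.rand(64, 320, 3) * 255).astype(np.uint8)
mixed = mixed_exposure(frame)  # left strips under-exposed, right over-exposed
```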
Point cloud analysis is challenging due to the irregularity of the point cloud data structure. Existing works typically employ the ad-hoc sampling-grouping operation of PointNet++, followed by sophisticated local and/or global feature extractors, to leverage the 3D geometry of the point cloud. Unfortunately, these complicated hand-crafted model designs have led to poor inference latency and performance saturation in the past few years. In this paper, we point out that classical sampling-grouping operations on irregular point clouds cause learning difficulty for the subsequent MLP layers. To reduce the irregularity of point clouds, we introduce a DualNorm module after the sampling-grouping operation. The DualNorm module consists of Point Normalization, which normalizes the grouped points to the sampled points, and Reverse Point Normalization, which normalizes the sampled points to the grouped points. The proposed PointNorm utilizes the local mean and the global standard deviation to benefit from both local and global features while maintaining a faithful inference speed. Experiments on point cloud classification show that we achieve state-of-the-art accuracy on the ModelNet40 and ScanObjectNN datasets. We also generalize our model to point cloud part segmentation and demonstrate competitive performance on the ShapeNetPart dataset. Code is available at https://github.com/shenzheng2000/pointnorm-for-point-cloud-analysis.
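A rough PyTorch sketch of the dual normalization, as read from the abstract: grouped neighbors are normalized to their sampled center using a global standard deviation, and sampled centers are reverse-normalized toward their groups. The shapes, the scalar global std, and the function name are assumptions.

```python
import torch

def dual_norm(centers: torch.Tensor, grouped: torch.Tensor, eps: float = 1e-5):
    """DualNorm-style sketch. Shapes: centers (B, S, 3), grouped (B, S, K, 3),
    where S is the number of sampled points and K the group size."""
    # Point Normalization: grouped points relative to their sampled point.
    diff = grouped - centers.unsqueeze(2)   # (B, S, K, 3)
    global_std = diff.std()                 # one scalar over the whole batch
    norm_grouped = diff / (global_std + eps)
    # Reverse Point Normalization: sampled points relative to their group.
    group_mean = grouped.mean(dim=2)        # (B, S, 3)
    norm_centers = (centers - group_mean) / (global_std + eps)
    return norm_grouped, norm_centers

g, c = dual_norm(torch.randn(2, 128, 3), torch.randn(2, 128, 16, 3))
```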