We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. Since human motions are highly diverse and their distribution differs markedly from that of conditional modalities, such as textual descriptors in natural language, it is hard to learn a probabilistic mapping from the desired conditional modality to human motion sequences. Moreover, raw motion data from motion capture systems can be temporally redundant and noisy; directly modeling the joint distribution over raw motion sequences and conditional modalities would incur heavy computational overhead and could introduce artifacts from the captured noise. To learn a better representation of the diverse human motion sequences, we first design a powerful Variational AutoEncoder (VAE) that yields a representative, low-dimensional latent code for each human motion sequence. Then, instead of using a diffusion model to connect raw motion sequences with the conditional inputs, we perform the diffusion process in the motion latent space. Our Motion Latent-based Diffusion model (MLD) produces vivid motion sequences that conform to the given conditional inputs while substantially reducing computational overhead in both training and inference. Extensive experiments on various human motion generation tasks demonstrate that MLD achieves significant improvements over state-of-the-art methods while being two orders of magnitude faster than previous diffusion models operating on raw motion sequences.
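As a rough illustration of the idea of diffusing in a learned latent space rather than over raw sequences, the sketch below trains a toy denoiser on VAE latents; the network, dimensions, and noise schedule are assumptions, not the paper's MLD architecture.

```python
# Minimal sketch of diffusion in a pre-trained VAE latent space (not the
# authors' exact MLD model; modules and dimensions are assumptions).
import torch
import torch.nn as nn

class LatentDenoiser(nn.Module):
    """Predicts the noise added to a motion latent, conditioned on text."""
    def __init__(self, latent_dim=256, cond_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 512), nn.SiLU(),
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, z_t, t, cond):
        # t is normalized to [0, 1] and appended as an extra feature.
        return self.net(torch.cat([z_t, cond, t[:, None]], dim=-1))

def diffusion_loss(denoiser, z0, cond, T=1000):
    """One DDPM-style training step on VAE latents z0 of shape (B, latent_dim)."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, T, (z0.shape[0],))
    a_bar = alphas_bar[t][:, None]
    eps = torch.randn_like(z0)
    z_t = a_bar.sqrt() * z0 + (1 - a_bar).sqrt() * eps   # forward noising in latent space
    eps_hat = denoiser(z_t, t.float() / T, cond)
    return torch.mean((eps - eps_hat) ** 2)

# Usage: z0 would come from the frozen motion VAE encoder, cond from a text encoder.
denoiser = LatentDenoiser()
loss = diffusion_loss(denoiser, torch.randn(8, 256), torch.randn(8, 512))
loss.backward()
```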
Significant progress has been made in learning image classification networks under long-tailed data distributions using robust training algorithms such as data re-sampling, re-weighting, and margin adjustment. These methods, however, ignore the impact of data imbalance on feature normalization: the dominance of majority (head) classes in estimating statistics and affine parameters causes the internal covariate shift within less-frequent categories to be overlooked. To alleviate this problem, we propose a compound batch normalization method based on a Gaussian mixture, which models the feature space more comprehensively and reduces the dominance of head classes. In addition, a moving-average-based expectation-maximization (EM) algorithm is employed to estimate the statistical parameters of the multiple Gaussian distributions. However, the EM algorithm is sensitive to initialization and can easily become stuck in local minima where the Gaussian components keep focusing on the majority classes. To tackle this issue, we develop a dual-path learning framework that employs class-aware split feature normalization to diversify the estimated Gaussian distributions, allowing the components to fit the training samples of less-frequent classes more comprehensively. Extensive experiments on commonly used datasets demonstrate that the proposed method outperforms existing methods on long-tailed image classification.
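The sketch below illustrates one way a mixture-of-Gaussians normalization layer with moving-average EM updates could look; it is a simplification under assumed shapes and update rules, not the paper's compound batch normalization or its dual-path training.

```python
# Minimal sketch of a mixture-of-Gaussians normalization layer with
# moving-average EM updates (a simplification, not the paper's exact method).
import torch
import torch.nn as nn

class CompoundBatchNorm(nn.Module):
    def __init__(self, num_features, num_components=3, momentum=0.1, eps=1e-5):
        super().__init__()
        self.momentum, self.eps = momentum, eps
        self.register_buffer("means", torch.randn(num_components, num_features) * 0.1)
        self.register_buffer("vars", torch.ones(num_components, num_features))
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):                                        # x: (B, C)
        with torch.no_grad():
            # E-step: soft responsibilities from diagonal-Gaussian log-densities.
            diff = x[:, None, :] - self.means[None]              # (B, K, C)
            log_p = -0.5 * ((diff ** 2) / (self.vars[None] + self.eps)
                            + torch.log(self.vars[None] + self.eps)).sum(-1)
            resp = torch.softmax(log_p, dim=1)                   # (B, K)
            if self.training:
                # M-step with moving averages, so a single batch statistic
                # dominated by head classes does not swamp the tail classes.
                w = resp / (resp.sum(0, keepdim=True) + self.eps)
                self.means.mul_(1 - self.momentum).add_(
                    self.momentum * torch.einsum("bk,bc->kc", w, x))
                self.vars.mul_(1 - self.momentum).add_(
                    self.momentum * torch.einsum("bk,bkc->kc", w, diff ** 2))

        # Normalize each sample by its responsibility-weighted statistics.
        mu, var = resp @ self.means, resp @ self.vars
        x_hat = (x - mu) / torch.sqrt(var + self.eps)
        return x_hat * self.weight + self.bias

feats = torch.randn(16, 64)                                      # pooled backbone features
out = CompoundBatchNorm(64)(feats)
```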
Early-exiting dynamic neural networks (EDNNs), one type of dynamic neural network, have been widely studied recently. A typical EDNN has multiple prediction heads at different layers of the network backbone. During inference, the model exits at either the last prediction head or an intermediate prediction head whose prediction confidence exceeds a predefined threshold. To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data. This causes a train-test mismatch: all prediction heads are optimized on all types of data during training, whereas the deeper heads only see difficult inputs at test time, so the training and testing data distributions for those heads differ. To mitigate this problem, we formulate an EDNN as an additive model inspired by gradient boosting and propose multiple training techniques to optimize the model effectively. We name our method BoostNet. Our experiments show that it achieves state-of-the-art performance on the CIFAR100 and ImageNet datasets in both anytime and budgeted-batch prediction modes. Our code is released at https://github.com/SHI-Labs/Boosted-Dynamic-Networks.
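To make the additive, boosting-style view concrete, here is a minimal sketch of an early-exit classifier whose heads refine a running sum of logits and exit on a confidence threshold; the architecture, dimensions, and training loss are assumptions, not the BoostNet recipe.

```python
# Minimal sketch of an additive (boosting-style) early-exit classifier.
import torch
import torch.nn as nn

class AdditiveEarlyExitNet(nn.Module):
    def __init__(self, in_dim=128, num_classes=10, num_stages=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, in_dim), nn.ReLU()) for _ in range(num_stages))
        self.heads = nn.ModuleList(nn.Linear(in_dim, num_classes) for _ in range(num_stages))

    def forward(self, x):
        """Returns the cumulative logits at every exit (used for training)."""
        logits, outputs = 0.0, []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            logits = logits + head(x)      # each head refines the earlier exits
            outputs.append(logits)
        return outputs

    @torch.no_grad()
    def predict(self, x, threshold=0.9):
        """Anytime inference: exit as soon as a head is confident enough."""
        logits = 0.0
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            logits = logits + head(x)
            if torch.softmax(logits, dim=-1).max(dim=-1).values.min() >= threshold:
                break
        return logits

# Usage on hypothetical backbone features:
model = AdditiveEarlyExitNet()
feats, targets = torch.randn(4, 128), torch.randint(0, 10, (4,))
loss = sum(nn.functional.cross_entropy(o, targets) for o in model(feats))
loss.backward()
```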
Radar, the only sensor that can provide reliable perception in all weather conditions at an affordable cost, has been widely accepted as a key supplement to cameras and LiDAR in modern advanced driver assistance systems (ADAS) and autonomous driving systems. Recent state-of-the-art works reveal that fusing radar and LiDAR can lead to robust detection in adverse weather such as fog. However, these methods still suffer from low accuracy in bounding box estimation. This paper proposes a bird's-eye view (BEV) fusion learning approach for an anchor-box-free object detection system, which uses features derived from the radar range-azimuth heatmap and the LiDAR point cloud to estimate possible objects. Different label assignment strategies are designed to facilitate consistency between the classification of foreground or background anchor points and the corresponding bounding box regressions. Furthermore, the performance of the proposed detector is further enhanced by a novel interactive transformer module. We demonstrate the superior performance of the proposed methods on the recently published Oxford Radar RobotCar (ORR) dataset, showing that our system outperforms other state-of-the-art methods in accuracy by a large margin.
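A minimal sketch of anchor-free detection heads on concatenated radar/LiDAR BEV features is given below; the channel sizes, fusion operator, and regression targets are assumptions, and the paper's label assignment scheme and interactive transformer module are not reproduced.

```python
# Minimal sketch of anchor-free detection heads on fused radar/LiDAR BEV features.
import torch
import torch.nn as nn

class BEVFusionDetector(nn.Module):
    def __init__(self, radar_ch=64, lidar_ch=64, num_classes=3):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(radar_ch + lidar_ch, 128, 3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU())
        # Anchor-free heads: per-BEV-cell class heatmap and box regression.
        self.cls_head = nn.Conv2d(128, num_classes, 1)
        self.box_head = nn.Conv2d(128, 5, 1)   # e.g. (dx, dy, w, l, yaw) per cell

    def forward(self, radar_bev, lidar_bev):
        x = self.fuse(torch.cat([radar_bev, lidar_bev], dim=1))
        return self.cls_head(x), self.box_head(x)

# Usage on hypothetical BEV feature maps from the two sensors:
det = BEVFusionDetector()
cls_map, box_map = det(torch.randn(1, 64, 200, 200), torch.randn(1, 64, 200, 200))
```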
Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking, and many other mobile tasks. Thus, it is crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single-image depth estimation solutions that achieve real-time performance on IoT platforms and smartphones. For this, the participants used a large-scale RGB-to-depth dataset collected with the ZED stereo camera, which is capable of generating depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the Raspberry Pi 4 platform, where the developed solutions were able to generate VGA-resolution depth maps at up to 27 FPS while achieving high-fidelity results. All models developed in the challenge are also compatible with any Android or Linux-based mobile device; their detailed descriptions are provided in this paper.
Deep reinforcement learning (DRL) algorithms have achieved cutting-edge performance in solving a variety of challenging reinforcement learning (RL) problems, including complex control tasks with high-dimensional continuous state and action spaces. Despite widely reported successes, existing DRL algorithms often suffer from ineffective exploration, leading to limited learning stability and performance. To address this limitation, several ensemble DRL algorithms have recently been proposed to enhance exploration and stabilize the learning process. However, many existing ensemble algorithms train each base learner individually without explicitly controlling the collaboration among the trained base learners. In this paper, we propose a new technique to train an ensemble of base learners based on a multi-step integration method. This multi-step training technique enables us to develop a new hierarchical training algorithm for ensemble DRL that promotes inter-learner collaboration through explicit inter-learner parameter sharing. The design of our new algorithm is verified theoretically. The algorithm is also shown empirically to outperform several cutting-edge DRL algorithms on multiple benchmark RL problems.
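The sketch below shows one way explicit inter-learner parameter sharing can be realized, with an ensemble of Q-heads on a shared trunk and a per-learner TD loss; it is an illustration under assumed shapes, not the paper's hierarchical multi-step training algorithm.

```python
# Minimal sketch of an ensemble of Q-learners sharing trunk parameters.
import torch
import torch.nn as nn

class SharedEnsembleQ(nn.Module):
    def __init__(self, obs_dim=8, num_actions=4, num_learners=5):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())  # shared parameters
        self.heads = nn.ModuleList(nn.Linear(128, num_actions) for _ in range(num_learners))

    def forward(self, obs):
        h = self.trunk(obs)
        return torch.stack([head(h) for head in self.heads], dim=1)    # (B, K, A)

def ensemble_td_loss(model, target_model, batch, gamma=0.99):
    obs, act, rew, next_obs, done = batch
    q = model(obs).gather(2, act[:, None, None].expand(-1, len(model.heads), 1)).squeeze(-1)
    with torch.no_grad():
        # Each learner bootstraps from its own target head; the ensemble mean
        # could also be used here to encourage inter-learner collaboration.
        next_q = target_model(next_obs).max(dim=2).values
        target = rew[:, None] + gamma * (1 - done[:, None]) * next_q
    return ((q - target) ** 2).mean()

# Usage on a hypothetical replay batch (obs, action, reward, next_obs, done):
model, target_model = SharedEnsembleQ(), SharedEnsembleQ()
batch = (torch.randn(32, 8), torch.randint(0, 4, (32,)), torch.randn(32),
         torch.randn(32, 8), torch.zeros(32))
ensemble_td_loss(model, target_model, batch).backward()
```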
Despite achieving prominent performance in various applications, point cloud recognition models frequently suffer from natural corruptions and adversarial perturbations. In this paper, we delve into the general robustness of point cloud recognition models and propose point cloud contrastive adversarial training (PointCat). The main intuition of PointCat is to encourage the target recognition model to narrow the decision gap between clean point clouds and corrupted point clouds. Specifically, we leverage a supervised contrastive loss to promote the alignment and uniformity of the hyperspherical features extracted by the recognition model, and design a pair of centralizing losses with dynamic prototype guidance to keep these features from deviating from their corresponding category clusters. To provide more challenging corrupted point clouds, we adversarially train a noise generator together with the recognition model from scratch, rather than using gradient-based attacks as the inner loop as in previous adversarial training methods. Comprehensive experiments show that the proposed PointCat outperforms baseline methods and significantly improves the robustness of different point cloud recognition models under various corruptions, including isotropic point noise, LiDAR-simulated noise, random point dropping, and adversarial perturbations.
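For concreteness, a minimal sketch of the supervised contrastive term on L2-normalized features of clean and corrupted point clouds is shown below; the noise generator and centralizing prototype losses of the full method are omitted, and the function signature, temperature, and shapes are assumptions.

```python
# Minimal sketch of a supervised contrastive loss over clean/corrupted features.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(feat_clean, feat_corrupt, labels, tau=0.1):
    """feat_*: (B, D) features from the recognition model; labels: (B,)."""
    z = F.normalize(torch.cat([feat_clean, feat_corrupt], dim=0), dim=1)  # on the hypersphere
    y = torch.cat([labels, labels], dim=0)
    sim = z @ z.t() / tau                                   # (2B, 2B) scaled cosine similarities
    mask_self = torch.eye(len(y), dtype=torch.bool)
    pos = (y[:, None] == y[None, :]) & ~mask_self           # same-class pairs are positives
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(mask_self, float("-inf")), dim=1, keepdim=True)
    # Average log-likelihood of the positives per anchor, then negate.
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()

# Usage with hypothetical backbone outputs:
loss = supervised_contrastive_loss(torch.randn(16, 128, requires_grad=True),
                                   torch.randn(16, 128), torch.randint(0, 10, (16,)))
loss.backward()
```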
Lane detection is one of the fundamental modules in autonomous driving. In this paper, we adopt a transformer-only approach to lane detection, so that it can benefit from the development of fully vision transformers and from fine-tuning weights that were fully pre-trained on large datasets. More importantly, this paper proposes a novel and general framework named PriorLane, which enhances the segmentation performance of a fully vision transformer by introducing low-cost local prior knowledge. PriorLane utilizes an encoder-only transformer to fuse the features extracted by a pre-trained segmentation model with prior knowledge embeddings. Note that a Knowledge Embedding Alignment (KEA) module improves the fusion performance by aligning the knowledge embeddings. Extensive experiments on our ZJLAB dataset show that PriorLane outperforms SOTA lane detection methods by 2.82% mIoU, and the code will be released at: https://github.com/vincentqqb/priorlane.
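A minimal sketch of fusing segmentation-feature tokens with prior knowledge embeddings through an encoder-only transformer follows; the KEA module is reduced to a learned linear alignment here, and all module names, dimensions, and token counts are assumptions rather than the PriorLane implementation.

```python
# Minimal sketch of encoder-only fusion of segmentation features and prior embeddings.
import torch
import torch.nn as nn

class PriorFusion(nn.Module):
    def __init__(self, feat_dim=256, prior_dim=64, num_prior_tokens=16):
        super().__init__()
        self.align = nn.Linear(prior_dim, feat_dim)      # stand-in for knowledge embedding alignment
        self.prior_tokens = nn.Parameter(torch.randn(num_prior_tokens, prior_dim))
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, seg_feats):
        """seg_feats: (B, N, feat_dim) tokens from a pre-trained segmentation backbone."""
        B, N, _ = seg_feats.shape
        prior = self.align(self.prior_tokens)[None].expand(B, -1, -1)
        fused = self.encoder(torch.cat([seg_feats, prior], dim=1))
        return fused[:, :N]      # keep only the image tokens for the segmentation head

# Usage: e.g. a 32x32 feature map flattened to 1024 tokens.
fusion = PriorFusion()
out = fusion(torch.randn(2, 1024, 256))
```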
Missing scans are inevitable in longitudinal studies due to subject dropout or failed acquisitions. In this paper, we propose a deep learning framework to predict missing scans from acquired scans, catering to longitudinal infant studies. Prediction of infant brain MRI is challenging owing to rapid contrast and structural changes, particularly during the first year of life. We introduce a trustworthy metamorphic generative adversarial network (MGAN) for translating infant brain MRI from one time point to another. MGAN has three key features: (i) image translation leveraging spatial and frequency information for detail-preserving mapping; (ii) a quality-guided learning strategy that focuses attention on challenging regions; and (iii) a multi-scale hybrid loss function that improves the translation of tissue contrast and structural details. Experimental results indicate that MGAN outperforms existing GANs by accurately predicting both contrast and anatomical details.
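As a sketch of a multi-scale hybrid loss that mixes spatial and frequency terms, the code below compares pooled images and their magnitude spectra at several scales; the scales and weighting are assumptions, and MGAN's quality-guided attention is not reproduced.

```python
# Minimal sketch of a multi-scale hybrid (spatial + frequency) reconstruction loss.
import torch
import torch.nn.functional as F

def hybrid_loss(pred, target, scales=(1, 2, 4), freq_weight=0.1):
    """pred, target: (B, 1, H, W) predicted and acquired MR slices."""
    total = 0.0
    for s in scales:
        p = F.avg_pool2d(pred, s) if s > 1 else pred
        t = F.avg_pool2d(target, s) if s > 1 else target
        spatial = F.l1_loss(p, t)
        # Compare magnitude spectra so structural detail at all frequencies
        # contributes to the objective, not just pixel-wise intensity.
        freq = F.l1_loss(torch.abs(torch.fft.fft2(p)), torch.abs(torch.fft.fft2(t)))
        total = total + spatial + freq_weight * freq
    return total / len(scales)

# Usage on hypothetical image tensors:
loss = hybrid_loss(torch.rand(2, 1, 128, 128, requires_grad=True), torch.rand(2, 1, 128, 128))
loss.backward()
```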
Federated learning (FL) is an emerging privacy-preserving machine learning (ML) paradigm. An important type of FL is cross-silo FL, which enables a small number of organizations to cooperatively train a shared model by keeping confidential data local and aggregating weights on a central parameter server. However, in practice, the central server may be vulnerable to malicious attacks or software failures. To address this issue, in this paper we propose DEFL, a novel decentralized weight aggregation framework for cross-silo FL. DEFL eliminates the central server by aggregating weights on every participating node, and only the weights of the current training round are maintained and synchronized among all nodes. We use Multi-Krum to enable aggregating correct weights from honest nodes, and use HotStuff to ensure consistency of the training round number and the weights. In addition, we theoretically analyze the Byzantine fault tolerance, convergence, and complexity of DEFL. We conduct extensive experiments on two widely adopted public datasets, CIFAR-10 and Sentiment140, to evaluate the performance of DEFL. The results show that DEFL defends against common threat models with minimal accuracy loss, while reducing storage overhead by up to 100x and network overhead by up to 12x compared with state-of-the-art decentralized FL methods.
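The Multi-Krum selection rule that DEFL relies on can be sketched as below over flattened weight vectors; the HotStuff consensus layer and the rest of the protocol are out of scope, and the parameters f (assumed number of Byzantine nodes) and m (number of selected updates) are hypothetical.

```python
# Minimal sketch of Multi-Krum aggregation over flattened weight vectors.
import torch

def multi_krum(updates, f, m):
    """updates: (n, d) one flattened weight vector per node; returns their robust mean."""
    n = updates.shape[0]
    dists = torch.cdist(updates, updates) ** 2            # pairwise squared distances
    scores = []
    for i in range(n):
        d = torch.cat([dists[i, :i], dists[i, i + 1:]])   # distances to the other nodes
        # Krum score: sum of distances to the n - f - 2 closest other updates.
        scores.append(d.sort().values[: n - f - 2].sum())
    selected = torch.stack(scores).argsort()[:m]          # m most "central" updates
    return updates[selected].mean(dim=0)

# Usage: 10 nodes, up to 2 assumed Byzantine, average the 5 best-scored updates.
agg = multi_krum(torch.randn(10, 1000), f=2, m=5)
```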