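Body language is an eye-catching social signal, and its automatic analysis can significantly advance artificial intelligence systems in understanding and actively participating in social interactions. While computer vision has made impressive progress in low-level tasks such as head and body pose estimation, the detection of more subtle behaviors such as gesturing, grooming, or fumbling is not well explored. In this paper, we present BBSI, the first set of annotations of complex bodily behaviors embedded in continuous social interactions in a group setting. Based on previous work in psychology, we manually annotated 26 hours of spontaneous human behavior in the MPIIGroupInteraction dataset with 15 distinct body language classes. We present comprehensive descriptive statistics of the resulting dataset as well as the results of annotation quality evaluations. For automatic detection of these behaviors, we adapt the Pyramid Dilated Attention Network (PDAN), a state-of-the-art approach for human action detection. We perform experiments using four variants of spatial-temporal features as input to PDAN: Two-Stream Inflated 3D CNN, Temporal Segment Networks, Temporal Shift Module, and Swin Transformer. The results are promising and indicate considerable room for improvement on this difficult task. Representing a key piece in the puzzle towards automatic understanding of social behavior, BBSI is fully available to the research community.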
The distributed representation of symbols is one of the key technologies in machine learning systems today, playing a pivotal role in modern natural language processing. Traditional word embeddings associate a separate vector with each word. While this approach is simple and leads to good performance, it requires a lot of memory for representing a large vocabulary. To reduce the memory footprint, the default embedding layer in spaCy is a hash embeddings layer. It is a stochastic approximation of traditional embeddings that provides unique vectors for a large number of words without explicitly storing a separate vector for each of them. To be able to compute meaningful representations for both known and unknown words, hash embeddings represent each word as a summary of the normalized word form, subword information and word shape. Together, these features produce a multi-embedding of a word. In this technical report, we first lay out a bit of history and introduce the embedding methods in spaCy in detail. Second, we critically evaluate the hash embedding architecture with multi-embeddings on Named Entity Recognition datasets from a variety of domains and languages. The experiments validate most key design choices behind spaCy's embedders, but we also uncover a few surprising results.
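The idea of a hash embedding with several word features can be illustrated with a minimal sketch. This is not spaCy's implementation: the function names, the table size, the number of hash functions, the choice of prefix and suffix lengths, the simplified shape rule, and the use of concatenation to combine features are all assumptions made for illustration.

```python
import hashlib
import random


def make_table(n_rows, width, seed):
    """Create a small random embedding table (hypothetical sizes)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(n_rows)]


def hash_rows(key, n_rows, n_hashes=4):
    """Map a string to n_hashes row indices using differently salted hashes."""
    rows = []
    for seed in range(n_hashes):
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
        ).digest()
        rows.append(int.from_bytes(digest, "little") % n_rows)
    return rows


def hash_embed(key, table):
    """Sum the hashed rows, so no per-word vector is ever stored."""
    width = len(table[0])
    vector = [0.0] * width
    for row in hash_rows(key, len(table)):
        for i in range(width):
            vector[i] += table[row][i]
    return vector


def word_shape(word):
    """Simplified shape: X/x/d per character, runs capped at four repeats."""
    out = []
    for ch in word:
        if ch.isdigit():
            c = "d"
        elif ch.isalpha():
            c = "X" if ch.isupper() else "x"
        else:
            c = ch
        if len(out) >= 4 and all(o == c for o in out[-4:]):
            continue
        out.append(c)
    return "".join(out)


def word_features(word):
    """Normalized form, subword pieces and shape (feature choice is assumed)."""
    return {
        "norm": word.lower(),
        "prefix": word[:1],
        "suffix": word[-3:],
        "shape": word_shape(word),
    }


def embed_word(word, tables):
    """Concatenate one hash embedding per feature into a multi-embedding."""
    vector = []
    for name, value in word_features(word).items():
        vector.extend(hash_embed(value, tables[name]))
    return vector


WIDTH = 8
tables = {
    name: make_table(1000, WIDTH, seed)
    for seed, name in enumerate(["norm", "prefix", "suffix", "shape"])
}
print(len(embed_word("Apple", tables)))
```

Because every string hashes to some set of rows, an unseen word still receives a vector, and two words collide fully only if all of their hashed rows coincide, which is why several hash functions are used rather than one.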
Neural networks are increasingly used to make predictions that inform high-stakes decisions. Specifically, meteorologists and hedge funds apply these techniques to time series data. When it comes to prediction, machine learning models have certain limitations (such as lack of expressiveness, vulnerability to domain shifts and overconfidence) which can be addressed using uncertainty estimation. There is a set of expectations regarding how uncertainty should "behave". For instance, a wider prediction horizon should lead to more uncertainty, and the model's confidence should be proportional to its accuracy. In this paper, different uncertainty estimation methods are compared on the task of forecasting meteorological time series data, and these expectations are evaluated. The results show how each uncertainty estimation method performs on the forecasting task, which partially evaluates the robustness of predicted uncertainty.
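One of the expectations above, that uncertainty should grow with the prediction horizon, can be checked with a small sketch. It assumes uncertainty is measured as the spread of an ensemble of forecasts, which is only one of several possible estimators and is not necessarily among those compared in the paper; the function names and the toy numbers are hypothetical.

```python
from statistics import pstdev


def spread_per_horizon(ensemble_forecasts):
    """ensemble_forecasts[m][h] is member m's prediction at horizon step h."""
    n_horizons = len(ensemble_forecasts[0])
    return [
        pstdev([member[h] for member in ensemble_forecasts])
        for h in range(n_horizons)
    ]


def is_non_decreasing(values, tol=1e-12):
    """True if each value is at least as large as the one before it."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


# Toy ensemble of three members forecasting three steps ahead (made-up data).
members = [
    [10.0, 10.0, 10.0],
    [10.0, 11.0, 12.0],
    [10.0, 9.0, 8.0],
]
spread = spread_per_horizon(members)
print(spread, is_non_decreasing(spread))
```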
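In recent years, novel activation functions have been proposed to improve the performance of neural networks, and they show superior performance compared to their ReLU counterparts. However, in some environments the availability of complex activations is limited, and usually only the ReLU is supported. In this paper, we propose methods that can be used to improve the performance of ReLU networks by using these efficient novel activations during model training. More specifically, we propose ensemble activations that are composed of the ReLU and one of these novel activations. Furthermore, the coefficients of the ensemble are neither fixed nor learned, but are progressively updated during the training process in such a way that by the end of training only the ReLU activations remain active in the network and the other activations can be removed. This means that at inference time the network contains ReLU activations only. We perform extensive evaluations on the ImageNet classification task using various compact network architectures and various novel activation functions. The results show a 0.2-0.8% top-1 accuracy gain, which confirms the applicability of the proposed methods. Furthermore, we demonstrate the proposed methods on semantic segmentation and boost the performance of a compact segmentation network on the Cityscapes dataset.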
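The ensemble activation described above can be sketched as a weighted sum of the ReLU and a second activation, with the ReLU weight driven towards 1 as training progresses. The abstract does not specify the update rule, so the linear schedule, the use of Swish as the novel activation, and all function names below are assumptions for illustration only.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x):
    """Stand-in for a 'novel' activation: x * sigmoid(x)."""
    return x * sigmoid(x)


def relu_weight(step, total_steps, start=0.0):
    """Assumed linear schedule: the ReLU coefficient rises from start to 1."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (1.0 - start) * frac


def ensemble_activation(x, step, total_steps):
    """Blend of ReLU and Swish; reduces to pure ReLU at the end of training."""
    a = relu_weight(step, total_steps)
    return a * relu(x) + (1.0 - a) * swish(x)


for step in (0, 50, 100):
    print(step, ensemble_activation(-1.0, step, total_steps=100))
```

Once the ReLU coefficient reaches 1, the second term contributes nothing, so the novel activation can be dropped from the deployed network without changing its outputs.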
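We introduce and evaluate a weakly supervised approach to quantify the spatio-temporal distribution of urban forests based on remotely sensed data with close-to-zero human interaction. Successfully training machine learning models for semantic segmentation typically depends on the availability of high-quality labels. We evaluate the benefit of high-resolution, three-dimensional point cloud data (LiDAR) as a source of noisy labels for training models to localize trees in orthophotos. As a proof of concept, we sense Hurricane Sandy's impact on urban forests in Coney Island, New York City (NYC), and reference it to less impacted urban space in Brooklyn, NYC.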
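We present a super-resolution model for an advection-diffusion process with limited information. While most super-resolution models assume high-resolution (HR) ground-truth data during training, in many cases such HR datasets are not readily accessible. Here, we show that a recurrent convolutional network trained with physics-based regularization is able to reconstruct HR information without having HR ground-truth data. Moreover, considering the ill-posed nature of the super-resolution problem, we employ a recurrent Wasserstein autoencoder to model the uncertainty.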