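We propose a new deep-learning-based method for estimating the occupancy of vegetation strata from 3D point clouds captured from an aerial platform. Our model predicts rasterized occupancy maps for three vegetation strata: lower, medium, and higher. Our training scheme allows the network to be supervised using only values aggregated over cylindrical plots, which are easier to produce than pixel-wise or point-wise annotations. Our method outperforms handcrafted and deep learning baselines in terms of precision while providing visual and interpretable predictions. We provide an open-source implementation of our method along with a dataset of 199 agricultural plots to train and evaluate occupancy regression algorithms.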
Missing values are a common problem in data science and machine learning. Removing instances with missing values can adversely affect the quality of further data analysis. This is exacerbated when there are many more features than instances, so that the proportion of affected instances is high. Such a scenario is common in many important domains; for example, single nucleotide polymorphism (SNP) datasets provide a large number of features over a genome for a relatively small number of individuals. To preserve as much information as possible prior to modeling, a rigorous imputation scheme is needed. While Denoising Autoencoders are a state-of-the-art method for imputation in high-dimensional data, they still require enough complete cases to be trained on, and these are often not available in real-world problems. In this paper, we treat missing value imputation as a multi-label classification problem and propose Chains of Autoreplicative Random Forests. Using multi-label Random Forests instead of neural networks works well for low-sampled data because there are fewer parameters to optimize. Experiments on several SNP datasets show that our algorithm effectively imputes missing values based only on information from the dataset and performs better than standard algorithms that do not require any additional information. The algorithm is implemented here specifically for SNP data, but it can easily be adapted to other cases of missing value imputation.
During training, reinforcement learning systems interact with the world without considering the safety of their actions. When deployed in the real world, such systems can be dangerous and cause harm to their surroundings. Often, dangerous situations can be mitigated by defining a set of rules that the system should not violate under any conditions. For example, in robot navigation, one safety rule would be to avoid colliding with surrounding objects and people. In this work, we define safety rules in terms of the relationships between the agent and objects and use them to prevent reinforcement learning systems from performing potentially harmful actions. We propose a new safe epsilon-greedy algorithm that uses safety rules to override the agent's actions if they are considered unsafe. In our experiments, we show that a safe epsilon-greedy policy significantly increases the safety of the agent during training, improves learning efficiency, resulting in much faster convergence, and achieves better performance than the base model.
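The action-override idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `safe_epsilon_greedy`, the `is_unsafe` predicate standing in for the safety rules, and the fallback used when every action is flagged are all assumptions made for illustration.

```python
import random


def safe_epsilon_greedy(q_values, is_unsafe, epsilon, rng=random):
    """Pick an action index while never choosing an action the rules reject.

    q_values:  list of estimated action values for the current state.
    is_unsafe: hypothetical predicate (action index -> bool) encoding the
               safety rules; the abstract does not specify its form.
    epsilon:   probability of exploring instead of exploiting.
    """
    actions = list(range(len(q_values)))
    safe_actions = [a for a in actions if not is_unsafe(a)]
    if not safe_actions:
        # Assumption: if every action is flagged, fall back to the full set.
        safe_actions = actions
    if rng.random() < epsilon:
        # Explore, but only among actions the rules allow.
        return rng.choice(safe_actions)
    # Exploit: the highest-valued action among the allowed ones.
    return max(safe_actions, key=lambda a: q_values[a])
```

In this sketch the unsafe actions are filtered out before both the exploration and the exploitation step, so the agent never samples a flagged action unless no alternative exists.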
This paper examines the encoding of analogy in large-scale pretrained language models, such as BERT and GPT-2. Existing analogy datasets typically focus on a limited set of analogical relations, with a high similarity between the two domains across which the analogy holds. As a more realistic setup, we introduce the Scientific and Creative Analogy dataset (SCAN), a novel analogy dataset containing systematic mappings of multiple attributes and relational structures across dissimilar domains. Using this dataset, we test the analogical reasoning capabilities of several widely used pretrained language models (LMs). We find that state-of-the-art LMs achieve low performance on these complex analogy tasks, highlighting the challenges still posed by analogy understanding.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
It has been experimentally demonstrated that humans are able to learn in a manner that allows them to make predictions on categories for which they have not seen any examples (Malaviya et al., 2022). Sucholutsky and Schonlau (2020) have recently presented a machine learning approach that aims to do the same. They utilise synthetically generated data and demonstrate that it is possible to achieve sub-linear scaling and develop models that can learn to recognise N classes from M training samples where M is less than N, a setting known as less-than-one-shot learning. Their method was, however, defined for univariate or simple multivariate data (Sucholutsky et al., 2021). We extend it to work on large, high-dimensional, real-world datasets and empirically validate it in this new and challenging setting. We apply this method to learn previously unseen NLP tasks from very few examples (4, 8 or 16). We first generate compact, sophisticated less-than-one-shot representations called soft-label prototypes, which are fitted on training data and capture the distribution of different classes across the input domain space. We then use a modified k-Nearest Neighbours classifier to demonstrate that soft-label prototypes can classify data competitively, even outperforming much more computationally complex few-shot learning methods.
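To make the idea of recognising N classes from fewer than N prototypes concrete, here is a minimal sketch of a distance-weighted nearest-neighbour vote over soft-label prototypes. The function name `classify_with_soft_prototypes`, the inverse-distance weighting, and the toy prototypes are assumptions for illustration; the abstract does not spell out how its k-Nearest Neighbours classifier is modified.

```python
import math


def classify_with_soft_prototypes(x, prototypes, k=2):
    """Return the class index with the largest distance-weighted soft-label sum.

    prototypes: list of (point, soft_label) pairs, where soft_label is a
                list of per-class weights (hypothetical data layout).
    """
    # Keep the k prototypes closest to the query in Euclidean distance.
    nearest = sorted(prototypes, key=lambda p: math.dist(x, p[0]))[:k]
    n_classes = len(nearest[0][1])
    scores = [0.0] * n_classes
    for point, soft_label in nearest:
        # Inverse-distance weight; the small constant avoids division by zero.
        weight = 1.0 / (math.dist(x, point) + 1e-9)
        for c in range(n_classes):
            scores[c] += weight * soft_label[c]
    return max(range(n_classes), key=lambda c: scores[c])


# Two prototypes on a line that together separate three classes.
toy_prototypes = [
    ((0.0,), [0.6, 0.4, 0.0]),
    ((1.0,), [0.0, 0.4, 0.6]),
]
```

With these two toy prototypes, queries near 0 fall into class 0, queries near 1 into class 2, and queries around the midpoint into class 1, so three classes are separated using only two stored points.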
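To achieve high-fidelity haptic rendering of soft objects in a high-mobility virtual environment, we propose a novel haptic display, DandelionTouch. A swarm of drones delivers tactile actuators to the user's fingertips. Users of DandelionTouch can experience tactile feedback in a large space that is not limited by the device's working area. Importantly, they do not experience muscle fatigue during long interactions with virtual objects. Hand tracking and a swarm control algorithm allow the user to guide the swarm with hand motions and avoid collisions inside the formation. Several topologies of impedance connection between swarm units were investigated in this research. An experiment in which drones performed a point-following task on a square trajectory in real time revealed that drones connected in a star topology followed the trajectory with a lower mean positional error (RMSE decreased by 20.6% compared with other impedance topologies and by 40.9% compared with potential-field-based swarm control). In all formations with impedance behavior, the drones achieved velocities 28% higher than the swarm controlled with the potential field algorithm. Additionally, the perception of several vibrotactile patterns was evaluated in a user study with 7 participants. The study showed that the proposed combination of temporal delay and frequency modulation allows users to successfully recognize the surface property and the motion direction in VR simultaneously (mean recognition rate of 70%, maximum of 93%). DandelionTouch suggests a new type of haptic feedback in VR systems in which no hand-held or wearable interface is required.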
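Modern time-domain photometric surveys collect many observations of various astronomical objects, and the coming era of large-scale surveys will provide even more information. Most objects have never received a spectroscopic follow-up, which is especially crucial for transients, e.g. supernovae. In such cases, observed light curves can offer an affordable alternative. Time series are actively used for photometric classification and characterization, such as peak and luminosity decline estimation. However, the collected time series are multidimensional, irregularly sampled, contain outliers, and do not have well-defined systematic uncertainties. Machine learning methods help extract useful information from the available data in the most efficient way. We consider several light curve approximation methods based on neural networks: Multilayer Perceptrons, Bayesian Neural Networks, and Normalizing Flows, each used to approximate the observations of a single light curve. Tests using both simulated PLAsTiCC and real Zwicky Transient Facility data samples demonstrate that even a few observations are enough to fit the networks and obtain better approximation quality than other state-of-the-art methods. We show that the methods described in this work have better computational complexity and run faster than Gaussian processes. We analyze the performance of approximation techniques aimed at filling gaps in light curve observations and show that using an appropriate technique increases the accuracy of peak finding and supernova classification. In addition, the results of the study are organized in the Fulu Python library, available on GitHub, which can be easily used by the community.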
