Yield estimation is a powerful tool in vineyard management, as it allows growers to fine-tune practices to optimize yield and quality. However, yield is currently estimated using manual sampling, which is time-consuming and imprecise. This study demonstrates the application of proximal imaging combined with deep learning for yield estimation in vineyards. Continuous data collection with a vehicle-mounted sensing kit, combined with ground-truth yield data collected at harvest using a commercial yield monitor, produced a large dataset of 23,581 yield points and 107,933 images. Moreover, the study was conducted in a mechanically managed commercial vineyard, a challenging environment for image analysis but a common set of conditions in the California Central Valley. Three model architectures were tested: object detection, CNN regression, and transformer models. The object detection models were trained on hand-labeled images to localize grape bunches, and either bunch count or pixel area was summed and correlated with grape yield. In contrast, the regression models were trained end-to-end to predict grape yield from image data without hand labeling. Results showed comparable performance, with mean absolute percent errors of 18% for the transformer model and 18.5% for the object detection model with pixel-area processing on a representative held-out dataset. Saliency mapping was used to show that the CNN model's attention was concentrated near the predicted locations of grape bunches and at the top of the grapevine canopy. Overall, the study demonstrates the applicability of proximal imaging and deep learning for large-scale prediction of grapevine yield. Furthermore, the end-to-end modeling approach performed comparably to the object detection approach while eliminating the need for hand labeling.
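The pixel-area variant of the object-detection pipeline can be illustrated with a minimal sketch: sum the pixel areas of detected bunch boxes per image, then fit a linear calibration against yield-monitor readings. The function names and the linear form are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch: relating detected grape-bunch pixel areas to yield.
def summed_bunch_area(detections):
    """Sum pixel areas of bunch bounding boxes [(x1, y1, x2, y2), ...]."""
    return sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in detections)

def fit_area_to_yield(areas, yields):
    """Least-squares slope/intercept mapping summed pixel area -> yield."""
    n = len(areas)
    mean_a = sum(areas) / n
    mean_y = sum(yields) / n
    cov = sum((a - mean_a) * (y - mean_y) for a, y in zip(areas, yields))
    var = sum((a - mean_a) ** 2 for a in areas)
    slope = cov / var
    return slope, mean_y - slope * mean_a
```

With the calibration fitted on yield-monitor ground truth, a new image's summed bunch area maps directly to a yield estimate.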
Current research on inflammatory cell structures in histopathology images, which are commonly used for diagnostic and research purposes, leaves out much of the information available in biopsy slides. In autoimmune diseases, significant research questions remain about which cell types participate in inflammation at the tissue level and how they interact with one another. While these questions can be partially answered using traditional methods, artificial intelligence approaches for segmentation and classification provide a much more efficient way to understand the architecture of inflammation in autoimmune disease and hold great promise for novel insights. In this paper, we empirically develop deep learning approaches that use dermatomyositis biopsies of human tissue to detect and identify inflammatory cells. Our approaches improve classification performance by 26% and segmentation performance by 5%. We also propose a novel post-processing autoencoder architecture that improves segmentation performance by an additional 3%. We have open-sourced our approaches and architecture at https://github.com/pranavsinghps1/dedl
Accurate control of robots in the real world requires a control system capable of accounting for the kinodynamic interactions of the robot with its environment. At high speeds, the dependence of the robot's motion on these kinodynamic interactions becomes more pronounced, making high-speed, accurate robot control a challenging problem. Prior work has shown that learning the robot's inverse kinodynamics (IKD) can be helpful for high-speed robot control. However, a learned inverse kinodynamics model can only be applied to a limited class of control problems, and different control problems require learning a new IKD model. In this work, we present a new formulation for accurate, high-speed robot control that leverages a learned forward kinodynamics (FKD) model together with non-linear least squares optimization. By the nature of the formulation, this approach extends to a wide variety of control problems without requiring the retraining of a new model. We demonstrate the ability of this approach to accurately control a one-tenth-scale robot car at high speeds, and show favorable results compared to a baseline.
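The FKD-plus-least-squares idea can be sketched in one dimension: given a forward model f(control) → predicted motion, solve argmin over u of (f(u) − desired)² for the control. The quadratic `fkd` stand-in and the Gauss-Newton loop below are illustrative assumptions; the paper's forward model is a learned network.

```python
# Minimal Gauss-Newton sketch of control via a forward-kinodynamics model.
def solve_control(fkd, desired, u0=0.0, iters=50, eps=1e-6):
    """Find control u such that fkd(u) is close to `desired` (1-D case)."""
    u = u0
    for _ in range(iters):
        r = fkd(u) - desired                           # residual
        j = (fkd(u + eps) - fkd(u - eps)) / (2 * eps)  # numeric Jacobian
        if abs(j) < 1e-12:
            break                                      # flat model; stop
        u -= r / j                                     # Gauss-Newton step
    return u
```

Because the optimizer only queries the forward model, swapping in a different objective (a new control problem) does not require retraining, which is the key advantage the abstract describes.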
Recent advances in deep learning and computer vision have alleviated many bottlenecks, allowing algorithms to be label-free with better performance. In particular, transformers provide a global view of the image, which convolutional neural networks (CNNs) lack by design. Here we introduce Cross-Architectural Self-Supervision (CASS), a novel self-supervised learning approach that leverages transformers and CNNs simultaneously while remaining computationally accessible via readily available cloud services. Compared with existing state-of-the-art self-supervised learning approaches, we empirically show that CASS-trained CNNs and transformers gain an average of 8.5% with 100% labeled data, 11.5% with 10% labeled data, and 1.5% with 1% labeled data, across three diverse datasets. Notably, one of the datasets used includes histopathology slides of an autoimmune disease, an underrepresented topic in medical imaging with minimal data. In addition, our findings show that CASS is twice as efficient as other state-of-the-art methods in terms of training time.
One of the key challenges in high-speed off-road navigation of ground vehicles is that the kinodynamics of the vehicle-terrain interaction can differ dramatically depending on the terrain. Previous approaches to this challenge have considered learning an inverse kinodynamics (IKD) model conditioned on inertial information of the vehicle to sense the kinodynamic interactions. In this paper, we hypothesize that, in addition to past inertial information, anticipating the kinodynamic interactions of the vehicle in the future is also necessary for accurate high-speed off-road navigation. To this end, we introduce Visual-Inertial Inverse Kinodynamics (VI-IKD), a novel learning-based IKD model that is conditioned on visual information from a terrain patch ahead of the robot in addition to past inertial information, enabling it to anticipate future kinodynamic interactions. We experimentally validate the effectiveness of VI-IKD for accurate high-speed off-road navigation in both indoor and outdoor environments, and show that, compared to a state-of-the-art method, VI-IKD enables more accurate and robust off-road navigation on a variety of terrains at speeds of up to 3.5 m/s.
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
Object movement identification is one of the most researched problems in the field of computer vision. In this task, we try to classify each pixel as foreground or background. Even though numerous traditional machine learning and deep learning methods already exist for this problem, the two major issues with most of them are the need for large amounts of ground-truth data and their inferior performance on unseen videos. Since every pixel of every frame has to be labeled, acquiring large amounts of data for these techniques is rather expensive. Recently, Zhao et al. [1] proposed a one-of-a-kind Arithmetic Distribution Neural Network (ADNN) for universal background subtraction, which utilizes probability information from the histogram of temporal pixels and achieves promising results. Building on this work, we developed an intelligent video surveillance system that uses the ADNN architecture for motion detection, trims the video down to only the parts containing motion, and performs anomaly detection on the trimmed video.
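The "trim to motion" step of such a pipeline reduces to finding contiguous runs of frames flagged as containing motion (e.g., frames whose background-subtraction mask is non-empty). The function below is a minimal sketch under that assumption; the name and the `min_len` filter are hypothetical, not part of the cited system.

```python
# Illustrative sketch: keep only contiguous runs of motion-flagged frames.
def motion_segments(flags, min_len=1):
    """Return (start, end) frame-index pairs for runs of True flags."""
    segments, start = [], None
    for i, moving in enumerate(flags):
        if moving and start is None:
            start = i                      # run begins
        elif not moving and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None                   # run ends
    if start is not None and len(flags) - start >= min_len:
        segments.append((start, len(flags)))
    return segments
```

The resulting segments can then be cut from the source video and passed to the downstream anomaly detector, so that only motion-bearing footage is analyzed.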
The machine translation mechanism translates texts automatically between different natural languages, and Neural Machine Translation (NMT) has gained attention for its rational context analysis and fluent translation accuracy. However, processing low-resource languages that lack relevant training attributes such as supervised data is a current challenge for Natural Language Processing (NLP). We incorporated a technique known as Active Learning with the NMT toolkit Joey NMT to reach sufficient accuracy and robust predictions for low-resource language translation. With active learning, a semi-supervised machine learning strategy, the training algorithm determines which unlabeled data would be the most beneficial for obtaining labels using selected query techniques. We implemented two model-driven acquisition functions for selecting the samples to be validated. This work uses transformer-based NMT systems: a baseline model (BM), a fully trained model (FTM), an active learning least-confidence-based model (ALLCM), and an active learning margin-sampling-based model (ALMSM), when translating English to Hindi. The Bilingual Evaluation Understudy (BLEU) metric has been used to evaluate system results. The BLEU scores of the BM, FTM, ALLCM, and ALMSM systems are 16.26, 22.56, 24.54, and 24.20, respectively. The findings in this paper demonstrate that active learning techniques help the model converge early and improve the overall quality of the translation system.
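The two acquisition functions named above can be sketched generically: assuming each candidate sample comes with a model probability distribution over outputs, least confidence selects samples whose top probability is lowest, and margin sampling selects those with the smallest gap between the top two probabilities. This is a minimal sketch of the two strategies, not the paper's Joey NMT integration.

```python
# Illustrative acquisition functions over per-sample probability distributions.
def least_confidence(prob_dists, k):
    """Indices of the k samples whose highest class probability is lowest."""
    scores = [max(p) for p in prob_dists]
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]

def margin_sampling(prob_dists, k):
    """Indices of the k samples with the smallest top-2 probability margin."""
    def margin(p):
        top2 = sorted(p, reverse=True)[:2]
        return top2[0] - top2[1]
    return sorted(range(len(prob_dists)), key=lambda i: margin(prob_dists[i]))[:k]
```

The selected indices are the samples sent for human labeling in each active-learning round, after which the model is retrained on the enlarged labeled set.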
We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks become more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within the task. We generalize this finding to meta-RL and study this dependence of planning horizons on the number of tasks. Based on our theoretical findings, we derive heuristics for selecting slowly increasing discount factors, and we validate their significance empirically.
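The "slowly increasing discount factor" idea can be illustrated with a simple schedule: the discount factor, and hence the effective planning horizon 1/(1 − γ), grows as data accumulates across tasks. The functional form and constants below are illustrative assumptions, not the paper's derived heuristic.

```python
import math

# Hypothetical schedule: discount factor rises toward gamma_max with the
# task count t, lengthening the effective planning horizon 1 / (1 - gamma)
# as the estimated model becomes more accurate.
def discount_schedule(t, gamma_max=0.99, c=0.5):
    """Slowly increasing discount factor, capped at gamma_max."""
    return min(gamma_max, 1.0 - c / math.sqrt(t + 1))
```

Early in training the short horizon keeps planning from over-trusting a poorly estimated model; as tasks accumulate, the horizon extends toward its cap.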
As language models have grown in parameters and layers, it has become much harder to train them and run inference with them on single GPUs. This severely restricts the availability of large language models such as GPT-3, BERT-Large, and many others. A common technique to address this problem is pruning the network architecture by removing transformer heads, fully-connected weights, and other modules. The main challenge is to discern the important parameters from the less important ones. Our goal is to find strong metrics for identifying such parameters. We thus propose two strategies for calculating importance scores: Cam-Cut, based on GradCAM interpretations, and Smooth-Cut, based on SmoothGrad. Through this work, we show that our scoring functions are able to assign more relevant task-based scores to the network parameters, and thus both of our pruning approaches significantly outperform the standard weight- and gradient-based strategies, especially at higher compression ratios in BERT-based models. We also analyze our pruning masks and find them to be significantly different from the ones obtained using standard metrics.
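Score-based pruning at a given compression ratio reduces to a generic skeleton: compute an importance score per parameter, then zero out the lowest-scoring fraction. The sketch below uses plain weight magnitude as the score; a GradCAM- or SmoothGrad-derived score, like those the paper proposes, would be supplied via the `score` argument. Names here are illustrative, not the paper's code.

```python
# Generic score-based pruning sketch with a pluggable importance function.
def prune(weights, ratio, score=abs):
    """Zero out the `ratio` fraction of weights with the lowest scores."""
    n_prune = int(len(weights) * ratio)
    order = sorted(range(len(weights)), key=lambda i: score(weights[i]))
    dropped = set(order[:n_prune])        # indices of least-important weights
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]
```

The comparison the abstract describes then amounts to swapping the scoring function (magnitude, gradient-based, Cam-Cut, Smooth-Cut) while holding this masking procedure fixed.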