The increased demand for meat products, combined with farm labor shortages, has created a need for new real-time solutions to monitor animals effectively. Significant progress has been made in continuously locating individual pigs using tracking-by-detection methods. However, these methods fail for oblong pens because a single fixed camera cannot cover the entire floor at adequate resolution. We address this problem by using multiple cameras, placed so that the fields of view of adjacent cameras overlap and together span the entire floor. Avoiding breaks in tracking when a pig crosses from one camera's view into an adjacent camera's view requires inter-camera handover. We identify adjacent cameras and the shared pig locations on the floor using inter-view homography. Our experiment involved two pens of grow-finish pigs and three RGB cameras. Our algorithm first detects pigs using a deep learning-based object detection model (YOLO) and creates local tracking IDs with a multi-object tracking algorithm (DeepSORT). We then match pigs across multiple views using their shared locations and generate a global ID for each pig that is preserved throughout the tracking. To evaluate our approach, we provide five two-minute-long video sequences with fully annotated global IDs. We track pigs in a single camera view with a Multi-Object Tracking Accuracy and Precision of 65.0% and 54.3%, respectively, and achieve a Camera Handover Accuracy of 74.0%. We open-source our code and annotated dataset at https://github.com/aifarms/multi-camera-pig-tracking
translated by Google Translate (谷歌翻译)
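The cross-camera handover described above hinges on projecting detections from each camera into a shared floor plane via a homography. A minimal sketch of that matching step, assuming precomputed 3x3 homographies and greedy nearest-neighbour matching; the function names and the distance threshold are illustrative, not taken from the released code:

```python
import numpy as np

def project_to_floor(points_xy, H):
    """Map pixel coordinates to floor-plane coordinates with a 3x3 homography."""
    pts = np.hstack([points_xy, np.ones((len(points_xy), 1))])  # homogeneous coords
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def handover_matches(dets_a, dets_b, H_a, H_b, max_dist=0.5):
    """Greedily match detections from two cameras by distance on the floor plane."""
    fa = project_to_floor(dets_a, H_a)
    fb = project_to_floor(dets_b, H_b)
    matches, used = [], set()
    for i, p in enumerate(fa):
        d = np.linalg.norm(fb - p, axis=1)
        j = int(np.argmin(d))
        if d[j] < max_dist and j not in used:
            matches.append((i, j))
            used.add(j)
    return matches
```

A matched pair of local track IDs can then be merged under one global ID.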
Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image superresolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require bicubic interpolation as a pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitating resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.
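The Charbonnier loss used to train LapSRN is a smooth, robust relative of the L1 loss. A minimal sketch (the ε value is a common choice, not necessarily the paper's):

```python
import numpy as np

def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier penalty: sqrt(diff^2 + eps^2), averaged over all elements.
    Behaves like L1 for large residuals but stays differentiable at zero."""
    diff = pred - target
    return float(np.mean(np.sqrt(diff * diff + eps * eps)))
```

Its robustness to outliers is what makes it preferable to plain L2 for sharp image reconstruction.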
t-SNE remains one of the most popular embedding techniques for visualizing high-dimensional data. Most standard packages of t-SNE, such as scikit-learn, use the Barnes-Hut t-SNE (BH t-SNE) algorithm for large datasets. However, existing CPU implementations of this algorithm are inefficient. In this work, we accelerate the BH t-SNE on CPUs via cache optimizations, SIMD, parallelizing sequential steps, and improving parallelization of multithreaded steps. Our implementation (Acc-t-SNE) is up to 261x and 4x faster than scikit-learn and the state-of-the-art BH t-SNE implementation from daal4py, respectively, on a 32-core Intel(R) Icelake cloud instance.
Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: download a copy of a foundation model, and fine-tune it using some in-house data about the target task of interest. Consequently, the Internet is swarmed by a handful of foundation models fine-tuned on many diverse tasks. Yet, these individual fine-tunings often lack strong generalization and exist in isolation without benefiting from each other. In our opinion, this is a missed opportunity, as these specialized models contain diverse features. Based on this insight, we propose model recycling, a simple strategy that leverages multiple fine-tunings of the same foundation model on diverse auxiliary tasks, and repurposes them as rich and diverse initializations for the target task. Specifically, model recycling fine-tunes in parallel each specialized model on the target task, and then averages the weights of all target fine-tunings into a final model. Empirically, we show that model recycling maximizes model diversity by benefiting from diverse auxiliary tasks, and achieves a new state of the art on the reference DomainBed benchmark for out-of-distribution generalization. Looking forward, model recycling is a contribution to the emerging paradigm of updatable machine learning where, akin to open-source software development, the community collaborates to incrementally and reliably update machine learning models.
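The final step of model recycling, averaging the weights of all target fine-tunings, can be sketched as follows, assuming the fine-tunings share an identical architecture so that their parameter dictionaries have matching keys:

```python
import numpy as np

def recycle_average(state_dicts):
    """Uniformly average matching parameter tensors from several fine-tunings
    of the same base model into a single final model."""
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0)
            for k in state_dicts[0]}
```

Because every fine-tuning starts from the same foundation model, the averaged weights stay in a region where interpolation is well-behaved.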
Recent work has demonstrated substantial gains in pre-training large-scale unidirectional language models such as the GPT-2, GPT-3, and GPT-neo, followed by fine-tuning on a downstream task. In this paper, we evaluate the performance of the GPT-neo 1.3 billion parameter model for commonsense reasoning tasks. We assess the model performance on six commonsense reasoning benchmark tasks and report the accuracy scores for these tasks. When fine-tuned using the right set of hyperparameters, we obtain competitive scores on three of these tasks but struggle when the dataset size is significantly smaller. The low model performance on a few of these tasks suggests the inherent difficulty of these datasets, as the model fails to establish coherent patterns given their limited training samples. We also investigate and substantiate our results using visualization and conduct numerous inference tests to understand the model performance better. Finally, we conduct thorough robustness tests using various methods to gauge the model performance under numerous settings. These findings suggest a promising path for exploring smaller language models than the GPT-3 175 billion model to perform tasks requiring natural language understanding.
The goal of autonomous vehicles is to navigate public roads safely and comfortably. To enforce safety, traditional planning approaches rely on handcrafted rules to generate trajectories. Machine learning-based systems, on the other hand, scale with data and are able to learn more complex behaviors. However, they often ignore that agents and self-driving vehicle trajectory distributions can be leveraged to improve safety. In this paper, we propose modeling a distribution over multiple future trajectories for both the self-driving vehicle and other road agents, using a unified neural network architecture for prediction and planning. During inference, we select the planning trajectory that minimizes a cost taking into account safety and the predicted probabilities. Our approach does not depend on any rule-based planners for trajectory generation or optimization, improves with more training data and is simple to implement. We extensively evaluate our method through a realistic simulator and show that the predicted trajectory distribution corresponds to different driving profiles. We also successfully deploy it on a self-driving vehicle on urban public roads, confirming that it drives safely without compromising comfort. The code for training and testing our model on a public prediction dataset and the video of the road test are available at https://woven.mobi/safepathnet
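The trajectory-selection step, picking the plan that minimizes a cost balancing safety against the predicted probability of each candidate, might look like the following sketch; the particular cost combination and weighting are illustrative, not the paper's exact formulation:

```python
def select_plan(candidates, probs, safety_costs, w=1.0):
    """Pick the candidate trajectory minimizing a combined cost:
    predicted safety cost minus a weighted bonus for likely trajectories."""
    scores = [c - w * p for c, p in zip(safety_costs, probs)]
    return candidates[scores.index(min(scores))]
```

At inference time, `probs` would come from the unified prediction network and `safety_costs` from overlap checks against the predicted agent trajectories.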
Climate change is expected to aggravate wildfire activity through the exacerbation of fire weather. Improving our capabilities to anticipate wildfires on a global scale is of uttermost importance for mitigating their negative effects. In this work, we create a global fire dataset and demonstrate a prototype for predicting the presence of global burned areas on a sub-seasonal scale with the use of segmentation deep learning models. Particularly, we present an open-access global analysis-ready datacube, which contains a variety of variables related to the seasonal and sub-seasonal fire drivers (climate, vegetation, oceanic indices, human-related variables), as well as the historical burned areas and wildfire emissions for 2001-2021. We train a deep learning model, which treats global wildfire forecasting as an image segmentation task and skillfully predicts the presence of burned areas 8, 16, 32 and 64 days ahead of time. Our work motivates the use of deep learning for global burned area forecasting and paves the way towards improved anticipation of global wildfire patterns.
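Framing sub-seasonal forecasting as supervised segmentation amounts to pairing each predictor snapshot with the burned-area mask several time steps ahead. A minimal sketch, assuming one array entry per time step (the datacube's real temporal resolution may differ):

```python
def make_training_pairs(datacube, burned_masks, lead_steps):
    """Pair the predictor maps at time t with the burned-area mask
    lead_steps time steps ahead, forming (input, target) examples
    for a segmentation model."""
    return [(datacube[t], burned_masks[t + lead_steps])
            for t in range(len(datacube) - lead_steps)]
```

Varying `lead_steps` yields the 8-, 16-, 32- and 64-day-ahead forecasting settings.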
The theory of identifiable representation learning aims to build general-purpose methods that extract high-level latent (causal) factors from low-level sensory data. Most existing work focuses on identifiable representation learning with observational data and relies on distributional assumptions on the latent (causal) factors. In practice, however, we often also have access to interventional data for representation learning. How can we leverage interventional data to help identify high-level latents? To this end, we explore the role of interventional data in identifiable representation learning in this work. We study the identifiability of latent causal factors with and without interventional data, under minimal distributional assumptions on the latents. We prove that, if the true latent variables map to the observed high-dimensional data via a polynomial function, then representation learning by minimizing the standard reconstruction loss of an autoencoder identifies the true latents up to an affine transformation. If we further have access to interventional data generated by hard $do$ interventions on some of the latents, then we can additionally identify these intervened latents.
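The identifiability claim can be written compactly. The notation below (an encoder $f$, a decoder $h$, and an affine indeterminacy $A, b$) is reconstructed from the translated abstract, not quoted from the paper:

```latex
% Observed data generated by an (unknown) polynomial decoder g of the latents z
x = g(z), \qquad g \ \text{polynomial}
% Minimizing the standard autoencoder reconstruction loss
\min_{f,\,h} \; \mathbb{E}\,\lVert x - h(f(x)) \rVert^{2}
% recovers the latents up to an invertible affine transformation
\hat{z} = f(x) = A z + b, \qquad A \ \text{invertible}
```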
Split computing has emerged as a recent paradigm for implementing DNN-based AI workloads, in which a DNN model is split into two parts: one executed on a mobile/client device and the other on an edge server (or the cloud). Data compression is applied to the intermediate tensor of the DNN that needs to be transmitted, to address the challenge of optimizing the rate-accuracy-complexity trade-off. Existing split-computing approaches adopt ML-based data compression but require the parameters of the entire DNN model (or a large portion of it) to be retrained for different compression levels. This incurs high computational and storage burdens: training a full DNN model from scratch is computationally demanding, maintaining multiple copies of the DNN parameters increases storage requirements, and switching the full set of weights during inference increases memory bandwidth. In this paper, we propose an approach that addresses all of these challenges. It involves the systematic design and training of bottleneck units - simple, low-cost neural networks - that can be inserted at the split point. Compared with existing methods, our approach is remarkably lightweight during both training and inference, and achieves high effectiveness at a small fraction of the compute and storage overhead.
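A bottleneck unit of the kind described, a small encoder-decoder pair inserted at the split point, can be sketched as follows; the single linear-plus-ReLU compression stage is illustrative, not the paper's exact architecture:

```python
import numpy as np

def bottleneck_unit(x, w_enc, w_dec):
    """Illustrative bottleneck at the split point: compress the intermediate
    tensor to a narrow code (the part transmitted to the edge server), then
    expand it back to the original width. w_enc: (d, k), w_dec: (k, d), k << d."""
    code = np.maximum(x @ w_enc, 0.0)  # low-dimensional code to transmit
    restored = code @ w_dec            # server-side expansion to width d
    return code, restored
```

Only the small matrices `w_enc`/`w_dec` need retraining per compression level, while the frozen backbone DNN on either side of the split is shared.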
A lecture slide presentation, a sequence of pages containing text and figures, is constructed and presented carefully in order to optimally transfer knowledge to students. Prior work in multimedia and psychology attributes the effectiveness of lectures to their multimodal nature. As a step toward developing AI that can aid student learning as an intelligent teaching assistant, we introduce the Multimodal Lecture Presentations dataset as a large-scale benchmark for testing the capability of machine learning models in multimodal understanding of educational content. Our dataset contains aligned slides and spoken language for 180+ hours of video and 9000+ slides, with 10 lecturers from a variety of subjects (e.g., computer science, dentistry, biology). We introduce two research tasks designed as stepping stones toward AI agents that can explain (automatically captioning a lecture presentation) and illustrate (synthesizing visual figures to accompany spoken explanations) educational content. We provide manual annotations to support these two research tasks and evaluate state-of-the-art models on them. Comparing baselines with the performance of human students, we find that current models struggle with (1) weak cross-modal alignment between slides and spoken text, (2) learning novel visual media, (3) technical language, and (4) long-range sequences. Toward addressing these issues, we also introduce PolyViLT, a multimodal transformer trained with a multi-instance learning loss that is more effective than current approaches. We conclude by shedding light on the challenges and opportunities of multimodal understanding of educational presentations.