Quadruped platforms have become an active research topic due to their high mobility and traversability over rough terrain. However, determining whether a robot can pass through a cluttered environment, and computing its path accurately, is highly challenging. Moreover, the computed path may cross regions with dynamic objects or environments that are dangerous for the robot or for surrounding people. Therefore, we propose a novel conceptual approach of teaching quadruped robot navigation through user-guided path planning in virtual reality (VR). Our system contains global and local path planners, allowing the robot to generate a path through iterations of learning. The VR interface lets the user interact with the environment and assist the quadruped robot in challenging scenarios. The results of the comparative experiments show that cooperation between the human and the path planning algorithm can increase the computational speed of the algorithm by 35.58% on average, with only a non-critical increase in path length (6.66% on average) in the test scenarios. Additionally, users described the VR interface as not physically demanding (2.3 out of 10) and rated its performance highly (7.1 out of 10). The ability to find a less optimal but safer path remains in demand for navigation tasks in cluttered and unstructured environments.
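To make the idea of human-planner cooperation concrete, here is a minimal sketch (an assumption for illustration, not the system described above) of how user-placed VR waypoints can split a global grid search into shorter A* segments, which typically lowers planning time at the cost of a slightly longer path. The grid representation and function names are illustrative.

```python
# Hedged sketch: not the paper's planner. Shows how user waypoints could
# split one long A* query into several short ones.
import heapq

def astar(grid, start, goal):
    """4-connected A* on a 2D occupancy grid (0 = free, anything else = blocked)."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, None)]
    came_from, g_best = {}, {start: 0}
    while open_set:
        _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue                      # already expanded with a better cost
        came_from[node] = parent
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < g_best.get(nxt, float("inf")):
                    g_best[nxt] = ng
                    heapq.heappush(open_set, (ng + h(nxt), ng, nxt, node))
    return None

def plan_with_user_waypoints(grid, start, goal, waypoints):
    """Plan start -> user waypoints -> goal as a chain of short A* queries."""
    full, prev = [], start
    for target in list(waypoints) + [goal]:
        segment = astar(grid, prev, target)
        if segment is None:
            return None   # a user waypoint may be unreachable; fall back to plain A*
        full.extend(segment if not full else segment[1:])
        prev = target
    return full
```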
Nowadays, autonomous mobile robots support humans in many areas where human presence is either redundant or too dangerous. They have successfully proven themselves in expeditions, the gas industry, mines, warehouses, etc. However, even legged robots may get stuck in rough terrain conditions, requiring the cognitive abilities of a human operator to navigate the system through. While gamepads and keyboards are convenient for wheeled robot control, a quadruped robot in 3D space can move along all linear coordinates and Euler angles, requiring at least 12 buttons to control its DoF independently. Therefore, a more convenient control interface is needed. In this paper, we introduce HyperPalm: a novel gesture interface for intuitive human-robot interaction with quadruped robots. Without any additional devices, the operator can achieve full position and orientation control of the quadruped robot in 3D space through hand gesture recognition, using only 5 gestures and 6-DoF hand motion. Experimental results show high accuracy (96.5%) in classifying the 5 static gestures and accurate prediction of the 6D pose of the hand in three-dimensional space. The proposed approach achieves an absolute linear deviation root-mean-square deviation (RMSD) of 11.7 mm, 50% lower than the second tested approach, and an absolute angular deviation RMSD of 2.6 degrees, almost 27% lower than the second tested approach. In addition, a user study was conducted to explore the users' subjective experience of human-robot interaction through the proposed gesture interface. Participants evaluated their interaction with HyperPalm as intuitive (2.0), not causing frustration (2.63), and requiring low physical demand (2.0).
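As a rough illustration of how a recognized gesture and 6-DoF hand motion could drive such a robot, the sketch below maps a hypothetical gesture label to a control mode and scales the hand displacement into a clamped body-pose command. The gesture names, gains, and limits are assumptions, not values from the paper.

```python
# Hedged sketch, not the paper's implementation: gesture -> control mode,
# 6-DoF hand displacement -> clamped body pose command for a quadruped.
from dataclasses import dataclass

GESTURE_MODES = {
    "open_palm": "position",    # translate the body
    "fist": "orientation",      # rotate the body
    "two_fingers": "gait",      # switch gait
    "thumb_up": "start",
    "thumb_down": "stop",
}

@dataclass
class BodyPoseCommand:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    roll: float = 0.0
    pitch: float = 0.0
    yaw: float = 0.0

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def hand_to_body_command(mode, hand_delta, lin_gain=0.5, ang_gain=0.8):
    """hand_delta: (dx, dy, dz, droll, dpitch, dyaw) of the hand since engagement."""
    cmd = BodyPoseCommand()
    if mode == "position":
        cmd.x = clamp(lin_gain * hand_delta[0], -0.10, 0.10)    # metres
        cmd.y = clamp(lin_gain * hand_delta[1], -0.10, 0.10)
        cmd.z = clamp(lin_gain * hand_delta[2], -0.08, 0.08)
    elif mode == "orientation":
        cmd.roll = clamp(ang_gain * hand_delta[3], -0.4, 0.4)   # radians
        cmd.pitch = clamp(ang_gain * hand_delta[4], -0.4, 0.4)
        cmd.yaw = clamp(ang_gain * hand_delta[5], -0.6, 0.6)
    return cmd
```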
Nowadays, the design and development of legged quadruped robots is a very active area of scientific research. Indeed, legged robots have become popular due to their ability to adapt to harsh terrain and various environmental conditions in comparison with other mobile robots. With the growing demand for legged robot experiments, more researchers and engineers need an affordable and fast way to develop locomotion algorithms. In this paper, we present HyperDog, a new open-source quadruped robot platform built with 12 RC servo motors, an NVIDIA Jetson Nano computer, and an STM32F4 Discovery board. HyperDog is an open-source platform for quadruped robotic software development, based on Robot Operating System 2 (ROS2) and micro-ROS. Moreover, HyperDog is a quadruped robotic dog built entirely of 3D-printed parts and carbon fiber, which gives the robot light weight and good strength. The idea of this work is to demonstrate an affordable and customizable way of robot development and to provide researchers and engineers with a legged robot platform on which different algorithms can be tested and validated in simulation and in real environments. The developed project with the code is available on GitHub (https://github.com/ndhana94/hyperdog_ros2).
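For a feel of how experiments on a ROS2-based quadruped like HyperDog might be scripted, here is a minimal rclpy sketch that publishes commands for 12 joints at 50 Hz. The topic and joint names are illustrative assumptions and do not reflect the actual hyperdog_ros2 interface.

```python
# Hedged sketch: a generic ROS2 node commanding 12 servo joints, not HyperDog's API.
import math
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import JointState

JOINTS = [f"{leg}_{j}" for leg in ("FR", "FL", "RR", "RL")
          for j in ("hip", "thigh", "calf")]      # 12 joints, 3 per leg (assumed naming)

class GaitDemo(Node):
    def __init__(self):
        super().__init__("gait_demo")
        self.pub = self.create_publisher(JointState, "joint_commands", 10)
        self.t = 0.0
        self.create_timer(0.02, self.step)        # 50 Hz command loop

    def step(self):
        msg = JointState()
        msg.header.stamp = self.get_clock().now().to_msg()
        msg.name = JOINTS
        # Small sinusoidal motion around a nominal stance, purely for illustration.
        msg.position = [0.0, 0.6 + 0.1 * math.sin(self.t), -1.2] * 4
        self.pub.publish(msg)
        self.t += 0.02 * 2.0 * math.pi

def main():
    rclpy.init()
    rclpy.spin(GaitDemo())

if __name__ == "__main__":
    main()
```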
The ability to perform locomotion on various terrains is critical for legged robots. However, a robot must better understand the surface it walks on to achieve robust locomotion on different terrains. Animals and humans are able to recognize surfaces with the help of tactile sensation in their feet, yet tactile sensing in the feet of legged robots has not been explored much. This paper presents research on DogTouch, a novel quadruped robot with tactile sensing feet (TSF). TSF allows the recognition of different surface textures using tactile sensors and a convolutional neural network (CNN). Experimental results show a sufficient validation accuracy of 74.37% for our trained CNN-based model, with the highest recognition of 90% for line patterns. In the future, we plan to improve the prediction model by presenting surface samples with various pattern depths and by applying advanced deep learning and shallow learning models. Additionally, we propose a novel method for navigation of quadruped and legged robots: tactile paving textured surfaces can be arranged (similar to those used by blind or visually impaired people), so that a robot can move in an unknown environment simply by recognizing specific tactile patterns indicating a straight path, a left or right turn, a pedestrian crossing, a road, etc., allowing robust navigation regardless of lighting conditions. Future quadruped robots equipped with both visual and tactile perception systems will be able to navigate and interact safely and intelligently in unstructured indoor and outdoor environments.
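A compact PyTorch sketch of the kind of CNN texture classifier described above is shown below; the 32x32 single-channel tactile frame, the four texture classes, and the layer sizes are assumptions for illustration, not the trained DogTouch model.

```python
# Hedged sketch (PyTorch): a small CNN mapping a tactile-sensor frame to a texture class.
import torch
import torch.nn as nn

class TactileCNN(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):                # x: (batch, 1, 32, 32) tactile frames
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TactileCNN()
logits = model(torch.randn(8, 1, 32, 32))    # dummy batch of tactile frames
predicted_texture = logits.argmax(dim=1)     # e.g. flat / line / dot / grid patterns
```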
Objective: Accurate visual classification of bladder tissue during Trans-Urethral Resection of Bladder Tumor (TURBT) procedures is essential to improve early cancer diagnosis and treatment. During TURBT interventions, White Light Imaging (WLI) and Narrow Band Imaging (NBI) techniques are used for lesion detection. Each imaging technique provides diverse visual information that allows clinicians to identify and classify cancerous lesions. Computer vision methods that use both imaging techniques could improve endoscopic diagnosis. We address the challenge of tissue classification when annotations are available only in one domain, in our case WLI, and the endoscopic images correspond to an unpaired dataset, i.e. there is no exact equivalent for every image in both NBI and WLI domains. Method: We propose a semi-supervised Generative Adversarial Network (GAN)-based method composed of three main components: a teacher network trained on the labeled WLI data; a cycle-consistency GAN to perform unpaired image-to-image translation; and a multi-input student network. To ensure the quality of the synthetic images generated by the proposed GAN, we perform a detailed quantitative and qualitative analysis with the help of specialists. Conclusion: The overall average classification accuracy, precision, and recall obtained with the proposed method for tissue classification are 0.90, 0.88, and 0.89 respectively, while the same metrics obtained in the unlabeled domain (NBI) are 0.92, 0.64, and 0.94 respectively. The quality of the generated images is reliable enough to deceive specialists. Significance: This study shows the potential of using semi-supervised GAN-based classification to improve bladder tissue classification when annotations are limited in multi-domain data.
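The sketch below mirrors the teacher / translation / student structure described in the Method paragraph, using tiny placeholder networks; it is not the authors' code, and a real system would plug in a pretrained WLI classifier and a trained cycle-consistency generator.

```python
# Hedged sketch: teacher pseudo-labels on translated images supervise a multi-input student.
import torch
import torch.nn as nn
import torch.nn.functional as F

def tiny_backbone(in_ch, out_dim):
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, out_dim),
    )

num_classes = 4
teacher = tiny_backbone(3, num_classes)     # assumed already trained on labeled WLI
gen_nbi2wli = nn.Conv2d(3, 3, 1)            # stand-in for the CycleGAN generator
student = tiny_backbone(6, num_classes)     # multi-input: NBI + translated WLI

teacher.eval()
gen_nbi2wli.eval()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def student_step(nbi_batch):
    """One semi-supervised step on unlabeled NBI images using teacher pseudo-labels."""
    with torch.no_grad():
        fake_wli = gen_nbi2wli(nbi_batch)                # unpaired NBI -> WLI translation
        pseudo = F.softmax(teacher(fake_wli), dim=1)     # soft labels from the teacher
    logits = student(torch.cat([nbi_batch, fake_wli], dim=1))
    loss = F.kl_div(F.log_softmax(logits, dim=1), pseudo, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = student_step(torch.randn(2, 3, 64, 64))           # dummy unlabeled NBI batch
```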
The receptive field (RF), which determines the region of time series to be "seen" and used, is critical to improving the performance of time series classification (TSC). However, the variation of signal scales across and within time series data makes it challenging to decide on proper RF sizes for TSC. In this paper, we propose a dynamic sparse network (DSN) with sparse connections for TSC, which can learn to cover various RF sizes without cumbersome hyper-parameter tuning. The kernels in each sparse layer are sparse and can be explored under constraint regions by dynamic sparse training, which makes it possible to reduce the resource cost. The experimental results show that the proposed DSN model can achieve state-of-the-art performance on both univariate and multivariate TSC datasets with less than 50% of the computational cost of recent baseline methods, opening the path towards more accurate resource-aware methods for time series analysis. Our code is publicly available at: https://github.com/QiaoXiao7282/DSN.
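To illustrate what dynamic sparse training looks like in practice, the sketch below performs one magnitude-prune / random-regrow step on a sparse 1D convolution; it is a generic instance of the technique and omits the layer-wise constraint regions used by DSN.

```python
# Hedged sketch of a dynamic sparse training step (prune + regrow) for a wide 1D kernel.
import torch
import torch.nn as nn

conv = nn.Conv1d(1, 8, kernel_size=39, padding=19, bias=False)   # wide kernel -> large RF
density = 0.2
mask = (torch.rand_like(conv.weight) < density).float()          # initial sparse topology

def apply_mask():
    with torch.no_grad():
        conv.weight.mul_(mask)

def prune_and_regrow(fraction=0.3):
    """Drop the weakest active weights, then regrow the same number elsewhere."""
    with torch.no_grad():
        active = mask.bool()
        n_update = int(fraction * active.sum().item())
        if n_update == 0:
            return
        # candidate regrow sites: positions that are currently inactive
        inactive_idx = (~active).view(-1).nonzero(as_tuple=True)[0]
        # prune: smallest-magnitude active weights
        magnitudes = conv.weight.abs().masked_fill(~active, float("inf"))
        drop_idx = torch.topk(magnitudes.flatten(), n_update, largest=False).indices
        mask.view(-1)[drop_idx] = 0.0
        # regrow: random subset of the previously inactive positions
        grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_update]]
        mask.view(-1)[grow_idx] = 1.0
        conv.weight.mul_(mask)

apply_mask()
out = conv(torch.randn(4, 1, 128))    # (batch, channels, time)
prune_and_regrow()                    # would be called every few hundred training steps
```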
While the problem of hallucinations in neural machine translation has long been recognized, progress on alleviating it has so far been limited. Indeed, it recently turned out that, without artificially encouraging models to hallucinate, previously existing methods fall short and even the standard sequence log-probability is more informative. This means that characteristics internal to the model can give much more information than we expect, and before using external models and measures, we first need to ask: how far can we go if we use nothing but the translation model itself? We propose to use a method that evaluates the percentage of the source contribution to a generated translation. Intuitively, hallucinations are translations "detached" from the source, hence they can be identified by a low source contribution. This method improves detection accuracy for the most severe hallucinations by a factor of 2 and is able to alleviate hallucinations at test time on par with the previous best approach that relies on external models. Next, if we move away from internal model characteristics and allow external tools, we show that using sentence similarity from cross-lingual embeddings further improves these results.
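The two detection signals discussed above can be sketched as follows, assuming per-token source-contribution scores are already available from an attribution method over the translation model, and using LaBSE as one possible cross-lingual embedding model; the thresholds are illustrative, not taken from the paper.

```python
# Hedged sketch of two hallucination flags: low average source contribution and
# low cross-lingual sentence similarity. Thresholds and model choice are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

def flag_by_source_contribution(token_contributions, threshold=0.4):
    """Flag a translation when, on average, its tokens were generated with
    little contribution from the source sentence."""
    return float(np.mean(token_contributions)) < threshold

labse = SentenceTransformer("sentence-transformers/LaBSE")

def flag_by_xsim(source_sentence, translation, threshold=0.5):
    """Flag when cross-lingual embeddings of source and translation diverge."""
    src_emb, tgt_emb = labse.encode([source_sentence, translation],
                                    normalize_embeddings=True)
    return float(np.dot(src_emb, tgt_emb)) < threshold

# Example: a translation whose tokens mostly ignored the source looks suspicious.
print(flag_by_source_contribution([0.15, 0.22, 0.18, 0.30]))   # True
```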
We pose video object segmentation as spectral graph clustering in space and time, with one graph node for each pixel and edges forming local space-time neighborhoods. We claim that the strongest cluster in this video graph represents the salient object. We start by introducing a novel and efficient method based on 3D filtering for approximating the spectral solution, as the principal eigenvector of the graph's adjacency matrix, without explicitly building the matrix. This key property allows us to have a fast parallel implementation on GPU, orders of magnitude faster than classical approaches for computing the eigenvector. Our motivation for a spectral space-time clustering approach, unique in video semantic segmentation literature, is that such clustering is dedicated to preserving object consistency over time, which we evaluate using our novel segmentation consistency measure. Further on, we show how to efficiently learn the solution over multiple input feature channels. Finally, we extend the formulation of our approach beyond the segmentation task, into the realm of object tracking. In extensive experiments we show significant improvements over top methods, as well as over powerful ensembles that combine them, achieving state-of-the-art on multiple benchmarks, both for tracking and segmentation.
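A matrix-free power iteration over local space-time neighborhoods, the core trick described above, can be sketched as follows; the single feature channel, Gaussian edge weights, and wrap-around boundaries are simplifications for illustration rather than the authors' 3D-filtering formulation.

```python
# Hedged sketch: approximate the leading eigenvector of a space-time pixel graph
# by power iteration using only local neighborhood shifts (no NxN matrix is built).
import numpy as np

def leading_eigvec_spacetime(features, radius=1, sigma=0.2, iters=20):
    """features: (T, H, W) per-pixel feature volume. Returns a (T, H, W) soft map
    approximating the principal eigenvector of the graph adjacency matrix."""
    x = np.ones(features.shape)
    offsets = [(dt, dy, dx) for dt in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if (dt, dy, dx) != (0, 0, 0)]
    for _ in range(iters):
        y = np.zeros_like(x)
        for dt, dy, dx in offsets:
            shifted_f = np.roll(features, shift=(dt, dy, dx), axis=(0, 1, 2))
            shifted_x = np.roll(x, shift=(dt, dy, dx), axis=(0, 1, 2))
            w = np.exp(-((features - shifted_f) ** 2) / sigma ** 2)  # local edge weights
            y += w * shifted_x                                        # A @ x, done locally
        x = y / (np.linalg.norm(y) + 1e-12)                           # power iteration step
    return x

video = np.random.rand(5, 32, 32)            # toy single-channel feature volume
saliency = leading_eigvec_spacetime(video)   # strongest cluster ~ salient object
```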
Metric Elicitation (ME) is a framework for eliciting classification metrics that better align with implicit user preferences based on the task and context. The existing ME strategy is based on the assumption that users can most easily provide preference feedback over classifier statistics such as confusion matrices. This work examines ME by providing the first implementation of the ME strategy. Specifically, we create a web-based ME interface and conduct a user study that elicits users' preferred metrics in a binary classification setting. We discuss the study findings and present guidelines for future research in this direction.
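As a toy illustration of eliciting a metric from pairwise feedback over confusion matrices, the sketch below assumes the hidden metric is a linear trade-off between TPR and TNR and narrows the feasible weight interval with each (simulated) user answer; it is not the interface or algorithm used in the study.

```python
# Hedged sketch: each preference over two confusion matrices is a linear
# constraint on the hidden trade-off weight w in m(C) = w*TPR + (1-w)*TNR.

def rates(conf):
    """conf = (tp, fp, fn, tn) -> (TPR, TNR)."""
    tp, fp, fn, tn = conf
    return tp / (tp + fn), tn / (tn + fp)

def update_interval(lo, hi, preferred, other):
    """Narrow the feasible [lo, hi] for w given that `preferred` beats `other`."""
    d_tpr = rates(preferred)[0] - rates(other)[0]
    d_tnr = rates(preferred)[1] - rates(other)[1]
    # Preference means w*d_tpr + (1-w)*d_tnr > 0  =>  w*(d_tpr - d_tnr) > -d_tnr
    a, b = d_tpr - d_tnr, -d_tnr
    if a > 0:
        lo = max(lo, b / a)
    elif a < 0:
        hi = min(hi, b / a)
    return lo, hi

hidden_w = 0.7   # the user's latent trade-off, unknown to the system

def simulated_user_prefers(A, B):
    score = lambda c: hidden_w * rates(c)[0] + (1 - hidden_w) * rates(c)[1]
    return (A, B) if score(A) >= score(B) else (B, A)

lo, hi = 0.0, 1.0
queries = [((80, 30, 20, 70), (60, 10, 40, 90)),   # high-TPR vs high-TNR classifier
           ((90, 45, 10, 55), (70, 20, 30, 80)),
           ((85, 60, 15, 40), (75, 25, 25, 75))]
for A, B in queries:
    preferred, other = simulated_user_prefers(A, B)
    lo, hi = update_interval(lo, hi, preferred, other)
print(f"elicited w lies in [{lo:.2f}, {hi:.2f}]")   # narrows around 0.7
```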
Learning-based image compression has improved to a level where it can outperform traditional image codecs such as HEVC and VVC in terms of coding performance. In addition to good compression performance, device interoperability is essential for a compression codec to be deployed, i.e., encoding and decoding on different CPUs or GPUs should be error-free and incur negligible performance loss. In this paper, we present a method to solve the device interoperability problem of a state-of-the-art image compression network. We apply quantization to the entropy networks, which output the entropy parameters. We suggest a simple method that ensures cross-platform encoding and decoding and can be implemented quickly, with only a minor performance deviation of 0.3% BD-rate from the floating-point model results.
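The determinism argument can be illustrated with a toy fixed-point layer: if the entropy-parameter network is evaluated in integer arithmetic, encoder and decoder obtain bit-identical results on any platform. The scaling factor and the tiny linear layer below are assumptions, not the paper's exact quantization scheme.

```python
# Hedged sketch: integer-only evaluation of a linear "entropy parameter" layer,
# so the result is exactly reproducible across CPUs and GPUs.
import numpy as np

SCALE = 2 ** 8   # fixed-point scale for weights and activations (illustrative)

def to_fixed(x):
    """Quantize a float array to int32 fixed point (round-to-nearest)."""
    return np.round(np.asarray(x) * SCALE).astype(np.int32)

def fixed_linear(x_q, w_q, b_q):
    """Integer-only affine layer: bit-identical on any platform."""
    acc = x_q.astype(np.int64) @ w_q.astype(np.int64).T   # wide accumulator
    acc += np.int64(SCALE) * b_q.astype(np.int64)
    return (acc // SCALE).astype(np.int32)                # rescale back to fixed point

# Float reference (may differ slightly between devices) vs. integer evaluation.
rng = np.random.default_rng(0)
w, b = rng.normal(size=(2, 4)), rng.normal(size=2)
latent = rng.normal(size=(1, 4))
entropy_params_float = latent @ w.T + b
entropy_params_fixed = fixed_linear(to_fixed(latent), to_fixed(w), to_fixed(b))
print(entropy_params_float, entropy_params_fixed / SCALE)  # close; integer path is exact
```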