当人类共同完成联合任务时,每个人都会建立一个情况的内部模型以及如何发展。有效的协作取决于这些单个模型如何重叠以在团队成员之间形成共同的心理模型,这对于人类机器人团队中的协作流程很重要。准确的共享心理模型的发展和维护需要个人意图的双向交流以及解释其他团队成员意图的能力。为了实现有效的人类机器人协作,本文介绍了人类机器人团队合作中新型联合行动框架的设计和实施,利用增强现实(AR)技术和用户眼目光来实现意图的双向交流。我们通过与37名参与者的用户研究测试了我们的新框架,发现我们的系统提高了任务效率,信任和任务流利。因此,使用AR和眼睛凝视使双向交流是一种有前途的平均值,可以改善影响人与机器人之间协作的核心组成部分。
translated by 谷歌翻译
本文旨在研究入侵攻击,然后为区块链网络开发新的网络攻击检测框架。具体来说,我们首先在实验室设计和实施区块链网络。该区块链网络将实现两个目的,即为我们的学习模型生成真实的流量数据(包括正常数据和攻击数据),并实施实时实验,以评估我们建议的入侵检测框架的性能。据我们所知,这是第一个在区块链网络中用于网络攻击的实验室中合成的数据集。然后,我们提出了一个新颖的协作学习模型,该模型允许区块链网络中的有效部署来检测攻击。提出的学习模型的主要思想是使区块链节点能够积极收集数据,从其数据中分享知识,然后与网络中的其他区块链节点交换知识。这样,我们不仅可以利用网络中所有节点的知识,而且还不需要收集所有原始数据进行培训,以便在常规的集中学习解决方案等集中式节点上进行培训。这样的框架还可以避免暴露本地数据的隐私以及过多的网络开销/拥堵的风险。密集模拟和实时实验都清楚地表明,我们提出的基于协作的入侵检测框架可以在检测攻击方面达到高达97.7%的准确性。
translated by 谷歌翻译
联邦学习(FL)最近成为网络攻击检测系统的有效方法,尤其是在互联网上(物联网)网络。通过在IOT网关中分配学习过程,FL可以提高学习效率,降低通信开销并增强网络内人检测系统的隐私。在这种系统中实施FL的挑战包括不同物联网中的数据特征的标记数据和不可用的不可用。在本文中,我们提出了一种新的协作学习框架,利用转移学习(TL)来克服这些挑战。特别是,我们开发一种新颖的协作学习方法,使目标网络能够有效地和快速学习来自拥有丰富标记数据的源网络的知识。重要的是,最先进的研究要求网络的参与数据集具有相同的特征,从而限制了入侵检测系统的效率,灵活性以及可扩展性。但是,我们所提出的框架可以通过在各种深度学习模型中交换学习知识来解决这些问题,即使他们的数据集具有不同的功能。关于最近的真实网络安全数据集的广泛实验表明,与基于最先进的深度学习方法相比,拟议的框架可以提高超过40%。
translated by 谷歌翻译
In this work, we propose a new approach that combines data from multiple sensors for reliable obstacle avoidance. The sensors include two depth cameras and a LiDAR arranged so that they can capture the whole 3D area in front of the robot and a 2D slide around it. To fuse the data from these sensors, we first use an external camera as a reference to combine data from two depth cameras. A projection technique is then introduced to convert the 3D point cloud data of the cameras to its 2D correspondence. An obstacle avoidance algorithm is then developed based on the dynamic window approach. A number of experiments have been conducted to evaluate our proposed approach. The results show that the robot can effectively avoid static and dynamic obstacles of different shapes and sizes in different environments.
translated by 谷歌翻译
We introduce an approach for the answer-aware question generation problem. Instead of only relying on the capability of strong pre-trained language models, we observe that the information of answers and questions can be found in some relevant sentences in the context. Based on that, we design a model which includes two modules: a selector and a generator. The selector forces the model to more focus on relevant sentences regarding an answer to provide implicit local information. The generator generates questions by implicitly combining local information from the selector and global information from the whole context encoded by the encoder. The model is trained jointly to take advantage of latent interactions between the two modules. Experimental results on two benchmark datasets show that our model is better than strong pre-trained models for the question generation task. The code is also available (shorturl.at/lV567).
translated by 谷歌翻译
Physics-Informed Neural Networks (PINNs) have gained much attention in various fields of engineering thanks to their capability of incorporating physical laws into the models. PINNs integrate the physical constraints by minimizing the partial differential equations (PDEs) residuals on a set of collocation points. The distribution of these collocation points appears to have a huge impact on the performance of PINNs and the assessment of the sampling methods for these points is still an active topic. In this paper, we propose a Fixed-Budget Online Adaptive Mesh Learning (FBOAML) method, which decomposes the domain into sub-domains, for training collocation points based on local maxima and local minima of the PDEs residuals. The stopping criterion is based on a data set of reference, which leads to an adaptive number of iterations for each specific problem. The effectiveness of FBOAML is demonstrated in the context of non-parameterized and parameterized problems. The impact of the hyper-parameters in FBOAML is investigated in this work. The comparison with other adaptive sampling methods is also illustrated. The numerical results demonstrate important gains in terms of accuracy of PINNs with FBOAML over the classical PINNs with non-adaptive collocation points. We also apply FBOAML in a complex industrial application involving coupling between mechanical and thermal fields. We show that FBOAML is able to identify the high-gradient location and even give better prediction for some physical fields than the classical PINNs with collocation points taken on a pre-adapted finite element mesh.
translated by 谷歌翻译
Video understanding is a growing field and a subject of intense research, which includes many interesting tasks to understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, video retrieval. One of the most challenging problems in video understanding is dealing with feature extraction, i.e. extract contextual visual representation from given untrimmed video due to the long and complicated temporal structure of unconstrained videos. Different from existing approaches, which apply a pre-trained backbone network as a black-box to extract visual representation, our approach aims to extract the most contextual information with an explainable mechanism. As we observed, humans typically perceive a video through the interactions between three main factors, i.e., the actors, the relevant objects, and the surrounding environment. Therefore, it is very crucial to design a contextual explainable video representation extraction that can capture each of such factors and model the relationships between them. In this paper, we discuss approaches, that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception based-contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/Video_Representation.
translated by 谷歌翻译
Machine Learning as a service (MLaaS) permits resource-limited clients to access powerful data analytics services ubiquitously. Despite its merits, MLaaS poses significant concerns regarding the integrity of delegated computation and the privacy of the server's model parameters. To address this issue, Zhang et al. (CCS'20) initiated the study of zero-knowledge Machine Learning (zkML). Few zkML schemes have been proposed afterward; however, they focus on sole ML classification algorithms that may not offer satisfactory accuracy or require large-scale training data and model parameters, which may not be desirable for some applications. We propose ezDPS, a new efficient and zero-knowledge ML inference scheme. Unlike prior works, ezDPS is a zkML pipeline in which the data is processed in multiple stages for high accuracy. Each stage of ezDPS is harnessed with an established ML algorithm that is shown to be effective in various applications, including Discrete Wavelet Transformation, Principal Components Analysis, and Support Vector Machine. We design new gadgets to prove ML operations effectively. We fully implemented ezDPS and assessed its performance on real datasets. Experimental results showed that ezDPS achieves one-to-three orders of magnitude more efficient than the generic circuit-based approach in all metrics while maintaining more desirable accuracy than single ML classification approaches.
translated by 谷歌翻译
Video anomaly detection (VAD) -- commonly formulated as a multiple-instance learning problem in a weakly-supervised manner due to its labor-intensive nature -- is a challenging problem in video surveillance where the frames of anomaly need to be localized in an untrimmed video. In this paper, we first propose to utilize the ViT-encoded visual features from CLIP, in contrast with the conventional C3D or I3D features in the domain, to efficiently extract discriminative representations in the novel technique. We then model long- and short-range temporal dependencies and nominate the snippets of interest by leveraging our proposed Temporal Self-Attention (TSA). The ablation study conducted on each component confirms its effectiveness in the problem, and the extensive experiments show that our proposed CLIP-TSA outperforms the existing state-of-the-art (SOTA) methods by a large margin on two commonly-used benchmark datasets in the VAD problem (UCF-Crime and ShanghaiTech Campus). The source code will be made publicly available upon acceptance.
translated by 谷歌翻译
The biomedical imaging world is notorious for working with small amounts of data, frustrating state-of-the-art efforts in the computer vision and deep learning worlds. With large datasets, it is easier to make progress we have seen from the natural image distribution. It is the same with microscopy videos of neuron cells moving in a culture. This problem presents several challenges as it can be difficult to grow and maintain the culture for days, and it is expensive to acquire the materials and equipment. In this work, we explore how to alleviate this data scarcity problem by synthesizing the videos. We, therefore, take the recent work of the video diffusion model to synthesize videos of cells from our training dataset. We then analyze the model's strengths and consistent shortcomings to guide us on improving video generation to be as high-quality as possible. To improve on such a task, we propose modifying the denoising function and adding motion information (dense optical flow) so that the model has more context regarding how video frames transition over time and how each pixel changes over time.
translated by 谷歌翻译