In a multi-robot system, the proper allocation of tasks to individual robots is a vitally important component. The availability of a centralized infrastructure can guarantee an optimal allocation of the tasks. However, in many important scenarios such as search and rescue, exploration, disaster management, battlefields, etc., the dynamic tasks have to be allocated to the robots in a decentralized fashion. Efficient communication among the robots plays a crucial role in any such decentralized setting. Existing works on distributed Multi-Robot Task Allocation (MRTA) either assume that the network is available or use naive communication paradigms. On the contrary, in most cases the network infrastructure is unstable or unavailable, and an ad hoc network is the only resort. Recent developments in Synchronous Transmission (ST) based wireless communication protocols have been shown to be more efficient than traditional Asynchronous Transmission (AT) based protocols in ad hoc networks such as Wireless Sensor Network (WSN)/Internet of Things (IoT) applications. The current work is the first effort to exploit ST for MRTA. Specifically, we propose an algorithm that efficiently adapts ST-based many-to-many interaction and minimizes the information exchange needed to reach a consensus on task allocation. We demonstrate the efficacy of the proposed algorithm in terms of latency and energy efficiency through an extensive simulation-based study under various settings.
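The abstract does not detail the consensus procedure itself; the toy sketch below only illustrates the general flavor of reaching an identical allocation on every robot through repeated network-wide exchange rounds, where each round stands in for one ST-style many-to-many flood. The cost model, the one-task-per-robot restriction, and all names are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

# Hypothetical sketch: greedy task allocation where each round emulates one
# ST-style flood in which every robot hears every other robot's bid.
rng = np.random.default_rng(0)
n_robots, n_tasks = 5, 5
cost = rng.uniform(1.0, 10.0, size=(n_robots, n_tasks))  # e.g. travel distance

assignment = {}            # task -> robot, replicated identically on every robot
rounds = 0
while len(assignment) < n_tasks:
    rounds += 1
    # Each free robot proposes (cost, robot_id, task) for its cheapest open task.
    bids = []
    for r in range(n_robots):
        open_tasks = [t for t in range(n_tasks) if t not in assignment]
        busy = [t for t, owner in assignment.items() if owner == r]
        if not open_tasks or busy:       # one task per robot in this toy model
            continue
        t_best = min(open_tasks, key=lambda t: cost[r, t])
        bids.append((cost[r, t_best], r, t_best))
    if not bids:
        break
    # After the flood every robot holds the same bid set, so the same winner is
    # selected everywhere without any further exchange.
    c, r, t = min(bids)
    assignment[t] = r

print(f"consensus after {rounds} rounds: {assignment}")
```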
Robot task instructions often involve a referred object that the robot must locate (ground) within the environment. While task-intent understanding is an essential part of natural language understanding, much less effort has been devoted to resolving the ambiguities that may arise while grounding a task. Existing works use vision-based task grounding and ambiguity detection, which suit a fixed view and a static robot. However, the problem is magnified for a mobile robot, where the ideal view is not known beforehand. Moreover, a single view may not be sufficient to localize all the object instances in a given area, leading to inaccurate ambiguity detection. Human intervention can help only if the robot is able to convey the ambiguity it faces. In this paper, we present DoRO (Disambiguation of Referred Object), a system that helps an embodied agent disambiguate a referred object by raising a suitable query whenever required. Given the area where the intended object is located, DoRO finds all instances of the object by aggregating observations from multiple views while exploring and scanning the area. It then raises a suitable query using the information from the grounded object instances. Experiments with the AI2Thor simulator show that DoRO not only detects ambiguity more accurately, but also raises verbose queries with more accurate information from the vision-language grounding.
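As a rough illustration of the multi-view aggregation idea, the sketch below merges per-view detections into world-frame instances by distance-based clustering and raises a query only when more than one instance of the referred label survives. The merge radius, the detection format, and the query wording are assumptions, not DoRO's actual pipeline.

```python
import numpy as np

# Hypothetical sketch of multi-view aggregation: detections from several views are
# mapped to world coordinates and merged into instances by simple distance clustering.
def aggregate_instances(detections, merge_radius=0.5):
    """detections: list of (label, world_xyz) tuples gathered while scanning the area."""
    instances = []  # list of (label, centroid, count)
    for label, pos in detections:
        pos = np.asarray(pos, dtype=float)
        for i, (ilabel, centroid, count) in enumerate(instances):
            if ilabel == label and np.linalg.norm(centroid - pos) < merge_radius:
                new_centroid = (centroid * count + pos) / (count + 1)
                instances[i] = (ilabel, new_centroid, count + 1)
                break
        else:
            instances.append((label, pos, 1))
    return instances

def maybe_raise_query(instances, referred_label):
    matches = [inst for inst in instances if inst[0] == referred_label]
    if len(matches) <= 1:
        return None                      # unambiguous: no question needed
    # A verbose query built from the grounded instances (attributes would go here).
    return (f"I can see {len(matches)} instances of '{referred_label}'. "
            f"Which one do you mean?")

views = [("cup", (1.0, 0.2, 0.9)), ("cup", (1.05, 0.18, 0.9)),   # same cup, two views
         ("cup", (2.4, -0.6, 0.9)), ("book", (0.3, 1.1, 0.4))]
instances = aggregate_instances(views)
print(maybe_raise_query(instances, "cup"))
```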
An efficient communication mechanism forms the backbone of any multi-robot system to achieve fruitful collaboration and coordination. The limitations of asynchronous-transmission-based strategies in fast dissemination and aggregation force designers to prune these requirements as much as possible, which also restricts the possible application areas of mobile multi-robot systems. In this work, we introduce concurrent-transmission-based strategies as an alternative. Despite the commonly perceived difficulties of concurrent transmission, such as microsecond-level time synchronization and hardware heterogeneity, we demonstrate how it can be exploited for multi-robot systems. We propose a separation architecture in which the two main activities, communication and computation, proceed independently and are coordinated through periodic interactions. The proposed separation architecture is applied to a custom-built, fully networked control system consisting of five two-wheel differential-drive mobile robots with heterogeneous architectures. We use the proposed design in a leader-follower setting to coordinate dynamic velocity changes as well as independent formation of various shapes. The experiments show centimeter-level spatial and millisecond-level temporal accuracy while spending a very low radio duty cycle over a wide test area.
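A minimal sketch of the separation idea, assuming a shared, periodically refreshed buffer between a communication thread (standing in for one concurrent-transmission flood per period) and a computation thread that always consumes the freshest command. The timing values and the `SharedState` interface are invented for illustration.

```python
import threading, time

# Hypothetical sketch of the separation architecture: a communication loop and a
# control (computation) loop run independently and only meet through a small shared
# buffer that is refreshed at a fixed period.
class SharedState:
    def __init__(self):
        self._lock = threading.Lock()
        self.latest_cmd = {"v": 0.0, "w": 0.0}      # velocity command from the network

    def write_cmd(self, cmd):
        with self._lock:
            self.latest_cmd = cmd

    def read_cmd(self):
        with self._lock:
            return dict(self.latest_cmd)

def communication_loop(state, period=0.05, rounds=20):
    # Stand-in for one concurrent-transmission flooding round per period.
    for k in range(rounds):
        state.write_cmd({"v": 0.1 * k, "w": 0.0})   # pretend a new leader command arrived
        time.sleep(period)

def computation_loop(state, period=0.01, steps=100):
    for _ in range(steps):
        cmd = state.read_cmd()                       # always uses the freshest command
        # ... wheel-level control would be computed here ...
        time.sleep(period)

state = SharedState()
t1 = threading.Thread(target=communication_loop, args=(state,))
t2 = threading.Thread(target=computation_loop, args=(state,))
t1.start(); t2.start(); t1.join(); t2.join()
print("final command seen by controller:", state.read_cmd())
```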
The utility of a co-located robot largely depends on simple and intuitive mechanisms of interaction with humans. If a robot accepts task instructions in natural language, it must first understand the user's intention by decoding the instruction. However, while executing the task, the robot may face unforeseen circumstances due to changes in the observed scene, which require further user intervention. In this paper, we present a system called Talk-to-Resolve (TTR) that enables a robot to initiate a coherent dialogue exchange with the instructor by visually observing the impasse. Through the dialogue, it either finds a cue to continue the original plan, finds an acceptable alternative to the original plan, or affirmatively aborts the task altogether. To analyze a possible impasse, we exploit dense captioning of the observed scene together with the given instruction to jointly compute the robot's next action. We evaluate our system on a dataset of initial instruction and situational scene pairs. Our system can identify the impasse and resolve it with an appropriate dialogue exchange with 82% accuracy. Additionally, a user study shows that the questions asked by our system are more natural (4.02 on average on a scale of 1 to 5) compared to the state of the art (3.08 on average).
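The sketch below only mimics the decision structure (continue, ask, or abort) using a naive word-overlap score between the instruction and scene captions; the actual system relies on learned visual grounding and dense captioning, so the thresholds and the scoring function here are placeholders.

```python
# Hypothetical sketch: decide the next action by comparing the instruction with
# dense captions of the currently observed scene.  The overlap score is only a
# stand-in for learned grounding.
def word_overlap(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa))

def next_action(instruction, scene_captions, hi=0.5, lo=0.25):
    best = max(scene_captions, key=lambda c: word_overlap(instruction, c))
    score = word_overlap(instruction, best)
    if score >= hi:
        return ("continue", best)                      # cue found, keep the original plan
    if score >= lo:
        return ("ask", f"I only see '{best}'. Should I use that instead?")
    return ("abort", "I cannot find anything matching the instruction here.")

captions = ["a red mug on the table", "a closed laptop", "a chair near the window"]
print(next_action("pick up the red mug from the table", captions))
print(next_action("pick up the blue bottle", captions))
```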
The Online 3D Bin Packing Problem (O3D-BPP) is gaining renewed prominence due to the growing pace of industrial automation. However, because of limited attention in the past and its challenging nature, good approximation algorithms are scarce compared to the 1D or 2D problems. This paper considers an online 3D-BPP with partial information (lookahead) in an automated robotic sortation center. We present two rolling-horizon, mixed-integer linear programming (MILP) cum heuristic based algorithms: MPack (for benchmarking) and MPackLite (for real-time deployment). In addition, we present a framework, OPack, that adapts and improves the performance of BP heuristics by exploiting the information available in an online setting. We then perform a comparative analysis of the BP heuristics (with and without OPack), MPack, and MPackLite on synthetic and industry-provided data with increasing lookahead. MPackLite and the baseline heuristics execute within the bounds of robot operations and hence can be used in real time.
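The paper's algorithms are MILP-based; purely as an illustration of how lookahead can inform the next placement, the sketch below enumerates the arrival orders of the next few visible boxes on a heightmap model of the bin, scores each order greedily, and commits only the first move. The bin dimensions, the scoring rule, and all function names are assumptions.

```python
import itertools
import numpy as np

# Hypothetical rolling-horizon placement rule (not the paper's MILP): among the
# next k visible boxes, try every arrival order, greedily place each box on the
# position that keeps the stack lowest, and commit only the first move.
BIN_L, BIN_W, BIN_H = 10, 10, 10

def best_position(heightmap, l, w, h):
    best = None
    for x in range(BIN_L - l + 1):
        for y in range(BIN_W - w + 1):
            base = heightmap[x:x + l, y:y + w].max()
            if base + h <= BIN_H and (best is None or base + h < best[0]):
                best = (base + h, x, y)
    return best  # (top height, x, y) or None if the box does not fit

def plan_first_move(heightmap, lookahead_boxes):
    best_order, best_score = None, None
    for order in itertools.permutations(lookahead_boxes):
        hm, placed, score = heightmap.copy(), [], 0
        for (l, w, h) in order:
            pos = best_position(hm, l, w, h)
            if pos is None:
                score += BIN_H            # penalty for a box that cannot be placed
                continue
            top, x, y = pos
            hm[x:x + l, y:y + w] = top
            placed.append(((l, w, h), x, y))
            score = max(score, top)       # crude score: final stack height
        if placed and (best_score is None or score < best_score):
            best_order, best_score = placed[0], score
    return best_order                     # the single move committed this step

heightmap = np.zeros((BIN_L, BIN_W), dtype=int)
print(plan_first_move(heightmap, [(4, 4, 3), (6, 5, 2), (3, 3, 5)]))
```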
Modern telecom systems are monitored with performance and system logs from multiple application layers and components. Detecting anomalous events from these logs is key to identifying security breaches, resource over-utilization, critical/fatal errors, etc. Current supervised log anomaly detection frameworks tend to perform poorly on new types or signatures of anomalies with few or unseen samples in the training data. In this work, we propose a meta-learning-based log anomaly detection framework (LogAnMeta) for detecting anomalies from sequences of log events with few samples. LogAnMeta trains a hybrid few-shot classifier in an episodic manner. The experimental results demonstrate the efficacy of our proposed method.
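The abstract does not describe the episodic procedure in detail; as a hedged illustration, the sketch below samples N-way/K-shot episodes over pre-embedded log sequences and classifies query sequences with a nearest-prototype rule. The embedding dimensionality, episode sizes, and the prototype classifier are assumptions rather than the actual LogAnMeta architecture.

```python
import numpy as np

# Hypothetical sketch of episodic few-shot sampling for log anomaly detection:
# each episode draws N classes with K support and Q query sequences.
rng = np.random.default_rng(1)

def make_episode(embedded_logs, labels, n_way=2, k_shot=5, n_query=5):
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.where(labels == c)[0])[:k_shot + n_query]
        support.append(embedded_logs[idx[:k_shot]])
        query.append((embedded_logs[idx[k_shot:]], c))
    return classes, support, query

def prototype_classify(classes, support, query_vec):
    # Nearest-prototype rule: each class is represented by its support mean.
    prototypes = np.stack([s.mean(axis=0) for s in support])
    dists = np.linalg.norm(prototypes - query_vec, axis=1)
    return classes[int(np.argmin(dists))]

# Toy data: 64-d sequence embeddings for "normal" (0) and a rare anomaly type (1).
X = np.concatenate([rng.normal(0, 1, (200, 64)), rng.normal(3, 1, (20, 64))])
y = np.array([0] * 200 + [1] * 20)
classes, support, query = make_episode(X, y)
correct = sum(prototype_classify(classes, support, q) == c
              for qs, c in query for q in qs)
print("episode accuracy:", correct / sum(len(qs) for qs, _ in query))
```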
Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality. Existing methods still struggle in challenging poses where the human body is highly occluded or is closely interacting with the scene. To address this issue, we propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints. To this end, we propose an egocentric depth estimation network to predict the scene depth map from a wide-view egocentric fisheye camera while mitigating the occlusion of the human body with a depth-inpainting network. Next, we propose a scene-aware pose estimation network that projects the 2D image features and estimated depth map of the scene into a voxel space and regresses the 3D pose with a V2V network. The voxel-based feature representation provides the direct geometric connection between 2D image features and scene geometry, and further facilitates the V2V network to constrain the predicted pose based on the estimated scene geometry. To enable the training of the aforementioned networks, we also generated a synthetic dataset, called EgoGTA, and an in-the-wild dataset based on EgoPW, called EgoPW-Scene. The experimental results of our new evaluation sequences show that the predicted 3D egocentric poses are accurate and physically plausible in terms of human-scene interaction, demonstrating that our method outperforms the state-of-the-art methods both quantitatively and qualitatively.
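A minimal numpy sketch of the voxel lifting step described above: per-pixel image features are back-projected with the estimated depth map and scattered into a 3D grid, which is the representation a V2V-style network would then consume. A pinhole camera stands in for the fisheye model, and all grid sizes and intrinsics are made-up values.

```python
import numpy as np

# Hypothetical sketch of voxel lifting: per-pixel image features are pushed into a
# 3D grid using the estimated depth map, giving the direct geometric link between
# 2D features and scene geometry.
H, W, C = 64, 64, 8
GRID, VOX = 32, 0.1                                  # 32^3 voxels of 10 cm
fx = fy = 50.0
cx, cy = W / 2, H / 2

features = np.random.rand(H, W, C).astype(np.float32)
depth = np.full((H, W), 1.5, dtype=np.float32)       # estimated scene depth (meters)

u, v = np.meshgrid(np.arange(W), np.arange(H))
x = (u - cx) * depth / fx                            # back-project pixels to 3D
y = (v - cy) * depth / fy
pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

voxel_feat = np.zeros((GRID, GRID, GRID, C), dtype=np.float32)
idx = np.floor(pts / VOX).astype(int) + GRID // 2    # center the grid on the camera
valid = np.all((idx >= 0) & (idx < GRID), axis=1)
np.add.at(voxel_feat, (idx[valid, 0], idx[valid, 1], idx[valid, 2]),
          features.reshape(-1, C)[valid])
print("non-empty voxels:", int((voxel_feat.sum(axis=-1) > 0).sum()))
# A V2V-style 3D CNN would then regress joint positions from voxel_feat.
```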
The coverage of different stakeholders mentioned in the news articles significantly impacts the slant or polarity detection of the concerned news publishers. For instance, the pro-government media outlets would give more coverage to the government stakeholders to increase their accessibility to the news audiences. In contrast, the anti-government news agencies would focus more on the views of the opponent stakeholders to inform the readers about the shortcomings of government policies. In this paper, we address the problem of stakeholder extraction from news articles and thereby determine the inherent bias present in news reporting. Identifying potential stakeholders in multi-topic news scenarios is challenging because each news topic has different stakeholders. The research presented in this paper utilizes both contextual information and external knowledge to identify the topic-specific stakeholders from news articles. We also apply a sequential incremental clustering algorithm to group the entities with similar stakeholder types. We carried out all our experiments on news articles on four Indian government policies published by numerous national and international news agencies. We also further generalize our system, and the experimental results show that the proposed model can be extended to other news topics.
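A small sketch of the sequential incremental clustering step, assuming each extracted entity comes with a contextual embedding: entities are processed in order and join the first cluster whose centroid is sufficiently similar, otherwise they open a new cluster. The similarity threshold and the toy embeddings are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of sequential incremental clustering over entity embeddings.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def incremental_cluster(entities, embeddings, threshold=0.8):
    clusters = []                        # each: {"centroid": vec, "members": [names]}
    for name, vec in zip(entities, embeddings):
        for c in clusters:
            if cosine(c["centroid"], vec) >= threshold:
                n = len(c["members"])
                c["centroid"] = (c["centroid"] * n + vec) / (n + 1)
                c["members"].append(name)
                break
        else:
            clusters.append({"centroid": vec.copy(), "members": [name]})
    return clusters

rng = np.random.default_rng(7)
gov, farmer = rng.normal(0, 1, 16), rng.normal(0, 1, 16)   # stand-in stakeholder types
entities = ["Ministry of Agriculture", "Prime Minister's Office",
            "farmers' union", "protesting farmers"]
embeddings = [gov + rng.normal(0, 0.05, 16), gov + rng.normal(0, 0.05, 16),
              farmer + rng.normal(0, 0.05, 16), farmer + rng.normal(0, 0.05, 16)]
for c in incremental_cluster(entities, embeddings):
    print(c["members"])
```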
Many problems can be viewed as forms of geospatial search aided by aerial imagery, with examples ranging from detecting poaching activity to human trafficking. We model this class of problems in a visual active search (VAS) framework, which takes as input an image of a broad area, and aims to identify as many examples of a target object as possible. It does this through a limited sequence of queries, each of which verifies whether an example is present in a given region. We propose a reinforcement learning approach for VAS that leverages a collection of fully annotated search tasks as training data to learn a search policy, and combines features of the input image with a natural representation of active search state. Additionally, we propose domain adaptation techniques to improve the policy at decision time when training data is not fully reflective of the test-time distribution of VAS tasks. Through extensive experiments on several satellite imagery datasets, we show that the proposed approach significantly outperforms several strong baselines. Code and data will be made public.
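The sketch below reproduces only the interaction loop of visual active search: the broad-area image is split into grid regions and a policy spends a fixed query budget verifying one region at a time, folding confirmed targets back into the next decision. The hand-written scoring rule is a stand-in for the learned RL policy, and every constant is an assumption.

```python
import numpy as np

# Hypothetical sketch of the VAS query loop with a heuristic stand-in policy.
rng = np.random.default_rng(3)
GRID = 6
targets = rng.random((GRID, GRID)) < 0.15                 # hidden ground truth
prior = targets * 0.5 + rng.random((GRID, GRID)) * 0.5    # noisy image-derived prior

def choose_region(prior, queried, found):
    score = prior.copy()
    score[queried] = -np.inf                               # never re-query a region
    # Crude spatial-correlation bonus: neighbours of confirmed targets look better.
    for (i, j) in found:
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ni, nj = i + di, j + dj
            if 0 <= ni < GRID and 0 <= nj < GRID and not queried[ni, nj]:
                score[ni, nj] += 0.3
    return np.unravel_index(np.argmax(score), score.shape)

queried = np.zeros((GRID, GRID), dtype=bool)
found, budget = [], 10
for _ in range(budget):
    i, j = choose_region(prior, queried, found)
    queried[i, j] = True
    if targets[i, j]:                                      # the query verifies the region
        found.append((i, j))
print(f"found {len(found)} targets with {budget} queries")
```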
We present XKD, a novel self-supervised framework to learn meaningful representations from unlabelled video clips. XKD is trained with two pseudo tasks. First, masked data reconstruction is performed to learn modality-specific representations. Next, self-supervised cross-modal knowledge distillation is performed between the two modalities through teacher-student setups to learn complementary information. To identify the most effective information to transfer and also to tackle the domain gap between audio and visual modalities which could hinder knowledge transfer, we introduce a domain alignment strategy for effective cross-modal distillation. Lastly, to develop a general-purpose solution capable of handling both audio and visual streams, a modality-agnostic variant of our proposed framework is introduced, which uses the same backbone for both audio and visual modalities. Our proposed cross-modal knowledge distillation improves linear evaluation top-1 accuracy of video action classification by 8.4% on UCF101, 8.1% on HMDB51, 13.8% on Kinetics-Sound, and 14.2% on Kinetics400. Additionally, our modality-agnostic variant shows promising results in developing a general-purpose network capable of handling different data streams. The code is released on the project website.
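As a hedged PyTorch sketch of one cross-modal distillation step, the snippet below has a frozen audio teacher supervise a video student through a projection head with an MSE objective. The tiny MLP encoders, the loss choice, and all dimensions are placeholders; XKD's actual backbones, masked reconstruction task, and domain-alignment strategy are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of one cross-modal distillation step: a frozen teacher from
# one modality supervises the student of the other through a projection head.
class Encoder(nn.Module):
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))
    def forward(self, x):
        return self.net(x)

video_student = Encoder(in_dim=512)
audio_teacher = Encoder(in_dim=128)
proj = nn.Linear(128, 128)                      # aligns student space to teacher space
opt = torch.optim.Adam(list(video_student.parameters()) + list(proj.parameters()), lr=1e-4)

video_feats = torch.randn(32, 512)              # placeholder clip-level features
audio_feats = torch.randn(32, 128)

with torch.no_grad():                           # teacher provides targets only
    target = audio_teacher(audio_feats)
student_emb = proj(video_student(video_feats))
loss = F.mse_loss(student_emb, target)          # distillation objective (audio -> video)
opt.zero_grad(); loss.backward(); opt.step()
print("distillation loss:", float(loss))
# XKD also distills in the opposite direction and adds a domain-alignment term.
```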