Graph Neural Networks (GNNs) are a family of graph networks inspired by mechanisms existing between nodes on a graph. In recent years there has been an increased interest in GNN and their derivatives, i.e., Graph Attention Networks (GAT), Graph Convolutional Networks (GCN), and Graph Recurrent Networks (GRN). An increase in their usability in computer vision is also observed. The number of GNN applications in this field continues to expand; it includes video analysis and understanding, action and behavior recognition, computational photography, image and video synthesis from zero or few shots, and many more. This contribution aims to collect papers published about GNN-based approaches towards computer vision. They are described and summarized from three perspectives. Firstly, we investigate the architectures of Graph Neural Networks and their derivatives used in this area to provide accurate and explainable recommendations for the ensuing investigations. As for the other aspect, we also present datasets used in these works. Finally, using graph analysis, we also examine relations between GNN-based studies in computer vision and potential sources of inspiration identified outside of this field.
translated by 谷歌翻译
最近,图形神经网络已成为机器学习界的热门话题。本文提出了基于SCOPUS,自2004年以来,GNN论文首次发布的基于GNNS研究的概述。该研究旨在评估总量和定性的GNN研究趋势。我们提供了研究,积极和有影响力的作者和机构,出版物来源,最具引用文件和热门话题的趋势。我们的调查表明,该领域中最常见的主题类别是计算机科学,工程,电信,语言学,运营研究和管理科学,信息科学和图书馆学,商业和经济学,自动化和控制系统,机器人和社会科学。此外,GNN出版物最具活跃的来源是计算机科学的讲义。最多产或有影响力的机构在美国,中国和加拿大发现。我们还提供必须阅读论文和未来方向。最后,图表卷积网络和注意机制的应用现在是GNN研究的热门话题。
translated by 谷歌翻译
深度学习技术导致了通用对象检测领域的显着突破,近年来产生了很多场景理解的任务。由于其强大的语义表示和应用于场景理解,场景图一直是研究的焦点。场景图生成(SGG)是指自动将图像映射到语义结构场景图中的任务,这需要正确标记检测到的对象及其关系。虽然这是一项具有挑战性的任务,但社区已经提出了许多SGG方法并取得了良好的效果。在本文中,我们对深度学习技术带来了近期成就的全面调查。我们审查了138个代表作品,涵盖了不同的输入方式,并系统地将现有的基于图像的SGG方法从特征提取和融合的角度进行了综述。我们试图通过全面的方式对现有的视觉关系检测方法进行连接和系统化现有的视觉关系检测方法,概述和解释SGG的机制和策略。最后,我们通过深入讨论当前存在的问题和未来的研究方向来完成这项调查。本调查将帮助读者更好地了解当前的研究状况和想法。
translated by 谷歌翻译
场景图是一个场景的结构化表示,可以清楚地表达场景中对象之间的对象,属性和关系。随着计算机视觉技术继续发展,只需检测和识别图像中的对象,人们不再满足。相反,人们期待着对视觉场景更高的理解和推理。例如,给定图像,我们希望不仅检测和识别图像中的对象,还要知道对象之间的关系(视觉关系检测),并基于图像内容生成文本描述(图像标题)。或者,我们可能希望机器告诉我们图像中的小女孩正在做什么(视觉问题应答(VQA)),甚至从图像中移除狗并找到类似的图像(图像编辑和检索)等。这些任务需要更高水平的图像视觉任务的理解和推理。场景图只是场景理解的强大工具。因此,场景图引起了大量研究人员的注意力,相关的研究往往是跨模型,复杂,快速发展的。然而,目前没有对场景图的相对系统的调查。为此,本调查对现行场景图研究进行了全面调查。更具体地说,我们首先总结了场景图的一般定义,随后对场景图(SGG)和SGG的发电方法进行了全面和系统的讨论,借助于先验知识。然后,我们调查了场景图的主要应用,并汇总了最常用的数据集。最后,我们对场景图的未来发展提供了一些见解。我们相信这将是未来研究场景图的一个非常有帮助的基础。
translated by 谷歌翻译
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
translated by 谷歌翻译
视觉变压器正在成为解决计算机视觉问题的强大工具。最近的技术还证明了超出图像域之外的变压器来解决许多与视频相关的任务的功效。其中,由于其广泛的应用,人类的行动识别是从研究界受到特别关注。本文提供了对动作识别的视觉变压器技术的首次全面调查。我们朝着这个方向分析并总结了现有文献和新兴文献,同时突出了适应变形金刚以进行动作识别的流行趋势。由于其专业应用,我们将这些方法统称为``动作变压器''。我们的文献综述根据其架构,方式和预期目标为动作变压器提供了适当的分类法。在动作变压器的背景下,我们探讨了编码时空数据,降低维度降低,框架贴片和时空立方体构造以及各种表示方法的技术。我们还研究了变压器层中时空注意的优化,以处理更长的序列,通常通过减少单个注意操作中的令牌数量。此外,我们还研究了不同的网络学习策略,例如自我监督和零局学习,以及它们对基于变压器的行动识别的相关损失。这项调查还总结了在具有动作变压器重要基准的评估度量评分方面取得的进步。最后,它提供了有关该研究方向的挑战,前景和未来途径的讨论。
translated by 谷歌翻译
生物医学网络是与疾病网络的蛋白质相互作用的普遍描述符,从蛋白质相互作用,一直到医疗保健系统和科学知识。随着代表学习提供强大的预测和洞察的显着成功,我们目睹了表现形式学习技术的快速扩展,进入了这些网络的建模,分析和学习。在这篇综述中,我们提出了一个观察到生物学和医学中的网络长期原则 - 而在机器学习研究中经常出口 - 可以为代表学习提供概念基础,解释其当前的成功和限制,并告知未来进步。我们综合了一系列算法方法,即在其核心利用图形拓扑到将网络嵌入到紧凑的向量空间中,并捕获表示陈述学习证明有用的方式的广度。深远的影响包括鉴定复杂性状的变异性,单细胞的异心行为及其对健康的影响,协助患者的诊断和治疗以及制定安全有效的药物。
translated by 谷歌翻译
深度学习属于人工智能领域,机器执行通常需要某种人类智能的任务。类似于大脑的基本结构,深度学习算法包括一种人工神经网络,其类似于生物脑结构。利用他们的感官模仿人类的学习过程,深入学习网络被送入(感官)数据,如文本,图像,视频或声音。这些网络在不同的任务中优于最先进的方法,因此,整个领域在过去几年中看到了指数增长。这种增长在过去几年中每年超过10,000多种出版物。例如,只有在医疗领域中的所有出版物中覆盖的搜索引擎只能在Q3 2020中覆盖所有出版物的子集,用于搜索术语“深度学习”,其中大约90%来自过去三年。因此,对深度学习领域的完全概述已经不可能在不久的将来获得,并且在不久的将来可能会难以获得难以获得子场的概要。但是,有几个关于深度学习的综述文章,这些文章专注于特定的科学领域或应用程序,例如计算机愿景的深度学习进步或在物体检测等特定任务中进行。随着这些调查作为基础,这一贡献的目的是提供对不同科学学科的深度学习的第一个高级,分类的元调查。根据底层数据来源(图像,语言,医疗,混合)选择了类别(计算机愿景,语言处理,医疗信息和其他工程)。此外,我们还审查了每个子类别的常见架构,方法,专业,利弊,评估,挑战和未来方向。
translated by 谷歌翻译
人口级社会事件,如民事骚乱和犯罪,往往对我们的日常生活产生重大影响。预测此类事件对于决策和资源分配非常重要。由于缺乏关于事件发生的真实原因和潜在机制的知识,事件预测传统上具有挑战性。近年来,由于两个主要原因,研究事件预测研究取得了重大进展:(1)机器学习和深度学习算法的开发和(2)社交媒体,新闻来源,博客,经济等公共数据的可访问性指标和其他元数据源。软件/硬件技术中的数据的爆炸性增长导致了社会事件研究中的深度学习技巧的应用。本文致力于提供社会事件预测的深层学习技术的系统和全面概述。我们专注于两个社会事件的域名:\ Texit {Civil unrest}和\ texit {犯罪}。我们首先介绍事件预测问题如何作为机器学习预测任务制定。然后,我们总结了这些问题的数据资源,传统方法和最近的深度学习模型的发展。最后,我们讨论了社会事件预测中的挑战,并提出了一些有希望的未来研究方向。
translated by 谷歌翻译
Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.
translated by 谷歌翻译
以图形为中心的人工智能(Graph AI)在建模自然界中普遍存在的相互作用系统(从生物学的动态系统到粒子物理学)方面取得了显着成功。数据的异质性的增加,需要对可以结合多种电感偏见的图形神经体系结构。但是,将来自各种来源的数据组合起来是具有挑战性的,因为适当的归纳偏差可能会因数据模式而异。多模式学习方法融合了多个数据模式,同时利用跨模式依赖性来应对这一挑战。在这里,我们调查了以图形为中心的AI的140项研究,并意识到,使用图越来越多地将各种数据类型汇集在一起​​,并将其馈入复杂的多模型模型。这些模型分为图像,语言和知识接地的多模式学习。我们提出了基于此分类的多模式图学习的算法蓝图。该蓝图是通过选择适当的四个不同组件来处理多模式数据的最先进架构的方法。这项工作可以为标准化精致的多模式体系结构的设计铺平道路,以解决高度复杂的现实世界问题。
translated by 谷歌翻译
图形神经网络(GNNS)已被广泛用于许多域,在这些领域中,数据被表示为图,包括社交网络,推荐系统,生物学,化学等。最近,GNNS的表现力引起了人们的兴趣。已经表明,尽管GNNS在许多应用中取得了有希望的经验结果,但GNN中存在一些局限性,阻碍了他们对某些任务的绩效。例如,由于GNNS更新节点功能主要基于本地信息,因此它们在捕获图中节点之间的长距离依赖性方面具有有限的表达能力。为了解决GNN的一些局限性,最近的几项工作开始探索增强的GNN,并记忆以提高其在相关任务中的表现力。在本文中,我们对现有的记忆启发性GNN的现有文献进行了全面综述。我们通过心理学和神经科学的角度回顾了这些作品,后者已经在生物学大脑中建立了多种记忆系统和机制。我们提出了记忆GNN作品的分类法,以及比较记忆机制的一组标准。我们还提供有关这些作品局限性的重要讨论。最后,我们讨论了该领域的挑战和未来方向。
translated by 谷歌翻译
深度强化学习(DRL)赋予了各种人工智能领域,包括模式识别,机器人技术,推荐系统和游戏。同样,图神经网络(GNN)也证明了它们在图形结构数据的监督学习方面的出色表现。最近,GNN与DRL用于图形结构环境的融合引起了很多关注。本文对这些混合动力作品进行了全面评论。这些作品可以分为两类:(1)算法增强,其中DRL和GNN相互补充以获得更好的实用性; (2)特定于应用程序的增强,其中DRL和GNN相互支持。这种融合有效地解决了工程和生命科学方面的各种复杂问题。基于审查,我们进一步分析了融合这两个领域的适用性和好处,尤其是在提高通用性和降低计算复杂性方面。最后,集成DRL和GNN的关键挑战以及潜在的未来研究方向被突出显示,这将引起更广泛的机器学习社区的关注。
translated by 谷歌翻译
Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequence as compared to recurrent networks e.g., Long short-term memory (LSTM). Different from convolutional networks, Transformers require minimal inductive biases for their design and are naturally suited as set-functions. Furthermore, the straightforward design of Transformers allows processing multiple modalities (e.g., images, videos, text and speech) using similar processing blocks and demonstrates excellent scalability to very large capacity networks and huge datasets. These strengths have led to exciting progress on a number of vision tasks using Transformer networks. This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline. We start with an introduction to fundamental concepts behind the success of Transformers i.e., self-attention, large-scale pre-training, and bidirectional feature encoding. We then cover extensive applications of transformers in vision including popular recognition tasks (e.g., image classification, object detection, action recognition, and segmentation), generative modeling, multi-modal tasks (e.g., visual-question answering, visual reasoning, and visual grounding), video processing (e.g., activity recognition, video forecasting), low-level vision (e.g., image super-resolution, image enhancement, and colorization) and 3D analysis (e.g., point cloud classification and segmentation). We compare the respective advantages and limitations of popular techniques both in terms of architectural design and their experimental value. Finally, we provide an analysis on open research directions and possible future works. We hope this effort will ignite further interest in the community to solve current challenges towards the application of transformer models in computer vision.
translated by 谷歌翻译
Graphs are ubiquitous in nature and can therefore serve as models for many practical but also theoretical problems. For this purpose, they can be defined as many different types which suitably reflect the individual contexts of the represented problem. To address cutting-edge problems based on graph data, the research field of Graph Neural Networks (GNNs) has emerged. Despite the field's youth and the speed at which new models are developed, many recent surveys have been published to keep track of them. Nevertheless, it has not yet been gathered which GNN can process what kind of graph types. In this survey, we give a detailed overview of already existing GNNs and, unlike previous surveys, categorize them according to their ability to handle different graph types and properties. We consider GNNs operating on static and dynamic graphs of different structural constitutions, with or without node or edge attributes. Moreover, we distinguish between GNN models for discrete-time or continuous-time dynamic graphs and group the models according to their architecture. We find that there are still graph types that are not or only rarely covered by existing GNN models. We point out where models are missing and give potential reasons for their absence.
translated by 谷歌翻译
通信网络是当代社会中的重要基础设施。仍存在许多挑战,在该活性研究区域中不断提出新的解决方案。近年来,为了模拟网络拓扑,基于图形的深度学习在通信网络中的一系列问题中实现了最先进的性能。在本调查中,我们使用基于不同的图形的深度学习模型来审查快速增长的研究机构,例如,使用不同的图形深度学习模型。图表卷积和曲线图注意网络,在不同类型的通信网络中的各种问题中,例如,无线网络,有线网络和软件定义的网络。我们还为每项研究提供了一个有组织的问题和解决方案列表,并确定了未来的研究方向。据我们所知,本文是第一个专注于在涉及有线和无线场景的通信网络中应用基于图形的深度学习方法的调查。要跟踪后续研究,创建了一个公共GitHub存储库,其中相关文件将不断更新。
translated by 谷歌翻译
图表神经网络(GNNS)最近在人工智能(AI)领域的普及,这是由于它们作为输入数据相对非结构化数据类型的独特能力。尽管GNN架构的一些元素在概念上类似于传统神经网络(以及神经网络变体)的操作中,但是其他元件代表了传统深度学习技术的偏离。本教程通过整理和呈现有关GNN最常见和性能变种的动机,概念,数学和应用的细节,将GNN的权力和新颖性暴露给AI从业者。重要的是,我们简明扼要地向实际示例提出了本教程,从而为GNN的主题提供了实用和可访问的教程。
translated by 谷歌翻译
translated by 谷歌翻译
人类行动识别是计算机视觉中的重要应用领域。它的主要目的是准确地描述人类的行为及其相互作用,从传感器获得的先前看不见的数据序列中。识别,理解和预测复杂人类行动的能力能够构建许多重要的应用,例如智能监视系统,人力计算机界面,医疗保健,安全和军事应用。近年来,计算机视觉社区特别关注深度学习。本文使用深度学习技术的视频分析概述了当前的动作识别最新识别。我们提出了识别人类行为的最重要的深度学习模型,并分析它们,以提供用于解决人类行动识别问题的深度学习算法的当前进展,以突出其优势和缺点。基于文献中报道的识别精度的定量分析,我们的研究确定了动作识别中最新的深层体系结构,然后为该领域的未来工作提供当前的趋势和开放问题。
translated by 谷歌翻译
Graph mining tasks arise from many different application domains, ranging from social networks, transportation to E-commerce, etc., which have been receiving great attention from the theoretical and algorithmic design communities in recent years, and there has been some pioneering work employing the research-rich Reinforcement Learning (RL) techniques to address graph data mining tasks. However, these graph mining methods and RL models are dispersed in different research areas, which makes it hard to compare them. In this survey, we provide a comprehensive overview of RL and graph mining methods and generalize these methods to Graph Reinforcement Learning (GRL) as a unified formulation. We further discuss the applications of GRL methods across various domains and summarize the method descriptions, open-source codes, and benchmark datasets of GRL methods. Furthermore, we propose important directions and challenges to be solved in the future. As far as we know, this is the latest work on a comprehensive survey of GRL, this work provides a global view and a learning resource for scholars. In addition, we create an online open-source for both interested scholars who want to enter this rapidly developing domain and experts who would like to compare GRL methods.
translated by 谷歌翻译