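Graph convolutional networks (GCNs) are widely used in many applications, yet they still require large amounts of labeled training data. In addition, the adjacency matrix of a GCN is fixed, so data processing strategies cannot efficiently adjust the quantity of training data drawn from the built graph structure. To further improve the performance and self-learning ability of GCNs, in this paper we propose an efficient self-supervised learning strategy for GCNs, named randomly removed links with a fixed step at one region (RRLFSOR). RRLFSOR can be regarded as a new data augmenter that mitigates over-smoothing. RRLFSOR is examined on two efficient and representative GCN models using three public citation network datasets: Cora, PubMed, and Citeseer. Experiments on transductive link prediction tasks show that our strategy consistently outperforms the baseline models in terms of accuracy on the three benchmark datasets.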
translated by Google Translate
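The graph is a universal data structure that is widely used to organize real-world data. Many real-world networks, such as transportation networks and social and academic networks, can be represented as graphs. Recent years have witnessed rapid progress in representing the vertices of a network in a low-dimensional vector space, which is referred to as network representation learning. Representation learning can facilitate the design of new algorithms on graph data. In this survey, we conduct a comprehensive review of the current literature on network representation learning. Existing algorithms can be categorized into three groups: shallow embedding models, heterogeneous network embedding models, and graph neural network based models. We review state-of-the-art algorithms for each category and discuss the essential differences between them. One advantage of the survey is that we systematically study the theoretical foundations underlying the different categories of algorithms, which offers deep insight into the development of the network representation learning field.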
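Graph neural networks (GNNs) have shown advantages in various graph-based applications. Most existing GNNs assume strong homophily of the graph structure and apply permutation-invariant local aggregation of neighbors to learn a representation for each node. However, they fail to generalize to heterophilic graphs, where most neighboring nodes have different labels or features and the relevant nodes are distant. A few recent studies attempt to address this problem by combining multiple hops of hidden representations of the central node (i.e., multi-hop-based approaches) or by sorting the neighboring nodes based on attention scores (i.e., ranking-based approaches). However, these approaches have some apparent limitations. On the one hand, multi-hop-based approaches do not explicitly distinguish relevant nodes within a large multi-hop neighborhood, which leads to a severe over-smoothing problem. On the other hand, ranking-based models do not optimize node ranking jointly with the end task, which results in sub-optimal solutions. In this work, we present Graph Pointer Neural Networks (GPNN) to tackle the above challenges. We leverage a pointer network to select the most relevant nodes from a large multi-hop neighborhood and to construct an ordered sequence according to their relationship with the central node. A 1D convolution is then applied to extract high-level features from the node sequence. The pointer-network-based ranker in GPNN is optimized jointly with the other components in an end-to-end manner. Extensive experiments are conducted on six public node classification datasets with heterophilic graphs. The results show that GPNN significantly improves on the classification performance of state-of-the-art methods. In addition, our analyses reveal the advantage of the proposed GPNN in filtering out irrelevant neighbors and reducing over-smoothing.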
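Graph neural networks (GNNs) have shown great power in learning on attributed graphs. However, it is still a challenge for GNNs to utilize information far away from the source node. Moreover, general GNNs require graph attributes as input, so they cannot be applied to plain graphs. In this paper, we propose new models named G-GNNs (Global information for GNNs) to address these limitations. First, global structure and attribute features for each node are obtained via unsupervised pre-training, which preserves the global information associated with the node. Then, using the global features and the raw network attributes, we propose a parallel framework of GNNs to learn different aspects of these features. The proposed learning methods can be applied to both plain graphs and attributed graphs. Extensive experiments show that G-GNNs can outperform other state-of-the-art models on three standard evaluation graphs. In particular, our methods establish new benchmark records on Cora (84.31%) and PubMed (80.95%) when learning on attributed graphs.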
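Although graph representation learning (GRL) has made significant progress, extracting and embedding rich topological structure and feature information adequately remains a challenge. Most existing methods focus on local structure and fail to fully incorporate the global topological structure. To this end, we propose a novel Structure-Preserving Graph Representation Learning (SPGRL) method to fully capture the structural information of graphs. Specifically, to reduce the uncertainty and misinformation in the original graph, we construct a feature graph as a complementary view via the k-nearest neighbor method. The feature graph can be used for node-level contrast to capture local relations. In addition, we retain the global topological structure information by maximizing the mutual information (MI) between the whole graph and the feature embeddings, which theoretically reduces to exchanging the feature embeddings of the feature graph and the original graph to reconstruct each other. Extensive experiments show that our method achieves superior performance on the semi-supervised node classification task and excellent robustness under noise perturbation of the graph structure or the node features.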
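As a rough illustration of the k-nearest-neighbor feature graph mentioned in this abstract, the following minimal Python sketch connects each node to its k most similar nodes in feature space. The abstract does not say which similarity measure or which implementation the authors use; cosine similarity, the function names `cosine` and `knn_feature_graph`, and the toy features are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_feature_graph(features, k):
    """Link every node to its k most similar other nodes; return sorted undirected edges.

    Assumption: cosine similarity is used; the source abstract does not specify the measure.
    """
    n = len(features)
    edges = set()
    for i in range(n):
        sims = [(cosine(features[i], features[j]), j) for j in range(n) if j != i]
        # Highest similarity first; ties are broken by the smaller node index.
        sims.sort(key=lambda t: (-t[0], t[1]))
        for _, j in sims[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Toy node features (hypothetical): nodes 0 and 1 are similar, as are nodes 2 and 3.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(knn_feature_graph(features, k=1))
```

The resulting edge list depends only on node features, so it can serve as a second view alongside the original adjacency. The pairwise loop is quadratic in the number of nodes, so larger graphs would need an approximate nearest-neighbor search instead.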
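Graph neural networks (GNNs) are the dominant paradigm for modeling and processing graph-structured data by learning universal node representations. The traditional way of training GNNs depends on a large amount of labeled data, which is costly in both money and time. In some special scenarios, labeled data is not available at all. Self-supervised representation learning, which can generate labels from the graph-structured data itself, is a potential approach to this problem. Self-supervised learning on heterogeneous graphs is more challenging than on homogeneous graphs, and it has also been studied less. In this paper, we propose a self-supervised learning method for heterogeneous graphs via structure information based on metapath (SESIM). The proposed model constructs pretext tasks by predicting the number of jumps between nodes in each metapath, in order to improve the representation ability of the primary task. To predict jump numbers, SESIM uses the data itself to generate labels, which avoids time-consuming manual labeling. Moreover, predicting the jump number in each metapath makes effective use of graph structure information, which is an essential property between nodes. SESIM therefore deepens the model's understanding of graph structure. Finally, we train the primary task and the pretext tasks jointly, and use meta-learning to balance the contribution of the pretext tasks to the primary task. Empirical results validate the performance of SESIM and demonstrate that the method can improve the representation ability of traditional neural networks on link prediction and node classification tasks.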
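The jump-number pretext task in this abstract derives its labels from the graph itself. The sketch below shows one way such labels could be generated, using breadth-first search over a single metapath-induced neighborhood. It is not the SESIM implementation: the helper names `hop_counts` and `pretext_labels`, the `max_hops` cutoff, and the toy author-paper-author adjacency are all hypothetical.

```python
from collections import deque


def hop_counts(adj, source, max_hops):
    """Breadth-first search from `source`; map each node within `max_hops` to its hop count."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the cutoff
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pretext_labels(adj, max_hops):
    """Build (u, v, hops) triples: self-generated labels for a hop-prediction pretext task."""
    triples = []
    for u in sorted(adj):
        for v, d in sorted(hop_counts(adj, u, max_hops).items()):
            if v != u:
                triples.append((u, v, d))
    return triples


# Hypothetical adjacency induced by one metapath (e.g. author-paper-author).
apa = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2", "a4"], "a4": ["a3"]}
for triple in pretext_labels(apa, max_hops=2):
    print(triple)
```

In a setup like the one the abstract describes, one such label set would be built per metapath, and a model would be trained to predict the hop count from the two node embeddings alongside the primary task.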
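Deep learning on graphs has attracted significant interest recently. However, most work has focused on (semi-)supervised learning, which leads to shortcomings including heavy reliance on labels, poor generalization, and weak robustness. To address these issues, self-supervised learning (SSL), which extracts informative knowledge through well-designed pretext tasks without relying on manual labels, has become a promising and trending learning paradigm for graph data. Unlike SSL in other domains such as computer vision and natural language processing, SSL on graphs has its own background, design ideas, and taxonomies. Under the umbrella of graph self-supervised learning, we present a timely and comprehensive review of the existing approaches that employ SSL techniques for graph data. We construct a unified framework that mathematically formalizes the paradigm of graph SSL. According to the objectives of the pretext tasks, we divide these approaches into four categories: generation-based, auxiliary-property-based, contrast-based, and hybrid approaches. We further describe the applications of graph SSL across various research fields and summarize the commonly used datasets, evaluation benchmarks, performance comparisons, and open-source code for graph SSL. Finally, we discuss the remaining challenges and potential future directions in this research field.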
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
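Data augmentation has been widely used for image data and language data, but it remains under-explored for graph neural networks (GNNs). Existing methods focus on augmenting graph data from a global perspective and largely fall into two genres: structural manipulation and adversarial training with feature noise injection. However, recent graph data augmentation methods ignore the importance of local information for the message passing mechanism of GNNs. In this work, we introduce local augmentation, which enhances the locality of node representations through their subgraph structures. Specifically, we model data augmentation as a feature generation process. Given a node's features, our local augmentation approach learns the conditional distribution of its neighbors' features and generates additional neighbor features to boost the performance of downstream tasks. Based on local augmentation, we further design a novel framework, LA-GNN, which can be applied to any GNN model in a plug-and-play manner. Extensive experiments and analyses show that local augmentation consistently yields performance improvements for various GNN architectures across a diverse set of benchmarks.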
Graph structure learning (GSL), which aims to learn the adjacency matrix for graph neural networks (GNNs), has shown great potential in boosting the performance of GNNs. Most existing GSL works apply a joint learning framework in which the estimated adjacency matrix and the GNN parameters are optimized for downstream tasks. However, GSL is essentially a link prediction task, whose goal may differ considerably from that of the downstream task. The inconsistency between these two goals prevents GSL methods from learning the potentially optimal graph structure. Moreover, the joint learning framework suffers from scalability issues in both time and space when estimating and optimizing the adjacency matrix. To mitigate these issues, we propose a graph structure refinement (GSR) framework with a pretrain-finetune pipeline. Specifically, the pre-training phase aims to comprehensively estimate the underlying graph structure with a multi-view contrastive learning framework that uses both intra- and inter-view link prediction tasks. Then, the graph structure is refined by adding and removing edges according to the edge probabilities estimated by the pre-trained model. Finally, the fine-tuning GNN is initialized from the pre-trained model and optimized toward downstream tasks. Because the refined graph structure remains static during fine-tuning, GSR avoids estimating and optimizing the graph structure in the fine-tuning phase, and therefore enjoys great scalability and efficiency. Moreover, the fine-tuning GNN is boosted by both migrating knowledge and refining graphs. Extensive experiments are conducted to evaluate the effectiveness (best performance on six benchmark datasets), efficiency, and scalability (13.8x faster using 32.8% GPU memory compared to the best GSL baseline on Cora) of the proposed model.
Data-efficient learning on graphs (GEL) is essential in real-world applications. Existing GEL methods focus on learning useful representations for nodes, edges, or entire graphs with "small" labeled data. But the problem of data-efficient learning for subgraph prediction has not been explored. The challenges of this problem lie in the following aspects: 1) It is crucial for subgraphs to learn positional features to acquire structural information in the base graph in which they exist. Although the existing subgraph neural network method is capable of learning disentangled position encodings, the overall computational complexity is very high. 2) Prevailing graph augmentation methods for GEL, including rule-based, sample-based, adaptive, and automated methods, are not suitable for augmenting subgraphs because a subgraph contains fewer nodes but richer information such as position, neighbor, and structure. Subgraph augmentation is more susceptible to undesirable perturbations. 3) Only a small number of nodes in the base graph are contained in subgraphs, which leads to a potential "bias" problem that the subgraph representation learning is dominated by these "hot" nodes. By contrast, the remaining nodes fail to be fully learned, which reduces the generalization ability of subgraph representation learning. In this paper, we aim to address the challenges above and propose a Position-Aware Data-Efficient Learning framework for subgraph neural networks called PADEL. Specifically, we propose a novel node position encoding method that is anchor-free, and design a new generative subgraph augmentation method based on a diffused variational subgraph autoencoder, and we propose exploratory and exploitable views for subgraph contrastive learning. Extensive experiment results on three real-world datasets show the superiority of our proposed method over state-of-the-art baselines.
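Graph representation learning (GRL) is critical for the analysis of graph-structured data. However, most existing graph neural networks (GNNs) rely heavily on label information, which is usually expensive to obtain in the real world. Existing unsupervised GRL methods suffer from certain limitations, such as heavy reliance on monotone contrastiveness and limited scalability. To overcome these problems, and in light of recent advances in graph contrastive learning, we introduce a novel self-supervised graph representation learning algorithm based on graph contrastive adjusted zooming, namely G-Zoom, which learns node representations by leveraging the proposed adjusted zooming scheme. Specifically, this mechanism enables G-Zoom to explore and extract self-supervision signals from a graph at multiple scales: micro (i.e., node level), meso (i.e., neighborhood level), and macro (i.e., subgraph level). First, we generate two augmented views of the input graph via two different graph augmentations. Then, we progressively establish three different kinds of contrastiveness at the above three scales, from the node level through the neighborhood level to the subgraph level, where we maximize the agreement between graph representations across scales. While valuable clues can be extracted from a given graph from the micro and macro perspectives, the neighborhood-level contrastiveness, based on our adjusted zooming scheme, gives G-Zoom a customizable option to manually choose an optimal viewpoint between the micro and macro perspectives in order to better understand the graph data. In addition, to make our model scalable to large graphs, we employ a parallel graph diffusion approach to decouple model training from the graph size. We have conducted extensive experiments on real-world datasets, and the results demonstrate that our proposed model consistently outperforms state-of-the-art methods.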
Graph Neural Networks (GNNs) have attracted increasing attention in recent years and have achieved excellent performance in semi-supervised node classification tasks. The success of most GNNs relies on one fundamental assumption, i.e., the original graph structure data is available. However, recent studies have shown that GNNs are vulnerable to the complex underlying structure of the graph, making it necessary to learn comprehensive and robust graph structures for downstream tasks, rather than relying only on the raw graph structure. In light of this, we seek to learn optimal graph structures for downstream tasks and propose a novel framework for semi-supervised classification. Specifically, based on the structural context information of graph and node representations, we encode the complex interactions in semantics and generate semantic graphs to preserve the global structure. Moreover, we develop a novel multi-measure attention layer to optimize the similarity rather than prescribing it a priori, so that the similarity can be adaptively evaluated by integrating measures. These graphs are fused and optimized together with GNN towards semi-supervised classification objective. Extensive experiments and ablation studies on six real-world datasets clearly demonstrate the effectiveness of our proposed model and the contribution of each component.
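Self-supervised learning has recently attracted a lot of attention because of its independence from labels and its robustness. Current studies on this topic mainly use static information such as graph structure, but cannot capture dynamic information such as edge timestamps well. Real-world graphs are often dynamic, which means that the interactions between nodes occur at specific times. This paper proposes a self-supervised dynamic graph representation learning framework (DySubC), which defines a temporal subgraph contrastive learning task to learn the structural and evolutionary features of a dynamic graph simultaneously. Specifically, a novel temporal subgraph sampling strategy is first proposed, which takes each node of the dynamic graph as a central node and uses both the neighborhood structure and the edge timestamps to sample the corresponding temporal subgraph. After the nodes in each subgraph are encoded, a subgraph representation function is designed according to the influence of the neighborhood nodes on the central node. Finally, structural and temporal contrastive losses are defined to maximize the mutual information between the node representation and the temporal subgraph representation. Experiments on five real-world datasets demonstrate that (1) DySubC performs better than the related baselines, including two graph contrastive learning models and four dynamic graph representation learning models, on the downstream link prediction task, and (2) the use of temporal information not only yields more effective sampled subgraphs, but also leads to better representations through the temporal contrastive loss.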
Network embedding (NE) approaches have emerged as a predominant technique to represent complex networks and have benefited numerous tasks. However, most NE approaches rely on a homophily assumption to learn embeddings with the guidance of supervisory signals, leaving the unsupervised heterophilous scenario relatively unexplored. This problem becomes especially relevant in fields where a scarcity of labels exists. Here, we formulate the unsupervised NE task as an r-ego network discrimination problem and develop the SELENE framework for learning on networks with homophily and heterophily. Specifically, we design a dual-channel feature embedding pipeline to discriminate r-ego networks using node attributes and structural information separately. We employ heterophily adapted self-supervised learning objective functions to optimise the framework to learn intrinsic node embeddings. We show that SELENE's components improve the quality of node embeddings, facilitating the discrimination of connected heterophilous nodes. Comprehensive empirical evaluations on both synthetic and real-world datasets with varying homophily ratios validate the effectiveness of SELENE in homophilous and heterophilous settings showing an up to 12.52% clustering accuracy gain.
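We present Topology Transformation Equivariant Representation learning, a general paradigm of self-supervised learning for node representations of graph data, intended to enable the wide applicability of graph convolutional neural networks (GCNNs). We formalize the proposed model from an information-theoretic perspective by maximizing the mutual information between topology transformations and the node representations before and after the transformations. We show that maximizing this mutual information can be relaxed to minimizing the cross entropy between the applied topology transformation and its estimate from the node representations. In particular, we sample a subset of node pairs from the original graph and flip the edge connectivity between each pair to transform the graph topology. Then, we self-train a representation encoder to learn node representations by reconstructing the topology transformation from the feature representations of the original and transformed graphs. In experiments, we apply the proposed model to downstream node classification, graph classification, and link prediction tasks, and the results show that the proposed method outperforms existing unsupervised approaches.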
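The topology transformation in this abstract (sample a subset of node pairs, then flip the edge connectivity of each pair) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sampling is uniform over all unordered node pairs, and the function name `flip_edges` and the label strings are hypothetical. The abstract does not specify the sampling distribution or the encoder that later reconstructs the flips.

```python
import random
from itertools import combinations


def flip_edges(num_nodes, edges, num_pairs, rng):
    """Sample `num_pairs` node pairs and flip the connectivity of each.

    Returns the transformed edge list and a dict mapping each sampled pair to
    "added" or "removed", which could serve as the reconstruction target.
    Assumption: pairs are sampled uniformly from all unordered node pairs.
    """
    edge_set = {tuple(sorted(e)) for e in edges}
    all_pairs = list(combinations(range(num_nodes), 2))
    sampled = rng.sample(all_pairs, num_pairs)
    flipped = set(edge_set)
    labels = {}
    for pair in sampled:
        if pair in flipped:
            flipped.remove(pair)   # an existing edge is deleted
            labels[pair] = "removed"
        else:
            flipped.add(pair)      # a missing edge is inserted
            labels[pair] = "added"
    return sorted(flipped), labels


# Toy path graph on four nodes (hypothetical data).
original = [(0, 1), (1, 2), (2, 3)]
transformed, labels = flip_edges(4, original, num_pairs=2, rng=random.Random(0))
print(transformed)
print(labels)
```

Enumerating all node pairs is quadratic in the number of nodes, which is fine for a toy graph but would need a sparser sampling scheme on large graphs.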
Over-fitting and over-smoothing are two main obstacles of developing deep Graph Convolutional Networks (GCNs) for node classification. In particular, over-fitting weakens the generalization ability on small dataset, while over-smoothing impedes model training by isolating output representations from the input features with the increase in network depth. This paper proposes DropEdge, a novel and flexible technique to alleviate both issues. At its core, DropEdge randomly removes a certain number of edges from the input graph at each training epoch, acting like a data augmenter and also a message passing reducer. Furthermore, we theoretically demonstrate that DropEdge either reduces the convergence speed of over-smoothing or relieves the information loss caused by it. More importantly, our DropEdge is a general skill that can be equipped with many other backbone models (e.g. GCN, ResGCN, GraphSAGE, and JKNet) for enhanced performance. Extensive experiments on several benchmarks verify that DropEdge consistently improves the performance on a variety of both shallow and deep GCNs. The effect of DropEdge on preventing over-smoothing is empirically visualized and validated as well. Codes are released on https://github.com/DropEdge/DropEdge.
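The core operation described in this abstract, randomly removing a certain number of edges from the input graph at each training epoch, might look like the following minimal sketch. It is not the released DropEdge code: the function name `drop_edge`, the `drop_rate` parameter, and the toy edge list are assumptions, and the GCN training step is left as a comment.

```python
import random


def drop_edge(edges, drop_rate, rng):
    """Return a copy of `edges` with a `drop_rate` fraction of them removed at random."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    num_keep = len(edges) - int(len(edges) * drop_rate)
    # Sample the surviving edge indices without replacement, keeping the original order.
    keep_idx = sorted(rng.sample(range(len(edges)), num_keep))
    return [edges[i] for i in keep_idx]


# Toy undirected edge list (hypothetical data).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
rng = random.Random(0)
for epoch in range(3):
    # A fresh random subgraph is drawn every epoch; the full graph is used at test time.
    epoch_edges = drop_edge(edges, drop_rate=0.5, rng=rng)
    # A GCN forward and backward pass on `epoch_edges` would go here.
    print(epoch, epoch_edges)
```

Because each epoch sees a different sparsified graph, the model is trained on many perturbed versions of the same data, which is the sense in which the abstract calls the technique a data augmenter.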
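Graph neural networks (GNNs) have shown their ability to model graph-structured data. However, real-world graphs usually contain structural noise and have limited labeled nodes. The performance of GNNs drops significantly when they are trained on such graphs, which hinders the adoption of GNNs in many applications. It is therefore important to develop noise-resistant GNNs that work with limited labeled nodes, yet work on this topic is rather limited. We thus study the novel problem of developing robust GNNs on noisy graphs with limited labeled nodes. Our analysis shows that both noisy edges and limited labeled nodes can harm the message passing mechanism of GNNs. To mitigate these issues, we propose a novel framework that adopts the noisy edges as supervision to learn a denoised and dense graph. The learned graph can down-weight or eliminate noisy edges and facilitate message passing in GNNs, which alleviates the issue of limited labeled nodes. The generated edges are further used to regularize the predictions of unlabeled nodes with label smoothness to better train GNNs. Experimental results on real-world datasets demonstrate the robustness of the proposed framework on noisy graphs with limited labeled nodes.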
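Methods based on graph neural networks (GNNs) have recently become a popular tool for handling graph data because of their ability to incorporate structural information. The only hurdle to the performance of GNNs is the lack of labeled data. Data augmentation techniques for image and text data cannot be used for graph data because of its complex, non-Euclidean structure. This gap has pushed researchers to turn their attention to developing data augmentation techniques for graph data. Most of the proposed graph data augmentation (GDA) techniques are task-specific. In this paper, we survey the existing GDA techniques according to the different graph tasks. This survey not only provides a reference for the GDA research community but also provides the necessary information to researchers in other domains.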
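Unsupervised graph representation learning is a non-trivial topic for graph data. The success of contrastive learning and self-supervised learning in the unsupervised representation learning of structured data has inspired similar attempts on graphs. Current unsupervised graph representation learning and pre-training with a contrastive loss are mainly based on the contrast between handcrafted augmentations of the graph data. However, graph data augmentation is still not well explored because of unpredictable invariance. In this paper, we propose a novel collaborative graph neural network contrastive learning framework (CGCL), which uses multiple graph encoders to observe the graph. The features observed from the different views act as the graph augmentation for contrastive learning between the graph encoders, which avoids any perturbation and thereby guarantees invariance. CGCL is capable of handling both graph-level and node-level representation learning. Extensive experiments demonstrate the advantages of CGCL in unsupervised graph representation learning and show that handcrafted compositions of data augmentations are not necessary for graph representation learning.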