Link prediction is a crucial problem in graph-structured data. Due to the recent success of graph neural networks (GNNs), a variety of GNN-based models have been proposed to tackle the link prediction task. Specifically, GNNs leverage the message passing paradigm to obtain node representations, which relies on link connectivity. However, in a link prediction task, links in the training set are always present while those in the testing set are not yet formed, resulting in a discrepancy in the connectivity pattern and a bias in the learned representation. This discrepancy amounts to a dataset shift that degrades model performance. In this paper, we first identify the dataset shift problem in the link prediction task and provide theoretical analyses of how existing link prediction methods are vulnerable to it. We then propose FakeEdge, a model-agnostic technique, to address the problem by mitigating the graph topological gap between training and testing sets. Extensive experiments demonstrate the applicability and superiority of FakeEdge on multiple datasets across various domains.
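In this paper, we provide a theory of using graph neural networks (GNNs) for multi-node representation learning, where we are interested in learning a representation for a set of more than one node. GNNs are designed to learn single-node representations. When we want to learn a node-set representation involving multiple nodes, a common practice in previous works is to directly aggregate the single-node representations obtained by a GNN into a joint node-set representation. In this paper, we show a fundamental limitation of this approach, namely its inability to capture the dependence between nodes in the node set, and argue that directly aggregating individual node representations does not lead to an effective joint representation for multiple nodes. We then observe that several previous successful works on multi-node representation learning, including SEAL, Distance Encoding, and ID-GNN, all use node labeling. These methods first label nodes in the graph according to their relationships with the target node set before applying a GNN. The node representations obtained in the labeled graph are then aggregated into a node-set representation. By investigating their inner mechanisms, we unify these node labeling techniques into a single and most basic form, namely the labeling trick. We prove that with the labeling trick, a sufficiently expressive GNN learns the most expressive node-set representations and thus, in principle, solves any joint learning task over node sets. Experiments on one important two-node representation learning task, link prediction, verify our theory. Our work establishes a theoretical foundation for using GNNs for joint prediction tasks over node sets.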
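The effect of node labeling described above can be illustrated with a small sketch. It is not the paper's implementation: 1-WL colour refinement stands in for a GNN, the labeling used is a simple zero-one scheme (target nodes get label 1, all others 0), and the 6-cycle graph and all function names are assumptions chosen for illustration. Without labels, an adjacent pair and an opposite pair on the cycle receive identical pair representations; with labels they are told apart.

```python
def wl_colors(adj, init, rounds=2):
    """1-WL colour refinement; colours are nested tuples so they compare structurally."""
    colors = dict(init)
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return colors


def pair_representation(adj, pair, use_labels):
    """Aggregate the refined colours of the two target nodes into a pair representation."""
    # Zero-one labeling (illustrative assumption): 1 for target nodes, 0 otherwise.
    init = {v: (1 if use_labels and v in pair else 0) for v in adj}
    colors = wl_colors(adj, init)
    return tuple(sorted(colors[v] for v in pair))


# A 6-cycle: every node looks the same to plain colour refinement.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

adjacent_plain = pair_representation(cycle, (0, 1), use_labels=False)
opposite_plain = pair_representation(cycle, (0, 3), use_labels=False)
adjacent_labeled = pair_representation(cycle, (0, 1), use_labels=True)
opposite_labeled = pair_representation(cycle, (0, 3), use_labels=True)

print(adjacent_plain == opposite_plain)      # the unlabeled pairs are indistinguishable
print(adjacent_labeled == opposite_labeled)  # the labeled pairs are distinguished
```

The design point is that the labels are assigned relative to the target node set before any message passing happens, so the node representations become conditioned on the pair being queried.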
Link prediction is a key problem for network-structured data. Link prediction heuristics use score functions, such as common neighbors and the Katz index, to measure the likelihood of links. They have obtained wide practical use due to their simplicity, interpretability, and, for some of them, scalability. However, every heuristic makes a strong assumption about when two nodes are likely to link, which limits its effectiveness on networks where that assumption fails. In this regard, a more reasonable approach is to learn a suitable heuristic from a given network instead of using predefined ones. By extracting a local subgraph around each target link, we aim to learn a function mapping the subgraph patterns to link existence, thus automatically learning a "heuristic" that suits the current network. In this paper, we study this heuristic learning paradigm for link prediction. First, we develop a novel γ-decaying heuristic theory. The theory unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs. Our results show that local subgraphs preserve rich information related to link existence. Second, based on the γ-decaying theory, we propose a new method to learn heuristics from local subgraphs using a graph neural network (GNN). Its experimental results show unprecedented performance, working consistently well on a wide range of problems.
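For reference, the two heuristics named in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the paper: the toy graph, the function names, the decay factor `beta`, and the truncation length `max_len` are all assumptions. The Katz index is truncated to walks of at most `max_len` steps, which also hints at why such scores can be approximated from a local neighborhood.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, x, y):
    """Number of nodes adjacent to both x and y."""
    return len(adj[x] & adj[y])


def katz_index(adj, x, y, beta=0.1, max_len=4):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * (#walks of length l from x to y)."""
    walks = {x: 1}  # walks[v] = number of walks of the current length from x to v
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = defaultdict(int)
        for v, count in walks.items():
            for w in adj[v]:
                nxt[w] += count
        walks = nxt
        score += (beta ** length) * walks.get(y, 0)
    return score


# Hypothetical toy graph.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)
cn_03 = common_neighbors(adj, 0, 3)  # nodes 1 and 2 are shared
katz_03 = katz_index(adj, 0, 3)      # 2 walks of length 2, 10 walks of length 4
print(cn_03, katz_03)
```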
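Link prediction is an important application of graph neural networks (GNNs). Most existing GNNs for link prediction are based on the one-dimensional Weisfeiler-Lehman (1-WL) test. 1-WL-GNNs first compute node representations by iteratively passing neighboring node features to the center node, and then obtain link representations by aggregating the pairwise node representations. As pointed out by previous works, this two-step procedure results in low discriminating power, because 1-WL-GNNs by nature learn node-level rather than link-level representations. In this paper, we study a completely different approach that directly obtains node-pair (link) representations based on two-dimensional Weisfeiler-Lehman (2-WL) tests. 2-WL tests use links (2-tuples) rather than nodes as the message passing units, and can therefore obtain link representations directly. We theoretically analyze the expressive power of 2-WL tests in discriminating non-isomorphic links, and prove that their link discriminating power is superior to that of 1-WL. Based on different 2-WL variants, we propose a series of novel 2-WL-GNN models for link prediction. Experiments on a wide range of real-world datasets demonstrate their competitive performance against state-of-the-art baselines and their superiority over plain 1-WL-GNNs.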
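Graph neural networks (GNNs) have been widely applied in various fields for learning over graph-structured data. They have shown significant improvements over traditional heuristic methods in tasks such as node classification and graph classification. However, since GNNs rely heavily on smoothed node features rather than graph structure, they often perform worse than simple heuristic methods in link prediction, where structural information (e.g., overlapped neighborhoods, degrees, and shortest paths) is crucial. To address this limitation, we propose Neighborhood Overlap-aware Graph Neural Networks (Neo-GNNs), which learn useful structural features from the adjacency matrix and estimate overlapped neighborhoods for link prediction. Our Neo-GNNs generalize neighborhood overlap-based heuristic methods and handle overlapped multi-hop neighborhoods. Our extensive experiments on Open Graph Benchmark (OGB) datasets demonstrate that Neo-GNNs consistently achieve state-of-the-art performance in link prediction. Our code is publicly available at https://github.com/seongjunyun/neo_gnns.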
Graph neural networks (GNNs) have achieved remarkable success in link prediction (GNNLP) tasks. Existing efforts first predefine the subgraph for the whole dataset and then apply GNNs to encode edge representations by leveraging the neighborhood structure induced by the fixed subgraph. The performance of GNNLP methods relies significantly on this ad hoc subgraph. Since node connectivity in real-world graphs is complex, a single shared subgraph is too limiting for all edges. Thus, the choice of subgraph should be personalized to each edge. However, performing personalized subgraph selection is nontrivial since the potential selection space grows exponentially with the number of edges. Besides, the inference edges are not available during training in link prediction scenarios, so the selection process needs to be inductive. To bridge the gap, we introduce a Personalized Subgraph Selector (PS2) as a plug-and-play framework to automatically, personally, and inductively identify optimal subgraphs for different edges when performing GNNLP. PS2 is instantiated as a bi-level optimization problem that can be efficiently solved differently. Coupling GNNLP models with PS2, we suggest a brand-new angle on GNNLP training: first identifying the optimal subgraphs for edges, and then training the inference model on the sampled subgraphs. Comprehensive experiments endorse the effectiveness of our proposed method across various GNNLP backbones (GCN, GraphSage, NGCF, LightGCN, and SEAL) and diverse benchmarks (Planetoid, OGB, and Recommendation datasets). Our code is publicly available at https://github.com/qiaoyu-tan/PS2
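Link prediction is an important task with wide applications in various domains. However, the majority of existing link prediction approaches assume that the given graph follows the homophily assumption, and design similarity-based heuristics or representation learning approaches to predict links. Many real-world graphs, however, are heterophilic graphs in which the homophily assumption does not hold, which challenges existing link prediction methods. Generally, in heterophilic graphs there are many latent factors causing link formation, and two linked nodes tend to be similar in one or two factors but may be dissimilar in the others, leading to low overall similarity. Thus, one approach is to learn a disentangled representation for each node, with each vector capturing the latent representation of the node on one factor. This paves the way to modeling link formation in heterophilic graphs, resulting in better node representation learning and link prediction performance. However, work on this is rather limited. Therefore, in this paper, we study a novel problem: disentangled representation learning for link prediction on heterophilic graphs. We propose a novel framework, DisenLink, which learns disentangled representations by modeling link formation and performs factor-aware message passing to facilitate link prediction. Extensive experiments on 13 real-world datasets demonstrate the effectiveness of DisenLink for link prediction on both heterophilic and homophilic graphs. Our code is available at https://github.com/sjz5202/disenlink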
Graph machine learning has been extensively studied in both academia and industry. Although booming with a vast number of emerging methods and techniques, most of the literature is built on the in-distribution hypothesis, i.e., testing and training graph data are identically distributed. However, this in-distribution hypothesis can hardly be satisfied in many real-world graph scenarios where the model performance substantially degrades when there exist distribution shifts between testing and training graph data. To solve this critical problem, out-of-distribution (OOD) generalization on graphs, which goes beyond the in-distribution hypothesis, has made great progress and attracted ever-increasing attention from the research community. In this paper, we comprehensively survey OOD generalization on graphs and present a detailed review of recent advances in this area. First, we provide a formal problem definition of OOD generalization on graphs. Second, we categorize existing methods into three classes from conceptually different perspectives, i.e., data, model, and learning strategy, based on their positions in the graph machine learning pipeline, followed by detailed discussions for each category. We also review the theories related to OOD generalization on graphs and introduce the commonly used graph datasets for thorough evaluations. Finally, we share our insights on future research directions. This paper is the first systematic and comprehensive review of OOD generalization on graphs, to the best of our knowledge.
Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
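In this paper, we aim to provide an effective Pairwise Learning Neural Link Prediction (PLNLP) framework. The framework treats link prediction as a pairwise learning-to-rank problem and consists of four main components: a neighborhood encoder, a link predictor, a negative sampler, and an objective function. The framework is flexible in that any generic graph neural convolution or link-prediction-specific neural architecture can be employed as the neighborhood encoder. For the link predictor, we design different scoring functions, which can be selected based on the type of graph. For the negative sampler, we provide several problem-specific sampling strategies. As for the objective function, we propose to use an effective ranking loss that approximately maximizes the standard ranking metric AUC. We evaluate the proposed PLNLP framework on 4 link property prediction datasets of the Open Graph Benchmark: ogbl-ddi, ogbl-collab, ogbl-ppa, and ogbl-citation2. PLNLP achieves top-1 performance on ogbl-ddi, and top-2 performance on ogbl-collab and ogbl-citation2, using only a basic neural architecture. This performance demonstrates the effectiveness of PLNLP.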
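To make the idea of a ranking loss that approximately maximizes AUC concrete, here is a minimal sketch in plain Python. The abstract does not specify the loss, so the pairwise squared-hinge surrogate below is an assumption chosen for illustration, as are the function names, the `margin` parameter, and the example scores. Empirical AUC counts correctly ordered (positive, negative) pairs, which is not differentiable; the surrogate penalizes each pair whose positive score does not exceed the negative score by the margin.

```python
def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def pairwise_squared_hinge(pos_scores, neg_scores, margin=1.0):
    """Mean of max(0, margin - (p - n))**2 over all pairs: a smooth stand-in for 1 - AUC."""
    losses = [
        max(0.0, margin - (p - n)) ** 2
        for p in pos_scores
        for n in neg_scores
    ]
    return sum(losses) / len(losses)


# Hypothetical predicted scores for observed links and sampled non-links.
pos = [0.9, 0.7, 0.4]
neg = [0.5, 0.2]
auc = empirical_auc(pos, neg)            # 5 of the 6 pairs are ordered correctly
loss = pairwise_squared_hinge(pos, neg)
print(auc, loss)
```

In a training loop the scores would come from the link predictor and the loss would be minimized by gradient descent; lowering it pushes positive scores above negative ones, which is what raises AUC.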
Learning node embeddings that capture a node's position within the broader graph structure is crucial for many prediction tasks on graphs. However, existing Graph Neural Network (GNN) architectures have limited power in capturing the position/location of a given node with respect to all other nodes of the graph. Here we propose Position-aware Graph Neural Networks (P-GNNs), a new class of GNNs for computing position-aware node embeddings. P-GNN first samples sets of anchor nodes, computes the distance of a given target node to each anchor-set, and then learns a non-linear distance-weighted aggregation scheme over the anchor-sets. This way P-GNNs can capture positions/locations of nodes with respect to the anchor nodes. P-GNNs have several advantages: they are inductive, scalable, and can incorporate node feature information. We apply P-GNNs to multiple prediction tasks including link prediction and community detection. We show that P-GNNs consistently outperform state of the art GNNs, with up to 66% improvement in terms of the ROC AUC score.
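The anchor-set step described above can be sketched as follows. This is a simplified illustration, not the P-GNN implementation: it stops at raw position features and omits the learned aggregation, and the `1 / (d + 1)` transform, the function names, and the path graph are assumptions. Each node's distance to an anchor-set is taken as the minimum shortest-path distance to any anchor in the set.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def sample_anchor_sets(nodes, num_sets, set_size, seed=0):
    """Randomly sample anchor-sets (sizes and counts here are arbitrary assumptions)."""
    rng = random.Random(seed)
    return [rng.sample(nodes, set_size) for _ in range(num_sets)]


def position_features(adj, anchor_sets):
    """One feature per anchor-set: 1 / (min distance to the set + 1); 0 if unreachable."""
    dist_from = {a: bfs_distances(adj, a) for s in anchor_sets for a in s}
    feats = {}
    for v in adj:
        row = []
        for s in anchor_sets:
            d = min(dist_from[a].get(v, float("inf")) for a in s)
            row.append(1.0 / (d + 1.0))
        feats[v] = row
    return feats


# Path graph 0-1-2-3-4: the two end nodes are structurally symmetric.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
fixed_anchor_sets = [[0], [2, 4]]  # fixed here so the result is reproducible
feats = position_features(path, fixed_anchor_sets)
random_anchor_sets = sample_anchor_sets(sorted(path), num_sets=3, set_size=2)
print(feats[0], feats[4])  # the symmetric end nodes receive different features
```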
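Link prediction is a fundamental problem for graph-structured data (e.g., social networks, drug side-effect networks, etc.). Graph neural networks have offered robust solutions to this problem, specifically by learning a representation of the subgraph enclosing the target link (i.e., the pair of nodes). However, these solutions do not scale well to large graphs, because extracting and operating on enclosing subgraphs is computationally expensive. This paper presents a scalable link prediction solution, which we call ScaLed, that utilizes sparse enclosing subgraphs to make predictions. To extract sparse enclosing subgraphs, ScaLed takes multiple random walks from the target pair of nodes, then operates on the sampled enclosing subgraph induced by all visited nodes. By leveraging the smaller sampled enclosing subgraph, ScaLed can scale to larger graphs with much less overhead while maintaining high accuracy. ScaLed further provides the flexibility to control the trade-off between computation overhead and accuracy. Through comprehensive experiments, we show that ScaLed achieves accuracy comparable to that reported by existing subgraph representation learning frameworks while being less computationally demanding.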
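The subgraph sampling step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the uniform choice of the next node, and the parameters `num_walks` and `walk_len` are hypothetical. Several short random walks start from each of the two target nodes, and the subgraph induced by every visited node is returned.

```python
import random


def sample_enclosing_subgraph(adj, u, v, num_walks=5, walk_len=3, seed=0):
    """Induced subgraph on the nodes visited by random walks started at u and at v."""
    rng = random.Random(seed)
    visited = {u, v}
    for start in (u, v):
        for _ in range(num_walks):
            node = start
            for _ in range(walk_len):
                neighbors = adj[node]
                if not neighbors:
                    break  # isolated node: the walk cannot continue
                node = rng.choice(neighbors)
                visited.add(node)
    # Keep only edges whose endpoints were both visited.
    return {n: [m for m in adj[n] if m in visited] for n in visited}


# Path graph with 20 nodes as a toy input.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)}
sub = sample_enclosing_subgraph(path, 0, 10, num_walks=4, walk_len=2, seed=0)
print(sorted(sub))
```

The number of visited nodes is at most `2 + 2 * num_walks * walk_len` regardless of graph size, so these two parameters give a direct handle on the cost of each query.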
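The most popular design paradigm for graph neural networks (GNNs) is 1-hop message passing: repeatedly aggregating features from 1-hop neighbors. However, the expressive power of 1-hop message passing is bounded by the Weisfeiler-Lehman (1-WL) test. Recently, researchers have extended 1-hop message passing to K-hop message passing by aggregating information from the K-hop neighbors of nodes simultaneously. However, no existing work analyzes the expressive power of K-hop message passing. In this work, we theoretically characterize the expressive power of K-hop message passing. Specifically, we first formally differentiate two kinds of kernels of K-hop message passing, which are often misused in previous works. We then characterize the expressive power of K-hop message passing by showing that it is more powerful than 1-hop message passing. Despite this higher expressive power, we show that K-hop message passing still cannot distinguish some simple regular graphs. To further enhance its expressive power, we introduce the KP-GNN framework, which improves K-hop message passing by leveraging the peripheral subgraph information in each hop. We prove that KP-GNN can distinguish almost all regular graphs, including some distance regular graphs that cannot be distinguished by previous distance encoding methods. Experimental results verify the expressive power and effectiveness of KP-GNN. KP-GNN achieves competitive results across all benchmark datasets.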
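Recently, graph neural networks (GNNs) have gained popularity in a variety of real-world scenarios. Despite their great success, the architecture design of GNNs relies heavily on manual labor. Automated graph neural networks (AutoGNNs) have therefore attracted interest and attention from the research community and have improved significantly in recent years. However, existing AutoGNN works mainly model and exploit the link information in graphs implicitly, which is not well regularized for the link prediction task on graphs and limits the performance of AutoGNNs on other graph tasks. In this paper, we present a novel AutoGNN work that explicitly models the link information, abbreviated as AutoGEL. In this way, AutoGEL can handle the link prediction task and improve the performance of AutoGNNs on node classification and graph classification tasks. Specifically, AutoGEL proposes a novel search space covering various design dimensions at both the intra-layer and inter-layer levels, and adopts a more robust differentiable search algorithm to further improve efficiency and effectiveness. Experimental results on benchmark datasets demonstrate the superiority of AutoGEL on several tasks.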
Graph Neural Networks (GNNs), originally proposed for node classification, have also motivated many recent works on edge prediction (a.k.a. link prediction). However, existing methods lack a careful design that accounts for two frequently overlooked distinctions between the tasks: (i) edges only constitute the topology in the node classification task but can be used as both the topology and the supervision (i.e., labels) in the edge prediction task; (ii) node classification makes a prediction for each individual node, while edge prediction is determined by each pair of nodes. To this end, we propose a novel edge prediction paradigm named Edge-aware Message PassIng neuRal nEtworks (EMPIRE). Concretely, we first introduce an edge splitting technique to specify the use of each edge, where each edge is used solely as either topology or supervision (named a topology edge or a supervision edge). We then develop a new message passing mechanism that generates the messages to source nodes (through topology edges) while being aware of target nodes (through supervision edges). In order to emphasize the differences between pairs connected by supervision edges and unconnected pairs, we further weight the messages to highlight the ones that reflect these differences. In addition, we design a novel negative node-pair sampling trick that efficiently samples 'hard' negative instances among the supervision instances and can significantly improve performance. Experimental results verify that the proposed method significantly outperforms existing state-of-the-art models on the edge prediction task on multiple homogeneous and heterogeneous graph datasets.
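The edge splitting idea, where each observed edge plays exactly one role, can be sketched as a disjoint partition of the edge list. This is only an illustration: the abstract does not say how the split is chosen, so the uniform random split, the function name, and the `supervision_ratio` parameter below are assumptions.

```python
import random


def split_edges(edges, supervision_ratio=0.3, seed=0):
    """Partition edges into topology edges (message passing) and supervision edges (labels)."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_sup = round(len(shuffled) * supervision_ratio)
    supervision = shuffled[:n_sup]
    topology = shuffled[n_sup:]
    return topology, supervision


# Hypothetical edge list: a ring of 10 nodes.
edges = [(i, (i + 1) % 10) for i in range(10)]
topology, supervision = split_edges(edges, supervision_ratio=0.3, seed=0)
print(len(topology), len(supervision))
```

Because the two sets are disjoint, a supervision edge is never visible to message passing while it is being used as a positive label, which more closely matches test time, when the links to be predicted are absent from the graph.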
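Graph neural networks (GNNs) are typically proposed without considering the agnostic distribution shifts between training and testing graphs, which degrades their generalization ability in out-of-distribution (OOD) settings. The fundamental reason for this degradation is that most GNNs are developed under the I.I.D. hypothesis. In such a setting, GNNs tend to exploit subtle statistical correlations present in the training set for prediction, even when these correlations are spurious. Such spurious correlations may change in the testing environment, leading to the failure of GNNs. Eliminating the impact of spurious correlations is therefore crucial for stable GNNs. To this end, we propose a general causal representation framework called StableGNN. The main idea is to first extract high-level representations from graph data and then resort to the distinguishing ability of causal inference to help the model get rid of spurious correlations. In particular, we exploit a graph pooling layer to extract subgraph-based representations as the high-level representations. Furthermore, we propose a causal variable distinguishing regularizer to correct the biased training distribution, so that GNNs concentrate more on stable correlations. Extensive experiments on both synthetic and real-world OOD graph datasets verify the effectiveness, flexibility, and interpretability of the proposed framework.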
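Subgraph-based graph representation learning (SGRL) has recently been proposed to deal with some fundamental challenges encountered by canonical graph neural networks (GNNs), and has demonstrated advantages in many important data science applications such as link, relation, and motif prediction. However, current SGRL approaches suffer from scalability issues because they require extracting a subgraph for each training or test query. Recent solutions that scale up canonical GNNs may not apply to SGRL. Here, we propose a novel framework, SUREL, for scalable SGRL by co-designing the learning algorithm and its system support. SUREL adopts a walk-based decomposition of subgraphs and reuses the walks to form subgraphs, which substantially reduces the redundancy of subgraph extraction and supports parallel computation. Experiments on six homogeneous, heterogeneous, and higher-order graphs with millions of nodes and edges demonstrate the effectiveness and scalability of SUREL. In particular, compared to SGRL baselines, SUREL achieves a 10× speed-up with comparable or even better prediction performance; compared to canonical GNNs, SUREL achieves a 50% improvement in prediction accuracy.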
Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs, a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DIFFPOOL, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DIFFPOOL learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DIFFPOOL yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
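The coarsening step described above can be sketched with plain Python lists. This is a minimal illustration under assumptions: a soft assignment matrix S (row-wise softmax over cluster logits) pools node embeddings as S^T Z and the adjacency matrix as S^T A S, which is the usual way this kind of soft-assignment pooling is written. In the actual method the logits and embeddings would be produced by GNN layers; here they are fixed hypothetical numbers, and all function names are invented for the example.

```python
import math


def softmax_rows(logits):
    """Row-wise softmax: each node gets a probability distribution over clusters."""
    out = []
    for row in logits:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def coarsen(adj, feats, assign_logits):
    """Pool n nodes into c clusters: features S^T Z (c x d) and adjacency S^T A S (c x c)."""
    s = softmax_rows(assign_logits)          # n x c soft assignment
    st = transpose(s)                        # c x n
    pooled_feats = matmul(st, feats)         # c x d
    pooled_adj = matmul(matmul(st, adj), s)  # c x c
    return pooled_adj, pooled_feats, s


# Path graph 0-1-2-3 with 2-dimensional embeddings and 2 clusters (all values hypothetical).
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
feats = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
logits = [[2.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
pooled_adj, pooled_feats, s = coarsen(adj, feats, logits)
print(pooled_adj)
```

Because every row of S sums to one, the total edge weight and the column sums of the features are preserved by the pooling, which gives a quick sanity check on the result.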
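Link prediction aims to infer the existence of a link between a pair of nodes in a network/graph. Despite their wide application, the success of traditional link prediction algorithms is hindered by three major challenges (link sparsity, node attribute noise, and dynamic changes) that are faced by many real-world networks. To address these challenges, we propose a Contextualized Self-Supervised Learning (CSSL) framework that fully exploits structural context prediction for link prediction. The proposed CSSL framework learns a link encoder to infer the link existence probability from paired node embeddings, which are constructed via a transformation of node attributes. To generate informative node embeddings for link prediction, structural context prediction is leveraged as a self-supervised learning task to boost link prediction performance. Two types of structural context are investigated: context nodes collected from random walks, and context subgraphs. The CSSL framework can be trained in an end-to-end manner, with the learning of model parameters supervised by both the link prediction task and the self-supervised learning task. The proposed CSSL is a generic and flexible framework in the sense that it can handle both attributed and non-attributed networks, and can operate under both transductive and inductive link prediction settings. Extensive experiments and ablation studies on seven real-world benchmark networks demonstrate that the proposed self-supervision-based link prediction algorithm outperforms state-of-the-art baselines on different types of networks under both transductive and inductive settings. The proposed CSSL also yields competitive performance in terms of robustness to node attribute noise and scalability to large-scale networks.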
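Graph neural networks (GNNs) have been successfully adopted in recommender systems by virtue of message passing, which implicitly captures the collaborative effect. Nevertheless, most existing message-passing mechanisms for recommendation are directly inherited from GNNs without scrutinizing whether the captured collaborative effect benefits the prediction of user preferences. In this paper, we first analyze how message passing captures the collaborative effect and propose a recommendation-oriented topological metric, the Common Interacted Ratio (CIR), which measures the level of interaction between a specific neighbor of a node and the rest of its neighbors. After demonstrating the benefits of leveraging collaborations from neighbors with higher CIR, we propose a recommendation-tailored GNN, the Collaboration-Aware Graph Convolutional Network (CAGCN), which goes beyond the 1-Weisfeiler-Lehman (1-WL) test in distinguishing non-bipartite-subgraph-isomorphic graphs. Experiments on six benchmark datasets show that the best CAGCN variant outperforms the most representative GNN-based recommendation model, LightGCN, by nearly 10% in Recall@20 and also achieves around 80% speedup. Our code is publicly available at https://github.com/yuwvandy/cagcn.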