Smart Paper Notes

Unlearning Nonlinear Graph Classifiers in the Limited Training Data Regime

Chao Pan, Eli Chien, Olgica Milenkovic

Category: Machine Learning

2022-11-06

As the demand for user privacy grows, controlled data removal (machine unlearning) is becoming an important feature of machine learning models for data-sensitive Web applications such as social networks and recommender systems. Nevertheless, it is still largely unknown how to perform efficient machine unlearning of graph neural networks (GNNs); this is especially the case when the number of training samples is small, in which case unlearning can seriously compromise the performance of the model. To address this issue, we initiate the study of unlearning the Graph Scattering Transform (GST), a mathematical framework that is efficient, provably stable under feature or graph topology perturbations, and offers graph classification performance comparable to that of GNNs. Our main contribution is the first known nonlinear approximate graph unlearning method based on GSTs. Our second contribution is a theoretical analysis of the computational complexity of the proposed unlearning mechanism, which is hard to replicate for deep neural networks. Our third contribution is a set of extensive simulation results, which show that, compared to complete retraining of GNNs after each removal request, the new GST-based approach offers, on average, a $10.38$x speed-up and leads to a $2.6$% increase in test accuracy during unlearning of $90$ out of $100$ training graphs from the IMDB dataset ($10$% training ratio).

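The abstract does not spell out the unlearning update. The sketch below is a minimal, hypothetical illustration of why a GST pipeline can be cheap to unlearn, under assumptions not taken from the paper: (i) a two-layer scattering transform built from lazy-random-walk diffusion wavelets, $P = (I + AD^{-1})/2$, $\Psi_0 = I - P$, $\Psi_j = P^{2^{j-1}} - P^{2^j}$, with a modulus nonlinearity and mean pooling over nodes; (ii) an $L_2$-regularized logistic-regression head trained on these fixed features; (iii) removal of one training graph by a single Newton step on the objective of the remaining graphs, in the spirit of certified removal for linear models but without its noise or certification budget. The function names and the toy path-versus-star graphs are illustrative only.

```python
import math
from statistics import fmean

# Hypothetical minimal GST: lazy-random-walk diffusion wavelets, modulus, mean pooling.

def diffuse(adj, x, k):
    """Apply the lazy random-walk operator P = (I + A D^-1) / 2 to signal x, k times."""
    n = len(adj)
    deg = [max(sum(row), 1) for row in adj]  # guard against isolated nodes
    for _ in range(k):
        x = [0.5 * x[i] + 0.5 * sum(adj[i][j] * x[j] / deg[j] for j in range(n))
             for i in range(n)]
    return x

def wavelet(adj, x, j):
    """Diffusion wavelet Psi_0 = I - P, Psi_j = P^(2^(j-1)) - P^(2^j), applied to x."""
    lo = x if j == 0 else diffuse(adj, x, 2 ** (j - 1))
    hi = diffuse(adj, x, 2 ** j)
    return [a - b for a, b in zip(lo, hi)]

def gst_features(adj, x, J=3):
    """Scattering features: node means of x, |Psi_j x| and |Psi_k |Psi_j x||."""
    feats = [fmean(x)]  # zeroth order; equals 1.0 (a bias term) when x is all ones
    for j in range(J):
        u = [abs(v) for v in wavelet(adj, x, j)]
        feats.append(fmean(u))
        feats.extend(fmean([abs(v) for v in wavelet(adj, u, k)]) for k in range(J))
    return feats

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def newton_step(w, F, Y, lam):
    """One Newton step on sum_i log(1 + exp(-y_i w.f_i)) + lam/2 ||w||^2."""
    d = len(w)
    g = [lam * wi for wi in w]
    H = [[lam if a == b else 0.0 for b in range(d)] for a in range(d)]
    for f, y in zip(F, Y):
        s = sigmoid(-y * dot(w, f))  # gradient weight of this sample
        for a in range(d):
            g[a] -= y * s * f[a]
            for b in range(d):
                H[a][b] += s * (1.0 - s) * f[a] * f[b]
    return [wi - si for wi, si in zip(w, solve(H, g))]

def train(F, Y, lam, iters=25):
    """Minimize the regularized logistic loss with pure Newton iterations from w = 0."""
    w = [0.0] * len(F[0])
    for _ in range(iters):
        w = newton_step(w, F, Y, lam)
    return w

def unlearn(w, F, Y, r, lam):
    """Approximate removal of sample r: one Newton step on the remaining-data objective."""
    return newton_step(w, F[:r] + F[r + 1:], Y[:r] + Y[r + 1:], lam)

def adjacency(n, edges):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

# Hypothetical toy data: paths (+1) vs stars (-1) with an all-ones node signal.
paths = [(adjacency(n, [(i, i + 1) for i in range(n - 1)]), 1) for n in range(4, 12)]
stars = [(adjacency(n, [(0, i) for i in range(1, n)]), -1) for n in range(4, 12)]
F = [gst_features(g, [1.0] * len(g)) for g, _ in paths + stars]
Y = [label for _, label in paths + stars]
lam, r = 1.0, 3  # regularization strength and index of the graph to forget
w_full = train(F, Y, lam)
w_unlearned = unlearn(w_full, F, Y, r, lam)  # one Hessian solve, no retraining
w_retrained = train(F[:r] + F[r + 1:], Y[:r] + Y[r + 1:], lam)
```

Because the scattering features are fixed rather than learned, a removal request only touches the linear head: its cost is one Hessian solve in the feature dimension (13 here) instead of retraining, and `w_unlearned` lands closer to `w_retrained` than `w_full` does. The all-ones signal is chosen deliberately; the degree vector would be a poor node signal for this operator, because $P$ leaves it unchanged and every wavelet coefficient would vanish. This is one plausible reading of where GST-based speed-ups come from; the sketch makes no attempt to reproduce the reported numbers.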

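Graph-structured data is ubiquitous in practice and is often processed using graph neural networks (GNNs). With the adoption of recent laws ensuring the "right to be forgotten", the problem of graph data removal has become of significant importance. To address this problem, we introduce the first known framework for certified graph unlearning of GNNs. In contrast to standard machine unlearning, new analytical and heuristic unlearning challenges arise when dealing with complex graph data. First, three different types of unlearning requests need to be considered: node feature, edge, and node unlearning. Second, to establish provable performance guarantees, one needs to address challenges associated with feature mixing during propagation. The underlying analysis is illustrated on the example of simple graph convolutions (SGC) and their generalized PageRank (GPR) extensions, thereby laying the theoretical foundation for certified unlearning of GNNs. Our empirical studies on six benchmark datasets demonstrate excellent performance-complexity trade-offs when compared to complete retraining methods and to approaches that do not leverage graph information. For example, when unlearning $20$% of the nodes on the Cora dataset, our approach suffers only a $0.1$% loss in test accuracy while offering a $4$x speed-up compared to complete retraining. Our scheme also outperforms unlearning methods that do not leverage graph information, with a $12$% increase in test accuracy at comparable time complexity.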

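Feature mixing is what separates graph unlearning from ordinary unlearning: after $K$ propagation steps, removing a single edge perturbs the propagated features of every node within $K$ hops of its endpoints, not just one training row. The sketch below illustrates this with SGC-style propagation $Z = S^K X$ and its GPR generalization $Z = \sum_k \gamma_k S^k X$, assuming the usual SGC normalization $S = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$. The helper names, the path graph and the features are hypothetical, and the sketch shows only the propagation, not the certified unlearning mechanism itself.

```python
import math

def normalized_adjacency(n, edges):
    """S = D~^{-1/2} (A + I) D~^{-1/2}: adjacency with self-loops, symmetrically normalized."""
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(S, H):
    """Dense product S @ H for lists of lists."""
    return [[sum(S[i][k] * H[k][c] for k in range(len(H))) for c in range(len(H[0]))]
            for i in range(len(S))]

def gpr_propagate(n, edges, X, gammas):
    """Z = sum_k gammas[k] * S^k X; SGC with K hops is gammas = [0] * K + [1]."""
    S = normalized_adjacency(n, edges)
    H = [row[:] for row in X]  # S^0 X
    Z = [[gammas[0] * v for v in row] for row in H]
    for g in gammas[1:]:
        H = matmul(S, H)  # S^k X
        Z = [[z + g * h for z, h in zip(zr, hr)] for zr, hr in zip(Z, H)]
    return Z

# Hypothetical toy graph: the path 0-1-2-3-4-5 with 2-dimensional node features.
n = 6
edges = [(i, i + 1) for i in range(n - 1)]
X = [[1.0, 0.0] if i == 0 else [0.0, 1.0] for i in range(n)]
sgc_2hop = [0.0, 0.0, 1.0]  # K = 2

before = gpr_propagate(n, edges, X, sgc_2hop)
after = gpr_propagate(n, edges[1:], X, sgc_2hop)  # unlearn edge (0, 1)
changed = [i for i in range(n)
           if max(abs(a - b) for a, b in zip(before[i], after[i])) > 1e-12]
print(changed)  # → [0, 1, 2, 3]
```

In this example the perturbation reaches nodes 0 to 3, i.e. the nodes within $K = 2$ hops of an endpoint of the removed edge, while nodes 4 and 5 are untouched; an unlearning update for SGC or GPR therefore has to account for these neighbours as well.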

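Embedding methods for product spaces are powerful techniques for low-distortion and low-dimensional representation of complex data structures. Here, we address the new problem of linear classification in product space forms, i.e., products of Euclidean, spherical, and hyperbolic spaces. First, we describe novel formulations for linear classifiers on a Riemannian manifold using geodesics and Riemannian metrics, which generalize straight lines and inner products in vector spaces. Second, we prove that linear classifiers in $d$-dimensional space forms of any curvature have the same expressive power, i.e., they can shatter exactly $d+1$ points. Third, we formalize linear classifiers in product space forms, describe the first known perceptron and support vector machine classifiers for such spaces, and establish rigorous convergence results for perceptrons. Moreover, we prove that the Vapnik-Chervonenkis dimension of linear classifiers in a product space form of dimension $d$ is at least $d+1$. We support our theoretical findings with simulations on several datasets, including synthetic data, image data, and single-cell RNA sequencing (scRNA-seq) data. The results show that classification of scRNA-seq data in low-dimensional product space forms offers a $\sim15$% performance improvement over classification in Euclidean spaces of the same dimension.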

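The abstract describes linear classifiers on space forms whose decision rule uses the manifold's metric in place of the Euclidean inner product. As a minimal sketch of that idea for a single hyperbolic factor (not the authors' product-space perceptron), assume the hyperboloid model of curvature $-1$, $\{x : \langle x, x\rangle_L = -1,\ x_0 > 0\}$ with $\langle u, v\rangle_L = -u_0 v_0 + \sum_i u_i v_i$, and classify by $\operatorname{sign}\langle w, x\rangle_L$; for $d = 2$ the zero set of this rule is a geodesic of the hyperboloid. The update $w \leftarrow w + yJx$ with $J = \operatorname{diag}(-1, 1, 1)$, the ground-truth normal `w_true` and the toy points are illustrative assumptions.

```python
import math

def minkowski(u, v):
    """Lorentzian inner product <u, v>_L = -u0*v0 + sum_i ui*vi."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

def lift(r, theta):
    """Point on the hyperboloid x0^2 - x1^2 - x2^2 = 1 at geodesic distance r from (1, 0, 0)."""
    return [math.cosh(r), math.sinh(r) * math.cos(theta), math.sinh(r) * math.sin(theta)]

def hyperbolic_perceptron(X, Y, max_epochs=1000):
    """Perceptron with rule sign(<w, x>_L); on a mistake, w <- w + y * J x."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(X, Y):
            if y * minkowski(w, x) <= 0:
                # J x flips the sign of the time-like coordinate x0.
                w = [w[0] - y * x[0]] + [wi + y * xi for wi, xi in zip(w[1:], x[1:])]
                mistakes += 1
        if mistakes == 0:
            break
    return w

# Hypothetical space-like normal (<w, w>_L = 0.75 > 0), used only to label toy points.
w_true = [0.5, 1.0, 0.0]
X, Y = [], []
for r in (0.5, 1.0, 1.5):
    for k in range(12):
        x = lift(r, 2 * math.pi * k / 12)
        s = minkowski(w_true, x)
        if abs(s) >= 0.2:  # keep a margin so the perceptron terminates
            X.append(x)
            Y.append(1 if s > 0 else -1)

w = hyperbolic_perceptron(X, Y)
```

Substituting $v = Jw$ turns the rule into $\operatorname{sign}(v^\top x)$ and the update into $v \leftarrow v + yx$, i.e. the Euclidean perceptron on ambient coordinates, which is why the classical margin-based convergence argument carries over to this toy case.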