离群值检测是一项具有挑战性的活动。文献中提出了几种机器学习技术,以进行异常检测。在本文中,我们为双向gan(Bigan)提出了一种新的培训方法,以检测异常值。为了验证拟议的方法,我们采用拟议的培训方法来培训一个Bigan,以检测正在操纵其纳税申报表的纳税人。对于每个纳税人,我们从他/她提交的纳税申报表中得出六个相关参数和三个比率参数。我们在这九个派生的地面数据集上采用拟议的培训方法来训练Bigan。接下来,我们使用$ encoder $(使用$ encoder $编码此数据集)生成此数据集的潜在表示,并使用$ Generator $(使用$ Generator $解码)再生此数据集,通过提供此潜在表示为输入。对于每个纳税人,计算其基地数据和再生数据之间的余弦相似性。具有较低余弦相似性措施的纳税人是潜在的回程操纵者。我们应用了我们的方法来分析印度特兰加纳政府商业税务部提供的钢铁纳税人数据集。
translated by 谷歌翻译
As social media grows faster, harassment becomes more prevalent which leads to considered fake detection a fascinating field among researchers. The graph nature of data with the large number of nodes caused different obstacles including a considerable amount of unrelated features in matrices as high dispersion and imbalance classes in the dataset. To deal with these issues Auto-encoders and a combination of semi-supervised learning and the GAN algorithm which is called SGAN were used. This paper is deploying a smaller number of labels and applying SGAN as a classifier. The result of this test showed that the accuracy had reached 91\% in detecting fake accounts using only 100 labeled samples.
translated by 谷歌翻译
异常检测是确定不符合正常数据分布的样品。由于异常数据的无法获得,培训监督的深神经网络是一项繁琐的任务。因此,无监督的方法是解决此任务的常见方法。深度自动编码器已被广泛用作许多无监督的异常检测方法的基础。但是,深层自动编码器的一个显着缺点是,它们通过概括重建异常值来提供不足的表示异常检测的表示。在这项工作中,我们设计了一个对抗性框架,该框架由两个竞争组件组成,一个对抗性变形者和一个自动编码器。对抗性变形器是一种卷积编码器,学会产生有效的扰动,而自动编码器是一个深层卷积神经网络,旨在重建来自扰动潜在特征空间的图像。这些网络经过相反的目标训练,在这种目标中,对抗性变形者会产生用于编码器潜在特征空间的扰动,以最大化重建误差,并且自动编码器试图中和这些扰动的效果以最大程度地减少它。当应用于异常检测时,该提出的方法会由于对特征空间的扰动应用而学习语义上的富裕表示。所提出的方法在图像和视频数据集上的异常检测中优于现有的最新方法。
translated by 谷歌翻译
Anomaly detection is a classical problem in computer vision, namely the determination of the normal from the abnormal when datasets are highly biased towards one class (normal) due to the insufficient sample size of the other class (abnormal). While this can be addressed as a supervised learning problem, a significantly more challenging problem is that of detecting the unknown/unseen anomaly case that takes us instead into the space of a one-class, semi-supervised learning paradigm. We introduce such a novel anomaly detection model, by using a conditional generative adversarial network that jointly learns the generation of high-dimensional image space and the inference of latent space. Employing encoder-decoder-encoder sub-networks in the generator network enables the model to map the input image to a lower dimension vector, which is then used to reconstruct the generated output image. The use of the additional encoder network maps this generated image to its latent representation. Minimizing the distance between these images and the latent vectors during training aids in learning the data distribution for the normal samples. As a result, a larger distance metric from this learned data distribution at inference time is indicative of an outlier from that distribution -an anomaly. Experimentation over several benchmark datasets, from varying domains, shows the model efficacy and superiority over previous state-of-the-art approaches.
translated by 谷歌翻译
半监督异常检测旨在使用在正常数据上培训的模型来检测来自正常样本的异常。随着近期深度学习的进步,研究人员设计了高效的深度异常检测方法。现有作品通常使用神经网络将数据映射到更具内容性的表示中,然后应用异常检测算法。在本文中,我们提出了一种方法,DASVDD,它共同学习AutoEncoder的参数,同时最小化其潜在表示上的封闭超球的音量。我们提出了一个异常的分数,它是自动化器的重建误差和距离潜在表示中封闭边距中心的距离的组合。尽量减少这种异常的分数辅助我们在培训期间学习正常课程的潜在分布。包括异常分数中的重建错误确保DESVDD不受常见的极度崩溃问题,因为DESVDD模型不会收敛到映射到潜在表示中的恒定点的常量点。几个基准数据集上的实验评估表明,该方法优于常用的最先进的异常检测算法,同时在不同的异常类中保持鲁棒性能。
translated by 谷歌翻译
As the number of heterogenous IP-connected devices and traffic volume increase, so does the potential for security breaches. The undetected exploitation of these breaches can bring severe cybersecurity and privacy risks. Anomaly-based \acp{IDS} play an essential role in network security. In this paper, we present a practical unsupervised anomaly-based deep learning detection system called ARCADE (Adversarially Regularized Convolutional Autoencoder for unsupervised network anomaly DEtection). With a convolutional \ac{AE}, ARCADE automatically builds a profile of the normal traffic using a subset of raw bytes of a few initial packets of network flows so that potential network anomalies and intrusions can be efficiently detected before they cause more damage to the network. ARCADE is trained exclusively on normal traffic. An adversarial training strategy is proposed to regularize and decrease the \ac{AE}'s capabilities to reconstruct network flows that are out-of-the-normal distribution, thereby improving its anomaly detection capabilities. The proposed approach is more effective than state-of-the-art deep learning approaches for network anomaly detection. Even when examining only two initial packets of a network flow, ARCADE can effectively detect malware infection and network attacks. ARCADE presents 20 times fewer parameters than baselines, achieving significantly faster detection speed and reaction time.
translated by 谷歌翻译
识别异常是指检测不像训练数据分布的样本。许多生成模型已被用于寻找异常,以及其中,基于生成的对抗网络(GaN)的方法目前非常受欢迎。 GANS主要依靠这些模型的丰富上下文信息来识别实际培训分布。在这一类比之后,我们建议了基于GANS -A组合的新型无人监督模型和甘甘。此外,引入了一种新的评分功能,以靶向异常,其中鉴别器的内部表示和发电机的视觉表示的线性组合加上自动化器的编码表示,共同定义所提出的异常得分。该模型进一步评估了诸如SVHN,CIFAR10和MNIST之类的基准数据集以及白血病图像的公共医疗数据集。在所有实验中,我们的模型表现出现有的对应物,同时略微改善推理时间。
translated by 谷歌翻译
In data-driven systems, data exploration is imperative for making real-time decisions. However, big data is stored in massive databases that are difficult to retrieve. Approximate Query Processing (AQP) is a technique for providing approximate answers to aggregate queries based on a summary of the data (synopsis) that closely replicates the behavior of the actual data, which can be useful where an approximate answer to the queries would be acceptable in a fraction of the real execution time. In this paper, we discuss the use of Generative Adversarial Networks (GANs) for generating tabular data that can be employed in AQP for synopsis construction. We first discuss the challenges associated with constructing synopses in relational databases and then introduce solutions to those challenges. Following that, we organized statistical metrics to evaluate the quality of the generated synopses. We conclude that tabular data complexity makes it difficult for algorithms to understand relational database semantics during training, and improved versions of tabular GANs are capable of constructing synopses to revolutionize data-driven decision-making systems.
translated by 谷歌翻译
Supervised classification methods have been widely utilized for the quality assurance of the advanced manufacturing process, such as additive manufacturing (AM) for anomaly (defects) detection. However, since abnormal states (with defects) occur much less frequently than normal ones (without defects) in the manufacturing process, the number of sensor data samples collected from a normal state outweighs that from an abnormal state. This issue causes imbalanced training data for classification models, thus deteriorating the performance of detecting abnormal states in the process. It is beneficial to generate effective artificial sample data for the abnormal states to make a more balanced training set. To achieve this goal, this paper proposes a novel data augmentation method based on a generative adversarial network (GAN) using additive manufacturing process image sensor data. The novelty of our approach is that a standard GAN and classifier are jointly optimized with techniques to stabilize the learning process of standard GAN. The diverse and high-quality generated samples provide balanced training data to the classifier. The iterative optimization between GAN and classifier provides the high-performance classifier. The effectiveness of the proposed method is validated by both open-source data and real-world case studies in polymer and metal AM processes.
translated by 谷歌翻译
在运行时检测新颖类的问题称为开放式检测,对于各种现实世界应用,例如医疗应用,自动驾驶等。在深度学习的背景下进行开放式检测涉及解决两个问题:(i):(i)必须将输入图像映射到潜在表示中,该图像包含足够的信息来检测异常值,并且(ii)必须学习一个可以从潜在表示中提取此信息以识别异常情况的异常评分函数。深度异常检测方法的研究缓慢进展。原因之一可能是大多数论文同时引入了新的表示学习技术和新的异常评分方法。这项工作的目的是通过提供分别衡量表示学习和异常评分的有效性的方法来改善这种方法。这项工作做出了两项方法论贡献。首先是引入甲骨文异常检测的概念,以量化学习潜在表示中可用的信息。第二个是引入Oracle表示学习,该学习产生的表示形式可以保证足以准确的异常检测。这两种技术可帮助研究人员将学习表示的质量与异常评分机制的性能分开,以便他们可以调试和改善系统。这些方法还为通过更好的异常评分机制改善了多少开放类别检测提供了上限。两个牙齿的组合给出了任何开放类别检测方法可以实现的性能的上限。这项工作介绍了这两种Oracle技术,并通过将它们应用于几种领先的开放类别检测方法来演示其实用性。
translated by 谷歌翻译
新奇检测是识别不属于目标类分布的样本的任务。在培训期间,缺乏新颖的课程,防止使用传统分类方法。深度自动化器已被广泛用作许多无监督的新奇检测方法的基础。特别地,上下文自动码器在新颖的检测任务中已经成功了,因为他们通过从随机屏蔽的图像重建原始图像来学习的更有效的陈述。然而,上下文AutoEncoders的显着缺点是随机屏蔽不能一致地涵盖输入图像的重要结构,导致次优表示 - 特别是对于新颖性检测任务。在本文中,为了优化输入掩蔽,我们设计了由两个竞争网络,掩模模块和重建器组成的框架。掩码模块是一个卷积的AutoEncoder,用于生成涵盖最重要的图像的最佳掩码。或者,重建器是卷积编码器解码器,其旨在从屏蔽图像重建未受带的图像。网络训练以侵略的方式训练,其中掩模模块生成应用于给予重构的图像的掩码。以这种方式,掩码模块寻求最大化重建错误的重建错误最小化。当应用于新颖性检测时,与上下文自动置换器相比,所提出的方法学习语义上更丰富的表示,并通过更新的屏蔽增强了在测试时间的新颖性检测。 MNIST和CIFAR-10图像数据集上的新奇检测实验证明了所提出的方法对尖端方法的优越性。在用于新颖性检测的UCSD视频数据集的进一步实验中,所提出的方法实现了最先进的结果。
translated by 谷歌翻译
由于能够产生与实际数据的显着统计相似性的高质量数据,生成的对抗性网络(GANS)最近在AI社区中引起了相当大的关注。从根本上,GaN是在训练中以越野方式训练的两个神经网络之间的游戏,以达到零和纳什均衡轮廓。尽管在过去几年中在GAN完成了改进,但仍有几个问题仍有待解决。本文评论了GANS游戏理论方面的文献,并解决了游戏理论模型如何应对生成模型的特殊挑战,提高GAN的表现。我们首先提出一些预备,包括基本GaN模型和一些博弈论背景。然后,我们将分类系统将最先进的解决方案分为三个主要类别:修改的游戏模型,修改的架构和修改的学习方法。分类基于通过文献中提出的游戏理论方法对基本GaN模型进行的修改。然后,我们探讨每个类别的目标,并讨论每个类别的最新作品。最后,我们讨论了这一领域的剩余挑战,并提出了未来的研究方向。
translated by 谷歌翻译
聚类是一项基本的机器学习任务,在文献中已广泛研究。经典聚类方法遵循以下假设:数据通过各种表示的学习技术表示为矢量化形式的特征。随着数据变得越来越复杂和复杂,浅(传统)聚类方法无法再处理高维数据类型。随着深度学习的巨大成功,尤其是深度无监督的学习,在过去的十年中,已经提出了许多具有深层建筑的代表性学习技术。最近,已经提出了深层聚类的概念,即共同优化表示的学习和聚类,因此引起了社区的日益关注。深度学习在聚类中的巨大成功,最基本的机器学习任务之一以及该方向的最新进展的巨大成功所激发。 - 艺术方法。我们总结了深度聚类的基本组成部分,并通过设计深度表示学习和聚类之间的交互方式对现有方法进行了分类。此外,该调查还提供了流行的基准数据集,评估指标和开源实现,以清楚地说明各种实验设置。最后但并非最不重要的一点是,我们讨论了深度聚类的实际应用,并提出了应有的挑战性主题,应将进一步的研究作为未来的方向。
translated by 谷歌翻译
Obtaining models that capture imaging markers relevant for disease progression and treatment monitoring is challenging. Models are typically based on large amounts of data with annotated examples of known markers aiming at automating detection. High annotation effort and the limitation to a vocabulary of known markers limit the power of such approaches. Here, we perform unsupervised learning to identify anomalies in imaging data as candidates for markers. We propose AnoGAN, a deep convolutional generative adversarial network to learn a manifold of normal anatomical variability, accompanying a novel anomaly scoring scheme based on the mapping from image space to a latent space. Applied to new data, the model labels anomalies, and scores image patches indicating their fit into the learned distribution. Results on optical coherence tomography images of the retina demonstrate that the approach correctly identifies anomalous images, such as images containing retinal fluid or hyperreflective foci.
translated by 谷歌翻译
Time series anomaly detection has applications in a wide range of research fields and applications, including manufacturing and healthcare. The presence of anomalies can indicate novel or unexpected events, such as production faults, system defects, or heart fluttering, and is therefore of particular interest. The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey focuses on providing structured and comprehensive state-of-the-art time series anomaly detection models through the use of deep learning. It providing a taxonomy based on the factors that divide anomaly detection models into different categories. Aside from describing the basic anomaly detection technique for each category, the advantages and limitations are also discussed. Furthermore, this study includes examples of deep anomaly detection in time series across various application domains in recent years. It finally summarises open issues in research and challenges faced while adopting deep anomaly detection models.
translated by 谷歌翻译
有限重复的游戏是一个充满活力的游戏,在该游戏中,同时玩的游戏有限多次。GAN包含两个竞争模块:对发电机模块进行了训练以生成新的示例,并训练了判别器模块以区分真实示例与生成的示例。GAN的训练过程是一个有限重复的游戏,每个模块都试图以非合作方式在每个同时游戏的情况下优化其错误。我们观察到,如果在同时游戏的每个实例中,更强大的模块与较弱的模块合作,并且只有较弱的模块只能优化其错误。
translated by 谷歌翻译
我们描述了作为黑暗机器倡议和LES Houches 2019年物理学研讨会进行的数据挑战的结果。挑战的目标是使用无监督机器学习算法检测LHC新物理学的信号。首先,我们提出了如何实现异常分数以在LHC搜索中定义独立于模型的信号区域。我们定义并描述了一个大型基准数据集,由> 10亿美元的Muton-Proton碰撞,其中包含> 10亿美元的模拟LHC事件组成。然后,我们在数据挑战的背景下审查了各种异常检测和密度估计算法,我们在一组现实分析环境中测量了它们的性能。我们绘制了一些有用的结论,可以帮助开发无监督的新物理搜索在LHC的第三次运行期间,并为我们的基准数据集提供用于HTTPS://www.phenomldata.org的未来研究。重现分析的代码在https://github.com/bostdiek/darkmachines-unsupervisedChallenge提供。
translated by 谷歌翻译
We propose a novel reconstruction-based model for anomaly detection, called Y-GAN. The model consists of a Y-shaped auto-encoder and represents images in two separate latent spaces. The first captures meaningful image semantics, key for representing (normal) training data, whereas the second encodes low-level residual image characteristics. To ensure the dual representations encode mutually exclusive information, a disentanglement procedure is designed around a latent (proxy) classifier. Additionally, a novel consistency loss is proposed to prevent information leakage between the latent spaces. The model is trained in a one-class learning setting using normal training data only. Due to the separation of semantically-relevant and residual information, Y-GAN is able to derive informative data representations that allow for efficient anomaly detection across a diverse set of anomaly detection tasks. The model is evaluated in comprehensive experiments with several recent anomaly detection models using four popular datasets, i.e., MNIST, FMNIST and CIFAR10, and PlantVillage.
translated by 谷歌翻译
循环贸易是商品和服务税的逃税形式,其中一组欺诈性纳税人(交易者)的目标是通过在短期内将几项虚拟交易(在商品或服务中添加价值不高)来掩盖非法交易,以掩盖非法交易。。由于纳税人的庞大数据库,当局可以手动识别循环交易者和他们所涉及的非法交易的群体是不可行的。这项工作使用大数据分析和图形表示技术来提出一个框架来识别循环交易者社区并隔离各个社区的非法交易。我们的方法经过印度特兰加纳政府商业税部提供的现实生活数据,在那里我们发现了几个循环商人社区。
translated by 谷歌翻译
可解释的AI(XAI)的最新进展增加了对各个行业中安全和可解释的AI模型部署的需求。尽管深度神经网络在各种领域取得了最新的成功,但了解这种复杂模型的决策过程对于领域专家来说仍然是一项艰巨的任务。尤其是在金融领域,仅指向通常由数百种混合类型列组成的异常,对专家的价值有限。因此,在本文中,我们提出了一个框架,用于解释使用用于混合类型表格数据的Denoisising自动编码器。我们专门将技术集中在错误的观察方面上。这是通过将潜在误差定位的单个样品柱(单元)定位并分配相应的置信度得分来实现的。此外,该模型提供了预期的单元格估计来解决错误。我们根据三个标准的公共表格数据集(信用默认,成人,IEEE欺诈)和一个专有数据集(Holdings)来评估我们的方法。我们发现,适用于此任务的Denoing自动编码器已经在细胞误差检测率和预期价值率中的其他方法都优于其他方法。此外,我们分析了设计用于细胞误差检测的专门损失如何进一步改善这些指标。我们的框架是为域专家设计的,以了解异常的异常特征,并改善内部数据质量管理流程。
translated by 谷歌翻译