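Artificial intelligence (AI) has been transforming the practice of drug discovery over the past decade. Various AI techniques have been used in a wide range of applications, such as virtual screening and drug design. In this survey, we first give an overview of drug discovery and discuss related applications, which can be reduced to two major tasks: molecular property prediction and molecule generation. We then discuss common data resources, molecular representations, and benchmark platforms. Furthermore, to summarize the progress of AI in drug discovery, we present the relevant AI techniques, including model architectures and learning paradigms, found in the surveyed papers. We expect this survey to serve as a guide for researchers interested in working at the interface of artificial intelligence and drug discovery. We also provide a GitHub repository (HTTPS:///github.com/dengjianyuan/survey_survey_au_drug_discovery) with the collection of papers and, where applicable, code, as a regularly updated learning resource.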
translated by Google Translate
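Artificial intelligence (AI) has been widely applied in drug discovery, with molecular property prediction as a major task. Despite the boom of AI techniques in molecular representation learning, some key aspects of molecular property prediction have not been carefully examined. In this study, we conduct a systematic comparison of three representative models, random forest, MolBERT, and GROVER, which use three major molecular representations, respectively: extended-connectivity fingerprints, SMILES strings, and molecular graphs. Notably, MolBERT and GROVER are pretrained in a self-supervised manner on large-scale unlabeled molecule corpora. In addition to the commonly used molecular benchmark datasets, we also assemble a suite of opioid-related datasets for downstream prediction evaluation. We first profile the datasets in terms of label distribution and structural analysis; we also examine the activity cliff issue in the opioid-related datasets. We then train 4,320 predictive models and evaluate the usefulness of the learned representations. Furthermore, we explore model evaluation by studying the effect of statistical tests, evaluation metrics, and task settings. Finally, we dissect chemical space generalization into inter-scaffold and intra-scaffold generalization and measure prediction performance to evaluate model generalizability under both settings. By taking this respite, we reflect on the key aspects underlying molecular property prediction, awareness of which can hopefully bring better AI techniques to this field.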
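The difference between inter-scaffold and intra-scaffold generalization can be illustrated with a minimal sketch. It assumes each molecule already carries a precomputed scaffold label; the scaffold strings, the function names, and the "smallest groups go to test first" rule are all hypothetical choices for illustration, not the procedure used in the paper.

```python
from collections import defaultdict


def group_by_scaffold(records):
    """Map each scaffold label to the list of molecule ids that share it."""
    groups = defaultdict(list)
    for mol_id, scaffold in records:
        groups[scaffold].append(mol_id)
    return dict(groups)


def inter_scaffold_split(records, test_fraction=0.25):
    """Hold out whole scaffold groups, so no test scaffold is seen in training."""
    groups = group_by_scaffold(records)
    # Assumption: fill the test set with the smallest scaffold groups first.
    ordered = sorted(groups.items(), key=lambda kv: (len(kv[1]), kv[0]))
    target = test_fraction * len(records)
    train, test = [], []
    for _, members in ordered:
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return train, test


def intra_scaffold_split(records, test_fraction=0.25):
    """Split inside each scaffold group, so test scaffolds also occur in training."""
    train, test = [], []
    for members in group_by_scaffold(records).values():
        n_test = min(int(len(members) * test_fraction), len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical molecules with made-up scaffold labels.
records = [
    ("m1", "benzene"), ("m2", "benzene"), ("m3", "benzene"), ("m4", "benzene"),
    ("m5", "morphinan"), ("m6", "morphinan"), ("m7", "piperidine"), ("m8", "indole"),
]
inter_train, inter_test = inter_scaffold_split(records)
intra_train, intra_test = intra_scaffold_split(records)
print("inter-scaffold test:", inter_test)
print("intra-scaffold test:", intra_test)
```

In the inter-scaffold split, the model is evaluated on scaffolds it has never seen; in the intra-scaffold split, it is evaluated on new molecules built on familiar scaffolds.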
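Generating drug-like molecules with high binding affinity to target proteins remains a difficult and resource-intensive task in drug discovery. Existing approaches primarily rely on reinforcement learning, Markov sampling, or deep generative models guided by Gaussian processes to generate molecules with high binding affinity, where the affinity is computed by computationally expensive physics-based methods. We present Latent Inceptionism on Molecules (LIMO), which significantly accelerates molecule generation with a technique similar to inceptionism. LIMO employs a latent space generated by a variational autoencoder and property prediction by two neural networks in sequence, enabling faster gradient-based reverse optimization of molecular properties. Comprehensive experiments show that LIMO performs competitively on benchmark tasks and markedly outperforms state-of-the-art techniques on the novel task of generating drug-like compounds with high binding affinity, reaching the nanomolar range against two protein targets. We corroborate these docking-based results with more accurate molecular-dynamics-based calculations of absolute binding free energy, and show that one of our generated drug-like compounds has a predicted $K_d$ (a measure of binding affinity) of $6 \cdot 10^{-14}$ M against the human estrogen receptor, well beyond the affinities of typical early-stage drug candidates and most FDA-approved drugs. Code is available at https://github.com/rose-stl-lab/limo.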
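The idea of gradient-based reverse optimization in a latent space can be sketched with a toy example. Everything here is an assumption made for illustration: the "property predictor" is a fixed closed-form function with hypothetical weights rather than a trained network, and the decoder that would map the optimized latent vector back to a molecule is omitted. The point is only that the predictor stays frozen while the latent vector itself is updated along the gradient.

```python
import math

# Frozen toy "property predictor": p(z) = sum_i w_i * tanh(z_i).
# The weights are hypothetical stand-ins for a trained network.
WEIGHTS = [0.8, -0.5, 0.3]


def predict(z):
    """Predicted property value for latent vector z."""
    return sum(w * math.tanh(x) for w, x in zip(WEIGHTS, z))


def gradient(z):
    """Analytic gradient of predict with respect to z."""
    # d/dz_i [w_i * tanh(z_i)] = w_i * (1 - tanh(z_i)^2)
    return [w * (1.0 - math.tanh(x) ** 2) for w, x in zip(WEIGHTS, z)]


def optimize_latent(z0, steps=200, lr=0.1):
    """Gradient ascent on the latent vector; the predictor weights never change."""
    z = list(z0)
    for _ in range(steps):
        g = gradient(z)
        z = [x + lr * gx for x, gx in zip(z, g)]
    return z


z_start = [0.0, 0.0, 0.0]
z_opt = optimize_latent(z_start)
print("before:", predict(z_start))
print("after: ", predict(z_opt))
```

In a real system the analytic gradient would be replaced by automatic differentiation through the trained predictor, and the optimized latent vector would be decoded into a candidate molecule.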
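Biomedical networks are universal descriptors of systems of interacting elements, from protein interactions to disease networks, all the way to healthcare systems and scientific knowledge. With the remarkable success of representation learning in providing powerful predictions and insights, we have witnessed a rapid expansion of representation learning techniques into modeling, analyzing, and learning with such networks. In this review, we put forward the observation that long-standing principles of networks in biology and medicine, though often unspoken in machine learning research, can provide the conceptual grounding for representation learning, explain its current successes and limitations, and inform future advances. We synthesize a spectrum of algorithmic approaches that, at their core, leverage graph topology to embed networks into compact vector spaces, and we capture the breadth of ways in which representation learning is proving useful. Areas of profound impact include identifying variants underlying complex traits, disentangling the behaviors of single cells and their effects on health, assisting in the diagnosis and treatment of patients, and developing safe and effective medicines.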
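Geometric deep learning (GDL), which is based on neural network architectures that incorporate and process symmetry information, has emerged as a recent paradigm in artificial intelligence. GDL holds particular promise in molecular modeling applications, where various molecular representations with different symmetry properties and levels of abstraction exist. This review provides a structured and harmonized overview of molecular GDL, highlighting its applications in drug discovery, chemical synthesis prediction, and quantum chemistry. Emphasis is placed on the relevance of the learned molecular features and their complementarity to well-established molecular descriptors. The review outlines current challenges and opportunities and presents a forecast of the future of GDL for the molecular sciences.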
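Graphs are ubiquitous in encoding relational information about real-world objects in many domains. Graph generation, whose purpose is to generate new graphs from a distribution similar to that of the observed graphs, has received increasing attention thanks to recent advances in deep learning models. In this paper, we conduct a comprehensive review of the existing graph generation literature, from a variety of emerging methods to its wide range of application areas. Specifically, we first formulate the problem of deep graph generation and discuss how it differs from several related graph learning tasks. Second, we divide the state-of-the-art methods into three categories based on model architecture and summarize their generation strategies. Third, we introduce three key application areas of deep graph generation. Finally, we highlight challenges and opportunities for future research on deep graph generation.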
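Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for the expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods. Our method adapts generative adversarial networks (GANs) to operate directly on graph-structured data. We combine our approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, we demonstrate that our model is capable of generating close to 100% valid compounds. MolGAN is compared with recent proposals that use string-based (SMILES) representations of molecules and with a likelihood-based method that directly generates graphs. Code is available at https://github.com/nicola-decao/molgan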
In this work, we propose MEDICO, a Multi-viEw Deep generative model for molecule generation, structural optimization, and the SARS-CoV-2 Inhibitor disCOvery. To the best of our knowledge, MEDICO is the first graph generative model of its kind that can generate molecular graphs similar to the structure of targeted molecules, with a multi-view representation learning framework to sufficiently and adaptively learn comprehensive structural semantics from targeted molecular topology and geometry. We show that MEDICO significantly outperforms state-of-the-art methods in generating valid, unique, and novel molecules under benchmarking comparisons. In particular, we show that the multi-view deep learning model enables us to generate not only molecules structurally similar to the targeted molecules but also molecules with desired chemical properties, demonstrating the strong capability of our model in exploring the chemical space deeply. Moreover, case study results on targeted molecule generation for the SARS-CoV-2 main protease (Mpro) show that by integrating molecule docking into our model as a chemical prior, we successfully generate new small molecules with desired drug-like properties for the Mpro, potentially accelerating the de novo design of Covid-19 drugs. Further, we apply MEDICO to the structural optimization of three well-known Mpro inhibitors (N3, 11a, and GC376) and achieve ~88% improvement in their binding affinity to Mpro, demonstrating the application value of our model for the development of therapeutics for SARS-CoV-2 infection.
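Computational methods that operate on three-dimensional molecular structure have the potential to solve important questions in biology and chemistry. Deep neural networks in particular have gained significant attention, but their widespread adoption in the biomolecular domain has been limited by the lack of systematic performance benchmarks and of a unified toolkit for interacting with molecular data. To address this, we present ATOM3D, a collection of both novel and existing benchmark datasets spanning several key classes of biomolecules. We implement several classes of three-dimensional molecular learning methods for each of these tasks and show that they consistently improve performance relative to methods based on one- and two-dimensional representations. The specific choice of architecture proves to be critical for performance: three-dimensional convolutional networks excel at tasks involving complex geometries, graph networks perform well on systems requiring detailed positional information, and the more recently developed equivariant networks show significant promise. Our results indicate that many molecular problems stand to gain from three-dimensional molecular learning, and that there is potential for improvement on many tasks that remain underexplored. To lower the barrier to entry and facilitate further developments in the field, we also provide a comprehensive suite of tools for dataset processing, model training, and evaluation in our open-source ATOM3D Python package. All datasets are available for download from https://www.atom3d.ai.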
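In drug discovery, the rational design of novel molecules with desired bioactivity is a critical but challenging task, especially when addressing novel target families or understudied targets. Here, we propose PGMG, a pharmacophore-guided deep learning approach for bioactive molecule generation. Through the guidance of pharmacophores, PGMG provides a flexible strategy for generating structurally diverse bioactive molecules in various scenarios using a trained variational autoencoder. We show that PGMG can generate molecules that match a given pharmacophore model while maintaining a high level of validity, uniqueness, and novelty. In case studies, we demonstrate the application of PGMG to generating bioactive molecules in ligand-based and structure-based de novo drug design, as well as in lead optimization scenarios. Overall, the flexibility and effectiveness of PGMG make it a useful tool for accelerating the drug discovery process.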
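Validity, uniqueness, and novelty are usually reported as simple ratios over a batch of generated molecules. The sketch below follows one common convention, which is an assumption here since the abstract does not define the metrics: validity is the share of generated strings that pass a validity check, uniqueness is the share of distinct molecules among the valid ones, and novelty is the share of distinct valid molecules absent from the training set. The `toy_is_valid` check is a hypothetical stand-in (balanced parentheses only), not a real chemistry parser.

```python
def generation_metrics(generated, training_set, is_valid):
    """Compute validity, uniqueness, and novelty for a list of generated strings."""
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(s):
    """Toy stand-in for a chemistry parser: non-empty, balanced parentheses."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(s) and depth == 0


# Hypothetical generated batch and training set (SMILES-like strings).
generated = ["CCO", "CCO", "CC(C)O", "CC(C", "c1ccccc1"]
training = ["CCO"]
metrics = generation_metrics(generated, training, toy_is_valid)
print(metrics)
```

In practice, the strings would need to be canonicalized by a cheminformatics toolkit before comparison, since the same molecule can be written in several ways.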
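In drug discovery, molecule optimization is an important step for modifying drug candidates into better ones in terms of desired drug properties. With recent advances in artificial intelligence, this traditionally in vitro process has been increasingly facilitated by in silico approaches. We present an innovative in silico approach that optimizes molecules computationally, formulating the problem as generating optimized molecular graphs via deep generative models. Our generative models follow the key idea of fragment-based drug design and optimize molecules by modifying their small fragments. Our models learn how to identify the fragments to be optimized and how to modify them by learning from the differences between molecules with good and bad properties. When optimizing a new molecule, our models apply the learned signals to decode optimized fragments at the predicted fragment locations. We also chain multiple such models into a pipeline in which each model optimizes one fragment, so that the entire pipeline can modify multiple fragments of a molecule if needed. We compare our models with other state-of-the-art methods on benchmark datasets and demonstrate that our methods achieve more than 80% property improvement under moderate molecular similarity constraints and more than 80% property improvement under high molecular similarity constraints.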
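While generative models have recently become ubiquitous in many scientific fields, less attention has been paid to their evaluation. For molecular generative models, the state of the art examines their output in isolation or in relation to their input. However, biological and functional properties of the output, such as ligand-target interactions, have not been addressed. In this study, a novel biologically inspired benchmark for evaluating molecular generative models is proposed. Specifically, three diverse reference datasets are designed, and a set of metrics directly relevant to the drug discovery process is introduced. In particular, we propose a recreation metric and apply drug-target affinity prediction and molecular docking as complementary techniques for evaluating generative outputs. While all three metrics show consistent results across the tested generative models, a more detailed comparison of drug-target affinity binding and molecular docking scores reveals that unimodal predictors can lead to erroneous conclusions about target binding at the molecular level, and that a multimodal approach is therefore preferable. The key advantage of this framework is that it incorporates prior physicochemical domain knowledge into the benchmarking process by focusing explicitly on ligand-target interactions, thereby creating a highly efficient tool not only for evaluating molecular generative outputs in particular but also for enriching the drug discovery process in general.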
Artificial intelligence (AI) in the form of deep learning bears promise for drug discovery and chemical biology, $\textit{e.g.}$, to predict protein structure and molecular bioactivity, plan organic synthesis, and design molecules $\textit{de novo}$. While most of the deep learning efforts in drug discovery have focused on ligand-based approaches, structure-based drug discovery has the potential to tackle unsolved challenges, such as affinity prediction for unexplored protein targets, binding-mechanism elucidation, and the rationalization of related chemical kinetic properties. Advances in deep learning methodologies and the availability of accurate predictions for protein tertiary structure advocate for a $\textit{renaissance}$ in structure-based approaches for drug discovery guided by AI. This review summarizes the most prominent algorithmic concepts in structure-based deep learning for drug discovery, and forecasts opportunities, applications, and challenges ahead.
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
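As a concrete instance of the convolutional category, the sketch below implements a single graph convolution layer in plain Python using the widely used propagation rule H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W). This is one well-known formulation chosen for illustration, not a method attributed to the survey itself; the graph, features, and weight values are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, features, weights):
    """One layer: aggregate neighbor features, apply weights, then ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in transformed]


# Path graph 0 - 1 - 2 with two input features per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 1.0]]  # hypothetical values, normally learned
out = gcn_layer(adj, features, weights)
for row in out:
    print(row)
```

Stacking several such layers lets each node's representation depend on progressively larger neighborhoods of the graph.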
Pre-publication draft of a book to be published by Morgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
Graph classification is an important area in both modern research and industry. Multiple applications, especially in chemistry and novel drug discovery, encourage rapid development of machine learning models in this area. To keep up with the pace of new research, proper experimental design, fair evaluation, and independent benchmarks are essential. The design of strong baselines is an indispensable element of such work. In this thesis, we explore multiple approaches to graph classification. We focus on Graph Neural Networks (GNNs), which have emerged as a de facto standard deep learning technique for graph representation learning. Classical approaches, such as graph descriptors and molecular fingerprints, are also addressed. We design a fair experimental evaluation protocol and choose a proper collection of datasets. This allows us to perform numerous experiments and rigorously analyze modern approaches. We arrive at many conclusions, which shed new light on the performance and quality of novel algorithms. We investigate the application of the Jumping Knowledge GNN architecture to graph classification, which proves to be an efficient tool for improving base graph neural network architectures. Multiple improvements to baseline models are also proposed and experimentally verified, which constitutes an important contribution to the field of fair model comparison.
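The Jumping Knowledge idea is to combine node representations from every GNN layer, rather than only the last one, before the graph-level readout. The sketch below shows the concatenation and element-wise max variants followed by a sum readout. The per-layer values are made up, the function names are hypothetical, and the GNN layers that would produce these outputs are omitted.

```python
def jumping_knowledge(layer_outputs, mode="concat"):
    """Combine per-layer node features.

    layer_outputs: list over layers, each a [num_nodes][dim] list of floats.
    """
    num_nodes = len(layer_outputs[0])
    if mode == "concat":
        # Each node keeps the features from all layers side by side.
        return [[v for layer in layer_outputs for v in layer[n]]
                for n in range(num_nodes)]
    if mode == "max":
        # Each node keeps, per dimension, the largest value across layers.
        dim = len(layer_outputs[0][0])
        return [[max(layer[n][d] for layer in layer_outputs) for d in range(dim)]
                for n in range(num_nodes)]
    raise ValueError("mode must be 'concat' or 'max'")


def sum_readout(node_features):
    """Sum node vectors into one graph-level vector for classification."""
    dim = len(node_features[0])
    return [sum(row[d] for row in node_features) for d in range(dim)]


# Made-up outputs of two GNN layers for a graph with two nodes.
layer1 = [[1.0, 0.0], [0.0, 2.0]]
layer2 = [[0.5, 3.0], [1.0, 1.0]]
concat = jumping_knowledge([layer1, layer2], mode="concat")
pooled = jumping_knowledge([layer1, layer2], mode="max")
graph_vec = sum_readout(concat)
print(concat, pooled, graph_vec)
```

The graph-level vector would then be passed to a classifier; the max variant keeps the feature dimension fixed, while concatenation grows it with the number of layers.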
Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem
Deep learning has been shown to be successful in a number of domains, ranging from acoustics, images, to natural language processing. However, applying deep learning to the ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, substantial research efforts have been devoted to applying deep learning methods to graphs, resulting in beneficial advances in graph analysis techniques. In this survey, we comprehensively review the different types of deep learning methods on graphs. We divide the existing methods into five categories based on their model architectures and training strategies: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, and graph adversarial methods. We then provide a comprehensive overview of these methods in a systematic manner mainly by following their development history. We also analyze the differences and compositions of different methods. Finally, we briefly outline the applications in which they have been used and discuss potential future research directions.
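Deep reinforcement learning (DRL) has empowered a variety of artificial intelligence fields, including pattern recognition, robotics, recommender systems, and gaming. Similarly, graph neural networks (GNNs) have demonstrated superior performance in supervised learning on graph-structured data. Recently, the fusion of GNNs with DRL for graph-structured environments has attracted a lot of attention. This paper provides a comprehensive review of these hybrid works. They can be classified into two categories: (1) algorithmic enhancement, where DRL and GNNs complement each other for better utility; and (2) application-specific enhancement, where DRL and GNNs support each other. This fusion effectively addresses various complex problems in engineering and the life sciences. Based on the review, we further analyze the applicability and benefits of fusing these two domains, especially in terms of increasing generalizability and reducing computational complexity. Finally, the key challenges in integrating DRL and GNNs, along with potential future research directions, are highlighted, which will be of interest to the broader machine learning community.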
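There is growing interest in the area of machine learning and creativity. This survey presents an overview of the history and state of the art of computational creativity theories, key machine learning techniques (including generative deep learning), and the corresponding automatic evaluation methods. After a critical discussion of the key contributions in this area, we outline the current research challenges and emerging opportunities in the field.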