In molecular research, simulation \& design of molecules are key areas with significant implications for drug development, material science, and other fields. Current classical computational power falls inadequate to simulate any more than small molecules, let alone protein chains on hundreds of peptide. Therefore these experiment are done physically in wet-lab, but it takes a lot of time \& not possible to examine every molecule due to the size of the search area, tens of billions of dollars are spent every year in these research experiments. Molecule simulation \& design has lately advanced significantly by machine learning models, A fresh perspective on the issue of chemical synthesis is provided by deep generative models for graph-structured data. By optimising differentiable models that produce molecular graphs directly, it is feasible to avoid costly search techniques in the discrete and huge space of chemical structures. But these models also suffer from computational limitations when dimensions become huge and consume huge amount of resources. Quantum Generative machine learning in recent years have shown some empirical results promising significant advantages over classical counterparts.
translated by 谷歌翻译
In this work, we propose MEDICO, a Multi-viEw Deep generative model for molecule generation, structural optimization, and the SARS-CoV-2 Inhibitor disCOvery. To the best of our knowledge, MEDICO is the first-of-this-kind graph generative model that can generate molecular graphs similar to the structure of targeted molecules, with a multi-view representation learning framework to sufficiently and adaptively learn comprehensive structural semantics from targeted molecular topology and geometry. We show that our MEDICO significantly outperforms the state-of-the-art methods in generating valid, unique, and novel molecules under benchmarking comparisons. In particular, we showcase the multi-view deep learning model enables us to generate not only the molecules structurally similar to the targeted molecules but also the molecules with desired chemical properties, demonstrating the strong capability of our model in exploring the chemical space deeply. Moreover, case study results on targeted molecule generation for the SARS-CoV-2 main protease (Mpro) show that by integrating molecule docking into our model as chemical priori, we successfully generate new small molecules with desired drug-like properties for the Mpro, potentially accelerating the de novo design of Covid-19 drugs. Further, we apply MEDICO to the structural optimization of three well-known Mpro inhibitors (N3, 11a, and GC376) and achieve ~88% improvement in their binding affinity to Mpro, demonstrating the application value of our model for the development of therapeutics for SARS-CoV-2 infection.
translated by 谷歌翻译
人工智能(AI)在过去十年中一直在改变药物发现的实践。各种AI技术已在广泛的应用中使用,例如虚拟筛选和药物设计。在本调查中,我们首先概述了药物发现,并讨论了相关的应用,可以减少到两个主要任务,即分子性质预测和分子产生。然后,我们讨论常见的数据资源,分子表示和基准平台。此外,为了总结AI在药物发现中的进展情况,我们介绍了在调查的论文中包括模型架构和学习范式的相关AI技术。我们预计本调查将作为有兴趣在人工智能和药物发现界面工作的研究人员的指南。我们还提供了GitHub存储库(HTTPS:///github.com/dengjianyuan/survey_survey_au_drug_discovery),其中包含文件和代码,如适用,作为定期更新的学习资源。
translated by 谷歌翻译
与靶蛋白具有高结合亲和力的药物样分子的产生仍然是药物发现中的一项困难和资源密集型任务。现有的方法主要采用强化学习,马尔可夫采样或以高斯过程为指导的深层生成模型,在生成具有高结合亲和力的分子时,通过基于计算量的物理学方法计算出的高结合亲和力。我们提出了对分子(豪华轿车)的潜在构成主义,它通过类似于Inceptionism的技术显着加速了分子的产生。豪华轿车采用序列的两个神经网络采用变异自动编码器生成的潜在空间和性质预测,从而使基于梯度的分子特性更快地基于梯度的反相比。综合实验表明,豪华轿车在基准任务上具有竞争力,并且在产生具有高结合亲和力的类似药物的化合物的新任务上,其最先进的技术表现出了最先进的技术,可针对两个蛋白质靶标达到纳摩尔范围。我们通过对绝对结合能的基于更准确的基于分子动力学的计算来证实这些基于对接的结果,并表明我们生成的类似药物的化合物之一的预测$ k_d $(结合亲和力的量度)为$ 6 \ cdot 10^ {-14} $ m针对人类雌激素受体,远远超出了典型的早期药物候选物和大多数FDA批准的药物的亲和力。代码可从https://github.com/rose-stl-lab/limo获得。
translated by 谷歌翻译
图形结构数据的深层生成模型为化学合成问题提供了一个新的角度:通过优化直接生成分子图的可区分模型,可以在化学结构的离散和广阔空间中侧键入昂贵的搜索程序。我们介绍了Molgan,这是一种用于小分子图的隐式,无似然生成模型,它规避了对以前基于可能性的方法的昂贵图形匹配程序或节点订购启发式方法的需求。我们的方法适应生成对抗网络(GAN)直接在图形结构数据上操作。我们将方法与增强学习目标结合起来,以鼓励具有特定所需化学特性的分子产生。在QM9化学数据库的实验中,我们证明了我们的模型能够生成接近100%有效化合物。莫尔根(Molgan)与最近使用基于字符串的分子表示(微笑)表示的提案和基于似然的方法直接生成图的方法进行了比较。 https://github.com/nicola-decao/molgan上的代码
translated by 谷歌翻译
在药物发现中,具有所需生物活性的新分子的合理设计是一项至关重要但具有挑战性的任务,尤其是在治疗新的靶家庭或研究靶标时。在这里,我们提出了PGMG,这是一种用于生物活化分子产生的药效团的深度学习方法。PGMG通过药理的指导提供了一种灵活的策略,以使用训练有素的变异自动编码器在各种情况下生成具有结构多样性的生物活性分子。我们表明,PGMG可以在给定药效团模型的情况下生成匹配的分子,同时保持高度的有效性,独特性和新颖性。在案例研究中,我们证明了PGMG在基于配体和基于结构的药物从头设计以及铅优化方案中生成生物活性分子的应用。总体而言,PGMG的灵活性和有效性使其成为加速药物发现过程的有用工具。
translated by 谷歌翻译
Drug targets are the main focus of drug discovery due to their key role in disease pathogenesis. Computational approaches are widely applied to drug development because of the increasing availability of biological molecular datasets. Popular generative approaches can create new drug molecules by learning the given molecule distributions. However, these approaches are mostly not for target-specific drug discovery. We developed an energy-based probabilistic model for computational target-specific drug discovery. Results show that our proposed TagMol can generate molecules with similar binding affinity scores as real molecules. GAT-based models showed faster and better learning relative to GCN baseline models.
translated by 谷歌翻译
分子的深度生成模型在相关数据集上培训,这些模型培训的流行度非常受欢迎,这些模型用于通过化学空间进行搜索。用于新型功能化合物的逆设计的生成模型的下游效用取决于它们学习分子训练分布的能力。最简单的示例是一种语言模型,采用经常性神经网络的形式,并使用串表示生成分子。更复杂的是图形生成模型,其顺序地构建分子图,通常实现最先进的结果。然而,最近的工作表明,语言模型比曾经的想法更具能力,特别是在低数据制度中。在这项工作中,我们调查了简单语言模型学习分子分布的能力。为此目的,我们通过编译特别复杂的分子分布来介绍几个具有挑战性的生成建模任务。在每个任务上,与两个广泛使用的图形生成模型相比,我们评估语言模型的能力。结果表明,语言模型是强大的生成模型,能够熟练地学习复杂的分子分配 - 并产生比图形模型更好的性能。语言模型可以准确地产生:锌15,多模态分子分布以及Pubchem中最大分子的最高评分罚款的分布。
translated by 谷歌翻译
虽然最近在许多科学领域都变得无处不在,但对其评估的关注较少。对于分子生成模型,最先进的是孤立或与其输入有关的输出。但是,它们的生物学和功能特性(例如配体 - 靶标相互作用)尚未得到解决。在这项研究中,提出了一种新型的生物学启发的基准,用于评估分子生成模型。具体而言,设计了三个不同的参考数据集,并引入了与药物发现过程直接相关的一组指标。特别是我们提出了一个娱乐指标,将药物目标亲和力预测和分子对接应用作为评估生成产量的互补技术。虽然所有三个指标均在测试的生成模型中均表现出一致的结果,但对药物目标亲和力结合和分子对接分数进行了更详细的比较,表明单峰预测器可能会导致关于目标结合在分子水平和多模式方法的错误结论,而多模式的方法是错误的结论。因此优选。该框架的关键优点是,它通过明确关注配体 - 靶标相互作用,将先前的物理化学域知识纳入基准测试过程,从而创建了一种高效的工具,不仅用于评估分子生成型输出,而且还用于丰富富含分子生成的输出。一般而言,药物发现过程。
translated by 谷歌翻译
Quantum machine learning (QML) has received increasing attention due to its potential to outperform classical machine learning methods in various problems. A subclass of QML methods is quantum generative adversarial networks (QGANs) which have been studied as a quantum counterpart of classical GANs widely used in image manipulation and generation tasks. The existing work on QGANs is still limited to small-scale proof-of-concept examples based on images with significant down-scaling. Here we integrate classical and quantum techniques to propose a new hybrid quantum-classical GAN framework. We demonstrate its superior learning capabilities by generating $28 \times 28$ pixels grey-scale images without dimensionality reduction or classical pre/post-processing on multiple classes of the standard MNIST and Fashion MNIST datasets, which achieves comparable results to classical frameworks with 3 orders of magnitude less trainable generator parameters. To gain further insight into the working of our hybrid approach, we systematically explore the impact of its parameter space by varying the number of qubits, the size of image patches, the number of layers in the generator, the shape of the patches and the choice of prior distribution. Our results show that increasing the quantum generator size generally improves the learning capability of the network. The developed framework provides a foundation for future design of QGANs with optimal parameter set tailored for complex image generation tasks.
translated by 谷歌翻译
自我监督的神经语言模型最近在有机分子和蛋白质序列的生成设计中发现了广泛的应用,以及用于下游结构分类和功能预测的表示学习。但是,大多数现有的分子设计深度学习模型通常都需要一个大数据集并具有黑盒架构,这使得很难解释其设计逻辑。在这里,我们提出了生成分子变压器(GMTRANSFORMER),这是一种用于分子生成设计的概率神经网络模型。我们的模型建立在最初用于文本处理的空白填充语言模型上,该模型在学习具有高质量生成,可解释性和数据效率的“分子语法”方面具有独特的优势。与其他基线相比,我们的模型在摩西数据集上的基准测试后获得了高新颖性和SCAF。概率生成步骤具有修补分子设计的潜力,因为它们有能力推荐如何通过学习的隐式分子化学指导,并通过解释来修饰现有分子。可以在https://github.com/usccolumbia/gmtransformer上自由访问源代码和数据集
translated by 谷歌翻译
深度生成模型吸引了具有所需特性的分子设计的极大关注。大多数现有模型通过顺序添加原子来产生分子。这通常会使产生的分子与目标性能和低合成可接近性较少。诸如官能团的分子片段与分子性质和合成可接近的比原子更密切相关。在此,我们提出了一种基于片段的分子发生模型,其通过顺序向任何给定的起始分子依次向任何给定的起始分子添加分子片段来设计具有靶性质的新分子。我们模型的一个关键特征是属性控制和片段类型方面的高概括能力。通过以自动回归方式学习各个片段对目标属性的贡献来实现前者。对于后者,我们使用深神经网络,其从两个分子的嵌入载体中预测两个分子的键合概率作为输入。在用金砖石分解方法制备片段文库的同时隐式考虑所生成的分子的高合成可用性。我们表明该模型可以以高成功率同时控制多个目标性质的分子。即使在培训数据很少的财产范围内,它也与看不见的片段同样很好地工作,验证高概括能力。作为一种实际应用,我们证明,在对接得分方面,该模型可以产生具有高结合亲和力的潜在抑制剂,其抗对接得分的3CL-COV-2。
translated by 谷歌翻译
基于结构的药物设计涉及发现具有对蛋白质袋的结构和化学互补性的配体分子。深度生成方法表明了在提出从划痕(De-Novo设计)的新型分子中的承诺,避免了化学空间的详尽虚拟筛选。大多数生成的de-novo模型未能包含详细的配体 - 蛋白质相互作用和3D袋结构。我们提出了一种新的监督模型,在离散的分子空间中与3D姿势共同产生分子图。分子在口袋内部构建原子原子,由来自晶体数据的结构信息引导。我们使用对接基准进行评估我们的模型,并发现引导生成将预测的结合亲和力提高了8%,并在基线上通过10%的药物相似分数提高了预测的结合亲和力。此外,我们的模型提出了具有超过一些已知配体的结合分数的分子,这可能在未来的湿式实验室研究中有用。
translated by 谷歌翻译
Cancer is one of the leading causes of death worldwide. It is caused by a variety of genetic mutations, which makes every instance of the disease unique. Since chemotherapy can have extremely severe side effects, each patient requires a personalized treatment plan. Finding the dosages that maximize the beneficial effects of the drugs and minimize their adverse side effects is vital. Deep neural networks automate and improve drug selection. However, they require a lot of data to be trained on. Therefore, there is a need for machine-learning approaches that require less data. Hybrid quantum neural networks were shown to provide a potential advantage in problems where training data availability is limited. We propose a novel hybrid quantum neural network for drug response prediction, based on a combination of convolutional, graph convolutional, and deep quantum neural layers of 8 qubits with 363 layers. We test our model on the reduced Genomics of Drug Sensitivity in Cancer dataset and show that the hybrid quantum model outperforms its classical analog by 15% in predicting IC50 drug effectiveness values. The proposed hybrid quantum machine learning model is a step towards deep quantum data-efficient algorithms with thousands of quantum gates for solving problems in personalized medicine, where data collection is a challenge.
translated by 谷歌翻译
机器学习潜力是分子模拟的重要工具,但是由于缺乏高质量数据集来训练它们的发展,它们的开发阻碍了它们。我们描述了Spice数据集,这是一种新的量子化学数据集,用于训练与模拟与蛋白质相互作用的药物样的小分子相关的潜在。它包含超过110万个小分子,二聚体,二肽和溶剂化氨基酸的构象。它包括15个元素,带电和未充电的分子以及广泛的共价和非共价相互作用。它提供了在{\ omega} b97m-d3(bj)/def2-tzVPPD理论水平以及其他有用的数量(例如多极矩和键阶)上计算出的力和能量。我们在其上训练一组机器学习潜力,并证明它们可以在化学空间的广泛区域中实现化学精度。它可以作为创建可转移的,准备使用潜在功能用于分子模拟的宝贵资源。
translated by 谷歌翻译
Variational autoencoder (VAE) is a popular method for drug discovery and there had been a great deal of architectures and pipelines proposed to improve its performance. But the VAE model itself suffers from deficiencies such as poor manifold recovery when data lie on low-dimensional manifold embedded in higher dimensional ambient space and they manifest themselves in each applications differently. The consequences of it in drug discovery is somewhat under-explored. In this paper, we study how to improve the similarity of the data generated via VAE and the training dataset by improving manifold recovery via a 2-stage VAE where the second stage VAE is trained on the latent space of the first one. We experimentally evaluated our approach using the ChEMBL dataset as well as a polymer datasets. In both dataset, the 2-stage VAE method is able to improve the property statistics significantly from a pre-existing method.
translated by 谷歌翻译
Generating molecules that bind to specific proteins is an important but challenging task in drug discovery. Previous works usually generate atoms in an auto-regressive way, where element types and 3D coordinates of atoms are generated one by one. However, in real-world molecular systems, the interactions among atoms in an entire molecule are global, leading to the energy function pair-coupled among atoms. With such energy-based consideration, the modeling of probability should be based on joint distributions, rather than sequentially conditional ones. Thus, the unnatural sequentially auto-regressive modeling of molecule generation is likely to violate the physical rules, thus resulting in poor properties of the generated molecules. In this work, a generative diffusion model for molecular 3D structures based on target proteins as contextual constraints is established, at a full-atom level in a non-autoregressive way. Given a designated 3D protein binding site, our model learns the generative process that denoises both element types and 3D coordinates of an entire molecule, with an equivariant network. Experimentally, the proposed method shows competitive performance compared with prevailing works in terms of high affinity with proteins and appropriate molecule sizes as well as other drug properties such as drug-likeness of the generated molecules.
translated by 谷歌翻译
药物发现对于保护人免受疾病至关重要。基于目标的筛查是过去几十年来开发新药的最流行方法之一。该方法有效地筛选了候选药物在体外抑制靶蛋白,但由于体内所选药物的活性不足,它通常失败。需要准确的计算方法来弥合此差距。在这里,我们提出了一个新的图形多任务深度学习模型,以识别具有目标抑制性和细胞活性(matic)特性的化合物。在经过精心策划的SARS-COV-2数据集中,提出的Matic模型显示了与传统方法相比,在筛选体内有效化合物方面的优点。接下来,我们探索了模型的解释性,发现目标抑制(体外)或细胞活性(体内)任务的学习特征与分子属性相关性和原子功能专注不同。基于这些发现,我们利用了基于蒙特卡洛的增强性学习生成模型来生成具有体外和体内功效的新型多毛皮化合物,从而弥合了基于靶基于靶基于靶标的药物和基于细胞的药物发现之间的差距。
translated by 谷歌翻译
Artificial intelligence (AI) in the form of deep learning bears promise for drug discovery and chemical biology, $\textit{e.g.}$, to predict protein structure and molecular bioactivity, plan organic synthesis, and design molecules $\textit{de novo}$. While most of the deep learning efforts in drug discovery have focused on ligand-based approaches, structure-based drug discovery has the potential to tackle unsolved challenges, such as affinity prediction for unexplored protein targets, binding-mechanism elucidation, and the rationalization of related chemical kinetic properties. Advances in deep learning methodologies and the availability of accurate predictions for protein tertiary structure advocate for a $\textit{renaissance}$ in structure-based approaches for drug discovery guided by AI. This review summarizes the most prominent algorithmic concepts in structure-based deep learning for drug discovery, and forecasts opportunities, applications, and challenges ahead.
translated by 谷歌翻译
We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds.
translated by 谷歌翻译