Artificial intelligence is increasingly being adopted in drug discovery. However, existing works mainly apply machine learning to the chemical structures of molecules and ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions, and predict complex biological activities. We present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct the largest multi-modal dataset to date, namely PubChemSTM, with over 280K chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM possesses two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts across various benchmarks.
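As a rough illustration of what contrastive learning over structure-text pairs can look like, the sketch below computes a symmetric InfoNCE-style loss on paired embeddings. The abstract only states that a contrastive strategy is used; the specific loss form, the `temperature` value, and the function names here are assumptions made for illustration, not the MoleculeSTM implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _mean_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(struct_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE loss (assumed form, hypothetical helper).

    struct_emb[i] and text_emb[i] are embeddings of the same molecule;
    every other pairing in the batch acts as a negative.
    """
    sim = [[cosine(s, t) / temperature for t in text_emb] for s in struct_emb]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-structure direction
    return 0.5 * (_mean_cross_entropy(sim) + _mean_cross_entropy(sim_t))


if __name__ == "__main__":
    structures = [[1.0, 0.0], [0.0, 1.0]]
    aligned_text = [[1.0, 0.0], [0.0, 1.0]]
    shuffled_text = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(structures, aligned_text))   # small: pairs agree
    print(info_nce(structures, shuffled_text))  # large: pairs disagree
```

The loss is low when each structure embedding is closest to its own description and high otherwise, which is the property that zero-shot retrieval relies on.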
translated by Google Translate
Most previous unsupervised domain adaptation (UDA) methods for question answering (QA) require access to source domain data while fine-tuning the model for the target domain. Source domain data may, however, contain sensitive information and may be restricted. In this study, we investigate a more challenging setting, source-free UDA, in which we have only the pretrained source model and target domain data, without access to source domain data. We propose a novel self-training approach to QA models that integrates a unique mask module for domain adaptation. The mask is auto-adjusted to extract key domain knowledge while the model is trained on the source domain. To maintain previously learned domain knowledge, certain mask weights are frozen during adaptation, while other weights are adjusted to mitigate domain shifts using pseudo-labeled samples, which are generated in the target domain by models trained on the source domain as part of the self-training process. Our empirical results on four benchmark datasets suggest that our approach significantly enhances the performance of pretrained QA models on the target domain, and even outperforms models that have access to the source data during adaptation.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
This paper focuses on the task of survival time analysis for lung cancer. Although much progress has been made on this problem in recent years, the performance of existing methods is still far from satisfactory. Traditional and some deep learning-based survival time analyses for lung cancer are mostly based on textual clinical information such as staging, age, and histology. Unlike existing methods that predict from a single modality, we observe that a human clinician usually uses multimodal data, such as textual clinical data and visual scans, to estimate survival time. Motivated by this, we contribute a smart cross-modality network for survival analysis, named Lite-ProSENet, that simulates a human's manner of decision making. Extensive experiments were conducted using data from 422 NSCLC patients from The Cancer Imaging Archive (TCIA). The results show that Lite-ProSENet performs favorably against all comparison methods and achieves a new state of the art with 89.3% concordance. The code will be made publicly available.
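For readers unfamiliar with the concordance metric quoted above, here is a minimal sketch of a pairwise concordance index with right-censoring. It assumes the model outputs a predicted survival time per patient (higher means longer survival); the function name and argument layout are hypothetical, and the abstract does not specify which variant of the metric was used.

```python
def concordance_index(times, predicted, observed):
    """Fraction of comparable patient pairs whose predicted order is correct.

    times     -- observed follow-up times
    predicted -- predicted survival times (assumption: higher = longer)
    observed  -- True if the event occurred, False if censored

    A pair (i, j) is comparable when patient i had an observed event and
    a strictly shorter follow-up time than patient j. Ties in the
    prediction count as half-correct.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not observed[i]:
            continue  # a censored patient cannot anchor a comparison
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2, 4, 6, 8]
    predicted = [1, 3, 2, 4]
    observed = [True, True, True, True]
    print(concordance_index(times, predicted, observed))
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, so 89.3% means most comparable pairs are ranked correctly.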
With the demand for standardized large-scale livestock farming and the development of artificial intelligence technology, a great deal of research on animal face recognition has been carried out on pigs, cattle, sheep, and other livestock. Face recognition consists of three sub-tasks: face detection, face normalization, and face identification. Most animal face recognition studies focus on face detection and face identification. Animals are often uncooperative when photographed, so the collected animal face images are often in arbitrary orientations. The use of non-standard images may significantly reduce the performance of a face recognition system. However, there has been no study on normalizing animal face images with arbitrary orientations. In this study, we developed a light-weight angle detection and region-based convolutional network (LAD-RCNN) containing a new rotation angle coding method that can detect the rotation angle and the location of an animal face in a single stage. LAD-RCNN has a frame rate of 72.74 FPS (including all steps) on a single GeForce RTX 2080 Ti GPU. LAD-RCNN has been evaluated on multiple datasets, including a goat dataset and goat infrared images. Evaluation results show that the AP of face detection was more than 95% and the deviation between the detected rotation angle and the ground-truth rotation angle was less than 0.036 (i.e., 6.48°) on all test datasets. This shows that LAD-RCNN performs excellently at detecting livestock faces and their orientation, and is therefore well suited to livestock face detection and normalization. Code is available at https://github.com/SheepBreedingLab-HZAU/LAD-RCNN/
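The two deviation figures quoted above (0.036 and 6.48°) are consistent with an angle that is normalized by 180°. The short sketch below makes that conversion explicit; the scale factor is inferred only from those two numbers, and the helper name is hypothetical. It says nothing about how LAD-RCNN actually encodes rotation angles internally.

```python
def normalized_to_degrees(value, full_scale_degrees=180.0):
    """Convert a normalized angle deviation to degrees.

    Assumption: a normalized value of 1.0 corresponds to 180 degrees,
    inferred from the abstract's pairing of 0.036 with 6.48 degrees.
    """
    return value * full_scale_degrees


if __name__ == "__main__":
    # Prints a value approximately equal to 6.48.
    print(normalized_to_degrees(0.036))
```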
We propose an analysis in fair learning that preserves the utility of the data while reducing prediction disparities under the criterion of group sufficiency. We focus on the scenario where the data contains multiple or even many subgroups, each with a limited number of samples. Accordingly, we present a principled method for learning a fair predictor for all subgroups by formulating it as a bilevel objective. Specifically, the subgroup-specific predictors are learned in the lower level from a small amount of data and the fair predictor. In the upper level, the fair predictor is updated to be close to all subgroup-specific predictors. We further prove that such a bilevel objective can effectively control the group sufficiency and the generalization error. We evaluate the proposed framework on real-world datasets. Empirical evidence suggests consistently improved fair predictions, with accuracy comparable to the baselines.
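One generic way to write down a bilevel objective of the kind described above is sketched here. All notation is assumed for illustration and does not come from the abstract: $\theta$ denotes the shared fair predictor, $\theta_a$ the predictor for subgroup $a$, $\hat{L}_a$ the empirical loss on that subgroup's small sample, $d$ a discrepancy between predictors, and $\lambda$ a trade-off weight. The paper's actual formulation may differ.

```latex
\[
\begin{aligned}
\text{lower level:}\quad
  & \theta_a^{\star}(\theta)
    = \arg\min_{\theta_a} \; \hat{L}_a(\theta_a) + \lambda\, d(\theta_a, \theta),
    \qquad a = 1, \dots, A, \\
\text{upper level:}\quad
  & \min_{\theta} \; \frac{1}{A} \sum_{a=1}^{A}
    d\bigl(\theta, \theta_a^{\star}(\theta)\bigr).
\end{aligned}
\]
```

In this reading, the lower level fits each subgroup using its own data while staying near the fair predictor, and the upper level pulls the fair predictor toward all subgroup solutions at once.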
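Owing to the pseudonymous nature of its transacting entities, Bitcoin is used for illicit activities more frequently than any other financial asset. An ideal detection model is expected to achieve all three properties of (i) early detection, (ii) good interpretability, and (iii) versatility. However, existing solutions cannot meet all these requirements, as most of them rely heavily on deep learning without satisfactory interpretability and are only suited to retrospective analysis of a specific illicit type. First, we propose asset transfer paths, which aim to describe the early characteristics of addresses. Next, using a decision-tree-based feature selection and segmentation strategy, we split the entire observation period into different segments and encode each segment as a segment vector. After clustering all these segment vectors, we obtain the global status vectors, which are essentially the basic units for describing the overall intention. Finally, a hierarchical self-attention predictor predicts the label of a given address in real time. A survival module tells the predictor when to stop and proposes the status sequence, namely the intention. With the type-dependent selection strategy and the global status vectors, our model can be used to detect various illicit activities with strong interpretability. The well-designed predictor and particular loss functions further enhance the model's prediction speed and interpretability. Extensive experiments on three real-world datasets show that our proposed algorithm outperforms state-of-the-art methods. In addition, further case studies demonstrate that our model can not only explain existing illicit patterns but also find new suspicious characters.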
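Elucidating and accurately predicting the druggability and bioactivities of molecules plays a pivotal role in drug design and discovery and remains an open challenge. Recently, graph neural networks (GNNs) have made remarkable advances in graph-based molecular property prediction. However, current graph-based deep learning methods neglect the hierarchical information of molecules and the relationships between feature channels. In this study, we propose a well-designed hierarchical informative graph neural network framework (termed HiGNN) for predicting molecular properties by utilizing molecular graphs and chemically synthesizable BRICS fragments. Furthermore, a plug-and-play feature-wise attention block is designed for the first time in the HiGNN architecture to adaptively recalibrate atomic features after the message passing phase. Extensive experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-related benchmark datasets. In addition, we devise a molecule-fragment similarity mechanism to comprehensively investigate the interpretability of the HiGNN model at the subgraph level, indicating that HiGNN, as a powerful deep learning tool, can help chemists and pharmacists identify the key components of molecules for designing better molecules with desired properties or functions. The source code is publicly available at https://github.com/idruglab/hignn.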
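Artistic text recognition is an extremely challenging task with a wide range of applications. However, current scene text recognition methods mainly focus on irregular text and have not specifically explored artistic text. The challenges of artistic text recognition include the varied appearance produced by specially designed fonts and effects, the complex connections and overlaps between characters, and the severe interference from background patterns. To alleviate these problems, we propose to recognize artistic text at three levels. First, considering the robustness of corner structures to appearance and shape, corner points are used to guide the extraction of local features inside characters. In this way, the discreteness of the corner points cuts off the connections between characters, and their sparsity improves robustness to background interference. Second, we design a character contrastive loss to model character-level features, improving the feature representation for character classification. Third, we utilize a Transformer to learn global features at the image level and to model the global relationships among the corner points, with the assistance of a corner cross-attention mechanism. In addition, we provide an artistic text dataset to benchmark performance. Experimental results verify the significant superiority of our proposed method on artistic text recognition, and it also achieves state-of-the-art performance on several blurred and perspective datasets.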
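Noisy-label facial expression recognition (FER) is more challenging than traditional noisy-label classification tasks because of inter-class similarity and annotation ambiguity. Recent works mainly tackle this problem by filtering out large-loss samples. In this paper, we explore dealing with noisy labels from a new feature-learning perspective. We find that FER models memorize noisy samples by focusing on a part of the features that can be considered related to the noisy labels, instead of learning from the whole set of features that lead to the latent truth. Inspired by this, we propose a novel Erasing Attention Consistency (EAC) method to automatically suppress noisy samples. Specifically, we first utilize the flip semantic consistency of facial images to design an imbalanced framework. We then randomly erase input images and use flip attention consistency to prevent the model from focusing on a part of the features. EAC significantly outperforms state-of-the-art noisy-label methods and generalizes well to other classification tasks such as CIFAR100 and Tiny-ImageNet. The code is available at https://github.com/zyh-uaiaaaa/erasing-prestention-consistency.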