对RAS和RAF蛋白的行为与细胞膜中局部脂质环境之间关系之间的关系的了解对了解癌症形成的基础机制至关重要。在这项工作中,我们采用深度学习(DL)来学习这种关系,通过预测基于脂质膜的RAS和RAS-RAF蛋白复合物的蛋白质定位状态,该状态基于蛋白质结构域周围的脂质密度(CG),相对于脂质膜。分子动力学(MD)模拟。我们的DL模型可以预测六个蛋白质状态,总体准确性超过80%。这项工作的发现为蛋白质如何调节脂质环境提供了新的见解,这反过来又可以帮助设计新型疗法以调节与癌症发展相关的机制中的这种相互作用。
translated by 谷歌翻译
Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic features are poorly defined. Here, we present a method for improving explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly-subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting pathologist-in-training education, showing that these intuitive visualizations can reinforce and improve understanding of histologic manifestations of tumor biology.
translated by 谷歌翻译
通用数据模型解决了标准化电子健康记录(EHR)数据的许多挑战,但无法将其集成深度表型所需的资源。开放的生物学和生物医学本体论(OBO)铸造本体论提供了可用于生物学知识的语义计算表示,并能够整合多种生物医学数据。但是,将EHR数据映射到OBO Foundry本体论需要大量的手动策展和域专业知识。我们介绍了一个框架,用于将观察性医学成果合作伙伴关系(OMOP)标准词汇介绍给OBO铸造本体。使用此框架,我们制作了92,367条条件,8,615种药物成分和10,673个测量结果的映射。域专家验证了映射准确性,并且在24家医院进行检查时,映射覆盖了99%的条件和药物成分和68%的测量结果。最后,我们证明OMOP2OBO映射可以帮助系统地识别可能受益于基因检测的未诊断罕见病患者。
translated by 谷歌翻译
数据增强是自然语言处理(NLP)模型的鲁棒性评估的重要组成部分,以及增强他们培训的数据的多样性。在本文中,我们呈现NL-Cogmenter,这是一种新的参与式Python的自然语言增强框架,它支持创建两个转换(对数据的修改)和过滤器(根据特定功能的数据拆分)。我们描述了框架和初始的117个变换和23个过滤器,用于各种自然语言任务。我们通过使用其几个转换来分析流行自然语言模型的鲁棒性来证明NL-Upmenter的功效。基础架构,Datacards和稳健性分析结果在NL-Augmenter存储库上公开可用(\ url {https://github.com/gem-benchmark/nl-augmenter})。
translated by 谷歌翻译
我们总结了使用巨大的自动语音识别(ASR)模型的大量努力的结果,该模型使用包含大约一百万小时音频的大型,多样的未标记数据集进行了预训练。我们发现,即使对于拥有数万个小时的标记数据的非常大的任务,预训练,自我培训和扩大模型大小的组合也大大提高了数据效率。特别是,在具有34K小时标记数据的ASR任务上,通过微调80亿个参数预先训练的构象异构体模型,我们可以匹配最先进的(SOTA)性能(SOTA)的性能,只有3%的培训数据和通过完整的训练集可以显着改善SOTA。我们还报告了从使用大型预训练和自我训练的模型来完成一系列下游任务所获得的普遍利益,这些任务涵盖了广泛的语音域,并涵盖了多个数据集大小的大小,包括在许多人中获得SOTA性能公共基准。此外,我们利用预先训练的网络的学会表示,在非ASR任务上实现SOTA结果。
translated by 谷歌翻译
我们的工作针对自动分析,以量化细菌细菌群体的生长动力学。我们提出了一种创新的方法,通过自动化新的,特定的成本功能的自动化最小化对可变形细胞运动的框架跟踪。这种最小化由专用的玻尔兹曼机器(随机复发神经网络)实现。通过连续的两个成本函数的最小化,对细胞分裂的自动检测进行了类似的处理,从而交替地识别儿童对和父母的识别。我们使用(i)记录模拟细胞菌落的记录来验证提出的自动细胞跟踪算法,这些算法与微流体陷阱和(ii)真实数据密切模仿大肠杆菌的生长动力学。在一批1100个模拟图像框架上,每帧的单元格登记精度范围从94.5%到100%,平均水平很高。我们使用大肠杆菌菌落的实验图像序列(即实际数据)进行的初始测试也产生令人信服的结果,注册精度范围从90%到100%。
translated by 谷歌翻译
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computationally efficiency goals, and it has been recently used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2- armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential that designs using a GI approach to allocate participants have to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
translated by 谷歌翻译
While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet enough been examined. Employing electroencephalography signals and band-limited white noise stimulus at 4.8 Hz (prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. As causal relationships run in both directions, we explore three scenarios, namely channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains 0.84 and 0.88 accuracy and 0.87 and 0.93 AUC for the Theta and Gamma bands, respectively.
translated by 谷歌翻译
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
translated by 谷歌翻译
Extracting complex structures from grid-based data is a common key step in automated medical image analysis. The conventional solution to recovering tree-structured geometries typically involves computing the minimal cost path through intermediate representations derived from segmentation masks. However, this methodology has significant limitations in the context of projective imaging of tree-structured 3D anatomical data such as coronary arteries, since there are often overlapping branches in the 2D projection. In this work, we propose a novel approach to predicting tree connectivity structure which reformulates the task as an optimization problem over individual steps of a recursive process. We design and train a two-stage model which leverages the UNet and Transformer architectures and introduces an image-based prompting technique. Our proposed method achieves compelling results on a pair of synthetic datasets, and outperforms a shortest-path baseline.
translated by 谷歌翻译