Multi-modal fusion approaches aim to integrate information from different data sources. Unlike natural datasets, such as in audio-visual applications, where samples consist of "paired" modalities, data in healthcare is often collected asynchronously. Hence, requiring the presence of all modalities for a given sample is not realistic for clinical tasks and significantly limits the size of the dataset during training. In this paper, we propose MedFuse, a conceptually simple yet promising LSTM-based fusion module that can accommodate uni-modal as well as multi-modal input. We evaluate the fusion method and introduce new benchmark results for in-hospital mortality prediction and phenotype classification, using clinical time-series data from the MIMIC-IV dataset and corresponding chest X-ray images from MIMIC-CXR. Compared with more complex multi-modal fusion strategies, MedFuse performs better by a large margin on the fully paired test set. It also remains robust on the partially paired test set, which contains samples with missing chest X-ray images. We release our code for reproducibility and to enable the evaluation of competing models in the future.
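The abstract above describes the core idea but not the implementation. A minimal numpy sketch of that idea, under the assumption that each modality has already been encoded into a fixed-size embedding: the available modality embeddings are treated as a (possibly length-1) sequence fed through an LSTM, so a sample with a missing chest X-ray simply produces a shorter sequence rather than being dropped. All names and dimensions here are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8                                          # embedding size (hypothetical)
W = rng.standard_normal((4 * DIM, DIM)) * 0.1    # input weights, gates stacked
U = rng.standard_normal((4 * DIM, DIM)) * 0.1    # recurrent weights
b = np.zeros(4 * DIM)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    """One LSTM step; gate pre-activations stacked as [input, forget, cell, output]."""
    i, f, g, o = np.split(W @ x + U @ h + b, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def fuse(modality_embeddings):
    """MedFuse-style fusion: run the available modality embeddings through the
    LSTM as a sequence and return the final hidden state as the fused vector."""
    h = np.zeros(DIM)
    c = np.zeros(DIM)
    for x in modality_embeddings:
        h, c = lstm_step(x, h, c)
    return h

ehr = rng.standard_normal(DIM)   # clinical time-series embedding (hypothetical)
cxr = rng.standard_normal(DIM)   # chest X-ray embedding (hypothetical)

paired = fuse([ehr, cxr])        # fully paired sample
ehr_only = fuse([ehr])           # chest X-ray missing: shorter sequence
print(paired.shape, ehr_only.shape)
```

Both cases yield a fused vector of the same size, which is what lets a single downstream predictor serve paired and partially paired samples alike.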
The medical domain is characterized by heterogeneous data modalities, such as imaging and physiological data. In practice, these diverse kinds of medical data assist clinicians in decision-making. However, most current state-of-the-art deep learning models rely solely on carefully curated data of a single modality. In this paper, we propose a dynamic training approach that learns modality-specific data representations and integrates auxiliary features, instead of relying on a single modality alone. Our preliminary experiments using chest radiographs from the MIMIC-CXR dataset and physiological data from MIMIC-IV show that the proposed approach achieves the highest area under the ROC curve (AUROC) (0.764) on the patient phenotyping task, compared with previously benchmarked methods that use physiological data only (0.740 AUROC). For a subset of five recurring or chronic diseases with periodic acute episodes, including cardiac dysrhythmias, conduction disorders, and congestive heart failure, the AUROC improves from 0.747 to 0.798. This illustrates the benefit of leveraging the chest imaging modality in the phenotyping task and highlights the potential of multi-modal learning in medical applications.
Active learning (AL) is a promising ML paradigm with the potential to parse through large unlabeled datasets and help reduce annotation cost in domains where labeling data can be prohibitive. Recently proposed neural-network-based AL methods use different heuristics to accomplish this goal. In this study, we demonstrate that, under identical experimental settings, different types of AL algorithms (uncertainty-based, diversity-based, and committee-based) produce inconsistent gains over a random sampling baseline. Through a variety of experiments controlling the sources of stochasticity, we show that the variance in performance metrics achieved by AL algorithms can lead to results that do not agree with previously reported ones. We also found that, under strong regularization, AL methods show marginal or no advantage over the random sampling baseline across a variety of experimental conditions. Finally, we conclude with a set of recommendations on how to assess results obtained with a new AL algorithm, to ensure they are reproducible and robust under changes in experimental conditions. We share our code to facilitate AL evaluations. We believe our findings and recommendations will help advance reproducible research in AL using neural networks. Our code is open-sourced at https://github.com/prateekmunjal/torchal
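To make the comparison in the abstract above concrete, here is a minimal sketch of one of the heuristics it mentions (entropy-based uncertainty sampling) next to the random-sampling baseline with an explicitly controlled seed, reflecting the paper's point that sources of randomness must be pinned down before AL gains can be assessed. The function names and the toy probability matrix are illustrative, not taken from the released code.

```python
import numpy as np

def entropy(probs):
    """Predictive entropy per sample; probs has shape (n_samples, n_classes)."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_uncertain(probs, budget):
    """Uncertainty-based AL: pick the `budget` samples with highest entropy."""
    return np.argsort(entropy(probs))[-budget:]

def select_random(n_pool, budget, seed=0):
    """Random-sampling baseline with a controlled seed for reproducibility."""
    return np.random.default_rng(seed).choice(n_pool, size=budget, replace=False)

probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction: low entropy
    [0.34, 0.33, 0.33],   # near-uniform: highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],
])
print(select_uncertain(probs, 2))  # indices of the two most uncertain samples
```

Fixing the seed in the baseline (and in model initialization, not shown) is what allows the variance of the AL curves themselves to be isolated.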
This paper presents our solutions for the MediaEval 2022 task on DisasterMM. The task is composed of two subtasks, namely (i) Relevance Classification of Twitter Posts (RCTP), and (ii) Location Extraction from Twitter Texts (LETT). The RCTP subtask aims at differentiating flood-related from non-relevant social posts, while LETT is a Named Entity Recognition (NER) task aiming at the extraction of location information from the text. For RCTP, we proposed four different solutions based on BERT, RoBERTa, DistilBERT, and ALBERT, obtaining F1-scores of 0.7934, 0.7970, 0.7613, and 0.7924, respectively. For LETT, we used three models, namely BERT, RoBERTa, and DistilBERT, obtaining F1-scores of 0.6256, 0.6744, and 0.6723, respectively.
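The runs above are ranked by binary F1, which for the RCTP subtask treats "flood-related" as the positive class. As a quick reference, a from-scratch computation of that metric (standard definition; the task itself would use an official scorer):

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall (positive class = 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy labels: precision = recall = 2/3, so F1 = 2/3.
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```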
Automated synthesis of histology images has several potential applications in computational pathology. However, no existing method can generate realistic tissue images with a bespoke cellular layout or user-defined histology parameters. In this work, we propose a novel framework called SynCLay (Synthesis from Cellular Layouts) that can construct realistic and high-quality histology images from user-defined cellular layouts along with annotated cellular boundaries. Tissue image generation based on bespoke cellular layouts through the proposed framework allows users to generate different histological patterns from arbitrary topological arrangements of different types of cells. SynCLay-generated synthetic images can be helpful in studying the role of different types of cells present in the tumor microenvironment. Additionally, they can assist in balancing the distribution of cellular counts in tissue images for designing accurate cellular composition predictors by minimizing the effects of data imbalance. We train SynCLay in an adversarial manner and integrate a nuclear segmentation and classification model in its training to refine nuclear structures and generate nuclear masks in conjunction with synthetic images. During inference, we combine the model with another parametric model for generating colon images and associated cellular counts as annotations, given the grade of differentiation and cell densities of different cells. We assess the generated images quantitatively and report on feedback from trained pathologists who assigned realism scores to a set of images generated by the framework. The average realism score across all pathologists for synthetic images was as high as that for the real images. We also show that augmenting limited real data with the synthetic data generated by our framework can significantly boost prediction performance on the cellular composition prediction task.
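The input SynCLay consumes is a cellular layout, i.e. positions and types of cells. A toy sketch of how such a layout could be sampled from user-specified cell densities is shown below; the density values, cell-type names, and the Poisson/uniform sampling scheme are all hypothetical illustrations of "user-defined cellular layouts", not the paper's parametric model.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_layout(densities, canvas=(256, 256)):
    """Sample a cellular layout: for each cell type, draw a Poisson-distributed
    count from its density (cells per unit area) and uniform positions on the
    canvas. Returns {cell_type: array of (x, y) positions}."""
    h, w = canvas
    layout = {}
    for cell_type, density in densities.items():
        n = rng.poisson(density * h * w)
        layout[cell_type] = rng.uniform(0, [w, h], size=(n, 2))
    return layout

# Hypothetical densities (cells per pixel^2) for three colon-tissue cell types.
layout = sample_layout({"epithelial": 4e-4, "lymphocyte": 2e-4, "stromal": 1e-4})
print({cell_type: len(xy) for cell_type, xy in layout.items()})
```

Varying the per-type densities is the lever that lets a user rebalance cellular counts in the generated training data.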
Face swapping technology used to create "Deepfakes" has advanced significantly over the past few years and now enables the creation of realistic facial manipulations. Current deep learning algorithms to detect deepfakes have shown promising results; however, they require large amounts of training data, and, as we show, they are biased towards a particular ethnicity. We propose a deepfake detection methodology that eliminates the need for any real data by making use of synthetic data generated with StyleGAN3. This not only performs on par with the traditional methodology of training on real data but also shows better generalization capabilities when fine-tuned with a small amount of real data. Furthermore, it reduces biases created by facial image datasets that might have sparse data from particular ethnicities.
This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder, that learns highly robust and generic facial embeddings from abundantly available non-annotated web crawled facial videos. As a challenging auxiliary task, MARLIN reconstructs the spatio-temporal details of the face from the densely masked facial regions which mainly include eyes, nose, mouth, lips, and skin to capture local and global aspects that in turn help in encoding generic and transferable features. Through a variety of experiments on diverse downstream tasks, we demonstrate MARLIN to be an excellent facial video encoder as well as feature extractor, that performs consistently well across a variety of downstream tasks including FAR (1.13% gain over supervised benchmark), FER (2.64% gain over unsupervised benchmark), DFD (1.86% gain over unsupervised benchmark), LS (29.36% gain for Frechet Inception Distance), and even in low data regime. Our codes and pre-trained models will be made public.
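The auxiliary task the abstract describes is masked-autoencoder-style reconstruction: densely mask spatio-temporal patches of a facial video and penalize reconstruction error only on the masked positions. A minimal numpy sketch of that training signal, with patch counts, masking ratio, and shapes chosen for illustration rather than taken from MARLIN:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(video, mask_ratio=0.75):
    """Densely mask spatio-temporal patches: zero out a random `mask_ratio`
    fraction of (frame, patch) positions; return the masked video and mask."""
    t, n_patches, dim = video.shape
    n_mask = int(mask_ratio * t * n_patches)
    chosen = rng.choice(t * n_patches, size=n_mask, replace=False)
    mask = np.zeros(t * n_patches, dtype=bool)
    mask[chosen] = True
    mask = mask.reshape(t, n_patches)
    masked = video.copy()
    masked[mask] = 0.0
    return masked, mask

def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error computed only over the masked positions."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

video = rng.standard_normal((4, 16, 8))   # 4 frames, 16 patches, 8-dim each
masked, mask = mask_patches(video)
# A perfect reconstruction incurs zero loss on the masked positions.
print(masked_reconstruction_loss(video, video, mask))
```

Because the loss ignores visible patches, the encoder is forced to infer the hidden facial regions from context, which is the source of the transferable features.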
Advances in reinforcement learning (RL) often rely on massive compute resources and remain notoriously sample inefficient. In contrast, the human brain is able to efficiently learn effective control strategies using limited resources. This raises the question whether insights from neuroscience can be used to improve current RL methods. Predictive processing is a popular theoretical framework which maintains that the human brain is actively seeking to minimize surprise. We show that recurrent neural networks which predict their own sensory states can be leveraged to minimise surprise, yielding substantial gains in cumulative reward. Specifically, we present the Predictive Processing Proximal Policy Optimization (P4O) agent; an actor-critic reinforcement learning agent that applies predictive processing to a recurrent variant of the PPO algorithm by integrating a world model in its hidden state. P4O significantly outperforms a baseline recurrent variant of the PPO algorithm on multiple Atari games using a single GPU. It also outperforms other state-of-the-art agents given the same wall-clock time and exceeds human gamer performance on multiple games including Seaquest, which is a particularly challenging environment in the Atari domain. Altogether, our work underscores how insights from the field of neuroscience may support the development of more capable and efficient artificial agents.
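One ingredient of the approach above is a world model that predicts the agent's own next sensory state, with "surprise" measured as prediction error. The toy sketch below shows only that ingredient in isolation, with a linear model and a linear environment; it is not the P4O architecture, and the environment matrix, learning rate, and reset rule are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearWorldModel:
    """Toy world model: predicts the next observation from the current one and
    learns by gradient descent on the squared prediction error (surprise)."""
    def __init__(self, dim, lr=0.1):
        self.W = np.zeros((dim, dim))
        self.lr = lr

    def surprise(self, obs, next_obs):
        err = self.W @ obs - next_obs
        return float(np.mean(err ** 2))

    def update(self, obs, next_obs):
        err = self.W @ obs - next_obs
        self.W -= self.lr * np.outer(err, obs)   # LMS step on the squared error

# Fixed linear environment: next_obs = A @ obs (contracting, so we reset
# the observation when it decays to keep generating training signal).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
model = LinearWorldModel(dim=2)
obs = rng.standard_normal(2)
first = last = None
for step in range(200):
    next_obs = A @ obs
    last = model.surprise(obs, next_obs)
    if first is None:
        first = last
    model.update(obs, next_obs)
    obs = next_obs if np.linalg.norm(next_obs) > 1e-3 else rng.standard_normal(2)

print(first > last)  # surprise shrinks as the world model improves
```

In P4O the analogous prediction error is minimized jointly with the PPO objective; here it simply illustrates that "minimizing surprise" is an ordinary learning signal.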
Audio is one of the most commonly used modes of human communication, but at the same time it can easily be abused to deceive people. With the AI revolution, the relevant technologies have become accessible to almost everyone, making it simple for criminals to commit crimes and forgeries. In this work, we introduce a deep learning approach to develop a classifier that blindly classifies an input audio as real or mimicked. The proposed model was trained on a set of important features extracted from a large audio dataset, and the resulting classifier was tested on the same features extracted from different audios. Two datasets were created for this work: an all-English dataset and a mixed dataset (Arabic and English). These datasets have been made available to the research community through GitHub at https://github.com/sass7/dataset. For comparison, the audios were also classified through human inspection by subjects who are native speakers. The ensuing results were interesting and exhibited formidable accuracy.
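The abstract does not list the extracted features, so the sketch below uses a generic hypothetical trio common in audio classification (zero-crossing rate, spectral centroid, spectral rolloff) purely to illustrate what "a set of important features extracted from audio" can look like as a fixed-length vector fed to a classifier:

```python
import numpy as np

def spectral_features(signal, sr=16000):
    """Illustrative features: zero-crossing rate, spectral centroid (Hz), and
    85% spectral rolloff (Hz). Hypothetical choice, not the paper's set."""
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    cumulative = np.cumsum(spectrum)
    rolloff = freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])]
    return np.array([zcr, centroid, rolloff])

t = np.linspace(0, 1, 16000, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)    # one second of a 440 Hz sine
feats = spectral_features(tone)
print(feats[1])  # spectral centroid lands near 440 Hz for a pure tone
```

Stacking such vectors over a dataset yields the tabular training matrix the classifier consumes.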
Since the intensities of MRI volumes are inconsistent across institutions, it is essential to extract universal features of multi-modal MRIs to precisely segment brain tumors. Following this concept, we propose a volumetric vision transformer that follows two windowing strategies to extract fine-grained features, together with local distributional smoothness (LDS) regularization during model training, inspired by virtual adversarial training (VAT), to make the model robust. We trained and evaluated our network architecture on the FeTS Challenge 2022 dataset. Our performance on the online validation dataset is as follows: Dice Similarity Scores of 81.71%, 91.38%, and 85.40%, and 95% Hausdorff Distances of 14.81 mm, 3.93 mm, and 11.18 mm for the enhancing tumor, whole tumor, and tumor core, respectively. Overall, the experimental results verify the effectiveness of our method, yielding better segmentation accuracy for each tumor sub-region. Our code implementation is publicly available at: https://github.com/himashi92/vizviva_fets_2022
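As background on the LDS term the abstract mentions: in VAT-style regularization, LDS at an input is the divergence between the model's prediction there and its prediction under a small adversarial perturbation, found by power iteration on the divergence. The sketch below illustrates this on a toy 2-class linear model with a finite-difference gradient in place of backpropagation; it is a conceptual illustration, not the paper's 3D implementation, and eps/xi are arbitrary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

def lds(predict, x, eps=0.5, xi=0.1, rng=None):
    """Local distributional smoothness at x: KL between the prediction at x and
    at x + r_adv, with r_adv estimated by one (finite-difference) power-iteration
    step on the KL divergence, as in VAT."""
    rng = rng or np.random.default_rng(0)
    p = predict(x)
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)                    # random unit starting direction
    g = np.zeros_like(d)
    h = 1e-4
    for i in range(x.size):                   # numerical grad of KL w.r.t. r at r = xi*d
        e = np.zeros_like(d)
        e[i] = h
        g[i] = (kl(p, predict(x + xi * d + e)) -
                kl(p, predict(x + xi * d - e))) / (2 * h)
    r_adv = eps * g / (np.linalg.norm(g) + 1e-12)
    return kl(p, predict(x + r_adv))

W = np.array([[2.0, -1.0], [-1.0, 2.0]])      # toy 2-class linear classifier
predict = lambda x: softmax(W @ x)
penalty = lds(predict, np.array([0.5, 0.2]))
print(penalty > 0.0)  # a positive smoothness penalty for training to minimize
```

Minimizing this penalty flattens the model's output distribution in its locally most sensitive direction, which is the robustness mechanism the abstract credits to VAT.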