Longstanding vertebral fractures severely affect patients' quality of life, leading to lumbar deformity and even paralysis. Computed tomography (CT) is a common clinical examination for early screening of this disease. However, subtle radiological appearances and non-specific symptoms lead to a high risk of missed diagnosis. In particular, mild fractures and normal controls are difficult to distinguish, both for deep learning models and for inexperienced doctors. In this paper, we argue that enhancing the subtle fracture features to encourage inter-class separability is the key to improving accuracy. Motivated by this, we propose a supervised contrastive learning based model to estimate the Genant grade of vertebral fractures from CT scans. As an auxiliary task, supervised contrastive learning narrows the distance between features of the same class while pushing others apart, enhancing the model's ability to capture the subtle features of vertebral fractures. Considering the lack of datasets in this field, we construct a database of 208 samples annotated by experienced radiologists. Our method achieves a specificity of 99% and a sensitivity of 85% in binary classification, and a macro-F1 of 77% in multi-class classification, indicating that contrastive learning significantly improves the accuracy of vertebral fracture screening, especially for mild fractures and normal controls. Our de-identified data and code will be made publicly available to the community.
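The abstract does not spell out the auxiliary objective, so the following is a minimal sketch of a supervised contrastive loss used alongside a cross-entropy grading head, in the spirit of the approach described above. The temperature, the 0.5 weighting, and the model interface in the trailing comment are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style auxiliary loss: pull embeddings of the same fracture grade
    together and push other grades apart. `features` is (N, D)."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                          # (N, N) scaled cosine similarity
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)                 # exclude self-comparisons
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(1).clamp(min=1)               # avoid division by zero
    loss = -(log_prob * pos_mask).sum(1) / pos_count
    return loss.mean()

# Hypothetical joint objective: cross-entropy on the Genant grade plus the auxiliary term.
# logits, embeddings = model(ct_patches)
# loss = F.cross_entropy(logits, grades) + 0.5 * supervised_contrastive_loss(embeddings, grades)
```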
In recent years, several works have adopted convolutional neural networks (CNNs) to diagnose avascular necrosis of the femoral head (AVNFH) based on X-ray images or magnetic resonance imaging (MRI). However, due to tissue overlap, X-ray images can hardly provide the fine-grained detail needed for early diagnosis. MRI, on the other hand, has a long imaging time and is more expensive, making it impractical for large-scale screening. Computed tomography (CT) shows tissue slice by slice, images faster, and costs less than MRI. However, to the best of our knowledge, there is no prior work on CT-based AVNFH diagnosis. In this work, we collect and label a large-scale dataset for AVNFH grading. In addition, existing end-to-end CNNs only produce a classification result and can hardly provide further information for diagnosing physicians. To address this issue, we propose the structure-regularized attentive network (SRANet), which is able to highlight necrotic regions during classification based on patch attention. SRANet extracts features from image patches, obtains weights via an attention mechanism to aggregate the features, and constrains them with a structural regularizer that encodes prior knowledge to improve generalization. SRANet is evaluated on our AVNFH-CT dataset. Experimental results show that SRANet outperforms CNNs for AVNFH classification; moreover, it can localize lesions and provide additional information to assist doctors in diagnosis. Our code is publicly available at https://github.com/tomas-lilingfeng/sranet.
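As a rough illustration of patch-based attention aggregation of the kind SRANet's description suggests, the sketch below pools patch features with learned attention weights that can be inspected to see which regions drive the prediction. The feature dimension, the number of grades, and the placeholder regularizer are assumptions; the paper's actual structural prior is not given in the abstract.

```python
import torch
import torch.nn as nn

class PatchAttentionHead(nn.Module):
    """Attention pooling over patch features: per-patch scores are softmax-normalized
    and used to aggregate the patches, so the weights can be visualized to highlight
    suspect regions."""
    def __init__(self, feat_dim=512, hidden=128, num_grades=4):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))
        self.classifier = nn.Linear(feat_dim, num_grades)

    def forward(self, patch_feats):                 # (B, P, D): P patches per CT slice/volume
        scores = self.attn(patch_feats)             # (B, P, 1)
        weights = torch.softmax(scores, dim=1)      # attention distribution over patches
        pooled = (weights * patch_feats).sum(dim=1) # (B, D) attention-weighted aggregation
        return self.classifier(pooled), weights.squeeze(-1)

# Placeholder stand-in for the structure regularizer (the paper's prior is not stated
# in the abstract), e.g. encouraging spatially smooth attention over adjacent patches:
# reg = ((weights[:, 1:] - weights[:, :-1]) ** 2).mean()
```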
Text classification is a primary task in natural language processing (NLP). Recently, graph neural networks (GNNs) have developed rapidly and been applied to text classification tasks. As a special kind of graph data, a tree has a simpler structure and can provide rich hierarchical information for text classification. Inspired by structural entropy, we construct the coding tree of a graph by minimizing its structural entropy and propose HINT, which aims to make full use of the hierarchical information contained in the text for the text classification task. Specifically, we first build a dependency parsing graph for each text. Then, we design a structural entropy minimization algorithm to decode the key information in the graph and convert each graph into its corresponding coding tree. Based on the hierarchical structure of the coding tree, the representation of the entire graph is obtained by updating the representations of the non-leaf nodes of the coding tree layer by layer. Finally, we show the effectiveness of hierarchical information in text classification. Experimental results demonstrate that HINT outperforms state-of-the-art methods on popular benchmarks while having a simple structure and few parameters.
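The layer-by-layer update on the coding tree can be illustrated with a small bottom-up aggregation routine. This sketch assumes the coding tree has already been produced by the structural entropy minimization step and uses plain mean pooling as the aggregator; the actual update function in the method may be learned.

```python
import torch

def encode_coding_tree(leaf_feats, children, depth):
    """Bottom-up pass on a coding tree: each non-leaf node becomes the mean of its
    children's representations, processed layer by layer until the root summarizes
    the whole dependency-parsing graph (a learned update could replace the mean).

    leaf_feats: dict node_id -> tensor (D,) for word/leaf nodes
    children:   dict node_id -> list of child node ids (non-leaf nodes only)
    depth:      dict node_id -> level in the tree (root has depth 0)
    """
    feats = dict(leaf_feats)
    non_leaf_levels = sorted({d for n, d in depth.items() if children.get(n)}, reverse=True)
    for level in non_leaf_levels:                    # deepest internal layer first
        for node, d in depth.items():
            if d == level and children.get(node):
                kid_feats = torch.stack([feats[c] for c in children[node]])
                feats[node] = kid_feats.mean(dim=0)
    root = min(depth, key=depth.get)
    return feats[root]                               # graph-level representation

# Toy usage: a two-level coding tree over three word embeddings.
# leaves = {f"w{i}": torch.randn(64) for i in range(3)}
# kids   = {"root": ["p0", "p1"], "p0": ["w0", "w1"], "p1": ["w2"]}
# depths = {"root": 0, "p0": 1, "p1": 1, "w0": 2, "w1": 2, "w2": 2}
# graph_repr = encode_coding_tree(leaves, kids, depths)
```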
In recent years, arbitrary image style transfer has attracted increasing attention. Given a pair of content and style images, the goal is to synthesize a stylized image that retains the content of the former while capturing the style patterns of the latter. However, it is difficult to balance the trade-off between content details and style features. When an image is stylized with sufficient style patterns, the content details may be damaged and sometimes the objects in the image can no longer be distinguished clearly. For this reason, we present STT, a new transformer-based method for image style transfer, together with an edge loss that noticeably enhances the content details and avoids blurred results caused by excessive rendering of style features. Qualitative and quantitative experiments demonstrate that STT achieves performance comparable to state-of-the-art image style transfer methods while alleviating the content leak problem.
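The edge loss is described only at a high level; a minimal sketch is given below, assuming Sobel gradient magnitudes compared between the content image and the stylized output with an L1 penalty. The specific edge operator and weighting used by STT are not stated in the abstract.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Grayscale Sobel gradient magnitude for a batch of RGB images (B, 3, H, W)."""
    gray = img.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def edge_loss(stylized, content):
    """Penalize differences between the edge maps of the stylized result and the
    content image, discouraging blurred or over-rendered structures."""
    return F.l1_loss(sobel_edges(stylized), sobel_edges(content))
```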
In recent years, the Transformer architecture has shown its superiority in the video-based person re-identification task. Inspired by video representation learning, these methods mainly focus on designing modules to extract informative spatial and temporal features. However, they are still limited in extracting local attributes and global identity information, which are critical for the person re-identification task. In this paper, we propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two newly designed proxy embedding modules to address the above issue. Specifically, MSTAT consists of three stages that encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips, respectively, achieving a holistic perception of the input person. We combine the outputs of all the stages for the final identification. In practice, to save computational cost, Spatial-Temporal Aggregation (STA) modules are first adopted in each stage to conduct the self-attention operations along the spatial and temporal dimensions separately. We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract informative and discriminative feature representations at different stages. All of them are realized by employing newly designed self-attention operations with specific meanings. Moreover, temporal patch shuffling is introduced to further improve the robustness of the model. Extensive experimental results demonstrate the effectiveness of the proposed modules in extracting informative and discriminative information from the videos, and illustrate that MSTAT achieves state-of-the-art accuracy on various standard benchmarks.
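The divided spatial/temporal self-attention that an STA-style module performs to save computation can be sketched as follows, assuming patch tokens shaped (batch, frames, patches, dim). This is a generic factorized-attention block, not a reproduction of MSTAT's proxy embedding modules.

```python
import torch
import torch.nn as nn

class DividedSpaceTimeAttention(nn.Module):
    """Self-attention factorized along spatial and temporal axes, in the spirit of
    Spatial-Temporal Aggregation: attend over P patches within each frame, then over
    T frames at each patch position, instead of over all T*P tokens at once."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                # x: (B, T, P, D)
        b, t, p, d = x.shape
        xs = x.reshape(b * t, p, d)                      # spatial attention within each frame
        xs, _ = self.spatial(xs, xs, xs)
        x = x + xs.reshape(b, t, p, d)
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)  # temporal attention per patch position
        xt, _ = self.temporal(xt, xt, xt)
        x = x + xt.reshape(b, p, t, d).permute(0, 2, 1, 3)
        return x
```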
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with them. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotating entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
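One way to make the PGT idea concrete: approximate the ceiling from inter-rater agreement on the same images. The sketch below assumes binary segmentation masks and mean pairwise Dice as the reliability measure; the paper's exact estimator may differ.

```python
import numpy as np

def dice(a, b, eps=1e-8):
    """Dice overlap between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return (2.0 * np.logical_and(a, b).sum() + eps) / (a.sum() + b.sum() + eps)

def approximate_pgt(rater_masks):
    """Mean pairwise inter-rater Dice over annotations of the same image. Intuition:
    once a model's similarity to one reference exceeds what the raters achieve among
    themselves, further gains may no longer reflect real-world performance."""
    scores = [dice(rater_masks[i], rater_masks[j])
              for i in range(len(rater_masks))
              for j in range(i + 1, len(rater_masks))]
    return float(np.mean(scores))

# Usage sketch: compare a model's Dice against this ceiling rather than against 1.0.
# pgt = approximate_pgt([mask_rater1, mask_rater2, mask_rater3])
```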
We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering. Motivated by the fact that informative point cloud features should be able to encode rich geometry and appearance cues and render realistic images, we train a point-cloud encoder within a devised point-based neural renderer by comparing the rendered images with real images on massive RGB-D data. The learned point-cloud encoder can be easily integrated into various downstream tasks, including not only high-level tasks like 3D detection and segmentation, but also low-level tasks like 3D reconstruction and image synthesis. Extensive experiments on various tasks demonstrate the superiority of our approach compared to existing pre-training methods.
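A high-level sketch of the pretraining loop described above is given below. `encoder` and `renderer` are hypothetical module interfaces standing in for the point-cloud encoder and the point-based neural renderer; the photometric L1 loss is an assumption.

```python
import torch
import torch.nn.functional as F

def pretrain_step(encoder, renderer, optimizer, points, rgb_image, camera):
    """One self-supervised step: encode the point cloud, render it from the known
    camera, and supervise with the paired real RGB image.

    encoder:  maps (N, 3 + c) points to per-point features   -- hypothetical module
    renderer: differentiable point-based renderer             -- hypothetical module
    """
    feats = encoder(points)                       # per-point geometry/appearance features
    rendered = renderer(points, feats, camera)    # (3, H, W) image from the given view
    loss = F.l1_loss(rendered, rgb_image)         # compare with the real RGB-D frame
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```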
Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services which require low delay and high accuracy. Sampling rate adaptation, which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is key to minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaptation, inference task offloading, and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since the CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results are provided to demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with a high probability.
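The Lyapunov step that turns the long-term accuracy constraint into a per-slot term can be sketched with a virtual queue and a drift-plus-penalty style reward, as below. The trade-off weight V and the accuracy threshold are placeholders, not the paper's parameters.

```python
def lyapunov_step(queue, accuracy, accuracy_min, delay, V=50.0):
    """Drift-plus-penalty style reward shaping for the CMDP-to-MDP transformation.

    queue:        virtual queue tracking accumulated accuracy-constraint violation
    accuracy:     achieved inference accuracy in this slot
    accuracy_min: long-term accuracy requirement of the service
    delay:        service delay incurred in this slot (the penalty to minimize)
    V:            weight trading off delay against constraint satisfaction
    Returns the updated queue and a scalar reward for the RL agent (to maximize).
    """
    reward = -(V * delay + queue * (accuracy_min - accuracy))
    queue = max(queue + accuracy_min - accuracy, 0.0)   # virtual queue update
    return queue, reward

# Usage sketch inside an RL loop (per inference service):
# q, r = lyapunov_step(q, acc_t, 0.9, delay_t)
```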
Traditional statistical inference is static, in the sense that the estimate of the quantity of interest does not affect the future evolution of the quantity. In some sequential estimation problems, however, the future values of the quantity to be estimated depend on the estimate of its current value. This type of estimation problem has been formulated as the dynamic inference problem. In this work, we formulate the Bayesian learning problem for dynamic inference, where the unknown quantity-generation model is assumed to be randomly drawn according to a random model parameter. We derive the optimal Bayesian learning rules, both offline and online, to minimize the inference loss. Moreover, learning for dynamic inference can serve as a meta problem, such that all familiar machine learning problems, including supervised learning, imitation learning, and reinforcement learning, can be cast as its special cases or variants. Gaining a good understanding of this unifying meta problem thus sheds light on a broad spectrum of machine learning problems as well.
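To make the setting concrete, here is a toy sketch of online Bayesian learning for dynamic inference: nature draws one of several candidate generation models, the future value depends on the learner's current estimate, and the learner keeps a posterior over the candidates and predicts with the posterior mean. The linear-Gaussian family and squared loss are assumptions for illustration only.

```python
import numpy as np

class LinearGaussianModel:
    """Toy quantity-generation model whose next value depends on the current value
    and on the learner's current estimate: x_{t+1} = a*x_t + b*est_t + noise."""
    def __init__(self, a, b, sigma=0.5):
        self.a, self.b, self.sigma = a, b, sigma
    def mean(self, x, est):
        return self.a * x + self.b * est
    def sample(self, x, est, rng):
        return self.mean(x, est) + self.sigma * rng.normal()
    def likelihood(self, x_next, x, est):
        z = (x_next - self.mean(x, est)) / self.sigma
        return np.exp(-0.5 * z * z) / (self.sigma * np.sqrt(2.0 * np.pi))

def online_bayes_dynamic_inference(candidates, prior, horizon=200, seed=0):
    """Nature draws one candidate model at random; the learner maintains a posterior
    over the candidates, estimates the next value by the posterior-weighted mean
    (Bayes-optimal under squared loss), and updates the posterior after observing it."""
    rng = np.random.default_rng(seed)
    post = np.array(prior, dtype=float)
    true_model = candidates[rng.integers(len(candidates))]
    x_prev, est_prev, losses = 0.0, 0.0, []
    for _ in range(horizon):
        est = sum(w * m.mean(x_prev, est_prev) for w, m in zip(post, candidates))
        x = true_model.sample(x_prev, est_prev, rng)     # dynamics depend on the past estimate
        losses.append((est - x) ** 2)
        post *= np.array([m.likelihood(x, x_prev, est_prev) for m in candidates])
        post /= post.sum()
        x_prev, est_prev = x, est
    return post, float(np.mean(losses))

# posterior, avg_loss = online_bayes_dynamic_inference(
#     [LinearGaussianModel(0.9, 0.0), LinearGaussianModel(0.5, 0.3)], prior=[0.5, 0.5])
```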
Most Graph Neural Networks follow the message-passing paradigm, assuming the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are always incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explored. We propose PRI-GSL, a Graph Structure Learning framework guided by the Principle of Relevant Information, providing a simple and unified framework for identifying self-organization and revealing the hidden structure. PRI-GSL learns a structure that contains the most relevant yet least redundant information, quantified by von Neumann entropy and Quantum Jensen-Shannon divergence. PRI-GSL incorporates the evolution of quantum continuous walk with graph wavelets to encode node structural roles, showing how the nodes interplay and self-organize with the graph structure. Extensive experiments demonstrate the superior effectiveness and robustness of PRI-GSL.
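The two information quantities mentioned above can be computed for small graphs as follows, using the trace-normalized Laplacian as the density matrix (a common convention; PRI-GSL's exact construction may differ).

```python
import numpy as np

def density_matrix(adj):
    """Trace-normalized graph Laplacian, treated as a quantum density matrix."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return lap / np.trace(lap)

def von_neumann_entropy(adj, eps=1e-12):
    """S(rho) = -sum_i lambda_i * log(lambda_i) over eigenvalues of the density matrix."""
    lam = np.linalg.eigvalsh(density_matrix(adj))
    lam = lam[lam > eps]
    return float(-(lam * np.log(lam)).sum())

def qjs_divergence(adj_a, adj_b, eps=1e-12):
    """Quantum Jensen-Shannon divergence between two graphs on the same node set:
    entropy of the averaged density matrix minus the average of the entropies."""
    mix = (density_matrix(adj_a) + density_matrix(adj_b)) / 2.0
    lam = np.linalg.eigvalsh(mix)
    lam = lam[lam > eps]
    s_mix = float(-(lam * np.log(lam)).sum())
    return s_mix - 0.5 * (von_neumann_entropy(adj_a) + von_neumann_entropy(adj_b))
```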