For head and neck cancer (HNC) patient management, automatic gross tumor volume (GTV) segmentation and accurate pre-treatment cancer recurrence prediction are important for assisting physicians in designing personalized management plans, which has the potential to improve the treatment outcomes and quality of life of HNC patients. In this paper, we develop a method for automatic segmentation of the primary tumor (GTVp) and lymph nodes (GTVn) based on combined pre-treatment positron emission tomography/computed tomography (PET/CT) scans of HNC patients. We extract radiomics features from the segmented tumor volumes and construct a multi-modality recurrence-free survival (RFS) prediction model that fuses the predictions of separate CT radiomics, PET radiomics, and clinical models. We perform 5-fold cross-validation to train and evaluate our methods on the MICCAI 2022 HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR) dataset. The ensemble prediction on the test cohort achieved 0.77 and 0.73 for GTVp and GTVn segmentation, respectively, and a C-index of 0.67 for RFS prediction. The code is publicly available (https://github.com/wangkaiwan/hecktor-2022-airt). Our team name is AIRT.
translated by Google Translate
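The abstract does not specify the fusion rule used to combine the CT radiomics, PET radiomics, and clinical models; a minimal late-fusion sketch, assuming a simple weighted average of per-model risk scores (the model names and uniform default weights below are illustrative, not from the paper):

```python
import numpy as np

def fuse_risk_scores(model_scores, weights=None):
    """Late-fuse per-model recurrence risk scores by a weighted average.

    model_scores: dict mapping model name -> per-patient risk scores.
    weights: optional dict of fusion weights (defaults to uniform).
    """
    names = sorted(model_scores)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}
    total = sum(weights[n] for n in names)
    fused = sum(weights[n] * np.asarray(model_scores[n], dtype=float) for n in names)
    return fused / total

# Hypothetical risk scores for two patients from three single-modality models.
scores = {
    "ct_radiomics": [0.2, 0.8],
    "pet_radiomics": [0.4, 0.6],
    "clinical": [0.3, 0.7],
}
print(fuse_risk_scores(scores))  # -> [0.3 0.7]
```

A C-index would then be computed on the fused scores, so any monotone rescaling of the individual model outputs before fusion leaves the ranking-based metric unchanged only if applied consistently across patients.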
Deep learning has been widely used in medical image segmentation and related tasks. However, the performance of existing medical image segmentation models is limited by the difficulty of obtaining a sufficient amount of high-quality data. To overcome this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In our model, medical text annotations are introduced to compensate for quality deficiencies in the image data. In addition, the text information can guide the generation of pseudo labels to a certain extent, further guaranteeing the quality of pseudo labels in semi-supervised learning. We also propose an Exponential Pseudo-label Iteration mechanism (EPI) to help extend the semi-supervised version of LViT, and a Pixel-Level Attention Module (PLAM) to preserve the local features of images. In our model, an LV (Language-Vision) loss is designed to supervise the training on unlabeled images directly with text information. To validate the performance of LViT, we construct multimodal medical segmentation datasets (image + text) containing pathology images, X-rays, etc. Experimental results show that our proposed LViT achieves better segmentation performance in both fully-supervised and semi-supervised settings. Code and datasets are available at https://github.com/huanglizi/lvit.
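The abstract does not give the exact form of the Exponential Pseudo-label Iteration mechanism; one common reading of "exponential iteration" is an exponential moving average over successive pseudo-label predictions, sketched below under that assumption (the `beta` momentum parameter is illustrative):

```python
import numpy as np

def epi_update(prev_pseudo, new_pred, beta=0.9):
    """One step of an exponential moving average over pseudo-label maps.

    prev_pseudo, new_pred: arrays of per-pixel class probabilities.
    beta: momentum; larger values smooth pseudo labels more across iterations,
    damping the noise of any single model prediction.
    """
    return beta * np.asarray(prev_pseudo) + (1.0 - beta) * np.asarray(new_pred)

prev = np.array([0.5, 0.5])   # previous pseudo-label probabilities for one pixel
pred = np.array([1.0, 0.0])   # current model prediction
print(epi_update(prev, pred, beta=0.9))  # -> [0.55 0.45]
```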
The Chinese character riddle is a challenging riddle game whose solution is a single character. A riddle describes the pronunciation, shape, and meaning of the solution character using rhetorical techniques. In this paper, we present a Chinese character riddle dataset covering the majority of common simplified Chinese characters, built by crawling riddles from the web and generating brand-new ones. In the generation stage, we feed the phonetic alphabet (pinyin), glyph decompositions, and explanations of the solution characters into a generation model and obtain multiple riddle descriptions for each candidate character. The generated riddles are then manually filtered, and the final dataset, CC-Riddle, consists of both human-written riddles and filtered generated riddles. Furthermore, we build a character riddle QA system on the dataset and find that existing models struggle to solve such tricky questions. CC-Riddle is now publicly available.
Detecting beneficial feature interactions is essential in recommender systems, and existing methods achieve this by examining all possible feature interactions. However, the cost of examining all possible higher-order feature interactions is prohibitive (growing exponentially with the order). Consequently, existing methods only detect beneficial feature interactions of limited order (e.g., combinations of at most four features), potentially missing beneficial interactions above that limit. In this paper, we propose a hypergraph neural network based model named HIRS. HIRS is the first work to directly generate beneficial feature interactions of arbitrary order and make recommendation predictions accordingly. The number of generated feature interactions can be specified to be far smaller than the number of all possible interactions, so our model admits a much lower running time. To achieve an effective algorithm, we exploit three properties of beneficial feature interactions and propose a deep-infomax-based method to guide the interaction generation. Our experimental results show that HIRS outperforms state-of-the-art algorithms in recommendation accuracy.
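The exponential cost claim above can be made concrete by counting the combinations an exhaustive method would have to examine; a small illustrative calculation (the feature counts are arbitrary examples):

```python
from math import comb

def num_interactions(n_features, max_order):
    """Number of feature combinations of order 2 up to max_order."""
    return sum(comb(n_features, k) for k in range(2, max_order + 1))

# Exhaustive enumeration grows rapidly with the order, even for 100 features:
print(num_interactions(100, 2))  # -> 4950
print(num_interactions(100, 4))  # -> 4087875 (2-, 3-, and 4-way combinations)
```

This is why methods that enumerate interactions cap the order at a small constant, and why generating a fixed budget of candidate interactions, as HIRS does, sidesteps the blow-up.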
Skull stripping is a crucial prerequisite step in the analysis of brain magnetic resonance images (MRI). Although many excellent works and tools have been proposed, they suffer from low generalization capability. For instance, a model trained on a dataset with specific imaging parameters cannot be applied well to other datasets with different imaging parameters. In particular, for lifespan datasets, a model trained on an adult dataset is not applicable to an infant dataset due to the large domain difference. To address this issue, numerous methods have been proposed, of which domain adaptation based on feature alignment is the most common. Unfortunately, this approach has inherent shortcomings: it needs to be retrained for each new domain and requires concurrent access to the input images of both domains. In this paper, we design a plug-and-play shape refinement (PSR) framework for multi-site and lifespan skull stripping. To deal with the domain shift between multi-site lifespan datasets, we take advantage of the brain shape prior, which is invariant to imaging parameters and ages. Experiments demonstrate that our framework outperforms state-of-the-art methods on multi-site lifespan datasets.
Recent advances in computationally efficient non-myopic Bayesian optimization (BO) improve the query efficiency of traditional myopic methods, such as expected improvement, with only a modest increase in computational cost. These advances have, however, largely been limited to unconstrained optimization. For constrained optimization, the few existing non-myopic BO methods require heavy computation. For example, one existing non-myopic constrained BO method [Lam and Willcox, 2017] relies on computationally expensive, unreliable, brute-force derivative-free optimization of a Monte Carlo rollout acquisition function. Methods that use the reparameterization trick for more efficient derivative-based optimization of acquisition functions in the unconstrained setting, such as sample average approximation and infinitesimal perturbation analysis, do not extend: constraints introduce discontinuities in the sampled acquisition function surface that hinder its optimization. Moreover, we argue that being non-myopic is even more important in constrained problems, because the fear of violating constraints pushes myopic methods away from the boundary between feasible and infeasible regions, slowing the discovery of optimal solutions with tight constraints. In this paper, we propose a computationally efficient two-step lookahead constrained Bayesian optimization acquisition function (2-OPT-C) supporting both sequential and batch settings. To enable fast acquisition function optimization, we develop a novel likelihood-ratio-based unbiased estimator of the gradient of the two-step optimal acquisition function that does not use the reparameterization trick. In numerical experiments, 2-OPT-C typically improves query efficiency over previous methods by 2x or more, and in some cases by 10x or more.
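The 2-OPT-C estimator itself is specific to the two-step acquisition function, but the underlying idea, differentiating the sampling density rather than the samples, is the generic likelihood-ratio (score-function) estimator. A minimal sketch for a Gaussian mean parameter, with a discontinuous objective of the kind that breaks the reparameterization trick (the objective and parameter values are illustrative, not from the paper):

```python
import numpy as np

def lr_gradient_estimate(f, theta, sigma=1.0, n_samples=100_000, seed=0):
    """Likelihood-ratio estimate of d/dtheta E_{x~N(theta, sigma^2)}[f(x)].

    Unlike the reparameterization trick, this differentiates the log-density,
    so f may be discontinuous (as with constraint-violation indicators).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, sigma, n_samples)
    score = (x - theta) / sigma**2  # d/dtheta log N(x; theta, sigma^2)
    return np.mean(f(x) * score)

# Discontinuous objective: probability of exceeding a threshold.
grad = lr_gradient_estimate(lambda x: (x > 1.0).astype(float), theta=0.0)
# The exact gradient is the standard normal pdf at 1.0, roughly 0.242.
print(grad)
```

The estimator is unbiased for any integrable `f`, at the cost of higher variance than reparameterized gradients when both are applicable.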
Building height estimation is important in many applications, such as 3D city reconstruction, urban planning, and navigation. Recently, a new building height estimation method using street scene images and 2D maps was proposed. This method is more scalable than traditional methods that rely on high-resolution optical data, LiDAR data, or RADAR data, which are expensive to obtain. The method needs to detect building rooflines and then compute building heights via the pinhole camera model. We observe that this method has limitations in handling complex street scene images in which buildings overlap with each other and rooflines are difficult to locate. We propose CBHE, a building height estimation algorithm that considers both building corners and rooflines. CBHE first obtains building corner and roofline candidates in street scene images based on building footprints from 2D maps and the camera parameters. We then use a deep neural network named BuildingNet to classify and filter the corner and roofline candidates. Based on the valid corners and rooflines, CBHE computes building heights via the pinhole camera model. Experimental results show that the proposed BuildingNet achieves higher accuracy for building corner and roofline candidate filtering than state-of-the-art open-set classifiers. Meanwhile, CBHE outperforms the baseline algorithm by over 10% in building height estimation accuracy.
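The pinhole-camera step mentioned above reduces to similar triangles once the facade distance is known from the 2D map. A minimal sketch, assuming a level camera (zero pitch) and a roofline pixel measured relative to the principal point; the function and parameter names are illustrative, not CBHE's actual interface:

```python
def building_height(d, v_roof_px, focal_px, camera_height):
    """Estimate building height via the pinhole camera model.

    d: horizontal distance from camera to building facade (from the 2D map).
    v_roof_px: vertical pixel offset of the roofline above the principal point.
    focal_px: focal length in pixels; camera assumed level (zero pitch).
    camera_height: height of the camera above the ground.
    """
    # Similar triangles: the roof lies (d * v / f) above the camera's optical axis.
    return camera_height + d * v_roof_px / focal_px

# A building 20 m away whose roofline projects 500 px above the principal point:
print(building_height(d=20.0, v_roof_px=500, focal_px=1000, camera_height=2.5))  # -> 12.5
```

Locating `v_roof_px` reliably is exactly the hard part in cluttered scenes, which is what motivates filtering candidates with BuildingNet before applying this formula.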
Precise and rapid classification of images in the B-scan ultrasound modality is crucial for diagnosing ocular diseases. However, distinguishing various diseases in ultrasound still challenges experienced ophthalmologists. Therefore, in this work we develop a novel contrastive disentangled network (CDNet) to tackle the fine-grained image classification (FGIC) challenges of ocular abnormalities in ultrasound images, including intraocular tumor (IOT), retinal detachment (RD), posterior scleral staphyloma (PSS), and vitreous hemorrhage (VH). The three essential components of CDNet are the weakly-supervised lesion localization module (WSLL), the contrastive multi-zoom (CMZ) strategy, and the hyperspherical contrastive disentangled loss (HCD-Loss). These components facilitate feature disentanglement for fine-grained recognition in both the input and output aspects. The proposed CDNet is validated on our ZJU Ocular Ultrasound Dataset (ZJUOUSD), consisting of 5213 samples. Furthermore, the generalization ability of CDNet is validated on two public and widely-used chest X-ray FGIC benchmarks. Quantitative and qualitative results demonstrate the efficacy of our proposed CDNet, which achieves state-of-the-art performance in the FGIC task. Code is available at: https://github.com/zeroonegame/cdnet-for-ous-fgic.
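The exact form of HCD-Loss is not given in the abstract; the "hyperspherical contrastive" part suggests a supervised contrastive objective on L2-normalized embeddings, sketched below under that assumption (a generic supervised contrastive loss, not CDNet's actual implementation):

```python
import numpy as np

def sup_contrastive_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss on L2-normalized ("hyperspherical") embeddings.

    Same-class samples are pulled together and different-class samples pushed
    apart on the unit sphere.
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # project onto unit sphere
    labels = np.asarray(labels)
    sim = z @ z.T / tau
    loss, count = 0.0, 0
    for i in range(len(z)):
        others = [a for a in range(len(z)) if a != i]
        denom = np.sum(np.exp(sim[i, others]))
        for p in (a for a in others if labels[a] == labels[i]):
            loss += -np.log(np.exp(sim[i, p]) / denom)
            count += 1
    return loss / max(count, 1)

# Two well-separated classes yield a lower loss than the same points mislabeled.
z = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]]
print(sup_contrastive_loss(z, [0, 0, 1, 1]) < sup_contrastive_loss(z, [0, 1, 0, 1]))
```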
Interacting particle or agent systems that display a rich variety of swarming behaviours are ubiquitous in science and engineering. A fundamental and challenging goal is to understand the link between individual interaction rules and swarming. In this paper, we study the data-driven discovery of a second-order particle swarming model that describes the evolution of $N$ particles in $\mathbb{R}^d$ under radial interactions. We propose a learning approach that models the latent radial interaction function as a Gaussian process, which can simultaneously fulfill two inference goals: one is the nonparametric inference of the interaction function with pointwise uncertainty quantification, and the other is the inference of unknown scalar parameters in the non-collective friction forces of the system. We formulate the learning problem as a statistical inverse problem and provide a detailed analysis of recoverability conditions, establishing that a coercivity condition is sufficient for recoverability. Given data collected from $M$ i.i.d. trajectories with independent Gaussian observational noise, we provide a finite-sample analysis, showing that our posterior mean estimator converges in a reproducing kernel Hilbert space norm at an optimal rate in $M$ equal to that of classical one-dimensional kernel ridge regression. As a byproduct, we show that we can obtain a parametric learning rate in $M$ for the posterior marginal variance in the $L^{\infty}$ norm, and the rate may also involve $N$ and $L$ (the number of observation time instances per trajectory), depending on the condition number of the inverse problem. Numerical results on systems that exhibit different swarming behaviors demonstrate efficient learning by our approach from scarce, noisy trajectory data.
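The posterior mean estimator discussed above coincides with kernel ridge regression; a minimal one-dimensional sketch recovering a latent radial interaction function from noisy pointwise samples (the squared-exponential kernel, length scale, and test function below are illustrative choices, not the paper's setup):

```python
import numpy as np

def rbf(a, b, ell=0.5):
    """Squared-exponential kernel matrix between 1D point sets a and b."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell**2))

def gp_posterior_mean(x_train, y_train, x_test, noise=0.1, ell=0.5):
    """GP regression posterior mean (equivalent to kernel ridge regression)."""
    K = rbf(x_train, x_train, ell) + noise**2 * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf(x_test, x_train, ell) @ alpha

# Recover a latent interaction function phi(r) = -1/r + 1 from noisy samples.
rng = np.random.default_rng(0)
r = np.linspace(0.5, 3.0, 40)
y = -1.0 / r + 1.0 + 0.05 * rng.normal(size=r.size)
r_test = np.array([1.0, 2.0])
print(gp_posterior_mean(r, y, r_test))  # close to the true values [0.0, 0.5]
```

In the paper's setting the regression inputs are pairwise distances along observed trajectories rather than a uniform grid, which is where the coercivity condition enters.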
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot benefit, or benefit only marginally, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS-token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, setting a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way to develop small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as most previous works do. Code is available at https://github.com/OliverRensu/TinyMIM.
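Finding 1) above, distilling token relations, can be sketched as matching softmax-normalized token-to-token similarity maps between teacher and student via cross-entropy; this is a generic illustration of the idea, not TinyMIM's exact recipe (which operates on attention-derived relations inside the Transformer):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_relation_loss(student_tokens, teacher_tokens, tau=1.0):
    """Cross-entropy between teacher and student token-relation distributions.

    Each row of the relation map is a softmax over one token's similarities
    to all tokens; the student is trained to match the teacher's rows.
    """
    def relations(tokens):
        t = np.asarray(tokens, dtype=float)
        return softmax(t @ t.T / tau, axis=-1)

    rs, rt = relations(student_tokens), relations(teacher_tokens)
    return -np.mean(np.sum(rt * np.log(rs + 1e-9), axis=-1))

# A student whose token relations match the teacher's incurs a lower loss.
s = [[1.0, 0.0], [0.0, 1.0]]
t = [[1.0, 0.0], [0.0, 1.0]]
mismatched = [[0.0, 1.0], [0.0, -1.0]]
print(token_relation_loss(s, t) < token_relation_loss(mismatched, t))  # -> True
```

Because the loss depends only on relative token similarities, it transfers structure without forcing the student's feature dimension to match the teacher's.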