The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants. Only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
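The two most common workarounds for oversized samples mentioned above, patch-based processing and downsampling, can be illustrated with a minimal sketch. It is not taken from the survey or from any participant's code: the function names `patch_origins` and `downsampled_shape`, the patch size, and the stride are hypothetical choices made for illustration, and only index arithmetic is shown (no image data is loaded).

```python
from itertools import product


def patch_origins(shape, patch, stride):
    """Return the corner index of every patch that fits fully inside `shape`.

    Hypothetical helper: patches that would overhang the border are skipped,
    so a real pipeline would usually pad the volume first.
    """
    axes = [range(0, dim - p + 1, s) for dim, p, s in zip(shape, patch, stride)]
    return list(product(*axes))


def downsampled_shape(shape, factor):
    """Shape obtained by keeping every `factor`-th sample along each axis."""
    return tuple(-(-dim // factor) for dim in shape)  # ceiling division


# Example: a 64x64x64 volume tiled into non-overlapping 32x32x32 patches.
origins = patch_origins((64, 64, 64), (32, 32, 32), (32, 32, 32))
print(len(origins), "patches; first corner:", origins[0])
print("shape after 2x downsampling:", downsampled_shape((64, 64, 64), 2))
```

Choosing a stride smaller than the patch size yields overlapping patches, which trades more computation for smoother predictions at patch borders.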
Translated by Google Translate
Although weakly-supervised techniques can reduce the labeling effort, it is unclear whether a saliency model trained with weakly-supervised data (e.g., point annotations) can match the performance of its fully-supervised version. This paper attempts to answer this unexplored question by proving a hypothesis: there exists a point-labeled dataset on which saliency models can be trained to achieve performance equivalent to that of models trained on the densely annotated dataset. To prove this conjecture, we propose a novel yet effective adversarial trajectory-ensemble active learning (ATAL) method. Our contributions are three-fold: 1) Our proposed adversarial attack triggering uncertainty can overcome the overconfidence of existing active learning methods and accurately locate uncertain pixels. 2) Our proposed trajectory-ensemble uncertainty estimation method maintains the advantages of ensemble networks while significantly reducing the computational cost. 3) Our proposed relationship-aware diversity sampling algorithm can overcome oversampling while boosting performance. Experimental results show that ATAL can find such a point-labeled dataset, on which a trained saliency model obtains 97% to 99% of the performance of its fully-supervised version with only ten annotated points per image.
Person re-identification (Re-ID) plays a significant role in realistic scenarios due to its various applications in public security and video surveillance. Recently, supervised or semi-unsupervised learning paradigms, which benefit from large-scale datasets and strong computing performance, have achieved competitive performance on a specific target domain. However, when Re-ID models are directly deployed in a new domain without target samples, they suffer from considerable performance degradation and poor domain generalization. To address this challenge, we propose a Deep Multimodal Fusion network that exploits rich semantic knowledge to assist representation learning during pre-training. Importantly, a multimodal fusion strategy is introduced to translate the features of different modalities into a common space, which can significantly boost the generalization capability of the Re-ID model. In the fine-tuning stage, a realistic dataset is adopted to fine-tune the pre-trained model for better distribution alignment with real-world data. Comprehensive experiments on benchmarks demonstrate that our method significantly outperforms previous domain generalization or meta-learning methods by a clear margin. Our source code will also be publicly available at https://github.com/JeremyXSC/DMF.
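Analog and mixed-signal (AMS) circuit design still relies on human design expertise. Machine learning has been assisting circuit design automation by replacing human experience with artificial intelligence. This paper presents TAG, a new paradigm for learning circuit representations from layouts that leverages text, self-attention, and graphs. The embedding network model learns spatial information without manual labeling. We introduce text embedding and a self-attention mechanism to AMS circuit learning. Experimental results demonstrate the ability to predict layout distances between instances on industrial FinFET technology benchmarks. The effectiveness of the circuit representation is verified in case studies by showing its transferability to three other learning tasks with limited data: layout matching prediction, wirelength estimation, and net parasitic capacitance prediction.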
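Arbitrary-oriented object detection (AOOD) plays an important role in image understanding for remote sensing scenarios. Existing AOOD methods face the challenges of ambiguity and high cost in angle representation. To this end, a multi-grained angle representation (MGAR) method is proposed, consisting of coarse-grained angle classification (CAC) and fine-grained angle regression (FAR). Specifically, the designed CAC avoids the ambiguity of angle prediction through discrete angular encoding (DAE) and reduces complexity by coarsening the granularity of DAE. Building on CAC, FAR is developed to refine the angle prediction at a much lower cost than narrowing the DAE granularity. In addition, an Intersection over Union (IoU)-aware FAR loss (IFL) is designed to improve the accuracy of angle prediction through an IoU-guided adaptive re-weighting mechanism. Extensive experiments on several public remote sensing datasets demonstrate the effectiveness of the proposed MGAR. Moreover, experiments on embedded devices show that the proposed MGAR is also well suited to lightweight deployment.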
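The coarse-then-fine idea described above can be sketched as a simple encode/decode pair. This is only an illustration under stated assumptions, not the MGAR formulation: the 15-degree bin width, the [0, 180) angle range, the normalized offset, and the function names `encode_angle` and `decode_angle` are all hypothetical.

```python
def encode_angle(theta_deg, bin_width=15.0):
    """Split an angle into a coarse bin index and a fine offset in [0, 1).

    Assumption: angles are periodic over 180 degrees, and the fine target is
    the position inside the bin, normalized by the bin width.
    """
    theta = theta_deg % 180.0
    coarse = int(theta // bin_width)              # classification target
    fine = (theta - coarse * bin_width) / bin_width  # regression target
    return coarse, fine


def decode_angle(coarse, fine, bin_width=15.0):
    """Recombine a coarse bin index and a fine offset into an angle."""
    return (coarse * bin_width + fine * bin_width) % 180.0


coarse, fine = encode_angle(37.5)
print(coarse, fine, decode_angle(coarse, fine))
```

With a 15-degree bin the classifier only has to separate 12 classes instead of, say, 180 one-degree classes, while the regressed offset recovers the resolution lost by coarsening.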
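Although parameter-efficient tuning (PET) methods have shown great potential on natural language processing (NLP) tasks, their effectiveness with large-scale ConvNets on computer vision (CV) tasks remains under-studied. This paper proposes Conv-Adapter, a PET module designed for ConvNets. Conv-Adapter is lightweight, domain-transferable, and architecture-agnostic, with generalized performance across different tasks. When transferring to downstream tasks, Conv-Adapter learns task-specific feature modulation of the backbone's intermediate representations while keeping the pre-trained parameters frozen. By introducing only a small number of learnable parameters, e.g., only 3.5% of the full fine-tuning parameters of ResNet50, Conv-Adapter outperforms previous PET baseline methods and achieves performance comparable to or better than full fine-tuning on 23 classification tasks from various domains. It also shows superior performance on few-shot classification, with an average margin of 3.39%. Beyond classification, Conv-Adapter generalizes to detection and segmentation tasks, with more than a 50% reduction in parameters but performance comparable to traditional full fine-tuning.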
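Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. Currently, however, popular SSL evaluation protocols are often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address these issues, we construct a Unified SSL Benchmark (USB) by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing (Audio), on which we systematically evaluate the dominant SSL methods. We also open-source a modular and extensible codebase for fair evaluation of these SSL methods. We further provide pre-trained versions of state-of-the-art neural models for CV tasks to make further tuning affordable. USB enables the evaluation of a single SSL algorithm on more tasks from multiple domains at lower cost. Specifically, on a single NVIDIA V100, only 37 GPU days are required to evaluate FixMatch on the 15 tasks in USB, whereas 335 GPU days (279 GPU days on the 4 CV datasets other than ImageNet) are needed on 5 CV tasks with the typical protocol.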
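We study the backward-compatibility problem for person re-identification (Re-ID), which aims to constrain the features of an updated new model to be comparable with the existing features from the old model in the gallery. Most existing works adopt distillation-based methods, which focus on pushing new features to imitate the old ones. However, distillation-based methods are intrinsically sub-optimal, since they force the new feature space to imitate the old feature space. To address this issue, we propose Ranking-based Backward Compatible Learning (RBCL), which directly optimizes the ranking metric between new features and old features. Unlike previous methods, RBCL only pushes the new features to find the best-ranking positions in the old feature space rather than enforcing strict alignment, and it is in line with the ultimate goal of backward retrieval. However, the sharp sigmoid function used to make the ranking metric differentiable also causes a vanishing-gradient problem, which stalls the ranking refinement in the later stages of training. To address this, we propose Dynamic Gradient Reactivation (DGR), which reactivates the suppressed gradients by adding a dynamically computed constant during the forward step. To further help target the best-ranking positions, we include Neighbor Context Agents (NCAs) to approximate the entire old feature space during training. Unlike previous works, which are tested only on in-domain settings, we make the first attempt to introduce cross-domain settings (both supervised and unsupervised), which are more meaningful and more difficult. Experimental results on all five settings show that the proposed RBCL outperforms previous state-of-the-art methods by large margins.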
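The vanishing-gradient effect of a sharp sigmoid mentioned above can be seen in a small numerical sketch. It shows a generic sigmoid-based soft rank, which is an assumption made for illustration and not the RBCL objective; DGR itself is not implemented because the abstract does not specify how its constant is computed. The names `soft_rank`, `sigmoid_slope`, and the temperature `tau` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_rank(scores, i, tau):
    """Differentiable stand-in for the rank of scores[i] (1 = highest score).

    Each pairwise comparison 'scores[j] > scores[i]' is replaced by a sigmoid
    with temperature tau; a smaller tau gives a sharper, more step-like curve.
    """
    return 1.0 + sum(
        sigmoid((s - scores[i]) / tau) for j, s in enumerate(scores) if j != i
    )


def sigmoid_slope(x, tau):
    """Derivative of sigmoid(x / tau) with respect to x."""
    s = sigmoid(x / tau)
    return s * (1.0 - s) / tau


scores = [0.9, 0.1, 0.5]
print([round(soft_rank(scores, i, tau=0.01)) for i in range(len(scores))])
print(sigmoid_slope(0.0, 0.01), sigmoid_slope(0.4, 0.01))
```

With a small `tau` the soft ranks closely match the true ranks, but the slope is essentially zero for any pair whose scores are not nearly tied, so such pairs stop contributing gradient. This is the suppression that the abstract says DGR is designed to counteract.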
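Currently, the dominant paradigm in knowledge transfer learning is to pretrain a model on a large-scale natural scene dataset under supervised learning and then fine-tune it on a small amount of task-specific labeled data. It has reached the status of a consensus solution for task-aware model training in the remote sensing domain (RSD). Unfortunately, because of the different categories of imaging data and the severe challenges of data annotation, there is no sufficiently large and uniform remote sensing dataset to support large-scale pretraining in RSD. Moreover, pretraining models on large-scale natural scene datasets by supervised learning and then directly fine-tuning them on diverse downstream tasks appears to be a crude approach, which is easily affected by inevitable labeling noise, severe domain gaps, and task-aware discrepancies. Therefore, in this paper, a concise and effective knowledge transfer learning strategy called ConSecutive PreTraining (CSPT) is proposed, based on the idea from natural language processing (NLP) of not stopping pretraining. It can gradually bridge the domain gap and transfer knowledge from the natural scene domain to the RSD. The proposed CSPT can also release the huge potential of unlabeled data for task-aware model training. Finally, extensive experiments are carried out on twelve datasets in RSD involving three types of downstream tasks (scene classification, object detection, and land cover classification) and two types of imaging data (optical and SAR). The results show that, by using the proposed CSPT for task-aware model training, almost all downstream tasks in RSD can outperform the previous approach of supervised pretraining followed by fine-tuning, and can even surpass state-of-the-art (SOTA) performance without any expensive labeling or careful model design.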
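This paper proposes a gaze correction and animation method for high-resolution, unconstrained portrait images, which can be trained without gaze angle and head pose annotations. Common gaze correction methods usually require training data annotated with precise gaze and head pose information. Solving this problem with an unsupervised method remains an open problem, especially for high-resolution face images in the wild, which are not easy to annotate with gaze and head pose labels. To address this issue, we first create two new portrait datasets: CelebGaze and the high-resolution CelebHQGaze. Second, we formulate the gaze correction task as an image inpainting problem, addressed using a Gaze Correction Module (GCM) and a Gaze Animation Module (GAM). Moreover, we propose an unsupervised training strategy, Synthesis-As-Training, to learn the correlation between eye region features and the gaze angle. As a result, we can use the learned latent space for gaze animation. In addition, to reduce the memory and computational costs in the training and inference stages, we propose a Coarse-to-Fine Module (CFM) integrated with GCM and GAM. Extensive experiments validate the effectiveness of our method for both the gaze correction and gaze animation tasks on low- and high-resolution face datasets in the wild, and demonstrate the superiority of our method over the state of the art. Code is available at https://github.com/zhangqianhui/gazeanimationv2.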