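A significant bottleneck in training deep networks for part segmentation is the cost of obtaining detailed annotations. We propose a framework that exploits coarse labels, such as figure-ground masks and keypoint locations, which are readily available for some categories, to improve part segmentation models. A key challenge is that these annotations were collected for different tasks and with different labeling styles, so they cannot be easily mapped to part labels. To this end, we propose to jointly learn the dependencies between labeling styles and the part segmentation model, which allows us to leverage supervision from diverse labels. To evaluate our approach, we develop benchmarks on the Caltech-UCSD Birds and OID Aircraft datasets. Our approach outperforms baselines based on multi-task learning and semi-supervised learning, as well as competitive methods that rely on manually designed loss functions to exploit sparse supervision.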
translated by 谷歌翻译
The semantic image segmentation task presents a trade-off between test-time accuracy and training-time annotation cost. Detailed per-pixel annotations enable training accurate models but are very time-consuming to obtain; image-level class labels are an order of magnitude cheaper but result in less accurate models. We take a natural step from image-level annotation towards stronger supervision: we ask annotators to point to an object if one exists. We incorporate this point supervision along with a novel objectness potential in the training loss function of a CNN model. Experimental results on the PASCAL VOC 2012 benchmark reveal that the combined effect of point-level supervision and objectness potential yields an improvement of 12.9% mIOU over image-level supervision. Further, we demonstrate that models trained with point-level supervision are more accurate than models trained with image-level, squiggle-level or full supervision given a fixed annotation budget.
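The key property of point supervision is that only the annotated pixels contribute to the loss. The following is a minimal sketch of that idea and is not the authors' implementation. It computes a cross-entropy averaged over the annotated points only. The function name `point_supervised_loss` and the input layout are assumptions: probabilities are nested lists, and points are `(row, col, class)` tuples. The objectness potential and image-level terms from the abstract are omitted.

```python
import math

def point_supervised_loss(probs, points):
    """Average cross-entropy over annotated points only (hypothetical helper).

    probs:  probs[r][c][k] is the predicted probability of class k at pixel (r, c),
            assumed to already be a softmax output.
    points: list of (row, col, class_index) annotations; all other pixels are ignored.
    """
    if not points:
        return 0.0
    total = 0.0
    for r, c, k in points:
        # Clamp to avoid log(0) for a confidently wrong prediction.
        total -= math.log(max(probs[r][c][k], 1e-12))
    return total / len(points)

# 2x2 image, 2 classes (0 = background, 1 = object); one annotated point per class.
probs = [
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.3, 0.7], [0.2, 0.8]],
]
loss = point_supervised_loss(probs, [(0, 0, 0), (1, 1, 1)])
```

The two unannotated pixels, (0, 1) and (1, 0), have no effect on `loss`. That is the point of the scheme: the annotator's cost is one click per object instead of a full mask.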
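Annotating images with pixel-wise labels is a time-consuming and expensive process. Recently, DatasetGAN showcased a promising alternative: synthesizing a large labeled dataset via a generative adversarial network (GAN) by exploiting a small set of manually labeled, GAN-generated images. Here, we scale DatasetGAN to the scale of ImageNet categories. We take image samples from a class-conditional generative model trained on ImageNet and manually annotate 5 images per class for all 1K classes. By training an effective feature segmentation architecture on top of BigGAN, we turn BigGAN into a labeled dataset generator. We further show that VQGAN can similarly serve as a dataset generator, leveraging the already annotated data. We create a new ImageNet benchmark by labeling a set of 8K real images and evaluate segmentation performance in a variety of settings. Through extensive ablation studies, we show large gains from leveraging a large generated dataset to train different supervised and self-supervised backbone models on pixel-wise tasks. Furthermore, we demonstrate that pre-training on our synthetic datasets improves over standard ImageNet pre-training on several downstream datasets, such as PASCAL-VOC, MS-COCO, Cityscapes and chest X-ray, as well as tasks (detection, segmentation). Our benchmark will be made public, and we will maintain a leaderboard for this challenging task. Project page: https://nv-tlabs.github.io/big-datasetgan/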
Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks. Such pixel-accurate supervision demands expensive labeling effort and limits the performance of deep networks, which usually benefit from more training data. In this paper, we propose a method that achieves competitive accuracy but only requires easily obtained bounding box annotations. The basic idea is to iterate between automatically generating region proposals and training convolutional networks. These two steps gradually recover segmentation masks for improving the networks, and vice versa. Our method, called "BoxSup", produces competitive results (e.g., 62.0% mAP for validation) supervised by boxes only, on par with strong baselines (e.g., 63.8% mAP) fully supervised by masks under the same setting. By leveraging a large amount of bounding boxes, BoxSup further unleashes the power of deep convolutional networks and yields state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT [24].
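BoxSup alternates between picking segment proposals that agree with the boxes and retraining the network. The minimal sketch below covers only the selection step, under simplifying assumptions. Masks are sets of `(row, col)` pixels, a box is `(r0, c0, r1, c1)` with exclusive ends, and the best candidate is the one with the highest IoU against the filled box. The paper's actual selection criterion may combine further terms, such as agreement with the network's current prediction. All helper names are hypothetical.

```python
def box_pixels(box):
    """All (row, col) pixels inside box = (r0, c0, r1, c1), end-exclusive."""
    r0, c0, r1, c1 = box
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}

def iou(a, b):
    """Intersection-over-union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def select_candidate(box, proposals):
    """Return the index of the proposal mask that best overlaps the filled box."""
    region = box_pixels(box)
    scores = [iou(region, mask) for mask in proposals]
    return max(range(len(proposals)), key=scores.__getitem__)

box = (0, 0, 2, 2)  # a 2x2 box covering 4 pixels
proposals = [
    {(0, 0)},                                      # too small
    {(0, 0), (0, 1), (1, 0)},                      # close fit
    {(r, c) for r in range(4) for c in range(4)},  # too large
]
best = select_candidate(box, proposals)
```

In the iterative scheme, the selected masks would serve as training targets for the next round. The retrained network's predictions would then help re-rank the proposals.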
Jitendra Malik once said, "Supervision is the opium of the AI researcher". Most deep learning techniques heavily rely on extreme amounts of human labels to work effectively. In today's world, the rate of data creation greatly surpasses the rate of data annotation. Full reliance on human annotations is just a temporary means to solve current closed problems in AI. In reality, only a tiny fraction of data is annotated. Annotation Efficient Learning (AEL) is a study of algorithms to train models effectively with fewer annotations. To thrive in AEL environments, we need deep learning techniques that rely less on manual annotations (e.g., image, bounding-box, and per-pixel labels), but learn useful information from unlabeled data. In this thesis, we explore five different techniques for handling AEL.
The success of state-of-the-art deep neural networks heavily relies on the presence of large-scale labelled datasets, which are extremely expensive and time-consuming to annotate. This paper focuses on tackling semi-supervised part segmentation tasks by generating high-quality images with a pre-trained GAN and labelling the generated images with an automatic annotator. In particular, we formulate the annotator learning as a learning-to-learn problem. Given a pre-trained GAN, the annotator learns to label object parts in a set of randomly generated images such that a part segmentation model trained on these synthetic images with their predicted labels obtains low segmentation error on a small validation set of manually labelled images. We further reduce this nested-loop optimization problem to a simple gradient matching problem and efficiently solve it with an iterative algorithm. We show that our method can learn annotators from a broad range of labelled images including real images, generated images, and even analytically rendered images. Our method is evaluated with semi-supervised part segmentation tasks and significantly outperforms other semi-supervised competitors when the amount of labelled examples is extremely limited.
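The goal of self-supervised visual representation learning is to learn strong, transferable image representations, with most research focusing on the object or scene level. Representation learning at the part level, on the other hand, has received significantly less attention. In this paper, we propose an unsupervised approach to object part discovery and segmentation and make three contributions. First, we construct a proxy task through a set of objectives that encourages the model to learn a meaningful decomposition of the image into its parts. Second, prior work argues for reconstructing or clustering pre-computed features as a proxy for parts; we show empirically that this alone is unlikely to find meaningful parts, mainly because of the features' low resolution and the tendency of classification networks to spatially smear out information. We suggest that image reconstruction at the level of pixels can alleviate this problem, acting as a complementary cue. Finally, we show that the standard evaluation based on keypoint regression does not correlate well with segmentation quality, and therefore introduce different metrics, NMI and ARI, that better characterize the decomposition of objects into parts. Our method yields semantic parts that are consistent across fine-grained but visually distinct categories, outperforming the state of the art on three benchmark datasets. Code is available on the project page: https://www.robots.ox.ac.uk/~vgg/research/unsup-parts/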
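NMI and ARI compare a predicted part map with the ground truth while ignoring which integer id each part receives. An unsupervised method cannot know that "head" should be label 0, so this invariance matters. Below is a dependency-free sketch over flattened label maps. It assumes arithmetic-mean normalization for NMI, which matches scikit-learn's default `normalized_mutual_info_score`. The pixels the paper evaluates on (for example, foreground only) and its exact normalization are not stated here and remain assumptions.

```python
import math
from collections import Counter

def _entropy(counts, n):
    return -sum(c / n * math.log(c / n) for c in counts.values() if c)

def nmi(a, b):
    """Normalized mutual information with arithmetic-mean normalization."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    denom = (_entropy(ca, n) + _entropy(cb, n)) / 2
    return mi / denom if denom > 0 else 1.0

def ari(a, b):
    """Adjusted Rand index between two labelings of the same pixels."""
    def comb2(k):
        return k * (k - 1) / 2
    n = len(a)
    sum_ij = sum(comb2(c) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb2(c) for c in Counter(a).values())
    sum_b = sum(comb2(c) for c in Counter(b).values())
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

gt_map = [[0, 0, 1], [1, 2, 2]]
pred_map = [[5, 5, 3], [3, 4, 4]]  # same parts, different label ids
gt = [v for row in gt_map for v in row]
pred = [v for row in pred_map for v in row]
```

`nmi(gt, pred)` and `ari(gt, pred)` both evaluate to 1 here, because the partitions are identical up to relabeling. A prediction that puts every pixel into one part scores 0 on both.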
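The rapid development of deep learning has brought great progress in image segmentation, one of the fundamental tasks of computer vision. However, current segmentation algorithms mostly depend on the availability of pixel-level annotations, which are often expensive, tedious and laborious to obtain. To alleviate this burden, recent years have seen increasing attention to building label-efficient, deep-learning-based segmentation algorithms. This paper offers a comprehensive review of label-efficient segmentation methods. To this end, we first develop a taxonomy that organizes these methods by the supervision provided by different types of weak labels (including no supervision, coarse supervision, incomplete supervision and noisy supervision), supplemented by the type of segmentation problem (including semantic segmentation, instance segmentation and panoptic segmentation). Next, we summarize existing label-efficient segmentation methods from a unified perspective that addresses one important question: how to bridge the gap between weak supervision and dense prediction. Current methods are mostly based on heuristic priors, such as cross-pixel similarity, cross-label constraints, cross-view consistency and cross-image relations. Finally, we share our views on future research directions for label-efficient deep segmentation.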
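Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised methods aim to learn representations from unlabeled data that transfer well to downstream tasks such as object detection. Combining few-shot and self-supervised object detection is a promising research direction. In this survey, we review and characterize the most recent approaches to few-shot and self-supervised object detection. We then give our main takeaways and discuss future research directions. Project page: https://gabrielhuang.github.io/fsod-survey/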
The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. Since 2012, there have been significant advances in the representation capabilities of models and the computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen if we increase the dataset size by 10× or 100×? This paper takes a step towards clearing the clouds of mystery surrounding the relationship between 'enormous data' and visual deep learning. By exploiting the JFT-300M dataset, which has more than 375M noisy labels for 300M images, we investigate how the performance of current vision tasks would change if this data were used for representation learning. Our paper delivers some surprising (and some expected) findings. First, we find that performance on vision tasks increases logarithmically with the volume of training data. Second, we show that representation learning (or pretraining) still holds a lot of promise: one can improve performance on many vision tasks just by training a better base model. Finally, as expected, we present new state-of-the-art results for different vision tasks, including image classification, object detection, semantic segmentation and human pose estimation. Our sincere hope is that this inspires the vision community not to undervalue data and to develop collective efforts in building larger datasets.
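We propose an embarrassingly simple point annotation scheme to collect weak supervision for instance segmentation. In addition to bounding boxes, we collect binary labels for a set of points sampled uniformly inside each bounding box. We show that existing instance segmentation models developed for full mask supervision can be seamlessly trained with the point-based supervision collected via our scheme. Remarkably, Mask R-CNN trained on COCO, PASCAL VOC, Cityscapes and LVIS with only 10 annotated random points per object achieves 94% to 98% of its fully supervised performance, setting a strong baseline for weakly supervised instance segmentation. The new point annotation scheme is about 5 times faster than annotating full object masks, making high-quality instance segmentation more accessible in practice. Inspired by the point-based annotation form, we propose a modification to the PointRend instance segmentation module. For each object, the new architecture, called Implicit PointRend, generates the parameters of a function that makes the final point-level mask prediction. Implicit PointRend is more straightforward and uses a single point-level mask loss. Our experiments show that the new module is better suited to point-based supervision.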
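The scheme rests on two pieces: points sampled uniformly inside each box, and a point-level mask loss on their binary labels. The following is a simulated sketch under stated assumptions. The box is `(x0, y0, x1, y1)` in continuous coordinates, and the "annotator" and both "model predictions" are hypothetical stand-in functions rather than parts of Mask R-CNN or Implicit PointRend. The loss is a plain binary cross-entropy averaged over the labeled points.

```python
import math
import random

def sample_points(box, n, rng):
    """Sample n points uniformly inside box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(n)]

def point_mask_loss(pred_prob, points, labels):
    """Binary cross-entropy between predicted foreground probability and point labels.

    pred_prob: callable (x, y) -> probability that the point lies on the object.
    labels:    1 if the annotator marked the point as object, else 0.
    """
    eps = 1e-12
    total = 0.0
    for (x, y), t in zip(points, labels):
        p = min(max(pred_prob(x, y), eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(points)

rng = random.Random(0)
box = (0.0, 0.0, 10.0, 10.0)
points = sample_points(box, 10, rng)          # 10 points per object, as in the abstract
labels = [1 if x < 5.0 else 0 for x, _ in points]  # hypothetical object: left half of box

def good_model(x, y):
    return 0.9 if x < 5.0 else 0.1            # hypothetical confident, correct prediction

def bad_model(x, y):
    return 0.1 if x < 5.0 else 0.9            # hypothetical confident, wrong prediction
```

Because the loss reads the mask prediction only at the sampled points, a model that outputs full masks can be trained on such labels without architectural changes. This matches the abstract's claim about existing models.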
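Advances in GAN-based generative modeling have motivated the community to find uses for GANs beyond image generation and editing tasks. In particular, several recent works have shown that GAN representations can be repurposed for discriminative tasks such as part segmentation, especially when training data is limited. But how do these improvements stack up against recent advances in self-supervised learning? Motivated by this, we present an alternative approach based on contrastive learning and compare the performance of both on standard few-shot part segmentation benchmarks. Our experiments reveal that not only do GAN-based approaches offer no significant performance advantage, but their multi-step training is also complex, nearly an order of magnitude slower, and can introduce additional bias. These experiments suggest that the inductive biases of current generative models, such as their ability to disentangle shape and texture, are well captured by standard feed-forward networks trained with contrastive learning.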
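The abstract says only "contrastive learning" and does not name the objective. The sketch below assumes an InfoNCE-style loss, a common choice for contrastive objectives. For one anchor feature, the loss is the negative log-softmax of the positive's similarity among all candidates. Cosine similarity and the temperature of 0.1 are illustrative assumptions, as are the toy two-dimensional features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]

anchor = [1.0, 0.0]
view = [1.0, 0.1]                   # hypothetical feature of another augmentation of the same patch
others = [[0.0, 1.0], [-1.0, 0.0]]  # hypothetical features of unrelated patches
loss = info_nce(anchor, view, others)
swapped = info_nce(anchor, others[1], [others[0], view])
```

The loss is near zero when the positive is the most similar candidate and large when a dissimilar feature is treated as the positive. Minimizing it pulls matching views together without any generator in the loop, which keeps training single-stage.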
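The fine-grained localization of clinicians in the operating room (OR) is a key component in designing the new generation of OR support systems. Computer vision models for pixel-based person segmentation and body keypoint detection are needed to better understand the clinical activities and the spatial layout of the OR. This is challenging, not only because OR images are very different from traditional vision datasets, but also because data and annotations are hard to collect and generate in the OR due to privacy concerns. To address these concerns, we first study how joint person pose estimation and instance segmentation can be performed on low-resolution images, with downsampling factors from 1x to 12x. Second, to address the domain shift and the lack of annotations, we propose a novel unsupervised domain adaptation method, called AdaptOR, to adapt a model from an in-the-wild labeled source domain to a statistically different unlabeled target domain. We propose to exploit explicit geometric constraints on different augmentations of the unlabeled target-domain images to generate accurate pseudo labels, and we use these pseudo labels to train the model on high- and low-resolution OR images in a self-training framework. Furthermore, we propose disentangled feature normalization to handle the statistically different source- and target-domain data. Extensive experimental results with detailed ablation studies on two OR datasets, MVOR+ and TUM-OR-test, show the effectiveness of our approach against strongly constructed baselines, especially on low-resolution, privacy-preserving OR images. Finally, we show the generality of our method as a semi-supervised learning (SSL) method on the large-scale COCO dataset, where we achieve results comparable to a model trained with 100% labeled supervision while using only 1% of the labels.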
We propose EM-PASTE: an Expectation Maximization (EM) guided Cut-Paste compositional dataset augmentation approach for weakly supervised instance segmentation using only image-level supervision. The proposed method consists of three main components. The first component generates high-quality foreground object masks. To this end, an EM-like approach is proposed that iteratively refines an initial set of object mask proposals generated by a generic region proposal method. Next, the second component generates high-quality, context-aware background images using a text-to-image compositional synthesis method such as DALL-E. Finally, the third component creates a large-scale pseudo-labeled instance segmentation training dataset by compositing the foreground object masks onto the original and generated background images. The proposed approach achieves state-of-the-art weakly supervised instance segmentation results on both the PASCAL VOC 2012 and MS COCO datasets using only image-level, weak label information. In particular, it outperforms the best baseline by +7.4 and +2.8 mAP@0.50 on PASCAL and COCO, respectively. Further, the method provides a new solution to the long-tail weakly supervised instance segmentation problem (where many classes may have only a few training samples) by selectively augmenting under-represented classes.
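The third component creates training pairs by compositing foreground masks onto backgrounds. Below is a minimal sketch of the cut-paste step alone. It assumes images and label maps are 2-D lists and the foreground mask is binary, and it assumes the pasted patch fits entirely inside the background. The function name and the instance-id map are hypothetical. The EM mask refinement and the DALL-E background generation are not shown.

```python
def paste(background, fg, mask, top, left, instance_id, label_map):
    """Paste fg pixels where mask == 1 onto a copy of background at (top, left).

    Returns the composited image and an updated instance-label map in which pasted
    pixels receive instance_id (pixels occluded by the paste lose their previous id).
    Assumes the patch lies fully inside the background; no bounds checking is done.
    """
    out = [row[:] for row in background]
    labels = [row[:] for row in label_map]
    for r, mask_row in enumerate(mask):
        for c, m in enumerate(mask_row):
            if m:
                out[top + r][left + c] = fg[r][c]
                labels[top + r][left + c] = instance_id
    return out, labels

bg = [[0] * 4 for _ in range(3)]
empty = [[0] * 4 for _ in range(3)]
fg = [[7, 7], [7, 7]]
mask = [[1, 0], [1, 1]]  # L-shaped object
img, lab = paste(bg, fg, mask, top=1, left=2, instance_id=1, label_map=empty)
```

Because the label map is written at the same time as the pixels, every composite comes with an exact instance mask for free. Pasting extra copies of rare classes is one way to realize the selective augmentation of under-represented classes that the abstract describes.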
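We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks, by introducing a part-based model for object classification. We believe that richer forms of annotation help guide neural networks to learn more robust features without requiring more samples or larger models. Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and then classify the segmented object. Empirically, our part-based models achieve both higher accuracy and higher adversarial robustness than a ResNet-50 baseline on all three datasets. For example, at the same level of robustness, the clean accuracy of our part-based models is up to 15 percentage points higher. Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations. The code is publicly available at https://github.com/chawins/adv-part-model.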
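We propose a method for jointly inferring labels across a collection of data samples, where each sample consists of an observation and a prior belief about the label. By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs. This formulation unifies various machine learning settings: the weak beliefs can come in the form of noisy or incomplete labels, likelihoods given by a different prediction mechanism on auxiliary input, or common-sense priors reflecting knowledge about the structure of the problem at hand. We demonstrate the proposed algorithms on diverse problems: classification with negative training examples, learning from rankings, weakly and self-supervised aerial imagery segmentation, co-segmentation of video frames, and coarsely supervised text classification.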
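In the weakly supervised localization setting, supervision is given as image-level labels. We propose to use an image classifier $F$ and to train a generative network $G$ that, given an input image, outputs a per-pixel weight map indicating the location of the object within the image. Network $G$ is trained by minimizing the discrepancy between the output of classifier $F$ on the original image and its output on the image weighted by $G$'s map. This scheme requires a regularization term that ensures $G$ does not output uniform weights, and an early stopping criterion that prevents $G$ from over-segmenting the image. Our results show that the method outperforms existing localization methods by a sizable margin on challenging fine-grained classification datasets, as well as on a generic image recognition dataset. In addition, the obtained weight maps are also state of the art for weakly supervised segmentation on fine-grained classification datasets.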
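The objective described for $G$ has two parts: a discrepancy between $F$'s output on the original and on the weighted image, and a regularizer that rules out the trivial uniform map. The toy sketch below makes several assumptions. KL divergence between softmax outputs stands in for the discrepancy. A mean-weight (area) penalty with weight `lam` stands in for the regularizer. A linear toy classifier plays the role of $F$, and the weight maps are fixed by hand instead of being produced by a trained $G$. None of these choices are confirmed by the abstract.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def masking_objective(classifier, x, w, lam=0.1):
    """Discrepancy between classifier outputs on x and on x weighted by w, plus an area penalty.

    x: flat list of pixel values; w: per-pixel weights in [0, 1] (the role of G's output).
    The mean-weight penalty is a hypothetical stand-in for the paper's regularizer.
    """
    weighted = [xi * wi for xi, wi in zip(x, w)]
    discrepancy = kl(softmax(classifier(x)), softmax(classifier(weighted)))
    return discrepancy + lam * sum(w) / len(w)

def toy_classifier(img):
    # Plays the role of F: class 1 responds to pixels 0-1, class 0 to pixels 2-3.
    return [img[2] + img[3], img[0] + img[1]]

x = [3.0, 3.0, 0.0, 0.0]        # the evidence for class 1 sits in the first two pixels
tight = [1.0, 1.0, 0.0, 0.0]    # keeps only the object
uniform = [1.0, 1.0, 1.0, 1.0]  # trivial solution that the regularizer should penalize
wrong = [0.0, 0.0, 1.0, 1.0]    # keeps only the background
```

With these definitions, the tight map scores best, the uniform map pays the area penalty, and the wrong map pays a large discrepancy. The over-segmentation that the abstract's early stopping guards against is not modeled here.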
For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets. In this work, we show that coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models. Considering the urban scene segmentation scenario, we leverage cheap coarse annotations for real-world captured data, as well as synthetic data, to train our model, and we show competitive performance compared with finely annotated real-world data. Specifically, we propose a coarse-to-fine self-training framework that generates pseudo labels for unlabeled regions of the coarsely annotated data, uses synthetic data to improve predictions around the boundaries between semantic classes, and uses cross-domain data augmentation to increase diversity. Our extensive experimental results on the Cityscapes and BDD100k datasets demonstrate that our method achieves a significantly better trade-off between performance and annotation cost, yielding performance comparable to fully annotated data with only a small fraction of the annotation budget. Also, when used as pretraining, our framework performs better than the standard fully supervised setting.
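The self-training step fills the unlabeled regions of a coarse annotation with pseudo labels. Below is a minimal sketch assuming a common confidence-threshold rule. Human coarse labels are kept as-is, and unlabeled pixels take the model's argmax class only when its probability reaches `threshold`; otherwise they stay unlabeled. The `IGNORE` value of 255, the threshold of 0.9 and the function name are hypothetical. The synthetic-data boundary refinement and cross-domain augmentation are not shown.

```python
IGNORE = 255  # hypothetical "unlabeled" value in the coarse annotation

def fill_pseudo_labels(coarse, probs, threshold=0.9):
    """Keep coarse labels; fill unlabeled pixels with confident predictions.

    coarse: 2-D list of class ids, IGNORE where the annotator left the pixel unlabeled.
    probs:  probs[r][c] is the list of per-class probabilities predicted at pixel (r, c).
    """
    out = []
    for r, row in enumerate(coarse):
        new_row = []
        for c, label in enumerate(row):
            if label != IGNORE:
                new_row.append(label)  # trust the human coarse label
            else:
                p = probs[r][c]
                k = max(range(len(p)), key=p.__getitem__)
                new_row.append(k if p[k] >= threshold else IGNORE)
        out.append(new_row)
    return out

coarse = [[0, IGNORE], [IGNORE, 1]]
probs = [
    [[0.8, 0.2], [0.05, 0.95]],
    [[0.6, 0.4], [0.1, 0.9]],
]
pseudo = fill_pseudo_labels(coarse, probs)
```

The confident pixel at (0, 1) becomes class 1. The uncertain pixel at (1, 0) stays `IGNORE`, so it will not contribute to the next training round.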
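We perform a rigorous evaluation of recent self- and semi-supervised ML techniques that leverage unlabeled data to improve downstream task performance, on three remote sensing tasks: riverbed segmentation, land cover mapping and flood mapping. These methods are particularly valuable for remote sensing tasks, because unlabeled imagery is easy to access while ground-truth labels can often be expensive to obtain. We quantify the performance improvements one can expect on these remote sensing segmentation tasks when unlabeled imagery (outside the labeled dataset) is made available for training. We also design experiments to test the effectiveness of these techniques when the test set has a domain shift relative to the training and validation sets.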
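Human pose information is a critical component in many downstream image processing tasks, such as activity recognition and motion tracking. Likewise, a pose estimator for the illustrated character domain would be valuable for assistive content creation tasks, such as reference pose retrieval and automatic character animation. However, while modern data-driven techniques have substantially improved pose estimation performance on natural images, little work has been done for illustrations. In our work, we bridge this domain gap by learning efficiently from both domain-specific and task-specific source models. Additionally, we upgrade and expand an existing illustrated pose estimation dataset, and introduce two new datasets for classification and segmentation subtasks. We then apply the resulting state-of-the-art character pose estimator to the novel task of pose-guided illustration retrieval. All data, models and code will be made publicly available.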