We pose video object segmentation as spectral graph clustering in space and time, with one graph node for each pixel and edges forming local space-time neighborhoods. We claim that the strongest cluster in this video graph represents the salient object. We start by introducing a novel and efficient method based on 3D filtering for approximating the spectral solution, as the principal eigenvector of the graph's adjacency matrix, without explicitly building the matrix. This key property allows us to have a fast parallel implementation on GPU, orders of magnitude faster than classical approaches for computing the eigenvector. Our motivation for a spectral space-time clustering approach, unique in video semantic segmentation literature, is that such clustering is dedicated to preserving object consistency over time, which we evaluate using our novel segmentation consistency measure. Further on, we show how to efficiently learn the solution over multiple input feature channels. Finally, we extend the formulation of our approach beyond the segmentation task, into the realm of object tracking. In extensive experiments we show significant improvements over top methods, as well as over powerful ensembles that combine them, achieving state-of-the-art on multiple benchmarks, both for tracking and segmentation.
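The matrix-free power iteration at the heart of this formulation can be sketched briefly. The snippet below is a hedged illustration, not the authors' GPU implementation: it assumes a uniform 3D box filter as the local neighborhood kernel and a per-node feature weighting, both simplifying choices:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def principal_eigenvector_by_filtering(features, n_iters=50, size=3):
    """Power iteration for the leading eigenvector of a space-time pixel
    graph with local neighborhoods, without building the adjacency matrix:
    the product A @ x is computed as feature-weighted 3D box filtering."""
    x = np.ones_like(features)
    for _ in range(n_iters):
        x = features * uniform_filter(features * x, size=size)  # ~ A @ x
        x = np.maximum(x, 0.0)        # keep the solution non-negative
        x /= np.linalg.norm(x)        # power-iteration renormalization
    return x

rng = np.random.default_rng(0)
video = rng.random((8, 32, 32))       # (time, height, width) node features
saliency = principal_eigenvector_by_filtering(video)
```

Each iteration costs one 3D filtering pass, which is why the scheme parallelizes well on GPU; the adjacency matrix itself never materializes.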
Analyzing the distribution shift of data is a growing research direction in today's machine learning, leading to new benchmarks that focus on providing suitable scenarios for studying the generalization properties of ML models. Existing benchmarks focus on supervised learning and, to the best of our knowledge, there is none for unsupervised learning. We therefore introduce an unsupervised anomaly detection benchmark with data that shifts over time, built on top of Kyoto-2006+, a traffic dataset for network intrusion detection. This kind of data meets the premise of shifting input distributions: it covers a large time span (10 years), with naturally occurring changes over time (e.g., users modifying their behavior patterns, and software updates). We first highlight the non-stationary nature of the data, using basic per-feature analyses, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years. Next, we propose AnoShift, a protocol that splits the data into IID, NEAR, and FAR testing splits. We validate the performance degradation over time with diverse models, from classical baselines such as Isolation Forest to more recent ones. Finally, we show that by acknowledging the distribution shift problem and properly addressing it, performance can improve compared to classical IID training (by up to 3% on average). The dataset and the code are available at https://github.com/bit-ml/anoshift/.
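The splitting protocol can be illustrated with a short sketch; the year boundaries below are illustrative placeholders, not the benchmark's exact ones:

```python
import numpy as np

def iid_near_far_splits(years, train_years, near_years, far_years):
    """Partition sample indices by year into train (IID), NEAR, and FAR
    test splits, in the spirit of the AnoShift protocol."""
    years = np.asarray(years)
    train = np.flatnonzero(np.isin(years, train_years))
    near = np.flatnonzero(np.isin(years, near_years))
    far = np.flatnonzero(np.isin(years, far_years))
    return train, near, far

# One sample per row; only the year of each sample matters for the split.
years = [2006, 2006, 2007, 2008, 2011, 2012, 2014, 2015]
train, near, far = iid_near_far_splits(
    years, train_years=range(2006, 2011),
    near_years=range(2011, 2014), far_years=range(2014, 2016))
```

Evaluating the same model on NEAR and FAR then quantifies how quickly performance degrades as the test distribution drifts away from the training years.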
The task of identifying the author of a text spans several decades and has been tackled using linguistics, statistics and, more recently, machine learning. Inspired by the impressive performance gains across a broad range of natural language processing tasks, and by the availability of the large-scale PAN authorship datasets, we first study the effectiveness of several BERT-like transformers for the task of authorship verification. These models prove to consistently achieve very high scores. Next, we empirically show that they focus on topical clues rather than on author writing style characteristics, exploiting existing biases in the dataset. To address this problem, we provide new splits for PAN-2020, where training and test data are sampled from disjoint topics or authors. Finally, we introduce DarkReddit, a dataset with a different input data distribution. We further use it to analyze the domain generalization performance of models in the low-data regime, and how it varies when using the proposed PAN-2020 splits for fine-tuning. We show that those splits can enhance the models' ability to transfer knowledge to a new, significantly different dataset.
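The disjoint-author re-splitting idea can be sketched as follows; the pair layout and the splitting rule are illustrative assumptions, not the exact PAN-2020 recipe:

```python
import random

def disjoint_author_split(pairs, test_frac=0.25, seed=0):
    """Re-split authorship-verification pairs so that test authors never
    appear in training, discouraging author/topic shortcuts. Each pair is
    ((author1, text1), (author2, text2), label)."""
    authors = sorted({a for (a1, _), (a2, _), _ in pairs for a in (a1, a2)})
    random.Random(seed).shuffle(authors)
    held_out = set(authors[:max(1, int(len(authors) * test_frac))])
    train = [p for p in pairs
             if p[0][0] not in held_out and p[1][0] not in held_out]
    test = [p for p in pairs
            if p[0][0] in held_out and p[1][0] in held_out]
    return train, test  # pairs straddling both groups are dropped

pairs = [(("a", "t1"), ("a", "t2"), 1),
         (("b", "t3"), ("c", "t4"), 0),
         (("d", "t5"), ("d", "t6"), 1),
         (("b", "t7"), ("b", "t8"), 1)]
train, test = disjoint_author_split(pairs)
```

Dropping the straddling pairs shrinks the data slightly but guarantees the train and test author sets are fully disjoint.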
The human ability to synchronize the feedback from all its senses inspired recent works in multi-task and multi-modal learning. While those works rely on expensive supervision, our multi-task graph requires only pseudo-labels from expert models. Each graph node represents a task, and each edge learns transformations between tasks. Once initialized, the graph learns by itself in a self-supervised manner, based on a novel consensus shift algorithm that intelligently exploits the agreement between graph pathways to generate new pseudo-labels for the next learning cycle. We demonstrate significant improvements from one unsupervised learning iteration to the next, outperforming related recent methods in extensive multi-task learning experiments on two challenging datasets. Our code is available at https://github.com/bit-ml/cshift.
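The consensus step can be illustrated with a minimal sketch; the median-agreement rule below is a generic stand-in for the consensus shift algorithm, not its exact formulation:

```python
import numpy as np

def consensus_pseudolabels(path_predictions, tol=0.1):
    """Agreement-based pseudo-labeling for one graph node: predictions that
    reach the node along different edge paths vote; where they agree within
    `tol` the median becomes the next-cycle pseudo-label, elsewhere the
    pixel is masked out and provides no supervision."""
    preds = np.stack(path_predictions)       # (n_paths, H, W)
    median = np.median(preds, axis=0)
    agree = (np.abs(preds - median) <= tol).all(axis=0)
    return np.where(agree, median, np.nan)   # NaN = ignored next cycle

# Three paths reaching the same node; one pixel is disputed.
p1 = np.array([[0.9, 0.1], [0.5, 0.8]])
p2 = np.array([[0.85, 0.15], [0.1, 0.75]])
p3 = np.array([[0.95, 0.1], [0.9, 0.8]])
labels = consensus_pseudolabels([p1, p2, p3])
```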
We propose an object tracking method, SFTrack++, that smoothly learns to preserve the tracked object's consistency over time, by employing a spectral clustering approach over the graph of pixels from the video, using a fast 3D filtering formulation for finding the principal eigenvector of this graph's adjacency matrix. To better capture complex aspects of the tracked object, we enrich our formulation with multi-channel inputs, which allow different points of view for the same input. The channel inputs are, in our experiments, the outputs of multiple tracking methods. After combining them, instead of relying only on hidden-layer representations to predict a good tracking bounding box, we explicitly learn an intermediate, more refined one, namely the segmentation map of the tracked object. This prevents the rough, common bounding-box approach from introducing noise and distractors into the learning process. We test our method, SFTrack++, on five tracking benchmarks: OTB, UAV, NFS, GOT-10k, and TrackingNet, using five top trackers as input. Our experimental results validate the pre-registered hypothesis. We obtain consistent and robust results, competitive on the three traditional benchmarks (OTB, UAV, NFS) and significantly on top (by over 1.1%) on GOT-10k and TrackingNet, which are newer, larger, and more varied datasets.
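The final reduction from the learned segmentation map to a tracking box can be sketched simply; thresholding is one illustrative reduction choice, not necessarily the one used by SFTrack++:

```python
import numpy as np

def seg_to_box(seg, thresh=0.5):
    """Reduce a soft segmentation map (the intermediate representation
    predicted before the final box) to a bounding box by thresholding
    and taking the extent of the foreground pixels."""
    ys, xs = np.nonzero(seg > thresh)
    if xs.size == 0:
        return None                      # object not visible in this frame
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

seg = np.zeros((16, 16))
seg[4:9, 6:12] = 0.9                     # soft object mask
box = seg_to_box(seg)                    # (x_min, y_min, x_max, y_max)
```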
Objective: Accurate visual classification of bladder tissue during Trans-Urethral Resection of Bladder Tumor (TURBT) procedures is essential to improve early cancer diagnosis and treatment. During TURBT interventions, White Light Imaging (WLI) and Narrow Band Imaging (NBI) techniques are used for lesion detection. Each imaging technique provides diverse visual information that allows clinicians to identify and classify cancerous lesions. Computer vision methods that use both imaging techniques could improve endoscopic diagnosis. We address the challenge of tissue classification when annotations are available only in one domain, in our case WLI, and the endoscopic images correspond to an unpaired dataset, i.e. there is no exact equivalent for every image in both NBI and WLI domains. Method: We propose a semi-supervised Generative Adversarial Network (GAN)-based method composed of three main components: a teacher network trained on the labeled WLI data; a cycle-consistency GAN to perform unpaired image-to-image translation; and a multi-input student network. To ensure the quality of the synthetic images generated by the proposed GAN, we perform a detailed quantitative and qualitative analysis with the help of specialists. Conclusion: The overall average classification accuracy, precision, and recall obtained with the proposed method for tissue classification are 0.90, 0.88, and 0.89 respectively, while the same metrics obtained in the unlabeled domain (NBI) are 0.92, 0.64, and 0.94 respectively. The quality of the generated images is reliable enough to deceive specialists. Significance: This study shows the potential of using semi-supervised GAN-based classification to improve bladder tissue classification when annotations are limited in multi-domain data.
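The cycle-consistency constraint driving the unpaired WLI-to-NBI translation can be sketched in a few lines; the generator callables are stand-ins for the GAN networks:

```python
import numpy as np

def cycle_consistency_loss(x_wli, g_wli2nbi, f_nbi2wli):
    """L1 cycle loss for unpaired WLI<->NBI translation: mapping an image
    to the other domain and back should reproduce the input, which is what
    makes training possible without paired images."""
    return float(np.abs(f_nbi2wli(g_wli2nbi(x_wli)) - x_wli).mean())

x = np.random.default_rng(0).random((64, 64, 3))
# A pair of mutually inverse toy "generators" yields zero cycle loss.
perfect = cycle_consistency_loss(x, lambda im: im * 2.0, lambda im: im / 2.0)
```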
The receptive field (RF), which determines the region of a time series that is "seen" and used, is critical to improving performance for time series classification (TSC). However, the variation of signal scales across and within time series data makes it challenging to decide on proper RF sizes for TSC. In this paper, we propose a dynamic sparse network (DSN) with sparse connections for TSC, which can learn to cover various RF sizes without cumbersome hyper-parameter tuning. The kernels in each sparse layer are sparse and can be explored under constraint regions by dynamic sparse training, which makes it possible to reduce the resource cost. The experimental results show that the proposed DSN model can achieve state-of-the-art performance on both univariate and multivariate TSC datasets with less than 50% of the computational cost of recent baseline methods, opening the path towards more accurate resource-aware methods for time series analyses. Our code is publicly available at: https://github.com/QiaoXiao7282/DSN.
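A generic dynamic-sparse-training update (magnitude pruning plus random regrowth at fixed sparsity) can be sketched as follows; this is a common scheme in the dynamic sparse training literature, not necessarily DSN's exact rule:

```python
import numpy as np

def prune_and_regrow(weights, mask, prune_frac=0.2, seed=0):
    """One dynamic-sparse-training step: drop the smallest-magnitude active
    connections, then activate the same number of new random positions, so
    the sparsity level (and hence the resource budget) stays fixed."""
    rng = np.random.default_rng(seed)
    w, m = weights.ravel(), mask.ravel()
    active = np.flatnonzero(m)
    n = int(active.size * prune_frac)
    m[active[np.argsort(np.abs(w[active]))[:n]]] = 0   # prune
    grow = rng.choice(np.flatnonzero(m == 0), size=n, replace=False)
    m[grow] = 1                                        # regrow
    w[grow] = 0.0                                      # new weights start at 0
    return weights, mask

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 8))
m = (rng.random((8, 8)) < 0.3).astype(int)             # ~30% dense mask
n_active = m.sum()
prune_and_regrow(w, m)
```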
While the problem of hallucinations in neural machine translation has long been recognized, progress on alleviating it has so far been very limited. Indeed, it recently turned out that, without artificially encouraging models to hallucinate, previously existing detection methods fall short and even the standard sequence log-probability is more informative. This means that characteristics internal to the model can give much more information than we expect, and before using external models and measures, we first need to ask: how far can we go if we use nothing but the translation model itself? We propose to use a method that evaluates the percentage of the source contribution to a generated translation. Intuitively, hallucinations are translations "detached" from the source, hence they can be identified by low source contribution. This method improves detection accuracy for the most severe hallucinations by a factor of 2 and is able to alleviate hallucinations at test time on par with the previous best approach that relies on external models. Next, moving away from internal model characteristics and allowing external tools, we show that using sentence similarity from cross-lingual embeddings further improves these results.
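The detection rule can be sketched given per-token attribution matrices; the attribution method itself and the threshold below are left abstract here, as illustrative assumptions:

```python
import numpy as np

def source_contribution(attr_src, attr_tgt):
    """Fraction of total attribution mass the generated tokens receive from
    the source sentence vs. the target prefix. attr_src: (tgt_len, src_len),
    attr_tgt: (tgt_len, tgt_len), both non-negative attribution scores."""
    src = attr_src.sum(axis=1)                   # per generated token
    tgt = attr_tgt.sum(axis=1)
    return float((src / (src + tgt + 1e-9)).mean())

def looks_detached(attr_src, attr_tgt, threshold=0.4):
    # Illustrative threshold; in practice it would be tuned on held-out data.
    return source_contribution(attr_src, attr_tgt) < threshold

# A healthy translation draws mostly on the source; a hallucination does not.
normal = source_contribution(np.full((5, 6), 0.8), np.full((5, 5), 0.2))
halluc = source_contribution(np.full((5, 6), 0.1), np.full((5, 5), 0.9))
```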
Metric Elicitation (ME) is a framework for eliciting classification metrics that better align with implicit user preferences based on the task and context. The existing ME strategy is based on the assumption that users can most easily provide preference feedback over classifier statistics such as confusion matrices. This work examines that assumption by providing a first-ever implementation of the ME strategy. Specifically, we create a web-based ME interface and conduct a user study that elicits users' preferred metrics in a binary classification setting. We discuss the study findings and present guidelines for future research in this direction.
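The elicitation idea, recovering a hidden metric from pairwise preferences over classifier statistics alone, can be simulated in a few lines. The linear TPR/TNR metric, the idealized ROC frontier, and the ternary search are simplifying assumptions, not the paper's web interface:

```python
import math

def frontier(theta):
    """Operating points on an idealized concave ROC frontier."""
    return (math.cos(theta), math.sin(theta))      # (TPR, TNR)

def elicit_tpr_weight(prefers, n_queries=60):
    """Narrow down a hidden trade-off weight a in a*TPR + (1-a)*TNR using
    only pairwise 'which statistics do you prefer?' answers: ternary search
    over the frontier, since the user's utility is concave along it."""
    lo, hi = 0.0, math.pi / 2
    for _ in range(n_queries):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if prefers(frontier(m1), frontier(m2)):
            hi = m2                                # optimum lies left of m2
        else:
            lo = m1
    theta = (lo + hi) / 2                          # tan(theta*) = (1-a)/a
    return 1.0 / (1.0 + math.tan(theta))

a_true = 0.7                                       # hidden user preference
user = lambda s1, s2: (a_true * s1[0] + (1 - a_true) * s1[1]
                       > a_true * s2[0] + (1 - a_true) * s2[1])
a_hat = elicit_tpr_weight(user)
```

Replacing the simulated `user` lambda with a human answering the same binary queries is essentially what an ME interface does.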
Learning-based image compression has improved to a level where it can outperform traditional image codecs such as HEVC and VVC in terms of coding performance. In addition to good compression performance, device interoperability is essential for a compression codec to be deployed, i.e., encoding and decoding on different CPUs or GPUs should be error-free and incur negligible performance loss. In this paper, we present a method that solves the device interoperability problem of a state-of-the-art image compression network. We apply quantization to the entropy networks that output the entropy parameters. We propose a simple method that ensures cross-platform encoding and decoding and can be implemented quickly, with a minor performance deviation of 0.3% BD-rate from the floating-point model results.
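The quantization idea can be sketched briefly; the grid step and clipping range below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def quantize_entropy_params(params, step=1.0 / 64, lo=-16.0, hi=16.0):
    """Snap entropy parameters (e.g. the means/scales an entropy network
    predicts) onto a fixed integer grid, so that encoder and decoder build
    bit-identical probability tables even when their floating-point
    arithmetic differs slightly across CPUs/GPUs."""
    return np.round(np.clip(params, lo, hi) / step) * step

p = np.array([0.1234, -3.4567, 20.0])
q = quantize_entropy_params(p)
```

Because both sides round to the same grid, a tiny cross-platform difference in `params` no longer changes the arithmetic-coding tables, which is what prevents decoding failures.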