In this paper, we address the problem of image splicing localization with a multi-stream network architecture that processes the raw RGB image in parallel with other handcrafted forensic signals. Unlike previous methods that either use only the RGB image or stack several signals in a channel-wise manner, we propose an encoder-decoder architecture that consists of multiple encoder streams. Each stream is fed with either the tampered image or a handcrafted signal and processes it separately, capturing relevant information from each input independently. Finally, the features extracted by the individual streams are fused at the bottleneck of the architecture and propagated to the decoder network that generates the output localization map. We experiment with two handcrafted signals, namely DCT and Splicebuster. Our approach is benchmarked on three public forensics datasets, demonstrating competitive performance against several existing methods and achieving state-of-the-art results, e.g., 0.898 AUC on CASIA.
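To make the multi-stream idea concrete, here is a minimal sketch: one encoder per input modality, concatenation at the bottleneck, and a shared decoder that outputs the localization map. The layer sizes, two-level depth, and concatenation-based fusion are illustrative assumptions, not the paper's exact configuration, and the extraction of the handcrafted signals (DCT, Splicebuster) is assumed to happen outside the network.

```python
# Minimal multi-stream encoder-decoder sketch (illustrative, not the paper's exact model).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class MultiStreamSplicingNet(nn.Module):
    def __init__(self, stream_in_channels=(3, 1), width=32):
        super().__init__()
        # One independent encoder per input signal (e.g., RGB image and a forensic map).
        self.encoders = nn.ModuleList(
            nn.Sequential(conv_block(c, width), conv_block(width, 2 * width))
            for c in stream_in_channels
        )
        fused = 2 * width * len(stream_in_channels)
        # Decoder upsamples the fused bottleneck back to a single-channel map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(fused, width, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, 1, 4, stride=2, padding=1),
        )

    def forward(self, streams):
        feats = [enc(x) for enc, x in zip(self.encoders, streams)]
        bottleneck = torch.cat(feats, dim=1)             # fuse at the bottleneck
        return torch.sigmoid(self.decoder(bottleneck))   # per-pixel tampering probability

rgb = torch.randn(2, 3, 256, 256)      # tampered image
signal = torch.randn(2, 1, 256, 256)   # e.g., a DCT- or Splicebuster-style map
mask = MultiStreamSplicingNet()([rgb, signal])
print(mask.shape)  # torch.Size([2, 1, 256, 256])
```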
For fashion outfits to be considered aesthetically pleasing, the garments that compose them need to be compatible in visual aspects such as style, category, and color. With the advent and ubiquity of deep learning models in computer vision, interest in the task of visual compatibility detection has grown, with the goal of developing high-quality fashion outfit recommendation systems. Previous works framed visual compatibility as a binary classification task, where the items in an outfit are considered either fully compatible or fully incompatible. However, this does not suit outfit-maker applications in which users create their own outfits and need to know which specific items are likely to be incompatible with the rest of the outfit. To address this, we propose the Visual Incompatibility Transformer (VICTOR), optimized for two tasks: 1) overall compatibility as regression and 2) detection of mismatching items. Unlike previous works that rely on feature extraction from ImageNet-pretrained models or end-to-end fine-tuning, we leverage fashion-specific contrastive language-image pre-training to fine-tune computer vision neural networks on fashion images. Moreover, building on the Polyvore outfit benchmark, we generate partially mismatching outfits and create a new dataset called Polyvore-Misfits, which is used to train VICTOR. A series of ablation and comparative analyses show that the proposed architecture can compete with, and even surpass, the current state of the art on the Polyvore datasets while reducing floating-point operations per instance by 88%, striking a balance between high performance and efficiency.
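A compact sketch of the two-task setup described above: a transformer encoder over per-item embeddings, with one pooled head for overall compatibility (regression) and one per-item head for mismatch detection. The embedding size, mean pooling, and linear heads are illustrative assumptions rather than VICTOR's exact layout, and the fashion-specific image encoder is assumed to be precomputed.

```python
# Two-head outfit model sketch (assumed architecture details, not VICTOR's exact design).
import torch
import torch.nn as nn

class OutfitCompatibilityModel(nn.Module):
    def __init__(self, item_dim=512, n_heads=8, n_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(item_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.compatibility_head = nn.Linear(item_dim, 1)  # outfit-level regression score
        self.mismatch_head = nn.Linear(item_dim, 1)       # per-item mismatch logit

    def forward(self, item_embeddings):
        # item_embeddings: (batch, n_items, item_dim), e.g., from a fashion image encoder.
        h = self.encoder(item_embeddings)
        compatibility = self.compatibility_head(h.mean(dim=1)).squeeze(-1)
        mismatch_logits = self.mismatch_head(h).squeeze(-1)
        return compatibility, mismatch_logits

model = OutfitCompatibilityModel()
items = torch.randn(4, 5, 512)         # 4 outfits of 5 items each
score, mismatch = model(items)
print(score.shape, mismatch.shape)     # torch.Size([4]) torch.Size([4, 5])
```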
In this work, we aim to classify the nodes of unstructured peer-to-peer networks under communication uncertainty, such as the users of decentralized social networks. Graph Neural Networks (GNNs) are known to improve the accuracy of simpler classifiers in centralized settings by leveraging naturally occurring network links, but graph convolutional layers are challenging to implement in decentralized settings when node neighbors are not constantly available. We address this problem by employing decoupled GNNs, where base classifier predictions and errors are diffused through the graph after training. To this end, we deploy pre-trained and gossip-trained base classifiers and realize peer-to-peer graph diffusion under communication uncertainty. In particular, we develop an asynchronous decentralized formulation of diffusion that converges to the same predictions linearly with respect to the communication rate. We experiment on three real-world graphs with node features and labels, simulating peer-to-peer networks with uniformly random communication frequencies; given a portion of known labels, our decentralized graph diffusion achieves accuracy comparable to centralized GNNs.
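The toy simulation below illustrates the decoupled-diffusion idea: each node mixes its own base-classifier prediction with whatever neighbor states it happens to receive, so unreliable communication simply slows convergence. The synchronous loop, random reachability probability, diffusion strength, and random graph are all stand-ins; the paper's asynchronous protocol and its convergence analysis are not reproduced here.

```python
# Toy peer-to-peer diffusion of base-classifier predictions under lossy communication.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_classes = 50, 3
adj = rng.random((n_nodes, n_nodes)) < 0.1
adj = np.triu(adj, 1); adj = adj | adj.T                     # undirected random graph
base_pred = rng.dirichlet(np.ones(n_classes), size=n_nodes)  # base classifier outputs
state = base_pred.copy()
alpha = 0.9                                                  # diffusion strength (assumed)

for step in range(200):
    for u in range(n_nodes):
        neighbors = np.flatnonzero(adj[u])
        # Communication uncertainty: each neighbor is reachable only with probability 0.5.
        reachable = neighbors[rng.random(len(neighbors)) < 0.5]
        if len(reachable) == 0:
            continue
        state[u] = (1 - alpha) * base_pred[u] + alpha * state[reachable].mean(axis=0)

labels = state.argmax(axis=1)   # diffused class decision per node
print(labels[:10])
```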
Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks. Nonetheless, it may lead to significant discrimination if not handled properly, as CV systems depend heavily on the data they are fed and can learn and amplify biases present in such data. Thus, the problem of understanding and discovering bias is of utmost importance. However, there is no comprehensive survey on bias in visual datasets. Hence, this work aims to: i) describe the biases that can manifest in visual datasets; ii) review the literature on methods for bias discovery and quantification in visual datasets; iii) discuss existing attempts to collect visual datasets in a bias-aware manner. A key conclusion of our study is that the problem of bias discovery and quantification in visual datasets remains open, with room for improvement both in the methods and in the range of biases they can address. Moreover, there is no such thing as a bias-free dataset, so scientists and practitioners must be aware of the biases in their datasets and make them explicit. To this end, we propose a checklist for spotting different types of bias during the visual dataset collection process.
In this paper, we address the problem of high-performance and computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either: (i) fine-grained approaches that employ spatio-temporal representations and similarity calculations, achieving high performance at a high computational cost, or (ii) coarse-grained approaches that represent and index videos as global vectors, in which the spatio-temporal structure is lost, providing lower performance but also low computational cost. In this work, we propose a knowledge distillation framework, called Distill-and-Select (DnS), that, starting from a well-performing fine-grained teacher network, learns: a) student networks with different trade-offs between retrieval performance and computational efficiency, and b) a selector network that at test time rapidly directs samples to the appropriate student so as to maintain both high retrieval performance and high computational efficiency. We train several students with different architectures, arriving at different trade-offs of performance and efficiency, i.e., speed and storage requirements, including fine-grained students that use binary representations. Importantly, the proposed scheme allows knowledge distillation on large, unlabelled datasets, which leads to good students. We evaluate DnS on five public datasets across three different video retrieval tasks and demonstrate that a) our students achieve state-of-the-art performance in several cases, and b) the DnS framework provides an excellent trade-off between retrieval performance, computational speed, and storage space. In specific configurations, the proposed method achieves similar mAP to the teacher while being 20 times faster and requiring 240 times less storage space. The collected dataset and implementation are publicly available at: https://github.com/mever-team/distill-and-select.
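The sketch below illustrates only the routing pattern: a cheap global-vector student scores all candidates, and a selector decides which candidates deserve re-scoring with the expensive frame-level student. The hand-written similarity functions and the threshold-based selector are stand-ins for the trained student and selector networks of DnS, and the threshold value is an arbitrary assumption.

```python
# Coarse-to-fine retrieval routing sketch (stand-ins for DnS's trained networks).
import torch
import torch.nn.functional as F

def coarse_score(query_vec, candidate_vecs):
    # Global-vector similarity: one dot product per candidate (fast, less accurate).
    return candidate_vecs @ query_vec

def fine_score(query_frames, candidate_frames):
    # Frame-level similarity matrix reduced to one score (slow, more accurate).
    sim = query_frames @ candidate_frames.T
    return sim.max(dim=1).values.mean()

def retrieve(query_vec, query_frames, cand_vecs, cand_frames, selector_threshold=0.4):
    scores = coarse_score(query_vec, cand_vecs)
    for i in range(len(cand_vecs)):
        # Selector: route only promising candidates to the fine-grained student.
        if scores[i] > selector_threshold:
            scores[i] = fine_score(query_frames, cand_frames[i])
    return torch.argsort(scores, descending=True)

query_vec = F.normalize(torch.randn(128), dim=0)
query_frames = F.normalize(torch.randn(30, 128), dim=1)
cand_vecs = F.normalize(torch.randn(100, 128), dim=1)
cand_frames = [F.normalize(torch.randn(30, 128), dim=1) for _ in range(100)]
print(retrieve(query_vec, query_frames, cand_vecs, cand_frames)[:5])
```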
In recent years, distributional reinforcement learning has produced many state-of-the-art results. Increasingly sample-efficient distributional algorithms for the discrete action domain have been developed over time, varying primarily in how they parameterize their approximations of value distributions and how they quantify the differences between those distributions. In this work, we transfer three of the most well-known and successful of those algorithms (QR-DQN, IQN, and FQF) to the continuous action domain by extending two powerful actor-critic algorithms (TD3 and SAC) with distributional critics. We investigate whether the relative performance of the methods in the discrete action space translates to the continuous case. To that end, we compare them empirically on the pybullet implementations of a set of continuous control tasks. Our results indicate qualitative invariance regarding the number and placement of distributional atoms in the deterministic, continuous action setting.
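As a concrete example of one of the transferred ideas, here is a QR-DQN-style critic for continuous control: it outputs N quantile estimates of the return for a state-action pair and is trained with the quantile Huber loss. The network sizes and the placeholder target quantiles are simplifications for illustration, not the exact TD3/SAC-based variants studied in the paper.

```python
# Quantile-regression critic sketch for continuous actions (simplified, assumed sizes).
import torch
import torch.nn as nn

class QuantileCritic(nn.Module):
    def __init__(self, state_dim, action_dim, n_quantiles=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_quantiles),
        )
        # Fixed quantile fractions tau_i = (2i + 1) / (2N), as in QR-DQN.
        self.register_buffer("taus", (torch.arange(n_quantiles) + 0.5) / n_quantiles)

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # (batch, n_quantiles)

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    # pred: (batch, N) predicted quantiles; target: (batch, N) bootstrapped target quantiles.
    diff = target.unsqueeze(1) - pred.unsqueeze(2)            # (batch, N_pred, N_target)
    huber = torch.where(diff.abs() <= kappa,
                        0.5 * diff ** 2,
                        kappa * (diff.abs() - 0.5 * kappa))
    weight = (taus.view(1, -1, 1) - (diff.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(dim=1).mean()

critic = QuantileCritic(state_dim=17, action_dim=6)
s, a = torch.randn(8, 17), torch.randn(8, 6)
target = torch.randn(8, 32)   # placeholder target quantiles (would come from a target critic)
loss = quantile_huber_loss(critic(s, a), target, critic.taus)
print(loss.item())
```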
Data scarcity is one of the main issues with the end-to-end approach for Speech Translation, as compared to the cascaded one. Although most data resources for Speech Translation are originally document-level, they offer a sentence-level view, which can be directly used during training. But this sentence-level view is single and static, potentially limiting the utility of the data. Our proposed data augmentation method SegAugment challenges this idea and aims to increase data availability by providing multiple alternative sentence-level views of a dataset. Our method heavily relies on an Audio Segmentation system to re-segment the speech of each document, after which we obtain the target text with alignment methods. The Audio Segmentation system can be parameterized with different length constraints, thus giving us access to multiple and diverse sentence-level views for each document. Experiments in MuST-C show consistent gains across 8 language pairs, with an average increase of 2.2 BLEU points, and up to 4.7 BLEU for lower-resource scenarios in mTEDx. Additionally, we find that SegAugment is also applicable to purely sentence-level data, as in CoVoST, and that it enables Speech Translation models to completely close the gap between the gold and automatic segmentation at inference time.
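A toy illustration of the core idea: the same document can be re-segmented under different length constraints, and each constraint yields an alternative sentence-level view of the same data. Here timestamped source words and a greedy packing rule stand in for SegAugment's audio segmentation system and alignment step.

```python
# Toy re-segmentation under different length constraints (stand-in for the real pipeline).
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float   # seconds
    end: float

def segment(words, max_len):
    """Greedily pack consecutive words into segments no longer than max_len seconds."""
    segments, current = [], []
    for w in words:
        if current and w.end - current[0].start > max_len:
            segments.append(current)
            current = []
        current.append(w)
    if current:
        segments.append(current)
    return [" ".join(w.text for w in seg) for seg in segments]

doc = [Word(f"w{i}", i * 0.5, i * 0.5 + 0.4) for i in range(20)]  # a 10-second toy document

# Each length constraint produces a different sentence-level view of the document.
for max_len in (2.0, 4.0, 8.0):
    print(max_len, segment(doc, max_len))
```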
The cyber-physical convergence is opening up new business opportunities for industrial operators. The need for deep integration of the cyber and the physical worlds establishes a rich business agenda towards consolidating new system and network engineering approaches. This revolution would not be possible without the rich and heterogeneous sources of data, as well as the ability to exploit them intelligently, mainly because data will serve as a fundamental resource to promote Industry 4.0. One of the most fruitful research and practice areas emerging from this data-rich, cyber-physical, smart factory environment is the data-driven process monitoring field, which applies machine learning methodologies to enable predictive maintenance applications. In this paper, we examine popular time series forecasting techniques as well as supervised machine learning algorithms in the applied context of Industry 4.0, by transforming and preprocessing the historical industrial dataset of a packing machine's operational state recordings (real data coming from the production line of a manufacturing plant in the food and beverage domain). In our methodology, we use only a single signal concerning the machine's operational status to make our predictions, without considering other operational variables or fault and warning signals, hence its characterization as "agnostic". In this respect, the results demonstrate that the adopted methods achieve quite promising performance on three targeted use cases.
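A minimal sketch of the "agnostic" setup: only the operational-status signal is used, turned into lagged feature windows to predict the next state with a standard supervised learner. The window length, the choice of model, and the synthetic signal are illustrative stand-ins for the paper's real production-line data and its suite of forecasting methods.

```python
# Lag-feature sketch for predicting a machine's next operational state (illustrative).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic 0/1 operational-state signal with temporal persistence.
state = np.zeros(2000, dtype=int)
for t in range(1, len(state)):
    state[t] = state[t - 1] if rng.random() < 0.95 else 1 - state[t - 1]

window = 20
X = np.array([state[t - window:t] for t in range(window, len(state))])  # lagged windows
y = state[window:]                                                      # next state to predict

split = int(0.8 * len(X))
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[:split], y[:split])
print("hold-out accuracy:", model.score(X[split:], y[split:]))
```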
The integration of automated Machine Learning-based systems into a wide range of tasks has expanded as a result of their performance and speed. Although there are numerous advantages to employing ML-based systems, if they are not interpretable, they should not be used in critical, high-risk applications where human lives are at stake. To address this issue, researchers and businesses have been focusing on ways to improve the interpretability of complex ML systems, and several such methods have been developed. Indeed, so many techniques have been developed that it is difficult for practitioners to choose the best one for their application, even when using evaluation metrics. As a result, the demand for a selection tool, i.e., a meta-explanation technique based on a high-quality evaluation metric, is apparent. In this paper, we present a local meta-explanation technique that builds on top of the truthfulness metric, a faithfulness-based metric. We demonstrate the effectiveness of both the technique and the metric by concretely defining all the related concepts and through experimentation.
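The sketch below shows only the general meta-explanation pattern: run several explanation techniques on the same prediction, score each with a faithfulness-style metric, and return the best-scoring one. The paper's truthfulness metric is not reproduced; a simple deletion-based faithfulness proxy stands in for it, and the two "explainers" are toys built around a linear model.

```python
# Meta-explanation selection sketch (proxy metric and toy explainers, not the paper's method).
import numpy as np

WEIGHTS = np.array([2.0, -1.0, 0.5, 0.0])

def model(x):
    return float(WEIGHTS @ x)            # a linear "black box" for the demo

def explainer_exact(x):
    return WEIGHTS * x                   # exact feature contributions

def explainer_noisy(x, rng=np.random.default_rng(1)):
    return explainer_exact(x) + rng.normal(0, 1, size=x.shape)

def faithfulness(x, importance):
    # Remove the feature claimed most important; a faithful explanation should
    # cause a large change in the model output.
    masked = x.copy()
    masked[int(np.argmax(np.abs(importance)))] = 0.0
    return abs(model(x) - model(masked))

def meta_explain(x, explainers):
    scored = []
    for name, explain in explainers:
        expl = explain(x)
        scored.append((faithfulness(x, expl), name, expl))
    return max(scored, key=lambda s: s[0])   # pick the highest-scoring explanation

x = np.array([1.0, 3.0, -2.0, 0.5])
score, name, expl = meta_explain(x, [("exact", explainer_exact), ("noisy", explainer_noisy)])
print(name, score, expl)
```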
The sheer volume of online user-generated content has rendered content moderation technologies essential for protecting digital platform audiences from content that may cause anxiety, worry, or concern. Despite the efforts towards developing automated solutions to tackle this problem, creating accurate models remains challenging due to the lack of adequate task-specific training data. This limitation is directly related to the fact that manually annotating such data is a highly demanding procedure that can severely affect the annotators' emotional well-being. In this paper, we propose the CM-Refinery framework, which leverages large-scale multimedia datasets to automatically extend initial training datasets with hard examples that refine content moderation models, while significantly reducing the involvement of human annotators. We apply our method to two model adaptation strategies designed with respect to the different challenges observed while collecting data, i.e., the lack of (i) task-specific negative data or (ii) both positive and negative data. Additionally, we introduce a diversity criterion applied to the data collection process that further enhances the generalization performance of the refined models. The proposed method is evaluated on the Not Safe for Work (NSFW) and disturbing content detection tasks on benchmark datasets, achieving 1.32% and 1.94% accuracy improvements compared to the state of the art, respectively. Finally, it significantly reduces human involvement, as 92.54% of the data are automatically annotated in the case of disturbing content, while no human intervention is required for the NSFW task.
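A small sketch of the data-refinement loop described above: score an unlabelled pool with the current model, keep hard examples near the decision boundary, and apply a simple diversity filter before adding them (with automatic labels) to the training set. The boundary threshold, the distance-based diversity criterion, and the random embeddings are illustrative assumptions rather than CM-Refinery's actual components.

```python
# Hard-example mining with a diversity filter (illustrative refinement-loop sketch).
import numpy as np

rng = np.random.default_rng(0)
pool_embeddings = rng.normal(size=(1000, 64))   # embeddings of the unlabelled pool
pool_scores = rng.random(1000)                  # current model's predicted probabilities

# 1) Hard examples: predictions close to the decision boundary.
hard_idx = np.flatnonzero(np.abs(pool_scores - 0.5) < 0.1)

# 2) Diversity criterion: greedily keep only examples far from those already chosen.
def diverse_subset(indices, embeddings, min_dist=10.0, budget=50):
    chosen = []
    for i in indices:
        if all(np.linalg.norm(embeddings[i] - embeddings[j]) > min_dist for j in chosen):
            chosen.append(i)
        if len(chosen) == budget:
            break
    return chosen

selected = diverse_subset(hard_idx, pool_embeddings)
pseudo_labels = (pool_scores[selected] > 0.5).astype(int)   # automatic annotation
print(len(selected), pseudo_labels[:10])
```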