In recent years, many works have addressed the problem of detecting never-before-seen events in videos. However, most of them focus on flagging anomalous frames in surveillance footage from security cameras, while anomaly detection (AD) in videos exhibiting anomalous mechanical behavior has been largely overlooked. Detecting anomalies in such videos is of both academic and practical interest, as it enables automatic fault detection in many manufacturing, maintenance, and real-life settings. To assess the potential of different approaches to this task, we evaluate two simple baseline methods: (i) temporally pooled image AD techniques, and (ii) density estimation of videos represented by features pretrained for video classification. Developing such methods calls for a new benchmark that allows the evaluation of different possible approaches. We introduce the Physical Anomalous Trajectory or Motion (PHANTOM) dataset, which contains six classes of videos; each class includes both normal and anomalous videos. The classes differ in the phenomenon they depict, the variability of the normal class, and the type of anomalies present in the videos. We also propose an even harder benchmark in which anomalous activities must be spotted in highly variable scenes.
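Baseline (ii), density estimation over pretrained features, can be sketched in a few lines: temporally pool per-frame features into a single clip descriptor, then score a test clip by its distance to its k nearest normal clips, a common density-estimation proxy. This is an illustrative sketch under our own assumptions (random vectors stand in for pretrained video-classification features; all function names are hypothetical), not the authors' exact pipeline.

```python
import numpy as np

def temporal_pool(frame_feats):
    """Collapse per-frame features of shape (T, D) into one clip descriptor (D,)."""
    return frame_feats.mean(axis=0)

def knn_anomaly_score(train_feats, query_feat, k=2):
    """Score a clip by its mean distance to the k nearest normal clips."""
    d = np.linalg.norm(train_feats - query_feat, axis=1)  # distance to every normal clip
    return float(np.sort(d)[:k].mean())                   # average of the k smallest

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 0.1, size=(50, 8))        # stand-in pooled normal-clip features
frames = rng.normal(0.0, 0.1, size=(16, 8))        # per-frame features of a normal test clip
test_normal = temporal_pool(frames)
test_anomaly = rng.normal(3.0, 0.1, size=8)        # clip far from the normal cluster

s_n = knn_anomaly_score(normal, test_normal)
s_a = knn_anomaly_score(normal, test_anomaly)
assert s_a > s_n  # the anomalous clip receives the higher score
```

The kNN distance is a crude but often competitive density estimate when the features are good; swapping in a KDE or a parametric model changes only `knn_anomaly_score`.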
translated by Google Translate
Anomaly detection methods identify samples that deviate from the normal behavior of a dataset. They are typically trained on sets containing normal data from multiple labeled classes or from a single unlabeled class. Current methods struggle when faced with training data consisting of multiple classes but no labels. In this work, we first discover that classifiers learned by self-supervised image clustering methods provide a strong baseline for anomaly detection on unlabeled multi-class datasets. Perhaps surprisingly, we find that initializing clustering methods with pretrained features does not improve over their self-supervised counterparts. This is due to the phenomenon of catastrophic forgetting. Instead, we propose a two-stage approach. We cluster the images using a self-supervised method and obtain a cluster label for every image. We then use the cluster labels as "pseudo supervision" for an out-of-distribution (OOD) detection method. Specifically, we fine-tune pretrained features on the task of classifying images by their cluster labels. We provide an extensive analysis of our method and demonstrate the necessity of our two-stage approach. We evaluate it against state-of-the-art self-supervised and pretrained methods and show superior performance.
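The two-stage recipe, clustering followed by pseudo-supervised OOD scoring, can be made concrete with a toy sketch. A minimal k-means stands in for the self-supervised clustering stage, and the distance to the nearest pseudo-class center stands in for the OOD method; none of this reflects the paper's actual networks or objectives.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Minimal k-means, standing in for a self-supervised clustering stage."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers, labels

def ood_score(centers, q):
    """Stage two: anomaly score = distance to the nearest pseudo-class center."""
    return float(np.min(np.linalg.norm(centers - q, axis=1)))

rng = np.random.default_rng(1)
# Unlabeled "normal" data drawn from two classes (the pseudo-classes to discover).
data = np.vstack([rng.normal(-2, 0.2, (40, 4)), rng.normal(2, 0.2, (40, 4))])
centers, pseudo_labels = kmeans(data, k=2)

in_dist = rng.normal(2, 0.2, 4)    # near one of the discovered clusters
ood = rng.normal(8, 0.2, 4)        # far from both clusters
assert ood_score(centers, ood) > ood_score(centers, in_dist)
```

In the paper the pseudo-labels supervise fine-tuning of a deep network; here they merely define centers, which is enough to show why unlabeled multi-class structure helps the scorer.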
Video anomaly detection (VAD) is a challenging computer vision task with many practical applications. As anomalies are inherently ambiguous, it is essential for users to understand the reasoning behind a system's decision in order to determine if the rationale is sound. In this paper, we propose a simple but highly effective method that pushes the boundaries of VAD accuracy and interpretability using attribute-based representations. Our method represents every object by its velocity and pose. The anomaly scores are computed using a density-based approach. Surprisingly, we find that this simple representation is sufficient to achieve state-of-the-art performance on ShanghaiTech, the largest and most complex VAD dataset. Combining our interpretable attribute-based representations with implicit, deep representations yields state-of-the-art performance with $99.1\%, 93.3\%$, and $85.9\%$ AUROC on Ped2, Avenue, and ShanghaiTech, respectively. Our method is accurate, interpretable, and easy to implement.
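A hedged sketch of the attribute idea: describe each object with a couple of interpretable attributes (here a speed and a crude pose proxy derived from the bounding box, both our own simplifications rather than the paper's exact representation), then score with a density estimator fit to normal objects. A Gaussian KDE serves as a generic density-based scorer.

```python
import numpy as np

def attribute_features(track):
    """Hypothetical attribute extractor: speed magnitude plus a crude 'pose'
    summary (bounding-box aspect ratio as a stand-in for real pose features)."""
    (x0, y0, x1, y1), (vx, vy) = track
    speed = np.hypot(vx, vy)
    aspect = (y1 - y0) / (x1 - x0)
    return np.array([speed, aspect])

def kde_score(train_feats, q, bandwidth=0.5):
    """Anomaly score = negative log of a Gaussian KDE fit to normal features."""
    d2 = ((train_feats - q) ** 2).sum(axis=1)
    density = np.exp(-d2 / (2 * bandwidth ** 2)).mean()
    return -np.log(density + 1e-12)

# Normal objects: walking pedestrians (low speed, upright aspect ratio).
rng = np.random.default_rng(2)
normal = np.stack([attribute_features(((0, 0, 1, 2 + rng.normal(0, 0.05)),
                                       (rng.normal(1, 0.1), 0))) for _ in range(100)])
walker = attribute_features(((0, 0, 1, 2), (1.0, 0.0)))   # typical pedestrian
cyclist = attribute_features(((0, 0, 2, 2), (6.0, 0.0)))  # fast, wide box: anomalous
assert kde_score(normal, cyclist) > kde_score(normal, walker)
```

Interpretability falls out for free: when the cyclist scores high, inspecting its attribute vector immediately shows *which* attribute (speed) is out of distribution.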
Surveillance videos are able to capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To avoid annotating the anomalous segments or clips in training videos, which is very time consuming, we propose to learn anomalies through a deep multiple-instance ranking framework by leveraging weakly labeled training videos, i.e. the training labels (anomalous or normal) are at video-level instead of clip-level. In our approach, we consider normal and anomalous videos as bags and video segments as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness constraints in the ranking loss function to better localize anomalies during training. We also introduce a new large-scale, first-of-its-kind dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc., as well as normal activities. This dataset can be used for two tasks. First, general anomaly detection, considering all anomalies in one group and all normal activities in another group. Second, recognizing each of the 13 anomalous activities. Our experimental results show that our MIL method for anomaly detection achieves significant improvement on anomaly detection performance as compared to the state-of-the-art approaches. We provide the results of several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these baselines reveals that our dataset is very challenging and opens more opportunities for future work. The dataset is publicly available.
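The ranking objective described above can be sketched directly: a hinge loss between the top-scored segment of an anomalous (positive) bag and of a normal (negative) bag, plus sparsity and temporal-smoothness regularizers on the positive bag's segment scores. The regularization weights and the toy segment scores below are illustrative choices, not values from the paper.

```python
import numpy as np

def mil_ranking_loss(pos_bag, neg_bag, lam1=8e-5, lam2=8e-5):
    """Hinge ranking loss between the highest-scored segments of a positive
    (anomalous) and a negative (normal) bag, regularized so that anomaly
    scores in the positive bag are sparse and temporally smooth."""
    hinge = max(0.0, 1.0 - pos_bag.max() + neg_bag.max())
    smooth = ((pos_bag[1:] - pos_bag[:-1]) ** 2).sum()  # temporal smoothness
    sparse = pos_bag.sum()                              # anomalies should be rare
    return hinge + lam1 * smooth + lam2 * sparse

# Segment scores (e.g. 8 segments per video) as produced by some scoring network.
good_pos = np.array([0.0, 0.1, 0.9, 1.0, 0.8, 0.1, 0.0, 0.0])  # clear anomaly peak
bad_pos = np.full(8, 0.2)                                       # no peak at all
neg = np.full(8, 0.1)                                           # normal video
assert mil_ranking_loss(good_pos, neg) < mil_ranking_loss(bad_pos, neg)
```

Only the *maxima* of each bag enter the hinge, which is what lets video-level labels supervise segment-level scores.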
Anomaly detection methods strive to discover patterns that differ from the norm in a semantic way. This goal is ambiguous, as a data point differing from the norm by some attribute, e.g. age, race, or gender, may be considered anomalous by some operators, while others may consider the attribute irrelevant. Breaking from previous research, we present a new anomaly detection method that allows operators to exclude an attribute from being considered relevant for anomaly detection. Our method then learns representations that do not contain information about these nuisance attributes. Anomaly scoring is performed using a density-based approach. Importantly, our method does not require specifying the attributes that are relevant for detecting anomalies, which is typically impossible in anomaly detection, but only the attributes to ignore. An empirical study is presented verifying the effectiveness of our approach.
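A linear toy version of removing nuisance-attribute information: project the features onto the complement of the direction most correlated with the operator-specified attribute. The actual method learns a deep representation; this sketch only illustrates the goal, that the resulting features carry no (linear) information about the attribute.

```python
import numpy as np

def remove_attribute(feats, attr):
    """Project out the direction in feature space most correlated with a
    user-specified nuisance attribute (a linear sketch of 'a representation
    with no information about the nuisance')."""
    attr = attr - attr.mean()
    direction = feats.T @ attr                 # covariance-like direction
    direction = direction / np.linalg.norm(direction)
    return feats - np.outer(feats @ direction, direction)

rng = np.random.default_rng(6)
attr = rng.normal(size=100)          # nuisance attribute value per sample
feats = rng.normal(size=(100, 5))
feats[:, 0] += 2.0 * attr            # leak the attribute into the features

clean = remove_attribute(feats, attr)
# After projection, every feature dimension has zero covariance with the
# attribute, so a linear probe recovers nothing about it.
```

By construction, `clean.T @ (attr - attr.mean())` is exactly zero: the removed direction absorbs all linear correlation with the attribute.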
We develop a novel framework for single-scene video anomaly localization that allows for human-understandable reasons for the decisions the system makes. We first learn general representations of objects and their motions (using deep networks) and then use these representations to build a high-level, location-dependent model of any particular scene. This model can be used to detect anomalies in new videos of the same scene. Importantly, our approach is explainable - our high-level appearance and motion features can provide human-understandable reasons for why any part of a video is classified as normal or anomalous. We conduct experiments on standard video anomaly detection datasets (Street Scene, CUHK Avenue, ShanghaiTech and UCSD Ped1, Ped2) and show significant improvements over the previous state-of-the-art.
Deep anomaly detection methods learn representations that separate between normal and anomalous images. Although self-supervised representation learning is commonly used, small dataset sizes limit its effectiveness. It was previously shown that utilizing external, generic datasets (e.g. ImageNet classification) can significantly improve anomaly detection performance. One approach is outlier exposure, which fails when the external datasets do not resemble the anomalies. We take the approach of transferring representations pre-trained on external datasets for anomaly detection. Anomaly detection performance can be significantly improved by fine-tuning the pre-trained representations on the normal training images. In this paper, we first demonstrate and analyze that contrastive learning, the most popular self-supervised learning paradigm, cannot be naively applied to pre-trained features. The reason is that pre-trained feature initialization causes poor conditioning for standard contrastive objectives, resulting in bad optimization dynamics. Based on our analysis, we provide a modified contrastive objective, the Mean-Shifted Contrastive Loss. Our method is highly effective and achieves a new state-of-the-art anomaly detection performance including $98.6\%$ ROC-AUC on the CIFAR-10 dataset.
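The core of the Mean-Shifted Contrastive loss is to measure angles around the center of the normal training features rather than around the origin, which is what makes a contrastive objective well-conditioned on pretrained features. The sketch below shows the mean-shifting step and a single two-view contrastive term; batch negatives, the temperature choice, and the training loop are omitted, and all names are our own.

```python
import numpy as np

def normalize(v, axis=-1):
    return v / (np.linalg.norm(v, axis=axis, keepdims=True) + 1e-12)

def mean_shifted(feats, center):
    """Re-center unit-norm features around the normal-data center, then
    renormalize, so similarities measure angles *around the center* rather
    than around the origin (the key idea of the Mean-Shifted Contrastive loss)."""
    return normalize(normalize(feats) - center)

def msc_pair_loss(z1, z2, center, temperature=0.25):
    """Toy two-view contrastive term on mean-shifted features: pull the two
    augmented views of one image together (a full batch would also push
    other images away)."""
    u1, u2 = mean_shifted(z1, center), mean_shifted(z2, center)
    return float(-(u1 * u2).sum() / temperature)

rng = np.random.default_rng(3)
feats = normalize(rng.normal(size=(64, 16)))    # pretrained unit-norm features
center = normalize(feats.mean(axis=0))          # center of the normal training set

view_a = feats[0]
view_b = normalize(feats[0] + 0.05 * rng.normal(size=16))  # mild augmentation
far = normalize(rng.normal(size=16))                       # unrelated feature
assert msc_pair_loss(view_a, view_b, center) < msc_pair_loss(view_a, far, center)
```

Without the mean shift, pretrained features of normal images all sit in a narrow cone, which is the poor conditioning the abstract refers to.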
Video anomaly detection is a core problem in vision. Correctly detecting and identifying anomalous behaviors of pedestrians in video data will enable safety-critical applications such as surveillance, activity monitoring, and human-robot interaction. In this paper, we propose to leverage trajectory localization and prediction for unsupervised pedestrian anomalous event detection. Different from previous reconstruction-based approaches, our proposed framework relies on the prediction errors of normal and anomalous pedestrian trajectories to detect anomalies spatially and temporally. We present experimental results on real-world benchmark datasets over varying timescales and show that our proposed trajectory-prediction-based anomaly detection pipeline is effective and efficient at identifying anomalous activities of pedestrians in videos. Code will be made available at https://github.com/akanuasiegbu/leveraging-trajectory-prediction-for-pedestrian-video-anomaly-detection.
Video anomaly detection (VAD) -- commonly formulated as a multiple-instance learning problem in a weakly-supervised manner due to its labor-intensive nature -- is a challenging problem in video surveillance where the frames of anomaly need to be localized in an untrimmed video. In this paper, we first propose to utilize the ViT-encoded visual features from CLIP, in contrast with the conventional C3D or I3D features in the domain, to efficiently extract discriminative representations in the novel technique. We then model long- and short-range temporal dependencies and nominate the snippets of interest by leveraging our proposed Temporal Self-Attention (TSA). The ablation study conducted on each component confirms its effectiveness in the problem, and the extensive experiments show that our proposed CLIP-TSA outperforms the existing state-of-the-art (SOTA) methods by a large margin on two commonly-used benchmark datasets in the VAD problem (UCF-Crime and ShanghaiTech Campus). The source code will be made publicly available upon acceptance.
Surveillance anomaly detection, i.e., detecting abnormal events such as crimes or accidents in surveillance videos, is a critical task in contemporary society. Since anomalies occur rarely, most training data consist of unlabeled videos without anomalous events, which makes the task challenging. Most existing methods use an autoencoder (AE) to learn to reconstruct normal videos; they then detect anomalies based on their failure to reconstruct abnormal scenes. However, because anomalies are distinguished by appearance as well as motion, many previous approaches have explicitly separated appearance and motion information, for example using a pre-trained optical-flow model. This explicit separation restricts the reciprocal representation capability between the two types of information. In contrast, we propose an implicit two-path AE (ITAE), a structure in which two encoders implicitly model appearance and motion features, and a single decoder combines them to learn normal video patterns. For the complex distribution of normal scenes, we suggest normal-density estimation of ITAE features through normalizing-flow (NF) generative models to learn tractable likelihoods, and identify anomalies using out-of-distribution detection. The NF models enhance ITAE performance by learning normality through the implicitly learned features. Finally, we demonstrate the effectiveness of ITAE and its feature distribution modeling on six benchmarks, including databases that contain various anomalies in real-world scenarios.
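The normalizing-flow component exists to give a tractable likelihood over ITAE features that can be thresholded for out-of-distribution detection. The sketch below replaces the flow with a diagonal Gaussian, a far weaker density model used purely to make the score-by-likelihood mechanics concrete; the feature vectors are random stand-ins, not ITAE outputs.

```python
import numpy as np

class DiagonalGaussian:
    """Tractable density model over normal-video features. A stand-in for the
    normalizing flow in the abstract: both yield an exact log-likelihood that
    can be thresholded for out-of-distribution detection."""

    def fit(self, x):
        self.mu = x.mean(axis=0)
        self.var = x.var(axis=0) + 1e-6  # floor avoids division by zero
        return self

    def nll(self, q):
        """Negative log-likelihood of one feature vector; higher = more anomalous."""
        return float(0.5 * (((q - self.mu) ** 2) / self.var
                            + np.log(2 * np.pi * self.var)).sum())

rng = np.random.default_rng(4)
normal_feats = rng.normal(0.0, 1.0, size=(500, 6))  # stand-in fused features
model = DiagonalGaussian().fit(normal_feats)

far_nll = model.nll(rng.normal(5.0, 1.0, size=6))   # off-distribution sample
near_nll = model.nll(model.mu)                      # most likely point
assert far_nll > near_nll
```

A real flow would replace the Gaussian with an invertible network whose change-of-variables formula still gives an exact `nll`, while fitting far more complex normal distributions.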
Detecting abnormal events in video is commonly framed as a one-class classification task, where training videos contain only normal events, while test videos contain both normal and abnormal events. In this scenario, anomaly detection is an open-set problem. However, some studies assimilate anomaly detection to action recognition. This is a closed-set scenario that fails to test the capability of systems at detecting new anomaly types. To this end, we propose UBnormal, a new supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection. Unlike existing datasets, we introduce abnormal events annotated at the pixel level at training time, for the first time enabling the use of fully supervised learning methods for abnormal event detection. To preserve the typical open-set formulation, we make sure to include disjoint sets of anomaly types in our training and test collections of videos. To the best of our knowledge, UBnormal is the first video anomaly detection benchmark to allow a fair head-to-head comparison between one-class open-set models and supervised closed-set models, as shown in our experiments. Moreover, we provide empirical evidence showing that UBnormal can enhance the performance of a state-of-the-art anomaly detection framework on two prominent datasets, Avenue and ShanghaiTech.
Anomaly detection is a well-established research area that seeks to identify samples outside of a predetermined distribution. An anomaly detection pipeline is comprised of two main stages: (1) feature extraction and (2) normality score assignment. Recent papers have used pre-trained networks for feature extraction, achieving state-of-the-art results. However, using pre-trained networks does not fully utilize the normal samples that are available at train time. This paper suggests taking advantage of this information through teacher-student training. In our setting, a pre-trained teacher network is used to train a student network on the normal training samples. Since the student network is trained only on normal samples, it is expected to deviate from the teacher network in anomalous cases. This difference can serve as a complementary representation to the pre-trained feature vector. Our method, Transformaly, exploits a pre-trained Vision Transformer (ViT) to extract both feature vectors: the pre-trained (agnostic) features and the teacher-student (fine-tuned) features. We report state-of-the-art AUROC results in both the common unimodal setting, where one class is considered normal and the rest are considered anomalous, and the multimodal setting, where all classes but one are considered normal and just one class is considered anomalous. The code is available at https://github.com/matancohen1/transformaly.
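The teacher-student signal can be illustrated with deliberately tiny stand-ins: a fixed nonlinear map plays the frozen pre-trained teacher (instead of a ViT), and a linear least-squares fit on normal samples plays the student. Because the student only imitates the teacher where it saw data, the discrepancy grows off the normal manifold, which is exactly the complementary anomaly score the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(5)
W = rng.normal(size=(8, 4))

def teacher(x):
    """Frozen pretrained network (toy stand-in: a fixed nonlinear map)."""
    return np.tanh(x @ W)

# "Train" a linear student to imitate the teacher on normal samples only.
normal = rng.normal(0.0, 0.1, size=(200, 8))
A, *_ = np.linalg.lstsq(normal, teacher(normal), rcond=None)

def ts_score(x):
    """Teacher-student anomaly score: small where the student saw
    (normal-like) data during training, large elsewhere."""
    return float(np.linalg.norm(teacher(x) - x @ A))

normal_q = rng.normal(0.0, 0.1, size=8)   # looks like the training data
anomaly_q = rng.normal(5.0, 0.1, size=8)  # far outside the normal region
assert ts_score(anomaly_q) > ts_score(normal_q)
```

The same logic drives the real method: the student's capacity to match the teacher is spent entirely on the normal class, so mismatch itself becomes the detector.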
Video anomaly detection aims to identify abnormal events that occur in videos. Since anomalous events are relatively rare, it is not feasible to collect a balanced dataset and train a binary classifier to solve the task. Thus, most previous approaches learn only from normal videos using unsupervised or semi-supervised methods. Obviously, they are limited in capturing and utilizing discriminative abnormal characteristics, which leads to compromised anomaly detection performance. In this paper, to address this issue, we propose a new learning paradigm that makes full use of both normal and abnormal videos for video anomaly detection. In particular, we formulate a new learning task: cross-domain few-shot anomaly detection, which can transfer knowledge learned from numerous videos in the source domain to help solve few-shot anomaly detection in the target domain. Concretely, we leverage self-supervised training on the target normal videos to reduce the domain gap, and devise a meta context perception module to explore the video context of the event in the few-shot setting. Our experiments show that our method significantly outperforms baseline methods on the DoTA and UCF-Crime datasets, and that the new task contributes to a more practical anomaly detection paradigm.
The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them.
Deep anomaly detection aims to separate anomalies from normal samples using high-quality representations. Pretrained features provide effective representations and promising anomaly detection performance. However, with only one-class training data, adapting the pretrained features is a thorny problem. Specifically, existing optimization objectives with global targets often lead to pattern collapse, i.e., all inputs are mapped to the same point. In this paper, we propose a novel adaptation framework consisting of a simple linear transformation and self-attention. The adaptation is applied to a specific input, mining its closest representations among normal samples in the pretrained feature space and the inner relationships between similar one-class semantic features. Furthermore, on top of this framework, we propose an effective constraint term to avoid learning trivial solutions. Our simple adaptive projection with pretrained features (SAP2) yields a novel anomaly detection criterion that is more accurate and more robust to pattern collapse. Our method achieves state-of-the-art anomaly detection performance on semantic and sensory anomaly detection benchmarks, including 96.5% AUROC on the CIFAR-100 dataset, 97.0% AUROC on the CIFAR-10 dataset, and 88.1% AUROC on the MVTec dataset.
Video anomaly detection is currently one of the most active research topics in computer vision, since anomalous events carry a great deal of information. Anomalies are one of the main detection targets in surveillance systems and usually require immediate action. Given the availability of labeled training data (i.e., there is not enough labeled data for anomalies), semi-supervised anomaly detection approaches have recently gained interest. This paper offers researchers in this field a new perspective and reviews recent deep-learning-based semi-supervised video anomaly detection approaches according to the common strategies they use for anomaly detection. Our goal is to help researchers develop more effective video anomaly detection methods. Since the selection of the right deep neural network plays an important role in several parts of this task, a quick comparative review of DNNs is prepared first. Unlike previous surveys, DNNs are reviewed from a spatiotemporal feature extraction viewpoint, customized for video anomaly detection. This part of the review can help researchers in the field select suitable networks for different parts of their methods. Moreover, some of the state-of-the-art anomaly detection methods are critically surveyed based on their detection strategy. The review provides a novel and deep look at existing methods and identifies their shortcomings, which may serve as hints for future work.
In recent years, we have seen a significant interest in data-driven deep learning approaches for video anomaly detection, where an algorithm must determine if specific frames of a video contain abnormal behaviors. However, video anomaly detection is particularly context-specific, and the availability of representative datasets heavily limits real-world accuracy. Additionally, the metrics currently reported by most state-of-the-art methods often do not reflect how well the model will perform in real-world scenarios. In this article, we present the Charlotte Anomaly Dataset (CHAD). CHAD is a high-resolution, multi-camera anomaly dataset in a commercial parking lot setting. In addition to frame-level anomaly labels, CHAD is the first anomaly dataset to include bounding box, identity, and pose annotations for each actor. This is especially beneficial for skeleton-based anomaly detection, which is useful for its lower computational demand in real-world settings. CHAD is also the first anomaly dataset to contain multiple views of the same scene. With four camera views and over 1.15 million frames, CHAD is the largest fully annotated anomaly detection dataset including person annotations, collected from continuous video streams from stationary cameras for smart video surveillance applications. To demonstrate the efficacy of CHAD for training and evaluation, we benchmark two state-of-the-art skeleton-based anomaly detection algorithms on CHAD and provide comprehensive analysis, including both quantitative results and qualitative examination.
Despite significant advances in image anomaly detection and segmentation, few methods use 3D information. We utilize a recently introduced 3D anomaly detection dataset to evaluate whether or not using 3D information is a lost opportunity. First, we present a surprising finding: standard color-only methods outperform all current methods that are explicitly designed to exploit 3D information. This is counter-intuitive as even a simple inspection of the dataset shows that color-only methods are insufficient for images containing geometric anomalies. This motivates the question: how can anomaly detection methods effectively use 3D information? We investigate a range of shape representations including hand-crafted and deep-learning-based; we demonstrate that rotation invariance plays the leading role in the performance. We uncover a simple 3D-only method that beats all recent approaches while not using deep learning, external pre-training datasets, or color information. As the 3D-only method cannot detect color and texture anomalies, we combine it with color-based features, significantly outperforming previous state-of-the-art. Our method, dubbed BTF (Back to the Feature) achieves pixel-wise ROCAUC: 99.3% and PRO: 96.4% on MVTec 3D-AD.
The objective of this work is to detect and automatically generate high-level explanations of anomalous events in video. Understanding the cause of an anomalous event is crucial, as the required response depends on its nature and severity. Recent works typically use object or action classifiers to detect and label anomalous events. However, this constrains detection systems to a finite set of known classes and prevents generalization to unknown objects or behaviors. Here, we show how to robustly detect anomalies without the use of object or action classifiers, yet still recover the high-level cause behind the event. We make the following contributions: (1) a method that uses saliency maps to decouple the explanation of anomalous events from object and action classifiers; (2) we show how to improve the quality of saliency maps using a novel neural architecture that learns a discrete representation of video by predicting future frames; and (3) we beat the state-of-the-art anomaly explanation methods by 60% on a subset of the public benchmark X-MAN dataset.
A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in the literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework and propose several updates to the original method. First, we study various detection methods, e.g. based on detecting high-motion regions using optical flow or background subtraction, since we believe the currently used pre-trained YOLOv3 is suboptimal, e.g. objects in motion or objects from unknown classes are never detected. Second, we modernize the 3D convolutional backbone by introducing multi-head self-attention modules, inspired by the recent success of vision transformers. As such, we alternatively introduce both 2D and 3D convolutional vision transformer (CvT) blocks. Third, in an attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps through knowledge distillation, solving jigsaw puzzles, estimating body pose through knowledge distillation, predicting masked regions (inpainting), and adversarial learning with pseudo-anomalies. We conduct experiments to assess the performance impact of the introduced changes. Upon finding more promising configurations of the framework, dubbed SSMTL++v1 and SSMTL++v2, we extend our preliminary experiments to more datasets, demonstrating that our performance gains are consistent across all of them. In most cases, our results on Avenue, ShanghaiTech, and UBnormal raise the state-of-the-art performance to new levels.