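The bokeh effect is a natural shallow depth-of-field phenomenon that blurs the out-of-focus parts of a photograph. In pursuit of aesthetically pleasing photos, people usually regard the bokeh effect as an indispensable part of the photo. Given its natural advantage and universality, and the fact that many visual recognition tasks may already be negatively affected by the "natural bokeh" phenomenon, in this work we systematically study the bokeh effect from a new angle: the adversarial bokeh attack (AdvBokeh), which aims to embed calculated deceptive information into bokeh generation and to produce natural adversarial examples without any human-noticeable noise artifacts. To this end, we first propose a depth-guided bokeh synthesis network (DebsNet) that can flexibly synthesize, refocus, and adjust the bokeh level of an image with a one-stage training procedure. DebsNet allows us to tap into the bokeh generation process and attack the depth map needed for generating realistic bokeh (i.e., adversarially tune the depth map) based on subsequent visual tasks. To further improve the realism of the adversarial bokeh, we propose a depth-guided gradient-based attack to regularize the gradient. We validate the proposed method on a popular adversarial image classification dataset, NeurIPS-2017 DEV, and show that it can penetrate four state-of-the-art (SOTA) image classification networks, namely ResNet50, VGG, DenseNet, and MobileNetV2, with a high success rate and high image quality. The adversarial examples obtained by AdvBokeh also exhibit a high level of transferability under black-box settings. Moreover, the adversarially generated defocus-blur images from AdvBokeh can be capitalized on to enhance the performance of a SOTA defocus deblurring system, IFAN.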
Translated by Google Translate
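High-fidelity generation and high-precision detection of DeepFake images are currently in an arms race. We believe that producing DeepFakes that are highly realistic and "detection evasive" can serve the ultimate goal of improving future-generation DeepFake detection capabilities. In this paper, we propose a simple yet powerful pipeline that reduces the artifact patterns of fake images without hurting image quality by performing implicit spatial-domain notch filtering. We first show that frequency-domain notch filtering, although effective, is infeasible for our task because of the manual design that the notch filters require. We therefore resort to a learning-based approach that reproduces the notch filtering effect, but solely in the spatial domain. We combine adding overwhelming spatial noise, to break the periodic noise patterns, with deep image filtering, to reconstruct noise-free fake images, and we name our method DeepNotch. Deep image filtering provides a dedicated filter for each pixel in the noisy image, producing filtered images with high fidelity compared with their DeepFake counterparts. In addition, we use the semantic information of the image to generate an adversarial guidance map that adds noise intelligently. Our large-scale evaluation on 3 representative state-of-the-art DeepFake detection methods (tested on 16 types of DeepFakes) demonstrates that our technique significantly reduces the accuracy of these 3 fake image detection methods, by 36.79% on average and by up to 97.02% in the best case.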
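To make the frequency-domain baseline mentioned in this abstract concrete, here is a minimal one-dimensional sketch of notch filtering. It is not the paper's learned spatial-domain method: the toy signal, the function names, and the hand-picked `notch_bin` are all assumptions made for illustration, and a real image would need a 2D transform.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)); fine for short signals."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft() above."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def notch_filter(signal, notch_bin):
    """Zero one frequency bin and its conjugate mirror, then transform back."""
    n = len(signal)
    spectrum = dft(signal)
    spectrum[notch_bin % n] = 0
    spectrum[(-notch_bin) % n] = 0
    return [value.real for value in idft(spectrum)]


# Toy data (assumed): low-frequency content plus a periodic artifact at bin 8.
N = 32
content = [math.cos(2 * math.pi * 1 * t / N) for t in range(N)]
artifact = [0.5 * math.cos(2 * math.pi * 8 * t / N) for t in range(N)]
noisy = [c + a for c, a in zip(content, artifact)]

# The bin to remove is chosen by hand here, which is the manual step at issue.
cleaned = notch_filter(noisy, notch_bin=8)
max_error = max(abs(c - x) for c, x in zip(content, cleaned))
```

Because the artifact frequency has to be known and passed in as `notch_bin`, the sketch also shows why a hand-designed notch does not scale across many DeepFake generators with different artifact patterns.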
Consensus clustering aggregates partitions to find a better fit by reconciling clustering results from different sources or executions. In practice, noise and outliers in clustering tasks can significantly degrade performance. To address this issue, we propose a novel algorithm, robust consensus clustering, that finds common ground truth among experts' opinions and tends to be minimally affected by the bias caused by outliers. In particular, we formalize the robust consensus clustering problem as a constrained optimization problem and then derive an effective algorithm based on the alternating direction method of multipliers (ADMM) with a rigorous convergence guarantee. Our method outperforms the baselines on benchmarks. We apply the proposed method to real-world advertising campaign segmentation and forecasting tasks, using consensus clustering results based on the similarity computed via Kolmogorov-Smirnov statistics. The accurate clustering result helps build advertiser profiles for forecasting.
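The Kolmogorov-Smirnov similarity mentioned in this abstract can be sketched as follows. The abstract does not say how the statistic is turned into a similarity, so `ks_similarity` (one minus the statistic) is an assumed, hypothetical choice; only the two-sample statistic itself is standard.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    # The empirical CDFs are step functions, so the largest gap occurs at a sample point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_similarity(sample_a, sample_b):
    """Hypothetical similarity in [0, 1]; 1 means identical empirical CDFs."""
    return 1.0 - ks_statistic(sample_a, sample_b)
```

Two campaigns whose metric samples follow the same empirical distribution would get a similarity of 1, and two with non-overlapping ranges would get 0.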
Current mainstream object detection methods for large aerial images usually divide each image into patches and then exhaustively detect objects of interest on all patches, whether or not they contain any objects. This paradigm, although effective, is inefficient because the detectors must process every patch, which severely hinders inference speed. This paper presents an Objectness Activation Network (OAN) that helps detectors focus on fewer patches while achieving more efficient inference and more accurate results, enabling a simple and effective solution to object detection in large images. In brief, OAN is a light fully-convolutional network that judges whether each patch contains objects; it can be easily integrated into many object detectors and jointly trained with them end-to-end. We extensively evaluate OAN with five advanced detectors. Using OAN, all five detectors gain more than 30.0% speed-up on three large-scale aerial image datasets, along with consistent accuracy improvements. On extremely large Gaofen-2 images (29200$\times$27620 pixels), OAN improves the detection speed by 70.5%. Moreover, we extend OAN to driving-scene object detection and 4K video object detection, boosting the detection speed by 112.1% and 75.0%, respectively, without sacrificing accuracy. Code is available at https://github.com/Ranchosky/OAN.
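The patch-skipping idea described in this abstract can be sketched at inference time as below. This is only an illustration under stated assumptions: `objectness_fn` and `detector_fn` are hypothetical stand-ins for the trained OAN and the detector, and the threshold, patch size, and toy scores are invented for the example.

```python
def split_into_patches(width, height, patch_size):
    """Return (x, y, w, h) boxes tiling the image; edge patches may be smaller."""
    boxes = []
    for y in range(0, height, patch_size):
        for x in range(0, width, patch_size):
            boxes.append((x, y, min(patch_size, width - x), min(patch_size, height - y)))
    return boxes


def detect_with_gating(boxes, objectness_fn, detector_fn, threshold=0.5):
    """Run the (expensive) detector only on patches the scorer marks as likely non-empty."""
    detections = []
    evaluated = 0
    for box in boxes:
        if objectness_fn(box) < threshold:
            continue  # skip patches judged to contain no objects
        evaluated += 1
        detections.extend(detector_fn(box))
    return detections, evaluated


# Hypothetical stubs: only the left column of patches "contains objects".
def toy_objectness(box):
    return 0.9 if box[0] == 0 else 0.1


def toy_detector(box):
    return [box]  # pretend each evaluated patch yields one detection


patches = split_into_patches(width=4000, height=3000, patch_size=1000)
found, evaluated_count = detect_with_gating(patches, toy_objectness, toy_detector)
```

In this toy run the detector is called on 3 of 12 patches; the speed-up in practice depends on how sparse the objects are and on the cost of the scorer itself.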
Because existing wireless sensor network (WSN)-based anomaly detection methods consider and analyze only temporal features, this paper designs a self-supervised anomaly node detection method based on an autoencoder. The method integrates temporal WSN data flow feature extraction, spatial position feature extraction, and intermodal WSN correlation feature extraction into the design of the autoencoder to make full use of the spatial and temporal information of the WSN for anomaly detection. First, a fully connected network is used to extract the temporal features of nodes by considering a single mode from a local spatial perspective. Second, a graph neural network (GNN) is used to introduce the WSN topology from a global spatial perspective for anomaly detection and to extract the spatial and temporal features of the data flows of nodes and their neighbors by considering a single mode. Then, an adaptive fusion method involving weighted summation is used to extract the relevant features between different models. In addition, this paper introduces a gated recurrent unit (GRU) to solve the long-term dependence problem of the time dimension. Finally, the reconstructed output of the decoder and the hidden layer representation of the autoencoder are fed into a fully connected network to calculate the anomaly probability of the current system. Since the spatial feature extraction operation is advanced, the designed method can be applied to large-scale network anomaly detection by adding a clustering operation. Experiments show that the designed method outperforms the baselines, and the F1 score reaches 90.6%, which is 5.2% higher than those of existing anomaly detection methods based on unsupervised reconstruction and prediction. Code and model are available at https://github.com/GuetYe/anomaly_detection/GLSL
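The weighted-summation fusion step mentioned in this abstract might look like the sketch below. The abstract does not specify how the weights are obtained, so normalising per-branch scores with a softmax is an assumption, and `fuse_features` and its arguments are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(feature_vectors, scores):
    """Weighted sum of equal-length feature vectors; weights come from softmax(scores)."""
    if len(feature_vectors) != len(scores):
        raise ValueError("need one score per feature vector")
    dim = len(feature_vectors[0])
    if any(len(vec) != dim for vec in feature_vectors):
        raise ValueError("all feature vectors must have the same length")
    weights = softmax(scores)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, feature_vectors))
        for i in range(dim)
    ]
    return fused, weights
```

In a trained model the scores would be learned parameters or outputs of a small network, so that the fusion adapts to whichever branch is more informative.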
In this work, we study algorithms for detecting glass objects and including them in an optimization-based Simultaneous Localization and Mapping (SLAM) algorithm. When LiDAR data is the primary exteroceptive sensory input, glass objects are not correctly registered. This occurs because the incident light primarily passes through the glass objects or reflects away from the source, resulting in inaccurate range measurements for glass surfaces. Consequently, localization and mapping performance suffers, which makes navigation in such environments unreliable. Optimization-based SLAM solutions, also referred to as Graph SLAM, are widely regarded as state of the art. In this paper, we use a simple and computationally inexpensive glass detection scheme and present a methodology for incorporating the identified objects into the occupancy grid maintained by such an algorithm (Google Cartographer). We develop both local (submap-level) and global algorithms for this purpose and compare the maps produced by our method with those produced by an existing algorithm that uses particle-filter-based SLAM.
Persuasion modeling is a key building block for conversational agents. Existing work in this direction is limited to analyzing textual dialogue corpora. We argue that visual signals also play an important role in understanding human persuasive behaviors. In this paper, we introduce the first multimodal dataset for modeling persuasion behaviors. Our dataset includes 199 dialogue transcriptions and videos captured in a multi-player social deduction game setting, 26,647 utterance-level annotations of persuasion strategy, and game-level annotations of deduction game outcomes. We provide extensive experiments to show how dialogue context and visual signals benefit persuasion strategy prediction. We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes. Our dataset, code, and models can be found at https://persuasion-deductiongame.socialai-data.org.
The image-based head swapping task aims to stitch a source head onto another source body flawlessly. This seldom-studied task faces two major challenges: 1) preserving the head and body from different sources while generating a seamless transition region, and 2) the lack, so far, of a paired head swapping dataset and benchmark. In this paper, we propose an image-based head swapping framework (HS-Diffusion) that consists of a semantic-guided latent diffusion model (SG-LDM) and a semantic layout generator. We blend the semantic layouts of the source head and source body and then inpaint the transition region with the semantic layout generator, achieving coarse-grained head swapping. SG-LDM then performs fine-grained head swapping, conditioned on the blended layout, through a progressive fusion process, while preserving the source head and source body with high-quality reconstruction. To this end, we design a head-cover augmentation strategy for training and a neck alignment trick for geometric realism. Importantly, we construct a new image-based head swapping benchmark and propose two tailor-designed metrics (Mask-FID and Focal-FID). Extensive experiments demonstrate the superiority of our framework. The code will be available at https://github.com/qinghew/HS-Diffusion.
Point cloud registration (PCR) is a popular research topic in computer vision. Recently, evolutionary registration methods have received continuous attention because of their robustness to the initial pose and their flexibility in objective function design. However, most evolving registration methods cannot tackle local optima well, and they have rarely investigated the success ratio, which reflects the probability of not falling into local optima and is closely related to the practicality of the algorithm. Evolutionary multi-task optimization (EMTO) is a widely used paradigm that can boost exploration capability through knowledge transfer among related tasks. Inspired by this concept, this study proposes a novel evolving registration algorithm via EMTO, where the multi-task configuration is based on the idea of solution space cutting. Concretely, one task searching in the cut space assists another task, which has a complex function landscape, in escaping from local optima and improving the successful registration ratio. To reduce unnecessary computational cost, a sparse-to-dense strategy is proposed. In addition, a novel fitness function that is robust to various overlap rates, as well as a problem-specific metric of computational cost, is introduced. Compared with 7 evolving registration approaches and 4 traditional registration approaches on object-scale and scene-scale registration datasets, experimental results demonstrate that the proposed method has superior performance in terms of precision and tackling local optima.
Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples. However, although prompt learning achieves excellent performance on in-domain data, it still faces the major challenge of generalizing to unseen classes and domains. Some existing prompt learning methods tackle this issue by adaptively generating different prompts for different tokens or domains, but they neglect the ability of the learned prompts to generalize to unseen domains. In this paper, we propose a novel prompt learning paradigm, called MetaPrompt, that directly generates domain-invariant prompts that generalize to unseen domains. Specifically, a dual-modality prompt tuning network is proposed to generate prompts for inputs from both image and text modalities. More importantly, we propose a meta-learning-based prompt tuning algorithm that explicitly constrains the prompt tuned on a specific domain or class to also achieve good performance on another domain or class. Extensive experiments on 11 datasets for base-to-new generalization and four datasets for domain generalization demonstrate that our method consistently and significantly outperforms existing methods.