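The bokeh effect is a natural shallow depth-of-field phenomenon that blurs the out-of-focus parts of a photograph. In pursuit of aesthetically pleasing photos, people usually regard bokeh as an indispensable part of the picture. Because bokeh is natural and ubiquitous, and because many visual recognition tasks may already be negatively affected by "natural bokeh", in this work we systematically study the bokeh effect from a new angle: the adversarial bokeh attack (AdvBokeh), which aims to embed calculated deceptive information into bokeh generation and to produce natural adversarial examples without any human-noticeable noise artifacts. To this end, we first propose a Depth-guided Bokeh Synthesis Network (DebsNet) that can flexibly synthesize, refocus, and adjust the bokeh level of an image with a one-stage training procedure. DebsNet allows us to tap into the bokeh generation process and, guided by the subsequent visual task, attack the depth map needed to generate realistic bokeh (i.e., adversarially tune the depth map). To further improve the realism of the adversarial bokeh, we propose a depth-guided gradient-based attack that regularizes the gradient. We validate the proposed method on a popular adversarial image classification dataset, NeurIPS-2017 DEV, and show that it can penetrate four state-of-the-art (SOTA) image classification networks, namely ResNet50, VGG, DenseNet, and MobileNetV2, with a high success rate and high image quality. The adversarial examples obtained by AdvBokeh also exhibit a high level of transferability in black-box settings. Moreover, the adversarially generated defocus-blurred images from AdvBokeh can be used to enhance the performance of a SOTA defocus deblurring system, IFAN.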
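High-fidelity generation and high-precision detection of DeepFake images are currently in an arms race. We believe that producing DeepFakes that are highly realistic and "detection evasive" can serve the ultimate goal of improving the DeepFake detection capabilities of future generations. In this paper, we propose a simple yet powerful pipeline that reduces the artifact patterns of fake images, without hurting image quality, by performing implicit spatial-domain notch filtering. We first show that frequency-domain notch filtering, although effective, is impractical for our task because of the manual design that notch filters require. We therefore resort to a learning-based approach that reproduces the notch filtering effect, but solely in the spatial domain. We combine the addition of overwhelming spatial noise, which breaks the periodic noise patterns, with deep image filtering, which reconstructs noise-free fake images, and we name our method DeepNotch. Deep image filtering provides a dedicated filter for each pixel in the noisy image, producing filtered images with high fidelity compared to their DeepFake counterparts. In addition, we use the semantic information of the image to generate an adversarial guidance map so that noise is added intelligently. Our large-scale evaluation on 3 representative state-of-the-art DeepFake detection methods (tested on 16 types of DeepFakes) demonstrates that our technique significantly reduces the accuracy of these 3 fake image detection methods, by 36.79% on average and by up to 97.02% in the best case.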
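To make the frequency-domain baseline mentioned in this abstract concrete, here is a minimal sketch of notch filtering on a 1D signal. It is not the paper's pipeline: the 1D setting, the toy signal, and the names `dft`, `idft`, and `notch_filter` are assumptions made for illustration. Note that the bin to suppress (`notch_bin`) has to be chosen by hand, which is the manual design step the abstract describes as impractical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2), fine for a toy example)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part because the input signal was real."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def notch_filter(signal, notch_bin):
    """Zero out one hand-picked frequency bin and its conjugate mirror."""
    n = len(signal)
    spectrum = dft(signal)
    spectrum[notch_bin] = 0
    spectrum[(n - notch_bin) % n] = 0
    return idft(spectrum)


n = 64
# Hypothetical "image content": a slow oscillation at frequency bin 2.
clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
# Hypothetical periodic artifact at frequency bin 16.
artifact = [0.5 * math.cos(2 * math.pi * 16 * t / n) for t in range(n)]
noisy = [c + a for c, a in zip(clean, artifact)]

filtered = notch_filter(noisy, notch_bin=16)
max_error = max(abs(f - c) for f, c in zip(filtered, clean))
print(f"max deviation from clean signal after filtering: {max_error:.2e}")
```

In this toy case the artifact sits exactly on one known bin, so a single notch removes it. Real artifact spectra would need a separate notch design for each pattern.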
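Full face synthesis and partial face manipulation enabled by generative adversarial networks (GANs) and their variants have raised wide public concern. In multimedia forensics, detecting and ultimately localizing image forgery has become an imperative task. In this work, we investigate the architecture of existing GAN-based face manipulation methods and observe that the imperfection of their upsampling methods can serve as an important asset for GAN-synthesized fake image detection and forgery localization. Based on this observation, we propose a novel approach, termed FakeLocator, that obtains high localization accuracy at full resolution on manipulated facial images. To the best of our knowledge, this is the first attempt to solve the GAN-based fake localization problem with a grayscale fakeness map, which preserves more information about the forged regions. To improve the generality of FakeLocator across multiple facial attributes, we introduce an attention mechanism to guide the training of the model. To improve its generality across different DeepFake methods, we propose partial data augmentation and single-sample clustering on the training images. Experimental results on the popular FaceForensics++ and DFFD datasets and on seven different state-of-the-art GAN-based face generation methods show the effectiveness of our method. Compared with the baselines, our method performs better on various metrics. Moreover, the proposed method is robust against various real-world facial image degradations such as JPEG compression, low resolution, noise, and blur.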
Gaze estimation is the fundamental basis for many visual tasks. Yet the high cost of acquiring gaze datasets with 3D annotations hinders the optimization and application of gaze estimation models. In this work, we propose a novel head-eye redirection parametric model based on Neural Radiance Fields, which allows dense gaze data generation with view consistency and accurate gaze direction. Moreover, our head-eye redirection parametric model decouples the face and eyes for separate neural rendering, so the face attributes, identity, illumination, and eye gaze direction can be controlled separately. Diverse 3D-aware gaze datasets can thus be obtained by manipulating the latent codes belonging to different face attributes in an unsupervised manner. Extensive experiments on several benchmarks demonstrate the effectiveness of our method in domain generalization and domain adaptation for gaze estimation tasks.
Despite recent progress towards scaling up multimodal vision-language models, these models are still known to struggle on compositional generalization benchmarks such as Winoground. We find that a critical component lacking from current vision-language models is relation-level alignment: the ability to match directional semantic relations in text (e.g., "mug in grass") with spatial relationships in the image (e.g., the position of the mug relative to the grass). To tackle this problem, we show that relation alignment can be enforced by encouraging the directed language attention from 'mug' to 'grass' (capturing the semantic relation 'in') to match the directed visual attention from the mug to the grass. Tokens and their corresponding objects are softly identified using the cross-modal attention. We prove that this notion of soft relation alignment is equivalent to enforcing congruence between vision and language attention matrices under a 'change of basis' provided by the cross-modal attention matrix. Intuitively, our approach projects visual attention into the language attention space to calculate its divergence from the actual language attention, and vice versa. We apply our Cross-modal Attention Congruence Regularization (CACR) loss to UNITER and improve on the state-of-the-art approach to Winoground.
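The "change of basis" idea in this abstract can be illustrated with a small numeric sketch. Everything below is an assumption made for illustration rather than the paper's actual CACR loss: the matrix shapes, the projection `cross_attn @ vis_attn @ cross_attn^T`, the squared-difference penalty (the abstract speaks of a divergence without specifying it here), and the helper names. It covers only the vision-to-language direction.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def congruence_penalty(lang_attn, vis_attn, cross_attn):
    """Project visual attention into token space and compare with language attention.

    lang_attn:  tokens x tokens   (directed attention between words)
    vis_attn:   regions x regions (directed attention between image regions)
    cross_attn: tokens x regions  (soft token-to-region assignment)
    """
    projected = matmul(matmul(cross_attn, vis_attn), transpose(cross_attn))
    return sum(
        (p - l) ** 2
        for proj_row, lang_row in zip(projected, lang_attn)
        for p, l in zip(proj_row, lang_row)
    )


# Tokens: 0 = "mug", 1 = "grass". Regions: 0 = grass patch, 1 = mug patch.
lang_attn = [[0.2, 0.8], [0.6, 0.4]]   # "mug" attends strongly to "grass"
cross_attn = [[0.0, 1.0], [1.0, 0.0]]  # hard assignment for clarity; soft in practice

# Visual attention that mirrors the language relation (mug patch -> grass patch).
vis_aligned = [[0.4, 0.6], [0.8, 0.2]]
# Visual attention with the relation reversed.
vis_misaligned = [[0.2, 0.8], [0.6, 0.4]]

aligned_penalty = congruence_penalty(lang_attn, vis_aligned, cross_attn)
misaligned_penalty = congruence_penalty(lang_attn, vis_misaligned, cross_attn)
print(aligned_penalty, misaligned_penalty)
```

The penalty vanishes when the directed visual attention matches the directed language attention after projection, and grows when the relation is reversed. A regularizer of this kind is meant to encourage that matching.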
When deep neural networks (DNNs) are deployed on edge devices, much research effort is devoted to coping with limited hardware resources. However, little attention is paid to the influence of dynamic power management. Because edge devices typically run on a limited battery energy budget (rather than the almost unlimited energy supply of servers or workstations), their dynamic power management often changes the execution frequency, as in the widely used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed, especially for computation-intensive DNN models, which can harm user experience and waste hardware resources. We first identify this problem and then propose All-in-One, a highly representative pruning framework that works with dynamic power management using DVFS. The framework uses only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models of various pruning ratios. By reconfiguring the model to the pruning ratio that corresponds to a specific execution frequency (and voltage), we achieve stable inference speed, i.e., we keep the difference in speed across execution frequencies as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces the variance of their inference latency across frequencies, with the minimal memory consumption of only one model and one soft mask.
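The idea of matching a pruning ratio to each execution frequency can be sketched as follows. This is a toy model, not the All-in-One framework: it assumes latency is proportional to the remaining compute divided by the clock frequency, and it assumes a soft mask can be read as one importance score per weight. The function names and all numbers are hypothetical.

```python
def pruning_ratio_for(freq_mhz, full_model_cycles, target_latency_ms):
    """Pick the pruning ratio that keeps latency at the target for this frequency.

    Assumed latency model: latency_ms = remaining_cycles / (freq_mhz * 1e3),
    since 1 MHz corresponds to 1e3 cycles per millisecond.
    """
    affordable_cycles = target_latency_ms * freq_mhz * 1e3
    keep_fraction = min(1.0, affordable_cycles / full_model_cycles)
    return 1.0 - keep_fraction


def latency_ms(freq_mhz, full_model_cycles, pruning_ratio):
    """Latency of the pruned model under the same assumed latency model."""
    return full_model_cycles * (1.0 - pruning_ratio) / (freq_mhz * 1e3)


def prune_with_soft_mask(weights, mask_scores, pruning_ratio):
    """Zero the weights with the lowest mask scores; the weights are shared."""
    n_prune = int(round(len(weights) * pruning_ratio))
    order = sorted(range(len(weights)), key=lambda i: mask_scores[i])
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


full_model_cycles = 30e6           # hypothetical cost of the unpruned model
target_latency_ms = 20.0           # hypothetical latency target
frequencies_mhz = [1500, 1000, 750]  # hypothetical DVFS levels

latencies = []
for freq in frequencies_mhz:
    ratio = pruning_ratio_for(freq, full_model_cycles, target_latency_ms)
    latencies.append(latency_ms(freq, full_model_cycles, ratio))
    print(f"{freq} MHz -> pruning ratio {ratio:.2f}, latency {latencies[-1]:.1f} ms")

# One weight vector and one score vector serve every pruning ratio.
sparse_weights = prune_with_soft_mask([1.0, 2.0, 3.0, 4.0], [0.9, 0.1, 0.5, 0.7], 0.5)
```

Under this assumed latency model, lower frequencies get higher pruning ratios and the latency stays at the target. Each sparser model comes from the same stored weights and scores.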
How to effectively leverage the many existing datasets to train a robust, high-performance model is of great significance for many practical applications. However, a model trained on a naive merge of different datasets tends to perform poorly because of annotation conflicts and domain divergence. In this paper, we attempt to train a unified model that is expected to perform well across domains on several popular segmentation datasets. We conduct a detailed analysis of the impact on model generalization from three aspects: data augmentation, training strategies, and model capacity. Based on this analysis, we propose a robust solution that improves model generalization across domains. Our solution ranks 2nd on the RVC 2022 semantic segmentation task, using a dataset only one third the size of the one used by the 1st-place model.
Clustering analysis of sequence data continues to address many applications in engineering design, aided by the rapid growth of machine learning in applied science. This paper presents an unsupervised machine learning algorithm that extracts defining characteristics of earthquake ground-motion records, also called latent features, to aid in ground-motion clustering and selection. In this context, a latent feature is a low-dimensional, machine-discovered spectral characteristic learned through the nonlinear relationships of a neural network autoencoder. Clustering can be performed on the latent features and used to select a representative archetypal subgroup from a large ground-motion suite. The objective of efficient ground-motion selection is to choose records representative of what the structure will probabilistically experience in its lifetime. Three examples are presented to validate this approach, including a synthetic spectral dataset and spectra from field-recorded ground-motion records. Deep embedding clustering of ground-motion spectra improves on the results of static feature extraction by utilizing characteristics that represent the sparse spectral content of ground motions.
Accelerated MRI aims to find a pair of samplers and reconstructors that reduces acquisition time while maintaining reconstruction quality. Most existing works focus either on finding sparse samplers for a fixed reconstructor or on finding reconstructors for a fixed sampler. Recently, researchers have begun to consider learning samplers and reconstructors jointly. In this paper, we propose an alternating training framework for finding a good pair of samplers and reconstructors via deep reinforcement learning (RL). In particular, we propose a novel sparse-reward Partially Observed Markov Decision Process (POMDP) to formulate the MRI sampling trajectory. Compared to existing works that use dense-reward POMDPs, the proposed sparse-reward POMDP is more computationally efficient and has a provable advantage over dense-reward POMDPs. We evaluate our method on fastMRI, a public benchmark MRI dataset, and it achieves state-of-the-art reconstruction performance.
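The difference between a dense-reward and a sparse-reward formulation of a sampling trajectory can be sketched as below. This is an illustrative toy, not the paper's POMDP: it assumes the dense reward is the per-step improvement of a quality score and the sparse reward is the final score only. The stand-in `toy_score` (fraction of k-space lines covered) is hypothetical and replaces an actual reconstructor.

```python
def toy_score(sampled_lines, total_lines=8):
    """Hypothetical stand-in for reconstruction quality: fraction of lines covered."""
    return len(set(sampled_lines)) / total_lines


def run_episode(actions, score_fn, sparse):
    """Roll out a fixed sampling trajectory and collect per-step rewards.

    Returns the reward list and how many times the (expensive) score was evaluated.
    """
    sampled, rewards, score_calls = [], [], 0
    previous = 0.0
    for step, line in enumerate(actions):
        sampled.append(line)
        is_last = step == len(actions) - 1
        if sparse:
            if is_last:
                # Sparse reward: evaluate quality once, at the end of the trajectory.
                rewards.append(score_fn(sampled))
                score_calls += 1
            else:
                rewards.append(0.0)
        else:
            # Dense reward: evaluate quality after every sampling step.
            current = score_fn(sampled)
            score_calls += 1
            rewards.append(current - previous)
            previous = current
    return rewards, score_calls


actions = [0, 3, 5, 7]  # hypothetical k-space lines chosen by a sampler
dense_rewards, dense_calls = run_episode(actions, toy_score, sparse=False)
sparse_rewards, sparse_calls = run_episode(actions, toy_score, sparse=True)
print(dense_rewards, dense_calls)
print(sparse_rewards, sparse_calls)
```

In this toy both formulations yield the same total return, but the sparse variant evaluates the score once per trajectory instead of once per step. This is one plausible reading of the efficiency claim, assuming each evaluation requires running a reconstructor.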
Vertical federated learning (VFL) is a trending solution for multi-party collaboration in training machine learning models. Industrial frameworks adopt secure multi-party computation methods such as homomorphic encryption to guarantee data security and privacy. However, a line of work has revealed that leakage risks remain in VFL. The leakage is caused by the correlation between the intermediate representations and the raw data. Owing to the powerful approximation ability of deep neural networks, an adversary can capture this correlation precisely and reconstruct the data. To deal with the threat of data reconstruction attacks, we propose a hashing-based VFL framework, called HashVFL, that cuts off the reversibility directly. The one-way nature of hashing allows our framework to block all attempts to recover data from hash codes. However, integrating hashing also brings challenges, e.g., the loss of information. This paper identifies and addresses three challenges in integrating hashing: learnability, bit balance, and consistency. Experimental results demonstrate HashVFL's efficiency in maintaining the main task's performance and in defending against data reconstruction attacks. Furthermore, we analyze its potential value in detecting abnormal inputs. In addition, we conduct extensive experiments to demonstrate HashVFL's generalization in various settings. In summary, HashVFL provides a new perspective on protecting multi-party data security and privacy in VFL. We hope our study can attract more researchers to expand the application domains of HashVFL.
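Two of the notions in this abstract, the one-way nature of hashing and bit balance, can be illustrated with a small sketch. It assumes sign-based binarization of an intermediate embedding, which is a common choice in learning-to-hash but is not confirmed by the abstract as HashVFL's mechanism. The function names and the toy embeddings are hypothetical.

```python
def to_hash_code(embedding):
    """Binarize an embedding by sign: non-negative -> 1, negative -> 0 (assumed scheme)."""
    return [1 if value >= 0 else 0 for value in embedding]


def bit_balance(codes):
    """Fraction of ones per bit position across a batch; 0.5 means perfectly balanced."""
    n = len(codes)
    return [sum(code[j] for code in codes) / n for j in range(len(codes[0]))]


# Hypothetical intermediate representations from one party's local model.
embeddings = [
    [0.7, -1.2, 0.1],
    [2.3, -0.4, -0.9],
    [-0.5, 0.8, 0.3],
    [-1.1, 0.6, -2.0],
]
codes = [to_hash_code(e) for e in embeddings]
balance = bit_balance(codes)
print(codes)
print(balance)

# Many different embeddings collapse onto the same code, so a code alone
# does not determine the embedding that produced it.
same_code = to_hash_code([0.7, -1.2, 0.1]) == to_hash_code([5.0, -0.01, 3.3])
```

Binarization discards the magnitudes, which is the source of both the irreversibility and the information loss the abstract mentions. Balanced bits are generally preferred in hashing because each bit then carries as much information as it can.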