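Face-based affective computing consists of detecting emotions from face images. It is useful for better automatic understanding of human behaviour and could pave the way toward improved human-machine interaction. However, it raises the challenge of designing a computational representation of emotion. So far, emotions have been represented either continuously in the 2D valence/arousal space or discretely with Ekman's 7 basic emotions. Alternatively, Ekman's Facial Action Unit (AU) system has also been used to encode emotions using a codebook of unitary muscular activations. The ABAW3 and ABAW4 Multi-Task Challenges are the first works to provide a large-scale database annotated with these three types of labels. In this paper, we present a transformer-based multi-task method for jointly learning to predict valence/arousal, action units, and basic emotions. From an architectural standpoint, our method uses a task-wise token approach to efficiently model the similarities between the tasks. From a learning point of view, we use an uncertainty-weighted loss to model the difference in stochasticity between the three task annotations.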
translated by 谷歌翻译
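The uncertainty-weighted loss mentioned in the abstract above can be illustrated with a small sketch. The abstract does not give the exact formula, so this assumes one common form of homoscedastic uncertainty weighting, in which each task loss is scaled by `exp(-s)` and regularized by `s`, where `s` is a learned log-variance. The function name `uncertainty_weighted_loss` and the parameter `log_variances` are hypothetical.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses using learned log-variances (assumed form).

    Each task contributes exp(-s) * loss + s, so a task with a large
    log-variance s (noisier annotations) is down-weighted, while the
    additive s term keeps s from growing without bound.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("one log-variance is required per task loss")
    return sum(
        math.exp(-s) * loss + s
        for loss, s in zip(task_losses, log_variances)
    )


# Hypothetical losses for valence/arousal, action units, basic emotions.
losses = [0.8, 0.5, 1.2]
equal_weighting = uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0])
noisy_third_task = uncertainty_weighted_loss(losses, [0.0, 0.0, math.log(2.0)])
```

With all log-variances at zero the result reduces to the plain sum of the task losses. In a real training loop the log-variances would be trainable parameters optimized jointly with the network.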
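Facial Expression Recognition (FER) is crucial in many research domains because it enables machines to better understand human behaviour. FER methods face the problems of relatively small datasets and noisy data, which prevent classical networks from generalizing well. To alleviate these issues, we guide the model to concentrate on specific facial areas such as the eyes, the mouth, or the eyebrows, which we argue are decisive for recognizing facial expressions. We propose the Privileged Attribution Loss (PAL), a method that directs the model's attention toward the most salient facial regions by encouraging its attribution maps to correspond to a heatmap formed by facial landmarks. Furthermore, we introduce several channel strategies that give the model more degrees of freedom. The proposed method is independent of the backbone architecture and needs no additional semantic information at test time. Finally, experimental results show that the proposed PAL method outperforms current state-of-the-art methods on both RAF-DB and AffectNet.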
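The idea of pulling an attribution map toward a landmark heatmap can be sketched as follows. The abstract does not specify how the heatmap is built or how agreement is measured, so this assumes a Gaussian heatmap around each landmark and a negative Pearson correlation as the penalty. The names `landmark_heatmap` and `negative_correlation` are hypothetical.

```python
import math


def landmark_heatmap(height, width, landmarks, sigma=1.5):
    """Build a heatmap with a Gaussian bump at each (row, col) landmark."""
    return [
        [
            max(
                (
                    math.exp(-((r - lr) ** 2 + (c - lc) ** 2) / (2 * sigma ** 2))
                    for lr, lc in landmarks
                ),
                default=0.0,
            )
            for c in range(width)
        ]
        for r in range(height)
    ]


def negative_correlation(attribution, heatmap):
    """Return minus the Pearson correlation of two equally sized 2D maps.

    Minimizing this value pushes the attribution map to be high where
    the landmark heatmap is high (assumed stand-in for the actual loss).
    """
    a = [v for row in attribution for v in row]
    h = [v for row in heatmap for v in row]
    mean_a, mean_h = sum(a) / len(a), sum(h) / len(h)
    cov = sum((x - mean_a) * (y - mean_h) for x, y in zip(a, h))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_h = math.sqrt(sum((y - mean_h) ** 2 for y in h))
    if norm_a == 0.0 or norm_h == 0.0:
        return 0.0  # a constant map carries no spatial preference
    return -cov / (norm_a * norm_h)


# Hypothetical 8x8 map with two landmarks (e.g. an eye and a mouth corner).
heat = landmark_heatmap(8, 8, [(2, 2), (5, 5)])
aligned_penalty = negative_correlation(heat, heat)
inverted = [[1.0 - v for v in row] for row in heat]
misaligned_penalty = negative_correlation(inverted, heat)
```

An attribution map that matches the heatmap receives the lowest penalty, while one that highlights the opposite regions receives the highest. Because the heatmap is only needed to compute the training loss, this is consistent with the abstract's point that no landmark information is required at test time.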
We propose a novel antialiasing method to increase shift invariance in convolutional neural networks (CNNs). More precisely, we replace the conventional combination "real-valued convolutions + max pooling" ($\mathbb R$Max) by "complex-valued convolutions + modulus" ($\mathbb C$Mod), which produces stable feature representations for band-pass filters with well-defined orientations. In a recent work, we proved that, for such filters, the two operators yield similar outputs. Therefore, $\mathbb C$Mod can be viewed as a stable alternative to $\mathbb R$Max. To separate band-pass filters from other freely trained kernels, in this paper we designed a "twin" architecture based on the dual-tree complex wavelet packet transform, which generates outputs similar to those of standard CNNs with fewer trainable parameters. In addition to improving stability to small shifts, our experiments on AlexNet and ResNet showed increased prediction accuracy on natural image datasets such as ImageNet and CIFAR10. Furthermore, our approach outperformed recent antialiasing methods based on low-pass filtering by preserving high-frequency information, while reducing memory usage.
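The stability claim for $\mathbb C$Mod versus $\mathbb R$Max can be illustrated with a 1D toy example. This is only a sketch under simplifying assumptions: a single hand-built Gabor filter applied to a pure sinusoid, not the dual-tree wavelet packet architecture of the paper. The constants `OMEGA`, `SIGMA`, and `HALF_WIDTH` and all function names are hypothetical.

```python
import cmath
import math

OMEGA = 3 * math.pi / 4  # centre frequency of the band-pass filter
SIGMA = 3.0              # width of the Gaussian envelope
HALF_WIDTH = 12          # filter support is [-HALF_WIDTH, HALF_WIDTH]


def gabor(k):
    """Complex Gabor filter tap at offset k."""
    return math.exp(-k * k / (2 * SIGMA ** 2)) * cmath.exp(1j * OMEGA * k)


def complex_response(n, shift):
    """Convolve the filter with a sinusoid shifted by `shift` samples."""
    return sum(
        gabor(k) * math.cos(OMEGA * ((n - k) - shift))
        for k in range(-HALF_WIDTH, HALF_WIDTH + 1)
    )


def cmod(n, shift):
    """Complex convolution followed by modulus."""
    return abs(complex_response(n, shift))


def rmax(n, shift):
    """Real convolution followed by max pooling over two samples."""
    return max(complex_response(n, shift).real,
               complex_response(n + 1, shift).real)


def spread(values):
    return max(values) - min(values)


shifts = range(8)
cmod_spread = spread([cmod(20, s) for s in shifts])
rmax_spread = spread([rmax(20, s) for s in shifts])
```

In this toy setting the modulus output barely changes as the input is shifted, whereas the max-pooled real output varies noticeably because its value depends on where the pooling window falls relative to the oscillation.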
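The leap in performance of state-of-the-art computer vision methods is attributed to the development of deep neural networks (DNNs). However, it often comes at a computational cost that may hinder their deployment. To alleviate this limitation, structured pruning is a well-known technique that consists of removing channels, neurons, or filters, and is commonly applied to produce more compact models. In most cases, the computations to remove are selected based on a relative importance criterion. At the same time, the need for explainable predictive models has risen tremendously and has motivated the development of robust attribution methods that highlight the relative importance of the pixels of an input image or feature map. In this work, we discuss the limitations of existing pruning heuristics, including magnitude-based and gradient-based methods. We draw inspiration from attribution methods to design a novel integrated gradient pruning criterion, in which the relevance of each neuron is defined as the integral of the gradient variation on a path toward this neuron's removal. Furthermore, we propose an entwined DNN pruning and fine-tuning flowchart to better preserve DNN accuracy while removing parameters. We show through extensive validation on several datasets, architectures, and pruning scenarios that the proposed method, dubbed SInGE, significantly outperforms existing state-of-the-art DNN pruning methods.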
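The integrated gradient pruning criterion described above can be sketched on a toy one-hidden-layer network. This is one plausible reading of the abstract, not the paper's exact formulation: each neuron's output is multiplied by a scale `mu` that moves from 1 (kept) to 0 (removed), and the score is the integral of the absolute loss gradient with respect to `mu` along that path. All names and data here are hypothetical.

```python
def relu(value):
    return value if value > 0.0 else 0.0


def grad_wrt_scale(j, mu, inputs, targets, in_weights, out_weights):
    """d(MSE)/d(mu) when neuron j's output is scaled by mu (others by 1)."""
    total = 0.0
    for x, t in zip(inputs, targets):
        hidden = [relu(w * x) for w in in_weights]
        pred = sum(
            (mu if i == j else 1.0) * a * h
            for i, (a, h) in enumerate(zip(out_weights, hidden))
        )
        total += 2.0 * (pred - t) * out_weights[j] * hidden[j]
    return total / len(inputs)


def integrated_gradient_score(j, inputs, targets, in_weights, out_weights,
                              steps=50):
    """Midpoint Riemann sum of |dL/dmu| for mu going from 1 down to 0."""
    score = 0.0
    for s in range(steps):
        mu = 1.0 - (s + 0.5) / steps
        grad = grad_wrt_scale(j, mu, inputs, targets, in_weights, out_weights)
        score += abs(grad) / steps
    return score


# Toy model that fits its data exactly: pred = (2.0 + 0.1 + 0.0) * x.
inputs = [1.0, 2.0, 3.0]
in_weights = [1.0, 1.0, 1.0]
out_weights = [2.0, 0.1, 0.0]
targets = [2.1 * x for x in inputs]

scores = [
    integrated_gradient_score(j, inputs, targets, in_weights, out_weights)
    for j in range(3)
]
plain_gradients = [
    abs(grad_wrt_scale(j, 1.0, inputs, targets, in_weights, out_weights))
    for j in range(3)
]
```

Because the toy model is at a loss minimum, the plain gradient at `mu = 1` is essentially zero for every neuron and cannot rank them, which illustrates the kind of limitation the abstract attributes to gradient-based heuristics. The integrated score still separates the neuron that carries most of the prediction from the one that contributes nothing.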
Action Unit (AU) detection is the branch of affective computing that aims at recognizing unitary facial muscular movements. It is key to unlocking unbiased computational face representations and has therefore aroused great interest in the past few years. One of the main obstacles to building efficient deep-learning-based AU detection systems is the lack of large facial image databases annotated by AU experts. In that respect, the ABAW challenge paves the way toward better AU detection, as it involves a 2M-frame AU-annotated dataset. In this paper, we present our submission to the ABAW3 challenge. In a nutshell, we applied a multi-label detection transformer that leverages multi-head attention to learn which part of the face image is the most relevant for predicting each AU.
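The mechanism by which each AU learns where to look can be illustrated with a minimal cross-attention sketch, in which one query vector per AU attends over image patch features. This is a simplified single-head version with hand-picked vectors, assumed for illustration only; the submission itself uses multi-head attention with learned parameters. The names `cross_attention`, `au_queries`, and `patches` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_attention(queries, patches):
    """Scaled dot-product attention from label queries to patch features.

    Returns the attention weights (one row per query) and the pooled
    feature vector that each query gathers from the patches.
    """
    dim = len(patches[0])
    scale = math.sqrt(dim)
    weights = [
        softmax([dot(q, p) / scale for p in patches]) for q in queries
    ]
    pooled = [
        [sum(w * p[d] for w, p in zip(row, patches)) for d in range(dim)]
        for row in weights
    ]
    return weights, pooled


# Three hypothetical patch features and two AU queries.
patches = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
au_queries = [[1.0, 0.0], [0.0, 1.0]]
attention_weights, au_features = cross_attention(au_queries, patches)
```

Each row of `attention_weights` sums to one and indicates how much each patch contributes to the corresponding AU. In a full model, each pooled feature would feed a per-AU binary classifier.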