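Predicting attributes from landmark-free facial images is a challenging task in itself, and it becomes harder still when the face is occluded by a mask. Smart access-control gates that rely on identity verification, and secure login to personal electronic gadgets, can use the face as a biometric trait. The Covid-19 pandemic in particular has increasingly confirmed the need for hygienic, contactless identity verification. In such settings mask usage becomes unavoidable, and attribute prediction helps to separate targeted vulnerable groups from community spread or to ensure their social distancing in a collaborative environment. We create a masked face dataset by efficiently overlaying masks of different shapes, sizes, and textures to model the variability introduced by wearing a mask. This paper proposes a deep multi-task learning (MTL) approach that jointly estimates various heterogeneous attributes from a single masked facial image. Experimental results on the benchmark face attribute dataset UTKFace show that the proposed approach outperforms competing techniques.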
In recent years, both the quality of the medical images used to facilitate diagnosis and the performance of machine learning techniques on tasks such as classification, detection, and segmentation have improved significantly. As a result, the use of such systems in the healthcare industry has increased rapidly, for instance in medical image classification systems, where these models have achieved diagnostic parity with human physicians. One such application is the classification of skin lesions in dermatoscopic images. However, as stakeholders in the healthcare industry, such as insurance companies, continue to invest extensively in machine learning infrastructure, it becomes increasingly important to understand the vulnerabilities of such systems. Because the tasks carried out by these models are highly critical, it is necessary to analyze both the techniques that could exploit these vulnerabilities and the methods to defend against them. This paper explores common adversarial attack techniques. The Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) are used against a Convolutional Neural Network trained to classify dermatoscopic images of skin lesions. The paper then discusses one of the most popular adversarial defense techniques, adversarial training. The model trained on adversarial examples is tested against the same attacks, and recommendations for improving neural network robustness are given based on the experimental results.
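The two attacks named in the abstract above can be illustrated on a toy model. The sketch below is not the paper's setup: it assumes a two-feature logistic classifier instead of a CNN, inputs clipped to [0, 1], and hypothetical names (`predict`, `input_gradient`, `fgsm`, `pgd`) and parameter values chosen only for illustration.

```python
import math


def predict(w, b, x):
    """Probability of class 1 under a logistic model (toy stand-in for a CNN)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    return min(hi, max(lo, v))


def fgsm(w, b, x, y, eps):
    """Single step of size eps along the sign of the input gradient."""
    g = input_gradient(w, b, x, y)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]


def pgd(w, b, x, y, eps, alpha, steps):
    """Repeated signed-gradient steps, projected back into the eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection onto the L-infinity ball of radius eps, then onto [0, 1].
        x_adv = [clip(xa, x0 - eps, x0 + eps) for xa, x0 in zip(x_adv, x)]
        x_adv = [clip(xa, 0.0, 1.0) for xa in x_adv]
    return x_adv
```

Adversarial training, in this picture, would amount to generating such perturbed inputs during training and fitting the model on them with their original labels.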
Datacenter operators ensure fair and regular server maintenance by using automated processes to schedule maintenance jobs so that they complete within a strict time budget. Automating this scheduling problem is challenging because maintenance job duration varies with both job type and hardware. While it is tempting to use prior machine learning techniques to predict job duration, we find that the structure of the maintenance job scheduling problem creates a unique challenge. In particular, we show that the prior machine learning methods with the lowest prediction error do not produce the best scheduling outcomes, because the costs are asymmetric. Specifically, underpredicting maintenance job duration results in more servers being taken offline and longer server downtime than overpredicting it, so the system cost of underprediction is much larger than that of overprediction. We present Acela, a machine learning system for predicting maintenance job duration, which uses quantile regression to bias duration predictions toward overprediction. We integrate Acela into a maintenance job scheduler and evaluate it on datasets from large-scale production datacenters. Compared to machine learning based predictors from prior work, Acela reduces the number of servers that are taken offline by 1.87-4.28X, and reduces the server offline time by 1.40-2.80X.
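The asymmetric cost described above is what the quantile (pinball) loss encodes. The following is a minimal sketch of that loss and of an empirical quantile, not Acela's actual model; the function names, the quantile level 0.9, and the sample durations are assumptions made for illustration.

```python
import math


def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss of one prediction at quantile level q in (0, 1).

    Underprediction (y_pred < y_true) is weighted by q, overprediction by 1 - q,
    so a high q makes underprediction the more expensive error.
    """
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1.0) * diff


def empirical_quantile(durations, q):
    """Smallest observed duration with at least a fraction q of samples at or below it."""
    ordered = sorted(durations)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


def mean_pinball_loss(durations, prediction, q):
    """Average pinball loss of a single constant prediction over all samples."""
    return sum(pinball_loss(d, prediction, q) for d in durations) / len(durations)
```

With hypothetical durations of 30, 45, 50, 60, and 120 minutes, the 0.9 quantile selects 120 while the median selects 50, which shows how a high quantile level pushes the estimate toward overprediction.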
We study the expressibility and learnability of convex optimization solution functions and their multi-layer architectural extension. The main results are: \emph{(1)} the class of solution functions of linear programming (LP) and quadratic programming (QP) is a universal approximant for the $C^k$ smooth model class or some restricted Sobolev space, and we characterize the rate-distortion, \emph{(2)} the approximation power is investigated through a viewpoint of regression error, where information about the target function is provided in terms of data observations, \emph{(3)} compositionality in the form of a deep architecture with optimization as a layer is shown to reconstruct some basic functions used in numerical analysis without error, which implies that \emph{(4)} a substantial reduction in rate-distortion can be achieved with a universal network architecture, and \emph{(5)} we discuss the statistical bounds of empirical covering numbers for LP/QP, as well as a generic optimization problem (possibly nonconvex) by exploiting tame geometry. Our results provide the \emph{first rigorous analysis of the approximation and learning-theoretic properties of solution functions} with implications for algorithmic design and performance guarantees.
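As a small illustration of a "solution function", consider a standard textbook fact that is not taken from the abstract: the ReLU function is the solution map of the one-dimensional QP $\min_{x \ge 0} \tfrac{1}{2}(x - \theta)^2$, viewed as a function of the parameter $\theta$. The sketch below approximates that solution with projected gradient descent; the name `qp_solution`, the step size, and the iteration count are assumptions.

```python
def qp_solution(theta, steps=200, lr=0.5):
    """Approximate argmin over x >= 0 of 0.5 * (x - theta)**2.

    Uses projected gradient descent: a gradient step on the objective,
    followed by projection onto the feasible set x >= 0.
    """
    x = 0.0
    for _ in range(steps):
        x = max(0.0, x - lr * (x - theta))
    return x


def relu(theta):
    """Closed-form solution of the same QP."""
    return max(0.0, theta)
```

For any `theta`, `qp_solution(theta)` should agree with `relu(theta)` up to numerical tolerance, which is the sense in which an optimization layer can reproduce a basic function.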
Recovery of true color from underwater images is an ill-posed problem. This is because the wide-band attenuation coefficients for the RGB color channels depend on object range, reflectance, and other factors that are difficult to model. In addition, suspended particles in the water cause backscattering. As a result, most existing deep-learning based color restoration methods, which are trained on synthetic underwater datasets, do not perform well on real underwater data, because synthetic data cannot accurately represent real conditions. To address this issue, we use an image-to-image translation network to bridge the gap between the synthetic and real domains by translating images from the synthetic underwater domain to the real underwater domain. Using this multimodal domain adaptation technique, we create a dataset that captures a diverse array of underwater conditions. We then train a simple but effective CNN-based network on our domain-adapted dataset to perform color restoration. Code and pre-trained models can be accessed at https://github.com/nehamjain10/TRUDGCR
As multimodal learning finds applications in a wide variety of high-stakes societal tasks, investigating the robustness of multimodal models becomes important. Existing work has focused on understanding the robustness of vision-and-language models to imperceptible variations on benchmark tasks. In this work, we investigate the robustness of multimodal classifiers to cross-modal dilutions, a plausible variation. We develop a model that, given a multimodal (image + text) input, generates additional dilution text that (a) maintains relevance and topical coherence with the image and existing text, and (b) when added to the original text, leads to misclassification of the multimodal input. Via experiments on Crisis Humanitarianism and Sentiment Detection tasks, we find that the performance of task-specific fusion-based multimodal classifiers drops by 23.3% and 22.5%, respectively, in the presence of dilutions generated by our model. Metric-based comparisons with several baselines and human evaluations indicate that our dilutions show higher relevance and topical coherence, while simultaneously being more effective at demonstrating the brittleness of the multimodal classifiers. Our work aims to highlight and encourage further research on the robustness of deep multimodal models to realistic variations, especially in human-facing societal applications. The code and other resources are available at https://claws-lab.github.io/multimodal-robustness/.
Spiking Neural Networks (SNNs) are bio-plausible models that hold great potential for realizing energy-efficient implementations of sequential tasks on resource-constrained edge devices. However, commercial edge platforms based on standard GPUs are not optimized to deploy SNNs, resulting in high energy consumption and latency. While analog In-Memory Computing (IMC) platforms can serve as energy-efficient inference engines, they are burdened by the immense energy, latency, and area requirements of high-precision ADCs (HP-ADC), which overshadow the benefits of in-memory computation. We propose a hardware/software co-design methodology to deploy SNNs on an ADC-Less IMC architecture that uses sense amplifiers as 1-bit ADCs in place of conventional HP-ADCs, alleviating the above issues. Our proposed framework incurs minimal accuracy degradation by performing hardware-aware training and is able to scale beyond simple image classification tasks to more complex sequential regression tasks. Experiments on the complex tasks of optical flow estimation and gesture recognition show that progressively increasing the hardware awareness during SNN training allows the model to adapt to and learn the errors caused by the non-idealities of ADC-Less IMC. In addition, compared to HP-ADC IMC, the proposed ADC-Less IMC offers significant energy and latency improvements of $2-7\times$ and $8.9-24.6\times$, respectively, depending on the SNN model and the workload.
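Decentralized learning algorithms enable deep learning models to be trained on large distributed datasets generated at different devices and locations, without the need for a central server. In practical scenarios, the distributed datasets can have significantly different data distributions across agents. Current state-of-the-art decentralized algorithms mostly assume that the data distributions are independent and identically distributed (IID). This paper focuses on improving decentralized learning over non-IID data distributions with minimal compute and memory overhead. We propose Neighborhood Gradient Clustering (NGC), a novel decentralized learning algorithm that modifies the local gradient of each agent using self- and cross-gradient information. In particular, the proposed method replaces the local gradient of the model with the weighted mean of the self-gradients, the model-variant cross-gradients (derivatives of the received neighbors' model parameters with respect to the local dataset), and the data-variant cross-gradients (derivatives of the local model with respect to its neighbors' datasets). Further, we present CompNGC, a compressed version of NGC that reduces the communication overhead by $32\times$ by compressing the cross-gradients. We demonstrate the empirical convergence and efficiency of the proposed technique on non-IID data distributions across various model architectures and graph topologies. Our experiments show that NGC and CompNGC outperform the existing state-of-the-art (SoTA) decentralized learning algorithm on non-IID data by $1-5\%$ with significantly lower compute and memory requirements. Further, we show that the proposed NGC method achieves gains of $5-40\%$ with no additional communication.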
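The three gradient types named in the abstract above can be made concrete with a toy example. The sketch below is not the authors' implementation: it assumes a linear model with mean squared error, uniform weights in the mean, and a single process that holds every agent's data, whereas a real decentralized run would exchange parameters and gradients between neighbors. All function names are hypothetical.

```python
def mse_gradient(params, data):
    """Gradient of the mean squared error of a linear model y ~ params . x."""
    grad = [0.0] * len(params)
    for x, y in data:
        err = sum(p * xi for p, xi in zip(params, x)) - y
        for k, xi in enumerate(x):
            grad[k] += 2.0 * err * xi / len(data)
    return grad


def combined_gradient(local_params, local_data, neighbor_params, neighbor_data):
    """Uniformly weighted mean of self- and cross-gradients for one agent."""
    # Self-gradient: local model evaluated on local data.
    self_grad = mse_gradient(local_params, local_data)
    # Model-variant cross-gradients: neighbors' models evaluated on local data.
    model_variant = [mse_gradient(p, local_data) for p in neighbor_params]
    # Data-variant cross-gradients: local model evaluated on neighbors' data.
    data_variant = [mse_gradient(local_params, d) for d in neighbor_data]
    grads = [self_grad] + model_variant + data_variant
    dim = len(self_grad)
    return [sum(g[k] for g in grads) / len(grads) for k in range(dim)]
```

The agent would then take its optimizer step with the returned vector in place of `self_grad`. The weighting scheme and the compression used by CompNGC are not specified in the abstract and are left out here.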
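Cables are commonplace in homes, hospitals, and industrial warehouses, and they are prone to tangling. This paper extends prior work on autonomously untangling long cables by introducing novel uncertainty quantification metrics and actions that interact with the cable to reduce perception uncertainty. We present Sliding and Grasping for Tangle Manipulation 2.0 (SGTM 2.0), a system that autonomously untangles cables approximately 3 meters long with a bilateral robot, using estimates of uncertainty at each step to inform its actions. By interactively reducing uncertainty, SGTM 2.0 reduces the number of state-resetting moves it must take, which significantly speeds up run time. Experiments suggest that SGTM 2.0 can achieve 83% untangling success on cables with 1 or 2 overhand and figure-8 knots, and 70% termination detection success across these configurations, outperforming SGTM 1.0 by 43% in untangling accuracy and 200% in full rollout speed. Supplementary material, visualizations, and videos can be found at sites.google.com/view/sgtm2.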
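Deep ensembles have been shown to extend the positive effects seen in typical ensemble learning to neural networks and to reinforcement learning (RL). However, much remains to be done to improve the efficiency of such ensemble models. In this work, we present Diverse Ensembles for Fast Transfer in RL (DEFT), a new ensemble-based method for reinforcement learning in highly multimodal environments with improved transfer to unseen environments. The algorithm has two main phases: training of the ensemble members, and synthesis (or fine-tuning) of the ensemble members into a policy that works in a new environment. The first phase trains regular policy gradient or actor-critic agents in parallel, but adds a term to the loss that encourages these policies to differ from one another. This causes the individual unimodal agents to explore the space of optimal policies and to capture more of the multimodality of the environment than a single actor could. The second phase of DEFT synthesizes the component policies into a new policy that works well in a modified environment, in one of two ways. To evaluate DEFT, we start from a base version of the Proximal Policy Optimization (PPO) algorithm and extend it with the modifications for DEFT. Our results show that the pretraining phase is effective at producing diverse policies in multimodal environments. DEFT often converges to a high reward significantly faster than alternatives such as random initialization without DEFT and fine-tuning of ensemble members. While there is certainly more work to be done to analyze DEFT theoretically and to make it more robust, we believe it provides a strong framework for capturing multimodality in environments while still using RL methods with simple policy representations.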
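The abstract above does not say which loss term is used to push ensemble members apart. As one possible reading, the sketch below assumes a mean pairwise KL divergence between the members' action distributions at a given state, subtracted from the base policy loss with a hypothetical coefficient `beta`. It assumes discrete action distributions with strictly positive probabilities; all names are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_pairwise_kl(policies):
    """Average KL divergence over all ordered pairs of distinct ensemble members."""
    n = len(policies)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += kl_divergence(policies[i], policies[j])
    return total / (n * (n - 1))


def diversity_regularized_loss(base_loss, policies, beta):
    """Base policy loss minus a diversity bonus, so that differing policies lower the loss."""
    return base_loss - beta * mean_pairwise_kl(policies)
```

When all members output the same distribution the bonus is zero and the loss equals `base_loss`; the more the distributions differ, the lower the regularized loss becomes.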