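Deep learning is seeing increasingly rapid adoption in healthcare to help improve patient outcomes. In medical image analysis, extensive training is required to gain the expertise needed to become a trusted practitioner. However, although deep learning techniques continue to deliver state-of-the-art predictive performance, one of the main challenges hindering this progress in healthcare is the opaque nature of these models' inference mechanisms. Attribution therefore plays a vital role in building stakeholders' confidence in the predictions that deep learning models make for clinical decisions. This work seeks to answer the question: what do deep neural network models learn from medical images? In that light, we propose a novel attribution framework using adaptive path-based gradient integration techniques. The results show promise for improving healthcare outcomes by building domain experts' trust: the framework lets them understand the input-prediction correlative structures, discover new biomarkers, and reveal potential model biases.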
translated by Google Translate
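As a rough illustration of what path-based gradient integration means, the sketch below computes attributions along a fixed straight-line path from a baseline to the input. This is the plain integrated-gradients scheme, not the adaptive path the abstract refers to, which is not described here. The toy model, its weights, and all function names are assumptions made for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical toy model: f(x) = sum_i w_i * x_i^2, with an analytic gradient.
WEIGHTS = [0.5, -1.0, 2.0]


def toy_model(x: Sequence[float]) -> float:
    return sum(w * xi * xi for w, xi in zip(WEIGHTS, x))


def toy_model_grad(x: Sequence[float]) -> List[float]:
    return [2.0 * w * xi for w, xi in zip(WEIGHTS, x)]


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> List[float]:
    """Attribute f(x) - f(baseline) to each input feature.

    Gradients are averaged at `steps` midpoints of the straight line from
    `baseline` to `x`, then scaled by the per-feature displacement.
    """
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sum = [0.0] * len(diff)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule on [0, 1]
        point = [bi + alpha * d for bi, d in zip(baseline, diff)]
        grad_sum = [s + g for s, g in zip(grad_sum, grad_fn(point))]
    return [d * s / steps for d, s in zip(diff, grad_sum)]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_model_grad, x, baseline)
```

A useful sanity check for any path-integral attribution is completeness: the attributions should sum to the difference between the model output at the input and at the baseline.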
We present a novel image inversion framework and a training pipeline to achieve high-fidelity image inversion with high-quality attribute editing. Inverting real images into StyleGAN's latent space is an extensively studied problem, yet the trade-off between image reconstruction fidelity and image editing quality remains an open challenge. Low-rate latent spaces are limited in their expressive power for high-fidelity reconstruction. On the other hand, high-rate latent spaces result in degraded editing quality. In this work, to achieve high-fidelity inversion, we learn residual features in higher latent codes that lower latent codes were not able to encode. This enables preserving image details in reconstruction. To achieve high-quality editing, we learn how to transform the residual features to adapt to manipulations in latent codes. We train the framework to extract residual features and transform them via a novel architecture pipeline and cycle consistency losses. We run extensive experiments and compare our method with state-of-the-art inversion methods. Qualitative metrics and visual comparisons show significant improvements. Code: https://github.com/hamzapehlivan/StyleRes
Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent years. This achievement can be ascribed in part to advances in AI subfields including Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP). Deep learning, a subfield of machine learning that employs artificial neural network concepts, has enabled the most rapid growth in these domains. As a result, the integration of vision and language has attracted a lot of attention. The tasks have been created in such a way that they properly exemplify the concepts of deep learning. In this review paper, we provide a thorough and extensive review of state-of-the-art approaches and key model design principles, and we discuss existing datasets, methods, problem formulations, and evaluation measures for VQA and visual reasoning tasks in order to understand vision and language representation learning. We also present some potential future paths in this field of research, with the hope that our study may generate new ideas and novel approaches to handle existing difficulties and develop new applications.
Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
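Emotion classification from EEG signals has seen many advances. However, problems such as a lack of data and the difficulty of learning important features and patterns have always left room for improvement in both computation and prediction accuracy. This work analyzes the performance of baseline machine learning classifiers on the DEAP dataset, together with a tabular learning approach that delivers results comparable to the state of the art, drawing its performance gains from a deep learning architecture without deploying heavy neural networks.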
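Widely used deep learning models have been found to have poor robustness. A small amount of noise can fool state-of-the-art models into making wrong predictions. Although there are many high-performance attack generation methods, most of them add perturbations directly to the original data and measure them with the L_p norm; this can destroy the main structure of the data and thus produce invalid attacks. In this paper, we propose a black-box attack that, instead of modifying the original data, modifies latent features of the data extracted by an autoencoder; we then measure the noise in semantic space to preserve the semantics of the data. We trained autoencoders on the MNIST and CIFAR-10 datasets and used a genetic algorithm to find the optimal adversarial perturbations. Our approach achieved a 100% attack success rate on the first 100 samples of the MNIST and CIFAR-10 datasets with smaller perturbations.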
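The idea of searching for a small latent-space perturbation with a genetic algorithm can be sketched on a toy problem. Everything below is an assumption made for illustration: the identity `decode` stands in for a trained autoencoder's decoder, `black_box_logit` stands in for the attacked model, the noise is measured with a plain Euclidean norm in latent space, and the selection, crossover, and mutation choices are generic rather than the ones used in the paper.

```python
import math
import random


def decode(z):
    # Toy stand-in for an autoencoder decoder (identity map).
    return list(z)


def black_box_logit(x):
    # Toy stand-in for the attacked model; only its output is queried.
    return x[0] + x[1]


def predict(x):
    return 1 if black_box_logit(x) > 0 else 0


def fitness(z, delta, true_label):
    """Higher is better: fool the model first, then shrink the perturbation."""
    x_adv = decode([zi + di for zi, di in zip(z, delta)])
    norm = math.sqrt(sum(d * d for d in delta))
    if predict(x_adv) != true_label:
        return -norm
    # Not fooled yet: heavy penalty, reduced as we approach the boundary.
    return -10.0 - abs(black_box_logit(x_adv)) - norm


def genetic_attack(z, true_label, pop_size=30, generations=40, sigma=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 0.5) for _ in z] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda d: fitness(z, d, true_label), reverse=True)
        parents = population[: pop_size // 2]  # survivors are kept (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Averaging crossover followed by Gaussian mutation.
            children.append(
                [(ai + bi) / 2.0 + rng.gauss(0.0, sigma) for ai, bi in zip(a, b)]
            )
        population = parents + children
    return max(population, key=lambda d: fitness(z, d, true_label))


z0 = [-0.5, -0.3]
label0 = predict(decode(z0))
best_delta = genetic_attack(z0, label0)
x_adv = decode([zi + di for zi, di in zip(z0, best_delta)])
```

Because the perturbation is applied before decoding, the adversarial sample is always an output of the decoder, which is the property the abstract relies on to keep the data's structure intact.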
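Drilling a hole in a curved surface by hand is prone to failure, and can cause worker injury and fatigue, because of the difficulty of aligning the drill and the inherent instability of the task. On the other hand, fully automating such a task in a real manufacturing environment may be impractical, since parts arriving at the assembly line can have a variety of complex shapes on which drill locations are not easily accessible, making automated path planning difficult. In this work, an adaptive admittance controller with 6 degrees of freedom is developed and deployed on a KUKA LBR iiwa 7 cobot, so that the operator can comfortably manipulate a drill mounted on the robot with one hand and open holes in a curved surface, aided by haptic guidance from the cobot and visual guidance provided through an AR interface. Real-time adaptation of the admittance damping provides higher transparency while the robot is driven in free space and ensures stability during drilling. After the user brings the drill sufficiently close to the drill target and roughly aligns it with the desired drilling angle, the haptic guidance module first fine-tunes the alignment and then constrains the user's motion to the drilling axis only; the operator then simply pushes the drill into the workpiece with minimal effort. Two sets of experiments were conducted to investigate the potential benefits of the haptic guidance module quantitatively (Experiment I) and the practical value of the proposed pHRI system for real manufacturing settings based on participants' subjective opinions (Experiment II).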
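The effect of adapting the admittance damping can be seen in a one-degree-of-freedom sketch of the admittance law m * dv/dt + c * v = F_ext, integrated with explicit Euler steps. The mass, the two damping values, and the binary `drilling` switch are illustrative assumptions; the controller in the paper has 6 degrees of freedom and its adaptation rule is not given in the abstract.

```python
def admittance_step(v, f_ext, mass, damping, dt):
    """One explicit Euler step of m * dv/dt + c * v = f_ext."""
    accel = (f_ext - damping * v) / mass
    return v + accel * dt


def choose_damping(drilling, c_free=5.0, c_drill=50.0):
    # Hypothetical rule: low damping in free space (transparent motion),
    # high damping while drilling (stable contact).
    return c_drill if drilling else c_free


def simulate(f_ext, drilling, mass=2.0, dt=0.001, duration=5.0):
    """Return the commanded velocity after `duration` seconds of constant force."""
    v = 0.0
    for _ in range(int(round(duration / dt))):
        v = admittance_step(v, f_ext, mass, choose_damping(drilling), dt)
    return v


v_free = simulate(f_ext=10.0, drilling=False)
v_drill = simulate(f_ext=10.0, drilling=True)
```

For a constant push the velocity settles at F_ext / c, so the same 10 N moves the tool ten times faster in free space than during drilling in this toy setup.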
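A scanning pixel camera is a novel low-cost, low-power sensor that is not diffraction limited. It produces data as a sequence of samples extracted from various parts of the scene during the course of a scan. It can provide very detailed images at the expense of sampling rate and image acquisition time. This paper proposes a new algorithm that allows the sensor to adapt the amount of sampling over the course of this sequence. This makes it possible to overcome these limitations by minimizing the bandwidth and time required to image and transmit a scene, while maintaining image quality. We examine applications to image classification and semantic segmentation and are able to achieve results similar to those with a fully sampled input while using 80% fewer samples.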
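Closed-loop reservoir management (CLRM), in which history matching and production optimization are performed multiple times over the life of an asset, can provide significant improvement in a specified objective. These procedures are computationally expensive because of the large number of flow simulations required for data assimilation and optimization. Existing CLRM procedures are applied asset by asset, without using information that could be useful across a range of assets. Here, we develop a CLRM framework for multiple assets with varying numbers of wells. We use deep reinforcement learning to train a single global control policy that is applicable to all assets. The new framework is an extension of a recently introduced control policy methodology for individual assets. Embedding layers are incorporated into the representation to handle the different numbers of decision variables that arise for the different assets. Because the global control policy learns a unified representation of useful features from multiple assets, it is less expensive to construct than asset-by-asset training (we observe about a 3x speedup in our examples). The production optimization problem includes a relative-change constraint on the well settings, which renders the results suitable for practical use. We apply the multi-asset CLRM framework to 2D and 3D water-flooding examples. In both cases, four assets with different well counts, well configurations, and geostatistical descriptions are considered. Numerical experiments show that the global control policy provides objective function values, for both the 2D and 3D cases, that are nearly identical to those from control policies trained individually for each asset. This promising finding suggests that multi-asset CLRM may indeed represent a viable practical strategy.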
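We propose VecGAN, an image-to-image translation framework for facial attribute editing with interpretable latent directions. The facial attribute editing task faces the challenges of precise attribute editing with controllable strength and of preserving the other attributes of an image. To this end, we design attribute editing via latent space factorization: for each attribute, we learn a linear direction that is orthogonal to the others. The other component is the controllable strength of the change, a scalar value. In our framework, this scalar can be either sampled or encoded from a reference image by projection. Our work is inspired by latent space factorization works on fixed pretrained GANs. However, while those models cannot be trained end to end and struggle to edit encoded images precisely, VecGAN is trained end to end for the image translation task and edits an attribute successfully while preserving the others. Our extensive experiments show that VecGAN achieves significant improvements over the state of the art for both local and global edits.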
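The role of orthogonal linear directions and a scalar strength can be illustrated with a few lines of vector arithmetic. This is only a sketch under assumptions: the three-dimensional latent codes, the two hand-picked directions, and the exact update rule (shift the code along one direction until its projection reaches the target strength) are hypothetical and are not taken from the VecGAN implementation.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def project(code, direction):
    """Scalar strength of `direction` present in a latent code."""
    return dot(code, direction) / dot(direction, direction)


def edit(code, direction, target_strength):
    """Move `code` along `direction` until its strength equals the target."""
    shift = target_strength - project(code, direction)
    return [ci + shift * di for ci, di in zip(code, direction)]


# Hypothetical attribute directions, chosen to be mutually orthogonal.
d_smile = [1.0, 1.0, 0.0]
d_hair = [1.0, -1.0, 0.0]

source_code = [0.2, -0.7, 1.5]
reference_code = [0.9, 0.1, 0.0]

# Strength encoded from a reference image by projection, then applied.
strength = project(reference_code, d_smile)
edited_code = edit(source_code, d_smile, strength)
```

Because the two directions are orthogonal, changing the strength along `d_smile` leaves the projection onto `d_hair` untouched, which is the mechanism that lets one attribute be edited while the others are preserved.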