We propose an interactive editing method that allows humans to help deep neural networks (DNNs) learn a latent space more consistent with human knowledge, thereby improving classification accuracy on ambiguous, hard-to-distinguish data. First, we visualize high-dimensional data features through dimensionality reduction and design an interactive system, \textit{SpaceEditing}, to display the visualized data. \textit{SpaceEditing} provides a 2D workspace based on the idea of spatial layout, in which the user can move the projected data according to the system's guidance. \textit{SpaceEditing} then finds the high-dimensional features corresponding to the projected data moved by the user and feeds them back to the network for retraining, thereby letting the user interactively modify the high-dimensional latent space. Second, to incorporate human knowledge into the training process more rationally, we design a new loss function that enables the network to learn the user-modified information. Finally, we demonstrate how \textit{SpaceEditing} meets user needs through three case studies while evaluating our proposed method, and the results confirm its effectiveness.
translated by Google Translate
End-to-end generative methods are considered a more promising solution for image restoration in physics-based vision than traditional deconstructive methods based on handcrafted composition models. However, existing generative methods still have plenty of room for improvement in quantitative performance. More crucially, these methods are regarded as black boxes because of their weak interpretability, and there is rarely a theory that tries to explain their mechanism and learning process. In this study, we re-interpret these generative methods for image restoration tasks using information theory. Departing from the conventional understanding, we analyze the information flow of these methods and identify three sources of information (extracted high-level information, retained low-level information, and external information that is absent from the source inputs) that are involved and optimized respectively in generating the restoration results. We further derive their learning behaviors, optimization objectives, and the corresponding information boundaries by extending the information bottleneck principle. Based on this theoretical framework, we find that many existing generative methods tend to be direct applications of general models designed for conventional generation tasks, which may suffer from problems including over-invested abstraction processes, inherent loss of detail, and vanishing gradients or imbalance in training. We analyze these issues with both intuitive and theoretical explanations and support each of them with empirical evidence. Finally, we propose general solutions or ideas to address the above issues and validate these approaches with performance boosts on six datasets across three different image restoration tasks.
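For orientation, the information bottleneck principle that the abstract extends is usually stated as the objective below. This is the standard textbook form, not the extended objective derived in the paper; the symbols $X$ (input), $Y$ (target), $T$ (learned representation), and $\beta$ (trade-off weight) are notation assumed here and do not appear in the abstract.

```latex
% Standard information bottleneck objective (assumed notation):
% compress the input X into a representation T while keeping
% T informative about the target Y, under the Markov chain Y - X - T.
\[
  \min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y), \qquad \beta > 0 .
\]
```

A larger $\beta$ favors keeping information about $Y$; a smaller $\beta$ favors stronger compression of $X$.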
Scene understanding is an essential and challenging task in computer vision. Because it provides the fundamental graphical structure of an image, the scene graph has received increased attention for its powerful semantic representation. However, it is difficult to draw a proper scene graph for image retrieval, image generation, and multi-modal applications. The conventional scene graph annotation interface is not easy to use for image annotation, and automatic scene graph generation approaches using deep neural networks are prone to generating redundant content while disregarding details. In this work, we propose SGDraw, a scene graph drawing interface that uses an object-oriented scene graph representation to help users draw and edit scene graphs interactively. In the proposed object-oriented representation, we treat an object together with its attributes and relationships as a structural unit. SGDraw provides a web-based scene graph annotation and generation tool for scene understanding applications. To verify the effectiveness of the proposed interface, we conducted a comparison study with the conventional tool and a user experience study. The results show that SGDraw can help generate scene graphs with richer details and describe images more accurately than traditional bounding box annotations. We believe the proposed SGDraw can be useful in various vision tasks, such as image retrieval and generation.
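The object-oriented representation described above, in which an object carries its own attributes and relationships as one unit, can be sketched as a small data structure. This is a minimal illustration under assumptions: the class names `SceneObject`, `Relationship`, and `SceneGraph` and their methods are hypothetical and are not taken from SGDraw.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Relationship:
    predicate: str  # e.g. "riding"
    target: str     # name of the object the relationship points to


@dataclass
class SceneObject:
    """One structural unit: an object with its own attributes and relationships."""
    name: str
    attributes: List[str] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


@dataclass
class SceneGraph:
    """Hypothetical container; objects are keyed by name."""
    objects: Dict[str, SceneObject] = field(default_factory=dict)

    def add_object(self, name, attributes=()):
        self.objects[name] = SceneObject(name, list(attributes))

    def relate(self, subject, predicate, target):
        # Both endpoints must exist, so an edit cannot create a dangling edge.
        if subject not in self.objects or target not in self.objects:
            raise KeyError("both objects must be added before relating them")
        self.objects[subject].relationships.append(Relationship(predicate, target))

    def triples(self):
        # Flatten the graph to (subject, predicate, object) triples.
        return [(o.name, r.predicate, r.target)
                for o in self.objects.values() for r in o.relationships]


graph = SceneGraph()
graph.add_object("man", ["tall"])
graph.add_object("horse", ["brown"])
graph.relate("man", "riding", "horse")
```

Keeping attributes and relationships on the object means that editing or deleting one object touches a single unit, which is one plausible reason such a representation suits interactive drawing.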
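Adverse weather conditions such as haze, rain, and snow often impair the quality of captured images, causing detection networks trained on normal images to generalize poorly in these scenarios. In this paper, we raise an intriguing question: can the combination of image restoration and object detection boost the performance of cutting-edge detectors in adverse weather conditions? To answer it, we propose an effective yet unified detection paradigm, called TogetherNet, that bridges these two subtasks via dynamic enhancement learning to discern objects in adverse weather conditions. Unlike existing efforts that apply image dehazing/deraining as a pre-processing step, TogetherNet treats the problem as multi-task joint learning. Under the joint learning scheme, the clean features produced by the restoration network can be shared to learn better object detection in the detection network, which helps TogetherNet enhance its detection capacity in adverse weather conditions. Besides the joint learning architecture, we design a new Dynamic Transformer Feature Enhancement module to improve the feature extraction and representation capabilities of TogetherNet. Extensive experiments on both synthetic and real-world datasets demonstrate that TogetherNet outperforms state-of-the-art detection approaches both quantitatively and qualitatively. Source code is available at https://github.com/yz-wang/togethernet.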
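Image smoothing is a fundamental low-level vision task that aims to preserve the salient structures of an image while removing insignificant details. Deep learning has been explored in image smoothing to deal with the complex entanglement of semantic structures and trivial details. However, current methods neglect two important facts about smoothing: 1) naive pixel-level regression supervised by a limited number of high-quality smoothing ground truths may lead to domain shift and cause generalization problems on real-world images; 2) texture appearance is closely related to object semantics, so image smoothing must be aware of semantic differences in order to apply adaptive smoothing strengths. To address these issues, we propose a novel Contrastive Semantic-Guided Image Smoothing Network (CSGIS-Net) that combines a contrastive prior and a semantic prior to facilitate robust image smoothing. The supervision signal is augmented by leveraging undesired smoothing effects as negative teachers and by incorporating a segmentation task to encourage semantic distinctiveness. To realize the proposed network, we also enrich the original VOC dataset with texture enhancement and smoothing labels (namely VOC-smooth), which is the first to bridge image smoothing and semantic segmentation. Extensive experiments demonstrate that the proposed CSGIS-Net outperforms state-of-the-art algorithms by a large margin. Code and dataset are available at https://github.com/wangjie6866/csgis-net.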
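The idea of using undesired smoothing results as negative teachers can be illustrated with a simple ratio-style contrastive term. The abstract does not give the actual loss, so everything here is an assumption: the function names, the use of a mean absolute (L1) distance on flat pixel lists, and the ratio form are illustrative only and should not be read as the CSGIS-Net formulation.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def contrastive_smoothing_loss(output, positive, negatives, eps=1e-7):
    """Hypothetical contrastive term: small when the output is close to the
    desired smoothing result (positive) and far from the undesired ones
    (negatives, the "negative teachers")."""
    pull = l1_distance(output, positive)
    push = sum(l1_distance(output, n) for n in negatives) / len(negatives)
    return pull / (push + eps)


positive = [0.5, 0.5]        # desired smoothing result
negatives = [[0.0, 0.0]]     # e.g. an over-smoothed result
good = contrastive_smoothing_loss([0.4, 0.4], positive, negatives)
bad = contrastive_smoothing_loss([0.1, 0.1], positive, negatives)
```

In this toy case `bad` is larger than `good`, because the second output sits closer to the negative example than to the positive one.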
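Capturing both the local and global features of irregular point clouds is essential to 3D object detection (3OD). However, mainstream 3D detectors, e.g., VoteNet and its variants, either abandon considerable local features during pooling operations or ignore many global features in the whole scene context. This paper explores new modules that simultaneously learn local-global features of scene point clouds that serve 3OD positively. To this end, we propose an effective 3OD network based on simultaneous local-global feature learning (dubbed 3DLG-Detector). 3DLG-Detector makes two key contributions. First, it develops a Dynamic Points Interaction (DPI) module that preserves effective local features during pooling. Moreover, DPI is detachable and can be incorporated into existing 3OD networks to boost their performance. Second, it develops a Global Context Aggregation module that aggregates multi-scale features from different layers of the encoder to achieve scene context awareness. Our method shows improvements over thirteen competitors in detection accuracy and robustness on both the SUN RGB-D and ScanNet datasets. Source code will be made available upon publication.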
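There is usually a huge imbalance between the foreground points (i.e., objects) and the background points in outdoor LiDAR point clouds. It hinders cutting-edge detectors from focusing on informative areas to produce accurate 3D object detection results. This paper proposes a novel object detection network based on semantic point-voxel feature interaction, dubbed PV-RCNN++. Unlike most existing methods, PV-RCNN++ explores semantic information to enhance the quality of object detection. First, a semantic segmentation module is proposed to retain more discriminative foreground keypoints. This module guides PV-RCNN++ to integrate more object-related point-wise and voxel-wise features in the pivotal areas. Then, to make points and voxels interact efficiently, we use a voxel query based on Manhattan distance to quickly sample voxel-wise features around keypoints. Compared with the ball query, this voxel query reduces the time complexity from O(N) to O(K). Furthermore, to avoid learning only local features, an attention-based residual PointNet module is designed to expand the receptive field and adaptively aggregate the neighboring voxel-wise features into keypoints. Extensive experiments on the KITTI dataset show that PV-RCNN++ achieves 81.60$\%$, 40.18$\%$, and 68.21$\%$ 3D mAP on Car, Pedestrian, and Cyclist, respectively, which is comparable to or even better than the state of the art.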
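The voxel query mentioned above can be sketched as a hash-table lookup over neighboring voxel cells. This is a minimal sketch under assumptions, not the PV-RCNN++ implementation: the function names, the dictionary-based voxel table, and the `radius` and `max_samples` parameters are hypothetical. The point it illustrates is that the work per keypoint depends on the number of neighboring cells inspected, not on the total number of points or voxels, whereas a ball query compares the keypoint against every candidate.

```python
from itertools import product


def build_voxel_table(voxel_coords):
    """Map each non-empty voxel's integer (x, y, z) coordinate to its index."""
    return {tuple(c): i for i, c in enumerate(voxel_coords)}


def manhattan_offsets(radius):
    """Integer offsets with |dx| + |dy| + |dz| <= radius, nearest first."""
    rng = range(-radius, radius + 1)
    offsets = [o for o in product(rng, repeat=3)
               if sum(abs(c) for c in o) <= radius]
    return sorted(offsets, key=lambda o: sum(abs(c) for c in o))


def voxel_query(keypoint, table, voxel_size, radius, max_samples):
    """Return up to max_samples voxel indices within the Manhattan radius."""
    center = tuple(int(c // voxel_size) for c in keypoint)
    neighbors = []
    for offset in manhattan_offsets(radius):
        cell = tuple(a + b for a, b in zip(center, offset))
        voxel_id = table.get(cell)  # constant-time hash lookup
        if voxel_id is not None:
            neighbors.append(voxel_id)
            if len(neighbors) == max_samples:
                break
    return neighbors


table = build_voxel_table([(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5)])
near = voxel_query((0.2, 0.3, 0.1), table, voxel_size=1.0, radius=1, max_samples=8)
wider = voxel_query((0.2, 0.3, 0.1), table, voxel_size=1.0, radius=2, max_samples=8)
```

Here `near` contains the voxels at Manhattan distance at most 1 from the keypoint's cell, and `wider` additionally picks up the voxel at `(1, 1, 0)`; the distant voxel at `(5, 5, 5)` is never inspected.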
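How would you repair a physical object with some missing parts? You might imagine its original shape from previously captured images, first recover its overall (global) but coarse shape, and then refine its local details. We are motivated to imitate this physical repair procedure to address point cloud completion. To this end, we propose a cross-modal shape-transfer dual-refinement network (termed CSDN), a coarse-to-fine paradigm in which images participate throughout the full cycle, for high-quality point cloud completion. CSDN mainly consists of "shape fusion" and "dual-refinement" modules to tackle the cross-modal challenge. The first module transfers the intrinsic shape characteristics from a single image to guide the geometry generation of the missing regions of the point cloud; within it, we propose IPAdaIN to embed the global features of both the image and the partial point cloud into the completion. The second module refines the coarse output by adjusting the positions of the generated points: the local refinement unit exploits the geometric relation between the novel points and the input points through graph convolution, and the global constraint unit uses the input image to fine-tune the generated offsets. Unlike most existing approaches, CSDN not only explores the complementary information in images but also effectively exploits cross-modal data throughout the whole coarse-to-fine completion procedure. Experimental results indicate that CSDN performs favorably against ten competitors on the cross-modal benchmark.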
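Recently, haptic gloves have been extensively explored for various practical applications, such as manipulation learning. Previous glove devices use different force-driven systems, such as shape memory alloys, servo motors, and pneumatic actuators; however, these devices may have difficulties with fast finger movement, easy reproduction, and safety. In this study, we propose Magglove, a novel haptic glove with a movable magnet mechanism driven by a linear motor, to solve these issues. The proposed Magglove device is a compact system on the back of the wearer's hand that offers high responsiveness, ease of use, and good safety. The device is adaptive: its output is adjusted by changing the magnitude of the current flowing through the coil. Our evaluation study confirms that the proposed device can achieve finger motion in the given tasks. Therefore, Magglove can provide flexible support tailored to the wearer's learning level in manipulation learning tasks.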
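Semantic segmentation of point clouds, which aims to assign a semantic category to each point, is critical to 3D scene understanding. Despite significant advances in recent years, most existing methods still suffer from either object-level misclassification or boundary-level ambiguity. In this paper, we present a robust semantic segmentation network, dubbed GeoSegNet, that deeply explores the geometry of point clouds. GeoSegNet consists of a multi-geometry based encoder and a boundary-guided decoder. In the encoder, we develop a new residual geometry module from multi-geometry perspectives to extract object-level features. In the decoder, we introduce a contrastive boundary learning module to enhance the geometric representation of boundary points. Benefiting from this geometric encoder-decoder modeling, GeoSegNet can infer the segmentation of objects effectively while keeping the intersections (boundaries) of two or more objects clear. Experiments show clear improvements of our method over its competitors in terms of overall segmentation accuracy and object boundary clearness. Code is available at https://github.com/chen-yuiyui/geosegnet.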