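Deep metric learning (DML) aims to minimize the empirical expected loss of pairwise intra-/inter-class proximity violations in the embedding image. We relate DML to the feasibility problem of finite chance constraints. We show that the minimizer of proxy-based DML satisfies certain chance constraints, and that the worst case of proxy-based methods can be characterized by the radius of the smallest ball around a class proxy that covers the entire domain of the corresponding class samples, suggesting that multiple proxies per class help performance. To provide a scalable algorithm and to exploit more proxies, we consider the chance constraints implied by the minimizers of proxy-based DML instances and reformulate DML as finding a feasible point in the intersection of such constraints, resulting in a problem to be approximately solved by iterative projections. Simply put, we repeatedly train a proxy-based loss and re-initialize the proxies with the embeddings of deliberately selected new samples. We apply our method to well-accepted losses and evaluate it on four popular benchmark datasets for image retrieval. Outperforming the state of the art, our method consistently improves the performance of the applied losses. Code is available at: https://github.com/yetigurbuz/ccp-dml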
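As a rough illustration of the covering-radius argument in this abstract, the sketch below computes, on a finite set of toy embeddings, the radius needed for balls around the class proxies to cover every sample of the class. The function name `covering_radius`, the 2-D points, and the proxy positions are all hypothetical and chosen for illustration; the paper's quantity is defined over the whole class domain, not over a finite sample.

```python
import math

def covering_radius(samples, proxies):
    """Largest distance from any sample to its nearest proxy.

    This is the smallest common radius such that balls centred on the
    proxies cover all given samples (an empirical stand-in, assumed here).
    """
    return max(min(math.dist(x, p) for p in proxies) for x in samples)

# Toy 2-D embeddings of one class that form two clusters (made-up values).
class_samples = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

single_proxy = [(2.0, 0.5)]              # one proxy at the class mean
two_proxies = [(0.0, 0.5), (4.0, 0.5)]   # one proxy per cluster

r_single = covering_radius(class_samples, single_proxy)
r_double = covering_radius(class_samples, two_proxies)
print(f"one proxy: {r_single:.3f}, two proxies: {r_double:.3f}")
```

With these toy values the second configuration needs a much smaller radius, which is the intuition behind using several proxies per class.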
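Hard example mining methods generally improve the performance of object detectors that suffer from imbalanced training sets. In this work, two existing hard example mining approaches (LRM and focal loss, FL) are adapted and combined in a state-of-the-art real-time object detector, YOLOv5. The effectiveness of the proposed approach in improving performance on hard examples is extensively evaluated. On the 2021 Anti-UAV Challenge dataset, the proposed method improves mAP by 3% compared to using the original loss function, and by around 1-2% compared to using either hard-mining method (LRM or FL) alone.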
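For readers unfamiliar with focal loss, a minimal sketch of its standard binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), is given below. It is not the YOLOv5 integration described in the abstract and does not cover LRM; the function name and the example probabilities are assumptions, and the defaults γ = 2 and α = 0.25 are the values commonly used with this loss.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one prediction.

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 0 or 1
    """
    p_t = p if y == 1 else 1.0 - p          # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)           # guard against log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = binary_focal_loss(0.95, 1)   # confident and correct
hard = binary_focal_loss(0.10, 1)   # confident and wrong
print(f"easy example: {easy:.6f}, hard example: {hard:.6f}")
```

Setting `gamma=0` recovers an α-weighted cross-entropy, so the modulating factor is the only part that shifts the training signal toward hard examples.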
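Video frame interpolation (VFI) is a fundamental vision task that aims to synthesize several frames between two consecutive original video images. Most algorithms attempt VFI using only keyframes, which is an ill-posed problem because keyframes usually do not carry accurate information about the trajectories of objects in the scene. Event-based cameras, on the other hand, provide more precise information between the keyframes of a video. Some recent state-of-the-art event-based methods approach this problem by using event data to improve optical flow estimation and then interpolating video frames by warping. Nonetheless, these methods suffer heavily from ghosting. Meanwhile, some kernel-based VFI methods that take only frames as input have shown that deformable convolutions, when backed by transformers, can be a reliable way of handling long-range dependencies. We propose E-VFIA, a lightweight kernel-based method for event-based video frame interpolation. E-VFIA fuses event information with standard video frames through deformable convolutions to generate high-quality interpolated frames. The proposed method represents events with high temporal resolution and uses a multi-head self-attention mechanism to better encode event-based information while being less vulnerable to blurring and ghosting artifacts, thus generating crisper frames. Simulation results show that the proposed technique outperforms current state-of-the-art methods (both frame-based and event-based) with a significantly smaller model size.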
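In this paper, a combat unmanned aerial vehicle (UAV) is modeled in a simulation environment. The rotary-wing UAV successfully performs various tasks, such as locking onto targets, tracking them, and sharing relevant data with surrounding vehicles. Different software technologies are employed, including API communication, ground control station configuration, autonomous movement algorithms, computer vision, and deep learning.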
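Serially connected robots are promising candidates for performing tasks in confined spaces, such as search and rescue in large-scale disasters. Such robots are typically limbless, and we hypothesize that the addition of limbs could improve mobility. However, a challenge in designing and controlling such devices lies in coordinating high-dimensional redundant modules in a way that improves mobility. Here we develop a general framework for controlling serially connected multi-legged robots. Specifically, we combine two approaches to build a general shape control scheme that can provide baseline patterns of self-deformation ("gaits") for effective locomotion in diverse robot morphologies. First, we take inspiration from dimensionality reduction and a biological gait classification scheme to generate cyclic patterns of body deformation and foot lifting/lowering, which facilitate the generation of arbitrary substrate contact patterns. Second, we use geometric mechanics methods to identify the optimal phasing of these undulations to maximize speed and/or stability. Our scheme allows the development of effective gaits for multi-legged robots locomoting on flat frictional terrain with diverse numbers of limbs (4, 6, 16, and even 0 limbs) and body actuation capabilities (including sidewinding gaits on limbless devices). By properly coordinating body undulation and leg placement, our framework combines the advantages of limbless robots (modularity) and legged robots (mobility). We expect that our framework can provide general control schemes for the rapid deployment of general multi-legged robots, paving the way toward machines that can traverse complex environments under real-life conditions.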
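During the training of networks for distance metric learning, the minimizers of typical loss functions can be regarded as "feasible points" satisfying a set of constraints imposed by the training data. To this end, we reformulate the distance metric learning problem as finding a feasible point of a constraint set in which the embedding vectors of the training data satisfy the desired intra-class and inter-class proximity. The feasible set induced by the constraint set is expressed as the intersection of relaxed feasible sets that enforce the proximity constraints only for particular samples (one sample from each class) of the training data. The feasible point problem is then approximately solved by performing alternating projections onto those feasible sets. This approach introduces a regularization term and results in minimizing a typical loss function with a systematic batch construction, where the batches are constrained to contain the same sample from each class for a certain number of iterations. Moreover, these particular samples can be regarded as class representatives, allowing efficient use of hard class mining during batch construction. The proposed technique is applied to well-accepted losses and evaluated on the Stanford Online Products, CAR196, and CUB200-2011 datasets for image retrieval and clustering. Outperforming the state of the art, the proposed approach consistently improves the performance of the integrated loss functions at no additional computational cost and boosts performance further through hard negative class mining.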
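The alternating-projection idea mentioned in this abstract can be sketched on a toy problem: find a point in the intersection of two convex sets by projecting onto each in turn. The sets used here (a disk and a half-plane in 2-D), the starting point, and all function names are assumptions chosen for illustration; in the paper the sets are defined over network embeddings and the projections are only approximated through training, which this sketch does not attempt.

```python
import math

def project_onto_ball(x, center, radius):
    """Euclidean projection onto {z : ||z - center|| <= radius}."""
    d = math.dist(x, center)
    if d <= radius:
        return x
    scale = radius / d
    return tuple(c + scale * (xi - c) for xi, c in zip(x, center))

def project_onto_halfspace(x, normal, offset):
    """Euclidean projection onto {z : <normal, z> >= offset}."""
    dot = sum(n * xi for n, xi in zip(normal, x))
    if dot >= offset:
        return x
    step = (offset - dot) / sum(n * n for n in normal)
    return tuple(xi + step * n for xi, n in zip(x, normal))

def alternating_projections(x0, projections, num_iters=50):
    """Cycle through the projections for a fixed number of sweeps."""
    x = x0
    for _ in range(num_iters):
        for project in projections:
            x = project(x)
    return x

point = alternating_projections(
    (-3.0, 2.0),
    [
        lambda x: project_onto_ball(x, (0.0, 0.0), 1.0),
        lambda x: project_onto_halfspace(x, (1.0, 0.0), 0.5),
    ],
)
print(point)
```

For convex sets with a non-empty intersection, such as these two, the iterates converge to a point satisfying both constraints; no such guarantee is implied for the non-convex setting of deep embeddings.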
Variational inference uses optimization, rather than integration, to approximate the marginal likelihood, and thereby the posterior, in a Bayesian model. Thanks to advances in computational scalability made in the last decade, variational inference is now the preferred choice for many high-dimensional models and large datasets. This tutorial introduces variational inference from the parametric perspective that dominates these recent developments, in contrast to the mean-field perspective commonly found in other introductory texts.
Knowledge graphs (KGs) serve as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which draws on the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges and commonly used KGs and CKGs. We then systematically categorize and summarize existing work in terms of the type of KG and the methods used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose linking object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset in the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.