投影技术经常用于可视化高维数据,使用户能够更好地理解在2D屏幕上的多维空间的总体结构。尽管存在着许多这样的方法,相当小的工作已经逆投影的普及方法来完成 - 绘制投影点,或者更一般的过程中,投影空间回到原来的高维空间。在本文中我们提出NNInv,用近似的任何突起或映射的逆的能力的深学习技术。 NNInv学会重建上的二维投影空间从任意点高维数据,给用户在视觉分析系统所学习的高维表示的能力进行交互。我们提供NNInv的参数空间的分析,并在选择这些参数提供指导。我们通过一系列定量和定性分析的延长NNInv的有效性验证。交互式实例中插值,分级协议,梯度可视化:然后,我们把它应用到三个可视化任务,验证了该方法的效用。
translated by 谷歌翻译
视觉预读(VLP)模型最近成功地促进了许多跨模式下游任务。大多数现有作品通过比较微调的下游任务性能来评估其系统。但是,只有平均下游任务准确性才能提供有关每种VLP方法的优缺点的几乎没有信息,更不用说有关社区如何改善系统的见解。受清单进行自然语言处理的启发,我们引入了VL-CheckList,这是一个新颖的框架,以了解VLP模型的功能。所提出的方法将VLP模型的图像定位能力分为三类:对象,属性和关系,并使用新颖的分类法进一步分解这三个方面。我们进行了全面的研究,通过提出的框架分析了七个最近流行的VLP模型。结果通过揭示了仅在下游任务评估中看不见的模型之间的细粒度差异来证实所提出的方法的有效性。进一步的结果表明,在构建更好的VLP模型方面有希望的研究方向。数据和代码:https://github.com/om--ai-lab/vl-checklist
translated by 谷歌翻译
大规模矢量映射对于运输,城市规划,调查和人口普查很重要。我们提出了GraphMapper,这是从卫星图像中提取端到端向量图的统一框架。我们的关键思想是一种新颖的统一表示,称为“原始图”的不同拓扑的形状,这是一组形状原语及其成对关系矩阵。然后,我们将向量形状的预测,正则化和拓扑重构转换为独特的原始图学习问题。具体而言,GraphMapper是一个基于多头注意的全局形状上下文建模的通用原始图形学习网络。开发了一种嵌入式空间排序方法,用于准确的原始关系建模。我们从经验上证明了GraphMapper对两个具有挑战性的映射任务的有效性,即建立足迹正则化和道路网络拓扑重建。我们的模型在公共基准上的两项任务中都优于最先进的方法。所有代码将公开可用。
translated by 谷歌翻译
我们提出了一种新颖的软件服务推荐模型,以帮助用户在Github中找到合适的存储库。我们的模型首先设计了一种新颖的上下文诱导的存储库图嵌入方法,以利用存储库的丰富上下文信息来缓解数据稀疏问题引起的困难。然后,它在软件服务推荐字段中首次利用用户存储库交互的序列信息。具体地,采用基于深度学习的顺序推荐技术来捕获用户偏好的动态。在从Github收集的大型数据集中进行了综合实验,以根据现有方法列表。结果说明了我们在各个方面的方法的优越性。
translated by 谷歌翻译
我们展示了MVLayoutNet,是来自多视图全景的整体三维重建端到端网络。我们的核心贡献是无缝地将学习的单目布局估计和多视图立体声(MV)结合起来,以便在3D和图像空间中准确地重建。我们共同列出布局模块以产生初始布局和新型MVS模块,以获得精确的布局几何形状。与标准MVSNET [33]不同,我们的MVS模块采用新建的布局成本卷,其在相同的深度层中聚合到相应的布局元件中的多视图成本。我们还提供了一种基于注意的方案,指导MVS模块专注于结构区域。这种设计考虑了本地像素级成本和全球整体信息,以便更好地重建。实验表明,我们的方法在2D-3D-S [1]和Zind [5]数据集中,在深度RMSE方面以21.7%和20.6%表示最先进的。最后,我们的方法导致连贯的布局几何,使整个场景的重建能够。
translated by 谷歌翻译
本文侧重于NYSTR \“{o} M正常化的学习速率分析,为$ \ tau $ -mixing时间序列使用顺序子采样。使用最近开发的Banach-valueed Bernstein不等式以\ tau $ -mixing序列和一个基于二阶分解的积分操作方法,我们成功地推出了NYSTR \“{o} M正常化的最佳学习率,以及用于$ \ TAU $ -MIXING时间序列的顺序子采样。进行了一系列数值实验以验证我们的理论结果,显示NYSTR \“{o} M正则化的优异学习性能,在学习大规模时间序列数据中具有顺序子采样。所有这些结果都扩展了NYSTR \适用范围“{o} M正常化从IID对非i.i.d的样品。序列。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Automatic music generation with artificial intelligence typically requires a large amount of data which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) shows no such ability beyond naive repetition. Evaluating generated music is a challenging task, more so is evaluating drum grooves with little precedence in literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译