Implicit regularization is an important way to interpret neural networks. Recent theory has begun to explain implicit regularization through the model of deep matrix factorization (DMF) and to analyze the trajectory of discrete gradient dynamics during optimization. The steps in these discrete gradient dynamics are relatively small but not infinitesimal, and thus match the practical implementation of neural networks well. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but becomes computationally difficult for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, namely landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further verified experimentally on a low-rank matrix reconstruction problem. This work provides a new theory to analyze implicit regularization in deep learning.
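The stagewise behaviour described above can be illustrated with a toy sketch. It is not the paper's analysis: it assumes a diagonal target and identical (balanced) initialization of all factors, so each singular direction reduces to a scalar problem with loss 0.5 * (w**depth - s)**2. The function name `simulate_diagonal_dmf` and all hyperparameters are hypothetical choices for illustration.

```python
def simulate_diagonal_dmf(targets, depth=3, init=0.1, lr=0.01, steps=3000, tol=0.5):
    """Discrete gradient descent on 0.5 * (w**depth - s)**2 per target value s.

    Assumption (not from the paper): the target is diagonal and all `depth`
    factors start equal, so one scalar w per singular direction suffices.
    Returns (products, first_step), where first_step[i] is the first step at
    which the i-th learned product w**depth exceeds `tol` (None if never).
    """
    weights = [init] * len(targets)
    first_step = [None] * len(targets)
    for step in range(1, steps + 1):
        for i, s in enumerate(targets):
            w = weights[i]
            residual = w ** depth - s
            # Gradient w.r.t. one factor when all factors share the value w.
            weights[i] = w - lr * residual * w ** (depth - 1)
            if first_step[i] is None and weights[i] ** depth > tol:
                first_step[i] = step
    products = [w ** depth for w in weights]
    return products, first_step


if __name__ == "__main__":
    # Rank-3 target embedded in 4 directions: the last singular value is zero.
    products, first_step = simulate_diagonal_dmf([3.0, 2.0, 1.0, 0.0])
    print("learned singular values:", products)
    print("step at which each direction is picked up:", first_step)
```

In this simplified setting each direction sits on a plateau near the origin before it is learned, and larger singular values leave the plateau earlier, so the effective rank of the product grows one step at a time until it reaches R.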
translated by 谷歌翻译
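Recently, the community has paid increasing attention to model scaling and contributed to developing model families with a wide spectrum of scales. Current methods either simply resort to a one-shot NAS approach to construct a non-structural and non-scalable model family, or rely on a manual yet fixed scaling strategy to scale a base model that is not necessarily the best. In this paper, we bridge these two components and propose ScaleNet to jointly search the base model and the scaling strategy, so that the scaled large model can achieve more promising performance. Concretely, we design a super-supernet to embody models with a different spectrum of sizes (e.g., FLOPs). The scaling strategy can then be learned interactively with the base model via a Markov chain-based evolution algorithm and generalized to develop even larger models. To obtain a decent super-supernet, we design a hierarchical sampling strategy to enhance its training sufficiency and alleviate disturbance. Experimental results show that our scaled networks enjoy significant performance superiority at various FLOPs budgets, with at least a 2.53x reduction in search cost. Code is available at https://github.com/luminolx/scalenet.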
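Object detection in state-of-the-art autonomous vehicle (AV) frameworks relies on deep neural networks. Typically, these networks perform object detection uniformly over the entire camera and LiDAR frames. However, this uniformity jeopardizes the safety of the AV by giving the same priority to all objects in the scene, regardless of their risk of collision with the AV. In this paper, we present a new end-to-end pipeline for AVs that introduces the concept of LiDAR clustering first and camera inference later to detect and classify objects. The benefits of our proposed framework are twofold. First, our pipeline prioritizes detecting objects that pose a higher risk of collision to the AV, giving the AV more time to react to unsafe conditions. Second, it also provides faster inference speeds than popular deep neural network pipelines. We design our framework using a real-world dataset, the Waymo Open Dataset, solving challenges that arise from the limitations of LiDAR sensors and object detection algorithms. We show that our novel object detection pipeline prioritizes the detection of higher-risk objects while simultaneously achieving comparable accuracy and a 25% higher average speed compared to camera inference alone.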
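Vision transformers (ViTs) have inherited the success of NLP, but their structures have not been sufficiently investigated and optimized for visual tasks. One of the simplest solutions is to search for the optimal structure directly via neural architecture search (NAS), which is widely used for CNNs. However, we find empirically that this straightforward adaptation encounters catastrophic failures and is frustratingly unstable when training the superformer. In this paper, we argue that since ViTs mainly operate on token embeddings with little inductive bias, the imbalance of channels across different architectures worsens the weight-sharing assumption and causes training instability. Therefore, we develop a new cyclic weight-sharing mechanism for the token embeddings of ViTs, which enables each channel to contribute more evenly to all candidate architectures. In addition, we propose identity shifting to alleviate the many-to-one issue in the superformer, and leverage weak augmentation and regularization techniques for more stable training. Based on these, our proposed method, ViTAS, achieves significant superiority on both DeiT-based and Twins-based ViTs. For example, with a budget of only 1.4G FLOPs, our searched architecture achieves 3.3% higher ImageNet-1k accuracy than the baseline DeiT. With 3.0G FLOPs, our results reach 82.0% accuracy on ImageNet-1k and 45.9% mAP on COCO2017, which is 2.4% better than other ViTs.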
Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g. pedestrians and vehicles) and road context information (e.g. lanes, traffic lights). This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. In contrast to most recent approaches, which render trajectories of moving agents and road context information as bird-eye images and encode them with convolutional neural networks (ConvNets), our approach operates on a vector representation. By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps. To further boost VectorNet's capability in learning context features, we propose a novel auxiliary task to recover the randomly masked out map entities and agent trajectories based on their context. We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset. Our method achieves on par or better performance than the competitive rendering approach on both benchmarks while saving over 70% of the model parameters with an order of magnitude reduction in FLOPs. It also outperforms the state of the art on the Argoverse dataset.
Deep neural networks (DNNs) are sensitive to the tiny perturbations introduced by adversarial attacks, which cause erroneous predictions. Various methods, including adversarial defense and uncertainty inference (UI), have been developed in recent years to counter adversarial attacks. In this paper, we propose a multi-head uncertainty inference (MH-UI) framework for detecting adversarial attack examples. We adopt a multi-head architecture with multiple prediction heads (i.e., classifiers) to obtain predictions from different depths in the DNN and to introduce shallow information for the UI. Using independent heads at different depths, the normalized predictions are assumed to follow the same Dirichlet distribution, and we estimate its parameters by moment matching. The cognitive uncertainty brought by adversarial attacks is reflected and amplified in the distribution. Experimental results show that the proposed MH-UI framework outperforms all the referenced UI methods in the adversarial attack detection task under different settings.
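As a rough illustration of estimating Dirichlet parameters by moment matching, the sketch below uses a standard textbook estimator, which is not necessarily the exact one used in MH-UI. It treats each normalized prediction vector (for example, one per head) as a sample from a single Dirichlet distribution. The function names and the synthetic data are assumptions made for this example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_moment_match(samples):
    """Estimate Dirichlet parameters from probability vectors by moment matching.

    Uses E[p_i] = a_i / a0 and Var[p_i] = m_i * (1 - m_i) / (a0 + 1), where
    a0 = sum(a_i) and m_i = E[p_i]. The precision a0 is estimated per
    component and averaged (an illustrative choice, not taken from the paper).
    """
    n = len(samples)
    k = len(samples[0])
    mean = [sum(p[i] for p in samples) / n for i in range(k)]
    second = [sum(p[i] ** 2 for p in samples) / n for i in range(k)]
    precision_estimates = []
    for m, s in zip(mean, second):
        var = s - m * m
        if var > 0:  # skip degenerate components with no spread
            precision_estimates.append(m * (1.0 - m) / var - 1.0)
    a0 = sum(precision_estimates) / len(precision_estimates)
    return [m * a0 for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    true_alpha = [2.0, 5.0, 3.0]  # synthetic ground truth, for illustration only
    samples = [sample_dirichlet(true_alpha, rng) for _ in range(20000)]
    print(dirichlet_moment_match(samples))
```

With only a handful of heads the estimate is much noisier than in this synthetic example. A small estimated precision (the sum of the parameters) corresponds to heads that disagree with each other, which is the kind of signal an uncertainty-based detector can threshold.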
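Pretrained language models have been shown to be effective in many software-related generation tasks. However, they are not well suited to editing tasks, because they are not designed to reason about edits. To address this, we propose a novel pretraining objective that explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming pure generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a pure generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance on the three downstream editing tasks.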
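Many recent works have been proposed for face image editing by leveraging the latent space of pretrained GANs. However, few attempts have been made to apply them directly to videos, because 1) they do not guarantee temporal consistency, 2) their application is limited by their processing speed on videos, and 3) they cannot accurately encode the details of facial motion and expression. To this end, we propose a novel network that encodes face videos into the latent space of StyleGAN for semantic face video manipulation. Based on the vision transformer, our network reuses the high-resolution portion of the latent vector to enforce temporal consistency. To capture subtle facial motions and expressions, we design novel losses that involve sparse facial landmarks and a dense 3D face mesh. We have thoroughly evaluated our approach and successfully demonstrated its application to various face video manipulations. In particular, we propose a novel network for pose/expression control in a 3D coordinate system. Both qualitative and quantitative results show that our approach significantly outperforms existing single-image methods while achieving real-time (66 fps) speed.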
Code review is an integral part of any mature software development process, and identifying the best reviewer for a code change is a well-accepted problem within the software engineering community. Selecting a reviewer who lacks expertise and understanding can slow development or result in more defects. To date, most reviewer recommendation systems rely primarily on historical file change and review information; those who changed or reviewed a file in the past are considered the best positioned to review it in the future. We posit that while these approaches are able to identify and suggest qualified reviewers, they may be blind to reviewers who have the needed expertise and have simply never interacted with the changed files before. To address this, we present CORAL, a novel approach to reviewer recommendation that leverages a socio-technical graph built from the rich set of entities (developers, repositories, files, pull requests, work-items, etc.) and their relationships in modern source code management systems. We employ a graph convolutional neural network on this graph and train it on two and a half years of history on 332 repositories. We show that CORAL is able to model the manual history of reviewer selection remarkably well. Further, based on an extensive user study, we demonstrate that this approach identifies relevant and qualified reviewers whom traditional reviewer recommenders miss, and that these developers desire to be included in the review process. Finally, we find that "classical" reviewer recommendation systems perform better on smaller (in terms of developers) software projects while CORAL excels on larger projects, suggesting that there is "no one model to rule them all."
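Graph neural networks (GNNs) have shown convincing performance in learning powerful node representations that preserve both node attributes and graph structural information. However, many GNNs encounter problems in effectiveness and efficiency when they are designed with a deeper network structure or must handle large graphs. Several sampling algorithms have been proposed to improve and accelerate the training of GNNs, yet they neglect to examine the source of GNN performance gains. Measuring the information within graph data can help sampling algorithms keep high-value information while removing redundant information and even noise. In this paper, we propose a Metric-Guided (MeGuide) subgraph learning framework for GNNs. MeGuide employs two novel metrics, Feature Smoothness and Connection Failure Distance, to guide subgraph sampling and mini-batch-based training. Feature Smoothness is designed to analyze node features in order to retain the most valuable information, while Connection Failure Distance measures structural information to control the size of subgraphs. We demonstrate the effectiveness and efficiency of MeGuide in training various GNNs on multiple datasets.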