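The data-embedding process is one of the bottlenecks of quantum machine learning and can potentially negate any quantum speedup. In light of this, more effective data-encoding strategies are needed. We propose a photonic-based bosonic data-encoding scheme that embeds classical data points using fewer encoding layers and circumvents the need for nonlinear optical components by mapping the data points into the high-dimensional Fock space. The expressive power of the circuit can be controlled via the number of input photons. Our work sheds light on the unique advantages that quantum photonics offers for the expressive power of quantum machine learning models. By leveraging the photon-number-dependent expressive power, we propose three different intermediate-scale quantum-compatible binary classification methods whose resource requirements suit different supervised classification tasks.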
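As a rough illustration of why the photon number can act as a knob on expressive power, the sketch below counts the basis states of the Fock space for a given number of indistinguishable photons distributed over a given number of optical modes. The function name `fock_space_dimension` and the choice of four modes are assumptions made for this example; they do not come from the paper, and the count says nothing about the specific circuit the authors use.

```python
from math import comb


def fock_space_dimension(num_photons: int, num_modes: int) -> int:
    """Count Fock basis states for indistinguishable photons in num_modes modes.

    This is the standard "stars and bars" count C(n + m - 1, n).
    The function name and signature are hypothetical, chosen for illustration.
    """
    if num_photons < 0 or num_modes < 1:
        raise ValueError("need num_photons >= 0 and num_modes >= 1")
    return comb(num_photons + num_modes - 1, num_photons)


if __name__ == "__main__":
    # Assumed example setting: a 4-mode interferometer.
    for n in range(1, 5):
        print(f"{n} photon(s), 4 modes -> dimension {fock_space_dimension(n, 4)}")
```

With the mode count held fixed, the dimension grows with every added photon, which is the sense in which a larger photon number gives the embedding more room.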
translated by 谷歌翻译
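Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, acquiring annotations for thousands of categories at a large scale is prohibitively costly. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection. Starting with a generic and class-agnostic region proposal mechanism, we use vision and language models to categorize each region of an image into any object category required for downstream tasks. We demonstrate the value of the generated pseudo labels in two specific tasks: open-vocabulary detection, where a model needs to generalize to unseen object categories, and semi-supervised object detection, where additional unlabeled images can be used to improve the model. Our empirical evaluation shows the effectiveness of the pseudo labels in both tasks, where we outperform competitive baselines and achieve a new state of the art for open-vocabulary object detection. Our code is available at https://github.com/xiaofeng94/vl-plm.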
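Despite the ubiquitous use of recurrent neural networks (RNNs) in natural language processing (NLP), the theoretical understanding of RNNs remains limited because of their intrinsically complex computations. We perform a systematic analysis of the behavior of RNNs in a ubiquitous NLP task, the sentiment analysis of movie reviews, via the mapping between a class of RNNs called recurrent arithmetic circuits (RACs) and matrix product states (MPS). Using the von Neumann entanglement entropy (EE) as a proxy for information propagation, we show that single-layer RACs possess a maximum information propagation capacity, reflected by the saturation of the EE. Enlarging the bond dimension of an MPS beyond the EE saturation threshold does not increase the prediction accuracy, so a minimal model that best estimates the data statistics can be constructed. Although the saturated EE is smaller than the maximum EE achievable under the area law of an MPS, our model achieves ~99% training accuracy on realistic sentiment analysis data sets. Thus, low EE alone is not a warrant against adopting single-layer RACs for NLP. Contrary to the common belief that long-range information propagation is the main source of the expressiveness of RNNs, we show that single-layer RACs also harness high expressiveness from meaningful word vector embeddings. Our work sheds light on the phenomenology of learning in RACs, and more generally on the explainability aspects of RNNs for NLP, using tools from many-body quantum physics.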
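To make the comparison between a saturated EE and the maximum EE concrete, here is a minimal sketch of the von Neumann entanglement entropy computed from the Schmidt values across one bond of an MPS, together with the upper bound ln(D) for bond dimension D. The helper names and the example Schmidt values are assumptions made for illustration; they are not taken from the paper or its data.

```python
from math import log


def entanglement_entropy(schmidt_values):
    """Von Neumann entropy S = -sum(p * ln p), with p the normalized squared Schmidt values.

    Hypothetical helper: the input is a list of non-negative Schmidt
    coefficients across a single cut, not necessarily normalized.
    """
    weights = [s * s for s in schmidt_values]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("at least one Schmidt value must be non-zero")
    probs = [w / total for w in weights]
    # Terms with p == 0 contribute nothing and are skipped to avoid log(0).
    return -sum(p * log(p) for p in probs if p > 0.0)


def max_entropy(bond_dimension):
    """Upper bound ln(D) on the EE across a bond of dimension D."""
    return log(bond_dimension)


if __name__ == "__main__":
    # Made-up spectrum with bond dimension 4, dominated by one Schmidt value.
    spectrum = [0.9, 0.3, 0.2, 0.1]
    print("EE    :", entanglement_entropy(spectrum))
    print("bound :", max_entropy(len(spectrum)))
```

A flat spectrum reaches the bound exactly, while a spectrum dominated by a few values stays well below it, which is the situation the abstract describes for the saturated EE.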
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and to point permutation, and it generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: "The output depends only on a small (but unknown) segment of the input." In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output are often used as a way to peek into the "reasoning" of the network. We make such a notion more precise for a variant of the classification problem that we term selective dependence classification (SDC) when used with attention model architectures. Under such a setting, we demonstrate various error modes where an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate and mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity and demonstrate that these algorithms help improve interpretability.
Transformers are becoming increasingly popular due to their superior performance over conventional convolutional neural networks (CNNs). However, transformers usually require a much larger amount of memory to train than CNNs, which prevents their application in many low-resource settings. Local learning, which divides the network into several distinct modules and trains them individually, is a promising alternative to the end-to-end (E2E) training approach to reduce the amount of memory for training and to increase parallelism. This paper is the first to apply local learning to transformers for this purpose. The standard CNN-based local learning method, InfoPro [32], reconstructs the input images for each module in a CNN. However, reconstructing the entire image does not generalize well. In this paper, we propose a new mechanism for each local module, where instead of reconstructing the entire image, we reconstruct its input features, generated from previous modules. We evaluate our approach on 4 commonly used datasets and 3 commonly used decoder structures on Swin-Tiny. The experiments show that our approach outperforms InfoPro-Transformer, the variant of InfoPro with a Transformer backbone that we introduce, by up to 0.58% on the CIFAR-10, CIFAR-100, STL-10 and SVHN datasets, while using up to 12% less memory. Compared to the E2E approach, we require 36% less GPU memory when the network is divided into 2 modules and 45% less GPU memory when the network is divided into 4 modules.
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of such approaches of addressing the COD has led us to solving high-dimensional PDEs. This has resulted in opening doors to solving a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. Tackling these shortcomings, Tensor Neural Networks (TNN) demonstrate that they can provide significant parameter savings while attaining the same accuracy as compared to the classical Dense Neural Network (DNN). In addition, we also show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
Artificial neural networks can learn complex, salient data features to achieve a given task. On the opposite end of the spectrum, mathematically grounded methods such as topological data analysis allow users to design analysis pipelines fully aware of data constraints and symmetries. We introduce a class of persistence-based neural network layers. Persistence-based layers allow the users to easily inject knowledge about symmetries (equivariance) respected by the data, are equipped with learnable weights, and can be composed with state-of-the-art neural architectures.
KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.