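Despite many recent advances, full 3D estimation of human pose from a single image remains a challenging task. In this paper, we explore the hypothesis that strong prior information about scene geometry can be used to improve pose estimation accuracy. To tackle this question empirically, we have assembled a novel $\textbf{Geometric Pose Affordance}$ dataset, consisting of multi-view imagery of people interacting with a variety of rich 3D environments. We used a commercial motion capture system to collect gold-standard estimates of pose and to construct accurate geometric 3D CAD models of the scene itself. To inject prior knowledge of scene constraints into existing frameworks for pose estimation from images, we introduce a novel, view-based representation of scene geometry, a $\textbf{multi-layer depth map}$, which employs multi-hit ray tracing to concisely encode multiple surface entry and exit points along each camera view ray direction. We propose two different mechanisms for integrating multi-layer depth information into pose estimation: first, as encoded ray features used as input when lifting 2D pose to full 3D, and second, as a differentiable loss that encourages learned models to favor geometrically consistent pose estimates. We show experimentally that these techniques can improve the accuracy of 3D pose estimates, particularly in the presence of occlusion and complex scene geometry.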
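The multi-layer depth idea can be illustrated with a toy sketch. The abstract does not give an implementation, so everything here is an assumption made for illustration: the scene is a list of non-overlapping spheres rather than CAD meshes, the ray direction is a unit vector, and the names `ray_sphere_hits`, `multi_layer_depth`, and `max_layers` are hypothetical.

```python
import math

def ray_sphere_hits(origin, direction, center, radius):
    """Return (entry, exit) depths where a ray crosses a sphere, or None.

    `direction` is assumed to be a unit vector.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(x * d for x, d in zip(oc, direction))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc <= 0.0:
        return None  # ray misses (or only grazes) the sphere
    root = math.sqrt(disc)
    t_in, t_out = -b - root, -b + root
    if t_out <= 0.0:
        return None  # sphere lies entirely behind the camera
    return max(t_in, 0.0), t_out

def multi_layer_depth(origin, direction, spheres, max_layers=4):
    """Collect surface entry/exit depths along one view ray, nearest first."""
    depths = []
    for center, radius in spheres:
        hit = ray_sphere_hits(origin, direction, center, radius)
        if hit is not None:
            depths.extend(hit)
    depths.sort()
    return depths[:max_layers]

# Toy scene (hypothetical): two spheres on the optical axis.
scene = [((0.0, 0.0, 5.0), 1.0), ((0.0, 0.0, 10.0), 2.0)]
layers = multi_layer_depth((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene)
print(layers)  # alternating entry and exit depths for this ray
```

Repeating this for every pixel's view ray would give one depth image per layer. A single-layer depth map keeps only the first entry, which is why it cannot describe free space behind an occluder.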
Parkinson's disease is marked by altered and increased firing characteristics of pathological oscillations in the brain. In other words, it causes abnormal synchronous oscillations and suppression during neurological processing. Deep brain stimulators (DBS) are used to examine and regulate the synchronization and pathological oscillations in motor circuits. Although machine learning methods have been applied to the investigation of suppression, these models require large amounts of training data and computational power, both of which pose challenges to resource-constrained DBS. This research proposes a novel reinforcement learning (RL) framework for suppressing the synchronization in neuronal activity during episodes of neurological disorders with less power consumption. The proposed RL algorithm comprises an ensemble of a temporal representation of stimuli and a twin delayed deep deterministic (TD3) policy gradient algorithm. We quantify the stability of the proposed framework to noise and use RL to reduce synchrony in three pathological signaling regimes (regular, chaotic, and bursting), further eliminating the undesirable oscillations. Furthermore, metrics such as evaluation rewards, energy supplied to the ensemble, and the mean point of convergence were used and compared against other RL algorithms, specifically Advantage Actor-Critic (A2C), Actor-Critic using Kronecker-Factored Trust Region (ACKTR), and Proximal Policy Optimization (PPO).
Human activity recognition (HAR) using drone-mounted cameras has attracted considerable interest from the computer vision research community in recent years. A robust and efficient HAR system plays a pivotal role in fields such as video surveillance, crowd behavior analysis, sports analysis, and human-computer interaction. The task is challenging because of complex poses, varying viewpoints, and the environmental conditions in which the action takes place. To address these complexities, we propose a novel Sparse Weighted Temporal Attention (SWTA) module that uses sparsely sampled video frames to obtain global weighted temporal attention. The proposed SWTA comprises two parts. The first is a temporal segment network that sparsely samples a given set of frames. The second is weighted temporal attention, which fuses attention maps derived from optical flow with raw RGB images. This is followed by a basenet network, which comprises a convolutional neural network (CNN) module along with fully connected layers that perform activity recognition. The SWTA network can be used as a plug-in module for existing deep CNN architectures, enabling them to learn temporal information without a separate temporal stream. It has been evaluated on three publicly available benchmark datasets, namely Okutama, MOD20, and Drone-Action. The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets, surpassing the previous state-of-the-art performance by margins of 25.26%, 18.56%, and 2.94%, respectively.
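The sparse sampling step can be sketched as follows. This is a minimal illustration of segment-based sampling in the style of temporal segment networks, not the SWTA authors' code; the function name `sparse_sample_indices`, the midpoint fallback, and the frame counts are assumptions.

```python
import random

def sparse_sample_indices(num_frames, num_segments, rng=None):
    """Pick one frame index from each of `num_segments` equal-length segments.

    With `rng=None` the segment midpoint is used (deterministic);
    otherwise a frame is drawn uniformly at random within each segment.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    indices = []
    for k in range(num_segments):
        start = (k * num_frames) // num_segments
        end = ((k + 1) * num_frames) // num_segments  # exclusive bound
        if rng is None:
            indices.append((start + end - 1) // 2)
        else:
            indices.append(rng.randrange(start, end))
    return indices

# Hypothetical clip of 90 frames split into 3 segments.
print(sparse_sample_indices(90, 3))                      # → [14, 44, 74]
print(sparse_sample_indices(90, 3, random.Random(0)))    # one random frame per segment
```

Sampling one frame per segment keeps the cost independent of clip length while still covering the whole video, which is the property the abstract relies on for global temporal attention.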
Coordinate-based implicit neural networks, or neural fields, have emerged as useful representations of shape and appearance in 3D computer vision. Despite these advances, however, it remains challenging to build neural fields for categories of objects without datasets like ShapeNet that provide canonicalized object instances that are consistently aligned in 3D position and orientation (pose). We present Canonical Field Network (CaFi-Net), a self-supervised method to canonicalize the 3D pose of instances from an object category represented as neural fields, specifically neural radiance fields (NeRFs). CaFi-Net learns directly from continuous and noisy radiance fields using a Siamese network architecture that is designed to extract equivariant field features for category-level canonicalization. During inference, our method takes pre-trained neural radiance fields of novel object instances at arbitrary 3D pose and estimates a canonical field with consistent 3D pose across the entire category. Extensive experiments on a new dataset of 1300 NeRF models across 13 object categories show that our method matches or exceeds the performance of 3D point cloud-based methods.
Topological data analysis (TDA) is a branch of computational mathematics, bridging algebraic topology and data science, that provides compact, noise-robust representations of complex structures. Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture, resulting in high-dimensional, difficult-to-interpret internal representations of input data. As DNNs become more ubiquitous across multiple sectors of our society, there is increasing recognition that mathematical methods are needed to aid analysts, researchers, and practitioners in understanding and interpreting how these models' internal representations relate to the final classification. In this paper, we apply cutting-edge techniques from TDA with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification. We use two common TDA approaches to explore several methods for modeling hidden-layer activations as high-dimensional point clouds, and provide experimental evidence that these point clouds capture valuable structural information about the model's process. First, we demonstrate that a distance metric based on persistent homology can be used to quantify meaningful differences between layers, and we discuss these distances in the broader context of existing representational similarity metrics for neural network interpretability. Second, we show that a mapper graph can provide semantic insight into how these models organize hierarchical class knowledge at each layer. These observations demonstrate that TDA is a useful tool to help deep learning practitioners unlock the hidden structures of their models.
We introduce SketchySGD, a stochastic quasi-Newton method that uses sketching to approximate the curvature of the loss function. Quasi-Newton methods are among the most effective algorithms in traditional optimization, where they converge much faster than first-order methods such as SGD. However, for contemporary deep learning, quasi-Newton methods are considered inferior to first-order methods like SGD and Adam owing to higher per-iteration complexity and fragility due to inexact gradients. SketchySGD circumvents these issues through a novel combination of subsampling, randomized low-rank approximation, and dynamic regularization. In the convex case, we show that SketchySGD with a fixed stepsize converges to a small ball around the optimum at a faster rate than SGD for ill-conditioned problems. In the non-convex case, SketchySGD converges linearly under two additional assumptions, interpolation and the Polyak-Lojasiewicz condition, the latter of which holds with high probability for wide neural networks. Numerical experiments on image and tabular data demonstrate the improved reliability and speed of SketchySGD for deep learning, compared to standard optimizers such as SGD and Adam and to existing quasi-Newton methods.
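Handwritten text recognition, a sub-domain of pattern recognition, is widely studied by researchers in the computer vision community because of its room for improvement and its applicability to daily life. Growth in computational power over the past few decades has allowed neural-network-based systems to contribute heavily to state-of-the-art handwritten text recognizers. In the same direction, we take two state-of-the-art neural network systems and merge an attention mechanism into them. Attention has been widely used in neural machine translation and automatic speech recognition and is now being applied to text recognition. In this study, we achieve a 4.15% character error rate and a 9.72% word error rate on the IAM dataset, and a 7.07% character error rate and a 16.14% word error rate on the GW dataset, after merging attention into the existing architecture of Flor et al. For further analysis, we also use a system similar to the neural network system of Shi et al. with a greedy decoder and observe a 23.27% improvement in character error rate over the base model.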
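For readers unfamiliar with the two metrics, the sketch below shows how character and word error rates are conventionally computed from the Levenshtein edit distance. It is a generic illustration, not the paper's evaluation script; the function names and the sample strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)

def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

# Hypothetical recognizer output with one wrong character.
print(character_error_rate("hello world", "hallo world"))  # 1 edit over 11 characters
print(word_error_rate("hello world", "hallo world"))       # → 0.5
```

A single wrong character costs one full word at the word level, which is why reported word error rates are typically well above the matching character error rates.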
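Contrastive learning is commonly used as a method of self-supervised learning, in which the "anchor" and the "positive" are two random augmentations of a given input image and the "negative" is the set of all other images. However, the need for large batch sizes and memory banks makes training difficult and slow. This has motivated the rise of supervised contrastive methods, which overcome these problems by using annotated data. We aim to further improve supervised contrastive learning by ranking classes according to their similarity, and to observe the effect of human bias (in the form of the ranking) on the learned representations. We consider this an important question, because learning good feature embeddings has long been a sought-after goal in computer vision.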
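The anchor/positive/negative setup can be made concrete with an InfoNCE-style loss, one common formulation of the contrastive objective. The abstract does not specify its loss or its ranking scheme, so this is only a generic sketch; the function names, the temperature of 0.1, and the 2-D embeddings are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """-log( exp(s_pos / t) / (exp(s_pos / t) + sum_i exp(s_neg_i / t)) )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]

# Hypothetical 2-D embeddings.
anchor = [1.0, 0.0]
well_aligned = info_nce_loss(anchor, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce_loss(anchor, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(well_aligned < misaligned)  # → True
```

Because the denominator sums over every negative, the estimate improves with more negatives per anchor, which is the source of the large-batch and memory-bank requirement mentioned above.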
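Contrastive learning is commonly applied to self-supervised learning and has been shown to outperform traditional approaches such as the triplet loss and the N-pair loss. However, the need for large batch sizes and memory banks makes training difficult and slow. Recently, supervised contrastive approaches have been developed to overcome these problems. They focus on learning a good representation for each class individually or between groups of classes. In this work, we use a user-defined ranking to rank classes by similarity, in order to learn an efficient representation across all classes. We observe how incorporating human bias into the learning process can improve the learned representations in the parameter space. We show that our results are comparable to supervised contrastive learning for image classification and object detection, and we discuss its shortcomings in OOD detection.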
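Cursive handwritten text recognition is a challenging research problem in the field of pattern recognition. Current state-of-the-art approaches include models based on convolutional recurrent neural networks and multi-dimensional long short-term memory recurrent neural networks. These models are computationally expensive and complex at the design level. In recent studies, models combining convolutional neural networks with gated convolutional neural networks have required fewer parameters than models based on convolutional recurrent neural networks. To reduce the total number of trainable parameters further, in this work we replace standard convolutions with depthwise convolutions, in combination with a gated convolutional neural network and a bidirectional gated recurrent unit. In addition, we include a lexicon-based word beam search decoder at test time, which also helps improve the overall accuracy of the model. We obtain a 3.84% character error rate and a 9.40% word error rate on the IAM dataset; on the George Washington dataset, we report the character error rate and a 14.56% word error rate.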
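The parameter saving from swapping standard convolutions for depthwise ones can be checked with simple arithmetic. The sketch assumes the usual depthwise-separable form (a depthwise convolution followed by a 1x1 pointwise convolution) and ignores bias terms. The abstract does not state the exact layer layout, so the layer sizes are hypothetical.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution mixes the channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_conv_params(3, 64, 128)
print(std, sep)  # → 73728 8768
print(round(std / sep, 2))
```

For this layer the separable form needs roughly an eighth of the weights, and the ratio approaches k² as the number of output channels grows.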