The general lack of understanding of deep feedforward neural networks (DNNs) can be attributed partly to a lack of tools for analyzing compositions of nonlinear functions, and partly to a lack of mathematical models that accommodate the diversity of DNN architectures. In this paper, we make a number of basic assumptions about activation functions, nonlinear transformations, and DNN architectures so that DNNs can be analyzed via directed acyclic graphs (DAGs). DNNs satisfying these assumptions are called general DNNs. Our construction of the analysis graph is based on an axiomatic approach, in which a DAG is built bottom-up by applying atomic operations to basic elements according to governing rules. This approach allows us to derive properties of general DNNs by mathematical induction, and we show that certain properties of general DNNs can indeed be obtained with the proposed method. Provided that the large body of graph-analysis tools can be exploited, this analysis can deepen our understanding of network functions and may facilitate further theoretical insight.
We study the expressibility and learnability of convex optimization solution functions and their multi-layer architectural extension. The main results are: \emph{(1)} the class of solution functions of linear programming (LP) and quadratic programming (QP) is a universal approximant for the $C^k$ smooth model class or some restricted Sobolev space, and we characterize the rate-distortion, \emph{(2)} the approximation power is investigated through a viewpoint of regression error, where information about the target function is provided in terms of data observations, \emph{(3)} compositionality in the form of a deep architecture with optimization as a layer is shown to reconstruct some basic functions used in numerical analysis without error, which implies that \emph{(4)} a substantial reduction in rate-distortion can be achieved with a universal network architecture, and \emph{(5)} we discuss the statistical bounds of empirical covering numbers for LP/QP, as well as a generic optimization problem (possibly nonconvex) by exploiting tame geometry. Our results provide the \emph{first rigorous analysis of the approximation and learning-theoretic properties of solution functions} with implications for algorithmic design and performance guarantees.
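As a concrete illustration of what a "solution function used as a layer" means, the sketch below evaluates the argmin map of a small LP with `scipy.optimize.linprog`. The layer name and the toy problem data are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def lp_solution_layer(c, A_ub, b_ub):
    """The 'solution function' of an LP, viewed as a layer: it maps problem data
    (c, A_ub, b_ub) to the argmin of  min_x c.x  s.t.  A_ub x <= b_ub, x >= 0."""
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    return res.x

# A toy evaluation; stacking such layers (optimization as a layer) gives the deep
# architecture whose expressive power the abstract analyzes.
c = np.array([-1.0, -2.0])
A_ub = np.array([[1.0, 1.0], [1.0, 0.0]])
b_ub = np.array([4.0, 3.0])
print(lp_solution_layer(c, A_ub, b_ub))   # optimal vertex [0, 4] for this toy LP
```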
We present a theoretical analysis of a special neural network architecture, termed operator recurrent neural networks, for approximating nonlinear functions whose inputs are linear operators. Such functions commonly arise in solution algorithms for inverse boundary value problems. Traditional neural networks treat input data as vectors, and therefore they do not effectively capture the multiplicative structure associated with the linear operators that correspond to the data in such inverse problems. We therefore introduce a new family of architectures, similar to standard neural networks, in which the input data act multiplicatively on vectors. Motivated by the analysis of compact operators appearing in boundary control and in the inverse boundary value problem for the wave equation, we promote structure and sparsity in selected weight matrices of the network. After describing this architecture, we study its representation properties as well as its approximation properties. We also show that an explicit regularization can be introduced that is derived from the mathematical analysis of the inverse problem and that leads to certain guarantees on the generalization properties; we observe that sparsity of the weight matrices improves the generalization estimates. Finally, we discuss how operator recurrent networks can be viewed as a deep-learning analogue of algorithms such as boundary control for reconstructing the unknown wave speed in the acoustic wave equation from boundary measurements.
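A minimal sketch of the multiplicative idea, assuming a layer of the form h ↦ σ(W h + B (A h) + b), where the input A is a discretized linear operator acting on the hidden state; the exact parametrization and sparsity structure used in the paper may differ.

```python
import numpy as np

def operator_recurrent_layer(h, A, W, B, b):
    """One layer of an operator recurrent net (sketch): the input datum is a linear
    operator A (here a matrix discretization) that acts multiplicatively on the hidden
    state h, instead of being flattened into a feature vector."""
    return np.maximum(W @ h + B @ (A @ h) + b, 0.0)

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))       # the "data": a discretized linear operator
h = rng.standard_normal(n)
for _ in range(4):                    # unrolled recurrence over several layers
    W, B, b = rng.standard_normal((n, n)), rng.standard_normal((n, n)), np.zeros(n)
    h = operator_recurrent_layer(h, A, W, B, b)
```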
The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces or finite sets. We propose a generalization of neural networks to learn operators that map between infinite-dimensional function spaces. We formulate the approximation of operators through a composition of a class of linear integral operators and nonlinear activation functions, so that the composed operator can approximate complex nonlinear operators. We prove a universal approximation theorem for our architecture. Moreover, we introduce four classes of operator parameterizations: graph-based operators, low-rank operators, multipole graph-based operators, and Fourier operators, and describe efficient algorithms for computing with each one. The proposed neural operators are resolution-invariant: they share the same network parameters across different discretizations of the underlying function space and can be used for zero-shot super-resolution. Numerically, the proposed models show superior performance compared with existing machine-learning-based methodologies on Darcy flow and the Navier-Stokes equation, while being substantially faster than conventional PDE solvers.
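The sketch below illustrates the Fourier-operator parameterization in the simplest one-dimensional setting: filter the lowest Fourier modes of the sampled input function with learnable weights, add a pointwise linear term, and apply a nonlinearity. Names such as `fourier_layer` and `n_modes` are illustrative assumptions; this is not the authors' released implementation.

```python
import numpy as np

def fourier_layer(v, W_modes, W_local, n_modes):
    """One Fourier-operator layer (1-D sketch): learnable multiplication on the lowest
    Fourier modes of the input function, plus a pointwise linear term and a ReLU."""
    v_hat = np.fft.rfft(v, axis=0)                # spectral coefficients of the input samples
    v_hat[:n_modes] = W_modes @ v_hat[:n_modes]   # learnable mixing of the retained modes
    v_hat[n_modes:] = 0.0                         # truncate the remaining modes
    v_spec = np.fft.irfft(v_hat, n=v.shape[0], axis=0)
    return np.maximum(v_spec + W_local * v, 0.0)  # ReLU(K v + W v)

# Usage: 256 grid points, keep 16 modes; the same weights apply at any resolution,
# which is the sense in which the layer is resolution-invariant.
rng = np.random.default_rng(0)
v = np.sin(2 * np.pi * np.linspace(0, 1, 256))
W_modes = rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))
out = fourier_layer(v, W_modes, W_local=0.5, n_modes=16)
```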
We provide a complete characterization of the families of probability distributions that are invariant under the action of ReLU neural network layers. The need for such families arises during the training of Bayesian networks or the analysis of trained neural networks, for example in the scope of uncertainty quantification (UQ) or explainable artificial intelligence (XAI). We prove that no invariant parameterized family of distributions can exist unless at least one of the following three restrictions holds: first, the network layers have width one, which is unreasonable for practical neural networks; second, the probability measures in the family have finite support, which essentially amounts to sampling distributions; third, the parameterization of the family is not locally Lipschitz continuous, which excludes all computationally feasible families. Finally, we show that these restrictions are individually necessary: for each of the three cases we can construct an invariant family exploiting exactly one of the restrictions but not the others.
As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proven successful in various fields, and most recently in deep learning. To apply PWLNN methods, both representation and learning have long been studied. In 1977, the canonical representation pioneered the work on shallow PWLNNs learned by incremental designs, but applications to large-scale data were prohibitive. In 2010, the rectified linear unit (ReLU) advocated the prevalence of PWLNNs in deep learning. Since then, PWLNNs have been successfully applied to a wide range of tasks and have achieved favourable performance. In this Primer, we systematically introduce the methodology of PWLNNs by grouping the works into shallow and deep networks. First, different PWLNN representation models are constructed with detailed examples (a minimal one is sketched below). The evolution of learning algorithms for fitting data with PWLNNs is then presented, followed by fundamental theoretical analysis for an in-depth understanding. Finally, representative applications are introduced, together with discussions and outlooks.
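For concreteness, here is a minimal sketch of the 1977 canonical piecewise-linear representation mentioned above (Chua and Kang's form for a scalar-valued shallow PWLNN); the toy coefficients are illustrative assumptions.

```python
import numpy as np

def canonical_pwl(x, a, b, c, alphas, betas):
    """Canonical piecewise-linear representation of a scalar-valued shallow PWLNN:
    f(x) = a + b.x + sum_i c_i * |alphas_i . x - betas_i|."""
    return a + b @ x + c @ np.abs(alphas @ x - betas)

# A toy evaluation; a deep ReLU network is also piecewise linear, but it composes
# such pieces layer by layer instead of summing them in a single hidden stage.
x = np.array([0.3, -1.2])
a, b = 0.5, np.array([1.0, -2.0])
c = np.array([0.7, -0.4, 1.1])
alphas = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
betas = np.array([0.0, 0.5, -1.0])
print(canonical_pwl(x, a, b, c, alphas, betas))
```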
Although theoretical properties such as expressive power and over-smoothing of graph neural networks (GNNs) have been extensively studied recently, their convergence properties are a relatively new direction. In this paper, we investigate the convergence of one powerful GNN, the Invariant Graph Network (IGN), over graphs sampled from graphons. We first prove the stability of linear layers for general $k$-IGN (of order $k$) based on a novel interpretation of linear equivariant layers. Building upon this result, we prove the convergence of $k$-IGN under the model of \citet{ruiz2020graphon}, where we access the edge weight but the convergence error is measured for graphon inputs. Under the more natural (and more challenging) setting of \citet{keriven2020convergence} where one can only access the 0-1 adjacency matrix sampled according to edge probability, we first show a negative result that the convergence of any IGN is not possible. We then obtain the convergence of a subset of IGNs, denoted as IGN-small, after the edge probability estimation. We show that IGN-small still contains a function class rich enough to approximate spectral GNNs arbitrarily well. Lastly, we perform experiments on various graphon models to verify our statements.
We study the approximation of functions that are invariant with respect to certain permutations of the input indices using flow maps of dynamical systems. Such invariant functions include the well-studied translation-invariant functions arising in image tasks, but also encompass many permutation-invariant functions that find emerging applications in science and engineering. We prove sufficient conditions for universal approximation by controlled equivariant dynamical systems, which can be viewed as a general abstraction of deep residual networks with symmetry constraints. These results not only imply universal approximation for a variety of commonly used neural network architectures for symmetric function approximation, but also guide the design of architectures with approximation guarantees for applications involving new symmetry requirements.
The possibility of recovering the parameters, i.e. the weights and biases, of a neural network from knowledge of its function on a subset of the input space can be, depending on the situation, a curse or a blessing. On the one hand, recovering the parameters allows for better adversarial attacks and could also disclose sensitive information about the dataset used to construct the network. On the other hand, if the parameters of a network can be recovered, it guarantees to the user that the features in the latent space can be interpreted; it also provides a foundation for obtaining formal guarantees on the performance of the network. It is therefore important to characterize the networks whose parameters can be identified and those whose parameters cannot. In this paper, we provide a set of conditions on deep fully-connected feedforward ReLU networks under which the parameters of the network are uniquely identified, modulo permutation and positive rescaling, from its realization on a subset of the input space.
Deep Neural Networks (DNNs) training can be difficult due to vanishing and exploding gradients during weight optimization through backpropagation. To address this problem, we propose a general class of Hamiltonian DNNs (H-DNNs) that stem from the discretization of continuous-time Hamiltonian systems and include several existing DNN architectures based on ordinary differential equations. Our main result is that a broad set of H-DNNs ensures non-vanishing gradients by design for an arbitrary network depth. This is obtained by proving that, using a semi-implicit Euler discretization scheme, the backward sensitivity matrices involved in gradient computations are symplectic. We also provide an upper-bound to the magnitude of sensitivity matrices and show that exploding gradients can be controlled through regularization. Finally, we enable distributed implementations of backward and forward propagation algorithms in H-DNNs by characterizing appropriate sparsity constraints on the weight matrices. The good performance of H-DNNs is demonstrated on benchmark classification problems, including image classification with the MNIST dataset.
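A minimal sketch of one H-DNN layer obtained from a semi-implicit (symplectic) Euler step, assuming a separable Hamiltonian built from log-cosh potentials so that the gradients become tanh nonlinearities; the paper's exact parametrization and weight structure may differ.

```python
import numpy as np

def hdnn_step(p, q, Kp, Kq, bp, bq, h=0.1):
    """One semi-implicit Euler step of a Hamiltonian layer, assuming the separable
    Hamiltonian H(p, q) = sum(log cosh(Kq q + bq)) + sum(log cosh(Kp p + bp));
    the gradient of log cosh is tanh, which gives the updates below."""
    p_new = p - h * Kq.T @ np.tanh(Kq @ q + bq)      # p-update uses the current q
    q_new = q + h * Kp.T @ np.tanh(Kp @ p_new + bp)  # q-update uses the updated p
    return p_new, q_new

# Stacking many such steps yields a deep network; the semi-implicit scheme is what
# makes the backward sensitivity matrices symplectic, hence non-vanishing gradients.
rng = np.random.default_rng(0)
d = 4
p, q = rng.standard_normal(d), rng.standard_normal(d)
Kp, Kq = rng.standard_normal((d, d)), rng.standard_normal((d, d))
bp, bq = np.zeros(d), np.zeros(d)
for _ in range(8):
    p, q = hdnn_step(p, q, Kp, Kq, bp, bq)
```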
These notes were compiled as lecture notes for a course developed and taught at the University of Southern California. They should be accessible to a typical engineering graduate student with a strong background in Applied Mathematics. The main objective of these notes is to introduce a student who is familiar with concepts in linear algebra and partial differential equations to select topics in deep learning. These lecture notes exploit the strong connections between deep learning algorithms and the more conventional techniques of computational physics to achieve two goals. First, they use concepts from computational physics to develop an understanding of deep learning algorithms. Not surprisingly, many concepts in deep learning can be connected to similar concepts in computational physics, and one can utilize this connection to better understand these algorithms. Second, several novel deep learning algorithms can be used to solve challenging problems in computational physics. Thus, they offer someone who is interested in modeling a physical phenomenon a complementary set of tools.
This paper addresses the need for processing non-Euclidean data by introducing a geometric deep learning (GDL) framework for building universal feedforward-type models compatible with differentiable manifold geometries. We show that our GDL models can uniformly approximate any continuous target function on compact sets of controlled maximal diameter. We obtain curvature-dependent lower bounds on this maximal diameter and upper bounds on the depth of the approximating GDL models. Conversely, we find that there is always a continuous function between any two non-degenerate compact manifolds that no "locally defined" GDL model can uniformly approximate. Our last main result identifies data-dependent conditions guaranteeing that the GDL model implementing our approximation breaks the "curse of dimensionality". We find that any "real-world" (i.e., finite) dataset always satisfies our condition, and conversely, any dataset satisfies our requirement if the target function is smooth. As applications, we confirm the universal approximation capabilities of the following GDL models: the hyperbolic feedforward networks of Ganea et al. (2018), the architecture implementing the deep Kalman filter of Krishnan et al. (2015), and deep softmax classifiers. We also construct universal extensions/variants of the SPD-matrix regressors of Meyer et al. (2011) and the Procrustean regressors of Fletcher (2003). In the Euclidean setting, our results imply a quantitative version of the approximation theorem of Kidger and Lyons (2020) and a data-dependent version of the approximation rates of Yarotsky and Zhevnerchuk (2019).
On general regular simplicial partitions $\mathcal{T}$ of bounded polytopal domains $\Omega \subset \mathbb{R}^d$, $d\in\{2,3\}$, we construct \emph{exact neural network (NN) emulations} of all lowest order finite element spaces in the discrete de Rham complex. These include the spaces of piecewise constant functions, continuous piecewise linear (CPwL) functions, the classical ``Raviart-Thomas element'', and the ``N\'{e}d\'{e}lec edge element''. For all but the CPwL case, our network architectures employ both ReLU (rectified linear unit) and BiSU (binary step unit) activations to capture discontinuities. In the important case of CPwL functions, we prove that it suffices to work with pure ReLU nets. Our construction and DNN architecture generalizes previous results in that no geometric restrictions on the regular simplicial partitions $\mathcal{T}$ of $\Omega$ are required for DNN emulation. In addition, for CPwL functions our DNN construction is valid in any dimension $d\geq 2$. Our ``FE-Nets'' are required in the variationally correct, structure-preserving approximation of boundary value problems of electromagnetism in nonconvex polyhedra $\Omega \subset \mathbb{R}^3$. They are thus an essential ingredient in the application of e.g., the methodology of ``physics-informed NNs'' or ``deep Ritz methods'' to electromagnetic field simulation via deep learning techniques. We indicate generalizations of our constructions to higher-order compatible spaces and other, non-compatible classes of discretizations, in particular the ``Crouzeix-Raviart'' elements and Hybridized, Higher Order (HHO) methods.
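In the simplest, one-dimensional case, the exact ReLU emulation of a lowest-order CPwL basis function reduces to the classical "hat function" identity sketched below; the construction in the paper handles general regular simplicial partitions in dimension $d \geq 2$, and the node positions here are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x, a, m, b):
    """Exact ReLU emulation of the 1-D CPwL 'hat' basis function with nodes a < m < b:
    it rises linearly from 0 at a to 1 at m and falls back to 0 at b."""
    return (relu(x - a) / (m - a)
            - (1.0 / (m - a) + 1.0 / (b - m)) * relu(x - m)
            + relu(x - b) / (b - m))

# Any CPwL function on a 1-D mesh is a weighted sum of such hats, hence a shallow ReLU net.
x = np.linspace(-1.0, 2.0, 7)
print(hat(x, a=0.0, m=0.5, b=1.0))   # equals 1 at x = m and vanishes outside (a, b)
```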
Many feedforward neural networks generate continuous and piecewise-linear (CPWL) mappings. Specifically, they partition the input domain into regions on which the mapping is an affine function. The number of these so-called linear regions offers a natural metric to characterize the expressiveness of CPWL mappings. Although the precise determination of this quantity is often intractable, bounds have been proposed for specific architectures, including the well-known ReLU and Maxout networks. In this work, we take a more general perspective and provide precise bounds on the maximal number of linear regions of CPWL networks based on three sources of expressiveness: depth, width, and activation complexity. Our estimates rely on the combinatorial structure of convex partitions and highlight the distinctive role of depth, which on its own is able to increase the number of regions exponentially. We then introduce a complementary stochastic framework to estimate the average number of linear regions produced by a CPWL network architecture. Under reasonable assumptions, the expected density of linear regions along any one-dimensional path is bounded by the product of depth, width, and a measure of activation complexity (up to a scaling factor). This yields an identical role for the three sources of expressiveness: the exponential growth with depth is no longer observed.
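To make the notion of linear regions along a one-dimensional path concrete, the sketch below counts changes of the ReLU activation pattern of a small random network along a segment; the grid resolution and network sizes are illustrative assumptions, not the estimators analyzed in the paper.

```python
import numpy as np

def activation_pattern(weights, biases, x):
    """Which ReLUs are active at point x, layer by layer, in a fully-connected ReLU net."""
    pattern, z = [], x
    for W, b in zip(weights, biases):
        pre = W @ z + b
        pattern.append(tuple(pre > 0))
        z = np.maximum(pre, 0.0)
    return tuple(pattern)

def regions_along_segment(weights, biases, x0, x1, n_samples=5000):
    """Count linear regions crossed on the segment [x0, x1] via activation-pattern changes
    on a fine grid (a sketch; assumes the grid is fine enough to catch every change)."""
    ts = np.linspace(0.0, 1.0, n_samples)
    patterns = [activation_pattern(weights, biases, (1 - t) * x0 + t * x1) for t in ts]
    return 1 + sum(p != q for p, q in zip(patterns, patterns[1:]))

rng = np.random.default_rng(0)
dims = [2, 16, 16]                                   # input dimension and two hidden ReLU layers
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
bs = [0.1 * rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]
print(regions_along_segment(Ws, bs, np.array([-1.0, 0.0]), np.array([1.0, 0.0])))
```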
The use of neural networks in safety-critical systems requires safe and robust models, due to the existence of adversarial attacks. Knowing the minimal adversarial perturbation of any input x, or, equivalently, knowing the distance of x from the classification boundary, allows evaluating the classification robustness and thus providing certifiable predictions. Unfortunately, state-of-the-art techniques for computing such a distance are computationally expensive and hence not suited for online applications. This work proposes a novel family of classifiers, namely signed distance classifiers (SDCs), that, from a theoretical perspective, directly output the exact distance of x from the classification boundary rather than a probability score (e.g., softmax). SDCs represent a family of robust-by-design classifiers. To practically address the theoretical requirements of an SDC, a novel network architecture named the unitary-gradient neural network is proposed. Experimental results show that the proposed architecture approximates a signed distance classifier, thus allowing an online certifiable classification of x at the cost of a single inference.
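For intuition, in the linear case the signed distance that an SDC is meant to output has the closed form d(x) = (w·x + b)/‖w‖, as in the sketch below; producing this quantity for deep nonlinear models is what the proposed architecture approximates. The toy weights are illustrative.

```python
import numpy as np

def signed_distance_linear(x, w, b):
    """Signed distance from x to the decision boundary {z : w.z + b = 0} of a linear
    classifier -- the quantity an SDC is designed to output directly for a deep model."""
    return (w @ x + b) / np.linalg.norm(w)

# |d| lower-bounds the minimal perturbation needed to flip the prediction, so it can be
# used as an online certificate at the cost of a single inference.
w, b = np.array([3.0, 4.0]), -1.0
x = np.array([2.0, 1.0])
d = signed_distance_linear(x, w, b)
print(d, np.sign(d))   # distance 1.8 from the boundary, predicted class +1
```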
The aim of this survey is to present an explanatory review of the approximation properties of deep neural networks. Specifically, we aim to understand how and why deep neural networks outperform other classical linear and nonlinear approximation methods. The survey consists of three chapters. In Chapter 1 we review the key ideas and concepts underlying deep networks and their compositional nonlinear structure. We formalize the neural network problem by formulating it as an optimization problem when solving regression and classification tasks. We briefly discuss the stochastic gradient descent algorithm and the back-propagation formulas used for solving the optimization problem, and address a few issues related to the performance of neural networks, including the choice of activation functions, cost functions, overfitting, and regularization. In Chapter 2 we shift our focus to the approximation theory of neural networks. We start with an introduction to the concept of density in polynomial approximation, in particular studying the Stone-Weierstrass theorem for the approximation of continuous functions. Then, within the framework of linear approximation, we review a few classical results on the density and convergence rate of feedforward networks, followed by more recent developments on the complexity of deep networks in approximating Sobolev functions. In Chapter 3, utilizing nonlinear approximation theory, we further elaborate on the approximation advantages of deep networks compared with other classical nonlinear approximation methods.
We state concentration and martingale inequalities for the output of the hidden layers of a stochastic deep neural network (SDNN), as well as for the output of the whole SDNN. These results allow us to introduce an expected classifier (EC) and to give probabilistic upper bounds for the classification error of the EC. We also state the optimal number of layers for the SDNN via an optimal stopping procedure. We apply our analysis to a stochastic version of a feedforward neural network with ReLU activation functions.
We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network. In particular, we explain analytically what deep learning practitioners have long observed empirically: the parameters of some deep architectures (e.g., residual networks, ResNets, and Dense networks, DenseNets) are easier to optimize than others (e.g., convolutional networks, ConvNets). Building on our earlier work connecting deep networks with continuous piecewise-affine splines, we develop an exact local linear representation of a deep network layer for a family of modern deep networks that includes ConvNets at one end of a spectrum and ResNets, DenseNets, and other networks with skip connections at the other. For regression and classification tasks that optimize the squared-error loss, we show that the optimization loss surface of a modern deep network is piecewise quadratic in the parameters, with local shape governed by the singular values of a matrix that is a function of the local linear representation. We develop new perturbation results for how the singular values of matrices of this sort behave as we add a fraction of the identity and multiply by certain diagonal matrices. A direct application of our perturbation results explains analytically why a network with skip connections (such as a ResNet or DenseNet) is easier to optimize than a ConvNet: thanks to its more stable singular values and smaller condition number, the local loss surface of such a network is less erratic, less eccentric, and features local minima that are more accommodating to gradient-based optimization. Our results also shed new light on the impact of different nonlinear activation functions on a deep network's singular values, regardless of its architecture.
Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with number of potential network parameters exceeding the sample size. The analysis gives some insights into why multilayer feedforward neural networks perform well in practice. Interestingly, for ReLU activation function the depth (number of layers) of the neural network architectures plays an important role and our theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural. It is also shown that under the composition assumption wavelet estimators can only achieve suboptimal rates.
In this paper, we provide sufficient conditions for dissipativity and local asymptotic stability of discrete-time dynamical systems parametrized by deep neural networks. We leverage the representation of neural networks as pointwise affine maps, thereby exposing their local linear operators and making them accessible to classical system-theoretic analysis and design methods. This allows us to "open the black box" of the neural dynamical system's behavior by evaluating its dissipativity and by estimating its fixed points and state-space partitioning. We relate the norms of these local linear operators to the energy stored in the dissipative system, with the supply rates represented by their aggregate bias terms. Empirically, we analyze the variance in dynamical behavior and the eigenvalue spectra of these local linear operators with varying weights, activation functions, bias terms, and depths.
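A minimal sketch of the "pointwise affine map" viewpoint: for a ReLU network, freezing the activation pattern at a point x exposes the local linear operator A and offset b with f(z) = A z + b on the activation region containing x, whose norms and eigenvalues can then be inspected with standard system-theoretic tools. The weights below are random placeholders, not the models studied in the paper.

```python
import numpy as np

def local_affine_map(weights, biases, x):
    """For a ReLU net f with a final linear layer, recover the pointwise affine map
    f(z) = A z + b valid on the activation region containing x, by freezing the ReLU
    activation pattern observed at x."""
    A, b, z = np.eye(x.size), np.zeros_like(x), x.copy()
    for i, (W, c) in enumerate(zip(weights, biases)):
        pre = W @ z + c
        if i < len(weights) - 1:                      # hidden layer: ReLU
            D = np.diag((pre > 0).astype(float))      # diagonal activation-pattern matrix
            A, b = D @ W @ A, D @ (W @ b + c)
            z = np.maximum(pre, 0.0)
        else:                                         # final linear (output) layer
            A, b = W @ A, W @ b + c
            z = pre
    return A, b                                       # f(x) == A @ x + b by construction

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((5, 3)), rng.standard_normal((3, 5)), rng.standard_normal((3, 3))]
bs = [rng.standard_normal(5), rng.standard_normal(3), rng.standard_normal(3)]
x = rng.standard_normal(3)
A, b = local_affine_map(Ws, bs, x)
print(np.linalg.eigvals(A))   # local eigenvalues indicate contraction/expansion near x
```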