Neural ordinary differential equations (NODEs) -- parametrizations of differential equations using neural networks -- have shown tremendous promise in learning models of unknown continuous-time dynamical systems from data. However, every forward evaluation of a NODE requires numerical integration of the neural network used to capture the system dynamics, making their training prohibitively expensive. Existing works rely on off-the-shelf adaptive step-size numerical integration schemes, which often require an excessive number of evaluations of the underlying dynamics network to obtain sufficient accuracy for training. By contrast, we accelerate the evaluation and the training of NODEs by proposing a data-driven approach to their numerical integration. The proposed Taylor-Lagrange NODEs (TL-NODEs) use a fixed-order Taylor expansion for numerical integration, while also learning to estimate the expansion's approximation error. As a result, the proposed approach achieves the same accuracy as adaptive step-size schemes while employing only low-order Taylor expansions, thus greatly reducing the computational cost necessary to integrate the NODE. A suite of numerical experiments, including modeling dynamical systems, image classification, and density estimation, demonstrates that TL-NODEs can be trained more than an order of magnitude faster than state-of-the-art approaches, without any loss in performance.
translated by 谷歌翻译
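The fixed-order Taylor integration described in the TL-NODE abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy dynamics `f` stands in for a trained dynamics network, the derivative is estimated by finite differences rather than automatic differentiation, and the learned estimate of the expansion's remainder, which is the core of TL-NODE, is not reproduced.

```python
import math


def f(x):
    """Toy dynamics dx/dt = -x (hypothetical stand-in for a dynamics network)."""
    return -x


def second_time_derivative(x, eps=1e-6):
    """x'' along the flow, i.e. (df/dx) * f(x).

    df/dx is estimated with a central finite difference here; an autodiff
    framework would normally supply it.
    """
    dfdx = (f(x + eps) - f(x - eps)) / (2.0 * eps)
    return dfdx * f(x)


def taylor2_step(x, h):
    """One second-order Taylor step: x + h*x' + (h^2/2)*x''."""
    return x + h * f(x) + 0.5 * h * h * second_time_derivative(x)


def euler_step(x, h):
    """First-order Taylor (explicit Euler) step, used as a baseline."""
    return x + h * f(x)


def integrate(step, x0, h, n_steps):
    """Apply a fixed-step scheme n_steps times starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = step(x, h)
    return x


exact = math.exp(-1.0)  # true solution of dx/dt = -x at t = 1 for x(0) = 1
err_taylor = abs(integrate(taylor2_step, 1.0, 0.1, 10) - exact)
err_euler = abs(integrate(euler_step, 1.0, 0.1, 10) - exact)
print(err_taylor, err_euler)
```

With the same step size, the second-order expansion is noticeably closer to the exact solution than the first-order one. What remains is the truncation error that the abstract proposes to estimate with a learned term.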
Effective inclusion of physics-based knowledge into deep neural network models of dynamical systems can greatly improve data efficiency and generalization. Such a-priori knowledge might arise from physical principles (e.g., conservation laws) or from the system's design (e.g., the Jacobian matrix of a robot), even if large portions of the system dynamics remain unknown. We develop a framework to learn dynamics models from trajectory data while incorporating a-priori system knowledge as inductive bias. More specifically, the proposed framework uses physics-based side information to inform the structure of the neural network itself, and to place constraints on the values of the outputs and the internal states of the model. It represents the system's vector field as a composition of known and unknown functions, the latter of which are parametrized by neural networks. The physics-informed constraints are enforced via the augmented Lagrangian method during the model's training. We experimentally demonstrate the benefits of the proposed approach on a variety of dynamical systems -- including a benchmark suite of robotics environments featuring large state spaces, non-linear dynamics, external forces, contact forces, and control inputs. By exploiting a-priori system knowledge during training, the proposed approach learns to predict the system dynamics two orders of magnitude more accurately than a baseline approach that does not include prior knowledge, given the same training dataset.
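As a rough illustration of the augmented Lagrangian method mentioned in the abstract above, the sketch below minimizes a small quadratic objective under one equality constraint. The objective, the constraint, and all hyperparameters (`mu`, `lr`, and the iteration counts) are hypothetical choices for this example. The paper applies the method to neural network training, which is not shown here.

```python
def objective_grad(x, y):
    """Gradient of the toy objective f(x, y) = (x - 2)^2 + (y - 1)^2."""
    return 2.0 * (x - 2.0), 2.0 * (y - 1.0)


def constraint(x, y):
    """Equality constraint c(x, y) = x + y - 1, which should equal 0."""
    return x + y - 1.0


def augmented_lagrangian_solve(mu=10.0, lr=0.05, outer=20, inner=200):
    """Minimize f subject to c = 0 using L = f + lam*c + (mu/2)*c^2."""
    x, y, lam = 0.0, 0.0, 0.0
    for _ in range(outer):
        # Inner loop: gradient descent on the augmented Lagrangian in (x, y).
        for _ in range(inner):
            gx, gy = objective_grad(x, y)
            c = constraint(x, y)
            # dc/dx = dc/dy = 1 for this linear constraint.
            gx += lam + mu * c
            gy += lam + mu * c
            x -= lr * gx
            y -= lr * gy
        # Outer loop: update the multiplier with the remaining violation.
        lam += mu * constraint(x, y)
    return x, y, lam


x_opt, y_opt, lam_opt = augmented_lagrangian_solve()
print(x_opt, y_opt, lam_opt)
```

For this toy problem the constrained minimizer is the projection of (2, 1) onto the line x + y = 1, namely (1, 0), so the result is easy to check by hand.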
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a blackbox differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
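The idea of parameterizing the derivative of the hidden state and obtaining the output from a differential equation solver can be sketched as follows. This is an illustration under stated assumptions: the scalar `dynamics` function and its parameters `(w, b)` are hypothetical, and a fixed-step Runge-Kutta scheme is swapped in for the adaptive black-box solver and adjoint-based training described in the abstract.

```python
import math


def dynamics(h, t, params):
    """Tiny 'network' giving dh/dt; params = (w, b) are hypothetical weights."""
    w, b = params
    return math.tanh(w * h + b)


def rk4_solve(func, h0, t0, t1, params, n_steps=20):
    """Fixed-step classical Runge-Kutta solve from t0 to t1."""
    dt = (t1 - t0) / n_steps
    h, t = h0, t0
    for _ in range(n_steps):
        k1 = func(h, t, params)
        k2 = func(h + 0.5 * dt * k1, t + 0.5 * dt, params)
        k3 = func(h + 0.5 * dt * k2, t + 0.5 * dt, params)
        k4 = func(h + dt * k3, t + dt, params)
        h += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return h


def ode_block(h0, params, n_steps=20):
    """'Continuous-depth layer': the output is the ODE state at t = 1."""
    return rk4_solve(dynamics, h0, 0.0, 1.0, params, n_steps)


print(ode_block(0.5, (1.0, 0.1)))
```

The `n_steps` argument plays the role of the precision-for-speed trade-off mentioned in the abstract: fewer steps mean fewer evaluations of `dynamics` at the cost of a larger integration error.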
Neural ordinary differential equations (neural ODEs) have emerged as a novel network architecture that bridges dynamical systems and deep learning. However, the gradient obtained with the continuous adjoint method in the vanilla neural ODE is not reverse-accurate. Other approaches suffer either from an excessive memory requirement due to deep computational graphs or from limited choices for the time integration scheme, hampering their application to large-scale complex dynamical systems. To achieve accurate gradients without compromising memory efficiency and flexibility, we present a new neural ODE framework, PNODE, based on high-level discrete adjoint algorithmic differentiation. By leveraging discrete adjoint time integrators and advanced checkpointing strategies tailored for these integrators, PNODE can provide a balance between memory and computational costs, while computing the gradients consistently and accurately. We provide an open-source implementation based on PyTorch and PETSc, one of the most commonly used portable, scalable scientific computing libraries. We demonstrate the performance through extensive numerical experiments on image classification and continuous normalizing flow problems. We show that PNODE achieves the highest memory efficiency when compared with other reverse-accurate methods. On the image classification problems, PNODE is up to two times faster than the vanilla neural ODE and up to 2.3 times faster than the best existing reverse-accurate method. We also show that PNODE enables the use of the implicit time integration methods that are needed for stiff dynamical systems.
Many dynamical systems -- from robots interacting with their surroundings to large-scale multiphysics systems -- involve a number of interacting subsystems. Toward the objective of learning composite models of such systems from data, we present i) a framework for compositional neural networks, ii) algorithms to train these models, iii) a method to compose the learned models, iv) theoretical results that bound the error of the resulting composite models, and v) a method to learn the composition itself, when it is not known a priori. The end result is a modular approach to learning: neural network submodels are trained on trajectory data generated by relatively simple subsystems, and the dynamics of more complex composite systems are then predicted without requiring additional data generated by the composite systems themselves. We achieve this compositionality by representing the system of interest, as well as each of its subsystems, as a port-Hamiltonian neural network (PHNN) -- a class of neural ordinary differential equations that uses the port-Hamiltonian systems formulation as inductive bias. We compose collections of PHNNs by using the system's physics-informed interconnection structure, which may be known a priori, or may itself be learned from data. We demonstrate the novel capabilities of the proposed framework through numerical examples involving interacting spring-mass-damper systems. Models of these systems, which include nonlinear energy dissipation and control inputs, are learned independently. Accurate compositions are learned using an amount of training data that is negligible in comparison with that required to train a new model from scratch. Finally, we observe that the composite PHNNs enjoy properties of port-Hamiltonian systems, such as cyclo-passivity -- a property that is useful for control purposes.
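To make the port-Hamiltonian formulation referred to above more concrete, here is a hand-written sketch of a single spring-mass-damper in the form dx/dt = (J - R) grad H + G u. The linear damping, the parameter values (`k`, `m`, `d`), and the RK4 integrator are assumptions for illustration. In a PHNN the Hamiltonian and dissipation would be learned networks, which this sketch does not attempt.

```python
def hamiltonian(q, p, k=1.0, m=1.0):
    """Stored energy H = k*q^2/2 + p^2/(2*m) for position q and momentum p."""
    return 0.5 * k * q * q + 0.5 * p * p / m


def ph_vector_field(q, p, u, k=1.0, m=1.0, d=0.5):
    """dx/dt = (J - R) * grad H + G * u with J = [[0, 1], [-1, 0]],
    R = [[0, 0], [0, d]] and G = [0, 1]."""
    dH_dq, dH_dp = k * q, p / m
    dq = dH_dp
    dp = -dH_dq - d * dH_dp + u
    return dq, dp


def simulate(q, p, u=0.0, dt=0.01, n_steps=1000):
    """Integrate with classical RK4 under a constant input u; return (q, p)."""
    for _ in range(n_steps):
        k1 = ph_vector_field(q, p, u)
        k2 = ph_vector_field(q + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1], u)
        k3 = ph_vector_field(q + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1], u)
        k4 = ph_vector_field(q + dt * k3[0], p + dt * k3[1], u)
        q += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        p += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return q, p


q_end, p_end = simulate(1.0, 0.0)
print(hamiltonian(1.0, 0.0), hamiltonian(q_end, p_end))
```

With zero input, the stored energy can only be dissipated through `R`, so the final energy is lower than the initial one. This is the passivity-type behavior the abstract alludes to.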
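The combination of ordinary differential equations and neural networks, i.e., neural ordinary differential equations (Neural ODEs), has been widely studied from various angles. However, deciphering the numerical integration in Neural ODEs remains an open challenge, as many studies have shown that numerical integration significantly affects the performance of the model. In this paper, we propose the inverse modified differential equation (IMDE) to clarify the influence of numerical integration on training Neural ODE models. The IMDE is determined by the learning task and the employed ODE solver. It is shown that training a Neural ODE model actually returns a close approximation of the IMDE, rather than the true ODE. With the help of the IMDE, we deduce that (i) the discrepancy between the learned model and the true ODE is bounded by the sum of the discretization error and the learning loss; (ii) Neural ODEs using non-symplectic numerical integration theoretically fail to learn conservation laws. Several experiments are performed to numerically verify our theoretical analysis.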
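Learning dynamics governed by differential equations is crucial for predicting and controlling systems in science and engineering. The neural ordinary differential equation (NODE), a deep learning model integrated with differential equations, has recently become popular for learning dynamics owing to its robustness to irregular samples and its flexibility with high-dimensional inputs. However, the training of NODEs is sensitive to the precision of the numerical solver, which makes their convergence unstable, especially for unstable dynamical systems. In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervised signal in the training of NODEs. Specifically, we pre-train a neural differential operator (NDO) to output an estimate of the derivatives, which serves as an additional supervised signal. The NDO is pre-trained on a class of basis functions and learns the mapping from trajectory samples of these functions to their derivatives. To leverage both the trajectory signal and the estimated derivatives from the NDO, we propose an algorithm called NDO-NODE, in which the loss function contains two terms: the fitness on the true trajectory samples and the fitness on the estimated derivatives output by the pre-trained NDO. Experiments on various dynamics show that the proposed NDO-NODE can consistently improve forecasting accuracy with a single pre-trained NDO. Especially for stiff ODEs, we observe that NDO-NODE captures the transitions in the dynamics more accurately than other regularization methods.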
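Diffusion probabilistic models (DPMs) are emerging powerful generative models. Despite their high-quality generation performance, DPMs still suffer from slow sampling, as they generally need hundreds or thousands of sequential function evaluations (steps) of large neural networks to draw a sample. Sampling from DPMs can alternatively be viewed as solving the corresponding diffusion ordinary differential equations (ODEs). In this work, we propose an exact formulation of the solution of diffusion ODEs. The formulation analytically computes the linear part of the solution, rather than leaving all terms to black-box ODE solvers as adopted in previous works. By applying a change of variables, the solution can be equivalently simplified to an exponentially weighted integral of the neural network. Based on our formulation, we propose DPM-Solver, a fast dedicated high-order solver for diffusion ODEs with a convergence order guarantee. DPM-Solver is suitable for both discrete-time and continuous-time DPMs without any further training. Experimental results show that DPM-Solver can generate high-quality samples in only 10 to 20 function evaluations on various datasets. We achieve 4.70 FID in 10 function evaluations and 2.87 FID in 20 function evaluations on the CIFAR10 dataset, and a $4\sim 16\times$ speedup compared with previous state-of-the-art training-free samplers on various datasets.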
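The benefit of computing the linear part of a semi-linear ODE analytically, rather than handing every term to a black-box solver, can be seen in a small sketch. This is not DPM-Solver itself: it is a plain first-order exponential integrator (exponential Euler) on a hypothetical scalar problem dx/dt = -lam*x + N(x), with `lam`, the step size, and the `nonlinear` term chosen only for illustration.

```python
import math


def exponential_euler(x0, lam, nonlinear, h, n_steps):
    """Integrate dx/dt = -lam*x + nonlinear(x), with lam != 0, by solving the
    linear part exactly and holding the nonlinear term constant over each step."""
    decay = math.exp(-lam * h)
    weight = (1.0 - decay) / lam  # integral of exp(-lam*s) for s in [0, h]
    x = x0
    for _ in range(n_steps):
        x = decay * x + weight * nonlinear(x)
    return x


def explicit_euler(x0, lam, nonlinear, h, n_steps):
    """Black-box baseline that treats the whole right-hand side numerically."""
    x = x0
    for _ in range(n_steps):
        x = x + h * (-lam * x + nonlinear(x))
    return x


lam, h, n_steps = 50.0, 0.1, 10
forcing = lambda x: 1.0  # constant stand-in for the network term
exact = 1.0 / lam + (1.0 - 1.0 / lam) * math.exp(-lam * h * n_steps)
x_exp = exponential_euler(1.0, lam, forcing, h, n_steps)
x_euler = explicit_euler(1.0, lam, forcing, h, n_steps)
print(exact, x_exp, x_euler)
```

At this step size the explicit scheme is unstable because of the stiff linear term, while the exponential scheme stays accurate with the same number of function evaluations. The weighting factor `weight` is the simplest instance of an exponentially weighted integral of the non-linear term.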
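We propose a novel second-order optimization framework for training the emerging deep continuous-time models, specifically neural ordinary differential equations (Neural ODEs). Since their training already involves expensive gradient computation by solving a backward ODE, deriving efficient second-order methods becomes highly nontrivial. Nevertheless, inspired by the recent optimal control (OC) interpretation of training deep networks, we show that a specific continuous-time OC methodology, called Differential Programming, can be adopted to obtain higher-order derivatives at the same O(1) memory cost. We further explore a low-rank representation of the second-order derivatives and show that it leads to efficient preconditioned updates with the aid of Kronecker-based factorization. The resulting method -- named SNOpt -- converges much faster than first-order baselines in wall-clock time, and the improvement remains consistent across various applications, e.g., image classification, generative flows, and time-series prediction. Our framework also enables direct architecture optimization, such as of the integration time of Neural ODEs, with second-order feedback policies, strengthening the OC perspective as a principled tool for optimization in deep learning. Our code is available online.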
Relying on recent research results on Neural ODEs, this paper presents a methodology for the design of state observers for nonlinear systems based on Neural ODEs, learning Luenberger-like observers and their nonlinear extension (Kazantzis-Kravaris-Luenberger (KKL) observers) for systems with partially-known nonlinear dynamics and fully unknown nonlinear dynamics, respectively. In particular, for tuneable KKL observers, the relationship between the design of the observer and its trade-off between convergence speed and robustness is analysed and used as a basis for improving the robustness of the learning-based observer in training. We illustrate the advantages of this approach in numerical simulations.
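Stochastic partial differential equations (SPDEs) are the mathematical tool of choice for modelling dynamical systems under the influence of randomness. By recasting the search for a mild solution of an SPDE as a neural fixed-point problem, we introduce the Neural SPDE model to learn solution operators of PDEs with (possibly stochastic) forcing from partially observed data. Our model provides an extension to two classes of physics-inspired neural architectures. On the one hand, it extends Neural CDEs, SDEs, and RDEs -- continuous-time analogues of RNNs -- in that it is capable of processing incoming sequential information even when the latter evolves in an infinite-dimensional state space. On the other hand, it extends Neural Operators -- generalizations of neural networks that model mappings between spaces of functions -- in that it can be used to learn solution operators $(u_0, \xi) \mapsto u$ of SPDEs depending simultaneously on the initial condition $u_0$ and a realization of the driving noise $\xi$. A Neural SPDE is resolution-invariant, it may be trained using memory-efficient backpropagation based on implicit differentiation, and, once trained, its evaluation is three orders of magnitude faster than traditional solvers. Experiments on various semilinear SPDEs, including the 2D stochastic Navier-Stokes equations, demonstrate how Neural SPDEs are capable of learning complex spatiotemporal dynamics with better accuracy than all alternative models while using only a modest amount of training data.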
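The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisy and partially observed data. We compare pure data-driven learning with hybrid models that incorporate imperfect domain knowledge. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and with errors that are memoryless. First, we study memoryless linear (w.r.t. parametric dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square root of T, the time interval over which training data is specified. Second, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error. In addition, we connect one class of RNNs to reservoir computing, thereby relating the learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz '63, Lorenz '96 multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially observed data, and illustrate the challenges in representing memory by this approach and in training such models.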
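We introduce a new stochastic verification algorithm that formally quantifies the behavioral robustness of any time-continuous process formulated as a continuous-depth model. Our algorithm solves a set of global optimization (Go) problems over a given time horizon to construct a tight enclosure (Tube) of the set of all process executions starting from a ball of initial states. We call our algorithm GoTube. Through its construction, GoTube ensures that the bounding tube is conservative up to a desired probability and up to a desired tightness. GoTube is implemented in JAX and optimized to scale to complex continuous-depth neural network models. Compared to advanced reachability analysis tools for time-continuous neural networks, GoTube does not accumulate over-approximation errors between time steps and avoids the infamous wrapping effect inherent in symbolic techniques. We show that GoTube substantially outperforms state-of-the-art verification tools in terms of the size of the initial ball, speed, time horizon, task completion, and scalability on a large set of experiments. GoTube is stable and sets the state of the art in terms of its ability to scale to time horizons well beyond what was previously possible.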