NeuralUQ: A comprehensive library for uncertainty quantification in neural differential equations and operators
Abstract
Uncertainty quantification (UQ) in machine learning is currently drawing increasing research interest, driven by the rapid deployment of deep neural networks across different fields, such as computer vision, natural language processing, and the need for reliable tools in risk-sensitive applications. Recently, various machine learning models have also been developed to tackle problems in the field of scientific computing with applications to computational science and engineering (CSE). Physics-informed neural networks and deep operator networks are two such models for solving partial differential equations and learning operator mappings, respectively. In this regard, a comprehensive study of UQ methods tailored specifically for scientific machine learning (SciML) models has been provided in [psaros2022uncertainty]. Nevertheless, and despite their theoretical merit, implementations of these methods are not straightforward, especially in large-scale CSE applications, hindering their broad adoption in both research and industry settings. In this paper, we present an open-source Python library (github.com/Crunch-UQ4MI), termed NeuralUQ and accompanied by an educational tutorial, for employing UQ methods for SciML in a convenient and structured manner. The library, designed for both educational and research purposes, supports multiple modern UQ methods and SciML models. It is based on a succinct workflow and facilitates flexible employment and easy extensions by the users. We first present a tutorial of NeuralUQ and subsequently demonstrate its applicability and efficiency in four diverse examples, involving dynamical systems and high-dimensional parametric and time-dependent PDEs.
remarkRemark \newsiamremarkhypothesisHypothesis \newsiamthmclaimClaim \headersNeuralUQZ. Zou, X. Meng, A. F. Psaros, G. E. Karniadakis
ducation software, uncertainty quantification, scientific machine learning, PINNs, DeepONet, deep learning
65-01, 65-04, 65L99, 65M99, 65N99
1 Introduction
Physics-informed and more generally scientific machine learning (SciML) provide potent tools for accurately modeling and predicting the dynamic response of physical systems; see [karniadakis2021physicsinformed, cuomo2022scientific, willard2021integrating] for a comprehensive review as well as [cai2022physics, psaros2022uncertainty]. Neural networks (NNs) for solving ordinary/partial differential equations (ODEs, PDEs), such as the physics-informed NN (PINN), and for learning operator mappings, such as the deep operator network (DeepONet), seamlessly combine scattered observational data with available physical models (e.g. [lye2020deep, kashinath2021physics, trask2022enforcing, rao2020physics, sun2020surrogate]). Specifically, they can address mixed problems when only partial observational data and information regarding the underlying physical system are available (e.g., [raissi2020hidden, cai2021flow]); it is straightforward to incorporate noisy and multi-fidelity data (e.g., [yang2021bpinns, meng2022learning, meng2020composite, meng2021multifidelity]); they can provide physically consistent predictions even for extrapolatory tasks (e.g., [yang2021bpinns]); they do not require computationally expensive mesh generation (e.g., [raissi2019physicsinformed, lu2021deepxde]); and open-source software is available and continuously extended (e.g., [lu2021deepxde, hennigh2021nvidia, koryagin2019pydens, chen2020neurodiffeq, rackauckas2019diffeqflux, rackauckas2020universal, haghighat2021sciann, xu2020adcme]).
However, uncertainty, associated with noisy and limited data as well as NN overparameterization, for example, can degrade the accuracy of these models significantly [abdar2021review, psaros2022uncertainty, tran2016edward, gardner2018gpytorch]. As a result, uncertainty quantification (UQ) is paramount for deep learning to be reliably used in critical applications involving physical and biological systems [wilson2020case, qin2021deep, tipireddy2021time, babaee2020multifidelity, zhu2019physics, tripathy2018deep]. In Fig. 1, we provide an overview of SciML problems of interest as well as different sources of uncertainty. In addition to reliability concerns, UQ is also necessary for efficient and economical design; for capital budgeting and risk-adaptive decision making in related projects; and even for assessing investment opportunities involving such systems in the form of real options [richard2006real, de2008real]. Clearly, this broad view of applications gives to UQ an additional dimension pertaining to the potential of harnessing uncertainty, and renders it an indispensable research area of SciML applied to computational science and engineering problems and beyond.
Further, UQ in SciML is considerably more challenging than in traditional numerical solvers because of optimization-related issues of over-parameterized NNs, as well as the various uncertainties shown in Fig. 1. In addition, SciML is applied to increasingly more diverse problems, from steady to time-dependent and high-dimensional ones, and to problems involving systems given only in the form of black-box models (see, e.g., [yang2021bpinns, meng2022learning, psaros2022uncertainty, han2018solving, darbon2020overcoming]). Moreover, evaluating the performance of the available UQ methods is not straightforward, especially in situations involving limited data. In this regard, Psaros et al. [psaros2022uncertainty] provided an extensive review and proposed novel methods of UQ for SciML. Nevertheless, employing current methods in large-scale applications and developing new ones, where trustworthy implementations and fast experimentation are needed, requires an up-to-date, well-maintained, and versatile software package. To this end, we present in this paper an open-source Python library termed NeuralUQ, which can be accessed at github.com/Crunch-UQ4MI. We expect that NeuralUQ will be a useful toolbox for both practitioners and researchers, who either choose to apply off-the-shelf methods for solving their problems, or to build their own methods based on the ones the library currently supports. In passing, we note that NeuralUQ is constantly updated and its most recent capabilities can be viewed in the GitHub repository.
We organize the paper as follows. In Section 2, we briefly present SciML, specifically the PINN and DeepONet methods, as well as recent advances in UQ for SciML, which are the building blocks for the library; however, other neural models for PDEs and operators can be employed as well. In Section 3, we describe the design of the NeuralUQ library, which includes model construction, posterior inference, and performance evaluation as well as post-training calibration. In Section 4, we provide four computational examples for demonstrating the applicability and capabilities of our library. Lastly, in Section 5, we conclude with the findings of this work and discuss future research.
2 Uncertainty quantification for scientific machine learning
In this section, we first briefly describe various SciML problem scenarios and the recently developed deep learning methods for solving them. Subsequently, we review the related UQ methods as well as the approaches for model performance evaluation and for post-training calibration. We also provide a glossary table of frequently used terms in this paper, and in ML/SciML more generally, in Appendix 6.1. The interested reader is directed to [psaros2022uncertainty] for more detailed information regarding UQ methods for SciML.
2.1 Scientific machine learning
Consider a nonlinear ODE/PDE describing the dynamics of a physical system as follows: \cref@addtoresetequationparentequation
(1a) | ||||
(1b) |
where is the -dimensional space-time coordinate, is a bounded domain with boundary , is a random event in a probability space , while and represent the -dimensional source term and sought solution evaluated at , respectively. Further, is a general differential operator; is a boundary/initial condition operator acting on the domain boundary ; and is a boundary/initial condition term. Furthermore, , with , denotes the problem parameters. We refer to problems where the operators and are known, i.e., the differential operators that define them are specified, as neural ODEs/PDEs, because we use neural networks (NNs) throughout the present work to solve them. We refer to problems involving unknown operators and as neural operators, because we use NNs to learn them.
For neural ODEs/PDEs, , are known and deterministic, i.e. is fixed, while , and are partially unknown. Given noisy data of and sampled at random finite locations in and , respectively, as well as noisy data of and , the objective of this problem is to obtain the solution at every and the problem parameters at every . The dataset in this case is expressed as , where , , , , and are given as above. The PINN method, developed in [raissi2019physicsinformed], addresses the PDE problems deterministically, by constructing NN approximators, parameterized by , for modeling the solution . The approximator is substituted into Eq. (1) via automatic differentiation [tensorflow2015-whitepaper] for producing and . For solving the mixed deterministic PDE problem, an additional NN can be constructed for modeling . Finally, the optimization problem for obtaining in the mixed problem scenario with the dataset is cast as
(2) |
In this equation, are objective function weights for balancing the various terms in Eq. (2). Following determination of , and can be evaluated at any in , , respectively.
For neural operators, , are unknown and stochastic, while are partially unknown. Note that the stochasticity of and arises from the stochasticity in with . The available data consists of paired realizations of , and , each pair associated with a different . The realizations are considered noisy in general for , and clean for , and . Further, the objective of this problem is to learn the operator mapping defined by Eq. (1). The DeepONet method, developed in [lu2021learning] and designed for general nonlinear operator learning, addresses this problem by constructing an approximator using NNs, denoted by and parameterized by , to model the mapping from the input function domain to the output function domain. In the following and without loss of generality, we consider both and in Eq. (1) as known and the problem becomes learning the mapping from the parameter to the solution . Further, each value of in Eq. (1) corresponds to a different function, and thus to a different solution . For any , the approximator takes as input a representation of , and outputs a function, . The DeepONet consists of two sub-networks, one that takes the coordinates as input, called trunk net, and one that takes the discretized function as input, called branch net. Considering a one-dimensional , i.e., , each of the two networks produces a -dimensional output. The two outputs are merged by computing their inner product for producing the final prediction . For learning the mapping from to using the vanilla DeepONet [lu2021comprehensive], the minimization problem with the mean squared error (MSE) loss
(3) |
is solved, using the training dataset , where are the locations, are the sampled events and in this case represent the unique pairing between and , is the measurement of at the location for event , and is the representation of . Note that, to be consistent with neural ODEs/PDEs, we use to denote the approximator for operators in the rest of this paper. In this case, . DeepONets, once trained, could act as PDE-agnostic surrogates encoding the hidden physics for further learning and inferences [meng2022learning]. Finally, note that in addition to supervised learning using data as described above, DeepONet can alternatively be trained by employing only the PINN loss during training. As shown in [goswami2021physicsinformed], the most effective training is the hybrid one, where both physics and data are used in the training loss; see also [wang2021learning].
2.2 Uncertainty quantification
In contrast to the point estimates obtained via the optimization problems of Eqs. (2) and (3), the goal of UQ is to estimate the predictive distribution
(4) |
which is often referred to as Bayesian model average (BMA). In Eq. (4), denotes the posterior density, which can be obtained using Bayes’ rule via
(5) |
In Eq. (5), is the likelihood of the data, i.e., for independent and identically distributed (i.i.d.) data; is the probability density function for the prior distribution of the parameters ; and is called marginal likelihood or evidence, because it represents the probability that we observe . The evidence is given as
(6) |
The uncertainty of parameters is typically referred to as epistemic uncertainty.
Generally, inferring given , i.e., computing the posterior exactly via Eq. (5) is computationally and analytically intractable. This is addressed by approximate posterior inference, which approximates the posterior by another distribution and/or obtains representative samples from the posterior. Specifically, the Bayesian and the deterministic frameworks are the two main families of UQ methods for SciML. The former employs either a sampling method or a variational inference (VI) method to obtain samples from the posterior distribution, whereas the latter can be interpreted as an extension of standard training of deterministic NNs. Various UQ methods have been developed for SciML within these two frameworks. We adopt the unified framework proposed in [psaros2022uncertainty] to include all the UQ models employed in this study in Fig. 2, and present a short overview for them in the rest of this section.
For neural ODEs/PDEs, the solution and/or the problem parameters are modeled with surrogates parameterized by and denoted by in Fig. 2. The Bayesian PINNs (B-PINNs) [yang2020bayesian] and the functional prior (FP) based on the physics-informed generative adversarial networks (PI-GANs) [meng2022learning, yang2020physicsinformed] are two recently proposed physics-informed learning approaches with UQ within the Bayesian framework. The Bayesian neural networks (BNNs) and the generators from trained PI-GANs are employed as the surrogate for in these two methods, respectively. Further, either sampling or variational inference (VI) approaches can be utilized for obtaining the posterior samples for . In addition to the aforementioned methods, the Monte Carlo dropout [gal2016dropout] (MCD) as well as the deep ensemble [lakshminarayanan2017simple] (DEns) approaches, which employ standard, i.e., deterministic, NNs as surrogates for and/or , can be combined with PINNs to quantify uncertainties in problems with known physics [zhang2019quantifying, psaros2022uncertainty].
For neural operators, a DeepONet parameterized by is employed as the surrogate for the solution and denoted by in Fig. 2. In the Bayesian framework, [lin2021accelerated] employed BNNs in DeepONet and a sampling method is then utilized to obtain the posterior samples for the parameters in BNNs. As for deterministic methods, extensions of standard training of DeepONets, such as DEns, are used for quantifying uncertainties in predictions [psaros2022uncertainty, yang2022scalable, pickering2022discovering]. These models correspond to Fig. 2 with being a DeepONet and the dashed arrows are not used. Further, the inference input in these models, i.e. the input to the branch net, should be represented in the same way as used in the training, i.e., the number of the discrete values as well as their locations should be the same. For a more general case in which the inference input and/or output data are noisy and measured at a certain number of random locations, [meng2022learning] proposed to combine a pre-trained DeepONet with a GAN-based FP to quantify the uncertainties in predictions. In [psaros2022uncertainty], a BNN is utilized as an alternative to the generator of a GAN in [meng2022learning]. These two models correspond to Fig. 2 with being either a BNN or the generator from a pre-trained GAN and a pre-trained DeepONet.
In all cases, different values correspond to different functions . We refer to the surrogate combined with a distribution for as a process. Following approximate posterior inference, the posterior samples for are used for obtaining the statistics of the function samples. In this regard, the value of at a location given the data is a random variable denoted by . The mean of is denoted by and approximated by as
(7) |
where is the set of NN predictions corresponding to the samples . Further, the diagonal part of the covariance matrix of is given by
(8) |
In this equation, represents the approximate total uncertainty of , while and denote the aleatoric and the epistemic parts of the total uncertainty, respectively. In practice, a Gaussian (or, e.g., a Student-t) distribution is commonly fitted to the posterior samples. For example, we denote as the Gaussian approximation to the samples at an arbitrary location , where and are obtained via Eqs. (7) and (8), respectively.
2.3 Performance evaluation and post-training calibration
Performance evaluation is necessary for model selection, e.g., prior/architecture optimization, comparison between UQ methods, and overall quality evaluation of the UQ design procedure [pickering2022structure]. As reviewed in [psaros2022uncertainty], performance evaluation metrics include, but are not limited to, the relative error for evaluating accuracy, the mean predictive likelihood for evaluating predictive capacity, and the root mean squared calibration error for evaluating statistical consistency; i.e., how closely the predicted model matches the data-generating distribution.
Further, calibration approaches can be adopted to improve statistical consistency in UQ for SciML [psaros2022uncertainty]. Various approaches have been developed for calibration, e.g., [levi2019evaluating] re-weights the predicted variance optimally to calibrate the output uncertainty; [kuleshov2018accurate] approximates the mis-calibration function and applies it to all predictive distributions, and thus modifies both the output mean and variance; and [zelikman2020crude] fits an empirical distribution to the scaled residuals, and also calibrates both the output mean and variance. To use these approaches, an additional dataset is required after training. This can be a left-out calibration dataset, similar to validation and test sets.
3 Overview and details of NeuralUQ
In this section, we first present an overview of the NeuralUQ library, and subsequently provide details regarding its design, use, and also customization for extended capabilities.
3.1 Library overview
As mentioned in the previous section, the value of at a location given the data is a random variable denoted by . The aim of UQ in the library is to obtain, calibrate, compute the statistics, and evaluate the quality of samples drawn from the posterior distribution . To that end, in the design phase a model is constructed that includes surrogates representing the solution and potentially and/or , as well as likelihood functions representing the data distribution. Surrogates are parameterized by and when combined with probability distributions of are represented as stochastic processes; i.e., a proposed family of solutions. To obtain the function samples for the solutions, a posterior inference method is combined with the model. Finally, the obtained function samples can be calibrated using additional data. In the revision phase, metrics are used for evaluating the performance and re-designing the solution procedure. A schematic workflow of the library is provided in Fig. 3 and Procedure 1.
3.2 Library design
UQ methods in NeuralUQ | |||
---|---|---|---|
UQ framework | UQ family | Process | Inference methods |
Bayesian | Samplable | BNNs, Gens | HMC, LD, MALA, NoUTurn |
Variational | BNNs, Gens | MFVI, MCD | |
Deterministic | Trainable | FNNs, Gens, DeepONets | DEns, SEns, LA |
As discussed in Section 2.2, there are two general UQ frameworks for SciML, namely the Bayesian and the deterministic frameworks. The former relies on Bayes’ theorem, and includes sampling methods, such as Monte Carlo Markov Chain (MCMC), and VI methods, such as mean-field variational inference (MFVI). The latter relies on standard NN training, i.e., minimizing/maximizing an objective function, typically using the MSE or the maximum a posteriori (MAP) metric, between measurements and NN predictions.
The procedural differences of the aforementioned methods necessitate the development of three different UQ method families in our library, i.e., the Samplable, Variational, and Trainable to include all the sampling-, VI- and standard-NN-based methods, respectively. Specifically, the sampling methods target directly the unnormalized posterior density function, without employing and optimizing any trainable variables. The VI-based methods employ pre-defined distributions parameterized by trainable variables, which are optimized to approximate the true posterior distributions. Finally, the standard-NN-based UQ methods obtain point estimates for the parameters in the surrogates using an optimizer, rather than directly sampling from or approximating the posterior distributions. A summary of the currently supported UQ methods in NeuralUQ is provided in Table 1.
3.3 Details of NeuralUQ components
In this section, we describe the main components, built-in functionalities and how to use NeuralUQ.
3.3.1 Processes
As described in Procedure 1, the first step for using NeuralUQ is to specify a process for the sought solution, e.g., BNNs or pre-trained generators, according to the task at hand. In this regard, the surrogates, variables, and Process modules have been developed in NeuralUQ. Specifically, the surrogates module expresses how and are combined to yield the approximators or ( or in Fig. 2); the variables module is used to specify the attributes of the parameters in surrogates, e.g., prior/posterior distributions (Fig. 4); and the Process module is an interface for the implementation of the unified models in Fig. 2, which connects the surrogate with its associated parameters.
Currently, NeuralUQ provides four built-in surrogates, namely, surrogates.identity, used as a surrogate for an unknown constant; fnn, for a fully-connected neural network (FNN); generator, for the generator of a PI-GAN pre-trained with historical data; and deeponet, for a DeepONet. The variables module, on the other hand, is used to specify the prior and/or posterior distributions of surrogate parameters. For instance, the prior distribution is required for surrogate parameters in the Samplable family, while both the prior and the posterior distributions are required for parameters in the Variational family. The built-in variables of NeuralUQ are variables.const, used for parameters of unknown constants as well as the generators of pre-trained GANs/PI-GANs; and fnn, used for FNNs and also DeepONets. Further, as presented in Fig. 4 (also in Table 1 and Section 3.2), different UQ method families support different surrogates and are compatible with different variables. In NeuralUQ, variables.fnn and variables.const are decomposed into three different sub-modules, each one pertaining to a UQ method family of Table 1, i.e., variables.fnn/const.Samplable, Variational, and Trainable.
With the specified surrogate and parameters, we can then instantiate a Process object to create a corresponding process. The surrogate and the attributes for its parameters can then be accessed via Process.surrogate and Process.prior/posterior, respectively. Also, for problems with unknown parameters or multi-physics, more than one process may be employed. The module Process.key can be used to trace each process in NeuralUQ.
3.3.2 Likelihoods
In the Samplable and Variational families, likelihood distributions for the given data are required to construct the unnormalized posterior distributions of the surrogate parameters. Similarly, a metric, namely, MSE or MAP, is required for methods in the Trainable family. In NeuralUQ, the likelihoods module can be used to define the likelihood distributions in Samplable and Variational, as well as the MSE/MAP metrics in Trainable (i.e., Step 3 in Procedure 1). In addition, the currently supported likelihood for Samplable and Variational is the Gaussian distribution, accessed via likelihoods.Normal, and the supported metric for Trainable is the MSE, accessed via likelihoods.MSE. Each likelihood receives inputs, targets and processes as input arguments, for specifying the locations of the measurements, the measurements, and the corresponding processes used to model them. For tasks involving PDEs, the argument pde to the likelihood, which is a Python callable defined separately (i.e., Step 2 in Procedure 1), is also required.
3.3.3 Inferences
Uncertainty in the sought solution and/or unknown problem parameters is associated with the uncertainty of the surrogate parameters. The inferences module has been developed to obtain the posterior samples for the surrogate parameters in NeuralUQ (i.e., Step 5 in Procedure 1). As shown in Table 1, NeuralUQ provides various built-in inference methods in each of the UQ method families. Specifically, NeuralUQ currently supports inferences.HMC, LD, MALA, and NoUTurn in Samplable; inferences.VI, MCD in Variational; and inferences.DEns and SEns in Trainable. Short descriptions of these methods can be found in Table 2.
Inference methods for posterior estimation | ||
Full name | Abbreviation | Short description |
Hamiltonian | HMC | Simulates Hamiltonian dynamics |
Monte Carlo [neal2011mcmc] | followed by the Metropolis-Hasting algorithm | |
Metropolis-adjusted | MALA | Simulates Langevin dynamics |
Langevin algorithm [roberts1998optimal, xifara2014langevin] | followed by the Metropolis-Hasting algorithm | |
Langevin dynamics [neal2011mcmc, welling2011bayesian] | LD | One leap-frog step of HMC |
NoUTurn [hoffman2014no, betancourt2017conceptual] | NUTS | A variant of HMC that adaptively chooses |
the number of steps for the leap-frog integrator | ||
Mean-field | MFVI | Approximates by a Gaussian variational |
variational inference [blundell2015weight] | distribution with diagonal covariance matrix | |
Monte Carlo | MCD | Approximates by a Bernoulli variational |
dropout [gal2016dropout] | distribution with fixed dropout rate | |
Deep ensemble [lakshminarayanan2017simple] | DEns | Standard training times independently |
with random initializations | ||
Snapshot ensemble [huang2017snapshot] | SEns | Standard training with learning rate |
schedule and snapshots | ||
Laplace | LA | Standard training followed by fitting a |
approximation [tierney1986accurate, ritter2018scalable] | Gaussian to approximate |
3.3.4 Performance evaluation and post-training calibration
Performance evaluation, model revision and post-training prediction improvement are crucial for reliably deploying UQ methods in risk-sensitive SciML settings. In NeuralUQ, we have developed the metrics module, which currently includes two different metrics, i.e., metrics.RL2E/MSE, and NLL (i.e., Step 9 in Procedure 1). The former serves as an evaluation metric of the predicted mean and computes the relative error (RL2E) or MSE between the predicted mean and the reference solution, whereas the latter serves as a metric of the predicted uncertainty and computes the negative log likelihood (NLL) [yao2019quality, rahaman2021uncertainty].
Further, we have developed the calibrations module for performing post-training calibration (i.e., Step 10 in Procedure 1). Specifically, this module currently includes one sub-module, i.e., calibrations.var, for the implementation of the method proposed in [levi2019evaluating]. We present a workflow for calibration in Procedure 2, and ; a demonstration example involving post-training calibration can be found in the GitHub repository of the library. We will keep updating NeuralUQ to support more metrics and calibration methods, e.g., [chung2021uncertainty], in the future.
3.4 Customization
The components of NeuralUQ, including surrogates, variables, likelihoods, and inferences, are highly flexible and customizable. In this section, we describe how to configure the library for extending current capabilities and addressing problem scenarios beyond the ones considered in this study.
3.4.1 Surrogates
As mentioned in Section 3.3.1, NeuralUQ currently supports four surrogates, namely, identity, fnn, generator, and deeponet. Users may need to develop and incorporate in the library alternative surrogates. In NeuralUQ, this can be achieved in a straightforward manner by following Procedure 3. Demonstration examples involving customizing new surrogates are presented in Appendix 6.2.
3.4.2 Variables
As mentioned in Section 3.3.1, variables.fnn/const is decomposed into three sub-modules for different UQ method families, and can be customized according to the task at hand. In this section, customization of the variables in the first two families is discussed, as the corresponding variables influence more the performance as compared to the variables in the third one. The built-in prior distribution for Samplable and Variational variables is the Gaussian distribution. A straightforward customization in this context pertains to modifying the mean and/or the standard deviation to obtain different priors. Indicatively, adding a Samplable in NeuralUQ variable can be done by following Procedure 4, which includes the setup of the initial values as well as the log probability density function. Another option pertains to selecting alternative parameterized distributions for performing VI. Demonstration examples involving prior customization as well as selecting alternative distributions for performing VI are presented in Appendix 6.3.
3.4.3 Inference methods
As presented in Section 3.3.3 and summarized in Table 2, NeuralUQ supports various inference methods for each of the UQ method families, i.e., Samplable, Variational, and Trainable. However, depending on the task at hand, users may need to develop in the library new inference methods, e.g., for reducing computational cost. Indicatively, adding a new sampling method can be done by following Procedure 5. In passing, note that all sampling methods included currently in NeuralUQ are implemented using the TensorFlow Probability MCMC python package [lao2020tfp, dillon2017tensorflow], which is computationally efficient and straightforward to use. For consistency reasons, we recommend that users of NeuralUQ employ this package for adding new sampling methods to the library.
4 Demonstration examples
In this section, we provide four computational examples for demonstrating the applicability and capabilities of our library. Specifically, we quantify uncertainty in both neural ODEs/PDEs and neural operators applied to the following problems: (1) a Kraichnan-Orszag system, (2) a susceptible-infected-recovered-dead (SIRD) model for COVID-19, (3) a Korteweg-de Vries problem, and (4) a 100-dimensional Darcy problem. In the first three cases, we obtain and present results using uncertain PINNs, i.e., B-PINNs combined with HMC and PINNs combined with DEns, as representative examples of the Bayesian and deterministic methods, respectively. For the 100-dimensional Darcy problem, we present results using the DeepONet combined with two physics-agnostic models, i.e., physics-agnostic GAN functional prior (PA-GAN-FP) and physics-agnostic BNN functional prior (PA-BNN-FP), as well as DEns. Results corresponding to additional UQ methods and details on the computations in each case, e.g., architectures of NNs and parameters used in the training, are provided in the GitHub repository of our library.
4.1 Kraichnan-Orszag system
The Kraichnan-Orszag model consisting of three coupled nonlinear ODEs describes the temporal evolution of a system with several interacting inviscid shear waves, whose initial conditions are drawn from a Gaussian distribution [wan2006multi]. Here, we use NeuralUQ to solve the inverse Kraichnan-Orszag problem, i.e., identify the unknowns in the governing equations given sparse and noisy measurements on the solutions without the knowledge of the initial conditions. The governing equations are:
(9) | ||||
(10) |
where and are constants.
|
|
Particularly for this problem, we assume that and are unknown, whereas 11 noisy measurements of and are available, and 7 of ; see also Fig. 5. The noise for all measurements is assumed to be Gaussian with zero mean and a known standard deviation set to 0.05. The objective here is to identify and , as well as to reconstruct the complete solutions , , and for , with uncertainties.
We present the predictions for and using different approaches in Fig. 5 and Table 3, respectively. As shown in Fig. 5, for both HMC and DEns, the predicted uncertainty increases at locations with no available data. Further, the error between the predicted mean and the reference solution for is bounded by the total uncertainty. Furthermore, as shown in Table 3, the predicted values for and , using both approaches, are close to the reference values. In the table, the results using MFVI and MCD are also provided for comparisons. In Table 4, we evaluate the predictions for using the RL2E and the NLL metrics. As shown, HMC and DEns provide the most reasonable predictions for the mean as well as uncertainties according to the computed metrics.
HMC | DEns | MFVI | MCD | |
---|---|---|---|---|
(mean std) | ||||
(mean std) |
HMC | DEns | MFVI | MCD | |
---|---|---|---|---|
RL2E | 0.0579 | 0.0585 | 0.2232 | 0.1527 |
NLL () | -3.6558 | -4.0275 | -0.1187 | -1.8930 |
4.2 SIRD model for COVID-19
Scientific models are critical tools for understanding and forecasting the spread of the COVID-19. However, constructing models that are capable of accurately describing the COVID-19 still remains a challenge. Here, we demonstrate the capability of NeuralUQ for learning the corresponding model with uncertainties given measurements. In particular, we consider the SIRD compartmental model with time-dependent parameters/coefficients/rates for the COVID-19 [chen2020time, calafiore2020time, sen2021use], expressed as:
(11) | ||||
where , the transmission rate, , the recovery rate, and , the death rate, are unknown and assumed to be time-dependent functions, and is the total population, which is assumed to be constant and equal to the summation of throughout the time span. and are the number of individuals susceptible to the disease, the number of individuals infected, the (cumulative) number of individuals recovered, and the (cumulative) number of individuals deceased from the disease, respectively, at time . In this example, we use the COVID-19 dataset from Italy [dong2020interactive], which is the same as in [kharazmi2021identifiability].
|
|
In this problem, we utilize the data on , , and in the first 160 days for training. The objective is to predict , , and for , and identify the unknown functions , , and for . The results from HMC and DEns are presented in Fig. 6. We observe that the error between the predicted mean and the reference solution of from both approaches is bounded by the total uncertainties. Further, HMC and DEns yield different predictions for and for times , while they are quite similar for . Also, both methods predict a decrease of and with time. In addition, the predictions for from both methods are similar, i.e., they are close to zero for . Note that reference solutions for the unknown functions , , and , are not available and thus corresponding comparisons are not provided here.
|
|
4.3 Inverse KdV problem
The Korteweg-de Vries (KdV) equation, derived more than 150 years ago, can be used to model surface waves in a shallow canal. It has been applied in various disciplines for modeling surface gravity waves, internal solitons in the ocean, magma flows, and conduit waves, for example. In this section, we solve an inverse KdV problem using NeuralUQ. Specifically, we discover the KdV equation given data for a system with interactions between two solitary waves. The KdV equation is expressed in the general form:
(12) |
where and are constants, and is considered. The exact solution can be expressed as [benes2006decompositions]
(13) |
where
(14) |
and
(15) | ||||
where and are considered.
For the inverse problem, the objective is to identify and , which are considered unknown, and to predict in the entire spatial-temporal domain, given partial noisy measurements on and . Specifically, 200 and 100 random noisy measurements on and , respectively, are considered available; see also Fig. 7. The noise for the measurements on both and is assumed to be Gaussian with zero mean and known standard deviation set to 0.1.
Shown in Figs. 7-7 are the errors between the HMC and DEns predicted means, for at and 0.4, and the reference solutions bounded by the predicted total uncertainties. Further, the predicted values for and , which are provided in Table 5, are close to the reference values for both methods.
HMC | DEns | |
---|---|---|
(mean std) | 1.4095 0.0372 | 1.5452 0.0672 |
(mean std) | 0.2642 0.0057 | 0.3249 0.0234 |
4.4 100-dimensional Darcy problem
Flow through porous media is studied in various disciplines, such as chemical engineering, e.g., for flows in packed-bed catalytic reactors, oil recovery, e.g., for the displacement of oil with water, and in geophysics e.g., for the transport of pollutant in soil. A steady flow through porous media in two dimensions is considered here, which is described by the Darcy’s law as follows:
(16) |
where is the hydraulic conductivity field, denotes the hydraulic head, and is considered. The boundary conditions are: \cref@addtoresetequationparentequation
(17a) | |||
(17b) |
where denotes the unit normal vector of the boundary.
In general, is determined by the pore structure. In this study, we employ a stochastic model for to consider porous media with different micro-structures. In particular, , where is a truncated Karhunen-Loève expansion of a Gaussian process with zero mean and the following kernel: \cref@addtoresetequationparentequation
(18a) | |||
(18b) |
and we only keep the first 100 leading terms of the expansion.
We consider two different scenarios here, i.e., Case A, in which we employ a DeepONet mapping to pre-trained with 9900 different paired data (, ), and the test data are partial noisy measurements on and ; and Case B, in which we employ the DEns method to train a DeepONet using the same training data as in Case A, which also maps to . Complete measurements are used as test data, i.e., a measurement is available at each required location of the DeepONet input. Particularly, we refer to the DeepONet trained with DEns as uncertain DeepONet (U-DeepONet) following [psaros2022uncertainty].
4.4.1 Physics-agnostic functional prior
|
|
For Case A, 20 noisy measurements of and are assumed to be available; see also Figs. 8 and 9 for the respective locations. The objective is to reconstruct the entire and given partial noisy measurements of . Specifically, a pre-trained DeepONet combined with two physics-agnostic models, i.e. a PA-BNN-FP and a PA-GAN-FP, are considered. In particular, the PA-GAN-FP is pre-trained using the same data for training the DeepONet to learn the functional prior for , while we specify a standard normal distribution as the prior for each parameter in the PA-BNN-FP for the prior of . For posterior inference, HMC is used for both models. The details for the pre-trained of DeepONet and PA-GAN-FP can be found in the GitHub repository.
With the pre-trained DeepONet and the physics-agnostic models as the prior in function space, we present the predictions for and in Figs. 8 and 9, respectively. We observe that the errors between the predicted means and the reference solutions for and are mostly bounded by the predicted uncertainties in both methods. Further, the predicted uncertainties from PA-GAN-FP are smaller than those from PA-BNN-FP, which is expected since PA-GAN-FP is equipped with a more informative prior. Interested readers can refer to [meng2022learning, psaros2022uncertainty] for more comparison between the GAN- and BNN-FP.
|
|
4.4.2 Uncertain DeepONet
For Case B, we assume that a complete input is available, i.e., a datapoint is available at each required location of . We quantify model uncertainty in the prediction of using DEns. In particular, we run the DeepONet training algorithm five times with different parameter initializations, and subsequently compute the predicted mean and uncertainties based on the five predictions of ; see Eqs. (7)-(8). The error between the predicted mean and the reference solution of for an unseen is bounded by the predicted uncertainties, as shown in Fig. 10.
5 Summary
Scientific machine learning (SciML) has emerged recently as an effective and powerful tool for data fusion, solving ordinary/partial differential equations (ODEs, PDEs), and learning operator mappings in various scientific and engineering disciplines. Indicative SciML models are the physics-informed neural networks (PINNs) for solving forward and inverse ODE/PDE problems and the deep operator networks (DeepONets) for learning operators that can, for instance, be used as fast solvers of ODEs/PDEs. In this context, quantifying predictive uncertainties is crucial for risk-sensitive applications as well as for efficient and economical design. In this regard, a comprehensive study of UQ methods for various SciML models, e.g., neural networks (NNs), PINNs, DeepONets, has been provided in [psaros2022uncertainty].
However, due to the fact that the implementations of current UQ methods are not straightforward, we have developed in this paper an open-source Python library (github.com/Crunch-UQ4MI), termed NeuralUQ, that serves as an efficient and reliable toolbox for UQ in SciML. In this study, we have presented a detailed tutorial of NeuralUQ, followed by four numerical examples, including dynamical systems and high-dimensional parametric and time-dependent PDEs, to demonstrate the applicability, reliability, and efficiency of NeuralUQ. We have further demonstrated the flexibility and customizability of NeuralUQ for supporting alternative surrogates, prior and posterior parameter distributions, as well as the inference methods. Indicative such modifications have been described and applied in regression and differential equation problems. More generally, in addition to the UQ methods presented herein for addressing the problem scenarios of Section 2.1, NeuralUQ enables the users to address additional scenarios by combining the built-in components, such as the different surrogates and inference methods.
Despite the plurality of methods implemented in NeuralUQ, appropriately selecting a UQ method for solving a specific problem is not trivial. In this regard, we provide a few empirical remarks in the following. Deterministic methods, such as deep ensemble, are computationally efficient, and they provide reasonable model uncertainties for both neural ODE/PDE and neural operators problems. Further, they can handle big data by using data mini-batches, as in standard neural network (NN) training. For problems with noisy data, NN parameter regularization can be used for addressing overfitting. However, this may have a negative effect on predicted uncertainties. Specifically, a large regularization weight can lead to small model uncertainty, whereas a small weight may be insufficient for addressing overfitting issues. As a potential remedy, different regularization weights can be used and compared by evaluating their corresponding predicted uncertainties using metrics and validation data. As for Bayesian methods, Hamiltonian Monte Carlo (HMC) typically achieves both satisfactory computational accuracy and efficiency for posterior estimation for problems with relatively small datasets, of the order of a few hundred measurements. For problems involving big data, the methods Langevin dynamics (LD), mean-field variational inference (MFVI), and Monte Carlo dropout (MCD), which support mini-batch training in a straightforward manner, are computationally more efficient. For solving neural ODE/PDE problems without and with historical data using Bayesian methods, we recommend the use of a BNN and the generator of a pre-trained physics-informed generative adversarial network as surrogates, respectively. Further, as a rule of thumb for neural operator problems, the physics-agnostic functional prior (PA-FP) can be used to quantify the uncertainties caused by incomplete data, and the deep ensemble for quantifying the epistemic uncertainties. Furthermore, the methods HMC and LD/MFVI are recommended for problems with small and big data, respectively, if FP is used.
Overall, we envision that NeuralUQ will be used in the future in large-scale computational science and engineering applications as a toolbox of reliable and efficient UQ implementations. Clearly, such applications require up-to-date, well-maintained, and versatile software packages. In this regard, indicative extensions of our library relate to incorporating more sampling methods that can handle big data, e.g., stochastic gradient Hamiltonian Monte Carlo, and variational inference methods with more accurate parameterized posterior distributions, e.g., normalizing flows. Finally, we will continue updating the library and we also encourage future users and researchers to develop additional components for NeuralUQ with the goal of expanding its current capabilities and advancing the field.
Acknowledgments
This work is supported by the DOE PhILMs project (No. DE-SC0019453), the MURI-AFOSR FA9550-20-1-0358 projects, and the DARPA HR00112290029 project.
References
6 Appendix
In this section, we provide a glossary table in Appendix 6.1, and an example for surrogate customization, i.e., a one-dimensional regression problem, in Appendix 6.2. We then demonstrate the customization for prior in Samplable and posterior estimation in Variational, using a one-dimensional differential equation problem, in Appendix 6.3, and provide the details for data generation in Appendix 6.4.
6.1 Glossary for frequently used terms
In Table 6, we provide a glossary table of frequently used terms in this paper, and in ML/SciML more generally.
Glossary | |||
---|---|---|---|
UQ | uncertainty quantification | SciML | scientific machine learning |
DNNs | deep neural networks | BNNs | Bayesian neural networks |
GANs | generative adversarial networks | FNN | fully-connected neural network |
PINNs | physics-informed neural networks | DeepONets | deep operator networks |
FP | functional prior | PI-GANs | physics-informed GANs |
PA-FP | physics-agnostic FP | GAN-FP | GAN-based FP |
BNN-FP | BNN-based FP |
6.2 Customization of surrogates
We demonstrate the surrogate customization, using a one-dimensional regression problem. The target function is:
(19) |
We assume that we have 3 noisy measurements of , which are equidistantly distributed in . The noise for the measurements is assumed to be Gaussian with zero mean and known standard deviation set to 0.05.
We employ three different surrogates for different scenarios: (1) We have neither historical data nor prior knowledge on the formulation of the target function. We then employ a BNN with one hidden layer which has 50 neurons and the hyperbolic tangent activation function as the surrogate. In addition, the standard normal distribution is utilized as the prior distribution for each weight and bias in the BNN; (2) the expression describing the target function is not considered as known, but we have historical data for it. We then train a GAN to learn the functional prior using historical data; and (3) we have prior knowledge regarding the target function expressing that it is a sine function with unknown amplitude and frequency, i.e., , where and are parameters to be inferred given measurements. The details regarding the pre-trained GAN in (2) can be seen in the GitHub repository, and the posteriors for the parameters in each surrogate are computed using HMC.
As shown in Fig. 11, the predictions from the GAN-FP and surrogates are quite similar for both the predicted means and uncertainties, which are more accurate than the results from the BNN surrogate in terms of the means. Also, the errors between the predicted means and the reference solution are mostly bounded by the predicted uncertainties from GAN-FP and , whereas the errors are only partly bounded by the predicted uncertainties in BNN. Overall, we demonstrate the flexibility and customizability of NeuralUQ by customizing problem-dependent surrogates to achieve better predicted accuracy.
6.3 Customization of prior and inference
We demonstrate the customization for prior in Samplable and posterior estimation in Variational, using a one-dimensional differential equation problem. Specifically, we consider a nonlinear diffusion-reaction system expressed as follows:
(20) |
where represents the concentration of a certain solute, denotes the diffusion coefficient, is the reaction rate, and is the source term.
We first demonstrate the customization of prior using the example of an inverse problem described by Eq. (20). In this problem, is a known constant, and we have a few noisy measurements on , and . The objective is to infer and for , and the reaction rate which is an unknown constant. Specifically, we have noisy measurements of at 5 random locations, and 17 noisy equidistant measurements of . We then employ a BNN as well as a Samplable variable as the surrogate for and , respectively. The equation can then be encoded using the automatic differentiation as in PINNs. The employed BNN has 3 hidden layers with 50 neurons per layers, and the hyperbolic tangent activation function is used as the activation function. We assume that each weight/bias has the same prior distribution in this BNN, which is independent Gaussian here, i.e., . Further, we employ three different prior distributions for , i.e., , and , where refers to half-normal distribution. The posterior samples for the parameters in BNN as well as are obtained using HMC. We only present the predictions for in Table 7, for simplicity. As shown, the predictions for are strongly affected by its prior distribution. Overall, NeuralUQ enables the users to utilize different prior distributions to achieve better accuracy.
Priors for | |||
---|---|---|---|
mean std | 0.5727 0.4407 | 0.4131 0.2488 | 0.4538 0.2550 |
|
|
We proceed to show the customization of posterior estimate in Variational using the example of a forward problem governed by Eq. (20). In particular, , , and the boundary condition is . In addition, we assume that we have 10 noisy measurements of which are randomly distributed at . The noise for the measurements, which is , is assumed to be known. The objective is to infer given data of . We employ a PI-GAN, which is the same as in [meng2022learning], to solve this problem. The same historical data in [meng2022learning] is utilized here to train the PI-GAN to obtain the functional priors for and . Note that HMC is employed in [meng2022learning] for the posterior estimation of the parameters in PI-GAN. Here we utilize VI with two different parameterized posterior distributions to demonstrate the flexibility of NeuralUQ, i.e., independent Gaussian with unknown means and standard deviations (mean-field Gaussian), and independent Bernoulli distributions with fixed as the probability of success/failure, multiplied by unknown variables (mean-field Bernoulli). The predictions for and are displayed in Fig. 12, it is observed that (1) the predicted means for and from the two parameterized posteriors are similar, and (2) the errors between the predicted means for and are mostly bounded by the predicted uncertainties in both methods.
Finally, note that we employ these specific prior/posterior distributions for demonstration purposes. Interested users of NeuralUQ can also customize other problem-dependent prior/posterior distributions by following the examples presented above.
6.4 Data generation
Regarding data generation, in Section 4.1, the nonstiff differential equations solver ode45 in Matlab is utilized for solving Eq. (9) to generate the training data as well as the reference solutions; in Section 4.3, the training data are generated from the exact solution of from [benes2006decompositions]; and in Section 4.4, the finite-element-based Partial Differential Equation Toolbox in Matlab is utilized for solving Eq. (16) to generate the training data as well as the reference solutions. Note that all data as well as the employed NNs related to this study can be accessed in the GitHub repository of NeuralUQ.