Data Sampling — Input Distributions
Probability distributions specify the input parameters from which input data points are sampled for model evaluations, Monte Carlo-type analyses, and response surfaces. For all UQ analysis types and methods, you can use the uniform, normal (Gaussian), log-normal, gamma, beta, Gumbel, and Weibull distributions as the marginal distribution of each input parameter. Multiple input parameters, including their dependency structure, can be combined to specify a multivariate distribution; the Gaussian copula is the mechanism that separates the dependency structure from the marginal distributions. For all marginal distributions except the uniform and beta distributions, lower and upper bounds are computed automatically from cumulative distribution function (CDF) levels. The bounds can also be specified manually.
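The uniform distribution is defined by its lower and upper bounds $a$ and $b$, and the probability density function (PDF) of the uniform distribution is
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise}. \end{cases}$$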
The normal distribution is defined by the mean μ and the standard deviation σ, and the PDF of the normal distribution is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
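The log-normal distribution is defined by the mean μ and the standard deviation σ of the natural logarithm of the parameter, and the PDF of the log-normal distribution is
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right), \quad x > 0.$$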
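A minimal sketch of the uniform, normal, and log-normal PDFs in plain Python (standard library only) can help when checking the formulas numerically. The function names are hypothetical, and the log-normal parameters are assumed to be the mean and standard deviation of ln x, which is the standard parameterization; this is an illustration, not the software's own implementation.

```python
import math
from statistics import NormalDist


def uniform_pdf(x, a, b):
    """Uniform PDF on [a, b]: 1 / (b - a) inside the bounds, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def normal_pdf(x, mu, sigma):
    """Normal PDF with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(x, mu, sigma):
    """Log-normal PDF, assuming mu and sigma are the mean and std. dev. of ln(x)."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


# Cross-check the hand-written normal density against the standard library.
print(normal_pdf(1.3, 0.5, 2.0), NormalDist(0.5, 2.0).pdf(1.3))
```

The two printed values should agree to rounding error. The log-normal density can be checked the same way through the change of variables $f_{\mathrm{LN}}(x) = f_{\mathrm{N}}(\ln x)/x$.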
The gamma distribution is defined by the shape parameter k and the scale θ, and the PDF of the gamma distribution is
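$$f(x) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \quad x > 0,$$
where $\Gamma$ is the gamma function.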
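The beta distribution is defined by the shape parameters α and β and the lower and upper bounds a and b, respectively, and the PDF of the beta distribution is
$$f(x) = \frac{(x-a)^{\alpha-1}(b-x)^{\beta-1}}{B(\alpha,\beta)\,(b-a)^{\alpha+\beta-1}}, \quad a \le x \le b,$$
where $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ is the beta function.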
The Weibull distribution is defined by the shape parameter k and the scale λ, and the PDF of the Weibull distribution is
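$$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}, \quad x \ge 0.$$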
The Gumbel distribution is defined by the location μ and the scale β, and the PDF of the Gumbel distribution is
$$f(x) = \frac{1}{\beta} \exp\bigl(-(z + e^{-z})\bigr), \quad z = \frac{x-\mu}{\beta}.$$
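The gamma, beta, Weibull, and Gumbel PDFs can be sketched the same way. This is an illustrative implementation under stated assumptions, not the software's code: the function names are hypothetical, the Gumbel density is assumed to be the standard maximum-type form, and evaluating the gamma and beta functions in log space through `math.lgamma` is a design choice to avoid overflow for large shape parameters. A midpoint-rule integral serves as a quick check that each density integrates to approximately one.

```python
import math


def gamma_pdf(x, k, theta):
    """Gamma PDF with shape k and scale theta (support x > 0)."""
    if x <= 0.0:
        return 0.0
    # Log-space evaluation so that math.gamma(k) cannot overflow for large k.
    log_f = (k - 1.0) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)
    return math.exp(log_f)


def beta_pdf(x, alpha, beta, a, b):
    """Beta PDF with shape parameters alpha, beta, rescaled to the interval [a, b]."""
    if not (a < x < b):
        return 0.0  # the endpoints themselves have measure zero
    log_beta_fn = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_f = ((alpha - 1.0) * math.log(x - a) + (beta - 1.0) * math.log(b - x)
             - log_beta_fn - (alpha + beta - 1.0) * math.log(b - a))
    return math.exp(log_f)


def weibull_pdf(x, k, lam):
    """Weibull PDF with shape k and scale lam (x = 0 returns 0 to avoid 0**(k-1) for k < 1)."""
    if x <= 0.0:
        return 0.0
    t = x / lam
    return (k / lam) * t ** (k - 1.0) * math.exp(-t ** k)


def gumbel_pdf(x, mu, beta):
    """Gumbel (maximum-type) PDF with location mu and scale beta."""
    z = (x - mu) / beta
    if z < -50.0:
        return 0.0  # density underflows to 0 here; also avoids overflow in exp(-z)
    return math.exp(-(z + math.exp(-z))) / beta


def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))
```

For example, `integrate(lambda x: gamma_pdf(x, 2.0, 1.5), 0.0, 100.0)` should be close to 1. With $k = 1$, both the gamma and the Weibull densities reduce to the exponential density, which gives another easy spot check.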
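The Gaussian copula for dependent input parameters in a unit hypercube of dimension $d$ is defined as
$$C_P(u_1, \ldots, u_d) = \Phi_P\bigl(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)\bigr), \quad (u_1, \ldots, u_d) \in [0, 1]^d,$$
where $\Phi_P$ is the cumulative distribution function (CDF) of the multivariate normal distribution with correlation matrix $P$, and $\Phi^{-1}$ is the inverse of the standard univariate normal CDF.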
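For two parameters, the copula definition can be evaluated directly. The sketch below is an illustration under assumptions (hypothetical function names, standard-library Python only, a 2 × 2 correlation matrix with $|\rho| < 1$): it computes the bivariate normal CDF $\Phi_P$ by integrating its conditional form with Simpson's rule and then applies the copula formula. With $\rho = 0$ the copula reduces to the product $u_1 u_2$, which is the independent case.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def bivariate_normal_cdf(x, y, rho, n=4000):
    """P(X <= x, Y <= y) for standard normal X, Y with correlation rho (|rho| < 1).

    Uses Phi_2(x, y; rho) = int_{-inf}^{x} phi(s) * Phi((y - rho*s) / sqrt(1 - rho^2)) ds,
    i.e. conditioning on X = s, integrated with Simpson's rule from -10 (the
    probability mass below -10 is negligible in double precision).
    """
    lo = -10.0
    if x <= lo:
        return 0.0
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (x - lo) / n
    c = math.sqrt(1.0 - rho * rho)

    def g(s):
        return STD_NORMAL.pdf(s) * STD_NORMAL.cdf((y - rho * s) / c)

    total = g(lo) + g(x)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0


def gaussian_copula_2d(u1, u2, rho):
    """C_P(u1, u2) = Phi_P(Phi^{-1}(u1), Phi^{-1}(u2)) with P = [[1, rho], [rho, 1]].

    u1 and u2 must lie in the open interval (0, 1), where inv_cdf is defined.
    """
    return bivariate_normal_cdf(STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2), rho)
```

A convenient check is the known orthant probability $C_P(0.5, 0.5) = \tfrac14 + \arcsin(\rho)/(2\pi)$.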
If no correlation group is specified, the input parameters are independent. In Latin hypercube sampling, Morris sampling, Monte Carlo sampling, and importance sampling, the algorithms always first sample data in the unit interval for each parameter, $\phi_i \in [0, 1]$, where $i = 1, \ldots, m$ and $m$ is the number of input parameters. The sampling can therefore be seen as selecting points in a unit hypercube of dimension $m$, $[0, 1]^m$. The boundaries of the hypercube correspond to the bounds of each physical parameter value $p_i$. The CDF $F_i$ and its inverse $F_i^{-1}$ map values from the unit interval to input parameter values such that
$$p_i = F_i^{-1}\Bigl(F_i(p_{i,l}) + \phi_i\bigl(F_i(p_{i,u}) - F_i(p_{i,l})\bigr)\Bigr),$$
where $p_{i,l}$ and $p_{i,u}$ are the lower and upper bounds, respectively. For the uniform distribution, this takes the simple form $p_i = p_{i,l} + \phi_i\,(p_{i,u} - p_{i,l})$. The lower and upper bounds can be given manually in the user interface. The uniform and beta distributions require manual bounds; for all other distributions, the bounds are by default computed automatically from CDF levels. For example, if a lower-bound CDF level of 0.1% is used, the lower bound is computed with the inverse CDF as $p_{i,l} = F_i^{-1}(0.001)$.
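The two steps can be sketched in standard-library Python: a Latin hypercube sample in $[0, 1]^m$ (one point per stratum along each axis), followed by a mapping to a normal marginal with bounds taken from CDF levels. The sketch assumes the mapping has the truncated inverse-CDF form $p = F^{-1}\bigl(F(p_l) + \phi\,(F(p_u) - F(p_l))\bigr)$; the 0.1% and 99.9% levels, the distribution parameters, and all function names are hypothetical choices for illustration, not the software's implementation.

```python
import random
from statistics import NormalDist


def latin_hypercube(n_points, m, rng):
    """Latin hypercube sample of n_points in the unit hypercube [0, 1]^m.

    Each axis is split into n_points equal strata; every stratum receives exactly
    one point, and strata are paired across axes by independent random permutations.
    """
    columns = []
    for _ in range(m):
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_points for s in strata])
    # Transpose the per-axis columns into points (phi_1, ..., phi_m).
    return [list(point) for point in zip(*columns)]


def map_to_parameter(phi, dist, p_lower, p_upper):
    """Map phi in [0, 1] to a parameter value in [p_lower, p_upper] (truncated inverse CDF)."""
    f_lo, f_hi = dist.cdf(p_lower), dist.cdf(p_upper)
    return dist.inv_cdf(f_lo + phi * (f_hi - f_lo))


# Hypothetical marginal and hypothetical CDF levels of 0.1% and 99.9% for the bounds.
dist = NormalDist(mu=20.0, sigma=2.0)
p_l, p_u = dist.inv_cdf(0.001), dist.inv_cdf(0.999)

rng = random.Random(0)
points = latin_hypercube(5, 2, rng)  # 5 points, 2 parameters sharing the same marginal
samples = [[map_to_parameter(phi, dist, p_l, p_u) for phi in pt] for pt in points]
```

Because $F(p_l) > 0$ and $F(p_u) < 1$, the argument of `inv_cdf` always stays inside the open interval where it is defined, even for $\phi = 0$ or $\phi = 1$; this is one practical reason unbounded distributions need finite bounds before sampling.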
For input parameters whose distributions are specified in the same correlation group, Latin hypercube sampling, Morris sampling, Monte Carlo sampling, and importance sampling also first sample data in the unit interval for each parameter, $\phi_i \in [0, 1]$. The sampled data are then transformed into dependent samples using the Gaussian copula. Finally, the samples are mapped to their marginal distributions.
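The three steps for a correlation group can be sketched as follows, assuming the copula transform is done with a Cholesky factor of the correlation matrix (a common technique, not confirmed as the software's method). Independent Monte Carlo samples in the unit square are turned into standard normal scores, correlated with the Cholesky factor, mapped back to $[0, 1]$ with the normal CDF, and finally pushed through each marginal's inverse CDF. The correlation value, the marginals, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cholesky(P):
    """Lower-triangular L with L L^T = P, for a symmetric positive-definite P."""
    d = len(P)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(P[i][i] - s)
            else:
                L[i][j] = (P[i][j] - s) / L[j][j]
    return L


def open_unit(rng):
    """Uniform draw in (0, 1); inv_cdf is undefined at exactly 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def copula_transform(phi, L):
    """Map independent unit-interval samples phi to dependent ones via a Gaussian copula."""
    z = [STD_NORMAL.inv_cdf(p) for p in phi]  # independent standard normal scores
    y = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]  # correlation P
    return [STD_NORMAL.cdf(v) for v in y]  # back to [0, 1], now dependent


# Hypothetical correlation group: two parameters, correlation 0.8 between normal scores.
P = [[1.0, 0.8], [0.8, 1.0]]
L = cholesky(P)
marginals = [NormalDist(10.0, 1.0), NormalDist(5.0, 0.5)]  # hypothetical marginals

rng = random.Random(42)
samples = []
for _ in range(20000):
    phi = [open_unit(rng) for _ in range(2)]  # Monte Carlo point in the unit square
    u = copula_transform(phi, L)              # dependent point in the unit square
    samples.append([m.inv_cdf(ui) for m, ui in zip(marginals, u)])  # marginal mapping
```

With normal marginals, each output is a linear function of its correlated normal score, so the sample Pearson correlation should come out close to 0.8; for non-normal marginals, the copula parameter and the Pearson correlation of the outputs generally differ. A production implementation would also guard against `cdf` rounding to exactly 0 or 1 in the extreme tails.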
If there are existing QoIs or important input parameter points that need to be evaluated, specified values can be used as the input parameter points when a GP-type surrogate model is selected. For a UQ analysis computed with specified values, the input parameter space is the box bounded by the minimum and maximum of the specified values in each dimension, and this box is used for the Monte Carlo-type analysis associated with the study. Specified values can also be used with the correlation method in sensitivity analysis.
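Computing that box amounts to a per-dimension minimum and maximum over the specified points. A minimal sketch, where the function name and the three-parameter data are hypothetical:

```python
def parameter_space_from_points(points):
    """Per-dimension (min, max) box spanned by a list of specified input points."""
    dims = list(zip(*points))  # one tuple of values per input parameter
    return [(min(values), max(values)) for values in dims]


# Hypothetical specified values: three points for three input parameters.
specified = [
    [0.2, 10.0, 300.0],
    [0.5, 12.5, 280.0],
    [0.3, 9.0, 310.0],
]
box = parameter_space_from_points(specified)  # [(0.2, 0.5), (9.0, 12.5), (280.0, 310.0)]
```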