Project in collaboration with Soroush Zamanian and Ge Liu
Failure in wastewater infrastructure systems is recognized as a serious, worldwide concern that can have irreversible impacts on health, the environment, and the economy. Concrete sewer pipes are the most common type used in wastewater systems. Cracks form in them when the tensile stress on a pipe exceeds its tensile strength. The geometry of the pipe, the materials used, the properties of the soil in which the pipe is buried, and other factors can affect the tensile stress on pipes. There are currently 11 known factors that can influence the tensile stress. In this project, we considered 5 of them. Our goal was to identify the dominant factors that influence the tensile stress on the pipe. We also estimated a safety region for the significant variables that would keep the tensile stress within permissible limits.
We used a simulator that provides the tensile stress as the output response. Because of its high computational complexity, the simulator takes about 2 hours for a single run. We fitted two models to the simulator data: a Bayesian Gaussian Process (GP) and Bayesian Additive Regression Trees (BART). We then used the fitted models to perform Sensitivity Analysis (SA). We also selected one of the two models based on Mean Squared Prediction Error (MSPE) and used it for further analysis.
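The MSPE-based comparison can be illustrated with a minimal sketch. The function names `mspe` and `select_model` and all numbers below are hypothetical placeholders, not project results; the sketch assumes the model with the lower MSPE on held-out runs is the one selected.

```python
import numpy as np

def mspe(y_true, y_pred):
    """Mean squared prediction error over held-out simulator runs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean((y_true - y_pred) ** 2))

def select_model(y_test, predictions):
    """Return the name of the model with the lowest MSPE and all scores."""
    scores = {name: mspe(y_test, pred) for name, pred in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical held-out responses and model predictions (illustration only).
y_test = np.array([1.0, 2.0, 3.0, 4.0])
predictions = {
    "gp": np.array([1.1, 1.9, 3.2, 3.9]),
    "bart": np.array([0.5, 2.5, 3.5, 4.5]),
}
best_model, mspe_scores = select_model(y_test, predictions)
```

In practice the entries of `predictions` would be the posterior mean predictions of each fitted model at the test inputs.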
We chose a sample size of 50: 40 samples were used as training data to fit the two models (Bayesian GP and BART), and the remaining 10 samples were used as test data to measure the predictive performance of each model. To generate a randomized experimental design, we used Latin Hypercube Sampling (LHS), a space-filling design method, over the domain of the five predictors. Based on the prior information we have on the five predictors, all of them are independent lognormal random variables, with the mean and standard deviation of the corresponding Gaussian component given in the table below.
Simulator inputs
For $i = 1, \dots, 5$, we generate the inputs $X_i$ using LHS such that $X_i \in [a_i, b_i]$, where $a_i$ and $b_i$ are the 5th and the 95th quantiles of the lognormal distribution of $X_i$, respectively. The response $Y(x)$, the tensile stress, is recorded by running the simulator. Before fitting any model to the dataset, the samples for each $X_i$ are normalized to the interval $[0, 1]$.
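A design of this kind could be generated along the following lines. This is a sketch under stated assumptions: the `mu` and `sigma` values are hypothetical placeholders (not the values from the inputs table), points are assumed to be spread uniformly between the two quantiles, the inputs are assumed to be rescaled to the unit interval, and the 40/10 split is assumed to be random.

```python
import numpy as np
from scipy.stats import norm, qmc

# Hypothetical mean and standard deviation of the Gaussian component of
# each lognormal input (placeholders, not the project's actual values).
mu = np.array([5.0, 7.5, 5.2, 7.4, 3.3])
sigma = np.array([0.10, 0.05, 0.10, 0.05, 0.20])

# 5th and 95th quantiles of a lognormal variable: exp(mu + sigma * z_p).
lower = np.exp(mu + sigma * norm.ppf(0.05))
upper = np.exp(mu + sigma * norm.ppf(0.95))

# Latin hypercube on the unit cube, then scaled to [lower, upper].
sampler = qmc.LatinHypercube(d=5, seed=0)
unit_design = sampler.random(n=50)            # shape (50, 5), values in [0, 1)
X = qmc.scale(unit_design, lower, upper)      # inputs passed to the simulator

# Normalize each input back to the unit interval before model fitting.
X_norm = (X - lower) / (upper - lower)

# Assumed random split: 40 training runs and 10 test runs.
rng = np.random.default_rng(0)
perm = rng.permutation(50)
train_idx, test_idx = perm[:40], perm[40:]
```

Each column of `unit_design` places exactly one point in each of the 50 equal-width strata, which is the space-filling property that LHS provides.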
The first model considered was the Bayesian Gaussian Process (GP) model with a non-zero trend function. Since the material variables were correlated in pairs (the backfill shear velocity with the density of the backfill, and the bedding shear velocity with the density of the bedding), we used a Bayesian GP model with a trend function that captured this known structure. The output of the GP is given as
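$$Y(x) = f(x)^\top \beta + Z(x),$$
where $f(x)$ is the assumed trend function in the GP, $\beta$ are the unknown regression coefficients, and $Z(x)$ is a zero-mean GP. We use the separable Gaussian covariance function,
$$\mathrm{Cov}\big(Z(x), Z(x')\big) = \frac{1}{\lambda} \prod_{i=1}^{5} \rho_i^{(x_i - x_i')^2},$$
where $\rho_i \in (0, 1)$ are the unknown correlation parameters and $\lambda$ is the precision. The trend functions are specified as,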
In the Bayesian setting, the unknown parameters of the model (the regression coefficients $\beta$, the precision $\lambda$, and the correlation parameters $\rho_i$) are treated as random variables with known prior distributions. The aim is to estimate the parameters from samples of the posterior distribution. The precision $\lambda$ is given a Gamma prior, so its posterior is also a Gamma distribution. Each $\rho_i$ is given a Beta prior, and the hyperparameters of the Beta distribution are chosen based on prior information about the strength of the correlation between each $X_i$ and $Y(x)$.
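To make the roles of the correlation parameters and the precision concrete, here is a minimal sketch. It assumes the separable Gaussian correlation is parameterized as a product of `rho_i ** (x_i - x_i')**2` with each `rho_i` in (0, 1), and that the precision has a Gamma(shape `a`, rate `b`) prior. The function names, the hyperparameter values, and the demo data are all hypothetical.

```python
import numpy as np

def gaussian_corr(X, rho):
    """Separable Gaussian correlation matrix.

    R[j, k] = prod_i rho[i] ** (X[j, i] - X[k, i]) ** 2, with 0 < rho[i] < 1.
    """
    X = np.asarray(X, dtype=float)
    rho = np.asarray(rho, dtype=float)
    sq_diff = (X[:, None, :] - X[None, :, :]) ** 2      # shape (n, n, d)
    return np.exp(np.sum(sq_diff * np.log(rho), axis=-1))

def lambda_posterior(resid, R, a, b):
    """Conditional Gamma posterior of the precision lambda.

    Assumes resid = y - F @ beta ~ N(0, R / lambda) and a Gamma(shape=a, rate=b)
    prior on lambda. Returns the posterior (shape, rate).
    """
    n = resid.shape[0]
    quad = resid @ np.linalg.solve(R, resid)
    return a + 0.5 * n, b + 0.5 * quad

# Hypothetical demo data on the unit cube (illustration only).
rng = np.random.default_rng(1)
X_demo = rng.uniform(size=(8, 5))
rho_demo = np.full(5, 0.5)
R_demo = gaussian_corr(X_demo, rho_demo) + 1e-8 * np.eye(8)  # small jitter
resid_demo = rng.normal(size=8)
shape_post, rate_post = lambda_posterior(resid_demo, R_demo, a=1.0, b=1.0)
```

Under this parameterization, a `rho_i` close to 1 means the response varies slowly along input `i`, while a value close to 0 means it varies quickly, which is why prior knowledge about each input's influence can be encoded through the Beta hyperparameters.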