\chapter{Functions and Datasets}
\section{Analytical Functions}
\label{sec:functions}
Analytical functions---see~\cref{tab:functions}---with increasing numbers of input dimensions are presented, namely: \textit{(i)} \textit{Rosenbrock}; \textit{(ii)} \textit{Michalewicz}; \textit{(iii)} \textit{Branin}; \textit{(iv)} \textit{Ishigami}; and \textit{(v)} \textit{g-function}~\cite{molga2005,ishigami1990,Saltelli2007,legratiet2016,forrester2008a}. They are all widely used because they are nonlinear and nonmonotonic. Moreover, two versions of the 11-D \emph{g-function} are used: \textit{g-function (i)} 11-D demonstrates the behaviour of the methods when only a small number of input parameters contribute to the QoI, whereas \textit{g-function (ii)} 11-D features more influential input parameters.
\begin{table}[ht]
\centering
\setcellgapes{5pt}
\makegapedcells
\begin{tabular}{lll}
\toprule
Function&Hypercube & Definition \\
\cmidrule{1-3}
\textit{Rosenbrock} 2-D& $[-2.048, 2.048]^2$ & $
f(X_1, X_2) = \sum_{i = 1}^{d-1}[100(x_{i+1} - x_i^2)^2 +(x_i -1)^2]$\\
\textit{Michalewicz} 2-D& $[0, \pi]^2$ & $
f(X_1, X_2) = -\sum_{i=1}^d \sin(x_i)\sin^{2m}\left(\frac{ix_i^2}{\pi}\right)$\\
\textit{Branin} 2-D& $[-5, 10] \times [0, 15]$ &
$\begin{multlined}[t][0.6\linewidth]
f(X_1, X_2) = \left( x_2 - \frac{5.1}{4\pi^2}x_1^2 + \frac{5}{\pi}x_1 - 6
\right)^2\\[-1ex]
+ 10 \left[ \left( 1 - \frac{1}{8\pi} \right) \cos(x_1) + 1 \right] + 5x_1\hfill
\end{multlined}$\\
\textit{Ishigami} 3-D& $[-\pi, \pi]^3$ & $ f(X_1, X_2, X_3) = \sin X_1 + 7 \sin^2 X_2 + 0.1 X_3^4 \sin X_1 $\\
\textit{g-function} 4-D& $[0, 1]^4$ & $
f(X_1, X_2, X_3, X_4) = \prod_{i=1}^4 \frac{\lvert 4X_i - 2\rvert + a_i}{1 + a_i}, \quad a_{i} = i$\\
\textit{g-function (i)} 11-D & $[0, 1]^{11}$ &
$\begin{multlined}[t][0.6\linewidth]
f(X_1, ..., X_{11}) = \prod_{i=1}^{11} \frac{\lvert 4X_i - 2\rvert + a_i}{1 + a_i},\\[-1ex]
\hspace{0pt} \mathbf{a} = [1, 2, 5, 10, 20, 50, 100, 500, 10^3, 10^3, 10^3]\hfill
\end{multlined}$\\
\textit{g-function (ii)} 11-D & $[0, 1]^{11}$ &
$\begin{multlined}[t][0.6\linewidth]
f(X_1, ..., X_{11}) = \prod_{i=1}^{11} \frac{\lvert 4X_i - 2\rvert + a_i}{1 + a_i},\\[-1ex]
\hspace{0pt} \mathbf{a} = [1, 2, 2, 3, 3, 10, 50, 50, 50, 100, 100]\hfill
\end{multlined}$\\
\bottomrule
\end{tabular}
\caption{Analytical functions considered, sorted by increasing number of input parameters.}
\label{tab:functions}
\end{table}
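As an illustration of these definitions, a minimal Python sketch of two of the functions in \cref{tab:functions} is given below; it is purely illustrative and does not reproduce any particular implementation.
\begin{verbatim}
import numpy as np

def ishigami(x):
    """Ishigami function; x of shape (3,), defined on [-pi, pi]^3."""
    return np.sin(x[0]) + 7 * np.sin(x[1]) ** 2 \
        + 0.1 * x[2] ** 4 * np.sin(x[0])

def g_function(x, a):
    """g-function; x of shape (d,) in [0, 1]^d, coefficients a of shape (d,)."""
    x, a = np.asarray(x), np.asarray(a)
    return np.prod((np.abs(4 * x - 2) + a) / (1 + a))

# 4-D g-function with a_i = i, evaluated at an arbitrary point.
print(g_function([0.5, 0.2, 0.8, 0.1], a=[1, 2, 3, 4]))
\end{verbatim}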
\section{Datasets}
\label{sec:dataset}
\Cref{tab:dataset} presents two datasets. The first dataset (El Ni\~no) has no input-output relation and only features a temporal output. The second dataset (Hydrodynamics) features an input-output relation with a spatially varying output. The datasets are as follows:
\begin{itemize}
\item The \emph{El Ni\~{n}o} dataset is a well-known functional dataset~\citep{Hyndman2009}. It consists of a time series of monthly averaged Sea Surface Temperature (SST) in degrees Celsius, spatially averaged over the Pacific Ocean region (0-10°S and 90-80°W), from January 1950 to December 2007. The response variable is a vector of size 12 and the dataset gathers 58 realizations. Data originate from NOAA's ERSSTv5 database available at \href{http://www.cpc.ncep.noaa.gov/data/indices}{http://www.cpc.ncep.noaa.gov/data/indices}.
\item The \emph{Hydrodynamics} dataset gathers water levels (in m) computed with the 1-dimensional Shallow Water Equation solver MASCARET (opentelemac.org) for a 50~km reach of the Garonne river in the South-West of France~\citep{Roy2017c}. The uncertain inputs are 4 scalars: the friction coefficients of the river bed $K_{s1}, K_{s2}, K_{s3}$, defined over three homogeneous spatial areas, and the upstream boundary condition, described by a constant scalar value for the inflow $Q$ in stationary flow. The response variable is a vector of size 463 (the number of computation nodes of the 1D mesh) and an ensemble of 200 realizations is considered here.
\end{itemize}
\begin{table}[!ht]
\centering
\begin{tabular}{lccc}
\toprule
Dataset & Scalar inputs & Functional output size & Sample size\\
\midrule % \cmidrule{2-3}
El Ni\~no & - & 12 & 58\\
Hydrodynamics & 4 & 463 & 200\\
\bottomrule
\end{tabular}
\caption{Description of the El Ni\~no and Hydrodynamics datasets.}
\label{tab:dataset}
\end{table}
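To fix ideas on how these ensembles can be arranged, the sketch below builds placeholder arrays with the shapes of \cref{tab:dataset}, one realization per row (a common snapshot-matrix convention; the variable names and the layout are choices of this illustration, not of the original data files).
\begin{verbatim}
import numpy as np

# Placeholder snapshot matrices with the shapes of the two datasets.
el_nino = np.empty((58, 12))    # 58 years x 12 monthly SST averages
hydro_x = np.empty((200, 4))    # scalar inputs: Ks1, Ks2, Ks3, Q
hydro_y = np.empty((200, 463))  # water level at the 463 mesh nodes
\end{verbatim}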
\chapter{Optimization Method}
\label{sec:optim_method}
Among the numerous methods that can be used to perform an optimization~\cite{Cavazzuti2013} are those taking advantage of a surrogate model~\cite{forrester2009}. When dealing with complex cases, the numerical cost is such that only a limited number of simulations can be performed. However, ensuring the convergence of an optimization requires a minimum number of such evaluations, for both deterministic and stochastic methods. In this context, building a proxy of the simulation setup makes it possible to overcome the computational cost. As this proxy is used to optimize the problem, its quality is paramount: refining the space of parameters---which improves the quality of the model---can in turn improve the optimization.
Batman (\cref{chap:batman}) was used to handle the whole workflow, from the design of experiments to the construction of the surrogate and the optimization. This workflow is presented in~\cref{alg:workflow}.
\begin{algorithm}
\caption{Workflow using BATMAN}
\label{alg:workflow}
\begin{algorithmic}[1]
\Require $N_{max}$, $N_s$
\State Build the surrogate $\mathcal{M}$ from the outputs of the $N_s$ initial samples
\While{$N_s < N_{max}$}
\State $\mathbf{x_{*}} \gets$ optimization on the surrogate $\mathcal{M}$
\State Compute a new sample at $\mathbf{x_{*}}$
\State Update the surrogate $\mathcal{M}$ and set $N_s \gets N_s + 1$
\EndWhile
\end{algorithmic}
\end{algorithm}
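As a complement to \cref{alg:workflow}, the following sketch reproduces the loop with off-the-shelf Python tools (a scikit-learn Gaussian process and a SciPy optimizer). It is only an illustration under simplifying assumptions: the function name \texttt{surrogate\_loop} and the use of the surrogate mean as infill criterion are choices of this sketch, and it does not reproduce Batman's actual API; EGO would instead maximize the expected improvement introduced below.
\begin{verbatim}
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.gaussian_process import GaussianProcessRegressor

def surrogate_loop(f, bounds, x_init, n_max):
    """Enrich a GP surrogate of f until the budget of n_max samples is spent."""
    x = np.atleast_2d(x_init)
    y = np.array([f(xi) for xi in x])
    gp = GaussianProcessRegressor(normalize_y=True).fit(x, y)
    while len(y) < n_max:
        # New point: minimum of the surrogate mean (pure exploitation);
        # EGO would maximize the expected improvement instead.
        crit = lambda xi: gp.predict(np.atleast_2d(xi))[0]
        x_star = differential_evolution(crit, bounds).x
        x = np.vstack([x, x_star])                  # compute a new sample
        y = np.append(y, f(x_star))
        gp = GaussianProcessRegressor(normalize_y=True).fit(x, y)  # update
    return x[np.argmin(y)], y.min()

# Usage on the 2-D Rosenbrock function with a budget of 20 evaluations.
rosen = lambda z: 100 * (z[1] - z[0] ** 2) ** 2 + (z[0] - 1) ** 2
x0 = np.random.uniform(-2.048, 2.048, size=(10, 2))
print(surrogate_loop(rosen, [(-2.048, 2.048)] * 2, x0, n_max=20))
\end{verbatim}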
\emph{Efficient Global Optimization} (EGO)~\cite{jones1998} is a Bayesian optimization method that takes the variance of the model into account. The objective of the optimization is to improve the current best solution $Y_{\min}$. The improvement is computed as:
\begin{align}
I(\mathbf{x}) = \left\{\begin{array}{rcl} Y_{\min} - \hat{Y}(\mathbf{x}) & \text{if } \hat{Y}(\mathbf{x}) < Y_{\min} \\ 0 & \text{otherwise} \end{array}\right. .
\end{align}
Using the surrogate, the prediction is expressed as a random process with $Y\sim \mathcal{N}(\hat{Y}, s^2)$. Thus, the objective is to obtain the maximal mean improvement. The expected improvement (EI) is computed as a tradeoff between the minimal value $Y_{\min}$ and an expected value given by the standard error $s$ for a given prediction $\hat{Y}$. It reads:
\begin{align}
\mathbb{E}[I(\mathbf{x})] = \left(Y_{\min} - \hat{Y}(\mathbf{x})\right)\Phi \left( \frac{Y_{\min} - \hat{Y}(\mathbf{x})}{s} \right) + s\phi \left( \frac{Y_{\min} - \hat{Y}(\mathbf{x})}{s} \right),
\end{align}
\noindent with $\phi(\cdot)$ and $\Phi(\cdot)$, respectively, the Probability Density Function (PDF) and the Cumulative Distribution Function (CDF) of the standard normal distribution. Selecting the point with the highest expected improvement is achieved either: \emph{(i)} by maximizing the difference between the minimal value and the predicted response; or \emph{(ii)} by increasing the standard deviation---see~\cref{fig:branin_ego}. Hence, the first component is said to \emph{exploit} the model while the other seeks to \emph{explore} it. These two phases are automatically balanced here. In~\cite{schonlau1998}, the authors proposed the \emph{Generalized Expected Improvement}, $\mathbb{E}[I^g]$, which introduces a constant $g$ to adjust the degree of \emph{exploration vs exploitation}: the higher $g$ is, the more exploratory the strategy. Indeed, the initial EI strategy tends to spend too much effort on improving the current solution before considering other regions of the parameter space. However, this raises another concern about the choice of this constant.
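As a minimal numerical illustration of the formula above, the expected improvement can be evaluated as follows (a NumPy/SciPy sketch; the function name and the guard against a vanishing variance are choices of this example).
\begin{verbatim}
import numpy as np
from scipy.stats import norm

def expected_improvement(y_hat, s, y_min):
    """EI at a point where the surrogate predicts mean y_hat and
    standard deviation s, given the best observed value y_min."""
    s = np.maximum(s, 1e-12)       # avoid division by a zero variance
    z = (y_min - y_hat) / s
    return (y_min - y_hat) * norm.cdf(z) + s * norm.pdf(z)

# A point predicted slightly above y_min but with a large uncertainty
# still has a sizeable expected improvement (exploration).
print(expected_improvement(y_hat=1.1, s=0.5, y_min=1.0))
\end{verbatim}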
\begin{figure}[!ht]
\centering
\subfloat[Response surface of the Branin function]{
\includegraphics[width=0.6\linewidth,height=\textheight,keepaspectratio]{fig/applications/optim/branin_forrester.pdf}}
\subfloat[EGO]{
\includegraphics[width=\linewidth,height=\textheight,keepaspectratio]{fig/applications/optim/branin_forrester_ego.pdf}}
\caption{Visualization of EGO on the Branin function---see~\cref{sec:functions}. \emph{Bottom-left} is the response surface of the current surrogate model constructed using samples represented by \emph{black dots}. \emph{Bottom-center} is the variance of the GP surrogate. \emph{Bottom-right} is the expected improvement. The \emph{red triangle} represents the global optimum of the function.}
\label{fig:branin_ego}
\end{figure}
Thus, in this work the classical expected improvement formulation was used in order to avoid introducing an additional constant which, due to the computational cost of the simulations, could not have been characterized.
\chapter{Computational Fluid Dynamics}
\label{sec:LES}
\section{Governing Equations of Compressible Flows}
To solve compressible flows, the governing equations are the conservation laws of total mass, momentum and energy. They respectively read:
\begin{align}
\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u_i)}{\partial x_i} = 0,\label{eq:mass}\\
\frac{\partial (\rho u_i)}{\partial t} + \frac{\partial (\rho u_i u_j)}{\partial x_{j}} = - \frac{\partial p}{\partial x_i} + \frac{\partial \tau_{ij}}{\partial x_j} + \rho f_{i},\label{eq:momentum}\\
\frac{\partial (\rho e_t)}{\partial t} +\frac{\partial (\rho u_i e_t)}{\partial x_i} = - \frac{\partial q_i}{\partial x_i} - \frac{\partial (p u_i)}{\partial x_i} + \frac{\partial(\tau_{ij} u_i)}{\partial x_j} + \dot{Q} + \rho f_{i}u_i. \label{eq:energy}
\end{align}
These equations are the Navier-Stokes equations and they are here presented in Cartesian coordinates using the conventional Einstein notation.
In~\cref{eq:mass}, $x_i$ and $u_i$ are the $i$th spatial coordinate and velocity component, respectively, and $\rho$ is the fluid density. In the momentum conservation equation (\cref{eq:momentum}), $p$ is the static pressure, $f_{i}$ is the volume force in the direction $i$ and $\tau_{ij}$ is the viscous stress tensor:
\begin{align}
\tau_{ij} = - \frac{2}{3}\mu \frac{\partial u_k}{\partial x_k}\delta_{ij} + \mu \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right),
\end{align}
\noindent where $\mu$ is the dynamic viscosity of the fluid ($\mu = \rho \nu$, with $\nu$ the kinematic viscosity), and $\delta_{ij}$ is the Kronecker symbol ($\delta_{ij} = 1$ if $i = j$, $0$ otherwise).
Finally, \cref{eq:energy} is the energy conservation equation, written with the total energy $e_t$. $\dot{Q}$ is the heat source term and the energy flux $q_i$ is defined as:
\begin{align}
q_i = - \lambda \frac{\partial T}{\partial x_i},
\end{align}
\noindent where $T$ is the fluid temperature and $\lambda$ is the thermal conductivity of the fluid.
In this work, the Navier-Stokes equations are closed using the perfect gas state equation:
\begin{align}
p = \rho R T,
\end{align}
\noindent with $R$ the specific gas constant of the fluid, i.e.\ the universal gas constant $\mathcal{R} = \unit{\numprint{8.314}}{J.mol^{-1}.K^{-1}}$ divided by the molar mass $W$ of the gas.
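As a brief worked example (illustrative values only, assuming air with a molar mass of about $29~\mathrm{g\,mol^{-1}}$):
\begin{align}
R = \frac{\mathcal{R}}{W} \approx \frac{8.314}{0.029} \approx 287~\mathrm{J\,kg^{-1}\,K^{-1}},
\qquad
\rho = \frac{p}{R T} \approx \frac{101325}{287 \times 288} \approx 1.23~\mathrm{kg\,m^{-3}}, \nonumber
\end{align}
\noindent which recovers the usual air density at $p = 101325$~Pa and $T = 288$~K.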
\section{Turbulence Modelling}
Turbulence is a natural regime of viscous flows characterized by an apparently random and chaotic behaviour of the flow structures. Although turbulence is a complex phenomenon, its mechanism was identified by Kolmogorov~\cite{Kolmogorov1941}, who theorized the idea of the energy cascade of turbulence. \Cref{fig:energy_cascade} presents an illustration of this cascade. The energy of turbulence $E(\kappa)$ is expressed as a function of the wavenumber $\kappa$. The turbulent kinetic energy is produced by the largest turbulent structures, which are associated with the small values of $\kappa$. Their characteristic size $l_0$ is called the integral scale and is related to the mean flow geometry. The turbulent energy is transmitted to smaller turbulent scales by stretching of the largest eddies, leading to the formation of smaller coherent structures. The smallest eddies of the flow are defined by a characteristic size called the Kolmogorov scale, associated with the wavenumber $\kappa_\eta$. At this point, the eddies dissipate the turbulent energy through viscous effects.
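For reference, Kolmogorov's theory~\cite{Kolmogorov1941} predicts that in the inertial range, between the integral scale and the Kolmogorov scale, the spectrum follows the well-known power law
\begin{align}
E(\kappa) \propto \varepsilon^{2/3} \kappa^{-5/3}, \nonumber
\end{align}
\noindent with $\varepsilon$ the dissipation rate of turbulent kinetic energy.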
Different approaches may be used to solve the turbulent flow equations. The Direct Numerical Simulation (DNS) approach solves these equations at all fluid scales, which leads to a very good accuracy but also a high computational cost. Large Eddy Simulation (LES) and Reynolds-Averaged Navier-Stokes (RANS) are two approaches that model the turbulence. LES only models the smallest scales while RANS models all scales. This means that the result of a RANS computation corresponds to an averaging over time. Turbulent fluctuations are hence filtered out with this approach, and only DNS and LES are able to account for such unsteady physical phenomena.
%\emph{RANS} is averaging over time whereas \emph{LES} is averaging over space. So the value of the temperature is a spatial approximation but is time dependant. The average of \emph{LES} gives the \emph{RANS} solution. The eyes are averaging. We cannot understand the phenomenun if we cannot see the turbulence. A flamme respond like a low pass filter when an acoustic pulsation is send. This can only be seen with \emph{LES}.
\begin{figure}[!ht]
\centering
\includegraphics[width=\linewidth,keepaspectratio]{fig/applications/ls89/energy_cascade.png}
\caption{Energy spectrum associated with the different fluid scales. Source:~\cite{Fransen2013}.}
\label{fig:energy_cascade}
\end{figure}
\chapter{Batman's API}
\label{sec:api_ref}
The following is an extract of Batman's API. For an up-to-date version, refer to the online documentation:
\begin{align}
\text{\href{https://batman.readthedocs.io/en/develop/api.html}{https://batman.readthedocs.io/en/develop/api.html}} \nonumber
\end{align}
\includepdf[pages=-,scale=0.83,pagecommand={\pagestyle{fancy}}]{fig/contributions/batman/api_ref.pdf}