
Commit 26ef761

Update two vignettes: manual addition of section numbering

1 parent 0a25d1d commit 26ef761

2 files changed: +68 -68 lines changed

vignettes/Optimalgo.Rmd

Lines changed: 21 additions & 21 deletions
@@ -24,15 +24,15 @@ options(digits = 3)
 ```
 
 
-# Quick overview of main optimization methods
+# 1. Quick overview of main optimization methods
 
 We present very quickly the main optimization methods.
 Please refer to **Numerical Optimization (Nocedal \& Wright, 2006)**
 or **Numerical Optimization: theoretical and practical aspects
 (Bonnans, Gilbert, Lemarechal \& Sagastizabal, 2006)** for a good introduction.
 We consider the following problem $\min_x f(x)$ for $x\in\mathbb{R}^n$.
 
-## Derivative-free optimization methods
+## 1.1. Derivative-free optimization methods
 The Nelder-Mead method is one of the most well known derivative-free methods
 that use only values of $f$ to search for the minimum.
 It consists in building a simplex of $n+1$ points and moving/shrinking
@@ -67,12 +67,12 @@ this simplex into the good direction.
 The Nelder-Mead method is available in `optim`.
 By default, in `optim`, $\alpha=1$, $\beta=1/2$, $\gamma=2$ and $\sigma=1/2$.
 
-## Hessian-free optimization methods
+## 1.2. Hessian-free optimization methods
 
 For smooth non-linear function, the following method is generally used:
 a local method combined with line search work on the scheme $x_{k+1} =x_k + t_k d_{k}$, where the local method will specify the direction $d_k$ and the line search will specify the step size $t_k \in \mathbb{R}$.
 
-### Computing the direction $d_k$
+### 1.2.1. Computing the direction $d_k$
 A desirable property for $d_k$ is that $d_k$ ensures a descent $f(x_{k+1}) < f(x_{k})$.
 Newton methods are such that $d_k$ minimizes a local quadratic approximation of $f$ based on a Taylor expansion, that is $q_f(d) = f(x_k) + g(x_k)^Td +\frac{1}{2} d^T H(x_k) d$ where $g$ denotes the gradient and $H$ denotes the Hessian.
 
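For context, a minimal sketch of the two families discussed in this hunk, using the Rosenbrock test function from `?optim` (not part of the vignette): Nelder-Mead needs only values of $f$, while a quasi-Newton method such as BFGS also uses the gradient.

```r
# Rosenbrock banana function and its gradient (example from ?optim)
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                     200 * (x[2] - x[1]^2))
optim(c(-1.2, 1), fr)                        # derivative-free: Nelder-Mead (default)
optim(c(-1.2, 1), fr, grr, method = "BFGS")  # quasi-Newton: uses the gradient grr
```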
@@ -121,7 +121,7 @@ See Yuan (2006) for other well-known schemes such as Hestenses-Stiefel, Dixon or
 The three updates (Fletcher-Reeves, Polak-Ribiere, Beale-Sorenson) of the (non-linear) conjugate gradient are available in `optim`.
 
 
-### Computing the stepsize $t_k$
+### 1.2.2. Computing the stepsize $t_k$
 
 Let $\phi_k(t) = f(x_k + t d_k)$ for a given direction/iterate $(d_k, x_k)$.
 We need to find conditions to find a satisfactory stepsize $t_k$. In literature, we consider the descent condition: $\phi_k'(0) < 0$
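As an aside, a minimal sketch of the backtracking (geometric) stepsize search discussed around this point in the vignette; the helper name and the constants `c1` and `rho` are illustrative assumptions, not `optim`'s internal implementation, and `d` is assumed to be a descent direction.

```r
# Backtracking line search (illustrative sketch, not optim()'s internals):
# shrink the stepsize t geometrically until the sufficient-decrease
# condition f(x + t*d) <= f(x) + c1 * t * g'd holds.
backtrack <- function(f, x, d, g, t0 = 1, c1 = 1e-4, rho = 0.5) {
  t <- t0
  fx <- f(x)
  slope <- sum(g * d)   # phi'(0) = g(x)' d, negative for a descent direction
  while (f(x + t * d) > fx + c1 * t * slope)
    t <- rho * t
  t
}
```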
@@ -136,7 +136,7 @@ Nocedal \& Wright (2006) presents a backtracking (or geometric) approach satisfy
 This backtracking linesearch is available in `optim`.
 
 
-## Benchmark
+## 1.3. Benchmark
 
 To simplify the benchmark of optimization methods, we create a `fitbench` function that computes
 the desired estimation method for all optimization methods.
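Since `fitbench` is unexported (the next hunk retrieves it via `fitdistrplus:::fitbench`), here is a hypothetical mini-benchmark in the same spirit, on a beta sample like the one used later in the vignette; the sample size, starting values and off-domain penalty are assumptions for illustration only.

```r
# Hypothetical mini-benchmark (not the unexported fitbench()): pass the same
# negative log-likelihood to several optim() methods and collect the
# estimates and the number of objective-function calls.
set.seed(1234)
obs <- rbeta(200, 3, 3/4)
nll <- function(par, obs) {          # negative log-likelihood, kept finite
  if (any(par <= 0)) return(1e300)   # off-domain penalty so every method runs
  -sum(dbeta(obs, par[1], par[2], log = TRUE))
}
methods <- c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B")
sapply(methods, function(m) {
  fit <- optim(c(1, 1), nll, obs = obs, method = m)
  c(fit$par, fn.calls = unname(fit$counts["function"]))
})
```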
@@ -152,12 +152,12 @@ fitbench <- fitdistrplus:::fitbench
 
 
 
-# Numerical illustration with the beta distribution
+# 2. Numerical illustration with the beta distribution
 
 
-## Log-likelihood function and its gradient for beta distribution
+## 2.1. Log-likelihood function and its gradient for beta distribution
 
-### Theoretical value
+### 2.1.1. Theoretical value
 The density of the beta distribution is given by
 $$
 f(x; \delta_1,\delta_2) = \frac{x^{\delta_1-1}(1-x)^{\delta_2-1}}{\beta(\delta_1,\delta_2)},
@@ -179,7 +179,7 @@
 where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function,
 see the NIST Handbook of mathematical functions https://dlmf.nist.gov/.
 
-### `R` implementation
+### 2.1.2. `R` implementation
 As in the `fitdistrplus` package, we minimize the opposite of the log-likelihood:
 we implement the opposite of the gradient in `grlnL`. Both the log-likelihood and its gradient
 are not exported.
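Because these helpers are unexported, the following is an illustrative reconstruction from the formulas above, using `digamma`; the names `nllbeta` and `grnllbeta` are mine, not the package's internal `grlnlbeta` shown in the next hunk.

```r
# Illustrative reconstruction (not the unexported fitdistrplus internals):
# opposite log-likelihood and opposite gradient for the beta distribution.
nllbeta <- function(par, obs)
  -sum(dbeta(obs, par[1], par[2], log = TRUE))
grnllbeta <- function(par, obs) {
  n <- length(obs)
  c(-(sum(log(obs))     - n * (digamma(par[1]) - digamma(par[1] + par[2]))),
    -(sum(log(1 - obs)) - n * (digamma(par[2]) - digamma(par[1] + par[2]))))
}
# e.g. optim(c(1, 1), nllbeta, grnllbeta, obs = rbeta(100, 3, 3/4), method = "BFGS")
```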
@@ -191,7 +191,7 @@ grlnlbeta <- fitdistrplus:::grlnlbeta
 
 
 
-## Random generation of a sample
+## 2.2. Random generation of a sample
 
 ```{r, fig.height=4, fig.width=4}
 #(1) beta distribution
@@ -204,7 +204,7 @@ curve(dbeta(x, 3, 3/4), col="green", add=TRUE)
 legend("topleft", lty=1, col=c("red","green"), legend=c("empirical", "theoretical"), bty="n")
 ```
 
-## Fit Beta distribution
+## 2.3. Fit Beta distribution
 
 Define control parameters.
 ```{r}
@@ -243,7 +243,7 @@ numerically approximated one).
 
 
 
-## Results of the numerical investigation
+## 2.4. Results of the numerical investigation
 Results are displayed in the following tables:
 (1) the original parametrization without specifying the gradient (`-B` stands for bounded version),
 (2) the original parametrization with the (true) gradient (`-B` stands for bounded version and `-G` for gradient),
@@ -289,12 +289,12 @@ plot(b1, trueval = c(3, 3/4))
 ```
 
 
-# Numerical illustration with the negative binomial distribution
+# 3. Numerical illustration with the negative binomial distribution
 
 
-## Log-likelihood function and its gradient for negative binomial distribution
+## 3.1. Log-likelihood function and its gradient for negative binomial distribution
 
-### Theoretical value
+### 3.1.1. Theoretical value
 The p.m.f. of the Negative binomial distribution is given by
 $$
 f(x; m,p) = \frac{\Gamma(x+m)}{\Gamma(m)x!} p^m (1-p)^x,
@@ -325,7 +325,7 @@
 where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function,
 see the NIST Handbook of mathematical functions https://dlmf.nist.gov/.
 
-### `R` implementation
+### 3.1.2. `R` implementation
 As in the `fitdistrplus` package, we minimize the opposite of the log-likelihood: we implement the opposite of the gradient in `grlnL`.
 ```{r}
 grlnlNB <- function(x, obs, ...)
@@ -342,7 +342,7 @@ grlnlNB <- function(x, obs, ...)
 
 
 
-## Random generation of a sample
+## 3.2. Random generation of a sample
 
 ```{r, fig.height=4, fig.width=4}
 #(2) negative binomial distribution
@@ -358,7 +358,7 @@ legend("topright", lty = 1, col = c("red", "green"),
 legend = c("empirical", "theoretical"), bty="n")
 ```
 
-## Fit a negative binomial distribution
+## 3.3. Fit a negative binomial distribution
 
 Define control parameters and make the benchmark.
 ```{r}
@@ -399,7 +399,7 @@ to minimize and its gradient (whether it is the theoretical gradient or the
 numerically approximated one).
 
 
-## Results of the numerical investigation
+## 3.4. Results of the numerical investigation
 Results are displayed in the following tables:
 (1) the original parametrization without specifying the gradient (`-B` stands for bounded version),
 (2) the original parametrization with the (true) gradient (`-B` stands for bounded version and `-G` for gradient),
@@ -447,7 +447,7 @@ plot(b1, trueval=trueval[c("size", "mu")])
 
 
 
-# Conclusion
+# 4. Conclusion
 
 Based on the two previous examples, we observe that all methods converge to the same
 point. This is reassuring.
