
Commit 26ef761

Update two vignettes: manual addition of section numbering

1 parent 0a25d1d commit 26ef761

2 files changed: +68 -68 lines changed

vignettes/Optimalgo.Rmd

Lines changed: 21 additions & 21 deletions
@@ -24,15 +24,15 @@ options(digits = 3)
 ```
 
 
-# Quick overview of main optimization methods
+# 1. Quick overview of main optimization methods
 
 We present very quickly the main optimization methods.
 Please refer to **Numerical Optimization (Nocedal \& Wright, 2006)**
 or **Numerical Optimization: theoretical and practical aspects
 (Bonnans, Gilbert, Lemarechal \& Sagastizabal, 2006)** for a good introduction.
 We consider the following problem $\min_x f(x)$ for $x\in\mathbb{R}^n$.
 
-## Derivative-free optimization methods
+## 1.1. Derivative-free optimization methods
 The Nelder-Mead method is one of the most well known derivative-free methods
 that use only values of $f$ to search for the minimum.
 It consists in building a simplex of $n+1$ points and moving/shrinking
@@ -67,12 +67,12 @@ this simplex into the good direction.
 The Nelder-Mead method is available in `optim`.
 By default, in `optim`, $\alpha=1$, $\beta=1/2$, $\gamma=2$ and $\sigma=1/2$.
 
-## Hessian-free optimization methods
+## 1.2. Hessian-free optimization methods
 
 For smooth non-linear function, the following method is generally used:
 a local method combined with line search work on the scheme $x_{k+1} =x_k + t_k d_{k}$, where the local method will specify the direction $d_k$ and the line search will specify the step size $t_k \in \mathbb{R}$.
 
-### Computing the direction $d_k$
+### 1.2.1. Computing the direction $d_k$
 A desirable property for $d_k$ is that $d_k$ ensures a descent $f(x_{k+1}) < f(x_{k})$.
 Newton methods are such that $d_k$ minimizes a local quadratic approximation of $f$ based on a Taylor expansion, that is $q_f(d) = f(x_k) + g(x_k)^Td +\frac{1}{2} d^T H(x_k) d$ where $g$ denotes the gradient and $H$ denotes the Hessian.
 
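For context, a minimal sketch of the two families discussed in this hunk, using the Rosenbrock test function from `?optim` (not part of the vignette): Nelder-Mead needs only values of $f$, while a quasi-Newton method such as BFGS also uses the gradient.

```r
# Rosenbrock banana function and its gradient (example from ?optim)
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                     200 * (x[2] - x[1]^2))
optim(c(-1.2, 1), fr)                        # derivative-free: Nelder-Mead (default)
optim(c(-1.2, 1), fr, grr, method = "BFGS")  # quasi-Newton: uses the gradient grr
```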
@@ -121,7 +121,7 @@ See Yuan (2006) for other well-known schemes such as Hestenses-Stiefel, Dixon or
 The three updates (Fletcher-Reeves, Polak-Ribiere, Beale-Sorenson) of the (non-linear) conjugate gradient are available in `optim`.
 
 
-### Computing the stepsize $t_k$
+### 1.2.2. Computing the stepsize $t_k$
 
 Let $\phi_k(t) = f(x_k + t d_k)$ for a given direction/iterate $(d_k, x_k)$.
 We need to find conditions to find a satisfactory stepsize $t_k$. In literature, we consider the descent condition: $\phi_k'(0) < 0$
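As an aside, a minimal sketch of the backtracking (geometric) stepsize search discussed around this point in the vignette; the helper name and the constants `c1` and `rho` are illustrative assumptions, not `optim`'s internal implementation, and `d` is assumed to be a descent direction.

```r
# Backtracking line search (illustrative sketch, not optim()'s internals):
# shrink the stepsize t geometrically until the sufficient-decrease
# condition f(x + t*d) <= f(x) + c1 * t * g'd holds.
backtrack <- function(f, x, d, g, t0 = 1, c1 = 1e-4, rho = 0.5) {
  t <- t0
  fx <- f(x)
  slope <- sum(g * d)   # phi'(0) = g(x)' d, negative for a descent direction
  while (f(x + t * d) > fx + c1 * t * slope)
    t <- rho * t
  t
}
```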
@@ -136,7 +136,7 @@ Nocedal \& Wright (2006) presents a backtracking (or geometric) approach satisfy
 This backtracking linesearch is available in `optim`.
 
 
-## Benchmark
+## 1.3. Benchmark
 
 To simplify the benchmark of optimization methods, we create a `fitbench` function that computes
 the desired estimation method for all optimization methods.
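Since `fitbench` is unexported (the next hunk retrieves it via `fitdistrplus:::fitbench`), here is a hypothetical mini-benchmark in the same spirit, on a beta sample like the one used later in the vignette; the sample size, starting values and off-domain penalty are assumptions for illustration only.

```r
# Hypothetical mini-benchmark (not the unexported fitbench()): pass the same
# negative log-likelihood to several optim() methods and collect the
# estimates and the number of objective-function calls.
set.seed(1234)
obs <- rbeta(200, 3, 3/4)
nll <- function(par, obs) {          # negative log-likelihood, kept finite
  if (any(par <= 0)) return(1e300)   # off-domain penalty so every method runs
  -sum(dbeta(obs, par[1], par[2], log = TRUE))
}
methods <- c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B")
sapply(methods, function(m) {
  fit <- optim(c(1, 1), nll, obs = obs, method = m)
  c(fit$par, fn.calls = unname(fit$counts["function"]))
})
```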
@@ -152,12 +152,12 @@ fitbench <- fitdistrplus:::fitbench
 
 
 
-# Numerical illustration with the beta distribution
+# 2. Numerical illustration with the beta distribution
 
 
-## Log-likelihood function and its gradient for beta distribution
+## 2.1. Log-likelihood function and its gradient for beta distribution
 
-### Theoretical value
+### 2.1.1. Theoretical value
 The density of the beta distribution is given by
 $$
 f(x; \delta_1,\delta_2) = \frac{x^{\delta_1-1}(1-x)^{\delta_2-1}}{\beta(\delta_1,\delta_2)},
@@ -179,7 +179,7 @@
 where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function,
 see the NIST Handbook of mathematical functions https://dlmf.nist.gov/.
 
-### `R` implementation
+### 2.1.2. `R` implementation
 As in the `fitdistrplus` package, we minimize the opposite of the log-likelihood:
 we implement the opposite of the gradient in `grlnL`. Both the log-likelihood and its gradient
 are not exported.
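Because these helpers are unexported, the following is an illustrative reconstruction from the formulas above, using `digamma`; the names `nllbeta` and `grnllbeta` are mine, not the package's internal `grlnlbeta` shown in the next hunk.

```r
# Illustrative reconstruction (not the unexported fitdistrplus internals):
# opposite log-likelihood and opposite gradient for the beta distribution.
nllbeta <- function(par, obs)
  -sum(dbeta(obs, par[1], par[2], log = TRUE))
grnllbeta <- function(par, obs) {
  n <- length(obs)
  c(-(sum(log(obs))     - n * (digamma(par[1]) - digamma(par[1] + par[2]))),
    -(sum(log(1 - obs)) - n * (digamma(par[2]) - digamma(par[1] + par[2]))))
}
# e.g. optim(c(1, 1), nllbeta, grnllbeta, obs = rbeta(100, 3, 3/4), method = "BFGS")
```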
@@ -191,7 +191,7 @@ grlnlbeta <- fitdistrplus:::grlnlbeta
 
 
 
-## Random generation of a sample
+## 2.2. Random generation of a sample
 
 ```{r, fig.height=4, fig.width=4}
 #(1) beta distribution
@@ -204,7 +204,7 @@ curve(dbeta(x, 3, 3/4), col="green", add=TRUE)
 legend("topleft", lty=1, col=c("red","green"), legend=c("empirical", "theoretical"), bty="n")
 ```
 
-## Fit Beta distribution
+## 2.3. Fit Beta distribution
 
 Define control parameters.
 ```{r}
@@ -243,7 +243,7 @@ numerically approximated one).
 
 
 
-## Results of the numerical investigation
+## 2.4. Results of the numerical investigation
 Results are displayed in the following tables:
 (1) the original parametrization without specifying the gradient (`-B` stands for bounded version),
 (2) the original parametrization with the (true) gradient (`-B` stands for bounded version and `-G` for gradient),
@@ -289,12 +289,12 @@ plot(b1, trueval = c(3, 3/4))
 ```
 
 
-# Numerical illustration with the negative binomial distribution
+# 3. Numerical illustration with the negative binomial distribution
 
 
-## Log-likelihood function and its gradient for negative binomial distribution
+## 3.1. Log-likelihood function and its gradient for negative binomial distribution
 
-### Theoretical value
+### 3.1.1. Theoretical value
 The p.m.f. of the Negative binomial distribution is given by
 $$
 f(x; m,p) = \frac{\Gamma(x+m)}{\Gamma(m)x!} p^m (1-p)^x,
@@ -325,7 +325,7 @@
 where $\psi(x)=\Gamma'(x)/\Gamma(x)$ is the digamma function,
 see the NIST Handbook of mathematical functions https://dlmf.nist.gov/.
 
-### `R` implementation
+### 3.1.2. `R` implementation
 As in the `fitdistrplus` package, we minimize the opposite of the log-likelihood: we implement the opposite of the gradient in `grlnL`.
 ```{r}
 grlnlNB <- function(x, obs, ...)
@@ -342,7 +342,7 @@ grlnlNB <- function(x, obs, ...)
 
 
 
-## Random generation of a sample
+## 3.2. Random generation of a sample
 
 ```{r, fig.height=4, fig.width=4}
 #(2) negative binomial distribution
@@ -358,7 +358,7 @@ legend("topright", lty = 1, col = c("red", "green"),
 legend = c("empirical", "theoretical"), bty="n")
 ```
 
-## Fit a negative binomial distribution
+## 3.3. Fit a negative binomial distribution
 
 Define control parameters and make the benchmark.
 ```{r}
@@ -399,7 +399,7 @@ to minimize and its gradient (whether it is the theoretical gradient or the
 numerically approximated one).
 
 
-## Results of the numerical investigation
+## 3.4. Results of the numerical investigation
 Results are displayed in the following tables:
 (1) the original parametrization without specifying the gradient (`-B` stands for bounded version),
 (2) the original parametrization with the (true) gradient (`-B` stands for bounded version and `-G` for gradient),
@@ -447,7 +447,7 @@ plot(b1, trueval=trueval[c("size", "mu")])
 
 
 
-# Conclusion
+# 4. Conclusion
 
 Based on the two previous examples, we observe that all methods converge to the same
 point. This is reassuring.
