diff --git a/docs/book/1 - Introduction.md b/docs/book/1 - Introduction.md
index 5d428f6..389f357 100644
--- a/docs/book/1 - Introduction.md
+++ b/docs/book/1 - Introduction.md
@@ -30,7 +30,7 @@ Extensive resources, including excellent textbooks covering linear system identi
 While most real world systems are nonlinear, you probably should give linear models a try first. Linear models usually serves as a strong baseline and can be good enough for your case, giving satisfactory performance. [Astron and Murray](https://www.cds.caltech.edu/~murray/books/AM05/pdf/am06-complete_16Sep06.pdf) and [Glad and Ljung](https://www.taylorfrancis.com/books/mono/10.1201/9781315274737/control-theory-lennart-ljung-torkel-glad) showed that many nonlinear systems can be well described by locally linear models. Besides, linear models are easy to fit, easy to interpret, and requires less computational resources than nonlinear models, allowing you to experiment fast and gather insights before thinking about gray box models or complex nonlinear models.

-Linear models can be very useful, even in the presence of strong nonlinearities, because it is much easier to deal with it. Moreover, the development of linear identification algorithms is still a very active and healthy research field, with many papers being released every year [Sai Li, Linjun Zhang, T. Tony Cai & Hongzhe Li](Sai Li, Linjun Zhang, T. Tony Cai & Hongzhe Li), [Maria Jaenada, Leandro Pardo](https://www.mdpi.com/1099-4300/24/1/123), [Xing Liu; Lin Qiu, Youtong Fang; Kui Wang; Yongdong Li, Jose Rodríguez](https://ieeexplore.ieee.org/abstract/document/10296948), [Alessandro D’Innocenzo and Francesco Smarra](https://www.paperhost.org/proceedings/controls/ECC24/files/0026.pdf). Linear models works well most of the time, and should be
+Linear models can be very useful, even in the presence of strong nonlinearities, because they are much easier to work with. Moreover, the development of linear identification algorithms is still a very active and healthy research field, with many papers released every year [Sai Li, Linjun Zhang, T. Tony Cai & Hongzhe Li](Sai Li, Linjun Zhang, T. Tony Cai & Hongzhe Li), [Maria Jaenada, Leandro Pardo](https://www.mdpi.com/1099-4300/24/1/123), [Xing Liu; Lin Qiu, Youtong Fang; Kui Wang; Yongdong Li, Jose Rodríguez](https://ieeexplore.ieee.org/abstract/document/10296948), [Alessandro D’Innocenzo and Francesco Smarra](https://www.paperhost.org/proceedings/controls/ECC24/files/0026.pdf). Linear models work well most of the time and should be
 the first choice for many applications. However, when dealing with complex systems where linear assumptions don’t hold, nonlinear models become essential.

 ### Nonlinear Models
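The Introduction hunk above argues for trying a linear baseline first because linear models are cheap to fit and easy to interpret. The sketch below illustrates that point with plain least squares on synthetic data; the lag structure, coefficients, and noise level are arbitrary choices for illustration, not an example from the book.

```python
import numpy as np

# Synthetic data from an arbitrary linear ARX process (illustration only).
rng = np.random.default_rng(42)
n = 500
u = rng.normal(size=n)
y = np.zeros(n)
for k in range(2, n):
    y[k] = 0.7 * y[k - 1] + 0.16 * u[k - 1] + 0.13 * u[k - 2] + 0.01 * rng.normal()

# Regressor matrix with columns [y_{k-1}, u_{k-1}, u_{k-2}] and target y_k.
X = np.column_stack([y[1:-1], u[1:-1], u[:-2]])
target = y[2:]

# Ordinary least squares: a few lines of code and essentially no tuning.
theta, *_ = np.linalg.lstsq(X, target, rcond=None)
print(theta)  # estimates close to [0.7, 0.16, 0.13]
```

If such a baseline already meets the accuracy requirements, the added complexity of a nonlinear model may not pay off.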
diff --git a/docs/book/2 - NARMAX Model Representation.md b/docs/book/2 - NARMAX Model Representation.md
index b9f1d0d..66c2e9f 100644
--- a/docs/book/2 - NARMAX Model Representation.md
+++ b/docs/book/2 - NARMAX Model Representation.md
@@ -1344,7 +1344,7 @@ $$
 \tag{2.34}
 $$

-where $Xk ~= \{x_{{_1}k}, x_{{_2}k}, \dotsc, x_{{_r}k}\}\in \mathbb{R}^{n^i_{x{_r}}}$ and $\boldsymbol Yk~= \{y_{{_1}k}, y_{{_2}k}, \dotsc, y_{{_s}k}\}\in \mathbb{R}^{n^i_{y{_s}}}$. The number of possibles terms of MIMO NARX model given the $i$-th polynomial degree, $\ell_i$, is:
+where $X_k = \{x_{{_1}k}, x_{{_2}k}, \dotsc, x_{{_r}k}\}\in \mathbb{R}^{n^i_{x{_r}}}$ and $Y_k = \{y_{{_1}k}, y_{{_2}k}, \dotsc, y_{{_s}k}\}\in \mathbb{R}^{n^i_{y{_s}}}$. The number of possible terms of a MIMO NARX model given the $i$-th polynomial degree, $\ell_i$, is:

 $$
 \begin{equation}
diff --git a/docs/book/4 - Model Structure Selection (MSS).md b/docs/book/4 - Model Structure Selection (MSS).md
index bfb9cb1..030c32d 100644
--- a/docs/book/4 - Model Structure Selection (MSS).md
+++ b/docs/book/4 - Model Structure Selection (MSS).md
@@ -209,12 +209,10 @@ $$
 Thus, the ERR due to the inclusion of the regressor $q_{{_i}}$ is expressed as:

 $$
-\begin{align}
- [\text{ERR}]_i = \frac{g_{_i}^2q_{_i}^\topq_{_i} }{Y^\topY}, \qquad \text{for~} i=1,2,\dotsc, n_\Theta.
-\end{align}
-\tag{16}
+[\text{ERR}]_i = \frac{g_{i}^2 \cdot q_{i}^\top q_{i}}{Y^\top Y}, \qquad \text{for } i=1,2,\dotsc, n_\Theta.
 $$
+
 There are many ways to terminate the algorithm. An approach often used is stop the algorithm if the model output variance drops below some predetermined limit $\varepsilon$:

 $$
@@ -252,6 +250,7 @@ $$
 \left\langle x, y\right\rangle = \left\langle \hat{\theta} x, x\right\rangle = \hat{\theta} \left\langle x, x\right\rangle
 \tag{20}
 $$
+
 Which implies that

 $$
diff --git a/docs/book/5 - Multiobjective Parameter Estimation.md b/docs/book/5 - Multiobjective Parameter Estimation.md
index 69a0657..d194be4 100644
--- a/docs/book/5 - Multiobjective Parameter Estimation.md
+++ b/docs/book/5 - Multiobjective Parameter Estimation.md
@@ -621,21 +621,19 @@ So
 $$
 q_i =
 \begin{bmatrix}
-0 & 0\\
-1 & 0\\
-2 & 0\\
-1 & 1\\
-2 & 1\\
+0 & 0 \\
+1 & 0 \\
+2 & 0 \\
+1 & 1 \\
+2 & 1 \\
 \end{bmatrix}
-
 =
-
 \begin{bmatrix}
-1\\
-\overline{y}\\
-\overline{u}\\
-\overline{y^2}\\
-\overline{u}\:\overline{y}\\
+1 \\
+\overline{y} \\
+\overline{u} \\
+\overline{y^2} \\
+\overline{u} \cdot \overline{y} \\
 \end{bmatrix}
 $$
@@ -944,17 +942,16 @@ $$
 $$
 q_i =
 \begin{bmatrix}
-0 & 0\\
-1 & 0\\
-2 & 0\\
-2 & 2\\
+0 & 0 \\
+1 & 0 \\
+2 & 0 \\
+2 & 2 \\
 \end{bmatrix}
 =
-
 \begin{bmatrix}
-1\\
-\overline{y}\\
-\overline{u}\\
+1 \\
+\overline{y} \\
+\overline{u} \\
 \overline{u^2}
 \end{bmatrix}
 $$
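To make the ERR expression in the Model Structure Selection hunk above concrete, here is a minimal numerical sketch with synthetic data (not code from the book). It assumes the candidate regressors have already been orthogonalised, so each coefficient follows the relation $\hat{\theta} = \left\langle x, y \right\rangle / \left\langle x, x \right\rangle$ implied above:

```python
import numpy as np

# Synthetic example: Q holds already-orthogonalised candidate regressors q_i as
# columns and Y is the measured output. In practice Q would come from an
# orthogonalisation (e.g. Gram-Schmidt) of the candidate regressor matrix.
rng = np.random.default_rng(0)
n, m = 200, 3
Q, _ = np.linalg.qr(rng.normal(size=(n, m)))  # columns are orthogonal
Y = Q @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=n)

q_norms = np.einsum("ij,ij->j", Q, Q)  # q_i^T q_i for each column
g = (Q.T @ Y) / q_norms                # g_i = <q_i, Y> / <q_i, q_i>

# [ERR]_i = g_i^2 q_i^T q_i / (Y^T Y): the share of the output "energy"
# explained by including regressor q_i.
err = g**2 * q_norms / (Y @ Y)
print(err, err.sum())
```

Regressors are then ranked by their ERR values; one common stopping rule, in the spirit of the criterion mentioned above, is to stop once $1 - \sum_i [\text{ERR}]_i$ drops below the predetermined limit $\varepsilon$.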
diff --git a/docs/book/8 - Severely Nonlinear System Identification.md b/docs/book/8 - Severely Nonlinear System Identification.md
index a64226a..d1c1efd 100644
--- a/docs/book/8 - Severely Nonlinear System Identification.md
+++ b/docs/book/8 - Severely Nonlinear System Identification.md
@@ -1,4 +1,4 @@
-We have categorized systems into two different classes for now:** linear systems** and **nonlinear** systems. As mentioned, **linear systems** has been extensively studied with several different well-established methods available, while **nonlinear** systems is a very active field with several problems that are still open for research. Besides linear and nonlinear systems, there are the ones called **Severely Nonlinear Systems**. Severely Nonlinear Systems are the ones that exhibit highly complex and exotic dynamic behaviors like sub-harmonics, chaotic behavior and hysteresis. For now, we will focus on system with hysteresis.
+We have categorized systems into two different classes so far: **linear systems** and **nonlinear** systems. As mentioned, **linear systems** have been extensively studied, with several well-established methods available, while **nonlinear** systems remain a very active field with several problems still open for research. Besides linear and nonlinear systems, there are also the so-called **Severely Nonlinear Systems**: systems that exhibit highly complex and exotic dynamic behaviors such as sub-harmonics, chaotic behavior, and hysteresis. For now, we will focus on systems with hysteresis.

 ## Modeling Hysteresis With Polynomial NARX Model

diff --git a/docs/book/9 - Validation.md b/docs/book/9 - Validation.md
index e973018..fee16d8 100644
--- a/docs/book/9 - Validation.md
+++ b/docs/book/9 - Validation.md
@@ -132,7 +132,12 @@ $$
 where $\hat{y}_k \in \mathbb{R}$ the model predicted output and $\bar{y} \in \mathbb{R}$ the mean of the measured output $y_k$. The RRSE gives some indication regarding the quality of the model, but concluding about the best model by evaluating only this quantity may leads to an incorrect interpretation, as shown in following example.

-Consider the models $y_{{_a}k} = 0.7077y_{{_a}k-1} + 0.1642u_{k-1} + 0.1280u_{k-2}$ and $y_{{_b}k}=0.7103y_{{_b}k-1} + 0.1458u_{k-1} + 0.1631u_{k-2} -1467y^3_{{_b}k-1} + 0.0710y^3_{{_b}k-2} +0.0554y^2_{{_b}k-3}u_{k-3}$ defined in the [Meta Model Structure Selection: An Algorithm For Building Polynomial NARX Models For Regression And Classification](https://ufsj.edu.br/portal2-repositorio/File/ppgel/225-2020-02-17-DissertacaoWilsonLacerda.pdf). The former results in a $RRSE = 0.1202$ while the latter gives $RRSE~=0.0857$. Although the model $y_{{_b}k}$ fits the data better, it is only a biased representation to one piece of data and not a good description of the entire system.
+Consider the models
+$$
+y_{{_a}k} = 0.7077y_{{_a}k-1} + 0.1642u_{k-1} + 0.1280u_{k-2}
+$$
+
+and $y_{{_b}k}=0.7103y_{{_b}k-1} + 0.1458u_{k-1} + 0.1631u_{k-2} -1467y^3_{{_b}k-1} + 0.0710y^3_{{_b}k-2} +0.0554y^2_{{_b}k-3}u_{k-3}$ defined in the [Meta Model Structure Selection: An Algorithm For Building Polynomial NARX Models For Regression And Classification](https://ufsj.edu.br/portal2-repositorio/File/ppgel/225-2020-02-17-DissertacaoWilsonLacerda.pdf). The former results in an $RRSE = 0.1202$, while the latter gives $RRSE = 0.0857$. Although the model $y_{{_b}k}$ fits the data better, it is only a biased representation of one piece of data and not a good description of the entire system.

 The RRSE (or any other metric) shows that validations test might be performed carefully. Another traditional practice is split the data set in two parts. In this respect, one can test the models obtained from the estimation part of the data using an specific data for validation. However, the one-step-ahead performance of NARX models generally results in misleading interpretations because even strongly biased models may fit the data well. Therefore, a free run simulation approach usually allows a better interpretation if the model is adequate or not ([Billings, S. A.](https://www.wiley.com/en-us/Nonlinear+System+Identification%3A+NARMAX+Methods+in+the+Time%2C+Frequency%2C+and+Spatio-Temporal+Domains-p-9781119943594)).
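As a small companion to the Validation hunk above, the sketch below computes the RRSE as defined there (root of the squared-error sum over the squared deviation from the mean). The arrays and the helper name `rrse` are illustrative assumptions, not part of the book's code.

```python
import numpy as np

def rrse(y: np.ndarray, yhat: np.ndarray) -> float:
    """Relative root squared error: sqrt(sum((y - yhat)^2) / sum((y - mean(y))^2))."""
    return float(np.sqrt(np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)))

# Toy usage with made-up numbers (not the models discussed above).
y = np.array([1.00, 1.20, 0.90, 1.10, 1.30])
yhat = np.array([0.95, 1.15, 1.00, 1.05, 1.25])
print(rrse(y, yhat))
```

For the reasons given above, the predictions passed to such a metric should come from a free run simulation rather than from one-step-ahead predictions, since even a strongly biased model can look good one step ahead.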