lamethods · johan-mattias · Jul 10, 2025 · Jul 11, 2025 · Jul 11, 2025
diff --git a/book1/chapters/ch07-prediction/ch7-pred.qmd b/book1/chapters/ch07-prediction/ch7-pred.qmd
@@ -671,7 +671,8 @@ explained in more detail below:
 All built models and their evaluation measures are stored (in `models` and
 `eval_measures` lists) so that they can later be compared.
 
-Going now into details of each step, we start with the creation of a
+The details of each step can be found in the *feature_creation* R script.
+We start with the creation of a
 dataset to be used for predictive modelling in week *k*. This is done by
 first computing all features based on the logged events data
 (`events_data`) up to the week *k*, and then adding the course outcome

diff --git a/book1/chapters/ch08-clustering/ch8-clus.qmd b/book1/chapters/ch08-clustering/ch8-clus.qmd
@@ -698,7 +698,7 @@ Thereafter, one simply extracts the resulting partition by invoking `cutree()` w
 hc_ward2 <- cutree(hc_euclidean_ward, h=45)
 ```
 
-The object `hc_ward2` is now simply an vector indicating the cluster-membership of each observation in the data set. We show only the first few, for brevity, and then tabulate this vector to compute the cluster sizes. However, interpretation of these clusters is more difficult than in the case of $K$-Means and $K$-Medoids, as there is no centroid or medoid prototype with which to characterise each cluster.
+The object `hc_ward2` is now simply a vector indicating the cluster-membership of each observation in the data set. We show only the first few, for brevity, and then tabulate this vector to compute the cluster sizes. However, interpretation of these clusters is more difficult than in the case of $K$-Means and $K$-Medoids, as there is no centroid or medoid prototype with which to characterise each cluster.
 
 ```{r}
 head(hc_ward2)
@@ -770,6 +770,21 @@ silhouettes <- data.frame(K=2:K,
 
 In @fig-silall, we plot these silhouettes against $K$ using `matplot()`, omitting the code to do so for brevity.
 
+```{r}
+silhouettes_long <- silhouettes |>
+  pivot_longer(cols = -K, names_to = "Method", values_to = "ASW")
+
+ggplot(silhouettes_long, aes(x = K, y = ASW, color = Method)) +
+  geom_line(linewidth = 1) +
+  geom_point(size = 2) +
+  labs(title = "Average Silhouette Width (ASW) by Clustering Method and K",
+       x = "Number of Clusters (K)",
+       y = "Average Silhouette Width (ASW)",
+       color = "Clustering Method") +
+  theme_minimal() +
+  theme(legend.position = "bottom")
+```
+
 ```{r, echo=FALSE, fig.height=4.125, fig.width=5.5}
 #| label: "fig-silall"
 #| fig-cap: "ASW criterion values plotted against $K$ for $K$-Means, $K$-medoids (with the Euclidean, Manhattan, Minkowski ($p=3$), and Gower distances), and agglomerative hierarchical clustering based on Euclidean distance and the Ward criterion."
@@ -942,4 +957,4 @@ Overall, we encourage readers to further explore the potential of dissimilarity-
 
 
 ::: {#refs}
-:::
+:::