From 83db7c186ab38f031e95ca931dfb3aac03d6472f Mon Sep 17 00:00:00 2001
From: Johan <3921204+johan-mattias@users.noreply.github.com>
Date: Thu, 10 Jul 2025 14:49:36 +0200
Subject: [PATCH 1/3] Update ch7-pred.qmd

Added a clarification that this code is in the provided R script.
---
 book1/chapters/ch07-prediction/ch7-pred.qmd | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/book1/chapters/ch07-prediction/ch7-pred.qmd b/book1/chapters/ch07-prediction/ch7-pred.qmd
index ad74893..43b9ac9 100644
--- a/book1/chapters/ch07-prediction/ch7-pred.qmd
+++ b/book1/chapters/ch07-prediction/ch7-pred.qmd
@@ -671,7 +671,8 @@ explained in more detail below:
 All built models and their evaluation measures are stored (in `models`
 and `eval_measures` lists) so that they can later be compared.
 
-Going now into details of each step, we start with the creation of a
+The details of each step can be found in the *feature_creation* R script.
+We start with the creation of a
 dataset to be used for predictive modelling in week *k*.
 This is done by first computing all features based on the logged events
 data (`events_data`) up to the week *k*, and then adding the course outcome
From d42893e3fc6ef2e43adad6eea8861332df7670f1 Mon Sep 17 00:00:00 2001
From: Johan <3921204+johan-mattias@users.noreply.github.com>
Date: Fri, 11 Jul 2025 13:24:42 +0200
Subject: [PATCH 2/3] Update ch8-clus.qmd

Fix a small typo ("an vector" -> "a vector"). Also add the missing
newline at the end of the file.
---
 book1/chapters/ch08-clustering/ch8-clus.qmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/book1/chapters/ch08-clustering/ch8-clus.qmd b/book1/chapters/ch08-clustering/ch8-clus.qmd
index ce99963..5360e0e 100644
--- a/book1/chapters/ch08-clustering/ch8-clus.qmd
+++ b/book1/chapters/ch08-clustering/ch8-clus.qmd
@@ -698,7 +698,7 @@ Thereafter, one simply extracts the resulting partition by invoking `cutree()` w
 hc_ward2 <- cutree(hc_euclidean_ward, h=45)
 ```
 
-The object `hc_ward2` is now simply an vector indicating the cluster-membership of each observation in the data set. We show only the first few, for brevity, and then tabulate this vector to compute the cluster sizes. However, interpretation of these clusters is more difficult than in the case of $K$-Means and $K$-Medoids, as there is no centroid or medoid prototype with which to characterise each cluster.
+The object `hc_ward2` is now simply a vector indicating the cluster-membership of each observation in the data set. We show only the first few, for brevity, and then tabulate this vector to compute the cluster sizes. However, interpretation of these clusters is more difficult than in the case of $K$-Means and $K$-Medoids, as there is no centroid or medoid prototype with which to characterise each cluster.
 
 ```{r}
 head(hc_ward2)
@@ -942,4 +942,4 @@ Overall, we encourage readers to further explore the potential of dissimilarity-
 
 
 ::: {#refs}
-:::
\ No newline at end of file
+:::
From ace51c370ca9c14becefd7db9edad065c81dda62 Mon Sep 17 00:00:00 2001
From: Johan <3921204+johan-mattias@users.noreply.github.com>
Date: Fri, 11 Jul 2025 13:45:06 +0200
Subject: [PATCH 3/3] Added R code to visualize ASW results

I think this could be added because it's nice to be able to see the
results for your own data when following the lesson.
---
 book1/chapters/ch08-clustering/ch8-clus.qmd | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/book1/chapters/ch08-clustering/ch8-clus.qmd b/book1/chapters/ch08-clustering/ch8-clus.qmd
index 5360e0e..cce9bac 100644
--- a/book1/chapters/ch08-clustering/ch8-clus.qmd
+++ b/book1/chapters/ch08-clustering/ch8-clus.qmd
@@ -770,6 +770,21 @@ silhouettes <- data.frame(K=2:K,
 
 In @fig-silall, we plot these silhouettes against $K$ using `matplot()`, omitting the code to do so for brevity.
 
+```{r}
+silhouettes_long <- silhouettes |>
+  pivot_longer(cols = -K, names_to = "Method", values_to = "ASW")
+
+ggplot(silhouettes_long, aes(x = K, y = ASW, color = Method)) +
+  geom_line(linewidth = 1) +
+  geom_point(size = 2) +
+  labs(title = "Average Silhouette Width (ASW) by Clustering Method and K",
+       x = "Number of Clusters (K)",
+       y = "Average Silhouette Width (ASW)",
+       color = "Clustering Method") +
+  theme_minimal() +
+  theme(legend.position = "bottom")
+```
+
 ```{r, echo=FALSE, fig.height=4.125, fig.width=5.5}
 #| label: "fig-silall"
 #| fig-cap: "ASW criterion values plotted against $K$ for $K$-Means, $K$-medoids (with the Euclidean, Manhattan, Minkowski ($p=3$), and Gower distances), and agglomerative hierarchical clustering based on Euclidean distance and the Ward criterion."