3 changes: 2 additions & 1 deletion book1/chapters/ch07-prediction/ch7-pred.qmd
@@ -671,7 +671,8 @@ explained in more detail below:
All built models and their evaluation measures are stored (in `models` and
`eval_measures` lists) so that they can later be compared.

-Going now into details of each step, we start with the creation of a
+In the *feature_creation* R script we can see the details of each step.
+We start with the creation of a
dataset to be used for predictive modelling in week *k*. This is done by
first computing all features based on the logged events data
(`events_data`) up to the week *k*, and then adding the course outcome
19 changes: 17 additions & 2 deletions book1/chapters/ch08-clustering/ch8-clus.qmd
@@ -698,7 +698,7 @@ Thereafter, one simply extracts the resulting partition by invoking `cutree()` w
hc_ward2 <- cutree(hc_euclidean_ward, h=45)
```

-The object `hc_ward2` is now simply an vector indicating the cluster-membership of each observation in the data set. We show only the first few, for brevity, and then tabulate this vector to compute the cluster sizes. However, interpretation of these clusters is more difficult than in the case of $K$-Means and $K$-Medoids, as there is no centroid or medoid prototype with which to characterise each cluster.
+The object `hc_ward2` is now simply a vector indicating the cluster-membership of each observation in the data set. We show only the first few, for brevity, and then tabulate this vector to compute the cluster sizes. However, interpretation of these clusters is more difficult than in the case of $K$-Means and $K$-Medoids, as there is no centroid or medoid prototype with which to characterise each cluster.

```{r}
head(hc_ward2)
@@ -770,6 +770,21 @@ silhouettes <- data.frame(K=2:K,

In @fig-silall, we plot these silhouettes against $K$ using `matplot()`, omitting the code to do so for brevity.

+```{r}
+silhouettes_long <- silhouettes |>
+  pivot_longer(cols = -K, names_to = "Method", values_to = "ASW")
+
+ggplot(silhouettes_long, aes(x = K, y = ASW, color = Method)) +
+  geom_line(size = 1) +
+  geom_point(size = 2) +
+  labs(title = "Average Silhouette Width (ASW) by Clustering Method and K",
+       x = "Number of Clusters (K)",
+       y = "Average Silhouette Width (ASW)",
+       color = "Clustering Method") +
+  theme_minimal() +
+  theme(legend.position = "bottom")
+```
+
```{r, echo=FALSE, fig.height=4.125, fig.width=5.5}
#| label: "fig-silall"
#| fig-cap: "ASW criterion values plotted against $K$ for $K$-Means, $K$-medoids (with the Euclidean, Manhattan, Minkowski ($p=3$), and Gower distances), and agglomerative hierarchical clustering based on Euclidean distance and the Ward criterion."
@@ -942,4 +957,4 @@ Overall, we encourage readers to further explore the potential of dissimilarity-


::: {#refs}
-:::
+:::