Clustering Algorithms
The K-means algorithm aims to minimise the distance between observations within the same cluster. It requires the number of clusters K (argument centers), which has no default. K-means uses random initialization to obtain the clusters, so different centroid coordinates and a different cluster ordering will be obtained at each realization. These and other features of the K-means algorithm can be tuned by passing clusterGrid the specific arguments of the kmeans function of the R package stats. An example is provided next:
library(transformeR)
data(NCEP_Iberia_psl, package = "transformeR")
Reanalysis sea level pressure over the Iberian Peninsula will be used as the dataset to obtain 10 CTs (the clusters):
clusters <- clusterGrid(NCEP_Iberia_psl, type = "kmeans", centers = 10, iter.max = 1000)
After that, the centroids of the CTs can be plotted with spatialPlot, a function in visualizeR, if we first process the output of clusterGrid with subsetGrid as follows:
cts <- lapply(1:attr(clusters, "centers"), function(x) {
  climatology(subsetGrid(clusters, cluster = x))
})
# A list of K grids has been created; the CT centroids are now located in the time dimension of the grids.
# "makeMultiGrid" can be used to create a multigrid containing all the elements of the CTs list.
cts.mg <- makeMultiGrid(cts, skip.temporal.check = TRUE)
visualizeR::spatialPlot(cts.mg, backdrop.theme = "coastline", rev.colors = TRUE, main="PSL Clusters from NCEP Iberia (Kmeans)", layout = c(2,ceiling(attr(clusters, "centers")/2)), as.table = TRUE)
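The effect of the random initialization described above can be illustrated directly with the kmeans function from stats. The following is a minimal sketch on synthetic data, not the clusterGrid workflow itself: the data matrix x and the helper run_kmeans are hypothetical names introduced only for this illustration.

```r
# Synthetic data: three well-separated groups of 2-D points (illustrative only).
set.seed(123)
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 4, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 8, sd = 0.5), ncol = 2)
)

# Hypothetical helper: fix the random seed, then run K-means with 3 clusters.
run_kmeans <- function(seed) {
  set.seed(seed)
  kmeans(x, centers = 3, iter.max = 1000)
}

fit.a <- run_kmeans(1)
fit.b <- run_kmeans(1)  # same seed as fit.a
fit.c <- run_kmeans(2)  # different seed

# Same seed: the cluster labels are reproduced exactly.
print(identical(fit.a$cluster, fit.b$cluster))

# Different seed: the cluster numbering may be permuted, so compare the
# partitions with a cross-tabulation rather than label by label.
print(table(fit.a$cluster, fit.c$cluster))
```

Calling set.seed before clusterGrid would presumably make the K-means CTs reproducible in the same way, although this is an assumption and is not stated in the package documentation quoted here.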
In contrast to K-means, the hierarchical algorithm does not require the number of clusters to be provided: the user may specify it or not. If centers is not provided, the number of clusters is set automatically, and the hierarchical "tree" is cut where the height difference between two consecutive divisions (sorted in ascending order) is larger than the interquartile range of the heights vector.
In this example, centers will not be provided, so the algorithm decides the number of clusters itself:
clusters <- clusterGrid(NCEP_Iberia_psl, type = "hierarchical")
The clusters will be plotted using spatialPlot after processing the data with subsetGrid and makeMultiGrid:
cts <- lapply(1:attr(clusters, "centers"), function(x) {
  climatology(subsetGrid(clusters, cluster = x))
})
cts.mg <- makeMultiGrid(cts, skip.temporal.check = TRUE)
visualizeR::spatialPlot(cts.mg, backdrop.theme = "coastline", rev.colors = TRUE, main="PSL Clusters from NCEP Iberia (Hierarchical)", layout = c(2,ceiling(attr(clusters, "centers")/2)), as.table = TRUE)
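The automatic cut rule described above (first height difference larger than the interquartile range of the heights) can be sketched with hclust and cutree from stats. This is one possible reading of the rule applied to synthetic data, not the code used inside clusterGrid; the linkage method and all variable names are assumptions made for the illustration.

```r
# Synthetic data: three groups of 20 two-dimensional points (illustrative only).
set.seed(1)
x <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2)
)

# Complete linkage is an assumption; it yields monotone merge heights.
hc <- hclust(dist(x), method = "complete")

heights   <- sort(hc$height)   # merge heights in ascending order
jumps     <- diff(heights)     # difference between consecutive heights
threshold <- IQR(heights)      # interquartile range of the heights vector

# Index of the first jump that exceeds the threshold (NA if there is none).
first.big <- which(jumps > threshold)[1]

# After i merges, n - i clusters remain; cut just before the first large jump.
k <- if (is.na(first.big)) 1 else nrow(x) - first.big

groups <- cutree(hc, k = k)
print(k)
print(table(groups))
```

Because the threshold depends on the whole distribution of merge heights, the number of clusters selected this way can be larger than the number of visually distinct groups.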
With the SOM algorithm, the argument centers is provided as a two-element vector indicating the dimensions {xdim, ydim} of the grid. If it is not provided, 48 clusters (8x6) with rectangular topology are obtained by default.
In this example, SOM is forced to create 10 CTs, which are then plotted:
clusters <- clusterGrid(NCEP_Iberia_psl, type = "som", centers = c(10, 1))
cts <- lapply(1:attr(clusters, "centers"), function(x) {
  climatology(subsetGrid(clusters, cluster = x))
})
cts.mg <- makeMultiGrid(cts, skip.temporal.check = TRUE)
visualizeR::spatialPlot(cts.mg, backdrop.theme = "coastline", rev.colors = TRUE, main="PSL Clusters from NCEP Iberia (SOM)", layout = c(2,ceiling(attr(clusters, "centers")/2)), as.table = TRUE)
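To make the relation between the {xdim, ydim} vector and the number of clusters concrete, the sketch below computes the number of SOM units for the two layouts mentioned above and, if the kohonen package is installed, fits a small map on synthetic data. Using kohonen directly is an assumption made for illustration (the package appears among the loaded namespaces in the session information), and the data matrix x is synthetic.

```r
# The number of SOM units (clusters) is the product of the grid dimensions.
centers <- c(10, 1)            # {xdim, ydim} as used in the example above
n.units <- prod(centers)       # 10 units arranged in a single row
n.default <- prod(c(8, 6))     # 48 units for the default 8x6 layout
print(c(n.units, n.default))

# Optional: fit a SOM with that layout using kohonen, if it is available.
if (requireNamespace("kohonen", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(200 * 5), ncol = 5)   # 200 synthetic observations
  som.grid <- kohonen::somgrid(xdim = centers[1], ydim = centers[2],
                               topo = "rectangular")
  fit <- kohonen::som(x, grid = som.grid, rlen = 100)
  # unit.classif holds the index of the winning unit for each observation.
  print(table(fit$unit.classif))
}
```

A 10x1 grid arranges the units along a line, so neighbouring CTs in the output are also neighbours on the map.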
Lamb Weather Types (LWTs) are one of the best known and most analysed weather type (WT) classifications, developed for the British Isles by Lamb (1972). The classification is applied to daily sea level pressure data and defines 26 different WTs: 10 pure types (NE, E, SE, S, SW, W, NW, N, C and A) and 16 hybrid types (8 for each of the C and A hybrids). For further information, see Jones et al. (2013) [1].
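The count of 26 types can be checked by enumerating them. In this short sketch the hybrid labels (for example "CNE" or "ANE") follow a naming convention assumed here for illustration; the labels actually returned by clusterGrid may differ.

```r
# Eight directional types plus the cyclonic (C) and anticyclonic (A) types.
directions <- c("NE", "E", "SE", "S", "SW", "W", "NW", "N")
pure <- c(directions, "C", "A")

# Hybrid types: each direction combined with C and with A (assumed labels).
hybrid <- c(paste0("C", directions), paste0("A", directions))

lwt <- c(pure, hybrid)
print(length(pure))    # number of pure types
print(length(hybrid))  # number of hybrid types
print(length(lwt))     # total number of LWTs
```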
In the following example, we use daily sea level pressure from the NCEP1 Reanalysis for the 2001-2010 period to obtain the LWTs using clusterGrid. This dataset is included in the transformeR package.
data(NCEP_slp_2001_2010, package = "transformeR")
clusters <- clusterGrid(NCEP_slp_2001_2010, type = "lamb")
Plot the spatial distribution of the LWTs:
cts <- lapply(1:attr(clusters, "centers"), function(x) {
  climatology(subsetGrid(clusters, cluster = x))
})
cts.mg <- makeMultiGrid(cts, skip.temporal.check = TRUE)
visualizeR::spatialPlot(cts.mg, backdrop.theme = "coastline", rev.colors = TRUE, main="PSL Clusters from NCEP (Lamb WTs)", as.table = TRUE)
[1] Jones, P. D., Harpham, C., & Briffa, K. R. (2013). Lamb weather types derived from reanalysis products. International Journal of Climatology, 33(5), 1129-1139. https://doi.org/10.1002/joc.3498
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=es_ES.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=es_ES.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] magrittr_1.5 transformeR_1.7.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 rmsfact_0.0.3 codetools_0.2-16 lattice_0.20-38 grid_3.6.2 spam_2.5-1 kohonen_3.0.10
[8] raster_3.0-12 sp_1.3-2 akima_0.6-2 cowsay_0.7.0 Matrix_1.2-18 fortunes_1.5-4 tools_3.6.2
[15] RcppEigen_0.3.3.7.0 maps_3.3.0 fields_10.3 parallel_3.6.2 abind_1.4-5 compiler_3.6.2 dotCall64_1.0-0
transformeR - Santander MetGroup (Univ. Cantabria - CSIC)