
Clustering Algorithms

juanferngran edited this page Feb 14, 2020 · 9 revisions

Introduction

In the following examples, the climate4R packages transformeR and visualizeR will be used.

library(transformeR)
library(visualizeR)

K-means:

The K-means algorithm [1] aims to minimize the distance between observations within the same cluster. It requires the number of clusters K (argument centers), which has no default. K-means uses random initialization to obtain the clusters, so the centroid coordinates and the cluster ordering may differ between runs. These and other features of the algorithm can be tuned by passing clusterGrid the arguments of the kmeans function from the R package stats. An example follows:

data(NCEP_Iberia_psl, package = "transformeR")

A reanalysis of sea-level pressure over the Iberian Peninsula is used as the dataset to obtain 10 CTs (the clusters):

clusters <- clusterGrid(NCEP_Iberia_psl, type = "kmeans", centers = 10, iter.max = 1000)

Next, the centroids of the CTs are plotted using spatialPlot. To this end, a list of K grids, one per CT, is produced with subsetGrid. Within the same lapply loop, the function climatology is applied to each subset to obtain its centroid:

cts <- lapply(1:attr(clusters, "centers"), function(x) {
  climatology(subsetGrid(clusters, cluster = x))
})
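The idea behind the loop above can be sketched with base R alone. The example below is only an illustration on synthetic data (not the NCEP grid): it assumes that taking the climatology of the days assigned to a cluster amounts to a time mean over those days, and checks that this mean coincides with the centroid reported by stats::kmeans. The matrix x and the object names are made up for the sketch.

```r
# Synthetic stand-in for a grid: 60 "days" (rows) x 2 "grid points" (columns).
# This is NOT the NCEP dataset, only an illustration.
set.seed(123)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2))

km <- kmeans(x, centers = 3, iter.max = 1000)

# For each cluster, average its member days (the analogue of subsetting
# by cluster and then taking a time mean).
manual_centroids <- t(sapply(seq_len(nrow(km$centers)), function(k) {
  colMeans(x[km$cluster == k, , drop = FALSE])
}))

# The per-cluster means should match the centroids stored by kmeans.
all.equal(manual_centroids, km$centers, check.attributes = FALSE)
```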

makeMultiGrid is next used to create a multigrid containing each CT centroid as a separate layer for plotting purposes:

cts.mg <- makeMultiGrid(cts, skip.temporal.check = TRUE)
spatialPlot(cts.mg, backdrop.theme = "coastline", rev.colors = TRUE,
            main = "PSL Clusters from NCEP Iberia (Kmeans)",
            layout = c(2, 5), as.table = TRUE)
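Because K-means starts from random centroids, two runs on the same data can return differently ordered (or even different) clusters. A minimal sketch of how to make a run repeatable, using stats::kmeans directly on synthetic data, is given below. The helper run_kmeans is hypothetical, and nstart is a standard kmeans argument that keeps the best of several random starts.

```r
# Synthetic data with three groups (illustration only, not the NCEP grid).
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2))

# Hypothetical helper: fix the random seed right before clustering.
run_kmeans <- function(seed) {
  set.seed(seed)
  kmeans(x, centers = 3, iter.max = 1000, nstart = 5)
}

a <- run_kmeans(1)
b <- run_kmeans(1)  # same seed: same labels and centroids
d <- run_kmeans(2)  # different seed: labels may be permuted

identical(a$cluster, b$cluster)

# Cross-tabulating two runs shows how cluster labels map onto each other.
table(run_a = a$cluster, run_d = d$cluster)
```

Assuming clusterGrid forwards these arguments to kmeans as described above, calling set.seed immediately before clusterGrid should have the same effect, although this has not been verified here.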

Hierarchical:

In contrast to K-means, the hierarchical algorithm does not require the number of clusters to be provided, although it can optionally be indicated. If centers is not provided, it is set automatically: the hierarchical tree is cut where the height difference between two consecutive divisions (sorted in ascending order) is larger than the interquartile range of the heights vector.

In this example centers is omitted, so the algorithm automatically decides the number of clusters:

clusters <- clusterGrid(NCEP_Iberia_psl, type = "hierarchical")

The clusters will be plotted using spatialPlot after processing the data with subsetGrid and makeMultiGrid (see the previous example using K-means):

cts <- lapply(1:attr(clusters, "centers"), function(x) {
  climatology(subsetGrid(clusters, cluster = x))
})
cts.mg <- makeMultiGrid(cts, skip.temporal.check = TRUE)
visualizeR::spatialPlot(cts.mg, backdrop.theme = "coastline", rev.colors = TRUE,
                        main = "PSL Clusters from NCEP Iberia (Hierarchical)",
                        layout = c(2,ceiling(attr(clusters, "centers")/2)), as.table = TRUE)
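The automatic choice of the number of clusters described at the start of this section can be sketched with stats::hclust. This is one possible reading of the rule, not the transformeR implementation: it assumes the cut is placed at the first gap between consecutive merge heights that exceeds the interquartile range, and it falls back to a single cluster when no gap qualifies. The function n_clusters_iqr and the synthetic data are hypothetical.

```r
# Synthetic data with three groups (illustration only).
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 4, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 8, sd = 0.3), ncol = 2))

hc <- hclust(dist(x), method = "complete")

# Hypothetical helper implementing the assumed rule.
n_clusters_iqr <- function(hc) {
  h <- sort(hc$height)                  # merge heights, ascending
  gaps <- diff(h)                       # difference between consecutive merges
  first_big <- which(gaps > IQR(h))[1]  # first gap larger than the IQR
  if (is.na(first_big)) return(1L)      # assumption: no large gap, one cluster
  # After 'first_big' merges, n - first_big clusters remain.
  length(hc$order) - first_big
}

k <- n_clusters_iqr(hc)
groups <- cutree(hc, k = k)
table(groups)
```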

SOM:

The self-organizing map (SOM) is an extension of the K-means algorithm in which the cluster centroids are self-organized into a user-friendly and efficient topology [2]. In SOM, the argument centers is provided as a two-element vector c(xdim, ydim) giving the dimensions of the grid. If it is omitted, 48 clusters (8x6) with rectangular topology are obtained by default.

In the following example, SOM is forced to create 10 CTs (a 10x1 grid), which are then plotted:

clusters <- clusterGrid(NCEP_Iberia_psl, type = "som", centers = c(10, 1))

cts <- lapply(1:attr(clusters, "centers"), function(x) {
  climatology(subsetGrid(clusters, cluster = x))
})
cts.mg <- makeMultiGrid(cts, skip.temporal.check = TRUE) 
spatialPlot(cts.mg, backdrop.theme = "coastline", rev.colors = TRUE,
            main = "PSL Clusters from NCEP Iberia (SOM)",
            layout = c(2,ceiling(attr(clusters, "centers")/2)), as.table = TRUE)
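The relation between the two-element centers vector and the number of CTs, as well as the panel layout used in the plot above, can be sketched in base R. The helpers som_nodes and panel_layout are hypothetical, and the node numbering is an assumption made for illustration; it is not necessarily the ordering used internally by clusterGrid.

```r
# Hypothetical helper: enumerate the nodes of a rectangular SOM grid.
som_nodes <- function(centers = c(8, 6)) {
  stopifnot(length(centers) == 2, all(centers >= 1))
  nodes <- expand.grid(x = seq_len(centers[1]), y = seq_len(centers[2]))
  nodes$cluster <- seq_len(nrow(nodes))  # assumed numbering, for illustration
  nodes
}

default_nodes <- som_nodes()          # 8 x 6 grid
line_nodes <- som_nodes(c(10, 1))     # 10 x 1 grid, as in the example above

nrow(default_nodes)  # 48
nrow(line_nodes)     # 10

# Two columns and as many rows as needed, as in the layout argument above.
panel_layout <- function(n) c(2, ceiling(n / 2))
panel_layout(nrow(line_nodes))  # 2 5
```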

Lamb Weather Types:

Lamb Weather Types (LWTs) are among the best-known and most analysed weather types (WTs). They were developed for the British Isles by Lamb (1972) [3]. The classification is applied to daily sea-level pressure data and defines 26 different WTs: 10 pure types (NE, E, SE, S, SW, W, NW, N, cyclonic C and anticyclonic A) and 16 hybrid types (8 for each of C and A). For further information, see Jones et al. (2013) [4].
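The count of 26 types can be checked by enumerating them in R. The labels used for the hybrid types here (for example "CNE" or "ANE") are an assumed naming convention for illustration and may not match the labels returned by clusterGrid.

```r
# The eight directional types.
directions <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")

# Pure types: eight directions plus cyclonic (C) and anticyclonic (A).
pure <- c(directions, "C", "A")

# Hybrid types: each direction combined with C or with A (assumed labels).
hybrid <- c(paste0("C", directions), paste0("A", directions))

lwt <- c(pure, hybrid)

length(pure)    # 10
length(hybrid)  # 16
length(lwt)     # 26
```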

In the following example, we use daily sea-level pressure from the NCEP1 Reanalysis for the 2001-2010 period to obtain the LWTs using clusterGrid. This dataset is included in the transformeR package.

data(NCEP_slp_2001_2010, package = "transformeR")

clusters <- clusterGrid(NCEP_slp_2001_2010, type = "lamb")

Plot the spatial distribution of the LWTs:

cts <- lapply(1:attr(clusters, "centers"), function(x) {
  climatology(subsetGrid(clusters, cluster = x))
})
cts.mg <- makeMultiGrid(cts, skip.temporal.check = TRUE)
visualizeR::spatialPlot(cts.mg, backdrop.theme = "coastline", rev.colors = TRUE,
                        main = "PSL Clusters from NCEP (Lamb WTs)",
                        as.table = TRUE)

[1] Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press, New York.

[2] Kohonen, T. (2001). Self-Organizing Maps. Third, extended edition. Springer.

[3] Lamb, H. H. (1972). British Isles weather types and a register of daily sequence of circulation patterns, 1861–1971. Geophysical Memoir 116, HMSO, London.

[4] Jones, P. D., Harpham, C., & Briffa, K. R. (2013). Lamb weather types derived from reanalysis products. International Journal of Climatology, 33(5), 1129-1139. https://doi.org/10.1002/joc.3498

Session Info

sessionInfo(package = c("transformeR", "visualizeR"))
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
character(0)

other attached packages:
[1] transformeR_1.7.2 visualizeR_1.5.1 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3              compiler_3.6.2          RColorBrewer_1.1-2      methods_3.6.2           utils_3.6.2             tools_3.6.2            
 [7] grDevices_3.6.2         boot_1.3-24             dotCall64_1.0-0         vioplot_0.3.2           lattice_0.20-38         Matrix_1.2-18          
[13] parallel_3.6.2          spam_2.5-1              akima_0.6-2             padr_0.5.0              raster_3.0-12           graphics_3.6.2         
[19] mapplots_1.5.1          datasets_3.6.2          stats_3.6.2             fields_10.3             maps_3.3.0              grid_3.6.2             
[25] base_3.6.2              data.table_1.12.6       dtw_1.21-3              pbapply_1.4-2           tcltk_3.6.2             sm_2.2-5.6             
[31] SpecsVerification_0.5-2 sp_1.3-2                latticeExtra_0.6-28     magrittr_1.5            scales_1.0.0            codetools_0.2-16       
[37] CircStats_0.2-6         MASS_7.3-51.5           abind_1.4-5             colorspace_1.4-1        proxy_0.4-23            munsell_0.5.0          
[43] kohonen_3.0.10          verification_1.42       easyVerification_0.4.4  RcppEigen_0.3.3.7.0     zoo_1.8-6