forked from e-kotov/ru-resilient-arctic-clusters
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpaper.Rmd
executable file
·220 lines (147 loc) · 9.39 KB
/
paper.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
---
title: "Clustering of Russian Arctic Cities: data, R code
and plots"
authors:
- name: Zamyatina, N., Kotov, E., Goncharov, R., Burceva, A., Grebenec, V., Medvedkov, A., Molodcova, V., Kljueva, V., Kulchitsky, Y., Mironova, B., Nikitin, B., Pilyasov, A., Polyachenko, A., Poturaeva, A., Streletskiy, D., & Shamalo, I.
output:
# github_document
html_document:
theme: readable
toc: yes
number_sections: yes
toc_depth: 5
toc_float:
collapsed: no
smooth_scroll: yes
editor_options:
chunk_output_type: console
always_allow_html: true
---
# Introduction
Zamyatina, N., Kotov, E., Goncharov, R., Burceva, A., Grebenec, V., Medvedkov, A., Molodcova, V., Kljueva, V., Kulchitsky, Y., Mironova, B., Nikitin, B., Pilyasov, A., Polyachenko, A., Poturaeva, A., Streletskiy, D., & Shamalo, I. (2022). Resilience Potential of the Russian Arctic Cities. Vestnik Moskovskogo Universiteta. Seria 5, Geografia (In Russian), 5, 52–65. URL: [https://vestnik5.geogr.msu.ru/jour/article/view/1065](https://vestnik5.geogr.msu.ru/jour/article/view/1065){target='_blank'}
**Abstract**
Resilience is the ability of urban systems to overcome natural or manufactured crises. It is regarded as a complementary concept to that of sustainable development. Application of the concept of resilience is particularly relevant in the Arctic, where both natural and economic systems are particularly vulnerable. The article analyzes 19 quantitative indicators for 27 Arctic settlements of the Russian Federation according to the following subsystems: economic specialization, life support and communal services, socio-cultural, natural-ecological, administrative and managerial. Cluster analysis identified 7 groups of cities that consistently demonstrate similarity under different versions of analysis. Overcoming crises in a city development requires simultaneous resilience in different subsystems of urban development; the weakness of any of these subsystems could cause the collapse of the entire system. Therefore, the assessment of resilience requires an integrated approach.
**Key words:**
sustainable development, Arctic cities, socio-ecological systems
**Note:**
All code is visible for reference. Some custom functions are saved in `/R/functions.R`.
Repository: [https://github.com/e-kotov/ru-resilient-arctic-clusters](https://github.com/e-kotov/ru-resilient-arctic-clusters)
## Loading packages
```{r setup, include=T, cache = F, echo=TRUE, message=FALSE, warning=FALSE}
# pre-install key packages
if( !"renv" %in% installed.packages()[,'Package'] ) { install.packages("renv") }
if( !"here" %in% installed.packages()[,'Package'] ) { install.packages("here") }
# install other packages
source(here::here("R", "functions.R"))
knitr::opts_chunk$set(cache = FALSE)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warning = FALSE)
# Set so that long lines in R will be wrapped:
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=60), tidy=TRUE)
invisible(lapply(eval(packages_to_load), require, character.only = TRUE))
```
## Loading data
Load the data set and replace the Russian city names with transliterated versions.
```{r load_data, include=T}
settlements <- fread(here::here("data", "arctic_settlements_stats.csv"))
ru_en_names <- fread(here::here("data", "ru_en_names.csv"))
codebook <- fread(here::here("codebook", "codebook.csv"))
settlements <- settlements %>%
rename(Settlement_Name_ru = Settlement_Name) %>%
left_join(y = ru_en_names, by = "Settlement_Name_ru") %>%
relocate(Settlement_Name_en, .before = Settlement_Name_ru) %>%
mutate(Settlement_Name = paste0(Settlement_Name_en, "\n(", Settlement_Name_ru, ")")) %>%
relocate(Settlement_Name, .before = Settlement_Name_en) %>%
dplyr::select(-Settlement_Name_en, -Settlement_Name_ru)
DT::datatable(settlements, options = list(scrollX = TRUE, paging=TRUE))
```
# Data preprocessing
Scale and centre the data before clustering.
```{r prep_for_clustering}
settlements_scaled <- scale(settlements %>% dplyr::select(-Settlement_Name), center = T, scale = T)
row.names(settlements_scaled) <- settlements$Settlement_Name
```
# UMAP + GMM
Here we iterate 1000 times with UMAP dimensionality reduction and cluster the results using GMM with automatic selection of optimal number of clusters. Since 100 runs of UMAP is time consuming, the results are cached. To rerun the analysis, remove the `./outputs/cache/um_gmm_clusters.rds` file.
```{r cluster_umaps}
um_gmm_cache_path <- here::here("outputs", "cache", "um_gmm_clusters.rds")
if( fs::file_exists(um_gmm_cache_path) == FALSE) {
set.seed(84756)
seeds <- sample(1:2^15, 1000)
plan(multisession, workers = (parallel::detectCores() - 1) )
um_resamples <- seeds %>%
future_map(~ helper_umap_df(x = dtx_scaled, seed = .x, dims = 2),
.progress = T,
.options = furrr_options(seed = NULL))
um_resamples_gmm <- um_resamples %>% future_map(~ helper_gmm_clust(scale(.x), kk = 1:15, seed = 123),
.options = furrr_options(seed = NULL))
um_resamples_gmm_clusters <- um_resamples_gmm %>% purrr::map( ~ .x[["clust"]][["classification"]] )
saveRDS(um_resamples_gmm_clusters, um_gmm_cache_path)
} else {
um_resamples_gmm_clusters <- readRDS(um_gmm_cache_path)
}
```
# Transform data into network structure
We find all pairs of cities that fell into the same cluster. Every settlement is a node, every edge between settlements is evidence of them being part of the same cluster. The edge weight is the count of how many times (out of 1000 reruns of UMAP + GMM) a pair of settlements belonged to the same cluster.
```{r um_pre_graph, include=TRUE}
umap_city_pairs_w <- um_resamples_gmm_clusters %>%
purrr::map( ~ helper_create_city_pairs(city_names = rownames(settlements_scaled), .x) ) %>%
rbindlist(.) %>%
.[ , .(from = V1, to = V2), ] %>%
.[ , .(weight = as.numeric(.N)), by = .(from, to)]
DT::datatable(umap_city_pairs_w, options = list(scrollX = TRUE, paging=TRUE))
```
As we can see in the histogram below, there is a significant number of settlements that were never grouped in the same cluster despite certain randomness in UMAP dimensionality reduction followed by the GMM clustering. We can also observe a set of settlement pairs that almost always ended up in the same cluster in over 90% of reruns of UMAP + GMM, which means we can be confident that those settlements do have similar characteristics.
```{r graph_hist}
umap_city_pairs_w %>%
ggplot(aes(x = weight)) +
geom_histogram(binwidth = 10) +
labs(title = "Number of times\na pair of settlements\nbelonged to the same cluster",
x = "Times in the same cluster") +
theme_pubclean()
```
# Visualise the network-based groups
Here we visualise the clustering results using a network representation. The weight of the edges shows how confident we are that a pair of settlements is similar in terms of their resiliency indicators.
```{r network_vis}
netw_plot_path <- here::here("outputs", "plots", "umap_net.html")
umapgt <- prep_graph(DF = umap_city_pairs_w, cut_off = c(85,90,95) ,
colour_scale = c('#DFDFDF', 'lightpink', 'hotpink', 'brown'))
umap_vis <- vis_net(umapgt, cut_off_weight = 1, font_size = 40, width = "100%", height = "100%", physics_on = TRUE)
visNetwork::visSave(umap_vis, file = netw_plot_path, selfcontained = TRUE)
umap_vis
```
# Set the clusters and show the indicators
After examining the network representation above with our Arctic experts, we have settled on a cluster that was manually assigned to each settlement.
```{r set_expert_clusters}
expert_clusters <- fread(here::here("data", "manual_clusters.csv"))
expert_clusters <- expert_clusters %>%
rename(Settlement_Name_ru = Settlement_Name) %>%
left_join(y = ru_en_names, by = "Settlement_Name_ru") %>%
relocate(Settlement_Name_en, .before = Settlement_Name_ru) %>%
mutate(Settlement_Name = paste0(Settlement_Name_en, "\n(", Settlement_Name_ru, ")")) %>%
relocate(Settlement_Name, .before = Settlement_Name_en) %>%
dplyr::select(-Settlement_Name_en, -Settlement_Name_ru)
DT::datatable(expert_clusters, options = list(scrollX = TRUE, paging=TRUE))
settlements_cl <- settlements %>%
left_join(expert_clusters, by = "Settlement_Name") %>%
relocate(Settlement_Name, Cluster, .before = AUT_admin_lev) %>%
as.data.frame()
row.names(settlements_cl) <- settlements_cl$Settlement_Name
```
```{r}
settlements_cl_long_var_names <- copy(settlements_cl)
names_dt <- data.table(var_id = names(settlements_cl)[ names(settlements_cl) %in% codebook$var_id] )
names_dt <- merge(names_dt, codebook[ var_id %in% names_dt$var_id, .(var_id, var_name_short_en)], by = "var_id")
setnames(settlements_cl_long_var_names, old = names_dt$var_id, new = names_dt$var_name_short_en)
```
We now add these clusters to the original data set and visualise the indicator values for every cluster.
```{r bee_plot}
k_bee_plot <- helper_hists_with_clusters(df = settlements_cl_long_var_names %>% select(-Settlement_Name, -Cluster),
clust = settlements_cl_long_var_names %>% pull(Cluster),
n_col = 2)
k_bee_plot_inter <- ggplotly(k_bee_plot, tooltip = "text", height = 1400)
htmlwidgets::saveWidget(k_bee_plot_inter, file = here::here("outputs", "plots", "umap_net_groups.html"))
ggsave(filename = here::here("outputs", "plots", "umap_net_groups.png"), k_bee_plot, width = 10, height = 14, dpi = 400)
k_bee_plot_inter
```