-
Notifications
You must be signed in to change notification settings - Fork 0
/
01-united-nations.qmd
763 lines (611 loc) · 24.9 KB
/
01-united-nations.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
# M49 Classification {#sec-m49}
```{r}
#| label: setup
#| results: hold
#| include: false
base::source(file = "R/helper.R")
```
## Objectives {#sec-m49-objectives}
:::::: {#obj-m49-objectives}
::::: my-objectives
::: my-objectives-header
Chapter section list
:::
::: my-objectives-container
1. Locate the file for the UN M49 geoscheme classification data, get it
and save it as raw data (@sec-m49-download).
2. Inspect the data thoroughly (@sec-m49-inspect-raw).
3. Clean the data (first step) (@sec-m49-clean-data-1).
4. Display the grouping results (@sec-m49-display-result-1).
5. Clean the intermediate region to get the official results
(@sec-m49-clean-data-2).
6. Display the new intermediate region group (@sec-m49-display-result-2).
7. List UN and non-UN member states {@sec-m49-un-member-states}
8. Summary and conclusion {@sec-m49-summary}
:::
:::::
::::::
## Download M49 Data {#sec-m49-download}
An important --- maybe the most authoritative --- classification system
is developed and maintained by the United Nations. It is expressively
developed for statistical purposes by the United Nations Statistics
Division `r glossary("UNSD")` using the `r glossary("M49")` methodology.
The result is called [Standard country or area codes for statistical use
(M49)](https://unstats.un.org/unsd/methodology/m49/) and can be
downloaded manually in different languages and formats (Copy into the
clipboard, Excel or CSV) from the [United Nations Methodology Overview
page](https://unstats.un.org/unsd/methodology/m49/overview/).
:::::: my-resource
:::: my-resource-header
::: {#lem-m49-unsd}
: United Nations Statistics Division M49 Classification
:::
::::
::: my-resource-container
- [Manual
download](https://unstats.un.org/unsd/methodology/m49/overview/):
The standard country or area codes for statistical use (M49) is
available in different languages (English, Chinese, Russian, French,
Spanish, Arabic) by clicking one of the buttons "Copy", "Excel" or
"CSV". On this page is no URL for an programmable download with an R
script available, because Javascript triggers the buttons mentioned
above.
- [Automatic download by OMNIKA
store](https://github.com/omnika-datastore/unsd-m49-standard-area-codes):
I found with [OMNIKA
DataStore](https://omnika.org/datastore)[^01-united-nations-1] an
external source to download the classification file via R
script[^01-united-nations-2].
The `r glossary("OMNIKA")` URLs for download are:
- **EXCEL**:
[2022-09-24\_\_Excel_UNSD_M49.xlsx](https://github.com/omnika-datastore/unsd-m49-standard-area-codes/raw/refs/heads/main/2022-09-24__Excel_UNSD_M49.xlsx)
- **CSV**:
[2022-09-24\_\_CSV_UNSD_M49.csv](https://github.com/omnika-datastore/unsd-m49-standard-area-codes/raw/refs/heads/main/2022-09-24__CSV_UNSD_M49.csv)
:::
::::::
[^01-united-nations-1]: OMNIKA DataStore is an open-access data science
resource for researchers, authors, and technologists. It is 501c3
nonprofit organization whose mission is to digitize, organize, and
make important (free) contents available for the general public. The
service provides raw data from trusted sources, data visualizations,
data analysis tools, and other digital resources.
[^01-united-nations-2]: A check with `base::all.equal()` turned out that
the files from the two different sources (UNSD and OMNIKA) are
identical.
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-m49-classification-download}
: Download the United Nations M49 Classification
:::
::::
::: my-r-code-container
<center>**Run this code chunk manually if the file still needs to be
downloaded.**</center>
```{r}
#| label: m49-raw
#| code-fold: show
#| eval: false
## create folders ###########
pb_create_folder(base::paste0(here::here(), "/data/"))
pb_create_folder(base::paste0(here::here(), "/data/unsd"))
## download m49 file ############
url <- "https://github.com/omnika-datastore/unsd-m49-standard-area-codes/raw/refs/heads/main/2022-09-24__CSV_UNSD_M49.csv"
downloader::download(
url = url,
destfile = base::paste0(here::here(),
"/data/unsd/m49_raw.csv")
)
## create R object ###############
m49_raw <-
readr::read_delim(
file = base::paste0(here::here(),
"/data/unsd/m49_raw.csv"),
delim = ";"
)
## save as .rds file ################
pb_save_data_file(
"unsd",
m49_raw,
"m49_raw.rds")
```
<center>(*For this R code chunk is no output available*)</center>
:::
::::::
## Inspect M49 Data {#sec-m49-inspect-raw}
To get an detailed understanding of the data structures I will provide
the following two outputs of the raw-data:
1. A summary statistics with `skimr::skim()` followed by inspection of
the first data rows with `dplyr::glimpse()`.
2. Several detailed outputs of the classifications categories (regions)
and their elements (countries) in different code chunks (tabs).
To facilitate the second task I have prepared the function
`pb_class_scheme()` and stored in "R./helper.r".
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-m49-inspect-raw}
: Inspect raw data of the UNSD M49 geoscheme classification
:::
::::
::: my-r-code-container
```{r}
#| label: inspect-m49-clean
#| results: hold
m49_raw <- base::readRDS("data/unsd/m49_raw.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_raw)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_raw)
```
:::
::::::
## Special cases
### Missing values
The file has 15 columns as you can also see online from the [Overview
page](https://unstats.un.org/unsd/methodology/m49/overview/).
The many missing values (`NAs`) for the categories `r glossary("LDC")`,
`r glossary("LLDC")` and `r glossary("SIDS")` are easy explained: These
three columns are coded with an 'x' if the country of this row belong to
this category. Recoding these three columns with 1 and 0 (1 = yes,
belongs to this category, 0 = no, does not belong to this category) will
reduce most of their missing values.
### Antarctica
One missing value in the regional categories (Region, Sub-Region and
Intermediate Region) is related to Antarctica which is not seen by the
M49 scheme as a separated region. It has therefore no regional codes and
names with the exception of the overall comprising global region. But it
has M49 as well ISO-alpha codes.
### Channel Island Sark
One of the missing values for ISO-alpha2 and ISO-alpha3 is related to
Sark, which is "recognized by the United Nations Statistics Division
(UNSD) as a separate territory" but was not accepted by ISO now for more
than 20 years [@mccarthy-2020]. 2020 a new 54-page submission for an ISO code (see
[PDF](https://www.sarkid.org/assets/pdf/SarkID%20Identity%20info%20v1_2.pdf)) was applied to ISO but it seems still under consideration, because currently [^01-united-nations-3] Sark has no [ISO 3166 codes](https://www.iso.org/iso-3166-country-codes.html).
For further processing of the M49 dataset, however, it is crucial to have a complete list of ISO Codes because these are the columns where the joining of two dataset will be linked. There exist two possibilities to get a complete list for all UN entries for the ISO codes:
1. Remove the Sark entry from the dataset.
2. Add ISO-alpha2 and ISO-alpha3 for Sark.
The second possibility is not so out of the hand, because it is possible that Sark will succeed finally with its application. In that case it is highly likely that it will get "CQ" for ISO-alpha2 and "SCQ" for the ISO-alpha3 Code. (The other possibility "sk" is already by Slovakia and Sercq is the original Norman dialect spelling of the island. See [After 20-year battle, Channel island Sark finally earns the right to exist on the internet with its own top-level domain](https://www.theregister.com/2020/03/23/sark_cctld_iso/) [@mccarthy-2020]).
[^01-united-nations-3]: Even if United Nations and ISO both list 249
entries (including Antarctica) there is a small difference between
ISO 3166 codes and the M49 geoscheme of the UN: The UN lists Sark
(without ISO codes), whereas the list of ISO includes "Taiwan
(Province of China)" with the ISO codes "TW" and "TWN".
### Namibia
The other missing value for ISO-alpha2 codes belongs to Namibia because
its abbreviation `NA` is interpreted by R as a missing value!
## Clean Data (first step) {#sec-m49-clean-data-1}
### Procedure
:::::: my-procedure
:::: my-procedure-header
::: {#prp-m49-clean-1}
: Clean M49 data of the UNSD geoscheme classification (first step)
:::
::::
::: my-procedure-container
1. Load the original M49 dataset ("m49_raw.rds")
2. Remove the global codes and global names because they a redundant:
All rows have global code "001" ("World").
3. Shorten long names to their abbreviation ("LCD", "LLCD" and "SIDS").
4. Remove row "Antarctica" because it is not seen as separate country.
5. Add ISO Codes "CQ" and "SCQ" to `Country or Area` of Sark.
6. Replace `NA` in the column ISO-alpha2 Code" of Namibia with the
string "NA".
7. Recode the columns `r glossary("LDC")`, `r glossary("LLDC")` and
`r glossary("SIDS")` with 0 and 1.
8. Relocate columns `ISO-alpha3 CODE` and `Country or Area` to the
first two columns because these two columns are always relevant for
the later groupings and joining with groupings from other sources.
9. Sort the data alphabetically by `Country or Area`.
:::
::::::
### Result (first step) {#sec-m49-display-result-1}
#### Data Structure
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-m49-clean-1}
: Clean UNSD M49 geoscheme classification data (first step)
:::
::::
::: my-r-code-container
```{r}
#| label: clean-m49-1
#| results: hold
## column renaming vector ########
m49_cols = c(
LDC = "Least Developed Countries (LDC)",
LLDC = "Land Locked Developing Countries (LLDC)",
SIDS = "Small Island Developing States (SIDS)"
)
## clean data ###############################
m49_clean <- base::readRDS("data/unsd/m49_raw.rds") |> # (1)
dplyr::select(-(1:2)) |> # (2)
dplyr::rename(tidyselect::all_of(m49_cols)) |> # (3)
dplyr::filter(`Country or Area` != "Antarctica") |> # (4)
dplyr::mutate(
`ISO-alpha2 Code` =
base::ifelse(`Country or Area` == "Sark",
"CQ", `ISO-alpha2 Code`), # (5a)
`ISO-alpha3 Code` =
base::ifelse(`Country or Area` == "Sark",
"SCQ", `ISO-alpha3 Code`), # (5b)
`ISO-alpha2 Code` =
base::ifelse(`Country or Area` == "Namibia",
"NA", `ISO-alpha2 Code`) # (6)
) |>
dplyr::relocate(
## any_of() does not understand object names (??)
tidyselect::any_of(
c("ISO-alpha3 Code", "Country or Area")),
.before = `Region Code`) |> # (7)
# .x = anonymous function; "x" = value in cols of m40_clean
dplyr::mutate(dplyr::across(
LDC:SIDS, ~ dplyr::if_else(.x == "x", 1, 999, 0) # (8)
)) |>
dplyr::arrange(`Country or Area`) # (9)
## save new tibble ##########
pb_save_data_file(
"unsd",
m49_clean,
"m49_clean.rds"
)
## display results ##########
m49_clean <- base::readRDS("data/unsd/m49_clean.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_clean)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_clean)
```
------------------------------------------------------------------------
As explained above, only Sark has no value for `ISO-alpha2 Code` and
`ISO-alpha3 Code` .
:::
::::::
#### Regional Groups
As we can see from @cnj-m49-clean-1 the M49 classification of the United
Nations knows three different regional groups (in addition to the
overall region `World`.)
:::::::::::::::::::: my-code-collection
::::: my-code-collection-header
::: my-code-collection-icon
:::
::: {#exm-m49-show-regional-groups}
: Display different regional groups
:::
:::::
:::::::::::::::: my-code-collection-container
::::::::::::::: panel-tabset
###### Region
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-m49-show-region-group}
: Show Regions of the UNSD M49 Geoscheme
:::
::::
::: my-r-code-container
```{r}
#| label: show-m49-region
(
m49_region <- pb_class_scheme(
df = base::readRDS("data/unsd/m49_clean.rds"),
sel1 = rlang::quo(`Country or Area`),
sel2 = rlang::quo(`Region Name`)
)
)
```
------------------------------------------------------------------------
"Region" is a classification scheme with **`r sum(m49_region$x$data$N)`
countries in `r length(m49_region$x$data$N)` regions**.
:::
::::::
###### Sub-region
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-m49-show-sub-regional-groups}
: Show Sub-regions of the UNSD M49 Geoscheme
:::
::::
::: my-r-code-container
```{r}
#| label: show-m49-sub-region
(
m49_sub_region <- pb_class_scheme(
df <- base::readRDS("data/unsd/m49_clean.rds"),
sel1 = rlang::quo(`Country or Area`),
sel2 = rlang::quo(`Sub-region Name`)
)
)
```
------------------------------------------------------------------------
"Sub-region" is a classification scheme with
**`r sum(m49_sub_region$x$data$N)` countries in
`r length(m49_sub_region$x$data$N)` regions**.
:::
::::::
###### Intermediate
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-m49-show-intermediate-regional-groups}
: : Show Intermediate Regions of the UNSD M49 Geoscheme
:::
::::
::: my-r-code-container
```{r}
#| label: show-m49-intermediate-region
(
m49_intermediate_region <- pb_class_scheme(
df = base::readRDS("data/unsd/m49_clean.rds"),
sel1 = rlang::quo(`Country or Area`),
sel2 = rlang::quo(`Intermediate Region Name`)
)
)
```
------------------------------------------------------------------------
"Intermediate Region" is a classification scheme with
**`r sum(m49_intermediate_region$x$data$N)` countries in
`r length(m49_intermediate_region$x$data$N)` regions**.
:::
::::::
:::::::::::::::
::::::::::::::::
::::::::::::::::::::
## Clean Data (second step) {#sec-m49-clean-data-2}
The intermediate grouping does not result into the expected 22 (with
Antarctica: 23) different regions as is mentioned in many documents. See
for instance the [Article on Wikipedia about the UN
geoscheme](https://en.wikipedia.org/wiki/United_Nations_geoscheme) which
features a [colored world
map](https://en.wikipedia.org/wiki/United_Nations_geoscheme#/media/File:United_Nations_geographical_subregions.png)
and a list of countries grouped into the 22 different regions.
![M49 Geoscheme developed and maintained by the United Nations
Statistics Divisions (UNSD).
(<a href="http://creativecommons.org/licenses/by-sa/3.0/" title="Creative Commons Attribution-Share Alike 3.0">CC
BY-SA 3.0</a>,
<a href="https://commons.wikimedia.org/w/index.php?curid=497598">Wikimedia
Commons</a>)](img/chap01_UN_geographical_subregions.png){#fig-chap01_UN_geographical_subregions
fig-alt="22 geographical sub-regions as defined by the UNSD are shown with different colors. Antarctica is not shown."
fig-align="center" width="100%"}
The solution is that we have the `NA`s in `Intermediate Region Name` to
replace with the values of the sub-regions. Additionally --- as can be
seen in @cnj-m49-show-intermediate-regional-groups --- there is a second
small problem: Three small countries are listed as an extra group
"Channel Islands". To get the official intermediate grouping we need to
get rid of this group and sort all three of them into the category of
"Northern Europe".
### Procedure
:::::: my-procedure
:::: my-procedure-header
::: {#prp-m49-cnj-clean-2}
: Clean M49 data of the UNSD geoscheme classification (second step)
:::
::::
::: my-procedure-container
1. Replace the `NA` values of `Intermediate Region Name` with values
from the `Sub-region Name` column.
2. Replace the `NA` values of `Intermediate Region Code` with values
from the `Sub-region Code` column.
3. Replace the "Channel Islands" values in `Intermediate Region Name`
with the value of "Northern Europe".
4. Replace the "Channel Islands" values ("830") in
`Intermediate Region Code` with the code of "Northern Europe"
("154").
:::
::::::
### Result (second step) {#sec-m49-display-result-2}
#### Data Structure
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-m49-clean-2}
: Clean UNSD M49 geoscheme classification data (second step)
:::
::::
::: my-r-code-container
```{r}
#| label: clean-m49-2
#| results: hold
m49_clean2 <- base::readRDS("data/unsd/m49_clean.rds") |>
## replace `NA`s of intermediate regions with sub-region values ######
dplyr::mutate(`Intermediate Region Name` =
base::ifelse(is.na(`Intermediate Region Name`),
`Sub-region Name`, `Intermediate Region Name`), # (1)
`Intermediate Region Code` =
base::ifelse(is.na(`Intermediate Region Code`),
`Sub-region Code`, `Intermediate Region Code`), # (2)
## replace ""Channel Islands" with "Northen Europe" values ######
`Intermediate Region Name` =
base::ifelse(`Intermediate Region Name` == "Channel Islands",
"Northern Europe", `Intermediate Region Name`), # (3)
`Intermediate Region Code` =
base::ifelse(`Intermediate Region Code` == "830",
"154", `Intermediate Region Code`) # (4)
)
## save new tibble as clean2 ##########
pb_save_data_file("unsd", m49_clean2, "m49_clean2.rds")
## display results ##########
m49_clean2 <- base::readRDS("data/unsd/m49_clean2.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_clean2)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_clean2)
```
:::
::::::
With the "trick" to replace the `NA`s of `Intermediate Region Name` with
the values of `Sub-regional Name`column we also got rid of the many
`NA`s in that and the accompanying `Intermediate Region Code` column.
#### Intermediate Region Again
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-m49-show-intermediate-region-2}
: Show correct intermediate region for the UNSD M49 Geoscheme
:::
::::
::: my-r-code-container
```{r}
#| label: show-intermediate2-region
#| results: hold
## show new intermediate result ###############
(
m49_intermediate2 <- pb_class_scheme(
df = base::readRDS("data/unsd/m49_clean2.rds"),
sel1 = rlang::quo(`Country or Area`),
sel2 = rlang::quo(`Intermediate Region Name`)
)
)
```
:::
::::::
The new regional group `Intermediate Region Name` is a classification
scheme with **`r sum(m49_intermediate2$x$data$N)` countries in
`r length(m49_intermediate2$x$data$N)` regions**. As Antarctica is not
included the grouping with `Intermediate Region Name` represents the
correct M49 classification.
## Un Countries {#sec-m49-un-member-states}
The UN M49 geoscheme classification data contains 15 columns and 248
rows (= Columns or Areas). This is much more than the [currently 193
member states of the United
Nations](https://www.worldatlas.com/geography/how-many-countries-are-there-in-the-world.html).
Even if we include Holy See (Vatican) and the State of Palestine, which
are non-member observer states and the two controversial countries /
areas (Taiwan & Kosovo) we are far from the 248 countries or areas
listed in the M49 geoscheme of the United Nations.
### Procedure
:::::: my-procedure
:::: my-procedure-header
::: {#prp-m49-un-countries}
: Add Column for UN country membership
:::
::::
::: my-procedure-container
To get only the 193 member states and the two non-member observer states
(= 195 UN states) I will apply the following procedure:
1. Prepared a list with the names of the countries and their ISO-alpha3
codes. (The ISO-alpha3 code is important for the later joining with
the cleaned data `m49_clean.rds`). --- Done manually, no program
code.
2. Saved this two row data file manually as
"data/unsd/un_countries.csv". --- Done manually, no program code.
3. Load "data/unsd/un_countries.csv" into memory.
4. Add a new row `UN` (for UN member state) to the data, fill all
values with "1" (= member state).
5. Save the result as R object `un_countries.rds`
6. Load `m49_clean.rds` and `un_countries.rds`.
7. Join these two data frames as tibbles fully via their ISO code
columns.
8. Replace all `NA`s of the `UN` column (these rows did not come from
the "un_countries" file) with the value "0" (= non UN member state).
9. Delete the redundant `Country` column --- originally from the
"un_countries" file.
10. Save the result as new R object `m49_clean3.rds`.
11. Inspect the result.
:::
::::::
### Result
::::::: my-code-collection
::::: my-code-collection-header
::: my-code-collection-icon
:::
::: {#exm-m49-show-clean3-file}
: Display structure and content of the adapted UN M49 geoscheme dataset
:::
:::::
:::::: my-code-collection-container
::: panel-tabset
###### Structure
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-m49-add-un-member-state-column}
: Add column with UN membership
:::
::::
::: my-r-code-container
```{r}
#| label: get-un-countries
#| results: hold
un_countries <- readr::read_csv("data/unsd/un_countries.csv",
show_col_types = FALSE) |> # (3)
dplyr::mutate(UN = 1) # (4)
pb_save_data_file("unsd", un_countries, "un_countries.rds") # (5)
x <- base::readRDS("data/unsd/m49_clean2.rds") # (6a)
y <- base::readRDS("data/unsd/un_countries.rds") # (6b)
m49_clean3 <- dplyr::full_join(
x, y, dplyr::join_by(`ISO-alpha3 Code` == Code)
) |> # (7)
dplyr::mutate(UN =
base::ifelse(base::is.na(UN), 0, UN)
) |> # (8)
dplyr::select(-Country) # (9)
pb_save_data_file("unsd", m49_clean3, "m49_clean3.rds") # (10)
skimr::skim(m49_clean3) # (11)
```
:::
::::::
###### UN members
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-m49-show-un-member-states}
: List the 193 UN member states (plus Vatican and Palestine as
non-member observers)
:::
::::
::: my-r-code-container
```{r}
#| label: list-un-members
base::readRDS("data/unsd/m49_clean3.rds") |>
dplyr::filter(UN == 1) |>
dplyr::select(1,2,4,8) |>
DT::datatable()
```
:::
::::::
###### Non-UN members
:::::: my-r-code
:::: my-r-code-header
::: {#cnj-show-non-un-member-states}
: List the 193 UN member states (plus Vatican and Palestine as
non-member observers)
:::
::::
::: my-r-code-container
```{r}
#| label: list-non-un-members
base::readRDS("data/unsd/m49_clean3.rds") |>
dplyr::filter(UN == 0) |>
dplyr::select(1,2,4,8) |>
DT::datatable()
```
:::
::::
:::::
::::::
:::::::
## Summary {#sec-m49-summary}
The UN M49 geoscheme data contains 249 countries or areas This includes
193 UN members, 2 non-member states with observer status, Antartica and 53
dependent territories. With the exception of Sark all of this regions have ISO-alpha2 and
ISO-alpha3 codes `ISO-alpha2 Code` and `ISO-alpha3 Code`. These two columns are important, because
they facilitate joining data from other sources via this standardized codes. This is
crucial because the spelling of the names of countries and areas is
not always identical in the different sources. You can't therefore often
not join two data sets just by country or area name but needs a more
systematic approach with the ISO codes.
UN M49 geoscheme classification has three regional division with 5, 17
and 22 groups (always without Antarctica and not counting the overall
group "World" with code "001").
About the three grouping we can say (always not including Antarctica):
- **Region**: It consists of five groups representing more or less the
continents. But instead of the traditional separation between
Northern and Southern America it unites these two continents into
the "Americas", including also the Caribbean countries.
- **Sub-region**: It consists of 17 groups by dividing some of the
continents into very big sub-regions. For instance the many African
countries are grouped only into two groups: Northern Africa and
Sub-Saharan Africa. This is in contrast to the more detailed
"Intermediate region" where we have Northern-, Eastern, Western,
Southern and Middle Africa. On the other hand the smaller Europe is divided into
four sub-regions. The sub-regional division is in my opinion therefore not a
very consistent classification.
- **Intermediate region**: It consists of 22 groups and is for
statistical purposes the most detailed **regional** classification.
Additionally there are with `r glossary("LDC")`, `r glossary("LLDC")` and `r glossary("SIDS")` three other divisions, driven not by regional reasons but by geographical common features. As we will see in the other chapters there are --- besides of the UN M49 geoscheme --- other approaches for a consistent country classification.