TheAttraIndexOfGlobEcon.Rmd

---
title: "The Attractiveness Index of Global Economies (Oct. 2015)"
author: "Dmitrij Petrov (@dmpe)"
date: "`r format(Sys.time(), '%b %d %Y')`"
output: 
  html_document: 
    css: TheAttraIndexOfGlobEcon.css
    fig_height: 7
    fig_width: 10
    number_sections: yes
    toc: yes
    mathjax: null
---
# Intro

This is a short `code-summary`, which shows how I have created <i>The Attractiveness Index of Global Economies</i> in my [bachelor thesis](https://dmpe.github.io/PapersAndArticles/thesis/). The ONLY difference with the original text is that this summary uses (kind-of) latest data available as of October 2015 (the thesis was submitted in mid-July 2015).

The source code for the thesis can be found at GitHub: <https://github.com/dmpe/bachelor>

# Libraries

```{r, echo=TRUE, message=FALSE, warning=FALSE}
library(grid)
library(gridExtra)
library(rvest)
library(plyr)
library(dplyr)
library(stringr)
library(xlsx)
library(Quandl)
library(scales)
library(corrplot)
library(ellipse)
library(psych)
library(cluster)
library(ggplot2)
library(clustrd)
library(reshape2)

set.seed(5154)
```

# 30 Counties
```{r}
selectedCountries <- list("Korea", "Singapore", "Japan", "Chile", "Czech Republic", "Nigeria", "China", "Germany", "Switzerland",
                          "Mexico", "Jordan", "Brazil", "Russia", "United States", "United Kingdom", "United Arab Emirates",
                          "Australia", "South Africa", "Kenya", "Finland", "Canada", "Israel", "New Zealand", "France", "Hungary",
                          "Thailand", "Indonesia", "Ghana", "Colombia", "Turkey")

```

# Data Sources

Based on chapter 2.2, page 15ff. 

## Human Development's Educational Index (UNDP) 

```{r}
# "Extract-HTML" way
hdi <- read_html('http://hdr.undp.org/en/content/education-index') 
hdi <- hdi %>%
  html_node('.table') %>% 
  html_table(header = T)

hdi <- hdi[1:187,c("Country", "2013")]
hdi <- plyr::rename(hdi, c(`2013` = "HDIEducatIndex"))

hdi$Country[hdi$Country == "Korea (Republic of)"] <- "Korea"
hdi$Country[hdi$Country == "Russian Federation"] <- "Russia"
hdi$HDIEducatIndex <- as.numeric(hdi$HDIEducatIndex)
head(hdi)
```

## Pearson’s Learning Curve Index

```{r}
# Data as of January 2014
learningCurveData <- read.xlsx("1_RawData/DataSources/learningcurve.xlsx", sheetIndex = 1, startRow = 18, endRow = 58)

learningCurveData <- plyr::rename(learningCurveData, c(NA. = "Country", Overall.Index = "LearningCurve_Index"))
# sapply(learningCurveData, class) # factors -> to char

learningCurveData$Country <- str_trim(learningCurveData$Country, side = "both")

learningCurveData$Country[learningCurveData$Country == "South Korea"] <- "Korea"
learningCurveData$Country[learningCurveData$Country == "Hong Kong-China"] <- "China"

#' delete some columns
learningCurveData <- learningCurveData[, !(colnames(learningCurveData) %in% c("Cognitive.Skills", "Educational.Attainment", 
    "Notes", "NA..1", "NA..2"))]

learningCurveData$Ranking_LearningCurve <- seq(1, 40)
head(learningCurveData)
```

## The Youth unemployment rate

```{r, echo=TRUE}
options(width=200)
Country = c("Korea", "Singapore", "Japan", "Chile", "Czech Republic", "Nigeria", "China", "Germany", "Switzerland", "Mexico", 
            "Jordan", "Brazil", "Russia", "United States", "United Kingdom", "United Arab Emirates", "Australia", "South Africa", "Kenya", "Finland", "Canada", "Israel", "New Zealand", "France", "Hungary", "Thailand", "Indonesia", "Ghana", "Colombia", "Turkey")

Quandl.api_key("GgnxpyUBXHsyQxqp67bY")

Korea <- Quandl("WORLDBANK/KOR_SL_UEM_1524_ZS")[1, 2]
# http://api.worldbank.org/countries/CHN/indicators/SL.UEM.1524.ZS?per_page=1000
# China <- Quandl('WORLDBANK/CHN_SL_UEM_1524_ZS')[1,2]
China <- 10.1
Germany <- Quandl("WORLDBANK/DEU_SL_UEM_1524_ZS")[1, 2]
```

```{r, echo=FALSE}
options(width=200)
Switzerland <- Quandl("WORLDBANK/CHE_SL_UEM_1524_ZS")[1, 2]
Mexico <- Quandl("WORLDBANK/MEX_SL_UEM_1524_ZS")[1, 2]
Brazil <- Quandl("WORLDBANK/BRA_SL_UEM_1524_ZS")[1, 2]
Russia <- Quandl("WORLDBANK/RUS_SL_UEM_1524_ZS")[1, 2]
USA <- Quandl("WORLDBANK/USA_SL_UEM_1524_ZS")[1, 2]
UAE <- Quandl("WORLDBANK/ARE_SL_UEM_1524_ZS")[1, 2]
# http://api.worldbank.org/countries/KEN/indicators/SL.UEM.1524.ZS?per_page=1000
# Kenya <- Quandl('WORLDBANK/KEN_SL_UEM_1524_ZS')[1,2]
Kenya <- 17.1
Finland <- Quandl("WORLDBANK/FIN_SL_UEM_1524_ZS")[1, 2]
NewZeland <- Quandl("WORLDBANK/NZL_SL_UEM_1524_ZS")[1, 2]
Czech <- Quandl("WORLDBANK/CZE_SL_UEM_1524_ZS")[1, 2]
Japan <- Quandl("WORLDBANK/JPN_SL_UEM_1524_ZS")[1, 2]
Chile <- Quandl("WORLDBANK/CHL_SL_UEM_1524_ZS")[1, 2]
Nigeria <- Quandl("WORLDBANK/NGA_SL_UEM_1524_ZS")[1, 2]
SA <- Quandl("WORLDBANK/ZAF_SL_UEM_1524_ZS")[1, 2]
Canada <- Quandl("WORLDBANK/CAN_SL_UEM_1524_ZS")[1, 2]
Australia <- Quandl("WORLDBANK/AUS_SL_UEM_1524_ZS")[1, 2]
UK <- Quandl("WORLDBANK/GBR_SL_UEM_1524_ZS")[1, 2]
Jordan <- Quandl("WORLDBANK/JOR_SL_UEM_1524_ZS")[1, 2]
Israel <- Quandl("WORLDBANK/ISR_SL_UEM_1524_ZS")[1, 2]
Singapore <- Quandl("WORLDBANK/SGP_SL_UEM_1524_ZS")[1, 2]
France <- Quandl("WORLDBANK/FRA_SL_UEM_1524_ZS")[1, 2]
Indonesia <- Quandl("WORLDBANK/IDN_SL_UEM_1524_ZS")[1, 2]
Turkey <- Quandl("WORLDBANK/TUR_SL_UEM_1524_ZS")[1, 2]
Hungary <- Quandl("WORLDBANK/HUN_SL_UEM_1524_ZS")[1, 2]
Ghana <- Quandl("WORLDBANK/GHA_SL_UEM_1524_ZS")[1, 2]
Thailand <- Quandl("WORLDBANK/THA_SL_UEM_1524_ZS")[1, 2]
Colombia <- Quandl("WORLDBANK/COL_SL_UEM_1524_ZS")[1, 2]


unemplo <- data.frame(Country = Country, Unemployment_NonScaled = seq(1, 30), stringsAsFactors = FALSE)
# Unemployment = seq(1, 23),
unemplo$Unemployment_NonScaled[unemplo$Country == "Korea"] <- Korea
unemplo$Unemployment_NonScaled[unemplo$Country == "Singapore"] <- Singapore
unemplo$Unemployment_NonScaled[unemplo$Country == "China"] <- China
unemplo$Unemployment_NonScaled[unemplo$Country == "Germany"] <- Germany
unemplo$Unemployment_NonScaled[unemplo$Country == "Switzerland"] <- Switzerland
unemplo$Unemployment_NonScaled[unemplo$Country == "Mexico"] <- Mexico
unemplo$Unemployment_NonScaled[unemplo$Country == "Brazil"] <- Brazil
unemplo$Unemployment_NonScaled[unemplo$Country == "Russia"] <- Russia
unemplo$Unemployment_NonScaled[unemplo$Country == "United States"] <- USA
unemplo$Unemployment_NonScaled[unemplo$Country == "United Kingdom"] <- UK
unemplo$Unemployment_NonScaled[unemplo$Country == "United Arab Emirates"] <- UAE
unemplo$Unemployment_NonScaled[unemplo$Country == "Australia"] <- Australia
unemplo$Unemployment_NonScaled[unemplo$Country == "South Africa"] <- SA
unemplo$Unemployment_NonScaled[unemplo$Country == "Kenya"] <- Kenya
unemplo$Unemployment_NonScaled[unemplo$Country == "Finland"] <- Finland
unemplo$Unemployment_NonScaled[unemplo$Country == "Canada"] <- Canada
unemplo$Unemployment_NonScaled[unemplo$Country == "Israel"] <- Israel
unemplo$Unemployment_NonScaled[unemplo$Country == "New Zealand"] <- NewZeland
unemplo$Unemployment_NonScaled[unemplo$Country == "Jordan"] <- Jordan
unemplo$Unemployment_NonScaled[unemplo$Country == "Czech Republic"] <- Czech
unemplo$Unemployment_NonScaled[unemplo$Country == "Chile"] <- Chile
unemplo$Unemployment_NonScaled[unemplo$Country == "Japan"] <- Japan
unemplo$Unemployment_NonScaled[unemplo$Country == "Nigeria"] <- Nigeria
unemplo$Unemployment_NonScaled[unemplo$Country == "France"] <- France
unemplo$Unemployment_NonScaled[unemplo$Country == "Ghana"] <- Ghana
unemplo$Unemployment_NonScaled[unemplo$Country == "Indonesia"] <- Indonesia
unemplo$Unemployment_NonScaled[unemplo$Country == "Columbia"] <- Colombia
unemplo$Unemployment_NonScaled[unemplo$Country == "Turkey"] <- Turkey
unemplo$Unemployment_NonScaled[unemplo$Country == "Hungary"] <- Hungary
unemplo$Unemployment_NonScaled[unemplo$Country == "Thailand"] <- Thailand

unemplo$Unemployment <- as.numeric(scale(unemplo$Unemployment_NonScaled))
unemplo$Unemployment_ZscoreNEGATIVE <- as.numeric(-scale(unemplo$Unemployment_NonScaled))

unemplo <- plyr::arrange(unemplo, unemplo$Country)
head(unemplo)
```

## Index of Economic Freedom (Heritage Found./WSJ)

```{r}
# Excel Way | http://www.heritage.org/index/download
freedom <- read.xlsx("1_RawData/DataSources/index2015_data.xlsx", sheetIndex = 1, endRow = 187)
freedom <- plyr::rename(freedom, c(Country.Name = "Country", X2015.Score = "Freedom_Index", World.Rank = "RankOverall"), warn_duplicated = F)
freedom$Country <- str_trim(freedom$Country, side = "both")
freedom$Country[freedom$Country == "Korea, South"] <- "Korea"
freedom <- subset(freedom, select = c(Country, Freedom_Index, RankOverall))

# convert from factor to numeric
freedom$Freedom_Index <- suppressWarnings(as.numeric(as.character(freedom$Freedom_Index)))
freedom$RankOverall <- suppressWarnings(as.numeric(as.character(freedom$RankOverall)))

freedom <- subset(freedom, Country %in% selectedCountries, select = c(Country, Freedom_Index, RankOverall))

freedom$Freedom_Index_NonScaled <- freedom$Freedom_Index
freedom$Freedom_Index <- as.numeric(scale(freedom$Freedom_Index_NonScaled))
head(freedom)
```

## WEF's Global Competiveness Index (2015/2016)

```{r}
wef <- read.xlsx("1_RawData/DataSources/newRMD/GCR_Rankings_2015-2016.xlsx", sheetName = "GCI 2013-2014")[4:147, 1:3]
wef <- plyr::rename(wef, c("The.Global.Competitiveness.Index.2015.2016.rankings." = "Country", "NA."= "Ranking_WEF", "NA..1" = "WEF_Score"))
wef$Country <- str_trim(wef$Country, side = "both")

# correct names and convert to numeric
# https://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-an-integer-numeric-without-a-loss-of-information
wef$Country[wef$Country == "Taiwan, China"] <- "Taiwan"
wef$Country[wef$Country == "Korea, Rep."] <- "Korea"
wef$Country[wef$Country == "Russian Federation"] <- "Russia"

wef <- subset(wef, Country %in% selectedCountries)

# normalazing on the sample, not population
wef$WEF_Score_NonScaled <- as.numeric(levels(wef$WEF_Score)[wef$WEF_Score])
wef$WEF_Score <- as.numeric(scale(wef$WEF_Score_NonScaled))
head(wef)
```

## Countries’ H-Index (SCImago)

```{r}
# http://www.scimagojr.com/countryrank.php?area=0&category=0&region=all&year=all&order=h&min=0&min_type=it as of 30.Sep.2015
hindex <- read.xlsx("1_RawData/DataSources/newRMD/scimagojr.xlsx", sheetIndex = 1)

# sapply(hindex, class) # factors -> to char

hindex$Country <- str_trim(hindex$Country, side = "both")
hindex$Country[hindex$Country == "South Korea"] <- "Korea"
hindex$Country[hindex$Country == "Russian Federation"] <- "Russia"

hindex <- hindex[, !(colnames(hindex) %in% c("Documents", "Citable.documents", "Citations", "Self.Citations", "Citations.per.Document"))]
hindex <- plyr::rename(hindex, c(H.index = "H_Index"))

hindex <- subset(hindex, Country %in% selectedCountries, select = c(Country, Rank, H_Index))

hindex$H_Index_NonScaled <- hindex$H_Index
hindex$H_Index <- as.numeric(scale(hindex$H_Index_NonScaled))
head(hindex)
```

## Combine all datasets

```{r}
# Data here are non scaled, they contain 'the real values'.

df.Original <- dplyr::left_join(unemplo, freedom, by = "Country")

df.Original <- dplyr::left_join(df.Original, wef, by = "Country")
df.Original <- plyr::arrange(df.Original, df.Original$Country)

df.Original <- dplyr::left_join(df.Original, learningCurveData, by = "Country")
df.Original <- plyr::arrange(df.Original, df.Original$Country)
df.Original <- subset(df.Original, select = c(Country, Unemployment_NonScaled, 
                                              Freedom_Index_NonScaled, WEF_Score_NonScaled, LearningCurve_Index))

df.Original <- dplyr::left_join(df.Original, hindex, by = "Country")
df.Original <- plyr::arrange(df.Original, df.Original$Country)

df.Original <- dplyr::left_join(df.Original, hdi, by = "Country")
df.Original <- plyr::arrange(df.Original, df.Original$Country)
df.Original <- subset(df.Original, select = c(Country, Unemployment_NonScaled, Freedom_Index_NonScaled, WEF_Score_NonScaled, 
                                              LearningCurve_Index, HDIEducatIndex, H_Index_NonScaled))
```

# Imputation of missing values

Based on chapter 2.3, page 20ff. 

> ... I am not going to use any of the abovementioned mechanisms for handling missing data, but will 
> return to a much simpler method. Namely, given my knowledge, I will choose and assign six values for
> Nigeria, Kenya, Jordan, Ghana, South Africa and the UAE. On the one hand, this is not a 
> scientifically good approach as it brings a tangible source of uncertainty on my results. In the 
> case of large dataset and/or very high rate of missingness it may be even impossible doing so. On 
> the other hand, if data are not available and the reason is not related to other variables in my 
> dataset – as it is the case here – it is very hard to impute them in a preferable (‘desired’) way 
> even with the most advanced statistical models, simply because data do not exist. 

> As result, I decide to assign z-score of -2.1 to Nigeria, -1.9 to South Africa, -1.5 to Kenya,
> -1 to Ghana, -0.5 to Jordan, and finally -0.2 to the UAE. To conclude the whole chapter, I 
> would like to point out that the best solution to the problem of missing data is not to have 
> a problem of missing data. However, this is often not possible and therefore in this chapter
> I showed several available techniques and finally assigned values to those countries 
> considering my best (yet also limited) knowledge of their real situation.

Quoted from my thesis, chapter 2.3, page 22f. 

```{r}
df.Original.Imputed <- df.Original
df.Original.Imputed$LearningCurve_Index[df.Original.Imputed$Country == "Nigeria"] <- -2.1
df.Original.Imputed$LearningCurve_Index[df.Original.Imputed$Country == "South Africa"] <- -1.9
df.Original.Imputed$LearningCurve_Index[df.Original.Imputed$Country == "Kenya"] <- -1.5
df.Original.Imputed$LearningCurve_Index[df.Original.Imputed$Country == "Ghana"] <- -1.0
df.Original.Imputed$LearningCurve_Index[df.Original.Imputed$Country == "Jordan"] <- -0.5
df.Original.Imputed$LearningCurve_Index[df.Original.Imputed$Country == "United Arab Emirates"] <- -0.2

df.Original.Imputed <- data.frame(df.Original.Imputed[, -1], row.names = df.Original.Imputed[, 1])
```

# Normalisation

Based on chapter 2.4, page 23f. 

> The last normalisation technique, which I want to mention here (also used in the
> construction of my index), is called min-max normalisation.

> ... I am going to use a range between 0 and 100 and as briefly mentioned in the 
> chapter about the youth unemployment rate, it will be required to transform its polarity, 
> i.e. from having the highest number being the worst to having the lowest number being the
> worst.

Quoted from my thesis, chapter 2.4, page 24. 

```{r}
#' create a new data frame (df.Original.MinMax) based on the old one (df.Original.Imputed). This 
#' makes 1:1 copy of the data frame, yet with the different name
df.Original.MinMax <- df.Original.Imputed

df.Original.MinMax$WEF_Score_NonScaled <- ((100-0)*(df.Original.Imputed$WEF_Score_NonScaled-1)/ (7-1)) + 0

df.Original.MinMax$H_Index_NonScaled <- ((100-0)*(df.Original.Imputed$H_Index_NonScaled-1)/ (1518-1)) + 0

# HDI's Educat. Index is between 0 and 1 -> convert to (by multipling it with) 0-100
df.Original.MinMax$HDIEducatIndex <- df.Original.Imputed$HDIEducatIndex * 100

#' Unemployment_NonScaled goes into opposite direction, worst South Africa must be the worst, not the best (e.i. that would be 
#' the logic without this step). 
df.Original.MinMax$Unemployment_NonScaled = ((100-0)*(df.Original.Imputed$Unemployment_NonScaled-100)/ (0-100)) + 0

#' This normalizes columns of 'LearningCurve_Index' from minValue to maxValue. Beware of the colwise function that will be used on
#' on the whole data frame (from plyr)!
#' An assumption is made that although z-score beginngs from -Inf to Inf, I am going to use only a range between +-3.5
#' 
#' @param x A data frame
#' @param minValue A minimal value of the range of the scale (e.g. 0)
#' @param maxValue A maximal value of the range of the scale (e.g. 100)
rescaleColumns <- function(x, minValue, maxValue) {
  scales::rescale(x, to = c(minValue, maxValue), from = range(-3.5:3.5))
}

df.Original.MinMax$LearningCurve_Index <- plyr::colwise(rescaleColumns)(df.Original.Imputed, 0, 100)[, 4]
```

# Multivariate Analysis

Based on chapter 2.5, page 24ff. 

## Principal component analysis

Based on chapter 2.5.1, page 25f. 

```{r}
names(df.Original.MinMax) <- c("Unemployment", "Freedom_Index", "WEF_Score", "LearningCurve_Index", "HDIEducat_Index", "H_Index")

corelationMat2 <- cor(df.Original.MinMax)

colorfun2 <- colorRampPalette(c("#ffffcc", "#a1dab4", "#41b6c4", "#2c7fb8", "#253494"))
corrplot(corelationMat2, method = "number", type = "lower", order = "FPC", col = colorfun2(100)) 
```


```{r}
pc2 <- prcomp(df.Original.MinMax, center = TRUE, scale = FALSE)
summary(pc2)
as.data.frame(round(pc2$rotation, 3))

scree(df.Original.MinMax, factors = TRUE, pc = TRUE)
```

## Factor analysis

Based on chapter 2.5.2, page 28f. 

```{r}
options(width=200)
factorAn <- factanal(df.Original.MinMax, rotation = "varimax", factors = 2)
factorAn  # SS is sum of squares 
communality <- round(cbind(1 - factorAn$uniquenesses), 3)
communality
```


## Cluster analysis (hierarchical clustering)

Based on chapter 2.5.3, page 30ff. 

```{r}
#' Hierarchical Clustering 
euroclust <- hclust(dist(df.Original.MinMax, method = "euclidean"), "ward.D2")
plot(euroclust, hang = -1)
rect.hclust(euroclust, k = 2, border = "red")  # create border for 2 clusters
coef.hclust(euroclust) # agglomerative coef.
```

```{r, echo=FALSE}
options(width=200)
#' Pseudo K-Means - first create wrong clustering and then replace it with the correct one 
klust <- kmeans(dist(df.Original.MinMax, method = "euclidean"), 2, nstart = 25, iter.max = 100)
dataWithCluster <- data.frame(df.Original.MinMax, klust$cluster)  # append cluster assignment df.Original.MinMax

# now apply the fix -> will become 19 vs. 11
dataWithCluster$klust.cluster[rownames(dataWithCluster) == "Chile"] <- 2
dataWithCluster$klust.cluster[rownames(dataWithCluster) == "Hungary"] <- 2
dataWithCluster$klust.cluster[rownames(dataWithCluster) == "Russia"] <- 2
dataWithCluster$klust.cluster[rownames(dataWithCluster) == "United Arab Emirates"] <- 2

Developing <- sapply(dataWithCluster[dataWithCluster$klust.cluster == "1", ], mean)
Advanced <- sapply(dataWithCluster[dataWithCluster$klust.cluster == "2", ], mean)

aggregate(df.Original.MinMax, by=list(dataWithCluster$klust.cluster), FUN = mean) # gets cluster mean

dfClustMeans <- data.frame(Developing, Advanced)
dfClustMeans <- dfClustMeans[1:6, ]
dfClustMeans$vars <- rownames(dfClustMeans)
dfClustMeans$vars[dfClustMeans$vars == "Unemployment_NonScaled"] <- "Y. Unemployment"
dfClustMeans$vars[dfClustMeans$vars == "Freedom_Index_NonScaled"] <- "Freedom Index"
dfClustMeans$vars[dfClustMeans$vars == "WEF_Score_NonScaled"] <- "WEF's GCI"
dfClustMeans$vars[dfClustMeans$vars == "LearningCurve_Index"] <- "Learning Curve Index"
dfClustMeans$vars[dfClustMeans$vars == "HDIEducatIndex"] <- "HDI's Edu. Index"
dfClustMeans$vars[dfClustMeans$vars == "H_Index_NonScaled"] <- "H-Index"

# sapply(dfClustMeans, class)

dataWithCluster.long <- melt(dfClustMeans)  # convert to long format

# table for the picture
dataWithCluster.table <- data.frame(cbind(Indicator = dfClustMeans$vars, Difference=round(dfClustMeans$Advanced-dfClustMeans$Developing,1)))
dataWithCluster.table$Indicator <- as.character(dataWithCluster.table$Indicator)
dataWithCluster.table$Difference <- as.numeric(levels(dataWithCluster.table$Difference))[dataWithCluster.table$Difference]
dataWithCluster.table <- dataWithCluster.table[with(dataWithCluster.table, order(Difference)), ]
row.names(dataWithCluster.table) <- NULL

gp <- ggplot(dataWithCluster.long, aes(x = vars, y = value, group = variable, color = variable)) 
gp <- gp + geom_line() + geom_point() + ggtitle("Means plot for clusters")
gp <- gp + coord_cartesian(ylim = c(10, 105)) + scale_y_continuous(breaks = seq(10, 105, 5))
gp <- gp + ylab("Mean of each varaible in 2 clusters") + xlab("Indicators") + labs(color = "Types of Countries")
gp <- gp + annotation_custom(grob = tableGrob(as.matrix(dataWithCluster.table)), xmin = 0, xmax = 11, ymin = 0, ymax = 48)
gp
```

# Weighting and aggregation

Based on chapter 2.6, page 34ff. 

## Weighting based on factor analysis and my own preference

Based on chapter 2.6.1.1, page 36f. 

```{r}
options(width=200)
factor1SquaredLoadings <- factorAn$loadings[, 1]^2
factor2SquaredLoadings <- factorAn$loadings[, 2]^2

Sum_SFL <- sum(factor1SquaredLoadings) + sum(factor2SquaredLoadings) # + sum(factorAn$loadings[, 3]^2)

FactorWeight1 <- sum(factor1SquaredLoadings)/Sum_SFL
FactorWeight2 <- sum(factor2SquaredLoadings)/Sum_SFL

df.weights <- data.frame(Factor1ScaledWeight = factor1SquaredLoadings/sum(factor1SquaredLoadings), 
                         Factor2ScaledWeight = factor2SquaredLoadings/sum(factor2SquaredLoadings))

df.weights$colMax <- apply(df.weights, 1, function(x) max(x[])) # take max values from both columns, yet rowwise!

df.weights$WholeFactorWeight <- c(FactorWeight2, FactorWeight1, FactorWeight2, 
                                  FactorWeight1, FactorWeight1, FactorWeight1)

df.weights$Multipl <- df.weights$colMax * df.weights$WholeFactorWeight
df.weights$UnitScaled <- round(df.weights$Multipl / sum(df.weights$Multipl), 4)
df.weights

#' Min-MAX + FA weights
minMaxMultiFA.Weights <- t(t(df.Original.MinMax) * df.weights$UnitScaled)
df.Original.MM.FA <- sort(rowSums(minMaxMultiFA.Weights), decreasing = T)
df.Original.MM.FA <- data.frame(Value = df.Original.MM.FA, RankMM.FA = seq(1:30))

#' Min-MAX + EW
minMaxMultiEqual.Weights <- t(t(df.Original.MinMax) * c(rep(1/6, 6)))
df.Original.MM.EW <- sort(rowSums(minMaxMultiEqual.Weights), decreasing = T)
df.Original.MM.EW <- data.frame(Value = df.Original.MM.EW, RankMM.EW = seq(1:30))

#' Min-MAX + My own choice
minMaxMultiMyChoice.Weights <- t(t(df.Original.MinMax) * c(0.140, 0.170, 0.230, 0.220, 0.130, 0.110))
df.Original.MM.MyChoice <- sort(rowSums(minMaxMultiMyChoice.Weights), decreasing = T)
df.Original.MM.MyChoice <- data.frame(Value = df.Original.MM.MyChoice, RankMM.MC = seq(1:30))
```

# Results

## Positions in the ranking

```{r}
options(width=200)
df.Original.MM.FA$Country <- rownames(df.Original.MM.FA) 
df.Original.MM.EW$Country <- rownames(df.Original.MM.EW) 
df.Original.MM.MyChoice$Country <- rownames(df.Original.MM.MyChoice) 

# all lines are different, doens't have a straight one
df.Original.MM.FAEW <- inner_join(df.Original.MM.FA, df.Original.MM.EW, by= "Country")
df.Original.MM.FAEW.Subset <- subset(df.Original.MM.FAEW, select=c(Country, RankMM.FA, RankMM.EW))

df.Original.MM.FAEWMC <- inner_join(df.Original.MM.FAEW, df.Original.MM.MyChoice, by= "Country")
df.Original.MM.FAEWMC.Subset <- subset(df.Original.MM.FAEWMC, select=c(Country, RankMM.FA, RankMM.EW, RankMM.MC))
df.Original.MM.FAEWMC.Subset
```

## Bar chart decomposition of the Attractiveness Index (MM.FA)

```{r}
df.BackToDetails <- as.data.frame(minMaxMultiFA.Weights)
df.BackToDetails$Country <- rownames(df.BackToDetails) 
df.BackToDetails.p1 <- df.BackToDetails[, 1:3]
df.BackToDetails.p1$Country <- rownames(df.BackToDetails.p1) 
df.BackToDetails.p2 <- df.BackToDetails[, 4:7]

# Sum rowwise
df.BackToDetails.p2 <- adply(df.BackToDetails.p2, 1, transform, sumEdu = sum(LearningCurve_Index, HDIEducat_Index, H_Index))
df.BackToDetails.p1 <- adply(df.BackToDetails.p1, 1, transform, sumBuEco = sum(Unemployment, WEF_Score, Freedom_Index))


df.BackToDetails <- data.frame(Education = df.BackToDetails.p2$sumEdu, 
                               BussEcon = df.BackToDetails.p1$sumBuEco, 
                               Country = df.BackToDetails.p1$Country)

df.BackToDetails$Country <- as.character(df.BackToDetails$Country)

df.BackToDetails$Country[df.BackToDetails$Country == "United States"] <- "USA"
df.BackToDetails$Country[df.BackToDetails$Country == "United Arab Emirates"] <- "UAE"
df.BackToDetails$Country[df.BackToDetails$Country == "United Kingdom"] <- "UK"
df.BackToDetails$Country[df.BackToDetails$Country == "Czech Republic"] <- "Czech Rep."
df.BackToDetails$Country[df.BackToDetails$Country == "South Africa"] <- "S. Africa"
df.BackToDetails$Country[df.BackToDetails$Country == "New Zealand"] <- "N. Zealand"
# df.BackToDetails$Country[df.BackToDetails$Country == "Switzerland"] <- "Swizerl."

df.meltedBackToDetails <- melt(df.BackToDetails, id = "Country")  # convert to long format

e9 <- ggplot(data = df.meltedBackToDetails, aes(reorder(Country, value), fill = variable, weight = value)) + geom_bar()
e9 <- e9 + coord_cartesian(ylim = c(0, 100)) + scale_y_continuous(breaks = seq(0, 100, 5))
e9 <- e9 + ggtitle("Bar chart decomposition of the Attractiveness Index (MM.FA)") + scale_fill_discrete(name = "Dimensions")
e9 <- e9 + ylab("Index Value") + xlab("Countries")
e9

# Table
EducatValue <- cbind(df.BackToDetails$Education / (df.BackToDetails$Education + df.BackToDetails$BussEcon))
BusinessValue <- cbind(df.BackToDetails$BussEcon / (df.BackToDetails$Education + df.BackToDetails$BussEcon))

df.BackToDetails.table <- data.frame(cbind(df.BackToDetails$Country), EducatValue, BusinessValue)
head(df.BackToDetails.table)
```

## Comparison of 3 weighting methods (FA/EW/'my choice')

```{r, echo=FALSE}
meltingOriginal.MM.FA.Subset <- melt(df.Original.MM.FA[, c("Country", "RankMM.FA")],  id = "Country")
meltingOriginal.MM.EW.Subset <- melt(df.Original.MM.EW[, c("Country", "RankMM.EW")],  id = "Country")
meltingOriginal.MM.MC.Subset <- melt(df.Original.MM.MyChoice[, c("Country", "RankMM.MC")],  id = "Country")

meltingOriginal.MM.FA.Subset$Country[meltingOriginal.MM.FA.Subset$Country == "United States"] <- "USA"
meltingOriginal.MM.FA.Subset$Country[meltingOriginal.MM.FA.Subset$Country == "United Arab Emirates"] <- "UAE"
meltingOriginal.MM.FA.Subset$Country[meltingOriginal.MM.FA.Subset$Country == "United Kingdom"] <- "UK"
meltingOriginal.MM.FA.Subset$Country[meltingOriginal.MM.FA.Subset$Country == "Czech Republic"] <- "Czech Rep."
meltingOriginal.MM.FA.Subset$Country[meltingOriginal.MM.FA.Subset$Country == "South Africa"] <- "S. Africa"

meltingOriginal.MM.EW.Subset$Country[meltingOriginal.MM.EW.Subset$Country == "United States"] <- "USA"
meltingOriginal.MM.EW.Subset$Country[meltingOriginal.MM.EW.Subset$Country == "United Arab Emirates"] <- "UAE"
meltingOriginal.MM.EW.Subset$Country[meltingOriginal.MM.EW.Subset$Country == "United Kingdom"] <- "UK"
meltingOriginal.MM.EW.Subset$Country[meltingOriginal.MM.EW.Subset$Country == "Czech Republic"] <- "Czech Rep."
meltingOriginal.MM.EW.Subset$Country[meltingOriginal.MM.EW.Subset$Country == "South Africa"] <- "S. Africa"

meltingOriginal.MM.MC.Subset$Country[meltingOriginal.MM.MC.Subset$Country == "United States"] <- "USA"
meltingOriginal.MM.MC.Subset$Country[meltingOriginal.MM.MC.Subset$Country == "United Arab Emirates"] <- "UAE"
meltingOriginal.MM.MC.Subset$Country[meltingOriginal.MM.MC.Subset$Country == "United Kingdom"] <- "UK"
meltingOriginal.MM.MC.Subset$Country[meltingOriginal.MM.MC.Subset$Country == "Czech Republic"] <- "Czech Rep."
meltingOriginal.MM.MC.Subset$Country[meltingOriginal.MM.MC.Subset$Country == "South Africa"] <- "S. Africa"


me1 <- ggplot()
# green
me1 <- me1 + geom_line(data=meltingOriginal.MM.FA.Subset, aes(reorder(Country, value), value, colour=variable, group = variable))
me1 <- me1 + geom_point(data=meltingOriginal.MM.FA.Subset, aes(reorder(Country, value), value, colour=variable, group = variable), size = 4, shape=21, fill="white") 
# red
me1 <- me1 + geom_line(data=meltingOriginal.MM.EW.Subset, aes(reorder(Country, value), value, colour=variable, group = variable))
me1 <- me1 + geom_point(data=meltingOriginal.MM.EW.Subset, aes(reorder(Country, value), value, colour=variable, group = variable), size = 4, shape=21, fill="white") 
# blue
me1 <- me1 + geom_line(data=meltingOriginal.MM.MC.Subset, aes(reorder(Country, value), value, colour=variable, group = variable))
me1 <- me1 + geom_point(data=meltingOriginal.MM.MC.Subset, aes(reorder(Country, value), value, colour=variable, group = variable), size = 4, shape=21, fill="white") 
# all together
me1 <- me1 + coord_cartesian(ylim = c(0, 35)) + scale_y_continuous(breaks = seq(0, 35, 1))
me1 <- me1 + ggtitle("Comparison of 3 weighting methods (FA/EW/'my choice')") + ylab("Positions") + xlab("Countries") + labs(color = "We/No methods")
me1 
```

## Comparison of different weights based on Min-Max norm. method

```{r, echo=FALSE}
# Now melt them all
meltingOriginal.MM.FAEW.Subset <- melt(df.Original.MM.FAEW.Subset, id="Country") 
meltingOriginal.MM.FAEWMC.Subset <- melt(df.Original.MM.FAEWMC.Subset, id = "Country")

meltingOriginal.MM.FAEWMC.Subset$Country[meltingOriginal.MM.FAEWMC.Subset$Country == "United States"] <- "USA"
meltingOriginal.MM.FAEWMC.Subset$Country[meltingOriginal.MM.FAEWMC.Subset$Country == "United Arab Emirates"] <- "UAE"
meltingOriginal.MM.FAEWMC.Subset$Country[meltingOriginal.MM.FAEWMC.Subset$Country == "United Kingdom"] <- "UK"
meltingOriginal.MM.FAEWMC.Subset$Country[meltingOriginal.MM.FAEWMC.Subset$Country == "Czech Republic"] <- "Czech Rep."
meltingOriginal.MM.FAEWMC.Subset$Country[meltingOriginal.MM.FAEWMC.Subset$Country == "South Africa"] <- "S. Africa"

me3 <- ggplot(data=meltingOriginal.MM.FAEWMC.Subset, aes(reorder(Country, value), value, colour = variable, group = variable))
me3 <- me3 + geom_line() + geom_point(size = 4, shape=21, fill="white")
me3 <- me3 + coord_cartesian(ylim = c(0, 30)) + scale_y_continuous(breaks = seq(0, 30, 1))
me3 <- me3 + ggtitle("Comparison of different weights based on Min-Max norm. method") 
me3 <- me3 + ylab("Position in Ranking") + xlab("Countries") + labs(color = "Weights")
me3
```

## Comparison of 3 weighting methods (FA/EW/'my choice')

```{r, echo=FALSE}
qwer1 <- melt(df.Original.MM.FA[, c("Country", "Value")],  id = "Country")
qwer1$variable <- as.character(qwer1$variable)
qwer2 <- melt(df.Original.MM.EW[, c("Country", "Value")],  id = "Country")
qwer2$variable <- as.character(qwer2$variable)
qwer3 <- melt(df.Original.MM.MyChoice[, c("Country", "Value")],  id = "Country")
qwer3$variable <- as.character(qwer3$variable)

# sapply(qwer1, class)

qwer1$variable[qwer1$variable == "Value"] <- as.character("RankMM.FA")
qwer2$variable[qwer2$variable == "Value"] <- "RankMM.EW"
qwer3$variable[qwer3$variable == "Value"] <- "RankMM.MC"

qwer1$Country[qwer1$Country == "United States"] <- "USA"
qwer1$Country[qwer1$Country == "United Arab Emirates"] <- "UAE"
qwer1$Country[qwer1$Country == "United Kingdom"] <- "UK"
qwer1$Country[qwer1$Country == "Czech Republic"] <- "Czech Rep."
qwer1$Country[qwer1$Country == "South Africa"] <- "S. Africa"

qwer2$Country[qwer2$Country == "United States"] <- "USA"
qwer2$Country[qwer2$Country == "United Arab Emirates"] <- "UAE"
qwer2$Country[qwer2$Country == "United Kingdom"] <- "UK"
qwer2$Country[qwer2$Country == "Czech Republic"] <- "Czech Rep."
qwer2$Country[qwer2$Country == "South Africa"] <- "S. Africa"

qwer3$Country[qwer3$Country == "United States"] <- "USA"
qwer3$Country[qwer3$Country == "United Arab Emirates"] <- "UAE"
qwer3$Country[qwer3$Country == "United Kingdom"] <- "UK"
qwer3$Country[qwer3$Country == "Czech Republic"] <- "Czech Rep."
qwer3$Country[qwer3$Country == "South Africa"] <- "S. Africa"


tyr2 <- ggplot()
tyr2 <- tyr2 + geom_line(data=qwer1, aes(reorder(Country, value), value, group = variable, colour=variable))
tyr2 <- tyr2 + geom_point(data=qwer1, aes(reorder(Country, value), value, group = variable, colour=variable), size = 2, shape=21, fill="white") 
tyr2 <- tyr2 + geom_line(data=qwer2, aes(reorder(Country, value), value, group = variable, colour=variable))
tyr2 <- tyr2 + geom_point(data=qwer2, aes(reorder(Country, value), value, group = variable, colour=variable), size = 2, shape=21, fill="white") 
tyr2 <- tyr2 + geom_line(data=qwer3, aes(reorder(Country, value), value, group = variable, colour=variable))
tyr2 <- tyr2 + geom_point(data=qwer3, aes(reorder(Country, value), value, group = variable, colour=variable), size = 2, shape=21, fill="white") 
tyr2 <- tyr2 + coord_cartesian(ylim = c(35, 85)) + scale_y_continuous(breaks = seq(30, 85, 2))
tyr2 <- tyr2 + ggtitle("Comparison of 3 weighting methods (FA/EW/'my choice')") + ylab("Index Value") + xlab("Countries")
tyr2 
```

## Box Plot of 3 weighting methods

```{r, echo=FALSE}
me2 <- ggplot()
me2 <- me2 + geom_line(data=meltingOriginal.MM.FA.Subset, aes(reorder(Country, value), value, colour=variable, group = variable), colour="green")
me2 <- me2 + geom_point(data=meltingOriginal.MM.FA.Subset, aes(reorder(Country, value), value, colour=variable, group = variable), size = 3, shape=21, fill="white")
me2 <- me2 + geom_boxplot(data=meltingOriginal.MM.FAEWMC.Subset, aes(reorder(Country, value), value))
me2 <- me2 + coord_cartesian(ylim = c(0, 35)) + scale_y_continuous(breaks = seq(0, 35, 1))# + scale_colour_manual(values=c("green - FA weights"))
me2 <- me2 + ylab("Position in Ranking") + xlab("Countries") + ggtitle("Box Plot of 3 weighting methods")
me2
```

## Relationship between GDP and my Attractiveness Index (MM.FA)

```{r, echo=FALSE, message=FALSE, warning=FALSE}
###################
# Sources: https://stackoverflow.com/questions/24954624/error-in-download-file-no-such-file-or-directory
# gdp projections from 2015 IMF
# https://www.imf.org/external/pubs/ft/weo/2015/01/weodata/index.aspx
# https://www.imf.org/external/pubs/ft/weo/2015/01/weodata/weorept.aspx?sy=2015&ey=2015&scsm=1&ssd=1&sort=country&ds=.&br=0&pr1.x=49&pr1.y=10&c=512%2C668%2C914%2C672%2C612%2C946%2C614%2C137%2C311%2C962%2C213%2C674%2C911%2C676%2C193%2C548%2C122%2C556%2C912%2C678%2C313%2C181%2C419%2C867%2C513%2C682%2C316%2C684%2C913%2C273%2C124%2C868%2C339%2C921%2C638%2C948%2C514%2C943%2C218%2C686%2C963%2C688%2C616%2C518%2C223%2C728%2C516%2C558%2C918%2C138%2C748%2C196%2C618%2C278%2C624%2C692%2C522%2C694%2C622%2C142%2C156%2C449%2C626%2C564%2C628%2C565%2C228%2C283%2C924%2C853%2C233%2C288%2C632%2C293%2C636%2C566%2C634%2C964%2C238%2C182%2C662%2C453%2C960%2C968%2C423%2C922%2C935%2C714%2C128%2C862%2C611%2C135%2C321%2C716%2C243%2C456%2C248%2C722%2C469%2C942%2C253%2C718%2C642%2C724%2C643%2C576%2C939%2C936%2C644%2C961%2C819%2C813%2C172%2C199%2C132%2C733%2C646%2C184%2C648%2C524%2C915%2C361%2C134%2C362%2C652%2C364%2C174%2C732%2C328%2C366%2C258%2C734%2C656%2C144%2C654%2C146%2C336%2C463%2C263%2C528%2C268%2C923%2C532%2C738%2C944%2C578%2C176%2C537%2C534%2C742%2C536%2C866%2C429%2C369%2C433%2C744%2C178%2C186%2C436%2C925%2C136%2C869%2C343%2C746%2C158%2C926%2C439%2C466%2C916%2C112%2C664%2C111%2C826%2C298%2C542%2C927%2C967%2C846%2C443%2C299%2C917%2C582%2C544%2C474%2C941%2C754%2C446%2C698%2C666&s=PPPPC&grp=0&a=
###################
GDPperCapitaIMF <- read.delim("1_RawData/DataSources/GDPIMF.txt")
GDPperCapitaIMF$X2015 = str_replace_all(GDPperCapitaIMF$X2015, ",", "")
GDPperCapitaIMF = GDPperCapitaIMF[-188, ]
GDPperCapitaIMF$X2015 = as.numeric(GDPperCapitaIMF$X2015)
names(GDPperCapitaIMF)[names(GDPperCapitaIMF) == "X2015"] <- "GDPinDollars"


# https://stat.ethz.ch/pipermail/r-help/2011-April/274149.html
TigersGDP = subset(GDPperCapitaIMF, Country %in% c("Korea", "Singapore", "Japan", "Chile", "Czech Republic", "Nigeria", "China", "Germany", "Switzerland", 
                                                   "Mexico", "Jordan", "Brazil", "Russia", "United States", "United Kingdom", "United Arab Emirates", "Australia", "South Africa", 
                                                   "Kenya", "Finland", "Canada", "Israel", "New Zealand", "France", "Hungary", "Thailand", "Indonesia", "Ghana", "Colombia", "Turkey"),
                   select = c(Country, GDPinDollars))

df.Original.MM.FA.special <- df.Original.MM.FA
df.Original.MM.FA.special$Country <- rownames(df.Original.MM.FA)

gdpTiger <- dplyr::full_join(TigersGDP, df.Original.MM.FA.special, by = "Country")

f18 <- ggplot(data=gdpTiger, aes(x = Value, y = GDPinDollars, label=Country))
f18 <- f18 + geom_point() + geom_text(aes(label=Country), hjust=0, vjust=0) + stat_smooth(method="lm", se=FALSE)
f18 <- f18 + coord_cartesian(ylim = c(0, 85000)) + scale_y_continuous(breaks = seq(0, 85000, 5000))
f18 <- f18 + coord_cartesian(xlim = c(35, 90)) + scale_x_continuous(breaks = seq(35, 90, 2))
f18 <- f18 + ggtitle("Relationship between GDP and my Attractiveness Index (MM.FA)") + ylab("GDP (PPP) per Capita, 2015")
f18 <- f18 + xlab("Index Value, score between 35-90")
f18 
```

Is there any correlation between two values

```{r}
cor(gdpTiger$Value, gdpTiger$GDPinDollars) 
```