---
title: "Group Lab 1 - Changes in the Geography of Chinese Production"
author: "Group 2: Jin Chang, Carmelita Deleon, Kevin Ho, Billy Wang, Alisha Husain"
date: "1/25/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
source('Lab1_Part3.R')
source('Lab1_Part2.R')
```
# Part 1
Cities like Beijing and Shanghai are included in the list of provinces in this dataset because they are municipalities. In China, a municipality is a "city"[^1] with "provincial" power under the direct jurisdiction of the central government. This means that Beijing and Shanghai are treated as both cities and provinces. Furthermore, a few locations are excluded from this dataset that some viewers might argue are "missing". For example, the People's Republic of China claims Taiwan and its surrounding islets as "Taiwan Province", even though Taiwan has been governed separately from the PRC since 1949. Anyone interested in economic statistics for Taiwan would have to visit the National Statistics website of the Republic of China (Taiwan). Another region with a large amount of economic data missing from this dataset is Tibet; Tibet came under PRC control in the early 1950s, and the "Tibet Autonomous Region" (TAR) was established in 1965. From our dataset, it is evident that regular reporting of enterprises and output for this region only became available after the TAR was established.
[^1]: From Wikipedia - A municipality is often not a "city" in the usual sense of the term (i.e., a large continuous urban settlement), but instead an administrative unit of sorts.
There are several problems with Chinese statistics that cause unreliability and uncertainty. The first is that when China began its major economic transformation in the late 1970s, changing from a command economy to a market economy, the transformation happened faster than China's National Bureau of Statistics (NBS) had prepared for. The result was a major challenge in collecting measurements, because many of the new private service-sector businesses did not report their activity directly to the NBS until the 1990s. This means that China's economic data for the period from the late 1970s to 1990 is less reliable. Another issue is that in 1993, China changed its method of calculating GDP to follow the United Nations' standards. However, these concepts were very new to the statisticians in China and took time to adopt. The challenge of overhauling the country's statistical system makes it difficult to measure growth accurately during the transition period. [Source](https://www.stlouisfed.org/publications/regional-economist/second-quarter-2017/chinas-economic-data-an-accurate-reflection-or-just-smoke-and-mirrors)
Furthermore, additional [sources](https://www.forbes.com/sites/unicefusa/2019/01/23/a-step-closer-to-adulthood-along-a-cold-highway/#7aec101a1632) argue that "there are enormous discrepancies both within NBSC data and then when comparing it to outside data. These discrepancies come from both poor quality statistical management but also unquestionably data manipulation". Critics have offered several recommendations to help fight data falsification at the provincial and individual levels. These include measures to check the quality of data collection, corrections for exaggeration, better methodology, and improved timeliness and periodicity of data releases.
# Part 2
```{r visualization 1, echo=FALSE, warning=FALSE}
line_grph1
```
Here we have a chart showing the output, in 100 million RMB, for four provinces in China from 1975 through 2017. Output was relatively flat for these four provinces until 1990. Around that point, output growth appears to diverge, with some provinces growing much faster than others. What might explain this? Qinghai and Xinjiang are in the **western** part of the country, where the terrain is _mountainous_ and _fragmented_, isolating those areas from the rest of the country. On the other hand, Guangdong and Fujian are along the coast, where industrialization and economic activity are greater [(Batisse 2005)](https://journals.openedition.org/chinaperspectives/502).
```{r visualization 2, echo=FALSE, warning=FALSE}
line_grph2
```
Focusing on Guangdong Province from the previous chart, this chart displays the **relationship** between the number of enterprises and output. Generally, the _greater_ the number of enterprises, the _greater_ the output. However, that isn't always the case, as there are some instances of lower numbers of enterprises yet high output. When the number of enterprises is **low** but output remains **high**, that may indicate _more_ mergers/acquisitions and/or _increased_ productivity.
```{r visualization 3, echo=FALSE, warning=FALSE}
line_grph3
```
Here are the **top 5** provinces and the output they produced from _1949_ to _2017_. These provinces had the greatest number of enterprises at the start of 1949. As shown, all five started with **_low outputs_**. After 1990, the five provinces began to diverge, reaching _different_ output values. Jiangsu has the _highest_ output, about 143016.94 hundred million RMB in 2014, followed by Shandong as a close second with 129906 hundred million RMB in 2013.
Beijing has the _lowest_ output of the five, 18087.27 hundred million RMB in 2016.
```{r visualization 4, echo=FALSE, warning=FALSE}
line_grph4
```
The fourth visualization shows provinces that started with a **small** output value in 1949. We were curious whether these provinces grew their output substantially from 1949 to 2017. The provinces that had _less than 2 hundred million RMB_ are: Yunnan, Beijing, Gansu, Xinjiang, Qinghai, and Ningxia. There was not much change from 1949 to 1985. It was not until **1990** that there was a _change_ in output for each province. Five provinces (Gansu, Ningxia, Qinghai, Xinjiang, and Yunnan) stayed on a consistent trend approaching the year 2000. There was a _huge_ increase for _Guangdong_, whose output **gradually** grew larger than that of the other five provinces.
Guangdong went from 9.89 million RMB in 1949 to 9486.79 million RMB in the year 2011.
```{r visualization 5, echo=FALSE, warning=FALSE}
line_grph5
```
Beijing is symbolized by the blue dots and Shanghai by the red dots.
The purpose of this visualization is to compare the economic output of the two _wealthiest_ provinces in China, Beijing and Shanghai. Up until around 1986, both provinces had roughly the same output. After that year, both began to experience **exponential growth**, with Shanghai's output increasing at a _higher_ rate than Beijing's.
# Part 3
## First
```{r first, echo=FALSE, warning=FALSE}
plot1
plot2
plot3
```
### How we treated NA Values
We did not modify or remove the NA values during the calculations or while plotting, because removing them could cause a _loss_ of data. Furthermore, we believe that omitting these values may impact the analysis **negatively**.
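A minimal sketch of what leaving NA values in place means in practice. The numbers below are made up for illustration and are not taken from our dataset; the variable names are hypothetical.

```r
# Hypothetical output series with two missing years (values are made up).
years  <- 1949:1953
output <- c(1.2, NA, 1.9, 2.4, NA)

# Arithmetic on NA yields NA, so a gap stays visible in derived values
# instead of being silently filled in.
growth <- c(NA, diff(output))

# Summaries need an explicit choice; na.rm = TRUE skips the gaps.
mean_output <- mean(output, na.rm = TRUE)

# Number of observations that dropping NA rows would discard.
n_dropped <- sum(is.na(output))
```

Keeping the NA values means a plot simply shows no point for those years, while dropping them would shorten the series by `n_dropped` observations.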
## Second
```{r second, echo=FALSE, warning=FALSE}
national.plot
```
### Analysis
Based on the figure above, we noticed that Beijing originally had a **large** share of the nation's enterprises. However, there is a large drop near 1960, after which the share stays roughly constant until the early 2000s. Around that period it increased, then dropped again. Unlike **_Beijing_**, **_Tianjin_** shows a consistent trend until the early 2000s, when, just like Beijing, its share _increased_ and then dropped around the year 2000. Similarly, **_Shanxi_** shows an increasing trend before 1960; however, in contrast to the other three provinces, its share of the national total _gradually_ begins to decrease around 1980. Conversely, although **_Hebei_** also shows an increasing trend before 1960, its share of the national total _does not_ decrease afterwards. Rather, it gradually increases, then drops near 2010.
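As a rough illustration of what a national share measures, here is a minimal sketch with made-up counts. The data frame and its column names are assumptions for this example, not the objects created in our lab scripts.

```r
# Hypothetical enterprise counts (made-up numbers, not our dataset).
enterprises <- data.frame(
  year     = c(1950, 1950, 1950, 1960, 1960, 1960),
  province = c("Beijing", "Hebei", "Other", "Beijing", "Hebei", "Other"),
  count    = c(50, 30, 120, 40, 60, 300)
)

# National total per year, repeated on every row of that year.
enterprises$national <- ave(enterprises$count, enterprises$year, FUN = sum)

# Each province's share of the national total, in percent.
enterprises$share_pct <- 100 * enterprises$count / enterprises$national
```

In this toy example Hebei doubles its enterprise count between 1950 and 1960, yet its share stays at 15% because the national total doubles as well. A falling share therefore does not necessarily mean a falling number of enterprises.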
## Third
```{r third, echo=FALSE, warning=FALSE}
output_enterprise
```
### Analysis
Our team noticed that the **price per enterprise** was consistent over time until it increased drastically around 1997. The plot also shows that _all five provinces_ increased at a similar rate. However, around 2010 **_Shanghai_** showed a large drop in price per enterprise, in contrast to the trend that Beijing and Tianjin exhibited. Both **_Heilongjiang_** and **_Shanghai_** also showed an interesting occurrence: each province's trend just _stopped_ around 2010. Overall, the five provinces' price per enterprise moved together until about 2010, when the ratios started to differ.
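The ratio discussed here can be sketched as output divided by the number of enterprises. The values and column names below are made up for illustration and are assumptions, not our actual data.

```r
# Hypothetical values (made up): output in 100 million RMB and
# enterprise counts for a single province.
df <- data.frame(
  year        = c(1995, 2000, 2005, 2010),
  output      = c(100, 400, 1500, NA),
  enterprises = c(50, 80, 100, 90)
)

# Output per enterprise; a missing numerator gives a missing ratio.
df$output_per_enterprise <- df$output / df$enterprises
```

A line that simply stops is what a missing value in either column would produce, as in the 2010 row of this sketch. We have not checked whether that is the cause for Heilongjiang and Shanghai.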
## Four, Five, and Six
Below we see population data gathered from the National Bureau of Statistics of China. The formatting of the original raw file was odd and difficult to work with, so I modified and cleaned it in a separate file. Below you will see the Gross Domestic Populations of the first 10 provinces, so that it matches up well with the Enterprise/Output data, restricted to the year range from 2000 to 2017.
```{r four, echo=FALSE, warning=FALSE}
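# Population by province (in units of 10,000 persons), cleaned in a separate file.
pop1 <- read.csv("GrossDomPop_byProvince.csv")
pop2 <- as_tibble(pop1)
# One scatter plot per province; the data cover 2000 to 2017.
plot(pop2$Year, pop2$Beijing_GrossDomesticPopulation, main="Population in Beijing, 2000-2017 (in 10,000)", xlab="Year", ylab="Population (10,000)", pch=19)
plot(pop2$Year, pop2$Tianjin_GrossDomesticPopulation, main="Population in Tianjin, 2000-2017 (in 10,000)", xlab="Year", ylab="Population (10,000)", pch=19)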
```
Here we see population graphs for the provinces of Beijing and Tianjin. Notice how closely they track one another, with growth that levels off over time. As previously mentioned, the 2000 to 2017 timeframe was chosen purposely: from around 2000 to ~2010, China was experiencing rapid, almost exponential population growth. After 2010, however, there is a sharp decline in the growth rate, as exemplified below with data also taken from the National Bureau of Statistics of China, which turns the near-exponential growth into a more modest, logarithmic-looking curve.
```{r five, echo=FALSE, warning=FALSE}
growthRate = read.csv("GrowthRate.csv")
growthRate2=as_tibble(growthRate)
natural_growthRate <- growthRate2[3,]
years <- 1999:2017  # numeric years, 1999 through 2017
plot(years, natural_growthRate, main="Natural Growth Rate of Chinese Citizens Totaled in %", pch=19)
```
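For comparison, a year-over-year growth rate can also be derived directly from a population series. This is a minimal sketch with made-up numbers, and it is not the same quantity as the natural growth rate plotted above, which counts only births minus deaths and ignores migration.

```r
# Hypothetical population series in units of 10,000 persons (made up).
population <- c(1000, 1100, 1210, 1250)

# Year-over-year growth in percent: change divided by the previous value.
yoy_growth_pct <- 100 * diff(population) / head(population, -1)
```

In this toy series the growth rate holds at 10% for two years and then falls to about 3.3%, which is the kind of slowdown that bends a steep curve into a flatter one.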
Next, we combine the Enterprise/Output data with the new population data to check for hidden variables. We normalize the Enterprise data by the Gross Domestic Population variable to limit the role of external factors and try to derive a relationship from it. Below you will see a function that takes in the names of all ten provinces listed and calculates each one's enterprise-to-population ratio, followed by a line plot to make the pattern easier to see. It runs through Beijing, then Tianjin, then Hebei, etc., so that the viewer can easily compare Enterprise/Population ratios from the top down.
```{r, echo=FALSE}
plot(beijingEnterprise_Pop$Year, beijingEnterprise_Pop$ratio, main="Enterprise/Population Ratio in Beijing Last 10 Years", pch=19)
bigMerge(provinceList)
```
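The normalization step can be sketched for a single province as a join on year followed by a division. This is only an illustration under assumed column names and made-up numbers; it is not the `bigMerge` function from our lab scripts.

```r
# Hypothetical inputs (made-up numbers); column names are assumptions.
enterprise <- data.frame(
  Year        = c(2000, 2001, 2002),
  Enterprises = c(4000, 4400, 5000)
)
population <- data.frame(
  Year       = c(2001, 2002, 2003),
  Population = c(1100, 1250, 1300)  # in units of 10,000 persons
)

# merge() defaults to an inner join, so only years present in both
# tables survive.
merged <- merge(enterprise, population, by = "Year")

# Enterprises per 10,000 persons.
merged$ratio <- merged$Enterprises / merged$Population
```

Because the join is an inner join, years that appear in only one of the two sources (2000 and 2003 here) drop out, so the ratio is only defined where both datasets overlap.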
From our line plots, we can see that certain provinces such as Beijing, Tianjin, Shanxi, and Shanghai have trajectories similar to the growth rate plot. This could be explained by the fact that these provinces have significantly higher Investment in Fixed Assets (National Bureau of Statistics) than the other provinces, as well as a higher Number of Employed Personnel. This turns the Enterprise/Population metric into more of a productive-value-per-person metric, which also makes sense because the average wage is higher in Beijing and Shanghai than in almost all of the other provinces listed. I wish I could have explored other variables to draw correlations, but I wanted to stay on topic, so those were not included.