---
title: "Project Report: Mental Health Post COVID-19 Pandemic"
output:
  html_document:
    theme: journal
    code_folding: hide
    toc: true
    toc_float: true
editor_options:
  chunk_output_type: console
---
**Fei Xiao, Kelly Luo, Tongtong Zhu, Yi Fang, Yujie Huang**
```{r setup, include=FALSE}
library(tidyverse)
library(viridis)
library(plotly)
library(mgcv)
library(modelr)
library(ggmosaic)
library(lmtest)
library(performance)
library(knitr)
library(kableExtra)
library(patchwork)
library(ggfortify)
library(usmap)
library(gridExtra)
library(dplyr)
library(readxl)
knitr::opts_chunk$set(
echo = TRUE,
warning = FALSE,
fig.width = 12,
fig.height = 6,
out.width = "90%"
)
options(
ggplot2.continuous.colour = "viridis",
ggplot2.continuous.fill = "viridis"
)
scale_colour_discrete = scale_colour_viridis_d
scale_fill_discrete = scale_fill_viridis_d
theme_set(
theme_minimal() +
theme(
legend.position = "bottom",
plot.title = element_text(hjust = 0.5)
)
)
```
## Motivation
Mental health is an important part of overall health and well-being; it includes our emotional, psychological, and social well-being. Mental health problems are common throughout the United States: about one in five adults experiences a diagnosable mental illness in a given year. Many common mental illnesses, such as depression, anxiety, and bipolar disorder, may increase the risk of suicide.
## Related Work
**The Big Event for Mental Health:** A WHO initiative on Global Awareness and Investment in World Mental Health
World Mental Health Day is observed around the world every year on October 10 to advance advocacy and awareness about mental illnesses and to acknowledge their effects on daily life. With these goals in mind, the first-ever event on this day sponsored by the World Health Organization (WHO) was held this year. The event highlighted advances and preventive measures in mental health care worldwide, and promoted mental health advocacy through collaboration among governmental, non-governmental, and public agency partners and individuals worldwide.
## Questions and Planned Analyses
1. What is the general status of mental illness across states in the US?
2. What is the overall trend of the percentage, frequency and level of anxiety and depression in the US across years?
3. How do trends in anxiety and depression percentages differ by biological sex, household income, marital status, and age?
4. What is the overall suicide trend across years? How does suicide differ by state, age, sex and means?
5. What is the association between taking medication for depression or anxiety and COVID-19 exposure?
## Data Source
**IPUMS Health Surveys:**
IPUMS NHIS is a harmonized set of data covering more than 50 years (1963-present) of the National Health Interview Survey (NHIS). The NHIS is the principal source of information on the health of the U.S. population, covering such topics as general health status, the distribution of acute and chronic illness, functional limitations, access to and use of medical services, insurance coverage, and health behaviors. On average, the survey covers 100,000 persons in 45,000 households each year. IPUMS NHIS facilitates cross-time comparisons of these survey data by coding variables identically across time. Our analysis uses data from 2015 to 2021, which covers the COVID-19 period.
**National Survey on Drug Use and Health (NSDUH):**
Substance Abuse and Mental Health Services Administration (SAMHSA), Center for Behavioral Health Statistics and Quality, National Survey on Drug Use and Health (NSDUH), 2019 and 2020.
**Centers for Disease Control and Prevention (CDC):**
CDC WONDER online databases, deaths (2014-2020). Data were collected from the CDC WONDER online databases under the Compressed Mortality category.
National Center for Health Statistics, National Vital Statistics System, Mortality. Data were retrieved using NVSS multiple cause-of-death mortality files for 2000 through 2020. Suicide deaths were identified using International Classification of Diseases, 10th Revision (ICD–10) underlying cause-of-death codes U03, X60–X84, and Y87.0. Means of suicide were identified using ICD–10 codes X72–X74 for firearm, X60–X69 for poisoning, and X70 for suffocation. “Other means” includes: Cut or pierce; Drowning; Falls; Fire or flame; Other land transport; Struck by or against; Other specified, classifiable injury; Other specified, not elsewhere classified injury; and Unspecified injury, as classified by ICD–10.
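The ICD-10 ranges above can be read as a small lookup. The following is a minimal sketch in base R, assuming codes arrive as strings such as `"X72"`. `classify_means()` is a hypothetical helper written for illustration; it is not part of the NVSS files or of this report's pipeline, which reads rates that are already aggregated by means.

```r
# Hypothetical helper: map ICD-10 underlying cause-of-death codes to the
# means-of-suicide groups described above. Non-suicide codes return NA.
classify_means <- function(code) {
  code <- toupper(trimws(code))
  # Two digits after the leading letter, e.g. "X72" -> 72
  num <- suppressWarnings(as.integer(substr(code, 2, 3)))
  is_x <- !is.na(code) & substr(code, 1, 1) == "X" & !is.na(num)

  # Suicide codes: U03, X60-X84, Y87.0
  is_suicide <- (is_x & num >= 60 & num <= 84) |
    substr(code, 1, 3) %in% "U03" |
    code %in% "Y87.0"

  out <- rep(NA_character_, length(code))
  out[is_suicide] <- "others"                      # default for suicide codes
  out[is_x & num >= 60 & num <= 69] <- "poisoning"
  out[is_x & num == 70] <- "suffocation"
  out[is_x & num >= 72 & num <= 74] <- "firearm"
  out
}

example_codes <- c("X72", "X65", "X70", "X80", "U03", "Y87.0", "I21")
example_means <- classify_means(example_codes)
```

Any suicide code outside the three named ranges falls into `"others"`, which mirrors the "Other means" group in the source description.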
## Data Cleaning
### Mental Illness Data
To understand the distribution of mental illness across states, we retrieved mental illness data from the National Survey on Drug Use and Health (NSDUH), 2019-2020. We focused on adults reporting any mental illness and adults reporting serious mental illness from 2019 to 2020. The numbers of adults reporting any mental illness and serious mental illness were rounded to the nearest 1,000. Serious mental illness (SMI) is defined as having a diagnosable mental, behavioral, or emotional disorder, other than a developmental or substance use disorder, as assessed by the Mental Health Surveillance Study (MHSS) Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders. Estimates of SMI are a subset of estimates of any mental illness (AMI) because SMI is limited to people with AMI that resulted in serious functional impairment. The dataset includes mental illness data for the 50 states and Washington D.C. The variables we focused on were:
* `state`: U.S. state
* `any_mental_num`: number of adults reporting any mental illness
* `any_mental_per`: percent of adults reporting any mental illness
* `ser_mental_num`: number of adults reporting serious mental illness
* `ser_mental_per`: percent of adults reporting serious mental illness
* `state_abb`: abbreviation of states
* `region`: state regions, including northeast, south, west and north central
```{r, message=FALSE}
mental_df =
read_csv("./data/mental_data.csv") %>%
janitor::clean_names() %>%
mutate(
any_mental_num = any_mental_num / 1000000,
any_mental_per = any_mental_per * 100,
ser_mental_num = ser_mental_num / 1000000,
ser_mental_per = ser_mental_per * 100,
state_abb = state.abb[match(state, state.name)],
region = state.region[match(state, state.name)]
) %>%
mutate(
state_abb = replace(state_abb, state == "District of Columbia", "DC"))
```
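The lookup in the chunk above relies on R's built-in `state.name`, `state.abb` and `state.region` vectors, which cover only the 50 states. The short base-R sketch below uses three example names (chosen for illustration) to show why the District of Columbia needs the explicit `"DC"` replacement and why its `region` stays missing.

```r
# Three example state names, chosen for illustration only
states <- c("New York", "Utah", "District of Columbia")

# match() returns NA for names that are not in state.name
idx <- match(states, state.name)
abb <- state.abb[idx]
region <- as.character(state.region[idx])

# Same manual fix as in the cleaning chunk
abb[states == "District of Columbia"] <- "DC"

data.frame(state = states, state_abb = abb, region = region)
```

Because `region` remains `NA` for the District of Columbia, a later `drop_na()` on the cleaned data removes that row from any regional totals.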
### Anxiety and Depression Data
We pulled data from IPUMS Health Surveys: NHIS and limit our analysis to data from 2015 to 2021. To analyze the trend in anxiety prevalence, frequency and level from 2015 to 2021, we focus on the anxiety indicators listed below:
- `WORFREQ`: How often feel worried, nervous, or anxious
- `WORRX`: Take medication for worried, nervous, or anxious feelings
- `WORFEELEVL`: Level of worried, nervous, or anxious feelings, last time
To analyze the trend in depression prevalence, frequency and level from 2015 to 2021, we focus on the depression indicators listed below:
- `DEPRX`: Take medication for depression
- `DEPFREQ`: How often feel depressed
- `DEPFEELEVL`: Level of depression, last time depressed
The core demographic and socioeconomic status indicators listed below are also included in this analysis:
- `AGE`: Age; individuals aged 85 or above are set to missing because 85 is the top code.
- `SEX`: Biological sex
- `MARST`: Current marital status
- `POVERTY`: Ratio of family income to poverty threshold
Responses coded as unknown or not applicable are excluded from our analysis.
```{r, message=FALSE}
anx_dep =
read_csv("data/nhis_data01.csv") %>%
janitor::clean_names() %>%
filter(year>=2015) %>%
select(year, worrx, worfreq, worfeelevl, deprx, depfreq, depfeelevl, age, sex, marst, poverty) %>%
mutate(
sex = recode_factor(sex,
"1" = "Male",
"2" = "Female"),
marst = recode_factor(marst,
"10" = "Married", "11" = "Married", "12" = "Married", "13" = "Married",
"20" = "Widowed",
"30" = "Divorced",
"40" = "Separated",
"50" = "Never married"),
poverty = recode_factor(poverty,
"11" = "Less than 1.0", "12" = "Less than 1.0",
"13" = "Less than 1.0", "14" = "Less than 1.0",
"21" = "1.0-2.0", "22" = "1.0-2.0",
"23" = "1.0-2.0", "24" = "1.0-2.0",
"25" = "1.0-2.0",
"31" = "2.0 and above","32" = "2.0 and above",
"33" = "2.0 and above","34" = "2.0 and above",
"35" = "2.0 and above","36" = "2.0 and above",
"37" = "2.0 and above","38" = "2.0 and above"),
worrx = recode_factor(worrx,
'1' = "no",
'2' = "yes"),
worfreq = recode_factor(worfreq,
'1' = "Daily",
'2' = "Weekly",
'3' = "Monthly",
'4' = "A few times a year",
'5' = "Never"),
worfeelevl = recode_factor(worfeelevl,
'1' = "A lot",
'3' = "Somewhere between a little and a lot",
'2' = "A little"),
deprx = recode_factor(deprx, '1' = "no", '2' = "yes"),
depfreq = recode_factor(depfreq, '1' = "Daily", '2' = "Weekly",
'3' = "Monthly", '4' = "A few times a year",
'5' = "Never"),
depfeelevl = recode_factor(depfeelevl, '1' = "A lot",
'3' = "Somewhere between a little and a lot",
'2' = "A little"),
age = ifelse(age>=85, NA, age)
)
```
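The exclusion of unknown and not-applicable responses follows from listing only the substantive codes when recoding. The sketch below shows the same idea with base R's `factor()` instead of `recode_factor()`. The raw values `0`, `7` and `9` are assumed stand-ins for "not in universe" and "unknown" codes, not values taken from the codebook.

```r
# Toy raw responses; 0, 7 and 9 are assumed stand-ins for non-substantive codes
raw_worrx <- c(1, 2, 2, 0, 7, 9)

# Only the listed levels are kept; every other value becomes NA
worrx <- factor(raw_worrx, levels = c(1, 2), labels = c("no", "yes"))

# Top-coded ages are set to missing, as in the cleaning chunk
raw_age <- c(34, 85, 61)
age <- ifelse(raw_age >= 85, NA, raw_age)

table(worrx, useNA = "ifany")
```

The later `drop_na()` calls then remove these missing responses one variable at a time, so a respondent with an unknown `worrx` can still contribute to the depression plots.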
### Suicide Data
To understand the distribution of suicides across states, we retrieved suicide data for 2014-2020 from the online CDC WONDER database. The suicide number is the count of suicide deaths, and the suicide rate is the number of suicide deaths per 100,000 population. To analyze the overall trend of suicide in the US and the differences in suicide rate by age, sex, and means of suicide, we collected suicide data from the National Vital Statistics System, Mortality. The age groups exclude people aged 5-9 years: although suicides in this group are included in the total numbers, the group is not studied separately because of its small number of suicides per year. We focused on suicide data from 2000 to 2020 and paid particular attention to changes in suicide trends in recent years (2018-2020). The key variables in the dataset were:
* `year`: year, 2000-2020
* `state`: U.S. state
* `sex`: sex group, including female and male
* `age`: age group, including 10-14, 15-24, 25-44, 45-64, 65-74, 75+
* `suicide_no`: number of suicide deaths
* `suicide_100k`: suicide rate (suicide per 100k)
* `means`: means of suicide, including firearm, poisoning, suffocation and others
```{r, message=FALSE}
suicide_df =
read_excel(
"./data/suicide_data.xlsx",
sheet = 1,
col_names = TRUE) %>%
janitor::clean_names() %>%
mutate(
population = (suicide_no / suicide_100k) * 100000,
sex = as.factor(sex),
age = as.factor(age)
)
average_20years = sum(suicide_df$suicide_no) / sum(suicide_df$population) * 100000
suicide_state_df =
read_excel(
"./data/suicide_data.xlsx",
sheet = 2,
# `range` takes precedence over `skip` in read_excel(), so `skip` is not needed
range = "A1:D351",
col_names = TRUE) %>%
janitor::clean_names() %>%
rename(
suicide_no = deaths,
suicide_100k = death_rate) %>%
mutate(
population = (suicide_no / suicide_100k) * 100000
)
suicide_means_df =
read_excel(
"./data/suicide_data.xlsx",
sheet = 3,
col_names = TRUE) %>%
janitor::clean_names() %>%
pivot_longer(
firearm:others,
names_to = "means",
values_to = "rate"
) %>%
mutate(
sex = as.factor(sex),
means = as.factor(means))
```
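The `population` column above is back-calculated from the count and the rate, and `average_20years` is a pooled rate rather than a mean of yearly rates. A minimal worked example in base R, using two made-up rows, shows the arithmetic and how the two averages differ.

```r
# Two made-up rows: counts of deaths and rates per 100,000
toy <- data.frame(
  suicide_no   = c(30, 120),
  suicide_100k = c(10, 15)
)

# rate = deaths / population * 100000, so population = deaths / rate * 100000
toy$population <- toy$suicide_no / toy$suicide_100k * 100000

# Pooled rate: total deaths over total population
pooled_rate <- sum(toy$suicide_no) / sum(toy$population) * 100000

# Unweighted mean of the rates, for comparison
simple_mean <- mean(toy$suicide_100k)
```

The pooled rate weights each row by its population, so larger groups count for more than they would in a plain mean of the rates.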
## Exploratory Analysis
### Mental Illness
#### Map for Percent of Adults Reporting Any Mental Illness, by State between 2019-2020
According to the mental health data collected in 2019-2020, the percentage of adults reporting any mental illness is high across the US, with variation between states. Utah has the highest percentage (29.7%) and Florida the lowest (17.5%).
```{r, message=FALSE}
state_mental=
plot_usmap(
data = mental_df,
regions = "state",
values = "any_mental_per",
labels = TRUE, label_color = "white") +
labs(
title = "Percent of adults reporting any mental illness for each state, 2019-2020"
) +
scale_fill_continuous(
name = "Mental illness percent (%)",
label = scales::comma) +
theme(
legend.position = "right",
plot.title = element_text(size = 12))
ggplotly(state_mental)
```
#### Any/Serious Mental Illness Numbers (Million), by Region between 2019-2020
The numbers of adults reporting any mental illness and serious mental illness are both highest in the South and lowest in the Northeast. The number of adults reporting mental illness in the South is more than twice that in the Northeast.
```{r, message=FALSE}
any_mental_plot =
mental_df %>%
group_by(region) %>%
drop_na() %>%
summarize(any_mental_num = sum(any_mental_num)) %>%
ggplot(
aes(x = region, y = any_mental_num, fill = region)) +
geom_bar(stat = "identity") +
labs(
title = "Any Mental Illness Number, by Region, 2019-2020",
x = "Region",
y = "Mental illness number (million)",
fill = "Region") +
theme(legend.position = "bottom")
ser_mental_plot =
mental_df %>%
group_by(region) %>%
drop_na() %>%
summarize(ser_mental_num = sum(ser_mental_num)) %>%
ggplot(
aes(x = region, y = ser_mental_num, fill = region)) +
geom_bar(stat = "identity") +
labs(
title = "Serious Mental Illness Number, by Region, 2019-2020",
x = "Region",
y = "Mental illness number (million)",
fill = "Region") +
theme(legend.position = "bottom")
grid.arrange(any_mental_plot, ser_mental_plot, ncol =2)
```
#### Any/Serious Mental Illness Percent, Top 10 States, 2019-2020
Eight of the top 10 states are the same for any mental illness and serious mental illness; the exceptions are Washington, Rhode Island, Arkansas and Indiana. Utah has the highest percentage for both any and serious mental illness.
```{r, message=FALSE}
any_top10_plot =
mental_df %>%
filter(row_number(desc(any_mental_per)) <= 10) %>%
mutate(
state = fct_reorder(state, any_mental_per)
) %>%
ggplot(
aes(x = any_mental_per, y = state, fill = state)) +
geom_bar(stat = "identity") +
labs(
title = "Any Mental Illness Percent, Top 10 States",
x = "Any Mental illness percent (%)",
y = "State",
fill = "State") +
theme(legend.position = "bottom")
ser_top10_plot =
mental_df %>%
filter(row_number(desc(ser_mental_per)) <= 10) %>%
mutate(
state = fct_reorder(state, ser_mental_per)
) %>%
ggplot(
aes(x = ser_mental_per, y = state, fill = state)) +
geom_bar(stat = "identity") +
labs(
title = "Serious Mental Illness Percent, Top 10 States",
x = "Serious mental illness percent (%)",
y = "State",
fill = "State") +
theme(legend.position = "bottom")
grid.arrange(any_top10_plot, ser_top10_plot, ncol =2)
```
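The overlap between the two rankings can be checked with set operations instead of being read off the plots. This is a small sketch on made-up data: the state labels, the values and the helper `top_n_states()` are all hypothetical, and `n = 3` stands in for the top 10.

```r
# Made-up percentages for five placeholder states
toy <- data.frame(
  state   = c("A", "B", "C", "D", "E"),
  any_per = c(25, 22, 21, 20, 18),
  ser_per = c(6.0, 5.5, 4.0, 5.8, 5.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the n states with the largest value in `col`
top_n_states <- function(df, col, n) {
  df$state[order(df[[col]], decreasing = TRUE)][seq_len(n)]
}

top_any <- top_n_states(toy, "any_per", 3)
top_ser <- top_n_states(toy, "ser_per", 3)

shared   <- intersect(top_any, top_ser)  # in both rankings
only_any <- setdiff(top_any, top_ser)    # only in the "any" ranking
only_ser <- setdiff(top_ser, top_any)    # only in the "serious" ranking
```

Applied to `mental_df` with `n = 10`, the same three calls would list the shared states and the exceptions on each side.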
### Anxiety
#### Percentage of People Who Reported Taking Medication for Worried, Nervous, or Anxious Feelings
According to the plot, the percentage of people who reported taking medication for worried, nervous, or anxious feelings increased continuously from 9.13% in 2015 to 13.57% in 2021. We observe a rapid increase from 2017 to 2019 and, contrary to our expectations, a relatively slow increase from 2019 to 2020. The effect of COVID-19 on the anxiety percentage is not evident in this plot.
```{r, message=FALSE}
anx_dep %>%
drop_na(worrx) %>%
group_by(year, worrx) %>%
summarize(wor_num = n()) %>%
pivot_wider(
names_from = worrx,
values_from = wor_num
) %>%
mutate(
wor_percentage = yes/(no + yes)*100,
text_label = str_c(yes, " out of ", no + yes)
) %>%
ungroup() %>%
plot_ly(
y = ~wor_percentage,
x = ~year,
color = ~year,
type = "bar",
colors = "viridis",
text = ~text_label
) %>%
layout(
title = "Percentage of people who reported taking medication for anxiety",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
showlegend = FALSE
) %>%
hide_colorbar()
```
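The percentage plotted above is the share of "yes" answers among all non-missing answers in each year. As a minimal sketch of that calculation, the base-R snippet below runs it on eight made-up responses. It uses `table()` rather than the `pivot_wider()` step in the chunk, so it is an equivalent illustration and not the report's exact code.

```r
# Eight made-up responses across two years
toy <- data.frame(
  year  = c(2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016),
  worrx = c("no", "no", "no", "yes", "no", "no", "yes", "yes")
)

# Rows are years, columns are the "no" and "yes" counts
counts <- table(toy$year, toy$worrx)

# Share of "yes" within each year, in percent
pct_yes <- 100 * counts[, "yes"] / rowSums(counts)
```

The stratified plots that follow apply the same ratio within each combination of stratum and year.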
##### Stratify by Biological Sex
Stratifying the percentage of people who reported taking medication for worried, nervous, or anxious feelings by biological sex, we observe a much higher percentage among females than among males. The percentage among females also rose quickly, from 14.41% in 2018 to 16.52% in 2019. Among males, the percentage is relatively stable from 2018 to 2020 and increases from 2020 to 2021. Given that COVID-19 became prevalent in the United States in 2020, its effect on the anxiety percentage is not evident for either sex.
```{r, message=FALSE}
anx_dep %>%
drop_na(sex, worrx) %>%
group_by(sex, year, worrx) %>%
summarize(wor_num = n()) %>%
pivot_wider(
names_from = worrx,
values_from = wor_num
) %>%
mutate(
wor_percentage = yes/(no + yes)*100,
text_label = str_c(yes, " out of ", no + yes)
) %>%
ungroup() %>%
plot_ly(
y = ~wor_percentage,
x = ~year,
color = ~sex,
type = "bar",
colors = "viridis",
text = ~text_label
) %>%
add_trace(
x = ~year,
y = ~wor_percentage,
color = ~sex,
type='scatter',
mode='lines+markers'
) %>%
layout(
title = "Percentage of people who reported taking medication for anxiety, by biological sex",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
legend = list(orientation = 'h')
)
```
##### Stratify by Ratio of Family Income to Poverty Threshold
Stratifying the percentage of people who reported taking medication for worried, nervous, or anxious feelings by the ratio of household income to the poverty line, we can clearly see that the lower the household income, the higher the percentage. The percentage in the lowest-income stratum decreased rapidly from 17.30% in 2017 to 15.84% in 2018, the opposite of what happened in the other two strata. Even after this decrease, the lowest-income stratum still had the highest percentage of the three, and the decrease was followed by a rapid increase from 15.84% in 2018 to 18.58% in 2019. From 2020 to 2021, the percentage decreases for the two lower-income strata, while it increases steadily for the highest-income stratum. Although household income appears to have an effect on anxiety, the effect of COVID-19 on anxiety is not evident in any of the three strata.
```{r, message=FALSE}
anx_dep %>%
drop_na(poverty, worrx) %>%
group_by(poverty, year, worrx) %>%
summarize(wor_num = n()) %>%
pivot_wider(
names_from = worrx,
values_from = wor_num
) %>%
mutate(
wor_percentage = yes/(no + yes)*100,
text_label = str_c(yes, " out of ", no + yes)
) %>%
ungroup() %>%
plot_ly(
y = ~wor_percentage,
x = ~year,
color = ~poverty,
type = "scatter",
mode = "lines+markers",
colors = "viridis",
text = ~text_label
) %>%
layout(
title = "Percentage of people who reported taking medication for anxiety, by household income",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
legend = list(orientation = 'h')
)
```
##### Stratify by Current Marital Status
Stratifying the percentage of people who reported taking medication for worried, nervous, or anxious feelings by current marital status, we observe a rapid increase from 14.31% in 2019 to 17.49% in 2020 among those who are separated. Considering the timing, this could be an effect of COVID-19. For the other strata, the effect of COVID-19 is not obvious.
```{r, message=FALSE}
anx_dep %>%
drop_na(marst, worrx) %>%
group_by(marst, year, worrx) %>%
summarize(wor_num = n()) %>%
pivot_wider(
names_from = worrx,
values_from = wor_num
) %>%
mutate(
wor_percentage = yes/(no + yes)*100,
text_label = str_c(yes, " out of ", no + yes)
) %>%
ungroup() %>%
plot_ly(
y = ~wor_percentage,
x = ~year,
color = ~marst,
type = "scatter",
mode='lines+markers',
colors = "viridis",
text = ~text_label
) %>%
layout(
title = "Percentage of people who reported taking medication for anxiety, by marital status",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
legend = list(orientation = 'h')
)
```
#### Age Distribution
As we can see from the plot, the age distribution of people taking medication for worried, nervous, or anxious feelings did not change much from 2015 to 2021. The effect of COVID-19 was not evident in this plot.
```{r, message=FALSE}
anxiety_age_plot =
anx_dep %>%
drop_na(age, worrx) %>%
ggplot(
aes(x=age, group=worrx, fill=worrx)
) +
geom_density(alpha=0.4) +
facet_wrap(~year) +
labs(
title = "Age distribution by whether respondents reported taking medication for anxiety",
fill = "Reported taking medication for anxiety"
)
ggplotly(anxiety_age_plot) %>%
layout(legend = list(orientation = "h"))
```
#### Frequency of Worried, Nervous, or Anxious Feelings
From this bar plot about how often people feel worried, nervous, or anxious, we can observe that the frequency is steadily increasing from 2015 to 2021. There is also a rapid increase from 2019 to 2020, which could be COVID-19 related.
```{r, message=FALSE}
anx_dep %>%
drop_na(worfreq) %>%
group_by(year, worfreq) %>%
summarize(count = n()) %>%
group_by(year) %>%
# mutate() keeps one row per (year, worfreq) and adds the within-year share
mutate(
percentage = 100 * count / sum(count),
sum_count = sum(count)
) %>%
ungroup() %>%
mutate(
text_label = str_c(count, " out of ", sum_count)
) %>%
plot_ly(
y = ~percentage,
x = ~year,
color = ~worfreq,
type = "bar",
colors = "viridis",
text = ~text_label
) %>%
layout(
title = "Frequency of anxiety",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
barmode = 'stack',
legend = list(orientation = 'h')
)
```
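In the stacked bars above, each year's segments are within-year shares, so every bar adds up to 100%. A compact way to see this is `prop.table()` on a year-by-category count matrix. The counts in this sketch are made up, and only three of the five frequency categories are shown.

```r
# Made-up counts: rows are years, columns are response categories
counts <- matrix(
  c(10, 30, 60,
    20, 30, 50),
  nrow = 2, byrow = TRUE,
  dimnames = list(year = c("2019", "2020"),
                  worfreq = c("Daily", "Weekly", "Never"))
)

# margin = 1 normalises within each row, i.e. within each year
shares <- 100 * prop.table(counts, margin = 1)

rowSums(shares)
```

Because each year is normalised separately, a change in one category's share is always offset by changes in the others, which is worth remembering when reading year-to-year shifts.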
#### Level of Worried, Nervous, or Anxious Feelings
From this bar plot of the level of worried, nervous, or anxious feelings the last time respondents had them, we observe a relatively large increase from 2018 to 2019 in the percentage of people who felt worried, nervous, or anxious "a lot" or "somewhere between a little and a lot". However, the distribution did not change much from 2019 to 2020, which suggests that the impact of COVID-19 on the level of anxiety may be limited.
```{r, message=FALSE}
anx_dep %>%
drop_na(worfeelevl) %>%
group_by(year, worfeelevl) %>%
summarize(count = n()) %>%
group_by(year) %>%
# mutate() keeps one row per (year, worfeelevl) and adds the within-year share
mutate(
percentage = 100 * count / sum(count),
sum_count = sum(count)
) %>%
ungroup() %>%
mutate(
text_label = str_c(count, " out of ", sum_count)
) %>%
plot_ly(
y = ~percentage,
x = ~year,
color = ~worfeelevl,
type = "bar",
colors = "viridis",
text = ~text_label
) %>%
layout(
title = "Level of anxiety",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
barmode = 'stack',
legend = list(orientation = 'h')
)
```
#### Summary
- Contrary to our expectation, the plots do not show a clear association between COVID-19 and anxiety.
- There is no major change in the trend of anxiety from 2019 to 2020.
- The increase in anxiety actually occurred prior to the COVID-19 period.
- Other factors such as biological sex and household income seem to have a greater impact on anxiety.
### Depression
#### Percentage of People Who Reported Taking Medication for Depression
According to the plot, the proportion of people who reported taking medication for depression increased from 8.75% in 2015 to 11.42% in 2020, followed by a slight decrease from 2020 to 2021. COVID-19 appears to have had a limited impact on the depression percentage.
```{r, message=FALSE}
anx_dep %>%
drop_na(deprx) %>%
group_by(year, deprx) %>%
summarize(dep_num = n()) %>%
pivot_wider(
names_from = deprx,
values_from = dep_num
) %>%
mutate(
dep_percentage = yes/(no + yes)*100,
text_label = str_c(yes, " out of ", no + yes)
) %>%
ungroup() %>%
plot_ly(
y = ~dep_percentage,
x = ~year,
color = ~year,
type = "bar",
colors = "viridis",
text = ~text_label
) %>%
layout(
title = "Percentage of people who reported taking medication for depression",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
showlegend = FALSE
) %>%
hide_colorbar()
```
##### Stratify by Biological Sex
Stratifying the percentage of people who reported taking medication for depression by biological sex, we observe a much higher percentage among females than among males. Among females, the percentage increased quickly from 12.68% in 2017 to 15.14% in 2020 and then decreased to 14.52% in 2021. Among males, in contrast, the percentage decreased slightly from 2018 to 2019 and then increased from 2020 to 2021. The effect of COVID-19 is not evident for either sex in this plot.
```{r, message=FALSE}
anx_dep %>%
drop_na(sex, deprx) %>%
group_by(sex, year, deprx) %>%
summarize(dep_num = n()) %>%
pivot_wider(
names_from = deprx,
values_from = dep_num
) %>%
mutate(
dep_percentage = yes/(no + yes)*100,
text_label = str_c(yes, " out of ", no + yes)
) %>%
ungroup() %>%
plot_ly(
y = ~dep_percentage,
x = ~year,
color = ~sex,
type = "bar",
colors = "viridis",
text = ~text_label
) %>%
add_trace(
x = ~year,
y = ~dep_percentage,
color = ~sex,
type='scatter',
mode='lines+markers'
) %>%
layout(
title = "Percentage of people who reported taking medication for depression, by biological sex",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
legend = list(orientation = 'h')
)
```
##### Stratify by Ratio of Family Income to Poverty Threshold
Stratifying the percentage of people who reported taking medication for depression by the ratio of household income to the poverty line, we can clearly see that the lower the household income, the higher the percentage. The percentage in the lowest-income stratum decreased from 17.41% in 2017 to 16.53% in 2018, the opposite of what happened in the other two strata. The percentage is quite stable from 2018 to 2019 in all three strata. In the lowest-income stratum, there is a rapid increase from 17.02% in 2019 to 18.66% in 2020, which may indicate that people in this stratum were affected by COVID-19 related depression. For the other two strata, the effect of COVID-19 is not evident.
```{r, message=FALSE}
anx_dep %>%
drop_na(poverty, deprx) %>%
group_by(poverty, year, deprx) %>%
summarize(dep_num = n()) %>%
pivot_wider(
names_from = deprx,
values_from = dep_num
) %>%
mutate(
dep_percentage = yes/(no + yes)*100,
text_label = str_c(yes, " out of ", no + yes)
) %>%
ungroup() %>%
plot_ly(
y = ~dep_percentage,
x = ~year,
color = ~poverty,
type = "scatter",
mode = "lines+markers",
colors = "viridis",
text = ~text_label
) %>%
layout(
title = "Percentage of people who reported taking medication for depression, by household income",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
legend = list(orientation = 'h')
)
```
##### Stratify by Current Marital Status
Stratifying the percentage of people who reported taking medication for depression by current marital status, we observe a rapid decrease from 17.26% in 2016 to 13.12% in 2019 among those who are separated; this downward trend slows from 2018 to 2019 and reverses from 2019 to 2020. The reversal may be associated with COVID-19. The trends are similar for the married and never married, and for the divorced and widowed. The effect of COVID-19 is not evident for these other strata.
```{r, message=FALSE}
anx_dep %>%
drop_na(marst, deprx) %>%
group_by(marst, year, deprx) %>%
summarize(dep_num = n()) %>%
pivot_wider(
names_from = deprx,
values_from = dep_num
) %>%
mutate(
dep_percentage = yes/(no + yes)*100,
text_label = str_c(yes, " out of ", no + yes)
) %>%
ungroup() %>%
plot_ly(
y = ~dep_percentage,
x = ~year,
color = ~marst,
type = "scatter",
mode='lines+markers',
colors = "viridis",
text = ~text_label
) %>%
layout(
title = "Percentage of people who reported taking medication for depression, by marital status",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
legend = list(orientation = 'h')
)
```
#### Age Distribution
As we can see from the graph, people in their 60s tend to have a higher incidence of depression. However, the age distribution of people taking medication for depression did not change much from 2015 to 2021. The effect of COVID-19 is not evident in this plot.
```{r, message=FALSE}
depression_age_plot =
anx_dep %>%
drop_na(age, deprx) %>%
ggplot(
aes(x=age, group=deprx, fill=deprx)
) +
geom_density(alpha=0.4) +
facet_wrap(~year) +
labs(
title = "Age distribution by whether respondents reported taking medication for depression",
fill = "Reported taking medication for depression"
)
ggplotly(depression_age_plot) %>%
layout(legend = list(orientation = "h"))
```
#### Frequency of Depression
From this bar plot about how often people feel depressed, we can observe that the frequency is quite stable and there is no clear evidence of the effect of COVID-19 on the frequency of depression.
```{r, message=FALSE}
anx_dep %>%
drop_na(depfreq) %>%
group_by(year, depfreq) %>%
summarize(count = n()) %>%
group_by(year) %>%
# mutate() keeps one row per (year, depfreq) and adds the within-year share
mutate(
percentage = 100 * count / sum(count),
sum_count = sum(count)
) %>%
ungroup() %>%
mutate(
text_label = str_c(count, " out of ", sum_count)
) %>%
plot_ly(
y = ~percentage,
x = ~year,
color = ~depfreq,
type = "bar",
colors = "viridis",
text = ~text_label
) %>%
layout(
title = "Frequency of depression",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
barmode = 'stack',
legend = list(orientation = 'h')
)
```
#### Level of Depression
This bar plot shows the level of depression respondents felt the last time they were depressed. The percentage of people who felt "a lot" or "between a little and a lot" of depression is stable over the period, apart from a decrease in the percentage who felt "a lot" from 2018 to 2019. There is also no clear evidence of an effect of COVID-19 on the level of depression.
```{r, message=FALSE}
anx_dep %>%
drop_na(depfeelevl) %>%
group_by(year, depfeelevl) %>%
summarize(count = n()) %>%
group_by(year) %>%
# mutate() keeps one row per (year, depfeelevl) while adding the per-year totals
mutate(
percentage = 100 * count / sum(count),
sum_count = sum(count)
) %>%
ungroup() %>%
mutate(
text_label = str_c(count, " out of ", sum_count)
) %>%
plot_ly(
y = ~percentage,
x = ~year,
color = ~depfeelevl,
type = "bar",
colors = "viridis",
text = ~text_label
) %>%
layout(
title = "Level of depression",
xaxis = list (title = ""),
yaxis = list (title = "Percentage"),
barmode = 'stack',
legend = list(orientation = 'h')
)
```
#### Summary
- Contrary to our expectation, the plots do not show a clear association between COVID-19 and depression.
- From 2019 to 2020, there is no major change in the trend of depression.
- Other factors such as biological sex and household income seem to have a greater impact on depression.
### Suicide
#### National Trend of Suicide Rate, 2000-2020 (per 100K, per year)
The US national suicide rate increases from 2000 to 2018, then declines from 2018 to 2020. The average suicide rate from 2000 to 2020 is 14.2 per 100,000 (shown as the red dashed line).
```{r, message=FALSE}
suicide_plot = suicide_df %>%
group_by(year) %>%
summarize(
population = sum(population),
suicide = sum(suicide_no),
suicide_100k = (suicide / population) * 100000
) %>%
ggplot(aes(x = year, y = suicide_100k)) +
geom_line(col = "deepskyblue", size = 1) +
geom_point(col = "deepskyblue", size = 2) +
geom_hline(
yintercept = average_20years, linetype = 2, color = "red", size = 1) +
scale_x_continuous(breaks = seq(2000, 2020, 5)) +
scale_y_continuous(breaks = seq(8, 18, 1)) +
labs(title = "US National Suicide Rate (per 100K), 2000-2020",
x = "Year",
y = "Suicides per 100k")
ggplotly(suicide_plot)
```
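The chunk above uses `average_20years`, which is defined outside this section. As a hedged illustration only, the sketch below shows two common ways such a reference value could be computed from data shaped like `suicide_df`: the unweighted mean of the yearly rates, and the pooled rate over the whole period. The data frame `toy_suicide_df` and all of its numbers are made up for the example; which definition the report actually uses is not shown here.

```r
library(dplyr)

# Toy data with made-up numbers, for illustration only (not the report's data)
toy_suicide_df <- data.frame(
  year       = c(2000, 2000, 2001, 2001),
  population = c(1000000, 3000000, 1000000, 4000000),
  suicide_no = c(100, 360, 120, 480)
)

# Yearly rate per 100k, pooled over all rows within a year
yearly_rates <- toy_suicide_df %>%
  group_by(year) %>%
  summarize(
    population   = sum(population),
    suicide      = sum(suicide_no),
    suicide_100k = (suicide / population) * 100000
  )

# Option A: unweighted mean of the yearly rates
mean_of_yearly_rates <- mean(yearly_rates$suicide_100k)

# Option B: pooled rate (total suicides / total person-years)
pooled_rate <-
  sum(toy_suicide_df$suicide_no) / sum(toy_suicide_df$population) * 100000
```

The two options coincide only when the population is the same every year; otherwise the pooled rate gives more weight to years with larger populations.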
#### Cumulative Suicide Rate for Each State, 2014-2020
From 2014 to 2020, Wyoming has the highest cumulative suicide rate and New Jersey has the lowest.
The five states with the highest cumulative suicide rates are Wyoming, Alaska, Montana, New Mexico, and Idaho; the five with the lowest are New Jersey, New York, Massachusetts, Maryland, and Connecticut.
```{r, message=FALSE}
suicide_state_df %>%
group_by(state) %>%
summarize(
population = sum(population),
suicide = sum(suicide_no),
suicide_100k = (suicide / population) * 100000
) %>%
mutate(
state = fct_reorder(state, suicide_100k)) %>%
ggplot(aes(x = suicide_100k, y = state, fill = state )) +
geom_bar(stat = "identity") +
scale_x_continuous(breaks = seq(0, 30, 2)) +
labs(
title = "Cumulative Suicide Rate, by State, 2014-2020",
x = "Suicides per 100k",
y = "State") +
theme(legend.position = "right")
```
#### Suicide Rate for Each State over Years, 2014-2020
Between 2014 and 2020, the state with the largest change in suicide rate is Wyoming, from 20.6 to 30.5; the state with the smallest change in suicide rate is New York, from 7.8 to 8.3.
```{r, message=FALSE}
suicide_state_df %>%
ggplot(aes(x = suicide_100k, y = state, color = year)) +
geom_line(size = 1) +
geom_point(size = 2) +
labs(
title = "Suicide Rate for Each State Over Years, 2014-2020",
x = "Suicides per 100K",
y = "State") +
theme(legend.position = "right")
```
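The change per state quoted above can also be computed directly rather than read off the plot. The sketch below is a minimal example of that calculation, assuming a data frame with the same `state`, `year`, and `suicide_100k` columns as `suicide_state_df`. It contains only the four values quoted in the text; the names `rates_2014_2020` and `rate_change` are hypothetical.

```r
library(dplyr)

# Only the four values quoted in the text; all other states are omitted
rates_2014_2020 <- data.frame(
  state        = c("Wyoming", "Wyoming", "New York", "New York"),
  year         = c(2014, 2020, 2014, 2020),
  suicide_100k = c(20.6, 30.5, 7.8, 8.3)
)

# Difference between the 2020 and 2014 rates, largest absolute change first
rate_change <- rates_2014_2020 %>%
  group_by(state) %>%
  summarize(
    rate_2014 = suicide_100k[year == 2014],
    rate_2020 = suicide_100k[year == 2020],
    change    = rate_2020 - rate_2014
  ) %>%
  arrange(desc(abs(change)))
```

Applied to the full state table, the first and last rows of `rate_change` would give the states with the largest and smallest change.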
#### Suicide Rate by Sex
Nationally, the overall suicide rate for males is about 4 times that of females. For females, the suicide rate peaks in 2018 and declines afterwards; for males, it peaks in 2017 and declines afterwards. From 2000 to 2020, the male rate remains consistently higher than the female rate, with the ratio staying at about 4:1.
```{r, message=FALSE}
total_sex_plot = suicide_df %>%
group_by(sex) %>%
summarize(
population = sum(population),
suicide = sum(suicide_no),
suicide_100k = (suicide / population) * 100000
) %>%
ggplot(aes(x = sex, y = suicide_100k, fill = sex )) +
geom_bar(stat = "identity") +
scale_y_continuous(breaks = seq(0, 24, 4)) +
labs(
title = "National Suicide Rate, by Sex",
x = "Sex",
y = "Suicides per 100k")
year_sex_plot = suicide_df %>%
group_by(year, sex) %>%
summarize(
population = sum(population),
suicide = sum(suicide_no),
suicide_100k = (suicide / population) * 100000
) %>%
ggplot(aes(x = year, y = suicide_100k, color = sex)) +
geom_line(size = 1) +
geom_point(size = 2) +
scale_y_continuous(breaks = seq(0, 30, 5)) +
labs(
title = "Suicide Trend Over Years, by Sex",
x = "Year",
y = "Suicides per 100k"
)
grid.arrange(total_sex_plot, year_sex_plot, ncol = 2 )
```
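The male-to-female ratio described above can be checked year by year with a short calculation. The sketch below assumes columns named as in `suicide_df`. The counts are made up for illustration, and the level labels `"Male"` and `"Female"` are assumptions about how `sex` is coded.

```r
library(dplyr)

# Made-up counts for illustration only (not the report's data)
toy_sex_df <- data.frame(
  year       = c(2019, 2019, 2020, 2020),
  sex        = c("Male", "Female", "Male", "Female"),
  population = c(1000000, 1000000, 1000000, 1000000),
  suicide_no = c(220, 55, 200, 50)
)

sex_ratio <- toy_sex_df %>%
  # Rate per 100k for each year and sex
  group_by(year, sex) %>%
  summarize(
    suicide_100k = sum(suicide_no) / sum(population) * 100000,
    .groups = "drop"
  ) %>%
  # Ratio of the male rate to the female rate within each year
  group_by(year) %>%
  summarize(
    male_to_female = suicide_100k[sex == "Male"] / suicide_100k[sex == "Female"]
  )
```

A ratio that stays near 4 across all years would support the statement that the gap between the sexes is roughly constant.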
#### Suicide Rate by Age
Nationally, the 45-64 age group has the highest suicide rate, followed by the 75+ group, and the 10-14 group has the lowest. From 2000 to 2020, the suicide rate of the 10-14 group remains low and roughly constant, while the rates in all other age groups show an overall upward trend. Among them, the 25-44 group shows the largest change, from roughly 10 to 18 per 100k. The rates for the 45-64 and 65-74 groups have been dropping since 2018.
```{r, message=FALSE}
total_age_plot = suicide_df %>%
group_by(age) %>%
summarize(
population = sum(population),
suicide = sum(suicide_no),
suicide_100k = (suicide / population) * 100000
) %>%
ggplot(aes(x = age, y = suicide_100k, fill = age )) +
geom_bar(stat = "identity") +
scale_y_continuous(breaks = seq(0, 20, 2)) +
labs(
title = "National Suicide Rate, by Age",
x = "Age",
y = "Suicides per 100k")
year_age_plot = suicide_df %>%