-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathmapping.qmd
1592 lines (1290 loc) · 59.4 KB
/
mapping.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
output:
html_document:
includes:
in_header: analytics.html
css: styles.css
code_folding: show
toc: TRUE
toc_float: TRUE
pandoc_args:
"--tab-stop=2"
editor_options:
markdown:
wrap: 72
---
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato" />
::: {#header}
<img src="mapping/www/images/urban-institute-logo.png" width="350"/>
:::
```{r markdown-setup, include=FALSE}
knitr::opts_chunk$set(fig.path = "mapping/www/images/")
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = FALSE)
options(scipen = 999)
```
```{r setup, include=FALSE}
library(tidyverse)
library(knitr)
library(kableExtra)
library(here)
library(sf)
```
# Introduction
This guide will teach you the concepts and code you will need for
mapping and geospatial analysis in R. **This is a long guide, so if you
need something specific, we encourage you to scroll to the appropriate
section using the Table of Contents on the left.** If you just want copy
and pasteable code to create different kinds of maps, head to the
[`Map Gallery`](#map_gallery).
Now let's start mapping!
![](mapping/www/images/yay_maps.gif)
## Geospatial Workflow
This picture below outlines what we think are the main steps in a
geospatial workflow. This guide will be split into sections describing
each of the steps.
![](mapping/www/images/geospatial_workflow.png)
## Should this be a map?
The [Urban Institute Data Visualization Style
Guide](http://urbaninstitute.github.io/graphics-styleguide/) offers some
blunt but useful suggestions for maps:
> Just because you've got geographic data, doesn't mean that you have to
> make a map. Many times, there are more efficient storyforms that will
> get your point across more clearly. If your data shows a very clear
> geographic trend or if the absolute location of a place or event
> matters, maps might be the best approach, but sometimes the reflexive
> impulse to map the data can make you forget that showing the data in
> another form might answer other---and sometimes more
> important---questions.
So we would encourage you to think critically before making a map.
## Why map with R?
R can have a steeper learning curve than point-and-click tools - like
QGIS or ArcGIS - for geospatial analysis and mapping. But creating maps
in R has many advantages including:
1) **Reproducibility**: By creating maps with R code, you can easily
share the outputs and the code that generated the output with
collaborators, allowing them to replicate your work and catch errors
easily.
2) **Iteration**: With point and click software like ArcGIS, making 50
maps would be 50 times the work/time. But using R, we can easily
make make many iterations of the same map with a few changes to the
code.
3) **Easy Updates**: Writing code provides a roadmap for others (and
future you!) to quickly update parts of the map as needed. Say for
example a collaborator wanted to change the legend colors of 50
state maps. With R, this is possible in just a few seconds!
4) **An Expansive ecosystem**: There are several R packages that make
it very easy to get spatial data, create static and interactive
maps, and perform spatial analyses. This feature rich package
ecosystem which all play nice together is frankly unmatched by other
programming languages and even point and click tools like QGIS and
ArcGIS. Some of these R packages include:
- `sf`: For managing and analyzing spatial dataframes
- `tigris`: For downloading in Census geographies
- `ggplot2`: For making publication ready static maps
- `urbnmapr`: For automatically adding Urban styling to static
maps
- `mapview`: For making expxploratory interactive maps
5) **Cost**: Most point-and-click tools for geospatial analysis are
proprietary and expensive. R is free open-source software. The
software and most of its packages can be used for free by anyone for
almost any use case.
## Helpful Learning Resources
In addition to this guide, you may want to look at these other helpful
resources:
- The Urban Institute [mapping training
series](https://ui-research.github.io/urbn101-mapping/) (with video
lectures and notes)
- Chapters
[5](https://walker-data.com/census-r/census-geographic-data-and-applications-in-r.html),
[6](https://walker-data.com/census-r/mapping-census-data-with-r.html),
and
[7](https://walker-data.com/census-r/spatial-analysis-with-us-census-data.html)
from Kyle Walker's Analyzing US Census Data
[book](https://walker-data.com/census-r/index.html).
- Andrew Heiss' fantastic mapping
[guide](https://datavizm20.classes.andrewheiss.com/example/12-example/)
- All of the vignettes for the [`sf`
package](https://cran.r-project.org/web/packages/sf/sf.pdf)
- [Geocomputation with
R](https://geocompr.robinlovelace.net/index.html): A book by Robin
Lovelace and others
- UChicago's R Spatial Workshops:
<https://spatialanalysis.github.io/tutorials/>
# Get Spatial Data {#get_spatial_data}
------------------------------------------------------------------------
## library(sf) {.tabset .tabset-pills}
### The short version
`library(sf)` stores geospatial data, which are
<font color="red">**points**</font> (a single longitude/latitude),
<font color="blue">**lines**</font> (a pair of connected points), or
<font color="black">**polygons**</font> (a collection of points which
make a polygon) in a `geometry` column within R dataframes
![](mapping/www/images/amtrak_points_lines_polygons.jpg)
This is what `sf` dataframe looks like in the console:
```{r print-sf-dataframe}
dc_parks <- st_read("mapping/data/dc_parks.geojson",
quiet = TRUE)
# Print just the NAME and geometry column
dc_parks %>%
select(NAME) %>%
head(2)
```
### The long version
The `sf` library is a key tool for reading in, managing, and working
with spatial data in R. `sf` stands for simple features (not San
Francisco you Bay Area folks) and denotes a way to describe the spatial
attributes of real life objects. The R object you will be working with
most frequently for mapping is an `sf` dataframe. An `sf` dataframe is
essentially a regular R dataframe, with a couple of extra features for
use in mapping. These extra features exclusive to `sf` dataframes
include:
- sticky `geometry` columns
- attached coordinate reference systems
- some other spatial metadata
The most important of the above list is the sticky `geometry` column,
which is a magical column that contains all of the geographic
information for each row of data. Say for example you had a `sf`
dataframe of all DC census tracts. Then the `geometry` column would
contain all of the geographic points used to define DC census tract
polygons. The stickiness of this column means that no matter what data
munging/filtering you do, you will not be able to drop or delete the
`geometry` column. Below is a graphic to help you understand this:
![](mapping/www/images/sf_sticky_geometry.png)
credits: @allisonhorst
This is what an `sf` dataframe looks like in the console:
```{r print_sf}
# Read in spatial data about DC parks from DC Open Data Portal
dc_parks <- st_read("https://opendata.arcgis.com/api/v3/datasets/287eaa2ecbff4d699762bbc6795ffdca_9/downloads/data?format=geojson&spatialRefId=4326",
quiet = TRUE)
# dc_parks <- st_read("mapping/data/dc_parks.geojson")
# Select just a few columns for readability
dc_parks <- dc_parks %>%
select(NAME, geometry)
# Print to the console
dc_parks
```
Note that there is some spatial metadata such as the `Geometry Type`,
`Bounding Box`, and `CRS` which shows up as a header before the actual
contents of the dataframe.
Since `sf` dataframes operate similarly to regular dataframes, we can
use all our familiar `tidyverse` functions for data wrangling, including
`select`, `filter`, `rename`, `mutate`, `group_by` and `summarize`. The
`sf` package also has many functions that provide easy ways to replicate
common tasks done in other GIS software like spatial joins, clipping,
and buffering. Almost all of the mapping and geospatial analysis methods
described in this guide rely on you having an `sf` dataframe. So let's
talk about how to get one!
## Importing spatial data {.tabset .tabset-pills}
Getting an `sf` dataframe is always the first step in the geospatial
workflow. Here's how to import spatial data for...
### States and counties
We highly recommend using the `library(urbnmapr)` package, which was
created by folks here at Urban to easily create state and county level
maps. The `get_urbn_map()` function in the package allows you to read in
spatial data on states and counties, with options to include
territories. Importantly, it will also display AL and HI as insets on
the map in accordance with the Urban Institute Data Visualization Style
Guide. For information on how to install `urbnmapr`, see the [GitHub
repository](https://github.com/UrbanInstitute/urbnmapr).
Below is an example of how you would use `urbnmapr` to get an `sf`
dataframe of all the states or counties in the US.
```{r urbnmapr-1, eval=FALSE}
library(urbnmapr)
# Get state data
states <- get_urbn_map("states", sf = TRUE)
# Can also get county data
counties <- get_urbn_map("counties", sf = TRUE)
```
### Other Census geographies
Use the `library(tigris)` package, which allows you to easily download
TIGER and other cartographic boundaries from the US Census Bureau. In
order to automatically load in the boundaries as `sf` objects, run
`r options(tigris_class = "sf")` once per R session.
`library(tigris)` has all the standard census geographies, including
census tracts, counties, CBSAs, ZCTAs, congressional districts, tribal
areas, and more. It also includes other elements such as water, roads,
and military bases.
By default, `libraray(tigris)` will download large very large and
detailed TIGER line boundary files. For thematic mapping, the smaller
cartographic boundary files are a better choice, as they are clipped to
the shoreline, generalized, and therefore usually smaller in size
without losing too much accuracy. To load cartographic boundaries, use
the `cb = TRUE` argument. If you are doing detailed geospatial analysis
and need the most detailed shapefiles, then you should use the detailed
TIGER line boundary files and set `cb = FALSE`.
Below is an example of how you would use `library(tigris)` to get a `sf`
dataframe of all Census tracts in DC for 2019.
```{r tigris-1, eval=FALSE}
library(tigris)
# Only need to set once per script
options(tigris_class = "sf")
dc_tracts <- tracts(
state = "DC",
cb = TRUE,
year = 2019
)
```
Unlike `library(urbnmapr)`, different functions are used to get
geographic data for different geographic levels. For instance, the
`blocks()` function will load census block group data, and the
`tracts()` function will load tract data. Other functions include
`block_groups()`, `zctas()` , and `core_based_statistical_areas()`. For
the full list of supported geographies and functions, see the [package
vignette](https://cran.r-project.org/web/packages/tigris/tigris.pdf).
For folks interested in pulling in Census demographic information along
with Census geographies, we recommend checking out the sister package to
`library(tigris)`: `library(tidycensus)`. That package allows you to
download in Census variables and Census geographic data simultaneously.
### Countries
We recommend using the `library(rnaturalearth)` package, which is
similar to `library(tigris)` but allows you to download and use
boundaries beyond the US. Instead of setting class to `sf` one time per
session as we did with `library(tigris)`, you must set the
`returnclass = "sf"` argument each time you use a function from the
package. Below is an example of downloading in an `sf` dataframe of all
the countries in the world.
```{r natural-earth, eval = FALSE}
library(rnaturalearth)
world <- ne_countries(returnclass = "sf")
ggplot() +
geom_sf(data = world, mapping = aes())
```
### Your own files
#### Shapefiles/GeoJSONS
Shapefiles and GeoJSONs are 2 common spatial file formats you will found
out in the wild. `library(sf)` has a function called `st_read` which
allows you to easily read in these files as `sf` dataframes. The only
required argument is `dsn` or data source name. This is the filepath of
the `.shp` file or the `.geojson` file on your local computer. For
geojsons, `dsn` can also be a URL.
Below is an example of reading in a shapefile of fire stations in DC
which is stored in `mapping/data/shapefiles/`. Note that shapefiles are
actually stored as 6+ different files inside a folder. You need to
provide the filepath to the file ending in `.shp`.
```{r list f-ei}
library(sf)
# Print out all files in the directory
list.files("mapping/data/shapefiles")
# Read in .shp file
dc_firestations <- st_read(
dsn = "mapping/data/shapefiles/Fire_Stations.shp",
quiet = TRUE
)
```
And now `dc_firestations` is an `sf` dataframe you can use for all your
mapping needs! `st_read` supports reading in a wide variety of other
spatial file formats, including geodatabases, KML files, and over 200
others. For an incomplete list, please see the this `sf`
[vignette](https://r-spatial.github.io/sf/articles/sf2.html).
#### CSVs or dataframes with lat/lons
If you have a CSV with geographic information stored in columns, you
will need to read in the CSV as a regular R dataframe and then convert
to an `sf` dataframe. `library(sf)` contains the `st_as_sf()` function
for converting regular R dataframes into an `sf` dataframe. The two
arguments you must specify for this function are:
- `coords`: A length 2 vector with the names of the columns
corresponding to longitude and latitude (in that order!). For
example, `c("lon", "lat")`.
- `crs`: The CRS (coordinate references system) for your
longitude/latitude coordinates. Remember you need to specify both
the\
authority and the SRID code, for example ("EPSG:4326"). For more
information on finding and setting CRS codes, please see the
[`CRS`](#crs) section.
Below is an example of reading in data from a CSV and converting it to
an `sf` dataframe.
```{r make-sf}
library(sf)
# Read in dataset of state capitals which is stored as a csv
state_capitals <- read_csv("mapping/data/state-capitals.csv")
state_capitals <- state_capitals %>%
# Specify names of the lon/lat columns in the CSV to use to make geometry col
st_as_sf(
coords = c("longitude", "latitude"),
crs = 4326
)
```
One common mistake is that before converting to an `sf` dataframe, you
must drop any rows that have `NA` values for latitude or longitude. If
your data contains `NA` values, then the `st_as_sf()` function will
throw an error.
## Appending spatial info to your data
Oftentimes, the data you are working with will just have state or county
identifiers - like FIPS codes or state abbreviations - but will not
contain any geographic information. In this case, you must do the extra
work of downloading in the geographic data as an `sf` dataframe and then
joining your non-spatial data to the spatial data. Generally this
involves 3 steps:
1) Reading in your own data as a data frame
2) Reading in the geographic data as an `sf` dataframe
3) Using `left_join` to merge the geographic data with your own non
spatial data and create a new expanded `sf` dataframe
Let's say we had a dataframe on CHIP enrollment by state with state
abbreviations.
```{r readin-chip-data}
# read the state CHIP data
chip_by_state <- read_csv("mapping/data/chip-enrollment.csv") %>%
# clean column names so there are no random spaces/uppercase letters
janitor::clean_names()
# print to the console
chip_by_state %>% head()
```
In order to convert this to an `sf` dataframe, we need to read in the
spatial boundaries for each state and append it to our dataframe. Here
is how we do that with `get_urbn_map()` and `left_join()` .
```{r append-spatial-info, cache = FALSE}
library(urbnmapr)
# read in state geographic data from urbnmapr
states <- get_urbn_map(map = "states", sf = TRUE)
# left join state geographies to chip data
chip_with_geographies <- states %>%
left_join(
chip_by_state,
# Specify join column, which are slightly differently named in states and chip
# respectively
by = c("state_abbv" = "state_abbreviation")
)
chip_with_geographies %>%
select(state_fips, state_abbv, chip_enrollment)
```
```{r append-state-pops, include = FALSE, eval = TRUE, echo = FALSE}
# TODO: DELETE THIS
# Read in data on state populations from 2010
state_pops <-
read_csv("https://raw.githubusercontent.com/jakevdp/data-USstates/master/state-population.csv",
# Set this to disable printing column info to console
col_types = cols()
) %>%
filter(ages == "total", year == "2010") %>%
select(state_abbv = `state/region`, population)
chip_with_geographies <- chip_with_geographies %>%
# Specify left_join from tidylog to print summary messages
tidylog::left_join(state_pops, by = "state_abbv") %>%
# Calculate the chip enrollment percentage and append as a column
mutate(chip_pct = chip_enrollment / population)
```
# Project
## Coordinate Reference Systems {#crs .tabset .tabset-pills}
### The short version
Just watch [this video](https://www.youtube.com/watch?v=vVX-PrBRtTY%60)
and know the following:
- All spatial data has a CRS, which specifies how to identify a
location on earth.
- It's important that all spatial datasets you are working with be in
the same CRS. You can find the CRS with `st_crs()` and change the
CRS with `st_transform()`.
- The Urban Institute Style Guide requires the use of the Atlas Equal
Earth Projection (`"ESRI:102003"`) for national maps. For state and
local maps, use [this](https://github.com/veltman/d3-stateplane)
handy guide to find an appropriate State Plane projection.
### The long version
Coordinate reference systems (CRS) specify the 3d shape of the earth and
optionally how we project that 3d shape onto a 2d surface. They are an
important part of working with spatial data as you need to ensure that
all the data you are working with are in the same CRS in order for
spatial operations and maps to be accurate.
CRS can be specified either by name (ie Maryland State Plane) or
**S**patial **R**eference System **ID**entifier (SRID). THe SRID is a
numeric identifier that uniquely identifies a coordinate reference
system. Generally when referring to an SRID, you need to refer to an
authority (ie the data source) and a unique ID. An example is
`EPSG:26985` which refers to the Maryland State plane projection from
the EPSG, or `ESRI:102003` which refers to the Atlas Equal Area
projection from ESRI. Most CRS codes will be from the EPSG, and some
from ESRI and others. A good resource for finding/validating CRS codes
is [epsg.io](epsg.io).
Sidenote - EPSG stands for the now defunct European Petroleum Survey
Group. And while oil companies have generally been terrible for the
earth, the one nice thing they did for the earth was to set up common
standards for coordinate reference systems.
You might be thinking well isn't the earth just a sphere? Why do we need
all this complicated stuff? And the answer is well the earth is [kind
of](https://oceanservice.noaa.gov/facts/earth-round.html) a sphere, but
it's really more of a misshapen ellipsoid which is pudgier at the
equator than at the poles. To visualize how coordinate reference systems
work, imagine that the earth is a (lumpy) orange. Now peel the skin off
an orange and try to flatten it. There are many ways to do it, but all
will create
[distortions](https://twitter.com/neilrkaye/status/1050740679008296967)
of some kind. The CRS will give us the formula we've used to specify the
shape of the orange (usually a sphere or ellipsoid of some kind) and
optionally, specify how we flattened the orange into 2d.
Broadly, there are two kinds of Coordinate Reference Systems:
1) [**Geographic coordinate
systems**](https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.spatial.doc/ids_spat_407.html)
- (sometimes called unprojected coordinate systems)
- Specifies a 3d shape for the earth
- Uses a spheroid/ellipsoid to approximate shape of the earth
- Usually use decimal degree units (ie latitude/longitude) to
identify locations on earth
![](mapping/www/images/gcs_image.png)
1) [**Projected coordinate
systems**](https://mgimond.github.io/Spatial/chp09-0.html#projected-coordinate-systems)
- Specifies a 3d shape for the earth + a 2d mapping
- Is a geographic coordinate system + a *projection*
![](mapping/www/images/projecting_xkcd.png)
credit: [xkcd](https://imgs.xkcd.com/comics/projecting.png)
- **projection**: mathematical formula used to convert a 3d
coordinate system to a 2d flat coordinate system
- Many different kinds of projections, including Equal Area,
Equidistant, Conformal, etc
- All projections distort the true shape of the earth in some
way, either in terms of shape, area, or angle. Required
[xkcd comic](https://xkcd.com/977/)
- Usually use linear units (ie feet, meters) and therefore
useful for distance based spatial operations (ie creating
buffers)
## Finding the CRS
If you are lucky, your data will have embedded CRS data that will be
automatically detected when the file is read in. This is usually the
case for GeoJSONS (`.geojson`) and shapefiles (`.shp`). When you use
`st_read()` on these files, you should see the CRS displayed in the
metadata:
![](mapping/www/images/sf_crs_pic.png)
You can also the `st_crs()` function to find the CRS. The CRS code is
located at the end in `ID[authority, SRID]`.
```{r st_crs}
st_crs(dc_firestations)
```
Sometimes, the CRS will be blank or `NA` as the dataset did not specify
the CRS. In that case you **MUST find and set the CRS for your data
before proceeding** with analysis. Below are some good rules of thumb
for finding out what the CRS for your data is:
- For geojsons, the CRS should always be `EPSG:4326` (or WGS 84). The
official geojson specification states that this is the only valid
CRS for geojsons, but in the wild, this may not be true 100% of the
time.
- For shapefiles, there should be a file that ends in `.proj` in the
same directory as the `.shp` file. This file contains the projection
information for that file and should be used automatically when
reading in shapefiles.
- For CSV's with latitude/longitude columns, the CRS is usually
`EPSG:4326` (or WGS 84).
- Look at the metadata and any accompanying documentation to see if
the coordinate reference system for the data is specified
If none of the above rules of thumb apply to you, check out the
`crsuggest` R [package](https://github.com/walkerke/crsuggest).
Once you've identified the appropriate CRS, you can set the CRS for your
data with `st_crs()`:
```{r set_crs, eval = FALSE}
# If you are certain that your data contains coordinates in the ESRI Atlas Equal Earth projections
st_crs(some_sf_dataframe) <- st_crs("ESRI:102003")
```
## Transforming the CRS
Often you will need to change the CRS for your `sf` dataframe so that
all datasets you are using have the same CRS, or to use a projected CRS
for performing more accurate spatial operations. You can do this with
`st_transform`:
```{r transform-crs}
# Transforming CRS from WGS 84 to Urban required Equal Earth Projection
state_capitals <- state_capitals %>% st_transform("ESRI:102003")
```
`st_transform()` also allows you to just use the CRS of another `sf`
dataframe when transforming.
```{r transform-crs-with-another-sf-object}
# transform CRS of chip_with_geographies to be the same as CRS of dc_firestations
chip_with_geographies <- chip_with_geographies %>%
st_transform(crs = st_crs(state_capitals))
```
If you are working with local data, you should use an appropriate state
plane projection instead of the Atlas Equal Earth projection which is
meant for national maps. `library(crsuggest)` can simplify the process
of picking an appropriate state plane CRS.
```{r crsuggest-ex, cache = TRUE}
library(crsuggest)
suggest_crs(dc_firestations) %>%
# Use the value in the "crs_code" column to transform CRS's
head(4)
```
# Map
In order to start mapping, you need an `sf` dataframe. If you don't have
one, see the [`Get Spatial Data`](#get_spatial_data) section above.
## The basics
### library(ggplot2)
Most mapping in R fits the same theoretical framework as plotting in R
using `library(ggplot2)`. To learn more about ggplot2, visit the Data
Viz
[page](https://urbaninstitute.github.io/r-at-urban/graphics-guide.html#Grammar_of_Graphics_and_Conventions)
or read the official ggplot [book](html).
The key function for mapping is **the special `geom_sf()` function**
which works with `sf` dataframes. This function magically detects
whether you have point or polygon spatial data and displays the results
on a map.
### A simple map
To make a simple map, add `geom_sf()` to a `ggplot()` and set
`data = an_sf_dataframe`. Below is code for making a map of all 50
states using `library(urbnmapr)`:
```{r first-map, cache = TRUE}
library(urbnmapr)
states <- get_urbn_map("states", sf = TRUE)
ggplot() +
geom_sf(
data = states,
mapping = aes()
)
```
## Styling
### `library(urbnthemes)`
`library(urbnthemes)` automatically styles maps in accordance with the
[Urban Institute Data Visualization Style
Guide](http://urbaninstitute.github.io/graphics-styleguide/). By using
`library(urbnthemes)`, you can create publication ready maps you can
immediately drop in to Urban research briefs or blog posts.
To install `urbnthemes`, visit the package's [GitHub
repository](https://github.com/UrbanInstitute/urbnthemes) and follow the
instructions. There are 2 ways to use the `urbnthemes` functions:
```{r urbnthemes}
library(urbnthemes)
# You can either run this once per script to automatically style all maps with
# the Urban theme
set_urbn_defaults(style = "map")
# Or you can add `+ theme_urbn_map()` to the end of every map you make
ggplot() +
geom_sf(states, mapping = aes()) +
theme_urbn_map()
```
### Layering
You can layer multiple points/lines/polygons on top of each other using
the `+` operator from `library(ggplot2)`. The shapes will appear from
bottom to top (ie the last mapped object will show up on top). It is
important that all layers are in the same CRS (coordinate reference
system).
```{r layers, cache = TRUE}
state_capitals <- state_capitals %>%
# This will change CRS to ESRI:102003 and shift the AK and HI state capitals
# point locations to the appropriate locations on the inset maps.
tigris::shift_geometry() %>%
# For now filter out AL and HI as their state capitals will be slightly off.
filter(!state %in% c("Alaska", "Hawaii"))
ggplot() +
geom_sf(
data = states,
mapping = aes()
) +
# Note we change the data argument
geom_sf(
data = state_capitals,
mapping = aes(),
# urbnthemes library has urbn color palettes built in.
color = palette_urbn_main["yellow"],
size = 2.0
) +
theme_urbn_map()
```
### Fill and Outline Colors
The same commands used to change colors, opacity, lines, size, etc. in
charts can be used for maps too. To change the colors of the map , just
use the `fill =` and `color =` parameters in `geom_sf()`. `fill` will
change the fill color of polygons; `color` will change the color of
polygon outlines, lines, and points.
Generally, maps that show the magnitude of a variable use the blue
sequential ramp and maps that display positives and negatives use the
diverging color ramp.`library(urbnthemes)` contains inbuilt. helper
variables (like `palette_urbn_main`) for accessing color palettes from
the Urban Data Viz Style guide. If for example you want states to be
Urban's magenta color:
```{r urbnthemes- pink}
ggplot() +
geom_sf(states,
mapping = aes(),
# Adjust polygon fill color
fill = palette_urbn_main["magenta"],
# Adjust polygon outline color
color = "white"
) +
theme_urbn_map()
```
### Adding text
You can also add text, like state abbreviations, directly to your map
using `geom_sf_text` and the helper function `get_urbn_labels()`.
```{r geom_sf_text}
library(urbnmapr)
ggplot() +
geom_sf(states,
mapping = aes(),
color = "white"
) +
theme_urbn_map() +
# Generates dataframe of state abbv and appropriate location to plot them
geom_sf_text(
data = get_urbn_labels(
map = "states",
sf = TRUE
),
aes(label = state_abbv),
size = 3
)
```
There's also `geom_sf_label()` if you want labels with a border.
# Map Gallery {#map_gallery}
Below are copy and pasteable examples of maps you can make, after you
have an `sf` dataframe.
## Choropleth Maps
Choropleth maps display geographic areas with shades, colors, or
patterns in proportion to a variable or variables. Choropleth maps can
represent massive geographies like the entire world and small
geographies like Census Tracts. To make a choropleth map, you need to
set `geom_sf(aes(fill = some_variable_name))`. Below are examples
### Continuous color scale
```{r choropoleth_continious}
# Map of CHIP enrollment percentage by state
chip_with_geographies_map <- chip_with_geographies %>%
ggplot() +
geom_sf(aes(
# Color in states by the chip_pct variable
fill = chip_pct
))
# Below add-ons to the map are optional, but make the map look prettier.
chip_with_geographies_map +
# scale_fill_gradientn adds colors with more interpolation and reverses color scale
scale_fill_gradientn(
# Convert legend from decimal to percentages
labels = scales::percent_format(),
# Make legend title more readable
name = "CHIP Enrollment %",
# Manually add 0 to lower limit to include it in legend. NA=use maximum value in data
limits = c(0, NA),
# Set number of breaks on legend = 3
n.breaks = 3
)
```
### Discrete color scale
The quick and dirty way is with `scale_fill_steps()`, which creates
discretized bins for continuous variables:
```{r chorpleth_disccrete}
chip_with_geographies %>%
ggplot() +
geom_sf(aes(
# Color in states by the chip_pct variable
fill = chip_pct
)) +
scale_fill_steps(
# Convert legend from decimal to percentages
labels = scales::percent_format(),
# Make legend title more readable
name = "CHIP Enrollment %",
# Show top and bottom limits on legend
show.limits = TRUE,
# Roughly set number of bins. Won't be exact as R uses algorithms under the
# hood for pretty looking breaks.
n.breaks = 4
)
```
Often you will want to manually generate the bins yourself to give you
more fine grained control over the exact legend text. (ie `1% - 1.8%`,
`1.8 - 2.5%`, etc). Below is an example of discretizing the continuous
`chip_pct` variable yourself using `cut_interval()` and a helper
function to get nice looking interval labels:
```{r format_intervals}
# Helper function to clean up R generated intervals into nice looking interval labels
format_interval <- function(interval_text) {
text <- interval_text %>%
# Remove open and close brackets which is R generated math notation
str_remove_all("\\(") %>%
str_remove_all("\\)") %>%
str_remove_all("\\[") %>%
str_remove_all("\\]") %>%
str_replace_all(",", " — ")
# Convert decimal ranges to percent ranges
text <- text %>%
str_split(" — ") %>%
map(~ as.numeric(.x) %>%
scales::percent() %>%
paste0(collapse = " — ")) %>%
unlist() %>%
# By default character vectors are plotted in alphabetical order. We want
# factors in reverse alphabetical order to get correct colors in ggplot
fct_rev()
return(text)
}
chip_with_geographies <- chip_with_geographies %>%
# cut_interval into n groups with equal range. Set boundary so 0 is included in the bins
mutate(chip_pct_interval = cut_interval(chip_pct, n = 5)) %>%
# Generate nice looking interval labels
mutate(chip_pct_interval = format_interval(chip_pct_interval))
```
And now we can map the discretized `chip_pct_interval` variable using
`geom_sf()`:
```{r make_discrete_map}
chip_with_geographies %>%
ggplot() +
geom_sf(aes(
# Color in states by the chip_pct variable
fill = chip_pct_interval
)) +
# Default is to use main urban palette, which assumes unrelated groups. We
# adjust colors manually to be on Urban cyan palette
scale_fill_manual(
values = palette_urbn_cyan[c(8, 7, 5, 3, 1)],
name = "CHIP Enrollment %"
)
```
In addition to `cut_interval` there are [similar
functions](https://ggplot2.tidyverse.org/reference/cut_interval.html)
for creating intervals/bins with slightly different rules. When creating
bins, be careful as changing the number of bins can drastically change
how the map looks.
## Bubble Maps
This is just a layered map with one polygon layer and one point layer,
where the points are sized in accordance with a variable in your data.
```{r bubble_maps, cache = TRUE}
set_urbn_defaults(style = "map")
# Get sf dataframe of DC tracts
library(tigris)
dc_tracts <- tracts(
state = "DC",
year = 2019,
progress_bar = FALSE
)
# Add bubbles for firestations
ggplot() +
geom_sf(data = dc_tracts, fill = palette_urbn_main["gray"]) +
geom_sf(
data = dc_firestations,
# Size bubbles by number of trucks at each station
aes(size = TRUCK),
color = palette_urbn_main["yellow"],
# Adjust transparency for readability
alpha = 0.8
)
```
## Dot-density Maps
These maps scatter dots within a geographic area. Typically each dot
represents a unit (like 100 people, or 1000 houses). To create this kind
of map, you need to start with an `sf` dataframe that is of `geometry`
type `POLYGON` or `MULTIPOLYGON` and then sample points within the
polygon.
The below code generates a dot-density map representing people of
different races within Washington DC tracts The code may look a little
complicated, but the key workhorse function is `st_sample()` which
samples points within each polygon to use in the dot density map:
```{r dot_density_maps, cache = TRUE}
library(tidycensus)
# Get counts by race of DC tracts
dc_pop <- get_acs(
geography = "tract",
state = "DC",
year = 2019,
variables = c(
Hispanic = "DP05_0071",
White = "DP05_0077",
Black = "DP05_0078",
Asian = "DP05_0080"
),
geometry = TRUE,
progress_bar = FALSE
)
# Get unique groups (ie races)
groups <- unique(dc_pop$variable)
# For each unique group (ie race), generate sampled points
dc_race_dots <- map_dfr(groups, ~ {
dc_pop %>%
# .x = the group used in the loop
filter(variable == .x) %>%
# Use the projected MD state plane for accuracy