---
title: "ihydro: Integrated hydrology tools for environmental science"
always_allow_html: true
output:
  github_document:
    toc: true
    pandoc_args: --webtex
editor_options:
  markdown:
    wrap: 72
  chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# ihydro <img src="man/figures/logo.png" align="right" width="280"/>
<!-- badges: start -->
[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->
## 1 Introduction
Aquatic environmental scientists are often interested in relating landscape features to observed responses (e.g., fish size, water temperature, invertebrate community composition, etc.) in streams, rivers and lakes. The computational workflow for conducting these investigations is complex. Simply describing how water flows and accumulates across the landscape can be a challenge in itself, but for aquatic scientists it is only the first step. The stream network must then be extracted from the landscape, and reaches (a.k.a. segments; i.e., stretches of river between two confluences) identified and given unique identifiers. These reaches must then be attributed with informative characteristics (e.g., slope, stream order, upstream channel length, etc.), and upstream-downstream connectivity between reaches established.
Typically, observed responses are measured at discrete sampling locations along the stream network. If the location information is to be preserved (e.g., if samples are upstream and downstream of a particular effluent outflow), they must be incorporated into the network. This is done by splitting stream lines and catchments at these points.
Once that is done, landscape features of interest (e.g., land-cover, geology, climate, etc.) must be related to the reaches (or sampling points). This alone can be complex, as the spatial configuration of these factors relative to flow direction and accumulation can be important (Peterson ***et al.***, 2011). The ***ihydro*** package uses ***hydroweight*** (Kielstra ***et al.***, 2021) to calculate these attributes.
The complexity of this workflow can be a rate-limiting step in the scope, content, quality, and applicability of investigations by aquatic environmental scientists. The ***ihydro*** package offers tools and workflows to simplify these complex steps. It is capable of handling all of the above computational steps, leaving researchers the task of identifying layers of interest and modeling with (potentially) large numbers of predictors.
The ***ihydro*** package also implements descriptions of spatial autocorrelation among sites. Due to the linear nature of flow in streams and rivers, autocorrelation tends to be asymmetric, with sites generally more similar when they are flow connected than when they are not. ***ihydro*** produces tables that can be transformed into asymmetric matrices that describe the relationships between sampling points based on in-stream distances and/or the proportions of shared catchments. The proportion of shared upstream catchment (rather than in-stream distance) is often a more relevant measure of spatial autocorrelation in streams because it accounts for differences in catchment areas between points, as well as spatial proximity. For example, if water chemistry samples are taken from a large 6th-order stream and an upstream small 1st-order tributary, we would expect the small tributary to have only a small impact on the larger stream (despite its close physical proximity). Hence, autocorrelation should be low because the tributary does not contribute much flow to the larger stream. Using in-stream distances alone may misrepresent this pattern.
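As a minimal illustration of the reasoning above (using made-up catchment areas, not ihydro output):

```{r, eval = FALSE}
# Hypothetical catchment areas (km^2) for the example above
area_mainstem  <- 500 # catchment of the downstream, 6th-order sample point
area_tributary <- 2   # catchment of the small, 1st-order tributary

# The tributary contributes only a tiny fraction of the downstream catchment,
# so expected autocorrelation is low despite the close physical proximity
prop_shared <- area_tributary / area_mainstem
prop_shared
```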
***ihydro*** stores its geospatial products in a geopackage file for ease of retrieval and extraction in external software (e.g., QGIS). Many ***ihydro*** functions can be run in parallel for increased speed (if enough memory is available). The functions also remove internal intermediate files promptly to keep hard drives from filling up.
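Because everything is written to a standard geopackage, the outputs can also be inspected directly with ***sf*** and ***terra*** (the file and layer names below are illustrative placeholders; list the layers first to see what your file actually contains):

```{r, eval = FALSE}
# List all layers in an ihydro geopackage, then read one vector and one
# raster layer directly
sf::st_layers("Processed_Hydrology.gpkg")

streams <- sf::read_sf("Processed_Hydrology.gpkg", layer = "stream_lines")
accum   <- terra::rast("Processed_Hydrology.gpkg", "dem_accum_d8")
```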
[Back to top](#1-introduction)
## 2 System setup and installation
*WhiteboxTools* and *whitebox* are required for ***ihydro***. See [whiteboxR](https://github.com/giswqs/whiteboxR) or below for installation.
```{r, eval = FALSE}
## Follow instructions for whitebox installation accordingly
## devtools::install_github("giswqs/whiteboxR") # For development version
## whitebox is now available on CRAN
#install.packages("whitebox")
library(whitebox)
if (FALSE) {
install_whitebox()
# Possible warning message:
# ------------------------------------------------------------------------
# Could not find WhiteboxTools!
# ------------------------------------------------------------------------
#
# Your next step is to download and install the WhiteboxTools binary:
# > whitebox::install_whitebox()
#
# If you have WhiteboxTools installed already run `wbt_init(exe_path=...)`':
# > wbt_init(exe_path='/home/user/path/to/whitebox_tools')
#
# For whitebox package documentation, ask for help:
# > ??whitebox
#
# For more information visit https://giswqs.github.io/whiteboxR/
#
# ------------------------------------------------------------------------
}
```
[Back to top](#1-introduction)
## 3 Prepare DEM and Sampling Points for analysis
### 3.1 Generate toy terrain dataset and sampling points
Begin by bringing in the digital elevation model.
```{r, message = FALSE, error = FALSE, warning = FALSE}
## Load libraries
# remotes::install_github("p-schaefer/ihydro")
library(ihydro)
library(tmap)
library(furrr)
library(whitebox)
library(terra)
library(sf)
library(dplyr)
library(tidyr)
library(purrr)
# Many functions in 'ihydro' can be run in parallel internally.
# Parallelization is done through the future package, so all parallel backends
# should be supported, e.g.:
plan(multisession(workers=4))
## Generate save_dir as a temporary directory
save_dir <- tempdir()
## Import toy_dem from openSTARS package
# devtools::install_github("MiKatt/openSTARS", ref = "dev")
ot<-system.file("extdata", "nc", "elev_ned_30m.tif", package = "openSTARS") %>%
rast()
crs(ot)<-crs(rast(system.file("extdata", "nc", "landuse_r.tif", package = "openSTARS")))
writeRaster(ot,file.path(save_dir, "toy_dem.tif"),overwrite=T)
toy_dem<-rast(file.path(save_dir, "toy_dem.tif"))
## Identify some sampling points
system.file("extdata", "nc", "sites_nc.shp", package = "openSTARS") %>%
vect() %>%
st_as_sf() %>%
st_transform(st_crs(toy_dem)) %>%
vect() %>%
writeVector(file.path(save_dir, "sites.shp"),overwrite=T)
plot(toy_dem,main="Elevation")
```
[Back to top](#1-introduction)
### 3.2 Generate flow direction/accumulation geospatial analysis products with `process_flowdir()`
The DEM must be processed in a way that removes depressions. Whitebox offers methods for breaching and filling DEMs to remove depressions. These may be applied beforehand, or through the `depression_corr` argument in `process_flowdir()`.
Another factor to consider at this step is whether to burn a stream vector into the DEM; ***ihydro*** implements a simplified stream burning method that lowers the elevation along the DEM by `burn_depth` meters. See [here](https://proceedings.esri.com/library/userconf/proc99/proceed/papers/pap802/p802.htm#Trois) for more detailed approaches.
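Conceptually, this simplified burn amounts to rasterizing the stream vector onto the DEM grid and subtracting `burn_depth` wherever streams are present. A sketch only (assuming a `dem` raster and a `streams_path` shapefile are already loaded; this is not ihydro's exact implementation):

```{r, eval = FALSE}
library(terra)

burn_depth <- 5 # meters
streams_r  <- rasterize(vect(streams_path), dem) # streams on the DEM grid
dem_burned <- ifel(!is.na(streams_r), dem - burn_depth, dem)
```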
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
# Outputs of all functions are always saved to output_filename by default,
# but can be included in function return with return_products=T (note this can
# be very slow for large regions)
output_filename_hydro<-file.path(save_dir,"Processed_Hydrology.gpkg")
# Generates d8 flow direction and accumulation, extracts streams at a specified
# flow accumulation threshold
hydro_out<-process_flowdir(
dem=toy_dem,
burn_streams=system.file("extdata", "nc", "streams.shp", package = "openSTARS"),
burn_depth=5,
min_length=3,
depression_corr="breach",
threshold=100L,
return_products=T,
output_filename=output_filename_hydro,
temp_dir=NULL,
verbose=T
)
# List files present in gpkg
ihydro_layers(hydro_out)
# hydro_out$outfile # -> This is the full file path of the resulting geopackage (gpkg) file
# remaining outputs only present when `return_products` == T
# hydro_out$dem_final.tif # -> final dem after stream burning and depression correction
# hydro_out$dem_d8.tif # -> d8 flow direction
# hydro_out$dem_accum_d8.tif # -> d8 flow accumulation (cells)
# hydro_out$dem_accum_d8_sca.tif # -> d8 flow accumulation (specific catchment areas)
# hydro_out$dem_streams_d8.tif # -> extracted streams at specified `threshold`
# if `return_products` == F, all products are only available in the gpkg file.
# terra and sf allow access to files directly in the gpkg file, whitebox
# requires them to be extracted to a folder
flow_accum<-rast(hydro_out$outfile,"dem_accum_d8")
plot(log10(flow_accum))
```
To create an ihydro object from an existing geopackage (i.e., created from a previous R run):
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
hydro_out_new<-as.ihydro(output_filename_hydro)
ihydro_layers(hydro_out_new)
```
[Back to top](#1-introduction)
### 3.3 Generate vector geospatial analysis products with `generate_vectors()`
This function combines `generate_subbasins()` and `attrib_streamline()` to produce subcatchment polygons, stream lines, stream links (downstream-most points along each subcatchment), and stream points (stream lines represented as individual points).
Typically, observed responses are measured at discrete sampling locations along the stream network. If the location information is to be preserved (e.g., if samples are upstream and downstream of a particular effluent outflow), the sampling points must be incorporated into the network. This is done by splitting stream lines and subcatchments at these points.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
hydro_out<-generate_vectors(
input=hydro_out,
points=file.path(save_dir, "sites.shp"), # These are optional sampling locations
site_id_col="site_id", # Column name in points layer that corresponds to
# # unique IDs that will be available in data products
snap_distance=100L, # points that are more than 100m from closest stream are excluded
break_on_noSnap =F, # default is to stop when any points don't snap, this will ignore that
return_products=T,
temp_dir=NULL,
verbose=T
)
ihydro_layers(hydro_out)
# Several important columns are used throughout the vector layers:
# `link_id` - identifies reaches/segments (stream length between two confluences)
# `trib_id` - identifies streams/tributaries, with the shortest channel getting
# # a new trib_id at a confluence, and longest channel retaining the original ID
# `uslink_id` and `dslink_id` - columns identify upstream and downstream links
# `ustrib_id` and `dstrib_id` - columns identify upstream and downstream tributaries (longest
# continuous flow-paths)
# If 'points' are provided, this function modifies the vector outputs by inserting
# links, and splitting lines/subbasins at `points`. Inserted points are given "link_id'
# values that increment with decimal places from downstream to upstream directions
# (i.e., 10.0 remains the pour point for the segment, and 10.1, 10.2,... identify
# sample points in an upstream direction).
# New layers added by `generate_vectors()`
# hydro_out$subbasins # -> polygon subbasins attributed with `link_id` and reach
# # contributing area (in m^2)
# hydro_out$stream_lines # -> line vectors attributed with `link_id`
# hydro_out$points # -> point vectors along lines identifying 'nodes' (confluence
# # points), vs 'links' segments joining 'nodes', and also
# # attributed with `link_id`, `trib_id`, upstream
# # and downstream link and trib IDS and a number of extra attributes
# hydro_out$links # -> point vector representing pour-points for subbasins,
# # attributed with `link_id` and extra attributes
# hydro_out$snapped_points # -> provided points snapped to nearest segment. Any points
# # beyond snap distance are removed, with a warning if
# # break_on_noSnap == F.
tm_shape(hydro_out$subbasins) + tm_polygons(col="link_id",palette = "viridis",alpha =0.2,legend.show=F) +
tm_shape(hydro_out$stream_lines) + tm_lines(col="blue",alpha =0.5,legend.show=F,lwd =3) +
tm_shape(hydro_out$links) + tm_dots(col="yellow",legend.show=F,size=0.15,border.col="black",border.alpha=1)
```
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
# Optional step, inserts sampling points into stream vectors, splitting subbasins
# and lines at sampling points, additional links inserted at sampling points as well
tm_shape(rast(hydro_out$dem_final.tif)) + tm_raster(palette = "cividis",legend.show=F) +
tm_shape(hydro_out$stream_lines) + tm_lines(col="red",alpha =0.5,legend.show=F,lwd =2) +
tm_shape(hydro_out$snapped_points %>% mutate(site_id=as.numeric(site_id))) +
tm_dots(col="darkgray",legend.show=F,size=0.35,border.col="black",border.alpha=1,border.lwd=1) +
tm_shape(read_sf(file.path(save_dir, "sites.shp"))) +
tm_dots(col="black",legend.show=F,size=0.35,border.col="black",border.alpha=1,border.lwd=1)
```
[Back to top](#1-introduction)
### 3.4 Create lookup tables of flow-directions with `trace_flowpaths()`
To more efficiently generate catchments, look-ups are created that identify all upstream and downstream links originating from each link.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
hydro_out<-trace_flowpaths(
input=hydro_out,
return_products=T,
pwise_dist=T, # This will calculate all downstream flow-connected pairwise distances
pwise_all_links=F, # This will calculate all flow un-connected pairwise distances (can be very time consuming)
temp_dir=NULL,
verbose=T
)
ihydro_layers(hydro_out)
# We will add these upstream and downstream lookup tables to the output below
con <- DBI::dbConnect(RSQLite::SQLite(), hydro_out$outfile)
hydro_out$us_flowpaths <-collect(tbl(con,"us_flowpaths"))
hydro_out$ds_flowpaths <-collect(tbl(con,"ds_flowpaths"))
DBI::dbDisconnect(con)
head(hydro_out$us_flowpaths)
head(hydro_out$ds_flowpaths %>% arrange(destination_link_id))
# Below we demonstrate how to extract the upstream and downstream flowpaths for specific points
us_647<-hydro_out$us_flowpaths %>%
filter(pour_point_id == "647") # get all upstream link_ids from link_id 647
ds_101.1<-hydro_out$ds_flowpaths %>%
filter(origin_link_id == "101.1") # get all downstream link_ids from
# # link_id 101.1 (this corresponds with site_id = 62)
lines_out<-hydro_out$stream_lines %>%
filter(link_id %in% us_647$origin_link_id |
link_id %in% ds_101.1$destination_link_id
)
sub_out<-hydro_out$subbasins %>%
filter(link_id %in% us_647$origin_link_id |
link_id %in% ds_101.1$destination_link_id
)
tm_shape(sub_out) + tm_polygons(col="white",alpha =0.2,legend.show=F) +
tm_shape(lines_out) + tm_lines(col="blue",alpha =0.5,legend.show=F,lwd =3) +
tm_shape(hydro_out$links %>% filter(link_id %in% c("647","101.1"))) +
tm_dots(legend.show=F,size=0.45,border.col="black",border.alpha=1,border.lwd=1) +
tm_layout(frame = FALSE)
```
[Back to top](#1-introduction)
### 3.5 Generate complete upstream catchment areas with `get_catchment()`
Once lookups are established, catchments can easily be retrieved
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
subbasin_catchments<-get_catchment( # retrieve catchment for an arbitrary reach
input=hydro_out,
sample_points=NULL,
link_id="838"
)
point_catchments<-get_catchment( # retrieve catchments at specified sampling points
input=hydro_out,
sample_points=c("1","25"),
link_id=NULL
)
tm_shape(bind_rows(subbasin_catchments,point_catchments)) +
tm_polygons(col="white",alpha =0.2,legend.show=F,lwd =4) +
tm_shape(hydro_out$stream_lines) +
tm_lines(col="blue",alpha =0.5,legend.show=F,lwd =2)
```
[Back to top](#1-introduction)
### 3.6 Generate the entire hydrology workflow in one step
The entire workflow above can be accomplished with a single function:
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
output_filename_hydro_sparse<-file.path(save_dir,"Processed_Hydrology_sparse.gpkg")
# In this case we will use a higher stream initiation threshold to speed up calculation.
hydro_out_sparse<-process_hydrology(
dem=toy_dem,
output_filename=output_filename_hydro_sparse,
# burn_streams=system.file("extdata", "nc", "streams.shp", package = "openSTARS"),
burn_depth=5,
depression_corr="breach",
threshold=500L,
points=hydro_out$snapped_points,
site_id_col="site_id",
snap_distance = 1L,
break_on_noSnap=F,
pwise_dist=T,
pwise_all_links=F,
return_products=F,
temp_dir=NULL,
verbose=F
)
ihydro_layers(hydro_out_sparse)
# Since we didn't return the products, we'll verify the outputs exist in the gpkg file
tm_shape(read_sf(hydro_out_sparse$outfile,"Subbasins_poly")) +
tm_polygons(col="white",alpha =0.2,legend.show=F) +
tm_shape(read_sf(hydro_out_sparse$outfile,"stream_lines")) +
tm_lines(col="blue",alpha =0.3,legend.show=F,lwd =2) +
tm_shape(read_sf(hydro_out_sparse$outfile,"stream_links")) +
tm_dots(legend.show=F,size=0.2,border.col="black",border.alpha=1,border.lwd=1)
```
[Back to top](#1-introduction)
## 4 Examining Pairwise Instream Distances
For a more complete and thorough treatment of spatial autocorrelation in stream systems, see [Zimmerman and Ver Hoef (2007)](https://www.fs.usda.gov/rm/boise/AWAE/projects/NationalStreamInternet/downloads/17ZimmermanVerHoef_TheTorgegramForFluvialVariography.pdf).
In case we didn't request pairwise distances in `trace_flowpaths()`, we can calculate them separately with `generate_pdist()`. Below we calculate a table that summarizes relevant pairwise distance measures between reaches.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
# If pairwise distances were not requested with trace_flowpaths(), they can be added
# at any point after using generate_pdist()
hydro_out<-generate_pdist(
input=hydro_out,
pwise_all_links=T # This will calculate pairwise distances between all stream links
# # which can be very slow for dense stream networks
)
# New data available in `fcon_pwise_dist` and 'funcon_pwise_dist'
# *_pwise_dist # -> tables of downstream path lengths between each pair of link_ids,
# # with flow-connected in-stream distances (directed_path_length),
# # flow-unconnected in-stream distances (undirected_path_length),
# # and proportions of shared catchments (prop_shared_catchment)
# # and log-transformed catchment proportions (prop_shared_logcatchment)
con <- DBI::dbConnect(RSQLite::SQLite(), hydro_out$outfile)
ihydro_layers(hydro_out)
# funcon_pwise_dist is only available if pwise_all_links==T
hydro_out$pwise_dist<-bind_rows(collect(tbl(con,"fcon_pwise_dist")) %>% mutate(dist_type="Flow Connected"),
collect(tbl(con,"funcon_pwise_dist")) %>% mutate(dist_type="Flow Unconnected")
)
DBI::dbDisconnect(con)
head(hydro_out$pwise_dist)
p1<-hydro_out$stream_lines %>%
mutate(link_id=as.character(link_id)) %>%
left_join(hydro_out$pwise_dist %>%
filter(dist_type=="Flow Connected") %>%
filter(origin=="1048") %>%
mutate(directed_path_length=log(directed_path_length)),
by=c("link_id"="destination")) %>%
tm_shape() +
tm_lines(col="directed_path_length",alpha =1,legend.show=T,lwd =2,palette = "viridis",style="cont")+
tm_shape(hydro_out$links %>% filter(link_id=="1048"))+
tm_dots(legend.show=F,size=0.45,border.col="black",border.alpha=1,border.lwd=1)+
tm_layout(main.title = "Flow connected path log-length from 1048",legend.outside = TRUE)
p2<-hydro_out$stream_lines %>%
mutate(link_id=as.character(link_id)) %>%
left_join(hydro_out$pwise_dist %>%
filter(origin=="99") %>%
filter(dist_type=="Flow Unconnected") %>%
mutate(undirected_path_length=log(undirected_path_length)),
by=c("link_id"="destination"),
multiple="all") %>%
tm_shape() +
tm_lines(col="undirected_path_length",alpha =1,legend.show=T,lwd =2,palette = "viridis",style="cont")+
tm_shape(hydro_out$links %>% filter(link_id=="99"))+
tm_dots(legend.show=F,size=0.45,border.col="black",border.alpha=1,border.lwd=1)+
tm_layout(main.title = "Flow Unconnected path log-length from 99",legend.outside = TRUE)
# Verify upstream and downstream distances make sense
tmap_arrange(p1,p2,ncol=1)
```
Below, we take these distances and visualize them as heatmaps to illustrate how pairwise distance matrices can be used to represent spatial relationships among sites.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
# Examine relationships among sampled sites.
# Get link_id for sampled points
sel_link_id<-hydro_out$snapped_points
# filter long table to selected sites, and convert to wide format
# This table describes the flow-connected in-stream distances at destination points (rows)
# from upstream origin points (columns)
dmat<-hydro_out$pwise_dist %>%
filter(origin %in% sel_link_id$link_id &
destination %in% sel_link_id$link_id
) %>%
filter(origin!=destination) %>%
select(-prop_shared_catchment,-undirected_path_length,-prop_shared_logcatchment,-dist_type) %>%
rename(link_id=origin) %>%
mutate(directed_path_length=ifelse(directed_path_length==1 | is.na(directed_path_length),0,directed_path_length)) %>%
distinct() %>%
filter(directed_path_length!=0) %>%
pivot_wider(names_from=destination,values_from=directed_path_length,values_fill = 0) %>%
data.frame(check.names = F) %>%
tibble::column_to_rownames("link_id") %>%
log1p()
head(dmat)
# This table describes the proportions of shared catchments at destination points (rows)
# from upstream origin points (columns)
dmat2<-hydro_out$pwise_dist %>%
filter(origin %in% sel_link_id$link_id &
destination %in% sel_link_id$link_id
) %>%
filter(origin!=destination) %>%
select(-directed_path_length,-undirected_path_length,-prop_shared_catchment,-dist_type) %>%
rename(link_id=origin) %>%
mutate(prop_shared_logcatchment=ifelse(prop_shared_logcatchment==1 | is.na(prop_shared_logcatchment),
1,prop_shared_logcatchment)) %>%
distinct() %>%
filter(prop_shared_logcatchment!=0) %>%
pivot_wider(names_from=destination,values_from=prop_shared_logcatchment ,values_fill = 0) %>%
data.frame(check.names = F) %>%
tibble::column_to_rownames("link_id")
head(dmat2)
# Here we multiply the matrices (using the proportions of shared catchments as a rough weighting scheme)
# calculate manhattan distances and generate a heatmap
(dmat*dmat2) %>%
dist("man") %>%
as.matrix() %>%
heatmap()
```
Using these relationships, we can perform a clustering analysis to identify groups of sites with potentially high spatial autocorrelation. These groups could for instance be used for cross-validation purposes.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
# Using the above approach, we create 5 groups of spatially proximate points
km<-(dmat*dmat2) %>%
dist("man") %>%
hclust() %>%
cutree(k=5)
gps<-tibble::enframe(km,"link_id","group") %>%
mutate(link_id=as.numeric(link_id))
point_groups<-hydro_out$snapped_points %>%
left_join(gps) %>%
filter(!is.na(group))
# These can be used for cross-validation purposes to see how well the models extrapolate outside of
# sampled areas
tm_shape(hydro_out$subbasins) + tm_polygons(col="white",alpha =0.2,legend.show=F) +
tm_shape(hydro_out$stream_lines) + tm_lines(col="blue",alpha =0.3,legend.show=F,lwd =2) +
tm_shape(point_groups) + tm_dots(col="group", palette = "Dark2",legend.show=T,size=0.45)+
tm_layout(legend.outside = TRUE)
```
Next, we'll examine relationships among our response variable in terms of in-stream distances, contrasting flow-connected vs. flow-unconnected sites. Here we see greater differences between responses with increasing in-stream distance. We would expect greater similarity among flow-connected than flow-unconnected sites, but we don't see that here; this may be a product of this being an artificial data set.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
# get response variables
response_table<-file.path(save_dir, "sites.shp") %>%
read_sf() %>%
as_tibble() %>%
select(site_id,value) %>%
left_join(hydro_out$snapped_points %>% # join by snapped sites to get link_id values
as_tibble() %>%
select(site_id,link_id)) %>%
mutate(link_id=as.character(link_id))
# Combine pairwise data with values and examine spatial relationships.
dmat<-hydro_out$pwise_dist %>%
filter(origin %in% sel_link_id$link_id &
destination %in% sel_link_id$link_id
) %>%
# Add response values for origin and destination points into the table
left_join(response_table %>% rename(origin_value=value),by=c("origin"="link_id")) %>%
left_join(response_table %>% rename(destination_value=value),by=c("destination"="link_id")) %>%
mutate(value_diff=sqrt((origin_value-destination_value)^2)) %>% # Calculate the squared difference
filter(origin!=destination) %>%
select(-dist_type,-site_id.x,-site_id.y,-origin_value,-destination_value) %>%
pivot_longer(c(directed_path_length,undirected_path_length,prop_shared_catchment,prop_shared_logcatchment),
names_to ="dist_type",
values_to ="dist") %>%
filter(!is.na(dist)) %>%
mutate(`Distance Type`=case_when(
dist_type=="directed_path_length" ~ "Flow Connected",
dist_type=="undirected_path_length" ~ "Flow Unconnected",
dist_type=="prop_shared_catchment" ~ "Shared Catchment",
dist_type=="prop_shared_logcatchment" ~ "Shared log-Catchment",
)) %>%
distinct()
dmat
require(ggplot2)
ggplot(dmat %>% filter(dist_type %in% c("directed_path_length","undirected_path_length")),
aes(x=dist,y=value_diff,col=`Distance Type`))+
geom_hex(bins=50)+
geom_smooth(method="gam",se=F)+
theme_bw()+
theme(legend.position = "bottom")+
ylab("Pairwise Value Difference")+
xlab("Pairwise Distance (m)")+
scale_x_log10(labels=scales::comma)+
facet_wrap(~`Distance Type`)
```
Finally, we'll do a similar comparison but using percent of shared catchments. Here we expect pairwise differences to decrease as the percent of shared catchments increases.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
ggplot(dmat %>%
filter(dist_type %in% c("prop_shared_catchment","prop_shared_logcatchment")) %>%
filter(dist>0),
aes(x=dist,y=value_diff,colour=`Distance Type`))+
geom_hex()+
geom_smooth(method="gam",se=F)+
theme_bw()+
theme(legend.position = "bottom")+
ylab("Pairwise Value Difference")+
xlab("Percent of shared catchments (Flow Connected only)")+
scale_x_continuous(labels=scales::percent)+
facet_wrap(~`Distance Type`,scales ="free_x")
```
[Back to top](#1-introduction)
## 5 Process layers of interest
### 5.1 Separate from hydrology products
Layers of interest (loi) represent the features on the landscape we are interested in relating to the observed responses in the stream. These can include: land-cover, climate, geology, soils, NDVI, slope, etc.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
## Predictors from openSTARS
system.file("extdata", "nc", "landuse_r.tif", package = "openSTARS") %>%
rast() %>%
setNames("LC") %>%
writeRaster(file.path(save_dir, "LC.tif"),overwrite=T)
landuse_r_path <-file.path(save_dir, "LC.tif")
geology_path<-system.file("extdata", "nc", "geology.shp", package = "openSTARS")
pointsources_path<-system.file("extdata", "nc", "pointsources.shp", package = "openSTARS")
read_sf(pointsources_path) %>%
mutate(pointsource="pontsrc") %>%
st_buffer(60) %>%
write_sf(file.path(save_dir, "pointsources.shp"),overwrite=T)
pointsources_path<-file.path(save_dir, "pointsources.shp")
# Numeric Raster
wbt_slope(
dem = file.path(save_dir, "toy_dem.tif"),
output = file.path(save_dir, "slope.tif")
)
# Combine loi layers
output_filename_loi<-file.path(save_dir,"Processed_loi.gpkg")
# This function standardizes numeric and categorical loi layers.
loi_combined<-process_loi(
dem=toy_dem,
num_inputs=list(# Can be given as a mixture of input types (file paths, or any sf or terra format)
slope=file.path(save_dir, "slope.tif")
),
cat_inputs=list(# Can be given as a mixture of input types (file paths, or any sf or terra format)
landcover=landuse_r_path,
geology=geology_path,
pointsources=pointsources_path
),
variable_names=list( # any unlisted inputs will be used in their entirety
geology="GEO_NAME", # names listed here will subset those attributes or layers from the inputs
pointsources="pontsrc"
),
output_filename=output_filename_loi,
return_products=T,
temp_dir=NULL,
verbose=T
)
ihydro_layers(loi_combined)
```
All categorical layers have been transformed to rasters with 1 indicating presence and NA indicating absence. The layers have been rescaled and projected to match the DEM.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
plot(loi_combined$cat_inputs,type="classes",col="darkgreen")
```
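This presence/NA encoding can be illustrated with `terra::segregate()` on a toy raster (a standalone sketch of the idea only; `process_loi()` handles this internally and may differ in detail):

```r
library(terra)

# Toy categorical raster with three classes
r <- rast(nrows = 4, ncols = 4,
          vals = c(1, 1, 2, 2,
                   1, 3, 3, 2,
                   2, 2, 3, 1,
                   1, 1, 2, 3))

# One layer per class: cells of that class become 1, all others become NA
layers <- segregate(r, other = NA)
names(layers) # one layer per class value
```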
All numeric layers have been rescaled and projected to match the DEM.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
plot(loi_combined$num_inputs,type="continuous")
```
[Back to top](#1-introduction)
### 5.2 Together with hydrology products
Instead of processing the loi separately (as above), they can be added to the completed workflow and stored in the existing gpkg file for convenient file storage/organization.
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
# In this case, we will use our previously calculated loi results, but if `process_loi`
# is run with an input parameter specified, the loi rasters will be added to the
# output. This can make for convenient data storage.
hydro_out_sparse<-process_loi(
input=hydro_out_sparse,
num_inputs=list(# Can be given as a mixture of input types (file paths, or any sf or terra format)
slope=file.path(save_dir, "slope.tif")
),
cat_inputs=list(# Can be given as a mixture of input types (file paths, or any sf or terra format)
landcover=landuse_r_path,
geology=geology_path,
pointsources=pointsources_path
),
variable_names=list( # any unlisted inputs will be used in their entirety
geology="GEO_NAME", # names listed here will subset those attributes or layers from the inputs
pointsources="pontsrc"
),
return_products=F, # these layers can get large, and it is generally not advised to return them into R
temp_dir=NULL,
verbose=F
)
print(ihydro_layers(hydro_out_sparse),n=40)
```
[Back to top](#1-introduction)
## 6 Calculate (weighted) spatial summaries:
New research shows that the mere presence of particular features on the landscape is not always sufficient to predict the in-stream response to those features; the position of those features relative to streams and to areas of higher flow accumulation is also very important (Peterson ***et al.*** 2011).
***ihydro*** provides two functions for calculating weighted spatial summaries: `attrib_points()` and `fasttrib_points()`. `attrib_points()` uses the [hydroweight](https://github.com/bkielstr/hydroweight) package to calculate weighted spatial summaries of supplied loi layers. It can be used to examine the resulting distance-weighted rasters and distance-weighted loi layers. However, `attrib_points()` is slow, so `fasttrib_points()` is available for improved performance for larger datasets.
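The inverse function passed to these functions, `(x * 0.001 + 1)^-1`, converts flow-path distances into weights that decay with distance. A minimal base-R sketch of how a lumped mean differs from a distance-weighted mean (the distances and loi values here are hypothetical, not package output):

```r
# Inverse-distance function used in the attrib_points()/fasttrib_points()
# calls below: weight = 1 at distance 0, decaying as distance grows.
inv_function <- function(x) (x * 0.001 + 1)^-1

d   <- c(0, 500, 1000, 5000) # hypothetical flow-path distances (DEM units)
loi <- c(10, 20, 30, 40)     # hypothetical loi values at those cells

w <- inv_function(d)         # 1, 2/3, 1/2, 1/6

mean(loi)                    # lumped mean: 25
sum(w * loi) / sum(w)        # distance-weighted mean: ~19.3; nearer cells dominate
```

The "lumped" scheme weights every cell equally, while the iFLO/iFLS/HAiFLO/HAiFLS schemes apply weights like these along flow paths to the target outlet or stream.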
### 6.1 At specific sampling points
The `attrib_points()` function uses the [hydroweight](https://github.com/bkielstr/hydroweight) package and can calculate and return weighted attributes for any points identified either through supplied sampling points and/or arbitrary link_ids. The returned products can be helpful in visualizing how the final weighted attributes are derived.
```{r, fig.width = 7, fig.height = 7, message = F, error = F, warning = F}
attrib_points_time_small<-system.time(
final_attributes_sub_slow<-attrib_points(
input=hydro_out,
out_filename=file.path(tempdir(),"attrib_points1.csv"),
loi_file=output_filename_loi,
loi_cols=NULL,
sample_points=c("1","25","80"),
link_id=NULL,
clip_region=NULL,
target_o_type=c("point"),
weighting_scheme = c("lumped","iFLO", "iFLS", "HAiFLO", "HAiFLS"),
loi_numeric_stats = c("distwtd_mean", "distwtd_sd", "mean", "sd", "median", "min", "max", "sum"),
inv_function = function(x) {
(x * 0.001 + 1)^-1
},
temp_dir=NULL,
return_products=T,
verbose=T
)
)
final_attributes_sub_slow
```
We can access the weighting layers and weighted attribute layers of the `attrib_points()` output (if `return_products = TRUE`).
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
plot(
rast(
list(
rast(final_attributes_sub_slow$products[[2]]$iFLS)%>%
setNames("iFLS Weighting"),
log10(rast(final_attributes_sub_slow$products[[2]]$HAiFLO))%>%
setNames("log10-HAiFLO Weighting"),
rast(final_attributes_sub_slow$products[[2]]$iFLS_num) %>%
setNames("iFLS Weighted Slope"),
log10(rast(final_attributes_sub_slow$products[[2]]$HAiFLO_cat)[[1]]) %>%
setNames("log10-HAiFLO Weighted Landcover Class 1")
)
),
col=viridis::viridis(101),
axes=F
)
```
The `fasttrib_points()` function is faster for larger datasets, more loi, and more sampling points; for very small datasets, `attrib_points()` may be faster.
```{r, fig.width = 7, fig.height = 7, message = F, error = F, warning = F}
fasttrib_points_time_small<-system.time(
final_attributes_sub<-fasttrib_points(
input=hydro_out,
out_filename="sample_points_wgtattr.csv",
loi_file=output_filename_loi, # specify loi file, if NULL, function will look for loi in 'input'
loi_cols=NULL, # Specify loi columns to use, if NULL, all will be used
iDW_file=NULL, # Leaving this as NULL will look for iDW in 'input' and calculate any not available
store_iDW=T, # This will save the distance weights to the input or iDW_file if it is specified
sample_points=c("1","25","80"),
link_id=NULL,
target_o_type=c("point"),
weighting_scheme = c("lumped", "iFLS", "HAiFLS","iFLO", "HAiFLO"),
loi_numeric_stats = c("mean", "sd", "median", "min", "max", "sum"),
inv_function = function(x) {
(x * 0.001 + 1)^-1
},
temp_dir=NULL,
verbose=T
)
)
ihydro_layers(hydro_out)
final_attributes_sub
```
[Back to top](#1-introduction)
### 6.2 At all sampled points
```{r, fig.width = 7, fig.height = 7, message = FALSE, error = FALSE, warning = FALSE}
final_attributes<-fasttrib_points(
input=hydro_out,
out_filename="sample_points_wgtattr.csv",
loi_file=output_filename_loi,
loi_cols=NULL,
iDW_file=NULL,
store_iDW=T,
sample_points=hydro_out$snapped_points$site_id, # here we specify all sampling points
link_id=NULL,
target_o_type=c("point"),
weighting_scheme = c("lumped", "iFLS", "iFLO", "HAiFLO", "HAiFLS"),
loi_numeric_stats = c("mean", "sd", "min", "max"),
inv_function = function(x) {
(x * 0.001 + 1)^-1
},
temp_dir=NULL,
verbose=F
)
final_attributes
```
[Back to top](#1-introduction)
### 6.3 Across entire study area `fasttrib_points()`
In order to make predictions across the landscape, we also need to calculate our attributes across the entire landscape. We leave `sample_points` and `link_id` as NULL to summarize all reaches. At this point, we may also consider switching the target_o parameter from the sampling point (as was done above) to the entire reach by setting `target_o_type = "segment_whole"`, which treats the whole reach as the weighting target. This may be more conservative when predicting beyond sampling points, as it integrates landscape factors along the full length of the reach.
```{r error=FALSE, fig.height=12, fig.width=6, message=FALSE, warning=FALSE}
# This function is optional. It will save the hydroweight rasters in either a separate gpkg,
# or the input gpkg.
# These can be large and take up a lot of room, so consider whether they are needed before storing.
# Having the raster available does decrease the computation time of fasttrib_points().
dw_time<-system.time(
Processed_HW<-prep_weights(
input=hydro_out,
output_filename=file.path(tempdir(),"Processed_HW_out.gpkg")
)
)
fasttrib_points_time_big<-system.time(
final_attributes_all<-fasttrib_points(
input=hydro_out,
out_filename="sample_points_wgtattr.csv",
loi_file=output_filename_loi,
loi_cols=NULL,
iDW_file=file.path(tempdir(),"iDW_temp.gpkg"),
#iDW_file = file.path(tempdir(),"Processed_HW_out.gpkg"), # to use previously calculated weights
store_iDW=T,
sample_points=NULL, # by specifying neither sample_points nor link_id
link_id=NULL, # we get all reaches
target_o_type=c("segment_point"),
weighting_scheme = c("lumped", "iFLS", "iFLO", "HAiFLO", "HAiFLS"),
loi_numeric_stats = c("mean", "sd", "min", "max"),
inv_function = function(x) {
(x * 0.001 + 1)^-1
},
temp_dir=NULL,
verbose=F
)
)
final_attributes_all
attrib_points_time_big<-system.time(
final_attributes_slow<-attrib_points(
input=hydro_out,
out_filename=file.path(tempdir(),"attrib_points1.csv"),
loi_file=output_filename_loi,
loi_cols=NULL,
sample_points=NULL,
link_id=NULL,
clip_region=NULL,
target_o_type=c("segment_point"),
weighting_scheme = c("lumped","iFLO", "iFLS", "HAiFLO", "HAiFLS"),
loi_numeric_stats = c("distwtd_mean", "distwtd_sd", "mean", "sd", "min", "max"),
inv_function = function(x) {
(x * 0.001 + 1)^-1
},
temp_dir=NULL,
return_products=T,
verbose=F
)
)
final_attributes_slow
pmap(
list(
list("attrib_points()","fasttrib_points()",
"attrib_points()","fasttrib_points()"),
list(attrib_points_time_small,
fasttrib_points_time_small,
attrib_points_time_big,
fasttrib_points_time_big),
list(final_attributes_sub_slow %>%
select(-any_of("products"),-any_of("link_id"),-any_of("site_id"),-any_of("status")),
final_attributes_sub%>%
select(-any_of("products"),-any_of("link_id"),-any_of("site_id"),-any_of("status")),
final_attributes_slow %>%
select(-any_of("products"),-any_of("link_id"),-any_of("site_id"),-any_of("status")),
final_attributes_all%>%
select(-any_of("products"),-any_of("link_id"),-any_of("site_id"),-any_of("status"))
)),
function(.z,.x,.y) paste0(.z," took ",
round(.x[[3]]/60,2),
" min to calculate for ",
nrow(.y)," reaches with ",
ncol(.y),
" attributes using ", nbrOfWorkers(),
" cores.")
)
paste0(round(dw_time[[3]]/60,2),
" min to calculate distance weights for ",
nrow(final_attributes_all)," reaches using ",
nbrOfWorkers(),
" cores.")
```
[Back to top](#1-introduction)
## 7 Example Modelling
### 7.1 Train a Random Forest Model
We'll finish off with a brief demonstration of how to use the geospatial products calculated above to build and validate a predictive model. First, we'll create a combined dataset:
```{r error=FALSE, fig.height=7, fig.width=7, message=FALSE, warning=FALSE}
# As we will ultimately be predicting to our sparse landscape, we will only keep
# autocorrelation variables that are relevant to the sparse landscape. This would
# not necessarily be required if predicting to the more dense landscape.
# get response variables
response_table<-file.path(save_dir, "sites.shp") %>%
read_sf() %>%
as_tibble() %>%
select(site_id,value) %>%
left_join(hydro_out$links %>%
as_tibble() %>%