diff --git a/docs/index.md b/docs/index.md index 865c532..33bea39 100644 --- a/docs/index.md +++ b/docs/index.md @@ -7,6 +7,7 @@ layout: default ### Synthesis Manuscript * [Continuous Chlorophyll GAM Analysis - Categorical Predictors](rtm_chl_gam_analysis_categorical.html) +* [Continuous Chlorophyll GAM Analysis - Flow as a continuous predictor](rtm_chl_gam_analysis_flow.html) ### Contaminants Manuscript diff --git a/docs/rtm_chl_gam_analysis_flow.html b/docs/rtm_chl_gam_analysis_flow.html new file mode 100644 index 0000000..9838ede --- /dev/null +++ b/docs/rtm_chl_gam_analysis_flow.html @@ -0,0 +1,2806 @@ + + + + + + + + + + + + + + + +NDFS Synthesis Manuscript: Chlorophyll GAM analysis + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+
+ +
+ + + + + + + +
+

Purpose

+

Explore and analyze the continuous chlorophyll data to be included in +the NDFS synthesis manuscript. We will attempt to fit a generalized +additive model (GAM) to the data set to help account for seasonality in +the data. This is an extension of the original analysis which used a +categorical predictor for flow action period. In this analysis we will +replace this categorical predictor with daily average flow as a +continuous predictor.

+
+
+

Global code and functions

+
# Load packages
+library(tidyverse)
+library(lubridate)
+library(scales)
+library(knitr)
+library(mgcv)
+library(lme4)
+library(car)
+library(emmeans)
+library(gratia)
+library(visreg)
+library(here)
+library(conflicted)
+
+# Declare package conflict preferences 
+conflicts_prefer(dplyr::filter())
+

Display current versions of R and packages used for this +analysis:

+
devtools::session_info()
+
## ─ Session info ───────────────────────────────────────────────────────────────
+##  setting  value
+##  version  R version 4.2.3 (2023-03-15 ucrt)
+##  os       Windows 10 x64 (build 19044)
+##  system   x86_64, mingw32
+##  ui       RTerm
+##  language (EN)
+##  collate  English_United States.utf8
+##  ctype    English_United States.utf8
+##  tz       America/Los_Angeles
+##  date     2023-07-20
+##  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
+## 
+## ─ Packages ───────────────────────────────────────────────────────────────────
+##  package      * version  date (UTC) lib source
+##  abind          1.4-5    2016-07-21 [1] CRAN (R 4.2.0)
+##  boot           1.3-28.1 2022-11-22 [2] CRAN (R 4.2.3)
+##  bslib          0.4.2    2022-12-16 [1] CRAN (R 4.2.2)
+##  cachem         1.0.8    2023-05-01 [1] CRAN (R 4.2.3)
+##  callr          3.7.3    2022-11-02 [1] CRAN (R 4.2.2)
+##  car          * 3.1-2    2023-03-30 [1] CRAN (R 4.2.3)
+##  carData      * 3.0-5    2022-01-06 [1] CRAN (R 4.2.1)
+##  cli            3.6.1    2023-03-23 [1] CRAN (R 4.2.3)
+##  coda           0.19-4   2020-09-30 [1] CRAN (R 4.2.1)
+##  codetools      0.2-19   2023-02-01 [2] CRAN (R 4.2.3)
+##  colorspace     2.1-0    2023-01-23 [1] CRAN (R 4.2.2)
+##  conflicted   * 1.2.0    2023-02-01 [1] CRAN (R 4.2.2)
+##  crayon         1.5.2    2022-09-29 [1] CRAN (R 4.2.1)
+##  devtools       2.4.5    2022-10-11 [1] CRAN (R 4.2.1)
+##  digest         0.6.31   2022-12-11 [1] CRAN (R 4.2.2)
+##  dplyr        * 1.1.2    2023-04-20 [1] CRAN (R 4.2.3)
+##  ellipsis       0.3.2    2021-04-29 [1] CRAN (R 4.2.1)
+##  emmeans      * 1.8.5    2023-03-08 [1] CRAN (R 4.2.2)
+##  estimability   1.4.1    2022-08-05 [1] CRAN (R 4.2.1)
+##  evaluate       0.21     2023-05-05 [1] CRAN (R 4.2.3)
+##  fansi          1.0.4    2023-01-22 [1] CRAN (R 4.2.2)
+##  fastmap        1.1.1    2023-02-24 [1] CRAN (R 4.2.2)
+##  forcats      * 1.0.0    2023-01-29 [1] CRAN (R 4.2.2)
+##  fs             1.6.2    2023-04-25 [1] CRAN (R 4.2.3)
+##  generics       0.1.3    2022-07-05 [1] CRAN (R 4.2.1)
+##  ggplot2      * 3.4.2    2023-04-03 [1] CRAN (R 4.2.3)
+##  glue           1.6.2    2022-02-24 [1] CRAN (R 4.2.1)
+##  gratia       * 0.8.1    2023-02-02 [1] CRAN (R 4.2.2)
+##  gtable         0.3.3    2023-03-21 [1] CRAN (R 4.2.3)
+##  here         * 1.0.1    2020-12-13 [1] CRAN (R 4.2.1)
+##  hms            1.1.3    2023-03-21 [1] CRAN (R 4.2.3)
+##  htmltools      0.5.5    2023-03-23 [1] CRAN (R 4.2.3)
+##  htmlwidgets    1.6.2    2023-03-17 [1] CRAN (R 4.2.3)
+##  httpuv         1.6.9    2023-02-14 [1] CRAN (R 4.2.2)
+##  jquerylib      0.1.4    2021-04-26 [1] CRAN (R 4.2.1)
+##  jsonlite       1.8.4    2022-12-06 [1] CRAN (R 4.2.2)
+##  knitr        * 1.42     2023-01-25 [1] CRAN (R 4.2.2)
+##  later          1.3.0    2021-08-18 [1] CRAN (R 4.2.1)
+##  lattice        0.21-8   2023-04-05 [1] CRAN (R 4.2.3)
+##  lifecycle      1.0.3    2022-10-07 [1] CRAN (R 4.2.1)
+##  lme4         * 1.1-32   2023-03-14 [1] CRAN (R 4.2.3)
+##  lubridate    * 1.9.2    2023-02-10 [1] CRAN (R 4.2.2)
+##  magrittr       2.0.3    2022-03-30 [1] CRAN (R 4.2.1)
+##  MASS           7.3-58.2 2023-01-23 [2] CRAN (R 4.2.3)
+##  Matrix       * 1.5-4    2023-04-04 [1] CRAN (R 4.2.3)
+##  memoise        2.0.1    2021-11-26 [1] CRAN (R 4.2.1)
+##  mgcv         * 1.8-42   2023-03-02 [1] CRAN (R 4.2.2)
+##  mime           0.12     2021-09-28 [1] CRAN (R 4.2.0)
+##  miniUI         0.1.1.1  2018-05-18 [1] CRAN (R 4.2.1)
+##  minqa          1.2.5    2022-10-19 [1] CRAN (R 4.2.2)
+##  multcomp       1.4-23   2023-03-09 [1] CRAN (R 4.2.2)
+##  munsell        0.5.0    2018-06-12 [1] CRAN (R 4.2.1)
+##  mvnfast        0.2.8    2023-02-23 [1] CRAN (R 4.2.2)
+##  mvtnorm        1.1-3    2021-10-08 [1] CRAN (R 4.2.0)
+##  nlme         * 3.1-162  2023-01-31 [2] CRAN (R 4.2.3)
+##  nloptr         2.0.3    2022-05-26 [1] CRAN (R 4.2.1)
+##  patchwork      1.1.2    2022-08-19 [1] CRAN (R 4.2.1)
+##  pillar         1.9.0    2023-03-22 [1] CRAN (R 4.2.3)
+##  pkgbuild       1.4.0    2022-11-27 [1] CRAN (R 4.2.2)
+##  pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 4.2.1)
+##  pkgload        1.3.2    2022-11-16 [1] CRAN (R 4.2.2)
+##  prettyunits    1.1.1    2020-01-24 [1] CRAN (R 4.2.1)
+##  processx       3.8.1    2023-04-18 [1] CRAN (R 4.2.3)
+##  profvis        0.3.7    2020-11-02 [1] CRAN (R 4.2.1)
+##  promises       1.2.0.1  2021-02-11 [1] CRAN (R 4.2.1)
+##  ps             1.7.5    2023-04-18 [1] CRAN (R 4.2.3)
+##  purrr        * 1.0.1    2023-01-10 [1] CRAN (R 4.2.2)
+##  R6             2.5.1    2021-08-19 [1] CRAN (R 4.2.1)
+##  Rcpp           1.0.10   2023-01-22 [1] CRAN (R 4.2.2)
+##  readr        * 2.1.4    2023-02-10 [1] CRAN (R 4.2.2)
+##  remotes        2.4.2    2021-11-30 [1] CRAN (R 4.2.1)
+##  rlang          1.1.1    2023-04-28 [1] CRAN (R 4.2.3)
+##  rmarkdown      2.21     2023-03-26 [1] CRAN (R 4.2.3)
+##  rprojroot      2.0.3    2022-04-02 [1] CRAN (R 4.2.1)
+##  rstudioapi     0.14     2022-08-22 [1] CRAN (R 4.2.1)
+##  sandwich       3.0-2    2022-06-15 [1] CRAN (R 4.2.1)
+##  sass           0.4.6    2023-05-03 [1] CRAN (R 4.2.3)
+##  scales       * 1.2.1    2022-08-20 [1] CRAN (R 4.2.1)
+##  sessioninfo    1.2.2    2021-12-06 [1] CRAN (R 4.2.1)
+##  shiny          1.7.4    2022-12-15 [1] CRAN (R 4.2.2)
+##  stringi        1.7.12   2023-01-11 [1] CRAN (R 4.2.2)
+##  stringr      * 1.5.0    2022-12-02 [1] CRAN (R 4.2.2)
+##  survival       3.5-5    2023-03-12 [1] CRAN (R 4.2.3)
+##  TH.data        1.1-2    2023-04-17 [1] CRAN (R 4.2.3)
+##  tibble       * 3.2.1    2023-03-20 [1] CRAN (R 4.2.3)
+##  tidyr        * 1.3.0    2023-01-24 [1] CRAN (R 4.2.2)
+##  tidyselect     1.2.0    2022-10-10 [1] CRAN (R 4.2.1)
+##  tidyverse    * 2.0.0    2023-02-22 [1] CRAN (R 4.2.2)
+##  timechange     0.2.0    2023-01-11 [1] CRAN (R 4.2.2)
+##  tzdb           0.4.0    2023-05-12 [1] CRAN (R 4.2.3)
+##  urlchecker     1.0.1    2021-11-30 [1] CRAN (R 4.2.1)
+##  usethis        2.1.6    2022-05-25 [1] CRAN (R 4.2.1)
+##  utf8           1.2.3    2023-01-31 [1] CRAN (R 4.2.2)
+##  vctrs          0.6.2    2023-04-19 [1] CRAN (R 4.2.3)
+##  visreg       * 2.7.0    2020-06-04 [1] CRAN (R 4.2.1)
+##  withr          2.5.0    2022-03-03 [1] CRAN (R 4.2.1)
+##  xfun           0.39     2023-04-20 [1] CRAN (R 4.2.3)
+##  xtable         1.8-4    2019-04-21 [1] CRAN (R 4.2.1)
+##  yaml           2.3.7    2023-01-23 [1] CRAN (R 4.2.2)
+##  zoo            1.8-12   2023-04-13 [1] CRAN (R 4.2.3)
+## 
+##  [1] C:/R/win-library/4.2
+##  [2] C:/Program Files/R/R-4.2.3/library
+## 
+## ──────────────────────────────────────────────────────────────────────────────
+
+
+

Import Data

+
# Define file path for processed data
+fp_data <- here("manuscript_synthesis/data/processed")
+
+# Import daily average chlorophyll data
+df_chla <- readRDS(file.path(fp_data, "chla_daily_avg_2013-2019.rds"))
+
+# Import daily average flow data
+df_flow <- readRDS(file.path(fp_data, "flow_daily_avg_2013-2019.rds"))
+
+
+

Prepare Data

+
# Create a vector for the factor order of StationCode
+sta_order <- c(
+  "RCS",
+  "RD22",
+  "I80",
+  "LIS",
+  "STTD",
+  "LIB",
+  "RVB"
+)
+
+# We will use LIS flow data as a proxy for STTD and RD22 flow data as a proxy
+  # for I-80
+df_flow_c <- df_flow %>% 
+  filter(StationCode %in% c("RD22", "LIS")) %>% 
+  mutate(StationCode = case_match(StationCode, "RD22" ~ "I80", "LIS" ~ "STTD")) %>% 
+  bind_rows(df_flow)
+
+# Prepare chlorophyll and flow data for exploration and analysis
+df_chla_c1 <- df_chla %>% 
+  # Remove RYI since its chlorophyll data is so sparse
+  filter(StationCode != "RYI") %>% 
+  # Join flow data to chlorophyll data
+  left_join(df_flow_c, by = join_by(StationCode, Date)) %>% 
+  # Remove all NA flow values
+  drop_na(FlowAvg) %>% 
+  mutate(
+    # Scale and log transform chlorophyll values
+    ChlaAvg_log = log(ChlaAvg * 1000),
+    # Add Year and DOY variables
+    Year = year(Date),
+    DOY = yday(Date),
+    # Apply factor order to StationCode
+    StationCode = factor(StationCode, levels = sta_order),
+    # Add a column for Year as a factor for the model
+    Year_fct = factor(Year)
+  ) %>% 
+  arrange(StationCode, Date)
+
+
+

Explore sample counts by Station

+
df_chla_c1 %>% 
+  summarize(
+    min_date = min(Date),
+    max_date = max(Date),
+    num_samples = n(),
+    .by = c(StationCode, Year)
+  ) %>% 
+  arrange(StationCode, Year) %>% 
+  kable()
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
StationCodeYearmin_datemax_datenum_samples
RCS20142014-07-242014-09-2459
RCS20152015-08-172015-09-2843
RCS20162016-07-122016-11-15105
RCS20172017-07-012017-11-15128
RCS20182018-07-102018-11-15129
RCS20192019-07-112019-11-15128
RD2220142014-09-302014-11-1547
RD2220152015-07-242015-11-10110
RD2220162016-07-012016-11-15138
RD2220172017-07-012017-11-15138
RD2220182018-07-052018-11-15134
RD2220192019-07-112019-11-15128
I8020142014-09-302014-11-1547
I8020152015-07-242015-11-1091
I8020162016-07-012016-11-15138
I8020172017-07-012017-11-15138
I8020182018-07-052018-11-15134
I8020192019-07-112019-11-15128
LIS20132013-07-012013-11-15111
LIS20142014-07-012014-11-15133
LIS20152015-07-012015-11-15133
LIS20162016-07-012016-11-15135
LIS20172017-07-012017-11-15129
LIS20182018-07-012018-11-15131
LIS20192019-07-052019-11-15122
STTD20132013-08-152013-11-0478
STTD20142014-07-242014-11-1595
STTD20152015-07-272015-11-15107
STTD20162016-07-012016-11-15135
STTD20172017-07-012017-11-1576
STTD20182018-07-012018-10-1191
STTD20192019-07-052019-11-15111
LIB20142014-10-012014-11-1240
LIB20152015-07-012015-11-15129
LIB20162016-07-012016-11-15135
LIB20172017-07-032017-11-13116
LIB20182018-08-142018-11-1539
LIB20192019-07-012019-10-2541
RVB20132013-07-012013-11-15138
RVB20142014-07-012014-11-15136
RVB20152015-07-012015-11-15138
RVB20162016-07-012016-11-15114
RVB20172017-07-012017-11-15138
RVB20182018-07-012018-11-15138
RVB20192019-07-012019-11-15121
+

Looking at the sample counts and date ranges, we’ll only include +years 2015-2019 for the analysis.

+
df_chla_c2 <- df_chla_c1 %>% 
+  filter(Year %in% 2015:2019) %>% 
+  mutate(Year_fct = fct_drop(Year_fct))
+
+
+

Plots

+

Let’s explore the data with some plots. First, lets plot the data in +scatterplots of chlorophyll and flow facetted by Station and grouping +all years together.

+
df_chla_c2 %>% 
+  ggplot(aes(x = FlowAvg, y = ChlaAvg)) +
+  geom_point() +
+  geom_smooth(formula = "y ~ x") +
+  facet_wrap(vars(StationCode), scales = "free") +
+  theme_bw()
+

+

At first glance, I’m not sure how well flow is going to be able to +predict chlorophyll concentrations. At the furthest upstream stations - +RCS, RD22, and I-80 - chlorophyll appears to be highest at the lowest +flows, but the variation is at its maximum at the lowest flows. There +may be some dilution effect going on here at the higher flows. At LIS +and STTD, there does seem to be a modest increase in chlorophyll +concentrations at the mid-range flows. This pattern is even more obvious +at LIB. There appears to be no effect of flow on chlorophyll at RVB, but +the range of chlorophyll concentrations is narrow at this station +(between 0 and 4).

+

Let’s break these scatterplots apart by year to see how these +patterns vary annually.

+
df_chla_c2 %>% 
+  ggplot(aes(x = FlowAvg, y = ChlaAvg)) +
+  geom_point() +
+  geom_smooth(formula = "y ~ x") +
+  facet_wrap(
+    vars(StationCode, Year), 
+    ncol = 5, 
+    scales = "free", 
+    labeller = labeller(.multi_line = FALSE)
+  ) +
+  theme_bw()
+

+

The patterns appear to vary annually at each station, which may +justify using a 3-way interaction. We’ll stick with the model using +2-way interactions for now.

+
+
+

GAM Model

+

We’ll try running a GAM including all two-way interactions between +Year, Daily Average Flow, and Station, and a smooth term for day of year +to account for seasonality. First we’ll run the GAM without accounting +for serial autocorrelation.

+
+

Initial Model

+
m_chla_gam <- gam(
+  ChlaAvg_log ~ (Year_fct + FlowAvg + StationCode)^2 + s(DOY), 
+  data = df_chla_c2,
+  method = "REML"
+)
+

Lets look at the model summary and diagnostics:

+
summary(m_chla_gam)
+
## 
+## Family: gaussian 
+## Link function: identity 
+## 
+## Formula:
+## ChlaAvg_log ~ (Year_fct + FlowAvg + StationCode)^2 + s(DOY)
+## 
+## Parametric coefficients:
+##                                Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                   9.345e+00  6.754e-02 138.373  < 2e-16 ***
+## Year_fct2016                  4.495e-02  7.282e-02   0.617 0.537079    
+## Year_fct2017                  1.404e-01  7.423e-02   1.891 0.058674 .  
+## Year_fct2018                  1.303e-01  7.055e-02   1.846 0.064946 .  
+## Year_fct2019                  5.097e-02  7.070e-02   0.721 0.470965    
+## FlowAvg                      -1.504e-03  1.060e-04 -14.184  < 2e-16 ***
+## StationCodeRD22               3.131e-01  7.773e-02   4.028 5.73e-05 ***
+## StationCodeI80                5.471e-01  7.956e-02   6.876 7.08e-12 ***
+## StationCodeLIS               -7.165e-01  7.533e-02  -9.512  < 2e-16 ***
+## StationCodeSTTD              -6.661e-01  7.700e-02  -8.650  < 2e-16 ***
+## StationCodeLIB               -1.739e+00  1.199e-01 -14.507  < 2e-16 ***
+## StationCodeRVB               -1.878e+00  1.131e-01 -16.596  < 2e-16 ***
+## Year_fct2016:FlowAvg         -4.568e-05  2.822e-05  -1.619 0.105529    
+## Year_fct2017:FlowAvg         -6.586e-05  2.593e-05  -2.540 0.011123 *  
+## Year_fct2018:FlowAvg         -3.120e-05  2.632e-05  -1.186 0.235796    
+## Year_fct2019:FlowAvg         -5.887e-05  2.666e-05  -2.208 0.027298 *  
+## Year_fct2016:StationCodeRD22 -6.730e-01  8.776e-02  -7.669 2.16e-14 ***
+## Year_fct2017:StationCodeRD22 -5.241e-01  8.951e-02  -5.856 5.12e-09 ***
+## Year_fct2018:StationCodeRD22 -4.729e-01  8.593e-02  -5.503 3.96e-08 ***
+## Year_fct2019:StationCodeRD22 -1.859e-01  8.623e-02  -2.156 0.031146 *  
+## Year_fct2016:StationCodeI80   3.992e-02  8.939e-02   0.447 0.655170    
+## Year_fct2017:StationCodeI80  -2.124e-01  9.109e-02  -2.332 0.019756 *  
+## Year_fct2018:StationCodeI80  -5.269e-01  8.763e-02  -6.013 1.99e-09 ***
+## Year_fct2019:StationCodeI80   4.941e-01  8.794e-02   5.618 2.06e-08 ***
+## Year_fct2016:StationCodeLIS   4.374e-01  8.662e-02   5.049 4.64e-07 ***
+## Year_fct2017:StationCodeLIS   3.475e-01  8.815e-02   3.942 8.23e-05 ***
+## Year_fct2018:StationCodeLIS  -1.694e-01  8.495e-02  -1.994 0.046168 *  
+## Year_fct2019:StationCodeLIS   4.356e-01  8.590e-02   5.071 4.14e-07 ***
+## Year_fct2016:StationCodeSTTD  1.353e-02  8.809e-02   0.154 0.877975    
+## Year_fct2017:StationCodeSTTD -3.708e-01  9.406e-02  -3.942 8.20e-05 ***
+## Year_fct2018:StationCodeSTTD -8.224e-01  8.956e-02  -9.183  < 2e-16 ***
+## Year_fct2019:StationCodeSTTD -1.156e+00  8.800e-02 -13.136  < 2e-16 ***
+## Year_fct2016:StationCodeLIB   3.420e-01  1.006e-01   3.399 0.000682 ***
+## Year_fct2017:StationCodeLIB  -1.307e-01  1.038e-01  -1.259 0.208266    
+## Year_fct2018:StationCodeLIB  -2.026e+00  1.202e-01 -16.858  < 2e-16 ***
+## Year_fct2019:StationCodeLIB  -8.146e-01  1.171e-01  -6.957 4.05e-12 ***
+## Year_fct2016:StationCodeRVB   1.467e-01  1.835e-01   0.800 0.424039    
+## Year_fct2017:StationCodeRVB   1.630e-02  1.798e-01   0.091 0.927751    
+## Year_fct2018:StationCodeRVB  -3.632e-01  1.566e-01  -2.319 0.020439 *  
+## Year_fct2019:StationCodeRVB  -4.883e-01  1.680e-01  -2.907 0.003667 ** 
+## FlowAvg:StationCodeRD22       6.743e-05  1.347e-04   0.501 0.616688    
+## FlowAvg:StationCodeI80       -5.232e-04  1.358e-04  -3.851 0.000119 ***
+## FlowAvg:StationCodeLIS        1.778e-03  1.360e-04  13.078  < 2e-16 ***
+## FlowAvg:StationCodeSTTD       3.382e-03  1.374e-04  24.621  < 2e-16 ***
+## FlowAvg:StationCodeLIB        1.452e-03  1.189e-04  12.214  < 2e-16 ***
+## FlowAvg:StationCodeRVB        1.544e-03  1.037e-04  14.884  < 2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##          edf Ref.df     F p-value    
+## s(DOY) 8.456  8.918 63.14  <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =  0.852   Deviance explained = 85.4%
+## -REML = 2068.9  Scale est. = 0.14737   n = 4089
+
appraise(m_chla_gam)
+

+
k.check(m_chla_gam)
+
##        k'      edf   k-index p-value
+## s(DOY)  9 8.456119 0.9165078       0
+
draw(m_chla_gam, select = 1, residuals = TRUE, rug = FALSE)
+

+
plot(m_chla_gam, pages = 1, all.terms = TRUE)
+

+
acf(residuals(m_chla_gam))
+

+
Box.test(residuals(m_chla_gam), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_chla_gam)
+## X-squared = 14644, df = 20, p-value < 2.2e-16
+
+
+

Model with k = 5

+

The model definitely has residual autocorrelation as indicated by the +ACF plot and the Box-Ljung test. Even though the p-value for the k-check +is zero, the smooth term for day of year appears to be overfitted. Let’s +try a smaller k-value for the smooth first, then lets try to address the +residual autocorrelation.

+
m_chla_gam_k5 <- gam(
+  ChlaAvg_log ~ (Year_fct + FlowAvg + StationCode)^2 + s(DOY, k = 5), 
+  data = df_chla_c2,
+  method = "REML"
+)
+
+summary(m_chla_gam_k5)
+
## 
+## Family: gaussian 
+## Link function: identity 
+## 
+## Formula:
+## ChlaAvg_log ~ (Year_fct + FlowAvg + StationCode)^2 + s(DOY, k = 5)
+## 
+## Parametric coefficients:
+##                                Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                   9.346e+00  6.793e-02 137.580  < 2e-16 ***
+## Year_fct2016                  4.038e-02  7.333e-02   0.551 0.581894    
+## Year_fct2017                  1.397e-01  7.474e-02   1.869 0.061631 .  
+## Year_fct2018                  1.308e-01  7.106e-02   1.841 0.065644 .  
+## Year_fct2019                  5.139e-02  7.120e-02   0.722 0.470500    
+## FlowAvg                      -1.477e-03  1.047e-04 -14.107  < 2e-16 ***
+## StationCodeRD22               3.093e-01  7.826e-02   3.953 7.86e-05 ***
+## StationCodeI80                5.437e-01  8.004e-02   6.793 1.26e-11 ***
+## StationCodeLIS               -7.177e-01  7.582e-02  -9.467  < 2e-16 ***
+## StationCodeSTTD              -6.715e-01  7.752e-02  -8.662  < 2e-16 ***
+## StationCodeLIB               -1.694e+00  1.206e-01 -14.056  < 2e-16 ***
+## StationCodeRVB               -1.916e+00  1.139e-01 -16.827  < 2e-16 ***
+## Year_fct2016:FlowAvg         -4.934e-05  2.841e-05  -1.737 0.082517 .  
+## Year_fct2017:FlowAvg         -7.658e-05  2.608e-05  -2.936 0.003339 ** 
+## Year_fct2018:FlowAvg         -4.301e-05  2.647e-05  -1.625 0.104294    
+## Year_fct2019:FlowAvg         -7.045e-05  2.683e-05  -2.626 0.008671 ** 
+## Year_fct2016:StationCodeRD22 -6.673e-01  8.840e-02  -7.549 5.38e-14 ***
+## Year_fct2017:StationCodeRD22 -5.218e-01  9.016e-02  -5.787 7.71e-09 ***
+## Year_fct2018:StationCodeRD22 -4.684e-01  8.655e-02  -5.412 6.61e-08 ***
+## Year_fct2019:StationCodeRD22 -1.803e-01  8.686e-02  -2.076 0.037951 *  
+## Year_fct2016:StationCodeI80   4.558e-02  8.997e-02   0.507 0.612469    
+## Year_fct2017:StationCodeI80  -2.103e-01  9.171e-02  -2.293 0.021872 *  
+## Year_fct2018:StationCodeI80  -5.223e-01  8.821e-02  -5.921 3.47e-09 ***
+## Year_fct2019:StationCodeI80   4.999e-01  8.852e-02   5.647 1.75e-08 ***
+## Year_fct2016:StationCodeLIS   4.401e-01  8.726e-02   5.043 4.77e-07 ***
+## Year_fct2017:StationCodeLIS   3.492e-01  8.879e-02   3.933 8.54e-05 ***
+## Year_fct2018:StationCodeLIS  -1.682e-01  8.557e-02  -1.965 0.049449 *  
+## Year_fct2019:StationCodeLIS   4.334e-01  8.650e-02   5.010 5.67e-07 ***
+## Year_fct2016:StationCodeSTTD  2.014e-02  8.874e-02   0.227 0.820492    
+## Year_fct2017:StationCodeSTTD -3.685e-01  9.471e-02  -3.891 0.000102 ***
+## Year_fct2018:StationCodeSTTD -8.279e-01  9.019e-02  -9.179  < 2e-16 ***
+## Year_fct2019:StationCodeSTTD -1.159e+00  8.861e-02 -13.077  < 2e-16 ***
+## Year_fct2016:StationCodeLIB   3.343e-01  1.013e-01   3.299 0.000979 ***
+## Year_fct2017:StationCodeLIB  -1.535e-01  1.046e-01  -1.468 0.142108    
+## Year_fct2018:StationCodeLIB  -2.057e+00  1.209e-01 -17.010  < 2e-16 ***
+## Year_fct2019:StationCodeLIB  -8.413e-01  1.178e-01  -7.145 1.07e-12 ***
+## Year_fct2016:StationCodeRVB   1.381e-01  1.847e-01   0.748 0.454607    
+## Year_fct2017:StationCodeRVB   7.168e-02  1.808e-01   0.396 0.691845    
+## Year_fct2018:StationCodeRVB  -3.055e-01  1.575e-01  -1.939 0.052555 .  
+## Year_fct2019:StationCodeRVB  -4.268e-01  1.690e-01  -2.525 0.011620 *  
+## FlowAvg:StationCodeRD22       5.488e-05  1.357e-04   0.404 0.685954    
+## FlowAvg:StationCodeI80       -5.399e-04  1.368e-04  -3.945 8.12e-05 ***
+## FlowAvg:StationCodeLIS        1.775e-03  1.370e-04  12.958  < 2e-16 ***
+## FlowAvg:StationCodeSTTD       3.390e-03  1.384e-04  24.498  < 2e-16 ***
+## FlowAvg:StationCodeLIB        1.455e-03  1.177e-04  12.359  < 2e-16 ***
+## FlowAvg:StationCodeRVB        1.526e-03  1.024e-04  14.901  < 2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##          edf Ref.df     F p-value    
+## s(DOY) 3.593  3.901 125.7  <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =   0.85   Deviance explained = 85.2%
+## -REML = 2089.1  Scale est. = 0.14973   n = 4089
+
appraise(m_chla_gam_k5)
+

+
k.check(m_chla_gam_k5)
+
##        k'      edf   k-index p-value
+## s(DOY)  4 3.593062 0.9011268       0
+
draw(m_chla_gam_k5, select = 1, residuals = TRUE, rug = FALSE)
+

+
plot(m_chla_gam_k5, pages = 1, all.terms = TRUE)
+

+
acf(residuals(m_chla_gam_k5))
+

+
Box.test(residuals(m_chla_gam_k5), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_chla_gam_k5)
+## X-squared = 14647, df = 20, p-value < 2.2e-16
+

Changing the k-value to 5 seems to help reduce the “wiggliness” of +the smooth term for DOY, but the p-value for the k-check is still zero. +Despite this, lets proceed with a k-value of 5.

+
+
+

Model with AR() correlation structure

+

Now, we’ll try to deal with the residual autocorrelation. We’ll run +AR(1), AR(3), and AR(4) models and compare them using AIC. It wasn’t +possible running an AR(2) model.

+
# Define model formula as an object
+f_chla_gam_k5 <- as.formula("ChlaAvg_log ~ (Year_fct + FlowAvg + StationCode)^2 + s(DOY, k = 5)")
+
+# Fit original model with k = 5 and three successive AR(p) models
+m_chla_gamm_k5 <- gamm(
+  f_chla_gam_k5, 
+  data = df_chla_c2,
+  method = "REML"
+)
+
+m_chla_gamm_k5_ar1 <- gamm(
+  f_chla_gam_k5, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 1), # grouped by Year_fct and StationCode
+  method = "REML"
+)
+
+m_chla_gamm_k5_ar3 <- gamm(
+  f_chla_gam_k5, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 3),
+  method = "REML"
+)
+
+m_chla_gamm_k5_ar4 <- gamm(
+  f_chla_gam_k5, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 4),
+  method = "REML"
+)
+
+# Compare models
+anova(
+  m_chla_gamm_k5$lme, 
+  m_chla_gamm_k5_ar1$lme, 
+  m_chla_gamm_k5_ar3$lme, 
+  m_chla_gamm_k5_ar4$lme
+)
+
##                        Model df       AIC       BIC    logLik   Test  L.Ratio
+## m_chla_gamm_k5$lme         1 49  4276.257  4585.177 -2089.128                
+## m_chla_gamm_k5_ar1$lme     2 50 -2839.254 -2524.029  1469.627 1 vs 2 7117.511
+## m_chla_gamm_k5_ar3$lme     3 52 -2871.074 -2543.240  1487.537 2 vs 3   35.820
+## m_chla_gamm_k5_ar4$lme     4 53 -2870.040 -2535.902  1488.020 3 vs 4    0.966
+##                        p-value
+## m_chla_gamm_k5$lme            
+## m_chla_gamm_k5_ar1$lme  <.0001
+## m_chla_gamm_k5_ar3$lme  <.0001
+## m_chla_gamm_k5_ar4$lme  0.3257
+

It looks like the AR(3) model has the best fit according to the AIC +values and backed up by the BIC values. Let’s take a closer look at that +one.

+
+
+

AR(3) Model

+
# Pull out the residuals and the GAM model
+resid_ar3 <- residuals(m_chla_gamm_k5_ar3$lme, type = "normalized")
+m_chla_gamm_k5_ar3_gam <- m_chla_gamm_k5_ar3$gam
+
+summary(m_chla_gamm_k5_ar3_gam)
+
## 
+## Family: gaussian 
+## Link function: identity 
+## 
+## Formula:
+## ChlaAvg_log ~ (Year_fct + FlowAvg + StationCode)^2 + s(DOY, k = 5)
+## 
+## Parametric coefficients:
+##                                Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                   9.130e+00  2.732e-01  33.417  < 2e-16 ***
+## Year_fct2016                  1.461e-01  3.335e-01   0.438 0.661424    
+## Year_fct2017                  2.770e-01  3.262e-01   0.849 0.395886    
+## Year_fct2018                  2.257e-01  3.245e-01   0.696 0.486740    
+## Year_fct2019                  1.124e-01  3.249e-01   0.346 0.729430    
+## FlowAvg                      -2.224e-04  1.547e-04  -1.438 0.150648    
+## StationCodeRD22               3.883e-01  3.337e-01   1.164 0.244559    
+## StationCodeI80                7.753e-01  3.429e-01   2.261 0.023832 *  
+## StationCodeLIS               -5.168e-01  3.253e-01  -1.589 0.112210    
+## StationCodeSTTD              -4.315e-01  3.344e-01  -1.291 0.196911    
+## StationCodeLIB               -1.296e+00  3.338e-01  -3.881 0.000105 ***
+## StationCodeRVB               -1.480e+00  3.267e-01  -4.531 6.03e-06 ***
+## Year_fct2016:FlowAvg         -1.031e-05  1.932e-05  -0.534 0.593709    
+## Year_fct2017:FlowAvg         -1.169e-05  1.859e-05  -0.629 0.529421    
+## Year_fct2018:FlowAvg         -6.940e-06  1.947e-05  -0.356 0.721496    
+## Year_fct2019:FlowAvg         -3.020e-05  1.902e-05  -1.588 0.112291    
+## Year_fct2016:StationCodeRD22 -6.668e-01  4.210e-01  -1.584 0.113333    
+## Year_fct2017:StationCodeRD22 -4.765e-01  4.155e-01  -1.147 0.251449    
+## Year_fct2018:StationCodeRD22 -4.998e-01  4.148e-01  -1.205 0.228320    
+## Year_fct2019:StationCodeRD22 -2.484e-01  4.166e-01  -0.596 0.551077    
+## Year_fct2016:StationCodeI80  -1.773e-01  4.282e-01  -0.414 0.678805    
+## Year_fct2017:StationCodeI80  -3.613e-01  4.229e-01  -0.854 0.392940    
+## Year_fct2018:StationCodeI80  -7.238e-01  4.221e-01  -1.715 0.086463 .  
+## Year_fct2019:StationCodeI80   3.897e-01  4.239e-01   0.919 0.357942    
+## Year_fct2016:StationCodeLIS   3.748e-01  4.155e-01   0.902 0.367165    
+## Year_fct2017:StationCodeLIS   2.243e-01  4.110e-01   0.546 0.585290    
+## Year_fct2018:StationCodeLIS  -2.799e-01  4.093e-01  -0.684 0.494040    
+## Year_fct2019:StationCodeLIS   3.448e-01  4.124e-01   0.836 0.403086    
+## Year_fct2016:StationCodeSTTD -1.322e-01  4.227e-01  -0.313 0.754547    
+## Year_fct2017:StationCodeSTTD -7.055e-01  4.382e-01  -1.610 0.107497    
+## Year_fct2018:StationCodeSTTD -9.840e-01  4.303e-01  -2.287 0.022269 *  
+## Year_fct2019:StationCodeSTTD -1.116e+00  4.228e-01  -2.639 0.008354 ** 
+## Year_fct2016:StationCodeLIB   2.350e-01  4.178e-01   0.562 0.573858    
+## Year_fct2017:StationCodeLIB  -2.961e-01  4.173e-01  -0.710 0.478048    
+## Year_fct2018:StationCodeLIB  -2.245e+00  4.661e-01  -4.815 1.52e-06 ***
+## Year_fct2019:StationCodeLIB  -9.874e-01  4.630e-01  -2.133 0.033007 *  
+## Year_fct2016:StationCodeRVB  -1.185e-01  4.375e-01  -0.271 0.786513    
+## Year_fct2017:StationCodeRVB  -5.126e-01  4.339e-01  -1.181 0.237593    
+## Year_fct2018:StationCodeRVB  -5.631e-01  4.235e-01  -1.330 0.183685    
+## Year_fct2019:StationCodeRVB  -6.985e-01  4.300e-01  -1.625 0.104317    
+## FlowAvg:StationCodeRD22      -3.956e-04  2.333e-04  -1.695 0.090093 .  
+## FlowAvg:StationCodeI80       -1.237e-03  2.612e-04  -4.734 2.28e-06 ***
+## FlowAvg:StationCodeLIS        1.973e-05  2.622e-04   0.075 0.940026    
+## FlowAvg:StationCodeSTTD       1.383e-03  2.651e-04   5.217 1.91e-07 ***
+## FlowAvg:StationCodeLIB        3.159e-04  1.599e-04   1.976 0.048254 *  
+## FlowAvg:StationCodeRVB        2.277e-04  1.545e-04   1.473 0.140752    
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##          edf Ref.df     F p-value    
+## s(DOY) 2.055  2.055 20.45  <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =  0.829   
+##   Scale est. = 0.19838   n = 4089
+
appraise(m_chla_gamm_k5_ar3_gam)
+

+
k.check(m_chla_gamm_k5_ar3_gam)
+
##        k'      edf   k-index p-value
+## s(DOY)  4 2.054539 0.7920177       0
+
draw(m_chla_gamm_k5_ar3_gam, select = 1, residuals = TRUE, rug = FALSE)
+

+
plot(m_chla_gamm_k5_ar3_gam, pages = 1, all.terms = TRUE)
+

+
acf(resid_ar3)
+

+
Box.test(resid_ar3, lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  resid_ar3
+## X-squared = 28.801, df = 20, p-value = 0.09176
+

The AR(3) model has much less residual autocorrelation, and the +diagnostics plots look pretty good. What does the ANOVA table look +like?

+
+

ANOVA table

+
# the anova.gam function is similar to a type 3 ANOVA
+anova(m_chla_gamm_k5_ar3_gam)
+
## 
+## Family: gaussian 
+## Link function: identity 
+## 
+## Formula:
+## ChlaAvg_log ~ (Year_fct + FlowAvg + StationCode)^2 + s(DOY, k = 5)
+## 
+## Parametric Terms:
+##                      df      F p-value
+## Year_fct              4  0.240   0.916
+## FlowAvg               1  2.066   0.151
+## StationCode           6 18.436 < 2e-16
+## Year_fct:FlowAvg      4  0.649   0.627
+## Year_fct:StationCode 24  3.885 5.7e-10
+## FlowAvg:StationCode   6 16.205 < 2e-16
+## 
+## Approximate significance of smooth terms:
+##          edf Ref.df     F p-value
+## s(DOY) 2.055  2.055 20.45  <2e-16
+

The main effect of StationCode and the Year:StationCode and +Flow:StationCode interactions were significant among the parametric +terms of the model. Let’s visualize the effect of flow on chlorophyll +concentrations for each station.

+
+
+

Effect of flow for each station

+
# Add the data back to the gam object so it works with visreg
+m_chla_gamm_k5_ar3_gam$data <- df_chla_c2
+
+# Visualize the effect of flow on chlorophyll concentrations for each station
+  # according to the model
+# First facetted by station
+visreg(m_chla_gamm_k5_ar3_gam, xvar = "FlowAvg", by = "StationCode", gg = TRUE) + 
+  facet_wrap(vars(StationCode), scales = "free") +
+  theme_bw() +
+  ylab("log(Chlorophyll x 1000)") +
+  xlab("Daily Average Flow (cfs)")
+

+
# All stations overlayed
+visreg(m_chla_gamm_k5_ar3_gam, xvar = "FlowAvg", by = "StationCode", overlay = TRUE, gg = TRUE) +
+  theme_bw() +
+  ylab("log(Chlorophyll x 1000)") +
+  xlab("Daily Average Flow (cfs)")
+

+

It looks like the slopes for the chlorophyll vs flow relationships +are significantly different than zero for RD22, I-80, and STTD. RD22 and +I-80 both have negative slopes while STTD has a positive slope. It’s +difficult to really see this though due to the x-axis scale on these +plots and the significant Flow:StationCode interaction, so it may be +interesting to run separate models for each of these stations. In +addition, I’m somewhat unsure what results visreg is +showing in these plots. These are conditional plots but they may be +displaying only a subset of the model results.

+
+
+

Pairwise contrasts

+

The Year:StationCode interaction was also significant, so let’s take +a closer look at its pairwise contrasts.

+
# Contrasts in Station main effect
+emmeans(
+  m_chla_gamm_k5_ar3,
+  data = df_chla_c2,
+  specs = pairwise ~ StationCode
+)
+
## $emmeans
+##  StationCode emmean     SE   df lower.CL upper.CL
+##  RCS           8.99 0.1803 4041     8.64     9.34
+##  RD22          8.55 0.2005 4041     8.16     8.95
+##  I80           8.19 0.2354 4041     7.73     8.65
+##  LIS           8.63 0.2511 4041     8.14     9.12
+##  STTD          9.54 0.2566 4041     9.04    10.04
+##  LIB           7.40 0.1375 4041     7.13     7.67
+##  RVB           7.39 0.0957 4041     7.20     7.58
+## 
+## Results are averaged over the levels of: Year_fct 
+## Confidence level used: 0.95 
+## 
+## $contrasts
+##  contrast    estimate    SE   df t.ratio p.value
+##  RCS - RD22   0.43867 0.268 4041   1.636  0.6587
+##  RCS - I80    0.80199 0.296 4041   2.713  0.0952
+##  RCS - LIS    0.36161 0.308 4041   1.174  0.9039
+##  RCS - STTD  -0.54979 0.312 4041  -1.762  0.5741
+##  RCS - LIB    1.59578 0.224 4041   7.123  <.0001
+##  RCS - RVB    1.60074 0.201 4041   7.957  <.0001
+##  RD22 - I80   0.36332 0.308 4041   1.178  0.9026
+##  RD22 - LIS  -0.07706 0.320 4041  -0.241  1.0000
+##  RD22 - STTD -0.98846 0.324 4041  -3.047  0.0376
+##  RD22 - LIB   1.15711 0.241 4041   4.801  <.0001
+##  RD22 - RVB   1.16207 0.220 4041   5.282  <.0001
+##  I80 - LIS   -0.44038 0.344 4041  -1.282  0.8605
+##  I80 - STTD  -1.35178 0.347 4041  -3.891  0.0020
+##  I80 - LIB    0.79379 0.271 4041   2.925  0.0537
+##  I80 - RVB    0.79875 0.253 4041   3.162  0.0264
+##  LIS - STTD  -0.91140 0.358 4041  -2.546  0.1434
+##  LIS - LIB    1.23417 0.285 4041   4.333  0.0003
+##  LIS - RVB    1.23913 0.267 4041   4.640  0.0001
+##  STTD - LIB   2.14557 0.289 4041   7.421  <.0001
+##  STTD - RVB   2.15053 0.272 4041   7.917  <.0001
+##  LIB - RVB    0.00496 0.163 4041   0.031  1.0000
+## 
+## Results are averaged over the levels of: Year_fct 
+## P value adjustment: tukey method for comparing a family of 7 estimates
+
# Contrasts in Station for each Year
+multcomp::cld(
+  emmeans(
+    m_chla_gamm_k5_ar3, 
+    data = df_chla_c2,
+    specs = pairwise ~ StationCode | Year_fct
+  )$emmeans,
+  sort = FALSE,
+  Letters = letters
+)
+
## Year_fct = 2015:
+##  StationCode emmean    SE   df lower.CL upper.CL .group
+##  RCS           8.85 0.302 4041     8.26     9.45  ab   
+##  RD22          8.79 0.261 4041     8.28     9.30  a    
+##  I80           8.23 0.298 4041     7.64     8.81  a c  
+##  LIS           8.36 0.301 4041     7.77     8.95  a c  
+##  STTD          9.99 0.312 4041     9.38    10.60   b   
+##  LIB           7.92 0.215 4041     7.49     8.34  a c  
+##  RVB           7.63 0.178 4041     7.28     7.98    c  
+## 
+## Year_fct = 2016:
+##  StationCode emmean    SE   df lower.CL upper.CL .group
+##  RCS           8.99 0.250 4041     8.50     9.48  ab   
+##  RD22          8.26 0.254 4041     7.76     8.76  a c  
+##  I80           8.18 0.282 4041     7.63     8.74  a c  
+##  LIS           8.87 0.295 4041     8.29     9.45  ab   
+##  STTD          9.99 0.299 4041     9.41    10.58   b   
+##  LIB           8.28 0.205 4041     7.88     8.69  a c  
+##  RVB           7.65 0.215 4041     7.22     8.07    c  
+## 
+## Year_fct = 2017:
+##  StationCode emmean    SE   df lower.CL upper.CL .group
+##  RCS           9.12 0.248 4041     8.63     9.60  ab   
+##  RD22          8.58 0.262 4041     8.07     9.09  abc  
+##  I80           8.13 0.292 4041     7.55     8.70  a cd 
+##  LIS           8.85 0.304 4041     8.25     9.44  abc  
+##  STTD          9.55 0.333 4041     8.89    10.20   b   
+##  LIB           7.88 0.209 4041     7.47     8.29    cd 
+##  RVB           7.38 0.217 4041     6.96     7.81     d 
+## 
+## Year_fct = 2018:
+##  StationCode emmean    SE   df lower.CL upper.CL .group
+##  RCS           9.07 0.237 4041     8.61     9.54  a    
+##  RD22          8.51 0.253 4041     8.01     9.01  ab   
+##  I80           7.72 0.280 4041     7.17     8.27   bc  
+##  LIS           8.30 0.297 4041     7.71     8.88  abc  
+##  STTD          9.22 0.316 4041     8.60     9.84  a    
+##  LIB           5.89 0.290 4041     5.32     6.46     d 
+##  RVB           7.29 0.200 4041     6.89     7.68    c  
+## 
+## Year_fct = 2019:
+##  StationCode emmean    SE   df lower.CL upper.CL .group
+##  RCS           8.93 0.239 4041     8.46     9.40  a    
+##  RD22          8.62 0.253 4041     8.13     9.12  a    
+##  I80           8.69 0.279 4041     8.15     9.24  a    
+##  LIS           8.78 0.290 4041     8.21     9.35  a    
+##  STTD          8.95 0.297 4041     8.37     9.54  a    
+##  LIB           7.01 0.289 4041     6.44     7.57   b   
+##  RVB           7.01 0.213 4041     6.59     7.43   b   
+## 
+## Confidence level used: 0.95 
+## P value adjustment: tukey method for comparing a family of 7 estimates 
+## significance level used: alpha = 0.05 
+## NOTE: If two or more means share the same grouping symbol,
+##       then we cannot show them to be different.
+##       But we also did not show them to be the same.
+

Comparing Stations in each Year, STTD stood out as being higher than +most other stations in 2015-2018, while LIB and RVB where lower than +some of the upstream stations during all years. The model using a +categorical predictor for flow action period didn’t show that STTD was +higher than most other stations, but it did show that LIB and RVB were +lower than most of the upstream stations. The differences in results +between the two models could be because of the continuous predictor of +flow and how it controls the chlorophyll concentrations.

+
+
+
+
+

Conclusion

+

These results seem mixed to me. It looks like daily average flow has +an effect on chlorophyll concentrations at some of the stations, but +flow doesn’t seem to be a good predictor overall. It may be interesting +to run separate models for a few of the stations - RD22, I-80, and STTD +- that appeared to have slopes that were significantly different than +zero.

+
+ + + +
+
+ +
+ + + + + + + + + + + + + + + + + diff --git a/manuscript_synthesis/notebooks/rtm_chl_gam_analysis_flow.Rmd b/manuscript_synthesis/notebooks/rtm_chl_gam_analysis_flow.Rmd new file mode 100644 index 0000000..6ed7fec --- /dev/null +++ b/manuscript_synthesis/notebooks/rtm_chl_gam_analysis_flow.Rmd @@ -0,0 +1,345 @@ +--- +title: "NDFS Synthesis Manuscript: Chlorophyll GAM analysis" +subtitle: "Daily average flow as continuous predictor" +author: "Dave Bosworth" +date: '`r format(Sys.Date(), "%B %d, %Y")`' +output: + html_document: + code_folding: show + toc: true + toc_float: + collapsed: false +editor_options: + chunk_output_type: console +knit: (function(input, ...) { + rmarkdown::render( + input, + output_dir = here::here("docs"), + envir = globalenv() + ) + }) +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +options(knitr.kable.NA = "") +``` + +# Purpose + +Explore and analyze the continuous chlorophyll data to be included in the NDFS synthesis manuscript. We will attempt to fit a generalized additive model (GAM) to the data set to help account for seasonality in the data. This is an extension of the original analysis which used a categorical predictor for flow action period. In this analysis we will replace this categorical predictor with daily average flow as a continuous predictor. + +# Global code and functions + +```{r load packages, message = FALSE, warning = FALSE} +# Load packages +library(tidyverse) +library(lubridate) +library(scales) +library(knitr) +library(mgcv) +library(lme4) +library(car) +library(emmeans) +library(gratia) +library(visreg) +library(here) +library(conflicted) + +# Declare package conflict preferences +conflicts_prefer(dplyr::filter()) +``` + +Display current versions of R and packages used for this analysis: + +```{r print session info} +devtools::session_info() +``` + +# Import Data + +```{r import data} +# Define file path for processed data +fp_data <- here("manuscript_synthesis/data/processed") + +# Import daily average chlorophyll data +df_chla <- readRDS(file.path(fp_data, "chla_daily_avg_2013-2019.rds")) + +# Import daily average flow data +df_flow <- readRDS(file.path(fp_data, "flow_daily_avg_2013-2019.rds")) +``` + +# Prepare Data + +```{r prepare chla data, message = FALSE} +# Create a vector for the factor order of StationCode +sta_order <- c( + "RCS", + "RD22", + "I80", + "LIS", + "STTD", + "LIB", + "RVB" +) + +# We will use LIS flow data as a proxy for STTD and RD22 flow data as a proxy + # for I-80 +df_flow_c <- df_flow %>% + filter(StationCode %in% c("RD22", "LIS")) %>% + mutate(StationCode = case_match(StationCode, "RD22" ~ "I80", "LIS" ~ "STTD")) %>% + bind_rows(df_flow) + +# Prepare chlorophyll and flow data for exploration and analysis +df_chla_c1 <- df_chla %>% + # Remove RYI since its chlorophyll data is so sparse + filter(StationCode != "RYI") %>% + # Join flow data to chlorophyll data + left_join(df_flow_c, by = join_by(StationCode, Date)) %>% + # Remove all NA flow values + drop_na(FlowAvg) %>% + mutate( + # Scale and log transform chlorophyll values + ChlaAvg_log = log(ChlaAvg * 1000), + # Add Year and DOY variables + Year = year(Date), + DOY = yday(Date), + # Apply factor order to StationCode + StationCode = factor(StationCode, levels = sta_order), + # Add a column for Year as a factor for the model + Year_fct = factor(Year) + ) %>% + arrange(StationCode, Date) +``` + +# Explore sample counts by Station + +```{r chla sample counts station} +df_chla_c1 %>% + summarize( + min_date = min(Date), + max_date = max(Date), + num_samples = n(), + .by = c(StationCode, Year) + ) %>% + arrange(StationCode, Year) %>% + kable() +``` + +Looking at the sample counts and date ranges, we'll only include years 2015-2019 for the analysis. + +```{r chla remove under sampled} +df_chla_c2 <- df_chla_c1 %>% + filter(Year %in% 2015:2019) %>% + mutate(Year_fct = fct_drop(Year_fct)) +``` + +# Plots + +Let's explore the data with some plots. First, lets plot the data in scatterplots of chlorophyll and flow facetted by Station and grouping all years together. + +```{r chla scatterplot all yrs, message = FALSE, fig.height = 7, fig.width = 7} +df_chla_c2 %>% + ggplot(aes(x = FlowAvg, y = ChlaAvg)) + + geom_point() + + geom_smooth(formula = "y ~ x") + + facet_wrap(vars(StationCode), scales = "free") + + theme_bw() +``` + +At first glance, I'm not sure how well flow is going to be able to predict chlorophyll concentrations. At the furthest upstream stations - RCS, RD22, and I-80 - chlorophyll appears to be highest at the lowest flows, but the variation is at its maximum at the lowest flows. There may be some dilution effect going on here at the higher flows. At LIS and STTD, there does seem to be a modest increase in chlorophyll concentrations at the mid-range flows. This pattern is even more obvious at LIB. There appears to be no effect of flow on chlorophyll at RVB, but the range of chlorophyll concentrations is narrow at this station (between 0 and 4). + +Let's break these scatterplots apart by year to see how these patterns vary annually. + +```{r chla scatterplot facet yrs, message = FALSE, fig.height = 11, fig.width = 8} +df_chla_c2 %>% + ggplot(aes(x = FlowAvg, y = ChlaAvg)) + + geom_point() + + geom_smooth(formula = "y ~ x") + + facet_wrap( + vars(StationCode, Year), + ncol = 5, + scales = "free", + labeller = labeller(.multi_line = FALSE) + ) + + theme_bw() +``` + +The patterns appear to vary annually at each station, which may justify using a 3-way interaction. We'll stick with the model using 2-way interactions for now. + +# GAM Model + +We'll try running a GAM including all two-way interactions between Year, Daily Average Flow, and Station, and a smooth term for day of year to account for seasonality. First we'll run the GAM without accounting for serial autocorrelation. + +## Initial Model + +```{r gam no autocorr, warning = FALSE} +m_chla_gam <- gam( + ChlaAvg_log ~ (Year_fct + FlowAvg + StationCode)^2 + s(DOY), + data = df_chla_c2, + method = "REML" +) +``` + +Lets look at the model summary and diagnostics: + +```{r gam no autocorr diag, warning = FALSE} +summary(m_chla_gam) +appraise(m_chla_gam) +k.check(m_chla_gam) +draw(m_chla_gam, select = 1, residuals = TRUE, rug = FALSE) +plot(m_chla_gam, pages = 1, all.terms = TRUE) +acf(residuals(m_chla_gam)) +Box.test(residuals(m_chla_gam), lag = 20, type = 'Ljung-Box') +``` + +## Model with k = 5 + +The model definitely has residual autocorrelation as indicated by the ACF plot and the Box-Ljung test. Even though the p-value for the k-check is zero, the smooth term for day of year appears to be overfitted. Let's try a smaller k-value for the smooth first, then lets try to address the residual autocorrelation. + +```{r gam no autocorr k5, warning = FALSE} +m_chla_gam_k5 <- gam( + ChlaAvg_log ~ (Year_fct + FlowAvg + StationCode)^2 + s(DOY, k = 5), + data = df_chla_c2, + method = "REML" +) + +summary(m_chla_gam_k5) +appraise(m_chla_gam_k5) +k.check(m_chla_gam_k5) +draw(m_chla_gam_k5, select = 1, residuals = TRUE, rug = FALSE) +plot(m_chla_gam_k5, pages = 1, all.terms = TRUE) +acf(residuals(m_chla_gam_k5)) +Box.test(residuals(m_chla_gam_k5), lag = 20, type = 'Ljung-Box') +``` + +Changing the k-value to 5 seems to help reduce the "wiggliness" of the smooth term for DOY, but the p-value for the k-check is still zero. Despite this, lets proceed with a k-value of 5. + +## Model with AR() correlation structure + +Now, we'll try to deal with the residual autocorrelation. We'll run AR(1), AR(3), and AR(4) models and compare them using AIC. It wasn't possible running an AR(2) model. + +```{r gam with AR comparison, warning = FALSE} +# Define model formula as an object +f_chla_gam_k5 <- as.formula("ChlaAvg_log ~ (Year_fct + FlowAvg + StationCode)^2 + s(DOY, k = 5)") + +# Fit original model with k = 5 and three successive AR(p) models +m_chla_gamm_k5 <- gamm( + f_chla_gam_k5, + data = df_chla_c2, + method = "REML" +) + +m_chla_gamm_k5_ar1 <- gamm( + f_chla_gam_k5, + data = df_chla_c2, + correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 1), # grouped by Year_fct and StationCode + method = "REML" +) + +m_chla_gamm_k5_ar3 <- gamm( + f_chla_gam_k5, + data = df_chla_c2, + correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 3), + method = "REML" +) + +m_chla_gamm_k5_ar4 <- gamm( + f_chla_gam_k5, + data = df_chla_c2, + correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 4), + method = "REML" +) + +# Compare models +anova( + m_chla_gamm_k5$lme, + m_chla_gamm_k5_ar1$lme, + m_chla_gamm_k5_ar3$lme, + m_chla_gamm_k5_ar4$lme +) +``` + +It looks like the AR(3) model has the best fit according to the AIC values and backed up by the BIC values. Let's take a closer look at that one. + +## AR(3) Model + +```{r gam ar3 diag, warning = FALSE} +# Pull out the residuals and the GAM model +resid_ar3 <- residuals(m_chla_gamm_k5_ar3$lme, type = "normalized") +m_chla_gamm_k5_ar3_gam <- m_chla_gamm_k5_ar3$gam + +summary(m_chla_gamm_k5_ar3_gam) +appraise(m_chla_gamm_k5_ar3_gam) +k.check(m_chla_gamm_k5_ar3_gam) +draw(m_chla_gamm_k5_ar3_gam, select = 1, residuals = TRUE, rug = FALSE) +plot(m_chla_gamm_k5_ar3_gam, pages = 1, all.terms = TRUE) +acf(resid_ar3) +Box.test(resid_ar3, lag = 20, type = 'Ljung-Box') +``` + +The AR(3) model has much less residual autocorrelation, and the diagnostics plots look pretty good. What does the ANOVA table look like? + +### ANOVA table + +```{r gam ar3 anova, warning = FALSE} +# the anova.gam function is similar to a type 3 ANOVA +anova(m_chla_gamm_k5_ar3_gam) +``` + +The main effect of StationCode and the Year:StationCode and Flow:StationCode interactions were significant among the parametric terms of the model. Let's visualize the effect of flow on chlorophyll concentrations for each station. + +### Effect of flow for each station + +```{r gam ar3 visualize, warning = FALSE} +# Add the data back to the gam object so it works with visreg +m_chla_gamm_k5_ar3_gam$data <- df_chla_c2 + +# Visualize the effect of flow on chlorophyll concentrations for each station + # according to the model +# First facetted by station +visreg(m_chla_gamm_k5_ar3_gam, xvar = "FlowAvg", by = "StationCode", gg = TRUE) + + facet_wrap(vars(StationCode), scales = "free") + + theme_bw() + + ylab("log(Chlorophyll x 1000)") + + xlab("Daily Average Flow (cfs)") + +# All stations overlayed +visreg(m_chla_gamm_k5_ar3_gam, xvar = "FlowAvg", by = "StationCode", overlay = TRUE, gg = TRUE) + + theme_bw() + + ylab("log(Chlorophyll x 1000)") + + xlab("Daily Average Flow (cfs)") +``` + +It looks like the slopes for the chlorophyll vs flow relationships are significantly different than zero for RD22, I-80, and STTD. RD22 and I-80 both have negative slopes while STTD has a positive slope. It's difficult to really see this though due to the x-axis scale on these plots and the significant Flow:StationCode interaction, so it may be interesting to run separate models for each of these stations. In addition, I'm somewhat unsure what results `visreg` is showing in these plots. These are conditional plots but they may be displaying only a subset of the model results. + +### Pairwise contrasts + +The Year:StationCode interaction was also significant, so let's take a closer look at its pairwise contrasts. + +```{r gam ar3 contrasts, warning = FALSE} +# Contrasts in Station main effect +emmeans( + m_chla_gamm_k5_ar3, + data = df_chla_c2, + specs = pairwise ~ StationCode +) + +# Contrasts in Station for each Year +multcomp::cld( + emmeans( + m_chla_gamm_k5_ar3, + data = df_chla_c2, + specs = pairwise ~ StationCode | Year_fct + )$emmeans, + sort = FALSE, + Letters = letters +) +``` + +Comparing Stations in each Year, STTD stood out as being higher than most other stations in 2015-2018, while LIB and RVB where lower than some of the upstream stations during all years. The model using a categorical predictor for flow action period didn't show that STTD was higher than most other stations, but it did show that LIB and RVB were lower than most of the upstream stations. The differences in results between the two models could be because of the continuous predictor of flow and how it controls the chlorophyll concentrations. + +# Conclusion + +These results seem mixed to me. It looks like daily average flow has an effect on chlorophyll concentrations at some of the stations, but flow doesn't seem to be a good predictor overall. It may be interesting to run separate models for a few of the stations - RD22, I-80, and STTD - that appeared to have slopes that were significantly different than zero. +