diff --git a/docs/index.md b/docs/index.md index c9357ea..c51d6d3 100644 --- a/docs/index.md +++ b/docs/index.md @@ -8,7 +8,7 @@ layout: default * [GAM Model using Categorical Predictors - Initial analysis](rtm_chl_gam_categorical_initial.html) * [Models using Categorical Predictors - Revised analysis](rtm_chl_models_categorical_revised.html) -* [Models using Flow as a Continuous Predictor - Daily Averages](rtm_chl_models_flow.html) +* [Models using Flow as a Continuous Predictor - Daily Averages](rtm_chl_models_flow_daily_avg.html) * [Models using Flow as a Continuous Predictor - Weekly Averages](rtm_chl_models_flow_weekly_avg.html) * [Explore relationship between continuous chlorophyll and percent flow pulse water](rtm_chl_analysis_perc_flow_pulse.html) diff --git a/docs/rtm_chl_models_flow.html b/docs/rtm_chl_models_flow.html deleted file mode 100644 index 6fd0ba6..0000000 --- a/docs/rtm_chl_models_flow.html +++ /dev/null @@ -1,3659 +0,0 @@ - - - - - - - - - - - - - - - -NDFS Synthesis Manuscript: Chlorophyll analysis - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - -
-
-
-
-
- -
- - - - - - - -
-

Purpose

-

Explore and analyze the continuous chlorophyll data to be included in -the NDFS synthesis manuscript. We will attempt to fit various models to -the data set using daily average flow as a continuous predictor which -replaces the categorical predictor - flow action period - in the -original analysis. These models will only include representative -stations for 4 habitat types - upstream (RD22), lower Yolo Bypass -(STTD), Cache Slough complex (LIB), and downstream (RVB).

-
-
-

Global code and functions

-
# Load packages
-library(tidyverse)
-library(scales)
-library(knitr)
-library(mgcv)
-library(car)
-library(gratia)
-library(here)
-library(conflicted)
-
-# Source functions
-source(here("manuscript_synthesis/src/global_functions.R"))
-
-# Declare package conflict preferences 
-conflicts_prefer(dplyr::filter(), dplyr::lag())
-

Display current versions of R and packages used for this -analysis:

-
devtools::session_info()
-
## ─ Session info ───────────────────────────────────────────────────────────────
-##  setting  value
-##  version  R version 4.2.3 (2023-03-15 ucrt)
-##  os       Windows 10 x64 (build 19044)
-##  system   x86_64, mingw32
-##  ui       RTerm
-##  language (EN)
-##  collate  English_United States.utf8
-##  ctype    English_United States.utf8
-##  tz       America/Los_Angeles
-##  date     2023-11-21
-##  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
-## 
-## ─ Packages ───────────────────────────────────────────────────────────────────
-##  package     * version date (UTC) lib source
-##  abind         1.4-5   2016-07-21 [1] CRAN (R 4.2.0)
-##  bslib         0.4.2   2022-12-16 [1] CRAN (R 4.2.2)
-##  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.2.3)
-##  callr         3.7.3   2022-11-02 [1] CRAN (R 4.2.2)
-##  car         * 3.1-2   2023-03-30 [1] CRAN (R 4.2.3)
-##  carData     * 3.0-5   2022-01-06 [1] CRAN (R 4.2.1)
-##  cli           3.6.1   2023-03-23 [1] CRAN (R 4.2.3)
-##  colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.2.2)
-##  conflicted  * 1.2.0   2023-02-01 [1] CRAN (R 4.2.2)
-##  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.2.1)
-##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.2.1)
-##  digest        0.6.33  2023-07-07 [1] CRAN (R 4.2.3)
-##  dplyr       * 1.1.3   2023-09-03 [1] CRAN (R 4.2.3)
-##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.1)
-##  evaluate      0.21    2023-05-05 [1] CRAN (R 4.2.3)
-##  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.2)
-##  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.2.2)
-##  forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.2.2)
-##  fs          * 1.6.3   2023-07-20 [1] CRAN (R 4.2.3)
-##  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.1)
-##  ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.2.3)
-##  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.1)
-##  gratia      * 0.8.1   2023-02-02 [1] CRAN (R 4.2.2)
-##  gtable        0.3.4   2023-08-21 [1] CRAN (R 4.2.3)
-##  here        * 1.0.1   2020-12-13 [1] CRAN (R 4.2.1)
-##  hms           1.1.3   2023-03-21 [1] CRAN (R 4.2.3)
-##  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.2.3)
-##  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.2.3)
-##  httpuv        1.6.9   2023-02-14 [1] CRAN (R 4.2.2)
-##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.2.1)
-##  jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.2.3)
-##  knitr       * 1.42    2023-01-25 [1] CRAN (R 4.2.2)
-##  later         1.3.0   2021-08-18 [1] CRAN (R 4.2.1)
-##  lattice       0.21-8  2023-04-05 [1] CRAN (R 4.2.3)
-##  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.1)
-##  lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.2.3)
-##  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.1)
-##  Matrix        1.5-4   2023-04-04 [1] CRAN (R 4.2.3)
-##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.1)
-##  mgcv        * 1.8-42  2023-03-02 [1] CRAN (R 4.2.2)
-##  mime          0.12    2021-09-28 [1] CRAN (R 4.2.0)
-##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.2.1)
-##  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.1)
-##  mvnfast       0.2.8   2023-02-23 [1] CRAN (R 4.2.2)
-##  nlme        * 3.1-162 2023-01-31 [2] CRAN (R 4.2.3)
-##  patchwork   * 1.1.2   2022-08-19 [1] CRAN (R 4.2.1)
-##  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.2.3)
-##  pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 4.2.3)
-##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.1)
-##  pkgload       1.3.2.1 2023-07-08 [1] CRAN (R 4.2.3)
-##  prettyunits   1.2.0   2023-09-24 [1] CRAN (R 4.2.3)
-##  processx      3.8.2   2023-06-30 [1] CRAN (R 4.2.3)
-##  profvis       0.3.7   2020-11-02 [1] CRAN (R 4.2.1)
-##  promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.2.1)
-##  ps            1.7.5   2023-04-18 [1] CRAN (R 4.2.3)
-##  purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.2.3)
-##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.1)
-##  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.2.3)
-##  readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.2.2)
-##  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.2.1)
-##  rlang       * 1.1.1   2023-04-28 [1] CRAN (R 4.2.3)
-##  rmarkdown     2.21    2023-03-26 [1] CRAN (R 4.2.3)
-##  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.2.1)
-##  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.1)
-##  sass          0.4.6   2023-05-03 [1] CRAN (R 4.2.3)
-##  scales      * 1.2.1   2022-08-20 [1] CRAN (R 4.2.1)
-##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.1)
-##  shiny         1.7.4   2022-12-15 [1] CRAN (R 4.2.2)
-##  stringi       1.7.12  2023-01-11 [1] CRAN (R 4.2.2)
-##  stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.2.2)
-##  tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.2.3)
-##  tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.2.2)
-##  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.1)
-##  tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.2.2)
-##  timechange    0.2.0   2023-01-11 [1] CRAN (R 4.2.2)
-##  tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.2.3)
-##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.2.1)
-##  usethis       2.1.6   2022-05-25 [1] CRAN (R 4.2.1)
-##  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.2.2)
-##  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.2.3)
-##  withr         2.5.1   2023-09-26 [1] CRAN (R 4.2.3)
-##  xfun          0.39    2023-04-20 [1] CRAN (R 4.2.3)
-##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.2.1)
-##  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.2)
-## 
-##  [1] C:/R/win-library/4.2
-##  [2] C:/Program Files/R/R-4.2.3/library
-## 
-## ──────────────────────────────────────────────────────────────────────────────
-
-
-

Import Data

-
# Define file path for processed data
-fp_data <- here("manuscript_synthesis/data/processed")
-
-# Import daily average chlorophyll data
-df_chla <- readRDS(file.path(fp_data, "chla_wt_daily_avg_2013-2019.rds")) %>% select(-WaterTemp)
-
-# Import daily average flow data
-df_flow <- readRDS(file.path(fp_data, "flow_daily_avg_2013-2019.rds"))
-
-
-

Prepare Data

-
# Create a vector for the factor order of StationCode
-sta_order <- c("RD22", "STTD", "LIB", "RVB")
-
-# We will use LIS flow data as a proxy for STTD
-df_flow_c <- df_flow %>% mutate(StationCode = if_else(StationCode == "LIS", "STTD", StationCode))
-
-# Prepare chlorophyll and flow data for exploration and analysis
-df_chla_c1 <- df_chla %>% 
-  # Filter to only include representative stations for 4 habitat types - RD22, STTD, LIB, RVB
-  filter(StationCode %in% sta_order) %>% 
-  # Join flow data to chlorophyll data
-  left_join(df_flow_c, by = join_by(StationCode, Date)) %>% 
-  # Remove all NA flow values
-  drop_na(Flow) %>% 
-  mutate(
-    # Scale and log transform chlorophyll values
-    Chla_log = log(Chla * 1000),
-    # Add Year and DOY variables
-    Year = year(Date),
-    DOY = yday(Date),
-    # Apply factor order to StationCode
-    StationCode = factor(StationCode, levels = sta_order),
-    # Add a column for Year as a factor for the model
-    Year_fct = factor(Year)
-  ) %>% 
-  arrange(StationCode, Date)
-
-
-

Explore sample counts by Station

-
df_chla_c1 %>% 
-  summarize(
-    min_date = min(Date),
-    max_date = max(Date),
-    num_samples = n(),
-    .by = c(StationCode, Year)
-  ) %>% 
-  arrange(StationCode, Year) %>% 
-  kable()
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
StationCodeYearmin_datemax_datenum_samples
RD2220142014-09-302014-11-0840
RD2220152015-07-242015-11-10110
RD2220162016-06-232016-09-1686
RD2220172017-07-142017-11-03113
RD2220182018-07-132018-11-11122
RD2220192019-07-112019-11-06119
STTD20132013-08-152013-11-0478
STTD20142014-07-252014-10-1987
STTD20152015-07-272015-11-16108
STTD20162016-06-232016-09-1683
STTD20172017-07-142017-09-2556
STTD20182018-07-202018-10-1184
STTD20192019-07-262019-11-06100
LIB20142014-10-012014-11-0836
LIB20152015-07-062015-11-16125
LIB20162016-05-292016-09-16101
LIB20172017-07-142017-11-0296
LIB20182018-08-142018-10-3136
LIB20192019-07-112019-10-2531
RVB20132013-07-072013-11-17134
RVB20142014-07-252014-11-08105
RVB20152015-07-062015-11-16134
RVB20162016-05-292016-09-1680
RVB20172017-07-142017-11-03113
RVB20182018-07-132018-11-11122
RVB20192019-07-112019-11-06102
-

Looking at the sample counts and date ranges, we’ll only include -years 2015-2019 for the analysis.

-
df_chla_c2 <- df_chla_c1 %>% 
-  filter(Year %in% 2015:2019) %>% 
-  mutate(Year_fct = fct_drop(Year_fct))
-
-# Create another dataframe that has up to 3 lag variables for chlorophyll to be
-  # used in the linear models
-df_chla_c2_lag <- df_chla_c2 %>% 
-  # Fill in missing days for each StationCode-Year combination
-  group_by(StationCode, Year) %>% 
-  complete(Date = seq.Date(min(Date), max(Date), by = "1 day")) %>% 
-  # Create lag variables of scaled log transformed chlorophyll values
-  mutate(
-    lag1 = lag(Chla_log),
-    lag2 = lag(Chla_log, 2),
-    lag3 = lag(Chla_log, 3)
-  ) %>% 
-  ungroup()
-
-
-

Plots

-

Let’s explore the data with some plots. First, lets plot the data in -scatterplots of chlorophyll and flow facetted by Station and grouping -all years together.

-
df_chla_c2 %>% 
-  ggplot(aes(x = Flow, y = Chla)) +
-  geom_point() +
-  geom_smooth(formula = "y ~ x") +
-  facet_wrap(vars(StationCode), scales = "free") +
-  theme_bw()
-

-

At first glance, I’m not sure how well flow is going to be able to -predict chlorophyll concentrations. At the furthest upstream station - -RD22 - chlorophyll appears to be highest at the lowest flows, but the -variation is at its maximum at the lowest flows. There may be some -dilution effect going on here at the higher flows. At STTD, there does -seem to be a modest increase in chlorophyll concentrations at the -mid-range flows. This pattern is even more obvious at LIB. There appears -to be no effect of flow on chlorophyll at RVB, but the range of -chlorophyll concentrations is narrow at this station (between 0 and -5).

-

Let’s break these scatterplots apart by year to see how these -patterns vary annually.

-
df_chla_c2 %>% 
-  ggplot(aes(x = Flow, y = Chla)) +
-  geom_point() +
-  geom_smooth(formula = "y ~ x") +
-  facet_wrap(
-    vars(StationCode, Year), 
-    ncol = 5, 
-    scales = "free", 
-    labeller = labeller(.multi_line = FALSE)
-  ) +
-  theme_bw()
-

-

The patterns appear to vary annually at each station, which may -justify using a 3-way interaction.

-
-
-

GAM Model with 3-way interactions

-

First, we will attempt to fit a generalized additive model (GAM) to -the data set to help account for seasonality in the data. We’ll try -running a GAM using a three-way interaction between Year, Daily Average -Flow, and Station, and a smooth term for day of year to account for -seasonality. Initially, we’ll run the GAM without accounting for serial -autocorrelation.

-
-

Initial Model

-
m_gam3 <- gam(
-  Chla_log ~ Year_fct * Flow * StationCode + s(DOY), 
-  data = df_chla_c2,
-  method = "REML"
-)
-

Lets look at the model summary and diagnostics:

-
summary(m_gam3)
-
## 
-## Family: gaussian 
-## Link function: identity 
-## 
-## Formula:
-## Chla_log ~ Year_fct * Flow * StationCode + s(DOY)
-## 
-## Parametric coefficients:
-##                                     Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                        9.745e+00  4.283e-02 227.546  < 2e-16 ***
-## Year_fct2016                      -8.341e-01  6.570e-02 -12.696  < 2e-16 ***
-## Year_fct2017                      -9.211e-01  9.375e-02  -9.825  < 2e-16 ***
-## Year_fct2018                      -3.811e-01  5.645e-02  -6.750 1.96e-11 ***
-## Year_fct2019                      -1.553e-01  5.595e-02  -2.775  0.00557 ** 
-## Flow                              -1.869e-03  1.825e-04 -10.240  < 2e-16 ***
-## StationCodeSTTD                   -1.075e+00  5.393e-02 -19.926  < 2e-16 ***
-## StationCodeLIB                    -2.137e+00  1.437e-01 -14.875  < 2e-16 ***
-## StationCodeRVB                    -2.195e+00  1.005e-01 -21.842  < 2e-16 ***
-## Year_fct2016:Flow                  2.209e-03  2.990e-04   7.388 2.24e-13 ***
-## Year_fct2017:Flow                  2.950e-02  4.852e-03   6.080 1.45e-09 ***
-## Year_fct2018:Flow                  4.676e-04  2.482e-04   1.884  0.05977 .  
-## Year_fct2019:Flow                 -7.187e-05  2.176e-04  -0.330  0.74126    
-## Year_fct2016:StationCodeSTTD       1.082e+00  8.210e-02  13.181  < 2e-16 ***
-## Year_fct2017:StationCodeSTTD       1.048e+00  1.242e-01   8.441  < 2e-16 ***
-## Year_fct2018:StationCodeSTTD      -2.895e-01  7.742e-02  -3.739  0.00019 ***
-## Year_fct2019:StationCodeSTTD      -8.838e-01  7.574e-02 -11.668  < 2e-16 ***
-## Year_fct2016:StationCodeLIB        2.203e+00  2.546e-01   8.652  < 2e-16 ***
-## Year_fct2017:StationCodeLIB        9.576e-01  1.991e-01   4.810 1.63e-06 ***
-## Year_fct2018:StationCodeLIB       -1.551e+00  2.409e-01  -6.436 1.55e-10 ***
-## Year_fct2019:StationCodeLIB       -3.510e-01  2.135e-01  -1.644  0.10033    
-## Year_fct2016:StationCodeRVB        1.212e+00  2.309e-01   5.248 1.71e-07 ***
-## Year_fct2017:StationCodeRVB        9.736e-01  2.001e-01   4.866 1.23e-06 ***
-## Year_fct2018:StationCodeRVB        1.274e-01  1.557e-01   0.818  0.41349    
-## Year_fct2019:StationCodeRVB       -8.231e-01  1.848e-01  -4.454 8.94e-06 ***
-## Flow:StationCodeSTTD               4.740e-03  2.936e-04  16.147  < 2e-16 ***
-## Flow:StationCodeLIB                1.812e-03  2.039e-04   8.883  < 2e-16 ***
-## Flow:StationCodeRVB                1.891e-03  1.838e-04  10.290  < 2e-16 ***
-## Year_fct2016:Flow:StationCodeSTTD -3.737e-03  4.362e-04  -8.566  < 2e-16 ***
-## Year_fct2017:Flow:StationCodeSTTD -2.723e-02  5.249e-03  -5.187 2.37e-07 ***
-## Year_fct2018:Flow:StationCodeSTTD -7.936e-04  3.882e-04  -2.044  0.04105 *  
-## Year_fct2019:Flow:StationCodeSTTD -1.449e-03  3.440e-04  -4.213 2.64e-05 ***
-## Year_fct2016:Flow:StationCodeLIB  -1.816e-03  3.421e-04  -5.307 1.25e-07 ***
-## Year_fct2017:Flow:StationCodeLIB  -2.955e-02  4.855e-03  -6.087 1.39e-09 ***
-## Year_fct2018:Flow:StationCodeLIB  -6.475e-04  3.801e-04  -1.703  0.08865 .  
-## Year_fct2019:Flow:StationCodeLIB   2.046e-04  3.072e-04   0.666  0.50555    
-## Year_fct2016:Flow:StationCodeRVB  -2.268e-03  3.003e-04  -7.551 6.70e-14 ***
-## Year_fct2017:Flow:StationCodeRVB  -2.955e-02  4.852e-03  -6.089 1.37e-09 ***
-## Year_fct2018:Flow:StationCodeRVB  -4.848e-04  2.495e-04  -1.943  0.05219 .  
-## Year_fct2019:Flow:StationCodeRVB   9.204e-05  2.197e-04   0.419  0.67527    
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Approximate significance of smooth terms:
-##          edf Ref.df     F p-value    
-## s(DOY) 7.487  8.411 40.88  <2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## R-sq.(adj) =  0.875   Deviance explained = 87.8%
-## -REML = 898.16  Scale est. = 0.12072   n = 1921
-
appraise(m_gam3)
-

-
shapiro.test(residuals(m_gam3))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_gam3)
-## W = 0.97102, p-value < 2.2e-16
-
k.check(m_gam3)
-
##        k'      edf   k-index p-value
-## s(DOY)  9 7.486841 0.9418805   0.005
-
draw(m_gam3, select = 1, residuals = TRUE, rug = FALSE)
-

-
plot(m_gam3, pages = 1, all.terms = TRUE)
-

-
acf(residuals(m_gam3))
-

-
Box.test(residuals(m_gam3), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_gam3)
-## X-squared = 3735.8, df = 20, p-value < 2.2e-16
-
-
-

Model with k = 5

-

The model definitely has residual autocorrelation as indicated by the -ACF plot and the Box-Ljung test. Even though the p-value for the k-check -is less than 0.05, the smooth term for day of year appears to be -overfitted. Let’s try a smaller k-value for the smooth first, then lets -try to address the residual autocorrelation.

-
m_gam3_k5 <- gam(
-  Chla_log ~ Year_fct * Flow * StationCode + s(DOY, k = 5), 
-  data = df_chla_c2,
-  method = "REML"
-)
-
-summary(m_gam3_k5)
-
## 
-## Family: gaussian 
-## Link function: identity 
-## 
-## Formula:
-## Chla_log ~ Year_fct * Flow * StationCode + s(DOY, k = 5)
-## 
-## Parametric coefficients:
-##                                     Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                        9.733e+00  4.260e-02 228.453  < 2e-16 ***
-## Year_fct2016                      -7.901e-01  6.461e-02 -12.229  < 2e-16 ***
-## Year_fct2017                      -9.026e-01  9.409e-02  -9.593  < 2e-16 ***
-## Year_fct2018                      -3.787e-01  5.676e-02  -6.672 3.32e-11 ***
-## Year_fct2019                      -1.526e-01  5.625e-02  -2.714 0.006711 ** 
-## Flow                              -1.808e-03  1.803e-04 -10.029  < 2e-16 ***
-## StationCodeSTTD                   -1.069e+00  5.408e-02 -19.760  < 2e-16 ***
-## StationCodeLIB                    -2.132e+00  1.438e-01 -14.828  < 2e-16 ***
-## StationCodeRVB                    -2.191e+00  1.009e-01 -21.726  < 2e-16 ***
-## Year_fct2016:Flow                  2.031e-03  2.952e-04   6.881 8.06e-12 ***
-## Year_fct2017:Flow                  2.882e-02  4.852e-03   5.941 3.37e-09 ***
-## Year_fct2018:Flow                  4.505e-04  2.496e-04   1.805 0.071304 .  
-## Year_fct2019:Flow                 -9.132e-05  2.188e-04  -0.417 0.676427    
-## Year_fct2016:StationCodeSTTD       1.069e+00  8.243e-02  12.964  < 2e-16 ***
-## Year_fct2017:StationCodeSTTD       1.036e+00  1.248e-01   8.298  < 2e-16 ***
-## Year_fct2018:StationCodeSTTD      -2.965e-01  7.785e-02  -3.809 0.000144 ***
-## Year_fct2019:StationCodeSTTD      -8.854e-01  7.604e-02 -11.644  < 2e-16 ***
-## Year_fct2016:StationCodeLIB        2.201e+00  2.555e-01   8.615  < 2e-16 ***
-## Year_fct2017:StationCodeLIB        9.597e-01  1.992e-01   4.817 1.57e-06 ***
-## Year_fct2018:StationCodeLIB       -1.511e+00  2.415e-01  -6.257 4.84e-10 ***
-## Year_fct2019:StationCodeLIB       -3.715e-01  2.144e-01  -1.733 0.083277 .  
-## Year_fct2016:StationCodeRVB        1.158e+00  2.306e-01   5.021 5.62e-07 ***
-## Year_fct2017:StationCodeRVB        9.786e-01  1.999e-01   4.895 1.07e-06 ***
-## Year_fct2018:StationCodeRVB        1.267e-01  1.564e-01   0.810 0.417994    
-## Year_fct2019:StationCodeRVB       -8.344e-01  1.853e-01  -4.504 7.09e-06 ***
-## Flow:StationCodeSTTD               4.756e-03  2.952e-04  16.111  < 2e-16 ***
-## Flow:StationCodeLIB                1.749e-03  2.024e-04   8.641  < 2e-16 ***
-## Flow:StationCodeRVB                1.832e-03  1.816e-04  10.087  < 2e-16 ***
-## Year_fct2016:Flow:StationCodeSTTD -3.760e-03  4.388e-04  -8.570  < 2e-16 ***
-## Year_fct2017:Flow:StationCodeSTTD -2.633e-02  5.239e-03  -5.026 5.48e-07 ***
-## Year_fct2018:Flow:StationCodeSTTD -8.012e-04  3.906e-04  -2.051 0.040386 *  
-## Year_fct2019:Flow:StationCodeSTTD -1.475e-03  3.460e-04  -4.262 2.13e-05 ***
-## Year_fct2016:Flow:StationCodeLIB  -1.620e-03  3.392e-04  -4.776 1.93e-06 ***
-## Year_fct2017:Flow:StationCodeLIB  -2.885e-02  4.853e-03  -5.946 3.27e-09 ***
-## Year_fct2018:Flow:StationCodeLIB  -5.940e-04  3.818e-04  -1.556 0.119930    
-## Year_fct2019:Flow:StationCodeLIB   2.056e-04  3.088e-04   0.666 0.505630    
-## Year_fct2016:Flow:StationCodeRVB  -2.087e-03  2.962e-04  -7.045 2.59e-12 ***
-## Year_fct2017:Flow:StationCodeRVB  -2.887e-02  4.852e-03  -5.950 3.19e-09 ***
-## Year_fct2018:Flow:StationCodeRVB  -4.685e-04  2.510e-04  -1.867 0.062082 .  
-## Year_fct2019:Flow:StationCodeRVB   1.118e-04  2.207e-04   0.506 0.612687    
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Approximate significance of smooth terms:
-##          edf Ref.df     F p-value    
-## s(DOY) 3.897  3.994 82.03  <2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## R-sq.(adj) =  0.874   Deviance explained = 87.7%
-## -REML = 905.16  Scale est. = 0.12223   n = 1921
-
appraise(m_gam3_k5)
-

-
shapiro.test(residuals(m_gam3_k5))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_gam3_k5)
-## W = 0.96838, p-value < 2.2e-16
-
k.check(m_gam3_k5)
-
##        k'      edf   k-index p-value
-## s(DOY)  4 3.897303 0.9289271       0
-
draw(m_gam3_k5, select = 1, residuals = TRUE, rug = FALSE)
-

-
plot(m_gam3_k5, pages = 1, all.terms = TRUE)
-

-
acf(residuals(m_gam3_k5))
-

-
Box.test(residuals(m_gam3_k5), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_gam3_k5)
-## X-squared = 3775.8, df = 20, p-value < 2.2e-16
-

Changing the k-value to 5 seems to help reduce the “wiggliness” of -the smooth term for DOY, but the p-value for the k-check is still less -than 0.05. Despite this, lets proceed with a k-value of 5.

-
-
-

Model with AR() correlation structure

-

Now, we’ll try to deal with the residual autocorrelation. We’ll run -AR(1), AR(2), AR(3), and AR(4) models and compare them using AIC.

-
# Define model formula as an object
-f_gam3 <- as.formula("Chla_log ~ Year_fct * Flow * StationCode + s(DOY, k = 5)")
-
-# Fit original model with k = 5 and three successive AR(p) models
-m_gamm3 <- gamm(
-  f_gam3, 
-  data = df_chla_c2,
-  method = "REML"
-)
-
-m_gamm3_ar1 <- gamm(
-  f_gam3, 
-  data = df_chla_c2, 
-  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 1), # grouped by Year_fct and StationCode
-  method = "REML"
-)
-
-m_gamm3_ar2 <- gamm(
-  f_gam3, 
-  data = df_chla_c2, 
-  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 2),
-  method = "REML"
-)
-
-m_gamm3_ar3 <- gamm(
-  f_gam3, 
-  data = df_chla_c2, 
-  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 3),
-  method = "REML"
-)
-
-m_gamm3_ar4 <- gamm(
-  f_gam3, 
-  data = df_chla_c2, 
-  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 4),
-  method = "REML"
-)
-
-# Compare models
-anova(
-  m_gamm3$lme, 
-  m_gamm3_ar1$lme, 
-  m_gamm3_ar2$lme, 
-  m_gamm3_ar3$lme,
-  m_gamm3_ar4$lme
-)
-
##                 Model df       AIC       BIC    logLik   Test   L.Ratio p-value
-## m_gamm3$lme         1 43 1896.3152 2134.4933 -905.1576                         
-## m_gamm3_ar1$lme     2 44 -761.4514 -517.7342  424.7257 1 vs 2 2659.7665  <.0001
-## m_gamm3_ar2$lme     3 45 -761.1459 -511.8897  425.5730 2 vs 3    1.6945  0.1930
-## m_gamm3_ar3$lme     4 46 -762.6776 -507.8824  427.3388 3 vs 4    3.5317  0.0602
-## m_gamm3_ar4$lme     5 47 -760.6894 -500.3552  427.3447 4 vs 5    0.0118  0.9133
-

It looks like the AR(3) model has the best fit according to the AIC -values. However, BIC prefers the AR(1) model. Let’s take a closer look -at the AR(1) model.

-
-
-

AR(1) Model

-
# Pull out the residuals and the GAM model
-resid_gamm3_ar1 <- residuals(m_gamm3_ar1$lme, type = "normalized")
-m_gamm3_ar1_gam <- m_gamm3_ar1$gam
-
-summary(m_gamm3_ar1_gam)
-
## 
-## Family: gaussian 
-## Link function: identity 
-## 
-## Formula:
-## Chla_log ~ Year_fct * Flow * StationCode + s(DOY, k = 5)
-## 
-## Parametric coefficients:
-##                                     Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                        9.518e+00  2.152e-01  44.217  < 2e-16 ***
-## Year_fct2016                      -6.228e-01  3.255e-01  -1.913  0.05587 .  
-## Year_fct2017                      -4.890e-01  3.314e-01  -1.476  0.14016    
-## Year_fct2018                      -2.959e-01  2.987e-01  -0.991  0.32201    
-## Year_fct2019                      -3.006e-02  2.998e-01  -0.100  0.92015    
-## Flow                              -6.681e-04  2.914e-04  -2.293  0.02197 *  
-## StationCodeSTTD                   -8.105e-01  3.024e-01  -2.680  0.00743 ** 
-## StationCodeLIB                    -1.712e+00  3.119e-01  -5.490 4.57e-08 ***
-## StationCodeRVB                    -1.847e+00  2.957e-01  -6.247 5.17e-10 ***
-## Year_fct2016:Flow                  1.008e-03  5.864e-04   1.719  0.08578 .  
-## Year_fct2017:Flow                  1.318e-02  8.063e-03   1.635  0.10232    
-## Year_fct2018:Flow                  6.137e-05  5.312e-04   0.116  0.90802    
-## Year_fct2019:Flow                 -3.428e-04  4.481e-04  -0.765  0.44432    
-## Year_fct2016:StationCodeSTTD       9.596e-01  4.533e-01   2.117  0.03439 *  
-## Year_fct2017:StationCodeSTTD       3.666e-01  4.833e-01   0.758  0.44827    
-## Year_fct2018:StationCodeSTTD      -4.601e-01  4.375e-01  -1.052  0.29313    
-## Year_fct2019:StationCodeSTTD      -1.078e+00  4.296e-01  -2.508  0.01221 *  
-## Year_fct2016:StationCodeLIB        1.892e+00  4.690e-01   4.035 5.69e-05 ***
-## Year_fct2017:StationCodeLIB        5.850e-01  4.679e-01   1.250  0.21127    
-## Year_fct2018:StationCodeLIB       -1.588e+00  4.994e-01  -3.181  0.00149 ** 
-## Year_fct2019:StationCodeLIB       -7.228e-01  5.097e-01  -1.418  0.15634    
-## Year_fct2016:StationCodeRVB        9.792e-01  4.745e-01   2.064  0.03919 *  
-## Year_fct2017:StationCodeRVB        2.970e-01  4.728e-01   0.628  0.52996    
-## Year_fct2018:StationCodeRVB        1.256e-02  4.359e-01   0.029  0.97701    
-## Year_fct2019:StationCodeRVB       -5.866e-01  4.441e-01  -1.321  0.18667    
-## Flow:StationCodeSTTD               1.563e-03  5.696e-04   2.744  0.00614 ** 
-## Flow:StationCodeLIB                7.331e-04  3.001e-04   2.443  0.01467 *  
-## Flow:StationCodeRVB                6.702e-04  2.917e-04   2.298  0.02167 *  
-## Year_fct2016:Flow:StationCodeSTTD -1.668e-03  9.315e-04  -1.791  0.07345 .  
-## Year_fct2017:Flow:StationCodeSTTD -7.118e-03  8.237e-03  -0.864  0.38762    
-## Year_fct2018:Flow:StationCodeSTTD  4.618e-04  8.986e-04   0.514  0.60738    
-## Year_fct2019:Flow:StationCodeSTTD  7.969e-04  7.567e-04   1.053  0.29244    
-## Year_fct2016:Flow:StationCodeLIB  -7.348e-04  5.979e-04  -1.229  0.21925    
-## Year_fct2017:Flow:StationCodeLIB  -1.306e-02  8.064e-03  -1.620  0.10537    
-## Year_fct2018:Flow:StationCodeLIB   1.411e-04  5.571e-04   0.253  0.80007    
-## Year_fct2019:Flow:StationCodeLIB   3.178e-04  4.756e-04   0.668  0.50408    
-## Year_fct2016:Flow:StationCodeRVB  -1.024e-03  5.868e-04  -1.746  0.08101 .  
-## Year_fct2017:Flow:StationCodeRVB  -1.319e-02  8.063e-03  -1.635  0.10212    
-## Year_fct2018:Flow:StationCodeRVB  -7.120e-05  5.316e-04  -0.134  0.89346    
-## Year_fct2019:Flow:StationCodeRVB   3.206e-04  4.485e-04   0.715  0.47479    
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Approximate significance of smooth terms:
-##          edf Ref.df     F p-value    
-## s(DOY) 3.745  3.745 9.993 5.3e-07 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## R-sq.(adj) =   0.82   
-##   Scale est. = 0.22667   n = 1921
-
appraise(m_gamm3_ar1_gam)
-

-
shapiro.test(resid_gamm3_ar1)
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  resid_gamm3_ar1
-## W = 0.84032, p-value < 2.2e-16
-
k.check(m_gamm3_ar1_gam)
-
##        k'     edf   k-index p-value
-## s(DOY)  4 3.74514 0.8115987       0
-
draw(m_gamm3_ar1_gam, select = 1, residuals = TRUE, rug = FALSE)
-

-
plot(m_gamm3_ar1_gam, pages = 1, all.terms = TRUE)
-

-
acf(resid_gamm3_ar1)
-

-
Box.test(resid_gamm3_ar1, lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  resid_gamm3_ar1
-## X-squared = 50.995, df = 20, p-value = 0.0001593
-

The AR(1) model has much less residual autocorrelation, and the -diagnostics plots look okay. Let’s take a closer look at how the -back-transformed fitted values from the model match the observed -values.

-
df_gamm3_ar1_fit <- df_chla_c2 %>% 
-  fitted_values(m_gamm3_ar1_gam, data = .) %>% 
-  mutate(fitted_bt = exp(fitted) / 1000)
-
-plt_gamm3_ar1_obs_fit <- df_gamm3_ar1_fit %>% 
-  ggplot(aes(x = fitted_bt, y = Chla)) +
-  geom_point() +
-  geom_abline(slope = 1, intercept = 0, color = "red") +
-  theme_bw() +
-  labs(
-    x = "Back-transformed Fitted Values",
-    y = "Observed Values"
-  )
-
-plt_gamm3_ar1_obs_fit
-

-

Let’s look at each Station-Year combination separately.

-
plt_gamm3_ar1_obs_fit +
-  facet_wrap(
-    vars(StationCode, Year_fct),
-    ncol = 5,
-    scales = "free",
-    labeller = labeller(.multi_line = FALSE)
-  )
-

-

The red lines are the 1:1 ratio between fitted and observed values, -and we would expect for most of the points to be near this line if the -model has a good fit to the observed data. However, the fitted values -from the model don’t appear to match the observed values that well, and -there are some unusual patterns when we look at each Station-Year -combination separately. I don’t think this GAM model is a good candidate -to use for our analysis.

-
-
-
-

Linear Model with 3-way interactions

-

Since the GAM model didn’t seem to fit the observed data that well, -let’s try a linear model using a three-way interaction between Year, -Daily Average Flow, and Station. Initially, we’ll run the model without -accounting for serial autocorrelation.

-
-

Initial Model

-
m_lm3 <- df_chla_c2_lag %>% 
-  drop_na(Chla_log) %>% 
-  lm(Chla_log ~ Year_fct * Flow * StationCode, data = .)
-

Lets look at the model summary and diagnostics:

-
summary(m_lm3)
-
## 
-## Call:
-## lm(formula = Chla_log ~ Year_fct * Flow * StationCode, data = .)
-## 
-## Residuals:
-##      Min       1Q   Median       3Q      Max 
-## -1.53824 -0.22584 -0.03527  0.19362  1.75991 
-## 
-## Coefficients:
-##                                     Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                        9.6844243  0.0452509 214.016  < 2e-16 ***
-## Year_fct2016                      -0.6624673  0.0679526  -9.749  < 2e-16 ***
-## Year_fct2017                      -1.1724759  0.0999868 -11.726  < 2e-16 ***
-## Year_fct2018                      -0.3418296  0.0613064  -5.576 2.82e-08 ***
-## Year_fct2019                      -0.1194239  0.0606175  -1.970 0.048971 *  
-## Flow                              -0.0018894  0.0001902  -9.935  < 2e-16 ***
-## StationCodeSTTD                   -1.0665077  0.0583396 -18.281  < 2e-16 ***
-## StationCodeLIB                    -2.2759582  0.1501355 -15.159  < 2e-16 ***
-## StationCodeRVB                    -2.2211596  0.1087675 -20.421  < 2e-16 ***
-## Year_fct2016:Flow                  0.0023191  0.0003112   7.451 1.40e-13 ***
-## Year_fct2017:Flow                  0.0459260  0.0050786   9.043  < 2e-16 ***
-## Year_fct2018:Flow                  0.0003299  0.0002697   1.223 0.221489    
-## Year_fct2019:Flow                 -0.0001500  0.0002357  -0.636 0.524638    
-## Year_fct2016:StationCodeSTTD       1.0883876  0.0888628  12.248  < 2e-16 ***
-## Year_fct2017:StationCodeSTTD       1.4828385  0.1306889  11.346  < 2e-16 ***
-## Year_fct2018:StationCodeSTTD      -0.3044581  0.0836642  -3.639 0.000281 ***
-## Year_fct2019:StationCodeSTTD      -0.9412845  0.0821169 -11.463  < 2e-16 ***
-## Year_fct2016:StationCodeLIB        2.0498183  0.2674351   7.665 2.85e-14 ***
-## Year_fct2017:StationCodeLIB        1.3182346  0.2107896   6.254 4.95e-10 ***
-## Year_fct2018:StationCodeLIB       -1.4934167  0.2582457  -5.783 8.58e-09 ***
-## Year_fct2019:StationCodeLIB       -0.3466158  0.2273610  -1.525 0.127548    
-## Year_fct2016:StationCodeRVB        1.9708585  0.2227786   8.847  < 2e-16 ***
-## Year_fct2017:StationCodeRVB        1.4604311  0.2117602   6.897 7.25e-12 ***
-## Year_fct2018:StationCodeRVB       -0.0019108  0.1643456  -0.012 0.990725    
-## Year_fct2019:StationCodeRVB       -0.7693190  0.1962948  -3.919 9.20e-05 ***
-## Flow:StationCodeSTTD               0.0046051  0.0003182  14.474  < 2e-16 ***
-## Flow:StationCodeLIB                0.0017121  0.0002100   8.153 6.42e-16 ***
-## Flow:StationCodeRVB                0.0019282  0.0001915  10.070  < 2e-16 ***
-## Year_fct2016:Flow:StationCodeSTTD -0.0037137  0.0004737  -7.840 7.46e-15 ***
-## Year_fct2017:Flow:StationCodeSTTD -0.0407758  0.0055375  -7.364 2.66e-13 ***
-## Year_fct2018:Flow:StationCodeSTTD -0.0007383  0.0004198  -1.759 0.078768 .  
-## Year_fct2019:Flow:StationCodeSTTD -0.0012540  0.0003732  -3.360 0.000794 ***
-## Year_fct2016:Flow:StationCodeLIB  -0.0020650  0.0003575  -5.777 8.89e-09 ***
-## Year_fct2017:Flow:StationCodeLIB  -0.0459271  0.0050805  -9.040  < 2e-16 ***
-## Year_fct2018:Flow:StationCodeLIB  -0.0004184  0.0004122  -1.015 0.310185    
-## Year_fct2019:Flow:StationCodeLIB   0.0001287  0.0003310   0.389 0.697449    
-## Year_fct2016:Flow:StationCodeRVB  -0.0024657  0.0003128  -7.884 5.33e-15 ***
-## Year_fct2017:Flow:StationCodeRVB  -0.0460030  0.0050786  -9.058  < 2e-16 ***
-## Year_fct2018:Flow:StationCodeRVB  -0.0003456  0.0002710  -1.275 0.202416    
-## Year_fct2019:Flow:StationCodeRVB   0.0001493  0.0002375   0.629 0.529618    
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Residual standard error: 0.378 on 1881 degrees of freedom
-## Multiple R-squared:  0.8556, Adjusted R-squared:  0.8526 
-## F-statistic: 285.8 on 39 and 1881 DF,  p-value: < 2.2e-16
-
df_chla_c2_lag %>% 
-  drop_na(Chla_log) %>% 
-  plot_lm_diag(Chla_log, m_lm3)
-
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
-

-
shapiro.test(residuals(m_lm3))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_lm3)
-## W = 0.96994, p-value < 2.2e-16
-
acf(residuals(m_lm3))
-

-
Box.test(residuals(m_lm3), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm3)
-## X-squared = 4218.2, df = 20, p-value < 2.2e-16
-

The residuals deviate from a normal distribution according to visual -inspection and the Shapiro-Wilk normality test. Also, model definitely -has residual autocorrelation as indicated by the ACF plot and the -Box-Ljung test.

-
-

Model with lag terms

-

Now, we’ll try to deal with the residual autocorrelation and the -non-normal residuals. We’ll run a series of linear models adding 1, 2, -and 3 lag terms and compare how well they correct for -autocorrelation.

-
-

Lag 1

-
m_lm3_lag1 <- df_chla_c2_lag %>% 
-  drop_na(Chla_log, lag1) %>% 
-  lm(Chla_log ~ Year_fct * Flow * StationCode + lag1, data = .)
-
-acf(residuals(m_lm3_lag1))
-

-
Box.test(residuals(m_lm3_lag1), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm3_lag1)
-## X-squared = 59.583, df = 20, p-value = 8.262e-06
-
-
-

Lag 2

-
m_lm3_lag2 <- df_chla_c2_lag %>% 
-  drop_na(Chla_log, lag1, lag2) %>% 
-  lm(Chla_log ~ Year_fct * Flow * StationCode + lag1 + lag2, data = .)
-
-acf(residuals(m_lm3_lag2))
-

-
Box.test(residuals(m_lm3_lag2), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm3_lag2)
-## X-squared = 36.166, df = 20, p-value = 0.01471
-
-
-

Lag 3

-
m_lm3_lag3 <- df_chla_c2_lag %>% 
-  drop_na(Chla_log, lag1, lag2, lag3) %>% 
-  lm(Chla_log ~ Year_fct * Flow * StationCode + lag1 + lag2 + lag3, data = .)
-
-acf(residuals(m_lm3_lag3))
-

-
Box.test(residuals(m_lm3_lag3), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm3_lag3)
-## X-squared = 29.8, df = 20, p-value = 0.07317
-

The model with 3 lag terms looks to have slightly better ACF and -Box-Ljung test results than the other models. Let’s compare the 3 models -using AIC and BIC to see which model those prefer.

-
-
-

Compare models

-
AIC(m_lm3_lag1, m_lm3_lag2, m_lm3_lag3)
-
##            df       AIC
-## m_lm3_lag1 42 -1332.474
-## m_lm3_lag2 43 -1320.991
-## m_lm3_lag3 44 -1301.195
-
BIC(m_lm3_lag1, m_lm3_lag2, m_lm3_lag3)
-
##            df       BIC
-## m_lm3_lag1 42 -1099.969
-## m_lm3_lag2 43 -1083.949
-## m_lm3_lag3 44 -1059.514
-

Both AIC and BIC prefer the model with 1 lag term. Let’s take a -closer look at that one.

-
-
-
-

Lag 1 model summary

-
summary(m_lm3_lag1)
-
## 
-## Call:
-## lm(formula = Chla_log ~ Year_fct * Flow * StationCode + lag1, 
-##     data = .)
-## 
-## Residuals:
-##      Min       1Q   Median       3Q      Max 
-## -1.11307 -0.05814 -0.00595  0.05217  1.06385 
-## 
-## Coefficients:
-##                                     Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                        1.001e+00  1.029e-01   9.728  < 2e-16 ***
-## Year_fct2016                      -9.809e-02  3.103e-02  -3.161 0.001598 ** 
-## Year_fct2017                      -1.525e-01  4.616e-02  -3.303 0.000975 ***
-## Year_fct2018                      -5.790e-02  2.756e-02  -2.101 0.035792 *  
-## Year_fct2019                      -2.174e-02  2.708e-02  -0.803 0.422137    
-## Flow                              -3.367e-04  8.645e-05  -3.895 0.000102 ***
-## StationCodeSTTD                   -1.182e-01  2.830e-02  -4.177 3.10e-05 ***
-## StationCodeLIB                    -1.924e-01  7.091e-02  -2.713 0.006729 ** 
-## StationCodeRVB                    -2.181e-01  5.374e-02  -4.059 5.14e-05 ***
-## lag1                               8.985e-01  1.044e-02  86.090  < 2e-16 ***
-## Year_fct2016:Flow                  3.634e-04  1.401e-04   2.594 0.009552 ** 
-## Year_fct2017:Flow                  5.522e-03  2.327e-03   2.373 0.017739 *  
-## Year_fct2018:Flow                  2.083e-04  1.199e-04   1.738 0.082447 .  
-## Year_fct2019:Flow                  1.407e-04  1.048e-04   1.342 0.179847    
-## Year_fct2016:StationCodeSTTD       1.270e-01  4.128e-02   3.078 0.002115 ** 
-## Year_fct2017:StationCodeSTTD       2.685e-01  6.059e-02   4.432 9.90e-06 ***
-## Year_fct2018:StationCodeSTTD      -1.769e-02  3.753e-02  -0.471 0.637527    
-## Year_fct2019:StationCodeSTTD      -9.963e-02  3.804e-02  -2.619 0.008883 ** 
-## Year_fct2016:StationCodeLIB        2.879e-01  1.205e-01   2.388 0.017030 *  
-## Year_fct2017:StationCodeLIB        1.367e-01  9.785e-02   1.397 0.162477    
-## Year_fct2018:StationCodeLIB       -1.636e-01  1.329e-01  -1.231 0.218468    
-## Year_fct2019:StationCodeLIB       -2.590e-02  1.048e-01  -0.247 0.804794    
-## Year_fct2016:StationCodeRVB        2.448e-01  1.028e-01   2.382 0.017302 *  
-## Year_fct2017:StationCodeRVB        1.841e-01  9.545e-02   1.929 0.053871 .  
-## Year_fct2018:StationCodeRVB        3.772e-02  7.309e-02   0.516 0.605905    
-## Year_fct2019:StationCodeRVB       -3.273e-02  8.873e-02  -0.369 0.712219    
-## Flow:StationCodeSTTD               6.188e-04  1.508e-04   4.103 4.26e-05 ***
-## Flow:StationCodeLIB                3.573e-04  9.469e-05   3.774 0.000166 ***
-## Flow:StationCodeRVB                3.341e-04  8.712e-05   3.835 0.000130 ***
-## Year_fct2016:Flow:StationCodeSTTD -5.296e-04  2.150e-04  -2.463 0.013871 *  
-## Year_fct2017:Flow:StationCodeSTTD -2.463e-03  2.544e-03  -0.968 0.333192    
-## Year_fct2018:Flow:StationCodeSTTD -2.088e-04  1.882e-04  -1.109 0.267388    
-## Year_fct2019:Flow:StationCodeSTTD -2.486e-04  1.680e-04  -1.480 0.139048    
-## Year_fct2016:Flow:StationCodeLIB  -2.807e-04  1.601e-04  -1.753 0.079803 .  
-## Year_fct2017:Flow:StationCodeLIB  -5.534e-03  2.328e-03  -2.377 0.017554 *  
-## Year_fct2018:Flow:StationCodeLIB  -1.611e-04  2.125e-04  -0.758 0.448487    
-## Year_fct2019:Flow:StationCodeLIB  -1.318e-04  1.537e-04  -0.857 0.391346    
-## Year_fct2016:Flow:StationCodeRVB  -3.780e-04  1.410e-04  -2.681 0.007399 ** 
-## Year_fct2017:Flow:StationCodeRVB  -5.527e-03  2.327e-03  -2.375 0.017656 *  
-## Year_fct2018:Flow:StationCodeRVB  -2.086e-04  1.205e-04  -1.732 0.083497 .  
-## Year_fct2019:Flow:StationCodeRVB  -1.420e-04  1.056e-04  -1.345 0.178840    
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Residual standard error: 0.1677 on 1833 degrees of freedom
-## Multiple R-squared:  0.9709, Adjusted R-squared:  0.9703 
-## F-statistic:  1531 on 40 and 1833 DF,  p-value: < 2.2e-16
-
df_chla_c2_lag %>% 
-  drop_na(Chla_log, lag1) %>% 
-  plot_lm_diag(Chla_log, m_lm3_lag1)
-
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
-

-
shapiro.test(residuals(m_lm3_lag1))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_lm3_lag1)
-## W = 0.86196, p-value < 2.2e-16
-

The residuals deviate from a normal distribution particularly at the -both tails of the distribution. I’m not so sure about using this model. -Let’s take a closer look at how the back-transformed fitted values from -the model match the observed values.

-
df_lm3_lag1_fit <- df_chla_c2_lag %>% 
-  drop_na(Chla, lag1) %>% 
-  mutate(Fitted = as_tibble(predict(m_lm3_lag1, interval = "confidence"))) %>% 
-  unnest_wider(Fitted) %>% 
-  mutate(across(c("fit", "lwr", "upr"), ~ exp(.x) / 1000))
-
-plt_lm3_lag1_obs_fit <- df_lm3_lag1_fit %>% 
-  ggplot(aes(x = fit, y = Chla)) +
-  geom_point() +
-  geom_abline(slope = 1, intercept = 0, color = "red") +
-  theme_bw() +
-  labs(
-    x = "Back-transformed Fitted Values",
-    y = "Observed Values"
-  )
-
-plt_lm3_lag1_obs_fit
-

-

Let’s look at each Station-Year combination separately.

-
plt_lm3_lag1_obs_fit +
-  facet_wrap(
-    vars(StationCode, Year_fct),
-    ncol = 5,
-    scales = "free",
-    labeller = labeller(.multi_line = FALSE)
-  )
-

-

The red lines are the 1:1 ratio between fitted and observed values, -and we would expect for most of the points to be near this line if the -model has a good fit to the observed data. The fitted and observed -values appear to match pretty well at the lower end of the range of -values, but this deteriorates at the mid and higher range values. This -pattern holds for some of the separate Station-Year combinations. I’m -not sure this linear model is a good candidate to use for our -analysis.

-
-
-

Lag 3 model summary

-

Let’s take a closer look at the model with 3 lag terms since that had -the least autocorrelation.

-
summary(m_lm3_lag3)
-
## 
-## Call:
-## lm(formula = Chla_log ~ Year_fct * Flow * StationCode + lag1 + 
-##     lag2 + lag3, data = .)
-## 
-## Residuals:
-##      Min       1Q   Median       3Q      Max 
-## -1.10205 -0.05729 -0.00689  0.05130  1.10315 
-## 
-## Coefficients:
-##                                     Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                        1.157e+00  1.107e-01  10.454  < 2e-16 ***
-## Year_fct2016                      -9.661e-02  3.126e-02  -3.091 0.002027 ** 
-## Year_fct2017                      -1.610e-01  4.643e-02  -3.467 0.000540 ***
-## Year_fct2018                      -5.585e-02  2.771e-02  -2.016 0.043975 *  
-## Year_fct2019                      -1.043e-02  2.720e-02  -0.383 0.701433    
-## Flow                              -3.249e-04  8.626e-05  -3.766 0.000171 ***
-## StationCodeSTTD                   -1.216e-01  2.874e-02  -4.232 2.43e-05 ***
-## StationCodeLIB                    -2.218e-01  7.148e-02  -3.102 0.001950 ** 
-## StationCodeRVB                    -2.404e-01  5.410e-02  -4.444 9.37e-06 ***
-## lag1                               9.520e-01  2.404e-02  39.604  < 2e-16 ***
-## lag2                              -6.912e-03  3.300e-02  -0.209 0.834141    
-## lag3                              -6.404e-02  2.396e-02  -2.672 0.007602 ** 
-## Year_fct2016:Flow                  3.667e-04  1.395e-04   2.629 0.008641 ** 
-## Year_fct2017:Flow                  6.434e-03  2.364e-03   2.722 0.006556 ** 
-## Year_fct2018:Flow                  1.808e-04  1.194e-04   1.514 0.130087    
-## Year_fct2019:Flow                  9.141e-05  1.045e-04   0.875 0.381899    
-## Year_fct2016:StationCodeSTTD       1.392e-01  4.182e-02   3.329 0.000889 ***
-## Year_fct2017:StationCodeSTTD       2.677e-01  6.087e-02   4.399 1.15e-05 ***
-## Year_fct2018:StationCodeSTTD      -3.326e-02  3.782e-02  -0.880 0.379189    
-## Year_fct2019:StationCodeSTTD      -1.314e-01  3.855e-02  -3.408 0.000669 ***
-## Year_fct2016:StationCodeLIB        2.940e-01  1.214e-01   2.422 0.015517 *  
-## Year_fct2017:StationCodeLIB        1.346e-01  1.007e-01   1.337 0.181412    
-## Year_fct2018:StationCodeLIB       -2.818e-01  1.426e-01  -1.976 0.048279 *  
-## Year_fct2019:StationCodeLIB       -6.092e-02  1.303e-01  -0.467 0.640243    
-## Year_fct2016:StationCodeRVB        2.601e-01  1.100e-01   2.364 0.018204 *  
-## Year_fct2017:StationCodeRVB        1.906e-01  9.562e-02   1.993 0.046416 *  
-## Year_fct2018:StationCodeRVB        2.043e-02  7.272e-02   0.281 0.778735    
-## Year_fct2019:StationCodeRVB       -1.232e-01  9.296e-02  -1.326 0.185077    
-## Flow:StationCodeSTTD               6.575e-04  1.545e-04   4.254 2.21e-05 ***
-## Flow:StationCodeLIB                3.407e-04  9.466e-05   3.599 0.000328 ***
-## Flow:StationCodeRVB                3.220e-04  8.693e-05   3.704 0.000219 ***
-## Year_fct2016:Flow:StationCodeSTTD -5.712e-04  2.169e-04  -2.633 0.008528 ** 
-## Year_fct2017:Flow:StationCodeSTTD -3.888e-03  2.608e-03  -1.491 0.136168    
-## Year_fct2018:Flow:StationCodeSTTD -2.004e-04  1.903e-04  -1.053 0.292443    
-## Year_fct2019:Flow:StationCodeSTTD -2.358e-04  1.704e-04  -1.384 0.166560    
-## Year_fct2016:Flow:StationCodeLIB  -2.891e-04  1.602e-04  -1.805 0.071255 .  
-## Year_fct2017:Flow:StationCodeLIB  -6.458e-03  2.365e-03  -2.731 0.006386 ** 
-## Year_fct2018:Flow:StationCodeLIB  -2.967e-04  2.335e-04  -1.271 0.203981    
-## Year_fct2019:Flow:StationCodeLIB  -1.055e-04  1.901e-04  -0.555 0.578996    
-## Year_fct2016:Flow:StationCodeRVB  -3.828e-04  1.405e-04  -2.725 0.006496 ** 
-## Year_fct2017:Flow:StationCodeRVB  -6.439e-03  2.364e-03  -2.724 0.006523 ** 
-## Year_fct2018:Flow:StationCodeRVB  -1.798e-04  1.200e-04  -1.498 0.134302    
-## Year_fct2019:Flow:StationCodeRVB  -8.592e-05  1.054e-04  -0.816 0.414863    
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Residual standard error: 0.1663 on 1752 degrees of freedom
-## Multiple R-squared:  0.9706, Adjusted R-squared:  0.9699 
-## F-statistic:  1378 on 42 and 1752 DF,  p-value: < 2.2e-16
-
df_chla_c2_lag %>% 
-  drop_na(Chla_log, lag1, lag2, lag3) %>% 
-  plot_lm_diag(Chla_log, m_lm3_lag3)
-
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
-

-
shapiro.test(residuals(m_lm3_lag3))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_lm3_lag3)
-## W = 0.8528, p-value < 2.2e-16
-

The diagnostic plots show that the lag 3 model has similar problems -as the lag 1 model, so this doesn’t look like a good candidate for our -analysis as well.

-
-
-
-
-

GAM Model with 2-way interactions

-

Because the models using 3-way interactions don’t seem to fit the -data well, we will attempt to fit a generalized additive model (GAM) to -the data set using all two-way interactions between Year, Daily Average -Flow, and Station, with a smooth term for day of year to account for -seasonality. Initially, we’ll run the GAM without accounting for serial -autocorrelation.

-
-

Initial Model

-
m_gam2 <- gam(
-  Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(DOY, k = 5), 
-  data = df_chla_c2,
-  method = "REML"
-)
-

Lets look at the model summary and diagnostics:

-
summary(m_gam2)
-
## 
-## Family: gaussian 
-## Link function: identity 
-## 
-## Formula:
-## Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(DOY, k = 5)
-## 
-## Parametric coefficients:
-##                                Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                   9.688e+00  3.752e-02 258.201  < 2e-16 ***
-## Year_fct2016                 -5.462e-01  5.415e-02 -10.087  < 2e-16 ***
-## Year_fct2017                 -3.975e-01  5.038e-02  -7.891 5.02e-15 ***
-## Year_fct2018                 -3.166e-01  4.815e-02  -6.576 6.24e-11 ***
-## Year_fct2019                 -1.727e-01  4.848e-02  -3.562 0.000377 ***
-## Flow                         -1.460e-03  9.154e-05 -15.953  < 2e-16 ***
-## StationCodeSTTD              -9.919e-01  5.113e-02 -19.400  < 2e-16 ***
-## StationCodeLIB               -1.930e+00  1.079e-01 -17.897  < 2e-16 ***
-## StationCodeRVB               -2.140e+00  9.850e-02 -21.721  < 2e-16 ***
-## Year_fct2016:Flow            -4.570e-05  3.126e-05  -1.462 0.143979    
-## Year_fct2017:Flow            -4.495e-05  2.527e-05  -1.779 0.075411 .  
-## Year_fct2018:Flow            -1.826e-05  2.555e-05  -0.715 0.474786    
-## Year_fct2019:Flow             4.051e-06  2.720e-05   0.149 0.881632    
-## Year_fct2016:StationCodeSTTD  7.454e-01  7.498e-02   9.941  < 2e-16 ***
-## Year_fct2017:StationCodeSTTD  3.922e-01  7.886e-02   4.974 7.16e-07 ***
-## Year_fct2018:StationCodeSTTD -3.423e-01  7.218e-02  -4.743 2.26e-06 ***
-## Year_fct2019:StationCodeSTTD -9.528e-01  7.064e-02 -13.488  < 2e-16 ***
-## Year_fct2016:StationCodeLIB   1.293e+00  8.638e-02  14.969  < 2e-16 ***
-## Year_fct2017:StationCodeLIB   3.824e-01  8.524e-02   4.487 7.66e-06 ***
-## Year_fct2018:StationCodeLIB  -1.577e+00  1.060e-01 -14.875  < 2e-16 ***
-## Year_fct2019:StationCodeLIB  -5.273e-01  1.082e-01  -4.875 1.18e-06 ***
-## Year_fct2016:StationCodeRVB   8.304e-01  2.334e-01   3.557 0.000384 ***
-## Year_fct2017:StationCodeRVB   4.753e-01  1.859e-01   2.556 0.010665 *  
-## Year_fct2018:StationCodeRVB   7.054e-02  1.538e-01   0.459 0.646619    
-## Year_fct2019:StationCodeRVB  -6.698e-01  1.828e-01  -3.664 0.000255 ***
-## Flow:StationCodeSTTD          3.336e-03  1.263e-04  26.406  < 2e-16 ***
-## Flow:StationCodeLIB           1.501e-03  1.106e-04  13.572  < 2e-16 ***
-## Flow:StationCodeRVB           1.483e-03  9.028e-05  16.427  < 2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Approximate significance of smooth terms:
-##          edf Ref.df     F p-value    
-## s(DOY) 3.883  3.992 93.07  <2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## R-sq.(adj) =  0.863   Deviance explained = 86.5%
-## -REML = 906.44  Scale est. = 0.13309   n = 1921
-
appraise(m_gam2)
-

-
shapiro.test(residuals(m_gam2))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_gam2)
-## W = 0.97201, p-value < 2.2e-16
-
k.check(m_gam2)
-
##        k'      edf   k-index p-value
-## s(DOY)  4 3.883314 0.9303233       0
-
draw(m_gam2, select = 1, residuals = TRUE, rug = FALSE)
-

-
plot(m_gam2, pages = 1, all.terms = TRUE)
-

-
acf(residuals(m_gam2))
-

-
Box.test(residuals(m_gam2), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_gam2)
-## X-squared = 4292.7, df = 20, p-value < 2.2e-16
-
-
-

Model with AR() correlation structure

-

Now, we’ll try to deal with the residual autocorrelation. We’ll run -AR(1), AR(2), AR(3), and AR(4) models and compare them using AIC.

-
# Define model formula as an object
-f_gam2 <- as.formula("Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(DOY, k = 5)")
-
-# Fit original model with k = 5 and three successive AR(p) models
-m_gamm2 <- gamm(
-  f_gam2, 
-  data = df_chla_c2,
-  method = "REML"
-)
-
-m_gamm2_ar1 <- gamm(
-  f_gam2, 
-  data = df_chla_c2, 
-  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 1), # grouped by Year_fct and StationCode
-  method = "REML"
-)
-
-m_gamm2_ar2 <- gamm(
-  f_gam2, 
-  data = df_chla_c2, 
-  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 2),
-  method = "REML"
-)
-
-m_gamm2_ar3 <- gamm(
-  f_gam2, 
-  data = df_chla_c2, 
-  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 3),
-  method = "REML"
-)
-
-m_gamm2_ar4 <- gamm(
-  f_gam2, 
-  data = df_chla_c2, 
-  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 4),
-  method = "REML"
-)
-
-# Compare models
-anova(
-  m_gamm2$lme, 
-  m_gamm2_ar1$lme, 
-  m_gamm2_ar2$lme, 
-  m_gamm2_ar3$lme,
-  m_gamm2_ar4$lme
-)
-
##                 Model df       AIC       BIC    logLik   Test   L.Ratio p-value
-## m_gamm2$lme         1 31 1874.8752 2046.7822 -906.4376                         
-## m_gamm2_ar1$lme     2 32 -917.8475 -740.3950  490.9238 1 vs 2 2794.7227  <.0001
-## m_gamm2_ar2$lme     3 33 -916.3375 -733.3396  491.1687 2 vs 3    0.4900  0.4839
-## m_gamm2_ar3$lme     4 34 -917.2007 -728.6574  492.6003 3 vs 4    2.8632  0.0906
-## m_gamm2_ar4$lme     5 35 -915.4855 -721.3969  492.7428 4 vs 5    0.2848  0.5936
-

It looks like the AR(1) model has the best fit according to the AIC -and BIC values. Let’s take a closer look at the AR(1) model.

-
-
-

AR(1) Model

-
# Pull out the residuals and the GAM model
-resid_gamm2_ar1 <- residuals(m_gamm2_ar1$lme, type = "normalized")
-m_gamm2_ar1_gam <- m_gamm2_ar1$gam
-
-summary(m_gamm2_ar1_gam)
-
## 
-## Family: gaussian 
-## Link function: identity 
-## 
-## Formula:
-## Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(DOY, k = 5)
-## 
-## Parametric coefficients:
-##                                Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                   9.499e+00  2.554e-01  37.194  < 2e-16 ***
-## Year_fct2016                 -5.044e-01  3.805e-01  -1.325 0.185176    
-## Year_fct2017                 -2.380e-01  3.574e-01  -0.666 0.505493    
-## Year_fct2018                 -2.842e-01  3.516e-01  -0.808 0.418971    
-## Year_fct2019                 -4.756e-02  3.534e-01  -0.135 0.892969    
-## Flow                         -5.814e-04  1.880e-04  -3.093 0.002013 ** 
-## StationCodeSTTD              -7.963e-01  3.606e-01  -2.208 0.027339 *  
-## StationCodeLIB               -1.541e+00  3.567e-01  -4.320 1.64e-05 ***
-## StationCodeRVB               -1.802e+00  3.510e-01  -5.133 3.14e-07 ***
-## Year_fct2016:Flow            -9.464e-06  2.004e-05  -0.472 0.636788    
-## Year_fct2017:Flow            -2.725e-06  2.032e-05  -0.134 0.893321    
-## Year_fct2018:Flow            -5.029e-06  2.185e-05  -0.230 0.818003    
-## Year_fct2019:Flow            -2.100e-05  2.047e-05  -1.026 0.304988    
-## Year_fct2016:StationCodeSTTD  8.188e-01  5.352e-01   1.530 0.126195    
-## Year_fct2017:StationCodeSTTD -9.049e-02  5.460e-01  -0.166 0.868392    
-## Year_fct2018:StationCodeSTTD -4.750e-01  5.186e-01  -0.916 0.359835    
-## Year_fct2019:StationCodeSTTD -1.039e+00  5.094e-01  -2.039 0.041548 *  
-## Year_fct2016:StationCodeLIB   1.407e+00  5.202e-01   2.705 0.006900 ** 
-## Year_fct2017:StationCodeLIB   1.655e-01  5.081e-01   0.326 0.744633    
-## Year_fct2018:StationCodeLIB  -1.832e+00  5.647e-01  -3.244 0.001197 ** 
-## Year_fct2019:StationCodeLIB  -7.660e-01  5.742e-01  -1.334 0.182305    
-## Year_fct2016:StationCodeRVB   8.858e-01  5.518e-01   1.605 0.108565    
-## Year_fct2017:StationCodeRVB   1.627e-02  5.257e-01   0.031 0.975309    
-## Year_fct2018:StationCodeRVB  -3.484e-02  5.084e-01  -0.069 0.945361    
-## Year_fct2019:StationCodeRVB  -5.837e-01  5.181e-01  -1.127 0.260065    
-## Flow:StationCodeSTTD          1.710e-03  3.001e-04   5.698 1.40e-08 ***
-## Flow:StationCodeLIB           7.500e-04  1.930e-04   3.885 0.000106 ***
-## Flow:StationCodeRVB           5.801e-04  1.879e-04   3.087 0.002052 ** 
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Approximate significance of smooth terms:
-##          edf Ref.df     F p-value    
-## s(DOY) 3.815  3.815 13.05  <2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## R-sq.(adj) =  0.787   
-##   Scale est. = 0.27723   n = 1921
-
appraise(m_gamm2_ar1_gam)
-

-
shapiro.test(resid_gamm2_ar1)
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  resid_gamm2_ar1
-## W = 0.82416, p-value < 2.2e-16
-
k.check(m_gamm2_ar1_gam)
-
##        k'      edf   k-index p-value
-## s(DOY)  4 3.814541 0.7173524       0
-
draw(m_gamm2_ar1_gam, select = 1, residuals = TRUE, rug = FALSE)
-

-
plot(m_gamm2_ar1_gam, pages = 1, all.terms = TRUE)
-

-
acf(resid_gamm2_ar1)
-

-
Box.test(resid_gamm2_ar1, lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  resid_gamm2_ar1
-## X-squared = 48.902, df = 20, p-value = 0.0003175
-

The AR(1) model has much less residual autocorrelation, and the -diagnostics plots look okay. Let’s take a closer look at how the -back-transformed fitted values from the model match the observed -values.

-
df_gamm2_ar1_fit <- df_chla_c2 %>% 
-  fitted_values(m_gamm2_ar1_gam, data = .) %>% 
-  mutate(fitted_bt = exp(fitted) / 1000)
-
-plt_gamm2_ar1_obs_fit <- df_gamm2_ar1_fit %>% 
-  ggplot(aes(x = fitted_bt, y = Chla)) +
-  geom_point() +
-  geom_abline(slope = 1, intercept = 0, color = "red") +
-  theme_bw() +
-  labs(
-    x = "Back-transformed Fitted Values",
-    y = "Observed Values"
-  )
-
-plt_gamm2_ar1_obs_fit
-

-

Let’s look at each Station and Year separately.

-
plt_gamm2_ar1_obs_fit + facet_wrap(vars(StationCode), scales = "free")
-

-
plt_gamm2_ar1_obs_fit + facet_wrap(vars(Year_fct), scales = "free")
-

-

The red lines are the 1:1 ratio between fitted and observed values, -and we would expect for most of the points to be near this line if the -model has a good fit to the observed data. However, the fitted values -from the model don’t appear to match the observed values that well. -Again, I don’t think this GAM model is a good candidate to use for our -analysis.

-
-
-
-

Linear Model with 2-way interactions

-

Since the models using 3-way interactions don’t seem to fit the -observed data that well, let’s try a linear model using all two-way -interactions between Year, Daily Average Flow, and Station. Initially, -we’ll run the model without accounting for serial autocorrelation.

-
-

Initial Model

-
m_lm2 <- df_chla_c2_lag %>% 
-  drop_na(Chla_log) %>% 
-  lm(Chla_log ~ (Year_fct + Flow + StationCode)^2, data = .)
-

Lets look at the model summary and diagnostics:

-
summary(m_lm2)
-
## 
-## Call:
-## lm(formula = Chla_log ~ (Year_fct + Flow + StationCode)^2, data = .)
-## 
-## Residuals:
-##     Min      1Q  Median      3Q     Max 
-## -1.5367 -0.2393 -0.0399  0.2061  1.9457 
-## 
-## Coefficients:
-##                                Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                   9.633e+00  4.043e-02 238.279  < 2e-16 ***
-## Year_fct2016                 -3.556e-01  5.747e-02  -6.188 7.43e-10 ***
-## Year_fct2017                 -3.852e-01  5.474e-02  -7.038 2.71e-12 ***
-## Year_fct2018                 -2.924e-01  5.249e-02  -5.569 2.92e-08 ***
-## Year_fct2019                 -1.411e-01  5.280e-02  -2.671 0.007619 ** 
-## Flow                         -1.529e-03  9.697e-05 -15.764  < 2e-16 ***
-## StationCodeSTTD              -9.934e-01  5.570e-02 -17.835  < 2e-16 ***
-## StationCodeLIB               -2.106e+00  1.139e-01 -18.485  < 2e-16 ***
-## StationCodeRVB               -2.159e+00  1.073e-01 -20.117  < 2e-16 ***
-## Year_fct2016:Flow            -1.249e-04  3.167e-05  -3.945 8.28e-05 ***
-## Year_fct2017:Flow            -7.386e-05  2.731e-05  -2.704 0.006909 ** 
-## Year_fct2018:Flow            -1.076e-05  2.737e-05  -0.393 0.694152    
-## Year_fct2019:Flow            -1.445e-05  2.928e-05  -0.494 0.621605    
-## Year_fct2016:StationCodeSTTD  7.414e-01  8.173e-02   9.072  < 2e-16 ***
-## Year_fct2017:StationCodeSTTD  5.161e-01  8.550e-02   6.037 1.89e-09 ***
-## Year_fct2018:StationCodeSTTD -3.377e-01  7.825e-02  -4.316 1.67e-05 ***
-## Year_fct2019:StationCodeSTTD -9.968e-01  7.700e-02 -12.945  < 2e-16 ***
-## Year_fct2016:StationCodeLIB   1.200e+00  9.360e-02  12.823  < 2e-16 ***
-## Year_fct2017:StationCodeLIB   4.159e-01  9.208e-02   4.517 6.67e-06 ***
-## Year_fct2018:StationCodeLIB  -1.560e+00  1.125e-01 -13.868  < 2e-16 ***
-## Year_fct2019:StationCodeLIB  -3.847e-01  1.150e-01  -3.346 0.000837 ***
-## Year_fct2016:StationCodeRVB   1.489e+00  2.227e-01   6.686 3.02e-11 ***
-## Year_fct2017:StationCodeRVB   6.563e-01  1.998e-01   3.285 0.001037 ** 
-## Year_fct2018:StationCodeRVB  -8.074e-02  1.636e-01  -0.494 0.621625    
-## Year_fct2019:StationCodeRVB  -6.171e-01  1.962e-01  -3.145 0.001686 ** 
-## Flow:StationCodeSTTD          3.308e-03  1.377e-04  24.020  < 2e-16 ***
-## Flow:StationCodeLIB           1.427e-03  1.140e-04  12.513  < 2e-16 ***
-## Flow:StationCodeRVB           1.565e-03  9.550e-05  16.386  < 2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Residual standard error: 0.3979 on 1893 degrees of freedom
-## Multiple R-squared:  0.8389, Adjusted R-squared:  0.8366 
-## F-statistic: 365.2 on 27 and 1893 DF,  p-value: < 2.2e-16
-
df_chla_c2_lag %>% 
-  drop_na(Chla_log) %>% 
-  plot_lm_diag(Chla_log, m_lm2)
-
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
-

-
shapiro.test(residuals(m_lm2))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_lm2)
-## W = 0.97046, p-value < 2.2e-16
-
acf(residuals(m_lm2))
-

-
Box.test(residuals(m_lm2), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm2)
-## X-squared = 4904.7, df = 20, p-value < 2.2e-16
-

The residuals deviate from a normal distribution according to visual -inspection and the Shapiro-Wilk normality test. Also, model definitely -has residual autocorrelation as indicated by the ACF plot and the -Box-Ljung test.

-
-

Model with lag terms

-

Now, we’ll try to deal with the residual autocorrelation and the -non-normal residuals. We’ll run a series of linear models adding 1, 2, -and 3 lag terms and compare how well they correct for -autocorrelation.

-
-

Lag 1

-
m_lm2_lag1 <- df_chla_c2_lag %>% 
-  drop_na(Chla_log, lag1) %>% 
-  lm(Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1, data = .)
-
-acf(residuals(m_lm2_lag1))
-

-
Box.test(residuals(m_lm2_lag1), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm2_lag1)
-## X-squared = 58.308, df = 20, p-value = 1.297e-05
-
-
-

Lag 2

-
m_lm2_lag2 <- df_chla_c2_lag %>% 
-  drop_na(Chla_log, lag1, lag2) %>% 
-  lm(Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + lag2, data = .)
-
-acf(residuals(m_lm2_lag2))
-

-
Box.test(residuals(m_lm2_lag2), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm2_lag2)
-## X-squared = 34.749, df = 20, p-value = 0.02148
-
-
-

Lag 3

-
m_lm2_lag3 <- df_chla_c2_lag %>% 
-  drop_na(Chla_log, lag1, lag2, lag3) %>% 
-  lm(Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + lag2 + lag3, data = .)
-
-acf(residuals(m_lm2_lag3))
-

-
Box.test(residuals(m_lm2_lag3), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm2_lag3)
-## X-squared = 28.624, df = 20, p-value = 0.09542
-

The model with 3 lag terms looks to have slightly better ACF and -Box-Ljung test results than the other models. Let’s compare the 3 models -using AIC and BIC to see which model those prefer.

-
-
-

Compare models

-
AIC(m_lm2_lag1, m_lm2_lag2, m_lm2_lag3)
-
##            df       AIC
-## m_lm2_lag1 30 -1331.263
-## m_lm2_lag2 31 -1317.974
-## m_lm2_lag3 32 -1299.688
-
BIC(m_lm2_lag1, m_lm2_lag2, m_lm2_lag3)
-
##            df       BIC
-## m_lm2_lag1 30 -1165.189
-## m_lm2_lag2 31 -1147.083
-## m_lm2_lag3 32 -1123.919
-

Both AIC and BIC prefer the model with 1 lag term. Let’s take a -closer look at that one.

-
-
-
-

Lag 1 model summary

-
summary(m_lm2_lag1)
-
## 
-## Call:
-## lm(formula = Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1, 
-##     data = .)
-## 
-## Residuals:
-##      Min       1Q   Median       3Q      Max 
-## -1.10990 -0.05742 -0.00637  0.05263  1.08891 
-## 
-## Coefficients:
-##                                Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                   8.621e-01  9.706e-02   8.882  < 2e-16 ***
-## Year_fct2016                 -4.466e-02  2.467e-02  -1.810 0.070432 .  
-## Year_fct2017                 -3.920e-02  2.357e-02  -1.664 0.096380 .  
-## Year_fct2018                 -2.570e-02  2.248e-02  -1.143 0.253192    
-## Year_fct2019                  4.008e-04  2.248e-02   0.018 0.985774    
-## Flow                         -1.624e-04  4.370e-05  -3.717 0.000208 ***
-## StationCodeSTTD              -8.197e-02  2.571e-02  -3.188 0.001456 ** 
-## StationCodeLIB               -1.210e-01  5.403e-02  -2.239 0.025270 *  
-## StationCodeRVB               -1.584e-01  5.049e-02  -3.137 0.001736 ** 
-## lag1                          9.106e-01  9.917e-03  91.816  < 2e-16 ***
-## Year_fct2016:Flow            -7.998e-06  1.364e-05  -0.586 0.557696    
-## Year_fct2017:Flow            -1.197e-06  1.167e-05  -0.103 0.918296    
-## Year_fct2018:Flow             3.085e-06  1.166e-05   0.265 0.791304    
-## Year_fct2019:Flow             4.247e-07  1.252e-05   0.034 0.972946    
-## Year_fct2016:StationCodeSTTD  6.232e-02  3.561e-02   1.750 0.080298 .  
-## Year_fct2017:StationCodeSTTD  7.283e-02  3.689e-02   1.974 0.048527 *  
-## Year_fct2018:StationCodeSTTD -3.915e-02  3.345e-02  -1.170 0.242020    
-## Year_fct2019:StationCodeSTTD -1.145e-01  3.422e-02  -3.345 0.000838 ***
-## Year_fct2016:StationCodeLIB   9.269e-02  4.175e-02   2.220 0.026522 *  
-## Year_fct2017:StationCodeLIB   2.497e-02  3.947e-02   0.632 0.527175    
-## Year_fct2018:StationCodeLIB  -2.167e-01  5.218e-02  -4.152 3.45e-05 ***
-## Year_fct2019:StationCodeLIB  -6.106e-02  5.164e-02  -1.182 0.237206    
-## Year_fct2016:StationCodeRVB   1.449e-01  9.716e-02   1.491 0.136079    
-## Year_fct2017:StationCodeRVB   5.911e-02  8.512e-02   0.694 0.487457    
-## Year_fct2018:StationCodeRVB  -5.716e-03  6.935e-02  -0.082 0.934320    
-## Year_fct2019:StationCodeRVB  -4.787e-02  8.441e-02  -0.567 0.570736    
-## Flow:StationCodeSTTD          3.526e-04  6.676e-05   5.281 1.43e-07 ***
-## Flow:StationCodeLIB           1.992e-04  5.072e-05   3.928 8.89e-05 ***
-## Flow:StationCodeRVB           1.567e-04  4.326e-05   3.622 0.000300 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Residual standard error: 0.1682 on 1845 degrees of freedom
-## Multiple R-squared:  0.9706, Adjusted R-squared:  0.9701 
-## F-statistic:  2172 on 28 and 1845 DF,  p-value: < 2.2e-16
-
df_chla_c2_lag %>% 
-  drop_na(Chla_log, lag1) %>% 
-  plot_lm_diag(Chla_log, m_lm2_lag1)
-
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
-

-
shapiro.test(residuals(m_lm2_lag1))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_lm2_lag1)
-## W = 0.86339, p-value < 2.2e-16
-

As with the model with 3-way interactions, the residuals deviate from -a normal distribution particularly at the both tails of the -distribution. I’m not so sure about using this model. Let’s take a -closer look at how the back-transformed fitted values from the model -match the observed values.

-
df_lm2_lag1_fit <- df_chla_c2_lag %>% 
-  drop_na(Chla, lag1) %>% 
-  mutate(Fitted = as_tibble(predict(m_lm2_lag1, interval = "confidence"))) %>% 
-  unnest_wider(Fitted) %>% 
-  mutate(across(c("fit", "lwr", "upr"), ~ exp(.x) / 1000))
-
-plt_lm2_lag1_obs_fit <- df_lm2_lag1_fit %>% 
-  ggplot(aes(x = fit, y = Chla)) +
-  geom_point() +
-  geom_abline(slope = 1, intercept = 0, color = "red") +
-  theme_bw() +
-  labs(
-    x = "Back-transformed Fitted Values",
-    y = "Observed Values"
-  )
-
-plt_lm2_lag1_obs_fit
-

-

Let’s look at each Station and Year separately.

-
plt_lm2_lag1_obs_fit + facet_wrap(vars(StationCode), scales = "free")
-

-
plt_lm2_lag1_obs_fit + facet_wrap(vars(Year_fct), scales = "free")
-

-

The red lines are the 1:1 ratio between fitted and observed values, -and we would expect for most of the points to be near this line if the -model has a good fit to the observed data. The fitted and observed -values appear to match pretty well at the lower end of the range of -values, but this deteriorates at the mid and higher range values. This -pattern holds for some of the individual Stations and Years. I’m not -sure this linear model is a good candidate to use for our analysis.

-
-
-

Lag 3 model summary

-

Let’s take a closer look at the model with 3 lag terms since that had -the least autocorrelation.

-
summary(m_lm2_lag3)
-
## 
-## Call:
-## lm(formula = Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + 
-##     lag2 + lag3, data = .)
-## 
-## Residuals:
-##      Min       1Q   Median       3Q      Max 
-## -1.11082 -0.05607 -0.00690  0.05255  1.12323 
-## 
-## Coefficients:
-##                                Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)                   9.973e-01  1.037e-01   9.616  < 2e-16 ***
-## Year_fct2016                 -4.170e-02  2.478e-02  -1.682 0.092657 .  
-## Year_fct2017                 -3.659e-02  2.365e-02  -1.547 0.122072    
-## Year_fct2018                 -2.643e-02  2.255e-02  -1.172 0.241268    
-## Year_fct2019                  4.395e-03  2.253e-02   0.195 0.845349    
-## Flow                         -1.748e-04  4.378e-05  -3.994 6.77e-05 ***
-## StationCodeSTTD              -8.615e-02  2.611e-02  -3.299 0.000988 ***
-## StationCodeLIB               -1.599e-01  5.592e-02  -2.858 0.004309 ** 
-## StationCodeRVB               -1.834e-01  5.086e-02  -3.607 0.000319 ***
-## lag1                          9.620e-01  2.396e-02  40.159  < 2e-16 ***
-## lag2                         -7.617e-03  3.309e-02  -0.230 0.817974    
-## lag3                         -5.869e-02  2.395e-02  -2.450 0.014368 *  
-## Year_fct2016:Flow            -9.558e-06  1.419e-05  -0.673 0.500724    
-## Year_fct2017:Flow            -2.255e-06  1.164e-05  -0.194 0.846434    
-## Year_fct2018:Flow             3.211e-06  1.161e-05   0.277 0.782103    
-## Year_fct2019:Flow             5.258e-06  1.279e-05   0.411 0.680965    
-## Year_fct2016:StationCodeSTTD  7.111e-02  3.603e-02   1.973 0.048606 *  
-## Year_fct2017:StationCodeSTTD  7.658e-02  3.753e-02   2.041 0.041445 *  
-## Year_fct2018:StationCodeSTTD -4.966e-02  3.363e-02  -1.477 0.139987    
-## Year_fct2019:StationCodeSTTD -1.379e-01  3.461e-02  -3.984 7.05e-05 ***
-## Year_fct2016:StationCodeLIB   1.011e-01  4.269e-02   2.369 0.017957 *  
-## Year_fct2017:StationCodeLIB   2.753e-02  3.966e-02   0.694 0.487626    
-## Year_fct2018:StationCodeLIB  -2.237e-01  5.530e-02  -4.045 5.46e-05 ***
-## Year_fct2019:StationCodeLIB  -6.390e-02  5.763e-02  -1.109 0.267669    
-## Year_fct2016:StationCodeRVB   1.545e-01  1.041e-01   1.483 0.138162    
-## Year_fct2017:StationCodeRVB   5.969e-02  8.535e-02   0.699 0.484390    
-## Year_fct2018:StationCodeRVB  -1.333e-02  6.896e-02  -0.193 0.846691    
-## Year_fct2019:StationCodeRVB  -1.159e-01  8.851e-02  -1.310 0.190487    
-## Flow:StationCodeSTTD          3.796e-04  6.738e-05   5.634 2.04e-08 ***
-## Flow:StationCodeLIB           1.999e-04  5.141e-05   3.888 0.000105 ***
-## Flow:StationCodeRVB           1.698e-04  4.333e-05   3.919 9.23e-05 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Residual standard error: 0.1669 on 1764 degrees of freedom
-## Multiple R-squared:  0.9702, Adjusted R-squared:  0.9697 
-## F-statistic:  1914 on 30 and 1764 DF,  p-value: < 2.2e-16
-
df_chla_c2_lag %>% 
-  drop_na(Chla_log, lag1, lag2, lag3) %>% 
-  plot_lm_diag(Chla_log, m_lm2_lag3)
-
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
-

-
shapiro.test(residuals(m_lm2_lag3))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_lm2_lag3)
-## W = 0.85552, p-value < 2.2e-16
-

The diagnostic plots show that the lag 3 model has similar problems -as the lag 1 model, so this doesn’t look like a good candidate for our -analysis as well.

-
-
-
-
-

Linear Model with single station - STTD

-

Since none of the models we tried so far seemed to fit the observed -data very well, it may be interesting to run separate models for each of -the four stations - RD22, STTD, LIB, RVB - to see if that helps improve -model fit. We’ll try STTD as a test case.

-
-

Initial Model

-

Let’s try a linear model using a two-way interaction between Year and -Daily Average Flow. Initially, we’ll run the model without accounting -for serial autocorrelation.

-
df_sttd <- df_chla_c2_lag %>% filter(StationCode == "STTD")
-
-m_lm2_sttd <- df_sttd %>% 
-  drop_na(Chla_log) %>% 
-  lm(Chla_log ~ Year_fct * Flow, data = .)
-

Lets look at the model summary and diagnostics:

-
summary(m_lm2_sttd)
-
## 
-## Call:
-## lm(formula = Chla_log ~ Year_fct * Flow, data = .)
-## 
-## Residuals:
-##      Min       1Q   Median       3Q      Max 
-## -0.78556 -0.17235 -0.02937  0.13767  1.54418 
-## 
-## Coefficients:
-##                     Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)        8.6179166  0.0295372 291.765  < 2e-16 ***
-## Year_fct2016       0.4259203  0.0459339   9.272  < 2e-16 ***
-## Year_fct2017       0.3103625  0.0675065   4.598 5.66e-06 ***
-## Year_fct2018      -0.6462877  0.0456684 -14.152  < 2e-16 ***
-## Year_fct2019      -1.0607084  0.0444363 -23.870  < 2e-16 ***
-## Flow               0.0027158  0.0002046  13.273  < 2e-16 ***
-## Year_fct2016:Flow -0.0013946  0.0002864  -4.869 1.59e-06 ***
-## Year_fct2017:Flow  0.0051503  0.0017706   2.909  0.00382 ** 
-## Year_fct2018:Flow -0.0004084  0.0002580  -1.583  0.11417    
-## Year_fct2019:Flow -0.0014040  0.0002321  -6.051 3.19e-09 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Residual standard error: 0.3032 on 421 degrees of freedom
-## Multiple R-squared:  0.8091, Adjusted R-squared:  0.805 
-## F-statistic: 198.2 on 9 and 421 DF,  p-value: < 2.2e-16
-
df_sttd %>% 
-  drop_na(Chla_log) %>% 
-  plot_lm_diag(Chla_log, m_lm2_sttd)
-
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
-

-
shapiro.test(residuals(m_lm2_sttd))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_lm2_sttd)
-## W = 0.94653, p-value = 2.342e-11
-
acf(residuals(m_lm2_sttd))
-

-
Box.test(residuals(m_lm2_sttd), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm2_sttd)
-## X-squared = 463.96, df = 20, p-value < 2.2e-16
-

The residuals deviate from a normal distribution according to visual -inspection and the Shapiro-Wilk normality test. Also, model definitely -has residual autocorrelation as indicated by the ACF plot and the -Box-Ljung test.

-
-

Model with lag terms

-

Now, we’ll try to deal with the residual autocorrelation and the -non-normal residuals. We’ll run a series of linear models adding 1, 2, -and 3 lag terms and compare how well they correct for -autocorrelation.

-
-

Lag 1

-
m_lm2_sttd_lag1 <- df_sttd %>% 
-  drop_na(Chla_log, lag1) %>% 
-  lm(Chla_log ~ Year_fct * Flow + lag1, data = .)
-
-acf(residuals(m_lm2_sttd_lag1))
-

-
Box.test(residuals(m_lm2_sttd_lag1), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm2_sttd_lag1)
-## X-squared = 37.182, df = 20, p-value = 0.01113
-
-
-

Lag 2

-
m_lm2_sttd_lag2 <- df_sttd %>% 
-  drop_na(Chla_log, lag1, lag2) %>% 
-  lm(Chla_log ~ Year_fct * Flow + lag1 + lag2, data = .)
-
-acf(residuals(m_lm2_sttd_lag2))
-

-
Box.test(residuals(m_lm2_sttd_lag2), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm2_sttd_lag2)
-## X-squared = 18.739, df = 20, p-value = 0.5389
-
-
-

Lag 3

-
m_lm2_sttd_lag3 <- df_sttd %>% 
-  drop_na(Chla_log, lag1, lag2, lag3) %>% 
-  lm(Chla_log ~ Year_fct * Flow + lag1 + lag2 + lag3, data = .)
-
-acf(residuals(m_lm2_sttd_lag3))
-

-
Box.test(residuals(m_lm2_sttd_lag3), lag = 20, type = 'Ljung-Box')
-
## 
-##  Box-Ljung test
-## 
-## data:  residuals(m_lm2_sttd_lag3)
-## X-squared = 10.814, df = 20, p-value = 0.9509
-

The model with 2 lag terms seems to take care of the autocorrelation. -Let’s compare the 3 models using AIC and BIC to see which model those -prefer.

-
-
-

Compare models

-
AIC(m_lm2_sttd_lag1, m_lm2_sttd_lag2, m_lm2_sttd_lag3)
-
##                 df       AIC
-## m_lm2_sttd_lag1 12 -328.3102
-## m_lm2_sttd_lag2 13 -350.6190
-## m_lm2_sttd_lag3 14 -340.5244
-
BIC(m_lm2_sttd_lag1, m_lm2_sttd_lag2, m_lm2_sttd_lag3)
-
##                 df       BIC
-## m_lm2_sttd_lag1 12 -279.7701
-## m_lm2_sttd_lag2 13 -298.3142
-## m_lm2_sttd_lag3 14 -284.4699
-

Both AIC and BIC prefer the model with 2 lag terms. Let’s take a -closer look at that one.

-
-
-
-

Lag 2 model summary

-
summary(m_lm2_sttd_lag2)
-
## 
-## Call:
-## lm(formula = Chla_log ~ Year_fct * Flow + lag1 + lag2, data = .)
-## 
-## Residuals:
-##      Min       1Q   Median       3Q      Max 
-## -0.88419 -0.06597 -0.00902  0.05115  0.81167 
-## 
-## Coefficients:
-##                     Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)        1.613e+00  2.285e-01   7.059 7.43e-12 ***
-## Year_fct2016       6.865e-02  2.667e-02   2.574 0.010415 *  
-## Year_fct2017       1.177e-01  3.644e-02   3.230 0.001339 ** 
-## Year_fct2018      -1.292e-01  2.907e-02  -4.445 1.14e-05 ***
-## Year_fct2019      -2.085e-01  3.616e-02  -5.767 1.61e-08 ***
-## Flow               5.136e-04  1.324e-04   3.878 0.000123 ***
-## lag1               1.109e+00  4.925e-02  22.525  < 2e-16 ***
-## lag2              -2.955e-01  4.819e-02  -6.132 2.07e-09 ***
-## Year_fct2016:Flow -2.826e-04  1.555e-04  -1.818 0.069803 .  
-## Year_fct2017:Flow  2.695e-03  1.001e-03   2.692 0.007402 ** 
-## Year_fct2018:Flow -5.969e-05  1.375e-04  -0.434 0.664474    
-## Year_fct2019:Flow -2.504e-04  1.302e-04  -1.922 0.055260 .  
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Residual standard error: 0.1556 on 401 degrees of freedom
-## Multiple R-squared:  0.9497, Adjusted R-squared:  0.9483 
-## F-statistic: 687.9 on 11 and 401 DF,  p-value: < 2.2e-16
-
df_sttd %>% 
-  drop_na(Chla_log, lag1, lag2) %>% 
-  plot_lm_diag(Chla_log, m_lm2_sttd_lag2)
-
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
-

-
shapiro.test(residuals(m_lm2_sttd_lag2))
-
## 
-##  Shapiro-Wilk normality test
-## 
-## data:  residuals(m_lm2_sttd_lag2)
-## W = 0.86609, p-value < 2.2e-16
-

As with all the other linear models, the residuals deviate from a -normal distribution particularly at the both tails of the distribution. -I’m not so sure about using this model. Let’s take a closer look at how -the back-transformed fitted values from the model match the observed -values.

-
df_lm2_sttd_lag2_fit <- df_sttd %>% 
-  drop_na(Chla, lag1, lag2) %>% 
-  mutate(Fitted = as_tibble(predict(m_lm2_sttd_lag2, interval = "confidence"))) %>% 
-  unnest_wider(Fitted) %>% 
-  mutate(across(c("fit", "lwr", "upr"), ~ exp(.x) / 1000))
-
-plt_lm2_sttd_lag2_obs_fit <- df_lm2_sttd_lag2_fit %>% 
-  ggplot(aes(x = fit, y = Chla)) +
-  geom_point() +
-  geom_abline(slope = 1, intercept = 0, color = "red") +
-  theme_bw() +
-  labs(
-    x = "Back-transformed Fitted Values",
-    y = "Observed Values"
-  )
-
-plt_lm2_sttd_lag2_obs_fit
-

-

Let’s look at each Year separately.

-
plt_lm2_sttd_lag2_obs_fit + facet_wrap(vars(Year_fct), scales = "free")
-

-

The red lines are the 1:1 ratio between fitted and observed values, -and we would expect for most of the points to be near this line if the -model has a good fit to the observed data. The fitted and observed -values appear to match pretty well at the lower end of the range of -values, but this deteriorates at the higher values. This pattern holds -for some of the individual Years. I’m not sure this linear model is a -good candidate to use for our analysis.

-
-
-
-
-

Conclusion

-

None of the models we tried in this analysis seemed to fit the -observed data very well. It looks like daily average flow has an effect -on chlorophyll concentrations at some of the stations, but flow doesn’t -seem to be a good predictor overall. Unfortunately, I don’t think we -should use any of these models in our analysis.

-
- - - -
-
- -
- - - - - - - - - - - - - - - - - diff --git a/docs/rtm_chl_models_flow_daily_avg.html b/docs/rtm_chl_models_flow_daily_avg.html new file mode 100644 index 0000000..a54b7f4 --- /dev/null +++ b/docs/rtm_chl_models_flow_daily_avg.html @@ -0,0 +1,3515 @@ + + + + + + + + + + + + + + + +NDFS Synthesis Manuscript: Chlorophyll analysis + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+
+ +
+ + + + + + + +
+

Purpose

+

Explore and analyze the continuous chlorophyll data to be included in +the NDFS synthesis manuscript. We will attempt to fit various models to +the data set using daily average flow as a continuous predictor which +replaces the categorical predictor - flow action period - in the +original analysis. These models will only include representative +stations for 4 habitat types - upstream (RD22), lower Yolo Bypass +(STTD), Cache Slough complex (LIB), and downstream (RVB).

+
+
+

Global code and functions

+
# Load packages
+library(tidyverse)
+library(scales)
+library(knitr)
+library(mgcv)
+library(car)
+library(gratia)
+library(here)
+library(conflicted)
+
+# Source functions
+source(here("manuscript_synthesis/src/global_functions.R"))
+
+# Declare package conflict preferences 
+conflicts_prefer(dplyr::filter(), dplyr::lag())
+

Display current versions of R and packages used for this +analysis:

+
devtools::session_info()
+
## ─ Session info ───────────────────────────────────────────────────────────────
+##  setting  value
+##  version  R version 4.2.3 (2023-03-15 ucrt)
+##  os       Windows 10 x64 (build 19044)
+##  system   x86_64, mingw32
+##  ui       RTerm
+##  language (EN)
+##  collate  English_United States.utf8
+##  ctype    English_United States.utf8
+##  tz       America/Los_Angeles
+##  date     2023-12-11
+##  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
+## 
+## ─ Packages ───────────────────────────────────────────────────────────────────
+##  package     * version date (UTC) lib source
+##  abind         1.4-5   2016-07-21 [1] CRAN (R 4.2.0)
+##  bslib         0.4.2   2022-12-16 [1] CRAN (R 4.2.2)
+##  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.2.3)
+##  callr         3.7.3   2022-11-02 [1] CRAN (R 4.2.2)
+##  car         * 3.1-2   2023-03-30 [1] CRAN (R 4.2.3)
+##  carData     * 3.0-5   2022-01-06 [1] CRAN (R 4.2.1)
+##  cli           3.6.1   2023-03-23 [1] CRAN (R 4.2.3)
+##  colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.2.2)
+##  conflicted  * 1.2.0   2023-02-01 [1] CRAN (R 4.2.2)
+##  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.2.1)
+##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.2.1)
+##  digest        0.6.33  2023-07-07 [1] CRAN (R 4.2.3)
+##  dplyr       * 1.1.3   2023-09-03 [1] CRAN (R 4.2.3)
+##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.1)
+##  evaluate      0.21    2023-05-05 [1] CRAN (R 4.2.3)
+##  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.2)
+##  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.2.2)
+##  forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.2.2)
+##  fs          * 1.6.3   2023-07-20 [1] CRAN (R 4.2.3)
+##  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.1)
+##  ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.2.3)
+##  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.1)
+##  gratia      * 0.8.1   2023-02-02 [1] CRAN (R 4.2.2)
+##  gtable        0.3.4   2023-08-21 [1] CRAN (R 4.2.3)
+##  here        * 1.0.1   2020-12-13 [1] CRAN (R 4.2.1)
+##  hms           1.1.3   2023-03-21 [1] CRAN (R 4.2.3)
+##  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.2.3)
+##  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.2.3)
+##  httpuv        1.6.9   2023-02-14 [1] CRAN (R 4.2.2)
+##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.2.1)
+##  jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.2.3)
+##  knitr       * 1.42    2023-01-25 [1] CRAN (R 4.2.2)
+##  later         1.3.0   2021-08-18 [1] CRAN (R 4.2.1)
+##  lattice       0.21-8  2023-04-05 [1] CRAN (R 4.2.3)
+##  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.1)
+##  lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.2.3)
+##  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.1)
+##  Matrix        1.5-4   2023-04-04 [1] CRAN (R 4.2.3)
+##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.1)
+##  mgcv        * 1.8-42  2023-03-02 [1] CRAN (R 4.2.2)
+##  mime          0.12    2021-09-28 [1] CRAN (R 4.2.0)
+##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.2.1)
+##  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.1)
+##  mvnfast       0.2.8   2023-02-23 [1] CRAN (R 4.2.2)
+##  nlme        * 3.1-162 2023-01-31 [2] CRAN (R 4.2.3)
+##  patchwork   * 1.1.2   2022-08-19 [1] CRAN (R 4.2.1)
+##  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.2.3)
+##  pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 4.2.3)
+##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.1)
+##  pkgload       1.3.2.1 2023-07-08 [1] CRAN (R 4.2.3)
+##  prettyunits   1.2.0   2023-09-24 [1] CRAN (R 4.2.3)
+##  processx      3.8.2   2023-06-30 [1] CRAN (R 4.2.3)
+##  profvis       0.3.7   2020-11-02 [1] CRAN (R 4.2.1)
+##  promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.2.1)
+##  ps            1.7.5   2023-04-18 [1] CRAN (R 4.2.3)
+##  purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.2.3)
+##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.1)
+##  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.2.3)
+##  readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.2.2)
+##  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.2.1)
+##  rlang       * 1.1.1   2023-04-28 [1] CRAN (R 4.2.3)
+##  rmarkdown     2.21    2023-03-26 [1] CRAN (R 4.2.3)
+##  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.2.1)
+##  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.1)
+##  sass          0.4.6   2023-05-03 [1] CRAN (R 4.2.3)
+##  scales      * 1.2.1   2022-08-20 [1] CRAN (R 4.2.1)
+##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.1)
+##  shiny         1.7.4   2022-12-15 [1] CRAN (R 4.2.2)
+##  stringi       1.7.12  2023-01-11 [1] CRAN (R 4.2.2)
+##  stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.2.2)
+##  tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.2.3)
+##  tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.2.2)
+##  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.1)
+##  tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.2.2)
+##  timechange    0.2.0   2023-01-11 [1] CRAN (R 4.2.2)
+##  tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.2.3)
+##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.2.1)
+##  usethis       2.1.6   2022-05-25 [1] CRAN (R 4.2.1)
+##  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.2.2)
+##  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.2.3)
+##  withr         2.5.1   2023-09-26 [1] CRAN (R 4.2.3)
+##  xfun          0.39    2023-04-20 [1] CRAN (R 4.2.3)
+##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.2.1)
+##  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.2)
+## 
+##  [1] C:/R/win-library/4.2
+##  [2] C:/Program Files/R/R-4.2.3/library
+## 
+## ──────────────────────────────────────────────────────────────────────────────
+
+
+

Import Data

+
# Define file path for processed data
+fp_data <- here("manuscript_synthesis/data/processed")
+
+# Import daily average chlorophyll data
+df_chla <- readRDS(file.path(fp_data, "chla_wt_daily_avg_2013-2019.rds")) %>% select(-WaterTemp)
+
+# Import daily average flow data
+df_flow <- readRDS(file.path(fp_data, "flow_daily_avg_2013-2019.rds"))
+
+
+

Prepare Data

+
# Create a vector for the factor order of StationCode
+sta_order <- c("RD22", "STTD", "LIB", "RVB")
+
+# We will use LIS flow data as a proxy for STTD
+df_flow_c <- df_flow %>% mutate(StationCode = if_else(StationCode == "LIS", "STTD", StationCode))
+
+# Prepare chlorophyll and flow data for exploration and analysis
+df_chla_c1 <- df_chla %>% 
+  # Filter to only include representative stations for 4 habitat types - RD22, STTD, LIB, RVB
+  filter(StationCode %in% sta_order) %>% 
+  # Join flow data to chlorophyll data
+  left_join(df_flow_c, by = join_by(StationCode, Date)) %>% 
+  # Remove all NA flow values
+  drop_na(Flow) %>% 
+  mutate(
+    # Scale and log transform chlorophyll values
+    Chla_log = log(Chla * 1000),
+    # Add Year and DOY variables
+    Year = year(Date),
+    DOY = yday(Date),
+    # Apply factor order to StationCode
+    StationCode = factor(StationCode, levels = sta_order),
+    # Add a column for Year as a factor for the model
+    Year_fct = factor(Year)
+  ) %>% 
+  arrange(StationCode, Date)
+
+
+

Explore sample counts by Station

+
df_chla_c1 %>% 
+  summarize(
+    min_date = min(Date),
+    max_date = max(Date),
+    num_samples = n(),
+    .by = c(StationCode, Year)
+  ) %>% 
+  arrange(StationCode, Year) %>% 
+  kable()
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
StationCodeYearmin_datemax_datenum_samples
RD2220142014-10-012014-11-0839
RD2220152015-07-242015-11-10110
RD2220162016-06-232016-09-1686
RD2220172017-07-142017-11-03101
RD2220182018-07-132018-11-11122
RD2220192019-07-112019-11-06118
STTD20132013-08-152013-11-0478
STTD20142014-07-252014-10-1987
STTD20152015-07-272015-11-16108
STTD20162016-06-232016-09-1683
STTD20172017-07-142017-09-2556
STTD20182018-07-202018-10-1184
STTD20192019-07-262019-11-06100
LIB20142014-10-012014-11-0835
LIB20152015-07-062015-11-16123
LIB20162016-05-292016-09-1699
LIB20172017-07-142017-11-0189
LIB20182018-08-222018-10-2226
LIB20192019-07-112019-09-2121
RVB20132013-07-072013-11-17134
RVB20142014-07-252014-11-08105
RVB20152015-07-062015-11-16134
RVB20162016-05-292016-09-1680
RVB20172017-07-142017-11-03113
RVB20182018-07-132018-11-11122
RVB20192019-07-112019-11-0699
+

Looking at the sample counts and date ranges, we’ll only include +years 2015-2019 for the analysis.

+
df_chla_c2 <- df_chla_c1 %>% 
+  filter(Year %in% 2015:2019) %>% 
+  mutate(Year_fct = fct_drop(Year_fct))
+
+# Create another dataframe that has up to 3 lag variables for chlorophyll to be
+  # used in the linear models
+df_chla_c2_lag <- df_chla_c2 %>% 
+  # Fill in missing days for each StationCode-Year combination
+  group_by(StationCode, Year) %>% 
+  complete(Date = seq.Date(min(Date), max(Date), by = "1 day")) %>% 
+  # Create lag variables of scaled log transformed chlorophyll values
+  mutate(
+    lag1 = lag(Chla_log),
+    lag2 = lag(Chla_log, 2),
+    lag3 = lag(Chla_log, 3)
+  ) %>% 
+  ungroup()
+
+
+

Plots

+

Let’s explore the data with some plots. First, lets plot the data in +scatter plots of chlorophyll and flow faceted by Station and grouping +all years together.

+
df_chla_c2 %>% 
+  ggplot(aes(x = Flow, y = Chla)) +
+  geom_point() +
+  geom_smooth(formula = "y ~ x") +
+  facet_wrap(vars(StationCode), scales = "free") +
+  theme_bw()
+

+

At first glance, I’m not sure how well flow is going to be able to +predict chlorophyll concentrations. At the furthest upstream station - +RD22 - chlorophyll appears to be highest at the lowest flows, but the +variation is at its maximum at the lowest flows. There may be some +dilution effect going on here at the higher flows. At STTD, there does +seem to be a modest increase in chlorophyll concentrations at the +mid-range flows. This pattern is even more obvious at LIB. There appears +to be no effect of flow on chlorophyll at RVB, but the range of +chlorophyll concentrations is narrow at this station (between 0 and +5).

+

Let’s break these scatterplots apart by year to see how these +patterns vary annually.

+
df_chla_c2 %>% 
+  ggplot(aes(x = Flow, y = Chla)) +
+  geom_point() +
+  geom_smooth(formula = "y ~ x") +
+  facet_wrap(
+    vars(StationCode, Year), 
+    ncol = 5, 
+    scales = "free", 
+    labeller = labeller(.multi_line = FALSE)
+  ) +
+  theme_bw()
+

+

The patterns appear to vary annually at each station, which may +justify using a 3-way interaction.

+
+
+

GAM Model with 3-way interactions

+

First, we will attempt to fit a generalized additive model (GAM) to +the data set to help account for seasonality in the data. We’ll try +running a GAM using a three-way interaction between Year, Daily Average +Flow, and Station, and a smooth term for day of year to account for +seasonality. Initially, we’ll run the GAM without accounting for serial +autocorrelation.

+
+

Initial Model

+
m_gam3 <- gam(
+  Chla_log ~ Year_fct * Flow * StationCode + s(DOY), 
+  data = df_chla_c2,
+  method = "REML"
+)
+

Lets look at the model summary and diagnostics:

+
summary(m_gam3)
+
## 
+## Family: gaussian 
+## Link function: identity 
+## 
+## Formula:
+## Chla_log ~ Year_fct * Flow * StationCode + s(DOY)
+## 
+## Parametric coefficients:
+##                                     Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                        9.734e+00  4.242e-02 229.451  < 2e-16 ***
+## Year_fct2016                      -8.100e-01  6.510e-02 -12.442  < 2e-16 ***
+## Year_fct2017                      -9.446e-01  9.333e-02 -10.120  < 2e-16 ***
+## Year_fct2018                      -3.798e-01  5.586e-02  -6.800 1.42e-11 ***
+## Year_fct2019                      -1.569e-01  5.549e-02  -2.827 0.004748 ** 
+## Flow                              -1.795e-03  1.810e-04  -9.916  < 2e-16 ***
+## StationCodeSTTD                   -1.068e+00  5.338e-02 -20.010  < 2e-16 ***
+## StationCodeLIB                    -2.164e+00  1.427e-01 -15.166  < 2e-16 ***
+## StationCodeRVB                    -2.181e+00  9.945e-02 -21.927  < 2e-16 ***
+## Year_fct2016:Flow                  2.101e-03  2.965e-04   7.087 1.95e-12 ***
+## Year_fct2017:Flow                  2.980e-02  4.868e-03   6.122 1.13e-09 ***
+## Year_fct2018:Flow                  4.690e-04  2.456e-04   1.909 0.056383 .  
+## Year_fct2019:Flow                 -8.769e-05  2.155e-04  -0.407 0.684079    
+## Year_fct2016:StationCodeSTTD       1.073e+00  8.125e-02  13.208  < 2e-16 ***
+## Year_fct2017:StationCodeSTTD       1.077e+00  1.234e-01   8.724  < 2e-16 ***
+## Year_fct2018:StationCodeSTTD      -2.829e-01  7.662e-02  -3.692 0.000229 ***
+## Year_fct2019:StationCodeSTTD      -8.837e-01  7.505e-02 -11.774  < 2e-16 ***
+## Year_fct2016:StationCodeLIB        2.103e+00  2.559e-01   8.217 3.92e-16 ***
+## Year_fct2017:StationCodeLIB        9.232e-01  2.051e-01   4.501 7.20e-06 ***
+## Year_fct2018:StationCodeLIB       -8.344e-01  2.888e-01  -2.889 0.003910 ** 
+## Year_fct2019:StationCodeLIB       -2.348e-01  2.275e-01  -1.032 0.302235    
+## Year_fct2016:StationCodeRVB        1.200e+00  2.292e-01   5.235 1.84e-07 ***
+## Year_fct2017:StationCodeRVB        9.410e-01  1.984e-01   4.743 2.27e-06 ***
+## Year_fct2018:StationCodeRVB        8.428e-02  1.543e-01   0.546 0.584870    
+## Year_fct2019:StationCodeRVB       -8.860e-01  1.847e-01  -4.797 1.74e-06 ***
+## Flow:StationCodeSTTD               4.763e-03  2.905e-04  16.396  < 2e-16 ***
+## Flow:StationCodeLIB                1.721e-03  2.024e-04   8.501  < 2e-16 ***
+## Flow:StationCodeRVB                1.817e-03  1.823e-04   9.964  < 2e-16 ***
+## Year_fct2016:Flow:StationCodeSTTD -3.755e-03  4.317e-04  -8.699  < 2e-16 ***
+## Year_fct2017:Flow:StationCodeSTTD -2.776e-02  5.251e-03  -5.285 1.40e-07 ***
+## Year_fct2018:Flow:StationCodeSTTD -8.361e-04  3.842e-04  -2.176 0.029658 *  
+## Year_fct2019:Flow:StationCodeSTTD -1.479e-03  3.405e-04  -4.345 1.47e-05 ***
+## Year_fct2016:Flow:StationCodeLIB  -1.778e-03  3.405e-04  -5.222 1.97e-07 ***
+## Year_fct2017:Flow:StationCodeLIB  -2.991e-02  4.870e-03  -6.143 9.93e-10 ***
+## Year_fct2018:Flow:StationCodeLIB   2.326e-04  4.499e-04   0.517 0.605220    
+## Year_fct2019:Flow:StationCodeLIB   3.155e-04  3.227e-04   0.978 0.328296    
+## Year_fct2016:Flow:StationCodeRVB  -2.159e-03  2.977e-04  -7.252 6.04e-13 ***
+## Year_fct2017:Flow:StationCodeRVB  -2.983e-02  4.868e-03  -6.129 1.08e-09 ***
+## Year_fct2018:Flow:StationCodeRVB  -4.803e-04  2.469e-04  -1.945 0.051881 .  
+## Year_fct2019:Flow:StationCodeRVB   1.161e-04  2.175e-04   0.534 0.593549    
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##          edf Ref.df     F p-value    
+## s(DOY) 7.386  8.342 40.88  <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =  0.872   Deviance explained = 87.5%
+## -REML = 860.35  Scale est. = 0.11819   n = 1874
+
appraise(m_gam3)
+

+
shapiro.test(residuals(m_gam3))
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  residuals(m_gam3)
+## W = 0.96852, p-value < 2.2e-16
+
k.check(m_gam3)
+
##        k'      edf  k-index p-value
+## s(DOY)  9 7.385659 0.952863    0.02
+
draw(m_gam3, select = 1, residuals = TRUE, rug = FALSE)
+

+
plot(m_gam3, pages = 1, all.terms = TRUE)
+

+
acf(residuals(m_gam3))
+

+
Box.test(residuals(m_gam3), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_gam3)
+## X-squared = 3426.2, df = 20, p-value < 2.2e-16
+
+
+

Model with k = 5

+

The model definitely has residual autocorrelation as indicated by the +ACF plot and the Box-Ljung test. Even though the p-value for the k-check +is less than 0.05, the smooth term for day of year appears to be +overfitted. Let’s try a smaller k-value for the smooth first, then lets +try to address the residual autocorrelation.

+
m_gam3_k5 <- gam(
+  Chla_log ~ Year_fct * Flow * StationCode + s(DOY, k = 5), 
+  data = df_chla_c2,
+  method = "REML"
+)
+
+summary(m_gam3_k5)
+
## 
+## Family: gaussian 
+## Link function: identity 
+## 
+## Formula:
+## Chla_log ~ Year_fct * Flow * StationCode + s(DOY, k = 5)
+## 
+## Parametric coefficients:
+##                                     Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                        9.7278768  0.0421697 230.684  < 2e-16 ***
+## Year_fct2016                      -0.7776765  0.0639758 -12.156  < 2e-16 ***
+## Year_fct2017                      -0.9217677  0.0936371  -9.844  < 2e-16 ***
+## Year_fct2018                      -0.3780679  0.0561613  -6.732 2.23e-11 ***
+## Year_fct2019                      -0.1554310  0.0557794  -2.787 0.005383 ** 
+## Flow                              -0.0017682  0.0001785  -9.904  < 2e-16 ***
+## StationCodeSTTD                   -1.0661259  0.0535105 -19.924  < 2e-16 ***
+## StationCodeLIB                    -2.1691650  0.1428317 -15.187  < 2e-16 ***
+## StationCodeRVB                    -2.1846232  0.0998032 -21.889  < 2e-16 ***
+## Year_fct2016:Flow                  0.0019713  0.0002923   6.743 2.07e-11 ***
+## Year_fct2017:Flow                  0.0286811  0.0048738   5.885 4.73e-09 ***
+## Year_fct2018:Flow                  0.0004480  0.0002470   1.814 0.069834 .  
+## Year_fct2019:Flow                 -0.0000991  0.0002165  -0.458 0.647254    
+## Year_fct2016:StationCodeSTTD       1.0643578  0.0815638  13.049  < 2e-16 ***
+## Year_fct2017:StationCodeSTTD       1.0619864  0.1239959   8.565  < 2e-16 ***
+## Year_fct2018:StationCodeSTTD      -0.2896681  0.0770322  -3.760 0.000175 ***
+## Year_fct2019:StationCodeSTTD      -0.8811363  0.0753268 -11.698  < 2e-16 ***
+## Year_fct2016:StationCodeLIB        2.1057016  0.2567401   8.202 4.41e-16 ***
+## Year_fct2017:StationCodeLIB        0.9481491  0.2051177   4.622 4.06e-06 ***
+## Year_fct2018:StationCodeLIB       -0.7855803  0.2899198  -2.710 0.006798 ** 
+## Year_fct2019:StationCodeLIB       -0.2662514  0.2284866  -1.165 0.244056    
+## Year_fct2016:StationCodeRVB        1.1701753  0.2287826   5.115 3.47e-07 ***
+## Year_fct2017:StationCodeRVB        0.9672348  0.1981715   4.881 1.15e-06 ***
+## Year_fct2018:StationCodeRVB        0.0946581  0.1548241   0.611 0.541017    
+## Year_fct2019:StationCodeRVB       -0.8809824  0.1850921  -4.760 2.09e-06 ***
+## Flow:StationCodeSTTD               0.0047745  0.0002921  16.346  < 2e-16 ***
+## Flow:StationCodeLIB                0.0016888  0.0002007   8.416  < 2e-16 ***
+## Flow:StationCodeRVB                0.0017912  0.0001799   9.959  < 2e-16 ***
+## Year_fct2016:Flow:StationCodeSTTD -0.0037733  0.0004341  -8.692  < 2e-16 ***
+## Year_fct2017:Flow:StationCodeSTTD -0.0262643  0.0052481  -5.005 6.14e-07 ***
+## Year_fct2018:Flow:StationCodeSTTD -0.0008384  0.0003865  -2.169 0.030185 *  
+## Year_fct2019:Flow:StationCodeSTTD -0.0015024  0.0003424  -4.388 1.21e-05 ***
+## Year_fct2016:Flow:StationCodeLIB  -0.0016335  0.0003378  -4.836 1.44e-06 ***
+## Year_fct2017:Flow:StationCodeLIB  -0.0287590  0.0048744  -5.900 4.32e-09 ***
+## Year_fct2018:Flow:StationCodeLIB   0.0002943  0.0004521   0.651 0.515145    
+## Year_fct2019:Flow:StationCodeLIB   0.0002993  0.0003244   0.923 0.356341    
+## Year_fct2016:Flow:StationCodeRVB  -0.0020283  0.0002934  -6.914 6.48e-12 ***
+## Year_fct2017:Flow:StationCodeRVB  -0.0287229  0.0048739  -5.893 4.50e-09 ***
+## Year_fct2018:Flow:StationCodeRVB  -0.0004615  0.0002483  -1.859 0.063211 .  
+## Year_fct2019:Flow:StationCodeRVB   0.0001259  0.0002185   0.576 0.564661    
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##          edf Ref.df     F p-value    
+## s(DOY) 3.892  3.993 81.76  <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =   0.87   Deviance explained = 87.3%
+## -REML = 867.13  Scale est. = 0.11965   n = 1874
+
appraise(m_gam3_k5)
+

+
shapiro.test(residuals(m_gam3_k5))
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  residuals(m_gam3_k5)
+## W = 0.96615, p-value < 2.2e-16
+
k.check(m_gam3_k5)
+
##        k'      edf   k-index p-value
+## s(DOY)  4 3.892262 0.9395093  0.0025
+
draw(m_gam3_k5, select = 1, residuals = TRUE, rug = FALSE)
+

+
plot(m_gam3_k5, pages = 1, all.terms = TRUE)
+

+
acf(residuals(m_gam3_k5))
+

+
Box.test(residuals(m_gam3_k5), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_gam3_k5)
+## X-squared = 3449.2, df = 20, p-value < 2.2e-16
+

Changing the k-value to 5 seems to help reduce the “wiggliness” of +the smooth term for DOY, but the p-value for the k-check is still less +than 0.05. Despite this, lets proceed with a k-value of 5.

+
+
+

Model with AR() correlation structure

+

Now, we’ll try to deal with the residual autocorrelation. We’ll run +AR(1), AR(2), AR(3), and AR(4) models and compare them using AIC.

+
# Define model formula as an object
+f_gam3 <- as.formula("Chla_log ~ Year_fct * Flow * StationCode + s(DOY, k = 5)")
+
+# Fit original model with k = 5 and three successive AR(p) models
+m_gamm3 <- gamm(
+  f_gam3, 
+  data = df_chla_c2,
+  method = "REML"
+)
+
+m_gamm3_ar1 <- gamm(
+  f_gam3, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 1), # grouped by Year_fct and StationCode
+  method = "REML"
+)
+
+m_gamm3_ar2 <- gamm(
+  f_gam3, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 2),
+  method = "REML"
+)
+
+m_gamm3_ar3 <- gamm(
+  f_gam3, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 3),
+  method = "REML"
+)
+
+m_gamm3_ar4 <- gamm(
+  f_gam3, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 4),
+  method = "REML"
+)
+
+# Compare models
+anova(
+  m_gamm3$lme, 
+  m_gamm3_ar1$lme, 
+  m_gamm3_ar2$lme, 
+  m_gamm3_ar3$lme,
+  m_gamm3_ar4$lme
+)
+
##                 Model df       AIC       BIC    logLik   Test   L.Ratio p-value
+## m_gamm3$lme         1 43 1820.2656 2057.3551 -867.1328                         
+## m_gamm3_ar1$lme     2 44 -725.2482 -482.6450  406.6241 1 vs 2 2547.5138  <.0001
+## m_gamm3_ar2$lme     3 45 -723.7345 -475.6176  406.8672 2 vs 3    0.4862  0.4856
+## m_gamm3_ar3$lme     4 46 -724.9241 -471.2935  408.4621 3 vs 4    3.1896  0.0741
+## m_gamm3_ar4$lme     5 47 -723.7127 -464.5684  408.8564 4 vs 5    0.7886  0.3745
+

It looks like both AIC and BIC prefer the AR(1) model. Let’s take a +closer look at the AR(1) model.

+
+
+

AR(1) Model

+
# Pull out the residuals and the GAM model
+resid_gamm3_ar1 <- residuals(m_gamm3_ar1$lme, type = "normalized")
+m_gamm3_ar1_gam <- m_gamm3_ar1$gam
+
+summary(m_gamm3_ar1_gam)
+
## 
+## Family: gaussian 
+## Link function: identity 
+## 
+## Formula:
+## Chla_log ~ Year_fct * Flow * StationCode + s(DOY, k = 5)
+## 
+## Parametric coefficients:
+##                                     Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                        9.525e+00  3.065e-01  31.077  < 2e-16 ***
+## Year_fct2016                      -6.067e-01  4.585e-01  -1.323 0.185863    
+## Year_fct2017                      -4.935e-01  4.661e-01  -1.059 0.289818    
+## Year_fct2018                      -2.740e-01  4.248e-01  -0.645 0.519064    
+## Year_fct2019                       1.705e-02  4.273e-01   0.040 0.968183    
+## Flow                              -6.516e-04  2.937e-04  -2.219 0.026619 *  
+## StationCodeSTTD                   -7.842e-01  4.322e-01  -1.815 0.069757 .  
+## StationCodeLIB                    -1.671e+00  4.339e-01  -3.852 0.000121 ***
+## StationCodeRVB                    -1.779e+00  4.201e-01  -4.235  2.4e-05 ***
+## Year_fct2016:Flow                  1.011e-03  5.967e-04   1.694 0.090507 .  
+## Year_fct2017:Flow                  1.339e-02  8.870e-03   1.510 0.131306    
+## Year_fct2018:Flow                  5.601e-05  5.438e-04   0.103 0.917981    
+## Year_fct2019:Flow                 -3.069e-04  4.586e-04  -0.669 0.503503    
+## Year_fct2016:StationCodeSTTD       9.683e-01  6.423e-01   1.508 0.131843    
+## Year_fct2017:StationCodeSTTD       2.722e-01  6.782e-01   0.401 0.688202    
+## Year_fct2018:StationCodeSTTD      -5.477e-01  6.227e-01  -0.880 0.379244    
+## Year_fct2019:StationCodeSTTD      -1.146e+00  6.127e-01  -1.870 0.061617 .  
+## Year_fct2016:StationCodeLIB        1.710e+00  6.473e-01   2.641 0.008333 ** 
+## Year_fct2017:StationCodeLIB        5.028e-01  6.538e-01   0.769 0.441911    
+## Year_fct2018:StationCodeLIB       -1.599e+00  7.123e-01  -2.244 0.024924 *  
+## Year_fct2019:StationCodeLIB       -7.410e-01  7.159e-01  -1.035 0.300801    
+## Year_fct2016:StationCodeRVB        1.035e+00  6.537e-01   1.584 0.113479    
+## Year_fct2017:StationCodeRVB        2.833e-01  6.466e-01   0.438 0.661318    
+## Year_fct2018:StationCodeRVB       -3.375e-02  6.055e-01  -0.056 0.955553    
+## Year_fct2019:StationCodeRVB       -6.441e-01  6.186e-01  -1.041 0.297865    
+## Flow:StationCodeSTTD               1.370e-03  5.784e-04   2.369 0.017938 *  
+## Flow:StationCodeLIB                7.042e-04  3.021e-04   2.331 0.019857 *  
+## Flow:StationCodeRVB                6.539e-04  2.939e-04   2.225 0.026221 *  
+## Year_fct2016:Flow:StationCodeSTTD -1.540e-03  9.500e-04  -1.621 0.105090    
+## Year_fct2017:Flow:StationCodeSTTD -7.559e-03  9.022e-03  -0.838 0.402250    
+## Year_fct2018:Flow:StationCodeSTTD  4.260e-04  9.270e-04   0.460 0.645894    
+## Year_fct2019:Flow:StationCodeSTTD  8.869e-04  7.767e-04   1.142 0.253651    
+## Year_fct2016:Flow:StationCodeLIB  -8.492e-04  6.085e-04  -1.396 0.162993    
+## Year_fct2017:Flow:StationCodeLIB  -1.333e-02  8.871e-03  -1.502 0.133191    
+## Year_fct2018:Flow:StationCodeLIB   5.668e-04  5.819e-04   0.974 0.330165    
+## Year_fct2019:Flow:StationCodeLIB   4.780e-04  4.908e-04   0.974 0.330269    
+## Year_fct2016:Flow:StationCodeRVB  -1.030e-03  5.971e-04  -1.724 0.084817 .  
+## Year_fct2017:Flow:StationCodeRVB  -1.340e-02  8.870e-03  -1.511 0.131071    
+## Year_fct2018:Flow:StationCodeRVB  -6.689e-05  5.443e-04  -0.123 0.902196    
+## Year_fct2019:Flow:StationCodeRVB   2.798e-04  4.591e-04   0.609 0.542337    
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##          edf Ref.df     F p-value    
+## s(DOY) 3.806  3.806 14.49  <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =  0.748   
+##   Scale est. = 0.3387    n = 1874
+
appraise(m_gamm3_ar1_gam)
+

+
shapiro.test(resid_gamm3_ar1)
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  resid_gamm3_ar1
+## W = 0.81636, p-value < 2.2e-16
+
k.check(m_gamm3_ar1_gam)
+
##        k'      edf   k-index p-value
+## s(DOY)  4 3.806167 0.6413984       0
+
draw(m_gamm3_ar1_gam, select = 1, residuals = TRUE, rug = FALSE)
+

+
plot(m_gamm3_ar1_gam, pages = 1, all.terms = TRUE)
+

+
acf(resid_gamm3_ar1)
+

+
Box.test(resid_gamm3_ar1, lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  resid_gamm3_ar1
+## X-squared = 52.893, df = 20, p-value = 8.426e-05
+

The AR(1) model has much less residual autocorrelation, and the +diagnostics plots look okay. Let’s take a closer look at how the +back-transformed fitted values from the model match the observed +values.

+
df_gamm3_ar1_fit <- df_chla_c2 %>% 
+  fitted_values(m_gamm3_ar1_gam, data = .) %>% 
+  mutate(fitted_bt = exp(fitted) / 1000)
+
+plt_gamm3_ar1_obs_fit <- df_gamm3_ar1_fit %>% 
+  ggplot(aes(x = fitted_bt, y = Chla)) +
+  geom_point() +
+  geom_abline(slope = 1, intercept = 0, color = "red") +
+  theme_bw() +
+  labs(
+    x = "Back-transformed Fitted Values",
+    y = "Observed Values"
+  )
+
+plt_gamm3_ar1_obs_fit
+

+

Let’s look at each Station-Year combination separately.

+
plt_gamm3_ar1_obs_fit +
+  facet_wrap(
+    vars(StationCode, Year_fct),
+    ncol = 5,
+    scales = "free",
+    labeller = labeller(.multi_line = FALSE)
+  )
+

+

The red lines are the 1:1 ratio between fitted and observed values, +and we would expect for most of the points to be near this line if the +model has a good fit to the observed data. However, the fitted values +from the model don’t appear to match the observed values that well, and +there are some unusual patterns when we look at each Station-Year +combination separately. I don’t think this GAM model is a good candidate +to use for our analysis.

+
+
+
+

Linear Model with 3-way interactions

+

Since the GAM model didn’t seem to fit the observed data that well, +let’s try a linear model using a three-way interaction between Year, +Daily Average Flow, and Station. Initially, we’ll run the model without +accounting for serial autocorrelation.

+
+

Initial Model

+
m_lm3 <- df_chla_c2_lag %>% 
+  drop_na(Chla_log) %>% 
+  lm(Chla_log ~ Year_fct * Flow * StationCode, data = .)
+

Lets look at the model summary and diagnostics:

+
summary(m_lm3)
+
## 
+## Call:
+## lm(formula = Chla_log ~ Year_fct * Flow * StationCode, data = .)
+## 
+## Residuals:
+##      Min       1Q   Median       3Q      Max 
+## -1.63942 -0.22564 -0.03365  0.19208  1.75991 
+## 
+## Coefficients:
+##                                     Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                        9.6844243  0.0448372 215.991  < 2e-16 ***
+## Year_fct2016                      -0.6624673  0.0673312  -9.839  < 2e-16 ***
+## Year_fct2017                      -1.1888491  0.0996337 -11.932  < 2e-16 ***
+## Year_fct2018                      -0.3418296  0.0607458  -5.627 2.11e-08 ***
+## Year_fct2019                      -0.1257666  0.0602074  -2.089 0.036855 *  
+## Flow                              -0.0018894  0.0001884 -10.027  < 2e-16 ***
+## StationCodeSTTD                   -1.0664657  0.0578059 -18.449  < 2e-16 ***
+## StationCodeLIB                    -2.2872235  0.1491890 -15.331  < 2e-16 ***
+## StationCodeRVB                    -2.2211596  0.1077729 -20.610  < 2e-16 ***
+## Year_fct2016:Flow                  0.0023191  0.0003084   7.520 8.51e-14 ***
+## Year_fct2017:Flow                  0.0462944  0.0050980   9.081  < 2e-16 ***
+## Year_fct2018:Flow                  0.0003299  0.0002673   1.234 0.217261    
+## Year_fct2019:Flow                 -0.0001397  0.0002337  -0.598 0.549941    
+## Year_fct2016:StationCodeSTTD       1.0883529  0.0880500  12.361  < 2e-16 ***
+## Year_fct2017:StationCodeSTTD       1.4991116  0.1299212  11.539  < 2e-16 ***
+## Year_fct2018:StationCodeSTTD      -0.3045441  0.0828996  -3.674 0.000246 ***
+## Year_fct2019:StationCodeSTTD      -0.9349727  0.0814723 -11.476  < 2e-16 ***
+## Year_fct2016:StationCodeLIB        1.9616137  0.2700048   7.265 5.49e-13 ***
+## Year_fct2017:StationCodeLIB        1.3396481  0.2174563   6.161 8.89e-10 ***
+## Year_fct2018:StationCodeLIB       -0.7670441  0.3102772  -2.472 0.013521 *  
+## Year_fct2019:StationCodeLIB       -0.2261882  0.2427049  -0.932 0.351487    
+## Year_fct2016:StationCodeRVB        1.9708585  0.2207415   8.928  < 2e-16 ***
+## Year_fct2017:StationCodeRVB        1.4768042  0.2100894   7.029 2.91e-12 ***
+## Year_fct2018:StationCodeRVB       -0.0019108  0.1628429  -0.012 0.990639    
+## Year_fct2019:StationCodeRVB       -0.7680363  0.1962837  -3.913 9.45e-05 ***
+## Flow:StationCodeSTTD               0.0046042  0.0003152  14.606  < 2e-16 ***
+## Flow:StationCodeLIB                0.0017082  0.0002082   8.206 4.25e-16 ***
+## Flow:StationCodeRVB                0.0019282  0.0001897  10.163  < 2e-16 ***
+## Year_fct2016:Flow:StationCodeSTTD -0.0037128  0.0004693  -7.911 4.39e-15 ***
+## Year_fct2017:Flow:StationCodeSTTD -0.0411458  0.0055472  -7.417 1.82e-13 ***
+## Year_fct2018:Flow:StationCodeSTTD -0.0007371  0.0004159  -1.772 0.076508 .  
+## Year_fct2019:Flow:StationCodeSTTD -0.0012635  0.0003698  -3.417 0.000648 ***
+## Year_fct2016:Flow:StationCodeLIB  -0.0021272  0.0003563  -5.970 2.83e-09 ***
+## Year_fct2017:Flow:StationCodeLIB  -0.0463035  0.0051002  -9.079  < 2e-16 ***
+## Year_fct2018:Flow:StationCodeLIB   0.0005066  0.0004884   1.037 0.299825    
+## Year_fct2019:Flow:StationCodeLIB   0.0001596  0.0003482   0.458 0.646694    
+## Year_fct2016:Flow:StationCodeRVB  -0.0024657  0.0003099  -7.957 3.07e-15 ***
+## Year_fct2017:Flow:StationCodeRVB  -0.0463714  0.0050981  -9.096  < 2e-16 ***
+## Year_fct2018:Flow:StationCodeRVB  -0.0003456  0.0002686  -1.287 0.198288    
+## Year_fct2019:Flow:StationCodeRVB   0.0001395  0.0002354   0.593 0.553553    
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Residual standard error: 0.3745 on 1834 degrees of freedom
+## Multiple R-squared:  0.8512, Adjusted R-squared:  0.848 
+## F-statistic: 268.9 on 39 and 1834 DF,  p-value: < 2.2e-16
+
df_chla_c2_lag %>% 
+  drop_na(Chla_log) %>% 
+  plot_lm_diag(Chla_log, m_lm3)
+
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+

+
shapiro.test(residuals(m_lm3))
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  residuals(m_lm3)
+## W = 0.96914, p-value < 2.2e-16
+
acf(residuals(m_lm3))
+

+
Box.test(residuals(m_lm3), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm3)
+## X-squared = 3897.6, df = 20, p-value < 2.2e-16
+

The residuals deviate from a normal distribution according to visual +inspection and the Shapiro-Wilk normality test. Also, model definitely +has residual autocorrelation as indicated by the ACF plot and the +Box-Ljung test.

+
+

Model with lag terms

+

Now, we’ll try to deal with the residual autocorrelation and the +non-normal residuals. We’ll run a series of linear models adding 1, 2, +and 3 lag terms and compare how well they correct for +autocorrelation.

+
+

Lag 1

+
m_lm3_lag1 <- df_chla_c2_lag %>% 
+  drop_na(Chla_log, lag1) %>% 
+  lm(Chla_log ~ Year_fct * Flow * StationCode + lag1, data = .)
+
+acf(residuals(m_lm3_lag1))
+

+
Box.test(residuals(m_lm3_lag1), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm3_lag1)
+## X-squared = 59.819, df = 20, p-value = 7.595e-06
+
+
+

Lag 2

+
m_lm3_lag2 <- df_chla_c2_lag %>% 
+  drop_na(Chla_log, lag1, lag2) %>% 
+  lm(Chla_log ~ Year_fct * Flow * StationCode + lag1 + lag2, data = .)
+
+acf(residuals(m_lm3_lag2))
+

+
Box.test(residuals(m_lm3_lag2), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm3_lag2)
+## X-squared = 46.699, df = 20, p-value = 0.0006458
+
+
+

Lag 3

+
m_lm3_lag3 <- df_chla_c2_lag %>% 
+  drop_na(Chla_log, lag1, lag2, lag3) %>% 
+  lm(Chla_log ~ Year_fct * Flow * StationCode + lag1 + lag2 + lag3, data = .)
+
+acf(residuals(m_lm3_lag3))
+

+
Box.test(residuals(m_lm3_lag3), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm3_lag3)
+## X-squared = 26.866, df = 20, p-value = 0.1391
+

The model with 3 lag terms looks like it has better ACF and Box-Ljung +test results than the other models. Let’s compare the 3 models using AIC +and BIC to see which model those prefer.

+
+
+

Compare models

+
AIC(m_lm3_lag1, m_lm3_lag2, m_lm3_lag3)
+
##            df       AIC
+## m_lm3_lag1 42 -1343.149
+## m_lm3_lag2 43 -1328.212
+## m_lm3_lag3 44 -1352.918
+
BIC(m_lm3_lag1, m_lm3_lag2, m_lm3_lag3)
+
##            df       BIC
+## m_lm3_lag1 42 -1111.573
+## m_lm3_lag2 43 -1092.047
+## m_lm3_lag3 44 -1112.153
+

Both AIC and BIC prefer the model with 3 lag terms. Let’s take a +closer look at that one.

+
+
+
+

Lag 3 model summary

+
summary(m_lm3_lag3)
+
## 
+## Call:
+## lm(formula = Chla_log ~ Year_fct * Flow * StationCode + lag1 + 
+##     lag2 + lag3, data = .)
+## 
+## Residuals:
+##      Min       1Q   Median       3Q      Max 
+## -0.89652 -0.05739 -0.00662  0.05039  1.11198 
+## 
+## Coefficients:
+##                                     Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                        1.191e+00  1.103e-01  10.799  < 2e-16 ***
+## Year_fct2016                      -9.858e-02  3.059e-02  -3.223 0.001294 ** 
+## Year_fct2017                      -1.589e-01  4.601e-02  -3.454 0.000566 ***
+## Year_fct2018                      -5.669e-02  2.710e-02  -2.092 0.036584 *  
+## Year_fct2019                      -4.812e-03  2.687e-02  -0.179 0.857864    
+## Flow                              -3.297e-04  8.442e-05  -3.905 9.78e-05 ***
+## StationCodeSTTD                   -1.253e-01  2.819e-02  -4.443 9.44e-06 ***
+## StationCodeLIB                    -2.304e-01  7.029e-02  -3.277 0.001069 ** 
+## StationCodeRVB                    -2.481e-01  5.311e-02  -4.672 3.21e-06 ***
+## lag1                               9.650e-01  2.412e-02  40.016  < 2e-16 ***
+## lag2                              -3.050e-02  3.321e-02  -0.919 0.358431    
+## lag3                              -5.694e-02  2.389e-02  -2.383 0.017260 *  
+## Year_fct2016:Flow                  3.732e-04  1.365e-04   2.735 0.006310 ** 
+## Year_fct2017:Flow                  6.200e-03  2.353e-03   2.635 0.008500 ** 
+## Year_fct2018:Flow                  1.793e-04  1.167e-04   1.536 0.124654    
+## Year_fct2019:Flow                  7.924e-05  1.024e-04   0.774 0.439015    
+## Year_fct2016:StationCodeSTTD       1.428e-01  4.096e-02   3.486 0.000503 ***
+## Year_fct2017:StationCodeSTTD       2.660e-01  6.001e-02   4.433 9.90e-06 ***
+## Year_fct2018:StationCodeSTTD      -3.452e-02  3.698e-02  -0.933 0.350725    
+## Year_fct2019:StationCodeSTTD      -1.405e-01  3.789e-02  -3.708 0.000215 ***
+## Year_fct2016:StationCodeLIB        2.629e-01  1.208e-01   2.175 0.029743 *  
+## Year_fct2017:StationCodeLIB        1.362e-01  9.987e-02   1.364 0.172741    
+## Year_fct2018:StationCodeLIB        1.543e-02  1.486e-01   0.104 0.917311    
+## Year_fct2019:StationCodeLIB       -6.836e-02  2.610e-01  -0.262 0.793414    
+## Year_fct2016:StationCodeRVB        2.661e-01  1.077e-01   2.472 0.013541 *  
+## Year_fct2017:StationCodeRVB        1.889e-01  9.380e-02   2.014 0.044143 *  
+## Year_fct2018:StationCodeRVB        1.962e-02  7.109e-02   0.276 0.782619    
+## Year_fct2019:StationCodeRVB       -1.364e-01  9.245e-02  -1.476 0.140206    
+## Flow:StationCodeSTTD               6.714e-04  1.514e-04   4.434 9.84e-06 ***
+## Flow:StationCodeLIB                3.450e-04  9.266e-05   3.724 0.000203 ***
+## Flow:StationCodeRVB                3.269e-04  8.507e-05   3.843 0.000126 ***
+## Year_fct2016:Flow:StationCodeSTTD -5.817e-04  2.122e-04  -2.741 0.006181 ** 
+## Year_fct2017:Flow:StationCodeSTTD -3.665e-03  2.588e-03  -1.416 0.156920    
+## Year_fct2018:Flow:StationCodeSTTD -2.008e-04  1.860e-04  -1.079 0.280640    
+## Year_fct2019:Flow:StationCodeSTTD -2.288e-04  1.667e-04  -1.373 0.170039    
+## Year_fct2016:Flow:StationCodeLIB  -3.224e-04  1.577e-04  -2.044 0.041063 *  
+## Year_fct2017:Flow:StationCodeLIB  -6.223e-03  2.355e-03  -2.643 0.008297 ** 
+## Year_fct2018:Flow:StationCodeLIB   2.158e-05  2.375e-04   0.091 0.927610    
+## Year_fct2019:Flow:StationCodeLIB  -1.214e-04  3.530e-04  -0.344 0.730859    
+## Year_fct2016:Flow:StationCodeRVB  -3.897e-04  1.375e-04  -2.835 0.004629 ** 
+## Year_fct2017:Flow:StationCodeRVB  -6.205e-03  2.354e-03  -2.636 0.008455 ** 
+## Year_fct2018:Flow:StationCodeRVB  -1.783e-04  1.173e-04  -1.520 0.128795    
+## Year_fct2019:Flow:StationCodeRVB  -7.338e-05  1.032e-04  -0.711 0.477174    
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Residual standard error: 0.1626 on 1715 degrees of freedom
+## Multiple R-squared:  0.9709, Adjusted R-squared:  0.9702 
+## F-statistic:  1361 on 42 and 1715 DF,  p-value: < 2.2e-16
+
df_chla_c2_lag %>% 
+  drop_na(Chla_log, lag1, lag2, lag3) %>% 
+  plot_lm_diag(Chla_log, m_lm3_lag3)
+
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+

+
shapiro.test(residuals(m_lm3_lag3))
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  residuals(m_lm3_lag3)
+## W = 0.85329, p-value < 2.2e-16
+

The residuals deviate from a normal distribution particularly at the +both tails of the distribution. I’m not so sure about using this model. +Let’s take a closer look at how the back-transformed fitted values from +the model match the observed values.

+
df_lm3_lag3_fit <- df_chla_c2_lag %>% 
+  drop_na(Chla, lag1, lag2, lag3) %>% 
+  mutate(Fitted = as_tibble(predict(m_lm3_lag3, interval = "confidence"))) %>% 
+  unnest_wider(Fitted) %>% 
+  mutate(across(c("fit", "lwr", "upr"), ~ exp(.x) / 1000))
+
+plt_lm3_lag3_obs_fit <- df_lm3_lag3_fit %>% 
+  ggplot(aes(x = fit, y = Chla)) +
+  geom_point() +
+  geom_abline(slope = 1, intercept = 0, color = "red") +
+  theme_bw() +
+  labs(
+    x = "Back-transformed Fitted Values",
+    y = "Observed Values"
+  )
+
+plt_lm3_lag3_obs_fit
+

+

Let’s look at each Station-Year combination separately.

+
plt_lm3_lag3_obs_fit +
+  facet_wrap(
+    vars(StationCode, Year_fct),
+    ncol = 5,
+    scales = "free",
+    labeller = labeller(.multi_line = FALSE)
+  )
+

+

The red lines are the 1:1 ratio between fitted and observed values, +and we would expect for most of the points to be near this line if the +model has a good fit to the observed data. The fitted and observed +values appear to match pretty well at the lower end of the range of +values, but this deteriorates at the mid and higher range values. This +pattern holds for some of the separate Station-Year combinations. I’m +not sure this linear model is a good candidate to use for our +analysis.

+
+
+
+
+

GAM Model with 2-way interactions

+

Because the models using 3-way interactions don’t seem to fit the +data well, we will attempt to fit a generalized additive model (GAM) to +the data set using all two-way interactions between Year, Daily Average +Flow, and Station, with a smooth term for day of year to account for +seasonality. Initially, we’ll run the GAM without accounting for serial +autocorrelation.

+
+

Initial Model

+
m_gam2 <- gam(
+  Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(DOY, k = 5), 
+  data = df_chla_c2,
+  method = "REML"
+)
+

Lets look at the model summary and diagnostics:

+
summary(m_gam2)
+
## 
+## Family: gaussian 
+## Link function: identity 
+## 
+## Formula:
+## Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(DOY, k = 5)
+## 
+## Parametric coefficients:
+##                                Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                   9.687e+00  3.720e-02 260.381  < 2e-16 ***
+## Year_fct2016                 -5.422e-01  5.369e-02 -10.098  < 2e-16 ***
+## Year_fct2017                 -4.293e-01  5.124e-02  -8.377  < 2e-16 ***
+## Year_fct2018                 -3.173e-01  4.772e-02  -6.650 3.86e-11 ***
+## Year_fct2019                 -1.779e-01  4.815e-02  -3.694 0.000227 ***
+## Flow                         -1.438e-03  9.087e-05 -15.824  < 2e-16 ***
+## StationCodeSTTD              -9.908e-01  5.067e-02 -19.553  < 2e-16 ***
+## StationCodeLIB               -1.959e+00  1.108e-01 -17.684  < 2e-16 ***
+## StationCodeRVB               -2.136e+00  9.771e-02 -21.860  < 2e-16 ***
+## Year_fct2016:Flow            -4.743e-05  3.105e-05  -1.528 0.126764    
+## Year_fct2017:Flow            -4.247e-05  2.509e-05  -1.693 0.090640 .  
+## Year_fct2018:Flow            -1.286e-05  2.536e-05  -0.507 0.612267    
+## Year_fct2019:Flow             1.000e-05  2.711e-05   0.369 0.712110    
+## Year_fct2016:StationCodeSTTD  7.459e-01  7.431e-02  10.038  < 2e-16 ***
+## Year_fct2017:StationCodeSTTD  4.306e-01  7.903e-02   5.449 5.76e-08 ***
+## Year_fct2018:StationCodeSTTD -3.358e-01  7.154e-02  -4.694 2.88e-06 ***
+## Year_fct2019:StationCodeSTTD -9.471e-01  7.007e-02 -13.516  < 2e-16 ***
+## Year_fct2016:StationCodeLIB   1.303e+00  8.601e-02  15.152  < 2e-16 ***
+## Year_fct2017:StationCodeLIB   4.316e-01  8.611e-02   5.012 5.92e-07 ***
+## Year_fct2018:StationCodeLIB  -1.431e+00  1.135e-01 -12.609  < 2e-16 ***
+## Year_fct2019:StationCodeLIB  -4.886e-01  1.182e-01  -4.135 3.71e-05 ***
+## Year_fct2016:StationCodeRVB   8.502e-01  2.321e-01   3.663 0.000257 ***
+## Year_fct2017:StationCodeRVB   4.864e-01  1.847e-01   2.633 0.008531 ** 
+## Year_fct2018:StationCodeRVB   3.244e-02  1.527e-01   0.212 0.831772    
+## Year_fct2019:StationCodeRVB  -7.110e-01  1.831e-01  -3.884 0.000106 ***
+## Flow:StationCodeSTTD          3.329e-03  1.253e-04  26.579  < 2e-16 ***
+## Flow:StationCodeLIB           1.465e-03  1.114e-04  13.151  < 2e-16 ***
+## Flow:StationCodeRVB           1.460e-03  8.962e-05  16.292  < 2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##         edf Ref.df     F p-value    
+## s(DOY) 3.88  3.992 93.39  <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =  0.858   Deviance explained = 86.1%
+## -REML = 869.89  Scale est. = 0.13071   n = 1874
+
appraise(m_gam2)
+

+
shapiro.test(residuals(m_gam2))
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  residuals(m_gam2)
+## W = 0.96845, p-value < 2.2e-16
+
k.check(m_gam2)
+
##        k'      edf   k-index p-value
+## s(DOY)  4 3.880408 0.9386901       0
+
draw(m_gam2, select = 1, residuals = TRUE, rug = FALSE)
+

+
plot(m_gam2, pages = 1, all.terms = TRUE)
+

+
acf(residuals(m_gam2))
+

+
Box.test(residuals(m_gam2), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_gam2)
+## X-squared = 3957.6, df = 20, p-value < 2.2e-16
+
+
+

Model with AR() correlation structure

+

Now, we’ll try to deal with the residual autocorrelation. We’ll run +AR(1), AR(2), AR(3), and AR(4) models and compare them using AIC.

+
# Define model formula as an object
+f_gam2 <- as.formula("Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(DOY, k = 5)")
+
+# Fit original model with k = 5 and three successive AR(p) models
+m_gamm2 <- gamm(
+  f_gam2, 
+  data = df_chla_c2,
+  method = "REML"
+)
+
+m_gamm2_ar1 <- gamm(
+  f_gam2, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 1), # grouped by Year_fct and StationCode
+  method = "REML"
+)
+
+m_gamm2_ar2 <- gamm(
+  f_gam2, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 2),
+  method = "REML"
+)
+
+m_gamm2_ar3 <- gamm(
+  f_gam2, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 3),
+  method = "REML"
+)
+
+m_gamm2_ar4 <- gamm(
+  f_gam2, 
+  data = df_chla_c2, 
+  correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 4),
+  method = "REML"
+)
+
+# Compare models
+anova(
+  m_gamm2$lme, 
+  m_gamm2_ar1$lme, 
+  m_gamm2_ar2$lme, 
+  m_gamm2_ar3$lme,
+  m_gamm2_ar4$lme
+)
+
##                 Model df       AIC       BIC    logLik   Test   L.Ratio p-value
+## m_gamm2$lme         1 31 1801.7891 1972.9163 -869.8945                         
+## m_gamm2_ar1$lme     2 32 -880.0401 -703.3926  472.0201 1 vs 2 2683.8292  <.0001
+## m_gamm2_ar2$lme     3 33 -878.0514 -695.8836  472.0257 2 vs 3    0.0112  0.9156
+## m_gamm2_ar3$lme     4 34 -878.0388 -690.3508  473.0194 3 vs 4    1.9874  0.1586
+## m_gamm2_ar4$lme     5 35 -877.7177 -684.5095  473.8589 4 vs 5    1.6789  0.1951
+

It looks like the AR(1) model has the best fit according to the AIC +and BIC values. Let’s take a closer look at the AR(1) model.

+
+
+

AR(1) Model

+
# Pull out the residuals and the GAM model
+resid_gamm2_ar1 <- residuals(m_gamm2_ar1$lme, type = "normalized")
+m_gamm2_ar1_gam <- m_gamm2_ar1$gam
+
+summary(m_gamm2_ar1_gam)
+
## 
+## Family: gaussian 
+## Link function: identity 
+## 
+## Formula:
+## Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(DOY, k = 5)
+## 
+## Parametric coefficients:
+##                                Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                   9.524e+00  4.705e-01  20.245  < 2e-16 ***
+## Year_fct2016                 -4.745e-01  6.930e-01  -0.685 0.493589    
+## Year_fct2017                 -1.946e-01  6.718e-01  -0.290 0.772141    
+## Year_fct2018                 -2.427e-01  6.521e-01  -0.372 0.709814    
+## Year_fct2019                  7.054e-02  6.557e-01   0.108 0.914341    
+## Flow                         -5.558e-04  1.911e-04  -2.908 0.003687 ** 
+## StationCodeSTTD              -7.488e-01  6.649e-01  -1.126 0.260262    
+## StationCodeLIB               -1.444e+00  6.549e-01  -2.204 0.027614 *  
+## StationCodeRVB               -1.664e+00  6.463e-01  -2.575 0.010103 *  
+## Year_fct2016:Flow            -1.617e-05  1.991e-05  -0.812 0.416926    
+## Year_fct2017:Flow            -4.077e-06  2.028e-05  -0.201 0.840700    
+## Year_fct2018:Flow            -2.836e-06  2.191e-05  -0.129 0.897009    
+## Year_fct2019:Flow            -2.267e-05  2.043e-05  -1.110 0.267297    
+## Year_fct2016:StationCodeSTTD  8.515e-01  9.770e-01   0.872 0.383585    
+## Year_fct2017:StationCodeSTTD -2.845e-01  9.961e-01  -0.286 0.775213    
+## Year_fct2018:StationCodeSTTD -6.349e-01  9.530e-01  -0.666 0.505336    
+## Year_fct2019:StationCodeSTTD -1.204e+00  9.406e-01  -1.280 0.200857    
+## Year_fct2016:StationCodeLIB   1.355e+00  9.566e-01   1.417 0.156753    
+## Year_fct2017:StationCodeLIB   6.824e-02  9.518e-01   0.072 0.942850    
+## Year_fct2018:StationCodeLIB  -2.291e+00  1.025e+00  -2.234 0.025580 *  
+## Year_fct2019:StationCodeLIB  -1.047e+00  1.039e+00  -1.008 0.313616    
+## Year_fct2016:StationCodeRVB   9.658e-01  9.787e-01   0.987 0.323854    
+## Year_fct2017:StationCodeRVB  -5.525e-02  9.433e-01  -0.059 0.953298    
+## Year_fct2018:StationCodeRVB  -1.545e-01  9.189e-01  -0.168 0.866506    
+## Year_fct2019:StationCodeRVB  -7.854e-01  9.363e-01  -0.839 0.401700    
+## Flow:StationCodeSTTD          1.554e-03  3.081e-04   5.044 5.01e-07 ***
+## Flow:StationCodeLIB           7.168e-04  1.967e-04   3.645 0.000275 ***
+## Flow:StationCodeRVB           5.545e-04  1.911e-04   2.902 0.003758 ** 
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##          edf Ref.df     F p-value    
+## s(DOY) 3.864  3.864 27.52  <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =  0.643   
+##   Scale est. = 0.56753   n = 1874
+
appraise(m_gamm2_ar1_gam)
+

+
shapiro.test(resid_gamm2_ar1)
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  resid_gamm2_ar1
+## W = 0.80125, p-value < 2.2e-16
+
k.check(m_gamm2_ar1_gam)
+
##        k'      edf   k-index p-value
+## s(DOY)  4 3.864181 0.4865552       0
+
draw(m_gamm2_ar1_gam, select = 1, residuals = TRUE, rug = FALSE)
+

+
plot(m_gamm2_ar1_gam, pages = 1, all.terms = TRUE)
+

+
acf(resid_gamm2_ar1)
+

+
Box.test(resid_gamm2_ar1, lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  resid_gamm2_ar1
+## X-squared = 47.503, df = 20, p-value = 0.0004993
+

The AR(1) model has much less residual autocorrelation, and the +diagnostics plots look okay. Let’s take a closer look at how the +back-transformed fitted values from the model match the observed +values.

+
df_gamm2_ar1_fit <- df_chla_c2 %>% 
+  fitted_values(m_gamm2_ar1_gam, data = .) %>% 
+  mutate(fitted_bt = exp(fitted) / 1000)
+
+plt_gamm2_ar1_obs_fit <- df_gamm2_ar1_fit %>% 
+  ggplot(aes(x = fitted_bt, y = Chla)) +
+  geom_point() +
+  geom_abline(slope = 1, intercept = 0, color = "red") +
+  theme_bw() +
+  labs(
+    x = "Back-transformed Fitted Values",
+    y = "Observed Values"
+  )
+
+plt_gamm2_ar1_obs_fit
+

+

Let’s look at each Station and Year separately.

+
plt_gamm2_ar1_obs_fit + facet_wrap(vars(StationCode), scales = "free")
+

+
plt_gamm2_ar1_obs_fit + facet_wrap(vars(Year_fct), scales = "free")
+

+

The red lines are the 1:1 ratio between fitted and observed values, +and we would expect for most of the points to be near this line if the +model has a good fit to the observed data. However, the fitted values +from the model don’t appear to match the observed values that well. +Again, I don’t think this GAM model is a good candidate to use for our +analysis.

+
+
+
+

Linear Model with 2-way interactions

+

Since the models using 3-way interactions don’t seem to fit the +observed data that well, let’s try a linear model using all two-way +interactions between Year, Daily Average Flow, and Station. Initially, +we’ll run the model without accounting for serial autocorrelation.

+
+

Initial Model

+
m_lm2 <- df_chla_c2_lag %>% 
+  drop_na(Chla_log) %>% 
+  lm(Chla_log ~ (Year_fct + Flow + StationCode)^2, data = .)
+

Lets look at the model summary and diagnostics:

+
summary(m_lm2)
+
## 
+## Call:
+## lm(formula = Chla_log ~ (Year_fct + Flow + StationCode)^2, data = .)
+## 
+## Residuals:
+##      Min       1Q   Median       3Q      Max 
+## -1.65329 -0.23920 -0.03958  0.20123  1.96767 
+## 
+## Coefficients:
+##                                Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                   9.632e+00  4.015e-02 239.900  < 2e-16 ***
+## Year_fct2016                 -3.555e-01  5.707e-02  -6.228 5.83e-10 ***
+## Year_fct2017                 -4.068e-01  5.586e-02  -7.284 4.79e-13 ***
+## Year_fct2018                 -2.925e-01  5.213e-02  -5.611 2.31e-08 ***
+## Year_fct2019                 -1.467e-01  5.255e-02  -2.791 0.005313 ** 
+## Flow                         -1.525e-03  9.636e-05 -15.827  < 2e-16 ***
+## StationCodeSTTD              -9.929e-01  5.532e-02 -17.948  < 2e-16 ***
+## StationCodeLIB               -2.094e+00  1.172e-01 -17.863  < 2e-16 ***
+## StationCodeRVB               -2.154e+00  1.067e-01 -20.194  < 2e-16 ***
+## Year_fct2016:Flow            -1.258e-04  3.148e-05  -3.996 6.70e-05 ***
+## Year_fct2017:Flow            -7.326e-05  2.717e-05  -2.696 0.007076 ** 
+## Year_fct2018:Flow            -8.557e-06  2.722e-05  -0.314 0.753278    
+## Year_fct2019:Flow            -1.301e-05  2.923e-05  -0.445 0.656205    
+## Year_fct2016:StationCodeSTTD  7.413e-01  8.117e-02   9.133  < 2e-16 ***
+## Year_fct2017:StationCodeSTTD  5.377e-01  8.588e-02   6.261 4.75e-10 ***
+## Year_fct2018:StationCodeSTTD -3.377e-01  7.771e-02  -4.345 1.47e-05 ***
+## Year_fct2019:StationCodeSTTD -9.913e-01  7.655e-02 -12.950  < 2e-16 ***
+## Year_fct2016:StationCodeLIB   1.196e+00  9.326e-02  12.829  < 2e-16 ***
+## Year_fct2017:StationCodeLIB   4.439e-01  9.307e-02   4.769 1.99e-06 ***
+## Year_fct2018:StationCodeLIB  -1.446e+00  1.205e-01 -12.006  < 2e-16 ***
+## Year_fct2019:StationCodeLIB  -3.061e-01  1.257e-01  -2.436 0.014943 *  
+## Year_fct2016:StationCodeRVB   1.500e+00  2.213e-01   6.779 1.62e-11 ***
+## Year_fct2017:StationCodeRVB   6.775e-01  1.990e-01   3.404 0.000679 ***
+## Year_fct2018:StationCodeRVB  -9.430e-02  1.626e-01  -0.580 0.561995    
+## Year_fct2019:StationCodeRVB  -6.203e-01  1.969e-01  -3.151 0.001653 ** 
+## Flow:StationCodeSTTD          3.304e-03  1.368e-04  24.145  < 2e-16 ***
+## Flow:StationCodeLIB           1.434e-03  1.151e-04  12.458  < 2e-16 ***
+## Flow:StationCodeRVB           1.560e-03  9.489e-05  16.443  < 2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Residual standard error: 0.3952 on 1846 degrees of freedom
+## Multiple R-squared:  0.8332, Adjusted R-squared:  0.8308 
+## F-statistic: 341.5 on 27 and 1846 DF,  p-value: < 2.2e-16
+
df_chla_c2_lag %>% 
+  drop_na(Chla_log) %>% 
+  plot_lm_diag(Chla_log, m_lm2)
+
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+

+
shapiro.test(residuals(m_lm2))
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  residuals(m_lm2)
+## W = 0.96829, p-value < 2.2e-16
+
acf(residuals(m_lm2))
+

+
Box.test(residuals(m_lm2), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm2)
+## X-squared = 4598.1, df = 20, p-value < 2.2e-16
+

The residuals deviate from a normal distribution according to visual +inspection and the Shapiro-Wilk normality test. Also, model definitely +has residual autocorrelation as indicated by the ACF plot and the +Box-Ljung test.

+
+

Model with lag terms

+

Now, we’ll try to deal with the residual autocorrelation and the +non-normal residuals. We’ll run a series of linear models adding 1, 2, +and 3 lag terms and compare how well they correct for +autocorrelation.

+
+

Lag 1

+
m_lm2_lag1 <- df_chla_c2_lag %>% 
+  drop_na(Chla_log, lag1) %>% 
+  lm(Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1, data = .)
+
+acf(residuals(m_lm2_lag1))
+

+
Box.test(residuals(m_lm2_lag1), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm2_lag1)
+## X-squared = 59.714, df = 20, p-value = 7.884e-06
+
+
+

Lag 2

+
m_lm2_lag2 <- df_chla_c2_lag %>% 
+  drop_na(Chla_log, lag1, lag2) %>% 
+  lm(Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + lag2, data = .)
+
+acf(residuals(m_lm2_lag2))
+

+
Box.test(residuals(m_lm2_lag2), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm2_lag2)
+## X-squared = 45.379, df = 20, p-value = 0.00098
+
+
+

Lag 3

+
m_lm2_lag3 <- df_chla_c2_lag %>% 
+  drop_na(Chla_log, lag1, lag2, lag3) %>% 
+  lm(Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + lag2 + lag3, data = .)
+
+acf(residuals(m_lm2_lag3))
+

+
Box.test(residuals(m_lm2_lag3), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm2_lag3)
+## X-squared = 25.678, df = 20, p-value = 0.1767
+

The model with 3 lag terms looks like it has better ACF and Box-Ljung +test results than the other models. Let’s compare the 3 models using AIC +and BIC to see which model those prefer.

+
+
+

Compare models

+
AIC(m_lm2_lag1, m_lm2_lag2, m_lm2_lag3)
+
##            df       AIC
+## m_lm2_lag1 30 -1341.836
+## m_lm2_lag2 31 -1328.270
+## m_lm2_lag3 32 -1351.047
+
BIC(m_lm2_lag1, m_lm2_lag2, m_lm2_lag3)
+
##            df       BIC
+## m_lm2_lag1 30 -1176.425
+## m_lm2_lag2 31 -1158.012
+## m_lm2_lag3 32 -1175.945
+

AIC prefers the model with 3 lag terms while BIC slightly prefers the +model with 1 lag term. Let’s take a closer look at the model with 3 lag +terms since it had better ACF and Box-Ljung test results.

+
+
+
+

Lag 3 model summary

+
summary(m_lm2_lag3)
+
## 
+## Call:
+## lm(formula = Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + 
+##     lag2 + lag3, data = .)
+## 
+## Residuals:
+##      Min       1Q   Median       3Q      Max 
+## -0.88352 -0.05588 -0.00615  0.05095  1.13186 
+## 
+## Coefficients:
+##                                Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)                   1.021e+00  1.030e-01   9.917  < 2e-16 ***
+## Year_fct2016                 -4.245e-02  2.424e-02  -1.751 0.080137 .  
+## Year_fct2017                 -3.954e-02  2.399e-02  -1.648 0.099540 .  
+## Year_fct2018                 -2.729e-02  2.205e-02  -1.237 0.216141    
+## Year_fct2019                  8.278e-03  2.223e-02   0.372 0.709665    
+## Flow                         -1.826e-04  4.293e-05  -4.254 2.21e-05 ***
+## StationCodeSTTD              -8.930e-02  2.560e-02  -3.489 0.000497 ***
+## StationCodeLIB               -1.633e-01  5.564e-02  -2.936 0.003371 ** 
+## StationCodeRVB               -1.892e-01  4.989e-02  -3.793 0.000154 ***
+## lag1                          9.766e-01  2.404e-02  40.624  < 2e-16 ***
+## lag2                         -3.105e-02  3.327e-02  -0.933 0.350706    
+## lag3                         -5.219e-02  2.385e-02  -2.189 0.028766 *  
+## Year_fct2016:Flow            -1.045e-05  1.389e-05  -0.752 0.452144    
+## Year_fct2017:Flow            -2.358e-06  1.139e-05  -0.207 0.836010    
+## Year_fct2018:Flow             3.587e-06  1.136e-05   0.316 0.752176    
+## Year_fct2019:Flow             5.525e-06  1.265e-05   0.437 0.662376    
+## Year_fct2016:StationCodeSTTD  7.295e-02  3.526e-02   2.069 0.038693 *  
+## Year_fct2017:StationCodeSTTD  7.972e-02  3.727e-02   2.139 0.032553 *  
+## Year_fct2018:StationCodeSTTD -5.016e-02  3.289e-02  -1.525 0.127423    
+## Year_fct2019:StationCodeSTTD -1.443e-01  3.397e-02  -4.249 2.26e-05 ***
+## Year_fct2016:StationCodeLIB   1.040e-01  4.189e-02   2.482 0.013175 *  
+## Year_fct2017:StationCodeLIB   3.142e-02  3.948e-02   0.796 0.426219    
+## Year_fct2018:StationCodeLIB  -1.231e-01  5.723e-02  -2.152 0.031553 *  
+## Year_fct2019:StationCodeLIB  -4.936e-02  6.070e-02  -0.813 0.416244    
+## Year_fct2016:StationCodeRVB   1.633e-01  1.019e-01   1.602 0.109254    
+## Year_fct2017:StationCodeRVB   6.267e-02  8.373e-02   0.748 0.454265    
+## Year_fct2018:StationCodeRVB  -1.640e-02  6.745e-02  -0.243 0.807882    
+## Year_fct2019:StationCodeRVB  -1.252e-01  8.814e-02  -1.420 0.155702    
+## Flow:StationCodeSTTD          3.908e-04  6.614e-05   5.909 4.14e-09 ***
+## Flow:StationCodeLIB           2.095e-04  5.088e-05   4.118 4.00e-05 ***
+## Flow:StationCodeRVB           1.777e-04  4.250e-05   4.181 3.05e-05 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Residual standard error: 0.1632 on 1727 degrees of freedom
+## Multiple R-squared:  0.9704, Adjusted R-squared:  0.9699 
+## F-statistic:  1890 on 30 and 1727 DF,  p-value: < 2.2e-16
+
df_chla_c2_lag %>% 
+  drop_na(Chla_log, lag1, lag2, lag3) %>% 
+  plot_lm_diag(Chla_log, m_lm2_lag3)
+
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+

+
shapiro.test(residuals(m_lm2_lag3))
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  residuals(m_lm2_lag3)
+## W = 0.85565, p-value < 2.2e-16
+

As with the model with 3-way interactions, the residuals deviate from +a normal distribution particularly at the both tails of the +distribution. I’m not so sure about using this model. Let’s take a +closer look at how the back-transformed fitted values from the model +match the observed values.

+
df_lm2_lag3_fit <- df_chla_c2_lag %>% 
+  drop_na(Chla, lag1, lag2, lag3) %>% 
+  mutate(Fitted = as_tibble(predict(m_lm2_lag3, interval = "confidence"))) %>% 
+  unnest_wider(Fitted) %>% 
+  mutate(across(c("fit", "lwr", "upr"), ~ exp(.x) / 1000))
+
+plt_lm2_lag3_obs_fit <- df_lm2_lag3_fit %>% 
+  ggplot(aes(x = fit, y = Chla)) +
+  geom_point() +
+  geom_abline(slope = 1, intercept = 0, color = "red") +
+  theme_bw() +
+  labs(
+    x = "Back-transformed Fitted Values",
+    y = "Observed Values"
+  )
+
+plt_lm2_lag3_obs_fit
+

+

Let’s look at each Station and Year separately.

+
plt_lm2_lag3_obs_fit + facet_wrap(vars(StationCode), scales = "free")
+

+
plt_lm2_lag3_obs_fit + facet_wrap(vars(Year_fct), scales = "free")
+

+

The red lines are the 1:1 ratio between fitted and observed values, +and we would expect for most of the points to be near this line if the +model has a good fit to the observed data. The fitted and observed +values appear to match pretty well at the lower end of the range of +values, but this deteriorates at the mid and higher range values. This +pattern holds for some of the individual Stations and Years. I’m not +sure this linear model is a good candidate to use for our analysis.

+
+
+
+
+

Linear Model with single station - STTD

+

Since none of the models we tried so far seemed to fit the observed +data very well, it may be interesting to run separate models for each of +the four stations - RD22, STTD, LIB, RVB - to see if that helps improve +model fit. We’ll try STTD as a test case.

+
+

Initial Model

+

Let’s try a linear model using a two-way interaction between Year and +Daily Average Flow. Initially, we’ll run the model without accounting +for serial autocorrelation.

+
df_sttd <- df_chla_c2_lag %>% filter(StationCode == "STTD")
+
+m_lm2_sttd <- df_sttd %>% 
+  drop_na(Chla_log) %>% 
+  lm(Chla_log ~ Year_fct * Flow, data = .)
+

Lets look at the model summary and diagnostics:

+
summary(m_lm2_sttd)
+
## 
+## Call:
+## lm(formula = Chla_log ~ Year_fct * Flow, data = .)
+## 
+## Residuals:
+##     Min      1Q  Median      3Q     Max 
+## -0.7854 -0.1724 -0.0296  0.1377  1.5439 
+## 
+## Coefficients:
+##                     Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)        8.6179586  0.0295402 291.736  < 2e-16 ***
+## Year_fct2016       0.4258856  0.0459390   9.271  < 2e-16 ***
+## Year_fct2017       0.3102626  0.0675111   4.596 5.71e-06 ***
+## Year_fct2018      -0.6463737  0.0456741 -14.152  < 2e-16 ***
+## Year_fct2019      -1.0607392  0.0444413 -23.868  < 2e-16 ***
+## Flow               0.0027149  0.0002046  13.269  < 2e-16 ***
+## Year_fct2016:Flow -0.0013937  0.0002865  -4.865 1.62e-06 ***
+## Year_fct2017:Flow  0.0051487  0.0017705   2.908  0.00383 ** 
+## Year_fct2018:Flow -0.0004072  0.0002580  -1.578  0.11523    
+## Year_fct2019:Flow -0.0014032  0.0002321  -6.047 3.26e-09 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Residual standard error: 0.3032 on 421 degrees of freedom
+## Multiple R-squared:  0.809,  Adjusted R-squared:  0.805 
+## F-statistic: 198.2 on 9 and 421 DF,  p-value: < 2.2e-16
+
df_sttd %>% 
+  drop_na(Chla_log) %>% 
+  plot_lm_diag(Chla_log, m_lm2_sttd)
+
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+

+
shapiro.test(residuals(m_lm2_sttd))
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  residuals(m_lm2_sttd)
+## W = 0.94651, p-value = 2.328e-11
+
acf(residuals(m_lm2_sttd))
+

+
Box.test(residuals(m_lm2_sttd), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm2_sttd)
+## X-squared = 463.9, df = 20, p-value < 2.2e-16
+

The residuals deviate from a normal distribution according to visual +inspection and the Shapiro-Wilk normality test. Also, model definitely +has residual autocorrelation as indicated by the ACF plot and the +Box-Ljung test.

+
+

Model with lag terms

+

Now, we’ll try to deal with the residual autocorrelation and the +non-normal residuals. We’ll run a series of linear models adding 1, 2, +and 3 lag terms and compare how well they correct for +autocorrelation.

+
+

Lag 1

+
m_lm2_sttd_lag1 <- df_sttd %>% 
+  drop_na(Chla_log, lag1) %>% 
+  lm(Chla_log ~ Year_fct * Flow + lag1, data = .)
+
+acf(residuals(m_lm2_sttd_lag1))
+

+
Box.test(residuals(m_lm2_sttd_lag1), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm2_sttd_lag1)
+## X-squared = 37.18, df = 20, p-value = 0.01113
+
+
+

Lag 2

+
m_lm2_sttd_lag2 <- df_sttd %>% 
+  drop_na(Chla_log, lag1, lag2) %>% 
+  lm(Chla_log ~ Year_fct * Flow + lag1 + lag2, data = .)
+
+acf(residuals(m_lm2_sttd_lag2))
+

+
Box.test(residuals(m_lm2_sttd_lag2), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm2_sttd_lag2)
+## X-squared = 18.734, df = 20, p-value = 0.5392
+
+
+

Lag 3

+
m_lm2_sttd_lag3 <- df_sttd %>% 
+  drop_na(Chla_log, lag1, lag2, lag3) %>% 
+  lm(Chla_log ~ Year_fct * Flow + lag1 + lag2 + lag3, data = .)
+
+acf(residuals(m_lm2_sttd_lag3))
+

+
Box.test(residuals(m_lm2_sttd_lag3), lag = 20, type = 'Ljung-Box')
+
## 
+##  Box-Ljung test
+## 
+## data:  residuals(m_lm2_sttd_lag3)
+## X-squared = 10.813, df = 20, p-value = 0.9509
+

The models with 2 and 3 lag terms seem to take care of the +autocorrelation. Let’s compare the 3 models using AIC and BIC to see +which model those prefer.

+
+
+

Compare models

+
AIC(m_lm2_sttd_lag1, m_lm2_sttd_lag2, m_lm2_sttd_lag3)
+
##                 df       AIC
+## m_lm2_sttd_lag1 12 -328.2932
+## m_lm2_sttd_lag2 13 -350.6079
+## m_lm2_sttd_lag3 14 -340.5105
+
BIC(m_lm2_sttd_lag1, m_lm2_sttd_lag2, m_lm2_sttd_lag3)
+
##                 df       BIC
+## m_lm2_sttd_lag1 12 -279.7532
+## m_lm2_sttd_lag2 13 -298.3031
+## m_lm2_sttd_lag3 14 -284.4561
+

Both AIC and BIC prefer the model with 2 lag terms. Let’s take a +closer look at that one.

+
+
+
+

Lag 2 model summary

+
summary(m_lm2_sttd_lag2)
+
## 
+## Call:
+## lm(formula = Chla_log ~ Year_fct * Flow + lag1 + lag2, data = .)
+## 
+## Residuals:
+##      Min       1Q   Median       3Q      Max 
+## -0.88426 -0.06597 -0.00901  0.05116  0.81176 
+## 
+## Coefficients:
+##                     Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)        1.612e+00  2.284e-01   7.058 7.50e-12 ***
+## Year_fct2016       6.863e-02  2.667e-02   2.573 0.010442 *  
+## Year_fct2017       1.177e-01  3.644e-02   3.230 0.001339 ** 
+## Year_fct2018      -1.292e-01  2.907e-02  -4.445 1.14e-05 ***
+## Year_fct2019      -2.085e-01  3.616e-02  -5.766 1.62e-08 ***
+## Flow               5.131e-04  1.324e-04   3.875 0.000124 ***
+## lag1               1.109e+00  4.925e-02  22.527  < 2e-16 ***
+## lag2              -2.956e-01  4.819e-02  -6.133 2.07e-09 ***
+## Year_fct2016:Flow -2.822e-04  1.555e-04  -1.816 0.070186 .  
+## Year_fct2017:Flow  2.695e-03  1.001e-03   2.693 0.007380 ** 
+## Year_fct2018:Flow -5.917e-05  1.375e-04  -0.430 0.667197    
+## Year_fct2019:Flow -2.500e-04  1.302e-04  -1.920 0.055609 .  
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Residual standard error: 0.1557 on 401 degrees of freedom
+## Multiple R-squared:  0.9497, Adjusted R-squared:  0.9483 
+## F-statistic: 687.9 on 11 and 401 DF,  p-value: < 2.2e-16
+
df_sttd %>% 
+  drop_na(Chla_log, lag1, lag2) %>% 
+  plot_lm_diag(Chla_log, m_lm2_sttd_lag2)
+
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+

+
shapiro.test(residuals(m_lm2_sttd_lag2))
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  residuals(m_lm2_sttd_lag2)
+## W = 0.86609, p-value < 2.2e-16
+

As with all the other linear models, the residuals deviate from a +normal distribution particularly at the both tails of the distribution. +I’m not so sure about using this model. Let’s take a closer look at how +the back-transformed fitted values from the model match the observed +values.

+
df_lm2_sttd_lag2_fit <- df_sttd %>% 
+  drop_na(Chla, lag1, lag2) %>% 
+  mutate(Fitted = as_tibble(predict(m_lm2_sttd_lag2, interval = "confidence"))) %>% 
+  unnest_wider(Fitted) %>% 
+  mutate(across(c("fit", "lwr", "upr"), ~ exp(.x) / 1000))
+
+plt_lm2_sttd_lag2_obs_fit <- df_lm2_sttd_lag2_fit %>% 
+  ggplot(aes(x = fit, y = Chla)) +
+  geom_point() +
+  geom_abline(slope = 1, intercept = 0, color = "red") +
+  theme_bw() +
+  labs(
+    x = "Back-transformed Fitted Values",
+    y = "Observed Values"
+  )
+
+plt_lm2_sttd_lag2_obs_fit
+

+

Let’s look at each Year separately.

+
plt_lm2_sttd_lag2_obs_fit + facet_wrap(vars(Year_fct), scales = "free")
+

+

The red lines are the 1:1 ratio between fitted and observed values, +and we would expect for most of the points to be near this line if the +model has a good fit to the observed data. The fitted and observed +values appear to match pretty well at the lower end of the range of +values, but this deteriorates at the higher values. This pattern holds +for some of the individual Years. I’m not sure this linear model is a +good candidate to use for our analysis.

+
+
+
+
+

Conclusion

+

None of the models we tried in this analysis seemed to fit the +observed data very well. It looks like daily average flow has an effect +on chlorophyll concentrations at some of the stations, but flow doesn’t +seem to be a good predictor overall. Unfortunately, I don’t think we +should use any of these models in our analysis.

+
+ + + +
+
+ +
+ + + + + + + + + + + + + + + + + diff --git a/manuscript_synthesis/notebooks/rtm_chl_models_flow.Rmd b/manuscript_synthesis/notebooks/rtm_chl_models_flow_daily_avg.Rmd similarity index 90% rename from manuscript_synthesis/notebooks/rtm_chl_models_flow.Rmd rename to manuscript_synthesis/notebooks/rtm_chl_models_flow_daily_avg.Rmd index 914e460..faf1a0f 100644 --- a/manuscript_synthesis/notebooks/rtm_chl_models_flow.Rmd +++ b/manuscript_synthesis/notebooks/rtm_chl_models_flow_daily_avg.Rmd @@ -138,9 +138,9 @@ df_chla_c2_lag <- df_chla_c2 %>% # Plots -Let's explore the data with some plots. First, lets plot the data in scatterplots of chlorophyll and flow facetted by Station and grouping all years together. +Let's explore the data with some plots. First, lets plot the data in scatter plots of chlorophyll and flow faceted by Station and grouping all years together. -```{r chla scatterplot all yrs, message = FALSE, fig.height = 6, fig.width = 6} +```{r chla scatterplot all yrs, message = FALSE, fig.height = 6} df_chla_c2 %>% ggplot(aes(x = Flow, y = Chla)) + geom_point() + @@ -153,7 +153,7 @@ At first glance, I'm not sure how well flow is going to be able to predict chlor Let's break these scatterplots apart by year to see how these patterns vary annually. -```{r chla scatterplot facet yrs, message = FALSE, fig.height = 8, fig.width = 8} +```{r chla scatterplot facet yrs, message = FALSE, fig.height = 8, fig.width = 8.5} df_chla_c2 %>% ggplot(aes(x = Flow, y = Chla)) + geom_point() + @@ -272,7 +272,7 @@ anova( ) ``` -It looks like the AR(3) model has the best fit according to the AIC values. However, BIC prefers the AR(1) model. Let's take a closer look at the AR(1) model. +It looks like both AIC and BIC prefer the AR(1) model. Let's take a closer look at the AR(1) model. ## AR(1) Model @@ -313,7 +313,7 @@ plt_gamm3_ar1_obs_fit Let's look at each Station-Year combination separately. -```{r gam3 ar1 obs vs fit facet sta yr, fig.height = 8, fig.width = 8} +```{r gam3 ar1 obs vs fit facet sta yr, fig.height = 8, fig.width = 8.5} plt_gamm3_ar1_obs_fit + facet_wrap( vars(StationCode, Year_fct), @@ -390,7 +390,7 @@ acf(residuals(m_lm3_lag3)) Box.test(residuals(m_lm3_lag3), lag = 20, type = 'Ljung-Box') ``` -The model with 3 lag terms looks to have slightly better ACF and Box-Ljung test results than the other models. Let's compare the 3 models using AIC and BIC to see which model those prefer. +The model with 3 lag terms looks like it has better ACF and Box-Ljung test results than the other models. Let's compare the 3 models using AIC and BIC to see which model those prefer. #### Compare models @@ -399,30 +399,30 @@ AIC(m_lm3_lag1, m_lm3_lag2, m_lm3_lag3) BIC(m_lm3_lag1, m_lm3_lag2, m_lm3_lag3) ``` -Both AIC and BIC prefer the model with 1 lag term. Let's take a closer look at that one. +Both AIC and BIC prefer the model with 3 lag terms. Let's take a closer look at that one. -### Lag 1 model summary +### Lag 3 model summary -```{r lm3 lag1 summary and diag} -summary(m_lm3_lag1) +```{r lm3 lag3 summary and diag} +summary(m_lm3_lag3) df_chla_c2_lag %>% - drop_na(Chla_log, lag1) %>% - plot_lm_diag(Chla_log, m_lm3_lag1) + drop_na(Chla_log, lag1, lag2, lag3) %>% + plot_lm_diag(Chla_log, m_lm3_lag3) -shapiro.test(residuals(m_lm3_lag1)) +shapiro.test(residuals(m_lm3_lag3)) ``` The residuals deviate from a normal distribution particularly at the both tails of the distribution. I'm not so sure about using this model. Let's take a closer look at how the back-transformed fitted values from the model match the observed values. -```{r lm3 lag1 obs vs fit} -df_lm3_lag1_fit <- df_chla_c2_lag %>% - drop_na(Chla, lag1) %>% - mutate(Fitted = as_tibble(predict(m_lm3_lag1, interval = "confidence"))) %>% +```{r lm3 lag3 obs vs fit} +df_lm3_lag3_fit <- df_chla_c2_lag %>% + drop_na(Chla, lag1, lag2, lag3) %>% + mutate(Fitted = as_tibble(predict(m_lm3_lag3, interval = "confidence"))) %>% unnest_wider(Fitted) %>% mutate(across(c("fit", "lwr", "upr"), ~ exp(.x) / 1000)) -plt_lm3_lag1_obs_fit <- df_lm3_lag1_fit %>% +plt_lm3_lag3_obs_fit <- df_lm3_lag3_fit %>% ggplot(aes(x = fit, y = Chla)) + geom_point() + geom_abline(slope = 1, intercept = 0, color = "red") + @@ -432,13 +432,13 @@ plt_lm3_lag1_obs_fit <- df_lm3_lag1_fit %>% y = "Observed Values" ) -plt_lm3_lag1_obs_fit +plt_lm3_lag3_obs_fit ``` Let's look at each Station-Year combination separately. -```{r lm3 lag1 obs vs fit facet sta yr, fig.height = 8, fig.width = 8} -plt_lm3_lag1_obs_fit + +```{r lm3 lag1 obs vs fit facet sta yr, fig.height = 8, fig.width = 8.5} +plt_lm3_lag3_obs_fit + facet_wrap( vars(StationCode, Year_fct), ncol = 5, @@ -449,22 +449,6 @@ plt_lm3_lag1_obs_fit + The red lines are the 1:1 ratio between fitted and observed values, and we would expect for most of the points to be near this line if the model has a good fit to the observed data. The fitted and observed values appear to match pretty well at the lower end of the range of values, but this deteriorates at the mid and higher range values. This pattern holds for some of the separate Station-Year combinations. I'm not sure this linear model is a good candidate to use for our analysis. -### Lag 3 model summary - -Let's take a closer look at the model with 3 lag terms since that had the least autocorrelation. - -```{r lm3 lag3 summary and diag} -summary(m_lm3_lag3) - -df_chla_c2_lag %>% - drop_na(Chla_log, lag1, lag2, lag3) %>% - plot_lm_diag(Chla_log, m_lm3_lag3) - -shapiro.test(residuals(m_lm3_lag3)) -``` - -The diagnostic plots show that the lag 3 model has similar problems as the lag 1 model, so this doesn't look like a good candidate for our analysis as well. - # GAM Model with 2-way interactions Because the models using 3-way interactions don't seem to fit the data well, we will attempt to fit a generalized additive model (GAM) to the data set using all two-way interactions between Year, Daily Average Flow, and Station, with a smooth term for day of year to account for seasonality. Initially, we'll run the GAM without accounting for serial autocorrelation. @@ -659,7 +643,7 @@ acf(residuals(m_lm2_lag3)) Box.test(residuals(m_lm2_lag3), lag = 20, type = 'Ljung-Box') ``` -The model with 3 lag terms looks to have slightly better ACF and Box-Ljung test results than the other models. Let's compare the 3 models using AIC and BIC to see which model those prefer. +The model with 3 lag terms looks like it has better ACF and Box-Ljung test results than the other models. Let's compare the 3 models using AIC and BIC to see which model those prefer. #### Compare models @@ -668,30 +652,30 @@ AIC(m_lm2_lag1, m_lm2_lag2, m_lm2_lag3) BIC(m_lm2_lag1, m_lm2_lag2, m_lm2_lag3) ``` -Both AIC and BIC prefer the model with 1 lag term. Let's take a closer look at that one. +AIC prefers the model with 3 lag terms while BIC slightly prefers the model with 1 lag term. Let's take a closer look at the model with 3 lag terms since it had better ACF and Box-Ljung test results. -### Lag 1 model summary +### Lag 3 model summary -```{r lm2 lag1 summary and diag} -summary(m_lm2_lag1) +```{r lm2 lag3 summary and diag} +summary(m_lm2_lag3) df_chla_c2_lag %>% - drop_na(Chla_log, lag1) %>% - plot_lm_diag(Chla_log, m_lm2_lag1) + drop_na(Chla_log, lag1, lag2, lag3) %>% + plot_lm_diag(Chla_log, m_lm2_lag3) -shapiro.test(residuals(m_lm2_lag1)) +shapiro.test(residuals(m_lm2_lag3)) ``` As with the model with 3-way interactions, the residuals deviate from a normal distribution particularly at the both tails of the distribution. I'm not so sure about using this model. Let's take a closer look at how the back-transformed fitted values from the model match the observed values. -```{r lm2 lag1 obs vs fit} -df_lm2_lag1_fit <- df_chla_c2_lag %>% - drop_na(Chla, lag1) %>% - mutate(Fitted = as_tibble(predict(m_lm2_lag1, interval = "confidence"))) %>% +```{r lm2 lag3 obs vs fit} +df_lm2_lag3_fit <- df_chla_c2_lag %>% + drop_na(Chla, lag1, lag2, lag3) %>% + mutate(Fitted = as_tibble(predict(m_lm2_lag3, interval = "confidence"))) %>% unnest_wider(Fitted) %>% mutate(across(c("fit", "lwr", "upr"), ~ exp(.x) / 1000)) -plt_lm2_lag1_obs_fit <- df_lm2_lag1_fit %>% +plt_lm2_lag3_obs_fit <- df_lm2_lag3_fit %>% ggplot(aes(x = fit, y = Chla)) + geom_point() + geom_abline(slope = 1, intercept = 0, color = "red") + @@ -701,35 +685,19 @@ plt_lm2_lag1_obs_fit <- df_lm2_lag1_fit %>% y = "Observed Values" ) -plt_lm2_lag1_obs_fit +plt_lm2_lag3_obs_fit ``` Let's look at each Station and Year separately. -```{r lm2 lag1 obs vs fit facet sta yr} -plt_lm2_lag1_obs_fit + facet_wrap(vars(StationCode), scales = "free") +```{r lm2 lag3 obs vs fit facet sta yr} +plt_lm2_lag3_obs_fit + facet_wrap(vars(StationCode), scales = "free") -plt_lm2_lag1_obs_fit + facet_wrap(vars(Year_fct), scales = "free") +plt_lm2_lag3_obs_fit + facet_wrap(vars(Year_fct), scales = "free") ``` The red lines are the 1:1 ratio between fitted and observed values, and we would expect for most of the points to be near this line if the model has a good fit to the observed data. The fitted and observed values appear to match pretty well at the lower end of the range of values, but this deteriorates at the mid and higher range values. This pattern holds for some of the individual Stations and Years. I'm not sure this linear model is a good candidate to use for our analysis. -### Lag 3 model summary - -Let's take a closer look at the model with 3 lag terms since that had the least autocorrelation. - -```{r lm2 lag3 summary and diag} -summary(m_lm2_lag3) - -df_chla_c2_lag %>% - drop_na(Chla_log, lag1, lag2, lag3) %>% - plot_lm_diag(Chla_log, m_lm2_lag3) - -shapiro.test(residuals(m_lm2_lag3)) -``` - -The diagnostic plots show that the lag 3 model has similar problems as the lag 1 model, so this doesn't look like a good candidate for our analysis as well. - # Linear Model with single station - STTD Since none of the models we tried so far seemed to fit the observed data very well, it may be interesting to run separate models for each of the four stations - RD22, STTD, LIB, RVB - to see if that helps improve model fit. We'll try STTD as a test case. @@ -799,7 +767,7 @@ acf(residuals(m_lm2_sttd_lag3)) Box.test(residuals(m_lm2_sttd_lag3), lag = 20, type = 'Ljung-Box') ``` -The model with 2 lag terms seems to take care of the autocorrelation. Let's compare the 3 models using AIC and BIC to see which model those prefer. +The models with 2 and 3 lag terms seem to take care of the autocorrelation. Let's compare the 3 models using AIC and BIC to see which model those prefer. #### Compare models