diff --git a/docs/index.md b/docs/index.md index 071e5a5..cc5b7b2 100644 --- a/docs/index.md +++ b/docs/index.md @@ -6,7 +6,8 @@ layout: default ### Exploratory Analyses -* [Models using Categorical Predictors](rtm_chl_gam_analysis_categorical.html) +* [GAM Model using Categorical Predictors - Initial analysis](rtm_chl_gam_categorical_initial.html) +* [Models using Categorical Predictors - Revised analysis](rtm_chl_models_categorical_revised.html) * [Models using Flow as a Continuous Predictor](rtm_chl_models_flow.html) * [Explore relationship between continuous chlorophyll and percent flow pulse water](rtm_chl_analysis_perc_flow_pulse.html) diff --git a/docs/rtm_chl_analysis_final_model_results.html b/docs/rtm_chl_analysis_final_model_results.html deleted file mode 100644 index 0b375bf..0000000 --- a/docs/rtm_chl_analysis_final_model_results.html +++ /dev/null @@ -1,2093 +0,0 @@ - - - - -
- - - - - - - - - - -Look at results of the preferred model during model selection: Linear -model with 3-way interactions between categorical variables - Year, -Station, and Flow Pulse Period - without seasonal term.
-# Load packages
-library(tidyverse)
-library(car)
-library(emmeans)
-library(multcomp)
-library(here)
-library(conflicted)
-
-# Declare package conflict preferences
-conflicts_prefer(dplyr::filter(), dplyr::select())
-Display current versions of R and packages used for this -analysis:
-devtools::session_info()
-## ─ Session info ───────────────────────────────────────────────────────────────
-## setting value
-## version R version 4.2.3 (2023-03-15 ucrt)
-## os Windows 10 x64 (build 19044)
-## system x86_64, mingw32
-## ui RTerm
-## language (EN)
-## collate English_United States.utf8
-## ctype English_United States.utf8
-## tz America/Los_Angeles
-## date 2023-11-01
-## pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
-##
-## ─ Packages ───────────────────────────────────────────────────────────────────
-## package * version date (UTC) lib source
-## abind 1.4-5 2016-07-21 [1] CRAN (R 4.2.0)
-## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.2.2)
-## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.2.3)
-## callr 3.7.3 2022-11-02 [1] CRAN (R 4.2.2)
-## car * 3.1-2 2023-03-30 [1] CRAN (R 4.2.3)
-## carData * 3.0-5 2022-01-06 [1] CRAN (R 4.2.1)
-## cli 3.6.1 2023-03-23 [1] CRAN (R 4.2.3)
-## coda 0.19-4 2020-09-30 [1] CRAN (R 4.2.1)
-## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.2.3)
-## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.2.2)
-## conflicted * 1.2.0 2023-02-01 [1] CRAN (R 4.2.2)
-## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.2.1)
-## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.2.1)
-## digest 0.6.33 2023-07-07 [1] CRAN (R 4.2.3)
-## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.2.3)
-## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.1)
-## emmeans * 1.8.5 2023-03-08 [1] CRAN (R 4.2.2)
-## estimability 1.4.1 2022-08-05 [1] CRAN (R 4.2.1)
-## evaluate 0.21 2023-05-05 [1] CRAN (R 4.2.3)
-## fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.2)
-## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.2.2)
-## forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.2.2)
-## fs 1.6.3 2023-07-20 [1] CRAN (R 4.2.3)
-## generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.1)
-## ggplot2 * 3.4.3 2023-08-14 [1] CRAN (R 4.2.3)
-## glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.1)
-## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.2.3)
-## here * 1.0.1 2020-12-13 [1] CRAN (R 4.2.1)
-## hms 1.1.3 2023-03-21 [1] CRAN (R 4.2.3)
-## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.2.3)
-## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.2.3)
-## httpuv 1.6.9 2023-02-14 [1] CRAN (R 4.2.2)
-## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.1)
-## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.2.3)
-## knitr 1.42 2023-01-25 [1] CRAN (R 4.2.2)
-## later 1.3.0 2021-08-18 [1] CRAN (R 4.2.1)
-## lattice 0.21-8 2023-04-05 [1] CRAN (R 4.2.3)
-## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.1)
-## lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.2.3)
-## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.1)
-## MASS * 7.3-58.2 2023-01-23 [2] CRAN (R 4.2.3)
-## Matrix 1.5-4 2023-04-04 [1] CRAN (R 4.2.3)
-## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.1)
-## mime 0.12 2021-09-28 [1] CRAN (R 4.2.0)
-## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.2.1)
-## multcomp * 1.4-23 2023-03-09 [1] CRAN (R 4.2.2)
-## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.1)
-## mvtnorm * 1.1-3 2021-10-08 [1] CRAN (R 4.2.0)
-## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.2.3)
-## pkgbuild 1.4.2 2023-06-26 [1] CRAN (R 4.2.3)
-## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.1)
-## pkgload 1.3.2.1 2023-07-08 [1] CRAN (R 4.2.3)
-## prettyunits 1.2.0 2023-09-24 [1] CRAN (R 4.2.3)
-## processx 3.8.2 2023-06-30 [1] CRAN (R 4.2.3)
-## profvis 0.3.7 2020-11-02 [1] CRAN (R 4.2.1)
-## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.2.1)
-## ps 1.7.5 2023-04-18 [1] CRAN (R 4.2.3)
-## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.2.3)
-## R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.1)
-## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.2.3)
-## readr * 2.1.4 2023-02-10 [1] CRAN (R 4.2.2)
-## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.1)
-## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.2.3)
-## rmarkdown 2.21 2023-03-26 [1] CRAN (R 4.2.3)
-## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.1)
-## rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1)
-## sandwich 3.0-2 2022-06-15 [1] CRAN (R 4.2.1)
-## sass 0.4.6 2023-05-03 [1] CRAN (R 4.2.3)
-## scales 1.2.1 2022-08-20 [1] CRAN (R 4.2.1)
-## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
-## shiny 1.7.4 2022-12-15 [1] CRAN (R 4.2.2)
-## stringi 1.7.12 2023-01-11 [1] CRAN (R 4.2.2)
-## stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.2.2)
-## survival * 3.5-5 2023-03-12 [1] CRAN (R 4.2.3)
-## TH.data * 1.1-2 2023-04-17 [1] CRAN (R 4.2.3)
-## tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.2.3)
-## tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.2.2)
-## tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.1)
-## tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.2.2)
-## timechange 0.2.0 2023-01-11 [1] CRAN (R 4.2.2)
-## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.2.3)
-## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.2.1)
-## usethis 2.1.6 2022-05-25 [1] CRAN (R 4.2.1)
-## utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.2)
-## vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.2.3)
-## withr 2.5.1 2023-09-26 [1] CRAN (R 4.2.3)
-## xfun 0.39 2023-04-20 [1] CRAN (R 4.2.3)
-## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.1)
-## yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.2)
-## zoo 1.8-12 2023-04-13 [1] CRAN (R 4.2.3)
-##
-## [1] C:/R/win-library/4.2
-## [2] C:/Program Files/R/R-4.2.3/library
-##
-## ──────────────────────────────────────────────────────────────────────────────
-# Define file path for model and data used in the model
-fp_model <- here("manuscript_synthesis/model")
-
-# Import model and data used in the model
-m_lm_cat3 <- read_rds(file.path(fp_model, "chla_cat3_lm_model.rds"))
-df_chla <- read_rds(file.path(fp_model, "chla_model_data.rds"))
-Model formula: Chla_log ~ Year_fct * FlowActionPeriod * StationCode + -lag1
-Anova(m_lm_cat3, type = 3, contrasts = list(topic = contr.sum, sys = contr.sum))
-## Anova Table (Type III tests)
-##
-## Response: Chla_log
-## Sum Sq Df F value Pr(>F)
-## (Intercept) 4.886 1 186.5576 < 2.2e-16 ***
-## Year_fct 0.299 6 1.9058 0.07623 .
-## FlowActionPeriod 0.546 2 10.4251 3.084e-05 ***
-## StationCode 0.690 3 8.7860 8.495e-06 ***
-## lag1 175.299 1 6693.8549 < 2.2e-16 ***
-## Year_fct:FlowActionPeriod 1.547 12 4.9233 4.106e-08 ***
-## Year_fct:StationCode 2.081 18 4.4137 1.525e-09 ***
-## FlowActionPeriod:StationCode 0.980 6 6.2376 1.606e-06 ***
-## Year_fct:FlowActionPeriod:StationCode 2.519 36 2.6719 3.074e-07 ***
-## Residuals 73.614 2811
-## ---
-## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-Tukey pairwise contrasts of Flow Pulse Period for each Station - Year -combination:
-em_cat3 <- emmeans(m_lm_cat3, ~ FlowActionPeriod | StationCode * Year_fct)
-pairs(em_cat3)
-## StationCode = RD22, Year_fct = 2013:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.03104 0.0706 2811 0.439 0.8990
-## Before - After -0.14576 0.0726 2811 -2.009 0.1103
-## During - After -0.17680 0.0391 2811 -4.527 <.0001
-##
-## StationCode = STTD, Year_fct = 2013:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.12119 0.0711 2811 -1.705 0.2035
-## Before - After 0.03327 0.0718 2811 0.463 0.8885
-## During - After 0.15446 0.0384 2811 4.027 0.0002
-##
-## StationCode = LIB, Year_fct = 2013:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.01689 0.0368 2811 -0.459 0.8903
-## Before - After -0.03037 0.0361 2811 -0.842 0.6767
-## During - After -0.01348 0.0345 2811 -0.390 0.9195
-##
-## StationCode = RVB, Year_fct = 2013:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.02352 0.0348 2811 -0.677 0.7771
-## Before - After 0.02883 0.0340 2811 0.849 0.6727
-## During - After 0.05235 0.0347 2811 1.509 0.2869
-##
-## StationCode = RD22, Year_fct = 2014:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.08579 0.0488 2811 1.757 0.1844
-## Before - After 0.08265 0.0347 2811 2.385 0.0452
-## During - After -0.00314 0.0481 2811 -0.065 0.9977
-##
-## StationCode = STTD, Year_fct = 2014:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.05266 0.0483 2811 -1.090 0.5206
-## Before - After -0.00278 0.0390 2811 -0.071 0.9972
-## During - After 0.04987 0.0518 2811 0.962 0.6009
-##
-## StationCode = LIB, Year_fct = 2014:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.01696 0.0482 2811 0.352 0.9342
-## Before - After 0.03826 0.0340 2811 1.127 0.4975
-## During - After 0.02131 0.0481 2811 0.443 0.8977
-##
-## StationCode = RVB, Year_fct = 2014:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.05967 0.0512 2811 1.165 0.4743
-## Before - After 0.02518 0.0342 2811 0.736 0.7418
-## During - After -0.03450 0.0509 2811 -0.678 0.7762
-##
-## StationCode = RD22, Year_fct = 2015:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.20880 0.0410 2811 5.092 <.0001
-## Before - After 0.03994 0.0403 2811 0.991 0.5829
-## During - After -0.16886 0.0367 2811 -4.601 <.0001
-##
-## StationCode = STTD, Year_fct = 2015:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.14016 0.0423 2811 -3.311 0.0027
-## Before - After -0.03064 0.0408 2811 -0.752 0.7327
-## During - After 0.10951 0.0353 2811 3.100 0.0056
-##
-## StationCode = LIB, Year_fct = 2015:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.05377 0.0360 2811 -1.495 0.2934
-## Before - After 0.01182 0.0340 2811 0.348 0.9354
-## During - After 0.06559 0.0359 2811 1.828 0.1607
-##
-## StationCode = RVB, Year_fct = 2015:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.06439 0.0349 2811 1.846 0.1549
-## Before - After 0.05180 0.0342 2811 1.514 0.2845
-## During - After -0.01259 0.0346 2811 -0.364 0.9295
-##
-## StationCode = RD22, Year_fct = 2016:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.07822 0.0520 2811 1.505 0.2888
-## Before - After 0.17052 0.0442 2811 3.857 0.0003
-## During - After 0.09230 0.0444 2811 2.078 0.0945
-##
-## StationCode = STTD, Year_fct = 2016:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.09446 0.0522 2811 -1.811 0.1662
-## Before - After -0.00767 0.0434 2811 -0.177 0.9829
-## During - After 0.08679 0.0444 2811 1.954 0.1239
-##
-## StationCode = LIB, Year_fct = 2016:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.00762 0.0443 2811 0.172 0.9839
-## Before - After 0.13064 0.0345 2811 3.791 0.0005
-## During - After 0.12302 0.0443 2811 2.778 0.0152
-##
-## StationCode = RVB, Year_fct = 2016:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.02856 0.0459 2811 0.622 0.8080
-## Before - After 0.05545 0.0381 2811 1.457 0.3119
-## During - After 0.02689 0.0459 2811 0.586 0.8278
-##
-## StationCode = RD22, Year_fct = 2017:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.01657 0.0429 2811 0.386 0.9212
-## Before - After 0.09256 0.0345 2811 2.683 0.0201
-## During - After 0.07599 0.0427 2811 1.780 0.1764
-##
-## StationCode = STTD, Year_fct = 2017:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.40805 0.0847 2811 -4.820 <.0001
-## Before - After 0.24182 0.0769 2811 3.143 0.0048
-## During - After 0.64986 0.1097 2811 5.922 <.0001
-##
-## StationCode = LIB, Year_fct = 2017:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.03781 0.0428 2811 0.883 0.6511
-## Before - After 0.05582 0.0342 2811 1.634 0.2317
-## During - After 0.01801 0.0427 2811 0.422 0.9065
-##
-## StationCode = RVB, Year_fct = 2017:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.05830 0.0430 2811 1.355 0.3648
-## Before - After 0.06964 0.0344 2811 2.027 0.1059
-## During - After 0.01133 0.0426 2811 0.266 0.9618
-##
-## StationCode = RD22, Year_fct = 2018:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.13021 0.0390 2811 3.340 0.0024
-## Before - After 0.03575 0.0341 2811 1.049 0.5457
-## During - After -0.09446 0.0383 2811 -2.466 0.0366
-##
-## StationCode = STTD, Year_fct = 2018:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.16774 0.0408 2811 -4.110 0.0001
-## Before - After 0.04857 0.0455 2811 1.068 0.5339
-## During - After 0.21631 0.0486 2811 4.453 <.0001
-##
-## StationCode = LIB, Year_fct = 2018:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.02439 0.0538 2811 -0.453 0.8931
-## Before - After 0.24472 0.0524 2811 4.672 <.0001
-## During - After 0.26911 0.0412 2811 6.529 <.0001
-##
-## StationCode = RVB, Year_fct = 2018:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.08524 0.0384 2811 2.218 0.0684
-## Before - After 0.06216 0.0342 2811 1.817 0.1642
-## During - After -0.02308 0.0380 2811 -0.608 0.8159
-##
-## StationCode = RD22, Year_fct = 2019:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.17499 0.0409 2811 4.278 0.0001
-## Before - After -0.05633 0.0340 2811 -1.658 0.2217
-## During - After -0.23132 0.0413 2811 -5.608 <.0001
-##
-## StationCode = STTD, Year_fct = 2019:
-## contrast estimate SE df t.ratio p.value
-## Before - During -0.15945 0.0436 2811 -3.657 0.0008
-## Before - After 0.01866 0.0380 2811 0.491 0.8754
-## During - After 0.17811 0.0399 2811 4.461 <.0001
-##
-## StationCode = LIB, Year_fct = 2019:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.08552 0.0551 2811 1.553 0.2665
-## Before - After 0.09072 0.0457 2811 1.985 0.1161
-## During - After 0.00520 0.0482 2811 0.108 0.9936
-##
-## StationCode = RVB, Year_fct = 2019:
-## contrast estimate SE df t.ratio p.value
-## Before - During 0.00326 0.0395 2811 0.083 0.9962
-## Before - After 0.03298 0.0340 2811 0.971 0.5950
-## During - After 0.02972 0.0394 2811 0.755 0.7307
-##
-## P value adjustment: tukey method for comparing a family of 3 estimates
-# Add significance grouping letters from the Tukey post-hoc results and back
- # transform log-transformed results
-df_cat3_tuk <- em_cat3 %>%
- cld(sort = FALSE, Letters = letters) %>%
- as_tibble() %>%
- mutate(
- group = str_remove_all(.group, fixed(" ")),
- across(c(emmean, lower.CL, upper.CL), ~ exp(.x) / 1000),
- Year = as.numeric(as.character(Year_fct))
- ) %>%
- select(
- StationCode,
- Year,
- FlowActionPeriod,
- emmean,
- lower.CL,
- upper.CL,
- group
- )
-
-# Determine vertical positioning of letters for figure
-# Calculate min and max values of observed data for each Station - Year group
-df_chla_summ <- df_chla %>%
- summarize(
- max_val = max(Chla),
- min_val = min(Chla),
- .by = c(StationCode, Year)
- )
-
-# Add min and max values of observed data to the Tukey post-hoc results and
- # calculate vertical positioning of letters
-df_cat3_tuk_c <- df_cat3_tuk %>%
- left_join(df_chla_summ, by = join_by(StationCode, Year)) %>%
- mutate(max_val = if_else(upper.CL > max_val, upper.CL, max_val)) %>%
- group_by(StationCode, Year) %>%
- mutate(max_val = max(max_val)) %>%
- ungroup() %>%
- mutate(y_pos = max_val + (max_val - min_val) / 10)
-
-# Create summary figure
-df_cat3_tuk_c %>%
- ggplot(
- aes(
- x = FlowActionPeriod,
- y = emmean,
- ymin = lower.CL,
- ymax = upper.CL,
- label = group
- )
- ) +
- geom_boxplot(
- data = df_chla,
- aes(x = FlowActionPeriod, y = Chla),
- inherit.aes = FALSE
- ) +
- geom_crossbar(color = "grey82", fill = "grey", alpha = 0.7, linewidth = 0.1) +
- geom_point(color = "red") +
- geom_text(aes(y = y_pos), size = 3) +
- facet_wrap(
- vars(StationCode, Year),
- scales = "free_y",
- nrow = 4,
- labeller = labeller(.multi_line = FALSE)
- ) +
- xlab("Flow Pulse Period") +
- ylab(expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))) +
- theme_bw() +
- theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
-
-Observed daily average chlorophyll fluorescence values (boxplots) and -model results (model means as red points ±95% confidence intervals as -gray boxes) for the Flow Pulse Period comparisons by Station and Year. -Different letters above boxplots identify statistically significant (p -< 0.05) differences from a Tukey post-hoc test.
-k.check(m_chla_gam)
## k' edf k-index p-value
-## s(DOY) 9 7.206775 1.019771 0.915
+## s(DOY) 9 7.206775 1.015457 0.875
## s(StationCode) 8 5.976525 NA NA
draw(m_chla_gam, select = 1, residuals = TRUE, rug = FALSE)
@@ -2578,9 +2579,9 @@ k.check(m_chla_gam_k5)
-## k' edf k-index p-value
-## s(DOY) 4 3.837093 1.013962 0.8275
-## s(StationCode) 8 5.976554 NA NA
+## k' edf k-index p-value
+## s(DOY) 4 3.837093 0.9938893 0.325
+## s(StationCode) 8 5.976554 NA NA
draw(m_chla_gam_k5, select = 1, residuals = TRUE, rug = FALSE)
plot(m_chla_gam_k5, pages = 1, all.terms = TRUE)
@@ -2695,13 +2696,13 @@ appraise(m_chla_gam_lag1)
k.check(m_chla_gam_lag1)
-## k' edf k-index p-value
-## s(DOY) 4 1.055954 0.9910648 0.235
-## s(StationCode) 8 4.805585 NA NA
+## k' edf k-index p-value
+## s(DOY) 4 1.055954 1.015476 0.855
+## s(StationCode) 8 4.805585 NA NA
draw(m_chla_gam_lag1, select = 1, residuals = TRUE, rug = FALSE)
plot(m_chla_gam_lag1, pages = 1, all.terms = TRUE)
-
+
acf(residuals(m_chla_gam_lag1), na.action = na.pass)
Box.test(residuals(m_chla_gam_lag1), lag = 20, type = 'Ljung-Box')
@@ -2869,8 +2870,8 @@ appraise(m_chla_gamm_k5_ar3_gam)
k.check(m_chla_gamm_k5_ar3_gam)
-## k' edf k-index p-value
-## s(DOY) 4 2.354098 0.9992533 0.46
+## k' edf k-index p-value
+## s(DOY) 4 2.354098 1.002718 0.5225
draw(m_chla_gamm_k5_ar3_gam, select = 1, residuals = TRUE, rug = FALSE)
plot(m_chla_gamm_k5_ar3_gam, pages = 1, all.terms = TRUE)
@@ -2965,11 +2966,11 @@ k.check(m_chla_gamm_k5_ar3b_gam)
## k' edf k-index p-value
-## s(DOY) 4 2.315079 0.9809585 0.0825
+## s(DOY) 4 2.315079 0.9816397 0.115
draw(m_chla_gamm_k5_ar3b_gam, select = 1, residuals = TRUE, rug = FALSE)
plot(m_chla_gamm_k5_ar3b_gam, pages = 1, all.terms = TRUE)
-
+
acf(resid_ar3b)
Box.test(resid_ar3b, lag = 20, type = 'Ljung-Box')
@@ -3601,11 +3602,11 @@ k.check(m_chla_gam_st)
## k' edf k-index p-value
-## s(DOY) 9 8.005388 0.9563788 0
+## s(DOY) 9 8.005388 0.9921791 0.2975
draw(m_chla_gam_st, select = 1, residuals = TRUE, rug = FALSE)
plot(m_chla_gam_st, pages = 1, all.terms = TRUE)
-
+
acf(residuals(m_chla_gam_st))
Box.test(residuals(m_chla_gam_st), lag = 20, type = 'Ljung-Box')
@@ -3737,11 +3738,11 @@ k.check(m_chla_gam_st_k5)
## k' edf k-index p-value
-## s(DOY) 4 3.876466 1.005801 0.6325
+## s(DOY) 4 3.876466 1.012022 0.8075
draw(m_chla_gam_st_k5, select = 1, residuals = TRUE, rug = FALSE)
plot(m_chla_gam_st_k5, pages = 1, all.terms = TRUE)
-
+
Changing the k-value to 5 seems to help reduce the “wiggliness” of the smooth term for DOY. Now, let’s use an AR(p) model to account for the correlated residuals. We’ll run AR(1), AR(2), AR(3), and AR(4) diff --git a/docs/rtm_chl_analysis_model_selection.html b/docs/rtm_chl_models_categorical_revised.html similarity index 90% rename from docs/rtm_chl_analysis_model_selection.html rename to docs/rtm_chl_models_categorical_revised.html index afb8b7a..982ccb2 100644 --- a/docs/rtm_chl_analysis_model_selection.html +++ b/docs/rtm_chl_models_categorical_revised.html @@ -11,7 +11,7 @@ - +
Display current versions of R and packages used for this analysis:
devtools::session_info()
@@ -1632,102 +1635,113 @@ Create function to plot linear model diagnostics:
-# Create model diagnostic plots to check assumptions
+Create functions for document:
+# Create diagnostic plots for linear models to check assumptions
plot_lm_diag <- function(df_data, param_var, unit_label, model, trans = c("none", "log", "sqrt"), ...) {
trans <- match.arg(trans)
@@ -1783,6 +1797,75 @@ Global code and functions
theme_bw()
(plt_hist + plt_qq) / (plt_res_fit + plt_obs_fit)
+}
+
+plot_model_summary <- function(df_data, em_tuk_obj) {
+ # Calculate min and max values of observed data for each Station - Year group to
+ # determine vertical positioning of letters for figure
+ df_data_summ <- df_data %>%
+ summarize(
+ max_val = max(Chla),
+ min_val = min(Chla),
+ .by = c(StationCode, Year)
+ )
+
+ # Add significance grouping letters to the Tukey post-hoc results
+ df_tuk <- em_tuk_obj %>%
+ cld(sort = FALSE, Letters = letters) %>%
+ as_tibble() %>%
+ mutate(
+ group = str_remove_all(.group, fixed(" ")),
+ # back transform log-transformed results
+ across(c(emmean, lower.CL, upper.CL), ~ exp(.x) / 1000),
+ Year = as.numeric(as.character(Year_fct))
+ ) %>%
+ # Add min and max values of observed data to the Tukey post-hoc results and
+ # calculate vertical positioning of letters
+ left_join(df_data_summ, by = join_by(StationCode, Year)) %>%
+ mutate(max_val = if_else(upper.CL > max_val, upper.CL, max_val)) %>%
+ group_by(StationCode, Year) %>%
+ mutate(max_val = max(max_val)) %>%
+ ungroup() %>%
+ mutate(y_pos = max_val + (max_val - min_val) / 10) %>%
+ select(
+ StationCode,
+ Year,
+ FlowActionPeriod,
+ emmean,
+ lower.CL,
+ upper.CL,
+ group,
+ y_pos
+ )
+
+ # Create summary figure
+ df_tuk %>%
+ ggplot(
+ aes(
+ x = FlowActionPeriod,
+ y = emmean,
+ ymin = lower.CL,
+ ymax = upper.CL
+ )
+ ) +
+ geom_boxplot(
+ data = df_data,
+ aes(x = FlowActionPeriod, y = Chla),
+ inherit.aes = FALSE
+ ) +
+ geom_crossbar(color = "grey82", fill = "grey", alpha = 0.7, linewidth = 0.1) +
+ geom_point(color = "red") +
+ geom_text(aes(y = y_pos, label = group), size = 3) +
+ facet_wrap(
+ vars(StationCode, Year),
+ scales = "free_y",
+ nrow = 4,
+ labeller = labeller(.multi_line = FALSE)
+ ) +
+ xlab("Flow Pulse Period") +
+ ylab(expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))) +
+ theme_bw() +
+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
}
@@ -2770,11 +2853,11 @@ Initial Model
k.check(m_gam_cat3)
## k' edf k-index p-value
-## s(DOY) 4 3.674445 0.969169 0.055
+## s(DOY) 4 3.674445 0.969169 0.045
draw(m_gam_cat3, select = 1, residuals = TRUE, rug = FALSE)
plot(m_gam_cat3, pages = 1, all.terms = TRUE)
-
+
acf(residuals(m_gam_cat3))
Box.test(residuals(m_gam_cat3), lag = 20, type = 'Ljung-Box')
@@ -3045,7 +3128,7 @@ AR(1) Model
draw(m_gamm_cat3_ar1_gam, select = 1, residuals = TRUE, rug = FALSE)
plot(m_gamm_cat3_ar1_gam, pages = 1, all.terms = TRUE)
-
+
acf(resid_cat3_ar1)
Box.test(resid_cat3_ar1, lag = 20, type = 'Ljung-Box')
@@ -4243,11 +4326,11 @@ Initial Model
k.check(m_gam_cat2)
## k' edf k-index p-value
-## s(DOY) 4 3.526722 1.026702 0.9225
+## s(DOY) 4 3.526722 1.026702 0.8925
draw(m_gam_cat2, select = 1, residuals = TRUE, rug = FALSE)
plot(m_gam_cat2, pages = 1, all.terms = TRUE)
-
+
acf(residuals(m_gam_cat2))
Box.test(residuals(m_gam_cat2), lag = 20, type = 'Ljung-Box')
@@ -4446,7 +4529,7 @@ AR(1) Model
draw(m_gamm_cat2_ar1_gam, select = 1, residuals = TRUE, rug = FALSE)
plot(m_gamm_cat2_ar1_gam, pages = 1, all.terms = TRUE)
-
+
acf(resid_cat2_ar1)
Box.test(resid_cat2_ar1, lag = 20, type = 'Ljung-Box')
@@ -5294,18 +5377,415 @@ Model selection - GAM and LM models
interactions without seasonal term) because the 3-way interaction
between categorical variables was significant in this model indicating
that there are possible significant differences between Flow Pulse
-Periods within each Station-Year grouping. We’ll export model 3 and its
-data to be used for further exploration in other notebooks.
-# Define file path for model and data used in the model
-fp_model <- here("manuscript_synthesis/model")
-
-# Export model and data used in the model
-m_lm_cat3_lag1 %>% saveRDS(file.path(fp_model, "chla_cat3_lm_model.rds"))
+Periods within each Station-Year grouping.
+
+
+Model Results
+
+Model 3
+Lets take a closer look at model 3: LM 3-way interactions without
+seasonal term
Formula: Chla_log ~ Year_fct * FlowActionPeriod *
+StationCode + lag1
+
+Pairwise Contrasts
+Tukey pairwise contrasts of Flow Pulse Period for each Station - Year
+combination:
+em_lm_cat3_lag1 <- emmeans(m_lm_cat3_lag1, ~ FlowActionPeriod | StationCode * Year_fct)
+pairs(em_lm_cat3_lag1)
+## StationCode = RD22, Year_fct = 2013:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.03104 0.0706 2811 0.439 0.8990
+## Before - After -0.14576 0.0726 2811 -2.009 0.1103
+## During - After -0.17680 0.0391 2811 -4.527 <.0001
+##
+## StationCode = STTD, Year_fct = 2013:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.12119 0.0711 2811 -1.705 0.2035
+## Before - After 0.03327 0.0718 2811 0.463 0.8885
+## During - After 0.15446 0.0384 2811 4.027 0.0002
+##
+## StationCode = LIB, Year_fct = 2013:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.01689 0.0368 2811 -0.459 0.8903
+## Before - After -0.03037 0.0361 2811 -0.842 0.6767
+## During - After -0.01348 0.0345 2811 -0.390 0.9195
+##
+## StationCode = RVB, Year_fct = 2013:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.02352 0.0348 2811 -0.677 0.7771
+## Before - After 0.02883 0.0340 2811 0.849 0.6727
+## During - After 0.05235 0.0347 2811 1.509 0.2869
+##
+## StationCode = RD22, Year_fct = 2014:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.08579 0.0488 2811 1.757 0.1844
+## Before - After 0.08265 0.0347 2811 2.385 0.0452
+## During - After -0.00314 0.0481 2811 -0.065 0.9977
+##
+## StationCode = STTD, Year_fct = 2014:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.05266 0.0483 2811 -1.090 0.5206
+## Before - After -0.00278 0.0390 2811 -0.071 0.9972
+## During - After 0.04987 0.0518 2811 0.962 0.6009
+##
+## StationCode = LIB, Year_fct = 2014:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.01696 0.0482 2811 0.352 0.9342
+## Before - After 0.03826 0.0340 2811 1.127 0.4975
+## During - After 0.02131 0.0481 2811 0.443 0.8977
+##
+## StationCode = RVB, Year_fct = 2014:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.05967 0.0512 2811 1.165 0.4743
+## Before - After 0.02518 0.0342 2811 0.736 0.7418
+## During - After -0.03450 0.0509 2811 -0.678 0.7762
+##
+## StationCode = RD22, Year_fct = 2015:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.20880 0.0410 2811 5.092 <.0001
+## Before - After 0.03994 0.0403 2811 0.991 0.5829
+## During - After -0.16886 0.0367 2811 -4.601 <.0001
+##
+## StationCode = STTD, Year_fct = 2015:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.14016 0.0423 2811 -3.311 0.0027
+## Before - After -0.03064 0.0408 2811 -0.752 0.7327
+## During - After 0.10951 0.0353 2811 3.100 0.0056
+##
+## StationCode = LIB, Year_fct = 2015:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.05377 0.0360 2811 -1.495 0.2934
+## Before - After 0.01182 0.0340 2811 0.348 0.9354
+## During - After 0.06559 0.0359 2811 1.828 0.1607
+##
+## StationCode = RVB, Year_fct = 2015:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.06439 0.0349 2811 1.846 0.1549
+## Before - After 0.05180 0.0342 2811 1.514 0.2845
+## During - After -0.01259 0.0346 2811 -0.364 0.9295
+##
+## StationCode = RD22, Year_fct = 2016:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.07822 0.0520 2811 1.505 0.2888
+## Before - After 0.17052 0.0442 2811 3.857 0.0003
+## During - After 0.09230 0.0444 2811 2.078 0.0945
+##
+## StationCode = STTD, Year_fct = 2016:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.09446 0.0522 2811 -1.811 0.1662
+## Before - After -0.00767 0.0434 2811 -0.177 0.9829
+## During - After 0.08679 0.0444 2811 1.954 0.1239
+##
+## StationCode = LIB, Year_fct = 2016:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.00762 0.0443 2811 0.172 0.9839
+## Before - After 0.13064 0.0345 2811 3.791 0.0005
+## During - After 0.12302 0.0443 2811 2.778 0.0152
+##
+## StationCode = RVB, Year_fct = 2016:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.02856 0.0459 2811 0.622 0.8080
+## Before - After 0.05545 0.0381 2811 1.457 0.3119
+## During - After 0.02689 0.0459 2811 0.586 0.8278
+##
+## StationCode = RD22, Year_fct = 2017:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.01657 0.0429 2811 0.386 0.9212
+## Before - After 0.09256 0.0345 2811 2.683 0.0201
+## During - After 0.07599 0.0427 2811 1.780 0.1764
+##
+## StationCode = STTD, Year_fct = 2017:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.40805 0.0847 2811 -4.820 <.0001
+## Before - After 0.24182 0.0769 2811 3.143 0.0048
+## During - After 0.64986 0.1097 2811 5.922 <.0001
+##
+## StationCode = LIB, Year_fct = 2017:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.03781 0.0428 2811 0.883 0.6511
+## Before - After 0.05582 0.0342 2811 1.634 0.2317
+## During - After 0.01801 0.0427 2811 0.422 0.9065
+##
+## StationCode = RVB, Year_fct = 2017:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.05830 0.0430 2811 1.355 0.3648
+## Before - After 0.06964 0.0344 2811 2.027 0.1059
+## During - After 0.01133 0.0426 2811 0.266 0.9618
+##
+## StationCode = RD22, Year_fct = 2018:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.13021 0.0390 2811 3.340 0.0024
+## Before - After 0.03575 0.0341 2811 1.049 0.5457
+## During - After -0.09446 0.0383 2811 -2.466 0.0366
+##
+## StationCode = STTD, Year_fct = 2018:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.16774 0.0408 2811 -4.110 0.0001
+## Before - After 0.04857 0.0455 2811 1.068 0.5339
+## During - After 0.21631 0.0486 2811 4.453 <.0001
+##
+## StationCode = LIB, Year_fct = 2018:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.02439 0.0538 2811 -0.453 0.8931
+## Before - After 0.24472 0.0524 2811 4.672 <.0001
+## During - After 0.26911 0.0412 2811 6.529 <.0001
+##
+## StationCode = RVB, Year_fct = 2018:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.08524 0.0384 2811 2.218 0.0684
+## Before - After 0.06216 0.0342 2811 1.817 0.1642
+## During - After -0.02308 0.0380 2811 -0.608 0.8159
+##
+## StationCode = RD22, Year_fct = 2019:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.17499 0.0409 2811 4.278 0.0001
+## Before - After -0.05633 0.0340 2811 -1.658 0.2217
+## During - After -0.23132 0.0413 2811 -5.608 <.0001
+##
+## StationCode = STTD, Year_fct = 2019:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.15945 0.0436 2811 -3.657 0.0008
+## Before - After 0.01866 0.0380 2811 0.491 0.8754
+## During - After 0.17811 0.0399 2811 4.461 <.0001
+##
+## StationCode = LIB, Year_fct = 2019:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.08552 0.0551 2811 1.553 0.2665
+## Before - After 0.09072 0.0457 2811 1.985 0.1161
+## During - After 0.00520 0.0482 2811 0.108 0.9936
+##
+## StationCode = RVB, Year_fct = 2019:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.00326 0.0395 2811 0.083 0.9962
+## Before - After 0.03298 0.0340 2811 0.971 0.5950
+## During - After 0.02972 0.0394 2811 0.755 0.7307
+##
+## P value adjustment: tukey method for comparing a family of 3 estimates
+
+
+Summary Figure
+df_chla_wt_lag %>%
+ drop_na(Chla, lag1) %>%
+ plot_model_summary(em_lm_cat3_lag1)
+
+Observed daily average chlorophyll fluorescence values (boxplots) and
+model results (model means as red points ±95% confidence intervals as
+gray boxes) for the Flow Pulse Period comparisons by Station and Year.
+Different letters above boxplots identify statistically significant (p
+< 0.05) differences from a Tukey post-hoc test.
+The model results don’t match the observed values well at all.
+Unfortunately, this linear model is not a good candidate to use for our
+analysis.
+
+
+
+Model 1
+Since the linear model using 3-way interactions doesn’t fit the
+observed data that well, let’s take a closer look at the GAM version of
+this model, model 1: GAM 3-way interactions with s(DOY)
Formula:
+Chla_log ~ Year_fct * FlowActionPeriod * StationCode + s(DOY, k = 5)
+
+Pairwise Contrasts
+Tukey pairwise contrasts of Flow Pulse Period for each Station - Year
+combination:
+# Add the model call back to the gam object so it works with emmeans
+m_gamm_cat3_ar1_gam$call <- quote(
+ gamm(
+ f_gam_cat3,
+ data = df_chla_wt_c,
+ correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 1),
+ method = "REML"
+ )
+)
-df_chla_wt_lag %>%
- select(StationCode, Year, Date, Chla, Chla_log, FlowActionPeriod, lag1) %>%
- drop_na(Chla_log, lag1) %>%
- saveRDS(file.path(fp_model, "chla_model_data.rds"))
+em_gamm_cat3_ar1 <- emmeans(m_gamm_cat3_ar1_gam, ~ FlowActionPeriod | StationCode * Year_fct)
+pairs(em_gamm_cat3_ar1)
+## StationCode = RD22, Year_fct = 2013:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.01185 0.165 2847 -0.072 0.9972
+## Before - After -0.11630 0.226 2847 -0.514 0.8648
+## During - After -0.10445 0.161 2847 -0.647 0.7942
+##
+## StationCode = STTD, Year_fct = 2013:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.08311 0.165 2847 -0.504 0.8696
+## Before - After 0.03022 0.226 2847 0.133 0.9902
+## During - After 0.11333 0.161 2847 0.702 0.7625
+##
+## StationCode = LIB, Year_fct = 2013:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.10931 0.160 2847 -0.684 0.7727
+## Before - After -0.46042 0.216 2847 -2.128 0.0845
+## During - After -0.35111 0.159 2847 -2.209 0.0699
+##
+## StationCode = RVB, Year_fct = 2013:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.03730 0.159 2847 -0.235 0.9700
+## Before - After -0.07590 0.215 2847 -0.353 0.9335
+## During - After -0.03861 0.159 2847 -0.243 0.9679
+##
+## StationCode = RD22, Year_fct = 2014:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.08902 0.160 2847 0.555 0.8437
+## Before - After 0.19177 0.214 2847 0.896 0.6428
+## During - After 0.10274 0.160 2847 0.641 0.7975
+##
+## StationCode = STTD, Year_fct = 2014:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.06891 0.161 2847 -0.427 0.9042
+## Before - After -0.05232 0.218 2847 -0.240 0.9687
+## During - After 0.01660 0.162 2847 0.102 0.9942
+##
+## StationCode = LIB, Year_fct = 2014:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.03282 0.160 2847 0.205 0.9772
+## Before - After 0.05879 0.214 2847 0.275 0.9593
+## During - After 0.02597 0.160 2847 0.162 0.9856
+##
+## StationCode = RVB, Year_fct = 2014:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.16764 0.161 2847 1.044 0.5493
+## Before - After 0.14007 0.214 2847 0.653 0.7905
+## During - After -0.02757 0.160 2847 -0.172 0.9839
+##
+## StationCode = RD22, Year_fct = 2015:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.17341 0.161 2847 1.077 0.5287
+## Before - After -0.17346 0.219 2847 -0.791 0.7086
+## During - After -0.34687 0.160 2847 -2.170 0.0766
+##
+## StationCode = STTD, Year_fct = 2015:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.14495 0.161 2847 -0.898 0.6416
+## Before - After -0.12722 0.219 2847 -0.580 0.8306
+## During - After 0.01773 0.159 2847 0.111 0.9932
+##
+## StationCode = LIB, Year_fct = 2015:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.03530 0.159 2847 -0.222 0.9732
+## Before - After 0.04603 0.215 2847 0.214 0.9750
+## During - After 0.08133 0.159 2847 0.512 0.8657
+##
+## StationCode = RVB, Year_fct = 2015:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.06070 0.159 2847 0.382 0.9226
+## Before - After 0.08565 0.215 2847 0.399 0.9161
+## During - After 0.02495 0.159 2847 0.157 0.9865
+##
+## StationCode = RD22, Year_fct = 2016:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.10145 0.163 2847 -0.623 0.8075
+## Before - After -0.06039 0.220 2847 -0.274 0.9593
+## During - After 0.04106 0.161 2847 0.254 0.9649
+##
+## StationCode = STTD, Year_fct = 2016:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.01472 0.163 2847 -0.090 0.9955
+## Before - After 0.08079 0.220 2847 0.367 0.9284
+## During - After 0.09552 0.161 2847 0.592 0.8245
+##
+## StationCode = LIB, Year_fct = 2016:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.11206 0.160 2847 0.700 0.7633
+## Before - After 0.23055 0.214 2847 1.077 0.5285
+## During - After 0.11850 0.160 2847 0.741 0.7394
+##
+## StationCode = RVB, Year_fct = 2016:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.05674 0.161 2847 0.352 0.9339
+## Before - After 0.07202 0.217 2847 0.331 0.9413
+## During - After 0.01528 0.161 2847 0.095 0.9951
+##
+## StationCode = RD22, Year_fct = 2017:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.01313 0.160 2847 -0.082 0.9963
+## Before - After -0.35828 0.214 2847 -1.673 0.2157
+## During - After -0.34516 0.160 2847 -2.159 0.0786
+##
+## StationCode = STTD, Year_fct = 2017:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.26668 0.165 2847 -1.615 0.2396
+## Before - After 1.94873 0.228 2847 8.548 <.0001
+## During - After 2.21541 0.168 2847 13.223 <.0001
+##
+## StationCode = LIB, Year_fct = 2017:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.00506 0.160 2847 0.032 0.9994
+## Before - After -0.04569 0.214 2847 -0.213 0.9752
+## During - After -0.05075 0.160 2847 -0.317 0.9460
+##
+## StationCode = RVB, Year_fct = 2017:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.05804 0.160 2847 0.363 0.9299
+## Before - After 0.11184 0.214 2847 0.522 0.8605
+## During - After 0.05380 0.160 2847 0.336 0.9395
+##
+## StationCode = RD22, Year_fct = 2018:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.30690 0.159 2847 1.926 0.1315
+## Before - After 0.39685 0.214 2847 1.851 0.1534
+## During - After 0.08995 0.159 2847 0.565 0.8390
+##
+## StationCode = STTD, Year_fct = 2018:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.14774 0.161 2847 -0.917 0.6295
+## Before - After 0.05024 0.222 2847 0.227 0.9721
+## During - After 0.19798 0.163 2847 1.215 0.4441
+##
+## StationCode = LIB, Year_fct = 2018:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.04202 0.164 2847 0.257 0.9643
+## Before - After 0.17618 0.222 2847 0.792 0.7080
+## During - After 0.13416 0.161 2847 0.834 0.6818
+##
+## StationCode = RVB, Year_fct = 2018:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.04241 0.159 2847 0.266 0.9617
+## Before - After 0.07725 0.214 2847 0.360 0.9309
+## During - After 0.03484 0.159 2847 0.219 0.9740
+##
+## StationCode = RD22, Year_fct = 2019:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.09297 0.159 2847 -0.583 0.8293
+## Before - After -0.34906 0.214 2847 -1.629 0.2336
+## During - After -0.25609 0.159 2847 -1.606 0.2434
+##
+## StationCode = STTD, Year_fct = 2019:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.04377 0.161 2847 -0.272 0.9601
+## Before - After 0.06919 0.217 2847 0.318 0.9457
+## During - After 0.11296 0.160 2847 0.706 0.7602
+##
+## StationCode = LIB, Year_fct = 2019:
+## contrast estimate SE df t.ratio p.value
+## Before - During 0.32408 0.170 2847 1.911 0.1358
+## Before - After 0.25881 0.227 2847 1.138 0.4906
+## During - After -0.06527 0.162 2847 -0.403 0.9146
+##
+## StationCode = RVB, Year_fct = 2019:
+## contrast estimate SE df t.ratio p.value
+## Before - During -0.06432 0.159 2847 -0.403 0.9143
+## Before - After -0.05915 0.214 2847 -0.276 0.9589
+## During - After 0.00517 0.159 2847 0.032 0.9994
+##
+## P value adjustment: tukey method for comparing a family of 3 estimates
+
+
+Summary Figure
+plot_model_summary(df_chla_wt_c, em_gamm_cat3_ar1)
+
+Observed daily average chlorophyll fluorescence values (boxplots) and
+model results (model means as red points ±95% confidence intervals as
+gray boxes) for the Flow Pulse Period comparisons by Station and Year.
+Different letters above boxplots identify statistically significant (p
+< 0.05) differences from a Tukey post-hoc test.
+The model means seem to match the observed values better than Model
+3, but there is a large amount of uncertainty around the means resulting
+in the model not seeing any significant differences among the pairwise
+comparisons. Unfortunately, this linear model does not seem like a good
+candidate to use for our analysis as well.
+
+
diff --git a/manuscript_synthesis/model/chla_cat3_lm_model.rds b/manuscript_synthesis/model/chla_cat3_lm_model.rds
deleted file mode 100644
index fa39635..0000000
Binary files a/manuscript_synthesis/model/chla_cat3_lm_model.rds and /dev/null differ
diff --git a/manuscript_synthesis/model/chla_model_data.rds b/manuscript_synthesis/model/chla_model_data.rds
deleted file mode 100644
index 6c23646..0000000
Binary files a/manuscript_synthesis/model/chla_model_data.rds and /dev/null differ
diff --git a/manuscript_synthesis/notebooks/rtm_chl_analysis_final_model_results.Rmd b/manuscript_synthesis/notebooks/rtm_chl_analysis_final_model_results.Rmd
deleted file mode 100644
index 0725497..0000000
--- a/manuscript_synthesis/notebooks/rtm_chl_analysis_final_model_results.Rmd
+++ /dev/null
@@ -1,156 +0,0 @@
----
-title: "NDFS Synthesis Manuscript: Chlorophyll analysis"
-subtitle: "Results from Final Model"
-author: "Dave Bosworth"
-date: '`r format(Sys.Date(), "%B %d, %Y")`'
-output:
- html_document:
- code_folding: show
- toc: true
- toc_float:
- collapsed: false
-editor_options:
- chunk_output_type: console
-knit: (function(input, ...) {
- rmarkdown::render(
- input,
- output_dir = here::here("docs"),
- envir = globalenv()
- )
- })
----
-
-```{r setup, include=FALSE}
-knitr::opts_chunk$set(echo = TRUE)
-```
-
-# Purpose
-
-Look at results of the preferred model during model selection: Linear model with 3-way interactions between categorical variables - Year, Station, and Flow Pulse Period - without seasonal term.
-
-# Global code and functions
-
-```{r load packages, message = FALSE, warning = FALSE}
-# Load packages
-library(tidyverse)
-library(car)
-library(emmeans)
-library(multcomp)
-library(here)
-library(conflicted)
-
-# Declare package conflict preferences
-conflicts_prefer(dplyr::filter(), dplyr::select())
-```
-
-Display current versions of R and packages used for this analysis:
-
-```{r print session info}
-devtools::session_info()
-```
-
-# Import Model and Data
-
-```{r import data}
-# Define file path for model and data used in the model
-fp_model <- here("manuscript_synthesis/model")
-
-# Import model and data used in the model
-m_lm_cat3 <- read_rds(file.path(fp_model, "chla_cat3_lm_model.rds"))
-df_chla <- read_rds(file.path(fp_model, "chla_model_data.rds"))
-```
-
-# Model Results
-
-Model formula: `r format(m_lm_cat3$call) %>% str_c(collapse = "") %>% str_squish() %>% str_sub(start = 14, end = -12)`
-
-## ANOVA table
-
-```{r anova lm cat3, warning = FALSE}
-Anova(m_lm_cat3, type = 3, contrasts = list(topic = contr.sum, sys = contr.sum))
-```
-
-## Pairwise Contrasts
-
-Tukey pairwise contrasts of Flow Pulse Period for each Station - Year combination:
-
-```{r tukey flow action period, warning = FALSE}
-em_cat3 <- emmeans(m_lm_cat3, ~ FlowActionPeriod | StationCode * Year_fct)
-pairs(em_cat3)
-```
-
-## Summary Figure
-
-```{r summary figure, fig.width = 8.5, fig.height = 7.5}
-# Add significance grouping letters from the Tukey post-hoc results and back
- # transform log-transformed results
-df_cat3_tuk <- em_cat3 %>%
- cld(sort = FALSE, Letters = letters) %>%
- as_tibble() %>%
- mutate(
- group = str_remove_all(.group, fixed(" ")),
- across(c(emmean, lower.CL, upper.CL), ~ exp(.x) / 1000),
- Year = as.numeric(as.character(Year_fct))
- ) %>%
- select(
- StationCode,
- Year,
- FlowActionPeriod,
- emmean,
- lower.CL,
- upper.CL,
- group
- )
-
-# Determine vertical positioning of letters for figure
-# Calculate min and max values of observed data for each Station - Year group
-df_chla_summ <- df_chla %>%
- summarize(
- max_val = max(Chla),
- min_val = min(Chla),
- .by = c(StationCode, Year)
- )
-
-# Add min and max values of observed data to the Tukey post-hoc results and
- # calculate vertical positioning of letters
-df_cat3_tuk_c <- df_cat3_tuk %>%
- left_join(df_chla_summ, by = join_by(StationCode, Year)) %>%
- mutate(max_val = if_else(upper.CL > max_val, upper.CL, max_val)) %>%
- group_by(StationCode, Year) %>%
- mutate(max_val = max(max_val)) %>%
- ungroup() %>%
- mutate(y_pos = max_val + (max_val - min_val) / 10)
-
-# Create summary figure
-df_cat3_tuk_c %>%
- ggplot(
- aes(
- x = FlowActionPeriod,
- y = emmean,
- ymin = lower.CL,
- ymax = upper.CL,
- label = group
- )
- ) +
- geom_boxplot(
- data = df_chla,
- aes(x = FlowActionPeriod, y = Chla),
- inherit.aes = FALSE
- ) +
- geom_crossbar(color = "grey82", fill = "grey", alpha = 0.7, linewidth = 0.1) +
- geom_point(color = "red") +
- geom_text(aes(y = y_pos), size = 3) +
- facet_wrap(
- vars(StationCode, Year),
- scales = "free_y",
- nrow = 4,
- labeller = labeller(.multi_line = FALSE)
- ) +
- xlab("Flow Pulse Period") +
- ylab(expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))) +
- theme_bw() +
- theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
-```
-
-Observed daily average chlorophyll fluorescence values (boxplots) and model results (model means as red points ±95% confidence intervals as gray boxes) for the Flow Pulse Period comparisons by Station and Year. Different letters above boxplots identify statistically significant (p < 0.05) differences from a Tukey post-hoc test.
-
diff --git a/manuscript_synthesis/notebooks/rtm_chl_gam_analysis_categorical.Rmd b/manuscript_synthesis/notebooks/rtm_chl_gam_categorical_initial.Rmd
similarity index 99%
rename from manuscript_synthesis/notebooks/rtm_chl_gam_analysis_categorical.Rmd
rename to manuscript_synthesis/notebooks/rtm_chl_gam_categorical_initial.Rmd
index cebb14d..831c345 100644
--- a/manuscript_synthesis/notebooks/rtm_chl_gam_analysis_categorical.Rmd
+++ b/manuscript_synthesis/notebooks/rtm_chl_gam_categorical_initial.Rmd
@@ -1,6 +1,6 @@
---
title: "NDFS Synthesis Manuscript: Chlorophyll analysis"
-subtitle: "GAM model using categorical predictors"
+subtitle: "GAM model using categorical predictors - Initial"
author: "Dave Bosworth"
date: '`r format(Sys.Date(), "%B %d, %Y")`'
output:
diff --git a/manuscript_synthesis/notebooks/rtm_chl_analysis_model_selection.Rmd b/manuscript_synthesis/notebooks/rtm_chl_models_categorical_revised.Rmd
similarity index 84%
rename from manuscript_synthesis/notebooks/rtm_chl_analysis_model_selection.Rmd
rename to manuscript_synthesis/notebooks/rtm_chl_models_categorical_revised.Rmd
index 35e87d1..0e46180 100644
--- a/manuscript_synthesis/notebooks/rtm_chl_analysis_model_selection.Rmd
+++ b/manuscript_synthesis/notebooks/rtm_chl_models_categorical_revised.Rmd
@@ -1,6 +1,6 @@
---
title: "NDFS Synthesis Manuscript: Chlorophyll analysis"
-subtitle: "Model selection"
+subtitle: "Models using categorical predictors - Revised Analysis"
author: "Dave Bosworth"
date: '`r format(Sys.Date(), "%B %d, %Y")`'
output:
@@ -42,6 +42,8 @@ library(car)
library(gratia)
library(rlang)
library(patchwork)
+library(emmeans)
+library(multcomp)
library(here)
library(conflicted)
@@ -49,7 +51,7 @@ library(conflicted)
source(here("manuscript_synthesis/src/global_functions.R"))
# Declare package conflict preferences
-conflicts_prefer(dplyr::filter(), dplyr::lag())
+conflicts_prefer(dplyr::filter(), dplyr::lag(), dplyr::select())
```
Display current versions of R and packages used for this analysis:
@@ -58,10 +60,10 @@ Display current versions of R and packages used for this analysis:
devtools::session_info()
```
-Create function to plot linear model diagnostics:
+Create functions for document:
```{r global funcs}
-# Create model diagnostic plots to check assumptions
+# Create diagnostic plots for linear models to check assumptions
plot_lm_diag <- function(df_data, param_var, unit_label, model, trans = c("none", "log", "sqrt"), ...) {
trans <- match.arg(trans)
@@ -118,6 +120,75 @@ plot_lm_diag <- function(df_data, param_var, unit_label, model, trans = c("none"
(plt_hist + plt_qq) / (plt_res_fit + plt_obs_fit)
}
+
+plot_model_summary <- function(df_data, em_tuk_obj) {
+ # Calculate min and max values of observed data for each Station - Year group to
+ # determine vertical positioning of letters for figure
+ df_data_summ <- df_data %>%
+ summarize(
+ max_val = max(Chla),
+ min_val = min(Chla),
+ .by = c(StationCode, Year)
+ )
+
+ # Add significance grouping letters to the Tukey post-hoc results
+ df_tuk <- em_tuk_obj %>%
+ cld(sort = FALSE, Letters = letters) %>%
+ as_tibble() %>%
+ mutate(
+ group = str_remove_all(.group, fixed(" ")),
+ # back transform log-transformed results
+ across(c(emmean, lower.CL, upper.CL), ~ exp(.x) / 1000),
+ Year = as.numeric(as.character(Year_fct))
+ ) %>%
+ # Add min and max values of observed data to the Tukey post-hoc results and
+ # calculate vertical positioning of letters
+ left_join(df_data_summ, by = join_by(StationCode, Year)) %>%
+ mutate(max_val = if_else(upper.CL > max_val, upper.CL, max_val)) %>%
+ group_by(StationCode, Year) %>%
+ mutate(max_val = max(max_val)) %>%
+ ungroup() %>%
+ mutate(y_pos = max_val + (max_val - min_val) / 10) %>%
+ select(
+ StationCode,
+ Year,
+ FlowActionPeriod,
+ emmean,
+ lower.CL,
+ upper.CL,
+ group,
+ y_pos
+ )
+
+ # Create summary figure
+ df_tuk %>%
+ ggplot(
+ aes(
+ x = FlowActionPeriod,
+ y = emmean,
+ ymin = lower.CL,
+ ymax = upper.CL
+ )
+ ) +
+ geom_boxplot(
+ data = df_data,
+ aes(x = FlowActionPeriod, y = Chla),
+ inherit.aes = FALSE
+ ) +
+ geom_crossbar(color = "grey82", fill = "grey", alpha = 0.7, linewidth = 0.1) +
+ geom_point(color = "red") +
+ geom_text(aes(y = y_pos, label = group), size = 3) +
+ facet_wrap(
+ vars(StationCode, Year),
+ scales = "free_y",
+ nrow = 4,
+ labeller = labeller(.multi_line = FALSE)
+ ) +
+ xlab("Flow Pulse Period") +
+ ylab(expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))) +
+ theme_bw() +
+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
+}
```
@@ -911,18 +982,67 @@ df_m_aic_bic %>% arrange(AIC)
df_m_aic_bic %>% arrange(BIC)
```
-According to AIC, model 3 (LM 3-way interactions without seasonal term) was the best model, while BIC preferred model 6 (LM 2-way interactions without seasonal term). We'll select model 3 (LM 3-way interactions without seasonal term) because the 3-way interaction between categorical variables was significant in this model indicating that there are possible significant differences between Flow Pulse Periods within each Station-Year grouping. We'll export model 3 and its data to be used for further exploration in other notebooks.
+According to AIC, model 3 (LM 3-way interactions without seasonal term) was the best model, while BIC preferred model 6 (LM 2-way interactions without seasonal term). We'll select model 3 (LM 3-way interactions without seasonal term) because the 3-way interaction between categorical variables was significant in this model indicating that there are possible significant differences between Flow Pulse Periods within each Station-Year grouping.
+
+# Model Results
+
+## Model 3
+
+Lets take a closer look at model 3: LM 3-way interactions without seasonal term
+Formula: `r format(m_lm_cat3_lag1$call) %>% str_c(collapse = "") %>% str_squish() %>% str_sub(start = 14, end = -12)`
-```{r export models and data}
-# Define file path for model and data used in the model
-fp_model <- here("manuscript_synthesis/model")
+### Pairwise Contrasts
-# Export model and data used in the model
-m_lm_cat3_lag1 %>% saveRDS(file.path(fp_model, "chla_cat3_lm_model.rds"))
+Tukey pairwise contrasts of Flow Pulse Period for each Station - Year combination:
+```{r lm cat3 lag1 tukey flow action period, warning = FALSE}
+em_lm_cat3_lag1 <- emmeans(m_lm_cat3_lag1, ~ FlowActionPeriod | StationCode * Year_fct)
+pairs(em_lm_cat3_lag1)
+```
+
+### Summary Figure
+
+```{r lm cat3 lag 1 summary figure, fig.width = 8.5, fig.height = 7.5}
df_chla_wt_lag %>%
- select(StationCode, Year, Date, Chla, Chla_log, FlowActionPeriod, lag1) %>%
- drop_na(Chla_log, lag1) %>%
- saveRDS(file.path(fp_model, "chla_model_data.rds"))
+ drop_na(Chla, lag1) %>%
+ plot_model_summary(em_lm_cat3_lag1)
```
+Observed daily average chlorophyll fluorescence values (boxplots) and model results (model means as red points ±95% confidence intervals as gray boxes) for the Flow Pulse Period comparisons by Station and Year. Different letters above boxplots identify statistically significant (p < 0.05) differences from a Tukey post-hoc test.
+
+The model results don't match the observed values well at all. Unfortunately, this linear model is not a good candidate to use for our analysis.
+
+## Model 1
+
+Since the linear model using 3-way interactions doesn't fit the observed data that well, let's take a closer look at the GAM version of this model, model 1: GAM 3-way interactions with s(DOY)
+Formula: `r format(m_gamm_cat3_ar1_gam$formula) %>% str_c(collapse = "") %>% str_squish()`
+
+### Pairwise Contrasts
+
+Tukey pairwise contrasts of Flow Pulse Period for each Station - Year combination:
+
+```{r gam cat3 ar1 tukey flow action period, warning = FALSE}
+# Add the model call back to the gam object so it works with emmeans
+m_gamm_cat3_ar1_gam$call <- quote(
+ gamm(
+ f_gam_cat3,
+ data = df_chla_wt_c,
+ correlation = corARMA(form = ~ 1|Year_fct/StationCode, p = 1),
+ method = "REML"
+ )
+)
+
+em_gamm_cat3_ar1 <- emmeans(m_gamm_cat3_ar1_gam, ~ FlowActionPeriod | StationCode * Year_fct)
+pairs(em_gamm_cat3_ar1)
+```
+
+### Summary Figure
+
+```{r gam cat3 ar1 summary figure, fig.width = 8.5, fig.height = 7.5}
+plot_model_summary(df_chla_wt_c, em_gamm_cat3_ar1)
+```
+
+Observed daily average chlorophyll fluorescence values (boxplots) and model results (model means as red points ±95% confidence intervals as gray boxes) for the Flow Pulse Period comparisons by Station and Year. Different letters above boxplots identify statistically significant (p < 0.05) differences from a Tukey post-hoc test.
+
+The model means seem to match the observed values better than Model 3, but there is a large amount of uncertainty around the means resulting in the model not seeing any significant differences among the pairwise comparisons. Unfortunately, this linear model does not seem like a good candidate to use for our analysis as well.
+