diff --git a/docs/index.md b/docs/index.md index cc5b7b2..c9357ea 100644 --- a/docs/index.md +++ b/docs/index.md @@ -8,7 +8,8 @@ layout: default * [GAM Model using Categorical Predictors - Initial analysis](rtm_chl_gam_categorical_initial.html) * [Models using Categorical Predictors - Revised analysis](rtm_chl_models_categorical_revised.html) -* [Models using Flow as a Continuous Predictor](rtm_chl_models_flow.html) +* [Models using Flow as a Continuous Predictor - Daily Averages](rtm_chl_models_flow.html) +* [Models using Flow as a Continuous Predictor - Weekly Averages](rtm_chl_models_flow_weekly_avg.html) * [Explore relationship between continuous chlorophyll and percent flow pulse water](rtm_chl_analysis_perc_flow_pulse.html) ### Final Analyses diff --git a/docs/rtm_chl_models_flow_weekly_avg.html b/docs/rtm_chl_models_flow_weekly_avg.html index 879f480..0731359 100644 --- a/docs/rtm_chl_models_flow_weekly_avg.html +++ b/docs/rtm_chl_models_flow_weekly_avg.html @@ -1581,7 +1581,7 @@

NDFS Synthesis Manuscript: Chlorophyll analysis

-

Models using daily average flow as continuous +

Models using weekly average flow as continuous predictor

Dave Bosworth

December 11, 2023

@@ -1593,7 +1593,7 @@

December 11, 2023

Purpose

Explore and analyze the continuous chlorophyll data to be included in the NDFS synthesis manuscript. We will attempt to fit various models to -the data set using daily average flow as a continuous predictor which +the data set using weekly average flow as a continuous predictor which replaces the categorical predictor - flow action period - in the original analysis. These models will only include representative stations for 4 habitat types - upstream (RD22), lower Yolo Bypass @@ -1754,13 +1754,10 @@

Prepare Data

mutate( # Scale and log transform chlorophyll values Chla_log = log(Chla * 1000), - # Add Year and DOY variables - #Year = year(Date), - #DOY = yday(Date), # Apply factor order to StationCode StationCode = factor(StationCode, levels = sta_order), # Add a column for Year as a factor for the model - Year_fct = factor(Year, levels=c("2014", "2015", "2016", "2017", "2018", "2019")) + Year_fct = factor(Year) ) %>% arrange(StationCode, Year, Week) @@ -1973,14 +1970,15 @@

Explore sample counts by Station

Looking at the sample counts and date ranges, we’ll only include years 2015-2019 for the analysis.

df_chla_c2 <- df_chla_c1 %>% 
-  filter(Year %in% 2015:2019) 
+  filter(Year %in% 2015:2019) %>% 
+  mutate(Year_fct = fct_drop(Year_fct))
 
 # Create another dataframe that has up to 3 lag variables for chlorophyll to be
   # used in the linear models
 df_chla_c2_lag <- df_chla_c2 %>% 
-  # Fill in missing days for each StationCode-Year combination
+  # Fill in missing weeks for each StationCode-Year combination
   group_by(StationCode, Year) %>% 
-  #complete(Date = seq.Date(min(Week), max(Week), by = "1 week")) %>% 
+  # complete(Week = seq.int(min(Week), max(Week), by = 1)) %>% 
   # Create lag variables of scaled log transformed chlorophyll values
   mutate(
     lag1 = lag(Chla_log),
@@ -2000,7 +1998,7 @@ 

Plots

geom_smooth(formula = "y ~ x") + facet_wrap(vars(StationCode), scales = "free") + theme_bw()
-

+

At first glance, I’m not sure how well flow is going to be able to predict chlorophyll concentrations. At the furthest upstream station - RD22 - chlorophyll appears to be highest at the lowest flows, but the @@ -2024,7 +2022,7 @@

Plots

labeller = labeller(.multi_line = FALSE) ) + theme_bw() -

+

The patterns appear to vary annually at each station, which may justify using a 3-way interaction.

@@ -2032,7 +2030,7 @@

Plots

GAM Model with 3-way interactions

First, we will attempt to fit a generalized additive model (GAM) to the data set to help account for seasonality in the data. We’ll try -running a GAM using a three-way interaction between Year, Daily Average +running a GAM using a three-way interaction between Year, Weekly Average Flow, and Station, and a smooth term for day of year to account for seasonality. Initially, we’ll run the GAM without accounting for serial autocorrelation.

@@ -2115,11 +2113,11 @@

Initial Model

## W = 0.97425, p-value = 2.071e-05
k.check(m_gam3)
##         k'      edf   k-index p-value
-## s(Week)  9 4.741192 0.9640348  0.2625
+## s(Week) 9 4.741192 0.9640348 0.3075
draw(m_gam3, select = 1, residuals = TRUE, rug = FALSE)

plot(m_gam3, pages = 1, all.terms = TRUE)
-

+

acf(residuals(m_gam3))

Box.test(residuals(m_gam3), lag = 20, type = 'Ljung-Box')
@@ -2132,37 +2130,15 @@

Initial Model

Model with first and second order lag terms

Now, we’ll try to deal with the residual autocorrelation.

-
# Fit original model with k = 9 and three successive lagged models
-#lag1
+
# Fit two successive lagged models
+# lag1
 m_gam3_lag1 <- gam(
-  Chla_log ~ lag1 + Year_fct * Flow * StationCode + s(Week, bs="cc"), 
+  Chla_log ~ lag1 + Year_fct * Flow * StationCode + s(Week, bs = "cc"), 
   data = df_chla_c2_lag,
-  method = "REML", knots=list(week=c(0,52))
-)
-
## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
+  method = "REML", knots = list(week = c(0, 52))
+)
 
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
summary(m_gam3_lag1)
+summary(m_gam3_lag1)
## 
 ## Family: gaussian 
 ## Link function: identity 
@@ -2234,15 +2210,11 @@ 

Model with first and second order lag terms

## W = 0.96619, p-value = 2.237e-06
k.check(m_gam3_lag1)
##         k'      edf   k-index p-value
-## s(Week)  8 3.235634 0.9700211    0.26
+## s(Week) 8 3.235634 0.9700211 0.2925
draw(m_gam3_lag1, select = 1, residuals = TRUE, rug = FALSE)
-
## Warning in model.matrix.default(Terms[[i]], mf, contrasts = oc): partial
-## argument match of 'contrasts' to 'contrasts.arg'

plot(m_gam3_lag1, pages = 1, all.terms = TRUE)
-
## Warning in seq.default(min(raw), max(raw), length = n): partial argument match
-## of 'length' to 'length.out'
-

+

acf(residuals(m_gam3_lag1))

Box.test(residuals(m_gam3_lag1), lag = 20, type = 'Ljung-Box')
@@ -2251,39 +2223,14 @@

Model with first and second order lag terms

## ## data: residuals(m_gam3_lag1) ## X-squared = 27.952, df = 20, p-value = 0.1105 -
#lag2
+
# lag2
 m_gam3_lag2 <- gam(
-  Chla_log ~ lag1 + lag2 + Year_fct * Flow * StationCode + s(Week, bs="cc"), 
+  Chla_log ~ lag1 + lag2 + Year_fct * Flow * StationCode + s(Week, bs = "cc"), 
   data = df_chla_c2_lag,
-  method = "REML", knots=list(week=c(0,52))
-)
-
## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
+  method = "REML", knots = list(week = c(0, 52))
+)
 
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
summary(m_gam3_lag2)
+summary(m_gam3_lag2)
## 
 ## Family: gaussian 
 ## Link function: identity 
@@ -2357,15 +2304,11 @@ 

Model with first and second order lag terms

## W = 0.96218, p-value = 1.372e-06
k.check(m_gam3_lag2)
##         k'      edf   k-index p-value
-## s(Week)  8 3.233359 0.9488561  0.1975
+## s(Week) 8 3.233359 0.9488561 0.1575
draw(m_gam3_lag2, select = 1, residuals = TRUE, rug = FALSE)
-
## Warning in model.matrix.default(Terms[[i]], mf, contrasts = oc): partial
-## argument match of 'contrasts' to 'contrasts.arg'

plot(m_gam3_lag2, pages = 1, all.terms = TRUE)
-
## Warning in seq.default(min(raw), max(raw), length = n): partial argument match
-## of 'length' to 'length.out'
-

+

acf(residuals(m_gam3_lag2))

Box.test(residuals(m_gam3_lag2), lag = 20, type = 'Ljung-Box')
@@ -2374,7 +2317,7 @@

Model with first and second order lag terms

## ## data: residuals(m_gam3_lag2) ## X-squared = 15.746, df = 20, p-value = 0.7322 -
#compare fit of lag1 and lag2 models
+
# compare fit of lag1 and lag2 models
 AIC(m_gam3_lag1, m_gam3_lag2)
##                   df      AIC
 ## m_gam3_lag1 45.92801 160.0664
@@ -2384,6 +2327,7 @@ 

Model with first and second order lag terms

Let’s take a closer look at how the back-transformed fitted values from the model match the observed values.

df_m_gam3_lag2_fit <- df_chla_c2_lag %>% 
+  drop_na(lag1, lag2) %>% 
   fitted_values(m_gam3_lag2, data = .) %>% 
   mutate(fitted_bt = exp(fitted) / 1000)
 
@@ -2407,43 +2351,29 @@ 

Model with first and second order lag terms

scales = "free", labeller = labeller(.multi_line = FALSE) )
-
## Warning: Removed 40 rows containing missing values (`geom_point()`).
-

+

Everything looks pretty decent from lag 2 model. Not perfect, but pretty good given the number of data points. Let’s compare to a two-way interaction model.

-
# Fit two-way interaction model with k = 9 and three successive lagged models
-#lag1
+
+ +
+

GAM Model with 2-way interactions

+
# Fit two-way interaction model and two successive lagged models
+# no lag terms
 m_gam2 <- gam(
-  Chla_log ~ Year_fct * Flow + StationCode * Flow + Year_fct * StationCode + s(Week, bs="cc"), 
-  data = df_chla_c2_lag,
-  method = "REML", knots=list(week=c(0,52))
-)
-
## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
+  Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = "cc"), 
+  data = df_chla_c2,
+  method = "REML", knots = list(week = c(0, 52))
+)
 
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
summary(m_gam2)
+summary(m_gam2)
## 
 ## Family: gaussian 
 ## Link function: identity 
 ## 
 ## Formula:
-## Chla_log ~ Year_fct * Flow + StationCode * Flow + Year_fct * 
-##     StationCode + s(Week, bs = "cc")
+## Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = "cc")
 ## 
 ## Parametric coefficients:
 ##                                Estimate Std. Error t value Pr(>|t|)    
@@ -2460,9 +2390,6 @@ 

Model with first and second order lag terms

## Year_fct2017:Flow -1.378e-04 1.300e-04 -1.060 0.28993 ## Year_fct2018:Flow -1.094e-04 1.302e-04 -0.840 0.40144 ## Year_fct2019:Flow -7.125e-05 1.304e-04 -0.546 0.58535 -## Flow:StationCodeSTTD 3.624e-03 3.217e-04 11.265 < 2e-16 *** -## Flow:StationCodeLIB 1.469e-03 2.898e-04 5.069 7.25e-07 *** -## Flow:StationCodeRVB 1.614e-03 2.390e-04 6.755 8.17e-11 *** ## Year_fct2016:StationCodeSTTD 8.511e-01 1.779e-01 4.783 2.79e-06 *** ## Year_fct2017:StationCodeSTTD 3.374e-01 1.860e-01 1.814 0.07081 . ## Year_fct2018:StationCodeSTTD -3.322e-01 1.729e-01 -1.921 0.05575 . @@ -2475,6 +2402,9 @@

Model with first and second order lag terms

## Year_fct2017:StationCodeRVB 8.853e-01 6.667e-01 1.328 0.18527 ## Year_fct2018:StationCodeRVB 4.471e-01 6.051e-01 0.739 0.46057 ## Year_fct2019:StationCodeRVB -4.912e-01 6.191e-01 -0.793 0.42821 +## Flow:StationCodeSTTD 3.624e-03 3.217e-04 11.265 < 2e-16 *** +## Flow:StationCodeLIB 1.469e-03 2.898e-04 5.069 7.25e-07 *** +## Flow:StationCodeRVB 1.614e-03 2.390e-04 6.755 8.17e-11 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## @@ -2496,15 +2426,11 @@

Model with first and second order lag terms

## W = 0.9836, p-value = 0.00119
k.check(m_gam2)
##         k'      edf   k-index p-value
-## s(Week)  8 4.347465 0.9768648  0.3175
+## s(Week) 8 4.347465 0.9768648 0.32
draw(m_gam2, select = 1, residuals = TRUE, rug = FALSE)
-
## Warning in model.matrix.default(Terms[[i]], mf, contrasts = oc): partial
-## argument match of 'contrasts' to 'contrasts.arg'

plot(m_gam2, pages = 1, all.terms = TRUE)
-
## Warning in seq.default(min(raw), max(raw), length = n): partial argument match
-## of 'length' to 'length.out'
-

+

acf(residuals(m_gam2))

Box.test(residuals(m_gam2), lag = 20, type = 'Ljung-Box')
@@ -2513,45 +2439,25 @@

Model with first and second order lag terms

## ## data: residuals(m_gam2) ## X-squared = 144.8, df = 20, p-value < 2.2e-16 -
#lag1
+
# lag1
 m_gam2_lag1 <- gam(
-  Chla_log ~ lag1 + Year_fct * Flow + StationCode * Flow + Year_fct * StationCode + s(Week, bs="cc"), 
+  Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + s(Week, bs = "cc"), 
   data = df_chla_c2_lag,
-  method = "REML", knots=list(week=c(0,52))
-)
-
## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
+  method = "REML", knots = list(week = c(0, 52))
+)
 
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
summary(m_gam2_lag1)
+summary(m_gam2_lag1)
## 
 ## Family: gaussian 
 ## Link function: identity 
 ## 
 ## Formula:
-## Chla_log ~ lag1 + Year_fct * Flow + StationCode * Flow + Year_fct * 
-##     StationCode + s(Week, bs = "cc")
+## Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + s(Week, 
+##     bs = "cc")
 ## 
 ## Parametric coefficients:
 ##                                Estimate Std. Error t value Pr(>|t|)    
 ## (Intercept)                   4.882e+00  4.770e-01  10.234  < 2e-16 ***
-## lag1                          4.984e-01  4.850e-02  10.278  < 2e-16 ***
 ## Year_fct2016                 -3.418e-01  1.194e-01  -2.864 0.004528 ** 
 ## Year_fct2017                 -2.333e-01  1.129e-01  -2.067 0.039724 *  
 ## Year_fct2018                 -1.358e-01  1.084e-01  -1.252 0.211530    
@@ -2560,13 +2466,11 @@ 

Model with first and second order lag terms

## StationCodeSTTD -5.177e-01 1.219e-01 -4.246 3.02e-05 *** ## StationCodeLIB -9.106e-01 2.939e-01 -3.098 0.002159 ** ## StationCodeRVB -1.401e+00 5.146e-01 -2.722 0.006917 ** +## lag1 4.984e-01 4.850e-02 10.278 < 2e-16 *** ## Year_fct2016:Flow -1.304e-04 1.369e-04 -0.952 0.341996 ## Year_fct2017:Flow -8.557e-05 1.251e-04 -0.684 0.494508 ## Year_fct2018:Flow -8.607e-05 1.253e-04 -0.687 0.492624 ## Year_fct2019:Flow -5.161e-05 1.253e-04 -0.412 0.680675 -## Flow:StationCodeSTTD 2.174e-03 3.132e-04 6.942 3.04e-11 *** -## Flow:StationCodeLIB 1.026e-03 2.585e-04 3.971 9.26e-05 *** -## Flow:StationCodeRVB 1.022e-03 2.151e-04 4.753 3.32e-06 *** ## Year_fct2016:StationCodeSTTD 4.121e-01 1.637e-01 2.518 0.012414 * ## Year_fct2017:StationCodeSTTD 1.600e-01 1.682e-01 0.951 0.342282 ## Year_fct2018:StationCodeSTTD -1.870e-01 1.547e-01 -1.208 0.227952 @@ -2579,6 +2483,9 @@

Model with first and second order lag terms

## Year_fct2017:StationCodeRVB 4.735e-01 6.165e-01 0.768 0.443217 ## Year_fct2018:StationCodeRVB 3.698e-01 5.621e-01 0.658 0.511178 ## Year_fct2019:StationCodeRVB -2.789e-01 5.724e-01 -0.487 0.626467 +## Flow:StationCodeSTTD 2.174e-03 3.132e-04 6.942 3.04e-11 *** +## Flow:StationCodeLIB 1.026e-03 2.585e-04 3.971 9.26e-05 *** +## Flow:StationCodeRVB 1.022e-03 2.151e-04 4.753 3.32e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## @@ -2600,15 +2507,11 @@

Model with first and second order lag terms

## W = 0.97011, p-value = 8.55e-06
k.check(m_gam2_lag1)
##         k'      edf   k-index p-value
-## s(Week)  8 3.081106 0.9564703   0.265
+## s(Week) 8 3.081106 0.9564703 0.2075
draw(m_gam2_lag1, select = 1, residuals = TRUE, rug = FALSE)
-
## Warning in model.matrix.default(Terms[[i]], mf, contrasts = oc): partial
-## argument match of 'contrasts' to 'contrasts.arg'

plot(m_gam2_lag1, pages = 1, all.terms = TRUE)
-
## Warning in seq.default(min(raw), max(raw), length = n): partial argument match
-## of 'length' to 'length.out'
-

+

acf(residuals(m_gam2_lag1))

Box.test(residuals(m_gam2_lag1), lag = 20, type = 'Ljung-Box')
@@ -2617,49 +2520,25 @@

Model with first and second order lag terms

## ## data: residuals(m_gam2_lag1) ## X-squared = 30.865, df = 20, p-value = 0.057 -
#lag2
+
# lag2
 m_gam2_lag2 <- gam(
-  Chla_log ~ lag1 + lag2 + Year_fct * Flow + StationCode * Flow + Year_fct * StationCode + s(Week, bs="cc"), 
+  Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + lag2 + s(Week, bs = "cc"), 
   data = df_chla_c2_lag,
-  method = "REML", knots=list(week=c(0,52))
-)
-
## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
+  method = "REML", knots = list(week = c(0, 52))
+)
 
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
-## Warning in attr(pterms[tind[j]], "term.label"): partial match of 'term.label'
-## to 'term.labels'
-
summary(m_gam2_lag2)
+summary(m_gam2_lag2)
## 
 ## Family: gaussian 
 ## Link function: identity 
 ## 
 ## Formula:
-## Chla_log ~ lag1 + lag2 + Year_fct * Flow + StationCode * Flow + 
-##     Year_fct * StationCode + s(Week, bs = "cc")
+## Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + lag2 + 
+##     s(Week, bs = "cc")
 ## 
 ## Parametric coefficients:
 ##                                Estimate Std. Error t value Pr(>|t|)    
 ## (Intercept)                   5.322e+00  5.266e-01  10.105  < 2e-16 ***
-## lag1                          5.899e-01  6.319e-02   9.335  < 2e-16 ***
-## lag2                         -1.397e-01  5.927e-02  -2.357 0.019233 *  
 ## Year_fct2016                 -3.471e-01  1.257e-01  -2.762 0.006181 ** 
 ## Year_fct2017                 -2.386e-01  1.179e-01  -2.024 0.044031 *  
 ## Year_fct2018                 -1.789e-01  1.129e-01  -1.585 0.114280    
@@ -2668,13 +2547,12 @@ 

Model with first and second order lag terms

## StationCodeSTTD -5.372e-01 1.265e-01 -4.245 3.12e-05 *** ## StationCodeLIB -1.087e+00 3.070e-01 -3.541 0.000478 *** ## StationCodeRVB -1.434e+00 5.449e-01 -2.632 0.009034 ** +## lag1 5.899e-01 6.319e-02 9.335 < 2e-16 *** +## lag2 -1.397e-01 5.927e-02 -2.357 0.019233 * ## Year_fct2016:Flow -1.353e-04 1.452e-04 -0.932 0.352416 ## Year_fct2017:Flow -6.392e-05 1.315e-04 -0.486 0.627461 ## Year_fct2018:Flow -6.242e-05 1.311e-04 -0.476 0.634499 ## Year_fct2019:Flow -3.053e-05 1.311e-04 -0.233 0.816139 -## Flow:StationCodeSTTD 2.004e-03 3.208e-04 6.247 1.88e-09 *** -## Flow:StationCodeLIB 8.535e-04 2.722e-04 3.135 0.001931 ** -## Flow:StationCodeRVB 9.082e-04 2.219e-04 4.093 5.80e-05 *** ## Year_fct2016:StationCodeSTTD 4.949e-01 1.703e-01 2.906 0.004005 ** ## Year_fct2017:StationCodeSTTD 1.930e-01 1.765e-01 1.093 0.275390 ## Year_fct2018:StationCodeSTTD -1.501e-01 1.610e-01 -0.932 0.352364 @@ -2687,6 +2565,9 @@

Model with first and second order lag terms

## Year_fct2017:StationCodeRVB 3.048e-01 6.591e-01 0.462 0.644153 ## Year_fct2018:StationCodeRVB 2.620e-01 5.862e-01 0.447 0.655360 ## Year_fct2019:StationCodeRVB -3.909e-01 5.963e-01 -0.656 0.512728 +## Flow:StationCodeSTTD 2.004e-03 3.208e-04 6.247 1.88e-09 *** +## Flow:StationCodeLIB 8.535e-04 2.722e-04 3.135 0.001931 ** +## Flow:StationCodeRVB 9.082e-04 2.219e-04 4.093 5.80e-05 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## @@ -2708,15 +2589,11 @@

Model with first and second order lag terms

## W = 0.96668, p-value = 5.596e-06
k.check(m_gam2_lag2)
##         k'     edf   k-index p-value
-## s(Week)  8 3.11316 0.9479573   0.195
+## s(Week) 8 3.11316 0.9479573 0.18
draw(m_gam2_lag2, select = 1, residuals = TRUE, rug = FALSE)
-
## Warning in model.matrix.default(Terms[[i]], mf, contrasts = oc): partial
-## argument match of 'contrasts' to 'contrasts.arg'

plot(m_gam2_lag2, pages = 1, all.terms = TRUE)
-
## Warning in seq.default(min(raw), max(raw), length = n): partial argument match
-## of 'length' to 'length.out'
-

+

acf(residuals(m_gam2_lag2))

Box.test(residuals(m_gam2_lag2), lag = 20, type = 'Ljung-Box')
@@ -2725,7 +2602,7 @@

Model with first and second order lag terms

## ## data: residuals(m_gam2_lag2) ## X-squared = 19.125, df = 20, p-value = 0.5137 -
#compare fit of lag1 and lag2 models from two and three-way interaction models
+
# compare fit of lag1 and lag2 models from two and three-way interaction models
 AIC(m_gam2_lag1, m_gam2_lag2, m_gam3_lag1, m_gam3_lag2)
##                   df      AIC
 ## m_gam2_lag1 33.59917 154.4213
@@ -2737,6 +2614,7 @@ 

Model with first and second order lag terms

Let’s take a closer look at how the back-transformed fitted values from the model match the observed values.

df_m_gam2_lag2_fit <- df_chla_c2_lag %>% 
+  drop_na(lag1, lag2) %>% 
   fitted_values(m_gam2_lag2, data = .) %>% 
   mutate(fitted_bt = exp(fitted) / 1000)
 
@@ -2760,13 +2638,11 @@ 

Model with first and second order lag terms

scales = "free", labeller = labeller(.multi_line = FALSE) )
-
## Warning: Removed 40 rows containing missing values (`geom_point()`).
-

+

Everything looks pretty decent from lag 2 two-way interaction model. Not perfect, but pretty good given the number of data points. Let’s compare to a two-way interaction model with a smooth term for flow.

- @@ -2822,7 +2698,7 @@

Model with first and second order lag terms

// establish options var options = { - selectors: "h1,h2,h3,h4", + selectors: "h1,h2,h3", theme: "bootstrap3", context: '.toc-content', hashGenerator: function (text) { diff --git a/manuscript_synthesis/notebooks/rtm_chl_models_flow_weekly_avg.Rmd b/manuscript_synthesis/notebooks/rtm_chl_models_flow_weekly_avg.Rmd index 9b46046..8939c63 100644 --- a/manuscript_synthesis/notebooks/rtm_chl_models_flow_weekly_avg.Rmd +++ b/manuscript_synthesis/notebooks/rtm_chl_models_flow_weekly_avg.Rmd @@ -1,13 +1,12 @@ --- title: "NDFS Synthesis Manuscript: Chlorophyll analysis" -subtitle: "Models using daily average flow as continuous predictor" +subtitle: "Models using weekly average flow as continuous predictor" author: "Dave Bosworth" date: '`r format(Sys.Date(), "%B %d, %Y")`' output: html_document: code_folding: show toc: true - toc_depth: 4 toc_float: collapsed: false editor_options: @@ -28,7 +27,7 @@ options(knitr.kable.NA = "") # Purpose -Explore and analyze the continuous chlorophyll data to be included in the NDFS synthesis manuscript. We will attempt to fit various models to the data set using daily average flow as a continuous predictor which replaces the categorical predictor - flow action period - in the original analysis. These models will only include representative stations for 4 habitat types - upstream (RD22), lower Yolo Bypass (STTD), Cache Slough complex (LIB), and downstream (RVB). +Explore and analyze the continuous chlorophyll data to be included in the NDFS synthesis manuscript. We will attempt to fit various models to the data set using weekly average flow as a continuous predictor which replaces the categorical predictor - flow action period - in the original analysis. These models will only include representative stations for 4 habitat types - upstream (RD22), lower Yolo Bypass (STTD), Cache Slough complex (LIB), and downstream (RVB). # Global code and functions @@ -89,13 +88,10 @@ df_chla_c1 <- df_chla %>% mutate( # Scale and log transform chlorophyll values Chla_log = log(Chla * 1000), - # Add Year and DOY variables - #Year = year(Date), - #DOY = yday(Date), # Apply factor order to StationCode StationCode = factor(StationCode, levels = sta_order), # Add a column for Year as a factor for the model - Year_fct = factor(Year, levels=c("2014", "2015", "2016", "2017", "2018", "2019")) + Year_fct = factor(Year) ) %>% arrange(StationCode, Year, Week) ``` @@ -118,14 +114,15 @@ Looking at the sample counts and date ranges, we'll only include years 2015-2019 ```{r chla remove under sampled} df_chla_c2 <- df_chla_c1 %>% - filter(Year %in% 2015:2019) + filter(Year %in% 2015:2019) %>% + mutate(Year_fct = fct_drop(Year_fct)) # Create another dataframe that has up to 3 lag variables for chlorophyll to be # used in the linear models df_chla_c2_lag <- df_chla_c2 %>% - # Fill in missing days for each StationCode-Year combination + # Fill in missing weeks for each StationCode-Year combination group_by(StationCode, Year) %>% - #complete(Date = seq.Date(min(Week), max(Week), by = "1 week")) %>% + # complete(Week = seq.int(min(Week), max(Week), by = 1)) %>% # Create lag variables of scaled log transformed chlorophyll values mutate( lag1 = lag(Chla_log), @@ -139,7 +136,7 @@ df_chla_c2_lag <- df_chla_c2 %>% Let's explore the data with some plots. First, lets plot the data in scatter plots of chlorophyll and flow faceted by Station and grouping all years together. -```{r chla scatterplot all yrs, message = FALSE, fig.height = 6, fig.width = 6} +```{r chla scatterplot all yrs, message = FALSE, fig.height = 6} df_chla_c2 %>% ggplot(aes(x = Flow, y = Chla)) + geom_point() + @@ -152,7 +149,7 @@ At first glance, I'm not sure how well flow is going to be able to predict chlor Let's break these scatterplots apart by year to see how these patterns vary annually. -```{r chla scatterplot facet yrs, message = FALSE, fig.height = 8, fig.width = 8} +```{r chla scatterplot facet yrs, message = FALSE, fig.height = 8, fig.width = 8.5} df_chla_c2 %>% ggplot(aes(x = Flow, y = Chla)) + geom_point() + @@ -170,7 +167,7 @@ The patterns appear to vary annually at each station, which may justify using a # GAM Model with 3-way interactions -First, we will attempt to fit a generalized additive model (GAM) to the data set to help account for seasonality in the data. We'll try running a GAM using a three-way interaction between Year, Daily Average Flow, and Station, and a smooth term for day of year to account for seasonality. Initially, we'll run the GAM without accounting for serial autocorrelation. +First, we will attempt to fit a generalized additive model (GAM) to the data set to help account for seasonality in the data. We'll try running a GAM using a three-way interaction between Year, Weekly Average Flow, and Station, and a smooth term for day of year to account for seasonality. Initially, we'll run the GAM without accounting for serial autocorrelation. ## Initial Model @@ -195,19 +192,17 @@ acf(residuals(m_gam3)) Box.test(residuals(m_gam3), lag = 20, type = 'Ljung-Box') ``` - ## Model with first and second order lag terms Now, we'll try to deal with the residual autocorrelation. -```{r gam3 with lag terms} - -# Fit original model with k = 9 and three successive lagged models -#lag1 +```{r gam3 with lag terms, warning = FALSE} +# Fit two successive lagged models +# lag1 m_gam3_lag1 <- gam( - Chla_log ~ lag1 + Year_fct * Flow * StationCode + s(Week, bs="cc"), + Chla_log ~ lag1 + Year_fct * Flow * StationCode + s(Week, bs = "cc"), data = df_chla_c2_lag, - method = "REML", knots=list(week=c(0,52)) + method = "REML", knots = list(week = c(0, 52)) ) summary(m_gam3_lag1) @@ -219,11 +214,11 @@ plot(m_gam3_lag1, pages = 1, all.terms = TRUE) acf(residuals(m_gam3_lag1)) Box.test(residuals(m_gam3_lag1), lag = 20, type = 'Ljung-Box') -#lag2 +# lag2 m_gam3_lag2 <- gam( - Chla_log ~ lag1 + lag2 + Year_fct * Flow * StationCode + s(Week, bs="cc"), + Chla_log ~ lag1 + lag2 + Year_fct * Flow * StationCode + s(Week, bs = "cc"), data = df_chla_c2_lag, - method = "REML", knots=list(week=c(0,52)) + method = "REML", knots = list(week = c(0, 52)) ) summary(m_gam3_lag2) @@ -235,9 +230,8 @@ plot(m_gam3_lag2, pages = 1, all.terms = TRUE) acf(residuals(m_gam3_lag2)) Box.test(residuals(m_gam3_lag2), lag = 20, type = 'Ljung-Box') -#compare fit of lag1 and lag2 models +# compare fit of lag1 and lag2 models AIC(m_gam3_lag1, m_gam3_lag2) - ``` It looks like lag2 model has the best fit according to the AIC values. @@ -246,6 +240,7 @@ Let's take a closer look at how the back-transformed fitted values from the mode ```{r gam3 ar1 obs vs fit, warning = FALSE} df_m_gam3_lag2_fit <- df_chla_c2_lag %>% + drop_na(lag1, lag2) %>% fitted_values(m_gam3_lag2, data = .) %>% mutate(fitted_bt = exp(fitted) / 1000) @@ -264,7 +259,7 @@ plt_m_gam3_lag2_fit Let's look at each Station-Year combination separately. -```{r gam3 ar1 obs vs fit facet sta yr, fig.height = 8, fig.width = 8} +```{r gam3 ar1 obs vs fit facet sta yr, fig.height = 8, fig.width = 8.5} plt_m_gam3_lag2_fit + facet_wrap( vars(StationCode, Year_fct), @@ -276,15 +271,15 @@ plt_m_gam3_lag2_fit + Everything looks pretty decent from lag 2 model. Not perfect, but pretty good given the number of data points. Let's compare to a two-way interaction model. +# GAM Model with 2-way interactions -```{r gam2 with and without lag terms} - -# Fit two-way interaction model with k = 9 and three successive lagged models -#lag1 +```{r gam2 with and without lag terms, warning = FALSE} +# Fit two-way interaction model and two successive lagged models +# no lag terms m_gam2 <- gam( - Chla_log ~ Year_fct * Flow + StationCode * Flow + Year_fct * StationCode + s(Week, bs="cc"), - data = df_chla_c2_lag, - method = "REML", knots=list(week=c(0,52)) + Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = "cc"), + data = df_chla_c2, + method = "REML", knots = list(week = c(0, 52)) ) summary(m_gam2) @@ -296,11 +291,11 @@ plot(m_gam2, pages = 1, all.terms = TRUE) acf(residuals(m_gam2)) Box.test(residuals(m_gam2), lag = 20, type = 'Ljung-Box') -#lag1 +# lag1 m_gam2_lag1 <- gam( - Chla_log ~ lag1 + Year_fct * Flow + StationCode * Flow + Year_fct * StationCode + s(Week, bs="cc"), + Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + s(Week, bs = "cc"), data = df_chla_c2_lag, - method = "REML", knots=list(week=c(0,52)) + method = "REML", knots = list(week = c(0, 52)) ) summary(m_gam2_lag1) @@ -312,11 +307,11 @@ plot(m_gam2_lag1, pages = 1, all.terms = TRUE) acf(residuals(m_gam2_lag1)) Box.test(residuals(m_gam2_lag1), lag = 20, type = 'Ljung-Box') -#lag2 +# lag2 m_gam2_lag2 <- gam( - Chla_log ~ lag1 + lag2 + Year_fct * Flow + StationCode * Flow + Year_fct * StationCode + s(Week, bs="cc"), + Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + lag2 + s(Week, bs = "cc"), data = df_chla_c2_lag, - method = "REML", knots=list(week=c(0,52)) + method = "REML", knots = list(week = c(0, 52)) ) summary(m_gam2_lag2) @@ -329,9 +324,8 @@ acf(residuals(m_gam2_lag2)) Box.test(residuals(m_gam2_lag2), lag = 20, type = 'Ljung-Box') -#compare fit of lag1 and lag2 models from two and three-way interaction models +# compare fit of lag1 and lag2 models from two and three-way interaction models AIC(m_gam2_lag1, m_gam2_lag2, m_gam3_lag1, m_gam3_lag2) - ``` It looks like lag2 model with a two-way interaction has the best fit according to the AIC values. @@ -340,6 +334,7 @@ Let's take a closer look at how the back-transformed fitted values from the mode ```{r gam2 lag2 obs vs fit, warning = FALSE} df_m_gam2_lag2_fit <- df_chla_c2_lag %>% + drop_na(lag1, lag2) %>% fitted_values(m_gam2_lag2, data = .) %>% mutate(fitted_bt = exp(fitted) / 1000) @@ -358,7 +353,7 @@ plt_m_gam2_lag2_fit Let's look at each Station-Year combination separately. -```{r gam2 lag2 obs vs fit facet sta yr, fig.height = 8, fig.width = 8} +```{r gam2 lag2 obs vs fit facet sta yr, fig.height = 8, fig.width = 8.5} plt_m_gam2_lag2_fit + facet_wrap( vars(StationCode, Year_fct), @@ -368,5 +363,4 @@ plt_m_gam2_lag2_fit + ) ``` - Everything looks pretty decent from lag 2 two-way interaction model. Not perfect, but pretty good given the number of data points. Let's compare to a two-way interaction model with a smooth term for flow.