#######################################################################
#######################################################################
## This script was created by Dr. Jen Cruz as part of ##
## the Applied Population Ecology Class ##
## ##
## Here we evaluate model fit and plot model output. ##
# After we conduct an analysis, we get results for a single model. #
# If we ran model selection, we get a top model. But is it any good? #
# A model may over- or under-fit the data and thus be a poor #
# representation of the system we are trying to understand. #
# Too often ecologists stop when they finish running #
# their analyses. This is dangerous! #
# Part of the issue is that evaluating models is hard. This is #
# particularly so for binomial models, as traditional approaches #
# that are suitable for Gaussian distributions do not work #
# with binomial responses. #
# There is therefore no silver bullet. We recommend exploring multiple #
# metrics as a way of confirming that your model is fit for purpose. #
## ##
#######################################################################
##### Set up your workspace and load relevant packages -----------
# Clean your workspace to reset your R environment. #
rm( list = ls() )
# Check that you are in the right project folder
getwd()
# Install required packages if you haven't already (only needed once):
install.packages( "AICcmodavg" ) #for assessing goodness of fit
# load packages:
library( tidyverse )#includes dplyr, tidyr and ggplot2
library( unmarked ) #for occupancy models
library( AICcmodavg )
## end of package load ###############
###################################################################
#### Load or create data -----------------------------------------
# set directory where your data are:
datadir <- paste0( getwd(), "/Data/" )
# load our analysis workspace containing all the objects we created #
load( "OccAnalysisWorkspace.RData" )
###########end of loading ######
##########################################################################
# Model fit and evaluation -----------------------------------------------
# We assess goodness of fit (GoF) on detection frequencies using the #
# Pearson chi-square test suggested by MacKenzie and Bailey (2004) #
# J. Agric. Biol. Environ. Stat. 9: 300-318, #
# as implemented in the AICcmodavg package:
gof.boot <- AICcmodavg::mb.gof.test( fm.closed, nsim = 1000 )
#view
gof.boot
# What does the table tell us about the comparison between observed and expected #
# frequency of ground squirrel site-level detection histories?
# Answer:
#
# What is the c-hat value?
# Answer:
#
# Note that values of c-hat > 1 indicate overdispersion (variance > mean), #
# and values much higher than 1 (i.e., > 4) probably indicate lack-of-fit. #
# In cases of moderate overdispersion, one usually multiplies the #
# variance-covariance matrix of the estimates by c-hat, which inflates the #
# SEs of the estimates by sqrt(c-hat) (c-hat is also known as a variance #
# inflation factor). #
# In cases of underdispersion (c-hat < 1), keep the value of c-hat at 1. #
# Note that values of c-hat << 1 can also indicate lack-of-fit. #
# Is our model over- or under-dispersed?
# Answer:
#
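# A minimal sketch of adjusting for moderate overdispersion, assuming the #
# bootstrap above returned c-hat > 1. AICcmodavg::summaryOD() rescales the #
# variance-covariance matrix by c-hat, widening the confidence intervals. #
# Pulling the estimate from gof.boot$c.hat.est follows the mb.gof.test() #
# return value; check str( gof.boot ) if your version differs. #
# c.hat.est <- gof.boot$c.hat.est
# summaryOD( fm.closed, c.hat = c.hat.est )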
# We can also evaluate how well our full model did against the null model #
# by estimating a pseudo-R^2, based on Nagelkerke, N.J.D. (1991) A Note #
# on a General Definition of the Coefficient of Determination. #
# Biometrika 78: 691-692. #
# (1 - R^2) represents the proportion of unexplained variation in the model. #
# We create a reduced model list with only our two models of interest:
rms <- fitList( 'psi(sagebrush + cheatgrass)p(obsv+sagebrush)' = fm.closed,
                'psi(.)p(.)' = fm.16 )
# Then we use the model selection function from unmarked, but this time we #
# define which model is our null:
unmarked::modSel(rms, nullmod = "psi(.)p(.)" )
# What does this tell us about the fit of our model?
# Answer:
#
# Note that there are multiple approaches for estimating pseudo-R-squares. See:
# https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/
# We chose the one available in unmarked but, like any single metric, it #
# should be taken with a grain of salt. #
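# As a cross-check, a minimal sketch of Nagelkerke's R^2 computed by hand, #
# assuming unmarkedFit objects store the negative log-likelihood in the #
# @negLogLike slot and that n is the number of sites in closeddf: #
n <- nrow( closeddf )
LL.full <- -fm.closed@negLogLike #log-likelihood of the full model
LL.null <- -fm.16@negLogLike #log-likelihood of the null model
r2.cs <- 1 - exp( ( 2 / n ) * ( LL.null - LL.full ) ) #Cox & Snell R^2
r2.max <- 1 - exp( ( 2 / n ) * LL.null ) #maximum attainable value
r2.cs / r2.max #Nagelkerke's R^2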
# We could also compare all the models we ran against our null:
modSel(fms, nullmod = "psi(.)p(.)" )
######## end of model evaluation ##############
##### Producing model output ##############
# Now that we have evaluated the fit of our model, we can produce #
# some output. If our aim is inference on what drives occupancy and #
# detection of Piute ground squirrels at the NCA, we want to #
# plot the relationships between our model predictors and responses. #
# Using our global model, we create partial prediction plots #
# for our predictors. These plot the relationship between the #
# response and the predictor of interest, while keeping the remaining #
# predictors at their mean values. #
# What is the mean of our predictors?
# Answer:
#
# To create nice plots we need to create new vectors with evenly #
# spaced predictors within their actual observed range: #
# how many values do we use:
n <- 100
# we use the observed values to define our range:
sagebrush <- seq( min( closeddf[,"sagebrush"] ), max( closeddf[,"sagebrush"] ),
                  length.out = n )
cheatgrass <- seq( min( closeddf[,"cheatgrass"] ), max( closeddf[,"cheatgrass"] ),
                   length.out = n )
# standardize them using the mean and SD of the observed data so the scaling #
# matches the predictors used to fit the model (scale() alone would use the #
# mean and SD of the new evenly spaced sequence): #
sage.std <- scale( sagebrush, center = mean( closeddf[,"sagebrush"] ),
                   scale = sd( closeddf[,"sagebrush"] ) )
cheat.std <- scale( cheatgrass, center = mean( closeddf[,"cheatgrass"] ),
                    scale = sd( closeddf[,"cheatgrass"] ) )
#combine the standardized predictor into a new dataframe to predict the partial
# relationship with sagebrush. We set the other predictor to its mean (0, #
# since it was standardized): #
sageData <- data.frame( sagebrush = sage.std, cheatgrass = 0 )
#predict partial relationship between sagebrush and occupancy
pred.occ.sage <- predict( fm.closed, type = "state", newdata = sageData,
                          appendData = TRUE )
#view
head( pred.occ.sage ); dim( pred.occ.sage )
#combine the standardized predictor into a new dataframe to predict the partial
# relationship with cheatgrass. We set the other predictor to its mean (0):
cheatData <- data.frame( sagebrush = 0, cheatgrass = cheat.std )
#predict partial relationship between cheatgrass and occupancy
pred.occ.cheat <- predict( fm.closed, type = "state", newdata = cheatData,
                           appendData = TRUE )
#combine the standardized predictor into a new dataframe to predict the partial
# relationship with sagebrush. We set the observer effect to tech.4:
sageDet <- data.frame( obsv = factor( "tech.4",
                       levels = c("tech.1", "tech.2", "tech.3", "tech.4") ),
                       sagebrush = sage.std )
#predict partial relationship between sagebrush and detection
pred.det.sage <- predict( fm.closed, type = "det", newdata = sageDet,
                          appendData = TRUE )
#now for observer effects:
obsvDet <- data.frame( obsv = factor( c("tech.1", "tech.2", "tech.3", "tech.4"),
                       levels = c("tech.1", "tech.2", "tech.3", "tech.4") ),
                       sagebrush = 0 )
#predict partial relationship between observer effects and detection
pred.det.obsv <- predict( fm.closed, type = "det", newdata = obsvDet,
                          appendData = TRUE )
# create plots
#Starting with sagebrush and occupancy:
# select the predicted values we want to plot and combine with unscaled predictor
sagep <- cbind( pred.occ.sage[ , c("Predicted", "lower", "upper")], sagebrush ) %>%
  # define x and y values
  ggplot( ., aes( x = sagebrush, y = Predicted ) ) +
  #choose preset look
  theme_bw( base_size = 15 ) +
  # add labels
  labs( x = "Sagebrush (%)", y = "Predicted occupancy" ) +
  # add band of confidence intervals
  geom_smooth( aes( ymin = lower, ymax = upper ),
               stat = "identity",
               size = 1.5, alpha = 0.5, color = "grey" ) +
  # add mean line on top
  geom_line( size = 2 )
#view
sagep
#Now cheatgrass and occupancy:
# select the predicted values we want to plot and combine with unscaled predictor
cheatp <- cbind( pred.occ.cheat[ , c("Predicted", "lower", "upper")], cheatgrass ) %>%
  # define x and y values
  ggplot( ., aes( x = cheatgrass, y = Predicted ) ) +
  #choose preset look
  theme_bw( base_size = 15 ) +
  # add labels
  labs( x = "Cheatgrass (%)", y = "Predicted occupancy" ) +
  # add band of confidence intervals
  geom_smooth( aes( ymin = lower, ymax = upper ),
               stat = "identity",
               size = 1.5, alpha = 0.5, color = "grey" ) +
  # add mean line on top
  geom_line( size = 2 )
#view
cheatp
# Now sagebrush and detection:
# select the predicted values we want to plot and combine with unscaled predictor
sagep.det <- cbind( pred.det.sage[ , c("Predicted", "lower", "upper")], sagebrush ) %>%
  # define x and y values
  ggplot( ., aes( x = sagebrush, y = Predicted ) ) +
  #choose preset look
  theme_bw( base_size = 15 ) +
  # add labels
  labs( x = "Sagebrush (%)", y = "Predicted detection" ) +
  # add band of confidence intervals
  geom_smooth( aes( ymin = lower, ymax = upper ),
               stat = "identity",
               size = 1.5, alpha = 0.5, color = "grey" ) +
  # add mean line on top
  geom_line( size = 2 )
#view
sagep.det
# Now observer and detection:
obsvp.det <- pred.det.obsv %>%
  # define x and y values
  ggplot( ., aes( x = obsv, y = Predicted, color = obsv ) ) +
  #choose preset look
  theme_bw( base_size = 15 ) +
  #remove legend
  theme( legend.position = "none" ) +
  # add labels
  labs( x = "Observer", y = "Predicted detection" ) +
  #add mean detection for each observer
  geom_point( size = 4 ) +
  # add confidence intervals
  geom_errorbar( aes( ymin = lower, ymax = upper ),
                 size = 1.5, width = 0.3 )
#view
obsvp.det
# If you were reporting these in a manuscript, which (if any) would you #
# leave out? Why?
# Answer:
#
#### end model results section #############
###################
############################################################################
################## Save your data and workspace ###################
#save the plot objects we created
#start by opening the file where the plot will be saved
tiff( 'Data/SageXOcc.tiff',
      height = 10, width = 12, units = 'cm',
      compression = "lzw", res = 400 )
#call the plot
sagep
#end connection
dev.off()
#Now the cheatgrass x occupancy plot:
tiff( 'Data/CheatXOcc.tiff',
      height = 10, width = 12, units = 'cm',
      compression = "lzw", res = 400 )
#call the plot
cheatp
#end connection
dev.off()
#save the sagebrush x detection plot
tiff( 'Data/SageXDet.tiff',
      height = 10, width = 12, units = 'cm',
      compression = "lzw", res = 400 )
#call the plot
sagep.det
#end connection
dev.off()
#save the observer x detection plot
tiff( 'Data/ObsvXDet.tiff',
      height = 10, width = 10, units = 'cm',
      compression = "lzw", res = 400 )
#call the plot
obsvp.det
#end connection
dev.off()
# We save the workspace with our new objects in case we want to extract or #
# modify them later: #
save.image( "OccResultsWorkspace.RData" )
########## End of saving section ##################################
############# END OF SCRIPT #####################################