\documentclass{article}
\usepackage[top=1in,bottom=1in,left=1in,right=1in]{geometry}
\usepackage[colorlinks=true,allcolors=Blue]{hyperref}
\usepackage{cleveref}
\usepackage[usenames,dvipsnames]{xcolor}
\usepackage{pdflscape}
% knitr options
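<<opts, echo = F, message = F, warning = F>>=
# packages used by the chunks below; project helpers such as ts_create,
# wtreg_fun and facet_wrap_labeller are assumed to be defined elsewhere
library(reshape2) # melt
library(ggplot2)
library(gridExtra) # grid.arrange
library(doParallel) # also attaches foreach
# knitr options
opts_chunk$set(warning = F, message = F,
tidy.opts = list(width.cutoff = 65), fig.align = 'center',
dev = 'pdf', dev.args = list(family = 'serif'),
fig.pos = '!h')
@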
\begin{document}
\setlength{\parskip}{5mm}
\setlength{\parindent}{0in}
\title{Simulation and removal of advection effects on DO measurements}
\author{Marcus W. Beck}
\maketitle
\section{Overview}
Observed DO in estuaries can be described as the summation of DO from biological processes, air-sea gas diffusion, water transport by tidal advection, and error or noise.
\begin{equation} \label{eqn:one}
DO_{obs} = DO_{bio} + DO_{dif} + DO_{adv}
\end{equation}
Reliable estimates of ecosystem metabolism depend on measures of DO flux that are dominated by biological processes. Long-term time series of DO measurements may include variation related to both biological and physical processes, such that observed data alone may be insufficient in many cases. Statistical modelling techniques that quantify variation in DO over time and with tidal changes have the potential to isolate biological signals in DO variation and thereby estimate metabolism more accurately. We used a simulation approach to create an observed DO time series as the summation of diel biological variation, tidal advection, and noise. The effects of air-sea gas diffusion were not considered in the simulation because methods for quantifying this contribution are available and it is not of concern for the analysis. A weighted regression approach was used to predict the simulated time series and then remove variation related to tidal changes. The following describes the general approach and results of the analysis.
First, a biological DO time series was created using a cosine function:
\begin{equation} \label{eqn:bio}
DO_{bio} = \alpha + \beta\cos\left(2\pi ft + \Phi\right)
\end{equation}
where the mean DO $\alpha$ was 8 mg L$^{-1}$, the amplitude $\beta$ was 1 mg L$^{-1}$, the frequency $f$ was 1/48 so that the signal repeats every 24 hours at 30-minute observations (\cref{fig:do_sim}), $t$ was the time series vector, and $\Phi$ was the x-axis origin set for sunrise at 6:30 am. The signal was increasing during hypothetical daylight and decreasing during the night for each 24 hour period. The signal ranged from 7 to 9 mg L$^{-1}$.
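As a minimal sketch of \cref{eqn:bio}, the chunk below builds the biological signal for three days of 30-minute observations. The names \texttt{t\_rise} and \texttt{Phi} are illustrative, and the phase is an assumption: it is chosen so that the minimum falls at the 6:30 am sunrise, which makes the signal rise over the following 12 hours. The simulation function used for the figures may define the phase differently.

<<ex_do_bio, eval = F, echo = T>>=
# three days of 30-minute observations; t counts steps from midnight
t <- 0:(48 * 3 - 1)
alpha <- 8     # mean DO, mg/L
beta <- 1      # amplitude, mg/L
f <- 1 / 48    # one cycle per 48 half-hour steps (24 hours)
t_rise <- 13   # 6:30 am is 13 half-hour steps after midnight
# assumption: phase places the daily minimum at sunrise
Phi <- pi - 2 * pi * f * t_rise
DO_bio <- alpha + beta * cos(2 * pi * f * t + Phi)
range(DO_bio)
@

Under this phase choice the series spans 7 to 9 mg L$^{-1}$, matching the range stated above.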
Noise was added to the biological DO signal to simulate natural variation in DO throughout the time series (\cref{fig:do_sim}). Total uncertainty was the sum of process and observation uncertainty simulated as random variables from the normal distribution, such that:
\begin{equation}
\epsilon _{tot} = \epsilon _{obs} + \int_0^n \epsilon _{pro}
\end{equation}
where $\epsilon$ for observation and process uncertainty was simulated as a normally distributed random variable with mean zero and standard deviation varying from zero to an upper limit, described below. The noise for process uncertainty was estimated as a cumulative sum for time $t$ in 0 to $n$ observations such that the noise at time $t+1$ was equal to the noise at time $t$ plus additional variation drawn from the normal distribution. This approach created a noise vector that was auto-correlated throughout the time series. The noise vector for process uncertainty was rescaled to constrain the variation within the bounds for standard deviation defined by the random variable. The total error was added to the biological DO time series and was assumed to represent variation in biological processes as DO time series are inherently variable.
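The noise construction described above can be sketched as follows. The observation error is white noise and the process error is a cumulative sum of normal draws. The rescaling step is an assumption: the text does not give its exact form, so the random walk is mapped linearly onto $-\sigma_{pro}$ to $\sigma_{pro}$ here purely for illustration. The names \texttt{sd\_obs} and \texttt{sd\_pro} are hypothetical.

<<ex_noise, eval = F, echo = T>>=
set.seed(123)
n <- 48 * 30   # 30 days of 30-minute observations
sd_obs <- 1    # hypothetical observation error sd
sd_pro <- 1    # hypothetical process error sd
# observation error: independent normal draws
e_obs <- rnorm(n, mean = 0, sd = sd_obs)
# process error: cumulative sum, so noise at t + 1 builds on noise at t
e_pro <- cumsum(rnorm(n, mean = 0, sd = sd_pro))
# assumption: linear rescaling of the random walk to [-sd_pro, sd_pro]
rng <- range(e_pro)
e_pro <- 2 * sd_pro * (e_pro - rng[1]) / diff(rng) - sd_pro
e_tot <- e_obs + e_pro
# lag-1 autocorrelation of the process error
acf(e_pro, lag.max = 1, plot = FALSE)$acf[2]
@

A lag-1 autocorrelation close to one is expected for the process error because each value differs from the previous one by a single small increment.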
Second, a tidal time series was simulated by adding sine waves with relevant solar and lunar periods (\cref{fig:do_sim}). Each sine wave was created using \cref{eqn:bio} varying $f$ for each period, e.g., 1/25 for a 12.5 hour principal lunar semi-diurnal wave. The amplitude of each tidal component was set constant to one meter. The combined tidal series was the additive time series of all sine waves, scaled to 1 meter and centered at 4 meters to approximate a shallow water station.
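A minimal sketch of the additive tidal series is given below. The choice of two constituents is an assumption made for illustration: the 12.5 hour lunar wave from the text ($f = 1/25$) and a 12 hour solar wave ($f = 1/24$). The rescaling is likewise one reading of the description, taking the combined series to span a total range of 1 meter centered at 4 meters.

<<ex_tide, eval = F, echo = T>>=
t <- 0:(48 * 30 - 1)   # 30 days of 30-minute observations
# assumed constituents, frequency in cycles per 30-minute step
freqs <- c(lunar = 1 / 25, solar = 1 / 24)
# one unit-amplitude wave per constituent, as columns
waves <- sapply(freqs, function(f) cos(2 * pi * f * t))
H_raw <- rowSums(waves)
# assumption: rescale to a 1 m range centered at 4 m
H <- (H_raw - min(H_raw)) / diff(range(H_raw)) - 0.5 + 4
range(H)
@

Adding waves with slightly different periods produces the slow modulation in tidal range that is visible in \cref{fig:do_sim}.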
The tidal time series was added to the biological DO series to simulate DO changes with advection (\cref{fig:do_sim}). Conceptually, this vector represents the rate of change in DO as a function of tidal advection such that:
\begin{equation}
\frac{\delta DO_{adv}}{\delta t} = \frac{\delta DO}{\delta x} \cdot \frac{\delta x}{\delta t}
\end{equation}
\begin{equation}
\frac{\delta x}{\delta t} = k \cdot \frac{\delta H}{\delta t}
\end{equation}
where the first derivative of the tidal time series, as change in height over time $\delta H / \delta t$, is multiplied by a constant $k$, to simulate the rate of the horizontal tidal excursion over time, $\delta x / \delta t$, associated with tidal height changes. The horizontal excursion is assumed to be associated with a horizontal DO change, $\delta DO / \delta x$, such that the product of the two estimates the DO change at each time step from advection, $DO_{adv}$. In practice, the simulated tidal signal was used to estimate $DO_{adv}$:
\begin{equation}
DO_{adv} \propto H
\end{equation}
\begin{equation}
DO_{adv} = 2\cdot a \cdot \frac{H- \min H}{\max H - \min H} - a
\end{equation}
where $a$ is chosen as the transformation parameter to standardize the change in DO from tidal height change to the desired units. For example, $a = 1$ will convert $H$ to the scale of $\pm 1$ mg L$^{-1}$.
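One rescaling that yields the stated range of $\pm a$ is sketched below. The min-max term follows the equation above, while the multiplier and offset are assumptions chosen so that $a = 1$ maps $H$ onto $-1$ to $1$ mg L$^{-1}$. The tidal height series is an illustrative single wave, not the simulated tide used for the figures.

<<ex_do_adv, eval = F, echo = T>>=
t <- 0:(48 * 5 - 1)                    # five days of 30-minute steps
H <- 4 + 0.5 * cos(2 * pi * t / 25)    # illustrative tidal height, m
a <- 1                                 # transformation parameter, mg/L
# min-max standardize H, then stretch and shift to [-a, a]
DO_adv <- 2 * a * (H - min(H)) / (max(H) - min(H)) - a
range(DO_adv)
@

Because the standardization uses the observed minimum and maximum of $H$, the result spans exactly $-a$ to $a$ regardless of the units or mean of the tidal series.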
The final time series for simulated observed DO was the sum of biological DO and advection DO (\cref{fig:do_sim}).
\begin{equation}
DO_{obs} = DO_{bio} + DO_{adv}
\end{equation}
The weighted regression method was then applied to the observed DO time series such that observed DO was modelled as a function of time and tide using a moving window approach. The weighted regression approach estimated DO values using weights that are specific to each observation. Weights are the product of three weight vectors that consider the relation of all other observations with respect to day, hour, and tidal height. Predicted values are obtained sequentially for each observation: the remaining observations that are closer in time (in both day and hour of day) and those with similar tidal heights are given higher weights in the regression. The process is repeated for each observation in the time series. Window widths of eight days, 24 hours, and half the range of tidal height values were used. Normalized DO values were obtained by using the mean tidal height as the predictor variable throughout the time series. This detided vector represents the mean response of DO conditional on time and a constant tidal height. The residuals from the predicted estimate, i.e., the observed values minus the predicted values, were considered to represent random variation in the DO signal from biological processes and were added to the detided time series.
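The moving window procedure can be sketched on a small simulated series as below. This is not the project's \texttt{wtreg\_fun}, and several pieces are assumptions made for illustration: a tricube kernel for each weight vector, window widths treated as half-widths, a local model that is linear in time and tide, and a simple cosine tide with a fixed DO response of 2 mg L$^{-1}$ per meter.

<<ex_wtreg, eval = F, echo = T>>=
set.seed(1)
n <- 48 * 10                           # ten days of 30-minute steps
step <- 0:(n - 1)
dec_time <- step / 48                  # time in decimal days
hour <- (step %% 48) / 2               # hour of day
tide <- 4 + 0.5 * cos(2 * pi * step / 25)
DO_bio <- 8 + cos(2 * pi * step / 48 + pi) + rnorm(n, 0, 0.1)
DO_obs <- DO_bio + 2 * (tide - mean(tide))
dat <- data.frame(DO_obs, dec_time, tide)

# tricube kernel (assumption); h is treated as a half-window width
tricube <- function(d, h) ifelse(abs(d) < h, (1 - (abs(d) / h)^3)^3, 0)

# window widths: 8 days, 24 hours, half the tidal range
wins <- c(8, 24, 0.5 * diff(range(tide)))

DO_prd <- DO_nrm <- numeric(n)
for (i in seq_len(n)) {
  d_hr <- abs(hour - hour[i])
  d_hr <- pmin(d_hr, 24 - d_hr)        # circular distance in hour of day
  # weight for each observation is the product of three vectors
  dat$wts <- tricube(dec_time - dec_time[i], wins[1]) *
    tricube(d_hr, wins[2]) *
    tricube(tide - tide[i], wins[3])
  mod <- lm(DO_obs ~ dec_time + tide, data = dat, weights = wts)
  # prediction at the observed tide and at the mean tide
  DO_prd[i] <- predict(mod, newdata = dat[i, ])
  DO_nrm[i] <- predict(mod,
    newdata = data.frame(dec_time = dec_time[i], tide = mean(tide)))
}
# add residuals back to the normalized values to get the detided series
DO_dtd <- DO_nrm + (DO_obs - DO_prd)
c(obs = cor(DO_obs, DO_bio), dtd = cor(DO_dtd, DO_bio))
@

The final line compares the correlation of the biological signal with the observed and with the detided series, which mirrors the evaluation used in the sections that follow.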
The predicted and detided values were compared to the observed and biological DO signals as a basis for evaluating the weighted regression method.
\section{Systematic evaluation of detiding}
A systematic approach was used to evaluate the ability of the weighted regression method to detide the DO signal. Specifically, the weighted regression approach was evaluated using simulated data that varied in the relative amount of error in the measurement, degree of association of the tide with the DO signal, relative strength of the biological signal, and tidal type as diurnal, semidiurnal, and mixed semidiurnal (\cref{fig:sim_ex}). Three levels were evaluated for each variable: relative noise from 0 to 2 standard deviations for both process and observation uncertainty, DO change from tidal advection ($k$) from 0 to 2 mg L$^{-1}$, and amplitude of biological DO from 0 to 2 mg L$^{-1}$. This resulted in 81 combinations ($3^4$) for each of three tidal categories, or 243 total simulations. Results were evaluated based on correlations between observed and predicted DO and between normalized and biological DO (\cref{fig:cor_surf}).
\section{Complex tidal signal}
The above simulations were repeated using a complex tidal signal and a longer time series to further evaluate use of the weighted regression method. Specifically, the weighted regression method was evaluated for its ability to predict observed DO and normalize by tide given the effects of actual tidal variation. A tidal time series was estimated for six months using observed height data for the First Mallard site, San Francisco Bay. The above simulations were repeated using the predicted tidal time series (\cref{fig:do_sim_act,fig:cor_surf_act}).
\section{Conclusions}
The following conclusions are made:
\begin{itemize}
\item Results were not affected by tidal type (i.e., diurnal, etc.).
\item Increasing biological DO signal improved predicted and normalized results.
\item Predicted values were more negatively affected by increasing observation error compared to process error.
\item Normalized results were more negatively affected by increasing process error compared to observation error.
\item At low biological DO signals, increasing observation error had a larger effect on model predictions than at higher biological DO signals.
\item Increasing tidal advection effects decreased effects of increasing observation error on model predictions.
\item Increasing tidal advection had no effect on normalized results.
\item Although not systematic, weighted regression performed similarly using actual tidal data.
\end{itemize}
Overall, the method should produce accurate predictions and unbiased normalized DO estimates for most scenarios, excluding those with both high error and low biological DO amplitudes. Additionally, normalized results with high process error may exhibit some bias on long-term time series as the window widths are best suited for evaluating daily DO variation within a few days.
%%%%%%
% figures and proc data
% example of creating simulated time series
<<do_sim, eval = T, echo = F, fig.width = 8, fig.height = 7, fig.cap = 'Example of a simulated time series using the equations above. Yellow indicates daylight periods.'>>=
# create time vector
vec <- c('2014-05-01 00:00:00', '2014-05-31 00:00:00')
vec <- as.POSIXct(vec, format = '%Y-%m-%d %H:%M:%S')
vec <- seq(vec[1], vec[2], by = 60*30)
# create simulated time series of DO, tide, etc.
DO_sim <- ts_create(
vec,
do.amp = 2,
tide_cat = 'Diurnal',
tide_assoc = 4,
err_rng_obs = 2,
err_rng_pro = 2,
seeded = T
)
levs <- c('e_obs', 'e_pro', 'e_tot', 'DO_bio', 'DO_adv', 'DO_obs')
to.plo <- melt(DO_sim, id.var = c('Day', 'sunrise'),
measure.var = levs
)
to.plo$variable <- factor(to.plo$variable, levels = levs)
p <- ggplot(to.plo, aes(x = Day, y = value, col = sunrise)) +
geom_line() +
facet_wrap(~ variable, scales = 'free_y', ncol = 1) +
theme_bw() +
scale_colour_gradientn(colours = c('orange', 'black')) +
theme(legend.position = 'none')
facet_wrap_labeller(p, labels = c(
expression(italic(epsilon [obs])),
expression(italic(epsilon [ pro])),
expression(italic(epsilon [ obs] + epsilon [ pro])),
expression(italic(DO [Bio])),
expression(italic(DO [adv])),
expression(italic(paste(DO [obs], '=', DO [bio] + DO [adv])))
))
@
\clearpage
% run simulations using eval_grd
<<run_sims, eval = F, echo = F>>=
tide_cat <- c('Diurnal', 'Semidiurnal', 'Mixed Semidiurnal')
tide_cat <- factor(tide_cat, levels = tide_cat)
bio_rng <- round(seq(0, 2, length = 3),2)
tide_assoc <- round(seq(0, 2, length = 3), 2)
err_rng_pro <- round(seq(0, 2, length = 3), 2)
err_rng_obs <- round(seq(0, 2, length = 3), 2)
eval_grd <- expand.grid(tide_cat, bio_rng, tide_assoc, err_rng_pro,
err_rng_obs)
names(eval_grd) <- c('tide_cat', 'bio_rng', 'tide_assoc', 'err_rng_pro',
'err_rng_obs')
save(eval_grd, file = 'eval_grd.RData')
wins <- c(1, seq(5, 20, by = 5))
wins_grd <- expand.grid(wins)
names(wins_grd) <- c('dec_time')
comb_grd <- expand.grid(tide_cat, bio_rng, tide_assoc, err_rng_pro, err_rng_obs,
wins)
names(comb_grd) <- c(names(eval_grd), names(wins_grd))
save(comb_grd, file = 'comb_grd.RData')
# time vector
vec <- c('2014-05-01 00:00:00', '2014-05-31 00:00:00')
vec <- as.POSIXct(vec, format = '%Y-%m-%d %H:%M:%S')
vec <- seq(vec[1], vec[2], by = 60*30)
# setup parallel
cl <- makeCluster(8)
registerDoParallel(cl)
# iterate through evaluation grid to create sim series
strt <- Sys.time()
res <- foreach(row = 1:nrow(comb_grd)) %dopar% {
# progress
sink('log.txt')
cat('Log entry time', as.character(Sys.time()), '\n')
cat(row, ' of ', nrow(comb_grd), '\n')
print(Sys.time() - strt)
sink()
# eval grid to evaluate
to_eval <- comb_grd[row, ]
# create simulated time series of DO, tide, etc.
DO_sim <- with(to_eval,
ts_create(
vec,
do.amp = bio_rng,
tide_cat = as.character(tide_cat),
tide_assoc = tide_assoc,
err_rng_obs = err_rng_obs,
err_rng_pro = err_rng_pro
)
)
win_in <- with(to_eval, dec_time)
# get results
res_tmp <- wtreg_fun(DO_sim, win = win_in, parallel = F)
res_tmp
}
stopCluster(cl)
# save
prdnrm <- res
save(prdnrm, file = 'prdnrm.RData')
@
% plot of representative time series for simulation
% note that the plot uses data saved by the 'run_sims' chunk above
<<sim_ex, fig.height = 9.5, fig.width = 8, echo = F, eval = T, fig.cap = 'Representative examples of simulated DO time series. Black lines are observed DO and red lines are biological DO.'>>=
load('eval_grd.RData')
# find rows in eval_grd of parms to plot
sel_vec <- !with(eval_grd,
bio_rng %in% 1|
tide_assoc %in% 1|
err_rng_pro %in% 1|
err_rng_obs %in% 1
)
parms <- eval_grd[sel_vec,]
parms$L1 <- as.numeric(row.names(parms))
load('prdnrm.RData')
to_plo <- prdnrm[as.numeric(rownames(parms))]
names(to_plo) <- rownames(parms)
to_plo <- melt(to_plo, id.var = names(to_plo[[1]]))
to_plo <- merge(to_plo, parms, by = 'L1', all.x = T)
# rename extremes for facet labs
labs <- paste('Bio Amp', unique(to_plo$bio_rng))
to_plo$bio_rng <- factor(to_plo$bio_rng, labels = labs)
labs <- paste('Assoc', unique(to_plo$tide_assoc))
to_plo$tide_assoc <- factor(to_plo$tide_assoc, labels = labs)
labs <- paste('Noise pro', unique(to_plo$err_rng_pro))
to_plo$err_rng_pro <- factor(to_plo$err_rng_pro, labels = labs)
labs <- paste('Noise obs', unique(to_plo$err_rng_obs))
to_plo$err_rng_obs <- factor(to_plo$err_rng_obs, labels = labs)
# setup facet labels
facet1_names <- list(
'Diurnal' = expression(Diurnal),
'Semidiurnal' = expression(Semidi.),
'Mixed Semidiurnal' = expression(Mixed)
)
facet2_names <- list(
'Assoc 0' = expression(paste(italic(DO [adv]), ' 0')),
'Assoc 2' = expression(paste(italic(DO [adv]), ' 2'))
)
facet3_names <- list(
'Bio Amp 0' = expression(paste(italic(DO [bio]), ' 0')),
'Bio Amp 2' = expression(paste(italic(DO [bio]), ' 2'))
)
facet4_names <- list(
'Noise pro 0' = expression(paste(italic(epsilon [pro]), ' 0')),
'Noise pro 2' = expression(paste(italic(epsilon [pro]), ' 2'))
)
facet5_names <- list(
'Noise obs 0' = expression(paste(italic(epsilon [obs]), ' 0')),
'Noise obs 2' = expression(paste(italic(epsilon [obs]), ' 2'))
)
plot_labeller <- function(variable,value){
if (variable=='tide_cat')
return(facet1_names[value])
if (variable=='tide_assoc')
return(facet2_names[value])
if (variable=='bio_rng')
return(facet3_names[value])
if (variable=='err_rng_pro')
return(facet4_names[value])
if (variable=='err_rng_obs')
return(facet5_names[value])
}
ggplot(to_plo, aes(x = Day, y = DO_obs, group = L1)) +
geom_line() +
geom_line(aes(y = DO_bio, colour = 'DO_bio'), alpha = 0.8) +
theme_bw() +
theme(legend.position = 'none', axis.text.x = element_text(size = 8)) +
ylab(expression(paste('DO mg', L^-1))) +
facet_grid(bio_rng + err_rng_obs + err_rng_pro ~ tide_cat + tide_assoc,
labeller = plot_labeller)
@
\clearpage
% correlation surface for simulation results
\begin{landscape}
\centering\vspace*{\fill}
<<cor_surf, fig.height = 6, fig.width = 12, eval = T, echo = F, fig.cap = 'Correlation surfaces for each unique combination of simulation parameters and window widths. Correlations are based on Pearson coefficients comparing biological and detided DO time series.'>>=
######
# combination grid, used for systematic sims
load('comb_grd.RData')
# load predicted/normalized data
load('prdnrm.RData')
# get results from prdnrm corresponding to a given window value in which_wins
# rownames in which_wins are used to iterate through prdnrm
names(comb_grd)[names(comb_grd) %in% 'dec_time'] <- 'dec_win'
wins <- unique(comb_grd$dec_win)
for(win in wins){
which_wins <- comb_grd[comb_grd$dec_win == win, ]
# get err comps for pred/obs and norm/bio
errs_all <- NULL
jitt <- rnorm(nrow(prdnrm[[1]]), 0, 1e-10)
for(i in as.numeric(rownames(which_wins))){
# cat(i, '\t')
x <- prdnrm[[i]]
prd_err <- cor(x$DO_obs + jitt, x$DO_prd + jitt)
dtd_err <- cor(x$DO_bio + jitt, x$DO_dtd + jitt)
errs_all <- rbind(errs_all, data.frame(prd_err, dtd_err))
}
# reassign factor labels for bio amp and tidal assoc
to_plo <- data.frame(which_wins, errs_all)
labs <- paste('Bio Amp',unique(to_plo$bio_rng))
to_plo$bio_rng <- factor(to_plo$bio_rng, labels = labs)
labs <- paste('Assoc',unique(to_plo$tide_assoc))
to_plo$tide_assoc <- factor(to_plo$tide_assoc, labels = labs)
# setup facet labels
facet1_names <- list(
'Diurnal' = expression(Diurnal),
'Semidiurnal' = expression(Semidi.),
'Mixed Semidiurnal' = expression(Mixed)
)
facet2_names <- list(
'Assoc 0' = expression(paste(italic(DO [adv]), ' 0')),
'Assoc 1' = expression(paste(italic(DO [adv]), ' 1')),
'Assoc 2' = expression(paste(italic(DO [adv]), ' 2'))
)
facet3_names <- list(
'Bio Amp 0' = expression(paste(italic(DO [bio]), ' 0')),
'Bio Amp 1' = expression(paste(italic(DO [bio]), ' 1')),
'Bio Amp 2' = expression(paste(italic(DO [bio]), ' 2'))
)
plot_labeller <- function(variable,value){
if (variable=='tide_cat')
return(facet1_names[value])
if (variable=='tide_assoc')
return(facet2_names[value])
if (variable=='bio_rng')
return(facet3_names[value])
}
p <- ggplot(to_plo, aes(x = factor(err_rng_pro), y = factor(err_rng_obs),
z = dtd_err, fill = dtd_err)) +
geom_tile() +
facet_grid(tide_cat + bio_rng ~ tide_assoc, scales = 'free',
labeller = plot_labeller) +
scale_fill_gradientn(limits = c(0.7, 1), name = 'Bio ~ Dtd', colours=cm.colors(3)) +
scale_x_discrete(expand = c(0,0)) +
scale_y_discrete(expand = c(0,0)) +
xlab(expression(italic(epsilon [pro]))) +
ylab(expression(italic(epsilon [obs]))) +
theme_bw() +
theme(legend.position = 'top') +
ggtitle(paste0('win = ', win))
pl_nm <- paste0('p', which(win == wins))
assign(pl_nm, p)
}
grid.arrange(p1, p2, p3, p4, p5, ncol = 5)
@
\vfill
\end{landscape}
\clearpage
% example of creating simulated time series using actual predicted tide
<<do_sim_act, eval = F, echo = F, fig.width = 8, fig.height = 7, fig.cap = 'Example of simulated time series using the equations above and actual predicted tides from San Francisco Bay, First Mallard. Yellow indicates daylight periods.'>>=
# site for tide
site <- 'SFBFM'
load(paste0('M:/wq_models/SWMP/raw/rproc/tide_preds/', site, '.RData'))
tide <- get(site)
nobs <- 48 * 31 * 6 # 6 months of 30-minute observations
vec <- c(as.character(min(tide$DateTimeStamp)),
as.character(tide$DateTimeStamp[nobs]))
vec <- as.POSIXct(vec, format = '%Y-%m-%d %H:%M:%S')
vec <- seq(vec[1], vec[2], by = 60*30)
# create simulated time series of DO, tide, etc.
DO_sim <- ts_create(
vec,
do.amp = 2,
tide_cat = tide[1:length(vec),],
tide_assoc = 1,
err_rng_obs = 0.5,
err_rng_pro = 1
)
levs <- c('DO_bio', 'Tide', 'DO_adv', 'DO_tid', 'e_obs', 'e_pro', 'e_tot', 'DO_obs')
to.plo <- melt(DO_sim, id.var = c('Day', 'sunrise'),
measure.var = levs
)
to.plo$variable <- factor(to.plo$variable, levels = levs)
p <- ggplot(to.plo, aes(x = Day, y = value, col = sunrise)) +
geom_line() +
facet_wrap(~ variable, scales = 'free_y', ncol = 1) +
theme_bw() +
scale_colour_gradientn(colours = c('orange', 'black')) +
theme(legend.position = 'none')
facet_wrap_labeller(p, labels = c(
expression(italic(DO [Bio])),
expression(italic(Tide)),
expression(italic(DO [adv])),
expression(italic(DO [bio] + DO [adv])),
expression(italic(epsilon [obs])),
expression(italic(epsilon [ pro])),
expression(italic(epsilon [ obs] + epsilon [ pro])),
expression(italic(paste(DO [obs], '=', DO [bio] + DO [adv] + epsilon [tot])))
))
@
% do simulations with actual tidal data, no figure
<<act_proc, eval = F, echo = F>>=
######
#eval grd, same as before but no tidal category since using actual tide
load('eval_grd.RData')
eval_grd_act <- unique(eval_grd[, !names(eval_grd) %in% 'tide_cat'])
# tide to simulate
site <- 'SFBFM'
load(paste0('M:/wq_models/SWMP/raw/rproc/tide_preds/', site, '.RData'))
tide <- get(site)
# time vector
nobs <- 48 * 31 * 6 # 6 months of 30-minute observations
vec <- c(as.character(min(tide$DateTimeStamp)),
as.character(tide$DateTimeStamp[nobs]))
vec <- as.POSIXct(vec, format = '%Y-%m-%d %H:%M:%S')
vec <- seq(vec[1], vec[2], by = 60*30)
# setup parallel, for ddply in interpgrd
cl <- makeCluster(8)
registerDoParallel(cl)
# iterate through evaluation grid to create sim series
strt <- Sys.time()
int_grds_tidact <- vector('list', length = nrow(eval_grd_act))
for(row in 1:nrow(eval_grd_act)){
# progress
sink('log.txt')
cat('Log entry time', as.character(Sys.time()), '\n')
cat(row, ' of ', nrow(eval_grd_act), '\n')
print(Sys.time() - strt)
sink()
# eval grid to evaluate, tidal category dropped since actual tide is used
to_eval <- eval_grd_act[row, ]
# create simulated time series of DO, tide, etc.
DO_sim <- with(to_eval,
ts_create(
vec,
do.amp = bio_rng,
tide_cat = tide[1:length(vec),],
tide_assoc = tide_assoc,
err_rng_obs = err_rng_obs,
err_rng_pro = err_rng_pro
)
)
# get interp grid, done in parallel
int_grd <- interp_grd(DO_sim, wins = list(2, 0.5, 0.2), parallel = T)
# append to results
int_grds_tidact[[row]] <- list(to_eval, DO_sim, int_grd)
# save res as results are appended
save(int_grds_tidact, file = 'int_grds_tidact.RData')
}
stopCluster(cl)
Sys.time() - strt
@
\clearpage
\vfill
% get preds, norms from all int_grds w/ actual tidal data, no figure
<<get_prednrm_tidact, eval = F, echo = F>>=
# load interpolation grids
load('int_grds_tidact.RData')
# setup parallel backend
cl <- makeCluster(8)
registerDoParallel(cl)
#process
strt <- Sys.time()
prdnrm_tidact <- foreach(sim = 1:length(int_grds_tidact),
.packages = 'plyr') %dopar% {
# data to proc
grd_in <- int_grds_tidact[[sim]][[3]]
dat_in <- int_grds_tidact[[sim]][[2]]
# progress
sink('log.txt')
cat('Log entry time', as.character(Sys.time()), '\n')
cat(sim, ' of ', length(int_grds_tidact), '\n')
print(Sys.time() - strt)
sink()
# proc
res <- prdnrm_fun(grd_in, dat_in)
res
}
stopCluster(cl)
save(prdnrm_tidact, file = 'prdnrm_tidact.RData')
@
% correlation surface for simulation results w/ actual tidal data
<<cor_surf_act, fig.height =4.5, fig.width = 7.5, eval = F, echo = F, fig.cap = 'Correlation surfaces for each unique combination of simulation parameters. The top plots indicate correlations between observed and predicted DO and the bottom plots indicate correlations between biological and normalized DO.'>>=
######
#eval grd, same as before but no tidal category since using actual tide
load('eval_grd.RData')
eval_grd_act <- unique(eval_grd[, !names(eval_grd) %in% 'tide_cat'])
load('prdnrm_tidact.RData')
errs_all <- NULL
jitt <- rnorm(nrow(prdnrm_tidact[[1]]), 0, 1e-10)
for(i in 1:length(prdnrm_tidact)){
# cat(i, '\t')
x <- prdnrm_tidact[[i]]
prd_err <- cor(x$DO_obs + jitt, x$DO_pred + jitt)
nrm_err <- cor(x$DO_bio + jitt, x$DO_nrm + jitt)
errs_all <- rbind(errs_all, data.frame(prd_err, nrm_err))
}
# reassign factor labels for bio amp and tidal assoc
to_plo <- data.frame(eval_grd_act, errs_all)
labs <- paste('Bio Amp',unique(to_plo$bio_rng))
to_plo$bio_rng <- factor(to_plo$bio_rng, labels = labs)
labs <- paste('Assoc',unique(to_plo$tide_assoc))
to_plo$tide_assoc <- factor(to_plo$tide_assoc, labels = labs)
# setup facet labels
facet1_names <- list(
'Assoc 0' = expression(paste(italic(DO [adv]), ' 0')),
'Assoc 1' = expression(paste(italic(DO [adv]), ' 1')),
'Assoc 2' = expression(paste(italic(DO [adv]), ' 2'))
)
facet2_names <- list(
'Bio Amp 0' = expression(paste(italic(DO [bio]), ' 0')),
'Bio Amp 1' = expression(paste(italic(DO [bio]), ' 1')),
'Bio Amp 2' = expression(paste(italic(DO [bio]), ' 2'))
)
plot_labeller <- function(variable,value){
if (variable=='tide_assoc')
return(facet1_names[value])
if (variable=='bio_rng')
return(facet2_names[value])
}
p1 <- ggplot(to_plo, aes(x = factor(err_rng_pro), y = factor(err_rng_obs),
z = prd_err, fill = prd_err)) +
geom_tile() +
facet_grid(tide_assoc ~ bio_rng, scales = 'free',
labeller = plot_labeller) +
scale_fill_gradientn(name = 'Obs ~ Prd', colours=cm.colors(3)) +
scale_x_discrete(expand = c(0,0)) +
scale_y_discrete(expand = c(0,0)) +
xlab(expression(italic(epsilon [pro]))) +
ylab(expression(italic(epsilon [obs]))) +
theme_bw() +
theme(legend.position = 'top')
p2 <- ggplot(to_plo, aes(x = factor(err_rng_pro), y = factor(err_rng_obs),
z = nrm_err, fill = nrm_err)) +
geom_tile() +
facet_grid(tide_assoc ~ bio_rng, scales = 'free',
labeller = plot_labeller) +
scale_fill_gradientn(name = 'Bio ~ Nrm', colours=cm.colors(3)) +
scale_x_discrete(expand = c(0,0)) +
scale_y_discrete(expand = c(0,0)) +
xlab(expression(italic(epsilon [pro]))) +
ylab(expression(italic(epsilon [obs]))) +
theme_bw() +
theme(legend.position = 'top')
grid.arrange(p1, p2, ncol = 2)
@
\vfill
\clearpage
\end{document}