
predConc is extremely slow in certain situations, R session must be killed to exit. #6

Open
jacaronda opened this issue Feb 8, 2016 · 4 comments


@jacaronda

Please read the document ISSUE.docx for details.

Providence Data.zip
ISSUE.docx

@ldecicco-USGS
Member

I haven't looked to see what exactly is happening. I can say that there is a user in our office who is running "unit" predConc calculations, and she literally has them running for days. So, it may not actually be hanging, and might eventually finish....(I know...that's not very helpful).

I don't think any work has been done to optimize the unit calculations, and at the moment there are no resources available for that sort of work on the package.

@jacaronda
Author

OK. Can anyone help with the second issue: that the bias-corrected predictions from predConc() are smaller than the uncorrected retransformed predictions from predict.lm()?

@ldecicco-USGS
Member

I've been asking around. The nice thing about bringing up the question is that it reminds some of our managers that people are using this package and that it needs more resources to be maintained properly.

Anyway, here are some responses I got; I think they bring up some relevant information (full disclaimer: I don't claim to understand all/most of it):


The unbiased (MVUE) estimates are not always greater than the rating-curve estimates. In fact, for extreme extrapolation the MVUE can yield negative load estimates. This is covered in the paper by Likeš, IIRC. However, it requires pretty extreme extrapolation, and the corresponding uncertainties are enormous. This situation really should not arise if the sampling program covers the range of flows.

Unbiased estimators are not always admissible. I think this is mentioned in one of the papers that Gilroy, Hirsch and I wrote a few decades ago.
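The point that an unbiased estimator can fall below the naive retransformed estimate is easy to see in a simplified setting. The sketch below (Python with made-up numbers, not rloadest code) assumes the log-space estimator is normal with *known* variance V: the naive retransformation exp(theta_hat) overshoots the target on average, and the exactly unbiased version exp(theta_hat - V/2) is always the smaller of the two.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical log-space truth and estimator variance (assumed known here;
# in AMLE/MVUE it is estimated, which is where the g() function comes in).
theta, V = 1.0, 0.6

# Many draws of a normal log-space estimator theta_hat ~ N(theta, V).
theta_hat = rng.normal(theta, np.sqrt(V), size=200_000)

naive    = np.exp(theta_hat)          # retransform with no correction
unbiased = np.exp(theta_hat - V / 2)  # exactly unbiased for exp(theta)

print(np.exp(theta))      # target, about 2.72
print(naive.mean())       # biased high, about exp(theta + V/2) = 3.67
print(unbiased.mean())    # about 2.72
```

When V itself must be estimated, as in AMLE, the exact correction becomes the series-based g() function mentioned in the responses, which can overshoot all the way to negative values under extreme extrapolation.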

Also was forwarded an old email (2004ish):

 I'm wondering if you've ever seen anything like this before: the case where
   AMLE loads for the 7-parameter model go negative. The first plot shows
   daily load estimates for the entire estimation period (loads.gif);
   uncorrected (biased), MLE, and AMLE loads appear reasonable for most of
   the time period. However, if one looks closely at the later time period
   (loadsclose.gif) you can see the AMLE loads going negative, while the
   uncorrected and MLE loads are OK.

And the response:

The answer is "yes," the AMLE (or Bradu and Mundlak's MVUE) can go negative
-- it happens when you extrapolate _way_ past your calibration data.
What's going on is that the argument to the gm() function, which has a (-V)
in it, where V is the variance of the log-space estimator, is very negative.
Needless to say, in such situations the uncertainty is so large that there
really isn't much you can say about the loads -- that's the point you ought
to make.

Incidentally, this is a weird coincidence: I just finished updating the
paper on AMLE, and one of the comments you made was that I should either
eliminate the cryptic remark that "the AMLE is inadmissible" or I should
elaborate on it. This is the same issue. I chose to elaborate:

    (Lik\v{e}s \citeyear{Likes:1980} notes that $\hat{C}_{F}$ is, strictly
    speaking, inadmissible, because $g_{\nu}()$ can assume a negative
    value. If such a case were ever to arise in practice, the estimator
    could always be improved upon by replacing a negative estimate with a
    zero.)

It sounds like "such a case" has now arisen in practice. Note that it is
not possible for the MLE to go negative, because the last step is to
exponentiate the log-space estimator.

Hope that helps

@jacaronda
Author

Thanks for looking into this, Laura. The example I gave for which the
AMLE estimates are smaller than the least squares (MLE) predictions did
not require extrapolation.

p3014.flo[14401:14405,]
           chr                 pos dump bottle codes    stg stgcode         q
14401 16129.00 2014-02-28 00:00:00    1      0    BX 16.309      -1 0.5759478
14402 16129.01 2014-02-28 00:15:00    1      0    BX 16.309      -1 0.5759478
14403 16129.02 2014-02-28 00:30:00    1      0    BX 16.309      -1 0.5759478
14404 16129.03 2014-02-28 00:45:00    1      0    BX 15.716      -1 0.5550062
14405 16129.04 2014-02-28 01:00:00    1      0    BX 16.011      -1 0.5654241

range(p3014.regdat$pos)
[1] "2014-01-30 12:52:00 GMT+8" "2014-02-28 12:04:00 GMT+8"
range(p3014.regdat$q)
[1] 0.3093852 4.3427059

However, it's new information to me that "the unbiased (MVUE) estimates are
not always greater than the rating-curve estimates." This is the first time
I've encountered that, and I've been using MVUE estimates (my own R code)
for over a decade. It's also the first case where I've seen that large a
bias correction; it's usually 0-10% in my experience. But this is also the
first time I've used a 4-parameter model; I usually use just one predictor
(turbidity).
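A back-of-the-envelope way to see why more parameters can shrink, or even reverse, the usual 0-10% upward correction: to first order, the net factor applied to the raw retransformed prediction combines exp(s2/2) for the residual variance s2 with exp(-V/2) for the variance V of the fitted log-space value, and V grows with the number of parameters and with sparse calibration data. All numbers below are hypothetical; this is a sketch of the approximation, not what predConc computes internally.

```python
import numpy as np

def net_correction(s2, V):
    """First-order net bias-correction factor applied to exp(fit):
    exp(s2/2) corrects retransformation bias from the residual variance,
    exp(-V/2) removes the bias contributed by the fit's own variance."""
    return np.exp((s2 - V) / 2.0)

# One well-determined predictor: small V, modest upward correction.
print(net_correction(0.10, 0.01))   # about 1.05

# Four parameters, sparse data: V can exceed s2 and the factor drops below 1,
# so the "bias-corrected" prediction comes out smaller than the naive one.
print(net_correction(0.10, 0.30))   # about 0.90
```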

By the way, do you know whether loadest has an option to show the
uncorrected predictions?

Thanks, Jack

