Forward search without candidates at specific model size #307
Labels: perhaps (Consider implementing this, but this is not a must-have.)
Comments
This happens because of the internal solution I came up with, which is
definitely not the only possible one. Namely, if the user provides search
terms, we take each of them as a unit, so each one is effectively a single
term from projpred's perspective. This can lead to some confusion, as you
see: what is reported as size 1 is actually size 2. Does this help? To
answer the question specifically: the behavior you see is intended, but the
same functionality could be implemented by other means.
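To make the "size 1 is actually size 2" point concrete, here is a minimal base-R sketch that counts how many individual terms each search_terms entry contains. It does not use projpred's internals; the helper name count_terms() is hypothetical and only serves as an illustration.

```r
# Hypothetical helper (not part of projpred): count the individual terms
# contained in one `search_terms` entry by parsing it as a one-sided formula.
count_terms <- function(unit) {
  f <- stats::as.formula(paste("~", unit))
  length(attr(stats::terms(f), "term.labels"))
}

search_terms <- c("X1 + X2")

# The forward search treats each entry as one unit, so it is reported as one
# step in the solution path ...
n_units <- length(search_terms)

# ... although the entry itself is made up of several terms.
terms_per_unit <- vapply(search_terms, count_terms, integer(1))

print(n_units)         # 1
print(terms_per_unit)  # 2 for the entry "X1 + X2"
```

The gap between n_units and terms_per_unit is the source of the confusion: the reported model size counts units, not individual terms.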
…On Mon, 2 May 2022 at 1:04 PM, Frank Weber ***@***.***> wrote:
I'm currently working on the search_terms argument (fixing bugs and
improving documentation). While doing so, I realized that there can be
model sizes for which the forward search doesn't have any candidate models,
for example:
options(mc.cores = parallel::detectCores(logical = FALSE))
data("df_gaussian", package = "projpred")
df_gaussian <- df_gaussian[1:41, ]
dat <- data.frame(y = df_gaussian$y, df_gaussian$x)
library(rstanarm)
rfit <- stan_glm(y ~ X1 + X2 + X3 + X4 + X5,
                 data = dat,
                 seed = 1140350788)
library(projpred)
vs <- varsel(rfit,
             nclusters = 3,
             nclusters_pred = 5,
             method = "forward",
             search_terms = c("X1 + X2"),
             seed = 46782345)
(tested with projpred 2.1.1). If you inspect the output of that varsel()
call, you'll see that X1 + X2 is regarded as the solution term at model
size 1:
print(vs)
gives
Family: gaussian
Link function: identity
Formula: y ~ X1 + X2 + X3 + X4 + X5
Observations: 41
Search method: forward, maximum number of terms 1
Number of clusters used for selection: 3
Number of clusters used for prediction: 5
Suggested Projection Size: NA
Selection Summary:
 size solution_terms   elpd  se  diff diff.se
    0           <NA> -101.6 2.9 -17.4     3.4
    1        X1 + X2  -93.9 2.8  -9.7     2.3
and plot(vs) behaves accordingly. Now my question (especially to
@AlejandroCatalina <https://github.com/AlejandroCatalina>) is whether
this is intended or whether X1 + X2 should be regarded as the solution
term at model size 2 because it consists of the 2 terms X1 and X2. The
latter would probably require some larger changes because all functions
downstream of search_forward() would have to be adapted to deal with
"empty model sizes".
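As a rough illustration of what "empty model sizes" would mean under the alternative counting, the following base-R sketch computes which sizes a solution path would reach if each unit counted with its number of individual terms. This is only a sketch under that assumption, not projpred code; count_terms() is a hypothetical helper.

```r
# Hypothetical helper (not part of projpred): number of individual terms
# in one `search_terms` entry.
count_terms <- function(unit) {
  f <- stats::as.formula(paste("~", unit))
  length(attr(stats::terms(f), "term.labels"))
}

# Solution path from the example above: a single unit, "X1 + X2".
solution_units <- c("X1 + X2")

# Model sizes reached if every unit contributes all of its terms.
sizes_reached <- cumsum(vapply(solution_units, count_terms, integer(1)))

# Sizes between 1 and the largest reached size that have no candidate model.
empty_sizes <- setdiff(seq_len(max(sizes_reached)), sizes_reached)

print(unname(sizes_reached))  # 2
print(empty_sizes)            # 1
```

Under this counting, size 1 has no candidate model, which is the case that the functions downstream of search_forward() would need to handle.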
Thanks, yes, that helps. For now, I'll keep the current behavior. In a future release, we could think about switching to the alternative approach proposed above, which requires some larger changes.
fweber144 added the perhaps label on May 2, 2022