Forward search without candidates at specific model size #307

fweber144 · 2022-05-02T10:04:38Z

I'm currently working on the search_terms argument (fixing bugs and improving documentation). While doing so, I realized that there can be model sizes for which the forward search doesn't have any candidate models, for example:

options(mc.cores = parallel::detectCores(logical = FALSE))
data("df_gaussian", package = "projpred")
df_gaussian <- df_gaussian[1:41, ]
dat <- data.frame(y = df_gaussian$y, df_gaussian$x)
library(rstanarm)
rfit <- stan_glm(y ~ X1 + X2 + X3 + X4 + X5,
                 data = dat,
                 seed = 1140350788)
library(projpred)
vs <- varsel(rfit,
             nclusters = 3,
             nclusters_pred = 5,
             method = "forward",
             search_terms = c("X1 + X2"),
             seed = 46782345)

(tested with projpred 2.1.1). If you inspect the output of that varsel() call, you'll see that X1 + X2 is regarded as the solution term at model size 1:

print(vs)

gives


Family: gaussian 
Link function: identity 

Formula: y ~ X1 + X2 + X3 + X4 + X5
Observations: 41
Search method: forward, maximum number of terms 1
Number of clusters used for selection: 3
Number of clusters used for prediction: 5
Suggested Projection Size: NA

Selection Summary:
 size solution_terms   elpd  se  diff diff.se
    0           <NA> -101.6 2.9 -17.4     3.4
    1        X1 + X2  -93.9 2.8  -9.7     2.3

and plot(vs) behaves accordingly. Now my question (especially to @AlejandroCatalina) is whether this is intended or whether X1 + X2 should be regarded as the solution term at model size 2 because it consists of the 2 terms X1 and X2. The latter would probably require some larger changes because all functions downstream of search_forward() would have to be adapted to deal with "empty model sizes".


AlejandroCatalina · 2022-05-02T11:10:59Z

This happens because of the internal solution I came up with, which is definitely not the only possible one. Namely, if the user provides search terms, we take each of them as a unit, so each is effectively a single term from projpred's perspective. This can lead to some confusion, as you see: what is reported as size 1 is actually size 2. Does this help? To answer the question specifically: the behavior you see is intended, but the same functionality could be implemented by other means.
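To illustrate the two ways of counting discussed here, the following is a minimal base-R sketch. It is not projpred's internal code: `count_terms()` is a hypothetical helper introduced only for this illustration, and it assumes each `search_terms` element is a valid right-hand side of a formula.

```r
# Hypothetical helper (not part of projpred): count the individual terms
# contained in one `search_terms` element.
count_terms <- function(search_term) {
  # Build a one-sided formula so that stats::terms() can parse the string.
  trms <- stats::terms(stats::as.formula(paste("~", search_term)))
  # "term.labels" lists the non-intercept terms of the formula.
  length(attr(trms, "term.labels"))
}

search_terms <- c("X1 + X2")

# Current behavior as described above: each element is one unit,
# so it advances the model size by one step.
units_in_search <- length(search_terms)

# Alternative way of counting: number of underlying terms per unit.
terms_per_unit <- vapply(search_terms, count_terms, integer(1),
                         USE.NAMES = FALSE)
```

With `search_terms = c("X1 + X2")`, `units_in_search` is 1 while `terms_per_unit` is 2, which is the mismatch between the reported model size and the number of terms raised in this issue.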

fweber144 · 2022-05-02T12:17:43Z

Thanks, yes, that helps. For now, I'll keep the current behavior. In a future release, we could think about switching to the alternative approach proposed above, which would require some larger changes.

fweber144 added the "perhaps" label ("Consider implementing this, but this is not a must-have.") May 2, 2022

sor16 mentioned this issue Aug 8, 2022

search_terms argument in cv_varsel not working properly #345

Closed
