How to interpret "optimal cutpoint" when bootstrapping? #50

kapsner · 2021-12-12T13:26:43Z

When bootstrapping, the reported "optimal cutpoint" differs from both the mean and the median of the optimal cutpoints across the bootstrap samples.

How is this to be interpreted? How is the "optimal cutpoint" determined during bootstrapping?

opt_cut$optimal_cutpoint == opt_cut %>%
    dplyr::select(boot) %>%
    tidyr::unnest(cols = boot) %>%
    dplyr::summarize(mean_oc = mean(optimal_cutpoint))

FALSE

opt_cut$optimal_cutpoint == opt_cut %>%
    dplyr::select(boot) %>%
    tidyr::unnest(cols = boot) %>%
    dplyr::summarize(median_oc = median(optimal_cutpoint))

FALSE

Answered by Thie1e

Thie1e · 2021-12-12T14:04:54Z

Thie1e
Dec 12, 2021
Maintainer

Hi,

I can't see the code you used to create opt_cut, but I assume it is something like:

library(cutpointr)  # provides cutpointr() and the suicide example data
opt_cut <- cutpointr(suicide, dsi, suicide,
  method = maximize_metric, metric = youden, boot_runs = 1000)

This will calculate an optimal cutpoint for the Youden-Index in the full data set and also calculate optimal cutpoints for the Youden-Index in 1000 bootstrap samples. The cutpoint in the full data set is of course not necessarily identical to the mean or median of the cutpoints from the bootstrap.
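To make the distinction concrete, here is a minimal base R sketch that does not use cutpointr at all. The simulated data and the helper best_youden_cutpoint() are assumptions made purely for illustration; this is not how cutpointr is implemented internally. It computes one cutpoint on the full sample and, separately, one cutpoint per bootstrap resample, so the two kinds of numbers can be compared side by side.

```r
set.seed(42)

# Simulated data (an assumption for illustration, not the suicide data set)
n <- 200
y <- rbinom(n, 1, 0.3)                       # 1 = positive class
x <- rnorm(n, mean = ifelse(y == 1, 1, 0))   # higher x suggests positive

# Hypothetical helper: returns the observed value cp that maximizes
# Youden's J = sensitivity + specificity - 1 when x >= cp is "positive".
best_youden_cutpoint <- function(x, y) {
  candidates <- sort(unique(x))
  j <- vapply(candidates, function(cp) {
    pred <- x >= cp
    sens <- sum(pred & y == 1) / sum(y == 1)
    spec <- sum(!pred & y == 0) / sum(y == 0)
    sens + spec - 1
  }, numeric(1))
  candidates[which.max(j)]
}

# One cutpoint estimated on the full sample
full_cut <- best_youden_cutpoint(x, y)

# One cutpoint per bootstrap resample (rows drawn with replacement)
boot_cuts <- replicate(200, {
  idx <- sample.int(n, replace = TRUE)
  best_youden_cutpoint(x[idx], y[idx])
})

# Compare the full-sample cutpoint with summaries of the bootstrap cutpoints
c(full = full_cut, boot_mean = mean(boot_cuts), boot_median = median(boot_cuts))
```

The full-sample cutpoint is a single maximizer on the original data, while the mean and median summarize maximizers found on perturbed copies of the data, so there is no reason for them to coincide in general.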

This type of "outer bootstrap" is useful for estimating the out-of-sample performance of the estimation method at hand, in this example empirically maximizing the Youden-Index. You get the distribution of the obtained values of the Youden-Index in the bootstrap samples, along with some other metrics (try summary(opt_cut)).
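As a sketch of how those bootstrap results could be inspected, the following unnests the boot column in the same way as the code in the question. It assumes the bootstrap tibble names the in-bag and out-of-bag metric columns youden_b and youden_oob when metric = youden is used; check names(boot_res) if your version differs. boot_runs is reduced to 200 here only to keep the example quick.

```r
library(cutpointr)
library(dplyr)
library(tidyr)

set.seed(100)
opt_cut <- cutpointr(suicide, dsi, suicide,
  method = maximize_metric, metric = youden, boot_runs = 200)

# Overview including the bootstrap distribution of several metrics
summary(opt_cut)

# One row per bootstrap repetition
boot_res <- opt_cut %>%
  select(boot) %>%
  unnest(cols = boot)

# In-bag vs. out-of-bag Youden-Index (column names are an assumption, see above)
boot_res %>%
  summarize(
    mean_youden_in_bag = mean(youden_b, na.rm = TRUE),
    mean_youden_oob    = mean(youden_oob, na.rm = TRUE)
  )
```

The out-of-bag values are the ones that speak to out-of-sample performance, since they are computed on observations that were not drawn into the respective bootstrap sample.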

If you want bootstrapped cutpoints, you should use method = maximize_boot_metric or method = minimize_boot_metric, depending on the metric. For example:

opt_cut_b <- cutpointr(
  data = suicide, 
  x = dsi, 
  class = suicide, 
  method = maximize_boot_metric, 
  metric = youden, 
  boot_cut = 500, 
  summary_func = mean)

This will return an optimal cutpoint which is the mean of the optimal cutpoints from 500 bootstrap samples, maximizing the Youden-Index in every sample.

If you would like to estimate the out-of-sample performance too, you can again set, e.g., boot_runs = 1000 to run the outer bootstrap.

Keep in mind, though, that this creates 1000 outer bootstrap samples, each of which is then resampled 500 times, so it may take quite a long time to run. You can parallelize this by setting allowParallel = TRUE and starting a cluster; see the example code in ?cutpointr.
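A rough sketch of such a parallel run, modeled on the doParallel / doRNG pattern from the package's example code, might look like the following. The cluster size of 2, the seed, and the deliberately small boot_cut and boot_runs values are assumptions chosen so that the example finishes quickly; increase them for real use.

```r
library(cutpointr)
library(doParallel)  # parallel backend for foreach
library(doRNG)       # reproducible random numbers in parallel loops

cl <- parallel::makeCluster(2)   # 2 worker processes (assumption)
registerDoParallel(cl)
registerDoRNG(12)                # arbitrary seed for reproducibility

opt_cut_b <- cutpointr(
  data = suicide,
  x = dsi,
  class = suicide,
  method = maximize_boot_metric,  # cutpoint = summary over inner bootstrap
  metric = youden,
  boot_cut = 50,                  # inner bootstrap repetitions (kept small)
  summary_func = mean,
  boot_runs = 100,                # outer bootstrap repetitions (kept small)
  allowParallel = TRUE)

parallel::stopCluster(cl)

opt_cut_b$optimal_cutpoint
```

Total work scales roughly with boot_runs times boot_cut, so it is worth timing a small configuration like this one before committing to 1000 by 500.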

Let me know if you have other questions or if this does not answer your question.
