Description
I notice that the can.include and must.include arguments each work on their own, but not together. See my reprex below.
I can't think of any reason why they couldn't, in principle, be used together. It would be very useful to be able to build a sample that includes legacy samples but is also restricted to a subset of the population.
Looking at the R code, the problem seems to lie with can.include <- 1:nrow(dat) in clhs.data.frame (https://github.com/pierreroudier/clhs/blob/8d45408d030b74b81073a8aa5fc4aec4d860ce06/R/clhs-data.frame.R#L122C1-L122C33): if must.include is supplied, can.include becomes the rest of the rows, so a can.include passed by the user appears to be discarded. I don't know, however, whether changing this in the R code would work with the cpp implementation...
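To make the behaviour I would expect concrete, here is a minimal base-R sketch of how the two arguments could be reconciled. The helper resolve_candidates is hypothetical and is not part of clhs; it only illustrates the index logic and says nothing about how the cpp side would need to change.

```r
# Hypothetical helper (NOT part of clhs): work out which rows may be drawn
# freely when both can.include and must.include are supplied.
resolve_candidates <- function(n_rows, can.include = NULL, must.include = NULL) {
  # With no restriction, every row is a candidate
  if (is.null(can.include)) can.include <- seq_len(n_rows)
  # Rows that are always kept should not be drawn a second time
  free <- setdiff(can.include, must.include)
  list(must = must.include, free = free)
}

pool <- resolve_candidates(1000, can.include = 1:500, must.include = 1:25)
range(pool$free)
#> [1]  26 500
```

With this logic the optimiser would only ever swap rows from pool$free, while pool$must stays fixed in the sample.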
library(clhs)
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, were retired in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
df <- data.frame(
  a = runif(1000),
  b = rnorm(1000),
  c = sample(LETTERS[1:5], size = 1000, replace = TRUE))
res1 <- clhs(df, size = 50, use.cpp = TRUE, iter = 5000, progress = FALSE, simple = FALSE,
             can.include = 1:500)
#> Warning: NAs introduced by coercion
range(res1$index_samples) # can.include correctly restricts range
#> [1] 10 498
res2 <- clhs(df, size = 50, use.cpp = TRUE, iter = 5000, progress = FALSE, simple = FALSE,
             must.include = 1:25)
#> Warning: NAs introduced by coercion
print(res2$index_samples, N=50) # must.include correctly guarantees selection
#> [1] 922 279 712 736 754 962 569 810 557 554 109 244 847 356 629 352 724 686 879
#> [20] 819 286 448 825 133 523 1 2 3 4 5 6 7 8 9 10 11 12 13
#> [39] 14 15 16 17 18 19 20 21 22 23 24 25
range(res2$index_samples)
#> [1] 1 962
res3 <- clhs(df, size = 50, use.cpp = TRUE, iter = 5000, progress = FALSE, simple = FALSE,
             can.include = 1:500, must.include = 1:25)
#> Warning: NAs introduced by coercion
range(res3$index_samples) # in combination with must.include, can.include does not restrict range
#> [1] 1 954
print(res3$index_samples, N=50)
#> [1] 399 830 841 360 625 548 448 19 232 954 199 785 441 322 603 252 754 804 244
#> [20] 484 167 611 236 261 462 1 2 3 4 5 6 7 8 9 10 11 12 13
#> [39] 14 15 16 17 18 19 20 21 22 23 24 25
Created on 2024-06-20 with reprex v2.0.2
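A possible workaround in the meantime is to subset the data to the allowed rows before calling clhs, pass must.include as positions within that subset, and map the result back to the original row numbers. The sketch below is only that: the names pool, must_local and selected are mine, the data frame is reduced to the two numeric columns for simplicity, and the clhs call is guarded so the index mapping can be checked without the package installed.

```r
set.seed(1)
df <- data.frame(a = runif(1000), b = rnorm(1000))

can  <- 1:500   # rows the sample may be drawn from
must <- 1:25    # legacy rows that have to be in the sample

# Allowed rows: the mandatory rows plus the permitted candidates
pool <- union(must, can)
# Positions of the mandatory rows inside the subset df[pool, ]
must_local <- match(must, pool)

if (requireNamespace("clhs", quietly = TRUE)) {
  res <- clhs::clhs(df[pool, ], size = 50, use.cpp = TRUE, iter = 5000,
                    progress = FALSE, simple = FALSE,
                    must.include = must_local)
  # index_samples refers to rows of the subset, so map back to df
  selected <- pool[res$index_samples]
  print(range(selected))
}
```

Because clhs never sees the rows outside pool, the selection cannot include them, whatever happens to can.include internally.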
Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.utf8
#> [2] LC_CTYPE=English_United Kingdom.utf8
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United Kingdom.utf8
#>
#> time zone: Europe/Oslo
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] styler_1.10.2 digest_0.6.33 fastmap_1.1.1 xfun_0.41
#> [5] magrittr_2.0.3 glue_1.6.2 R.utils_2.12.3 knitr_1.45
#> [9] htmltools_0.5.7 rmarkdown_2.25 lifecycle_1.0.4 cli_3.6.1
#> [13] R.methodsS3_1.8.2 vctrs_0.6.4 reprex_2.0.2 withr_2.5.2
#> [17] compiler_4.3.1 R.oo_1.25.0 R.cache_0.16.0 purrr_1.0.2
#> [21] rstudioapi_0.15.0 tools_4.3.1 evaluate_0.23 yaml_2.3.7
#> [25] rlang_1.1.2 fs_1.6.3