Releases: tidymodels/rsample
rsample 1.3.1
- The new internal_calibration_split() function and its methods for various resamples are used in tune to create an internal split of the analysis set, so that the preprocessor and model are fit on one part and the post-processor on the other (#483, #488, #489, #569, #575, #577, #582).
- New accessor function calibration() for the calibration set of an internal calibration split (#581).
rsample 1.3.0
- Bootstrap intervals via int_pctl(), int_t(), and int_bca() now allow for more flexible grouping (#465).
- Errors and warnings are now styled via cli (#499, #502). Largely done by @PriKalra (#523, #526, #528, #530, #531, #532), @Dpananos (#516, #517, #529), and @JamesHWade (#518) as part of the tidyverse dev day.
- rolling_origin() is now superseded by sliding_window(), sliding_index(), and sliding_period(), which provide more flexibility and control (@nmercadeb, #524).
- The deprecation of validation_split(), validation_time_split(), and group_validation_split() has been escalated to the next stage, so these functions now emit a warning.
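As a rough illustration of the superseding note above, the following R sketch builds a fixed-width rolling_origin() scheme and what we believe is its sliding_window() counterpart, then compares the first resample of each. It assumes a recent rsample is installed; the data frame df and its columns t and y are made up for the example.

```r
# Sketch only: assumes a recent rsample; df is a hypothetical toy data set.
library(rsample)

# Ten ordered time points with a deterministic toy response
df <- data.frame(t = 1:10, y = (1:10)^2)

# Superseded approach: a fixed window of 5 rows, assessing the next row
ro <- rolling_origin(df, initial = 5, assess = 1, cumulative = FALSE)

# Replacement: the current row plus 4 lookback rows form the analysis set,
# and the single row right after the window is the assessment set
sw <- sliding_window(df, lookback = 4, assess_stop = 1)

# First analysis/assessment time points of each scheme
ro_first <- analysis(ro$splits[[1]])$t
sw_first <- analysis(sw$splits[[1]])$t
sw_first_assess <- assessment(sw$splits[[1]])$t
print(ro_first)
print(sw_first)
print(sw_first_assess)
```

Here lookback = 4 yields five-row analysis windows, which is why it lines up with initial = 5 and cumulative = FALSE; treat this mapping as our reading of the two functions rather than an official equivalence.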
Bug fixes
- vfold_cv() now uses the breaks argument correctly for repeated cross-validation (@ZWael, #471).
- Grouped resampling functions now work with an explicit strata = NULL. Previously, strata had to be either a column name or missing (#485).
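The two bug fixes above can be exercised with a short R sketch, assuming rsample 1.3.0 or later. The toy data frames grouped_df and numeric_df are hypothetical; numeric_df has 200 rows so that two strata each contain well over the default minimum of 20 observations.

```r
# Sketch only: assumes rsample >= 1.3.0; both data frames are hypothetical.
library(rsample)
set.seed(1)

# 10 groups of 3 rows each
grouped_df <- data.frame(g = rep(1:10, each = 3), x = seq_len(30))

# An explicit strata = NULL is accepted by grouped resampling functions
g_folds <- group_vfold_cv(grouped_df, group = g, v = 5, strata = NULL)

# Repeated, stratified V-fold CV with a non-default number of breaks
numeric_df <- data.frame(x = seq_len(200), y = sin(seq_len(200)))
r_folds <- vfold_cv(numeric_df, v = 4, repeats = 2, strata = y, breaks = 2)

print(g_folds)
print(r_folds)
```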
Breaking changes
- vfold_cv() and clustering_cv() now error on implicit leave-one-out cross-validation (@seb09, #527).
- The class of grouped MC splits is now group_mc_split instead of grouped_mc_split, aligning it with the other grouped splits (#478).
- The rsplit objects of an apparent() split now have the correct class inheritance structure. The order is now apparent_split and then rsplit rather than the other way around (#477).
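To see the class changes listed above, the following R sketch inspects the class vectors of an apparent() split and a grouped Monte Carlo split. It assumes rsample 1.3.0 or later; the grouped data frame df is hypothetical.

```r
# Sketch only: assumes rsample >= 1.3.0; df is a hypothetical grouped data set.
library(rsample)
set.seed(123)

# apparent() gives one resample whose analysis and assessment sets are the full data
app <- apparent(mtcars)
app_classes <- class(app$splits[[1]])
print(app_classes)

# Grouped Monte Carlo CV on 10 groups of 5 rows each
df <- data.frame(g = rep(1:10, each = 5), x = seq_len(50))
grp_mc <- group_mc_cv(df, group = g, prop = 0.8, times = 2)
grp_classes <- class(grp_mc$splits[[1]])
print(grp_classes)
```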
Documentation improvements
- Improved documentation and formatting: function names are now easier to identify, either through () at the end or by being links to the function documentation (@brshallo, #521).
- Formatting improvement: package names are no longer in backticks (@agmurray, #525).
- Improved documentation for initial_split() and friends (@laurabrianna, #519).
- Removed trailing space in printing of mc_cv() objects (@ccani007, #464).
rsample 1.2.1
rsample 1.2.0
- The new initial_validation_split(), along with the variants initial_validation_time_split() and group_initial_validation_split(), generates a three-way split of the data into training, validation, and test sets. With the new validation_set(), this can be turned into an rset object for tuning (#403, #446).
- validation_split(), validation_time_split(), and group_validation_split() have been soft-deprecated in favor of the new functions implementing a three-way split (initial_validation_split(), initial_validation_time_split(), and group_initial_validation_split()) (#449).
- Functions that don't use the ellipsis ... now enforce empty dots (#429).
- make_splits() gained an example in the documentation (@AngelFelizR, #432).
- training(), testing(), analysis(), and assessment() are now S3 generics with methods for rsplit objects. Previously, they explicitly required the input to be an rsplit object (#384).
- The int_*() functions are now S3 generics and have corresponding methods for class bootstraps (#435).
- The underlying mechanics of data splitting were changed so that Surv objects maintain their class. This change affects the row names of the resulting objects; they are reindexed from one instead of being a subset of the original row names (#443).
- rsample no longer re-exports gather() (#451).
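A minimal R sketch of the three-way split workflow described above, assuming rsample 1.2.0 or later. It uses the built-in mtcars data; the 60/20/20 proportions are an arbitrary choice for illustration.

```r
# Sketch only: assumes rsample >= 1.2.0; proportions are illustrative.
library(rsample)
set.seed(42)

# 60% training, 20% validation, and the remaining 20% test
three_way <- initial_validation_split(mtcars, prop = c(0.6, 0.2))

train_df <- training(three_way)
val_df   <- validation(three_way)
test_df  <- testing(three_way)

# Wrap the training/validation part as an rset with a single resample,
# which tuning functions can consume like any other resampling object
val_rset <- validation_set(three_way)
print(val_rset)
```

In the resulting rset, the analysis set of the single split corresponds to the training data and the assessment set to the validation data; the test set is left untouched.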
rsample 1.1.1
- All grouped resampling functions (group_vfold_cv(), group_mc_cv(), group_initial_split(), group_validation_split(), and group_bootstraps()) now support stratification. Strata must be constant within each group (@mikemahoney218, #317, #360, #363, #364, #365).
- Added a new function, clustering_cv(), for blocked cross-validation in various predictor spaces. This is a very flexible function: it accepts both a distance_function and a cluster_function argument, allowing it to be used for spatial clustering as well as potentially phylogenetic and other forms of clustering (@mikemahoney218, #351).
- bootstraps() and group_bootstraps() now warn if resampling returns any empty assessment sets. Previously, bootstraps() was silent while group_bootstraps() errored (@mikemahoney218, #356, #357).
- The assessment set of validation_time_split() now also contains the lagged observations (#376).
- The new helper get_rsplit() lets you conveniently access the rsplit objects inside an rset object (@mikemahoney218, #399).
- The result of initial_time_split() now has its own subclass "initial_time_split", in addition to existing classes (#397).
- The dependency on the ellipsis package has been removed (#393).
- Removed an overly strict test in preparation for dplyr 1.1.0 (#380).
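Two of the additions above, get_rsplit() and the "initial_time_split" subclass, can be checked with the following R sketch, assuming rsample 1.1.1 or later. The time-ordered data frame ts_df is hypothetical.

```r
# Sketch only: assumes rsample >= 1.1.1; ts_df is a hypothetical data set.
library(rsample)
set.seed(7)

# Pull the second rsplit out of an rset without indexing the list column by hand
folds <- vfold_cv(mtcars, v = 5)
second_split <- get_rsplit(folds, 2)

# initial_time_split() keeps row order: the first 3/4 of rows go to training
ts_df <- data.frame(t = 1:20, y = sqrt(1:20))
time_split <- initial_time_split(ts_df, prop = 3/4)

print(class(time_split))
```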
rsample 1.1.0
- rset objects now include all parameters used to create them as attributes (#329).
- Objects returned by sliding functions now have an index attribute, where appropriate, containing the column name used as an index (#329).
- Objects returned by permutations() now have a permutes attribute containing the column name used for permutation (#329).
- Added breaks and pool as attributes to all functions that support stratification (#329).
- Changed the "strata" attribute on rset objects so that it is now either a character vector identifying the column used to stratify the data, or absent (set to NULL) if stratification was not used (#329).
- Added a new function, reshuffle_rset(), which takes an rset object and generates a new version of it using the same arguments but the current random seed (#79, #329).
- Added arguments to control how group_vfold_cv() combines groups. Use balance = "groups" to assign (roughly) the same number of groups to each fold, or balance = "observations" to assign (roughly) the same number of observations to each fold.
- Added a repeats argument to group_vfold_cv() (#330).
- Added new functions for grouped resampling: group_mc_cv() (#313), group_initial_split() and group_validation_split() (#315), and group_bootstraps() (#316).
- Added a new function, reverse_splits(), to swap analysis and assessment splits (#319, #284).
- Improved the error thrown when calling assessment() on a perm_split object created by permutations() (#321, #322).
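The following R sketch exercises reverse_splits(), reshuffle_rset(), and the balance argument of group_vfold_cv() described above, assuming rsample 1.1.0 or later. The grouped data frame df, with six groups of unequal size, is hypothetical.

```r
# Sketch only: assumes rsample >= 1.1.0; df is a hypothetical grouped data set.
library(rsample)
set.seed(10)

folds <- vfold_cv(mtcars, v = 4)

# Swap analysis and assessment sets in every split
rev_folds <- reverse_splits(folds)

# Regenerate the same resampling scheme under the current random seed
new_folds <- reshuffle_rset(folds)

# Six groups of sizes 1 to 6, assigned to 3 folds by number of groups
df <- data.frame(g = rep(letters[1:6], times = 1:6), x = seq_len(21))
g_folds <- group_vfold_cv(df, group = g, v = 3, balance = "groups")

# Count distinct groups held out in each fold
groups_per_fold <- vapply(
  g_folds$splits,
  function(s) length(unique(assessment(s)$g)),
  integer(1)
)
print(groups_per_fold)
```

Because each group is held out in exactly one fold, the per-fold counts always sum to the number of groups; balance = "observations" would instead try to even out the row counts.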
rsample 1.0.0
- Fixed how nested_cv() handles call objects so variables in the environment can be used when specifying resampling schemes (#81).
- Updated to testthat 3e (#280) and added better checking for vfold_cv() (#293).
- Finally removed the gather() method for rset objects. Use tidyr::pivot_longer() instead (#280).
- Changed initial_split() to avoid calling tidyselect twice on strata (#296). This fix stops initial_split() from generating messages like:

    Note: Using an external vector in selections is ambiguous.
    i Use `all_of(strata)` instead of `strata` to silence this message.
    i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.

- Added better printing methods for initial split objects.
rsample 0.1.1
- Updated documentation on stratified sampling (#245).
- Changed make_splits() to an S3 generic, with the original functionality becoming the method for lists and a new method for data frames that allows users to create a split from existing analysis and assessment sets (@liamblake, #246).
- Added validation_time_split() for a single validation sample that takes the first samples for training (@mine-cetinkaya-rundel, #256).
- Escalated the deprecation of the gather() method for rset objects to a hard deprecation. Use tidyr::pivot_longer() instead (#257).
- Changed the resample "fingerprint" to hash only the indices rather than the entire resample result (including the data object). This is much faster and still ensures the same resample for the same original data object (#259).
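The data frame method of make_splits() mentioned above can be sketched in R as follows, assuming rsample 0.1.1 or later. The choice of the first 24 rows of mtcars as the analysis set is arbitrary.

```r
# Sketch only: assumes rsample >= 0.1.1; the row ranges are arbitrary.
library(rsample)

# Pre-existing analysis and assessment sets with identical columns
train_df <- mtcars[1:24, ]
test_df  <- mtcars[25:32, ]

# Build an rsplit directly from the two data frames
manual_split <- make_splits(train_df, assessment = test_df)
print(manual_split)
```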
rsample 0.1.0
- Fixed how mc_cv(), initial_split(), and validation_split() use the prop argument to first compute the assessment indices, rather than the analysis indices. This is a minor but breaking change in some situations; the previous implementation could cause the sizes of the generated analysis and assessment sets to be inconsistent with how prop is documented to function (#217, @issactoast).
- Fixed a problem with the creation of apparent() (#223) and caret2rsample() (#232) resamples.
- Re-licensed package from GPL-2 to MIT. See consent from copyright holders here.
- Attempts to stratify on a Surv object now error more informatively (#230).
- Exposed the pool argument from make_strata() in user-facing resampling functions (#229).
- Deprecated the gather() method for rset objects in favor of tidyr::pivot_longer() (#233).
- Fixed a bug in make_strata() for numeric variables with NA values (@brian-j-smith, #236).
rsample 0.0.9
- New rset_reconstruct(), a developer tool to ease creation of new rset subclasses (#210).
- Added permutations(), a function for creating permutation resamples by performing column-wise shuffling (@mattwarkentin, #198).
- Fixed an issue where empty assessment sets couldn't be created by make_splits() (#188).
- rset objects now contain a "fingerprint" attribute that can be used to check whether objects use the same resamples.
- The reg_intervals() function is a convenience function for lm(), glm(), survreg(), and coxph() models (#206).
- A few internal functions were exported so that rsample-adjacent packages can use the same underlying code.
- The obj_sum() method for rsplit objects was updated (#215).
- Changed the inheritance structure for rsplit objects from specific to general and simplified the methods for the complement() generic (#216).
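Column-wise shuffling with permutations(), as described above, can be sketched in R like this, assuming rsample 0.0.9 or later. Permuting mpg and cyl in mtcars is an arbitrary choice for illustration.

```r
# Sketch only: assumes rsample >= 0.0.9; the permuted columns are arbitrary.
library(rsample)
set.seed(3)

# Five resamples in which only mpg and cyl are shuffled
perm <- permutations(mtcars, permute = c(mpg, cyl), times = 5)

# The analysis set is the full data with the permuted columns reordered
first_perm <- analysis(perm$splits[[1]])
print(head(first_perm[, c("mpg", "cyl", "disp")]))
```

Shuffling keeps each permuted column's values intact but breaks their link to the other columns, which stay in their original row order.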