Commit d741099

Merge pull request #876 from mlr-org/fix/targettrafo_levels
fix: Add `drop_levels = FALSE` to call of `mlr3::convert_task()` in `PipeOpTargetMutate` and `PipeOpTargetTrafoScaleRange`
2 parents 4bb786d + df9f964 commit d741099

7 files changed

+112
-85
lines changed

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@
1212
* Fix: Corrected hash calculation for `PipeOpFilter`.
1313
* New PipeOps `PipeOpEncodePLQuantiles` and `PipeOpEncodePLTree` that implement piecewise linear encoding with two different binning methods.
1414
* Compatibility with new `R6` release.
15+
* Fix: `PipeOpTargetMutate` and `PipeOpTargetTrafoScaleRange` no longer drop unseen factor levels of features or targets during train and predict.
16+
* Simplified parameter checks and added internal type checking for `PipeOpTargetMutate`.
1517

1618
# mlr3pipelines 0.7.1
1719
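
The NEWS entry above states that `PipeOpTargetMutate` and `PipeOpTargetTrafoScaleRange` no longer drop unseen factor levels. As a rough illustration of what "dropping unseen levels" means, here is a base-R sketch. It is not mlr3pipelines code, and the `species` factor is a made-up example. It shows how a declared but unobserved level disappears under `droplevels()`, which is presumably comparable to the behaviour the commit avoids by passing `drop_levels = FALSE` to `convert_task()`.

```r
# Base-R illustration only; `species` is a made-up example factor.
species = factor(
  c("setosa", "virginica"),
  levels = c("setosa", "versicolor", "virginica")
)

# All declared levels are kept, including the unobserved "versicolor".
levels(species)

# droplevels() removes every level that does not occur in the data.
dropped = droplevels(species)
levels(dropped)

# Anything built from `dropped` no longer knows about "versicolor",
# even if that level shows up later, e.g. at prediction time.
"versicolor" %in% levels(dropped)
```

Keeping the declared levels means the data seen during training and prediction share the same factor definition, regardless of which values happen to occur in a given subset of rows.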

R/PipeOpTrafo.R

Lines changed: 32 additions & 31 deletions
@@ -15,37 +15,36 @@
1515
#'
1616
#' @section Construction:
1717
#' ```
18-
#' PipeOpTargetTrafo$new(id, param_set = ps(), param_vals = list() packages = character(0), task_type_in = "Task", task_type_out = task_type_in, tags = NULL)
18+
#' PipeOpTargetTrafo$new(id, param_set = ps(), param_vals = list(), packages = character(0), task_type_in = "Task", task_type_out = task_type_in, tags = NULL)
1919
#' ```
2020
#'
2121
#' * `id` :: `character(1)`\cr
2222
#' Identifier of resulting object. See `$id` slot of [`PipeOp`].
2323
#' * `param_set` :: [`ParamSet`][paradox::ParamSet]\cr
24-
#' Parameter space description. This should be created by the subclass and given to
25-
#' `super$initialize()`.
24+
#' Parameter space description. This should be created by the subclass and given to `super$initialize()`.
2625
#' * `param_vals` :: named `list`\cr
2726
#' List of hyperparameter settings, overwriting the hyperparameter settings given in `param_set`.
2827
#' The subclass should have its own `param_vals` parameter and pass it on to `super$initialize()`.
2928
#' Default `list()`.
3029
#' * `task_type_in` :: `character(1)`\cr
31-
#' The class of [`Task`][mlr3::Task] that should be accepted as input. This
32-
#' should generally be a `character(1)` identifying a type of [`Task`][mlr3::Task], e.g. `"Task"`, `"TaskClassif"` or
33-
#' `"TaskRegr"` (or another subclass introduced by other packages). Default is `"Task"`.
30+
#' The class of [`Task`][mlr3::Task] that should be accepted as input. This should generally be a `character(1)`
31+
#' identifying a type of [`Task`][mlr3::Task], e.g. `"Task"`, `"TaskClassif"` or `"TaskRegr"` (or another subclass
32+
#' introduced by other packages). Default is `"Task"`.
3433
#' * `task_type_out` :: `character(1)`\cr
35-
#' The class of [`Task`][mlr3::Task] that is produced as output. This
36-
#' should generally be a `character(1)` identifying a type of [`Task`][mlr3::Task], e.g. `"Task"`, `"TaskClassif"` or
37-
#' `"TaskRegr"` (or another subclass introduced by other packages). Default is the value of `task_type_in`.
38-
#' * packages :: `character`\cr
34+
#' The class of [`Task`][mlr3::Task] that is produced as output. This should generally be a `character(1)`
35+
#' identifying a type of [`Task`][mlr3::Task], e.g. `"Task"`, `"TaskClassif"` or `"TaskRegr"` (or another subclass
36+
#' introduced by other packages). Default is the value of `task_type_in`.
37+
#' * `packages` :: `character`\cr
3938
#' Set of all required packages for the [`PipeOp`]'s methods. See `$packages` slot. Default is
4039
#' `character(0)`.
41-
#' * tags :: `character` | `NULL`\cr
40+
#' * `tags` :: `character` | `NULL`\cr
4241
#' Tags of the resulting `PipeOp`. This is added to the tag `"target transform"`. Default `NULL`.
4342
#'
4443
#' @section Input and Output Channels:
45-
#' [`PipeOpTargetTrafo`] has one input channels named `"input"` taking a [`Task`][mlr3::Task] (or whatever class
44+
#' `PipeOpTargetTrafo` has one input channels named `"input"` taking a [`Task`][mlr3::Task] (or whatever class
4645
#' was specified by the `task_type` during construction) both during training and prediction.
4746
#'
48-
#' [`PipeOpTargetTrafo`] has two output channels named `"fun"` and `"output"`. During training,
47+
#' `PipeOpTargetTrafo` has two output channels named `"fun"` and `"output"`. During training,
4948
#' `"fun"` returns `NULL` and during prediction, `"fun"` returns a function that can later be used
5049
#' to invert the transformation done during training according to the overloaded `.train_invert()`
5150
#' and `.invert()` functions. `"output"` returns the modified input [`Task`][mlr3::Task] (or `task_type`)
@@ -56,11 +55,11 @@
5655
#' `.get_state()` function.
5756
#'
5857
#' @section Internals:
59-
#' [`PipeOpTargetTrafo`] is an abstract class inheriting from [`PipeOp`]. It implements the
58+
#' `PipeOpTargetTrafo` is an abstract class inheriting from [`PipeOp`]. It implements the
6059
#' `private$.train()` and `private$.predict()` functions. These functions perform checks and go on
6160
#' to call `.get_state()`, `.transform()`, `.train_invert()`. `.invert()` is packaged and sent along
6261
#' the `"fun"` output to be applied to a [`Prediction`][mlr3::Prediction] by [`PipeOpTargetInvert`].
63-
#' A subclass of [`PipeOpTargetTrafo`] should implement these functions and be used in combination
62+
#' A subclass of `PipeOpTargetTrafo` should implement these functions and be used in combination
6463
#' with [`PipeOpTargetInvert`].
6564
#'
6665
#' @section Fields:
@@ -70,15 +69,15 @@
7069
#' Methods inherited from [`PipeOp`], as well as:
7170
#' * `.get_state(task)`\cr
7271
#' ([`Task`][mlr3::Task]) -> `list`\cr
73-
#' Called by [`PipeOpTargetTrafo`]'s implementation of `private$.train()`. Takes a single
72+
#' Called by `PipeOpTargetTrafo`'s implementation of `private$.train()`. Takes a single
7473
#' [`Task`][mlr3::Task] as input and returns a `list` to set the `$state`.
7574
#' `.get_state()` will be called a single time during *training* right before
7675
#' `.transform()` is called. The return value (i.e. the `$state`) should contain info needed in
7776
#' `.transform()` as well as in `.invert()`.\cr
7877
#' The base implementation returns `list()` and should be overloaded if setting the state is desired.
7978
#' * `.transform(task, phase)`\cr
8079
#' ([`Task`][mlr3::Task], `character(1)`) -> [`Task`][mlr3::Task]\cr
81-
#' Called by [`PipeOpTargetTrafo`]'s implementation of `private$.train()` and
80+
#' Called by `PipeOpTargetTrafo`'s implementation of `private$.train()` and
8281
#' `private$.predict()`. Takes a single [`Task`][mlr3::Task] as input and modifies it.
8382
#' This should typically consist of calculating a new target and modifying the
8483
#' [`Task`][mlr3::Task] by using the [`convert_task`][mlr3::convert_task] function. `.transform()` will be called during training and
@@ -93,16 +92,15 @@
9392
#' This function is abstract and should be overloaded by inheriting classes.
9493
#' * `.train_invert(task)`\cr
9594
#' ([`Task`][mlr3::Task]) -> `any`\cr
96-
#' Called by [`PipeOpTargetTrafo`]'s implementation of `private$.predict()`. Takes a single
95+
#' Called by `PipeOpTargetTrafo`'s implementation of `private$.predict()`. Takes a single
9796
#' [`Task`][mlr3::Task] as input and returns an arbitrary value that will be given as
98-
#' `predict_phase_state` to `.invert()`. This should not modify the input [`Task`][mlr3::Task] .\cr
97+
#' `predict_phase_state` to `.invert()`. This should not modify the input [`Task`][mlr3::Task].\cr
9998
#' The base implementation returns a list with a single element, the `$truth` column of the [`Task`][mlr3::Task],
10099
#' and should be overloaded if a more training-phase-dependent state is desired.
101100
#' * `.invert(prediction, predict_phase_state)`\cr
102101
#' ([`Prediction`][mlr3::Prediction], `any`) -> [`Prediction`][mlr3::Prediction]\cr
103-
#' Takes a [`Prediction`][mlr3::Prediction] and a `predict_phase_state`
104-
#' object as input and inverts the prediction. This function is sent as `"fun"` to
105-
#' [`PipeOpTargetInvert`].\cr
102+
#' Takes a [`Prediction`][mlr3::Prediction] and a `predict_phase_state` object as input and inverts the prediction.
103+
#' This function is sent as `"fun"` to [`PipeOpTargetInvert`].\cr
106104
#' This function is abstract and should be overloaded by inheriting classes. Care should be
107105
#' taken that the `predict_type` of the [`Prediction`][mlr3::Prediction] being inverted is handled well.
108106
#' * `.invert_help(predict_phase_state)`\cr
@@ -188,7 +186,7 @@ PipeOpTargetTrafo = R6Class("PipeOpTargetTrafo",
188186
#'
189187
#' During prediction phase the function supplied through `"fun"` is called with a `list` containing
190188
#' the `"prediction"` as a single element, and should return a `list` with a single element
191-
#' (a [`Prediction`][mlr3::Prediction]) that is returned by [`PipeOpTargetInvert`].
189+
#' (a [`Prediction`][mlr3::Prediction]) that is returned by `PipeOpTargetInvert`.
192190
#'
193191
#' @section Construction:
194192
#' ```
@@ -201,18 +199,18 @@ PipeOpTargetTrafo = R6Class("PipeOpTargetTrafo",
201199
#' List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default `list()`.
202200
#'
203201
#' @section Input and Output Channels:
204-
#' [`PipeOpTargetInvert`] has two input channels named `"fun"` and `"prediction"`. During
202+
#' `PipeOpTargetInvert` has two input channels named `"fun"` and `"prediction"`. During
205203
#' training, both take `NULL` as input. During prediction, `"fun"` takes a function and
206204
#' `"prediction"` takes a [`Prediction`][mlr3::Prediction].
207205
#'
208-
#' [`PipeOpTargetInvert`] has one output channel named `"output"` and returns `NULL` during
206+
#' `PipeOpTargetInvert` has one output channel named `"output"` and returns `NULL` during
209207
#' training and a [`Prediction`][mlr3::Prediction] during prediction.
210208
#'
211209
#' @section State:
212210
#' The `$state` is left empty (`list()`).
213211
#'
214212
#' @section Parameters:
215-
#' [`PipeOpTargetInvert`] has no parameters.
213+
#' `PipeOpTargetInvert` has no parameters.
216214
#'
217215
#' @section Internals:
218216
#' Should be used in combination with a subclass of [`PipeOpTargetTrafo`].
@@ -283,7 +281,7 @@ mlr_pipeops$add("targetinvert", PipeOpTargetInvert)
283281
#'
284282
#' @section Parameters:
285283
#' The parameters are the parameters inherited from [`PipeOpTargetTrafo`], as well as:
286-
#' * `trafo` :: `function` `data.table` -> `data.table`\cr
284+
#' * `trafo` :: `function` `data.table` -> `data.frame` | `data.table` | `matrix`\cr
287285
#' Transformation function for the target. Should only be a function of the target, i.e., taking a
288286
#' single `data.table` argument, typically with one column. The return value is used as the new
289287
#' target of the resulting [`Task`][mlr3::Task]. To change target names, change the column name of the data
@@ -349,8 +347,8 @@ PipeOpTargetMutate = R6Class("PipeOpTargetMutate",
349347
initialize = function(id = "targetmutate", param_vals = list(), new_task_type = NULL) {
350348
private$.new_task_type = assert_choice(new_task_type, mlr_reflections$task_types$type, null.ok = TRUE)
351349
ps = ps(
352-
trafo = p_uty(tags = c("train", "predict"), custom_check = crate(function(x) check_function(x, nargs = 1L))),
353-
inverter = p_uty(tags = "predict", custom_check = crate(function(x) check_function(x, nargs = 1L)))
350+
trafo = p_uty(tags = c("train", "predict"), custom_check = check_function),
351+
inverter = p_uty(tags = "predict", custom_check = check_function)
354352
)
355353
# We could add a condition here for new_task_type on trafo and inverter when mlr-org/paradox#278 has an answer.
356354
# HOWEVER conditions are broken in paradox, it is a terrible idea to use them in PipeOps,
@@ -373,8 +371,11 @@ PipeOpTargetMutate = R6Class("PipeOpTargetMutate",
373371

374372
.transform = function(task, phase) {
375373
new_target = self$param_set$values$trafo(task$data(cols = task$target_names))
374+
if (!is.data.frame(new_target) && !is.matrix(new_target)) {
375+
stopf("Hyperparameter 'trafo' must be a function returning a 'data.frame', 'data.table', or 'matrix', not '%s'.", class(new_target)[[1L]])
376+
}
376377
task$cbind(new_target)
377-
convert_task(task, target = colnames(new_target), new_type = private$.new_task_type, drop_original_target = TRUE)
378+
convert_task(task, target = colnames(new_target), new_type = private$.new_task_type, drop_original_target = TRUE, drop_levels = FALSE)
378379
},
379380

380381
.invert = function(prediction, predict_phase_state) {
@@ -478,7 +479,7 @@ PipeOpTargetTrafoScaleRange = R6Class("PipeOpTargetTrafoScaleRange",
478479
new_target = self$state$offset + x * self$state$scale
479480
setnames(new_target, paste0(colnames(new_target), ".scaled"))
480481
task$cbind(new_target)
481-
convert_task(task, target = colnames(new_target), drop_original_target = TRUE)
482+
convert_task(task, target = colnames(new_target), drop_original_target = TRUE, drop_levels = FALSE)
482483
},
483484

484485
.invert = function(prediction, predict_phase_state) {
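
The `.transform()` hunk of `PipeOpTargetMutate` above adds a check that the value returned by `trafo` is a `data.frame`, `data.table`, or `matrix`. The following stand-alone sketch mirrors that check in base R so it can be tried without mlr3pipelines. The helper name `check_trafo_result` is hypothetical, and `stop(sprintf(...))` stands in for the `stopf()` call used in the commit.

```r
# Hypothetical stand-alone helper; in the commit the check lives inside `.transform()`.
check_trafo_result = function(new_target) {
  # A data.table also passes, because it inherits from data.frame.
  if (!is.data.frame(new_target) && !is.matrix(new_target)) {
    stop(sprintf(
      "Hyperparameter 'trafo' must be a function returning a 'data.frame', 'data.table', or 'matrix', not '%s'.",
      class(new_target)[[1L]]
    ))
  }
  invisible(new_target)
}

target = data.frame(y = c(1, 10, 100))

# A trafo that keeps the tabular shape passes the check.
ok = check_trafo_result(log(target))

# A trafo that returns a bare vector is rejected with an informative error.
res = tryCatch(
  check_trafo_result(log(target$y)),
  error = function(e) conditionMessage(e)
)
print(res)
```

Failing early here gives a message that names the offending class, rather than leaving the problem to surface later in `task$cbind()`.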
